POST /extract
Extract
curl --request POST \
  --url https://api.example.com/extract \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>",
  "entities": {
    "description": "<string>",
    "threshold": 123,
    "top_n": 123
  },
  "threshold": 123,
  "top_n": 123
}
'
Response:
{
  "entities": [
    {
      "text": "<string>",
      "type": "<string>",
      "confidence": 123,
      "start": 123,
      "end": 123,
      "context": "<string>"
    }
  ]
}

Overview

The /extract endpoint runs Named Entity Recognition (NER) over a block of text. Unlike standard NER with a fixed label set, this endpoint lets you define the entity types you want in plain English. The model uses your descriptions to find matching spans and returns each one with a confidence score and surrounding context. Every result includes up to 500 characters of surrounding text on each side, so you can validate or use the extracted value without going back to the source.
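To make the "up to 500 characters on each side" rule concrete, here is a minimal Python sketch of such a window. It is an illustration only: the function name context_window and the radius parameter are hypothetical, and how the service actually computes context (for example, whether it trims at word boundaries) is not documented here.

```python
def context_window(text: str, start: int, end: int, radius: int = 500) -> str:
    """Return text[start:end] plus up to `radius` characters on each side.

    Assumes `end` is exclusive and that offsets index characters of `text`.
    """
    lo = max(0, start - radius)          # clamp at the beginning of the text
    hi = min(len(text), end + radius)    # clamp at the end of the text
    return text[lo:hi]


# A span in the middle of a long document gets the full window on both sides.
long_doc = "x" * 600 + "Henry Wang" + "y" * 600
window = context_window(long_doc, 600, 610)

# A short document is returned whole, since the window covers all of it.
short_doc = "Henry Wang is a CS student."
whole = context_window(short_doc, 0, 10)
```

For short inputs the window covers the entire text, which is consistent with the basic example further down, where every context value equals the full input.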

Request

text
string
required
The input text to extract entities from. Can be a full document, web page content, article, or any plain text string.
entities
object
required
A mapping of entity type names to their definitions. Each value can be either:
  • A plain string: a description of what to look for
  • An object with optional description, threshold, and top_n fields; threshold and top_n override the global values for that entity type only
threshold
number
Global confidence threshold (0–1). Entities below this score are filtered out. Can be overridden per entity type.
top_n
number
default:0
Global limit on how many results to return per entity type, ranked by confidence descending. 0 returns all results above the threshold. Can be overridden per entity type.
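The request fields above can be assembled and sanity-checked on the client before sending. The following Python sketch is one way to do that; build_payload and _check_limits are hypothetical helper names, and the checks assume that threshold must lie in the 0–1 range and that top_n is a non-negative integer, which the field descriptions suggest but do not state as hard rules.

```python
from typing import Any, Dict, Mapping, Optional, Union

# An entity definition is either a plain description or an object with overrides.
EntitySpec = Union[str, Mapping[str, Any]]


def _check_limits(where: str, threshold: Optional[float], top_n: Optional[int]) -> None:
    """Reject values outside the ranges assumed in the lead-in."""
    if threshold is not None and not 0 <= threshold <= 1:
        raise ValueError(f"{where}: threshold must be between 0 and 1")
    if top_n is not None and (not isinstance(top_n, int) or top_n < 0):
        raise ValueError(f"{where}: top_n must be a non-negative integer")


def build_payload(
    text: str,
    entities: Mapping[str, EntitySpec],
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /extract, omitting unset global fields."""
    if not entities:
        # An empty entities map is listed as a 400 Bad Request cause.
        raise ValueError("entities must define at least one entity type")
    _check_limits("global", threshold, top_n)
    for name, spec in entities.items():
        if not isinstance(spec, str):
            _check_limits(name, spec.get("threshold"), spec.get("top_n"))
    payload: Dict[str, Any] = {"text": text, "entities": dict(entities)}
    if threshold is not None:
        payload["threshold"] = threshold
    if top_n is not None:
        payload["top_n"] = top_n
    return payload


payload = build_payload(
    "Henry Wang is a CS student.",
    {
        "Name": "Full name of the person",
        "School": {"description": "University name", "threshold": 0.7},
    },
    threshold=0.5,
)
```

Leaving top_n out of the body relies on the documented default of 0, which returns all results above the threshold.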

Response

entities
array
List of extracted entities, sorted by confidence descending within each type.
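Because the response is a flat list, client code often regroups it by entity type. The Python sketch below does that with a small dataclass mirroring the fields shown in the response schema; the Entity class and group_by_type function are hypothetical names, and the sample values are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Entity:
    text: str
    type: str
    confidence: float
    start: int
    end: int
    context: str


def group_by_type(response: Dict[str, Any]) -> Dict[str, List[Entity]]:
    """Group the flat entities list by type, keeping the order the API returned."""
    grouped: Dict[str, List[Entity]] = defaultdict(list)
    for item in response.get("entities", []):
        entity = Entity(
            text=item["text"],
            type=item["type"],
            confidence=item["confidence"],
            start=item["start"],
            end=item["end"],
            context=item["context"],
        )
        grouped[entity.type].append(entity)
    return dict(grouped)


# Illustrative response body (values are invented for this sketch).
sample = {
    "entities": [
        {"text": "Henry Wang", "type": "Name", "confidence": 0.99,
         "start": 0, "end": 10, "context": "Henry Wang met Ada Lovelace."},
        {"text": "Ada Lovelace", "type": "Name", "confidence": 0.95,
         "start": 15, "end": 27, "context": "Henry Wang met Ada Lovelace."},
    ]
}
by_type = group_by_type(sample)
best_name = by_type["Name"][0]  # first item is the highest-confidence match
```

Taking the first element per type as the best match relies on the documented ordering (confidence descending within each type).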

Error responses

Status | Meaning
400 Bad Request | Malformed request body, missing required fields, or empty entities map.
401 Unauthorized | Missing or invalid x-api-key.
429 Too Many Requests | Rate limit exceeded. Back off and retry.
500 Internal Server Error | Inference service unavailable.
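The "back off and retry" advice for 429 can be sketched as exponential backoff with jitter. This Python example is an assumption-laden illustration: post_with_retry, its parameters, and the delay schedule are hypothetical, and it does not assume the API sends a Retry-After header. The send argument stands in for whatever function performs the HTTP call and returns a (status, body) pair.

```python
import random
import time
from typing import Callable, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns something other than 429 or attempts run out."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Wait base_delay * 2^(attempt-1) seconds plus random jitter, then retry.
        sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
        status, body = send()
    return status, body


# Demonstration with a fake sender: rate-limited twice, then successful.
fake_responses = iter([(429, ""), (429, ""), (200, '{"entities": []}')])
waits = []
result = post_with_retry(lambda: next(fake_responses), sleep=waits.append)
```

Only 429 is retried here because it is the only status the table explicitly tells clients to retry; whether retrying a 500 is worthwhile is left to the caller.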

Authentication

Include your API key in every request using the x-api-key header.
-H "x-api-key: <your-api-key>"
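The same header can be set from Python's standard library. This is a minimal sketch: build_request is a hypothetical helper, and the URL is taken from the curl examples in this page.

```python
import json
import urllib.request

API_URL = "https://api.scaledown.xyz/extract"  # URL used in the curl examples


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Create a POST request carrying the JSON body and the x-api-key header."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
        method="POST",
    )


request = build_request(
    "<your-api-key>",
    {"text": "Henry Wang is a CS student.", "entities": {"Name": "Full name of the person"}},
)

# To actually send it (requires network access and a valid key):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```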

Examples

Basic extraction

curl -X POST https://api.scaledown.xyz/extract \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "text": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i.",
    "entities": {
      "Name": "Full name of the person",
      "Twitter": "Twitter or X handle",
      "Instagram": "Instagram username"
    }
  }'
Response:
{
  "entities": [
    {
      "text": "Henry Wang",
      "type": "Name",
      "confidence": 0.994,
      "start": 0,
      "end": 10,
      "context": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i."
    },
    {
      "text": "@henryw",
      "type": "Twitter",
      "confidence": 0.976,
      "start": 80,
      "end": 87,
      "context": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i."
    },
    {
      "text": "@b0i",
      "type": "Instagram",
      "confidence": 0.978,
      "start": 105,
      "end": 109,
      "context": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i."
    }
  ]
}

With per-entity overrides

Use per-entity threshold and top_n when different entity types need different precision, or when you only want the single best match for a given type.
curl -X POST https://api.scaledown.xyz/extract \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "text": "...",
    "entities": {
      "Name": {
        "description": "Full name of a person",
        "threshold": 0.3,
        "top_n": 1
      },
      "Company": {
        "description": "Company or organization name",
        "threshold": 0.7
      },
      "Email": "Email address"
    },
    "threshold": 0.5,
    "top_n": 5
  }'
In this example:
  • Name uses threshold 0.3 and returns at most 1 result
  • Company uses threshold 0.7 and returns up to 5 results (global top_n)
  • Email uses the global threshold 0.5 and returns up to 5 results
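The override rules in this example can be expressed as a small resolution step. The Python sketch below mirrors the documented behavior on the client side; effective_settings is a hypothetical helper, not part of the API, and it assumes the global top_n default of 0 described in the Request section.

```python
from typing import Any, Dict, Mapping, Optional, Union

EntitySpec = Union[str, Mapping[str, Any]]


def effective_settings(
    entities: Mapping[str, EntitySpec],
    threshold: Optional[float] = None,
    top_n: int = 0,
) -> Dict[str, Dict[str, Any]]:
    """Resolve per-entity threshold and top_n, falling back to the global values."""
    resolved = {}
    for name, spec in entities.items():
        # A plain-string definition carries no overrides.
        overrides = {} if isinstance(spec, str) else spec
        resolved[name] = {
            "threshold": overrides.get("threshold", threshold),
            "top_n": overrides.get("top_n", top_n),
        }
    return resolved


settings = effective_settings(
    {
        "Name": {"description": "Full name of a person", "threshold": 0.3, "top_n": 1},
        "Company": {"description": "Company or organization name", "threshold": 0.7},
        "Email": "Email address",
    },
    threshold=0.5,
    top_n=5,
)
# settings["Name"]    -> {"threshold": 0.3, "top_n": 1}
# settings["Company"] -> {"threshold": 0.7, "top_n": 5}
# settings["Email"]   -> {"threshold": 0.5, "top_n": 5}
```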

Notes

  • Results within each entity type are ranked by confidence descending before top_n is applied.
  • The context field is always derived from the original text input — it is not generated by the model.
  • Character offsets (start, end) refer to byte positions in the original text string.
  • There is no fixed limit on the number of entity types you can define in a single request.
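If start and end are byte positions, as the note above states, slicing a Python str directly will give wrong results for text containing non-ASCII characters. The sketch below shows one way to recover the span; it assumes UTF-8 encoding and an exclusive end offset, neither of which is confirmed by this page, and slice_by_byte_offsets is a hypothetical helper name.

```python
def slice_by_byte_offsets(text: str, start: int, end: int, encoding: str = "utf-8") -> str:
    """Return the span of `text` covered by byte offsets [start, end)."""
    return text.encode(encoding)[start:end].decode(encoding)


# "é" occupies two bytes in UTF-8, so byte and character offsets diverge after it.
source = "Café: Henry Wang"
by_bytes = slice_by_byte_offsets(source, 7, 17)
by_chars = source[7:17]  # character slicing with the same numbers is off by one
```

For pure-ASCII input the two approaches agree, so the difference only shows up once the text contains multi-byte characters.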