Extract - ScaleDown

Overview

The /extract endpoint runs Named Entity Recognition (NER) over a block of text. Unlike standard NER, you define the entity types you want in plain English - the model uses your descriptions to find matching spans, returning each one with a confidence score and surrounding context. Every result includes up to 500 characters of surrounding text on each side, so you can validate or use the extracted value without going back to the source.

Request

text

string

The input text to extract entities from. Can be a full document, web page content, article, or any plain text string. Either text or document must be provided. If both are given, the OCR text is appended after text.

document

string

A base64-encoded file to extract entities from. Supported formats: JPEG, PNG, TIFF, single-page PDF, multi-page PDF. The file is processed via AWS Textract OCR and the extracted text is used as input. Either text or document must be provided.

document_mime_type

string

MIME type of the document (e.g. "image/jpeg", "application/pdf"). Required when document is provided.

instruction

string

Optional global instruction prepended to the text before extraction. Use this to provide rules that apply across all entity types - for example, deduplication logic, ranking constraints, or output format requirements. This is separate from per-entity descriptions.

entities

object

required

A mapping of entity type names to their definition. Each value can be one of:

A plain string - a description of what to look for
An object - with optional description, threshold, and top_n fields
A nested object - defining a structured sub-schema (object with named fields)
An array of objects - defining a repeated structured schema (e.g. a list of line items)

Entity object fields

description

string

What this entity type represents. Used as the model’s search criteria.

threshold

number

Per-entity confidence cutoff (0–1). Overrides the global threshold for this type only.

top_n

number

Per-entity result limit. Overrides the global top_n for this type only.

threshold

number

Global confidence threshold (0–1). Entities below this score are filtered out. Can be overridden per entity type.

top_n

number

default:0

Global limit on how many results to return per entity type, ranked by confidence descending. 0 returns all results above the threshold. Can be overridden per entity type.
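
The request rules above can be checked on the client before a call is made. The following is a minimal sketch in Python; `build_extract_payload` is a hypothetical helper written for this page, not part of any ScaleDown SDK, and its checks only mirror the rules stated in this section.

```python
import json


def build_extract_payload(entities, text=None, document=None,
                          document_mime_type=None, instruction=None,
                          threshold=None, top_n=None):
    """Assemble a JSON-serializable body for POST /extract (hypothetical helper)."""
    if not entities:
        raise ValueError("entities is required")
    if text is None and document is None:
        raise ValueError("either text or document must be provided")
    if document is not None and document_mime_type is None:
        raise ValueError("document_mime_type is required when document is provided")

    payload = {"entities": entities}
    optional = {
        "text": text,
        "document": document,
        "document_mime_type": document_mime_type,
        "instruction": instruction,
        "threshold": threshold,
        "top_n": top_n,
    }
    # Only include the optional fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_extract_payload(
    entities={
        "Name": "Full name of the person",
        "Company": {"description": "Company or organization name", "threshold": 0.7},
    },
    text="Jane Smith works at Acme.",
    top_n=1,
)
print(json.dumps(payload, indent=2))
```

Leaving unset fields out of the body lets the service apply its own defaults rather than values guessed by the client.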

Response

entities

array

List of extracted scalar entities, sorted by confidence descending within each type. For nested or array entity types, values appear in structured_result instead.

Entity object

text

string

The exact text span extracted from the input.

type

string

The entity type name, matching a key from your entities request field.

confidence

number

Model confidence score between 0 and 1.

start

number

Character offset of the start of the entity in the input text.

end

number

Character offset of the end of the entity in the input text.

context

string

Up to 500 characters of surrounding text on each side of the entity. Prefixed/suffixed with ... when clipped.

structured_result

object | null

Present when any entity in the request uses a nested object or array schema. Contains the full structured extraction result keyed by entity name. Scalar fields from the same request also appear here alongside their nested counterparts. null for flat extraction requests.

ocr_text

string | null

The raw text extracted from the document via OCR. null if no document was provided.
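
A response with the fields above might be consumed as in the sketch below. The helper names are hypothetical, the sample response is hand-written from the field descriptions, and `span_matches` assumes that `end` is exclusive and that offsets index characters, which holds for ASCII input like this sample.

```python
from collections import defaultdict


def group_by_type(response):
    """Map each entity type to its extracted spans, preserving response order."""
    grouped = defaultdict(list)
    for entity in response.get("entities", []):
        grouped[entity["type"]].append(entity)
    return dict(grouped)


def span_matches(source_text, entity):
    """Check that source_text[start:end] reproduces the entity text.

    Assumes an exclusive end offset and character-based indexing.
    """
    return source_text[entity["start"]:entity["end"]] == entity["text"]


source = "Ship to: Jane Smith, 42 Maple Street, Springfield, IL 62701."
response = {
    "entities": [
        {"text": "Jane Smith", "type": "recipient", "confidence": 1.0,
         "start": 9, "end": 19, "context": source}
    ],
    "structured_result": None,
    "ocr_text": None,
}

grouped = group_by_type(response)
print(sorted(grouped))                                # ['recipient']
print(span_matches(source, response["entities"][0]))  # True
```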

Error responses

Status	Meaning
`422 Unprocessable Entity`	Malformed request body, neither `text` nor `document` provided, or OCR failed.
`500 Internal Server Error`	Inference service unavailable.
`504 Gateway Timeout`	Extraction request timed out.
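
One way a client could act on these statuses is sketched below. The function name is hypothetical, and the retry flags are an assumption of this sketch (server-side failures retried, request errors not), not documented behavior.

```python
def classify_extract_error(status_code):
    """Return (meaning, retryable) for an HTTP status from /extract.

    The retryable flag is this sketch's assumption, not documented behavior.
    """
    if 200 <= status_code < 300:
        return ("success", False)
    table = {
        422: ("malformed body, missing text/document, or OCR failure", False),
        500: ("inference service unavailable", True),
        504: ("extraction request timed out", True),
    }
    return table.get(status_code, ("unexpected status", False))


print(classify_extract_error(504))  # ('extraction request timed out', True)
print(classify_extract_error(422))
```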

Authentication

Include your API key in every request using the x-api-key header.

-H "x-api-key: <your-api-key>"
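
The same header can be set from Python. This sketch uses only the standard library and builds the request without sending it; `make_request` and the `SCALEDOWN_API_KEY` environment variable are names chosen for illustration.

```python
import json
import os
import urllib.request

API_URL = "https://api.scaledown.xyz/extract"


def make_request(payload, api_key):
    """Build (but do not send) an authenticated POST request for /extract."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# SCALEDOWN_API_KEY is a hypothetical variable name chosen for this sketch.
api_key = os.environ.get("SCALEDOWN_API_KEY", "<your-api-key>")
req = make_request({"text": "Hello", "entities": {"Name": "Full name"}}, api_key)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=30)
```

Reading the key from the environment keeps it out of source files and shell history.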

Examples

Basic extraction

curl -X POST https://api.scaledown.xyz/extract \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "text": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i.",
    "entities": {
      "Name": "Full name of the person",
      "Twitter": "Twitter or X handle",
      "Instagram": "Instagram username"
    }
  }'

Response:

{
  "entities": [
    {
      "text": "Henry Wang",
      "type": "Name",
      "confidence": 0.994,
      "start": 0,
      "end": 10,
      "context": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i."
    },
    {
      "text": "@henryw",
      "type": "Twitter",
      "confidence": 0.976,
      "start": 80,
      "end": 87,
      "context": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i."
    },
    {
      "text": "@b0i",
      "type": "Instagram",
      "confidence": 0.978,
      "start": 105,
      "end": 109,
      "context": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i."
    }
  ]
}

Extracting from a document

Pass a base64-encoded image or PDF in the document field. OCR is performed automatically and the extracted text is used as the input. The raw OCR output is returned as ocr_text.

# macOS syntax; with GNU coreutils (Linux) use: base64 -w 0 contract.pdf
DOCUMENT=$(base64 -b 0 -i contract.pdf)

curl -X POST https://api.scaledown.xyz/extract \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "document": "'"$DOCUMENT"'",
    "document_mime_type": "application/pdf",
    "entities": {
      "party_name": "Name of a party to the contract",
      "effective_date": "The date the contract takes effect",
      "governing_law": "The jurisdiction or governing law clause"
    }
  }'
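
The encoding step can also be done in Python. In this sketch `document_fields` is a hypothetical helper, the MIME type is guessed from the file extension (an assumption that the extension is accurate), and the sample file holds placeholder bytes rather than a real contract.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def document_fields(path):
    """Return the document and document_mime_type fields for a local file."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None:
        raise ValueError(f"cannot infer MIME type for {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"document": encoded, "document_mime_type": mime_type}


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "contract.pdf"
    sample.write_bytes(b"%PDF-1.4 placeholder bytes")  # stand-in content only
    fields = document_fields(sample)

body = {
    **fields,
    "entities": {"party_name": "Name of a party to the contract"},
}
print(body["document_mime_type"])  # application/pdf
```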

Structured (nested) extraction

For more complex documents, you can define nested schemas to extract structured objects or arrays of objects. Use a nested object to extract a single structured group of fields, or an array of objects to extract a repeated structure such as invoice line items. The full structured output is returned in the structured_result field. Scalar fields in the same request are also included there, alongside any nested values.

Nested object example - extract a single structured address:

curl -X POST https://api.scaledown.xyz/extract \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "text": "Ship to: Jane Smith, 42 Maple Street, Springfield, IL 62701.",
    "entities": {
      "recipient": "The full name of the recipient",
      "address": {
        "street": "Street address including number",
        "city": "City name",
        "state": "Two-letter state code",
        "zip": "ZIP or postal code"
      }
    }
  }'

Response:

{
  "entities": [
    {
      "text": "Jane Smith",
      "type": "recipient",
      "confidence": 1.0,
      "start": 9,
      "end": 19,
      "context": "Ship to: Jane Smith, 42 Maple Street, Springfield, IL 62701."
    }
  ],
  "structured_result": {
    "recipient": "Jane Smith",
    "address": {
      "street": "42 Maple Street",
      "city": "Springfield",
      "state": "IL",
      "zip": "62701"
    }
  },
  "ocr_text": null
}
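
When a structured_result like the one above has to go into a flat store (a CSV row, a key-value table), it can be flattened into dotted paths. The `flatten` function below is a hypothetical client-side helper and the path syntax is this sketch's own choice.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {path: leaf_value} pairs."""
    if isinstance(value, dict):
        items = ((f"{prefix}.{k}" if prefix else k, v) for k, v in value.items())
    elif isinstance(value, list):
        items = ((f"{prefix}[{i}]", v) for i, v in enumerate(value))
    else:
        return {prefix: value}
    flat = {}
    for path, child in items:
        flat.update(flatten(child, path))
    return flat


structured_result = {
    "recipient": "Jane Smith",
    "address": {
        "street": "42 Maple Street",
        "city": "Springfield",
        "state": "IL",
        "zip": "62701",
    },
}

for path, leaf in flatten(structured_result).items():
    print(f"{path} = {leaf}")
```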

Array schema example - extract invoice line items:

curl -X POST https://api.scaledown.xyz/extract \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "text": "Invoice: 1x Widget A @ $10.00, 3x Widget B @ $5.00, 1x Shipping @ $8.50",
    "entities": {
      "vendor": "The name of the vendor or supplier",
      "line_items": [
        {
          "description": "Description or name of the line item",
          "quantity": "Quantity ordered",
          "unit_price": "Price per unit"
        }
      ]
    }
  }'

Response:

{
  "entities": [],
  "structured_result": {
    "vendor": null,
    "line_items": [
      { "description": "Widget A", "quantity": "1", "unit_price": "$10.00" },
      { "description": "Widget B", "quantity": "3", "unit_price": "$5.00" },
      { "description": "Shipping", "quantity": "1", "unit_price": "$8.50" }
    ]
  },
  "ocr_text": null
}

When using array schemas, array fields appear only in structured_result. The entities array contains only scalar fields from the same request that could be matched to a span in the text.
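
Values in the line_items array come back as strings in the response above, so any arithmetic needs a parsing step. A minimal sketch, assuming US-style prices with a leading dollar sign as in this example; `line_total` is a hypothetical helper.

```python
from decimal import Decimal


def line_total(item):
    """Multiply quantity by unit price for one extracted line item."""
    quantity = Decimal(item["quantity"])
    # Strip the currency symbol and thousands separators before parsing.
    unit_price = Decimal(item["unit_price"].replace("$", "").replace(",", ""))
    return quantity * unit_price


line_items = [
    {"description": "Widget A", "quantity": "1", "unit_price": "$10.00"},
    {"description": "Widget B", "quantity": "3", "unit_price": "$5.00"},
    {"description": "Shipping", "quantity": "1", "unit_price": "$8.50"},
]

total = sum((line_total(item) for item in line_items), Decimal("0"))
print(total)  # 33.50
```

Decimal is used instead of float so that currency amounts are not subject to binary rounding.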

With per-entity overrides

Use per-entity threshold and top_n when different entity types need different precision, or when you only want the single best match for a given type.

curl -X POST https://api.scaledown.xyz/extract \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "text": "...",
    "entities": {
      "Name": {
        "description": "Full name of a person",
        "threshold": 0.3,
        "top_n": 1
      },
      "Company": {
        "description": "Company or organization name",
        "threshold": 0.7
      },
      "Email": "Email address"
    },
    "threshold": 0.5,
    "top_n": 5
  }'

In this example:

Name uses threshold 0.3 and returns at most 1 result
Company uses threshold 0.7 and returns up to 5 results (global top_n)
Email uses the global threshold 0.5 and returns up to 5 results
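
The override rules listed above can be written out as a small resolver. This is a client-side illustration of the documented semantics, not the service's implementation; both function names are hypothetical, the 0.5 fallback is the default listed in the Body section of this page, and nested or array schemas are not handled.

```python
def effective_settings(payload):
    """Resolve the threshold and top_n that apply to each scalar entity type."""
    global_threshold = payload.get("threshold", 0.5)  # default per the Body section
    global_top_n = payload.get("top_n", 0)
    resolved = {}
    for name, spec in payload["entities"].items():
        # Plain-string definitions carry no overrides.
        overrides = spec if isinstance(spec, dict) else {}
        resolved[name] = {
            "threshold": overrides.get("threshold", global_threshold),
            "top_n": overrides.get("top_n", global_top_n),
        }
    return resolved


def apply_settings(candidates, threshold, top_n):
    """Drop candidates below threshold, rank by confidence, then cut to top_n."""
    kept = [c for c in candidates if c["confidence"] >= threshold]
    kept.sort(key=lambda c: c["confidence"], reverse=True)
    return kept[:top_n] if top_n else kept  # top_n of 0 means no limit


request_body = {
    "text": "...",
    "entities": {
        "Name": {"description": "Full name of a person", "threshold": 0.3, "top_n": 1},
        "Company": {"description": "Company or organization name", "threshold": 0.7},
        "Email": "Email address",
    },
    "threshold": 0.5,
    "top_n": 5,
}

for name, settings in effective_settings(request_body).items():
    print(name, settings)
```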

Writing good entity labels

The entity name and description are both used as part of the model’s search criteria - wording them well is the biggest lever you have on extraction quality.

Use lowercase or Title Case. The model was trained with lowercase labels. Keeping your entity names lowercase (e.g. person, company) or Title Case (e.g. Person, Company) produces better results than ALL_CAPS or other conventions.

Be specific with names, and test synonyms. The entity name itself influences what the model looks for. person and full name will find slightly different things. If results are missing or noisy, try rephrasing the name - person name, individual, or full name may all behave differently on your data.

Labels can be descriptive phrases, not just single words. Instead of city, use capital city and population center. The extra context helps the model distinguish between entity types that might otherwise overlap.

Descriptions can be full instructions. Rather than "Name of the person", write "Find the first and last name of the person mentioned in the text". Instruction-style descriptions consistently outperform short noun phrases on complex or ambiguous entities.

Avoid mixing overlapping granularities in the same call. If you include both location and city, the model has to decide which label to assign to a city - and will often split results unpredictably between them. Pick one level of granularity per concept.

Examples:

Instead of	Use
`CITY`	`city` or `City`
`city` + `location` in the same call	just `city` or just `location`
`"Name"`	`"Find the first and last name of the person in the text"`
`"city"` (when you want capitals specifically)	`"capital city and population center"`
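
The casing guideline is easy to check mechanically before a request goes out. The sketch below covers only that one rule; `lint_entity_names` is a hypothetical helper, and renaming a key also changes the type value returned in the response.

```python
def lint_entity_names(entities):
    """Return one warning per entity name written in ALL_CAPS."""
    warnings = []
    for name in entities:
        # str.isupper() is True only when every cased character is uppercase.
        if name.isupper():
            warnings.append(f"{name!r}: prefer {name.lower()!r} or {name.title()!r}")
    return warnings


entities = {
    "CITY": "City name",
    "person": "Find the first and last name of the person in the text",
}

for warning in lint_entity_names(entities):
    print(warning)  # 'CITY': prefer 'city' or 'City'
```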

Notes

Results within each entity type are ranked by confidence descending before top_n is applied.
The context field is always derived from the original text input - it is not generated by the model.
Character offsets (start, end) refer to byte positions in the original text string.
There is no fixed limit on the number of entity types you can define in a single request.
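
Character offsets and byte positions coincide for ASCII input but diverge as soon as the text contains multi-byte characters, so it is worth confirming on your own data which one start and end follow. The sketch below shows the difference, assuming UTF-8 encoding (an assumption, not something this page states); both helper names are hypothetical.

```python
def char_to_byte_span(text, start, end):
    """Convert a character span to the equivalent UTF-8 byte span."""
    byte_start = len(text[:start].encode("utf-8"))
    byte_end = byte_start + len(text[start:end].encode("utf-8"))
    return byte_start, byte_end


def slice_by_bytes(text, byte_start, byte_end):
    """Slice text by UTF-8 byte offsets (raises if a cut splits a character)."""
    return text.encode("utf-8")[byte_start:byte_end].decode("utf-8")


text = "Café owner: José Díaz"
char_start = text.index("José Díaz")
char_end = char_start + len("José Díaz")
byte_span = char_to_byte_span(text, char_start, char_end)

print((char_start, char_end))            # (12, 21)
print(byte_span)                         # (13, 24)
print(slice_by_bytes(text, *byte_span))  # José Díaz
```

If text[start:end] does not reproduce an entity's text field on non-ASCII input, trying the byte-based slice is a quick way to tell the two conventions apart.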

Authorizations

x-api-key

string

header

required

Body

application/json

text

string

required

The input text to extract entities from.

entities

object

required

A mapping of entity type names to their definition.

threshold

number

default:0.5

Global confidence threshold (0–1).

top_n

number

default:0

Global limit on results per entity type. 0 returns all above threshold.

Response

Successful extraction

entities

object[]
