Extract is currently in private preview. Access is available by invitation. To request early access or get onboarded, contact us.
What it does
The /extract endpoint runs Named Entity Recognition (NER) over arbitrary text, but instead of a fixed set of entity types (person, place, organization), you define exactly what you’re looking for in plain English. The model interprets your descriptions and finds matching spans in the text, returning each one with a confidence score, character offsets, and surrounding context.
This makes it practical to extract domain-specific entities that generic NER models don’t support: contract parties, product SKUs, medical terms, financial figures, custom identifiers, or anything else you can describe.
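As a rough illustration of the description-driven approach, the sketch below pairs plain-English entity definitions with a record type for a single result. The `entity_definitions` mapping, the `ExtractedSpan` class, and all field names are assumptions made for this example, as is the end-exclusive offset convention; the actual request and response schema in the private preview may differ.

```python
from dataclasses import dataclass

# Hypothetical entity definitions: a label mapped to a plain-English description.
entity_definitions = {
    "contract_party": "A company or person named as a party to the agreement",
    "effective_date": "The date on which the agreement takes effect",
}


@dataclass
class ExtractedSpan:
    """One extraction result. Field names are assumed, not the documented schema."""
    entity: str        # which definition this span matched
    text: str          # the matched text
    confidence: float  # score between 0 and 1 (assumed range)
    start: int         # character offset where the span begins
    end: int           # character offset just past the span (assumed end-exclusive)
    context: str       # surrounding text returned with the span


source = "This Agreement is made between Acme Corp and Jane Doe, effective 1 March 2024."

# A hand-written example result for illustration, not real API output.
span = ExtractedSpan(
    entity="contract_party",
    text="Acme Corp",
    confidence=0.97,
    start=source.index("Acme Corp"),
    end=source.index("Acme Corp") + len("Acme Corp"),
    context=source,
)

# With end-exclusive offsets, slicing the source recovers the matched text.
assert source[span.start:span.end] == span.text
```

Checking that the offsets slice back to the matched text is a cheap sanity test to run before storing results, whatever the real schema turns out to be.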
When to use it
You need structured data from unstructured text. If your pipeline involves parsing documents, emails, web pages, or transcripts to pull out specific values, Extract replaces brittle regex patterns and rigid NLP pipelines with a flexible, description-driven approach.
Your entity types don’t fit standard NER. Off-the-shelf NER handles names, locations, and organizations. Extract handles whatever you define: lease terms, part numbers, clinical findings, competitor mentions, or any other domain-specific concept.
You need confidence scores. Extract returns a confidence score for every result. You can set per-entity thresholds to filter out low-confidence hits, or use the scores to route extractions to a human review queue.
You want context alongside every result. Every extracted span comes with 500 characters of surrounding text on each side, so you can verify the extraction in context without going back to the source document.
Common use cases
| Use case | Example entities |
|---|---|
| Contract analysis | Parties, effective dates, termination clauses, governing law |
| Resume parsing | Name, email, skills, companies, job titles, education |
| Medical records | Diagnoses, medications, dosages, lab values, dates of service |
| Financial documents | Revenue figures, dates, entities, regulatory references |
| Customer support tickets | Product names, error codes, account identifiers |
| News / media monitoring | People, organizations, locations, quoted statements |
| E-commerce | Product names, prices, SKUs, brands, specifications |
How it fits into your workflow
Extract sits between your document ingestion step and whatever system consumes structured data: a database, a downstream process, or a human review interface. Each result includes character offsets (start, end), so you can highlight or link back to the exact location in the source document.
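The character offsets lend themselves to marking spans directly in the source text. The sketch below is one way to do that, assuming offsets are end-exclusive and spans do not overlap; the `highlight` helper and the `[[` / `]]` markers are hypothetical names chosen for this example, and the offsets are written by hand rather than taken from real output.

```python
def highlight(text, spans, open_mark="[[", close_mark="]]"):
    """Wrap each (start, end) span in markers.

    Assumes end-exclusive offsets and non-overlapping spans.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        pieces.append(text[cursor:start])                        # text before the span
        pieces.append(open_mark + text[start:end] + close_mark)  # the marked span
        cursor = end
    pieces.append(text[cursor:])                                 # text after the last span
    return "".join(pieces)


source = "Order SKU-1042 shipped to Berlin."
spans = [(26, 32), (6, 14)]  # hand-written offsets, given out of order on purpose

print(highlight(source, spans))
# prints: Order [[SKU-1042]] shipped to [[Berlin]].
```

Sorting the spans first means results can be passed in whatever order they arrive. Swapping the markers for HTML tags would give a simple review view.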
Compared to traditional NER
| | Traditional NER | ScaleDown Extract |
|---|---|---|
| Entity types | Fixed (person, place, org, …) | You define them in plain English |
| Domain specificity | General-purpose | Works for any domain |
| Confidence scores | Varies by model | Always included |
| Per-entity thresholds | Not typically supported | Supported |
| Context window | Usually sentence-level | Full document |
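Per-entity thresholds and review routing can also be approximated on the client side once results are in hand. The sketch below is a minimal version of that idea, not the endpoint's own thresholding: the `route` function, the `DEFAULT_THRESHOLD` fallback, the `review_margin` band, and the result field names are all assumptions made for this example, and the sample results are hand-written.

```python
DEFAULT_THRESHOLD = 0.5  # assumed fallback for entity types without their own threshold


def route(results, thresholds, review_margin=0.15):
    """Split results into accepted, needs-review, and rejected lists.

    A result is accepted when its confidence meets the threshold for its
    entity type, sent to review when it falls within review_margin below
    that threshold, and rejected otherwise.
    """
    accepted, review, rejected = [], [], []
    for r in results:
        threshold = thresholds.get(r["entity"], DEFAULT_THRESHOLD)
        if r["confidence"] >= threshold:
            accepted.append(r)
        elif r["confidence"] >= threshold - review_margin:
            review.append(r)
        else:
            rejected.append(r)
    return accepted, review, rejected


# Hand-written example results; field names are assumed, not the documented schema.
results = [
    {"entity": "medication", "text": "metformin", "confidence": 0.93},
    {"entity": "dosage", "text": "500 mg", "confidence": 0.82},
    {"entity": "diagnosis", "text": "type 2 diabetes", "confidence": 0.20},
]
thresholds = {"medication": 0.9, "dosage": 0.9}  # stricter for clinical values

accepted, review, rejected = route(results, thresholds)
print([r["text"] for r in accepted])  # prints: ['metformin']
print([r["text"] for r in review])    # prints: ['500 mg']
print([r["text"] for r in rejected])  # prints: ['type 2 diabetes']
```

A three-way split keeps borderline hits visible to a reviewer instead of silently dropping them. The width of the review band is a tuning choice that depends on how costly a missed entity is in your domain.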