Extract is currently in private preview. Access is available by invitation. To request early access or get onboarded, contact us.

What it does

The /extract endpoint runs Named Entity Recognition (NER) over arbitrary text — but instead of a fixed set of entity types (person, place, organization), you define exactly what you’re looking for in plain English. The model interprets your descriptions and finds matching spans in the text, returning each one with a confidence score, character offsets, and surrounding context. This makes it practical to extract domain-specific entities that generic NER models don’t support: contract parties, product SKUs, medical terms, financial figures, custom identifiers, or anything else you can describe.
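As a rough illustration of the description-driven approach, the sketch below assembles a request body that pairs each entity label with a plain-English description. The field names (`text`, `entities`, `name`, `description`) and the helper `build_extract_request` are assumptions made for this example, not the documented request schema; check the API reference you receive during onboarding for the actual format.

```python
import json


def build_extract_request(text, entity_descriptions):
    """Build a hypothetical /extract request body.

    entity_descriptions maps an entity label to a plain-English description.
    Every field name below is an illustrative assumption, not the real schema.
    """
    return {
        "text": text,
        "entities": [
            {"name": name, "description": description}
            for name, description in entity_descriptions.items()
        ],
    }


payload = build_extract_request(
    "This agreement is between Acme Corp and Globex LLC, effective 1 March 2024.",
    {
        "contract_party": "A company or person named as a party to the agreement",
        "effective_date": "The date on which the agreement takes effect",
    },
)

# Serialize the body; sending it to the endpoint is left out on purpose.
print(json.dumps(payload, indent=2))
```

The point of the sketch is only that entity types are described in free text rather than chosen from a fixed list.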

When to use it

You need structured data from unstructured text. If your pipeline involves parsing documents, emails, web pages, or transcripts to pull out specific values, Extract replaces brittle regex patterns and rigid NLP pipelines with a flexible, description-driven approach.

Your entity types don’t fit standard NER. Off-the-shelf NER handles names, locations, and organizations. Extract handles whatever you define: lease terms, part numbers, clinical findings, competitor mentions, or any other domain-specific concept.

You need confidence scores. Extract returns a confidence score for every result. You can set per-entity thresholds to filter out low-confidence hits, or use the scores to route extractions to a human review queue.

You want context alongside every result. Every extracted span comes with 500 characters of surrounding text on each side, so you can verify the extraction in context without going back to the source document.
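The threshold-and-review pattern can be sketched on the client side as follows. This is a minimal example under stated assumptions: the entity dictionaries and their keys (`type`, `text`, `confidence`), the threshold values, and the helper `route_entities` are all hypothetical and do not reflect the actual response schema or the endpoint's built-in threshold option.

```python
def route_entities(entities, thresholds, default_threshold=0.5):
    """Split entities into accepted results and a human review queue.

    thresholds maps an entity type to its minimum confidence; types without
    an entry fall back to default_threshold. All key names are assumptions.
    """
    accepted, review = [], []
    for entity in entities:
        threshold = thresholds.get(entity["type"], default_threshold)
        if entity["confidence"] >= threshold:
            accepted.append(entity)
        else:
            review.append(entity)
    return accepted, review


# Made-up sample results, shaped the way this sketch assumes.
entities = [
    {"type": "diagnosis", "text": "type 2 diabetes", "confidence": 0.93},
    {"type": "medication", "text": "metformin", "confidence": 0.71},
    {"type": "dosage", "text": "500 mg", "confidence": 0.64},
]
thresholds = {"diagnosis": 0.9, "medication": 0.8}

accepted, review = route_entities(entities, thresholds)
print([e["type"] for e in accepted])  # ['diagnosis', 'dosage']
print([e["type"] for e in review])    # ['medication']
```

Keeping thresholds per entity type lets a high-stakes type such as medication demand more certainty than a low-stakes one, while everything below the bar goes to review instead of being discarded.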

Common use cases

| Use case | Example entities |
| --- | --- |
| Contract analysis | Parties, effective dates, termination clauses, governing law |
| Resume parsing | Name, email, skills, companies, job titles, education |
| Medical records | Diagnoses, medications, dosages, lab values, dates of service |
| Financial documents | Revenue figures, dates, entities, regulatory references |
| Customer support tickets | Product names, error codes, account identifiers |
| News / media monitoring | People, organizations, locations, quoted statements |
| E-commerce | Product names, prices, SKUs, brands, specifications |

How it fits into your workflow

Extract sits between your document ingestion step and whatever system consumes structured data — a database, a downstream process, or a human review interface.
[Ingest document] → [POST /extract] → [Structured entity list] → [Database / downstream system]
Each entity in the response includes character offsets (start, end), so you can highlight or link back to the exact location in the source document.
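A minimal sketch of using the offsets to mark spans in the source text is shown below. It assumes each entity is a dictionary with `start` and `end` keys, that `end` is exclusive (as in Python slicing), and that spans do not overlap; none of these details are confirmed by this page, and the helper `highlight` is hypothetical.

```python
def highlight(source, entities, open_mark="[[", close_mark="]]"):
    """Wrap each entity span in markers using its character offsets.

    Assumes 'end' is exclusive and spans do not overlap. Spans are applied
    from last to first so earlier offsets remain valid after each insertion.
    """
    result = source
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, end = entity["start"], entity["end"]
        result = result[:start] + open_mark + result[start:end] + close_mark + result[end:]
    return result


source = "Invoice INV-1042 is due on 2024-05-01."

# Offsets are computed with str.find here only to keep the example
# self-checking; in practice they would come from the response.
entities = []
for value in ("INV-1042", "2024-05-01"):
    start = source.find(value)
    entities.append({"start": start, "end": start + len(value)})

highlighted = highlight(source, entities)
print(highlighted)  # Invoice [[INV-1042]] is due on [[2024-05-01]].
```

The same loop works for HTML highlighting by passing tags such as `<mark>` and `</mark>` as the markers, provided the source text is escaped first.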

Compared to traditional NER

| Feature | Traditional NER | ScaleDown Extract |
| --- | --- | --- |
| Entity types | Fixed (person, place, org, …) | You define them in plain English |
| Domain specificity | General-purpose | Works for any domain |
| Confidence scores | Varies by model | Always included |
| Per-entity thresholds | Not typically supported | Supported |
| Context window | Usually sentence-level | Full document |