Summarize is currently in private preview. Access is available by invitation. To request early access or get onboarded, contact us.

What it does

The /summarization/abstractive endpoint condenses text into a shorter, fluent rewrite in the model’s own words. Unlike extractive summarization — which lifts sentences directly from the source — abstractive summarization produces a coherent output that captures the key information without being constrained to the original phrasing.

When to use it

You need to condense long documents for humans or downstream models. Summaries cut reading time for human reviewers and reduce token costs when document content is passed to another model.

You need format or focus control. The instructions field lets you request bullet points, a particular output language, a word limit, or a topical focus without overriding the core faithful-summary behaviour.

You want consistent, low-noise outputs. Sampling parameters (temperature, top-p) are fixed to a configuration tuned for faithful summarization. Outputs are therefore stable and predictable rather than creative, which reduces the risk of hallucinated content.
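A minimal request sketch using only the Python standard library. Only the POST method, the /summarization/abstractive path, and the instructions field come from this page. The base URL, the "text" body field, and Bearer-token authentication are hypothetical placeholders, so check them against the API reference you receive with preview access. The function builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical host: this page does not give the real base URL.
BASE_URL = "https://api.example.com"


def build_summarize_request(
    text: str,
    instructions: Optional[str] = None,
    api_key: str = "YOUR_API_KEY",  # hypothetical auth scheme
) -> urllib.request.Request:
    """Build (but do not send) a POST to /summarization/abstractive."""
    payload = {"text": text}  # "text" is an assumed field name
    if instructions:
        # Documented field: format or focus control (bullets, language, word limit, topic).
        payload["instructions"] = instructions
    return urllib.request.Request(
        url=f"{BASE_URL}/summarization/abstractive",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed Bearer-token auth
        },
        method="POST",
    )


req = build_summarize_request(
    "Long document text goes here.",
    instructions="Use at most 3 bullet points and focus on financial figures.",
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Leaving instructions unset omits the field entirely, so the request falls back to the default summary behaviour.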

Common use cases

Use case | Example
--- | ---
Document review | Summarize legal contracts, research papers, or policy documents
News digests | Condense articles to one-paragraph or bullet-point briefs
Customer feedback | Summarize support tickets, reviews, or survey responses at scale
Meeting transcripts | Generate action-item-focused summaries from call recordings
Financial reporting | Extract key figures and decisions from earnings calls or filings
Content pipelines | Pre-process long articles before passing them to downstream models
RAG pre-processing | Summarize retrieved chunks to reduce token overhead before model calls

How it fits into your workflow

Summarize works as a standalone step that can sit anywhere documents are consumed — before storing, before displaying, or before passing to another model.
[Ingest document] → [POST /summarization/abstractive] → [Summary] → [Store / display / pass to model]
The response includes input_chars and output_chars so you can track compression ratios across your pipeline, and latency_ms for performance monitoring.
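The sketch below aggregates these fields across a batch of responses. It assumes each response is a JSON object with top-level input_chars, output_chars, and latency_ms keys (the "summary" key is hypothetical). It defines compression ratio as output_chars / input_chars, so smaller values mean more aggressive condensing; that definition is a choice made here, not one stated by the API. The sample numbers are invented for illustration.

```python
import json
from statistics import mean


def compression_ratio(response: dict) -> float:
    """Fraction of the input length kept in the summary (output_chars / input_chars)."""
    if response["input_chars"] == 0:
        raise ValueError("input_chars is zero; ratio is undefined")
    return response["output_chars"] / response["input_chars"]


def pipeline_stats(responses: list[dict]) -> dict:
    """Summarize compression and latency over a batch of parsed responses."""
    ratios = [compression_ratio(r) for r in responses]
    latencies = [r["latency_ms"] for r in responses]
    return {
        "documents": len(responses),
        "mean_compression": mean(ratios),
        "max_latency_ms": max(latencies),
        "mean_latency_ms": mean(latencies),
    }


# Illustrative response bodies with made-up numbers, not real API output.
raw = [
    '{"summary": "Short summary.", "input_chars": 12000, "output_chars": 900, "latency_ms": 850}',
    '{"summary": "Short summary.", "input_chars": 4000, "output_chars": 600, "latency_ms": 420}',
]
stats = pipeline_stats([json.loads(body) for body in raw])
print(stats)
```

In a live pipeline you might append each parsed response as documents flow through and alert when mean_compression or max_latency_ms drifts from your baseline.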

Abstractive vs. extractive summarization

Aspect | Extractive | Abstractive (this endpoint)
--- | --- | ---
How it works | Selects and stitches existing sentences | Rewrites content in the model’s own words
Output quality | Can feel disjointed | Fluent and coherent
Faithfulness | Directly tied to source | Validated; off-task outputs are rejected
Flexibility | Limited | Controllable via instructions
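To make the "selects and stitches existing sentences" row concrete, here is a toy frequency-scoring extractive summarizer. It illustrates extractive summarization in general; it is not how any particular extractive product works and not what this endpoint does. Note that every sentence it returns is copied verbatim from the source.

```python
import re
from collections import Counter


def _words(sentence: str) -> list[str]:
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"[a-z']+", sentence.lower())


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Toy frequency-based extractive summarizer (illustration only).

    Scores each sentence by the summed document frequency of its words and
    returns the k highest-scoring sentences verbatim, in source order.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in _words(s))
    scores = [sum(freq[w] for w in _words(s)) for s in sentences]
    # Indices of the top-k sentences, then restored to their original order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


source = (
    "The contract renews annually. The renewal fee is fixed for two years. "
    "Either party may terminate the contract with notice. Notices must be written."
)
for sentence in extractive_summary(source, k=2):
    print(sentence)
```

Because scores are raw sums, this toy is biased toward long sentences and common words such as "the"; practical extractive methods usually add length normalization and stopword filtering. An abstractive summary of the same text could instead merge the renewal and termination terms into one rewritten sentence.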