Compress a prompt and context to reduce token usage while preserving semantic meaning.
The /compress/raw/ endpoint compresses your prompt and context, reducing token count while maintaining the semantic integrity needed for high-quality AI responses. Compression ratios of 50–70% are typical, with no meaningful degradation in downstream model output quality.
| Status | Meaning |
|---|---|
| 400 Bad Request | Malformed request body or missing required fields. |
| 401 Unauthorized | Missing or invalid x-api-key. |
| 429 Too Many Requests | Rate limit exceeded. Back off and retry. |
| 500 Internal Server Error | Compression service unavailable. |
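
The "back off and retry" advice for 429 can be sketched as a small helper. This is a minimal illustration, not the service's official client: `call_with_backoff`, `CompressError`, the exponential delay schedule, and the assumption that any 2xx status means success are all hypothetical choices made for the example. The `send` callable stands in for whatever HTTP call you use and is assumed to return a `(status, body)` pair.

```python
import time


class CompressError(Exception):
    """Raised for any response the helper does not retry (hypothetical name)."""

    def __init__(self, status, body):
        super().__init__(f"compression request failed with HTTP {status}")
        self.status = status
        self.body = body


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, retrying only on 429.

    send() is assumed to return (status_code, body).
    Delays grow as base_delay * 2**attempt: 1s, 2s, 4s, ...
    """
    for attempt in range(max_attempts):
        status, body = send()
        if 200 <= status < 300:  # assumption: 2xx signals success
            return body
        if status == 429 and attempt < max_attempts - 1:
            # Rate limited: wait, then try again.
            sleep(base_delay * (2 ** attempt))
            continue
        # 400, 401, 500, or 429 on the final attempt: give up.
        raise CompressError(status, body)
    raise ValueError("max_attempts must be at least 1")
```

Only 429 is retried here because it is the only status the table pairs with a retry instruction. Whether a 500 is worth retrying depends on how the service behaves in practice, which the table does not say.
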
Authenticate each request with the x-api-key header.
Use a fixed rate when you need a guaranteed token budget.
The "auto" rate is recommended for most use cases. Fixed rates below 0.3 may noticeably affect output quality on dense technical content.
The compressed_prompt field is a single string. Pass it as the full prompt to your downstream model, replacing both context and prompt.
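
To make the request and response handling concrete, here is a rough sketch using only the Python standard library. Several details are assumptions rather than documented facts: the host in `BASE_URL` is a placeholder, the POST method and JSON body shape are guesses, and the helper names `build_compress_request` and `downstream_prompt` are invented for this example. The field names `context`, `prompt`, `rate`, and `compressed_prompt` and the `x-api-key` header come from the text above.

```python
import json
import urllib.request

# Placeholder host: substitute the real base URL of the service.
BASE_URL = "https://api.example.com"


def build_compress_request(api_key, context, prompt, rate="auto"):
    """Build (but do not send) a request to /compress/raw/.

    Assumes a JSON body sent with POST; rate is "auto" or a fixed
    value such as 0.5.
    """
    payload = {"context": context, "prompt": prompt, "rate": rate}
    return urllib.request.Request(
        BASE_URL + "/compress/raw/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def downstream_prompt(response_body):
    """Extract the single string to send to the downstream model.

    compressed_prompt replaces both context and prompt, so nothing
    else is concatenated onto it.
    """
    return json.loads(response_body)["compressed_prompt"]
```

Sending the request would then look like `urllib.request.urlopen(build_compress_request(key, context, prompt))`, with the response bytes passed to `downstream_prompt`. Keeping request construction separate from sending makes the payload easy to inspect before any tokens are spent.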