The /compress/raw/ endpoint compresses your prompt and context, reducing token count while maintaining the semantic integrity needed for high-quality AI responses. Compression ratios of 50–70% are typical, with no meaningful degradation in downstream model output quality.
context (string): Background information, instructions, or supporting text that frames the prompt. This is the content compressed most aggressively; structure and meaning are preserved while redundancy is removed.
scaledown.rate (string or number): Compression aggressiveness. Use "auto" to let ScaleDown pick the optimal rate based on the content, or pass a number between 0 and 1 to set a fixed target ratio (e.g. 0.5 compresses to 50% of the original tokens).
curl -X POST https://api.scaledown.xyz/compress/raw/ \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "context": "ScaleDown is a context engineering platform. It compresses AI prompts while preserving semantic integrity...",
    "prompt": "Summarize what ScaleDown does in one sentence.",
    "scaledown": { "rate": "auto" }
  }'
Response:
{
  "compressed_prompt": "ScaleDown: context engineering platform, compresses AI prompts, preserves semantic integrity.\n\nSummarize what ScaleDown does in one sentence.",
  "original_prompt_tokens": 150,
  "compressed_prompt_tokens": 65,
  "successful": true,
  "latency_ms": 2341,
  "request_metadata": {
    "compression_time_ms": 2341,
    "compression_rate": "auto",
    "prompt_length": 425,
    "compressed_prompt_length": 189
  }
}
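Given a response like the one above, the realized compression ratio follows directly from the two token fields (a minimal sketch using the example values; real code would read these from the parsed JSON response):

```python
# Token fields taken from the example /compress/raw/ response above.
response = {
    "original_prompt_tokens": 150,
    "compressed_prompt_tokens": 65,
}

ratio = response["compressed_prompt_tokens"] / response["original_prompt_tokens"]
savings_pct = (1 - ratio) * 100

print(f"compressed to {ratio:.0%} of original ({savings_pct:.0f}% token savings)")
# → compressed to 43% of original (57% token savings)
```

A 57% reduction sits inside the 50–70% range the endpoint description calls typical.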
The "auto" rate is recommended for most use cases. Fixed rates below 0.3 may noticeably degrade output quality on dense technical content.
The compressed_prompt field is a single string — pass it as the full prompt to your downstream model, replacing both context and prompt.
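In practice this means one call to /compress/raw/, then forwarding the returned string as-is. A minimal client sketch using only the Python standard library (the URL, headers, and body fields are taken from the curl example above; the downstream model call is hypothetical and depends on your provider):

```python
import json
import urllib.request

API_URL = "https://api.scaledown.xyz/compress/raw/"

def compress(context: str, prompt: str, api_key: str, rate="auto") -> str:
    """Call /compress/raw/ and return the compressed_prompt string,
    which replaces both context and prompt downstream."""
    body = json.dumps({
        "context": context,
        "prompt": prompt,
        "scaledown": {"rate": rate},
    }).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["compressed_prompt"]

# Usage (my_model is a placeholder for your downstream client):
#   compressed = compress(long_context, "Summarize...", api_key="<your-api-key>")
#   answer = my_model.generate(compressed)  # pass as the full prompt
```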
Token counts are estimated using the same tokenizer as the target model family. Exact counts may vary slightly depending on the model you use downstream.