Compress

POST /compress/raw/
curl --request POST \
  --url https://api.scaledown.xyz/compress/raw/ \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your-api-key>' \
  --data '
{
  "context": "<string>",
  "prompt": "<string>",
  "scaledown": {
    "rate": {}
  }
}
'
{
  "compressed_prompt": "<string>",
  "original_prompt_tokens": 123,
  "compressed_prompt_tokens": 123,
  "successful": true,
  "latency_ms": 123,
  "request_metadata": {
    "compression_time_ms": 123,
    "compression_rate": {},
    "prompt_length": 123,
    "compressed_prompt_length": 123
  }
}

Overview

The /compress/raw/ endpoint compresses your prompt and context, reducing token count while maintaining the semantic integrity needed for high-quality AI responses. Compression ratios of 50–70% are typical, with no meaningful degradation in downstream model output quality.

Request

context
string
required
Background information, instructions, or supporting text that provides context for the prompt. This is the content most aggressively compressed — structure and meaning are preserved, but redundancy is removed.
prompt
string
required
The main query or question to send to your AI model. Kept intact where possible to preserve intent.
scaledown
object
required
Compression configuration. Its rate field accepts "auto" or a fixed numeric rate (see Examples).
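The request body above can be assembled programmatically. A minimal Python sketch (field names come from this page; the values are illustrative):

```python
import json

def build_payload(context: str, prompt: str, rate="auto") -> str:
    """Assemble the /compress/raw/ request body as a JSON string."""
    payload = {
        "context": context,            # supporting text; compressed most aggressively
        "prompt": prompt,              # main query; kept intact where possible
        "scaledown": {"rate": rate},   # "auto" or a fixed numeric rate
    }
    return json.dumps(payload)

body = build_payload("Background docs...", "Summarize the key points.")
```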

Response

compressed_prompt
string
The compressed output, ready to pass directly to your AI model in place of the original context and prompt.
original_prompt_tokens
number
Token count of the original input.
compressed_prompt_tokens
number
Token count of the compressed output.
successful
boolean
Whether the compression completed successfully.
latency_ms
number
End-to-end request latency in milliseconds.
request_metadata
object
Per-request metrics: compression_time_ms, compression_rate, prompt_length, and compressed_prompt_length.

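The two token-count fields make it easy to verify savings after each call. A small sketch, using values copied from the sample response in this document:

```python
# Sample response fields from the example response in this document.
response = {
    "original_prompt_tokens": 150,
    "compressed_prompt_tokens": 65,
    "successful": True,
}

def token_savings(resp: dict) -> float:
    """Fraction of tokens removed by compression."""
    return 1 - resp["compressed_prompt_tokens"] / resp["original_prompt_tokens"]

savings = token_savings(response)  # ~0.57, within the typical 50-70% range
```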
Error responses

Status                     | Meaning
400 Bad Request            | Malformed request body or missing required fields.
401 Unauthorized           | Missing or invalid x-api-key.
429 Too Many Requests      | Rate limit exceeded. Back off and retry.
500 Internal Server Error  | Compression service unavailable.
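The 429 row asks clients to back off and retry. One common way to schedule those retries is exponential backoff; a sketch (not part of the API itself):

```python
def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff delays (seconds) for retrying 429 responses."""
    return [min(cap, base * 2 ** i) for i in range(retries)]

delays = backoff_delays(5)  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Sleep for each delay in turn between attempts, and stop retrying once a request succeeds or the list is exhausted.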

Authentication

Include your API key in every request using the x-api-key header.
-H "x-api-key: <your-api-key>"

Examples

Auto compression

curl -X POST https://api.scaledown.xyz/compress/raw/ \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "context": "ScaleDown is a context engineering platform. It compresses AI prompts while preserving semantic integrity...",
    "prompt": "Summarize what ScaleDown does in one sentence.",
    "scaledown": {
      "rate": "auto"
    }
  }'
Response:
{
  "compressed_prompt": "ScaleDown: context engineering platform, compresses AI prompts, preserves semantic integrity.\n\nSummarize what ScaleDown does in one sentence.",
  "original_prompt_tokens": 150,
  "compressed_prompt_tokens": 65,
  "successful": true,
  "latency_ms": 2341,
  "request_metadata": {
    "compression_time_ms": 2341,
    "compression_rate": "auto",
    "prompt_length": 425,
    "compressed_prompt_length": 189
  }
}

Fixed compression rate

Pass a number for rate when you need a guaranteed token budget.
curl -X POST https://api.scaledown.xyz/compress/raw/ \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "context": "...",
    "prompt": "What are the key points?",
    "scaledown": {
      "rate": 0.4
    }
  }'
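Assuming a numeric rate expresses the fraction of original tokens retained (this page does not define it precisely, though the quality note about rates below 0.3 is consistent with that reading), a fixed rate translates to a rough token budget like so:

```python
def target_tokens(original_tokens: int, rate: float) -> int:
    """Approximate compressed token budget for a fixed rate.

    Assumes `rate` is the fraction of original tokens retained;
    this interpretation is inferred, not stated on this page.
    """
    return round(original_tokens * rate)

budget = target_tokens(1000, 0.4)  # ~400 tokens retained
```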

Notes

  • "auto" rate is recommended for most use cases. Fixed rates below 0.3 may noticeably affect output quality on dense technical content.
  • The compressed_prompt field is a single string — pass it as the full prompt to your downstream model, replacing both context and prompt.
  • Token counts are estimated using the same tokenizer as the target model family. Exact counts may vary slightly depending on the model you use downstream.