The /compress/raw/ endpoint compresses your prompt and context, reducing token count while maintaining the semantic integrity needed for high-quality AI responses. Compression ratios of 50–70% are typical, with no meaningful degradation in downstream model output quality.
context (string): Background information, instructions, or supporting text that frames the prompt. This is the content compressed most aggressively; structure and meaning are preserved while redundancy is removed.
scaledown.rate (string or number): Compression aggressiveness. Use "auto" to let ScaleDown pick the optimal rate based on the content, or pass a number between 0 and 1 to set a fixed target ratio (e.g. 0.5 compresses to 50% of the original tokens).
curl -X POST https://api.scaledown.xyz/compress/raw/ \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "context": "ScaleDown is a context engineering platform. It compresses AI prompts while preserving semantic integrity...",
    "prompt": "Summarize what ScaleDown does in one sentence.",
    "scaledown": { "rate": "auto" }
  }'
Response:
{
  "compressed_prompt": "ScaleDown: context engineering platform, compresses AI prompts, preserves semantic integrity.\n\nSummarize what ScaleDown does in one sentence.",
  "original_prompt_tokens": 150,
  "compressed_prompt_tokens": 65,
  "successful": true,
  "latency_ms": 2341,
  "request_metadata": {
    "compression_time_ms": 2341,
    "compression_rate": "auto",
    "prompt_length": 425,
    "compressed_prompt_length": 189
  }
}
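Given a response like the one above, the realized compression ratio follows directly from the two token fields (a minimal sketch using the example values; real code would read these from the parsed JSON response):

```python
# Token fields taken from the example /compress/raw/ response above.
response = {
    "original_prompt_tokens": 150,
    "compressed_prompt_tokens": 65,
}

ratio = response["compressed_prompt_tokens"] / response["original_prompt_tokens"]
savings_pct = (1 - ratio) * 100

print(f"compressed to {ratio:.0%} of original ({savings_pct:.0f}% token savings)")
# → compressed to 43% of original (57% token savings)
```

A 57% reduction sits inside the 50–70% range the endpoint description calls typical.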
The "auto" rate is recommended for most use cases. Fixed rates below 0.3 may noticeably degrade output quality on dense technical content.
The compressed_prompt field is a single string — pass it as the full prompt to your downstream model, replacing both context and prompt.
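In practice this means one call to /compress/raw/, then forwarding the returned string as-is. A minimal client sketch using only the Python standard library (the URL, headers, and body fields are taken from the curl example above; the downstream model call is hypothetical and depends on your provider):

```python
import json
import urllib.request

API_URL = "https://api.scaledown.xyz/compress/raw/"

def compress(context: str, prompt: str, api_key: str, rate="auto") -> str:
    """Call /compress/raw/ and return the compressed_prompt string,
    which replaces both context and prompt downstream."""
    body = json.dumps({
        "context": context,
        "prompt": prompt,
        "scaledown": {"rate": rate},
    }).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["compressed_prompt"]

# Usage (my_model is a placeholder for your downstream client):
#   compressed = compress(long_context, "Summarize...", api_key="<your-api-key>")
#   answer = my_model.generate(compressed)  # pass as the full prompt
```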
Token counts are estimated using the same tokenizer as the target model family. Exact counts may vary slightly depending on the model you use downstream.