Getting Started with ScaleDown: Your AI Cost Optimization Guide

ScaleDown Team • March 13, 2025 • 10 min read

What is ScaleDown?

ScaleDown is a suite of task-specific small language models (SLMs) that reduce your AI token usage through context extraction: identifying and retaining only the information that matters for your task. The goal is responses of the same quality from significantly fewer tokens, so you pay less per request.

Before You Start

To use ScaleDown, you’ll need:
  • An API key
  • Basic knowledge of making API calls
  • Your existing AI prompts that you want to optimize
Ready to get your API key? Contact our sales team.

Your First ScaleDown Request

Step 1: Set Up Your Request

Here’s how to make your first API call to compress a prompt.
import requests
import json

# ScaleDown API endpoint
url = "https://api.scaledown.xyz/compress/raw/"

# Your headers (replace YOUR_API_KEY with your actual key)
headers = {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json'
}
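
A key pasted directly into source code is easy to leak through version control. As a minimal sketch, assuming you keep the key in an environment variable, you could build the headers like this. The variable name `SCALEDOWN_API_KEY` and the helper `build_headers` are hypothetical and not part of the ScaleDown API.

```python
import os


def build_headers(env_var="SCALEDOWN_API_KEY"):
    """Build request headers, reading the API key from the environment.

    The variable name is an assumption for illustration; use whatever
    name your deployment defines.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
```

You would then call `headers = build_headers()` in place of the literal dictionary.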

Step 2: Configure Your Compression

Separate your context from your main prompt and set the compression rate to "auto" for the best results.
payload = {
    "context": "Context about your specific topic or instructions here",
    "prompt": "Your actual query or question here",
    "scaledown": {
        "rate": "auto" # Automatic compression rate optimization
    }
}
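
To make the context/prompt split concrete, here is a small sketch. The helper `build_payload` and the sample text are illustrative only; the field names follow the payload above.

```python
def build_payload(context, prompt, rate="auto"):
    """Assemble a compression payload with context and prompt kept separate."""
    return {
        "context": context,  # background material for the task
        "prompt": prompt,    # the question you actually want answered
        "scaledown": {"rate": rate},
    }


# Illustrative input: reference material goes in "context",
# the question itself goes in "prompt".
payload = build_payload(
    context="Our refund policy: purchases can be returned within 30 days of delivery.",
    prompt="How long do customers have to request a refund?",
)
```

Keeping the two fields apart means the question is never mixed into the background text it refers to.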

Step 3: Make the Request

With your request set up and configured, you can now execute the API call.
response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=30)
response.raise_for_status()  # Raise an exception on HTTP 4xx/5xx responses
result = response.json()

print(result)
That’s it! Your prompt is now compressed and ready to be used with your AI model.
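
Before forwarding the result to your model, it can help to guard against a failed compression. The sketch below assumes the response carries the `successful` and `compressed_prompt` fields that appear in the sample response in this guide. `select_prompt` is a hypothetical helper, and falling back to the original text is one possible policy rather than documented ScaleDown behavior.

```python
def select_prompt(result, original_prompt):
    """Return the compressed prompt if compression succeeded, else the original."""
    compressed = result.get("compressed_prompt")
    if result.get("successful") and compressed:
        return compressed
    return original_prompt
```

With this in place, a failed or empty compression still leaves you with a usable prompt, just at full token cost.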

Understanding the Response Structure

The API response provides the compressed prompt along with metadata about the operation: token counts before and after compression, a success flag, and timing information.
{
  "compressed_prompt": "Your optimized context here...",
  "original_prompt_tokens": 150,
  "compressed_prompt_tokens": 65,
  "successful": true,
  "latency_ms": 2341,
  "request_metadata": {
    "compression_time_ms": 2341,
    "compression_rate": "auto",
    "prompt_length": 425,
    "compressed_prompt_length": 189
  }
}
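
The token counts make it straightforward to quantify the saving. Here is a small sketch using the sample values above; `compression_summary` is a hypothetical helper, not part of the API.

```python
def compression_summary(result):
    """Compute tokens saved and percentage reduction from a ScaleDown response."""
    original = result["original_prompt_tokens"]
    compressed = result["compressed_prompt_tokens"]
    saved = original - compressed
    # Guard against division by zero for an empty original prompt.
    percent = round(100 * saved / original, 1) if original else 0.0
    return {"tokens_saved": saved, "percent_reduction": percent}


sample = {"original_prompt_tokens": 150, "compressed_prompt_tokens": 65}
print(compression_summary(sample))
# {'tokens_saved': 85, 'percent_reduction': 56.7}
```

For the sample response, that is 85 of 150 tokens saved, a reduction of roughly 57%.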