Getting Started with ScaleDown: Your AI Cost Optimization Guide

What is ScaleDown?

ScaleDown is a suite of task-specific small language models (SLMs) that reduce your AI token usage through context extraction: identifying and retaining only the information that matters for your task. You get responses of the same quality while paying for significantly fewer tokens.

Before You Start

To use ScaleDown, you’ll need:

An API key
Basic knowledge of making API calls
Your existing AI prompts that you want to optimize

Ready to get your API key? Contact our sales team.

Your First ScaleDown Request

Step 1: Set Up Your Request

Here’s how to make your first API call to compress a prompt.

Python

import requests
import json

# ScaleDown API endpoint
url = "https://api.scaledown.xyz/compress/raw/"

# Your headers (replace YOUR_API_KEY with your actual key)
headers = {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json'
}

TypeScript

// ScaleDown API endpoint
const url = "https://api.scaledown.xyz/compress/raw/";

// Your headers (replace YOUR_API_KEY with your actual key)
const headers = {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json'
};

JavaScript

// ScaleDown API endpoint
const url = "https://api.scaledown.xyz/compress/raw/";

// Your headers (replace YOUR_API_KEY with your actual key)
const headers = {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json'
};
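Hard-coding the key is fine for a first test, but you may prefer to keep it out of your source files. The following is a minimal sketch that reads the key from an environment variable. The variable name SCALEDOWN_API_KEY and the helper build_headers are hypothetical names chosen for this example, not something ScaleDown defines.

```python
import os


def build_headers(env_var: str = "SCALEDOWN_API_KEY") -> dict:
    """Build the request headers, reading the API key from the environment.

    The environment variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
```

You could then pass headers=build_headers() in place of the hard-coded headers dictionary.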

Step 2: Configure Your Compression

Separate your context from your main prompt and set the compression rate to "auto" for the best results.

Python

payload = {
    "context": "Context about your specific topic or instructions here",
    "prompt": "Your actual query or question here",
    "scaledown": {
        "rate": "auto"  # Automatic compression rate optimization
    }
}

TypeScript

interface CompressRequest {
    context: string;
    prompt: string;
    scaledown: {
        rate: string;
    };
}

const payload: CompressRequest = {
    context: "Context about your specific topic or instructions here",
    prompt: "Your actual query or question here",
    scaledown: {
        rate: "auto" // Automatic compression rate optimization
    }
};

JavaScript

const payload = {
    context: "Context about your specific topic or instructions here",
    prompt: "Your actual query or question here",
    scaledown: {
        rate: "auto" // Automatic compression rate optimization
    }
};
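If you build payloads in several places, a small helper can keep the context and the prompt separate and consistent. This is only a sketch: build_payload is a hypothetical helper, and the empty-prompt check is an assumption of this example rather than a documented API requirement.

```python
def build_payload(context: str, prompt: str, rate: str = "auto") -> dict:
    """Return a request body with context and prompt kept in separate fields."""
    if not prompt.strip():
        # Assumption for this sketch: a request without a question is a mistake
        raise ValueError("prompt must not be empty")
    return {
        "context": context,
        "prompt": prompt,
        "scaledown": {"rate": rate},
    }


# Background material goes in "context"; the question itself goes in "prompt"
payload = build_payload(
    context="Context about your specific topic or instructions here",
    prompt="Your actual query or question here",
)
```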

Step 3: Make the Request

With your request set up and configured, you can now execute the API call.

Python

response = requests.post(url, headers=headers, data=json.dumps(payload))
result = response.json()

print(result)

TypeScript

// Assuming you're using a fetch-like library (e.g., node-fetch)
const response = await fetch(url, {
    method: 'POST',
    headers: headers,
    body: JSON.stringify(payload)
});
const result = await response.json();

console.log(result);

JavaScript

// Using Fetch API in a browser or Node.js environment
fetch(url, {
    method: 'POST',
    headers: headers,
    body: JSON.stringify(payload)
})
.then(response => response.json())
.then(result => {
    console.log(result);
})
.catch(error => console.error('Error:', error));

That’s it! Your prompt is now compressed and ready to be used with your AI model.
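Before passing the result to your model, you may want to guard against a failed compression. The sketch below assumes the response carries the successful and compressed_prompt fields that appear in the sample response in this guide. The helper choose_prompt and the fallback behavior are illustrative assumptions, not part of the ScaleDown API.

```python
def choose_prompt(result: dict, fallback: str) -> str:
    """Return the compressed prompt, or the fallback text if compression failed."""
    if result.get("successful") and result.get("compressed_prompt"):
        return result["compressed_prompt"]
    return fallback


# Example with a response shaped like the sample in this guide
sample = {"compressed_prompt": "Your optimized context here...", "successful": True}
prompt_for_model = choose_prompt(sample, "Your actual query or question here")
```

Falling back to the uncompressed text means a compression failure costs you extra tokens rather than a failed request to your model.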

Understanding the Response Structure

The API response provides the compressed prompt along with useful metadata about the operation.

{
  "compressed_prompt": "Your optimized context here...",
  "original_prompt_tokens": 150,
  "compressed_prompt_tokens": 65,
  "successful": true,
  "latency_ms": 2341,
  "request_metadata": {
    "compression_time_ms": 2341,
    "compression_rate": "auto",
    "prompt_length": 425,
    "compressed_prompt_length": 189
  }
}
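The token counts in the response let you estimate how much a request saved. Here is a minimal sketch using the original_prompt_tokens and compressed_prompt_tokens fields from the sample above. The helper name token_savings is hypothetical.

```python
from typing import Tuple


def token_savings(result: dict) -> Tuple[int, float]:
    """Return (tokens saved, percentage saved) for one compression response."""
    original = result["original_prompt_tokens"]
    compressed = result["compressed_prompt_tokens"]
    saved = original - compressed
    # Avoid dividing by zero if the original prompt was empty
    percent = 100.0 * saved / original if original else 0.0
    return saved, percent


sample = {"original_prompt_tokens": 150, "compressed_prompt_tokens": 65}
saved, percent = token_savings(sample)
print(f"Saved {saved} tokens ({percent:.1f}%)")  # Saved 85 tokens (56.7%)
```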
