Overview

Created as part of Eleuther SOAR by Nadya Devani, Pouya Sadeghi, Purva Kandalgaonkar, Suparnojit Sarkar

The Pareto Merging method is a technique for combining multiple specialized AI models into a single, flexible system. Unlike traditional methods that produce one static merged model, Pareto Merging yields a framework that can generate an unlimited number of model variations tailored to specific needs, without retraining. It merges two types of models: a concise, non-reasoning Large Language Model (LLM) and a powerful Large Reasoning Model (LRM). The key innovation is a two-part architecture:
  1. Preference-Independent Base Model: A single, fixed base model combines the general knowledge of the LLM with the reasoning capabilities of the LRM.
  2. Preference-Dependent Tensor: A small, trainable “control knob” that is adjusted on-the-fly based on a predicted question difficulty score.
This allows the system to be dynamic: for easy questions, it leans towards the concise base model to save costs, and for difficult questions, it engages more reasoning power to ensure high accuracy.
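The per-query combination can be pictured as a simple weight update. The sketch below is illustrative only: the function and tensor names are assumptions rather than the actual Pareto Merging implementation, and it treats the preference-dependent tensor as a single additive term scaled by the predicted difficulty score.

import torch

def merge_for_query(base_weight: torch.Tensor, pref_tensor: torch.Tensor, difficulty: float) -> torch.Tensor:
    # base_weight: preference-independent weights (LLM knowledge + LRM reasoning)
    # pref_tensor: small preference-dependent tensor, the trainable "control knob"
    # difficulty: predicted difficulty score in [0, 1] for the incoming query
    # Easy queries (difficulty near 0) stay close to the concise base behavior;
    # hard queries (difficulty near 1) apply more of the reasoning-oriented update.
    return base_weight + difficulty * pref_tensor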

Architecture

[Figure: Pareto Merging architecture diagram]

Pareto Merging API

Overview

The ScaleDown Pareto Merging API provides dynamic, cost-effective reasoning. It merges a concise LLM with a powerful LRM, allowing you to balance response accuracy and token cost on a per-query basis. Instead of using a static model, our system adjusts its reasoning depth based on the predicted difficulty of your prompt, reducing costs by up to 30% while maintaining high accuracy.
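As an illustration of the per-query trade-off, the sketch below shows what a request to a hosted endpoint could look like. The endpoint URL, field names, and parameter are assumptions made for illustration, not the documented API schema.

import requests

# Hypothetical endpoint and payload -- shown only to illustrate per-query control.
resp = requests.post(
    "https://api.scaledown.example/v1/pareto-merge/generate",  # placeholder URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Summarize the attention mechanism in one paragraph.",
        # Omit to let the difficulty predictor decide; set explicitly to pin
        # the accuracy/cost balance (0.0 = cheapest, 1.0 = deepest reasoning).
        "reasoning_preference": 0.2,
    },
)
print(resp.json())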

Quick Start

Get started in minutes by using our hosted API endpoint or by running the model locally with our Python package.

Installation

# Placeholder for package installation
pip install scaledown-paretomerge
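
Once installed, a local call might look like the sketch below. The client class, method names, and parameters are hypothetical placeholders, since the package interface is not documented in this section; consult the package's own reference for the real API.

# Hypothetical usage sketch -- class and argument names are illustrative only.
from scaledown_paretomerge import ParetoMergeClient  # assumed import path

client = ParetoMergeClient(api_key="YOUR_API_KEY")  # or run locally without a key

response = client.generate(
    prompt="What is the derivative of x**3 + 2*x?",
    # Leave unset to let the difficulty predictor choose the reasoning depth,
    # or pin the trade-off yourself (0.0 = cheapest, 1.0 = deepest reasoning).
    reasoning_preference=None,
)
print(response.text)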