Router Parameters

The model router exposes four parameters that can be used to control costs.

  • max_cost - The maximum cost of the total request in USD. Allows you to specify an upper bound on what you are willing to pay for the request.

  • max_cost_per_million_tokens - The maximum cost per 1 million tokens in the request, in USD. Allows you to specify a minimum cost efficiency for your models.

  • models - The set of models we should be able to route between. Allows you to select only models within a certain cost range.

  • willingness_to_pay - A parameter specifying the value of getting better output, measured in dollars. A value of 0.1, for example, indicates that each 10% improvement in performance is worth 10 cents. If this parameter is not set, it defaults to infinity, which indicates that we should optimize only for performance.

Each of these controls can be useful in different scenarios, and multiple methods of controlling cost can be used simultaneously.
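For example, with the OpenAI Python SDK (configured for the router as shown in the next section), these fields can be forwarded through the SDK's extra_body argument, which merges extra fields into the request body. A minimal sketch, assuming the router reads them as top-level body fields in the same way as the curl examples later on this page; the max_cost_per_million_tokens value is only illustrative:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_UMAMIAI_API_KEY",
    base_url="https://api.umamiai.xyz/router",
)

# Cap the total spend for this request and require a minimum cost efficiency.
chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Say this is a test"}],
    extra_body={
        "max_cost": 0.02,                    # never spend more than 2 cents on this request
        "max_cost_per_million_tokens": 1.0,  # illustrative: only consider models under $1 per 1M tokens
    },
)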

OpenAI Python

Using OpenAI's SDK, the following call:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_UMAMIAI_API_KEY",  # defaults to os.environ.get("UMAMIAI_API_KEY")
)

chat_completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
)

becomes

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_UMAMIAI_API_KEY",  # defaults to os.environ.get("UMAMIAI_API_KEY")
    base_url="https://api.umamiai.xyz/router",
)

chat_completion = client.chat.completions.create(
    model="router",
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
)
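The response is the SDK's standard ChatCompletion object. Continuing the snippet above, a quick way to check the result; this sketch assumes the router reports the model it selected in the response's model field:

# Which model did the router pick, and what did it say?
print(chat_completion.model)                       # assumed to be the model the router selected
print(chat_completion.choices[0].message.content)  # the completion text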

Set Routing Parameters

Once you switch to the router, you can control the criteria used for routing.

For example, if you want to route between a specific set of models, you can include them as a list in the models field. We'll only route between the models you specify there.

curl https://api.umamiai.xyz/router/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_UMAMIAI_API_KEY>" \
  -d '{
    "model": "router",
    "models": ["gpt-3.5-turbo", "claude-2.1"],
    "messages": [
      {
        "role": "user",
        "content": "Hello world!"
      }
    ],
    "temperature": 1
  }'
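The equivalent call from Python, reusing the client configured in the OpenAI Python section above; a minimal sketch assuming the router-specific models field can be forwarded through the SDK's extra_body argument:

# Only route between the two listed models.
chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Hello world!"}],
    temperature=1,
    extra_body={"models": ["gpt-3.5-turbo", "claude-2.1"]},
)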

If you care not only about performance but also about cost, you can specify parameters like max_cost for your request, or willingness_to_pay, i.e. how many dollars you are willing to pay for a 10% better answer on this request.

curl https://api.umamiai.xyz/router/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_UMAMIAI_API_KEY>" \
  -d '{
    "model": "router",
    "models": ["gpt-3.5-turbo", "claude-2.1"],
    "max_cost": 0.02,
    "willingness_to_pay": 0.01,
    "messages": [
      {
        "role": "user",
        "content": "Hello world!"
      }
    ],
    "temperature": 1
  }'

These parameters let you tightly control your unit economics. For example, if you have a free product, you can use the max_cost parameter to bound the inference spend that feeds into your CAC. Or, if you know how much a correct answer is worth in your product, you can set the willingness_to_pay parameter and get answers that balance cost and performance.
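To make that concrete, here is one illustrative way to derive a willingness_to_pay value from what a correct answer is worth to your product, using the definition above (each 10% improvement in performance is worth that many dollars). This is a back-of-the-envelope sketch with hypothetical numbers, not an official formula:

# Hypothetical: a fully correct answer is worth 50 cents more to us than a useless one.
value_of_correct_answer = 0.50
# Under the definition above, willingness_to_pay is the dollar value of each 10% improvement.
willingness_to_pay = value_of_correct_answer / 10
print(willingness_to_pay)  # 0.05 -> a 10% better answer is worth 5 cents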

Finally, you can capture arbitrary metadata on each request by setting the optional extra parameter.

curl https://api.umamiai.xyz/router/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_UMAMIAI_API_KEY>" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "Write a small poem"
      }
    ],
    "extra": {
      "ip": "123.123.123.123",
      "Timezone": "UTC+0",
      "Country": "US",
      "City": "New York"
    }
  }'
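From the OpenAI Python SDK, the same metadata can be attached by forwarding the extra field through extra_body, again reusing the client configured earlier; a minimal sketch under that assumption:

# Attach arbitrary per-request metadata via the optional "extra" field.
chat_completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a small poem"}],
    extra_body={
        "extra": {
            "ip": "123.123.123.123",
            "Timezone": "UTC+0",
            "Country": "US",
            "City": "New York",
        }
    },
)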

Other Benefits Of The Router

In addition to higher performance and lower cost, the router provides several other benefits. It lets you:

Use fallbacks for higher uptime

Instead of relying on a single model or provider, which could go down at any time, the router lets you switch to other providers when one is down.

Future-proof your tech

When new models come out, they're automatically added to the router. That way, you get the latest and greatest in performance.

Focus on product

Testing every model and reading every research paper to get the best performance requires hiring an entire engineering team. With the router, you can stay ahead of your competition and stop fiddling with models. Focus on your specialty: building a product your customers love.