Routing & strategies
How Nyuro picks a model — auto classification, strategy hints, fallback chains, and the transparency headers that tell you what happened.
Routing is the core of the gateway. You express intent; Nyuro resolves that into an ordered list of candidate models, tries them in order, and tells you exactly what it did.
Strategies
Pass a strategy: directive when you care about an outcome more than a specific
model:
| Strategy | Picks |
|---|---|
| strategy:cost | The cheapest model that can handle the request |
| strategy:quality | The strongest reasoner available |
| strategy:latency | The fastest time-to-first-token |
| strategy:local | Only models running on your own infrastructure |

```python
client.chat.completions.create(
    model="strategy:quality",
    messages=[{"role": "user", "content": "Draft a precise refund-policy clause."}],
)
```

Auto classification
With model="auto", a dependency-free heuristic classifier inspects the prompt
— code fences, reasoning keywords, length, math symbols — and maps it to a task
class, with no extra LLM call on the hot path:
- code → qwen2.5-coder
- reasoning → claude-3-5-sonnet
- summary → gpt-4o-mini
- long context → gpt-4o
- general chat → gpt-4o-mini

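
To make the idea of a dependency-free heuristic concrete, here is a minimal sketch of how such a classifier could look. It is not Nyuro's actual implementation: the keyword lists, the length threshold, the order of the checks, and the names `classify` and `pick_model` are all assumptions made for illustration. Only the task-class-to-model mapping is taken from the list above.

```python
import re

# Task class -> model, copied from the mapping listed above.
MODEL_FOR_CLASS = {
    "code": "qwen2.5-coder",
    "reasoning": "claude-3-5-sonnet",
    "summary": "gpt-4o-mini",
    "long_context": "gpt-4o",
    "general": "gpt-4o-mini",
}

# Hypothetical cues and thresholds, chosen only for illustration.
CODE_FENCE = "`" * 3  # a literal triple-backtick code fence
CODE_WORDS = re.compile(r"\b(def|class|function|import)\b")
REASONING_WORDS = re.compile(r"\b(prove|derive|step[- ]by[- ]step|reason)\b", re.I)
SUMMARY_WORDS = re.compile(r"\b(summari[sz]e|recap)\b", re.I)
MATH_SYMBOLS = re.compile(r"[∑∫√≤≥≠^]")
LONG_CONTEXT_CHARS = 20_000


def classify(prompt: str) -> str:
    """Map a prompt to a task class using cheap string checks only."""
    if CODE_FENCE in prompt or CODE_WORDS.search(prompt):
        return "code"
    if len(prompt) > LONG_CONTEXT_CHARS:
        return "long_context"
    if SUMMARY_WORDS.search(prompt):
        return "summary"
    if REASONING_WORDS.search(prompt) or MATH_SYMBOLS.search(prompt):
        return "reasoning"
    return "general"


def pick_model(prompt: str) -> str:
    """Resolve a prompt to the model listed for its task class."""
    return MODEL_FOR_CLASS[classify(prompt)]
```

Because every check is a regular expression or a length comparison, a classifier of this shape adds no network round trip, which is the property the text describes as "no extra LLM call on the hot path".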
Fallback chains
Send a models array (plural) instead of a single model. The gateway tries
each entry in order and falls through on a provider error. Each entry resolves
through the same router, so you can mix concrete aliases with auto,
strategy:cost, or industry:legal.
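
The fall-through behaviour described above can be sketched as a simple loop. This is an illustrative model rather than the gateway's real code: `ProviderError`, `run_chain`, and the `call_model` callback are hypothetical names, and the sketch assumes every entry has already been resolved to a concrete model.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error type standing in for any provider-side failure."""


def run_chain(candidates: Sequence[str], call_model: Callable[[str], str]) -> dict:
    """Try each candidate in order and return the first successful answer."""
    if not candidates:
        raise ValueError("at least one candidate model is required")
    last_error = None
    for index, model in enumerate(candidates):
        try:
            answer = call_model(model)
        except ProviderError as exc:
            # Provider error: remember it and fall through to the next entry.
            last_error = exc
            continue
        return {
            "model": model,
            "answer": answer,
            # True when the primary (first) candidate was skipped.
            "fallback_used": index > 0,
        }
    raise ProviderError(f"all candidates failed: {list(candidates)}") from last_error
```

The returned `model` and `fallback_used` fields mirror what the gateway reports through the X-Nyuro-Model and X-Nyuro-Fallback-Used headers.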

```python
import os

import httpx

resp = httpx.post(
    "https://api.nyuro.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['NYURO_API_KEY']}"},
    json={
        "models": ["gpt-4o", "claude-3-5-sonnet", "strategy:cost"],
        "messages": [{"role": "user", "content": "Summarize transformer attention in 2 lines."}],
    },
)
print(resp.headers.get("X-Nyuro-Model"), resp.headers.get("X-Nyuro-Fallback-Used"))
```

```shell
curl -D - https://api.nyuro.ai/v1/chat/completions \
  -H "Authorization: Bearer neu_live_…" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4o", "gpt-4o-mini"], "messages": [{"role": "user", "content": "Hi"}]}'
```

Sorting candidates
Sorting semantics such as sort:price or max_price:<usd> are applied through
strategies: they bias the ordering of the resolved candidate chain toward
cheaper or faster models.
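
As a rough illustration of price-based ordering, the sketch below sorts a resolved chain by price and optionally drops entries above a cap. The function name `order_candidates`, the `prices` lookup, and the choice to treat max_price as a hard filter are all assumptions; the gateway's exact semantics are not specified here, and the prices in the usage example are placeholders rather than real rates.

```python
from typing import List, Mapping, Optional, Sequence


def order_candidates(
    candidates: Sequence[str],
    prices: Mapping[str, float],
    max_price: Optional[float] = None,
) -> List[str]:
    """Return candidates cheapest-first, dropping any above max_price (USD)."""
    kept = [
        model
        for model in candidates
        if max_price is None or prices[model] <= max_price
    ]
    # sorted() is stable, so equally priced models keep their chain order.
    return sorted(kept, key=lambda model: prices[model])


# Placeholder prices for three hypothetical models.
example_prices = {"model-a": 3.0, "model-b": 0.5, "model-c": 1.0}
print(order_candidates(["model-a", "model-b", "model-c"], example_prices, max_price=1.0))
```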
Transparency headers
Every response tells you what the router decided:
| Header | Meaning |
|---|---|
| X-Nyuro-Model | The model that actually answered |
| X-Nyuro-Route-Reason | Human-readable reason for the choice |
| X-Nyuro-Fallback-Used | true if the primary candidate was skipped |
| X-Nyuro-Candidates | The comma-separated candidate chain that was resolved |

```
X-Nyuro-Model: qwen2.5-coder
X-Nyuro-Route-Reason: detected code/engineering cues → code specialist
X-Nyuro-Fallback-Used: false
X-Nyuro-Candidates: qwen2.5-coder,claude-3-5-sonnet,gpt-4o
```

These headers make routing auditable. Pair them with Observability to see the decision for every historical request.
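
On the client side, the four headers can be collected into a small record for logging or assertions. The following is a minimal sketch: `RouteDecision` and `parse_route_headers` are hypothetical helper names, not part of any Nyuro SDK, and the parsing assumes the header formats shown in the example above.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class RouteDecision:
    model: str
    reason: str
    fallback_used: bool
    candidates: List[str]


def parse_route_headers(headers: Mapping[str, str]) -> RouteDecision:
    """Build a RouteDecision from the X-Nyuro-* response headers."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw_candidates = lowered.get("x-nyuro-candidates", "")
    fallback = lowered.get("x-nyuro-fallback-used", "false")
    return RouteDecision(
        model=lowered.get("x-nyuro-model", ""),
        reason=lowered.get("x-nyuro-route-reason", ""),
        fallback_used=fallback.strip().lower() == "true",
        candidates=[c.strip() for c in raw_candidates.split(",") if c.strip()],
    )
```

A call such as `parse_route_headers(resp.headers)` on an `httpx` response should work as written, since the function only relies on the mapping's `items()` method.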