Quickstart
From zero to a routed, streamed response — and pointing the gateway at your own VPS — in about five minutes.
This is a learn-by-doing guide. Every step has copy-paste code and a one-line explanation of what just happened.
Get a key
Sign up, finish the short onboarding, then go to Settings → API Keys and
click Create key. A string starting with neu_live_ appears once. Copy
it now; it is never shown again.
Treat the key like a password. Backend-only, never in client-side code, never committed to git. Use environment variables.
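One way to follow this advice in Python is to read the key from the environment at startup and fail fast if it is missing. This is a minimal sketch, not part of the gateway's SDK: the helper name load_api_key is hypothetical, and the variable name NYURO_API_KEY is borrowed from the TypeScript example later in this guide.

```python
import os

KEY_PREFIX = "neu_live_"  # key prefix as described in this guide


def load_api_key(env=None, var_name="NYURO_API_KEY"):
    """Return the gateway key from the environment, or raise if it looks wrong.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(f"{var_name} does not start with {KEY_PREFIX!r}")
    return key
```

The result can then be passed as api_key when constructing the client, so the literal key never appears in source control.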
Your first call
The gateway exposes an OpenAI-compatible endpoint at
https://api.nyuro.ai/v1/chat/completions. Any client that speaks OpenAI's
shape works — curl, fetch, the OpenAI SDK, anything.
curl https://api.nyuro.ai/v1/chat/completions \
  -H "Authorization: Bearer neu_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Tell me a one-line joke about kubernetes."}
    ]
  }'

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://api.nyuro.ai/v1",
    api_key="neu_live_…",
)
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a one-line joke about kubernetes."}],
)
print(resp.choices[0].message.content)

// npm i openai
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.nyuro.ai/v1",
  apiKey: process.env.NYURO_API_KEY!,
});
const resp = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Tell me a one-line joke about kubernetes." }],
});
console.log(resp.choices[0].message.content);

What just happened? Your call hit the gateway, which checked the key, routed the
request to gpt-4o-mini, called the provider on your behalf, and returned the same
shape OpenAI uses. The difference is that one URL can now reach dozens of models.
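To make "the same shape OpenAI uses" concrete, here is a rough sketch of the same call built with only the Python standard library. It constructs the request and reads the reply by hand; the function names build_chat_request and extract_text are made up for this illustration, and the response layout assumed is the standard OpenAI chat completion object.

```python
import json
import urllib.request

GATEWAY_URL = "https://api.nyuro.ai/v1/chat/completions"


def build_chat_request(api_key, model, prompt):
    """Build (but do not send) an OpenAI-shaped chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response_json):
    """Pull the assistant's text out of an OpenAI-shaped response dict."""
    return response_json["choices"][0]["message"]["content"]
```

Sending it is then a matter of passing the request to urllib.request.urlopen, decoding the body with json.load, and handing the result to extract_text.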
Let routing pick
Naming a model is fine when you know what you want. When you don't, pass auto,
or describe the shape of answer you need with a strategy.
resp = client.chat.completions.create(
    model="strategy:cost",         # cheapest viable
    # model="strategy:quality",    # strongest reasoner
    # model="strategy:latency",    # fastest first token
    # model="strategy:local",      # only your Ollama / vLLM
    messages=[{"role": "user", "content": "Explain TLS like I'm five."}],
)

await client.chat.completions.create({
  model: "strategy:cost",
  messages: [{ role: "user", content: "Explain TLS like I'm five." }],
});

curl https://api.nyuro.ai/v1/chat/completions \
  -H "Authorization: Bearer neu_live_…" \
  -H "Content-Type: application/json" \
  -d '{"model": "strategy:cost", "messages": [{"role": "user", "content": "Explain TLS like I am five."}]}'

Mental model
strategy:cost means "cheapest that still works". strategy:quality means
"this answer matters, spare no expense". strategy:local means "do not phone
home". Same code, very different decisions. See Routing.
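Since the model field carries three kinds of values so far (a concrete model name, auto, or strategy:<name>), a small client-side check can catch typos before a request is sent. The sketch below is purely illustrative and says nothing about how the gateway parses the field: classify_model_field is a hypothetical helper, and the strategy list contains only the four names shown in this guide.

```python
# Strategy names taken from the examples in this guide; the gateway may accept more.
KNOWN_STRATEGIES = {"cost", "quality", "latency", "local"}


def classify_model_field(model):
    """Classify a model string as ("auto", None), ("strategy", name) or ("model", name)."""
    if model == "auto":
        return ("auto", None)
    prefix, sep, name = model.partition(":")
    if sep and prefix == "strategy":
        if name not in KNOWN_STRATEGIES:
            raise ValueError(f"unknown strategy: {name!r}")
        return ("strategy", name)
    # Anything else is treated as a concrete model name, colons included.
    return ("model", model)
```

Model names that happen to contain a colon are deliberately left alone, so only the strategy: prefix is validated.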
Stream it
Add one parameter and tokens flow back as they are generated.
stream = client.chat.completions.create(
    model="claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Write a haiku about server logs."}],
    stream=True,  # the only change
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

const stream = await client.chat.completions.create({
  model: "claude-3-5-sonnet",
  messages: [{ role: "user", content: "Write a haiku about server logs." }],
  stream: true, // the only change
});
for await (const chunk of stream) {
  const delta = chunk.choices[0].delta?.content;
  if (delta) process.stdout.write(delta);
}

We forward Server-Sent Events from the underlying provider with no shape change, so any SDK that expects OpenAI-style streaming just works.
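If you are not using an SDK, the stream can be read by hand. The sketch below assumes the usual OpenAI streaming wire format: each event is a line starting with data: followed by a JSON chunk, and the stream ends with data: [DONE]. The helper name iter_deltas is hypothetical, and it takes any iterable of already-decoded text lines rather than opening a connection itself.

```python
import json


def iter_deltas(lines):
    """Yield text fragments from OpenAI-style Server-Sent Event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks carry no choices at all
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text
```

Joining the yielded fragments reproduces the full message, which is the same thing the SDK loops above do with chunk.choices[0].delta.content.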
Tell us your industry
Tell the router the industry and it picks a model tuned for that kind of
question — pass it as metadata, or pin directly with industry:<name>.
resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this MSA in 3 bullets."}],
    extra_body={"metadata": {"industry": "legal"}},
)
# or pin directly: model="industry:legal"

await client.chat.completions.create({
  model: "industry:legal",
  messages: [{ role: "user", content: "Summarize this MSA in 3 bullets." }],
});

Supported tags include legal, healthcare, finance, code, creative,
support, sales, data, education, and general. See the full list in
Models.
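The two ways of passing an industry differ only in where the tag goes, which a small helper can make explicit. This is a sketch under the assumption that the request shapes are exactly those in the examples above; industry_kwargs is a hypothetical name, and the tag is not validated here because the list above is not exhaustive.

```python
def industry_kwargs(industry, prompt, pin=False):
    """Build keyword arguments for client.chat.completions.create().

    pin=False sends model="auto" with the industry as metadata (a hint);
    pin=True uses the industry:<name> model string instead.
    """
    messages = [{"role": "user", "content": prompt}]
    if pin:
        return {"model": f"industry:{industry}", "messages": messages}
    return {
        "model": "auto",
        "messages": messages,
        "extra_body": {"metadata": {"industry": industry}},
    }
```

With the OpenAI Python SDK this would be used as client.chat.completions.create(**industry_kwargs("legal", "Summarize this MSA in 3 bullets.")).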
Point it at your VPS
Running Ollama or vLLM on your own box gives you privacy and predictable cost — without changing a line of client code. Set where your local models live on the gateway side:
# Ollama on your VPS, over HTTPS:
OLLAMA_BASE_URL=https://ollama.yourdomain.com:11434
# vLLM on a GPU node:
VLLM_BASE_URL=https://gpu1.yourdomain.com:8001/v1

Then call exactly like before, naming a local model:
resp = client.chat.completions.create(
    model="llama3.1",            # lives on your VPS
    # model="strategy:local",    # any local model the router picks
    messages=[{"role": "user", "content": "What's in the strawberry?"}],
)

The migration trick
Move models onto your VPS without touching client code — flip a base URL on the gateway side and your apps don't notice. More in BYOK & local models.
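Because the whole migration hinges on one base URL, it can be worth sanity-checking the value before setting OLLAMA_BASE_URL or VLLM_BASE_URL. The following is a rough sketch only: check_base_url is a hypothetical helper, and the HTTPS-only rule is an assumption drawn from the examples above, not a documented gateway requirement.

```python
from urllib.parse import urlsplit


def check_base_url(url):
    """Validate a local-model base URL and return it without a trailing slash."""
    parts = urlsplit(url.strip())
    if parts.scheme != "https":
        # Assumption: mirror the HTTPS examples in this guide.
        raise ValueError(f"expected an https:// URL, got {url!r}")
    if not parts.hostname:
        raise ValueError(f"no host in {url!r}")
    return f"{parts.scheme}://{parts.netloc}{parts.path.rstrip('/')}"
```

Normalizing the trailing slash avoids two spellings of the same endpoint ending up in different environments.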