Quickstart

Go from zero to a routed, streamed response, and point the gateway at your own VPS, in about five minutes.

This is a learn-by-doing guide. Every step has copy-paste code and a one-line explanation of what just happened.

Get a key

Sign up, finish the short onboarding, then go to Settings → API Keys and click Create key. A string starting with neu_live_ appears once. Copy it immediately, because it is never shown again.

Treat the key like a password. Backend-only, never in client-side code, never committed to git. Use environment variables.
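One way to follow that advice is to load the key at startup and fail early if it is missing. This is a minimal sketch, not an official helper: the variable name NYURO_API_KEY is borrowed from the TypeScript example on this page, and load_api_key and redact are hypothetical names.

```python
import os

KEY_PREFIX = "neu_live_"  # prefix described in "Get a key"


def load_api_key(env_var: str = "NYURO_API_KEY") -> str:
    """Read the gateway key from the environment and fail loudly if it looks wrong."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the app")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(f"{env_var} does not start with {KEY_PREFIX!r}")
    return key


def redact(key: str) -> str:
    """Return a log-safe form that keeps only the prefix and the last four characters."""
    return f"{KEY_PREFIX}...{key[-4:]}"
```

Pass the result of load_api_key() as api_key when you build the client, and log redact(key) rather than the key itself.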

Your first call

The gateway exposes an OpenAI-compatible endpoint at https://api.nyuro.ai/v1/chat/completions. Any client that speaks OpenAI's shape works — curl, fetch, the OpenAI SDK, anything.

curl https://api.nyuro.ai/v1/chat/completions \
  -H "Authorization: Bearer neu_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Tell me a one-line joke about kubernetes."}
    ]
  }'

# pip install openai
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.nyuro.ai/v1",
    api_key=os.environ["NYURO_API_KEY"],  # keep the key out of source code
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a one-line joke about kubernetes."}],
)
print(resp.choices[0].message.content)

// npm i openai
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.nyuro.ai/v1",
  apiKey: process.env.NYURO_API_KEY!,
});

const resp = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Tell me a one-line joke about kubernetes." }],
});

console.log(resp.choices[0].message.content);

What just happened? Your call hit the gateway, which checked the key, routed the request to gpt-4o-mini (the model you named), called the provider on your behalf, and returned the same shape OpenAI uses. The difference is that one URL can now reach dozens of models.
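To make the request and response shape concrete, here is a sketch that builds the same call with only the Python standard library and reads the answer out of an OpenAI-shaped body. The function names build_chat_request and first_message are illustrative, and the response fields assumed here are the standard OpenAI chat completion fields (choices, message, content).

```python
import json
import urllib.request

GATEWAY_URL = "https://api.nyuro.ai/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the gateway."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_message(response_json: dict) -> str:
    """Pull the assistant text out of an OpenAI-shaped chat completion body."""
    return response_json["choices"][0]["message"]["content"]
```

To send it, pass the request to urllib.request.urlopen, decode the body with json.load, and hand the result to first_message.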

Let routing pick

Naming a model is fine when you know what you want. When you don't, pass auto as the model, or use a strategy to say what matters most: cost, quality, latency, or keeping the request local.

resp = client.chat.completions.create(
    model="strategy:cost",            # cheapest viable
    # model="strategy:quality",       # strongest reasoner
    # model="strategy:latency",       # fastest first token
    # model="strategy:local",         # only your Ollama / vLLM
    messages=[{"role": "user", "content": "Explain TLS like I'm five."}],
)

await client.chat.completions.create({
  model: "strategy:cost",
  messages: [{ role: "user", content: "Explain TLS like I'm five." }],
});

curl https://api.nyuro.ai/v1/chat/completions \
  -H "Authorization: Bearer neu_live_…" \
  -H "Content-Type: application/json" \
  -d '{"model": "strategy:cost", "messages": [{"role": "user", "content": "Explain TLS like I am five."}]}'

Mental model

strategy:cost means "cheapest that still works". strategy:quality means "this answer matters, spare no expense". strategy:latency means "fastest first token". strategy:local means "do not phone home". Same code, very different decisions. See Routing.
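In application code, the choice of strategy often comes down to a few facts about the request. The sketch below shows one possible policy; the choose_model function, its flags, and the precedence order are assumptions for illustration, and only the four strategy strings come from this page.

```python
def choose_model(
    *,
    sensitive: bool = False,
    high_stakes: bool = False,
    interactive: bool = False,
) -> str:
    """Map facts about a request to one of the gateway's strategy strings.

    Precedence is a design choice in this sketch: privacy first, then
    answer quality, then responsiveness, and cost as the default.
    """
    if sensitive:
        return "strategy:local"    # keep the data on your own models
    if high_stakes:
        return "strategy:quality"  # strongest reasoner
    if interactive:
        return "strategy:latency"  # fastest first token
    return "strategy:cost"         # cheapest viable
```

The call site stays the same for every request: pass model=choose_model(interactive=True), for example, to client.chat.completions.create.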

Stream it

Add one parameter and tokens flow back as they are generated.

stream = client.chat.completions.create(
    model="claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Write a haiku about server logs."}],
    stream=True,                     # the only change
)

for chunk in stream:
    if not chunk.choices:            # some chunks may carry no choices
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

const stream = await client.chat.completions.create({
  model: "claude-3-5-sonnet",
  messages: [{ role: "user", content: "Write a haiku about server logs." }],
  stream: true,                     // the only change
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;  // some chunks may carry no choices
  if (delta) process.stdout.write(delta);
}

We forward Server-Sent Events from the underlying provider with no shape change, so any SDK that expects OpenAI-style streaming just works.
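If you are not using an SDK, the stream can be decoded by hand. The sketch below assumes the usual OpenAI streaming conventions: each event is a line starting with "data:", the payload is a JSON chunk, and the stream ends with "data: [DONE]". The name iter_deltas is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style Server-Sent Event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # chunk without choices, nothing to print
        delta = (choices[0].get("delta") or {}).get("content")
        if delta:
            yield delta
```

Feed it the decoded lines of a streaming HTTP response and print each fragment as it arrives.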

Tell us your industry

Tell the router the industry and it picks a model tuned for that kind of question — pass it as metadata, or pin directly with industry:<name>.

resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this MSA in 3 bullets."}],
    extra_body={"metadata": {"industry": "legal"}},
)
# or pin directly:  model="industry:legal"

await client.chat.completions.create({
  model: "industry:legal",
  messages: [{ role: "user", content: "Summarize this MSA in 3 bullets." }],
});

Supported tags include legal, healthcare, finance, code, creative, support, sales, data, education, and general. See the full list in Models.
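Both ways of passing the industry can be wrapped in one small helper that also catches typos before the request leaves your app. This is a sketch: industry_kwargs is a hypothetical name, and KNOWN_INDUSTRIES only contains the tags listed on this page, which may not be the complete set.

```python
KNOWN_INDUSTRIES = {
    "legal", "healthcare", "finance", "code", "creative",
    "support", "sales", "data", "education", "general",
}


def industry_kwargs(industry: str, *, pin: bool = False) -> dict:
    """Build keyword arguments for chat.completions.create for a given industry.

    pin=False sends the tag as metadata and lets "auto" route;
    pin=True selects the industry model directly.
    """
    tag = industry.strip().lower()
    if tag not in KNOWN_INDUSTRIES:
        raise ValueError(f"unknown industry tag: {industry!r}")
    if pin:
        return {"model": f"industry:{tag}"}
    return {"model": "auto", "extra_body": {"metadata": {"industry": tag}}}
```

With the OpenAI Python SDK, the result can be spread into the call: client.chat.completions.create(messages=messages, **industry_kwargs("legal")).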

Point it at your VPS

Running Ollama or vLLM on your own box gives you privacy and predictable cost — without changing a line of client code. Set where your local models live on the gateway side:

.env

# Ollama on your VPS, over HTTPS:
OLLAMA_BASE_URL=https://ollama.yourdomain.com:11434

# vLLM on a GPU node:
VLLM_BASE_URL=https://gpu1.yourdomain.com:8001/v1
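A typo in either URL is easy to miss, so it can help to sanity-check the values before starting the gateway. The following is a minimal sketch, assuming you want to require HTTPS for both backends as in the example above; check_local_backends is a hypothetical helper, not part of the gateway.

```python
import os
from urllib.parse import urlsplit

LOCAL_BACKENDS = ("OLLAMA_BASE_URL", "VLLM_BASE_URL")


def check_local_backends(env=os.environ) -> dict:
    """Return {name: url} for each configured backend.

    Unset variables are skipped; malformed or non-HTTPS URLs raise ValueError.
    """
    configured = {}
    for name in LOCAL_BACKENDS:
        url = env.get(name, "").strip()
        if not url:
            continue  # backend not configured
        parts = urlsplit(url)
        if parts.scheme != "https" or not parts.hostname:
            raise ValueError(f"{name} must be an https:// URL, got {url!r}")
        configured[name] = url.rstrip("/")  # normalize trailing slash
    return configured
```

Calling check_local_backends() with no arguments reads the current process environment; pass a plain dict to test values before writing them to .env.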

Then call exactly like before, naming a local model:

resp = client.chat.completions.create(
    model="llama3.1",                  # lives on your VPS
    # model="strategy:local",          # any local model the router picks
    messages=[{"role": "user", "content": "What's in the strawberry?"}],
)

The migration trick

Move models onto your VPS without touching client code — flip a base URL on the gateway side and your apps don't notice. More in BYOK & local models.

Where to go next

Unified API reference

Sample code — 6 languages

Routing & strategies

Budgets & governance
