BYOK & local models

Bring your own provider keys, or route to Ollama and vLLM on your own VPS / GPU node — without changing client code.

Nyuro is a bring-your-own-keys gateway. You supply the provider credentials and the infrastructure; Nyuro provides the routing, governance, and observability on top. Sensitive workloads can stay entirely inside your perimeter.

Bring your own provider keys

Configure provider credentials on the gateway, and Nyuro uses them for outbound calls on your behalf. Keys are stored encrypted at rest:

.env

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

Provider keys live only on the gateway and are never returned to clients. Client applications authenticate with a Nyuro key (neu_live_…), so they never handle upstream provider credentials.
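To make the separation concrete, here is a minimal sketch of the key swap a gateway like this might perform on each request. It is an illustration, not Nyuro's implementation: the `upstream_auth_headers` function and the `PROVIDER_KEY_VARS` mapping are hypothetical, and validating the Nyuro key is reduced to a prefix check. The header names themselves are the real ones: OpenAI takes a bearer token, Anthropic takes `x-api-key`.

```python
import os

# Hypothetical mapping from provider name to the gateway-side env var
# holding its key (mirrors the .env entries above).
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def upstream_auth_headers(client_headers: dict[str, str], provider: str) -> dict[str, str]:
    """Swap the client's Nyuro key for the provider key on the outbound call.

    The client only ever presents a Nyuro key; the provider key is read on the
    gateway and placed on the upstream request, never in the client response.
    """
    auth = client_headers.get("Authorization", "")
    # Illustrative check only: a real gateway would look the key up, not match a prefix.
    if not auth.startswith("Bearer neu_live_"):
        raise PermissionError("client must authenticate with a Nyuro key")

    key = os.environ.get(PROVIDER_KEY_VARS[provider])
    if not key:
        raise RuntimeError(f"no key configured for provider {provider!r}")

    # OpenAI expects a bearer token; Anthropic expects an x-api-key header.
    if provider == "anthropic":
        return {"x-api-key": key}
    return {"Authorization": f"Bearer {key}"}
```

A client that sends a raw provider key instead of a Nyuro key is rejected, and the provider key only ever appears in the headers built for the upstream call.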

Route to your own VPS

Point the gateway at Ollama or vLLM running on your own machines. Those models join the catalog and are reachable by name or via strategy:local — with no change to client code:

.env

# Ollama on your VPS, over HTTPS:
OLLAMA_BASE_URL=https://ollama.yourdomain.com:11434

# vLLM on a GPU node:
VLLM_BASE_URL=https://gpu1.yourdomain.com:8001/v1

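Client code does not change: same client, same Nyuro key, same request shape. The gateway URL below is a placeholder:

Python

from openai import OpenAI

# Placeholder gateway URL and key; substitute your own Nyuro endpoint and neu_live_ key.
client = OpenAI(base_url="https://nyuro.yourdomain.com/v1", api_key="neu_live_...")

client.chat.completions.create(
    model="llama3.1",          # lives on your VPS
    # model="strategy:local",  # any local model the router picks
    messages=[{"role": "user", "content": "Keep this on-prem, please."}],
)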
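One way to picture what happens on the gateway side is a lookup from the requested model name to a base URL read from the .env settings above. The sketch below assumes a hypothetical `route` function and `LOCAL_MODELS` catalog (including the placeholder name `my-vllm-model`), and its `strategy:local` policy is a stand-in that simply picks the first configured local backend; Nyuro's actual router may choose differently.

```python
import os
from typing import Optional

# Hypothetical catalog: model name -> gateway env var holding its base URL.
LOCAL_MODELS = {
    "llama3.1": "OLLAMA_BASE_URL",     # served by Ollama on your VPS
    "my-vllm-model": "VLLM_BASE_URL",  # hypothetical name for a vLLM model
}


def route(model: str) -> tuple[str, Optional[str]]:
    """Map a request's `model` field to (concrete model, local base URL).

    A base URL of None means the model is not local and should go to a
    cloud provider instead.
    """
    if model == "strategy:local":
        # Placeholder policy: the first local model whose backend is configured.
        for name, env_var in LOCAL_MODELS.items():
            base_url = os.environ.get(env_var)
            if base_url:
                return name, base_url
        raise LookupError("strategy:local requested but no local backend is configured")

    env_var = LOCAL_MODELS.get(model)
    if env_var is None:
        return model, None  # not in the local catalog: treat as a cloud model

    base_url = os.environ.get(env_var)
    if not base_url:
        raise LookupError(f"{model!r} is local but {env_var} is not set")
    return model, base_url
```

Because this sketch reads the base URL at request time, pointing `OLLAMA_BASE_URL` at a different host reroutes `llama3.1` without any change on the client.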

Why this matters

Data residency

Route privacy-sensitive prompts to strategy:local so they never leave your network, while everything else uses the best cloud model.
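On the client, this can be as small as choosing the `model` field per request. A minimal sketch, assuming your application already knows which prompts are sensitive (here the caller passes a flag); `chat_request` is a hypothetical helper and `gpt-4o` merely stands in for whichever cloud model you normally use.

```python
# Hypothetical cloud default; substitute the cloud model you normally use.
CLOUD_MODEL = "gpt-4o"


def chat_request(prompt: str, sensitive: bool) -> dict:
    """Build keyword arguments for client.chat.completions.create().

    Sensitive prompts name strategy:local, so the gateway serves them from a
    self-hosted model; everything else names the cloud model.
    """
    return {
        "model": "strategy:local" if sensitive else CLOUD_MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
```

Usage: `client.chat.completions.create(**chat_request(prompt, sensitive=True))`. The same client and key serve both paths; only the model name differs.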

Predictable cost

Self-hosted models have no per-token meter. Mix them with cloud models and let budgets cap spend on the metered cloud calls.
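The arithmetic is simple: a self-hosted model adds zero to the per-token total, so only cloud calls draw down a budget. The sketch below is illustrative rather than Nyuro's budgeting feature; the `Budget` class, the `cloud-model` name, and its prices are all hypothetical.

```python
# Hypothetical per-million-token prices in USD as (input, output).
# Self-hosted models carry no per-token charge.
PRICES = {
    "cloud-model": (3.00, 15.00),  # illustrative numbers only
    "llama3.1": (0.00, 0.00),      # runs on your VPS
}


class Budget:
    """Minimal spend cap: accept a request only if it fits the remaining budget."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def try_spend(self, model: str, input_tokens: int, output_tokens: int) -> bool:
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        if self.spent_usd + cost > self.limit_usd:
            return False  # would exceed the cap: reject
        self.spent_usd += cost
        return True
```

With a $1.00 cap, a cloud-model call of 100,000 input and 20,000 output tokens costs (100,000 × 3 + 20,000 × 15) / 1,000,000 = $0.60, so a second identical call is refused. Calls to `llama3.1` cost $0.00 and always fit.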

No client churn

Move a model between cloud and your VPS by flipping a base URL on the gateway. Your applications never notice.

One integration

Local and cloud models share the same OpenAI-compatible API, the same Nyuro keys, and the same observability.