Smoo AI LLM Gateway

79+ frontier models, one endpoint.

Point your OpenAI SDK's base URL at https://llm.smoo.ai/v1. Same shapes, same streaming, same tool calls, plus unified billing, org-scoped keys, and cross-lab fallback chains that catch 429s, 5xxs, and timeouts before they hit your code.

15 frontier · 42 smart · 12 fast
OpenAI · Anthropic · Google · Groq · DeepSeek · Qwen · Kimi · GLM · MiniMax

Drop-in quickstart

Same OpenAI SDK you already use. Different base URL and your Smoo AI virtual key. That's it.

curl
curl https://llm.smoo.ai/v1/chat/completions \
  -H "Authorization: Bearer $SMOOAI_LLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SMOOAI_LLM_KEY"],
    base_url="https://llm.smoo.ai/v1",
)

resp = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.SMOOAI_LLM_KEY,
  baseURL: 'https://llm.smoo.ai/v1',
});

const resp = await client.chat.completions.create({
  model: 'gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Hello' }],
});
console.log(resp.choices[0].message.content);
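
Streaming uses the same OpenAI shape. With the OpenAI Python SDK, passing `stream=True` to `client.chat.completions.create` yields chunks whose text sits in `choices[0].delta.content`. The sketch below shows what that looks like on the wire, assuming the gateway emits standard OpenAI-style server-sent events. The helper name `collect_stream_text` and the sample lines are illustrative, not captured gateway output.

```python
import json


def collect_stream_text(sse_lines):
    """Join the text deltas from OpenAI-style streaming lines."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)


# Hand-written sample in the OpenAI chunk shape (not real gateway output).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))
```

In practice the SDK does this parsing for you; the helper is only useful when reading the raw HTTP stream.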

Model catalog

Pricing shown is passthrough cost in USD per million tokens, used as the basis for org-metered overage. See /pricing for plan-included allowances and volume tiers.

| Model | Family | Tier | Context | Input / 1M | Output / 1M | Best for | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| claude-opus-4-6 | Anthropic | Frontier | 200K | $15.00 | $75.00 | Deepest multi-step reasoning, long-horizon planning, high-fidelity code | |
| claude-sonnet-4-6 | Anthropic | Smart | 200K | $3.00 | $15.00 | Best tool-use + diff fidelity in our coding tests (BFCL v3, τ²-bench) | |
| claude-sonnet-4-5 | Anthropic | Smart | 200K | $3.00 | $15.00 | Sonnet 4.5, kept available for prompts pinned before 4.6 | |
| claude-haiku-4-5 | Anthropic | Fast | 200K | $1.00 | $5.00 | Cheap, fast, strong JSON adherence; good for judges + classifiers | |
| claude-opus-4-7 | Anthropic | Frontier | 200K | $5.00 | $25.00 | 64.3% SWE-bench Pro; step-change agentic coding over 4.6 (GA 2026-04-16) | |
| claude-opus-4-8 | Anthropic | Frontier | 1M | $5.00 | $25.00 | Current Anthropic flagship: 1M context, ~4× less likely than 4.7 to slip a flaw (GA 2026-05-28) | Fast mode available at $10/$25 per 1M for ~2.5× speed |
| gpt-5.5 | OpenAI | Frontier | 400K | $5.00 | $30.00 | Current OpenAI flagship: GA 2026-04-24, reduced hallucination on regulated domains | |
| gpt-5.5-pro | OpenAI | Frontier | 400K | $30.00 | $180.00 | GPT-5.5 Pro reasoning tier: most capable, highest cost | |
| gpt-5.4 | OpenAI | Frontier | 400K | $2.50 | $15.00 | Mid-frontier GPT-5.4, between gpt-5 and gpt-5.5 on capability + cost | |
| gpt-5.4-mini | OpenAI | Smart | 400K | $0.75 | $4.50 | Cheaper smart-tier GPT-5.4 sibling | |
| gpt-5.4-nano | OpenAI | Fast | 400K | $0.20 | $1.25 | Ultra-cheap GPT-5.4-nano for high-volume structured output | |
| gpt-5.4-pro | OpenAI | Frontier | 400K | $30.00 | $180.00 | GPT-5.4 Pro reasoning tier | |
| gpt-5.2 | OpenAI | Smart | 400K | $1.75 | $14.00 | GPT-5.2, mid-tier between 5.1 and 5.4 | |
| gpt-5.2-codex | OpenAI | Smart | 400K | $1.75 | $14.00 | GPT-5.2 with Codex-coding training | |
| gpt-5.2-pro | OpenAI | Frontier | 400K | $21.00 | $168.00 | GPT-5.2 Pro reasoning tier | |
| gpt-5.1 | OpenAI | Smart | 400K | $1.25 | $10.00 | GPT-5.1, refined GPT-5 family entry | |
| gpt-5.1-codex | OpenAI | Smart | 400K | $1.25 | $10.00 | GPT-5.1 with Codex-coding training | |
| gpt-5 | OpenAI | Frontier | 256K | $2.50 | $10.00 | Prior flagship, kept for pinned prompts | |
| gpt-5-pro | OpenAI | Frontier | 256K | $15.00 | $120.00 | GPT-5 Pro reasoning tier | |
| gpt-5-codex | OpenAI | Smart | 256K | $1.25 | $10.00 | GPT-5 with Codex-coding training | |
| gpt-5-mini | OpenAI | Smart | 256K | $0.50 | $2.00 | Balanced smart-tier option with GPT-5 training | |
| gpt-5-nano | OpenAI | Fast | 400K | $0.20 | $1.25 | Cheapest GPT-5 variant, good for high-volume structured output | |
| gpt-4.1 | OpenAI | Smart | 1M | $2.00 | $8.00 | Big context, strong coding + tool use | |
| gpt-4.1-mini | OpenAI | Fast | 1M | $0.40 | $1.60 | Long-context, low-cost workhorse | |
| gpt-4.1-nano | OpenAI | Fast | 1M | $0.10 | $0.40 | Ultra-cheap 1M-context option for ingestion + summaries | |
| gpt-4o | OpenAI | Smart | 128K | $2.50 | $10.00 | Mature multimodal (text + image); good for stable prompts | |
| gpt-4o-mini | OpenAI | Fast | 128K | $0.15 | $0.60 | Battle-tested cheap tier with wide SDK compatibility | |
| o3 | OpenAI | Specialty | 200K | $2.00 | $8.00 | o-series reasoning: visible chain-of-thought, strong on math + logic | |
| o3-mini | OpenAI | Specialty | 200K | $1.10 | $4.40 | Cheaper o-series reasoning sibling | |
| o3-pro | OpenAI | Specialty | 200K | $20.00 | $80.00 | o3 Pro, extended reasoning for hardest problems | |
| o4-mini | OpenAI | Specialty | 200K | $1.10 | $4.40 | Reasoning-optimized: strong at math, logic, code synthesis | |
| omni-moderation-latest | OpenAI | Specialty | 32K | Free | Free | Free content safety classifier, used by built-in guardrails | Free from OpenAI; Smoo passes through at cost |
| gemini-3.5-flash | Google | Smart | 1M | $1.50 | $9.00 | Current Google flagship Flash: GA 2026-05-19, 76.2% Terminal-Bench 2.1 | |
| gemini-2.5-pro | Google | Frontier | 1M | $1.25 | $10.00 | Frontier reasoning with 1M context; great for large-doc analysis | |
| gemini-2.5-flash | Google | Smart | 1M | $0.30 | $2.50 | Best tool-use-per-dollar (BFCL v3 leader in its price band) | Smoo AI default smart model |
| gemini-2.5-flash-lite | Google | Fast | 1M | $0.10 | $0.40 | Very cheap, 1M context, fast first-token | |
| gemini-2.0-flash | Google | Fast | 1M | $0.10 | $0.40 | Stable 2.0 family, kept for pinned prompts | |
| gemini-3-flash-preview | Google | Smart | 1M | | | Next-gen Flash preview; 3/3 PASS on CS escalation E2E | Preview pricing not yet published |
| gemini-3.1-flash-lite | Google | Fast | 1M | | | GA 3.1 Flash-Lite: 2.1s TTFT latency champion, voice-pipeline-tier | Pricing not yet posted in our catalog; check the dashboard for live rate |
| gemini-3.1-flash-lite-preview | Google | Fast | 1M | | | Preview alias retained for backwards-compat with pinned callers | Use gemini-3.1-flash-lite (GA) for new code |
| gemini-3-pro-preview | Google | Frontier | 1M | | | Next-gen Pro preview | Preview pricing not yet published |
| gemini-3.1-pro-preview | Google | Frontier | 1M | | | Next-gen Pro refresh preview | Preview pricing not yet published |
| groq-llama-3.3-70b | Groq | Smart | 128K | $0.59 | $0.79 | Llama 3.3 70B on Groq: fast, cheap, clean tool loops | |
| groq-llama-3.1-8b | Groq | Fast | 128K | $0.050 | $0.080 | Sub-300ms first token; cheapest path through Groq | Smoo AI default fast model (used for voice pipeline) |
| groq-llama-4-scout | Groq | Smart | 10M | $0.11 | $0.34 | 10M context for ingestion + large document reasoning | |
| groq-llama-4-maverick | Groq | Smart | 1M | $0.20 | $0.60 | Larger Llama 4 variant, stronger reasoning than Scout | |
| groq-kimi-k2 | Groq | Smart | 128K | $1.00 | $3.00 | Kimi K2-Instruct: MoE design, strong agentic task quality | |
| groq-gpt-oss-120b | Groq | Smart | 128K | $0.15 | $0.60 | OpenAI OSS 120B, best for single-turn generation | Not recommended for multi-turn tool loops; known to drop structured output |
| groq-gpt-oss-20b | Groq | Fast | 128K | $0.10 | $0.30 | OpenAI OSS 20B: cheap, fast single-shot generation | Not recommended for multi-turn tool loops |
| groq-gpt-oss-safeguard-20b | Groq | Specialty | 128K | $0.10 | $0.30 | Safety-tuned open-weight GPT for content moderation tasks | |
| deepseek-v4-flash | DeepSeek | Smart | 1M | $0.14 | $0.28 | 1M context, dual Thinking/Non-Thinking modes; smooth-reasoning primary | |
| deepseek-v4-pro | DeepSeek | Smart | 1M | $0.43 | $0.87 | Pro-tier V4 reasoner; 75% intro discount through 2026-05-31 | List price $1.74/$3.48 per 1M; refresh when intro ends |
| deepseek-chat | DeepSeek | Smart | 1M | $0.14 | $0.28 | Legacy alias, routes to deepseek-v4-flash (retiring 2026-07-24) | |
| deepseek-reasoner | DeepSeek | Smart | 1M | $0.43 | $0.87 | Legacy alias, routes to deepseek-v4-pro (retiring 2026-07-24) | |
| qwen-3.7-max-direct | Alibaba DashScope | Frontier | 1M | $2.50 | $7.50 | Current Qwen flagship: agent-first, native thinking, 200 tok/s, SWE-Pro + Terminal-Bench tier winner (GA 2026-05-20) | 90% cache-hit discount ($0.25/M); accepts both OpenAI ChatCompletions and Anthropic Messages format |
| qwen-3.6-plus-direct | Alibaba DashScope | Smart | 1M | $0.33 | $1.95 | Qwen 3.6 Plus generalist: GA 2026-04-02, 1M context | |
| qwen3-coder-flash-direct | Alibaba DashScope | Smart | 1M | $0.30 | $1.50 | Bench-winning coder: smooth-coding primary, 16/16 aider-polyglot PASS | |
| qwen3-coder-plus-direct | Alibaba DashScope | Smart | 1M | $1.00 | $5.00 | PR-review tuned coder for large diffs | |
| kimi-k2.6-direct | Moonshot | Smart | 262K | $0.95 | $4.00 | Current Kimi flagship: ties GPT-5.5 on SWE-Bench Pro, GA 2026-04-20 | |
| kimi-k2-thinking-direct | Moonshot | Smart | 256K | $0.60 | $2.50 | Deepest reasoner in the Kimi line; smooth-reasoning fallback | |
| kimi-k2.5-direct | Moonshot | Smart | 256K | $0.60 | $2.50 | Flagship general-purpose Kimi via Moonshot direct | |
| glm-5.1-direct | Z.ai | Smart | 200K | $0.60 | $2.20 | 58.4% SWE-Pro; coder-forward, smooth-coding fallback (GA 2026-04-07) | |
| glm-5-direct | Z.ai | Smart | 200K | $0.60 | $1.92 | Faster GLM (78 tok/s vs 5.1's 54), GA 2026-02-11 | |
| minimax-m2-direct | MiniMax | Smart | 200K | $0.30 | $1.20 | Frontier-class reviewer at $0.30 input; smooth-reviewing primary | |
| minimax-m2.7-direct | MiniMax | Smart | 200K | $0.30 | $1.20 | Current MiniMax flagship, same price as M2 | |
| minimax-m2.7-highspeed-direct | MiniMax | Smart | 200K | $0.60 | $2.40 | Throughput-optimized M2.7, ~100 tps | |
| deepseek-v3.2 | DeepSeek (via aggregator) | Smart | 128K | $0.27 | $1.10 | Aggregator-routed V3.2, emergency failover only | |
| deepseek-r1 | DeepSeek (via aggregator) | Frontier | 64K | $0.55 | $2.19 | Aggregator-routed R1 reasoner, emergency failover only | |
| glm-5.1 | Z.ai (via aggregator) | Smart | 128K | $0.60 | $2.20 | Aggregator-routed GLM 5.1, emergency failover only | |
| minimax-m2.7 | MiniMax (via aggregator) | Smart | 200K | $0.30 | $1.20 | Aggregator-routed M2.7, emergency failover only | |
| minimax-m2.5 | MiniMax (via aggregator) | Smart | 200K | $0.30 | $1.20 | Aggregator-routed M2.5, emergency failover only | |
| kimi-k2.5 | Moonshot (via aggregator) | Smart | 256K | $0.60 | $2.50 | Aggregator-routed K2.5, emergency failover only | |
| kimi-k2-thinking | Moonshot (via aggregator) | Smart | 256K | $0.60 | $2.50 | Aggregator-routed K2-Thinking, emergency failover only | |
| qwen3-coder-plus | Alibaba (via aggregator) | Smart | 1M | $1.00 | $5.00 | Aggregator-routed Coder-Plus, emergency failover only | |
| qwen3-coder-flash | Alibaba (via aggregator) | Smart | 1M | $0.30 | $1.50 | Aggregator-routed Coder-Flash, emergency failover only | |
| text-embedding-3-small | OpenAI | Embedding | 8K | $0.020 | | 1536-dim embeddings; Smoo AI default for knowledge base ingestion | |
| text-embedding-3-large | OpenAI | Embedding | 8K | $0.13 | | 3072-dim embeddings; higher retrieval quality for specialist corpora | |
| gemini-embedding-001 | Google | Embedding | 8K | $0.15 | | 3072-dim Gemini embeddings; strong on multilingual + code | |
| gemini-embedding-002 | Google | Embedding | 8K | | | First natively multimodal embedding: text + images + video + audio in one space, Matryoshka 3072→1536→768 dims (GA 2026-04-23) | Strict upgrade over -001 for new RAG surfaces; keep -001 for index compatibility |

Prices refresh as upstream labs publish changes — your dashboard shows live rates and the effective rate after your plan's tier allowance. Overage is billed per your subscription tier.
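
As a rough illustration of how per-million-token pricing turns into a per-request figure, the sketch below multiplies token counts by the catalog rates. The three rates are copied from the catalog above; the function name `passthrough_cost` and the token counts are made up, and plan allowances, volume tiers, and cache discounts are deliberately ignored.

```python
# USD per 1M tokens as (input, output), copied from the catalog above.
RATES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "claude-haiku-4-5": (1.00, 5.00),
    "groq-llama-3.1-8b": (0.05, 0.08),
}


def passthrough_cost(model, input_tokens, output_tokens):
    """Return the passthrough cost in USD for one request."""
    input_rate, output_rate = RATES[model]
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return round(cost, 6)


# Example: a 120K-token document summarized into 8K tokens.
for model in RATES:
    print(model, passthrough_cost(model, 120_000, 8_000))
```

The dashboard remains the source of truth for the effective rate; this only shows the arithmetic behind the list prices.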

Smooth semantic aliases

Point the Smooth coding runtime at llm.smoo.ai/v1 and use stable intent-based model names. Aliases re-target to new upstream models as better options ship — your code doesn't change.

| Alias | Resolves to | Purpose |
| --- | --- | --- |
| smooth-coding | qwen3-coder-flash | Coding workhorse: best agentic tool-use, native OpenAI tool_calls, 1M ctx |
| smooth-reasoning | deepseek-v4-flash | Deep reasoning + planning: 1M ctx, dual Thinking/Non-Thinking, $0.14/$0.28 |
| smooth-reviewing | MiniMax-M2 | Adversarial critique, from a different lab than the coder |
| smooth-judge | groq/llama-3.1-8b-instant | Sub-300ms JSON judge for guardrails + Narc verdicts |
| smooth-summarize | gemini-2.5-flash | Long-context summaries + compression (1M ctx) |
| smooth-planning | gemini-2.5-flash | Structured mapper / planning flows |
| smooth-fast | groq/llama-3.1-8b-instant | Sub-300ms utility: session naming, titles, autocomplete, voice fast-router |
| smooth-thinking | deepseek-v4-flash | Deprecated alias for smooth-reasoning (legacy callers) |

Every alias has a fallback chain — a single provider outage degrades to the next-best option rather than failing the request.
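
Using an alias should look the same as using a concrete model name: it goes in the `model` field of the request. The sketch below builds such a request with only the standard library and does not send it. It assumes aliases are accepted on the same `/v1/chat/completions` route as the quickstart; `build_chat_request` and the `sk-example` key are placeholders.

```python
import json
import urllib.request

GATEWAY_URL = "https://llm.smoo.ai/v1/chat/completions"


def build_chat_request(alias, prompt, api_key):
    """Build (but do not send) a chat completion request for an alias."""
    body = {
        "model": alias,  # e.g. "smooth-coding"; resolved on the gateway side
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("smooth-coding", "Refactor this function", "sk-example")
print(req.get_method(), req.full_url)
print(json.loads(req.data)["model"])
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

Because the alias is just a string, switching from `smooth-coding` to `smooth-reviewing` is a one-word change in the caller.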

Smooth Routing — bench-backed primaries, cross-vendor fallback chains

Each smooth-* alias points at the current bench-leader for that purpose and falls back through a hand-curated chain that spans at least three labs. A Vertex 503 or DeepSeek hiccup degrades to the next provider before the caller sees an error. Re-evaluated each time the published-benchmark picture shifts (most recent refresh: May 2026).
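
To make the degradation behavior concrete, here is a client-side simulation of walking a chain. The gateway does this server-side and its real implementation is not shown in this document, so treat this as a sketch of the idea only. The chain is the smooth-judge primary plus its fallbacks as listed later in this section; `call_with_fallback`, `is_retryable`, and `fake_send` are hypothetical names.

```python
# Primary plus fallback chain for smooth-judge, as listed in this section.
JUDGE_CHAIN = [
    "groq/llama-3.1-8b-instant",
    "gemini-2.5-flash",
    "claude-haiku-4-5",
    "gpt-5-mini",
]


def is_retryable(status):
    """429s and 5xxs are the statuses the text says trigger a fallback."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(chain, send):
    """Try each model in order; move on after a retryable failure.

    `send(model)` is a hypothetical callable returning (status, body).
    It may raise TimeoutError, which is treated like a retryable status.
    """
    errors = []
    for model in chain:
        try:
            status, body = send(model)
        except TimeoutError:
            errors.append((model, "timeout"))
            continue
        if is_retryable(status):
            errors.append((model, status))
            continue
        return model, status, body, errors
    raise RuntimeError(f"all providers failed: {errors}")


def fake_send(model):
    """Simulated upstreams: the first rate-limits, the second times out."""
    if model == "groq/llama-3.1-8b-instant":
        return 429, None
    if model == "gemini-2.5-flash":
        raise TimeoutError
    return 200, '{"verdict": "allow"}'


served_by, status, body, errors = call_with_fallback(JUDGE_CHAIN, fake_send)
print(served_by, status, errors)
```

Non-retryable statuses such as a 400 are returned to the caller as-is in this sketch, since retrying a malformed request on another provider would not help.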

smooth-coding
Coding workhorse: code generation, diffs, agentic tool loops.
Primary
qwen3-coder-flash
DashScope (Alibaba)
Why this primary

Best agentic tool-use in the cheap-coder tier — native OpenAI tool_calls with no thinking-mode contract. DeepSeek-V4 leads SWE-bench Verified at 80.6% but its thinking-mode trips LiteLLM's reasoning_content guard on turn 1, so we pin Qwen as the entry-point primary and let DeepSeek-class reasoning land in smooth-reasoning. 1M context, $0.30/$1.50.

Fallback chain (in order)
  1. glm-5.1 (Z.ai)
  2. kimi-k2-thinking (Moonshot)
  3. MiniMax-M2 (MiniMax)
smooth-reasoning
Deep multi-step reasoning + planning.
Primary
deepseek-v4-flash
DeepSeek
Why this primary

1M context, dual Thinking / Non-Thinking modes, $0.14/$0.28 per 1M — the cheapest top-tier reasoner.

Fallback chain (in order)
  1. kimi-k2-thinking (Moonshot)
  2. qwen3-235b-thinking-2507 (Alibaba)
smooth-reviewing
Adversarial critique of generated code (PR review, bug-spotting).
Primary
MiniMax-M2
MiniMax
Why this primary

Cheap ($0.30/$1.20), coding-forward, agentic. Different lab than the coder — catches bugs the coder's training didn't see.

Fallback chain (in order)
  1. qwen3-coder-plus (Alibaba)
  2. glm-5.1 (Z.ai)
  3. kimi-k2-thinking (Moonshot)
smooth-judge
Per-tool JSON verdicts (guardrails, Narc, content gates).
Primary
groq/llama-3.1-8b-instant
Groq
Why this primary

Sub-300ms first-token. Verdicts are 1-line JSON; Gemini Flash was overkill at ~10× the cost. Matches accuracy in the small-verdict shape.

Fallback chain (in order)
  1. gemini-2.5-flash (Google)
  2. claude-haiku-4-5 (Anthropic)
  3. gpt-5-mini (OpenAI)
smooth-summarize
Transcript + long-document compression.
Primary
gemini-2.5-flash
Google
Why this primary

1M context, IFEval leader in the cheap Gemini tier, dialable thinking levels.

Fallback chain (in order)
  1. qwen3-coder-plus (Alibaba, 1M ctx backstop)
  2. gpt-5-mini (OpenAI)
smooth-fast
Latency-critical utility: session titles, autocomplete, voice fast-router.
Primary
groq/llama-3.1-8b-instant
Groq
Why this primary

Sub-300ms first-token; P95 ~600ms end-to-end on production traffic. ~10× cheaper than the prior Gemini Flash Lite primary, ~1s faster on cold starts.

Fallback chain (in order)
  1. gemini-2.5-flash-lite (Google)
  2. claude-haiku-4-5 (Anthropic)
  3. gpt-5-mini (OpenAI)
smooth-planning
Structured-plan mapper for read-only planning flows.
Primary
gemini-2.5-flash
Google
Why this primary

Long context, structured-output friendly, cheap. Avoids Kimi K2-Thinking (overkill + slow for read-only mapping work).

Fallback chain (in order)
  1. smooth-reasoning (DeepSeek)
  2. qwen3-coder-plus (Alibaba, 1M ctx)
  3. claude-haiku-4-5 (Anthropic)

Every primary also has a per-lab variant (e.g. smooth-coding-qwen, smooth-fast-gemini) so callers can pin a specific lab without the cross-vendor chain — useful for compliance, A/B testing, or sticky cache locality.

Why route through Smoo AI

Unified billing

One invoice across every lab. Tier-based token allowances, per-org metering, Stripe-synced overage.

Org-scoped virtual keys

Each organization gets its own key with optional model allowlist and budget cap. Rotate from the dashboard — no downtime.

Cross-lab fallback chains

Every model has a typed fallback chain spanning at least three labs. A Vertex 503, an Anthropic 429, or a DeepSeek timeout degrades to the next provider before surfacing an error.

Drop-in compatibility

OpenAI SDK, LangChain, LlamaIndex, Vercel AI SDK — anything that takes a base URL works unchanged.

Streaming, tool use, JSON mode

Everything the upstream model supports passes through untouched, plus kwargs the OpenAI shape does not cover.
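
For reference, these are the standard OpenAI Chat Completions request shapes for tool use and JSON mode, written as plain dictionaries. The `get_weather` tool and the prompts are invented for illustration, and whether a given upstream model honors `tools` or `response_format` depends on that model.

```python
import json

# A function tool in the OpenAI Chat Completions "tools" shape.
# The tool name and schema are made up for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

tool_request = {
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",
}

# JSON mode: ask for a single JSON object as the reply.
json_mode_request = {
    "model": "gemini-2.5-flash",
    "messages": [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "Give me one fact about Oslo."},
    ],
    "response_format": {"type": "json_object"},
}

print(json.dumps(tool_request, indent=2))
```

With the OpenAI Python SDK these can be passed as `client.chat.completions.create(**tool_request)`. For fields outside the OpenAI shape, the SDK's `extra_body` argument is the usual way to send them.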

Live OpenAPI spec

Full interactive reference below. The spec tracks upstream provider capabilities in real time.

Interactive API reference

Full endpoint + schema reference for everything the gateway serves — chat completions, embeddings, moderations, and key management. Try every endpoint inline.

GET /models

Model List

Use `/model/info` to get detailed model information (for example pricing, mode, etc.). This endpoint exists only for compatibility with OpenAI projects such as aider.

Query Parameters:
- include_metadata: Include additional metadata in the response, with fallback information.
- fallback_type: Type of fallbacks to include ("general", "context_window", "content_policy"). Defaults to "general" when include_metadata=true.
- scope: Optional scope parameter. Currently only accepts "expand". When scope=expand is passed, proxy admins, team admins, and org admins receive all proxy models as if they were a proxy admin.

Requires authentication

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| return_wildcard_routes | string | |
| team_id | string | |
| include_model_access_groups | string | |
| only_model_access_groups | string | |
| include_metadata | string | |
| fallback_type | string | |
| scope | string | |

Responses

200 Successful Response
422 Validation Error

| Property | Type | Description |
| --- | --- | --- |
| detail | array | |

Code Examples

curl -X GET https://llm.smoo.ai/models \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN"
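
The query parameters described above can be assembled with the standard library. This sketch builds the URL only and does not call the gateway. Passing `include_metadata` as the string `true` is an assumption based on the `include_metadata=true` wording in the description, and `models_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

MODELS_URL = "https://llm.smoo.ai/models"
FALLBACK_TYPES = ("general", "context_window", "content_policy")


def models_url(include_metadata=False, fallback_type=None, scope=None):
    """Build the GET /models URL with the documented query parameters."""
    params = {}
    if include_metadata:
        params["include_metadata"] = "true"
    if fallback_type is not None:
        if fallback_type not in FALLBACK_TYPES:
            raise ValueError(f"unknown fallback_type: {fallback_type!r}")
        params["fallback_type"] = fallback_type
    if scope is not None:
        if scope != "expand":
            raise ValueError('scope currently only accepts "expand"')
        params["scope"] = scope
    query = urlencode(params)
    return f"{MODELS_URL}?{query}" if query else MODELS_URL


print(models_url(include_metadata=True, fallback_type="context_window"))
```

The resulting URL can be dropped into the curl command above in place of the bare `/models` path.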