Smoo AI LLM Gateway

80+ frontier models, one endpoint.

Point your OpenAI SDK's base URL at https://llm.smoo.ai/v1. Same shapes, same streaming, same tool calls, plus unified billing, org-scoped keys, and cross-lab fallback chains that catch 429s, 5xxs, and timeouts before they hit your code.

15 frontier · 43 smart · 12 fast

OpenAI · Anthropic · Google · Groq · DeepSeek · Qwen · Kimi · GLM · MiniMax

Drop-in quickstart

Same OpenAI SDK you already use. Different base URL and your Smoo AI virtual key. That's it.

Mint your org's virtual key from the dashboard, or from the terminal with th llm create-key. Rotate it with th llm rotate-key and track spend with th llm usage.

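import OpenAI from 'openai';

// Point the stock OpenAI client at the gateway and authenticate with your virtual key.
const client = new OpenAI({
    apiKey: process.env.SMOOAI_LLM_KEY,
    baseURL: 'https://llm.smoo.ai/v1',
});

// Pass a model ID from the catalog below; the request shape is unchanged.
const resp = await client.chat.completions.create({
    model: 'gemini-2.5-flash',
    messages: [{ role: 'user', content: 'Hello' }],
});
console.log(resp.choices[0].message.content);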

Model catalog

Pricing shown is passthrough cost in USD per million tokens, used as the basis for org-metered overage. See /pricing for plan-included allowances and volume tiers.
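As a rough illustration of how per-million-token rates turn into a per-request figure, the sketch below multiplies token counts by the listed prices. The `estimateCostUsd` helper and the `Price` type are hypothetical names, not part of any Smoo AI SDK. The rates are the gemini-2.5-flash figures from the catalog, and plan allowances and volume tiers are ignored.

```typescript
// Hypothetical helper: passthrough cost of one request, in USD.
// Rates are USD per 1M tokens, as listed in the model catalog.
type Price = { inputPerMillion: number; outputPerMillion: number };

function estimateCostUsd(inputTokens: number, outputTokens: number, price: Price): number {
    const inputCost = (inputTokens / 1_000_000) * price.inputPerMillion;
    const outputCost = (outputTokens / 1_000_000) * price.outputPerMillion;
    return inputCost + outputCost;
}

// gemini-2.5-flash: $0.30 input / $2.50 output per 1M tokens.
const geminiFlash: Price = { inputPerMillion: 0.3, outputPerMillion: 2.5 };

// 10,000 prompt tokens plus 2,000 completion tokens: roughly $0.003 + $0.005.
const cost = estimateCostUsd(10_000, 2_000, geminiFlash);
```

Output tokens usually dominate the bill for chat workloads, so it can be worth comparing models on the output rate as well as the input rate.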

Each model carries capability tags and, where available, independently-published benchmark scores — SWE-bench Verified, GPQA Diamond, and the Artificial Analysis Intelligence Index. Benchmarks are approximate; vendor harnesses vary. Pick by capability and score, not just price.

Model	Family	Tier	Context	Input / 1M	Output / 1M	Best for
claude-opus-4-6	Anthropic	Frontier	200K	$15.00	$75.00	Deepest multi-step reasoning, long-horizon planning, high-fidelity code. Tags: Vision, Multimodal. Benchmarks: SWE-bench 80.8%, GPQA 90.5%.
claude-sonnet-4-6	Anthropic	Smart	200K	$3.00	$15.00	Best tool-use + diff fidelity in our coding tests (BFCL v3, τ²-bench). Tags: Vision, Multimodal. Benchmarks: SWE-bench 79.6%, GPQA 89.3%.
claude-sonnet-4-5	Anthropic	Smart	200K	$3.00	$15.00	Sonnet 4.5, kept available for prompts pinned before 4.6. Tags: Vision, Multimodal.
claude-haiku-4-5	Anthropic	Fast	200K	$1.00	$5.00	Cheap, fast, strong JSON adherence — good for judges + classifiers
claude-opus-4-7	Anthropic	Frontier	200K	$5.00	$25.00	64.3% SWE-bench Pro; step-change agentic coding over 4.6 (GA 2026-04-16). Tags: Vision, Multimodal.
claude-opus-4-8	Anthropic	Frontier	1M	$5.00	$25.00	Current Anthropic flagship: 1M context, ~4× less likely than 4.7 to slip a flaw (GA 2026-05-28). Fast mode available at $10/$25 per 1M for ~2.5× speed. Tags: Vision, Multimodal. Benchmarks: AA Index 61.
gpt-5.5	OpenAI	Frontier	400K	$5.00	$30.00	Current OpenAI flagship (GA 2026-04-24), reduced hallucination on regulated domains. Tags: Vision, Multimodal. Benchmarks: SWE-bench 82.6%.
gpt-5.5-pro	OpenAI	Frontier	400K	$30.00	$180.00	GPT-5.5 Pro reasoning tier: most capable, highest cost. Tags: Vision, Multimodal.
gpt-5.4	OpenAI	Frontier	400K	$2.50	$15.00	Mid-frontier GPT-5.4, between gpt-5 and gpt-5.5 on capability + cost. Tags: Vision, Multimodal.
gpt-5.4-mini	OpenAI	Smart	400K	$0.75	$4.50	Cheaper smart-tier GPT-5.4 sibling
gpt-5.4-nano	OpenAI	Fast	400K	$0.20	$1.25	Ultra-cheap GPT-5.4-nano — high-volume structured output
gpt-5.4-pro	OpenAI	Frontier	400K	$30.00	$180.00	GPT-5.4 Pro reasoning tier. Tags: Vision, Multimodal.
gpt-5.2	OpenAI	Smart	400K	$1.75	$14.00	GPT-5.2, mid-tier between 5.1 and 5.4. Tags: Vision, Multimodal. Benchmarks: SWE-bench 80%, GPQA 92.4%.
gpt-5.2-codex	OpenAI	Smart	400K	$1.75	$14.00	GPT-5.2 with Codex-coding training. Tags: Vision, Multimodal.
gpt-5.2-pro	OpenAI	Frontier	400K	$21.00	$168.00	GPT-5.2 Pro reasoning tier. Tags: Vision, Multimodal.
gpt-5.1	OpenAI	Smart	400K	$1.25	$10.00	GPT-5.1, refined GPT-5 family entry. Tags: Vision, Multimodal.
gpt-5.1-codex	OpenAI	Smart	400K	$1.25	$10.00	GPT-5.1 with Codex-coding training. Tags: Vision, Multimodal.
gpt-5	OpenAI	Frontier	256K	$2.50	$10.00	Prior flagship, kept for pinned prompts. Tags: Vision, Multimodal. Benchmarks: SWE-bench 74.9%.
gpt-5-pro	OpenAI	Frontier	256K	$15.00	$120.00	GPT-5 Pro reasoning tier. Tags: Vision, Multimodal. Benchmarks: GPQA 89.4%.
gpt-5-codex	OpenAI	Smart	256K	$1.25	$10.00	GPT-5 with Codex-coding training. Tags: Vision, Multimodal.
gpt-5-mini	OpenAI	Smart	256K	$0.50	$2.00	Balanced smart-tier option with GPT-5 training
gpt-5-nano	OpenAI	Fast	400K	$0.20	$1.25	Cheapest GPT-5 variant, good for high-volume structured output
gpt-4.1	OpenAI	Smart	1M	$2.00	$8.00	Big context, strong coding + tool use. Tags: Vision, Multimodal.
gpt-4.1-mini	OpenAI	Fast	1M	$0.40	$1.60	Long-context, low-cost workhorse
gpt-4.1-nano	OpenAI	Fast	1M	$0.10	$0.40	Ultra-cheap 1M-context option for ingestion + summaries
gpt-4o	OpenAI	Smart	128K	$2.50	$10.00	Mature multimodal (text + image); good for stable prompts. Tags: Vision, Multimodal.
gpt-4o-mini	OpenAI	Fast	128K	$0.15	$0.60	Battle-tested cheap tier — wide SDK compatibility
o3	OpenAI	Specialty	200K	$2.00	$8.00	o-series reasoning: visible chain-of-thought, strong on math + logic. Tags: Vision, Multimodal.
o3-mini	OpenAI	Specialty	200K	$1.10	$4.40	Cheaper o-series reasoning sibling
o3-pro	OpenAI	Specialty	200K	$20.00	$80.00	o3 Pro: extended reasoning for hardest problems. Tags: Vision, Multimodal.
o4-mini	OpenAI	Specialty	200K	$1.10	$4.40	Reasoning-optimized — strong at math, logic, code synthesis
omni-moderation-latest	OpenAI	Specialty	32K	Free	Free	Free content safety classifier used by built-in guardrails. Free from OpenAI; Smoo passes through at cost. Tags: Moderation, Safety.
gemini-3.5-flash	Google	Smart	1M	$1.50	$9.00	Current Google flagship Flash — GA 2026-05-19, 76.2% Terminal-Bench 2.1
gemini-2.5-pro	Google	Frontier	1M	$1.25	$10.00	Frontier reasoning with 1M context; great for large-doc analysis
gemini-2.5-flash	Google	Smart	1M	$0.30	$2.50	Best tool-use-per-dollar (BFCL v3 leader in its price band). Smoo AI default smart model.
gemini-2.5-flash-lite	Google	Fast	1M	$0.10	$0.40	Very cheap, 1M context, fast first-token
gemini-2.0-flash	Google	Fast	1M	$0.10	$0.40	Stable 2.0 family — kept for pinned prompts
gemini-3-flash-preview	Google	Smart	1M	—	—	Next-gen Flash preview: 3/3 PASS on CS escalation E2E. Preview pricing not yet published.
gemini-3.1-flash-lite	Google	Fast	1M	—	—	GA 3.1 Flash-Lite: 2.1s TTFT latency champion, voice-pipeline-tier. Pricing not yet posted in our catalog; check the dashboard for live rate.
gemini-3.1-flash-lite-preview	Google	Fast	1M	—	—	Preview alias retained for backwards-compat with pinned callers. Use gemini-3.1-flash-lite (GA) for new code.
gemini-3-pro-preview	Google	Frontier	1M	—	—	Next-gen Pro preview. Preview pricing not yet published.
gemini-3.1-pro-preview	Google	Frontier	1M	—	—	Next-gen Pro refresh preview. Preview pricing not yet published. Benchmarks: GPQA 94.3%.
groq-gpt-oss-120b	Groq	Smart	128K	$0.15	$0.60	OpenAI OSS 120B: smart/judge tier on Groq (GA). No parallel tool calls; SMOODEV-446 flagged malformed tool-call JSON. Single-turn validated. For agentic / parallel-tool flows use groq-qwen3.6-27b.
groq-gpt-oss-20b	Groq	Fast	128K	$0.10	$0.30	OpenAI OSS 20B: fast / voice / eval tier on Groq (GA). reasoning_effort is set to low in the gateway (sub-second TTFT for voice); single-turn validated, not for multi-turn tool loops.
groq-qwen3.6-27b	Groq	Smart	128K	$0.80	$3.00	Qwen3.6 27B: tool-calling / agentic tier on Groq; supports parallel tool calls (gpt-oss cannot). PREVIEW status (no SLA, may be discontinued without notice); dedicated tool-calling route, not a default chat tier. Tags: Agentic.
groq-gpt-oss-safeguard-20b	Groq	Specialty	128K	$0.10	$0.30	Safety-tuned open-weight GPT — content moderation tasks
deepseek-v4-flash	DeepSeek	Smart	1M	$0.14	$0.28	1M context, dual Thinking/Non-Thinking modes. Tags: Coding, Reasoning, Agentic. Benchmarks: SWE-bench 79%.
deepseek-v4-pro	DeepSeek	Smart	1M	$0.43	$0.87	Pro-tier V4 reasoner: 75% intro discount through 2026-05-31. List price $1.74/$3.48 per 1M; refresh when intro ends. Tags: Coding, Reasoning, Agentic. Benchmarks: SWE-bench 79.4%, AA Index 52.
deepseek-chat	DeepSeek	Smart	1M	$0.14	$0.28	Legacy alias, routes to deepseek-v4-flash (retiring 2026-07-24). Tags: Coding, Reasoning.
deepseek-reasoner	DeepSeek	Smart	1M	$0.43	$0.87	Legacy alias, routes to deepseek-v4-pro (retiring 2026-07-24). Tags: Reasoning, Thinking.
qwen-3.7-max-direct	Alibaba DashScope	Frontier	1M	$2.50	$7.50	Current Qwen flagship: agent-first, native thinking, 200 tok/s, SWE-Pro + Terminal-Bench tier winner (GA 2026-05-20). 90% cache-hit discount ($0.25/M); accepts both OpenAI ChatCompletions and Anthropic Messages format. Tags: Reasoning, Agentic. Benchmarks: SWE-bench 80.4%.
qwen-3.6-plus-direct	Alibaba DashScope	Smart	1M	$0.33	$1.95	Qwen 3.6 Plus generalist — GA 2026-04-02, 1M context
qwen3-coder-flash-direct	Alibaba DashScope	Smart	1M	$0.30	$1.50	Bench-winning coder: 16/16 aider-polyglot PASS. Tags: Coding, Agentic, Tools.
qwen3-coder-plus-direct	Alibaba DashScope	Smart	1M	$1.00	$5.00	PR-review tuned coder for large diffs. Tags: Coding, Agentic, Tools.
kimi-k2.6-direct	Moonshot	Smart	262K	$0.95	$4.00	Current Kimi flagship: ties GPT-5.5 on SWE-Bench Pro, GA 2026-04-20. Tags: Coding, Reasoning, Agentic. Benchmarks: SWE-bench 80.2%, AA Index 54.
kimi-k2-thinking-direct	Moonshot	Smart	256K	$0.60	$2.50	Deepest reasoner in the Kimi line. Tags: Coding, Reasoning, Thinking, Agentic.
kimi-k2.5-direct	Moonshot	Smart	256K	$0.60	$2.50	Flagship general-purpose Kimi via Moonshot direct. Tags: Coding, Reasoning, Agentic.
glm-5.1-direct	Z.ai	Smart	200K	$0.60	$2.20	58.4% SWE-Pro, coder-forward (GA 2026-04-07). Tags: Coding, Reasoning, Agentic. Benchmarks: AA Index 51.
glm-5-direct	Z.ai	Smart	200K	$0.60	$1.92	Faster GLM (78 tok/s vs 5.1's 54), GA 2026-02-11. Tags: Coding, Reasoning, Agentic.
minimax-m2-direct	MiniMax	Smart	200K	$0.30	$1.20	Frontier-class reviewer at $0.30 input. Tags: Coding, Agentic, Review.
minimax-m2.7-direct	MiniMax	Smart	200K	$0.30	$1.20	Current MiniMax flagship, same price as M2. Tags: Coding, Agentic, Review. Benchmarks: SWE-bench 78%.
minimax-m2.7-highspeed-direct	MiniMax	Smart	200K	$0.60	$2.40	Throughput-optimized M2.7, ~100 tps. Tags: Coding, Agentic, Review.
mimo-v2.5-pro	Xiaomi MiMo	Smart	1M	$0.20	$0.80	Flagship 1M-context reasoning + coding at a fraction of frontier pricing. Tags: Vision, Multimodal.
mimo-v2.5	Xiaomi MiMo	Smart	1M	$0.080	$0.80	Cheaper 1M-context generalist. Tags: Vision, Multimodal.
mimo-v2-pro	Xiaomi MiMo	Smart	1M	$0.20	$0.80	Prior flagship — auto-routes to V2.5 from 2026-06-01
mimo-v2-flash	Xiaomi MiMo	Fast	256K	$0.010	$0.80	Ultra-cheap 256K flash tier ($0.01/M input)
mimo-v2-omni	Xiaomi MiMo	Smart	—	—	—	Multimodal (text + vision + audio) — pricing TBD
deepseek-v3.2	DeepSeek (via aggregator)	Smart	128K	$0.27	$1.10	Aggregator-routed V3.2, emergency failover only. Tags: Coding, Reasoning.
deepseek-r1	DeepSeek (via aggregator)	Frontier	64K	$0.55	$2.19	Aggregator-routed R1 reasoner, emergency failover only. Tags: Reasoning, Thinking.
glm-5.1	Z.ai (via aggregator)	Smart	128K	$0.60	$2.20	Aggregator-routed GLM 5.1, emergency failover only. Tags: Coding, Reasoning, Agentic. Benchmarks: AA Index 51.
minimax-m2.7	MiniMax (via aggregator)	Smart	200K	$0.30	$1.20	Aggregator-routed M2.7, emergency failover only. Tags: Coding, Agentic, Review. Benchmarks: SWE-bench 78%.
minimax-m2.5	MiniMax (via aggregator)	Smart	200K	$0.30	$1.20	Aggregator-routed M2.5, emergency failover only. Tags: Coding, Agentic, Review.
kimi-k2.5	Moonshot (via aggregator)	Smart	256K	$0.60	$2.50	Aggregator-routed K2.5, emergency failover only. Tags: Coding, Reasoning, Agentic.
kimi-k2-thinking	Moonshot (via aggregator)	Smart	256K	$0.60	$2.50	Aggregator-routed K2-Thinking, emergency failover only. Tags: Coding, Reasoning, Thinking, Agentic.
qwen3-coder-plus	Alibaba (via aggregator)	Smart	1M	$1.00	$5.00	Aggregator-routed Coder-Plus, emergency failover only. Tags: Coding, Agentic, Tools.
qwen3-coder-flash	Alibaba (via aggregator)	Smart	1M	$0.30	$1.50	Aggregator-routed Coder-Flash, emergency failover only. Tags: Coding, Agentic, Tools.
text-embedding-3-small	OpenAI	Embedding	8K	$0.020	—	1536-dim embeddings, Smoo AI default for knowledge base ingestion. Tags: Embeddings.
text-embedding-3-large	OpenAI	Embedding	8K	$0.13	—	3072-dim embeddings, higher retrieval quality for specialist corpora. Tags: Embeddings.
gemini-embedding-001	Google	Embedding	8K	$0.15	—	3072-dim Gemini embeddings, strong on multilingual + code. Tags: Embeddings.
gemini-embedding-002	Google	Embedding	8K	—	—	First natively multimodal embedding: text + images + video + audio in one space, Matryoshka 3072→1536→768 dims (GA 2026-04-23). Strict upgrade over -001 for new RAG surfaces; keep -001 for index compatibility. Tags: Embeddings.
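
One way to act on the "pick by capability and score, not just price" advice is to filter on a capability tag and a minimum benchmark score first, and only then sort by price. The sketch below does that over four rows copied from the catalog above. The `CatalogEntry` type and the `shortlist` function are hypothetical; the gateway is not known to expose the catalog in this shape.

```typescript
// Hypothetical in-memory copy of a few catalog rows.
type CatalogEntry = {
    id: string;
    inputPerMillion: number; // USD per 1M input tokens
    tags: string[];
    sweBench?: number; // SWE-bench Verified score in percent, when published
};

const catalog: CatalogEntry[] = [
    { id: 'claude-sonnet-4-6', inputPerMillion: 3.0, tags: ['Vision', 'Multimodal'], sweBench: 79.6 },
    { id: 'deepseek-v4-flash', inputPerMillion: 0.14, tags: ['Coding', 'Reasoning', 'Agentic'], sweBench: 79 },
    { id: 'kimi-k2.6-direct', inputPerMillion: 0.95, tags: ['Coding', 'Reasoning', 'Agentic'], sweBench: 80.2 },
    { id: 'minimax-m2.7-direct', inputPerMillion: 0.3, tags: ['Coding', 'Agentic', 'Review'], sweBench: 78 },
];

// Keep entries that carry the tag and meet the score floor, cheapest input rate first.
// Entries without a published score are treated as not meeting any positive floor.
function shortlist(entries: CatalogEntry[], tag: string, minSweBench: number): CatalogEntry[] {
    return entries
        .filter((e) => e.tags.includes(tag) && (e.sweBench ?? 0) >= minSweBench)
        .sort((a, b) => a.inputPerMillion - b.inputPerMillion);
}

// Agentic models scoring at least 79% on SWE-bench, ordered by input price.
const agentic = shortlist(catalog, 'Agentic', 79).map((e) => e.id);
```

Since the scores are approximate and vendor harnesses vary, a floor like this is better treated as a coarse filter than as a ranking.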

Prices refresh as upstream labs publish changes — your dashboard shows live rates and the effective rate after your plan's tier allowance. Overage is billed per your subscription tier.

Why route through Smoo AI

Unified billing

One invoice across every lab. Tier-based token allowances, per-org metering, Stripe-synced overage.

Org-scoped virtual keys

Each organization gets its own key with optional model allowlist and budget cap. Rotate from the dashboard — no downtime.
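
To make the allowlist and budget cap concrete, here is a minimal sketch of the kind of check such a key implies. It is an illustration of the concept only: the `KeyPolicy` fields, the `checkRequest` function, and the rejection messages are assumptions, not the gateway's actual schema or behavior.

```typescript
// Hypothetical policy shape: field names are illustrative, not the gateway's schema.
type KeyPolicy = {
    allowedModels?: string[]; // omitted: every model is allowed
    budgetCapUsd?: number; // omitted: no cap
};

type Decision = { allowed: true } | { allowed: false; reason: string };

// Decide whether a request for `model` may proceed, given spend so far.
function checkRequest(policy: KeyPolicy, model: string, spentUsd: number): Decision {
    if (policy.allowedModels && !policy.allowedModels.includes(model)) {
        return { allowed: false, reason: `model ${model} is not on the allowlist` };
    }
    if (policy.budgetCapUsd !== undefined && spentUsd >= policy.budgetCapUsd) {
        return { allowed: false, reason: 'budget cap reached' };
    }
    return { allowed: true };
}

// Example: a key limited to two cheap models with a $50 cap.
const policy: KeyPolicy = { allowedModels: ['gemini-2.5-flash', 'gpt-4o-mini'], budgetCapUsd: 50 };
const decision = checkRequest(policy, 'claude-opus-4-6', 12.5);
```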

Cross-lab fallback chains

Every model has a typed fallback chain spanning at least three labs. A Vertex 503, an Anthropic 429, or a DeepSeek timeout degrades to the next provider before surfacing an error.
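
The gateway applies its chains server-side, so client code does not need any of this. Purely to show the idea, the sketch below walks an ordered list of models and moves on only when a failure looks transient (429, 5xx, or a timeout). The `Outcome` type, `isRetryable`, `callWithFallback`, and the example chain are all hypothetical; the real chains are not listed on this page.

```typescript
// Illustrative result type: either a value, or an HTTP status / timeout marker.
type Failure = { ok: false; status: number | 'timeout' };
type Outcome<T> = { ok: true; value: T } | Failure;
type Attempt<T> = (model: string) => Promise<Outcome<T>>;

// Rate limits, server errors, and timeouts are worth retrying on another provider.
function isRetryable(status: number | 'timeout'): boolean {
    return status === 'timeout' || status === 429 || (status >= 500 && status <= 599);
}

// Try each model in order and return the first success.
async function callWithFallback<T>(chain: readonly string[], attempt: Attempt<T>): Promise<Outcome<T>> {
    if (chain.length === 0) throw new Error('fallback chain is empty');
    let last: Outcome<T> | undefined;
    for (const model of chain) {
        last = await attempt(model);
        // Stop on success, or on an error another provider would not fix (e.g. a 400).
        if (last.ok || !isRetryable(last.status)) break;
    }
    return last as Outcome<T>;
}

// An invented three-lab chain, for illustration only.
const exampleChain = ['gemini-2.5-flash', 'claude-haiku-4-5', 'gpt-5-mini'] as const;
```

Returning the last failure rather than throwing keeps the final upstream status visible to the caller when every provider in the chain is down.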

Drop-in compatibility

OpenAI SDK, LangChain, LlamaIndex, Vercel AI SDK — anything that takes a base URL works unchanged.
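
For stacks without an OpenAI-compatible SDK, the same request can be assembled by hand. The sketch below builds a Chat Completions request for `fetch`. It assumes the gateway follows the OpenAI conventions of a `/chat/completions` path under the base URL and `Authorization: Bearer` authentication; `buildChatRequest` is a hypothetical helper name.

```typescript
const BASE_URL = 'https://llm.smoo.ai/v1';

type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Assemble the URL and fetch options for one chat completion call.
function buildChatRequest(apiKey: string, model: string, messages: ChatMessage[]) {
    return {
        url: `${BASE_URL}/chat/completions`,
        init: {
            method: 'POST',
            headers: {
                Authorization: `Bearer ${apiKey}`, // assumed: OpenAI-style bearer auth
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({ model, messages }),
        },
    };
}

// Usage (not executed here):
//   const { url, init } = buildChatRequest(key, 'gemini-2.5-flash', [{ role: 'user', content: 'Hello' }]);
//   const data = await (await fetch(url, init)).json();
```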

Streaming, tool use, JSON mode

Everything the upstream model supports passes through untouched, including provider-specific parameters that the OpenAI request shape does not cover.
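
As a sketch of what consuming a stream can look like, the helper below concatenates the text deltas from streamed chat chunks. `ChunkLike` is a hand-written subset of the OpenAI SDK's chunk type, and `appendDelta` and `collectText` are hypothetical helper names; the commented usage assumes the `client` from the quickstart.

```typescript
// Minimal shape of a streamed chat chunk (a subset of the OpenAI SDK's type).
type ChunkLike = { choices: Array<{ delta: { content?: string | null } }> };

// Append one chunk's text delta; chunks without choices or content add nothing.
function appendDelta(text: string, chunk: ChunkLike): string {
    return text + (chunk.choices[0]?.delta.content ?? '');
}

// Drain an async stream of chunks into the full reply text.
async function collectText(stream: AsyncIterable<ChunkLike>): Promise<string> {
    let text = '';
    for await (const chunk of stream) {
        text = appendDelta(text, chunk);
    }
    return text;
}

// Usage with the OpenAI SDK (not executed here):
//   const stream = await client.chat.completions.create({
//       model: 'gemini-2.5-flash',
//       messages: [{ role: 'user', content: 'Hello' }],
//       stream: true,
//   });
//   const reply = await collectText(stream);
```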

Live OpenAPI spec

Full interactive reference below. The spec tracks upstream provider capabilities in real time.

Interactive API reference
