For Business

Stop guessing your LLM bill

Per-org budgets, real-time spend dashboards, audit-friendly logs. One invoice across nine labs — OpenAI, Anthropic, Google, Groq, DeepSeek, Qwen, Kimi, GLM, MiniMax. Compliance-friendly schema isolation from day one.

Start free Talk to sales

Spend visibility in real time, not after the fact

Every request lands in a per-org spend log within seconds. Set budgets, watch trend lines, get warned before a single team's experiment turns into next quarter's surprise.

Monthly budget caps per org

Hard cap or soft alert. When an org hits a hard cap, requests start returning 429 instead of turning into a sticker-shock invoice.
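For client code, a capped org mostly means treating 429 as a signal to stop rather than something to retry immediately. The sketch below is one way to do that. It assumes the 429 comes from the budget cap (the gateway may also use 429 for ordinary rate limiting, which this page does not specify), and `Sender`, `complete`, and the two fake senders are hypothetical names used only for illustration.

```python
from typing import Callable, Tuple


class BudgetExceeded(Exception):
    """Raised on HTTP 429, assumed here to mean the org budget cap was hit."""


# Hypothetical transport signature: takes a prompt, returns (HTTP status, body text).
Sender = Callable[[str], Tuple[int, str]]


def complete(prompt: str, send: Sender) -> str:
    """Send one request and surface a capped budget as a distinct exception."""
    status, body = send(prompt)
    if status == 429:
        # Do not retry in a tight loop: a hard cap keeps returning 429
        # until the budget is raised or the billing period rolls over.
        raise BudgetExceeded(body)
    if status != 200:
        raise RuntimeError(f"unexpected status {status}: {body}")
    return body


def fake_ok_sender(prompt: str) -> Tuple[int, str]:
    # Stand-in for a real HTTP call that succeeds.
    return 200, f"echo: {prompt}"


def fake_capped_sender(prompt: str) -> Tuple[int, str]:
    # Stand-in for a real HTTP call made after the cap was reached.
    return 429, "monthly budget reached"
```

Callers can catch `BudgetExceeded` separately from other failures, for example to pause a batch job and notify the org owner.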

Per-model breakdown

Which model burned how much, by team, by day. Catch a runaway prompt before it eats your AI budget.

Single bill, every model

No more aggregating five vendor invoices. We bill once monthly through Stripe — pass it to finance unchanged.

Audit-friendly logs

Every prompt + response + cost is captured. SOC 2 evidence collection is one CSV export away.
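Once a CSV export is in hand, the per-model and per-team breakdowns can be reproduced offline. The sketch below groups spend by any combination of columns. The column names (`day`, `team`, `model`, `cost_usd`) and the sample rows are hypothetical, since the real export layout is not described here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export layout; the real column names may differ.
SAMPLE_EXPORT = """day,team,model,cost_usd
2026-05-01,search,deepseek-v4-flash,1.25
2026-05-01,search,gemini-2.5-flash,0.75
2026-05-01,agents,deepseek-v4-flash,2.00
2026-05-02,agents,deepseek-v4-flash,0.50
"""


def spend_by(export_text, *keys):
    """Sum cost_usd grouped by the given column names."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(export_text)):
        group = tuple(row[k] for k in keys)
        totals[group] += float(row["cost_usd"])
    return dict(totals)


by_model = spend_by(SAMPLE_EXPORT, "model")
by_team_day = spend_by(SAMPLE_EXPORT, "team", "day")
```

Grouping by `("team", "day")` is the view that would surface a runaway prompt: one team's daily total jumping well above its usual level.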

Pick a tier — or mix-and-match

Three preset tiers, six slots, one bill

Pick a preset to anchor your routing — or swap individual slots row-by-row. Smoo-hosted slots route to our open-weight stack on Groq (cheaper, our own credentials). External slots hit a frontier lab directly.

Slot	Power: top-of-line for production agent work. Bills add up, but you don't second-guess a result. Lean on Anthropic + DeepSeek V4 + the strongest direct-lab coders. Cost band: highest	Balanced: mid-range mix, capable with predictable bills. Qwen3-Coder-Flash for coding, DeepSeek V4-Flash for reasoning, Gemini for context. Mirrors the live smooth-* defaults; most teams should start here. Cost band: mid	Lean: cheapest sustainable mix, own-provider first. Smoo-hosted open-weight on Groq + Gemini Flash-Lite for everything we can. DeepSeek V4-Flash for the reasoning slot. Cost band: lowest
`smooth-coding`	Kimi K2-Thinking (External, Moonshot) $0.60 / $2.50 per MTok	Qwen3-Coder-Flash (External, DashScope) $0.30 / $1.50 per MTok	MiniMax M2 (External, MiniMax) $0.30 / $1.20 per MTok
`smooth-reasoning`	DeepSeek V4-Pro (External, DeepSeek) $0.43 / $0.87 per MTok	DeepSeek V4-Flash (External, DeepSeek) $0.14 / $0.28 per MTok	DeepSeek V4-Flash (External, DeepSeek) $0.14 / $0.28 per MTok
`smooth-reviewing`	GLM 5.1 (External, Z.ai) $0.60 / $2.20 per MTok	MiniMax M2 (External, MiniMax) $0.30 / $1.20 per MTok	Qwen3-Coder-Flash (External, DashScope) $0.30 / $1.50 per MTok
`smooth-judge`	Claude Haiku 4.5 (External, Anthropic) $1.00 / $5.00 per MTok	Gemini 2.5 Flash (External, Google) $0.30 / $2.50 per MTok	Llama 3.1 8B (Smoo-hosted, Groq) $0.05 / $0.08 per MTok
`smooth-summarize`	Gemini 2.5 Flash (External, Google) $0.30 / $2.50 per MTok	Gemini 2.5 Flash (External, Google) $0.30 / $2.50 per MTok	Gemini 2.5 Flash-Lite (External, Google) $0.10 / $0.40 per MTok
`smooth-fast`	Claude Haiku 4.5 (External, Anthropic) $1.00 / $5.00 per MTok	Gemini 2.5 Flash-Lite (External, Google) $0.10 / $0.40 per MTok	Gemini 2.5 Flash-Lite (External, Google) $0.10 / $0.40 per MTok

Power

Production agents. Don't second-guess a result. Anthropic Haiku judges every action; DeepSeek does the deep thinking; Kimi K2-Thinking writes the code.

Balanced

Most teams should start here. Direct-lab coders for the heavy lifts, Gemini Flash-Lite for fast utility, Gemini Flash for the long context.

Lean

Cheapest sustainable mix. Smoo-hosted Llama 3.1 8B on Groq handles the judge slot and Gemini Flash-Lite carries summarize and fast; coding, reasoning, and reviewing call low-cost direct-lab models.

Mix-and-match: every cell above corresponds to one routing slot. Set a different concrete model per slot in your ~/.smooth/providers.json or via the gateway dashboard. Cost figures are list price per million tokens (input / output) — actual spend depends on your usage profile.
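To see how list prices translate into a monthly figure, the arithmetic is input tokens times the input price plus output tokens times the output price, both in millions of tokens. The sketch below applies that to the `smooth-coding` slot across the three tiers, using the prices from the tier table. The usage profile (200M input and 40M output tokens per month) is a made-up example, not a typical workload.

```python
# List prices per million tokens (input, output) for the smooth-coding slot,
# copied from the tier table.
SMOOTH_CODING = {
    "Power": (0.60, 2.50),     # Kimi K2-Thinking
    "Balanced": (0.30, 1.50),  # Qwen3-Coder-Flash
    "Lean": (0.30, 1.20),      # MiniMax M2
}


def monthly_cost(price, input_mtok, output_mtok):
    """Cost in USD for a month's volume, with volumes given in millions of tokens."""
    input_price, output_price = price
    return input_price * input_mtok + output_price * output_mtok


# Hypothetical usage profile: 200M input tokens and 40M output tokens per month.
estimates = {
    tier: round(monthly_cost(price, 200, 40), 2)
    for tier, price in SMOOTH_CODING.items()
}
# estimates == {"Power": 220.0, "Balanced": 120.0, "Lean": 108.0}
```

An output-heavy workload widens the gap between tiers, because the output prices differ more than the input prices do.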

Full model catalog

Every frontier model, one invoice

Nine direct providers, with no aggregator markup on primary routes. Bring the model your team already knows, or opt into our smooth-* semantic slots for benchmark-picked defaults at 5–25× lower cost than frontier models.

Smooth-optimized slot picks

Our 7 semantic routing aliases, each with a primary and two automatic fallbacks, picked from public 2026 benchmarks.

Last refreshed 2026-05-29

Slot	Used for	Primary	Fallback 1	Fallback 2	Frontier delta
smooth-coding	Coding workhorse: outer loop for every code-touching call	Qwen3-Coder-Flash (DashScope)	GLM 5.1 (Z.ai)	Kimi K2-Thinking (Moonshot)	~5× cheaper than the prior Kimi primary; 16/16 PASS on aider-polyglot multi-role bench
smooth-reasoning	Deep reasoning: plan / think / research flows	DeepSeek V4-Flash (DeepSeek)	Kimi K2-Thinking (Moonshot)	Qwen3-235B-Thinking-2507 (DashScope)	~100× cheaper than Opus on input; 1M context, dual Thinking/Non-Thinking modes
smooth-reviewing	Adversarial critique: code review with a different lab	MiniMax M2 (MiniMax)	Qwen3-Coder-Plus (DashScope)	GLM 5.1 (Z.ai)	~10× cheaper than Opus; different lab from coder, catches blind spots
smooth-judge	Safety + intent verdicts: prompt injection, content moderation	Llama 3.1 8B Instant (Groq)	Gemini 2.5 Flash (Google)	Claude Haiku 4.5 (Anthropic)	~10× cheaper than the prior Gemini Flash primary; sub-300ms first-token, matches accuracy on 1-line JSON verdicts
smooth-summarize	Transcript compression: long agent run context management	Gemini 2.5 Flash (Google)	Qwen3-Coder-Plus (DashScope)	GPT-5 mini (OpenAI)	~10× cheaper than Sonnet @ 1M; 1M context, IFEval leader in cheap tier
smooth-planning	Mapper / structured plan: read-only mapping work	Gemini 2.5 Flash (Google)	DeepSeek V4-Flash (DeepSeek)	Qwen3-Coder-Plus (DashScope)	~50× cheaper than Opus; bench-winning mapper, long context + structured output
smooth-fast	Utility tier: session titles, 3–5 word labels, autocomplete, voice fast-router	Llama 3.1 8B Instant (Groq)	Gemini 2.5 Flash-Lite (Google)	Claude Haiku 4.5 (Anthropic)	~10× cheaper than the prior Gemini Flash-Lite primary; sub-300ms first-token, P95 ~600ms end-to-end on prod
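The gateway handles fallback automatically, but the idea behind "primary plus two fallbacks" can be sketched in a few lines: try each model in order and return the first success. This is an illustration of the pattern, not the gateway's actual router. The `route` and `flaky_call` names are hypothetical, and mapping the slot to the `-direct` model IDs from the catalog is an assumption.

```python
from typing import Callable, Sequence, Tuple

# Chain for smooth-coding as listed in the slot table: primary, then two fallbacks.
# Using the "-direct" catalog IDs here is an assumption.
SMOOTH_CODING_CHAIN = [
    "qwen3-coder-flash-direct",
    "glm-5.1-direct",
    "kimi-k2-thinking-direct",
]


def route(chain: Sequence[str], call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model_used, result) from the first success."""
    errors = []
    for model in chain:
        try:
            return model, call(model)
        except Exception as exc:  # a real router would fall back only on retryable errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky_call(model: str) -> str:
    # Stand-in provider call that pretends the primary is down.
    if model == "qwen3-coder-flash-direct":
        raise TimeoutError("primary unavailable")
    return f"answer from {model}"


used, answer = route(SMOOTH_CODING_CHAIN, flaky_call)
```

Returning the model that actually answered matters for spend tracking, since a fallback can carry a different per-token price than the primary.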

Full catalog by provider

Every model published by each of our nine direct providers, routed through llm.smoo.ai. Prices are per million tokens in USD.

OpenAI

Workhorse utility tier + frontier GPT-5 when you need it

Pricing source ↗

Model	ID	Input / MTok	Output / MTok	Context	Strength
GPT-5.5 Pro	gpt-5.5-pro	$30.00	$180.00	400K	Frontier reasoning tier, GA 2026-04-24
GPT-5.5	gpt-5.5	$5.00	$30.00	400K	Current OpenAI flagship, GA 2026-04-24
GPT-5.4	gpt-5.4	$2.50	$15.00	400K	Mid-frontier 5.4, between gpt-5 and gpt-5.5
GPT-5.4 mini	gpt-5.4-mini	$0.75	$4.50	400K	Cheaper smart-tier 5.4 sibling
GPT-5.4 nano	gpt-5.4-nano	$0.20	$1.25	400K	Ultra-cheap 5.4 nano
GPT-5.4 Pro	gpt-5.4-pro	$30.00	$180.00	400K	GPT-5.4 Pro reasoning tier
GPT-5.2	gpt-5.2	$1.75	$14.00	400K	GPT-5.2, mid-tier between 5.1 and 5.4
GPT-5.2 Codex	gpt-5.2-codex	$1.75	$14.00	400K	GPT-5.2 with Codex-coding training
GPT-5.2 Pro	gpt-5.2-pro	$21.00	$168.00	400K	GPT-5.2 Pro reasoning tier
GPT-5.1	gpt-5.1	$1.25	$10.00	400K	GPT-5.1, refined GPT-5 family entry
GPT-5.1 Codex	gpt-5.1-codex	$1.25	$10.00	400K	GPT-5.1 with Codex-coding training
GPT-5	gpt-5	$2.50	$10.00	400K	Frontier reasoning + tool use
GPT-5 Pro	gpt-5-pro	$15.00	$120.00	400K	GPT-5 Pro reasoning tier
GPT-5 Codex	gpt-5-codex	$1.25	$10.00	400K	GPT-5 with Codex-coding training
GPT-5 mini (Smooth fallback)	gpt-5-mini	$0.50	$2.00	400K	Cheap moderation + multi-slot fallback
GPT-5 nano	gpt-5-nano	$0.20	$1.25	400K	Cheapest GPT-5 variant
GPT-4.1	gpt-4.1	$2.00	$8.00	1M	1M-context instruction following
GPT-4.1 mini	gpt-4.1-mini	$0.40	$1.60	1M	Cheap long-context extraction
GPT-4.1 nano	gpt-4.1-nano	$0.10	$0.40	1M	Cost-floor 1M-context classification
GPT-4o	gpt-4o	$2.50	$10.00	128K	Multimodal (vision + audio) legacy
GPT-4o mini	gpt-4o-mini	$0.15	$0.60	128K	Multimodal at near-embedding cost
o4-mini	o4-mini	$1.10	$4.40	200K	Cheap reasoning-trace-visible o-series
o3	o3	$2.00	$8.00	200K	o-series reasoning, visible chain-of-thought
o3-mini	o3-mini	$1.10	$4.40	200K	Cheaper o3 sibling
o3-pro	o3-pro	$20.00	$80.00	200K	o3 Pro, extended reasoning tier

Anthropic

Safety-tuned reasoning — strict refusal lineage + prompt caching

Pricing source ↗

Model	ID	Input / MTok	Output / MTok	Context	Strength
Claude Opus 4.8	claude-opus-4-8	$5.00	$25.00	1M	Current Anthropic flagship: 1M ctx, ~4× less likely to slip flaws than 4.7 (GA 2026-05-28)
Claude Opus 4.7	claude-opus-4-7	$5.00	$25.00	200K	Prior flagship: SWE-bench Pro 64.3%, GA 2026-04-16
Claude Opus 4.6	claude-opus-4-6	$15.00	$75.00	200K	Prior frontier, kept for pinned prompts
Claude Sonnet 4.6	claude-sonnet-4-6	$3.00	$15.00	1M	1M-context balanced workhorse
Claude Sonnet 4.5	claude-sonnet-4-5	$3.00	$15.00	200K	Stable predecessor Sonnet
Claude Haiku 4.5 (Smooth fallback)	claude-haiku-4-5	$1.00	$5.00	200K	597ms TTFT, safety-tuned utility tier

Google

1M-context recall + dialable thinking budgets

Pricing source ↗

Model	ID	Input / MTok	Output / MTok	Context	Strength
Gemini 3.5 Flash	gemini-3.5-flash	$1.50	$9.00	1M	Current Google flagship Flash: GA 2026-05-19, 76.2% Terminal-Bench 2.1
Gemini 2.5 Pro	gemini-2.5-pro	$1.25	$10.00	1M	Frontier reasoning + 1M multimodal
Gemini 2.5 Flash (Smooth default)	gemini-2.5-flash	$0.30	$2.50	1M	IFEval leader in cheap Gemini tier; smooth-summarize + smooth-planning primary
Gemini 2.5 Flash-Lite (Smooth fallback)	gemini-2.5-flash-lite	$0.10	$0.40	1M	Cost-floor utility tier; smooth-fast fallback (Google)
Gemini 2.0 Flash	gemini-2.0-flash	$0.10	$0.40	1M	Legacy low-latency tier
Gemini 2.0 Flash-Lite	gemini-2.0-flash-lite	n/a	n/a	1M	Ultra-cheap 2.x utility-tier sibling
Gemini 3 Flash (preview)	gemini-3-flash-preview	n/a	n/a	1M	Next-gen Flash: 3/3 PASS on CS escalation E2E
Gemini 3.1 Flash-Lite	gemini-3.1-flash-lite	n/a	n/a	1M	GA Flash-Lite: 2.1s TTFT latency champion, voice-pipeline candidate (promoted from preview 2026-05-29)
Gemini 3.1 Flash-Lite (preview)	gemini-3.1-flash-lite-preview	n/a	n/a	1M	Preview alias retained for pinned callers; use stable above for new code
Gemini 3 Pro (preview)	gemini-3-pro-preview	n/a	n/a	1M	Next-gen Pro, preview pricing TBD
Gemini 3.1 Pro (preview)	gemini-3.1-pro-preview	n/a	n/a	1M	Next-gen Pro refresh, preview pricing TBD
gemini-flash-latest (alias)	gemini-flash-latest	n/a	n/a	1M	Always-current stable Flash
gemini-flash-lite-latest (alias)	gemini-flash-lite-latest	n/a	n/a	1M	Always-current stable Flash-Lite
gemini-pro-latest (alias)	gemini-pro-latest	n/a	n/a	1M	Always-current stable Pro
Gemini Embedding 002	gemini-embedding-002	n/a	n/a	8K	First natively multimodal embedding: text + images + video + audio, Matryoshka 3072/1536/768 dims (GA 2026-04-23)
Gemini Embedding 001	gemini-embedding-001	$0.15	n/a	8K	3072-dim text embeddings, kept for index compatibility

Groq

Open-weight models on sub-second LPU inference

Pricing source ↗

Model	ID	Input / MTok	Output / MTok	Context	Strength
Llama 3.3 70B Versatile	groq-llama-3.3-70b	$0.59	$0.79	128K	Sub-second 70B for voice + balanced tasks
Llama 3.1 8B Instant (Smooth default)	groq-llama-3.1-8b	$0.05	$0.08	128K	Sub-300ms first-token; smooth-fast + smooth-judge primary, voice fast-router
Llama 4 Scout 17B	groq-llama-4-scout	$0.11	$0.34	10M	10M-context cheap retrieval
Llama 4 Maverick 17B	groq-llama-4-maverick	$0.50	$0.77	1M	Frontier-class open-weight at speed
Kimi K2 (Groq host)	groq-kimi-k2	$1.00	$3.00	256K	Kimi K2 with Groq latency profile
GPT-OSS 120B	groq-gpt-oss-120b	$0.15	$0.60	128K	Open-weight GPT, single-turn only
GPT-OSS 20B	groq-gpt-oss-20b	$0.10	$0.30	128K	Cheap open-weight GPT, single-turn only
GPT-OSS Safeguard 20B	groq-gpt-oss-safeguard-20b	$0.10	$0.30	128K	Safety-tuned open-weight GPT
Groq Compound (agentic)	groq-compound	n/a	n/a	128K	Multi-tool agentic system with web search + code exec built in
Groq Compound Mini (agentic)	groq-compound-mini	n/a	n/a	128K	Single-tool agentic, ~3× lower latency than full Compound

DeepSeek

Frontier reasoning at rock-bottom per-token cost

Pricing source ↗

Model	ID	Input / MTok	Output / MTok	Context	Strength
DeepSeek V4-Flash (Smooth default)	deepseek-v4-flash	$0.14	$0.28	1M	1M context, dual Thinking/Non-Thinking modes
DeepSeek V4-Pro	deepseek-v4-pro	$0.43	$0.87	1M	Pro-tier V4, 75% intro discount through 2026-05-31
deepseek-chat (legacy alias → V4-Flash)	deepseek-chat	$0.14	$0.28	1M	Legacy alias, routes to V4-Flash; retiring 2026-07-24
deepseek-reasoner (legacy alias → V4-Pro)	deepseek-reasoner	$0.43	$0.87	1M	Legacy alias, routes to V4-Pro; retiring 2026-07-24
DeepSeek V3.2 (aggregator)	deepseek-v3.2	$0.27	$1.10	128K	Aggregator-routed V3.2, emergency failover only
DeepSeek R1 (aggregator)	deepseek-r1	$0.55	$2.19	64K	Aggregator-routed R1 reasoner, emergency failover only

Moonshot

Kimi family — purpose-trained for agentic loops

Pricing source ↗

Model	ID	Input / MTok	Output / MTok	Context	Strength
Kimi K2.6	kimi-k2.6-direct	$0.95	$4.00	262K	Current Kimi flagship: ties GPT-5.5 on SWE-Bench Pro, GA 2026-04-20
Kimi K2-Thinking (Smooth fallback)	kimi-k2-thinking-direct	$0.60	$2.50	256K	Deepest reasoner in the Kimi line
Kimi K2.5	kimi-k2.5-direct	$0.60	$2.50	256K	Prior general-purpose Kimi
Kimi K2-Thinking (aggregator)	kimi-k2-thinking	$0.60	$2.50	256K	Aggregator-routed K2-Thinking, emergency failover only
Kimi K2.5 (aggregator)	kimi-k2.5	$0.60	$2.50	256K	Aggregator-routed K2.5, emergency failover only

Alibaba DashScope

Qwen family — 1M context at aggressive pricing

Pricing source ↗

Model	ID	Input / MTok	Output / MTok	Context	Strength
Qwen 3.7 Max	qwen-3.7-max-direct	$2.50	$7.50	1M	Current Qwen flagship: agent-first, 200 tok/s, SWE-Pro/Terminal-Bench tier winner (GA 2026-05-20)
Qwen 3.6 Plus	qwen-3.6-plus-direct	$0.33	$1.95	1M	Generalist Qwen flagship: GA 2026-04-02, 1M context
Qwen3-Coder-Flash (Smooth default)	qwen3-coder-flash-direct	$0.30	$1.50	1M	Bench-winning coder, 16/16 aider-polyglot PASS
Qwen3-Coder-Plus (Smooth fallback)	qwen3-coder-plus-direct	$1.00	$5.00	1M	PR-review tuned for large diffs
Qwen3-235B-Thinking-2507 (Smooth fallback)	qwen3-235b-a22b-thinking-2507	$0.13	$0.60	262K	Cheapest thinking-mode reasoning
Qwen3-Coder-Plus (aggregator)	qwen3-coder-plus	$1.00	$5.00	1M	Aggregator-routed Coder-Plus, emergency failover only
Qwen3-Coder-Flash (aggregator)	qwen3-coder-flash	$0.30	$1.50	1M	Aggregator-routed Coder-Flash, emergency failover only

Z.ai (Zhipu)

GLM family: coder-forward, 58.4% on SWE-bench Pro

Pricing source ↗

Model	ID	Input / MTok	Output / MTok	Context	Strength
GLM 5.1 (Smooth fallback)	glm-5.1-direct	$0.60	$2.20	200K	58.4% SWE-Pro, coder-forward (GA 2026-04-07)
GLM 5	glm-5-direct	$0.60	$1.92	200K	Faster GLM (78 tok/s vs 5.1's 54), GA 2026-02-11
GLM 5.1 (aggregator)	glm-5.1	$0.60	$2.20	128K	Aggregator-routed GLM 5.1, emergency failover only

MiniMax

Cheapest frontier-class coder + reviewer

Pricing source ↗

Model	ID	Input / MTok	Output / MTok	Context	Strength
MiniMax M2.7	minimax-m2.7-direct	$0.30	$1.20	200K	Current MiniMax flagship, same price as M2
MiniMax M2.7 Highspeed	minimax-m2.7-highspeed-direct	$0.60	$2.40	200K	~100 tps throughput-optimized variant
MiniMax M2 (Smooth default)	minimax-m2-direct	$0.30	$1.20	200K	Frontier-class reviewer at $0.30 input
MiniMax M2.7 (aggregator)	minimax-m2.7	$0.30	$1.20	200K	Aggregator-routed M2.7, emergency failover only
MiniMax M2.5 (aggregator)	minimax-m2.5	$0.30	$1.20	200K	Aggregator-routed M2.5, emergency failover only

ElevenLabs

TTS for the voice pipeline

Pricing source ↗

Model	ID	Input / MTok	Output / MTok	Context	Strength
Eleven Multilingual v2	elevenlabs-tts	n/a	n/a	TTS	High-fidelity multilingual TTS

Pricing is indicative — vendors change rates without notice. Routing + fallbacks auto-adjust against each provider's live pricing via LiteLLM. See API reference for the authoritative model list your key can hit right now.

Built for compliance

LLM data lives in an isolated database schema, invisible to the public REST API. RLS policies enforce that org members only see their org's keys + spend. No data leakage paths to design around.

Schema-isolated by design

Spend logs and key metadata live in a separate Postgres schema not exposed to PostgREST. The anon API key cannot reach LLM data — at all.

Org-scoped access

Row-level security: every read of a key or spend log enforces "your user is a member of this org" at the database layer. Defense in depth.

Ready to put a number on your AI spend?

Free to provision a key and watch your dashboard. Bring your own usage — pay only for what you call.

Get started