For Business

Stop guessing your LLM bill

Per-org budgets, real-time spend dashboards, audit-friendly logs. One invoice across nine labs — OpenAI, Anthropic, Google, Groq, DeepSeek, Qwen, Kimi, GLM, MiniMax. Compliance-friendly schema isolation from day one.

Spend visibility up front, not after the fact

Every request lands in a per-org spend log within seconds. Set budgets, watch trend lines, get warned before a single team's experiment turns into next quarter's surprise.

Monthly budget caps per org

Hard cap or soft alert. Once an org hits a hard cap, requests return HTTP 429 instead of turning into a sticker-shock invoice.
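As a rough illustration of the hard-cap versus soft-alert distinction, here is a minimal sketch. The `Budget` class, the `check_request` helper, and the warning strings are hypothetical names chosen for this example; they are not the gateway's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Budget:
    """Hypothetical per-org budget; field names are assumptions for this sketch."""
    monthly_limit_usd: float
    hard_cap: bool  # True = block at the limit, False = alert only


def check_request(spend_usd: float, budget: Budget) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, optional warning) given month-to-date spend."""
    if spend_usd < budget.monthly_limit_usd:
        return 200, None
    if budget.hard_cap:
        # Hard cap: reject with 429 once the limit has been reached.
        return 429, "monthly budget exhausted"
    # Soft alert: let the request through, but surface a warning.
    return 200, "monthly budget exceeded (soft alert)"


print(check_request(99.0, Budget(100.0, hard_cap=True)))
print(check_request(100.0, Budget(100.0, hard_cap=True)))
print(check_request(120.0, Budget(100.0, hard_cap=False)))
```

On the client side, a 429 from a capped org is best treated as "stop and escalate" rather than "retry immediately", since retrying will not succeed until the budget is raised or resets.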

Per-model breakdown

Which model burned how much, by team, by day. Catch a runaway prompt before it eats your AI budget.
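To make the breakdown concrete, the sketch below groups spend-log rows by day, team, and model. The row shape (`day`, `team`, `model`, `cost_usd`) and the sample values are assumptions for illustration only; the real export schema may differ.

```python
from collections import defaultdict

# Hypothetical spend-log rows; field names and values are illustrative.
rows = [
    {"day": "2026-05-01", "team": "search", "model": "gemini-2.5-flash", "cost_usd": 0.42},
    {"day": "2026-05-01", "team": "search", "model": "gemini-2.5-flash", "cost_usd": 0.18},
    {"day": "2026-05-01", "team": "agents", "model": "claude-haiku-4-5", "cost_usd": 1.25},
    {"day": "2026-05-02", "team": "agents", "model": "claude-haiku-4-5", "cost_usd": 0.75},
]


def breakdown(log_rows):
    """Sum cost per (day, team, model) key."""
    totals = defaultdict(float)
    for row in log_rows:
        totals[(row["day"], row["team"], row["model"])] += row["cost_usd"]
    return dict(totals)


totals = breakdown(rows)
for key, cost in sorted(totals.items()):
    print(key, round(cost, 2))
```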

Single bill, every model

No more aggregating five vendor invoices. We bill once monthly through Stripe — pass it to finance unchanged.

Audit-friendly logs

Every prompt + response + cost is captured. SOC 2 evidence collection is one CSV export away.

Pick a tier — or mix-and-match

Three preset tiers, six slots, one bill

Pick a preset to anchor your routing — or swap individual slots row-by-row. Smoo-hosted slots route to our open-weight stack on Groq (cheaper, our own credentials). External slots hit a frontier lab directly.

| Tier | Summary | Details | Cost band |
| --- | --- | --- | --- |
| Power | Top-of-line for production agent work | Bills add up, but you don't second-guess a result. Lean on Anthropic + DeepSeek V4 + the strongest direct-lab coders. | highest |
| Balanced | Mid-range mix — capable + predictable bills | Qwen3-Coder-Flash for coding, DeepSeek V4-Flash for reasoning, Gemini for context. Mirrors the live smooth-* defaults — most teams should start here. | mid |
| Lean | Cheapest sustainable mix — own-provider first | Smoo-hosted open-weight on Groq + Gemini Flash-Lite for everything we can. DeepSeek V4-Flash for the reasoning slot. | lowest |

| Slot | Power | Balanced | Lean |
| --- | --- | --- | --- |
| smooth-coding | Kimi K2-Thinking (External, Moonshot): $0.60 / $2.50 per MTok | Qwen3-Coder-Flash (External, DashScope): $0.30 / $1.50 per MTok | MiniMax M2 (External, MiniMax): $0.30 / $1.20 per MTok |
| smooth-reasoning | DeepSeek V4-Pro (External, DeepSeek): $0.43 / $0.87 per MTok | DeepSeek V4-Flash (External, DeepSeek): $0.14 / $0.28 per MTok | DeepSeek V4-Flash (External, DeepSeek): $0.14 / $0.28 per MTok |
| smooth-reviewing | GLM 5.1 (External, Z.ai): $0.60 / $2.20 per MTok | MiniMax M2 (External, MiniMax): $0.30 / $1.20 per MTok | Qwen3-Coder-Flash (External, DashScope): $0.30 / $1.50 per MTok |
| smooth-judge | Claude Haiku 4.5 (External, Anthropic): $1.00 / $5.00 per MTok | Gemini 2.5 Flash (External, Google): $0.30 / $2.50 per MTok | Llama 3.1 8B (Smoo-hosted, Groq): $0.050 / $0.080 per MTok |
| smooth-summarize | Gemini 2.5 Flash (External, Google): $0.30 / $2.50 per MTok | Gemini 2.5 Flash (External, Google): $0.30 / $2.50 per MTok | Gemini 2.5 Flash-Lite (External, Google): $0.10 / $0.40 per MTok |
| smooth-fast | Claude Haiku 4.5 (External, Anthropic): $1.00 / $5.00 per MTok | Gemini 2.5 Flash-Lite (External, Google): $0.10 / $0.40 per MTok | Gemini 2.5 Flash-Lite (External, Google): $0.10 / $0.40 per MTok |

Power

Production agents. Don't second-guess a result. Anthropic Haiku judges every action; DeepSeek does the deep thinking; Kimi K2-Thinking writes the code.

Balanced

Most teams should start here. Direct-lab coders for the heavy lifts, Smoo-hosted Groq for fast utility, Gemini Flash for the long context.

Lean

Cheapest sustainable mix. Smoo-hosted open-weight on Groq plus Gemini Flash-Lite carry the utility slots; only the coding workhorse calls externally.

Mix-and-match: every cell above corresponds to one routing slot. Set a different concrete model per slot in your ~/.smooth/providers.json or via the gateway dashboard. Cost figures are list price per million tokens (input / output) — actual spend depends on your usage profile.
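Because the figures are list prices per million tokens (input / output), a rough monthly estimate is simple arithmetic. The sketch below uses two smooth-coding prices from this page; the token volumes and the `estimate_cost_usd` helper are made-up assumptions for illustration.

```python
def estimate_cost_usd(input_tokens, output_tokens, input_per_mtok, output_per_mtok):
    """List-price cost: tokens are billed per million (MTok), input and output separately."""
    return (input_tokens / 1_000_000) * input_per_mtok + (
        output_tokens / 1_000_000
    ) * output_per_mtok


# (input, output) list prices in USD per MTok, taken from the tier comparison.
PRICES = {
    "Qwen3-Coder-Flash": (0.30, 1.50),
    "Kimi K2-Thinking": (0.60, 2.50),
}

# Hypothetical usage profile: 200M input tokens and 40M output tokens in a month.
estimates = {
    model: estimate_cost_usd(200_000_000, 40_000_000, inp, out)
    for model, (inp, out) in PRICES.items()
}
for model, cost in estimates.items():
    print(f"{model}: ${cost:,.2f}")
```

With that profile, the same workload comes to roughly $120 on Qwen3-Coder-Flash and $220 on Kimi K2-Thinking at list price, which is why the input/output ratio of your traffic matters as much as the headline rate.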

Full model catalog

Every frontier model, one invoice

Nine direct providers, with no aggregator markup on primary routes. Bring the model your team already knows, or opt into our smooth-* semantic slots for benchmark-picked defaults that cost 5–25× less than frontier models.

Smooth-optimized slot picks

Our 7 semantic routing aliases, each with a primary and two automatic fallbacks, picked from public 2026 benchmarks.

Last refreshed 2026-05-29
| Slot | Used for | Primary | Fallback 1 | Fallback 2 | Frontier delta |
| --- | --- | --- | --- | --- | --- |
| smooth-coding | Coding workhorse — outer loop for every code-touching call | Qwen3-Coder-Flash (DashScope) | GLM 5.1 (Z.ai) | Kimi K2-Thinking (Moonshot) | ~5× cheaper than the prior Kimi primary; 16/16 PASS on aider-polyglot multi-role bench |
| smooth-reasoning | Deep reasoning — plan / think / research flows | DeepSeek V4-Flash (DeepSeek) | Kimi K2-Thinking (Moonshot) | Qwen3-235B-Thinking-2507 (DashScope) | ~100× cheaper than Opus on input; 1M context, dual Thinking/Non-Thinking modes |
| smooth-reviewing | Adversarial critique — code review with a different lab | MiniMax M2 (MiniMax) | Qwen3-Coder-Plus (DashScope) | GLM 5.1 (Z.ai) | ~10× cheaper than Opus; different lab from coder — catches blind spots |
| smooth-judge | Safety + intent verdicts — prompt injection, content moderation | Llama 3.1 8B Instant (Groq) | Gemini 2.5 Flash (Google) | Claude Haiku 4.5 (Anthropic) | ~10× cheaper than the prior Gemini Flash primary; sub-300ms first-token — matches accuracy on 1-line JSON verdicts |
| smooth-summarize | Transcript compression — long agent run context management | Gemini 2.5 Flash (Google) | Qwen3-Coder-Plus (DashScope) | GPT-5 mini (OpenAI) | ~10× cheaper than Sonnet @ 1M; 1M context, IFEval leader in cheap tier |
| smooth-planning | Mapper / structured plan — read-only mapping work | Gemini 2.5 Flash (Google) | DeepSeek V4-Flash (DeepSeek) | Qwen3-Coder-Plus (DashScope) | ~50× cheaper than Opus; bench-winning mapper — long context + structured output |
| smooth-fast | Utility tier — session titles, 3–5 word labels, autocomplete, voice fast-router | Llama 3.1 8B Instant (Groq) | Gemini 2.5 Flash-Lite (Google) | Claude Haiku 4.5 (Anthropic) | ~10× cheaper than the prior Gemini Flash Lite primary; sub-300ms first-token, P95 ~600ms end-to-end on prod |
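The primary-plus-two-fallbacks pattern can be pictured as an ordered list that is tried until one call succeeds. The gateway handles this for you; the sketch below only illustrates the idea. The `call_with_fallbacks` helper, the `fake_send` stand-in, and the choice of the `-direct` model IDs for the smooth-coding chain are all assumptions, not the gateway's actual implementation.

```python
from typing import Callable, Sequence, Tuple


class AllModelsFailed(Exception):
    """Raised when the primary and every fallback have failed."""


def call_with_fallbacks(models: Sequence[str], send: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model used, response) from the first success."""
    errors = []
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # a real router would match specific error types
            errors.append((model, exc))
    raise AllModelsFailed(errors)


# Assumed chain for smooth-coding: primary, fallback 1, fallback 2.
SMOOTH_CODING = ["qwen3-coder-flash-direct", "glm-5.1-direct", "kimi-k2-thinking-direct"]


def fake_send(model: str) -> str:
    """Stand-in for a network call; simulates an outage on the primary."""
    if model == "qwen3-coder-flash-direct":
        raise TimeoutError("simulated outage")
    return f"response from {model}"


used, reply = call_with_fallbacks(SMOOTH_CODING, fake_send)
print(used, "->", reply)
```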

Full catalog by provider

Every model each of our 9 direct providers publishes, routed through llm.smoo.ai. Prices are per million tokens in USD.

OpenAI

Workhorse utility tier + frontier GPT-5 when you need it

Pricing source ↗
| Model | Description | ID | Input / MTok | Output / MTok |
| --- | --- | --- | --- | --- |
| GPT-5.5 Pro | Frontier reasoning tier — GA 2026-04-24 | gpt-5.5-pro | $30.00 | $180.00 |
| GPT-5.5 | Current OpenAI flagship — GA 2026-04-24 | gpt-5.5 | $5.00 | $30.00 |
| GPT-5.4 | Mid-frontier 5.4 — between gpt-5 and gpt-5.5 | gpt-5.4 | $2.50 | $15.00 |
| GPT-5.4 mini | Cheaper smart-tier 5.4 sibling | gpt-5.4-mini | $0.75 | $4.50 |
| GPT-5.4 nano | Ultra-cheap 5.4 nano | gpt-5.4-nano | $0.20 | $1.25 |
| GPT-5.4 Pro | GPT-5.4 Pro reasoning tier | gpt-5.4-pro | $30.00 | $180.00 |
| GPT-5.2 | GPT-5.2 — mid-tier between 5.1 and 5.4 | gpt-5.2 | $1.75 | $14.00 |
| GPT-5.2 Codex | GPT-5.2 with Codex-coding training | gpt-5.2-codex | $1.75 | $14.00 |
| GPT-5.2 Pro | GPT-5.2 Pro reasoning tier | gpt-5.2-pro | $21.00 | $168.00 |
| GPT-5.1 | GPT-5.1 — refined GPT-5 family entry | gpt-5.1 | $1.25 | $10.00 |
| GPT-5.1 Codex | GPT-5.1 with Codex-coding training | gpt-5.1-codex | $1.25 | $10.00 |
| GPT-5 | Frontier reasoning + tool use | gpt-5 | $2.50 | $10.00 |
| GPT-5 Pro | GPT-5 Pro reasoning tier | gpt-5-pro | $15.00 | $120.00 |
| GPT-5 Codex | GPT-5 with Codex-coding training | gpt-5-codex | $1.25 | $10.00 |
| GPT-5 mini (Smooth fallback) | Cheap moderation + multi-slot fallback | gpt-5-mini | $0.50 | $2.00 |
| GPT-5 nano | Cheapest GPT-5 variant | gpt-5-nano | $0.20 | $1.25 |
| GPT-4.1 | 1M-context instruction following | gpt-4.1 | $2.00 | $8.00 |
| GPT-4.1 mini | Cheap long-context extraction | gpt-4.1-mini | $0.40 | $1.60 |
| GPT-4.1 nano | Cost-floor 1M-context classification | gpt-4.1-nano | $0.10 | $0.40 |
| GPT-4o | Multimodal (vision + audio) legacy | gpt-4o | $2.50 | $10.00 |
| GPT-4o mini | Multimodal at near-embedding cost | gpt-4o-mini | $0.15 | $0.60 |
| o4-mini | Cheap reasoning-trace-visible o-series | o4-mini | $1.10 | $4.40 |
| o3 | o-series reasoning — visible chain-of-thought | o3 | $2.00 | $8.00 |
| o3-mini | Cheaper o3 sibling | o3-mini | $1.10 | $4.40 |
| o3-pro | o3 Pro — extended reasoning tier | o3-pro | $20.00 | $80.00 |

Anthropic

Safety-tuned reasoning — strict refusal lineage + prompt caching

Pricing source ↗
| Model | Description | ID | Input / MTok | Output / MTok |
| --- | --- | --- | --- | --- |
| Claude Opus 4.8 | Current Anthropic flagship — 1M ctx, ~4× less likely to slip flaws than 4.7 (GA 2026-05-28) | claude-opus-4-8 | $5.00 | $25.00 |
| Claude Opus 4.7 | Prior flagship — SWE-bench Pro 64.3%, GA 2026-04-16 | claude-opus-4-7 | $5.00 | $25.00 |
| Claude Opus 4.6 | Prior frontier — kept for pinned prompts | claude-opus-4-6 | $15.00 | $75.00 |
| Claude Sonnet 4.6 | 1M-context balanced workhorse | claude-sonnet-4-6 | $3.00 | $15.00 |
| Claude Sonnet 4.5 | Stable predecessor Sonnet | claude-sonnet-4-5 | $3.00 | $15.00 |
| Claude Haiku 4.5 (Smooth fallback) | 597ms TTFT — safety-tuned utility tier | claude-haiku-4-5 | $1.00 | $5.00 |

Google

1M-context recall + dialable thinking budgets

Pricing source ↗
| Model | Description | ID | Input / MTok | Output / MTok |
| --- | --- | --- | --- | --- |
| Gemini 3.5 Flash | Current Google flagship Flash — GA 2026-05-19, 76.2% Terminal-Bench 2.1 | gemini-3.5-flash | $1.50 | $9.00 |
| Gemini 2.5 Pro | Frontier reasoning + 1M multimodal | gemini-2.5-pro | $1.25 | $10.00 |
| Gemini 2.5 Flash (Smooth default) | IFEval leader in cheap Gemini tier — smooth-summarize + smooth-planning primary | gemini-2.5-flash | $0.30 | $2.50 |
| Gemini 2.5 Flash-Lite (Smooth fallback) | Cost-floor utility tier — smooth-fast fallback (Google) | gemini-2.5-flash-lite | $0.10 | $0.40 |
| Gemini 2.0 Flash | Legacy low-latency tier | gemini-2.0-flash | $0.10 | $0.40 |
| Gemini 2.0 Flash-Lite | Ultra-cheap 2.x utility-tier sibling | gemini-2.0-flash-lite | | |
| Gemini 3 Flash (preview) | Next-gen Flash — 3/3 PASS on CS escalation E2E | gemini-3-flash-preview | | |
| Gemini 3.1 Flash-Lite | GA Flash-Lite — 2.1s TTFT latency champion, voice-pipeline candidate (promoted from preview 2026-05-29) | gemini-3.1-flash-lite | | |
| Gemini 3.1 Flash-Lite (preview) | Preview alias retained for pinned callers — use stable above for new code | gemini-3.1-flash-lite-preview | | |
| Gemini 3 Pro (preview) | Next-gen Pro — preview pricing TBD | gemini-3-pro-preview | | |
| Gemini 3.1 Pro (preview) | Next-gen Pro refresh — preview pricing TBD | gemini-3.1-pro-preview | | |
| gemini-flash-latest (alias) | Always-current stable Flash | gemini-flash-latest | | |
| gemini-flash-lite-latest (alias) | Always-current stable Flash-Lite | gemini-flash-lite-latest | | |
| gemini-pro-latest (alias) | Always-current stable Pro | gemini-pro-latest | | |
| Gemini Embedding 002 | First natively multimodal embedding — text + images + video + audio, Matryoshka 3072/1536/768 dims (GA 2026-04-23) | gemini-embedding-002 | | |
| Gemini Embedding 001 | 3072-dim text embeddings — kept for index compatibility | gemini-embedding-001 | $0.15 | |

Groq

Open-weight models on sub-second LPU inference

Pricing source ↗
| Model | Description | ID | Input / MTok | Output / MTok |
| --- | --- | --- | --- | --- |
| Llama 3.3 70B Versatile | Sub-second 70B for voice + balanced tasks | groq-llama-3.3-70b | $0.59 | $0.79 |
| Llama 3.1 8B Instant (Smooth default) | Sub-300ms first-token — smooth-fast + smooth-judge primary, voice fast-router | groq-llama-3.1-8b | $0.05 | $0.08 |
| Llama 4 Scout 17B | 10M-context cheap retrieval | groq-llama-4-scout | $0.11 | $0.34 |
| Llama 4 Maverick 17B | Frontier-class open-weight at speed | groq-llama-4-maverick | $0.50 | $0.77 |
| Kimi K2 (Groq host) | Kimi K2 with Groq latency profile | groq-kimi-k2 | $1.00 | $3.00 |
| GPT-OSS 120B | Open-weight GPT — single-turn only | groq-gpt-oss-120b | $0.15 | $0.60 |
| GPT-OSS 20B | Cheap open-weight GPT — single-turn only | groq-gpt-oss-20b | $0.10 | $0.30 |
| GPT-OSS Safeguard 20B | Safety-tuned open-weight GPT | groq-gpt-oss-safeguard-20b | $0.10 | $0.30 |
| Groq Compound (agentic) | Multi-tool agentic system — web search + code exec built in | groq-compound | | |
| Groq Compound Mini (agentic) | Single-tool agentic — ~3× lower latency than full Compound | groq-compound-mini | | |

DeepSeek

Frontier reasoning at rock-bottom per-token cost

Pricing source ↗
| Model | Description | ID | Input / MTok | Output / MTok |
| --- | --- | --- | --- | --- |
| DeepSeek V4-Flash (Smooth default) | 1M context, dual Thinking/Non-Thinking modes | deepseek-v4-flash | $0.14 | $0.28 |
| DeepSeek V4-Pro | Pro-tier V4 — 75% intro discount through 2026-05-31 | deepseek-v4-pro | $0.43 | $0.87 |
| deepseek-chat (legacy alias → V4-Flash) | Legacy alias — routes to V4-Flash; retiring 2026-07-24 | deepseek-chat | $0.14 | $0.28 |
| deepseek-reasoner (legacy alias → V4-Pro) | Legacy alias — routes to V4-Pro; retiring 2026-07-24 | deepseek-reasoner | $0.43 | $0.87 |
| DeepSeek V3.2 (aggregator) | Aggregator-routed V3.2 — emergency failover only | deepseek-v3.2 | $0.27 | $1.10 |
| DeepSeek R1 (aggregator) | Aggregator-routed R1 reasoner — emergency failover only | deepseek-r1 | $0.55 | $2.19 |

Moonshot

Kimi family — purpose-trained for agentic loops

Pricing source ↗
| Model | Description | ID | Input / MTok | Output / MTok |
| --- | --- | --- | --- | --- |
| Kimi K2.6 | Current Kimi flagship — ties GPT-5.5 on SWE-Bench Pro, GA 2026-04-20 | kimi-k2.6-direct | $0.95 | $4.00 |
| Kimi K2-Thinking (Smooth fallback) | Deepest reasoner in the Kimi line | kimi-k2-thinking-direct | $0.60 | $2.50 |
| Kimi K2.5 | Prior general-purpose Kimi | kimi-k2.5-direct | $0.60 | $2.50 |
| Kimi K2-Thinking (aggregator) | Aggregator-routed K2-Thinking — emergency failover only | kimi-k2-thinking | $0.60 | $2.50 |
| Kimi K2.5 (aggregator) | Aggregator-routed K2.5 — emergency failover only | kimi-k2.5 | $0.60 | $2.50 |

Alibaba DashScope

Qwen family — 1M context at aggressive pricing

Pricing source ↗
| Model | Description | ID | Input / MTok | Output / MTok |
| --- | --- | --- | --- | --- |
| Qwen 3.7 Max | Current Qwen flagship — agent-first, 200 tok/s, SWE-Pro/Terminal-Bench tier winner (GA 2026-05-20) | qwen-3.7-max-direct | $2.50 | $7.50 |
| Qwen 3.6 Plus | Generalist Qwen flagship — GA 2026-04-02, 1M context | qwen-3.6-plus-direct | $0.33 | $1.95 |
| Qwen3-Coder-Flash (Smooth default) | Bench-winning coder — 16/16 aider-polyglot PASS | qwen3-coder-flash-direct | $0.30 | $1.50 |
| Qwen3-Coder-Plus (Smooth fallback) | PR-review tuned for large diffs | qwen3-coder-plus-direct | $1.00 | $5.00 |
| Qwen3-235B-Thinking-2507 (Smooth fallback) | Cheapest thinking-mode reasoning | qwen3-235b-a22b-thinking-2507 | $0.13 | $0.60 |
| Qwen3-Coder-Plus (aggregator) | Aggregator-routed Coder-Plus — emergency failover only | qwen3-coder-plus | $1.00 | $5.00 |
| Qwen3-Coder-Flash (aggregator) | Aggregator-routed Coder-Flash — emergency failover only | qwen3-coder-flash | $0.30 | $1.50 |

Z.ai (Zhipu)

GLM family — SOTA on SWE-bench Pro

Pricing source ↗
| Model | Description | ID | Input / MTok | Output / MTok |
| --- | --- | --- | --- | --- |
| GLM 5.1 (Smooth fallback) | 58.4% SWE-Pro — coder-forward (GA 2026-04-07) | glm-5.1-direct | $0.60 | $2.20 |
| GLM 5 | Faster GLM (78 tok/s vs 5.1's 54) — GA 2026-02-11 | glm-5-direct | $0.60 | $1.92 |
| GLM 5.1 (aggregator) | Aggregator-routed GLM 5.1 — emergency failover only | glm-5.1 | $0.60 | $2.20 |

MiniMax

Cheapest frontier-class coder + reviewer

Pricing source ↗
| Model | Description | ID | Input / MTok | Output / MTok |
| --- | --- | --- | --- | --- |
| MiniMax M2.7 | Current MiniMax flagship — same price as M2 | minimax-m2.7-direct | $0.30 | $1.20 |
| MiniMax M2.7 Highspeed | ~100 tps throughput-optimized variant | minimax-m2.7-highspeed-direct | $0.60 | $2.40 |
| MiniMax M2 (Smooth default) | Frontier-class reviewer at $0.30 input | minimax-m2-direct | $0.30 | $1.20 |
| MiniMax M2.7 (aggregator) | Aggregator-routed M2.7 — emergency failover only | minimax-m2.7 | $0.30 | $1.20 |
| MiniMax M2.5 (aggregator) | Aggregator-routed M2.5 — emergency failover only | minimax-m2.5 | $0.30 | $1.20 |

ElevenLabs

TTS for the voice pipeline

Pricing source ↗
| Model | Description | ID | Input / MTok | Output / MTok |
| --- | --- | --- | --- | --- |
| Eleven Multilingual v2 | High-fidelity multilingual TTS | elevenlabs-tts | | |

Pricing is indicative; vendors change rates without notice. Routing and fallbacks auto-adjust against each provider's live pricing via LiteLLM. See the API reference for the authoritative list of models your key can reach right now.

Built for compliance

LLM data lives in an isolated database schema, invisible to the public REST API. Row-level security (RLS) policies ensure that org members see only their own org's keys and spend. There are no data-leakage paths to design around.

Schema-isolated by design

Spend logs and key metadata live in a separate Postgres schema not exposed to PostgREST. The anon API key cannot reach LLM data — at all.

Org-scoped access

Row-level security: every read of a key or spend log enforces "your user is a member of this org" at the database layer. Defense in depth.

Ready to put a number on your AI spend?

Free to provision a key and watch your dashboard. Bring your own usage — pay only for what you call.

Get started