For Business

Stop guessing your LLM bill

Per-org budgets, real-time spend dashboards, audit-friendly logs. One invoice, every model. Compliance-friendly schema isolation from day one.

Spend visibility up front, not after the fact

Every request lands in a per-org spend log within seconds. Set budgets, watch trend lines, get warned before a single team's experiment turns into next quarter's surprise.

Monthly budget caps per org

Hard cap or soft alert. Once an org hits its limit, requests start returning 429 instead of turning into a sticker-shock invoice.

Per-model breakdown

Which model burned how much, by team, by day. Catch a runaway prompt before it eats your AI budget.

Single bill, every model

No more aggregating five vendor invoices. We bill once monthly through Stripe — pass it to finance unchanged.

Audit-friendly logs

Every prompt, response, and cost is captured. SOC 2 evidence collection is one CSV export away.
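As a rough illustration of the budget-cap and per-model breakdown ideas above, here is a minimal Python sketch. The `SpendRecord` fields, the `breakdown` and `budget_status` helpers, and the "at or above the cap" comparison are all assumptions made for this example; the actual spend-log schema and gateway logic are not shown on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendRecord:
    # Hypothetical spend-log row; field names are illustrative only.
    org: str
    team: str
    model: str
    day: str        # ISO date string, e.g. "2026-04-25"
    cost_usd: float


def breakdown(records):
    """Sum cost per (model, team, day) across a list of spend records."""
    totals = defaultdict(float)
    for r in records:
        totals[(r.model, r.team, r.day)] += r.cost_usd
    return dict(totals)


def budget_status(spent_usd, cap_usd, hard_cap):
    """Return (http_status, over_budget) for the next request.

    Assumption: "hitting the limit" means spend >= cap.
    Hard cap: the request is rejected with 429 once over budget.
    Soft alert: the request still succeeds; over_budget signals a warning.
    """
    over_budget = spent_usd >= cap_usd
    if over_budget and hard_cap:
        return 429, True
    return 200, over_budget


if __name__ == "__main__":
    log = [
        SpendRecord("acme", "search", "gpt-5-nano", "2026-04-25", 0.25),
        SpendRecord("acme", "search", "gpt-5-nano", "2026-04-25", 0.50),
        SpendRecord("acme", "ads", "gemini-2.5-flash", "2026-04-25", 1.00),
    ]
    for key, cost in sorted(breakdown(log).items()):
        print(key, cost)
    print(budget_status(spent_usd=500.0, cap_usd=500.0, hard_cap=True))
```

The soft-alert branch deliberately returns 200 with a flag, so a caller can warn a team without interrupting its traffic.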

Pick a tier — or mix-and-match

Three preset tiers, six slots, one bill

Pick a preset to anchor your routing — or swap individual slots row-by-row. Smoo-hosted slots route to our open-weight stack on Groq (cheaper, our own credentials). External slots hit a frontier lab directly.

Power: Top-of-line for production agent work. Bills add up, but you don't second-guess a result. Lean on Anthropic + DeepSeek + the strongest direct-lab coders. Cost band: highest.
Balanced: Mid-range mix, capable + predictable bills. Direct-lab coders for the heavy lifts, Smoo-hosted Groq for fast utility, Gemini for context. Most teams should start here. Cost band: mid.
Lean: Cheapest sustainable mix, own-provider first. Smoo-hosted open-weight on Groq + Gemini Flash-Lite for everything we can. Single external slot for the coding workhorse. Cost band: lowest.

Prices are input / output per MTok.

Slot | Power | Balanced | Lean
smooth-coding | Kimi K2-Thinking (External, Moonshot), $0.60 / $2.50 | GLM 5.1 (External, Z.ai), $0.60 / $2.20 | MiniMax M2 (External, MiniMax), $0.30 / $1.20
smooth-reasoning | DeepSeek V3.2 (External, DeepSeek), $0.27 / $1.10 | Qwen3-235B-Thinking (External, DashScope), $0.13 / $0.60 | Qwen3-235B-Thinking (External, DashScope), $0.13 / $0.60
smooth-reviewing | GLM 5.1 (External, Z.ai), $0.60 / $2.20 | MiniMax M2 (External, MiniMax), $0.30 / $1.20 | Qwen3-Coder-Flash (External, DashScope), $0.30 / $1.50
smooth-judge | Claude Haiku 4.5 (External, Anthropic), $1.00 / $5.00 | Llama 3.3 70B (Smoo-hosted, Groq), $0.59 / $0.79 | Llama 3.1 8B (Smoo-hosted, Groq), $0.05 / $0.08
smooth-summarize | Gemini 2.5 Flash (External, Google), $0.30 / $2.50 | Gemini 2.5 Flash (External, Google), $0.30 / $2.50 | Gemini 2.5 Flash-Lite (External, Google), $0.10 / $0.40
smooth-fast | GPT-5 nano (External, OpenAI), $0.20 / $1.25 | Llama 3.1 8B (Smoo-hosted, Groq), $0.05 / $0.08 | Gemini 2.5 Flash-Lite (External, Google), $0.10 / $0.40

Power

Production agents. Don't second-guess a result. Anthropic Haiku judges every action; DeepSeek does the deep thinking; Kimi K2-Thinking writes the code.

Balanced

Most teams should start here. Direct-lab coders for the heavy lifts, Smoo-hosted Groq for fast utility, Gemini Flash for the long context.

Lean

Cheapest sustainable mix. Smoo-hosted open-weight on Groq plus Gemini Flash-Lite carry the utility slots; only the coding workhorse calls externally.

Mix-and-match: every cell above corresponds to one routing slot. Set a different concrete model per slot in your ~/.smooth/providers.json or via the gateway dashboard. Cost figures are list price per million tokens (input / output) — actual spend depends on your usage profile.
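To make the "list price per million tokens (input / output)" figures concrete, the sketch below estimates what one workload would cost on the smooth-coding slot under each preset. The prices are the list prices from the tier table above; the `request_cost_usd` helper and the token counts are assumptions made for this example, and actual spend depends on your usage profile.

```python
def request_cost_usd(input_tokens, output_tokens, input_per_mtok, output_per_mtok):
    """Estimate cost in USD from token counts and per-million-token prices."""
    return (input_tokens * input_per_mtok + output_tokens * output_per_mtok) / 1_000_000


# List prices (input, output) per MTok for the smooth-coding slot, per preset tier.
SMOOTH_CODING_LIST_PRICES = {
    "Power": (0.60, 2.50),     # Kimi K2-Thinking
    "Balanced": (0.60, 2.20),  # GLM 5.1
    "Lean": (0.30, 1.20),      # MiniMax M2
}


def cost_by_tier(input_tokens, output_tokens):
    """Return {tier: estimated USD} for the same workload on each tier."""
    return {
        tier: request_cost_usd(input_tokens, output_tokens, in_price, out_price)
        for tier, (in_price, out_price) in SMOOTH_CODING_LIST_PRICES.items()
    }


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 50k output tokens.
    for tier, usd in cost_by_tier(200_000, 50_000).items():
        print(f"{tier}: ${usd:.3f}")
```

Input and output are priced separately, so a workload that is heavy on output shifts the comparison more than the input price alone suggests.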

Full model catalog

Every frontier model, one invoice

Eight direct providers, with no aggregator markup on primary routes. Bring the model your team already knows, or opt into our smooth-* semantic slots for benchmark-picked defaults at 5–25× lower cost than frontier models.

Smooth-optimized slot picks

Our six semantic routing aliases, each with a primary and two automatic fallbacks, picked from public 2026 benchmarks.

Last refreshed 2026-04-25
Slot | Used for | Primary | Fallback 1 | Fallback 2 | Frontier delta
smooth-coding | Coding workhorse: outer loop for every code-touching call | Kimi K2-Thinking (Moonshot) | GLM 5.1 (Z.ai) | MiniMax M2 (MiniMax) | ~25× cheaper; −7pp vs Opus 4.7 (SWE-Verified)
smooth-reasoning | Deep reasoning: plan / think / research flows | DeepSeek V3.2, deepseek-chat (DeepSeek) | Kimi K2-Thinking (Moonshot) | Qwen3-235B-Thinking-2507 (DashScope) | ~17× cheaper than Opus on input; 85.7% GPQA-D, 96.0% AIME’25
smooth-reviewing | Adversarial critique: code review with a different lab | GLM 5.1 (Z.ai) | MiniMax M2 (MiniMax) | Qwen3-Coder-Plus (DashScope) | ~10× cheaper; 58.4% SWE-Pro, #1 on benchmark
smooth-judge | Safety + intent verdicts: prompt injection, content moderation | Claude Haiku 4.5 (Anthropic) | Gemini 2.5 Flash (Google) | GPT-5-mini (OpenAI) | ~15× cheaper than Opus; 597ms TTFT, ASL-2 safety lineage
smooth-summarize | Transcript compression: long agent run context management | Gemini 2.5 Flash (Google) | Qwen3-Coder-Plus (DashScope) | GPT-5-mini (OpenAI) | ~6× cheaper than Sonnet @ 1M; 1M context, IFEval leader in cheap tier
smooth-fast | Utility tier: session titles, 3–5 word labels, autocomplete | GPT-5-nano (OpenAI) | Gemini 2.5 Flash-Lite (Google) | Claude Haiku 4.5 (Anthropic) | cheapest interactive tier; 480ms TTFT, 161 tok/s
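The "primary + two automatic fallbacks" ordering can be pictured with a small Python sketch. The gateway performs its fallbacks on its own; the `call_with_fallbacks` helper, the rule of moving on after any exception, and the stand-in `fake_call` function are all assumptions made for this illustration, not a description of how the gateway decides to fall back. The model IDs are the smooth-coding chain from the table above, using the IDs listed in the catalog.

```python
from typing import Callable, Sequence, Tuple

# smooth-coding chain: primary, fallback 1, fallback 2.
SMOOTH_CODING_CHAIN = (
    "kimi-k2-thinking-direct",
    "glm-5.1-direct",
    "minimax-m2-direct",
)


def call_with_fallbacks(
    models: Sequence[str], call: Callable[[str], str]
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, response) on first success.

    Raises RuntimeError listing every failure if all models fail.
    """
    errors = []
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # assumption: any error triggers the next model
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


if __name__ == "__main__":
    def fake_call(model: str) -> str:
        # Hypothetical stand-in for a real request; the primary times out here.
        if model == "kimi-k2-thinking-direct":
            raise TimeoutError("simulated timeout")
        return f"response from {model}"

    print(call_with_fallbacks(SMOOTH_CODING_CHAIN, fake_call))
```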

Full catalog by provider

Every model each of our 8 direct providers publishes, routed through llm.smoo.ai. Prices are per million tokens in USD.

OpenAI

Workhorse utility tier + frontier GPT-5 when you need it

Pricing source ↗
Model | ID | Input / MTok | Output / MTok | Notes
GPT-5 | gpt-5 | $1.25 | $10.00 | Frontier reasoning + tool use
GPT-5 mini (Smooth fallback) | gpt-5-mini | $0.40 | $1.60 | Cheap moderation + judge fallback
GPT-5 nano (Smooth default) | gpt-5-nano | $0.20 | $1.25 | 480ms TTFT, best utility-tier latency
GPT-4.1 | gpt-4.1 | $2.00 | $8.00 | 1M-context instruction following
GPT-4.1 mini | gpt-4.1-mini | $0.40 | $1.60 | Cheap long-context extraction
GPT-4.1 nano | gpt-4.1-nano | $0.10 | $0.40 | Cost-floor 1M-context classification
GPT-4o | gpt-4o | $2.50 | $10.00 | Multimodal (vision + audio) legacy
GPT-4o mini | gpt-4o-mini | $0.15 | $0.60 | Multimodal at near-embedding cost
o4-mini | o4-mini | $1.10 | $4.40 | Cheap reasoning-trace-visible o-series
omni-moderation-latest | omni-moderation-latest | Free | Free | Free content moderation classifier
text-embedding-3-large | text-embedding-3-large | $0.13 | - | 3072-dim embeddings
text-embedding-3-small | text-embedding-3-small | $0.02 | - | 1536-dim embeddings

Anthropic

Safety-tuned reasoning — our judge-role default

Pricing source ↗
Model | ID | Input / MTok | Output / MTok | Notes
Claude Opus 4.6 | claude-opus-4-6 | $15.00 | $75.00 | Frontier reasoning + safety lineage
Claude Sonnet 4.6 | claude-sonnet-4-6 | $3.00 | $15.00 | 1M-context balanced workhorse
Claude Sonnet 4.5 | claude-sonnet-4-5 | $3.00 | $15.00 | Stable predecessor Sonnet
Claude Haiku 4.5 (Smooth default) | claude-haiku-4-5 | $1.00 | $5.00 | 597ms TTFT, safety-tuned utility tier

Google

1M-context recall + dialable thinking budgets

Pricing source ↗
Model | ID | Input / MTok | Output / MTok | Notes
Gemini 2.5 Pro | gemini-2.5-pro | $1.25 | $10.00 | Frontier reasoning + 1M multimodal
Gemini 2.5 Flash (Smooth default) | gemini-2.5-flash | $0.30 | $2.50 | IFEval leader in cheap Gemini tier
Gemini 2.5 Flash-Lite (Smooth fallback) | gemini-2.5-flash-lite | $0.10 | $0.40 | Cost-floor utility tier
Gemini 2.0 Flash | gemini-2.0-flash | $0.10 | $0.40 | Legacy low-latency tier
gemini-embedding-001 | gemini-embedding-001 | $0.15 | - | 3072-dim multilingual embeddings

Groq

Open-weight models on sub-second LPU inference

Pricing source ↗
Model | ID | Input / MTok | Output / MTok | Notes
Llama 3.3 70B Versatile | groq-llama-3.3-70b | $0.59 | $0.79 | Sub-second 70B for voice + balanced tasks
Llama 3.1 8B Instant | groq-llama-3.1-8b | $0.05 | $0.08 | Cheapest fast tier on Groq
Llama 4 Scout 17B | groq-llama-4-scout | $0.11 | $0.34 | 10M-context cheap retrieval
Llama 4 Maverick 17B | groq-llama-4-maverick | $0.50 | $0.77 | Frontier-class open-weight at speed
Kimi K2 (Groq host) | groq-kimi-k2 | $1.00 | $3.00 | Kimi K2 with Groq latency profile

DeepSeek

Frontier reasoning at rock-bottom per-token cost

Pricing source ↗
Model | ID | Input / MTok | Output / MTok | Notes
DeepSeek V3.2, deepseek-chat (Smooth default) | deepseek-chat | $0.27 | $1.10 | 85.7% GPQA-D, frontier math reasoning
DeepSeek Reasoner | deepseek-reasoner | $0.55 | $2.19 | Visible reasoning trace for audits

Moonshot

Kimi K2 family — purpose-trained for agentic loops

Pricing source ↗
Model | ID | Input / MTok | Output / MTok | Notes
Kimi K2-Thinking (Smooth default) | kimi-k2-thinking-direct | $0.60 | $2.50 | 80.2% SWE-Verified, 25× cheaper than Opus
Kimi K2.6 | kimi-k2.6-direct | $0.60 | $2.50 | Flagship general-purpose Kimi

Alibaba DashScope

Qwen family — 1M context at aggressive pricing

Pricing source ↗
Model | ID | Input / MTok | Output / MTok | Notes
Qwen3-Coder-Plus (Smooth fallback) | qwen3-coder-plus-direct | $1.00 | $5.00 | PR-review tuned for large diffs
Qwen3-Coder-Flash | qwen3-coder-flash-direct | $0.30 | $1.50 | Cheap coder for bulk autocomplete
Qwen3-235B-Thinking-2507 (Smooth fallback) | qwen3-235b-a22b-thinking-2507 | $0.13 | $0.60 | Cheapest thinking-mode reasoning
Qwen3-235B-Instruct-2507 | qwen3-235b-a22b-instruct-2507 | $0.13 | $0.60 | Non-thinking variant for instruction flows

Z.ai (Zhipu)

GLM family — SOTA on SWE-bench Pro

Pricing source ↗
Model | ID | Input / MTok | Output / MTok | Notes
GLM 5.1 (Smooth default) | glm-5.1-direct | $0.60 | $2.20 | 58.4% SWE-Pro, #1 for review

MiniMax

Cheapest frontier-class coder + reviewer

Pricing source ↗
Model | ID | Input / MTok | Output / MTok | Notes
MiniMax M2 (Smooth fallback) | minimax-m2-direct | $0.30 | $1.20 | Frontier-class coder/reviewer at $0.30 input

ElevenLabs

TTS for the voice pipeline

Pricing source ↗
Model | ID | Input / MTok | Output / MTok | Notes
Eleven Multilingual v2 | elevenlabs-tts | (not listed) | (not listed) | High-fidelity multilingual TTS

Pricing is indicative — vendors change rates without notice. Routing + fallbacks auto-adjust against each provider's live pricing via LiteLLM. See API reference for the authoritative model list your key can hit right now.

Built for compliance

LLM data lives in an isolated database schema, invisible to the public REST API. RLS policies enforce that org members only see their org's keys + spend. No data leakage paths to design around.

Schema-isolated by design

Spend logs and key metadata live in a separate Postgres schema not exposed to PostgREST. The anon API key cannot reach LLM data — at all.

Org-scoped access

Row-level security: every read of a key or spend log enforces "your user is a member of this org" at the database layer. Defense in depth.

Ready to put a number on your AI spend?

Free to provision a key and watch your dashboard. Bring your own usage — pay only for what you call.

Get started