Stop guessing your LLM bill
Per-org budgets, real-time spend dashboards, audit-friendly logs. One invoice, every model. Compliance-friendly schema isolation from day one.
Spend visibility in real time, not after the fact
Every request lands in a per-org spend log within seconds. Set budgets, watch trend lines, get warned before a single team's experiment turns into next quarter's surprise.
Monthly budget caps per org
Choose a hard cap or a soft alert. With a hard cap, once an org hits its limit, requests start returning HTTP 429 instead of running up a sticker-shock invoice.
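A minimal sketch of the two budget modes, assuming the gateway compares an org's month-to-date spend with its cap before forwarding each request. `BudgetMode` and `gate_request` are hypothetical names for illustration, not the gateway's actual API.

```python
from enum import Enum


class BudgetMode(Enum):
    HARD_CAP = "hard_cap"      # refuse requests once the cap is reached
    SOFT_ALERT = "soft_alert"  # keep serving, but flag the overrun


def gate_request(month_to_date_usd: float, cap_usd: float, mode: BudgetMode) -> tuple[int, bool]:
    """Return (http_status, alert) for an incoming request.

    Hypothetical helper: the real gateway's budget logic is not documented here.
    """
    over_budget = month_to_date_usd >= cap_usd
    if over_budget and mode is BudgetMode.HARD_CAP:
        return 429, True  # Too Many Requests: the call is refused
    return 200, over_budget  # served; alert raised only when over budget


print(gate_request(120.0, 100.0, BudgetMode.HARD_CAP))    # (429, True)
print(gate_request(120.0, 100.0, BudgetMode.SOFT_ALERT))  # (200, True)
print(gate_request(40.0, 100.0, BudgetMode.HARD_CAP))     # (200, False)
```

Treating the cap as reached at exactly 100% (`>=`) is a choice made in this sketch; a soft alert is often more useful at a lower threshold so the warning arrives before the cap.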
Per-model breakdown
Which model burned how much, by team, by day. Catch a runaway prompt before it eats your AI budget.
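A minimal sketch of the per-model breakdown, assuming each spend-log row carries a team, a model ID, a day, and a cost. `SpendRecord`, `breakdown`, and `runaway` are hypothetical names, and the sample rows are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SpendRecord:
    # Hypothetical shape of one spend-log row; real column names may differ.
    team: str
    model: str
    day: str  # ISO date, e.g. "2026-03-01"
    cost_usd: float


def breakdown(records: list[SpendRecord]) -> dict[tuple[str, str, str], float]:
    """Sum cost per (team, model, day) bucket."""
    totals: dict[tuple[str, str, str], float] = defaultdict(float)
    for r in records:
        totals[(r.team, r.model, r.day)] += r.cost_usd
    return dict(totals)


def runaway(totals: dict[tuple[str, str, str], float], daily_limit_usd: float) -> list[tuple[str, str, str]]:
    """Buckets whose daily spend exceeds a limit: candidates for a runaway prompt."""
    return sorted(k for k, v in totals.items() if v > daily_limit_usd)


logs = [
    SpendRecord("search", "gpt-5-nano", "2026-03-01", 0.40),
    SpendRecord("search", "gpt-5-nano", "2026-03-01", 0.35),
    SpendRecord("agents", "kimi-k2-thinking-direct", "2026-03-01", 42.00),
]
totals = breakdown(logs)
print(runaway(totals, daily_limit_usd=10.0))
# [('agents', 'kimi-k2-thinking-direct', '2026-03-01')]
```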
Single bill, every model
No more aggregating five vendor invoices. We bill once monthly through Stripe — pass it to finance unchanged.
Audit-friendly logs
Every prompt + response + cost is captured. SOC 2 evidence collection is one CSV export away.
Three preset tiers, six slots, one bill
Pick a preset to anchor your routing — or swap individual slots row-by-row. Smoo-hosted slots route to our open-weight stack on Groq (cheaper, our own credentials). External slots hit a frontier lab directly.
| Slot | Power: top-of-line for production agent work (cost band: highest) | Balanced: capable, predictable bills (cost band: mid) | Lean: cheapest sustainable mix (cost band: lowest) |
|---|---|---|---|
| smooth-coding | Kimi K2-Thinking (external: Moonshot) $0.60 / $2.50 per MTok | GLM 5.1 (external: Z.ai) $0.60 / $2.20 per MTok | MiniMax M2 (external: MiniMax) $0.30 / $1.20 per MTok |
| smooth-reasoning | DeepSeek V3.2 (external: DeepSeek) $0.27 / $1.10 per MTok | Qwen3-235B-Thinking (external: DashScope) $0.13 / $0.60 per MTok | Qwen3-235B-Thinking (external: DashScope) $0.13 / $0.60 per MTok |
| smooth-reviewing | GLM 5.1 (external: Z.ai) $0.60 / $2.20 per MTok | MiniMax M2 (external: MiniMax) $0.30 / $1.20 per MTok | Qwen3-Coder-Flash (external: DashScope) $0.30 / $1.50 per MTok |
| smooth-judge | Claude Haiku 4.5 (external: Anthropic) $1.00 / $5.00 per MTok | Llama 3.3 70B (Smoo-hosted on Groq) $0.59 / $0.79 per MTok | Llama 3.1 8B (Smoo-hosted on Groq) $0.05 / $0.08 per MTok |
| smooth-summarize | Gemini 2.5 Flash (external: Google) $0.30 / $2.50 per MTok | Gemini 2.5 Flash (external: Google) $0.30 / $2.50 per MTok | Gemini 2.5 Flash-Lite (external: Google) $0.10 / $0.40 per MTok |
| smooth-fast | GPT-5 nano (external: OpenAI) $0.20 / $1.25 per MTok | Llama 3.1 8B (Smoo-hosted on Groq) $0.05 / $0.08 per MTok | Gemini 2.5 Flash-Lite (external: Google) $0.10 / $0.40 per MTok |
Power
Production agents. Don't second-guess a result. Anthropic Haiku judges every action; DeepSeek does the deep thinking; Kimi K2-Thinking writes the code.
Balanced
Most teams should start here. Direct-lab coders for the heavy lifts, Smoo-hosted Groq for fast utility, Gemini Flash for the long context.
Lean
Cheapest sustainable mix. Smoo-hosted Llama 3.1 8B on Groq and Gemini Flash-Lite carry the utility slots (judge, summarize, fast); coding, reasoning, and reviewing go to low-cost external models.
Mix-and-match: each row above is one routing slot, and each cell is the model a preset assigns to it. Set a different concrete model per slot in your ~/.smooth/providers.json or via the gateway dashboard. Cost figures are list prices per million tokens (input / output); actual spend depends on your usage profile.
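Because prices are quoted per million tokens, the list-price cost of a workload is input tokens / 10^6 × input rate plus output tokens / 10^6 × output rate. A minimal sketch comparing the three presets' smooth-coding picks, using the prices from the table; the monthly token volumes are hypothetical, and real bills follow current vendor pricing.

```python
# List prices in USD per million tokens (input, output), copied from the
# smooth-coding row of the preset table. Prices are indicative only.
SMOOTH_CODING = {
    "Power (Kimi K2-Thinking)": (0.60, 2.50),
    "Balanced (GLM 5.1)": (0.60, 2.20),
    "Lean (MiniMax M2)": (0.30, 1.20),
}


def cost_usd(input_tokens: int, output_tokens: int, price_in: float, price_out: float) -> float:
    """List-price cost: tokens / 1M times the per-MTok rate, summed over input and output."""
    return input_tokens / 1_000_000 * price_in + output_tokens / 1_000_000 * price_out


# Hypothetical monthly usage for one team's coding slot.
monthly_in, monthly_out = 2_000_000, 500_000
for preset, (p_in, p_out) in SMOOTH_CODING.items():
    print(f"{preset}: ${cost_usd(monthly_in, monthly_out, p_in, p_out):.2f}")
# Power (Kimi K2-Thinking): $2.45
# Balanced (GLM 5.1): $2.30
# Lean (MiniMax M2): $1.20
```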
Every frontier model, one invoice
Eight direct providers, with no aggregator markup on primary routes. Bring the model your team already knows, or opt into our smooth-* semantic slots for benchmark-picked defaults that cost 5–25× less than frontier models.
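How large the saving is depends on your input/output mix, because output tokens cost several times more than input tokens. A minimal sketch using the catalog list prices for Claude Opus 4.6 and Kimi K2-Thinking; the three workload mixes are hypothetical.

```python
def blended_cost(input_mtok: float, output_mtok: float, price: tuple[float, float]) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    price_in, price_out = price
    return input_mtok * price_in + output_mtok * price_out


# Catalog list prices (USD per MTok, input / output); indicative only.
OPUS_4_6 = (15.00, 75.00)
KIMI_K2_THINKING = (0.60, 2.50)

mixes = {
    "input only": (1.0, 0.0),
    "10:1 input:output": (10.0, 1.0),
    "1:1 input:output": (1.0, 1.0),
}
for label, (in_mtok, out_mtok) in mixes.items():
    ratio = blended_cost(in_mtok, out_mtok, OPUS_4_6) / blended_cost(in_mtok, out_mtok, KIMI_K2_THINKING)
    print(f"{label}: {ratio:.1f}x cheaper")
# input only: 25.0x cheaper
# 10:1 input:output: 26.5x cheaper
# 1:1 input:output: 29.0x cheaper
```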
Smooth-optimized slot picks
Our six semantic routing aliases, each with a primary model and two automatic fallbacks, picked from public 2026 benchmarks.
| Slot | Used for | Primary | Fallback 1 | Fallback 2 | Cost vs frontier | Benchmark notes |
|---|---|---|---|---|---|---|
| smooth-coding | Coding workhorse: outer loop for every code-touching call | Kimi K2-Thinking (Moonshot) | GLM 5.1 (Z.ai) | MiniMax M2 (MiniMax) | ~25× cheaper | −7pp vs Opus 4.7 (SWE-Verified) |
| smooth-reasoning | Deep reasoning: plan / think / research flows | DeepSeek V3.2, deepseek-chat (DeepSeek) | Kimi K2-Thinking (Moonshot) | Qwen3-235B-Thinking-2507 (DashScope) | ~17× cheaper than Opus on input | 85.7% GPQA-D, 96.0% AIME’25 |
| smooth-reviewing | Adversarial critique: code review with a different lab | GLM 5.1 (Z.ai) | MiniMax M2 (MiniMax) | Qwen3-Coder-Plus (DashScope) | ~10× cheaper | 58.4% SWE-Pro, #1 on the benchmark |
| smooth-judge | Safety + intent verdicts: prompt injection, content moderation | Claude Haiku 4.5 (Anthropic) | Gemini 2.5 Flash (Google) | GPT-5-mini (OpenAI) | ~15× cheaper than Opus | 597ms TTFT, ASL-2 safety lineage |
| smooth-summarize | Transcript compression: context management for long agent runs | Gemini 2.5 Flash (Google) | Qwen3-Coder-Plus (DashScope) | GPT-5-mini (OpenAI) | ~6× cheaper than Sonnet @ 1M | 1M context, IFEval leader in cheap tier |
| smooth-fast | Utility tier: session titles, 3–5 word labels, autocomplete | GPT-5-nano (OpenAI) | Gemini 2.5 Flash-Lite (Google) | Claude Haiku 4.5 (Anthropic) | Cheapest interactive tier | 480ms TTFT, 161 tok/s |
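Each alias tries its primary first and only moves down the list when a call fails. A minimal sketch of that ordering, assuming a failure surfaces as an exception; `route_with_fallbacks`, `call_model`, and the simulated provider are hypothetical, and the gateway's real retry and failure criteria are not described on this page.

```python
from typing import Callable

# smooth-coding chain from the table above, using catalog model IDs.
SMOOTH_CODING_CHAIN = ["kimi-k2-thinking-direct", "glm-5.1-direct", "minimax-m2-direct"]


class AllProvidersFailed(Exception):
    """Raised when the primary and every fallback have failed."""


def route_with_fallbacks(prompt: str, chain: list[str], call_model: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each model in order; return (model_id, reply) from the first success.

    `call_model(model_id, prompt)` is a hypothetical stand-in for a provider
    call that raises on errors such as timeouts or 5xx responses.
    """
    errors: list[str] = []
    for model_id in chain:
        try:
            return model_id, call_model(model_id, prompt)
        except Exception as exc:  # any provider error moves on to the next model
            errors.append(f"{model_id}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Simulated providers: the primary is down, the first fallback answers.
def fake_call(model_id: str, prompt: str) -> str:
    if model_id == "kimi-k2-thinking-direct":
        raise TimeoutError("primary timed out")
    return f"{model_id} handled: {prompt}"


print(route_with_fallbacks("fix the failing test", SMOOTH_CODING_CHAIN, fake_call))
# ('glm-5.1-direct', 'glm-5.1-direct handled: fix the failing test')
```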
Full catalog by provider
The models we route for each of our 8 direct providers, plus Smoo-hosted Groq and ElevenLabs TTS, all through llm.smoo.ai. Prices are in USD per million tokens.
OpenAI
Workhorse utility tier + frontier GPT-5 when you need it
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| GPT-5 | gpt-5 | $1.25 | $10.00 | 400K | Frontier reasoning + tool use |
| GPT-5 mini (Smooth fallback) | gpt-5-mini | $0.40 | $1.60 | 400K | Cheap moderation + judge fallback |
| GPT-5 nano (Smooth default) | gpt-5-nano | $0.20 | $1.25 | 400K | 480ms TTFT, best utility-tier latency |
| GPT-4.1 | gpt-4.1 | $2.00 | $8.00 | 1M | 1M-context instruction following |
| GPT-4.1 mini | gpt-4.1-mini | $0.40 | $1.60 | 1M | Cheap long-context extraction |
| GPT-4.1 nano | gpt-4.1-nano | $0.10 | $0.40 | 1M | Cost-floor 1M-context classification |
| GPT-4o | gpt-4o | $2.50 | $10.00 | 128K | Multimodal (vision + audio) legacy |
| GPT-4o mini | gpt-4o-mini | $0.15 | $0.60 | 128K | Multimodal at near-embedding cost |
| o4-mini | o4-mini | $1.10 | $4.40 | 200K | Cheap reasoning-trace-visible o-series |
| omni-moderation-latest | omni-moderation-latest | Free | Free | 32K | Free content moderation classifier |
| text-embedding-3-large | text-embedding-3-large | $0.13 | — | 8K | 3072-dim embeddings |
| text-embedding-3-small | text-embedding-3-small | $0.02 | — | 8K | 1536-dim embeddings |
Anthropic
Safety-tuned reasoning — our judge-role default
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6 | $15.00 | $75.00 | 200K | Frontier reasoning + safety lineage |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | $3.00 | $15.00 | 1M | 1M-context balanced workhorse |
| Claude Sonnet 4.5 | claude-sonnet-4-5 | $3.00 | $15.00 | 200K | Stable predecessor Sonnet |
| Claude Haiku 4.5 (Smooth default) | claude-haiku-4-5 | $1.00 | $5.00 | 200K | 597ms TTFT, safety-tuned utility tier |
Google
1M-context recall + dialable thinking budgets
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | gemini-2.5-pro | $1.25 | $10.00 | 1M | Frontier reasoning + 1M multimodal |
| Gemini 2.5 Flash (Smooth default) | gemini-2.5-flash | $0.30 | $2.50 | 1M | IFEval leader in cheap Gemini tier |
| Gemini 2.5 Flash-Lite (Smooth fallback) | gemini-2.5-flash-lite | $0.10 | $0.40 | 1M | Cost-floor utility tier |
| Gemini 2.0 Flash | gemini-2.0-flash | $0.10 | $0.40 | 1M | Legacy low-latency tier |
| gemini-embedding-001 | gemini-embedding-001 | $0.15 | — | 2K | 3072-dim multilingual embeddings |
Groq
Open-weight models on sub-second LPU inference
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Llama 3.3 70B Versatile | groq-llama-3.3-70b | $0.59 | $0.79 | 128K | Sub-second 70B for voice + balanced tasks |
| Llama 3.1 8B Instant | groq-llama-3.1-8b | $0.05 | $0.08 | 128K | Cheapest fast tier on Groq |
| Llama 4 Scout 17B | groq-llama-4-scout | $0.11 | $0.34 | 10M | 10M-context cheap retrieval |
| Llama 4 Maverick 17B | groq-llama-4-maverick | $0.50 | $0.77 | 1M | Frontier-class open-weight at speed |
| Kimi K2 (Groq host) | groq-kimi-k2 | $1.00 | $3.00 | 256K | Kimi K2 with Groq latency profile |
DeepSeek
Frontier reasoning at rock-bottom per-token cost
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| DeepSeek V3.2, deepseek-chat (Smooth default) | deepseek-chat | $0.27 | $1.10 | 128K | 85.7% GPQA-D, frontier math reasoning |
| DeepSeek Reasoner | deepseek-reasoner | $0.55 | $2.19 | 128K | Visible reasoning trace for audits |
Moonshot
Kimi K2 family — purpose-trained for agentic loops
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Kimi K2-Thinking (Smooth default) | kimi-k2-thinking-direct | $0.60 | $2.50 | 256K | 80.2% SWE-Verified, 25× cheaper than Opus |
| Kimi K2.6 | kimi-k2.6-direct | $0.60 | $2.50 | 256K | Flagship general-purpose Kimi |
Alibaba DashScope
Qwen family — 1M context at aggressive pricing
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Qwen3-Coder-Plus (Smooth fallback) | qwen3-coder-plus-direct | $1.00 | $5.00 | 1M | PR-review tuned for large diffs |
| Qwen3-Coder-Flash | qwen3-coder-flash-direct | $0.30 | $1.50 | 1M | Cheap coder for bulk autocomplete |
| Qwen3-235B-Thinking-2507 (Smooth fallback) | qwen3-235b-a22b-thinking-2507 | $0.13 | $0.60 | 262K | Cheapest thinking-mode reasoning |
| Qwen3-235B-Instruct-2507 | qwen3-235b-a22b-instruct-2507 | $0.13 | $0.60 | 262K | Non-thinking variant for instruction flows |
Z.ai (Zhipu)
GLM family — SOTA on SWE-bench Pro
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| GLM 5.1 (Smooth default) | glm-5.1-direct | $0.60 | $2.20 | 128K | 58.4% SWE-Pro, #1 for review |
MiniMax
Cheapest frontier-class coder + reviewer
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| MiniMax M2 (Smooth fallback) | minimax-m2-direct | $0.30 | $1.20 | 200K | Frontier-class coder/reviewer at $0.30 input |
ElevenLabs
TTS for the voice pipeline
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Eleven Multilingual v2 | elevenlabs-tts | — | — | n/a (TTS) | High-fidelity multilingual TTS |
Pricing is indicative; vendors change rates without notice. Routing and fallbacks adjust automatically to each provider's live pricing via LiteLLM. See the API reference for the authoritative list of models your key can call right now.
Built for compliance
LLM data lives in an isolated database schema, invisible to the public REST API. Row-level security (RLS) policies ensure org members see only their own org's keys and spend. No data-leakage paths to design around.
Schema-isolated by design
Spend logs and key metadata live in a separate Postgres schema not exposed to PostgREST. The anon API key cannot reach LLM data — at all.
Org-scoped access
Row-level security: every read of a key or spend log enforces "your user is a member of this org" at the database layer. Defense in depth.
Ready to put a number on your AI spend?
Provisioning a key and watching your dashboard is free. Bring your own usage and pay only for what you call.