Stop guessing your LLM bill
Per-org budgets, real-time spend dashboards, audit-friendly logs. One invoice across nine labs — OpenAI, Anthropic, Google, Groq, DeepSeek, Qwen, Kimi, GLM, MiniMax. Compliance-friendly schema isolation from day one.
Spend visibility up front, not after the fact
Every request lands in a per-org spend log within seconds. Set budgets, watch trend lines, get warned before a single team's experiment turns into next quarter's surprise.
Monthly budget caps per org
Hard cap or soft alert. When an org hits its limit, requests start returning HTTP 429 instead of feeding a sticker-shock invoice.
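As a rough illustration of the difference between the two modes, here is a minimal sketch. The names `CapMode`, `OrgBudget`, and `check_request` are hypothetical and not part of the gateway's API, and the sketch assumes a soft alert keeps serving requests while a hard cap rejects them.

```python
from dataclasses import dataclass
from enum import Enum


class CapMode(Enum):
    HARD = "hard"  # assumption: reject requests once the budget is spent
    SOFT = "soft"  # assumption: keep serving, but raise an alert


@dataclass
class OrgBudget:
    monthly_limit_usd: float
    spent_usd: float
    mode: CapMode


def check_request(budget: OrgBudget) -> tuple[int, bool]:
    """Return (http_status, alert) for the next request from this org."""
    over_limit = budget.spent_usd >= budget.monthly_limit_usd
    if not over_limit:
        return 200, False
    if budget.mode is CapMode.HARD:
        return 429, True  # hard cap: the request is refused
    return 200, True      # soft alert: the request goes through, with a warning


print(check_request(OrgBudget(100.0, 100.0, CapMode.HARD)))  # (429, True)
print(check_request(OrgBudget(100.0, 100.0, CapMode.SOFT)))  # (200, True)
```

A client that receives 429 in this situation would typically surface the budget error rather than retry, since retrying cannot succeed until the cap is raised or the month rolls over.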
Per-model breakdown
Which model burned how much, by team, by day. Catch a runaway prompt before it eats your AI budget.
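To make the breakdown concrete, the sketch below groups spend-log rows by day, team, and model. The row layout and the dollar amounts are invented for illustration; only the model IDs come from the catalog on this page.

```python
from collections import defaultdict
from datetime import date

# Hypothetical spend-log rows: (day, team, model_id, cost_usd).
ROWS = [
    (date(2026, 6, 1), "search", "deepseek-v4-flash", 1.25),
    (date(2026, 6, 1), "search", "deepseek-v4-flash", 0.50),
    (date(2026, 6, 1), "agents", "claude-haiku-4-5", 4.00),
    (date(2026, 6, 2), "agents", "claude-haiku-4-5", 0.75),
]


def breakdown(rows):
    """Sum cost per (day, team, model) bucket."""
    totals = defaultdict(float)
    for day, team, model, cost in rows:
        totals[(day, team, model)] += cost
    return dict(totals)


def top_spender(totals):
    """Return the (bucket, cost) pair with the highest spend."""
    return max(totals.items(), key=lambda item: item[1])


totals = breakdown(ROWS)
bucket, cost = top_spender(totals)
print(bucket[1], bucket[2], cost)  # agents claude-haiku-4-5 4.0
```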
Single bill, every model
No more aggregating five vendor invoices. We bill once monthly through Stripe — pass it to finance unchanged.
Audit-friendly logs
Every prompt + response + cost is captured. SOC 2 evidence collection is one CSV export away.
Three preset tiers, six slots, one bill
Pick a preset to anchor your routing — or swap individual slots row-by-row. Smoo-hosted slots route to our open-weight stack on Groq (cheaper, our own credentials). External slots hit a frontier lab directly.
| Tier | Positioning | Mix | Cost band |
|---|---|---|---|
| Power | Top-of-line for production agent work. Bills add up, but you don't second-guess a result. | Lean on Anthropic + DeepSeek V4 + the strongest direct-lab coders. | Highest |
| Balanced | Mid-range mix: capable + predictable bills. Mirrors the live smooth-* defaults; most teams should start here. | Qwen3-Coder-Flash for coding, DeepSeek V4-Flash for reasoning, Gemini for context. | Mid |
| Lean | Cheapest sustainable mix, own-provider first. | Smoo-hosted open-weight on Groq + Gemini Flash-Lite for everything we can. DeepSeek V4-Flash for the reasoning slot. | Lowest |

| Slot | Power | Balanced | Lean |
|---|---|---|---|
| smooth-coding | Kimi K2-Thinking (External, Moonshot): $0.60 / $2.50 per MTok | Qwen3-Coder-Flash (External, DashScope): $0.30 / $1.50 per MTok | MiniMax M2 (External, MiniMax): $0.30 / $1.20 per MTok |
| smooth-reasoning | DeepSeek V4-Pro (External, DeepSeek): $0.43 / $0.87 per MTok | DeepSeek V4-Flash (External, DeepSeek): $0.14 / $0.28 per MTok | DeepSeek V4-Flash (External, DeepSeek): $0.14 / $0.28 per MTok |
| smooth-reviewing | GLM 5.1 (External, Z.ai): $0.60 / $2.20 per MTok | MiniMax M2 (External, MiniMax): $0.30 / $1.20 per MTok | Qwen3-Coder-Flash (External, DashScope): $0.30 / $1.50 per MTok |
| smooth-judge | Claude Haiku 4.5 (External, Anthropic): $1.00 / $5.00 per MTok | Gemini 2.5 Flash (External, Google): $0.30 / $2.50 per MTok | Llama 3.1 8B (Smoo-hosted, Groq): $0.05 / $0.08 per MTok |
| smooth-summarize | Gemini 2.5 Flash (External, Google): $0.30 / $2.50 per MTok | Gemini 2.5 Flash (External, Google): $0.30 / $2.50 per MTok | Gemini 2.5 Flash-Lite (External, Google): $0.10 / $0.40 per MTok |
| smooth-fast | Claude Haiku 4.5 (External, Anthropic): $1.00 / $5.00 per MTok | Gemini 2.5 Flash-Lite (External, Google): $0.10 / $0.40 per MTok | Gemini 2.5 Flash-Lite (External, Google): $0.10 / $0.40 per MTok |
Power
Production agents. Don't second-guess a result. Anthropic Haiku judges every action; DeepSeek does the deep thinking; Kimi K2-Thinking writes the code.
Balanced
Most teams should start here. Direct-lab coders for the heavy lifts, Smoo-hosted Groq for fast utility, Gemini Flash for the long context.
Lean
Cheapest sustainable mix. Smoo-hosted open-weight on Groq plus Gemini Flash-Lite carry the utility slots; only the coding workhorse calls externally.
Mix-and-match: every cell above corresponds to one routing slot. Set a different concrete model per slot in your ~/.smooth/providers.json or via the gateway dashboard. Cost figures are list price per million tokens (input / output) — actual spend depends on your usage profile.
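For a sense of how the list prices translate into spend, here is a small worked calculation. The helper name `request_cost_usd` and the token counts are illustrative assumptions; the prices are the Qwen3-Coder-Flash list prices quoted above.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_mtok: float,
    output_per_mtok: float,
) -> float:
    """List-price cost of a workload, given USD prices per million tokens."""
    total = input_tokens * input_per_mtok + output_tokens * output_per_mtok
    return total / 1_000_000


# Hypothetical workload: 200K input tokens and 50K output tokens on
# Qwen3-Coder-Flash at $0.30 / $1.50 per MTok.
cost = request_cost_usd(200_000, 50_000, 0.30, 1.50)
print(f"${cost:.3f}")  # $0.135
```

Output tokens dominate here even though there are four times fewer of them, which is why the output price matters most for generation-heavy slots.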
Every frontier model, one invoice
Eight direct providers, with no aggregator markup on primary routes. Bring the model your team already knows, or opt into our smooth-* semantic slots for benchmark-picked defaults that cost 5–25× less than frontier models.
Smooth-optimized slot picks
Our 7 semantic routing aliases, each with a primary and two automatic fallbacks, picked from public 2026 benchmarks.
| Slot | Used for | Primary | Fallback 1 | Fallback 2 | Frontier delta |
|---|---|---|---|---|---|
| smooth-coding | Coding workhorse — outer loop for every code-touching call | Qwen3-Coder-Flash (DashScope) | GLM 5.1 (Z.ai) | Kimi K2-Thinking (Moonshot) | ~5× cheaper than the prior Kimi primary; 16/16 PASS on aider-polyglot multi-role bench |
| smooth-reasoning | Deep reasoning — plan / think / research flows | DeepSeek V4-Flash (DeepSeek) | Kimi K2-Thinking (Moonshot) | Qwen3-235B-Thinking-2507 (DashScope) | ~100× cheaper than Opus on input; 1M context, dual Thinking/Non-Thinking modes |
| smooth-reviewing | Adversarial critique — code review with a different lab | MiniMax M2 (MiniMax) | Qwen3-Coder-Plus (DashScope) | GLM 5.1 (Z.ai) | ~10× cheaper than Opus; different lab from coder, catches blind spots |
| smooth-judge | Safety + intent verdicts — prompt injection, content moderation | Llama 3.1 8B Instant (Groq) | Gemini 2.5 Flash (Google) | Claude Haiku 4.5 (Anthropic) | ~10× cheaper than the prior Gemini Flash primary; sub-300ms first-token, matches accuracy on 1-line JSON verdicts |
| smooth-summarize | Transcript compression — long agent run context management | Gemini 2.5 Flash (Google) | Qwen3-Coder-Plus (DashScope) | GPT-5 mini (OpenAI) | ~10× cheaper than Sonnet @ 1M; 1M context, IFEval leader in cheap tier |
| smooth-planning | Mapper / structured plan — read-only mapping work | Gemini 2.5 Flash (Google) | DeepSeek V4-Flash (DeepSeek) | Qwen3-Coder-Plus (DashScope) | ~50× cheaper than Opus; bench-winning mapper, long context + structured output |
| smooth-fast | Utility tier — session titles, 3–5 word labels, autocomplete, voice fast-router | Llama 3.1 8B Instant (Groq) | Gemini 2.5 Flash-Lite (Google) | Claude Haiku 4.5 (Anthropic) | ~10× cheaper than the prior Gemini Flash Lite primary; sub-300ms first-token, P95 ~600ms end-to-end on prod |
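The primary-plus-fallbacks behavior in the table above can be pictured as an ordered chain that is walked until one call succeeds. The sketch below is only a model of that idea, not the gateway's implementation: `ProviderError`, `call_with_fallbacks`, and `flaky_call` are hypothetical names, and pairing the smooth-coding row with these catalog IDs is an assumption.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error raised when one provider call fails."""


def call_with_fallbacks(
    chain: Sequence[str], call: Callable[[str], str]
) -> tuple[str, str]:
    """Try each model in order; return (model_used, response)."""
    failures = []
    for model in chain:
        try:
            return model, call(model)
        except ProviderError as exc:
            failures.append(f"{model}: {exc}")
    raise ProviderError("all models failed: " + "; ".join(failures))


# Assumed chain for smooth-coding: primary, then the two fallbacks.
SMOOTH_CODING = [
    "qwen3-coder-flash-direct",
    "glm-5.1-direct",
    "kimi-k2-thinking-direct",
]


def flaky_call(model: str) -> str:
    """Stand-in for a real request: the primary is simulated as down."""
    if model == "qwen3-coder-flash-direct":
        raise ProviderError("simulated outage")
    return f"response from {model}"


used, reply = call_with_fallbacks(SMOOTH_CODING, flaky_call)
print(used)  # glm-5.1-direct
```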
Full catalog by provider
Every model each of our 8 direct providers publishes, routed through llm.smoo.ai. Prices are per million tokens in USD.
OpenAI
Workhorse utility tier + frontier GPT-5 when you need it
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| GPT-5.5 Pro | gpt-5.5-pro | $30.00 | $180.00 | 400K | Frontier reasoning tier — GA 2026-04-24 |
| GPT-5.5 | gpt-5.5 | $5.00 | $30.00 | 400K | Current OpenAI flagship — GA 2026-04-24 |
| GPT-5.4 | gpt-5.4 | $2.50 | $15.00 | 400K | Mid-frontier 5.4 — between gpt-5 and gpt-5.5 |
| GPT-5.4 mini | gpt-5.4-mini | $0.75 | $4.50 | 400K | Cheaper smart-tier 5.4 sibling |
| GPT-5.4 nano | gpt-5.4-nano | $0.20 | $1.25 | 400K | Ultra-cheap 5.4 nano |
| GPT-5.4 Pro | gpt-5.4-pro | $30.00 | $180.00 | 400K | GPT-5.4 Pro reasoning tier |
| GPT-5.2 | gpt-5.2 | $1.75 | $14.00 | 400K | GPT-5.2 — mid-tier between 5.1 and 5.4 |
| GPT-5.2 Codex | gpt-5.2-codex | $1.75 | $14.00 | 400K | GPT-5.2 with Codex-coding training |
| GPT-5.2 Pro | gpt-5.2-pro | $21.00 | $168.00 | 400K | GPT-5.2 Pro reasoning tier |
| GPT-5.1 | gpt-5.1 | $1.25 | $10.00 | 400K | GPT-5.1 — refined GPT-5 family entry |
| GPT-5.1 Codex | gpt-5.1-codex | $1.25 | $10.00 | 400K | GPT-5.1 with Codex-coding training |
| GPT-5 | gpt-5 | $2.50 | $10.00 | 400K | Frontier reasoning + tool use |
| GPT-5 Pro | gpt-5-pro | $15.00 | $120.00 | 400K | GPT-5 Pro reasoning tier |
| GPT-5 Codex | gpt-5-codex | $1.25 | $10.00 | 400K | GPT-5 with Codex-coding training |
| GPT-5 mini (Smooth fallback) | gpt-5-mini | $0.50 | $2.00 | 400K | Cheap moderation + multi-slot fallback |
| GPT-5 nano | gpt-5-nano | $0.20 | $1.25 | 400K | Cheapest GPT-5 variant |
| GPT-4.1 | gpt-4.1 | $2.00 | $8.00 | 1M | 1M-context instruction following |
| GPT-4.1 mini | gpt-4.1-mini | $0.40 | $1.60 | 1M | Cheap long-context extraction |
| GPT-4.1 nano | gpt-4.1-nano | $0.10 | $0.40 | 1M | Cost-floor 1M-context classification |
| GPT-4o | gpt-4o | $2.50 | $10.00 | 128K | Multimodal (vision + audio) legacy |
| GPT-4o mini | gpt-4o-mini | $0.15 | $0.60 | 128K | Multimodal at near-embedding cost |
| o4-mini | o4-mini | $1.10 | $4.40 | 200K | Cheap reasoning-trace-visible o-series |
| o3 | o3 | $2.00 | $8.00 | 200K | o-series reasoning — visible chain-of-thought |
| o3-mini | o3-mini | $1.10 | $4.40 | 200K | Cheaper o3 sibling |
| o3-pro | o3-pro | $20.00 | $80.00 | 200K | o3 Pro — extended reasoning tier |
Anthropic
Safety-tuned reasoning — strict refusal lineage + prompt caching
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Claude Opus 4.8 | claude-opus-4-8 | $5.00 | $25.00 | 1M | Current Anthropic flagship — 1M ctx, ~4× less likely to slip flaws than 4.7 (GA 2026-05-28) |
| Claude Opus 4.7 | claude-opus-4-7 | $5.00 | $25.00 | 200K | Prior flagship — SWE-bench Pro 64.3%, GA 2026-04-16 |
| Claude Opus 4.6 | claude-opus-4-6 | $15.00 | $75.00 | 200K | Prior frontier — kept for pinned prompts |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | $3.00 | $15.00 | 1M | 1M-context balanced workhorse |
| Claude Sonnet 4.5 | claude-sonnet-4-5 | $3.00 | $15.00 | 200K | Stable predecessor Sonnet |
| Claude Haiku 4.5 (Smooth fallback) | claude-haiku-4-5 | $1.00 | $5.00 | 200K | 597ms TTFT — safety-tuned utility tier |
Google
1M-context recall + dialable thinking budgets
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Gemini 3.5 Flash | gemini-3.5-flash | $1.50 | $9.00 | 1M | Current Google flagship Flash — GA 2026-05-19, 76.2% Terminal-Bench 2.1 |
| Gemini 2.5 Pro | gemini-2.5-pro | $1.25 | $10.00 | 1M | Frontier reasoning + 1M multimodal |
| Gemini 2.5 Flash (Smooth default) | gemini-2.5-flash | $0.30 | $2.50 | 1M | IFEval leader in cheap Gemini tier — smooth-summarize + smooth-planning primary |
| Gemini 2.5 Flash-Lite (Smooth fallback) | gemini-2.5-flash-lite | $0.10 | $0.40 | 1M | Cost-floor utility tier — smooth-fast fallback (Google) |
| Gemini 2.0 Flash | gemini-2.0-flash | $0.10 | $0.40 | 1M | Legacy low-latency tier |
| Gemini 2.0 Flash-Lite | gemini-2.0-flash-lite | — | — | 1M | Ultra-cheap 2.x utility-tier sibling |
| Gemini 3 Flash (preview) | gemini-3-flash-preview | — | — | 1M | Next-gen Flash — 3/3 PASS on CS escalation E2E |
| Gemini 3.1 Flash-Lite | gemini-3.1-flash-lite | — | — | 1M | GA Flash-Lite — 2.1s TTFT latency champion, voice-pipeline candidate (promoted from preview 2026-05-29) |
| Gemini 3.1 Flash-Lite (preview) | gemini-3.1-flash-lite-preview | — | — | 1M | Preview alias retained for pinned callers — use stable above for new code |
| Gemini 3 Pro (preview) | gemini-3-pro-preview | — | — | 1M | Next-gen Pro — preview pricing TBD |
| Gemini 3.1 Pro (preview) | gemini-3.1-pro-preview | — | — | 1M | Next-gen Pro refresh — preview pricing TBD |
| gemini-flash-latest (alias) | gemini-flash-latest | — | — | 1M | Always-current stable Flash |
| gemini-flash-lite-latest (alias) | gemini-flash-lite-latest | — | — | 1M | Always-current stable Flash-Lite |
| gemini-pro-latest (alias) | gemini-pro-latest | — | — | 1M | Always-current stable Pro |
| Gemini Embedding 002 | gemini-embedding-002 | — | — | 8K | First natively multimodal embedding — text + images + video + audio, Matryoshka 3072/1536/768 dims (GA 2026-04-23) |
| Gemini Embedding 001 | gemini-embedding-001 | $0.15 | — | 8K | 3072-dim text embeddings — kept for index compatibility |
Groq
Open-weight models on sub-second LPU inference
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Llama 3.3 70B Versatile | groq-llama-3.3-70b | $0.59 | $0.79 | 128K | Sub-second 70B for voice + balanced tasks |
| Llama 3.1 8B Instant (Smooth default) | groq-llama-3.1-8b | $0.05 | $0.08 | 128K | Sub-300ms first-token — smooth-fast + smooth-judge primary, voice fast-router |
| Llama 4 Scout 17B | groq-llama-4-scout | $0.11 | $0.34 | 10M | 10M-context cheap retrieval |
| Llama 4 Maverick 17B | groq-llama-4-maverick | $0.50 | $0.77 | 1M | Frontier-class open-weight at speed |
| Kimi K2 (Groq host) | groq-kimi-k2 | $1.00 | $3.00 | 256K | Kimi K2 with Groq latency profile |
| GPT-OSS 120B | groq-gpt-oss-120b | $0.15 | $0.60 | 128K | Open-weight GPT — single-turn only |
| GPT-OSS 20B | groq-gpt-oss-20b | $0.10 | $0.30 | 128K | Cheap open-weight GPT — single-turn only |
| GPT-OSS Safeguard 20B | groq-gpt-oss-safeguard-20b | $0.10 | $0.30 | 128K | Safety-tuned open-weight GPT |
| Groq Compound (agentic) | groq-compound | — | — | 128K | Multi-tool agentic system — web search + code exec built in |
| Groq Compound Mini (agentic) | groq-compound-mini | — | — | 128K | Single-tool agentic — ~3× lower latency than full Compound |
DeepSeek
Frontier reasoning at rock-bottom per-token cost
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| DeepSeek V4-Flash (Smooth default) | deepseek-v4-flash | $0.14 | $0.28 | 1M | 1M context, dual Thinking/Non-Thinking modes |
| DeepSeek V4-Pro | deepseek-v4-pro | $0.43 | $0.87 | 1M | Pro-tier V4 — 75% intro discount through 2026-05-31 |
| deepseek-chat (legacy alias → V4-Flash) | deepseek-chat | $0.14 | $0.28 | 1M | Legacy alias — routes to V4-Flash; retiring 2026-07-24 |
| deepseek-reasoner (legacy alias → V4-Pro) | deepseek-reasoner | $0.43 | $0.87 | 1M | Legacy alias — routes to V4-Pro; retiring 2026-07-24 |
| DeepSeek V3.2 (aggregator) | deepseek-v3.2 | $0.27 | $1.10 | 128K | Aggregator-routed V3.2 — emergency failover only |
| DeepSeek R1 (aggregator) | deepseek-r1 | $0.55 | $2.19 | 64K | Aggregator-routed R1 reasoner — emergency failover only |
Moonshot
Kimi family — purpose-trained for agentic loops
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Kimi K2.6 | kimi-k2.6-direct | $0.95 | $4.00 | 262K | Current Kimi flagship — ties GPT-5.5 on SWE-Bench Pro, GA 2026-04-20 |
| Kimi K2-Thinking (Smooth fallback) | kimi-k2-thinking-direct | $0.60 | $2.50 | 256K | Deepest reasoner in the Kimi line |
| Kimi K2.5 | kimi-k2.5-direct | $0.60 | $2.50 | 256K | Prior general-purpose Kimi |
| Kimi K2-Thinking (aggregator) | kimi-k2-thinking | $0.60 | $2.50 | 256K | Aggregator-routed K2-Thinking — emergency failover only |
| Kimi K2.5 (aggregator) | kimi-k2.5 | $0.60 | $2.50 | 256K | Aggregator-routed K2.5 — emergency failover only |
Alibaba DashScope
Qwen family — 1M context at aggressive pricing
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Qwen 3.7 Max | qwen-3.7-max-direct | $2.50 | $7.50 | 1M | Current Qwen flagship — agent-first, 200 tok/s, SWE-Pro/Terminal-Bench tier winner (GA 2026-05-20) |
| Qwen 3.6 Plus | qwen-3.6-plus-direct | $0.33 | $1.95 | 1M | Generalist Qwen flagship — GA 2026-04-02, 1M context |
| Qwen3-Coder-Flash (Smooth default) | qwen3-coder-flash-direct | $0.30 | $1.50 | 1M | Bench-winning coder — 16/16 aider-polyglot PASS |
| Qwen3-Coder-Plus (Smooth fallback) | qwen3-coder-plus-direct | $1.00 | $5.00 | 1M | PR-review tuned for large diffs |
| Qwen3-235B-Thinking-2507 (Smooth fallback) | qwen3-235b-a22b-thinking-2507 | $0.13 | $0.60 | 262K | Cheapest thinking-mode reasoning |
| Qwen3-Coder-Plus (aggregator) | qwen3-coder-plus | $1.00 | $5.00 | 1M | Aggregator-routed Coder-Plus — emergency failover only |
| Qwen3-Coder-Flash (aggregator) | qwen3-coder-flash | $0.30 | $1.50 | 1M | Aggregator-routed Coder-Flash — emergency failover only |
Z.ai (Zhipu)
GLM family — SOTA on SWE-bench Pro
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| GLM 5.1 (Smooth fallback) | glm-5.1-direct | $0.60 | $2.20 | 200K | 58.4% SWE-Pro — coder-forward (GA 2026-04-07) |
| GLM 5 | glm-5-direct | $0.60 | $1.92 | 200K | Faster GLM (78 tok/s vs 5.1's 54) — GA 2026-02-11 |
| GLM 5.1 (aggregator) | glm-5.1 | $0.60 | $2.20 | 128K | Aggregator-routed GLM 5.1 — emergency failover only |
MiniMax
Cheapest frontier-class coder + reviewer
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| MiniMax M2.7 | minimax-m2.7-direct | $0.30 | $1.20 | 200K | Current MiniMax flagship — same price as M2 |
| MiniMax M2.7 Highspeed | minimax-m2.7-highspeed-direct | $0.60 | $2.40 | 200K | ~100 tps throughput-optimized variant |
| MiniMax M2 (Smooth default) | minimax-m2-direct | $0.30 | $1.20 | 200K | Frontier-class reviewer at $0.30 input |
| MiniMax M2.7 (aggregator) | minimax-m2.7 | $0.30 | $1.20 | 200K | Aggregator-routed M2.7 — emergency failover only |
| MiniMax M2.5 (aggregator) | minimax-m2.5 | $0.30 | $1.20 | 200K | Aggregator-routed M2.5 — emergency failover only |
ElevenLabs
TTS for the voice pipeline
| Model | ID | Input / MTok | Output / MTok | Context | Strength |
|---|---|---|---|---|---|
| Eleven Multilingual v2 | elevenlabs-tts | — | — | TTS | High-fidelity multilingual TTS |
Pricing is indicative — vendors change rates without notice. Routing + fallbacks auto-adjust against each provider's live pricing via LiteLLM. See API reference for the authoritative model list your key can hit right now.
Built for compliance
LLM data lives in an isolated database schema, invisible to the public REST API. RLS policies enforce that org members only see their org's keys + spend. No data leakage paths to design around.
Schema-isolated by design
Spend logs and key metadata live in a separate Postgres schema not exposed to PostgREST. The anon API key cannot reach LLM data — at all.
Org-scoped access
Row-level security: every read of a key or spend log enforces "your user is a member of this org" at the database layer. Defense in depth.
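The membership rule can be modeled in a few lines. This is only a sketch of the predicate in application code: the actual enforcement described above happens in Postgres row-level security policies, and the names and data shapes here (`visible_rows`, `MEMBERSHIPS`, `SPEND_ROWS`) are hypothetical.

```python
# Hypothetical membership pairs: (user_id, org_id).
MEMBERSHIPS = [("alice", "org-a"), ("bob", "org-b"), ("alice", "org-c")]

# Hypothetical spend-log rows.
SPEND_ROWS = [
    {"org_id": "org-a", "cost_usd": 1.5},
    {"org_id": "org-b", "cost_usd": 2.0},
    {"org_id": "org-c", "cost_usd": 0.5},
]


def visible_rows(user_id, memberships, spend_rows):
    """Return only the spend rows whose org the user belongs to."""
    user_orgs = {org_id for uid, org_id in memberships if uid == user_id}
    return [row for row in spend_rows if row["org_id"] in user_orgs]


print([row["org_id"] for row in visible_rows("bob", MEMBERSHIPS, SPEND_ROWS)])
# ['org-b']
```

Putting this check in the database rather than in application code means a bug in an API handler cannot widen what a user can read.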
Ready to put a number on your AI spend?
Free to provision a key and watch your dashboard. Bring your own usage — pay only for what you call.
Get started