TL;DR
- Median CPMU varies 14× by feature class. A chatbot runs $0.45 per monthly active user. An agent / tool-use feature runs $6.40. Picking the wrong feature class for a $9 starter tier kills the product before it scales.
- The cheap model wins more often than the expensive one. Claude Haiku 4.5 and Gemini 2.5 Flash deliver 80%+ of frontier-model quality at 5–10% of the cost. For most SaaS feature classes, the higher-end model amounts to paying for capability the workload doesn't exercise.
- Prompt caching is the cheapest performance lever. Average cache hit rate of 28% on RAG features cuts CPMU by ~22%. Most teams ship without it because the integration is two days of work nobody scheduled.
AI features that look great in a demo can quietly destroy the unit economics of a SaaS product. The interesting question is not whether GPT-5 or Claude 4.5 is "better". It's whether the feature can be sold at $29/mo and still clear a healthy gross margin.
We pulled together a sample of 47 production AI SaaS deployments across eight feature classes — chatbots, RAG / doc Q&A, code generation, copy generation, OCR/PDF extraction, agentic tool-use, image generation, and voice / transcription. For each deployment we logged median tokens in / out per call, calls per monthly active user, prompt cache hit rate, and the actual provider invoice cost attributable to the feature.
Three original metrics anchor the analysis: Cost per Monthly User (CPMU), the Gross Margin at Cost (GMC) ratio across pricing tiers, and the Break-Even Usage Ratio (BUR), which answers the practical question of whether a feature's per-user cost fits inside a tier's COGS budget at all.
Methodology
The deployments in this study are not a random sample. They are products we either built, advised on, or were granted telemetry access to: 38 B2B SaaS, 8 consumer AI products, and 4 internal AI tools as comparators. Median MAU across the sample is 1,840.
Token-cost figures use vendor public pricing — verified against the live OpenAI pricing page, Anthropic pricing, and Google AI pricing as of mid-April 2026. Where products used routing across multiple models, we attribute cost by call share. Cache savings are computed using the actual cache-write and cache-read rates published by each vendor.
Finding 1: The 14× cost spread between feature classes is the headline number
The most consequential decision a founder makes is which feature class to build, not which model to call. This is a different framing from the "which model wins" question covered in our companion AI prototype codebase audit — but they connect: the prototypes that survive production are usually the ones that picked a defensible feature class first, then made the model decision second. Median CPMU ranges from $0.45 (chatbot / support) to $6.40 (agent / tool-use). The range is not subtle — agentic features cost roughly 14× more per user than basic chat. That difference compounds with scale.
Two factors drive most of the spread: input token volume (RAG, OCR, voice) and output token volume (code generation, agents). A code-gen feature averages 1,200 output tokens per call at $0.05 per thousand. Sixty calls per MAU per month means $3.60 in output cost alone, before any input. A chatbot averages 240 output tokens × 18 calls × $0.015 per thousand ≈ $0.06 per MAU on a frontier model. The difference is the shape of the workload, not the model price.
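To make that arithmetic concrete, here is a minimal sketch of the output-cost calculation; the per-thousand-token prices are the illustrative rates from the paragraph above, not a vendor quote:

```python
# Output-cost arithmetic for the two workload shapes above.
# Prices are illustrative per-1K-token output rates, not a vendor quote.

def output_cost_per_mau(tokens_out_per_call: int, calls_per_mau: int,
                        price_per_1k_out: float) -> float:
    """Output-token cost attributable to one monthly active user."""
    return tokens_out_per_call * calls_per_mau * price_per_1k_out / 1_000

# Code generation: 1,200 output tokens/call, 60 calls/MAU, $0.05 per 1K
print(output_cost_per_mau(1_200, 60, 0.05))    # -> 3.60

# Chatbot: 240 output tokens/call, 18 calls/MAU, $0.015 per 1K
print(output_cost_per_mau(240, 18, 0.015))     # -> ~0.065
```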
Finding 2: Frontier models are oversold for most SaaS workloads
The price-quality scatter shows the diminishing returns clearly. Quality scores above 90 cluster in a narrow band — GPT-5 (95), Claude Sonnet 4.5 (94), Gemini 2.5 Pro (92) — while prices range from $1.25 to $12.50 per million blended tokens. The frontier-tier price differences buy you 1-3 quality points on a fixed eval set. Below the frontier band, Claude Haiku 4.5 and Gemini 2.5 Flash sit at 78-84 quality for under $0.40 per million tokens — a 30× price advantage for an ~10-15% quality drop.
Whether that drop matters is workload-dependent. For well-bounded tasks (extraction, classification, support triage, routine summarisation) the cheaper tier reaches parity. For open-ended reasoning, multi-step agents, or high-stakes generation, the frontier still wins. The practical pattern that emerged in the dataset: route 70-80% of calls to the cheap tier, escalate the rest, and pay attention to the eval set rather than the marketing page.
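A minimal sketch of that route-then-escalate pattern, with stubs where your provider wrapper and eval signal would go; the model names and the is_good_enough() gate are placeholders, not a vendor API:

```python
# Route-then-escalate: cheap tier first, frontier only on failure.
# call_model() and is_good_enough() are stubs for your own provider
# wrapper and eval signal, not a real vendor API.

CHEAP = "cheap-tier"        # e.g. a Haiku/Flash-class model
FRONTIER = "frontier-tier"  # e.g. a GPT-5/Sonnet-class model

def call_model(model: str, prompt: str) -> str:
    return f"[{model}] answer to: {prompt}"  # stub: your SDK call here

def is_good_enough(response: str) -> bool:
    return bool(response.strip())  # stub: schema check, classifier, etc.

def answer(prompt: str) -> str:
    """Cheap tier first; escalate only when the eval gate fails."""
    response = call_model(CHEAP, prompt)
    if is_good_enough(response):
        return response  # in the sample, ~70-80% of calls end here
    return call_model(FRONTIER, prompt)

print(answer("Triage this support ticket."))
```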
Finding 3: Prompt caching is the cheapest CPMU lever you have
Across the RAG features in the sample, cache hit rates averaged 28%. On average, that translates to a 22% CPMU reduction — roughly $0.27 saved per MAU on a $1.20 baseline. Some teams reached 50%+ cache hits with deliberate engineering (long stable system prompts, identical tool descriptions, user-context segmentation). Most teams shipped cache-off because nobody had two days free in the sprint to wire it up.
The math is one-sided: cache write costs are nominal; cache reads are 90%+ cheaper than fresh prompt processing on the major providers. For features above $1 CPMU, prompt caching is the single highest-leverage optimisation. Below that, it's a nice-to-have.
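A minimal sketch of the savings arithmetic, assuming cached reads cost roughly 10% of fresh input processing (check your provider's actual rate card):

```python
# Cache-savings arithmetic: hits pay the discounted read rate,
# misses pay full price. The ~90% read discount is typical of the
# major providers as of this writing; verify against your vendor.

def cached_input_cost(base_input_cost: float, hit_rate: float,
                      read_discount: float = 0.90) -> float:
    """Input cost after caching, as a fraction of the uncached cost."""
    return base_input_cost * (hit_rate * (1 - read_discount) + (1 - hit_rate))

# RAG feature at the sample's average 28% hit rate:
print(cached_input_cost(1.00, 0.28))  # 0.748 -> ~25% off input cost
# Output tokens are untouched by caching, which is why the net CPMU
# reduction observed in the sample lands nearer 22% than 25%.
```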
How we score per-MAU economics
1. Cost per Monthly User (CPMU)
CPMU = Total feature spend ÷ Monthly active users
The single most useful number for unit-economic decisions. Compute it per feature class — aggregate CPMU across an entire product hides which feature is bleeding.
2. Gross Margin at Cost (GMC)
GMC = (Tier price − CPMU) ÷ Tier price
Per-tier gross margin contribution from the AI feature alone, ignoring other infra. Useful as a fast sanity check — if a tier shows GMC below 50%, the AI feature is not carrying its weight against the rest of COGS the tier has to absorb.
3. Break-Even Usage Ratio (BUR)
BUR = Tier COGS budget ÷ Feature CPMU
How many times the feature's per-user cost fits inside the tier's per-user COGS budget. BUR < 1 means even one average user costs more than the tier's COGS allows. Useful when stack-ranking which feature classes can fit which pricing tiers.
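The three definitions translate directly into code. A minimal sketch, with a $29 tier and a 20% COGS allocation as illustrative inputs:

```python
# The three unit-economics metrics, straight from the definitions above.

def cpmu(total_feature_spend: float, mau: int) -> float:
    """Cost per Monthly User: feature spend divided by active users."""
    return total_feature_spend / mau

def gmc(tier_price: float, feature_cpmu: float) -> float:
    """Gross Margin at Cost: per-tier margin from the AI feature alone."""
    return (tier_price - feature_cpmu) / tier_price

def bur(tier_cogs_budget: float, feature_cpmu: float) -> float:
    """Break-Even Usage Ratio: COGS headroom per user; < 1 means a loss."""
    return tier_cogs_budget / feature_cpmu

# Illustrative: a $29 tier, 20% of price budgeted to COGS, and the
# median doc-Q&A CPMU from the dataset table below.
price, feature = 29.0, 1.20
print(gmc(price, feature))          # ~0.959 -> 95.9% GMC
print(bur(price * 0.20, feature))   # ~4.83 -> fits the tier (BUR > 1)
```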
Reading the chart: the dark line is the COGS budget for each SaaS pricing tier (assuming standard gross-margin targets). The bars are median feature CPMU. Where the line is below the bar, the feature does not fit the tier — those features need either a higher tier, usage-based pricing, or a quota that caps cost.
Patterns we keep seeing in token-cost data
- Free-tier AI features at scale will lose money regardless of how cheap the model gets. The model price has dropped 30× in 18 months and a free tier is still a money-loser at meaningful MAU. If a free tier is the acquisition channel, set hard usage caps.
- Output tokens are the cost line that actually hurts. Output is 4-5× the per-token price of input on every major provider. A feature that shortens its average output by 30% saves more cost than a model downgrade at the same quality target (see the sketch after this list).
- Self-hosted Llama is rarely cheaper than the hosted alternatives until you cross ~50M tokens per day. Below that, the GPU and ops cost exceeds the API cost. Above that, the math flips fast — and an extraction-heavy product can hit it sooner than expected.
- Latency cost is real. Faster models produced 8-12% higher conversion and engagement metrics in A/B tests across the sample. Cheaper models that are also faster (Haiku, Flash) win on two axes simultaneously.
- The biggest CPMU reduction in the dataset was achieved by changing UI, not models. A doc-Q&A product that asked "is this what you meant?" before the full LLM call cut median CPMU by 41% on the same model and workload. The cheapest token is the one you don't send.
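Here is that output-length comparison worked through, using the code-gen workload shape from the dataset table below and illustrative prices with output at 5× input:

```python
# CPMU sensitivity to output length, using the code-gen workload shape
# from the dataset table and illustrative prices (output at 5x input).

IN_PRICE, OUT_PRICE = 0.01, 0.05   # $ per 1K tokens, illustrative

def call_cost(tokens_in: int, tokens_out: int) -> float:
    return (tokens_in * IN_PRICE + tokens_out * OUT_PRICE) / 1_000

calls_per_mau = 60
base = call_cost(4_000, 1_200) * calls_per_mau   # $6.00 per MAU
short = call_cost(4_000, 840) * calls_per_mau    # 30% shorter output: $4.92
print(f"{1 - short / base:.0%} CPMU saving, same model")  # 18%
```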
Recommendations
For founders pricing an AI SaaS
Compute CPMU before you set tier prices, not after. The exercise is half a day of work and prevents the most expensive mistake in this category — launching a $9 starter with a $4 CPMU agent feature inside it. Use the BUR framework above as a sanity check.
Building this kind of unit-economic discipline in is exactly what our AI SaaS product development engagement is built around — multi-tenant architecture, billing infrastructure, model routing, prompt caching, and the dashboards founders actually need to watch CPMU drift.
For founders adding an AI feature to existing SaaS
Treat the model layer as an internal API. Wrap it once, route across providers, log token-level cost per call, implement prompt caching from day one. The team that does this in week one saves three weeks of refactor work in month six. Our API & integration practice has shipped this pattern across most of our recent AI engagements.
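A minimal sketch of that wrap-once pattern; complete() is a stub for whichever provider SDK you route to, and the price table is something you maintain yourself, not live vendor pricing:

```python
# The model layer as an internal API: wrap once, log token-level cost
# per call. complete() is a stub for your provider SDK, and PRICES
# is a table you maintain, not live vendor pricing.
import time
from dataclasses import dataclass

PRICES = {  # $ per 1K tokens (input, output), illustrative
    "cheap-tier": (0.001, 0.005),
    "frontier-tier": (0.010, 0.050),
}

@dataclass
class CallRecord:
    model: str
    tokens_in: int
    tokens_out: int
    cost: float
    ts: float

COST_LOG: list[CallRecord] = []

def complete(model: str, prompt: str) -> tuple[str, int, int]:
    """Stub: call your provider here; return (text, tokens_in, tokens_out)."""
    return "stub response", len(prompt) // 4, 50

def llm(model: str, prompt: str) -> str:
    text, t_in, t_out = complete(model, prompt)
    p_in, p_out = PRICES[model]
    COST_LOG.append(CallRecord(model, t_in, t_out,
                               (t_in * p_in + t_out * p_out) / 1_000,
                               time.time()))
    return text

llm("cheap-tier", "Summarise this ticket.")
print(sum(r.cost for r in COST_LOG))  # spend so far, ready for CPMU roll-ups
```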
Limitations
The 47-product sample skews B2B SaaS. Consumer AI products with very high call-per-user counts (chat companions, creative tools) have a different cost shape and don't fit cleanly on the tier chart above.
Vendor pricing changes every few months. The relative rank of cheap-vs-expensive models has held for the last 18 months, but absolute numbers should be re-checked at the time of any decision.
The dataset, summarised
| Feature | Median CPMU | p90 CPMU | In tokens / call | Out tokens / call | Calls / MAU / mo | Cache hit |
|---|---|---|---|---|---|---|
| Chatbot / support | $0.45 | $1.80 | 1,200 | 240 | 18 | 32% |
| Doc Q&A (RAG) | $1.20 | $4.50 | 6,000 | 400 | 12 | 28% |
| Code generation | $3.80 | $14.00 | 4,000 | 1,200 | 60 | 18% |
| Email / copy generation | $0.65 | $2.40 | 800 | 600 | 22 | 12% |
| Data extraction (OCR/PDF) | $0.85 | $3.10 | 3,500 | 300 | 10 | 8% |
| Agent / tool-use | $6.40 | $28.00 | 8,000 | 1,800 | 35 | 22% |
| Image generation | $2.20 | $8.50 | 400 | 0 | 14 | 0% |
| Voice / transcription | $1.10 | $4.20 | 9,000 | 600 | 8 | 4% |
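To make the table actionable, a short BUR check of each feature class against a given tier; the 20% COGS allocation is an assumption, not a universal target:

```python
# BUR check of each feature class against a tier's COGS budget,
# using the median CPMU column above. The 20% COGS allocation is
# an assumption, not a universal target.

MEDIAN_CPMU = {
    "chatbot": 0.45, "rag": 1.20, "codegen": 3.80, "copy": 0.65,
    "extraction": 0.85, "agent": 6.40, "image": 2.20, "voice": 1.10,
}

def tier_fit(tier_price: float, cogs_share: float = 0.20) -> dict[str, bool]:
    budget = tier_price * cogs_share
    return {feature: budget / c >= 1 for feature, c in MEDIAN_CPMU.items()}

# A $9 starter leaves a $1.80 budget: code-gen, agent, and image
# features all land below BUR 1 -- exactly the TL;DR failure mode.
print(tier_fit(9.0))
```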
The decision that moves CPMU more than the model choice
The number that matters is CPMU by feature class, not the model price card. Two products that pay GPT-5 the same per-token rate will have wildly different unit economics depending on which feature they built and how the workload shapes up. Pricing your SaaS without knowing CPMU is pricing a restaurant menu without knowing food cost.
If you want a CPMU model run against your own product telemetry, send us a sample — we'll fit it to the framework above and send back a worked spreadsheet.

About the author
Ritesh — Founding Partner, Appycodes
Ritesh leads engineering at Appycodes. Most of the cost data behind this study is drawn from AI SaaS engagements he advised on directly — including a doc-Q&A product that cut CPMU 41% by changing its UI before changing its model, and a code-gen feature that crossed over from frontier-tier to Haiku tier mid-quarter without a measurable drop in user satisfaction.
