Model pricing
AI tools usually price usage in one of two ways: API pricing, where you pay per token, and app subscriptions, where you pay a flat monthly fee with usage limits.
API pricing
Price trackers like models.dev and llm-prices.com usually show these fields:
- Input cost: what you pay for non-cached input tokens sent to the model.
- Output cost: what you pay for tokens generated by the model.
- Cache write cost: what you pay when the provider stores a prompt prefix in cache (so it can be reused later).
- Cache read cost: what you pay when later requests reuse that cached prefix.
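Here is a minimal sketch of how these four fields combine into the price of a single request. It assumes prices are quoted per million tokens, which is how trackers commonly list them, but check the units on the one you use. The `Prices`/`Usage` names and the example prices are hypothetical and not taken from any specific provider.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Prices:
    """Hypothetical USD prices per 1M tokens for each billing category."""
    input: float        # non-cached input tokens
    output: float       # tokens generated by the model
    cache_write: float  # tokens stored as a cached prefix
    cache_read: float   # tokens reused from the cache


@dataclass(frozen=True)
class Usage:
    """Token counts for one request, split by billing category."""
    input_tokens: int   # non-cached input only; some APIs may fold cached tokens into their input count
    output_tokens: int
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0


def request_cost(usage: Usage, prices: Prices) -> float:
    """Sum each category's token count times its per-token price."""
    return (
        usage.input_tokens * prices.input
        + usage.output_tokens * prices.output
        + usage.cache_write_tokens * prices.cache_write
        + usage.cache_read_tokens * prices.cache_read
    ) / PER_MILLION


if __name__ == "__main__":
    prices = Prices(input=3.0, output=15.0, cache_write=3.75, cache_read=0.30)
    # First request: a 10k-token prefix is written to the cache.
    first = Usage(input_tokens=500, output_tokens=800, cache_write_tokens=10_000)
    # Follow-up: the same prefix is read back from the cache.
    follow_up = Usage(input_tokens=500, output_tokens=800, cache_read_tokens=10_000)
    print(f"first:     ${request_cost(first, prices):.4f}")
    print(f"follow-up: ${request_cost(follow_up, prices):.4f}")
```

With these made-up prices, the first request costs about $0.051 and the follow-up about $0.0165, because the 10,000 prefix tokens are billed at the cache-read rate the second time. Sending all 10,500 input tokens uncached would cost about $0.0435. So under these prices, a cache write costs slightly more than plain input and only pays off when the prefix is actually reused.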
Simple mental model:
total cost = input + output + cache write + cache read

Each term is the number of tokens in that category multiplied by its per-token price. If you’re integrating directly with an LLM API, lowering cost per request or session mostly means cutting the most expensive token categories:
- Keep prompts stable at the top (system prompt, tool definitions, long instructions) to maximize cache hits.
- Move dynamic parts (timestamps, random IDs, volatile context) lower in the prompt so they don’t invalidate the cached prefix.
- Cap output length when possible (`max_tokens` or equivalent).
- Keep threads compact. Good cache hit rates help, but each turn still adds some uncached tail tokens, and cache entries can expire or be pruned over long sessions.
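Prompt caching generally works on exact prefixes, so a single changed token near the top invalidates everything after it. The sketch below illustrates this with characters instead of tokens. That is a simplification: real providers compare tokens and may require a minimum prefix length before caching anything. `STABLE_PREFIX` and the `build_prompt_*` functions are hypothetical names used for illustration only.

```python
import os

# Hypothetical stand-in for a long, stable block: system prompt, tool definitions, instructions.
STABLE_PREFIX = "You are a careful coding assistant. Follow the project style guide.\n" * 40


def build_prompt_volatile_first(timestamp: str, question: str) -> str:
    # Dynamic data at the top: every new timestamp changes the prefix.
    return f"Current time: {timestamp}\n{STABLE_PREFIX}{question}"


def build_prompt_stable_first(timestamp: str, question: str) -> str:
    # Stable block first, dynamic data after it.
    return f"{STABLE_PREFIX}Current time: {timestamp}\n{question}"


def reusable_prefix_len(previous: str, current: str) -> int:
    """Leading characters shared by two prompts (a rough proxy for cacheable tokens)."""
    return len(os.path.commonprefix([previous, current]))


if __name__ == "__main__":
    for build in (build_prompt_volatile_first, build_prompt_stable_first):
        a = build("2025-03-01T10:00:00", "Why does this test fail?")
        b = build("2025-03-01T10:05:00", "Can you refactor this function?")
        print(f"{build.__name__}: {reusable_prefix_len(a, b)} of {len(b)} chars reusable")
```

When the timestamp comes first, only the few characters before the minutes differ can be reused. When the stable block comes first, the whole block stays shared and only the short tail changes between requests.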
If you’re using a coding agent, many of these optimizations are handled internally, including prompt layout, caching, and compaction. Your main cost levers are usually model choice and keeping tasks and threads scoped.
Subscriptions
Subscriptions are different from API billing. You pay a monthly fee for usage inside a product, usually with fair-use limits or soft or hard caps. These plans do not include raw API credits for your own apps.
For most people, this is the cheapest way to get heavy day-to-day usage. How much API usage a subscription is effectively worth (its subscription-to-API ratio) can swing a lot, though, as vendors change limits, model mixes, and pricing.
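To estimate that ratio for your own usage, you can price a month of usage at API rates and divide the result by the subscription fee. This is a rough sketch: the fee, token volumes, and per-million-token prices below are made-up placeholders, and the result only holds while you stay within the plan's limits.

```python
def api_equivalent_cost(tokens_millions: dict[str, float], price_per_million: dict[str, float]) -> float:
    """What the same usage would cost at per-token API rates (raises KeyError if a price is missing)."""
    return sum(tokens_millions[kind] * price_per_million[kind] for kind in tokens_millions)


def value_ratio(api_cost: float, monthly_fee: float) -> float:
    """Above 1.0 means the subscription is cheaper than paying API rates for this usage."""
    return api_cost / monthly_fee


if __name__ == "__main__":
    # Hypothetical month of usage, in millions of tokens per category.
    usage = {"input": 5.0, "output": 1.0, "cache_read": 40.0}
    # Hypothetical USD prices per 1M tokens.
    prices = {"input": 3.0, "output": 15.0, "cache_read": 0.30}
    fee = 20.0  # hypothetical monthly subscription fee in USD

    api_cost = api_equivalent_cost(usage, prices)
    print(f"API-equivalent cost: ${api_cost:.2f}, ratio: {value_ratio(api_cost, fee):.2f}x")
```

With these placeholder numbers, the same usage would cost $42 at API rates, a ratio of 2.1x against a $20 plan. If the vendor tightens limits so you can only use half as much, the ratio halves as well.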
Common subscription options:
Language choice and token cost
Token counts vary significantly by language. Polish, for example, requires roughly 40% more tokens than English for equivalent content. You can verify this yourself with Tiktokenizer.
| Language | Text | Tokens |
|---|---|---|
| Polish | “Jestem Adam i sprawdzam, ile tokenów potrzeba do zapisania tego tekstu. Czy język polski faktycznie wymaga ich więcej niż angielski?” | 38 |
| English | “My name is Adam, and I’m checking how many tokens are needed to write this text. Does Polish actually require more tokens than English?” | 27 |
This is worth keeping in mind if you work in a language other than English: more tokens mean higher cost and, at scale, slower inference.
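Because API pricing is per token, a token overhead translates directly into cost. Below is a quick sketch using the two counts from the table. The $100 baseline is a hypothetical figure, and one sentence pair is a small sample, so measure your own content before relying on the ratio.

```python
# Token counts from the Polish/English example in the table.
POLISH_TOKENS = 38
ENGLISH_TOKENS = 27


def token_overhead(tokens: int, baseline_tokens: int) -> float:
    """Extra tokens relative to the baseline, as a fraction (0.40 means 40% more)."""
    return tokens / baseline_tokens - 1


def scaled_cost(baseline_cost: float, tokens: int, baseline_tokens: int) -> float:
    """With per-token pricing, cost scales linearly with token count."""
    return baseline_cost * tokens / baseline_tokens


if __name__ == "__main__":
    overhead = token_overhead(POLISH_TOKENS, ENGLISH_TOKENS)
    print(f"Polish overhead: {overhead:.0%}")
    # Hypothetical: a workload that costs $100/month in English.
    print(f"Same workload in Polish: ${scaled_cost(100.0, POLISH_TOKENS, ENGLISH_TOKENS):.2f}")
```

This gives an overhead of about 41% and roughly $140.74 for the Polish version of a hypothetical $100 English workload. That figure assumes input and output tokens both grow by the same ratio.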
This is also a good place to debunk a myth that spread in Polish media in early 2025. Several articles claimed that Polish is the “best language for AI”, citing a real paper, “One ruler to measure them all: Benchmarking multilingual long-context language models” (Alveera Ahsan et al., 2025-03-03). Marzena Karpińska from Microsoft, a co-author of the paper, addressed this directly:
> No. We didn’t study that at all. We created a tool for diagnosing language models, checking how well they are able to extract information from very long texts.
A neat example of a fake news cycle born from real research, likely fueled by misunderstood patriotism.

