Model pricing
There are two very different pricing worlds in AI tools: API pricing (pay per token) and app subscriptions (pay a flat monthly fee with usage limits).
API pricing
On price trackers like models.dev and llm-prices.com, you’ll usually see these fields:
- Input cost: what you pay for non-cached input tokens sent to the model.
- Output cost: what you pay for tokens generated by the model.
- Cache write cost: what you pay when the provider stores a prompt prefix in cache (so it can be reused later).
- Cache read cost: what you pay when later requests reuse that cached prefix.
Simple mental model:
total cost = input + output + cache write + cache read

If you’re integrating directly with an LLM API, lowering cost per request/session mostly means reducing the most expensive token categories:
- Keep prompts stable at the top (system prompt, tool defs, long instructions) to maximize cache hits.
- Move dynamic parts (timestamps, random IDs, volatile context) lower in the prompt so they don’t invalidate the cached prefix.
- Cap output length when possible (`max_tokens` or equivalent).
- Keep threads compact. Good cache hit rates help, but each turn still adds some uncached tail tokens, and cache entries can expire or be pruned over long sessions.
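As a sketch, the cost formula above can be turned into a small per-request calculator. The per-million-token prices here are made-up placeholders, not any vendor’s real rates:

```python
# Hypothetical per-million-token prices (placeholders, not real vendor rates).
PRICES = {
    "input": 3.00,        # uncached input tokens
    "output": 15.00,      # generated tokens
    "cache_write": 3.75,  # storing a prompt prefix in cache
    "cache_read": 0.30,   # reusing a cached prefix
}

def request_cost(tokens: dict[str, int]) -> float:
    """Total dollar cost of one request, given token counts per category."""
    return sum(PRICES[kind] * count / 1_000_000 for kind, count in tokens.items())

# Example: a request that reuses a 10k-token cached prefix.
cost = request_cost({
    "input": 2_000,
    "output": 800,
    "cache_write": 0,
    "cache_read": 10_000,
})
```

Note how the 10,000 cached tokens cost a fraction of what they would as fresh input; that gap is why cache hit rate dominates session cost.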
If you’re using a coding agent, many of these optimizations are handled internally (prompt layout, caching, compaction). Your main cost levers are usually model choice and keeping tasks/threads scoped.
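The prompt-layout advice can be sketched as a request builder that keeps the cacheable prefix byte-identical across requests. The message structure below is illustrative, not any specific SDK’s API:

```python
# Long, unchanging text: ideal cache candidates (contents are placeholders).
STABLE_SYSTEM_PROMPT = "You are a helpful assistant. ...long instructions..."
TOOL_DEFS = "...tool definitions, also unchanging..."

def build_messages(user_input: str, timestamp: str) -> list[dict]:
    """Keep the stable prefix identical on every request so the provider can
    reuse its cache; put volatile data after the cacheable prefix."""
    return [
        # Stable prefix: byte-identical across requests -> eligible for cache hits.
        {"role": "system", "content": STABLE_SYSTEM_PROMPT + "\n" + TOOL_DEFS},
        # Volatile parts go last so they don't invalidate the cached prefix.
        {"role": "user", "content": f"Current time: {timestamp}\n{user_input}"},
    ]
```

Putting the timestamp at the top of the system prompt instead would change the prefix on every call and defeat caching entirely.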
Subscriptions
Subscriptions are different from API billing. You pay a monthly fee for usage inside a product, usually with fair-use limits or soft/hard caps. These plans do not include raw API credits for your own apps.
For most people, this is the cheapest way to get heavy day-to-day usage. The effective subscription-vs-API ratio can swing a lot as vendors change limits, model mixes, and pricing.
Common subscription options:
Language choice and token cost
Token counts vary significantly by language. Polish, for example, requires roughly 40% more tokens than English for equivalent content — you can verify this yourself with Tiktokenizer.
| Text | Tokens |
|---|---|
| “Jestem Adam i sprawdzam, ile tokenów potrzeba do zapisania tego tekstu. Czy język polski faktycznie wymaga ich więcej niż angielski?” | 38 |
| “My name is Adam, and I’m checking how many tokens are needed to write this text. Does Polish actually require more tokens than English?” | 27 |
This is worth keeping in mind if you work in a non-English language — more tokens means higher cost and slower inference at scale.
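The figure can be sanity-checked from the table’s own numbers (38 Polish tokens vs. 27 English tokens for equivalent sentences):

```python
# Token counts from the comparison table above.
polish_tokens = 38
english_tokens = 27

# Relative overhead of Polish over English, in percent.
overhead = (polish_tokens / english_tokens - 1) * 100
print(f"Polish uses about {overhead:.0f}% more tokens")  # about 41% more
```

That ~41% overhead applies to every request in that language, so it compounds directly into both cost and latency at scale.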
It’s also a good occasion to debunk a myth that spread in Polish media in early 2025. Several articles claimed that Polish is the “best language for AI”, citing a real paper: “One ruler to measure them all: Benchmarking multilingual long-context language models” (Alveera Ahsan et al., 2025-03-03). Marzena Karpińska from Microsoft, a co-author of the paper, addressed this directly:
> No. We didn’t study that at all. We created a tool for diagnosing language models, checking how well they are able to extract information from very long texts.
A neat example of a fake news cycle born from real research — likely fueled by misunderstood patriotism.