Model pricing

There are two very different pricing worlds in AI tools: API pricing (pay per token) and app subscriptions (pay a flat monthly fee with usage limits).

On price trackers like models.dev and llm-prices.com, you’ll usually see these fields:

  • Input cost: what you pay for non-cached input tokens sent to the model.
  • Output cost: what you pay for tokens generated by the model.
  • Cache write cost: what you pay when the provider stores a prompt prefix in cache (so it can be reused later).
  • Cache read cost: what you pay when later requests reuse that cached prefix.

Simple mental model (each term is that category’s token count multiplied by its per-token rate):

total cost = input + output + cache write + cache read
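The formula above can be sketched in a few lines of Python. The per-million-token rates below are made-up example numbers, not any provider’s actual pricing; the structure, not the values, is the point.

```python
# Sketch of the total-cost formula with hypothetical rates.
RATES = {                 # USD per 1M tokens (illustrative, not real pricing)
    "input": 3.00,
    "output": 15.00,
    "cache_write": 3.75,  # cache writes often cost more than plain input
    "cache_read": 0.30,   # cache reads are typically much cheaper
}

def request_cost(tokens: dict) -> float:
    """tokens maps each pricing category to a token count for one request."""
    return sum(tokens[k] / 1_000_000 * RATES[k] for k in RATES)

# Example: 2k uncached input, 500 output, 8k cached prefix read back.
cost = request_cost({"input": 2_000, "output": 500,
                     "cache_write": 0, "cache_read": 8_000})
print(f"${cost:.6f}")
```

Note how the 8k cached tokens cost far less than they would as plain input; that asymmetry is why cache hit rate dominates cost in long sessions.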

If you’re integrating directly with an LLM API, lowering cost per request/session mostly means reducing the most expensive token categories:

  • Keep prompts stable at the top (system prompt, tool defs, long instructions) to maximize cache hits.
  • Move dynamic parts (timestamps, random IDs, volatile context) lower in the prompt so they don’t invalidate the cached prefix.
  • Cap output length when possible (max_tokens / equivalent).
  • Keep threads compact. Good cache hit rates help, but each turn still adds some uncached tail tokens, and cache entries can expire/prune over long sessions.
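The first two bullets can be sketched as a prompt-assembly function. `SYSTEM_PROMPT` and `TOOL_DEFS` are hypothetical placeholders, and the message shape is a generic chat format, not any specific provider’s API; the idea is only that the stable prefix stays byte-identical across requests while volatile data goes last.

```python
# Hedged sketch: cache-friendly prompt layout.
# Stable parts first (identical every request), volatile parts last.
SYSTEM_PROMPT = "You are a helpful assistant..."          # hypothetical
TOOL_DEFS = '[{"name": "search", "description": "..."}]'  # hypothetical

def build_messages(user_input: str, timestamp: str) -> list[dict]:
    return [
        # Stable prefix: byte-identical on every call -> cacheable.
        {"role": "system",
         "content": SYSTEM_PROMPT + "\n\nTools:\n" + TOOL_DEFS},
        # Volatile content goes at the end, so it never invalidates
        # the cached prefix above it.
        {"role": "user",
         "content": f"{user_input}\n\n(current time: {timestamp})"},
    ]

a = build_messages("Summarize today's logs", "2025-06-01T12:00:00Z")
b = build_messages("Summarize today's logs", "2025-06-01T12:05:00Z")
assert a[0] == b[0]   # shared prefix unchanged across requests
```

If the timestamp were interpolated into the system message instead, every request would start with a different prefix and the cache would never hit.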

If you’re using a coding agent, many of these optimizations are handled internally (prompt layout, caching, compaction). Your main cost levers are usually model choice and keeping tasks/threads scoped.

Subscriptions are different from API billing. You pay a monthly fee for usage inside a product, usually with fair-use limits or soft/hard caps. These plans do not include raw API credits for your own apps.

For most people, this is the cheapest way to get heavy day-to-day usage. The effective subscription-vs-API ratio can swing a lot as vendors change limits, model mixes, and pricing.


Token counts vary significantly by language. Polish, for example, requires roughly 40% more tokens than English for equivalent content — you can verify this yourself with Tiktokenizer.

  • Polish: “Jestem Adam i sprawdzam, ile tokenów potrzeba do zapisania tego tekstu. Czy język polski faktycznie wymaga ich więcej niż angielski?” (38 tokens)
  • English: “My name is Adam, and I’m checking how many tokens are needed to write this text. Does Polish actually require more tokens than English?” (27 tokens)

This is worth keeping in mind if you work in a non-English language — more tokens means higher cost and slower inference at scale.
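The cost impact is easy to put numbers on using the 38-vs-27 example above. The per-million-token input rate here is hypothetical; only the ratio matters.

```python
# Back-of-envelope: the language overhead scales input cost linearly.
english_tokens = 27   # token counts from the example above
polish_tokens = 38

overhead = polish_tokens / english_tokens - 1
print(f"~{overhead:.0%} more tokens")  # prints: ~41% more tokens

rate_per_million = 3.00  # hypothetical USD input rate per 1M tokens
for lang, tokens in [("en", english_tokens), ("pl", polish_tokens)]:
    # Cost of sending this snippet as input, per one million requests.
    cost = tokens / 1_000_000 * rate_per_million * 1_000_000
    print(f"{lang}: ${cost:.2f} per million requests")
```

The same multiplier applies to latency at scale, since decode and prefill time both grow with token count.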

It’s also a good occasion to debunk a myth that spread in Polish media in early 2025. Several articles claimed that Polish is the “best language for AI”, citing a real paper: “One ruler to measure them all: Benchmarking multilingual long-context language models” (Alveera Ahsan et al., 2025-03-03). Marzena Karpińska from Microsoft, a co-author of the paper, addressed this directly:

No. We didn’t study that at all. We created a tool for diagnosing language models, checking how well they are able to extract information from very long texts.

A neat example of a fake-news cycle born from real research — likely fueled by misplaced patriotism.