Instant token cost estimates for every major AI model. Free, no login required.
Large language models process text as tokens: small chunks of text, each about 0.75 words on average. When you send a prompt to an AI API, your input is split into tokens and processed by the model, and the response is generated as output tokens. Input and output tokens are billed separately.
Output tokens are more expensive than input tokens because generating new text requires significantly more computation than reading existing text. Most providers charge 3 to 5 times more per output token than per input token. This is why optimising your prompts to be concise and setting appropriate max_tokens limits can dramatically reduce your monthly bill.
To estimate your monthly cost, multiply your average input tokens per request by the input price per token, add your average output tokens per request multiplied by the output price per token, then multiply by your total monthly API calls. The calculator above does this maths instantly for every model, so you can compare costs before writing a single line of code.
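The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function name and the example prices are assumptions, and prices are expressed per 1M tokens, as providers usually quote them.

```python
def monthly_cost(input_tokens, output_tokens, requests_per_month,
                 input_price_per_1m, output_price_per_1m):
    """Estimate monthly API spend in dollars.

    Prices are quoted per 1M tokens, so each is divided by 1,000,000
    to get a per-token price.
    """
    input_cost = input_tokens * input_price_per_1m / 1_000_000
    output_cost = output_tokens * output_price_per_1m / 1_000_000
    per_request = input_cost + output_cost
    return per_request * requests_per_month


# Hypothetical model: $1.00 per 1M input tokens, $4.00 per 1M output tokens.
estimate = monthly_cost(
    input_tokens=1_000,
    output_tokens=500,
    requests_per_month=10_000,
    input_price_per_1m=1.00,
    output_price_per_1m=4.00,
)
print(f"${estimate:.2f} per month")  # $30.00 per month
```

In this example each request costs $0.003, and two thirds of that comes from output tokens even though the response is half the length of the prompt.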
A 1,000-word text is roughly 1,333 tokens. The general rule is 1 token ≈ 0.75 words, so divide your word count by 0.75 to get an approximate token count. This varies by language and content type: code tends to use more tokens per word than plain English.
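As a rough sketch, the rule of thumb translates directly into code. The helper name below is hypothetical, and the result is only an estimate; an exact count requires running the text through the provider's own tokenizer.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English prose, not an exact figure


def estimate_tokens(word_count, words_per_token=WORDS_PER_TOKEN):
    """Approximate a token count from a word count, rounded to the nearest token."""
    return round(word_count / words_per_token)


print(estimate_tokens(1_000))  # 1333
```

For code or non-English text, passing a lower `words_per_token` value gives a more conservative estimate.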
Generating tokens requires significantly more compute than reading them. When a model processes your input, it reads the tokens in parallel. When generating output, it produces tokens one at a time, each requiring a full forward pass through the neural network. This is why most providers charge 3 to 5 times more per output token.
Use a smaller model for simple tasks: small models such as gpt-4o-mini and claude-haiku-3.5 can cost over 90% less than flagship models and handle most tasks well. Cache common system prompts, batch requests where possible, and set appropriate max_tokens limits to avoid paying for unnecessarily long responses.
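To get a feel for how much these levers matter, the sketch below compares a baseline request against two optimised variants. All prices and token counts are made up for illustration and do not correspond to any real model. It also assumes the shorter response is actually shorter, not simply cut off by the limit.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_1m, output_price_per_1m):
    """Cost of one request in dollars, with prices quoted per 1M tokens."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000


def percent_saved(baseline, optimised):
    """Saving of `optimised` relative to `baseline`, as a percentage."""
    return 100 * (baseline - optimised) / baseline


# Hypothetical flagship model: $5.00 input / $20.00 output per 1M tokens.
flagship = request_cost(1_000, 800, 5.00, 20.00)
# Same model, with responses held to half the length.
capped = request_cost(1_000, 400, 5.00, 20.00)
# Hypothetical small model: $0.25 input / $1.00 output per 1M tokens.
small = request_cost(1_000, 400, 0.25, 1.00)

print(f"Shorter output: {percent_saved(flagship, capped):.1f}% saved")  # 38.1% saved
print(f"Smaller model:  {percent_saved(flagship, small):.1f}% saved")   # 96.9% saved
```

Under these assumed numbers, switching models dominates the saving, but halving the output alone still cuts the bill by more than a third because output tokens carry most of the cost.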
Prices are correct as of the build date, but AI providers update pricing regularly and sometimes cut prices significantly. Always verify at the provider's official pricing page before making financial decisions. We update this calculator periodically to reflect the latest changes.
The context window is the maximum number of tokens a model can process in a single request (input + output combined). You are only charged for the tokens you actually send and receive — not the full context window. Think of it like a bucket: the context window is the size of the bucket, and your tokens are the water you put in.
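The distinction between capacity and billing can be shown with a small sketch. It follows the definition given here (input and output share one window); the function names and the 128,000-token window are assumptions for illustration, and some providers also enforce a separate output limit that this does not model.

```python
def fits_in_context(input_tokens, max_output_tokens, context_window):
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def billable_tokens(input_tokens, output_tokens):
    """Tokens you pay for: only what was actually sent and received."""
    return input_tokens + output_tokens


CONTEXT_WINDOW = 128_000  # hypothetical window size

print(fits_in_context(120_000, 4_000, CONTEXT_WINDOW))  # True
print(fits_in_context(126_000, 4_000, CONTEXT_WINDOW))  # False
print(billable_tokens(2_000, 500))                      # 2500
```

The last call shows the bucket analogy in numbers: a 2,500-token exchange is billed as 2,500 tokens, no matter how large the window is.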