LLM Pricing Calculator in Python (Claude, OpenAI, Gemini)

Build an LLM Pricing Calculator in Python

Python code to calculate and compare API costs across Claude, GPT-4o, and Gemini for any prompt or workload. Includes a free interactive tool.

Picking the cheapest LLM API for your workload requires comparing token prices AND factoring in caching. This Python module handles the math for all major providers.

Provider price table (2026)

Model	Input $/M	Output $/M	Cached read $/M
Claude Haiku 4.5	$0.80	$4.00	$0.08
Claude Sonnet 4.6	$3.00	$15.00	$0.30
Claude Opus 4.7	$15.00	$75.00	$1.50
GPT-4o	$2.50	$10.00	$1.25
GPT-4o-mini	$0.15	$0.60	$0.075
Gemini 2.0 Flash	$0.10	$0.40	—
Gemini 1.5 Pro	$1.25	$5.00	—

Model

Input $/M

Output $/M

Cached read $/M

Claude Haiku 4.5

$0.80

$4.00

$0.08

Claude Sonnet 4.6

$3.00

$15.00

$0.30

Claude Opus 4.7

$15.00

$75.00

$1.50

GPT-4o

$2.50

$10.00

$1.25

GPT-4o-mini

$0.15

$0.60

$0.075

Gemini 2.0 Flash

$0.10

$0.40

—

Gemini 1.5 Pro

$1.25

$5.00

—

Universal cost calculator

from dataclasses import dataclass from typing import Optional @dataclass class ModelPricing: name: str input_per_m: float # $ per million input tokens output_per_m: float # $ per million output tokens cache_read_per_m: Optional[float] = None # None = no caching MODELS = { "claude-haiku": ModelPricing("Claude Haiku 4.5", 0.80, 4.00, 0.08), "claude-sonnet": ModelPricing("Claude Sonnet 4.6", 3.00, 15.00, 0.30), "claude-opus": ModelPricing("Claude Opus 4.7", 15.00, 75.00, 1.50), "gpt-4o": ModelPricing("GPT-4o", 2.50, 10.00, 1.25), "gpt-4o-mini": ModelPricing("GPT-4o-mini", 0.15, 0.60, 0.075), "gemini-flash": ModelPricing("Gemini 2.0 Flash", 0.10, 0.40), "gemini-pro": ModelPricing("Gemini 1.5 Pro", 1.25, 5.00), } def calculate_cost( model_key: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0, ) -> dict: p = MODELS[model_key] fresh_input = input_tokens - cached_tokens cost = (fresh_input / 1_000_000) * p.input_per_m cost += (output_tokens / 1_000_000) * p.output_per_m if cached_tokens > 0 and p.cache_read_per_m: cost += (cached_tokens / 1_000_000) * p.cache_read_per_m return { "model": p.name, "total_cost_usd": round(cost, 6), "cost_per_1k_calls_usd": round(cost * 1000, 4), } # Compare all models for a typical chatbot turn # 8K system prompt (cached after first call), 200 user tokens, 500 output tokens for key in MODELS: result = calculate_cost(key, input_tokens=8200, output_tokens=500, cached_tokens=8000) print(f"{result['model']:25s} ${result['total_cost_usd']:.6f}/call ${result['cost_per_1k_calls_usd']:.3f}/1K calls")

Get real token counts from the Claude API

import anthropic client = anthropic.Anthropic() response = client.messages.create( model="claude-sonnet-4-6", max_tokens=512, system="You are a helpful assistant.", messages=[{"role": "user", "content": "What is prompt caching?"}] ) # Real token usage from the response input_tok = response.usage.input_tokens output_tok = response.usage.output_tokens cache_read = getattr(response.usage, "cache_read_input_tokens", 0) cost = calculate_cost("claude-sonnet", input_tok, output_tok, cache_read) print(f"This call cost: ${cost['total_cost_usd']:.6f}") print(f"At 10K calls/day: ${cost['total_cost_usd'] * 10000:.2f}/day")

Monthly cost projection for common workloads

def monthly_cost_projection(model_key, calls_per_day, avg_input_tokens, avg_output_tokens, cached_tokens=0): single_call = calculate_cost(model_key, avg_input_tokens, avg_output_tokens, cached_tokens) daily = single_call["total_cost_usd"] * calls_per_day monthly = daily * 30 return {"daily_usd": round(daily, 2), "monthly_usd": round(monthly, 2)} # Chatbot: 5K calls/day, 10K system prompt (cached), 200 user tokens, 500 output tokens for key, label in [("claude-haiku","Haiku"), ("gpt-4o-mini","GPT-4o-mini"), ("gemini-flash","Gemini Flash")]: p = monthly_cost_projection(key, calls_per_day=5000, avg_input_tokens=10200, avg_output_tokens=500, cached_tokens=10000) print(f"{label:15s} ${p['daily_usd']:7.2f}/day ${p['monthly_usd']:8.2f}/month")

Interactive no-code alternative

Want to paste a prompt and instantly compare costs across Claude, GPT-4o, and Gemini without writing code? Use the LLM Prompt Pricing Calculator — it counts tokens and shows the exact cost per provider in real time.

Frequently asked questions

How do I calculate LLM API cost in Python?

Multiply input tokens by the input price per million, add output tokens multiplied by output price per million. Use `response.usage.input_tokens` and `response.usage.output_tokens` from the API response for exact token counts.

Which LLM API is cheapest in 2026?

Gemini 2.0 Flash (~$0.10/$0.40 per M tokens) and GPT-4o-mini ($0.15/$0.60) have the lowest base rates. Claude Haiku ($0.80/$4) is higher per-token but supports prompt caching, making it cheaper for apps with long repeated system prompts (up to 90% savings).

Is there a free tool to compare LLM prompt costs interactively?

Yes — the LLM Prompt Pricing Calculator at prompt-pricing.vercel.app lets you paste a prompt and compare costs across Claude, GPT-4o, and Gemini instantly, with no code required.

Does Claude's prompt caching change the cost comparison?

Yes significantly. With caching, Claude Haiku's effective read cost drops to $0.08/M tokens (90% discount). For a chatbot with a 10K-token system prompt, caching can save $540/month vs GPT-4o-mini at 100K calls/month.

How do I get token counts from Claude responses?

Access `response.usage.input_tokens` and `response.usage.output_tokens` on any `Message` object returned by the Anthropic SDK. For streaming, use `message_start` event's `usage` field.

Free tools

More examples