Complete Claude API pricing guide for 2026: all model prices, prompt caching savings, batch API discount, context window costs, and how to estimate your bill. With Python examples.
Claude API pricing is straightforward once you understand the three levers: model tier, prompt caching, and the Batch API. Here's everything you need to estimate and minimize your bill.
import anthropic

client = anthropic.Anthropic()

# 2,000-token system prompt cached across all user requests
SYSTEM = "You are a senior software engineer with 15 years of experience..." + " expert knowledge " * 400

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": SYSTEM,
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": "How do I handle database migrations in production?"}]
)

usage = response.usage
print(f"Cache read tokens: {usage.cache_read_input_tokens}")
print(f"Cache write tokens: {usage.cache_creation_input_tokens}")

# With 2,000 cached input tokens at $0.30/M vs $3/M:
# Savings per call: 2,000 × (3.00 - 0.30) / 1,000,000 = $0.0054
# At 100K calls/month: $540/month saved
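The usage counters printed above can also be turned into a per-call cost figure. The following is a minimal sketch, not an official formula: the helper name `call_cost` and the `SONNET_RATES` table are hypothetical, and the rates are simply the Sonnet figures quoted in this guide, hard-coded rather than fetched from anywhere. A `SimpleNamespace` stands in for `response.usage` so the example runs without an API key.

```python
from types import SimpleNamespace

# USD per million tokens for claude-sonnet-4-6, as quoted in this guide.
# These are assumptions for illustration; verify against the pricing page.
SONNET_RATES = {
    "input": 3.00,        # uncached input tokens
    "cache_write": 3.75,  # tokens written to the cache
    "cache_read": 0.30,   # tokens read from the cache
    "output": 15.00,
}

def call_cost(usage, rates=SONNET_RATES):
    """Return the USD cost of one call from its usage counters."""
    def count(name):
        # Missing or None counters are treated as zero.
        return getattr(usage, name, 0) or 0

    total = (
        count("input_tokens") * rates["input"]
        + count("cache_creation_input_tokens") * rates["cache_write"]
        + count("cache_read_input_tokens") * rates["cache_read"]
        + count("output_tokens") * rates["output"]
    )
    return total / 1_000_000

# Stand-in for response.usage on a cache hit (hypothetical numbers).
example_usage = SimpleNamespace(
    input_tokens=20,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=2000,
    output_tokens=500,
)
print(f"${call_cost(example_usage):.6f}")
```

In production, the same function could be called with the real `response.usage` and the result written to your logs next to the request ID.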
Cost comparison: models by task type
| Task | Typical tokens (in / out) | Haiku cost | Sonnet cost | Opus cost |
|---|---|---|---|---|
| Classify a tweet | 100 / 10 | $0.00012 | $0.00045 | $0.00225 |
| Summarize article | 800 / 200 | $0.0014 | $0.0054 | $0.027 |
| Code review 500 lines | 3K / 1K | $0.0064 | $0.024 | $0.12 |
| Analyze 50-page PDF | 40K / 2K | $0.04 | $0.15 | $0.75 |
| Analyze 150K-token codebase | 150K / 5K | $0.14 | $0.525 | $2.625 |
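The per-task figures follow from input tokens times the input price plus output tokens times the output price. As a rough sketch, the comparison can be reproduced with a few lines of Python; the `PRICES` and `TASKS` tables and the `task_cost` helper are hypothetical names, and the prices are the ones quoted in this guide rather than live values.

```python
# USD per million tokens as (input, output), as quoted in this guide.
# Assumed values for illustration; verify before relying on them.
PRICES = {
    "haiku": (0.80, 4.00),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}

# Typical (input_tokens, output_tokens) per task, taken from the comparison.
TASKS = {
    "Classify a tweet": (100, 10),
    "Summarize article": (800, 200),
    "Code review 500 lines": (3_000, 1_000),
    "Analyze 50-page PDF": (40_000, 2_000),
    "Analyze 150K-token codebase": (150_000, 5_000),
}

def task_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one call for the given model and token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

for task, (tokens_in, tokens_out) in TASKS.items():
    costs = "  ".join(
        f"{model}=${task_cost(model, tokens_in, tokens_out):.6f}" for model in PRICES
    )
    print(f"{task:<30}{costs}")
```

Swapping in your own token counts for `TASKS` gives a quick first estimate before any caching or batch discount is applied.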
Cost optimization checklist
Start with Haiku: Run your task on Haiku first; upgrade to Sonnet only if quality is insufficient. At the rates in this guide, Haiku costs about 73% less than Sonnet per token ($0.80/$4 vs $3/$15 per million).
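Enable prompt caching: Cache any long system prompt that repeats across many calls by adding "cache_control": {"type": "ephemeral"} to the content block. Prompts shorter than the model's minimum cacheable length (1,024 tokens on many models; check the prompt caching documentation for yours) are processed without caching.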
Use the Batch API: For analytics, document processing, or any task that doesn't need a real-time response, the Batch API cuts costs by 50% with no quality tradeoff.
Set max_tokens conservatively: You're charged for actual output tokens, not max_tokens. But a low max_tokens prevents runaway long responses on edge cases.
Log token usage: Always log response.usage in production. Unexpected token spikes are a common source of surprise bills.
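For the Batch API item in the checklist, a request batch is a list of entries, each with a `custom_id` and the same `params` you would pass to `messages.create`. The sketch below assumes the `anthropic` Python SDK's `client.messages.batches.create(requests=...)` call; the helper names `build_batch_requests` and `submit_batch`, the prompt wording, and the sample documents are hypothetical. The SDK is imported lazily so the payload builder can be run and inspected without an API key.

```python
def build_batch_requests(documents, model="claude-haiku-4-5", max_tokens=256):
    """Build one Batch API request entry per document."""
    return [
        {
            "custom_id": f"doc-{i}",  # used to match results back to inputs
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {
                        "role": "user",
                        "content": f"Summarize in two sentences:\n\n{text}",
                    }
                ],
            },
        }
        for i, text in enumerate(documents)
    ]

def submit_batch(requests):
    """Submit the batch; requires the anthropic package and ANTHROPIC_API_KEY."""
    import anthropic  # imported here so the builder works without the SDK

    client = anthropic.Anthropic()
    return client.messages.batches.create(requests=requests)

requests = build_batch_requests(["First article text", "Second article text"])
print(requests[0]["custom_id"], len(requests))
# batch = submit_batch(requests)  # uncomment to actually submit (billed)
```

Results arrive asynchronously, so this pattern suits offline jobs rather than anything a user is waiting on.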
As of 2026, per million input/output tokens: claude-haiku-4-5 is $0.80/$4, claude-sonnet-4-6 is $3/$15, and claude-opus-4-7 is $15/$75. With prompt caching, cache reads cost 90% less than regular input tokens. The Batch API cuts all prices by 50% for asynchronous workloads. Always verify current rates on the Anthropic pricing page, as prices change.
Does Anthropic have a free tier for the Claude API?
Anthropic does not advertise a permanent free tier for the production API. New accounts may get a small credit to test the API. For budget-conscious development, use claude-haiku-4-5 (cheapest model) and the Batch API (50% discount). Claude.ai (the web interface) has a free plan but it does not give API access.
What is prompt caching and how much does it save?
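Prompt caching stores frequently used context blocks (system prompts, documents, examples) server-side. On Sonnet, requests that hit the cache pay $0.30 per million cached tokens instead of the $3 base input price, a 90% reduction. Writing a block to the cache costs $3.75 per million tokens, 25% more than the base input price, so caching pays off once a cached block is read at least once. Output tokens are billed at the normal rate. Essential for production chatbots, RAG pipelines, and any app that reuses a long system prompt.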
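To see how cache writes and reads combine over a month, here is a small sketch. It assumes the Sonnet rates quoted in this guide ($3 base input, $3.75 cache write, $0.30 cache read, per million tokens); the function name `caching_savings` and the traffic numbers are hypothetical, and the model ignores output tokens and any uncached part of the prompt.

```python
def caching_savings(prompt_tokens, calls, cache_writes,
                    base_price=3.00, write_price=3.75, read_price=0.30):
    """Return USD saved on the cached prompt versus sending it uncached.

    prompt_tokens: size of the cached block in tokens
    calls:         total calls in the period
    cache_writes:  calls that had to (re)write the cache; the rest are reads
    Prices are USD per million tokens (assumed from this guide).
    """
    cache_reads = calls - cache_writes
    uncached = prompt_tokens * calls * base_price
    cached = prompt_tokens * (cache_writes * write_price + cache_reads * read_price)
    return (uncached - cached) / 1_000_000

# Hypothetical month: 100K calls, cache rewritten on 1% of them.
print(f"${caching_savings(2_000, 100_000, 1_000):.2f} saved")

# If every call misses the cache, caching costs more than not caching.
print(f"${caching_savings(2_000, 100, 100):.2f} saved")
```

The second call illustrates the downside: if traffic is too sparse for the cache to be hit, every request pays the write premium and the "saving" turns negative.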
How do I estimate my Claude API bill?
Multiply your expected input tokens by the input price and your output tokens by the output price, then add the two. As a rule of thumb, 1 token ≈ 0.75 words. A 1,000-word user message plus a 500-word system prompt ≈ 2,000 input tokens; a 500-word response ≈ 667 output tokens. With claude-sonnet-4-6: 2,000 input × $0.000003 = $0.006, plus 667 output × $0.000015 ≈ $0.010, for a total of about $0.016 per call. Use the Claude Cost Calculator for detailed estimates.
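The same arithmetic can be scripted for monthly planning. This is a rough sketch under the guide's assumptions: the 0.75 words-per-token rule of thumb, the Sonnet prices quoted here, and a 50% Batch API discount. The function names and the 10,000 calls per month figure are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from this guide; real ratios vary by text

def words_to_tokens(words):
    """Approximate a token count from a word count."""
    return round(words / WORDS_PER_TOKEN)

def estimate_bill(input_words, output_words, calls_per_month,
                  input_price=3.00, output_price=15.00, batch=False):
    """Return (cost_per_call, cost_per_month) in USD.

    Prices are USD per million tokens (Sonnet rates as quoted in this guide).
    batch=True applies the 50% Batch API discount.
    """
    tokens_in = words_to_tokens(input_words)
    tokens_out = words_to_tokens(output_words)
    per_call = (tokens_in * input_price + tokens_out * output_price) / 1_000_000
    if batch:
        per_call *= 0.5
    return per_call, per_call * calls_per_month

# 1,000-word message + 500-word system prompt in, 500-word response out.
per_call, monthly = estimate_bill(1_500, 500, calls_per_month=10_000)
print(f"per call: ${per_call:.4f}, per month: ${monthly:.2f}")
```

Word-based estimates are only a starting point; once you have real traffic, the token counts in `response.usage` give a more reliable figure.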
Is there a Claude API trial or sandbox?
New Anthropic accounts may receive a small amount of API credit for testing. The API has no separate sandbox: you call the production endpoint with a regular API key. For interactive prompt testing before integration, the Anthropic Console includes a Workbench; the Claude.ai web interface is a separate product and does not provide API access.