Claude Prompt Caching Python Example (2026)

Claude Prompt Caching in Python

How to use Claude's prompt caching feature in Python. Cache large system prompts or documents and save up to 90% on repeated input tokens.

Prompt caching dramatically reduces costs for workloads that reuse large prompts — system prompts, documents, or conversation history.

How caching works

Token type	Sonnet 4.6 cost per 1M	vs standard input
Standard input	$3.00	baseline
cache_write	$3.75	+25% (one-time write cost)
cache_read	$0.30	−90% (every subsequent call)

Token type

Sonnet 4.6 cost per 1M

vs standard input

Standard input

$3.00

baseline

cache_write

$3.75

+25% (one-time write cost)

cache_read

$0.30

−90% (every subsequent call)

Cache a large system prompt

import anthropic client = anthropic.Anthropic() # Large document or system prompt — must be ≥2,048 tokens for Sonnet/Opus LARGE_CONTEXT = """ [Your large document, code base, or lengthy system prompt goes here. The content must be at least 2,048 tokens to qualify for caching. Typical use cases: legal documents, codebases, long instructions, RAG context.] """ * 20 # Repeat to hit the token threshold for this demo def ask(question: str) -> str: response = client.messages.create( model="claude-sonnet-4-6", max_tokens=512, system=[ { "type": "text", "text": LARGE_CONTEXT, "cache_control": {"type": "ephemeral"} # mark for caching } ], messages=[{"role": "user", "content": question}] ) usage = response.usage print(f"cache_read={getattr(usage, 'cache_read_input_tokens', 0)} " f"cache_write={getattr(usage, 'cache_creation_input_tokens', 0)}") return response.content[0].text # First call: cache_write (1.25× cost) print(ask("What are the main topics covered in this document?")) # Second call: cache_read (0.1× cost — 90% off) print(ask("Summarize section 3 of the document."))

Cache conversation history

messages = [ {"role": "user", "content": "Tell me about the Anthropic API."}, {"role": "assistant", "content": "The Anthropic API provides access to Claude..."}, # Mark the last turn for caching before adding new turns ] # Add cache_control to the last message you want cached messages[-1]["content"] = [ { "type": "text", "text": messages[-1]["content"], "cache_control": {"type": "ephemeral"} } ] # New user turn (not cached yet) messages.append({"role": "user", "content": "What are the pricing tiers?"}) response = client.messages.create( model="claude-sonnet-4-6", max_tokens=512, messages=messages )

Frequently asked questions

What is the minimum cache block size?

At least 1,024 tokens for Haiku 4.5, and at least 2,048 tokens for Sonnet and Opus models. Blocks smaller than this threshold are not cached and are billed at standard input rates.

How long does the cache last?

Cache entries expire after 5 minutes of inactivity (TTL resets on each cache hit). For long-running sessions, make sure to send a cache-read request within 5 minutes of the previous one.

Is cache_write more expensive than normal input?

Yes — cache_write is billed at 25% above the standard input token rate (1.25× per-token cost). But the first cache_write pays for itself within 2 subsequent cache_read calls, which are billed at 10% of the standard input rate.

Free tools

More examples