Claude API Streaming in Python

How to stream Claude API responses in Python using the Anthropic SDK. Print tokens as they arrive instead of waiting for the full response.

Streaming lets you display Claude's response incrementally as text chunks arrive, which reduces perceived latency for long outputs such as code generation or document drafts.

Basic streaming with text_stream

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a 200-word product description for a noise-cancelling headset."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # newline after stream ends
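
A common follow-up is keeping the full text while still printing it live, and checking how quickly the first chunk shows up. The sketch below is one way to do that, assuming only that the input is an iterable of strings such as `stream.text_stream`. The names `print_and_collect` and `fake_text_stream` are hypothetical helpers, not part of the SDK, and the fake stream exists only so the example runs without an API key.

```python
import sys
import time
from typing import Iterable, Iterator, Optional, Tuple


def print_and_collect(chunks: Iterable[str], out=sys.stdout) -> Tuple[str, Optional[float]]:
    """Write each chunk as it arrives; return (full_text, seconds_to_first_chunk)."""
    start = time.perf_counter()
    first_chunk_at: Optional[float] = None
    parts = []
    for chunk in chunks:
        if first_chunk_at is None:
            # Record how long the caller waited before seeing any output
            first_chunk_at = time.perf_counter() - start
        out.write(chunk)
        out.flush()
        parts.append(chunk)
    out.write("\n")  # newline after the stream ends
    return "".join(parts), first_chunk_at


def fake_text_stream() -> Iterator[str]:
    """Hypothetical stand-in for stream.text_stream so the sketch runs offline."""
    for chunk in ["Stream", "ing ", "works", "."]:
        time.sleep(0.01)  # simulate network delay between chunks
        yield chunk


full_text, seconds_to_first_chunk = print_and_collect(fake_text_stream())
```

With a real request, you would pass `stream.text_stream` inside the `with` block in place of `fake_text_stream()`.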

Get usage after streaming

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the history of Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Once the text has finished streaming, fetch the final message with usage
    final = stream.get_final_message()
    print(f"\n\nTokens: {final.usage.input_tokens} in / {final.usage.output_tokens} out")
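
The token counts in `final.usage` can be turned into a rough cost figure. The sketch below assumes prices are quoted in US dollars per million tokens. The function name `estimate_cost_usd` is hypothetical and the two example prices are placeholders, not real rates, so substitute the published prices for the model you use.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
) -> float:
    """Estimate the cost of one request from token counts and per-million-token prices."""
    total = input_tokens * input_usd_per_mtok + output_tokens * output_usd_per_mtok
    return total / 1_000_000


# Placeholder prices for illustration only; look up the real rates for your model.
EXAMPLE_INPUT_PRICE = 3.00
EXAMPLE_OUTPUT_PRICE = 15.00

cost = estimate_cost_usd(1_200, 450, EXAMPLE_INPUT_PRICE, EXAMPLE_OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.6f}")
```

In a real session you would pass `final.usage.input_tokens` and `final.usage.output_tokens` as the first two arguments.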

Async streaming

import asyncio
import anthropic

client = anthropic.AsyncAnthropic()

async def main():
    async with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a haiku about async programming."}]
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)

asyncio.run(main())
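
One reason to reach for the async client is to consume several streams at once on a single event loop. The sketch below shows that pattern with `asyncio.gather`. It uses a hypothetical async generator, `fake_async_text_stream`, as a stand-in for the async `stream.text_stream` so it runs without network access; `collect` and `run_two` are likewise illustrative names.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_async_text_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for the async stream.text_stream."""
    for chunk in ["Await ", "the ", "chunks."]:
        await asyncio.sleep(0.01)  # simulate network delay between chunks
        yield chunk


async def collect(chunks: AsyncIterator[str]) -> str:
    """Consume an async iterator of text chunks and return the joined text."""
    parts: List[str] = []
    async for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


async def run_two() -> List[str]:
    # gather() drives both consumers concurrently, so their waits overlap
    return await asyncio.gather(
        collect(fake_async_text_stream()),
        collect(fake_async_text_stream()),
    )


results = asyncio.run(run_two())
print(results)
```

With the real SDK, each `collect` call would sit inside its own `async with client.messages.stream(...)` block.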

With system prompt

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system="You are an expert Python developer. Write clean, commented code.",
    messages=[{"role": "user", "content": "Write a function that parses ISO 8601 dates."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Use the Claude Cost Calculator to measure token costs for streaming sessions. For Node.js streaming, see the streaming Node.js example.

Frequently asked questions

Does streaming cost more than non-streaming?
No. Streaming and non-streaming calls are billed at identical per-token rates. Streaming only affects how the response is delivered: tokens arrive incrementally instead of all at once.

What is the difference between stream() and text_stream?
`client.messages.stream()` returns a context manager; the stream object it yields can be iterated for individual events and provides a `.get_final_message()` method. `text_stream` is a convenience iterator on that stream object that yields just the text deltas. Use `text_stream` for simple CLI output; iterate over the stream's events when you need more control.

How do I get usage data when streaming?
Call `.get_final_message()` inside the `with` block once the text has finished streaming. It returns the complete `Message` object, including `usage.input_tokens` and `usage.output_tokens`.
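
To illustrate what "just the text deltas" means, the sketch below filters a simulated event sequence down to its text. The event type names follow the Messages streaming format (`content_block_delta` events carrying a `text_delta`), but the plain dictionaries are a simplification: the SDK yields typed objects, so treat this as a model of the idea and not as SDK code. The function name `text_deltas` is hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator


def text_deltas(events: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield only the text carried by content_block_delta events."""
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # skip start/stop and metadata events
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta["text"]


# Simulated events; the dictionaries are simplified stand-ins for real stream events.
simulated_events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "usage": {"output_tokens": 3}},
    {"type": "message_stop"},
]

text = "".join(text_deltas(simulated_events))
print(text)
```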
