Claude API Streaming in Python

How to stream Claude API responses in Python using the Anthropic SDK. Print tokens as they arrive instead of waiting for the full response.

Streaming lets you display Claude's response token-by-token, improving perceived latency for long outputs like code generation or document drafts.

Basic streaming with text_stream

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a 200-word product description for a noise-cancelling headset."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # newline after stream ends
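
If you also need the complete text once streaming finishes (for logging or post-processing), one option is to accumulate the chunks while printing them. The sketch below uses a hypothetical helper name, `print_and_collect`, which is not part of the SDK. It accepts any iterable of strings, so `stream.text_stream` could be passed in place of the stand-in list used here.

```python
import sys
from typing import Iterable, TextIO


def print_and_collect(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each chunk as it arrives and return the full text.

    Hypothetical helper (not part of the Anthropic SDK). `chunks` can be
    `stream.text_stream` or any other iterable of strings.
    """
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show partial output immediately
        parts.append(chunk)
    out.write("\n")  # newline after the stream ends
    return "".join(parts)


# Stand-in for stream.text_stream so the sketch runs without an API key.
fake_chunks = ["Noise-", "cancelling ", "headset"]
full_text = print_and_collect(fake_chunks)
```

Inside a real `with client.messages.stream(...) as stream:` block, the call would presumably become `full_text = print_and_collect(stream.text_stream)`.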

Get usage after streaming

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the history of Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Once the text stream is exhausted, get the final message with usage
    final = stream.get_final_message()
    print(f"\n\nTokens: {final.usage.input_tokens} in / {final.usage.output_tokens} out")
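
If the usage line is needed in more than one place, it may be convenient to wrap the formatting in a small function. This is a minimal sketch: `format_usage` is a hypothetical helper, and the `SimpleNamespace` object with made-up numbers only stands in for `final.usage` so the example runs offline.

```python
from types import SimpleNamespace


def format_usage(usage) -> str:
    """Return a one-line summary of token usage.

    Hypothetical helper; `usage` is any object with `input_tokens` and
    `output_tokens` attributes, such as `final.usage` above.
    """
    total = usage.input_tokens + usage.output_tokens
    return f"Tokens: {usage.input_tokens} in / {usage.output_tokens} out ({total} total)"


# Stand-in for final.usage, with made-up numbers, so the sketch runs offline.
fake_usage = SimpleNamespace(input_tokens=14, output_tokens=212)
summary = format_usage(fake_usage)
print(summary)
```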

Async streaming

import asyncio
import anthropic

client = anthropic.AsyncAnthropic()

async def main():
    async with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a haiku about async programming."}]
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)

asyncio.run(main())
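
The same consume-and-collect pattern can be tried without network access by swapping in an async generator. In this sketch, `collect_async` and `fake_text_stream` are hypothetical names. The generator only imitates the shape of `stream.text_stream` (an async iterator of strings) and does not call the API.

```python
import asyncio
from typing import AsyncIterator


async def collect_async(chunks: AsyncIterator[str]) -> str:
    """Print each chunk as it arrives and return the joined text."""
    parts = []
    async for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # newline after the stream ends
    return "".join(parts)


async def fake_text_stream() -> AsyncIterator[str]:
    """Stand-in for stream.text_stream; yields fixed chunks."""
    for chunk in ["Await the ", "future, ", "calmly."]:
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield chunk


result = asyncio.run(collect_async(fake_text_stream()))
```

With the real client, `collect_async(stream.text_stream)` would be awaited inside the `async with` block shown above.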

With system prompt

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system="You are an expert Python developer. Write clean, commented code.",
    messages=[{"role": "user", "content": "Write a function that parses ISO 8601 dates."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Use the Claude Cost Calculator to measure token costs for streaming sessions. For Node.js streaming, see the streaming Node.js example.

Frequently asked questions

Does streaming cost more than non-streaming?

No. Streaming and non-streaming calls are billed at identical per-token rates. Streaming only affects how the response is delivered — tokens arrive incrementally instead of all at once.

What is the difference between stream() and text_stream?

`client.messages.stream()` returns a context manager. The stream object it yields can be iterated for individual events and has a `.get_final_message()` method. `text_stream` is an attribute of that stream object: a convenience iterator that yields just the text deltas. Use `text_stream` for simple CLI output; iterate the stream itself when you need more control.

How do I get usage data when streaming?

Call `.get_final_message()` on the stream once its text has been consumed, inside the `with` block as in the usage example above. It returns the complete `Message` object, including `usage.input_tokens` and `usage.output_tokens`.
