Claude API WebSocket Streaming in Python

Build real-time Claude API streaming over WebSocket in Python (2026). Uses FastAPI WebSockets + AsyncAnthropic to push tokens to the browser as they arrive.

The Anthropic Python SDK's streaming API returns tokens as they are generated, making it ideal for real-time chat UIs. Pairing it with FastAPI WebSockets lets you push each token to the browser the instant it arrives — no polling needed.

Installation

pip install anthropic fastapi uvicorn websockets

Backend: FastAPI WebSocket + AsyncAnthropic

import asyncio
import anthropic
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
import json

app = FastAPI()
client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from env

@app.websocket("/ws/chat")
async def chat_ws(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            # Receive message from browser
            data = await websocket.receive_text()
            payload = json.loads(data)
            user_message = payload.get("message", "")

            # Stream Claude response token by token
            async with client.messages.stream(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[{"role": "user", "content": user_message}]
            ) as stream:
                async for text in stream.text_stream:
                    await websocket.send_text(
                        json.dumps({"type": "delta", "text": text})
                    )

                # Signal completion; read usage while the stream context is still open
                final = await stream.get_final_message()
                await websocket.send_text(json.dumps({
                    "type": "done",
                    "input_tokens": final.usage.input_tokens,
                    "output_tokens": final.usage.output_tokens
                }))

    except WebSocketDisconnect:
        pass  # Client disconnected — SDK cleans up the Anthropic HTTP stream
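
The handler above passes whatever the browser sends straight to `json.loads`, so a malformed frame would raise and end the connection, and an empty prompt is likely to be rejected by the API. A minimal sketch of a validation step is shown below; the helper name `parse_client_message` is hypothetical and not part of FastAPI or the Anthropic SDK.

```python
import json
from typing import Optional


def parse_client_message(raw: str) -> Optional[str]:
    """Return the prompt text from a client frame, or None if it is unusable.

    Hypothetical helper: assumes frames look like {"message": "..."}.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON at all
    if not isinstance(payload, dict):
        return None  # JSON, but not an object
    message = payload.get("message")
    if not isinstance(message, str) or not message.strip():
        return None  # missing, wrong type, or blank
    return message
```

In the endpoint, a `None` result could be answered with an error frame (for example `{"type": "error"}`, also an assumption) followed by `continue`, so one bad frame does not close the socket.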

Start the server

ANTHROPIC_API_KEY=sk-ant-... uvicorn app:app --host 0.0.0.0 --port 8000
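
Once the server is running, the endpoint can be smoke-tested without a browser. The sketch below uses the `websockets` package from the installation step and assumes the server listens on `ws://localhost:8000/ws/chat` with the delta/done frames shown above; `render_frame` and `ask` are illustrative names.

```python
import asyncio
import json
from typing import Tuple


def render_frame(raw: str) -> Tuple[str, bool]:
    """Turn one server frame into (text_to_print, is_done)."""
    msg = json.loads(raw)
    if msg.get("type") == "delta":
        return msg.get("text", ""), False
    if msg.get("type") == "done":
        return "", True
    return "", False  # ignore frame types this client does not know


async def ask(prompt: str, url: str = "ws://localhost:8000/ws/chat") -> str:
    # Imported lazily so render_frame can be used without the dependency.
    import websockets

    parts = []
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({"message": prompt}))
        async for raw in ws:
            text, done = render_frame(raw)
            parts.append(text)
            print(text, end="", flush=True)  # show tokens as they arrive
            if done:
                break
    return "".join(parts)


# Usage (requires the server to be running):
# asyncio.run(ask("Explain streaming in one paragraph"))
```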

Browser client (vanilla JS)

const ws = new WebSocket("ws://localhost:8000/ws/chat");
const output = document.getElementById("output");

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "delta") {
    output.textContent += msg.text;   // append token as it arrives
  } else if (msg.type === "done") {
    console.log("Used " + msg.input_tokens + " input / " + msg.output_tokens + " output tokens");
  }
};

function sendMessage(text) {
  output.textContent = "";           // clear previous response
  ws.send(JSON.stringify({ message: text }));
}

// Example: sendMessage("Explain streaming in one paragraph")

Multi-turn conversation history

@app.websocket("/ws/chat/multi")
async def multi_turn_ws(websocket: WebSocket):
    await websocket.accept()
    history = []  # persisted per-connection

    try:
        while True:
            data = await websocket.receive_text()
            payload = json.loads(data)
            user_message = payload.get("message", "")

            # Append user turn
            history.append({"role": "user", "content": user_message})

            # Collect full assistant reply while streaming
            assistant_text = ""
            async with client.messages.stream(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=history
            ) as stream:
                async for text in stream.text_stream:
                    assistant_text += text
                    await websocket.send_text(
                        json.dumps({"type": "delta", "text": text})
                    )

            # Append assistant turn to history for next round
            history.append({"role": "assistant", "content": assistant_text})
            await websocket.send_text(json.dumps({"type": "done"}))

    except WebSocketDisconnect:
        pass
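
Because the full `history` list is sent on every turn, it grows for as long as the connection stays open. One possible way to cap it is sketched below. `trim_history` and `max_messages` are hypothetical names, and dropping old turns is only one of several strategies; it assumes the conversation should always open with a user turn.

```python
from typing import Dict, List


def trim_history(history: List[Dict[str, str]], max_messages: int = 20) -> List[Dict[str, str]]:
    """Return the most recent messages, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    recent = history[-max_messages:]
    # Drop leading assistant turns so the list still opens with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent
```

In the handler this could be applied as `messages=trim_history(history)` while the untrimmed list stays in memory.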

Cancellable streaming

@app.websocket("/ws/chat/cancellable")
async def cancellable_ws(websocket: WebSocket):
    await websocket.accept()
    cancel_event = asyncio.Event()
    prompts: asyncio.Queue = asyncio.Queue()

    async def reader():
        # Single reader: only this task calls receive_text(), so prompts and
        # cancel messages never compete for the same frame
        try:
            while True:
                payload = json.loads(await websocket.receive_text())
                if payload.get("type") == "cancel":
                    cancel_event.set()
                else:
                    await prompts.put(payload)
        except (WebSocketDisconnect, json.JSONDecodeError):
            pass
        finally:
            cancel_event.set()        # stop any in-flight stream
            prompts.put_nowait(None)  # sentinel: no more prompts

    reader_task = asyncio.create_task(reader())

    try:
        while True:
            payload = await prompts.get()
            if payload is None:
                break  # reader has stopped
            cancel_event.clear()
            cancelled = False

            async with client.messages.stream(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[{"role": "user", "content": payload.get("message", "")}]
            ) as stream:
                async for text in stream.text_stream:
                    if cancel_event.is_set():
                        cancelled = True
                        break  # leaving the block closes the upstream request
                    await websocket.send_text(json.dumps({"type": "delta", "text": text}))

            if reader_task.done():
                break  # client is gone; nobody left to notify
            await websocket.send_text(json.dumps({"type": "done", "cancelled": cancelled}))

    except WebSocketDisconnect:
        pass
    finally:
        reader_task.cancel()

SSE vs WebSocket comparison

| Feature | Server-Sent Events (SSE) | WebSocket |
|---|---|---|
| Direction | Server → client only | Bidirectional |
| Cancel mid-stream | Client closes connection | Send cancel message |
| Browser support | All modern browsers | All modern browsers |
| Proxy / CDN | Works with most CDNs | Requires WS-aware proxy |
| Best for | Static display, summaries | Interactive chat, multi-turn |
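
To make the comparison concrete, the same delta/done protocol can be framed as Server-Sent Events. The sketch below uses only the standard library and accepts any async iterator of text chunks; `sse_frame` and `sse_frames` are illustrative names, not part of FastAPI or the Anthropic SDK.

```python
import asyncio
import json
from typing import AsyncIterator


def sse_frame(payload: dict) -> str:
    """Format one SSE frame: a `data:` line followed by a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


async def sse_frames(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap text chunks in the delta/done protocol used over WebSocket."""
    async for text in chunks:
        yield sse_frame({"type": "delta", "text": text})
    yield sse_frame({"type": "done"})
```

In FastAPI, a generator like this could be returned through `StreamingResponse(..., media_type="text/event-stream")`, with the Anthropic stream opened inside the generator so it stays alive while the response is being sent. With SSE the client cancels by closing the connection, which is the trade-off listed in the table.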

For per-request cost calculations using the token counts returned in the done event, use the Claude API Cost Calculator. For the pure HTTP streaming pattern without WebSockets, see the streaming guide.
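
The arithmetic behind a per-request estimate is simple enough to sketch here. The function name and the rates in the example are placeholders rather than actual pricing; substitute the current per-million-token rates for the model in use.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_mtok: float,
    output_rate_per_mtok: float,
) -> float:
    """Estimate the cost of one request from per-million-token rates."""
    return (
        input_tokens * input_rate_per_mtok
        + output_tokens * output_rate_per_mtok
    ) / 1_000_000


# Example with a `done` frame and placeholder rates (NOT real prices):
done = {"type": "done", "input_tokens": 1200, "output_tokens": 800}
cost = estimate_cost_usd(done["input_tokens"], done["output_tokens"], 2.0, 10.0)
print(f"Estimated cost: ${cost:.4f}")
```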

Frequently asked questions

Can I stream Claude API responses over WebSocket?
Yes. Use `AsyncAnthropic`, open the stream with `async with client.messages.stream(...) as stream:`, and iterate over `stream.text_stream`. Each item is a text delta of one or a few tokens. Push each delta to the WebSocket client with `await websocket.send_text(...)`, and the browser receives text as fast as Claude generates it.
What is the difference between SSE and WebSocket for Claude streaming?
Server-Sent Events (SSE) is one-directional (server → browser) and simpler to set up. WebSocket is bidirectional, so the client can interrupt the stream mid-generation. For a chat UI where users can cancel, WebSocket wins. For a static display, SSE is simpler.
How do I cancel a Claude stream mid-generation over WebSocket?
Listen for a `cancel` message from the client. In the stream loop, check a cancellation flag and `break` out of the iteration; leaving the `async with` block closes the HTTP stream to Anthropic. Then send a final `{"type": "done", "cancelled": true}` message.
How do I handle WebSocket disconnects while Claude is streaming?
Wrap the stream loop in a try/except for FastAPI's `WebSocketDisconnect` (or `websockets.exceptions.ConnectionClosed` if you serve with the `websockets` library directly). When the exception propagates out of the `async with` block, the SDK closes the open HTTP connection to Anthropic.
Does AsyncAnthropic work with FastAPI WebSocket endpoints?
Yes. FastAPI's `@app.websocket` handler is an async function, and `AsyncAnthropic` is fully async-compatible. Use `async with client.messages.stream(...) as stream:` and `async for text in stream.text_stream:` to push tokens.
