Claude API Flask Example

Build a Claude chatbot backend with Flask in Python. Covers SSE streaming with stream_with_context, a conversation-history endpoint, and CORS setup.


Flask is one of the most widely used Python web frameworks. This guide shows how to build a Claude chatbot backend with Flask, from a minimal synchronous call to full server-sent events (SSE) streaming.

Installation

pip install flask anthropic flask-cors

Minimal synchronous endpoint

from flask import Flask, request, jsonify
import anthropic

app = Flask(__name__)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

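@app.route("/chat", methods=["POST"])
def chat():
    # silent=True returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    user_message = data.get("message", "")
    if not user_message:
        return jsonify({"error": "message is required"}), 400

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_message}]
    )
    return jsonify({"reply": response.content[0].text})

if __name__ == "__main__":
    app.run(debug=True)  # development server only; never enable debug mode in production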

Test it:

curl -X POST http://localhost:5000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is prompt caching?"}'
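`response.content[0].text` assumes the first content block is text. That holds for a plain chat call, but once tools are enabled the content list can also contain `tool_use` blocks. A slightly sturdier sketch joins every text block instead; `reply_text` is a hypothetical helper name, and the stand-in objects only mimic the SDK's block shape so the check runs offline.

```python
from types import SimpleNamespace
from typing import Any


def reply_text(response: Any) -> str:
    """Concatenate the text of every "text" block in a Messages API response.

    Non-text blocks (for example "tool_use" when tools are enabled) have no
    .text attribute, so they are skipped rather than raising AttributeError.
    """
    return "".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )


# Offline check with stand-in objects shaped like SDK content blocks
fake_response = SimpleNamespace(content=[
    SimpleNamespace(type="text", text="Hello, "),
    SimpleNamespace(type="tool_use", id="toolu_1", name="lookup", input={}),
    SimpleNamespace(type="text", text="world."),
])
print(reply_text(fake_response))  # Hello, world.
```

In the route, `return jsonify({"reply": reply_text(response)})` would then replace the `[0]` lookup.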

Streaming with SSE (server-sent events)

from flask import Flask, request, Response, stream_with_context
import json
import anthropic

app = Flask(__name__)
client = anthropic.Anthropic()

@app.route("/chat/stream", methods=["POST"])
def chat_stream():
    user_message = request.json.get("message", "")

    def generate():
        with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": user_message}]
        ) as stream:
            for text in stream.text_stream:
                yield f"data: {json.dumps({'text': text})}\n\n"
        yield "data: [DONE]\n\n"

    return Response(
        stream_with_context(generate()),
        mimetype="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no"  # disable nginx buffering
        }
    )
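Once Flask has sent the `200` status and started streaming, an exception inside `generate()` can no longer become an error response; the client just sees the stream stop. One common pattern is to catch errors inside the generator and report them as an SSE event. The sketch below factors this into two hypothetical helpers, `format_sse` and `sse_from_text`, which work on any iterator of strings so they can be checked without calling the API.

```python
import json
import logging
from typing import Iterable, Iterator, Optional

log = logging.getLogger(__name__)


def format_sse(payload: dict, event: Optional[str] = None) -> str:
    """Serialize one server-sent event. json.dumps escapes newlines in the
    text, so a multi-line delta still fits on a single data: line."""
    frame = f"event: {event}\n" if event else ""
    return frame + f"data: {json.dumps(payload)}\n\n"


def sse_from_text(chunks: Iterable[str]) -> Iterator[str]:
    """Turn an iterator of text deltas (e.g. stream.text_stream) into SSE frames.

    Failures mid-stream become an "error" event, because the HTTP status
    has already been sent by the time the generator is running.
    """
    try:
        for text in chunks:
            yield format_sse({"text": text})
    except Exception:  # in a real route, prefer catching anthropic.APIError
        log.exception("Claude stream failed")  # keep details server-side
        yield format_sse({"error": "stream interrupted"}, event="error")
    yield "data: [DONE]\n\n"
```

Inside the route, the inner loop and the final `[DONE]` line would collapse to `yield from sse_from_text(stream.text_stream)`. Errors raised while `client.messages.stream(...)` is being opened happen outside this wrapper and need their own `try`/`except`, and the browser client needs a check for the `error` event.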

Consume the SSE stream in JavaScript

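// EventSource only issues GET requests, so it cannot call the POST-only
// /chat/stream route above; use fetch() and read the body stream instead.
// (Top-level await requires an ES module or an async function.)
const resp = await fetch('/chat/stream', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({message: 'Hello Claude'})
});
const reader = resp.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const {done, value} = await reader.read();
  if (done) break;
  // {stream: true} keeps multi-byte UTF-8 characters intact across chunks
  buffer += decoder.decode(value, {stream: true});
  // SSE events end with a blank line; keep any incomplete event in the buffer
  const events = buffer.split('\n\n');
  buffer = events.pop();
  for (const event of events) {
    if (!event.startsWith('data: ')) continue;
    const data = event.slice(6);
    if (data === '[DONE]') continue;
    const {text} = JSON.parse(data);
    document.getElementById('output').textContent += text;
  }
}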
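The same buffering logic is handy on the Python side, for example in an integration test or a small CLI client for `/chat/stream`. The sketch below is a hypothetical `iter_sse_text` generator: it takes raw byte chunks, decodes them incrementally so split UTF-8 characters survive, and only parses an event once its terminating blank line has arrived. It assumes the `data: {"text": ...}` and `data: [DONE]` framing used by the Flask route above.

```python
import codecs
import json
from typing import Iterable, Iterator


def iter_sse_text(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the "text" field of each SSE data event from raw byte chunks.

    Chunks may split an event (or a multi-byte UTF-8 character) anywhere,
    so bytes are decoded incrementally and events are buffered until the
    blank line that terminates them.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        # Everything before the last "\n\n" is complete; the tail may be partial
        *events, buffer = buffer.split("\n\n")
        for event in events:
            for line in event.split("\n"):
                if not line.startswith("data: "):
                    continue  # skip "event:" lines and comments
                data = line[len("data: "):]
                if data == "[DONE]":
                    return
                yield json.loads(data).get("text", "")
```

With the `requests` library, for instance, the chunks could come from `resp.iter_content(chunk_size=None)` on a request made with `stream=True`.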

Conversation history endpoint

from flask import Flask, request, jsonify, session
import anthropic

app = Flask(__name__)
app.secret_key = "change-me-in-production"  # load a random secret from the environment in production
client = anthropic.Anthropic()

@app.route("/chat/history", methods=["POST"])
def chat_with_history():
    user_message = request.json.get("message", "")
    # Build the request from a copy so a failed API call leaves the stored history untouched
    messages = session.get("history", []) + [{"role": "user", "content": user_message}]

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=messages
    )

    assistant_reply = response.content[0].text
    # Assigning a new list marks the session as modified. Flask's default session is a
    # signed cookie (browsers cap cookies at about 4 KB), so long conversations need a
    # server-side store such as Flask-Session.
    session["history"] = messages + [{"role": "assistant", "content": assistant_reply}]

    return jsonify({"reply": assistant_reply, "turns": len(session["history"]) // 2})

@app.route("/chat/reset", methods=["POST"])
def reset():
    session.pop("history", None)
    return jsonify({"ok": True})
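Because every call resends the whole history, request size and cost grow with each turn, and a cookie-based session fills up quickly. A simple mitigation is to keep only the most recent messages. `trim_history` below is a hypothetical helper and the 20-message default is an arbitrary assumption to tune; it also keeps the trimmed list opening on a user turn, as the original conversation did.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(history: List[Message], max_messages: int = 20) -> List[Message]:
    """Keep at most max_messages of the most recent messages.

    If the cut lands on an assistant reply, that reply is dropped too, so
    the trimmed conversation still starts with a user message.
    """
    if max_messages <= 0:
        return []
    trimmed = history[-max_messages:]
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

Apply it to the history list just before passing it to `client.messages.create(...)`. Summarizing older turns preserves more context than dropping them, at the cost of an extra API call.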

CORS setup for a separate frontend

from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
CORS(app, origins=["http://localhost:3000", "https://yourapp.com"])
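Hard-coding origins means editing code for each environment. A sketch that reads them from an environment variable instead; the `CORS_ORIGINS` variable name and the localhost fallback are assumptions, not flask-cors conventions.

```python
import os
from typing import List, Optional

DEFAULT_ORIGINS = ["http://localhost:3000"]  # assumed development fallback


def allowed_origins(raw: Optional[str] = None) -> List[str]:
    """Parse a comma-separated origin list (by default from CORS_ORIGINS).

    Blank entries and trailing slashes are dropped: browsers send the Origin
    header as scheme://host[:port] with no trailing slash, so an entry like
    "https://yourapp.com/" would never match.
    """
    if raw is None:
        raw = os.environ.get("CORS_ORIGINS", "")
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    return origins or list(DEFAULT_ORIGINS)
```

Then `CORS(app, origins=allowed_origins())` picks up, say, `CORS_ORIGINS="https://yourapp.com,https://staging.yourapp.com"` at startup.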

Flask vs FastAPI for Claude streaming

Criterion | Flask | FastAPI
--- | --- | ---
Streaming | stream_with_context + generator | StreamingResponse built-in
Async support | Requires flask[async] or gevent | Native async/await (ASGI)
Request validation | Manual or marshmallow | Pydantic (automatic)
Ecosystem fit | Flask-Login, Flask-SQLAlchemy, Flask-Admin | Pydantic models, SQLModel
Deploy | Gunicorn (WSGI) | Uvicorn (ASGI)
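On the Deploy row: with Gunicorn's default `sync` worker, each process handles one request at a time, and an SSE response holds its worker until Claude finishes generating. A minimal `gunicorn.conf.py` sketch using threaded workers follows; the worker and thread counts are starting-point assumptions to tune against your traffic.

```python
# gunicorn.conf.py: sketch for serving the streaming Flask app with Gunicorn.
# Gunicorn reads these module-level names as settings.
import multiprocessing

# (2 x CPU cores) + 1 is the starting point suggested in Gunicorn's docs
workers = multiprocessing.cpu_count() * 2 + 1

# "gthread" runs a thread pool inside each worker, so one slow SSE stream
# does not block other requests handled by the same process
worker_class = "gthread"
threads = 8  # concurrent requests per worker process; an assumption, tune for load
```

Start it with `gunicorn -c gunicorn.conf.py app:app`, assuming the Flask object is named `app` in `app.py`. If nginx sits in front, the `X-Accel-Buffering: no` header from the streaming route keeps it from buffering the events.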

Estimate token costs for your expected traffic at the Claude API Cost Calculator. For async patterns see the FastAPI streaming guide and async Python patterns.

Frequently asked questions

Can Flask handle Claude streaming responses?
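Yes. Wrap a generator that yields SSE-formatted strings in `stream_with_context`, which keeps the request context available while the response streams. The generator calls `client.messages.stream()` and yields each text delta. Flask's development server runs with `threaded=True` by default, so concurrent streaming requests don't block each other. In production, use threaded or async workers (for example Gunicorn's `gthread` or `gevent` worker classes), because each open stream occupies a worker for its full duration.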
Flask vs FastAPI for Claude chatbot backends: which should I choose?
FastAPI is often the better fit for new projects: native async/await, automatic request validation via Pydantic, and built-in streaming via `StreamingResponse`. Choose Flask if your existing stack is WSGI-based, you need Flask-Login or Flask-SQLAlchemy integrations, or your team knows Flask deeply. Both work well for Claude streaming.
How do I add CORS to a Flask Claude chatbot?
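Install flask-cors (`pip install flask-cors`) and call `CORS(app)` after creating your Flask app. In production, restrict the allowed origins, e.g. `CORS(app, origins=['https://yourfrontend.com'])`, so pages on other sites can't call your backend from a user's browser. CORS is enforced by browsers, so it does not block non-browser clients such as curl or other servers.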
How do I keep conversation history in a Flask Claude backend?
Maintain a list of `{role, content}` dicts per user and pass the full list as the `messages` parameter on each call. Flask's default session is a signed cookie limited to about 4 KB, so for real conversations use Flask-Session with a server-side store (Redis or filesystem). Claude's 200K-token context window holds roughly 150,000 English words of history (at about 0.75 words per token) before you need to trim or summarize older turns.
What's the minimal Flask setup to call the Claude API?
Install `flask` and `anthropic`. Create a POST route, call `client.messages.create(...)`, and return `response.content[0].text`. That's it: no async and no streaming, just a short synchronous route. Add streaming later with `stream_with_context` when response latency matters.

