Build a Chatbot with Claude in Python

Step-by-step guide to building a Python chatbot with the Claude API. Terminal chatbot, Flask web chatbot, streaming, multi-persona, and context window management — all with working code.

Building a chatbot with the Claude API requires two things: maintaining conversation history client-side and choosing the right context-management strategy. This guide walks through five practical patterns, from a terminal prototype to a streaming Flask web chatbot.

1. Minimal terminal chatbot

import anthropic

client = anthropic.Anthropic()
history = []

SYSTEM = "You are a helpful assistant."

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=SYSTEM,
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

if __name__ == "__main__":
    print("Chatbot ready. Type 'exit' to quit.")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "exit":
            break
        print(f"Claude: {chat(user_input)}")

2. Sliding-window context management

Claude Sonnet 4.6 has a 200K token window, but long histories increase latency and cost. Keep the last N turns:

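MAX_TURNS = 20  # keep at most the last 20 user/assistant pairs (40 messages)

def trim_history(history: list) -> list:
    """Return the most recent messages, capped at MAX_TURNS turns."""
    # Each turn = 2 messages (user + assistant)
    max_messages = MAX_TURNS * 2
    trimmed = history[-max_messages:]
    # chat() appends the new user message before trimming, so a plain slice
    # of a long history begins with an assistant reply. Drop it so the
    # window opens on a user message and the roles keep alternating.
    if trimmed and trimmed[0]["role"] == "assistant":
        trimmed = trimmed[1:]
    return trimmed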

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    trimmed = trim_history(history)
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=SYSTEM,
        messages=trimmed,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
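
A turn count is only a proxy for size, since a few very long messages can cost more than many short ones. As a token-aware variant, the sketch below drops the oldest turns once the previous request's reported input size nears a budget. The names `trim_by_usage`, `TOKEN_BUDGET`, and `DROP_TURNS` are hypothetical, and the threshold values are assumptions to tune rather than recommendations.

```python
TOKEN_BUDGET = 150_000  # assumed threshold, kept below the 200K window
DROP_TURNS = 5          # hypothetical: number of oldest turns to drop at once


def trim_by_usage(history: list, last_input_tokens: int) -> list:
    """Drop the oldest turns once the previous request neared the budget."""
    if last_input_tokens < TOKEN_BUDGET:
        return history
    trimmed = history[DROP_TURNS * 2:]
    # Keep the window opening on a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


# Demo with a synthetic 15-turn history (no API call needed).
demo = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(30)
]
print(len(trim_by_usage(demo, 1_000)))    # under budget: nothing dropped
print(len(trim_by_usage(demo, 160_000)))  # over budget: oldest 5 turns dropped
```

In `chat()`, the value to pass would be `response.usage.input_tokens` from the previous call, stored alongside the history.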

3. Streaming chatbot responses

import anthropic

client = anthropic.Anthropic()

def chat_stream(history: list, user_message: str, system: str = "You are a helpful assistant."):
    history.append({"role": "user", "content": user_message})
    full_reply = ""
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system,
        messages=history,
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            full_reply += text
    print()  # newline after streamed output
    history.append({"role": "assistant", "content": full_reply})
    return full_reply

4. Flask web chatbot with SSE streaming

from flask import Flask, request, Response, stream_with_context
import anthropic, json

app = Flask(__name__)
client = anthropic.Anthropic()

@app.route("/chat", methods=["POST"])
def chat():
    data = request.get_json()
    history = data.get("history", [])
    user_message = data["message"]
    system = data.get("system", "You are a helpful assistant.")
    history.append({"role": "user", "content": user_message})

    def generate():
        with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=system,
            messages=history,
        ) as stream:
            for text in stream.text_stream:
                yield f"data: {json.dumps({'delta': text})}\n\n"
        yield "data: [DONE]\n\n"

    return Response(
        stream_with_context(generate()),
        mimetype="text/event-stream",
        headers={"X-Accel-Buffering": "no"},  # disable Nginx buffering
    )

if __name__ == "__main__":
    app.run(debug=True)
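
On the consuming side, a client has to split the event stream back into text deltas. The following is a minimal sketch of that parsing step, assuming the `data: {"delta": ...}` and `data: [DONE]` framing used by the `/chat` endpoint above. The helper name `iter_deltas` is hypothetical, and the sample lines stand in for a real HTTP response.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from SSE lines shaped like the /chat output."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel sent by the server
        yield json.loads(payload)["delta"]


# Stand-in for the lines a streaming HTTP client would receive.
sample = [
    'data: {"delta": "Hel"}',
    "",
    'data: {"delta": "lo"}',
    "",
    "data: [DONE]",
    "",
]
reply = "".join(iter_deltas(sample))
print(reply)  # Hello
```

With the `requests` library, the lines could come from a POST made with `stream=True` and read through `response.iter_lines(decode_unicode=True)`. In a browser, the equivalent is reading the `fetch` response body incrementally.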

5. Multi-persona chatbot

PERSONAS = {
    "support": "You are a friendly customer support agent for Acme Corp. Be concise and solution-focused. Escalate if you cannot resolve the issue.",
    "coding":  "You are an expert Python developer. Provide working code with explanations. Prefer standard library solutions.",
    "tutor":   "You are a patient math tutor. Explain concepts step-by-step using simple language. Use examples.",
}

sessions = {}  # session_id → {"history": [...], "persona": str}

def get_reply(session_id: str, user_message: str, persona: str = "support") -> str:
    if session_id not in sessions:
        sessions[session_id] = {"history": [], "persona": persona}
    session = sessions[session_id]
    session["history"].append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=PERSONAS[session["persona"]],
        messages=session["history"][-39:],  # new user message + last 19 full turns; an odd count keeps the window starting on a user message
    )
    reply = response.content[0].text
    session["history"].append({"role": "assistant", "content": reply})
    return reply
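
One gap in `get_reply` is that an unknown persona name raises a `KeyError` at `PERSONAS[...]`, which matters if the name comes from user input. A small guard could normalize the name and fall back to a default. This is a sketch only: `resolve_persona` is a hypothetical helper, and the shortened `PERSONAS` dict is a stand-in for the full one above.

```python
# Stand-in for the PERSONAS dict defined above (prompts shortened).
PERSONAS = {
    "support": "You are a friendly customer support agent.",
    "coding": "You are an expert Python developer.",
    "tutor": "You are a patient math tutor.",
}


def resolve_persona(name: str, default: str = "support") -> str:
    """Return a valid persona key, falling back to `default` if unknown."""
    key = name.strip().lower()
    return key if key in PERSONAS else default


print(resolve_persona(" Coding "))  # coding
print(resolve_persona("pirate"))    # support
```

Calling `resolve_persona(persona)` before the session is created would keep invalid names out of `sessions`. Note that `get_reply` only reads `persona` when a session is first created, so switching persona mid-conversation needs a new session ID.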

Model comparison for chatbots

| Model | Cost (input / output per 1M tokens) | Best for |
|---|---|---|
| claude-haiku-4-5 | $0.80 / $4 | FAQ bots, simple customer service, high-volume chat |
| claude-sonnet-4-6 | $3 / $15 | General-purpose chatbots, coding assistants, support escalation |
| claude-opus-4-7 | $15 / $75 | Complex reasoning, legal/medical Q&A, multi-step agent tasks |

To estimate chatbot costs before launch, use the Claude API Cost Calculator. For multi-turn conversation patterns and token counting, see the conversation history guide.
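
For a quick back-of-the-envelope figure, the arithmetic can be sketched directly. The prices below are copied from the comparison in this guide and may not match current pricing. The function name `estimate_cost` and the traffic numbers are hypothetical.

```python
# (input, output) USD per 1M tokens, as listed in this guide.
PRICES = {
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-7": (15.00, 75.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token volumes."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical month: 50M input tokens and 10M output tokens on Sonnet.
monthly = estimate_cost("claude-sonnet-4-6", 50_000_000, 10_000_000)
print(f"${monthly:.2f}")  # $300.00
```

When estimating input volume, remember that every request resends the conversation so far. Input tokens per conversation therefore grow faster than the number of turns until a sliding window caps them.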

Frequently asked questions

How do I maintain conversation history with the Claude API?
Build a list of message dicts and append each user/assistant turn before every API call. Pass the full list as `messages=`. Claude has no server-side session memory — all context must be sent in each request.
What happens when the conversation history gets too long?
Claude Sonnet 4.6 supports a 200K token context window (~150K words). For typical chatbots, implement a sliding window: keep only the last N turns. A common default is 20 turns (40 messages). For production, track `usage.input_tokens` on each response and drop the oldest turns as the count approaches 150K.
Can I give my Claude chatbot a persistent persona?
Yes — put the persona in the `system` parameter. It persists for the entire conversation without consuming message slots. Change it per-session to support multi-persona chatbots (e.g., a customer service bot vs. a coding assistant).
How do I stream chatbot responses in Python?
Use `client.messages.stream()` as a context manager. It yields text deltas via `stream.text_stream`. For web apps, forward these as Server-Sent Events (SSE) using `text/event-stream` content type.
Is there a cost-effective model for high-volume chatbots?
Claude Haiku 4.5 costs ~$0.80/$4 per million input/output tokens, roughly a quarter of Sonnet's $3/$15. For FAQ bots and customer-service chatbots where responses don't require deep reasoning, Haiku handles most queries well. Use Sonnet for escalated or complex turns.
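
The Haiku-first, Sonnet-on-escalation split could be sketched as a small router that picks the model per turn. The keyword list, the turn threshold, and the name `pick_model` are all hypothetical. A real deployment would likely use a better signal, such as an intent classifier or an explicit escalation flag.

```python
# Hypothetical hints that a turn needs the stronger model.
ESCALATION_HINTS = ("refund", "legal", "debug", "traceback")
MAX_SIMPLE_TURNS = 10  # assumed cutoff: long conversations get escalated


def pick_model(user_message: str, turn_count: int) -> str:
    """Choose a model ID for this turn using a crude keyword heuristic."""
    text = user_message.lower()
    needs_escalation = any(hint in text for hint in ESCALATION_HINTS)
    if needs_escalation or turn_count > MAX_SIMPLE_TURNS:
        return "claude-sonnet-4-6"
    return "claude-haiku-4-5"


print(pick_model("What are your opening hours?", 1))     # claude-haiku-4-5
print(pick_model("I got a Traceback in my script", 1))   # claude-sonnet-4-6
```

The returned string would be passed as `model=` in `client.messages.create(...)`. The history format is the same for both models, so a conversation can switch between them mid-session.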
