Build a Chatbot with Claude API in Python (2026 Guide)

Step-by-step guide to building a Python chatbot with the Claude API. Terminal chatbot, Flask web chatbot, streaming, multi-persona, and context window management — all with working code.

Building a chatbot with the Claude API requires two things: maintaining conversation history client-side, and choosing the right context-management strategy. This guide walks through four practical patterns from terminal prototype to production Flask web chatbot.

1. Minimal terminal chatbot

2. Sliding-window context management

Claude Sonnet 4.6 has a 200K token window, but long histories increase latency and cost. Keep the last N turns:

3. Streaming chatbot responses

4. Flask web chatbot with SSE streaming

5. Multi-persona chatbot

Model comparison for chatbots

Frequently asked questions

Model	Cost (input/output per 1M tokens)	Best for
claude-haiku-4-5	$0.80 / $4	FAQ bots, simple customer service, high-volume chat
claude-sonnet-4-6	$3 / $15	General-purpose chatbots, coding assistants, support escalation
claude-opus-4-7	$15 / $75	Complex reasoning, legal/medical Q&A, multi-step agent tasks

How do I maintain conversation history with Claude API?

Build a list of message dicts and append each user/assistant turn before every API call. Pass the full list as `messages=`. Claude has no server-side session memory — all context must be sent in each request.

What happens when the conversation history gets too long?

Claude Sonnet 4.6 supports a 200K token context window (~150K words). For typical chatbots, implement a sliding-window: keep only the last N turns. A common default is 20 turns (~40 messages). For production, track `usage.input_tokens` per response and truncate when approaching 150K.

Can I give my Claude chatbot a persistent persona?

Yes — put the persona in the `system` parameter. It persists for the entire conversation without consuming message slots. Change it per-session to support multi-persona chatbots (e.g., a customer service bot vs. a coding assistant).

How do I stream chatbot responses in Python?

Use `client.messages.stream()` as a context manager. It yields text deltas via `stream.text_stream`. For web apps, forward these as Server-Sent Events (SSE) using `text/event-stream` content type.

Is there a cost-effective model for high-volume chatbots?

Claude Haiku 4.5 costs ~$0.80/$4 per million input/output tokens — roughly 10× cheaper than Sonnet. For FAQ bots and customer-service chatbots where responses don't require deep reasoning, Haiku handles most queries well. Use Sonnet for escalated/complex turns.

Build a Chatbot with Claude in Python