Claude API Multi-Turn Conversation in Python (2026)

Multi-Turn Conversations with Claude in Python

Build multi-turn chat with Claude in Python. Manage conversation history, control context length, and implement memory patterns with the Anthropic SDK.

Since the Claude API is stateless, building a chat requires storing and sending the full conversation history each turn.

Simple chat loop

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


    def chat():
        messages = []  # full history, resent on every request
        print("Claude Chat (type 'quit' to exit)\n")

        while True:
            user_input = input("You: ").strip()
            if user_input.lower() == "quit":
                break
            if not user_input:
                continue  # skip blank input; the API rejects empty text content

            messages.append({"role": "user", "content": user_input})

            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                system="You are a helpful assistant. Be concise.",
                messages=messages,
            )

            assistant_reply = response.content[0].text
            messages.append({"role": "assistant", "content": assistant_reply})
            print(f"\nClaude: {assistant_reply}\n")


    chat()
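The loop above appends the user message before the request is made, so a failed request (network error, rate limit) would leave an unanswered user turn in the history. A minimal sketch of one way to guard against that follows. The model call is passed in as a callable so the history logic can be exercised without network access; `send_turn` and `call_model` are hypothetical names for illustration, not part of the Anthropic SDK.

```python
from typing import Callable

Message = dict[str, str]


def send_turn(
    messages: list[Message],
    user_input: str,
    call_model: Callable[[list[Message]], str],
) -> str:
    """Append a user turn, get a reply, and record it.

    If the model call raises, the user turn is removed again so the
    history never ends with an unanswered message.
    """
    messages.append({"role": "user", "content": user_input})
    try:
        reply = call_model(messages)
    except Exception:
        messages.pop()  # roll back the turn that got no reply
        raise
    messages.append({"role": "assistant", "content": reply})
    return reply
```

With the SDK, `call_model` could be a small wrapper around `client.messages.create(...)` that returns `response.content[0].text`, using the same arguments as the loop above.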

Track tokens and trim history

    # Reuses `client` from the chat loop above
    MAX_CONTEXT_TOKENS = 150_000


    def trim_history(messages: list, max_tokens: int = MAX_CONTEXT_TOKENS) -> list:
        """Remove the oldest exchanges until the estimated token count is under budget."""
        # Rough estimate: 1 token ≈ 4 chars
        # Stop at 3 messages so the newest user message is never dropped.
        while len(messages) > 3:
            total_chars = sum(len(str(m["content"])) for m in messages)
            if total_chars / 4 < max_tokens:
                break
            # Keep the first exchange for context; drop the oldest user + assistant pair after it
            messages = messages[:2] + messages[4:]
        return messages


    messages = []
    cumulative_tokens = 0  # total tokens used across all requests, not the current context size


    def chat_with_trimming(user_input: str) -> str:
        global messages, cumulative_tokens
        messages.append({"role": "user", "content": user_input})
        messages = trim_history(messages)

        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=messages,
        )

        reply = response.content[0].text
        messages.append({"role": "assistant", "content": reply})
        cumulative_tokens += response.usage.input_tokens + response.usage.output_tokens
        return reply

Inject retrieved context per turn (RAG)

    def chat_with_rag(user_input: str, context_docs: list[str]) -> str:
        # Reuses `client` and `messages` from the snippets above
        context = "\n\n".join(context_docs)
        augmented_message = f"Context:\n{context}\n\nQuestion: {user_input}"
        messages.append({"role": "user", "content": augmented_message})

        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=messages,
        )

        reply = response.content[0].text
        messages.append({"role": "assistant", "content": reply})
        return reply
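Note that `chat_with_rag` stores the augmented message in the history, so every retrieved document is resent on all later turns. A possible variation, sketched here on the assumption that earlier documents are not needed once the question has been answered, attaches the context only to the request being sent and records the plain question afterwards. `build_rag_request` and `record_turn` are hypothetical helper names.

```python
Message = dict[str, str]


def build_rag_request(
    history: list[Message], user_input: str, context_docs: list[str]
) -> list[Message]:
    """Return the messages for one request, with context attached to the
    newest question only. `history` itself is not modified."""
    context = "\n\n".join(context_docs)
    augmented = f"Context:\n{context}\n\nQuestion: {user_input}"
    return history + [{"role": "user", "content": augmented}]


def record_turn(history: list[Message], user_input: str, reply: str) -> None:
    """Store the plain question and the reply, leaving the documents out."""
    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": reply})
```

The list returned by `build_rag_request` would be passed as `messages=` to `client.messages.create`, and `record_turn` called once the reply text is available. The trade-off is that follow-up questions about an earlier document need that document to be retrieved again.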

Frequently asked questions

Does Claude maintain conversation history automatically?

No. The Claude API is stateless — you must send the full message history with every request. There is no session ID or built-in memory. Your application is responsible for storing and truncating the `messages` array.
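Since storage is left to the application, one simple option is to keep each conversation in a JSON file. The sketch below assumes message content is plain strings, as in the examples on this page; `save_history` and `load_history` are hypothetical names, and a database or cache would follow the same pattern.

```python
import json
from pathlib import Path


def save_history(path: Path, messages: list[dict]) -> None:
    """Write the conversation to disk as JSON."""
    path.write_text(
        json.dumps(messages, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_history(path: Path) -> list[dict]:
    """Load a saved conversation, or start empty if none exists."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```

A chat function could call `load_history` before building the request and `save_history` after appending the assistant reply, which lets a conversation resume across process restarts.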

How do I prevent context window overflow?

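Check `response.usage.input_tokens` after each request. In a request without prompt caching, it reflects the size of everything you just sent (system prompt plus the full history), so summing it across turns overcounts; the latest value is the one that shows how full the context is. Trim old messages when it approaches your budget, for example ~150K tokens, leaving room for output. Common strategies: sliding window (drop the oldest turns), summarization (replace old turns with a Claude-generated summary), or retrieval (store past turns in a vector DB and inject only the relevant ones).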
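The summarization strategy can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the history strictly alternates user and assistant turns starting with a user turn, and it takes the summarizer as a callable so the compaction logic can be tested without an API call. `compact_history`, `summarize`, and `keep_recent` are hypothetical names.

```python
from typing import Callable

Message = dict[str, str]


def compact_history(
    messages: list[Message],
    summarize: Callable[[str], str],
    keep_recent: int = 4,
) -> list[Message]:
    """Replace all but the most recent messages with one summary exchange."""
    if len(messages) <= keep_recent:
        return messages

    split = len(messages) - keep_recent
    if messages[split]["role"] != "user":
        split -= 1  # keep one extra message so the tail starts on a user turn
    if split <= 0:
        return messages

    old, recent = messages[:split], messages[split:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = summarize(transcript)

    return [
        {"role": "user", "content": f"Summary of the conversation so far:\n{summary}"},
        {"role": "assistant", "content": "Understood."},
        *recent,
    ]
```

In practice `summarize` would itself be a request to Claude asking for a short summary of the transcript; the wording of that prompt is left open here. Unlike a sliding window, this keeps a trace of early turns at the cost of one extra request.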

Should I include assistant prefill in the messages array?

It is optional. Prefilling the assistant turn (passing `{"role": "assistant", "content": "prefix"}` as the last message) makes Claude continue from that prefix, which can steer the response format. Use it for structured output or to continue a partial response. Never include the stop sequence in the prefill.
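A small sketch of how prefill might be wired into a message list. Two details motivate the helpers: the API rejects a final assistant message that ends in trailing whitespace, and the response contains only the continuation, not the prefill itself. `with_prefill` and `full_reply` are hypothetical names, and it is worth checking that the model and features you use accept prefill.

```python
Message = dict[str, str]


def with_prefill(messages: list[Message], prefill: str) -> list[Message]:
    """Return a copy of `messages` ending in a partial assistant turn."""
    # Trailing whitespace is stripped because the API rejects a final
    # assistant message that ends with it.
    return messages + [{"role": "assistant", "content": prefill.rstrip()}]


def full_reply(prefill: str, completion: str) -> str:
    """Join the prefill and the model's continuation into the complete reply."""
    return prefill.rstrip() + completion
```

When the turn is saved to the history, storing the output of `full_reply` as the assistant message (rather than the bare continuation) keeps later requests consistent with what the user saw.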
