Step-by-step guide to building a Claude-powered Discord bot in Python using discord.py. Covers slash commands, thread-aware conversation history, streaming, and rate limiting.

Claude Discord Bot Python

Build a Discord bot powered by Claude using discord.py and the Anthropic Python SDK. This guide covers prefix and slash commands, per-channel conversation memory, streamed replies, and a per-user cooldown.

Minimal Claude Discord bot

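pip install discord.py anthropic python-dotenv

import asyncio
import os

import anthropic
import discord
from discord.ext import commands
from dotenv import load_dotenv

load_dotenv()  # reads ANTHROPIC_API_KEY and DISCORD_BOT_TOKEN from a .env file
client_ai = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

intents = discord.Intents.default()
intents.message_content = True  # privileged intent; also enable it in the Developer Portal
bot = commands.Bot(command_prefix="!", intents=intents)

@bot.event
async def on_ready():
    print(f"Logged in as {bot.user}")

@bot.command(name="ask")
async def ask_claude(ctx, *, question: str):
    """!ask <question>: ask Claude anything"""
    async with ctx.typing():
        # The SDK client is synchronous, so run it in a worker thread;
        # calling it directly would block the event loop and Discord's heartbeat.
        response = await asyncio.to_thread(
            client_ai.messages.create,
            model="claude-haiku-4-5-20251001",
            max_tokens=1024,
            messages=[{"role": "user", "content": question}],
        )
    # Discord rejects messages longer than 2000 characters, so truncate here.
    await ctx.send(response.content[0].text[:2000])

# Keep this as the last line of the file: bot.run() blocks until the bot stops,
# so commands defined after it would never be registered.
bot.run(os.getenv("DISCORD_BOT_TOKEN"))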
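
If either environment variable is missing, os.getenv returns None and the failure only surfaces later as a less obvious error. A minimal sketch of a startup check follows, assuming a hypothetical helper named require_env (it is not part of discord.py or the Anthropic SDK):

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or fail with a clear message.

    Hypothetical helper for illustration; not part of any library.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Under that assumption, the token lookup at the end of the file could be written as require_env("DISCORD_BOT_TOKEN"), so a misconfigured deployment stops immediately with the variable's name in the error.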

Slash commands with discord.py app_commands

import asyncio

# commands.Bot already owns a CommandTree, exposed as bot.tree. Constructing a
# second one with app_commands.CommandTree(bot) raises ClientException.

@bot.tree.command(name="claude", description="Ask Claude a question")
async def claude_slash(interaction: discord.Interaction, question: str):
    await interaction.response.defer()           # acknowledge within Discord's 3-second window
    response = await asyncio.to_thread(          # run the blocking SDK call off the event loop
        client_ai.messages.create,
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{"role": "user", "content": question}],
    )
    await interaction.followup.send(response.content[0].text[:2000])  # 2000-char message cap

@bot.event
async def on_ready():                            # replaces any earlier on_ready handler
    await bot.tree.sync()                        # register slash commands globally
    print(f"Logged in as {bot.user}; slash commands synced")

Thread-aware conversation history per channel

import asyncio
from collections import defaultdict

# In-memory conversation history keyed by channel ID (max 20 turns).
# A Discord thread has its own channel ID, so each thread gets its own history.
histories: dict[int, list[dict]] = defaultdict(list)
MAX_TURNS = 20

@bot.command(name="chat")
async def chat(ctx, *, message: str):
    history = histories[ctx.channel.id]

    history.append({"role": "user", "content": message})
    if len(history) > MAX_TURNS * 2:
        # Drop the oldest user/assistant pair so the list still opens with a user message.
        del history[:2]

    try:
        async with ctx.typing():
            response = await asyncio.to_thread(  # run the blocking SDK call off the event loop
                client_ai.messages.create,
                model="claude-sonnet-4-6",
                max_tokens=1024,
                system="You are a helpful Discord assistant. Be concise and friendly.",
                messages=history,
            )
    except anthropic.APIError:
        history.pop()  # discard the unanswered user message so roles keep alternating
        await ctx.send("The request to Claude failed. Please try again.")
        return

    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    await ctx.send(reply[:2000])  # Discord's 2000-char message cap

@bot.command(name="reset")
async def reset(ctx):
    histories[ctx.channel.id].clear()
    await ctx.send("Conversation history cleared.")
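
Trimming deserves care: when the list ends with a pending user message, slicing off an even number of trailing entries leaves an assistant message at the front, and the Messages API has expected a conversation to open with a user message. The sketch below isolates the trimming step as a pure function so it can be tested on its own; trim_history is a hypothetical name, not something from discord.py or the Anthropic SDK.

```python
def trim_history(history: list[dict], max_turns: int) -> list[dict]:
    """Keep at most max_turns * 2 of the newest messages, opening on a user message.

    Hypothetical helper for illustration.
    """
    start = max(0, len(history) - max_turns * 2)
    trimmed = history[start:]
    # Drop leading non-user messages so the window opens with a user turn.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

The function returns a new list rather than mutating its argument, so a caller that shares the list through a dict, as the chat command does, would assign the result back with history[:] = trim_history(history, MAX_TURNS).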

Streaming responses via message edit

import asyncio

def collect_stream(question: str) -> str:
    """Blocking helper: consume the SDK stream and return the full text."""
    with client_ai.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": question}],
    ) as stream:
        return "".join(stream.text_stream)

@bot.command(name="stream")
async def stream_reply(ctx, *, question: str):
    """Post a placeholder, stream internally, then edit in the full reply."""
    placeholder = await ctx.send("_Thinking…_")
    # Run the blocking stream in a worker thread so the event loop stays responsive.
    full_reply = await asyncio.to_thread(collect_stream, question)
    # Discord messages are capped at 2000 chars; chunk if needed
    await placeholder.edit(content=full_reply[:2000] or "(empty reply)")
    for chunk_start in range(2000, len(full_reply), 2000):
        await ctx.send(full_reply[chunk_start:chunk_start + 2000])
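
The command above edits the placeholder once, after the whole reply has arrived. To show text as it arrives, the placeholder can be edited repeatedly, but edits need throttling because Discord rate-limits message edits. The following is a sketch of that throttling logic only, kept independent of both libraries: relay_stream, its parameters, and the one-second default interval are assumptions for illustration, and edit stands in for something like a wrapper around placeholder.edit.

```python
import time
from typing import AsyncIterator, Awaitable, Callable

DISCORD_LIMIT = 2000  # Discord's per-message character cap


async def relay_stream(
    chunks: AsyncIterator[str],
    edit: Callable[[str], Awaitable[None]],
    min_interval: float = 1.0,  # assumed value; tune against observed rate limits
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Accumulate streamed text, calling edit() at most once per min_interval."""
    full = ""
    last_edit = clock()
    dirty = False  # True when text has arrived since the last edit
    async for piece in chunks:
        full += piece
        dirty = True
        if clock() - last_edit >= min_interval:
            await edit(full[:DISCORD_LIMIT])
            last_edit = clock()
            dirty = False
    if dirty:
        # Final flush so the message always ends with the complete text.
        await edit(full[:DISCORD_LIMIT])
    return full
```

In a bot, chunks would most likely come from the SDK's async client (anthropic.AsyncAnthropic, whose stream exposes text_stream as an async iterator), and edit would be a small coroutine that calls placeholder.edit(content=...). Text beyond 2000 characters still has to be sent as follow-up messages once the function returns.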

Rate-limit guard (per-user cooldown)

import asyncio
import math
import time

user_last_call: dict[int, float] = {}
COOLDOWN_SECONDS = 10

@bot.command(name="safe_ask")
async def safe_ask(ctx, *, question: str):
    now = time.monotonic()  # monotonic clock: unaffected by system clock changes
    last = user_last_call.get(ctx.author.id)
    if last is not None and now - last < COOLDOWN_SECONDS:
        # Round up so the message never says "wait 0s".
        remaining = math.ceil(COOLDOWN_SECONDS - (now - last))
        await ctx.send(f"Please wait {remaining}s before asking again.")
        return
    user_last_call[ctx.author.id] = now

    response = await asyncio.to_thread(  # run the blocking SDK call off the event loop
        client_ai.messages.create,
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": question}],
    )
    await ctx.send(response.content[0].text[:2000])  # Discord's 2000-char message cap
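
The module-level dict above never forgets a user, so it grows for as long as the bot runs. One way to contain that is to wrap the bookkeeping in a small class that prunes expired entries and accepts an injectable clock for testing. This is a sketch only: Cooldown and try_acquire are hypothetical names, not part of discord.py.

```python
import math
import time
from typing import Callable


class Cooldown:
    """Per-key cooldown tracker (hypothetical helper, standard library only)."""

    def __init__(self, seconds: float, clock: Callable[[], float] = time.monotonic):
        self.seconds = seconds
        self.clock = clock
        self._last: dict[int, float] = {}

    def try_acquire(self, key: int) -> int:
        """Return 0 and record the call if allowed, else the seconds left (rounded up)."""
        now = self.clock()
        last = self._last.get(key)
        if last is not None and now - last < self.seconds:
            return math.ceil(self.seconds - (now - last))
        self._last[key] = now
        # Forget entries whose cooldown has expired so the dict stays small.
        self._last = {k: t for k, t in self._last.items() if now - t < self.seconds}
        return 0
```

A command would call something like wait = cooldown.try_acquire(ctx.author.id) and return early when wait is non-zero. discord.py also ships a commands.cooldown decorator with per-user buckets, which may be the simpler choice when a custom message or shared state is not needed.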

Claude Discord bot vs alternatives

Approach | Customizability | Setup time | Cost
Claude + discord.py (this guide) | Full: any system prompt, tools | ~30 min | Claude API usage only
OpenAI + discord.py | Full: same pattern | ~30 min | OpenAI API usage
Midjourney-style SaaS bot | None: fixed features | Minutes | Monthly subscription
No-code (Zapier/Make) | Low: template-based | ~15 min | Zapier + Claude usage

Key non-obvious patterns: (1) Call defer() at the start of every slash command. Discord expects an initial response within 3 seconds, and a Claude API call can easily take longer; after deferring, reply with interaction.followup.send(). (2) Key conversation history by channel ID rather than by user ID so that group conversations stay coherent. (3) Discord caps messages at 2000 characters, so long Claude responses must be split across several messages. For cost estimates before deploying, use the Claude API Cost Calculator. For the Slack equivalent of this pattern, see the Slack bot guide.
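
For the 2000-character cap, splitting at fixed offsets can cut a word or line in half. A minimal sketch of a splitter that prefers newline boundaries follows; chunk_message is a hypothetical helper name, and the function is plain Python with no Discord dependency.

```python
def chunk_message(text: str, limit: int = 2000) -> list[str]:
    """Split text into pieces of at most `limit` characters, preferring newline boundaries."""
    chunks = []
    while len(text) > limit:
        # Look for the last newline that keeps the piece within the limit.
        cut = text.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable newline: fall back to a hard split
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Each returned piece can be sent with its own ctx.send call. The sketch does not track Markdown code fences, so a fenced block that straddles a boundary will render as broken formatting in Discord.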
