Async Claude API in Python

Run multiple Claude API calls concurrently with asyncio in Python. Use AsyncAnthropic, gather parallel requests, and handle rate limits in async code.


The AsyncAnthropic client integrates natively with asyncio — no threads or run_in_executor needed.

Basic async call

import asyncio
import anthropic

client = anthropic.AsyncAnthropic()

async def summarize(text: str) -> str:
    response = await client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content": f"Summarize in 2 sentences: {text}"}]
    )
    return response.content[0].text

# asyncio.run returns the coroutine's result; print it so the summary is visible
print(asyncio.run(summarize("...your text...")))

Parallel requests with asyncio.gather

async def batch_summarize(texts: list[str]) -> list[str]:
    tasks = [summarize(t) for t in texts]
    return await asyncio.gather(*tasks)

results = asyncio.run(batch_summarize(["article 1...", "article 2...", "article 3..."]))
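
By default, asyncio.gather propagates the first exception raised by any task, so one failed request hides the results of the others. The sketch below shows one way to keep partial results by passing return_exceptions=True. The fake_summarize coroutine and the batch_summarize_tolerant name are hypothetical stand-ins (not part of the example above) so the snippet runs without an API key.

```python
import asyncio

# Hypothetical stand-in for summarize(); it fails on empty input so the
# example runs without an API key or network access.
async def fake_summarize(text: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    if not text:
        raise ValueError("empty input")
    return f"summary of {text!r}"

async def batch_summarize_tolerant(texts: list[str]):
    # return_exceptions=True makes gather return raised exceptions as
    # values, in input order, instead of propagating the first one.
    results = await asyncio.gather(
        *(fake_summarize(t) for t in texts), return_exceptions=True
    )
    summaries: list[str] = []
    errors: list[tuple[str, BaseException]] = []
    for text, result in zip(texts, results):
        if isinstance(result, BaseException):
            errors.append((text, result))  # keep the input next to its error
        else:
            summaries.append(result)
    return summaries, errors

if __name__ == "__main__":
    ok, failed = asyncio.run(batch_summarize_tolerant(["article 1", "", "article 3"]))
    print(len(ok), "succeeded;", len(failed), "failed")
```

Pairing each error with its input makes it straightforward to retry only the failed items.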

Rate-limited concurrency with Semaphore

import asyncio
import anthropic

client = anthropic.AsyncAnthropic()
semaphore = asyncio.Semaphore(10)  # max 10 concurrent requests

async def safe_call(text: str) -> str:
    async with semaphore:
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=256,
            messages=[{"role": "user", "content": text}]
        )
        return response.content[0].text

async def process_all(items: list[str]) -> list[str]:
    return await asyncio.gather(*[safe_call(item) for item in items])
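
To check that the semaphore really bounds concurrency, the following sketch replaces the API call with asyncio.sleep and records the peak number of in-flight calls. The run_limited helper and its counters are hypothetical, added only for illustration.

```python
import asyncio

async def run_limited(n_jobs: int, limit: int) -> int:
    """Run n_jobs fake calls under a semaphore and return the peak concurrency."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_call() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the API round trip
            in_flight -= 1

    await asyncio.gather(*(fake_call() for _ in range(n_jobs)))
    return peak

if __name__ == "__main__":
    # 25 jobs are scheduled at once, but at most 5 run at any moment.
    print(asyncio.run(run_limited(n_jobs=25, limit=5)))
```

All 25 coroutines are scheduled immediately; the semaphore only delays when each one enters the guarded section, so the order of results from gather is unaffected.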

FastAPI endpoint with async streaming

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import anthropic

app = FastAPI()
client = anthropic.AsyncAnthropic()

@app.post("/chat")
async def chat(body: dict):
    async def generate():
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": body["prompt"]}]
        ) as stream:
            async for text in stream.text_stream:
                yield text

    return StreamingResponse(generate(), media_type="text/plain")

For error handling in async code, see the error handling example. To estimate the cost of concurrent workloads, use the Claude Cost Calculator.
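
For rate-limit errors specifically, a common pattern is to retry with exponential backoff and jitter. The sketch below is a generic helper, not the SDK's own mechanism: with_backoff and FakeRateLimitError are hypothetical names. In real code you would likely pass anthropic.RateLimitError as the exception to retry on, and note that the SDK client already retries some failures itself via its max_retries setting.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

class FakeRateLimitError(Exception):
    """Hypothetical stand-in for a 429 error raised by an API client."""

async def with_backoff(
    make_call: Callable[[], Awaitable[T]],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 0.5,
) -> T:
    """Await make_call(), retrying on the given exceptions (max_attempts >= 1)."""
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff with jitter: the wait doubles each attempt
            # and is stretched by a random factor between 1x and 2x.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            await asyncio.sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")

if __name__ == "__main__":
    attempts = 0

    async def flaky() -> str:
        # Fails twice with the fake rate-limit error, then succeeds.
        global attempts
        attempts += 1
        if attempts < 3:
            raise FakeRateLimitError()
        return "ok"

    print(asyncio.run(with_backoff(flaky, (FakeRateLimitError,), base_delay=0.01)))
```

Passing a zero-argument callable rather than a coroutine object matters here: a coroutine can only be awaited once, so each retry needs a fresh one.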

Frequently asked questions

When should I use AsyncAnthropic vs Anthropic?
Use `AsyncAnthropic` in async applications (FastAPI, Starlette, async scripts). Use the synchronous `Anthropic` client in Flask, Django (non-async), or synchronous scripts. Both share the same interface; just add `await` to async calls.

How many parallel requests can I make?
Your rate limit determines the ceiling. Limits are expressed as requests per minute (RPM) and tokens per minute (TPM) and depend on your usage tier. `asyncio.Semaphore` caps how many requests are in flight at once, not how many start per minute, so choose a concurrency limit that keeps your sustained throughput under both the RPM and TPM limits.

Can I mix async Claude calls with async database queries?
Yes. Both `AsyncAnthropic` calls and async DB libraries (asyncpg, Motor, SQLAlchemy async) are awaitable and can be interleaved or run concurrently with `asyncio.gather`.
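
As a minimal sketch of the last answer, the snippet below runs a database lookup and a model call concurrently with asyncio.gather. Both fetch_user and fake_claude_call are hypothetical stand-ins that only sleep; in a real application they would wrap an async DB driver and client.messages.create respectively.

```python
import asyncio

async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.05)  # hypothetical stand-in for an async DB query
    return {"id": user_id, "name": "Ada"}

async def fake_claude_call(prompt: str) -> str:
    await asyncio.sleep(0.05)  # hypothetical stand-in for client.messages.create
    return f"reply to: {prompt}"

async def handle_request(user_id: int, prompt: str) -> dict:
    # Both awaitables run concurrently; gather returns their results
    # in argument order, so tuple unpacking is safe.
    user, reply = await asyncio.gather(fetch_user(user_id), fake_claude_call(prompt))
    return {"user": user["name"], "reply": reply}

if __name__ == "__main__":
    print(asyncio.run(handle_request(1, "hi")))
```

This only helps when the two operations are independent. If the prompt depends on the database row, await the query first and then make the model call.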
