Async Claude API in Python

Run multiple Claude API calls concurrently with asyncio in Python. Use AsyncAnthropic, gather parallel requests, and handle rate limits in async code.

The AsyncAnthropic client integrates natively with asyncio, so no threads or run_in_executor calls are needed.

Basic async call

import asyncio
import anthropic

client = anthropic.AsyncAnthropic()

async def summarize(text: str) -> str:
    response = await client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content": f"Summarize in 2 sentences: {text}"}]
    )
    return response.content[0].text

# asyncio.run returns the coroutine's result; print it so the summary is visible
print(asyncio.run(summarize("...your text...")))

Parallel requests with asyncio.gather

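async def batch_summarize(texts: list[str]) -> list[str]:
    # Calling summarize() only creates coroutines; gather runs them
    # concurrently and returns the results in input order.
    tasks = [summarize(t) for t in texts]
    return await asyncio.gather(*tasks)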

results = asyncio.run(batch_summarize(["article 1...", "article 2...", "article 3..."]))
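
By default, asyncio.gather propagates the first exception to the caller, so one failed request hides the results of the others. The sketch below shows one way to collect partial results with return_exceptions=True. It is a minimal illustration: fake_summarize is a hypothetical stand-in for the summarize coroutine, used so the example runs without an API key, and batch_summarize_safe is a name made up for this sketch.

```python
import asyncio


# Hypothetical stand-in for summarize(); fails on empty input to simulate an API error.
async def fake_summarize(text: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    if not text:
        raise ValueError("empty input")
    return f"summary of {text}"


async def batch_summarize_safe(texts: list[str]) -> tuple[list[str], list[BaseException]]:
    # return_exceptions=True makes gather return exceptions as values,
    # in input order, instead of raising the first one.
    results = await asyncio.gather(
        *(fake_summarize(t) for t in texts), return_exceptions=True
    )
    summaries = [r for r in results if not isinstance(r, BaseException)]
    errors = [r for r in results if isinstance(r, BaseException)]
    return summaries, errors
```

Calling asyncio.run(batch_summarize_safe(["a", "", "b"])) yields two summaries and one ValueError, so the caller can decide whether to retry or skip the failed item.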

Rate-limited concurrency with Semaphore

import asyncio
import anthropic

client = anthropic.AsyncAnthropic()
semaphore = asyncio.Semaphore(10)  # max 10 concurrent requests

async def safe_call(text: str) -> str:
    async with semaphore:
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=256,
            messages=[{"role": "user", "content": text}]
        )
        return response.content[0].text

async def process_all(items: list[str]) -> list[str]:
    return await asyncio.gather(*[safe_call(item) for item in items])
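
To check that a semaphore really bounds the number of in-flight requests, the following sketch swaps the API call for asyncio.sleep and records the peak concurrency. The names run_with_cap and fake_call are hypothetical and exist only for this illustration.

```python
import asyncio


async def run_with_cap(n_items: int, limit: int) -> int:
    """Run n_items fake calls under a semaphore and return the peak concurrency seen."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_call(i: int) -> int:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the API request
            in_flight -= 1
            return i

    # All tasks are created at once, but only `limit` of them hold the semaphore.
    await asyncio.gather(*(fake_call(i) for i in range(n_items)))
    return peak
```

Running asyncio.run(run_with_cap(25, 10)) should never report a peak above 10, even though all 25 coroutines are scheduled together.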

FastAPI endpoint with async streaming

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import anthropic

app = FastAPI()
client = anthropic.AsyncAnthropic()

@app.post("/chat")
async def chat(body: dict):
    async def generate():
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": body["prompt"]}]
        ) as stream:
            async for text in stream.text_stream:
                yield text

    return StreamingResponse(generate(), media_type="text/plain")

For error handling in async code, see the error handling example. To measure costs of concurrent workloads, use the Claude Cost Calculator.

Frequently asked questions

When should I use AsyncAnthropic vs Anthropic?

Use `AsyncAnthropic` in async applications (FastAPI, Starlette, async scripts). Use the synchronous `Anthropic` client in Flask, Django (non-async), or synchronous scripts. Both expose the same methods; with `AsyncAnthropic` you `await` each call and use `async with` / `async for` when streaming.

How many parallel requests can I make?

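Your rate limit determines the ceiling. The API enforces requests-per-minute (RPM) and tokens-per-minute (TPM) limits that depend on your usage tier. An `asyncio.Semaphore` caps how many requests are in flight at once, not how many start per minute, so choose a value low enough that your overall throughput stays under both limits.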
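
A concurrency cap does not guarantee you stay under the limit, so a retry with exponential backoff is a common companion. The sketch below is an illustration under stated assumptions: FakeRateLimitError is a hypothetical stand-in for a 429 response (with the real SDK you would likely catch `anthropic.RateLimitError` instead), and with_backoff and its parameters are names made up for this example.

```python
import asyncio
import random


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit (429) error from the API."""


async def with_backoff(make_call, max_attempts: int = 5, base_delay: float = 0.5):
    """Await make_call(), retrying on FakeRateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except FakeRateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # The delay doubles on each attempt; random jitter keeps concurrent
            # tasks from all retrying at the same moment.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            await asyncio.sleep(delay)
```

Pass a zero-argument callable that returns a fresh coroutine, for example `await with_backoff(lambda: safe_call(text))`, because a coroutine object cannot be awaited twice.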

Can I mix async Claude calls with async database queries?

Yes. Both `AsyncAnthropic` calls and async DB libraries (asyncpg, Motor, SQLAlchemy async) are awaitable and can be interleaved or run concurrently with `asyncio.gather`.
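
As a rough illustration of running a database query and a Claude call side by side, the sketch below uses two stand-in coroutines. fetch_user, summarize_stub, and build_page are hypothetical names; in a real application the first would wrap an async DB driver and the second an AsyncAnthropic call.

```python
import asyncio


async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.01)  # stand-in for an async DB query
    return {"id": user_id, "name": "Ada"}


async def summarize_stub(text: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for an AsyncAnthropic call
    return f"summary: {text[:20]}"


async def build_page(user_id: int, article: str) -> dict:
    # The two operations are independent, so run them concurrently;
    # gather returns results in argument order.
    user, summary = await asyncio.gather(
        fetch_user(user_id), summarize_stub(article)
    )
    return {"user": user["name"], "summary": summary}
```

When one step depends on the other (for example, the prompt needs data from the query), await them in sequence instead of gathering them.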
