Claude API Error Handling in Python

Handle rate limits, overloads, and API errors from the Anthropic SDK in Python. Implement exponential backoff, retries, and graceful degradation.

The Anthropic SDK raises typed exceptions that map cleanly to retry strategies.

Exception hierarchy

anthropic.APIError                    # base class
├── anthropic.APIConnectionError      # network failure
│   └── anthropic.APITimeoutError     # request timed out
└── anthropic.APIStatusError          # HTTP 4xx / 5xx
    ├── anthropic.BadRequestError         # 400
    ├── anthropic.AuthenticationError     # 401
    ├── anthropic.PermissionDeniedError   # 403
    ├── anthropic.NotFoundError           # 404
    ├── anthropic.RateLimitError          # 429
    ├── anthropic.InternalServerError     # 500
    └── anthropic.OverloadedError         # 529 (dedicated class may depend on SDK version)
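
One way to read the hierarchy as a retry policy is to classify errors by HTTP status code. The sketch below is illustrative only: `is_retryable_status` is a hypothetical helper, not part of the SDK, and the choice of which codes count as transient is an assumption to tune for your workload.

```python
# Hypothetical helper (not an SDK function): decide whether an HTTP status
# is worth retrying. Treating 429 and all 5xx codes as transient is an
# assumption made for this sketch.
def is_retryable_status(status_code: int) -> bool:
    """Return True for statuses that usually clear up on their own."""
    return status_code == 429 or 500 <= status_code < 600


for code in (400, 401, 404, 429, 500, 529):
    action = "retry with backoff" if is_retryable_status(code) else "fail fast"
    print(code, action)
```

With the SDK, the code is available as `e.status_code` on any `anthropic.APIStatusError`, so a check like this works whichever subclass was raised.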

Basic try/except pattern

import anthropic

client = anthropic.Anthropic()

def safe_call(prompt: str) -> str | None:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text

    except anthropic.RateLimitError as e:
        # 429: you've exceeded your rate limit
        print(f"Rate limited. Retry after: {e.response.headers.get('retry-after', '60')}s")
        return None

    except anthropic.AuthenticationError:
        # 401: bad API key, so don't retry
        raise

    except anthropic.APIStatusError as e:
        # 529: Anthropic servers are overloaded (transient). Checking status_code
        # avoids depending on which exception class the installed SDK uses for 529.
        if e.status_code == 529:
            print("API overloaded. Try again in a few seconds.")
            return None
        raise  # any other 4xx/5xx is surfaced to the caller

    except anthropic.APIConnectionError as e:
        # Network failure or timeout (APITimeoutError is a subclass)
        print(f"Network error: {e}")
        return None

Exponential backoff retry decorator

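import functools
import random
import time

import anthropic

# Status codes treated as transient: 429 (rate limit) and 529 (overloaded).
RETRYABLE_STATUS = {429, 529}

def with_retry(max_attempts: int = 4, base_delay: float = 1.0):
    # Note: the client also retries internally (max_retries defaults to 2),
    # so these attempts stack on top of the SDK's own retries.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except anthropic.APIStatusError as e:
                    # Match on status_code rather than exception class so this works
                    # whichever class the installed SDK raises for 529. Every other
                    # status (400, 401, 403, 404, 500, ...) propagates immediately.
                    if e.status_code not in RETRYABLE_STATUS or attempt == max_attempts - 1:
                        raise
                    delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
                    print(f"Attempt {attempt + 1} failed (HTTP {e.status_code}). Retrying in {delay:.1f}s...")
                    time.sleep(delay)
        return wrapper
    return decorator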

@with_retry(max_attempts=4)
def call_api(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

result = call_api("Summarize the history of Python.")
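
To see what schedule the decorator's delay formula produces, the calculation can be pulled out into a standalone function. This is a minimal sketch, not the SDK's internal backoff: `backoff_delay` is a hypothetical name, and the `max_delay` cap is an addition that the decorator above does not have.

```python
import random


def backoff_delay(attempt: int, base_delay: float = 1.0,
                  max_delay: float = 30.0, jitter: float = 0.5) -> float:
    """Delay before retry number `attempt` (0-based).

    Doubles the base delay each attempt, caps it at `max_delay`
    (an assumption of this sketch), then adds up to `jitter` seconds
    of random noise so that clients do not retry in lockstep.
    """
    capped = min(base_delay * (2 ** attempt), max_delay)
    return capped + random.uniform(0, jitter)


# With jitter disabled the schedule is deterministic.
print([backoff_delay(n, jitter=0.0) for n in range(6)])
# → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

Without a cap the delay keeps doubling, which matters once `max_attempts` is raised for batch workloads.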

Configure SDK-level retries

# By default the SDK retries 2× on connection errors, 408, 409, 429, and 5xx (including 529)
client = anthropic.Anthropic(
    max_retries=5,       # increase for batch workloads
    timeout=30.0         # default is 600s (10 minutes); set lower for interactive apps
)

For Batch API workloads that tolerate latency in exchange for 50% cost savings, see the Batch API Python example. For pricing context, see the Anthropic API pricing 2026 page.

Frequently asked questions

What HTTP status codes does the Anthropic API return?

400 = invalid request (bad parameters), 401 = authentication error (invalid API key), 403 = permission denied, 404 = resource not found, 429 = rate limited, 500 = internal server error, 529 = overloaded. The SDK raises a typed exception for each of these; they all subclass `anthropic.APIStatusError` and expose the code as `status_code`.

Does the SDK have built-in retry logic?

Yes. The default `Anthropic()` client retries up to 2 times with exponential backoff on connection errors and on 408, 409, 429, and 5xx responses (including 529 overloaded). Set `max_retries=0` to disable retries, or a higher value such as `max_retries=5` to allow more. The `retry-after` header is respected for 429s.

How do I handle rate limits without losing the user's request?

Catch `anthropic.RateLimitError`, read the `retry-after` header from `e.response.headers`, and sleep for that many seconds before retrying. For production workloads, use a queue + worker pattern to smooth traffic spikes.
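
Reading the header safely takes a little care, since it may be missing or non-numeric. The helper below is a sketch under stated assumptions: `retry_after_seconds` is a hypothetical name, and the 60-second fallback simply mirrors the default used in the basic try/except example.

```python
from collections.abc import Mapping


def retry_after_seconds(headers: Mapping[str, str], default: float = 60.0) -> float:
    """Read a numeric `retry-after` header, falling back to `default`.

    A plain dict lookup is case-sensitive; httpx headers (the type of
    `e.response.headers`) match names case-insensitively.
    """
    raw = headers.get("retry-after")
    if raw is None:
        return default
    try:
        return max(0.0, float(raw))
    except ValueError:
        # The header may also carry an HTTP date; this sketch does not parse it.
        return default


print(retry_after_seconds({"retry-after": "12"}))  # → 12.0
print(retry_after_seconds({}))                     # → 60.0
```

Inside an `except anthropic.RateLimitError as e:` block this would be called as `time.sleep(retry_after_seconds(e.response.headers))` before the retry.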
