Deploy Claude API calls in AWS Lambda with Python. Working handler pattern, streaming response, timeout configuration, environment variable setup, and API Gateway wiring.
Deploying Claude API calls in AWS Lambda is straightforward but has three gotchas: timeout (default 3s is too short), cold start (import time), and packaging the anthropic library. This guide covers all three.
import anthropic
import json
import os
# Module-level client: reused across warm invocations (avoids re-init on each call)
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
def handler(event, context):
    # API Gateway sends "body": null for requests without a payload, so fall back explicitly
    body = json.loads(event.get("body") or "{}")
    user_message = body.get("message", "Hello!")
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # fast model reduces Lambda duration cost
        max_tokens=1024,
        messages=[{"role": "user", "content": user_message}],
    )
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"reply": response.content[0].text}),
    }
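The handler above assumes the event body is a well-formed JSON string. As a minimal sketch of more defensive parsing, the hypothetical helper below (`parse_message` is not part of any SDK) also handles a missing body, a base64-encoded body (the `isBase64Encoded` flag that API Gateway and Function URLs set), and malformed JSON. It assumes that falling back to a default message is acceptable; returning a 400 response may suit a real API better.

```python
import base64
import json


def parse_message(event: dict, default: str = "Hello!") -> str:
    """Extract the "message" field from a proxy-style event (hypothetical helper)."""
    raw = event.get("body")
    if not raw:
        return default  # body key missing, None, or empty string
    try:
        if event.get("isBase64Encoded"):
            raw = base64.b64decode(raw).decode("utf-8")
        body = json.loads(raw)
    except ValueError:
        # Covers invalid base64, invalid UTF-8, and invalid JSON
        return default
    if not isinstance(body, dict):
        return default  # e.g. a JSON array or bare string
    message = body.get("message")
    return message if isinstance(message, str) and message else default
```

Inside the handler, `user_message = parse_message(event)` would replace the two parsing lines.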
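# On your local machine (match the Python version and platform to the Lambda runtime)
mkdir -p python
# anthropic pulls in compiled dependencies (e.g. pydantic-core), so request Linux wheels explicitly.
# Use manylinux2014_aarch64 instead if the function runs on arm64.
pip install anthropic -t python/ --platform manylinux2014_x86_64 --python-version 3.12 --only-binary=:all: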
zip -r anthropic-layer.zip python/
# Upload via AWS CLI
aws lambda publish-layer-version --layer-name anthropic-sdk --zip-file fileb://anthropic-layer.zip --compatible-runtimes python3.12
# Attach to your function
aws lambda update-function-configuration --function-name my-claude-function --layers arn:aws:lambda:us-east-1:123456789012:layer:anthropic-sdk:1
# Set via CLI (never hardcode in source)
aws lambda update-function-configuration --function-name my-claude-function --timeout 30 --memory-size 256 --environment Variables="{ANTHROPIC_API_KEY=sk-ant-...}"
# Or via Secrets Manager (production recommended)
import boto3, json, os

# Cached at module level so warm invocations skip the Secrets Manager call
_secret_cache = None

def get_api_key() -> str:
    global _secret_cache
    if _secret_cache is None:
        sm = boto3.client("secretsmanager", region_name="us-east-1")
        secret = sm.get_secret_value(SecretId=os.environ["SECRET_ARN"])
        _secret_cache = json.loads(secret["SecretString"])["anthropic_api_key"]
    return _secret_cache
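One consequence of caching the key at module level is that a warm execution environment keeps using the old value after the secret is rotated. A possible variation, sketched here under the assumption that a few minutes of staleness is tolerable, re-fetches the value once it is older than a TTL. `TtlSecret` and its parameters are illustrative names, not part of boto3 or the anthropic SDK; in Lambda, the `fetch` callable would wrap the `get_secret_value` call shown above.

```python
import time
from typing import Callable, Optional


class TtlSecret:
    """Cache a secret and re-fetch it once it is older than ttl_seconds (illustrative)."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # function that actually retrieves the secret
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the expiry logic can be tested
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

The trade-off is one extra Secrets Manager call per TTL window per warm environment, in exchange for picking up rotated keys without a redeploy.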
# Requires: Lambda Function URL with RESPONSE_STREAM invoke mode
# Install: pip install "awslambdaric>=1.2"  (quote the specifier so the shell does not treat > as a redirect)
import anthropic
import os
import json
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
def handler(event, context):
    # For streaming, handler must be called via Lambda Function URL, not API Gateway REST
    body = json.loads(event.get("body") or "{}")
    user_message = body.get("message", "Hello!")

    def generate():
        with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            messages=[{"role": "user", "content": user_message}],
        ) as stream:
            for text in stream.text_stream:
                yield text.encode()

    # awslambdaric streaming response.
    # Caution: response_stream is not part of the standard Python context object. Confirm that
    # your runtime interface client provides it, or fall back to a custom runtime or the
    # Lambda Web Adapter.
    return context.response_stream(generate(), content_type="text/plain")
FROM public.ecr.aws/lambda/python:3.12
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY handler.py ./
CMD ["handler.handler"]
# requirements.txt:
# anthropic>=0.40
# boto3
| Setting | Recommended value | Why |
|---|---|---|
| Timeout | 30s (non-streaming) / 120s (streaming) | Claude responses typically take 5-25s; the 3s default is too short for most calls |
| Memory | 256 MB | anthropic SDK + boto3 fit; more RAM also increases CPU |
| Runtime | python3.12 | Fastest cold-start among Lambda Python runtimes in 2026 |
| Concurrency | Reserved ≈ Anthropic rate limit (requests/s) × avg duration (s) | Prevents Lambda auto-scaling from exceeding API rate limits |
| API key storage | Secrets Manager (prod), env var (dev) | Env vars visible in console; Secrets Manager is audited |
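The Timeout and Concurrency rows can be turned into numbers with two small helpers. This is a sketch under stated assumptions: `reserved_concurrency` and `request_timeout_s` are hypothetical names, the rate limit is assumed to be expressed in requests per minute, and the 2-second safety margin is an arbitrary starting point. The only Lambda API used is `context.get_remaining_time_in_millis()`, which the Python context object provides.

```python
import math


def reserved_concurrency(requests_per_minute: float, avg_duration_s: float) -> int:
    """Little's law: concurrent executions = arrival rate (req/s) x average duration (s).

    Rounds down so the function cannot exceed the rate limit, with a minimum of 1.
    """
    return max(1, math.floor(requests_per_minute * avg_duration_s / 60.0))


def request_timeout_s(context, safety_margin_s: float = 2.0, floor_s: float = 1.0) -> float:
    """Seconds the API call may use, leaving a margin to build and return the response."""
    remaining_s = context.get_remaining_time_in_millis() / 1000.0
    return max(floor_s, remaining_s - safety_margin_s)
```

For example, a limit of 50 requests per minute with a 12-second average duration gives a reserved concurrency of 10. The value from `request_timeout_s(context)` could be passed as the per-request `timeout` argument to `client.messages.create`, so the API call fails with a catchable error before Lambda terminates the invocation.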
Estimate how Claude API costs scale with Lambda invocations using the Claude API Cost Calculator. For the FastAPI alternative (long-running server instead of serverless), see the FastAPI guide. For error handling patterns (retries, 429s), see the error handling guide.