Deploy a Python Flask app that calls the Claude API on Google Cloud Run: store the Anthropic API key in Secret Manager, containerize the app with Docker, and deploy it as a serverless service.
Google Cloud Run is well suited to hosting Claude API microservices: you pay only while requests are being processed, instances scale to zero when idle, and the API key stays in Secret Manager rather than in the image or the source code. This guide covers the complete deployment flow, starting from this project layout:
claude-cloud-run/
├── app.py
├── requirements.txt
└── Dockerfile
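# app.py
import os

import anthropic
from flask import Flask, request, jsonify, Response, stream_with_context

app = Flask(__name__)

# The key is read from the environment; on Cloud Run it is bound from Secret Manager at deploy time.
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
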

@app.route("/ask", methods=["POST"])
def ask():
    """Return one complete Claude response as JSON."""
    data = request.get_json(force=True)
    prompt = data.get("prompt", "")
    if not prompt:
        return jsonify({"error": "prompt required"}), 400
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return jsonify({"response": msg.content[0].text})

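
@app.route("/stream", methods=["POST"])
def stream():
    """Stream the response to the client as server-sent events (SSE)."""
    data = request.get_json(force=True)
    prompt = data.get("prompt", "")
    if not prompt:
        return jsonify({"error": "prompt required"}), 400

    def generate():
        with client.messages.stream(
            model="claude-haiku-4-5-20251001",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        ) as claude_stream:
            for text in claude_stream.text_stream:
                # A chunk may contain newlines; SSE needs one "data:" line per line of text.
                yield "".join(f"data: {line}\n" for line in text.split("\n")) + "\n"
        yield "data: [DONE]\n\n"

    return Response(
        stream_with_context(generate()),
        content_type="text/event-stream",
        headers={"X-Accel-Buffering": "no"},  # ask reverse proxies not to buffer the stream
    )
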

if __name__ == "__main__":
    # Local development only; the container starts the app with gunicorn instead.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
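Both handlers call `data.get(...)` directly, which assumes the JSON body is an object; a body such as `["hi"]` would raise instead of returning a 400. A minimal sketch of a stricter check follows. The helper name `extract_prompt` and the `MAX_PROMPT_CHARS` limit are hypothetical and not part of the original app.

```python
from typing import Any, Optional, Tuple

# Hypothetical limit for illustration; tune it to your own use case.
MAX_PROMPT_CHARS = 8000


def extract_prompt(payload: Any) -> Tuple[Optional[str], Optional[str]]:
    """Return (prompt, None) when the payload is usable, else (None, error message)."""
    if not isinstance(payload, dict):
        return None, "JSON body must be an object"
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        return None, "prompt required"
    if len(prompt) > MAX_PROMPT_CHARS:
        return None, f"prompt exceeds {MAX_PROMPT_CHARS} characters"
    return prompt, None
```

In a handler this could replace the inline check: call `extract_prompt(request.get_json(force=True))` and return `jsonify({"error": error}), 400` whenever the second element is not `None`.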
# requirements.txt
anthropic==0.40.0
flask==3.1.0
gunicorn==23.0.0
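# Dockerfile
FROM python:3.12-slim
WORKDIR /app
# Install dependencies first so this layer is reused when only the app code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run sends traffic to the port named in PORT, which defaults to 8080.
ENV PORT=8080
# One worker with 8 threads; --timeout 120 matches the request timeout set at deploy time.
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "--workers", "1", "--threads", "8", "--timeout", "120", "app:app"]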
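The exec-form `CMD` above cannot expand `$PORT`, so the port is written out twice. One possible alternative, sketched here under the assumption that you add a `gunicorn.conf.py` file (not part of the original project layout), is to move the settings into a Gunicorn config module that reads `PORT` from the environment. The helper name `bind_address` is hypothetical.

```python
# gunicorn.conf.py (hypothetical file; not part of the original project layout)
import os
from typing import Mapping


def bind_address(env: Mapping[str, str]) -> str:
    """Build the bind address from PORT, defaulting to 8080 as the Dockerfile does."""
    return f"0.0.0.0:{env.get('PORT', '8080')}"


# Gunicorn reads these module-level names as settings.
bind = bind_address(os.environ)
workers = 1
threads = 8
timeout = 120
```

With this file in place, the container command could shrink to `gunicorn --config gunicorn.conf.py app:app`. Note that one worker with 8 threads handles roughly 8 requests at a time per instance, so it may be worth keeping the Cloud Run concurrency setting in line with that figure.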
# 1. Set your project
PROJECT_ID=your-project-id
REGION=us-central1
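# 2. Store the Anthropic API key in Secret Manager
echo -n "sk-ant-..." | gcloud secrets create ANTHROPIC_API_KEY --data-file=- --project=$PROJECT_ID
# Allow the service's runtime identity to read the secret. This assumes the service runs as
# the default Compute Engine service account; substitute your own account if you use one.
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format='value(projectNumber)')
gcloud secrets add-iam-policy-binding ANTHROPIC_API_KEY --member="serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" --role="roles/secretmanager.secretAccessor" --project=$PROJECT_ID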
# 3. Build the container with Cloud Build and push it to the gcr.io/$PROJECT_ID registry path
gcloud builds submit --tag gcr.io/$PROJECT_ID/claude-app --project=$PROJECT_ID
# 4. Deploy to Cloud Run with Secret Manager binding
gcloud run deploy claude-app \
  --image gcr.io/$PROJECT_ID/claude-app \
  --platform managed \
  --region $REGION \
  --memory 256Mi \
  --cpu 1 \
  --max-instances 10 \
  --concurrency 80 \
  --timeout 120 \
  --set-secrets=ANTHROPIC_API_KEY=ANTHROPIC_API_KEY:latest \
  --allow-unauthenticated \
  --project=$PROJECT_ID
# Get the service URL
SERVICE_URL=$(gcloud run services describe claude-app --region=$REGION --format='value(status.url)')
# Test the /ask endpoint
curl -X POST "$SERVICE_URL/ask" -H "Content-Type: application/json" -d '{"prompt": "Explain Cloud Run in one sentence."}'
# Test streaming
curl -X POST "$SERVICE_URL/stream" -H "Content-Type: application/json" -d '{"prompt": "Count from 1 to 5, one number per line."}' --no-buffer
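The same two checks can be run from Python. The sketch below uses only the standard library and assumes the server frames each event as `data:` lines followed by a blank line and ends with a `[DONE]` sentinel, as the `/stream` handler does. The function names are illustrative, not part of the original project.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def build_request(service_url: str, path: str, prompt: str) -> urllib.request.Request:
    """Build the same JSON POST request that the curl commands send."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{service_url.rstrip('/')}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event, stopping at the [DONE] sentinel."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:
            # A blank line ends the event; multiple data lines are joined with newlines.
            payload = "\n".join(data_lines)
            data_lines = []
            if payload == "[DONE]":
                return
            yield payload


def stream_answer(service_url: str, prompt: str) -> Iterator[str]:
    """POST to /stream and yield text chunks as the response is read."""
    with urllib.request.urlopen(build_request(service_url, "/stream", prompt)) as resp:
        yield from iter_sse_data(line.decode("utf-8") for line in resp)
```

For example, `for chunk in stream_answer(service_url, "Count from 1 to 5."): print(chunk, end="")` prints the reply as it is read, where `service_url` holds the value of `$SERVICE_URL`.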
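# For production, redeploy with --no-allow-unauthenticated in place of --allow-unauthenticated
# so that every caller must present an identity token and hold the Cloud Run Invoker role.
# In your CI/CD pipeline, call the endpoint with: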
TOKEN=$(gcloud auth print-identity-token)
curl -X POST "$SERVICE_URL/ask" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{"prompt": "Hello"}'
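A Python caller can follow the same pattern. This is a minimal sketch that assumes the `gcloud` CLI is installed and authenticated wherever the script runs; it shells out to the same `gcloud auth print-identity-token` command and attaches the result as a Bearer token. The function names are illustrative.

```python
import json
import subprocess
import urllib.request


def fetch_identity_token() -> str:
    """Run the same gcloud command as above and return the token it prints."""
    result = subprocess.run(
        ["gcloud", "auth", "print-identity-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_authed_request(service_url: str, prompt: str, token: str) -> urllib.request.Request:
    """Build a POST to /ask that carries the identity token as a Bearer credential."""
    return urllib.request.Request(
        f"{service_url.rstrip('/')}/ask",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass `build_authed_request(service_url, "Hello", fetch_identity_token())` to `urllib.request.urlopen` and read the `response` field from the JSON body.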
| Platform | Cold start | Cost (1K req/day) | Secret management | Best for |
|---|---|---|---|---|
| Cloud Run | 1–3s | ~$0.05/day | Secret Manager native | Microservices, GCP ecosystem |
| AWS Lambda | 0.5–2s | ~$0.02/day | Parameter Store / Secrets Manager | AWS ecosystem, event-driven |
| Vercel Functions | <200ms | Free tier sufficient | Env vars in dashboard | Frontend apps, Next.js |
| Cloud Run + min-instances=1 | 0ms | ~$15/month | Secret Manager native | Latency-sensitive APIs |
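The table mixes per-day and per-month figures, so it helps to put them on one scale. The sketch below uses only the table's own approximate numbers and assumes a 30-day month.

```python
# Figures copied from the table above; all are the article's approximations.
DAYS_PER_MONTH = 30  # assumption: a 30-day month

daily_cost_usd = {"Cloud Run": 0.05, "AWS Lambda": 0.02}
min_instance_monthly_usd = 15.0

# Convert the per-day figures to per-month so they compare with the min-instances row.
monthly_cost_usd = {
    name: round(cost * DAYS_PER_MONTH, 2) for name, cost in daily_cost_usd.items()
}

# Extra monthly spend for keeping one Cloud Run instance warm.
warm_premium_usd = round(min_instance_monthly_usd - monthly_cost_usd["Cloud Run"], 2)

print(monthly_cost_usd)
print(warm_premium_usd)
```

On these figures, scale-to-zero Cloud Run comes to about $1.50 a month and Lambda to about $0.60, so keeping one instance warm costs roughly $13.50 a month more in exchange for removing the cold start.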
For the AWS Lambda equivalent, see the AWS Lambda guide. For Vercel/Next.js deployment, see the Next.js example. Use the Claude API Cost Calculator to model your API costs before launch.