Claude Vision API in Python

Analyze images with Claude in Python. Pass base64 images or URLs, extract text, describe scenes, and process PDFs with the Anthropic vision API.

Claude accepts images as content blocks inside the messages array, so vision requests go through the same Messages API call as text. No separate endpoint is required.

Analyze an image from a URL

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/280px-PNG_transparency_demonstration_1.png"
                    }
                },
                {"type": "text", "text": "What do you see in this image? Be specific."}
            ]
        }
    ]
)
print(response.content[0].text)
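The example reads `response.content[0].text`, which assumes the first content block is a text block. A slightly more defensive sketch is to join every block whose `type` is `"text"`. The helper name `extract_text` is hypothetical, and the stand-in objects below are built by hand only to show the shape of the data; they are not real API output.

```python
from types import SimpleNamespace


def extract_text(content_blocks) -> str:
    """Concatenate the text of every block whose type is "text"."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Hand-built stand-in for response.content (illustration only).
fake_content = [
    SimpleNamespace(type="text", text="first part, "),
    SimpleNamespace(type="text", text="second part"),
]
print(extract_text(fake_content))
```

With a real response you would call `extract_text(response.content)` in place of `response.content[0].text`.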

Analyze a local image (base64)

import anthropic
import base64
from pathlib import Path

client = anthropic.Anthropic()

def analyze_local_image(image_path: str, prompt: str) -> str:
    image_data = base64.standard_b64encode(Path(image_path).read_bytes()).decode()

    # Detect media type from the file extension (suffix includes the leading dot)
    ext = Path(image_path).suffix.lower()
    media_type = {".jpg": "image/jpeg", ".jpeg": "image/jpeg",
                  ".png": "image/png", ".gif": "image/gif",
                  ".webp": "image/webp"}.get(ext, "image/jpeg")

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {"type": "base64", "media_type": media_type, "data": image_data}
                    },
                    {"type": "text", "text": prompt}
                ]
            }
        ]
    )
    return response.content[0].text

# Usage
result = analyze_local_image("screenshot.png", "Extract all text visible in this screenshot.")
print(result)
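Extension-based detection falls back to `image/jpeg` for anything it does not recognize, which can mislabel a file that was renamed or has no extension. One alternative, sketched here as an assumption rather than anything the API requires, is to inspect the leading bytes of the file. The function name `sniff_media_type` is hypothetical; the signatures are the standard file headers for the four formats listed in the FAQ.

```python
def sniff_media_type(data: bytes) -> str:
    """Guess the image media type from the file's leading bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unrecognized or unsupported image format")
```

Inside `analyze_local_image`, this could replace the extension lookup: read the bytes once, pass them to `sniff_media_type`, then base64-encode the same bytes.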

Multiple images in one request

# Reuses the `client` created in the examples above.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                # Put each label before the image it describes.
                {"type": "text", "text": "Before:"},
                {"type": "image", "source": {"type": "url", "url": "https://example.com/before.jpg"}},
                {"type": "text", "text": "After:"},
                {"type": "image", "source": {"type": "url", "url": "https://example.com/after.jpg"}},
                {"type": "text", "text": "What changed between these two images?"}
            ]
        }
    ]
)
print(response.content[0].text)
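When the number of images varies, the content list can be assembled in a loop instead of written by hand. The following is a minimal sketch: the helper name `build_comparison_content` is hypothetical, and it places each label before its image, which is one reasonable ordering rather than a requirement of the API.

```python
def build_comparison_content(labeled_urls, question):
    """Build a content list of label/image pairs followed by one question."""
    content = []
    for label, url in labeled_urls:
        content.append({"type": "text", "text": f"{label}:"})
        content.append({"type": "image", "source": {"type": "url", "url": url}})
    content.append({"type": "text", "text": question})
    return content


content = build_comparison_content(
    [
        ("Before", "https://example.com/before.jpg"),
        ("After", "https://example.com/after.jpg"),
    ],
    "What changed between these two images?",
)
```

The resulting list can be passed as the `content` value of a user message in `client.messages.create`.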

Use the Cost Calculator to estimate token costs for image workloads. See the Python quickstart for non-vision API basics.

Frequently asked questions

Does Claude support image URLs or only base64?

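Both. Pass `{type: 'image', source: {type: 'url', url: 'https://...'}}` for public URLs. For private images, send the bytes as base64: `{type: 'base64', media_type: 'image/jpeg', data: '<base64 string>'}`. PDFs use a separate document type (see the next question). URL images are fetched by Anthropic's servers at request time, so the URL must be reachable from the public internet.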

What image formats does Claude support?

JPEG, PNG, GIF (static), and WebP. Maximum size per image: 5MB (base64 encoded). For PDFs, use the document type with `media_type: 'application/pdf'`.
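As a sketch of the PDF case, a document block has the same shape as a base64 image block, with `type` set to `document` and `media_type` set to `application/pdf`. The helper name `build_pdf_block` is hypothetical, and the bytes used here are a placeholder rather than a valid PDF.

```python
import base64


def build_pdf_block(pdf_bytes: bytes) -> dict:
    """Wrap raw PDF bytes in a base64 document content block."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(pdf_bytes).decode("ascii"),
        },
    }


# Placeholder bytes for illustration; in practice use Path("file.pdf").read_bytes().
block = build_pdf_block(b"%PDF-1.4 placeholder")
```

The block goes in the `content` list of a user message, next to a text block containing the prompt, just like the image blocks in the examples above.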

Does vision cost more than text-only API calls?

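Images are billed as input tokens, so a request with an image costs more than the same prompt without one. The token count grows with pixel dimensions: Anthropic's documentation gives a rough estimate of (width × height) / 750, which puts a 1080×1080 image at about 1,550 input tokens. At Sonnet 4.6 rates ($3/M input), that is roughly $0.005 per image.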
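The arithmetic can be sketched in a few lines. This assumes the approximation tokens ≈ (width × height) / 750 from Anthropic's vision documentation and the $3 per million input tokens rate quoted above. Both function names are hypothetical, and the estimate ignores any server-side downscaling of large images, so treat the output as a ballpark figure.

```python
def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate input tokens for an image (assumed formula: w * h / 750)."""
    return round(width * height / 750)


def estimate_image_cost_usd(width: int, height: int,
                            usd_per_million_input_tokens: float = 3.0) -> float:
    """Approximate cost in USD for one image at the given input rate."""
    tokens = estimate_image_tokens(width, height)
    return tokens * usd_per_million_input_tokens / 1_000_000


tokens = estimate_image_tokens(1080, 1080)
cost = estimate_image_cost_usd(1080, 1080)
print(tokens, f"${cost:.4f}")  # prints: 1555 $0.0047
```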
