Testing Claude API Applications in Python

How to unit test and integration test Python apps that call the Claude API. Mock the Anthropic client with unittest.mock, write deterministic tests, and add CI test coverage.

Testing LLM applications requires two complementary strategies: fast, deterministic unit tests with mocked API calls, and occasional integration tests against the real API. This guide shows both.

Install test dependencies

pip install anthropic pytest pytest-asyncio

Unit test: mock the Anthropic client

import pytest
from unittest.mock import MagicMock, patch
import anthropic

# --- Application code under test ---
def classify_sentiment(client: anthropic.Anthropic, text: str) -> str:
    """Returns 'positive', 'negative', or 'neutral'."""
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=10,
        system="Reply with exactly one word: positive, negative, or neutral.",
        messages=[{"role": "user", "content": text}]
    )
    return msg.content[0].text.strip().lower()

# --- Unit tests ---
def make_mock_response(text: str):
    """Helper: build a mock Message with content[0].text = text."""
    block = MagicMock()
    block.text = text
    msg = MagicMock()
    msg.content = [block]
    return msg

def test_positive_sentiment():
    client = MagicMock(spec=anthropic.Anthropic)
    client.messages.create.return_value = make_mock_response("positive")
    result = classify_sentiment(client, "I love this product!")
    assert result == "positive"
    client.messages.create.assert_called_once()

def test_negative_sentiment():
    client = MagicMock(spec=anthropic.Anthropic)
    client.messages.create.return_value = make_mock_response("negative")
    assert classify_sentiment(client, "Terrible experience.") == "negative"

def test_model_parameter_passed():
    """Verify the function uses the correct (cheap) model."""
    client = MagicMock(spec=anthropic.Anthropic)
    client.messages.create.return_value = make_mock_response("neutral")
    classify_sentiment(client, "It is what it is.")
    call_kwargs = client.messages.create.call_args.kwargs
    assert call_kwargs["model"] == "claude-haiku-4-5-20251001"
    assert call_kwargs["max_tokens"] == 10
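
The same mocking pattern can also cover the normalisation step in `classify_sentiment` (`.strip().lower()`) and the prompt construction. The following is a minimal sketch, assuming the model may return stray whitespace or capitalisation. The function and helper are restated with a duck-typed `client` so the block runs without the SDK installed, and the test name is illustrative.

```python
from unittest.mock import MagicMock


def classify_sentiment(client, text: str) -> str:
    """Same logic as the function above; `client` is duck-typed here."""
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=10,
        system="Reply with exactly one word: positive, negative, or neutral.",
        messages=[{"role": "user", "content": text}],
    )
    return msg.content[0].text.strip().lower()


def make_mock_response(text: str):
    """Build a mock Message with content[0].text = text."""
    block = MagicMock()
    block.text = text
    msg = MagicMock()
    msg.content = [block]
    return msg


def test_output_is_normalised_and_prompt_is_forwarded():
    client = MagicMock()
    # Simulate a reply with extra whitespace and a capital letter.
    client.messages.create.return_value = make_mock_response("  Positive\n")

    assert classify_sentiment(client, "Great!") == "positive"

    # The user text should reach the API unchanged, as a single user message.
    kwargs = client.messages.create.call_args.kwargs
    assert kwargs["messages"] == [{"role": "user", "content": "Great!"}]
```

Checking `call_args.kwargs` keeps the test focused on what your code sends, which is the part a mocked test can verify deterministically.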

Unit test: mock with patch decorator

from unittest.mock import patch, MagicMock
import anthropic

# --- App code ---
def summarize_text(text: str) -> str:
    client = anthropic.Anthropic()  # created inside the function
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{"role": "user", "content": f"Summarize: {text}"}]
    )
    return msg.content[0].text

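# Patch the name where it is looked up. This target works because summarize_text
# calls anthropic.Anthropic() through the module. If the app code used
# `from anthropic import Anthropic`, patch "<your_module>.Anthropic" instead.
@patch("anthropic.Anthropic")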
def test_summarize(mock_anthropic_class):
    mock_instance = MagicMock()
    mock_anthropic_class.return_value = mock_instance
    block = MagicMock(); block.text = "Short summary."
    mock_instance.messages.create.return_value = MagicMock(content=[block])

    result = summarize_text("A very long document...")
    assert result == "Short summary."
    mock_instance.messages.create.assert_called_once()

Unit test: mock streaming

from unittest.mock import MagicMock
import anthropic

def stream_response(client: anthropic.Anthropic, prompt: str) -> str:
    full = ""
    with client.messages.stream(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        for text in stream.text_stream:
            full += text
    return full

def test_streaming():
    client = MagicMock(spec=anthropic.Anthropic)
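    # Mock the context manager returned by client.messages.stream(...)
    mock_stream = MagicMock()
    mock_stream.text_stream = iter(["Hello", ", ", "world", "!"])  # plain str chunks
    mock_stream.__enter__.return_value = mock_stream  # `as stream` binds to the mock
    mock_stream.__exit__.return_value = False  # do not swallow exceptions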
    client.messages.stream.return_value = mock_stream

    result = stream_response(client, "Say hello")
    assert result == "Hello, world!"
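
If several tests need a streaming mock, the setup can be pulled into a helper. This is a sketch under the same assumptions as the test above: `make_mock_stream` is a hypothetical name, `stream_response` is restated with a duck-typed `client` so the block runs on its own, and only `text_stream` is mocked.

```python
from unittest.mock import MagicMock


def make_mock_stream(chunks):
    """Hypothetical helper: a context-manager mock exposing `text_stream`."""
    stream = MagicMock()
    # Build a fresh iterator per call; an iterator is exhausted after one pass.
    stream.text_stream = iter(chunks)
    # MagicMock pre-configures __enter__/__exit__; make `as stream` yield this mock.
    stream.__enter__.return_value = stream
    return stream


def stream_response(client, prompt: str) -> str:
    """Same logic as the function above; `client` is duck-typed here."""
    full = ""
    with client.messages.stream(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            full += text
    return full


def test_streaming_with_helper():
    client = MagicMock()
    mock_stream = make_mock_stream(["Hello", ", ", "world", "!"])
    client.messages.stream.return_value = mock_stream

    assert stream_response(client, "Say hello") == "Hello, world!"

    # The context manager should be entered and exited exactly once.
    mock_stream.__enter__.assert_called_once()
    mock_stream.__exit__.assert_called_once()
```

Creating the iterator inside the helper avoids a common pitfall: a shared `iter(...)` reused across tests yields nothing the second time.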

Integration test: real API (CI-gated)

import os
import pytest
import anthropic

# Gate: only run if env var is set (set in CI secrets, never locally by default)
pytestmark = pytest.mark.skipif(
    not os.environ.get("RUN_INTEGRATION_TESTS"),
    reason="Set RUN_INTEGRATION_TESTS=true to run against real API"
)

@pytest.fixture(scope="module")
def client():
    return anthropic.Anthropic()  # uses ANTHROPIC_API_KEY from env

def test_real_api_basic(client):
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=20,
        messages=[{"role": "user", "content": "Reply with exactly: OK"}]
    )
    assert "OK" in msg.content[0].text

def test_real_api_json_output(client):
    import json
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=50,
        system="Reply with valid JSON only. No markdown.",
        messages=[{"role": "user", "content": 'Return {"status": "ok", "value": 42}'}]
    )
    data = json.loads(msg.content[0].text)
    assert data["status"] == "ok"
    assert data["value"] == 42
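
One caveat with the gate above: `os.environ.get(...)` is truthy for any non-empty string, so `RUN_INTEGRATION_TESTS=false` would still enable the tests. A stricter check might look like the sketch below. The helper name `integration_enabled` and the set of accepted values are assumptions, not part of pytest or the SDK.

```python
import os


def integration_enabled(environ=None) -> bool:
    """Hypothetical helper: True only for an explicit opt-in value."""
    env = os.environ if environ is None else environ
    value = env.get("RUN_INTEGRATION_TESTS", "")
    # Accepted opt-in values are an assumption; adjust to your CI conventions.
    return value.strip().lower() in {"1", "true", "yes"}


# Possible use in the test module, in place of the bare environment lookup:
# pytestmark = pytest.mark.skipif(
#     not integration_enabled(),
#     reason="Set RUN_INTEGRATION_TESTS=true to run against real API",
# )
```

Passing the environment as an argument keeps the helper itself unit-testable without touching `os.environ`.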

GitHub Actions CI configuration

# .github/workflows/test.yml
name: Tests
on: [push, pull_request]

jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install anthropic pytest
      - run: pytest tests/unit/ -v

  integration:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'  # runs only for pushes to main, such as a merged PR
    env:
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      RUN_INTEGRATION_TESTS: "true"
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install anthropic pytest
      - run: pytest tests/integration/ -v

Testing approach comparison

| Approach | Speed | Cost | Reliability | Best for |
| --- | --- | --- | --- | --- |
| Mocked unit tests | <1s | $0 | Deterministic | Business logic, prompt construction |
| Integration tests (Haiku) | 3–8s | ~$0.0001/test | Real API behaviour | Prompt validation, output format |
| Recorded cassettes (VCR.py) | <1s | $0 | Recorded responses | Regression testing without API calls |
| Eval frameworks (promptfoo) | Minutes | $0.01–$1 | Statistical | Quality regression across model upgrades |
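
The recorded-responses idea can be illustrated with the standard library alone. The sketch below is a hand-rolled stand-in for the concept, not VCR.py's API: it assumes response texts were saved earlier as a JSON list of strings and replays them in order. The file layout and the name `load_replay_client` are hypothetical.

```python
import json
from pathlib import Path
from unittest.mock import MagicMock


def load_replay_client(cassette_path):
    """Hypothetical helper: build a mock client that replays recorded texts.

    Assumes the file holds a JSON list of strings, one per recorded response.
    """
    texts = json.loads(Path(cassette_path).read_text(encoding="utf-8"))
    responses = []
    for text in texts:
        block = MagicMock()
        block.text = text
        responses.append(MagicMock(content=[block]))

    client = MagicMock()
    # With an iterable side_effect, each call returns the next recorded response.
    client.messages.create.side_effect = responses
    return client
```

A test would pass the returned client to the function under test. Once the recorded responses run out, further calls raise `StopIteration`, which flags an unexpected extra API call.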

For async testing with asyncio, see the async Python guide. For cost estimates before running tests, see the Claude API Cost Calculator.
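
For a quick idea of what a mocked async test can look like, here is a minimal sketch using `unittest.mock.AsyncMock`. It assumes an `anthropic.AsyncAnthropic`-style client whose `messages.create` is awaited. `classify_sentiment_async` is a hypothetical async variant of the earlier function, and the client is duck-typed so the block runs without the SDK.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def classify_sentiment_async(client, text: str) -> str:
    """Hypothetical async variant of classify_sentiment."""
    msg = await client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=10,
        system="Reply with exactly one word: positive, negative, or neutral.",
        messages=[{"role": "user", "content": text}],
    )
    return msg.content[0].text.strip().lower()


async def test_async_positive():
    # Under pytest-asyncio this coroutine would carry @pytest.mark.asyncio.
    block = MagicMock()
    block.text = "positive"
    client = MagicMock()
    # AsyncMock makes `await client.messages.create(...)` resolve to the mock message.
    client.messages.create = AsyncMock(return_value=MagicMock(content=[block]))

    assert await classify_sentiment_async(client, "I love it") == "positive"
    client.messages.create.assert_awaited_once()


if __name__ == "__main__":
    # Run directly without pytest.
    asyncio.run(test_async_positive())
```

`assert_awaited_once()` checks that the coroutine was actually awaited, which a plain `assert_called_once()` would not catch.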

Frequently asked questions

How do I mock the Anthropic client in Python tests?
Use `unittest.mock.patch('anthropic.Anthropic')` or `MagicMock()` to replace the client with a mock that returns a fixed `Message` object. Set `mock_client.messages.create.return_value` to a mock with the expected `.content[0].text` value.
Should I run integration tests against the real Claude API?
Yes, but separately from unit tests. Keep unit tests (mocked) in `tests/unit/` and integration tests (real API) in `tests/integration/`. Run integration tests only in CI with the real `ANTHROPIC_API_KEY` — gate them on a `RUN_INTEGRATION_TESTS=true` env var to avoid accidental charges.
How do I test streaming Claude responses?
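Mock the `client.messages.stream()` context manager. The `with client.messages.stream(...) as stream:` pattern requires the mock to support `__enter__` and `__exit__`, which a `MagicMock` does by default. Have `__enter__` return an object whose `text_stream` attribute is an iterator of plain strings, since that is what the `for text in stream.text_stream` loop consumes.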
How much do integration tests cost?
A single integration test calling Claude Haiku with a short prompt costs under $0.0001. A full integration test suite of 20 tests typically costs $0.01–$0.05. Use `ANTHROPIC_API_KEY` in CI secrets and set `max_tokens=50` in test fixtures to minimise cost.
What is the best way to test Claude tool use?
Create a mock that returns a `ToolUseBlock` in the response content. Verify your tool dispatch logic processes the `tool_use` block correctly. Test the full round-trip (tool call → tool result → final answer) with an integration test against the real API.
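
A minimal sketch of the mocked half of that advice follows. The `get_weather` tool, the `TOOLS` registry, and `dispatch_tool_calls` are hypothetical application code. The mock response uses `SimpleNamespace` stand-ins carrying the `type`, `id`, `name`, and `input` fields of a `tool_use` block rather than real SDK objects.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def get_weather(city: str) -> str:
    """Hypothetical tool implementation used only for this example."""
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}


def dispatch_tool_calls(message):
    """Hypothetical dispatcher: run each tool_use block, return tool_result dicts."""
    results = []
    for block in message.content:
        if block.type != "tool_use":
            continue  # skip text blocks
        output = TOOLS[block.name](**block.input)
        results.append(
            {"type": "tool_result", "tool_use_id": block.id, "content": output}
        )
    return results


def make_tool_use_response(name, tool_input, tool_id="toolu_test"):
    """Stand-in for a Message holding a text block and a tool_use block."""
    text_block = SimpleNamespace(type="text", text="Let me check.")
    tool_block = SimpleNamespace(type="tool_use", id=tool_id, name=name, input=tool_input)
    return SimpleNamespace(content=[text_block, tool_block], stop_reason="tool_use")


def test_dispatch_runs_requested_tool():
    client = MagicMock()
    client.messages.create.return_value = make_tool_use_response(
        "get_weather", {"city": "Paris"}
    )
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=100, messages=[]
    )
    assert dispatch_tool_calls(msg) == [
        {"type": "tool_result", "tool_use_id": "toolu_test", "content": "Sunny in Paris"}
    ]
```

Including a text block alongside the `tool_use` block checks that the dispatcher ignores content it should not act on.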
