Testing Claude API Applications in Python: Mocking & Integration Tests (2026)

How to unit test and integration test Python apps that call the Claude API. Mock the Anthropic client with unittest.mock, write deterministic tests, and add CI test coverage.

Testing LLM applications requires two complementary strategies: fast, deterministic unit tests with mocked API calls, and occasional integration tests against the real API. This guide shows both.

Install test dependencies

Unit test: mock the Anthropic client

Unit test: mock with patch decorator

Unit test: mock streaming

Integration test: real API (CI-gated)

GitHub Actions CI configuration

Testing approach comparison

Frequently asked questions

Approach	Speed	Cost	Reliability	Best for
Mocked unit tests	<1s	$0	Deterministic	Business logic, prompt construction
Integration tests (Haiku)	3–8s	~$0.0001/test	Real API behaviour	Prompt validation, output format
Recorded cassettes (VCR.py)	<1s	$0	Recorded responses	Regression testing without API calls
Eval frameworks (promptfoo)	Minutes	$0.01–$1	Statistical	Quality regression across model upgrades

How do I mock the Anthropic client in Python tests?

Use `unittest.mock.patch('anthropic.Anthropic')` or `MagicMock()` to replace the client with a mock that returns a fixed `Message` object. Set `mock_client.messages.create.return_value` to a mock with the expected `.content[0].text` value.

Should I run integration tests against the real Claude API?

Yes, but separately from unit tests. Keep unit tests (mocked) in `tests/unit/` and integration tests (real API) in `tests/integration/`. Run integration tests only in CI with the real `ANTHROPIC_API_KEY` — gate them on a `RUN_INTEGRATION_TESTS=true` env var to avoid accidental charges.

How do I test streaming Claude responses?

Mock the `client.messages.stream()` context manager. Return a mock iterator that yields `TextEvent` objects with `.text` values. The `with client.messages.stream(...) as stream:` pattern requires the mock to support `__enter__` and `__exit__`.

How much do integration tests cost?

A single integration test calling Claude Haiku with a short prompt costs under $0.0001. A full integration test suite of 20 tests typically costs $0.01–$0.05. Use `ANTHROPIC_API_KEY` in CI secrets and set `max_tokens=50` in test fixtures to minimise cost.

What is the best way to test Claude tool use?

Create a mock that returns a `ToolUseBlock` in the response content. Verify your tool dispatch logic processes the `tool_use` block correctly. Test the full round-trip (tool call → tool result → final answer) with an integration test against the real API.

Testing Claude API Applications in Python