Claude API with LlamaIndex in Python (2026 Working Example)

Use the Claude API with LlamaIndex in Python (2026). Build RAG pipelines, query engines, and LlamaIndex agents backed by Claude Sonnet, with working code examples.

LlamaIndex is a popular framework for building RAG (Retrieval-Augmented Generation) applications. This guide shows how to wire Claude as the LLM backend for LlamaIndex query engines, agents, and pipelines.

Installation

Basic Claude LLM setup

RAG pipeline: document Q&A with Claude

Chat engine (multi-turn conversation over documents)

ReAct agent with Claude

Streaming responses

LlamaIndex vs LangChain for Claude RAG

Prompt caching with Claude in LlamaIndex

Frequently asked questions

Aspect	LlamaIndex	LangChain
Primary use case	Document indexing, RAG, structured retrieval	General LLM orchestration, chains, agents
Document loaders	100+ built-in (PDF, Word, Notion, S3, web)	150+ loaders (broader, but less RAG-focused)
RAG ergonomics	One-liner: index.as_query_engine()	Requires explicit chain composition
Agent framework	ReActAgent, OpenAI-style function calling	More mature multi-agent support
Claude integration	llama-index-llms-anthropic (official)	langchain-anthropic (official)
Best for	Document Q&A, knowledge bases, RAG eval	Complex multi-step workflows, custom chains

How do I use Claude with LlamaIndex in Python?

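Install llama-index-core and llama-index-llms-anthropic, and set the ANTHROPIC_API_KEY environment variable. Create the LLM object with from llama_index.llms.anthropic import Anthropic, then llm = Anthropic(model='claude-sonnet-4-6'). Pass it as llm= when you build a query engine, chat engine, or agent, or assign Settings.llm = llm once to make Claude the default.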
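A minimal sketch of that setup, assuming both packages are installed and the key lives in ANTHROPIC_API_KEY. The helper names resolve_api_key and build_llm are hypothetical, introduced only for illustration; the model name is the one used throughout this guide.

```python
import os
from typing import Mapping


def resolve_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Anthropic API key from the given mapping, or fail early."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set ANTHROPIC_API_KEY before creating the LLM.")
    return key


def build_llm(model: str = "claude-sonnet-4-6", max_tokens: int = 1024):
    """Create the Claude LLM object and register it as the global default."""
    # Imported lazily so resolve_api_key stays usable without LlamaIndex installed.
    from llama_index.core import Settings
    from llama_index.llms.anthropic import Anthropic

    llm = Anthropic(model=model, api_key=resolve_api_key(), max_tokens=max_tokens)
    Settings.llm = llm  # components built without an explicit llm= use this default
    return llm


if __name__ == "__main__":
    llm = build_llm()
    # One blocking completion call; the reply text is on the .text attribute.
    print(llm.complete("Say hello in one short sentence.").text)
```

Checking the key up front is a design choice: a missing key then fails with a clear message instead of surfacing as an authentication error on the first request.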

What is the difference between LlamaIndex and LangChain with Claude?

Both are orchestration frameworks for building LLM apps. LlamaIndex is optimized for retrieval-augmented generation (RAG) — indexing documents, chunking, embedding, and querying. LangChain is more general-purpose (agents, chains, tools). For document Q&A and RAG over large corpora, LlamaIndex is more ergonomic. Both support Claude as a drop-in LLM.

How do I build a RAG pipeline with Claude and LlamaIndex?

1) Load documents with SimpleDirectoryReader. 2) Build a VectorStoreIndex from the documents and set the embedding model explicitly: Anthropic does not offer an embeddings API, and LlamaIndex falls back to OpenAI embeddings (which need an OpenAI API key) unless you configure a local model. 3) Create a query engine with index.as_query_engine(llm=Anthropic(model='claude-sonnet-4-6')). 4) Call query_engine.query('your question'). LlamaIndex handles chunking, embedding, retrieval, and answer synthesis with its default settings.
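The four steps can be sketched as follows. This is one possible arrangement, not the only one: it assumes the optional llama-index-embeddings-huggingface package for local embeddings, a ./data directory of documents, and ANTHROPIC_API_KEY in the environment. The function name build_query_engine and the BAAI/bge-small-en-v1.5 embedding model are illustrative choices, not something the steps require.

```python
from pathlib import Path


def build_query_engine(data_dir: str, model: str = "claude-sonnet-4-6", top_k: int = 3):
    """Load documents, index them with a local embedding model, return a query engine."""
    if not Path(data_dir).is_dir():
        raise FileNotFoundError(f"Document directory not found: {data_dir}")

    # Imported lazily so the path check above runs even without LlamaIndex installed.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.anthropic import Anthropic

    # 1) Load every supported file in the directory.
    documents = SimpleDirectoryReader(data_dir).load_data()

    # 2) Chunk and embed locally (assumed model), so no OpenAI key is needed.
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
    index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

    # 3) Claude writes the answer from the top_k retrieved chunks.
    return index.as_query_engine(llm=Anthropic(model=model), similarity_top_k=top_k)


if __name__ == "__main__":
    engine = build_query_engine("./data")
    # 4) Ask a question; printing the response shows the synthesized answer.
    print(engine.query("What are the main topics in these documents?"))
```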

Can I use LlamaIndex agents with Claude?

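Yes. In releases that expose the classic agent API, call llama_index.core.agent.ReActAgent.from_tools(tools, llm=Anthropic(model='claude-sonnet-4-6')). Newer releases move the agent classes to llama_index.core.agent.workflow and construct them directly, so check the version you have installed. Claude's instruction-following suits the ReAct loop of reasoning, tool call, and observation. You can mix prebuilt tools (for example file reading or a code interpreter) with your own functions wrapped in FunctionTool.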
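A small sketch of a ReAct agent with two custom tools. It assumes a LlamaIndex release that still provides ReActAgent.from_tools (the call named in the answer above); on releases where the agent classes live in llama_index.core.agent.workflow, the construction differs. The multiply and add tools and the build_agent helper are hypothetical examples.

```python
def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    return a * b


def add(a: float, b: float) -> float:
    """Add two numbers and return the sum."""
    return a + b


def build_agent(model: str = "claude-sonnet-4-6"):
    """Wrap the functions above as tools and hand them to a Claude-backed agent."""
    # Assumption: this import path and from_tools exist in the installed release.
    from llama_index.core.agent import ReActAgent
    from llama_index.core.tools import FunctionTool
    from llama_index.llms.anthropic import Anthropic

    # FunctionTool takes the tool name and description from the function
    # name and docstring, so keep both descriptive.
    tools = [
        FunctionTool.from_defaults(fn=multiply),
        FunctionTool.from_defaults(fn=add),
    ]
    return ReActAgent.from_tools(tools, llm=Anthropic(model=model), verbose=True)


if __name__ == "__main__":
    agent = build_agent()
    print(agent.chat("What is (121 * 3) + 42? Use the tools."))
```

Keeping the tools as plain typed functions means they can be unit-tested without calling the model at all.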

Does LlamaIndex support streaming with Claude?

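Yes. Build the query engine with streaming enabled, index.as_query_engine(streaming=True, llm=...), then call query_engine.query() and iterate over response.response_gen or call response.print_response_stream(). You can also call llm.stream_complete() directly. The LlamaIndex Anthropic integration wraps Claude's native streaming API.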
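A minimal streaming sketch using llm.stream_complete(), assuming ANTHROPIC_API_KEY is set. The helpers stream_completion and print_stream are hypothetical names introduced here; print_stream accepts any iterable of text chunks.

```python
from typing import Iterable, Iterator


def stream_completion(prompt: str, model: str = "claude-sonnet-4-6") -> Iterator[str]:
    """Yield text deltas from Claude as they arrive."""
    # Imported lazily so print_stream below can be used without LlamaIndex installed.
    from llama_index.llms.anthropic import Anthropic

    llm = Anthropic(model=model)
    for chunk in llm.stream_complete(prompt):
        # Each chunk carries the newly generated text in .delta (may be empty).
        yield chunk.delta or ""


def print_stream(chunks: Iterable[str]) -> str:
    """Print chunks as they arrive and return the full concatenated text."""
    parts = []
    for text in chunks:
        print(text, end="", flush=True)
        parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)


if __name__ == "__main__":
    print_stream(stream_completion("Explain RAG in two sentences."))
```

The same print_stream helper should also work for a streaming query engine: given an existing index, passing index.as_query_engine(streaming=True, llm=llm).query(question).response_gen streams the synthesized answer token by token.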
