Text-to-SQL with Claude API in Python

Convert natural language questions to SQL queries using Claude API. Schema injection, safety guardrails, multi-table joins, self-healing queries, and BigQuery support — all with Python code examples.

💥 50p impulse-buy: Power Prompts PDF (first 10 buyers) 30 battle-tested Claude Code prompts · 8-page PDF · paste into CLAUDE.md and never re-type a prompt again · 50p impulse-buy, no commitment

Text-to-SQL is the most common LLM use case for data teams. Claude converts natural-language questions into SQL queries when given schema context — no fine-tuning required. This guide covers everything from a minimal prototype to production-ready patterns with safety guardrails.

1. Minimal text-to-SQL

import anthropic

client = anthropic.Anthropic()

SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    total_amount DECIMAL(10,2),
    status TEXT,  -- 'pending', 'shipped', 'delivered', 'cancelled'
    created_at TIMESTAMP
);
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    email TEXT,
    country TEXT
);
"""

def text_to_sql(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system=f"""You are a SQL expert. Given a database schema and a question, return ONLY the SQL query with no explanation.
Database schema:
{SCHEMA}
Rules:
- Only generate SELECT statements.
- Never use DROP, DELETE, UPDATE, INSERT, or any DDL/DML.
- Use standard SQL (compatible with SQLite and PostgreSQL).""",
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text.strip()

# Example
sql = text_to_sql("How many orders were placed in the last 30 days by customers from Germany?")
print(sql)
# SELECT COUNT(*) FROM orders o
# JOIN customers c ON o.customer_id = c.id
# WHERE c.country = 'Germany'
# AND o.created_at >= datetime('now', '-30 days')

2. Structured JSON output with explanation

import json

def text_to_sql_structured(question: str, schema: str) -> dict:
    """Returns {sql, explanation, tables_used, confidence}."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=f"""You are a SQL expert. Return a JSON object with these keys:
- sql: the SELECT query (string)
- explanation: one-sentence plain-English description of what the query does
- tables_used: list of table names referenced
- confidence: 'high' | 'medium' | 'low' (low if question is ambiguous)

Database schema:
{schema}

Only return valid JSON. Only generate SELECT statements.""",
        messages=[{"role": "user", "content": question}],
    )
    return json.loads(response.content[0].text)

3. Safety guardrails

import re

BLOCKED_KEYWORDS = re.compile(
    r"(DROP|DELETE|UPDATE|INSERT|TRUNCATE|ALTER|CREATE|REPLACE|MERGE|EXEC|EXECUTE)",
    re.IGNORECASE
)

def is_safe_sql(sql: str) -> bool:
    """Return True only if the SQL is a pure SELECT statement."""
    stripped = sql.strip().upper()
    if not stripped.startswith("SELECT") and not stripped.startswith("WITH"):
        return False
    if BLOCKED_KEYWORDS.search(sql):
        return False
    return True

def safe_text_to_sql(question: str, schema: str) -> str:
    sql = text_to_sql(question)
    if not is_safe_sql(sql):
        raise ValueError(f"Unsafe SQL generated: {sql}")
    return sql

4. Self-healing: fix SQL errors automatically

import sqlite3

def execute_with_healing(question: str, schema: str, db_path: str, max_retries: int = 2) -> list:
    """Generate SQL, execute it, auto-fix on error, return rows."""
    conn = sqlite3.connect(db_path)
    sql = text_to_sql(question)
    history = [{"role": "user", "content": question}, {"role": "assistant", "content": sql}]

    for attempt in range(max_retries + 1):
        try:
            cursor = conn.execute(sql)
            return cursor.fetchall()
        except sqlite3.Error as e:
            if attempt == max_retries:
                raise
            error_msg = f"This SQL produced an error: {e}

SQL:
{sql}

Please return a corrected SQL query only."
            history.append({"role": "user", "content": error_msg})
            fix_response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=512,
                system=f"You are a SQL expert. Database schema:
{schema}
Return corrected SQL only.",
                messages=history,
            )
            sql = fix_response.content[0].text.strip()
            history.append({"role": "assistant", "content": sql})
    conn.close()

5. BigQuery dialect

def text_to_bigquery_sql(question: str, schema: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system=f"""You are a BigQuery SQL expert. Return ONLY the SQL query.
Rules:
- Use BigQuery SQL syntax (backtick identifiers, SAFE_DIVIDE, DATE_TRUNC with DATE type, etc.)
- Use standard project.dataset.table notation when possible
- Only SELECT statements; no DDL or DML
Schema:
{schema}""",
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text.strip()

Text-to-SQL approach comparison

ApproachAccuracySetup complexitySchema size limit
Claude with schema injection (this guide)High for most queriesLow — just a system prompt~200K tokens (hundreds of tables)
Fine-tuned model (Spider/BIRD)Very high on benchmarksVery high — training pipelineFixed at training time
Template-based (NLTK/spaCy)Low — brittle on natural languageMediumScales poorly
OpenAI GPT-4oComparable to ClaudeSame as Claude128K tokens

For cost estimates on text-to-SQL pipelines, use the Claude API Cost Calculator. For extracting structured data beyond SQL generation, see the data extraction guide.

Frequently asked questions

How does Claude convert natural language to SQL?
You inject your database schema (CREATE TABLE statements or column descriptions) into the system prompt, then ask Claude to write a SQL query for the user's question. Claude understands table relationships, foreign keys, and SQL dialects. The key is giving Claude accurate, complete schema information.
How do I prevent Claude from generating dangerous SQL like DROP or DELETE?
Add explicit guardrails in the system prompt: 'Only generate SELECT statements. Never use DROP, DELETE, UPDATE, INSERT, TRUNCATE, or any DDL/DML statements.' Additionally, validate the generated SQL before execution using a simple regex check for prohibited keywords.
Which SQL dialects does Claude support?
Claude supports all major SQL dialects including PostgreSQL, MySQL, SQLite, BigQuery, Snowflake, Redshift, and T-SQL (SQL Server). Specify the dialect in your system prompt: 'Generate BigQuery SQL using SAFE_DIVIDE instead of / for division.'
What if the generated SQL has a syntax error?
Implement a self-healing loop: execute the query, catch the database exception, and send the error message back to Claude with 'This query produced an error: . Please fix it.' Claude typically corrects syntax errors on the first retry.
How much schema context can I include?
Claude Sonnet 4.6 supports 200K tokens (about 150,000 words). A typical database schema of 50 tables with column descriptions fits easily. For databases with hundreds of tables, include only the relevant subset based on the user's question, or use a schema search layer.

Free tools

Cost Calculator → API Cookbook → Diff Summarizer → Skills Browser →

More examples

Claude API Python QuickstartClaude API Node.js / TypeScript QuickstartClaude API Streaming in PythonClaude API Streaming in Node.js / TypeScriptClaude API Tool Use in PythonClaude API Tool Use in Node.js / TypeScript