Claude Computer Use

Call the Claude computer use API to let Claude control a desktop. Full Python example: screenshot, send to Claude, execute mouse/keyboard actions, loop.


Safety: Anthropic recommends running computer use inside a sandboxed Docker container or VM, not on your personal machine. Claude can click and type anywhere on screen.

Install dependencies

pip install anthropic pyautogui Pillow

How the loop works

  1. Take a screenshot, base64-encode it.
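  2. Send it to Claude with the computer_20250124 tool declared. After the first turn, the screenshot goes inside a tool_result that answers Claude's previous tool_use.
  3. Claude responds with a tool_use block: an action (left_click, type, key, scroll, screenshot) plus its parameters, such as coordinates or text.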
  4. Execute that action on the real desktop.
  5. Take another screenshot and repeat until Claude returns a text reply.
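
The control flow of these five steps can be sketched without an API key or a desktop. This is a minimal dry run, not the real client: `agent_loop`, `ask_model`, `act`, and `capture` are hypothetical names introduced here, and the "model" is a scripted iterator standing in for Claude.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: `ask_model` wraps the API call, `act` performs the
# action on the desktop, and `capture` returns a base64 screenshot string.
def agent_loop(
    ask_model: Callable[[str], Optional[dict]],
    act: Callable[[dict], None],
    capture: Callable[[], str],
    max_steps: int = 20,
) -> int:
    """Run the screenshot -> model -> action cycle; return how many actions ran."""
    executed = 0
    shot = capture()                  # step 1: initial screenshot
    for _ in range(max_steps):
        action = ask_model(shot)      # steps 2-3: send screenshot, get an action or None
        if action is None:            # no tool_use block: Claude replied in text
            break
        act(action)                   # step 4: execute on the desktop
        executed += 1
        shot = capture()              # step 5: fresh screenshot for the next turn
    return executed


# Dry run: a scripted "model" issues two actions and then stops.
script = iter([
    {"action": "left_click", "coordinate": [100, 200]},
    {"action": "type", "text": "hello"},
    None,
])
log = []
steps = agent_loop(lambda shot: next(script), log.append, lambda: "fake-base64")
print(steps, [a["action"] for a in log])
```

Passing the three callables in keeps the loop testable; the `max_steps` cap is what stops a confused run from clicking forever.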

Complete Python example

import anthropic, base64, time
from io import BytesIO
from PIL import ImageGrab
import pyautogui

client = anthropic.Anthropic()
SCREEN_W, SCREEN_H = pyautogui.size()

def take_screenshot():
    img = ImageGrab.grab()
    buf = BytesIO()
    img.save(buf, format="PNG")
    return base64.standard_b64encode(buf.getvalue()).decode("utf-8")

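# Claude sends xdotool-style key names ("Return", "ctrl+c", "Page_Down");
# pyautogui expects its own lowercase names, so map the ones that differ.
KEY_ALIASES = {"page_up": "pageup", "page_down": "pagedown", "super": "win", "cmd": "command"}

def execute_action(action):
    kind = action.get("action") or action.get("type")
    if kind == "screenshot":
        return  # nothing to do: the loop captures a fresh screenshot after every action
    elif kind == "left_click":
        pyautogui.click(*action["coordinate"])
    elif kind == "right_click":
        pyautogui.rightClick(*action["coordinate"])
    elif kind == "double_click":
        pyautogui.doubleClick(*action["coordinate"])
    elif kind == "type":
        pyautogui.write(action["text"], interval=0.02)
    elif kind == "key":
        keys = [KEY_ALIASES.get(k.lower(), k.lower()) for k in action["text"].split("+")]
        pyautogui.hotkey(*keys)
    elif kind == "scroll":
        x, y = action["coordinate"]
        # computer_20250124 names these scroll_direction / scroll_amount;
        # the shorter names are kept as a fallback.
        d = action.get("scroll_direction", action.get("direction", "down"))
        amount = action.get("scroll_amount", action.get("amount", 3))
        pyautogui.scroll(amount * (1 if d == "up" else -1), x=x, y=y)
    else:
        print(f"Unsupported action, skipped: {kind}")
    time.sleep(0.5)  # give the UI time to settle before the next screenshot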

def image_block(data):
    return {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": data}}

def run(task, max_steps=20):
    # First turn: a plain user message with the task and an initial screenshot.
    # A tool_result is only valid as the answer to a tool_use block from Claude.
    messages = [{"role": "user", "content": [
        {"type": "text", "text": task},
        image_block(take_screenshot()),
    ]}]
    for step in range(max_steps):
        resp = client.beta.messages.create(
            model="claude-sonnet-4-6-20251001",  # replace with a model ID your account can access
            max_tokens=4096,
            tools=[{"type": "computer_20250124", "name": "computer",
                    "display_width_px": SCREEN_W, "display_height_px": SCREEN_H, "display_number": 1}],
            messages=messages,
            betas=["computer-use-2025-01-24"],  # must match the tool version declared above
        )
        messages.append({"role": "assistant", "content": resp.content})
        tool_uses = [b for b in resp.content if b.type == "tool_use"]
        if not tool_uses:
            # No tool call: Claude is done, so print its text reply and stop.
            for b in resp.content:
                if hasattr(b, "text"):
                    print(b.text)
            break
        # Every tool_use needs its own tool_result in the next user message.
        results = []
        for tu in tool_uses:
            print(f"Step {step+1}: {tu.input}")
            execute_action(tu.input)
            results.append({"type": "tool_result", "tool_use_id": tu.id,
                            "content": [image_block(take_screenshot())]})
        messages.append({"role": "user", "content": results})

if __name__ == "__main__":
    run("Open a text editor and type 'Hello from Claude!'")
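
The example assumes the screenshot Claude sees and the coordinate space pyautogui clicks in have the same pixel dimensions. That can fail if you downscale screenshots before sending them (for instance to the 1280×800 size used in the cost estimate), or on HiDPI displays where the captured image can be larger than the size pyautogui reports. A minimal sketch of the mapping follows; `scale_to_screen` is a hypothetical helper, not part of any SDK.

```python
def scale_to_screen(coordinate, shot_size, screen_size):
    """Map an [x, y] point from screenshot pixels to real screen coordinates."""
    x, y = coordinate
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    return round(x * screen_w / shot_w), round(y * screen_h / shot_h)


# Claude clicked the centre of a 1280x800 screenshot of a 2560x1600 screen.
print(scale_to_screen([640, 400], (1280, 800), (2560, 1600)))  # (1280, 800)
```

If you use this, declare the screenshot size (not the physical screen size) as display_width_px and display_height_px, and scale each coordinate before passing it to pyautogui.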

Supported actions

Action        Required fields                               Description
screenshot    (none)                                        Claude requests a fresh screenshot
left_click    coordinate: [x, y]                            Single left click at pixel coordinates
right_click   coordinate: [x, y]                            Right click
double_click  coordinate: [x, y]                            Double click
type          text: "string"                                Type text at the current focus
key           text: "ctrl+c"                                Press a key or key combination
scroll        coordinate, scroll_direction, scroll_amount   Scroll up or down
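
Because the action dictionary comes from a model, it may be worth checking it against this table before touching the desktop. The sketch below is one way to do that; `REQUIRED_FIELDS` and `validate_action` are hypothetical names, and the table covers only the actions listed above. Scroll direction and amount are left optional because the example's execute_action supplies defaults for them.

```python
# Required fields per action, taken from the table above.
REQUIRED_FIELDS = {
    "screenshot": (),
    "left_click": ("coordinate",),
    "right_click": ("coordinate",),
    "double_click": ("coordinate",),
    "type": ("text",),
    "key": ("text",),
    "scroll": ("coordinate",),
}


def validate_action(action: dict) -> list:
    """Return a list of problems; an empty list means the action looks executable."""
    kind = action.get("action")
    if kind not in REQUIRED_FIELDS:
        return [f"unsupported action: {kind!r}"]
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS[kind] if f not in action]
    coord = action.get("coordinate")
    if coord is not None and not (
        isinstance(coord, (list, tuple))
        and len(coord) == 2
        and all(isinstance(v, (int, float)) for v in coord)
    ):
        problems.append("coordinate must be [x, y]")
    return problems


print(validate_action({"action": "left_click", "coordinate": [10, 20]}))
print(validate_action({"action": "left_click"}))
print(validate_action({"action": "drag"}))
```

Calling this at the top of execute_action and skipping anything with a non-empty result turns a malformed action into a logged warning instead of a KeyError mid-run.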

Run in Docker (recommended)

docker pull ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
docker run -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY -p 5900:5900 -p 8501:8501 ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

Open http://localhost:8501 for the Streamlit demo or VNC to localhost:5900 to watch Claude work.

Cost estimation

Each 1280×800 screenshot costs ~1,600–2,000 input tokens. At Sonnet 4.6 pricing ($3 per million input tokens) that is ~$0.005 per screenshot, each time the screenshot is sent. Estimate total task costs with the Claude API Cost Calculator.
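
The arithmetic can be sketched as a small function. `screenshot_cost` is a hypothetical helper; the 1,800-token default is simply the midpoint of the range above, and the estimate ignores text tokens, output tokens, and any prompt-caching discount. The `resend_history` flag models a loop like the example, where request k carries all k screenshots taken so far.

```python
def screenshot_cost(steps, tokens_per_shot=1800, usd_per_million=3.0, resend_history=False):
    """Estimate the input cost in USD of the screenshots alone."""
    if resend_history:
        # Request k carries k screenshots, so the total is 1 + 2 + ... + steps.
        billed_shots = steps * (steps + 1) // 2
    else:
        billed_shots = steps
    return billed_shots * tokens_per_shot * usd_per_million / 1_000_000


print(f"{screenshot_cost(1):.4f}")                         # one screenshot
print(f"{screenshot_cost(20):.3f}")                        # 20 steps, each screenshot billed once
print(f"{screenshot_cost(20, resend_history=True):.3f}")   # 20 steps, full history resent
```

The gap between the last two figures is why long runs usually drop or cache older screenshots instead of resending all of them.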

Frequently asked questions

Which Claude models support computer use?
Claude Sonnet 4 and later models support computer use when the request carries the beta header that matches the declared tool version. The computer_20250124 tool used in this example pairs with the computer-use-2025-01-24 header; newer tool versions have their own header. A missing or mismatched header returns a 400 error.
What is the computer_20250124 tool?
computer_20250124 is the built-in tool type you declare in the API request. Claude returns tool_use blocks with an action field (screenshot, left_click, type, key, scroll) that your code executes on the real desktop.
How do I take a screenshot and send it to Claude?
Take a screenshot with pyautogui or Pillow, base64-encode the PNG bytes, and include it as an image block inside a tool_result message. Claude then plans the next action based on what it sees.
Is Claude computer use safe to run on my main machine?
Anthropic recommends running computer use in an isolated sandbox (Docker container, VM), not on your personal computer. Claude can click and type anywhere on screen.
How much does Claude computer use cost?
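Each 1280×800 screenshot costs ~1,600–2,000 input tokens. At Sonnet 4.6 pricing ($3 per million input tokens), that is ~$0.005 per screenshot. A 20-step automation costs ~$0.10 in screenshot tokens if each screenshot is billed once, plus output tokens. The example loop resends the whole message history on every request, so earlier screenshots are billed again at each step, and the input cost of a long run is several times higher unless you trim old screenshots or use prompt caching.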
