Claude just took control of my browser, opened a spreadsheet, filled in 47 rows of data, and sent a follow-up email — all while I made coffee. That’s not a hypothetical. That’s what Claude’s computer use feature does in practice, and most people have no idea it exists or how to actually set it up.
A BCG study found that knowledge workers using AI completed tasks 25–40% faster on average.
Computer use lets Claude operate a real desktop environment: clicking buttons, typing into forms, reading what’s on screen, and making decisions based on what it sees. It’s not just text generation anymore. It’s an AI agent that can literally use your computer the way a human assistant would.
I’ve spent the last few months testing this feature extensively — burning through API credits, running into walls, figuring out what actually works. This tutorial gives you the exact setup process, real automation examples with copy-paste prompts, and the honest truth about where it falls short in 2026.
What You’ll Build in This Tutorial
By the end of this guide, you’ll have a working Claude computer use automation that can:
- Open a browser and navigate to specific websites
- Extract data from web pages and log it to a file
- Fill out web forms automatically
- Take screenshots and report back on what it sees
- Chain multiple computer actions into a single workflow
The practical example we’ll build: a research automation that opens Google, searches for competitor pricing on three websites, and compiles the results into a text file — without you touching the keyboard.
Prerequisites Before You Start
You need a few things in place before step one. Don’t skip this section — missing any of these is the #1 reason people hit a wall immediately.
- Anthropic API access — Computer use is API-only. You can’t do this through Claude.ai in a browser. Go to console.anthropic.com and create an account. You’ll need to add a payment method and purchase credits (minimum $5).
- Python 3.10+ installed: Check with python --version in your terminal.
- Docker installed: Anthropic’s official computer use demo runs in a Docker container. Download Docker Desktop from docker.com. It’s free.
- Basic comfort with the terminal — You don’t need to be a developer, but you need to be able to run commands by copy-pasting them.
- API key from Anthropic — In the console, go to API Keys → Create Key. Copy it somewhere safe.
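If you'd rather check the list above with a script than by hand, here is a minimal sketch. The function names are my own for illustration, and it only confirms that the pieces are present: it does not verify that your API key is valid or that the Docker daemon is actually running.

```python
import os
import shutil
import sys


def find_problems(python_version, docker_path, api_key):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if tuple(python_version) < (3, 10):
        problems.append("Python 3.10 or newer is required")
    if not docker_path:
        problems.append("docker was not found on PATH")
    if not api_key:
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems


def check_this_machine():
    """Collect the real values from the current environment and check them."""
    return find_problems(
        sys.version_info[:2],
        shutil.which("docker"),
        os.environ.get("ANTHROPIC_API_KEY"),
    )


if __name__ == "__main__":
    for problem in check_this_machine():
        print("Missing:", problem)
```

Keeping the checks in a function that takes plain values makes it easy to try with fake inputs before trusting it.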
The model you’ll use is claude-opus-4 or claude-sonnet-4 (the computer use capable models available in 2026). Opus is more capable but costs more per token. For most automation tasks, Sonnet hits the right balance — about $3 per million input tokens and $15 per million output tokens.
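To turn those per-token prices into a rough dollar figure, a small calculator helps. This is a sketch using the Sonnet prices quoted above; the token counts in the example are hypothetical, and your real usage will show up in the Anthropic console.

```python
# Sonnet prices quoted in this article, in dollars per million tokens.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00


def estimate_cost(input_tokens, output_tokens):
    """Estimate the dollar cost of a run from its token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return round(input_cost + output_cost, 4)


# Hypothetical run: 100,000 input tokens and 5,000 output tokens.
print(estimate_cost(100_000, 5_000))  # 0.375
```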
Step 1: Set Up the Docker Environment
Claude’s computer use runs inside a sandboxed virtual desktop — a Docker container with a browser, file system, and everything the AI needs to interact with a “computer” safely. This is important: Claude is operating on a virtual machine, not your actual desktop (unless you specifically route it there, which I’ll cover later).
Open your terminal and run these commands in order:
git clone https://github.com/anthropics/anthropic-quickstarts.git
cd anthropic-quickstarts/computer-use-demo
Now set your API key as an environment variable. On Mac/Linux:
export ANTHROPIC_API_KEY=your_api_key_here
On Windows (PowerShell):
$env:ANTHROPIC_API_KEY="your_api_key_here"
Now build and run the Docker container:
docker build -t computer-use-demo .
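docker run \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -v $HOME/.anthropic:/home/computeruse/.anthropic \
  -p 5900:5900 -p 8501:8501 -p 6080:6080 -p 8080:8080 \
  -it computer-use-demo
The trailing backslashes continue the command across lines on Mac/Linux. In PowerShell, run it as a single line and use $env:ANTHROPIC_API_KEY in place of $ANTHROPIC_API_KEY.
The first build takes 3-5 minutes. Once the container is running, open your browser and go to http://localhost:8080. You’ll see the combined interface: a chat window on the left and a virtual desktop on the right. That virtual desktop is what Claude sees and controls. Port 8501 serves the Streamlit chat on its own, and port 6080 serves the desktop view on its own.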
Step 2: Run Your First Computer Use Command
Start simple. In the chat input on the left side of the interface, type this exact prompt:
Open the Firefox browser, go to google.com, and take a screenshot showing the homepage.
Hit Enter and watch the right panel. You’ll see Claude:
- Call the computer tool with action type “screenshot” to see the current state of the desktop
- Identify where Firefox is and click on it
- Wait for Firefox to open
- Click the address bar
- Type “google.com” and press Enter
- Take a final screenshot and report back
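Under the hood, each of those steps is a tool call whose input is a small JSON object. The sketch below shows what a plausible sequence might look like for this prompt. The action names (screenshot, left_click, type, key) come from the computer tool's action set, but the coordinates are invented and the exact sequence Claude picks will vary from run to run.

```python
# A plausible sequence of `computer` tool inputs for the prompt above.
# Coordinates are made up; Claude derives the real ones from screenshots.
EXPECTED_ACTIONS = [
    {"action": "screenshot"},
    {"action": "left_click", "coordinate": [512, 740]},  # Firefox icon
    {"action": "screenshot"},
    {"action": "left_click", "coordinate": [400, 60]},  # address bar
    {"action": "type", "text": "google.com"},
    {"action": "key", "text": "Return"},
    {"action": "screenshot"},
]


def describe(action):
    """Turn one tool input into a short log line."""
    kind = action["action"]
    if kind == "left_click":
        x, y = action["coordinate"]
        return f"click at ({x}, {y})"
    if kind in ("type", "key"):
        return f"{kind} {action['text']!r}"
    return kind


for step in EXPECTED_ACTIONS:
    print(describe(step))
```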
The whole process takes about 15-30 seconds. When I first ran this, I’ll be honest — it felt like watching someone else use a computer through a window. It’s genuinely strange and impressive at the same time.
If Claude gets stuck or clicks the wrong thing, that’s normal. Computer use isn’t perfect. I’ll cover troubleshooting at the end.
Step 3: Build the Competitor Research Automation
Now let’s do something actually useful. This is the prompt I use for a basic competitor pricing research workflow. Copy this exactly:
Please do the following research task and save results to a file:
1. Open Firefox and go to [competitor website 1 URL]
2. Find their pricing page (look for a "Pricing" link in the navigation)
3. Note down the plan names and prices you see
4. Go to [competitor website 2 URL] and repeat
5. Go to [competitor website 3 URL] and repeat
6. Open a text editor (gedit or mousepad)
7. Create a new file and write a summary with this format:
COMPETITOR PRICING SUMMARY - [today's date]
Competitor 1: [name]
- Plans found: [list]
- Price range: [lowest to highest]
Competitor 2: [name]
- Plans found: [list]
- Price range: [lowest to highest]
Competitor 3: [name]
- Plans found: [list]
- Price range: [lowest to highest]
8. Save the file to the desktop as "pricing_research.txt"
If you can't find a pricing page on any site, note that in the file and move on.
Replace the placeholder URLs with real competitor websites before running this. I tested this with three SaaS tools in my niche and Claude successfully found pricing information on two out of three, with the third having a “Contact for pricing” wall. Total time: about 4 minutes. My time investment: zero.
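If you run this prompt often, swapping the URLs by hand gets old and it's easy to miss one. Here is a minimal sketch that fills the [competitor website N URL] slots from a list and refuses to continue if a slot has no URL. The function name is hypothetical, and it deliberately leaves the other bracketed items (like [list]) alone, since those are for Claude to fill in.

```python
import re

# Matches only the URL slots, e.g. "[competitor website 2 URL]".
URL_SLOT = re.compile(r"\[competitor website (\d+) URL\]")


def fill_competitor_urls(template, urls):
    """Replace each [competitor website N URL] slot with urls[N-1]."""

    def substitute(match):
        index = int(match.group(1)) - 1
        if not 0 <= index < len(urls):
            raise ValueError(f"no URL supplied for slot {match.group(0)}")
        return urls[index]

    return URL_SLOT.sub(substitute, template)


template = (
    "1. Open Firefox and go to [competitor website 1 URL]\n"
    "4. Go to [competitor website 2 URL] and repeat\n"
    "- Plans found: [list]"
)
print(fill_competitor_urls(template, ["https://a.example", "https://b.example"]))
```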
Step 4: Use the Python API for Programmatic Control
The Streamlit demo is great for testing, but for real automation you want to call the API directly from Python. This lets you trigger computer use tasks from scripts, schedules, or other tools like Make.com or n8n.
First, install the Anthropic Python library:
pip install anthropic
Here’s a minimal working Python script that triggers a computer use task:
import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

# Define the computer use tools.
# Tool version strings are model-specific; confirm them in Anthropic's docs
# for the model you choose.
tools = [
    {
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
        "display_number": 1
    },
    {
        "type": "text_editor_20250728",
        "name": "str_replace_based_edit_tool"
    },
    {
        "type": "bash_20250124",
        "name": "bash"
    }
]

# Your task prompt
task = """
Open Firefox, go to news.ycombinator.com,
find the top 5 post titles on the front page,
and use the bash tool to save them to a file
called /tmp/hn_titles.txt
"""

messages = [{"role": "user", "content": task}]

# Hard cap on loop iterations so a stuck task can't run up the bill
MAX_STEPS = 25

# Agentic loop
for step in range(MAX_STEPS):
    response = client.beta.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=4096,
        tools=tools,
        messages=messages,
        betas=["computer-use-2025-01-24"]
    )

    # Check if we're done
    if response.stop_reason == "end_turn":
        print("Task complete!")
        # Print every text block; the last block is not guaranteed to be text
        for block in response.content:
            if block.type == "text":
                print(block.text)
        break

    # Anything other than a tool request (for example, max_tokens) ends the run
    if response.stop_reason != "tool_use":
        print(f"Stopped early: {response.stop_reason}")
        break

    # Handle tool use
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            print(f"Claude is using: {block.name}")
            # In production, you'd actually execute the tool here
            # and return real results. For testing, we pass a placeholder.
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": "Tool executed successfully"
            })
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
else:
    print(f"Gave up after {MAX_STEPS} steps")
Note: In the Docker demo environment, tool execution is handled automatically. In a standalone Python script, you need to implement the actual screenshot/click/type execution yourself, or use a library like pyautogui connected to a virtual display. The Docker demo handles all of this — for most people starting out, stick with the Docker setup and treat the Python API as a reference for when you want to build something custom.
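To give a feel for what "implement the execution yourself" means, here is a rough sketch of a dispatcher that maps a computer tool input to a backend call. Everything named here (RecordingBackend, execute_computer_action, the backend method names) is hypothetical. The backend only records calls so the sketch runs anywhere; a real one would drive pyautogui or similar against a virtual display.

```python
class RecordingBackend:
    """Stand-in for a real desktop backend; it only records calls."""

    def __init__(self):
        self.calls = []

    def screenshot(self):
        self.calls.append(("screenshot",))
        return "<base64 PNG would go here>"

    def click(self, x, y):
        self.calls.append(("click", x, y))

    def type_text(self, text):
        self.calls.append(("type", text))

    def press_key(self, key):
        self.calls.append(("key", key))


def execute_computer_action(tool_input, backend):
    """Run one `computer` tool input and return text for the tool_result."""
    action = tool_input.get("action")
    if action == "screenshot":
        return backend.screenshot()
    if action == "left_click":
        x, y = tool_input["coordinate"]
        backend.click(x, y)
        return f"clicked at ({x}, {y})"
    if action == "type":
        backend.type_text(tool_input["text"])
        return "typed text"
    if action == "key":
        backend.press_key(tool_input["text"])
        return "pressed key"
    # Report unknown actions back to the model instead of crashing the loop.
    return f"unsupported action: {action}"


backend = RecordingBackend()
print(execute_computer_action({"action": "left_click", "coordinate": [10, 20]}, backend))
```

In a real implementation the screenshot would go back to Claude as an image content block inside the tool result rather than as a text string, and the dispatcher would need to cover the rest of the action set (scrolling, dragging, and so on).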
Step 5: Create a Repeatable Form-Filling Workflow
One of the most practical uses I’ve found is automating repetitive form submissions — things like submitting content to multiple directories, filling in job application fields, or updating records across different platforms that don’t have APIs.
Here’s the prompt template I use for form-filling tasks:
I need you to fill out a web form. Here are the details:
URL: [form URL]
Field values to enter:
- Field labeled "Name" or "Full Name": John Smith
- Field labeled "Email": john@example.com
- Field labeled "Company": Acme Corp
- Field labeled "Message" or "Description": [your message here]
- Any dropdown labeled "Industry": Select "Technology"
Instructions:
1. Take a screenshot first to see the current page state
2. If you see a cookie consent popup, dismiss it first
3. Fill each field carefully - click on the field before typing
4. If you see a CAPTCHA, stop and tell me
5. Before submitting, take a screenshot so I can verify the fields
6. Wait for my confirmation before clicking Submit
If any field labels don't match exactly what I described,
use your best judgment to match them by meaning.
The key line here is “wait for my confirmation before clicking Submit.” I never let Claude auto-submit forms in production without a human review step. One wrong submission can cause real problems.
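Since the point is a repeatable workflow, it can help to generate the prompt from data instead of editing it by hand each time. The following is a sketch only: build_form_prompt is a made-up helper, the URLs are placeholders, and it keeps the "wait for my confirmation" instruction as the last step so no generated prompt can skip the review.

```python
def build_form_prompt(url, fields):
    """Render the form-filling prompt for one URL and one set of field values."""
    field_lines = [f'- Field labeled "{label}": {value}' for label, value in fields.items()]
    return "\n".join(
        [
            "I need you to fill out a web form. Here are the details:",
            f"URL: {url}",
            "Field values to enter:",
            *field_lines,
            "Instructions:",
            "1. Take a screenshot first to see the current page state",
            "2. If you see a cookie consent popup, dismiss it first",
            "3. Fill each field carefully - click on the field before typing",
            "4. If you see a CAPTCHA, stop and tell me",
            "5. Before submitting, take a screenshot so I can verify the fields",
            "6. Wait for my confirmation before clicking Submit",
        ]
    )


# Hypothetical batch: the same contact details sent to two directory forms.
contact = {"Name": "John Smith", "Email": "john@example.com", "Company": "Acme Corp"}
form_urls = ("https://example.com/a", "https://example.com/b")
prompts = [build_form_prompt(url, contact) for url in form_urls]
print(prompts[0])
```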
Comparing Claude Computer Use to Other Automation Approaches
Before you go all-in on Claude computer use, here’s how it stacks up against other tools you might already be using:
| Tool / Method | Best For | Handles Unstructured UIs? | Approx. Cost | Technical Skill Needed |
|---|---|---|---|---|
| Claude Computer Use | Tasks with no API, visual interfaces | ✅ Yes | ~$0.10–$0.50 per task | Low–Medium |
| Make.com / Zapier | App-to-app integrations with APIs | ❌ No | $9–$29/month | Low |
| Selenium / Playwright | Predictable, static web scraping | ⚠️ Partially | Free (self-hosted) | High |
| Browser extensions (like Bardeen) | Quick browser-based tasks | ⚠️ Limited | Free–$40/month | Low |
| RPA tools (UiPath, Automation Anywhere) | Enterprise desktop automation | ✅ Yes | Hundreds of dollars/month | High |
The sweet spot for Claude computer use is tasks that involve visual interfaces with no API — legacy software, websites that change layout constantly, or workflows that require actual decision-making based on what’s on screen. For simple app integrations, Make.com is still faster and cheaper.
Real Results: What I Automated in My Own Business
Here are three actual workflows I’ve set up and use regularly, with honest time/cost numbers:
Weekly Competitor Monitoring
Every Monday, I run a computer use task that checks 5 competitor sites, looks for new blog posts or product updates, and dumps a summary into a text file. Before: 45 minutes of manual checking. After: I spend 2 minutes reviewing Claude’s summary. Cost per run: about $0.35 in API credits.
LinkedIn Profile Data Collection
I used computer use to collect public profile information from a list of 30 potential collaborators — job titles, company names, recent posts. LinkedIn doesn’t have a public API for this. Claude handled it in about 20 minutes. I would have spent 2 hours doing it manually. Cost: $0.80.
Software Screenshot Documentation
For a product tutorial I was writing, I needed screenshots of a tool’s interface with specific settings configured. I gave Claude the exact states I needed: “Open the settings panel, click on Notifications, enable email alerts, take a screenshot.” It cycled through 12 different states and saved them all. A task that would have taken 30 minutes took 6 minutes.
My Real-World Experience
Last Tuesday I had a deadline: three property listings to write, a round of follow-up emails to send, and a CMA report to finish before a 4 PM call with a buyer couple from Lisbon. The kind of afternoon that used to mean skipping lunch and still feeling behind. I decided to finally put Claude’s computer use feature to a real test instead of just reading about it.
I set it up to pull recent sale prices from two local property portals I check manually every week, drop the numbers into a spreadsheet I already had open, and draft a neighbourhood price trend summary based on what it found. Watching it actually move the cursor, open tabs, copy data, and switch back to the spreadsheet was genuinely strange — in a good way. That research task alone usually takes me 40 to 50 minutes. Claude got through it in just under 12. I used the time to write the listing descriptions myself, which is honestly the part clients notice most and where I don’t want shortcuts.
Over 9 days of testing it across different tasks, I saved roughly 3 hours of admin and research work per week. For a solo operator charging by results and not by hours, that time goes straight back into prospecting and client calls — which is where money actually comes from.
Now the honest part: it is not reliable enough yet to leave running unsupervised. On two occasions it got stuck in a loop clicking the wrong element on a portal page, and once it misread a price field because of a formatting difference between Portuguese and UK number conventions. You have to babysit it for anything that touches client-facing output. That frustrated me more than I expected, because the whole promise is saving attention, not just time.
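The number-format misread is the kind of thing I'd now guard against in code rather than leave to the model. Here is a minimal sketch of a price parser that is told up front which convention a source uses. The function name and the "pt"/"uk" labels are my own, and it assumes you know each portal's convention in advance; it makes no attempt to guess.

```python
from decimal import Decimal


def parse_price(text, convention):
    """Parse a price written in 'pt' (1.250.000,50) or 'uk' (1,250,000.50) style."""
    digits = text.strip().lstrip("€£$").strip()
    if convention == "pt":
        # Portuguese style: dots (or spaces) group thousands, comma marks decimals.
        digits = digits.replace(".", "").replace(" ", "").replace(",", ".")
    elif convention == "uk":
        # UK style: commas group thousands, dot marks decimals.
        digits = digits.replace(",", "")
    else:
        raise ValueError(f"unknown convention: {convention}")
    return Decimal(digits)


# The same characters mean very different amounts under each convention.
print(parse_price("350.000", "pt"))  # 350000
print(parse_price("350.000", "uk"))  # 350.000
```

Asking Claude to copy the price text exactly as shown on screen, then parsing it yourself, keeps the interpretation step somewhere you can test.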
I’d give Claude computer use a solid 7 out of 10 for solo real estate work: powerful for research and data gathering, but not yet trustworthy enough to automate anything a client will see without a human review pass.
Bottom line: If you are a solo agent drowning in market research, report prep, and repetitive admin, this is worth the learning curve and the API cost. Just stay in the room while it works — treat it like a capable intern on their first week, not an autopilot.
Troubleshooting: When Claude Computer Use Breaks
This feature isn’t flawless. Here are the most common failure modes and how to fix them:
Problem: Claude clicks the wrong element
Fix: Add more context to your prompt about the visual layout. Instead of “click the Submit button,” say “click the blue Submit button in the bottom right corner of the form.” The more specific your spatial description, the better. You can also ask Claude to take a screenshot first and describe what it sees before acting.
Problem: Task gets stuck in a loop
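Fix: Note that max_tokens only caps the length of a single response, so it won’t stop a loop by itself. If you’re calling the API from your own script, cap the number of agent-loop iterations there. Either way, add this to your prompt: “If you attempt any single action more than 3 times without success, stop and report what you’re seeing instead of retrying.” This prevents Claude from burning through your API credits trying the same failed click repeatedly.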
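If you're driving the API from your own script, you can enforce the same rule in code as a backstop. This is a sketch under my own assumptions: RetryGuard is a hypothetical helper, it ignores screenshots when counting repeats, and the limits are arbitrary defaults. It will also trip on legitimately repeated actions, such as pressing Tab several times, so tune it to your task.

```python
import json


class RetryGuard:
    """Stop the agent loop when an action repeats too often or steps run out."""

    def __init__(self, max_repeats=3, max_steps=50):
        self.max_repeats = max_repeats
        self.max_steps = max_steps
        self.steps = 0
        self.last_action = None
        self.repeat_count = 0

    def should_stop(self, tool_input):
        """Record one tool call; return True once a limit is exceeded."""
        self.steps += 1
        # Screenshots between attempts don't count as a different action.
        if tool_input.get("action") != "screenshot":
            fingerprint = json.dumps(tool_input, sort_keys=True)
            if fingerprint == self.last_action:
                self.repeat_count += 1
            else:
                self.last_action = fingerprint
                self.repeat_count = 1
        return self.repeat_count > self.max_repeats or self.steps > self.max_steps


guard = RetryGuard()
click = {"action": "left_click", "coordinate": [300, 200]}
print([guard.should_stop(click) for _ in range(4)])  # [False, False, False, True]
```

Call should_stop with each tool_use input inside the loop and break out when it returns True.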
Problem: Docker container crashes or won’t start
Fix: Run docker ps -a to check container status. If it exited with an error, run docker logs [container-id] to see why. The most common cause is the API key not being passed correctly. Double-check that your environment variable is set in the same terminal session where you’re running Docker.
Problem: High costs from long tasks
Fix: Each screenshot Claude takes gets sent back as a base64-encoded image, which is expensive in tokens. Break long tasks into smaller chunks rather than one massive prompt. Also, use claude-sonnet-4 instead of claude-opus-4 for tasks that don’t require heavy reasoning — it’s roughly 5x cheaper per token.
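Here is the arithmetic behind "break long tasks into smaller chunks". In an agentic loop like the one in Step 4, every request resends the whole conversation so far, including all earlier screenshots. The sketch below assumes one screenshot per step, no prompt caching or history trimming, and a made-up round figure of 1,000 tokens per screenshot.

```python
def cumulative_screenshot_tokens(steps, tokens_per_screenshot):
    """Input tokens spent on screenshots when every request resends all earlier ones."""
    # Request k carries k screenshots, so the total is 1 + 2 + ... + steps.
    return tokens_per_screenshot * steps * (steps + 1) // 2


TOKENS_PER_SCREENSHOT = 1_000  # assumed round number, not a measured value

one_long_task = cumulative_screenshot_tokens(30, TOKENS_PER_SCREENSHOT)
three_short_tasks = 3 * cumulative_screenshot_tokens(10, TOKENS_PER_SCREENSHOT)
print(one_long_task, three_short_tasks)  # 465000 165000
```

Under these assumptions, the same 30 steps cost almost three times as many screenshot tokens in one long conversation as in three separate ones, because the history grows with every step.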
Problem: Claude refuses to proceed due to safety concerns
Fix: Claude has built-in
Robson Penassi
Real estate consultant in Madeira, Portugal. Solopreneur since 2012. Testing AI tools since 2023 to automate his one-person business. Writes about what actually works — and what does not.