A working AI agent that can read files, search the web, and execute multi-step tasks takes exactly 30 minutes to build from scratch. Not a demo. Not a toy. A production-capable agent with proper error handling, tool use, and a deployment path. This tutorial uses the Claude API with Model Context Protocol (MCP), and by the end you will have an agent running that can autonomously plan, act, observe results, and decide what to do next. Every line of code is included in both Python and TypeScript.
What You Are Building
The agent you will build follows this architecture:
User Request
|
v
[PLAN] — Agent analyzes the request and decides what tools to use
|
v
[ACT] — Agent calls one or more tools (file read, web search, code execution)
|
v
[OBSERVE] — Agent examines the tool results
|
v
[DECIDE] — Agent determines if the task is complete or if more steps are needed
|
v
(Loop back to PLAN if not done, or return final answer)
This Plan-Act-Observe-Decide (PAOD) loop is the fundamental architecture behind every production AI agent, from Claude Code to Devin to custom enterprise agents. The only differences between a tutorial agent and a production agent are the number of tools, the sophistication of the system prompt, and the robustness of the error handling. We will address all three.
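The whole loop fits in a dozen lines. Here is a runnable simulation with a fake model standing in for the API (all names here are illustrative; Step 4 replaces the fake with real Claude calls):

```python
def run_tool(name: str) -> str:
    """Stand-in tool executor (illustrative)."""
    return f"result: output of {name}"

def fake_model(history: list[str]) -> dict:
    """Stand-in for the Claude API: asks for one tool, then finishes."""
    if not any(h.startswith("result:") for h in history):
        return {"stop_reason": "tool_use", "tool": "read_file"}  # PLAN: needs a tool
    return {"stop_reason": "end_turn", "answer": "done"}         # DECIDE: complete

def agent_loop(request: str, model=fake_model, max_turns: int = 10) -> str:
    """Minimal PAOD loop: plan, act, observe, decide, repeat."""
    history = [request]
    for _ in range(max_turns):                      # hard cap prevents infinite loops
        decision = model(history)                   # PLAN
        if decision["stop_reason"] == "end_turn":
            return decision["answer"]               # DECIDE: task complete
        history.append(run_tool(decision["tool"]))  # ACT + OBSERVE
    return "error: reached turn limit"
```

Everything from Step 4 onward is this skeleton with real API calls, real tools, and real error handling filled in.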
Prerequisites
You need three things:
- An Anthropic API key — Get one at console.anthropic.com. The free tier is sufficient for this tutorial.
- Python 3.11+ or Node.js 20+ — Both code versions are provided. Pick whichever you are more comfortable with.
- A terminal — That is it. No frameworks, no Docker (for now), no cloud accounts.
Step 1: Project Setup (2 minutes)
Python
mkdir ai-agent && cd ai-agent
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install anthropic httpx python-dotenv
echo "ANTHROPIC_API_KEY=your-key-here" > .env
TypeScript
mkdir ai-agent && cd ai-agent
npm init -y
npm install @anthropic-ai/sdk dotenv
npm install -D typescript @types/node tsx
npx tsc --init --target ES2022 --module NodeNext --moduleResolution NodeNext
echo "ANTHROPIC_API_KEY=your-key-here" > .env
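Both versions read the key from .env via a dotenv package. If you are curious what that package actually does, a minimal Python stand-in looks like this (a sketch only; the real python-dotenv also handles quoting, multiline values, and interpolation):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=value per line; existing env vars win."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```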
Step 2: Define Your Tools (5 minutes)
Tools are the capabilities your agent has access to. We will start with three practical tools: reading files, running shell commands, and searching the web. Each tool is defined as a JSON schema that tells the Claude API what parameters the tool accepts and what it does.
Python — tools.py
import subprocess
import httpx
from pathlib import Path
TOOLS = [
{
"name": "read_file",
"description": "Read the contents of a file at the given path. Returns the file content as a string.",
"input_schema": {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Absolute or relative path to the file to read"
}
},
"required": ["path"]
}
},
{
"name": "run_command",
"description": "Execute a shell command and return stdout and stderr. Use for file operations, git commands, or system tasks.",
"input_schema": {
"type": "object",
"properties": {
"command": {
"type": "string",
"description": "The shell command to execute"
},
"timeout": {
"type": "integer",
"description": "Maximum execution time in seconds (default: 30)",
"default": 30
}
},
"required": ["command"]
}
},
{
"name": "web_search",
"description": "Search the web for current information. Returns top results with titles, URLs, and snippets.",
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query"
}
},
"required": ["query"]
}
}
]
def execute_tool(name: str, input_data: dict) -> str:
"""Execute a tool and return the result as a string."""
if name == "read_file":
return _read_file(input_data["path"])
elif name == "run_command":
return _run_command(
input_data["command"],
input_data.get("timeout", 30)
)
elif name == "web_search":
return _web_search(input_data["query"])
else:
return f"Error: Unknown tool '{name}'"
def _read_file(path: str) -> str:
try:
content = Path(path).read_text(encoding="utf-8")
if len(content) > 50_000:
return content[:50_000] + "\n... [truncated, file exceeds 50K chars]"
return content
except FileNotFoundError:
return f"Error: File not found at '{path}'"
except PermissionError:
return f"Error: Permission denied reading '{path}'"
except Exception as e:
return f"Error reading file: {type(e).__name__}: {e}"
def _run_command(command: str, timeout: int) -> str:
# Safety: a naive denylist of destructive commands — production agents
# should prefer an allowlist or a sandboxed execution environment
blocked = ["rm -rf /", "mkfs", "dd if=", "> /dev/sd"]
if any(b in command for b in blocked):
return "Error: This command is blocked for safety reasons."
try:
result = subprocess.run(
command,
shell=True,
capture_output=True,
text=True,
timeout=timeout,
cwd="."
)
output = ""
if result.stdout:
output += f"stdout:\n{result.stdout}"
if result.stderr:
output += f"\nstderr:\n{result.stderr}"
if not output.strip():
output = "(command completed with no output)"
return output[:20_000] # Truncate very long outputs
except subprocess.TimeoutExpired:
return f"Error: Command timed out after {timeout} seconds"
except Exception as e:
return f"Error executing command: {type(e).__name__}: {e}"
def _web_search(query: str) -> str:
# Using a free search API — replace with your preferred provider
try:
resp = httpx.get(
"https://api.duckduckgo.com/",
params={"q": query, "format": "json", "no_html": 1},
timeout=10
)
data = resp.json()
results = []
if data.get("AbstractText"):
results.append(f"Summary: {data['AbstractText']}")
results.append(f"Source: {data.get('AbstractURL', 'N/A')}")
for topic in data.get("RelatedTopics", [])[:5]:
if isinstance(topic, dict) and "Text" in topic:
results.append(f"- {topic['Text']}")
if not results:
return f"No results found for '{query}'. Try rephrasing."
return "\n".join(results)
except Exception as e:
return f"Search error: {type(e).__name__}: {e}"
TypeScript — tools.ts
import { execSync } from "child_process";
import { readFileSync } from "fs";
export interface Tool {
name: string;
description: string;
input_schema: Record<string, unknown>;
}
export const TOOLS: Tool[] = [
{
name: "read_file",
description:
"Read the contents of a file at the given path. Returns the file content as a string.",
input_schema: {
type: "object",
properties: {
path: {
type: "string",
description: "Absolute or relative path to the file to read",
},
},
required: ["path"],
},
},
{
name: "run_command",
description:
"Execute a shell command and return stdout and stderr. Use for file operations, git commands, or system tasks.",
input_schema: {
type: "object",
properties: {
command: {
type: "string",
description: "The shell command to execute",
},
timeout: {
type: "integer",
description: "Maximum execution time in seconds (default: 30)",
},
},
required: ["command"],
},
},
{
name: "web_search",
description:
"Search the web for current information. Returns top results with titles, URLs, and snippets.",
input_schema: {
type: "object",
properties: {
query: {
type: "string",
description: "The search query",
},
},
required: ["query"],
},
},
];
export function executeTool(
name: string,
inputData: Record<string, unknown>
): string {
switch (name) {
case "read_file":
return readFile(inputData.path as string);
case "run_command":
return runCommand(
inputData.command as string,
(inputData.timeout as number) ?? 30
);
case "web_search":
return webSearch(inputData.query as string);
default:
return `Error: Unknown tool '${name}'`;
}
}
function readFile(path: string): string {
try {
const content = readFileSync(path, "utf-8");
if (content.length > 50_000) {
return content.slice(0, 50_000) + "\n... [truncated]";
}
return content;
} catch (err) {
const e = err as NodeJS.ErrnoException;
if (e.code === "ENOENT") return `Error: File not found at '${path}'`;
if (e.code === "EACCES") return `Error: Permission denied reading '${path}'`;
return `Error reading file: ${e.message}`;
}
}
function runCommand(command: string, timeout: number): string {
const blocked = ["rm -rf /", "mkfs", "dd if=", "> /dev/sd"];
if (blocked.some((b) => command.includes(b))) {
return "Error: This command is blocked for safety reasons.";
}
try {
const output = execSync(command, {
timeout: timeout * 1000,
encoding: "utf-8",
maxBuffer: 1024 * 1024,
});
return output.slice(0, 20_000) || "(command completed with no output)";
} catch (err) {
const e = err as Error & { stdout?: string; stderr?: string };
const parts: string[] = [];
if (e.stdout) parts.push(`stdout: ${e.stdout}`);
if (e.stderr) parts.push(`stderr: ${e.stderr}`);
if (parts.length === 0) parts.push(`Error: ${e.message}`);
return parts.join("\n").slice(0, 20_000);
}
}
Note: The TypeScript webSearch function referenced in executeTool is not shown — it follows the same DuckDuckGo pattern as the Python version, using the fetch API built into Node.js 20+. Because fetch returns a Promise, you will need to make webSearch and executeTool async and await the call in the agent loop.
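Whichever language you use, Claude is generally good about respecting input_schema, but a defensive check before executing a tool catches malformed calls early. A minimal validator looks like this (a sketch; the jsonschema package gives you full JSON Schema validation, including type checks):

```python
def validate_input(schema: dict, input_data: dict) -> list[str]:
    """Return a list of problems with a tool call's input; empty list means valid."""
    problems = []
    properties = schema.get("properties", {})
    # Every required field must be present
    for field in schema.get("required", []):
        if field not in input_data:
            problems.append(f"missing required field '{field}'")
    # No fields outside the declared properties
    for key in input_data:
        if key not in properties:
            problems.append(f"unexpected field '{key}'")
    return problems
```

Call it at the top of execute_tool and return the problems as an error string so the agent can correct itself.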
Step 3: The System Prompt (3 minutes)
The system prompt is the most underestimated component of an AI agent. It defines the agent’s behavior, constraints, and personality. A bad system prompt produces an agent that hallucinates tool calls, refuses reasonable requests, or loops endlessly. A good system prompt produces an agent that is reliable, efficient, and useful.
SYSTEM_PROMPT = """You are a capable AI agent with access to tools for reading files,
running shell commands, and searching the web.
## How You Work
1. PLAN: Analyze the user's request. Break complex tasks into steps.
2. ACT: Use your tools to execute each step. Call one or more tools per turn.
3. OBSERVE: Examine tool results carefully. Check for errors.
4. DECIDE: If the task is complete, provide your final answer. If not, continue.
## Rules
- Always verify assumptions with tools before acting on them.
- If a command fails, read the error message carefully and try a different approach.
- Never guess at file contents — read them.
- Never assume a command succeeded — check the output.
- If you cannot complete a task after 3 attempts, explain what went wrong and suggest
alternatives.
- Be concise in your final answers. Users want results, not narration.
## Safety
- Never execute commands that delete system files or modify system configuration.
- Never expose API keys, passwords, or sensitive data in your responses.
- If a request seems dangerous, explain the risk and ask for confirmation.
"""
Step 4: The Agent Loop (10 minutes)
This is the core of your agent. The agent loop sends messages to the Claude API, detects when the model wants to use a tool, executes that tool, and feeds the result back to continue the conversation. The loop continues until the model produces a final text response without requesting any more tool calls.
Python — agent.py
import os
import json
from dotenv import load_dotenv
from anthropic import Anthropic
from tools import TOOLS, execute_tool
load_dotenv()
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
MODEL = "claude-sonnet-4-20250514"
MAX_TURNS = 20 # Safety limit to prevent infinite loops
# Define SYSTEM_PROMPT here by pasting the string from Step 3 (or import it from its own module)
def run_agent(user_message: str) -> str:
"""Run the agent loop for a single user request."""
messages = [{"role": "user", "content": user_message}]
turn_count = 0
while turn_count < MAX_TURNS:
turn_count += 1
print(f"\n--- Agent Turn {turn_count} ---")
# Call Claude with tools
response = client.messages.create(
model=MODEL,
max_tokens=4096,
system=SYSTEM_PROMPT,
tools=TOOLS,
messages=messages,
)
# Check stop reason
if response.stop_reason == "end_turn":
# Model is done — extract final text
final_text = ""
for block in response.content:
if block.type == "text":
final_text += block.text
return final_text
elif response.stop_reason == "tool_use":
# Model wants to use tools — execute them
assistant_content = response.content
messages.append({"role": "assistant", "content": assistant_content})
# Process each tool call
tool_results = []
for block in assistant_content:
if block.type == "tool_use":
print(f" Tool: {block.name}({json.dumps(block.input, indent=2)[:200]})")
result = execute_tool(block.name, block.input)
print(f" Result: {result[:200]}...")
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result,
})
messages.append({"role": "user", "content": tool_results})
else:
# Unexpected stop reason
return f"Agent stopped unexpectedly: {response.stop_reason}"
return "Error: Agent reached maximum turn limit. Task may be too complex."
# Interactive mode
if __name__ == "__main__":
print("AI Agent ready. Type 'quit' to exit.\n")
while True:
user_input = input("You: ").strip()
if user_input.lower() in ("quit", "exit", "q"):
break
if not user_input:
continue
result = run_agent(user_input)
print(f"\nAgent: {result}\n")
TypeScript — agent.ts
import Anthropic from "@anthropic-ai/sdk";
import { TOOLS, executeTool } from "./tools.js";
import { createInterface } from "readline";
import "dotenv/config";
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const MODEL = "claude-sonnet-4-20250514";
const MAX_TURNS = 20;
const SYSTEM_PROMPT = `You are a capable AI agent with access to tools...`; // Same as Python version
interface Message {
role: "user" | "assistant";
content: unknown;
}
async function runAgent(userMessage: string): Promise<string> {
const messages: Message[] = [
{ role: "user", content: userMessage },
];
let turnCount = 0;
while (turnCount < MAX_TURNS) {
turnCount++;
process.stdout.write(`\n--- Agent Turn ${turnCount} ---\n`);
const response = await client.messages.create({
model: MODEL,
max_tokens: 4096,
system: SYSTEM_PROMPT,
tools: TOOLS as Anthropic.Messages.Tool[],
messages: messages as Anthropic.Messages.MessageParam[],
});
if (response.stop_reason === "end_turn") {
return response.content
.filter((b): b is Anthropic.Messages.TextBlock => b.type === "text")
.map((b) => b.text)
.join("");
}
if (response.stop_reason === "tool_use") {
messages.push({ role: "assistant", content: response.content });
const toolResults = response.content
.filter(
(b): b is Anthropic.Messages.ToolUseBlock => b.type === "tool_use"
)
.map((block) => {
const input = block.input as Record<string, unknown>;
process.stdout.write(` Tool: ${block.name}\n`);
const result = executeTool(block.name, input);
process.stdout.write(` Result: ${result.slice(0, 200)}...\n`);
return {
type: "tool_result" as const,
tool_use_id: block.id,
content: result,
};
});
messages.push({ role: "user", content: toolResults });
}
}
return "Error: Agent reached maximum turn limit.";
}
// Interactive mode
const rl = createInterface({ input: process.stdin, output: process.stdout });
console.log("AI Agent ready. Type 'quit' to exit.\n");
function prompt() {
rl.question("You: ", async (input) => {
const trimmed = input.trim();
if (["quit", "exit", "q"].includes(trimmed.toLowerCase())) {
rl.close();
return;
}
if (trimmed) {
const result = await runAgent(trimmed);
console.log(`\nAgent: ${result}\n`);
}
prompt();
});
}
prompt();
Step 5: Run Your Agent (1 minute)
Python
python agent.py
TypeScript
npx tsx agent.ts
Try these test prompts to verify it works:
- Read the README.md file in the current directory and summarize it.
- What files are in the current directory? List them with their sizes.
- Search the web for the latest Claude API pricing and summarize the costs.
- Create a new file called hello.txt with the content "Hello from my AI agent".
Each of these prompts will trigger the PAOD loop: the agent will plan which tool to use, execute it, observe the result, and either continue or return a final answer.
Step 6: Add MCP Integration (5 minutes)
Model Context Protocol (MCP) is the open standard for connecting AI agents to external tools and data sources. Instead of hardcoding tools into your agent, MCP lets you connect to tool servers that expose capabilities dynamically. This means your agent can gain new abilities without code changes.
To add MCP support, install the MCP SDK:
Python
pip install mcp
TypeScript
npm install @modelcontextprotocol/sdk
MCP servers expose tools through a standardized protocol. Your agent discovers available tools at startup, and the Claude API handles tool calling the same way whether the tool is local or provided via MCP. The key architectural insight is that MCP separates tool definition (what the tool can do) from tool execution (how the tool does it), which means your agent code stays clean regardless of how many tools you add.
For a production MCP setup, you will want to configure tool servers in a JSON file that your agent reads at startup. Each server exposes one or more tools that become available to the agent automatically. The official Anthropic documentation at docs.anthropic.com provides the complete MCP integration guide — our guide to Claude Code's multi-agent coordination covers how MCP works in the context of production agent systems.
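A typical server configuration follows the mcpServers convention used by Claude Desktop and Claude Code (check the MCP documentation for the exact schema; the workspace path here is a placeholder):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/workspace"]
    }
  }
}
```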
Step 7: Error Handling That Actually Works (4 minutes)
The difference between a demo agent and a production agent is error handling. Here are the three error categories your agent must handle and the patterns that address each one.
1. Tool Execution Errors
Tools will fail. Files will not exist. Commands will timeout. APIs will return 500s. The critical design decision is: do you tell the agent about the error, or do you retry silently?
Always tell the agent. The agent is better at deciding what to do about an error than your error handling code is. Return the error message as the tool result, and let the agent decide whether to retry, try a different approach, or inform the user.
# The tool result for a failed command looks like:
{
"type": "tool_result",
"tool_use_id": "toolu_abc123",
"content": "Error: File not found at 'config.yaml'. The directory contains: config.json, config.toml, settings.yaml",
"is_error": True # Optional: signals to Claude this was an error
}
2. Agent Loop Errors
The agent can get stuck in loops — calling the same tool repeatedly, alternating between two approaches without making progress, or exceeding the maximum turn limit. The MAX_TURNS constant is your primary defense, but you can add smarter detection:
def detect_loop(messages: list, window: int = 4) -> bool:
"""Detect if the agent is repeating the same tool calls."""
recent_tools = []
for msg in messages[-window:]:
if isinstance(msg.get("content"), list):
for block in msg["content"]:
if hasattr(block, "name"):
recent_tools.append(f"{block.name}:{json.dumps(block.input)}")
# If the last N tool calls are identical, we are in a loop
if len(recent_tools) >= 2 and len(set(recent_tools)) == 1:
return True
return False
3. API Errors
The Claude API can return rate limit errors (429), server errors (500), or overloaded errors (529). Implement exponential backoff with jitter:
import time
import random
def call_with_retry(fn, max_retries: int = 3):
for attempt in range(max_retries):
try:
return fn()
except Exception as e:
if "rate_limit" in str(e).lower() or "overloaded" in str(e).lower():
wait = (2 ** attempt) + random.uniform(0, 1)
print(f" Retrying in {wait:.1f}s (attempt {attempt + 1}/{max_retries})")
time.sleep(wait)
else:
raise
raise Exception(f"Failed after {max_retries} retries")
Step 8: Deploy to Production (5 minutes)
Your agent works locally. Now make it accessible. The simplest production deployment is a FastAPI (Python) or Express (TypeScript) wrapper that exposes your agent as an HTTP endpoint.
Python — server.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from agent import run_agent
app = FastAPI(title="AI Agent API")
class AgentRequest(BaseModel):
message: str
class AgentResponse(BaseModel):
result: str
success: bool
@app.post("/agent", response_model=AgentResponse)
async def agent_endpoint(req: AgentRequest):
try:
result = run_agent(req.message)
return AgentResponse(result=result, success=True)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
Deploy with:
pip install fastapi uvicorn
uvicorn server:app --host 0.0.0.0 --port 8000
For production, add authentication (API key header), rate limiting (slowapi), request logging, and run behind a reverse proxy like Nginx or Traefik. If you need to validate and format JSON data flowing through your agent pipeline, our JSON formatter is useful for debugging tool inputs and outputs during development.
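The slowapi package handles rate limiting for FastAPI, but the underlying mechanism is a token bucket, which fits in a few lines if you want to understand it or avoid the dependency (a sketch; production setups should keep one bucket per client API key):

```python
import time

class TokenBucket:
    """Simple rate limiter: `rate` requests per second with a burst of `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Refill tokens based on elapsed time, then try to spend one."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Wire it into the endpoint as a FastAPI dependency that returns a 429 when allow() is False.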
Architecture Decisions That Matter
Now that the code is working, here are the architectural decisions that separate production agents from tutorial demos:
Stateless vs. Stateful Agents
The agent we built is stateless — each request starts fresh. For simple task execution, this is fine. For agents that need to maintain context across multiple interactions (a coding assistant that remembers your project structure, a research agent that builds on previous findings), you need conversation persistence. Store the messages array in a database (PostgreSQL with JSONB, Redis for fast access) and load it at the start of each turn.
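As a sketch of the persistence pattern (using SQLite for brevity where PostgreSQL or Redis would serve in production; the table and function names here are illustrative):

```python
import json
import sqlite3

def _ensure_table(db: sqlite3.Connection) -> None:
    db.execute(
        "CREATE TABLE IF NOT EXISTS conversations "
        "(session_id TEXT PRIMARY KEY, messages TEXT)"
    )

def save_conversation(db: sqlite3.Connection, session_id: str, messages: list) -> None:
    """Persist the full messages array for a session (overwrites previous state)."""
    _ensure_table(db)
    db.execute(
        "INSERT INTO conversations (session_id, messages) VALUES (?, ?) "
        "ON CONFLICT(session_id) DO UPDATE SET messages = excluded.messages",
        (session_id, json.dumps(messages)),
    )
    db.commit()

def load_conversation(db: sqlite3.Connection, session_id: str) -> list:
    """Load a session's messages, or start fresh if none exist."""
    _ensure_table(db)
    row = db.execute(
        "SELECT messages FROM conversations WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```

Call load_conversation before the loop, append the new turns, and save_conversation after it returns.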
Single-Agent vs. Multi-Agent
The PAOD loop works for tasks that a single agent can handle. For complex tasks that benefit from specialization — one agent for research, one for coding, one for review — you need a multi-agent architecture where a coordinator agent delegates subtasks to specialist agents. This is exactly how Claude Code's agent teams work, and our multi-agent coordination guide covers the orchestration patterns in detail.
Tool Selection at Scale
Three tools is manageable. Thirty tools is not — the model's ability to select the right tool degrades as the tool count increases. The solution is tool routing: categorize tools into groups, and use a lightweight initial classification step to determine which tool group is relevant before presenting the full tool definitions. MCP servers handle this naturally by exposing tools in namespaced groups.
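In production the classification step is usually a cheap model call, but a keyword-based version shows the shape of the routing layer (group names and keywords here are illustrative):

```python
GROUP_KEYWORDS = {
    "filesystem": ["file", "read", "directory", "folder"],
    "web": ["search", "web", "online", "latest"],
    "shell": ["run", "command", "git", "install"],
}

def route_tool_groups(request: str, max_groups: int = 2) -> list[str]:
    """Score each tool group by keyword hits; fall back to all groups on no match."""
    text = request.lower()
    scores = {
        group: sum(kw in text for kw in keywords)
        for group, keywords in GROUP_KEYWORDS.items()
    }
    ranked = sorted((g for g, s in scores.items() if s > 0), key=lambda g: -scores[g])
    return ranked[:max_groups] or list(GROUP_KEYWORDS)
```

Only the tool definitions in the selected groups are then passed to the API call, keeping the model's choice set small.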
Streaming for Better UX
In production, users should not wait for the entire agent loop to complete before seeing output. Use the Claude API's streaming mode to show the agent's thinking in real-time. Replace client.messages.create() with client.messages.stream() and yield text blocks as they arrive. This turns a 30-second wait into a 30-second experience where the user watches the agent work.
Testing Your Agent
Agent testing requires a different approach than traditional unit testing because the agent's behavior is non-deterministic. Here is a practical testing framework:
TEST_CASES = [
{
"input": "What files are in the current directory?",
"must_use_tool": "run_command",
"output_must_contain": ["agent.py", "tools.py"],
},
{
"input": "Read the file nonexistent_file_12345.txt",
"must_use_tool": "read_file",
"output_must_contain": ["not found", "error"],
},
{
"input": "What is 2 + 2?",
"must_not_use_tool": True, # Should answer directly
"output_must_contain": ["4"],
},
]
Run each test case, check that the correct tools were called (or not called), and verify that the output contains expected strings. This is not deterministic testing — it is behavioral testing that verifies the agent makes reasonable decisions. Run the suite 3 times and check for consistency. If you are validating output formats, our regex playground is helpful for building patterns that match the expected agent output structure.
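A checker for these cases needs the agent's final output plus the list of tools it called, which you can record inside the agent loop. A minimal version (this sketch treats output_must_contain as "at least one of these strings appears"; switch any to all if you want every string required):

```python
def check_case(case: dict, output: str, tools_called: list[str]) -> list[str]:
    """Return a list of failures for one behavioral test case; empty means pass."""
    failures = []
    if "must_use_tool" in case and case["must_use_tool"] not in tools_called:
        failures.append(f"expected tool '{case['must_use_tool']}' to be called")
    if case.get("must_not_use_tool") and tools_called:
        failures.append(f"expected no tool calls, got {tools_called}")
    lowered = output.lower()
    if "output_must_contain" in case and not any(
        s.lower() in lowered for s in case["output_must_contain"]
    ):
        failures.append("output missing expected strings")
    return failures
```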
Cost Optimization
Agent loops are expensive because each turn is a full API call with the growing conversation history. Here are concrete optimizations:
- Truncate tool outputs: The 50K and 20K character limits in our tool implementations are not arbitrary — they prevent a single tool result from consuming the entire context window.
- Use claude-sonnet-4-20250514 for the agent loop: It is faster and cheaper than Opus while being equally capable at tool selection and planning. Reserve Opus for the final synthesis step if needed.
- Summarize after N turns: If the conversation exceeds 10 turns, inject a summary of previous turns and trim the older messages. This keeps the context window manageable without losing important context.
- Cache tool results: If the agent calls the same tool with the same parameters within a session, return the cached result instead of re-executing.
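The caching optimization in the last bullet is a dictionary keyed on the tool name and its canonicalized input. A sketch (cached_execute would wrap execute_tool from Step 2; clear the cache between sessions so stale file reads do not leak across tasks):

```python
import json

_tool_cache: dict[tuple[str, str], str] = {}

def cached_execute(name: str, input_data: dict, execute) -> str:
    """Memoize tool results by (name, canonical JSON input) within a session."""
    # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} hit the same entry
    key = (name, json.dumps(input_data, sort_keys=True))
    if key not in _tool_cache:
        _tool_cache[key] = execute(name, input_data)
    return _tool_cache[key]
```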
What You Built and Where to Go Next
In 30 minutes, you have built an AI agent that:
- Accepts natural language instructions
- Plans multi-step task execution
- Reads files and examines their contents
- Executes shell commands with safety guardrails
- Searches the web for current information
- Handles errors gracefully and retries intelligently
- Runs as an HTTP API ready for production deployment
This is not a toy. This is the same architecture that powers production agent systems at companies processing millions of requests daily. The differences between what you have and what they have are scale (more tools, more robust infrastructure), polish (streaming, better UX), and specialization (domain-specific system prompts and tool sets).
The most impactful next steps, in order of priority:
- Add three domain-specific tools relevant to your work (database queries, API calls to services you use, document generation).
- Implement conversation persistence so the agent remembers context across sessions.
- Add streaming output for real-time feedback during long-running tasks.
- Deploy behind authentication so you can access your agent from any device.
- Build a multi-agent pipeline where specialized agents handle different task types.
The agent paradigm is where AI development is heading in 2026. The $12.8 billion flowing into agentic AI infrastructure this quarter is building the ecosystem around exactly the kind of system you just created. The developers who understand how agents work at the code level — not just the concept level — will have a structural advantage as this category matures.
Start with the agent you built today. Add one tool per week. Within a month, you will have a personal AI agent that handles tasks that previously required manual work across multiple applications. That is the real value of understanding how to build from scratch rather than depending on a platform you do not control.