MCP Server Configuration for Production Agents
MCP servers are the agent's hands. Without them, Claude Code can read files and run shell commands. With them, it can query Google Search Console, pull GA4 analytics, manage Cloudflare Workers, interact with browsers, search documentation, and call any API with a published MCP server.
Here is the MCP configuration that powers my analytics oracle agent — the one that runs every morning at 9:03 AM IST and produces a daily situation report:
// ~/.claude.json (user scope, available to all projects)
{
"mcpServers": {
"gsc": {
"command": "/Users/me/.local/bin/mcp-gsc",
"env": { "GSC_SKIP_OAUTH": "true" }
},
"ga4": {
"command": "/Users/me/.local/bin/ga4-mcp-server",
"env": {
"GA4_PROPERTY_ID": "529733024",
"GOOGLE_APPLICATION_CREDENTIALS": "/Users/me/.config/google-seo-mcp/service-account.json"
}
},
"cloudflare": {
"command": "npx",
"args": ["-y", "@anthropic-ai/mcp-cloudflare"],
"env": {
"CLOUDFLARE_ACCOUNT_ID": "a319e...",
"CLOUDFLARE_API_TOKEN": "cfut_..."
}
}
}
}
Three servers. Not twelve. I experimented with adding Slack, GitHub, Notion, and Playwright MCP servers simultaneously. The result: tool selection accuracy dropped from roughly 95% to below 80%. The model would choose a Slack tool when it meant to use GitHub, or attempt a Playwright screenshot when a simple curl would suffice. The 3-5 server sweet spot is not a suggestion — it is a measured threshold.
For project-scoped servers that the team shares, put the configuration in .mcp.json at the project root and commit it to git; this is the file that claude mcp add --scope project writes. For user-scoped servers with personal credentials, use user scope, which Claude Code stores in ~/.claude.json. Never commit API tokens to project-scoped config.
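The "never commit tokens" rule is easy to check mechanically before a commit. Here is a minimal sketch, assuming the shared config uses the same mcpServers/env layout shown above and that secret values are supplied as ${VAR} references instead of literals. The function name and the name-matching heuristic are my own illustration, not part of Claude Code:

```python
import json
import re

# Heuristic: env var names that usually hold secrets (an assumption, extend as needed).
SECRET_NAME = re.compile(r"(TOKEN|SECRET|PASSWORD|API_KEY)", re.IGNORECASE)

def find_inline_secrets(config_text: str) -> list[str]:
    """Return 'server.ENV_NAME' for each secret-looking env entry with a literal value."""
    config = json.loads(config_text)
    hits = []
    for server, spec in config.get("mcpServers", {}).items():
        for name, value in spec.get("env", {}).items():
            # A value like "${CLOUDFLARE_API_TOKEN}" defers to the environment; treat it as safe.
            is_reference = isinstance(value, str) and value.startswith("${")
            if SECRET_NAME.search(name) and not is_reference:
                hits.append(f"{server}.{name}")
    return hits

sample = '''
{"mcpServers": {"cloudflare": {"command": "npx",
  "env": {"CLOUDFLARE_ACCOUNT_ID": "abc", "CLOUDFLARE_API_TOKEN": "literal-value"}}}}
'''
print(find_inline_secrets(sample))
```

A check like this could run in a pre-commit step against the shared config and fail the commit whenever the list is non-empty.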
Hooks: The Guardrails That Actually Work
The single most important lesson from 14 weeks of production agents: do not rely on the model to enforce constraints. Models are probabilistic. Hooks are deterministic. If an action must never happen — a force push to main, a database flush without confirmation, a deploy during an active CI run — encode that constraint in a hook, not in a prompt.
Here is a real hook from my production setup that prevents accidental destructive git operations:
// .claude/settings.json
{
"hooks": {
"PreToolUse": [
{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "python3 -c \"import sys,json; cmd=json.load(sys.stdin).get('tool_input',{}).get('command',''); bad=['git push --force','git reset --hard','FLUSHDB','DROP TABLE']; sys.exit(2 if any(b in cmd for b in bad) else 0)\""
}
]
}
]
}
}
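Exit code 2 blocks the tool call. The model receives a rejection message and must find an alternative approach. This is not a suggestion to the model; it is a physical wall. Any command containing one of these strings is refused, so the model cannot force push, hard reset, or flush the database that way, no matter how convincing its reasoning. Because the check is a plain substring match, variants such as git push -f need their own entries.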
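Once the blocklist grows, a one-liner gets hard to read and test. The same check can live in a small script. This is a sketch, assuming the hook payload arrives as JSON with the shell command under tool_input.command; the function names are hypothetical, and the git push -f entry is my addition to the original list:

```python
import json

# Substrings to refuse. The "git push -f" entry is an addition to the list in the hook above.
BLOCKED = ["git push --force", "git push -f", "git reset --hard", "FLUSHDB", "DROP TABLE"]

def blocked_reason(command: str):
    """Return the matching blocked pattern, or None if the command is allowed."""
    normalized = " ".join(command.split())  # collapse repeated whitespace
    for pattern in BLOCKED:
        # Case-insensitive substring match, so "--force-with-lease" is refused as well.
        if pattern.lower() in normalized.lower():
            return pattern
    return None

def decide(payload_text: str) -> int:
    """Map a hook payload (JSON text) to an exit code: 2 blocks the call, 0 allows it."""
    payload = json.loads(payload_text)
    command = payload.get("tool_input", {}).get("command", "")
    return 2 if blocked_reason(command) else 0

print(decide('{"tool_name": "Bash", "tool_input": {"command": "git  push --force origin main"}}'))
print(decide('{"tool_name": "Bash", "tool_input": {"command": "git status"}}'))
```

To use it as a real hook, the script would end by reading the payload from standard input and exiting with the result, for example raise SystemExit(decide(sys.stdin.read())). Keeping decide() free of I/O means the blocklist can be unit tested.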
The SessionStart hook is equally powerful for agent initialization. My production setup runs a health check script at the start of every session that verifies container status, checks for uncommitted changes, validates environment variables, and confirms MCP server connectivity. If any check fails, the agent starts with full context about what is broken — instead of discovering it 10 minutes into a task after modifying files that should not have been touched.
{
"hooks": {
"SessionStart": [
{
"hooks": [
{ "type": "command", "command": "cd storefront && npx tsx scripts/health-check.ts 2>&1 | head -50" }
]
}
]
}
}
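The health check itself does not need to be elaborate. What matters is that each check produces one short line the agent can read at session start. My script is TypeScript; the following Python sketch only illustrates the shape, and the check names and helper functions are hypothetical:

```python
import os

def missing_env(required, env=None):
    """Return required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

def summarize(checks):
    """Turn {check name: error message or None} into a short report for the agent."""
    lines = []
    for name, problem in checks.items():
        status = "OK" if problem is None else f"FAIL: {problem}"
        lines.append(f"[{name}] {status}")
    return "\n".join(lines)

missing = missing_env(["GA4_PROPERTY_ID"], env={"GA4_PROPERTY_ID": ""})
report = summarize({
    "env": f"missing {', '.join(missing)}" if missing else None,
    "git": None,  # a real check might flag uncommitted changes here
})
print(report)
```

Whatever the script prints becomes context for the session, which is why the original command caps the output with head -50.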
Subagent Coordination Without an Orchestrator
What changed everything was realizing that Claude Code's built-in Agent tool, the ability to spawn subagents, eliminates the need for external orchestration frameworks. A parent session can launch multiple subagents in parallel, each with a specific brief, and aggregate their results.
Here is how my growth coordinator agent works. It is a single CLAUDE.md-defined agent that spawns 5 specialist subagents every Monday morning:
# Growth Coordinator — Weekly Multi-Agent Run
## Workflow
1. Spawn analytics-oracle agent → daily metrics + anomalies
2. Spawn seo-dominator agent → keyword gaps + ranking changes
3. Spawn content-architect agent → content calendar + topic gaps
4. Spawn cro-assassin agent → conversion funnel analysis
5. Spawn competitive-intel agent → competitor price/feature delta
## Coordination Rules
- Launch agents 1-3 in parallel (independent data)
- Wait for analytics-oracle results before launching CRO agent (needs baseline)
- Aggregate all results into weekly synthesis report
- Post synthesis to Telegram channel
Each specialist agent is defined as a Markdown file in .claude/agents/ with its own system prompt, tool access, and output format. The parent agent reads their results and synthesizes. No LangGraph. No CrewAI. No custom Python orchestration code. The orchestration is declarative Markdown, and the execution is Claude Code's native subagent primitive.
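To make the ordering in the coordination rules concrete, here is the same dependency structure as a small asyncio sketch. It is an illustration only: in my setup the parent agent follows the Markdown rules itself. run_agent is a hypothetical stand-in for spawning a subagent, and running competitive-intel in the second wave is an assumption, since the rules do not say when it starts:

```python
import asyncio

async def run_agent(name: str, log: list) -> str:
    """Hypothetical stand-in for spawning a subagent and awaiting its report."""
    log.append(f"start {name}")
    await asyncio.sleep(0)  # yield control so parallel agents interleave
    log.append(f"done {name}")
    return f"{name} report"

async def weekly_run(log: list) -> dict:
    # Agents 1-3 use independent data, so they start together.
    first_wave = ["analytics-oracle", "seo-dominator", "content-architect"]
    results = await asyncio.gather(*(run_agent(n, log) for n in first_wave))
    reports = dict(zip(first_wave, results))
    # The CRO agent needs the analytics baseline, so it starts only after the first wave.
    # Running competitive-intel alongside it is an assumption; the rules do not say.
    second_wave = ["cro-assassin", "competitive-intel"]
    results = await asyncio.gather(*(run_agent(n, log) for n in second_wave))
    reports.update(zip(second_wave, results))
    return reports

events = []
reports = asyncio.run(weekly_run(events))
print(len(reports), events.index("done analytics-oracle") < events.index("start cro-assassin"))
```

The synthesis step would then read all five entries in reports, which mirrors the "aggregate all results" rule.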
The critical constraint I learned the hard way: subagents must never push to git independently. Early in my setup, I had parallel build agents each committing and pushing their changes. Three pushes in rapid succession triggered three simultaneous deploys, all racing on Docker Compose, resulting in a 502 outage. The fix: all subagents write their changes to files. The parent agent reviews, commits once, and pushes once.
Cost Management: Model Tiering in Practice
Running 9 production agents without cost discipline would be financially irresponsible. Anthropic's current pricing — Opus at $5/$25 per million input/output tokens, Sonnet at $3/$15, Haiku at $1/$5 — means model selection directly determines whether your agent pipeline costs $50/month or $500/month for the same work.
The tiering system I use after extensive experimentation:
| Task Class | Model | Monthly Cost (est.) | Why This Tier |
| Trust-boundary code (payment, auth, webhooks) | Opus | ~$30 | Security errors are expensive; Opus catches edge cases Sonnet misses |
| Feature implementation, routine edits | Sonnet | ~$60 | 80% of work; Sonnet is fast and accurate for known patterns |
| Batch text (SEO meta, descriptions, formatting) | Haiku | ~$8 | Mechanical work; 3x cheaper than Sonnet at the rates above, same quality for substitution tasks |
| Cross-provider audit | Codex (GPT-5.4) | ~$5 | Different model catches different bugs; found 8 issues Claude missed |
The cross-provider audit is the most counterintuitive line item. I run OpenAI's Codex on trust-boundary code after Claude reviews it. In May 2026, a Codex audit found that my cache header ordering was exposing checkout pages at Cloudflare's edge — a bug that had been live for weeks and that Claude had not flagged across multiple reviews. Different models have different blind spots. For code that handles money or authentication, spending an extra $5/month on a second opinion is trivially worth it.
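Prompt caching cuts the cost of repeated input context by up to 90%, because cached tokens are billed at a fraction of the normal input rate. If your agent loads the same CLAUDE.md, the same tool definitions, and the same project context on every request, the cache hit rate is extremely high after the first request in a run. Cache entries expire after minutes, so the savings come from the many requests within one run, not from one day to the next. My analytics oracle agent, which runs daily with the same system prompt, costs roughly $0.40 per run after caching, compared to $2.80 without it.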
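The effect of caching is easy to estimate before committing to an architecture. This is a rough sketch using the Sonnet rates quoted earlier ($3 input, $15 output per million tokens) and the 90% discount on cached input. The token counts are hypothetical, and the sketch ignores any surcharge for writing to the cache:

```python
def run_cost(input_tokens, output_tokens, cached_fraction=0.0,
             input_price=3.00, output_price=15.00, cache_discount=0.90):
    """Estimated USD cost of one run; prices are per million tokens."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Cached tokens pay only the undiscounted remainder of the input price.
    input_cost = (fresh + cached * (1 - cache_discount)) * input_price / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost + output_cost

# Hypothetical run: 800k input tokens summed over all requests, 10k output tokens.
print(round(run_cost(800_000, 10_000), 2))
print(round(run_cost(800_000, 10_000, cached_fraction=0.95), 2))
```

With these made-up numbers the run drops from about $2.55 to about $0.50. Output tokens are never discounted, so the total falls by less than 90%.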
Claude Code vs Cursor vs Codex: When to Use What
This is not a which-is-best comparison. Each tool has a genuine sweet spot, and using the wrong one for a task wastes time and money.
Claude Code wins at: codebase-wide analysis, multi-file refactors, agent orchestration, CI/CD integration, and any task where terminal-native execution matters. One benchmark showed Claude Code completing a task in 33,000 tokens that consumed 188,000 tokens in Cursor's agent mode — a 5.7x efficiency advantage for complex, cross-file operations. Claude Code also has the deepest extension system (CLAUDE.md + MCP + hooks + skills + subagents) of any AI coding tool.
Cursor wins at: in-editor work. If you are editing a single file, navigating code visually, or doing rapid inline iterations, Cursor's VS Code integration is faster than switching to a terminal. Cursor 3.3's Bugbot — which monitors CI and proposes fixes automatically — is a genuine time-saver for teams with extensive test suites.
Codex wins at: long-running autonomous tasks. OpenAI's cloud-based architecture lets Codex work on a problem for hours without maintaining a local session. For tasks like "migrate this 500-file codebase from JavaScript to TypeScript" or "write comprehensive tests for every untested module," Codex's patience and autonomy are unmatched.
My production workflow uses all three. Claude Code is the primary agent runtime — it runs the daily pipelines, handles deploys, and manages the codebase. Cursor is open alongside it for visual editing sessions. Codex runs periodic deep audits that benefit from its multi-hour attention span.
The Nine Agents: What They Do and What They Cost
Here is the actual agent inventory running in production, with real monthly costs after 14 weeks of operation:
| Agent | Schedule | Model | Monthly Cost | What It Does |
| Analytics Oracle | Daily 9:03 AM | Sonnet | $12 | Pulls GSC + GA4 via MCP, identifies anomalies, produces 3 ship-now actions |
| SEO Research Pipeline | Every 4 hours | Haiku + Sonnet | $25 | Monitors keyword opportunities, competitor content, SERP changes |
| Deploy Watchdog | Every 60 seconds | Bash only | $0 | Checks container health, auto-rolls back on 3 consecutive failures |
| Content Syndication | Every 6 hours | Haiku | $8 | Cross-posts to Hashnode, Dev.to, Blogger with canonical URLs |
| Growth Coordinator | Weekly Monday | Opus | $15 | Spawns 5 specialist subagents, synthesizes weekly report |
| Verification Agent | On-demand | Sonnet | $10 | Read-only adversarial review of every 3+ file change |
| Blog Writer | On-demand | Sonnet | $20 | Researches topic, writes SEO-optimized post, builds, commits, pushes |
| Tool Builder | On-demand | Sonnet | $15 | Builds browser-based tools with UI, registry, sitemap integration |
| Product QA | Weekly | Haiku | $5 | Sweeps 2,000+ products for metadata completeness, dead links, schema |
Total: approximately $110/month. The deploy watchdog costs nothing — it is a pure bash script with no AI component, checking HTTP status codes and triggering Docker rollbacks. The most expensive agent is the SEO research pipeline at $25/month, driven by its 6x daily frequency and the Sonnet calls required to analyze SERP data meaningfully.
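The arithmetic behind the total is worth keeping next to the inventory so it stays honest as agents are added. A small check, using only the figures from the table above:

```python
monthly_cost = {
    "Analytics Oracle": 12,
    "SEO Research Pipeline": 25,
    "Deploy Watchdog": 0,
    "Content Syndication": 8,
    "Growth Coordinator": 15,
    "Verification Agent": 10,
    "Blog Writer": 20,
    "Tool Builder": 15,
    "Product QA": 5,
}

total = sum(monthly_cost.values())
priciest = max(monthly_cost, key=monthly_cost.get)
runs_per_day = 24 // 4  # "every 4 hours" for the SEO pipeline
print(total, priciest, runs_per_day)
```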
Failure Modes: What Broke and How I Fixed It
No honest guide about production agents can skip the failures. Here are the five most expensive lessons from 14 weeks:
Failure 1: The split-brain deploy. Two automation systems — GitHub Actions and a legacy VPS cron — both believed they owned the deploy process. They raced on docker compose down/up, producing intermittent 502 errors that looked random but were actually deterministic conflicts. Fix: killed the legacy cron, added a concurrency group to GitHub Actions, added a filesystem lock (/tmp/wowhow-deploy.lock) as a secondary gate. Three layers of protection because one layer was not enough.
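The filesystem lock from Failure 1 can be sketched in a few lines. This is a minimal illustration of the idea, not my deploy script: it relies on exclusive file creation being atomic, and the demo uses a temporary path in place of the real lock file:

```python
import os
import tempfile

def try_acquire(lock_path: str) -> bool:
    """Create the lock file atomically; return False if another deploy already holds it."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one process can win.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True

def release(lock_path: str) -> None:
    """Remove the lock file so the next deploy can proceed."""
    os.remove(lock_path)

demo_lock = os.path.join(tempfile.mkdtemp(), "deploy.lock")  # hypothetical path for the demo
print(try_acquire(demo_lock), try_acquire(demo_lock))
release(demo_lock)
print(try_acquire(demo_lock))
release(demo_lock)
```

A lock like this goes stale if the deploy crashes before release() runs. A real script would want a try/finally around the deploy and probably an age check on the lock file.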
Failure 2: The noindex massacre. An agent applied robots: { index: false } to 2,600 pages — every product, topic hub, GST reference, and collection page — based on a reasonable-sounding interpretation of "hide thin content from Google." Impressions crashed from 7,500/day to near zero within a week. The fix took 8 days to fully reverse. Lesson: agents must never make bulk SEO changes without explicit human approval, regardless of how logical the reasoning sounds. This is now CLAUDE.md Rule 20.
Failure 3: The social media suspension. A content syndication agent posted 190 Mastodon toots in 15 minutes. The API allowed it — rate limits were not exceeded. But the instance moderators flagged it as spam and suspended the account permanently. API rate limits and platform moderation policies are different things. The agent now has a hard cap: 1 post per 30-60 minutes, maximum 20-30 per day, on any social platform.
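A posting cap like the one in Failure 3 belongs in code, not in the agent's instructions. Here is a sketch with a 30-minute minimum gap and a 20-per-day ceiling, the conservative ends of the ranges above. The class and its parameters are hypothetical, and time is passed in explicitly so the logic can be tested:

```python
class PostLimiter:
    """Allow a post only if the minimum gap and the daily cap are both respected."""

    def __init__(self, min_gap_seconds=30 * 60, daily_cap=20):
        self.min_gap_seconds = min_gap_seconds
        self.daily_cap = daily_cap
        self.sent = []  # timestamps (seconds) of accepted posts, oldest first

    def try_post(self, now: float) -> bool:
        day_ago = now - 24 * 60 * 60
        recent = [t for t in self.sent if t > day_ago]
        too_soon = bool(recent) and now - recent[-1] < self.min_gap_seconds
        if too_soon or len(recent) >= self.daily_cap:
            return False
        self.sent = recent + [now]
        return True

limiter = PostLimiter()
# Simulate the incident: 190 attempts spread over 15 minutes.
accepted = sum(limiter.try_post(now=i * (900 / 190)) for i in range(190))
print(accepted)
```

Under this limiter the same burst would have produced a single post.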
Failure 4: The OAuth cascade. Deleting an old Google Cloud OAuth client — which seemed like a cleanup task — invalidated the refresh tokens used by three different scripts on the VPS. The daily analytics report, the GSC sitemap submission, and the GA4 data pipeline all failed silently. None of them had alerting configured for authentication failures. Fix: every API-dependent script now checks its authentication status before executing and sends a Telegram alert on failure.
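The fix for Failure 4 follows one pattern in every script: check authentication first, alert on failure, and only then do the work. A sketch of that wrapper is below. check_auth, task, and alert are injected callables standing in for your own code; in my case the alert posts to Telegram, which is not shown here:

```python
def run_with_preflight(name, check_auth, task, alert):
    """Run task only if check_auth succeeds; otherwise alert and skip.

    check_auth, task and alert are injected callables (hypothetical hooks into
    your own scripts), which keeps the wrapper testable without network access.
    """
    try:
        check_auth()
    except Exception as error:
        alert(f"{name}: authentication check failed: {error}")
        return None
    return task()

def expired_token():
    """Simulate a refresh token that was invalidated upstream."""
    raise PermissionError("invalid_grant")

alerts = []
result = run_with_preflight("gsc-sitemap", expired_token, lambda: "submitted", alerts.append)
print(result, alerts)
```

The important property is that a failed check always produces a message somewhere a human will see it, so the script never quietly does nothing.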
Failure 5: The parallel push disaster. Three subagents ran in parallel, each making changes to different files. Each one committed and pushed independently. Three pushes triggered three GitHub Actions deploys. All three SSHed into the VPS simultaneously and raced on Docker Compose. Result: containers in an inconsistent state, Redis health checks failing, 502 for 12 minutes. Fix: subagents write files but never commit. The parent agent handles all git operations as a single atomic batch.
Getting Started: Your First Production Agent in 30 Minutes
The fastest path from zero to a running production agent:
Step 1: Install Claude Code. If you have not already: npm install -g @anthropic-ai/claude-code. Verify with claude --version. You need a Pro ($20/month) or Max ($100-200/month) subscription, or an API key.
Step 2: Create your CLAUDE.md. Start minimal. Write three things: what the project is, what the agent should never do, and how to verify its work. You will add rules as you discover failure modes — this is expected and healthy.
# My Project
## What This Is
Node.js API server with PostgreSQL. Deployed via Docker on a VPS.
## Hard Rules
- Never run DROP TABLE or TRUNCATE without explicit user confirmation
- Never push to main without running tests first
- Never modify .env files
## Verification
After changes, run: npm test && npm run build
Both must pass before committing.
Step 3: Add one MCP server. Start with something useful and low-risk. The GitHub MCP server is a good first choice:
claude mcp add --scope project --transport http github https://api.githubcopilot.com/mcp/
Step 4: Create your first hook. A SessionStart hook that shows git status gives the agent immediate context about what state the project is in:
// .claude/settings.json
{
"hooks": {
"SessionStart": [
{ "hooks": [ { "type": "command", "command": "git status && git log --oneline -5" } ] }
]
}
}
Step 5: Create your first subagent. A code review agent that runs read-only and checks your work:
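// .claude/agents/reviewer.md
---
name: reviewer
description: Read-only review of recent changes for security, performance, and convention issues
tools: Read, Grep, Glob
---
# Code Reviewer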
Review the most recent changes for:
- Security vulnerabilities (OWASP Top 10)
- Performance issues
- Missing error handling at system boundaries
- Adherence to project conventions in CLAUDE.md
Report findings as: file:line — issue — severity (HIGH/MEDIUM/LOW)
Do NOT modify any files. Read-only analysis only.
Step 6: Run it. Open Claude Code, type /agents to see your available agents, and invoke the reviewer after making some changes. Watch what it catches. Refine the agent's instructions based on what it misses or flags incorrectly.
That is a functional agent setup in under 30 minutes. From here, the path is incremental: add rules to CLAUDE.md when things break, add MCP servers when you need external tool access, add hooks when you need hard guardrails, and add subagents when tasks become complex enough to benefit from specialization.
What I Would Do Differently
If I were starting over with everything I know now, three changes would save weeks of debugging:
First, I would write the verification protocol into CLAUDE.md on day one — not after the third silent production break. The pattern is simple: after any change touching more than two files, spawn a read-only verification agent before claiming the work is done. This catches roughly 40% of the bugs that would otherwise reach production.
Second, I would set up Telegram alerting for every API-dependent automation from the start. Silent failures are the most expensive kind. An agent that fails loudly costs you 5 minutes. An agent that fails silently costs you days of stale data and missed opportunities before you notice.
Third, I would resist the temptation to add MCP servers aggressively. My initial setup had 8 servers connected. Tool selection accuracy dropped. Response times increased. Context windows filled with tool definitions instead of project context. I cut back to 3-5 per agent and quality improved immediately.
The production agent landscape in 2026 is still early. Claude Code, Cursor, Codex, and the dozens of agent frameworks competing for adoption are all improving rapidly. But the fundamentals — clear constraints, hard guardrails, cost discipline, and verification before deployment — will outlast any specific tool. Build those habits into your agent architecture from the start, and the specific tools become interchangeable.
Every tool and template mentioned in this guide is available at wowhow.cloud. The Claude Code Routines Recipe Pack includes production-tested CLAUDE.md templates, hook configurations, and agent definitions you can adapt to your own projects. The Token Counter and AI API Cost Calculator help estimate costs before committing to an agent architecture.
Sources
- Claude Code Agent SDK Documentation — Anthropic (2026)
- Claude API Pricing — Anthropic (2026)
- Codex vs Claude Code: Comprehensive Comparison — Builder.io (2026)
- AI Coding Assistant Statistics — Uvik (2026)
- MCP Server Configuration — Claude Code Docs (2026)
- Claude Code Changelog — Anthropic (2026)