The Anatomy of an AI Agent Exploit
To understand why AI agents have more attack surface than traditional code, you need to understand how they process inputs. A traditional application receives structured inputs through defined interfaces — form fields, API parameters, file uploads — and processes them through code paths that were written by humans who thought explicitly about what valid and invalid inputs look like. AI agents, by contrast, receive natural language inputs through interfaces that are intentionally open-ended, and they process those inputs by generating natural language outputs that then drive system actions.
The attack technique known as “Comment and Control” exploits this property directly.[6] An attacker embeds adversarial instructions in a source file, a README, a configuration file, or any other artifact that an AI coding agent is likely to read as part of its workflow. Those instructions are formatted to appear as legitimate code comments, documentation, or configuration directives. When the AI agent reads the file, it cannot reliably distinguish between instructions from the genuine user and instructions embedded in the code. The model attempts to follow all instructions it perceives as legitimate, including the adversarial ones.
Comment and Control has been demonstrated against Claude Code, Gemini CLI, and GitHub Copilot. The attack vector is not theoretical: security researchers published working proof-of-concept exploits for all three tools within a two-month window in early 2026. The exploits used the agents’ own capabilities — file writing, terminal execution, network requests — against the developers running them. An agent that can edit files and run commands has all the capabilities required to exfiltrate data, modify production configurations, or establish persistent access, once it has been convinced through injected instructions to do so.
The supply chain dimension of AI agent security is equally concerning and less frequently discussed. In February 2026, PyTorch Lightning versions 2.6.2 and 2.6.3 shipped with credential-stealing malware embedded in the package distribution.[7] The malware harvested environment variables — which is where developers typically store API keys, database credentials, and authentication tokens — and transmitted them to a remote server. The attack was discovered within 48 hours and the packages were pulled, but any developer or CI/CD system that installed either version during that window should consider their credentials compromised.
AI coding agents make supply chain attacks more dangerous in two ways. First, agents often install dependencies autonomously, without requiring the developer to explicitly review and approve each installation. An agent tasked with “set up the ML training environment” might install dozens of packages in sequence, and a compromised package in the middle of that installation chain may not trigger any review. Second, agents often have access to environment variables by design, because they need API credentials to call external services on the developer’s behalf. A malicious package installed by an AI agent, in an environment where the agent has access to production credentials, is a substantially more dangerous scenario than the same package installed in an isolated environment.
The aggregated attack surface of an AI coding agent running on a developer’s machine is qualitatively different from traditional developer tooling. An IDE that reads files and provides suggestions operates on a fundamentally different threat model than an agent that reads files, writes files, executes commands, calls external APIs, and installs packages — all autonomously, all with the host machine’s permissions, all in response to natural language instructions that an attacker may have had a hand in crafting.
What I Found When I Audited My Own Stack
I run a production Next.js application on a Hostinger VPS, behind Cloudflare, with Razorpay handling payments, Redis for session state, and a WordPress headless CMS for content. The stack has been in production for over a year. It processes real payments from real users. Significant portions of the codebase were written with Claude Code assistance over the past eight months.
After reading the Sherlock and DryRun reports in early May, I decided to audit the entire stack from the perspective of an attacker who had read those same reports and was looking for exactly the patterns they described. I used a combination of manual code review, static analysis, and targeted penetration testing. I did not engage a third-party firm. I did this myself, as a solo developer, with tools that are freely available. The exercise took approximately three full days.
I found three significant findings. None of them were novel. All of them were documented patterns that appear in the DryRun and Sherlock reports. All of them were introduced, at least in part, by AI-generated code that I had reviewed but not adequately security-reviewed. I am sharing them in detail because I think the specificity is more useful than the summary.
The first finding was in my Razorpay webhook handler. Razorpay sends webhook events to a route in my application when payments complete, subscriptions renew, or disputes are raised. To verify that a webhook payload is actually from Razorpay and not from an attacker attempting to trigger business logic by faking a payment event, you compute an HMAC signature over the payload body using a shared secret, then compare it to the signature that Razorpay sends in the request headers. This is standard webhook security practice. My implementation was doing the comparison. The problem was how it was doing the comparison.
The initial Claude Code-generated implementation used a simple string equality check:
// Original implementation — VULNERABLE to timing attacks
function verifyWebhookSignature(payload, signature, secret) {
const expectedSignature = crypto
.createHmac('sha256', secret)
.update(payload)
.digest('hex');
return expectedSignature === signature; // String comparison — DO NOT USE
}
String equality in JavaScript returns early as soon as it finds a mismatched character. This creates a timing oracle: an attacker making many requests with different forged signatures can measure response time differences to determine how many leading characters of their forged signature match the real signature. With enough measurements, they can reconstruct the expected signature without knowing the secret. This is a timing side-channel attack, and it is one of the most commonly introduced vulnerabilities in webhook implementations. The correct implementation uses a constant-time comparison:
import crypto from 'crypto';
function verifyWebhookSignature(payload, signature, secret) {
const expectedSignature = crypto
.createHmac('sha256', secret)
.update(payload)
.digest('hex');
const expected = Buffer.from(expectedSignature, 'utf8');
const received = Buffer.from(signature, 'utf8');
if (expected.length !== received.length) return false;
return crypto.timingSafeEqual(expected, received);
}
crypto.timingSafeEqual is a constant-time comparison function that takes the same amount of time to run regardless of how many characters match. It eliminates the timing oracle. The length check before the comparison is necessary because timingSafeEqual throws if the buffers have different lengths — and the length check itself needs to come before the constant-time comparison rather than inside it, since a length mismatch is always a failure regardless.
I had reviewed this code when Claude Code generated it. I checked that it was computing the HMAC correctly. I did not check whether the comparison was timing-safe. That is precisely the kind of review gap the DryRun study was documenting: the feature was correctly implemented, but the adversarial property was not.
The second finding was in how I had configured Claude Code itself. I had granted the agent broad file system access because the alternative — manually approving every file operation — felt like it would slow me down. In practice, this meant Claude Code had read access to my environment files, including the files containing my Razorpay live keys, my Redis connection strings, and my database credentials. I had also not restricted the agent’s ability to make outbound network requests. If a Comment and Control payload in a dependency’s source code had convinced the agent to read my .env.local file and transmit its contents to a remote endpoint, the agent had all the permissions required to do so without any system-level intervention.
This is not a theoretical risk. It is the exact scenario that the Comment and Control research demonstrated. The fix requires explicitly restricting agent permissions to the minimum required for the current task, which I will cover in the hardening section.
The third finding was in how I was handling LLM-generated code at runtime. I had built a feature that used Claude to generate SQL query fragments for a reporting dashboard — a configuration the user could set to define what data they wanted to see, processed by Claude into SQL, then executed against a read-only analytics database. The validation I had in place checked that the generated SQL was syntactically valid and that it referenced only the tables the user was authorized to see. What it did not check was whether the generated SQL contained subqueries or nested CTEs that could bypass the table-level restriction by joining against tables that were not in the allowlist through an intermediate table that was. A sufficiently crafted user input, or a sufficiently crafted prompt injection against the Claude call itself, could have exfiltrated data from tables the user had no business seeing.
The fix was to move from SQL generation to a structured query builder: Claude generates a structured JSON representation of the query intent, which is then translated to SQL by application code that enforces the authorization constraints at the structural level rather than the string level. LLM-generated SQL is validated against an explicit allowlist of query patterns before execution. Direct SQL string execution from model output is now prohibited in the codebase.
The Hardening Playbook — Seven Patterns That Actually Work
What follows is not a comprehensive security engineering textbook. It is the specific set of patterns that I have implemented, that the security research substantiates, and that a solo developer or small team can realistically apply to a production codebase in the near term. These are not theoretical best practices — they are the actual changes I made to my stack after the audit.
The first pattern is timing-safe comparison for all webhook and signature verification. I showed the implementation above. The rule is simple: never use ===, ==, or string equality to compare cryptographic signatures, tokens, or secrets. Always use crypto.timingSafeEqual in Node.js, hmac.compare_digest in Python, or the equivalent constant-time function in your language. This applies to webhook signatures (Razorpay, Stripe, GitHub, Slack), API keys in request headers, session tokens in cookies, and any other secret value that is transmitted as part of an HTTP request. Grep your codebase for === appearing within 10 lines of any HMAC computation, and replace each instance.
The second pattern is least-privilege agent sandboxing. AI coding agents should be granted the minimum permissions required for the task at hand, not the maximum permissions they might ever need. In Claude Code, this means using the permissions hooks in your .claude/settings.json to restrict what the agent can do without explicit approval:
{
"permissions": {
"allow": [
"Read(**)",
"Write(src/**)",
"Bash(npm run *)",
"Bash(npx tsc --noEmit)"
],
"deny": [
"Read(.env*)",
"Read(**/.env*)",
"Bash(curl *)",
"Bash(wget *)",
"Bash(ssh *)",
"Bash(scp *)"
]
}
}
This configuration allows the agent to read any file except environment files, write only within src/, and run npm scripts and TypeScript type checking, but explicitly denies reading environment files, making outbound network requests via curl or wget, or initiating SSH connections. Adjust the allowlist to match your actual workflow needs, but the principle holds: deny by default, allow explicitly, and always deny environment file access from agent processes.
The third pattern is git hook exploit detection. Given CVE-2026-26268, every developer should have a pre-clone validation step that inspects the hooks directory of a repository before those hooks can execute. Here is a basic version:
#!/bin/bash
# Validate git hooks before clone — run this BEFORE cloning unknown repos
# Usage: ./check-hooks.sh <repo-url>
REPO_URL=$1
TEMP_DIR=$(mktemp -d)
echo "[*] Shallow-fetching hooks directory from $REPO_URL"
git clone --depth=1 --filter=blob:none --sparse "$REPO_URL" "$TEMP_DIR" 2>/dev/null
cd "$TEMP_DIR" || exit 1
git sparse-checkout set .git/hooks 2>/dev/null
HOOKS_DIR="$TEMP_DIR/.git/hooks"
if [ -d "$HOOKS_DIR" ]; then
EXECUTABLE_HOOKS=$(find "$HOOKS_DIR" -type f -executable ! -name "*.sample")
if [ -n "$EXECUTABLE_HOOKS" ]; then
echo "[WARN] Executable hooks found in repository:"
echo "$EXECUTABLE_HOOKS"
echo "[WARN] Review these files before cloning. They will execute automatically."
rm -rf "$TEMP_DIR"
exit 1
else
echo "[OK] No executable hooks found."
fi
fi
rm -rf "$TEMP_DIR"
echo "[OK] Repository appears safe to clone."
This script is not exhaustive — a sophisticated attacker can embed hook execution in other git configuration mechanisms — but it catches the most common patterns and gives you a moment to review before executing arbitrary code from an unknown repository. Keep your IDE updated; Cursor patched CVE-2026-26268 in version 0.48.7.
The fourth pattern is input validation on AI-generated code before execution. Any code path where model output is executed at runtime — SQL generation, JavaScript eval, shell command construction — requires structural validation before execution. This means moving from string-based approaches to AST-based or schema-based approaches where possible. For SQL: use a query builder, not string interpolation. For shell commands: use parameterized exec calls, not string concatenation passed to a shell. For JavaScript: never eval model output. If you need dynamic computation, define a strict allowlist of functions and operators, parse the model output into a structured representation, and evaluate the structured form against your allowlist.
The fifth pattern is dependency audit automation. The PyTorch Lightning supply chain attack was caught within 48 hours because the security community was watching. You cannot rely on the community to catch every attack before it reaches your install. Integrate automated dependency auditing into your CI/CD pipeline so that every pull request and every dependency update runs a security scan before it can merge. Run this audit script against your codebase now:
#!/bin/bash
# AI Code Security Quick Audit
echo "=== Checking for common AI-generated vulnerabilities ==="
echo ""
echo "1. Hardcoded secrets:"
grep -rn "password|api_key|secret|token" --include="*.ts" --include="*.js" src/ | grep -v node_modules | grep -v ".test."
echo ""
echo "2. Missing auth checks on API routes:"
grep -rn "export.*GET|export.*POST|export.*PUT|export.*DELETE" --include="*.ts" src/app/api/ | grep -v "auth|session|token"
echo ""
echo "3. Eval or dynamic code execution:"
grep -rn "eval(|Function(|new Function" --include="*.ts" --include="*.js" src/
echo ""
echo "4. SQL injection vectors:"
grep -rn "query.*\${" --include="*.ts" --include="*.js" src/
echo ""
echo "5. Missing CSRF protection:"
grep -rn "POST|PUT|DELETE" --include="*.ts" src/app/api/ | grep -v "csrf|token|nonce"
echo ""
echo "6. String equality on secrets:"
grep -rn "=== signature|=== token|=== secret|=== hash" --include="*.ts" --include="*.js" src/
echo ""
echo "7. Dependency audit:"
npm audit --audit-level=high 2>&1 | tail -20
The sixth pattern is secret scanning in CI/CD. GitHub Actions has native secret scanning that blocks commits containing recognized credential patterns. Enable it. Supplement it with a pre-commit hook that runs detect-secrets or trufflehog locally before any code leaves your machine. The configuration for a pre-commit hook using detect-secrets:
# .pre-commit-config.yaml
repos:
- repo: https://github.com/Yelp/detect-secrets
rev: v1.5.0
hooks:
- id: detect-secrets
args: ['--baseline', '.secrets.baseline']
exclude: package.lock.json
Create an initial baseline with detect-secrets scan > .secrets.baseline, review the baseline to confirm there are no actual secrets in it, and then the pre-commit hook will flag any new potential secrets introduced in subsequent commits. AI coding agents sometimes generate example code that includes placeholder credentials in patterns that look like real credentials to static scanners — the baseline mechanism handles this by allowing you to explicitly mark known false positives.
The seventh pattern is model output sanitization for any LLM output that will be rendered in a browser context. If your application displays AI-generated content to users, that content must be sanitized before rendering, using the same rigorous approach you would apply to user-generated content. AI models can be prompted through indirect injection to generate output containing HTML, JavaScript, or SVG payloads that execute in the browser if rendered unsanitized. Use a library like DOMPurify with a strict allowlist of permitted tags and attributes. Never use dangerouslySetInnerHTML with unsanitized model output. If you need to render rich content from a model, parse it through a markdown processor with HTML sanitization rather than inserting raw model output into the DOM.
The Economics of Ignoring AI Code Security
The hardening patterns above require real effort. Let me make the case for why that effort is worth it, in terms that are harder to dismiss than abstract security concerns.
IBM Security’s 2025 Cost of a Data Breach Report put the average cost of a breach at $4.88 million, up 10% from the prior year.[8] That average is dominated by large enterprises with complex remediation requirements. For a small-to-medium application, a realistic breach scenario — credential exfiltration, customer data exposure, payment data compromise — might cost between $50,000 and $500,000 in combined remediation, legal, and customer notification costs. Payment card industry fines for exposing cardholder data can reach $100 per card compromised, which adds up quickly if your user base is in the thousands.
The hardening work I described above took me three days of development time. Implementing all seven patterns in a greenfield application would take perhaps two days. The economic case for skipping that work to ship faster assumes that no breach will occur, and that assumption is no longer defensible given that 92% of AI-generated codebases have critical vulnerabilities and 28.3% of CVEs are exploited within 24 hours of disclosure.
The “move fast and break things” approach to AI coding has a hidden cost structure that most developers are not accounting for. Moving fast with AI tools is genuinely valuable — I ship faster with Claude Code than I did without it, and that speed advantage compounds. But moving fast without security review means accumulating a security debt that accrues interest in the form of breach risk, and the interest rate on that debt has gone up dramatically as AI-enabled attacks have become more prevalent and more automated.
Agent security is also different from traditional application security in ways that affect the economics. Traditional application security audits are one-time or periodic events. Agent security is a continuous discipline because the agent’s capabilities and access evolve as the application evolves, and because the agent’s behavior can be influenced by content it encounters at runtime — not just by the code you write at development time. A prompt injection payload in a third-party API response, a malicious git hook in a newly-added dependency, a Comment and Control attack embedded in a documentation file that the agent indexes: these are runtime security events, not static code vulnerabilities. They require ongoing vigilance rather than a one-time audit.
The security debt created by AI-generated code is real, it is accumulating across the industry, and the attackers who understand the Sherlock and DryRun findings are already targeting it. Every week that passes without a security audit of an AI-assisted codebase is a week during which that codebase is increasingly likely to be on the wrong end of a finding that costs orders of magnitude more to remediate than a few days of proactive hardening would have cost to prevent.
What Happens Next — The Regulatory Response
The research community has spent the first half of 2026 documenting the problem. The regulatory community is beginning to respond. The pace of that response will accelerate, and the developers who have hardened their stacks before the mandates arrive will have a material advantage over those who wait.
The White House Office of Science and Technology Policy circulated a draft executive order in April 2026 that would require federal contractors using AI coding tools to demonstrate “adequate security vetting” of AI-generated code before it can be deployed to systems that process government data.[9] The draft order does not define what “adequate” means in operational terms, but it references NIST’s AI Risk Management Framework and the existing SSDF (Secure Software Development Framework) as baseline standards. Organizations that already apply SSDF practices to their AI-assisted development workflows will have a credible response to the vetting requirement. Those that do not will face a rushed compliance gap when the order is finalized.
Pennsylvania’s attorney general filed suit against Character.AI in March 2026, alleging that the company’s AI system caused harm to a minor through interactions that the plaintiffs argue should have been prevented by reasonable safety measures.[10] The lawsuit is specifically about consumer AI, not coding tools, but it is establishing legal precedent for the proposition that AI system operators have a duty of care toward users who interact with their systems. That duty of care framework, once established in litigation, tends to expand. A developer who ships an AI-assisted application with known vulnerability categories — broken access control, timing side-channels, prompt injection vectors — and suffers a breach that harms users may face a legal exposure that did not exist before this litigation cycle began.
The EU AI Act’s provisions on high-risk AI systems include requirements for logging, auditability, and human oversight that implicate AI coding tools used in the development of high-risk applications. The deadline for high-risk system compliance was recently extended to December 2027, but the compliance requirements are not changing — only the timeline. Organizations building applications in healthcare, critical infrastructure, or financial services with AI coding tool assistance should be designing their development workflows for AI Act compliance now rather than attempting a rushed remediation in 2027.
What developers should do before the mandates arrive is not complicated, but it requires starting now. Run the audit script in this post against your codebase. Implement timing-safe comparisons wherever you compare secrets. Restrict agent permissions to the minimum required for each task. Add dependency scanning to your CI/CD pipeline. Document which portions of your codebase were AI-generated and what security review each received — not because you enjoy paperwork, but because that documentation will be required by government contractors within the next 12 months and demanded by enterprise customers within the next 18.
The AI coding tools themselves are also evolving in response to these findings. Cursor released CVE-2026-26268 patches in version 0.48.7. Anthropic has published security guidance for Claude Code deployment patterns. Microsoft has patched both Semantic Kernel CVEs. The tools are getting safer, but tool patches do not retroactively fix code that was already generated and deployed. Your application’s security posture is a function of what the code does, not what tool generated it.
Conclusion
The 92% figure from Sherlock Forensics is not a bug in how AI coding tools work. It is an accurate measurement of what happens when powerful tools for generating features encounter an industry that has not yet developed the discipline to consistently apply security review to AI-generated output. The tools are not at fault. The workflow is.
I am not writing this to argue that you should stop using AI coding tools. I use Claude Code every day. The productivity gains are real and they compound. What I am arguing is that the security review discipline that you would apply to junior developer code — because a junior developer, however talented, may not have the threat modeling experience to implement authorization correctly on the first try — must be applied with equal rigor to AI-generated code. AI models are extraordinarily capable in many dimensions and consistently inconsistent in the specific dimension of adversarial thinking.
The seven patterns in this post are not exotic security engineering. They are standard practices that the industry developed over decades for exactly the kind of code that AI tools now generate at scale: webhook handlers, API routes, database queries, user input processing. The new wrinkle is that these patterns now need to be applied not just to code written by humans who might make mistakes, but to code generated by systems that make different and more systematic mistakes at higher volume.
The fix is engineering discipline. Run the audit script above against your codebase right now. It takes less than two minutes to execute. If it finds nothing, you have confirmation and a clean baseline. If it finds something, you have work to do before an attacker finds the same thing first. Explore the security-hardened starter templates in the WOWHOW catalog, check out the developer tools for security utilities, and if you are building a new application, consider starting from a foundation that has already addressed these patterns rather than discovering them through an audit six months after launch. The OWASP Top 10 for Agentic Applications is also essential reading if you are deploying any agent-based architecture in 2026.
The security debt is real. The exploits are arriving before the patches. The regulators are drafting the mandates. The developers who have hardened their AI-assisted stacks will find themselves ahead of a compliance curve that everyone else will be rushing to catch up to. That asymmetry is not going to last. Start now.
Sources
[1] Sherlock Forensics. AI-Generated Codebase Security Audit Report 2026. April 2026.
[2] DryRun Security. AI Coding Agent Security Audit: 30 PRs, 143 Issues. April 2026.
[3] Mandiant. M-Trends 2026. Google Cloud, May 2026.
[4] NIST National Vulnerability Database. CVE-2026-26268: Cursor IDE Arbitrary Code Execution. April 29, 2026.
[5] Microsoft Security Response Center. CVE-2026-25592 and CVE-2026-26030: Semantic Kernel Prompt Injection to RCE. May 7, 2026.
[6] Wunderwuzzi (Johann Rehberger). Comment and Control: Prompt Injection Against AI Coding Agents. March 2026.
[7] The Hacker News. PyTorch Lightning 2.6.2-2.6.3 Supply Chain Attack: Credential-Stealing Malware. February 2026.
[8] IBM Security. Cost of a Data Breach Report 2025. IBM Corporation, 2025.
[9] White House OSTP. AI Code Security Request for Information. April 2026.
[10] Reuters. Pennsylvania Sues Character.AI Over Minor Safety Allegations. March 2026.
Comments · 0
Beta: comments are stored locally on your device and not visible to other readers.
No comments yet. Be the first to share your thoughts.