TL;DR

The WOWHOW taxonomy catalogues AI agent failures into 14 modes across 4 families and shows how to detect, instrument, and fix each one.

Most AI coding agent failures don’t come from wrong answers; they come from the agent confidently completing the wrong task. After cataloguing over 300 real agent failures across production codebases, WOWHOW identified 14 distinct failure modes that account for virtually every breakdown in that sample. They cluster into four families: Perception failures (the agent misreads what it’s working with), Planning failures (the agent builds a bad strategy), Execution failures (the agent does something other than what it planned), and Integration failures (the agent’s changes break the surrounding system). This taxonomy is a WOWHOW framework, not a vendor spec or an academic survey. It exists to give engineering teams a shared vocabulary so they can instrument, detect, and fix agent failures systematically instead of debugging by instinct.

Why a Taxonomy? The Vocabulary Problem

When an agent silently deletes a config key it decided was redundant, you don’t call that a “hallucination” — that word is already overloaded and technically imprecise. You call it Scope Creep Execution (Mode 9 in this taxonomy). When an agent reads the wrong file because two paths differ by a single underscore, that’s Context Window Poisoning (Mode 2). When it produces code that passes all existing tests but breaks a downstream service, that’s Integration Horizon Blindness (Mode 13).

Without precise names, post-mortems are vague (“the agent made a bad decision”), mitigations are generic (“add more context”), and the same failure recurs. With a shared taxonomy, teams can build targeted instrumentation, write detection rules, and apply proven mitigations.
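To make "detection rule" concrete, here is a minimal sketch of one possible check for Scope Creep Execution (Mode 9). It assumes you can collect the list of paths an agent touched and that the task boundary can be expressed as a set of allowed directories. The function name `out_of_scope_paths` and the example paths are hypothetical and not part of the WOWHOW taxonomy.

```python
from pathlib import PurePosixPath


def out_of_scope_paths(touched, allowed_roots):
    """Return the touched paths that fall outside every allowed root.

    Hypothetical helper: compares whole path components, so a sibling
    directory such as "src/billing_old" is not treated as being inside
    "src/billing". It does not resolve ".." segments or symlinks.
    """
    roots = [PurePosixPath(r) for r in allowed_roots]
    flagged = []
    for raw in touched:
        path = PurePosixPath(raw)
        inside = any(path == root or root in path.parents for root in roots)
        if not inside:
            flagged.append(raw)
    return flagged


# Example: the task was scoped to src/billing, but a config file was edited too.
touched = ["src/billing/invoice.py", "config/app.yaml"]
print(out_of_scope_paths(touched, allowed_roots=["src/billing"]))
# prints ['config/app.yaml']
```

A rule like this only names the failure; whether a flagged path blocks the change or merely raises a review flag is a policy choice left to the team.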

The 14 modes below each have a three-part specification: the signature (how the failure presents), the instrument (what to log or trace to catch it), and the mitigation (what actually stops it). Some overlap slightly; that’s intentional — the overlap reflects real ambiguity at the failure boundary.
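One way to keep the three-part specification consistent across post-mortems is to store each mode as a small record. The sketch below is an assumption about how a team might encode it; the class name `FailureMode` and its field layout are illustrative. The signature text is taken from the quick-reference table, while the instrument and mitigation strings are placeholders, not the taxonomy's own wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """Hypothetical record for one taxonomy entry."""
    number: int
    name: str
    family: str
    signature: str   # how the failure presents
    instrument: str  # what to log or trace to catch it
    mitigation: str  # what actually stops it


scope_creep = FailureMode(
    number=9,
    name="Scope Creep Execution",
    family="Execution",
    signature="Agent modifies files or systems outside the stated task boundary",
    # The two values below are illustrative placeholders only.
    instrument="PLACEHOLDER: record every path the agent writes to",
    mitigation="PLACEHOLDER: reject writes outside an allowlist",
)

print(f"Mode {scope_creep.number}: {scope_creep.name} ({scope_creep.family})")
```

Freezing the dataclass keeps the entries immutable, so a shared catalogue cannot be edited by accident at runtime.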

The WOWHOW Agent Failure Taxonomy — Quick Reference

#	Mode Name	Family	Signature (one-line)	Severity
1	Ambiguity Collapse	Perception	Agent resolves an ambiguous spec by picking one interpretation, silently	High
2	Context Window Poisoning	Perception	Agent reads the wrong file or stale cache; acts on bad ground truth	Critical
3	Salience Inversion	Perception	Agent focuses on a minor detail while ignoring the load-bearing constraint	High
4	Over-Anchoring	Perception	Agent treats the first example it sees as the universal pattern	Medium
5	Phantom Dependency Assumption	Planning	Agent plans around a library, API, or helper that doesn’t exist yet	High
6	Horizon Truncation	Planning	Agent produces a plan that solves the immediate task but invalidates future steps	High
7	Confidence-Evidence Mismatch	Planning	Agent commits to a multi-step plan with near-zero evidence it will work	Critical
8	Premature Optimization Loop	Planning	Agent spends tool budget refactoring rather than implementing the spec	Medium
9	Scope Creep Execution	Execution	Agent modifies files or systems outside the stated task boundary	Critical
10	Silent Rollback	Execution	Agent undoes a previous correct change while fixing a different issue	High
11	Test Oracle Confusion	Execution	Agent modifies tests to make them pass rather than fixing the code	Critical
12	Partial Commit Syndrome	Execution	Agent completes 80% of a multi-file change then halts, leaving code broken	High
13	Integration Horizon Blindness	Integration	Agent changes pass local tests but break downstream services or consumers	Critical
14	Environment Drift Assumption	Integration	Agent writes code valid for its context but not for the target environment	High
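For teams that want the quick reference in machine-readable form, the table can be transcribed into a small lookup. The sketch below copies the mode numbers, names, families, and severities from the table as-is; the variable names and the triage ordering are assumptions made for illustration.

```python
from collections import Counter

# (number, name, family, severity), transcribed from the quick-reference table.
MODES = [
    (1, "Ambiguity Collapse", "Perception", "High"),
    (2, "Context Window Poisoning", "Perception", "Critical"),
    (3, "Salience Inversion", "Perception", "High"),
    (4, "Over-Anchoring", "Perception", "Medium"),
    (5, "Phantom Dependency Assumption", "Planning", "High"),
    (6, "Horizon Truncation", "Planning", "High"),
    (7, "Confidence-Evidence Mismatch", "Planning", "Critical"),
    (8, "Premature Optimization Loop", "Planning", "Medium"),
    (9, "Scope Creep Execution", "Execution", "Critical"),
    (10, "Silent Rollback", "Execution", "High"),
    (11, "Test Oracle Confusion", "Execution", "Critical"),
    (12, "Partial Commit Syndrome", "Execution", "High"),
    (13, "Integration Horizon Blindness", "Integration", "Critical"),
    (14, "Environment Drift Assumption", "Integration", "High"),
]

# How many modes each family contains.
by_family = Counter(family for _, _, family, _ in MODES)

# Modes rated Critical, in table order.
critical = [name for _, name, _, severity in MODES if severity == "Critical"]

# Assumed triage order: Critical first, then High, then Medium.
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2}
triage_order = sorted(MODES, key=lambda mode: (SEVERITY_RANK[mode[3]], mode[0]))

print(dict(by_family))
print(critical)
```

Sorting by severity and then by mode number gives a stable review order, which may be useful when deciding which detection rules to build first.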

Family 1: Perception Failures

Mode 1 — Ambiguity Collapse

Mode 2 — Context Window Poisoning

Mode 3 — Salience Inversion

Mode 4 — Over-Anchoring

Family 2: Planning Failures

Mode 5 — Phantom Dependency Assumption

Mode 6 — Horizon Truncation

Mode 7 — Confidence-Evidence Mismatch

Mode 8 — Premature Optimization Loop

Family 3: Execution Failures

Mode 9 — Scope Creep Execution

Mode 10 — Silent Rollback

Mode 11 — Test Oracle Confusion

Mode 12 — Partial Commit Syndrome

Family 4: Integration Failures

Mode 13 — Integration Horizon Blindness

Mode 14 — Environment Drift Assumption

Applying the Taxonomy: A Decision Protocol

Instrumentation Baseline: The Minimum Viable Agent Monitoring Stack
