TL;DR

Claude Opus 4.8 released May 28, 2026: Dynamic Workflows, 4x honesty, 35% fewer tokens, $15/$75 pricing. Full review covering benchmarks, features, and developer guide.

Anthropic released Claude Opus 4.8 on May 28, 2026, the same day the company closed a $65 billion Series H round that vaults it to a $965 billion valuation — surpassing OpenAI for the first time. The model packs in Dynamic Workflows for autonomous subagent orchestration, a 4x improvement in honesty metrics, 35% fewer tokens per response, a new Fast mode running at 2.5x the speed of 4.7, and an unchanged price of $15 input / $75 output per million tokens. Andrej Karpathy joined Anthropic as a research advisor the same week, signaling the company's intent to stay at the frontier of AI research. Here is the complete picture.

Opus 4.8 is not a revolutionary model. Anthropic openly describes it as a "modest but tangible improvement" on 4.7, which itself redefined what frontier coding AI could do. But modest improvements at the Opus level translate into material gains on real engineering work, and the three headline features — Dynamic Workflows, the honesty jump, and the token efficiency — change the economics of deploying Opus-tier AI in ways that matter for budgets as much as for capabilities.

Benchmarks: How Opus 4.8 Stacks Up

Anthropic's internal and third-party benchmarks tell a consistent story: Opus 4.8 improves meaningfully on tasks that require deep reasoning, multi-step problem solving, and complex software engineering.

The Headline Numbers

SWE-bench Verified: 88.6% (up from 87.6% on Opus 4.7)
SWE-bench Pro: 69.2% (up from 64.3% on Opus 4.7) — nearly 5 points on harder real-world coding tasks
USAMO: 96.7% — elite-level mathematical reasoning, near-perfect on AMC/AIME-caliber problems
GDPval-AA Elo: 1890 — 121 Elo points ahead of GPT-5.5 on general agentic benchmarks

The SWE-bench Pro result is the most practically significant. Unlike the original SWE-bench Verified — which tests on GitHub issues that may have leaked into training data — SWE-bench Pro uses recent, harder issues across a broader range of codebases. A 64.3% to 69.2% jump on Pro represents a meaningful reduction in the failure rate on the kinds of complex engineering tasks that constitute real production work.

The USAMO score at 96.7% is worth pausing on. The United States of America Mathematical Olympiad is among the hardest math competitions for high school students in the world, selecting roughly 500 students per year from millions of participants. Opus 4.8 solving 96.7% of those problems is not just a benchmark win — it is evidence the model's mathematical reasoning chain-of-thought has reached a level of reliability that transfers to code, logic, and multi-step planning in ways the earlier Opus generations could not consistently achieve.

How It Compares to the Competition

The competitive landscape at the frontier in late May 2026 is crowded. Here is how Opus 4.8 sits relative to the other models developers are actually evaluating:

Model	Input ($/M tokens)	Output ($/M tokens)	SWE-bench Verified	GDPval-AA Elo
Claude Opus 4.8	$15.00	$75.00	88.6%	1890
GPT-5.5	$5.00	$30.00	~85% (est.)	~1769
Gemini 3.5 Flash	$1.50	$9.00	~79% (est.)	N/A

The pricing delta is significant. GPT-5.5 costs one-third of Opus 4.8 at $5/$30 per million tokens, and Gemini 3.5 Flash costs roughly one-tenth at $1.50/$9. For many high-volume production workloads, Gemini 3.5 Flash's cost profile is decisive, and it is genuinely competitive on straightforward coding and summarization tasks. Where Opus 4.8 earns its premium is precisely the territory that GDPval-AA and SWE-bench Pro measure: complex, multi-step agentic work where failures are expensive and reliability matters more than cost.

The 121 Elo margin over GPT-5.5 on GDPval-AA translates to roughly a 67% win rate in head-to-head task comparisons on general agentic benchmarks. That is meaningful but not dominant. For developers choosing between Opus 4.8 and GPT-5.5, the right decision depends on workload type, reliability requirements, and whether the 3x cost premium is justified by the quality delta in their specific use case.

Dynamic Workflows: The Feature That Justifies the Opus Tier

Dynamic Workflows is the headline addition in Opus 4.8 and the feature Anthropic's $65B fundraising pitch reportedly centered on. The concept: when given a complex, open-ended problem, Opus 4.8 does not attempt to solve it sequentially within a single context window. Instead, the model itself decides to spawn tens to hundreds of parallel subagents, each attacking a different angle of the problem simultaneously.

Here is the autonomous workflow the model runs:

Analyzes the problem and decomposes it into independently tractable subproblems
Writes an orchestration script that spins up parallel subagents — each assigned a specific angle, hypothesis, or subtask
Deploys adversarial reviewer agents whose job is to challenge and attempt to refute the primary agents' findings
Aggregates results across all agents and identifies convergence or conflict
Iterates until answers converge on a cross-validated response
Returns a synthesized output with the work of dozens of parallel reasoning threads behind it

The critical point: you do not configure this. You do not need to set up a multi-agent framework, define orchestration logic, or wire together a LangGraph pipeline. Opus 4.8 decides when the task warrants Dynamic Workflows, how many subagents to spawn, and how to structure the adversarial review. A single API call can trigger the entire process. For one-off research tasks, complex analysis, and multi-file engineering work, this removes the scaffolding overhead that previously made multi-agent architectures a significant development investment.

The practical implication for developers who have been running Anthropic Managed Agents or custom orchestration harnesses: Opus 4.8 can now replace a significant portion of that scaffolding with a single well-structured prompt. That changes the build-vs-buy calculation for agentic systems considerably.

Cost caveat: A single Opus 4.8 API call with Dynamic Workflows can generate a large and variable number of subagent calls under the hood. If you are metering costs per session or per task, test token consumption on representative workloads before putting Dynamic Workflows in a cost-constrained production environment. The feature is opt-in by prompt complexity — it will not trigger on simple queries — but complex tasks can generate substantially more tokens than a non-agentic equivalent.

Benchmarks: How Opus 4.8 Stacks Up

The Headline Numbers

How It Compares to the Competition

Dynamic Workflows: The Feature That Justifies the Opus Tier

Try Our Free Tools

Image Compressor

QR Code Generator

More from AI Tool Reviews

Claude Opus 4.8: Developer Guide — Dynamic Workflows, Fast Mode & $965B Valuation

Fast Mode: 2x Rate, 2.5x Speed

4x Honesty Improvement: What It Actually Means

35% Fewer Tokens Per Response

Mid-Conversation System Messages

Pricing: $15/$75 Per Million Tokens

Claude Code v2.1.154: What Changed for Developers

/workflows Command

Agent View Dashboard

/goal Command with Workflow Awareness

Effort Control: Fine-Tuning Compute Per Task

Karpathy Joins Anthropic: What It Signals

The $965B Valuation: What It Changes for Developers

What Developers Should Actually Do with Opus 4.8

Should You Upgrade from Sonnet or Haiku to Opus 4.8?

The Bottom Line

Ready to ship faster?

One insight, every Monday. 7am IST. Zero fluff.

Comments · 0

Key takeaways · 8

Topics

Article stats

WhatsApp Link Generator

Word & Character Counter

Claude for Small Business: 15 Workflows & Setup Guide 2026

Hermes Agent v0.13.0 Shipped 864 Commits — These 3 Primitives Are the Ones That Matter

GPT-5.5 Instant: The New ChatGPT Default Model Complete Guide 2026

IBM Bob: Enterprise AI Coding Assistant Complete Guide (2026)

Mistral Medium 3.5 Developer Guide: API, Remote Agents & Pricing 2026