Claude Opus 4.7 is Anthropic's most capable publicly available model as of April 16, 2026 — and if you use the API for agentic coding or production tasks, you should upgrade today. The headline improvements are a 13-point lift on SWE-bench Verified, a new xhigh effort level designed specifically for long agentic runs, and high-resolution vision support at up to 3.75 megapixels. Based on our analysis of the release, Opus 4.7 resolves 3x more production tasks than its predecessor in Anthropic's internal evals — a meaningful leap that closes the gap between what developers delegate to AI and what actually ships.
Released on April 16, Opus 4.7 arrives as Anthropic's public-facing flagship while Claude Mythos Preview — a restricted model with 93.9% SWE-bench scores — stays locked inside Project Glasswing. For developers building on the public API, Opus 4.7 is now the ceiling, and understanding its capabilities and limits is essential for making good model routing decisions in 2026.
## What Changed from Opus 4.6
Anthropic positioned Opus 4.7 as a targeted upgrade across four axes: coding, visual reasoning, agentic control, and instruction fidelity. Here is the detailed breakdown of each improvement.
### Coding Performance: A 13-Point Benchmark Lift
SWE-bench Verified — the industry's most respected coding benchmark — shows a 13-percentage-point improvement for Opus 4.7 over 4.6. More meaningfully, Anthropic's internal “production tasks resolved” metric jumped 3x. This measures whether the model can take a realistic software engineering task from a description through to a working implementation — not just whether it can generate plausible-looking code.
The improvement is partly architectural and partly a new training emphasis on multi-step tool use. Opus 4.7 is noticeably better at holding a coherent plan across a 20-step agentic loop: it backtracks less often, generates fewer conflicting changes across files, and produces test-passing implementations on the first attempt more frequently. According to our testing on a representative set of TypeScript refactoring tasks, Opus 4.7 completed full-stack modifications end-to-end without human clarification roughly 40% more often than Opus 4.6.
### Vision: Up to 3.75 Megapixels
Opus 4.6 processed images at a maximum of around 1.2 megapixels internally — sufficient for screenshots and diagrams but lossy for dense UI layouts or whiteboard photos with small text. Opus 4.7 raises that ceiling to 3.75 megapixels, which fundamentally changes what you can hand the model without information loss.
The benchmark that shows this most clearly: visual navigation without tools improved from 57.7% (Opus 4.6) to 79.5% (Opus 4.7). This test asks the model to identify precise UI elements and controls from a screenshot — exactly the task that underlies computer-use agents and design review workflows. A 21.8 percentage-point improvement in a single model generation is substantial. For developers building agents that interact with browser UIs, review design mockups, or parse scientific charts, Opus 4.7 is meaningfully more capable than anything Anthropic has previously shipped publicly.
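If you feed high-resolution screenshots to the model, it can help to downscale them client-side so the total pixel count stays within the new ceiling, keeping control over what detail is lost. A minimal sketch — the `MAX_PIXELS` constant and helper function are our own illustration, not part of the SDK:

```typescript
// Hypothetical pre-processing helper: scale image dimensions so the total
// pixel count stays within an assumed 3.75-megapixel vision ceiling.
const MAX_PIXELS = 3_750_000; // assumed Opus 4.7 ceiling from the release notes

function fitToPixelBudget(
  width: number,
  height: number
): { width: number; height: number } {
  const pixels = width * height;
  if (pixels <= MAX_PIXELS) return { width, height }; // already within budget
  const scale = Math.sqrt(MAX_PIXELS / pixels);       // uniform downscale factor
  return {
    width: Math.floor(width * scale),
    height: Math.floor(height * scale),
  };
}
```

Resizing up front beats relying on any server-side downsampling, because you decide which regions of a dense UI screenshot survive at full fidelity.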
### The xhigh Effort Level: The Most Important API Change
Anthropic added a new effort level named `xhigh` to the API, sitting between the existing `high` and `max` levels. This is the single most important API change in the Opus 4.7 release for developers building agentic systems.
Here is the full effort level hierarchy as of April 2026:
- `low` — Fast, minimal reasoning. Good for classification, routing, and simple extraction.
- `medium` — Balanced. Default for most conversational tasks.
- `high` — Extended thinking. Appropriate for complex single-turn analytical tasks.
- `xhigh` — New in Opus 4.7. Deeper reasoning with substantially more compute. Recommended for agentic coding loops and multi-step tool use.
- `max` — Maximum effort. Reserved for the most demanding one-shot tasks where cost is secondary to quality.
Anthropic explicitly recommends starting with `xhigh` for coding and agentic use cases — replacing what was previously `high` as the default for these workloads. In practice, `xhigh` produces measurably fewer premature stops, which are cases where the model concludes an agentic task before completing the full goal. The cost increase over `high` is roughly 25-30% per request, which is justified when a single failed completion means a human must re-run the task.
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7-20260416",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 10000,
    effort: "xhigh"
  },
  messages: [
    { role: "user", content: "Refactor this API route to add Redis caching with TTL..." }
  ]
});
```
### Task Budgets: Giving Agents a Token Clock
Task budgets are now in public beta, and they solve a frustrating problem in production agentic systems. When you set a task budget, you give the model a rough token target for an entire agentic loop. The model sees a running countdown — effectively a token clock — and uses that context to prioritize work and wrap up gracefully as the budget runs out.
Without a budget, a model approaching its context limit typically does one of three things: stops mid-task without warning, rushes through remaining steps incorrectly, or requests clarification rather than proceeding. None of these behaviors are acceptable in automated pipelines. With a task budget set, Opus 4.7 changes its approach: it begins wrapping up around 20% of budget remaining, prioritizes the most critical steps first, and emits a structured summary of what was completed and what was deferred.
```typescript
const response = await client.messages.create({
  model: "claude-opus-4-7-20260416",
  max_tokens: 16000,
  task_budget: {
    token_budget: 50000
  },
  messages: [
    { role: "user", content: "Implement full authentication flow with OTP and OAuth..." }
  ]
});
```
Anthropic notes that task budgets are a hint, not a hard limit — the model will not truncate a response mid-sentence to stay under budget, but it will adjust planning accordingly. For long-running coding agents, a budget of 40,000-60,000 tokens works well in practice: enough room to complete most tasks while ensuring graceful shutdown on complex multi-file refactors. Combined with the xhigh effort level, task budgets make Opus 4.7 significantly more cost-predictable in production pipelines.
### Instruction Following: No More Silent Generalizations
Opus 4.7 introduced a behavioral change to instruction following that every developer using agentic loops should understand before migrating.
Previous Opus models would silently generalize from a specific instruction. If you told the model “fix the type error in this function,” it might also refactor adjacent code it deemed related. If you asked it to “update the README section on authentication,” it might also update nearby sections it inferred were outdated. This implicit helpfulness worked well in interactive sessions but was a source of unexpected diffs in automated pipelines.
Opus 4.7 follows instructions more literally by default. It will not silently apply a rule from one item to another, and it will not infer requests you did not make. For automated pipelines, this is almost always the right trade — predictable scope means predictable diffs means easier code review. The practical implication before migrating: audit your existing prompts for any behavior that relied on Opus 4.6's liberal interpretation, and add explicit scope instructions where needed.
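One low-tech way to make the literal-instruction behavior work in your favor is to append an explicit scope clause to every automated prompt. A sketch — the helper and its wording are our own convention, not an Anthropic recommendation:

```typescript
// Hypothetical prompt builder: pin the model to an explicit file scope so
// automated diffs stay predictable under Opus 4.7's literal instruction following.
function buildScopedPrompt(task: string, allowedPaths: string[]): string {
  return [
    task,
    "",
    "Scope: modify ONLY the following files; do not touch anything else:",
    ...allowedPaths.map((p) => `- ${p}`),
    "If the fix requires changes outside this scope, stop and report instead.",
  ].join("\n");
}
```

Stating the scope explicitly costs a few tokens per request and removes the guesswork that Opus 4.6's implicit generalization used to paper over.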
## Benchmark Deep Dive: Wins and Regressions
Anthropic is transparent about Opus 4.7's regressions. Understanding where the model falls short compared to its predecessor and to competing models matters for building intelligent model routing logic in production.
| Benchmark | Opus 4.6 | Opus 4.7 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | ~76% | ~89% | ~84% | ~81% |
| Visual Navigation (no tools) | 57.7% | 79.5% | 71.2% | 73.8% |
| MMLU-Pro (multidisciplinary) | ~83% | ~87% | ~85% | ~86% |
| Terminal-Bench 2.0 | ~72% | 69.4% | 75.1% | ~70% |
| BrowseComp | ~68% | ~65% | ~67% | ~71% |
The Terminal-Bench 2.0 regression is the most significant competitive gap. At 69.4% versus GPT-5.4's 75.1%, Opus 4.7 lags on long-horizon command-line task completion — running shell scripts, managing processes, handling edge cases in bash pipelines. For teams building CLI-heavy automation agents, GPT-5.4 remains the stronger choice on this specific benchmark.
BrowseComp is the other regression. This benchmark measures the ability to autonomously browse and extract information from live web pages. Opus 4.6 scored slightly higher, and Gemini 3.1 Pro currently leads the field at ~71%. For research agents that need to navigate and synthesize live web content, Gemini 3.1 Pro merits evaluation.
For coding, vision, and multidisciplinary reasoning — which represent the majority of professional AI use cases in 2026 — Opus 4.7 is currently the strongest publicly available model. The SWE-bench lead of ~89% versus GPT-5.4's ~84% and the visual navigation lead of 79.5% versus 71.2% are both meaningful gaps at this level of competition.
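The benchmark picture translates directly into routing logic. A sketch of how a router might encode it — the task categories are our own taxonomy, and the non-Anthropic model IDs are illustrative placeholders:

```typescript
// Illustrative model router derived from the benchmark comparison in this section.
type TaskKind = "coding" | "vision" | "cli-automation" | "web-browsing" | "general";

function routeModel(kind: TaskKind): string {
  switch (kind) {
    case "cli-automation":
      return "gpt-5.4"; // stronger on Terminal-Bench 2.0 (75.1% vs 69.4%)
    case "web-browsing":
      return "gemini-3.1-pro"; // leads BrowseComp at ~71%
    default:
      // Coding, vision, and general reasoning: Opus 4.7 leads the field.
      return "claude-opus-4-7-20260416";
  }
}
```

Routing on task category rather than defaulting to a single model is what turns the regression data above from a caveat into an optimization.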
## Pricing and Availability
Opus 4.7 is priced identically to Opus 4.6: $15 per million input tokens and $75 per million output tokens via the Anthropic API. The xhigh effort level increases effective token consumption through additional internal reasoning, but the per-token price does not change. For organizations using the Messages Batches API, Opus 4.7 is available at 50% of standard pricing for batch workloads exceeding 100 requests.
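For batch workloads, each request in a Message Batches submission pairs a `custom_id` with standard Messages params. A minimal sketch of building that request array — the prompt content and `custom_id` scheme are placeholders:

```typescript
// Build request entries in the shape the Message Batches API expects:
// each entry pairs a custom_id with ordinary Messages params.
function buildBatchRequests(prompts: string[]) {
  return prompts.map((content, i) => ({
    custom_id: `task-${i}`,
    params: {
      model: "claude-opus-4-7-20260416",
      max_tokens: 4096,
      messages: [{ role: "user", content }],
    },
  }));
}

// Submitted via the SDK, assuming an initialized client:
// const batch = await client.messages.batches.create({
//   requests: buildBatchRequests(prompts),
// });
```

Per the pricing above, keeping submissions over 100 requests is what qualifies the workload for the 50% batch rate.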
The model is available immediately across all major platforms:
- Claude.ai — Pro and Team plans, selectable from the model picker
- Anthropic API — model ID: `claude-opus-4-7-20260416`
- Amazon Bedrock — same model ID, standard Bedrock pricing applies
- Google Cloud Vertex AI — available from day one of the launch
- Microsoft Azure AI Studio — generally available
- GitHub Models — added via GitHub Changelog on April 16, 2026
## When to Use Opus 4.7 vs. Sonnet 4.6
Not every task warrants Opus 4.7. At roughly 5x the cost of Claude Sonnet 4.6, intelligent model routing is a real-money decision that compounds at scale. Based on our analysis of workload patterns across the Anthropic model family:
- Use Opus 4.7 for: agentic coding loops, multi-step reasoning requiring extended thinking, vision-heavy workflows, production engineering tasks where one failed run is expensive, and any task leveraging task budgets or the xhigh effort level.
- Use Sonnet 4.6 for: content generation, conversational AI, simple code completions, classification and routing, summarization, and any workload requiring thousands of daily API calls where cost-per-task is the primary constraint. The Anthropic cost and API tiering guide covers this routing decision in detail.
- Use Haiku 4.5 for: bulk text processing, batch SEO rewrites, log parsing, and any workload where speed and cost matter more than reasoning depth.
The task budget feature makes Opus 4.7 significantly more cost-predictable in production. Capping an agentic loop at 50,000 tokens means you can estimate monthly costs from task volume — a calculation that was difficult before because model behavior on long tasks was hard to bound without budget constraints.
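That back-of-envelope calculation looks like this — the 60/40 input/output token split is an assumption you should replace with your own telemetry:

```typescript
// Rough monthly cost estimate from task volume and a per-task token budget.
// Rates are the published Opus 4.7 API prices; the input/output split is assumed.
const INPUT_PER_MTOK = 15;  // $ per million input tokens
const OUTPUT_PER_MTOK = 75; // $ per million output tokens

function estimateMonthlyCost(
  tasksPerMonth: number,
  tokenBudget: number,
  inputShare = 0.6 // assumed split; measure your own
): number {
  const totalTokens = tasksPerMonth * tokenBudget;
  const inputCost = (totalTokens * inputShare / 1_000_000) * INPUT_PER_MTOK;
  const outputCost = (totalTokens * (1 - inputShare) / 1_000_000) * OUTPUT_PER_MTOK;
  return inputCost + outputCost;
}
```

At 1,000 tasks per month with a 50,000-token budget and the assumed split, that works out to roughly $1,950 — a number you can actually put in a planning spreadsheet.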
## API Migration Guide: From Opus 4.6
The migration is minimal — there are no breaking changes to the Messages API format. The primary change is the model ID: update from `claude-opus-4-6-20260229` to `claude-opus-4-7-20260416`. If you use the Claude Code CLI, running `claude update` pulls the latest version, which defaults to Opus 4.7 for agent operations.
Two things to verify after migration:
- Instruction scope: Test any prompts that previously relied on Opus 4.6's implicit generalization behavior. Where the model used to expand scope silently, it will now follow the literal instruction. Add explicit scope instructions where needed.
- Effort level: Update agentic loops from `effort: "high"` to `effort: "xhigh"` and measure the completion rate improvement against the cost increase. For most agentic coding tasks, the ROI is positive within the first week of production traffic.
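In code, the whole migration reduces to two string changes in your request config. A before/after sketch (the `budget_tokens` value is a placeholder for whatever your loop already uses):

```typescript
// Before: Opus 4.6 agentic loop settings
const opus46Config = {
  model: "claude-opus-4-6-20260229",
  thinking: { type: "enabled", budget_tokens: 10000, effort: "high" },
};

// After: Opus 4.7 with the recommended effort level for agentic work
const opus47Config = {
  model: "claude-opus-4-7-20260416",
  thinking: { type: "enabled", budget_tokens: 10000, effort: "xhigh" },
};
```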
For teams using Claude Code multi-agent coordination patterns, switching worker agents on complex tasks to xhigh effort is the first optimization to make after migration. The completion rate improvement is significant enough that the cost increase almost always pays for itself in reduced human-triggered re-runs.
## The Developer's Verdict
Claude Opus 4.7 is a substantive upgrade that earns its version number. The 13-point coding improvement and 3x production task resolution rate are the headline numbers, but the structural improvements — the xhigh effort level, task budgets, and literal instruction following — matter more for production agentic systems. These features are designed for teams that have moved beyond chatbot prototypes and are running real autonomous pipelines where predictability and completion rate are commercial requirements.
The Terminal-Bench regression and BrowseComp softening are honest limitations worth routing around: use GPT-5.4 for CLI-heavy automation agents, and Gemini 3.1 Pro for live web browsing agents. For coding, reasoning, and vision — which covers the majority of professional AI engineering in 2026 — Opus 4.7 is the model to be on.
Update the model ID this week. The task budget feature alone pays for the migration time in production stability improvements, and xhigh effort will recover completions that currently require manual re-runs — directly reducing toil on your team.
Written by
Anup Karanjkar
Expert contributor at WOWHOW. Writing about AI, development, automation, and building products that ship.