Claude Opus 4.7 Is Now Live: The Deliberate Half-Step

Claude Opus 4.7 launched on April 16, 2026 — and immediately told you it is less capable than its most powerful model. That framing is not an accident. It is the clearest signal yet that the frontier AI race has entered a new phase: one where restraint is the product.

What Claude Opus 4.7 Actually Changes

Claude Opus 4.7 is not a generational leap. It is a targeted upgrade over Opus 4.6, which had been showing signs of strain in recent weeks. The improvements cluster around three areas: advanced software engineering, visual reasoning, and instruction adherence.

On the engineering side, Anthropic's own description is precise: "notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks." Early-access testers from Cursor report a jump from 58% to 70% on CursorBench. Notion's evaluation found a 14% resolution lift over Opus 4.6 with fewer tokens and a third of the tool errors. Hex calls it "a more intelligent, more efficient Opus 4.6" and notes that low-effort Opus 4.7 roughly matches medium-effort Opus 4.6.

The vision upgrade is substantial. Claude Opus 4.7 processes images at higher resolution, which matters for document analysis, diagram interpretation, and multimodal workflows. Nick Saraev's independent benchmark testing measured visual reasoning jumping from 69.1% to 82.1% — a 13-percentage-point gain that is hard to dismiss.

But here is the number that did not move: agentic terminal coding showed a small or even negative delta in Saraev's tests. For teams running autonomous agent workflows, the practical difference may be negligible on routine tasks.

Claude Opus 4.7 By the Numbers

Metric	Claude Opus 4.6	Claude Opus 4.7	Source
CursorBench score	58%	70%	Cursor
Visual reasoning	69.1%	82.1%	Nick Saraev (independent)
Task resolution lift	baseline	+14%	Notion
Tool errors per task	baseline	−33%	Notion
BigLaw Bench (high effort)	—	90.9%	Harvey
Input token price	$5 / MTok	$5 / MTok	Anthropic pricing
Output token price	$25 / MTok	$25 / MTok	Anthropic pricing
Days from Opus 4.6 release	—	~3 days	Anthropic

Benchmark data from Anthropic's official partner evaluations published at launch.

The Glasswing Constraint

The most interesting part of Claude Opus 4.7 is what it cannot do. Anthropic explicitly states that during training, they "experimented with efforts to differentially reduce" the model's cybersecurity capabilities. This is a direct consequence of Project Glasswing, announced the week before, which laid out the risks and benefits of AI models for cybersecurity.

Claude Mythos Preview remains Anthropic's most powerful model, but its release stays limited. Opus 4.7 ships with automated safeguards that detect and block prohibited or high-risk cybersecurity requests. It is, in effect, a deliberately constrained release — powerful enough for production software engineering, intentionally weakened on the capabilities Anthropic considers highest-risk.

This is a new kind of product decision in frontier AI. Rather than shipping the most capable model and hoping for the best, Anthropic is staging its release. Security professionals who need the full capability set can apply to the new Cyber Verification Program. Everyone else gets a model that is excellent at coding and vision but deliberately less capable at vulnerability reproduction.

For agencies and development teams, this means Claude Opus 4.7 is the production model. Mythos is the research model. Treating them as the same product line misses the point.

Efficiency Over Capability: The Real Story

Nick Saraev, whose YouTube analysis reached 64,000 views within hours of publication, captured the shift cleanly: Claude Opus 4.7 "does not make things possible anymore. It just makes things slightly more profitable."

That framing deserves attention. The benchmark improvements are real but incremental. Where Claude Opus 4.7 shines is efficiency: fewer tokens per task, fewer tool errors, better instruction following. Devin reports that Opus 4.7 "works coherently for hours, pushes through hard problems rather than giving up." Replit describes it as "the same quality at lower cost." Harvey, the legal AI platform, measured 90.9% on BigLaw Bench at high effort with "better reasoning calibration on review tables." These are not capability stories — they are operational efficiency stories.

The pattern is consistent across every early tester report Anthropic published. Notion: +14% task resolution, one-third fewer tool errors. Cursor: CursorBench from 58% to 70%. An unnamed financial technology platform calls it "a significant leap" for accelerating "development velocity for faster delivery." The common thread is not raw intelligence — it is reliability at scale.

For teams already using Claude in production, the upgrade math is straightforward. Pricing stays identical to Opus 4.6 ($5 per million input tokens, $25 per million output tokens). If your agents run fewer retries and consume fewer tokens per task, your effective cost drops without changing your billing rate.

This is particularly relevant for businesses evaluating whether to build automation with AI agents or traditional workflow tools. The cost-per-task metric shifts meaningfully when the model wastes less compute on failed attempts.

Claude Code v2.1.111: The Trust Layer Ships Alongside

On the same day Opus 4.7 launched, Anthropic shipped Claude Code v2.1.111 with two features that are not performance upgrades — they are trust features. The /less-permission-prompts skill scans your session transcripts and proposes an allowlist of tools the agent can use without asking. The /ultrareview command adds a deeper code review pass.

A v2.1.112 hotfix followed within hours, addressing a "temporarily unavailable" error when using Opus 4.7 in auto mode — confirming just how fast the developer tooling team is iterating.

Three releases in 24 hours is not chaos. It is a coordinated strategy: ship the model and the trust infrastructure together. The bottleneck for enterprise AI adoption has shifted from "can the model do this?" to "can we let the model do this unsupervised?" The /effort slider and permission management tools in v2.1.111 address the second question directly. For teams already investing in AI-assisted development workflows, this trust layer is the missing piece.

What This Means for Your AI Strategy

If you are running Claude Opus 4.6 in production today, the upgrade is low-risk and likely cost-positive. The identical pricing means there is no billing surprise. The efficiency gains on complex, multi-step tasks — Notion measured +14% resolution with 33% fewer tool errors — translate directly to lower operational cost.

If you are evaluating frontier models for new projects, Opus 4.7 is the clearest production recommendation Anthropic has made. Unlike Mythos, which carries the weight of safety debates, Opus 4.7 is explicitly designed for deployment. The Glasswing constraints actually make it a safer bet for enterprise environments where security review processes would flag uncontrolled cyber capabilities.

If you are building agentic workflows — multi-step autonomous tasks that run for hours — the early tester feedback from Devin and Notion suggests Claude Opus 4.7 handles long-horizon autonomy better than any previous Claude model. The combination of better instruction following and fewer tool errors means fewer intervention points, which is the real unlock for scaling agent operations. Solve Intelligence confirmed this in the multimodal domain: Opus 4.7's higher-resolution vision is enabling "best-in-class tools for life sciences patent workflows, from drafting and prosecution to infringement detection."

Consider the competitive context. On the same day Opus 4.7 launched, Alibaba released Qwen3.6-35B — a free, Apache 2.0 model that runs on a laptop with roughly 3.5 billion active parameters. Simon Willison's pelican SVG benchmark showed the local model beating Opus 4.7 on creative generation tasks. When a free local model matches a $15/MTok API on creative tasks within hours of launch, the value proposition of frontier models shifts permanently from "what can it do" to "how reliably does it do it at scale." Opus 4.7's efficiency story is Anthropic's answer to that shift.

The deliberate half-step framing is not modesty. It is Anthropic signaling that the era of capability-first releases is over. The next phase is about deployment reliability, operational efficiency, and controlled release of high-risk capabilities. For teams building on Claude, that is better news than another benchmark record.

Frequently Asked Questions

Is Claude Opus 4.7 better than Claude Mythos Preview?

No — Anthropic explicitly states Opus 4.7 is "less broadly capable" than Mythos Preview. Opus 4.7 excels at software engineering, vision, and instruction adherence, but Mythos retains higher raw capability across a broader range of tasks. Opus 4.7 is the production-optimized model; Mythos is the frontier research model with limited availability.

How much does Claude Opus 4.7 cost?

Pricing is identical to Opus 4.6: $5 per million input tokens and $25 per million output tokens. The efficiency gains (fewer tokens per task, fewer retries) mean your effective cost per completed task likely decreases. Opus 4.7 is available via the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

What is Project Glasswing and how does it affect Opus 4.7?

Project Glasswing is Anthropic's cybersecurity safety framework, announced on April 9, 2026. It led to deliberate constraints on Opus 4.7's cyber capabilities — the model was trained with reduced ability to reproduce security vulnerabilities. Automated safeguards block prohibited cybersecurity requests. Legitimate security professionals can apply for the Cyber Verification Program to access full capabilities.

Should I upgrade from Opus 4.6 to Opus 4.7?

Yes — Claude Opus 4.7 is a recommended upgrade for most production use cases. Early testers consistently report equal or better quality at lower token consumption. Hex notes that "low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6," meaning you get Opus 4.6-level output while using fewer compute resources. The upgrade is a drop-in replacement with no pricing changes.

What improved most in Opus 4.7 compared to Opus 4.6?

Claude Opus 4.7's visual reasoning saw the largest measurable gain — independent testing measured a jump from 69.1% to 82.1%. Software engineering benchmarks improved by 12-14% across multiple evaluators. Instruction adherence and tool-use reliability also improved significantly, with Notion reporting 33% fewer tool errors. Agentic terminal coding, however, showed minimal improvement in independent benchmarks.

Claude Opus 4.7 Is Now Live: The Deliberate Half-Step