SAN FRANCISCO — Anthropic has released its latest flagship model, Claude Opus 4.7, which has established a new performance ceiling in automated software engineering. In standardized testing released on April 16, 2026, the model outperformed OpenAI’s GPT-5.4 by a margin of 6.6 percentage points on the industry-standard SWE-bench Pro benchmark.
Performance Data and Benchmarks
The SWE-bench Pro evaluation, which requires AI models to resolve authentic software bugs sourced from open-source repositories, saw Claude Opus 4.7 achieve a success rate of 64.3%. In contrast, OpenAI’s GPT-5.4, released in March 2026, scored 57.7% on the same test.
The result marks a significant shift in the competitive landscape for AI-assisted development. While previous iterations often traded leads within the margin of error, the current data suggests a widening gap in agentic coding: the ability of an AI to plan, execute, and verify code changes independently.
Comparative Analysis: Coding and Reasoning
The results across various technical benchmarks highlight the specific areas where the new Anthropic model has gained ground:
| Benchmark | Claude Opus 4.7 | GPT-5.4 | Difference (points) |
| --- | --- | --- | --- |
| SWE-bench Pro (Real Coding) | 64.3% | 57.7% | +6.6 |
| SWE-bench Verified | 87.6% | 80.8%* | +6.8 |
| GPQA Diamond (Logic/Reasoning) | 94.2% | 94.4% | -0.2 |
*GPT-5.4 score reflects comparative testing data from April 2026.
While GPT-5.4 maintains a slight edge in graduate-level reasoning (GPQA), the 64.3% score on SWE-bench Pro is the highest recorded for a generally available model. Anthropic’s internal “Mythos” preview remains higher at 77.8%, though it has not yet been cleared for public release.
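The headline numbers above are absolute gaps measured in percentage points rather than relative percentages. A minimal sketch, using only the SWE-bench Pro scores quoted in this article, shows how the two framings differ:

```python
# Scores as reported for SWE-bench Pro (see table above).
opus_score = 64.3   # Claude Opus 4.7, percent of tasks resolved
gpt_score = 57.7    # GPT-5.4, percent of tasks resolved

# Absolute gap, in percentage points (the figure quoted in the article).
absolute_gap = opus_score - gpt_score                        # 6.6 points

# Relative improvement over GPT-5.4's baseline score.
relative_gain = (opus_score - gpt_score) / gpt_score * 100   # ~11.4%

print(f"Absolute gap: {absolute_gap:.1f} points")
print(f"Relative improvement: {relative_gain:.1f}%")
```

In relative terms, the 6.6-point lead works out to roughly 11% more resolved tasks than GPT-5.4's baseline.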
Technical Enhancements
Beyond raw scores, the 4.7 update introduces several architectural changes aimed at professional workflows:
- Instruction Adherence: Anthropic reports a 14% improvement in multi-step reasoning, with the model less likely to omit specific constraints in complex prompts.
- Enhanced Vision: The model processes images at 3.75 megapixels, triple the resolution budget of the previous 4.6 release, aiding front-end development and UI debugging (a rough sizing sketch follows this list).
- Agentic Reliability: New “Implicit-Need” testing allows the model to infer necessary tool use without explicit user direction, reducing tool-call errors by approximately 66%.
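The 3.75-megapixel figure is the only vision number quoted above; the back-of-the-envelope sketch below infers the implied prior budget and an example screenshot size that fits within it. The 16:9 aspect ratio is an assumption chosen purely for illustration, not a documented limit:

```python
# Back-of-the-envelope check of the stated image budget (a sketch, not API documentation).
new_budget_mp = 3.75                 # megapixels, as reported for Opus 4.7
prior_budget_mp = new_budget_mp / 3  # implied ~1.25 MP for the prior 4.6 release

# Hypothetical example: the largest 16:9 screenshot that fits in 3.75 megapixels.
aspect_w, aspect_h = 16, 9
pixels = new_budget_mp * 1_000_000
unit = (pixels / (aspect_w * aspect_h)) ** 0.5
width, height = int(aspect_w * unit), int(aspect_h * unit)

print(f"Implied prior budget: ~{prior_budget_mp:.2f} MP")
print(f"Approx. max 16:9 image: {width} x {height} px")   # roughly 2581 x 1452
```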
Industry Impact
The rapid succession of model releases—with GPT-5.4 launching just six weeks prior to Opus 4.7—underscores the volatility of the AI tool market. For enterprises integrating these models into production pipelines, the preference appears to be shifting toward “agentic” capabilities, where the AI functions as a semi-autonomous engineer rather than a simple autocomplete tool.
Claude Opus 4.7 is currently available via the Anthropic API and major cloud providers, maintaining the established pricing of $5 per million input tokens and $25 per million output tokens.
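At those list prices, per-request costs are straightforward to estimate. The sketch below uses hypothetical token counts chosen only for illustration; actual figures will vary by workload:

```python
# Published list pricing for Claude Opus 4.7 (per the article above).
INPUT_PRICE_PER_M = 5.00    # USD per million input tokens
OUTPUT_PRICE_PER_M = 25.00  # USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost of a single request at the quoted list prices."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical agentic-coding request: a large repository context plus a sizable patch.
print(f"${estimate_cost(input_tokens=120_000, output_tokens=8_000):.2f}")  # ~$0.80
```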
Anthropic Co-Founder & CEO Dario Amodei Flickr Picture by TechCrunch Disrupt