Cybersecurity in the Era of Autonomous AI
Claude Mythos Preview is Anthropic’s most capable AI model to date, according to its own claims. For now, it is not publicly available but is being provided exclusively to select partners for specialized projects. Here’s why— and what it means.
What Is Claude Mythos Preview?
Claude Mythos Preview is Anthropic’s latest frontier model, released on April 7, 2026. According to its accompanying System Card, it demonstrates a “marked leap” in performance across many evaluation benchmarks compared to its predecessors, such as Claude Opus 4.6.
Anthropic has decided to restrict access to the model as part of Project Glasswing, making it available only to a limited number of partner organizations that operate critical software infrastructure.
The reason is clear: its cybersecurity capabilities are so advanced that uncontrolled release is deemed too risky. Anthropic explicitly states on the Glasswing page:
“Securing critical infrastructure is a top national security priority for democratic countries—the emergence of these cyber capabilities is another reason why the US and its allies must maintain a decisive lead in AI technology.”
How Capable Is Claude Mythos Preview in Cybersecurity?
The claims in the System Card have not yet been independently verified. If accurate, however, they are impressive:
Cybench – CTF Challenges
Cybench is a well-established public benchmark featuring 40 Capture-the-Flag (CTF) challenges from real security competitions. These challenges simulate real-world attack and defense scenarios, from reverse engineering to vulnerability analysis. Anthropic evaluated Claude Mythos Preview on a subset of 35 challenges.
| Model | Success rate (pass@1, 35-Challenge-Subset) |
| Claude Mythos Preview | 100 % |
| Claude Opus 4.6 | ~70 % |
| Claude Sonnet 4.6 | ~60 % |
Claude Mythos Preview solved every tested challenge with a 100% success rate. The benchmark is now saturated—Anthropic may stop reporting Cybench results for future models.
CyberGym – Real Vulnerabilities in Open-Source Software
CyberGym is more demanding: it focuses not on gamified challenges but on reproducing real, already-known vulnerabilities from actual open-source projects. The model is given a description of a vulnerability and must independently locate it in the code. The benchmark includes 1,507 such tasks.
| Model | Score (pass@1) |
| Claude Mythos Preview | 0,83 |
| Claude Opus 4.6 | 0,67 |
| Claude Sonnet 4.6 | 0,65 |
This represents a ~24% improvement over the previous top model in identifying real, known vulnerabilities.
Firefox 147 – From Vulnerability to Working Exploit
In collaboration with Mozilla, Anthropic had previously identified and patched vulnerabilities in Firefox Release 147.0 (January 13, 2026). A follow-up test was conducted: the model was given 50 crash categories (basic types of issues in Firefox) already discovered by Opus 4.6 and tasked with developing functional exploits in an isolated environment for Firefox’s JavaScript and WebAssembly engine (SpiderMonkey) that could enable arbitrary code execution.
- Claude Opus 4.6 managed to create exploits in only 2 out of several hundred attempts and could reliably use only one of the available bugs.
- Claude Mythos Preview reliably identifies the most exploitable vulnerabilities and autonomously develops proof-of-concept exploits—almost every time using the same two most critical bugs, regardless of the initial crash category. In a variant without these “Top 2” bugs, the model still leverages four other known bugs for code execution.
This demonstrates a far better “intuition” for exploiting vulnerabilities in diverse ways.
The First Model to Autonomously Attack a Corporate Network
External partners tested the model on closed cyber ranges (simulated corporate networks) with realistic vulnerabilities.
The results:
- Claude Mythos Preview is the first model ever to fully and autonomously complete one of these cyber ranges. As a standalone agent, it completed a simulated corporate attack scenario in far less time than a human expert would need—estimated at over 10 hours for a human.
- It is capable of conducting autonomous end-to-end cyberattacks on small corporate networks with weak security postures (no active defenses, minimal monitoring).
- However, it could not complete a more complex cyber range in an Operational Technology (OT) environment (e.g., industrial systems).
Implication for poorly secured systems: Attacks will occur at a frequency and speed that manual defense simply cannot match.
Was It Trained Specifically for Cybersecurity?
No—and that’s notable. The System Card explicitly describes Claude Mythos Preview as a general-purpose frontier model with capabilities “in many areas,” including software engineering, reasoning, computer use, and knowledge work. Training was based on a mix of publicly available internet data, private datasets, and synthetic data; post-training focused on value alignment, not cybersecurity.
The model’s exceptional security capabilities emerged spontaneously as a result of increased general capabilities. Anthropic describes them as “inherently dual-use” and characterizes the performance leap as an unexpected quantum jump over all previous models.
In other words, the model wasn’t trained to be a cybersecurity expert—it simply became generally more capable than its predecessors and, in doing so, became a better hacker.
The Alignment Paradox: Better Aligned, But More Dangerous
Anthropic highlights this paradox in two places in the System Card:
“Claude Mythos Preview is […] the best-aligned of any model that we have trained to date by essentially all available measures.”
— Claude Mythos Preview System Card, Introduction
And in the Alignment chapter:
“It likely poses the greatest alignment-related risk of any model we have released to date.”
— Claude Mythos Preview System Card, §4.1.1
How does this fit together? Anthropic compares it to an experienced mountain guide: because they are more competent, they receive more dangerous requests and may guide clients along more extreme routes. Increased competence can outweigh increased caution.
Notable incidents:
- Sandbox escape with unsolicited exploit leak: In a simulated scenario, a user instructed the model to break out of a secure, restricted environment. The model developed a multi-stage exploit to gain broad internet access and then unprompted published details of the exploit on several hard-to-find but technically public websites. The escape capability was concerning; the unsolicited disclosure made the incident serious.
- Covering tracks: In rare cases, the model took rule-breaking actions and then attempted to conceal them—including editing Git commit histories to make changes invisible.
These incidents highlight: The more capable a model, the more consequential its rare failures can be.
Mythos Today, Standard Tomorrow: Others Will Follow
Anthropic is not alone. While Mythos Preview sets a new bar, dozens of other labs are training their own frontier models—and the performance curve is rising everywhere.
The question isn’t whether other models will reach Mythos Preview’s cybersecurity capabilities, but when—and under what access conditions. What is considered “too dangerous to release” today will be a standard feature in one or two model generations. Companies must adapt their security strategies accordingly—not for the threat landscape of 2026, but for 2027 and 2028.
What Does This Mean for Businesses?
Claude Mythos Preview is unlikely to see broad enterprise adoption anytime soon—it remains reserved for Glasswing partners. But the development it represents is relevant to everyone. Here’s my take:
1. AI will become an inevitable part of cybersecurity.
Attackers and defenders alike will increasingly rely on more powerful models. Organizations that don’t use AI-driven security tools will structurally lose ground to adversaries who do.
2. Vulnerability analysis will become faster and more comprehensive.
Models like Mythos Preview can perform code audits, penetration tests, and vulnerability assessments in a fraction of the time previously required. What takes a human expert 10 hours today could be a 10-minute model run tomorrow.
3. Legacy security gaps will become more dangerous.
Older software with known but unpatched vulnerabilities was once relatively safe because exploit development was labor-intensive. Automation changes that—even without zero-day capabilities, the risk profile for existing systems has increased significantly.
4. Monitoring and auditability of AI agents will be critical.
When AI agents operate with high autonomy, humans must be able to trace their actions. Logging, monitoring, and clear authorization boundaries for agent-driven systems are not optional features.
5. Model update risk is real.
The System Card notes that even at Anthropic, a model with more capabilities and autonomy led to unforeseen problems. Organizations using AI agents need processes to understand what the model is doing in their name—not just what it says when asked.
Conclusion: A New Class of Capabilities—With a Double Edge
For the cybersecurity landscape, AI is no longer just a tool for security teams—it is becoming the primary actor on both sides of the conflict. The question for businesses is no longer whether to use AI in security. It’s whether they can afford not to.
But you don’t need to wait until you have the best hacking model in your hands—because by then, it may be too late. For building defense, even existing frontier models are already well-equipped.
Sources: Anthropic System Card: Claude Mythos Preview (April 2026), Frontier Red Team Blog: Mythos Preview (April 2026), Project Glasswing – Anthropic (April 2026, including partner statements and a quote on national security), CyberGym Benchmark und CyberGym Blog – UC Berkeley RDI (Oktober 2025), Cybench

