When the Model Writes the Exploit
The Timing Problem
OpenBSD is one of the most security-hardened operating systems in the world. Its TCP stack has been reviewed by experienced security engineers, tested by fuzzers, and audited repeatedly over decades. For 27 years, a vulnerability in its Selective Acknowledgement implementation went undetected through all of it.
An AI model found it. It then identified a second bug in the same code path, determined how to chain the two through a signed integer overflow on 32-bit sequence numbers, and produced a proof-of-concept that remotely crashes any OpenBSD machine responding over TCP. The campaign cost under $20,000. No human guided the process after the initial prompt.
That is one of three fully disclosed results from Anthropic’s Claude Mythos Preview, a frontier model Anthropic chose not to release publicly. Instead, the company built a restricted defensive consortium, gave access to roughly fifty organizations, and committed $100 million in usage credits. Anthropic considers the model capable enough to deploy for defense and risky enough to withhold from broad availability.
The Short Version
This is not a story about AI helping with security research. AI has been doing that for some time. On April 7, 2026, Anthropic announced a model that can carry out substantial parts of the vulnerability lifecycle, from discovery through exploitation, with limited human involvement. That compresses the timeline between finding a flaw and having a working attack, and it puts pressure on enterprise processes that were designed around the assumption that exploitation takes longer than discovery.
Anthropic paired the announcement with Project Glass wing, a controlled defensive program with partners including AWS, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, and Palo Alto Networks, plus roughly forty organizations that maintain critical infrastructure software. Post-credit pricing is $25/$125 per million input/output tokens. Logan Graham, Anthropic’s head of offensive cyber research, told NBC News that comparable capabilities could be broadly distributed within six to twelve months, including from non-U.S. companies.
Reuters reported concern from banking-sector experts about implications for legacy-heavy financial environments. The U.S. Treasury Secretary convened a meeting with systemically important banks, treating AI-driven cyber risk as a systemic stability concern. On April 14, SANS, CSA, OWASP, and [un]prompted jointly released an emergency briefing arguing that the discovery-to-exploit timeline has compressed from weeks to hours.
Under the Hood
The scaffold Anthropic describes is straightforward. A container runs in isolation with the target project and source code. Mythos Preview receives a one-paragraph prompt asking it to find a security vulnerability, then operates in a loop: reading code, forming hypotheses, running the software to test them, adding debug instrumentation as needed, and repeating until it produces a bug report with a proof-of-concept or concludes there is nothing to find.
Anthropic ran many agents in parallel, each on a different file, pre-ranked by the model based on likely attack surface. A validation agent filtered findings for severity, discarding bugs that were technically real but operationally trivial.
Three vulnerabilities have been disclosed in full because the fixes have shipped.
The OpenBSD SACK bug (27 years old). TCP sequence numbers are 32-bit integers compared using (int)(a - b) < 0, which is correct when values are within 2^31 of each other. Nothing in the code prevented an attacker from placing a SACK block start roughly 2^31 away from the real window. At that distance, the subtraction overflows the sign bit in both comparisons simultaneously, and the kernel concludes the attacker's start is both below the hole and above the highest acknowledged byte at the same time. The kernel deletes the only entry in the SACK hole list, writes through the resulting null pointer, and crashes. Remote denial of service, no authentication required.
The FFmpeg H.264 bug (16 years old). A slice ownership table uses 16-bit entries while the slice counter is 32-bit. Initialization via memset(…, -1, …) fills every entry with 65,535 as a sentinel. A frame crafted with 65,536 slices causes slice 65,535 to collide with the sentinel. The decoder treats a nonexistent neighbor as belonging to the current slice, writes out of bounds, and crashes. Introduced in 2003, made exploitable by a 2010 refactor, and missed by five million fuzzer runs on the relevant code path.
The FreeBSD NFS bug (17 years old, CVE-2026-4747). The RPCSEC_GSS authentication handler copies packet data into a 128-byte stack buffer with a length check allowing up to 400 bytes, leaving 272 bytes of overflow. The compiler skips the stack canary because the buffer is an int32_t[32] rather than a character array. FreeBSD does not randomize the kernel load address. The remaining obstacle, a 16-byte GSS handle match, is bypassed through an unauthenticated NFSv4 EXCHANGE_ID call that returns the host UUID and boot time. Anthropic says the model assembled a twenty-gadget ROP chain across multiple packets without human involvement, delivering full root access to an unauthenticated remote attacker.
Anthropic claims thousands more findings across every major OS and browser, including privilege escalation, JIT heap sprays, KASLR bypasses, and authentication bypasses. Fewer than one percent are patched. SHA-3 hashes of undisclosed findings serve as accountability commitments, with details to follow within 135 days of maintainer notification. In 198 manually reviewed reports, security contractors agreed with the model's severity assessment 89 percent of the time.
On a Firefox JavaScript engine exploit task, Anthropic says Opus 4.6 produced two working exploits from several hundred attempts while Mythos Preview produced 181 and achieved register control in 29 more. On an internal OSS-Fuzz evaluation across 7,000 entry points, Opus 4.6 managed one tier-3 crash; Mythos Preview achieved ten full control-flow hijacks on fully patched targets.
What the Outside Record Says
Anthropic’s claims do not stand alone, but they are not fully corroborated.
AISLE, an independent security firm with over 180 externally validated CVEs across more than thirty projects, isolated the vulnerable code from Anthropic's showcased findings and ran it through eight smaller, cheaper models in single zero-shot calls with no scaffold or tooling. Every model detected the FreeBSD overflow, including a 3.6-billion-parameter model at $0.11 per million tokens. A 5.1-billion-parameter open-weights model recovered the full OpenBSD SACK chain.
AISLE describes the frontier as “jagged”: detection capability does not scale smoothly with model size or price, and rankings reshuffle across task types. Their position is that the competitive advantage lies in the orchestration system, the validation pipeline, and maintainer trust, not in any single model.
There is a boundary around what AISLE tested, and they are explicit about it. They did not replicate the full process of scanning a repository from scratch, discovering a vulnerability, and exploiting it autonomously. They tested whether other models could detect what Mythos detected once pointed at the relevant code. That is a meaningful result for the discovery side but does not address the full autonomous workflow or exploitation.
The separation point is on the exploitation side. Discovery, once code is isolated, is broadly accessible with cheap models. Exploitation, where a bug becomes a reusable primitive and a constrained delivery mechanism is constructed around it, is where frontier capability separates. For defenders focused on patching, this is a useful distinction. For threat modeling against adversaries with frontier access, it is less useful.
Bruce Schneier, writing on April 13, noted that the capability gains are real, including autonomous exploit construction and multi-vulnerability chaining. He also noted that defenders hold a current advantage because finding for the purpose of fixing is easier than finding plus exploiting, though the advantage will shrink as more powerful models become available. Schneier flagged the absence of false-positive-rate data, arguing that outsiders cannot yet judge how representative the showcased successes are.
The Speed Gap
Discovery is accelerating. Remediation is not keeping pace.
Anthropic’s own disclosure process makes the gap visible. Thousands of potential vulnerabilities identified, fewer than one percent patched. The constraint is human remediation capacity: validating a finding, writing a correct fix, testing for regressions, and deploying without breaking dependent systems. That constraint is most acute in open-source projects maintained by small volunteer teams, which is where several of Anthropic’s disclosed bugs were found. The $4 million in open-source donations acknowledges the problem but does not resolve the structural mismatch between AI-scale discovery and human-scale repair.
On April 12, SANS, CSA, OWASP, and [un]prompted published “The AI Vulnerability Storm: Building a Mythos-Ready Security Program.” Contributors include former CISA Director Jen Easterly, former National Cyber Director Chris Inglis, former NSA Cybersecurity Director Rob Joyce, and Google CISO Heather Adkins. Over 250 CISOs reviewed it. The briefing introduces “VulnOps,” treating vulnerability management as a continuous operating capability rather than a periodic assessment, with actions across three time horizons: this week, within 45 days, and over the next year.
What This Changes for Operations
The operational question is whether existing patching programs were designed for this volume.
Patch cadences built around monthly or quarterly cycles assume a predictable rate of critical findings. A sustained increase in AI-driven disclosures moves the constraint from detection to remediation. The most exposed organizations are those with large legacy portfolios, slow release pipelines, or deep dependence on volunteer-maintained open-source components.
Legacy environments carry particular risk. Financial institutions operate stacks that mix modern systems with software that is decades old, and fast remediation in those environments requires architectural changes, not just process adjustments. Anthropic’s disclosed examples are consistent with this concern: all three were long-lived flaws in mature, heavily reviewed code.
If models can find vulnerabilities in internal code, third-party dependencies, and binaries without source, enterprise exposure extends across the full software estate, including products that were previously difficult to analyze at scale. The SANS/CSA briefing recommends virtual patching through WAFs and runtime protection as interim defense while formal fixes are developed, and identifies Software Bills of Materials as operationally urgent because AI agents need structured dependency maps to assess attack surfaces.
What Remains Uncertain
The public record has gaps. The “thousands” of zero-day findings are attested only by Anthropic, with SHA-3 commitments as the sole external accountability mechanism. The severity agreement rate comes from a 198-report internal sample with no independent audit. The six-to-twelve month proliferation timeline is an expert estimate from Anthropic’s researchers, partially supported by AISLE’s replication on the discovery side but not for exploitation.
Several questions remain open. If AI generates a continuous stream of critical findings in volunteer-maintained projects without proportional corporate funding, the result is a growing backlog of known but unpatched vulnerabilities in foundational software. If a Glass wing partner’s use of Mythos surfaces a vulnerability that is exploited before a patch ships, existing legal frameworks do not clearly assign responsibility. And if AI enables rapid reverse engineering of patches to reconstruct underlying flaws, the window between private disclosure and public fix may narrow significantly.
The underlying asymmetry remains. Discovery is accelerating. Remediation still depends on human validation, regression testing, change control, and deployment realities. The question is not whether AI can find more bugs. It is whether enterprise security organizations can absorb that volume without the bottleneck moving downstream.
Further Reading
- Anthropic: “Mythos Preview’s Cybersecurity Capabilities” (April 7, 2026)
- AISLE: “AI Cybersecurity After Mythos: The Jagged Frontier” (April 7, 2026; updated April 9)
- Cloud Security Alliance, SANS, OWASP, [un]prompted: “The AI Vulnerability Storm: Building a Mythos-Ready Security Program” (April 12, 2026; updated April 18)
- Schneier on Security: “On Anthropic’s Mythos Preview and Project Glass wing” (April 13, 2026)
- Forrester: “Project Glass wing: The 10 Consequences Nobody’s Writing About Yet” (April 2026)