How Mozilla's Mythos AI Found 271 Firefox Vulnerabilities with Minimal False Positives


When Mozilla's CTO recently declared that AI-assisted vulnerability detection meant "zero-days are numbered" and that defenders finally had a decisive advantage, skepticism was understandably high. Many saw it as just another tech hype cycle—cherry-picked results and missing fine print. But Mozilla has now backed up that claim with real data. Over two months, their team used Anthropic's Mythos AI model to identify 271 security flaws in Firefox, with what they describe as "almost no false positives." This behind-the-scenes look reveals how improved models and a custom-built "harness" transformed AI vulnerability detection from a source of unreliable noise into a powerful tool. Here’s how they did it.

What is Mythos and how did Mozilla use it to find Firefox vulnerabilities?

Mythos is an AI model developed by Anthropic, designed specifically for identifying software vulnerabilities. Mozilla integrated it into their security workflow by building a custom "harness"—a specialized framework that feeds Firefox source code to Mythos in a structured way. This harness allows the AI to analyze code at scale, flagging potential bugs in a format that developers can easily review. Over a two-month period, Mythos identified 271 distinct vulnerabilities in Firefox. The key breakthrough? The system produced virtually no false positives, meaning every report from Mythos turned out to be a real security issue. This marks a dramatic shift from earlier AI-based tools that often buried developers in unreliable, hallucinated bug reports.
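Mozilla has not published the harness itself, so as a rough illustration of what "feeding source code in a structured way" might look like, here is a minimal, hypothetical chunker that splits a source file into function-sized pieces a model could review one at a time. All names and the boundary heuristic are assumptions, not Mozilla's actual code.

```python
# Hypothetical sketch of a harness front-end: split a source file into
# function-sized chunks for per-chunk analysis. Illustrative only; the
# boundary heuristic is a crude C/C++ approximation.
import re
from pathlib import Path

def chunk_source(path: Path, max_lines: int = 120):
    """Yield (start_line, chunk_text) pairs, splitting near function boundaries."""
    lines = path.read_text(errors="replace").splitlines()
    chunk_start, chunk = 0, []
    for i, line in enumerate(lines):
        # Once a chunk is large enough, cut at the next top-level "...) {" line.
        if chunk and len(chunk) >= max_lines and re.match(r"^\S.*\)\s*\{", line):
            yield chunk_start + 1, "\n".join(chunk)
            chunk_start, chunk = i, []
        chunk.append(line)
    if chunk:
        yield chunk_start + 1, "\n".join(chunk)
```

A real harness would also attach context (types, call sites, build flags) to each chunk, but the core idea is the same: give the model bounded, locatable units instead of a whole repository at once.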

Source: feeds.arstechnica.com

Why were previous AI vulnerability detection attempts unreliable?

In earlier experiments, Mozilla engineers found that AI-assisted vulnerability detection was plagued by what they called "unwanted slop." The typical process involved prompting a model to analyze a block of code, and the model would generate plausible-sounding bug reports at an unprecedented scale. However, upon human investigation, a large percentage of those reports contained hallucinated details—errors that looked real but weren't. Developers had to spend significant time manually verifying each report, often finding that the AI had imagined facts, misidentified root causes, or even invented vulnerabilities that didn't exist. This made the tools more of a burden than a help. The high false-positive rate eroded trust and meant that human experts still had to do most of the heavy lifting.

How did Mozilla achieve "almost no false positives" with Mythos?

Mozilla attributes their success with Mythos to two main factors: improvements in the underlying AI model and the development of a custom harness. First, Anthropic's Mythos itself represents a generational leap in AI vulnerability detection—it’s more accurate and less prone to hallucination than previous models. But the real secret weapon was Mozilla's custom harness. This harness structures the way code is presented to Mythos, breaking it into manageable chunks and providing contextual information that guides the model’s analysis. It also includes validation steps that filter out likely false positives before they reach human reviewers. By combining a better model with smarter integration, Mozilla transformed AI vulnerability detection from a noisy, unreliable process into a precise, high-confidence tool that developers could trust without spending hours double-checking every alert.
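The validation step mentioned above is the part most amenable to a concrete sketch. The actual filters are not public, so the checks below are hypothetical examples of the cheap, mechanical screens a harness could run before a human ever sees a report: does the cited file exist, is the line number in range, and does the quoted code actually appear there? The `Finding` type and all field names are assumptions for illustration.

```python
# Illustrative false-positive filter: reject any model report whose cited
# location or quoted code cannot be mechanically confirmed in the repo.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Finding:
    file: str      # path relative to the repo root, as cited by the model
    line: int      # 1-based line number cited by the model
    snippet: str   # code the model claims is at that location
    summary: str   # the model's description of the bug

def validate(finding: Finding, repo_root: Path) -> bool:
    """Return False for findings with hallucinated files, lines, or code."""
    src = repo_root / finding.file
    if not src.is_file():
        return False  # hallucinated file path
    lines = src.read_text(errors="replace").splitlines()
    if not (1 <= finding.line <= len(lines)):
        return False  # hallucinated line number
    # The quoted snippet must actually appear near the cited line.
    window = "\n".join(lines[max(0, finding.line - 3):finding.line + 2])
    return finding.snippet.strip() in window
```

Filters like this cannot confirm that a bug is real, but they discard the class of report that plagued earlier tools: plausible-sounding findings pointing at code that does not exist.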

What bold claim did Mozilla's CTO make about AI and zero-days?

Last month, Mozilla’s CTO made a striking statement: "Zero-days are numbered" and "defenders finally have a chance to win, decisively." This was a direct reference to AI-assisted vulnerability detection. The claim stirred up skepticism because it echoed a common pattern in tech—where impressive but cherry-picked results are presented without the nuances of real-world performance. However, Mozilla’s recent disclosure provides concrete evidence to support that optimism. By showing that an AI system can autonomously discover hundreds of real vulnerabilities with essentially no false positives, Mozilla has demonstrated that the technology has matured beyond proof-of-concept. The CTO’s statement now seems less like hype and more like a realistic outlook, grounded in actual engineering outcomes.


How does the Mythos system work in detail to analyze Firefox code?

Mythos operates by analyzing Firefox’s source code using a custom harness developed by Mozilla. The harness pre-processes the code—splitting it into logical sections (like functions, classes, or modules) and attaching metadata such as variable types, function signatures, and security-relevant comments. This structured input is then fed to Mythos, which applies a specialized AI model trained to identify patterns indicative of vulnerabilities—like buffer overflows, use-after-free errors, or insecure API calls. The model outputs a report for each suspected bug, including a description, location, and severity. By design, the harness also includes a post-processing step that cross-references findings with known code patterns and reduces false positives. The final list of alerts is then sent to Mozilla’s security team for validation. Over two months, this pipeline produced 271 verified vulnerabilities, with the team reporting almost no false positives.
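The pipeline described above can be sketched end to end. Mythos's actual API is not public, so `analyze_chunk` below is a stub standing in for the model call (a toy pattern match, not real analysis), and the `Report` fields and severity filter are assumptions chosen to mirror the description: per-chunk analysis producing a description, location, and severity, followed by a post-processing pass that trims the list before it reaches reviewers.

```python
# End-to-end pipeline sketch: analyze each pre-processed chunk, then
# post-process the raw findings before handing them to humans.
# The model call is a placeholder, not a real Mythos interface.
from dataclasses import dataclass

@dataclass
class Report:
    location: str
    description: str
    severity: str   # e.g. "low" | "moderate" | "high" | "critical"

def analyze_chunk(chunk: str) -> list[Report]:
    """Placeholder for the model call; a toy check stands in for its analysis."""
    reports = []
    if "strcpy(" in chunk:
        reports.append(Report("strcpy call", "possible buffer overflow", "high"))
    return reports

def run_pipeline(chunks: list[tuple[str, str]]) -> list[Report]:
    """chunks: (name, code) pairs from the pre-processing stage."""
    verified = []
    for name, chunk in chunks:
        for r in analyze_chunk(chunk):
            # Post-processing: keep only findings above a severity floor.
            if r.severity in ("high", "critical"):
                verified.append(Report(f"{name}: {r.location}",
                                       r.description, r.severity))
    return verified
```

In the real system the post-processing stage reportedly does far more, cross-referencing findings against known code patterns, but the shape of the pipeline (chunk, analyze, filter, report) is what makes the output reviewable at scale.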

What challenges did Mozilla face with vulnerability detection before Mythos?

Before Mythos, Mozilla’s vulnerability detection relied heavily on traditional methods—manual code review, static analysis tools, and community bug reports. These approaches, while effective, were slow and labor-intensive. Earlier experiments with AI vulnerability detection introduced new problems: the models would generate vast numbers of false positive reports, often with hallucinated details that wasted developer time. Engineers described the output as "unwanted slop"—plausible-sounding but inaccurate. Handling those unreliable reports required significant manual effort to separate real bugs from hallucinations. This friction meant that even when the AI occasionally found a real vulnerability, the cost of verifying everything else outweighed the benefits. Mythos solved that problem by drastically reducing false positives, making AI a practical addition to Mozilla’s security arsenal rather than a distraction.

What does this mean for the future of software security?

Mozilla’s success with Mythos suggests that AI-assisted vulnerability detection is entering a new era. With near-zero false positives and the ability to scale to large codebases like Firefox, this technology could shift the balance in cybersecurity. Defenders may finally have a tool that finds bugs before attackers exploit them, potentially reducing the number of zero-day attacks. Mozilla’s approach—combining improved models with custom integration—provides a blueprint for other organizations. If widely adopted, AI vulnerability detection could make software inherently more secure by catching flaws early in the development cycle. However, challenges remain: the technology needs to be adapted to different codebases, and trust in AI outputs must continue to grow. But for now, Mozilla has shown that the hype around AI for security has real substance.
