ZDNET Highlights
- The latest version of Cloud Mythos has already been upgraded.
- External researchers found that it achieved several firsts in testing.
- AI capabilities may be improving faster than anticipated.
Anthropic’s Cloud Mythos, which the company says is too powerful for general release, appears to have already gained new capabilities.
In a blog post published Wednesday, the UK AI Security Institute (AISI) reported that it had tested a new version of Mythos that outperformed both its previous results and OpenAI’s GPT-5.5 – just a month after Mythos’ initial release.
Also: Apple, Google and Microsoft join forces with Anthropic’s Project Glasswing to protect the world’s most critical software
The blog authors wrote, “The new Mythos Preview checkpoint completed both of our cyber ranges, solving ‘The Last Ones’ in 6 out of 10 attempts and the previously unsolved ‘Cooling Tower’ in 3 out of 10 attempts. This was the first time a model completed the second of our two cyber ranges.”
When Anthropic first announced Mythos Preview and Project Glasswing – the cybersecurity testing alliance it formed with rival tech companies and AI labs, whose members were given limited access to Mythos – last month, the UK AISI evaluated the model, finding that it “represents a step forward over previous frontier models in a scenario where cyber performance was already rapidly improving.”
That third-party perspective helped balance claims that the hype around Mythos was either pure marketing or, at the other extreme, signaled a catastrophic shift in AI capabilities. The truth about what the model can do likely lies somewhere in the middle.
Also: How to learn Cloud Code for free with Anthropic’s AI courses – one took me only 20 minutes
AISI’s updated testing also shows that capability improvements are not limited to new model releases; they can also occur between versions of a single model.
A rapidly growing cyber threat
AISI noted that AI models are rapidly advancing in their ability to handle cyber tasks, with serious implications for cybersecurity – especially given Mythos’ ability to detect software vulnerabilities.
“In February 2026, we internally estimated that the duration of cyber tasks that AI models could complete had been doubling every 4.7 months since the end of 2024 – already an acceleration from the 8-month doubling time we estimated in November 2025,” the blog authors wrote. “Since then, AISI has reported on two new models, Cloud Mythos Preview and (OpenAI’s) GPT-5.5, both of which significantly exceed the doubling-rate trend.”
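As a rough illustration of what those doubling times imply (a sketch using assumed numbers, not AISI’s actual methodology), the snippet below compares how a model’s task horizon would grow under an 8-month versus a 4.7-month doubling period. The 60-minute starting horizon is a hypothetical value chosen only for illustration; the two doubling periods come from the AISI figures quoted above.

```python
def task_horizon(base_minutes: float, months_elapsed: float,
                 doubling_period_months: float) -> float:
    """Task duration a model could complete after `months_elapsed`,
    assuming the horizon doubles every `doubling_period_months`."""
    return base_minutes * 2 ** (months_elapsed / doubling_period_months)

# Hypothetical starting horizon: a 60-minute cyber task.
base = 60.0
for label, period in [("8-month doubling", 8.0), ("4.7-month doubling", 4.7)]:
    horizon = task_horizon(base, 12, period)
    print(f"{label}: after 12 months -> {horizon:.0f} minutes")
```

Under these assumptions, a year of progress at the faster doubling rate yields roughly twice the task horizon that the slower rate would – which is why a shift from 8 to 4.7 months compounds so noticeably.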
Also: Third major Linux kernel flaw found in two weeks – thanks to AI
The authors said it is unclear whether this acceleration will persist or whether it marks a permanent increase in the rate of improvement; Mythos and GPT-5.5 may simply be notable breaks from the overall pattern of development.
Nevertheless, AISI cautioned that its testing left many unknowns. The tests capped each task at 2.5 million tokens, allowing researchers to compare performance results consistently over time. “This naturally limits what frontier models can demonstrate,” the authors wrote.
The blog continued, “Mythos Preview and GPT-5.5 have large upper-bound error bars because of their near-100% success rate on the longest tasks in our narrow cyber suite, even with the 2.5M token limit. Our tasks are not long enough to determine how quickly these models’ reliability deteriorates over longer task horizons. This puts some of the newest models at the measurement limits of our narrow test suite.”
Also: I put GPT-5.5 through a 10-round test: it scored 93/100, losing points only due to excitement
While this makes it harder to measure where the models fail, it also means their success rate on these tasks would be much higher without the token limit – so high, in fact, that “it becomes impossible to calculate the time horizon.” Models with larger token budgets and more complex agent infrastructure would be even more effective.
The blog states, “The 2.5M token limit is relatively low – in our Cyber Range experiments we use up to 100M tokens and find that there is still potential for performance improvement even at that budget, especially for recent models, which disproportionately benefit from higher token limits.”
