AI Model Release Tracker: Opus 4.8's misalignment rates are similar to Cloud Mythos preview

Elyse Betters Picaro/ZDNET

Follow ZDNET: Add us as a favorite source On Google.

AI laboratories are constantly sending new models. However, apart from being better and faster than its predecessors, not every new model is guaranteed to be a major change, no matter how poetic the company’s PR is about them. The strengths of models really emerge in context: where do competing models lack or excel? Which models have excellent specifications and which models are in line with industry standards?

Also: How we test AI at ZDNET

Our model release tracker helps you understand where models stand relative to each other, and whether they’re worth a deeper look. While we don’t test every model or model update on this list, we’ll always cover the key elements you need to know, along with our hands-on expert testing where applicable. We also include an expert score for some models. Do you want to know how we test AI? See this description of our process.

Here are some of the biggest model releases of 2026 so far and what to know about them. We’ll update this list whenever any notable new models arrive.

cloud opus 4.8

Anthropological 28 May 2026

What it does: According to Anthropic, replacing Opus 4.7 starting today (at the same price), Opus 4.8 offers faster thinking modes for a third of the previous version’s cost. Like most of Anthropic’s models, the 4.8 prioritizes coding capabilities, scoring higher than the 4.7 on two coding benchmarks but not entirely besting OpenAI’s GPT 5.5. “This reaches new heights on our measures of pro-social qualities such as supporting user autonomy and acting in the user’s best interest,” the company said in the release, though the definition of what that means remains vague.

Also: Anthropic launches Opus 4.8, with great feature Integrity

why it matters: Anthropic has always prioritized model safety and interpretability, but it appears it is placing further emphasis on that standard with this release. The company said Opus 4.7 had a 92% integrity rate, in addition to being less chattering and hallucination-prone overall. The fact that it claims that 4.8 shows a “substantially” lower rate of misalignment than 4.7 indicates an increasingly high standard for model security, especially because Anthropic compared 4.8’s alignment to Mythos Preview.

GPT-5.5 Instant

OpenAI | 5 May 2026

what does it do: OpenAI said in its announcement OpenAI’s recently released lightweight version of GPT-5.5 is less functional than its predecessor, GPT-5.3 Instant. It also claimed less hallucinations and better factuality, saying “GPT‑5.5 Instant produced 52.5% fewer hallucination claims than GPT‑5.3 Instant on high-stakes cues covering areas such as medicine, law, and finance.”

Also: Anthropic’s mythos is evolving faster than expected, according to AI security agency report

why this cases: GPT-5.5 replaces GPT-5.3 as the default model in Instant ChatGPT. Then, while the expectation is that each new AI model becomes more efficient, easier to use, and creates less content, a significant improvement in the hallucinogenicity of the models used by most people for faster queries could mean less misinformation spreading among the public. This is particularly important given, for example, how many people are using ChatGPT for everyday health-related questions.

(Disclosure: ZDNET’s parent company Ziff Davis filed a lawsuit against OpenAI in April 2025, alleging it infringed Ziff Davis copyrights in the training and operation of its AI systems.)

Nemotron 3 Nano Omni

nvidia | 28 April 2026

What it does: The latest in Nvidia’s open Nemotron family, it provides multimodal input to model agents. This means they can “understand and reason about visual, audio, and textual input within the same shared perception-to-action loop,” according to nvidiaSo that multiple capabilities can be integrated into a single system.

Also: AI is an arms race, and the US wants $9 billion in Nvidia superchips to keep up

why this Matters: Typically, agents’ systems need to use separate models for speech, vision, and text, meaning they have to jump across documents, video, and audio to complete multi-step tasks. This slows down workflow, weakens the collection of reference agents, and increases estimation costs. Nvidia’s approach, if it works, will streamline this process and reduce token usage, saving you money. Try it on Hugging Face.

GPT-5.5

OpenAI | 23 April 2026

Expert Score: 93/100

What it does: ZDNET’s resident tester David Gewirtz gave the GPT-5.5 an A- score technically, but said it “could be described as better and faster than GPT-5.4,” which is the minimum expectation for a new model. In particular, however, the model became better at agentic coding, clearly identifying concepts, scientific research, and factual accuracy.

Also: I put GPT-5.5 through a 10-round test: it scored 93/100, losing points only due to excitement

why it matters: While the model itself may not have progressed much from its immediate predecessor, the quick transition from 5.4 to 5.4 – less than two months – indicates how rapidly Agentic Coding is speeding up OpenAI’s model release cycle. As David Gewirtz broke in, the company, like other leading labs using AI to create AI, is shipping updates at a rapidly increasing rate.

chatgpt images 2

OpenAI | 23 April 2026

What it does: soon after sunset soraOpenAI, its generative video model and social platform, announced the somewhat confusing Images 2. ZDNET model tester David Gewirtz took an early look at Images 2 ahead of release and was impressed. Although he didn’t give this model a formal expert score, he said it’s fun, a big leap forward, and really useful for work.

why it matters: OpenAI looked to be exiting the more consumer-minded AI product game when it spun off Sora after falling behind Anthropic in securing lucrative enterprise contracts. OpenAI still came out with Image 2 within that redirect narrative which indicates that it considers image generators relevant enough for enterprise AI – especially on the heels of Anthropic’s cloud design.

cloud opus 4.7

Anthropological 16 April 2026

What is this? does: coming sooner than expected Following Opus 4.6, this model boasts new heights in honesty, less chatter and hallucinations. It also appears to have cybersecurity capabilities, as it supports the new Cloud Security, which was released shortly after the model was released – but no, it’s not Mythos, as many suspected.

Also: Anthropic’s new cloud security tool scans your codebase for flaws — and helps you decide what to fix first

why it matters: Hallucinations and hallucinations are among the thorniest, hardest-to-solve issues that plague even the best models. For Anthropic to claim such significant gains in those areas is no small feat for an AI lab that takes security seriously.

Cloud Mythos (preview)

Anthropological 7 April 2026

What is this? does:This is hard because the mythos isn’t really available to the public. Anthropic caused quite a media storm when it released a new general-purpose model much more powerful than usual. Although this model is clearly a step change from earlier anthropic models, the company was particularly concerned because of the security threat it posed, stating that “It is surprisingly capable of computer security tasks.”

In response, Anthropic led Project Glasswing, a collaborative effort with several rival AI labs including Google, Nvidia, and Microsoft, as well as security authorities like Palo Alto Networks, “to help secure the world’s most critical software, and prepare the industry for the practices we will all need to adopt to stay ahead of cyberattackers.”

Also: Apple, Google and Microsoft join forces with Anthropic’s Project Glasswing to protect the world’s most critical software

why it matters: If we believe Anthropic’s guidance that Mythos poses a significant threat to the world’s software – so much so that only a select few partners can use it – cybersecurity tools may not be ready to meet the rapidly evolving range of model capabilities. Mythos may not be the only model of its capability, but it is the first model to come after other labs have achieved similar success.

Now, just a few weeks after its release, Mythos is helping catch a large number of software bugs.

GPT-5.4

OpenAI | 5 March 2026

What is this? does: OpenAI released this new model, barely three months after GPT-5.2, specifically designed for professional work. According to the company’s own testing (which should always be taken with a grain of salt until verified by a third party), GPT-5.4 matches or outperforms human professionals in 83% of cases.

why it matters: As AI companies focus more on gaining enterprise trust (and contracts) while appreciating what agentic AI can do, they need models that can handle complex work-related tasks with minimal risk, delay, or extremely high costs. Any model advancement that shows proficiency in professional workflows has a better chance of being taken seriously by companies struggling to adopt AI, although nothing guarantees seamless integration.

Also: OpenAI’s new GPT-5.4 outperforms humans on pro-level tasks in tests – by 83%

cloud opus 4.6

Anthropological February 5, 2026

What is this? does: This model quickly redefined the standard for autonomous agentic work, especially coding. It’s no surprise that Anthropic excels at building models, especially adept at programming tasks. Opus 4.6 also demonstrated overall improvements in complex, long-running tasks.

why it matters: Opus 4.6’s ability to better handle tasks on its own means you can reliably put more of your workflow into it – something that agent offerings typically struggle with.

Also: Anthropic says its new Cloud Opus 4.6 can make your work better on the first try

gpt-5.3-codecs

OpenAI | February 5, 2026

What is this? does: This new coding model – which OpenAI itself helped create and debug – can be interrupted and redirected mid-task, which, if true, is a huge boon for developers using it with trial and error on complex or shifting projects. The GPT-5.3-codec claims a running time of over a day and a better capture of user intent.

Also: OpenAI’s new Spark model code is 15 times faster than GPT-5.3 codecs – but there’s a problem

why it matters: OpenAI is trying to catch up to Anthropic’s lead in agentic coding (and, coincidentally or not, released the 5.3 codec on the same day that Anthropic launched Opus 4.6). While ZDNET experts often prefer Cloud Code over other tools for Vibe coding, OpenAI’s rumored move toward enterprise clients and away from fun consumer tools could finally close that gap.

What's Hot

Putin’s confusion has increased due to the ‘clan’ of the elite class conspiring to murder. world | news

Los Gatos ‘party mom’ Shannon O’Connor gets long prison sentence

Samsung Galaxy Watch 9 codename hints at what’s to come

Samsung Galaxy Watch 9 codename hints at what’s to come

Perplexity launches Bumblebee: how its new read-only dev scanner is different from ChainGuard

This new projector lineup is all about outdoor summer screenings

Google has some smart new ideas for its Contact Wear OS UI

BYD unveils a new compact hybrid car aimed at European markets

Some Android tablets can’t open Chrome after latest update

Christian college campus in Pace gets zoning board approval

Scientists discover a universal temperature curve that governs all life

In praise of hard work

AAUW Amador Branch Complaint and Coveration – Tuesday, March 24 | on the vine

Putin’s confusion has increased due to the ‘clan’ of the elite class conspiring to murder. world | news

Los Gatos ‘party mom’ Shannon O’Connor gets long prison sentence

Samsung Galaxy Watch 9 codename hints at what’s to come

News

CATEGORIES

USEFUL LINK

Subscribe to Updates

What's Hot

AI Model Release Tracker: Opus 4.8’s misalignment rates are similar to Cloud Mythos preview

cloud opus 4.8

GPT-5.5 Instant

Nemotron 3 Nano Omni

GPT-5.5

chatgpt images 2

cloud opus 4.7

Cloud Mythos (preview)

GPT-5.4

cloud opus 4.6

gpt-5.3-codecs

Related Posts

News

CATEGORIES

USEFUL LINK

Subscribe to Updates