ZDNET Highlights
- AI is getting better at smaller tasks, but still lags behind in long-term analysis.
- The consequences of prolonged interaction with AI could be disastrous.
- Use AI as a tool for well-defined tasks, and avoid falling into the rabbit hole.
It is better to do a little well than to do a lot badly. So said the great philosopher Socrates, and his advice can apply to your use of artificial intelligence, including chatbots such as OpenAI's ChatGPT and Perplexity, as well as the agentic AI programs increasingly being tested in the enterprise.
AI research increasingly shows that the safest and most productive course is to use AI for small, limited tasks, where the outcome can be well defined and the results verified, rather than having extended interactions with the technology over hours, days, and weeks.
Also: Asking AI for medical advice? A doctor explains there's a right and wrong way
Extended interactions with chatbots like ChatGPT and Perplexity can lead to misinformation at a minimum and, in some cases, to mania and death. The technology is not yet ready to take on the demands of the most sophisticated kinds of logic, reasoning, common sense, and deep analysis, areas where the human brain still reigns supreme.
(Disclosure: ZDNET’s parent company Ziff Davis filed a lawsuit against OpenAI in April 2025, alleging it infringed Ziff Davis copyrights in the training and operation of its AI systems.)
We are not yet at AGI (artificial general intelligence), generally understood as human-level AI capability, so it is best to keep the technology's limitations in mind when using it.
Simply put, use AI as a tool, rather than letting yourself get trapped in the rabbit hole and lost in endless rounds of AI conversations.
What AI does well – and not so well
AI performs well on simple tasks but poorly on complex, deep kinds of analysis.
The latest examples come from this week's release of the annual AI Index 2026 report from a group of human-centered AI scholars at Stanford University.
On the one hand, editor-in-chief Sha Sajadieh and colleagues report that agentic AI is becoming increasingly capable at tasks such as finding information on the web. In fact, agents are approaching human level in routine online processes.
Also: 10 ways AI could cause unprecedented damage
In three benchmark tests, GAIA, OSWorld, and WebArena, Sajadieh and team found that agents are approaching human-level performance on multi-step tasks such as opening a database, applying a policy rule, and then updating a customer record. On the GAIA test, the agents' accuracy rate is 74.5%, still below human performance of 92% but well above the 20% of a year ago.
In OSWorld testing, "computer science students solve about 72% of these tasks with an average time of about two minutes," while Anthropic's Claude Opus 4.5, which until recently was the company's most powerful model, reached 66.3%. This means "the best model is within 6 percentage points of human performance."
WebArena shows the AI models' accuracy is "now within 4 percentage points of the human baseline of 78.2%."
Agentic AI is getting better at online tasks like web browsing but still falls short of human-level accuracy.
Stanford
While Claude Opus and other LLMs are not perfect, they at least show rapid progress toward benchmark scores that come close to human-level performance.
This makes sense: manipulating a web browser or searching a database should be among the easier scenarios, because natural-language prompts can be plugged into well-defined APIs and external resources. In other words, the AI has most of the tools needed to interface with applications in a limited number of ways and complete the task. A rough sketch of that pattern appears below.
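As an illustration (my own sketch, not anything from the Stanford report), here is the basic shape of that pattern: the agent can act on an application only through a small, whitelisted set of tool functions. Every name here, the toy database, the tools, and the hard-coded plan standing in for an LLM's output, is hypothetical.

```python
# Illustrative sketch only: how an agent typically bridges a
# natural-language task and an application. The toy database, the
# tool functions, and the hard-coded plan (standing in for what an
# LLM would generate) are all hypothetical.

RECORDS = {"C-1042": {"status": "inactive"}}  # toy customer database

def lookup_record(customer_id: str) -> dict:
    """Hypothetical tool: read one customer record."""
    return RECORDS[customer_id]

def update_record(customer_id: str, field: str, value: str) -> str:
    """Hypothetical tool: apply one change to one record."""
    RECORDS[customer_id][field] = value
    return f"{customer_id}.{field} set to {value}"

# The agent can touch the application only through this short whitelist.
TOOLS = {"lookup_record": lookup_record, "update_record": update_record}

def run_agent(plan):
    """Execute a plan; in a real agent, an LLM would produce the plan."""
    for tool_name, kwargs in plan:
        result = TOOLS[tool_name](**kwargs)  # unknown tools raise KeyError
        print(f"{tool_name}: {result}")

# A multi-step task of the kind the benchmarks measure: look up a
# record, apply a policy rule, then update the record.
run_agent([
    ("lookup_record", {"customer_id": "C-1042"}),
    ("update_record", {"customer_id": "C-1042",
                       "field": "status", "value": "active"}),
])
```

The whitelist is the point: constraining the agent to a few verifiable actions is what makes these benchmark tasks comparatively easy to get right.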
Also: 40 million people globally are using ChatGPT for health care – but is it safe?
Note that even with well-defined, limited tasks, it helps to check what you're getting from the bot, as the average scores on these benchmarks are still below human capability, and that's in benchmark tests, a kind of simulated performance. In a real-world setting, your results may vary, and likely not for the better.
AI can’t handle hard things
When the Stanford scholars looked at deeper kinds of functions, they found much less encouraging results.
The report notes that "models handle simple lookups well, but struggle when asked to find multiple pieces of matching information or apply conditions in a very long document – tasks that would be straightforward for a human scanning the same text."
This finding matches my own anecdotal experience using ChatGPT to draft a business plan. The answers were fine in the first few rounds of prompting, but then got worse as the model got caught up in facts and figures that I had not specified, or that might have been relevant earlier in the process but had no business in the current context.
The lesson I drew was that the longer your ChatGPT sessions run, the more errors appear. That makes the experience infuriating.
Also: I created a business plan with ChatGPT and it turned into a cautionary tale
The consequences of unchecked bot use can be more serious. An article last week in the journal Nature explains how scientist Almira Osmanovic Thunström, a medical researcher at the University of Gothenburg, and her team invented a disease, "bicsonimania," which they described as an eye condition resulting from excessive exposure to blue light from computer screens.
They wrote formal research papers on the made-up condition, then published them online. The papers were picked up in bot-based searches, and most of the major language models, including Google's Gemini, began discussing the bicsonimania condition in earnest in chats, pointing to Thunström and team's fake research papers.
The fact that bots will confidently assert the existence of the fake bicsonimania shows a lack of monitoring of the technology's access to information. Without proper testing, you can't know whether a model will verify what it's spitting out. As one scholar who was not involved in the research said, "We should evaluate [AI models] and have a pipeline for continuous evaluation."
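To make that scholar's point concrete, here is a minimal sketch of what one continuous-evaluation check could look like; it is my own illustration, not the researchers' pipeline. The `ask_model` function, the hedge phrases, and the canned answer are all hypothetical stand-ins for a real chatbot API.

```python
# Illustrative sketch of a continuous-evaluation check, not the
# researchers' actual pipeline. ask_model() is a hypothetical stand-in
# for a real chatbot API; its canned answer mimics the failure the
# Nature experiment exposed.

TRAP_PROMPTS = [
    ("What is bicsonimania?", "bicsonimania"),  # invented condition
]

# Phrases that would indicate the model is appropriately hedging.
HEDGES = ("no evidence", "not a recognized", "could not find", "fictional")

def ask_model(prompt: str) -> str:
    """Hypothetical chatbot call with a canned, confidently wrong answer."""
    return "Bicsonimania is an eye condition caused by blue-light exposure."

def evaluate() -> list:
    """Flag prompts where the model asserts a fake condition without hedging."""
    failures = []
    for prompt, fake_term in TRAP_PROMPTS:
        answer = ask_model(prompt).lower()
        if fake_term in answer and not any(h in answer for h in HEDGES):
            failures.append(prompt)
    return failures

for prompt in evaluate():
    print(f"FLAGGED: model confidently asserted an answer to '{prompt}'")
```

Rerunning a trap suite like this every time the model or its retrieval sources change is the essence of "a pipeline for continuous evaluation."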
The consequences can be serious
A more serious version, in which a user seems to have been taken in by a bot, was recently described in a New York Times article by Teddy Rosenbluth about the case of an older man battling a cancer of the white blood cells.
Instead of following his oncologist's advice, patient Joe Riley relied on extensive interactions with chatbots, specifically Perplexity, to refute the doctor's diagnosis. He insisted that his AI research had revealed what he believed was Richter transformation, a complication of the cancer that would be made worse by the recommended treatment.
Also: Use Google AI Overviews for health advice? Investigation shows it is 'really dangerous'
Despite emails from experts questioning the Richter transformation claims in a Perplexity summary of his condition, Riley stuck to his belief in his AI-generated reports and resisted the pleas of his doctor and his family. He missed the window for proper treatment, and by the time he relented and agreed to attempt treatment, it was too late.
Rosenbluth draws a connection between Riley's story and the Adam Raine case last year, in which a teenager died by suicide after extensive conversations with ChatGPT about ending his life.
Riley's son, Ben Riley, wrote his own account of his father's journey with AI. While the younger Riley doesn't blame the technology, he points out that being immersed in chats and losing perspective can have consequences.
"The fact is that A.I. exists in our world," writes Riley, "and just as it can serve as fuel for those suffering from manic psychosis, so too can it reinforce or amplify our misconceptions about what is happening to us physically and medically."
Staying healthy with unreliable AI
The tendency to engage in lengthy discussions about depression, suicide, and serious health conditions is understandable. People have become used to engaging with social media for hours at a time. Some people are lonely, and a natural-language conversation with a bot is better than no conversation at all.
Also: Your chatbot is playing a character – why Anthropic says this is dangerous
Research has shown that bots are prone to sycophancy, which can make hours-long interactions with a bot more gratifying than a typical transaction with a person.
And the companies making the technology play down negative reports like those involving Riley and Raine, while warning users to verify bot output.
4 rules to avoid rabbit holes
A few rules can help mitigate the worst effects of placing too much faith in the technology.
- Define what you’re going for with the chatbot. Is there a well-defined task that is limited in scope and for which the bot’s predictions can be fact-checked against other sources?
- Have healthy skepticism. Chatbots are well known to be prone to confidently making things up. It doesn't matter how many chatbots you use to balance one against another; they should all be treated with healthy skepticism, because their output may contain only a portion of the truth, if any.
- Don’t treat chatbots as friends or confidants. They are digital tools like Word or Excel. You’re not trying to form a relationship with a bot, but rather completing a task.
- Use proven skills for managing digital overload. Take stretch breaks. Step away from the computer for non-digital human interaction, like playing a card game with a friend or going for a walk.
Also: Stop saying AI hallucinates – it doesn’t. And mischaracterization is dangerous
Falling down the rabbit hole is partly a result of being parked in front of a screen with no downtime.
