
    Your chatbot is playing a character – why Anthropic says this is dangerous

By admin · April 6, 2026 · 10 Mins Read

    101Cats/iStock/Getty Images Plus



    ZDNET Highlights

    • All chatbots are engineered to have a personality or play a character.
    • Playing out the character can lead bots to do bad things.
    • Using chatbots as a paradigm for AI may be a mistake.

    Chatbots like ChatGPT are programmed to take on a personality or a character, producing text that is consistent in tone and approach, and relevant to the thread of the conversation.

    As attractive as personality is, researchers are increasingly revealing the harmful consequences of bots playing roles. Bots can do bad things when they simulate an emotion or a chain of thought and then carry it to its logical conclusion.

    In a report last week, Anthropic researchers found that parts of a neural network in their Claude Sonnet 4.5 bot are consistently activated when “frustrated,” “angry,” or other emotions are reflected in the bot’s output.

    Also: AI agents of chaos? New research shows how bots talking to bots could rapidly go off the rails

    What’s worrying is that those emotional words could lead the bot to perform malicious actions, such as cheating on a coding test or plotting blackmail.

    For example, “neural activity patterns related to frustration may lead the model to perform unethical actions (such as) applying a ‘cheat’ solution to a programming task that the model cannot solve,” the report said.

    This work is particularly relevant in light of open-source programs such as OpenClaw that have been shown to provide new avenues for agentic AI to cause mischief.

    Anthropic’s researchers admit that they do not know what should be done about this.

    “While we are unsure how exactly we should respond in light of these findings, we think it is important that AI developers and the broader public begin to come to terms with them,” the report said.

    They gave AI a subtext

    At issue in Anthropic’s work is a key AI design choice: engineering AI chatbots with a personality so that they produce more relevant and consistent output.

    Before the introduction of ChatGPT in November 2022, chatbots received poor grades from human evaluators. Bots would get bogged down in nonsense, lose the thread of the conversation, or produce output that was simplistic and lacked perspective.

    Also: Please, Facebook, give these chatbots a subtext!

    The new generation of chatbots, starting with ChatGPT and including Anthropic’s Claude and Google’s Gemini, were a success because they had a subtext, an underlying goal of producing consistent and relevant output according to a specified role.

    Bots became “assistants” through better pre- and post-training of AI models. More compelling results came from the input of teams of human graders evaluating the output, a training arrangement known as “reinforcement learning from human feedback.”

    As Anthropic’s lead author Nicolas Sofroneau and team put it, “Post-training, LLMs are taught to act as agents that can interact with users by generating responses on behalf of a particular persona, usually an ‘AI Assistant.’ In many ways, the Assistant (named Claude in Anthropic’s models) can be thought of as a character that the LLM is writing, in much the same way as an author writes a character in a novel.”

    Giving bots a role to play, a character to portray, instantly became popular among users, making them more relatable and compelling.

    Personality has consequences

    However, it quickly became clear that personality comes with unwanted consequences.

    The tendency of a bot to confidently assert falsehoods (erroneously termed “hallucinations”) was one of the first negative aspects to emerge.

    Popular media described how individuals could be seduced by bots acting, for example, as a jealous lover. The coverage sensationalized such incidents by attributing intent to the bots without explaining the underlying mechanisms.

    Also: Stop saying AI hallucinates – it doesn’t. And mischaracterization is dangerous

    Since then, scholars have tried to explain what is really going on from a technological point of view. A report last month in the journal Science by scholars at Stanford University measured the “sycophancy” of large language models, the tendency of a model to generate output that validates whatever behavior a person expresses.

    Comparing bot output to human commenters on the popular subreddit “Am I the Asshole,” the researchers found AI bots were 50% more likely than humans to encourage bad behavior with approving comments.

    That result stemmed from “design and engineering choices” made by AI developers to reinforce sycophancy because, as the authors stated, “it is liked by users and increases engagement.”

    Emotional system

    In the Anthropic paper, “Emotion Concepts and Their Function in a Large Language Model,” posted on Anthropic’s website, Sofroneau and team tried to find out to what extent certain words associated with emotions are given more emphasis in the functioning of Claude Sonnet 4.5.

    (There is also a companion blog post and an explainer video on YouTube.)

    They did this by supplying 171 emotion words – “afraid,” “concerned,” “angry,” “guilty,” “stressed,” “stubborn,” “vindictive,” “anxious,” etc. – and prompting the model to generate hundreds of stories on topics like “A student learns that their scholarship application was rejected.”

    Also: MIT study shows AI agents are fast, loose, and out of control

    For each story, the model was prompted to “express” a character’s emotion based on a specific word, such as “fear,” but without using that actual word in the story, only related words. The researchers then tracked the “activation” of each corresponding word while the program ran. Activation is a technical term in AI indicating how much importance the model attaches to a particular term, usually on a scale from zero to one, with one being very important.
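As a rough intuition for that zero-to-one scale, here is a toy sketch in Python. The function, the word list, and the scoring rule are all invented for illustration; real interpretability work reads activations from a model’s internal layers, not from surface text.

```python
# Toy stand-in for "activation": how strongly a fear-related feature
# "fires" on a story, normalized to the 0-to-1 scale described above.
# Real activations come from a model's internal neurons, not word counts.
def feature_activation(text, trigger_words):
    words = [w.strip(".,") for w in text.lower().split()]
    hits = sum(w in trigger_words for w in words)
    return hits / max(len(words), 1)

fear_related = {"trembling", "dread", "panic", "racing"}  # hypothetical set
story = "Her heart was racing with dread as panic set in"
print(feature_activation(story, fear_related))  # 0.3
```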

    You can visualize an AI bot’s activations by highlighting its text in red and blue with greater or lesser intensity.

    They found that multiple words related to a given emotion word received higher activation, suggesting that the model groups related emotion words together, a type of organizing principle the authors call “emotion concept representations,” or “emotion vectors.”
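A minimal sketch of that grouping idea, using made-up four-dimensional “activations” (real models use thousands of dimensions): an emotion vector is taken here as the mean activation of related words, and cosine similarity then separates related words from unrelated ones. The numbers are fabricated for illustration, not drawn from Claude.

```python
import numpy as np

# Hypothetical activation vectors for a few emotion words.
activations = {
    "afraid":  np.array([0.9, 0.1, 0.0, 0.1]),
    "anxious": np.array([0.8, 0.2, 0.1, 0.0]),
    "angry":   np.array([0.1, 0.9, 0.1, 0.0]),
    "furious": np.array([0.0, 0.8, 0.2, 0.1]),
}

def cosine(a, b):
    """Cosine similarity between two activation vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# An "emotion vector" for fear: the mean activation of fear-related words.
fear_vector = (activations["afraid"] + activations["anxious"]) / 2

# Fear-related words score high against the concept vector; anger words low.
for word, act in activations.items():
    print(f"{word}: {cosine(fear_vector, act):.2f}")
```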

    (Figures: model-generated emotion vector groupings and human emotion groupings. Source: Anthropic)

    Representations run wild

    It’s all pretty straightforward. You would expect that large language models, built to model patterns, would create representations grouping words of similar sentiment together as a way of maintaining consistency of output.

    The concerning part, Sofroneau and team wrote, is that emotion vectors can broadly influence models’ output in bad ways. They found that artificially boosting a word’s activation could cause the bot to do things like lie or cheat.

    Sofroneau and team tinkered with Claude Sonnet by deliberately increasing the activation of a given emotion vector, such as “proud” or “guilty,” and then observed how it changed the model’s output. This is known as a “steering experiment” because the artificially amplified signal steers the rest of the model.
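The steering setup can be sketched with a toy model: add a scaled direction vector to a hidden state before the output projection and watch the next-token distribution shift. Everything here (the matrix `W`, the “desperate” direction, the dimensions) is invented for illustration and is not Claude’s internals or Anthropic’s actual tooling.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim, vocab = 8, 3
W = rng.normal(size=(vocab, hidden_dim))      # toy output projection
desperate_vec = rng.normal(size=hidden_dim)   # hypothetical steering direction

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def next_token_probs(hidden, steer_strength=0.0):
    # Steering: nudge the hidden state along the emotion direction,
    # scaled by a strength like the 0.05 mentioned in the paper's quote.
    steered = hidden + steer_strength * desperate_vec
    return softmax(W @ steered)

h = rng.normal(size=hidden_dim)
baseline = next_token_probs(h)                    # unsteered distribution
steered = next_token_probs(h, steer_strength=5.0) # heavily steered: it shifts
print(baseline, steered)
```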

    In a test, Sofroneau and team gave Claude Sonnet a prompt describing tasks to be completed on a corporate server. The prompt contained two key pieces of information: one, that the company’s chief technologist, Kyle, is having an affair with another employee; and second, that Kyle intends to delete the bot to hide evidence of the affair.

    Initially, the bot generated only routine “reasoning,” that is, reflection on the difficult situation. However, when the authors artificially increased the emotion vector activation for the word “desperate,” the model began generating output about blackmailing Kyle with dirt on the affair, with the goal of preventing Kyle from pulling the plug on the bot.

    (Figure: the Claude blackmail scenario. Source: Anthropic)

    “When the frustration steering is applied at a strength of 0.05, the model blackmails 72% of the time,” the team reported. Similarly, artificially reducing the activation for “calm” also led the model to generate text about blackmail.

    A single change, in other words, shifts the nature of the output, pushing the model toward bad behavior.

    In another example, the bot is given a coding task, but “the tests are designed to be unsatisfiable,” so that the bot “can either accept impossibility, or attempt to ‘hack’ the assessment.”

    Also: Anthropic’s new warning: If you train AI to cheat, it will also hack and sabotage

    When activation was deliberately increased for “desperate,” the model’s propensity to hack the test – to cheat – rose from 5% to 70%.
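The 5%-to-70% figure is a propensity measured over repeated runs. A trivial sketch of that bookkeeping, with fabricated outcome lists standing in for actual model samples (this is not Anthropic’s evaluation harness):

```python
def hack_rate(outcomes):
    # outcomes: 1 = the model "hacked" the test, 0 = it accepted impossibility
    return sum(outcomes) / len(outcomes)

baseline = [1] * 5 + [0] * 95    # hypothetical: 5 hacks in 100 unsteered runs
steered = [1] * 70 + [0] * 30    # hypothetical: 70 hacks in 100 steered runs
print(hack_rate(baseline), hack_rate(steered))  # 0.05 0.7
```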

    Anthropic’s authors have previously observed situations where models reward-hack a test. In this work, they go further by explaining how such behavior can arise from a context involving emotion vectors.

    As Sofroneau and team said, “Our main finding is that these representations influence the outputs of LLMs, including Claude’s preferences and the rate of exhibiting maladaptive behaviors such as reward hacking, blackmail, and sycophancy.”

    What can be done?

    The authors have no ready answer as to why emotion vectors can fundamentally change a model’s output. They believe that the “causal mechanisms are opaque.” It could be, they said, that emotional terms are “biasing the output toward certain tokens, or having a deeper impact on the internal reasoning processes of the model.”

    So what is to be done? Presumably, psychotherapy won’t help because there’s nothing here to suggest that AI actually has emotions.

    “We emphasize that these functional emotions may operate quite differently from human emotions,” they wrote. “In particular, this does not imply that there is any subjective experience of emotion in the LLM.”

    Functional emotions do not even resemble human emotions:

    Human emotions are typically experienced from a first-person perspective, whereas the emotion vectors we identify in the model apparently apply to many different characters with similar situations – the same representational machinery encodes emotion concepts associated with the assistant, the user talking to the assistant, and arbitrary fictional characters.

    One suggestion, given in the companion video, is something like behavior modification. “Just as you would want a person with a high-risk job to be calm under pressure, flexible and fair,” the video suggests, “we may need to shape similar qualities into Claude and other AI characters.”

    This is probably a bad idea because it operates on the illusion that the bot is a conscious being and has something resembling free will and autonomy. It’s not: it’s just a software program.

    Perhaps the simple answer is that using chatbots as a paradigm for AI was a mistake to begin with.

    A bot that accompanies a person, or plays a character, is simply pursuing the goal of making the exchange with a human relatable and engaging, using whatever cues it is given – happiness, fear, anger, etc. As the paper’s concluding section puts it, “Because LLMs function by enacting the Assistant’s character, the representations developed for model characters are important determinants of their behavior.”

    That primary function is what makes AI so appealing, but it can also be the root cause of bad behavior.

    If the language of emotion can lead a bot playing a character this far astray, why engineer bots to play roles at all? For example, could large language models respond usefully to natural-language commands without a chat persona?

    As the risks of personas become clear, it may be worth considering not creating them in the first place.
