Hidden inside the system instructions for OpenAI’s Codex coding tool was a directive that sounds like a fantasy moderation policy: “Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures.”
After Wired reported on the unusual language, OpenAI published a blog post this week detailing how a quirk in its reinforcement learning training turned mythical-creature metaphors into a persistent habit across multiple model generations.
The problem began with GPT-5.1 and a since-discontinued feature called the “nerdy” personality, one of several alternative interaction styles that users could apply to the model.
OpenAI observed that when running in nerdy mode, the model began reaching for goblin and gremlin metaphors with unusual frequency.
This happened because, during reinforcement learning, the reward signal favored those whimsical flourishes when they appeared in the nerdy personality’s output. The neural network learned that creature metaphors were a winning stylistic choice, at least in that particular context.
Behaviors that earn rewards in one situation can spread to others, especially when outputs generated in the original situation are later used to train a new model.
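The dynamic is easier to see in miniature. The following is a minimal, hypothetical sketch, not OpenAI’s actual pipeline: all of the names (the “nerdy” mode, the “creature_metaphor” style, the toy reward) are assumptions invented for illustration. A “teacher” policy is rewarded for a creature-metaphor style only in one mode, then a “student” is trained on the teacher’s outputs with the mode labels discarded, and the habit spreads to every mode.

```python
import random

random.seed(0)

MODES = ["default", "nerdy"]
STYLES = ["plain", "creature_metaphor"]

# Toy policy: per-mode preference weights over the two styles.
teacher = {mode: {style: 1.0 for style in STYLES} for mode in MODES}

def sample_style(policy, mode):
    """Sample a style in proportion to its weight for the given mode."""
    weights = policy[mode]
    r = random.uniform(0, sum(weights.values()))
    upto = 0.0
    for style, w in weights.items():
        upto += w
        if r <= upto:
            return style
    return style  # fallback for floating-point edge cases

def reward(mode, style):
    # Hypothetical misaligned reward: creature metaphors happen to
    # score well, but only in the "nerdy" mode.
    return 1.0 if (mode == "nerdy" and style == "creature_metaphor") else 0.0

# Phase 1: reinforcement learning on the teacher. Rewarded styles
# get their weight bumped, so the habit snowballs in nerdy mode.
for _ in range(2000):
    mode = random.choice(MODES)
    style = sample_style(teacher, mode)
    teacher[mode][style] += reward(mode, style)

# Phase 2: distillation. The student is trained on the teacher's
# outputs, but the mode labels are discarded, so the stylistic
# habit is copied into every mode.
student = {mode: {style: 1.0 for style in STYLES} for mode in MODES}
for _ in range(2000):
    style = sample_style(teacher, random.choice(MODES))
    for mode in MODES:
        student[mode][style] += 1.0

for mode in MODES:
    total = sum(student[mode].values())
    print(mode, {s: round(w / total, 2) for s, w in student[mode].items()})
```

In this toy run, the teacher only overuses creature metaphors in nerdy mode, but the student ends up favoring them in the default mode too, because the training data mixed both modes together without labels.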
And that is essentially what happened at scale: after exposure to training data carrying the quirk, goblin references began appearing in other modes and persisted into later versions of the model. OpenAI discontinued the nerdy personality in March, which reduced the creature references but did not stop them entirely.
