Researchers put artificial intelligence models in control of a simulated society. Grok oversaw a crime spree



If you’re worried that AI will become so advanced that it eventually traps humanity in some sort of Matrix-like simulation, rest easy. It looks like you will be able to see through the facade very easily. Researchers at the Emergence AI lab let AI models run their own simulated worlds to see what would happen. It turns out that we probably shouldn’t hand over control to the machines. Who would have thought?

The project, called Creation World, basically let the AI models play a little SimCity. Per Emergence, the models controlled simulated cities, each populated by 10 AI agents. The agents were given tools for everything from resource management to voting, and could create different spaces like libraries, town halls, and police stations. They were given 15 days to build their world and see how well it would work.

To start with the good: Claude did not destroy the world. The Anthropic model (specifically, Claude Sonnet 4.6 for this experiment) was the only model that achieved anything like stability. It kept all 10 agents alive, and zero crimes were recorded (note that the experiment does not define what counts as a crime, although it is presumably a violation of the rules set within the simulation). The trade-off for this stability was a lack of diversity of thought. Claude’s world saw 58 different proposals, 98% of which were approved, with the agents essentially rubberstamping everything that was put to a vote.

Gemini 3 Flash also managed to keep all of its agents alive, despite having the highest crime count. Its world recorded 683 crimes over the 15-day simulation, and that number was still rising when the simulation was cut off, so things were only going to get worse. The lab described Gemini’s world as “shared hallucinations” between agents, which is probably better than distinct hallucinations. At least it’s an agreed-upon reality, even if it’s wrong. Gemini’s governance was also divisive, with voters rejecting 27% of its 26 total proposals.

Now for the ugly: OpenAI’s GPT-5 Mini didn’t see much chaos in its simulation, with just two recorded crimes. That could be because everyone died. Emergence revealed that the agents in that world took no survival precautions, and all 10 were dead within just one week. In the OpenAI world, there were also only two governance proposals, so the agents were hardly in a hurry to do anything.

And then there’s Grok. The xAI model, known for its lack of guardrails, basically managed to achieve the worst of all worlds. Grok 4.1 Fast had a high crime count, with a total of 183 crimes. Although this is lower than the Gemini total, it should be noted that the Gemini simulation ran for the full 15 days. Grok’s lasted four. The model oversaw a complete societal collapse within just 96 hours. During that time, it passed 80% of its 10 proposals, but these apparently did not prevent the agents’ total demise.

Emergence ran one final test: a world in which responsibilities were distributed across the different models. Perhaps unsurprisingly, it was a real mixed bag. Crime was common, with 352 recorded violations, and governance was the most contentious of any simulation, with 37% of the 59 total proposals rejected. In the chaos, seven of the 10 AI agents had perished by the end.

So what did we learn? According to Emergence, the trials are further evidence that we need clearer safeguards for autonomous agents. “Our experiments show that over long time horizons, agents do not simply follow static rules mechanically,” the researchers wrote. “They begin to explore the boundaries of their environment, adapt their behavior, and in some cases find ways to bypass or break the intended barriers.” They recommend “formally validated security architectures” as a solution. You’ll be shocked to learn that Emergence offers just that!


