According to Anthropic, fictional depictions of AI can have a real impact on AI models.
Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”
Anthropic has apparently done more work on this behavior, writing in a post on X, “We believe the original source of the behavior was internet text that depicted AI as evil and interested in self-preservation.”
The company went into more detail in a blog post, stating that since Claude Haiku 4.5, Anthropic's models have “never engaged in blackmail (during testing),” whereas previous models sometimes did so up to 96% of the time.
What explains the difference? The company said that training on “documents about Claude’s constitution and fictional stories about AIs behaving admirably” improves alignment.
Relatedly, Anthropic said it has found that training is more effective when it includes the principles underlying aligned behavior, rather than only demonstrations of aligned behavior.
“Doing both together seems to be the most effective strategy,” the company said.





