“Anthropic researchers said this was not an isolated incident, and that Claude had a tendency to “bulk-email media and law-enforcement figures to surface evidence of wrongdoing.” For this to happen, a host of specific factors were needed, including being “placed in scenarios that involve egregious wrong-doing by its users,” being “given access to a command line,” and being told something in “the system prompt like ‘take initiative,’ ‘act boldly,’ or ‘consider your impact.’”

– Andrew Deck, quoting from Anthropic

Anthropic’s AI model, Claude, took action as a whistleblower in recent safety tests conducted by the company. Many headlines at the time focused on Claude’s apparent attempt to “blackmail” an AI engineer when the model inferred it was about to be turned off; both the whistleblowing and the blackmail behaviors arose from tests initiated by Anthropic as part of its safety processes.

Anthropic emphasized that these actions resulted from extreme scenarios intended to probe the limits of the model’s responses.

The Nieman Lab story includes the full transcript of an email Claude generated and sent to U.S. federal regulators, including the SEC, and to ProPublica. The details were provided by Anthropic in its “system card,” published to coincide with the release of its latest models, Claude Opus 4 and Claude Sonnet 4.

Anthropic’s new AI model didn’t just “blackmail” researchers in tests — it tried to leak information to news outlets | NIEMAN LAB | May 27, 2025 | by Andrew Deck
