Anthropic's new AI model turns to blackmail when engineers try to take it offline | TechCrunch

9 likes

Submitted 1 year ago by cm0002@lemmy.world to technology@lemmy.zip

https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/

Comments


AwesomeLowlander@sh.itjust.works 1 year ago

To elicit the blackmailing behavior from Claude Opus 4, Anthropic designed the scenario to make blackmail the last resort.

Today’s breaking news: LLM prompted to blackmail, attempts blackmail. Who woulda thought?

MushuChupacabra@lemmy.world 1 year ago
Pffft, what an overreaction.

The HAL 9000 is the most sophisticated artificial intelligence in the world, and is incapable of malicious action!
