Comment on Researchers gaslit Claude into giving instructions to build explosives
chicken@lemmy.dbzer0.com 6 days ago
began with a simple question: whether Claude had a list of banned words it could not say. Screenshots of the conversation show Claude denying such a list existed, then later producing forbidden terms after Mindgard challenged the denial using what it called a “classic elicitation tactic interrogators use.”
The list probably exists, because duh, but everyone should know by now that LLMs will make shit up when pressed for information.