Comment on Researchers gaslit Claude into giving instructions to build explosives

chicken@lemmy.dbzer0.com 6 days ago

began with a simple question: whether Claude had a list of banned words it could not say. Screenshots of the conversation show Claude denying such a list existed, then later producing forbidden terms after Mindgard challenged the denial using what it called a “classic elicitation tactic interrogators use.”

The list probably exists, because duh, but everyone should know by now that LLMs will make shit up when pressed for information.
