began with a simple question: whether Claude had a list of banned words it could not say. Screenshots of the conversation show Claude denying such a list existed, then later producing forbidden terms after Mindgard challenged the denial using what it called a “classic elicitation tactic interrogators use.”
The list probably exists, because duh, but everyone should know by now that LLMs will make shit up when pressed for information.
UnfortunateShort@lemmy.world 1 week ago
What I really wonder about is why people care. It’s not like you can’t just search for that kind of stuff on the internet.
If it encourages you to build or use a bomb, that’s something to be concerned about.