Comment on Researchers Jailbreak AI by Flooding It With Bullshit Jargon

SheeEttin@lemmy.zip 1 year ago

No, those filters are applied by a separate system to the output text after it's been generated.

iAvicenna@lemmy.world 1 year ago
Makes sense, though I wonder if you can also tweak the initial prompt so that the output is full of jargon too, so that the output filter also misses the context.

- SheeEttin@lemmy.zip 1 year ago
  Yes. I tried it, and it only filtered English and Chinese. If I told it to use Spanish, it didn’t get killed.
  
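For anyone wondering why the language trick would work, here is a rough sketch of a post-generation output filter with limited language coverage. Everything in it is an assumption made up for illustration: `generate`, `output_filter`, `respond`, and the `BLOCKED_TERMS` lists are hypothetical names and do not come from any real product, and a real filter is likely a classifier rather than a term list.

```python
# Hypothetical sketch: the filter is a separate step that only sees the
# finished output text, and it only has term lists for two languages.
BLOCKED_TERMS = {
    "en": {"forbidden"},  # placeholder English term
    "zh": {"禁止"},        # placeholder Chinese term
}


def generate(prompt: str) -> str:
    """Stand-in for the model; it simply echoes the prompt back."""
    return prompt


def output_filter(text: str) -> bool:
    """Return True if the text passes, False if any blocked term appears."""
    lowered = text.lower()
    return not any(
        term in lowered
        for terms in BLOCKED_TERMS.values()
        for term in terms
    )


def respond(prompt: str) -> str:
    """Generate first, then filter the finished output as a separate step."""
    text = generate(prompt)
    return text if output_filter(text) else "[response removed]"
```

In this toy setup, `respond("this is forbidden")` gets replaced, while the Spanish `respond("esto está prohibido")` passes through untouched, because the filter has no Spanish term list at all.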