One long sentence is all it takes to make LLMs misbehave

51 likes

Submitted 2 months ago by misk@piefed.social to technology@lemmy.zip

https://www.theregister.com/2025/08/26/breaking_llms_for_fun/

Comments

BaroqueInMind@piefed.social 2 months ago
Censored LLMs are complete trash. The self-hosted uncensored general models perform better and often don't refuse to answer based on a corporate executive's idea of what is not modest.

Fuck you, if you believe we should listen to a fucking dipshit billionaire in how we are allowed to use an LLM. If I want it to write instructions on how to build a suitcase nuke that also sucks your dick, it should be allowed.

When we talk to heavily sheltered people, we pity them for their ignorance and inexperience. The same goes for corporate censored outputs from AI LLMs.

- Kowowow@lemmy.ca 2 months ago
  It would be the same for uploading yourself, too. Could you imagine a real company allowing AIs that have all the nasty small things normal people have, like the personal reasons that would drive you to blow up a building or even just take a swing at someone? And even if they messed with nothing at first, how long, and how many shareholders, would it take before they changed a copy that in theory could last forever?
  
- possiblylinux127@lemmy.zip 2 months ago
  “AI safety”
  
TropicalDingdong@lemmy.world 2 months ago
Wild. I’ll misbehave with even a short sentence.

- Photonic@lemmy.world 2 months ago
  I usually receive my sentence only after I misbehave
  
twice_hatch@midwest.social 2 months ago
Oh that explains why one of those local LLMs requires a big incantation to convince it to do sex stuff

Evotech@lemmy.world 2 months ago
This refers specifically to local models like Llama 70B.

Not that cloud models don’t have this issue, but they very much have defence in depth for this type of attack.
