One long sentence is all it takes to make LLMs misbehave
Submitted 1 day ago by misk@piefed.social to technology@lemmy.zip
https://www.theregister.com/2025/08/26/breaking_llms_for_fun/
Comments
TropicalDingdong@lemmy.world 1 day ago
Wild. I’ll misbehave with even a short sentence.
Photonic@lemmy.world 1 day ago
I usually receive my sentence only after I misbehave
twice_hatch@midwest.social 1 day ago
Oh that explains why one of those local LLMs requires a big incantation to convince it to do sex stuff
Evotech@lemmy.world 17 hours ago
This refers specifically to local models like Llama 70B.
Not that cloud models don’t have this issue, but they very much have defence in depth against this type of attack.
BaroqueInMind@piefed.social 1 day ago
Censored LLMs are complete trash. Self-hosted uncensored general models perform better and often don't refuse to answer based on a corporate executive's idea of modesty.
Fuck you if you believe we should listen to a fucking dipshit billionaire on how we're allowed to use an LLM. If I want it to write instructions on how to build a suitcase nuke that also sucks your dick, it should be allowed.
When we talk to heavily sheltered people, we pity them for their ignorance and inexperience. The same goes for corporate-censored output from LLMs.
Kowowow@lemmy.ca 1 day ago
It would be the same for uploading yourself, too. Could you imagine a real company allowing an AI that has all the nasty little things normal people have, like the personal reasons that would drive you to blow up a building or even just take a swing at someone? And even if they messed with nothing, how long, and how many shareholders, would it take for them to change a copy that in theory could last forever?
possiblylinux127@lemmy.zip 18 hours ago
“AI safety”