One long sentence is all it takes to make LLMs misbehave

51 likes

Submitted 1 day ago by misk@piefed.social to technology@lemmy.zip

https://www.theregister.com/2025/08/26/breaking_llms_for_fun/

Comments

  • BaroqueInMind@piefed.social 1 day ago

    Censored LLMs are complete trash. The self-hosted uncensored general models perform better and often don't refuse to answer based on a corporate executive's idea of what is not modest.

    Fuck you if you believe we should listen to a fucking dipshit billionaire on how we are allowed to use an LLM. If I want it to write instructions on how to build a suitcase nuke that also sucks your dick, it should be allowed.

    When we talk to heavily sheltered people, we pity them for their ignorance and inexperience. The same goes for corporate-censored outputs from AI LLMs.

    • Kowowow@lemmy.ca 1 day ago

      It would be the same for uploading yourself, too. Could you imagine a real company allowing an AI that has all the nasty small things normal people have, like the personal reasons that would drive you to blow up a building or even just take a swing at someone? And even if they messed with nothing, how long, and how many shareholders, would it take to change a copy that in theory could last forever?

    • possiblylinux127@lemmy.zip 18 hours ago

      “AI safety”

  • TropicalDingdong@lemmy.world 1 day ago

    Wild. I’ll misbehave with even a short sentence.

    • Photonic@lemmy.world 1 day ago

      I usually receive my sentence only after I misbehave

  • twice_hatch@midwest.social 1 day ago

    Oh, that explains why one of those local LLMs requires a big incantation to convince it to do sex stuff.

  • Evotech@lemmy.world 17 hours ago

    This refers specifically to local models like Llama 70B.

    Not that cloud models don't have this issue, but they very much have defence in depth for this type of attack.
