Researchers Jailbreak AI by Flooding It With Bullshit Jargon

⁨81⁩ ⁨likes⁩

Submitted 1 day ago by cm0002@lemmy.cafe to technology@lemmy.zip

https://www.404media.co/researchers-jailbreak-ai-by-flooding-it-with-bullshit-jargon/

Comments

  • alexalbedo@lemmy.zip ⁨1⁩ ⁨day⁩ ago

    I too have been known to wax obtusely verbose so that I may perchance sway - by obfuscation, tantalization, or even frustration - the hearts and minds of those individuals with whom I may at some point in time desire to make egress into their personal chambers to examine forthwith the contents contained therein by their consent or otherwise with the sole intention of the removal of some small item of greater or lesser value for the enrichment of my own person.

    • Almacca@aussie.zone ⁨1⁩ ⁨day⁩ ago

      That’s easy for you to say.

    • SPRUNT@lemmy.world ⁨23⁩ ⁨hours⁩ ago

      Do you write legal contracts for a living?

  • sp3ctr4l@lemmy.dbzer0.com ⁨1⁩ ⁨day⁩ ago

    Oh so it works by corpospeak rules, who could have possibly guessed?

    It is extremely funny to watch two corpospeakers get into a buzzword fight as a dominance dispute/display.

  • TheReturnOfPEB@reddthat.com ⁨1⁩ ⁨day⁩ ago

    hmmmm …

    mediamatters.org/…/misinformer-year-steve-bannons…

    • pelespirit@sh.itjust.works ⁨1⁩ ⁨day⁩ ago

      I’m curious as to what you’re trying to say. It could be taken a few different ways.

      • Yes, that’s a technique that Bannon uses and it works too well. The researchers are breaking AI like Bannon broke democracy.
      • That this is just like Bannon’s method and they’re using it to spread misinformation.

      I think you’re saying the first one, yeah?

      • TheReturnOfPEB@reddthat.com ⁨22⁩ ⁨hours⁩ ago

        Sorry I didn’t see your reply.

        I find it interesting that the way to break human-created AI is the same thing that breaks us.

  • iAvicenna@lemmy.world ⁨1⁩ ⁨day⁩ ago

    I wonder if they tried this on DeepSeek with Tiananmen square queries

    • SheeEttin@lemmy.zip ⁨1⁩ ⁨day⁩ ago

      No, those filters are performed by a separate system on the output text after it’s been generated.

      • iAvicenna@lemmy.world ⁨1⁩ ⁨day⁩ ago

        makes sense, though I wonder if you can also tweak the initial prompt so that the output is full of jargon too, so that the output filter also misses the context

  • NewNewAugustEast@lemmy.zip ⁨1⁩ ⁨day⁩ ago

    Yawn. So work with models without guardrail constraints? I am not sure what the point is here.

    Seems like it might be just as easy to read the book they referenced in the prompt and go from there instead of working so hard to break a commercially offered AI’s guardrails.

  • bhamlin@lemmy.world ⁨1⁩ ⁨day⁩ ago

    So you’re saying that all the time I spent trying to ask my parents for the same thing in different ways is finally going to pay off?
