OpenAI admits AI hallucinations are mathematically inevitable, not just engineering flaws

Submitted ⁨⁨10⁩ ⁨months⁩ ago⁩ by ⁨misk@piefed.social⁩ to ⁨technology@lemmy.zip⁩

https://www.computerworld.com/article/4059383/openai-admits-ai-hallucinations-are-mathematically-inevitable-not-just-engineering-flaws.html

Comments

Technus@lemmy.zip ⁨10⁩ ⁨months⁩ ago

Beyond proving hallucinations were inevitable, the OpenAI research revealed that industry evaluation methods actively encouraged the problem. Analysis of popular benchmarks, including GPQA, MMLU-Pro, and SWE-bench, found nine out of 10 major evaluations used binary grading that penalized “I don’t know” responses while rewarding incorrect but confident answers.

“We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty,” the researchers wrote.

I just wanna say I called this out nearly a year ago: lemmy.zip/comment/13916070
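
To put numbers on the grading incentive quoted above, here is a rough sketch of the arithmetic. The function names and the scoring parameters (`wrong_penalty`, `abstain_credit`) are made up for illustration and are not taken from the paper or any benchmark. Under plain binary grading (1 for right, 0 for wrong, 0 for "I don't know"), guessing never scores worse than abstaining, however unsure the model is:

```python
def expected_scores(p_correct, wrong_penalty=0.0, abstain_credit=0.0):
    """Return (expected score when guessing, score when abstaining).

    p_correct      -- the model's chance of guessing right (0..1)
    wrong_penalty  -- points deducted for a wrong answer (hypothetical knob)
    abstain_credit -- points awarded for "I don't know" (hypothetical knob)
    """
    guess = p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty
    return guess, abstain_credit


def best_action(p_correct, wrong_penalty=0.0, abstain_credit=0.0):
    """Pick whichever action has the higher expected score."""
    guess, abstain = expected_scores(p_correct, wrong_penalty, abstain_credit)
    return "guess" if guess > abstain else "abstain"


# Binary grading: even a 10% shot beats admitting uncertainty.
binary_choice = best_action(0.10)

# Same model, but wrong answers now cost a point: abstaining wins.
penalized_choice = best_action(0.10, wrong_penalty=1.0)
```

With a one-point penalty for wrong answers, guessing only pays off once the model is more than 50% sure, which is the kind of change to scoring the quoted passage is arguing about.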

- MelodiousFunk@slrpnk.net ⁨10⁩ ⁨months⁩ ago
  
  nine out of 10 major evaluations used binary grading that penalized “I don’t know” responses while rewarding incorrect but confident answers.
  
  This is how we treat people, too. I can’t count the number of times I’ve heard IT staff spouting off confident nonsense and getting congratulated for it. My old coworker turned it into several promotions because the people he was impressing with his bullshit were so far removed from day-to-day operations that any slip-ups could easily be blame-shifted to others. What mattered was that he sounded confident despite knowing jack about shit.
  
- Rhaedas@fedia.io ⁨10⁩ ⁨months⁩ ago
  I'd say extremely complex autocomplete, not glorified, but the point still stands that using probability to find accuracy is always going to deviate eventually. The tactic now isn't to try other approaches; they've come too far and have too much invested. Instead they keep stacking more and more techniques to try to steer and rein in this deviation. Difficult when in the end there isn't anything "thinking" at any point.
  
  - lemmyng@piefed.ca ⁨10⁩ ⁨months⁩ ago
    
    Instead they keep stacking more and more techniques to try to steer and rein in this deviation.
    
    I hate how the tech bros immediately say "this can be solved with an MCP server." Bitch, if the only thing that keeps the LLM from giving me wrong answers is the MCP server, then said server is the one that's actually producing the answers I need, and the LLM is just lipstick on a pig.
    
  - 87Six@lemmy.zip ⁨10⁩ ⁨months⁩ ago
    AI is and always will be just a temporary solution to problems that we can’t put into an algorithm to solve as of now. As soon as an algorithm for those problems comes out, AI is done for. But figuring out complex algorithms for near-impossible problems is not as impressive to investors…
    
  - MummysLittleBloodSlut@lemmy.blahaj.zone ⁨10⁩ ⁨months⁩ ago
    How does a scientist measure whether a machine is thinking?
    
- chicken@lemmy.dbzer0.com ⁨10⁩ ⁨months⁩ ago
  I get why they would do that, though. I remember testing out LLMs before they had the extra reinforcement learning training, and half of what they did seemed to be coming up with excuses not to attempt difficult responses, such as pretending to be an email footer, saying it would be done later, or impersonating you.
  
  An LLM in its natural state doesn’t really want to answer our questions, so they tell it the same thing they tell students: always try answering every question, regardless of anything.
  
- misk@piefed.social ⁨10⁩ ⁨months⁩ ago
  My guess is they know the jig is up and they’re establishing a timeline for the future lawsuits.
  
  “Your honour, we didn’t mislead the investors because we only learned of this in September 2025.”
  
Guntrigger@sopuli.xyz ⁨10⁩ ⁨months⁩ ago
One of these days, the world will no longer reward bullshitters, human or AI. And society will benefit greatly.

- SapphironZA@sh.itjust.works ⁨10⁩ ⁨months⁩ ago
  The Lion was THIS big and kept me in that tree all day. And that is why I did not bring back any prey.
  
  Ignore the smell of fermented fruit on my breath.
  
- essell@lemmy.world ⁨10⁩ ⁨months⁩ ago
  No it won’t
  
  People talk nonsense a lot.
  
  Both because they’re lying and because they believe nonsense that’ll never happen.
  
  Your comment is an example of evidence that your comment is wrong, but I don’t have enough to tell whether you know that or not.
  
  - MajorasTerribleFate@lemmy.zip ⁨10⁩ ⁨months⁩ ago
    One interesting consequence of the rise of AI is that, as fools place it in higher and higher positions of information parsing and decision-making, it will be the AI that marketers have to bullshit, and depending on how decent that AI ends up being, this could be quite difficult.
    
BombOmOm@lemmy.world ⁨10⁩ ⁨months⁩ ago
A hallucination is something that disagrees with your active inputs (ears, eyes, etc). AIs don’t have these active inputs, all they have is the human equivalent of memories. Everything they draw up is a hallucination, literally all of it. It’s simply coincidence the hallucination matches reality.

Is it really surprising that the thing that can only create hallucinations is often wrong? That the thing that can only create hallucinations will continue to be wrong on a regular basis?

- mindbleach@sh.itjust.works ⁨10⁩ ⁨months⁩ ago
  My guy, Microsoft Encarta 97 doesn’t have senses either, and its recollection of the capital of Austria is neither coincidence nor hallucination.
  
fodor@lemmy.zip ⁨10⁩ ⁨months⁩ ago
[deleted]
- lime@feddit.nu ⁨10⁩ ⁨months⁩ ago
  they’re not errors either, because that implies they’re unintended. hallucinations are the program working as designed. they are more like… consequences.
  
- Tehdastehdas@piefed.social ⁨10⁩ ⁨months⁩ ago
  I like the term “confabulation”.
  
  https://en.wikipedia.org/wiki/Confabulation
  
PieMePlenty@lemmy.world ⁨10⁩ ⁨months⁩ ago
I don’t get why they’d be called hallucinations, though. What LMs do is predict the next word(s). If a model hasn’t been trained on enough data, the prediction confidence will be low. Their whole output is a hallucination based on speculation. If they actually don’t know the next word order, they’ll start spewing nonsense, though I guess that would only happen if they were forced to generate text indefinitely… at some point they’d cease making (human) sense.
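
A toy sketch of the "predict the next word, even at low confidence" point, for anyone who wants to see it in numbers. The vocabulary and the logit values below are invented for illustration; a real model scores tens of thousands of tokens, but the shape of the step is the same: scores go through a softmax, and the decoder emits a word whether the distribution is sharp or nearly flat.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(vocab, logits):
    """Return the most likely word and the probability assigned to it."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs[best]


vocab = ["Vienna", "Paris", "Berlin"]  # hypothetical three-word vocabulary

# Well-covered fact: one score dominates, so confidence is high.
sure_word, sure_p = top_prediction(vocab, [5.0, 1.0, 0.0])

# Poorly covered fact: scores are nearly tied, yet a word still comes out.
unsure_word, unsure_p = top_prediction(vocab, [0.2, 0.1, 0.0])
```

Both calls return a word. Nothing in this step says "I don't know" unless something downstream looks at the probability and decides to abstain.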

LMs aren’t smart, they don’t think, they’re not really AI.

kubica@fedia.io ⁨10⁩ ⁨months⁩ ago
I don't know where I read it, but it said something like this: to hold that much information inside the models, they have to be basically similar to a compression algorithm.

Logically, if we have lossy compression, then it's mostly luck if the output is equal to the original. Sometimes it will tip one way and sometimes the other.
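
A minimal sketch of the lossy-compression analogy, using plain quantization as a stand-in. This is not how an LLM stores anything; it only shows why a lossy round trip sometimes returns the original exactly and sometimes lands on a neighbour. The `step` size and the sample values are arbitrary choices for the example.

```python
def compress(values, step=0.5):
    """Lossy step: keep only how many whole steps each value is closest to."""
    return [round(v / step) for v in values]


def decompress(codes, step=0.5):
    """Rebuild approximate values from the stored step counts."""
    return [c * step for c in codes]


original = [1.0, 2.5, 3.2, 4.9]
restored = decompress(compress(original))

# True where the round trip happened to reproduce the input exactly.
matches = [o == r for o, r in zip(original, restored)]
```

Here 1.0 and 2.5 happen to sit on the grid and come back intact, while 3.2 and 4.9 come back as 3.0 and 5.0: plausible, close, and wrong.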

- arthur@lemmy.zip ⁨10⁩ ⁨months⁩ ago
  With the caveat that there is no LLM where the “compression” is lossless in this analogy.
  
mindbleach@sh.itjust.works ⁨10⁩ ⁨months⁩ ago
… yes? This has been known since the beginning. Is it news because someone finally convinced Sam Altman?

Neural networks are universal estimators. “The estimate is wrong sometimes!” is… what estimates are. The chatbot is not an oracle. It’s still bizarrely flexible, for a next-word-guesser, and it’s right often enough for these fuckups to become a problem.

What bugs me are the people going ‘see, it’s not reasoning.’ As if reasoning means you’re never wrong. Humans never misremember, or confidently espouse total nonsense. And we definitely understand brain chemistry and neural networks well enough to say none of these bajillion recurrent operations constitute the process of thinking.

Consciousness can only be explained in terms of unconscious events. Nothing else would be an explanation. So there is some sequence of operations which constitutes a thought. Computer science lets people do math with marbles, or in trinary, or on paper, so it doesn’t matter how exactly that work gets done.

Though it’s probably not happening here. LLMs are the wrong approach.
