by andix on 3/1/2025, 7:43:29 PM
by dominicq on 3/1/2025, 7:14:04 PM
ChatGPT used to assure me that you can use JS dot notation to access elements in a Python dict. It also invented Redocly CLI flags that don't exist. Claude sometimes invents OpenAPI specification rules. Any time I ask about anything remotely niche, LLMs are often wrong.
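A minimal sketch of that first one (plain Python, nothing project-specific): dot notation only works for attributes, so it fails immediately on a regular dict:

    d = {"name": "widget"}
    print(d["name"])  # "widget" -- bracket notation is the actual syntax
    print(d.name)     # AttributeError: 'dict' object has no attribute 'name'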
by latexr on 3/1/2025, 8:00:39 PM
> Conclusion
> LLMs are really smart most of the time.
No, the conclusion is they’re never “smart”. All they do is regurgitate text which resembles a continuation of what came before, and sometimes—but with zero guarantees—that text aligns with reality.
by simonw on 3/2/2025, 6:26:34 AM
Every time this topic comes up I post a similar comment about how hallucinations in code really don't matter because they reveal themselves the second you try to run that code.
I've just written up a longer form of that comment: "Hallucinations in code are the least dangerous form of LLM mistakes" - https://simonwillison.net/2025/Mar/2/hallucinations-in-code/
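A toy illustration (the json.parse call here is a hypothetical hallucination borrowed from JavaScript; the real Python API is json.loads):

    import json

    # A made-up method fails loudly the moment the code runs:
    data = json.parse('{"a": 1}')  # AttributeError: module 'json' has no attribute 'parse'

    # The working call:
    data = json.loads('{"a": 1}')
    print(data["a"])  # 1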
by Chance-Device on 3/1/2025, 7:11:02 PM
It’s not really hallucinating though, is it? It’s repeating a pattern in its training data, which is wrong but is presented in that training data (and by the author of this piece, but unintentionally) as being the solution to the problem. So this has more in common with an attack than a hallucination on the LLM’s part.
by adamgordonbell on 3/1/2025, 7:53:06 PM
We at pulumi started treating some hallucinations like this as feature requests.
Sometimes an llm will hallucinate a flag or option that really makes sense - it just doesn't actually exist.
by joelthelion on 3/1/2025, 9:21:13 PM
Hallucinations like this could be a great way to identify missing features or confusing parts of your framework. If the llm invents it, maybe it ought to be like this?
by mberning on 3/1/2025, 7:34:41 PM
In my experience LLMs do this kind of thing with enough frequency that I don’t consider them as my primary research tool. I can’t afford to be sent down rabbit holes which are barely discernible from reality.
by IAmNotACellist on 3/1/2025, 8:23:57 PM
"Not acceptable. Please upgrade your browser to continue." No, I don't think I will.
by aranw on 3/1/2025, 8:33:13 PM
I wonder how easy it would be to influence super LLMs if a particular group of people created enough articles that any human reader would recognise as a load of garbage and ignore, but that an LLM would parse without realising, ruining its reasoning and code generation abilities.
by Narretz on 3/1/2025, 7:11:54 PM
This is interesting. If the models had enough actual code as training data, that forum post code should have very little weight, shouldn't it? Why do the LLMs prefer it?
by lxe on 3/1/2025, 10:37:26 PM
This is incredible, and it's not technically a "hallucination". I bet it's relatively easy to find more examples like this... something on the internet that's niche enough, popular enough, and wrong, yet was scraped and trained on.
by leumon on 3/1/2025, 8:32:23 PM
He should've tested 4.5. That model hallucinates much less than any other model.
by Baggie on 3/1/2025, 9:53:48 PM
The conclusion paragraph was really funny and kinda perfectly encapsulates the current state of AI, but as another comment pointed out, we can't even call them smart, just "Ctrl C Ctrl V, Leeroy Jenkins style".
by jwjohnson314 on 3/1/2025, 10:18:36 PM
The interesting thing here to me is that the llm isn’t ‘hallucinating’, it’s simply regurgitating some data it digested during training.
by zeroq on 3/2/2025, 2:31:41 AM
This is exactly what I mean when I say tell me you're bad without saying so. Most people here disagree with that.
A while back a friend of mine told me he's very fond of LLMs, because he's confused by the kubernetes CLI, and instead of looking up the answer on the internet he can simply state what he wants in a chat and get the right answer.
Well... sure, but if you looked the answer up on stackoverflow you'd see the whole thread, including the comments, and you'd have the opportunity to understand what the command actually does.
It's quite easy to create a catastrophic event in kubernetes if you don't know what you're doing.
If you blindly trust llms in such scenarios sooner or later you'll find yourself in a lot of trouble.
by saurik on 3/1/2025, 10:41:08 PM
What I honestly find most interesting about this is the thought that hallucinations might lead to the kind of emergent language design we see in natural language (which might not be a good thing for a computer language, fwiw, but still interesting), where people just kind of think "language should work this way, and if I say it like this people will probably understand me".
by sirolimus on 3/1/2025, 6:25:40 PM
o3-mini or o3-mini-high?
by egberts1 on 3/2/2025, 5:45:09 PM
Write me Mastercard/Visa fraud detection code in Ada, please.
by forum-soon-yuck on 3/2/2025, 10:04:19 AM
Good luck staking the future on AI
I've got a lot of hallucinations like that from LLMs. I really don't get how so many people can get LLMs to code most of their tasks without those issues constantly popping up.