What's confusing: the model clearly "knows" its cutoff date when asked directly, and can express uncertainty in other contexts. Yet it chooses to hallucinate instead of admitting ignorance.
Is this a fundamental architecture limitation, or just a training objective problem? Generating a coherent fake explanation seems more expensive than "I don't have that information."
Why haven't labs prioritized fixing this? Adding web search mostly solves it, which suggests it's not architecturally impossible to know when to defer.
Has anyone seen research or experiments that improve this behavior? Curious if this is a known hard problem or more about deployment priorities.
LLMs don't "choose" to do anything. They run inference over weights. Text is an extremely limiting medium, and it doesn't give an LLM any way to distinguish fiction from reality.
Next time, they'll score 1 point for each correct answer, -0.1 for each incorrect one, and 0 for "I don't know", and the model will behave; the sketch below works out the break-even point. (And perhaps add some intermediate credit for hedged answers like "I guess that [something]".)
We do that at university: if an exam gives 0 points for wrong answers, I encourage my students to answer all of them.
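A rough Python sketch of that incentive, assuming the 1 / -0.1 / 0 weights from the comment above (the helper names are just illustrative, not from any real eval harness):

    # Illustrative rubric from above: +1 correct, -0.1 wrong, 0 for "I don't know".
    CORRECT, WRONG, ABSTAIN = 1.0, -0.1, 0.0

    def expected_guess_score(p_correct):
        # Expected points from guessing when the model is right with probability p_correct.
        return p_correct * CORRECT + (1.0 - p_correct) * WRONG

    def should_guess(p_correct):
        # Guessing beats abstaining only when its expected score clears the 0 for "I don't know".
        return expected_guess_score(p_correct) > ABSTAIN

    # Break-even: p - 0.1 * (1 - p) = 0  =>  p = 0.1 / 1.1 ~ 0.09
    for p in (0.05, 0.09, 0.10, 0.50):
        print(p, round(expected_guess_score(p), 3), should_guess(p))

With the usual 1 / 0 / 0 scheme there is no penalty term at all, so guessing weakly dominates "I don't know" at any confidence level, which is exactly the exam situation described above.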