Definitions
AI Hallucination = When an AI makes confident but false statements.
Why?
- It stems from the training and evaluation methodology.
- Grading is binary: a response is scored as either right or wrong, with no partial credit for expressing uncertainty.
- Wrong-but-confident answers tend to score better than “I don’t know.” A confident wrong guess still has some chance of earning a point, while admitting the model does not know the answer nets 0 points, so guessing is favored (see the sketch below).
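A minimal sketch of that incentive problem, assuming a simple 0/1 grading scheme (the probabilities and function names below are illustrative, not taken from any specific benchmark): under binary grading, guessing always has an expected score at least as high as abstaining, so a score-maximizing model learns to bluff.

```python
# Illustrative only: expected score of guessing vs. abstaining
# under an assumed binary 0/1 grading scheme.

def expected_score_guess(p_correct: float) -> float:
    """Guessing earns 1 point with probability p_correct, else 0 points."""
    return p_correct * 1.0 + (1.0 - p_correct) * 0.0

def expected_score_abstain() -> float:
    """Answering "I don't know" always scores 0 under binary grading."""
    return 0.0

for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p(correct)={p:.2f}  guess={expected_score_guess(p):.2f}  "
          f"abstain={expected_score_abstain():.2f}")

# Even at p(correct)=0.01, guessing beats abstaining (0.01 > 0),
# so the model has no incentive to say "I don't know".
```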
Key factors that increase hallucination risk
- Singleton facts: things mentioned only once in the training data (see the sketch after this list).
- Model limitations: patterns the model class is simply unable to capture.
- Distribution shift: prompts at test time that look unlike the training data.
- Errors in the training data (“garbage in, garbage out”).
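To make the “singleton facts” point concrete, here is a toy sketch (the corpus and fact tuples are made up for illustration): a singleton is a fact that appears exactly once in training, and the fraction of such facts gives a rough sense of how much of the data the model has effectively seen only one example of.

```python
from collections import Counter

# Toy illustration of "singleton facts": facts that appear exactly once
# in the training data. The corpus below is invented for illustration.
training_facts = [
    ("capital of France", "Paris"),
    ("capital of France", "Paris"),          # repeated fact
    ("birthday of Jane Doe", "1987-03-14"),  # appears only once -> singleton
    ("boiling point of water (C)", "100"),
    ("boiling point of water (C)", "100"),
]

counts = Counter(training_facts)
singletons = [fact for fact, n in counts.items() if n == 1]
singleton_rate = len(singletons) / len(counts)

print(f"{len(singletons)} of {len(counts)} distinct facts are singletons "
      f"(singleton rate = {singleton_rate:.0%})")
# Facts seen only once give the model very little signal to learn them
# reliably, which is why a high singleton rate raises hallucination risk.
```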
Recommendations
- Modify existing benchmarks and evaluations so that expressing uncertainty (or abstaining) is not heavily penalized, e.g. by allowing an explicit “IDK” option or using confidence thresholds.
- Make confidence targets explicit in the evaluation instructions, e.g. rules that penalize wrong answers more than abstentions, so that the model has a clear incentive to abstain when it is not confident (see the sketch after this list).
- Encourage “behavioral calibration”: model behavior (what it outputs) should reflect its actual confidence. That is, when the model is uncertain, it should abstain or indicate uncertainty rather than bluff.
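A minimal sketch of such a scoring rule, tying the last two recommendations together (the threshold value and function names are illustrative assumptions, not a specific benchmark’s rubric): wrong answers are penalized more heavily than abstentions, and the penalty is chosen so that answering only has positive expected score when the model’s confidence exceeds the stated threshold t.

```python
# Sketch of a confidence-threshold scoring rule (assumed, illustrative):
# correct answers score +1, abstaining ("IDK") scores 0, and wrong answers
# are penalized t/(1-t) points, so answering only pays off when the model's
# confidence exceeds the threshold t.

def score(outcome: str, t: float) -> float:
    """Score a single response under threshold t (0 < t < 1)."""
    if outcome == "correct":
        return 1.0
    if outcome == "idk":
        return 0.0
    if outcome == "wrong":
        return -t / (1.0 - t)
    raise ValueError(f"unknown outcome: {outcome}")

def expected_score_if_answering(confidence: float, t: float) -> float:
    """Expected score of answering, given the model's confidence of being right."""
    return confidence * score("correct", t) + (1.0 - confidence) * score("wrong", t)

t = 0.75  # e.g. "answer only if you are more than 75% confident"
for confidence in (0.9, 0.75, 0.5):
    ev = expected_score_if_answering(confidence, t)
    decision = "answer" if ev > 0 else "abstain"
    print(f"confidence={confidence:.2f}  E[score if answering]={ev:+.2f}  -> {decision}")

# The break-even point falls exactly at confidence == t, which is what makes
# the confidence target explicit and rewards behavioral calibration.
```

With t = 0.75 the penalty for a wrong answer is 3 points, so answering is only worthwhile above 75% confidence; raising or lowering t lets an evaluation state exactly how cautious it wants the model to be.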