arXiv (Cornell University)
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
April 2024 • Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov
Large language models (LLMs) are prone to hallucinations, which has sparked widespread efforts to detect and prevent them. Recent work attempts to mitigate hallucinations by intervening in the model's generation, typically by computing representative vectors of hallucinatory vs. grounded generations and steering the model's hidden states away from a hallucinatory state. However, common studies employ different setups and do not properly separate different possible causes of hallucinations, making interventions misguided…
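The intervention style the abstract describes is commonly implemented as a difference-of-means steering vector added to (or subtracted from) a layer's hidden states at generation time. The sketch below illustrates that general recipe only; it is not the authors' exact method, and all function names, shapes, and the scaling factor `alpha` are illustrative assumptions.

```python
# Minimal sketch of a steering-vector intervention, assuming activations have
# already been collected at one layer for hallucinatory and grounded generations.
import torch


def steering_vector(halluc_states: torch.Tensor, grounded_states: torch.Tensor) -> torch.Tensor:
    """Difference-of-means direction between hallucinatory and grounded hidden states.

    Both inputs are (num_examples, hidden_dim) activations from the same layer.
    """
    direction = halluc_states.mean(dim=0) - grounded_states.mean(dim=0)
    return direction / direction.norm()  # unit-norm steering direction


def intervene(hidden: torch.Tensor, direction: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """Shift hidden states away from the hallucination direction by subtracting
    a scaled copy of it (in practice applied during generation, e.g. via a forward hook)."""
    return hidden - alpha * direction


# Toy usage with random tensors standing in for collected model activations.
hidden_dim = 64
halluc = torch.randn(100, hidden_dim) + 0.5    # stand-in for hallucination-side activations
grounded = torch.randn(100, hidden_dim) - 0.5  # stand-in for grounded-side activations
d = steering_vector(halluc, grounded)
steered = intervene(torch.randn(1, hidden_dim), d)
```

The paper's point is that how these contrastive sets are constructed (i.e., which cause of hallucination they capture) determines whether such a direction targets the right failure mode.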