Large Language Model Reasoning Failures (arXiv:2602.06176)

Abstract (from arXiv)

Large Language Models (LLMs) have exhibited remarkable reasoning capabilities, achieving impressive results across a wide range of tasks. Despite these advances, significant reasoning failures persist, occurring even in seemingly simple scenarios. To systematically understand and address these shortcomings, the paper presents a comprehensive survey dedicated to reasoning failures in LLMs.

It introduces a categorization framework that splits reasoning into embodied and non-embodied types, with non-embodied reasoning further divided into informal (intuitive) and formal (logical) reasoning. Separately, it classifies failures into three kinds: (1) fundamental failures intrinsic to LLM architectures, (2) application-specific limitations that appear in particular domains, and (3) robustness issues, where small input variations cause inconsistent behavior.
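
As a reading aid, the two classifications above can be written down as a small data structure. This is only a sketch: the names ReasoningType, FailureClass, FailureTag, and all_cells are hypothetical and do not come from the paper or its companion repository. Crossing the two classifications into a grid is also an assumption made here for illustration, not something the abstract states.

```python
from dataclasses import dataclass
from enum import Enum


class ReasoningType(Enum):
    """Reasoning types named in the abstract (identifiers are hypothetical)."""
    EMBODIED = "embodied"
    INFORMAL = "non-embodied, informal (intuitive)"
    FORMAL = "non-embodied, formal (logical)"

    @property
    def is_embodied(self) -> bool:
        # Informal and formal reasoning are both sub-types of non-embodied reasoning.
        return self is ReasoningType.EMBODIED


class FailureClass(Enum):
    """Failure classes named in the abstract (identifiers are hypothetical)."""
    FUNDAMENTAL = "fundamental, intrinsic to LLM architectures"
    APPLICATION_SPECIFIC = "application-specific, tied to particular domains"
    ROBUSTNESS = "robustness, inconsistent under small input variations"


@dataclass(frozen=True)
class FailureTag:
    """Labels one observed failure with a reasoning type and a failure class."""
    reasoning: ReasoningType
    failure: FailureClass


def all_cells() -> list[FailureTag]:
    # Assumption: every reasoning type may be paired with every failure class.
    return [FailureTag(r, f) for r in ReasoningType for f in FailureClass]


if __name__ == "__main__":
    for tag in all_cells():
        print(f"{tag.reasoning.value} | {tag.failure.value}")
```

A tag such as FailureTag(ReasoningType.FORMAL, FailureClass.ROBUSTNESS) would then label, for example, a logic task whose answer changes when the prompt is lightly reworded.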

For each failure type, the survey provides definitions, summarizes prior studies, discusses root causes, and outlines mitigation strategies. The authors also release a companion GitHub collection: https://github.com/Peiyang-Song/Awesome-LLM-Reasoning-Failures

Quick takeaway

This is a taxonomy-and-synthesis paper: useful as a map of where LLM reasoning breaks, why it breaks, and what mitigation directions currently exist.