CriticBench: Benchmarking LLMs for Critique-Correct Reasoning is a comprehensive study that assesses Large Language Models' ability to critique and correct their own reasoning across a variety of domains. The benchmark compiles 15 datasets and evaluates the critique and correction abilities of a broad range of LLMs.
Critique-correct reasoning is a vital skill for LLMs, enabling them to refine their outputs and improve over time. The insights from this paper can drive further research into LLM self-critique and self-improvement mechanisms, helping these models become better evaluators and feedback providers across a range of applications.