Unravel the mysteries of LLM reasoning with this survey, which covers methodologies for evaluating reasoning behavior in ways that go beyond mere task accuracy. Key contributions of the survey include:
This survey offers a critical look at the current state of LLM reasoning, urging the AI community to engage in more comprehensive evaluations. It’s a call to action for researchers to develop more rigorous methods for probing the actual reasoning capabilities of AI models.