Recent advances in Multi-modal Large Language Models (MLLMs) have yielded impressive performance on visual tasks, yet their effectiveness at visual math problem-solving remains underexplored. The new benchmark MathVerse aims to address this gap with 2,612 high-quality math problems with diagrams spanning multiple subjects. Each problem is transformed into six versions that vary how much of its content is conveyed in text versus in the diagram, yielding roughly 15K unique test samples in total.
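The sample count follows directly from that expansion: 2,612 problems × 6 versions ≈ 15K. A minimal sketch of how such a dataset might be assembled is shown below; the version labels, the `Problem` record, and the `expand_to_versions` helper are illustrative assumptions, not MathVerse's actual schema or code.

```python
from dataclasses import dataclass

# Illustrative labels for the six multi-modality conditions; each shifts more
# of the problem content from the textual statement into the diagram.
VERSIONS = [
    "text-dominant", "text-lite", "text-only",
    "vision-intensive", "vision-dominant", "vision-only",
]

@dataclass
class Problem:
    problem_id: int
    question: str
    diagram_path: str  # path to the problem's diagram image

def expand_to_versions(problems: list[Problem]) -> list[dict]:
    """Create one test sample per (problem, version) pair."""
    return [
        {"problem_id": p.problem_id, "version": v}
        for p in problems
        for v in VERSIONS
    ]

seed = [Problem(i, f"question {i}", f"diagrams/{i}.png") for i in range(2612)]
samples = expand_to_versions(seed)
print(len(samples))  # 2,612 problems x 6 versions = 15,672 (~15K) samples
```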
This study underscores the importance of thorough evaluation frameworks in advancing our understanding of MLLM capabilities. By dissecting the reasoning process, MathVerse not only tests MLLMs' problem-solving skills but also paves the way for models that can 'see' and 'understand' visual elements more deeply.