The newly proposed MME benchmark aims to comprehensively evaluate Multimodal Large Language Models (MLLMs) across 14 subtasks that probe their perception and cognition abilities. It stands out by avoiding the data leakage that commonly arises from evaluating on public datasets, relying instead on manually designed instruction-answer pairs.
The benchmark’s features include:

- Coverage of both perception and cognition across 14 manually curated subtasks.
- Instruction-answer pairs that are all manually designed, so the evaluation data does not overlap with public training sets.
- Concise instructions that allow MLLMs to be compared fairly without elaborate prompt engineering.
- A yes/no answer format that makes quantitative scoring straightforward (a minimal scoring sketch follows below).
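To make the yes/no format concrete, here is a minimal Python sketch of MME-style subtask scoring, not the official evaluation code. It assumes a hypothetical `records` format of (image_id, prediction, ground_truth) tuples; MME reports an accuracy over individual questions plus an accuracy+ over images (both questions for an image must be answered correctly) and sums them into a per-subtask score with a maximum of 200.

```python
# Hypothetical sketch of MME-style yes/no scoring (not the official evaluation code).
from collections import defaultdict

def score_subtask(records):
    """records: list of (image_id, predicted_answer, ground_truth) tuples,
    where answers are the strings "yes" or "no". This record format is an
    assumption made for illustration."""
    correct = 0
    per_image = defaultdict(list)
    for image_id, pred, gt in records:
        hit = pred.strip().lower() == gt.strip().lower()
        correct += hit
        per_image[image_id].append(hit)

    # accuracy: fraction of individual questions answered correctly
    accuracy = 100.0 * correct / len(records)
    # accuracy+: fraction of images where every question is answered correctly
    accuracy_plus = 100.0 * sum(all(hits) for hits in per_image.values()) / len(per_image)
    return accuracy + accuracy_plus  # per-subtask score, max 200

# Example with made-up predictions for two images (two questions each):
records = [
    ("img_001", "yes", "yes"), ("img_001", "no", "no"),
    ("img_002", "yes", "no"),  ("img_002", "no", "no"),
]
print(score_subtask(records))  # 75.0 accuracy + 50.0 accuracy+ = 125.0
```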
MME is a significant step towards understanding and optimizing MLLMs, setting the stage for future work that fine-tunes these models for even more complex and varied tasks. Examine the benchmark study.