| Evaluation Points | Findings |
|---|---|
| General Intelligence | Questioned by Abstract Patterns |
| Model Performance | Struggles with Simple Patterns |
| Diagnostic Analysis | Weak Visual Perception, Reasoning |

Large multimodal models extend the capabilities of large language models by integrating multimodal understanding. However, it remains unclear whether they possess general intelligence. PuzzleVQA evaluates these models with puzzles built on abstract patterns.
This paper sheds light on the limitations of large multimodal models and on how they might be improved.