VideoTree: A Tree-based Video Representation for LLM Reasoning

"The AI Daily Digest"

Video Understanding

Long Videos

Large Language Models

LLM Reasoning

Video Representation

Hierarchical Framework

VideoTree tackles long-video understanding by dynamically extracting query-relevant information and building a tree-based video representation for LLM reasoning. It selects frames adaptively and organizes them into a hierarchical structure, which improves accuracy and reduces inference time on multiple benchmarks.
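To make the idea of adaptive, hierarchical frame selection concrete, here is a minimal toy sketch. It is not the authors' implementation: the `Node` class, the `build_tree` and `collect_keyframes` helpers, the contiguous temporal splits, the cosine-similarity relevance score, and the `threshold`, `branching`, and `max_depth` parameters are all illustrative assumptions. Frames are stood in for by small hand-written feature vectors, and the query by a vector in the same space.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    start: int        # first frame index covered (inclusive)
    end: int          # last frame index covered (exclusive)
    keyframe: int     # representative frame chosen for this span
    relevance: float  # similarity between the keyframe and the query
    children: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_tree(features, query, start, end,
               branching=2, threshold=0.5, max_depth=2, depth=0):
    """Recursively split a frame span, expanding only query-relevant spans."""
    # Hypothetical relevance score: cosine similarity to the query vector.
    scores = [cosine(features[i], query) for i in range(start, end)]
    best = max(range(start, end), key=lambda i: scores[i - start])
    node = Node(start, end, best, scores[best - start])

    # Spans that look relevant get subdivided; others keep a single keyframe.
    length = end - start
    if node.relevance >= threshold and depth < max_depth and length > 1:
        step = (length + branching - 1) // branching  # ceiling division
        for s in range(start, end, step):
            node.children.append(
                build_tree(features, query, s, min(s + step, end),
                           branching, threshold, max_depth, depth + 1))
    return node


def collect_keyframes(node):
    """Flatten the tree into a sorted list of unique keyframe indices."""
    frames = {node.keyframe}
    for child in node.children:
        frames.update(collect_keyframes(child))
    return sorted(frames)


# Toy video: frames 0-3 are unrelated to the query, frames 4-7 partly match it.
features = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0],
            [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
query = [1.0, 0.0]

tree = build_tree(features, query, 0, len(features))
keyframes = collect_keyframes(tree)
print(keyframes)  # [0, 4, 7]
```

In this toy run the irrelevant first half of the video contributes a single keyframe, while the relevant second half is split again and contributes two. The selected frames, in temporal order, are what would then be described and passed to the LLM; a real system would use learned visual features and a stronger relevance signal than the hand-written vectors assumed here.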
