VideoTree addresses the challenges in long-video understanding by dynamically extracting query-related information and building a tree-based representation for LLM reasoning. By selecting frames adaptively and organizing them into a hierarchical structure, VideoTree significantly improves accuracy and reduces inference time on various benchmarks.