Developed by a team including Xinyu Zhan and Lixin Yang, the OAKINK2 dataset provides a multi-level abstraction for understanding bimanual object manipulation tasks. It breaks complex daily activities down into three levels: Affordance, Primitive Task, and Complex Task, each capturing the functionality and interactive elements involved in object manipulation.
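To make the three-level abstraction concrete, the sketch below models it as a small Python hierarchy: a Complex Task is composed of Primitive Tasks, each grounded in object Affordances. The class and field names here are illustrative assumptions, not the dataset's actual schema or released API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Affordance:
    """Lowest level: an interactive element of an object and the function it exposes."""
    object_name: str   # e.g. "kettle"
    part: str          # e.g. "lid"
    function: str      # e.g. "liftable; exposes the water chamber"

@dataclass
class PrimitiveTask:
    """Middle level: a minimal interaction realizing one or more affordances."""
    description: str
    affordances: List[Affordance] = field(default_factory=list)

@dataclass
class ComplexTask:
    """Top level: a long-horizon daily activity composed of primitive tasks."""
    goal: str
    primitives: List[PrimitiveTask] = field(default_factory=list)

# Hypothetical example: one complex task decomposed into affordance-grounded primitives.
open_lid = PrimitiveTask(
    "open the kettle lid",
    [Affordance("kettle", "lid", "liftable; exposes the water chamber")],
)
pour_water = PrimitiveTask(
    "pour water into the cup",
    [Affordance("kettle", "spout", "pours liquid"),
     Affordance("cup", "opening", "receives liquid")],
)
make_tea = ComplexTask("make a cup of tea", [open_lid, pour_water])
```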
To facilitate applications such as interaction reconstruction and motion synthesis, OAKINK2 supplies multi-camera image streams together with precise pose annotations for the human body, hands, and objects. The dataset also supports a task-oriented Complex Task Completion (CTC) framework, which uses Large Language Models to decompose complex task objectives into sequences of Primitive Tasks, followed by a Motion Fulfillment Model that generates coordinated bimanual hand motion.
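The control flow of such a two-stage pipeline can be sketched as follows. This is a minimal illustration under assumed interfaces: every name (call_llm, plan_with_llm, MotionFulfillmentModel, complete_task) is a placeholder for exposition, not the authors' released code.

```python
from typing import Dict, List

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM client; returns a canned decomposition for illustration."""
    return "open the kettle lid\npour water into the cup\nclose the kettle lid"

def plan_with_llm(complex_goal: str, scene_objects: List[str]) -> List[str]:
    """Stage 1: ask the LLM to decompose the complex task objective into an
    ordered sequence of Primitive Task descriptions."""
    prompt = (
        f"Goal: {complex_goal}\n"
        f"Objects in the scene: {', '.join(scene_objects)}\n"
        "List the primitive manipulation steps, one per line."
    )
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

class MotionFulfillmentModel:
    """Stage 2: map each Primitive Task to coordinated bimanual hand motion,
    conditioned on object poses (stubbed output here)."""
    def generate(self, primitive_task: str, object_poses: Dict[str, list]) -> dict:
        return {"task": primitive_task, "hand_motion": "..."}  # stand-in result

def complete_task(goal: str, scene_objects: List[str],
                  object_poses: Dict[str, list]) -> List[dict]:
    mfm = MotionFulfillmentModel()
    return [mfm.generate(p, object_poses) for p in plan_with_llm(goal, scene_objects)]

print(complete_task("make a cup of tea", ["kettle", "cup"], {"kettle": [], "cup": []}))
```

The key design point is the separation of concerns: the language model handles symbolic task decomposition, while the motion model handles the continuous, physically grounded hand trajectories for each step.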
OAKINK2 represents a significant step toward capturing and understanding structured physical interaction, which is essential for closer synergy between AI agents and human activities. The comprehensiveness of the data and its accompanying application framework make it a valuable resource for research on AI-driven object manipulation, with clear potential in areas such as robotics and assistive technologies.