Advances in computer vision, multimodal learning, AR/VR/XR technologies, and smart glasses are converging toward Virtual Intelligent Task Assistants (VITAs): systems that observe, interpret, and guide humans in complex real-world activities. This workshop bridges computer vision foundations and interactive AR/VR/XR research to enable long-term task understanding and assistance. Topics include learning from long streaming egocentric and exocentric videos, multimodal reasoning, task and step prediction, procedure planning and correction, human-AI collaboration and coaching, and new datasets and benchmarks. By fostering dialogue across disciplines, the workshop aims to define the core challenges and opportunities for building practical and generalizable VITAs.
June 3, 2026
1:00 PM
Room 108
Colorado Convention Center
Denver, CO
[in Denver local time · MDT / UTC−6]