Vision for Intelligent Task Assistants

VITA

The 1st Workshop in Conjunction with CVPR 2026

Date and Time TBA

About VITA 2026

Advances in computer vision, multimodal learning, and AR/VR/XR technologies and smart glasses are converging toward Virtual Intelligent Task Assistants (VITAs)—systems that observe, interpret, and guide humans in complex real-world activities. This workshop bridges computer vision foundations and interactive AR/VR/XR research to enable long-term task understanding and assistance. Topics include learning from long streaming egocentric and exocentric videos, multimodal reasoning, task and step prediction, procedure planning and correction, human-AI collaboration and coaching, and new datasets and benchmarks. By fostering dialogue across disciplines, the workshop aims to define the core challenges and opportunities for building practical and generalizable VITAs.