VITA 2026 — Workshop on Vision for Intelligent Task Assistants

CVPR 2026 · Workshop

Vision for Intelligent Task Assistants

VITA 2026 · June 3, 1:00 PM · Denver, CO

About VITA 2026

Advances in computer vision, multimodal learning, AR/XR technologies, and smart glasses are converging toward Virtual Intelligent Task Assistants (VITAs), systems that observe, interpret, and guide humans in complex real-world activities. This workshop discusses recent research toward long-term task understanding and assistance. Topics include learning from long streaming videos, egocentric and exocentric video understanding, vision-language models, multimodal reasoning, task/step/procedure prediction, planning and correction, human-AI collaboration and coaching, and new datasets and benchmarks. By fostering dialogue across disciplines, the workshop aims to define the core challenges and opportunities in building practical, generalizable VITAs.

When

June 3, 2026
1:00 PM

Where

Room 108
Colorado Convention Center
Denver, CO

Speakers

Kristen Grauman

University of Texas at Austin

Ivan Laptev

MBZUAI

Marc Pollefeys

ETH Zurich

Juan Carlos Niebles

Salesforce AI Research

Antonino Furnari

University of Catania

Gedas Bertasius

University of North Carolina, Chapel Hill

Steven Feiner

Columbia University

David Hayden

Meta

Evgeniy Oleinik

Meta

Organizers

Mohsen Moghaddam

Georgia Institute of Technology

Angela Yao

National University of Singapore

Jason Corso

University of Michigan

Ehsan Elhamifar

Northeastern University

Schedule

[in Denver local time · MDT / UTC−6]

1:20 – 1:30

Opening Talk

Ehsan Elhamifar

1:30 – 2:00

Invited Talk 1

David Hayden & Evgeniy Oleinik

Aria as a Remote-Expert Data Collection Platform for Developing Egocentric AI Task Assistants

2:00 – 2:30

Invited Talk 2

Kristen Grauman

Skill++: Learning to Assess and Improve Physical Skills from Video

2:30 – 3:00

Invited Talk 3

TBA

3:00 – 3:30

Invited Talk 4

Gedas Bertasius

From Perception to Agency: The Cognitive Stack for Video Task Assistants

3:30 – 4:00

Break

4:00 – 4:30

Invited Talk 5

TBA

4:30 – 5:00

Invited Talk 6

Juan Carlos Niebles

TBA

5:00 – 5:30

Invited Talk 7

Antonino Furnari

Towards Always‑On Wearable AI That Perceives, Understands, and Assists

5:30 – 6:00

Invited Talk 8

TBA

6:00 – 6:10

Concluding Remarks