Motivation
Assessing how engaged a child is during play with a partner is central to autism screening, developmental research, and therapy. The gold-standard coding schemes used for this assessment, however, require trained raters to score hours of video by hand, a bottleneck that prevents these tools from being used at scale.
Research
A computer-vision system that learns to predict engagement directly from raw play video, designed to work with the small annotated datasets that clinical and developmental projects typically have access to. The system reuses strong pretrained vision and gaze backbones, encodes the dyad with explicit social and behavioral priors, and trains a single compact model to produce several engagement-related readings jointly.
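The design described above can be illustrated with a minimal sketch, which is an assumption-laden toy and not the project's implementation. It assumes each person's features have already been extracted by a frozen pretrained backbone, uses two illustrative dyadic priors (inter-person distance and a mutual-gaze score), and feeds them through one small shared trunk with one output head per task. All names, dimensions, and the two task labels (`engagement_level`, `joint_attention`) are hypothetical placeholders chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; none of these come from the project description.
FEAT_DIM = 16   # per-person embedding from a frozen pretrained backbone
PRIOR_DIM = 2   # number of hand-crafted dyadic priors
HIDDEN = 8      # width of the compact shared trunk
TASKS = {"engagement_level": 3, "joint_attention": 1}  # hypothetical task -> output size


def dyad_priors(pos_a, pos_b, gaze_a, gaze_b):
    """Two illustrative social priors: inter-person distance and mutual gaze.

    Positions are 2D points; gaze vectors are assumed to be unit length.
    """
    offset = pos_b - pos_a
    dist = np.linalg.norm(offset)
    direction = offset / (dist + 1e-8)  # unit vector pointing from A to B
    a_looks_at_b = float(gaze_a @ direction)
    b_looks_at_a = float(gaze_b @ -direction)
    mutual_gaze = 0.5 * (a_looks_at_b + b_looks_at_a)  # in [-1, 1]
    return np.array([dist, mutual_gaze])


def init_params():
    """Randomly initialise the shared trunk and one linear head per task."""
    in_dim = 2 * FEAT_DIM + PRIOR_DIM
    params = {
        "trunk_w": rng.normal(0.0, 0.1, (in_dim, HIDDEN)),
        "trunk_b": np.zeros(HIDDEN),
    }
    for name, out_dim in TASKS.items():
        params[f"{name}_w"] = rng.normal(0.0, 0.1, (HIDDEN, out_dim))
        params[f"{name}_b"] = np.zeros(out_dim)
    return params


def forward(params, feat_a, feat_b, priors):
    """Shared trunk over the dyad encoding, then one output per task."""
    x = np.concatenate([feat_a, feat_b, priors])
    h = np.tanh(x @ params["trunk_w"] + params["trunk_b"])
    return {name: h @ params[f"{name}_w"] + params[f"{name}_b"] for name in TASKS}


# Usage with random stand-ins for the frozen backbone features.
params = init_params()
feat_a = rng.normal(size=FEAT_DIM)
feat_b = rng.normal(size=FEAT_DIM)
priors = dyad_priors(
    pos_a=np.array([0.0, 0.0]),
    pos_b=np.array([3.0, 4.0]),
    gaze_a=np.array([0.6, 0.8]),    # A looks straight at B
    gaze_b=np.array([-0.6, -0.8]),  # B looks straight at A
)
outputs = forward(params, feat_a, feat_b, priors)
```

Because only the small trunk and heads would be trained in such a setup, the number of learned parameters stays low, which is one plausible way to cope with small annotated datasets; sharing the trunk across heads lets every task's labels shape the same representation.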
Outcome
A pipeline that automatically produces clinically meaningful engagement signals from short video clips of two people interacting, demonstrated on a new corpus of children playing together. The goal is to support researchers and clinicians who need scalable engagement assessment without large-scale manual annotation.