Cross-Modal Distillation by Additive Importance Measure in HITL Autonomous Driving

Pietro Cassarà, Saira Bano, Claudio Gennaro, and Alberto Gotta, Institute of Information Science and Technologies (ISTI), National Research Council (CNR), Pisa, Italy

With the advent of Advanced Driver Assistance Systems (ADAS) and intelligent transport system applications, recognizing driver emotions has become essential for a decision support system (DSS) with humans in the loop (HITL). Multi-modal approaches that combine visual cues, speech, physiological signals, and driving patterns improve emotion recognition, but they are hard to deploy in resource-constrained environments where only a subset of the modalities is available. This work addresses this challenge by combining the benefits of multi-modal training with single-modality inference for emotion recognition, exploiting unlabeled external road-condition data. Unlike traditional methods that average the teachers' contributions, the proposed cross-modal distillation (XA-CMD) weights each teacher with the aid of the Shapley additive global explanation (SAGE), which improves the student model's accuracy and provides an interpretation of its decisions. Experimental evaluations on the PPB-Emo dataset show that XA-CMD improves emotion recognition accuracy over other baselines and provides deeper insights into its decision-making.
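The abstract summarizes the weighting scheme without implementation details. As an illustration only, the sketch below shows how per-teacher SAGE importance values, assumed to be precomputed (e.g., with a global feature-importance toolkit), could replace the uniform average in a multi-teacher distillation loss. All identifiers (`sage_weights`, `xa_cmd_loss`, the temperature `T`, the mixing coefficient `alpha`) are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: all names below are hypothetical, not from the paper.
import torch
import torch.nn.functional as F

def sage_weights(sage_values: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Turn per-teacher SAGE importance values into normalized weights."""
    v = torch.clamp(sage_values, min=0.0)   # a teacher with negative importance gets no weight
    return (v + eps) / (v + eps).sum()      # weights sum to 1, robust to all-zero input

def xa_cmd_loss(student_logits, teacher_logits_list, sage_values,
                labels=None, T: float = 2.0, alpha: float = 0.5):
    """Multi-teacher distillation loss where teachers are weighted by
    SAGE importance instead of being averaged uniformly."""
    w = sage_weights(sage_values)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    distill = student_logits.new_zeros(())
    for w_i, t_logits in zip(w, teacher_logits_list):
        p_teacher = F.softmax(t_logits / T, dim=-1)
        # KL(teacher || student) per teacher, scaled by T^2 as in standard KD
        distill = distill + w_i * F.kl_div(log_p_student, p_teacher,
                                           reduction="batchmean") * (T * T)
    if labels is None:                      # unlabeled external road-condition data
        return distill
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce
```

With `labels=None`, the loss is driven only by the teachers' soft targets, mirroring the abstract's use of unlabeled external data; clamping negative SAGE values encodes the intuition that a teacher judged uninformative should not contribute to the student.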