Abstract: Multimodal learning has been introduced as a popular learning paradigm that can integrate inputs from multimodal video data. To accelerate video analytics at the edge, video frames are ...