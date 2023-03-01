
Abstract

Fall detection has become an active research topic in medicine and health care in recent years. Vision-based methods are currently considered the most promising because they are non-contact and easy to deploy. However, existing vision-based methods are mostly trained with supervised learning, which requires time-consuming and labor-intensive data annotation. To address this limitation, this work proposes a fall detection method based on a weakly supervised dual-modal network. The method adopts a deep multiple instance learning framework to learn fall events from weak labels, so it does not require fine-grained annotations. The final detection result for each video is obtained by integrating the information from the two streams of the dual-modal network with the proposed dual-modal fusion strategy. Experimental results on two public benchmark datasets and a proposed dataset show that the proposed method outperforms current state-of-the-art methods.
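The general idea of multiple instance learning with weak labels can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss, the modalities, or the fusion rule. The sketch assumes that a video is a bag of segments, that the weak label is a single video-level flag, that the bag score is the maximum segment score, and that the two streams are combined by a weighted average. The names `bag_score`, `fuse`, and `bag_bce_loss`, the stream scores, and the 0.5 threshold are all hypothetical.

```python
import math


def bag_score(instance_scores):
    """MIL max pooling: a video (bag) scores as high as its top segment (instance)."""
    return max(instance_scores)


def fuse(scores_a, scores_b, weight=0.5):
    """Assumed late fusion: weighted average of per-segment scores from two streams.

    This is a placeholder, not the fusion strategy proposed in the paper.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both streams must score the same number of segments")
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]


def bag_bce_loss(instance_scores, video_label, eps=1e-7):
    """Binary cross-entropy on the bag score, using only the video-level label."""
    p = min(max(bag_score(instance_scores), eps), 1.0 - eps)
    return -(video_label * math.log(p) + (1 - video_label) * math.log(1.0 - p))


# Made-up per-segment fall scores in [0, 1] from two hypothetical streams.
stream_a = [0.10, 0.20, 0.90, 0.30]
stream_b = [0.20, 0.10, 0.70, 0.10]

fused = fuse(stream_a, stream_b)
video_score = bag_score(fused)
is_fall = video_score >= 0.5  # assumed decision threshold
```

Because the loss depends only on the bag score, training under these assumptions needs one label per video rather than a label per frame or segment, which is the annotation saving that weak supervision is meant to provide.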

