SAFETYLIT WEEKLY UPDATE

We compile citations and summaries of about 400 new articles every week.

Journal Article

Citation

Zhenhua T, Zhenche X, Pengfei W, Danke W, Li L. Multimed. Tools Appl. 2023; ePub(ePub): ePub.

Copyright

(Copyright © 2023, Holtzbrinck Springer Nature Publishing Group)

DOI

10.1007/s11042-023-16269-x

PMID

unavailable

Abstract

Spatiotemporal modeling is key for action recognition in videos. In this paper, we propose a Spatial features Compression and Temporal features Fusion (SCTF) block, comprising a Local Spatial features Compression (LSC) module and a Full Temporal features Fusion (FTF) module. We call the network equipped with the SCTF block SCTF-NET; it is a human action recognition network particularly well suited to violent video detection. In previous work, spatial extraction and temporal fusion are typically achieved by stacking large numbers of convolution layers or by adding complex recurrent layers. In contrast, the SCTF block extracts the spatial information of video frames with the LSC module and fuses the temporal sequence information of consecutive frames with the FTF module, enabling effective spatiotemporal modeling. Our approach achieves good performance on action recognition benchmarks such as HMDB51 and UCF101 while being more efficient in training and detection. Moreover, experiments on the violence datasets Hockey Fights, Movie Fight, and Violent Flow show that the proposed SCTF block is more suitable for violent action recognition. Our code is available at https://github.com/TAN-OpenLab/SCTF-Net.
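The abstract does not detail the internals of the LSC and FTF modules; for readers unfamiliar with the pattern, the two-stage design it describes (per-frame spatial compression followed by fusion across all frames) can be sketched roughly as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation: average pooling stands in for the LSC module and a softmax-weighted average over frames stands in for the FTF module, both assumptions made for illustration only.

```python
import numpy as np

def lsc_spatial_compression(frame_feats, pool=2):
    """Illustrative stand-in for the LSC module: compress the spatial
    dimensions of per-frame features by average pooling."""
    t, h, w, c = frame_feats.shape
    h2, w2 = h // pool, w // pool
    x = frame_feats[:, :h2 * pool, :w2 * pool, :]
    x = x.reshape(t, h2, pool, w2, pool, c)
    return x.mean(axis=(2, 4))               # -> (t, h2, w2, c)

def ftf_temporal_fusion(compressed):
    """Illustrative stand-in for the FTF module: fuse information from
    all frames with a softmax-weighted average over the time axis."""
    t = compressed.shape[0]
    scores = compressed.reshape(t, -1).mean(axis=1)   # one score per frame
    w = np.exp(scores - scores.max())
    w /= w.sum()                                      # softmax over time
    return np.tensordot(w, compressed, axes=(0, 0))   # -> (h2, w2, c)

def sctf_block(frame_feats):
    """Spatial compression, then full temporal fusion over the clip."""
    return ftf_temporal_fusion(lsc_spatial_compression(frame_feats))

clip = np.random.rand(16, 8, 8, 4)   # 16 frames of 8x8 feature maps, 4 channels
out = sctf_block(clip)
print(out.shape)                     # (4, 4, 4)
```

The point of the sketch is the ordering: spatial detail is reduced before temporal fusion, so the fusion step operates on compact per-frame summaries rather than full-resolution feature maps, which is where the claimed training and detection efficiency would come from.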


Language: en

Keywords

3DCNN; Action recognition; Spatiotemporal fusion; Violence detection
