SAFETYLIT WEEKLY UPDATE

We compile citations and summaries of about 400 new articles every week.

Journal Article

Citation

Zhenhua T, Zhenche X, Pengfei W, Danke W, Li L. Multimed. Tools Appl. 2023; ePub(ePub): ePub.

Copyright

(Copyright © 2023, Holtzbrinck Springer Nature Publishing Group)

DOI

10.1007/s11042-023-16269-x

PMID

unavailable

Abstract

Spatiotemporal modeling is key for action recognition in videos. In this paper, we propose a Spatial features Compression and Temporal features Fusion (SCTF) block, comprising a Local Spatial features Compression (LSC) module and a Full Temporal features Fusion (FTF) module. We call the network equipped with the SCTF block SCTF-NET; it is a human action recognition network particularly well suited to violent video detection. In previous work, spatial extraction and temporal fusion are typically achieved by stacking large numbers of convolution layers or by adding complex recurrent layers. In contrast, the SCTF block extracts the spatial information of video frames with the LSC module and fuses the temporal sequence information of consecutive frames with the FTF module, enabling effective spatiotemporal modeling. Our approach achieves good performance on action recognition benchmarks such as HMDB51 and UCF101 while being more efficient in training and detection. Moreover, experiments on the violence datasets Hockey Fights, Movie Fight, and Violent Flow show that the proposed SCTF block is more suitable for violent action recognition. Our code is available at https://github.com/TAN-OpenLab/SCTF-Net.
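The abstract does not detail the internals of the LSC and FTF modules; for readers unfamiliar with the pattern, the two-stage design it describes (per-frame spatial compression followed by fusion across all frames) can be sketched roughly as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation: average pooling stands in for the LSC module and a softmax-weighted average over frames stands in for the FTF module, both assumptions made for illustration only.

```python
import numpy as np

def lsc_spatial_compression(frame_feats, pool=2):
    """Illustrative stand-in for the LSC module: compress the spatial
    dimensions of per-frame features by average pooling."""
    t, h, w, c = frame_feats.shape
    h2, w2 = h // pool, w // pool
    x = frame_feats[:, :h2 * pool, :w2 * pool, :]
    x = x.reshape(t, h2, pool, w2, pool, c)
    return x.mean(axis=(2, 4))               # -> (t, h2, w2, c)

def ftf_temporal_fusion(compressed):
    """Illustrative stand-in for the FTF module: fuse information from
    all frames with a softmax-weighted average over the time axis."""
    t = compressed.shape[0]
    scores = compressed.reshape(t, -1).mean(axis=1)   # one score per frame
    w = np.exp(scores - scores.max())
    w /= w.sum()                                      # softmax over time
    return np.tensordot(w, compressed, axes=(0, 0))   # -> (h2, w2, c)

def sctf_block(frame_feats):
    """Spatial compression, then full temporal fusion over the clip."""
    return ftf_temporal_fusion(lsc_spatial_compression(frame_feats))

clip = np.random.rand(16, 8, 8, 4)   # 16 frames of 8x8 feature maps, 4 channels
out = sctf_block(clip)
print(out.shape)                     # (4, 4, 4)
```

The point of the sketch is the ordering: spatial detail is reduced before temporal fusion, so the fusion step operates on compact per-frame summaries rather than full-resolution feature maps, which is where the claimed training and detection efficiency would come from.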


Language: en

Keywords

3DCNN; Action recognition; Spatiotemporal fusion; Violence detection
