OATAO - Open Archive Toulouse Archive Ouverte

Exploiting deep residual networks for human action recognition from skeletal data

Pham, Huy-Hieu and Khoudour, Louahdi and Crouzil, Alain and Zegers, Pablo and Velastin, Sergio A. Exploiting deep residual networks for human action recognition from skeletal data. (2018) Computer Vision and Image Understanding, 170. 51-66. ISSN 1077-3142

(Document in English)

PDF (Author's version)

Official URL: https://doi.org/10.1016/j.cviu.2018.03.003


The computer vision community is currently focused on solving action recognition problems in real-world videos, where datasets contain thousands of samples and pose many challenges. In this process, Deep Convolutional Neural Networks (D-CNNs) have played a significant role in advancing the state of the art in various vision-based action recognition systems. Recently, the introduction of residual connections in conjunction with a more traditional CNN model in a single architecture, called the Residual Network (ResNet), has shown impressive performance and great potential for image recognition tasks. In this paper, we investigate and apply deep ResNets for human action recognition using skeletal data provided by depth sensors. First, the 3D coordinates of the human body joints carried in skeleton sequences are transformed into image-based representations and stored as RGB images. These color images capture the spatio-temporal evolution of 3D motions from skeleton sequences and can be efficiently learned by D-CNNs. We then propose a novel deep learning architecture based on ResNets to learn features from the obtained color-based representations and classify them into action classes. The proposed method is evaluated on three challenging benchmark datasets: MSR Action 3D, KARD, and NTU-RGB+D. Experimental results demonstrate that our method achieves state-of-the-art performance on all of these benchmarks whilst requiring fewer computational resources. In particular, the proposed method surpasses previous approaches by a significant margin of 3.4% on the MSR Action 3D dataset, 0.67% on the KARD dataset, and 2.5% on the NTU-RGB+D dataset.
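The abstract does not spell out how skeleton sequences become RGB images, so the following is only a minimal sketch of one plausible encoding, not the authors' exact method. It assumes that the (x, y, z) coordinates of each joint map to the (R, G, B) channels of one pixel, that rows index joints and columns index frames, and that coordinates are min-max normalized over the whole sequence. The function name `skeleton_to_rgb` and the type aliases are hypothetical.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) of one joint (assumed layout)
Frame = List[Joint]                 # all joints observed in one frame
Pixel = Tuple[int, int, int]        # (R, G, B), each channel in 0..255


def skeleton_to_rgb(sequence: List[Frame]) -> List[List[Pixel]]:
    """Encode a skeleton sequence as an image: rows are joints, columns are frames.

    Assumption: x, y, z are mapped to R, G, B after min-max normalization
    over every coordinate in the sequence.
    """
    if not sequence or not sequence[0]:
        raise ValueError("sequence must contain at least one frame with one joint")

    # Global minimum and maximum over all coordinates of all joints and frames.
    coords = [c for frame in sequence for joint in frame for c in joint]
    lo, hi = min(coords), max(coords)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant sequence

    def to_byte(value: float) -> int:
        # Scale a coordinate linearly into the 0..255 range of one color channel.
        return round(255 * (value - lo) / span)

    num_joints = len(sequence[0])
    return [
        [tuple(to_byte(c) for c in frame[j]) for frame in sequence]
        for j in range(num_joints)
    ]


# Toy sequence: 3 frames, 2 joints per frame (values are made up).
toy_sequence = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.5, 0.0, 1.0), (0.0, 0.0, 0.0)],
    [(1.0, 1.0, 1.0), (0.0, 1.0, 0.0)],
]
image = skeleton_to_rgb(toy_sequence)  # 2 rows (joints) x 3 columns (frames)
```

Under these assumptions, each row traces one joint's trajectory over time as a strip of colors, so a single image holds both the spatial layout and the temporal evolution that a CNN such as a ResNet can then consume as ordinary image input.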

Item Type: Article
Additional Information: Thanks to Elsevier editor. The definitive version is available at http://www.sciencedirect.com. The original PDF of the article can be found at the Computer Vision and Image Understanding (ISSN: 1077-3142) website: https://www.sciencedirect.com/science/article/pii/S1077314218300389?via%3Dihub
Audience (journal): International peer-reviewed journal
Uncontrolled Keywords:
Institution: French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
Université de Toulouse > Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
Université de Toulouse > Université Toulouse III - Paul Sabatier - UT3 (FRANCE)
Université de Toulouse > Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université de Toulouse > Université Toulouse 1 Capitole - UT1 (FRANCE)
Other partners > Aparnix (CHILE)
Other partners > Centre d'études et d'expertise sur les risques, l'environnement, la mobilité et l'aménagement - CEREMA (FRANCE)
Other partners > Queen Mary University of London - QMUL (UNITED KINGDOM)
Other partners > Universidad Carlos III de Madrid - UC3M (SPAIN)
Laboratory name:
Deposited On: 14 Oct 2019 09:00
