Bin Liang, Lihong Zheng
IEEE Transactions on Image Processing
This paper presents a novel approach to action recognition using synthetic multi-view data from depth maps. Specifically, multiple views are first generated by rotating 3D point clouds from depth maps. A pyramid multi-view depth motion template is then adopted for multi-view action representation, characterizing the multi-scale motion and shape patterns in 3D. Empirically, despite the view-specific information, the latent information between multiple views often provides important cues for action recognition. Concentrating on this observation and motivated by the success of the dictionary learning framework, this paper proposes to explicitly learn a view-specific dictionary (called specificity) for each view, and simultaneously learn a latent dictionary (called latent correlation) across multiple views. Thus, a novel method, specificity and latent correlation learning, is put forward to learn the specificity that captures the most discriminative features of each view, and learn the latent correlation that contributes the inherent 3D information to multiple views. In this way, a compact and discriminative dictionary is constructed by specificity and latent correlation for feature representation of actions. The proposed method is evaluated on the MSR Action3D, the MSR Gesture3D, the MSR Action Pairs, and the ChaLearn multi-modal data sets, consistently achieving promising results compared with the state-of-the-art methods based on depth data.