Robust Action Recognition Using Multi-Scale Spatial-Temporal Concatenations of Local Features as Natural Action Structures

Hdl Handle:
http://hdl.handle.net/10675.2/829
Title:
Robust Action Recognition Using Multi-Scale Spatial-Temporal Concatenations of Local Features as Natural Action Structures
Authors:
Zhu, Xiaoyuan; Li, Meng; Li, Xiaojian; Yang, Zhiyong; Tsien, Joe Z.
Abstract:
Humans and many other animals can detect, recognize, and classify natural actions in a very short time. How this is achieved by the visual system and how to make machines understand natural actions have been the focus of neurobiological studies and computational modeling over the last several decades. A key issue is what spatial-temporal features should be encoded and what the characteristics of their occurrences are in natural actions. Current global encoding schemes depend heavily on segmenting, while local encoding schemes lack descriptive power. Here, we propose natural action structures, i.e., multi-size, multi-scale, spatial-temporal concatenations of local features, as the basic features for representing natural actions. In this concept, any action is a spatial-temporal concatenation of a set of natural action structures, which convey a full range of information about natural actions. We took several steps to extract these structures. First, we sampled a large number of sequences of patches at multiple spatial-temporal scales. Second, we performed independent component analysis on the patch sequences and classified the independent components into clusters. Finally, we compiled a large set of natural action structures, with each corresponding to a unique combination of the clusters at the selected spatial-temporal scales. To classify human actions, we used a set of informative natural action structures as inputs to two widely used models. We found that the natural action structures obtained here achieved a significantly better recognition performance than low-level features and that the performance was better than or comparable to the best current models. We also found that the classification performance with natural action structures as features was only slightly affected by changes of scale and artificially added noise. We concluded that the natural action structures proposed here can be used as the basic encoding units of actions and may hold the key to natural action understanding.
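The extraction steps named in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the video is synthetic noise, the (patch size, frame count) pairs in SCALES are hypothetical, patches at different scales share a top-left corner, and toy_cluster_label is a placeholder that bins mean intensity in place of the independent component analysis and clustering step. What the sketch does show is how patch sequences can be sampled at several spatial-temporal scales and how a structure can be keyed by the combination of cluster labels across those scales.

```python
import random
from collections import Counter


def make_video(frames, height, width, seed=0):
    """Synthetic grayscale video as nested lists: video[t][y][x] in [0, 1)."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(frames)]


def patch_sequence(video, t, y, x, size, length):
    """Flatten a size x size patch followed over `length` consecutive frames."""
    return [video[t + dt][y + dy][x + dx]
            for dt in range(length)
            for dy in range(size)
            for dx in range(size)]


def toy_cluster_label(seq, n_clusters=4):
    """Placeholder for ICA + clustering (assumption): bin the mean intensity."""
    mean = sum(seq) / len(seq)
    return min(int(mean * n_clusters), n_clusters - 1)


def structure_at(video, t, y, x, scales):
    """Key a structure by the tuple of cluster labels, one label per scale."""
    return tuple(
        toy_cluster_label(patch_sequence(video, t, y, x, size, length))
        for size, length in scales
    )


def structure_histogram(video, scales, n_samples=200, seed=1):
    """Sample random locations and count how often each structure occurs."""
    rng = random.Random(seed)
    max_size = max(size for size, _ in scales)
    max_len = max(length for _, length in scales)
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    counts = Counter()
    for _ in range(n_samples):
        # Keep the largest patch sequence fully inside the video volume.
        t = rng.randrange(frames - max_len + 1)
        y = rng.randrange(height - max_size + 1)
        x = rng.randrange(width - max_size + 1)
        counts[structure_at(video, t, y, x, scales)] += 1
    return counts


# Hypothetical (patch size in pixels, number of frames) pairs.
SCALES = [(4, 3), (8, 5)]
video = make_video(frames=12, height=16, width=16)
hist = structure_histogram(video, SCALES)
```

A histogram such as hist, computed per video clip, is one plausible form in which structure counts could be passed to a classifier; the abstract does not specify the exact input format used by the two models.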
Citation:
PLoS One. 2012 Oct 4; 7(10):e46686
Issue Date:
4-Oct-2012
URI:
http://hdl.handle.net/10675.2/829
DOI:
10.1371/journal.pone.0046686
PubMed ID:
23056403
PubMed Central ID:
PMC3464264
Type:
Article
ISSN:
1932-6203
Appears in Collections:
Brain & Behavior Discovery Institute: Faculty Research and Publications

Full metadata record

DC Field | Value | Language
dc.contributor.author | Zhu, Xiaoyuan | en_US
dc.contributor.author | Li, Meng | en_US
dc.contributor.author | Li, Xiaojian | en_US
dc.contributor.author | Yang, Zhiyong | en_US
dc.contributor.author | Tsien, Joe Z. | en_US
dc.date.accessioned | 2012-10-26T20:35:12Z | -
dc.date.available | 2012-10-26T20:35:12Z | -
dc.date.issued | 2012-10-4 | en_US
dc.identifier.citation | PLoS One. 2012 Oct 4; 7(10):e46686 | en_US
dc.identifier.issn | 1932-6203 | en_US
dc.identifier.pmid | 23056403 | en_US
dc.identifier.doi | 10.1371/journal.pone.0046686 | en_US
dc.identifier.uri | http://hdl.handle.net/10675.2/829 | -
dc.description.abstract | Humans and many other animals can detect, recognize, and classify natural actions in a very short time. How this is achieved by the visual system and how to make machines understand natural actions have been the focus of neurobiological studies and computational modeling over the last several decades. A key issue is what spatial-temporal features should be encoded and what the characteristics of their occurrences are in natural actions. Current global encoding schemes depend heavily on segmenting, while local encoding schemes lack descriptive power. Here, we propose natural action structures, i.e., multi-size, multi-scale, spatial-temporal concatenations of local features, as the basic features for representing natural actions. In this concept, any action is a spatial-temporal concatenation of a set of natural action structures, which convey a full range of information about natural actions. We took several steps to extract these structures. First, we sampled a large number of sequences of patches at multiple spatial-temporal scales. Second, we performed independent component analysis on the patch sequences and classified the independent components into clusters. Finally, we compiled a large set of natural action structures, with each corresponding to a unique combination of the clusters at the selected spatial-temporal scales. To classify human actions, we used a set of informative natural action structures as inputs to two widely used models. We found that the natural action structures obtained here achieved a significantly better recognition performance than low-level features and that the performance was better than or comparable to the best current models. We also found that the classification performance with natural action structures as features was only slightly affected by changes of scale and artificially added noise. We concluded that the natural action structures proposed here can be used as the basic encoding units of actions and may hold the key to natural action understanding. | en_US
dc.subject | Research Article | en_US
dc.subject | Biology | en_US
dc.subject | Computational Biology | en_US
dc.subject | Neuroscience | en_US
dc.subject | Computational Neuroscience | en_US
dc.subject | Circuit Models | en_US
dc.subject | Coding Mechanisms | en_US
dc.subject | Sensory Systems | en_US
dc.subject | Visual System | en_US
dc.subject | Neural Networks | en_US
dc.subject | Computer Science | en_US
dc.subject | Computer Modeling | en_US
dc.subject | Computing Methods | en_US
dc.subject | Computer Inferencing | en_US
dc.subject | Engineering | en_US
dc.subject | Human Factors Engineering | en_US
dc.subject | Man Computer Interface | en_US
dc.subject | Signal Processing | en_US
dc.subject | Video Processing | en_US
dc.subject | Mathematics | en_US
dc.subject | Probability Theory | en_US
dc.subject | Bayes Theorem | en_US
dc.subject | Probability Distribution | en_US
dc.subject | Statistics | en_US
dc.subject | Statistical Methods | en_US
dc.title | Robust Action Recognition Using Multi-Scale Spatial-Temporal Concatenations of Local Features as Natural Action Structures | en_US
dc.type | Article | en_US
dc.identifier.pmcid | PMC3464264 | en_US
dc.contributor.corporatename | Brain & Behavior Discovery Institute | -
dc.contributor.corporatename | Department of Neurology | -
dc.contributor.corporatename | Department of Ophthalmology | -

All Items in Scholarly Commons are protected by copyright, with all rights reserved, unless otherwise indicated.