What the "Moonwalk" Illusion Reveals about the Perception of Relative Depth from Motion
Abstract
When one visual object moves behind another, the object farther from the viewer is progressively occluded and/or disoccluded by the nearer object. For nearly half a century, this dynamic occlusion cue has been thought to be sufficient by itself for determining the relative depth of the two objects. This view is consistent with the self-evident geometric fact that the surface undergoing dynamic occlusion is always farther from the viewer than the occluding surface. Here we use a contextual manipulation of a previously known motion illusion, which we refer to as the "Moonwalk" illusion, to demonstrate that the visual system cannot determine relative depth from dynamic occlusion alone. Indeed, in the Moonwalk illusion, human observers perceive a relative depth contrary to the dynamic occlusion cue. However, the perception of the expected relative depth is restored by contextual manipulations unrelated to dynamic occlusion. On the other hand, we show that an Ideal Observer can determine relative depth using dynamic occlusion alone in the same Moonwalk stimuli, indicating that the dynamic occlusion cue is, in principle, sufficient for determining relative depth. Our results indicate that in order to correctly perceive relative depth from dynamic occlusion, the human brain, unlike the Ideal Observer, needs additional segmentation information that delineates the occluder from the occluded object. Thus, neural mechanisms of object segmentation, in addition to motion mechanisms that extract information about relative depth, must play a crucial role in the perception of relative depth from motion.
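To make the geometric principle above concrete, here is a minimal sketch, in Python, of how depth order can in principle be read out from dynamic occlusion alone: the side of an occluding boundary whose texture elements are deleted or accreted between frames must belong to the farther surface. This is not the paper's actual Ideal Observer model; the function name, the stimulus representation (sets of dot coordinates), and the toy data are all illustrative assumptions.

```python
# A hedged sketch (not the paper's Ideal Observer) of inferring depth order
# purely from dynamic occlusion: texture elements of the farther surface are
# deleted/accreted at the occluding boundary, while the nearer surface's
# texture persists.

def depth_order_from_occlusion(frame_a, frame_b, boundary_x):
    """Given two sets of (x, y) dot positions and a vertical occluding
    boundary at x = boundary_x, report which side loses or gains dots.
    The side whose texture is deleted/accreted is the farther surface."""
    deleted = frame_a - frame_b    # dots present in frame A but gone in B
    accreted = frame_b - frame_a   # dots absent in frame A but new in B
    changed = deleted | accreted
    left_changes = sum(1 for (x, y) in changed if x < boundary_x)
    right_changes = len(changed) - left_changes
    if left_changes > right_changes:
        return "left region is farther (its texture is occluded/disoccluded)"
    if right_changes > left_changes:
        return "right region is farther (its texture is occluded/disoccluded)"
    return "depth order ambiguous from occlusion alone"

# Toy example: two right-side dots vanish as the left surface slides over them.
frame1 = {(1, 2), (2, 5), (6, 3), (7, 4), (8, 1)}
frame2 = {(1, 2), (2, 5), (8, 1)}            # (6,3) and (7,4) were deleted
print(depth_order_from_occlusion(frame1, frame2, boundary_x=5))
```

On this toy input the readout is unambiguous, which illustrates the paper's point that the cue itself carries sufficient information; the interesting finding is that human observers nonetheless fail to use it without additional segmentation context.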
Citation
PLoS One. 2011 Jun 22; 6(6):e20951
doi:10.1371/journal.pone.0020951
Related articles
- Interactions between cues to visual motion in depth.
- Authors: Howard IP, Fujii Y, Allison RS
- Issue date: 2014 Feb 19
- Shrinking neighbors: a quantitative examination of the 'shrinking building illusion'.
- Authors: Fukuda H, Seno T
- Issue date: 2011
- The effect of monocular depth cues on the detection of moving objects by moving observers.
- Authors: Royden CS, Parsons D, Travatello J
- Issue date: 2016 Jul
- Illusory size determines the perception of ambiguous apparent motion.
- Authors: Stepper MY, Moore CM, Rolke B, Hein E
- Issue date: 2020 Dec
- Depth perception from dynamic occlusion in motion parallax: roles of expansion-compression versus accretion-deletion.
- Authors: Yoonessi A, Baker CL Jr
- Issue date: 2013 Oct 15
Related items
- The Functional Upregulation of Piriform Cortex Is Associated with Cross-Modal Plasticity in Loss of Whisker Tactile Inputs
- Authors: Ye, Bing; Huang, Li; Gao, Zilong; Chen, Ping; Ni, Hong; Guan, Sudong; Zhu, Yan; Wang, Jin-Hui; Mei, Lin; Department of Neurology
- Issue date: 2012-08-21

Background: Cross-modal plasticity is characterized as the hypersensitivity of remaining modalities after a sensory function is lost in rodents, which ensures their awareness of environmental changes. Cellular and molecular mechanisms underlying cross-modal sensory plasticity remain unclear. We aim to study the role of different types of neurons in cross-modal plasticity.
- Fragment-Based Learning of Visual Object Categories in Non-Human Primates
- Authors: Kromrey, Sarah; Maestri, Matthew; Hauffen, Karin; Bart, Evgeniy; Hegdé, Jay; Brain & Behavior Discovery Institute; Vision Discovery Institute; Department of Ophthalmology
- Issue date: 2010-11-24

When we perceive a visual object, we implicitly or explicitly associate it with an object category we know. Recent research has shown that the visual system can use local, informative image fragments of a given object, rather than the whole object, to classify it into a familiar category. We have previously reported, using human psychophysical studies, that when subjects learn new object categories using whole objects, they incidentally learn informative fragments, even when not required to do so. However, the neuronal mechanisms by which we acquire and use informative fragments, as well as category knowledge itself, have remained unclear. Here we describe the methods by which we adapted the relevant human psychophysical methods to awake, behaving monkeys and replicated key previous psychophysical results. This establishes awake, behaving monkeys as a useful system for future neurophysiological studies, not only of informative fragments in particular, but also of object categorization and category learning in general.
- Robust Action Recognition Using Multi-Scale Spatial-Temporal Concatenations of Local Features as Natural Action Structures
- Authors: Zhu, Xiaoyuan; Li, Meng; Li, Xiaojian; Yang, Zhiyong; Tsien, Joe Z.; Brain & Behavior Discovery Institute; Department of Neurology; Department of Ophthalmology
- Issue date: 2012-10-04

Humans and many other animals can detect, recognize, and classify natural actions in a very short time. How this is achieved by the visual system, and how to make machines understand natural actions, have been the focus of neurobiological studies and computational modeling in the last several decades. A key issue is what spatial-temporal features should be encoded and what the characteristics of their occurrences are in natural actions. Current global encoding schemes depend heavily on segmentation, while local encoding schemes lack descriptive power. Here, we propose natural action structures, i.e., multi-size, multi-scale, spatial-temporal concatenations of local features, as the basic features for representing natural actions. In this concept, any action is a spatial-temporal concatenation of a set of natural action structures, which convey a full range of information about natural actions. We took several steps to extract these structures. First, we sampled a large number of sequences of patches at multiple spatial-temporal scales. Second, we performed independent component analysis on the patch sequences and classified the independent components into clusters. Finally, we compiled a large set of natural action structures, each corresponding to a unique combination of the clusters at the selected spatial-temporal scales. To classify human actions, we used a set of informative natural action structures as inputs to two widely used models. We found that the natural action structures obtained here achieved significantly better recognition performance than low-level features, and that the performance was better than or comparable to that of the best current models. We also found that the classification performance with natural action structures as features was only slightly affected by changes of scale and artificially added noise. We concluded that the natural action structures proposed here can be used as the basic encoding units of actions and may hold the key to natural action understanding. (A hedged code sketch of the patch-sampling, ICA, and clustering steps described here follows this list.)
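As promised above, here is a minimal sketch of the three extraction steps named in that abstract: sample spatio-temporal patch sequences, run ICA on them, and cluster the resulting components. It is not the authors' implementation; the patch sizes, component counts, and synthetic video are illustrative assumptions, and scikit-learn's FastICA and KMeans stand in for whatever ICA and clustering routines the paper used.

```python
# A hedged sketch of the natural-action-structures pipeline on synthetic data.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
video = rng.random((40, 64, 64))   # synthetic clip: (frames, height, width)

# Step 1: sample patch sequences at one spatial-temporal scale
# (8x8-pixel patches spanning 4 consecutive frames).
def sample_patch_sequences(vid, n=500, size=8, depth=4):
    t_max, h, w = vid.shape
    out = []
    for _ in range(n):
        t = rng.integers(0, t_max - depth)
        y = rng.integers(0, h - size)
        x = rng.integers(0, w - size)
        out.append(vid[t:t + depth, y:y + size, x:x + size].ravel())
    return np.array(out)

patches = sample_patch_sequences(video)            # shape: (500, 256)

# Step 2: independent component analysis on the patch sequences.
ica = FastICA(n_components=20, random_state=0)
sources = ica.fit_transform(patches)               # per-patch IC activations

# Step 3: cluster the component activations; each cluster plays the role of
# a candidate building block ("natural action structure") at this scale.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(sources)
print("cluster sizes:", np.bincount(labels))
```

In the paper's scheme this sampling/ICA/clustering procedure is repeated across multiple spatial-temporal scales, and combinations of clusters across scales define the structures used as classifier inputs; the single-scale sketch above only illustrates the shape of one pass.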