Reconciling Direct Realism?

March 7, 2008

Sometimes I sit in class and think about the nature of perception and reality. That sounds cliché, but I often find myself wondering whether I am really perceiving the professor as they give a lecture. What am I looking at? Am I merely perceiving representations, or ideas, in my head, or am I really looking at the external world? How can I reconcile the fact that visual information from the environment must be filtered through my nervous system before it is perceived with the sensation that I am looking directly at the world? On one hand, the representational theory of perception makes sense because there always seems to be a “gap” between my perception and reality, mediated by my sensory organs. On the other hand, it makes evolutionary sense that animals would develop a direct perceptual system in order to save cognitive resources. “Perception is cheap, representation is expensive.”

So what am I looking at when I perceive the world? Ideas in my head or real objects? James Gibson proposed the concept of the ambient optic array, which he thought dissolved these dualistic paradoxes. Light bounces all around the environment, reflecting off surfaces and textures and carrying information about them, eventually settling into invariant “visual angles”. It is the information in this ambient optic array that we perceive. We don’t perceive the world. We don’t perceive representations in our head, projected onto a Cartesian theater. We directly pick up information from the invariant visual angles of light in the ambient optic array.

This is a mind/body/world system. It is embedded and embodied. It is confusing to talk about sense-data stimulating the retina, and the brain “perceiving” this data, as if it were projected onto our cortex and the mind just mysteriously “reads” it. This leads to conceptual muddles such as mind/body dualism and the representational theory of perception. Gibson thought it made more sense to talk about an ecologically embedded perceptual system picking up information directly from the environment. The distinction between this information pickup and the representational theory of perception is subtle. The difference lies in the fact that the representational theory posits an impossible divide between the “internal” world of the mind and the “external” physical world. Somehow information crosses this metaphysical gap. Gibson thought it was much more parsimonious and evolutionarily sound to talk about perception in terms of direct pickup by a holistic agent in the environment. The information in the ambient optic array is structurally isomorphic to the firings of the nervous system, which is embedded in a whole body, capable of moving about in the world. By taking this ecological approach to perception, Gibson was able to drop the conceptual muddle of a “mind” perceiving ideas driven by the sense organs and replace it with a Self perceiving the environment through invariant structures in the light reflected in the environment. This is why the phenomenology of perception always puts the environment “out there”, in the world, as opposed to “inside” the internal chambers of the mind.



Pragmatic and Epistemic Action

February 20, 2008

In a 1994 paper entitled “On Distinguishing Epistemic From Pragmatic Action”, published in Cognitive Science, David Kirsh and Paul Maglio make a fascinating distinction between actions that change the world (pragmatic) and actions that change the nature of our mental tasks (epistemic). That sounds interesting, you say, but how did the researchers go about showing such a distinction? By playing Tetris! Or rather, by watching other people play Tetris.

I am sure almost all of you are familiar with the game Tetris, so I won’t bother going into too much detail describing how one plays it. Basically, various geometric shapes called “zoids” fall one at a time and you have to arrange them in rows. One is allowed to rotate the zoids to best fit them into the virtual environment. The key idea behind using Tetris as their methodological domain was that Tetris requires real-time, split-second interactive cognitive and perceptual performance. This allowed the researchers to tease out how people offload cognitive computation onto the external world in order to ease the difficulty of the mental task at hand. This sort of external manipulation is called epistemic action and, as I mentioned above, is distinguished from an action that merely seeks to change the state of the world. Epistemic actions improve cognition by doing the following:

  • Reducing the memory involved in mental computation, that is, space
    complexity;
  • Reducing the number of steps involved in mental computation, that is,
    time complexity;
  • Reducing the probability of error of mental computation, that is,
    unreliability.

    Kirsh and Maglio found that advanced Tetris players perform a variety of epistemic actions to reduce their internal computational effort. In contrast to less-advanced players who rotate the zoids in their head, advanced players would physically rotate the zoids. This seemingly simple action changes the way the mind handles the computational task of rotating the zoids in the game and thus allows the player to manipulate the virtual world with more reliability and speed.

    Such data suggest that standard theoretical frameworks in cognitive science might not be enough to explain the full extent to which humans use the external environment in ways that alter their mental landscape to improve cognitive performance. Instead of breaking up the world into a dualism of physical space and information-processing space, it might be more theoretically useful to have a more unified and fluid space where both pragmatic and epistemic actions can take place. This approach gives more credence to the idea that we are fundamentally in the world, embedded and embodied, with a perceptual and cognitive repertoire that doesn’t make hard and fast distinctions between the inner and outer realms.

    Reference:

    Kirsh, D., & Maglio, P. (1994). On Distinguishing Epistemic from Pragmatic Action. Cognitive Science: A Multidisciplinary Journal, Vol. 18, No. 4: pages 513-549



    Perception and Action: Context-specificity

    February 15, 2008

    Research by Adolph, Eppler, and Gibson on infant responses to slopes has provided insight into the interplay between perception and action. In the research, infants with different forms of mobility (crawlers or walkers) were encouraged to ascend and descend slopes of varying steepness. The walkers were wary of slopes of 20 degrees or more, whereas the crawlers fearlessly attempted them. As the crawlers gained experience, they learned to avoid descending the steeper slopes. However, when crawlers first began to walk, this avoidance pattern seemed to disappear and they again plunged down the steep slopes without hesitation.

    These results seem to indicate that the perceptual knowledge that infants gain about the world is action-specific. Infants do not learn about slopes in general; rather, they learn about slopes-and-crawling and then slopes-and-walking. Research along these lines paints a picture of perception as being for specific action-routines. Thus, theoretical frameworks in cognitive science should be geared towards “motocentricity” rather than “visuocentricity”. This re-conceptualization ties in with what James Gibson posited almost 30 years ago in his book The Ecological Approach to Visual Perception: that perception is tied to what we can do with perceptual information. Our perception of a chair is intimately coupled with the fact that chairs are for sitting. Gibson claimed that these perceptual affordances for action are directly perceived in the environment around us. So when an infant looks at a slope, he perceives that the slope affords falling. The only trouble is putting such information about the environment to use with the context-specific motor schemas available to the infant.

    References:

    Adolph, K. E., Eppler, M. A., & Gibson, E. J. (1993). Crawling versus Walking Infants’ Perception of Affordances for Locomotion over Sloping Surfaces. Child Development, Vol. 64, No. 4: pages 1158-1174


    “Unconscious perception” and the body schema

    January 13, 2008

    In this post over at Science and Consciousness Review, Stan Franklin discusses how the dorsal, or “where”, stream of visual perception is unconscious. In contrast to the ventral stream, which is responsible for the “what”, or object recognition, the dorsal stream is involved with spatial awareness for actions such as grasping.

    Franklin cites work from Goodale and Milner that studied a patient with an impaired dorsal stream. After careful experimentation, Goodale and Milner concluded that the dorsal stream is unconscious. Normally, I do not like discussing perception in terms of the conscious/unconscious dichotomy, but I thought this conclusion was interesting because of its implications for Shaun Gallagher’s preconscious body schema, which I discussed in this previous post. This body schema is a system of sensory-motor capacities that is responsible for such pre-reflective activities as walking, keeping upright posture, and other motor actions that we do more or less “unconsciously”, such as appropriately molding our grip to reach an object. So, with respect to the unconscious dorsal processing Franklin was discussing, I think the body schema is an appropriate theoretical construction that fits the evidence.


    Perceptual Learning

    November 4, 2007

    Perception is a form of action. It is a skill set that is incredibly useful for gathering information about the environment so that we can act in various ways. Some aspects of our perception are inborn, such as the ability to perceive faces as infants. Others are learned. In this post I want to discuss a famous anthropological case that clearly illustrates the reality of perceptual learning.

    In the late 1950s and early 1960s, Colin Turnbull spent time in the Ituri Forest in Congo studying the BaMbuti Pygmies. He spent the majority of his time observing their behavior as it occurred in its natural setting. He had a young (22 years old) Pygmy assistant named Kenge who acted as a guide. On one particular excursion between villages, Turnbull and Kenge came to the edge of a hill that had been cleared of trees. This clearing offered a view of the distant Ruwenzori Mountains. Normally, the Ituri Forest is extremely thick and such clearings are rare. Because of this, Kenge had never experienced a view over such vast distances. He asked if the mountains were hills or clouds. Turnbull offered to drive over to the mountains to see them more closely.

    On the drive over, it began raining and the visibility was reduced. Upon arriving at the foot of the mountains, Kenge was amazed at their size. He didn’t know what to make of their snowcaps. As they were leaving, a herd of buffalo grazing on the plain a couple of miles away was visible. Kenge asked Turnbull what kind of insects they were! Turnbull tried to inform Kenge that the buffalo were much bigger up close, but because Kenge had never learned the perceptual skill of size constancy, he was skeptical of such claims. Turnbull, of course, drove Kenge to the buffalo. As they were driving, the buffalo took up more and more of the optic array available to Kenge, and he asked Turnbull what sort of witchcraft was at work to make the buffalo grow in size. Over the next day or so, Kenge quickly learned the skill of perceptual size constancy and no longer made such optical errors.
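To make the geometry behind Kenge's "insects" concrete, here is a rough sketch of how the visual angle subtended by an object shrinks with distance, and how the same angle only yields the right size once distance is taken into account. The numbers are my own assumptions for illustration (a buffalo about 1.5 m tall, a viewing distance of roughly 3 km for "a couple of miles", a 1 cm insect at arm's length); they are not from Turnbull's account, and the function names are hypothetical.

```python
import math

def visual_angle_deg(size_m, distance_m):
    """Angle (in degrees) subtended at the eye by an object of a given size."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

def inferred_size_m(angle_deg, distance_m):
    """Physical size implied by a visual angle, if the distance is known."""
    return 2 * distance_m * math.tan(math.radians(angle_deg) / 2)

# Assumed values, for illustration only.
BUFFALO_HEIGHT_M = 1.5
INSECT_LENGTH_M = 0.01

# The same buffalo subtends a larger and larger angle as the car approaches.
for distance_m in (3000, 1000, 100, 10):
    angle = visual_angle_deg(BUFFALO_HEIGHT_M, distance_m)
    print(f"buffalo at {distance_m:>5} m subtends {angle:.3f} degrees")

# An insect at arm's length can subtend a larger angle than a distant buffalo.
print(f"insect at 0.5 m subtends {visual_angle_deg(INSECT_LENGTH_M, 0.5):.3f} degrees")
```

Visual angle alone does not specify size: a small thing nearby and a large thing far away can subtend similar angles. Size constancy amounts to combining the angle with distance information, which is the step `inferred_size_m` makes explicit and which the dense forest had never required of Kenge over such ranges.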

    This fascinating anthropological tale vividly illustrates how crucial exposure to a wide variety of environments is to developing an adaptable perceptual system. Kenge had grown up in the dense forest and had never been exposed to optical arrays of the environment that offered information about such great distances. This lack of distance information shaped the development of his brain in such a way as to make it quite shocking when he was finally exposed to such optical information 22 years into his life. It is a testament to the plasticity of the brain that he was able to adapt so quickly, and it illustrates how readily our perceptual systems learn when exposed to new environmental circumstances.


    Change blindness and Enactive Perception

    October 30, 2007

    Change blindness is an interesting phenomenon that has many implications for theories of vision. Click on the following link of an animated gif that shifts between two slightly different images and see how long it takes you to notice what is different between the two. Likely, it will take you longer than you expect.

    Click here for the change blindness example

    Why does this phenomenon exist? Change blindness challenges traditional “stream of vision” theories that describe vision as a process of building up detailed representations of the world around us. The phenomenon of change blindness shows us that this idea of “visual richness” is deeply flawed. Clearly, we need a new conceptual framework for perception that allows for the continuity of vision while accounting for the illusion of rich perceptual representations. As a matter of fact, I discussed such a theory in the previous post: that of James Gibson’s ecological perception.

    Another name for this approach to vision is enactive perception. As the name implies, this theory views perception as a form of action. Looking is an evolved skill, like everything else, and it is an exploratory process. Under this conceptual framework, change blindness can be explained in the following way: When we look at a picture, what we see is not a representation of reality. While this seems counterintuitive, it is quite obvious when you think about the fact that one never confuses a two-dimensional picture with the real world. Whenever you look at a picture, you are always aware that you are looking at a flat surface with information on it within the real three-dimensional world. What we see is rather a collection of information, not a representation or symbol of anything.

    It is this embedded information that we actively seek out whenever we look at a picture. So in the change blindness example, it takes time to fully explore and extract all the information in the picture. I want to really emphasize the process of exploration, because it is this active process that gives enactive perception its rich explanatory power. In addition to change blindness, enactive perception can potentially explain the phenomenon of inattentional blindness, where you fail to perceive what is available for you to see. These effects are usually explained in terms of a limited cognitive capacity for attention, but I believe these attentional theories need to be placed in a broader conceptual framework for perception, and this is where enactive perception comes in. Under this framework, attention, looking, and perception are all brought under the same conceptual umbrella: that of seeking out information offered to us by the environment.


    The Binding Problem-continued

    October 21, 2007


    Well, in my last post I kind of left my audience hanging, in that I discussed the binding problem but didn’t give any proposed solution. In this post I want to discuss and speculate on a possible answer to the question of how vision is bound together.

    Possibly the most well-known solution is Treisman’s feature-integration theory, an attention-based theory of the human visual system. In a nutshell, Treisman has proposed that when you attend to an object, the act of attending necessarily integrates all the salient features of the object; that is, the features are bound together. Furthermore, she postulates that “feature maps” in the parietal lobes of the brain are used to select the features being bound together for any particular object.

    Her theory can be tested, and has been tested, in the following way:

    Two white digits are briefly presented in the center of a computer screen, one of which is physically larger than the other; the subjects’ task is to report the larger of the two digits, a task that requires attention to be directed at the center of the screen. Simultaneously with the digits, two colored letters are briefly presented in the periphery, one of which is always an F or X accompanied by a distractor letter (such as an O). Thus, after reporting on the digits, subjects are asked which of the two target letters occurred (F or X) and, most importantly, the color in which that target was presented. If attention is required for binding, one might expect to observe “illusory conjunctions” in this paradigm such that subjects miscombine the features making up the two peripheral letters. And, in fact, that is just what is observed: when a red O and a yellow X are presented, for example, subjects often report seeing a red X more often than would be predicted by chance. (Hunt & Ellis, 2004)

    Furthermore, empirical support for Treisman’s theory has been found in patients who have sustained damage to the parietal lobes.

    When presented with multiple objects, a patient [who had sustained bi-lateral damage to the parietal lobes] could only report the individual features making up various objects; he was unable to correctly report which features belonged to the same object! (Hunt & Ellis, 2004)

    However, I’d like to point out that while there is a lot of empirical support for Treisman’s theory and various other cognitive/neural explanations, there is still an explanatory gap in the following way: as far as I am aware, no theory of vision that attempts to account for the binding problem adequately gives an evolutionary explanation. This is a problem because it seems logical for reasons of parsimony to assume that at some point in our evolutionary history salient features weren’t bound together, so in order to give a satisfactory answer to the binding problem, one must propose some sort of evolutionary pressure explaining how and why they got bound together. I will not go into details in this post, but I speculate that one can get around this “why” problem if one has a wider conceptual framework to substantiate “brain bound” theories such as Treisman’s. Without better conceptual frameworks, these neural theories of perception will necessarily have limited explanatory power, despite being supported by empirical evidence.

    References:
    Hunt, R., & Ellis, H. (2004). Fundamentals of Cognitive Psychology (7th ed.). McGraw-Hill Higher Education.


    The Binding Problem of Perception

    October 19, 2007

    Imagine looking at an apple. If asked to describe it, you would probably begin describing its various features: its color, brightness, hue, shape, size, texture, spatial location, etc. Now suppose someone asked you where all these features are located. You would probably give them a funny look. Isn’t it obvious? Out there! In the apple! They are the apple! That is what an apple is: a conglomeration of various features integrated into one continuous percept. This seems like a perfectly sensible explanation, but if this obstinate person continues his line of questioning and asks you where this apple-percept is located in the brain, you might have to crack open a textbook and get back to him, because now the answer is not so simple.

    After studiously poring over the latest research on visual processing, you are finally ready to give the questioner an answer: nowhere. Simply put, the various features that make up an “apple” are represented in a highly complicated manner across a dazzlingly diverse array of brain tissue. For the sake of simplicity, brain researchers often distinguish between two primary information pathways that sensory data takes: the what and where streams. These two streams form the basis of an exemplary conceptual framework for how the brain processes various features of the objects around us to form a more-or-less continuous percept of objects such as apples. It is these continuous percepts that allow us to manipulate objects and verbally describe them accurately.

    So, if all these apple-features are neurally processed in a separated fashion, why do they appear to be bound together? One obvious answer is that they are bound together for the sake of convenience for the perceiver; otherwise, how could a person act in a meaningful way? In order to pick up an apple to eat it, a subject must have a more-or-less continuous perception of all the various apple-features integrated into a single spatio-temporal location. This seems to make sense, but who or what are these high-order representations being convenient for? Why would neurons care if things are bound together or not? Surely, from an evolutionary perspective, where efficiency and survival are involved, doing all this extra processing to bind all these neurally separate features together for some subject seems bizarre. Who is this perceiving subject, and why would the brain decide to stop the important business of surviving and begin integrating all these salient features into a continuous perception? There needs to be a perceiver for there to be a perception, and in order for there to be a perceiver, there needs to be some sort of integrated perception. It seems as if we are at a chicken-and-egg impasse, which makes the phenomenon of unified perception a problem for students of the mind.

    There have been many proposed solutions to the binding problem over the years, but they are beyond the scope of this post, in which I only wanted to outline the problem. Whether or not I will attempt to discuss any of these solutions is yet to be determined as of now, but I wouldn’t be surprised if it was the focus of a forthcoming post. Sorry to leave you hanging!


    Perception versus Representation

    October 15, 2007


    The world is its own best model…why? We can put the answer in another slogan that [AI researcher] Brooks would probably like: Perception is cheap, representation expensive. Such a slogan might surprise many AI workers, who are acutely aware of how difficult pattern recognition can be. But the point is that good enough perception is cheaper than good enough representation, where that means “good enough” to avoid serious errors. The trouble with representation is that, to be good enough, it must be relatively complete and relatively up to date, both of which are costly in a dynamic environment. Perception, by contrast, can remain happily ad hoc, dealing with concrete questions only as they arise. To take a homely example, it would be silly, for most purposes, to try and keep track of what shelf everything in the refrigerator is currently on; if and when you want something, just look.

    This quotation is from chapter nine of John Haugeland’s Having Thought: Essays in the Metaphysics of Mind. In the chapter, Haugeland argues against the “interrelationist” account of the mind and the body, which accepts the premise:

    that the mental, or at any rate the cognitive, has some essential feature, such as intentionality or normativity, and then argue that this feature is impossible except through participation in some supra-individual network of relations.

    interrelationist arguments are holistic in the specific sense that they take cognitive phenomena to be members of some class of phenomena, each of which has its relevant character only by virtue of its determinate relations to the others, that relevant character being, in effect, nothing other than its “place” in the larger pattern or whole.

    As a competing theoretical framework, Haugeland offers what he calls the “intimacy of the mind’s embodiment and embeddedness in the world…[with the] term “intimacy” suggesting more than just necessary interrelation or interdependence but a kind of commingling or integralness.”

    This embodiment relates to my earlier entry on the MIT Cog project, which follows the general thesis outlined by Haugeland: that intelligence does not depend on a “furniture of information” or “complex symbol structures that are, in many respects, just like the contents of the traditional Cartesian mind”, but rather on the “concrete details of the agent’s embodiment and worldly situation”.

    This embodied and embedded approach to perception has probably been most controversially put forward by psychologist James J. Gibson. Haugeland quotes him as follows:

    the words animal and environment make an inseparable pair. Each term implies the other. No animal could exist without an environment surrounding it. Equally, although not so obvious, an environment implies an animal (or at least an organism) to be surrounded. This means that the surface of the earth, millions of years before life developed on it, was not an environment, properly speaking.

    Thus, in this sense, Haugeland says that we can understand animals (including humans) as “perceivers” if we consider them “inseparably related to an environment, which is itself understood in terms appropriate to that animal”.

    So for humans, in order to understand perception, we must understand our complex, relational tie to the environment, which necessarily includes our dynamic socialization schemas. Furthermore, Gibson claims that in order to understand this perception of the environment, we must take into account the affordances of the environment, or what it offers to the animal. Thus, “the central question for the theory of affordances is not whether they exist and are real but whether information is available in ambient light for perceiving them”.

    This perception of affordances puts the first quotation of this post in context. The world is its own best model, and our perception of it is in relative (embodied and embedded) terms with the environment.


    Thoughts on perception

    September 24, 2007

    [Image: painting by Esref Armagan]

    This is a painting done by a congenitally blind artist named Esref Armagan.

    The most obvious question is how his brain is able to perform such feats of perspective, but as the article mentions, it is well understood that “blind people… understand and can draw in three dimensions”.

    With that said, I think this particular “how” question is easily answered with modern paradigms of neural plasticity, pruning, etc.

    I believe the more puzzling question to ask is not how he can perform such tasks, but rather, how his developing brain learned to generate high-order representations of three-dimensional space to such a phenomenal degree of accuracy.

    Well, “I was taught, he says. Not by any formal teacher, but by casual comments by friends and acquaintances.”

    This remark, combined with the fact that “it is impossible to know if he had some vision as an infant”, makes it difficult to extract any conclusive insights about whether his “mind’s eye” is mapped out purely in non-visual sensory terms (the primary contributors likely being kinesthetic and proprioceptive), and his incredible “accuracy” is simply the result of his early peers subtly nudging him back and forth until he got it “right”.

    An alternative answer is that his brain got some “extra” reinforcement from crude visual data before his eye completely degenerated, so that his inner conceptual space is not in purely non-visual terms, as the article suggests. This would also explain why his degree of accuracy is greater than that of most other congenitally blind people.

    These are difficult but fascinating problems in the psychology of perception. Regardless of Mr. Armagan’s unique skills, there seems to be an emerging consensus from all psychological disciplines that whatever “perception” is, it is realizable across many different modalities.