Many philosophers have used visual illusions as support for a representational theory of visual experience. The basic idea is that sensory input from the environment is too ambiguous for the brain to figure out anything on the basis of sensory evidence alone. To deal with this ambiguity, theorists have conjectured that the brain generates a series of predictions or hypotheses about the world based on the continuously incoming evidence and its accumulated knowledge (known as “priors”). On this theory, the nature of visual experience is explained by saying that what we experience is really just the prediction. So in the visual illusion above, the brain guesses that the B square is a lighter color and therefore we experience it as lighter. The brain guesses this because its stored memory contains information about typical configurations of checkered squares under typical kinds of illumination. On this standard view, all of visual experience is a big illusion, like a Matrix-style virtual reality.
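To make the "prediction from priors" story a bit more concrete, here is a minimal sketch in Python of how such a guess is usually modeled: as Bayes' rule over competing hypotheses. Everything in it is an assumption made for illustration. The two hypotheses, the prior of 0.9, and the equal likelihoods are invented numbers, not measurements, and nothing here is meant as a claim about what neurons actually compute.

```python
# Hypothetical numbers throughout; nothing here is measured data.

def posterior(priors, likelihoods):
    """Apply Bayes' rule over a discrete set of hypotheses."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Two hypotheses about square B that send the same luminance to the eye.
priors = {
    # Assumed: the scene context strongly suggests a cast shadow.
    "light square in shadow": 0.9,
    "dark square in direct light": 0.1,
}

# The sensory evidence is ambiguous: both hypotheses explain it equally well.
likelihoods = {
    "light square in shadow": 0.5,
    "dark square in direct light": 0.5,
}

beliefs = posterior(priors, likelihoods)
best_guess = max(beliefs, key=beliefs.get)
print(best_guess)
```

Because the evidence does not discriminate between the two hypotheses, the prior alone settles the guess. That is the sense in which, on the standard view, what we experience is "really just the prediction".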
Lately I have been deeply interested in thinking about these notions of “guessing” and “prediction”. What does it mean to say that a collection of neurons predicts something? How is this possible? What does it mean for a collection of neurons to make a hypothesis? I am worried that in using these notions as our explanatory principle, we risk the possibility that we are simply trading in metaphors instead of gaining true explanatory power. So let’s examine this notion of prediction further and see if we can make sense of it in light of what we know about how the brain works.
One thought might be that predictions or guesses are really just kinds of representations. To perceive the B square as lighter is just for your brain to represent it as lighter. But what could we mean by representation? One idea comes from Jeff Hawkins’s book On Intelligence. He talks about representations in terms of invariancy. For Hawkins, the concepts of representation and prediction are inevitably tied to memory. To see why, consider my perception of my computer chair. I can see and recognize that my chair is my chair from a variety of visual angles. My brain stores a memory of what my chair looks like, and the different visual angles provide evidence that matches that stored memory. The key is that my high-level memory of my chair is invariant with respect to its visual features. But at lower levels of visual processing, the neurons are tuned to respond only to low-level visual features. So some low-level neurons fire only in response to certain angles or edge configurations, and from a different visual angle those neurons might not respond at all. But at higher levels of visual processing, there must be some neurons that fire regardless of the visual angle because their level of response invariancy is higher. So my memory of the chair really spans a hierarchy of levels of invariancy. At the highest levels of invariancy, I can even predict the chair when I am not in the room. So if I am about to walk into my office, I can predict that my chair will be on the right side of the room. If I walked in and my chair was not on the right side, I would be surprised and I’d have to update my memory with a new pattern.
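The hierarchy of invariancy described above can be illustrated with a toy sketch in Python. It is only a cartoon under stated assumptions: the five preferred angles, the 20-degree tolerance, and the rule that the high-level unit fires whenever any unit in its pool fires are all hypothetical choices of mine, not Hawkins's actual model.

```python
def low_level_responses(view_angle, preferred_angles, tolerance=20):
    """Each low-level unit fires (1) only when the view is near its preferred angle."""
    return [1 if abs(view_angle - p) <= tolerance else 0 for p in preferred_angles]

def high_level_response(low_responses):
    """The high-level unit fires if any low-level unit in its pool fires."""
    return max(low_responses)

# Hypothetical tuning of five low-level units, in degrees.
preferred_angles = [0, 45, 90, 135, 180]

# View the chair from three different angles.
for angle in (10, 80, 170):
    low = low_level_responses(angle, preferred_angles)
    print(angle, low, high_level_response(low))
```

A different low-level unit responds at each viewing angle, while the high-level unit responds every time. In this toy setting, that stable response is what it means for the high-level "chair" memory to be invariant with respect to the visual features.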
On this account, representation and prediction are intimately tied to our memory, our stored knowledge of reality that helps us make predictions to better cope with our lives. But what is memory really? If we are going to be neurally realistic, it seems like it is going to have to be cashed out in terms of various dispositions of brain cells to react in certain ways. So memory is the collective dispositions of many different circuits of brain cells, particularly their synaptic activities. Dispositions can be thought of as mechanical mediations between input and output. Invariancies can thus be thought of as invariancies in mediation. Low-level mediation varies with the fine-grained features of the input. High-level mediation varies less with fine-grained detail. What does this tell us about visual experience? I believe the mediational view of representation offers an alternative account of illusions.
I am still working out the details of this idea, so bear with me. My current thought is that the brain’s “guess” that square B is lighter can be understood dispositionally rather than intentionally. Let’s imagine that we reconstruct the 2D visual illusion in the real world, so that we experience the same illusion that the B square is lighter. What would it mean for my brain to make this prediction? Well, on the dispositional view, it would mean that in making such a prediction my brain is essentially saying “If I go over and inspect that square some more, I should expect it to be lighter”. If you actually did go inspect the square and found it is not a light square, you would have to update your memory store. However, visual illusions persist even after such a high-level update: the square still looks lighter once you know that it is not. This is because the entirety of the memory store for low-level visual processing overrides the meager alternate prediction generated at higher levels.
What about qualia? The representational view says that the qualitative features of the B square result from the square being represented as lighter. But if we understand representations as mediations, we see that representations don’t have to be these spooky things with strange properties like “aboutness”. Aboutness is just cashed out in terms of specificity of response. But the problem of qualia is tricky. In a way I kind of think the “lightness” of the B square is just an illusion added “on top” of a more or less veridical acquaintance. So I feel like I should resist inferring from this minor illusory augmentation that all of my visual experience is massively illusory in this way. Instead, I think we could see the “prediction” of the B square as lighter as a kind of augmentation of mediation. The brain augments the flow of mediations such that if this illusion were a real scene and someone asked you to “go step on all the light squares”, you would step on the B square. For this reason, I think the phenomenal impressiveness of the illusion is amplified by its 2Dness. If it were a 3D scene, the “prediction” would take the form of possible continuations of mediated behavior in response to a task demand (e.g. finding light squares). But because it’s a 2D image, the “qualia” of the B square being light take on a special form, pressing themselves upon us as a “raw visual feel” of lightness that on the surface doesn’t seem to be linked to behavior. But I think if we understand the visual hierarchy of invariant mediation, and the ways in which the higher and lower levels influence each other, we don’t need to conclude that all visual experience is massively illusory because we live behind a Kantian screen of representation. Understanding brain representations as mediational rather than intentional helps us strip the Kantian image of its persuasive power.