Deep Networks in the Brain

Author: Mary Clark
Vicente L. Malave

January 26, 2012

A quantitative theory of immediate visual recognition

Figure from [Serre et al., 2007]

I’m going to present the model, talk about where the data come from, and be critical of some of the claims.

Most of this is not in the paper, but I want you to put this in context, be aware of some things omitted from the paper, and to be more critical when someone claims to know how the brain works.

Outline

- Early Visual System : Simple and Complex Cells
- The Model
- Electrophysiological evidence for early object recognition.
- Claim: Linear Classifiers indicate that information is represented in IT cortex
- A V1 strawman.
- Conclusions

Visual System

Figure from [Felleman and Van Essen, 1991]

What Serre et al. think the Visual System Does

In summary, the accumulated evidence points to four, mostly accepted, properties of the feed-forward path of the ventral stream architecture: (a) a hierarchical build-up of invariances first to position and scale and then to viewpoint and other transformations; (b) an increasing selectivity, originating from inputs from previous layers and areas, with a parallel increase in both the size of the receptive fields and in the complexity of the optimal stimulus; (c) a basic feedforward processing of information (for immediate recognition tasks); and (d) plasticity and learning probably at all stages with a time scale that decreases from V1 to IT and PFC.

It’s often claimed that object recognition is feedforward.

Figure from [Thorpe and Fabre-Thorpe, 2001]

Feedforward Object Recognition

- Very fast (but how fast, exactly?)
- No eye movements
- No attention [Li et al., 2002]

Simple Cells

Most sparse coding papers contain this figure, and an argument that it’s similar to what V1 cells do.

Figure from [Olshausen and Field, 1996]

Recording a receptive field

(tell the Hubel and Wiesel story, show the video clip) http://www.youtube.com/watch?v=KE952yueVLA&feature=related

Simple Cells

Figure from [Hubel and Wiesel, 1962]

Complex Cells

Complex cells have some mild invariance.

Figure from [Hubel and Wiesel, 1962]

Simple Cells can be Modeled by Gabors

The Gabor filter [Daugman, 1985] is a good model of what simple cells do.

Figure from [Jones and Palmer, 1987]
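To make the Gabor model concrete, here is a minimal sketch of a 2D Gabor kernel used as a simple-cell receptive field. The parameter names and values (size, wavelength, sigma, gamma) are illustrative assumptions, not the fits reported by Jones and Palmer, and the linear dot product stands in for a real neuron's response.

```python
import numpy as np

def gabor_kernel(size=21, wavelength=8.0, theta=0.0, sigma=4.0, gamma=0.5, phase=0.0):
    """2D Gabor: a Gaussian envelope multiplied by a sinusoidal carrier.

    All parameter values are hypothetical; size is assumed to be odd.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier varies along direction theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + phase)
    kernel = envelope * carrier
    return kernel - kernel.mean()  # zero mean: no response to uniform light

def grating(size, wavelength, theta):
    """Sinusoidal grating stimulus on the same pixel grid as the kernel."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return np.cos(2.0 * np.pi * (x * np.cos(theta) + y * np.sin(theta)) / wavelength)

def simple_cell_response(patch, kernel):
    """Linear receptive-field model: dot product of image patch and kernel."""
    return float(np.sum(patch * kernel))

kernel = gabor_kernel(theta=0.0)
preferred = simple_cell_response(grating(21, 8.0, 0.0), kernel)
orthogonal = simple_cell_response(grating(21, 8.0, np.pi / 2), kernel)
print(preferred, orthogonal)
```

The model cell responds strongly to a grating at its preferred orientation and only weakly to the orthogonal one, which is the orientation tuning the figures illustrate.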

Simple Cells can be Modeled by Gabors

Figure from [Jones and Palmer, 1987]

Simple Cells can be Modeled by Gabors

Figure from [Jones and Palmer, 1987]

Being a little more formal: Reverse Correlation

For more detail on how to measure neural tuning, see [Wu et al., 2006, Dayan and Abbott, 2001]
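As a rough illustration of reverse correlation, the sketch below simulates a neuron and recovers its receptive field as the spike-triggered average of white-noise stimuli. The linear-rectified Poisson neuron, the 1D filter, and all sizes are assumptions made for the example; real experiments involve time lags and stimulus correlations that are ignored here.

```python
import numpy as np

def simulate_ln_neuron(stimuli, true_filter, rng):
    """Hypothetical neuron: linear filter, rectification, Poisson spike counts."""
    drive = stimuli @ true_filter
    rate = np.maximum(drive, 0.0)
    return rng.poisson(rate)

def spike_triggered_average(stimuli, spike_counts):
    """Average of the stimuli, weighted by how many spikes each one evoked."""
    total = spike_counts.sum()
    if total == 0:
        raise ValueError("no spikes recorded")
    return (spike_counts[:, None] * stimuli).sum(axis=0) / total

rng = np.random.default_rng(0)
positions = np.arange(32) - 15.5
# Assumed ground-truth receptive field: a 1D Gabor with unit norm.
true_filter = np.exp(-positions ** 2 / (2 * 4.0 ** 2)) * np.cos(2 * np.pi * positions / 10.0)
true_filter /= np.linalg.norm(true_filter)

stimuli = rng.standard_normal((20000, 32))  # Gaussian white noise frames
spikes = simulate_ln_neuron(stimuli, true_filter, rng)
sta = spike_triggered_average(stimuli, spikes)
similarity = np.corrcoef(sta, true_filter)[0, 1]
print(similarity)
```

With Gaussian white noise the spike-triggered average is proportional to the linear filter, so the recovered profile should correlate strongly with the assumed one.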

The Model

How to build complex cells

Hubel and Wiesel proposed that you can build invariances by combining a set of input functions.

Figure from [Hubel and Wiesel, 1962]

A quantitative theory of immediate visual recognition

The model is an attempt to push this idea as far as it can go.

Figure from [Serre et al., 2007]

Simple Cells : Radial Basis Functions

Radial Basis Function [Bishop, 1995] is:

$$ y = \exp\left( -\frac{1}{2\sigma^2} \sum_{j=1}^{N} (w_j - x_j)^2 \right) \tag{1} $$

Complex Cells : Max

$$ y = \max_{j=1,\ldots,N} x_j \tag{2} $$
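The two operations above, (1) and (2), can be sketched in a few lines. This is a toy 1D illustration under assumed values (the template, the signal length, and sigma are all hypothetical), not the model's actual layers: S units apply the radial basis function at each position, and a C unit takes the max over them.

```python
import numpy as np

def s_unit(x, w, sigma=1.0):
    """Operation (1): Gaussian tuning of input x around the stored template w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.exp(-np.sum((w - x) ** 2) / (2.0 * sigma ** 2)))

def c_unit(afferents):
    """Operation (2): max over the afferent S-unit responses."""
    return float(np.max(afferents))

def s_layer(signal, w, sigma=1.0):
    """Apply the same S unit at every position of a 1D signal."""
    n = len(w)
    return np.array([s_unit(signal[i:i + n], w, sigma)
                     for i in range(len(signal) - n + 1)])

template = np.array([1.0, 2.0, 1.0])   # hypothetical preferred feature
signal_a = np.zeros(10)
signal_a[2:5] = template               # feature at position 2
signal_b = np.zeros(10)
signal_b[6:9] = template               # same feature shifted to position 6

s_a = s_layer(signal_a, template)
s_b = s_layer(signal_b, template)
c_a, c_b = c_unit(s_a), c_unit(s_b)
print(s_a.argmax(), s_b.argmax(), c_a, c_b)
```

The S-layer responses differ between the two inputs because the feature moved, while the C unit gives the same output for both. Selectivity comes from the tuning operation and position tolerance from the max.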

A quantitative theory of immediate visual recognition

The model is an attempt to push this idea as far as it can go.

Figure from [Serre et al., 2007]

How do they learn the parameters? [Serre et al., 2005]

This paper’s main claim is that the model reproduces many of the experimental results. How good are these experiments?

Electrophysiological evidence for early object recognition

Speed of Processing in the human visual system

The main result (in humans) comes from [Thorpe et al., 1996]. First, I’ll explain the method.

Clearly the upper bound on processing time is 445 ms.

Figure from [Thorpe et al., 1996]

Figure from [Luck, 2005]

Fast Recognition

The ERPs are different after about 150 milliseconds, at a frontal electrode.

Figure from [Thorpe et al., 1996]
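For readers unfamiliar with the ERP method, here is a minimal sketch of how a divergence time between two conditions might be estimated: average the epochs in each condition, then find the first time at which the averages differ reliably. The data are simulated, and the threshold, run length, noise level, and 150 ms effect onset are all assumptions chosen for the example; this is not the analysis of Thorpe et al.

```python
import numpy as np

def erp(epochs):
    """Average across trials; epochs has shape (n_trials, n_timepoints)."""
    return epochs.mean(axis=0)

def divergence_onset(epochs_a, epochs_b, times, t_threshold=4.0, min_run=5):
    """First time at which |t| stays above threshold for min_run samples."""
    diff = erp(epochs_a) - erp(epochs_b)
    se = np.sqrt(epochs_a.var(axis=0, ddof=1) / epochs_a.shape[0]
                 + epochs_b.var(axis=0, ddof=1) / epochs_b.shape[0])
    above = np.abs(diff / se) > t_threshold
    for i in range(above.size - min_run + 1):
        if above[i:i + min_run].all():
            return float(times[i])
    return None  # the two conditions never diverge reliably

rng = np.random.default_rng(1)
times = np.arange(-100, 400, 2)          # ms relative to stimulus onset
n_trials, noise_sd = 200, 5.0            # hypothetical trial count and noise (microvolts)
effect = 4.0 * (times >= 150)            # simulated difference starting at 150 ms

targets = noise_sd * rng.standard_normal((n_trials, times.size))
nontargets = noise_sd * rng.standard_normal((n_trials, times.size)) + effect
onset = divergence_onset(targets, nontargets, times)
print(onset)
```

Note that the analysis only detects that the two averages differ at some latency. It says nothing about why they differ.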

Here’s the problem. We know some EEG activity is stimulus-driven: it could have nothing to do with the behavior.

The stimuli

Thorpe used images similar to those in experiment 1.

Figure from [Johnson and Olshausen, 2003]

Spatial Frequency

There is a huge difference in spatial frequency between these categories. Could it be driving the response?

Figure from [Johnson and Olshausen, 2003]
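One way to check for a spatial-frequency confound is to compare radially averaged amplitude spectra across image categories. The sketch below does this for two synthetic stand-in images (white noise and a smoothed copy), since the actual stimuli are not available here; the bin count, the smoothing, and the "high-frequency fraction" summary are assumptions made for illustration.

```python
import numpy as np

def radial_amplitude_spectrum(image, n_bins=16):
    """Mean Fourier amplitude in rings of spatial frequency (cycles/pixel)."""
    image = image - image.mean()
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    fy = np.fft.fftshift(np.fft.fftfreq(image.shape[0]))
    fx = np.fft.fftshift(np.fft.fftfreq(image.shape[1]))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    which = np.digitize(radius.ravel(), edges) - 1
    spectrum = np.zeros(n_bins)
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            spectrum[b] = amplitude.ravel()[mask].mean()
    return edges[:-1], spectrum

def high_frequency_fraction(image, n_bins=16):
    """Share of the ring-averaged amplitude that lies in the upper half of the bins."""
    _, spectrum = radial_amplitude_spectrum(image, n_bins)
    return float(spectrum[n_bins // 2:].sum() / spectrum.sum())

def smooth(image, passes=3):
    """Crude low-pass filter: repeated averaging with the four neighbours."""
    out = image.copy()
    for _ in range(passes):
        out = (out + np.roll(out, 1, 0) + np.roll(out, -1, 0)
               + np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 5.0
    return out

rng = np.random.default_rng(3)
noisy_image = rng.standard_normal((64, 64))   # stand-in for a "busy" category
smooth_image = smooth(noisy_image)            # stand-in for a "smooth" category
frac_noisy = high_frequency_fraction(noisy_image)
frac_smooth = high_frequency_fraction(smooth_image)
print(frac_noisy, frac_smooth)
```

If two stimulus categories differ on a summary like this, an early ERP difference between them could reflect low-level image statistics and not recognition.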

What did they actually show?

Condition A: Animal, Target
Condition B: Natural, Nontarget
Condition C: Natural, Target
Condition D: Animal, Nontarget

Fast Recognition ?

Figure from [Johnson and Olshausen, 2003]

Fast Recognition ?

There are two components here: a fast, time-locked, stimulus-driven part, and a slower component which co-varies with reaction time.

Figure from [Johnson and Olshausen, 2003]

Caltech 101

This is not just a psychology problem: in computer vision there can be dataset problems too.

(Mean image of Caltech 101.) [Ponce et al., 2006]

Claim: Linear Classifiers indicate that information is represented in IT cortex

Another key paper is [Hung et al., 2005]: in inferior temporal cortex, a linear classifier can read out object identity after 200 milliseconds.

Figure from [Hung et al., 2005]

Figure from [Hung et al., 2005]
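To show what "linear readout" means in practice, here is a minimal sketch with simulated population responses. The tuning model, noise level, population size, and the ridge-regularized least-squares classifier are all assumptions for the example; Hung et al. used recorded IT activity and their own classifier setup.

```python
import numpy as np

def simulate_population(n_objects, n_neurons, n_trials, noise_sd, rng):
    """Hypothetical population: each object has a mean response vector plus noise."""
    tuning = rng.standard_normal((n_objects, n_neurons))
    labels = np.repeat(np.arange(n_objects), n_trials)
    noise = noise_sd * rng.standard_normal((labels.size, n_neurons))
    return tuning[labels] + noise, labels

def fit_linear_readout(X, y, n_classes, ridge=1e-3):
    """One-vs-all linear classifier fit by ridge-regularized least squares."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    targets = np.eye(n_classes)[y]                  # one-hot labels
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ targets)

def predict(W, X):
    """Pick the class whose linear output is largest."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)

rng = np.random.default_rng(2)
X, y = simulate_population(n_objects=4, n_neurons=50, n_trials=60, noise_sd=1.0, rng=rng)
order = rng.permutation(y.size)
train, test = order[: y.size // 2], order[y.size // 2:]

W = fit_linear_readout(X[train], y[train], n_classes=4)
accuracy = float(np.mean(predict(W, X[test]) == y[test]))
print(accuracy)
```

Above-chance accuracy on held-out trials shows that object identity is linearly available in the simulated responses. Whether the brain actually uses that information is a separate question.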

The conclusions, which the model can actually account for

- Linear classifiers can read out object identity from inferior temporal cortex (C2b in the model).
- Invariance to position and scaling

Figure from [Serre et al., 2007]

Are these good stimuli?

Object images were not normalized for mean gray level, contrast or other basic image properties. It is possible to partially read out object category based on some of these simple image properties (1).

“Only some spatial patterns of fMRI response are read out in task performance.”

Figure from [Williams et al., 2007]

Behavioral Validation

To the authors’ credit, the model also matches human behavior when using C2b units, but not earlier ones.

Figure from [Serre et al., 2007]

A V1 strawman

How well can you do with V1-like features?

Figure from [Pinto et al., 2008]

Maybe you’re solving the wrong problem?

Figure from [Pinto et al., 2008]

Conclusions

What Serre et al. think the Visual System Does

In summary, the accumulated evidence points to four, mostly accepted, properties of the feedforward path of the ventral stream architecture: (a) a hierarchical build-up of invariances first to position and scale and then to viewpoint and other transformations; (b) an increasing selectivity, originating from inputs from previous layers and areas, with a parallel increase in both the size of the receptive fields and in the complexity of the optimal stimulus; (c) a basic feedforward processing of information (for immediate recognition tasks); and (d) plasticity and learning probably at all stages with a time scale that decreases from V1 to IT and PFC.

Recommended Reading on Invariance:

- [DiCarlo and Cox, 2007]
- [Rust and Stocker, 2010]
- [Kravitz et al., 2008, Kravitz et al., 2010]

Recommended reading on Hierarchical theories:

- Tai-Sing Lee (CMU) [Lee and Mumford, 2003, Lee and Yuille, 2006]

Other things you could read:

- [Friston, 2008, Friston, 2009]
- [George and Hawkins, 2005, Hawkins and Blakeslee, 2005]

If nothing else, please keep in mind that whatever the neuroscientists tell you is an inference, and they can often be wrong.

When you read (or write) a sparse coding paper, ask yourself, am I making quantitative claims?

There are some papers on the limits of classifiers compared to other techniques [Serences and Saproo, 2011, Naselaris et al., 2010, Kriegeskorte, 2011]. I’m writing a better one and I’d love to talk about it.

Questions

- Feedforward object recognition?
- A linear classifier isn’t enough; the data have to be behaviorally useful.
- How can you rigorously say your sparse code looks like V1?
- Is object recognition invariant?
- What dataset should we use? (easy to criticize, hard to solve)

Bishop, C. (1995). Neural networks for pattern recognition.
Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A: Optics and Image Science, 2:1160–1169.
Dayan, P. and Abbott, L. (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems.
DiCarlo, J. and Cox, D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8):333–341.
Felleman, D. and Van Essen, D. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1.
Friston, K. (2008). Hierarchical models in the brain. PLoS Computational Biology, 4(11):e1000211.
Friston, K. (2009). The free-energy principle: a rough guide to the brain? Trends in Cognitive Sciences, 13(7):293–301.
George, D. and Hawkins, J. (2005). A hierarchical Bayesian model of invariant pattern recognition in the visual cortex. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks (IJCNN’05), volume 3, pages 1812–1817. IEEE.
Hawkins, J. and Blakeslee, S. (2005). On intelligence. Owl Books.
Hubel, D. and Wiesel, T. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160(1):106.
Hung, C., Kreiman, G., Poggio, T., and DiCarlo, J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science, 310(5749):863.
Johnson, J. and Olshausen, B. (2003). Timecourse of neural signatures of object recognition. Journal of Vision, 3(7).
Jones, J. and Palmer, L. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6):1233.
Kravitz, D., Kriegeskorte, N., and Baker, C. (2010). High-level visual object representations are constrained by position. Cerebral Cortex, 20(12):2916.
Kravitz, D., Vinson, L., and Baker, C. (2008). How position dependent is visual object recognition? Trends in Cognitive Sciences, 12(3):114–122.
Kriegeskorte, N. (2011). Pattern-information analysis: From stimulus decoding to computational-model testing. NeuroImage.
Lee, T. and Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. JOSA A, 20(7):1434–1448.
Lee, T. and Yuille, A. (2006). Efficient coding of visual scenes by grouping and segmentation. Bayesian Brain: Probabilistic Approaches to Neural Coding, MIT Press, Cambridge, MA, pages 145–188.
Li, F., VanRullen, R., Koch, C., and Perona, P. (2002). Rapid natural scene categorization in the near absence of attention. Proceedings of the National Academy of Sciences, 99(14):9596.
Luck, S. (2005). An introduction to the event-related potential technique. MIT Press.
Naselaris, T., Kay, K., Nishimoto, S., and Gallant, J. (2010). Encoding and decoding in fMRI. NeuroImage.
Olshausen, B. and Field, D. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609.
Pinto, N., Cox, D., and DiCarlo, J. (2008). Why is real-world visual object recognition hard? PLoS Computational Biology, 4(1):e27.
Ponce, J., Berg, T., Everingham, M., Forsyth, D., Hebert, M., Lazebnik, S., Marszalek, M., Schmid, C., Russell, B., Torralba, A., et al. (2006). Dataset issues in object recognition. Toward category-level object recognition, pages 29–48.
Rust, N. and Stocker, A. (2010). Ambiguity and invariance: two fundamental challenges for visual processing. Current Opinion in Neurobiology, 20(3):382–388.
Serences, J. and Saproo, S. (2011). Computational advances towards linking BOLD and behavior. Neuropsychologia.
Serre, T., Kouh, M., Cadieu, C., Knoblich, U., Kreiman, G., and Poggio, T. (2005). A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex. Technical Report CBCL Paper #259/AI Memo #2005-036, Massachusetts Institute of Technology, Cambridge, MA.
Serre, T., Kreiman, G., Kouh, M., Cadieu, C., Knoblich, U., and Poggio, T. (2007). A quantitative theory of immediate visual recognition. Progress in Brain Research, 165:33–56.
Thorpe, S. and Fabre-Thorpe, M. (2001). Seeking categories in the brain. Science, 291(5502):260.
Thorpe, S., Fize, D., Marlot, C., et al. (1996). Speed of processing in the human visual system. Nature, 381(6582):520–522.
Williams, M., Dang, S., and Kanwisher, N. (2007). Only some spatial patterns of fMRI response are read out in task performance. Nature Neuroscience, 10(6):685–686.
Wu, M., David, S., and Gallant, J. (2006). Complete functional characterization of sensory neurons by system identification. Annu. Rev. Neurosci., 29:477–505.