Torr Vision Group, Engineering Department
Semantic Image Segmentation with Deep Learning
Sadeep Jayasumana, 07/10/2015
Collaborators: Bernardino Romera-Paredes, Shuai Zheng, Philip Torr
Live Demo - http://crfasrnn.torr.vision/
Outline
• Semantic segmentation
• Why?
• CNNs for pixel-wise prediction
• CRFs
• CRF as RNN
• Conclusion
Semantic Segmentation
• Recognizing and delineating objects in an image
• Classifying each pixel in the image
Why Semantic Segmentation? • To help partially sighted people by highlighting important objects in their glasses
Why Semantic Segmentation? • To let robots segment objects so that they can grasp them
Why Semantic Segmentation?
• Road scene understanding
• Useful for autonomous navigation of cars and drones
Image taken from the Cityscapes dataset.
Why Semantic Segmentation? • Useful tool for editing images
Why Semantic Segmentation? • Medical purposes: e.g. segmenting tumours, dental cavities, ...
Image taken from Mauricio Reyes
ISBI Challenge 2015, dental x-ray images
But How?
• Deep convolutional neural networks are successful at learning good representations of visual input.
• However, here we have a structured output: a label for every pixel.
CNN for Pixel-wise Labelling
• Usual convolutional networks
• Fully convolutional networks
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015.
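The key idea behind fully convolutional networks is that a fully connected classifier head is equivalent to a 1×1 convolution, so the same classifier can be slid over an arbitrarily sized feature map to produce a coarse score map. A minimal NumPy sketch of this equivalence (toy shapes; all names are illustrative, not the paper's code):

```python
import numpy as np

def fc_as_1x1_conv(features, W, b):
    """Apply a linear classifier (W: classes x channels, b: classes) at every
    spatial location of a feature map (channels x H x W), i.e. as a 1x1
    convolution. Returns a coarse score map of shape (classes x H x W)."""
    C, H, Wd = features.shape
    flat = features.reshape(C, H * Wd)   # channels x (H*W)
    scores = W @ flat + b[:, None]       # classes x (H*W)
    return scores.reshape(-1, H, Wd)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))    # toy 8-channel feature map
W = rng.standard_normal((3, 8))          # 3-class linear classifier head
b = np.zeros(3)
score_map = fc_as_1x1_conv(feat, W, b)   # one 3-way score vector per location
```

Running the classifier at every location is what turns an image classifier into a coarse segmenter; the score map is then upsampled back to full image resolution.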
Fully Convolutional Networks [Long et al., CVPR 2015]
+ Significantly improved the state of the art in semantic segmentation.
− Poor object delineation: spatial consistency is neglected.
[Figure: input image, FCN result, ground truth]
Conditional Random Fields (CRFs) • A CRF can account for contextual information in the image
[Figure: coarse output from the pixel-wise classifier → MRF/CRF modelling → output after CRF inference]
Conditional Random Fields (CRFs)
• Define a discrete random variable Xi for each pixel i.
• Each Xi takes a value from the label set {bg, cat, tree, person, …}, e.g. Xi = bg, Xj = cat.
• Connect the random variables to form a random field (MRF).
• The most probable assignment given the image → the segmentation.
Finding the Best Assignment
Pr(X1 = x1, …, XN = xN | I) = Pr(X = x | I) = (1/Z(I)) exp(−E(x | I))
• Maximize Pr(X = x | I) → minimize E(x | I).
• So we have formulated the problem as an energy minimization.
E(x | I) = Σi ψ_unary(xi) + Σi<j ψ_pairwise(xi, xj)

• Unary energy ψ_unary(xi = l): your label doesn’t agree with the initial classifier → you pay a penalty.
• Pairwise energy ψ_pairwise(xi = l, xj = l′): you assign different labels to two very similar pixels → you pay a penalty. How do you measure similarity?
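To make the two energy terms concrete, here is a naive NumPy sketch of the total energy of a labelling, using a Gaussian kernel over pixel position and colour as the similarity measure (all names and parameter values here are illustrative assumptions, not the actual model):

```python
import numpy as np

def crf_energy(labels, unary, positions, colours, w=1.0, theta_p=3.0, theta_c=10.0):
    """E(x) = sum_i psi_unary(x_i) + sum_{i<j} psi_pairwise(x_i, x_j).
    unary[i, l] is the cost of giving pixel i label l (e.g. the negative log
    of the initial classifier's score). The pairwise term charges a penalty,
    weighted by a Gaussian similarity kernel, whenever two similar pixels
    receive different labels (a Potts model)."""
    n = len(labels)
    energy = sum(unary[i, labels[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                d_pos = np.sum((positions[i] - positions[j]) ** 2)
                d_col = np.sum((colours[i] - colours[j]) ** 2)
                energy += w * np.exp(-d_pos / (2 * theta_p ** 2)
                                     - d_col / (2 * theta_c ** 2))
    return energy

# Four pixels: two bright ones close together, two dark ones close together.
positions = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
colours = np.array([[200.0], [198.0], [30.0], [32.0]])
unary = np.array([[0.2, 1.5], [0.3, 1.2], [1.4, 0.1], [1.6, 0.2]])
consistent = crf_energy([0, 0, 1, 1], unary, positions, colours)
inconsistent = crf_energy([0, 1, 1, 1], unary, positions, colours)
```

Labelling the two nearby, similar pixels differently (`inconsistent`) incurs both a higher unary cost and a pairwise penalty, so its energy comes out larger.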
Dense CRF Formulation [Krähenbühl & Koltun, NIPS 2011]
• Pairwise energies are defined for every pixel pair in the image:
E(x) = Σi ψ_unary(xi) + Σi<j ψ_pairwise(xi, xj, I)
• Exact inference is not feasible.
• Use approximate mean-field inference: approximate Pr(X | I) by a factorized distribution Q(X) = Πi Qi(Xi).
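One mean-field update follows directly from the factorized approximation: each Qi is set to a softmax of its unary energy plus messages gathered from all other pixels through the pairwise kernel. The dense O(n²) kernel product below is only a toy; the actual algorithm replaces it with fast high-dimensional (bilateral) filtering. All names are illustrative:

```python
import numpy as np

def mean_field_step(Q, unary, K, mu):
    """One mean-field update for a dense CRF.
    Q:     n x L current pixel marginals.
    unary: n x L unary energies.
    K:     n x n pairwise similarity kernel (zero diagonal).
    mu:    L x L label compatibility (Potts: mu[l, l'] = 1 if l != l')."""
    messages = K @ Q                  # n x L: sum_j k(i, j) * Q_j(l')
    energy = unary + messages @ mu.T  # add compatibility-transformed messages
    Q_new = np.exp(-energy)
    return Q_new / Q_new.sum(axis=1, keepdims=True)  # normalise (softmax)

# Toy problem: 4 pixels, 2 labels; pixels 0, 1 (and weakly 3) prefer label 0.
unary = np.array([[0.1, 2.0], [0.2, 1.8], [2.0, 0.1], [0.9, 1.1]])
K = 0.3 * (np.ones((4, 4)) - np.eye(4))  # every pair weakly connected
mu = 1.0 - np.eye(2)                     # Potts compatibility
Q = np.exp(-unary)
Q /= Q.sum(axis=1, keepdims=True)        # initialise with softmax of unaries
for _ in range(3):
    Q = mean_field_step(Q, unary, K, mu)
```

Each iteration pulls each pixel's marginal toward agreement with its similar neighbours while staying anchored to the unaries.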
Fully Connected CRFs as a CNN
[Figure: one mean-field iteration expressed as CNN operations — inputs U (unaries), Q (current marginals), I (image): Bilateral filtering → Conv → Conv → + → SoftMax]
CRF as a Recurrent Neural Network
[Figure: the mean-field iteration (U, Q, I → Bilateral → Conv → Conv → + → SoftMax) as one repeatable block]
• Each of these blocks is differentiable → we can backprop.
CRF as a Recurrent Neural Network
[Figure: Image → Unaries → SoftMax → CRF iteration, repeated (CRF as RNN) → Output]
• Each of these blocks is differentiable → we can backprop.
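Because a mean-field iteration is a fixed, differentiable computation, running it T times with shared parameters is exactly an RNN unrolled for T steps. A plain-NumPy sketch of the unrolled inference (no autodiff here; the names and the toy dense kernel are illustrative assumptions):

```python
import numpy as np

def crf_as_rnn(unary, K, mu, n_iters=5):
    """Unrolled mean-field inference: Q_{t+1} = f(Q_t, U, I), repeated
    n_iters times with shared parameters, like an RNN. Since every step
    (filtering, compatibility transform, addition, softmax) is
    differentiable, the loop can be trained end-to-end by backprop."""
    Q = np.exp(-unary)
    Q /= Q.sum(axis=1, keepdims=True)        # Q_0 = softmax of unaries
    for _ in range(n_iters):
        energy = unary + (K @ Q) @ mu.T      # message passing + compatibility
        Q = np.exp(-energy)
        Q /= Q.sum(axis=1, keepdims=True)    # softmax normalisation
    return Q

unary = np.array([[0.1, 2.0], [0.2, 1.8], [2.0, 0.1]])
K = 0.3 * (np.ones((3, 3)) - np.eye(3))  # toy similarity kernel
mu = 1.0 - np.eye(2)                     # Potts compatibility
Q = crf_as_rnn(unary, K, mu, n_iters=5)
```

The fixed iteration count with tied weights is what makes the whole CRF inference behave as one recurrent layer sitting on top of the CNN.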
Putting Things Together
[Figure: FCN followed by CRF-RNN, as one end-to-end network]
Experiments (mean IoU, %)
FCN [Long et al., 2014]: 68.3
FCN + CRF [Chen et al., 2015]: 69.5
FCN + CRF-RNN (ours): 72.9
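Assuming the scores above are the standard mean intersection-over-union (IoU) metric for semantic segmentation, here is a minimal sketch of how it is computed from predicted and ground-truth label maps (toy data; illustrative names):

```python
import numpy as np

def mean_iou(pred, gt, n_classes):
    """Per class, IoU = |pred==c AND gt==c| / |pred==c OR gt==c|;
    the mean is taken over classes that appear in either map."""
    ious = []
    for c in range(n_classes):
        intersection = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(intersection / union)
    return float(np.mean(ious))

pred = np.array([0, 0, 1, 1])   # toy flattened label maps
gt = np.array([0, 1, 1, 1])
score = mean_iou(pred, gt, 2)   # class 0: 1/2, class 1: 2/3 -> mean 7/12
```

Because IoU penalises both false positives and false negatives per class, it rewards exactly the sharper object delineation that the CRF step provides.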
Try our demo: http://crfasrnn.torr.vision
Code & model: https://github.com/torrvision/crfasrnn
Shuai Zheng, Bernardino Romera-Paredes, Philip Torr
Examples (image sources)
http://pp.vk.me/c622119/v622119584/20dc3/7lS5BU2Bp_k.jpg
http://media1.fdncms.com/boiseweekly/imager/mountain-bikers-are-advised-to-dism/u/original/3446917/walk_thru_sheep_1_.jpg
http://img.rtvslo.si/_up/upload/2014/07/22/65129194_tour-3.jpg
http://www.toxel.com/wp-content/uploads/2010/11/bike05.jpg

Not-so-good examples
http://www.independent.co.uk/incoming/article10335615.ece/alternates/w620/planecat.jpg
http://i1.wp.com/theverybesttop10.files.wordpress.com/2013/02/the-world_s-top-10-best-images-of-camouflage-cats-5.jpg?resize=375,500

Tricky examples
http://se-preparer-aux-crises.fr/wp-content/uploads/2013/10/Golum.png
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRf4J7Hszkc8Wf6riVUX-cV_K-un8LJy5dYIBW1KDIn6i7UCzGHpg
http://i.huffpost.com/gen/1478236/thumbs/s-DIRD6-large640.jpg
Conclusion
• CNNs yield a coarse prediction on pixel-labelling tasks.
• CRFs improve the result by accounting for contextual information in the image.
• Learning the whole pipeline end-to-end significantly improves the results.
[Figure: CNN → CRF]
Thank You!