Topic 1 of Part II Camera Models

3D Computer Vision and Video Computing, CSC I6716, Fall 2010
Zhigang Zhu, City College of New York
Author: Audra Hicks

Closely Related Disciplines
- Image Processing: images to images
- Computer Graphics: models to images
- Computer Vision: images to models
- Photogrammetry: obtaining accurate measurements from images

What is 3-D (Three-Dimensional) Vision?
- Motivation: making computers see (the 3D world as humans do)
- Computer Vision: 2D images to 3D structure
- Applications: robotics / VR / image-based rendering / 3D video

Lectures on 3-D Vision Fundamentals
- Camera Geometric Models (3 lectures)
- Camera Calibration (3 lectures)
- Stereo (4 lectures)
- Motion (4 lectures)

Lecture Outline

Geometric Projection of a Camera
- Pinhole camera model
- Perspective projection
- Weak-perspective projection

Camera Parameters
- Intrinsic parameters: define the mapping from 3D to 2D
- Extrinsic parameters: define the viewpoint and viewing direction
- Basic vector and matrix operations, rotation

Camera Models Revisited
- Linear version of the projection transformation equation
- Perspective camera model
- Weak-perspective camera model
- Affine camera model
- Camera model for planes

Summary

Camera Geometric Models

Lecture assumptions
- Knowledge of 2D and 3D geometric transformations
- Linear algebra (vectors, matrices)
- This lecture is only about geometry

Goal: build up the relation between 2D images and 3D scenes
- 3D graphics (rendering): from 3D to 2D
- 3D vision (stereo and motion): from 2D to 3D
- Calibration: determining the parameters of the mapping

Image Formation

Figure: a light (energy) source illuminates a surface in the world; the optics (a pinhole lens) form an image of it on the imaging plane, where a sensor turns it into a signal.
- B&W film: silver density
- Color film: silver density in three color layers
- TV camera: electrical signal

Image Formation (continued)

Figure: the 3D scene (world), lit by a light (energy) source, is imaged by a camera, described by its specification and pose, through the optics (pinhole lens) onto the imaging plane; the sensor signal is the 2D image.

Pinhole Camera Model

Figure: the image plane, the optical axis, the pinhole lens, and the focal length f.
- The pinhole model is the basis for most graphics and vision
  - Derived from the physical construction of early cameras
  - The mathematics is very straightforward
- The 3D world is projected to a 2D image
  - The image is inverted and its size reduced
  - The image is a 2D plane: no direct depth information
- Perspective projection
  - f is called the focal length of the lens
  - For a given image size, changing f changes the FOV and the size of figures

Focal Length, FOV

Consider the case with the object on the optical axis.
Figure: the object at distance z along the optical axis and its image on the image plane at distance f from the viewpoint; with a larger f, parts of the object fall out of view.
- Optical axis: the direction of imaging
- Image plane: a plane perpendicular to the optical axis
- Center of projection (pinhole): also called the focal point, viewpoint, or nodal point
- Focal length: the distance from the focal point to the image plane
- FOV (field of view): the viewing angles in the horizontal and vertical directions

Increasing f enlarges figures but decreases the FOV.
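The trade-off between f and the FOV can be made concrete with the standard pinhole relation FOV = 2 arctan(w / 2f), where w is the image size along one direction. This is a minimal sketch assuming an image plane centred on the optical axis, with w and f in the same units; the function name and the 36 mm width are illustrative choices, not values from the lecture.

```python
import math

def field_of_view_deg(image_size: float, focal_length: float) -> float:
    """Full viewing angle, in degrees, across one image dimension.

    Assumes the image plane is centred on the optical axis and that
    image_size and focal_length use the same units (e.g. mm).
    """
    return math.degrees(2.0 * math.atan(image_size / (2.0 * focal_length)))

# A hypothetical 36 mm wide image plane with increasing focal lengths.
for f in (18.0, 36.0, 72.0):
    print(f"f = {f:4.0f} mm -> FOV = {field_of_view_deg(36.0, f):5.1f} deg")
```

With the same image size, the FOV shrinks from 90.0 to about 53.1 and then 28.1 degrees as f doubles, while every figure on the image plane grows in proportion to f.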

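Equivalent Geometry

Consider the case with the object on the optical axis.
Figure: the physical pinhole camera forms an inverted image at distance f behind the center of projection.

It is more convenient to work with an upright image, obtained by placing the projection plane at z = f in front of the center of projection. The two geometries are equivalent mathematically.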

Perspective Projection

Compute the image coordinates of p in terms of the world (camera) coordinates of P.
Figure: the scene point P(X, Y, Z) projects through the origin O onto the image point p(x, y) on the plane Z = f.
- The origin of the camera frame is at the center of projection
- The Z axis lies along the optical axis
- The image plane is at Z = f, with x ∥ X and y ∥ Y

$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$

Reverse Projection

Given a center of projection and the image coordinates of a point, it is not possible to recover the 3D depth of the point from a single image.
Figure: P(X, Y, Z) can be anywhere along the line through the center of projection and p(x, y); all points on this line have image coordinates (x, y).

In general, at least two images of the same point, taken from two different locations, are required to recover depth.
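Inverting x = fX/Z, y = fY/Z makes the ambiguity explicit: for any assumed depth Z, the point (xZ/f, yZ/f, Z) lies on the same viewing ray and reprojects to (x, y). In this sketch the helper `back_project` and the sample depths are hypothetical; a single image gives no way to choose among them.

```python
import numpy as np

def back_project(x, y, f, Z):
    """Camera-frame point on the viewing ray of image point (x, y) at an
    assumed depth Z (inverts x = f X / Z, y = f Y / Z)."""
    return np.array([x * Z / f, y * Z / f, Z])

f = 0.05
x, y = 0.01, -0.02
for Z in (1.0, 5.0, 10.0):
    P = back_project(x, y, f, Z)
    # Every candidate 3D point reprojects to the same image point (x, y).
    print(P, f * P[0] / P[2], f * P[1] / P[2])
```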

Pinhole Camera Image

Amsterdam: what do you see in this picture?
- Straight lines
- Size
- Parallelism / angles
- Shape
- Shape of planes parallel to the image
- Depth? Cues: stereo, motion, size, structure

- We see spatial shapes rather than individual pixels
- Knowledge: top-down vision belongs to humans
- Stereo and motion are the most successful approaches in 3D computer vision and its applications
- You can see it, but you don't know how…

Photo by Robert Kosara, http://www.kosara.net/gallery/pinholeamsterdam/pic01.html



Yet Other Pinhole Camera Images

Rabbit or Man?
2D projections are not the “same” as the real object as we usually see it every day!

Markus Raetz, Metamorphose II, 1991-92, cast iron, 15 1/4 x 12 x 12 inches, Fine Art Center University Gallery, Sep 15 – Oct 26


It’s real!

Weak Perspective Projection

The average depth $\bar Z$ is much larger than the relative distance between any two scene points measured along the optical axis.
Figure: the same setup as for perspective projection, with P(X, Y, Z) mapped to p(x, y) on the image plane Z = f.
- A sequence of two transformations:
  - Orthographic projection: parallel rays
  - Isotropic scaling: $f/\bar Z$
- Linear model
- Preserves angles and shapes

$$x = f\,\frac{X}{\bar Z}, \qquad y = f\,\frac{Y}{\bar Z}$$

Camera Parameters

Figure: a world point Pw (world frame Xw, Yw, Zw) becomes the camera-frame point P (origin O), projects to the image point p with image coordinates (x, y), and is digitized by the frame grabber into the pixel (xim, yim) of the image frame. The camera pose relates the object/world frame to the camera frame.

Coordinate systems
- Frame coordinates (xim, yim): in pixels
- Image coordinates (x, y): in mm
- Camera coordinates (X, Y, Z)
- World coordinates (Xw, Yw, Zw)

Camera parameters
- Intrinsic parameters (of the camera and the frame grabber): link the frame coordinates of an image point with its corresponding camera coordinates
- Extrinsic parameters: define the location and orientation of the camera coordinate system with respect to the world coordinate system

Intrinsic Parameters (I)

Figure: the image point p = (x, y, f) in the camera frame with origin O; the frame origin (0, 0) and the pixel axes xim, yim; the image center (ox, oy); a pixel (xim, yim) of size (sx, sy).
- From 3D to 2D: perspective projection
$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$
- From image to frame: image center, directions of the axes, pixel size
$$x = -(x_{im} - o_x)\,s_x, \qquad y = -(y_{im} - o_y)\,s_y$$

Intrinsic parameters
- $(o_x, o_y)$: image center (in pixels)
- $(s_x, s_y)$: effective size of the pixel (in mm)
- $f$: focal length

Intrinsic Parameters (II): $k_1, k_2$

Lens distortions
Figure: a distorted point (xd, yd) and its undistorted position (x, y).
- Modeled as simple radial distortions:
$$r^2 = x_d^2 + y_d^2$$
$$x = x_d\,(1 + k_1 r^2 + k_2 r^4), \qquad y = y_d\,(1 + k_1 r^2 + k_2 r^4)$$
- $(x_d, y_d)$: distorted points
- $k_1, k_2$: distortion coefficients
- A model with $k_2 = 0$ is still accurate for a 500x500 CCD sensor with about 5 pixels of distortion on the outer boundary
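A direct transcription of the radial model, as a sketch. It assumes (xd, yd) are image coordinates measured from the image center, and the coefficient values in the example are made up; real values come from calibration.

```python
def undistort_radial(xd, yd, k1, k2=0.0):
    """Corrected position (x, y) of a distorted image point (xd, yd):
    x = xd (1 + k1 r^2 + k2 r^4), y = yd (1 + k1 r^2 + k2 r^4),
    with r^2 = xd^2 + yd^2 measured from the image center (assumed)."""
    r2 = xd * xd + yd * yd
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return xd * factor, yd * factor

# Hypothetical coefficient: the correction grows rapidly away from the center.
print(undistort_radial(0.1, 0.0, k1=0.2))   # barely moved
print(undistort_radial(2.0, 0.0, k1=0.2))   # moved substantially
```

Setting `k2=0.0` (the default here) gives the one-coefficient model the slide describes as sufficient for a 500x500 sensor.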

Extrinsic Parameters

Figure: the camera frame (origin O, image point p, pixel (xim, yim)) and the world frame (Xw, Yw, Zw); the same scene point is P in camera coordinates and Pw in world coordinates, and T links the two origins.

From world to camera:
$$P = R\,P_w + T$$
- A 3-D translation vector T, describing the relative locations of the origins of the two coordinate systems (what is it?)
- A 3x3 rotation matrix R, an orthogonal matrix that brings the corresponding axes of the two systems onto each other
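One way to read T: setting Pw = 0 in P = R Pw + T gives P = T, so T is the world origin expressed in camera coordinates; equivalently, the camera center in world coordinates is -R^T T. The sketch below checks both facts for an arbitrary example pose chosen only for illustration.

```python
import numpy as np

def world_to_camera(Pw, R, T):
    """Apply P = R Pw + T to each row of an (N, 3) array of world points."""
    return np.asarray(Pw, dtype=float) @ R.T + T

# Hypothetical pose: 90 degree rotation about Z, world origin 5 units ahead.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
T = np.array([0.0, 0.0, 5.0])

Pw = np.array([[0.0, 0.0, 0.0],    # world origin
               [1.0, 0.0, 0.0]])   # one unit along Xw
print(world_to_camera(Pw, R, T))   # rows: T itself, then R @ (1, 0, 0) + T

# The camera centre expressed in world coordinates is -R^T T;
# mapping it into the camera frame gives the camera origin (0, 0, 0).
C = -R.T @ T
print(world_to_camera(C[None, :], R, T))
```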

Linear Algebra: Vector and Matrix

A point as a 2D/3D vector (T: transpose)
- Image point: 2D vector, $p = \begin{pmatrix} x \\ y \end{pmatrix} = (x, y)^T$
- Scene point: 3D vector, $P = (X, Y, Z)^T$
- Translation: 3D vector, $T = (T_x, T_y, T_z)^T$

Vector operations
- Addition: translation of a 3D vector
$$P = P_w + T = (X_w + T_x,\; Y_w + T_y,\; Z_w + T_z)^T$$
- Dot product (a scalar), where $\theta$ is the angle between $a$ and $b$:
$$c = a \cdot b = a^T b = |a|\,|b| \cos\theta$$
- Cross product (a vector): generates a new vector that is orthogonal to both of them
$$c = a \times b = (a_2 b_3 - a_3 b_2)\,i + (a_3 b_1 - a_1 b_3)\,j + (a_1 b_2 - a_2 b_1)\,k$$
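A short NumPy check of the vector operations above. `cross_by_formula` (a hypothetical helper) spells out the component formula and is compared against `np.cross`; the vectors a and b are arbitrary examples.

```python
import numpy as np

def cross_by_formula(a, b):
    """a x b written out component by component (a1 = a[0], etc.)."""
    return np.array([a[1] * b[2] - a[2] * b[1],
                     a[2] * b[0] - a[0] * b[2],
                     a[0] * b[1] - a[1] * b[0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

dot = a @ b                                  # a^T b, a scalar (32 here)
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
c = cross_by_formula(a, b)                   # (-3, 6, -3) here

print(dot, cos_theta, c)
print(np.allclose(c, np.cross(a, b)))        # agrees with NumPy's cross product
print(c @ a, c @ b)                          # both 0: c is orthogonal to a and b
```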

Rotation: 3x3 matrix
$$R = [r_{ij}]_{3\times 3} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} = \begin{pmatrix} R_1^T \\ R_2^T \\ R_3^T \end{pmatrix}$$
- Orthogonal: $R^{-1} = R^T$, i.e. $R R^T = R^T R = I$
- 9 elements, but only 3 degrees of freedom (DOF): the third row is the cross product of the first two ($R_3 = R_1 \times R_2$), leaving 3 + 3 free elements; $R_1$ and $R_2$ must be unit vectors, leaving 2 + 2; and they must be orthogonal ($R_1 \cdot R_2 = 0$), leaving 3 DOF
- How to generate R from three angles? (next few slides)

Matrix operations
- $R P_w + T$ = ? Points in the world are projected onto the three new axes (of the camera system) and translated to a new origin:
$$P = R P_w + T = \begin{pmatrix} r_{11}X_w + r_{12}Y_w + r_{13}Z_w + T_x \\ r_{21}X_w + r_{22}Y_w + r_{23}Z_w + T_y \\ r_{31}X_w + r_{32}Y_w + r_{33}Z_w + T_z \end{pmatrix} = \begin{pmatrix} R_1^T P_w + T_x \\ R_2^T P_w + T_y \\ R_3^T P_w + T_z \end{pmatrix}$$
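The orthogonality properties, the constraints behind the 3-DOF count, and the "project onto the new axes" reading of $R P_w + T$ can all be verified numerically. The rotation (30 degrees about Z), T, and Pw below are arbitrary illustrative values.

```python
import numpy as np

# An example rotation (30 degrees about Z), chosen only for illustration.
t = np.radians(30.0)
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
T = np.array([0.5, -1.0, 4.0])
Pw = np.array([2.0, 1.0, 3.0])

# Orthogonality: R R^T = I, so the inverse is simply the transpose.
print(np.allclose(R @ R.T, np.eye(3)), np.allclose(np.linalg.inv(R), R.T))

# The constraints behind the 3-DOF count: unit rows, orthogonal rows, R3 = R1 x R2.
print(np.allclose(np.linalg.norm(R, axis=1), 1.0),
      np.isclose(R[0] @ R[1], 0.0),
      np.allclose(np.cross(R[0], R[1]), R[2]))

# Each camera coordinate is Pw projected onto one row R_i^T, plus T_i.
P = R @ Pw + T
P_rows = np.array([R[i] @ Pw + T[i] for i in range(3)])
print(np.allclose(P, P_rows))
```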

Rotation: from Angles to Matrix

Rotation around the axes: the result of three consecutive rotations around the coordinate axes
$$R = R_\alpha R_\beta R_\gamma$$
Notes:
- Only three rotations
- Each time around one axis
- Bring the corresponding axes onto each other: Xw = X, Yw = Y, Zw = Z
- First step (e.g.): bring Xw to X

Figure: the world axes Xw, Yw, Zw and the camera axes, sharing the origin O.

Rotation γ around the Zw axis
$$R_\gamma = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
- Rotate in the XwOYw plane. Goal: bring Xw to X, but X is not in the XwOYw plane.
- So rotate until Yw ⊥ X: then X lies in the XwOZw plane (Yw ⊥ XwOZw), and Yw lies in the YOZ plane (X ⊥ YOZ)
- Zw does not change
- Next: rotation around Yw

Rotation β around the Yw axis
$$R_\beta = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}$$
- Rotate in the XwOZw plane so that Xw = X; then Zw lies in the YOZ plane (and so does Yw)
- Yw does not change

Rotation α around the Xw (X) axis
$$R_\alpha = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix}$$
- Rotate in the YwOZw plane so that Yw = Y and Zw = Z (and Xw = X)
- Xw does not change

Rotation: from Angles to Matrix (Appendix A.9 of the textbook)

Rotation around the axes: the result of three consecutive rotations around the coordinate axes
$$R = R_\alpha R_\beta R_\gamma = \begin{pmatrix} \cos\beta\cos\gamma & -\cos\beta\sin\gamma & \sin\beta \\ \sin\alpha\sin\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\sin\beta\sin\gamma + \cos\alpha\cos\gamma & -\sin\alpha\cos\beta \\ -\cos\alpha\sin\beta\cos\gamma + \sin\alpha\sin\gamma & \cos\alpha\sin\beta\sin\gamma + \sin\alpha\cos\gamma & \cos\alpha\cos\beta \end{pmatrix}$$
Notes:
- Mind the rotation directions
- The order of multiplication matters: γ, β, α
- The same R corresponds to 6 different sets of (α, β, γ), one for each order of the three axis rotations
- R is a non-linear function of α, β, γ
- R is orthogonal
- It is easy to compute the angles from R
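The composition and its inverse can be sketched in NumPy. The angle-recovery formulas come straight from the entries of the matrix above ($r_{13} = \sin\beta$, $r_{23} = -\sin\alpha\cos\beta$, $r_{33} = \cos\alpha\cos\beta$, $r_{12} = -\cos\beta\sin\gamma$, $r_{11} = \cos\beta\cos\gamma$) and assume $|\beta| < 90°$ so that $\cos\beta > 0$. Function names and the test angles are illustrative.

```python
import numpy as np

def R_alpha(a):
    """Rotation by alpha about the X axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def R_beta(b):
    """Rotation by beta about the Y axis."""
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def R_gamma(g):
    """Rotation by gamma about the Z axis."""
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def angles_from_rotation(R):
    """Recover (alpha, beta, gamma) from R = R_alpha R_beta R_gamma,
    assuming |beta| < 90 degrees."""
    beta = np.arcsin(R[0, 2])
    alpha = np.arctan2(-R[1, 2], R[2, 2])
    gamma = np.arctan2(-R[0, 1], R[0, 0])
    return alpha, beta, gamma

a, b, g = np.radians([10.0, 20.0, 30.0])
R = R_alpha(a) @ R_beta(b) @ R_gamma(g)

print(np.degrees(angles_from_rotation(R)))                  # approximately [10, 20, 30]
print(np.allclose(R, R_gamma(g) @ R_beta(b) @ R_alpha(a)))  # False: the order matters
print(np.allclose(R @ R.T, np.eye(3)))                      # True: R is orthogonal
```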

Rotation: Axis and Angle (Appendix A.9 of the textbook)

According to Euler’s theorem, any 3D rotation can be described by a rotation angle $\theta$ around an axis defined by a unit vector $n = [n_1, n_2, n_3]^T$. Three degrees of freedom: why?
$$R = I\cos\theta + (1-\cos\theta)\begin{pmatrix} n_1^2 & n_1 n_2 & n_1 n_3 \\ n_2 n_1 & n_2^2 & n_2 n_3 \\ n_3 n_1 & n_3 n_2 & n_3^2 \end{pmatrix} + \sin\theta\begin{pmatrix} 0 & -n_3 & n_2 \\ n_3 & 0 & -n_1 \\ -n_2 & n_1 & 0 \end{pmatrix}$$
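A sketch of the axis-angle formula above, checked three ways: the result is orthogonal, it leaves the axis n unchanged, and about the Z axis it reduces to the $R_\gamma$ form used earlier. The axis (1, 2, 2) and the 40 degree angle are arbitrary examples; the function normalizes n in case a non-unit vector is passed.

```python
import numpy as np

def axis_angle_to_matrix(n, theta):
    """R = I cos(theta) + (1 - cos(theta)) n n^T + sin(theta) [n]_x,
    with n normalised to a unit vector first."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    n_cross = np.array([[0.0, -n[2], n[1]],
                        [n[2], 0.0, -n[0]],
                        [-n[1], n[0], 0.0]])
    return (np.eye(3) * np.cos(theta)
            + (1.0 - np.cos(theta)) * np.outer(n, n)
            + np.sin(theta) * n_cross)

theta = np.radians(40.0)
n = np.array([1.0, 2.0, 2.0])          # |n| = 3; normalised inside the function
R = axis_angle_to_matrix(n, theta)

print(np.allclose(R @ R.T, np.eye(3)))                  # orthogonal
print(np.allclose(R @ (n / 3.0), n / 3.0))              # the axis is left unchanged
print(np.isclose(np.trace(R), 1 + 2 * np.cos(theta)))   # angle recoverable from the trace
```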

Linear Version of Perspective Projection

- World to camera
  - Camera: $P = (X, Y, Z)^T$; world: $P_w = (X_w, Y_w, Z_w)^T$; transform: R, T
$$P = R P_w + T = \begin{pmatrix} r_{11}X_w + r_{12}Y_w + r_{13}Z_w + T_x \\ r_{21}X_w + r_{22}Y_w + r_{23}Z_w + T_y \\ r_{31}X_w + r_{32}Y_w + r_{33}Z_w + T_z \end{pmatrix} = \begin{pmatrix} R_1^T P_w + T_x \\ R_2^T P_w + T_y \\ R_3^T P_w + T_z \end{pmatrix}$$
- Camera to image
  - Camera: $P = (X, Y, Z)^T$; image: $p = (x, y)^T$
  - These are not linear equations:
$$(x, y) = \left(f\,\frac{X}{Z},\; f\,\frac{Y}{Z}\right)$$
- Image to frame
  - Neglecting distortion; frame: $(x_{im}, y_{im})^T$
$$x = -(x_{im} - o_x)\,s_x, \qquad y = -(y_{im} - o_y)\,s_y$$
- World to frame
  - $(X_w, Y_w, Z_w)^T \to (x_{im}, y_{im})^T$
  - Effective focal lengths: $f_x = f/s_x$, $f_y = f/s_y$. The three parameters $f, s_x, s_y$ are not independent: only the ratios $f_x$ and $f_y$ appear in the equations.
$$x_{im} - o_x = -f_x\,\frac{r_{11}X_w + r_{12}Y_w + r_{13}Z_w + T_x}{r_{31}X_w + r_{32}Y_w + r_{33}Z_w + T_z}$$
$$y_{im} - o_y = -f_y\,\frac{r_{21}X_w + r_{22}Y_w + r_{23}Z_w + T_y}{r_{31}X_w + r_{32}Y_w + r_{33}Z_w + T_z}$$
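The world-to-frame chain above, written as one function. This is a sketch that neglects distortion, as the slide does; the camera values (f = 8 mm, 0.01 mm pixels, center (320, 240), pose) are hypothetical. The last line illustrates why f, sx, sy are not independent: scaling f and the pixel size together leaves fx, fy and hence every pixel unchanged.

```python
import numpy as np

def world_to_frame(Pw, R, T, fx, fy, ox, oy):
    """Pixel coordinates of a world point, neglecting lens distortion:
    xim = ox - fx (R1^T Pw + Tx) / (R3^T Pw + Tz)
    yim = oy - fy (R2^T Pw + Ty) / (R3^T Pw + Tz)
    """
    X, Y, Z = R @ np.asarray(Pw, dtype=float) + T   # world -> camera
    return ox - fx * X / Z, oy - fy * Y / Z          # camera -> image -> frame

# Hypothetical camera: f = 8 mm, 0.01 mm pixels, image centre (320, 240).
f, sx, sy = 8.0, 0.01, 0.01
fx, fy = f / sx, f / sy                 # effective focal lengths, in pixels
R, T = np.eye(3), np.array([0.0, 0.0, 1000.0])
print(world_to_frame([50.0, -25.0, 0.0], R, T, fx, fy, 320.0, 240.0))

# Doubling f and the pixel size together leaves fx, fy and the pixel unchanged.
print(world_to_frame([50.0, -25.0, 0.0], R, T, 16.0 / 0.02, 16.0 / 0.02, 320.0, 240.0))
```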

Projective Space

- Add a fourth coordinate: $P_w = (X_w, Y_w, Z_w, 1)^T$
- Define $(x_1, x_2, x_3)^T$ such that $x_1/x_3 = x_{im}$ and $x_2/x_3 = y_{im}$

Linear matrix equation of perspective projection
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_{int} M_{ext} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} x_{im} \\ y_{im} \end{pmatrix} = \begin{pmatrix} x_1/x_3 \\ x_2/x_3 \end{pmatrix}$$
- 3x3 matrix $M_{int}$: only intrinsic parameters (camera to frame)
$$M_{int} = \begin{pmatrix} -f_x & 0 & o_x \\ 0 & -f_y & o_y \\ 0 & 0 & 1 \end{pmatrix}$$
- 3x4 matrix $M_{ext}$: only extrinsic parameters (world to camera)
$$M_{ext} = \begin{pmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix} = \begin{pmatrix} R_1^T & T_x \\ R_2^T & T_y \\ R_3^T & T_z \end{pmatrix}$$

Simple matrix product! Projective matrix $M = M_{int} M_{ext}$
- $(X_w, Y_w, Z_w)^T \to (x_{im}, y_{im})^T$
- Linear transform from projective space to the projective plane
- M is defined up to a scale factor: 11 independent entries
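The same projection in homogeneous form: build $M = M_{int} M_{ext}$, multiply, then divide by $x_3$. The camera values are hypothetical (fx = fy = 800 pixels, center (320, 240), identity rotation, world origin 1000 units ahead); with them the point (50, -25, 0) lands on pixel (280, 260), the same pixel the non-linear world-to-frame equations give. The second call shows that scaling M does not change the result, which is why only 11 of its 12 entries are independent.

```python
import numpy as np

def projection_matrix(fx, fy, ox, oy, R, T):
    """M = Mint Mext with Mint = [[-fx, 0, ox], [0, -fy, oy], [0, 0, 1]]
    and Mext = [R | T] (3x4)."""
    M_int = np.array([[-fx, 0.0, ox],
                      [0.0, -fy, oy],
                      [0.0, 0.0, 1.0]])
    M_ext = np.hstack([R, np.reshape(T, (3, 1))])
    return M_int @ M_ext

def project(M, Pw):
    """Homogeneous projection: (x1, x2, x3) = M (Xw, Yw, Zw, 1), then divide."""
    x1, x2, x3 = M @ np.append(np.asarray(Pw, dtype=float), 1.0)
    return x1 / x3, x2 / x3

R, T = np.eye(3), np.array([0.0, 0.0, 1000.0])   # hypothetical pose
M = projection_matrix(800.0, 800.0, 320.0, 240.0, R, T)
print(project(M, [50.0, -25.0, 0.0]))         # (280.0, 260.0)
print(project(5.0 * M, [50.0, -25.0, 0.0]))   # same pixel: M is defined up to scale
```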

Three Camera Models

Perspective camera model
- Making some assumptions: known center ($o_x = o_y = 0$) and square pixels ($s_x = s_y = 1$)
- 11 independent entries, 7 parameters
$$M = \begin{pmatrix} -f r_{11} & -f r_{12} & -f r_{13} & -f T_x \\ -f r_{21} & -f r_{22} & -f r_{23} & -f T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix}$$

Weak-perspective camera model
- Average distance $\bar Z \gg$ depth range $\delta Z$
- Define the centroid vector $\bar P_w$: $\bar Z = R_3^T \bar P_w + T_z$
- 8 independent entries
$$M_{wp} = \begin{pmatrix} -f r_{11} & -f r_{12} & -f r_{13} & -f T_x \\ -f r_{21} & -f r_{22} & -f r_{23} & -f T_y \\ 0 & 0 & 0 & R_3^T \bar P_w + T_z \end{pmatrix}$$

Affine camera model
- Mathematical generalization of weak perspective
- Doesn't correspond to a physical camera, but has a simple equation and appealing geometry
- Doesn't preserve angles, BUT preserves parallelism
- 8 independent entries
$$M_{af} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & b_3 \end{pmatrix}$$
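A sketch comparing the perspective matrix $M$ with $M_{wp}$ on a small, distant object, and showing that the affine form of $M_{wp}$ keeps parallel 3D segments parallel in the image. Everything here is illustrative: identity rotation, an object about 100 units away, and the centroid taken as the mean of the object's points.

```python
import numpy as np

def perspective_matrix(f, R, T):
    """3x4 M for ox = oy = 0, sx = sy = 1: rows (-f R1^T, -f Tx), (-f R2^T, -f Ty), (R3^T, Tz)."""
    top = np.hstack([-f * R[:2], -f * T[:2, None]])
    return np.vstack([top, np.append(R[2], T[2])])

def weak_perspective_matrix(f, R, T, Pw_centroid):
    """Same first two rows; third row (0, 0, 0, R3^T Pw_bar + Tz)."""
    M = perspective_matrix(f, R, T)
    M[2] = [0.0, 0.0, 0.0, R[2] @ Pw_centroid + T[2]]
    return M

def project(M, Pw):
    """Homogeneous projection of an (N, 3) array of world points."""
    x = M @ np.hstack([Pw, np.ones((len(Pw), 1))]).T   # 3 x N
    return (x[:2] / x[2]).T

# Hypothetical setup: camera axes aligned with the world, object 100 units away.
f, R, T = 1.0, np.eye(3), np.array([0.0, 0.0, 100.0])
Pw = np.array([[1.0, 0.5, -0.1], [-1.0, 0.2, 0.0], [0.3, -0.8, 0.1]])
M = perspective_matrix(f, R, T)
M_wp = weak_perspective_matrix(f, R, T, Pw.mean(axis=0))

err = np.abs(project(M, Pw) - project(M_wp, Pw)).max()
print(err)                                   # small, since depth range << distance

# M_wp is affine: parallel 3D segments stay parallel in the image.
seg = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0],    # segment 1
                [2.0, 0.0, 1.0], [3.0, 1.0, 2.0]])   # segment 2, same direction
q = project(M_wp, seg)
d1, d2 = q[1] - q[0], q[3] - q[2]
print(d1[0] * d2[1] - d1[1] * d2[0])         # 2D cross product: 0 for parallel lines
```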

Camera Models for a Plane

Planes are very common in the man-made world:
$$n_x X_w + n_y Y_w + n_z Z_w = d, \qquad \text{i.e.} \qquad n^T P_w = d$$
- One more constraint for all points: $Z_w$ is a function of $X_w$ and $Y_w$

Special case: ground plane $Z_w = 0$
- $P_w = (X_w, Y_w, 0, 1)^T$: a 3D point becomes a 2D point
- Projective model of $Z_w = 0$ (the third column drops out):
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -f r_{11} & -f r_{12} & -f r_{13} & -f T_x \\ -f r_{21} & -f r_{22} & -f r_{23} & -f T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix} \begin{pmatrix} X_w \\ Y_w \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -f r_{11} & -f r_{12} & -f T_x \\ -f r_{21} & -f r_{22} & -f T_y \\ r_{31} & r_{32} & T_z \end{pmatrix} \begin{pmatrix} X_w \\ Y_w \\ 1 \end{pmatrix}$$
- 8 independent entries

General form (with $n_z = 1$, so that $Z_w = d - n_x X_w - n_y Y_w$):
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -f(r_{11} - n_x r_{13}) & -f(r_{12} - n_y r_{13}) & -f(d\,r_{13} + T_x) \\ -f(r_{21} - n_x r_{23}) & -f(r_{22} - n_y r_{23}) & -f(d\,r_{23} + T_y) \\ r_{31} - n_x r_{33} & r_{32} - n_y r_{33} & d\,r_{33} + T_z \end{pmatrix} \begin{pmatrix} X_w \\ Y_w \\ 1 \end{pmatrix}$$
- 8 independent entries
- 2D $(x_{im}, y_{im})$ -> 3D $(X_w, Y_w, Z_w)$?
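For points known to lie on the plane, the 3x3 matrix is a plane-to-image mapping that can be inverted as long as the camera center does not lie on the plane itself, so a pixel can be mapped back to $(X_w, Y_w)$ and $Z_w$ follows from the plane equation. A sketch for the ground plane $Z_w = 0$ under the perspective camera model; f, the 30 degree tilt, and T are hypothetical values.

```python
import numpy as np

# Hypothetical camera: f = 500, ox = oy = 0, square unit pixels, tilted by
# 30 degrees about Xw, with the world origin 10 units in front of it.
f = 500.0
t = np.radians(30.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(t), -np.sin(t)],
              [0.0, np.sin(t),  np.cos(t)]])
T = np.array([0.0, 0.0, 10.0])

# 3x4 matrix of the perspective camera model; with Zw = 0 its third
# column never contributes, leaving a 3x3 plane-to-image matrix H.
M = np.vstack([np.hstack([-f * R[:2], -f * T[:2, None]]), np.append(R[2], T[2])])
H = M[:, [0, 1, 3]]

def plane_to_pixel(H, Xw, Yw):
    x1, x2, x3 = H @ np.array([Xw, Yw, 1.0])
    return x1 / x3, x2 / x3

def pixel_to_plane(H, xim, yim):
    """Solve H (Xw, Yw, 1)^T ~ (xim, yim, 1)^T for a point on the plane Zw = 0."""
    u1, u2, u3 = np.linalg.solve(H, np.array([xim, yim, 1.0]))
    return u1 / u3, u2 / u3

xim, yim = plane_to_pixel(H, 3.0, 4.0)
print(xim, yim)
print(pixel_to_plane(H, xim, yim))   # back to (3.0, 4.0); Zw = 0 by construction
```

For a general plane, the same inversion applies to the 3x3 general-form matrix, followed by $Z_w = d - n_x X_w - n_y Y_w$.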

2D (xim,yim) -> 3D (Xw, Yw, Zw) ?

Applications and Issues

Graphics / rendering
- From 3D world to 2D image
  - Changing viewpoints and directions
  - Changing focal length
- Fast rendering algorithms
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_{int} M_{ext} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} x_{im} \\ y_{im} \end{pmatrix} = \begin{pmatrix} x_1/x_3 \\ x_2/x_3 \end{pmatrix}$$

Vision / reconstruction
- From 2D image to 3D model
  - Inverse problem
  - Much harder / unsolved
- Robust algorithms for matching and parameter estimation
- Need to estimate camera parameters first

Calibration
- Find intrinsic & extrinsic parameters, given image-world point pairs
- Probably a partially solved problem?
- 11 independent entries
- 10 parameters: $f_x, f_y, o_x, o_y, \alpha, \beta, \gamma, T_x, T_y, T_z$
$$M_{int} = \begin{pmatrix} -f_x & 0 & o_x \\ 0 & -f_y & o_y \\ 0 & 0 & 1 \end{pmatrix}, \qquad M_{ext} = \begin{pmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix}$$
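As a preview of calibration, one common linear technique is the direct linear transform (DLT). It is a swapped-in illustration here, not necessarily the method developed in Ch. 6. Each image-world pair gives two linear equations in the 12 entries of M; with at least 6 pairs in general position, M (up to scale, hence 11 independent entries) is the right singular vector for the smallest singular value. The camera, pose, and points below are synthetic and noise-free. Real data would call for coordinate normalization and robust estimation, and recovering the 10 parameters from M is a further step.

```python
import numpy as np

def dlt_estimate(Pw, pix):
    """Estimate the 3x4 projection matrix (up to scale) from N >= 6 pairs.

    Pw: (N, 3) world points (not all coplanar); pix: (N, 2) pixel coordinates.
    Each pair contributes two rows of A m = 0, with m the 12 entries of M.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(Pw, pix):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)   # singular vector of the smallest singular value

# Synthetic camera M = Mint Mext with hypothetical parameters.
M_int = np.array([[-800.0, 0.0, 320.0], [0.0, -800.0, 240.0], [0.0, 0.0, 1.0]])
M_ext = np.hstack([np.eye(3), np.array([[0.1], [-0.2], [5.0]])])
M_true = M_int @ M_ext

rng = np.random.default_rng(0)
Pw = rng.uniform(-1.0, 1.0, size=(10, 3))          # 10 random points near the origin
x = M_true @ np.hstack([Pw, np.ones((10, 1))]).T
pix = (x[:2] / x[2]).T

M_est = dlt_estimate(Pw, pix)
# Fix the unknown scale (and sign) by normalising with the (3, 4) entry.
M_true_n = M_true / M_true[2, 3]
M_est_n = M_est / M_est[2, 3]
rel_err = np.abs(M_est_n - M_true_n).max() / np.abs(M_true_n).max()
print(rel_err)   # tiny for noise-free data
```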

Camera Model Summary

Geometric projection of a camera
- Pinhole camera model
- Perspective projection
- Weak-perspective projection

Camera parameters (10 or 11)
- Intrinsic parameters: f, ox, oy, sx, sy, k1 (4 or 5 independent parameters)
- Extrinsic parameters: R, T (6 DOF, degrees of freedom)

Linear equations of camera models (without distortion)
- General projection transformation equation: 11 parameters
- Perspective camera model: 11 parameters
- Weak-perspective camera model: 8 parameters
- Affine camera model (generalization of weak perspective): 8 parameters
- Projective transformation of planes: 8 parameters

Next

Determining the values of the extrinsic and intrinsic parameters of a camera: calibration (Ch. 6)
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_{int} M_{ext} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}, \qquad M_{int} = \begin{pmatrix} -f_x & 0 & o_x \\ 0 & -f_y & o_y \\ 0 & 0 & 1 \end{pmatrix}, \qquad M_{ext} = \begin{pmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix}$$