3D Computer Vision and Video Computing
CSC I6716 Fall 2010
Topic 1 of Part II: Camera Models
Zhigang Zhu, City College of New York
Closely Related Disciplines
- Image Processing: images to images
- Computer Graphics: models to images
- Computer Vision: images to models
- Photogrammetry: obtaining accurate measurements from images
What is 3D (Three-Dimensional) Vision?
- Motivation: making computers see (the 3D world as humans do)
- Computer Vision: 2D images to 3D structure
- Applications: robotics / VR / image-based rendering / 3D video
Lectures on 3D Vision Fundamentals
- Camera Geometric Models (3 lectures)
- Camera Calibration (3 lectures)
- Stereo (4 lectures)
- Motion (4 lectures)
Lecture Outline
Geometric Projection of a Camera
- Pinhole camera model
- Perspective projection
- Weak-perspective projection
Camera Parameters
- Intrinsic parameters: define the mapping from 3D to 2D
- Extrinsic parameters: define the viewpoint and viewing direction
Basic Vector and Matrix Operations, Rotation
Camera Models Revisited
- Linear version of the projection transformation equation
- Perspective camera model
- Weak-perspective camera model
- Affine camera model
- Camera model for planes
Summary
Camera Geometric Models
Lecture Assumptions
- Knowledge about 2D and 3D geometric transformations
- Linear algebra (vectors, matrices)
- This lecture is only about geometry
Goal: build up the relation between 2D images and 3D scenes
- 3D graphics (rendering): from 3D to 2D
- 3D vision (stereo and motion): from 2D to 3D
- Calibration: determining the parameters for the mapping
Image Formation
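[Figure: a light (energy) source illuminates a surface in the world; the optics (pinhole lens) form an image on the imaging plane (sensor), which records a signal]
Sensor | Signal
B&W film | Silver density
Color film | Silver density in three color layers
TV camera | Electrical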
Image Formation
[Figure: the 3D scene (world), lit by a light (energy) source, is imaged by the camera (spec & pose: pinhole lens optics, imaging plane sensor) into a signal, producing a 2D image]
Pinhole Camera Model
[Figure: pinhole lens on the optical axis; image plane at distance f behind it]
- The pinhole model is the basis for most graphics and vision
  - Derived from the physical construction of early cameras
  - The mathematics is very straightforward
- The 3D world is projected to a 2D image
  - The image is inverted and its size is reduced
  - The image is a 2D plane: no direct depth information
- Perspective projection
  - f is called the focal length of the lens
  - For a given image size, changing f changes the FOV and the figure sizes
Focal Length, FOV
Consider the case with the object on the optical axis:
[Figure: image plane at distance f from the viewpoint; object at distance z along the optical axis]
- Optical axis: the direction of imaging
- Image plane: a plane perpendicular to the optical axis
- Center of projection (pinhole): also called the focal point, viewpoint, or nodal point
- Focal length: the distance from the focal point to the image plane
- FOV (field of view): the viewing angles in the horizontal and vertical directions
Focal Length, FOV
Consider the case with the object on the optical axis:
[Figure: same geometry with a larger f; part of the object at distance z is now out of view]
Increasing f enlarges figures but decreases the FOV.
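A small numeric sketch of this trade-off, assuming a centered image plane of hypothetical width w (the slides do not fix a sensor size): the horizontal FOV is 2 atan(w / 2f), and an object of size S at depth Z projects to size f S / Z. The function names are illustrative only.

```python
import math


def field_of_view_deg(image_width: float, focal_length: float) -> float:
    """Full horizontal viewing angle (degrees) of a pinhole camera with a centered image plane."""
    # Each half of the image plane subtends atan((w / 2) / f) on one side of the optical axis.
    return math.degrees(2.0 * math.atan(image_width / (2.0 * focal_length)))


def projected_size(object_size: float, depth: float, focal_length: float) -> float:
    """Size of the image of an object of the given size at the given depth (same units as f)."""
    return focal_length * object_size / depth


# Hypothetical numbers: a 36 mm wide image plane, a 1000 mm object at 10000 mm depth.
for f in (18.0, 36.0, 72.0):
    fov = field_of_view_deg(36.0, f)
    size = projected_size(1000.0, 10000.0, f)
    print(f"f = {f:5.1f} mm  FOV = {fov:5.1f} deg  image size = {size:.1f} mm")
```

Each doubling of f doubles the projected size while the FOV shrinks (90.0, 53.1, 28.1 degrees for these values).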
Equivalent Geometry
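Consider the case with the object on the optical axis:
[Figure: image plane behind the pinhole at distance f; object at distance z]
More convenient with an upright image:
[Figure: projection plane z = f in front of the center of projection; object at distance z]
The two are mathematically equivalent.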
Perspective Projection
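Compute the image coordinates of p in terms of the world (camera) coordinates of P.
[Figure: scene point P(X, Y, Z) projects through the origin O onto image point p(x, y) on the plane Z = f]
- Origin of the camera at the center of projection
- Z axis along the optical axis
- Image plane at Z = f; x parallel to X and y parallel to Y
$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$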
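A direct transcription of the two projection equations, as a sketch that assumes the point is already expressed in camera coordinates and lies in front of the camera (Z > 0); f and the coordinates share one length unit. The function name is hypothetical.

```python
from __future__ import annotations


def perspective_project(X: float, Y: float, Z: float, f: float) -> tuple[float, float]:
    """Project a camera-frame point onto the image plane Z = f: x = f X / Z, y = f Y / Z."""
    if Z <= 0:
        raise ValueError("the point must lie in front of the camera (Z > 0)")
    return f * X / Z, f * Y / Z


# The same point pushed twice as far away lands twice as close to the image center.
print(perspective_project(2.0, 1.0, 10.0, f=5.0))
print(perspective_project(2.0, 1.0, 20.0, f=5.0))
```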
Reverse Projection
Given a center of projection and the image coordinates of a point, it is not possible to recover the 3D depth of the point from a single image.
[Figure: ray from the center of projection through p(x, y); P(X, Y, Z) can be anywhere along this line, and all points on it have image coordinates (x, y)]
In general, at least two images of the same point taken from two different locations are required to recover depth.
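To see why one image is not enough, the sketch below (hypothetical helper names) generates points at several depths along the viewing ray of one image point and projects them again; every one of them lands on the same (x, y).

```python
from __future__ import annotations


def perspective_project(X: float, Y: float, Z: float, f: float) -> tuple[float, float]:
    return f * X / Z, f * Y / Z


def point_on_viewing_ray(x: float, y: float, f: float, depth: float) -> tuple[float, float, float]:
    """Return the 3D point at the given depth Z whose projection is (x, y)."""
    # Inverting x = f X / Z gives X = x Z / f (likewise for Y); Z itself cannot be recovered.
    return x * depth / f, y * depth / f, depth


f = 5.0
x, y = 1.0, 0.5
for depth in (10.0, 20.0, 40.0):
    P = point_on_viewing_ray(x, y, f, depth)
    print(P, "->", perspective_project(*P, f))
```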
Pinhole camera image: Amsterdam. What do you see in this picture?
[Photo annotations: straight line; size; parallelism/angle; shape; shape of planes; depth]
Photo by Robert Kosara, http://www.kosara.net/gallery/pinholeamsterdam/pic01.html
Pinhole camera image: Amsterdam. What do you see?
[Photo annotations: straight line; size; parallelism/angle; shape; shape of planes parallel to the image; depth? Cues: stereo, motion, size, structure, …]
- We see spatial shapes rather than individual pixels
- Knowledge: top-down vision belongs to humans
- Stereo and motion: the most successful in 3D CV and its applications
- You can see it, but you don't know how…
Yet Other Pinhole Camera Images
Rabbit or Man?
Markus Raetz, Metamorphose II, 1991-92, cast iron, 15 1/4 x 12 x 12 inches. Fine Art Center University Gallery, Sep 15 – Oct 26
2D projections are not the “same” as the real object as we usually see it every day!
It’s real!
Weak Perspective Projection
The average depth $\bar{Z}$ is much larger than the relative distance between any two scene points measured along the optical axis.
[Figure: scene point P(X, Y, Z) projected onto image point p(x, y) on the plane Z = f]
A sequence of two transformations:
- Orthographic projection: parallel rays
- Isotropic scaling: $f/\bar{Z}$
Linear model: preserves angles and shapes
$$x = f\,\frac{X}{\bar{Z}}, \qquad y = f\,\frac{Y}{\bar{Z}}$$
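A quick numeric check of the approximation, using a hypothetical object whose depth range (plus or minus 1) is small relative to its average depth (100): the weak-perspective coordinates differ from the true perspective ones by about 1%. Function names and data are illustrative.

```python
from __future__ import annotations


def perspective(P: tuple[float, float, float], f: float) -> tuple[float, float]:
    X, Y, Z = P
    return f * X / Z, f * Y / Z


def weak_perspective(P: tuple[float, float, float], f: float, z_avg: float) -> tuple[float, float]:
    # Orthographic projection (drop Z), then isotropic scaling by f / z_avg.
    X, Y, _ = P
    scale = f / z_avg
    return scale * X, scale * Y


# Hypothetical object: depths vary by +/-1 around an average depth of 100.
points = [(1.0, 2.0, 99.0), (-3.0, 0.5, 100.0), (2.0, -1.0, 101.0)]
z_avg = sum(P[2] for P in points) / len(points)
for P in points:
    print(P, perspective(P, 1.0), weak_perspective(P, 1.0, z_avg))
```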
Camera Parameters
[Figure: camera pose; the frame grabber maps image point p to frame coordinates (xim, yim); camera origin O with axes x, y; object point P (Pw in the world frame Xw, Yw, Zw)]
Coordinate Systems
- Frame coordinates (xim, yim), in pixels
- Image coordinates (x, y), in mm
- Camera coordinates (X, Y, Z)
- World coordinates (Xw, Yw, Zw)
Camera Parameters
- Intrinsic parameters (of the camera and the frame grabber): link the frame coordinates of an image point with its corresponding camera coordinates
- Extrinsic parameters: define the location and orientation of the camera coordinate system with respect to the world coordinate system
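Intrinsic Parameters (I)
[Figure: image coordinates (x, y) with origin at the image center and image point p = (x, y, f) relative to the camera center O; frame coordinates (xim, yim) in pixels with origin (0, 0) at the corner, image center (ox, oy), pixel size (sx, sy)]
From 3D to 2D (perspective projection):
$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$
From image to frame (image center, directions of axes, pixel size):
$$x = -(x_{im} - o_x)\,s_x, \qquad y = -(y_{im} - o_y)\,s_y$$
The minus signs account for the opposite directions of the image and frame axes.
Intrinsic parameters:
- (ox, oy): image center (in pixels)
- (sx, sy): effective size of the pixel (in mm)
- f: focal length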
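The chain from camera coordinates to pixels can be sketched as below. This assumes the image-to-frame relation x = -(xim - ox) sx, y = -(yim - oy) sy (the minus signs encode the flipped axis directions), ignores lens distortion, and uses hypothetical camera values and function names.

```python
def camera_to_frame(X, Y, Z, f, sx, sy, ox, oy):
    """Camera coordinates -> frame (pixel) coordinates, ignoring lens distortion."""
    x, y = f * X / Z, f * Y / Z            # perspective projection (image coordinates, mm)
    # Invert x = -(xim - ox) * sx and y = -(yim - oy) * sy.
    return ox - x / sx, oy - y / sy


def frame_to_image(xim, yim, sx, sy, ox, oy):
    """Frame (pixel) coordinates -> image coordinates (mm)."""
    return -(xim - ox) * sx, -(yim - oy) * sy


# Hypothetical camera: f = 8 mm, 0.01 mm square pixels, image center at (320, 240).
xim, yim = camera_to_frame(100.0, 50.0, 2000.0, f=8.0, sx=0.01, sy=0.01, ox=320.0, oy=240.0)
print(xim, yim)
print(frame_to_image(xim, yim, 0.01, 0.01, 320.0, 240.0))
```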
Intrinsic Parameters (II): Lens Distortions
[Figure: the ideal image point (x, y) and the distorted point (xd, yd), related by the coefficients k1, k2]
Modeled as simple radial distortions:
$$x = x_d\,(1 + k_1 r^2 + k_2 r^4), \qquad y = y_d\,(1 + k_1 r^2 + k_2 r^4), \qquad r^2 = x_d^2 + y_d^2$$
- (xd, yd): distorted points
- k1, k2: distortion coefficients
- A model with k2 = 0 is still accurate for a 500x500 CCD sensor with about 5 pixels of distortion on the outer boundary
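A minimal sketch of the radial model, mapping distorted image coordinates back to ideal ones. The coefficient value is hypothetical, and k2 defaults to 0 as in the simplified model the slide mentions.

```python
def correct_radial_distortion(xd, yd, k1, k2=0.0):
    """Map distorted image coordinates (xd, yd) to undistorted ones (x, y)."""
    r2 = xd * xd + yd * yd                  # r^2 = xd^2 + yd^2
    factor = 1.0 + k1 * r2 + k2 * r2 * r2   # 1 + k1 r^2 + k2 r^4
    return xd * factor, yd * factor


# Hypothetical coefficient, with k2 = 0.
k1 = -0.05
for xd in (0.0, 1.0, 2.0):
    print(xd, correct_radial_distortion(xd, 0.0, k1))
```

The correction is zero at the image center and grows with r, which is why distortion shows up mostly on the outer boundary.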
Extrinsic Parameters
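[Figure: camera frame (origin O, image point p with frame coordinates (xim, yim)) and world frame (Xw, Yw, Zw); the point is P in camera coordinates and Pw in world coordinates; T joins the two origins]
From world to camera:
$$P = R\,P_w + T$$
Extrinsic parameters:
- A 3D translation vector, T, describing the relative locations of the origins of the two coordinate systems (what is it?)
- A 3x3 rotation matrix, R, an orthogonal matrix that brings the corresponding axes of the two systems onto each other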
Linear Algebra: Vector and Matrix
A point as a 2D/3D vector (T: transpose):
- Image point: 2D vector, $p = (x, y)^T$
- Scene point: 3D vector, $P = (X, Y, Z)^T$
- Translation: 3D vector, $T = (T_x, T_y, T_z)^T$
Vector Operations
- Addition (translation of a 3D vector): $P = P_w + T = (X_w + T_x,\; Y_w + T_y,\; Z_w + T_z)^T$
- Dot product (a scalar): $c = a \cdot b = a^T b = |a|\,|b|\cos\theta$
- Cross product (a vector): generates a new vector that is orthogonal to both of them, $c = a \times b = (a_2 b_3 - a_3 b_2)\,i + (a_3 b_1 - a_1 b_3)\,j + (a_1 b_2 - a_2 b_1)\,k$
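The vector operations above, written out component by component as a small sketch (hypothetical function names, plain tuples instead of a linear-algebra library):

```python
import math


def dot(a, b):
    """a . b = a^T b (a scalar)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    """a x b = (a2 b3 - a3 b2, a3 b1 - a1 b3, a1 b2 - a2 b1) (a vector)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def add(a, b):
    """Component-wise addition, e.g. translating Pw by T."""
    return tuple(ai + bi for ai, bi in zip(a, b))


def angle_between(a, b):
    """Angle recovered from a . b = |a| |b| cos(theta), clamped against rounding."""
    c = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    return math.acos(max(-1.0, min(1.0, c)))


a, b = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
c = cross(a, b)
print(c, dot(c, a), dot(c, b))          # c is orthogonal to both a and b
print(add((1.0, 2.0, 3.0), (10.0, 20.0, 30.0)))
```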
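Linear Algebra: Vector and Matrix
Rotation: 3x3 matrix
$$R = (r_{ij})_{3\times 3} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} = \begin{pmatrix} R_1^T \\ R_2^T \\ R_3^T \end{pmatrix}$$
Orthogonal: $R^{-1} = R^T$, i.e., $R R^T = R^T R = I$
9 elements => 3 + 3 (the third row is the cross product of the first two: orthogonal/cross) => 2 + 2 (the first two rows are unit vectors) => 3 DOF (degrees of freedom; the first two rows are orthogonal, so their dot product is 0)
How to generate R from three angles? (next few slides)
Matrix Operations
$R P_w + T = ?$ Points in the world are projected onto the three new axes (of the camera system) and translated to a new origin:
$$P = R P_w + T = \begin{pmatrix} r_{11}X_w + r_{12}Y_w + r_{13}Z_w + T_x \\ r_{21}X_w + r_{22}Y_w + r_{23}Z_w + T_y \\ r_{31}X_w + r_{32}Y_w + r_{33}Z_w + T_z \end{pmatrix} = \begin{pmatrix} R_1^T P_w + T_x \\ R_2^T P_w + T_y \\ R_3^T P_w + T_z \end{pmatrix}$$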
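A sketch of P = R Pw + T and its inverse, with a check of the orthogonality property. It assumes R is stored as a list of rows (so row i is R_i^T) and adds the det R = +1 test that separates rotations from reflections; the 90-degree example rotation and all names are hypothetical.

```python
def mat_vec(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(R):
    return [[R[j][i] for j in range(3)] for i in range(3)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def det3(R):
    return (R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
            - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
            + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0]))


def is_rotation(R, tol=1e-9):
    """R R^T = I (orthogonal) and det R = +1 (no reflection)."""
    RRt = mat_mul(R, transpose(R))
    identity_ok = all(abs(RRt[i][j] - (1.0 if i == j else 0.0)) < tol
                      for i in range(3) for j in range(3))
    return identity_ok and abs(det3(R) - 1.0) < tol


def world_to_camera(R, T, Pw):
    """P = R Pw + T: row i of R (R_i^T) projects Pw onto camera axis i, then T shifts the origin."""
    return [p + t for p, t in zip(mat_vec(R, Pw), T)]


def camera_to_world(R, T, P):
    """Inverse transform Pw = R^T (P - T), using R^{-1} = R^T."""
    return mat_vec(transpose(R), [p - t for p, t in zip(P, T)])


R = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # hypothetical 90-degree turn about Zw
T = [0.0, 0.0, 5.0]
P = world_to_camera(R, T, [1.0, 2.0, 3.0])
print(is_rotation(R), P, camera_to_world(R, T, P))
```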
Rotation: from Angles to Matrix
Rotation around the axes: the result of three consecutive rotations around the coordinate axes
$$R = R_X(\alpha)\,R_Y(\beta)\,R_Z(\gamma)$$
where γ, β, α are the rotation angles about the Zw, Yw, and Xw axes.
Notes:
- Only three rotations
- Every time around one axis
- Bring the corresponding axes to each other: Xw = X, Yw = Y, Zw = Z
- First step (e.g.): bring Xw to X
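Rotation: from Angles to Matrix
Rotation around the Zw axis (by angle γ):
$$R_Z(\gamma) = \begin{pmatrix} \cos\gamma & \sin\gamma & 0 \\ -\sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
- Rotate in the XwOYw plane
- Goal: bring Xw to X. But X is not in XwOYw
- Rotate so that Yw ⊥ X: then X lies in XwOZw (Yw ⊥ XwOZw) and Yw lies in YOZ (X ⊥ YOZ)
- Next: rotation around Yw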
Zw does not change during the rotation around the Zw axis.
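Rotation: from Angles to Matrix
Rotation around the Yw axis (by angle β):
$$R_Y(\beta) = \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix}$$
- Rotate in the XwOZw plane so that Xw = X; Zw then lies in YOZ (and so does Yw)
- Yw does not change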
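Rotation: from Angles to Matrix
Rotation around the Xw (X) axis (by angle α):
$$R_X(\alpha) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & \sin\alpha \\ 0 & -\sin\alpha & \cos\alpha \end{pmatrix}$$
- Rotate in the YwOZw plane so that Yw = Y and Zw = Z (and Xw = X)
- Xw does not change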
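Rotation: from Angles to Matrix (Appendix A.9 of the textbook)
Rotation around the axes: the result of three consecutive rotations around the coordinate axes, with γ, β, α the angles about the Zw, Yw, and Xw axes:
$$R = R_X(\alpha)\,R_Y(\beta)\,R_Z(\gamma) = \begin{pmatrix} \cos\beta\cos\gamma & \cos\beta\sin\gamma & -\sin\beta \\ \sin\alpha\sin\beta\cos\gamma - \cos\alpha\sin\gamma & \sin\alpha\sin\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\cos\beta \\ \cos\alpha\sin\beta\cos\gamma + \sin\alpha\sin\gamma & \cos\alpha\sin\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\cos\beta \end{pmatrix}$$
Notes:
- Rotation directions: the sign convention of each angle must be fixed
- The order of multiplication matters: the same R gives 6 different sets of (α, β, γ)
- R is a non-linear function of (α, β, γ)
- R is orthogonal
- It is easy to compute the angles from R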
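A sketch that builds R from three angles and recovers them, assuming the elementary matrices take the frame-rotation form R_Z(γ) = [[cos γ, sin γ, 0], [-sin γ, cos γ, 0], [0, 0, 1]] (and likewise for Y and X) so that the product matches the composite matrix above. Angle recovery reads r13 = -sin β, r12/r11 = tan γ, r23/r33 = tan α, and assumes cos β > 0. The angle values are hypothetical.

```python
import math


def rot_x(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]


def rot_y(beta):
    c, s = math.cos(beta), math.sin(beta)
    return [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]


def rot_z(gamma):
    c, s = math.cos(gamma), math.sin(gamma)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_from_angles(alpha, beta, gamma):
    """R = R_X(alpha) R_Y(beta) R_Z(gamma); the rotation about Zw acts first on Pw."""
    return mat_mul(rot_x(alpha), mat_mul(rot_y(beta), rot_z(gamma)))


def angles_from_rotation(R):
    """Recover (alpha, beta, gamma) from R, assuming cos(beta) > 0."""
    beta = -math.asin(R[0][2])               # r13 = -sin(beta)
    gamma = math.atan2(R[0][1], R[0][0])     # r12 / r11 = tan(gamma)
    alpha = math.atan2(R[1][2], R[2][2])     # r23 / r33 = tan(alpha)
    return alpha, beta, gamma


angles = (0.3, -0.2, 1.1)
R = rotation_from_angles(*angles)
print(angles_from_rotation(R))
# Same angles, different multiplication order: a different matrix.
R_other = mat_mul(rot_z(angles[2]), mat_mul(rot_y(angles[1]), rot_x(angles[0])))
print(max(abs(R[i][j] - R_other[i][j]) for i in range(3) for j in range(3)))
```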
Rotation: Axis and Angle
Appendix A.9 of the textbook
According to Euler's theorem, any 3D rotation can be described by a rotation angle θ around an axis defined by a unit vector n = [n1, n2, n3]^T. Three degrees of freedom: why?
$$R = I\cos\theta + (1 - \cos\theta)\begin{pmatrix} n_1^2 & n_1 n_2 & n_1 n_3 \\ n_2 n_1 & n_2^2 & n_2 n_3 \\ n_3 n_1 & n_3 n_2 & n_3^2 \end{pmatrix} + \sin\theta \begin{pmatrix} 0 & -n_3 & n_2 \\ n_3 & 0 & -n_1 \\ -n_2 & n_1 & 0 \end{pmatrix}$$
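The axis-angle formula transcribes directly; the sketch below assumes n is (or is first normalized to) a unit vector, which is why only two of its components are free and the total is 2 + 1 = 3 degrees of freedom. The example axis and angle are hypothetical: a 120-degree turn about (1, 1, 1)/sqrt(3) cyclically permutes the coordinate axes.

```python
import math


def axis_angle_to_matrix(n, theta):
    """R = I cos(theta) + (1 - cos(theta)) n n^T + sin(theta) [n]_x, for a unit axis n."""
    n1, n2, n3 = n
    c, s = math.cos(theta), math.sin(theta)
    nnT = [[n1 * n1, n1 * n2, n1 * n3],
           [n2 * n1, n2 * n2, n2 * n3],
           [n3 * n1, n3 * n2, n3 * n3]]
    skew = [[0.0, -n3, n2],
            [n3, 0.0, -n1],
            [-n2, n1, 0.0]]
    return [[(c if i == j else 0.0) + (1.0 - c) * nnT[i][j] + s * skew[i][j]
             for j in range(3)] for i in range(3)]


# A hypothetical axis: normalize first, since the formula assumes |n| = 1.
v = (1.0, 1.0, 1.0)
norm = math.sqrt(sum(vi * vi for vi in v))
n = tuple(vi / norm for vi in v)
R = axis_angle_to_matrix(n, 2.0 * math.pi / 3.0)
# The axis itself is left unchanged by the rotation: R n = n.
print([sum(R[i][j] * n[j] for j in range(3)) for i in range(3)], n)
```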
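Linear Version of Perspective Projection
World to camera
- Camera: P = (X, Y, Z)^T; world: Pw = (Xw, Yw, Zw)^T; transform: R, T
$$P = R P_w + T = \begin{pmatrix} r_{11}X_w + r_{12}Y_w + r_{13}Z_w + T_x \\ r_{21}X_w + r_{22}Y_w + r_{23}Z_w + T_y \\ r_{31}X_w + r_{32}Y_w + r_{33}Z_w + T_z \end{pmatrix} = \begin{pmatrix} R_1^T P_w + T_x \\ R_2^T P_w + T_y \\ R_3^T P_w + T_z \end{pmatrix}$$
Camera to image
- Camera: P = (X, Y, Z)^T; image: p = (x, y)^T; not linear equations
$$(x, y) = \left(f\,\frac{X}{Z},\; f\,\frac{Y}{Z}\right)$$
Image to frame
- Neglecting distortion; frame: (xim, yim)^T
$$x = -(x_{im} - o_x)\,s_x, \qquad y = -(y_{im} - o_y)\,s_y$$
World to frame
- (Xw, Yw, Zw)^T -> (xim, yim)^T
- Effective focal lengths: fx = f/sx, fy = f/sy; the three parameters f, sx, sy are not independent, since only these ratios appear
$$x_{im} - o_x = -f_x\,\frac{r_{11}X_w + r_{12}Y_w + r_{13}Z_w + T_x}{r_{31}X_w + r_{32}Y_w + r_{33}Z_w + T_z}, \qquad y_{im} - o_y = -f_y\,\frac{r_{21}X_w + r_{22}Y_w + r_{23}Z_w + T_y}{r_{31}X_w + r_{32}Y_w + r_{33}Z_w + T_z}$$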
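A sketch of the full world-to-frame mapping, assuming the sign convention x = -(xim - ox) sx, y = -(yim - oy) sy. It also illustrates why f, sx, sy are not independent: two hypothetical cameras with the same ratios f/sx and f/sy produce the same pixels.

```python
def world_to_frame(Pw, R, T, f, sx, sy, ox, oy):
    """(Xw, Yw, Zw) -> (xim, yim) for the distortion-free model, with P = R Pw + T."""
    fx, fy = f / sx, f / sy                   # effective focal lengths, in pixels
    X, Y, Z = [sum(R[i][j] * Pw[j] for j in range(3)) + T[i] for i in range(3)]
    # Division by Z is what makes the mapping non-linear in Pw.
    return ox - fx * X / Z, oy - fy * Y / Z


I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 1000.0]
# Hypothetical cameras: only f / sx and f / sy matter, so these two give the same pixels.
print(world_to_frame([100.0, 50.0, 1000.0], I, T, f=8.0, sx=0.01, sy=0.01, ox=320.0, oy=240.0))
print(world_to_frame([100.0, 50.0, 1000.0], I, T, f=16.0, sx=0.02, sy=0.02, ox=320.0, oy=240.0))
```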
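Linear Matrix Equation of Perspective Projection
Projective space
- Add a fourth coordinate: Pw = (Xw, Yw, Zw, 1)^T
- Define (x1, x2, x3)^T such that x1/x3 = xim, x2/x3 = yim
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_{int} M_{ext} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}, \qquad x_{im} = x_1/x_3, \quad y_{im} = x_2/x_3$$
3x3 matrix M_int: only intrinsic parameters (camera to frame)
$$M_{int} = \begin{pmatrix} -f_x & 0 & o_x \\ 0 & -f_y & o_y \\ 0 & 0 & 1 \end{pmatrix}$$
3x4 matrix M_ext: only extrinsic parameters (world to camera)
$$M_{ext} = \begin{pmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix} = \begin{pmatrix} R_1^T & T_x \\ R_2^T & T_y \\ R_3^T & T_z \end{pmatrix}$$
Simple matrix product! Projective matrix M = M_int M_ext
- (Xw, Yw, Zw)^T -> (xim, yim)^T
- Linear transform from projective space to the projective plane
- M is defined up to a scale factor: 11 independent entries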
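The same projection as a matrix product in homogeneous coordinates. This sketch assumes the M_int sign convention above (entries -fx, -fy), uses hypothetical parameter values, and shows that multiplying M by any non-zero scale leaves the pixel unchanged.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def intrinsic_matrix(fx, fy, ox, oy):
    # Minus signs follow x = -(xim - ox) sx and y = -(yim - oy) sy.
    return [[-fx, 0.0, ox], [0.0, -fy, oy], [0.0, 0.0, 1.0]]


def extrinsic_matrix(R, T):
    return [list(R[i]) + [T[i]] for i in range(3)]      # 3x4 block matrix [R | T]


def project(M, Pw):
    Xh = list(Pw) + [1.0]                                # homogeneous (Xw, Yw, Zw, 1)
    x1, x2, x3 = [sum(M[i][j] * Xh[j] for j in range(4)) for i in range(3)]
    return x1 / x3, x2 / x3                              # xim = x1 / x3, yim = x2 / x3


R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 1000.0]
M = mat_mul(intrinsic_matrix(800.0, 800.0, 320.0, 240.0), extrinsic_matrix(R, T))
print(project(M, (100.0, 50.0, 1000.0)))
# M is only defined up to scale: a scaled copy gives the same pixel.
print(project([[3.7 * m for m in row] for row in M], (100.0, 50.0, 1000.0)))
```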
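Three Camera Models
Perspective camera model
- Making some assumptions: known center, ox = oy = 0; square pixels, sx = sy = 1
- 11 independent entries, but only 7 parameters
$$M = \begin{pmatrix} -f r_{11} & -f r_{12} & -f r_{13} & -f T_x \\ -f r_{21} & -f r_{22} & -f r_{23} & -f T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix}$$
Weak-perspective camera model
- Average distance $\bar{Z}$ >> range of depths within the scene
- Define the centroid vector $\bar{P}_w$; then $\bar{Z} = R_3^T \bar{P}_w + T_z$
- 8 independent entries
$$M_{wp} = \begin{pmatrix} -f r_{11} & -f r_{12} & -f r_{13} & -f T_x \\ -f r_{21} & -f r_{22} & -f r_{23} & -f T_y \\ 0 & 0 & 0 & R_3^T \bar{P}_w + T_z \end{pmatrix}$$
Affine camera model
- Mathematical generalization of weak perspective
- Doesn't correspond to a physical camera, but has a simple equation and appealing geometry
- Doesn't preserve angles, BUT preserves parallelism
- 8 independent entries
$$M_{af} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & b_3 \end{pmatrix}$$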
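A sketch contrasting the perspective and weak-perspective matrices, assuming the sign convention above and hypothetical scene values. Two parallel 3D segments stay parallel under the weak-perspective (affine) matrix, because its third row gives every point the same divisor, but not under full perspective; the 2D cross product of the image directions is the parallelism test.

```python
def project(M, Pw):
    Xh = list(Pw) + [1.0]
    x1, x2, x3 = [sum(M[i][j] * Xh[j] for j in range(4)) for i in range(3)]
    return x1 / x3, x2 / x3


def perspective_matrix(R, T, f):
    """M for a known center (ox = oy = 0) and square unit pixels (sx = sy = 1)."""
    rows = [[-f * R[i][j] for j in range(3)] + [-f * T[i]] for i in range(2)]
    return rows + [list(R[2]) + [T[2]]]


def weak_perspective_matrix(R, T, f, centroid):
    """Same first two rows; third row (0, 0, 0, R3^T Pw_bar + Tz) gives every point the same depth."""
    z_bar = sum(R[2][j] * centroid[j] for j in range(3)) + T[2]
    return perspective_matrix(R, T, f)[:2] + [[0.0, 0.0, 0.0, z_bar]]


def image_direction(M, P, d):
    """Image-plane direction of the 3D segment from P to P + d."""
    a = project(M, P)
    b = project(M, [p + di for p, di in zip(P, d)])
    return b[0] - a[0], b[1] - a[1]


def cross2(u, v):
    return u[0] * v[1] - u[1] * v[0]      # zero when u and v are parallel


R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 50.0]
d = (1.0, 2.0, 3.0)                       # common direction of two parallel 3D segments
P1, P2 = (0.0, 0.0, 0.0), (5.0, 1.0, 2.0)
for M in (perspective_matrix(R, T, 1.0), weak_perspective_matrix(R, T, 1.0, (2.5, 0.5, 1.0))):
    print(cross2(image_direction(M, P1, d), image_direction(M, P2, d)))
```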
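Camera Models for a Plane
Planes are very common in the man-made world:
$$n_x X_w + n_y Y_w + n_z Z_w = d, \quad \text{i.e.,} \quad n^T P_w = d$$
- One more constraint for all points: Zw is a function of Xw and Yw
- Special case: the ground plane, Zw = 0, Pw = (Xw, Yw, 0, 1)^T (3D point -> 2D point)
Projective model of a plane (with Zw = 0):
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -f r_{11} & -f r_{12} & -f r_{13} & -f T_x \\ -f r_{21} & -f r_{22} & -f r_{23} & -f T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix} \begin{pmatrix} X_w \\ Y_w \\ 0 \\ 1 \end{pmatrix}$$
- 8 independent entries
- General form?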
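Dropping the zero coordinate reduces the model of Zw = 0 to a 3x3 matrix:
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -f r_{11} & -f r_{12} & -f T_x \\ -f r_{21} & -f r_{22} & -f T_y \\ r_{31} & r_{32} & T_z \end{pmatrix} \begin{pmatrix} X_w \\ Y_w \\ 1 \end{pmatrix}$$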
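General form (scaling the plane equation so that nz = 1, i.e., $Z_w = d - n_x X_w - n_y Y_w$):
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -f(r_{11} - n_x r_{13}) & -f(r_{12} - n_y r_{13}) & -f(d\,r_{13} + T_x) \\ -f(r_{21} - n_x r_{23}) & -f(r_{22} - n_y r_{23}) & -f(d\,r_{23} + T_y) \\ r_{31} - n_x r_{33} & r_{32} - n_y r_{33} & d\,r_{33} + T_z \end{pmatrix} \begin{pmatrix} X_w \\ Y_w \\ 1 \end{pmatrix}$$
- 8 independent entries
- 2D (xim, yim) -> 3D (Xw, Yw, Zw)?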
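A sketch of the general plane model under the same assumptions as the perspective camera model above (ox = oy = 0, sx = sy = 1, the minus-sign convention). It checks that the 3x3 plane matrix agrees with the full 3x4 projection of the same plane point and, since this 3x3 matrix is invertible for the hypothetical camera used here (its center is off the plane), back-projects the pixel onto the plane. All names and numbers are illustrative.

```python
import math


def project(M, Pw):
    Xh = list(Pw) + [1.0]
    x1, x2, x3 = [sum(M[i][j] * Xh[j] for j in range(4)) for i in range(3)]
    return x1 / x3, x2 / x3


def perspective_matrix(R, T, f):
    rows = [[-f * R[i][j] for j in range(3)] + [-f * T[i]] for i in range(2)]
    return rows + [list(R[2]) + [T[2]]]


def plane_matrix(R, T, f, nx, ny, d):
    """3x3 matrix for points on nx Xw + ny Yw + Zw = d, acting on (Xw, Yw, 1)."""
    H = []
    for i in range(3):
        s = -f if i < 2 else 1.0             # first two rows carry -f, the third does not
        H.append([s * (R[i][0] - nx * R[i][2]),
                  s * (R[i][1] - ny * R[i][2]),
                  s * (d * R[i][2] + T[i])])
    return H


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A, b):
    """Solve A u = b by Cramer's rule (A assumed invertible)."""
    D = det3(A)
    return [det3([[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]) / D
            for k in range(3)]


def image_to_plane(H, xim, yim):
    """Back-project a pixel onto the plane: H (Xw, Yw, 1)^T is proportional to (xim, yim, 1)^T."""
    u = solve3(H, [xim, yim, 1.0])
    return u[0] / u[2], u[1] / u[2]


a = 0.3   # hypothetical camera tilt about Xw
R = [[1.0, 0.0, 0.0], [0.0, math.cos(a), math.sin(a)], [0.0, -math.sin(a), math.cos(a)]]
T, f = [0.5, -0.2, 20.0], 2.0
nx, ny, d = 0.1, -0.2, 3.0
Xw, Yw = 1.5, -2.0
Zw = d - nx * Xw - ny * Yw
H = plane_matrix(R, T, f, nx, ny, d)
x1, x2, x3 = [sum(H[i][j] * v for j, v in enumerate((Xw, Yw, 1.0))) for i in range(3)]
print(project(perspective_matrix(R, T, f), (Xw, Yw, Zw)), (x1 / x3, x2 / x3))
print(image_to_plane(H, x1 / x3, x2 / x3))
```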
Applications and Issues
Graphics / rendering
- From 3D world to 2D image
- Changing viewpoints and directions; changing focal length
- Fast rendering algorithms
Vision / reconstruction
- From 2D image to 3D model
- Inverse problem: much harder / unsolved
- Robust algorithms for matching and parameter estimation
- Need to estimate the camera parameters first
Calibration
- Find the intrinsic and extrinsic parameters, given image-world point pairs
- Probably a partially solved problem?
- 11 independent entries; 10 parameters: fx, fy, ox, oy, α, β, γ (rotation angles), Tx, Ty, Tz
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_{int} M_{ext} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}, \qquad x_{im} = x_1/x_3, \quad y_{im} = x_2/x_3$$
$$M_{int} = \begin{pmatrix} -f_x & 0 & o_x \\ 0 & -f_y & o_y \\ 0 & 0 & 1 \end{pmatrix}, \qquad M_{ext} = \begin{pmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix}$$
Camera Model Summary
Geometric projection of a camera
- Pinhole camera model
- Perspective projection
- Weak-perspective projection
Camera parameters (10 or 11)
- Intrinsic parameters: f, ox, oy, sx, sy, k1: 4 or 5 independent parameters
- Extrinsic parameters: R, T: 6 DOF (degrees of freedom)
Linear equations of camera models (without distortion)
- General projection transformation equation: 11 parameters
- Perspective camera model: 11 parameters
- Weak-perspective camera model: 8 parameters
- Affine camera model (generalization of weak perspective): 8 parameters
- Projective transformation of planes: 8 parameters
Next
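Calibration (Ch. 6): determining the values of the extrinsic and intrinsic parameters of a camera
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_{int} M_{ext} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}, \qquad M_{int} = \begin{pmatrix} -f_x & 0 & o_x \\ 0 & -f_y & o_y \\ 0 & 0 & 1 \end{pmatrix}, \qquad M_{ext} = \begin{pmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix}$$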