3D Camera Calibration
Nichola Abdo, André Borgeat
Department of Computer Science, University of Freiburg
Abstract
Time-of-flight (TOF) cameras based on the Photomixing Detector (PMD) technology can measure distances to objects at high frame rates, which makes them valuable for many applications in robotics and computer vision. Their distance measurements, however, are affected by many factors, systematic and otherwise, so the cameras must be calibrated. This paper presents a technique for calibrating PMD cameras that extends a recently proposed calibration technique [Fuchs and May, 2007] to include intensity-related errors in the depth measurements. Our model accounts for three major sources of error: the circular error resulting from the signal modulation process, the signal propagation delay, and the intensity-related error. Our experimental results confirm the advantage of considering intensity-related factors in the calibration process.
The acquisition of 3D information about the world is of crucial importance to many fields, ranging from industrial processes to computer vision and robotics. This typically requires determining the distances from the sensor to the objects in the environment, which is usually done with laser scanners or stereo cameras. While these techniques provide precise, high-resolution measurements, they suffer from several drawbacks. Laser-scanner systems, for example, need a mechanism that sequentially sweeps a laser beam across the environment, which makes them relatively expensive and slow. Stereo-camera systems must match the scene across different images to obtain depth, a process that is computationally demanding and prone to errors in regions of homogeneous intensity and color [Ringbeck and Hagebeuker, 2007].

By contrast, time-of-flight devices based on PMD technology are becoming increasingly popular for 3D imaging. They are compact, relatively cheap, and able to obtain depth information at higher frame rates than the techniques mentioned above [Ringbeck and Hagebeuker, 2007], because all pixels of a PMD camera compute the depth of their corresponding points in space in parallel. Consequently, these devices are better suited to real-time applications. However, the performance of PMD devices depends on many factors, including light intensity, object distance, and object reflectivity [Fuchs and May, 2007]. Calibrating their distance measurements therefore requires investigating how the measurements depend on these factors and on the different sources of systematic error.

This paper presents our approach to calibrating such a PMD camera, which accounts for systematic errors caused by signal modulation, signal propagation delays, and light intensity. In the following section, we give a brief overview of the most relevant work in this field. This is followed by a description of the theory behind the operation of PMD devices and of the different sources of error affecting their performance. We then present our error model and the calibration procedure we carried out. Finally, we report our experimental results and discuss the main conclusions of the work.
There exist a number of approaches for calibrating the depth measurements of TOF cameras. Lindner and Kolb approximated the error with B-splines [Lindner and Kolb, 2006], and in subsequent work used a similar B-spline approximation to account for the intensity-related part of the error [Lindner and Kolb, 2007]. Kahlmann et al., on the other hand, presented a technique based on look-up tables [Kahlmann et al., 2006]. These works calibrated the camera against a known distance taken as ground truth. The cameras, however, were manually fixed in place and moved to different distances from the objects being sensed. Any inaccuracy in this manual placement enters the assumed ground-truth distance directly, and the setup does not reflect the many robotics and industrial applications in which the camera is mounted, for example, on a robotic arm. Fuchs and May instead proposed a model that additionally estimates the transformation between the camera and the tool center point (TCP), which in their case is the end-effector of a robotic arm [Fuchs and May, 2007]. Their model accounts for the circular error (induced by the modulation process) and the signal propagation delay, but not for the intensity-related error.

In this paper, we build on the model proposed in [Fuchs and May, 2007] and extend it in two ways. First, we adjust the error term related to the signal propagation delay to describe the error more accurately as a function of the pixel location in the PMD array. Second, we introduce an intensity-related term in the error model, since the error in the depth measurements also depends on the intensity of the light signal received by the camera.
Principle of Operation of PMD Cameras
PMD cameras operate on the time-of-flight principle and therefore provide distance information about the objects they sense. A PMD camera typically consists of a PMD chip and its peripheral electronics, an illumination source, receiver optics, and a camera control system including digital interfaces and software. The illumination source emits infrared light onto the scene, and the reflected light is received by the camera and used to measure the distances to the objects. In contrast to scanning TOF devices, however, all pixels in the PMD's smart pixel array analyze the received optical signal simultaneously, each calculating the depth of the corresponding point in space. This eliminates the need to scan a single light beam over the environment to obtain 3D information [Ringbeck and Hagebeuker, 2007]. The PMD chip is based on CMOS (complementary metal–oxide–semiconductor) processes, which also provide automatic suppression of background light, allowing the device to be used outdoors as well as indoors [Lindner and Kolb, 2006]. The number of pixels in the array defines the lateral resolution of the device; typical resolutions include 48×64 and 160×120 pixels at 20 Hz. The reader is referred to [Prasad et al., 2006] (as cited by [Lindner and Kolb, 2006]) for an approach that combines PMD cameras with 2D cameras to overcome the resolution limitations of PMD devices.

To calculate the distance, each pixel in the array carries out a demodulation process. A reference electrical signal is applied to the modulation gates of each pixel, while the light incident on the pixel's photogates generates a second electrical signal.
If the reference voltage and the light source are modulated with the same signal, the received optical signal differs from the reference signal by a phase shift that is proportional to the distance of the reflecting object [Ringbeck and Hagebeuker, 2007]. For a reference signal $g(t)$ and the optical signal $s(t)$ received by the pixel, the correlation function $c(\tau)$ for a phase shift $\tau$ is calculated as

$$c(\tau) = s \otimes g = \lim_{T \to \infty} \int_{-T/2}^{T/2} s(t) \cdot g(t + \tau)\, dt. \tag{1}$$

For a sinusoidal signal $g(t) = \cos(\omega t)$, the incident light and the resulting correlation function are $s(t) = k + a \cos(\omega t + \phi)$ and $c(\tau) = h + \frac{a}{2} \cos(\omega \tau + \phi)$, respectively, where $a$ is the amplitude of the incident optical signal, $h$ is the offset of the correlation function (which represents the gray-scale value of each pixel [Ringbeck and Hagebeuker, 2007]), $\omega$ is the modulation frequency, and $\phi$ is the phase offset proportional to the distance [Lindner and Kolb, 2007]. By sampling four values $A_0, \dots, A_3$ of the correlation function at intervals of $\pi/2$, the phase shift, amplitude, and offset are calculated as

$$\phi = \arctan\left(\frac{A_3 - A_1}{A_0 - A_2}\right), \tag{2a}$$

$$a = \frac{\sqrt{(A_3 - A_1)^2 + (A_0 - A_2)^2}}{2}, \tag{2b}$$

$$h = \frac{A_0 + A_1 + A_2 + A_3}{4}. \tag{2c}$$

Finally, the distance $d$ to the target is calculated from the phase shift $\phi$ as [Lindner and Kolb, 2007]

$$d = \frac{c\,\phi}{4 \pi \omega}, \tag{3}$$

where $c$ is the speed of light. Figure 1 illustrates the correlation function and the four samples used to calculate the distance measurement.
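The demodulation step in Eqs. (2a) to (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the camera's firmware: the samples are assumed to be $A_k = c(\tau_k)$ with $\omega\tau_k = k\pi/2$, the frequency is given in Hz, and `math.atan2` stands in for the plain arctangent of Eq. (2a) so that the phase lands in the correct quadrant before being wrapped to $[0, 2\pi)$. The function name `demodulate` is our own.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def demodulate(a0, a1, a2, a3, f_mod):
    """Four-phase demodulation following Eqs. (2a)-(2c) and (3).

    a0..a3 are the correlation samples A0..A3 taken at pi/2 intervals and
    f_mod is the modulation frequency in Hz.
    Returns (phase in rad, amplitude, offset, distance in m).
    """
    # atan2 resolves the quadrant that a plain arctan of the ratio loses;
    # the result is wrapped from (-pi, pi] to [0, 2*pi).
    phase = math.atan2(a3 - a1, a0 - a2) % (2.0 * math.pi)
    amplitude = math.hypot(a3 - a1, a0 - a2) / 2.0               # Eq. (2b)
    offset = (a0 + a1 + a2 + a3) / 4.0                            # Eq. (2c)
    distance = SPEED_OF_LIGHT * phase / (4.0 * math.pi * f_mod)  # Eq. (3)
    return phase, amplitude, offset, distance


# Synthetic check: samples taken from c(tau) = h + (a/2) cos(omega*tau + phi).
h, a, phi = 100.0, 40.0, 1.0
samples = [h + (a / 2) * math.cos(phi + k * math.pi / 2) for k in range(4)]
phase, amp, off, d = demodulate(*samples, f_mod=20e6)
print(f"phase={phase:.3f} rad, offset={off:.1f}, d={d:.3f} m")
# prints: phase=1.000 rad, offset=100.0, d=1.193 m
```

Feeding in samples synthesized from the correlation model recovers the phase and offset that generated them, which is a quick sanity check on the sign conventions of Eq. (2a).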
Figure 1: Sampling from the correlation function to compute the phase shift $\phi$.

The chosen modulation frequency $\omega$ also determines the unambiguous distance range [Lindner and Kolb, 2007]. For example, a modulation frequency of 20 MHz gives an unambiguous range of 7.5 m: the modulation wavelength is $\lambda = c/\omega \approx 15$ m, and since the light travels the distance twice (to the object and back), the unambiguous range is $\lambda/2$.
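The arithmetic behind the 7.5 m figure, and the aliasing it implies, can be checked with a short sketch. It assumes an ideal sensor whose phase obeys Eq. (3) exactly; `unambiguous_range` and `measured_phase` are illustrative names, not part of any camera API.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def unambiguous_range(f_mod):
    """Half the modulation wavelength, c / (2 f), since light travels the distance twice."""
    return SPEED_OF_LIGHT / (2.0 * f_mod)


def measured_phase(true_distance, f_mod):
    """Phase an ideal sensor reports for a target (inverse of Eq. (3)), wrapped to [0, 2*pi)."""
    return (4.0 * math.pi * f_mod * true_distance / SPEED_OF_LIGHT) % (2.0 * math.pi)


f = 20e6
r = unambiguous_range(f)  # about 7.49 m, the 7.5 m quoted above
# A target exactly one range further away produces the same wrapped phase.
print(round(r, 3), math.isclose(measured_phase(2.0, f), measured_phase(2.0 + r, f)))
# prints: 7.495 True
```

Because the two phases coincide, Eq. (3) maps both targets to the same distance; raising the modulation frequency improves depth resolution per radian but shrinks this range proportionally.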
Error Sources in TOF Depth Measurements
First of all, note that TOF cameras, like regular gray-scale cameras, are described by the pinhole camera model. Their images are therefore affected by lens distortion, and the mapping from pixels to viewing rays depends on the focal length and the position of the optical center. These effects are usually handled by the lateral (2D) calibration of the camera.
In addition, the depth measurements of TOF cameras are corrupted by numerous error sources (see [Lange, 2000] for an exhaustive review). First, Lindner and Kolb observe a periodic error related to the measured distance, with a wavelength of approximately 2 m. They attribute this error to the fact that the distance calculation assumes a perfectly sinusoidal light source, which is not the case in practice. A second source of error is the time it takes for the signal of each sensor in the array to propagate to the processing unit. This error depends on the position of the sensor within the array, i.e. on the pixel in the image [Fuchs and May, 2007]. Furthermore, since the distance calculation depends on the amount of reflected light, the intensity of the optical signal (i.e. the brightness) affects the distance measurements. On the one hand, a low intensity leads to a poor signal-to-noise ratio, which corrupts the measurement randomly. On the other hand, several sources [Lindner and Kolb, 2007; Guðmundsson et al., 2007; Radmer et al., 2008] report an additional systematic error, which can be incorporated into the calibration. Another error arises from the shutter time of the camera, i.e. the time over which the camera integrates the image: longer integration times tend to shift the image towards the camera [Kahlmann et al., 2006; Lindner and Kolb, 2007]. Kahlmann et al. also report that both the internal temperature of the camera and the ambient temperature influence the depth measurements. During the first few minutes, while the camera warms up, the measured distance increases. Even after the temperature has roughly stabilized, Kahlmann et al. observe a small deviation that they attribute to the camera cooling down between individual images. They also found that increasing the room temperature causes a drift away from the camera of around 8 mm/°C.
Finally, in [Guðmundsson et al., 2007], the effects of multiple reflections on the distance measurements are discussed.
Error Model and Calibration
Let $D_v^i$ be the distance measurement of a pixel $v = (r, c)$ at row $r$ and column $c$ in the $i$-th image, and $E_v^i$ the error in this measurement. Furthermore, let $A : \mathbb{R} \times \mathbb{R}^2 \to \mathbb{R}^3$ map a distance and a pixel to a point in the Cartesian coordinate system of the sensor, including the correction of the focal length, the shift of the optical center, and the lens distortion. In our setup, the camera was attached to the end of a robot arm with multiple joints (see Figure 2). The end-effector pose ${}^{w}T_t^{i}$ is given by the robot control and assumed to be correct. Additionally, we assume an unknown transformation ${}^{t}T_s$, the sensor-to-tool-center-point transformation, between the sensor coordinate system and the end-effector coordinate system. Using these definitions, the world coordinate $x_v^i$ corresponding to a pixel $v$ in image $i$ is given by

$$x_v^i = {}^{w}T_t^{i} \; {}^{t}T_s \; A\left(D_v^i - E_v^i,\, v\right). \tag{4}$$
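Eq. (4) chains three mappings, which the following NumPy sketch makes concrete. The simplifications are ours: $A$ is modeled as an ideal pinhole back-projection with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and no lens distortion (distortion is handled separately by the lateral calibration), and the distance reading is assumed to be measured along the viewing ray. Both transforms are 4×4 homogeneous matrices.

```python
import numpy as np


def back_project(distance, row, col, fx, fy, cx, cy):
    """Hypothetical stand-in for A(distance, v): pixel plus range -> 3D point in the sensor frame.

    Assumes an ideal, already undistorted pinhole camera and a distance
    measured along the viewing ray through the pixel.
    """
    ray = np.array([(col - cx) / fx, (row - cy) / fy, 1.0])
    return distance * ray / np.linalg.norm(ray)


def to_world(T_world_tool, T_tool_sensor, p_sensor):
    """Eq. (4): x = wT_t^i * tT_s * A(D - E, v), using 4x4 homogeneous transforms."""
    p_h = np.append(p_sensor, 1.0)
    return (T_world_tool @ T_tool_sensor @ p_h)[:3]


# Hypothetical intrinsics, chosen only for illustration.
fx = fy = 80.0
cx, cy = 31.5, 24.5
D, E = 1.20, 0.02                # measured distance and modelled error in metres
T_wt = np.eye(4)
T_wt[:3, 3] = [0.5, 0.0, 0.3]    # end-effector pose reported by the robot control
T_ts = np.eye(4)                 # sensor-to-TCP transform, estimated by the calibration
x = to_world(T_wt, T_ts, back_project(D - E, 10, 20, fx, fy, cx, cy))
print(np.round(x, 3))
```

Only `T_ts` and the parameters inside `E` are unknown here; everything else comes from the robot controller and the lateral calibration, which is why the calibration can be phrased as a least-squares problem over those unknowns.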
Figure 2: The PMD camera fixed to the robot’s end-effector (courtesy of Barbara Frank). The arm was moved to different poses to take images from different views of the wall and checkerboard.
As discussed in Section 4, the depth error $E_v^i$ consists of several components. In this work, we account for the distance-related error $\mathcal{D}$, the pixel-related error $\mathcal{P}$, and the intensity-related error $\mathcal{I}$. Our error model is the sum of these three individual errors:

$$E_v^i = \mathcal{D} + \mathcal{P} + \mathcal{I}.$$
Distance-Related Error
Since this work focuses on distances below 2 m, we ignore the periodicity of the circular error and do not use a sinusoidal base function. Instead, we follow the approach of [Fuchs and May, 2007] and model this error as a third-order polynomial:

$$\mathcal{D}(D_v^i) = c_0 + c_1 D_v^i + c_2 (D_v^i)^2 + c_3 (D_v^i)^3. \tag{6}$$

Pixel-Related Error
As previously mentioned, Fuchs and May state that the pixel-related error stems from the propagation delay within the CMOS gates. They model this error as a linear function of the row and column of the pixel:

$$\mathcal{P}_1(r, c) = p_0 + p_1 r + p_2 c.$$

This assumes that the pixel-related error has its minimum at one corner of the image (depending on the signs of $p_1$ and $p_2$) and its maximum at the diagonally opposite corner. However, plotting the error against the row and the column (see Figure 3) reveals that our data do not confirm this assumption: the error does not appear to have its maximum exactly at a corner of the image, nor does a linear model appear to fit. Note, though, that the individual errors are highly correlated. The pixels near the border of the image, for example, are usually much darker and are therefore affected differently by intensity-related effects than those in the center. Nonetheless, we decided to model the pixel-related error as it appears in the data and used the term

$$\mathcal{P}_2(r, c) = p_0 + p_1 (r - r_0)^2 + p_2 (c - c_0)^2,$$

which allows the location (in terms of row and column) at which this error is minimal to be determined more accurately.

(a) Error against row
(b) Error against column
Figure 3: Plot of the error against (a) the row of the pixel and (b) the column of the pixel. Errors are given in centimeters and averaged over all available pixels, so the absolute values are largely uninformative; the plots serve only to indicate the general trend.

Intensity-Related Error
As mentioned in the previous section, the error in the measured distance is also related to the intensity of the pixel. Pixels with lower reflectivity (i.e. darker pixels) tend to drift closer to the camera, as was also observed in [Lindner and Kolb, 2007]. Whereas Lindner and Kolb use B-splines to model the intensity-related error, we use a polynomial function, since it fits more naturally into the general model of Fuchs and May. To keep the number of parameters small, we use only a second-order polynomial, which gives the error term

$$\mathcal{I}_1(I_v^i) = i_0 + i_1 I_v^i + i_2 (I_v^i)^2,$$

where $I_v^i$ is the intensity reading reported by the camera for pixel $v$ in the $i$-th image. Additionally, to account for the fact that the measured intensity depends not only on the reflectivity but also on the distance, we use a distance-normalized intensity

$$N_v^i = i_n \, I_v^i \, (D_v^i)^2,$$

and model the error analogously to the measured intensity, yielding

$$\mathcal{I}_2(I_v^i, D_v^i) = i_0 + i_3 N_v^i + i_4 (N_v^i)^2.$$

With the distance $d^i$ and the position of the robot relative to the wall, given by the unit normal vector $n^i$ of the plane (measured with a laser range finder), the points $x_v^i$ from Eq. (4) must satisfy

$$(n^i)^T x_v^i + d^i = 0$$

for all pixels in all images if the error model is perfect. The task of the calibration is therefore to find a parametrization $a^\star$ of $E_v^i$ and of the unknown sensor-to-tool-center-point transformation that minimizes the sum of squared errors over all available pixels:

$$a^\star = \arg\min_{a \in \mathbb{R}^n} \sum_i \sum_v \left( (n^i)^T x_v^i + d^i \right)^2. \tag{10}$$

We implemented this optimization using the Levenberg-Marquardt algorithm, a standard method for non-linear least-squares estimation.

Image Sets
In the experiments, we used three different sets of images of a planar, white wall. The first set contains 62 images of a checkerboard pattern hung on the wall, the second consists of 30 images of the white wall, and the third contains 20 images of the plain wall and 22 checkerboard images. The images were taken with a PMD-[vision] O3 camera manufactured by PMD Technologies, which was attached to the end-effector of a robotic arm and has a resolution of 50×64 pixels. Each image was taken from a different position relative to the wall. Since the three image sets were recorded on three different occasions under different operating conditions (ambient light, room temperature, temperature of the camera, etc.), we did not mix the sets but treated them individually.

Lateral Correction
The calibration of the focal length, the optical center, and the lens distortion is a well-explored topic in computer vision. Since it does not depend on the distance readings, it can be carried out in a separate preprocessing step using the intensity values of the images. In our experiments, we used camera parameters calculated with the OpenCV library.

Error Models
We tested three different approaches:
$E_A$: the error model described above, using the plain intensities: $E_A = \mathcal{D} + \mathcal{P}_2 + \mathcal{I}_1$.
$E_B$: the same model, using the normalized intensities instead of the plain intensities: $E_B = \mathcal{D} + \mathcal{P}_2 + \mathcal{I}_2$.
$E_0$: the error model from [Fuchs and May, 2007] with the linear pixel-related error term (as discussed above) and no intensity error, as a baseline: $E_0 = \mathcal{D} + \mathcal{P}_1$.
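The error terms and the three compared models translate directly into code. The following NumPy sketch evaluates them for scalar or array inputs; the dictionary layout of the parameters, the key names (`row0` and `col0` for $r_0$ and $c_0$), and treating $i_n$ as a fixed constant are our assumptions. `plane_residuals` returns the per-pixel terms of Eq. (10) before squaring, which is the form a Levenberg-Marquardt solver consumes.

```python
import numpy as np


def distance_error(D, c):
    # Eq. (6): third-order polynomial in the measured distance D.
    return c[0] + c[1] * D + c[2] * D**2 + c[3] * D**3


def pixel_error_linear(r, col, p):
    # P1: linear in row and column (baseline of Fuchs and May).
    return p[0] + p[1] * r + p[2] * col


def pixel_error_quadratic(r, col, p, row0, col0):
    # P2: paraboloid whose extremum lies at pixel (row0, col0).
    return p[0] + p[1] * (r - row0) ** 2 + p[2] * (col - col0) ** 2


def intensity_error_plain(I, q):
    # I1: second-order polynomial in the raw intensity, q = (i0, i1, i2).
    return q[0] + q[1] * I + q[2] * I**2


def intensity_error_normalized(I, D, q, i_n):
    # I2: same form applied to N = i_n * I * D^2, q = (i0, i3, i4).
    N = i_n * I * D**2
    return q[0] + q[1] * N + q[2] * N**2


def total_error(model, D, I, r, col, theta):
    """E_v^i for the compared models "E0", "EA" and "EB"."""
    E = distance_error(D, theta["c"])
    if model == "E0":
        return E + pixel_error_linear(r, col, theta["p"])
    E = E + pixel_error_quadratic(r, col, theta["p"], theta["row0"], theta["col0"])
    if model == "EA":
        return E + intensity_error_plain(I, theta["i"])
    if model == "EB":
        return E + intensity_error_normalized(I, D, theta["i"], theta["i_n"])
    raise ValueError(f"unknown model {model!r}")


def plane_residuals(points_world, normals, offsets):
    """Terms of Eq. (10) before squaring: (n^i)^T x_v^i + d^i, one per point."""
    return np.einsum("ij,ij->i", normals, points_world) + offsets


# Illustrative (made-up) parameter values.
theta = {
    "c": [0.01, 0.0, 0.0, 0.0],   # c0..c3 of Eq. (6)
    "p": [0.0, 1e-5, 2e-5],       # p0..p2
    "row0": 24.0, "col0": 32.0,   # extremum of P2 in pixel coordinates
    "i": [0.0, -1e-4, 0.0],       # (i0, i1, i2) for I1 or (i0, i3, i4) for I2
    "i_n": 1.0,                   # normalization factor of N, held fixed here
}
for model in ("E0", "EA", "EB"):
    print(model, total_error(model, D=1.0, I=100.0, r=10, col=20, theta=theta))
```

To run the calibration itself, these pieces would be wrapped in a residual function over a stacked parameter vector (the error-model coefficients plus a parametrization of ${}^{t}T_s$, such as a translation and three rotation angles) and handed to a Levenberg-Marquardt implementation such as `scipy.optimize.least_squares(..., method="lm")`. Note that the constant terms $c_0$, $p_0$ and $i_0$ only enter through their sum, so in practice all but one of them would have to be fixed to keep the problem well-posed.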
Experiments
In a first series of experiments, each data set was randomly split into two halves: a training set used for calibration and a test set used for evaluation. Table 1 shows the absolute error for each error model, averaged over all pixels in the test set. All three methods decreased the error substantially. In particular, all three methods were able to align the image with the plane (see Figure 4), i.e. they found a suitable sensor-to-TCP transformation. Both methods presented in this paper outperformed the baseline method on all three image sets, although the margin on the second image set is much smaller than on the others. This is because the second image set contains only images of the white wall, which vary less in intensity, so the intensity-related error is less decisive.

Table 1: Average per-pixel error (mm) before and after correction by the three error models on the three image sets.

| Error model | Image Set 1 | Image Set 2 | Image Set 3 |
|---|---|---|---|
| None (before correction) | 28.30 | 55.29 | 26.12 |
| $E_0$ | 11.33 | 7.38 | 21.86 |
| $E_A$ | 8.11 | 6.36 | 16.39 |
| $E_B$ | 8.05 | 7.00 | 17.23 |

(a) Before Correction
(b) After Correction
Figure 4: Projection of one image into 3D, (a) before correction and (b) after correction. Note how in (a) the plane is translated and tilted with respect to the expected plane, while in (b) the expected and the actual plane are more or less aligned. This is mostly the effect of the sensor-to-tool-center-point transformation.

Since the difference between the model using the measured intensities and the model using the normalized intensities was too small to support any firm conclusion, we ran a second series of experiments using 6-fold cross-validation, to remove the dependence on a single random split of the image sets. Due to time constraints, this second experiment was only carried out on the third data set, which we believe is the most representative of the error sources that the presented error models account for. Table 2 shows the results. Although they suggest that the model using the measured intensities performs slightly better than the model using the normalized intensities, no statistically significant difference between the two models can be claimed.

Table 2: Average per-pixel error (mm) of the different error models for each fold; the last column gives the mean ± 2.57 standard errors.

| Error model | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Fold 6 | Mean ± 2.57·SE |
|---|---|---|---|---|---|---|---|
| None (before correction) | 19.79 | 21.57 | 36.99 | 24.41 | 32.66 | 23.83 | 26.54 ± 6.48 |
| $E_0$ | 15.71 | 12.54 | 23.52 | 14.64 | 13.66 | 10.78 | 15.14 ± 4.26 |
| $E_A$ | 10.16 | 7.92 | 19.05 | 9.76 | 11.58 | 6.96 | 10.90 ± 4.13 |
| $E_B$ | 9.76 | 10.58 | 18.55 | 9.83 | 12.53 | 8.85 | 11.68 ± 3.44 |

Effects of the intensity-related error term
The results clearly show that using the intensities as additional error terms is beneficial. Both models presented here are significantly better than the baseline model (using a t-distribution with 5 degrees of freedom and a confidence level of 95%). The effect of the intensity correction is shown in Figure 5: the imprint of the checkerboard pattern on the distance measurements has almost disappeared. Another effect we observed with the baseline model is that the surface of the image is always concave. This is probably because the intensities in our image sets tend to be much higher in the center of the image. This effect, too, was largely reduced by the intensity error term. Interestingly, the error tends to be higher at the edges of the checkerboard squares. This could be explained by the low resolution of the camera: the pixels on these edges probably receive a mixture of light from the darker and the brighter areas.
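The significance statement can be checked against the per-fold numbers in Table 2. The sketch below assumes a paired t-test over the six folds (5 degrees of freedom, two-sided 95% critical value of about 2.571), which is one reasonable reading of the text rather than a documented procedure. It also recomputes the last column; computing the standard error from the population standard deviation (our assumption) gives values that closely match the reported intervals.

```python
import math
from statistics import mean, pstdev, stdev

# Per-fold average errors in mm on image set 3, copied from Table 2.
FOLDS = {
    "none": [19.79, 21.57, 36.99, 24.41, 32.66, 23.83],
    "E0": [15.71, 12.54, 23.52, 14.64, 13.66, 10.78],
    "EA": [10.16, 7.92, 19.05, 9.76, 11.58, 6.96],
    "EB": [9.76, 10.58, 18.55, 9.83, 12.53, 8.85],
}
T_CRIT = 2.571  # two-sided 95% quantile of Student's t with 5 degrees of freedom


def half_width(values, t=2.57):
    """t times the standard error, using the population standard deviation (assumption)."""
    return t * pstdev(values) / math.sqrt(len(values))


def paired_t(a, b):
    """Paired t statistic of the per-fold differences a - b (n - 1 degrees of freedom)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


for name, vals in FOLDS.items():
    print(f"{name}: {mean(vals):.2f} ± {half_width(vals):.2f}")
for first, second in [("E0", "EA"), ("E0", "EB"), ("EA", "EB")]:
    t = paired_t(FOLDS[first], FOLDS[second])
    print(f"{first} vs {second}: t = {t:.2f}, significant at 95%: {abs(t) > T_CRIT}")
```

Under this test, both $E_A$ and $E_B$ differ significantly from $E_0$, while the difference between $E_A$ and $E_B$ does not, which agrees with the conclusions stated above.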
Other Observations
In our experiments, we also noticed effects that we think can be attributed to the temperature of the camera. Figure 6 shows the error in the distance measurement of the first three images of image set two. The three images were taken consecutively from almost the same position. Since we did not notice such a large difference (roughly 15 mm from the first to the third image) between other images taken from similarly close positions, we believe that this drift is related to the warm-up phase of the camera. We could not confirm this assumption, however, since our data contain no information about the temperature of either the camera or the room.

Figure 6: Plots of the absolute error in the measured distance of three consecutive images, illustrating the drift of the image during the warm-up phase of the camera.
(a) Baseline Method
(b) Plain intensities
(c) Normalized intensities
Figure 5: Projection of a checkerboard image into 3D (a) without, (b) with plain and (c) with normalized intensity correction. Note how in (a) the checkerboard pattern is affecting the distance measurement and how this is accounted for in (b) and (c). Also note that, since the intensity is in general higher in the center of the image, (a) has a bowl-like shape. This effect is also reduced in (b) and (c).
In this paper, we presented a method for calibrating TOF cameras. The method extends the error model published by Fuchs and May by an additional error term that accounts for the intensity-related shift in the measured distances. Our experiments showed that modeling the distance-intensity dependency significantly decreases the error in the distance measurements. Further work is needed to find more suitable functions for the individual error terms and to model other systematic errors mentioned in this and other papers, such as the effects of the camera and ambient temperatures. One problem we see with this approach is that the error model already has enough parameters to make it susceptible to over-fitting, and we expect that extending the model further would aggravate this problem. Finally, to better assess the viability of the presented method, it should be compared with other approaches that incorporate the intensity as an error source, such as the work of Lindner and Kolb.
Acknowledgements
This paper was written as the final project of the Mobile Robotics 2 course, held in the 2009/10 winter term at the Albert-Ludwigs-Universität Freiburg and lectured by Prof. Dr. Wolfram Burgard, PD Dr. Cyrill Stachniss, Dr. Giorgio Grisetti and Dr. Kai Arras. The project was supervised by Barbara Frank, whom we thank for her assistance and for providing the image sets and data necessary for this work.
References
[Fuchs and May, 2007] Stefan Fuchs and Stefan May. Calibration and registration for precise surface reconstruction with TOF cameras. In Proceedings of the DAGM Dyn3D Workshop, Heidelberg, Germany, 2007.
[Guðmundsson et al., 2007] Sigurjón Árni Guðmundsson, Henrik Aanæs, and Rasmus Larsen. Environmental effects on measurement uncertainties of time-of-flight cameras. In International Symposium on Signals, Circuits and Systems (ISSCS), 2007.
[Kahlmann et al., 2006] T. Kahlmann, F. Remondino, and H. Ingensand. Calibration for increased accuracy of the range imaging camera SwissRanger™. In ISPRS Commission V Symposium 'Image Engineering and Vision Metrology', 2006.
[Lange, 2000] Robert Lange. 3D Time-of-Flight Distance Measurement with Custom Solid-State Image Sensors in CMOS/CCD-Technology. PhD thesis, Department of Electrical Engineering and Computer Science, University of Siegen, 2000.
[Lindner and Kolb, 2006] Marvin Lindner and Andreas Kolb. Lateral and depth calibration of PMD-distance sensors. In Advances in Visual Computing, volume 2, pages 524–533. Springer, 2006.
[Lindner and Kolb, 2007] Marvin Lindner and Andreas Kolb. Calibration of the intensity-related distance error of the PMD TOF-camera. In Intelligent Robots and Computer Vision XXV: Algorithms, Techniques, and Active Vision, 2007.
[Prasad et al., 2006] T. D. Arun Prasad, Klaus Hartmann, Wolfgang Weihs, Seyed Eghbal Ghobadi, and Arnd Sluiter. First steps in enhancing 3D vision technique using 2D/3D sensors. In Computer Vision Winter Workshop, 2006.
[Radmer et al., 2008] Jochen Radmer, Pol Moser Fusté, Henning Schmidt, and Jörg Krüger. Incident light related distance error study and calibration of the PMD-range imaging camera. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008.
[Ringbeck and Hagebeuker, 2007] Thorsten Ringbeck and Bianca Hagebeuker. A 3D time of flight camera for object detection. In Optical 3-D Measurement Techniques. ETH Zürich, 2007.