A Sensor-based SLAM Algorithm for Camera Tracking in Virtual Studio

International Journal of Automation and Computing, 05(2), April 2008, 152-162
DOI: 10.1007/s11633-008-0152-6

Po Yang*        Wenyan Wu        Mansour Moniri        Claude C. Chibelushi

Faculty of Computing, Engineering and Technology, Staffordshire University, Stafford ST16 9DG, UK

Abstract: This paper addresses a sensor-based simultaneous localization and mapping (SLAM) algorithm for camera tracking in a virtual studio environment. The traditional camera tracking methods in virtual studios are vision-based or sensor-based. However, the chroma keying process in virtual studios requires color cues, such as a blue background, to segment foreground objects to be inserted into images and videos. Chroma keying limits the application of vision-based tracking methods in virtual studios since the background cannot provide enough feature information. Furthermore, conventional sensor-based tracking approaches suffer from jitter, drift, or expensive computation due to the characteristics of the individual sensor systems. Therefore, SLAM techniques from the mobile robot area are first investigated and adapted to the camera tracking area. Then, a sensor-based SLAM extension algorithm for two-dimensional (2D) camera tracking in a virtual studio is described. In addition, a technique called map adjustment is proposed to increase the accuracy and efficiency of the algorithm. The feasibility and robustness of the algorithm are shown by experiments. The simulation results demonstrate that the sensor-based SLAM algorithm can satisfy the fundamental 2D camera tracking requirements in a virtual studio environment.

Keywords: Simultaneous localization and mapping (SLAM), particle filter, chroma key, camera tracking.

Manuscript received September 2, 2007; revised January 15, 2008. *Corresponding author. E-mail address: p.yang@staffs.ac.uk

1 Introduction

Camera tracking in unprepared scenes has recently become a hot research topic since it can be applied in various areas, e.g., augmented reality, virtual reality, and virtual studios. In a virtual studio environment, it is arduous to obtain a system that is sufficiently accurate, fast, and robust for effective camera tracking and also suitable for the chroma keying process in video or movie production. Chroma keying is the process of segmenting objects from images and video using color cues. A blue or green screen placed behind an object during recording is used for special effects and in virtual studios; the blue or green color is then replaced by a different background.

The existing camera tracking techniques are classified into vision-based and sensor-based tracking methods. Vision-based tracking approaches[1, 2] track the position and rotation of a camera by using the information contained in images, such as fiducial markers or feature points. Though simultaneous localization and mapping (SLAM) techniques have been applied to pure vision-based tracking, the high input data rate, the inherent 3D quality of visual data, and the difficulty of extracting long-term features to map limit the range of possible applications of vision-based tracking systems. Sensor-based tracking methods are based on active sensors, which incorporate powered signal emitters and sensors placed in a prepared and calibrated environment (e.g., magnetic[3], optical[4], radio, and ultrasound guided). However, most active (sensor-emitter) tracking systems directly observe the position or orientation parameters of the camera movement, and the results are inaccurate due to the drawbacks of the different sensor systems: magnetic tracking suffers from jitter, and optical tracking is computationally expensive and slow.

In the last several decades, SLAM techniques have been of great interest to mobile computing and robotics researchers. In the field of mobile computing, the location information of a user can be used to build other applications. In order to obtain information about the robot's environment, the sensor measurements deliver information about the bearing, distance, appearance, etc. of nearby features in the environment. Various mobile computing systems provide indoor localization using sensors such as ultrasonic, infrared, laser, or radio frequency (RF) devices. Some of these systems are commercial and have achieved impressive success, e.g., the active badge system[5] by the AT&T lab in Cambridge, and the RF and ultrasonic transmitter system at Bristol University by Randell and Muller[6].

Therefore, for virtual studio environments, neither vision-based nor sensor-based tracking methods can be applied successfully without difficulty. First, the chroma keying process limits the application of vision-based methods in a virtual studio since the blue screen does not provide enough feature information for tracking. Second, the conventional sensor-based tracking approaches suffer from jitter, drift, or expensive computation due to the characteristics of individual sensor systems. Therefore, our work focuses on applying the SLAM methodology from the mobile robot area to the camera tracking area, and proposes a novel sensor-based SLAM algorithm to solve the camera tracking problem in a virtual studio environment. The main advantages of this algorithm are that it is simple and robust, without being limited by either the individual characteristics of the sensor system or the chroma keying process. At the same time, it potentially improves the tolerance of the camera tracking system to erratic motion, since it uses statistical and probabilistic methods to predict the camera movement from distance information instead of observing the position information directly.

In this paper, Section 2 gives a basic literature review of camera tracking in virtual studios and of the related techniques. Section 3 presents a novel sensor-based SLAM algorithm for camera tracking. Section 4 describes the simulation results and analysis. Section 5 draws conclusions and outlines future work.

2 Literature review

2.1 Camera tracking in virtual studio

Virtual studios have long been in use for commercial broadcasting and motion pictures. Most virtual studios are based on "blue screen" technology, and their two-dimensional (2D) nature restricts the user from making natural 3D interactions. In general, virtual studio sets require "blue screen" (chroma keying) technology, high-end graphics workstations, camera tracking technology, and signal compositing for high realism and exact mixing results[7]. The 2D nature of virtual studios as developed and used in the current broadcast industry limits their use to situations where the camera angle is fixed and there is minimal user interaction[8].

For the conventional camera tracking system in a virtual studio, vision-based methods are widely used. Yamanouchi et al.[9] overcame the limitations of cost and space of conventional blue-screen setups in their "real space-based virtual studio" system. This system combines real and virtual space images; the real and virtual images are mixed using depth information in real time, but their algorithms are limited to indoor studio sets. The most straightforward method of extracting 3D information (camera parameters) from a scene is to use multiple camera views and stereo correspondences. Scharstein and Szeliski[10] gave a very good survey and taxonomy of the different algorithms and approaches to the stereo-correspondence problem. Yang et al.[11] introduced a new method for using commodity graphics hardware to achieve real-time 3D depth estimation by a plane-sweeping approach with multi-resolution color-consistency tests. These vision-based methods are the main approaches for extracting and tracking camera parameters in virtual studios.

Besides, chroma keying is a very important issue affecting camera tracking in virtual studios. Chroma keying is used in video and movie production for replacing the background in special effects and in virtual studio applications, and for hiding objects. It is a staple of video production, and provides a good starting point for understanding the historical development of virtual studios. In traditional chroma keying, the subject is shot against a constant background such as a blue curtain or screen. This "blue screen" shot then passes through a chroma keyer, where it is combined with a second shot containing the new background. Conceptually, chroma keyer operation is simple: replace the foreground with the background in those places where the foreground contains a particular color known as the key color; the chroma keyer itself may do so automatically. Despite the sophistication of chroma key systems, their operation imposes a fundamental constraint: the foreground camera cannot move; it must be "locked off" for the shot's duration, otherwise the spatial relationships existing between the two layers are not consistently maintained. Fig. 1 presents the simplified principle of camera tracking in virtual studios, where VTR means video tape recorder.

Fig. 1  Camera tracking in virtual studio
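To make the keying operation described above concrete, the following is a minimal sketch of a chroma keyer written in Python with NumPy; the key color, the distance threshold, and the array-based compositing are assumptions of this illustration, not details taken from the paper.

```python
import numpy as np

def chroma_key(foreground, background, key_color=(0, 0, 255), threshold=80.0):
    """Replace foreground pixels close to the key color with background pixels.

    foreground, background: (H, W, 3) uint8 arrays of the same shape.
    key_color: assumed key color (blue, in RGB order).
    threshold: Euclidean RGB distance below which a pixel is keyed out.
    """
    fg = foreground.astype(np.float64)
    dist = np.linalg.norm(fg - np.array(key_color, dtype=np.float64), axis=-1)
    mask = dist < threshold                 # True where the key color dominates
    composite = foreground.copy()
    composite[mask] = background[mask]      # background shows through keyed pixels
    return composite

if __name__ == "__main__":
    h, w = 4, 4
    fg = np.full((h, w, 3), (0, 0, 255), dtype=np.uint8)   # all-blue screen
    fg[1, 1] = (200, 150, 100)                              # one "actor" pixel
    bg = np.zeros((h, w, 3), dtype=np.uint8)                # new background
    out = chroma_key(fg, bg)
    print(out[1, 1], out[0, 0])   # actor pixel kept, blue pixel replaced
```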

Because of the limitations of the chroma keying process in virtual studios, vision-based camera tracking methods suffer from unreliable detection of fiducial markers and feature points against a blue or green background. Therefore, sensor-based tracking methods are applied in different commercial camera tracking systems in virtual studios, such as electromechanical and infrared camera tracking systems. Table 1 shows a comparison of different camera tracking systems in virtual studios.

Table 1  Comparison of camera tracking systems in virtual studio

Parameter                       | Electro-mechanical | Auxiliary camera       | Infrared                                     | Vision
Cost                            | Expensive          | Mid-range              | Expensive                                    | Very low
Angular shooting sector         | 360 degrees        | 360 degrees            | 360 degrees (ceiling); 150 degrees (target)  | Typically 200 degrees; limited by field of view
Chroma keying                   | No limitation      | No limitation          | No limitation                                | Good quality is possible but requires adjustment
Calibration and setup procedure | Complicated        | Relatively complicated | Relatively complicated                       | Very simple
Processing time delay           | 1 frame            | 1-2 frames             | 1 frame                                      | 1-2 frames

2.2 SLAM techniques

SLAM has been a fundamental topic in the robotics and mobile computing communities. In our research work, we extend the SLAM method from mobile robots to camera tracking. Localization deals with the problem of finding the location of a mobile object, given a map and some sensor readings. Mapping is the process of building and maintaining a model of the surrounding environment.

Localization and mapping have been under active development during the past several decades. The first paradigm, in the 1970s, was called model-based[12]. In the 1980s, Brooks' behavior-based architecture[13] became more popular. The last paradigm, which emerged in the mid-1990s and is still under rapid development, is usually termed probabilistic robotics[14]. This method describes all the information in a probabilistic way, unlike the above methods, which are deterministic. Recently, some vision researchers have investigated SLAM algorithms[15, 16] in the pure vision domain. However, vision-only SLAM systems suffer from the inherent 3D quality of visual data and the difficulty of extracting long-term features to map[17]. Thus, in this paper, the particle filter SLAM method, which is still based on probabilistic techniques, is adapted by using sensor-based instead of vision-based features.

In a probabilistic camera tracking system, the aim of localization is to estimate the state of the camera and its environment from sensor measurements. Therefore, we need a mathematical representation which can help represent and calculate the estimates. Bayes filtering[18] addresses such a problem, and the Bayesian filter is used in most probabilistic localization systems. It is, however, only a theoretical framework for this estimation problem: the integration in the Bayesian filter is a vital problem. If the state space is continuous, the implementation of the algorithm requires memory storage for the representation of the whole posterior distribution, which is an infinite-dimensional vector. In cases where the state space is discrete and of high dimensionality, the integration is still extremely complicated and not practical to implement.
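For reference, the standard Bayes filter recursion underlying the above discussion can be written as follows (a textbook form, stated in this paper's notation rather than quoted from it):

\[
Bel(s_t) = \eta \, \Pr(d_t \mid s_t) \int \Pr(s_t \mid s_{t-1}) \, Bel(s_{t-1}) \, \mathrm{d}s_{t-1}
\]

where \(\eta\) is a normalizing constant, \(\Pr(d_t \mid s_t)\) is the observation model, and \(\Pr(s_t \mid s_{t-1})\) is the motion model. The integral over the previous state is precisely the term that becomes impractical for continuous or high-dimensional state spaces, which motivates the sampling-based approximation of the next subsection.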

2.3 Particle filter localization

There are various approaches for implementing Bayesian filters. One important family is based on sampling techniques: the posterior distribution is approximated by a random set of samples, and the observation and motion models can also be represented by sets of random samples. This method is often called the "particle filter" and is particularly suitable for nonlinear estimation problems. To solve the integration in the Bayesian filter[18], the particle filter performs a Monte Carlo simulation. The desired posterior distributions are represented by a set of randomly chosen samples (particles) with importance sampling (weighting), and the required estimates are then computed from these samples and their associated weights. As the number of particles becomes large enough, the particles become an equivalent representation of the posterior distribution. After applying the sampling, the belief of the camera at time t becomes

\[
Bel(s_t) \cong S_t = \{\, s_t^{i}, w_t^{i} \,\}, \qquad i = 1, \cdots, m. \tag{1}
\]

Here, m is the number of particles, w_t^i is the importance factor, and s_t^i is the state of the camera. Fig. 2 shows the basic working process of the particle filter[19]. Here, we assume that the initial position of the camera is roughly known (represented by the picture in row 1, column A, denoted A1), and the state space is 2D. A2 is the initial particle set. First, the camera moves, for example, one meter away from the initial position. B1 is the motion model, which gives the proposal distribution in B2. When the sensor reading becomes available, C1 is the observation model, i.e., the likelihood of observing the sensor reading at a certain position. After incorporating the sensor reading, we update the weights of the particles. This gives the new particle set, shown in C2, where particles with a darker grey level indicate a highly probable state.
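The working process of Fig. 2 can be condensed into a short sketch. The code below is an illustrative particle filter step for 2D localization against a single known beacon; the beacon position, noise levels, and particle count are assumptions chosen only for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 500                                                  # number of particles
particles = rng.normal([0.0, 0.0], 0.5, size=(M, 2))     # A2: initial particle set
weights = np.full(M, 1.0 / M)

beacon = np.array([5.0, 2.0])          # assumed known feature position
true_pos = np.array([1.0, 0.0])        # camera after moving 1 m along x

# B1/B2: motion model -- spread particles with a Gaussian proposal
particles += rng.normal([1.0, 0.0], 0.3, size=(M, 2))

# C1: observation model -- likelihood of the measured range at each particle
sigma = 0.2
measured_range = np.linalg.norm(true_pos - beacon) + rng.normal(0.0, sigma)
predicted_range = np.linalg.norm(particles - beacon, axis=1)
weights *= np.exp(-(measured_range - predicted_range) ** 2 / (2 * sigma ** 2))
weights /= weights.sum()

# C2: resample -- particles with larger weights are kept more often
idx = rng.choice(M, size=M, p=weights)
particles = particles[idx]
print("estimated position:", particles.mean(axis=0))
```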

Fig. 2  Working process of the particle filter for camera tracking (A1: initial camera position; A2: initial particle set; B1: motion model; B2: proposal distribution; C1: observation model; C2: new particle set)

3 Sensor-based SLAM algorithm

The aim of this algorithm is to achieve only sensor-based 2D camera tracking in a virtual studio environment, since SLAM techniques perform better in a 2D situation than in a 3D environment. This section provides a comprehensive description of the implementation of the system states, the system models, and the particle filter in this algorithm. The particle filter in this SLAM algorithm is not exactly the same as the standard particle filter, since we propose a map adjustment step. In addition, the algorithm has the potential to be applied in different sensor network environments, such as radio frequency identification (RFID) sensors and infrared sensors, and the initial position of the camera is assumed to be unknown. The structure of the sensor-based SLAM for camera tracking is shown in Fig. 3.


Fig. 3  SLAM for camera tracking

3.1 System state and model

In this paper, it is assumed that the observation system is based on a sensor network which can successfully provide range information. Thus, we simply assume that several sensor transmitters are mounted in the surroundings and that the camera is equipped with a sensor receiver. In fact, the system only requires the distances between the camera and the feature points over time; hence, the sensor network could comprise active sensors or passive tags. Each feature of the map represents a node of a sensor transmitter, denoted f_n, where n is the index of the transmitter. The location state that represents the position of the camera is defined as s:

\[
f_n = \begin{pmatrix} x_f \\ y_f \end{pmatrix}, \qquad s = \begin{pmatrix} x_s \\ y_s \end{pmatrix}. \tag{2}
\]

Having defined the feature states and location states, the system state at time t is

\[
x_t = \begin{bmatrix} s_t \\ f_{1,t} \\ f_{2,t} \\ \vdots \\ f_{n,t} \end{bmatrix}. \tag{3}
\]

Given the above overview of the system state, the camera starts moving from an initial position s_0 without prior knowledge of the sensor nodes f_1, f_2, ..., f_n. As the camera keeps moving, it receives relative range data from the sensor transmitters. Using these sensor data, the SLAM algorithm tries to estimate the path s_{0:t} of the camera.

The observation model tells the probability of obtaining an observation at a certain camera location state. In the Bayesian filter it can be written as a probabilistic distribution Pr(d_t | s_t), where d_t and s_t are the sensor reading and the camera location state at time t, respectively. The observation model is given by the following equation:

\[
d_s = g(f_n, s) = \sqrt{(x_f - x_s)^2 + (y_f - y_s)^2} + w \tag{4}
\]

where (x_f, y_f) are the coordinates of feature n, (x_s, y_s) are the coordinates of the camera, d_s is the relative distance from the camera to feature n, and w is the Gaussian noise characterizing the errors of the sensors. At each time step, the sensor attached to the camera receives observation information from all features.

The motion model characterizes the camera location states over time. It helps to predict the next camera location state given the current one. When implementing the motion model, we have to consider the characteristics of the motion kinematics of the camera. We assume that the direction and speed of the target camera trajectory are random. Thus, we use a 2D Gaussian model to approximate the motion. More specifically, given the location state s_t at time step t, to predict the location state s_{t+1} at time t+1 we draw a number of particles randomly from a zero-mean 2D Gaussian distribution. These particles form a circle centered at s_t, whose radius is determined by the standard deviation of the 2D Gaussian distribution.
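A minimal sketch of the two system models is given below, assuming NumPy and illustrative noise parameters: observe() follows the range observation model of (4), and motion_sample() draws candidate next location states from a zero-mean 2D Gaussian centred on the current state, as described above.

```python
import numpy as np

rng = np.random.default_rng(1)

def observe(feature, location, sigma_w=0.1):
    """Range observation model of (4): true distance plus Gaussian noise w."""
    d = np.linalg.norm(np.asarray(feature) - np.asarray(location))
    return d + rng.normal(0.0, sigma_w)

def motion_sample(location, sigma_m=0.5, count=1):
    """Draw candidate next location states from a zero-mean 2D Gaussian
    centred on the current location; sigma_m plays the role of the circle
    radius mentioned in the text."""
    return np.asarray(location) + rng.normal(0.0, sigma_m, size=(count, 2))

if __name__ == "__main__":
    s_t = np.array([0.0, 0.0])             # current camera location state
    f_1 = np.array([3.0, 4.0])             # one sensor transmitter (feature)
    print("noisy range to f_1:", observe(f_1, s_t))    # close to 5
    print("proposed next states:\n", motion_sample(s_t, count=3))
```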

3.2 Particle filter algorithm

Based on the above framework, the data structure of M particles is illustrated in Fig. 4.

Fig. 4  Data structure of particles[19]

Each particle has 2(n + 1) states: 2 location states and 2n feature states. In mathematical form, each particle is

\[
x_t^m = \left[\, s_t^m, f_{1,t}^m, f_{2,t}^m, \cdots, f_{n,t}^m \,\right] = \left[\, (x,y)_t^m, (x,y)_{1,t}^m, (x,y)_{2,t}^m, \cdots, (x,y)_{n,t}^m \,\right] \tag{5}
\]

where the superscript m is the index of the particle, the subscript t indicates the time step, s_t^m is the location of the camera, and f_{n,t}^m represents feature n. The particle filter algorithm then operates on a set of particles x_t^m. Each iteration of the algorithm can be divided into the following


stages: 1) initialization; 2) application of the motion model, application of the observation model, and weighting of all the particles; 3) map adjustment; 4) resampling.

3.2.1 Initialization

Initialization is the most important stage in all SLAM algorithms. In extended Kalman filter (EKF)-based SLAM, its task is to initialize the mean and covariance matrix of the state vector, while in this particle filter-based SLAM it is to initialize the location state and feature states in each particle. The initialization process can be quite tricky when a single measurement is not enough to constrain a feature's location in all dimensions[20]. This problem brings great ambiguity about the feature states at the beginning of the algorithm. For instance, some approaches use multiple measurements to help constrain features during initialization[21, 22]. In this research work, an approach is employed to reduce ambiguities by using the first two measurements to obtain a rough idea of where the next location state should be, i.e., in which quadrant the state lies. Then, a random point is chosen in that quadrant to be the next location state (see Fig. 5). This approach is similar to the "delayed decision making" of Leonard and Rikoski[23], in which a number of previous location states are used to help initialize new features.

Fig. 5  The initialization process

In Fig. 5, at the beginning (time step 1), the distance measurement of feature A (here, 20) is received. Hence, we put the feature states of all particles on a circle of radius 20 (the grey points) to approximate feature A, and place the location state of the particles at the origin as the initial location state (the grey triangle). At time step 2, we choose a random point to be the next location state (the black triangle) and estimate feature A based on this point (the black points). The ambiguity about feature A is reduced from a circle to a few points.
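The initialization strategy can be sketched as follows; the particle layout (location state followed by feature states in one NumPy row) and the particle count are assumptions of this illustration rather than the authors' data structures.

```python
import numpy as np

rng = np.random.default_rng(2)

def init_particles(first_range, m=200):
    """Place each particle's estimate of feature A on a circle of radius
    `first_range` around the origin, and its location state at the origin."""
    angles = rng.uniform(0.0, 2.0 * np.pi, size=m)
    feat_a = first_range * np.column_stack([np.cos(angles), np.sin(angles)])
    loc = np.zeros((m, 2))                  # initial location state s_0
    return np.hstack([loc, feat_a])         # each row: [s, f_A], 2(n+1) values with n = 1

particles = init_particles(first_range=20.0)
print(particles.shape)                                   # (200, 4)
print(np.linalg.norm(particles[:, 2:], axis=1)[:3])      # all close to 20
```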

3.2.2 Weighting

After the initialization, the motion model is applied to all particles. More specifically, the location state of each particle is replaced with a new one generated from the motion model, while the feature states of each particle remain unchanged. Fig. 6 shows an example of one particle after the motion model has been applied.

Fig. 6  The weighting process

In Fig. 6, before applying the motion model, the particle has an estimate of the camera location state at (x_s, y_s) and an estimate of feature 1 at (x_{f1}, y_{f1}). After applying the motion model, the location state is replaced with (x_s', y_s') while the estimate of feature 1 remains unchanged.

Applying only the motion model to all particles does not represent the true posterior of the path and features, since it does not incorporate the observation. Therefore, the weighting process is required, which gives each particle a weight reflecting the observation. Before describing how to implement the weighting process, we need to define some terms. At time step t, before receiving the observation, each particle has its estimates of the location state and feature states. We define the "predicted location state" as the location state after the motion model has been applied, and the "predicted observation" as the distance measurement from the predicted location state to a feature. In Fig. 6, (x_s', y_s') is the predicted location state, and d' is the predicted observation. The weight of each particle is then determined by the difference between the predicted observation and the real observation. If the predicted location state (x_s', y_s') and feature state (x_{f1}, y_{f1}) are very close to the real states, the predicted observation d' will be very close to the real observation, and the particle will receive a high weight. In probabilistic form, the weight of each particle is given by

\[
w^m = \int \Pr(d_t \mid f_n, s_t^m)\, \Pr(f_n \mid s_{0:t-1}, d_{0:t-1})\, \mathrm{d}f_n \tag{6}
\]

where the superscript m is the index of the particle, the subscript t is the time step, f_n is feature n, and d_t is the observation. Equation (7) is implemented to evaluate the real observation d_t under a Gaussian with mean d_t' and standard deviation σ determined by the observation noise. More specifically, the weight of each particle is calculated using

\[
w = \prod_{\text{all features}} \int (2\pi\sigma^2)^{-1/2} \exp\!\left( -\frac{(d-d')^2}{2\sigma^2} \right) \mathrm{d}d. \tag{7}
\]
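In code, the weighting of a single particle in the spirit of (7) might look like the sketch below, where the particle's predicted ranges are scored under a Gaussian centred on the real measurements; the array layout and noise level are illustrative assumptions. In a full filter, the resulting weights would then be normalized over all particles.

```python
import numpy as np

def weight_particle(location, features, real_ranges, sigma=0.3):
    """Gaussian likelihood of the real range measurements given one particle.

    location:    (2,)   predicted location state of the particle
    features:    (n, 2) the particle's feature estimates
    real_ranges: (n,)   measured distances to the n features
    """
    predicted = np.linalg.norm(features - location, axis=1)     # d'
    residual = np.asarray(real_ranges) - predicted              # d - d'
    # product over all features of a Gaussian in (d - d'), as in (7)
    return np.prod(np.exp(-residual ** 2 / (2.0 * sigma ** 2))
                   / np.sqrt(2.0 * np.pi * sigma ** 2))

w = weight_particle(np.array([1.0, 0.0]),
                    np.array([[4.0, 4.0], [-3.0, 2.0]]),
                    real_ranges=[5.0, 4.5])
print(w)
```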


3.2.3 Map adjustment

The "map adjustment" is a novel technique developed in this paper. Its inspiration comes from the "landmark update" in FastSLAM[24], where the estimates of the landmarks (features) are updated using an EKF. The EKF approach is not suitable for this SLAM problem due to the non-linear and non-invertible observation model. The basic idea of map adjustment is as follows: for each particle, after applying the motion model and weighting, when the observation is received, each feature's state is adjusted so that the difference between the predicted observation and the real observation becomes smaller. Fig. 7 shows the map adjustment for one particle. At the beginning, a distance measurement of feature A is received, so we put its estimate (the grey circle) on a circle with radius equal to the distance r. Then, the motion model is applied, which moves the location state from (x_s, y_s) to (x_s', y_s') (the grey triangle). If the black circle is the real location of feature A, a new observation d will be received. We then compare the real observation d with the predicted observation d'. In this example, d is smaller than d', so the estimate of feature A is moved to the dashed circle; by doing so, the estimate of feature A moves closer to the real one. How far the grey circle should be moved depends on the difference between d and d', and on the radius r. In this implementation, we use the following equation to calculate the movement:

\[
\text{movement} = p \cdot \frac{d - d'}{r} \tag{8}
\]

where p is a parameter which must be specified manually based on experiments. By using the map adjustment, the accuracy of the feature estimates can be greatly improved, or the same accuracy can be maintained with fewer particles.
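A sketch of the map adjustment of (8) for one feature of one particle is shown below. The interpretation of "movement" as a signed step along the camera-to-feature direction, and the value of p, are assumptions of this illustration.

```python
import numpy as np

def adjust_feature(location, feature, real_range, p=0.5):
    """Shift one particle's feature estimate by movement = p * (d - d') / r,
    along the line from the predicted location state to the feature estimate."""
    direction = feature - location
    r = np.linalg.norm(direction)        # radius r: distance to the current estimate
    if r == 0.0:
        return feature                   # degenerate case: leave the estimate alone
    predicted_range = r                  # d' (range-only sensor, same geometry)
    movement = p * (real_range - predicted_range) / r    # signed step size from (8)
    return feature + movement * (direction / r)          # move along the radial direction

feat = np.array([0.0, 10.0])             # particle's current estimate of feature A
loc = np.array([0.0, 0.0])               # predicted location state
print(adjust_feature(loc, feat, real_range=8.0))   # pulled slightly towards the camera
```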

Fig. 7  Illustration of the map adjustment

3.2.4 Resampling

Resampling is the last step in each iteration. This step is very much the same as in particle filter localization. In this process, particles with large weights are duplicated while particles with small weights are deleted (see Fig. 8). In Fig. 8, before the resampling process, the number of particles is 10. After resampling, some particles with small weights are deleted, which indicates a low probability at those positions, and the particles with large weights are duplicated, which indicates a high probability at those positions. However, the number of particles is still 10, and the sum of the weights of all particles remains unchanged, summing up to 1.

Fig. 8  The resampling process

3.3 Algorithm summary

The summary of the whole sensor-based SLAM program is shown in Fig. 9.
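Putting the four stages together, one iteration of the algorithm summarized in Fig. 9 could be organized as in the sketch below. This is a compact, self-contained illustration rather than the authors' implementation: the particle layout, the noise parameters, and the toy simulation at the end are assumptions, with the weighting and map adjustment following (7) and (8) in simplified form.

```python
import numpy as np

rng = np.random.default_rng(3)
SIGMA_MOTION, SIGMA_OBS, P_ADJ = 0.4, 0.3, 0.5

def slam_step(locs, feats, ranges):
    """One iteration: motion model, weighting (7), map adjustment (8), resampling.

    locs:   (M, 2)    location state of each particle
    feats:  (M, n, 2) feature estimates of each particle
    ranges: (n,)      real range measurements at this time step
    """
    m = locs.shape[0]
    # 1) motion model: zero-mean 2D Gaussian around each location state
    locs = locs + rng.normal(0.0, SIGMA_MOTION, size=locs.shape)
    # 2) weighting: Gaussian in (d - d'), taken as a product over all features
    predicted = np.linalg.norm(feats - locs[:, None, :], axis=2)   # d' per particle/feature
    residual = ranges[None, :] - predicted
    weights = np.prod(np.exp(-residual ** 2 / (2 * SIGMA_OBS ** 2)), axis=1)
    weights /= weights.sum()
    # 3) map adjustment: movement = p * (d - d') / r along the radial direction
    radial = feats - locs[:, None, :]
    r = np.linalg.norm(radial, axis=2, keepdims=True)
    movement = P_ADJ * residual[..., None] / np.maximum(r, 1e-9)
    feats = feats + movement * radial / np.maximum(r, 1e-9)
    # 4) resampling: duplicate heavy particles, drop light ones
    idx = rng.choice(m, size=m, p=weights)
    return locs[idx], feats[idx]

# toy run: 2 features, camera drifting along +x (all values are made up)
true_feats = np.array([[6.0, 2.0], [-4.0, 5.0]])
locs = np.zeros((300, 2))
feats = np.tile(true_feats + rng.normal(0, 2.0, true_feats.shape), (300, 1, 1))
for t in range(1, 21):
    cam = np.array([0.2 * t, 0.0])
    ranges = np.linalg.norm(true_feats - cam, axis=1) + rng.normal(0, SIGMA_OBS, 2)
    locs, feats = slam_step(locs, feats, ranges)
print("estimated camera:", locs.mean(axis=0), " true:", cam)
```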

Fig. 9  The sensor-based SLAM algorithm

4 Experimental evaluations

A number of simulations have been carried out using different datasets, different numbers of particles, and various other settings. The goal of these experiments is to evaluate the accuracy, robustness, and efficiency of this sensor-based SLAM camera tracking solution, and to investigate whether the algorithm can successfully perform 2D range-based position estimation and tracking of the camera. The distance data are simulated by computer.

4.1 Regular dataset

In the first dataset, the real path of the camera is simulated deliberately to avoid ambiguities and to have some variety in all directions. We can see from the experimental result that at the beginning the path estimation is not correct, and neither are the feature estimations. This is because there is a lot of ambiguity about each feature. For instance, at time step 3, there are several estimates of feature C, which are distributed quite dispersedly (the grey circles). These ambiguities cause the path estimation to be "twisted" (the solid line). However, as the camera keeps moving, by time step 60 both the feature estimates and the path estimate converge to the real ones. Fig. 10 illustrates the errors of the path estimation from time step 0 to 60. At time step 0, since we assume that the estimated location and the real location are both at the origin, there are only small errors at the beginning. As the algorithm keeps iterating, there is a significant error at about time step 20. Compared with the real path in Fig. 11, we may conclude that this is because the camera changes its direction at that time. Nevertheless, as the camera keeps moving, the errors get smaller and smaller and finally converge. Fig. 11 shows that, compared with the real feature states, the errors are very small.

Fig. 10  Errors of the location estimation over time

Fig. 11  Regular dataset

4.2 Long dataset

In this experiment, a dataset simulated over a longer run (120 time steps) is used to test the stability of the algorithm. The path estimation result is shown in Fig. 12, and Fig. 13 illustrates the estimation error on the path over time. In Fig. 12, the ranges of the X and Y axes are (-20, 10) and (-10, 20), respectively; thus the maximum X and Y errors, expressed as a fraction of these ranges, are about 1/30.

Fig. 12  Path estimation with long time steps

Fig. 13  Error on path estimation over time

4.3 Map adjustment improvement

The map adjustment technique proposed in this paper can help to improve the accuracy; it can also maintain the same accuracy with fewer particles. Fig. 14 illustrates a comparison of two experiments using 200 particles. Fig. 14 (a) shows the error of the path estimation over time without the map adjustment, while Fig. 14 (b) shows it with the map adjustment. Clearly, after applying the map adjustment, the estimated path converges more quickly to the real path. The reason is that the map adjustment constrains the predicted particle estimates and shortens the time for the camera tracking to move from an unstable state to a stable state.

Table 2  Performance with different numbers of particles

Number of particles | Steps | Run time (s) | Average effective number of particles | Average errors (unit, X & Y)
100                 | 120   | 4.2          | 12.54                                 | 1.5105 & 2.9111
200                 | 120   | 8.1          | 24.46                                 | 0.212 & 0.133
300                 | 120   | 11.1         | 36.83                                 | 0.170 & 0.126
400                 | 120   | 15.6         | 49.23                                 | 0.095 & 0.189
500                 | 120   | 18.8         | 61.34                                 | 0.332 & 0.298
600                 | 120   | 22.1         | 74.01                                 | 0.085 & 0.133
700                 | 120   | 28.6         | 85.48                                 | 0.337 & 0.196
800                 | 120   | 31.1         | 98.78                                 | 0.155 & 0.176
900                 | 120   | 33.3         | 110.67                                | 0.31294 & 0.228
1000                | 120   | 38.1         | 121.47                                | 0.146 & 0.171

Fig. 14  How the map adjustment improves the accuracy ((a) algorithm without map adjustment; (b) algorithm with map adjustment)

4.4 Performance with different numbers of particles

The number of particles used in this algorithm significantly affects its performance. Three experiments are carried out using 100, 200, and 800 particles, respectively; Figs. 15-17 show their performance. In the following figures, the upper plots are the path figures and the lower plots are the error figures; below the figures are the estimates of each feature at the last time step and the time (in seconds) that the algorithm takes to calculate these results. Based on these experiments, we can see that the time taken by the algorithm is linear in the number of particles, and also linear in the number of features in the map. Table 2 shows the performance of this algorithm with different numbers of particles (CPU 2.4 GHz, RAM 1 GB). The algorithm runs in a common experimental environment on Windows XP with Visual C++.

Fig. 15  Performance of particles 100 ((a) path estimation with long time steps; (b) error on path estimation over time)

In the 100 particle experiment shown in Fig. 15, this algorithm fails to estimate the path, and also the features. It takes 2.4 s to finish 60 time steps.


Fig. 16  Performance of particles 200 ((a) path estimation with long time steps; (b) error on path estimation over time)

In Fig. 16, by increasing the number of particles to 200, the algorithm successfully calculates all the estimations. Within 3.5 s, the errors are acceptably small.

Fig. 17  Performance of particles 800 ((a) path estimation with long time steps; (b) error on path estimation over time)

In Fig. 17, when using 800 particles, we get a very accurate estimation of both the path and the features, but the algorithm takes a significantly longer time to finish the 60 time steps.

In order to evaluate the algorithm for camera tracking, we use Maya to simulate a virtual world and a virtual camera. Based on the time steps, we produce a short movie for 2D camera trajectory estimation. Fig. 18 shows frames from the original and the estimated camera movement, respectively. The initial camera position is (0, 0, 4), and the Z coordinate is kept constant. The distance of the blue screen from the camera is 20, and the distance of the character from the camera is 15. The X, Y error of the camera has little influence in the 3D environment. The results in Fig. 18 show that the algorithm is effective for 2D camera tracking in the virtual studio over 50 time steps.

Fig. 18  Original and estimation camera view positions ((a) original camera view, position A; (b) estimated camera view, position B)

5 Conclusions and future work

Virtual studios have long been used in commercial broadcasting and are starting to evolve to the next level with improved technology in image processing and computer graphics. In this paper, a sensor-based SLAM algorithm has been designed and implemented to solve the problem of camera tracking in a virtual studio environment. The simulation results show that the algorithm can achieve the research aim successfully. Unlike past works, whose approaches are mostly based on the EKF and whose system models are linear or easy to linearize, the particle filter can cope with non-linear system models and therefore greatly improves the robustness of this SLAM solution, at the cost of higher computational complexity. The main difficulties in this project are the non-linear observation model, the motion model without direction information, and the complexity of implementing a particle filter. A novel approach called map adjustment has been proposed to improve the accuracy and efficiency. In general, the achievements and limitations of this work can be summarized as follows:

1) The particle filter approach successfully improves efficiency by factoring the high-dimensional SLAM problem into a product of several low-dimensional estimation problems; thus, the high-dimensional SLAM problem can be solved using a particle filter. Furthermore, the experiments show that the particle filter is capable of dealing with noisy observations.

2) The non-linearity of the motion model and observation model is represented correctly with the particle filter. Several techniques have proved able to reduce the ambiguities at the initialization step.

3) The proposed map adjustment technique can significantly improve the accuracy and efficiency, or reduce the number of particles required to obtain satisfactory results. However, it is designed specifically for this SLAM problem and is therefore not applicable to all SLAM problems. Also, it has a parameter that must be set carefully through experiments.

4) The proposed sensor-based SLAM algorithm can solve the 2D camera tracking problem in a virtual studio, and its accuracy can satisfy the general requirements of virtual studios.

This research work has achieved a reasonable degree of success. The strengths and limitations that arise throughout the progress of this work lead to the following issues that warrant future research:

i) The current implementation of the motion model only makes use of the previous location state and ignores earlier states. Possible improvements can be made by considering the historical location states so that some directional information can be obtained.

ii) It would be interesting to extend this algorithm to solve 3D range camera tracking in virtual studios. The estimation of the orientation of the camera will be considered in further research.

iii) This work has been tested under simulation. It will be integrated with various sensors, such as RFID devices. Thus, the deployment of the algorithm in practical applications is attractive for future work.

References

[1] K. Cornelis, M. Pollefeys, L. V. Gool. Tracking Based Structure and Motion Recovery for Augmented Video Productions. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology, ACM, Alberta, Canada, pp. 17–24, 2001.
[2] M. I. Lourakis, A. A. Argyros. Efficient, Causal Camera Tracking in Unprepared Environments. Computer Vision and Image Understanding, vol. 99, no. 2, pp. 259–290, 2005.
[3] M. A. Livingston, A. State. Magnetic Tracker Calibration for Improved Augmented Reality Registration. Teleoperators and Virtual Environments, vol. 6, no. 5, pp. 532–546, 1997.
[4] F. Madritsch. Optical Beacon Tracking for Human-computer Interfaces, Ph. D. dissertation, Technical University Graz, Austria, 1996.
[5] R. Want, A. Hopper, V. Falcao, I. Gibbons. The Active Badge Location System. ACM Transactions on Information Systems, vol. 10, no. 1, pp. 91–102, 1992.
[6] C. Randell, H. Muller. Low Cost Indoor Positioning System. In Ubicomp 2001: Ubiquitous Computing, G. D. Abowd (ed.), Springer-Verlag, Berlin, Germany, pp. 42–48, 2001.
[7] S. Gibbs, C. Arapis, C. Breiteneder, V. Lalioti, S. Mostafaway, J. Speier. Virtual Studios: An Overview. IEEE Multimedia, vol. 5, no. 1, pp. 18–35, 1998.
[8] A. Wojdala. Challenges of Virtual Set Technology. IEEE Multimedia, vol. 5, no. 1, pp. 50–57, 1998.
[9] Y. Yamanouchi, H. Mitsumine, T. Fukaya, M. Kawakita, N. Yagi, S. Inoue. Real Space-based Virtual Studio – Seamless Synthesis of a Real Set Image with a Virtual Set Image. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology, ACM, Hong Kong, PRC, pp. 194–200, 2002.
[10] D. Scharstein, R. Szeliski. A Taxonomy and Evaluation of Dense Two-frame Stereo Correspondence Algorithms. International Journal of Computer Vision, vol. 47, no. 1-3, pp. 7–42, 2002.
[11] R. Yang, M. Pollefeys, H. Yang, G. Welch. A Unified Approach to Real-time, Multi-resolution, Multi-baseline 2D View Synthesis and 3D Depth Estimation Using Commodity Graphics Hardware. International Journal of Image and Graphics, vol. 4, no. 4, pp. 627–651, 2004.
[12] A. Ward, A. Jones, A. Hopper. A New Location Technique for the Active Office. IEEE Personal Communications, vol. 4, no. 5, pp. 42–47, 1997.
[13] R. A. Brooks. A Robot That Walks: Emergent Behavior from a Carefully Evolved Network. Neural Computation, vol. 1, no. 2, pp. 253–262, 1989.


[14] S. Thrun, D. Fox, W. Burgard, F. Dellaert. Robust Monte Carlo Localization for Mobile Robots. Artificial Intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.
[15] M. Pupilli, A. Calway. Real-time Camera Tracking Using a Particle Filter. In Proceedings of the British Machine Vision Conference, Oxford, UK, pp. 519–528, 2005.
[16] A. J. Davison. Real-time Simultaneous Localisation and Mapping with a Single Camera. In Proceedings of the IEEE International Conference on Computer Vision, IEEE Press, Washington D.C., USA, vol. 2, pp. 1403–1410, 2003.
[17] A. J. Davison, D. W. Murray. Simultaneous Localization and Map-building Using Active Vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 865–880, 2002.

[18] M. H. Degroot, M. J. Schervish. Probability and Statistics, 3rd ed., Addison-Wesley, USA, 2002.
[19] A. Doucet, D. Freitas, K. Murphy, S. Russell. Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, Stanford, USA, pp. 176–183, 2000.
[20] M. C. Deans, M. Hebert. Experimental Comparison of Techniques for Localization and Mapping Using a Bearing-only Sensor. Lecture Notes in Control and Information Sciences, Experimental Robotics VII, Springer-Verlag, London, UK, vol. 271, pp. 395–404, 2002.
[21] S. B. Williams, G. Dissanayake, H. Durrant-Whyte. Constrained Initialisation of the Simultaneous Localization and Mapping Algorithm. In Proceedings of the International Conference on Field and Service Robotics, Helsinki, Finland, pp. 315–330, 2001.
[22] J. R. Spletzer. A New Approach to Range-only SLAM for Wireless Sensor Networks, Technical Report, Lehigh University, Bethlehem, PA, USA, 2003.
[23] J. J. Leonard, R. J. Rikoski. Incorporation of Delayed Decision Making into Stochastic Mapping. Lecture Notes in Control and Information Sciences, Springer-Verlag, Berlin, Germany, vol. 271, pp. 533–542, 2000.
[24] J. M. Buhmann, W. Burgard, A. B. Cremers, D. Fox, T. Hofmann, F. E. Schneider, J. Strikos, S. Thrun. The Mobile Robot Rhino. AI Magazine, vol. 16, no. 2, pp. 31–38, 1995.

Po Yang received the B. Sc. degree in computer science from Wuhan University, China, in 2004, and the M. Sc. degree in computer science from Bristol University, UK, in 2006. Currently, he is a Ph. D. candidate at the Faculty of Computing, Engineering and Technology, Staffordshire University, UK. He has published about 4 refereed conference papers. His research interests include image processing, computer vision, virtual reality, RFID, and sensor networking.

Wenyan Wu received the B. Sc. and M. Sc. degrees from Dalian University of Technology, China, in 1988 and 1991, respectively. She received her Ph. D. degree in modelling and optimization from Harbin Institute of Technology, China, in 1999, and her Ph. D. degree in virtual reality from the University of Derby, UK, in 2002. She has taught and conducted research at Harbin Institute of Technology, China, and De Montfort University, UK. She is currently a senior lecturer in simulation and virtual reality at Staffordshire University, UK. Her research interests include modelling and simulation, virtual reality and augmented reality systems, advanced interfaces, digital media, and distribution systems.

Mansour Moniri received the Ph. D. and B. Sc. degrees from the Department of Electronics and Computer Science, Swansea University, UK, in 1993 and 1987, respectively. From 1987 to 1989, he worked with INMOS UK on developing transputer-based testing systems for data converters. He was awarded the R. O. Dunmore prize in electronic engineering in 1985. He was the director of the Technology Research Institute until 2003, and he is currently faculty head of research at Staffordshire University, UK. He established a research and enterprise centre in this area through government funding and provides consultancy to national and international companies. His research interests include the algorithmic development and implementation of signal, image, and video processing systems.

Claude C. Chibelushi received the B. Eng. degree in electronics and telecommunications from the University of Zambia, Lusaka, Zambia, in 1987, the M. Sc. degree in microelectronics and computer engineering from the University of Surrey, Surrey, UK, in 1989, and the Ph. D. degree in electronic engineering from the University of Wales Swansea, Swansea, UK, in 1997. In 1997, he joined the Faculty of Computing, Engineering and Technology at Staffordshire University, Stafford, UK, where he is currently a reader in digital media processing. Before joining Staffordshire University, he was a senior research assistant at the University of Wales Swansea from 1995 to 1996. He also worked as a lecturer in the Department of Electrical and Electronic Engineering at the University of Zambia from 1989 to 1991. He was a Beit Fellow from 1991 to 1995. He is currently a member of the Institution of Engineering and Technology. His research interests include multimodal recognition, robust pattern recognition, medical image analysis, and image synthesis and animation.
