Visual Inertial Navigation and Calibration

Linköping studies in science and technology. Thesis. No. 1500

Visual Inertial Navigation and Calibration

Martin A. Skoglund


Division of Automatic Control Department of Electrical Engineering Linköping University, SE-581 83 Linköping, Sweden http://www.control.isy.liu.se [email protected] Linköping 2011

This is a Swedish Licentiate’s Thesis. Swedish postgraduate education leads to a Doctor’s degree and/or a Licentiate’s degree. A Doctor’s Degree comprises 240 ECTS credits (4 years of full-time studies). A Licentiate’s degree comprises 120 ECTS credits, of which at least 60 ECTS credits constitute a Licentiate’s thesis.

Linköping studies in science and technology. Thesis. No. 1500 Visual Inertial Navigation and Calibration

Martin A. Skoglund [email protected] www.control.isy.liu.se Department of Electrical Engineering Linköping University SE-581 83 Linköping Sweden ISBN 978-91-7393-126-7

ISSN 0280-7971

LiU-TEK-LIC-2011:39

Copyright © 2011 Martin A. Skoglund Printed by LiU-Tryck, Linköping, Sweden 2011

To the memory of my mother

Abstract Processing and interpretation of visual content is essential to many systems and applications. This requires knowledge of how the content is sensed and also what is sensed. Such knowledge is captured in models which, depending on the application, can be very advanced or simple. An application example is scene reconstruction using a camera; if a suitable model of the camera is known, then a model of the scene can be estimated from images acquired at different, unknown, locations; the quality of the scene model, however, depends on the quality of the camera model. The opposite is to estimate the camera model and the unknown locations using a known scene model. In this work, two such problems are treated in two rather different applications. There is an increasing need for navigation solutions less dependent on external navigation systems such as the Global Positioning System (GPS). Simultaneous Localisation and Mapping (slam) provides a solution to this by estimating both navigation states and some properties of the environment without considering any external navigation systems. The first problem considers visual inertial navigation and mapping using a monocular camera and inertial measurements, which is a slam problem. Our aim is to provide improved estimates of the navigation states and a landmark map, given a slam solution. To do this, the measurements are fused in an Extended Kalman Filter (ekf) and then the filtered estimates are used as a starting solution in a nonlinear least-squares problem which is solved using the Gauss-Newton method. This approach is evaluated on experimental data with accurate ground truth for reference. In Augmented Reality (ar), additional information is superimposed onto the surrounding environment in real time to reinforce our impressions. For this to be a pleasant experience it is necessary to have good models of the ar system and the environment. The second problem considers calibration of an Optical See-Through Head Mounted Display system (osthmd), which is a wearable ar system. We show and motivate how the pinhole camera model can be used to represent the osthmd and the user's eye position. The pinhole camera model is estimated using the Direct Linear Transformation algorithm. Results are evaluated in experiments which also compare different data acquisition methods.


Populärvetenskaplig sammanfattning (Popular Science Summary) Processing and interpretation of visual content is crucial to many systems and applications. This requires knowledge of how the content is measured and also of what is measured. Such knowledge can be captured in models which, depending on the application, can be very advanced or simple. An application example is scene reconstruction with a camera; if a suitable model of the camera is known, a model of the scene can be estimated from images taken at different, unknown, locations. The quality of the scene model, however, depends on the quality of the camera model. The opposite is to estimate the camera model and the unknown locations using a known scene model. In this work, two such problems are treated in two different applications. There is an increasing need for navigation solutions that are less dependent on external navigation systems such as the Global Positioning System (GPS). Simultaneous Localisation and Mapping (slam) offers a solution by estimating both the navigation states and certain properties of the surrounding environment without any external navigation system. The first problem concerns visual inertial navigation and mapping with a monocular camera and inertial sensors. Our aim is to provide improved estimates of the navigation states and a landmark map, given a solution from slam. To do this, the measurements are fused in an Extended Kalman Filter (ekf) and the filtered estimates are then used as an initial solution to a nonlinear least-squares problem, which in turn is solved with the Gauss-Newton method. This strategy is evaluated in experiments where a reliable reference is known. In Augmented Reality (AR) applications, information is overlaid on the surrounding environment in real time to reinforce our impressions. For this to be a pleasant experience it is necessary to have good models of the AR system and the environment. The second problem concerns calibration of an Optical See-Through Head Mounted Display system (OSTHMD), which is a wearable AR system. We show and motivate why the pinhole camera model can be used to represent the OSTHMD and the user's eye position. The pinhole camera model is estimated using the so-called Direct Linear Transformation algorithm. The results are evaluated in experiments which also compare different data acquisition methods.


Acknowledgments First of all I would like to thank my supervisor Prof. Fredrik Gustafsson for your 24/7 inspiring expert guidance, how do you manage? My co-supervisor Dr Thomas Schön, I admire your sound approach to research and your never ceasing positiveness. Dr David Törnqvist, thanks for sharing so much of your knowledge and time, and for finding the bargains that no one else can. Thanks Prof. Lennart Ljung and Fredrik for inviting me to join the Automatic Control group, it's been a great ride and I hope it will continue this way. Prof. Svante Gunnarsson heads the group, and Ninna Stensgård makes this (and many other things) a lot easier and for this I am truly grateful, thanks! The former head Lennart, well supported by Ulla Salaneck and later Åsa Karmelind, did a terrific job and created a pleasant environment! There are many people at Automatic Control I would like to thank but I only have time to mention a few. In no particular order: Prof. Mikael Norrlöf, sorry for killing Marvin, thanks for not killing me! Lic. Zoran Sjanic (The Zoran), it's always interesting and fun to collaborate with you and I really like your weird sense of "humour". Magnus Axholt, thanks for showing me the strange world of Augmented Reality, also thanks for introducing me to all the nice people at ITN. Lic. Jeroen Hol, thanks for sharing your expertise, I like the interesting discussions we have when you visit Linköping. Dr Ragnar Wallin, this is for you: P ? . Special thanks go to Lic. Jonas Callmer, Lic. Karl Granström and Lic. Christian Lundquist for the TAFWJP we spent together. I also want to thank the super hacker geniuses Dr Henrik Tidefelt and Dr Gustaf Hendeby for all the excellent LaTeX advice and Shapes support. To all the people I did not mention; remember, to me you are all equally unimporta..., eeeeh sorry, equally important! I would also like to thank the LINK-SIC Industry Excellence Centre for the financial support, thanks! There are many people outside the Automatic Control group who mean a lot to me, but I will only mention a few since LiU-Tryck wants me to finish this now. My second family in Falkenberg; Lena, Kurt, Pär, Lotta, Anna, Bertil, Amanda and all the others, your hospitality is incredible and I truly enjoy when we meet! My brother Johan; I'm amazed how much fun we have together, let it stay this way! My father Bernt; our family has had some really tough years and you have always taken the biggest load, I cannot find words to thank you for it! Finally; Maria, time flies when we are together and I enjoy every second of it! Thanks for being supportive, loving, caring and being you! Solebo, WGS 84 − decimal: 58.329808 15.734846, May 2011 Martin Skoglund


Contents

Notation xv

I Background

1 Introduction 3
  1.1 Problem Formulation 3
    1.1.1 Visual Inertial Navigation and Mapping 3
    1.1.2 Optical See-Through Head Mounted Display Calibration 4
  1.2 Publications 4
  1.3 Contributions 5
  1.4 Thesis outline 6

2 Models 7
  2.1 Kinematics 7
    2.1.1 Notation 8
    2.1.2 Translational Kinematics 8
    2.1.3 Rotational Kinematics 8
    2.1.4 Quaternions 9
  2.2 Dynamics 10
    2.2.1 Translational Dynamics 10
    2.2.2 Rotational Dynamics 10
    2.2.3 Dynamic Models 11
  2.3 Inertial Measurements 11
    2.3.1 Gyroscopes 11
    2.3.2 Accelerometers 11
    2.3.3 Discrete-time Dynamic Models 12
  2.4 The Pinhole Camera Model 13
    2.4.1 Camera Coordinates 13
    2.4.2 World to Pixel Coordinates 16

3 Estimation 17
  3.1 Sensor Fusion 17
  3.2 Calibration 18
  3.3 Nonlinear Estimation 19
    3.3.1 Extended Kalman Filter 19
    3.3.2 Nonlinear Least-Squares 19

4 Visual Inertial Navigation and Mapping 21
  4.1 Visual and Inertial ekf-slam 21
    4.1.1 Prediction 22
    4.1.2 Measurement Update 22
  4.2 Nonlinear Least-Squares and slam 23
    4.2.1 Remarks 23

5 Optical See-Through Head Mounted Display Calibration 25
  5.1 System Overview 25
    5.1.1 Head Mounted Display 25
    5.1.2 Tracker 26
  5.2 Calibration 26
    5.2.1 Tracker Measurements 28
    5.2.2 Direct Linear Transformation 29
    5.2.3 Decomposition 30
    5.2.4 Remarks 30

6 Concluding Remarks 33
  6.1 Conclusions 33
  6.2 Future Work 33

Bibliography 35

II Publications

A A Nonlinear Least-Squares Approach to the SLAM Problem 43
  1 Introduction 45
  2 Problem Formulation 46
  3 Models 47
    3.1 Dynamics 48
    3.2 Landmark State Parametrisation 48
    3.3 Camera Measurements 49
  4 Solution 50
    4.1 Initialisation 50
    4.2 Nonlinear Least-Squares Smoothing 50
  5 Experiments 54
    5.1 Experimental Setup 54
    5.2 Results 55
  6 Conclusions and Future Work 56
  Bibliography 58

B Parameter Estimation Variance of the Single Point Active Alignment Method in Optical See-Through Head Mounted Display Calibration 61
  1 Introduction 63
  2 Background and Related Work 65
    2.1 Structure of Transformations 65
    2.2 Camera Resectioning Method 65
    2.3 Sensitivity of the DLT Algorithm 66
    2.4 Previous SPAAM Evaluations 67
    2.5 Other OSTHMD Calibration Methods 67
  3 Experiment 68
    3.1 Hardware Setup 68
    3.2 Software Setup 69
    3.3 Participants 70
    3.4 Procedure 70
    3.5 Task 70
    3.6 Independent Variables 71
    3.7 Dependent Variables 71
  4 Experiment Results 73
    4.1 Eye Point Estimation 73
    4.2 Parameter Variance 73
    4.3 Condition Number 74
  5 Simulation 74
    5.1 Independent Variables 74
    5.2 Dependent Variables 75
  6 Simulation Results 75
    6.1 Condition Number 75
    6.2 Parameter Variance 75
  7 Conclusions 76
  8 Discussion and Future Work 76
  Bibliography 81

Notation

Abbreviations

Abbreviation   Meaning
ar             Augmented Reality
dof            Degrees of Freedom
ekf            Extended Kalman Filter
hmd            Head Mounted Display
imu            Inertial Measurement Unit
kf             Kalman Filter
mems           Microelectromechanical Systems
ost            Optical See-Through
osthmd         Optical See-Through Head Mounted Display
pose           Position and Orientation
sift           Scale-Invariant Feature Transform
slam           Simultaneous Localisation and Mapping
vr             Virtual Reality


Part I

Background

1 Introduction

Processing and interpretation of visual content is essential to many systems and applications. This requires knowledge of how the content is sensed and also what is sensed. Such knowledge is captured in models which, depending on the application, can be very advanced or simple. An application example is scene reconstruction using a camera; if a suitable model of the camera is known, then a model of the scene can be estimated from images acquired at different unknown locations, yet, the quality of the scene model depends on the quality of the camera model. The opposite is to estimate the camera model and the unknown locations using a known scene model. The aim of this work is to approach two such problems in two rather different applications.

1.1 Problem Formulation

This thesis considers two problems which are seemingly unrelated but actually have many things in common. The first problem is visual inertial navigation and mapping using a camera and inertial measurements. The second problem is calibration of an Optical See-Through Head Mounted Display system (osthmd) modelled as a pinhole camera.

1.1.1 Visual Inertial Navigation and Mapping

Navigation is the problem of estimating the position, attitude and motion of an object with respect to an inertial frame. The problem is fundamental to many applications and reliable solutions are necessary. In aircraft applications the navigation is often provided using inertial sensors and a compass in combination with a Global Navigation Satellite System (gnss), such as the Global Positioning System (gps). In some environments, gnss is unavailable, e.g., underwater, or unreliable due to multipath effects, (un)intentional jamming, among others. The mapping problem, on the other hand, seeks to describe properties of the environment, given data. The purpose of the Simultaneous Localisation and Mapping (slam) problem is to estimate both properties of the environment and the navigation. In this thesis, we consider the slam problem using inertial sensors and a monocular camera. To solve this problem, mathematical models which capture the most essential parts of the sensors' measurements are needed for statistical inference. The slam problem can be solved using an Extended Kalman Filter (ekf) or similar methods, yet without any guarantee of convergence. We treat the slam estimates given by the ekf as initial parameters for a nonlinear least-squares problem which we solve using the Gauss-Newton method.

1.1.2 Optical See-Through Head Mounted Display Calibration

We use Augmented Reality (ar) to denote visual information overlaid, augmented, on human vision. Optical See-Through Head Mounted Displays (osthmd) are wearable ar systems, which means that the graphics displayed by the osthmd moves with the user. To augment the real world with graphics in such a system, three main problems need to be solved. First, the position and rotation of the osthmd with respect to the inertial frame need to be known, i.e., the navigation. Second, the calibration problem, which constitutes the static transformations of the relative position and rotation of the semitransparent graphics screen with respect to the user's eye and an inertial frame. Third, a model of the environment where the graphics should be superimposed must be found. In this thesis, the first two of the three problems are addressed. The first problem, navigation, is solved using a visual tracking system. The second problem, calibration, is solved using the navigation and user acquired measurements.

1.2 Publications

Published work of relevance to this thesis is listed below in chronological order.

M. Axholt, M. A. Skoglund, S. D. O'Connell, M. D. Cooper, S. R. Ellis, and A. Ynnerman. Parameter Estimation Variance of the Single Point Active Alignment Method in Optical See-Through Head Mounted Display Calibration. In Proceedings of the IEEE Virtual Reality Conference, pages 27–34, Singapore, Republic of Singapore, March 2011.

Background: The author was invited by Magnus Axholt to join his research in 2010. Magnus has great experience of various display systems and had already been working on this project for more than a year. This work reflects some labour-intensive engineering, where several months were spent on the ar system, which consists of several hardware components interacting through several software layers. We adopt the theoretical framework for camera calibration founded in the computer vision and photogrammetry domains to osthmd calibration. The calibration problem is rather ill-posed since the measurements are few, and fairly noisy, compared to the parameter space.

Z. Sjanic, M. A. Skoglund, T. B. Schön, and F. Gustafsson. A Nonlinear Least-Squares Approach to the SLAM Problem. In Proceedings of the IFAC World Congress, Milan, August - September 2011. To appear.

Background: This work started as a mini project by the author and Zoran Sjanic in the graduate course Dynamic Vision, which was given by Thomas Schön in 2008, and some parts were also done within the graduate course Robot Modeling and Control, given by Mikael Norrlöf. The initial ideas actually emerged during the author's Master's Thesis project in 2008. Parts of this work have also been used in Nilsson (2010); Nilsson et al. (2011), where the sensor is a forward looking infrared (IR) camera attached to a car, and in Karlsson and Bjärkefur (2010), where the authors use a stereo camera and a laser camera in an indoor environment.

Published work not included in this thesis:

J. Callmer, M. Skoglund, and F. Gustafsson. Silent Localization of Underwater Sensors Using Magnetometers. EURASIP Journal on Advances in Signal Processing, 2010, 2010. doi: 10.1155/2010/709318. URL http://dx.doi.org/10.1155/2010/709318. Article ID 709318.

M. Axholt, M. Skoglund, S. D. Peterson, M. D. Cooper, T. B. Schön, F. Gustafsson, A. Ynnerman, and S. R. Ellis. Optical See-Through Head Mounted Display Direct Linear Transformation Calibration Robustness in the Presence of User Alignment Noise. In Proceedings of the 54th Annual Meeting of the Human Factors and Ergonomics Society, volume 54, pages 2427–2431, San Francisco, CA, USA, September - October 2010. Human Factors and Ergonomics Society.

1.3 Contributions

The main contributions in this thesis are listed below.

• In Sjanic et al. (2011) we provide a solution to the slam problem using a camera and inertial sensors. The individual contributions of the author and Zoran Sjanic are equal and consist of deriving the proposed solution, carrying out the experiments on which the algorithms were evaluated and documenting the work.

• In Axholt et al. (2011) the Single Point Active Alignment Method (spaam) used for osthmd calibration is evaluated in both experiments and simulations. Previously, much effort has been spent on analysing the effects of the user's interaction, which is known as human factors, in ar systems. The individual contributions of the author to this work are parts of the experiment design, derivation and verification of mathematical models and implementation in both software and hardware. Interdisciplinary research collaboration can be challenging but also very rewarding. The author believes an important contribution is due to this clash. It has led to a change of focus from the analysis of errors possibly introduced by humans to the

analysis of different experiment designs and the tracker data. Most of the documentation was carried out by Magnus Axholt, to which the author only had a minor contribution.

1.4 Thesis outline

The thesis is divided into two parts, with edited versions of published papers in the second part. The first part starts with Chapter 2, which introduces kinematics, dynamics, sensors and model structures. Some estimation techniques relevant to this thesis are presented in Chapter 3. Chapter 4 describes some details not covered in Paper A and Chapter 5 describes the ar system and some details of the calibration procedure used in Paper B. The first part ends with Chapter 6, which presents conclusions and directions for future work. The second part contains edited versions of Sjanic et al. (2011) in Paper A and Axholt et al. (2011) in Paper B. Paper A is about the slam problem using a camera and inertial sensors; a solution is provided using nonlinear least-squares and the results are evaluated on experimental data. Paper B is about osthmd calibration, where the data is collected using spaam and the calibration is solved using the Direct Linear Transformation algorithm. The parameter variance for different acquisition methods using spaam is evaluated in both experiments and simulations.

2 Models

In this chapter we outline the most important model structures which are used in the publications in Part II. The models describe rigid body kinematics and dynamics in three dimensions, the pinhole camera and inertial measurements.

2.1 Kinematics

Kinematics describes the motion of bodies without considering the forces causing the motion. Kinematic transformations between coordinate frames consist of length preserving translations and rotations, which are illustrated in Figure 2.1.

Figure 2.1: Rigid body transformations. (a) A rotation is a displacement of the axes while the origin is kept fixed. (b) A translation is a displacement of the origin while the axes are kept aligned.

2.1.1 Notation

Throughout the thesis, superscript lower case letters are used to denote the associated coordinate frame. Subscript lower case letters denote either a coordinate and/or a time instant. The common convention is to use bold font lower case letters to denote vectors and upper case letters to denote matrices. An example is
$$ v_t^c = \begin{bmatrix} v_{x,t}^c \\ v_{y,t}^c \\ v_{z,t}^c \end{bmatrix} = R^{cw} v_t^w, \qquad (2.1) $$
where the time dependent vector $v_t^w$ in coordinate frame $w$ is expressed in the coordinate frame $c$ by a matrix multiplication with $R^{cw}$.

2.1.2 Translational Kinematics

A translation is a displacement of the body origin c while the axes are kept aligned. The translation is a vector, cw , from the origin of the frame w to the coordinate c expressed in the w frame and it is also called the position.

2.1.3 Rotational Kinematics

Proper rotations are maps that preserve length and orientation, i.e., the handedness of the basis. These maps are linear, hence rotations can be represented by matrices. There are many other parametrisation alternatives to rotation matrices; perhaps the most common ones are Euler angles and the unit quaternion, see Section 2.1.4.

2.1 Definition (Special Orthogonal Group SO(3)). A matrix $R \in \mathbb{R}^{3\times 3}$ belongs to the special orthogonal group SO(3) if and only if
$$ R^T R = I_3, \qquad (2.2a) $$
$$ \det R = 1. \qquad (2.2b) $$

Figure 2.1a describes a rotation from the $w$ frame to the $c$ frame. This can be parametrised by the rotation matrix $R^{cw}$. The inverse rotation direction using (2.2a) is $(R^{cw})^{-1} = (R^{cw})^T$, hence $(R^{cw})^T = R^{wc}$.

2.2 Example: Rotation
A vector in the $a$-frame, $x^a \in \mathbb{R}^3$, is rotated to the $b$-frame by $R^{ba} \in SO(3)$ according to
$$ x^b = R^{ba} x^a, \qquad (2.3) $$
and using (2.2a)
$$ \|x^b\|^2 = \|R^{ba}x^a\|^2 \quad\Longleftrightarrow \qquad (2.4a) $$
$$ (x^b)^T x^b = (x^a)^T (R^{ba})^T R^{ba} x^a \quad\Longleftrightarrow \qquad (2.4b) $$
$$ \|x^b\|^2 = \|x^a\|^2, \qquad (2.4c) $$
hence, the length is preserved. Note that $\det R^{ba} = \det (R^{ba})^T$, and together with (2.2a) this gives
$$ (\det R^{ba})^2 = \det\big((R^{ba})^T R^{ba}\big) = \det I_3 = 1, \qquad (2.5) $$
and the positive root of (2.5) assures $R^{ba} \in SO(3)$.
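As a minimal numerical sketch of Definition 2.1 and Example 2.2, the following Python/NumPy snippet builds a rotation matrix from an arbitrary axis and angle (via Rodrigues' formula, which is not introduced in the text) and checks (2.2) and (2.4c); the axis, angle and test vector are arbitrary example values.

```python
import numpy as np

def axis_angle_to_R(n, angle):
    """Rotation matrix from a unit axis n and an angle, via Rodrigues' formula."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])                 # cross-product matrix of n
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

R_ba = axis_angle_to_R([0.0, 0.0, 1.0], np.pi / 6)     # arbitrary example rotation
x_a = np.array([1.0, 2.0, 3.0])                        # arbitrary vector in the a-frame
x_b = R_ba @ x_a                                       # (2.3)

print(np.allclose(R_ba.T @ R_ba, np.eye(3)))           # (2.2a): R^T R = I_3
print(np.isclose(np.linalg.det(R_ba), 1.0))            # (2.2b): det R = 1
print(np.isclose(np.linalg.norm(x_b), np.linalg.norm(x_a)))  # (2.4c): length preserved
```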

2.1.4 Quaternions

Quaternions were invented by Sir William Rowan Hamilton (Hamilton, 1844) as a tool to extend the imaginary numbers. The unit quaternion is widely used in the aerospace industry as well as in computer graphics since it allows a computationally efficient and singularity free rotation representation.

2.3 Definition (Quaternion). The quaternion is a four-tuple of real numbers $q \in \mathbb{R}^4$.

An alternative definition, as in Kuipers (2002, page 104), is to define a scalar part, $q_0$, and a real valued vector, $\mathbf{q} = e_1 q_1 + e_2 q_2 + e_3 q_3$, where $e_1, e_2, e_3$ is the standard orthonormal basis in $\mathbb{R}^3$. With Kuipers' definition we have
$$ q = q_0 + \mathbf{q}, \qquad (2.6) $$
which is a sum of a vector and a scalar, which is not defined in ordinary linear algebra. The full vector representation is then
$$ q = \begin{bmatrix} q_0 \\ q_1 \\ q_2 \\ q_3 \end{bmatrix} = \begin{bmatrix} q_0 \\ \mathbf{q} \end{bmatrix}. \qquad (2.7) $$

(2.6)

which is a sum of a vector and a scalar which is not defined in ordinary linear algebra. The full vector representation is then   q0  " # q  q q =  1  = 0 . (2.7) q q2  q3 Rotation using Quaternions

A quaternion can be used to represent a rotation in $\mathbb{R}^3$ but in order to do so it must be constrained to the unit sphere, i.e., $q^T q = 1$, hence the name unit quaternion. Let
$$ q^{ba} = \begin{bmatrix} \cos\delta \\ \sin\delta\, n \end{bmatrix}, \qquad (2.8) $$
which is a unit quaternion describing a rotation of an angle $2\delta$ around the unit vector $n \in \mathbb{R}^3$. Then a rotation using this unit quaternion is
$$ \tilde{x}^b = q^{ba} \odot \tilde{x}^a \odot q^{ab}, \qquad (2.9) $$
where $\tilde{x}^{\star} = [0, x^{\star}]$ are the vectors' quaternion equivalents and $\odot$ denotes the quaternion multiplication, see e.g., Törnqvist (2008). The unit quaternion can also be

used to construct a rotation matrix, see for instance Shuster (1993, page 462), where the superscript $ab$ is omitted for the sake of readability,
$$ R(q^{ab}) = \begin{bmatrix} q_0^2+q_1^2-q_2^2-q_3^2 & 2(q_1q_2+q_0q_3) & 2(q_1q_3-q_0q_2) \\ 2(q_1q_2-q_0q_3) & q_0^2-q_1^2+q_2^2-q_3^2 & 2(q_2q_3+q_0q_1) \\ 2(q_1q_3+q_0q_2) & 2(q_2q_3-q_0q_1) & q_0^2-q_1^2-q_2^2+q_3^2 \end{bmatrix} \qquad (2.10) $$
$$ = \begin{bmatrix} 2q_0^2+2q_1^2-1 & 2(q_1q_2+q_0q_3) & 2(q_1q_3-q_0q_2) \\ 2(q_1q_2-q_0q_3) & 2q_0^2+2q_2^2-1 & 2(q_2q_3+q_0q_1) \\ 2(q_1q_3+q_0q_2) & 2(q_2q_3-q_0q_1) & 2q_0^2+2q_3^2-1 \end{bmatrix}, \qquad (2.11) $$
where the last equality is established using $q_0^2+q_1^2+q_2^2+q_3^2 = 1$. Throughout the thesis $R(q^{ab})$ and $R^{ba}$ will be used interchangeably.
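A minimal sketch of (2.11) in Python/NumPy, assuming the quaternion ordering q = [q0, q1, q2, q3] used above; the example quaternion is an arbitrary value that is normalised before use.

```python
import numpy as np

def quat_to_R(q):
    """Rotation matrix R(q) from a unit quaternion q = [q0, q1, q2, q3], cf. (2.11)."""
    q0, q1, q2, q3 = q / np.linalg.norm(q)             # enforce the unit norm constraint
    return np.array([
        [2*q0**2 + 2*q1**2 - 1, 2*(q1*q2 + q0*q3),     2*(q1*q3 - q0*q2)],
        [2*(q1*q2 - q0*q3),     2*q0**2 + 2*q2**2 - 1, 2*(q2*q3 + q0*q1)],
        [2*(q1*q3 + q0*q2),     2*(q2*q3 - q0*q1),     2*q0**2 + 2*q3**2 - 1],
    ])

q = np.array([0.9, 0.1, -0.2, 0.3])                    # arbitrary example quaternion
R = quat_to_R(q)
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # R is in SO(3)
```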

2.2 Dynamics

As rigid bodies are studied, a fixed coordinate frame can be assigned to the local body frame to reveal the time-evolution of the body with respect to another coordinate frame.

2.2.1 Translational Dynamics

A body, in the $b$ frame, with mass will be influenced by the Earth's gravitational field. Assuming that the gravity component, $g^e = [0\ 0\ -9.81]^T\,\mathrm{m/s^2}$, is constant we have
$$ \dot{v}_t^e = a_t^e - g^e = R^{eb} a_t^b - g^e, \qquad (2.12) $$
meaning that the $e$-referenced acceleration depends on the orientation of the body. If there are errors in the rotation, the gravity subtraction will cause $\dot{v}_t^e$ to deviate from the truth, and this will result in velocity and position errors if (2.12) is used to resolve these quantities. This is commonly known as gravity leakage and it contributes to the inherent drift in inertial navigation systems.

2.2.2 Rotational Dynamics

The differential form of the unit quaternion is bilinear in the angular velocity and is given by
$$ \dot{q}_t^{be} = \frac{1}{2}\underbrace{\begin{bmatrix} 0 & -\omega_x^b & -\omega_y^b & -\omega_z^b \\ \omega_x^b & 0 & \omega_z^b & -\omega_y^b \\ \omega_y^b & -\omega_z^b & 0 & \omega_x^b \\ \omega_z^b & \omega_y^b & -\omega_x^b & 0 \end{bmatrix}}_{S(\omega_t^b)} q_t^{be}
= \frac{1}{2}\underbrace{\begin{bmatrix} -q_{1,t}^{be} & -q_{2,t}^{be} & -q_{3,t}^{be} \\ q_{0,t}^{be} & -q_{3,t}^{be} & q_{2,t}^{be} \\ q_{3,t}^{be} & q_{0,t}^{be} & -q_{1,t}^{be} \\ -q_{2,t}^{be} & q_{1,t}^{be} & q_{0,t}^{be} \end{bmatrix}}_{\tilde{S}(q_t^{be})} \begin{bmatrix} \omega_x^b \\ \omega_y^b \\ \omega_z^b \end{bmatrix}, \qquad (2.13) $$
here, $\omega_t^b$ is the instantaneous angular velocity of the moving body frame $b$ relative to the fixed $e$ frame, expressed in the $b$ frame. For a complete derivation see Törnqvist (2008).

2.2.3 Dynamic Models

Assume that the acceleration, $a_t^b$, the gravity, $g^e$, and the angular velocity, $\omega_t^b$, are known. This leaves us a state vector resolved in the $e$ frame
$$ x_t^e = \begin{bmatrix} p_t^e \\ v_t^e \\ q_t^{be} \end{bmatrix}, \qquad (2.14) $$
where $p_t^e$, $v_t^e$ and $q_t^{be}$ represent the position, velocity and the rotation (attitude) of the $b$ frame with respect to the $e$ frame, respectively. Obviously
$$ \ddot{p}_t^e = \dot{v}_t^e = a_t^e, \qquad (2.15) $$
and by using (2.13) and (2.12) the dynamic model becomes
$$ \dot{x}_t^e = \begin{bmatrix} \dot{p}_t^e \\ \dot{v}_t^e \\ \dot{q}_t^{be} \end{bmatrix} = \begin{bmatrix} 0 & I & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \frac{1}{2}S(\omega_t^b) \end{bmatrix}\begin{bmatrix} p_t^e \\ v_t^e \\ q_t^{be} \end{bmatrix} + \begin{bmatrix} 0 \\ R(q_t^{be})^T a_t^b - g^e \\ 0 \end{bmatrix}. \qquad (2.16) $$

2.3 Inertial Measurements

Inertial sensors measure specific force and angular velocity, and these should ideally be modelled as states which are measured. However, to reduce the dimension of the state space needed, the inertial measurements are considered to be inputs.

2.3.1 Gyroscopes

The output of the gyroscopes is
$$ \omega_t^b = u_{\omega,t}^b + w_{\omega,t}, \qquad (2.17) $$
where $u_{\omega,t}^b$ is used to stress that the angular velocity is considered to be an input. Furthermore, $w_{\omega,t}$ is noise. Using (2.13) the quaternion dynamics becomes
$$ \dot{q}_t^{be} = \frac{1}{2}S\big(u_{\omega,t}^b + w_{\omega,t}\big)q_t^{be} = \frac{1}{2}S\big(u_{\omega,t}^b\big)q_t^{be} + \frac{1}{2}\tilde{S}\big(q_t^{be}\big)w_{\omega,t}. \qquad (2.18) $$
Due to unmodelled effects in the sensors, a bias, $\delta_{\omega,t}$, could be added to (2.17), giving
$$ \omega_t^b = u_{\omega,t}^b + \delta_{\omega,t} + w_{\omega,t}, \qquad (2.19) $$
and $\delta_{\omega,t}$ could also be included in the state vector.

2.3.2 Accelerometers

Contrary to what the name implies, an accelerometer does not only measure acceleration when used on earth, since it is subject to the gravitational field. Hence, accelerometers measure the specific force, which is a sum of the free acceleration and the gravity field. The accelerometer reading is also treated as a known signal and the output becomes
$$ u_{a,t}^b = a_t^b - g^b + w_{a,t}^b = R^{be}\big(a_t^e - g^e\big) + w_{a,t}^b, \qquad (2.20) $$
where $w_{a,t}^b$ is measurement noise. It should be clear that the measured specific force depends on the attitude $R^{be}$. If the noise is assumed to be zero mean Gaussian with the same variance in all components, (2.20) has the same statistical properties as
$$ u_{a,t}^b = R^{be}\big(a_t^e - g^e\big) + w_{a,t}, \qquad (2.21) $$
i.e., the noise does not depend on the coordinate frame.

2.3.3 Discrete-time Dynamic Models

The sensors deliver sampled data and the dynamic model (2.16) therefore needs to be expressed in discrete time. This can be done in many ways. Assume that the angular velocity is constant over the sampling interval $T$ and free from noise; then (2.18) integrates to
$$ q_{t+1}^{be} = e^{\frac{T}{2}S(u_{\omega,t}^b)} q_t^{be}. \qquad (2.22) $$
Approximating the exponential results in
$$ q_{t+1}^{be} \approx \Big(I + \frac{T}{2}S\big(u_{\omega,t}^b\big)\Big) q_t^{be}, \qquad (2.23) $$
and assuming that the noise from (2.16) accounts for the exponential approximation as well as the true sensor noise we get
$$ q_{t+1}^{be} = q_t^{be} + \frac{T}{2}S\big(u_{\omega,t}^b\big)q_t^{be} + \frac{T}{2}\tilde{S}\big(q_t^{be}\big)w_{\omega,t}. \qquad (2.24) $$
The discrete-time dynamic models for the position and velocity can be computed in a similar fashion. Assuming that $\dot{a}_t^e = 0$, from (2.15) we then have
$$ \begin{bmatrix} \dot{p}_t^e \\ \dot{v}_t^e \\ \dot{a}_t^e \end{bmatrix} = \begin{bmatrix} 0 & I & 0 \\ 0 & 0 & I \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} p_t^e \\ v_t^e \\ a_t^e \end{bmatrix}. \qquad (2.25) $$
Assuming that the noise input is constant during the sampling interval, the discrete-time translational dynamics is
$$ \begin{bmatrix} p_{t+1}^e \\ v_{t+1}^e \\ a_{t+1}^e \end{bmatrix} = \begin{bmatrix} I & TI & \frac{T^2}{2}I \\ 0 & I & TI \\ 0 & 0 & I \end{bmatrix}\begin{bmatrix} p_t^e \\ v_t^e \\ a_t^e \end{bmatrix}. \qquad (2.26) $$
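A minimal sketch of the difference between the exact discretisation (2.22) and the first-order approximation (2.23), assuming a noise-free gyroscope reading; the sampling time and angular velocity are arbitrary example values.

```python
import numpy as np
from scipy.linalg import expm

def S(w):
    """S(omega) from (2.13), so that qdot = 0.5 * S(omega) @ q."""
    wx, wy, wz = w
    return np.array([[0.0, -wx, -wy, -wz],
                     [wx, 0.0, wz, -wy],
                     [wy, -wz, 0.0, wx],
                     [wz, wy, -wx, 0.0]])

T = 0.01                                    # sampling interval, example value (100 Hz)
u_w = np.array([0.2, -0.1, 0.5])            # gyroscope reading [rad/s], example value
q = np.array([1.0, 0.0, 0.0, 0.0])          # initial attitude quaternion

q_exact = expm(0.5 * T * S(u_w)) @ q                 # (2.22)
q_approx = (np.eye(4) + 0.5 * T * S(u_w)) @ q        # (2.23)
q_approx /= np.linalg.norm(q_approx)                 # re-normalise the approximation

print(np.linalg.norm(q_exact - q_approx))            # small when T*|omega| is small
```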

Figure 2.2: Pinhole projection with the image plane placed in front of the camera centre.

Using (2.21), (2.24) and (2.26) we get the full discrete-time dynamic model
$$ \begin{bmatrix} p_t^e \\ v_t^e \\ q_t^{be} \end{bmatrix} = \begin{bmatrix} I_3 & TI_3 & 0 \\ 0 & I_3 & 0 \\ 0 & 0 & I_4 \end{bmatrix}\begin{bmatrix} p_{t-1}^e \\ v_{t-1}^e \\ q_{t-1}^{be} \end{bmatrix} + \begin{bmatrix} \frac{T^2}{2}I_3 & 0 \\ TI_3 & 0 \\ 0 & \frac{T}{2}I_4 \end{bmatrix}\begin{bmatrix} R(q_{t-1}^{be})^T u_{a,t}^b + g^e \\ S(u_{\omega,t}^b)q_{t-1}^{be} \end{bmatrix} + \begin{bmatrix} \frac{T^2}{2}I_3 & 0 \\ TI_3 & 0 \\ 0 & \frac{T}{2}\tilde{S}(q_{t-1}^{be}) \end{bmatrix}\begin{bmatrix} w_{a,t}^b \\ w_{\omega,t} \end{bmatrix}, \qquad (2.27) $$
which is also to be found in Section 3.1 in Paper A as (5) but is included here for completeness. It is also worth noting that this model integrates the errors in the sensor measurements directly, whereas if the sensor readings are modelled as measurements we would get low-pass filtering, as pointed out in Callmer (2011).
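A minimal sketch of one noise-free prediction step of (2.27), repeating the small helper functions from the sketches above so that it runs on its own; the inputs are arbitrary example values, and for a stationary, level sensor the position and velocity stay at zero.

```python
import numpy as np

def S(w):
    wx, wy, wz = w
    return np.array([[0.0, -wx, -wy, -wz], [wx, 0.0, wz, -wy],
                     [wy, -wz, 0.0, wx], [wz, wy, -wx, 0.0]])

def quat_to_R(q):
    q0, q1, q2, q3 = q / np.linalg.norm(q)
    return np.array([
        [2*q0**2 + 2*q1**2 - 1, 2*(q1*q2 + q0*q3),     2*(q1*q3 - q0*q2)],
        [2*(q1*q2 - q0*q3),     2*q0**2 + 2*q2**2 - 1, 2*(q2*q3 + q0*q1)],
        [2*(q1*q3 + q0*q2),     2*(q2*q3 - q0*q1),     2*q0**2 + 2*q3**2 - 1]])

def predict(p, v, q, u_a, u_w, T, g_e=np.array([0.0, 0.0, -9.81])):
    """One noise-free step of (2.27): dead reckoning on accelerometer and gyro inputs."""
    a_e = quat_to_R(q).T @ u_a + g_e           # R(q^be)^T u_a + g^e as in (2.27)
    p_new = p + T * v + 0.5 * T**2 * a_e
    v_new = v + T * a_e
    q_new = q + 0.5 * T * S(u_w) @ q           # first-order quaternion update, cf. (2.24)
    return p_new, v_new, q_new / np.linalg.norm(q_new)

# Stationary, level sensor: the accelerometer measures the specific force [0, 0, 9.81].
p, v, q = np.zeros(3), np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0])
p, v, q = predict(p, v, q, u_a=np.array([0.0, 0.0, 9.81]), u_w=np.zeros(3), T=0.01)
print(p, v)                                    # stays at zero: specific force and gravity cancel
```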

2.4 The Pinhole Camera Model

The pinhole camera is a mathematical model describing how points in $\mathbb{R}^3$ relate to points in $\mathbb{R}^2$ on the image plane through a central projection. The model only contains five intrinsic parameters and for most systems it is an approximation, since it does not model illumination, contrast, lenses and lens-distortion, among others. Additionally, the position and orientation (pose) of the camera centre with respect to some reference frame is also needed if the camera is only a part of a system in which we are interested. This adds another 6 degrees of freedom (dof) to the pinhole camera model.

2.4.1 Camera Coordinates

Figure 2.2 illustrates a frontal pinhole projection. In the pinhole camera model the coordinate frames are:

• c - Camera frame with the optical centre as the origin, where its z-axis coincides with the optical axis, also known as the principal axis.
• w - World frame.
• i - Image frame. The image plane is orthogonal to the optical axis and has the principal point h as its origin. The distance between the c and the i frame is called the focal length f.
• n - Normalised image frame.
• p - Pixel frame with centre in the upper left corner as viewed from the optical centre c.

A point $m^c = [x^c, y^c, z^c]^T$ in the c-frame relates to a point $m^n = [x^n, y^n]^T$ in the n-frame as
$$ \begin{bmatrix} x^n \\ y^n \end{bmatrix} = \frac{f}{z^c}\begin{bmatrix} x^c \\ y^c \end{bmatrix}. \qquad (2.28) $$
It is convenient to express (2.28) as a linear system, which can be done by appending unit elements to the vectors, giving the homogeneous coordinate representation
$$ \begin{bmatrix} x^n \\ y^n \\ 1 \end{bmatrix} \propto z^c\begin{bmatrix} x^n \\ y^n \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{\Pi_0}\begin{bmatrix} x^c \\ y^c \\ z^c \\ 1 \end{bmatrix}. \qquad (2.29) $$
With focal length $f = 1$ the normalised image frame n and the image frame i coincide. The projection (2.29) can be expressed as
$$ \lambda m^n = \Pi_0 m^c, \qquad (2.30) $$
where $\lambda \in \mathbb{R}^+$ is an arbitrary scale factor.

Intrinsic Parameters

The digitalisation of the image plane i is the pixel plane p,
$$ \begin{bmatrix} x^p \\ y^p \end{bmatrix} = \begin{bmatrix} f_x & s_\alpha \\ 0 & f_y \end{bmatrix}\begin{bmatrix} x^i \\ y^i \end{bmatrix} + \begin{bmatrix} h_x \\ h_y \end{bmatrix}, \qquad (2.31) $$
or in homogeneous coordinates
$$ \begin{bmatrix} x^p \\ y^p \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} f_x & s_\alpha & h_x \\ 0 & f_y & h_y \\ 0 & 0 & 1 \end{bmatrix}}_{K}\begin{bmatrix} x^i \\ y^i \\ 1 \end{bmatrix}, \qquad (2.32) $$
where $K$ is referred to as the intrinsic parameter matrix or simply the calibration matrix. The parameters of the calibration matrix are the focal lengths in pixel units, $f_x = s_x/f$ and $f_y = s_y/f$, where $f$ is the focal length expressed in meters per pixel and $s_x$, $s_y$ are the pixel sizes in metric units. The centre of the image plane, $h = [h_x\ h_y]^T$, is the principal point coordinate and $s_\alpha = f_x \tan\alpha$ is a skew parameter. In most cameras it is reasonable to assume $s_\alpha \approx 0$ (Hartley and Zisserman, 2004).

Figure 2.3: Camera to world transformation.

Distortion

Most cameras suffer from non-negligible distortion due to the optical lenses involved. This can be compensated for by tangential- and radial distortion models (Ma et al., 2004; Zhang, 2000), which are provided in camera calibration software, see e.g., Bouguet (2010).

Extrinsic Parameters

Points in another frame, say w, can be expressed in the camera frame c. In the pinhole camera model, such a transformation is called extrinsic since it does not depend on the intrinsic camera calibration matrix $K$. Figure 2.3 describes the relation between a point $m^w$ in world coordinates w expressed in camera coordinates c,
$$ m^c = R^{cw}(m^w - c^w) = R^{cw}m^w - R^{cw}c^w, \qquad (2.33) $$
which can be written in homogeneous coordinates as
$$ \begin{bmatrix} x^c \\ y^c \\ z^c \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} R^{cw} & -R^{cw}c^w \\ 0 & 1 \end{bmatrix}}_{T^{cw}}\begin{bmatrix} x^w \\ y^w \\ z^w \\ 1 \end{bmatrix}. \qquad (2.34) $$

2.4.2 World to Pixel Coordinates

Combining the extrinsic and intrinsic parameters, coordinates in the world frame can be expressed in pixel coordinates as
$$ \begin{bmatrix} x^p \\ y^p \\ 1 \end{bmatrix} \propto z^c\begin{bmatrix} x^p \\ y^p \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} f_x & s_\alpha & h_x \\ 0 & f_y & h_y \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} R^{cw} & -R^{cw}c^w \\ 0 & 1 \end{bmatrix}}_{P}\begin{bmatrix} x^w \\ y^w \\ z^w \\ 1 \end{bmatrix}, \qquad (2.35) $$
where $P$ is called the camera matrix or projection matrix. In compact notation (2.35) is
$$ x^p = K\begin{bmatrix} R^{cw} & -R^{cw}c^w \end{bmatrix} x^w. \qquad (2.36) $$
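A minimal sketch of the world-to-pixel mapping (2.35)–(2.36); the intrinsic parameters, camera pose and world point below are arbitrary example values.

```python
import numpy as np

def camera_matrix(K, R_cw, c_w):
    """P = K [R^cw  -R^cw c^w], cf. (2.35)-(2.36)."""
    return K @ np.hstack([R_cw, -R_cw @ c_w.reshape(3, 1)])

def project(P, m_w):
    """Project a 3D world point to pixel coordinates (dividing out the scale lambda)."""
    x = P @ np.append(m_w, 1.0)          # homogeneous world point
    return x[:2] / x[2]

K = np.array([[800.0, 0.0, 320.0],       # fx, s_alpha ~ 0, hx (example values)
              [0.0, 800.0, 240.0],       # fy, hy
              [0.0, 0.0, 1.0]])
R_cw = np.eye(3)                         # camera aligned with the world frame
c_w = np.array([0.0, 0.0, -1.0])         # camera centre one unit behind the origin

P = camera_matrix(K, R_cw, c_w)
print(project(P, np.array([0.1, 0.05, 1.0])))   # pixel coordinates of an example point
```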

3 Estimation

Estimation is the problem of taking measured data, $y_t$, to reveal statistical properties, $x_t$ and $\theta$, of systems which are of interest. Basic ingredients are model structures representing the assumptions made on the interaction and time evolution of $x_t$, $\theta$ and $y_t$. As tools for solving estimation problems, state space models are commonly used model structures and a fairly general description of such is
$$ \dot{x}_t = f(x_t, u_t, \theta, w_t), \qquad (3.1a) $$
$$ y_t = h(x_t, \theta, e_t), \qquad (3.1b) $$
where $f(\cdot)$ and $h(\cdot)$ are general nonlinear functions, $x_t$ are states, $u_t$ are known inputs, $\theta$ are unknown or partially known parameters and $w_t$ and $e_t$ are noise representing everything that cannot be predicted yet still affects the system. However, in this work, an important special case is considered where the noise is assumed additive and Gaussian. Then the discrete time version of (3.1) is
$$ x_t = f(x_{t-1}, u_t, \theta) + w_t, \qquad (3.2a) $$
$$ y_t = h(x_t, \theta) + e_t. \qquad (3.2b) $$
The functions $f(\cdot)$ and $h(\cdot)$ are assumed well behaved, meaning they are at least one time continuously differentiable and smooth in the domain.

3.1 Sensor Fusion

Sensor fusion is the process of combining multi-sensory data in a clever way to obtain a state estimate $\hat{x}_t$ or a state sequence estimate $\{\hat{x}_t\}_{t=0}^{N}$. Model parameters, $\theta$, are usually not of interest. The sensor fusion concept is illustrated in Figure 3.1, where the input sensors could as well be sensor fusion nodes.

Figure 3.1: Sensor data is processed in a sensor fusion node to obtain state estimates.

An example of sensor fusion: an imu in combination with a vector magnetometer in a single unit facilitates the possibility of estimating the absolute attitude of the unit with respect to the local magnetic field and the local gravity field, assuming the fields are constant and the sensor is still. In Chapter 4 sensor fusion of imu and camera data is considered. Cameras and imus are somewhat complementary sensors, meaning that together they provide information which is not readily available from either of the sensors alone. As imu measurements are noisy they only allow short time attitude estimation; in fact, the absolute orientation cannot be obtained from an imu alone. On the other hand, cameras allow long term attitude estimation (Montiel and Davison, 2006) in case of moderate movement of the camera (not causing too much motion blur), assuming robust detection and matching of interest points.

3.2 Calibration

We refer to calibration as the problem of estimating the parameter vector θ associated with the model structures (3.2a) and (3.2b). Some calibration problems also require solving sensor fusion and parameter estimation jointly. In System Identification literature calibration is usually called Grey-box Identification, see e.g., Ljung (1999). In Chapter 5 calibration of an osthmd is considered, where states, xt , are given by an electro optical tracking device and the measurements, yt , are gathered by humans in what is called a bore sighting exercise. The calibration is solved using the Direct Linear Transformation (DLT) algorithm.

3.3 Nonlinear Estimation

Methods of handling nonlinearities in estimation problems can broadly be divided into two classes. The first is nonlinear methods not requiring explicit linearisation, examples being the Unscented Kalman Filter (ukf) (Julier and Uhlmann, 1997) and particle filters (pf) (Gordon et al., 1993; Schön, 2006). The second is to linearise the nonlinearities (at the current estimate) and use linear methods, e.g., the Extended Kalman Filter (ekf).

3.3.1 Extended Kalman Filter

The celebrated Kalman Filter (kf) (Kalman, 1960) provides unbiased estimates with minimum variance if applied to linear systems which are subject to Gaussian noise. For nonlinear systems, a natural extension is to linearise the nonlinear functions by a Taylor expansion at the current estimate and then propagate the mean and the covariance approximations. This can be done in an Extended Kalman Filter (ekf), which is described in many books, see e.g., Kailath et al. (2000). The ekf works in a two step procedure summarised in Algorithm 1 below, where the parameters, $\theta$, are omitted and the sampling time is $T = 1$.

Algorithm 1 Extended Kalman Filter
Require an initial state, $\hat{x}_{0|0}$, and an initial state covariance, $P_{0|0}$, and use the models (3.2).
1. Time Update
$$ \hat{x}_{t|t-1} = f(\hat{x}_{t-1|t-1}, u_t), \qquad (3.3a) $$
$$ P_{t|t-1} = F_t P_{t-1|t-1} F_t^T + Q_t. \qquad (3.3b) $$
2. Measurement Update
$$ K_t = P_{t|t-1}H_t^T\big(H_t P_{t|t-1}H_t^T + R_t\big)^{-1}, \qquad (3.4a) $$
$$ \hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\big(y_t - h(\hat{x}_{t|t-1})\big), \qquad (3.4b) $$
$$ P_{t|t} = P_{t|t-1} - K_t H_t P_{t|t-1}. \qquad (3.4c) $$

In Algorithm 1 we have defined
$$ F_t \triangleq \frac{\partial f(x_t, u_t)}{\partial x_t}\bigg|_{(x_t,u_t)=(\hat{x}_{t-1|t-1},u_t)}, \qquad H_t \triangleq \frac{\partial h(x_t)}{\partial x_t}\bigg|_{x_t=\hat{x}_{t|t-1}}, \qquad (3.5) $$
furthermore, $Q_t$ and $R_t$ are the covariance matrices of $w_t$ and $e_t$, respectively.
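A minimal, generic sketch of one iteration of Algorithm 1 with numerically computed Jacobians; the models f and h, the noise covariances and the toy measurement at the bottom are placeholders standing in for (3.2).

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Finite-difference Jacobian of func at x."""
    fx = func(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (func(x + dx) - fx) / eps
    return J

def ekf_step(x, P, u, y, f, h, Q, R):
    """One EKF time and measurement update, cf. (3.3)-(3.5)."""
    # Time update
    F = numerical_jacobian(lambda z: f(z, u), x)
    x_pred = f(x, u)
    P_pred = F @ P @ F.T + Q
    # Measurement update
    H = numerical_jacobian(h, x_pred)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = P_pred - K @ H @ P_pred
    return x_new, P_new

# Toy example: near-constant state with a nonlinear measurement of its first component.
f = lambda x, u: x + u                   # random-walk dynamics driven by the input
h = lambda x: np.array([np.sin(x[0])])   # scalar nonlinear measurement
x, P = np.zeros(2), np.eye(2)
x, P = ekf_step(x, P, u=np.array([0.1, 0.0]), y=np.array([0.05]),
                f=f, h=h, Q=0.01 * np.eye(2), R=np.array([[0.1]]))
print(x)
```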

3.3.2 Nonlinear Least-Squares

The objective in nonlinear least-squares is to estimate parameters, $\theta$, by minimising the residuals, $\varepsilon_t(\theta) = y_t - h_t(\theta)$, see for instance Nocedal and Wright (2006). The residuals are minimised in a least-squares sense,
$$ V(\theta) = \frac{1}{2}\sum_{t=1}^{N}\|\varepsilon_t(\theta)\|_2^2 = \frac{1}{2}\sum_{t=1}^{N}\varepsilon_t(\theta)^T\varepsilon_t(\theta). \qquad (3.6) $$
Define $\varepsilon(\theta) = [\varepsilon_1(\theta)\ \varepsilon_2(\theta)\ \ldots\ \varepsilon_N(\theta)]^T$ and the Jacobian $J(\theta) = \frac{d\varepsilon(\theta)^T}{d\theta}$; then the gradient and the Hessian of (3.6) with respect to the parameters are given by
$$ \frac{dV(\theta)}{d\theta} = \sum_{t=1}^{N}\frac{d\varepsilon_t(\theta)^T}{d\theta}\,\varepsilon_t(\theta) = J(\theta)\varepsilon(\theta), \qquad (3.7) $$
and
$$ \frac{d^2V(\theta)}{d\theta^2} = J(\theta)J(\theta)^T + \sum_{t=1}^{N}\varepsilon_t(\theta)\frac{d^2\varepsilon_t(\theta)}{d\theta^2}, \qquad (3.8) $$
respectively. The extension to multivariable residuals is easily obtained by stacking the vectorisation of the individual residuals, which again gives a scalar cost function, see Gustafsson (2010, page 49).

The Gauss-Newton method is a popular second order method used for solving nonlinear least-squares problems. It is computationally simple since the Hessian of the objective function is approximated as
$$ \frac{d^2V(\theta)}{d\theta^2} \approx J(\theta)J(\theta)^T, \qquad (3.9) $$
significantly reducing computations. The Gauss-Newton method as an algorithm is summarised in Algorithm 2.

Algorithm 2 Gauss-Newton
1. Require an initial estimate $\hat{\theta}$.
2. Compute the Jacobian $J(\hat{\theta})$.
3. Compute a search direction $p$ by solving
$$ J(\hat{\theta})J(\hat{\theta})^T p = -J(\hat{\theta})\varepsilon(\hat{\theta}). \qquad (3.10) $$
4. Compute a step length, $\alpha$, such that the cost (3.6) is decreased. This can be done using line search, see e.g., Nocedal and Wright (2006, page 297) or Boyd and Vandenberghe (2004, page 464).
5. Update the parameters
$$ \hat{\theta} := \hat{\theta} + \alpha p. \qquad (3.11) $$
6. Terminate if a stopping criterion is met. Such criteria can be: the change in cost is small; the number of iterations has exceeded some threshold; among others.
7. Otherwise, return to step 2.
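A minimal sketch of Algorithm 2 with a numerically computed Jacobian and a simple backtracking line search. Note that the code uses the more common convention J = dε/dθ (the transpose of the definition above), so the search direction solves J^T J p = −J^T ε; the exponential curve fit at the bottom is only a toy example.

```python
import numpy as np

def gauss_newton(residual, theta0, n_iter=20, tol=1e-9):
    """Algorithm 2 sketch; here J = d(eps)/d(theta), the transpose of the text's convention."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        eps = residual(theta)
        # numerical Jacobian of the residual vector
        J = np.zeros((eps.size, theta.size))
        for i in range(theta.size):
            d = np.zeros_like(theta); d[i] = 1e-6
            J[:, i] = (residual(theta + d) - eps) / 1e-6
        p = np.linalg.solve(J.T @ J, -J.T @ eps)       # search direction, cf. (3.10)
        V = 0.5 * eps @ eps                            # cost (3.6)
        alpha = 1.0
        while 0.5 * np.sum(residual(theta + alpha * p) ** 2) > V and alpha > 1e-4:
            alpha *= 0.5                               # backtracking line search
        if np.linalg.norm(alpha * p) < tol:            # stopping criterion
            break
        theta = theta + alpha * p                      # update (3.11)
    return theta

# Toy example: fit y = a * exp(b * t) to noisy data.
t = np.linspace(0, 1, 50)
y = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.randn(t.size)
res = lambda th: y - th[0] * np.exp(th[1] * t)
print(gauss_newton(res, theta0=[1.0, 0.0]))
```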

4 Visual Inertial Navigation and Mapping

Simultaneous Localisation and Mapping (slam) emerged in the field of mobile robotics as a tool to enhance navigation and to infer properties of the surrounding environment by means of the onboard sensors. A lot of attention was spent on indoor platforms equipped with wheel encoders for odometry and line sweeping laser range scanners measuring the environment. Today, all sorts of platforms and sensors are used in slam applications, e.g., submarines (Eustice et al., 2006), monocular cameras (Davison et al., 2007; Davison, 2003; Eade, 2008; Klein and Murray, 2007), aerial platforms (Karlsson et al., 2008; Caballero et al., 2009; Bryson and Sukkarieh, 2009; Lupton and Sukkarieh, 2009, 2008; Skoglund, 2008) and even domestic robots, Neato Robotics (2011). In this thesis, the sensors are an imu and a camera contained in a single unit.

4.1 Visual and Inertial ekf-slam

EKF-SLAM is probably the most common slam method and it is often straightforward to implement, as described in e.g., Durrant-Whyte and Bailey (2006); Smith et al. (1990). For a thorough treatment, the book by Thrun et al. (2005) serves as a standard reference. In feature based slam, coordinates in the global frame are explicitly represented as landmarks, $l$, which are part of the state vector. The standard assumption is that the landmarks are stationary, but alternatives exist; in Bibby and Reid (2007) a model structure allowing dynamic objects is considered. Some feature characteristics of the landmarks have to be detected and associated to the measurements. This is known as the data association problem in slam, and it depends on the sensors and the environment, among others. For a camera sensor, the data association is the 2D/3D correspondence between detected features in images and landmarks in the landmark map. We solve this using the Scale-Invariant Feature Transform (sift), Lowe (1999), see Section 4.1 in Paper A.

In our application the camera and imu deliver data at different rates. This should be distinguished in the equations as well as in the implementation. Denote the time when camera measurements arrive by $t_k$; then the dynamic, the landmark and the measurement models are given by
$$ x_t = f(x_{t-1}, u_t) + B_w w_t, \qquad (4.1a) $$
$$ l_t = l_{t-1}, \qquad (4.1b) $$
$$ y_{t_k} = h(x_{t_k}, l_{t_k}) + e_{t_k}, \qquad (4.1c) $$
which is also found as (1) in Paper A. The ekf given in Algorithm 1 applies to (4.1) with just a few modifications.

4.1.1 Prediction

The prediction step in EKF-SLAM is given by
$$ \hat{x}_{t|t-1} = f(\hat{x}_{t-1|t-1}, u_t), \qquad (4.2a) $$
$$ \hat{l}_{t|t-1} = \hat{l}_{t-1|t-1}, \qquad (4.2b) $$
$$ P_{t|t-1}^{xx} = F_t P_{t-1|t-1}^{xx} F_t^T + B_w(\hat{x}_{t-1|t-1})\,Q\,B_w^T(\hat{x}_{t-1|t-1}), \qquad (4.2c) $$
where
$$ F_t \triangleq \frac{\partial f(x_{t-1}, u_t)}{\partial x_{t-1}}\bigg|_{(x_{t-1},u_t)=(\hat{x}_{t-1|t-1},u_t)}, \qquad (4.3) $$
and
$$ Q = \begin{bmatrix} Q_a & 0 \\ 0 & Q_\omega \end{bmatrix}. \qquad (4.4a) $$
Note that only the vehicle state covariance is updated. The full covariance matrix is
$$ P_t = \begin{bmatrix} P_t^{xx} & P_t^{xl} \\ P_t^{lx} & P_t^{ll} \end{bmatrix}. \qquad (4.5) $$
Also, when new landmarks are initialised they are appended to the state vector.
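A minimal sketch of the block structure in (4.2) and (4.5): the vehicle states are propagated, the landmark estimates are copied, and, as written in (4.2c), only the P^xx block of the covariance is changed; the model functions and dimensions are placeholders.

```python
import numpy as np

def ekf_slam_predict(x, l, P, u, f, F_fn, Bw_fn, Q):
    """EKF-SLAM prediction (4.2): landmarks and the remaining covariance blocks are kept."""
    nx = x.size
    x_pred = f(x, u)                                   # (4.2a)
    l_pred = l.copy()                                  # (4.2b)
    F = F_fn(x, u)
    Bw = Bw_fn(x)
    P_pred = P.copy()
    P_pred[:nx, :nx] = F @ P[:nx, :nx] @ F.T + Bw @ Q @ Bw.T   # (4.2c), P^xx block only
    return x_pred, l_pred, P_pred

# Toy dimensions: 3 vehicle states, 2 landmark coordinates, scalar process noise.
f = lambda x, u: x + u
F_fn = lambda x, u: np.eye(3)
Bw_fn = lambda x: np.ones((3, 1))
x, l = np.zeros(3), np.array([1.0, 2.0])
P = np.eye(5)
x, l, P = ekf_slam_predict(x, l, P, u=np.full(3, 0.1), f=f, F_fn=F_fn, Bw_fn=Bw_fn,
                           Q=np.array([[0.01]]))
print(P[:3, :3])
```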

4.1.2 Measurement Update

Assuming the data association is given, the measurement update for ekf-slam is given by
$$ K_t = P_{t|t-1}H_t^T\big(H_t P_{t|t-1}H_t^T + R_t\big)^{-1}, \qquad (4.6a) $$
$$ \begin{bmatrix} \hat{x}_{t|t} \\ \hat{l}_{t|t} \end{bmatrix} = \begin{bmatrix} \hat{x}_{t|t-1} \\ \hat{l}_{t|t-1} \end{bmatrix} + K_t\big(y_t - h(\hat{x}_{t|t-1}, \hat{l}_{t|t-1})\big), \qquad (4.6b) $$
$$ P_{t|t} = P_{t|t-1} - K_t H_t P_{t|t-1}, \qquad (4.6c) $$
where
$$ H_t \triangleq \begin{bmatrix} \dfrac{\partial h(x_t, l_t)}{\partial x_t} & \dfrac{\partial h(x_t, l_t)}{\partial l_t} \end{bmatrix}\bigg|_{(x_t,l_t)=(\hat{x}_{t|t-1},\hat{l}_{t|t-1})}. \qquad (4.7) $$
When the unit quaternion is used for rotation parametrisation in an EKF it has to be normalised,
$$ q_{t|t} := \frac{q_{t|t}}{\|q_{t|t}\|}, \qquad (4.8) $$
preferably after each measurement update. An alternative is to introduce the quaternion's deviation from unity as a fictive measurement.

4.2 Nonlinear Least-Squares and slam

The basic concept of nonlinear least-squares for slam is to acquire an initial estimate, linearise the models around the estimate and minimise all the measurement errors and the state trajectory errors in a least-squares sense.

ekf-slam (Section 4.1) provides an initial trajectory, $x_{0:N}^0$, and an initial landmark estimate, $l_N^0$. The linearised dynamics around $x_t^0$ at time $t$ is then
$$ x_t^0 + \delta x_t = f(x_{t-1}^0, u_t) + F_t\,\delta x_{t-1} + \underbrace{B_w w_t}_{\tilde{w}_t}, \qquad (4.9) $$
where $\delta x_t$ is a small deviation from $x_t^0$ and $F_t$ is the derivative of $f(x_{t-1}, u_t)$ with respect to $x_{t-1}$ evaluated in $x_{t-1}^0$. The linearised measurements around $l_N^0$ and $x_{t_k}^0$ for the measurements at time $t_k$ are
$$ y_{t_k} = h(x_{t_k}^0, l_N^0) + H_{t_k}\,\delta x_{t_k} + J_{t_k}\,\delta l_N + e_{t_k}, \qquad (4.10) $$
where $\delta l_N$ is a small deviation from $l_N^0$, $H_{t_k}$ is the derivative of $h(x_{t_k}, l_N)$ with respect to $x_{t_k}$ evaluated at $x_{t_k}^0$ and $l_N^0$, and $J_{t_k}$ is the derivative of $h(x_{t_k}, l_N)$ with respect to $l_N$ evaluated at $x_{t_k}^0$ and $l_N^0$.

By rearranging the terms in (4.9) and (4.10) we can formulate the problem of finding the trajectory and map deviations that minimise the noise variances as
$$ \min_{\delta x_t,\,\delta l_N}\ \sum_{t=1}^{N}\|\tilde{w}_t(\delta x_t)\|_{\tilde{Q}_t^{-1}}^2 + \sum_{k=1}^{K}\|e_{t_k}(\delta x_{t_k}, \delta l_N)\|_{R_{t_k}^{-1}}^2, \qquad (4.11) $$
where $\tilde{Q}_t = B_w Q_t B_w^T$ and $\|x\|_{P^{-1}}^2 = x^T P^{-1} x$. This is in principle a (weighted) nonlinear least-squares problem which gives an increment for the parameters, $\theta = [x_{0:N}^T\ l_N^T]^T$, and the Gauss-Newton algorithm (Algorithm 2) applies.
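One way to evaluate the cost (4.11) is to whiten and stack all residuals into a single vector, after which the Gauss-Newton sketch in Section 3.3.2 applies. The following minimal sketch assumes an invertible weighting matrix (the singular case is discussed in the remarks below) and uses placeholder residuals and covariances.

```python
import numpy as np

def whiten(r, Sigma):
    """Return L^{-1} r with Sigma = L L^T, so that ||L^{-1} r||^2 = ||r||^2_{Sigma^{-1}}."""
    L = np.linalg.cholesky(Sigma)
    return np.linalg.solve(L, r)

def stacked_residual(dynamics_residuals, measurement_residuals, Q_tilde, R):
    """Stack whitened dynamic and measurement residuals into one vector,
    so that its squared norm equals the cost in (4.11)."""
    parts = [whiten(r, Q_tilde) for r in dynamics_residuals]
    parts += [whiten(r, R) for r in measurement_residuals]
    return np.concatenate(parts)

# Toy example with two dynamic residuals and one camera residual.
Q_tilde = 0.01 * np.eye(3)        # placeholder, standing in for an invertible B_w Q_t B_w^T
R = 4.0 * np.eye(2)               # pixel noise covariance, placeholder
dyn = [np.array([0.01, -0.02, 0.0]), np.array([0.0, 0.01, 0.03])]
meas = [np.array([1.5, -0.7])]
eps = stacked_residual(dyn, meas, Q_tilde, R)
print(eps @ eps)                  # the weighted cost in (4.11) for these residuals
```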

4.2.1 Remarks

There are, at least, two problems associated to (4.11) which are briefly discussed below.

Process Noise

The process noise covariance, $\tilde{Q}_t$, is singular, see (2.27) on page 13. This is because the velocity is affected by the same noise as the position and the quaternion only has 3 dof but 4 states. It turns out that a solution does not have to be consistent with the dynamic model. A solution to this is to add the linearised dynamic model as a constraint to the problem (4.11), which results in a constrained quadratic program.

Quaternion

The unit quaternion is constrained to the unit sphere, $q^T q = 1$, which is a non-convex constraint that also should be handled. A suggestion is to use a local 3-parameter approximation of the unit quaternion, Triggs et al. (2000); Kümmerle et al. (2011).

5 Optical See-Through Head Mounted Display Calibration

The Optical See-Through Head Mounted Display (osthmd) calibration process is introduced by giving an overview of the ar system and its geometry, explaining the measurement procedure and the Direct Linear Transformation used for calibration. This chapter also explains some parts not covered in Paper B. Figure 5.1 shows the Kaiser ProView 50ST osthmd which was used in the experiments, Axholt et al. (2011). The red lights are tracker markers which are measured by the tracker system.

5.1 System Overview

The ar system consists of an osthmd and a motion tracking system briefly described below.

5.1.1 Head Mounted Display

In a pinhole camera model the light passes through a lens system and then hits the camera sensor. In an osthmd there is no camera sensor, yet an optical combiner serves the same purpose when used in a pinhole camera model. A simplified illustration of how the graphics is created in the osthmd can be seen in Figure 5.2. The plane-to-plane homography of the pixels from the beamer screen, through a collimator and onto the ost semitransparent screen (optical combiner), is one of the paths light travels before reaching the eye, and this is how the virtual overlay is created onto the scene. The other path of light comes from the scene and does not pass through the beamer or the collimator. Note that the pixels are distorted by the relative rotation of the planes involved.

Figure 5.1: osthmd, the red lights are used by the optical tracker system consisting of several cameras.

5.1.2 Tracker

The optical tracker system, a PhaseSpace IMPULSE motion capture system, measures the positions of tracker markers (red diodes) and provides 6 dof pose at a maximum of 200 Hz, but due to software limitations the calibration application only runs at 60 Hz. The tracker is used in two ways. Firstly, 11 tracker markers are attached to the hmd, defining a rigid body for which the tracker provides 6 dof pose. Secondly, a single tracker marker is mounted on a tripod and its position is also reported by the tracker. This single tracker marker is called the background marker.

5.2 Calibration

Traditional camera calibration considering electro optical cameras is a well studied problem with many effective solutions and freely available software providing very accurate calibration, almost automatically, see e.g., Bouguet (2010) or OpenCV. In osthmd calibration, good results are difficult to obtain for various reasons: for example, the equipment is fairly non-standard and requires special solutions, data acquisition is time consuming and errors are easily introduced (the screen shifts about 30-40 pixels by just frowning!).

Figure 5.2: Projection from an image on the beamer, b, to the screen s onto the image plane, i. The beamer is rotated around the x^b axis while the screen is rotated around its x^s and y^s, resulting in a distorted image. This is because projective transformations do not necessarily preserve angles. The distance from the c frame to the s frame is the focal length, f. (a) 3D illustration of how the graphics from the beamer screen is displayed on the virtual image i as the reflected light falling on the semitransparent screen s. The black dot represents the beamer light source. The c coordinate frame represents the user's eye, which corresponds to the camera centre in the pinhole camera model. (b) 2D view of the beamer screen (left), ost display (middle) and the image plane (right).

The osthmd in Figure 5.1 is quite a complex device and with few calibration data we are forced to make approximations reducing the parameter space. With Figure 5.2 in mind, some assumptions motivating the pinhole camera model as an approximate model structure for the osthmd are:

5.1 Assumption. The rotation between the frames b and s is small, i.e., $R^{bs} \approx I$, hence, the shape of the pixels projected onto the screen is rectangular.

5.2 Assumption. The pixel plane is perpendicular to the optical axis. Without this assumption another rotation has to be estimated, Robinett and Rolland (1991).

5.3 Assumption. The pixels, as perceived by the user, are points. This is true for most users with uncorrected vision, because the VGA resolution (640 × 480 pixels) of the ost display and the distance of the optical combiner to the user's eye make the pixel shape almost indistinguishable.

5.2.1 Tracker Measurements

The aim of the measurement procedure is to gather N correspondences between the background marker x^w and screen pixel coordinates x^p; we denote this by x_i^p ↔ x_i^w, i = 1, ..., N. This is done manually in a so-called boresighting exercise where the user aligns the background marker with a crosshair displayed on the ost. This is repeated N times as described in Section 3.5 in Paper B. The tracker provides the positions of the tracker markers referenced in the tracker frame t. The markers attached to the hmd define a rigid body with its origin in one of the markers, for which the tracker reports the full 6 dof pose. To simplify calculations the hmd is used as the base frame w and it is considered to be inertial. Since the background marker is also measured by the tracker it can easily be referenced in the hmd frame. Let T^{wt} denote the homogeneous representation of the Euclidean transformation from the t frame to the w frame; then the background marker x^t is referenced in the w frame as

x^w = T^{wt} x^t,    (5.1)

where the vectors are homogeneous. This is a great advantage since we would otherwise need to know the exact location of the background marker in a global frame with respect to the tracker frame. Furthermore, this is essentially the same idea as in the Single Point Active Alignment Method (SPAAM) of Tuceryan and Navab (2000), where the background marker is the approximate tracker origin. Using (5.1) and the pinhole camera model (2.35) on page 16, the screen coordinates are related to the background marker as

λ x^p = P x^w,    (5.2)

where λ ∈ R accounts for the arbitrary scale.
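To make the chain from tracker measurement to screen coordinate concrete, the following sketch applies (5.1) and (5.2) to a single background-marker reading. It is an illustration only; the function name and the NumPy representation of T^{wt}, P and x^t are our own assumptions and not part of the calibration software.

```python
import numpy as np

def project_background_marker(T_wt, P, x_t):
    """Reference the background marker in the w-frame, (5.1), and project it
    onto the ost screen with the pinhole model, (5.2).

    T_wt : (4, 4) homogeneous transform from the tracker frame t to the hmd frame w
    P    : (3, 4) pinhole camera matrix
    x_t  : (3,)   background-marker position reported in the tracker frame
    """
    x_w = T_wt @ np.append(x_t, 1.0)   # (5.1), homogeneous coordinates
    lam_x_p = P @ x_w                  # (5.2), known only up to the scale lambda
    return lam_x_p[:2] / lam_x_p[2]    # pixel coordinates after removing the scale
```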

5.2.2 Direct Linear Transformation

The Direct Linear Transformation (DLT) (Abdel-Aziz and Karara, 1971) is a parameter estimation method which takes N point correspondences x_i^p ↔ x_i^w, i = 1, ..., N, given by

λ_i x_i^p = P x_i^w,    (5.3)

where all λ_i are unknown scale factors. The sought parameters are the entries in the 3 × 4 camera matrix P. In our setup, P defines the pinhole camera model as in (2.35), where x_i^p are pixel coordinates and x_i^w coordinates in the w-frame. The unknown scale factors λ_i are removed by applying the cross product with x_i^p,

\lambda_i \underbrace{x_i^p \times x_i^p}_{=0} = x_i^p \times P x_i^w,    (5.4)

which can be rearranged into

\begin{bmatrix} 0^T & -x_i^{wT} & y_i^p x_i^{wT} \\ x_i^{wT} & 0^T & -x_i^p x_i^{wT} \\ -y_i^p x_i^{wT} & x_i^p x_i^{wT} & 0^T \end{bmatrix} \begin{bmatrix} p^1 \\ p^2 \\ p^3 \end{bmatrix} = 0,    (5.5)

where p^i, i = 1, 2, 3, are the transposed row vectors of P, i.e., the stacked vector is the vectorisation of P. There are only 2 linearly independent equations in (5.5), so it is possible to leave one out. Since there will be measurement errors, in either x^p or x^w, or both, the right-hand side of (5.5) will not be exactly zero. It is therefore preferable to collect more than the six correspondence points which is the minimum required to solve (5.5) for P. Taking the N correspondences and stacking the entries in a tall 3N × 12 matrix A we obtain

A p = ε,    (5.6)

where p = [p^{1T} p^{2T} p^{3T}]^T and ε are the residual errors associated with the correspondences. The residual errors can be minimised by the optimisation problem

min_p ||A p||   subject to   ||p|| = 1,    (5.7)

which has a non-degenerate solution if the correspondence point configuration is general, see Section 2.3 in Paper B. The camera matrix has only 11 dof since the scale is unknown; with noise-free correspondences A therefore has only 11 nonzero singular values. The constraint ||p|| = 1 is needed since otherwise the solution would be p = 0. The unit singular vector associated with the smallest singular value of A gives the minimising argument p̂, and hence the camera matrix P̂ is obtained. For a more complete discussion on DLT, see Hartley and Zisserman (2004); Sutherland (1974).
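As a sketch of how (5.5)-(5.7) translate into code, the snippet below builds the two linearly independent rows per correspondence, stacks them into A, and takes the right singular vector associated with the smallest singular value. The function name and the NumPy-based implementation are our own choices and are not taken from the thesis or Paper B.

```python
import numpy as np

def dlt_camera_matrix(x_p, x_w):
    """DLT estimate of the 3x4 camera matrix P from N >= 6 correspondences.

    x_p : (N, 2) pixel coordinates x_i^p
    x_w : (N, 3) background-marker positions x_i^w
    """
    rows = []
    for (u, v), X in zip(x_p, x_w):
        Xh = np.append(X, 1.0)                 # homogeneous w-frame point
        zero = np.zeros(4)
        # two linearly independent rows of (5.5) for this correspondence
        rows.append(np.hstack([zero, -Xh, v * Xh]))
        rows.append(np.hstack([Xh, zero, -u * Xh]))
    A = np.vstack(rows)                        # stacked system, cf. (5.6)
    # min ||Ap|| subject to ||p|| = 1: right singular vector of the
    # smallest singular value of A, cf. (5.7)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```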

5.2.3 Decomposition

The 3 × 4 camera matrix P can be decomposed into the extrinsic parameters, R^{cw} and c^w, and the upper-triangular intrinsic parameter matrix K. These parameters are needed for analysis but are also required to assemble the OpenGL matrices for display configuration of the ar graphics in VR Juggler. The decomposition is obtained from (2.36) as

P = K \begin{bmatrix} R^{cw} & -R^{cw} c^w \end{bmatrix} = K R^{cw} \begin{bmatrix} I & -c^w \end{bmatrix}.    (5.8)

Since R^{cw} is orthogonal (R^{cw} R^{cwT} = I) and K is upper triangular, the matrices can be found from the RQ-decomposition of the left 3 × 3 sub-matrix M of P. The ambiguity in the decomposition can be removed by requiring that K has positive diagonal entries (Hartley and Zisserman, 2004). Also, K is normalised by its last entry,

K := K / K_{33}.    (5.9)

The camera centre (eye point location) is in the right null space of P,

P \begin{bmatrix} c^w \\ 1 \end{bmatrix} = 0,    (5.10)

and c^w can be recovered as

c^w = -M^{\dagger} p_4,    (5.11)

where p_4 is the last column of P and † denotes the pseudo-inverse.
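A possible implementation of this decomposition is sketched below, assuming SciPy's RQ-decomposition is available; the function name and the sign-fixing details are our own choices and only one way of resolving the ambiguity mentioned above.

```python
import numpy as np
from scipy.linalg import rq

def decompose_camera_matrix(P):
    """Split P into intrinsics K, rotation R^{cw} and centre c^w, cf. (5.8)-(5.11)."""
    M = P[:, :3]
    K, R = rq(M)                        # RQ-decomposition of the left 3x3 sub-matrix
    S = np.diag(np.sign(np.diag(K)))    # enforce a positive diagonal of K
    K, R = K @ S, S @ R                 # S @ S = I, so the product K R is unchanged
    K = K / K[2, 2]                     # normalisation by the last entry, (5.9)
    c = -np.linalg.pinv(M) @ P[:, 3]    # camera centre (eye point), cf. (5.11)
    return K, R, c
```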

5.2.4 Remarks

The DLT minimises the error ||Ap||, which is called the algebraic error and is usually difficult to analyse geometrically. Actually, we would like to maximise the probability of the parameters given the correspondences, p(P; x_i^p, x_i^w). This is more easily done using Maximum Likelihood, p(x_i^p, x_i^w; P), where error measures have to be derived. These methods are called Gold Standard in Hartley and Zisserman (2004) and usually result in a nonlinear least-squares problem which is initialised with an estimate from, e.g., DLT. Under the assumption of zero-mean independent and identically distributed Gaussian noise in the pixels,

\min_P \sum_{i=1}^{N} \| x_i^p - P x_i^w \|^2,    (5.12)

is the Maximum Likelihood estimator for P. With the experimental data in Paper B, minimisation of (5.12) using Levenberg-Marquardt gives no further improvement. We believe this is because of a poor initial estimate. A more reasonable assumption, given our setup, is that there are errors in the background marker coordinates, x_i^w, as well. This is because the background marker is measured, hence contains errors, and these measurements are referenced by T^{wt} in (5.1), which is also uncertain.
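Although such a refinement gave no improvement on our data, a sketch of this Gold Standard style step using SciPy's Levenberg-Marquardt solver could look as follows. The residual is the pixel reprojection error, i.e., the projected point is normalised by its third homogeneous coordinate before comparison, and an extra residual fixes the free overall scale of P. The function name and these implementation details are our own assumptions, not taken from Paper B.

```python
import numpy as np
from scipy.optimize import least_squares

def refine_camera_matrix(P0, x_p, x_w):
    """Refine an initial estimate (e.g. from the DLT) by minimising (5.12).

    P0  : (3, 4) initial camera matrix
    x_p : (N, 2) pixel coordinates
    x_w : (N, 3) background-marker positions in the w-frame
    """
    Xh = np.hstack([x_w, np.ones((len(x_w), 1))])    # homogeneous w-frame points

    def residuals(p):
        P = p.reshape(3, 4)
        proj = Xh @ P.T                              # projected points, up to scale
        err = proj[:, :2] / proj[:, 2:3] - x_p       # reprojection error in pixels
        # extra residual removes the free overall scale of P
        return np.append(err.ravel(), np.linalg.norm(p) - 1.0)

    p0 = P0.ravel() / np.linalg.norm(P0)
    sol = least_squares(residuals, p0, method="lm")  # Levenberg-Marquardt
    return sol.x.reshape(3, 4)
```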

Care should be taken when interpreting the camera parameters obtained from the decomposition (Section 5.2.3) as Euclidean quantities. Even if the lens system in the osthmd is ideal, there is no guarantee that the ost display is perpendicular to the optical axis.

6 Concluding Remarks

6.1 Conclusions

The work in this thesis has dealt with two estimation problems that are related by the pinhole camera model and inertial sensing. We have shown how measurements from inertial and camera sensors can be used in a compact model. The aim was to provide improved estimates of the navigation states and a landmark map. This was done by treating the filtered estimates from ekf-slam as an initial solution to a nonlinear least-squares problem which was solved using the Gauss-Newton method. The approach was evaluated on experimental data and the results are encouraging.

We have presented a method for calibration of an Optical See-Through Head Mounted Display system (osthmd) which is modelled as a pinhole camera. We have shown and motivated why the pinhole camera model can be used to represent the osthmd and the user's eye position. The camera model was estimated using the Direct Linear Transformation algorithm. Results are presented in Paper B, which also compares different data acquisition methods.

6.2 Future Work

Some suggestions for future research related to this thesis are:

• Vision Inertial Navigation and Mapping
  – Utilise the inherent sparseness in the Jacobian of the full slam problem.

  – How well does the nonlinear least-squares approach perform compared to other optimisation-based methods such as iSAM (Kaess et al., 2008) and TORO (Grisetti et al., 2007)?
  – How can inertial sensors and dynamic models be used in Bundle Adjustment (Triggs et al., 2000; Hartley and Zisserman, 2004)?
  – Extend the relative pose calibration framework of Hol et al. (2010); Hol (2008) by adding a slam solution.

• osthmd Calibration
  – Examine Cramér-Rao bounds for the parameters in the pinhole camera model.
  – In the longer run, display systems should be supported by eye-tracking devices, which would reduce the need for manual calibration.

Bibliography

Y. Abdel-Aziz and H. Karara. Direct linear transformation from comparator to object space coordinates in close-range photogrammetry. In Proceedings of the American Society of Photogrammetry Symposium on Close-Range Photogrammetry, pages 1-18, Falls Church, VA, USA, 1971.
M. Axholt, M. Skoglund, S. D. Peterson, M. D. Cooper, T. B. Schön, F. Gustafsson, A. Ynnerman, and S. R. Ellis. Optical See-Through Head Mounted Display Direct Linear Transformation Calibration Robustness in the Presence of User Alignment Noise. In Proceedings of the 54th Annual Meeting of the Human Factors and Ergonomics Society, volume 54, pages 2427-2431, San Francisco, CA, USA, September-October 2010. Human Factors and Ergonomics Society.
M. Axholt, M. A. Skoglund, S. D. O'Connell, M. D. Cooper, S. R. Ellis, and A. Ynnerman. Parameter Estimation Variance of the Single Point Active Alignment Method in Optical See-Through Head Mounted Display Calibration. In Proceedings of the IEEE Virtual Reality Conference, pages 27-34, Singapore, Republic of Singapore, March 2011.
C. Bibby and I. Reid. Simultaneous Localisation and Mapping in Dynamic Environments (SLAMIDE) with Reversible Data Association. In Proceedings of Robotics: Science and Systems, Atlanta, GA, USA, June 2007.
J.-Y. Bouguet. www.vision.caltech.edu/bouguetj/calib_doc/, 2010.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. URL http://www.stanford.edu/~boyd/bv_cvxbook.pdf.
M. Bryson and S. Sukkarieh. Architectures for Cooperative Airborne Simultaneous Localisation and Mapping. Journal of Intelligent and Robotic Systems, 55(4-5):267-297, 2009. doi: 10.1007/s10846-008-9303-9.
F. Caballero, L. Merino, J. Ferruz, and A. Ollero. Vision-based odometry and SLAM for medium and high altitude flying UAVs. Journal of Intelligent and Robotic Systems, 54(1-3):137-161, 2009. ISSN 0921-0296. doi: 10.1007/s10846-008-9257-y.
J. Callmer. Topics in Localization and Mapping. Licentiate thesis no. 1489, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden, May 2011.
J. Callmer, M. Skoglund, and F. Gustafsson. Silent Localization of Underwater Sensors Using Magnetometers. EURASIP Journal on Advances in Signal Processing, 2010, 2010. doi: 10.1155/2010/709318. Article ID 709318.
A. J. Davison. Real-time simultaneous localisation and mapping with a single camera. In Proceedings of the 9th IEEE International Conference on Computer Vision, pages 1403-1410, 2003.
A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse. MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1052-1067, 2007.
H. Durrant-Whyte and T. Bailey. Simultaneous Localization and Mapping: Part I. IEEE Robotics & Automation Magazine, 13(12):99-110, June 2006.
E. Eade. Monocular Simultaneous Localisation and Mapping. PhD thesis, Cambridge University, 2008.
R. M. Eustice, H. Singh, J. J. Leonard, and M. R. Walter. Visually Mapping the RMS Titanic: Conservative Covariance Estimates for SLAM Information Filters. International Journal of Robotics Research, 25(12):1223-1242, December 2006.
N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings Radar and Signal Processing, 140(2):107-113, 1993. doi: 10.1049/ip-f-2.1993.0015.
G. Grisetti, C. Stachniss, S. Grzonka, and W. Burgard. A tree parameterization for efficiently computing maximum likelihood maps using gradient descent. In Proceedings of Robotics: Science and Systems (RSS), Atlanta, GA, USA, 2007.
F. Gustafsson. Statistical Sensor Fusion. Studentlitteratur, 2010. ISBN 978-91-44-05489-6.
Sir W. R. Hamilton. On quaternions; or on a new system of imaginaries in algebra. Philosophical Magazine, xxv:10-13, July 1844.
R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004. ISBN 0-521-54051-8.
J. Hol. Pose Estimation and Calibration Algorithms for Vision and Inertial Sensors. Licentiate thesis no. 1370, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden, May 2008.
J. Hol, T. B. Schön, and F. Gustafsson. Modeling and Calibration of Inertial and Vision Sensors. The International Journal of Robotics Research, 29(2):231-244, February 2010.
S. J. Julier and J. K. Uhlmann. A New Extension of the Kalman Filter to Nonlinear Systems. In Proceedings of AeroSense: The 11th International Symposium on Aerospace/Defense Sensing, Simulation and Controls, SPIE proceedings series, pages 182-193, Orlando, FL, USA, April 1997.
M. Kaess, A. Ranganathan, and F. Dellaert. iSAM: Incremental Smoothing and Mapping. IEEE Transactions on Robotics, 24(6):1365-1378, December 2008.
T. Kailath, A. H. Sayed, and B. Hassibi. Linear Estimation. Prentice-Hall, Upper Saddle River, New Jersey, 2000.
R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME-Journal of Basic Engineering, 82(Series D):35-45, 1960.
A. Karlsson and J. Bjärkefur. Simultaneous Localisation and Mapping of Indoor Environments Using a Stereo Camera and a Laser Camera. Master's thesis, Department of Electrical Engineering, Linköping University, Linköping, Sweden, 2010.
R. Karlsson, T. B. Schön, D. Törnqvist, G. Conte, and F. Gustafsson. Utilizing Model Structure for Efficient Simultaneous Localization and Mapping for a UAV Application. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, March 2008.
G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR), pages 225-234, Nara, Japan, 2007.
J. B. Kuipers. Quaternions and Rotation Sequences. Princeton University Press, Princeton, New Jersey, 2002. ISBN 0-691-10298-8.
R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard. g2o: A General Framework for Graph Optimization. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 2011.
L. Ljung. System Identification: Theory for the User. Prentice-Hall, Upper Saddle River, NJ, USA, 2nd edition, 1999. ISBN 0-13-656695-2.
D. Lowe. Object Recognition from Local Scale-Invariant Features. In Proceedings of the Seventh International Conference on Computer Vision (ICCV'99), pages 1150-1157, Corfu, Greece, 1999.
T. Lupton and S. Sukkarieh. Removing scale biases and ambiguity from 6DoF monocular SLAM using inertial. In Proceedings of the International Conference on Robotics and Automation (ICRA), pages 3698-3703, Pasadena, California, USA, 2008. IEEE. doi: 10.1109/ROBOT.2008.4543778.
T. Lupton and S. Sukkarieh. Efficient integration of inertial observations into visual SLAM without initialization. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), pages 1547-1552. IEEE, 2009. doi: 10.1109/IROS.2009.5354267.
Y. Ma, S. Soatto, J. Kosecka, and S. S. Sastry. An Invitation to 3-D Vision: From Images to Geometric Models. Springer, 2004. ISBN 978-0-387-00893-6.
J. M. M. Montiel and A. J. Davison. A visual compass based on SLAM. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 1917-1922, Orlando, Florida, USA, May 2006. doi: 10.1109/ROBOT.2006.1641986.
Neato Robotics, 2011. [online] http://www.neatorobotics.com/, visited May 2011.
E. Nilsson. An optimization based approach to visual odometry using infrared images. Master's thesis, Department of Electrical Engineering, Linköping University, Linköping, Sweden, 2010.
E. Nilsson, C. Lundquist, T. B. Schön, D. Forslund, and J. Roll. Vehicle Motion Estimation Using an Infrared Camera. In Proceedings of the IFAC World Congress, Milan, Italy, August-September 2011. To appear.
J. Nocedal and S. Wright. Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer, 2nd edition, July 2006.
W. Robinett and J. Rolland. A Computational Model for the Stereoscopic Optics of a Head-Mounted Display. SPIE, 1457:140-160, 1991.
T. B. Schön. Estimation of Nonlinear Dynamic Systems - Theory and Applications. Linköping studies in science and technology. Dissertations. No. 998, Department of Electrical Engineering, Linköping University, Linköping, Sweden, February 2006.
M. D. Shuster. Survey of attitude representations. Journal of the Astronautical Sciences, 41(4):439-517, 1993.
Z. Sjanic, M. A. Skoglund, T. B. Schön, and F. Gustafsson. A Nonlinear Least-Squares Approach to the SLAM Problem. In Proceedings of the IFAC World Congress, Milan, Italy, August-September 2011. To appear.
M. Skoglund. Evaluating SLAM algorithms for Autonomous Helicopters. Master's thesis, Department of Electrical Engineering, Linköping University, Linköping, Sweden, 2008.
R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in robotics. In Autonomous Robot Vehicles, pages 167-193. Springer-Verlag, New York, NY, USA, 1990. ISBN 0-387-97240-4.
I. E. Sutherland. Three-Dimensional Data Input by Tablet. Proceedings of the IEEE, 62(4):453-461, 1974.
S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, September 2005. ISBN 0-262-20162-3.
D. Törnqvist. Estimation and Detection with Applications to Navigation. Linköping Studies in Science and Technology. Dissertations. No. 1216, Linköping University, November 2008.
B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. Bundle adjustment - a modern synthesis. In B. Triggs, A. Zisserman, and R. Szeliski, editors, Vision Algorithms: Theory and Practice, volume 1883 of Lecture Notes in Computer Science, pages 298-372. Springer-Verlag, 2000.
M. Tuceryan and N. Navab. Single Point Active Alignment Method (SPAAM) for Optical See-through HMD Calibration for AR. In Proceedings of the IEEE/ACM International Symposium on Augmented Reality, pages 149-158, TUM, Munich, Germany, October 2000.
Z. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330-1334, November 2000. ISSN 0162-8828. doi: 10.1109/34.888718.
