AGH UNIVERSITY OF SCIENCE AND TECHNOLOGY
FACULTY OF MECHANICAL ENGINEERING AND ROBOTICS

Master diploma thesis

Jacek Hordyj
(name)

Mechatronics (in English)
(field of study)

Kinematyka odwrotna struktur antropomorficznych dla zastosowań w systemach wizyjnych
(thesis topic)

Inverse Kinematics of anthropomorphic structures for vision systems applications
(subject of master diploma thesis)

dr hab. inż. Bogdan Kwolek
(thesis supervisor)

…………………
(grade, date, supervisor's signature)

Kraków, academic year 2014/2015

Kraków, ...........……………..

Name: Jacek Hordyj
Student ID number: 231948
Field of study: Mechatronics (in English)
Specialization: Mechatronic design

DECLARATION

Aware of the criminal liability for making false statements, I declare that I have written this master diploma thesis personally and independently, and that I have not used sources other than those cited in the thesis. I also declare that the documentation and the thesis do not infringe copyright within the meaning of the Act of 4 February 1994 on Copyright and Related Rights (Journal of Laws of 2006, No. 90, item 631, as amended) or personal rights protected under civil law. The thesis does not contain data or information obtained in an unlawful manner. The version of the documentation attached on the electronic medium is fully consistent with the printout submitted for review. I further declare that this master diploma thesis has not previously been the basis of any other official procedure for awarding university diplomas or professional titles.

……………………………….. (signature of the diploma candidate)


Kraków, …………….........

Name: Jacek Hordyj
Mailing address: ul. Mogilska 121D/46, 31-571 Kraków
Master diploma thesis topic: Kinematyka odwrotna struktur antropomorficznych dla zastosowań w systemach wizyjnych (Inverse kinematics of anthropomorphic structures for vision systems applications)
Year of completion: 2015
Student ID number: 231948
Field of study: Mechatronics (in English)
Specialization: Mechatronic design

DECLARATION

I hereby declare that, while retaining my copyright, I grant the AGH University of Science and Technology (Akademia Górniczo-Hutnicza im. S. Staszica) in Kraków a non-exclusive, royalty-free licence, unlimited in time, to use the submitted documentation of this master diploma thesis by making it publicly available and distributing it in printed and electronic form.

Kraków, ..................… ……………………………........ (date, signature of the diploma candidate)


Pursuant to the Act of 27 July 2005, Law on Higher Education (Journal of Laws 2005, No. 164, item 1365), Art. 239, and the Act of 4 February 1994 on Copyright and Related Rights (Journal of Laws of 2000, No. 80, item 904, as amended), Art. 15a: "The university, within the meaning of the regulations on higher education, has the right of first publication of a student's diploma thesis. If the university does not publish the diploma thesis within 6 months of its defence, the student who prepared it may publish it, unless the diploma thesis is part of a collective work."


Kraków, ............
AGH UNIVERSITY OF SCIENCE AND TECHNOLOGY
FACULTY OF MECHANICAL ENGINEERING AND ROBOTICS

MASTER DIPLOMA THESIS TOPIC
for a second-year full-time student

Jacek Hordyj
(student's name)

MASTER DIPLOMA THESIS TOPIC: Kinematyka odwrotna struktur antropomorficznych dla zastosowań w systemach wizyjnych (Inverse kinematics of anthropomorphic structures for vision systems applications)

Thesis supervisor: dr hab. inż. Bogdan Kwolek
Thesis reviewer: dr hab. inż. Wojciech Lisowski
Dean's signature:

Diploma internship location: Akademia Górniczo-Hutnicza in Kraków, Faculty of Computer Science, Electronics and Telecommunications

THESIS AND DIPLOMA INTERNSHIP PROGRAMME
1. Discussion of the thesis topic and the method of its realization with the supervisor.
2. Collection and study of the literature on the thesis topic.
3. Diploma internship:
   a. Introduction to the Motion Capture technique.
   b. Recording of a Motion Capture session in the IET laboratory.
   c. Analysis of the results of the Motion Capture session.
4. Collection and processing of the research results.
5. Analysis and discussion of the results and their approval by the supervisor.
6. Editorial work.

Kraków, ....................… …………………………….......... (date, signature of the diploma candidate)

DEADLINE FOR SUBMISSION TO THE DEAN'S OFFICE: 2015

(supervisor's signature)


AGH University of Science and Technology (Akademia Górniczo-Hutnicza im. Stanisława Staszica)
Faculty of Mechanical Engineering and Robotics

Kraków, .................

Field of study: Mechatronics (in English)
Diploma profile: Mechatronic design

Jacek Hordyj

Master diploma thesis
Kinematyka odwrotna struktur antropomorficznych dla zastosowań w systemach wizyjnych
(Inverse kinematics of anthropomorphic structures for vision systems applications)

Supervisor: dr hab. inż. Bogdan Kwolek

ABSTRACT

This thesis describes the development of software for real-time human pose estimation based on the inverse kinematics technique. It contains a review of modern pose estimation and inverse kinematics techniques, concluded with the selection of a method suitable for the required application. The selected algorithm with an analytical solution was implemented as a Matlab script, and its precision was verified through extensive tests using data obtained with the motion capture technique. These data were mapped to the corresponding points of an articulated body model with 16 degrees of freedom, created exclusively for the purpose of this thesis. Owing to the complex structure of motion capture data, the thesis also includes a review of the techniques used in this field and a description of the data formats typical of this method. This aspect of the work was continued by recording a motion capture sequence in one of the AGH University laboratories for a fuller verification of the software. The work on this thesis resulted in fully operational software based on Matlab scripts, capable of estimating the pose with acceptable accuracy and of generating natural-looking poses. In addition to the program's core functions, a tool for animating arbitrary motion capture sequences in the ASF/AMC format was created. A basic GUI was also developed to facilitate future work with the software.

AGH University of Science and Technology Faculty of Mechanical Engineering and Robotics

Kraków, ............

Field of Study: Mechatronics (in English)
Specialization: Mechatronic design

Jacek Hordyj

Master Diploma Thesis
Inverse Kinematics of anthropomorphic structures for vision system applications

Supervisor: dr hab. inż. Bogdan Kwolek

SUMMARY

This thesis describes the development of real-time, inverse-kinematics-based pose estimation software. It starts with a review of state-of-the-art solutions in the fields of inverse kinematics and pose estimation, concluded with the choice of an analytical inverse kinematics algorithm as the technique best suited to this application. The selected algorithm was successfully implemented in Matlab, and its accuracy was verified through extensive tests using motion capture data. These data were mapped to particular points of a 16-DOF articulated body model created for the purpose of this thesis. Additional work went into a review of motion capture techniques and data file formats, owing to their complex nature. This effort was continued by recording a motion capture session in one of the AGH University laboratories for further verification of the software. The work on this thesis resulted in fully operational Matlab-based software capable of estimating the human pose with acceptable accuracy and of generating "natural-looking" poses. In addition to the main functionality of the software, a Matlab-based tool for animating arbitrary ASF/AMC motion capture sequences was developed. A basic GUI was created to ease future work with the software.


Acknowledgments

I would like to express my enormous thanks to dr hab. inż. Bogdan Kwolek for his encouragement and useful critiques of this research work, as well as for granting access to the IET MoCap laboratory. I am particularly grateful for the assistance given by Katarzyna Czesak with capturing the MoCap sequence. I would like to thank the CMU Graphics Laboratory for sharing and distributing MoCap datasets, thus enabling the verification of the methods developed in this thesis. I am also very grateful to the creators of the HDM05 database; their parser for importing MoCap data into the Matlab environment was a great help. I would like to express my very great appreciation to Leonid Sigal for sharing the Image & MOCAP Synchronized Dataset 1.0; testing my method against his state-of-the-art solutions was a priceless asset. Finally, I would like to thank my friends and family for supporting my work on this thesis and for all their encouragement throughout my studies.


TABLE OF CONTENTS 1. Introduction............................................................................10 1.1. Literature review............................................................................10 1.2. Objectives........................................................................................13 1.3. Overview of chapters......................................................................14

2. Anthropomorphic structures modeling................................15 2.1. Articulated body model..................................................................15 2.1.1. Simplified articulated body model used in software...............................17

3. Review of Inverse Kinematics methods................................19 3.1. Introduction to inverse kinematics...............................................19 3.2. Desired characteristics of IK technique.......................................21 3.3. Jacobian-based methods................................................................21 3.3.1 Jacobian inverse..........................................................................................21 3.3.2 Jacobian pseudo-inverse.............................................................................23 3.3.3 Singular value decomposition....................................................................24 3.3.4 Damped Least Squares...............................................................................25 3.3.5 Summary of Jacobian-based method........................................................26

3.4 Cyclic Coordinate Descent method................................27 3.5 Newton's methods............................................................27 3.6 Analytical methods..........................................................28 3.7 Final choice of IK method...............................................29

4. Motion Capture data..............................................................30 4.1 Methods and systems.......................................................................30 4.1.1 Optical motion capture techniques............................................................31 4.1.2 Non-optical motion capture techniques.....................................................32 4.1.3 Facial motion capture.................................................................................33

4.2 Data types – file structure...............................................................34 4.2.1 The BVA/BVH file format..........................................................................34 4.2.2 The ASF/AMC file format..........................................................................37 4.2.3 The C3D file format....................................................................................39

5. Developed Pose Estimation software....................................40 5.1 Introduction.....................................................................................40 5.1.1 Input data and initial conditions................................................................40 5.1.2 Articulated body model...............................................................................41 5.1.3 Analytical Inverse Kinematics algorithm and optimization method......42 5.1.4 Programming environment........................................................................47

5.2 Software structure...........................................................47 5.2.1 Main script...................................................................................48 5.2.2 Flow of data.................................................................51

5.2.3 Implementation of analytical IK algorithm..............................................53

6. Experimental data..................................................................55 6.1 CMU Graphic Laboratory MoCap data.......................................55 6.2 Research data - C3D walking sequence.........................................56 6.3 Experimental C3D data..................................................................56 6.4 Experimental Motion Capture setup at AGH laboratory...........57

7. Results.....................................................................................58 7.1 Visual results....................................................................................58 7.2 Numerical results.............................................................................63 7.3 Conclusions......................................................................................66

8. Summary.................................................................................68 8.1 Final conclusions..............................................................................68 8.2 Future work.....................................................................................69

9. Appendix A................................................................................ 10. Appendix B................................................................................


1. Introduction

The main objective of this thesis was the development and verification of real-time pose estimation software. To achieve this goal, a combination of techniques from the fields of Inverse Kinematics (IK), Motion Capture (MoCap) and human pose optimization was employed. The solution was implemented in the Matlab programming environment. An analytical IK algorithm was adapted for the purpose of identifying the human pose and optimizing it for a "natural look". It was assumed that the input data for the algorithm is provided by image feature extraction tools working on digital images. At the verification stage, Motion Capture data was used instead to estimate the accuracy of the software. Three different MoCap datasets were used to increase the quality of the verification, including a dataset captured exclusively for the purpose of this thesis using an experimental motion capture setup at AGH University. This effort resulted in real-time pose estimation software with the additional features of a MoCap data animation tool and a basic GUI. Potential applications of the software, as well as the advantages resulting from pose estimation, are discussed further in this section.

1.1. Literature review

Human pose estimation (HPE) is one of the classic problems in computer vision. For over 15 years researchers have been striving to find the optimal and final solution to this problem. Its complexity arises from the number of difficulties that need to be considered in order to obtain an accurate final result. State-of-the-art solutions combine high-quality image feature extraction algorithms, probabilistic approaches, inverse kinematics and other techniques, as described by L. Sigal, an expert in the field of HPE, in [1]. The huge interest in the field of HPE is an outcome of the number of potential applications where pose estimation could play a key role. One of the most popular is the Motion Capture technique, in which a set of markers is placed at crucial points on a special suit. An actor wearing the suit performs a sequence of motion, e.g.


a fighting sequence, while the position data of the markers is recorded at specific points in time called frames. MoCap data is widely used in the entertainment industry for the animation of virtual characters [2]. An actor wearing a suit with markers adjusted for optical motion capture, along with a virtual representation of his posture, is presented in Fig.1.1. Human pose estimation combined with MoCap data allows for the interpolation of missing marker data and the creation of custom human poses; these tasks are partially solved by pose estimation techniques, making the whole process significantly faster and more user-friendly. Moreover, accurate HPE techniques combined with a stereo-vision camera system can be used for so-called markerless Motion Capture, where all motion data is extracted without the use of any visible markers on the actor's clothing. An example of such a solution was presented by B. Kwolek et al. in [3]. Conversely, MoCap data constitutes a perfect database for testing human pose estimation techniques. Another popular use of HPE is the Microsoft Kinect sensor working with the Xbox game console. It uses a so-called structured infrared light pattern, projected onto the nearby environment, to detect the depth of the surroundings and the player's motion. The detected pose is compared with poses from a large embedded pose database in order to find the best match [4].

Fig.1.1 Actor wearing a suit adjusted for optical motion capture (left) and the virtual model obtained using marker position data (right). [5]

In this way the sensor provides information on the player's current actions, e.g. jumping, crouching or throwing a punch. The process of pose estimation used by the Microsoft Kinect is presented in Fig.1.2. Information about the user's pose and activity creates possibilities for various human-computer interactions. An example of such an interaction could be an automated camera surveillance system detecting hazardous actions of pedestrians.

Fig.1.2 Process of pose estimation used by the Microsoft Kinect, using regression voting and highest-confidence prediction for joint locations. Even occluded limbs can be detected. [4]

An increasing interest in using pose estimation techniques for a variety of medical applications can be observed nowadays. The leading branch of medical pose estimation applications is gait analysis, which examines patterns of proper and pathological walking and movement sequences, as discussed in [6]. An interesting method of gait recognition using markerless 3D motion capture was presented in [7] by B. Kwolek et al., allowing for the identification of a person based on the gait signature, which is unique to every person, thus providing a long-range person identification technique. Researchers Š. Obdržálek, G. Kurilo et al. successfully used optical stereo-vision pose estimation for tele-rehabilitation in virtual reality. This approach allowed for the remote cooperation of a geographically dislocated patient and therapist [8]. Alternatively, the possibilities of using the Microsoft Kinect sensor for rehabilitation purposes were studied in [9]. Biomechanics is another field where HPE has found application. A novel activity recognition system was introduced by A. Reiss et al. in [10]. The system uses pose estimation combined with wearable IMU (Inertial Measurement Unit) sensors placed on the patient's clothing. The sensors gather information about the user's activities and process it to monitor and inform the user about his activity level. A similar approach was used in [11] for the construction of a motion capture suit with embedded IMU sensors. This resolved most of the problems related to optical pose estimation, such as occlusion and variance in lighting.


It is clear that there is a strong need for innovative human pose estimation applications, as well as room for improvement and experimentation with already existing solutions. The motivation for choosing and realizing this subject is my interest in vision system applications and my prior knowledge of inverse kinematics techniques. Moreover, I see great potential in broadly understood motion capture techniques combined with human pose estimation.

1.2. Objectives

The objective of this thesis was to develop an inverse kinematics technique for the real-time pose estimation task that would allow for faster convergence to the exact solution of the computer vision pose estimation software under development. The main function of the algorithm is to estimate the position of the intermediate joint of a simplified model of the human limb, rather than its end effector. Information about the base and the end of the limb kinematic chain is assumed to be known, provided as input data by the image feature extraction algorithms of the CV software. In order to accomplish this objective, effective inverse kinematics methods were reviewed to find a solution well suited to this task. The optimal solution was implemented in the Matlab environment. Moreover, a study of Motion Capture data and its types was conducted as part of the thesis. Eventually, a variety of MoCap data, including data acquired at the AGH Faculty of IET laboratory exclusively for the purpose of this thesis, was imported into the Matlab environment for detailed testing of the algorithm. An animation tool for MoCap data was developed as an extension of the pose estimation algorithm to enhance the quality of the verification process. Additionally, a GUI was developed to ease the work with the software for future users and researchers.


1.3. Overview of chapters

Anthropomorphic structures modeling – This chapter briefly discusses the process of creating a mathematical model of anthropomorphic structures with the use of articulated structures, and of IK techniques to control the motion of the developed model. It concludes with the presentation of the model used as an element of the software.

Review of Inverse Kinematics methods – This chapter is devoted to a discussion of existing IK algorithms and their evaluation in terms of usefulness as part of the software developed within the scope of this thesis. Similarly to the previous chapter, it concludes with a justified choice of the IK technique used as the next element of the software.

Motion Capture data – An overall discussion of Motion Capture and the types of MoCap data is presented in this chapter. The relation between hierarchical skeletal structures and MoCap data is explained.

Developed pose estimation software – This chapter presents a detailed description of the developed software. It includes the initial requirements, the articulated model of the human body, a detailed explanation of the chosen IK technique and its implementation, the general structure of the software, and a description of the data flow within it.

Experimental data – This chapter discusses the various data used for the verification of the software, considering the different file formats of motion capture data. It also contains a description of the experimental setup used to capture a motion sequence at AGH University.

Results – The obtained visual and numerical results are presented, discussed, and confronted with the objectives of the thesis.

Summary – The closing chapter summarizes what was achieved within the scope of this thesis and discusses possibilities for future work on the subject.


2. Anthropomorphic structures modeling

2.1. Articulated body model

This section provides a brief introduction to human skeleton and joint modeling. The human musculoskeletal system consists of over 200 bones and joints, propelled by muscles, which together constitute one complex hierarchical system [12]. Each joint has a different range of motion, described by its degrees of freedom (DOF). Due to the complexity of this system, it is essential to ensure the correctness of the hierarchical structure and joint arrangement when creating its mathematical model. This chapter concludes with the presentation of the simple articulated body model used with the developed inverse kinematics algorithm. In order to define the human body as an articulated structure, we reach for the concept of a kinematic chain: a set of rigid objects called links, connected to each other by joints. A joint is the component concerned with motion; it permits some degree of relative motion between the connected segments. In the case of a hierarchical skeletal structure, the links represent the corresponding bones and thus acquire a set of additional properties such as length, shape, volume and mass. An example of such a structure is presented in Fig.2.1.

Fig.2.1 Hierarchical skeletal structure


It is a common practice to define a hierarchical structure using the parent-child formulation for kinematic chains, where any rotation or translation of a parent joint or link is inherited by its children, located further from the root of the kinematic chain. Typical assumptions for the parent-child formulation are listed below:

• All links have at most one parent and any number of children.

• All bones with no children are marked as end effectors.

• A chain can be built for each end effector by moving back through the skeleton, going from parent to parent, until the root is reached.

• In most cases the pelvis is considered to be the root of the whole skeleton.
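The assumptions above can be sketched as a small data structure. The following is a minimal illustration in Python (the thesis software itself is written in Matlab) with a hypothetical bone topology chosen only for this example:

```python
# Minimal parent-child skeleton: each bone maps to its parent (None = root).
# Bone names and topology are illustrative, not the model used in the thesis.
skeleton = {
    "pelvis": None,          # root of the whole skeleton
    "femur_l": "pelvis",
    "tibia_l": "femur_l",
    "foot_l": "tibia_l",     # no children -> end effector
    "femur_r": "pelvis",
    "tibia_r": "femur_r",
    "foot_r": "tibia_r",
}

def chain_to_root(skeleton, bone):
    """Build the kinematic chain for an end effector by walking
    from parent to parent until the root is reached."""
    chain = [bone]
    while skeleton[chain[-1]] is not None:
        chain.append(skeleton[chain[-1]])
    return chain

def end_effectors(skeleton):
    """Bones with no children are marked as end effectors."""
    parents = {p for p in skeleton.values() if p is not None}
    return [b for b in skeleton if b not in parents]
```

For instance, `chain_to_root(skeleton, "foot_l")` returns `["foot_l", "tibia_l", "femur_l", "pelvis"]`, which is exactly the chain-building rule listed above.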

The more accurate the model of the skeletal structure, the higher the probability of obtaining reasonable results and realistic-looking postures. It is therefore of great importance to carefully study the allowable ranges of motion of human joints. Infeasible joint positions should be restricted using constraints. Human joints are hard to model due to their biomechanical complexity. They are simplified and represented by kinematic pairs, which allow for relative rotation, translation, or a combination of both between two connected links. Examples of human joints and their corresponding kinematic pairs are presented in Fig.2.2.

Fig.2.2 Human joints with their allowable motion. [13]


2.1.1. Simplified articulated body model used in software

This section describes the simplified articulated body model used in the developed software. The level of complexity of this model was imposed by the objectives of the thesis. Since only the generalized posture of a human needs to be determined, the model was greatly simplified. Limbs were modeled as 4-DOF kinematic chains. Shoulder and hip joints were modeled as ball-and-socket joints with 3 DOF; this type of joint is presented in the bottom-left corner of Fig.2.2. Elbow and knee joints were modeled as hinge joints with 1 DOF; this type of joint is presented in the upper-right corner of Fig.2.2. Wrist and ankle joints were removed entirely, and the tip of the link representing the radius and tibia bones was considered to be the end effector of the limb kinematic chain. The assumed joint constraints were taken from [14]; they are explained in Fig.2.3, with the corresponding ranges of motion presented in Tab.2.1. The ranges are defined for the axis of rotation pointing out of the plane of Fig.2.3 and intersecting the discussed joint. Even this simplified articulated limb model, after considering its constraints, is redundant to a certain degree with respect to the achievable positions of the end effector. This redundancy was exploited to achieve optimized natural-looking postures; this process is explained later in this thesis.

Joint (Fig.2.3):            a.      b.      c.      d.      e.      f.      g.
Total range of motion [º]:  173.2   136.8   204     140.5   116.5   26.8    118.4

Tab.2.1 Ranges of motion of the corresponding joints from Fig.2.3


Fig.2.3 Ranges of motion assumed for the simplified articulated human model.
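In software, constraints such as those of Tab.2.1 can be enforced by clamping each joint parameter to its allowable interval. The following is a minimal Python sketch (the thesis implementation itself is in Matlab); the joint names and interval endpoints are hypothetical, since Tab.2.1 lists only total spans, not their placement relative to a neutral pose:

```python
# Hypothetical joint limits in degrees (min, max). Only the idea of a
# bounded range mirrors Tab.2.1; the endpoints and joint names below
# are assumptions made for this illustration.
JOINT_LIMITS = {
    "knee_flexion": (0.0, 140.5),          # hinge joint, 1 DOF
    "shoulder_abduction": (-30.0, 143.2),  # one axis of a 3-DOF joint
}

def clamp_joint(name, angle_deg):
    """Restrict an infeasible joint value to its allowable range."""
    lo, hi = JOINT_LIMITS[name]
    return min(max(angle_deg, lo), hi)
```

For example, `clamp_joint("knee_flexion", 200.0)` returns `140.5`, restricting a hyper-extended knee back into its feasible range.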


3. Review of Inverse Kinematics methods

Inverse Kinematics is one of the classical problems in the fields of robotics and computer graphics. It finds application in the calculation of joint trajectories for robotic manipulators, as well as in the game and entertainment industry for the animation of virtual characters. Some IK algorithms have also been used for the prediction of protein structure, as in [15]. Within the last decades many researchers have contributed to the development of several algorithms. Existing methods cover most of the problems arising in IK applications, including the consideration of joint limits and the redundancy of articulated structures with high DOF. Combinations of various methods allow for solving IK problems with complex criteria requiring the simultaneous satisfaction of multiple goals [16]. An example of such a problem is an anthropomorphic robot reaching to pick up an object while maintaining balance and avoiding collision of the reaching hand with its own body and with obstacles in the environment. Yet all existing IK algorithms have their advantages and drawbacks. It is up to the researcher to pick the algorithms best fitting the application and tailor them to satisfy all the required criteria. In this section an introduction to the IK technique and a review of the most popular and effective IK algorithms are presented. The section concludes with the choice of the method used to develop the software being part of this thesis, along with a justification of this choice.

3.1. Introduction to inverse kinematics

An open kinematic chain consists of rigid bodies called links, connected by joints. One of its ends is considered to be fixed to the base, while the other, called the end effector, moves freely in 3D space, constrained by the geometrical parameters of the links and joints. Assuming that there are n joints, the complete configuration of such a chain can be specified by scalars θ1, ..., θn defining the joint configurations (rotation or translation). The following definitions can be introduced for the described kinematic chain.

Degrees of freedom (DOF) – the number of parameters that define the complete geometrical configuration of the chain.

Kinematic equations – obtained for the kinematic chain using a rigid transformation [z] to characterize the relative movement allowed at each joint and a separate rigid transformation [x] to define the dimensions of each link. The result is a sequence of rigid transformations, alternating joint and link transformations, from the base of the chain to its end effector, which is equated to the specified position of the end effector. An adequate convention for these transformations, known as the Denavit-Hartenberg notation, was introduced by J. Denavit and R. Hartenberg and can be found in [17], [18].

Forward kinematics – refers to the use of the kinematic equations to calculate the position of the end effector from specified joint parameters θ1, ..., θn given as input data.

Inverse kinematics – the opposite of forward kinematics: it uses the kinematic equations to calculate the values of θ1, ..., θn, with the desired position of the end effector given as input data.

An example of a serial kinematic chain consisting of 3 links and 3 joints is presented in Fig.3.1.

Fig.3.1 Kinematic chain with configuration specified by parameters θ1, θ2, θ3
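Forward kinematics for a planar chain like the one in Fig.3.1 amounts to accumulating the joint rotations link by link. The following is a minimal Python sketch with assumed link lengths (the thesis software itself uses Matlab):

```python
import math

def fk_planar(thetas, lengths):
    """Forward kinematics of a planar serial chain: given joint angles
    theta_1..theta_n (radians) and link lengths, return the end-effector
    position by accumulating rotations along the chain."""
    x = y = 0.0
    phi = 0.0                      # absolute orientation of the current link
    for theta, length in zip(thetas, lengths):
        phi += theta               # each joint rotates relative to its parent
        x += length * math.cos(phi)
        y += length * math.sin(phi)
    return x, y
```

For a fully extended chain of three unit links, `fk_planar([0, 0, 0], [1, 1, 1])` returns `(3.0, 0.0)`; rotating the base joint by π/2 moves the same straight chain to (0, 3).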

3.2. Desired characteristics of IK technique

The developed pose estimation software is required to process images captured by a video camera in real time; thus the main focus is to quickly estimate the pose of the human in the image. Images are captured at a frequency of 60 Hz, which gives approximately 0.017 s for estimating the pose. All limbs will be solved simultaneously for each image; the spine-to-head axis will be estimated using contour detection, so most effort should be put into finding an appropriate solver for the 4-DOF limbs. It is vital for the IK method to incorporate some means of resolving the redundancy of the limbs in order to obtain natural-looking postures. A secondary objective is to introduce joint limits into the articulated model.
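To give a concrete flavour of the closed-form solvers evaluated below, consider the classic analytical solution for a planar two-link limb (e.g. upper arm and forearm) obtained from the law of cosines. This Python sketch with assumed link lengths is purely illustrative and is not the algorithm implemented in the thesis:

```python
import math

def ik_two_link(x, y, l1, l2):
    """Analytical IK for a planar two-link chain: return (theta1, theta2)
    such that the end effector reaches (x, y). Raises ValueError when the
    target lies outside the reachable workspace."""
    d2 = x * x + y * y
    # Law of cosines for the elbow angle.
    c2 = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target out of reach")
    theta2 = math.acos(c2)                      # elbow-down solution
    k1 = l1 + l2 * math.cos(theta2)
    k2 = l2 * math.sin(theta2)
    theta1 = math.atan2(y, x) - math.atan2(k2, k1)
    return theta1, theta2
```

For l1 = l2 = 1 and target (1, 1), the solver yields θ2 = π/2 (a right-angle elbow), and substituting the angles back into forward kinematics reproduces the target exactly; such a single-evaluation solution is what makes analytical methods attractive under a hard 0.017 s per-frame budget.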

3.3. Jacobian-based methods

3.3.1 Jacobian inverse

Before describing the method itself, it is necessary to introduce the mathematical background, following [12]. Let the complete configuration of the kinematic chain be described by θ1, ..., θn, where n is the number of joints and θj is the value of the j-th joint parameter. For k end effectors with positions denoted by s1, ..., sk relative to a fixed origin, the column vector s = (s1, s2, ..., sk)^T can be defined. Let us define the target positions by a vector t = (t1, t2, ..., tk)^T, where ti is the target position of the i-th end effector. Let ei = ti − si be the desired change in position of the i-th end effector (while moving to the i-th target); in vector form, e = t − s. Similar notation applies to the joint angles, θ = (θ1, ..., θn)^T. The fact that the end-effector positions are a function of the joint angles can be expressed as:

s = s(θ)     (3.1)

Jacobian-based inverse kinematics solvers are well established and originated in the field of robotics [19]. The basic principle of this technique is to iterate towards a desired solution by incrementally changing the joint configuration state, starting from a stable state. The magnitude of the incremental change per iteration is specified by the system Jacobian, which relates positional offsets to small changes in joint configurations. Following this logic, the Jacobian is the matrix of partial derivatives of the entire chain system with respect to the joint parameters. J is a function of the θ values and can be defined as

(J)ij = ∂si / ∂θj

where i = 1, ..., k and j = 1, ..., n. As described by Orin and Schrader [20], the Jacobian matrix entries for the j-th rotational joint can be calculated as follows:

∂si / ∂θj = vj × (si − pj)

where pj is the position of the joint and vj is the unit vector pointing along the current axis of rotation of the joint. Equation 3.1 for forward kinematics can now be written as

⃗ṡ = J(θ) θ̇

where the dot notation denotes the first derivative with respect to time. Using the current values of θ, ⃗s and ⃗t, the Jacobian J = J(θ) can be calculated. Next we seek an update value Δθ in order to increment the joint angles θ by Δθ. The change in end effector positions caused by this change in joint angles can be estimated as:

Δ⃗s ≈ J Δθ

The idea is that the value of Δθ should be chosen so that Δ⃗s is approximately equal to ⃗e. The FK problem can then be expressed as ⃗e = J Δθ and the IK problem can be rewritten as Δθ = J⁻¹ ⃗e. Very often the IK equation cannot be solved uniquely. J may not be square or invertible, and even if it is invertible it may work poorly due to singularities. Singularities occur when no change in joint angle can achieve a desired change in the chain end position. Several approaches have been introduced to cope with these problems.

3.3.2 Jacobian pseudo-inverse

This variation of the Jacobian method, widely discussed in the literature, is also known as the Moore-Penrose inverse of the Jacobian. It sets the value of Δθ equal to

Δθ = J† ⃗e

where J† is an n×m matrix called the pseudo-inverse of J, and J is an m×n matrix. It is defined for all matrices J, even ones which are not square or not of full row rank. The pseudo-inverse gives the best possible solution to the equation J Δθ = ⃗e in the least squares sense. The pseudo-inverse has the property that the matrix (I − J† J) performs a projection onto the nullspace of J. Therefore, J (I − J† J) φ = 0 for all vectors φ. Following this statement we can set Δθ by

Δθ = J† ⃗e + (I − J† J) φ

for any vector φ and still obtain a value of Δθ which minimizes the value ‖J Δθ − ⃗e‖. The pseudo-inverse method can be derived as follows. Multiplying both sides of J Δθ = ⃗e by J^T gives

J^T J Δθ = J^T ⃗e

Then let ⃗z = J^T ⃗e and solve the equation

J^T J Δθ = ⃗z

It can be shown that ⃗z is always in the range of J^T J, hence the above equation always has a solution. When J is of full row rank, J^T J or J J^T is guaranteed to be invertible. In this case, the minimum-norm solution Δθ can be expressed as

Δθ = J^T (J J^T)⁻¹ ⃗e

The Jacobian pseudo-inverse method is a well-known IK method; however, it often performs poorly because of its instability near singularities. The presented formulas can be found in [12].
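The pseudo-inverse update with the optional nullspace term, Δθ = J†⃗e + (I − J†J)φ, can be sketched with NumPy as follows. The function name is illustrative and does not come from the thesis software.

```python
import numpy as np

def pseudo_inverse_step(J, e, phi=None):
    """One IK update: delta_theta = J+ e, plus an optional nullspace term."""
    J_pinv = np.linalg.pinv(J)            # Moore-Penrose pseudo-inverse
    d_theta = J_pinv @ e
    if phi is not None:
        # (I - J+ J) projects phi onto the nullspace of J, so the
        # least-squares residual ||J d_theta - e|| is unchanged.
        n = J.shape[1]
        d_theta = d_theta + (np.eye(n) - J_pinv @ J) @ phi
    return d_theta
```

Adding a nullspace vector φ is how redundancy can be exploited, e.g. to bias a redundant limb towards a natural-looking posture without disturbing the end-effector goal.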

3.3.3 Singular value decomposition

The singular value decomposition (SVD) combined with the pseudo-inverse of the Jacobian provides a powerful IK method. Let J be the Jacobian matrix. To apply the SVD, we first express J as

J = U D V^T

where U and V are orthogonal matrices and D is diagonal. For an m×n Jacobian matrix, U is m×m, D is m×n, and V is n×n. The non-zero entries of the D matrix are the values σi = dii along the diagonal. It is assumed that m ≤ n and, without loss of generality, σ1 ≥ σ2 ≥ ... ≥ σm ≥ 0. Note that there are cases where σi = 0 for some i. In fact, the rank of J is equal to the largest value r such that σr ≠ 0 and σi = 0 for i > r. We use ui and vi to denote the i-th columns of U and V respectively. The orthogonality of U and V implies that their columns form orthonormal bases for R^m and R^n respectively. The vectors v(r+1), ..., vn are an orthonormal basis for the nullspace of J. The singular value decomposition of the Jacobian J always exists and can be written as

J = Σ(i=1..r) σi ui vi^T

The transpose D^T of D is the n×m diagonal matrix. The product D D^T is the m×m diagonal matrix with diagonal entries dii². The pseudo-inverse D† = (d†ii) of D is the n×m diagonal matrix with diagonal entries

d†ii = 1/dii if dii ≠ 0, and 0 otherwise.

The pseudo-inverse of the Jacobian is thus equal to J† = V D† U^T and can be rewritten as

J† = Σ(i=1..r) σi⁻¹ vi ui^T
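The construction J† = V D† U^T can be verified numerically; the sketch below builds it explicitly from the SVD and compares it against NumPy's built-in pseudo-inverse (the function name is illustrative).

```python
import numpy as np

def pinv_via_svd(J, tol=1e-10):
    """Build J+ = V D+ U^T from the singular value decomposition."""
    U, s, Vt = np.linalg.svd(J)           # s holds sigma_1 >= ... >= 0
    s_inv = np.array([1.0 / x if x > tol else 0.0 for x in s])
    # Assemble D+ (n x m) with 1/sigma_i on the diagonal, 0 where sigma_i = 0.
    D_pinv = np.zeros((J.shape[1], J.shape[0]))
    np.fill_diagonal(D_pinv, s_inv)
    return Vt.T @ D_pinv @ U.T

J = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])           # rank-deficient on purpose
print(np.allclose(pinv_via_svd(J), np.linalg.pinv(J)))  # True
```

Zeroing the reciprocals of vanishing singular values is exactly what makes the pseudo-inverse defined at singular configurations, although the surviving terms σi⁻¹ still blow up as σi approaches zero.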

3.3.4 Damped Least Squares

The Damped Least Squares method (DLS) was first used for inverse kinematics in [21], [22]. DLS avoids many of the pseudo-inverse method's problems with singularities and gives a numerically stable method of selecting Δθ. In the DLS method, instead of finding the minimum-norm vector Δθ that best solves the equation ⃗e = J Δθ, we find the value of Δθ that minimizes the quantity

‖J Δθ − ⃗e‖² + λ² ‖Δθ‖²

where λ ∈ R is a non-zero damping constant. This is given by

Δθ = (J^T J + λ² I)⁻¹ J^T ⃗e    (3.17)

It can be shown that J^T J + λ² I is non-singular, thus the DLS solution is well defined. Here J^T J is an n×n matrix, where n is the number of degrees of freedom. It is easily proven that

(J^T J + λ² I)⁻¹ J^T = J^T (J J^T + λ² I)⁻¹

The advantage of this transformation over the one in eq. 3.17 is that the matrix being inverted is m×m, where m = 3k is the dimension of the space of target positions, and m is often much less than n. Thus,

Δθ = J^T (J J^T + λ² I)⁻¹ ⃗e    (3.18)

The damping constant depends on the details of the kinematic chain and the target positions and must be chosen carefully to make equation 3.18 numerically stable. It should be large enough that the solutions for Δθ are well behaved near singularities, but if it is too large, the convergence rate is slow. An overall scheme of consecutive iterations of the Jacobian method is presented in Fig. 3.2.

Fig. 3.2 Jacobian-based inverse kinematics solver – the kinematic chain is modified to meet the position and orientation goals, specified by the red marker and violet axis respectively.
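A minimal sketch of the DLS update in its m×m form, Δθ = J^T (J J^T + λ²I)⁻¹ ⃗e, assuming a NumPy environment (the function name is illustrative):

```python
import numpy as np

def dls_step(J, e, lam=0.1):
    """Damped least squares update, inverting only an m x m system:
    delta_theta = J^T (J J^T + lambda^2 I)^(-1) e
    """
    m = J.shape[0]
    # solve() is used instead of an explicit inverse for numerical stability
    return J.T @ np.linalg.solve(J @ J.T + lam**2 * np.eye(m), e)
```

Unlike the plain pseudo-inverse, the step stays bounded near singularities: as the singular values of J shrink, the λ² term dominates and the update goes smoothly to zero instead of blowing up.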

3.3.5 Summary of Jacobian-based methods

A few variations of the Jacobian-based method, such as the pseudo-inverse DLS method or the Selectively Damped Least Squares (SDLS) method, can further optimize the performance of this algorithm. However, their presentation is skipped here because the general nature of the method does not fit the requirements of the developed software well. Most of the time, Jacobian-based methods need several iterations to converge to the final solution. Each iteration demands calculating the matrix of partial derivatives, which makes the computational efficiency and overall speed of the algorithm questionable for a real-time application. Moreover, the implementation of Jacobian-based methods is not straightforward, particularly if one wants to incorporate joint constraints.


3.4 Cyclic Coordinate Descent method

Cyclic Coordinate Descent (CCD) is an iterative heuristic inverse kinematics algorithm appropriate for the control of an articulated body [23], [24]. It is mostly used in the computer graphics and computer games industry, but some researchers have also used it effectively to predict the structure of proteins [15]. Within one iteration of the CCD technique there is a sub-iteration over the joints, which are updated one by one, independently and in sequence, starting from the end effector and moving down the kinematic chain towards the base of the chain. Within the sub-iteration the system updates immediately to the new state obtained through the change of a single joint coordinate and proceeds to the next joint. This method is very simple to implement and extremely fast. It provides a numerically stable solution and has linear-time complexity in the number of degrees of freedom. The accuracy of this method is acceptable, but for redundant kinematic chains with multiple solutions a lot of potential postures are generated. It is rather straightforward to incorporate local constraints, which can slightly reduce the number of generated postures, but it is complex to implement global constraints. The main drawback of this method is the generation of unrealistic postures for articulated body structures. The issue stems from the way in which it breaks down the problem into smaller, more manageable parts, in that it always favors changing the earlier-updated joints most to reach a target. In effect, the technique will always produce a result that moves as few links as possible. Consequently, this leads to the undesirable and unrealistic result of the earlier-updated joints always moving much more than the later joints in the IK chain, which may not change in their orientation at all. Consecutive iterations of this algorithm are presented in Fig. 3.3.
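The joint-by-joint update described above can be sketched for a planar chain as follows. This is an illustrative 2D simplification (the general algorithm works with 3D rotations); each joint is rotated so that the ray from the joint to the end effector swings onto the ray from the joint to the target.

```python
import math

def ccd_solve(joints, lengths, target, iters=100, tol=1e-4):
    """Planar CCD: update joints one by one, starting from the joint
    nearest the end effector and moving towards the base."""
    def positions(angles):
        pts, x, y, a = [(0.0, 0.0)], 0.0, 0.0, 0.0
        for t, l in zip(angles, lengths):
            a += t
            x, y = x + l * math.cos(a), y + l * math.sin(a)
            pts.append((x, y))
        return pts

    angles = list(joints)
    for _ in range(iters):
        for j in reversed(range(len(angles))):
            pts = positions(angles)       # chain re-evaluated immediately
            jx, jy = pts[j]
            ex, ey = pts[-1]
            # rotate joint j so the effector direction matches the target direction
            a_eff = math.atan2(ey - jy, ex - jx)
            a_tgt = math.atan2(target[1] - jy, target[0] - jx)
            angles[j] += a_tgt - a_eff
        ex, ey = positions(angles)[-1]
        if math.hypot(target[0] - ex, target[1] - ey) < tol:
            break
    return angles
```

Note how each sub-iteration changes a single coordinate and the chain state is re-evaluated before the next joint is visited, which is exactly the source of both the method's speed and its bias towards moving as few links as possible.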

Fig. 3.3 Cyclic Coordinate Descent inverse kinematics solver – the kinematic chain is modified to meet the position and orientation goals, specified by the red marker and violet axis respectively.

3.5 Newton's methods

The group of Newton methods is founded on a second-order Taylor series expansion of the objective function f(x):

f(x + Δx) ≈ f(x) + ∇f(x)^T Δx + ½ Δx^T Hf(x) Δx

where Hf(x) is the Hessian matrix. Nonetheless, due to its complexity, the calculation of the Hessian matrix results in a high computational cost for each iteration. Several approaches have been proposed which, instead of calculating the Hessian matrix, use an approximation of it based on the gradient of the function. The best-known methods are Broyden's method, Powell's method and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [25], [26]. Considering the use of a Newton method in the software, it returns smooth motion without erratic discontinuities because it is posed as a minimization problem. Incorporating joint constraints is rather straightforward. The Newton methods also have the advantage that they do not suffer from singularity problems, such as those which occur when inverting the Jacobian; however, they are complex, difficult to implement and have a high computational cost per iteration.

3.5 Analytical methods

Analytical methods refer to the group of methods using mathematical and geometrical relationships between end effector locations and joint space configurations, which can be evaluated in a single step. These techniques work particularly well with kinematic chains of low complexity. Due to the direct mathematical relationships, analytical solutions tend to be very quick at proposing a suitable joint space configuration for a desired goal position of the end effector. However, highly redundant kinematic chains quickly reveal the limitations of this approach, where a single-step solution becomes impractical or even impossible due to the under-defined nature of the problem. The current limit of analytical solutions was set by D. Tolani and N. I. Badler with their work [27] describing a 7-DOF chain, which has been applied to a human arm. Their solution includes incorporating constraints on joints and on the orientation of the end effector, as well as the possibility of interactive optimization of postures. In general, analytical methods are well suited for real-time applications due to their speed; however, their implementation is not straightforward and becomes very complex for kinematic chains with a high number of DOFs.
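For the simplest non-trivial case, a planar two-link chain, the analytical solution reduces to the law of cosines. The sketch below illustrates the single-step nature of such solvers; it is a textbook 2-DOF example, not the 7-DOF Tolani-Badler solution used later in this thesis.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Closed-form planar 2-link IK (elbow-down branch).
    Returns (theta1, theta2) or None if the target is out of reach."""
    d2 = x * x + y * y
    # law of cosines gives the elbow angle directly
    c2 = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        return None                       # unreachable target
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2
```

No iteration is involved: one evaluation yields the joint angles, which is exactly why analytical solvers suit hard real-time budgets. The choice of the elbow-down branch (theta2 = +acos) is arbitrary; the mirrored elbow-up solution is equally valid.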

3.6 Final choice of IK method

Considering all the factors important for a real-time pose estimation IK method, a simplified version of the algorithm proposed by D. Tolani and N. I. Badler, described in [27] and [28], was chosen as the best solution for this application. The decision was based on a few main characteristics of this method. Firstly, analytical algorithms converge to a solution in a single step, thus they provide results very quickly and are well suited for real-time tasks. Secondly, for simpler kinematic chains their implementation is rather intuitive and allows additional restrictions, e.g. joint constraints, to be incorporated. Finally, the authors of this method proposed a technique for optimizing postures for a "natural look". This method and its implementation in Matlab are described in detail in the chapter Developed Pose Estimation software.


4. Motion Capture data

This chapter introduces the reader to the Motion Capture technique through a description of currently used methods of MoCap data acquisition, a presentation of the most popular types of MoCap data, and a detailed explanation of the data structure. The information used in this review was found in [29], [30], [31], [32]. Motion Capture is a technique for recording the movements of people or objects. It started as a photogrammetric analysis tool in biomechanics research in the 1970s and then propagated into other fields. It is currently used in the military, entertainment, sport, medicine, robotics and computer vision. In most cases, capturing the motion is carried out by placing markers on the subject of interest, which can be an object, an actor's body, face or fingers. During a MoCap session the movements of the recorded subject are sampled many times per second. The spatial trajectories of the markers are later mapped to specific points of a virtual object or character and are used to create virtual animation.

4.1 Methods and systems

As the technology of Motion Capture matured, new motion acquisition methods and systems arose. Existing solutions can be assigned to two main categories: optical and non-optical methods. Before we discuss these two groups, let us introduce more context by explaining the notions of translational and rotational MoCap data.

Translational data – this term refers to data representing the position of each marker with reference to the world origin, without including any hierarchy or skeleton definition. This kind of data can already be used for animation, yet it can be further processed.

Rotational data – this kind of MoCap data divides into hierarchical segment translation and rotation and global segment translation and rotation. Both of them are based on the definition of the skeleton to be used for driving the motion of the character, and are created by combining the points of the translational data. In the case of hierarchical rotational data, rotations and translations are applied to the root of the skeleton, which in most cases is assigned to the pelvis. The rest of the skeletal segments only have initial local translations that define the length of each segment, plus a stream of local rotations. Additionally, there is a set of scale channels that represent the variation in length of each segment. This is necessary because the segments are not exactly rigid, as the markers have to be placed over the skin or clothes. The global rotational file contains translations, rotations and scale channels for each independent segment, and there are no dependencies between them.

Summing up, rotational data is easier to use than translational data files because most of the work of setting up the skeleton has already been done; however, translational files allow for more flexibility when working with the files.

4.1.1 Optical motion capture techniques

In most cases optical MoCap systems use multiple cameras to capture the displacements of markers placed on the subject of interest. The 3D position of a marker for a specific time-frame is triangulated using the captured 2D data and information about the camera coordinates. This operation is performed for all time-frames, and the trajectory of the marker throughout the captured sequence is identified in a 3D coordinate system. The marker itself is an important part of the MoCap system and can significantly influence the quality of the captured data, thus new marker types were introduced along with the development of the method. The most popular and innovative types of markers are described below.

Passive markers – Belong to the most classical solutions in Motion Capture. They are coated with retroreflective material to reflect light that is generated near the camera lens.

Active markers – In contrast to passive markers, they emit their own light using LEDs rather than reflecting externally generated light. This opens up a whole range of new possibilities. Firstly, since the light is no longer reflected, it only has half the distance to travel, meaning the volume of the capture area can be increased. Secondly, this kind of marker provides a high signal-to-noise ratio, resulting in a high measurement resolution, up to 0.1 mm within the calibrated volume. Finally, the power for each marker can be provided sequentially in phase with the capture system, providing a unique identification of each marker for a given capture frame at a cost to the resultant frame rate.

Time modulated active markers – This kind of marker allows for strobing one marker on at a time, or tracking multiple markers over time and modulating the amplitude or pulse width to provide a marker ID. This results in a higher spatial and temporal resolution. Moreover, directors can see an actor's performance in real time and watch the results on the motion-capture-driven CG character.

Semi-passive imperceptible markers – This technology is currently under development and uses photosensitive markers placed, as usual, on the subject of interest. In contrast to the previously described systems, this one uses a specially built multi-LED IR projector which optically encodes space. Markers with photo sensors can compute not only their own locations but also their orientation, incident illumination and reflectance. The last two are particularly useful for matching the lighting of the virtual scene with the lighting conditions during the capture session.

Markerless – this optical method does not require actors to wear any special equipment for tracking. It is based on processing the images captured by the MoCap equipment and extracting features that allow human forms to be identified.

4.1.2 Non-optical motion capture techniques

Inertial – Inertial MoCap technology is based on miniature inertial sensors, biomechanical models and sensor fusion algorithms. Inertial sensors have a function analogous to markers in optical systems. They measure rotations and translations using the gyroscopic effect and wirelessly transmit data to a computer, where it is mapped onto the skeleton in software. The benefits of this method include portability and large capture areas; however, it provides lower positional accuracy and positional drift, which can compound over time. A suit with inertial sensors used as a commercialized MoCap package is presented in Fig. 4.1.

Magnetic – Data generation is more straightforward with magnetic systems than with optical systems, since both the positions and orientations of the tracking sensors placed on the performer are captured. Three orthogonal sets of coils in the transmitter source produce an electromagnetic field of three frequencies that fills the capture volume. The tracking sensors must be in this field for detection. Three orthogonal coils in each tracking sensor generate a resonant response, and the system evaluates the orientations as well as the positions of the sensors. The magnetic sensor output is 6 DOF, because x, y, z translations and x, y, z rotations are captured.

Fig. 4.1 A suit with inertial sensors used by Xsens as a commercialized MoCap solution. [11]

Mechanical – Mechanical systems measure joint rotation using potentiometers or similar measurement devices. The relative positions of the joints are computed using the rotational information. Global translation is not measured unless an electromagnetic sensor or some other positioning device is added. This method uses rotational data to compute positions.

4.1.3 Facial motion capture

One branch of the motion capture technique is facial motion capture, also known as performance capture. This method uses optical passive or active markers placed over an actor's face. The number of markers can vary from 32 to 300. The main goal is to record complex facial movements in order to capture higher degrees of emotion and map them to the face of a virtual character.

4.2 Data types – file structure

This section provides a detailed description of the structure of the most popular Motion Capture file formats. A good understanding of this structure is crucial for working with this type of data. Among the discussed formats we can find both translational and rotational data. Successfully bringing these files into any software is not always as easy as it seems. The main problems include varying conventions for the coordinate frame between the motion capture software and the software into which we want to import the data. Another problem is the base position of the skeleton, which is required by some animation software. Concluding this introduction, the easiest type of data to import is translational, since no hierarchy and no rotation or scale channels are present. The downside of importing translational data is that the character setup is much more involved, but, on the other hand, one has better control over what one wants to do with the setup. For the sake of an easier discussion of the different file formats, let us introduce a few notions. We will refer to the character that motion is applied to as a skeleton. The skeleton is made up of bones. To create the character's look, geometry is attached to the bones. The attributes that describe the position, orientation and scale of a bone will be referred to as channels. Each slice of time will be called a frame.

4.2.1 The BVA/BVH file format

BIOVISION'S BVA – This format belongs to the category of translational data, since no hierarchy is introduced. It is also one of the easiest formats to handle. The position and orientation of the bones is specified with respect to the global frame of coordinates. Fig. 4.2 shows a section of a .BVA file, where Hips is the first bone described. There are 29 frames of animation in the Hips. The frame time is described as 0.03333 seconds (per frame), which corresponds to 30 frames per second. Next comes a description of the channels and units used, then the actual channel data. There are 29 lines of nine values, followed by a segment block that describes the next bone. The file continues in this manner until its end.

BIOVISION'S BVH – This file format belongs to the category of hierarchical rotational data. This means that the motion of a child bone is directly dependent on the motion of its parent bone. Fig. 4.3 shows a section of a .BVH file and the hierarchy of the format. In Fig. 4.3 we can observe that the bone Hips is the root of the skeleton. All other bones are children of the Hips.

Fig. 4.2 Section of Biovision's .BVA file. [32]

The rotation of the LeftHip is added to the rotation and translation of the Hips. This operation is repeated until the last child in the hierarchy (the end effector of the kinematic chain) is reached. On the left side of Fig. 4.3 we can see a section of the .BVH file. The word HIERARCHY in the first line signifies the start of the skeleton definition. The first bone that is defined is the ROOT. This bone is the parent of all other bones in the hierarchy. Each other bone in the hierarchy is defined as a JOINT. Braces contain the root and each joint. All joints within a set of braces are the children of that parent joint. Within each braced block there are OFFSET and CHANNELS parameters for that bone (or JOINT). The OFFSET describes the displacement of the root of the bone from its parent. This (x, y, z) coordinate is the offset from the parent bone. The CHANNELS line defines which bone parameters will be animated in the file; the first token gives the number of parameters. The possible types are: x-position, y-position, z-position, x-rotation, y-rotation and z-rotation. Most often, only the root bone has any position data; the rest of the bones have only rotational data and rely on the root and the hierarchy for their position. A branch of the hierarchy ends with an End Site joint. This joint's offset is only useful for determining the length of the last bone. Following the HIERARCHY section is the MOTION section. This section describes the animation of each bone over time. As in the .BVA format, the first two lines of this section describe the number of frames and the time per frame. However, unlike the .BVA format, the subsequent lines describe the animation for all the bones at once. Each line in the rest of the MOTION section contains a value for every CHANNEL described in the HIERARCHY section. For example, if the HIERARCHY section describes 48 channels, there will be 48 values on each line of the MOTION section. This continues for the total number of frames in the animation.

Fig. 4.3 Section of the .BVH file on the left. Hierarchy of the .BVH format on the right. [32]
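A minimal reader for the MOTION section, which checks that each frame line carries exactly as many values as channels were declared under HIERARCHY, might look as follows. This is an illustrative sketch; it counts channels but ignores the skeleton structure itself.

```python
def read_bvh_motion(path):
    """Minimal .BVH reader sketch: counts the CHANNELS declared in the
    HIERARCHY section, then returns the per-frame value rows from the
    MOTION section (one row per frame, one value per channel)."""
    with open(path) as f:
        lines = [ln.split() for ln in f if ln.strip()]
    n_channels, frames = 0, []
    in_motion = False
    for tok in lines:
        if tok[0] == "CHANNELS":
            n_channels += int(tok[1])     # e.g. CHANNELS 6 Xposition ...
        elif tok[0] == "MOTION":
            in_motion = True
        elif in_motion and tok[0] not in ("Frames:", "Frame"):
            row = [float(v) for v in tok]
            assert len(row) == n_channels, "frame width != channel count"
            frames.append(row)
    return n_channels, frames
```

The width check mirrors the rule stated above: if HIERARCHY declares 48 channels, every MOTION line must hold exactly 48 values.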


4.2.2 The ASF/AMC file format

This type of Motion Capture data was developed by Acclaim Entertainment, Inc. Acclaim was a video game company and one of the key contributors to the development of optical MoCap technology. An Acclaim motion capture file is actually made up of two files: the .ASF file, which describes the actual skeleton and its hierarchy, and the .AMC file, which contains the motion data.

ASF FILE – The .ASF file is similar to the HIERARCHY section of the .BVH file. Fig. 4.4 presents fragments of both the .ASF and .AMC files. The :root section describes the parent of the hierarchy. The axis and order elements describe the order of operations for the initial offset and the root node transformation. The position element describes the root translation of the skeleton, and the orientation element defines its rotation. The :bonedata keyword starts a block that describes all of the remaining bones in the hierarchy. Each bone is delimited by begin and end statements. The id and name elements identify the bone by number and string. The initial rest position of the bone is described by the direction vector, and the length element gives the physical length of the bone. The axis parameter describes the global orientation via an axis vector, and the token letters xyz describe the order of rotations. The dof element describes the degrees of freedom possible in the bone. The possible values are tx, ty, tz, rx, ry, rz and l. The first six of these define the freedom to translate along and rotate around the three axes. The last dof defines the bone's ability to stretch in length over time. Each of these represents a channel that will be present in the .AMC file in that order. The order of these channel values also describes the order of operations in the transformation of the bone. The limits element describes the limits of the degrees of freedom. It consists of value pairs of either floats or the keyword inf, meaning infinite. This information can be useful for setting up an inverse kinematic or dynamic 3D character. The next section in the .ASF file is :hierarchy. It describes the hierarchy of the bones declared in the :bonedata section. It is a begin…end block in which each line is a parent bone followed by its children. From this information, the bones can be connected together in the proper hierarchy.

AMC FILE – The .AMC file defines the channel animation. The right side of Fig. 4.4 presents a sample .AMC file fragment. Each frame of animation starts with a line declaring the frame number. Next comes the bone animation data, which comprises the bone name and data for each channel defined for that bone, as in the dof section of the .ASF file. The frame sections in the file continue until the end of the file.

Fig.4.4 A section of .ASF file on the left and a section of .AMC file on the right.[32]


4.2.3 The C3D file format

The .C3D file format belongs to the category of translational MoCap data, since no hierarchy is set in the file. It is one of the most popular motion capture file formats, along with the ASF/AMC files. The C3D file format is a public domain, binary file format developed in the mid-1980s at the National Institutes of Health in Bethesda, Maryland. It stores 3D coordinate information, analog data and associated information used in 3D motion data capture and subsequent analysis operations. At the time of its development, all 3D motion capture systems stored their data in multiple files, each with a different proprietary format, making the exchange of data between various biomechanics and gait analysis laboratories very difficult. With the introduction and adoption of the C3D file format by all major 3D motion capture companies, all necessary 3D information, analog data and parameters describing the data can be seamlessly transferred between researchers and laboratories, regardless of the hardware or environment used to collect the data. The file is divided into three sections: Header, Parameter and Data. Each section comprises one or more 512-byte blocks.

Header section – the first block in the C3D file. The header allows software applications to quickly obtain information about the file content without reading the parameter section, which has a more complex structure. The first word in the header points to the start of the parameter section in the file. The header contains other information such as the number of trajectories, the number of analog channels, the trajectory sample rate and the analog sample rate.

Parameter section – normally starts at the second block in the C3D file. The length of the parameter section varies. The section contains the information necessary to read the data contained in the file, such as the name, data type, dimension and location of each parameter, where parameters are descriptions of the types of data stored in the file, but not the actual data.

Data section – contains the actual data, which can be 3D data, 2D data, analog data, or any combination. The C3D format is in the public domain and the detailed information necessary for working with the format can be found at []
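Reading the first words of the 512-byte header block can be sketched with Python's struct module. The little-endian Intel variant of the format is assumed here, and the field layout follows the public C3D description; the function name is illustrative.

```python
import struct

def read_c3d_header(path):
    """Read a few fields from the first 512-byte header block of a .C3D
    file. Byte 1 holds the block number where the parameter section
    starts; byte 2 holds the magic value 0x50 identifying a C3D file."""
    with open(path, "rb") as f:
        block = f.read(512)
    param_block, magic = struct.unpack_from("<BB", block, 0)
    # words 2-5: number of 3D points, analog samples per frame,
    # first frame, last frame
    n_points, n_analog, first_frame, last_frame = struct.unpack_from(
        "<4H", block, 2)
    return {"parameter_block": param_block, "magic_ok": magic == 0x50,
            "points": n_points, "analog_per_frame": n_analog,
            "first_frame": first_frame, "last_frame": last_frame}
```

Because the header is a fixed-size binary block, an application can decide whether and how to read the rest of the file from these few words alone, which is exactly the purpose of the header section described above.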


5. Developed Pose Estimation software

This chapter discusses the development process of the pose estimation software based on an inverse kinematics algorithm. It contains information about the requirements for the software and the assumptions under which this software becomes a useful tool. This is followed by an explanation of the mathematics underlying the analytical inverse kinematics algorithm used for pose estimation in this software. Next, the structure of the developed software is described in detail, starting with an overview of the main script, followed by an explanation of the major functions and the flow of data in the software, and concluding with the implementation of the analytical IK algorithm in Matlab. The last two sections of this chapter describe the motion capture data used for the verification of the software's accuracy and the results obtained while evaluating the software with this data.

5.1 Introduction

Before any work on the development of the software started, it was necessary to precisely define the tasks it was supposed to execute and the accuracy that was expected. Besides discussing the initial assumptions, the following subsections introduce the mathematical model of the articulated body to which the IK algorithm was applied, the IK algorithm itself, and the programming environment used for the scope of this project.

5.1.1 Input data and initial conditions

The developed software was initially meant to become part of a bigger software package; specifically, a method providing an initial guess of the human pose for a real-time optical pose estimation system using a single camera. This initial guess would later be tuned by an already existing method which uses a Gaussian distribution on joint values to generate a whole range of poses. The outlines of poses from the generated pool are then compared against the outline obtained from the camera view through image feature extraction methods. The best matching contour provides the final pose estimate. However, generating poses through a Gaussian distribution without any initial guess of the pose is computationally expensive and not well suited for real-time applications. Providing the initial guess via an IK method was supposed to bridge the gap between software operating in real time and software producing accurate results. Eventually the software developed for this thesis became a standalone tool for pose estimation with rather rich input data required to use it. Its functionality was extended to displaying basic animations of arbitrary ASF/AMC motion capture files. The requirements and assumptions for the software induced by its intended use are listed below.
a) Real-time operation – the frequency of operation is 60 Hz, resulting in approximately 0.017 second for estimating each pose.
b) The input data required for operation of the developed real-time IK pose estimation software provides the x, y, z coordinates of the base and end effector of the kinematic chain imitating a human limb.
c) Additionally, the lengths of the bones are required as input data. For optical pose estimation software such information can be obtained by measuring body parts during calibration or by using the golden ratio principle.
d) The articulated body model is reduced to 4 DOFs per kinematic chain of each limb, resulting in a simplified 16 DOF model. No additional DOFs are introduced because the pelvis-to-head axis connected by the spine is assumed to be rigid. Such a model is sufficient to obtain the initial guess required by the optical pose estimation software.
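Requirement (a) fixes the per-frame time budget by simple arithmetic. A minimal check of this budget, written in Python for illustration (the thesis itself uses Matlab, and the function name here is mine):

```python
# Per-frame time budget for a 60 Hz real-time pose estimator.
TARGET_RATE_HZ = 60
budget_s = 1.0 / TARGET_RATE_HZ  # ~0.0167 s per frame


def is_realtime(frame_time_s, budget=budget_s):
    """Return True when a measured per-frame time fits the 60 Hz budget."""
    return frame_time_s <= budget


print(round(budget_s, 4))  # 0.0167
```

Any estimation routine whose measured per-frame time passes this check satisfies requirement (a).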

5.1.2 Articulated body model

This section contains a short discussion of the articulated body model presented in chapter 2.1. The model selected for the final solution consists of four limbs replaced by 4 DOF kinematic chains connected by the rigid section pelvis-spine-head. This decreases the achievable accuracy but provides acceptable results for the intended use while gaining simplicity. In the limb model the arm and hip joints were replaced with 3 DOF ball-and-socket joints, whereas the elbow and knee joints were replaced with 1 DOF hinge joints, resulting in a 4 DOF kinematic chain consisting of two links. These links correspond to bones in the human body: the humerus and femur bones correspond to the 1st link in the kinematic chain (counting from the base), while the radius and tibia bones correspond to the 2nd link. Fig.5.1 presents an overview of the articulated body model used in the software. Red spots mark the input data needed to run the software; violet spots indicate the output data – the positions of the elbows and knees estimated by the software. These positions are crucial to the general pose of the human. The magenta z axis is a visualization of the rigid pelvis-spine-head fragment.

Fig.5.1 Overview of articulated body model with 16 DOF's used in pose estimation software

5.1.3 Analytical Inverse Kinematics algorithm and optimization method

The analytical inverse kinematics algorithm described in this section was used to solve the IK problem for the pose estimation software. This method was first introduced by D. Tolani and N. I. Badler in their work [28] in 1996. It provided an analytical solution for a 7 DOF model of the human arm consisting of ball-and-socket joints for the shoulder and wrist and a hinge joint for the elbow. It was further improved and released again in [27] in 2000 as an IK method for animation of virtual characters, allowing for interaction with the character and setting different goals such as position and orientation of end effectors. According to the authors this method is also applicable to the human leg, since its articulation is similar to that of the human arm. Their 7 DOF model of the arm can be reduced to the 4 DOF model used in this thesis by simply neglecting the orientation of the wrist. A model of such a reduced structure is presented in Fig.5.2. There is a fixed system of coordinates placed at the shoulder, a local system of coordinates placed at the elbow, and another local system of coordinates placed at the wrist (end effector).

Fig.5.2 A model of human upper limb reduced to 4 DOF system. [28]

It can be observed that regardless of the assumed convention for the local systems of coordinates, the first joint is described by three angular values θ1, θ2, θ3, while the second joint is described by only one value, θ4. Knowing the x, y, z coordinates of the wrist, we can determine the distance between the shoulder and the wrist; let us denote this distance by ||p||. Next, assuming that the lengths L1 and L2 are also known, the elbow angle θ4 is uniquely determined by the distance of the wrist from the shoulder according to the law of cosines:

cos θ4 = (||p||^2 − L1^2 − L2^2) / (2 L1 L2)

Although there are two solutions for θ4, only one is physically realizable because of the joint limits on the knee or elbow. The analyzed 4 DOF arm has one degree of redundancy. A simple physical interpretation of the redundant degree of freedom is based on the observation that if the wrist is held fixed, the elbow is still free to swivel about a circular arc whose normal vector is parallel to the axis from the shoulder to the wrist. The workspace of this mechanism was first examined by Korein in [33] and is derived from a geometric analysis of the problem. The method for optimization of poses for a natural look is based on Korein's observations and their interpretation by D. Tolani and N. I. Badler. Fig.5.3 illustrates the geometrical properties of the mechanism that were utilized to create the optimization method. Let s, e, and tg define the positions of the shoulder, the elbow, and the goal location of the wrist, respectively. The origin (0, 0, 0) is coincident with the shoulder position s. The scalars L1, L2, and L3 denote the lengths of the upper arm, the lower arm, and the distance from the goal position to the shoulder. As the swivel angle ϕ varies, the elbow traces an arc of a circle lying on a plane whose normal is parallel to the wrist-to-shoulder axis. To describe the circle mathematically, we first define the normal vector of the plane as the unit vector in the direction from the shoulder to the wrist:

n^ = tg / ||tg||
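The law-of-cosines relation for θ4 above can be sketched in code. Python is used here for illustration only (the thesis's implementation is in Matlab), and the function name and the clamping step are mine:

```python
import math


def elbow_angle(L1, L2, p_norm):
    """Elbow/knee angle theta4 from the law of cosines.

    L1, L2 -- lengths of the upper and lower limb segments.
    p_norm -- distance ||p|| from the base (shoulder) to the end effector (wrist).
    Returns the non-negative solution; the mirrored solution -theta4 is
    rejected in practice by the elbow/knee joint limits.
    """
    c = (p_norm ** 2 - L1 ** 2 - L2 ** 2) / (2.0 * L1 * L2)
    c = max(-1.0, min(1.0, c))  # clamp against numerical noise / unreachable goals
    return math.acos(c)
```

With this convention a fully stretched limb (||p|| = L1 + L2) gives θ4 = 0 and a fully folded one (||p|| = |L1 − L2|) gives θ4 = π.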

Additionally, we need two unit vectors u^ and v^ that form a local coordinate system for the plane containing the circle. We set u^ to be the projection of an arbitrary axis a^ (corresponding to ϕ = 0), selected by the user, onto the plane of the circle. Vector u^ can be calculated using the formula

u^ = (a^ − (a^ · n^) n^) / ||a^ − (a^ · n^) n^||

while vector v^ can simply be obtained as a cross product of n^ and u^, i.e. v^ = n^ × u^ (the order used in the implementation). To compute the center of the circle c and its radius R, simple geometrical formulas can be utilized:

c = L1 cos α · n^,    R = L1 sin α,

Fig.5.3 For a given goal tg the elbow is free to swivel around a circle[27].

where α is the angle between link L1 and the shoulder-to-wrist vector n^; it follows from the law of cosines for the triangle formed by L1, L2 and ||p||:

cos α = (||p||^2 + L1^2 − L2^2) / (2 L1 ||p||)

Finally, the position of the elbow is given by

e(ϕ) = c + R (cos ϕ · u^ + sin ϕ · v^)
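The circle construction above can be sketched end to end. This is a Python illustration under the section's assumptions (shoulder at the origin, goal wrist position tg); the thesis implements it in Matlab, and all names here are mine:

```python
import math


def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def scale(a, s): return [x * s for x in a]
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def elbow_on_circle(t_g, L1, L2, a_hat, phi):
    """Elbow position for swivel angle phi (shoulder at the origin).

    t_g   -- goal wrist position, [x, y, z]
    a_hat -- arbitrary reference axis defining phi = 0
    Follows the construction above: n from shoulder to wrist, u as the
    projection of a_hat onto the circle plane, v = n x u, then the circle
    center c and radius R from the triangle (L1, L2, ||t_g||).
    """
    p = norm(t_g)
    n = scale(t_g, 1.0 / p)
    # project a_hat onto the plane with normal n
    a_perp = sub(a_hat, scale(n, dot(a_hat, n)))
    u = scale(a_perp, 1.0 / norm(a_perp))
    v = cross(n, u)
    cos_alpha = (p * p + L1 * L1 - L2 * L2) / (2.0 * L1 * p)
    c = scale(n, L1 * cos_alpha)                        # circle center
    R = L1 * math.sqrt(max(0.0, 1.0 - cos_alpha ** 2))  # circle radius
    return [ci + R * (math.cos(phi) * ui + math.sin(phi) * vi)
            for ci, ui, vi in zip(c, u, v)]
```

For every ϕ the returned elbow satisfies ||e|| = L1 and ||tg − e|| = L2, which is a convenient sanity check on the geometry.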

The swivel angle ϕ gives a useful way of characterizing the space of solutions. Defining u^ as the projection of the z axis allows ϕ to be interpreted as a variable that controls the elbow height. It is easy to verify that the elbow is at its lowest point when ϕ = 0 and that increasing the magnitude of ϕ will elevate the elbow. The optimization of a natural-looking pose in the developed pose estimation software is performed utilizing these properties. Yet the proper choice of the a^ axis and the ϕ angle is crucial for obtaining satisfactory results. The effectiveness of this method is discussed in the results section, where the stress is put on identifying the strengths and weaknesses of this solution. Since θ4 and the elbow position e are already identified, the only thing missing to fully define the configuration of our 4 DOF kinematic chain are the angles θ1, θ2, θ3. It needs to be observed that the elbow position e is a function of the joint values θ1, θ2, θ3. This property can be utilized to uniquely define the values of the first two joints once the elbow position has been estimated by equations (5.10), where arctan2 is the two-argument arctangent function and ex, ey, ez are the x, y, z coordinates of the elbow position. To solve for θ3, let us use the fact that the position of the wrist depends only upon the first four joints: for known values of θ1, θ2, θ4 the value of θ3 can be obtained by solving the forward kinematics equation for the wrist position.
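Equations (5.10) are not reproduced here, but the recovery of the first two angles via arctan2 can be illustrated under one common spherical convention. This convention is an assumption of the sketch, not necessarily the thesis's exact parametrization, and the Python code and names are mine:

```python
import math


def base_joint_angles(e):
    """theta1 (azimuth) and theta2 (polar angle) of the elbow position e.

    Assumes a spherical convention with theta1 measured in the x-y plane
    and theta2 measured from the z axis -- an illustrative choice only;
    the thesis defines its own convention in equations (5.10).
    """
    ex, ey, ez = e
    theta1 = math.atan2(ey, ex)
    theta2 = math.atan2(math.hypot(ex, ey), ez)
    return theta1, theta2


def elbow_from_angles(theta1, theta2, L1):
    """Inverse of the convention above: rebuild e from theta1, theta2."""
    return [L1 * math.sin(theta2) * math.cos(theta1),
            L1 * math.sin(theta2) * math.sin(theta1),
            L1 * math.cos(theta2)]
```

A round trip through both functions recovers the original angles, which is how such a parametrization can be unit-tested regardless of the chosen convention.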

In this way we have fully determined the configuration of the 4 DOF kinematic chain by identifying all the variables that decide upon it. This derivation can be propagated to both the human arm and the human leg because of their similar structure. Further chapters describe the implementation of this method in the Matlab programming environment.

5.1.4 Programming environment

The Matlab computing language was chosen as a tool suitable for the task. Because of the need to verify the accuracy of the developed software, motion capture data – which is mostly stored in structures similar in character to large matrices – was employed in the solution. Matlab has proven its robustness and ability to efficiently handle large amounts of data stored in matrices. Another important feature is its high-level computing language, which significantly facilitates developing complex projects. Moreover, its Image Processing Toolbox could easily bridge the gap between the developed software and optical pose estimation software. For more information about the hardware used and the Matlab version, please refer to Appendix A – Instruction of use of Inverse Kinematics Matlab script.

5.2 Software structure

This section provides a broad discussion of the structure of the developed software. First, the Matlab code of MainPoseEstimationScript.m is presented. This script is the main frame of the software: it handles the respective calls of the major functions that are responsible for particular tasks performed by the software, such as preprocessing of motion capture data or animation of MoCap sequences. The presentation of the code is followed by an explanation of the major functions and the data flow inside the software. Finally, the implementation of the analytical IK algorithm is discussed.


5.2.1 Main script

Below, in Fig.5.4, the complete code of the MainPoseEstimationScript.m Matlab script is presented. Its main task is to set a proper order of operations on the data and to call the particular functions. The major functions are discussed below; for information oriented towards usage of the software, the user should refer to Appendix A.

Fig.5.4 Complete code of MainPoseEstimationScript.m

GUIchoice – This function is responsible for displaying a pop-up window asking the user which data should be loaded into Matlab for further processing. The software is capable of processing any motion capture sequence in the ASF/AMC file format. Additionally, the user can choose to test the software with two different C3D datasets. In order to process arbitrary data of this type the user would need to adapt the function LoadData.m, since the current solution only works for the attached C3D datasets. The user's answer is stored in the variable option and is used throughout the execution of the main script to adjust the existing functions to the chosen motion capture data type.
LoadData.m – This function is responsible for loading the MoCap sequence of the data type chosen by the user into the Matlab workspace. Based on the variable option, different loaders are called for different types of data. This part of the software uses supplementary scripts for importing different types of MoCap data into Matlab. These were provided by other researchers under a license for non-commercial use; more information about the authors and sources of these scripts can be found in Appendix B. As an output it returns a vector containing variables. The first four variables are of struct type and store trajectories of markers placed on key points of the motion capture suit worn by the actor during the recording of the motion sequence. Besides trajectories, these variables store the lengths of particular bones and additional data presented in Tab.5.1 (more information about the additional data can be found in Appendix A). Markers were attached to the points corresponding to the base and end effector of the kinematic chains of the human limbs: the arm and wrist joints for the upper limbs, and the hip and ankle joints for the lower limbs. These trajectories are used as input to the software, as explained in section 5.1.1. The variable Spine contains trajectories of markers placed along the spine of the actor and is mainly used to construct the skeleton for animation purposes. The variable opt_vect stores information about the vector used for optimization of elbow and knee positions for a "natural look". The last variable, frames, simply stores the number of samples in the chosen MoCap sequence.

Name         Data
'Case'       num
'base'       x, y, z
'L1'         num
'middle'     x, y, z
'L2'         num
'end'        x, y, z
'previous'   x, y, z

Tab.5.1 Structure of variables storing trajectories of key points and additional data.
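For illustration, the layout in Tab.5.1 can be mirrored by a simple record type. The field names follow the table; the actual software stores these as Matlab structs, and the sample values below are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LimbFrame:
    """One frame of one limb's kinematic chain, mirroring Tab.5.1."""
    case: int             # 'Case' -- numeric case/flag value
    base: List[float]     # x, y, z of the chain base (shoulder/hip)
    L1: float             # length of the first link
    middle: List[float]   # x, y, z of the middle joint (elbow/knee)
    L2: float             # length of the second link
    end: List[float]      # x, y, z of the end effector (wrist/ankle)
    previous: List[float] # x, y, z of the marker in the previous frame


frame = LimbFrame(case=1, base=[0, 0, 0], L1=0.30, middle=[0, 0, -0.30],
                  L2=0.25, end=[0, 0, -0.55], previous=[0, 0, -0.55])
```

Keeping one such record per limb and per frame is what allows the software to swap only the middle-joint column after IK estimation, as described in section 5.2.2.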

InverseKinematics – The part of the code where the analytical IK algorithm was implemented. Its main role is to loop over the rows of the variable Limbs and find the possible configurations of the middle joint of each kinematic chain. From the pool of positions the algorithm finds the one optimized for a natural-looking posture. Optimization is performed as discussed in section 5.1.3, by minimization of the angle ϕ between the middle joint position and the projection of vector a^ onto the plane containing the circle describing the redundancy of the kinematic chain. The output of the function is the optimized position of the middle joint. The implementation of the analytical algorithm and the optimization method is presented in section 5.2.3.
Animate – As the name suggests, this function handles the visualization of the motion capture sequence loaded into the software. It is based on the skeletal structure derived from the articulated body model described in chapter 5.1.2. The displayed view contains two skeletal structures: the red skeletal structure shows the poses estimated by the InverseKinematics function, while the blue skeletal structure shows the original poses obtained using the complete motion capture data. A visible displacement between the red and blue structures indicates the error of the method; matching red and blue structures indicate a well estimated pose. This function exposes a few animation parameters for user adjustment: one can choose the speed and perspective of the animation.
CalculateJointAngles – The function responsible for calculating the values of the joint angles of all limbs involved in the analysis. Angles are calculated with respect to the rest position of the limbs. It uses auxiliary functions for transformation from the global frame of coordinates to local frames of coordinates. Even after extensive testing of this function's code, the results it provides remain questionable; they are nevertheless presented in the thesis due to their high accuracy for specific motion capture sequences. Interpretation of these results and possible sources of errors of this method are discussed further in the results section. The output of the function is stored in the variable Thetas and consists of the values of θ1, θ2, θ3, θ4 for all frames of the considered motion capture sequence.
PlotResults – A piece of code handling the creation of plots presenting the results. It displays a pop-up window allowing the choice of results to be plotted, so that the user views only the data of interest.
Leewalk_image – This function is optional and is only called when the user is testing the software with the C3D MoCap data obtained from the research [34] by L. Sigal. It handles the visualization of the obtained results by projecting them onto the images taken during the recording of the MoCap sequence by calibrated cameras, to compare them with the actual posture of the human.

5.2.2 Flow of data

This section is dedicated to the flow of data in the software. A well organized data flow is crucial for real-time applications. Therefore, before the data reaches the point in the software where the code is supposed to operate in real time, it is preprocessed to minimize the size of the actual real-time code. The function InverseKinematics is the part of the code containing the analytical algorithm handling inverse kinematics and optimization of the middle joint pose. The data flow is visualized in the flowchart in Fig.5.5. In the first step the software loads data of the type chosen by the user. Regardless of the chosen type, the data is imported into the Matlab environment and restructured into homogeneous data of struct type, which is an embedded Matlab data type. Afterwards it is used as input data to the function InverseKinematics, which estimates the position of the middle joint for all modeled limbs in all frames of the loaded sequence. Only one column per limb needs to be altered to obtain the new trajectory of the middle joint estimated by the software. This arrangement is very helpful in simultaneous animation of both trajectories and in generating accuracy results of the method. The obtained trajectory can be animated in the next step by passing the output data of InverseKinematics and LoadData to the input of the Animate function. Animate displays consecutive frames of the loaded MoCap sequence for the original and estimated poses of the middle joints, thus revealing potential errors in the estimation. The function PlotResults takes identical input data and compares the original sequence against the middle joint positions estimated by InverseKinematics. In the case of Animate and PlotResults the user can choose the speed of animation and the quantity of data he wants to

[Flow chart: MainPoseEstimationScript.m – choice of data type (ASF/AMC, AGH C3D, Research C3D) → LoadData → InverseKinematics → optional Animate → optional PlotResults (multi-choice of data to plot) → terminate software session. Annotations: before InverseKinematics is called, the position of the middle joint is unknown; InverseKinematics estimates the elbow position by minimizing the angle between vector u^ and the vector pointing from the center of the circle to the elbow location on the circle, where u^ is the projection of a^ onto the circle plane.]

Fig.5.5 Flow chart of the developed software.

plot. This approach allows for investigation of a part of the sequence rather than forcing the user into processing the complete dataset.

5.2.3 Implementation of analytical IK algorithm

This subsection presents the implementation of the inverse kinematics algorithm in the Matlab script InverseKinematics. Code snippets containing the implementation of the formulas derived for the IK algorithm are presented and commented on below. Only selected sections are shown; the full code is available in the developed software script files attached to this thesis.

Limbs(i).end(j,:) = Limbs(i).end(j,:) - Limbs(i).base(j,:);
P_End = [Limbs(i).end(j,1), Limbs(i).end(j,2), Limbs(i).end(j,3)];

For each frame of the analyzed sequence, denoted by i, and each limb of the articulated body model, denoted by j, the function InverseKinematics converts the coordinates of the markers: coordinates previously expressed with respect to the global frame become coordinates expressed with respect to a local system placed at the base of the particular limb's kinematic chain.

[z] = Choose_opt_vect(Limbs,i,j,opt_vect,option,P_End);

This section calls the auxiliary function Choose_opt_vect, which is responsible for the selection of the proper vector a^, later projected as vector u^ onto the plane containing the circle describing the redundancy of the 4 DOF limb. For different types of loaded data, different vectors were chosen using an empirical approach. Some techniques analyzing the trajectory of markers from previous frames were used to estimate the direction of the movement vector. This solution performs well only under the condition that the character moves forward, thus additional techniques were introduced.
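One simple direction-from-previous-frames technique is a finite difference over recent marker positions. This is an illustrative sketch in Python, not the exact logic of Choose_opt_vect, and the function name is mine:

```python
import math


def movement_direction(positions):
    """Estimate a unit movement-direction vector from recent marker positions.

    positions -- list of [x, y, z] marker positions, oldest first.
    Averages the frame-to-frame displacements and normalizes the result;
    returns None when the marker has not moved (direction undefined).
    """
    steps = [[b[i] - a[i] for i in range(3)]
             for a, b in zip(positions, positions[1:])]
    mean = [sum(s[i] for s in steps) / len(steps) for i in range(3)]
    length = math.sqrt(sum(c * c for c in mean))
    if length < 1e-12:
        return None
    return [c / length for c in mean]
```

The None case matters in practice: when the performer stands still, a fallback axis has to be chosen, which is one reason the thesis reports that additional techniques were needed beyond the forward-motion assumption.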


dist_base2end = sqrt(P_End(1)^2 + P_End(2)^2 + P_End(3)^2);

dist_base2end is the distance from the base of the kinematic chain to its end effector.

n = P_End/dist_base2end;        % unit vector from base to end effector
z_x_n = dot(z,n);               % scalar projection of z onto n
z_x_n_n = z_x_n.*n;
u_nominator = z - z_x_n_n;      % component of z orthogonal to n
u_denominator = sqrt(u_nominator(1)^2 + ...
    u_nominator(2)^2 + u_nominator(3)^2);
u = u_nominator/u_denominator;  % unit projection of z onto the circle plane
v = cross(n,u);

The code presented above is responsible for defining the local frame of coordinates describing the plane on which the circle depicting the redundancy of the limb is constructed. Vector n^ is calculated as the unit vector pointing from the base of the kinematic chain to its end effector. The complete local frame of coordinates is obtained with vector u^ being the projection of vector z (previously described as a^) onto the circle's plane.

c = Cos_alpha*L1*n;   % center of the redundancy circle
r = sin(alpha)*L1;    % radius of the circle

These two lines calculate the position of the center of the circle describing the redundancy of a limb and the radius of that circle, where alpha is the angle between vector n^ and the first link of the kinematic chain and can be obtained using formula 5.4.

IK_middle = r*(cos(phi)*u + sin(phi)*v) + c;

Finally, the position of the elbow can be calculated using the above line, which implements equation 5.7.

Limbs(i).middle(j,:) = IK_middle + Limbs(i).base(j,:);

In the last step of each iteration the position of the middle joint is restored to the global frame of coordinates.
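Putting the snippets together, the per-frame computation performed by InverseKinematics can be summarized in one function. This is a Python paraphrase of the Matlab code above (variable names follow the snippets; the function name and the clamp are mine):

```python
import math


def ik_middle_joint(base, end, L1, L2, z_opt, phi=0.0):
    """Estimate the middle-joint (elbow/knee) position for one frame.

    base, end -- global x, y, z of the chain base and end effector
    L1, L2    -- bone lengths; z_opt -- optimization vector a-hat
    phi       -- swivel angle on the redundancy circle (phi = 0 is the
                 pose optimized relative to z_opt, per section 5.1.3)
    Returns the middle-joint position in the global frame.
    """
    # express the end effector in the local frame placed at the chain base
    p_end = [e - b for e, b in zip(end, base)]
    dist = math.sqrt(sum(c * c for c in p_end))
    n = [c / dist for c in p_end]
    # project z_opt onto the circle plane to get u; v completes the frame
    z_dot_n = sum(z * c for z, c in zip(z_opt, n))
    u_num = [z - z_dot_n * c for z, c in zip(z_opt, n)]
    u_len = math.sqrt(sum(c * c for c in u_num))
    u = [c / u_len for c in u_num]
    v = [n[1] * u[2] - n[2] * u[1],
         n[2] * u[0] - n[0] * u[2],
         n[0] * u[1] - n[1] * u[0]]
    # circle center and radius from the triangle (L1, L2, dist)
    cos_a = (dist ** 2 + L1 ** 2 - L2 ** 2) / (2.0 * L1 * dist)
    cos_a = max(-1.0, min(1.0, cos_a))
    sin_a = math.sqrt(1.0 - cos_a ** 2)
    c = [L1 * cos_a * ni for ni in n]
    r = L1 * sin_a
    mid = [ci + r * (math.cos(phi) * ui + math.sin(phi) * vi)
           for ci, ui, vi in zip(c, u, v)]
    # restore the result to the global frame
    return [m + b for m, b in zip(mid, base)]
```

As in the Matlab code, the result can be checked against the bone lengths: the estimated middle joint lies at distance L1 from the base and L2 from the end effector.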


6. Experimental data

The developed software was extensively tested using two different motion capture data types and a few different motion sequences. This section describes the data used for verification of the software, including a motion sequence captured in the AGH laboratory, accompanied by a description of the laboratory motion capture setup and the specification of the equipment used. Fig.6.1 shows the process of importing ASF/AMC data into the Matlab environment; corresponding frames are shown in both animation tools.

Fig.6.1 Process of importing MoCap data into the Matlab environment. On the left, the view from the ASF/AMC MoCap Player by CMU; on the right, the view from the function Animate.

6.1 CMU Graphics Laboratory MoCap data

The software was first verified with data released by Carnegie Mellon University for research purposes; a large motion capture database is available at [35]. The CMU data is captured using a Vicon motion capture system consisting of 12 infrared MX-40 cameras, each capable of recording at 120 Hz with images of 4 megapixel resolution. Motions are captured in a working volume of approximately 3 m x 8 m. The capture subject wears 41 markers and a black garment. Two MoCap sequences from this source were used to test the software. The first is a sequence of a person walking in a straight line, consisting of 517 frames. The second is a sequence of a person running in a straight line, containing 141 frames.

6.2 Research data – C3D walking sequence

This motion capture dataset was developed by L. Sigal for the purpose of his research [34]. It was later turned into a standardized dataset available to researchers around the world. It contains software-synchronized data from 7 video cameras (4 grayscale and 3 color) and 3D motion capture data (obtained using a standard marker-based motion capture system from the Vicon company). The sequence used for verification of the software presents a person clapping his hands and walking in circles. It contains 3551 frames. The obtained results were transformed and mapped to the images from the corresponding cameras, allowing the resulting skeleton to be aligned with the actual posture of the actor performing the motion sequence.

6.3 Experimental C3D data

In addition to third-party datasets, a motion capture sequence was captured exclusively for the purpose of this thesis. The recording took place in the AGH laboratory of the IET faculty. The region of marker placement was limited to a single limb, namely the right arm of the actor performing the sequence. The captured motion presents an arm picking an object from a table and placing it in different locations on the table within the arm's workspace. This sequence corresponds to the pick-and-place action frequently performed in industrial processes by robotic manipulators. It attempts to examine the possibility of using motion capture data for on-line manipulation of a robotic arm by a user. The recorded sequence contains 13300 frames of motion. More information about the capture equipment can be found in the next section.


6.4 Experimental Motion Capture setup at AGH laboratory

The motion capture sequence recorded in the AGH IET laboratory was captured using the PhaseSpace motion capture system [36]. The PhaseSpace Impulse system captures complex motion in real time using advanced hardware and software technology. Motion capture is accomplished by placing the PhaseSpace cameras around a capture volume and moving objects with LEDs attached to them. The cameras detect the positions of the LEDs and transmit this information to a central computer that processes the data and calculates the actual positions. These positions are then available for further processing by client systems in a client-server environment. The PhaseSpace Impulse system uses high-speed, high-resolution linear CCD cameras to triangulate the position of LED markers in real time. Each PhaseSpace camera has two detectors; each detector consists of a semi-cylindrical lens and a linear CCD at the focal distance of the lens, perpendicular to the axis of the lens. The motion capture sequence was recorded within a capture volume of 3 x 3 meters.


7. Results

This chapter presents all results obtained with the developed software. First, the software's capability of generating visual results is discussed and shown to be one of its best assets; various forms of visual results are presented and commented on. Secondly, the accuracy of the software is considered and the joint angle trajectory results are presented.

7.1 Visual results

All of the motion capture sequences used for verification of the software's accuracy were animated using the Animate function. An example of an animated sequence is presented in Fig.7.1 and Fig.7.2, which show the walking-in-a-straight-line MoCap sequence from the isometric and side view, respectively. For clarity of illustration only a few frames from the sequence are shown in the figures. In all visual results obtained with the Animate function, the blue skeleton refers to data obtained from the motion capture sequences and the red skeleton refers to data obtained using the analytical IK algorithm. Well-matching blue and red contours indicate proper pose estimation, while separated red and blue contours indicate erroneous pose estimation.


Fig.7.1 Isometric view of ASF/AMC sequence of performer walking on a straight line. Only selected frames of sequence are presented.

Fig.7.2 Side view of ASF/AMC sequence of performer walking on a straight line. Only selected frames of sequence are presented.


Another example of a visualization obtained with the software is presented in Fig.7.3, which shows a fragment of a C3D motion sequence of a performer walking in a circle. Besides the red and blue contours typical for the Animate function, trajectories obtained as a result of the analytical IK algorithm are added to the visualization. The trajectories correspond to the poses of the articulated body model from the previous frames of the sequence, with the data at their tips defining the current pose of the skeleton. The colors magenta, cyan, green and yellow refer to the left elbow, right elbow, left knee and right knee positions, respectively. The data used in this sequence originates from the research [34] by L. Sigal. In his paper the motion capture sequence was synchronized with the view from a multi-camera system, which allowed marker data to be fitted to corresponding locations on the images captured by the system. This technique was used to overlay the contour of the human pose on the image captured by the camera. Data and scripts used in that research were adapted for verification of the software developed in this thesis. The positions of the markers of the middle joints in the original sequence were replaced with positions obtained from the analytical IK algorithm, which allowed the pose contour to be presented on the image with the performer. The results of this operation are shown in Fig.7.4 and are in good agreement with the pose of the performer in the corresponding camera views.

Fig.7.3 Trajectories of middle joints obtained with analytical IK algorithm for the sequence of performer walking in a circle.


Fig.7.4 Different views of the multi-camera system synchronized with MoCap equipment. The colored skeleton refers to data obtained using the analytical IK algorithm.

The last dataset was captured at AGH University and contains a sequence of the pick-and-place action characteristic of robotics applications. The placement of markers was limited to the left arm of the performer. In Fig.7.5 and Fig.7.6 selected frames of this sequence are presented in the front and isometric view, respectively. The green rectangle imitates the surface from which the object was first lifted and then placed in another location on the table within the workspace of the performer's arm. The trajectory of the end effector of the left arm's kinematic chain is plotted as a curve in magenta. The discontinuities in the trajectory reveal a lack of marker data for particular frames, which might be caused by occlusion of a marker during the performance or by improper collection of marker data.


Fig.7.5 Front view of pick and place action.

Fig.7.6 Isometric view of pick and place action.


7.2 Numerical results

The position error was estimated for all sequences used to verify the software. It was computed as the absolute difference between the middle joint position obtained using the analytical IK algorithm and the original position from the motion capture data. All results are given in centimeters. The upper part of Fig.7.7 presents the position error for the upper limbs, while the lower part presents the position error for the lower limbs. In the upper graph one can observe cyclic peaks in the position error. They occur due to the cyclic movements of the performer during the walking sequence. The optimization vector does not consider the inertial nature of the movements, hence the higher error in dynamic sections of the movement. Some improvement could be introduced into the selection of the optimization vector through analysis of the rate of change of the marker positions, yet this solution is not included in the software. A similar problem occurs for the lower limbs.
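The per-frame position error described above amounts to the Euclidean distance between the estimated and the recorded middle-joint positions (an interpretation of "absolute difference in position"; the thesis reports it in centimeters). A Python sketch with hypothetical names:

```python
import math


def position_error_cm(estimated, original):
    """Per-frame Euclidean position error between two middle-joint
    trajectories, both given as lists of [x, y, z] in centimeters."""
    return [math.dist(e, o) for e, o in zip(estimated, original)]


def mean_error_cm(estimated, original):
    """Mean position error over a whole sequence, as in Tab.7.1."""
    errs = position_error_cm(estimated, original)
    return sum(errs) / len(errs)
```

The second function corresponds to the per-limb means reported in Tab.7.1.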

Fig.7.7 Position error of upper and lower limbs during ASF/AMC walk sequence.


A similar investigation was performed for the sequence of the actor walking in circles. The results are presented in Fig.7.8. Again the upper part presents the position error of the upper limbs, while the bottom part presents the results for the lower limbs. The cyclic pattern characteristic of walking sequences can be observed once again, with significantly higher peaks reaching up to 30 cm of error in the estimation of the middle joint position. Unfortunately, the reason for the appearance of these peaks was not identified.

Fig.7.8 Position error of upper and lower limbs during the motion sequence of a performer walking in circles.

Eventually the same verification procedure was applied to the last dataset, with the motion sequence of the pick-and-place action. In contrast to the other sequences, here only the left arm kinematic chain is considered. The position error is slightly higher for this sequence, which may be caused by the quality of the collected marker data. Two markers were placed at each joint of the left arm, and the positions of the arm joints were interpolated from the positions of the two markers for each joint. This solution helped to avoid the problem of missing marker data for some frames, which is very common in motion capture sequences. However, it might have introduced an error into the collected data, thus lowering the quality of the final results. It can also be noticed that the cyclic pattern of high error disappeared due to the different nature of the motion. The position error for this sequence is presented in Fig.7.9. Moreover, the mean position error was calculated for specific limbs for all sequences and is presented in Tab.7.1.

Fig.7.9 Position error of the middle joint of left arm during motion sequence of pick and place action.

                      Left elbow [cm]   Right elbow [cm]   Left knee [cm]   Right knee [cm]
ASF/AMC walk              3.8734            2.6867             2.8027           3.6822
C3D Research data         4.9173            3.5503             4.9321           3.9480
C3D AGH pick&place        5.3322            -                  -                -

Tab.7.1 Mean position error of the middle joint of the limbs' kinematic chains for different motion capture data.

To assess whether the proposed method is suitable for real-time applications, the time needed to process a single frame was estimated by dividing the total time of sequence analysis by the number of frames. The results are presented in Tab.7.2.

                       Time needed for analysis of one frame [s]
ASF/AMC walk                        0.00094676
C3D Research data                   0.00079764
C3D AGH pick&place                  0.00016861

Tab.7.2 Time needed for analysis of one frame for different motion capture sequences.

In addition to the estimation of the position error and the speed of analysis, the trajectories of the joint angles were obtained; the results are presented in Fig.7.10. Theta 1, 2 and 3 correspond to the three degrees of freedom characterizing the joint at the base of the limb kinematic chain, while Theta 4 describes the angle of the middle joint and corresponds to its single degree of freedom. The joint angle trajectories were calculated using the formulas presented in section 5.1.3. The correctness of the obtained results was not verified, so the presented trajectories are unreliable at this point. The discontinuities appearing in the trajectories suggest a faulty interpretation of the results of the atan2 function used to calculate the angles; properly aligned data would provide smooth trajectories for all angles. Once this functionality is verified, automatic verification of joint constraints could easily be implemented instead of relying on an empirical method. For brevity, only the joint angle trajectories of the left arm for the different motion capture sequences are presented in Fig.7.10.
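The discontinuities mentioned above are typical when atan2 wraps from pi to -pi between consecutive frames; phase unwrapping removes such jumps without changing the underlying configuration. A sketch, again in Python/NumPy for illustration only, together with a simple joint-limit check of the kind proposed above (function names and limit conventions are assumptions):

```python
import numpy as np

def smooth_trajectory(theta):
    """Remove the 2*pi jumps that atan2 introduces in a joint angle trajectory.

    theta -- (N,) array of joint angles in radians, one per frame.
    """
    return np.unwrap(theta)

def within_limits(theta, lo, hi):
    """Automatic joint-constraint check on an unwrapped trajectory."""
    t = np.unwrap(theta)
    return bool(np.all((t >= lo) & (t <= hi)))

# Example: a trajectory crossing the +/- pi boundary becomes smooth:
# np.unwrap([3.1, -3.1]) yields [3.1, 3.18...] instead of a jump to -3.1
```

Such a check would replace the empirical, animation-based verification of the ranges of motion used in this thesis.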

7.3 Conclusions

In this section all of the obtained results were presented and confronted with the objectives of the thesis. The main goal of the software was to provide an initial guess of the pose for further refinement, with a computational efficiency allowing for real-time applications. The error of the middle-joint position estimation oscillates around 5 cm, leaning towards 4 cm, which is acceptable accuracy for an initial pose guess. The developed software is capable of providing the result for a single frame of a sequence within 0.001 s, which is more than sufficient for real-time applications. Additionally, various visual results were presented, where it could be observed that the proposed method provides natural-looking poses within the joint constraints. In general, the method provides reliable results and satisfies the objectives of the thesis, yet there are issues that should be addressed in future work on the software.


Fig.7.10 Joint angle trajectories of the left arm kinematic chain joints for different motion capture sequences.


8. Summary

This chapter concludes the work presented in the thesis. Emphasis is put on the fulfillment of all initial assumptions and on a critical evaluation of the obtained results. Additionally, possible applications of the developed software are presented. Finally, the most promising directions for future work on the software are discussed.

8.1 Final conclusions

The main motivation for choosing the subject of this thesis was the author's interest in the field of computer vision and image processing techniques, and in self-development in this field. More importantly, it was an opportunity to take part in the development of an optical pose estimation software package within a research project. To conclude, it should be clearly stated that all the initial objectives were successfully fulfilled; moreover, some additional features, such as the animation tool, were included in the final solution. The objective was to achieve real-time operation of such software while maintaining satisfactory accuracy of the provided results. This condition was successfully met by reviewing state-of-the-art solutions in the field of inverse kinematics and pose estimation, which resulted in choosing an appropriate method for the considered problem. Next, the selected analytical inverse kinematics method was implemented in the Matlab programming environment, allowing for quick prototyping of the initial pose estimation. To achieve satisfactory results, an optimization method exploiting the redundancy of the human limb kinematic chain was employed. The algorithm operated on a 16-DOF articulated body model developed in the scope of this thesis. In order to verify the accuracy of the method, motion capture data was used. Due to the complicated structure of this data, an extensive review of existing capturing methods and file formats was first conducted. The developed software was evaluated on MoCap data obtained from various sources, including a motion sequence captured in one of the AGH University laboratories. The final results yielded acceptable accuracy, with a mean pose estimation error in the range of 3-6 centimeters. The most accurate results, of approximately 3 centimeters, were obtained for the walking sequence of the ASF/AMC datatype. For the walking-in-circles sequence of the C3D datatype, the obtained accuracy was approximately 5 centimeters. This specific data was released in [34] as a comparative tool for the pose estimation community, to allow easy comparison of state-of-the-art solutions in this field. Existing solutions provide similar accuracy: approximately 7 centimeters in [37] and 10 cm in [38]; however, both of these solutions work on input data obtained from image feature extraction rather than MoCap data. The lowest accuracy was obtained for the C3D pick-and-place dataset, where the position of the middle joint was estimated with a 5.5-centimeter mean error. It was observed that the implemented method showed significantly higher accuracy for gait MoCap sequences than for sequences of complex motions, e.g. the martial arts sequence. The eventual outcome of this project was a fully operational standalone pose estimation tool. The developed method takes the approximated positions of the base and end effector of a limb kinematic chain as input and returns the optimized position of the middle joint; this process is repeated until the configuration of all human limbs is determined. An interesting additional feature extending the basic function of the software is a tool capable of animating an arbitrary motion capture sequence stored in the ASF/AMC file format. Among the applications of the developed software one can list pose estimation, animation of virtual characters, gait analysis, robotics, and tracking of joint trajectories of an articulated structure. A critical approach was taken to evaluate the vulnerabilities of the developed software. Among those is the lack of a method for incorporating joint constraints into the software, yet empirical verification of the results, based on animation of the sequences, showed good agreement with the assumed ranges of motion for the specific joints. Another issue that should be addressed is the proper interpretation of the joint angles obtained using the inverse kinematics method.
Besides the source code of the developed software and the MoCap data used for its verification, the content of this thesis includes: Appendix A, describing the usage of the software; a video in .avi format showing exemplary use of the software; and Appendix B, with acknowledgments to the authors of the scripts used to import MoCap data into Matlab. Appendices A and B are included only in the digital version of the thesis.


8.2 Future work

Although the developed software provided acceptable results, many potential improvements could be introduced. The main focus of future work will be to incorporate the developed method into an existing optical pose estimation software package. To succeed in this, one should consider decreasing the quality of the input data while maintaining acceptable accuracy of the output motion, since motion capture data provides better quality than data obtained through feature extraction from 2D images. Regarding the accuracy and robustness of the software, amendments should be introduced to the evaluation of joint constraints and to the interpretation of the obtained joint trajectories. Further improvements should also address occlusions, a classical problem in the field of pose estimation. Finally, research towards other potential applications of this method should be conducted. To sum up, there is considerable room for improvement of the software and for further research in this field.


BIBLIOGRAPHY

1. L. Sigal, Human pose estimation, Disney Research, Pittsburgh, http://cs.brown.edu/~ls/Publications/SigalEncyclopediaCVdraft.pdf.
2. M. Furniss, Motion Capture, MIT Communications Forum, 1999, http://web.mit.edu/comm-forum/papers/furniss.html.
3. T. Krzeszowski, B. Kwolek, K. Wojciechowski, Model based 3D human motion capture using global-local particle swarm optimizations, CORES 2011, Advances in Intelligent and Soft Computing, Springer, 2011.
4. J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, et al., Efficient Human Pose Estimation from Single Depth Images, CVPR 2011, Colorado Springs.
5. www.andrewbedford.com, visited 13.09.2015.
6. R. L. Lathrop, A. M. W. Chaudhari, R. A. Siston, Comparative Assessment of Bone Pose Estimation Using Point Cluster Technique and OpenSim, Journal of Biomechanical Engineering, Volume 133, Issue 11.
7. T. Krzeszowski, B. Kwolek, A. Michalczuk, A. Switonski, H. Josinski, View Independent Human Gait Recognition using Markerless 3D Human Motion Capture, ICCVG 2012, LNCS, Springer, 2012.
8. Š. Obdržálek, G. Kurillo, J. Han, T. Abresch, R. Bajcsy, Real-Time Human Pose Detection and Tracking for Tele-Rehabilitation in Virtual Reality, University of California, Berkeley, USA, http://www.eecs.berkeley.edu/~gregorij/MMVR2012.pdf.
9. Š. Obdržálek, G. Kurillo, F. Ofli, R. Bajcsy, E. Seto, H. Jimison, M. Pavel, Accuracy and Robustness of Kinect Pose Estimation in the Context of Coaching of Elderly Population, EMBC, IEEE 2012, San Diego, CA, pages 1188-1193.
10. A. Reiss, G. Hendeby, G. Bleser, D. Stricker, Activity Recognition Using Biomechanical Model Based Pose Estimation, Lecture Notes in Computer Science, Volume 6446, 2010, pp. 42-55.
11. D. Roetenberg, H. Luinge, P. Slycke, Xsens MVN: Full 6DOF Human Motion Tracking Using Miniature Inertial Sensors, https://www.xsens.com/wp-content/uploads/2013/12/MVN_white_paper1.pdf, visited 18.08.2015.


12. A. Aristidou, J. Lasenby, Inverse Kinematics: a review of existing techniques and introduction of a new fast iterative solver, Cambridge University Engineering Department, 2009.
13. Microsoft Encarta Online Encyclopedia, 2008.
14. NASA, Man-System Integration Standards, Anthropometry and Biomechanics, http://msis.jsc.nasa.gov/sections/section03.html.
15. A. A. Canutescu, R. L. Dunbrack, Cyclic coordinate descent: A robotics algorithm for protein loop closure, Protein Science, 12(5):963-972, May 2003.
16. P. Baerlocher, Inverse Kinematics techniques of the interactive posture control of articulated figures, Lausanne, EPFL, 2001.
17. J. Denavit, R. S. Hartenberg, A kinematic notation for lower-pair mechanisms based on matrices, Trans. ASME J. Appl. Mech., 23:215-221, 1955.
18. R. S. Hartenberg, J. Denavit, Kinematic Synthesis of Linkages, New York, McGraw-Hill, 1964.
19. J. J. Craig, Introduction to Robotics: Mechanics and Control, Addison-Wesley, 1986.
20. D. E. Orin, W. W. Schrader, Efficient computation of the Jacobian for robot manipulators, The International Journal of Robotics Research, 3(4):66-75, 1984.
21. C. W. Wampler, Manipulator inverse kinematic solutions based on vector formulations and damped least-squares methods, IEEE Transactions on Systems, Man and Cybernetics, 16(1):93-101, 1986.
22. Y. Nakamura, H. Hanafusa, Inverse kinematic solutions with singularity robustness for robot manipulator control, Trans. ASME, Journal of Dynamic Systems, Measurement, and Control, 108(3):163-171, September 1986.
23. L. Wang, C. Chen, A Combined Optimization Method for Solving the Inverse Kinematics Problem of Mechanical Manipulators, IEEE Transactions on Robotics and Automation, Vol. 7, No. 4, pp. 489-499, 1991.
24. C. Welman, Inverse kinematics and geometric constraints for articulated figure manipulation, Master of Science Thesis, School of Computing Science, Simon Fraser University, 1993.
25. R. Fletcher, Practical Methods of Optimization, 2nd Ed., Wiley-Interscience, New York, NY, USA, 1987.
26. K. W. Chin, B. R. von Konsky, A. Marriott, Closed-form and generalized inverse kinematics solutions for the analysis of human motion, Volume 5, pages 1911-1914, 1997.
27. D. Tolani, A. Goswami, N. I. Badler, Real-Time Inverse Kinematics Techniques for Anthropomorphic Limbs, Graphical Models 62, 353-388, 2000.
28. D. Tolani, N. I. Badler, Real-time inverse kinematics of the human arm, Presence, Vol. 5, No. 4, 1996.
29. https://en.wikipedia.org/wiki/Motion_capture, visited 01.09.2015.
30. M. Kitagawa, B. Windsor, MoCap for Artists: Workflow and Techniques for Motion Capture, USA, Elsevier CRC Press, 2012.
31. A. Menache, Understanding Motion Capture for Computer Animation, USA, Elsevier, 2011.
32. J. Lander, Working with Motion Capture file formats, Game Developer, Jan 1998.
33. J. Korein, A Geometric Investigation of Reach, Ph.D. thesis, University of Pennsylvania, 1985.
34. L. Sigal, S. Bhatia, S. Roth, M. J. Black, M. Isard, Tracking Loose-limbed People, CVPR, 2004.
35. http://mocap.cs.cmu.edu/, visited 07.09.2015.
36. ftp://sccn.ucsd.edu/pub/MoBI/Phasespace%20Documentation/PSImpulseOverview.html.
37. J. B. Huang, M. H. Yang, Estimating Human Pose from Occluded Images, Asian Conference on Computer Vision (ACCV), 2009.
38. L. Raskin, E. Rivlin, M. Rudzsky, Using Gaussian Process Annealing Particle Filter for 3D Human Tracking, EURASIP Journal on Advances in Signal Processing, 2007.
