A brief overview of Microsoft Kinect and its applications

Matheus Giovanni Soares Beleboni
University of Southampton
Southampton, England

[email protected]

ABSTRACT

Originally designed by Microsoft as an add-on for the Xbox 360 video game console, Kinect has proven to be an incredible piece of technology outside of the gaming scope. Its relatively low-cost depth sensor, able to interpret full-body 3D movements and even recognize gestures and voices, has provided a plethora of new opportunities, especially in the area of multimedia computing. This paper initially presents a succinct overview of Microsoft Kinect itself, introducing the basic ideas behind this powerful technology, and later discusses some of the current studies that have been conducted, focusing on areas such as robotics and medicine. Finally, a conclusion about the effectiveness of Kinect is presented and some considerations are made about the potential future of this remarkable technology.

General Terms Design, Experimentation, Human Factors.

Keywords Microsoft Kinect, Depth Sensor, RGB-D, Computer Vision, Skeletal Tracking, Scene Recognition, Object Recognition, Gesture Recognition, Robotics Applications, Natural Interaction Interface, Human-computer Interaction.

1. INTRODUCTION

Since its inception in November 2010, Kinect¹ has come on leaps and bounds. From its fledgling gaming concept, it has evolved into an incredibly promising new technology with a plethora of possible applications. When designing this technology, Microsoft's aim was to enhance the entertainment experience of the Xbox 360 video game console. Because the Kinect is a motion-sensing input device, able to capture, track, and interpret full-body movements and to recognize gestures and voices², the scope for research outside of the gaming experience was quickly realized. Soon after its announcement, the Kinect had already attracted the attention of several hackers, interested in the various applications that would be possible if the Kinect could "communicate" with computers. "'The day it was announced we were like, we're going to reverse engineer this,' says Kyle Machulis, a hacker based in Berkeley, California" [29]. It did not take long until the Kinect was successfully hacked and drivers allowing the use of its main functionalities were created. In response, Microsoft released a non-commercial software development kit (SDK) for Windows, and since then, Kinect's potential has increased substantially [29]. This paper attempts to introduce and explain the Kinect technology itself, as well as to explore some of its current applications in fields of research such as medicine, robotics, and natural interfaces.

¹ For the sake of simplicity, in this paper, Kinect refers to both the sensing hardware able to capture RGB/depth information and the software that interprets these signals.

² This paper focuses on the aspects of Kinect related to computer vision. For this reason, particulars of the audio component will not be discussed. More details about that technology can be found in [24].

2. RELATED WORK

Zhang (2012) wrote about Kinect's impact in the context of multimedia computing and pointed out how this device could create new opportunities for many different kinds of applications [1]. This inspired me to write this paper, attempting to summarize some of the most diverse applications of this technology. Following a similar line of reasoning, Han et al. (2013) wrote about how the Kinect has provided potential new and innovative solutions to some of the classical problems of computer vision, mostly by presenting and evaluating many of its possible applications in this area [2]. Durrant (2013) addressed a similar topic, but focused instead on the technology's history, as well as the advantages and disadvantages of its possible uses in particular fields such as education and medicine [4].

3. KINECT SENSOR

The Kinect sensor is the outcome of several devices working together. As can be seen in figure 1, it comprises a colour (RGB) camera, a depth sensor, a four-microphone array, and a motorized tilt. The depth sensor itself consists of an infrared (IR) projector and an IR camera. Separately, these devices may seem simple, and not much different from other technologies available on the market; however, when they work together, with software (e.g., OpenNI [5] or the Microsoft Kinect SDK [6]) to interpret the signals obtained, they become a powerful device that can provide full-body 3D motion capture, facial recognition, and even voice recognition [1]. One of the Kinect's greatest distinguishing features is its relatively low-cost depth sensor. Even though the details of its implementation are not publicly available, some believe that it is based on the structured light principle [1, 3]. The basic idea behind this principle is to project a known pattern onto the scene and, based on this pattern's deformation, infer depth information about the scene [3].

Figure 1: Microsoft Kinect [2]

Figure 2: Kinect's speckle dot pattern [Available at: https://www.youtube.com/watch?v=nvvQJxgykcU]

In the Kinect's case, the IR projector displays a speckle dot pattern, as shown in figure 2. This pattern is invisible both to the human eye and to the RGB camera. However, when it is reflected, it can be captured by the IR camera, which uses triangulation methods to reconstruct the scene in 3D [1, 2]. The depth information about the scene is then represented using a grey scale: the darker a pixel, the closer it is to the camera. Pixels rendered in black are those without any depth information; this usually happens when objects are too close to or too far away from the camera. One example of this kind of representation can be seen in figure 3b, which corresponds to the depth information of the scene shown in figure 3a. Each application can use the depth information obtained by the Kinect in a number of different ways to solve specific problems: some are interested in skeletal tracking, for example, whereas others focus on scene recognition. Therefore, this paper aims not to present all the possible applications of this technology, but to show how different kinds of applications can benefit from the information acquired by the Kinect.
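The grey-scale convention just described (the darker a pixel, the closer it is; black for pixels with no reading) can be sketched in a few lines. The 800-4000 mm working range used below is an assumption for illustration, not a published Kinect specification.

```python
import numpy as np

def depth_to_grayscale(depth_mm, near=800, far=4000):
    """Map a depth frame (millimetres) to an 8-bit grey-scale image:
    the darker a pixel, the closer the surface. Pixels with no depth
    reading (value 0) are rendered black, as in figure 3b."""
    depth = np.asarray(depth_mm, dtype=np.float64)
    valid = depth > 0
    # Clamp valid readings to the assumed working range, then scale
    # to [1, 255] so that black (0) is reserved for missing data.
    clipped = np.clip(depth, near, far)
    grey = 1 + (clipped - near) / (far - near) * 254
    out = np.zeros(depth.shape, dtype=np.uint8)
    out[valid] = grey[valid].astype(np.uint8)
    return out

frame = np.array([[0, 800], [2400, 4000]])  # 0 = no reading
print(depth_to_grayscale(frame))
```

A real pipeline would read the frame from the sensor via OpenNI [5] or the Kinect SDK [6]; the array above merely stands in for one.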

4. RECENT APPLICATIONS

As previously mentioned, there are several possible uses for the Kinect outside the scope of video games, and the context and complexity of these potential applications vary significantly: from a simple system that produces sound effects in response to body movements [26], to high-risk applications in which robots are teleoperated to search for survivors in disaster-hit areas [13]. The following sections summarize some of the current research involving the Kinect, focusing on a few specific fields.

4.1 Robotics

Despite being originally designed as an add-on to a video game console, the Kinect sensor has proven useful in many robotics applications, especially for indoor navigation [7]. El-laithy et al. (2012) compared the accuracy of the results obtained with a traditional laser sensor against those obtained using the Kinect. It was shown that although the latter is not as accurate as the laser, the results could be considered acceptable, mainly because of the significant cost difference between these two technologies. Table 1 shows the results of one of the experiments proposed by [7]. It can be verified that the difference in accuracy between the two is very low, except for the cases in which the objects are too close to or too far away from the Kinect.

The Kinect has also been used in some applications for people with disabilities. Zöllner et al. (2011) developed a mobile navigation aid that uses the information obtained by the Kinect to help visually impaired people move inside buildings [8]. The main purpose of the project was to enable both micro- and macro-navigation, i.e. to create a system that identifies and notifies the user about other people and/or obstacles in the way and, at the same time, provides sound warnings about the environment (e.g., "Door in 3", "2", "1", "Open the door") using printed augmented reality markers. With a similar objective, German researchers have used the Kinect to assist an autonomous wheelchair in navigating to pre-selected waypoints while identifying obstacles and adjusting the route to avoid them [9, 10].

Table 1: Indoor results from 0 to 4 meters comparing the laser sensor with the Kinect sensor (distances in cm) [7]

Actual Distance | Laser Sensor | Kinect Sensor
              0 |            0 |             0
             40 |           40 |             0
             80 |           80 |            81
            120 |          120 |           120
            160 |          160 |           160
            200 |          200 |           201
            240 |          240 |           241
            280 |          280 |           279
            320 |          320 |           320
            360 |          360 |           360
            400 |          400 |           395
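As a quick check on the Table 1 readings above, the mean absolute error of each sensor can be computed directly from the tabulated values (units assumed to be centimetres, given the 0-4 m range):

```python
# Table 1 readings as (actual, laser, kinect) triples; units assumed cm.
readings = [
    (0, 0, 0), (40, 40, 0), (80, 80, 81), (120, 120, 120),
    (160, 160, 160), (200, 200, 201), (240, 240, 241),
    (280, 280, 279), (320, 320, 320), (360, 360, 360), (400, 400, 395),
]

def mean_abs_error(pairs):
    """Mean absolute deviation of measured values from ground truth."""
    return sum(abs(actual - measured) for actual, measured in pairs) / len(pairs)

laser_mae = mean_abs_error([(a, l) for a, l, _ in readings])
kinect_mae = mean_abs_error([(a, k) for a, _, k in readings])
# The 40 cm row dominates the Kinect error: it lies below the sensor's
# minimum working distance, so no depth is reported there (reading 0).
kinect_mae_in_range = mean_abs_error(
    [(a, k) for a, _, k in readings if a >= 80])
print(laser_mae, kinect_mae, kinect_mae_in_range)
```

Within its working range the Kinect's mean error is on the order of a centimetre, which supports the paper's point that its accuracy is acceptable for many applications given the cost difference.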

Finally, it would be almost impossible to discuss the Kinect's possible applications in robotics without mentioning what is perhaps the most promising of them all: teleoperation, in which a system is operated over a distance. With the advancement of the internet and wireless networks, an environment highly suitable for this kind of activity was formed, and a great deal of research has been conducted on the use of the Kinect in teleoperated systems [11]. In most cases, the information obtained by the Kinect is used to control robots through gesture-based imitation, with projects that vary from systems to control a robot hand [12] to humanoid robots for search and rescue operations [13, 14].

Figure 3: a) RGB Camera (l.) b) Depth Representation (r.) [3]

4.2 Medicine

Another field in which the Kinect can be useful is medicine. The relatively low cost of this technology and its capacity to interpret human movements have advanced research and provided great results. Chang et al. (2011) were among the first to evaluate the effectiveness of the Kinect in studies involving physical rehabilitation [15]. Through experiments with two young adults, it was verified that use of the Kinect could truly increase the motivation of people with motor disabilities to exercise. A similar study was conducted by Lange et al. (2011), who developed and evaluated a tool for balance rehabilitation using the Kinect [16]. One important aspect of this work is how it highlights the difficulties some users faced when calibrating the Kinect sensor before starting the experiment. Calibration is an essential step for the Kinect to be able to track and interpret users' movements. As reported in [16], a considerable number of users could not participate in the experiment because, due to physical disabilities, they were unable to assume the initial pose (shown in figure 4) during the calibration step. In addition to detecting and interpreting full-body movement, the Kinect is also able to interpret gestures, such as slight hand movements. It was based on this functionality that Gallo et al. (2011) proposed a system to manipulate medical image data using the Kinect as the only input device [17]. By doing this, [17] created a new controller-free interaction that can be useful in many situations, for instance during a surgical procedure [18], when the surgeon cannot touch any non-sterilized materials.

4.3 Natural Interfaces

One of the main reasons the Kinect was created was so that Xbox 360 users could interact with the video game console using gestures very close (if not identical) to those we use in our daily routine. In other words, the Kinect would serve as an interaction tool, enhancing entertainment experiences. In many recent applications, the basic idea behind the decision to use the Kinect is still the same: to create a new form of interaction, but in this case between users and systems rather than games. It was with this purpose that Boulos et al. (2011) created Kinoogle, software that provides a natural interface to Google Earth and Street View, allowing users to navigate using only body movements and gestures [19]. Moreover, there has been research into the use of the Kinect in 3D geographic information systems (GIS) in archaeology: Richards-Rissetto et al. (2012) created a portable system to virtually navigate through a digitally reconstructed ancient Maya city, and some positive results have been achieved [20].

Figure 4: Initial pose in the calibration step [16]

4.4 3D Scanning

Several applications nowadays require 3D representations of scenes and/or objects, as previously mentioned [19, 20], and the task of 3D scanning has gained more and more prominence. There are many technologies to assist in this kind of activity, but nearly all of them come at a very high cost and are considerably bulkier. The Kinect, on the other hand, has proven to be a relatively cheap and accessible alternative for this task, mainly because it can be easily handled and is able to construct 3D representations in real time, with an accuracy that is acceptable for many applications [21]. A variety of research has been conducted in this field, with applications ranging from a prototype to reconstruct 3D representations of archaeological sites [20, 22], to indoor scenes created from a set of Kinect frames that allow augmented reality simulations and physics-based interactions [23]. Izadi et al. (2011) developed a system called KinectFusion [23], which essentially allows the user to pick up the Kinect and move around the scene to create a detailed 3D representation in real time, using a novel GPU-based pipeline. However, what really differentiates [23] from other studies is the innovative idea of allowing multi-touch interaction on any indoor scene with arbitrary surface geometries created using KinectFusion (as shown in figure 5).

4.5 Other applications

Even though only a few fields have been mentioned, research involving the Kinect is not restricted to them; thus, before presenting the final discussion, some other applications are worth mentioning.

4.5.1 Augmented Reality

A good example that demonstrates the use of the Kinect in augmented reality applications is the MirageTable [30]. This work uses the depth information obtained by the Kinect along with a stereoscopic projector and a curved screen to merge real and virtual worlds into the same experience. Figure 6 illustrates the system's setup and the basic arrangement of its components. The Kinect allows this interactive system to track the user's eyes and to detect and capture the shape of any object placed in front of the camera in real time, enabling the user to interact with 3D objects without wearing any additional trackers or gloves [30].

Figure 5: "Enabling touch input on arbitrary surfaces with a moving camera. A) Live RGB. B) Composited view with segmented hand and single finger touching curved surface. C) Rendered as surface normals. D) Single finger drawing on a curved surface. E) Multi-touch on regular planar book surface. F) Multi-touch on an arbitrarily shaped surface." [23]

Even though it may seem a simple project, the applications proposed by the authors are diverse, ranging from games and virtual 3D model creation to complex 3D simulations and 3D teleconferencing experiences using interactive scenarios [30].

4.5.2 Music

With a different purpose from the other applications shown so far, Yoo et al. (2011) developed a system that interprets a user's movements and turns them into sound effects [26]. Using skeletal tracking techniques, the system detects the positions of 24 skeleton joints, from head to foot, and, based on the proximity of these joints, triggers certain sounds.
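The proximity-based mapping can be sketched as follows. The joint names, coordinates, sound files, and the 0.15 m threshold are all illustrative assumptions, not values taken from [26]:

```python
import math

def triggered_sounds(joints, rules, threshold=0.15):
    """Return the sounds whose two associated joints are closer than
    `threshold` metres, in the spirit of the proximity-based mapping
    described by Yoo et al. `joints` maps names to (x, y, z) metres;
    `rules` is a list of (joint_a, joint_b, sound) triples."""
    sounds = []
    for joint_a, joint_b, sound in rules:
        if math.dist(joints[joint_a], joints[joint_b]) < threshold:
            sounds.append(sound)
    return sounds

# Hypothetical positions from one skeletal-tracking frame.
joints = {"left_hand": (0.10, 1.20, 2.0),
          "right_hand": (0.15, 1.25, 2.0),
          "head": (0.00, 1.70, 2.0)}
rules = [("left_hand", "right_hand", "clap.wav"),
         ("left_hand", "head", "chime.wav")]
print(triggered_sounds(joints, rules))
```

In a live system, the `joints` dictionary would be refreshed every frame from the skeletal tracker, and each triggered sound would be sent to an audio engine rather than printed.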

4.5.3 Education

One field in which the Kinect seems to have great potential is education, which was not mentioned in the previous sections for a particular reason: much of what is known about the possible advantages of using the Kinect to enhance teaching and learning experiences is based only on theories and hypotheses. Only a few applications of this technology in education are available to be put into practice and have their effectiveness evaluated once and for all. Many studies conducted in this area build on the outcomes of using interactive whiteboards (IWBs) in classrooms, and on how this tool is able to make lessons much more interactive and interesting. Hsu (2011) presented a complete analysis of the use of the Kinect in education and concluded that it should achieve the expected results, enhancing classroom interaction and participation, as well as improving the way teachers can manipulate multimedia materials during classes [27]; however, as already mentioned, this remains a hypothesis.

5. THE EFFECTIVENESS OF KINECT

Throughout this paper, numerous applications that have benefited from the information provided by the Kinect were presented. However, it is important to note that, depending on the requirements of the problem, the Kinect may not be the best solution. Han et al. (2013) presented a detailed analysis of some of the major problems faced by applications that use the Kinect [2], such as: the difficulty of performing human pose analysis in real applications, in which the system has no information about the environment or the users; how the results can be influenced by the distance between the objects and the Kinect, i.e., how the precision errors increase as objects move too close to or too far from the camera; and, finally, how the illumination can influence the final results. In order to solve, or at least minimize, some of these problems, it is important to preprocess the information provided by the Kinect efficiently. This preprocessing can include steps such as recalibration, in which the parameters stored during manufacturing are readjusted, and depth data filtering, which can be a useful approach for depth image denoising or for recovering missing depth values [2]. If, even after preprocessing, the data provided is not accurate enough, a good alternative could be to use another kind of technology, though probably at a higher cost.

Figure 6: MirageTable setup [30]
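One simple instance of the depth-filtering step discussed in [2] is to fill missing (zero-valued) pixels with the median of the valid depths in a small neighbourhood. This is a minimal sketch only; real pipelines use more elaborate filters:

```python
import numpy as np

def fill_missing_depth(depth, window=3):
    """Replace zero-valued (missing) depth pixels with the median of
    the valid depths in a `window` x `window` neighbourhood. Pixels
    whose neighbourhood holds no valid depth are left at zero."""
    out = depth.copy()
    r = window // 2
    for y, x in zip(*np.where(depth == 0)):
        patch = depth[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
        valid = patch[patch > 0]
        if valid.size:
            out[y, x] = np.median(valid)
    return out

# A tiny frame with one missing reading in the centre.
frame = np.array([[100, 100, 100],
                  [100,   0, 100],
                  [100, 100, 100]])
print(fill_missing_depth(frame))
```

The median is preferred over the mean here because depth images mix surfaces at very different distances, and a mean would invent depths lying between them.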

6. FUTURE WORK

The future of the Kinect is strongly related to the issues mentioned in the previous section and how Microsoft will address them. A possible direction would be to increase the resolution and range of the depth sensor, providing more accurate information. By achieving a higher accuracy level, new applications could benefit from the data provided by the Kinect, such as (perhaps) teleoperated robot-assisted surgery. One of the current challenges is the use of multiple Kinects in the same application. Some projects have already found possible solutions, as shown in [28], but it remains an interesting topic, because the infrared dots projected by both devices onto the scene are very likely to overlap and confound the results. Finally, it would be impossible not to mention the new generation of the Kinect sensor, launched very recently, in November 2013, with the Xbox One. According to Microsoft, this "all-new" Kinect introduces higher fidelity of movements and details, an expanded field of view, improved skeletal tracking, and new active infrared (IR) capabilities [25, 31]. Because this is a very recent release, there is still no work evaluating its performance; but if the claims are true, many of the applications mentioned previously may undergo some changes.

7. CONCLUSION

After analysing the studies mentioned above, it can be concluded that the Kinect is a remarkable piece of technology, which has revolutionized the use of depth sensors in the last few years. Because of its relatively low cost, the Kinect has served as a great incentive for many projects in the most diverse fields, such as robotics and medicine, and some great results have been achieved. Throughout this paper, it was possible to verify that although the information obtained by the Kinect may not be as accurate as that obtained by some other devices (e.g., laser sensors), it is accurate enough for many real-life applications, which makes the Kinect a powerful and useful device in many research fields.

8. ACKNOWLEDGMENTS

Firstly, I would like to highlight that this work was supported by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico – Brasil). Secondly, I would like to thank Prof. Paul Lewis and Dr. David Millard for their presentations and guidance throughout this paper. Finally, my thanks to the University of Southampton for providing the resources used as references in this work.

9. REFERENCES

[1] Z. Zhang, "Microsoft Kinect sensor and its effect", IEEE Multimedia, vol. 19, no. 2, pp. 4-10, 2012.

[2] J. Han, L. Shao, D. Xu, and J. Shotton, "Enhanced Computer Vision with Microsoft Kinect Sensor: A Review", IEEE Trans. Cybern., vol. 43, no. 5, pp. 1318-1334, Oct. 2013.

[3] J. MacCormick, "How does the Kinect work?" [Online] Available at: http://pages.cs.wisc.edu/~ahmad/kinect.pdf [Accessed 10 November 2013].

[4] R. Durrant, "Microsoft Kinect - Visualising a 3D World", in 11th Annual Interactive Multimedia Conference, Southampton, UK, 2013.

[5] Open-source SDK for 3D sensors - OpenNI. 2013. [Online] Available at: http://www.openni.org [Accessed 10 November 2013].

[6] Kinect for Windows Dev Center. 2013. [Online] Available at: http://www.microsoft.com/en-us/kinectforwindowsdev/ [Accessed 10 November 2013].

[7] R. A. El-laithy, J. Huang, and M. Yeh, "Study on the use of Microsoft Kinect for robotics applications", in Position Location and Navigation Symposium (PLANS), 2012, pp. 1280-1288.

[8] M. Zöllner, S. Huber, H.-C. Jetter, and H. Reiterer, "NAVI – A Proof-of-Concept of a Mobile Navigational Aid for Visually Impaired Based on the Microsoft Kinect", in Campos, P., Graham, N., Jorge, J., Nunes, N., Palanque, P., Winckler, M. (eds.) INTERACT 2011, Part IV, LNCS, vol. 6949, pp. 584-587. Springer, Heidelberg, 2011.

[9] Intelligent Wheelchair @ FU Berlin - Smart Wheelchair Assistance Technology. 2013. [Online] Available at: http://userpage.fu-berlin.de/latotzky/wheelchair/ [Accessed 10 November 2013].

[10] Autonomous Kinect Electric Wheelchair at IFA 2011. 2013. [Online] Available at: http://www.gottabemobile.com/2011/09/03/kinect-autonomous-wheelchair/ [Accessed 10 November 2013].

[11] W. Song, X. Guo, F. Jiang, S. Yang, G. Jiang, and Y. Shi, "Teleoperation Humanoid Robot Control System Based on Kinect Sensor", in Proc. of IHMSC, 2012, pp. 264-267.

[12] G. Du, P. Zhang, J. Mai, and Z. Li, "Markerless Kinect-Based Hand Tracking for Robot Teleoperation", International Journal of Advanced Robotic Systems, vol. 9, 2012.

[13] R. K. Megalingam and A. P. K. Anandkumar Mahadevan, "Kinect Based Humanoid for Rescue Operations in Disaster Hit Areas", International Journal of Applied Engineering Research, vol. 7, no. 11, 2012.

[14] BBC News - Microsoft Kinect-powered robot to aid earthquake rescue. 2013. [Online] Available at: http://www.bbc.co.uk/news/technology-12559231 [Accessed 10 November 2013].

[15] Y.-J. Chang, S.-F. Chen, and J.-D. Huang, "A Kinect-based system for physical rehabilitation: A pilot study for young adults with motor disabilities", Research in Developmental Disabilities, vol. 32, no. 6, pp. 2566-2570, 2011.

[16] B. Lange, C.-Y. Chang, E. Suma, B. Newman, A. S. Rizzo, and M. Bolas, "Development and evaluation of low cost game-based balance rehabilitation tool using the Microsoft Kinect sensor", in Proc. IEEE Int. Conf. Eng. Med. Biol. Soc., 2011, pp. 1831-1834.

[17] L. Gallo, A. P. Placitelli, and M. Ciampi, "Controller-free exploration of medical image data: Experiencing the Kinect", in Proc. IEEE International Symposium on Computer-Based Medical Systems (CBMS 2011), June 2011, pp. 1-6.

[18] Xbox Kinect in the hospital operating room - Your Health Matters. 2013. [Online] Available at: http://health.sunnybrook.ca/sunnyview/xbox-kinect-hospital-operating-room/ [Accessed 10 November 2013].

[19] M. N. K. Boulos, B. J. Blanchard, C. Walker, J. Montero, A. Tripathy, and R. Gutierrez-Osuna, "Web GIS in practice X: a Microsoft Kinect natural user interface for Google Earth navigation", International Journal of Health Geographics, vol. 10, no. 1, p. 45, 2011.

[20] H. Richards-Rissetto, F. Remondino, G. Agugiaro, J. von Schwerin, J. Robertsson, and G. Girardi, "Kinect and 3D GIS in Archaeology", in F. Ferrise (ed.) 18th International Conference on Virtual Systems and Multimedia – Proceedings, pp. 331-337.

[21] L. Cruz, D. Lucio, and L. Velho, "Kinect and RGBD images: Challenges and applications", in Proc. of the 25th SIBGRAPI Conference, Aug. 2012, pp. 36-49.

[22] University of California - UC Newsroom | Researchers turn Kinect game into a 3D scanner. 2013. [Online] Available at: http://www.universityofcalifornia.edu/news/article/26032 [Accessed 10 November 2013].

[23] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, and A. Davison, "KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera", in Symposium on User Interface Software and Technology (UIST), 2011.

[24] I. Tashev, "Recent advances in human-machine interfaces for gaming and entertainment", Int. J. Inform. Technol. Security, vol. 3, no. 3, pp. 69-76, 2011.

[25] MSDN Blogs. 2013. "The New Generation Kinect for Windows Sensor is Coming Next Year". [Online] Available at: http://blogs.msdn.com/b/kinectforwindows/archive/2013/05/23/the-new-generation-kinect-for-windows-sensor-is-coming-next-year.aspx [Accessed 10 November 2013].

[26] M.-J. Yoo, J.-W. Beak, and I.-K. Lee, "Creating musical expression using Kinect", in Proc. New Interfaces for Musical Expression, Oslo, Norway, 2011.

[27] H.-m. J. Hsu, "The potential of Kinect in education", International Journal of Information and Education Technology, vol. 1, no. 5, pp. 365-370, 2011.

[28] 2 Kinects 1 Box - YouTube. 2013. [Online] Available at: http://www.youtube.com/watch?v=5-w7UXCAUJE [Accessed 10 November 2013].

[29] J. Giles, "Inside the race to hack the Kinect", The New Scientist, vol. 208, no. 2789, pp. 22-23, 2010.

[30] H. Benko, R. Jota, and A. Wilson, "MirageTable: freehand interaction on a projected augmented reality tabletop", in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12), ACM, New York, NY, USA, pp. 199-208.

[31] Innovation - Xbox.com. [Online] Available at: http://www.xbox.com/en-US/xbox-one/innovation [Accessed 03 December 2013].