Pattern Recognition through the Kinect Interface: A Character Drawing Application

Student: Bin Gao
Faculty Sponsor: Shi-Kuo Chang
April 29, 2011

Contents

1 Aim of the Paper
2 Goal
3 Background
4 Scenario
5 Initial Requirements
6 Conceptual Design
7 Finished Program
8 Environment and Driver
9 Data Processing
10 Training Phase and Recording
11 Conclusion

1 Aim of the Paper

This paper aims to provide a comprehensive but concise summary of the capstone project on the Kinect interface. Its focus is on how the project was done rather than on how I did it. It covers the initial requirements, the design phase, the features of the program, and the implementation details. For more detailed information about the implementation, readers should consult the relevant source code, which is available as an open source project.

2 Goal

The goal of this project is to recognize gestures from the Kinect interface and to use them to change stage scenes and to support human/stage-scene interactions.

3 Background

The play Love for a Fallen City, based on a short story by Shi-Kuo Chang, will be staged in Spring 2012 at the National Theatre of Taiwan. Since the new Kinect interface, originally designed for the Xbox 360, can be used to change stage scenes and to support actor/stage-scene or actor/audience interactions, a project was initiated to use the Kinect interface to implement an interactive scenario.

4 Scenario

This project focuses on one scene of Act 2 of the play, titled Boring Boxing by a Bored Boxer. In the scene, the actor first draws the character meaning “bored”, then points at a “gate” and draws a “heart” to create the same character, then draws a “gate” and points at the “heart” to create the same character, and so on.

5 Initial Requirements

(1) Based upon a switch variable, the component can either deal with live input from the Kinect sensors or read from a stored Kinect file.

(2) In the training phase the actor/user will extend his or her arm so that the component can estimate the threshold that distinguishes gestures from writing strokes. (It may be necessary to have additional parameters that can be set/adjusted by the actor/user.)

(3) Recognize the gestures for the full character (left), the heart (middle), and the door-frame (right).

(4) Recognize the writing strokes.

6 Conceptual Design

Figure 1: Two Phases Control

The initial requirements demand that two types of gestures be recognized. The first type consists of command gestures, through which the actor can interactively change the objects placed on the screen. The second type consists of writing strokes, through which the actor can draw characters on the screen. Based on these two requirements, the program is designed to receive gestures from two regions defined by their distances to the camera.

As shown in Figure 1, the space in front of the camera is divided into three regions according to the Command Threshold and the Stroke Threshold. The region between the Stroke Threshold and the camera is called the writing region, in which the actor can place writing strokes. The region between the Command Threshold and the Stroke Threshold is called the command region, in which the actor can perform command gestures. Any gestures placed farther from the camera than the Command Threshold are not received.

The rendering of the writing strokes on the screen is designed to be as smooth and natural as possible. The visual effect of the character writing performed by the actor should be close to the feeling of watching a traditional Chinese calligraphy performance. The idea is to emphasize the materiality of the character and the physical act of its creation.
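To make the region partition concrete, the short C# sketch below classifies a single depth reading against the two thresholds. It is only an illustration of the idea: the enum, class, and method names are invented for this sketch and are not taken from the project's source code.

    // Illustrative only: classify one depth value (distance from the camera)
    // into the three regions defined by the two thresholds.
    public enum Region { Writing, Command, Ignored }

    public static class RegionClassifier
    {
        // Assumes strokeThreshold < commandThreshold, both in raw depth units.
        public static Region Classify(int depth, int strokeThreshold, int commandThreshold)
        {
            if (depth <= 0 || depth > commandThreshold)
                return Region.Ignored;      // zero readings and anything beyond the Command Threshold
            if (depth < strokeThreshold)
                return Region.Writing;      // closest region: writing strokes
            return Region.Command;          // between the two thresholds: command gestures
        }
    }

With such a partition, the same hand produces writing strokes when held close to the camera and command gestures when pulled back past the Stroke Threshold.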

7 Finished Program

Figure 2: The User Interface

As shown in Figure 2, the main user interface is divided into two areas, the data visualization area and the control area. The data visualization area is further divided into three parts:

• The raw data visualization part in the upper left corner, which provides a visual presentation of the raw data captured from the depth camera.

• The analyzed data visualization part in the lower left corner, which shows only the relevant gestures contained in the raw data.

• The character drawing part to the right, which shows the character drawing process that will be shown on the screen.

The control area contains various buttons used to control the program:

• The Begin Painting button connects the depth camera to the program and begins receiving data from the camera.

• The Reset Paint button clears the character drawing part. The function can also be triggered by the two-hands-down command gesture.

• The Save Image button saves the current image in the character drawing part to a properly formatted file.

• The Open Image button opens an image saved in a file.

• The Recording File textbox changes the file name of the current recording file.

• The Start Recording button starts and stops recording. All the recorded raw depth data are stored in the current recording file set in the Recording File textbox.

• The Get Depth From Recording button plays the current recording file.

• The Stroke Threshold textbox changes the Stroke Threshold in the program. The same function can also be performed in the training phase of the program.

• The Command Threshold textbox changes the Command Threshold in the program. The same function can also be performed in the training phase of the program.

• The Save Thresholds button saves the current thresholds in the default properties file, so that the values persist across program executions.

• The Status Bar displays the current Stroke Threshold, the current Command Threshold, the gesture being recognized, the Minimal Distance, and other command prompt information.

The available command gestures are:

• Hand to the middle left, displaying the full character meaning “bored”.

• Hand in the middle of the screen, displaying the heart.

• Hand to the middle right, displaying the door-frame.

• Two hands up, beginning the training phase.

• Two hands down, clearing the character drawing part of the program.

The actor can use the two-hands-up command gesture to trigger the training phase. He or she is then allowed 5 seconds to extend his or her arms to set the Stroke Threshold. After the Stroke Threshold is set, the actor is allowed another 5 seconds to extend his or her arms to set the Command Threshold. After the Command Threshold is set, the thresholds take effect in the program. Any threshold set outside the valid range is reset to its default value.

The recording function is best performed by two people, one controlling the program and the other performing gestures. After the depth camera is connected, the Start Recording button is enabled. When the Start Recording button is clicked, the program begins recording the raw data and storing it in the current recording file. The controller of the program can click the same button again to stop the recording. After 300 frames of raw data have been stored, the program stops the recording automatically, regardless of whether the controller has stopped it manually. 300 frames of raw data occupy about 500 megabytes of disk space.

Note that if the program is initially set to play recording files after launch, it can still connect to the camera afterwards. The operations are not reversible, however: once the program is connected to the camera, it needs to be restarted in order to play recording files.

All the source code can be checked out from https://kinect.googlecode.com/svn/branches/Depth. The version of the program described in this paper is revision 62.
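As a rough illustration of how the command gestures listed above could be turned into program actions, the sketch below maps combinations of covered trigger areas to commands. The command values mirror the five gestures, but the flag names and the mapping method are invented for this sketch and do not reproduce the GetCommand implementation described in Section 9.

    using System;

    // Illustrative sketch: map combinations of covered trigger areas to commands.
    [Flags]
    public enum TriggerArea
    {
        None = 0, MiddleLeft = 1, Middle = 2, MiddleRight = 4, BothHandsUp = 8, BothHandsDown = 16
    }

    public enum StageCommand { None, ShowFullCharacter, ShowHeart, ShowDoorFrame, BeginTraining, ClearDrawing }

    public static class CommandMapper
    {
        public static StageCommand Map(TriggerArea covered)
        {
            // Two-hand gestures take priority over single-hand position gestures.
            if (covered.HasFlag(TriggerArea.BothHandsUp)) return StageCommand.BeginTraining;
            if (covered.HasFlag(TriggerArea.BothHandsDown)) return StageCommand.ClearDrawing;
            if (covered.HasFlag(TriggerArea.MiddleLeft)) return StageCommand.ShowFullCharacter;
            if (covered.HasFlag(TriggerArea.Middle)) return StageCommand.ShowHeart;
            if (covered.HasFlag(TriggerArea.MiddleRight)) return StageCommand.ShowDoorFrame;
            return StageCommand.None;
        }
    }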


8 Environment and Driver

The program is written in C# in the Visual Studio 2010 environment. It uses the CL NUI Platform driver, which can be found at http://codelaboratories.com/nui. The CL NUI Platform driver is written in C and supplies raw data captured from the depth camera. In order to use the C driver in the C# environment, its functions need to be marshalled into the managed environment. The detailed C-library-to-C# importing process can be found in CLNUIDevice.cs in the source code. There are other drivers, such as the driver/analyzer bundle of OpenNI and PrimeSense, that can provide analyzed data to the program. Despite the readiness of the analyzed data, its quality is unverifiable, so these drivers are not used in the current version of the program.
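As an illustration of the marshalling step, the fragment below shows the general P/Invoke pattern for calling a native C driver from C#. The DLL name and the exported functions here are placeholders only; the actual entry points and signatures used by the program are those declared in CLNUIDevice.cs.

    using System;
    using System.Runtime.InteropServices;

    // Illustrative P/Invoke pattern only. The real DLL name, entry points and
    // signatures are the ones declared in CLNUIDevice.cs, not the ones below.
    public static class NativeDepthDriver
    {
        // Hypothetical export: opens the depth camera and returns an opaque handle.
        [DllImport("ExampleNuiDriver.dll", CallingConvention = CallingConvention.Cdecl)]
        public static extern IntPtr ExampleCreateCamera(int cameraIndex);

        // Hypothetical export: copies one raw depth frame into a caller-supplied buffer.
        [DllImport("ExampleNuiDriver.dll", CallingConvention = CallingConvention.Cdecl)]
        [return: MarshalAs(UnmanagedType.Bool)]
        public static extern bool ExampleGetDepthFrame(IntPtr camera, IntPtr buffer, int timeoutMs);
    }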

9 Data Processing

Figure 3: The Raw Data

As shown in Figure 3, the raw data captured from the depth camera are represented as a two-dimensional array of integers. The raw data are processed in the program through multiple steps. Most of the data processing methods reside in the DepthData.cs file. A hedged sketch of the first two steps follows the list below.

• The ImageToDepthField method copies the raw data stored in the InteropBitmap source to a one-dimensional array based on a stride value of 4.

• The UnprojectDepths method converts the one-dimensional array into its corresponding two-dimensional array format based on the predefined row-number and column-number constants.

• The UnprojectDepthLevels method is the main method used to process the raw data. The first step is to calculate the Minimal Distance. The second step is to filter out the background noise: distance values greater than the Command Threshold, values greater than the Minimal Distance plus the Range Value, and 0 values. The third step is to identify writing stroke data, which are distance values less than the Stroke Threshold. The fourth step is to identify command gesture data, which are distance values between the Stroke Threshold and the Command Threshold.

• The GetCommand method converts gesture data into commands. The first step is to identify which command triggering areas are covered by gestures. The second step is to output commands based on the different combinations of covered command triggering areas.
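The sketch below illustrates the first two steps of this pipeline: flattening the bitmap source into a one-dimensional array and reshaping it into rows and columns. The 640 x 480 frame size and the treatment of each 4-byte pixel as one integer depth value are assumptions made for the sketch; the real constants and method bodies are in DepthData.cs.

    using System;

    // Illustrative sketch of the first two processing steps. The frame size and
    // the 4-bytes-per-pixel interpretation of the stride are assumptions.
    public static class DepthLayout
    {
        public const int Rows = 480;    // assumed frame height
        public const int Cols = 640;    // assumed frame width
        public const int Stride = 4;    // assumed bytes per pixel in the bitmap source

        // Step 1 (cf. ImageToDepthField): flatten the pixel buffer into a
        // one-dimensional array of depth values, stepping by the stride.
        public static int[] ToDepthField(byte[] source)
        {
            var field = new int[Rows * Cols];
            for (int i = 0; i < field.Length; i++)
                field[i] = BitConverter.ToInt32(source, i * Stride);
            return field;
        }

        // Step 2 (cf. UnprojectDepths): reshape the one-dimensional array into a
        // two-dimensional grid so neighbouring pixels can be compared by row and column.
        public static int[,] ToGrid(int[] field)
        {
            var grid = new int[Rows, Cols];
            for (int r = 0; r < Rows; r++)
                for (int c = 0; c < Cols; c++)
                    grid[r, c] = field[r * Cols + c];
            return grid;
        }
    }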

10 Training Phase and Recording

The implementation of the training phase resides in DepthVM.cs, the main class file. The Training method begins the training phase and triggers the DispatcherTimer; the subsequent operations are implemented in the SetThresholds method. The implementation of the recording function resides in the Recorder.cs class file. The recording function consecutively writes frames of raw data into the current recording file.
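A minimal sketch of such a frame recorder is given below, assuming each raw frame arrives as a byte array. The class and member names are invented for the sketch and do not reproduce Recorder.cs; only the 300-frame limit comes from the program's described behaviour.

    using System;
    using System.IO;

    // Illustrative frame recorder: appends raw depth frames to one recording file
    // and stops accepting frames after a fixed limit, mirroring the program's
    // automatic stop after 300 frames. Names are invented for this sketch.
    public sealed class FrameRecorder : IDisposable
    {
        private const int MaxFrames = 300;
        private readonly BinaryWriter writer;
        private int framesWritten;

        public FrameRecorder(string recordingPath)
        {
            writer = new BinaryWriter(File.Open(recordingPath, FileMode.Create));
        }

        public bool IsFull
        {
            get { return framesWritten >= MaxFrames; }
        }

        // Writes one raw frame; returns false once the frame limit has been reached.
        public bool WriteFrame(byte[] rawFrame)
        {
            if (IsFull) return false;
            writer.Write(rawFrame);
            framesWritten++;
            return true;
        }

        public void Dispose()
        {
            writer.Close();
        }
    }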

11 Conclusion

The character drawing application fulfills all the initial requirements and consolidates them into a self-contained program. The analysis of the raw data allows new gestures to be added to the program readily. The source code is structured to facilitate the further incorporation of new drivers and new features. This application explores and verifies possible ways to implement interactive scenarios for a stage play.
