Gesture Recognition in Smart Home Using Passive RFID Technology

Gesture Recognition in Smart Home Using Passive RFID Technology Kevin Bouchard, Abdenour Bouzouane and Bruno Bouchard LIARA Laboratory Universite du Quebec a Chicoutimi (UQAC) Chicoutimi, G7H 2B1, Canada

{kevin.bouchard, bruno.bouchard, abdenour.bouzouane}@uqac.ca

ABSTRACT
Gesture recognition is a well-established topic of research that is widely adopted for a broad range of applications. For instance, it can be exploited to command a smart environment without any remote control unit, or to recognize human activities from a set of video cameras deployed at strategic positions. Many researchers working on assistive smart homes, such as our team, believe that the intrusiveness of that technology will prevent the future adoption and commercialization of smart homes. In this paper, we propose a novel gesture recognition algorithm that is solely based on passive RFID technology. This technology enables the localization of small tags that can be embedded in everyday objects (a cup or a book, for instance) while remaining non-intrusive. However, until now, this technology has been largely ignored by researchers on gesture recognition, mostly because it is easily disturbed by noise (metal, humans, etc.) and offers limited precision. Despite these issues, localization algorithms have improved over the years, and our recent efforts resulted in a real-time tracking algorithm with a precision approaching 14 cm. With this, we developed a gesture recognition algorithm able to perform segmentation of gestures and prediction on a spatio-temporal data series. Our new model, exploiting work on qualitative spatial reasoning, achieves a recognition rate of 91%. Our goal is to ultimately use that knowledge for both human activity recognition and error detection.

Categories and Subject Descriptors D.2.11 [Software Architectures]: Data abstraction, Domain-specific architectures; H.1.2 [User/Machine Systems]; J.3 [Life and Medical Sciences]: Health.

General Terms Algorithms, Reliability, Experimentation, Human Factors.

Keywords Gesture recognition; passive RFID; smart home; trilateration; spatio-temporal series.

1. INTRODUCTION
The ageing of the population in occidental countries has brought many social and economic challenges that require researchers and professionals to design innovative solutions. With the advances of ambient intelligence, opportunistic networks and the miniaturization of technology, it is now possible to create enhanced, pervasive environments able to provide support services to one or many residents [1]. This idea could prolong the life at home of people with a loss of autonomy (e.g. cognitive impairment caused by Alzheimer's disease) by simply making their home smarter. There are many challenges toward the full implementation of an assistive smart home, such as the classical problem of Human Activity Recognition (HAR) [2].

HAR is the concept of associating raw information coming directly from a large number of sensors of various types (research often focuses on certain types) with the basic actions constituting an Activity of Daily Living (ADL) [3]. The higher the granularity we achieve, the greater the services can be [4]. Many researchers think that the key to better HAR is a better exploitation of information using the particular properties of its different natures (spatial, temporal, etc.). For example, Jakkula & Cook [5] exploited the temporal relationships between events created by the triggering of sensors, and Augusto et al. designed a spatio-temporal inference engine for smart homes [6]. Gesture recognition has always been seen as a method to design intelligent and efficient Human-Computer Interfaces (HCI) and was exploited for that purpose in smart home research [7]. We think that, in a broad sense, gestures give interesting information on the ongoing activity and could be exploited for HAR. Moreover, they could even tell whether a particular step of an activity is performed correctly, and thus help us establish the current condition of the human resident. Gesture recognition in pervasive environments is mostly performed with video cameras [8] or accelerometers [9]. However, in the context of smart home assistance of persons with a cognitive deficit, invasiveness must be limited for the technology to be accepted [10], and in the case of some diseases, such as Alzheimer's, it can even worsen the state of the resident. On the other hand, the problem of gesture recognition based on accelerometers is that the person needs to wear a special item at all times. In this paper, we propose an efficient and accurate gesture recognition method based on passive RFID technology, which relies on the objects used by the resident rather than on invasive wearable technology. RFID technology possesses many advantages for our precise context. It is cheap (a passive tag costs only a few pennies), but also non-intrusive. Passive tags can basically be placed on any object of a smart home due to their small size. However, because tags do not possess any power source, the signal can vary a lot and is very noisy. That is why localization from the Received Signal Strength Indication (RSSI) of passive tags, despite being widely covered [11-13], still remains a difficult task in environments that are noisy (metal, humans, etc.) and that cannot be modified for that purpose. In the remainder of this paper, we first discuss the literature on both passive RFID localization and gesture recognition (section 2). In section 3, we describe the elliptic trilateration method that was used to track the smart home objects in real time. Section 4 presents the model developed to recognize gestures under noisy and imprecise conditions. Section 5 details the experiments that were conducted in a real smart home context and discusses the results that were obtained. Finally, section 6 assesses the advantages and limitations of the new model. A foreword on future work is given.

2. RELATED WORK
Gesture recognition is a well-established field of research that traditionally focuses on HCI [8]. A gesture is widely described and recognized as an expressive and meaningful body motion (hand, face, arms, etc.) that conveys a message or, more generally, embeds important information of a spatio-temporal nature. Gestures are ambiguous and incompletely specified, since a multitude of conceptual information can be mapped to one gesture. The usual steps to perform gesture recognition from spatio-temporal data series are the following:

1. Segmentation
2. Filtering of the data
3. Limiting directions
4. Matching

In many cases, however, the segmentation step is ignored, either because the user specifies the start and the end of a gesture with a device, or simply because it is assumed to be known. A difficult challenge of segmentation is the support of gestures of varying length interleaved with short to long inactive periods. The filtering is a straightforward step consisting in standardizing the data (time, format, etc.) and compensating for missing data. The step of limiting the directions is generally very different from one application to another, but research really focuses on the final matching step. To do so, many approaches are based on statistical modeling such as Hidden Markov Models (HMMs) [14], Kalman filtering or other particle filters [15]. For instance, Samaria & Young [16] exploit HMMs to efficiently extract facial expressions from a single camera. The HMM is a double stochastic process with a finite number of states and a set of random functions associated with each state. Each transition between states has a pair of probabilities (the transition and the output probabilities). The model is said to be hidden because all that can be seen is a set of observations. The reasoning corresponds to the process of finding the HMM with the highest probability of explaining that set of observations. It is generally required to design and train one HMM per gesture that we desire to recognize [7]. Gesture recognition from particle-filter-based tracking is also very popular [8, 17]. For instance, Shan et al. [15] combined the technique with Mean Shift to perform real-time hand tracking. Their algorithm, named Mean Shift Embedded Particle Filter (MSEPF), was tested on a 12 fps camera stream with a 240x180 pixel resolution. They showed that their method could robustly track a hand to recognize gestures. Particle filters are very effective in estimating the state of dynamic systems from sensor information. The key idea of these filters is to approximate the probability density distribution by a weighted sample set. Finally, a large number of gesture recognition approaches effectively exploited Finite State Machines (FSMs). For instance, Hong et al. [18] exploited spatial clustering to learn a set of FSMs corresponding to gestures. The idea was to learn the data distributions without the temporal information at first. The clustering extracted the different states to be used for an FSM. A second phase aligned the order of those states by exploiting the temporal information. They tested their approach using four sample gestures performed in front of a video camera. They achieved a hundred percent recognition rate, but admit that with a very noisy data sample the recognition would fail. As described in section 4, we used FSMs for the gesture modeling, but our approach is not dependent on any of these methods (we could easily switch to HMMs).

The main problem with the models from the literature, for our precise context of passive RFID in a smart home, is the many strong assumptions that are made. First of all, it is often assumed that obtaining the basic direction of the movement is straightforward. It is not the case with passive RFID tracking. Secondly, it is assumed that the amount of noise is not a problem (or that there is simply no noise). Thirdly, segmentation is often not an issue in an HCI context; therefore, few models address it. Finally, they generally suppose that the user is cooperative, i.e. an intended recognition context. In our case, the recognition is done unbeknownst to the user (keyhole context), which means fewer definite gestures.

2.1 RFID Localization & Gestures
While gesture recognition with passive RFID technology is mostly absent from the literature on gesture recognition, a lot of researchers have worked toward the localization of tagged entities in real time. A substantial part of this work arises directly or indirectly from the well-known LANDMARC system [11]. That system introduced the concept of localization from reference tags placed at strategic known locations. The approach of Vorst et al. [12] is one of them. Their model uses passive RFID tags and an onboard reader to localize mobile objects in an environment. A prerequisite learning step is required to define a probabilistic model. This model, exploited with a particle filter (PF), achieves a precision of 20-26 cm. Another model, from Joho et al. [13], uses reference tags in combination with different metrics. In particular, it is based on both the RSSI and the antennas' orientation to reach an average localization error of 35 cm. The main problem of the literature on localization is that models largely rely on a large deployment of reference tags. While this is a fairly good solution for robot localization, it is not in a smart home context, where we want to limit the intrusiveness and the modifications required by sensors. The approach of Chen et al. [19] is one of the few that exploits trilateration calculus (but with a different radio-frequency technology). They implemented a fuzzy inference engine with one variable that correlated the RSSI of an object transmitter with the distance separating it from a receiver. They achieved a precision of 119 cm.

2.1.1 Gesture Recognition Using RFID
Due to the inherent difficulty of localizing objects with passive RFID technology, we found only one team of researchers that tried to tackle the challenge of gesture recognition with this technology. The team of Asadzadeh et al. [20] investigated the problem with a partitioning localization technique combined with reference tags. With three antennas on a desk, they monitored an 80 cm by 80 cm area, which was divided into 64 equally sized square cells (10 cm by 10 cm). To recognize a gesture drawn by a user, they make a few assumptions on the sequence of traversed cells. First, the system is fast enough to never miss any cell of the sequence; that is, the tracked object cannot move farther than one cell away between two readings. Second, they assume that only forward local moves are possible. Fig. 1 below shows legal (a) and illegal (b-c) moves.

Fig. 1. (a) legal move, (b) (c) illegal moves.

From the sequence of crossed cells, their algorithm generates a list of hypotheses by developing the possibilities into a tree structure. Next, a gesture matcher, GESREC, looks up a dictionary and finds the gesture that best matches the sequence. Their algorithm cannot recognize two consecutive gestures (no segmentation) but works well (93% recognition) on a dictionary of twelve gestures. Their work showed that there is potential for gesture recognition with passive RFID. However, their assumptions make it difficult to apply their system to a smart home, which requires more flexibility. Also, they work in an intended recognition case, while we perform keyhole recognition. In section 5.3, we compare our model to theirs in more detail.

3. RFID LOCALIZATION
The first thing to accomplish in order to recognize gestures with passive RFID technology is the localization of one or many objects in real time. To do so, we exploited a slightly modified version of our previously published algorithm [21], which relies on the Received Signal Strength Indication (RSSI). The goal of this paper is not to contribute to the problem of localization, which is already extensively covered in the literature, but since it is an important difficulty for gesture recognition with passive RFID, we will discuss some of the challenges and the methods we used for this particular work. Our localization algorithm is basically an enhanced trilateration method which implements a few filters to correct problems that cause imprecision in the position estimation. Keep in mind that we could have exploited any other localization algorithm with our gesture recognition method, but ours is well adapted to the smart home context and to the localization of objects.

3.1 Signal Preprocessing
The challenge of localizing tagged objects in real time can be divided into two parts: the preparation and the localization. A first step that is often neglected is the configuration and installation of the basic tags. It is important to first understand that, even among tags that are technically identical, the sensitivity is sometimes very different. To address that problematic behavior, we test every tag before installing it on our objects and eliminate those that are too far from the average sensitivity. Another important aspect to take into consideration is the number of tags to put on the tracked object. A major issue of localization is a bad angle of arrival of the radio waves on a tag. Therefore, covering more angles should ensure a better quality of information. Since our tags are cheap, we prefer to put four tags at different angles on a tracked object whenever it is possible. Our algorithm is developed to support a variable number of tags and combines the retrieved information to improve its calculation.

3.1.1 False Readings
Due to the nature of RFID operation, a common problem is false-negative readings (FNRs) or even, sometimes, false-positive readings (FPRs). This problem is a hindrance for many applications such as gesture recognition, so it is a good idea to limit its effect as much as possible. To address false readings, we propose an object state function, denoted $S$, whose definition is given in equation (1). This function is based on the iteration number (an iteration being one reading at a fixed interval of time).

$$S(d, i) = \begin{cases} \neg d & \text{if } i - i_f \geq \Delta_{min} \\ d & \text{otherwise} \end{cases} \qquad (1)$$

The function decides whether a tag detection state ($d$), a boolean variable, has changed by subtracting the first detection iteration ($i_f$) of a sequence of readings in the opposite state from the current iteration ($i$) and comparing the result with a threshold $\Delta_{min}$. The $\Delta_{min}$ is the minimum number of iterations the object's state needs to be stable before considering that its detected state has changed. This parameter can be found automatically if one knows the rate of false readings ($r_f$). For example, suppose that $r_f = 0.25$; that means roughly one out of each four readings should not detect the tag. If you aim to keep erroneous state changes sufficiently rare (e.g. below 1%), your $\Delta_{min}$ would be 4, since $r_f^{\Delta_{min}} = 0.25^4 \approx 0.004$.
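As an illustration, here is a minimal Python sketch of such a debouncing rule; the class and parameter names (TagStateFilter, delta_min) are illustrative, not the names of our implementation.

class TagStateFilter:
    """Debounce detection state changes over a fixed number of iterations (sketch)."""

    def __init__(self, delta_min, initial_state=True):
        self.delta_min = delta_min      # minimum stable iterations before switching state
        self.state = initial_state      # current confirmed detection state
        self.first_opposite = None      # first iteration of the current opposite-state run

    def update(self, iteration, detected):
        # A reading that matches the confirmed state resets the opposite-state run.
        if detected == self.state:
            self.first_opposite = None
            return self.state
        # Otherwise track the run of opposite readings.
        if self.first_opposite is None:
            self.first_opposite = iteration
        # Switch only once the run has lasted at least delta_min iterations.
        if iteration - self.first_opposite >= self.delta_min:
            self.state = detected
            self.first_opposite = None
        return self.state

# With a false-reading rate of 0.25, delta_min = 4 keeps spurious switches rare.
f = TagStateFilter(delta_min=4)
readings = [True, True, False, True, False, False, False, False, True]
print([f.update(i, r) for i, r in enumerate(readings)])  # isolated misses are filtered out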

3.1.2 RSSI Instability
Another problem with passive RFID technology is the high degree of variation of the RSSI from one iteration to another. This leads to a signal that is very noisy and unpredictable. This problem can partially be corrected with a Gaussian weighted average. We use it as follows. The bell-shaped curve of the distribution is centered on the current iteration number $i$ (see formula (2)). The parameter $j$ is the iteration number associated with the RSSI record that we are weighting, and the constant $\sigma$ is determined proportionally to the iteration length. Thereafter, the mean weighted RSSI of a tag is computed by making use of formula (3).

$$w(j) = \exp\!\left(-\frac{(j - i)^2}{2\sigma^2}\right) \qquad (2)$$

$$\overline{RSSI}_i = \frac{\sum_j w(j)\,RSSI_j}{\sum_j w(j)} \qquad (3)$$
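A short Python sketch of this weighting, assuming a small history of recent readings and a hand-picked sigma (both the function name and the window are illustrative):

import math

def weighted_rssi(rssi_history, current_iter, sigma=3.0):
    """Gaussian-weighted mean of recent RSSI records, centered on the current iteration.

    rssi_history: dict mapping iteration number -> raw RSSI value.
    """
    num, den = 0.0, 0.0
    for j, rssi in rssi_history.items():
        w = math.exp(-((j - current_iter) ** 2) / (2.0 * sigma ** 2))  # formula (2)
        num += w * rssi
        den += w
    return num / den if den > 0 else None                              # formula (3)

history = {10: -61.0, 11: -72.0, 12: -63.0, 13: -64.0, 14: -62.5}
print(weighted_rssi(history, current_iter=14))  # the outlier at iteration 11 is attenuated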

3.2 Elliptic Trilateration
Trilateration methods have been mostly ignored in the literature on passive RFID positioning because of the imprecision of the technology. However, for the reasons mentioned in section 2, most other methods are not appropriate in the context of assistive smart homes. One problem of trilateration is that most of the time RFID antennas are directional, and the circular equation does not match the sausage-shaped wave propagation. That is why we developed an alternative version using ellipses instead. In order to use it with the RSSI, we have to determine the length of each axis in real time. To do so, we collected a dataset and used polynomial regression to obtain an equation for the major axis (4) and the minor axis (5), of the form:

$$a = \sum_{k} \alpha_k\,RSSI^k \qquad (4)$$

$$b = \sum_{k} \beta_k\,RSSI^k \qquad (5)$$

where $a$ and $b$ are the major and minor axes and the coefficients $\alpha_k$ and $\beta_k$ are fitted on the collected dataset. The correlation coefficients of the regressions were high for both the major and the minor axes.

From these equations, we can create ellipses around the antennas that detect an object. In our setting, the tracked object is generally detected by four antennas, but we can pinpoint its position with only two (on the same wall) or three (on different walls). The idea of the trilateration is simply to compute the intersections of the ellipses to estimate the position. The algorithm has an average precision of 14 cm. For more details on the localization algorithm, please refer to [21]. The main change we made to the localization algorithm was to speed up the data collection from every 200 ms to every 20 ms. Gesture recognition is highly dependent on the amount of data available: the more we have, the easier it becomes to eliminate the noise with various processing steps.
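To illustrate the idea (this is not the exact implementation of [21]), the position can be estimated by intersecting the antenna-centered ellipses in a least-squares sense; the antenna positions, orientations and axis lengths below are placeholder values that would normally come from the installation and from equations (4)-(5):

import numpy as np
from scipy.optimize import least_squares

# Each antenna: (center x, center y, semi-major a, semi-minor b, orientation in radians).
# The axis lengths would be produced by the regression equations (4)-(5) from the RSSI.
antennas = [
    (0.0, 0.0, 1.6, 0.9, 0.0),
    (3.0, 0.0, 1.8, 1.0, np.pi),
    (0.0, 3.0, 1.5, 0.8, -np.pi / 2),
]

def residuals(p):
    x, y = p
    res = []
    for cx, cy, a, b, theta in antennas:
        # Express the point in the ellipse's local frame, then measure how far it is
        # from the ellipse boundary (0 means the point lies exactly on the ellipse).
        dx, dy = x - cx, y - cy
        u = dx * np.cos(theta) + dy * np.sin(theta)
        v = -dx * np.sin(theta) + dy * np.cos(theta)
        res.append((u / a) ** 2 + (v / b) ** 2 - 1.0)
    return res

estimate = least_squares(residuals, x0=[1.0, 1.0]).x
print("estimated position:", estimate)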

4. GESTURE RECOGNITION
In this section, we present a new gesture recognition method that takes as input the coordinates extracted from any localization algorithm. The new method is especially built to be flexible. In particular, one can use the method in another smart environment with a completely different RFID technology simply by updating the average localization error. Moreover, the method is especially designed to support the imprecision of the localization and adapts automatically to the context. Fig. 2 depicts the overall method presented throughout this section. Before reading the description of our model, the reader should keep in mind that the recognition of gestures in our context is not aimed at controlling or interacting with the computer system as in the literature. We are trying to achieve recognition of gestures to obtain new information that could be used for human activity recognition, error identification, service delivery, etc. Moreover, we need to stress that we chose to recognize gestures from the movement of tagged objects especially for our context, but our method should be exploitable with a wearable RFID bracelet.

A first important thing to notice about our new method is the distinction between composite and atomic gestures. The literature often prefers not to distinguish between those two since, in many cases (e.g. human-computer interfaces), the atomic gestures are known or directly observable. In our imprecise context, recognizing these atomic gestures is as hard as recognizing a composite gesture. Another interesting aspect of our gesture recognition method is the exploitation of the qualitative spatial reasoning (QSR) framework of Clementini et al. [22] to represent the data. In their work, they constructed a formal model to express distance and orientation relationships. The model enables various granularity levels, which suits our situation well. Fig. 3 shows three possible configurations. The first one, for example, allows expressing two relations of distance, close (cl) and far (fr), and two relations of direction, left (l) and right (r).

There are many advantages of using a qualitative data representation over a quantitative one. For instance, it reduces the number of possibilities and the complexity of the recognition. However, it is mainly for future applications of the recognized gestures that we opted for a well-established reasoning framework. As stated in the introduction, our long-term goal is to use gesture recognition for HAR and context awareness in order to provide assistive services in a smart environment. A QSR model enables us to complete relationship information, and that would help us in predicting the next steps/actions of our resident. In the remainder of this section, we describe each part of our algorithm and how we integrated the QSR framework of Clementini et al. [22].

Fig. 2. The overall gesture recognition method.

4.1 Data Compensation and Standardization
Before trying to recognize the ongoing gesture, we have to standardize the data obtained from the localization algorithm. It is important to have a fixed rate of incoming positions in order to know the movement of the tagged object. In our case, the rate is one position per 20 ms, or 50 per second. That is what we consider an iteration of our algorithm (which is also an observation point on the infinite spatio-temporal series representing the gesture). It is also sometimes necessary to compensate for missing data through time. Indeed, for some unknown reason, the localization algorithm might not be able to provide the current coordinates of the tracked object. To do so, we perform a linear interpolation such that, if the positions from iteration $n+1$ to $n+k$ are unknown, they can be computed with equations (6)-(7). These equations use the projection of the linear equation on each axis and divide the distance equally by the number of positions that are missing.

$$x_{n+j} = x_n + j\,\frac{x_{n+k+1} - x_n}{k+1} \qquad (6)$$

$$y_{n+j} = y_n + j\,\frac{y_{n+k+1} - y_n}{k+1} \qquad (7)$$
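A minimal sketch of this gap filling, assuming the last known position before the gap and the first known position after it (the function name is illustrative):

def interpolate_gap(p_before, p_after, k):
    """Return k missing positions between two known points, evenly spaced (eqs. 6-7)."""
    (x0, y0), (x1, y1) = p_before, p_after
    step_x = (x1 - x0) / (k + 1)
    step_y = (y1 - y0) / (k + 1)
    return [(x0 + j * step_x, y0 + j * step_y) for j in range(1, k + 1)]

# Two readings are missing between (1.0, 1.0) and (1.6, 1.3):
print(interpolate_gap((1.0, 1.0), (1.6, 1.3), k=2))  # roughly [(1.2, 1.1), (1.4, 1.2)]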


4.2 Atomic Gestures
One can see a gesture as a set of Cartesian positions during a time interval that corresponds to a recognizable pattern. That is, a gesture is a spatio-temporal series. The process of recognizing a gesture corresponds to the matching of a definite number of imprecise observed positions at specific times. The imprecision of RFID technology means not only that the observations can sometimes be off the real position, but also that, during a certain interval, the tracked object might appear to be going in a completely different direction. Fig. 4 shows an example of unlucky observations.

Fig. 4. Example of unlucky observations leading to a false conclusion.

Fig. 3. Three configuration examples for the QSR framework [22].


As we discussed before, our recognition method is divided into two logical parts (atomic and composite gestures). For us, an atomic gesture is the smallest part of a global gesture, one which cannot be further subdivided into a combination of atomic gestures. We chose this strategy since recognizing the direction of an object with passive RFID localization is already challenging. The literature on gesture recognition in software and in HCI often ignores this step, since it assumes (rightly) that it is straightforward to obtain. Therefore, for us, an atomic gesture is the smallest significant basic direction that can be extracted from the noisy positions.

4.2.1 Vector Summation
The first step in the recognition of atomic gestures is to transform the data into a more flexible format. Indeed, rather than representing gestures using absolute Cartesian positions, it is best to characterize them with a format that well represents the movement. For this, the incoming positions are transformed into vectors relative to the previous position. Thus, if $p_i = (x_i, y_i)$ is the latest position, the new information will be the vector given by:

$$\vec{v}_i = p_i - p_{i-1} = (x_i - x_{i-1},\; y_i - y_{i-1}) \qquad (8)$$

The main advantage of using relative information instead of absolute positions is that the system does not depend upon the context (Cartesian space and granularity) to perform the gesture recognition. The second step consists in finding whether there is an ongoing atomic gesture and determining what it is from the incoming data. We first tried to perform this step with mathematical regressions to find the equation that approximates the data. This method worked correctly, but difficulties arise if there are many objects to track in real time. In this paper, we suggest using instead the summation of all collected vectors to determine the current atomic gesture. The method is simple and computation friendly since it means only one addition per iteration, i.e. constant time $O(1)$. The major difficulty that arises is with the atomic gesture segmentation, which is explained in section 4.3.1.
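The two steps can be sketched as follows; the names are illustrative, and the summation simply accumulates the displacement of the ongoing atomic gesture:

def to_relative_vectors(positions):
    """Turn absolute positions into displacement vectors between consecutive readings (eq. 8)."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]

class RunningSum:
    """Keeps the summed displacement of the ongoing atomic gesture, O(1) per iteration."""
    def __init__(self):
        self.vx, self.vy = 0.0, 0.0
    def add(self, v):
        self.vx += v[0]
        self.vy += v[1]
        return (self.vx, self.vy)

positions = [(0.0, 0.0), (0.05, 0.01), (0.11, -0.02), (0.18, 0.01)]
acc = RunningSum()
for v in to_relative_vectors(positions):
    print(acc.add(v))   # the resulting vector grows mostly along the x axis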

4.2.2 Qualitative Spatial Reasoning
The next and final step of the atomic gesture recognition is to associate the extracted vector with the current atomic gesture. The atomic gestures are a number of qualitative directions that are exploited with the well-known qualitative spatial reasoning framework of Clementini et al. [22]. In our implementation, the frame of reference is the origin of the resulting vector. The number of qualitative orientations is set to eight, but the framework is made to scale this easily. These eight choices are {E, NE, N, NW, W, SW, S, SE}, which stand respectively for East, North-East, North, North-West, West, South-West, South and South-East. The distance values are specified with the average error of the localization algorithm ($\epsilon$). Therefore, an atomic gesture is a pair (direction, distance). Fig. 5 shows a sample vector and the atomic gesture it corresponds to.

Fig. 5. A sample vector (in red) in the QSR framework

Additionally, in a smart environment, the tracked objects are often not being used to perform a gesture, so we have to be able to identify this situation. To do so, the object is considered idle whenever the resulting vector is shorter than the average error $\epsilon$. The main advantage of this method is that the algorithm can scale automatically to new smart environments and/or new localization algorithms without needing any learning or configuration. The only thing needed is the average positioning error $\epsilon$. We chose to exploit the QSR framework for three reasons. First, it is easy and intuitive to build composite gestures from a qualitative base. Second, such relationships can be straightforwardly enhanced with fuzzy sets to partially address the inherent imprecision. Third, in the future, we aim to use gestures to help recognize the activities of daily living (ADLs).
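A sketch of this mapping from the summed vector to an atomic gesture, with eight 45-degree sectors and an idle test based on the average error (the epsilon value of 14 cm and the integer distance levels are assumptions of this sketch):

import math

DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def atomic_gesture(vx, vy, epsilon=0.14):
    """Map a resulting vector to (direction, qualitative distance), or Idle if too short."""
    length = math.hypot(vx, vy)
    if length < epsilon:                       # too short to be a deliberate movement
        return ("Idle", 0)
    angle = math.atan2(vy, vx) % (2 * math.pi)
    sector = int((angle + math.pi / 8) // (math.pi / 4)) % 8   # 45-degree sectors
    distance_level = int(length // epsilon)    # distance expressed in multiples of epsilon
    return (DIRECTIONS[sector], distance_level)

print(atomic_gesture(0.30, 0.28))   # -> ('NE', 2)
print(atomic_gesture(0.05, -0.02))  # below epsilon -> ('Idle', 0)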

4.3 Gesture Recognition
In our context, the gesture recognition consists in recognizing a set of atomic gestures and matching them to a composite gesture drawn from a knowledge base. This part is done in two steps: segmentation and gesture matching. An important point must be kept in mind: since the recognition is made in real time (online), the decision is revised at every iteration. In other words, the atomic gesture recognition module keeps in a list the sequence of finished and confirmed atomic gestures, but the latest one can change until the segmentation module confirms that a new gesture has started.

4.3.1 Segmentation
The problem of segmentation is to find the points of rupture in the spatio-temporal series that delimit the individual movements in one direction. To do so, we exploit the average positioning error ($\epsilon$) combined with the direction information. The algorithm monitors the variation of the length ($\Delta d$) of the vector representing the atomic gesture. If the value of $\Delta d$ is increasing on average, the gesture is either the same, or it has changed for a similar direction (a neighbor direction). To validate that, the segmentation module splits the data into two equal sets and recomputes the atomic gesture for both parts (as explained before). If the second part does not result in the same atomic gesture, the algorithm knows a new gesture is ongoing. It then sends a message to the atomic recognition module to make its decision final. The module will then try to find approximately at what iteration the gesture ended (see Algorithm 1). The ProjectionLength function returns the length of the projection of the vector on the center axis of its associated direction.

Input: the set of vectors V[] and the first atomic gesture G1
Output: the iteration of segmentation Si
initialize two temporary atomic gestures TG1 and TG2
initialize an empty table of reals Score[]
for i = length(V)/2 to length(V)
    TG1 = atomic(V[1 to i])
    TG2 = atomic(V[i to length(V)])
    if TG1 has the same direction as G1 then
        p1 = ProjectionLength(TG1, direction(TG1))
        p2 = ProjectionLength(TG2, direction(TG2))
        Score[i] = p1 + p2
    else
        Score[i] = 0
    end
end
Si = position of the highest value in Score[]
return Si

Algorithm 1. The segmentation process.

The second situation is when the distance no longer changes over time (on average). In that case, the segmentation module directly sends a message to the atomic recognition module indicating that the tracked object is now probably idle. Algorithm 1 is used again to determine the end of the gesture. The final situation is when the length of the vector decreases over time. In that case, the new atomic gesture direction is opposite to the latest one. The process is the same again. Finally, if the algorithm wrongly segmented the spatio-temporal series, the next atomic gesture will be the same as the previous one. In that situation, the decision is revised and the segmentation is canceled.
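A runnable sketch of Algorithm 1; atomic() and projection_length() below are simplified stand-ins for the atomic-gesture extraction and the projection described above, and the numerical values are illustrative:

import math

DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]
AXIS_ANGLES = {d: i * math.pi / 4 for i, d in enumerate(DIRECTIONS)}

def atomic(vectors):
    """Direction of the summed vector (simplified stand-in for the atomic-gesture step)."""
    vx = sum(v[0] for v in vectors)
    vy = sum(v[1] for v in vectors)
    angle = math.atan2(vy, vx) % (2 * math.pi)
    return DIRECTIONS[int((angle + math.pi / 8) // (math.pi / 4)) % 8], (vx, vy)

def projection_length(vector, direction):
    """Length of the vector projected on the center axis of its qualitative direction."""
    a = AXIS_ANGLES[direction]
    return vector[0] * math.cos(a) + vector[1] * math.sin(a)

def segment(vectors, g1):
    """Find the iteration where the first atomic gesture g1 most likely ended (Algorithm 1)."""
    n = len(vectors)
    scores = {}
    for i in range(n // 2, n):
        d1, v1 = atomic(vectors[:i])
        d2, v2 = atomic(vectors[i:])
        if d1 == g1:
            scores[i] = projection_length(v1, d1) + projection_length(v2, d2)
        else:
            scores[i] = 0.0
    return max(scores, key=scores.get)

# Ten readings going East, then six going North (values are illustrative).
vs = [(0.05, 0.0)] * 10 + [(0.0, 0.05)] * 6
print(segment(vs, "E"))   # splits at iteration 10, where the direction changes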

4.3.2 Gesture Matching
The final part of our method consists in matching the list of identified atomic gestures to the composite gestures in the dictionary. For this part, the literature proposes a variety of methods developed through years of research [8, 18]. For this work, we preferred to keep that part simple, as it is not the main challenge of gesture recognition from RFID. Once we are able to find basic directions and to perform segmentation, we can rely on standard methods. Our gesture dictionary is a set of finite state machines representing each gesture. The selected ongoing gesture is the state machine that matches the sequence of identified atomic gestures. An alternative method could be to exploit Hidden Markov Models (HMMs) [17] for the dictionary. The advantage would be to represent the differences in probability for each gesture. With both methods, the gesture models could be learned, but in our case it did not seem necessary. Each FSM was directly designed by us, since they are fairly small.
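A compact sketch of such a dictionary and matcher; the mapping of gesture labels to atomic directions is an assumption made for illustration, and a real state machine would also carry idle states and tolerances:

# Each composite gesture is described by the ordered atomic directions its FSM accepts.
GESTURE_DICTIONARY = {
    "R":  ["E"],
    "L":  ["W"],
    "F":  ["N"],
    "B":  ["S"],
    "FR": ["N", "E"],
    "FL": ["N", "W"],
    "BR": ["S", "E"],
    "BL": ["S", "W"],
    "RF": ["E", "N"],
    "RB": ["E", "S"],
    "LF": ["W", "N"],
    "LB": ["W", "S"],
}

def match(atomic_sequence):
    """Return the composite gestures whose state sequence accepts the observed prefix."""
    candidates = []
    for name, states in GESTURE_DICTIONARY.items():
        if atomic_sequence == states[:len(atomic_sequence)]:
            candidates.append(name)
    return candidates

print(match(["N"]))        # ['F', 'FR', 'FL'] -- still ambiguous
print(match(["N", "E"]))   # ['FR'] -- composite gesture identified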

5. EXPERIMENTS
In order to validate our new model, we needed to design experiments that would reflect our ultimate goal of exploiting gesture recognition in a pervasive environment, but that would also allow us to compare our work with the literature. As a first step toward the recognition of gestures with passive RFID technology, we decided to directly reproduce the methodology of Asadzadeh et al. [20], but in an appropriate setting. While they implemented their system on a small table with three antennas and 91 special reference tags, we performed our experiments in a full-size apartment comprising a large number of sensors. This smart environment is depicted in Fig. 6.

We conducted three sets of experiments for this work. The first one used randomly generated gestures with simulated noise. We describe in the next subsection the construction of the simulator and the parameters used to test the method. The two other sets of experiments were conducted by simulating the gestures with a human subject directly in our smart home. In that case, we decided to focus on the kitchen, which comprises four RFID antennas covering approximately 9 m² (depending on their parameters, 1 to 3 m around each of them). These antennas are APATCH-0025 antennas working on the 860-960 MHz band and are circularly polarized for a better indoor GSM coverage. We set the emission speed at 20 ms, and each of the four antennas has a divided time slice. The data collection is, however, simultaneous (every 20 ms). They are working near their maximum sensitivity and emission power. We use class 3 tags, and we put 2 to 4 of them on each object to improve localization (a bad angle of arrival is the worst issue for tracking in real time). The gestures to be recognized by our algorithm are the same twelve as in Asadzadeh et al. [20], plus the idle gesture that we added to represent the fact that, most of the time, the objects do not move. Fig. 7 shows these composite gestures (8 are composed of two atomic gestures).

Fig. 6. The LIARA’s smart home

Fig. 7. Example gestures used for the experiments. Eight are composed of two atomic gestures, four of only one. The last on the picture is Idle.

5.1 Gesture Recognition
For the first set of tests, we developed a gesture generator that simulates a tracked object in an environment. The simulator is based on a set of parameters selected by the user. Amongst them are the reading cycle in ms, the average positioning error, the duration of the gesture (with a random variation or not) and the speed of the object in meters per second. We used the parameters of the experiments of Asadzadeh et al. [20] (gestures averaging 4.5 s at 0.2 m/s). The generator first randomly selects the gesture to generate (or idle). Then, it starts from the latest known real position (the origin if it is the first gesture). Depending on the chosen gesture and on the parameters, it then computes the real location of the object. For instance, starting from the origin at 0.2 m/s, if the chosen gesture is R, the object would be at (0.2, 0) after 1 second. The next step is to add noise to the position. To do so, it simply uses the entered average error $\epsilon$ and generates, for each coordinate, a random number ranging from $-\epsilon$ to $+\epsilon$; in our case, the value is between -14 cm and +14 cm. We let our generator run for around 2000 randomly generated gestures, and we obtained very positive results (94% success). Table 1 details the results that were obtained from this set of tests. The most important thing to notice is that, 63% of the time when our algorithm does not take the right decision, it classifies the gesture as Idle. That is, the main difficulty is not to distinguish among gestures, but to know whether it is simple noise or a gesture. It happens when we are unlucky, as a consequence of the added random noise (see Fig. 4).
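A small sketch of such a generator for a single gesture, with uniform noise in [-epsilon, +epsilon] added to each coordinate (parameter names and direction vectors are illustrative):

import random

DIRECTION_VECTORS = {"R": (1, 0), "L": (-1, 0), "F": (0, 1), "B": (0, -1)}

def generate_gesture(direction, speed=0.2, duration=4.5, period=0.02, epsilon=0.14,
                     start=(0.0, 0.0)):
    """Simulate noisy readings of an object moving in one direction (one reading per period)."""
    dx, dy = DIRECTION_VECTORS[direction]
    readings = []
    steps = int(duration / period)
    for i in range(1, steps + 1):
        t = i * period
        true_x = start[0] + dx * speed * t       # true position after t seconds
        true_y = start[1] + dy * speed * t
        noisy = (true_x + random.uniform(-epsilon, epsilon),
                 true_y + random.uniform(-epsilon, epsilon))
        readings.append(noisy)
    return readings

trace = generate_gesture("R")
print(len(trace), trace[-1])   # 225 readings, ending near (0.9, 0.0)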

Gesture          Idle    F    B    L    R   LF   LB   BR   RB   FL   FR   BL   RF   Total
True Positive     184  161  136  138  127  162  142  139  126  151  155  139  142    1902
False Negative      7    5    0    8    3    4   13   14   14   13    9   13   14     117
False Positive     74    1    2    1    1   12    7    3    3    1    6    5    1     117

Table 1. The results obtained from the simulation.

5.2 Experimental Protocol
Since our first set of tests was positive, we decided to implement the algorithm directly in the smart home and test it with a human subject. The protocol was similar to that of Asadzadeh et al. [20]. We asked the human subject to perform the gestures in the order they are presented in Fig. 7, and we verified at the same time whether the Idle time in between was recognized. However, in their work, Asadzadeh et al. [20] performed gestures at an average of 0.2 m/s for 4.5 s, which gives traveling distances of around 90 cm. We chose to reduce that average to 80 cm to fit our kitchen counter. Moreover, we decided to perform our tests twice. In the first run, we asked the human subject to take approximately 4 s per gesture (the subject had to estimate the duration himself). In the second set of tests, we wanted to see if we could track rapid gestures, so we asked the subject to perform the gestures in less than 2 seconds. We did not discuss it before, but since our algorithm formulates a hypothesis at each iteration, it also needs a way to take a final decision for the evaluation process. We could have simply chosen to take the last decision as the final gesture, but if the segmentation is not done at the right time, that could lead to a bad choice at the end. We decided to use a Gaussian weighting function that gives more weight to the latest decisions while still averaging over all of them. Using this decision process and the methodology described, we obtained the following results from performing each gesture 10 times (Table 2).
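One possible sketch of this final decision, where each per-iteration hypothesis is weighted by a bell-shaped function of its age (the sigma value is an assumption):

import math
from collections import defaultdict

def final_decision(hypotheses, sigma=10.0):
    """Weighted vote over per-iteration hypotheses; later decisions weigh more."""
    last = len(hypotheses) - 1
    scores = defaultdict(float)
    for i, gesture in enumerate(hypotheses):
        # Bell-shaped weight centered on the latest iteration.
        scores[gesture] += math.exp(-((i - last) ** 2) / (2.0 * sigma ** 2))
    return max(scores, key=scores.get)

# The algorithm hesitated between Idle and F early on, then settled on FR.
print(final_decision(["Idle", "Idle", "F", "F", "FR", "FR", "FR", "FR"]))   # FR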

5.3 Analysis of Experimental Results
The results we obtained are really encouraging for the future of this new research topic. Within a real smart environment, we obtained about 91% recognition accuracy for the 4 s gestures and 78% for the 2 s ones. Moreover, again, most of the time when the decision is incorrect, the sequence is classified as Idle. Another interesting statistic we obtained is that, after half the execution of the gesture, the algorithm is already correct around 80% of the time (for the 4 s gestures). The main reason we obtained slightly lower results for the 2 s gestures is that we have less data and, when unlucky, both the recognition and the segmentation are near impossible to perform correctly. An important point worth mentioning is that, by using the qualitative framework and, more precisely, the average error as a parameter, there is no dependency between the recognition rate and the localization precision. Rather, the dependency is moved to granularity versus localization precision. That is, even with half the precision of the localization algorithm, the recognition rate would have been very similar; however, the granularity of the gestures would have been halved at the same time. While the method is not dependent on the precision, a minimal granularity is required for the recognized gestures to be useful.

The results fare well in comparison with the work of Asadzadeh et al. [20], which to our knowledge was the first work on gesture recognition with passive RFID. They did obtain 93% accurate recognition, but there are several factors to take into account. First of all, while they performed their tests in an environment specifically optimized for this purpose, we did ours in a realistic smart home context, which results in an increased amount of noise. For instance, we performed our gestures between a stainless steel stove and a sink. It is well known that metal is a big disturbance to wave propagation and can lead to significantly higher imprecision or, to put it simply, unpredictable behaviors. This is also reflected by the higher precision of their localization algorithm (10 cm vs 14 cm). Secondly, they do not tackle the difficult challenge of segmentation. In their paper, they chose to simply suppose that the beginning and the end of the gesture are known. That approach is realistic with some technologies (such as a mouse, or any device with an interaction button), but it is not with passive RFID. For the same reason, they make the assumption (and use it in their model) that the tag never remains stationary for a long period. Finally, their localization algorithm was more stable and precise than ours. We could not exploit their localization method because it is based on the large deployment of reference tags in the environment. As we explained in section 2, our context of assistive smart homes forces us to avoid as much as possible the invasiveness of the installation of sensors and other technologies.

6. LIMITATIONS AND FUTURE WORK
While we are very positive about the results of our first algorithm created for the purpose of recognizing gestures in real time with passive RFID technology, many challenges await researchers before a full-scale use of this technique. First of all, our recognition rate was indeed impressive, but the composite gestures are very simple, being made of at most two atomic gestures. To exploit gestures for human activity recognition and for assistance in a smart home, we have to be able to recognize series of atomic gestures and considerably more complicated gestures. We see this as the next step of our investigation on this topic. We also discovered that, in a real utilization context, our localization algorithm sometimes behaves strangely. While the average estimated error of 14 cm is correct, we discovered that the error often follows certain patterns (and thus is not random). These patterns generally follow a direction and make it difficult for our algorithm to detect the opposite atomic gesture. We are also limited by the lack of a third dimension. Localization in 3D would enable many new interesting gestures, and we think it is mandatory for a real-life deployment in a smart home.

Gesture    R    L    F    B   FR   BL   FL   BR   RF   LB   LF   RB   Idle   Total
4 s       10    8    9    9    9    8    8    9   10    9   10   10     9     91%
2 s        9    9    6    6    8    6    6    6    8    9   10    9    10     78%

Table 2. The results from the two sets of composite gestures simulated by a human subject.

There is already some work that tries to address this issue, but we encourage fellow scientists to pursue research on passive RFID localization. Finally, as future work, we plan to extend this work to detect long sequences of gestures (atomic or composite) with multiple simultaneous objects exploited in an activity scenario. For this purpose, we are going to recruit around ten human subjects who will perform the activity.


7. CONCLUSION
In this paper, we presented an efficient gesture recognition method that works with passive RFID technology. Our method is flexible, adapting automatically to new localization schemes and requiring only the average error as a parameter. The method is based on a well-known qualitative spatial reasoning framework to represent distance and orientation relationships. We have shown that the method gives promising results through three sets of experiments done at the LIARA laboratory. Additionally, the gestures were performed in a realistic smart home environment by a human subject. We believe gesture recognition with passive RFID has much potential and that more research needs to be done on that topic. We believe RFID technology is one of the key technologies for assistive smart homes because it is non-intrusive and cheap. With a better extraction of knowledge from the technology, such as gestures, we could achieve better HAR and better service delivery in pervasive environments.


8. ACKNOWLEDGMENTS
We would like to thank our main financial sponsors: the Natural Sciences and Engineering Research Council of Canada, the Quebec Research Fund on Nature and Technologies, and the Canadian Foundation for Innovation.

9. REFERENCES
[1] K. Bouchard, B. Bouchard, and A. Bouzouane, "Guideline to Efficient Smart Home Design for Rapid AI Prototyping: A Case Study," in International Conference on PErvasive Technologies Related to Assistive Environments, Crete Island, Greece, 2012.
[2] J. C. Augusto and C. D. Nugent, "Smart homes can be smarter," in Designing Smart Homes: Role of Artificial Intelligence, vol. 4008, Berlin: Springer-Verlag, 2006, pp. 1-15.
[3] D. J. Patterson, D. Fox, H. Kautz, and M. Philipose, "Fine-Grained Activity Recognition by Aggregating Abstract Object Usage," in Proceedings of the Ninth IEEE International Symposium on Wearable Computers, 2005.
[4] K. Bouchard, B. Bouchard, and A. Bouzouane, "Spatial recognition of activities for cognitive assistance: realistic scenarios using clinical data from Alzheimer's patients," Journal of Ambient Intelligence and Humanized Computing, pp. 1-16, 2013.
[5] V. Jakkula and D. J. Cook, "Mining Sensor Data in Smart Environment for Temporal Activity Prediction," in KDD'07, San Jose, California, USA, 2007.
[6] J. C. Augusto, J. Liu, P. McCullagh, and H. Wang, "Management of uncertainty and spatio-temporal aspects for monitoring and diagnosis in a Smart Home," International Journal of Computational Intelligence Systems, vol. 1, pp. 361-378, 2008.
[7] T. Westeyn, H. Brashear, A. Atrash, and T. Starner, "Georgia tech gesture toolkit: supporting experiments in gesture recognition," in Proceedings of the 5th International Conference on Multimodal Interfaces, 2003, pp. 85-92.
[8] S. Mitra and T. Acharya, "Gesture recognition: A survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 37, pp. 311-324, 2007.
[9] J. Liu, L. Zhong, J. Wickramasuriya, and V. Vasudevan, "uWave: Accelerometer-based personalized gesture recognition and its applications," Pervasive and Mobile Computing, vol. 5, pp. 657-675, 2009.
[10] K. Mäkelä, S. Belt, D. Greenblatt, and J. Häkkilä, "Mobile interaction with visual and RFID tags: a field study on user perceptions," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, San Jose, California, USA, 2007.
[11] L. M. Ni, Y. Liu, Y. C. Lau, and A. P. Patil, "LANDMARC: Indoor Location Sensing Using Active RFID," ACM Wireless Networks, vol. 10, pp. 701-710, 2004.
[12] P. Vorst, S. Schneegans, Y. Bin, and A. Zell, "Self-localization with RFID snapshots in densely tagged environments," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2008), 2008, pp. 1353-1358.
[13] D. Joho, C. Plagemann, and W. Burgard, "Modeling RFID signal strength and tag detection for localization and mapping," in Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 2009.
[14] F.-S. Chen, C.-M. Fu, and C.-L. Huang, "Hand gesture recognition using a real-time tracking method and hidden Markov models," Image and Vision Computing, vol. 21, pp. 745-758, 2003.
[15] C. Shan, Y. Wei, T. Tan, and F. Ojardias, "Real time hand tracking by combining particle filtering and mean shift," in Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 669-674.
[16] F. Samaria and S. Young, "HMM-based architecture for face identification," Image and Vision Computing, vol. 12, pp. 537-543, 1994.
[17] L. Bretzner, I. Laptev, and T. Lindeberg, "Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering," in Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 423-428.
[18] P. Hong, M. Turk, and T. S. Huang, "Gesture modeling and recognition using finite state machines," in Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000, pp. 410-415.
[19] C. Y. Chen, J. P. Yang, G. J. Tseng, Y. H. Wu, and R. C. Hwang, "An indoor positioning technique based on fuzzy logic," in International MultiConference of Engineers and Computer Scientists, Hong Kong, 2010.
[20] P. Asadzadeh, L. Kulik, and E. Tanin, "Gesture recognition using RFID technology," Personal and Ubiquitous Computing, vol. 16, pp. 225-234, 2012.
[21] D. Fortin-Simard, K. Bouchard, S. Gaboury, B. Bouchard, and A. Bouzouane, "Accurate Passive RFID Localization System for Smart Homes," in 3rd IEEE International Conference on Networked Embedded Systems for Every Application, Liverpool, UK, 2012.
[22] E. Clementini, P. D. Felice, and D. Hernández, "Qualitative representation of positional information," Artificial Intelligence, vol. 95, pp. 317-356, 1997.
