Multi-robot surveillance through a distributed sensor network

Andrea Pennisi, Fabio Previtali, Cristiano Gennari, Domenico D. Bloisi, Luca Iocchi, Francesco Ficarola, Andrea Vitaletti, and Daniele Nardi
Department of Computer, Control, and Management Engineering, Sapienza University of Rome, via Ariosto 25, 00185 Rome, Italy
@dis.uniroma1.it

Abstract. Automatic surveillance of public areas, such as airports, train stations, and shopping malls, requires the capacity of detecting and recognizing possible abnormal situations in populated environments. In this book chapter, an architecture for intelligent surveillance in indoor public spaces, based on the integration of interactive and non-interactive heterogeneous sensors, is described. In contrast to traditional, passive, purely vision-based systems, the proposed approach relies on a distributed sensor network combining RFID tags, multiple mobile robots, and fixed RGBD cameras. The presence and the position of people in the scene are detected by suitably combining data coming from the sensor nodes, including those mounted on board the mobile robots that are in charge of patrolling the environment. The robots can adapt their behavior to the current situation, on the basis of a Prey-Predator scheme, and can coordinate their actions to fulfill the required tasks. Experiments have been carried out on both real and simulated data to show the effectiveness of the proposed approach.

Keywords: Mobile Robots; Wireless Sensor Networks; Multi-Robot Systems; Multi-Robot Surveillance

1 Introduction

A critical infrastructure (CI) is a system that is essential for the maintenance of vital societal functions. Public areas, such as airports, train stations, shopping malls, and offices, are examples of CIs that can be a target for terrorist attacks, criminal activities, or malicious behaviors. Usually, CIs are monitored by passive cameras with the aim of detecting, tracking, and recognizing objects of interest to understand and prevent possible threats. However, traditional vision-based systems can prove ineffective when dealing with realistic scenarios, since their passive sensors can fail in identifying and tracking an object of interest in a large environment, due to partial and total occlusions, changes in illumination conditions, and difficulties in re-identifying objects across non-overlapping views. Moreover, a network of fixed passive sensors can be subject to malicious physical attacks [11].

Fig. 1. The proposed architecture, combining mobile robots, fixed RGBD cameras, and RFID tags and receivers to monitor a populated environment.

Contributions of the Book Chapter. In this chapter, the problem of monitoring a populated indoor environment is faced by combining data coming from multiple heterogeneous fixed and mobile sensors. The term “populated” is used throughout the book chapter to denote an environment where people are present; crowded or densely-populated environments are not taken into account. We describe the development of an architecture designed for the surveillance of a large scenario, where authorized personnel wear Radio Frequency Identification (RFID) tags and the environment is monitored by fixed RGBD cameras with RFID receivers and patrolled by multiple mobile robots equipped with laser range finders and RFID receivers (see Fig. 1). Laser scans, RFID tag data, and RGBD images gathered by the distributed sensors are merged to obtain information about the position and the identity of people in the scene. Moreover, the robots coordinate their actions through a dynamic task assignment to fully cover the operational environment. The architecture is conceived to work in a fully distributed fashion and to automatically raise alarms (possibly communicated to a central operational station) when abnormal conditions are detected. To this end, the developed architecture integrates different technologies. Although each of these technologies has already been developed in previous work, their integration had not been considered before, in particular for surveillance applications. Moreover, an experimental analysis, carried out also in a real environment, shows the effectiveness of the implemented system.

The remainder of the chapter is organized as follows. Related work is analyzed in Section 2, while the definition of the problem is given in Section 3. The components of the architecture are described in Section 4 and the process of fusing the information coming from the different sensors is described in Section 5. The multi-robot coordination and the task assignment processes are detailed in Section 6. Results on both a real and a simulated environment are discussed in Section 7. Conclusions and future directions are drawn in Section 8.

2 Related Work

There exists a large literature about the problem of people detection in indoor environments by using fixed cameras. However, since a variety of factors, including illumination conditions, occlusions, and blind spots, limit the capacity of purely vision-based systems, it is possible to consider a combination of multiple heterogeneous sensors to achieve better results. Approaches integrating multiple sensors can be divided into two main categories: 1) interactive methods, where each person has an active role during the detection process (e.g., by wearing an RFID tag) and 2) non-interactive methods, where the role of the person is passive and the analysis is carried out by the detection system only (e.g., a camera). In the following, some examples of interactive and non-interactive methods are described.

Interactive Methods. One of the first experiments about collecting information from a group of people in a real physical context is described by Hui et al. in [8]: 54 individuals attending a conference wear an Intel iMote device consisting of a micro-controller unit (MCU), a Bluetooth radio, and a flash memory. However, the choice of Bluetooth does not allow a fine-grained recording of social interactions, mainly because face-to-face interactions cannot be analyzed.

Multiple projects focusing on collecting data from social interactions have been developed by the SocioPatterns collaboration. Partners participating in this collaboration were the first to record fine-grained contacts by using active RFID sensors. This kind of device allows recording face-to-face interactions within a range of 1.5 meters. For example, Becchetti et al. in [2] describe an experiment in which data coming from wireless active RFID tags worn by 120 volunteers moving and interacting in an indoor area are collected. The tags periodically broadcast information about contacts with similar tags (i.e., whenever the person wearing the tag comes close to another member of the volunteer group). Assuming that the subjects wear the tags on their chest and using very low radio power levels, contacts between tags are detected only when participants actually face one another, since the body effectively acts as a shield for the sensing signals. Thus, it is reasonable to assume that the experiment can detect an ongoing social contact (e.g., a conversation). SocioPatterns has made several installations in different social contexts, including conferences [1], hospitals [9], primary schools [16], and a science gallery [4], making some data sets publicly available on its website (http://www.sociopatterns.org/datasets).

Experiments similar to the SocioPatterns ones have been conducted by Chin et al. [5], monitoring people wearing active RFID badges during a conference. The goal is to build a system that can find and connect people to each other. A remarkable result of the experiment is that, for social selection, more proximity interactions lead to an increased probability for a person to add another as a social connection.

While the above approaches target the analysis of social human behaviors, in this book chapter we investigate the use of data acquired from interactive tags for surveillance applications.

Indeed, we aim at integrating the SocioPatterns sensing platform together with other sensing technologies, including laser range finders and RGBD cameras, to overcome the problems related to traditional automatic surveillance. It is worth noticing that a scenario in which 1) authorized personnel wear RFID tags and 2) other authorized actors (e.g., visitors, travelers, spectators) may have an RFID tag as well (e.g., included in a ticket, a passport, or a boarding pass) is quite plausible. Airports, embassies, and theaters are examples of scenarios where interactive methods can be used.

Non-Interactive Methods. Approaches in this category are based on passive sensors. Since the literature on vision-based systems is huge, we limit our description to existing approaches using technologies other than vision for addressing automatic surveillance. In the field of laser-based systems, Cui et al. in [6] introduce a feature extraction method based on the accumulated distribution of successive laser frames. A pattern of rhythmically swinging legs is used to extract each leg of a person, and a region coherency property is exploited to generate an efficient measurement likelihood model. A Kalman filter and a Rao-Blackwellized Monte Carlo data association (RBMC-DAF) filter are combined to track people. However, this approach is not effective for people moving quickly or partially occluded. Xavier et al. in [17] describe a feature detection system for real-time identification of lines, circles, and legs from laser data. Lines are detected by using a recursive line fitting method, while leg detection is carried out by taking into account geometrical constraints. This approach cannot handle scan data of a dynamic scene including moving people or structures that are not well separated. A solution involving human-robot interaction is presented by Shao et al. in [14]. Visual and laser range information are combined: legs are extracted from laser scans and, at the same time, faces are detected from the images of a camera. A mobile robot uses the detection procedure (which returns the direction and the distance of surrounding people) to approach and start interacting with humans. However, the leg-swinging frequency is too low for reliable people detection and tracking.

In the above cited papers, the main limitation concerns the problem of detecting multiple people. In most cases, the approaches can deal with well separated objects, but cannot be easily extended when multiple people are grouped together. In this book chapter, we propose an approach that can be used in a populated environment and that is suitable for monitoring groups of people. The method combines interactive and non-interactive heterogeneous sensors in order to overcome the problems of traditional vision-based systems. Information coming from range finders, RFID receivers, and RGBD cameras is merged to obtain the position and the identity of people in the scene. Moreover, the actions of the robots are coordinated according to a dynamic task assignment algorithm, in order to have a dynamic monitoring range.

3 Problem Definition

The problem of monitoring a populated environment can be modeled as a Prey-Predator game. Indeed, considering the sensor nodes as predators and the objects to be monitored as preys, it is possible to formalize the surveillance task as follows: a predator tries to catch preys, while a prey runs away from predators.

Fig. 2. (a) RFID tag. (b) Turtlebot robot equipped with a laser range finder and an RFID receiver. (c) Fixed RGBD camera with RFID receiver.

The game consists of preys and predators living in the same environment. It is usually defined as a game where both predators and preys have a score and any individual can gain or lose points over time. A metric distance is assigned to each prey and to each predator as the game score: the goal of each prey is to maximize its distance from the predators, while each predator aims at minimizing its distance from the preys.

In our setting, the preys are the people moving in the monitored environment, while the predators are the sensor nodes that are used for detecting the presence and for estimating the position of a person. A sensor node is made of an RFID reader and other additional sensors, like an RGBD camera or a laser range finder. Moreover, some sensor nodes are mounted on mobile robots that navigate in the environment. For this reason, and also because of the presence of blind areas, the portion of the environment that is currently observable can vary over time. The monitoring task consists of identifying every person that does not wear an RFID tag by assigning her/him an identity number (ID). The goal of the monitoring task is achieved whenever a sensor node can detect the presence and the position of a person, determining whether or not such a person is wearing an RFID tag.

Formulating the surveillance task as a Prey-Predator game provides the following advantages: 1) in the case of a person leaving the monitored area and re-entering later, re-identification is not an issue: if a person was labeled with an ID i before exiting the scene, when she/he re-enters the system can assign another ID j (i ≠ j) and continue its process of determining whether j is wearing an RFID tag; 2) the same performance metric defined for the Prey-Predator game can be used for evaluating our approach, providing quantitative results (see the experiments reported in Section 7).
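To make the scoring concrete, the following minimal Python sketch computes the game scores for a single time step. The Euclidean metric on the 2D map and all function names are illustrative assumptions, not the chapter's actual implementation.

```python
import math

def euclidean(a, b):
    # 2D Euclidean distance between two (x, y) positions on the metric map.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def predator_score(predator, preys):
    # A predator (sensor node) aims at MINIMIZING its distance to the preys:
    # its score at this time step is the distance to the closest prey.
    return min(euclidean(predator, prey) for prey in preys)

def prey_score(prey, predators):
    # A prey (person) aims at MAXIMIZING its distance from the predators:
    # its score is the distance to the closest predator.
    return min(euclidean(prey, predator) for predator in predators)

# Example: two sensor nodes and two people on the 2D map (meters).
predators = [(0.0, 0.0), (5.0, 5.0)]
preys = [(1.0, 1.0), (8.0, 2.0)]
for p in predators:
    print("predator", p, "score:", predator_score(p, preys))
for q in preys:
    print("prey", q, "score:", prey_score(q, predators))
```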

4 Sensor Nodes

The proposed monitoring approach uses a combination of multiple heterogeneous fixed and mobile sensor nodes. Authorized people wear RFID tags of the type shown in Fig. 2a. Mobile nodes are robots equipped with a laser range finder and an RFID receiver (Fig. 2b), while fixed nodes are made of RGBD cameras, grabbing visual 3D information, with RFID receivers mounted near the camera (Fig. 2c). The communication between mobile and fixed sensors is achieved by using TCP/IP over a wireless network. This is a feasible solution, since the size of the messages exchanged among nodes is quite small (up to 1 KB) and the probability of either losing a message or receiving a delayed one is negligible.
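The chapter does not specify the message format; purely as an illustration, a node could push a small (well under 1 KB) status message over TCP as sketched below. The host, port, JSON layout, and field names are made up for the example.

```python
import json
import socket

def send_status(host: str, port: int, node_id: str, detections: list) -> None:
    # Messages are small (up to 1 KB), so a single send per update suffices.
    message = json.dumps({"node": node_id, "detections": detections}).encode()
    assert len(message) <= 1024, "status messages are expected to stay under 1 KB"
    with socket.create_connection((host, port), timeout=1.0) as sock:
        sock.sendall(message)

# Hypothetical usage: a robot reports one untagged detection on the 2D map.
send_status("192.168.1.10", 9000, "robot-1",
            [{"x": 2.4, "y": 3.1, "tag": None}])
```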

4.1 RFID Tags and Receivers

The two main entities of our sensing platform, designed and developed by the SocioPatterns research collaboration, are the OpenBeacon active RFID tags (Fig. 2a) and the OpenBeacon Ethernet reader (top right in Fig. 2b). The tags are electronic wireless badges equipped with a PIC16 micro-controller (MCU) and an ultra low power radio frequency transceiver. The MCU has 256 bytes of SRAM and can work at up to 8 MHz, while the transceiver has a very low current consumption: 11.3 mA in transmission at 0 dBm of output power and 12.3 mA in reception at a 2 Mbps air data rate. The tags are powered by batteries ensuring a lifetime of more than two weeks and are programmed to periodically broadcast beacons of 32 bytes at four different levels of signal strength: 0, −6, −12, −18 dBm. Every beacon contains the tag identifier, the information about the current signal strength, and other fields useful for debugging. The RFID reader features a transceiver as well, with an omni-directional covering range of 10 meters.

The whole sensing platform is designed to allow the RFID receivers to collect the data sent by each tag via the wireless channel. In our scenario, a receiver is mounted on each robot and it is used to read the signal strength and the ID of a tag, in order to establish if a person detected in the environment is actually wearing a tag. All data collected by the RFID readers are forwarded to a central logging server (OpenBeacon Logger, https://github.com/francesco-ficarola/OpenBeaconLogger), which stores all messages in log files. Each record contains information about the tag that sent the packet, including its ID, the signal strength, the sequence number, and the IP of the reader that collected the corresponding message.
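The exact on-disk log format is not given in the chapter; as a purely illustrative sketch, the snippet below assumes a comma-separated record carrying the fields listed above (tag ID, signal strength, sequence number, reader IP).

```python
from dataclasses import dataclass

@dataclass
class BeaconRecord:
    tag_id: int        # identifier of the tag that sent the packet
    strength_dbm: int  # transmission signal strength: 0, -6, -12, or -18 dBm
    seq_number: int    # per-tag sequence number of the beacon
    reader_ip: str     # IP of the reader that collected the message

def parse_record(line: str) -> BeaconRecord:
    # Hypothetical format: "tag_id,strength_dbm,seq_number,reader_ip"
    tag_id, strength, seq, ip = line.strip().split(",")
    return BeaconRecord(int(tag_id), int(strength), int(seq), ip)

record = parse_record("1042,-12,37,192.168.1.20")
print(record)
```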

4.2 Laser Range Finders

The mobile sensor node is composed of a Turtlebot (http://www.turtlebot.com/) equipped with a laser range finder and an RFID receiver (Fig. 2b). Multiple robots are involved in the task of patrolling the environment. Each robot has a 2D metric map of the environment, built off-line using the ROS gmapping tool (wiki.ros.org/gmapping). Furthermore, each robot can be considered always well-localized on the 2D metric map thanks to the ROS implementation of the AMCL localization method (wiki.ros.org/amcl). Person detection is carried out by means of a distance map, indicating the probability that a given point in the current laser scan belongs to the metric map. By comparing the distance map with the metric map, it is possible to extract the foreground objects, i.e., sets of points in the distance map that are far enough from the metric map points. From each foreground object the following features are extracted: the number of its points, their standard deviation, a bounding box, and the radius of the minimum enclosing circle (see Fig. 3). The features are then fed to an AdaBoost-based person classifier, trained on about 1800 scans. People tracking relies on the particle filter algorithm called PTracker, which is described in Sec. 5. Data association is used to determine the relationship between observations and tracks, and multiple hypotheses are maintained when observations may be associated with more than one track. Finally, each track is combined with the signal detected by the RFID receiver mounted on each robot, in order to verify whether a person is wearing an RFID tag.

Fig. 3. People detection using the laser range finder.
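The feature vector fed to the classifier can be sketched as follows. This is our own reconstruction in Python (using NumPy and OpenCV), with array layouts and the example cluster chosen for illustration only.

```python
import numpy as np
import cv2

def cluster_features(points: np.ndarray) -> np.ndarray:
    """Compute the features described above for one foreground object.

    points: (N, 2) float32 array of 2D laser points belonging to the cluster.
    Returns [num_points, std_dev, bbox_width, bbox_height, radius].
    """
    num_points = len(points)
    # Spread of the cluster: mean standard deviation over x and y.
    std_dev = float(np.mean(np.std(points, axis=0)))
    # Axis-aligned bounding box of the cluster.
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    # Radius of the minimum enclosing circle (OpenCV expects float32).
    _center, radius = cv2.minEnclosingCircle(points.astype(np.float32))
    return np.array([num_points, std_dev,
                     x_max - x_min, y_max - y_min, radius])

# Example: a roughly leg-sized cluster of points (coordinates in meters).
leg = np.random.normal(loc=[1.0, 2.0], scale=0.05, size=(40, 2)).astype(np.float32)
print(cluster_features(leg))  # would be fed to the AdaBoost person classifier
```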

4.3 RGBD Cameras

The Microsoft Kinect (version 1.0) has been used as RGBD camera. The Kinect sensor supplies an RGB image with a resolution of 640 × 480 pixels at a frame rate of 30 frames per second, while 3D information is received in the form of an 11-bit depth image. Both color and depth information are used to compute an accurate foreground detection, and RGB and depth data are stored for each captured frame. A statistical approach, called Independent Multimodal Background Subtraction (IMBS) [3], is used to create the background model, which is updated every 15 seconds to deal with illumination changes.

The obtained foreground mask is used as the starting point for a 3D clustering step. Let B denote the set of 3D points corresponding to the 2D points belonging to the blobs in the foreground mask, and let C denote all the 3D points of the point cloud generated from the depth data provided by the Kinect (B ⊂ C). In order to improve the detection results, all the 3D points ∈ {C \ B} having a distance < 0.01 m from the points in B are recursively added to B itself. Then, to filter out possible false positives, all the blobs with a maximum height < 1.2 m are discarded, while the others are considered as valid observations. Finally, the 3D positions of the valid blobs are computed by estimating their Euclidean distance from the cameras. Indeed, since the positions of the cameras monitoring the environment are known, people can be localized on the 2D metric map by averaging the 3D points belonging to their blobs and calculating the distance of the average point from the camera. The above described steps are summarized in Fig. 4.

Fig. 4. People detection using a RGBD camera.
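A possible rendering of the recursive point-growing and height-filtering steps is sketched below. The KD-tree based neighbor search is our own choice for the 1 cm expansion, not necessarily the authors' implementation, and the floor is assumed to lie at z = 0.

```python
import numpy as np
from scipy.spatial import cKDTree

def grow_blob(B: np.ndarray, C_minus_B: np.ndarray,
              radius: float = 0.01) -> np.ndarray:
    """Recursively add to B all points of C \\ B within `radius` meters of B.

    B: (N, 3) blob points; C_minus_B: (M, 3) remaining cloud points.
    """
    tree = cKDTree(C_minus_B)
    selected = np.zeros(len(C_minus_B), dtype=bool)
    frontier = B
    while len(frontier) > 0:
        # Indices of candidate points within `radius` of any frontier point.
        idx = set()
        for neighbors in tree.query_ball_point(frontier, r=radius):
            idx.update(neighbors)
        new = np.array([i for i in idx if not selected[i]], dtype=int)
        if len(new) == 0:
            break
        selected[new] = True
        frontier = C_minus_B[new]  # newly added points seed the next round
    return np.vstack([B, C_minus_B[selected]])

def is_valid_blob(blob: np.ndarray, min_height: float = 1.2) -> bool:
    # Discard blobs whose maximum height is below 1.2 m (likely false
    # positives); assumes the floor is at z = 0.
    return float(blob[:, 2].max()) >= min_height
```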

5 Data Fusion

Information coming from fixed and mobile sensor nodes needs to be merged. The data fusion process is made of two phases: 1) obtaining tracks by fusing visual and laser data and 2) merging the tracks with the RFID receiver information.

In the first phase, a multi-object particle filter approach, called PTracker, has been used in order to fuse and track the observations extracted from RGBD and laser data. A particle filter-based tracker maintains a probability distribution over the state of the object being tracked, keeping information about position, scale, color, direction, and velocity of the object. Particle filters represent this distribution as a set of weighted samples (particles). Each particle represents a possible instantiation of the state of the object, i.e., a guess of one possible position of the object being tracked. The set of particles carries more weight at locations where the object being tracked is more likely to be. This weighted distribution is propagated through time using the Bayesian filtering equations, and the trajectory of the tracked object is determined by taking either the particle with the highest weight or the weighted mean of the particle set at each time step. A detailed description of the data fusion method is available at http://www.dis.uniroma1.it/~previtali/downloads/DataFusion.pdf. The output of this phase is a set $S_t = \{o_t^1, \ldots, o_t^n\}$ containing all the observations $o_t^i$ at time $t$, $1 \le i \le n$, where $n$ is the total number of observations.

In the second phase, $S_t$ is merged with the information coming from the RFID receivers at time $t$, namely the set $U_t = \{id_t^1, \ldots, id_t^k\}$, where $id_t^k$ is a triple $\langle t, p, r \rangle$, with $t$ being the identification number of the tag, $p$ the pose of the receiver that detects the tag $t$, and $r$ the detection range of the receiver. An observation $o_t^i \in S_t$ is associated with a triple $id_t^j = \langle t_j, p_j, r_j \rangle \in U_t$ if all the particles of $o_t^i$ (computed by PTracker) are included in the circular range having radius $r_j$ and center $p_j$. After the merging phase, three different outputs can be generated:

1. $z_t^h = \langle o_t^i, id_t^j \rangle$, where the observation $o_t^i$ has been merged with the tag $id_t^j$;
2. $z_t^h = \langle o_t^i, ? \rangle$, where no RFID data have been associated with the track $o_t^i$;
3. $z_t^h = \langle ?, id_t^j \rangle$, where the tag $id_t^j$ has been detected but not associated with any track.
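The association rule of the second phase — a track is matched to a tag only if every one of its particles falls inside the receiver's detection circle — can be sketched as follows; names and data layout are illustrative.

```python
import numpy as np

def associate(particles: np.ndarray, receiver_pose: np.ndarray,
              detection_range: float) -> bool:
    """True iff ALL particles of a track lie inside the circular range
    of radius `detection_range` centered at `receiver_pose`.

    particles: (P, 2) array of particle positions for one track o_t^i.
    receiver_pose: (2,) position p_j of the RFID receiver.
    """
    distances = np.linalg.norm(particles - receiver_pose, axis=1)
    return bool(np.all(distances <= detection_range))

# Example: a tightly clustered track near a receiver with a 10 m range.
particles = np.random.normal(loc=[2.0, 3.0], scale=0.2, size=(100, 2))
print(associate(particles, np.array([0.0, 0.0]), detection_range=10.0))
```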
