Mobile Robot Navigation using a Vision Based Approach


Mehmet Serdar Guzel
School of Mechanical and Systems Engineering, Newcastle University

Abstract
Autonomous robots operating in an unknown and uncertain environment must be able to cope with dynamic changes to that environment. For a mobile robot, navigating successfully to a goal whilst avoiding both static and dynamic obstacles is a challenging problem. This paper proposes a methodology for indoor mobile robot navigation based on visual servoing, using a single Pan-Tilt-Zoom (PTZ) network camera mounted on the robot as the primary sensor. The aim is for the robot to carry out its task without any prior knowledge of its environment. Several approaches to obstacle avoidance have been developed in recent years, some of which are computationally intensive. One approach, using a mapless landmark detection algorithm, is receiving a lot of attention, principally because of its efficiency and robustness; this method is described in depth, as it offers a potential solution. It is proposed to develop a vision based navigation system that uses a single high quality PTZ camera to capture and process images in real time. The system will be integrated into an existing mobile robot platform, which incorporates the Player/Stage architecture, and the performance of the visual servoing based approach will be compared with that of existing obstacle avoidance sensors, including a scanning laser and a sonar array.

KEY WORDS: Mobile Robot, Robot Vision, Navigation, Obstacle Avoidance, Object Recognition.

1. INTRODUCTION
A number of potential markets are slowly emerging for mobile robotic systems. Entertainment applications and household or office assistants are the primary targets in this area of development. These types of robots are designed to move around within an often highly unstructured and unpredictable environment. Existing and future applications for these types of autonomous systems have one key problem in common: navigation (Althaus, 2003). Vision is one of the most powerful and popular sensing methods used for autonomous navigation. Compared with other on-board sensing techniques, vision based approaches to navigation continue to attract a lot of attention from the mobile robot research community, owing to their ability to provide detailed information about the environment which may not be available using combinations of other types of sensors. The past decade has seen the rapid development of vision based sensing for indoor navigation tasks. For example, 20 years ago it would have been impossible for an indoor mobile robot to find its way in a hallway as cluttered as the one shown in Figure 1.1, and even now it remains a challenge. Vision-based indoor navigation for mobile robots is still an open research area.

Autonomous robots operating in an unknown and uncertain environment have to cope with dynamic changes in that environment, and for a robot to navigate successfully to its goals while avoiding both static and dynamic obstacles is a major challenge. Most current techniques are based on complex mathematical equations and models of the working environment (DeSouza et al., 2003); however, following a predetermined path may not require a complicated solution, and the proposed methodology should be more robust. The proposal is to use novel obstacle detection and collision avoidance algorithms to address the problem of indoor mobile robot navigation by adopting a centre following method, without the use of prior environmental information. Visual feedback will be provided by a single pan-tilt-zoom (PTZ) camera, with which both dynamic and static environment information will be acquired. Sensors including sonar, position sensing devices (PSD), laser rangefinders, radar and cameras can all be used to acquire dynamic information. The sonar sensor is cheap but suffers from wide beam problems and poor angular resolution. The laser rangefinder and radar provide better resolution but are more expensive, have difficulty detecting small or flat objects on the ground, and are unable to distinguish between different types of ground surface, which can in many instances be easily detected using colour vision (Ahn et al., 2008).

Although the accuracy of distance measurement using a camera is lower than that obtained from range-based sensors, various techniques for acquiring moving images from a PTZ camera are proposed. Mapless navigation techniques and the methodologies developed here resemble human behaviour more than other approaches, and it is proposed to use a reliable vision system to detect landmarks in the target environment and to employ a visual memory unit (VMU) in which the learning processes will be achieved using a neural network. One of the most important aims of this study is to improve the vision and control algorithms for dynamic obstacle avoidance. To satisfy this aim, several objectives have been identified:
• Carry out a comprehensive review of current vision based methodologies to establish the state of the art.
• Design a camera based autonomous mobile robot system.
• Design and implement the software and control architecture of the desired system.
• Implement a powerful image processing and computer vision library for object recognition and feature tracking tasks.
• Develop a reliable system for the detection of obstacles.
• Develop a novel algorithm to recognize dynamic moving obstacles and execute collision avoidance in real time.

2. LITERATURE REVIEW
In recent years there has been increased interest in vision based navigation, and it is accepted as being more robust and reliable than other sensor based navigation systems. Visual feedback is one of the most important sensing methods for navigation tasks, and according to DeSouza (2003) the strides made in vision systems have been significant. It is essential for a vision based navigation system to incorporate some knowledge of its environment (Moravec, 1980 & Nilsson, 1984). There are two different approaches to utilizing this visual information, based on open-loop and closed-loop control methodologies, as described below and illustrated in Figure 2.1.
• Open Loop Control: Extraction of image information and control of the robot are two separate tasks, where image capture and processing is performed first, followed by the generation of a control sequence.
• Closed Loop Control: Vision sensing and control are performed simultaneously. A sensor (camera) monitors the output (the position of the image) and feeds the data to a system which continuously adjusts the control input as necessary to keep the control error to a minimum (to maintain the desired speed or position). This is referred to as visual servoing.
Humans are not capable of positioning themselves in an absolute way, yet are able to reach a goal position with remarkable accuracy by repeating a 'look at the target and move' type of strategy. They are adept at actively extracting relevant features of the environment through a somewhat inaccurate vision process and relating these to the necessary movement commands, a mode of operation called 'visual servoing' in robotics (Kim et al., 2007). There are three main approaches to visual servoing: Position-Based (PBVS), Image-Based (IBVS) and 2½D, which is also called hybrid visual servoing. The main difference between these approaches is the motion planning methodology. PBVS solves this problem by localization via a 2D or 3D work space transform, planning the robot trajectory directly in the work space, whereas motion planning in IBVS is carried out in the image plane. To overcome the limitations of both methods a combined approach has been developed: Malis et al. (1998) presented a 2½D visual servoing technique which is 'halfway' between the classical position-based and image-based approaches, and uses an estimate of the partial camera displacement from the current to the desired camera pose at each iteration of the control law.

Vision based navigation falls into three main groups:
• MAP BASED NAVIGATION: The robot is provided with a model of the environment. These models may contain different degrees of detail, varying from a complete CAD model of the environment to a simple graph of interconnections or interrelationships between the elements in the environment (DeSouza et al., 2003).
• MAP BUILDING BASED NAVIGATION: In this approach a 2D or 3D model of the environment is first constructed by the robot using its on-board sensors, after which the model is used for navigation in the environment.



• MAPLESS NAVIGATION: This category contains all systems in which navigation is achieved without any prior description of the environment. The required robot motions are determined by observing and extracting relevant information about the elements in the environment, such as walls, desks and doorways. It is not necessary for the absolute positions of these objects to be known for navigation to be carried out. Mapless navigation falls into three sub-groups: navigation using optical flow (Santos-Victor et al., 1993; Arena et al., 2007), navigation using appearance based matching (Ohno et al., 1996; Booij et al., 2007) and navigation using object recognition (Kim, 1995 & Nevatia, 1998). There are also behaviour based approaches to vision based navigation in mapless space (Nakamura et al., 1995 & Nakamura et al., 1996).
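The optical-flow sub-group can be illustrated by the bee-inspired flow-balancing rule behind divergent stereo (Santos-Victor et al., 1993): nearby surfaces generate larger image motion, so the robot steers away from the side with more flow. The sketch below is only a schematic of that idea; the function name, gain and normalization are assumptions, not taken from the cited papers.

```python
def flow_balance_steering(left_flow, right_flow, gain=1.0):
    """Steer away from the side with the larger optical flow.

    left_flow, right_flow: average optical-flow magnitudes measured in
    the left and right halves of the image.
    Returns a normalized turn command in [-1, 1] (positive = turn left).
    """
    total = left_flow + right_flow
    if total == 0:
        return 0.0                              # no texture/motion: keep heading
    balance = (right_flow - left_flow) / total  # > 0: right side is closer
    return max(-1.0, min(1.0, gain * balance))  # steer toward the weaker flow
```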

Several other studies have been proposed. Burschka (2001) worked on a method called "teach-replay", which was amended and improved upon by Chen et al. (2007); their technique requires only a single off-the-shelf, forward-looking camera without calibration. Feature points are automatically detected and tracked throughout the image sequence, and the feature coordinates in the replay phase are compared with those computed previously in the teaching phase to determine the turning commands for the robot, as shown in Figure 2.2. Davison (2003) efficiently estimated the location of a single camera and related visual features by means of feature tracking. Many studies on feature-based approaches to vision using simultaneous localization and mapping (SLAM) have been carried out in order to exploit the reliable data association capability of vision sensors (Karlsson et al., 2005 & Elinas et al., 2006).

3. DESCRIPTION of the PROPOSED SYSTEM
Overcoming navigation issues in an unknown environment whilst avoiding obstacles requires a reliable and robust architecture as well as a powerful sensing strategy. The proposed system will be integrated into an existing mobile robot platform (a Pioneer 3-DX, which incorporates the Player/Stage architecture), and the performance of the visual servoing based approach will be compared with that of existing obstacle avoidance sensors, which include a scanning laser and a sonar array. It is proposed to use a high quality PTZ (Pan-Tilt-Zoom) camera as the primary sensor for the robot. The proposed system is shown in Figure 3.1. This section outlines the IP based PTZ camera which will be used in this study, the Pioneer 3-DX mobile robot and the Player/Stage architecture.

3.1 PTZ Camera
In this study an AXIS 213 PTZ Network (IP) camera, shown in Figure 3.2, will be used as the vision sensor.
The main difference, and advantage, of IP (Internet Protocol) cameras is that they provide output in digital form and can be plugged directly into an Ethernet switch and accessed over an IP network. To achieve this, IP cameras incorporate a small on-board computer, which usually runs embedded Linux. PTZ cameras are able to move in both the vertical and horizontal directions; pan/tilt control moves the camera right and left, up and down, making it easy to adjust the angle of view. Furthermore, the zoom feature can be used to gather detailed information from the image, which is one of the keys to obtaining better results from image processing algorithms. These cameras have an HTTP interface which simplifies the development of software to control the camera. The AXIS 213 camera has an HTTP based web interface called VAPIX which supports CGI based web scripts and can be used to control the camera via a web browser. Some of the primitive commands available in VAPIX are listed in Table 3.1 (Vapix Version-3, 2008).

3.2 Pioneer 3-DX (P3-DX) Mobile Robot
The PIONEER 3-DX, shown in Figure 3.1, is an agile, versatile and intelligent mobile robotic platform, updated to carry loads more robustly and to traverse sills more surely, with high-performance current management to provide power when it is needed. More than a hobby robot, it will last through years of tough classroom use. It is built on the same core client/server model as all MobileRobots platforms and offers an embedded computer option. The robot stores up to 252 watt-hours in hot-swappable batteries. It arrives with a ring of 8 forward sonar and an optional ring of 8 rear sonar. The 3-DX's powerful motors and 19 cm wheels can reach speeds of 1.6 metres per second and carry a payload of up to 23 kg (MobileRobots Inc, 2008).

3.3 Player/Stage Architecture
Player is a free and very popular software package used to control a variety of robotic systems and sensors, providing a network interface to robot and sensor hardware. Player's client/server model allows robot control programs to be written in any programming language and to run on any computer with a network connection to the robot. Player supports multiple concurrent client connections to devices, creating new possibilities for distributed and collaborative sensing and control. Gerkey et al. (2001) describe it as 'Player, a network server interface to a collection of sensors and actuators, typically constituting a robot' (p.11). It is designed as a distributed system and relies on the TCP protocol to handle communications between the client and server layers. The overall system architecture is shown in Figure 3.3. Stage is a simulation tool which makes it possible to simulate various mobile robots and sensors in a 2D bitmapped environment. Stage devices present a standard Player interface, so few or no changes are required to move between simulation and hardware. A scene from the Stage tool is shown in Figure 3.4. Stage is designed to support research into multi-agent autonomous systems, so it provides fairly simple, computationally cheap models of many devices rather than attempting to emulate any device with great fidelity (The Player Project, 2008).

4. IMAGE PROCESSING STRATEGY
Having an appropriate and well organized image processing strategy is one of the key issues for a vision based navigation system operating in an unknown or cluttered environment. Most vision based navigation strategies rely on landmarks, which are classified as either artificial or natural. Doors, windows and corridors are accepted as natural landmarks, while symbols placed perpendicular to the ground, or simple coloured objects such as balls or boxes, can be used as artificial landmarks. Several techniques have been proposed for landmark detection in the areas of image processing and computer vision, depending on the type of landmark (Gonzalez & Woods, 2003). Centre following without prior environmental information, based on visual information provided by a single camera, is the main navigation strategy of this proposed study. As the mobile robot moves towards the centre of the corridor, the proposed visual system will detect landmarks on the walls and corridors and determine the behaviour of its next movement depending on those landmarks. When an obstacle appears during the navigation task, the obstacle avoidance module will automatically start and determine the obstacle avoidance strategy depending on obstacle features such as speed and size. Edge detection is a term used in image processing and computer vision, particularly in the areas of feature detection and feature extraction, to refer to algorithms which aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The gradient-based edge detection method has been widely used in mobile robot applications (Lee et al., 2003). Two widely used edge detection algorithms, accepted by many as optimal edge detectors, are the Canny and Sobel detectors (Kim & Hwang, 2002).

4.1 Sobel Edge Detectors
The Sobel operator performs a 2-D spatial gradient measurement on an image, and is typically used to find the approximate absolute gradient magnitude at each point in an input greyscale image (Green, 2002). The Sobel edge detector uses a pair of 3x3 convolution masks, one estimating the gradient in the x-direction (columns) and the other estimating the gradient in the y-direction (rows). A convolution mask is usually much smaller than the actual image.
As a result, the mask is slid over the image, manipulating a square of pixels at a time (Gonzalez & Woods, 2003). An example of the Sobel masks is shown in Figure 4.1. The magnitude of the gradient is calculated as:

|G| = √(Gx² + Gy²)

An approximate magnitude can also be calculated using:

|G| = |Gx| + |Gy|

The example shown in Figure 4.2 presents the general characteristics of Sobel operators.

4.2 Canny Edge Detectors
Canny (1986) considered the mathematical problem of deriving an optimal smoothing filter given the criteria of detection, localization and minimizing multiple responses to a single edge. The Canny edge detector first smoothes the image to eliminate noise, then finds the image gradient to highlight regions with high spatial derivatives. The algorithm then tracks along these regions and suppresses any pixel that is not at the maximum (non-maximum suppression) (Green, 2002). Figure 4.3 presents the general characteristics of the Canny edge detector.

5. DEVELOPED SOFTWARE for the AXIS CAMERA
A versatile program to control an IP based Axis PTZ camera has been developed using C++. The program is composed of three main modules: the HTTP module, the String Parser module and the Main module. User requests are accepted by the Main module, which also provides an input screen. Through this screen the user can select any of the camera features in real time, such as decreasing or increasing the PTZ values of the camera or taking JPEG images and MPEG streams. Requests are converted to VAPIX scripts in the Main module. The script form of the request is sent to the String Parser module, in which the host, IP, port and process values of the request are separated and assigned to the related variables in memory. In the final stage, these variables and a TCP socket architecture are used by the HTTP module to establish a connection with the web server. A response is returned to the client to inform it whether the connection has succeeded. Following a successful response, the required task is carried out, such as capturing an image of the current environment and storing it at the desired location in JPEG or BMP format. A simplified flow chart of the proposed algorithm is shown in Figure 5.1.

6. CONCLUSION and FUTURE WORK
This paper has reviewed the literature concerning vision based navigation systems and identified a number of significant issues. The proposed research question is whether a PTZ camera can be used as the primary sensor for navigation and collision avoidance for mobile robots. In order to complete this work, essential background knowledge of the current state of research on vision based autonomous systems will be gathered as part of the literature review. Visual sensing will be essential for mobile robots to progress in the direction of increased robustness, reduced cost, and reduced power consumption.
Moreover, if robots can make use of computationally efficient algorithms and off-the-shelf cameras with minimal setup (e.g., no calibration), then the opportunity exists for robots to be widely deployed (e.g., multiple inexpensive coordinating robots). A novel visual based navigation system will be an important step in this direction.

REFERENCES
1. AHN, S., CHOI, J., DOH, N. L., CHUNG, K., (2008) "A practical approach for EKF-SLAM in an indoor environment: fusing ultrasonic sensors and stereo camera", Autonomous Robots, Vol. 24, pp. 315-335.
2. ALTHAUS, P., (2003) "Indoor Navigation for Mobile Robots: Control and Representations", PhD Thesis, Computational Vision and Active Perception Laboratory, Royal Institute of Technology (KTH), Stockholm.
3. AXIS INC, (2008) "Vapix", [Online]. Available from http://www.axis.com/techsup/vapix/ [Accessed 02/02/2009].
4. BOOIJ, O., TERWIJN, B., ZIVKOVIC, Z., KROSE, B., (2007) "Navigation using an appearance based topological map", Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
5. CHEN, Z., BIRCHFIELD, S. T., (2006) "Qualitative Vision-Based Mobile Robot Navigation", Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 2686-2692.
6. DESOUZA, G. N., KAK, A. C., (2002) "Vision for mobile robot navigation: a survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 2, pp. 237-267.
7. ELINAS, P., SIM, R., LITTLE, J. J., (2006) "σSLAM: Stereo vision SLAM using the Rao-Blackwellised particle filter and a novel mixture proposal distribution", Proceedings of the IEEE International Conference on Robotics and Automation, pp. 1564-1570.
8. GONZALEZ, R. C., WOODS, R. E., (2003) "Digital Image Processing", Prentice Hall.
9. GREEN, B., (2002) "Canny and Sobel Edge Detection Tutorial", [Online]. Available from http://www.pages.drexel.edu/~weg22/can_tut.html [Accessed 02/02/2009].
10. KARLSSON, N., DI BERNARDO, E., OSTROWSKI, J., GONCALVES, L., PIRJANIAN, P., MUNICH, M. E., (2005) "The vSLAM algorithm for robust localization and mapping", Proceedings of the IEEE International Conference on Robotics and Automation, pp. 24-29.
11. KIM, D., NEVATIA, R., (1995) "Symbolic Navigation with a Generic Map", Proceedings of the IEEE Workshop on Vision for Robots, pp. 136-145.
12. KIM, S., OH, S., (2007) "Hybrid Position Based Visual Servoing for mobile robots", Journal of Intelligent & Fuzzy Systems, Vol. 18, pp. 73-82.
13. MALIS, E., CHAUMETTE, F., BOUDET, S., (2000) "Multi-cameras visual servoing", Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '00), Vol. 4, pp. 3183-3188.
14. MORAVEC, H., (1980) "Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover", PhD Thesis, Stanford University. (Published as Robot Rover Visual Navigation. Ann Arbor, MI: UMI Research Press, 1981.)
15. NILSSON, N. J., (1984) "Shakey the Robot", Technical Report 323, SRI International.
16. PIAZZI, J., PRATTICHIZZO, D., (2003) "An auto-epipolar strategy for mobile robot visual servoing", Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1802-1807.
17. PLAYER PROJECT, (2008) [Online]. Available from http://playerstage.sourceforge.net/ [Accessed 02/02/2009].
18. SANTOS-VICTOR, J., SANDINI, G., CUROTTO, F., GARIBALDI, S., (1993) "Divergent Stereo for Robot Navigation: Learning from Bees", Proceedings of the IEEE CS Conference on Computer Vision and Pattern Recognition.

Figure 1.1. Indoor robots can now localize themselves using computer vision (DeSouza, 2003)

Figure 2.1 Open-Loop Control and Closed-Loop Control (Kragic,2001)

Figure 3.1. Pioneer 3-DX mobile robot

Figure 2.2. TOP: Two milestone images from an indoor experiment, with all the feature points overlaid. BOTTOM: Two current images within each segment, as the robot moves toward the corresponding milestone location, with feature points overlaid. The current images tell the robot to turn right (Chen et al., 2007)

Figure 3.2. AXIS 213 PTZ cameras

Figure 3.3. Overall System architecture of Player, Gerkey et al. (2001)

Figure 3.4 A screen shot from stage simulator tool

Figure 4.1. Sobel masks (Green, 2002)

Figure 4.2. An indoor example of the Sobel edge detection method
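The Sobel masks of Figure 4.1 and the approximate magnitude |G| = |Gx| + |Gy| from Section 4.1 can be sketched in pure Python. This is an unoptimized illustration, not the implementation used in the study; a real system would use an image processing library.

```python
# 3x3 Sobel masks: SOBEL_X estimates the gradient in the x-direction
# (columns), SOBEL_Y in the y-direction (rows).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]]

def sobel_magnitude(image):
    """Approximate gradient magnitude |G| = |Gx| + |Gy| of a greyscale
    image given as a list of rows.  Border pixels are left at zero,
    since the 3x3 mask does not fit there."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out
```

Sliding the masks over a vertical step edge yields a strong response along the edge and zero in the flat regions, which is the behaviour visible in Figure 4.2.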

Figure 4.3. An indoor example of the Canny edge detection method

Table 3.1. Some basic VAPIX scripts

1: http://myserver/axis-cgi/jpg/image.cgi (request the default image)
2: http://myserver/axis-cgi/mjpg/video.cgi?resolution=320x240&camera=1&compression=25 (request a multipart JPEG image stream from camera 1 with a resolution of 320x240 and compression of 25)
3: http://myserver/axis-cgi/com/ptz.cgi?info=1&camera=3 (request information about which PTZ commands are available for camera 3)
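The scripts in Table 3.1 can also be issued from code. The sketch below builds such request URLs in Python rather than with the C++ modules described in Section 5; the host name `myserver` follows the table, and the helper function itself is hypothetical.

```python
from urllib.parse import urlencode
# from urllib.request import urlopen   # uncomment to issue real requests

def vapix_url(host, cgi_path, **params):
    """Build a VAPIX request URL of the kind shown in Table 3.1.
    host is the camera's address (hypothetical here); cgi_path is the
    CGI endpoint, e.g. 'axis-cgi/jpg/image.cgi'."""
    url = "http://%s/%s" % (host, cgi_path)
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url

# Request a 320x240 MJPEG stream from camera 1 (cf. script 2 in Table 3.1):
url = vapix_url("myserver", "axis-cgi/mjpg/video.cgi",
                resolution="320x240", camera=1, compression=25)
# data = urlopen(url).read()   # would fetch from a real camera
```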

[Flow chart summary: the user Interface sends a request (PTZ values or an image stream) to the Main Module; the Main Module forwards the request in VAPIX format to the String Parser Module; the parsed tokens (IP, host, port) are passed to the HTTP Module, which opens a connection to the Server and reports whether it succeeded.]
Figure 5.1. Flow chart of developed software
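The String Parser step in Figure 5.1, which separates the host, port and process values of a VAPIX request, can be sketched as follows. The original module is written in C++; this Python version, with its hypothetical function name, is only illustrative.

```python
from urllib.parse import urlsplit

def parse_vapix_request(url):
    """Split a VAPIX request URL into the tokens the HTTP module needs:
    host, port and the process (CGI path plus any query string)."""
    parts = urlsplit(url)
    process = parts.path
    if parts.query:
        process += "?" + parts.query
    return {
        "host": parts.hostname,    # e.g. 'myserver'
        "port": parts.port or 80,  # default HTTP port if none given
        "process": process,        # e.g. '/axis-cgi/jpg/image.cgi'
    }
```

The HTTP module would then open a TCP socket to (host, port) and send an HTTP GET for the process string.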
