Object Recognition In Augmented Reality

Object Recognition In Augmented Reality Speaker: Chester Pereira MS Candidate, Dept. Of Computer Science Project Advisor: Dr. George Markowsky Dept. of Computer Science Date:


Outline
- Introduction to Augmented Reality
- Motivation
- Goals of the Project
- System Components
- Calculating the user’s view : The Viewing Pipeline
- Implementation of the System
- Experimental Results
- Conclusion and Future Work

Introduction to Augmented Reality
- Augmented Reality (AR) is a recent research area that sprung out of Virtual Reality (VR).
- Augmented Reality = Real World + Computer-Generated Synthetic Elements.
- Ideally, the user is unable to distinguish between real and synthetic elements.

Introduction to Augmented Reality

Introduction to Augmented Reality
- Virtual Reality: "a computer-generated 3-D world in which a user is immersed" (Aukstakalnis and Blatner).
- Augmented Reality: enhancing the user's perception of the real world by merging in synthetic sensory information.
- Differences between AR and VR:
  - Immersiveness: real vs. virtual environment.
  - Errors are less tolerated in AR.

Introduction to Augmented Reality
Figure: The Reality-Virtuality Continuum (Milgram et al.), ranging from the Real Environment to the Virtual Environment, with Mixed Reality spanning everything in between.

Motivation
- AR research finds applications in areas such as medicine, entertainment, engineering design, and military training.
- Computing devices are getting smaller and faster. For example:
  - ASUS MyPal A600 Pocket PC: a PDA that runs at 400 MHz (75 mm x 125 mm x 12.8 mm).
  - Microsoft Xbox game console: NVidia 300 MHz 3-D graphics processor (approximately 12" x 4" x 8").
- Currently developed AR systems may cost up to a few tens of thousands of dollars. Can we build something that is not so expensive?

Goals of the Project
- Develop a relatively inexpensive system.
- The system recognizes static objects in a room (e.g., doors and windows); the objects are static, but the user may move around freely.
- The system has information about the position and description of the objects.
- Augmented view: draw wire-frames around the objects that lie in the user's view, and provide a textual description of those objects.

Goals of the Project
Figure: Intended working of the system, showing wire-frames and labels for a window, a door to the hallway, and two false doors. (Artist: Sanjeev Manandhar)

System Components
Components of a typical Augmented Reality system:
- Tracking system: estimates the position of the user with respect to the surroundings.
- Computing device: performs the computation necessary to generate the augmented view.
- Display system: relays the augmented view to the user.

System Components
Hardware components used:
- QuickCam Home Camera (Logitech): a PC camera, to capture the user's view.
- pciBIRD (Ascension Technology Corporation): a magnetic tracker, to track the position of the user.
- "Clip-On" Model CO-1 (The MicroOptical Corporation): a monocular display system.
- Dell Dimension 4100 Series desktop PC: Intel Pentium III processor at 1000 MHz.

System Components
1. QuickCam Home Camera:
- USB color camera.
- Field of view: 45°.
- Focusing range: 3 ft. to infinity.
- Approximate cost: less than $100.

System Components
2. pciBIRD:
- 6 degree-of-freedom magnetic tracker on a PCI card.
- Translation range of 76.2 cm along the X, Y, and Z axes.
- Up to 105 measurements per second.
- The position/orientation of the sensor is obtained by measuring the magnitude of the magnetic field produced by the transmitter.
- Approximate cost: around $5,000.

System Components
3. "Clip-On" Model CO-1:
- Monocular LCD display.
- NTSC/VGA formats supported.
- 320 x 240 pixels.
- 16-bit color.
- 60 Hz refresh rate.
- Approximate cost: around $1,500.

System Components
The line segment through the two sensors forms the Z-axis of the eye system.
Figure: Hardware components mounted on a hard hat.

System Components
Software used:
- All code written in Microsoft Visual C++ on Windows 2000.
- Microsoft Video for Windows (VFW): for video capture and video manipulation.
- pciBIRD C++ API (provided by Ascension Technology Corporation): for interfacing the tracker with the code.

Calculating the user’s view : The Viewing Pipeline
- The objects to be recognized are geometrically simple to describe: doors and windows are rectangular in nature.
- The augmented view: wire-frames around the objects that lie in the user's view.
- This requires a geometric transformation of each edge from world co-ordinates to screen co-ordinates.

Calculating the user’s view : The Viewing Pipeline
World co-ordinates
→ (world-to-eye transformation) → eye co-ordinates
→ (line clipping) → clipped edge in eye co-ordinates
→ (perspective transformation) → virtual screen co-ordinates
→ (view-port transformation) → actual screen co-ordinates

Calculating the user’s view : The Viewing Pipeline
World system:
- Right-handed system.
- Axes: X1, Y1, Z1.
- Origin: the center of the magnetic transmitter.
Eye system:
- Left-handed system.
- Axes: X, Y, Z.
- Origin: the eye.
- The line from the eye to the view reference point (vrp) gives the direction of view.

Calculating the user’s view : The Viewing Pipeline
Homogeneous matrix representation:
- A point P(x, y, z) is represented as P = [x y z 1].
- Geometric transformations (translation, rotation, scaling) are represented as 4x4 matrices.
- To apply transformations T1, T2, ..., Tn (in that order) to P:
  P1 = P x T1 x T2 x ... x Tn, or P1 = P x GEO, where GEO = T1 x T2 x ... x Tn.

Calculating the user’s view : The Viewing Pipeline
Transforming the world system to the eye system, with the eye at (xf, yf, zf) and the view reference point at (xa, ya, za):
- Translate the origin to the eye.
- Reverse the direction of X1.
- Rotate by 90° about X1.
- Rotate by θ about Y1 until Z1 lies above Z.
- Rotate by ψ about X1 until Z1 coincides with Z.

Calculating the user’s view : The Viewing Pipeline
Transforming the world system to the eye system:
P1 = P x EYE
where EYE is the 4x4 matrix obtained by composing the translation and rotations above; its entries are trigonometric functions of θ and ψ, with

d1 = sqrt((xf-xa)² + (yf-ya)² + (zf-za)²)

the distance from the eye to the view reference point.

Calculating the user’s view : The Viewing Pipeline
Virtual screen co-ordinates: a point P(x, y, z) in eye co-ordinates projects to a point P1(x1, y1) on the virtual screen.
Figure: Projection of a point onto the virtual screen.

Calculating the user’s view : The Viewing Pipeline
Figure: The viewing pyramid.

Calculating the user’s view : The Viewing Pipeline
θ = field of view.
With the eye at distance d from the screen along Z, and b the half-width of the screen:
aperture = b/d = tan(θ/2)
In our case, aperture = tan(45°/2) ≈ 0.4142.

Calculating the user’s view : The Viewing Pipeline
Virtual screen co-ordinates:
x1 = x / (z · aperture)
y1 = y / (z · aperture)
(-1 < x1, y1 < 1 for points inside the viewing pyramid)

Implementation of the System
3. Video for Windows (VFW) macros used:
- capSetCallbackOnFrame => lets the user set the function to be called whenever a frame is captured.
- capSetCallbackOnError => lets the user set the function to be called in case an error is encountered.
- capCaptureSetSetup => brings up a common dialog to let the user select capture settings.
- capPreviewRate => sets the preview rate (normally set to 15 frames per second).

Implementation of the System
4. The pciBIRD API:
- InitializeBIRDSystem: resets all pciBIRD boards in the system, and obtains and builds a database of information containing the number of sensors, transmitters, etc.
- SetSensorParameter: sets parameters of the sensor, such as the data format type, the frequency of measurement, etc. (In our case, data format = "position only".)
- GetAsynchronousRecord: returns the data record from the last computation cycle.
- GetBIRDError: returns the oldest error message in the error queue.

Implementation of the System
5. Putting it all together:
- Read scene information into theObjects[].
- Initialize the pciBIRD system.
- Create a capture window and connect to a capture driver.
- Set the preview rate to 15 frames/sec.
- Set the frame-callback function to FrameProc().

Implementation of the System
5. Putting it all together: the frame-callback function:
  eye_position = reading from sensor;
  vrp = reading from sensor;
  set_virtualcenter(bitmapwidth/2, bitmapheight/2);
  set_vs(bitmapwidth, bitmapheight);
  ProcessObjects();

Implementation of the System
5. Putting it all together: the ProcessObjects() function:
  For each object
    For each edge of the object
      Transform end-points from world to eye co-ordinates
      Clip the edge against the viewing pyramid
      If the edge is visible
        Perspective transformation (=> 3-D to 2-D, normalized)
        View-port transformation (=> actual screen co-ordinates)
    If at least one edge is visible
      Show the textual description of the object.

Experimental Results
- Only one sensor was used: the camera position was fixed.
- The vrp was set by measuring the co-ordinates of the point that translated to the center of the screen.
- The operating range of the tracker was not wide enough, so smaller objects such as posters were chosen.

Experimental Results

Experimental Results
The wire-frames do not match the edges of the objects exactly. Possible reasons:
- The effect of electrical/magnetic devices in the vicinity.
- The tracking device does not accurately convey the proper "sign" of the co-ordinates.
- Our assumption that the user does not tilt his head from side to side: in practice it is hard to keep the camera from tilting.
- The camera's field of view may not be exactly 45°.

Conclusion and Future Work
Observations:
- Tracking plays a massive role in Augmented Reality.
- Magnetic tracking is not very practical for our problem:
  - We cannot expect a real-life scenario to be devoid of electromagnetic devices.
  - The tracking area is very small.
- The display device used is not really a "see-through" device; actual see-through devices are expensive.

Conclusion and Future Work
A possible alternative to magnetic tracking: UNC's HiBall Tracker (HiBall-3100).
- Works on infra-red light.
- Unaffected by electromagnetic waves and sound waves.
- Better resolution (0.2 mm).
- Greater operating range (from 144 sq. ft. to more than 1600 sq. ft.).

Conclusion and Future Work
Can we expect better tracking and display devices in the near future? Yes: devices available now are already better than those available when this project was started. Examples:
- UNC's HiBall tracker was not commercially available when the project began; it is now.
- MicroVision's military HMD uses Retinal Scanning Display technology, which "draws" images directly on the retina.

Conclusion and Future Work
Summary:
- AR: enhancing perception by introducing synthetic elements.
- Motivation: computing devices are getting faster and more compact.
- Goal: develop a relatively inexpensive system to recognize geometrically simple objects.
- Observation: magnetic tracking is not a very good choice for this problem.
- We can expect tracking and display devices to become more easily available and less expensive in the future.

Acknowledgements
Dr. George Markowsky, Dr. Larry Latour, Dr. Tom Wheeler, Dr. Ed Ferguson, Carol Roberts, Vijay Venkataraman, Sanjeev Manandhar.

References
- http://www.se.rit.edu/~jrv/research/ar/ - An introduction to the field of Augmented Reality, with links to AR work available on the web.
- http://www.howstuffworks.com/augmented-reality.htm - A simple introduction to Augmented Reality.
- Ronald T. Azuma, "Tracking Requirements for Augmented Reality", Communications of the ACM 36(7), July 1993, 50-51.
- Ronald T. Azuma, "A Survey of Augmented Reality", Presence: Teleoperators and Virtual Environments 6(4), August 1997, 355-385.
- Paul Milgram, Haruo Takemura, Akira Utsumi, Fumio Kishino, "Augmented Reality: A Class of Displays on the Reality-Virtuality Continuum".
- http://www.augmented-reality.org/ - Links to technology, research groups, projects, products, and resources related to Augmented Reality.
- http://www.cs.unc.edu/~tracker/ - The UNC tracker project.
- http://www.3rdtech.com/HiBall.htm - Information on the commercially available version of UNC's HiBall Tracker.
- David Drascic, "Stereoscopic Vision and Augmented Reality", Scientific Computing and Automation 9(7), June 1993, 31-34.
- http://www.stereo3d.com/hmd.htm - A comparison of commercially available head-mounted displays.
- http://www.mvis.com/prod_mil_hmd.htm - MicroVision's military HMD that uses Retinal Scanning Display technology.
- http://www.ascension-tech.com - Ascension Technology Corporation, manufacturers of various tracking devices.