Overcoming Mobile Usability Testing Hurdles: A Mobile Payments Case Study

Kent Griffin, User Interface Designer, Mobile PayPal, [email protected]
Jungeun Kim, User Interface Designer, PayPal, [email protected]
Paresh Vakhariya, Sr. User Researcher, PayPal, [email protected]

A peer-reviewed paper from:
UPA 2007 Conference
Patterns: Blueprints for Usability
June 11-15, 2007, Austin, Texas, USA
http://www.usabilityprofessionals.org

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Copyright 2007, UPA and the authors.

Abstract

PayPal’s mobile payment product (“Text to Buy”) posed several hurdles specific to usability testing mobile applications in a lab environment. This paper describes how we overcame these hurdles by using multiple research methods, simulating multi-modal interactions, customizing the prototype, and extending standard web techniques like Wizard of Oz.

Keywords

PayPal, m-commerce, Text to Buy, mobile, payment, cell phone

Citation

This paper may be cited as: Griffin, K., Kim, J., & Vakhariya, P. (2007, June). “Overcoming Mobile Usability Testing Hurdles: A Mobile Payments Case Study.” Proceedings of the Usability Professionals’ Association, UPA 2007. Austin, TX, USA.

Introduction

For many, the idea of using a mobile phone for day-to-day payments seems like a far-off fantasy. However, mobile payments are certainly nothing new. By 1999, PayPal, iMode, and Nordea had all launched various mobile payment solutions targeted at their specific locales.[1] Nonetheless, many hurdles still stand in the way of realizing the dream of never having to carry a wallet. Among other things, an m-commerce system needs to be safe, easy to use, and compelling.[2] However, the success of a mobile product lies not just in being able to build the solution, but also in being able to test it effectively. Usability testing a mobile application presents several unique hurdles. This paper will examine how the testing methodology for the “Text to Buy” program dealt with seven of the more difficult ones.

Introduction to “Text to Buy”

Launched in April 2006, Text to Buy allows consumers to securely send money and buy items using their mobile phones. After activating a phone online, you can look for advertisements bearing the Text to Buy icon in magazines, on billboards, or even in television commercials. Alongside every icon, there is an item keyword (which becomes the body of the text message) and a short code (where the text message is sent). After you text in your order, an Interactive Voice Response (IVR) system calls you back to confirm the purchase.

Figure 1. How Text to Buy works

As you can see, this application involves the interaction of several elements: print media, text messages, voice calls, and a website. Moreover, the goal of the usability test was to get feedback on all of these elements, as well as the overall system.
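To make the multi-modal sequence easier to follow, the sketch below models one purchase as a short Python script. It is purely illustrative: the catalog, short code, phone number, and function names are all hypothetical stand-ins for the real carrier and payment infrastructure, not PayPal APIs.

```python
from dataclasses import dataclass

# A toy model of one Text to Buy purchase. Every name and value here
# is a hypothetical stand-in; nothing below reflects real PayPal APIs.

@dataclass
class Order:
    buyer_phone: str
    item: str
    price: str

# Keyword printed in the ad -> (item, price). Illustrative values only.
CATALOG = {"HEADSET": ("Bluetooth headset", "$4.00")}

def receive_sms(buyer_phone: str, short_code: str, keyword: str) -> Order:
    """Step 1: the buyer texts the ad's keyword to the short code."""
    item, price = CATALOG[keyword]  # short_code routes the SMS to the system
    return Order(buyer_phone, item, price)

def ivr_confirm(order: Order, entered_pin: str, expected_pin: str) -> bool:
    """Steps 2-3: the system calls back and collects the buyer's PIN."""
    print(f"Calling {order.buyer_phone}: please confirm {order.item} "
          f"for {order.price} by entering your PIN.")
    return entered_pin == expected_pin

order = receive_sms("555-0100", short_code="12345", keyword="HEADSET")
print("Purchase confirmed:", ivr_confirm(order, "1111", "1111"))
```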

Methods

We decided to use a combination of two research techniques to gather feedback on the product and to address the hurdles that we anticipated: a 1:1 usability test (conducted in both the United States and the United Kingdom) and two cognitive walkthrough sessions with groups of users. All the activities were conducted during the design and iteration phase so that we could address any issues encountered by participants. The primary goal of the 1:1 usability test was to determine whether users were both willing and able to buy products via text messaging. By repeating the usability test in the UK, we intended to gather feedback on any international differences in how this product would be used. The cognitive walkthroughs were intended to gather similar feedback from users, but in a more natural setting where background noise and other interruptions are common. By using a combination of research methods in two different locales, we hoped to address the seven mobile usability testing hurdles discussed in the next section.

Mobile Usability Testing Hurdles

Here is a list of seven hurdles that we knew we would face while usability testing this mobile application. While there may be more hurdles (especially for different applications), we wanted to make sure our methodology at least addressed these. As we describe the hurdles, we’ll also discuss how each research method we used played a different role in helping to overcome them.

1. Mobile Target Segments Are Different (I will never use this product)

Defining an audience is always critical to any design or research project. However, the target audience of a mobile application is often different from that of an online application. We need to consider more than the traditional categories, such as early adopters, everyday consumers, and tech-savvy users. Mobile applications may also be well suited for commuters, teenagers, or even the physically challenged.[3]

To define the target segment that our mobile research should focus on, we evaluated various quantitative surveys and other market data. This research showed that the top criteria that might affect performance were age and text messaging experience. Based on this, we decided to recruit users under 35 years of age with a variety of text messaging experience. To control for a product bias, we also recruited a mix of existing and new users. To see whether the frequency of text messaging had an impact on product usage, we divided participants into three groups: light usage (less than 1 text message a month), medium usage (1 to 5 a month), and high usage (more than 5 a month). These thresholds were based on market research data indicating the frequency of mobile text messaging in the US.[4] The corresponding numbers were higher for European countries, and the recruiting criteria for the UK study are discussed in a separate section of this paper.

No significant differences were noticed between the new and existing users (a discovery that we used when recruiting for the UK study). Overall, participants with various levels of text messaging experience were equally successful in accomplishing their tasks. However, the less experienced group was (on average) more skeptical that the system would work until they completed their first purchase. This revealed a mental barrier that the product would strive to overcome through education. The heavy text message users were also slightly faster (though not significantly) in terms of task time. The existence of these small differences shows that, while the product can be targeted toward all three groups, it did make sense to group users this way for the study.

2. Unfamiliarity With a Test Device (This isn’t what I’m used to)

In a typical web-based usability session, moderators usually assume that the participant can use a keyboard and mouse. However, it is less likely that a participant will know how to use a particular phone during a mobile study. This is even more important when testing a text messaging application, as there are many different input configurations (triple-tap, predictive text, symbol mode), and each phone uses them differently. For consistency, we required all participants to use a typical US phone (the Motorola v180). While they would have been more comfortable with their own phones, it would have been difficult to tell how the usability of their own device affected the data. However, since we required them to use this particular phone, none of them were familiar with it when they started the session.
This meant that their lack of familiarity with the phone could affect how easy they perceived the product to be to use. To address this issue, the participants were given two preliminary tasks to become familiar with the device. Since our product includes text messaging and IVR interactions, we asked them to send and receive both a text message and a phone call. To make sure they knew the phone well, the text message they composed had a combination of numbers, letters, and symbols (for example, “Let’s see the 7:00 movie”). We then asked the participant to place and receive a phone call. People in the control room helped simulate a realistic environment by both replying to the text messages and answering the phone calls. When these initial tasks were completed, participants knew how to use the phone well enough to proceed with the rest of the study. These initial tasks, therefore, prevented device discrepancy from significantly impacting the test results.

3. Small Form Factor Impedes Observation (I don’t hold it this way)

Observing the participant’s actual interaction is one of the most difficult problems in a mobile usability study. By design, mobile devices are very personal. However, in order to accommodate observers who want to see the user-device interaction, cameras are usually set up near (or on) the device.[3] When setting up these cameras (or in choosing not to use them), there needs to be a balance between helping observers watch the study and impeding the user’s interaction with the device.

Figure 2. Mobile usability testing lab setup

Given budget constraints, we decided to use the two overhead cameras that were already in the test room, rather than purchase cameras designed specifically for mobile testing. One camera captured the participant’s face so that the people in the observation room could see their reactions to the tasks. The second overhead camera was zoomed in on the device screen. To ensure the screen could be captured, the cell phone was taped to the table. Feeds from both of these cameras were viewable in the control room and the observation room.

While good for observation, this setup meant the participants could not hold the phone in their own hands. Although this is clearly not how users usually interact with their phones, we wanted the usability test to focus on observation and data logging. To receive well-rounded feedback, we also conducted a cognitive walkthrough where users could interact with their cell phones in a more natural manner - by holding them in their hands without worrying about any cameras.

During the usability tests, there was no indication that the results were skewed because the participants could not use their phones in the way they wanted. They felt comfortable with the product and were able to complete the tasks successfully. During the cognitive walkthrough, users completed the tasks in a similar amount of time, and no device-related issues were uncovered. The similarity of the results shows that form factor did not significantly influence the results of the first usability test. That is, not being able to hold the phone did not affect people’s ability to use the product. Taping the phone to the desk was thus good for observation without compromising the reliability of the results. It is important to note that form factor could play a more important role for other products that require more extensive use of the keypad (such as games).

4. Mobile Prototyping Is Difficult (Is this thing real?)

Building a working prototype for a multi-modal mobile application isn’t as easy as making a few mockups or HTML pages. Moreover, showing the interaction between these elements is even more difficult.[5] When testing a system that isn’t fully developed, a certain level of improvisation is always required.

To build a working prototype, we used a different prototyping technique for each element: paper mockups of the print advertisements, HTML pages for the website, scripted calls for the IVR interaction, and a carrier’s website to send any necessary text messages. To make the interaction believable, we also adapted the Wizard of Oz (WOZ) technique to the mobile environment. This required two team members to stay in the control room and send text messages or place IVR calls. When a participant needed to send a text message, we had them send it to a short code that would not generate a response. Every time the product should have responded with a text message, a team member instead sent a text message using the carrier’s website. Likewise, when a confirmation call should have been placed, a team member placed the call using a phone in the control room.
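In practice, the wizards sent each reply by hand through the carrier’s website. As an illustration of how such wizard-side replies could be scripted, the sketch below instead assumes an email-to-SMS gateway, a facility many carriers offer; the gateway domain, SMTP host, and phone number are placeholders, and this is an assumed alternative rather than what we actually ran.

```python
import smtplib
from email.message import EmailMessage

# Illustrative wizard-side helper. In our study, replies were sent by
# hand via the carrier's website; this sketch instead assumes an
# email-to-SMS gateway (offered by many carriers). The gateway domain,
# SMTP host, and phone number below are placeholders.

def woz_send_sms(phone_number: str, body: str) -> None:
    """Send a canned 'system' reply to the participant's test phone."""
    msg = EmailMessage()
    msg.set_content(body)
    msg["From"] = "wizard@example.com"
    # Gateway address format varies by carrier, e.g. <number>@txt.<carrier>.com
    msg["To"] = f"{phone_number}@txt.example-carrier.com"
    with smtplib.SMTP("smtp.example.com") as server:
        server.send_message(msg)

# Sent at the moment the real product would have texted back:
woz_send_sms("5550100",
             "Thanks! Your order was received. You will get a "
             "confirmation call shortly.")
```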

A) Print ads

B) Company website

C) Text notification

D) Example IVR confirmation call: “Hello John D, we’re calling to confirm your purchase in the amount of four dollars. Using your phone’s keypad, please enter your PIN.”

Figure 3. Various prototyped components of Text to Buy

The adaptation of the WOZ technique was critical to making the interactions happen in real time. All the users were very satisfied with the system, and none commented on it feeling like a prototype. By adopting these techniques, we also received more feedback on how the various components interacted with each other, rather than on the components themselves. For example, people commented on whether the confirmation call they received accurately reflected the print advertisement they had seen a few minutes earlier.

5. Real World Is Noisy and Less Private (The lab is too quiet)

It’s usually difficult to decide whether to do a mobile study in a lab or in a more natural setting. When using a mobile device outdoors, for example, many factors come into play that wouldn’t exist in a controlled lab environment. Outside the lab, people’s interactions are affected by background noise, as well as by their concerns over what other people may hear or see. In these cases, people are also more likely to be interrupted or distracted by surrounding events. These interactions, and the problems they can cause, are difficult to replicate in a one-on-one lab study.

During the usability test, it was not possible to create realistic, noisy environments while still being able to observe and log the session. However, since the product allows people to give voice commands like “approve” or “yes”, it was imperative to understand users’ comfort level in less controlled environments. This hurdle was addressed by holding a cognitive walkthrough session after the usability tests were complete. These were conducted as group sessions to create some background noise. By having several people near each other, we were also able to generate what was perceived as a less secure environment.

The addition of the walkthrough did reveal new results that the lab study could not. For example, it provided data on how sensitive the IVR system should be to voice commands when it detects various levels of background noise. We were also able to collect data on the participants’ perception of security when using the product in a more public environment. It would have been impossible to gather this data with just a lab-based usability test.

6. Purchase Not Engaging Enough (I don’t want to buy this item)

To discover the concerns people would have with mobile commerce, we needed to make sure that participants were as invested in the process as possible. If a participant felt like they were buying an item they actually wanted, they would be more concerned with the overall experience instead of just the purchase. To accomplish this, we asked some additional questions when recruiting the participants. We collected input from each participant about the categories of items they might buy in the near future (such as consumer electronics, DVDs, or cosmetics) and asked for some specific examples. We then used these answers to create ads specific to what the participants were interested in purchasing.

These customized ads definitely made the experience more realistic for the participants. Several became engaged enough to raise questions or concerns they had with the overall process. For example, they asked about refunds, how tax was handled, and whether they could choose a gift address. Once these questions had been raised, we were also able to get feedback on them from the remaining participants. This contrasts with the results of the cognitive walkthrough, where we re-used the same advertisements. There, the participants seemed less engaged in the experience and tended to focus on the actual purchase experience (for example, whether the voice was clear or how quickly the system reacted). Thus, by tailoring the ads to the individual participants, we received more detailed feedback than would have been possible otherwise.

7. International Differences May Be Significant (We do it differently here)

Mobile culture differs significantly from one country to another. Given the radically different mobile penetration rates across countries and the different devices that are used, it is likely that users in one country will react to a mobile product differently than those in another. Since Text to Buy was initially being launched in the US, UK, and Canada, we looked at market data for each country.[4] These data showed vast differences in the UK market (for example, a higher penetration rate and a wider variety of mobile services). Thus, we conducted a separate qualitative usability test in the UK. For this study, we recruited only existing users (as US testing had indicated no significant differences between new and existing users).
We maintained the same age group (below 35 years old) and included only heavy text messaging users, with the text messaging frequency threshold adjusted to the higher UK standards. A Nokia 6600 was used as the typical UK phone. To get accurate data, we customized all the prototypes, including the web pages, text messages, IVR calls, and product ads. For example, we localized the product ads to use appropriate pricing and currency, and we used a British voice for the IVR system. Although a vendor in the UK conducted this usability test, we made sure that the methodology was consistent with the US usability test.

The overall product performed very well in the UK as well. However, users in the UK did have different expectations about the product’s functionality. For example, UK users had a clearer understanding of how short codes worked and what to expect when a text message was sent. By contrast, many participants in the US questioned the idea of sending a text message to a five-digit number. UK users also came up with several new use cases that the US users did not necessarily think of (for example, buying subway tickets or parking tickets).

These findings suggest that people in the UK do indeed use mobile products differently than users in the US. Hence, it makes sense to conduct separate usability tests when possible.
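For the UK study we localized the ad pricing and currency by hand, but the kind of currency formatting involved is easy to automate. The sketch below uses the third-party Babel library purely as an illustration; it was not part of the original prototype.

```python
# Illustrative only: Babel (a third-party Python library) formatting the
# same price for the US and UK prototype ads. In the study, the ads
# were localized by hand.
from babel.numbers import format_currency

price = 4  # the four-dollar purchase from the example IVR script

print(format_currency(price, "USD", locale="en_US"))  # -> $4.00 (US ads)
print(format_currency(price, "GBP", locale="en_GB"))  # -> £4.00 (UK ads)
```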

Conclusions

Usability testing multi-modal mobile applications can be challenging due to the hurdles we discussed. Using multiple research methods (1:1 usability tests and group research) focused on different locales and environments is recommended. Simulating the actual product and user interaction in a realistic manner (online as well as offline) provides well-rounded results. Users also find the research more engaging if the usability prototype is customized according to their preferences. Creative ways of extending standard web techniques (like the Wizard of Oz technique) to mobile applications will increase users’ interaction with the prototype and encourage them to provide more detailed feedback.

We are currently exploring other research methods, such as ethnographic field research, to complement the lab research. We are also looking at other viable camera options, like a device-mounted camera or a lightweight sled (which were excluded from this study due to budget constraints and because they were deemed unlikely to significantly affect user performance). As new trends and needs emerge in the mobile market, more creative ways of researching this challenging, but interesting, area will be key.

References

[1] Sadeh, N. (2002). M-Commerce Technologies, Services, and Business Models. Boston: Benchmark Productions Inc.
[2] Lucas, P. (2005, March). When will M-Commerce Arrive? Digital Transactions, 2(2), 30-42.
[3] Weiss, S. (2002). Handheld Usability. Chichester, England: John Wiley & Sons Ltd.
[4] Netsize. (2005). The Netsize Guide - 2005 Edition. Paris: Netsize S.A.
[5] Longoria, R. (Ed.). (2004). Designing Software for the Mobile Context: A Practitioner’s Guide. London: Springer-Verlag.

Authors

Kent Griffin has worked at PayPal since 2004 and has been designing and researching its mobile payment solutions since 2005. He is a graduate of Stanford University with a B.S. and M.S. in Symbolic Systems focused on HCI design. He is also a member of BayCHI (the San Francisco chapter of ACM SIGCHI) and is a regular attendee of Mobile Mondays in Silicon Valley.

Jungeun Kim is currently working for PayPal as a User Interface Designer. She has a strong interest in digital media’s impact on culture. As a foreigner living in the US, she is also naturally interested in international design.

Paresh Vakhariya has been working at eBay/PayPal for the last three years and is currently leading several user research initiatives, including mobile. He has been conducting user research for almost eight years. He has an M.S. in Industrial Engineering (focused on HCI) from Clemson University and a B.S. from the University of Bombay (Mumbai). Previously, he worked at Intel for four years. He has a keen interest in mobile usability, ethnographic research, and participatory design methodologies. Paresh recently presented “User Experience & Design Team: Efficient Handshake between Different Roles” at APCHI ’06. He has also presented two posters at UPA on competitive analysis and field visit data gathering techniques.
