There’s No Such Thing as Gaining a Pound: Reconsidering the Bathroom Scale User Interface

Author: Cori Parks
Session: Domestic Computing

UbiComp’13, September 8–12, 2013, Zurich, Switzerland

Matthew Kay,*† Dan Morris,* mc schraefel,*‡ Julie A. Kientz†
*Microsoft Research, Redmond, WA, USA
†University of Washington, Seattle, WA, USA
‡University of Southampton, Southampton, UK

ABSTRACT

The weight scale is perhaps the most ubiquitous health sensor of all and is important to many health and lifestyle decisions, but its fundamental interface—a single numerical estimate of a person’s current weight—has remained largely unchanged for 100 years. An opportunity exists to impact public health by reconsidering this pervasive interface. Toward that end, we investigated the correspondence between consumers’ perceptions of weight data and the realities of weight fluctuation. Through an analysis of online product reviews, a journaling study on weight fluctuations, expert interviews, and a large-scale survey of scale users, we found that consumers’ perception of weight scale behavior is often disconnected from scales’ capabilities and from clinical relevance, and that accurate understanding of weight fluctuation is associated with greater trust in the scale itself. We propose significant changes to how weight data should be presented and discuss broader implications for the design of other ubiquitous health sensing devices.

[Figure 1 screenshot. Visible answer options: fully, partially, not; work, home, other; yes, no]

Figure 1. Screenshot of a mobile web app used to collect multiple weigh-ins each day. Participants entered their weight and answered three multiple-choice questions at each weigh-in. The result was added to a running graph of weight over time.

Author Keywords

Weight, scales, health data perception

ACM Classification Keywords

J.3. Life and medical sciences: Health.

INTRODUCTION

The bathroom scale is the most ubiquitous tool for diagnosing and managing weight issues—arguably, the most ubiquitous health sensor of all—and several studies have shown that frequent weigh-ins help maintain weight loss [25,28]. However, people who are watching their weight often have a marked aversion to stepping on the scale [7]. We hypothesize that some of this resistance comes from the design of the scale’s interface. Despite its centrality to global health and wellness, the familiar bathroom scale interface has barely changed since it was first introduced about 100 years ago: it still produces a single value representing one’s weight at the moment of measurement. Digital displays have replaced the analog needle, coarse measurements of body fat have been added, and some scales log data for offline review; however, the singular data point is still the main display and is often the only information presented at the time of weigh-in. Most scales answer just one question—“what do I weigh right now?”—which may not be the best framing for weight data.

We believe there are several issues with current scales that work against an effective understanding of weight management, which we explore in this paper. For example, digital scale readouts convey an unrealistic level of precision, negatively affecting user perception. We also show that many scale users develop a deep, trusting relationship with their scales despite significant misconceptions about accuracy, trends, and fluctuation; in an online survey of over 800 scale users, we found that respondents with less understanding of how weight fluctuates during the day were less likely to trust their scales. This is exacerbated by the fact that the scale interface makes no attempt to inform users about how weight fluctuates. Our work suggests an opportunity to re-imagine the 100-year-old user interface that is still state-of-the-art in weight management, grounded in best practice in weight management research and consumers’ understanding of weight fluctuation. Further, as scales are part of a larger class of increasingly ubiquitous health feedback devices that provide single-point, instantaneous measurements—such as body fat estimators, thermometers, pedometers, and blood pressure cuffs—our work provides a foundation for future design in this broader space.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. UbiComp’13, September 8–12, 2013, Zurich, Switzerland. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-1770-2/13/09…$15.00. http://dx.doi.org/10.1145/2493432.2493456


The rest of this paper is organized as follows: First, we describe related work in weight management (focusing on scales) and intelligibility of ubiquitous interfaces. Second, we analyze a repository of online reviews of scales, examining consumers’ understanding of quantitative health measurements in terms of attributes like accuracy, precision, and trends. Third, we outline themes from semi-structured interviews with experts in nutrition on the role of the scale in clinical practice and their clients’ relationships with scales. Fourth, we present results from a study quantifying daily weight fluctuation, which has previously been only anecdotally studied even in the clinical literature. Fifth, we describe a large-scale survey of over 800 participants assessing their understanding of how scales operate, how much their weight typically fluctuates, and their own relationships with scales. Finally, we synthesize design recommendations for weight scales and discuss broader implications for the design of health feedback displays.

BACKGROUND AND RELATED WORK

Weight Management and Scales

As links among obesity, mortality, and other health conditions have become clear [2,9], weight management has become a key part of health practice. Obesity is clinically defined in terms of weight and Body Mass Index (BMI) [4,22]; BMI is itself a function of weight and height. Therefore, the scale plays a central role in diagnosing obesity. The scale is also used as part of the treatment regimen for obesity: more frequent use of the scale, such as daily weigh-ins, correlates with better weight maintenance after weight loss [25,28]. Studies have shown that people who maintain weight best after weight loss interventions eat healthily, have physical activity in their lives, and regularly monitor their weight [12,27]. In practice, weight reduction is most commonly pursued through calorie restriction (eating less food than is required to maintain one’s current weight) and increased physical activity [13,16]. Finally, the weight scale also allows a patient or clinician to monitor weight fluctuation, which itself has been directly associated with increased mortality [6,21]. Fluctuation is particularly common in individuals dealing with obesity: numerous studies show that successful weight loss is often followed by a recurrence of obesity, with patients sometimes gaining more than they have lost [6,21].

Because caloric restriction seems to have only short-term benefits and often leads to weight regain, and because weight fluctuation is associated with increased mortality, recent work has asked whether weight management should be based more on healthy behaviors than on instantaneous weight [5,18]. In the consumer space, scales such as the Withings and the Fitbit Aria have adopted a self-tracking approach: these scales automatically upload weight and body composition to a website where users can view graphs of their weight over time. However, despite innovations in offline feedback, the fundamental user interface of the scale at weigh-in remains essentially unchanged, reflecting only instantaneous weight. One exception is a Weight Watchers scale that displays the difference between current weight and a goal weight (or the previously measured weight); however, this still treats single data point measurements as meaningful reflections of current weight and does not inform users of broader patterns of weight fluctuation.

Intelligibility of Feedback in Ubiquitous Computing

One of the core challenges in scale interfaces that we will discuss throughout this paper is users’ understanding of the underlying data—how weight typically fluctuates and the uncertainty associated with measuring it. Lim and Dey have studied the effects of the intelligibility of context-aware systems on user perceptions—essentially, how transparent the reasoning or certainty of these systems is to users [14,15]. They found that exposing the certainty of a system—for example, a confidence region in location-aware systems—improves users’ perceptions of the accuracy and appropriateness of a system, so long as the certainty is good enough [15]. In general, the effect of displaying uncertainty on task performance seems to vary by application, sometimes having positive [1] or negative [20] effects.

Other work has looked at using natural-language generation to describe inferences in health data [17,23] as a way to improve human inference. We believe this approach may be promising for weight data, and a systematic understanding of people’s grasp of statistical vocabulary is essential to it. Researchers have tried to quantify words of estimative probability by having people assign numerical probabilities to words like ‘likely’, ‘uncertain’, ‘impossible’, and so on [11]. Similarly, confusion around measurement descriptions such as ‘precision’ and ‘accuracy’ has been explored in science education [24] and in specific scientific domains [26], but we are not aware of similar investigations of lay understanding of such words, despite their frequent use in product descriptions and consumer reviews.

ONLINE REVIEWS STUDY

We began our investigation into users’ perceptions of weight scale data with a qualitative analysis of online product reviews from a popular shopping site (amazon.com) for several consumer scales. This study aimed to answer three questions: 1) What are consumers’ expectations for accuracy in scales? 2) How do these expectations relate to consumers’ satisfaction with devices? and 3) What terminology do consumers use to express these expectations?

We analyzed product reviews for four popular scales: the Withings scale, the Fitbit Aria, a Tanita scale, and a Weight Watchers scale. Amazon.com reviews include two pieces of metadata: a 5-point product rating and a yes-or-no helpfulness rating (derived from the question “was this review helpful to you?”). The raw proportion of “yes” ratings overestimates the helpfulness of reviews that have received only a few ratings, so we convert it to a helpfulness score by taking the lower bound of its 95% binomial confidence interval.
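The text does not say which binomial interval was used. As a minimal sketch, the Python function below assumes the Wilson score interval, a common choice for ranking items by a proportion of positive votes; the function name and the z = 1.96 default (roughly 95% confidence) are illustrative assumptions. It shows why a review rated helpful by 1 of 1 voters scores well below one rated helpful by 90 of 100.

```python
import math


def wilson_lower_bound(helpful: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the ratio helpful/total.

    Assumption: the Wilson interval stands in for the unspecified
    "95% binomial confidence interval"; z = 1.96 gives ~95% coverage.
    """
    if total == 0:
        return 0.0  # no ratings yet: treat as least helpful
    p = helpful / total
    z2 = z * z
    denominator = 1 + z2 / total
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / denominator


# A single "yes" is weak evidence; 90 of 100 is strong evidence.
print(round(wilson_lower_bound(1, 1), 3))     # 0.207
print(round(wilson_lower_bound(90, 100), 3))  # 0.826
```

Ranking by this lower bound rather than by the raw ratio keeps sparsely rated reviews (where 1/1 would otherwise read as 100% helpful) from crowding the top of the list.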


From a corpus of 1084 reviews, we selected those with at least one helpfulness rating (855 reviews). Of these, we considered only 1-, 2-, 4-, and 5-star reviews (817 reviews) and then coded 100 reviews (the top 50 with 1 or 2 stars and the top 50 with 4 or 5 stars, ordered by helpfulness score). We used affinity diagramming to identify recurrent themes within this subset around users’ understanding of precision, accuracy, and uncertainty. We derived a coding scheme from these themes with 44 codes across 5 categories: motivations for using the device, how reviewers test accuracy/reliability, consistency expectations, factors discussed with respect to data quality, and interpretations of noisy data. The reviews were coded, and we used frequency profiling [19] to identify codes that were more frequently found in 4- or 5-star reviews (positive reviews) than in 1- or 2-star reviews (negative reviews), and vice versa.
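Frequency profiling is cited here rather than specified. One widely used form compares how often an item occurs in two corpora using a log-likelihood (G²) statistic; the sketch below assumes that form and treats each coded review as one unit, so each star group is a "corpus" of 50 reviews. The example counts are derived from the trend-focus percentages reported in the Results (28% of 50 positive reviews is 14; 4% of 50 negative reviews is 2). The resulting statistic is illustrative only and does not come from the study.

```python
import math


def log_likelihood(count_a: int, size_a: int, count_b: int, size_b: int) -> float:
    """Log-likelihood (G^2) for a code seen count_a times in corpus A and count_b in B.

    size_a and size_b are the corpus sizes (here, the number of reviews per group).
    """
    total_count = count_a + count_b
    total_size = size_a + size_b
    expected_a = size_a * total_count / total_size
    expected_b = size_b * total_count / total_size
    g = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # the 0 * ln(0) term is taken as 0
            g += observed * math.log(observed / expected)
    return 2 * g


# Trend focus: 14 of 50 positive reviews vs. 2 of 50 negative reviews.
g2 = log_likelihood(14, 50, 2, 50)
print(round(g2, 2))  # 10.12
# 3.84 is the chi-square (1 df) critical value for p < 0.05.
print(g2 > 3.84)     # True
```

G² only says that the two groups differ; the direction comes from comparing the rates (14/50 against 2/50), which places the code on the positive-review side.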

Results

Trend Focus vs. Data Point Focus

Positive reviews were more likely to exhibit a trend focus (28% of positive reviews, 4% of negative reviews). Rather than discussing problems with individual readings, reviewers discussed the overall value of the scale in surfacing fitness trends. For example, from a positive review:

However, body weight fluctuates throughout the day and week. With this scale, I've found myself weighing myself several times per day and looking at my data over a week or month, clear trend lines can be seen despite the daily fluctuations. Ultimately, this is the reason that I bought the scale and makes me very happy.

This reviewer accepts fluctuations in the data, reasoning that the overall trend is more important. In contrast, negative reviews were more likely to quantify the perceived precision of a device and then express a desire for more consistent readings (2% of positive reviews, 26% of negative reviews), either within the device or as compared to other devices; for example (from a negative review):

The weight ranges +/- 1.5 lbs each time you use it. So let's say you weight [sic] 150 on the scale at your doctor's office. you can expect your reading to be anywhere between 148.5 to 151.5 when using this scale. […] I can't rationalize keeping a $150+ scale that just isn't accurate.

Consumers’ expectations for the accuracy and reliability of scales seem to vary depending on their model of use. Those with an understanding of or a focus on trends seem more willing to tolerate noisy data, so long as they can establish a baseline from which to observe change. By contrast, those who gave negative reviews were more likely to focus on perceived noise in the data, even if the magnitude of that noise was similar to that reported in positive reviews.

Vocabulary and Terminology

In total, 68 of the 100 reviews we coded discussed issues around accuracy, precision, or uncertainty. To get a sense of the vocabulary used to express these concepts, we counted the number of reviews containing various words and their derivatives (we list words here only by one form, e.g. consistency for consistent/consistency and derivatives). By far the most-used term was ‘accuracy’ (in 48/68 reviews), followed by ‘consistency’ (22/68), ‘fluctuation’ (10/68), ‘variance’ (8/68), ‘precision’ (6/68), ‘reliable’ (5/68), and ‘repeatable’ (3/68). We note that even in this small sample, words were not used consistently by reviewers: for example, ‘precision’ was used to refer both to the concept of accuracy and of precision by different reviewers. We also observed a strong preference for the use of the term ‘accuracy’ to refer broadly to issues of measurement uncertainty. We therefore believe that a more systematic investigation of vocabulary for expressing uncertainty is warranted.

EXPERT INTERVIEWS

We interviewed four experts on weight change to validate the findings from our online review study, to better understand how scales are used in weight management, and to learn how experts see the effects of scale use on their users:

• E1, a professional strength and nutrition coach who works with clients trying to lose weight and clients trying to add muscle mass for specific athletic activities.
• E2, a dietician whose practice includes both athletes and non-athletes dealing with body weight issues. She is also an author of two cookbooks on healthy eating.
• E3, an osteopathic physician who works in a family medical practice and focuses on weight loss issues. He works in a low-income area with high rates of obesity.
• E4, the author of popular books and a blog on nutrition practices and a practicing fitness and nutrition coach. He primarily works with clients looking to lose weight.

We conducted a semi-structured interview with each expert, focusing on their background, perceptions of scales, how scales fit into their practice, and their clients’ perceptions of weight and scales. We used affinity diagramming of transcripts to identify high-level themes, discussed below.

Results

Scales Can Reinforce Inappropriate Goals

E1 and E2 both stressed that while weight is important, it is not always a complete picture of clients’ progress toward fitness goals. E1 noted that many people do not make the connection that body composition is often more important than weight and that “there’s people that completely change their body composition and stay the same weight.” E2 also noted that people use weight as an “inappropriate goal”. One of her clients was “hung up” because she couldn’t get to 125 lbs, even though in photos she clearly had a lean body composition. E2 stated that a specific weight—as a number—is often “such an identity for people”, and that people are “not so obsessed with your shoe size”. E4 called these “assumed” numbers: “a lot of people decide on a number at the beginning that they think they will look good at”. These issues were reflected in how E1, E2, and E4 use weight with clients: as one measure amongst several, including body fat calipers (E1 and E2) and circumference measures (E1, E2, and E4), e.g. waist or shoulder circumference. E4 noted, “weight is an excellent tool when used in combination with other metrics”.

Emotional Connection

E2’s observation that weight can act as an “identity” for people reflects a broader theme of emotional connections to scales and weight that pervaded our discussions with experts. E1, E2, and E3 discussed how they must tailor their recommendations to clients, depending on how comfortable they estimate each client will be with regular weighing. E1 noted that weighing daily would drive most people “batty”; “they have an emotional experience… they see numbers and it’s not what they expect”; and that weight can move “wildly” for some clients; e.g., simply by changing the proportion of carbohydrates in one’s diet, a person might see a change of 5–8 lbs. E1 described one client:

There was a fellow that was ignoring the other measures [he only looked at weight]… He was trying to lose weight, and he gained a pound. He was blaming external forces, he was venting: “This isn’t working!”… I pointed out, “Well, you lost a few inches off your waistline.” It was a very emotional reaction from a level-headed guy.

Overreaction to Fluctuations

E2 noted that people react “out of proportion” to small changes in weight of 1–2 lbs and they “extrapolate forward in their minds”. She described clients as getting “the horrors” when they feel like their weight moves in an undesirable direction. E4 noted people can get “kind of crazy” and tend to think of small weight changes as absolute instead of transient. He has to tell them: “let’s wait a day or two and see what happened”. He also noted a tendency for some people to weigh themselves at home and the gym and worry about differences of a pound or two without considering differences in the scales used. E1 and E3 both tailored their recommendations to their estimation of a patient’s ability to handle regular weighing; as E3 noted: “some people get bent out of shape if they weigh themselves every day”.

Regular Weighing Still Has Significant Value

Despite the potential issues with weighing that our experts outlined, all of them considered it an important practice and recommended that most clients weigh themselves about once a week. Recognizing from their own experience that weight tends to fluctuate during the day, they suggest clients weigh in at a consistent time of day and under similar conditions (e.g., just before breakfast). E1 estimated daily fluctuations at 3–5 lbs, and E4 at 3–4 lbs, though neither was aware of studies measuring this fluctuation. At the same time, E1 noted the potential value of weighing more often: “if they can mentally take it, I tell them to go every day: you can see amazing trends.” He even described some clients who weigh multiple times a day: “They really start to connect to how certain behaviors and food choices affect data”, but noted that while some people get excited by connecting data to behaviors or conducting self-experiments, there is a personality split: this sort of tracking works more for people who have “a bias towards data”, a split also noted by our other experts. Finally, E4 stated, “the place where I like it [the scale] is, after getting to a good point, understanding what a healthy weight range is.” He described scales as particularly valuable for supporting weight maintenance among people who have lost weight: once people get to a steady weight and establish a healthy weight range, they can see when weight gets to “an amount outside of a comfortable zone” then adjust their behavior. In general, our experts cast the best use of weight as an indicator of a trend rather than as an absolute value; as E4 said: “We only really want to know: would that line be ‘kind of going down’ or ‘kind of going up’”.

Education and Rationale are Essential

E2 and E3 both emphasized the importance of educating clients to help them understand weight changes. E3 noted that “a third to a half of a visit” typically consists of providing background information—for example, if a client gains a couple of pounds, E3 has to explain that it is probably water. E2 echoed this sentiment when talking about client compliance: “Mandates don’t work. When you explain why, you get better compliance”. All experts discussed the need to explain potential sources of weight fluctuation to clients as a way to allay their concerns about small changes in weight. These practices suggest that approaches to conveying intelligibility—particularly rationales or explanations of why data looks as it does [14]—may have a strong impact in the weight space.

WEIGHT TRACKING STUDY

The results of the online reviews study and our expert interviews support our hypothesis that a significant number of consumers have misperceptions about scale accuracy and weight fluctuation. However, we cannot accurately assess people’s understanding of daily weight fluctuation without some standard against which to judge their perceptions. We were unable to find studies of within-day weight fluctuation in the literature (weight change is typically studied between days). Furthermore, consultations with physicians and dieticians suggested such data could help them allay clients’ concerns, but they were not aware of any studies that had collected it. To begin to fill this gap in the literature, we devised a study to gather data on within-day weight fluctuation. We specifically sought to answer two questions: 1) How much does a person’s weight typically vary during a single day? and 2) How much do weighing conditions like clothes or the scale used affect weight measurements? Both of these questions inform our hypotheses that single-point, context-free measurements overlook important aspects of weight management and that consumers place undue emphasis on numerical precision in weight measurements.


Component        Effect (lbs)   SD     Test statistic
(overall)                               F(2,641) = 31.32
partially        0.85           0.30    t(641) = 2.81
fully            2.17           0.28    t(641) = 7.71
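If, as the answer options in Figure 1 (fully, partially, not) and the study’s question about clothes suggest, the "partially" and "fully" rows are clothing levels estimated against an unclothed baseline, their effects (0.85 lbs and 2.17 lbs) could be used to put weigh-ins taken under different conditions on a common footing. The Python sketch below makes that assumption explicit; the dictionary and function names are hypothetical, and the effects are group-level estimates from this study’s participants rather than values calibrated for any one person.

```python
# Hypothetical offsets, assuming the "partially"/"fully" effects are clothing
# levels measured relative to weighing unclothed ("not").
CLOTHING_EFFECT_LBS = {"not": 0.0, "partially": 0.85, "fully": 2.17}


def clothing_adjusted(weight_lbs: float, clothing: str) -> float:
    """Return an unclothed-equivalent weight by subtracting the estimated clothing effect."""
    if clothing not in CLOTHING_EFFECT_LBS:
        raise ValueError(f"unknown clothing level: {clothing!r}")
    return weight_lbs - CLOTHING_EFFECT_LBS[clothing]


# Made-up readings: morning weigh-in unclothed vs. evening weigh-in fully clothed.
morning = clothing_adjusted(150.0, "not")
evening = clothing_adjusted(152.4, "fully")
print(round(152.4 - 150.0, 2))      # 2.4  (raw difference)
print(round(evening - morning, 2))  # 0.23 (after removing the clothing effect)
```

Under these assumptions, a raw gap larger than the 1–2 lb changes that experts describe clients overreacting to shrinks to about a quarter pound once clothing alone is accounted for; fluctuation from food, water, and the scale used would remain.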
