Estimating User Interruptability using Contextual Parameters and Machine-Learning Algorithms


JOHAN HESSELBERG

Master’s Degree Project Stockholm, Sweden 2005

TRITA-NA-E05080

Numerical Analysis and Computer Science, KTH, 100 44 Stockholm

Department of Numerical Analysis and Computer Science Royal Institute of Technology SE-100 44 Stockholm, Sweden

Master’s Thesis in Computer Science (20 credits) at the School of Computer Science and Engineering, Royal Institute of Technology, year 2005
Supervisor at Nada was Anders Lansner
Examiner was Anders Lansner

Abstract

As devices, computer applications, and services have become increasingly ubiquitous in diverse and ever-changing environments, a higher demand has been put on the user to manage and use them efficiently. To facilitate these tasks, we can introduce awareness of some areas of user context into these devices and let them do more of the work for us. However, there is currently no good understanding of which aspects of our context are useful for inferring information about us, and such understanding is essential if the awareness is to lead to appropriate action. This Master’s thesis takes one step towards a better understanding of one relationship between context and user information: the relationship between context and interruptability. To explore this relationship, a prototype information system was implemented to continuously sample context and activity data from different sources, including a GPS receiver, a microphone, a sensor for measuring body-metrics, and SMTP and IMAP message traffic. The prototype system was then used in a one-person user study, in which context data, together with the interruptability of the user, was collected over a two-week period. Immediate results from the study included the identification of issues related to the usability of the information system. These results showed that the system, with its current hardware, was not well suited for continuous use over extended periods of time. More interesting results were uncovered in the analysis of the context data collected during the study. Four machine-learning algorithms (1R, C4.5, KNN and a Bayesian network) were chosen as analysis tools, and their prediction performance on the data was measured. The prediction performance of the algorithms showed that some of the contextual parameters provide a very good foundation for estimating user interruptability. In particular, the time (the hour and the day of the week) and user motion proved valuable for achieving accurate predictions. The report concludes with some proposed methods for optimizing the usability of the information system, followed by a discussion of the promising results of the analysis, with an emphasis on what may have led to them.

Estimating user availability using contextual parameters and machine-learning algorithms

A study of the relationship between context and user state through the implementation and evaluation of an information system

Summary

With the growth and spread of devices, computer applications and services in diverse and ever-changing environments, greater demands have been placed on the user’s ability to manage and use them efficiently. By introducing awareness of part of the user’s context into these services, applications and devices, they can make management and use easier by doing some of the work themselves. However, there is currently no good understanding of which aspects of our context are useful for predicting information about us, an understanding that is fundamental if awareness is to lead to action. This Master’s thesis takes a step towards a better understanding of one relationship between context and user information: the one between context and availability. To investigate this relationship, an information system was built that regularly sampled context and activity data from various sources, including a GPS receiver, a microphone, a sensor for measuring body-metrics, and SMTP and IMAP message traffic. The prototype system was then used in a one-person study in which context data, together with the user’s availability, was collected over two weeks. Immediate results from this study were the identification of problems related to the usability of the system, which showed that, with its current hardware, it was not well suited for regular use over longer periods of time. Results of greater interest came from the analysis of the collected context data. Four machine-learning algorithms (1R, C4.5, KNN and the Bayesian network) were chosen as analysis tools, and their performance in predicting availability from the context data was measured. The predictive ability of the algorithms showed that some contextual parameters provide a very good basis for estimating availability. In particular, the time (hour and day of the week) and the motion of the user proved valuable for obtaining good predictions. The report concludes with some suggestions for how the usability of the information system can be improved, followed by a discussion of the good analysis results with an emphasis on what may have caused them.

Foreword

This Master’s thesis details my Master’s project in Computer Science, which I performed at the Department of Numerical Analysis and Computer Science (NADA) at the Royal Institute of Technology in Stockholm, Sweden, during the period August 2004 to January 2005. The research on which this report is based was initiated by, and undertaken on behalf of, the Swedish Institute of Computer Science (SICS) in Uppsala, as part of ongoing research in context-awareness. The project commissioner and supervisor at SICS was Ph. Lic. Markus Bylund.

Contents

1 Introduction
1.1 Purpose
1.2 Problem description
1.3 Method
1.4 Report disposition

2 Background and related work
2.1 The context construct
2.2 The problem with ambiguity
2.3 Privacy concerns
2.4 Methods for studying context
2.4.1 Prototyping
2.4.2 Ethnomethodological studies
2.5 Architectures and design for context management
2.5.1 The widget model
2.5.2 The infrastructure approach
2.5.3 The blackboard architecture
2.5.4 Further design issues
2.6 Concerns with digitally representing context

3 The information system
3.1 Requirements
3.2 Non-functional concerns
3.2.1 Usability and intelligibility
3.2.2 Accountability
3.2.3 Privacy
3.3 A conceptual model
3.4 Hardware components
3.4.1 Mobile PC
3.4.2 Body-metric sensors
3.4.3 GPS receiver
3.4.4 Mobile phone
3.4.5 Microphone
3.5 Implementation
3.5.1 Discoverers
3.5.2 Interpreters
3.5.3 Widgets
3.5.4 Servers
3.5.5 Translators
3.6 Technical difficulties

4 The study
4.1 Delimitations
4.2 Execution
4.3 Findings

5 Analysis and results
5.1 Semantical issues
5.1.1 Storage format
5.1.2 Parameter selection
5.1.3 Constructed parameters
5.1.4 Ambiguities in position
5.1.5 Momentous events
5.1.6 Many-to-one relationships
5.2 Preconditions
5.3 Machine-learning algorithms
5.3.1 Classification rule (1R)
5.3.2 Decision tree (C4.5)
5.3.3 K-nearest neighbors (KNN)
5.3.4 Bayesian network
5.4 The sampled data
5.5 Results
5.5.1 Prediction accuracy
5.5.2 Essential contextual parameters

6 Conclusions and recommendations
6.1 Proposals for the information system
6.2 Conclusions from the data analysis
6.3 Report summary

References

A Detailed prediction results
A.1 Accuracy measures
A.2 Without body-metrics
A.3 Using body-metrics

Chapter 1

Introduction

The typical desktop computer user is no more. Instead, computers have become more and more ubiquitous in our environment, some even following us wherever we go. The computing devices of today live in all kinds of appliances and utilities, such as refrigerators, elevators and cars, as well as in the personal technology we carry around, such as PDAs, cell phones and watches. These devices are diverse in shape and intended use, as well as in the types of interfaces they offer. They are also used in a number of different environments and ways. All this can be seen as a step towards Mark Weiser’s vision of “the third wave of computing”, in which every person would use many different computers that would be ubiquitously embedded in the environment and used without requiring our direct attention[23]. However, even though computing devices are becoming more and more prevalent, and are used in new situations not suitable for traditional desktop PCs, they still lack the ability to operate successfully in highly dynamic environments. Most notably, they lack awareness of the situation and environment in which they perform their tasks. By being aware of the context and activity of the user, applications and systems can improve the user experience by performing some actions on the user’s behalf. Research in the area of context-awareness aims at systems and services that can adapt to factors that are not part of the internal state of the device. Unfortunately, we do not currently know what kind of contextual information is good for describing and predicting the activity and needs of a user, nor do we know of a good way of finding or deducing this information. One way of coming to terms with this is to explore possible methods of collecting contextual data and to estimate their value by determining which methods yield a solid basis for establishing relationships between context and information about the user. This report explores one such method.


1.1 Purpose

The purpose of the project was to evaluate a method for finding useful contextual parameters in a limited domain that can predict user actions and needs within that domain. The method in question was to implement and use an information system with the ability to acquire, aggregate and store context and activity data from several different sources. The suitability of the method would then be determined based on the usability of the system and the quality of the data it collected.

1.2 Problem description

The problem was to address the lack of knowledge about which contextual parameters (also referred to as attributes), and which representations of them, are useful for inferring information about the user or the context of the user. This information could potentially describe the needs, activity or state of the user¹. Together with the purpose of the project (previous section), this led to the following statement:

Problem statement: Can an information system be a useful tool for finding good contextual parameters, such that they can be used to infer information about the user or the context of the user? If so, which contextual parameters (of those evaluated) are important for this deduction?

In order to learn more about this, an information system had to be implemented. The system had to be able to collect context and activity information from a number of sources (devices, sensors, software etc.) and to store this information in a format that facilitated analysis. The system would then be used and evaluated for its efficacy in determining information about the user through:

1. analyses of the data collected by the system, to determine which dimensions of the data are most important, and
2. observations on how seamlessly (with respect to the other activities of the user) the actual data collection was carried out.

In particular, the analyses would then be used as a foundation for iteratively improving the system further.
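The first of these analyses, finding which dimensions of the data matter most, can be illustrated with 1R, one of the four algorithms named in the abstract. 1R builds, for each attribute separately, a rule that maps every attribute value to the label most often seen with it; attributes can then be ranked by how accurate their rules are. The following is a minimal Python sketch under stated assumptions: attributes are nominal, accuracy is measured on the training data only, and the sample records, attribute names and labels are invented for illustration. It is not the analysis code used in this project.

```python
from collections import Counter, defaultdict


def one_r_rule(records, attribute, label="label"):
    """Build a 1R rule for a single nominal attribute.

    For each value of the attribute, the rule predicts the label seen most
    often with that value. Returns (rule, accuracy), where accuracy is
    measured on the same records (training accuracy only).
    """
    counts = defaultdict(Counter)
    for record in records:
        counts[record[attribute]][record[label]] += 1
    rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    correct = sum(c[rule[value]] for value, c in counts.items())
    return rule, correct / len(records)


def rank_attributes(records, attributes, label="label"):
    """Rank attributes by the accuracy of their 1R rules, best first."""
    scored = [(a, one_r_rule(records, a, label)[1]) for a in attributes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented samples with hypothetical attribute names and labels.
samples = [
    {"time_of_day": "morning", "moving": "yes", "label": "interruptible"},
    {"time_of_day": "morning", "moving": "no", "label": "busy"},
    {"time_of_day": "evening", "moving": "no", "label": "busy"},
    {"time_of_day": "evening", "moving": "yes", "label": "interruptible"},
    {"time_of_day": "morning", "moving": "no", "label": "busy"},
]

print(rank_attributes(samples, ["time_of_day", "moving"]))
# -> [('moving', 1.0), ('time_of_day', 0.6)]
```

Because 1R considers one attribute at a time, it gives a direct, if crude, answer to which single dimension carries the most information; in practice the ranking would be computed with cross-validation rather than on the training records.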

1.3 Method

The design, implementation and evaluation of the information system described later in this report have been conducted using an iterative development and analysis process. Furthermore, it has been stressed that the system should (when possible) be evaluated on the system level, taking the performance of the system as a whole into account. The process has included four steps, each of which could affect how to proceed, i.e., whether to move to the next step or return to any of the previous ones. These four steps were to:

1. Perform conceptual studies based on current and related literature.
2. Identify (or revise) the requirements of the information system and create a conceptual model adhering to the requirements.
3. Implement a prototype and perform function testing.
4. Evaluate the prototype (through usage) with regard to some aspect.

Results from each of the above steps are described in this report, including problems that were encountered during the process and the results of the evaluation reached after the final iteration.

¹ Exactly what kind of information was of interest was not determined until later.

1.4 Report disposition

Chapter 2 provides background on context-awareness and, in particular, on work relevant and related to this project. Chapter 3 details the information system that was a constitutive part of the project for reaching the conclusions presented in this report. The chapter describes the implementation of the system as well as the hardware it uses for collecting contextual data. Chapter 4 outlines the study that was performed using the system, how the system was used, and some usage issues that were identified. Chapter 5 describes several issues related to data representation and interpretation that were discovered during the development iterations. Also presented are the results of the analysis of the context data collected during the study, and the machine-learning algorithms that were used to estimate the user interruptability. Chapter 6 concludes the report by proposing some improvements to the information system and by discussing the prediction results from the previous chapter. Finally, a short report summary ends this Master’s thesis.


Chapter 2

Background and related work

The field of context-awareness, as it applies to information systems, has developed together with what is called ubiquitous computing. According to Mark Weiser, ubiquitous computing is the third wave of computing[23]. The first wave was the mainframe era, when many users shared each computer system. In the second wave there was a relative balance of users to computers, i.e., each computer had one user. The current wave, the one Weiser foresaw, is an era in which computer systems are becoming ubiquitous and we all interact with hundreds of different computing devices on a daily basis. While ubiquitous computing aims at always providing computational power, applications and services to a user, the field of context-awareness attempts to assist the user in seeking advice from these services and applications, which includes finding the ones the user needs in a particular situation. However, the drive to embed awareness in our environment has encountered a number of difficulties, including concerns about user privacy, resulting in a sober view of how context-aware applications and services should interact with their users. This chapter covers some of these difficulties and concerns. It also describes previous work in the area of context-awareness related to this project and introduces some terminology that is used later in this report. But first, let us define context.

2.1 The context construct

In computing, the term context carries a very general meaning, and much has therefore been written about context in attempts to define, divide, categorize and enumerate its significance. The reason is that without a common definition of context, we will have a hard time comparing research and work across different communities, each with its own understanding of context[25]. Also, to be able to say that our computing devices and environment have achieved context-awareness, we need something to evaluate them against, which requires a common terminology for what context is and what it is not. One attempt, which has become a widely referenced operational definition, is that of Dey, Salber and Abowd, which states:

Context: any information that can be used to characterize the situation of entities (i.e. whether a person, place or object) that are considered relevant to the interaction between a user and an application, including the user and the application themselves. Context is typically the location, identity and state of people, groups and computational and physical objects.[5]

Even though this definition includes most of what a layperson would normally call context, other views of context have also been proposed. Lucas argues that there is not one kind of context, but that it should be divided into three different types to ease analysis[15]:

• The physical context, which refers to physical objects, for example: position, location, and proximity.
• The device context, which denotes the state of devices and their interrelationship with other devices.
• The information context, which is the domain where computation takes place, meaning that (digital) information is independent of the computers, networks and storage media that house it.

In other words, Lucas’ view differs from Dey et al.’s definition in that he distinguishes between the states of objects and the information related to or contained within them. Others oppose the above views on the grounds that they interpret context as something static that can be specified and enumerated beforehand. Greenberg argues that context, instead, is a dynamic, ever-changing construct that resists being categorized a priori in any but the simplest cases[9]. The reasons are threefold. Firstly, human actions are a mix of prepared plans and momentous actions resulting from the current situation. Secondly, activity comprises a subject (the performer), an object (the motivation) and operations on these entities; this activity defines the context. The activity changes in response to changing conditions, thereby making the context ever-changing. Thirdly, people belong to one or more locales, or social worlds, at any given time. A locale denotes a context, not necessarily physical, that is relevant to the user at the current time. A user can belong to many locales at the same time and selectively attend to them, so the context changes with changes of locale.

A somewhat different view is described by Philip Agre, who states that context is the union of two aspects, an architectural one and an institutional one[2]. The institutional aspect denotes the persistent types of human relationships, or the set of social roles and rules practiced within a relationship, e.g., languages, the exchange of currency or a system of law. It can also be said to include practices, or the set of routines that people in a community have evolved for doing things, such as greeting an acquaintance or going to a social function. The architectural aspect, on the other hand, denotes the man-made physical environment that people occupy, i.e., structures such as buildings, doors, cars or keyboards. Evidently, what we mean by context and which aspects of it can be captured are open to debate. However, to resolve the ambiguity of context for this report, Dey et al.’s definition, as quoted above, will be used throughout.

2.2 The problem with ambiguity

It is not only difficult to define context; even when we have a definition, there are still inherent ambiguities in what we mean by certain contextual attributes. Take, for instance, the ambiguities inherent in any context data categorized according to Lucas’ three types of context[15] (defined in the previous section)¹. First, we have ambiguities in the physical context. Position can be expressed in a number of different coordinate systems, each optimized for some specific use, so we need to know which one is used. Moreover, how we specify location is open to interpretation. For instance, what do we mean when we use “Stockholm” as a location designation? The capital of Sweden, the county it lies in, the tiny village near Olofström in southern Sweden, or maybe one of the at least six places in the US called Stockholm? If we were to use position and location concurrently, the problem remains: what exact point would we say specifies Stockholm?

Second, there are ambiguities in the device context, particularly in relation to device discovery: how do we find nearby devices to communicate with? Proximity does not necessarily mean physical proximity. Two devices sitting 1 meter apart that do not share a common communication channel are just as distant from each other as two devices on opposite sides of the globe. However, if they shared a gigabit Ethernet link using a mutually known protocol, they would consider “themselves” to be close to each other. Furthermore, how do we specify the device we want to communicate with among potentially hundreds of other devices? A common name, for instance, cannot really be considered unique enough to identify a specific device unambiguously in an ad-hoc network. What would enforce the uniqueness of device names when the devices do not consistently share the same device context? On the other hand, the unique serial number of a device is too specific and does not convey the intended meaning of, e.g., “my mobile phone”. And what if I have more than one?

Finally, information replication in the current information technology era has only increased the difficulty of specifying which information we are referring to. Pages on the Internet are continually updated and replicated, for instance on mirror servers, in client caches and as printed documents. How do we specify the exact version? In addition, identical information might serve different purposes; a copy of a text document might have been created for a number of reasons: offline editing, as a backup or as inspiration for a new text. This inevitably introduces more ambiguity. All in all, this requires some thought about the “design” of the contextual attributes of the information system of this project in order to decrease the overall ambiguity introduced into the system. Nevertheless, given the scope of the ambiguity, we cannot expect to eliminate it entirely.

¹ As described in this section, all three categories hold uncertainty; however, this is not due to the categorization but is a property of the information itself. We could therefore just as well have used Dey et al.’s definition, but Lucas’ categorization makes some of the elements of context more explicit.
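One way to keep the physical-context ambiguities visible in a data model is to make every position name its coordinate system and to let a location lookup return all matching places instead of silently choosing one. The Python sketch below is a hypothetical illustration, not part of the system described in this report: the `Position` and `Place` types, the `resolve` function and the small gazetteer (built only from the examples above) are assumptions, and the coordinates are an approximate value for central Stockholm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """A position that names its coordinate reference system explicitly,
    so two numbers are never interpreted in the wrong system."""
    latitude: float
    longitude: float
    crs: str  # e.g. "WGS84", as reported by the source


@dataclass(frozen=True)
class Place:
    label: str
    description: str


# Hypothetical gazetteer built only from the examples in the text above.
GAZETTEER = [
    Place("Stockholm", "capital of Sweden"),
    Place("Stockholm", "county in which the capital lies"),
    Place("Stockholm", "village near Olofström, southern Sweden"),
    Place("Stockholm", "one of several places in the US"),
]


def resolve(label, gazetteer=GAZETTEER):
    """Return every place matching a location label.

    Instead of silently picking one, all candidates are returned so the
    caller must disambiguate using further context.
    """
    return [place for place in gazetteer if place.label == label]


reading = Position(59.33, 18.07, crs="WGS84")  # approximate, central Stockholm
candidates = resolve("Stockholm")
print(f"{reading} could be labelled 'Stockholm', which matches {len(candidates)} places")
```

The design choice here is to push the ambiguity to the caller: the lookup never guesses, so any decision about which “Stockholm” is meant has to be made explicitly, using further context.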

2.3 Privacy concerns

Capturing context inevitably introduces privacy concerns, in particular for those whose contexts are captured. Paparazzi photographers, surveillance cameras, web browser caches and credit card receipts are just a few examples of entities that contain or provide captured context data. This can result in desituated action, by which the current action of a user is recorded and possibly made globally accessible. This may have far-reaching consequences for anyone, as the record can come back to haunt us tomorrow[10]. Giving our environment more extensive context-awareness than it has today, a step that is necessary if we are to improve the user experience by having our devices perform tasks for us, only increases these concerns. Questions related to privacy and relevant to any individual are: Which elements of my current context are being captured, and is this happening right now? How do I know what the captured context data will be used for, and who has access to it? Ackerman observed that these questions are in turn bound up with control, i.e., “who controls what information as well as the applications and systems that construct and disseminate that information”[1]. As a step towards ensuring privacy, he proposes that context-aware applications should adhere to the US Federal Trade Commission’s privacy guidelines, which are:

• Notice: The individual should know what type of information is collected and used, and what third parties other than the collectors have access to it.
• Choice: The individual should be able to choose not to have data collected.
• Access: The individual should have access to data collected about him, and be able to correct information and delete the information he wants.
• Security: Measures should be taken to secure collected data from unauthorized access.

Still, these recommendations do not solve the underlying issue: one person’s context-awareness is another person’s lack of privacy. For this work, the discussion of privacy raises questions about how the system will be used and what emphasis its use will put on privacy. A motivation for design decisions affecting privacy will therefore be provided (see section 3.2.3).
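The four guidelines can be read as constraints on how a context store behaves. The Python sketch below is a hypothetical illustration, not part of the system described in this report: the `ContextStore` class, its methods and the `grants` mechanism are assumptions. It covers deletion but not correction under Access, and a real store would also need authentication and protected storage to meet the Security guideline.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical per-person context store shaped by the four FTC guidelines."""
    subject: str
    opted_out: set = field(default_factory=set)   # Choice
    grants: set = field(default_factory=set)      # third parties allowed to read
    records: list = field(default_factory=list)

    def collect(self, attribute, value):
        """Choice: nothing is recorded for an attribute the subject opted out of."""
        if attribute in self.opted_out:
            return False
        self.records.append((attribute, value))
        return True

    def notice(self):
        """Notice: which attribute types are held, and who else may read them."""
        return {
            "collected": sorted({attribute for attribute, _ in self.records}),
            "third_parties": sorted(self.grants),
        }

    def read(self, principal):
        """Access and Security: only the subject or a granted party may read."""
        if principal != self.subject and principal not in self.grants:
            raise PermissionError(f"{principal} may not read {self.subject}'s context")
        return list(self.records)

    def delete(self, principal, attribute):
        """Access: the subject may delete everything stored for an attribute."""
        if principal != self.subject:
            raise PermissionError("only the subject may delete records")
        self.records = [(a, v) for a, v in self.records if a != attribute]


store = ContextStore("alice", opted_out={"microphone"})
store.collect("gps", (59.33, 18.07))
store.collect("microphone", b"raw audio")   # refused: opted out
print(store.notice())   # {'collected': ['gps'], 'third_parties': []}
store.delete("alice", "gps")
```

Even in this form the sketch shows the limitation noted above: the store protects the subject’s records, but says nothing about people who appear in someone else’s captured context.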

2.4 Methods for studying context

Current research methods aimed at studying the relationship between context and activity can be categorized into two types: prototyping and ethnomethodological studies.

2.4.1 Prototyping

Prototyping is fundamentally the design and assembly of a working model of a system or application in order to discover design and implementation flaws as well as to assess the suitability of the system for its intended purpose. A number of works in or related to context-awareness using this method have been published. One of the first was by Mark Weiser, who in the early 1990s, together with researchers at Xerox, built three location-aware devices of different sizes to aid in note-taking[24]. The prototyping uncovered a number of issues that had to be resolved, and showed that new types of components (not yet available at the time) were necessary for the devices themselves to work properly. The explicit results from this study are not of great interest to this report, but Weiser showed the feasibility of using prototyping to uncover difficulties with ubiquitous and location-aware devices for personal use. More recent works include:

• SenSay, a context-aware mobile phone that, using data from sensors (body-metric as well as environmental) and the electronic calendar of the user, tries to determine the state of the user and act appropriately (e.g. turning the ringer on or switching it to vibrate).[21]
• Coordinate, a service for predicting user availability. The service infers availability by creating a user model from past desktop activity together with user annotations of interruptability for past events in the calendar of the user. The model then uses data about currently scheduled meetings and events to offer a prediction of the current availability.[14]
• Awarenex, an activity-aware multi-platform instant messaging application. The application uses current and recorded data about the computer, e-mail, IM and phone activity of an individual to present the current activity of the user (e.g. on the phone or at the computer), as well as idle time (time without activity), to other users.[22]
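The idea behind Coordinate, learning availability from a user’s own annotations of past events, can be illustrated with a deliberately simple frequency model. The Python sketch below is a hypothetical illustration and does not reproduce Coordinate’s own modelling: it counts how often the user was annotated as available in each situation and applies add-one (Laplace) smoothing, with hour of day and whether a meeting is scheduled as assumed features.

```python
class AvailabilityModel:
    """Estimate P(available | hour, in_meeting) from past annotated events.

    A simple frequency model with Laplace (add-one) smoothing. The two
    features are hypothetical; Coordinate's own models are not reproduced.
    """

    def __init__(self):
        self.available = {}  # (hour, in_meeting) -> times annotated available
        self.total = {}      # (hour, in_meeting) -> times observed

    def observe(self, hour, in_meeting, was_available):
        key = (hour, in_meeting)
        self.total[key] = self.total.get(key, 0) + 1
        self.available[key] = self.available.get(key, 0) + int(was_available)

    def probability(self, hour, in_meeting):
        key = (hour, in_meeting)
        # Add-one smoothing: a situation never seen before yields 0.5.
        return (self.available.get(key, 0) + 1) / (self.total.get(key, 0) + 2)


model = AvailabilityModel()
for _ in range(3):
    model.observe(10, True, False)   # three 10 o'clock meetings, annotated busy
model.observe(10, False, True)       # one free 10 o'clock slot, annotated available
print(model.probability(10, True))   # (0 + 1) / (3 + 2) = 0.2
```

The smoothing keeps the model from asserting certainty after only a handful of annotations, which matters when, as in Coordinate, the training data comes from a single user’s history.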

2.4.2 Ethnomethodological studies

The other type of research employed for studying context is ethnomethodological studies. Ethnomethodology is, according to Dourish, concerned with social action in general, and its approach to examining how orderliness of action arises from within has been applied to a wide range of forms of social action, from the conduct of science to crossing the street[7]. These studies are performed through either passive or participant observation.

2.5 Architectures and design for context management

Even though context-aware systems are in their infancy, work on prototype implementations and identification of design issues for context-aware systems has been conducted. At least three different types of conceptual models for collecting, using and reacting to variations in context have been proposed: The widget model, the infrastructure approach and the blackboard architecture. Common to all of them is that each model inherently supports distribution, because context-aware applications often require being aware of a number of places and of what occurs at each place.

2.5.1 The widget model

A widget-based model is described by Dey, Salber, and Abowd in their paper A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications[5]. The model draws an analogy to graphical user interfaces and the widgets that compose them. Each widget has a specific task; a GUI button, for example, notifies the application that it has been pressed. Widgets have a similar task in the widget model: to notify applications or other components of changes in context, or to provide the latest context data when queried. This gives a clear separation of concerns: the application developer does not need to be concerned with how the context data was gathered. In addition to the widgets, the model includes components for context interpretation, aggregation, and service execution. The complete model includes the following components:
• Context widgets, which capture raw context data through interaction with sensors and services in the environment.
• Interpreters, which transform context captured by one or more widgets into another, more useful abstraction.
• Aggregators, which collect related information from the widgets and make it available to applications.
• Services, which execute actions on behalf of applications.
• Discoverers, which store information about the other components, so that components can find one another by querying a discoverer.
Dey et al. have also implemented the model in Java to aid developers of context-aware applications. This work produced The Context Toolkit, a freely available open source Java API. The Java implementation follows the conceptual model except on two points: aggregators are called servers, and the context widgets include the functionality of services, i.e., there are no stand-alone services in the implementation.
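To make the division of labor concrete, the following Python sketch mimics the push/pull behavior of a context widget and of an aggregator that subscribes to several widgets and passes each update through an interpreter. It is not the Context Toolkit API, which is Java-based and networked. All names here are hypothetical illustrations (`ContextWidget`, `sensor_reading`, `Aggregator`, `comfort_interpreter`), and everything runs in a single process.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Callback = Callable[[Dict[str, Any]], None]


@dataclass
class ContextWidget:
    """Hypothetical widget: hides one sensor behind subscribe/query calls."""
    name: str
    latest: Dict[str, Any] = field(default_factory=dict)
    subscribers: List[Callback] = field(default_factory=list)

    def subscribe(self, callback: Callback) -> None:
        self.subscribers.append(callback)

    def query(self) -> Dict[str, Any]:
        # Pull style: return a copy of the most recent context data.
        return dict(self.latest)

    def sensor_reading(self, **attributes: Any) -> None:
        # Called by a (hypothetical) sensor driver; pushes to all subscribers.
        self.latest = dict(attributes)
        for callback in self.subscribers:
            callback(dict(attributes))


class Aggregator:
    """Hypothetical aggregator: merges related context from several widgets,
    passing each update through an interpreter first."""

    def __init__(self, *widgets: ContextWidget,
                 interpreter: Callable[[Dict[str, Any]], Dict[str, Any]] = dict):
        self.state: Dict[str, Any] = {}
        self.interpreter = interpreter
        for widget in widgets:
            widget.subscribe(self._on_update)

    def _on_update(self, data: Dict[str, Any]) -> None:
        self.state.update(self.interpreter(data))


def comfort_interpreter(data: Dict[str, Any]) -> Dict[str, Any]:
    # Example interpreter: add a higher-level label next to the raw value.
    out = dict(data)
    if "celsius" in data:
        out["comfortable"] = 19.0 <= data["celsius"] <= 24.0
    return out


thermometer = ContextWidget("thermometer")
microphone = ContextWidget("microphone")
room = Aggregator(thermometer, microphone, interpreter=comfort_interpreter)
thermometer.sensor_reading(celsius=21.5)
microphone.sensor_reading(level_db=-40.0)
print(room.state)  # {'celsius': 21.5, 'comfortable': True, 'level_db': -40.0}
```

The application only ever sees `room.state`. It never touches the sensors, which is the separation of concerns the widget model is after.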

2.5.2 The infrastructure approach

Another way of integrating components to achieve context-awareness has been proposed by Hong and Landay. They argue that a service infrastructure made up of devices and sensors placed throughout the environment, to which applications can connect to access data, has a number of advantages over the widget model[12]. The advantages are threefold:
1. Independence from operating systems, programming languages and hardware.
2. Improved capabilities for maintenance and evolution. Sensors can be removed and replaced without affecting the services and applications that use them. (Dey et al.’s The Context Toolkit has some of this, but may need one more layer between sensors and widgets to eliminate the one-to-one mapping between them.)
3. Sharing of sensors, processing power, data and services. Devices can be made simpler because they do not need to contain top-performance processors, large storage media, etc.
Using an infrastructure clearly has its advantages compared to running services on stand-alone devices. To build a working infrastructure, however, a number of problems need to be tackled:
• Define standard data formats and protocols. These should be rich enough to represent most sensor data, yet simple enough for any device to use.
• Design services. Here Hong and Landay propose two backbone services that could be useful to applications: automatic path creation and proximity-based discovery. Automatic path creation refers to the ability of the infrastructure to pipeline services automatically to serve a request, which an application could issue through a query language. Proximity-based discovery refers to a method by which applications can recognize which sensors are nearby; the open problems here are logistical, representational (how to describe proximity) and related to storage.
• Delegate responsibilities. What is the best balance of tasks between the infrastructure and the devices?
• Determine the access and scope of sensor data. Who should have access to data, and how can one ensure that only the right people get access? Furthermore, individuals should be able to retrieve information about the data stored about them.
• Scaling. The sheer number of sensors, services and devices involved may pose considerable engineering challenges when building the infrastructure.
Due to the immense effort required to build the infrastructure, a full-scale implementation of this approach will not be feasible in the near future. Nevertheless, it is an interesting approach that could possibly be implemented in a very limited environment, e.g., a lab.
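Automatic path creation can be read as a search problem. If each service declares which data type it consumes and which it produces, the infrastructure can chain services to turn the data it has into the data a request asks for. The Python sketch below follows that reading and uses breadth-first search to find the shortest chain. It is one possible interpretation, not an algorithm prescribed by Hong and Landay, and the service names and type labels in `SERVICES` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical services: name -> (input data type, output data type).
SERVICES: Dict[str, Tuple[str, str]] = {
    "gps_to_coords": ("nmea", "coordinates"),
    "coords_to_place": ("coordinates", "place"),
    "place_to_activity": ("place", "activity"),
    "cell_to_coords": ("cell_id", "coordinates"),
}


def create_path(available: str, wanted: str,
                services: Dict[str, Tuple[str, str]] = SERVICES) -> Optional[List[str]]:
    """Breadth-first search for the shortest pipeline of services that turns
    data of type `available` into type `wanted`; None if no pipeline exists."""
    queue = deque([(available, [])])
    seen = {available}
    while queue:
        current, path = queue.popleft()
        if current == wanted:
            return path
        for name, (src, dst) in services.items():
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [name]))
    return None


print(create_path("nmea", "activity"))
# ['gps_to_coords', 'coords_to_place', 'place_to_activity']
print(create_path("cell_id", "place"))
# ['cell_to_coords', 'coords_to_place']
```

Breadth-first search returns the pipeline with the fewest services. A real infrastructure might instead weigh services by cost or latency.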

2.5.3 The blackboard architecture

The blackboard architecture, proposed by Winograd for use in context-aware services, takes a data-centric rather than a process-centric point of view (the latter being the view taken by the widget and infrastructure models)[25]. In this architecture, processes post messages to a common shared “board” (residing on a central server) rather than sending requests directly to components that then communicate back via callbacks. Processes that want this data give the board message patterns describing the kind of information they are interested in. Whenever a message matching such a pattern arrives at the blackboard, the board forwards it to the components that registered the pattern. This model has both advantages and disadvantages compared to one or both of the other conceptual models. According to Winograd, the advantages include:
1. Robustness. The loose coupling between components (with the exception of the blackboard itself, see below) lets the system continue executing even if one component has gone down. This is also true for the infrastructure model, but not for the widget model.
2. Configurability, or ease of configuration, which also follows from the loose coupling between components. Subscriptions and components can be added or removed without affecting the system in a way that makes a restart necessary. In contrast, the widget model uses a tighter coupling between its components, which makes configuration more complex.
3. Simplicity, originating from the uniformity of the communication paths in the model. This makes the blackboard approach inherently simpler than the other two, whose communication involves multiple types of messages, e.g., request and callback pairs.
A major disadvantage of the model is its dependence on the blackboard service. The blackboard resides on a central server and all messages pass through it, so the robustness of the system relies heavily on the robustness of the blackboard itself. Another caveat is efficiency. Because of the communication with the blackboard and the pattern matching at the board, message passing is not as fast as peer-to-peer communication would be. However, Winograd argues that “given today’s networking and processor speeds, efficiency is not the bottleneck in many cases”[25].
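The pattern-based forwarding described above can be sketched in a few lines of Python. In this minimal, single-process illustration, messages are dictionaries. A pattern maps attribute names either to a required value or to a predicate. The board forwards each posted message to every subscriber whose pattern matches. The `Blackboard` class and the message attributes are hypothetical; Winograd's board runs as a central server, which this sketch does not model.

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]


class Blackboard:
    """Hypothetical in-process blackboard: publishers post messages, and the
    board forwards each one to the subscribers whose pattern matches it."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Dict[str, Any], Callable[[Message], None]]] = []

    def subscribe(self, pattern: Dict[str, Any],
                  callback: Callable[[Message], None]) -> None:
        self._subscriptions.append((pattern, callback))

    @staticmethod
    def _matches(pattern: Dict[str, Any], message: Message) -> bool:
        for key, expected in pattern.items():
            if key not in message:
                return False
            value = message[key]
            if callable(expected):          # predicate, e.g. lambda db: db > 70
                if not expected(value):
                    return False
            elif value != expected:         # literal value must be equal
                return False
        return True

    def post(self, message: Message) -> int:
        """Forward `message` to all matching subscribers; return how many."""
        delivered = 0
        for pattern, callback in self._subscriptions:
            if self._matches(pattern, message):
                callback(message)
                delivered += 1
        return delivered


board = Blackboard()
loud_events: List[Message] = []
all_audio: List[Message] = []
board.subscribe({"type": "audio", "level_db": lambda db: db > 70}, loud_events.append)
board.subscribe({"type": "audio"}, all_audio.append)

board.post({"type": "audio", "level_db": 45})
board.post({"type": "audio", "level_db": 82})
board.post({"type": "gps", "lat": 59.35})
print(loud_events)     # [{'type': 'audio', 'level_db': 82}]
print(len(all_audio))  # 2
```

Publishers never learn who, if anyone, received a message. This is the loose coupling behind the robustness and configurability arguments above.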

2.5.4 Further design issues

Besides the above architecture proposals for building context-aware systems, general design practices for context-aware applications have been put forward. These practices are partly based on the observation that human and social aspects of context are very unlikely to be representable in a way that lets machines take (correct) autonomous action on behalf of a user. Also, when a machine does take action, someone has to be responsible for whatever events it causes. Hence, it is essential that the application, device, or machine gives the user insight into its state and into its assumptions about the current activity and context. According to Bellotti and Edwards, the desired properties are[3]:
• Intelligibility – systems should be able to inform the user what they know, how they know it and what they are doing about it.
• Accountability – systems must enforce user accountability when they try to mediate user actions that affect others.
To help designers integrate these two properties into their systems, Bellotti and Edwards give four design principles that a system should follow:
1. The system should inform the user of its current capabilities and understandings, for instance, what kind of situation the system thinks the user is in.
2. The system should provide feedback, such as feed-forward, in-process feedback and confirmation. The user should also be told whether the system captures context information about them, what will happen to the captured information, who has access to it and for what purposes it will be used.
3. The system should enforce identity and action disclosure, so that users are accountable for their own actions and can perceive the actions and intentions of others.
4. The system should give the user control, so that the user can easily override the decisions of the system when they are wrong.
Considering these principles in light of the intended use of the system in this project, we can at least say that intelligibility is a property worth striving for, since it allows the user to take appropriate action on the system when needed. Accountability might be less significant due to the single-user scope of the system; this is discussed in section 3.2.2.

2.6 Concerns with digitally representing context

Whatever design methodologies and practices we use when implementing a prototype system for context management, there will always be questions about the completeness, storage and collection of the data itself. Grudin identified a number of issues that arise when we try to capture and use context digitally, a task for which digital representation might not be suited[10]:
• By capturing context, we remove it from its context, namely the context that is not captured. (There is no way of capturing it all.)
• Aggregation and interpretation performed by software differ from aggregation and interpretation performed by biological, social and psychological processes.
• Digital information reaching a network may appear anywhere on the planet at any time in the future (this was called desituated action in section 2.3). That is, anything that is captured instantly achieves potential immortality. Meanwhile, the contextual information that is not captured is cut off from the captured part and will probably never be recalled again.
• The ease of capturing some context, compared to other context attributes, can profoundly change how important we think some context information is. Context information that is captured cheaply or easily will dominate the digital representation and may cause us to fail to realize that other important aspects of context are missing.
• The wish to increase efficiency by capturing more context for services and applications to use may increase the need for capturing even more context. (There will always be occasions when the captured context is insufficient to disambiguate one situation from another.)
In other words, using systems to collect and interpret context data for us comes at a price whose consequences are hard to assess.


Chapter 3

The information system

The information system is the core of this Master’s project. Both the study of contextual parameters and the evaluation of whether an information system is suitable for such a study depended on it. In other words, with a poorly implemented system, the results of this work would probably be of marginal value at best. Unfortunately, implementing a good context-aware system is a nontrivial task. To cite Agre on the topic: "Context" is such an all-embracing term that it’s easy to underestimate the problem of designing a computational device that could be "aware" of it. Some aspects of context are simple ambient parameters of physics — such as temperature or noise levels — and in these cases the matter is not so difficult. Most aspects of context, however, are defined to some extent by the institutions that structure both the ongoing activity and the social relations within which the activity is embedded. [. . . ] A device that cannot participate in this work of social construction will be incapable of registering the most basic aspects of "context".[2] Much can therefore be said about how well the system presented on the following pages fulfills its goals. Since it registers only physical and ambient parameters, it misses, according to Agre, a large and important part of the context. Further, any of the assumptions underlying this work can and should be criticized in the light of good software design, and of the design of aware systems in particular. However, this is not meant as a conclusive investigation of the design and implementation of context-aware systems. It is instead meant to determine whether this implementation is good enough to yield, hopefully, interesting results about its suitability for telling us things about context.
This chapter gives a detailed overview of the system and its original requirements, from the conceptual model to the actual implementation. Hardware and device dependencies are also discussed. Finally, difficulties identified during this work are presented together with the solutions found, if any.

3.1 Requirements

A number of functional requirements were set up to acquire the data needed for good analysis of context information. These included the ability to:
1. continuously collect context information from an unspecified number of sensors and devices (i.e., it should be possible to add sensors to or remove sensors from the system without changing its implementation);
2. continuously collect information about the activity of a user;
3. add context and activity information from logs;
4. extract, save and present information in a way that supports further analysis.
The requirements show that the system itself only needed to be aware of (some of) its context. In contrast to the systems described in section 2.4.1, this system was not required to act on its environment; doing so is the role of other applications that use the estimations this system infers.

3.2 Non-functional concerns

In addition to the above functional requirements, some non-functional concerns needed to be discussed. These originated partially from the material described in chapter 2.

3.2.1 Usability and intelligibility

Usability was not deemed a major concern for this system, since it would be a prototype used by only a limited number of people. It was therefore acceptable that most components needing user intervention, e.g., for start-up, were started by supplying an argument list to the runtime environment. It also seemed important that most components should not require user intervention to operate, but should instead run as independent services to the greatest extent possible. Components at higher layers of the architecture, however, have more functionality and could require direct user control. The user should be able to change aspects of their behavior easily, without reprogramming them for each task they should handle. This control mechanism should be readily accessible and require a minimum of effort from the user. Besides control, intelligibility is also essential: the user cannot make good control decisions without being informed about the status of the system. Some kind of simple GUI was therefore likely to be needed, although no attempt would be made to create a user interface suitable for the broader public. The reason for this decision is the difficulty of arriving at a good user-centered UI design without a thorough understanding of the users[10].

3.2.2 Accountability

As mentioned above, the system was not required to act on its environment, so accountability would be a minor issue. The system's only actions were monitoring its own context, so it could not affect other people. Of course, it could be argued that the mere presence of the system could have an effect on other people that would require its user to be accountable for it. However, as we will see in the next sections, the system would run on a separate computer carried around by the user. This makes it very evident who was accountable for its presence.

3.2.3 Privacy

The system would depend to the highest degree on contextual information collected about a user. Methods for ensuring privacy, or a motivation for not providing them, were therefore relevant. After some thought, it seemed acceptable not to make any explicit attempts to ensure the privacy of the user, since the system per se was intended for research on contextual parameters. It was therefore assumed that the system would be used only by properly informed individuals who know about and approve of the gathering of their contextual and activity data. Nevertheless, future privacy requirements should not be dismissed, and increased privacy control is a possible extension to this system.

3.3 A conceptual model

With the requirements as a starting point, a conceptual model satisfying them was built. Figure 3.1 shows this model. The system was named The Context Explorer. The figure depicts the system as an aggregation layer: data collectors and logs sit on one side, and the inference procedure of a (user-specific) behavior or activity model sits on the other. The data collectors can be sensors, devices or infrastructural systems of any kind that collect contextual information about the user, including pulse meters, thermometers, accelerometers and positioning devices. Sensors may also be able to infer the activity of a user directly, in which case the system should be able to read activity data from them too. For instance, a pressure sensor in the seat of your car reporting a weight similar to yours, together with information that the engine is running, could provide data supporting that you are driving. Another way of providing activity data is through an annotation device or similar, with which the user, or a separate person, annotates the data with activity information. Infrastructural systems, on the other hand, can often provide a user-activity pair together with a timestamp of the action. For instance, a banking system tracking withdrawals and deposits will certainly be able to provide information about the activity of a user, like “making a withdrawal from ATM Y at location Z at time XX:XX”. The data delivery to The Context Explorer does not need to take place in real time. It could also be an I/O operation with a database that contains data for the user over some period. The data itself could be contextual, activity, or both. At the other end of the system are components that want to use the collected data in some way. This could be a graph tool for displaying some dimensions for analysis. More likely, it is a data-mining tool that automatically finds patterns in the actions of the user and uses a machine-learning algorithm to create a behavioral model of the user.
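Because data can arrive both in real time and from logs or databases covering some period, the aggregation layer must put records from many sources into a common time order. A k-way merge does this lazily. The Python sketch below uses the standard library's `heapq.merge` for that step. It assumes each source is already sorted by timestamp; the `Record` type, its fields and the example records are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    timestamp: int   # e.g. seconds since the start of a session (assumed unit)
    source: str      # which collector or log produced the record
    kind: str        # "context" or "activity"
    value: object


def merge_sources(*sources: Iterable[Record]) -> Iterator[Record]:
    """Lazily merge sources that are each sorted by timestamp into a single
    time-ordered stream; sources may be generators reading from a database."""
    return heapq.merge(*sources, key=lambda record: record.timestamp)


gps_stream = [Record(10, "gps", "context", (59.35, 18.07)),
              Record(70, "gps", "context", (59.36, 18.06))]
bank_log = [Record(40, "bank", "activity", "withdrawal from ATM Y")]
annotations = [Record(65, "annotator", "activity", "interruptible")]

merged = list(merge_sources(gps_stream, bank_log, annotations))
print([(r.timestamp, r.source) for r in merged])
# [(10, 'gps'), (40, 'bank'), (65, 'annotator'), (70, 'gps')]
```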

Figure 3.1. System overview


3.4 Hardware components

As stated in section 3.1, the system needed to interact with various hardware components to retrieve its data. The goal of the analysis (5) described in section 1.2 was to determine exactly what data, and therefore which sensors, the system needed in order to later infer information about the user. The sensors were therefore chosen on the basis of availability, cost and data overlap¹. For example, using both a step counter and the body-metric sensor armband (described in section 3.4.2) was deemed redundant, since the armband included a step counter.

3.4.1 Mobile PC

Since the system itself would be a software application, it needed hardware to run on. A Boser HS-1600 SBC (figure 3.2), a highly portable computer that could easily be carried around by the user, was available at SICS and seemed appropriate. The mobile PC would run without interruption, allowing the system to fetch data from the sensor equipment and other components that provided activity or context data.

3.4.2 Body-metric sensors

Sensors for acquiring body metrics are essential for retrieving information about the physical state of the user. For The Context Explorer, a BodyMedia SenseWear® PRO2 armband was used, see figure 3.3. The armband is worn around the upper arm with the sensor itself over the triceps brachii muscle. It continuously collects body metrics such as heat flux, galvanic skin response, skin temperature, physical activity, number of steps taken and sleep/wake states[4]. It also allows the user to explicitly timestamp the beginning or the end of events by pressing a button on the sensor shell. The events can later be annotated with a label after the data has been transferred to a computer.

3.4.3 GPS receiver

The natural choice for automatically determining the position of the user is a GPS receiver. A GPS receiver provides information such as the current latitude, longitude, altitude and speed. The major drawback of GPS is its inability to provide location data indoors or in places where a large part of the sky is obscured. The impact of this was reduced by connecting the GPS receiver to the mobile PC over a Bluetooth link. The receiver could then be left close to a window and keep providing data even while the user was moving in places where no GPS coverage was available. (Position estimates with a resolution of a few meters were not deemed necessary.) Figure 3.4 depicts the Bluetooth-enabled GPS receiver that was used.

¹ It can be argued that choosing readily available and cost-effective data acquisition devices is exactly what others have done previously, while claiming that the devices provide sufficient data to help find patterns in user behavior. No such claim is made here. Instead, the devices are used to determine which of the data they acquire is important for constructing useful models of the behavior of the user. It may be that some of the data the sensors (and the other components) provide is superfluous. Alternatively, even all of the data together may not give enough information to infer anything useful about the user.

Figure 3.2. The Boser HS-1600 SBC (and a pen for size comparison)

Figure 3.3. The BodyMedia SenseWear® PRO2 armband[4]

3.4.4 Mobile phone

A mobile phone is another alternative for providing position data. The mobile network can determine the location of the mobile phone and make this information available to a client over, for instance, HTTP. This is how location-based services built on Ericsson’s Mobile Positioning System work [8].

Figure 3.4. The GPS receiver

The major advantage of retrieving position data from the mobile network rather than from a GPS receiver is that it works indoors: wherever there is mobile network reception, the position can be determined. Unfortunately, the monetary cost of location-based services becomes high when they are used frequently over an extended time, since each location query usually incurs a fixed cost. Nevertheless, it was deemed probable that the benefits could outweigh the costs. The phone was therefore kept as an alternative to the GPS for determining position.

3.4.5 Microphone

A microphone can, trivially, be used to collect data about the current audio level. As such, it can provide clues about the current context that position and body metrics cannot. Spoken words can also be used to input data into the system via a speech recognition engine running on the mobile PC, so the microphone can also serve as an annotation device. Hence, a PC headset was used for audio acquisition. The microphone and the speech recognition engine could also be used to extract key words from the user’s conversations with other people. This would provide more finely grained sound event data than the audio level alone. This was not attempted in this work, but is a possible future extension.
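The WMicSound widget (section 3.5.3) establishes a current audio level and a running average over a pre-specified interval. The thesis does not state how the level is computed. The Python sketch below makes two assumptions: the level of a block of 16-bit PCM samples is its RMS value in dB relative to full scale, and the running average is a moving mean over the last few blocks. The names `rms_dbfs` and `RunningLevel` are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Sequence


def rms_dbfs(samples: Sequence[int], full_scale: float = 32768.0,
             floor: float = -96.0) -> float:
    """Level of one block of 16-bit PCM samples in dB relative to full scale,
    clamped at `floor` so that silence does not produce minus infinity."""
    if not samples:
        return floor
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return floor
    return max(floor, 20.0 * math.log10(rms / full_scale))


class RunningLevel:
    """Moving average of the most recent `window` block levels."""

    def __init__(self, window: int) -> None:
        self.levels: Deque[float] = deque(maxlen=window)

    def add(self, level_db: float) -> float:
        self.levels.append(level_db)
        return sum(self.levels) / len(self.levels)


meter = RunningLevel(window=3)
for block in ([1000, -1000], [8000, -8000], [0, 0], [16384, -16384]):
    current = rms_dbfs(block)
    average = meter.add(current)
    print(f"current {current:7.2f} dBFS, running average {average:7.2f} dBFS")
```

Averaging decibel values is a simplification. Averaging the underlying power and converting the result to decibels would be more faithful to perceived loudness.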

3.5 Implementation

Any of the three approaches described in section 2.5 could have been used to implement the information system. The implementation was, however, based on The Context Toolkit (see section 2.5.1). Dey et al. had made their work available as open source for download[6], which made it (in theory) relatively simple to extend to meet the requirements established for The Context Explorer. The Context Explorer therefore came to feature the same types of components as The Context Toolkit: interpreters, discoverers, widgets and servers for collecting, aggregating and interpreting contextual data. In addition, a fifth component type called the translator was included, which provided a modular way of translating the attribute representation in The Context Toolkit to other formats. Figure 3.5 shows a component diagram of the final implementation, with the various component types and their communication paths.

3.5.1 Discoverers

The discoverer has exactly the same function as in The Context Toolkit: it keeps a record of the other components in the network. Widgets and servers can query it for the network locations (IP address, port, etc.) of the components, as well as for other information.
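At its core, a discoverer is a registry that components announce themselves to and that other components query by name or by capability. The Python sketch below shows that idea in memory. The real discoverer is a networked Context Toolkit component, and its actual query interface is not reproduced here. The `Discoverer` class is hypothetical, as are all host addresses, ports and attribute names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ComponentInfo:
    name: str
    kind: str                      # "widget", "server" or "interpreter"
    host: str
    port: int
    attributes: FrozenSet[str] = frozenset()


class Discoverer:
    """Hypothetical in-memory registry mirroring the discoverer's role."""

    def __init__(self) -> None:
        self._registry: Dict[str, ComponentInfo] = {}

    def register(self, info: ComponentInfo) -> None:
        self._registry[info.name] = info     # re-registering replaces the entry

    def unregister(self, name: str) -> None:
        self._registry.pop(name, None)

    def lookup(self, name: str) -> Optional[ComponentInfo]:
        return self._registry.get(name)

    def find(self, kind: Optional[str] = None,
             attribute: Optional[str] = None) -> List[ComponentInfo]:
        return [info for info in self._registry.values()
                if (kind is None or info.kind == kind)
                and (attribute is None or attribute in info.attributes)]


discoverer = Discoverer()
discoverer.register(ComponentInfo("WNmeaGPS", "widget", "192.168.0.10", 5000,
                                  frozenset({"latitude", "longitude", "speed"})))
discoverer.register(ComponentInfo("WMicSound", "widget", "192.168.0.10", 5001,
                                  frozenset({"audio_level"})))
discoverer.register(ComponentInfo("SContextClassifier", "server", "192.168.0.11", 5100))

print([c.name for c in discoverer.find(attribute="latitude")])  # ['WNmeaGPS']
print(discoverer.lookup("WMicSound").port)                      # 5001
```

Querying by attribute rather than by name is what lets a server find "whatever provides latitude", so sensors can be swapped without reconfiguring their consumers.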

3.5.2 Interpreters

Like The Context Toolkit, the system can use any number of interpreters that take pre-specified input data and output interpretations of it. However, only objects of type ContextAbstractionServer (see section 3.5.4) have built-in support for automatically calling user-specified interpreters when widget data is acquired. The system implementation features three interpreters:
• ITimestampToDateTime – takes a timestamp and returns a multi-attribute containing the corresponding minutes, hours, date, day of the week, month and year.
• ITimestampToDateTimeString – takes a timestamp and returns a parseable string with the hour, minute, date, month and year. Currently, the string is used for nothing except making it easier to view collected data and its time of acquisition.
• IPositionToPlace – takes a position designate (latitude, longitude and an error estimate in meters) and uses it to calculate the most probable place that the position belongs to. The name of the place is returned.
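The thesis does not describe how IPositionToPlace decides which place is "most probable" or which timestamp unit ITimestampToDateTime expects. The Python sketch below shows one plausible reading of each. Places are modeled as circles with a hypothetical center and radius. A position is assigned to the place whose boundary it is closest to, provided the error estimate can reach that place. Timestamps are assumed to be milliseconds since the epoch and are interpreted in UTC. All function names, place names and coordinates are illustrative only.

```python
import math
from datetime import datetime, timezone
from typing import Dict, Tuple

EARTH_RADIUS_M = 6_371_000.0
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Hypothetical named places: name -> (latitude, longitude, radius in meters).
PLACES: Dict[str, Tuple[float, float, float]] = {
    "office": (59.3500, 18.0700, 150.0),
    "home": (59.3300, 18.0300, 100.0),
}


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def position_to_place(lat: float, lon: float, error_m: float,
                      places: Dict[str, Tuple[float, float, float]] = PLACES) -> str:
    """Pick the place whose boundary is nearest the position, but only if the
    error circle reaches it; otherwise return 'unknown'."""
    best_name, best_gap = "unknown", math.inf
    for name, (plat, plon, radius) in places.items():
        gap = haversine_m(lat, lon, plat, plon) - radius   # negative = inside
        if gap <= error_m and gap < best_gap:
            best_name, best_gap = name, gap
    return best_name


def timestamp_to_datetime(timestamp_ms: int) -> Dict[str, object]:
    """Split a millisecond timestamp (UTC assumed) into calendar attributes."""
    dt = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return {"minute": dt.minute, "hour": dt.hour, "date": dt.day,
            "weekday": WEEKDAYS[dt.weekday()], "month": dt.month, "year": dt.year}


print(position_to_place(59.3301, 18.0301, error_m=20))  # home
print(position_to_place(0.0, 0.0, error_m=20))           # unknown
print(timestamp_to_datetime(0)["weekday"])               # Thursday
```

A deployed interpreter would use the local time zone rather than UTC, since hour of day and weekday are the attributes most likely to matter for interruptability.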

3.5.3 Widgets

Seven widgets are also implemented. These collect the actual user information through interaction with various devices and services. All widgets store the data they gather in their own database table, which allows later retrieval of historic context data. Most widgets also support callbacks that other widgets, or more often servers, can subscribe to in order to receive the latest context data. The widgets are:
• WNmeaGPS – reads data from a GPS receiver and makes information such as latitude, longitude, altitude, error estimate, heading and speed available to subscribing components.
• WGsmPosition – polls a GSM positioning server for the position of a mobile phone with a specific phone number. It returns latitude, longitude and an error estimate to subscribing components.
• WSenseWearLog – reads a comma-separated file (.csv) created from data collected by a BodyMedia SenseWear sensor and stores it in its database table. Other components can later fetch this data via remote procedure calls.
• WMicSound – reads data from a SAPI-compatible speech recognition engine, which in turn reads real-time audio data from a microphone. From this data it establishes the current audio level as well as a running average over a pre-specified interval. The interval also specifies how often the widget should notify its subscribers with new data.
• WSpeechAnnotator – asks the user for vocal input, which is then turned into annotations with the help of a speech recognition engine. Vocal input is requested if a pre-specified minimum query interval has passed and the widget, through subscriptions with other widgets, learns that some context elements have changed.
• WImapCommunication – acts as a proxy between e-mail clients and the real IMAP server. IMAP commands sent via the widget are parsed to establish things like when a user reads or fetches his mail, from whom the mail originated, and all its recipients. Subscribing components are notified via callbacks as soon as a new mail event occurs.
• WSecureSmtpCommunication – acts as a proxy between e-mail clients and the real SMTP server (which requires encryption over SSL). SMTP commands are parsed, and when a user sends e-mail, subscribers to the widget are notified of who sent the mail and to whom, together with the subject of the message.
Besides the widget-specific attributes, all widgets tag the data they collect with a timestamp.

Figure 3.5. The components of the system and their communication paths.
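To illustrate what parsing SMTP commands involves for a widget like WSecureSmtpCommunication, the Python sketch below extracts the sender, recipients and subject of each submitted message. It takes the client-to-server lines of a session in order. It assumes the lines have already been decrypted, since the SSL layer of the real proxy is not modeled. It also ignores the server's replies, so a message the server rejects would still be reported. The function name and the sample session are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional

ADDRESS = re.compile(r"<([^>]*)>")


def parse_smtp_client_lines(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Return one {'from', 'to', 'subject'} event per message submitted in a
    decrypted SMTP session, given the client-to-server lines in order."""
    events: List[Dict[str, object]] = []
    sender: Optional[str] = None
    recipients: List[str] = []
    subject: Optional[str] = None
    in_data = in_headers = False

    for line in lines:
        if in_data:
            if line == ".":                      # a lone dot ends the message
                events.append({"from": sender, "to": list(recipients),
                               "subject": subject})
                sender, recipients, subject = None, [], None
                in_data = False
            elif in_headers:
                if line == "":                   # blank line ends the headers
                    in_headers = False
                elif line.lower().startswith("subject:"):
                    subject = line[len("subject:"):].strip()
            continue

        command = line.strip().upper()
        if command.startswith("MAIL FROM:"):
            match = ADDRESS.search(line)
            sender = match.group(1) if match else None
            recipients, subject = [], None
        elif command.startswith("RCPT TO:"):
            match = ADDRESS.search(line)
            if match:
                recipients.append(match.group(1))
        elif command == "DATA":
            in_data = in_headers = True
        elif command == "RSET":                  # client aborted the transaction
            sender, recipients, subject = None, [], None
    return events


session = ["EHLO client.example", "MAIL FROM:<alice@example.org>",
           "RCPT TO:<bob@example.org>", "RCPT TO:<carol@example.org>", "DATA",
           "From: alice@example.org", "Subject: Meeting at 10", "",
           "See you there.", ".", "QUIT"]
events = parse_smtp_client_lines(session)
print(events)
```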

3.5.4 Servers

The system uses only one type of server, the so-called ContextAbstractionServer. The server collects information from widgets both via callbacks and through explicit remote procedure calls. Collected information is stored in the database table of the server. Before saving data to the local data storage, however, the server interprets it by calling those of its known interpreters that the user has explicitly told it to use. An instance of a ContextAbstractionServer can also use any of the implemented translators of the system to write the saved attributes to file for analysis with other tools. The server could also provide a means of creating a model directly from the stored data, without first saving the attributes to a file. This, however, must be provided by the specific subclass of the server that implements the actual model creation algorithm. The following subclasses exist:
• SGenericAggregator – a generic server with only the capabilities of the ContextAbstractionServer.
• SContextClassifier – a server that, in addition to the capabilities of the ContextAbstractionServer, can build a classifier for context data or load one that is already built, and can classify new context data using the model. That is, it incorporates some functionality that was originally specified to belong not to The Context Explorer but to the layer on top of it (see figure 3.1).
All interaction with the server is performed via a GUI (ContextAbstractionServerUI), but the server itself does not require the GUI to be running in order to operate. The GUI consists of a number of controls for performing the operations described above, including the model creation. Figure 3.6 depicts screenshots of the GUI.
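The interpret-then-store behavior of the ContextAbstractionServer can be sketched as follows. Incoming widget data is enriched by the interpreters the user has explicitly enabled and is then written to the server's own table. This Python sketch is a hypothetical single-process stand-in and not the actual class. The in-memory SQLite table with its column layout, the `enable` method and both example interpreters are assumptions made for illustration.

```python
import json
import sqlite3
from typing import Any, Callable, Dict

Interpreter = Callable[[Dict[str, Any]], Dict[str, Any]]


class AbstractionServerSketch:
    """Hypothetical stand-in for a ContextAbstractionServer: interpret
    incoming widget data, then store it in the server's own table."""

    def __init__(self, interpreters: Dict[str, Interpreter]) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE context (timestamp INTEGER, source TEXT, data TEXT)")
        self.interpreters = interpreters
        self.enabled: set = set()

    def enable(self, name: str) -> None:
        # Only interpreters the user explicitly selects are called.
        if name not in self.interpreters:
            raise KeyError(f"unknown interpreter: {name}")
        self.enabled.add(name)

    def on_widget_callback(self, source: str, data: Dict[str, Any]) -> None:
        enriched = dict(data)
        for name in sorted(self.enabled):
            enriched.update(self.interpreters[name](data))
        self.db.execute("INSERT INTO context VALUES (?, ?, ?)",
                        (data["timestamp"], source,
                         json.dumps(enriched, sort_keys=True)))

    def rows(self):
        cursor = self.db.execute(
            "SELECT timestamp, source, data FROM context ORDER BY timestamp")
        return [(ts, src, json.loads(blob)) for ts, src, blob in cursor]


server = AbstractionServerSketch({
    "hour": lambda d: {"hour": (d["timestamp"] // 3_600_000) % 24},  # UTC hour
    "noisy": lambda d: {"noisy": d.get("level_db", -96.0) > -30.0},
})
server.enable("hour")
server.on_widget_callback("WMicSound", {"timestamp": 7_200_000, "level_db": -20.0})
print(server.rows())
# [(7200000, 'WMicSound', {'hour': 2, 'level_db': -20.0, 'timestamp': 7200000})]
```

Storing the interpreted attributes next to the raw ones means a later translator or classifier can use either without re-running the interpreters.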

Figure 3.6. The GUI used for controlling a ContextAbstractionServer.

3.5.5 Translators

Translators, that is, instances of the class AttributeTranslator, are simple objects that translate The Context Toolkit’s internal representation of context data into another format that can more readily be used for analysis or presentation. A translator differs from an interpreter in two important ways. First, it communicates only via procedure calls and is not network-based in any way. Second, it does not interpret data; it only translates it. A further difference is that a translator also supports writing translated data to a file. The only implemented translator is the CtkToWekaAttribueTranslator, which translates attributes to the format used by the data-mining Java API WEKA [27].
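WEKA's native input format is ARFF, a plain-text file with an @RELATION line, one @ATTRIBUTE declaration per column and an @DATA section. Columns are declared either as NUMERIC or as a set of nominal values, and '?' marks a missing value. The Python sketch below shows what a translation to ARFF involves. It is not the CtkToWekaAttribueTranslator itself. The attribute names, and especially the interruptability class labels, are hypothetical placeholders rather than the categories used in the study.

```python
from typing import Iterable, List, Optional, Sequence, Tuple, Union

AttributeType = Union[str, Sequence[str]]  # "NUMERIC", "STRING", or nominal values


def _quote(value: object) -> str:
    """Quote an ARFF name or value if it contains characters that need it."""
    text = str(value)
    if text == "" or any(ch in text for ch in " ,{}'\"%\t"):
        text = "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return text


def to_arff(relation: str,
            attributes: List[Tuple[str, AttributeType]],
            rows: Iterable[Sequence[Optional[object]]]) -> str:
    """Render rows as an ARFF document; None becomes the missing value '?'."""
    lines = [f"@RELATION {_quote(relation)}", ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            spec = kind.upper()
        else:
            spec = "{" + ",".join(_quote(v) for v in kind) + "}"
        lines.append(f"@ATTRIBUTE {_quote(name)} {spec}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join("?" if v is None else _quote(v) for v in row))
    return "\n".join(lines) + "\n"


arff = to_arff(
    "user context",
    [("hour", "numeric"),
     ("place", ["office", "home", "unknown"]),
     ("audio_db", "numeric"),
     ("interruptability", ["interruptible", "busy"])],  # hypothetical labels
    [(9, "office", -35.5, "busy"), (20, "home", None, "interruptible")],
)
print(arff)
```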

3.6 Technical difficulties

The development of the prototype system encountered a number of problems, both technical and semantic. By semantic we mean issues related to data representation and interpretation; these are discussed further in section 5.1. The technical problems included those that affect the operation of the system in general, such as hardware set-up, software bugs and similar problems. Clearly, the technical problems are the least interesting ones. Unfortunately, they were also the ones that required the most time to solve. The most noticeable technical problems were the following:

• Software bugs in The Context Toolkit API. The release of the toolkit that was used (ctk-12292003, the latest one at the time) contained numerous bugs that needed to be corrected before the system could be completed. If The Context Toolkit had not been open source, there would never have been a chance of completing the system using Dey et al.’s work.
• Memory leaks in the speech recognition engine in Microsoft’s SAPI 4.0. This was the version made available at the “Microsoft Agent download page for end-users” [16], but it did not work very well and caused the WMicSound widget to leak memory until it died. Downloading and installing the newer SAPI 5.1 fixed this.
• The SAPI-dependent widgets did not work very well when started from another computer via Microsoft’s Remote Desktop. The widgets either stopped retrieving new context data or stopped working when the remote desktop connection was dropped (upgrading to SAPI 5.1 did not solve the problem). None of the other widgets were affected. The only identified solution was not to use Remote Desktop.
• The line-out and mic-in of the mobile PC were severely affected by static generated by the PC itself, most probably because of badly shielded cables running to the sound card. This caused the following problems:
  1. The WMicSound widget perceived a constant hum with the same sound level as that of a normal conversation, so it could only identify very loud sounds.
  2. The WSpeechAnnotator interpreted the static as some of the words used for annotations, thereby misclassifying contextual data without the user noticing.
  3. The constant static also made Windows XP (running on the mobile PC) occasionally turn the microphone volume down to a minimum automatically. This made it almost impossible to annotate the activity vocally using the microphone and the WSpeechAnnotator widget.
  The latter two issues led me to write a simple GUI annotation widget (not shown in figure 3.5), which was used in place of the WSpeechAnnotator when possible. I also reclassified misclassified data that had been collected during runs with the WSpeechAnnotator.
• The mobile PC tended to turn itself off during use. This was often caused by lack of battery power, but sometimes happened for no apparent reason. The cause of the latter was never identified, and the only solution was to restart the system.
• The Bluetooth USB plug used for communicating with the GPS receiver could not be plugged directly into one of the mobile PC’s USB ports. Doing so caused the Bluetooth communication to go down within a couple of minutes. A reasonably easy solution was to connect a USB hub to a USB port on the mobile PC and then plug the Bluetooth plug into the hub.
• The GPS receiver sometimes locked up and became unreachable over the Bluetooth link. Quickly turning it off and on again resolved the lock-up for a while.


Chapter 4

The study

The purpose of the study was twofold: first, to use and evaluate the information system (The Context Explorer) as a tool for gathering context data (this chapter), and second, to collect data that could later be analyzed (see chapter 5). Of interest for the former part was how easily using the system would blend in with the user’s other tasks, since the system could possibly affect the user to such a degree that it would severely change his behavior. This chapter describes the study in more detail: its delimitations, what actions the system required of the user in order to operate as intended, and the observations resulting from the study.

4.1

Delimitations

Several delimitations were set up, both to ease later analysis of the data and because the time allotted for this Master’s project was limited. The delimitations had been identified during earlier development iterations of the project. They included disallowing an arbitrary number of annotations and named places, since it was presumed that this, together with the limited time, would prevent useful activity or user-state patterns from being found. The delimitations, it was hoped, would give the system a chance to collect enough data for each environment and activity the user was engaged in.

The activity or state to be studied was chosen to be the user’s interruptability, i.e., how willingly the user would allow (external) communication attempts such as phone calls or IM chat requests. Interruptability was chosen because it is used in The Aware Messenger developed at SICS and described by Segall and Bylund [20]. (If the data analysis following the study showed that it is possible to distinguish between states of interruptability, the system could potentially be used to set the user’s interruptability automatically in the Aware Messenger service.) More specifically, the delimitations were that:

1. The contextual parameters were limited to those The Context Explorer could acquire with its implemented widgets and interpreters. These are listed in tables 4.1 and 4.2.
2. The time period of the study was chosen to extend over one to two normal working weeks of the user (i.e., no once-a-year trips abroad during the study, etc.). The exact period would depend on how problems such as usability would affect extended use.
3. Places the user knew he visited often and regularly were named beforehand, and their position and occupied area were specified. This came to include about ten different places, among them the local shopping mall, homes of friends, the gym, and train stations. The areas of the places ranged from 960 to 127,000 square meters, but most of them were between 1,000 and 6,000 square meters. Each area was deliberately made somewhat larger than the place itself, partly because of the errors in determining the exact extent of the place, and partly to make sure that the real location of the place was included in the area. The distance between any two places was 200 meters or more, except for one pair that was roughly 20 meters apart.
4. The user states were enumerated as the same three that the Aware Messenger uses: available, occupied, and discreet. The user later defined these states according to table 4.3 to allow consistent use.
5. The state was assumed to change only when the user stated so, i.e., the system would assume the last known state until the user stated otherwise.
6. Two instances of the ContextAbstractionServer would be used for the study. One server would poll the widgets at a rate of one context acquisition per minute (which was deemed sufficient for discovering changes in the context), while the other would subscribe to the changes the widgets notified about. The reason for running two servers was that it would give a better foundation for later data analysis; the server using subscriptions would collect more data, but data that is harder to analyze.

4.2

Execution

The study itself was performed with me (the author) as the subject over a total of two weeks in December 2004. The execution was not continuous because of crashes and unplanned shutdowns (e.g., when the batteries ran out), so the effective duration of the study was less than two weeks. My regular occupation during this time was to program some miscellaneous components for the system itself and to write this report. Meanwhile, I also administered the system, which involved frequently performing a number of tasks to give The Context Explorer the best opportunity to collect data about my context and activity.

Parameter | Description
timestamp | A timestamp (number of ms since midnight, January 1, 1970 UTC)
date-time | The date (year, month, date, day of the week) and time (hour, minute, seconds)
sound level | The current sound level (as measured by the microphone)
sound level average | A running average of the sound level for the last time period
sound level num. peaks | The number of audio level peaks above a threshold for the last time period
username | The username of the user the context data is applicable to
latitude | The current latitude of the user
longitude | The current longitude of the user
place name | The name of the place the user is currently at
altitude | The current altitude of the user
speed | The speed the user is moving with
heading | The direction in which the user is heading
e-mail usage | Whether the user is sending, fetching or reading mail (according to the mail server)
e-mail to | The e-mail address a particular e-mail is sent to
e-mail from | The e-mail address a particular e-mail is sent from
connected | Describes whether the user is connected to an e-mail server for either the sending or retrieval of mails

Table 4.1. The contextual parameters (excluding body-metrics) that The Context Explorer can collect with its current widgets and interpreters.

Firstly, I had to carry the mobile PC (and its attached hardware) with me wherever I went. The reason was twofold: to allow the microphone to collect audio data from the setting I was currently in, and to allow my position to be updated when I moved from one place to another. Secondly, whenever I had changed state, I had to annotate my status when the speech annotation widget asked me; this in turn meant carrying the headset so that I would hear when a status update was requested and not forget to state the change. Thirdly, the body-metric sensor needed to be attached to my arm whenever the system was running. Finally, to learn something from the study itself, I had to write down thoughts and observations on how well the study progressed. These observations are presented in the next section.

Parameter | Description
lying down | Whether the user is lying down or not
sleep | Whether the user is sleeping or not
steps per minute | The number of steps the user has taken during the last minute
physical activity | Whether the user is physically active or not
energy exp. per minute | The number of calories the user has expended during the last minute
gsr avg. | The average galvanic skin response for the last time period
longitudinal acc. avg. | The average user motion along the longitudinal axis (parallel to the arm of the user) for the last time period
longitudinal acc. num. peaks | The number of times major changes in the direction of acceleration along the longitudinal axis have occurred
longitudinal acc. MAD | The stillness or movement of the user along the longitudinal axis (small value: little movement)
transverse acc. avg. | The average user motion along the transverse axis (perpendicular to the arm of the user) for the last time period
transverse acc. num. peaks | The number of times major changes in the direction of acceleration along the transverse axis have occurred for the last time period
transverse acc. MAD | The stillness or movement of the user along the transverse axis
skin temp. avg. | The average skin temperature of the user for the last time period
near body temp. avg. | The average air temperature immediately around the arm of the user for the last time period
heat flux avg. | The amount of heat exchanged from the arm of the user to the outside environment for the last time period

Table 4.2. The body-metric parameters that The Context Explorer can collect using the SenseWear® Pro2.


State | Description
available | The user is available and will happily answer incoming IM chat requests or other types of communication attempts. Example activity: leisure surfing on the computer.
discreet | The user does not want to be disturbed unless it is something important. Example activities: working in front of the computer, watching TV with the computer nearby.
occupied | The user is performing some activity that makes him disregard most communication attempts. Example activities: sleeping, cooking.

Table 4.3. The possible interruptability states in the study.

4.3

Findings

The execution of the study uncovered a number of issues with the information system. The issues were related either to the usability of the system, such as ease of use and how well it blended in with the activities of the user, or to its performance, i.e., how well the system performed its task without user intervention:
• Using the system on a day-to-day basis involved some cumbersome management. Having to keep the state of the system in mind at all times (e.g., has the mobile PC turned itself off yet?) and remembering to plug the computer into an outlet whenever possible, so that the batteries would not run out, significantly degraded the user experience.
• Managing the system was also time-consuming, especially in the beginning, before a routine had been established. Unplugging the computer from the home network and the power grid and packing it, together with its peripherals, into a bag when it was time for lunch or to walk to the gym took time. Sometimes the battery was not sufficiently charged, causing the computer to turn itself off, whereupon the user had to reboot the system and connect a charged battery (very annoying if the user was just about to leave).
• During the beginning of the study, when the system was not yet very stable, it was not uncommon for widgets in particular to crash. Since no screen was attached to the mobile PC during normal use, crashes usually went undiscovered for some time. As a result, the context information that would have been collected after a crash was never sampled.
• The initial attempt at the study showed the importance of the subject having a good definition of the different states used for annotations. In the first attempt, the three states were not used consistently, and the subject could annotate similar activities with different interruptabilities.
• Changing availability in noisy environments through the WSpeechAnnotator widget proved difficult. The noise interfered with the annotation, and the widget could hardly make sense of it; sometimes it even misinterpreted the true annotation as something else. However, owing to the definition of the three states of interruptability, annotations almost never had to take place under these circumstances, so the problem had little practical significance.
• Carrying the mobile PC around and wearing the headset correctly caused a sense of stigma and made the user feel self-conscious in public environments. This in turn affected the behavior of the user somewhat.
• The SenseWear sensor neither affected user activities noticeably nor caused any perceived stigma. This was due to its size and the fact that it was almost never visible to other people, since clothes covered it.
• At times, managing the system (e.g., restarting it when the computer had turned itself off just as the user was about to leave, or when Windows automatically turned down the microphone volume and prevented annotations) caused frustration and affected the amount of time the system was used. When the system misbehaved badly, usage could be interrupted for a couple of hours, and once even for the rest of the day.


Chapter 5

Analysis and results

The study described in the previous chapter had resulted in a number of observations on the usability and performance of the developed context-aware system. However, the collected data had not yet been investigated for patterns that could give a more thorough and scientifically valid estimate of the performance of the system. Therefore, once the study, and with it the two-week run of the system, had concluded, the analysis of the data began. The analysis had two goals:
1. Evaluate how well the context data sampled in the study can be used for predicting user interruptability.
2. Find which of the contextual parameters used in the study are beneficial for the predictions.
This chapter describes the data analysis and the method used for performing it. It also gives the results of the analysis in terms of how well the interruptability could be predicted and which contextual parameters were essential for doing so. First, however, the semantical problems identified during the iterative development of The Context Explorer are presented, together with how they were overcome (where applicable).

5.1

Semantical issues

Unlike the technical problems (section 3.6), identifying and solving the semantical problems was of interest to this Master’s project. To reiterate, semantical issues are problems related to the data representation and interpretation used by the information system that can affect the ease of later analysis and data management. The identified issues are presented in the following sections.

5.1.1

Storage format

The Context Toolkit’s default way of storing context data in databases does not lend itself to data mining without extensive pre-processing of the data. The reason is that The Context Toolkit does not make any assumptions about the persistence of context during periods when context data is unavailable, which results in a great many unknowns in data collected by servers. This later led to the set-up of two different servers for the study, as described in section 4.1, and to the assumption that the user activity remained constant until changed by the user.
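The persistence assumption can be illustrated with a minimal forward-fill sketch, assuming rows are dicts keyed by column name, ordered by timestamp, with None standing for a value the server did not store. The function name forward_fill and the row layout are hypothetical; this is not how The Context Toolkit or The Context Explorer stores data.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def forward_fill(rows: List[Row], columns: List[str]) -> List[Row]:
    """Fill None entries in `columns` with the last non-None value seen earlier.

    Rows are assumed to be ordered by timestamp. Values before the first
    observation stay None, because nothing is known about them yet.
    """
    last_seen: Dict[str, object] = {}
    filled: List[Row] = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        for col in columns:
            value = new_row.get(col)
            if value is None:
                # No update at this time: assume the last known value still holds.
                new_row[col] = last_seen.get(col)
            else:
                last_seen[col] = value
        filled.append(new_row)
    return filled
```

Applied to the state column, this mirrors delimitation 5 in section 4.1: the last annotated state is carried forward until the user states otherwise.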

5.1.2

Parameter selection

The choice of parameters, and of their representation, is crucial for finding patterns in the context data. Obviously, if we do not sample some aspect of the context, we will not be able to find a relationship between it and other aspects. Less obvious is that the representation is also important; take, for instance, the representation of time. If time is represented as a timestamp, a strictly increasing representation, it is very hard to find relationships between events and time, because no timestamp value ever recurs. In contrast, hour of the day can be seen as a “cyclic” representation: if we look at the hour and the day of the week, it seems more likely that we can find some relationships, e.g., that the user is occupied on Tuesdays from 1 AM to 9 AM.
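A minimal sketch of turning the strictly increasing timestamp of table 4.1 into the recurring hour and day-of-week values, assuming local time is CET (UTC+1), which holds for Stockholm in December without a daylight-saving shift. The function name cyclic_time_features is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption: local time is CET (UTC+1). The study ran in Stockholm in December,
# when no daylight-saving shift applies, so a fixed offset is enough for this sketch.
CET = timezone(timedelta(hours=1))
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def cyclic_time_features(timestamp_ms: int, tz: timezone = CET) -> Tuple[int, str]:
    """Map a timestamp (ms since 1970-01-01 UTC) to an (hour, weekday) pair that recurs."""
    local = datetime.fromtimestamp(timestamp_ms / 1000, tz=tz)
    return local.hour, WEEKDAYS[local.weekday()]
```

For example, cyclic_time_features(0) returns (1, "Thursday"), and any two timestamps exactly one week apart map to the same pair, which is what makes weekly patterns learnable.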

5.1.3

Constructed parameters

What the specific parameters of the widgets, and in particular their composition, should be to facilitate model creation is an open issue that cannot be settled definitively. Take, for instance, the attribute lying down from the SenseWearLog widget (although it is computed in the armband). This attribute is clearly “constructed” from other attributes, since its value is inferred from the longitudinal and transverse accelerometers in the SenseWear armband. When a constructed attribute like lying down is used, we will presumably not increase prediction performance, because the data required for reaching that performance is already available in the attributes that compose it. However, the model created for making the predictions is likely to become simpler when we use the constructed attribute, since the attribute itself houses complexity that would otherwise be transferred to the model. Therefore, to reduce the complexity of the model, we can introduce constructed attributes. Unfortunately, the space of possible constructed attributes is infinite, and (given the lack of time) the benefit of a less complex model did not outweigh the cost of identifying good constructed attributes. As a consequence, no attempt to create constructed attributes was made for this Master’s project.¹ However, where such attributes were already available, as in the SenseWear armband, they were used together with the other attributes.

¹ Exceptions included the attribute date-time, for the reason described in section 5.1.2, and sound level num. peaks, in an attempt to find some interesting audio data even though static was prominent.


5.1.4

Ambiguities in position

There are inherent ambiguities in translating a position estimate into a place (here, in the IPositionToPlace interpreter). Trivially, one position can refer to multiple places; moreover, because of the errors in GPS and GSM positioning, several nearby places may always become candidates for the user’s current place, given the estimate from the positioning device. This results in a potential need to resolve the ambiguity at every position update. However, trial runs with the system showed that, in the domain of the study, the place was misclassified so rarely that the correctly classified instances far outnumbered the incorrectly classified ones. That is, instances with the wrong place value had little if any effect on activity classification. The reason for this fortunate outcome was that the error estimate was seldom so large that a place other than the true one had to be considered. For other places to be considered, the error estimate typically had to be 200 meters or more (see section 4.1), and the GPS was often more accurate than that.
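The candidate problem can be sketched as follows, assuming each named place is approximated by a circle (center plus radius) and that a place becomes a candidate when the estimate’s error radius reaches its boundary. The names Place, haversine_m and candidate_places are hypothetical, and this is not the IPositionToPlace implementation.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class Place:
    name: str
    lat: float       # degrees
    lon: float       # degrees
    radius_m: float  # assumption: the place's area is approximated as a circle


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_places(lat: float, lon: float, error_m: float,
                     places: List[Place]) -> List[str]:
    """Names of places the user might be at, nearest center first."""
    found = []
    for p in places:
        d = haversine_m(lat, lon, p.lat, p.lon)
        # Candidate if the error circle around the estimate reaches the place's area.
        if d - p.radius_m <= error_m:
            found.append((d, p.name))
    return [name for _, name in sorted(found)]
```

With two places about 200 m apart, a 20 m error estimate yields a single candidate, while an error estimate near 200 m makes both places candidates, which is the situation the text describes as rare.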

5.1.5

Momentary events

The lack of duration of momentary events, like “reading mail” for the e-mail usage attribute, does not give a good foundation for finding patterns in the state of the user. This is because we (or the widget) cannot know for how long an activity like “reading mail” is valid: the e-mail client only reports “mail read” once, after some pre-specified amount of time, and not for how long the mail was read. Therefore, the number of context updates with this event is negligible compared to the number with the opposite value (“not reading mail”), making the event itself insignificant for finding patterns.

5.1.6

Many-to-one relationships

Many-to-one relationships in context are difficult to handle, primarily because of The Context Toolkit’s storage format. The Context Toolkit assumes that each contextual parameter can only have one value per update. Awkwardly, there are occasions when this is not true; for example, for the attribute e-mail to (see table 4.1), the number of recipients can be more than one. The issue with many-to-one relationships was never solved, simply because no effort was spent on it, which in turn was because sending and reading mail are (at least for the widget) momentary events. As explained previously, momentary events are difficult to use for discovering patterns.

5.2

Preconditions

The context data from the two-week study had been stored in an SQL database. The database contained one table for each server that had aggregated data from the widgets. Each table in turn had one column per contextual parameter, holding the collected data itself. Section 4.1 stated the delimitations of the study and that two instances of ContextAbstractionServer had been run during the study. This resulted in two database tables with aggregated data: one containing data from the server that had used polling, and the other containing data from the server that had used subscriptions. The latter contained a lot of null values² because of the asynchronous context updates that occur with subscriptions. As a result, this data would likely need some kind of preprocessing before analysis could find any patterns between the context data and the interruptability itself.

Another issue that had to be taken into account when performing the data analysis was that some of the data, in this study the body-metrics retrieved through the SenseWear® Pro2 armband (table 4.2), cannot be aggregated in real time. This has significant implications for the behavior model to be built, assuming we later want to use it for predicting user state in real time. However, it was interesting to see whether adding the data would increase the performance of the model. If so, we could say that the model would perform better if this data were available in real time. Consequently, there were four data sets for a potential analysis: data collected through polling, with and without body-metrics, and data collected through subscriptions, with and without body-metrics. The latter data (from subscriptions) was later ignored for two reasons. Firstly, implementing and performing the pre-processing was deemed too time-consuming. Secondly, the other data proved to give a sufficiently good foundation for analysis.

5.3

Machine-learning algorithms

For the analysis, a number of machine-learning algorithms were chosen. The main reason was that a machine-learning algorithm can, in addition to finding patterns in the data, also be used for creating a model (not true for instance-based algorithms) that can later be used for classifying new data. The latter property is important if we want an application to infer our status automatically from our context. Four algorithms were chosen for evaluation: one instance-based (KNN) and three model-based (1R, C4.5 and the Bayesian network). The following subsections give short, by no means exhaustive, descriptions of each algorithm.

5.3.1

Classification rule (1R)

Among the simplest machine-learning algorithms is the “1-rule”, or 1R, algorithm. 1R attempts to find the single attribute that has the highest accuracy for predicting the class in the data set. This is done by trying all the attributes, i.e., branching on each of them and choosing the majority class in each branch. The error rate of an attribute is the number of instances in the data set that do not belong to the majority class of their branch. The attribute with the lowest error rate is then used for predictions [26]. Robert Holte showed that 1-rules (which classify instances based on a single attribute) often perform very well, relative to their simplicity, on common data sets [11]. 1R was used as one of the ML algorithms in this analysis to test how well a very simple model would perform on the data, and as a reference against which the other algorithms would be evaluated. Should they perform worse than 1R, there is no good reason to use them, because they would have both higher complexity and lower accuracy than 1R.

² Null values in the table do not give any information about the context; they only tell us that no explicit context information was stored at that time.
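The 1R procedure described above can be sketched in a few lines, assuming nominal attributes only (Holte’s 1R additionally discretizes numeric attributes, which is omitted here) and instances given as dicts. The function name one_r and the toy data in the usage note are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Rule = Dict[Hashable, Hashable]


def one_r(rows: Sequence[Dict[str, Hashable]], attributes: List[str],
          target: str) -> Tuple[str, Rule, int]:
    """1R for nominal attributes: return (best attribute, value->class rule, error count)."""
    best = None
    for attr in attributes:
        # Class frequencies per value of the attribute, i.e. per branch.
        counts: Dict[Hashable, Counter] = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each branch predicts its majority class; the other instances in it are errors.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if best is None or errors < best[2]:  # ties keep the earlier attribute
            best = (attr, rule, errors)
    return best
```

On a toy data set where every hour value maps to one state but place names mix states, one_r picks hour with zero errors, analogous to (but not a reproduction of) the study’s result that 1R branched on hour.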

5.3.2

Decision tree (C4.5)

Decision trees are the logical extension of 1-rules and use multiple attributes for predicting the class. Decision trees provide easily understood rules and are common workhorses in the data-mining community; therefore, it seems appropriate to evaluate a tree for the classification task. The most commonly used decision-tree algorithms are C4.5 and C5.0. (The former was chosen since it is the freely available predecessor of the commercial C5.0.) [26] Details on how the C4.5 decision-tree algorithm works are beyond the scope of this work, but basically it selects an attribute as the root and branches on each of its values. This splits the data into subsets, one for every value of the attribute. The splitting is performed recursively on each branch, and the recursion ends at a branch when all instances in the branch have the same class. For more details on how the algorithm chooses the attributes, what it does with missing values, and how it determines when less complex (smaller) trees are better than more complex ones, see for instance Witten and Frank’s book Data mining: Practical machine learning tools and techniques with Java implementations [26].
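C4.5 chooses the attribute to split on by its gain ratio: the information gain of the split divided by the split information (the entropy of the branch sizes), which penalizes attributes with many distinct values. The sketch below computes only this criterion for a nominal attribute; numeric thresholds, missing values and pruning are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of splitting `labels` on a nominal attribute, divided by split info."""
    n = len(labels)
    branches = defaultdict(list)
    for v, y in zip(values, labels):
        branches[v].append(y)
    # Expected entropy remaining after the split, weighted by branch size.
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the branch sizes themselves
    return gain / split_info if split_info > 0 else 0.0
```

For four instances labeled a, a, b, b, a two-valued attribute that separates the classes perfectly scores 1.0, whereas an ID-like attribute with four distinct values achieves the same information gain but only scores 0.5 because of its larger split information.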

5.3.3

K-nearest neighbors (KNN)

K-nearest neighbors is an instance-based learning algorithm, which means that it saves all the instances from the training data and uses them explicitly to predict the class of new ones [26]. Saving all the instances makes KNN inefficient in terms of required storage, but it may still be an interesting contrast to the other algorithms. The instances KNN stores can be thought of as residing in an n-dimensional space (where n is the number of attributes). A new instance to be classified is compared with the stored instances along these dimensions, and the K closest instances take a majority vote on the class of the new instance. The distance metric most often used is the Euclidean distance for numeric attributes, combined with some fixed value for nominal attributes (e.g., 0 if the values are the same and 1 otherwise). Sometimes the distance is also used for weighting how much influence each of the K neighbors has on the classification, but this was not done in this analysis.
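A minimal sketch of unweighted KNN with the mixed distance just described, assuming numeric attributes have already been normalized to a comparable range (e.g., hour divided by 23) and that nominal values are given as strings. The function names are hypothetical, not the implementation used in the analysis.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance; nominal (str) attributes contribute 0 if equal and 1 otherwise."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            d = 0.0 if x == y else 1.0
        else:
            d = float(x) - float(y)
        total += d * d
    return math.sqrt(total)


def knn_predict(train, query, k=3):
    """Unweighted majority vote among the k stored (features, label) pairs closest to query."""
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Because every stored pair is scanned for each query, prediction cost and storage grow with the training set, which is the storage drawback mentioned above; k=3 is used as the default only because K = 3 performed best in this study.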

5.3.4

Bayesian network

A Bayesian network is a directed acyclic graph (DAG) that represents the different variables, or knowledge components, as nodes and the dependencies between them as arcs. Each node (for discrete variables) has an associated conditional probability table that describes how probable it is that the variable takes a certain value given the values of its parents. For continuous variables, the table is replaced with a conditional probability distribution. By the chain rule of probability, the joint distribution of all variables is then the product of these local tables and distributions [17]. What is then attempted is to infer the values of the nodes that have not been observed. For this project, that is the value of the class we want to predict (i.e., the interruptability). This value is estimated using Bayes’ rule:

$$P(X \mid y) = \frac{P(y \mid X)\,P(X)}{P(y)}$$

where X is the node value and y is the observed evidence. We choose the X with the highest probability given y, i.e.,

$$\hat{X} = \operatorname*{arg\,max}_{X \in \{\text{available},\,\text{occupied},\,\text{discreet}\}} P(X \mid y).$$

That only leaves the method for constructing the Bayesian network in the first place. This includes choosing a scoring function, which is then used together with search algorithms to find the highest-scoring (and most suitable) graph for representing the data. This is described in detail in, for instance, Kevin Murphy’s An introduction to graphical models [17]. The main reason for using a Bayesian network for learning a model was that Horvitz and Apacible used one when they tried to predict interruptability from the attentional focus and workload of the user [13]. Their study did not produce any spectacular results (prediction accuracies between 61% and 83%), but that does not mean that Bayesian networks cannot give high predictability for the data of this study.
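The Bayes’ rule step can be made concrete with a small numeric sketch. The priors below are the class counts reported in section 5.4, but the likelihoods P(y | X) for some piece of evidence y are hypothetical; in a real network they would come from the learned conditional probability tables. Structure learning is not shown, and the function name posterior is hypothetical.

```python
from typing import Dict


def posterior(prior_counts: Dict[str, int], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule for a class node X given evidence y.

    prior_counts: class frequencies, giving P(X) after normalization.
    likelihood:   P(y | X) for each class value X.
    """
    total = sum(prior_counts.values())
    joint = {x: likelihood[x] * prior_counts[x] / total for x in prior_counts}  # P(y|X) P(X)
    evidence = sum(joint.values())  # P(y), summed over all class values
    return {x: p / evidence for x, p in joint.items()}


priors = {"available": 877, "occupied": 6590, "discreet": 2976}  # class counts, section 5.4
likelihood = {"available": 0.30, "occupied": 0.05, "discreet": 0.10}  # hypothetical P(y|X)
post = posterior(priors, likelihood)
prediction = max(post, key=post.get)  # the arg max over X
```

Even though this evidence is six times more likely under available than under occupied, the large prior of occupied makes occupied the arg max (posterior about 0.37 versus 0.30), showing how the skewed class distribution enters the prediction.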

5.4

The sampled data

On closer inspection, the data underlying the analysis was found to have been collected over 13 days (although not for all 24 hours of each day), totaling 10443 samples. Of these, 877 belonged to the class available, 6590 to occupied, and 2976 to discreet.
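As a point of reference for the accuracies reported below, the class counts give the following class proportions (a simple worked calculation, not a result from the thesis):

$$\frac{877}{10443} \approx 0.084, \qquad \frac{6590}{10443} \approx 0.631, \qquad \frac{2976}{10443} \approx 0.285$$

A trivial classifier that always predicts the majority class, occupied, would therefore be correct on roughly 63.1% of the samples, which serves as a floor against which the accuracies in tables 5.1 and 5.2 can be compared.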

5.5

Results

As stated in the introduction to this chapter, the analysis sought to find out how well the context data could determine the interruptability of the user and which aspects of context contribute to correct predictions. The results for these two parts are summarized below, while the conclusions are saved for chapter 6. More detailed results are presented in appendix A.

5.5.1

Prediction accuracy

The accuracies of the algorithms were all determined through tenfold cross-validation on the sampled data. To show what difference the body-metrics could make to the prediction performance (if they were available in real time), all algorithms were first run on the data excluding body-metrics (see table 5.1) and then on the data including them (see table 5.2). For KNN, K-values from 1 to 5 were tried in order to find the best one; K = 3 gave KNN the best performance on both data sets.

Algorithm | Accuracy (%)
1R | 86.63
C4.5 | 91.57
KNN (K=3) | 89.04
Bayes net | 88.78

Table 5.1. Prediction results using only context data (no body-metrics)

Algorithm | Accuracy (%)
1R | 86.63
C4.5 | 96.52
KNN (K=3) | 95.34
Bayes net | 88.80

Table 5.2. Prediction results using context data and body-metrics
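Tenfold cross-validation can be sketched as follows. The stratified variant, which keeps each class’s proportion roughly equal across folds, is an assumption (a common choice for imbalanced classes such as these, but not stated in the text), and fit_predict is a hypothetical callback standing in for any of the four learners. The function names are hypothetical as well.

```python
import random
from collections import defaultdict
from typing import List, Sequence


def stratified_folds(labels: Sequence[str], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split instance indices into k folds, keeping class proportions roughly equal."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)  # deal instances round-robin across the folds
            pos += 1
    return folds


def cross_val_accuracy(X, labels, fit_predict, k=10, seed=0) -> float:
    """Overall accuracy pooled over the k test folds.

    fit_predict(train_X, train_y, test_X) must return one prediction per test instance.
    """
    correct = 0
    for fold in stratified_folds(labels, k, seed):
        test = set(fold)
        train_idx = [i for i in range(len(labels)) if i not in test]
        preds = fit_predict([X[i] for i in train_idx], [labels[i] for i in train_idx],
                            [X[i] for i in fold])
        correct += sum(p == labels[i] for p, i in zip(preds, fold))
    return correct / len(labels)
```

Every instance is used for testing exactly once and for training in the other nine folds; selecting K for KNN would amount to calling cross_val_accuracy once per candidate K and keeping the best.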

As seen, the C4.5 decision tree performed best, exceeding the 90% mark in both runs (91.57% and 96.52%). All algorithms except 1R performed better when the body-metric data was available; however, the Bayesian network improved only marginally (88.80% compared to 88.78%). The second-best algorithm on the data was KNN. On the data including body-metrics it performed only marginally worse than the decision tree (95.34% versus 96.52%), so, were it not for its higher model complexity (it stores all training instances), it could just as well be used instead of the tree. The worst algorithm, relatively speaking, was the classification rule (1R). It had exactly the same accuracy in both runs, trivially, because it used the same attribute in both. Even so, it proved good enough to reach an accuracy of 86.63%, higher than any of the accuracies (61-83%) reported by Horvitz and Apacible!

5.5.2

Essential contextual parameters

As the accuracies presented in the previous section show, the C4.5 algorithm performed best on the data, both with and without body-metrics. Fortunately, this algorithm is one for which we can easily establish which attributes contributed to the prediction performance, simply by inspecting the model itself.

The attributes used by the decision tree were (naturally) different for the data set excluding body-metrics and the one including them. For the data excluding body-metrics, the attributes (from table 4.1) were hour, day of the week, place name, altitude, sound level average, and sound level num. peaks. However, removing all attributes except the first two had very little impact on the prediction performance of the model, decreasing it by only 0.2%. Therefore, the truly essential parameters for the data without body-metrics were hour and day of the week. When the data also included body-metrics, the attributes (see tables 4.1 and 4.2) that made up the model were: lying down, longitudinal acc. avg., longitudinal acc. num. peaks, transverse acc. avg., skin temperature avg., hour, day of the week, heat flux avg., physical activity, and near body temp. avg. At least four (five including lying down) of these are directly related to user motion. It is worth mentioning that lying down is the first attribute the tree branches on, and the tree predicts the state to be occupied whenever it is true. Removing any of the attributes from the model resulted in a significant reduction (more than 1%) of the prediction performance, so all of the above attributes can be considered relevant. In contrast, the classification rule (1R) used only one attribute to branch on: hour (of the day).
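The relevance test described above, removing an attribute and re-measuring accuracy, can be sketched as a leave-one-attribute-out loop. Here evaluate is a hypothetical callback that would, for example, return the tenfold cross-validated accuracy (in percent) of C4.5 trained on only the given attributes; the function name and the 1% threshold usage are illustrative, mirroring the text rather than the actual tooling used.

```python
from typing import Callable, Dict, Sequence


def ablation_drops(attributes: Sequence[str],
                   evaluate: Callable[[Sequence[str]], float]) -> Dict[str, float]:
    """Accuracy drop (percentage points) when each attribute is removed in turn."""
    baseline = evaluate(attributes)
    drops = {}
    for attr in attributes:
        reduced = [a for a in attributes if a != attr]
        drops[attr] = baseline - evaluate(reduced)
    return drops
```

Attributes whose drop exceeds the threshold, e.g. [a for a, d in drops.items() if d > 1.0], would then be the ones considered relevant. Note that this tests attributes one at a time; the text’s other check (keeping only hour and day of the week) removes several attributes jointly, which this loop does not cover.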


Chapter 6

Conclusions and recommendations

The previous chapters have described the results from the different phases of this Master's project. The initial conceptual study presented different approaches for constructing context-aware systems, as well as issues previously discovered in this type of endeavor. The description of the information system elaborated on the requirements, functions and implementation of the prototype, as well as the technical problems encountered during its development. The following chapter detailed the study, how it was conducted, and a number of observations arising from it. Finally, data representation and interpretation issues were presented, followed by the results from the analysis of the collected contextual data. This final chapter concludes the report by discussing the results from the previous chapters, in particular chapters 4 and 5, which were of most interest (see the problem description in section 1.2).

6.1 Proposals for the information system

The study described in chapter 4 exposed a number of issues related to the usage of the information system. Most of them were caused by shortcomings in the hardware, although some third-party software also introduced annoyances (e.g. the poor speech recognition in noisy environments). Nevertheless, what all the issues had in common was that they were related to usability, or the lack thereof. If the prototype is to be employed for more general use, i.e., under less controlled conditions, these issues need to be resolved. The same holds if the prototype is to be used for collecting context data for a behavior model that determines the interruptability. Currently, it is easier to set the interruptability yourself than to rely on the system using a behavior model to do it for you. In other words, if The Context Explorer is to be used more extensively than in this project, its usability needs to be addressed (the usability requirements raised in section 3.2.1 were sufficient for the software, but not for the hardware it used or ran on). To come to terms with this, at least the following improvements are needed:


1. Decrease the obtrusiveness of the system by reducing the number of required management tasks. These include plugging the computer into and unplugging it from the network and the power grid, and packing the computer and its miscellaneous components (e.g. USB hub and GPS receiver) so they can be carried around easily.
2. Increase the time of disconnected use by improving battery life. Also make it evident when the battery is about to run out, so the user has an opportunity to charge it. This problem is largely specific to the mobile PC used in this project; newer portable and mobile PCs can notify the user about the battery status, so a change of computer would partially solve this problem (the battery life issue might remain, though).
3. Improve intelligibility. Adding a small, portable screen would make it easier for the user to find out the status of the system and what it knows, without needing to plug it into a monitor. Furthermore, a GUI could be used as an annotation device during collection of training data, as a backup for vocal annotation, which had proved difficult in noisy environments.
4. Make the system more portable. This could include fastening the mobile PC to the body of the user, perhaps to the belt, instead of carrying it in a separate bag. The benefit would be that the user would not have to remember to bring the computer along.
5. Use a less obtrusive microphone (i.e. smaller, lighter and less noticeable) to avoid stigma.

It is easy to imagine hardware that would make the above possible: a minimal portable PC with WLAN (like the OQO model 01, figure 6.1) and a Bluetooth headset or a tie-clip microphone would be a good start.

6.2 Conclusions from the data analysis

The prediction results of the machine-learning algorithms presented in the previous chapter, section 5.5.1, showed a surprisingly good accuracy, ranging from 86.63% to 96.52% depending on which algorithm and data set was used. The fact that introducing body-metric data increased the prediction performance indicates the importance of physical activity (and probably associated temperature changes) for estimating interruptability. This was also explicitly apparent in the type of parameters in the decision tree model, which mostly consisted of body-metric attributes (section 5.5.2). Still, the small difference in accuracy between the two data sets shows that body-metrics are not essential to achieve good predictability. The non-essential nature of body-metrics was also evident from the performance of 1R, which, although at the lower end of the accuracy range, still managed, with an 86.63% prediction performance, to outperform Horvitz and Apacible's work by using only the hour of the day as predictor (as mentioned in section 5.5.2).

This surprising result warrants an explanation, and there is certainly not one cause but several. First, Horvitz and Apacible used a different measure of interruptability, a numerical cost of interruption. Second, their context data spanned a much shorter time period (only a couple of hours). Third, they used a much higher sample rate in order to find transitions in the attentional state of the user. Finally, they used different contextual parameters, with an emphasis on visual and acoustical analysis, and application and system events [14]. The comparison between the results presented in this report and those of Horvitz and Apacible would be more valid if these four differences were not present; unfortunately, no similar published work has been found to compare with. Nevertheless, 1R's performance still showed that, for this data, the time of day is a very good estimator of the interruptability of the user.

The most evident and probable explanation for the exceptional prediction performance presented in this report lies in the definitions of the three states of interruptability. Take for instance occupied, which is an all-encompassing state including activities such as sleeping, cooking and exercising. As a result, there is no difference, from the perspective of interruptability prediction, between these activities. Also, the activities are of a type that can be expected to occur at a similar time each day. In contrast, available is a very narrowly defined state; it basically includes occasions when the user was in front of the computer but not working, something one would also expect to occur at a similar time each day if one regularly works the same hours. Adding to the above, the fact that my days during the study were indeed very similar (comparing Saturdays with Saturdays, weekdays with one another, etc.) made it easier to predict my state from the time of day. Had the study instead been performed over a longer period of several weeks, it seems probable that the accuracies would go down noticeably.

Figure 6.1. OQO model 01 ultra personal computer [18]

6.3 Report summary

This report has shown that it is feasible to use an information system to collect context, body-metric, and activity data and to use it for predicting the interruptability of a user with very good accuracy. In particular, it has shown that time and user motion can be very good predictors when the user consistently engages in similar activities (working, sleeping). However, the performance will certainly vary greatly depending on what classification of interruptability is used and how each state is defined. Even though these results are encouraging, other issues need even more attention, most obviously usability. The information system presented in this report was not user-friendly enough to provide an enjoyable user experience, which is needed if it is to be used on a consistent basis to provide (some) context-awareness. The causes were mainly related to intelligibility (it was hard to get information about the status of the system) and management (the set of tasks required by the system). The proposed hardware for increasing usability includes a smaller, more portable mobile PC with a battery-life indicator and a screen, and a small, non-obtrusive microphone.


References

[1] Ackerman, M., Darrell, T. & Weitzner, D. J. (2001) Privacy in context. Human-Computer Interaction, 16, pp. 167-179.
[2] Agre, P. E. (2001) Changing places: Contexts of awareness in computing. Human-Computer Interaction, 16.
[3] Bellotti, V. & Edwards, K. (2001) Intelligibility and Accountability: Human considerations in context aware systems. Human-Computer Interaction, 16.
[4] BodyMedia. SenseWear® PRO2 Armband, http://www.bodymedia.com/research/sensewear.jsp, accessed December 4, 2004 (permission to use picture granted by Maria Fattore-Gill at BodyMedia).
[5] Dey, A. K., Abowd, G. D. & Salber, D. (2001) A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Human-Computer Interaction, 16.
[6] Dey, A. K. & Newberger, A. The Context Toolkit: A toolkit for context-aware applications, http://contexttoolkit.sourceforge.net/, accessed December 5, 2004.
[7] Dourish, P. (2004) What we talk about when we talk about context. Personal and Ubiquitous Computing, 8, pp. 19-30.
[8] Ericsson. Developing positioning applications, http://www.ericsson.com/mobilityworld/sub/open/technologies/mobile_positioning/about/mps_gettingstarted_develope_application, accessed December 4, 2004.
[9] Greenberg, S. (2001) Context as a dynamic construct. Human-Computer Interaction, 16.
[10] Grudin, J. (2001) Desituating action: Digital representation of context. Human-Computer Interaction, 16, pp. (2-3).
[11] Holte, R. C. (1993) Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11, pp. 63-91.
[12] Hong, J. I. & Landay, J. A. (2001) An infrastructure approach to context-aware computing. Human-Computer Interaction, 16.
[13] Horvitz, E. & Apacible, J. (2003) Learning and Reasoning about Interruption. Proceedings of the International Conference on Multimodal Interfaces (ICMI 2003).
[14] Horvitz, E., Koch, P., Kadie, C. M. & Jacobs, A. (2002) Coordinate: Probabilistic Forecasting of Presence and Availability. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI 2002).
[15] Lucas, P. (2001) Mobile devices and mobile data: Issues of identity and reference. Human-Computer Interaction, 16.
[16] Microsoft. Microsoft Agent download page for end-users, http://www.microsoft.com/MSAGENT/downloads/user.asp, accessed November 14, 2004.
[17] Murphy, K. (2001) An introduction to graphical models, http://www.cs.ubc.ca/~murphyk/Papers/intro_gm.pdf, accessed January 12, 2005.
[18] Cheng, C. (2004) OQO model 01 review, http://www.pcmag.com/article2/0,1759,1676362,00.asp, accessed January 20, 2005.
[19] Rennie, J. (2004) Derivation of the F-Measure, http://people.csail.mit.edu/u/j/jrennie/public_html/writing/fmeasure.pdf, accessed January 15, 2005.
[20] Segall, Z., Bylund, M., Frank, N. & Frank, C. (2004) The Aware Messenger: A Case Study In Human Aware Computing. Mobility Conference 2004, 2-4 August, Singapore.
[21] Siewiorek, D., Smailagic, A., Furukawa, J., Krause, A., Moraveji, N., Reiger, K., Shaffer, J. & Lung Wong, F. (2003) SenSay: A Context-Aware Mobile Phone. Proceedings of the Seventh IEEE International Symposium on Wearable Computers (ISWC'03).
[22] Tang, J., Yankelovich, N., Begole, J., Van Kleek, M., Li, F. & Bhalodia, J. (2001) ConNexus to Awarenex: Extending awareness to mobile users. Proceedings of CHI 2001, ACM Press, pp. 221-228.
[23] Weiser, M. (1991) The Computer for the 21st Century. Scientific American, 265(3), pp. 66-75.
[24] Weiser, M. (1993) Some computer science issues in ubiquitous computing. Communications of the ACM, 36(7), July 1993, pp. 65-84.
[25] Winograd, T. (2001) Architectures for context. Human-Computer Interaction, 16.
[26] Witten, I. H. & Frank, E. (2000) Data Mining: Practical machine learning tools with Java implementations. Morgan Kaufmann Publishers, San Francisco, pp. 72-75, 78-82, 159-170.
[27] The University of Waikato. Weka 3: Data Mining Software in Java, http://www.cs.waikato.ac.nz/~ml/weka/, accessed November 14, 2004.


Appendix A

Detailed prediction results

The following appendix details the results presented in chapter 5 with some per-class measurements including the FP rate, precision, recall and the F-measure.

A.1 Accuracy measures

There are a number of measures widely used in the machine-learning community in addition to accuracy. These measures show other aspects of the prediction performance of the algorithms and are computed for each class separately. All of them are based on two or more of the following counts: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The true positives for a class (e.g. available) are the instances in the data that were correctly classified as belonging to the class. The false positives, on the other hand, are the instances that were incorrectly classified as belonging to the class. In the same way, the false negatives are the instances that were incorrectly classified as not belonging to the class, while the true negatives are the instances that were correctly classified as not belonging to the class.
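The four counts are defined per class by treating that class as positive and all other classes as negative (one-vs-rest). The following minimal sketch derives them, together with overall accuracy, from a multi-class confusion matrix; the function names and the matrix values are hypothetical and not taken from the study.

```python
def per_class_counts(confusion, classes):
    """One-vs-rest TP, FP, FN and TN for every class.

    confusion[i][j] is the number of instances whose actual class is
    classes[i] and whose predicted class is classes[j].
    """
    total = sum(sum(row) for row in confusion)
    counts = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # actually k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp   # predicted k, actually otherwise
        tn = total - tp - fn - fp                    # neither actual nor predicted k
        counts[name] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    return counts


def accuracy(confusion):
    """Overall accuracy: correctly classified instances (the diagonal) over all instances."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total


classes = ["available", "occupied", "discreet"]
# Hypothetical confusion matrix (rows = actual class, columns = predicted class).
confusion = [
    [8, 1, 1],
    [0, 20, 2],
    [1, 3, 14],
]
for name, c in per_class_counts(confusion, classes).items():
    print(name, c)
print("accuracy", accuracy(confusion))
```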

FP rate
The FP rate is the proportion of negatives that were incorrectly classified as belonging to the class, i.e., FP relative to all the negatives (FP and TN):

$$\text{FP rate} = \frac{FP}{FP + TN}$$

Recall / TP rate
The recall measure (also called TP rate) is the proportion of positives that were correctly classified, i.e., TP relative to all the positives (TP and FN). Recall is helpful when one wants to consider the cost associated with falsely classifying instances as not belonging to the class.

$$\text{Recall} = \frac{TP}{TP + FN}$$

Precision
The precision measure is the proportion of instances classified as belonging to the class that actually belong to it, i.e., TP relative to TP and FP. It thus describes how well the classifier distinguishes instances that belong to the class from instances that do not, and is helpful when one wants to consider the cost associated with falsely classifying instances as belonging to the class.

$$\text{Precision} = \frac{TP}{TP + FP}$$
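The three ratios defined so far follow directly from the per-class counts. A minimal sketch; the function names are illustrative, returning 0.0 when a denominator is zero is a convention assumed here, and the example counts are hypothetical.

```python
def _ratio(num, den):
    # Convention assumed here: report 0.0 when the denominator is zero.
    return num / den if den else 0.0


def fp_rate(c):
    """FP / (FP + TN): share of negatives wrongly classified as the class."""
    return _ratio(c["FP"], c["FP"] + c["TN"])


def recall(c):
    """TP / (TP + FN): share of the class's instances that were found (TP rate)."""
    return _ratio(c["TP"], c["TP"] + c["FN"])


def precision(c):
    """TP / (TP + FP): share of instances labelled as the class that truly belong to it."""
    return _ratio(c["TP"], c["TP"] + c["FP"])


# Hypothetical per-class counts.
counts = {"TP": 8, "FP": 1, "FN": 2, "TN": 39}
print(fp_rate(counts), recall(counts), precision(counts))
```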

F-measure
The F1-measure is an attempt to characterize performance by combining the precision and recall measures. It is a special case of the more general Fα-measure, which weighs the recall, r, and the precision, p, differently (α > 1 places more weight on recall) [19]:

$$F_\alpha = \frac{(\alpha + 1)\,p\,r}{\alpha p + r}$$

For the analysis in this report, however, the F1-measure (α = 1) has been used:

$$F_1 = \frac{2pr}{p + r} = \frac{2TP}{2TP + FP + FN}$$

A.2 Without body-metrics

Class      FP rate  Precision  Recall  F-measure
available  0.008    0.882      0.799   0.839
occupied   0.234    0.892      0.914   0.903
discreet   0.067    0.789      0.757   0.772

Table A.1. Accuracies for 1R without body-metrics.
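As a consistency check on the F-measure definition, the F1 values in Table A.1 can be recomputed from the reported precision and recall. A minimal sketch with illustrative function names; because precision and recall are themselves rounded to three decimals, the recomputed values agree with the reported ones only to within about 0.001. The final lines use hypothetical counts to show that the two forms of F1 coincide.

```python
def f_alpha(p, r, alpha=1.0):
    """General F-measure (alpha + 1) p r / (alpha p + r); alpha = 1 gives F1 = 2pr / (p + r)."""
    denom = alpha * p + r
    return (alpha + 1) * p * r / denom if denom else 0.0


def f1_from_counts(tp, fp, fn):
    """Equivalent count form of F1: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


# Precision, recall and reported F-measure per class, copied from Table A.1.
table_a1 = {
    "available": (0.882, 0.799, 0.839),
    "occupied": (0.892, 0.914, 0.903),
    "discreet": (0.789, 0.757, 0.772),
}
for cls, (p, r, reported) in table_a1.items():
    print(f"{cls}: recomputed {f_alpha(p, r):.4f}, reported {reported}")

# The two forms of F1 agree on unrounded inputs (hypothetical counts).
tp, fp, fn = 8, 1, 2
p, r = tp / (tp + fp), tp / (tp + fn)
print(f_alpha(p, r), f1_from_counts(tp, fp, fn))
```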


Class      FP rate  Precision  Recall  F-measure
available  0.004    0.942      0.926   0.934
occupied   0.078    0.961      0.915   0.937
discreet   0.073    0.806      0.915   0.857

Table A.2. Accuracies for C4.5 without body-metrics.

Class      FP rate  Precision  Recall  F-measure
available  0.029    0.712      0.922   0.804
occupied   0.119    0.941      0.908   0.924
discreet   0.059    0.825      0.833   0.829

Table A.3. Accuracies for KNN (K=3) without body-metrics.

Class      FP rate  Precision  Recall  F-measure
available  0.030    0.709      0.939   0.808
occupied   0.087    0.955      0.884   0.918
discreet   0.075    0.797      0.883   0.838

Table A.4. Accuracies for the Bayesian network without body-metrics.

A.3 Using body-metrics

Class      FP rate  Precision  Recall  F-measure
available  0.008    0.882      0.799   0.839
occupied   0.234    0.892      0.914   0.903
discreet   0.067    0.789      0.757   0.772

Table A.5. Accuracies for 1R with body-metrics.

Class      FP rate  Precision  Recall  F-measure
available  0.004    0.951      0.959   0.955
occupied   0.048    0.977      0.974   0.976
discreet   0.021    0.937      0.943   0.940

Table A.6. Accuracies for C4.5 with body-metrics.


Class      FP rate  Precision  Recall  F-measure
available  0.005    0.934      0.980   0.956
occupied   0.052    0.975      0.959   0.967
discreet   0.033    0.903      0.930   0.916

Table A.7. Accuracies for KNN (K=3) with body-metrics.

Class      FP rate  Precision  Recall  F-measure
available  0.023    0.765      0.975   0.858
occupied   0.016    0.991      0.845   0.912
discreet   0.113    0.742      0.981   0.845

Table A.8. Accuracies for the Bayesian network with body-metrics.
