Interactive Multimodal User Interfaces for Mobile Devices

Proceedings of the 37th Hawaii International Conference on System Sciences - 2004

Interactive Multimodal User Interfaces for Mobile Devices

Wolfgang Mueller, Robbie Schaefer, Steffen Bleul
Paderborn University / C-LAB
Fuerstenallee 11, Paderborn, Germany
[email protected], [email protected], [email protected]

Abstract

Portable devices come with different limitations in user interaction, like limited display size, small keyboards, and different sorts of input and output capabilities. With the advance of speech recognition and speech synthesis technologies, their complementary use becomes attractive for mobile devices in order to implement truly multimodal user interaction. However, current systems and formats do not sufficiently integrate advanced multimodal interactions. We introduce a generic Multimodal Interaction and Rendering System (MIRS) dedicated to mobile devices. MIRS incorporates efficient processing of XML specification languages for limited, mobile devices and comes with the XML-based Dialog and Interface Specification Language (DISL). DISL can be considered a UIML subset that is enhanced by means of state-oriented dialog specifications. The dialog specification is based on ODSN (Object-Oriented Dialog Specification Notation), which was introduced to define user interface control by means of interaction states with transition rules.

1. Introduction

With the wide availability of considerably powerful mobile computing devices, the design of portable interactive User Interfaces (UIs) faces new challenges, as each device may have different capabilities and modalities for UI rendering. The growing variety of mobile devices used to access information on the Internet has induced the introduction of special-purpose content presentation languages, like WML [19] and Compact HTML [12]. However, their application on limited devices is cumbersome and most often requires advanced skills. Therefore, we expect that advanced speech recognition and synthesis will soon complement current technologies for user-, hardware-, and situation-dependent multimodal interaction in the context of embedded and mobile devices. First applications are being developed in the area of Ambient Intelligence (AmI) [1], which combines the areas of multimodal user interfaces and ubiquitous/pervasive computing [20].

For generic multimodal user interface description languages, there are currently only very few activities. In the area of graphical user interface description languages, the User Interface Markup Language (UIML) [7] has been established and is currently available as UIML 3.0. UIML mainly targets the description of static user interfaces (structures) and their properties (styles), which are not completely independent of the target platform. The behavioral part of UIML is not well developed and does not provide sufficient means to specify truly interactive, state-oriented user interfaces. VoiceXML [8] is widely recognized as a standard for the specification of speech-based dialogs. In addition to both, InkXML [17] has been defined to support interaction with handwriting interfaces. However, UIML, VoiceXML, and InkXML only cover their individual domains and do not integrate with other modalities. Beyond those, there are other XML-based multimedia languages for general interactive multimedia presentation, such as MHEG, HyTime, ZyX, and SMIL [4]. They enable simple authoring of rich multimedia presentations including layout, timing of streaming audio, video, images, text, etc., as well as some very basic interactions in order to select a specific path through an interactive presentation. Considering all XML-based languages, only UIML and VoiceXML provide partial, and SMIL limited, support for the description of user interaction. Nevertheless, they are still too limited for the specification of more complex state-based dialogs as they frequently appear in the interaction with mobile devices and in remote control via those devices.

Though there are currently no activities towards a combined XML-based multimodal dialog and interface specification language, the W3C has established activities for an architecture for general multimodal interaction [10]. The Multimodal Interaction (MMI) Framework defines an architecture for combined audio, speech, handwriting, and keyboard interaction as a set of properties (e.g., presentation parameters or input constraints), a set of methods (e.g., begin playback or recognition), and a set of events raised by the component (e.g., mouse clicks, speech events). The MMI framework covers

• multiple input modes such as audio, speech, handwriting, and keyboarding;
• multiple output modes such as speech, text, graphics, audio files, and animation.

MMI considers human user interaction with a so-called interaction manager. The human user enters input into the system and observes and hears information presented by the system. The interaction manager is the logical component that coordinates data and manages execution flow from various input and output modalities. It maintains the interaction state and context of the application by responding to inputs from component interface objects and to changes in the system and environment.

This paper introduces an instance of an MMI framework. We present the architecture of our Multimodal Interaction and Rendering System (MIRS). In the context of MIRS, we introduce the XML-based Dialog and Interface Specification Language (DISL). DISL is based on a UIML subset, which is extended by rule-based descriptions of state-oriented dialogs for the specification of advanced multimodal interaction and the corresponding interfaces. DISL defines the state of the UI as a graph, where operations on UI elements perform state transitions. DISL's dialog part is based on DSN (Dialog Specification Notation), which was introduced to describe user interface control models. Additionally, DISL provides means for a generic description of interactive user dialogs, so that each dialog can easily be tailored to individual input/output device properties, e.g., graphical display or voice. In combination with DISL, we additionally introduce S-DISL (Sequential DISL), a sequentialized representation of DISL dedicated to the limited processing capabilities of mobile devices.

The remainder of this paper is structured as follows. The next two sections introduce approaches and means for dialog and user interface description languages. MIRS and DISL are introduced in Section 4. Section 5 gives the MIRS/DISL example of the remote control of a Winamp MP3 player via a Siemens S55 mobile phone, before the paper closes in Section 6 with a conclusion and outlook.

2. User Dialog Specification

There exist several classical approaches that focus on the description of user dialog interactions for graphical user interfaces, like Dialogue-Nets [11], Petri-Nets [2], UAN (User Action Notation) [6], and ODSN (Object-Oriented Dialog Specification Notation) [16]. They all refer to the same basic concept of parallel Finite State Machines and mainly differ in their description means and hierarchical decomposition into components. Therein, user dialogs are defined by means of states and state transitions, which are triggered by events from user interface elements.

In our approach, we apply ODSN concepts. ODSN has been developed to model complex state spaces for advanced human-computer interaction and builds on DSN [5], which is rooted in propositional production systems [14]. ODSN models the user interaction as different objects, which communicate by exchanging events. Each object is described by hierarchical states, user events, and transition rules. Each rule has a condition and a body, where the condition may range over sets of states and sets of user events. The body is executed when the specified events occur and the object is in the specified state. When the condition becomes true, the execution of the body may perform a state transition. As an ODSN example, consider the following rule:

  USER INPUT EVENTS
    switches (iAdapt, iReset)

  SYSTEM STATES
    brightness (#dark #normal #bright)
    color      (#black #white)

  RULES
    #dark #black iAdapt --> #white

The example defines two events and two hierarchical states. The rule fires when the event iAdapt occurs, brightness equals #dark, and color is #black. After firing, the rule sets color to #white.

Though they were originally developed for graphical user interfaces, all of the above approaches apply to the specification of voice-based dialogs without further modification. However, current activities in the domain of voice browsers do not focus on the explicit specification of the dialog but rather integrate the dialog specification implicitly. The most prominent approaches are VoiceXML [8] and SALT [9]. VoiceXML is defined by the W3C as an XML-based language for writing interactive applications for 'voice browsers'. While listening to spoken prompts and jingles, controls are given by means of spoken input and keypad strokes. VoiceXML defines applications as a set of named dialog states and covers interactions based on spoken prompts (synthetic speech), output of audio files and streams, recognition of spoken words and phrases, recognition of touch-tone key presses, recording of spoken input, control of dialog flow, and telephony control (call transfer and hangup). VoiceXML definitions are composed into forms, which represent states of interaction, where goto defines a state transition. The following sketches a short form definition (VoiceXML 2.0 syntax; form and grammar names are illustrative):

  <form id="choice">
    <field name="color">
      <prompt>Which do you like better, black or white?</prompt>
      <!-- grammar reference illustrative -->
      <grammar src="color.grxml" type="application/srgs+xml"/>
      <filled>
        <if cond="color == 'black'">
          <goto next="#blackForm"/>
        <else/>
          <goto next="#whiteForm"/>
        </if>
      </filled>
      <nomatch>
        I'm sorry, I didn't understand.
        <reprompt/>
      </nomatch>
    </field>
  </form>

In the first part, a question is raised. Depending on the answer, two different state transitions are performed. In the case of an error, the state is kept and the prompt is repeated.

3. User Interface Description

Portable user interface description mostly refers to the application of HTML and related languages. For advanced user interaction, JavaScript and Java applets are often applied. However, such definitions sometimes lack real portability and are not suited for mobile devices. Even complete HTML 4.0 (without JavaScript) is not supported by any of those devices. PDAs and mobile phones are, for instance, limited in their screen resolution, while most HTML descriptions are designed for large displays and contain elements like framesets and tables, which require a PC monitor for display. Therefore, alternative markup languages were developed to enable the use of Internet services on limited mobile devices: C-HTML (Compact HTML), WML, and MML (Mobile Markup Language).

C-HTML is a subset of HTML 4.0 that is dedicated to the limitations of mobile devices. Elements that do not meet the mobile device requirements, like small display, low memory, and slow CPU, are excluded. Thus, C-HTML excludes image maps, tables, and framesets. Since C-HTML is a subset of HTML, each C-HTML page can be viewed with an HTML browser, but a C-HTML browser cannot render all HTML elements. Therefore, the art of transcoding HTML to C-HTML is not just about replacing or deleting unsupported elements but rather about restructuring the document so that no information gets lost.

In addition, the WAP Forum (Wireless Application Protocol) has introduced WML for micro browsers on mobile devices. WML is based on a combination of an HTML 4.0 subset and HDML (Handheld Device Markup Language) [13] plus some extensions. In addition to HTML structures, a WML specification is composed of a set of cards, which correspond to states in the user interaction dialog. Specific hyperlinks provide means to navigate between the cards of one WML specification. Compared with C-HTML, WML differs in several details, like in the support of bold character sets and subheadings (h1, h2, ...).

In addition to C-HTML and WML, the Japanese provider J-Phone has defined the MML (Mobile Markup Language) family of languages for their J-Sky service: S-MML (Small MML), M-MML (Medium MML), and F-MML (Full MML). S-MML is compliant to an HTML 4.0 subset defined for small displays with 4 lines × 12 characters. M-MML is defined for mobile phones with display sizes of 160×100 pixels. F-MML stands for full compatibility with HTML 4.0.

Since HTML and its variants provide very poor capabilities for the definition of graphical user interfaces, Harmonia has defined UIML (User Interface Markup Language). A UIML description consists of an interface part, describing the structure and behavior of the interface, and a peers section, defining how the used elements are to be translated to target-format-specific elements. The peers section also describes the API of the backend application, so that API methods can be invoked from within the code. An interface consists of parts, which may be composed into subparts. Parts have an associated class and properties. Behavior can be added by means of rules, which are invoked by events. Actions in a rule include the setting of property values, the throwing of events, or the invocation of methods. Through that, the interface reflects a simplified model of a general GUI. An example of the description of a UIML widget is shown by the following UIML code (a sketch; part and property names are illustrative):

  <interface>
    <structure>
      <part id="ComposerLabel" class="Label"/>
    </structure>
    <style>
      <property part-name="ComposerLabel" name="content">composer</property>
    </style>
  </interface>

Though rules support the description of basic interactions, advanced action specification and the notion of state-oriented description are missing to efficiently specify advanced user dialogs in a comprehensive way. UIML allows switches to select the parts and properties that are required for platform-specific rendering. Thus, the user may define, e.g., an HTML section and a VoiceXML section for generating HTML and VoiceXML output. For each section, a dedicated vocabulary can be used. The peers section includes a description of the API calls of the backend in the logic section and directions for the conversion of elements to several target formats. API calls in the logic section contain the method name and the parameters. Optionally, a script can be added to implement local API calls.
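For illustration, a logic section may bind a backend method roughly as follows; this is a sketch along the UIML vocabulary, and the component name, location, and method signature are assumptions for illustration:

  <peers>
    <logic>
      <!-- binding to a hypothetical backend component -->
      <d-component id="Player" location="http://server/player">
        <d-method id="setVolume" return-type="void">
          <d-param id="level" type="int"/>
        </d-method>
      </d-component>
    </logic>
  </peers>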

4. Multimodal Dialogs for Mobile Devices

To introduce an XML-based framework for multimodal interaction, we have defined a generic Multimodal Interaction and Rendering System (MIRS), which is based on the specification, exchange, and rendering of user interfaces and dialogs specified by means of our Dialog and Interface Specification Language (DISL). We first present the MIRS architecture before introducing DISL and its role in the context of MIRS.

4.1. Multimodal Interaction and Rendering System (MIRS)

MIRS is based on a client-server architecture as given in Fig. 1. An application server provides a user interface and dialog description given as a DISL description. In order to support efficient computation of the DISL description, DISL is first transformed to S-DISL, a sequential intermediate format of DISL. The transformation is performed by an XSLT description executed by an XSLT processor. After the transformation, the S-DISL file is transmitted to the mobile client and interpreted. The interpreter separates the description of the dialog from the abstract widgets, where the latter are abstract representations of user interface objects, which may be represented as text, graphics, or voice. Depending on the individual user and hardware profile, a renderer is spawned for each abstract widget or set of abstract widgets, respectively.

[Figure 1. MIRS system architecture: on the server, the DISL description is transformed to S-DISL by an XSLT transform; on the mobile device, the S-DISL description is processed by the interpreter, which drives the renderer according to the hardware and user profile.]

Once spawned, the renderer continuously interacts with the interpreter, sending user interface events and receiving user interface descriptions for rendering, i.e., for generating user interface information. The communication between both is controlled by a finite state machine, which defines the user dialog as part of the DISL description. So, on receipt of an event, a state transition is performed, an action is executed, and user interface information is sent to the renderer. The user interface information and the specification of the dialog by means of rule-based descriptions of finite state machines are specified by DISL and S-DISL, respectively, which are both introduced hereafter.
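For illustration, such an exchange could look as follows; the concrete message format shown here is an assumption for illustration only and is not prescribed by DISL:

  <!-- renderer to interpreter: user event (format illustrative) -->
  <event generic-widget="IncreaseVol" name="selected"/>

  <!-- interpreter to renderer: updated UI information (format illustrative) -->
  <property generic-widget="Apply" name="visible">yes</property>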

4.2. DISL

With the development of DISL, we provide a language that describes the user interface representation as well as the user interaction with a combined dialog and control model. Though the description of the dialog model is oriented towards UIML [7], DISL introduces a generic approach for general, platform-independent multimodal UIs. Since UIML definitions are too platform-related, we tried to identify the smallest set of UIML's UI elements for general applications as the smallest common denominator over several platforms. The smallest denominator is needed to ensure that a UI can be rendered on all limited devices and be used with different modalities. However, this does not imply a limitation, as the renderer can obtain additional information from the system's environment or draw conclusions from existing settings (see also Subsection 4.4 for more information). For example, if a UI description mandates a graphical element like a button, it automatically excludes voice applications. If a generic trigger replaces the button, a voice UI renderer can also interpret that description. Therefore, reducing the set of widgets is not a real constraint but rather enables a broader field of applications.

DISL divides the selected UI elements into two groups. The first one conveys information to the user, whereas the second one supports interaction with the user or with the system. For the first group, we have identified four informative elements:

• variable field, which shows the status of a variable
• text field for showing textual information; note that it can also be spoken
• generic field, which allows arbitrary extensions
• widget list, which is used as a container for several widgets

All informative elements can be set invisible when they are temporarily not required or are only of interest for internal computations. The group for user interactions provides (i) automatically issued user confirmations/notifications and (ii) widgets for entering information:

• built-in commands
• generic commands as an extension mechanism
• user confirmations
• variable boxes, to assign values to variables
• text boxes, to enter textual input
• generic boxes, for arbitrary input
• choice groups, for the selection of a set element

Even if the aforementioned parts are oriented towards UIML and the syntax of the dialog model resembles UIML syntax, DISL is basically a different language and cannot be processed by a UIML processor. In DISL, we use the UIML syntax just for the definition of widgets and the vocabulary of the informative and interactive elements. However, the corresponding behavioral part requires the definition of a real UIML syntax extension.

The DISL control model inherits basic concepts from ODSN [16]: it not only describes a finite state machine, where state transitions are fired by user events or internal events, but, by using the cross product of states, a reduction of the number of transitions can also be achieved. An individual DISL dialog description requires a three-step process.

1. First, variables have to be defined to capture the states of the different UI parts. For example, a Boolean variable "power" is used to determine whether an application is powered on or off.

2. In a second step, rules describe the user dialog. A rule consists of a condition, which returns either true or false, and an action part. Details are provided later in this section.

3. Based on the rules, transitions may fire. If a transition fires, actions may be performed, which can execute communication with the backend application, like providing new values or sending commands, and define new states by assigning new values to variables.

DISL is an XML application, which consists of an optional head element with meta-data followed by a set of templates or interfaces. As an example, the global structure may look like the following (sketched) definition:

  <!-- sketch; head, template, and interface content elided -->
  <DISL>
    <head> ... </head>
    <template> ... </template>
    <interface id="MainDialog" state="start">
      ...
    </interface>
  </DISL>

An interface is identified with a name and a specific state, which is by default set to "other". By means of the "state" attribute, one can define whether the interface is executed at the start of the application or at the end, or whether it is the default parent of a subinterface. The interface itself is composed of the three parts "structure", "style", and "behavior", where "structure" and "style" define the dialog model and "behavior" the control model.

The "structure" defines the widgets that are available for the interface, composed as a list of widgets, whereby each widget can contain other widgets of the following widget types: variablefield, textfield, genericfield, command, confirmation, genericcommand, choicegroup, widgetlist, variablebox, textbox. Those types are essential for basic user interfaces; however, through genericfield and genericcommand, DISL is open to any extension. The attribute "where" may specify the relative position of each widget (before, after, first, last). By default, widgets are listed in sequence.

The "style" part defines the widget representation. A "style" element consists of several parts, each referencing one widget through the required "generic-widget" attribute. A "part" element describes the widget properties, which can be freely set. In order to be upward compatible with UIML, the DISL "structure" and "style" elements are subsets of the corresponding UIML elements. Additionally, DISL inherits the concept of templates to reuse parts of the UI.

The DISL "behavior" extends original UIML. As UIML has no real control model implemented, communication with the backend application has to be implemented with the peers section and mapped to scripts or other UI languages. Those mappings, though quite straightforward, are rather inflexible and not really applicable for the description and transformation of generic UIs, as examined in [15]. Therefore, we introduce an event mechanism, which can trigger the application and change the state of the UI dialog, by introducing states, events, and state transition rules.

States are given by variables. Variables need to have a unique name and contain parsed character data, which determines the value of the variable. The allowable values depend on the type of the variable, which can be set in the "type" attribute. Supported basic types are integer, string, boolean, and pointers to widgets. Additional attributes may specify the variable's association with an interface or a widget, and whether the variable is just used as a constant. Variables are automatically evaluated when they are referenced by their name.

DISL rules evaluate a condition and return a Boolean value. Rules are connected through Boolean operations, which allows DSN-oriented cross-product processing and thereby a powerful modeling of user interface behavior. There are different types of conditions, which can be combined and evaluated through Boolean operations and comparison operators. The first one checks for equality of two values, which can be properties, variables, or structs. The "call" condition checks whether an external call was applied successfully, and for evaluating events, the "event" condition can be used.

Executed actions are defined in the "transition" part within "if-true" statements, where all rules, combined by Boolean operators, are evaluated. When the "if-true" statement evaluates to "true", the corresponding "action" part is executed. An action part may consist of several statements performing actions and changing the state of the UI. The following sketch illustrates a DISL rule specifying the volume control of a music device (element and attribute names follow the constructs described above; the values 128 and 20 are the initial volume and the increase step):

  <variable id="Volume" type="integer">128</variable>
  <variable id="IncreaseVolValue" type="integer" constant="yes">20</variable>

  <rule id="IncreaseVol">
    <condition>
      <!-- true when the widget "IncreaseVol" is selected -->
      <event generic-widget="IncreaseVol" name="selected"/>
    </condition>
  </rule>

  <transition>
    <if-true rule="IncreaseVol">
      <action>
        <!-- add IncreaseVolValue to the previously set volume -->
        <statement operation="add">
          <variable id="Volume"/>
          <variable id="IncreaseVolValue"/>
        </statement>
        <!-- update the UI: show the "yes" and "cancel" controls -->
        <statement>
          <property generic-widget="Yes" name="visible">yes</property>
        </statement>
        <statement>
          <property generic-widget="Cancel" name="visible">yes</property>
        </statement>
        ...
      </action>
    </if-true>
  </transition>

First, variables for the current volume and a value for increasing the volume are assigned. The rule "IncreaseVol" implements the condition that evaluates to true if the widget "IncreaseVol" is selected. If true, a set of statements is processed in the action part. There, the "IncreaseVolValue" is added to the previously set volume, and further statements update the UI, e.g., setting a "yes" and a "cancel" control.

The DISL event mechanism introduces another new concept, which is derived from the concept of timed transitions in ODSN. Events support advanced reactive UIs on remote clients, since they provide the basis for, e.g., timers. DISL events have an action part for the definition of transitions. However, the action is not triggered by rules; rather, it depends on a timer, which is defined by an attribute. An event may fire only once or periodically each time the timer expires. For advanced programming, events can be activated and deactivated. The following sketch shows how the event mechanism is applied to periodically check the current song of a remote music player; it additionally outlines how an external call, e.g., to a servlet, is applied (the timer value and the URL are illustrative):

  <event id="CheckSong" timer="5000" periodic="yes">
    <action>
      <!-- query the remote player for the current play position -->
      <call id="getplaypos" source="http://server/player/getplaypos"
            timeout="3000"/>
      ...
    </action>
  </event>

A call has a source. This is typically an HTTP request, but other protocols can be applied as well through URLs. The call establishes the communication with the (remote) application. The call id is used as a pointer to the return value of the application, which can also be an exception in case of an error. The timeout parameter catches unexpected errors, e.g., when an application is not responding due to a network failure. The timer-based event mechanism supports client-based synchronization with the backend application, since querying external resources can modify internal UI states.

4.3. S-DISL

Since DISL is designed for mobile devices with limited resources, we developed a serialized form of DISL, namely S-DISL, that allows faster processing and a smaller memory footprint. The idea behind S-DISL is that an S-DISL interpreter just has to process a list of elements rather than complex tree structures. On the one hand, this saves processing time; on the other hand, it gives a smaller footprint for the interpreter, both of which save resources required for UI rendering. To achieve the serialized form, a preprocessor implements a multi-pass XSLT transformation of the DISL file to S-DISL. The first two passes are used to flatten the tree structure. To avoid information loss, new attributes providing links, like "nextoperation", "nextrule", etc., have to be introduced. Through that, the 42 elements of the SDML DTD can be reduced to 10 basic elements. For example, all action elements are reduced to one with a mode attribute defining the type. The next transformation step sorts the ten element types into ten lists. Ids are replaced by references, and empty attributes are deleted in order to get a lean serialized document. The final output is a stream of serialized elements. Although the stream is bigger than the original tree structure, the saved processing time outweighs the disadvantage. The size of the stream can, however, additionally be reduced by using binary XML [18].
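As an illustration of this flattening, consider a nested DISL action and a possible serialized counterpart; apart from the mode attribute and the "nextoperation" link mentioned above, the attribute names are assumptions for illustration:

  <!-- DISL (tree form) -->
  <action>
    <statement operation="add">
      <variable id="Volume"/>
      <variable id="IncreaseVolValue"/>
    </statement>
    <statement>
      <property generic-widget="Apply" name="visible">yes</property>
    </statement>
  </action>

  <!-- S-DISL (flat form, sketched): one element per entry, with a
       mode attribute and a nextoperation link instead of nesting -->
  <action id="a1" mode="add" target="Volume" value="IncreaseVolValue"
          nextoperation="a2"/>
  <action id="a2" mode="setproperty" target="Apply" name="visible"
          value="yes"/>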

4.4. Tweaking UIs

Generated UIs naturally do not have the same appeal as handcrafted UIs. Therefore, we introduce a concept that combines automated UI generation with annotations to improve the look and feel of UIs. This is denoted as "tweaking" and was introduced in [3]. Tweaking basically implements the inheritance of properties between widget classes. That means that the properties of a widget are automatically expanded by inherited properties. The properties of new widget classes are used to refine the look and feel of widgets and to add decorations. Figure 2 shows such a tree from a generic widget down to specialized widgets for mobile devices.

[Figure 2. Tree structure of multimodal UI: Generic Widget branches into Audio UI (VoiceXML) and Graphic UI; Graphic UI branches into Mobile Application (Java MIDP) and Desktop Application (Java Swing, Visual C++).]


Applying the original UIML concepts to DISL, we would have to introduce different vocabularies for different modalities. To overcome this, we introduce path-oriented properties, which reflect object-oriented class relationships, e.g., Graphic-UI.mobile-application.Java-MIDP and Graphic-UI.desktop-application.Java-Swing. Having several vocabularies that can easily be accessed through path elements allows a straightforward design of multimodal UIs, since it is no problem to switch between different vocabularies within transitions or events. In those cases, the renderer just has to check from which vocabulary the properties are derived and to drive/sense the appropriate input and output channels. This will be partly outlined in more detail by the following example.
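For example, a generic widget could carry modality-specific properties along such paths; the audio path and the property names after the path prefixes are assumptions for illustration:

  <property generic-widget="IncreaseVol"
            name="Graphic-UI.mobile-application.Java-MIDP.label">Vol +</property>
  <property generic-widget="IncreaseVol"
            name="Audio-UI.VoiceXML.prompt">increase volume</property>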

5. Example

To demonstrate our approach, we provide a working example from home automation, which is already completely implemented. The main idea of our application scenario is to control home entertainment equipment through mobile devices. In the implemented example, we remotely control an MP3 player running on a PC from a J2ME-MIDP-enabled mobile phone. [Footnote 1: To become attractive, we presume that cost-free Bluetooth-based communication will soon become available for mobile phones. Then, it can be used as a universal remote control within the home environment. However, the current J2ME-enabled phones only support expensive bundled GSM transmission (GPRS).] On a PC, a user is able to use a full-fledged graphical user interface as it comes, e.g., with Winamp (see Fig. 3). However, that UI cannot be rendered on a mobile phone with a tiny display. Therefore, we have applied the aforementioned MIRS infrastructure and DISL concepts to develop a generic user interface that enables remote control of the MP3 player. The generic UI can be implemented as a service, which can be downloaded and used by the mobile phone.

[Figure 3. GUI of the Windows-based MP3 player]

The generic UI, in DISL notation, mainly describes the control model together with rendering properties. The DISL description is transformed to the intermediate S-DISL format through several XSLT transformation steps and finally transmitted to the mobile phone, which runs the interpreter and renderer given as a Java MIDlet. The UI is supposed to support the following basic remote player functions: main on/off, play, stop, mute, pause, go to next title, return to previous title, and volume control. The collection of the corresponding controls is provided as a list of widget elements in the DISL description, which defines state transitions as well as their binding to the commands of the backend application, i.e., the Winamp player. As an example, the following fragment sketches the widget list for volume control (shown in DISL notation; widget ids other than "IncreaseVol" are illustrative, and the transmitted S-DISL is the serialized form of such a structure):

  <structure>
    <widget id="PlayerControls" type="widgetlist">
      <widget id="IncreaseVol" type="command"/>
      <widget id="DecreaseVol" type="command"/>
      <widget id="Mute" type="command"/>
      ...
    </widget>
  </structure>

The structural part of the interface description is followed by a style description for each supported widget. The style elements provide information for the renderer; for example, they define whether a widget is visible or not. The next fragment sketches the style component for one widget (property names are illustrative; the property values are those of the example):

  <style>
    <part generic-widget="IncreaseVol">
      <property name="title">Increase Volume</property>
      <property name="description">Increases Volume by 10</property>
      <property name="help">Every time this command is activated
        the volume will be increased by 10%</property>
      <property name="visible">no</property>
      <!-- two further Boolean properties, both set to "yes" -->
      ...
    </part>
  </style>

Note that, so far, the DISL structure and style specifications are comparable to UIML. The behavioral part, which is defined below, consists of rules and transitions as introduced in Subsection 4.2. As an example, we only give one transition, illustrating the action of the "increase volume" command. The transition fires after the "IncreaseVol" rule becomes true. Then, the value of the variable "IncreaseVolValue" is added to the variable "Volume". The given actions then switch the "Apply" and "Cancel" widgets to visible. [Footnote 2: "visible" is interpreted as "audible" for voice rendering.] In sketched form, following the markup introduced in Subsection 4.2, the transition looks as follows:

  <transition>
    <if-true rule="IncreaseVol">
      <action>
        <statement operation="add">
          <variable id="Volume"/>
          <variable id="IncreaseVolValue"/>
        </statement>
        <statement>
          <property generic-widget="Apply" name="visible">yes</property>
        </statement>
        <statement>
          <property generic-widget="Cancel" name="visible">yes</property>
        </statement>
        ...
      </action>
    </if-true>
  </transition>

In our example, commands to the backend application are provided as HTTP requests, which are handled by the Interaction Manager, which in turn forwards the commands to the application. The UI Interaction Manager can employ the functionality of a web server, since all MIDP 1.0-enabled phones and PDAs support HTTP. In our implementation, the communication part of the system is written as a set of servlets running on an Apache web server. The client software is currently running on a Siemens S55 mobile phone that comes with J2ME MIDP 1.0. Fig. 4 shows an interaction sequence on the mobile phone. [Footnote 3: The screenshots were taken from a PC emulator, as the photos from the real device were not of sufficient quality.] When the music player application is selected, the UI is requested from the web server and all internal structures are initialized before the UI can be rendered. This procedure has to be performed only once at the initial startup and may take some seconds. Afterwards, all operations with server communication are approximately at the same speed as with a WML browser.

[Figure 4. UI rendered on the mobile phone]

6. Conclusion and Future Work

This paper introduced the architecture of our Multimodal Interaction and Rendering System (MIRS) together with the XML-based Dialog and Interface Specification Language DISL. DISL is based on an extended UIML subset. The extensions are based on ODSN (Object-Oriented Dialog Specification Notation). Our current implementation has demonstrated the feasibility for mobile phones. Major parts of MIRS run on an Apache web server in combination with a J2ME MIDP 1.0-enabled Siemens S55 mobile phone. The implementation currently covers the complete definition of DISL, its transformation to S-DISL by an XSLT transformer, the complete S-DISL interpreter, as well as a graphical renderer.

In order to complete and test the current implementation, we still have to extend it with a voice-based renderer and voice recognition. However, currently available mobile phones as well as PDAs do not provide sufficient processing power, neither for software-based real-time voice synthesis nor for speech recognition. Therefore, we have established a PC-based testbed, which is also used for the evaluation of user- and hardware-profile-dependent rendering of multimedia information.

Our results in employing the first versions of MIRS and DISL are quite encouraging, as they enable easy modeling and deployment of the multimodal and scalable user interfaces required for a great variety of application domains, especially for mobile and ubiquitous computing. Therefore, we are trying, under the umbrella of upcoming international research projects, to combine them with efforts undertaken by other project partners in order to provide a sustainable impact and strive for standardization. However, there is still a long way to go towards our final goal of establishing a complete system for automatic generation and delivery of user- and device-dependent multimodal user interfaces.

Acknowledgements

We appreciate the cooperation and the fruitful discussions with Johan Plomp (VTT) as well as with our ITEA VHE Middleware partners and colleagues at C-LAB. We especially acknowledge the support of Prof. Gerd Szwillus (Paderborn University).

References

[1] E. Aarts. Ambient Intelligence in HomeLab, 2002. Royal Philips Electronics.
[2] B. d'Ausbourg, G. Durrieu, and P. Roche. Deriving a formal model from UIL description in order to verify and test its behavior. In Proc. of the 3rd Eurographics Workshop on Design, Specification, and Verification of Interactive Systems, 1999.
[3] L. D. Bergman, G. Banavar, D. Soroker, and J. Sussman. Combining handcrafting and automatic generation of user interfaces for pervasive devices. In CADUI 2002, 4th International Conference on Computer-Aided Design of User Interfaces, Valenciennes, France, 2002.
[4] S. Boll, W. Klas, and U. Westermann. A comparison of multimedia document models concerning advanced requirements. Technical report, Computer Science Department, University of Ulm, Germany, 1999.
[5] M. B. Curry and A. F. Monk. Dialogue modelling of graphical user interfaces with a production system. Behaviour and Information Technology, 14(1):41-55, 1995.
[6] H. R. Hartson et al. UAN: A user-oriented representation for direct manipulation interface designs. ACM Transactions on Information Systems, 8(3), July 1990.
[7] M. Abrams et al. UIML: An appliance-independent XML user interface language. Computer Networks 31, Elsevier Science, 1999.
[8] S. McGlashan et al. Voice Extensible Markup Language (VoiceXML) Version 2.0, 2003.
[9] SALT Forum. Speech Application Language Tags (SALT) 1.0 Specification, 2002.
[10] J. A. Larson, T. V. Raman, and D. Raggett (eds.). W3C Multimodal Interaction Framework, W3C Note, May 2003.
[11] C. Janssen. Dialogue nets for the description of dialogue flows in graphical interactive systems. In Proceedings of Software-Ergonomie '93, Teubner, Stuttgart, 1993.
[12] T. Kamada. Compact HTML for Small Information Appliances, W3C Note, February 1998.
[13] P. King and T. Hyland. Handheld Device Markup Language Specification, W3C Note, May 1997.
[14] D. Olsen. Propositional production systems for dialog description. In Proceedings of CHI'90, pp. 57-63, 1990.
[15] J. Plomp, R. Schaefer, W. Mueller, and H. Yli-Nikkola. Comparing transcoding tools for use with a generic user interface format. Extreme Markup Languages 2002, Montreal, Canada, August 4-9, 2002.
[16] G. Szwillus. Object-oriented dialogue specification with ODSN. In Proceedings of Software-Ergonomie '93, Teubner, Stuttgart, 1993.
[17] Z. Trabelsi, S.-H. Cha, D. Desai, and C. Tappert. A voice and ink XML multimodal architecture for mobile e-commerce systems. In Proceedings of the Second International Workshop on Mobile Commerce, Atlanta, Georgia, USA, 2002.
[18] WAP Forum. WAP Binary XML Content Format, April 1998.
[19] WAP Forum. Wireless Markup Language Specification Version 1.1, June 1999.
[20] M. Weiser. The computer for the 21st century. Scientific American, 265(3):94-104, 1991.
