Self-Organization for Multi-Component Multi-Media Environments


Michael Hellenschmidt, Thomas Kirste
Fraunhofer Institute for Computer Graphics, Darmstadt, Germany
{mhellens,tkirste}@igd.fhg.de

Abstract—The vision of Ambient Intelligence is based on the ubiquity of information technology: the presence of computation, communication, and sensorial capabilities in an unlimited abundance of everyday appliances and environments. While today's experimental smart environments are carefully designed by hand, future ambient intelligent infrastructures must be able to configure themselves from the available components in order to be effective in the real world. But enabling an ensemble of devices to spontaneously act and cooperate coherently requires software technologies that support self-organization. In this paper, we outline the SODAPOP middleware, which aims at addressing the challenges raised by such environments. In contrast to other approaches, SODAPOP uses a two-stage model for structuring multi-agent systems and provides unique facilities for coordinating the activities of competing agents through domain-specific conflict resolution strategies. The use of SODAPOP is illustrated by realizing a smart conference room whose capabilities may be extended ad hoc through dynamically added devices.

Index Terms—Ambient Intelligence, Multimedia Appliances, Middleware, Self-Organization

Fig. 1. Typical environments we would like to be smart: High-tech conference rooms

I. INTRODUCTION

The vision of Ambient Intelligence (AmI) [1], [2], [3] is based on the ubiquity of information technology: the presence of computation, communication, and sensorial capabilities in an unlimited abundance of everyday appliances and environments. A rather popular scenario illustrating this vision is the "smart conference room" (or the "smart living room", for consumer-oriented projects) that automatically adapts to the activities of its current occupants (cf. e.g., [4], [5], [6]). Such a room might, for instance, automatically switch the projector to the current lecturer's presentation as she approaches the speaker's desk¹ and turn down the room lights, turning them up again for the discussion. Of course, we expect the environment to automatically fetch the presentation from the lecturer's notebook. And the lecturer should be able to use her own wireless presentation controller to move through her slides; alternatively, she might use a controller device at the speaker's desk.

Such a scenario does not sound too difficult: it can readily be constructed from common hardware available today. Using pressure sensors and RFID tagging, it may be built even without resorting to expensive cameras and difficult image analysis to detect who is currently at the speaker's desk. Setting up the application software for this scenario, which drives the environment's devices in response to sensor signals, presents no major hurdle either. So it seems as if Ambient Intelligence is rather well understood, as far as information technology itself is concerned. Details like image and speech recognition as well as natural dialogues of course need further research, but building smart environments from distributed components seems technologically straightforward, once we understand what kind of proactivity users will expect and accept.

But this holds only as long as the device ensembles that make up the environment are anticipated by the developers. Today's smart environments in the various research labs are usually built from devices and components whose functionality is known to the developer. So, all possible interactions between devices can be considered in advance, and suitable adaptation strategies for coping with changing ensembles can be defined. When looking at the underlying software infrastructure, we see that the interaction between the different devices, the "intelligence", has been carefully handcrafted by the software engineers who built the scenario. This means: significant (i.e., unforeseen) changes to the ensemble require a manual modification of the smart environment's control application.

This work has been partially supported by the German Federal Ministry of Education and Research.
¹For the smart living room, such as [7], this reads: "switch the TV set to the user's favorite show as he takes a seat on the sofa."


This is obviously out of the question for real-world applications, where people continuously buy new devices for embellishing their homes. And it is a severe cost factor for institutional operators of professional media infrastructures such as conference rooms and smart offices. As an example of such changes, consider the smart conference room above: if one participant's notebook has a built-in camera, the room could additionally support video conferencing and gesture interaction. Or, even more challenging, imagine a typical ad hoc meeting, where some people come together in a perfectly average room. All attendees bring along notebook computers, at least one contributes a projector, and the room itself provides some light controls. Of course, all devices will be accessible via wireless networks. So it would be possible for this chance ensemble to provide the same assistance as the deliberate smart conference room above.

Enabling this kind of ambient intelligence, the ability of devices to configure themselves into a coherently acting ensemble, requires more than setting up a control application in advance. Here, we need software infrastructures that allow a true self-organization of ad hoc appliance ensembles, with the ability to accommodate nontrivial changes to the ensemble. (See also [8] for a similar viewpoint on this topic.)

In this paper, we discuss the salient properties of such a software infrastructure and propose a solution to these challenges, the "SODAPOP" system. SODAPOP uses a two-stage approach to structuring multi-agent systems and provides unique facilities for coordinating the activities of competing agents.

The remainder of this paper is structured as follows: In Section II, we introduce our solution proposal for a software infrastructure that supports such ensembles. Section III then outlines how a typical ensemble is managed with the help of this infrastructure. Based on this, a comparison of our approach with other activities is given in Section IV. Finally, in Section V, we outline the next steps.

II. A SELF-ORGANIZING MIDDLEWARE

A. Preliminary Considerations

When looking at the challenges of self-organization indicated in the previous section, we can distinguish two different aspects:

Architectonic Integration refers to the integration of a device into the communication patterns of the ensemble: for instance, the attachment of an input device to the ensemble's interaction event bus.

Operational Integration describes the aspect of making new functionality provided by the device (or emerging from the extended ensemble) available to the user. For instance, if you connect a CD player to an ensemble containing a CD recorder, the capability of "copying" will emerge in this ensemble.

Although a thorough coverage of "self-organization" requires the handling of both aspects, we will concentrate on the aspect of architectonic integration in this paper.²

²Operational integration can be realized based on an explicit modeling of the semantics of device operations as "precondition / effect" rules, which are defined over a suitable environment ontology. These rules can then be used by a planning system for deriving strategies for reaching user goals, which consider the capabilities of all currently available devices. See [9] for details.

Fig. 2. Devices and Data Flows: the internal event processing pipelines of two devices (User Interface → Control Application → Actuators, e.g., a TV set and a VCR), shown on their own and then ensembled via a shared event channel and action channel.

A central requirement for a software infrastructure supporting architectonic integration is that it should support ensembles that are built from individual devices in an ad hoc fashion by the end user. This situation is common, for instance, in the area of home entertainment infrastructures, where users liberally mix devices from different vendors. From this it follows that it is not viable to rely on a central controller: any device must be able to operate stand-alone. Furthermore, some infrastructures may change over time, due to hardware components entering or leaving the infrastructure, or due to changes in the quality-of-service available for some infrastructure services, such as bandwidth in the case of wireless channels. Therefore, such an architecture should meet the following objectives:
• ensure independence of components,
• allow dynamic extensibility by new components,
• avoid central components (single points of failure, bottlenecks),
• support a distributed implementation,
• allow flexible re-use of components,
• enable exchangeability of components,
• provide transparent service arbitration.
In the following, we introduce a software infrastructure for managing self-organizing ensembles that we have developed based on these considerations.

B. Devices and Data Flows

When developing a middleware concept, it is important to look at the communication patterns of the objects that are to be supported by this middleware. For smart environments, we need to look at physical devices that have at least one connection to the physical environment they are placed in: they observe user input, or they are able to change the environment (e.g., by increasing the light level, by rendering a medium, etc.), or both. When looking at the event processing in such devices, we may observe a specific event processing pipeline, as outlined in Fig. 2: devices have a User Interface component that translates physical user interactions into events, the Control Application is responsible for determining the appropriate


action to be performed in response to this event, and finally the Actuators physically execute these actions. It seems reasonable to assume that all devices employ a similar event processing pipeline (even if certain stages are implemented trivially, being just a wire connecting the switch to the light bulb). It would then be interesting to extend the interfaces between the individual processing stages across multiple devices, as outlined on the right-hand side of Fig. 2. This would allow a dialogue component of one device to see the input events of other devices, or it would enable a particularly clever control application to drive the actuators provided by other devices. By turning the private interfaces between the processing stages in a device into public channels, it might be possible to achieve architectonic integration.

So, the underlying approach of our proposal for a middleware is to develop a system model that provides the essential communication patterns of such data-flow based multi-component architectures. The model we have developed so far is called SODAPOP (for: Self-Organizing Data-flow Architectures suPporting Ontology-based problem decomPosition). In the following, we give a brief overview of the salient features of this model.

Note that the "channels" outlined in Fig. 2 are not the complete story: much more elaborate data processing pipelines can easily be developed (such as those discussed in [10]). The point of SODAPOP is not to fix a specific data-flow topology, but rather to allow arbitrary such topologies to be created ad hoc from the components provided by the devices in an ensemble.

C. Basic Elements of SODAPOP

The SODAPOP model [11] introduces two fundamental organization levels:
• coarse-grained self-organization based on a data-flow partitioning;
• fine-grained self-organization of functionally similar components based on a kind of "pattern matching" approach.

Consequently, a SODAPOP system consists of two types of elements:
• Channels, which read a single message at a time and map it to multiple messages that are delivered to components (conceptually, without delay). Channels have no externally accessible memory, may be distributed, and have to accept every message. Channels provide for the spatial distribution of a single event to multiple transducers.
• Transducers, which read one or more messages during a time interval and map them to one (or more) output messages. Transducers are not distributed, may have a memory, and do not have to accept every message. Transducers provide for the temporal aggregation of multiple events into a single output. In general, a transducer may have multiple input and output channels (m : n, rather than just 1 : 1).
The "User Interface" and "Control Application" boxes in Fig. 2 are transducers.
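To make these two element types concrete, the following minimal Java sketch models them as interfaces. The names (Message, Transducer, Channel, post, subscribe) are our own illustrative choices, not the actual SODAPOP API; arbitration between subscribers is deliberately omitted here (it is introduced in Section II-D and sketched further below), so this channel simply broadcasts.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** A message traveling through the ensemble (an event or an RPC). */
interface Message {}

/**
 * A transducer reads messages during a time interval, may keep local
 * state, and maps its input to zero or more output messages.
 */
interface Transducer {
    List<Message> transduce(Message input);
}

/**
 * A channel maps each incoming message, conceptually without delay and
 * without externally accessible memory, onto deliveries to subscribers.
 */
final class Channel {
    private final List<Transducer> subscribers = new CopyOnWriteArrayList<>();
    private Channel downstream;   // simplification: a single output channel

    void subscribe(Transducer t) { subscribers.add(t); }
    void connect(Channel next)   { downstream = next; }

    /** Broadcast version: delivers the message to every subscriber. */
    void post(Message m) {
        for (Transducer t : subscribers) {
            for (Message out : t.transduce(m)) {
                if (downstream != null) downstream.post(out);
            }
        }
    }
}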

A system in SODAPOP is defined by a set of channels and a set of transducers connecting these channels. A system is thus a graph in which channels represent nodes and transducers represent edges. SODAPOP supports two typical communication patterns: events, which travel in a data-flow fashion through the different transducers, and RPCs, which resemble conventional remote procedure calls. Events and RPCs describe different routing semantics with respect to result processing: when an RPC is called by a transducer, it expects a result.

D. Subscriptions

Events and RPCs are (in general) posted without specific addressing information: in a dynamic system, a sender can never be sure which receivers are currently able to process a message. It is up to the channel on which the message is posted to identify a suitable message decomposition and receiver set (service arbitration). A channel basically consists of a pipe into which event generators push messages (events or RPCs), which are then transmitted to the consumers (transducers) subscribing to this channel. When subscribing to a channel, an event consumer declares:
• the set of messages it is able to process,
• how well it is suited for processing a certain message,
• to what extent it is able to run in parallel with other message consumers on the same message,
• whether it is able to cooperate with other consumers in processing the message.
These aspects are described by the subscribing consumer's utility. A utility is a function that maps a message to a utility value, which encodes the subscriber's handling capabilities for that specific message. A transducer's utility may depend on the transducer's state. Examples of such utility functions are discussed in Section III. When a channel processes a message, it evaluates the subscribing consumers' handling capabilities and then decides which consumers will effectively receive the message. The channel may also decide to decompose the message into multiple (presumably simpler) messages, which can be handled better by the subscribing consumers.
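To illustrate the utility mechanism, the following sketch (continuing the hypothetical interfaces from Section II-C) shows how a loudspeaker actor might rate the task "speakers on" depending on its capabilities. The RatedTransducer interface, the Operation message, and the numeric scale are assumptions for illustration, not the actual SODAPOP API.

import java.util.List;

/** Extends the Transducer sketch above with a utility declaration. */
interface RatedTransducer extends Transducer {
    /** Maps a message to a utility value; negative means "cannot handle". */
    int utility(Message m);
}

/** Hypothetical operation message, e.g. the task "speakers on". */
record Operation(String name) implements Message {}

/** A loudspeaker actor whose utility reflects its audio capabilities. */
final class LoudspeakerActor implements RatedTransducer {
    private final boolean stereo;
    private final int amplifierWatts;

    LoudspeakerActor(boolean stereo, int amplifierWatts) {
        this.stereo = stereo;
        this.amplifierWatts = amplifierWatts;
    }

    @Override public int utility(Message m) {
        if (!(m instanceof Operation op) || !op.name().equals("speakers on"))
            return -1;                        // not a message we claim
        int u = 1;                            // base: we can render sound
        if (stereo) u += 10;                  // claim the stereo aspect
        u += Math.min(amplifierWatts, 100);   // claim amplifier power
        return u;
    }

    @Override public List<Message> transduce(Message m) {
        // here the physical speakers would be switched on
        return List.of();
    }
}

With such ratings, a "best offer wins" channel simply delivers the task to the subscriber with the highest utility value; this is exactly the kind of competition by which the lecturer's high-fidelity loudspeaker outbids a built-in one in Section III-B below.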

In the following, we describe how the SODAPOP infrastructure can be used in a concrete application setting.

III. AN EXAMPLE ENVIRONMENT

Our example environment is the "smart conference room" already outlined in Section I. Such a high-tech conference room should react autonomously, proactively, and reasonably to device changes and occupants' actions. For instance, when a person stands up and walks to the speaker stand, the speaker microphone should be turned on, and the room lights as well as the speaker desk lighting should be set to levels appropriate for the lecturer and the audience. If the lecturer has brought along a personal notebook computer holding her presentation, this presentation should be delivered on the room's main projector as soon as the lecturer reaches the speaker stand. And if we further assume that the lecturer has also provided her personal audio equipment (e.g., high-fidelity loudspeakers for movies included in the presentation), we would like the conference room to autonomously choose the best sound presentation device.

Fig. 3. A conference room's appliance ensemble: each device (pressure sensors in floor and seats, projector switch, laptop, room light, speaker stand light, microphone) contributes its sensor, parser, agent, and actor components, connected via the atomic events, goals, and function calls channels.

A. Devices and Channels

To realize such a room based on the SODAPOP model, we need to establish the relevant channels and their specific strategies. The first step is to look at the devices that might be present in our conference room. We consider:
• a microphone at the speaker stand, with a simple loudspeaker,
• a projector ("beamer") to display presentations,
• two light systems: room lights and speaker stand illumination,
• a floor pressure sensor in front of the speaker stand,
• pressure sensors installed in the chairs (of the first row of the audience area),
• places for personal devices, such as notebook computers.
Considering this set of devices, we postulate three fundamental channels that group the components potentially contributed by the available devices into the following four processing levels:
• the level of sensory components, which emit atomic events. An atomic event can be triggered by environment changes (e.g., from pressure sensors) as well as by explicit user interaction (e.g., using an on/off switch);
• the level of parsing components, which translate sequences of atomic events into goals representing the environment changes intended by the user;
• the level of assistant components, responsible for mapping the above goals into sequences of device actions that will achieve the desired effect;
• the level of actors, causing the physical effects of device actions.
The fundamental channels then are:

• the event channel, which sends events to the different parser components,
• the goals channel, which passes the goals constructed by the parsing level to the most appropriate assistant component,
• the operations channel, which sends concrete function calls to the responsible actors.
Finally, in order to support the dynamic cooperation of components on the different processing levels, suitable strategies have to be assigned to the individual channels: the event channel uses a distributed parallel event parsing algorithm, while, in order to allocate tasks to the most appropriate control agent resp. actor component, the goals and operations channels use a "best offer wins" strategy (a sketch of such a channel setup is given below).

So, our generic component architecture for dynamic ambient intelligent device ensembles looks as outlined in Fig. 3. Here, the columns represent individual devices with their local event pipelines, contributing their components to the overall ensemble. With respect to Fig. 2, we have added just a single channel, the goals channel, which splits the Control Application into two separate components: Parsing / Dialogue Management and Control / Strategy Planning. A sufficiently detailed account of the underlying concept of goal-based interaction is out of the scope of this paper; we refer the reader to [9] (cf. also footnote 2).
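The following sketch indicates how such a topology and its strategies might be wired up, again using the hypothetical interfaces from Section II. The Strategy type and the two strategy classes are our own illustrative approximations of the channel strategies named above; the public SODAPOP implementation (Section V) provides its own API for attaching strategies.

import java.util.Comparator;
import java.util.List;

/** A channel strategy decides which subscribers receive a message. */
interface Strategy {
    List<RatedTransducer> select(Message m, List<RatedTransducer> subs);
}

/** Event channel: deliver to every parser that can handle the event. */
final class ParallelParsingStrategy implements Strategy {
    @Override public List<RatedTransducer> select(Message m, List<RatedTransducer> subs) {
        return subs.stream().filter(t -> t.utility(m) >= 0).toList();
    }
}

/** Goals/operations channels: deliver only to the best offer. */
final class BestOfferWinsStrategy implements Strategy {
    @Override public List<RatedTransducer> select(Message m, List<RatedTransducer> subs) {
        return subs.stream()
                   .filter(t -> t.utility(m) >= 0)
                   .max(Comparator.comparingInt(t -> t.utility(m)))
                   .map(List::of).orElse(List.of());
    }
}

// Wiring the generic conference-room topology (assuming the Channel
// sketch of Section II-C is extended to consult a Strategy):
//   Channel events     = new Channel(new ParallelParsingStrategy());
//   Channel goals      = new Channel(new BestOfferWinsStrategy());
//   Channel operations = new Channel(new BestOfferWinsStrategy());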

B. Ensemble Dynamics

For the setup of our test scenario, we use a conference room with controllable lighting, a beamer (including a permanently attached PC), several pressure sensors, and a microphone (together with a simple loudspeaker component). In order for notebooks to be aware of the presence of their user, we use a simple awareness software that emits an event containing the IP number of the notebook when the user presses a special shortcut.
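A minimal version of such an awareness component might look as follows; it posts an event carrying the notebook's IP address when triggered. The AwarenessEvent type and the hotkey hook are assumptions for illustration, not a description of the software actually used.

import java.net.InetAddress;
import java.net.UnknownHostException;

/** Atomic event announcing the presence of the notebook's user. */
record AwarenessEvent(String ip) implements Message {}

final class NotebookAwareness {
    private final Channel eventChannel;

    NotebookAwareness(Channel eventChannel) { this.eventChannel = eventChannel; }

    /** Invoked by a platform-specific hotkey hook (not shown). */
    void onShortcutPressed() {
        try {
            String ip = InetAddress.getLocalHost().getHostAddress();
            eventChannel.post(new AwarenessEvent(ip));
        } catch (UnknownHostException e) {
            // no usable address: stay silent rather than emit a bogus event
        }
    }
}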

5

pressure sensor seat / speakers stand

sensor 1

beamer

personal laptop awareness

room lights

Speakers stand light

microphone

switch

sensor

switch

switch

switch

parser

parser

parser

assistant

assistant

assistant

1

1

parser

parser 2

assistant

assistant 3

5

actor

4

actor

4

actor

4

actor

4

actor

Intelligent scenario Fig. 4.

Communication flow in the extended ensemble

The channel set identified in Section III-A has been implemented using the Java version of SODAPOP (see Section V). Devices equipped with these definitions can readily be plugged together and then exhibit the expected spontaneous "intelligent" behavior. A typical ensemble based on this channel set is given in Fig. 3. In the initial situation shown there, each device's interaction component sends single events that are immediately claimed by its own parser component. A device's parser translates these events into straightforward goals (e.g., "room light on") published on the goals channel, which in turn are captured by the device's assistant to be translated into actions delivered on the operations channel, where they are immediately (and solely) claimed by the device's own actor component. So, each device is directly (and exclusively) controlled by its own interaction component. Not wrong—but also not very intelligent (or interesting).

However, let us now dynamically extend the ensemble by a new device that provides another parser and another assistant component, as outlined in Fig. 4. Assume that the new parser is configured to claim a sequence of events, composed of a chair occupation change event (when someone stands up), the awareness event of a personal notebook, and a floor occupation event (provided by the floor sensor at the speaker stand), all of which have to be generated within a specific time interval (see the events marked with (1) in Fig. 4). The parser translates this sequence of events into the goal "prepare room for the presentation from PC with IP number x", where x is the IP number of the notebook that emitted the awareness event, and publishes it on the goals channel (see (2) in Fig. 4). Given this goal, the newly introduced assistant component will win the competition on the goals channel (see (3) in Fig. 4), and it will then create a strategy for achieving this goal by constructing the following action sequence:
• switch on the beamer,
• get the presentation from the notebook and start it,
• switch the speaker stand lights on,
• dim the room lights,
• switch on the microphone,
• switch on the loudspeakers.
Each of these actions is then published on the operations channel, which distributes them to the different actors, going through another competition cycle for each action (see points (4) in Fig. 4).
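Under the hypothetical interfaces used so far, such a sequence-detecting parser could look roughly as follows; the event types, the goal encoding, and the 30-second window are illustrative assumptions, not the parser configuration actually deployed.

import java.time.Duration;
import java.time.Instant;
import java.util.List;

record ChairEvent(boolean occupied) implements Message {}
record FloorEvent(boolean occupied) implements Message {}
record Goal(String description) implements Message {}

/** Parser recognizing "stand up -> notebook hotkey -> speaker stand". */
final class PresentationParser implements RatedTransducer {
    private static final Duration WINDOW = Duration.ofSeconds(30);
    private Instant stoodUp;     // when a chair was vacated
    private String notebookIp;   // from the awareness event

    @Override public int utility(Message m) {
        // we only claim the three atomic event types we can parse
        return (m instanceof ChairEvent || m instanceof AwarenessEvent
                || m instanceof FloorEvent) ? 1 : -1;
    }

    @Override public List<Message> transduce(Message m) {
        Instant now = Instant.now();
        if (m instanceof ChairEvent c && !c.occupied()) {
            stoodUp = now;
        } else if (m instanceof AwarenessEvent a && inWindow(now)) {
            notebookIp = a.ip();
        } else if (m instanceof FloorEvent f && f.occupied()
                   && inWindow(now) && notebookIp != null) {
            String goal = "prepare room for presentation from PC " + notebookIp;
            stoodUp = null; notebookIp = null;   // reset for the next lecturer
            return List.of(new Goal(goal));
        }
        return List.of();
    }

    private boolean inWindow(Instant now) {
        return stoodUp != null
            && Duration.between(stoodUp, now).compareTo(WINDOW) <= 0;
    }
}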

Turning back to our scenario, we observe that our lecturer has brought along a high-fidelity loudspeaker device she needs for her presentation. Added to the existing device infrastructure, the new loudspeaker extends the topology as shown in Fig. 4 at point (5). Now there are two possible loudspeakers available. Both receive the request for the task "speakers on", and both compete on the operations channel with the aspects they consider important for accomplishing the task. In our example ensemble, the new loudspeaker claims more aspects and higher fidelity for them (e.g., stereo effect, powerful integrated amplifier) than the original one, i.e., it makes a better offer (cf. the utility sketch in Section II-D). So the operations channel selects the new speaker as the device with the highest performance for accomplishing the task "speakers on".

Both these changes in ensemble behavior, as well as the original ensemble synthesis from the individual devices, have been managed spontaneously, without any human intervention, based on the generic channel topology and the channels' competition strategies.

IV. RELATED WORK AND ASSESSMENT

There are other approaches that address the problem of dynamic, self-organizing systems, such as HAVi [12], Jini [13], the Galaxy Communicator Architecture (see [14] and [15]), or SRI's Open Agent Architecture (OAA) (see [16] and [17]). Especially Galaxy and OAA provide architectures for multi-agent systems. Likewise, the pattern-matching (event subscription) approach in SODAPOP is not new: comparable concepts can be found in Galaxy and the OAA, as well as in earlier work on Prolog or the pattern-matching lambda calculus. Here, the SODAPOP approach provides a certain refinement at the conceptual level by replacing language-specific syntactic pattern-matching functionality (cf. the Prolog-based pattern matching of OAA) with a language-independent facility based on utility value computation functions that are provided by the transducers. The important differences between SODAPOP and the above approaches, however, are:
• SODAPOP uses a two-stage approach to system decomposition and self-organization: coarse-grained structuring is provided by defining channels, fine-grained structuring is supported by "pattern matching".
• SODAPOP explicitly supports data-flow architectures by providing event channels besides conventional RPC channels.
The combination of these two features is an important extension over the above systems. In our experience, it is dangerous to provide only a single granularity for decomposing a complex system structure. OAA, Galaxy, Jini, and similar approaches are based on a single communication bus and message decomposition paradigm, which is responsible for all communication and arbitration in the agent ensemble. This single granularity necessarily has to be fine in order to provide the required flexibility.

When trying to fix the overall structure of the system, such a fine granularity provides too much detail and quickly leads to a proliferation of interfaces that are shared by only a few components. This danger specifically exists when the interface discussion is carried out by several independent component developers in parallel. However, the proliferation of interfaces is a Bad Thing, because it obstructs the interoperability of system components—a prime goal of SODAPOP. The SODAPOP approach, on the other hand, provides abstractions that allow both a top-down structuring of the system (channels) and a bottom-up structuring (within-channel strategies). The top-down structure (i.e., an application domain's generic channel topology) provides strong guidance for further interface discussions. SODAPOP also explicitly includes a data-flow based mechanism for constructing systems out of components, based on SODAPOP event channels. Finally, the decentralized per-channel arbitration mechanism supported by SODAPOP provides much finer control over the message routing semantics than conventional object bus approaches (such as OAA and Jini), while at the same time avoiding a centralized, manually managed routing rule base, as Galaxy requires. Only this unique combination makes the automatic creation of ad hoc ensembles from autonomous, distributed devices possible.

V. CURRENT STATE AND NEXT STEPS

A public version of SODAPOP is currently available in Java. The download offers an API for establishing channels and transducers as well as for creating and attaching arbitrary channel strategies. A set of basic channel strategies is included (i.e., parallel event parsing, "best offer wins" agent selection, and a strategy for distributing multi-modal system output to different render components by problem decomposition), as well as a set of demonstration scenarios (e.g., a home entertainment device infrastructure), which illustrate the use of SODAPOP.

It should be clear that SODAPOP aims at providing a core facility for the self-organization of appliance ensembles, not a comprehensive software infrastructure covering all aspects conceivably required for AmI systems. So, security, privacy, authentication, context management, strategy planning, dialogue management, etc., are currently not part of SODAPOP, as we envision these functionalities to be provided by layers above (resp. below) SODAPOP. One important goal of making SODAPOP publicly available is to elicit critical feedback and suggestions for improvement of this proposal for structuring AmI systems. SODAPOP is currently being developed within the project DynAMITE [18] and is available from the project web site.

In the next stage, the following issues will be addressed:
• provide a strategy for transducers to collaboratively perform a task (e.g., a transducer controlling a beamer device collaborates with a transducer for light control in maximizing the contrast for the audience),
• provide specialized channel strategies for additional ontologies (e.g., a multi-modal input strategy complementing the multi-modal output strategy mentioned above),
• provide an interface to lower-level standard protocols (e.g., UPnP),
• develop simulation tools providing graphical interfaces, to allow fast and efficient experiments without the need to set up physical hardware in real rooms.

REFERENCES

[1] K. Ducatel, M. Bogdanowicz, F. Scapolo, J. Leijten, J.-C. Burgelman, Scenarios for ambient intelligence in 2010, ISTAG report, European Commission, Institute for Prospective Technological Studies, Seville (Nov. 2001). URL ftp://ftp.cordis.lu/pub/ist/docs/istagscenarios2010.pdf
[2] N. Shadbolt, Ambient Intelligence, IEEE Intelligent Systems (2003) 2–3.
[3] E. Aarts, Ambient intelligence: A multimedia perspective, IEEE Multimedia (2004) 12–19.
[4] D. Franklin, K. Hammond, The intelligent classroom: providing competent assistance, in: Proceedings of the Fifth International Conference on Autonomous Agents, ACM Press, 2001, pp. 161–168. URL http://doi.acm.org/10.1145/375735.376037
[5] Stanford Interactive Workspaces (iWork), Project Overview, http://iwork.stanford.edu/ (Oct. 2003).
[6] Oxygen, MIT Project Oxygen: Pervasive, Human-centered Computing, http://oxygen.lcs.mit.edu/ (2002).
[7] B. Brumitt, B. Meyers, J. Krumm, A. Kern, S. A. Shafer, EasyLiving: Technologies for intelligent environments, in: HUC, 2000, pp. 12–29. URL citeseer.nj.nec.com/brumitt00easyliving.html
[8] D. Servat, A. Drogoul, Combining amorphous computing and reactive agent-based systems: a paradigm for pervasive intelligence?, in: Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems, ACM Press, 2002, pp. 441–448. URL http://doi.acm.org/10.1145/544741.544842
[9] T. Heider, T. Kirste, Supporting goal-based interaction with dynamic intelligent environments, in: Proc. 15th European Conference on Artificial Intelligence (ECAI 2002), Lyon, France, 2002.
[10] T. Herfet, T. Kirste, M. Schnaider, EMBASSI: multimodal assistance for infotainment and service infrastructures, Computers & Graphics 25 (4) (2001) 581–592.
[11] T. Heider, T. Kirste, Architecture considerations for interoperable multi-modal assistant systems, in: Proc. 9th International Workshop on Design, Specification, and Verification of Interactive Systems (DSV-IS 2002), Rostock, Germany, 2002.
[12] HAVi, Inc., The HAVi Specification: Specification of the Home Audio/Video Interoperability (HAVi) Architecture, Version 1.1, www.havi.org (May 2001).
[13] Sun Microsystems, Inc., Jini Technology Core Platform Specification, Version 1.1, www.jini.org (Oct. 2000).
[14] S. Seneff, E. Hurley, R. Lau, C. Pao, P. Schmid, V. Zue, Galaxy-II: A reference architecture for conversational system development, in: ICSLP 98, Sydney, Australia, 1998.
[15] S. Seneff, R. Lau, J. Polifroni, Organization, communication, and control in the GALAXY-II conversational system, in: Proc. Eurospeech 99, Budapest, Hungary, 1999.
[16] SRI International AI Center, The Open Agent Architecture, http://www.ai.sri.com/~oaa/ (2000).
[17] D. L. Martin, A. J. Cheyer, D. B. Moran, The Open Agent Architecture: a framework for building distributed software systems, Applied Artificial Intelligence 13 (1/2) (1999) 91–128.
[18] DynAMITE, DynAMITE Project Overview, www.dynamiteproject.org (2004).