
Designing robust distributed systems with weakly interacting feedback structures
EPITA, April 24, 2013
Peter Van Roy
ICTEAM Institute, Université catholique de Louvain
Louvain-la-Neuve, Belgium

P. Van Roy, UCL, Louvain-la-Neuve

Overview
- As Internet services become larger, they become more complex and their environment becomes more hostile
  - Partial failures, software errors, communication problems, churn, attacks
  - Problems due to global behavior (oscillations, traffic jams, multicast storms, thundering herds, chaotic fluctuations, thrashing, cascading failures)
- How can we design Internet services to provide predictable behavior in such conditions?
  - Motivating examples from biology (and some well-designed computing systems)
  - Proposed architecture for scalable services as a set of weakly interacting feedback structures with dependencies
- Preliminary evaluation based on the Scalaris key/value store (SELFMAN project)
  - Scalaris provides high-performance transactions on a structured overlay network
  - Scalaris contains five feedback structures and their dependencies: connectivity, routing, load balancing, replication, and transactions
- We are currently formalizing this approach and tying it to existing quantitative approaches


Motivating examples from biology and computing


Motivating examples
- Many systems exist that survive in hostile environments
  - Biological organisms
  - Some computing systems
  - Human organizations
- It is a good idea to study these systems to derive general design principles
- We give five examples to introduce the main ideas
  - Hotel lobby example → Debugging of feedback structures
  - Human respiratory system → Design rules, state diagram
  - TCP protocol operation → Systems with many parts
  - Human endocrine system → Concurrent component model
  - Human organizations → Design patterns for feedback structures


Simple example: hotel lobby (from [Wiener 1948])
- Two loops interact through a common subsystem, the hotel lobby (stigmergy)
- This is unstable!
  - The tribesman stokes the fire but gets colder and colder because the air conditioning works harder and harder
  - Wiener leaves the fix as homework for the reader (!)
  - One possible solution: the outer loop (tribesman) controls the other by simply adjusting the thermostat (one loop controls the other)
[Figure: two feedback loops acting on the hotel lobby (subsystem). Tribesman loop: measure temperature near fire (monitoring agent) → calculate corrective action (stoke fire if too cold) → stoke fire (actuating agent). Thermostat loop: measure temperature in lobby (monitoring agent) → run air conditioning if too warm (actuating agent).]

Hotel lobby solution
- Instead of stoking a fire, the tribesman simply adjusts the thermostat
- The resulting system is stable
- This uses management (one loop controls another) instead of stigmergy (two loops interact through the environment)
- Design pattern: use the system, don’t try to bypass it
[Figure: the tribesman loop (measure temperature at tribesman → adjust thermostat) controls the thermostat loop (measure temperature at thermostat → run air conditioning if too warm), which alone acts on the hotel lobby.]
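The instability of the first design and the stability of the second can be reproduced with a toy discrete-time model. All of its numbers are assumptions made for illustration (outdoor temperature, heat leak, the two goal temperatures, and both gains), and each agent is modeled as a simple integral controller; none of this comes from [Wiener 1948]. Under stigmergy the two loops pursue incompatible goals through the same lobby; under management the tribesman only moves the thermostat's setpoint.

```python
# Toy model of the hotel-lobby example; all constants are hypothetical.
OUTSIDE = 30.0          # outdoor temperature that makes air conditioning necessary
LEAK = 0.1              # fraction of the indoor/outdoor gap that leaks in per step
THERMOSTAT_GOAL = 20.0
TRIBESMAN_GOAL = 25.0
K_TRIBESMAN = 0.01      # gain of the tribesman's loop
K_THERMOSTAT = 0.05     # gain of the thermostat's loop


def stigmergy(steps):
    """Both loops act on the lobby: the tribesman stokes a fire and the
    thermostat runs the air conditioning. Returns (temperature, fire, aircon)."""
    temp, fire, aircon = 20.0, 0.0, LEAK * (OUTSIDE - 20.0)
    for _ in range(steps):
        new_temp = temp + LEAK * (OUTSIDE - temp) + fire - aircon
        fire = max(0.0, fire + K_TRIBESMAN * (TRIBESMAN_GOAL - temp))
        aircon = max(0.0, aircon + K_THERMOSTAT * (temp - THERMOSTAT_GOAL))
        temp = new_temp
    return temp, fire, aircon


def management(steps):
    """The tribesman only adjusts the thermostat's setpoint; the thermostat is
    the sole actuator on the lobby. Returns (temperature, aircon, setpoint)."""
    temp, aircon, setpoint = 20.0, LEAK * (OUTSIDE - 20.0), THERMOSTAT_GOAL
    for _ in range(steps):
        new_temp = temp + LEAK * (OUTSIDE - temp) - aircon
        aircon = max(0.0, aircon + K_THERMOSTAT * (temp - setpoint))
        setpoint += K_TRIBESMAN * (TRIBESMAN_GOAL - temp)
        temp = new_temp
    return temp, aircon, setpoint


if __name__ == "__main__":
    for n in (500, 1000, 2000):
        t, f, a = stigmergy(n)
        print(f"stigmergy, {n} steps: lobby={t:.2f} fire={f:.1f} aircon={a:.1f}")
    t, a, s = management(2000)
    print(f"management, 2000 steps: lobby={t:.2f} aircon={a:.2f} setpoint={s:.2f}")
```

With these constants the stigmergy run keeps the lobby near 20.8 degrees, below the tribesman's goal, while fire and air conditioning each grow by about 0.04 units per step without bound. The managed run settles at the tribesman's goal of 25 degrees with a small, constant air-conditioning effort of 0.5 units.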


Human respiratory system
The operation of the human respiratory system is given as one feedback structure, inferred from a precise medical description of its behavior.
[Figure: monitoring agents (measure O2 in blood, measure CO2 in blood, detect obstruction in airways, monitor breathing) and actuating agents act on the breathing apparatus in the human body. The breathing reflex is triggered when CO2 increases to a threshold. Laryngospasm (seal air tube) is triggered temporarily when there is sufficient obstruction in the airways. Conscious control of body and breathing, which also receives other inputs, can increase or decrease the breathing rate and change the CO2 threshold (the maximum is the breath-hold breakpoint). Unconsciousness is triggered when O2 falls to a threshold; it renders the person unconscious and reduces the CO2 threshold to its base level.]
Some design rules:
- Default behavior: rhythmic breathing reflex
- Complex component: conscious control can override and plan lifesaving actions
- Abstraction: conscious control does not need to know details of breathing reflex
- Fail-safe: conscious control can itself be overridden (falling unconscious)
- Time scales: laryngospasm is a quick action that interrupts slower breathing reflex

Discussion of respiratory system
- Four feedback loops: two inner loops (breathing reflex and laryngospasm), a loop controlling the breathing reflex (conscious control), and an outer loop controlling the conscious control (falling unconscious)
  - This design is derived from a precise textual medical description [Wikipedia 2006: Entry “Drowning”]
- Holding your breath can have two effects
  - The breath-hold threshold is reached first and the breathing reflex happens
  - The O2 threshold is reached first and you fall unconscious, which reestablishes the normal breathing reflex
- Some plausible design rules inferred from this system
  - Conscious control is sandwiched in between two simpler loops: the breathing reflex provides abstraction (consciousness does not have to understand details of breathing) and falling unconscious provides protection against instability
  - Conscious control is a powerful problem solver but it needs to be held in check
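The two outcomes of holding one's breath can be phrased as a race between two thresholds. The sketch below is a minimal illustration, not a physiological model: the units, rates, and threshold values are invented, and only the ordering logic follows the description above (conscious control raises the CO2 threshold up to the breath-hold breakpoint; falling unconscious resets it to its base level, after which the breathing reflex takes over).

```python
from dataclasses import dataclass

# Hypothetical values in arbitrary units; only their ordering matters here.
BASE_CO2_THRESHOLD = 45.0   # threshold of the default breathing reflex
O2_UNCONSCIOUS = 60.0       # O2 level at which unconsciousness is triggered


@dataclass
class Body:
    o2: float = 100.0
    co2: float = 40.0
    co2_threshold: float = BASE_CO2_THRESHOLD
    conscious: bool = True


def hold_breath(body, breakpoint, o2_drop, co2_rise, max_seconds=1000):
    """Conscious control raises the CO2 threshold to `breakpoint` and stops
    breathing. Returns the name of the loop that ends the breath hold."""
    body.co2_threshold = breakpoint
    for _ in range(max_seconds):
        body.o2 -= o2_drop
        body.co2 += co2_rise
        if body.o2 <= O2_UNCONSCIOUS:
            # Outer fail-safe loop: override conscious control and restore the
            # base threshold so the default breathing reflex takes over again.
            body.conscious = False
            body.co2_threshold = BASE_CO2_THRESHOLD
            return "unconscious"
        if body.co2 >= body.co2_threshold:
            return "breathing reflex"   # breath-hold breakpoint reached first
    return "still holding"


print(hold_breath(Body(), breakpoint=55.0, o2_drop=0.2, co2_rise=0.1))
print(hold_breath(Body(), breakpoint=55.0, o2_drop=0.5, co2_rise=0.1))
```

With a slow O2 drop the breath-hold breakpoint is reached first and the breathing reflex fires; with a faster O2 drop the outer fail-safe loop wins and unconsciousness restores the base threshold. In both cases control ends up back with the default breathing reflex.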


Respiratory system state diagram
- The behavior of the human respiratory system modeled as a state diagram
- Dominant subset = active subset of feedback loops = state
  - At any time, one subset is active, depending on operating conditions
  - Each subset corresponds to a state in the state diagram

TCP as feedback structures
- This example shows a reliable byte stream protocol with congestion control (a variant of TCP)
- The congestion control loop manages the reliable transfer loop
  - By changing the sliding window’s buffer size
- This diagram is for the sending side
- With n connections there are n feedback structures interacting through a shared network (stigmergy)
  - This is an example of a system with many WIFS (weakly interacting feedback structures)
  - Each FS (feedback structure) has its own state
[Figure: sending side. Inner loop (reliable transfer): monitor (receive ack) → calculate bytes to send (sliding window protocol) → actuator (send packet) → subsystem (network that sends packet to destination and receives ack). Outer loop (congestion control): monitor throughput → calculate policy modification (modify throughput) → adjust the inner loop. Inputs and outputs: send stream, send acknowledgement.]
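The outer congestion-control loop can be sketched with additive-increase/multiplicative-decrease (AIMD), the rule used by classic TCP variants. The slide does not say which rule its TCP variant uses, so AIMD, the round-based timing, and the synchronized loss signal are assumptions of this sketch. Each flow is one feedback structure; the flows never talk to each other and interact only through the shared link (stigmergy).

```python
def aimd(windows, capacity, rounds, increase=1.0, decrease=0.5):
    """Simulate the congestion windows of several flows sharing one link.

    Each round (about one round-trip time) every flow adds `increase` to its
    window; if the total exceeds `capacity`, every flow sees a loss and
    multiplies its window by `decrease`. Returns final windows and history.
    """
    w = list(windows)
    history = []
    for _ in range(rounds):
        if sum(w) > capacity:                      # congestion: packets dropped
            w = [max(1.0, x * decrease) for x in w]
        else:                                      # no loss: probe for bandwidth
            w = [x + increase for x in w]
        history.append(tuple(w))
    return w, history


final, hist = aimd([1.0, 30.0], capacity=40.0, rounds=500)
print([round(x, 2) for x in final])   # the two windows end up (nearly) equal
```

Additive increase keeps the difference between the two windows constant, while each multiplicative decrease halves it, so the flows converge toward a fair share of the link even though neither knows the other exists.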


Human endocrine system
- The endocrine system regulates many quantities in the human body
- It uses chemical messengers called hormones, which are secreted by specialized glands and which exert their action at a distance, using the blood stream as a diffusion channel
- By studying the endocrine system, we can obtain insights into how to build large-scale self-regulating distributed systems


Feedback loops in the endocrine system
- There are many feedback loops and systems of interacting feedback loops in the endocrine system
- Much regulation is done by simple negative feedback loops
  - Provides homeostasis (stability) and reaction to stresses
  - The glucose level in blood is regulated by the hormones glucagon and insulin. In the pancreas, A cells secrete glucagon and B cells secrete insulin. An increase in glucose in the blood causes a decrease in glucagon and an increase in insulin. These hormones act on the liver, which releases glucose into the bloodstream.
  - The calcium level in blood is regulated by parathyroid hormone (parathormone) and calcitonin (also in opposite directions), which act on the bone
- More complex regulatory mechanisms exist, e.g., the hypothalamus-pituitary-target organ axis
- There is interaction between nervous transmission and hormonal transmission


Hypothalamus-pituitary-target organ axis (endocrine system)
- Two superimposed groups of negative feedback loops, a third short negative loop, and a fourth loop from the central nervous system [Encyclopaedia Britannica 2005]
- This diagram shows only the main components and their interactions; there are many more parts, giving a much more complex full system


Discussion of endocrine system
- This system is quite complex
  - Many interacting feedback loops, many “short circuits”, many special cases, much interaction with other systems (nervous, immune)
  - Negative feedback for most, also saturation (logistic curve)
  - Evolution is not always a parsimonious designer!
    - Only criterion: it has to work
- Several feedback loops are channeled through a single point, the hypothalamus-pituitary complex in the brain
  - So that the central nervous system can manage these loops
  - Different time scales are used: the loops are slow; the central nervous system is fast


Computational architecture of human endocrine system
- Local and global components
  - Local: gland, organ, or clumps of cells
  - Global (diffuse): large part of the body
- Point-to-point and broadcast channels
  - Fast point-to-point: nerve fiber, e.g., from spinal cord to muscle
  - Slower broadcast: hormone diffused by blood circulation
    - With buffering (reducing variations): carrier proteins
- Regulatory mechanisms can be modeled by interactions between components and channels
  - There are often intermediate links
  - Abstraction (encapsulation) is almost always approximate


Design patterns for feedback structures
[Figure: Archetype Family Tree (from [Senge 1994]). Rooted in the question “I am most concerned about ... fixing problems / growth”, it connects the two basic structures, the Reinforcing Loop (vicious and virtuous spiral) and the Balancing Loop, to the archetypes Balancing with Delay, Indecision, Fixes that Backfire, Shifting the Burden, Addiction, Drifting Goals, Limits to Growth, Growth & Underinvestment (also with drifting standards), Success to the Successful, The Attractiveness Principle, Tragedy of the Commons, Escalation, and Accidental Adversaries.]
- We can arrange feedback structures in a tree according to their relationships and the problems they solve

Designing scalable systems


Designing scalable systems
- Essential ingredients
  - Design principles inferred from existing working systems and validated subsequently
  - The CAP theorem is an essential tool
    - It holds at all scales and all levels of abstraction
- First step
  - The default is a set of independent parts
  - We add coordination between these parts
  - It is important to add as little coordination as possible
- Next step: weakly interacting feedback structures


The CAP theorem
- The CAP theorem is an essential tool for any scalable system design
- For an asynchronous network, it is impossible to implement an object that guarantees the following properties in all fair executions:
  - Consistency: all operations are atomic (totally ordered)
  - Availability: every request eventually returns a result
  - Partition tolerance: the network may lose arbitrarily many messages
- The CAP theorem was conjectured by Eric Brewer at PODC in 2000 and proved by Seth Gilbert and Nancy Lynch in 2002
- The CAP theorem applies to all systems, at all levels of abstraction, and at all sizes
  - It can be applied in many places in the same system
  - The whole system is a rainbow of interacting instances of CAP


Designing with CAP
- C is hard to achieve → (P+A, no C) is the default
  - Consistency requires global coordination
- Avoid needing C if possible
  - We can achieve robustness (P) and performance (A)
  - Dropbox and Web caches give P and A, but not C
  - Wuala and BitTorrent are read-only, so they achieve C easily
  - Mercurial is consistent if connected (C+A), but is still usable if disconnected (P+A)
- But if we really need C
  - Give up A → Waiting is sometimes needed
    - A distributed database guarantees C but will block if there is a partition
  - Give up P → Fragile system
  - Accept weaker C → Eventual consistency
- We can have our cake and eat it too, if we pay the price
  - Highly reliable communication channels and fault tolerance
  - We get C and A, and we “seem” to get P as well (actually, we just have fewer partitions)
  - Scalaris, Beernet: peer-to-peer with majority consensus (Paxos) gives robustness
  - Cassandra: runs on a cloud, not peer-to-peer (does not support loose coupling)
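The trade-off during a partition can be shown with two replicas of a single register. The `Replica` class and its `"CP"`/`"AP"` mode switch are hypothetical, and replication here is naively synchronous rather than quorum-based (systems such as Scalaris use majority consensus instead); the point is only which guarantee each mode gives up when the replicas cannot talk.

```python
class Replica:
    """One copy of a replicated register (hypothetical two-replica setup)."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.value = 0
        self.peer = None
        self.partitioned = False    # True while messages to the peer are lost

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Keep C, give up A: refuse rather than let the replicas diverge.
                raise TimeoutError("partitioned: write cannot be replicated")
            self.value = value      # Keep A, give up C: the peer is now stale.
            return
        self.value = value
        self.peer.value = value     # connected: replicate synchronously

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise TimeoutError("partitioned: cannot confirm the latest value")
        return self.value


def make_pair(mode):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b, cut=True):
    a.partitioned = b.partitioned = cut
```

While connected, both modes behave identically. After `partition(a, b)`, an AP pair keeps answering but may return stale values, while a CP pair blocks (here, it raises `TimeoutError`).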


The default is a set of independent parts l group

split/merge

l 

Every scalable design starts as a decentralized system (P+A, no C) l 

l 

Nodes occasionally interact (add some C) → collaboration, emergence l  l 

l  l 

A system of independent parts Split protocol: what happens when a node leaves a group (may be abrupt) Merge protocol: what happens when a node joins a group

Merge is based on data coherence and may need input from highest level Many examples: biology, peer-to-peer, map-reduce, gas/liquid/solid, … 21
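A merge protocol of this kind might look as follows. This is a hypothetical sketch: each group's state is assumed to be a map from keys to `(version, value)` pairs, the higher version wins, and a genuine conflict (same version, different values) is handed to a caller-supplied `resolve` function standing in for “input from the highest level”. A split needs no coordination in this model: each side simply keeps its own copy and continues (P+A).

```python
def merge(left, right, resolve):
    """Merge the states of two groups that rejoin.

    `left` and `right` map keys to (version, value). For each key the higher
    version wins; equal versions with different values are a conflict that
    is passed to `resolve(key, left_entry, right_entry)`.
    """
    merged = dict(left)
    for key, (version, value) in right.items():
        if key not in merged:
            merged[key] = (version, value)
            continue
        my_version, my_value = merged[key]
        if version > my_version:
            merged[key] = (version, value)
        elif version == my_version and value != my_value:
            merged[key] = resolve(key, merged[key], (version, value))
    return merged


def pick_larger(key, a, b):
    """Example escalation policy (hypothetical): keep the larger entry."""
    return max(a, b)


g1 = {"x": (1, "a"), "y": (2, "b")}
g2 = {"x": (2, "c"), "y": (2, "d"), "z": (1, "e")}
print(merge(g1, g2, pick_larger))
```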


Mostly independent parts
- Large systems consist of independent parts with weak interactions
  - Gas in a box: molecules mostly independent, occasional interaction when two molecules collide
  - Peer-to-peer network: peers mostly independent, occasional interaction between neighbors only. Can provide efficient and robust communication and storage infrastructure (see later)
  - Gossip algorithm: nodes mostly independent, occasional interaction between random pairs. Can efficiently solve many global problems such as diffusion, search, aggregation, monitoring, and topology management
- This seems to be a general principle
  - Systems with many parts that interact strongly are avoided by nature
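Aggregation by gossip is a compact example of mostly independent parts: each node only ever talks to one randomly chosen partner, yet all nodes converge to a global quantity. The sketch below uses pairwise averaging to compute the mean of the nodes' values; the node count, seed, and number of exchanges are arbitrary choices, and the random pairing stands in for the peer-sampling service a real system would need.

```python
import random


def gossip_average(values, exchanges, seed=0):
    """Pairwise-averaging gossip: in each exchange two random nodes replace
    their values by their average. The sum (hence the mean) never changes,
    and all values converge to that mean."""
    rng = random.Random(seed)
    state = list(values)
    for _ in range(exchanges):
        i, j = rng.sample(range(len(state)), 2)   # two distinct random nodes
        avg = (state[i] + state[j]) / 2
        state[i] = state[j] = avg
    return state


loads = [float(x) for x in range(10)]            # e.g. local load measurements
print(gossip_average(loads, exchanges=2000)[:3])  # each node now holds ~4.5
```

Every exchange preserves the sum of the two values, so the mean is invariant; each exchange also shrinks the spread, which is why purely local, random interactions are enough to compute a global aggregate.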


Types of systems
[Figure from [Weinberg 1977]: a diagram with two shaded areas labeled “computing” and a white area in the middle]
- This diagram is from [Weinberg 1977], An Introduction to General Systems Thinking
- The discipline of computing is pushing the boundaries of the two shaded areas inwards
  - Software development and computational science are the vanguards of system theory
- However, there seems to be something inherently unpleasant about the white area in the middle
  - It is extremely difficult to analyze systems with many strongly interacting parts; science has barely touched it
  - Even biological organisms avoid it (they are mostly decomposable)

Adding consistency/coordination
- We start with a decentralized system (P+A, no C)
  - How much C do we need and how do we add it?
  - General principle: as little as possible (weak interaction)
- The rest of the talk explores how to add C
  - Main design principle: weakly interacting feedback structures
  - We validate the approach on a real system
    - Scalaris, a transactional key/value store


A scalable architecture in four steps
- Concurrent component
  - An active entity communicating with its neighbors through asynchronous messages
  - Intelligence is concentrated in complex components
- Feedback loop
  - Monitor, corrector, and actuator components connected to a subsystem and continuously maintaining one local goal
- Feedback structure
  - A set of feedback loops that work together to maintain one global system property
- Weakly interacting feedback structures
  - The complete system is a conjunction of global properties, each maintained by one feedback structure
  - The feedback structures have dependencies based on the operating conditions
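The first two steps can be sketched with Python's asyncio: the monitor, corrector, and actuator are separate concurrent components that only exchange asynchronous messages through queues, and together they maintain one local goal on a shared subsystem. The proportional correction rule, the gain, and the `None` end-of-stream marker are assumptions of this sketch, not part of the architecture.

```python
import asyncio


class Subsystem:
    """The thing being regulated; here just a number."""

    def __init__(self, level):
        self.level = level


async def monitor(sub, out_q, ticks):
    for _ in range(ticks):
        await out_q.put(sub.level)   # send an observation to the corrector
        await asyncio.sleep(0)       # yield so the other components can run
    await out_q.put(None)            # end of stream


async def corrector(goal, gain, in_q, out_q):
    while (level := await in_q.get()) is not None:
        await out_q.put(gain * (goal - level))   # proportional correction
    await out_q.put(None)


async def actuator(sub, in_q):
    while (action := await in_q.get()) is not None:
        sub.level += action          # act on the subsystem


async def feedback_loop(goal=10.0, gain=0.2, ticks=200):
    sub = Subsystem(level=0.0)
    observations, actions = asyncio.Queue(), asyncio.Queue()
    await asyncio.gather(
        monitor(sub, observations, ticks),
        corrector(goal, gain, observations, actions),
        actuator(sub, actions),
    )
    return sub.level


print(asyncio.run(feedback_loop()))   # close to the goal, 10.0
```

Because the components run concurrently, the monitor can report a level before the previous correction has been applied, so the corrector sometimes acts on a slightly stale observation; the small gain keeps the loop stable despite this delay.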


Scalaris with feedback structures


A peer-to-peer key/value store: Scalaris
$$S_{\text{Scalaris}} = S_{\text{key-value}} \wedge S_{\text{connect}} \wedge S_{\text{route}} \wedge S_{\text{load}} \wedge S_{\text{replica}} \wedge S_{\text{trans}}$$
[Figure: Scalaris architecture showing the dependencies between its feedback structures, $S_{\text{connect}} \rightarrow S_{\text{route}} \rightarrow S_{\text{replica}} \rightarrow S_{\text{trans}}$, together with $S_{\text{load}}$]
- Scalaris is a high-performance self-managing key/value store that provides transactions and is built on top of a structured overlay network
  - A major result of the European SELFMAN project (www.ist-selfman.org)
  - 4000 read-modify-write transactions per second on two dual-core Intel Xeon processors at 2.66 GHz
- The Scalaris specification is a conjunction of six properties: one functional property ($S_{\text{key-value}}$) and five non-functional ones. Each non-functional property is implemented by one feedback structure.
- Scalaris has five WIFS: connectivity management ($S_{\text{connect}}$), routing ($S_{\text{route}}$), load balancing ($S_{\text{load}}$), replica management ($S_{\text{replica}}$), and transaction management ($S_{\text{trans}}$)

Scalaris scalability


Scalaris is based on a structured overlay network
[Figure: a ring of nodes with fingers]
- Structured overlay networks are often based on a ring
  - By far the most popular structure, it has many variants and has been extensively studied
- Self-organization is done at two levels:
  - The ring ensures connectivity: it must always exist despite node joins, leaves, and failures
  - The fingers provide efficient routing: they can be temporarily in an inconsistent state


Structured overlay networks: inspired by peer-to-peer
(R = routing table size, H = number of hops per lookup)
- Hybrid (client/server)
  - Napster
  - R = N-1 (hub), R = 1 (others), H = 1
- Unstructured overlay
  - Gnutella, Kazaa, Morpheus, Freenet, …
  - Uses flooding
  - R = ? (variable), H = 1…7 (but no guarantee)
- Structured overlay
  - Exponential network
  - DHT (Distributed Hash Table), e.g., Chord, DKS, Scalaris, Beernet, etc.
  - R = log N, H = log N (with guarantee)
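The logarithmic routing guarantee can be checked with a Chord-style sketch: nodes sit on an identifier ring of size 2**M, each keeps M fingers (finger i points to the first node at or after n + 2**i), and greedy routing always jumps to the farthest finger that does not pass the key. This follows the Chord scheme as an illustration; it is not Scalaris's or Beernet's routing code, and the ring size, node count, and seed are arbitrary.

```python
import bisect
import random

M = 10                 # identifier bits (assumed); the ring has 2**M positions
SPACE = 2 ** M


def successor(nodes, key):
    """First node at or after `key` going clockwise (`nodes` is sorted)."""
    i = bisect.bisect_left(nodes, key % SPACE)
    return nodes[i % len(nodes)]


def fingers(nodes, n):
    """Routing table of node n: finger i is successor(n + 2**i)."""
    return [successor(nodes, n + 2 ** i) for i in range(M)]


def in_open(x, a, b):
    """x in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x, a, b):
    """x in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)


def lookup(nodes, start, key):
    """Greedy routing from `start`; returns (node responsible for key, hops)."""
    key %= SPACE
    node, hops = start, 0
    while True:
        succ = successor(nodes, node + 1)
        if in_half_open(key, node, succ):
            return succ, hops + 1            # last hop: to the responsible node
        nxt = succ
        for f in reversed(fingers(nodes, node)):
            if in_open(f, node, key):        # farthest finger not past the key
                nxt = f
                break
        node, hops = nxt, hops + 1


nodes = sorted(random.Random(42).sample(range(SPACE), 64))
owner, hops = lookup(nodes, nodes[0], 777)
print(owner == successor(nodes, 777), hops)  # True and a hop count of at most M + 1
```

Each routing step at least halves the remaining distance to the key's predecessor on the identifier ring, which is where the guaranteed logarithmic hop count comes from.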


A “relaxed” structured overlay network
[Figure: relaxed ring showing the perfect ring (in red) with bushes attached to it]
- The relaxed ring is completely asynchronous
  - Join and leave are completely asynchronous (as opposed to Scalaris, where they are synchronous)
  - Beernet implements the relaxed ring
- There is a perfect ring (in red) as a subset of the relaxed ring
  - The bushes appear only if there are failure suspicions
  - The relaxed ring is always converging to a perfect ring
- The bushiness depends on churn (rate of change of the ring: leaves/joins) and on the failure suspicion rate (communication delays)

More on the relaxed ring
- False failure suspicions are common on the Internet
  - We do not want to eject the node from the ring when this happens
- The relaxed ring solves this by doing ring maintenance in asynchronous fashion [Mejias 2008]
  - Nodes communicate through message passing
  - For a join, instead of one step involving 3 peers (as in Scalaris, Chord, or DKS), we have two steps each with 2 peers → we do not need locking or a periodic stabilization algorithm
- Invariant: Every peer is in the same ring as its successor


Phases in the relaxed ring
- The relaxed ring has (at least) three phases
  - Uses the ring merge algorithm developed in SELFMAN [Shafaat 2009]
  - We are studying how the ring reacts to external stress (phase transitions)
- Key questions:
  - How do the phases show up at the application layer? (“qualitative changes”)
  - How do we know when we are near a phase transition? (“early bubbling”)

Phases in large systems
[Figure: water phase diagram (Copyright © Martin Chaplin)]
- A phase is a concise characterization of an aggregate behavior in a system consisting of many interacting components
- Phases appear in many large systems
  - Not just physical systems (water) but also computing systems (like peer-to-peer)
- Different parts of the system can be in different phases
  - Depending on the local operating conditions (environment)
  - Boundaries between phases can be sharp or diffuse
  - Phase transitions and critical points can occur if operating conditions change

Complex components (supplement)


Some complex components
- Human intelligence
  - Main strength: adaptability (dynamic creation of new feedback loops)
- Program intelligence
  - Can easily go beyond human intelligence in many areas!
    - Minesweeper digital assistant: uses constraints (easy to program!)
    - Chess: uses alpha-beta search with heuristics
    - Compiler: translates a human-readable program into executable form
  - Turing test is irrelevant: complex components are already replacing humans in more and more areas
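As an illustration of the chess item, here is a minimal alpha-beta search over an explicit game tree (nested lists, with numbers as heuristic leaf scores). A real chess program generates the tree from board positions and adds move ordering, depth limits, and evaluation heuristics; those parts are omitted, and the tree below is made up.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, stats=None):
    """Minimax value of `node` with alpha-beta pruning.

    A node is either a number (leaf score) or a list of child nodes.
    `stats["leaves"]` counts how many leaves were actually evaluated.
    """
    if not isinstance(node, list):
        if stats is not None:
            stats["leaves"] = stats.get("leaves", 0) + 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, stats))
            alpha = max(alpha, value)
            if alpha >= beta:
                break   # the minimizing opponent will never allow this line
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, stats))
        beta = min(beta, value)
        if alpha >= beta:
            break       # we already have a better alternative elsewhere
    return value


tree = [[3, 5], [6, 9], [1, 2]]    # our move, then the opponent's reply
stats = {}
print(alphabeta(tree, True, stats=stats), stats["leaves"])   # 6 5
```

Only 5 of the 6 leaves are evaluated: once the third reply yields 1, which is worse than the 6 already guaranteed by the second move, its remaining leaf cannot change the result and is pruned.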


Properties of complex components
- Complex components are essential parts of many large systems
  - For example, conscious control in the human respiratory system
- Complex components completely solve a problem inside a specific (small) part of the space of system operating conditions (from the viewpoint of the rest of the system)
  - Conscious control, a chess program, and a compiler are extremely smart within their operating space
  - Outside of this space, they can be very stupid and should be inactive (of their own accord or forced)
- Complex components are completely unpredictable when viewed from the outside
  - If it were not so, they would not be needed!
  - They can be highly nonlinear and unstable; the rest of the system has to trust them (typically, up to some hardwired fail-safe)


Power is built in, not added on
[Figure: Porsche Carrera GT, “3.6-Liter Biturbo Motor with 353 kW (480 HP)”]
- The power of a system depends on the strength of its complex components
  - The human respiratory system uses conscious control (e.g., to avoid drowning!)
  - Erlang OTP uses supervisor trees and a database to implement robustness
  - Scalaris uses Paxos consensus and replication to implement fast transactions
  - Google Search uses eigenvector calculation of the Web link matrix
  - What does your system use?
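The eigenvector calculation mentioned for Google Search can be illustrated with PageRank-style power iteration on a tiny link graph. The damping factor 0.85, the three-page graph, and the assumption that every page has at least one outgoing link are choices made for this sketch; it is not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for the principal eigenvector of the damped link matrix.

    `links` maps each page to the pages it links to; every page is assumed
    to have at least one outgoing link, so the total rank stays equal to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            share = damping * rank[page] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({p: round(r, 3) for p, r in ranks.items()})   # C ranks highest, then A, then B
```

C, which receives links from both other pages, ends up with the highest rank. The total rank stays 1 because every page passes all of its damped rank along its outgoing links.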

Why is conscious control so smart?
- Cognitive science and neuroscience try to understand why
- Conscious control is a bricklayer: it continuously builds and organizes new components on top of existing components
  - This process is continuous from birth, with a compound-interest effect, which is why humans are so smart in common-sense tasks
- It continuously brings the most useful concepts to the top (cache organization combined with “grandmother cell”)
  - The brain uses brute force, but in a very smart way
  - Manipulating common concepts is made easy
- “Mirror neurons”: it can use its own components to simulate other humans, which is why humans can empathize so well with others
- It can efficiently execute up to two complex programs at once (“walking and chewing gum”), because of the two-lobed structure of the brain


Clouds and elastic applications (supplement)


Elastic computing
- Two main infrastructures for scalable computing
  - Peer-to-peer: use of client machines
  - Cloud-based: use of datacenters
- Cloud is elastic; peer-to-peer is not
  - Elasticity: the ability to scale resource usage up and down rapidly according to instantaneous demand
  - Elasticity is a new property that did not exist before clouds
- Elasticity makes possible a new kind of application
  - Applications that use enormous computational and storage resources for short times, but at constant (low) cost
  - Applications that use data-intensive algorithms and machine learning


Clouds are the first key: much more than meets the eye!
- Cloud computing is a form of client-server where the “server” is a dynamically scalable network of loosely coupled heterogeneous nodes that are owned by a single institution
  - It allows enterprises to offload their computing infrastructure
  - It gives mobile devices an easy way to manage data
- Is that all that cloud computing offers?
  - No! This is just the tip of the iceberg!
  - Cloud computing is the beginning of a much more profound change


Clouds are elastic!
- Elasticity is the ability to ramp up resources quickly to meet demand
  - With elastic clouds the enormous dark blue area becomes available
  - Applications that need enormous resources for short times can get them for low cost!
- Like electric power distribution, pay only for the volume (cost is the product of time and number of machines)
  - This is exactly what intelligent applications need!
[Figure: resources r plotted against time interval t. Local resources cover the small region up to r0 and t0; elastic resources cover the much larger (dark blue) region bounded by $r \cdot t \le c_0$.]
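Because cost is proportional to the volume r · t (“cost is the product of time and number of machines”), a fixed budget c_0, measured here in machine-hours, buys any combination of r and t under the hyperbola. The budget of 1000 machine-hours below is an arbitrary example:

```latex
\[
r \cdot t \le c_0, \qquad c_0 = 1000 \text{ machine-hours:}\quad
\underbrace{1000 \times 1\,\text{h}}_{\text{elastic burst}}
= \underbrace{10 \times 100\,\text{h}}_{\text{small cluster}}
= \underbrace{1 \times 1000\,\text{h}}_{\text{single machine}}
= 1000 \text{ machine-hours}
\]
```

All three options cost the same; the elastic burst simply finishes 1000 times sooner than the single machine, assuming the job parallelizes perfectly, which real workloads rarely do.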


Machine learning is the second key
- Machine learning is the discipline that studies how to program computers to evolve behaviors based on example data or past experience
- Machine learning can solve complex problems that we cannot solve in any other way
  - It has many successes in practical applications both big and small, e.g., speech recognition, computer vision (face and handwriting, etc.), social prediction (epidemics, economics, retail, etc.), robot control (drones, cars, etc.), data mining, aiding natural sciences (biology, astronomy, neurology, etc.)
  - It is a major force on the Internet in big companies (Google, Amazon, Netflix, Facebook, etc.) as well as in startups (e.g., Recorded Future)
- Machine learning will (eventually) transform programming!
  - Programmers will not work on raw data any more; instead they will build machine learning systems
  - “Programming, like all engineering, is a lot of work; machine learning is more like farming, where we let nature do most of the work” – Pedro Domingos

An elastic application (1): real-time voice translation
- The pieces of this application already exist; for example, the IRCAM research institute has implemented many of them
- It requires combining domain knowledge (in sound and language) with an enormous sound fragment database, hosted on a cloud
- Franz Och, head of translation services at Google, announced that they are working on something similar (Feb. 10, 2010)
- Rick Rashid, head of research at Microsoft, has recently demonstrated a prototype of this application (Nov. 2012)
[Figure (purely hypothetical design!): English voice → normalization to canonical voice → decomposition into phoneme sequences → lookup in the English/Chinese sound fragment database → concatenative synthesis → denormalization to original voice → Chinese voice]

An elastic application (2): ubiquitous augmentation
- Your sensory input will be “augmented” in real time
  - Faces, objects, and names you see will be recognized
  - Selected relevant information will be given spontaneously
  - Foreign languages (text, audio, visual) will be translated
  - When doing an activity, you will be guided to do it expertly
  - When confronted with a problem, solutions will be suggested
- The augmentation will be good enough that it can always be enabled (it doesn’t get in your way)
  - It will learn to mesh productively with your thinking processes
  - On the rare occasions that it is disabled, you will feel helpless
    - As if half of your brain just stopped working
    - Like today’s Internet addictions, but much worse!

Space of intelligent applications l XL

Tomorrow’s applications

Real-‐&me audio language transla&on

Learning/setup phase

elastic resource requirements l (learning time constraints) l L

Weather forecas3ng

Optimization is a form of learning!

Advanced applications

Recorded Future

Standard applications l M

l S

Google Search Google Translate l Recommenda3on sys. Speech recogni3on l Skype connec3on l Social networks l Media transla3on

BitTorrent WIMP GUI l MicrosoP Oﬃce

l One-‐shot

l Interactivity

l Intelligent

Wolfram Alpha Image recogni3on

Championship chess (Deep Fritz) IBM Jeopardy(Watson)

Computer algebra Peer-‐to-‐peer CDN Google Earth l JIT Compiler

MMORPG l Role-‐playing games l Chess programs

l M

l L

l S

augmented reality

l XL

Query/use phase

l One-‐way stream l Conversa3on

Real-‐&me expert guidance Crea&ve problem solving (controlled search)

(learning + query)

Apr. 2013 P. Van Roy, UCL, Louvain-la-Neuve

elastic resource requirements l (response time constraints) 47


Conclusions

- Design of large distributed systems is difficult
  - Not just because of their own complexity, but because the environment becomes more hostile
- How can we design them?
  - Weakly interacting feedback structures with dominant subsets
  - Complex components to solve the problem in limited conditions
  - Phases define behavior over all possible operating conditions
- Validation
  - Learn from existing systems that work
  - Inspiration from the SELFMAN project on self-managing systems
  - Inspiration from biological systems
- Proposed architecture
  - First validation with the Scalaris and Beernet transactional key/value stores
- Ongoing research
  - Formalization and semantics
  - Tie the approach to existing quantitative techniques (control theory, model checking, system dynamics)
  - Collaboration with system and application builders
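To make the idea that "phases define behavior over all possible operating conditions" a little more concrete, here is a toy sketch. It is not taken from Scalaris or Beernet: it models a single hypothetical replica-repair feedback loop, and the names `repair_step` and `simulate`, the target of 5 replicas, and the fixed per-step failure and repair rates are all assumptions for illustration. The same loop behaves qualitatively differently depending on the environment: while repair capacity keeps up with failures, the loop holds the system at its target; once failures outpace repair, the system falls into a different phase (total data loss) from which this loop alone cannot recover.

```python
# Toy model (hypothetical, for illustration only): one feedback loop that
# repairs replicas, run under a constant, hostile failure rate.


def repair_step(live: int, target: int, repair_capacity: int) -> int:
    """One iteration of a hypothetical replica-repair feedback loop."""
    if live == 0:
        # All replicas lost: there is nothing left to copy from,
        # so the loop can no longer act on the system.
        return 0
    deficit = max(0, target - live)               # monitor and compare with the goal
    return live + min(deficit, repair_capacity)   # actuate, bounded by repair capacity


def simulate(failures_per_step: int, repair_capacity: int,
             target: int = 5, steps: int = 10) -> list[int]:
    """Return the live replica count after each step."""
    live = target
    history = []
    for _ in range(steps):
        live = max(0, live - failures_per_step)   # disturbance from the environment
        live = repair_step(live, target, repair_capacity)
        history.append(live)
    return history


if __name__ == "__main__":
    # Repair keeps up with failures: the loop holds the target.
    print(simulate(failures_per_step=2, repair_capacity=3))
    # Failures outpace repair: the system drops into the data-loss phase.
    print(simulate(failures_per_step=4, repair_capacity=3))
```

With these assumed parameters, `simulate(2, 3)` stays at 5 live replicas at every step, while `simulate(4, 3)` drops to 4 after the first step and to 0 after the second. A fuller design in the spirit of the talk might add further loops (for example, one that adjusts the target as the observed failure rate changes), and it is the weak interaction between such loops that keeps the combined system analyzable.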


References (1)

- Joe Armstrong. Making Reliable Distributed Systems in the Presence of Software Errors. Ph.D. dissertation, Royal Institute of Technology (KTH), Kista, Sweden, Nov. 2003.
- Ken Birman, Gregory Chockler, and Robbert van Renesse. "Toward a Cloud Computing Research Agenda." 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware, ACM SIGACT News, 40(2): 68-80, June 2009.
- Alexandre Bultot. A Survey of Systems with Multiple Interacting Feedback Loops and Their Application to Programming. Master's report, Dept. of Comp. Sci. and Eng., UCL, Aug. 2009.
- Rick Cattell. "High Performance Scalable Data Stores." Feb. 22, 2010.
- Raphaël Collet. The Limits of Network Transparency in a Distributed Programming Language. Ph.D. dissertation, Dept. of Comp. Sci. and Eng., UCL, Dec. 2007.
- Michael Fischer, Nancy Lynch, and Michael Paterson. "Impossibility of Distributed Consensus with One Faulty Process." Journal of the ACM, 32(2): 374-382, April 1985.
- Seth Gilbert and Nancy Lynch. "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services." ACM SIGACT News, 33(2): 51-59, 2002.
- Rachid Guerraoui and Luís Rodrigues. Introduction to Reliable Distributed Programming. Springer-Verlag, 2006.
- Márk Jelasity and Özalp Babaoglu. "T-Man: Gossip-based Overlay Topology Management." Proc. 3rd Int. Workshop on Engineering Self-Organising Systems (ESOA 2005), Springer-Verlag LNCS volume 3910, 2006, pp. 1-15.
- Boris Mejías. A Relaxed Ring for Self-Managing Decentralized Systems with Transactional Replicated Storage. Ph.D. dissertation, Dept. of Comp. Sci. and Eng., UCL, Oct. 2010.
- Gerhard Michal and Dietmar Schomburg. Biochemical Pathways: An Atlas of Biochemistry and Molecular Biology. Wiley-Blackwell, 1999 (first edition), 2012 (second edition).


References (2)

- Florian Schintke, Alexander Reinefeld, Seif Haridi, and Thorsten Schütt. "Enhanced Paxos Commit for Transactions on DHTs." 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2010), Melbourne, Australia, May 17-20, 2010.
- SELFMAN: Self Management for Large-Scale Distributed Systems Based on Structured Overlay Networks and Components. European 6th Framework Programme, www.ist-selfman.org, 2009.
- Peter M. Senge et al. The Fifth Discipline Fieldbook: Strategies and Tools for Building a Learning Organization. Nicholas Brealey Publishing, 1994.
- Tallat M. Shafaat, Ali Ghodsi, and Seif Haridi. "Dealing with Network Partitions in Structured Overlay Networks." Journal of Peer-to-Peer Networking and Applications, 2(4): 334-347, 2009.
- Steven Strogatz. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering (Studies in Nonlinearity). Perseus Books, 1994.
- Nassim Taleb. The Black Swan: The Impact of the Highly Improbable. Penguin Books, 2008.
- Peter Van Roy, Seif Haridi, and Alexander Reinefeld. "Software Design with Weakly Interacting Feedback Structures and Its Application to Distributed Systems." Research Report, Dept. of Comp. Sci. and Eng., UCL, 2011.
- Peter Van Roy. "Programming Paradigms for Dummies: What Every Programmer Should Know." Chapter in New Computational Paradigms for Computer Music, G. Assayag and A. Gerzso (eds.), IRCAM/Delatour France, June 2009.
- Gerald M. Weinberg. An Introduction to General Systems Thinking. Dorset House Publishing, 1975 (Silver Anniversary Edition 2001).
- Norbert Wiener. Cybernetics, or Control and Communication in the Animal and the Machine. MIT Press, Cambridge, MA, 1948.
- Ulf Wiger. "Four-fold Increase in Productivity and Quality – Industrial Strength Functional Programming in Telecom-Class Products." Ericsson Telecom AB, 2001.
