Models of Distributed Computing


Author: Mae Rice

COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015)

Models of Distributed Computing

Noah Mendelsohn Tufts University Email: [email protected] Web: http://www.cs.tufts.edu/~noah

Architecting a universal Web

• Identification: URIs
• Interaction: HTTP
• Data formats: HTML, JPEG, GIF, etc.

© 2010 Noah Mendelsohn

Goals

• Introduce basics of distributed system design
• Explore some traditional models of distributed computing
• Prepare for discussion of REST: the Web’s model

Communicating systems

[Diagram: two machines, each with CPU, memory, and storage, exchanging messages]

• We have multiple programs, running asynchronously, sending messages
• Communicating Sequential Processes: we’ve got pretty clean higher-level abstractions for use on a single machine
• How can we get a clean model of two communicating machines?

Reference: http://www.usingcsp.com/cspbook.pdf (very theoretical)

Large scale systems

How can we get a clean model of a worldwide network of communicating machines?

[Diagram: the Internet]

What are the clean abstractions on this scale?

WARNING!!

• This is a very big topic…
• …many important approaches have been studied and used…
• …there is lots of operational experience, and also formalisms…

This presentation does not attempt to be either comprehensive or balanced. The goal is to introduce some key concepts.

Traditional Models of Distributed Computing: Message Passing

Message passing

[Diagram: two machines, each with CPU, memory, and storage, connected by a link]

• Programs send messages to and from each other’s memories
• Half duplex: one way at a time
• Full duplex: both ways at the same time

Message passing

• Data abstraction:
  – Low level: bytes (octets)
  – Sometimes: agreed metaformat (XML, C struct, etc.)
• Synchronization:
  – Wait for message
  – Timeout
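The data abstraction and synchronization points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wire format (a 4-byte length followed by UTF-8 text) and the helper names send_msg, recv_exact, and recv_msg are made up for this example and do not come from the slides. A local socket pair stands in for two machines.

```python
import socket
import struct

def send_msg(sock, text):
    # Assumed metaformat: 4-byte big-endian length, then UTF-8 bytes.
    payload = text.encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    # At the lowest level the network delivers bytes, possibly in pieces.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def recv_msg(sock, timeout_s):
    # Synchronization: wait for a message, but give up after timeout_s seconds.
    sock.settimeout(timeout_s)
    try:
        (length,) = struct.unpack("!I", recv_exact(sock, 4))
        return recv_exact(sock, length).decode("utf-8")
    except socket.timeout:
        return None

# A connected socket pair plays the role of the two communicating programs.
a, b = socket.socketpair()
send_msg(a, "hello")
first = recv_msg(b, 1.0)     # a message is waiting, so this returns it
second = recv_msg(b, 0.05)   # nothing more was sent, so this times out
a.close()
b.close()
```

The receiver has to know the format in advance; nothing in the byte stream itself says where one message ends and the next begins.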

Interaction Patterns

Between pairs of machines

[Diagram: two machines; a request flows one way and a response flows back]

• Message passing: no constraints
• Common pattern: request/response

Traditional Models of Distributed Computing: Client / Server

Client / server

[Diagram: a client machine sends a request for service to a server machine, which returns a response]

• Request / response is a traffic pattern
• Client / server describes the roles of the nodes
• Server provides service for client

Client / server

• Probably the most common distributed system architecture
• Simple and well understood
• Doesn’t explain:
  – How to exploit more than 2 machines
  – How to make programming easier
  – How to prove correctness (though the simple model helps)
• Most client/server systems are request/response
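A minimal sketch of the two roles, assuming a loopback TCP connection and a made-up "echo" service (neither is from the slides). The server waits for a request and answers it; the client sends a request and blocks until the response arrives. For brevity each message is read with a single recv call, which is adequate only for tiny messages like these.

```python
import socket
import threading

def serve_once(listener):
    # Server role: accept one client, read its request, send one response.
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024).decode("utf-8")
        conn.sendall(("echo: " + request).encode("utf-8"))

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()

# Client role: send the request, then wait for the response.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"ping")
    reply = client.recv(1024).decode("utf-8")

server.join()
listener.close()
```

Note that the roles are fixed by who listens and who connects, while request/response describes only the order of the messages.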

Traditional Models of Distributed Computing: N-Tier

N-tier, also called Multilevel Client/Server

[Diagram: three machines in a chain; each request passes from one tier to the next, and responses return the same way]

• Layered
• Each tier provides services for next higher level
• Reasons:
  – Information hiding
  – Management
  – Scalability
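The layering can be sketched in-process, with one class per tier. Everything here is hypothetical: the class names, the flight code TU100, and the seat count are invented for illustration, and in a real system each tier would run on its own machine and be reached over a network.

```python
class DataTier:
    """Back end: owns the records and knows their layout."""
    def __init__(self):
        self._rows = {"TU100": 3}   # made-up flight -> seats left

    def seats_left(self, flight):
        return self._rows.get(flight, 0)

    def take_seat(self, flight):
        self._rows[flight] -= 1

class LogicTier:
    """Mid tier: business rules; uses only the data tier's services."""
    def __init__(self, data):
        self._data = data

    def reserve(self, flight):
        if self._data.seats_left(flight) <= 0:
            return "sold out"
        self._data.take_seat(flight)
        return "confirmed"

class ClientTier:
    """Front end: talks only to the mid tier, never to the data tier."""
    def __init__(self, logic):
        self._logic = logic

    def book(self, flight):
        return f"{flight}: {self._logic.reserve(flight)}"

client = ClientTier(LogicTier(DataTier()))
results = [client.book("TU100") for _ in range(4)]
```

Because the client never sees how the data tier stores its rows, the storage layout can change without touching the client, which is the information hiding the slide refers to.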

Typical N-tier system: airline reservation

[Diagram: three tiers]
  1. Browser or Phone App: iPhone or Android Reservation Application
  2. Application logic: Flight Reservation Logic
  3. Application logic: Reservation Records

Many commercial applications work this way

The Web itself is a 2 or 3 Tier system

[Diagram: tiers connected in a chain]
  1. Browser (e.g. Firefox)
  2. Proxy Cache (optional!) (e.g. Squid)
  3. Web Server (e.g. Apache)

Many commercial applications work this way

Web Reservation System

[Diagram: tiers connected in a chain, with the protocol used on each link]
  1. Browser or Phone App
     (HTTP)
  2. Proxy Cache (optional!) (e.g. Squid)
     (HTTP)
  3. Application logic: Web-Based Reservation Application, Flight Reservation Logic
     (RPC? ODBC? Proprietary?)
  4. Application logic: Reservation Records

Many commercial applications work this way

Web Publishing System

[Diagram: tiers connected in a chain]
  1. Browser or Phone App
  2. Content Distribution Network (e.g. Akamai)
  3. Content Web Site (e.g. cnn.com)
  4. Database or CMS: Content Management System

Many commercial applications work this way

Advantages of n-tier system

• Separation of concerns: each layer has its own role
• Parallelism and performance?
  – If done right: multiple mid-tier servers work in parallel
  – Back-end systems centralize mainly the data that requires sharing and synchronization
  – Mid tier can provide shared, scalable caching
• Information hiding
  – Mid-tier apps shielded from data layout
• Security
  – Credit card numbers etc. not stored at mid-tier

Other patterns

• Spanning tree
• Broadcast (send to many nodes at once)
• Flood
• Various P2P
• Etc.

Traditional Models of Distributed Computing: Remote Procedure Call

Remote Procedure Call

• The term RPC was coined by the late Bruce Nelson in his 1981 CMU PhD thesis
• Key idea: an ordinary function call executes remotely
• The trick: the language runtime or helper code must automatically generate code to send parameters and results
• For languages like C: proxies and stubs are generated
  – Not needed in dynamic languages like Ruby, JavaScript, etc.
• RPC is often (erroneously IMO) used to describe any request / response system
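One way to see why dynamic languages can skip generated proxies: the runtime can intercept a call to a method that does not exist locally and forward it. The sketch below uses Python rather than Ruby or JavaScript. The JSON message shape, the names DynamicProxy and server_dispatch, and the in-process "transport" are all assumptions made for illustration.

```python
import json
import math

def server_dispatch(request_bytes):
    """Server side: decode the request, call the named function, encode the reply."""
    request = json.loads(request_bytes)
    exported = {"sqrt": math.sqrt}   # hypothetical set of remotable functions
    result = exported[request["fn"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

class DynamicProxy:
    """Client side: turns any unknown method name into a remote call at run time."""
    def __init__(self, transport):
        self._transport = transport

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, e.g. for 'sqrt'.
        def remote_call(*args):
            request = json.dumps({"fn": name, "args": list(args)}).encode("utf-8")
            reply = json.loads(self._transport(request))
            return reply["result"]
        return remote_call

# The transport here is a direct function call; a real one would cross a network.
remote = DynamicProxy(server_dispatch)
x = remote.sqrt(4)
```

No per-function proxy code was written or generated: the single __getattr__ hook covers every function the server exports.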

RPC: Call remote functions automatically

Client machine (CPU, memory, storage):

    x = sqrt(4)

    // proxy
    float sqrt(float n) {
        send n;
        read s;
        return s;
    }

Request: invoke sqrt(4)
Response: result=2 (no exception thrown)

Server machine (CPU, memory, storage):

    // stub
    void doMsg(Msg m) {
        s = sqrt(m.s);
        send s;
    }

    float sqrt(float n) {
        …compute sqrt…
        return result;
    }

• Interface definition: float sqrt(float n);
• Proxies and stubs generated automatically
• RPC provides transparent remote invocation
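The proxy and stub in the diagram can be written out by hand as a runnable sketch. The assumptions: a local socket pair stands in for the network, arguments and results travel as 8-byte doubles, and the names stub_do_msg and real_sqrt are invented here. In a real RPC system this code would be generated from the interface definition rather than written manually.

```python
import math
import socket
import struct
import threading

client_sock, server_sock = socket.socketpair()

def real_sqrt(n):
    """The actual implementation, which lives on the server."""
    return math.sqrt(n)

def stub_do_msg(sock):
    """Server-side stub: unpack the argument, call the real function, send the result."""
    (n,) = struct.unpack("!d", sock.recv(8))   # one recv suffices for 8 bytes here
    sock.sendall(struct.pack("!d", real_sqrt(n)))

def sqrt(n):
    """Client-side proxy: looks like the real function, but sends n and reads s."""
    client_sock.sendall(struct.pack("!d", n))
    (s,) = struct.unpack("!d", client_sock.recv(8))
    return s

server = threading.Thread(target=stub_do_msg, args=(server_sock,))
server.start()
x = sqrt(4)   # reads like a local call; the work happens in the server thread
server.join()
client_sock.close()
server_sock.close()
```

The caller cannot tell from the call site that sqrt is remote, which is the transparency the slide describes. The sketch has no way to report a server-side exception, one of the details that real systems must get right.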

RPC: Pros and Cons

• Pros:
  – Transparency is very appealing
  – Simple programming model
  – Useful as organizing principle even when not fully automated
• Cons:
  – Getting language details right is tricky (e.g. exceptions)
  – No client/server overlap: the client blocks while the server works, so RPC doesn’t work well for long-running operations
  – May not optimize large transfers well
  – Not all APIs make sense to remote: e.g. answer = search(tree)
  – Versioning can be a problem: client and server need to agree exactly on interface (or have rules for dealing with differences)

Traditional Models of Distributed Computing: Distributed Object Systems

How do you build an RPC for this?

    class Point {
        int x, y;
        int getx() { return x; }
        int gety() { return y; }
    }

    class Rectangle {
        // ...members and constructors not shown...
        Point getUpperLeft() {…}
        Point getLowerRight() {…}
    }

    int area(Rectangle r) {
        // Call method on remoted object
        int width  = r.getLowerRight().getx() - r.getUpperLeft().getx();
        int height = r.getLowerRight().gety() - r.getUpperLeft().gety();
        return width * height;
    }

    // Pass object to remote method
    Rectangle myRect = new Rectangle();
    // ...assume position set here...
    int a = area(myRect);   // REMOTE THIS CALL!

Distributed object systems make this work!

Distributed object systems

• In the 1990s, seemed like a great idea
• Advantages of OO encapsulation & inheritance + RPC
• Examples:
  – CORBA (industry standard)
  – DCOM (Microsoft)
• Still quite widely used within enterprises
• Complicated:
  – Marshalling object references
  – Distributed object lifetime management
  – Brokering: which object provides the service today
  – Remote “new”: creating objects on remote systems
  – All the pros & cons of RPC, plus the above
• Generally not appropriate at Internet scale
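To make "marshalling object references" concrete, here is a toy sketch, not how CORBA or DCOM actually work. The server keeps the real objects in a table and hands out integer ids; the client holds RemoteRef proxies that forward method calls by id. ObjectServer, RemoteRef, and the ("value", …) / ("ref", …) reply shape are all invented for this example, and calls are made in-process rather than over a network.

```python
import itertools

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def getx(self): return self.x
    def gety(self): return self.y

class Rectangle:
    def __init__(self, upper_left, lower_right):
        self._ul, self._lr = upper_left, lower_right
    def getUpperLeft(self): return self._ul
    def getLowerRight(self): return self._lr

class ObjectServer:
    """Owns the real objects; gives out integer ids instead of the objects."""
    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)

    def export(self, obj):
        oid = next(self._ids)
        self._objects[oid] = obj
        return oid

    def invoke(self, oid, method):
        result = getattr(self._objects[oid], method)()
        if isinstance(result, (int, float, str)):
            return ("value", result)           # plain data is copied
        return ("ref", self.export(result))    # objects are passed by reference

class RemoteRef:
    """Client-side stand-in for an object that lives on the server."""
    def __init__(self, server, oid):
        self._server, self._oid = server, oid

    def __getattr__(self, method):
        def call():
            kind, payload = self._server.invoke(self._oid, method)
            return payload if kind == "value" else RemoteRef(self._server, payload)
        return call

def area(r):
    width = r.getLowerRight().getx() - r.getUpperLeft().getx()
    height = r.getLowerRight().gety() - r.getUpperLeft().gety()
    return width * height

server = ObjectServer()
rect_id = server.export(Rectangle(Point(1, 2), Point(4, 6)))
a = area(RemoteRef(server, rect_id))
exported_count = len(server._objects)   # the rectangle plus one entry per returned Point
```

Computing one area took eight remote invocations and left five entries in the server's table that nothing ever releases, a small-scale view of the lifetime management problem listed above.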

Traditional Models of Distributed Computing: Some Other Options

Special Purpose Models

• Remote File System
  – Network provides transparent access to remote files
  – Examples: NFS, CIFS
• Remote Database
  – Examples: ODBC, JDBC
• Remote Device
  – Remote printing, disk drive etc.
• Virtual terminal
  – One computer simulates an interactive terminal to another

Some other interesting models

• Broadcast / multicast
  – Send messages to everyone (broadcast) / named group (multicast)
• Publish / subscribe (pub/sub)
  – Subscribe to named events or based on query filter
  – Example: “Call me whenever Pepsi’s stock price changes”
  – Implements a distributed associative memory
• Reliable queuing
  – Examples: IBM MQSeries, Java Message Service (JMS)
  – Model: queued messages, preserved across hardware crashes
  – Widely used for bank machine transactions and long-running (multi-day) eCommerce transactions
  – Depends on disk-based transaction systems at each node to keep queues
• Tuple spaces
  – Pioneered by Gelernter at Yale (Linda kernel), picked up by Jini (Sun), and TSpaces (IBM)
  – Network-scale shared variable space, with synchronization
  – Good for queues of work to do: some cloud architectures use a related model to distribute work to servers
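The pub/sub bullet can be illustrated with a tiny in-process broker that supports both styles of subscription: by event name, and by event name plus a query filter. The Broker class, the topic names, and the price figures are all made up for this sketch; a real broker would sit on the network between publishers and subscribers.

```python
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subs = defaultdict(list)   # topic -> list of (filter, callback)

    def subscribe(self, topic, callback, where=lambda event: True):
        # 'where' is an optional query filter; the default accepts every event.
        self._subs[topic].append((where, callback))

    def publish(self, topic, event):
        # The publisher does not know who, if anyone, is listening.
        for where, callback in self._subs[topic]:
            if where(event):
                callback(event)

broker = Broker()

# "Call me whenever the price changes": subscribe by name only.
calls = []
broker.subscribe("stock/PEP", calls.append)

# Subscribe with a filter: only moves of at least 1.00.
big_moves = []
broker.subscribe("stock/PEP", big_moves.append,
                 where=lambda e: abs(e["change"]) >= 1.0)

broker.publish("stock/PEP", {"price": 95.10, "change": 0.25})
broker.publish("stock/PEP", {"price": 96.40, "change": 1.30})
broker.publish("stock/KO", {"price": 40.00, "change": 2.00})   # no subscribers
```

Publishers and subscribers are coupled only through the topic name, so either side can be added or removed without changing the other.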

Stateful and Stateless Protocols

• Stateful: server knows which step (state) has been reached
• Stateless:
  – Client remembers the state, sends to server each time
  – Server processes each request independently
• Can vary with level
  – Many systems like the Web run stateless protocols (e.g. HTTP) over streams; at the packet level, TCP streams are stateful
  – HTTP itself is mostly stateless, but many HTTP requests (typically POSTs) update persistent state at the server

Advantages of stateless protocols

• Protocol usually simpler
• Server processes each request independently
• Load balancing and restart easier
• Typically easier to scale and make fault-tolerant
• Visibility: individual requests more self-describing

Advantages of stateful protocols

• Individual messages carry less data
• Server does not have to re-establish context each time
• There’s usually some changing state at the server at some level, except for completely static publishing systems
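The tradeoff can be seen in a small paging example, sketched here with invented names (ITEMS, StatefulServer, stateless_next_page) and plain function calls in place of network requests. The stateful server remembers each client's position; the stateless one is told the position on every request and returns the next one along with the data.

```python
ITEMS = ["a", "b", "c", "d", "e"]   # hypothetical data held by the server
PAGE = 2

class StatefulServer:
    """Remembers, per session, which step each client has reached."""
    def __init__(self):
        self._offset = {}

    def next_page(self, session_id):
        start = self._offset.get(session_id, 0)
        self._offset[session_id] = start + PAGE
        return ITEMS[start:start + PAGE]

def stateless_next_page(offset):
    """Keeps nothing between calls: the client sends its state every time."""
    return ITEMS[offset:offset + PAGE], offset + PAGE

# Stateful: the request is small, but the server must hold the session.
stateful = StatefulServer()
s1 = stateful.next_page("alice")
s2 = stateful.next_page("alice")

# Stateless: the request carries the offset, and the client keeps track of it.
page1, offset = stateless_next_page(0)
page2, offset = stateless_next_page(offset)

# After a restart, the stateful server has lost the client's position...
restarted = StatefulServer()
after_restart = restarted.next_page("alice")

# ...while any stateless replica can continue from the offset the client sends.
page3, _ = stateless_next_page(offset)
```

The restart case is why load balancing and recovery are easier with stateless protocols: any replica can serve any request.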

Text vs. Binary Protocols

Protocols can be text or binary on the wire

• Text: messages are encoded characters
• Binary: any bit patterns
• Pros and cons quite similar to those for text vs. binary file formats
• When sending between compatible machines, binary can be much faster because no conversion is needed
• Most Internet-scale application protocols (HTTP, SMTP) use text for protocol elements and for all content except photo/audio/video
• HTTP 2.0 is moving to binary (for message size and parsing speed)
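As a rough illustration of the difference, the same two-field message can be encoded both ways. The message contents and both layouts (comma-separated ASCII, and a 4-byte unsigned integer followed by an 8-byte double) are assumptions chosen for this sketch, not formats from any real protocol.

```python
import struct

reading = (1234567, 98.6)   # hypothetical message: an id and a measurement

# Text: readable characters; the receiver converts strings back to numbers.
text_msg = f"{reading[0]},{reading[1]}\n".encode("ascii")
id_str, value_str = text_msg.decode("ascii").strip().split(",")
from_text = (int(id_str), float(value_str))

# Binary: fixed layout in network byte order; the receiver just unpacks the fields.
binary_msg = struct.pack("!Id", *reading)
from_binary = struct.unpack("!Id", binary_msg)
```

The text form can be read and debugged by eye but needs parsing and number conversion. The binary form has a fixed size and needs no parsing, but both sides must agree on field order, widths, and byte order.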

Summary

• The machine-level model is complex: multiple CPUs, memories
• A number of abstractions are widely used for limited-scale distribution
• RPC is among the most interesting and successful
• Statefulness / statelessness is a key design tradeoff
• We’ll see next time why a new model was needed for the Web