Models of Distributed Computing

Author: Mae Rice
COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015)

Models of Distributed Computing

Noah Mendelsohn Tufts University Email: [email protected] Web: http://www.cs.tufts.edu/~noah

Architecting a universal Web

Identification: URIs Interaction: HTTP Data formats: HTML, JPEG, GIF, etc.

© 2010 Noah Mendelsohn

Goals
• Introduce basics of distributed system design
• Explore some traditional models of distributed computing
• Prepare for discussion of REST: the Web's model


Communicating systems


Communicating systems

[Figure: two machines, each with CPU, memory, and storage]

We have multiple programs, running asynchronously, sending messages.
• We've got pretty clean higher-level abstractions for use on a single machine.
• How can we get a clean model of two communicating machines?

Reference: Communicating Sequential Processes, http://www.usingcsp.com/cspbook.pdf (very theoretical)

Large scale systems

How can we get a clean model of a worldwide network of communicating machines?

[Figure: many machines communicating across the Internet]

What are the clean abstractions at this scale?

WARNING!!
• This is a very big topic…
• …many important approaches have been studied and used…
• …there is lots of operational experience, and also formalisms…

This presentation does not attempt to be either comprehensive or balanced… the goal is to introduce some key concepts.


Traditional Models of Distributed Computing: Message Passing

Message passing

[Figure: two machines, each with CPU, memory, and storage]

Programs send messages to and from each other's memories.

Half duplex: one way at a time

[Figure: two machines; messages flow in one direction at a time]

Full duplex: both ways at the same time

[Figure: two machines; messages flow in both directions simultaneously]

Message passing
• Data abstraction:
  – Low level: bytes (octets)
  – Sometimes: an agreed metaformat (XML, C struct, etc.)
• Synchronization:
  – Wait for message
  – Timeout
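The following is a minimal sketch of the data-abstraction and synchronization points above. It uses Python's standard `socket` and `struct` modules. The 4-byte length prefix is a hypothetical agreed metaformat chosen only for illustration, and a connected `socketpair()` stands in for two machines. The receiver waits for a message but gives up after a timeout.

```python
import socket
import struct
from typing import Optional

# Hypothetical agreed metaformat: a 4-byte big-endian length, then UTF-8 text.
HEADER = struct.Struct("!I")


def send_msg(sock: socket.socket, text: str) -> None:
    """Frame the message so the receiver knows where it ends."""
    payload = text.encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Stream sockets may return fewer bytes than asked for, so loop."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket, timeout: float) -> Optional[str]:
    """Wait for the next message, giving up after `timeout` seconds."""
    sock.settimeout(timeout)
    try:
        (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
        return recv_exact(sock, length).decode("utf-8")
    except socket.timeout:
        return None  # this sketch does not recover a partially received message


# A connected socketpair stands in for two machines.
a, b = socket.socketpair()
send_msg(a, "hello from A")
print(recv_msg(b, timeout=1.0))  # hello from A
print(recv_msg(b, timeout=0.1))  # None: nothing arrived in time
a.close()
b.close()
```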


Interaction Patterns


Between pairs of machines

[Figure: two machines exchanging a request and a response]

• Message passing: no constraints
• Common pattern: request/response

Traditional Models of Distributed Computing: Client/Server

Client / server

[Figure: the client sends a request for service; the server returns a response]

• Request/response is a traffic pattern
• Client/server describes the roles of the nodes
• The server provides a service for the client
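Here is a small request/response client/server sketch. It assumes Python's standard `socketserver` module running on the loopback interface. The uppercasing "service" and the one-line-per-request framing are hypothetical choices for illustration. What matters is the split of roles: the server waits for requests and provides a service, and the client sends a request and blocks until the matching response arrives.

```python
import socket
import socketserver
import threading


class UppercaseHandler(socketserver.StreamRequestHandler):
    """Server role: read one request line, send back one response line."""

    def handle(self) -> None:
        request = self.rfile.readline().decode("utf-8").rstrip("\n")
        response = request.upper()  # the "service" this server provides
        self.wfile.write((response + "\n").encode("utf-8"))


def call_server(address: tuple, request: str) -> str:
    """Client role: send a request, then wait for the matching response."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall((request + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().rstrip("\n")


# Port 0 asks the OS for any free port on the loopback interface.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UppercaseHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

print(call_server(server.server_address, "hello server"))  # HELLO SERVER

server.shutdown()
server.server_close()
```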


Client/server
• Probably the most common distributed system architecture
• Simple and well understood
• Doesn't explain:
  – How to exploit more than 2 machines
  – How to make programming easier
  – How to prove correctness (though the simple model helps)
• Most client/server systems are request/response


Traditional Models of Distributed Computing: N-Tier

N-tier – also called Multilevel Client/Server

[Figure: three machines in a chain; each tier sends requests to the tier below it and receives responses]

• Layered
• Each tier provides services for the next higher level
• Reasons:
  – Information hiding
  – Management
  – Scalability
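The layering can be pictured with a minimal in-process sketch. The class names and the flight code are hypothetical. Each tier calls only the tier directly below it. In a real n-tier deployment those calls would cross the network (for example over HTTP or RPC), but the dependency structure would be the same.

```python
from dataclasses import dataclass, field


@dataclass
class DataTier:
    """Back end: owns the records; knows nothing about users or formatting."""
    seats: dict = field(default_factory=lambda: {"TF100": 2})

    def get_seats(self, flight: str) -> int:
        return self.seats.get(flight, 0)

    def set_seats(self, flight: str, n: int) -> None:
        self.seats[flight] = n


@dataclass
class LogicTier:
    """Middle tier: business rules; hides the data layout from the front end."""
    data: DataTier

    def book(self, flight: str) -> bool:
        available = self.data.get_seats(flight)
        if available <= 0:
            return False
        self.data.set_seats(flight, available - 1)
        return True


@dataclass
class PresentationTier:
    """Front end (browser or phone app): formats results for the user."""
    logic: LogicTier

    def request_booking(self, flight: str) -> str:
        ok = self.logic.book(flight)
        return f"{flight}: {'confirmed' if ok else 'sold out'}"


app = PresentationTier(LogicTier(DataTier()))
for _ in range(3):
    print(app.request_booking("TF100"))
# TF100: confirmed
# TF100: confirmed
# TF100: sold out
```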


Typical N-tier system: airline reservation

[Figure: Browser or Phone App (e.g. an iPhone or Android reservation application) → Flight Reservation Logic (application logic) → Reservation Records]

Many commercial applications work this way.

The Web itself is a 2 or 3 Tier system

[Figure: Browser (e.g. Firefox) → Proxy Cache, optional (e.g. Squid) → Web Server (e.g. Apache)]

Web Reservation System

[Figure: Browser or Phone App → HTTP → Proxy Cache, optional (e.g. Squid) → HTTP → Web-Based Reservation Application (Flight Reservation Logic) → RPC? ODBC? Proprietary? → Reservation Records]

Web Publishing System

[Figure: Browser or Phone App → Content Distribution Network (e.g. Akamai) → Content Web Site (e.g. cnn.com) → Database or Content Management System (CMS)]

Advantages of n-tier systems
• Separation of concerns: each layer has its own role
• Parallelism and performance?
  – If done right, multiple mid-tier servers work in parallel
  – Back-end systems centralize mainly the data that requires sharing & synchronization
  – The mid tier can provide shared, scalable caching
• Information hiding
  – Mid-tier apps are shielded from the data layout
• Security
  – Credit card numbers etc. are not stored at the mid tier
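One way to picture "shared, scalable caching" at the mid tier is a hypothetical mid-tier object that answers repeated reads from a time-limited cache and goes to the back end only on a miss. This sketch assumes the cached data (here, a flight schedule) can tolerate being up to `ttl_seconds` stale. It is not a recipe for caching data that must stay synchronized.

```python
import time


class BackEnd:
    """Hypothetical back-end data store; counts how often it is actually hit."""

    def __init__(self) -> None:
        self.hits = 0
        self._schedules = {"TF100": "BOS 09:00 -> SFO 12:30"}

    def schedule(self, flight: str) -> str:
        self.hits += 1
        return self._schedules.get(flight, "unknown flight")


class CachingMidTier:
    """Mid tier that caches read-mostly data for a fixed time-to-live."""

    def __init__(self, backend: BackEnd, ttl_seconds: float = 60.0) -> None:
        self.backend = backend
        self.ttl = ttl_seconds
        self._cache = {}  # flight -> (time fetched, value)

    def schedule(self, flight: str) -> str:
        now = time.monotonic()
        entry = self._cache.get(flight)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # served by the mid tier
        value = self.backend.schedule(flight)  # miss: go to the back end
        self._cache[flight] = (now, value)
        return value


backend = BackEnd()
mid = CachingMidTier(backend)
for _ in range(1000):
    mid.schedule("TF100")
print(backend.hits)  # 1: the other 999 requests never reached the back end
```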


Other patterns
• Spanning tree
• Broadcast (send to many nodes at once)
• Flood
• Various P2P
• Etc.


Traditional Models of Distributed Computing: Remote Procedure Call

Remote Procedure Call
• The term RPC was coined by the late Bruce Nelson in his 1981 CMU PhD thesis
• Key idea: an ordinary function call executes remotely
• The trick: the language runtime or helper code must automatically generate code to send parameters and results
• For languages like C: proxies and stubs are generated
  – Not needed in dynamic languages like Ruby, JavaScript, etc.
• RPC is often (erroneously, IMO) used to describe any request/response system
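Python's standard library includes an XML-RPC implementation that illustrates the dynamic-language point. `xmlrpc.client.ServerProxy` turns attribute lookups into remote calls at run time, so no proxy code is generated ahead of time. Here is a minimal sketch, assuming a local server on an OS-chosen port.

```python
import math
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Server side: an ordinary function, registered under a name clients can call.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(math.sqrt, "sqrt")
threading.Thread(target=server.serve_forever, daemon=True).start()

host, port = server.server_address
# Client side: the proxy is built dynamically; no generated stub code.
proxy = xmlrpc.client.ServerProxy(f"http://{host}:{port}/")

# Looks like a local call, but the argument and result travel as XML over HTTP.
x = proxy.sqrt(4)
print(x)  # 2.0

server.shutdown()
server.server_close()
```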


RPC: Call remote functions automatically

Client:
    x = sqrt(4)
Proxy (generated, runs on the client):
    float sqrt(float n) { send n; read s; return s; }
Request: invoke sqrt(4)
Stub (generated, runs on the server):
    void doMsg(Msg m) { s = sqrt(m.n); send s; }
Server implementation:
    float sqrt(float n) { …compute sqrt… return result; }
Response: result=2 (no exception thrown)

• Interface definition: float sqrt(float n);
• Proxies and stubs generated automatically
• RPC provides transparent remote invocation
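To make the proxy/stub split concrete, here is a hand-written version of what the generated code does, mirroring the slide's `sqrt` example. Several details are assumptions for illustration: the JSON message shapes (`{"n": ...}` for the request, `{"s": ...}` for the reply), a `socketpair()` standing in for the network, and a thread playing the server. None of them is any particular RPC system's wire format.

```python
import json
import math
import socket
import threading


def server_sqrt(n: float) -> float:
    """The real implementation, which lives on the server."""
    return math.sqrt(n)


def stub_loop(conn: socket.socket) -> None:
    """Server-side stub (the slide's doMsg): unmarshal n, call sqrt, marshal s."""
    reader = conn.makefile("r", encoding="utf-8")
    writer = conn.makefile("w", encoding="utf-8")
    for line in reader:  # one JSON request per line; ends when the client closes
        m = json.loads(line)
        s = server_sqrt(m["n"])
        writer.write(json.dumps({"s": s}) + "\n")
        writer.flush()
    conn.close()


class SqrtProxy:
    """Client-side proxy: same signature as sqrt(), but it sends n and reads s."""

    def __init__(self, conn: socket.socket) -> None:
        self._reader = conn.makefile("r", encoding="utf-8")
        self._writer = conn.makefile("w", encoding="utf-8")

    def sqrt(self, n: float) -> float:
        self._writer.write(json.dumps({"n": n}) + "\n")  # send n
        self._writer.flush()
        return json.loads(self._reader.readline())["s"]  # read s


client_end, server_end = socket.socketpair()
threading.Thread(target=stub_loop, args=(server_end,), daemon=True).start()

remote = SqrtProxy(client_end)
x = remote.sqrt(4)
print(x)  # 2.0
```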


RPC: Pros and Cons
• Pros:
  – Transparency is very appealing
  – Simple programming model
  – Useful as an organizing principle even when not fully automated
• Cons:
  – Getting language details right is tricky (e.g. exceptions)
  – No client/server overlap: doesn't work well for long-running operations
  – May not optimize large transfers well
  – Not all APIs make sense to remote: e.g. answer = search(tree)
  – Versioning can be a problem: client and server need to agree exactly on the interface (or have rules for dealing with differences)
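The "exceptions" con shows up with the same standard-library XML-RPC modules. When the remote function raises `ValueError`, the client does not receive a `ValueError`. It receives a generic `xmlrpc.client.Fault` carrying the error as text. Here is a sketch using a hypothetical `checked_sqrt` server function.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def checked_sqrt(n: float) -> float:
    """Hypothetical server function that rejects negative input."""
    if n < 0:
        raise ValueError("negative input")
    return n ** 0.5


server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(checked_sqrt, "sqrt")
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
proxy = xmlrpc.client.ServerProxy(f"http://{host}:{port}/")

try:
    proxy.sqrt(-1)
except ValueError:
    print("caught ValueError")  # what a local caller would expect; not reached
except xmlrpc.client.Fault as fault:
    # What actually arrives: a generic Fault whose text mentions ValueError.
    print("caught Fault:", fault.faultCode, fault.faultString)

server.shutdown()
server.server_close()
```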


Traditional Models of Distributed Computing: Distributed Object Systems

How do you build an RPC for this?

    class Point {
        int x, y;
        int getx() { return x; }
        int gety() { return y; }
    }

    class Rectangle {
        // …members and constructors not shown…
        Point getUpperLeft() { … }
        Point getLowerRight() { … }
    }

    // Calls methods on a remoted object
    int area(Rectangle r) {
        int width  = r.getLowerRight().getx() - r.getUpperLeft().getx();
        int height = r.getLowerRight().gety() - r.getUpperLeft().gety();
        return width * height;
    }

    Rectangle myRect = new Rectangle();
    // …assume position set here…
    int a = area(myRect);   // REMOTE THIS CALL! (passes an object to a remote method)

Distributed object systems make this work!
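Here is a toy sketch of what a distributed object system has to do for the `area` example, written in Python rather than the slide's Java-like notation. The `ObjectServer` broker and `RemoteRef` proxy are hypothetical names, and JSON strings stand in for network messages. Returned objects go back as references (ids), so each `getx()`/`gety()` call on a returned `Point` costs another round trip.

```python
import itertools
import json
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int

    def getx(self) -> int:
        return self.x

    def gety(self) -> int:
        return self.y


@dataclass
class Rectangle:
    upper_left: Point
    lower_right: Point

    def getUpperLeft(self) -> Point:
        return self.upper_left

    def getLowerRight(self) -> Point:
        return self.lower_right


class ObjectServer:
    """Hypothetical object broker: exports objects by id and dispatches calls."""

    def __init__(self) -> None:
        self._objects = {}  # object id -> live server-side object
        self._ids = itertools.count(1)
        self.messages = 0  # round trips handled so far

    def export(self, obj) -> int:
        oid = next(self._ids)
        self._objects[oid] = obj
        return oid

    def handle(self, wire: str) -> str:
        self.messages += 1
        req = json.loads(wire)  # unmarshal the request
        result = getattr(self._objects[req["oid"]], req["method"])()
        if isinstance(result, int):
            return json.dumps({"value": result})  # plain data is copied
        return json.dumps({"ref": self.export(result)})  # objects travel as references


class RemoteRef:
    """Client-side proxy: every method call becomes a request message."""

    def __init__(self, server: ObjectServer, oid: int) -> None:
        self._server = server
        self._oid = oid

    def __getattr__(self, method: str):
        def call():
            wire = json.dumps({"oid": self._oid, "method": method})
            reply = json.loads(self._server.handle(wire))  # stands in for the network
            if "value" in reply:
                return reply["value"]
            return RemoteRef(self._server, reply["ref"])
        return call


def area(r) -> int:
    width = r.getLowerRight().getx() - r.getUpperLeft().getx()
    height = r.getLowerRight().gety() - r.getUpperLeft().gety()
    return width * height


server = ObjectServer()
rect = RemoteRef(server, server.export(Rectangle(Point(0, 0), Point(4, 3))))
print(area(rect))  # 12
print(server.messages)  # 8: one round trip per method call
```

The export table in this sketch only ever grows. Every returned `Point` gets a new id that the server cannot safely discard, which is a small instance of the distributed object lifetime management problem.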

Distributed object systems
• In the 1990s, seemed like a great idea
• Advantages of OO encapsulation & inheritance + RPC
• Examples:
  – CORBA (industry standard)
  – DCOM (Microsoft)
• Still quite widely used within enterprises
• Complicated:
  – Marshalling object references
  – Distributed object lifetime management
  – Brokering: which object provides the service today
  – Remote "new": creating objects on remote systems
  – All the pros & cons of RPC, plus the above
• Generally not appropriate at Internet scale


Traditional Models of Distributed Computing: Some Other Options

Special Purpose Models
• Remote file system
  – Network provides transparent access to remote files
  – Examples: NFS, CIFS
• Remote database
  – Examples: ODBC, JDBC
• Remote device
  – Remote printing, disk drive, etc.
• Virtual terminal
  – One computer simulates an interactive terminal to another


Some other interesting models
• Broadcast / multicast
  – Send messages to everyone (broadcast) / a named group (multicast)
• Publish / subscribe (pub/sub)
  – Subscribe to named events or based on a query filter
  – Call me whenever Pepsi's stock price changes
  – Implements a distributed associative memory
• Reliable queuing
  – Examples: IBM MQSeries, Java Message Service (JMS)
  – Model: queued messages, preserved across hardware crashes
  – Widely used for bank machine transactions and long-running (multi-day) eCommerce transactions
  – Depends on disk-based transaction systems at each node to keep queues
• Tuple spaces
  – Pioneered by Gelernter at Yale (Linda kernel), picked up by Jini (Sun) and TSpaces (IBM)
  – Network-scale shared variable space, with synchronization
  – Good for queues of work to do: some cloud architectures use a related model to distribute work to servers
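Reliable queuing depends on a disk-based transaction system at each node. The following is a minimal sketch of that idea, with SQLite (via Python's `sqlite3` module) standing in for the transaction system. It is not the MQSeries or JMS API, and a "crash" is simulated by closing and reopening the database.

```python
import os
import sqlite3
import tempfile
from typing import Optional


def open_queue(path: str) -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions below are started and ended explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, body: str) -> None:
    # In autocommit mode this single INSERT is its own committed transaction.
    conn.execute("INSERT INTO queue (body) VALUES (?)", (body,))


def dequeue(conn: sqlite3.Connection) -> Optional[str]:
    # Read and delete the oldest message atomically: if we crash before
    # COMMIT, the message is still in the queue after restart.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, body FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[1]


with tempfile.TemporaryDirectory() as workdir:
    db_path = os.path.join(workdir, "queue.db")

    q = open_queue(db_path)
    enqueue(q, "debit account 42 by $100")
    q.close()  # simulate the node going down after the enqueue

    q = open_queue(db_path)  # "restart": the message is still on disk
    print(dequeue(q))  # debit account 42 by $100
    print(dequeue(q))  # None
    q.close()
```

A production queue manager would also tie the dequeue to the transaction that processes the message. That way, a crash mid-processing puts the message back rather than losing it.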


Stateful and Stateless Protocols


Stateful and Stateless Protocols
• Stateful: server knows which step (state) has been reached
• Stateless:
  – Client remembers the state, sends it to the server each time
  – Server processes each request independently
• Can vary with level
  – Many systems, like the Web, run stateless protocols (e.g. HTTP) over streams… at the packet level, TCP streams are stateful
  – HTTP itself is mostly stateless, but many HTTP requests (typically POSTs) update persistent state at the server
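Here is a sketch contrasting the two styles for paging through a result list, using hypothetical data and class names. The stateful server remembers each session's position. The stateless server makes the client send the position with every request, so any server instance can answer.

```python
from dataclasses import dataclass, field

RESULTS = [f"item-{i}" for i in range(7)]  # hypothetical data set
PAGE = 3


@dataclass
class StatefulServer:
    """Remembers, per session, how far each client has read."""
    positions: dict = field(default_factory=dict)

    def next_page(self, session: str) -> list:
        start = self.positions.get(session, 0)
        self.positions[session] = start + PAGE
        return RESULTS[start:start + PAGE]


class StatelessServer:
    """Keeps no per-client state: the client sends the position every time."""

    def page(self, start: int) -> tuple:
        return RESULTS[start:start + PAGE], start + PAGE


# Stateful: small requests, but the session lives on this one server.
s = StatefulServer()
print(s.next_page("alice"), s.next_page("alice"))

# Stateless: each request is self-describing, so a different (or restarted)
# server instance can handle the next request without losing anything.
page1, cursor = StatelessServer().page(0)
page2, cursor = StatelessServer().page(cursor)
print(page1, page2)
```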


Advantages of stateless protocols
• Protocol usually simpler
• Server processes each request independently
• Load balancing and restart easier
• Typically easier to scale and make fault-tolerant
• Visibility: individual requests more self-describing


Advantages of stateful protocols
• Individual messages carry less data
• Server does not have to re-establish context each time
• There's usually some changing state at the server at some level, except for completely static publishing systems


Text vs. Binary Protocols


Protocols can be text or binary on the wire
• Text: messages are encoded characters
• Binary: any bit patterns
• Pros and cons quite similar to those for text vs. binary file formats
• When sending between compatible machines, binary can be much faster because no conversion is needed
• Most Internet-scale application protocols (HTTP, SMTP) use text for protocol elements and for all content except photo/audio/video
• HTTP 2.0 moving to binary (for message size and parsing speed)
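Here is a small comparison of the same made-up sensor reading encoded two ways: as HTTP-style text header lines, and as a fixed binary layout built with Python's `struct` module. The message format is hypothetical. Real binary protocols such as HTTP/2 framing are considerably more elaborate.

```python
import struct

reading = {"sensor": 7, "temperature": 21.5}

# Text encoding: human-readable lines, in the style of HTTP headers.
text_msg = (
    f"Sensor: {reading['sensor']}\r\n"
    f"Temperature: {reading['temperature']}\r\n"
).encode("ascii")

# Binary encoding: 2-byte unsigned int + 8-byte double, network byte order.
BINARY = struct.Struct("!Hd")
binary_msg = BINARY.pack(reading["sensor"], reading["temperature"])

print(len(text_msg), "bytes as text")  # 30 bytes as text
print(len(binary_msg), "bytes as binary")  # 10 bytes as binary

# Decoding text means parsing lines; decoding binary is a fixed-offset unpack.
fields = dict(
    line.split(": ", 1) for line in text_msg.decode("ascii").split("\r\n") if line
)
print(int(fields["Sensor"]), float(fields["Temperature"]))
print(BINARY.unpack(binary_msg))
```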


Summary


Summary
• The machine-level model is complex: multiple CPUs, memories
• A number of abstractions are widely used for limited-scale distribution
• RPC is among the most interesting and successful
• Statefulness / statelessness is a key design tradeoff
• We'll see next time why a new model was needed for the Web
