Author: Mae Newton

EEC-681/781 Distributed Computing Systems

Lecture 2

Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University
[email protected]
22 January 2006

Outline

- Overview of distributed systems
  - Design Goals (part 2)
  - Hardware Concepts
  - Software Concepts

EEC686/785


Review of Lecture 1

- Definition of distributed systems
- Design goals (part 1)

Definition of a Distributed System

- Original: A collection of independent computers that appears to its users as a single coherent system
- Modified: A piece of software that makes a collection of autonomous computers appear as a single coherent system
  - Autonomous computers connected by a network
  - Software specifically designed to provide an integrated computing facility


Design Goals

- Connecting users and resources
- Transparency:
  - Users feel like they are using a single-user system
- Openness:
  - Services provided are based on standards
- Flexibility:
  - Separation of policy and mechanisms
- Scalability, availability, security

Distribution Transparency

- Access transparency
- Location transparency
- Migration transparency
- Relocation transparency
- Replication transparency
- Concurrency transparency
- Persistency transparency


Distribution Transparency Mini-quiz

- The transparency that hides where a resource is located is called:
  a) Access transparency
  b) Location transparency
  c) Relocation transparency
  d) Migration transparency
- The transparency that hides the fact that a resource may move to another location is called:
  a) Access transparency
  b) Location transparency
  c) Relocation transparency
  d) Migration transparency
- The transparency that hides differences in data representation and how a resource is accessed is called:
  a) Access transparency
  b) Location transparency
  c) Relocation transparency
  d) Migration transparency
- The transparency that hides the fact that a resource may be moved to another location while in use is called:
  a) Access transparency
  b) Location transparency
  c) Relocation transparency
  d) Migration transparency


Replication Transparency

- Replication transparency: Hide that a resource is replicated
  - More than one copy is available
  - All replicas should have the same visible name

Concurrency Transparency

- Concurrency transparency: Hide that a resource may be shared by several competitive users
  - This feature is really nothing new. Operating systems have been offering concurrency transparency for a number of decades
  - Easy to guarantee if accesses to the same resource are all read-only
  - Care must be taken to maintain consistency if some accesses are updates
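As a hedged illustration of the point about updates, the sketch below guards a shared counter with a lock so that concurrent updates stay consistent. The names `SharedCounter` and `run_workers` are invented for this example and do not come from the lecture, and a lock is only one of several ways to serialize updates.

```python
import threading


class SharedCounter:
    """A resource shared by several concurrent users (hypothetical example)."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def read(self) -> int:
        # Reads take the lock too, so they never overlap with an update.
        with self._lock:
            return self._value

    def increment(self) -> None:
        # Read-modify-write is an update, so it is serialized by the lock.
        with self._lock:
            self._value += 1


def run_workers(counter: SharedCounter, workers: int = 4, increments: int = 1000) -> int:
    """Start several threads that all update the same counter; return the final value."""

    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read()
```

Each caller uses `increment()` as if it were the only user of the counter; the coordination is hidden inside the resource, which is the essence of concurrency transparency.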


Failure Transparency

- Failure transparency: Hide the failure and recovery of a resource
  - Can be achieved through replication
  - But, very challenging and costly in general

Persistency Transparency

- Persistency transparency: Hide whether a (software) resource is in memory or on disk


Degree of Transparency

- Observation: Aiming at full distribution transparency may be too much
- Sometimes distribution is apparent and not something you want to hide, e.g., users may be located in different continents
- Completely hiding failures of networks and nodes is (theoretically and practically) impossible
  - You cannot distinguish a slow computer from a failing one
  - You can never be sure that a server actually performed an operation before a crash
- Full transparency will cost performance, exposing distribution of the system
  - Keeping Web caches exactly up-to-date with the master copy
  - Immediately flushing write operations to disk for fault tolerance
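The claim that a slow computer cannot be told apart from a failing one can be made concrete with a small model. This is only a sketch under stated assumptions: the function `observe` and its parameters are hypothetical, and the "server" is reduced to a reply delay, with `None` standing for a crash.

```python
from typing import Optional


def observe(reply_delay: Optional[float], timeout: float) -> str:
    """Return what a client sees after waiting at most `timeout` seconds.

    reply_delay is the time the server needs to answer, or None if the
    server has crashed and will never answer (an assumption of this model).
    """
    if reply_delay is None or reply_delay > timeout:
        # In both cases the client has received nothing when the timer fires.
        return "timeout"
    return "reply"
```

In this model a crashed server and a server that needs longer than the timeout produce exactly the same observation, so the client cannot decide which of the two happened.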


Openness of Distributed Systems

- Open distributed system: Be able to interact with services from other open systems, irrespective of the underlying environment
  - Systems should conform to well-defined interfaces
  - Systems should support portability of applications
  - Systems should easily interoperate
- Achieving openness: At least make the distributed system independent from heterogeneity of the underlying environment
  - Hardware
  - Platforms
  - Languages


Implementation Openness

- Openness requires flexibility
- Implementing openness: Requires support for different policies specified by applications and users
  - What level of consistency do we require for client cached data?
  - Which operations do we allow downloaded code to perform?
  - Which QoS requirements do we adjust in the face of varying bandwidth?
  - What level of secrecy do we require for communication?
- Implementing openness: Ideally, a distributed system provides only mechanisms:
  - Allow (dynamic) setting of caching policies, preferably per cacheable item
  - Support different levels of trust for mobile code
  - Provide adjustable QoS parameters per data stream
  - Offer different encryption algorithms


Mechanisms and Policies

- Mechanisms determine how to do something, while policies decide what should be done
- Separating policy from mechanism gives maximum flexibility, both when choosing policies and when policy decisions have to be changed later

Example: Managing a Queue

- Let's use an abstract priority queue as an example
- We need to support mechanisms for:
  - Insert/Delete items at start
  - Insert/Delete items at end
  - Know length of queue
- The queue can be implemented in different ways
- Policies can be, for example, FIFO or LIFO; they should be decided by the queue user
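The queue example can be sketched in a few lines. This is a minimal illustration, assuming a double-ended container built on `collections.deque`; the class `Queue`, its method names, and the helper `drain` are hypothetical names chosen for this sketch, not part of the lecture.

```python
from collections import deque
from typing import Any, List


class Queue:
    """Mechanism only: insert/delete at both ends and report the length."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def insert_start(self, item: Any) -> None:
        self._items.appendleft(item)

    def insert_end(self, item: Any) -> None:
        self._items.append(item)

    def delete_start(self) -> Any:
        return self._items.popleft()

    def delete_end(self) -> Any:
        return self._items.pop()

    def __len__(self) -> int:
        return len(self._items)


def drain(queue: Queue, policy: str) -> List[Any]:
    """Policy is chosen by the queue user: 'FIFO' or 'LIFO'."""
    if policy not in ("FIFO", "LIFO"):
        raise ValueError(f"unknown policy: {policy}")
    # Items are assumed to have been added with insert_end.
    take = queue.delete_start if policy == "FIFO" else queue.delete_end
    result = []
    while len(queue) > 0:
        result.append(take())
    return result
```

The `Queue` class never mentions FIFO or LIFO. Switching policy means changing one argument in the caller, while the mechanism, and any alternative implementation of it, stays untouched.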


Scale in Distributed Systems

- Scalability can be measured along three dimensions:
  - Size scalability: We can easily add more users and resources to the system
  - Geographical scalability: Users and resources may lie far apart geographically
  - Administrative scalability: The system can still be easy to manage even if it spans many independent administrative organizations
- Scalability problems in distributed systems appear as performance problems caused by limited capacity of servers and network

Size Scalability

- Thomas J. Watson, Chairman of IBM, 1943: “I think there is a world market for maybe five computers”
- Internet:
  - July 1993: 1,776,000 computers
  - July 1999: 56,218,000 computers
  - January 2002: 168,000,000 computers and > 23,000,000 DNS domains


Size Scalability Problems

Concept | Example
Centralized services | A single server for all users
Centralized data | A single on-line telephone book
Centralized algorithms | Doing routing based on complete information

- Problem running centralized algorithms in distributed systems
  - Would result in an enormous number of messages that have to be routed over many lines
- Any algorithm that operates by collecting information from all sites, sends it to a single machine for processing, and then distributes the results must be avoided


Decentralized Algorithm Characteristics

- No machine has complete information about the system state
- Machines make decisions based only on local information
- Failure of one machine does not ruin the algorithm
- There is no implicit assumption that a global clock exists

Geographical Scalability Problems

- Interprocess communication in WANs has much longer latency than that in LANs
- Communication in WANs is inherently unreliable, and virtually always point-to-point
- Centralized components reduce geographical scalability, just as they reduce size scalability


Administration Scalability Problems

- Different administrative domains usually impose different policies, e.g., with respect to resource usage, management, and security

Techniques for Scaling

- Hiding communication latencies
- Distribution
- Replication


Hiding Communication Latencies

- Hiding communication latencies is applicable in the case of geographical scalability
- Technique #1: Try to avoid waiting for responses to remote service requests as much as possible
- Technique #2: Reduce the overall communication by moving part of the computation that is normally done at the server to the client process requesting the service
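One common way to apply Technique #1 is asynchronous communication: issue several requests and do not block on each reply in turn. The sketch below assumes Python's `asyncio`; `remote_request` is a hypothetical stand-in in which `asyncio.sleep` simulates network latency, and no real service is contacted.

```python
import asyncio
from typing import List


async def remote_request(name: str, latency: float) -> str:
    """Simulated remote call: the sleep stands in for a network round trip."""
    await asyncio.sleep(latency)
    return f"{name}: done"


async def fetch_all(names: List[str], latency: float) -> List[str]:
    # All requests are started before any reply is awaited, so the
    # round trips overlap instead of being paid one after another.
    return list(await asyncio.gather(*(remote_request(n, latency) for n in names)))


def run(names: List[str], latency: float = 0.05) -> List[str]:
    """Synchronous entry point for callers that are not already async."""
    return asyncio.run(fetch_all(names, latency))
```

Because the simulated waits overlap, the total waiting time is roughly one latency rather than one per request. `asyncio.gather` returns results in the order the requests were passed in, regardless of which reply arrives first.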


Hiding Communication Latencies – Move Computation to Clients


Techniques for Scaling – Distribution

- Distribution: Partition data and computations across multiple machines
- Examples: Domain name services (DNS)
  - DNS name space is hierarchically organized into a tree of domains, which are divided into nonoverlapping zones

Decentralized Naming Service
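To make the idea of non-overlapping zones concrete, here is a toy model of name resolution by delegation. It is a sketch only and does not implement the real DNS protocol: the `Zone` class, the `resolve` function, the three-level example tree, and the address (taken from the range reserved for documentation) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


class Zone:
    """One non-overlapping part of the name space, served by its own server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: Dict[str, str] = {}         # names this zone answers itself
        self.delegations: Dict[str, "Zone"] = {}  # domain suffix -> child zone


def resolve(root: Zone, hostname: str) -> Tuple[str, List[str]]:
    """Follow delegations from the root; return (address, zones visited)."""
    zone = root
    visited = [zone.name]
    while hostname not in zone.records:
        child = None
        for suffix, candidate in zone.delegations.items():
            if hostname == suffix or hostname.endswith("." + suffix):
                child = candidate
                break
        if child is None:
            raise KeyError(f"no zone knows {hostname!r}")
        zone = child
        visited.append(zone.name)
    return zone.records[hostname], visited


def build_example() -> Zone:
    """Hypothetical three-level tree: root -> edu -> example.edu."""
    root, edu, example = Zone("."), Zone("edu"), Zone("example.edu")
    root.delegations["edu"] = edu
    edu.delegations["example.edu"] = example
    example.records["www.example.edu"] = "192.0.2.10"  # documentation-only address
    return root
```

No single zone holds the whole name space. Each one stores only its own records plus pointers to its children, so both the data and the lookup work are spread across machines.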


Techniques for Scaling – Replication

- Replication: Make copies of data available at different machines across the distributed system
- Examples:
  - Replicated file servers (mainly for fault tolerance)
  - Replicated databases
  - Mirrored Web sites
  - Large-scale distributed shared memory systems
- Replication not only increases availability, but also helps to balance the load between components, leading to better performance
- Replication also helps increase geographical scalability by placing a copy near different users


Problem with Scaling by Replication

- Applying scaling techniques through replication sounds straightforward, but be aware that having multiple copies might lead to inconsistencies:
  - Modifying one copy makes that copy different from the rest
  - Always keeping copies consistent, and in a general way, requires global synchronization on each modification
  - Global synchronization precludes large-scale solutions
- If we can tolerate inconsistencies, we may reduce the need for global synchronization
- Tolerating inconsistencies is application dependent
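A time-to-live (TTL) copy is one simple way to trade consistency for less synchronization. The sketch below assumes that an application can accept values up to `ttl` seconds old; the class `TTLCache`, its `fetch` callback, and the injectable `clock` are hypothetical names for this illustration, not a technique prescribed by the lecture.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Local copy that is allowed to lag behind the master by at most `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], object], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # contacts the master copy (assumed to be expensive)
        self._ttl = ttl
        self._clock = clock    # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[object, float]] = {}

    def get(self, key: str) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            # Possibly stale, but no communication with the master is needed.
            return entry[0]
        value = self._fetch(key)
        self._entries[key] = (value, now)
        return value
```

Writes to the master never have to notify the copies. The price is that a reader may see an old value until its entry expires, which is acceptable for some applications and not for others.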


Techniques for Scaling – Caching

- Caching: A special form of replication. It allows client processes to access local copies
  - Web caches (browser/Web proxy)
  - File caching (at server and client)
- Similarity to replication: making a copy of a resource, generally in the proximity of the client accessing that resource
- Difference from replication: caching is a decision made by the client of a resource, not by the owner of the resource

Distributed Systems: Hardware Concepts

- Multiprocessors
- Multicomputers
- Networks of Computers


Multiprocessors and Multicomputers

- Distinguishing features:
  - Private versus shared memory
  - Bus versus switched interconnection

Networks of Computers

- High degree of node heterogeneity:
  - High-performance parallel systems (multiprocessors as well as multicomputers)
  - High-end PCs and workstations (servers)
  - Simple network computers (offer users only network access)
  - Mobile computers (palmtops, laptops)
  - Multimedia workstations
- High degree of network heterogeneity:
  - Local-area gigabit networks
  - Wireless connections
  - Wide-area switched megabit connections


Distributed Systems: Software Concepts

- An overview of:
  - DOS (Distributed Operating Systems)
  - NOS (Network Operating Systems)
  - Middleware

System | Description | Main Goal
DOS | Tightly-coupled operating system for multiprocessors and homogeneous multicomputers | Hide and manage hardware resources
NOS | Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN) | Offer local services to remote clients
Middleware | Additional layer on top of NOS implementing general-purpose services | Provide distribution transparency

Distributed Operating Systems

- Some characteristics:
  - OS on each computer knows about the other computers
  - OS on different computers is generally the same
  - Services are generally (transparently) distributed across computers


Multicomputer Operating Systems

- Harder than a traditional (multiprocessor) OS: because memory is not shared, the emphasis shifts to processor communication by message passing
  - Often no simple global communication
  - No simple system-wide synchronization mechanisms
  - Virtual (distributed) shared memory requires OS to maintain global memory map in software
  - Inherent distributed resource management: no central point where allocation decisions can be made

Network Operating System

- Some characteristics:
  - Each computer has its own operating system with networking facilities
  - Computers work independently (i.e., they may even have different operating systems)
  - Services are tied to individual nodes (ftp, ssh)
  - Highly file oriented (basically, processors share only files)


Network Operating System

- Two clients and a server in a network operating system
- Different clients may mount the servers in different places


Distributed System (Middleware-based)

- Characteristics:
  - OS on each computer need not know about the other computers
  - OS on different computers need not generally be the same
  - Services are generally (transparently) distributed across computers
