EEC-681/781 Distributed Computing Systems
Lecture 2
Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University
[email protected]
22 January 2006
EEC686/785
Outline
- Overview of distributed systems
  - Design Goals (part 2)
  - Hardware Concepts
  - Software Concepts
Review of Lecture 1
- Definition of distributed systems
- Design goals (part 1)
Definition of a Distributed System
- Original: A collection of independent computers that appears to its users as a single coherent system
- Modified: A piece of software that ensures that a collection of autonomous computers appears as a single coherent system
  - Autonomous computers connected by a network
  - Software specifically designed to provide an integrated computing facility
Design Goals
- Connecting users and resources
- Transparency:
  - Users feel like they are using a single-user system
- Openness:
  - Services provided are based on standards
- Flexibility:
  - Separation of policy and mechanisms
- Scalability, availability, security
Distribution Transparency
- Access transparency
- Location transparency
- Migration transparency
- Relocation transparency
- Replication transparency
- Concurrency transparency
- Persistency transparency
Distribution Transparency Mini-quiz
The transparency that hides where a resource is located is called:
a) Access transparency
b) Location transparency
c) Relocation transparency
d) Migration transparency
The transparency that hides the fact that a resource may move to another location is called:
a) Access transparency
b) Location transparency
c) Relocation transparency
d) Migration transparency
Distribution Transparency Mini-quiz
The transparency that hides differences in data representation and how a resource is accessed is called:
a) Access transparency
b) Location transparency
c) Relocation transparency
d) Migration transparency
The transparency that hides the fact that a resource may be moved to another location while in use is called:
a) Access transparency
b) Location transparency
c) Relocation transparency
d) Migration transparency
Replication Transparency
Replication transparency - Hide that a resource is replicated
- More than one copy is available
- All replicas should have the same visible name
Concurrency Transparency
Concurrency transparency - Hide that a resource may be shared by several competing users
- This feature is nothing new: operating systems have offered concurrency transparency for decades
- Easy to guarantee if accesses to the same resource are all read-only
- Care must be taken to maintain consistency if some accesses are updates
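The need for care with updates can be illustrated with a minimal Python sketch. The `SharedCounter` class and the `run_users` helper are hypothetical names introduced here, not part of the lecture. The sketch assumes a single process in which threads stand in for competing users, and it uses a lock so that concurrent updates leave the shared resource consistent.

```python
import threading


class SharedCounter:
    """A shared resource whose updates are serialized by a lock (illustrative)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write is an update, so it is done under the lock.
        with self._lock:
            self._value += 1

    def read(self):
        # Taken under the same lock so a reader never races with an update.
        with self._lock:
            return self._value


def run_users(counter, users=4, updates_per_user=1000):
    """Start several competing 'users' that all update the same counter."""

    def user():
        for _ in range(updates_per_user):
            counter.increment()

    threads = [threading.Thread(target=user) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read()
```

Each user sees an ordinary counter; the coordination that keeps the value consistent is hidden inside the resource, which is the essence of concurrency transparency.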
Failure Transparency
Failure transparency - Hide the failure and recovery of a resource
- Can be achieved through replication
- But very challenging and costly in general
Persistency Transparency
Persistency transparency - Hide whether a (software) resource is in memory or on disk
Degree of Transparency
Observation: Aiming at full distribution transparency may be too much
- Sometimes distribution is apparent and not something you want to hide, e.g., users may be located on different continents
- Completely hiding failures of networks and nodes is (theoretically and practically) impossible
  - You cannot distinguish a slow computer from a failing one
  - You can never be sure that a server actually performed an operation before a crash
Degree of Transparency
- Full transparency will cost performance, exposing distribution of the system
  - Keeping Web caches exactly up-to-date with the master copy
  - Immediately flushing write operations to disk for fault tolerance
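The observation that a slow computer cannot be distinguished from a failing one can be made concrete with a small Python sketch. The `observe` function and its simple model (a reply delay in seconds, or `None` for a crashed server) are assumptions made for illustration only.

```python
def observe(reply_delay, timeout):
    """Return what a client sees: 'reply' or 'timeout'.

    reply_delay is the number of seconds until the reply arrives, or None
    if the server has crashed and will never reply (hypothetical model).
    """
    if reply_delay is not None and reply_delay <= timeout:
        return "reply"
    return "timeout"


# From the client's side, a crashed server and a slow server look identical.
crashed_server = observe(None, timeout=2.0)
slow_server = observe(5.0, timeout=2.0)
healthy_server = observe(0.5, timeout=2.0)
```

Raising the timeout only moves the boundary: whatever value is chosen, some slow-but-alive server can still be mistaken for a failed one.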
Openness of Distributed Systems
Open distributed system: Be able to interact with services from other open systems, irrespective of the underlying environment
- Systems should conform to well-defined interfaces
- Systems should support portability of applications
- Systems should easily interoperate
Openness of Distributed Systems
Achieving openness: At least make the distributed system independent of the heterogeneity of the underlying environment
- Hardware
- Platforms
- Languages
Implementation Openness
Openness requires flexibility
Implementing openness: Requires support for different policies specified by applications and users
- What level of consistency do we require for client-cached data?
- Which operations do we allow downloaded code to perform?
- Which QoS requirements do we adjust in the face of varying bandwidth?
- What level of secrecy do we require for communication?
Implementation Openness
Implementing openness: Ideally, a distributed system provides only mechanisms:
- Allow (dynamic) setting of caching policies, preferably per cacheable item
- Support different levels of trust for mobile code
- Provide adjustable QoS parameters per data stream
- Offer different encryption algorithms
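As one illustration of the first mechanism, the following Python sketch shows a cache that leaves the caching policy to its caller. The `PolicyCache` class, the per-item time-to-live (`ttl`) and the explicit `now` argument are assumptions chosen to keep the example small and deterministic; they are not taken from the lecture.

```python
class PolicyCache:
    """Cache mechanism; the per-item time-to-live is the caller's policy."""

    def __init__(self):
        self._items = {}  # key -> (value, expiry_time)

    def put(self, key, value, ttl, now):
        # ttl is chosen per cacheable item by the application or user.
        self._items[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            # Expired under this item's policy: drop it and report a miss.
            del self._items[key]
            return None
        return value


cache = PolicyCache()
cache.put("stock_quote", 101.5, ttl=1, now=0)      # must be fresh
cache.put("logo", "logo-bytes", ttl=3600, now=0)   # may be stale for an hour
```

The cache itself never decides how fresh data must be; two items in the same cache can follow entirely different policies.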
Mechanisms and Policies
- Mechanisms determine how to do something, while policies decide what should be done
- Separating policy from mechanism allows maximum flexibility in choosing policies and in changing policy decisions later
Example: Managing a Queue
Let’s use an abstract priority queue as an example
- We need to support mechanisms for:
  - Inserting/deleting items at the start
  - Inserting/deleting items at the end
  - Knowing the length of the queue
- The queue can be implemented in different ways
- Policies can be, for example, FIFO or LIFO, and should be decided by the queue user
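The queue example can be sketched in Python as follows. The class and function names (`QueueMechanism`, `drain_fifo`, `drain_lifo`, `filled`) are hypothetical, and `collections.deque` is only one of the possible implementations the slide alludes to.

```python
from collections import deque


class QueueMechanism:
    """Mechanisms only: insert/delete at either end and report the length."""

    def __init__(self):
        self._items = deque()  # one possible implementation among many

    def insert_start(self, item):
        self._items.appendleft(item)

    def insert_end(self, item):
        self._items.append(item)

    def delete_start(self):
        return self._items.popleft()

    def delete_end(self):
        return self._items.pop()

    def length(self):
        return len(self._items)


def drain_fifo(queue):
    """FIFO policy chosen by the user: always remove from the start."""
    return [queue.delete_start() for _ in range(queue.length())]


def drain_lifo(queue):
    """LIFO policy chosen by the user: always remove from the end."""
    return [queue.delete_end() for _ in range(queue.length())]


def filled(items):
    """Helper: build a queue by inserting every item at the end."""
    q = QueueMechanism()
    for item in items:
        q.insert_end(item)
    return q
```

The same mechanism serves both policies: `drain_fifo(filled([1, 2, 3]))` yields the items in insertion order, while `drain_lifo(filled([1, 2, 3]))` yields them reversed, and neither requires any change to `QueueMechanism`.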
Scale in Distributed Systems
Scalability can be measured along three dimensions:
- Size scalability – We can easily add more users and resources to the system
- Geographical scalability – Users and resources may lie far apart geographically
- Administrative scalability – The system can still be easy to manage even if it spans many independent administrative organizations
Scalability problems in distributed systems appear as performance problems caused by the limited capacity of servers and networks
Size Scalability
Thomas J. Watson, Chairman of IBM, 1943: “I think there is a world market for maybe five computers”
Internet:
- July 1993: 1,776,000 computers
- July 1999: 56,218,000 computers
- January 2002: 168,000,000 computers and > 23,000,000 DNS domains
Size Scalability Problems
Concept | Example
Centralized services | A single server for all users
Centralized data | A single on-line telephone book
Centralized algorithms | Doing routing based on complete information
Size Scalability Problems
Problem running centralized algorithms in distributed systems
- Would result in an enormous number of messages having to be routed over many lines
Any algorithm that operates by collecting information from all sites, sending it to a single machine for processing, and then distributing the results must be avoided
Decentralized Algorithm Characteristics
- No machine has complete information about the system state
- Machines make decisions based only on local information
- Failure of one machine does not ruin the algorithm
- There is no implicit assumption that a global clock exists
Geographical Scalability Problems
- Interprocess communication in WANs has much longer latency than in LANs
- Communication in WANs is inherently unreliable, and virtually always point-to-point
- Centralized components reduce geographical scalability, just as they reduce size scalability
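The decentralized algorithm characteristics (no complete information, decisions from local information only) can be illustrated with a small averaging sketch in Python. This is an assumed example, not an algorithm from the lecture: machines sit on a ring, each repeatedly replaces its value with the average of its own value and its two neighbours' values, and no machine ever collects all values. The round-by-round loop is a simulation convenience and does not imply that real machines share a clock.

```python
def gossip_round(values):
    """One round on a ring: each node averages with its two neighbours only."""
    n = len(values)
    return [
        (values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def estimate_average(values, rounds=50):
    """Repeat local averaging; every node's value drifts toward the global mean."""
    for _ in range(rounds):
        values = gossip_round(values)
    return values


loads = [10.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical per-machine load
estimates = estimate_average(loads)
```

Each round preserves the sum of all values, so every machine ends up with an estimate close to the system-wide average (here 2.0) without any single machine gathering the complete state.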
Administration Scalability Problems
- Different administrative domains usually impose different policies, e.g., with respect to resource usage, management, and security
Techniques for Scaling
- Hiding communication latencies
- Distribution
- Replication
Hiding Communication Latencies
Hiding communication latencies is applicable in the case of geographical scalability
- Technique #1: Try to avoid waiting for responses to remote service requests as much as possible
- Technique #2: Reduce the overall communication by moving part of the computation that is normally done at the server to the client process requesting the service
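Technique #1 is commonly realized with asynchronous communication. The Python sketch below assumes two independent remote requests and uses `asyncio.sleep` as a stand-in for network latency; `remote_call`, the service names and the delays are hypothetical.

```python
import asyncio
import time


async def remote_call(name, delay):
    """Stand-in for a remote service request; sleeps instead of using a network."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def fetch_all():
    # Issue both requests before waiting, so their latencies overlap
    # instead of adding up.
    return await asyncio.gather(
        remote_call("inventory", 0.2),
        remote_call("pricing", 0.2),
    )


def timed_run():
    """Run both requests and report the results with the elapsed time."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all())
    return results, time.perf_counter() - start
```

Because the client does other work (here, issuing the second request) while the first reply is outstanding, the total waiting time is roughly that of the slowest request rather than the sum of both.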
Hiding Communication Latencies – Move Computation to Clients
Technique for Scaling - Distribution
Distribution: Partition data and computations across multiple machines
Examples: Domain name services (DNS)
- DNS name space is hierarchically organized into a tree of domains, which are divided into nonoverlapping zones
Decentralized Naming Service
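The idea of partitioning a name space into nonoverlapping zones can be sketched in Python. This is a toy model, not the DNS protocol: the `ZONES` table, the zone names and the address (taken from the documentation range 192.0.2.0/24) are assumptions made for illustration.

```python
# Each zone holds only its own records plus delegations to child zones.
ZONES = {
    ".": {"delegations": ["edu"]},
    "edu": {"delegations": ["example.edu"]},
    "example.edu": {"records": {"www.example.edu": "192.0.2.10"}},
}


def resolve(name):
    """Walk from the root zone downward, following delegations."""
    zone = "."
    path = [zone]  # zones consulted, in order
    while True:
        data = ZONES[zone]
        if name in data.get("records", {}):
            return data["records"][name], path
        # Look for a delegated child zone that covers the name.
        for child in data.get("delegations", []):
            if name == child or name.endswith("." + child):
                zone = child
                path.append(zone)
                break
        else:
            raise KeyError(name)
```

Resolving `www.example.edu` consults the root zone, then `edu`, then `example.edu`; no single zone needs to store the whole name space, which is what lets the service scale.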
Techniques for Scaling – Replication
Replication: Make copies of data available at different machines across the distributed system
Examples:
- Replicated file servers (mainly for fault tolerance)
- Replicated databases
- Mirrored Web sites
- Large-scale distributed shared memory systems
Techniques for Scaling – Replication
- Replication not only increases availability, but also helps to balance the load between components, leading to better performance
- Replication also helps increase geographical scalability by placing copies close to users
Problem with Scaling by Replication
Applying scaling techniques through replication sounds straightforward, but be aware that having multiple copies might lead to inconsistencies:
- Modifying one copy makes that copy different from the rest
- Always keeping copies consistent, and in a general way, requires global synchronization on each modification
- Global synchronization precludes large-scale solutions
Problem with Scaling by Replication
- If we can tolerate inconsistencies, we may reduce the need for global synchronization
- Tolerating inconsistencies is application dependent
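The trade-off between consistency and synchronization can be seen in a minimal Python sketch. `LazyReplicatedStore` is a hypothetical in-memory model, assumed for illustration: writes go to one replica immediately and reach the others only when `sync()` is called, so reads from other replicas may be stale in between.

```python
class LazyReplicatedStore:
    """Writes go to replica 0; other replicas catch up only when sync() runs."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # updates not yet propagated

    def write(self, key, value):
        # No global synchronization on the write path.
        self.replicas[0][key] = value
        self._pending.append((key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def sync(self):
        # Propagate pending updates, in order, to all other replicas.
        for key, value in self._pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self._pending.clear()


store = LazyReplicatedStore(3)
store.write("x", 1)
stale = store.read("x", replica_index=2)   # update has not arrived yet
store.sync()
fresh = store.read("x", replica_index=2)   # update has been propagated
```

Whether the stale read is acceptable depends on the application: a mirrored Web page can usually tolerate it, whereas an account balance usually cannot.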
Techniques for Scaling – Caching
Caching: A special form of replication. It allows client processes to access local copies
- Web caches (browser/Web proxy)
- File caching (at server and client)
Similarity to replication: making a copy of a resource, generally in the proximity of the client accessing that resource
Difference from replication: caching is a decision made by the client of a resource, not by the owner of the resource
Distributed Systems: Hardware Concepts
- Multiprocessors
- Multicomputers
- Networks of Computers
Multiprocessors and Multicomputers
Distinguishing features:
- Private versus shared memory
- Bus versus switched interconnection
Networks of Computers
High degree of node heterogeneity:
- High-performance parallel systems (multiprocessors as well as multicomputers)
- High-end PCs and workstations (servers)
- Simple network computers (offer users only network access)
- Mobile computers (palmtops, laptops)
- Multimedia workstations
High degree of network heterogeneity:
- Local-area gigabit networks
- Wireless connections
- Wide-area switched megabit connections
Distributed Systems: Software Concepts
An overview of:
- DOS (Distributed Operating Systems)
- NOS (Network Operating Systems)
- Middleware
System | Description | Main Goal
DOS | Tightly-coupled operating system for multiprocessors and homogeneous multicomputers | Hide and manage hardware resources
NOS | Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN) | Offer local services to remote clients
Middleware | Additional layer atop NOS implementing general-purpose services | Provide distribution transparency
Distributed Operating Systems
Some characteristics:
- OS on each computer knows about the other computers
- OS on different computers is generally the same
- Services are generally (transparently) distributed across computers
Multicomputer Operating Systems
Harder than traditional (multiprocessor) OS: because memory is not shared, emphasis shifts to processor communication by message passing:
- Often no simple global communication
- No simple system-wide synchronization mechanisms
- Virtual (distributed) shared memory requires OS to maintain global memory map in software
- Inherent distributed resource management: no central point where allocation decisions can be made
Network Operating System
Some characteristics:
- Each computer has its own operating system with networking facilities
- Computers work independently (i.e., they may even have different operating systems)
- Services are tied to individual nodes (ftp, ssh)
- Highly file oriented (basically, processors share only files)
Network Operating System
Two clients and a server in a network operating system
Network Operating System
Different clients may mount the servers in different places.
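What it means for clients to mount a server in different places can be sketched in Python. The mount tables, the server name `server1` and the paths are hypothetical; the sketch only models path translation and does not perform any real mounting.

```python
import posixpath


def to_server_path(mount_table, local_path):
    """Translate a client's local path into (server, remote_path) via its mounts."""
    for mount_point, (server, export) in mount_table.items():
        if local_path == mount_point or local_path.startswith(mount_point + "/"):
            # Strip the client-specific mount point, keep the rest of the path.
            rest = local_path[len(mount_point):].lstrip("/")
            remote = posixpath.join(export, rest) if rest else export
            return server, remote
    raise FileNotFoundError(local_path)


# Two clients mount the same exported directory at different local places.
client1_mounts = {"/games": ("server1", "/games")}
client2_mounts = {"/private/games": ("server1", "/games")}

seen_by_client1 = to_server_path(client1_mounts, "/games/pacman")
seen_by_client2 = to_server_path(client2_mounts, "/private/games/pacman")
```

Both clients reach the same file on the server, but under different local names, so a path that works on one client is not guaranteed to work on another.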
Distributed System (Middleware-based)
Characteristics:
- OS on each computer need not know about the other computers
- OS on different computers need not generally be the same
- Services are generally (transparently) distributed across computers