
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 6, NO. 1, FEBRUARY 1998


Using Name-Based Mappings to Increase Hit Rates

David G. Thaler, Student Member, IEEE, and Chinya V. Ravishankar, Member, IEEE

Abstract—Clusters of identical intermediate servers are often created to improve availability and robustness in many domains. The use of proxy servers for the World Wide Web (WWW) and of rendezvous points in multicast routing are two such situations. However, this approach can be inefficient if identical requests are received and processed by multiple servers. We present an analysis of this problem, and develop a method called the highest random weight (HRW) mapping that eliminates these difficulties. Given an object name and a set of servers, HRW maps a request to a server using the object name, rather than any a priori knowledge of server states. Since HRW always maps a given object name to the same server within a given cluster, it may be used locally at client sites to achieve consensus on object–server mappings. We present an analysis of HRW and validate it with simulation results showing that it gives faster service times than traditional request allocation schemes such as round-robin or least-loaded, and adapts well to changes in the set of servers. HRW is particularly applicable to domains in which there are a large number of requestable objects, there is a significant probability that a requested object will be requested again, and the CPU load due to any single object can be handled by a single server. HRW has now been adopted by the multicast routing protocols PIMv2 and CBTv2 as their mechanism for routers to identify rendezvous points/cores.

Index Terms—Caching, client–server systems, computer networks, distributed agreement, multicast routing, proxies, World Wide Web.

Manuscript received November 26, 1996; revised June 20, 1997; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor D. Estrin. This work was supported in part by the National Science Foundation under Grant NCR-9417032. The authors are with the Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI 48109-2122 USA (e-mail: [email protected]; [email protected]). Publisher Item Identifier S 1063-6692(98)01075-9.

I. INTRODUCTION

IN THE USUAL client–server model, clients access object data or services that are made available by servers. A single-server system is not robust, however, and often provides insufficient resources to handle a large number of requests. Thus, clusters of equivalent servers can be used to increase service availability and to lower the workload on individual servers. In this paper, we investigate how a client may map a request for a particular object to a server, so as to minimize response time. In particular, we will limit our scope to domains with the following characteristics.

• All requests for the same class of objects are handled by a cluster of servers with equivalent functionality and capacity.

• The set of servers in the cluster is known to clients prior to issuing requests, and all clients see the same set of servers. For example, this information may be statically configured, looked up at client startup time, or periodically distributed to clients.

• There is some notion of a "hit rate" at a server, so that a server responds more quickly to a duplicate request than to a first-time request. This is clearly true when the server functions as a cache. It is also true if clients need to locate the server which was assigned to do a particular task.

• The benefit of object replication across servers is negligible. That is, the CPU load due to any single object can be handled by a single server.

The above characteristics cover a wide variety of domains. Some examples include:

1) Real-Time Producer–Consumer Systems: For example, in multicast routing protocols such as the core-based trees (CBT) [1] and protocol independent multicast (PIM) [2] protocols, receivers' routers request data for a specific session by sending a join request toward the root of a distribution tree for that session, and sources send data to a session via the root of its tree. The root thus takes on the role of a server, with receivers and sources becoming clients. Sources and receivers must rendezvous at the root for effective data transfer.

2) Client-Side WWW Proxy Caches: In the World Wide Web (WWW), pages can be cached at proxy servers [3], [4]. All outbound client requests can then go through a local proxy server. If the proxy server has the page cached, the page is returned to the client without accessing the remote provider. Otherwise, the page is retrieved and cached for future use. Note that server-side proxy caches often do not fall within the relevant class of domains. When proxy caches are placed near servers to handle inbound requests, the number of requests for a single object may represent enough central processing unit (CPU) load to make object replication more effective.

3) Task Delegation: In task brokering systems, tasks may be delegated to various servers. Any clients which desire to interact with such a task must then contact the server running that task.

For these domains, we will present an efficient algorithm which maps requests to servers such that requests for the same object are sent to the same server, while requests for different objects are split among multiple servers. We will refer to this concept as a "name-based" mapping. It is possible to view the case where all clients send requests for the same object to the same server as defining affinities between objects and servers in a cluster (Fig. 1). A number of studies (e.g., [5]–[7]) have examined the related notion of "cache-affinity" scheduling in the context of shared-memory multiprocessors, in which tasks are sent to processors which already have data cached. This achieves higher cache hit rates at the possible expense of load balancing.



Fig. 1. Object–server affinities.

Several fundamental differences exist, however, between multiprocessor systems and distributed systems which limit the applicability of cache-affinity scheduling. First, multiprocessor systems typically assume centralized schedulers, whereas clients in decentralized distributed systems independently choose servers and direct requests to them. A centralized scheduler in a distributed system also represents an undesirable performance bottleneck and a single point of failure. Second, a centralized scheduler can maintain up-to-date processor load information, but such information is expensive to obtain in distributed systems, since scheduling may be done locally at clients distant from the servers (in terms of latency). Finally, in some multiprocessor systems, a request in progress may be migrated to another processor as part of the scheduling algorithm. Such migration is usually undesirable or even impossible in distributed systems, and migrated requests must typically be restarted.

Although cache-affinity scheduling algorithms are not directly applicable in our domains, our goals are similar: to increase cache hit rates, and thus reduce latency, by using appropriate scheduling. We highlight the importance of using a sensible policy for reducing replication and improving hit rates by directing requests for a given object to the same server.

This paper makes two major contributions. First, it gives a general model for mapping requests to servers: Section III describes the model and typical goals of mapping functions, covers previous work, and shows how common mapping functions fit within our model. Second, we present our "name-based" mapping function: Section IV develops this notion, Section V provides an analysis of our mapping function, and Section VI describes efficient implementations. Finally, applications in two popular domains are examined as case studies in Section VII.

II. GOALS FOR MAPPINGS

Algorithms for mapping requests to individual servers within clusters have typically concentrated on two goals: load balancing and low mapping overhead. We argue that a number of other goals are important as well. We discuss these goals, and proceed to develop the notion of name-based mappings. We also propose a new mapping method called highest random weight (HRW) mapping that meets all the criteria discussed in this section.

A. Traditional Goals

We begin by considering the two goals on which conventional mapping algorithms have focused.

Goal 1 (Low Overhead): The latency introduced in picking a server within a cluster must be as small as possible. Schemes which require purely local decisions have low overhead, while schemes requiring an additional exchange of network messages have high overhead. Also, for an algorithm to be applicable to many domains, the mapping function must be portable and fast enough to use in high-speed applications.

Goal 2 (Load Balancing): To guarantee uniform latency, requests should be distributed among servers so that each server sees an equal share of the resulting load (over both the short and long term), regardless of the object size and popularity distributions.

To ensure that this goal is realized, one must first define what load balancing means, and consider how loads are generated. It is well known (e.g., [8]) that patterns of requests can be very bursty when the request stream includes machine-initiated requests, so that the arrival process of requests is not, in general, Poisson. The packet train [9] model is a widely used alternative to the traditional Poisson arrival model, and appears to model such request patterns well. We therefore adopt this model, and assume that requests arrive in batches, or "trains."

For load balancing over short time intervals, we focus on the load resulting from the arrival of a single train of requests. In this case, we desire that the load resulting from each train (in terms of amount of resources consumed) be split nearly equally among the available servers.

After the individual requests in a packet train are assigned to servers, there are many possibilities for the load on any single server. We can model the load on a given server as the value of a random variable $L$. The random variable $L$ may be viewed as representing a load distribution defined over an infinite population of identical servers, so that the load on any one server is a single value sampled from this distribution. We can similarly model the loads within a cluster of size $m$ as a sample of size $m$ from this distribution. For the algorithms we analyze in this paper, we may consider this sample to be the values of $m$ independent, arbitrarily distributed random variables $L_1, \ldots, L_m$ with identical means and variances.

Let the mean and variance of the distributions of the $L_i$ be $\mu$ and $\sigma^2$, respectively. If $l_1, \ldots, l_m$ are the load values sampled from the $L_i$, we are interested in defining a measure to ensure that these sampled values are as close to each other as possible. The sample standard deviation $s$ provides a good measure of dispersion. However, the value of $s$ depends on the magnitudes of the sample values, whose expectation is the population mean. We therefore normalize by dividing by the population mean, and deal with the ratio $s/\mu$. Our goal will be to minimize this ratio.

Consider an incoming packet train that contains $n$ requests. Since these requests represent loads, we simply treat them as loads. Let these requests be drawn from individual distributions with mean $\mu_r$ and variance $\sigma_r^2$. Let there be $m$ servers, and let each server $S_i$ be assigned $n_i$ requests. The resulting load on server $S_i$ is thus a sum of $n_i$ random variables. We begin with the following lemma.

Lemma 1: Let $L_i$ be a random variable whose values represent the load on server $S_i$.


The distribution of $L_i$ is approximately normal with mean $n_i \mu_r$ and variance $n_i \sigma_r^2$ as $n_i \to \infty$.

Proof: The result follows immediately from the central-limit theorem. See any standard book dealing with sampling theory ([10], for example).

Hence, for sufficiently large packet train sizes, loads on all servers are normally distributed. We will now show that loads in a server cluster become balanced if the coefficient of variation (ratio of standard deviation to mean) of the load distribution on servers tends to zero.

Lemma 2: Let $l_1, \ldots, l_m$ be a sample of size $m$ from a normal distribution with mean $\mu$ and standard deviation $\sigma$. If $s$ is the sample standard deviation, then $s/\mu \to 0$ as $\sigma/\mu \to 0$.

Proof: Let $\bar{l}$ be the sample mean. By the usual definition of sample variance, we have
$$\frac{(m-1)s^2}{\sigma^2} = \sum_{i=1}^{m} \frac{(l_i - \bar{l})^2}{\sigma^2}.$$

The sum on the right side follows the $\chi^2$ distribution [10] with $m-1$ degrees of freedom (there are only $m-1$ independent terms in the sum on the right side: we can choose $m-1$ of the values arbitrarily, but the last value is then fixed, because $l_1 + \cdots + l_m = m\bar{l}$ must hold). Thus, we may write $s^2 = \sigma^2 \chi^2_{m-1}/(m-1)$. Therefore,
$$\frac{s}{\mu} = \frac{\sigma}{\mu}\sqrt{\frac{\chi^2_{m-1}}{m-1}}.$$
Since the $\chi^2_{m-1}$ distribution is bounded in probability, as $\sigma/\mu \to 0$, $s/\mu$ vanishes as well.

We can now combine these lemmas to conclude that, for sufficiently large packet train sizes: 1) the load distribution on servers is approximately normal; and 2) the loads in a server cluster become balanced if the coefficient of variation of this normal distribution tends to zero. We will find this result useful in studying the properties of request mapping algorithms in subsequent sections.

B. Additional Goals for Mappings

This section provides the motivation for our name-based method for assigning objects to servers. We motivate our approach by discussing the issues of replication and disruption.

1) Replication and Rendezvous Issues: A request mapping algorithm can reduce retrieval latency not just by balancing loads and maintaining low overhead, but also through a third mechanism: minimizing replication of work. Poorly designed mapping schemes can cause several different servers to do the same work, lowering efficiency. Replication of work arises when client requests for the same object are sent to multiple servers, causing them each to retrieve and cache the same object separately, for example. Such replication is particularly unacceptable in real-time producer–consumer domains, where a producer sends real-time object data to a server. Since consumers retrieve object data from servers, producers and consumers must rendezvous at the server for successful information transfer. That is, producers and consumers must select the same server independently and simultaneously. Real-time data requires low end-to-end latency, so a 100% hit rate for server selection is required for a successful rendezvous between producer and consumer.

Long latency would otherwise result, violating real-time deadlines, since the server selected by the consumer must first obtain the data from the producer. Section VII-A describes one example of this effect. A 100% hit rate for server selection can only be achieved when the producer sends its information to all servers in a cluster (wasting resources), when consumers send requests to all servers (again wasting resources), or when all producers and consumers send data and requests for the same object to the same server. The last of these options may be viewed as defining affinities between objects and servers.

Replication can also reduce hit rates in caching schemes by decreasing the effective cache size of the servers in the cluster. For example, a study conducted in 1992 [11] reported that a 4-GB cache was necessary for intermediaries to achieve a cache hit rate of 45% for FTP transfers. Thus, if we used four servers and a mapping scheme which allows replication, each server would require a 4-GB cache to achieve a 45% hit rate, rather than only a 1-GB cache each. On the other hand, in a balancing scheme that avoids replication, each server would see requests for one fourth of the objects. Intuitively, its response time and cache hit rates would then be the same as if there were only one fourth as many requestable objects, and the probability of finding a requested object in the cache will thus be greater. As we will see in Section VII-B, this scheme allows each server to get an equivalent hit rate with only a 1-GB cache.

If the growth of replication is slow, it is unlikely to be a matter for concern. However, as Theorem 6 of Section V-E demonstrates, replication can quickly get out of hand, and an object can be expected to become replicated in all servers within about $m(\ln m + \gamma)$ requests. An important factor to consider is that the latency for the server to retrieve the object from the remote provider is far longer than the latency to retrieve the object from the server's cache. We thus formulate the following additional goal for a mapping algorithm.

Goal 3 (High Hit Rate): The mapping scheme should attempt to increase the probability of a hit.

2) Minimizing Disruption: Whenever a server comes up or goes down, the current object–server affinities may change. This leads to another goal.

Goal 4 (Minimal Disruption): Whenever a server comes up or goes down, the number of objects that are remapped to another server must be as small as possible.

In multicast routing, for instance, this goal minimizes the number of sessions liable to experience data loss as a result of changes to distribution trees. For distributed caching systems, this maximizes the likelihood that a request for a previously cached object will still result in a cache hit.

A parameter of particular significance for schemes to map objects to servers is the disruption coefficient $d$, which we define as the fraction of the total number of objects that must be remapped when a server comes up or goes down.

Theorem 1 (Disruption Bounds): For every mapping which evenly divides objects among servers, the disruption coefficient satisfies
$$\frac{1}{m} \le d \le 1$$
where $m$ is the number of active servers.


Proof: Let $N$ be the number of objects cached within the cluster. We first observe that no more than $N$ objects can be disrupted, giving one as the upper bound on disruption. Next, if objects are evenly divided among $m$ servers, then there are $N/m$ objects assigned to each server at any time. When one or more servers go down, all $N/m$ objects assigned to each server that goes down must be reassigned to another server, regardless of any other changes. Thus, disruption $\ge (N/m)/N = 1/m$. When one or more servers come back up, all objects which were previously mapped to each server must be reassigned to it when it comes back up. This is because the set of active servers is then identical to the set of active servers before the server(s) originally went down, at which time each had $N/m$ objects. Since clients making a purely local decision cannot distinguish between the two cases, the previous mapping must be restored. Thus, $N/m$ objects must be reassigned to each new server, regardless of any other changes. Hence, again, disruption $\ge 1/m$.

III. A MODEL FOR MAPPING REQUESTS TO SERVERS

Any scheme which maps a request for an object to a specific server in a cluster can be logically viewed as picking the server which minimizes (or maximizes) some value function $f$. Let $\{S_1, \ldots, S_m\}$ be a cluster of $m$ servers. A typical mapping function thus selects a server $S_j$ at time $t$ such that
$$f(t, S_j) = \min_{1 \le i \le m} f(t, S_i) \qquad (1)$$
where $i$ and $j$ are indices of servers. A number of mapping functions have been used in practice that attempt to realize one or more of the goals discussed in Section II. The goal of balancing loads is particularly significant from our point of view, since it directly influences response times. We now present some commonly used value functions and discuss how well each of them achieves the above goals.

A. Static Priority Mapping

In a static priority scheme, the server list is statically ordered, e.g., $f(t, S_i) = i$ in the model described above. Clients simply try contacting each server in order until one responds. While this does provide fault tolerance, the entire load will typically fall on the highest priority server, potentially causing long response times during times of heavy use. In addition, the cache space available is underutilized, since the space available at the other servers is not used.

B. Minimum-Load Mapping

Sending a request to the least loaded server divides requests between servers so as to keep the load low on each server, providing faster service times. Here, $f(t, S_i)$ is some measure of the current load, i.e., $f(t, S_i) = $ (load on $S_i$ at time $t$), thus requiring an additional mechanism (either periodic or on-demand) to determine which server currently has the lowest load. Making this determination is nontrivial, since clients are constantly issuing requests, and load information may be out-of-date by the time it is acquired and used by a client. In the worst case, all clients issue requests to the same previously idle server, resulting in a very high load. Minimum-load mapping is, however, the approach taken in many existing systems. For example, Cisco's LocalDirector [12], which redirects WWW requests to one of a set of local servers, periodically queries the servers for status information, thus potentially using out-of-date information. In the Contract Net protocol [13], servers are queried for load information when a request is ready, introducing additional latency.

C. Fastest Response Mapping

In the fastest response scheme, a client pings the servers and picks the one that responds first. Thus, $f(t, S_i) = $ (response time of $S_i$ at time $t$). When all servers are equally distant, this mapping is similar to the least-loaded scheme (with the same advantages and disadvantages), since the server with the least load typically responds first. The Harvest [4] web cache implementation and the Andrew file system (AFS) [14] both use this method.
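As a concrete illustration of this model (ours, not from the paper; all names are hypothetical), the following C sketch treats a mapping as a pluggable value function $f(t, S_i)$ minimized over the cluster, as in (1), with static priority and round-robin as example policies.

    /* A sketch of the value-function model of (1): a request at time t
     * goes to the server minimizing f(t, S_i).  Illustrative only. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    enum { M = 8 };                      /* cluster size */
    typedef uint32_t server_t;           /* a server, identified by its IPv4 address */
    typedef double (*value_fn)(unsigned long t, size_t i, server_t s);

    /* Static priority (Section III-A): f(t, S_i) = i. */
    static double f_static(unsigned long t, size_t i, server_t s)
    {
        (void)t; (void)s;
        return (double)i;
    }

    /* Round-robin (Section III-D): the t-th request maps to server t mod M,
     * expressed as a value function that is minimal at i = t mod M. */
    static double f_roundrobin(unsigned long t, size_t i, server_t s)
    {
        (void)s;
        return (double)((i + M - (size_t)(t % M)) % M);
    }

    static size_t select_server(value_fn f, unsigned long t,
                                const server_t *cluster, size_t m)
    {
        size_t best = 0;
        for (size_t i = 1; i < m; i++)
            if (f(t, i, cluster[i]) < f(t, best, cluster[best]))
                best = i;
        return best;
    }

    int main(void)
    {
        server_t cluster[M] = { 0 };     /* addresses unused by these two policies */
        printf("static priority -> %zu, round-robin at t = 5 -> %zu\n",
               select_server(f_static, 0, cluster, M),
               select_server(f_roundrobin, 5, cluster, M));
        return 0;
    }

HRW (Section IV) fits the same interface, with a weight that is maximized rather than minimized.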

D. Round-Robin Mapping

A simpler scheme is round-robin, where successive requests are sent to consecutive servers. For example, when a name in the domain name service (DNS) resolves to multiple IP addresses, DNS returns this list of IP addresses, rotated circularly after each request. If clients use the first address on this list, requests will be sent to the various IP addresses in round-robin fashion (though DNS may not give true round-robin ordering when clients generate duplicate requests, due to DNS caching at clients), thereby balancing the number of requests sent to each [15]. The NCSA scalable web server configuration [16] is one example of such use of DNS.

When all servers are up, a true round-robin scheme maps the $j$th request sent to the cluster to the $(j \bmod m)$th server. To get an ordered list for robustness, this is logically equivalent to assigning weights in our model as
$$f(t, S_i) = (i - r_t) \bmod m$$
where $r_t$ denotes the number of previous requests sent and $m$ is the number of servers.

We now demonstrate formally that round-robin achieves load balancing when the request rate is high. As discussed earlier, we will use the packet-train model [9] for our analysis. Let $n$ be the number of requests in the batch or train. Let $X$ be a random variable describing the service time for one request. Let $L$ be a random variable describing the total service time for all requests in the batch which are mapped to a given server. (Note that $L$ is independent of the queue discipline.)

Theorem 2 (Round-Robin Load Balancing): Let $L$ be a random variable describing the total processing required for all requests mapped to a given server. If $n$ requests are assigned to $m$ servers in a round-robin manner, then the square of the coefficient of variation of $L$ is given by
$$\mathrm{CV}^2(L) = \frac{m}{n}\,\mathrm{CV}^2(X) \qquad (2)$$
and, hence, $\mathrm{CV}(L) \to 0$ as $n \to \infty$ when $X$ has finite variance.

Proof: Since requests are assigned round-robin, each server will get exactly $n/m$ requests (when $n$ is a multiple of $m$). Since the value of $L$ for a given server is the sum of the service times of the $n/m$ requests mapped to it, we get
$$E[L] = \frac{n}{m}E[X]. \qquad (3)$$
Since service times of individual requests are independent and identically distributed, the variance is also additive, giving
$$\mathrm{Var}(L) = \frac{n}{m}\mathrm{Var}(X). \qquad (4)$$
Equation (2) directly follows from (3) and (4), since
$$\mathrm{CV}^2(L) = \frac{\mathrm{Var}(L)}{E[L]^2} = \frac{(n/m)\,\mathrm{Var}(X)}{(n/m)^2 E[X]^2} = \frac{m}{n}\cdot\frac{\mathrm{Var}(X)}{E[X]^2}.$$

Lemma 2 now guarantees that the load is balanced when $n$ is large, and the load is therefore significant. This observation applies both to long-term load balancing (where $n \to \infty$ as the time interval of interest grows), as well as short-term load balancing as the request rate increases (i.e., when large batches of requests arrive within a short period of time).

E. Random Mapping

Another way to balance the expected number of requests assigned to each server is to send requests to a server chosen at random, e.g., $f(t, S_i) = \mathrm{random}()$, as suggested in [14]. This is referred to in queueing theory as a random split. As before, let $n$ be the number of requests in the batch or train. Let $X$ be a random variable describing the service time for one request. Let $L$ be a random variable describing the total service time for all requests in the batch which are mapped to a given server. (Note that $L$ is independent of the queue discipline.)

Theorem 3 (Random-Split Load Balancing): Let $L$ be a random variable describing the total processing required for all requests mapped to a given server. If $n$ requests are randomly assigned to $m$ servers, such that the probability that a request will be mapped to a given server is $1/m$, then the square of the coefficient of variation of $L$ is given by
$$\mathrm{CV}^2(L) = \frac{m}{n}\,\mathrm{CV}^2(X) + \frac{m-1}{n} \qquad (5)$$
and, hence, $\mathrm{CV}(L) \to 0$ as $n \to \infty$ when $X$ has finite variance.

Proof: Since the value of $L$ for a given server is the sum of the service times of the requests mapped to it, we get
$$E[L] = \frac{n}{m}E[X] \qquad (6)$$
since request service times are i.i.d. We next observe that the number of requests mapped to a given server is binomially distributed with success probability $1/m$. To find the second moment of the service time, let a server receive $j$ requests $X_1, \ldots, X_j$. Then the square of the total expected service time is given by
$$E[L^2 \mid j] = E[(X_1 + \cdots + X_j)^2] = j\,E[X^2] + j(j-1)\,E[X]^2.$$
This equation can be split into the terms for each moment of $X$. Letting $p = 1/m$, we can now solve for the coefficients of $E[X^2]$ and $E[X]^2$ separately as follows:
$$E[j] = \sum_{j=0}^{n} j\binom{n}{j}p^j(1-p)^{n-j} = np$$
applying the binomial theorem in the last step. Similarly,
$$E[j(j-1)] = \sum_{j=0}^{n} j(j-1)\binom{n}{j}p^j(1-p)^{n-j} = n(n-1)p^2.$$
Putting these results back into the original equation, we obtain
$$E[L^2] = np\,E[X^2] + n(n-1)p^2 E[X]^2 \qquad (7)$$
so that
$$\mathrm{Var}(L) = E[L^2] - E[L]^2 = np\,\mathrm{Var}(X) + np(1-p)E[X]^2. \qquad (8)$$
Thus, for the square of the coefficient of variation, we get
$$\mathrm{CV}^2(L) = \frac{np\,\mathrm{Var}(X) + np(1-p)E[X]^2}{(np\,E[X])^2}.$$
Simplifying, and using the identity $p = 1/m$, we obtain (5).

Table I summarizes how well each function discussed above meets the desired properties.
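As a quick numerical check of (2) and (5) (our illustration; the service-time distribution and parameters are arbitrary), the following sketch assigns trains of $n$ exponential service times to $m$ servers under both disciplines and compares the averaged squared coefficient of variation against $(m/n)\mathrm{CV}^2(X)$ and $(m/n)\mathrm{CV}^2(X) + (m-1)/n$.

    /* Monte Carlo check of Theorems 2 and 3 (illustrative parameters). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    enum { M = 8, N = 4096, TRIALS = 500 };

    static double exp_sample(void)      /* Exp(1): CV^2(X) = 1 */
    {
        return -log((rand() + 1.0) / ((double)RAND_MAX + 2.0));
    }

    static double cv2(const double *load, int m)
    {
        double mean = 0.0, var = 0.0;
        for (int i = 0; i < m; i++) mean += load[i];
        mean /= m;
        for (int i = 0; i < m; i++) var += (load[i] - mean) * (load[i] - mean);
        var /= (m - 1);                 /* sample variance across the servers */
        return var / (mean * mean);
    }

    int main(void)
    {
        double rr = 0.0, rs = 0.0;
        for (int t = 0; t < TRIALS; t++) {
            double lrr[M] = { 0 }, lrs[M] = { 0 };
            for (int j = 0; j < N; j++) {
                double x = exp_sample();
                lrr[j % M] += x;        /* round-robin assignment        */
                lrs[rand() % M] += x;   /* random split (small mod bias) */
            }
            rr += cv2(lrr, M);
            rs += cv2(lrs, M);
        }
        printf("round-robin CV^2(L): %.5f  (theory (2): %.5f)\n",
               rr / TRIALS, (double)M / N);
        printf("random      CV^2(L): %.5f  (theory (5): %.5f)\n",
               rs / TRIALS, (double)M / N + (M - 1.0) / N);
        return 0;
    }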

IV. MAPPINGS BASED ON OBJECT NAMES

Not every scheme which avoids replication has a low disruption coefficient. For example, the static priority mapping algorithm (Section III-A) certainly avoids replication, since all requests are sent to the same server. However, its disruption coefficient is unity since every object gets remapped when the primary server fails. Disruption can be minimized by balancing the number of objects mapped to each server.


TABLE I MAPPING FUNCTIONS

One way of accomplishing this goal is to use the name of the object to derive the identity of the server. Since this mapping is a purely local decision, its overhead remains low. Unlike conventional mapping schemes based on name servers, such a mapping is "stateless," since it depends only on the identities of the object and the cluster servers, and not on the state at the cluster servers or that held in a name server.

Such a stateless mapping can be viewed as a hash function, where the key is the name of the object, and the "buckets" are servers. In our case, however, two complications arise. First, the number of buckets can vary over time as servers are added or removed. Second, it is possible for one or more of the servers to be down, so that an object must hash to another server when one goes down. Therefore, the output of such a hash function must be an ordered list of servers rather than a single server name. In some domains, such as PIMv2 [17], the list of servers is dynamically updated to exclude unreachable servers. In this case, it suffices for the hash function to map a name to a single server. We are interested in the more general case, however, and therefore define a stateless mapping as a function which, given a list of servers, maps an object name to a specific ordering of the server list.

A conventional hash function maps a key $k$ to a number representing one of $B$ "buckets" by computing $b(k)$ as a function of $k$, i.e., $b(k) = h(k) \bmod B$, where $h$ is some function of $k$ (e.g., $h(k) = k$ when $k$ is an integer). In our case, a key corresponds to an object name, and a bucket corresponds to a server in a cluster. A serious problem with using a modulo-$m$ function for mapping objects to servers, however, arises when the number of active servers in the cluster changes. If a simple modulo-$m$ hash function were used to map objects to servers, then when the number of active servers changes from $m$ to $m-1$ (or vice versa), all objects would be remapped except those for which $h(k) \bmod m = h(k) \bmod (m-1)$. When $h(k)$ is uniformly distributed, the disruption coefficient will thus be approximately $(m-1)/m$; i.e., almost all objects will need to be reassigned. Clearly, a better scheme is needed. The small experiment below makes this disruption concrete.
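The following sketch (ours, for illustration) counts, for random 31-bit keys, how many keep their bucket when the server count drops from $m$ to $m-1$ under modulo hashing; only about $1/m$ do.

    /* Disruption of modulo-m hashing: count random 31-bit keys whose
     * bucket is unchanged when the server count drops from m to m-1.
     * Roughly 1/m survive, i.e. disruption is about (m-1)/m; HRW would
     * remap only the 1/m of objects owned by the lost server. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const int m = 10;
        const long keys = 1000000L;
        long unchanged = 0;
        for (long i = 0; i < keys; i++) {
            long h = (long)rand();                 /* stands in for h(k) */
            if (h % m == h % (m - 1))
                unchanged++;
        }
        printf("unchanged fraction: %.4f  (1/m = %.4f)\n",
               (double)unchanged / keys, 1.0 / m);
        return 0;
    }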

A. HRW Hashing

We now introduce a new mapping algorithm, which we call HRW. HRW operates as follows. An object name and server address together are used to assign a random "weight" to each server. The servers are then ordered by weight, and a request is sent to the active server with the highest weight. Thus, HRW may be characterized by rewriting (1) as follows:

$$w(k, S_j) = \max_{1 \le i \le m} w(k, S_i) \qquad (9)$$
where $k$ is the object name, $S_i$ is the IP address of server $i$, and $w$ is a pseudorandom function of $k$ and $S_i$. If servers have unequal capacity, the weights assigned to each server may be scaled by a constant distributed to clients along with the server's address. This would allow servers with more capacity to receive a proportionately higher portion of the load. In the remainder of our discussion, however, we will assume that servers have equal capacity.

V. PROPERTIES OF HRW

We now analyze the HRW algorithm described above and examine how well it satisfies the requirements outlined in Sections III and IV.

A. Low Overhead

It is easy to see that HRW requires no information beyond server and object names. This allows clients to make an immediate decision based on purely local knowledge.

In real-time producer–consumer domains, it must be possible to change the server in use without requiring the data transfer to start over. In such domains, some optimizations are desirable when the server changes, if a client is receiving a large number of objects simultaneously. First, when a server goes down, clients must reassign all objects previously mapped to that server. For each of those objects, if the list of weights has been preserved, this list can be used directly to reassign the object to its new maximum-weight server. Alternatively, the implementation could trade speed for memory by recalculating the weights for each server and not storing them. Second, when a server comes up (and the lists of weights have not been preserved), recalculation of all weights for all objects can be avoided simply by storing the previously winning weight. Then, when a server $S_{\mathrm{new}}$ comes up, the implementation need only compute $w(k, S_{\mathrm{new}})$ for each object name $k$ in use, and compare it to the previously winning weight for $k$. Only those objects for which $S_{\mathrm{new}}$ yields a higher weight need be reassigned. A sketch of this bookkeeping appears below.
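The following sketch is ours; the structure and function names are hypothetical, and the weight function anticipates (16) from Section VI.

    /* Bookkeeping sketch for the second optimization above: each object
     * in use remembers its winning server and weight, so a newly arrived
     * server costs one weight computation per object. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    static uint32_t hrw_weight(uint32_t s, uint32_t d)   /* w(k, S_i); see (16) */
    {
        const uint32_t a = 1103515245u, b = 12345u;
        return (a * ((a * s + b) ^ d) + b) & 0x7fffffffu;
    }

    struct binding {
        uint32_t digest;   /* D(k) of the object name   */
        uint32_t server;   /* currently winning server  */
        uint32_t weight;   /* previously winning weight */
    };

    /* When s_new comes up, only objects for which s_new beats the stored
     * winning weight move; no other weights are recomputed. */
    static void server_up(struct binding *obj, size_t n, uint32_t s_new)
    {
        for (size_t i = 0; i < n; i++) {
            uint32_t w = hrw_weight(s_new, obj[i].digest);
            if (w > obj[i].weight) {    /* ties: see Theorem 8 in Section VI */
                obj[i].weight = w;
                obj[i].server = s_new;
            }
        }
    }

    int main(void)
    {
        struct binding objs[2] = {
            { 0x1234u, 0x0a000001u, 0 },
            { 0xabcdu, 0x0a000001u, 0 },
        };
        for (int i = 0; i < 2; i++)     /* initial weights for the old server */
            objs[i].weight = hrw_weight(objs[i].server, objs[i].digest);
        server_up(objs, 2, 0x0a000002u);
        for (int i = 0; i < 2; i++)
            printf("object %08x -> server %08x\n",
                   (unsigned)objs[i].digest, (unsigned)objs[i].server);
        return 0;
    }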


B. Load Balancing

Let there be a set $S = \{S_1, \ldots, S_m\}$ of $m$ servers. Let $O = \{o_1, \ldots, o_N\}$ be the universe of requestable objects. Some objects are likely to be requested more frequently than others, so let object $o_k$ have popularity $q_k$, defined as the probability that an incoming request is for object $o_k$. Let $M$ be a mapping of objects to servers that partitions the objects equally among the servers.

The actual run-time load experienced by any server clearly depends on the popularities of the objects mapped to it. The more popular the objects mapped to $S_i$, the greater the load it sees. By analogy with object popularities, we may define the popularity $P_i$ of a server $S_i$ as a random variable whose value equals the sum of the popularities of the objects mapped to it. Thus, $P_i$ represents the probability that a request will be sent to $S_i$. If $P_1, \ldots, P_m$ are server popularity values, we must have $\sum_{i=1}^{m} P_i = 1$, since $\sum_{k=1}^{N} q_k = 1$.

Let the mapping $M$ assign $N/m$ objects to server $S_i$. Since $M$ assigns all objects to servers, we can view the set of objects assigned to $S_i$ as a selection of size $N/m$ from the set $O$, such that any object is selected with equal probability. We can similarly model server popularities by viewing $P_i$ as the sum of a sample of size $N/m$ from $\{q_1, \ldots, q_N\}$, taken without replacement. In this case, the population mean is $\mu_q = 1/N$, since the $q_k$ sum to 1. Let the population variance be $\sigma_q^2$.

We now prove two theorems that characterize the load-balancing properties of HRW. Theorem 4 states that the coefficient of variation of $P_i$ vanishes as the number of objects becomes large. Theorem 5 states that the amount of processing done by each server is balanced when both the number of objects and the train size are large.

Theorem 4 (Hash-Allocation Request Balancing): Let $N$ objects be partitioned among $m$ servers using HRW, with each server receiving exactly $N/m$ objects. If $\mu_q$ and $\sigma_q^2$ are as defined above, then the square of the coefficient of variation of $P_i$ is given by
$$\mathrm{CV}^2(P_i) = \frac{m-1}{N-1}\cdot\frac{\sigma_q^2}{\mu_q^2} \qquad (10)$$
and, hence, $\mathrm{CV}(P_i) \to 0$ as $N \to \infty$ when the popularity distribution has finite variance.

Proof: Let $N/m$ objects be mapped to $S_i$. As before, we can treat their popularities as a sample of size $N/m$, so that $P_i = (N/m)\bar{q}$, where $\bar{q}$ is the sample mean. We know from sampling theory [10] that $E[\bar{q}] = \mu_q$, so
$$E[P_i] = \frac{N}{m}\,\mu_q = \frac{1}{m}. \qquad (11)$$
We also know from sampling theory [10] that, for sampling without replacement,
$$\mathrm{Var}(\bar{q}) = \frac{\sigma_q^2}{N/m}\cdot\frac{N - N/m}{N-1}.$$
Since $\mathrm{Var}(P_i) = (N/m)^2\,\mathrm{Var}(\bar{q})$, we can substitute and simplify to get
$$\mathrm{Var}(P_i) = \frac{N^2 \sigma_q^2 (m-1)}{m^2 (N-1)}. \qquad (12)$$
We can now substitute and rearrange to obtain (10).

Theorem 5 (Hash-Allocation Load Balancing): Let $N$ objects be randomly partitioned among $m$ servers using HRW, with each server receiving exactly $N/m$ objects. Let $n$ be the request train size, and let the service time of requests and the object popularities both have finite variance. Then, if $W_i$ is a random variable representing the amount of processing done by server $S_i$,
$$\mathrm{CV}(W_i) \to 0 \quad \text{as } n, N \to \infty. \qquad (13)$$

Proof: From Theorem 4, the coefficient of variation of $P_i$ tends to zero as $N \to \infty$ for any server. From (11), $E[P_i] = 1/m$ is independent of $N$, so that $P_i \to 1/m$ as $N \to \infty$ and, hence, each request in a train is mapped to a given server with probability approaching $1/m$. We can now apply Theorem 3, and (13) follows immediately.

Using Lemma 2, we can now conclude that the processor loads are balanced when the conditions of Theorem 5 are met. That is, the load-balancing effectiveness increases as the demand increases.

C. High Hit Rate

It is easy to see that HRW avoids replication, thus potentially giving a higher hit rate, as long as clients have a consistent list of servers for each cluster. Again, we assume that the server list is known to clients a priori. For example, it may be statically configured, resolved at client startup time, or periodically distributed to clients. See [18] for further discussion of these issues.

D. Minimal Disruption

When a server goes down, all objects which mapped to that server must be reassigned. All other objects will be unaffected, and so the optimum disruption bound is achieved. The randomizing property of HRW allows the reassigned objects to be evenly divided among the remaining servers, thus preserving load balancing. When a server comes back up, or when a server is added to the set $S$, the objects which get reassigned to it are exactly those that yield a higher weight for it than for any other server. This again achieves the optimum disruption bound of $1/m$. Thus, we have shown that HRW achieves the minimum disruption bound.

E. Comparing HRW with Other Mappings

It is instructive to compare the performance of HRW qualitatively with that of other mappings, particularly with the round-robin and random mappings, which are also stateless. Section VII presents an empirical comparison of HRW with other mappings. The round-robin and random mappings do an excellent job of balancing loads, as Theorems 2 and 3 demonstrate. However, balancing server loads is not the primary criterion for favoring a mapping. Ultimately, it is often more important to optimize response time. For the application domains of our interest, server load balancing is an important goal only to the extent that it helps optimize response time.


Optimizing response time means reducing both the expected value as well as the variance of the response time. A serious problem with the round-robin and random mappings in the caching domains we consider is that decreases in response time due to load balancing tend to be counterbalanced in practice by significant increases in retrieval latency due to cache misses. Each cache miss requires a retrieval from a remote provider, an operation that may be orders of magnitude more expensive than retrieval from a local cache.

Some of this effect arises from replication of data elements in the server cache. In an intuitive sense, replication decreases the effective cache size for the cluster as a whole, since replicated objects are held in more than one server, thus wasting cache space. However, replication is likely to be a problem only if it grows quickly enough. We now demonstrate that, in the absence of a deliberate effort to control it, replication can quickly get out of hand.

Theorem 6 (Replication Growth for Random Selection): Let $\{S_1, \ldots, S_m\}$ be a cluster of $m$ servers, and let $r_1, r_2, \ldots$ be a series of requests for object $o$. For each such request $r_j$, randomly select a server and assign $r_j$ to it. If $R$ requests are processed before all servers cache the object $o$, then
$$E[R] = m H_m \approx m(\ln m + \gamma) \qquad (14)$$
where $H_m$ is the $m$th harmonic number and $\gamma$ is Euler's constant.

Proof: We can view the progression to full replication as a series of phases, with phase $i$ being the duration between the points at which $i$ servers cache the object and $i+1$ servers cache it. The process begins with phase 0 and ends after phase $m-1$. Let $R_i$ be the number of requests in phase $i$. By definition,
$$R = \sum_{i=0}^{m-1} R_i. \qquad (15)$$
During phase $i$, there are $i$ servers which cache the object and $m-i$ servers that do not. Since requests are randomly assigned to servers, the probability that a given phase-$i$ request is sent to a server that does not cache the object is $(m-i)/m$. Thus $R_i$, the number of phase-$i$ requests, follows a geometric distribution, so that
$$E[R_i] = \frac{m}{m-i}.$$

Using linearity of expectation in (15), we get
$$E[R] = \sum_{i=0}^{m-1} \frac{m}{m-i}.$$
After changing the summation index appropriately, this reduces to
$$E[R] = m\sum_{j=1}^{m}\frac{1}{j} = m H_m.$$
It is well known (see [19], for example) that
$$H_m = \ln m + \gamma + o(1).$$
Substituting above, the theorem follows.
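A short simulation of our own confirms (14): with $m = 10$, the average number of randomly directed requests before full replication comes out near $m H_m \approx 29.3$.

    /* Monte Carlo check of (14): random selection reaches full
     * replication after about m*H_m requests (m = 10 gives ~29.3). */
    #include <stdio.h>
    #include <stdlib.h>

    enum { M = 10, TRIALS = 100000 };

    int main(void)
    {
        long total = 0;
        for (int t = 0; t < TRIALS; t++) {
            int cached[M] = { 0 }, have = 0;
            while (have < M) {
                int s = rand() % M;         /* random server selection */
                if (!cached[s]) { cached[s] = 1; have++; }
                total++;
            }
        }
        double hm = 0.0;
        for (int j = 1; j <= M; j++) hm += 1.0 / j;
        printf("mean requests to full replication: %.2f  (m*H_m = %.2f)\n",
               (double)total / TRIALS, M * hm);
        return 0;
    }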

It is clear that full replication is achieved rather quickly. With ten servers in a cluster, the expected number of requests to reach full replication is about 30. Therefore, the effective cache size of the cluster is reduced by a factor of ten after an average of about 30 requests per object. In contrast, the replication factor in HRW is always zero. Thus, the effective cache size remains unchanged under HRW mappings.

1) Caching Under HRW: Since the goal of a caching policy is to maximize the hit rate, a caching policy attempts to cache the set of objects most likely to be requested next. Since the future is unknown, the caching algorithm must predict this set in practice. An "optimal" caching policy is one which maximizes the probability that the object named in the next request is in the cache.

Let object $o_k$ have size $s_k$, and let $q_k(t)$ be its popularity at time $t$. That is, $q_k(t)$ is the probability that an incoming request at time $t$ is for object $o_k$. The expected hit rate for the next request is then equal to $\sum_{k \in C} q_k(t)$, where $C$ is the set of cached objects. Thus, the optimal set of objects to cache at time $t$ in a cache of size $c$ maximizes $\sum_{k \in C} q_k(t)$, subject to the constraint that $\sum_{k \in C} s_k \le c$. This is an instance of the Knapsack problem, which is known to be NP-complete [20]. Cache replacement strategies can then be viewed as heuristics to solve this problem. Since the future is unknown, they must use local estimates of the $q_k(t)$'s based on statistics such as recency or frequency of reference, typically deriving them from past history.

If a request for object $o_k$ may be sent to any server in a cluster with equal probability, then each server will see the same set of object popularities $q_k(t)$. That is, the probability that the next incoming request is for object $o_k$ is the same at all servers. We will refer to such mapping schemes as "nonpartitioned mappings." These include all mappings previously discussed which allow replication (i.e., all except static priority and HRW). Conversely, we will refer to mapping schemes under which $q_k(t)$ is nonzero at exactly one server for each $o_k$ as (completely) "partitioned mappings." These include HRW and static priority.

Assuming equal cache sizes, all servers under a nonpartitioned mapping will have the same expected hit rate under an optimal caching scheme, since they all see the same object popularities. However, as the number of servers grows, each server will see a smaller fraction of incoming requests, spaced farther apart, and so its estimates of $q_k(t)$ can degrade in quality. Thus, we expect the hit rate seen by nonpartitioned mappings to decrease as the number of servers grows. As we will see in Section VII-B, trace-driven simulations have confirmed that this is indeed the case in practice.

Theorem 7 (Partitioning Non-Harmful): Under an optimal caching scheme, the expected hit rate in a partitioned mapping will be greater than or equal to the expected hit rate in a nonpartitioned mapping.

Proof: At time $t$, let $C$ be the set of objects cached at some server under a nonpartitioned mapping and an optimal caching scheme using a cache of size $c$. Let $h = \sum_{k \in C} q_k(t)$ be the expected hit rate for a request sent to that server at time $t$. Under an optimal caching scheme, we know that $h$ is maximized, subject to $\sum_{k \in C} s_k \le c$. Without partitioning,


all servers see the same set of object popularities, so $h$ is the same for all servers. Hence, the expected hit rate of the entire cluster is also $h$.

Let $O_i$ be the set of objects mapped to server $S_i$ in a partitioned mapping. Let $h_i$ be the portion of $h$ due to objects which get mapped to $S_i$ in a partitioned mapping. (Thus, $\sum_i h_i = h$.) Let $C_i$ be the set of objects cached at $S_i$ under an optimal caching scheme using a cache of size $c$. Let $h_i^*$ be the expected hit rate at server $S_i$ under a partitioned mapping and an optimal caching scheme. We will now show by contradiction that $h_i^* \ge h_i$ and, hence, $\sum_i h_i^* \ge h$ (i.e., the hit rate of the cluster is not decreased under a partitioned mapping). Assume that $h_i^* < h_i$. Then there exists a set of objects $C \cap O_i$, with $\sum_{k \in C \cap O_i} s_k \le c$ and $\sum_{k \in C \cap O_i} q_k(t) = h_i > h_i^*$. Thus, $C_i$ was not the optimum solution to the Knapsack problem at $S_i$, and we have a contradiction.

Thus, the expected hit rate under an optimal caching scheme in any partitioning scheme will be greater than or equal to the expected hit rate under an optimal caching scheme in any nonpartitioning scheme, and grows as the number of servers is increased, since more objects can be cached. In addition, a server sees the entire request history for objects assigned to it, regardless of the total number of servers. Thus, the quality of estimates of $q_k(t)$ does not degrade as the number of servers increases. Since caching policies depend on such estimates in practice, the hit rate is expected to increase with the number of servers in HRW, and decrease in all other mapping schemes which do not partition the set of requestable objects. Again, as we will see in Section VII-B, trace-driven simulations have confirmed that this is indeed the case in practice.

Thus, HRW allows the hit rate to be increased. For this effect to be significant, the maximum hit rate possible must also be significant; i.e., a significant number of requests must be for objects which were requested in the past. Combining this observation with the conditions in Theorem 5 giving good load balancing, we obtain the following corollary.

Corollary 1 (HRW Applicability): HRW is particularly suitable for domains where there are a large number of requestable objects, the request rate is high, and there is a high probability that a requested object will be requested again.

This condition is true in a wide variety of domains. We will study two of them, multicast routing and WWW caching, in more detail in Section VII.

VI. IMPLEMENTING HRW MAPPINGS

The weight function is the crucial determinant of HRW performance. To achieve a high hit rate, all clients should use the same weight function. Based on an evaluation of different randomization schemes (see Section VI-A), we recommend an HRW scheme based on the weight function $W_{\mathrm{rand}}$, defined as

$$W_{\mathrm{rand}}(S_i, k) = \bigl(A \cdot ((A \cdot S_i + B) \oplus D(k)) + B\bigr) \bmod 2^{31} \qquad (16)$$
with $A = 1103515245$ and $B = 12345$, where $D(k)$ is a 31-bit digest of the object name $k$, and $S_i$ is the address of the $i$th server in the cluster. Note that the length of the address does not matter, as only the low-order 31 bits are significant in modulo-$2^{31}$ arithmetic. This function generates a pseudorandom weight in the range $[0, 2^{31}-1]$, and is derived from the original BSD rand function (since rand is no longer the same on all platforms, implementations should use (16) directly), where it corresponds to

    srand(Si); srand(rand() ^ D(k)); Weight = rand();

This function can be implemented with only a few machine instructions (for example, seven on an i386 with gcc -O2, not counting the digest), requires only 32-bit integer arithmetic, and exhibits a number of desirable properties, as we will see in Section VI-A. Thus, implementing HRW entails first computing $D(k)$, and then computing $W_{\mathrm{rand}}(S_i, k)$ for each $S_i$. The time to do this is typically negligible compared to the network latency.

In the unlikely event that multiple servers are assigned the same weight for a name $k$, ties can be broken by choosing the server with the highest $S_i$. The following theorem states exactly when such ties will occur.

Theorem 8: In the $W_{\mathrm{rand}}$ function, a tie between two servers $S_i$ and $S_j$ occurs if and only if $S_i \equiv S_j \pmod{2^{31}}$.

Proof: First, assume that $S_i \equiv S_j \pmod{2^{31}}$. Then it is easy to see from (16) that $W_{\mathrm{rand}}(S_i, k) = W_{\mathrm{rand}}(S_j, k)$, since modulo-$2^{31}$ congruence is preserved under addition, multiplication, and the XOR operation. For the other direction, assume that $W_{\mathrm{rand}}(S_i, k) = W_{\mathrm{rand}}(S_j, k)$. Then, since modulo-$2^{31}$ congruence is preserved under subtraction,
$$A \cdot ((A \cdot S_i + B) \oplus D(k)) \equiv A \cdot ((A \cdot S_j + B) \oplus D(k)) \pmod{2^{31}}. \qquad (17)$$
But $A$ and $2^{31}$ are relatively prime, so a standard result from number theory [21] tells us that we may cancel $A$, leaving us with
$$(A \cdot S_i + B) \oplus D(k) \equiv (A \cdot S_j + B) \oplus D(k) \pmod{2^{31}}.$$
Using the fact that modulo-$2^{31}$ congruence is preserved under the XOR operation, and then repeating the procedures above, we finally get that $S_i \equiv S_j \pmod{2^{31}}$. This will be the case if and only if the low 31 bits of $S_i$ and $S_j$ are the same.

Theorem 8 can be used to determine when ties can occur in a given domain. It guarantees, for example, that a tie can occur in IPv4 only if the IP addresses of the two servers differ only in the most significant bit. Thus, no tie-breaking rule is needed when it is known that all servers are within the same network. A sketch of an implementation follows.
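The following C sketch is our illustration (with example addresses, not code from the paper) of the weight function (16) and the selection rule (9), including the Theorem 8 tie-break.

    /* A minimal sketch of (16) and HRW selection (9).  D(k) is assumed
     * to be a 31-bit digest of the object name; addresses are examples. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define A 1103515245u   /* multiplier of the original BSD rand */
    #define B 12345u        /* increment of the original BSD rand  */

    /* W_rand(S_i, k) of (16): a pseudorandom weight in [0, 2^31 - 1]. */
    static uint32_t wrand(uint32_t si, uint32_t dk)
    {
        return (A * ((A * si + B) ^ dk) + B) & 0x7fffffffu;  /* mod 2^31 */
    }

    /* Pick the server with the highest weight for digest dk, as in (9);
     * ties (Theorem 8) are broken by the higher address. */
    static uint32_t hrw_select(const uint32_t *servers, size_t m, uint32_t dk)
    {
        uint32_t best = servers[0];
        uint32_t wbest = wrand(servers[0], dk);
        for (size_t i = 1; i < m; i++) {
            uint32_t w = wrand(servers[i], dk);
            if (w > wbest || (w == wbest && servers[i] > best)) {
                wbest = w;
                best = servers[i];
            }
        }
        return best;
    }

    int main(void)
    {
        uint32_t cluster[3] = { 0xc0a80001u, 0xc0a80002u, 0xc0a80003u };
        printf("digest 0x1234 -> server %08x\n",
               (unsigned)hrw_select(cluster, 3, 0x1234u));
        return 0;
    }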


Fig. 2. Two-stage random weight functions.

A. Comparison of Weight Functions

We now compare the performance of the $W_{\mathrm{rand}}$ function with that of other possible weight functions to see how well each achieves load balancing. We consider several alternative weight functions. The first competing weight function we consider is based on the Unix system functions random and srandom in place of rand and srand, resulting in a weight function we denote $W_{\mathrm{srandom}}$. (The algorithm used by srandom is quite complex; as we will see, it does not perform significantly better than rand as a hash function, and is not worth describing here.) The second function we consider uses the minimal standard random number generator [22], [23], resulting in the weight function $W_{\mathrm{ms}}$.

Our third alternative is to modify the $W_{\mathrm{rand}}$ function as follows:
$$W_{\mathrm{rand2}}(S_i, k) = \bigl(A \cdot ((A \cdot D(k) + B) \oplus S_i) + B\bigr) \bmod 2^{31} \qquad (18)$$
i.e., with the roles of $S_i$ and $D(k)$ in (16) interchanged. It can be shown that Theorem 8 applies to $W_{\mathrm{rand2}}$ as well, using a similar proof. Fig. 2 depicts the relationship between $W_{\mathrm{rand}}$ and $W_{\mathrm{rand2}}$, where each of the small boxes represents a randomizing filter. Finally, we evaluate the option of performing an 8-bit exclusive-OR over all the bytes of $S_i$ and $D(k)$ to get a single one-byte result. We call this the $W_{\mathrm{xor}}$ weight function.

As a basis for comparing functions, we will assume that 100 randomly selected objects are being serviced at a time, and look at the coefficient of variation of the number of objects assigned to each server. Fig. 3 shows the results of this simulation, with each point representing an average over 5000 trials. Fig. 3(a) and (b) shows the results using random addresses for servers, and models the performance when servers are distributed across different networks. In Fig. 3(b), object names which yield consecutive $D(k)$ values were used. As can be seen, $W_{\mathrm{rand2}}$ exhibits the best performance.

Fig. 3(c) and (d) shows the results using consecutive addresses for servers, starting with the arbitrarily chosen IP address 173.187.132.245.


In Fig. 3(d), object names which yield consecutive $D(k)$ values were again used. It is interesting to note that all but two of the methods were sensitive to the number of servers. We also ran other experiments (not shown) with different starting server addresses, and observed that the same methods were sensitive to the starting IP address as well; $W_{\mathrm{rand}}$ and $W_{\mathrm{rand2}}$ remained relatively unaffected.

The relative performance of $W_{\mathrm{rand}}$ and $W_{\mathrm{rand2}}$ can best be understood by examining Fig. 2, which represents the functions as two randomization stages separated by an XOR operator. If we fix the input to the first stage at a given value, and input a series of numbers to the XOR operator, we would expect the input to the second stage to be significantly correlated with the series. Whenever we input a sequential series of numbers to the XOR in our experiments, the input to the second stage will be correlated with this sequential series, lowering the degree of randomness of the weight value output. On the other hand, when the second input is also uniformly distributed, both functions perform similarly.

We also observed that $W_{\mathrm{rand}}$ and $W_{\mathrm{rand2}}$ were about 200 times faster than $W_{\mathrm{srandom}}$, since srandom is computationally expensive. We thus conclude that, of those weight functions studied, $W_{\mathrm{rand}}$ and $W_{\mathrm{rand2}}$ give the best load-balancing performance. The choice of which is most appropriate for use with HRW depends on the characteristics of the domain of use. Finally, while we could have simulated many other types of pseudorandom functions, we note that finding a good pseudorandom function is hard [22]. Given the good performance of $W_{\mathrm{rand}}$ as a hash function, we felt that investigating other weight functions was not necessary.

Fig. 3. Weight functions compared. The Y-axis is the coefficient of variation of the number of serviced objects across cluster servers.

VII. CASE STUDIES

To show how HRW applies to a variety of domains, we now examine two applications in more detail.

A. Shared-Tree Multicast Routing

In shared-tree multicast routing protocols such as PIM [2] and CBT [1], receivers' routers request packets for a specific session by sending a "join session" request toward the root of a distribution tree for that session. Sources send data to a session by sending it toward the root of its tree. The root, known as a rendezvous point (RP) in PIM and a core in CBT, thus takes on the role of a server. Routers with directly connected senders and receivers become the "clients." An "object" in this domain is a multicast session identified by an IP multicast group address. The size of the object is unbounded.

Since session data is real-time, and may be sent from multiple sources, it is essential for clients and providers to determine the correct server (RP) quickly. Otherwise, real-time data sent by a host could overflow the local router buffers before it is able to identify the correct RP server. Low latency is also important to receivers who want to join a session in progress.

One example motivating the low-latency requirement is known as the "bursty-source problem" [18], where a source periodically sends a burst of data and then remains silent for a long time. An example is an application which broadcasts its current state once a day. If the source's router had to resolve the correct server via an exchange of network messages each time (since the number of sessions may be too large to maintain mapping state for idle sessions), then every burst could be missed and receivers would never get useful data.

In this application, the number of possible objects is large ($2^{28}$ multicast addresses), and all receivers for the same session request the same object, causing a significant concentration of requests. The conditions of Corollary 1 are thus satisfied, and the situation is ideal for the use of HRW.

We focus on sparse-mode PIM in particular, since its evolution illustrates many of the concepts and goals discussed in Section III. The original description of PIMv1 [2] did not specify any mapping algorithm for assigning join requests to servers. Since replication was not prevented, providers sent session data to all servers in a cluster. This resulted in undesirable complexity and resource consumption. The next step, as the design of PIM evolved, was to specify a static priority scheme as the mapping algorithm. This avoided replication, reducing complexity and resource consumption, but meant that the liveness of higher priority servers in the cluster had to be tracked, incurring additional complexity. Finally, PIMv2 adopted our algorithm, HRW, as its mapping algorithm. The result is that the protocol complexity and state requirements are significantly lower than in PIMv1. Multicast address allocation can be done using a variety of methods, so that object names may be assigned randomly or sequentially, while servers are likely to be scattered among many subnets within the routing domain. Since these circumstances roughly correspond to Fig. 3(a) and (b), $W_{\mathrm{rand2}}$ was adopted as the weight function of choice in PIMv2.

B. WWW Client-Side Proxy Caching

WWW usage continues to increase and, hence, popular servers are likely to become more and more congested. One solution to this problem is to cache web pages at HTTP proxies [3], [4], [24]. Client requests then go through a local proxy server. If the proxy server has the page cached, the page is returned to the client without accessing the remote server. Otherwise, the page is retrieved and cached for future use. Various studies (e.g., [25] and [26]) have found that a cache hit rate of up to 50% can be achieved. Thus, since the number of possible objects is large and a significant concentration of requests exists, the conditions are appropriate for HRW.

Popular WWW browsers, such as Netscape Navigator (Netscape Communications Corporation; http://www.netscape.com), NCSA Mosaic, and lynx, now allow specifying one or more proxy servers through which requests for remote objects are sent. A single proxy, however, does not provide any fault tolerance. For a robust deployment, multiple proxies are

address allocation can be done using a variety of methods, so that object names may be assigned randomly or sequentially, while servers are likely to be scattered among many subnets within the routing domain. Since these circumstances roughly was adopted as the correspond to Fig. 3(a) and (b), weight function of choice in PIMv2. B. WWW Client-Side Proxy Caching WWW usage continues to increase and, hence, popular servers are likely to become more and more congested. One solution to this problem is to cache web pages at HTTP proxies [3], [4], [24]. Client requests then go through a local proxy server. If the proxy server has the page cached, the page is returned to the client without accessing the remote server. Otherwise, the page is retrieved and cached for future use. Various studies (e.g., [25] and [26]) have found that a cache hit rate of up to 50% can be achieved. Thus, since the number of possible objects is large and a significant concentration of requests exists, the conditions are appropriate for HRW. Popular WWW browsers such as Netscape Navigator,6 NCSA Mosaic, and lynx, now allow specifying one or more proxy servers through which requests for remote objects are sent. A single proxy, however, does not provide any fault tolerance. For a robust deployment, multiple proxies are 6 Netscape Communications Corporation, Netscape Navigator software. Available: HTTP: http://www.netscape.com.


TABLE II TRACE SUMMARY

required. For example, a single proxy hostname might map to multiple IP addresses. Some criteria must be used by a client to select a proxy to which to send a request. We now wish to compare various mapping algorithms through simulation, using an actual trace of WWW requests. In the discussion below, we will use the term "server" to refer to the proxy, rather than the actual server holding the object. This will allow the terminology to apply to the more general problem.

In the following simulations, the objects and object sizes are taken from the publicly available WWW client-based traces described in [27], where all URL's accessed from 37 workstations at Boston University were logged over a period of five months. Since we are interested in the performance of a proxy scheme, we use only those URL's which referred to remote sites (not within the bu.edu domain) and were not found in the browser's own cache. Table II shows the characterization of the resulting data set used for simulation. Note that about 50% of the URL's requested were unique, giving an upper bound on the cache hit rate of around 50%, which agrees with the bound observed by [25] and [26].

Since a unique object name can, in general, have arbitrary length, and we wish to obtain a digest with which we can do 32-bit arithmetic, our simulation defined $D(k)$ to be the 31-bit digest of the object name obtained by computing its CRC-32 [28] checksum and discarding the most significant bit.

1) Simulation: Our simulator implemented HRW, a round-robin scheme, and a random allocation scheme. In a fourth scheme, similar to least-loaded allocation, a request was sent to the server with the least number of objects currently being serviced (with ties broken randomly). A fifth alternative, which fails to provide robustness, is to add more cache space to a single server rather than adding another server; thus, all available cache space is combined at a single server.

We first preloaded server caches by simulating the caches with 60 000 requests and a least recently used (LRU) replacement strategy (by which point, the caches were full). We then computed statistics over the next 100 000 requests. In addition, we made the simplifying assumptions that all objects were cacheable, and that no objects were invalidated during the lifetime of the simulation (160 000 requests).

Fig. 4 shows how the hit rate varied with the number of servers under each allocation scheme using 100-MB caches. The hit rate of HRW increased, approaching the maximum bound of 50%, since the effective cache size increases linearly with the number of servers. The hit rate decreased for other

Fig. 4. Hit rates of various allocation schemes.

Fig. 5. Hit rates of various total cache sizes under HRW.

Fig. 4 shows how the hit rate varied with the number of servers under each allocation scheme using 100-MB caches. The hit rate of HRW increased, approaching the maximum bound of 50%, since the effective cache size increases linearly with the number of servers. The hit rate decreased for the other allocation schemes, however, since the more servers there are, the less likely it is that a previous request for the same object was seen by the same server. Their hit-rate curves were similar, since each assigns requests independently of where objects are cached. We observe that, by six servers, HRW's hit rate is double that of the other schemes.

In Fig. 5, we compare the effects of using HRW with multiple 100-MB servers against those of combining all of the available cache space into a single server (in which case all mapping schemes are equivalent). As can be seen, when HRW is used, adding another 100-MB server is indeed comparable to adding another 100 MB of space to a single server; HRW with multiple servers provides better availability, however. In other words, the other schemes give a hit rate that depends on the cache size at each server, whereas the HRW hit rate depends on the total cache space available at all servers combined. The small difference observed between the two curves in Fig. 5 is due to the fact that only whole objects are cached: when there is enough space at all servers combined to hold an object, but not at any single server, HRW must evict an object from a cache.

Fig. 6 shows the time that the server took to retrieve the requested object (which was zero if the object was cached). By comparison, Glassman [26] found the average response time seen by a client for an uncached page to be between 6 and 9 s, compared with 1.5 s for a cached page, using a Digital web relay.
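The caching behavior measured here can be reproduced with a small trace-driven harness along the following lines. This is a sketch under our own naming: it assumes the trace is a list of (name, size) pairs, that caches hold whole objects against a byte budget, and that the warm-up and measurement windows are the 60 000 and 100 000 requests stated above.

    from collections import OrderedDict

    class LRUCache:
        # Byte-budgeted LRU cache holding whole objects only.
        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.items = OrderedDict()   # name -> size, least recent first

        def request(self, name, size):
            # Return True on a hit; on a miss, cache the object,
            # evicting LRU objects until the new object fits.
            if name in self.items:
                self.items.move_to_end(name)
                return True
            while self.items and self.used + size > self.capacity:
                _, evicted_size = self.items.popitem(last=False)
                self.used -= evicted_size
            if size <= self.capacity:
                self.items[name] = size
                self.used += size
            return False

    def measured_hit_rate(trace, caches, select):
        # Warm the caches on the first 60 000 requests, then measure
        # the hit rate over the next 100 000, as in the simulation.
        # 'select' maps an object name to an index into 'caches'.
        hits = 0
        for i, (name, size) in enumerate(trace[:160000]):
            hit = caches[select(name)].request(name, size)
            if i >= 60000:
                hits += hit
        return hits / 100000

Passing a name-based mapping (such as the hrw_select sketch above, composed with an index lookup) as 'select' reproduces the HRW curve; a mapping that ignores the object name, such as round-robin or random, reproduces the behavior of the other schemes.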


Fig. 6. Latency of various allocation schemes.


Fig. 7. Speed improvement of HRW over random [(a) and (b)].

Fig. 8. Time-based cache performance.

Using these values, Fig. 7 shows the expected speed improvement as seen by the client. The line shown is based on the ratio of the latency seen under HRW to that seen under random allocation. We expect the improvement to be much more pronounced for sites with low-bandwidth connectivity to the outside world, such as New Zealand [29], since the retrieval time for uncached objects will rise while the retrieval time for cached objects remains constant.

Fig. 8 shows the hit rate and cache space used when no limit exists on cache space. Again, since we assume that no objects are invalidated during the lifetime of the simulation, no time-based expirations were simulated and, hence, no objects were evicted from any cache. As shown, HRW again achieves a much higher hit rate and a lower space requirement as the number of servers increases, where the space requirement shown is the ratio of the total amount of cache space used at the end of the simulation to the combined size of all objects requested.

In summary, WWW clients can achieve faster response times from clusters of proxy servers by using HRW. For example, when a proxy hostname resolves to multiple IP addresses, HRW could be used to choose an appropriate address, rather than simply using the first address in the list.
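As an illustrative sketch of that last suggestion (the function name and hostnames below are ours, not part of any browser), a client could rank a proxy hostname's DNS addresses per object name and contact the HRW winner, reusing the pseudorandom weight form shown earlier:

    import socket
    import zlib

    A, B, M = 1103515245, 12345, 1 << 31   # minimal-standard constants [22]

    def pick_proxy_address(url, proxy_hostname):
        # Choose among a proxy hostname's A records by highest random
        # weight, instead of always taking the first address returned.
        _, _, addresses = socket.gethostbyname_ex(proxy_hostname)
        d = zlib.crc32(url.encode()) & 0x7FFFFFFF   # 31-bit object digest
        def w(addr):
            a = int.from_bytes(socket.inet_aton(addr), "big")
            return (A * ((A * d + B) ^ a) + B) % M
        return max(addresses, key=w)

    # Example (hypothetical names): every client resolving the same URL
    # through the same proxy cluster contacts the same cluster member,
    # maximizing that member's chance of a cache hit.
    # addr = pick_proxy_address("http://example.com/page.html",
    #                           "proxy.example.com")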

VIII. CONCLUSIONS

We began with a model that views the mapping of requests to servers in a cluster as a minimization operation on a value function, and we showed that this model adequately characterizes the behavior of typical mapping functions. Typical mapping functions permit replication, resulting in longer latencies and increased space requirements in the domains that we consider. We argued that reducing replication would decrease latency and space requirements and would increase hit rates at cluster servers. In combination with the need for all clients to have the same view of which objects map to which servers, these considerations motivated the need for stateless mappings from objects to servers.

We described various desirable properties of stateless mappings, including load balancing, minimal disruption as the set of active servers evolves, and efficient implementation. We then described an algorithm (HRW) which meets those needs using a purely local decision on the part of the client.

We compared HRW to traditional schemes for assigning requests to servers, and showed that in distributed caching, using a stateless mapping allows a higher cache hit rate for fixed-size caches and a lower space requirement for variable-size caches. We also showed that HRW is very useful in real-time producer–consumer domains, where it is valuable for clients to independently deduce object–server mappings, and that HRW allows them to minimize overhead by relying on purely local decisions. Finally, we provided empirical evidence that our algorithm gives faster service times than traditional allocation schemes.


HRW is most suitable for domains in which there are a large number of requestable objects, the request rate is high, there is a high probability that a requested object will be requested again, and the load due to any single object can be handled by a single server. HRW has already been applied to multicast routing, where it has recently been incorporated into both the PIM [17] and CBT [30] protocols. HRW is also applicable to the WWW: WWW clients could improve response times by using HRW to select servers in a cluster, rather than simply using the order presented by the DNS. This improvement would be most significant at sites with low-bandwidth connectivity to the Internet that use a cluster of proxy servers for outbound requests.

ACKNOWLEDGMENT

The authors wish to thank L. Wei for suggesting the simple weight functions used for comparison with HRW. Thanks also go to the other PIM coauthors, who provided the original motivation for and feedback on HRW, and to Boston University for making the web traces publicly available.

REFERENCES

[1] T. Ballardie, P. Francis, and J. Crowcroft, "An architecture for scalable inter-domain multicast routing," in Proc. ACM SIGCOMM, Sept. 1993, pp. 85–95.
[2] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C.-G. Liu, and L. Wei, "An architecture for wide-area multicast routing," in Proc. ACM SIGCOMM, Aug. 1994, pp. 126–135.
[3] J. Gwertzman and M. Seltzer, "World-wide web cache consistency," in Proc. 1996 USENIX Tech. Conf., Jan. 1996, pp. 141–151.
[4] C. M. Bowman, P. B. Danzig, D. R. Hardy, U. Manber, and M. F. Schwartz, "The Harvest information discovery and access system," in Proc. 2nd Int. World Wide Web Conf., Oct. 1994, pp. 763–771.
[5] M. S. Squillante and E. D. Lazowska, "Using processor-cache affinity information in shared-memory multiprocessor scheduling," IEEE Trans. Parallel Distrib. Syst., vol. 4, pp. 131–143, Feb. 1993.
[6] R. Vaswani and J. Zahorjan, "The implications of cache affinity on processor scheduling for multiprogrammed, shared memory multiprocessors," in Proc. 13th Symp. Operating System Principles, Oct. 1991, pp. 26–40.
[7] J. D. Salehi, J. F. Kurose, and D. Towsley, "The effectiveness of affinity-based scheduling in multiprocessor network protocol processing (extended version)," IEEE/ACM Trans. Networking, vol. 4, pp. 516–530, Aug. 1996.
[8] V. Paxson and S. Floyd, "Wide-area traffic: The failure of Poisson modeling," in Proc. ACM SIGCOMM, Aug. 1994, pp. 257–268.
[9] R. Jain and S. A. Routhier, "Packet trains—Measurements and a new model for computer network traffic," IEEE J. Select. Areas Commun., vol. SAC-4, pp. 986–995, Sept. 1986.
[10] M. Fisz, Probability Theory and Mathematical Statistics. New York: Wiley, 1963.
[11] P. B. Danzig, R. S. Hall, and M. F. Schwartz, "A case for caching file objects inside internetworks," Univ. Colorado, Boulder, CO, Tech. Rep. CU-CS-642-93, Mar. 1993.
[12] Cisco Systems, "Scaling the world wide web." [Online]. Available: HTTP: http://cio.cisco.com/warp/public/751/advtg/swww_wp.htm.
[13] R. G. Smith, "The contract net protocol: High-level communication and control in a distributed problem solver," IEEE Trans. Comput., vol. C-29, pp. 1104–1113, Dec. 1980.
[14] M. Satyanarayanan, "Scalable, secure, and highly available distributed file access," IEEE Comput. Mag., vol. 23, pp. 9–21, May 1990.
[15] T. Brisco, "DNS support for load balancing," RFC-1794, Apr. 1995.

[16] E. Katz, M. Butler, and R. McGrath, "A scalable HTTP server: The NCSA prototype," Comput. Networks ISDN Syst., vol. 27, pp. 155–164, 1994.
[17] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M. Handley, V. Jacobson, C.-G. Liu, P. Sharma, and L. Wei, "Protocol independent multicast-sparse mode (PIM-SM): Specification," RFC-2117, June 1997.
[18] D. Estrin, A. Helmy, P. Huang, and D. Thaler, "A dynamic bootstrap mechanism for rendezvous-based multicast routing," IEEE/ACM Trans. Networking, to be published.
[19] D. E. Knuth, The Art of Computer Programming, vol. 1, 2nd ed. Reading, MA: Addison-Wesley, 1973.
[20] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. New York: Freeman, 1988.
[21] I. Niven, H. S. Zuckerman, and H. L. Montgomery, An Introduction to the Theory of Numbers, 5th ed. New York: Wiley, 1991.
[22] S. K. Park and K. W. Miller, "Random number generators: Good ones are hard to find," Commun. ACM, vol. 31, no. 10, pp. 1192–1201, Oct. 1988.
[23] D. G. Carta, "Two fast implementations of the 'minimal standard' random number generator," Commun. ACM, vol. 33, no. 1, Jan. 1990.
[24] S. Williams, M. Abrams, C. R. Standridge, G. Abdulla, and E. A. Fox, "Removal policies in network caches for world-wide web documents," in Proc. ACM SIGCOMM, Aug. 1996, pp. 293–305.
[25] M. Abrams, C. R. Standridge, G. Abdulla, S. Williams, and E. A. Fox, "Caching proxies: Limitations and potentials," in Proc. 4th Int. World-Wide Web Conf., Dec. 1995.
[26] S. Glassman, "A caching relay for the world wide web," Comput. Networks ISDN Syst., vol. 27, no. 2, Nov. 1994.
[27] C. R. Cunha, A. Bestavros, and M. E. Crovella, "Characteristics of WWW client-based traces," Boston Univ., Boston, MA, Tech. Rep. BU-CS-95-010, July 1995.
[28] M. R. Nelson, "File verification using CRC," Dr. Dobb's J., May 1992.
[29] D. Neal, "The Harvest object cache in New Zealand," Comput. Networks ISDN Syst., vol. 28, pp. 1415–1430, May 1996.
[30] A. Ballardie, "Core based trees (CBT version 2) multicast routing: Protocol specification," RFC-2189, Sept. 1997.

David G. Thaler (S’98) received the B.S. degree from Michigan State University, East Lansing, in 1992, and the M.S. degree from the University of Michigan, Ann Arbor, in 1994. He is currently working toward the Ph.D. degree at the University of Michigan, Ann Arbor. His research interests include network management and multicast routing. Mr. Thaler is a member of the Association for Computing Machinery (ACM). He is also an active participant in the Internet Engineering Task Force (IETF).

Chinya V. Ravishankar (S’82–M’84) received the B.Tech. degree in chemical engineering from the Indian Institute of Technology, Bombay, India, in 1975, and the M.S. and Ph.D. degrees in computer sciences from the University of Wisconsin, Madison, in 1986 and 1987, respectively. He has been on the faculty of the Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, since 1986. His teaching and research at the University of Michigan have been in the areas of databases, distributed systems, networking, and programming languages. He founded the Software Systems Research Laboratory at the University of Michigan, where he is also a member of the Real-Time Computing Laboratory. Dr. Ravishankar is a member of the IEEE Computer Society and of the Association for Computing Machinery (ACM).