Mobile File Filtering

TR-CS-97-02

Mobile File Filtering Xun Qu and Jeffrey X. Yu February 1997

Joint Computer Science Technical Report Series Department of Computer Science Faculty of Engineering and Information Technology Computer Sciences Laboratory Research School of Information Sciences and Engineering

This technical report series is published jointly by the Department of Computer Science, Faculty of Engineering and Information Technology, and the Computer Sciences Laboratory, Research School of Information Sciences and Engineering, The Australian National University. Please direct correspondence regarding this series to: Technical Reports Department of Computer Science Faculty of Engineering and Information Technology The Australian National University Canberra ACT 0200 Australia or send email to: [email protected] A list of technical reports, including some abstracts and copies of some full reports may be found at: http://cs.anu.edu.au/techreports/


Mobile File Filtering

Xun Qu
Computer Sciences Laboratory
Research School of Information Sciences and Engineering
Australian National University
Canberra, ACT 0200, Australia

Jeffrey Xu Yu
Department of Computer Science
Australian National University
Canberra, ACT 0200, Australia

Abstract

Truly ubiquitous mobile computing requires that programs on mobile computers be able to keep running as they would in a distributed environment. Mobile file systems must therefore provide solutions for weak data links and disconnected operation. It becomes important to utilise the limited bandwidth of wireless networks and to reduce the probability of voluntary disconnection caused by unacceptable file-access delays. In this paper, we propose two user-level dynamic filtering mechanisms, namely inclusive and exclusive filters, by which an adaptive volume is defined. Both filters aim to minimise the communication cost of managing files that mobile users do not use on their mobile computers. A simulation study was conducted to examine the effectiveness of the proposed adaptive volumes, and we report our findings.

1 Introduction

With the marriage of portable computer systems and mobile communication devices, more and more users will use mobile systems for their daily work. One of the key requirements of this new computing environment is access to critical data regardless of location. Truly ubiquitous computing requires that programs on mobile computers be able to keep running as they would in a distributed environment, so file system support is essential. However, mobile file systems have to work under a variety of network conditions, ranging from high-speed links to unstable links. The bandwidth of some unstable links, such as phone lines and wireless channels, is very small. Therefore, mobile file systems must provide solutions for weak data links and disconnected operation [1], namely involuntary disconnection and voluntary disconnection. Involuntary disconnection is caused by communication link failure, whereas voluntary disconnection occurs when nomadic users switch mobile systems into a partial or disconnected communication mode [24] because the network bandwidth is so narrow that the file access delay is unacceptable. Consequently, it becomes important to utilise the limited bandwidth of wireless networks and to reduce the probability of voluntary disconnection.

Data replication is a basic technique widely used in distributed and mobile file systems to provide high data availability and improve system performance [16, 26, 13]. If data are

replicated on mobile systems, file access requests can be processed locally, avoiding communication over the network. However, file systems have to manage data consistency if data is replicated at multiple sites, and research on data replication in distributed/mobile file systems has focused on data consistency. In this paper, we restrict ourselves to the issue of utilising the bandwidth of wireless networks. This is motivated by two observations. First, a mobile file system is shared by multiple users but is used by a single user on a single mobile computer. This differs from file systems on workstations, which are shared by multiple users. Second, the communication cost is proportional to the amount of data replicated at multiple sites. Therefore, minimising the data replicated on a mobile computer for a single mobile user makes the best use of the wireless network bandwidth. With our solution, mobile users can flexibly specify the data they have to work on in mobile file systems using filtering. Through filtering, we provide a minimal subtree of files with which mobile users can carry out a particular task on their mobile computers. It is worth noting that this minimal subtree can consist of files from multiple volumes and does not necessarily contain all the files under a directory hierarchy. The minimal subtree shares the same name space as the file system from which it is replicated, and consistency is managed between the minimal subtree on a mobile computer and the original file system. The filtering mechanisms we propose in this paper are user-level dynamic partitions, which differ from the system-level static partitions [9, 10, 11] of existing distributed file systems.

The remainder of the paper is organised as follows. In the next section, we review existing work on data replication. Section 3 outlines several motivating examples, presents our mobile solutions, and makes a comparison with caching. In Section 4, we discuss the proposed inclusive and exclusive filter mechanisms, including their structures, operations and semantics. Implementation issues are considered in Section 5. Section 6 presents a performance study of the filtering mechanisms, and finally, we conclude with a discussion of future directions in Section 7.

2 Data Replication

In the literature [3, 11, 4, 5, 10, 12, 14, 17, 16], data replication improves data availability and system performance in distributed file systems. In such systems, information about the location and the content of a replica is available to all participating sites. Data in replicas is shared by all participating sites and is kept under consistency control. Two units of replication have been proposed, namely file replication, in Roe [6] and RNFS [7], and volume replication, in AFS [9], Coda [10] and Ficus [11]. Since file-based replication requires a system to maintain a large database keeping details of all individually replicated files, most existing distributed/mobile file systems adopt volume replication using a system-level static partition approach. By system-level static partition, we mean that a system designer or an administrator organises directories and files into non-overlapping volumes and then puts them together to seamlessly form a single file hierarchy before users can work on these volumes at different sites. Since a volume is a whole subtree of a file system, data replication is forced to follow all-or-nothing semantics: when a replica of a volume is created on a mobile system, either all the files in the volume are replicated or none are. System-level static partitioning is needed to manage file sharing among multiple users working on workstations connected by high-speed networks. However, due to the two observations given in the previous section, the policy of system-level static partitioning can cause unnecessary data transfer and unnecessary consistency management for mobile users in wireless networks.

Coda [3] and Ficus [16] are systems that keep data replicas on mobile systems and maintain data consistency under weak network conditions and disconnection. The Ficus file system employs one-level data replication in which all replicated files and directories are organised into volumes. A volume is a

subtree which contains all files rooted at the root of that subtree, and is the unit distributed to hosts [12]. Replicas of a volume have peer-to-peer relationships to maintain one-copy availability sharing semantics, which means that clients can access any replica that is available. The update propagation mechanism ensures that all replicas of the same volume converge to the same status. Volumes can be glued together to form a single file hierarchy using the notion of a graft point. A graft point is a mount point of a volume and contains the locations of all replicas of the volume. To provide a mobile solution to mobile users, the volumes that contain the requested files have to be replicated on the mobile system [16]. As a derivative of AFS, Coda adopts a two-level data replication scheme. Similar to Ficus, the file hierarchy is composed of a set of volumes, each of which is replicated among server systems located in the wired networks. A volume storage group (VSG) is the group of server systems that maintain all the replicas of a single volume; this information is managed by the system control machine. Along with volume replication among servers, a local cache on the mobile computer is also employed [17, 18]. As an exception, the Rumor project [20] at UCLA adopts a user-level static partition mechanism, which aims to provide user-level data replication and thereby assist mobile users in managing their replicated data on different systems. However, the unit of data replication is still predetermined as a volume, a subtree containing all the files under its root.

3 Motivation

Several examples will be given in this section to show the motivation behind the idea of mobile file filtering.

Figure 1: Shared Single Volume

Example 1 A volume is schematically depicted in Figure 1. A few mobile users work exclusively on small subtrees of a volume which is shared by all the other users on workstations connected to the wired networks.

Suppose that a system-level static volume is adopted. In the first example, each mobile user has to replicate the whole volume on his/her mobile computer. Furthermore, suppose there are n mobile users. Then the number of replicas of this volume is n + m if m replicas are used in the wired networks. Therefore, data consistency has to be managed among the n + m replicas, in particular among the n replicas in the wireless network. When a user modifies a file on either a mobile computer or a workstation, messages have to be exchanged among the n + m replicas, even though the mobile users are interested in different sets of files. As illustrated in Figure 1, although the three mobile users have mutually exclusive sets of files of interest, the file systems have to exchange messages to keep the whole volume consistent on the individual systems, because these files are all included in a single volume.

Example 2 A mobile user has to access several files in two volumes which are shared by all the users on workstations connected to the wired networks.

In the second example, though the user needs to access only a few files from these volumes, all of them need to be fully replicated on the mobile system. Consequently, the number of message exchanges for consistency management of replicated files increases proportionally.

For the above two examples, one might argue that this can be managed by changing the volume sizes. In practice, volumes have to be predetermined. Based on the system-level static partition, the only possible solution is for a system administrator to divide disk storage into several volumes; a user can then have a single volume exclusively and organise his/her data in it. However, the optimal sizes of such volumes cannot be determined beforehand. If a volume is large, users have to replicate unnecessary files in it. If a volume is small, users have to replicate multiple small volumes. Even if the volume size could be chosen optimally, it could not satisfy users' requirements in the long term. Moreover, it is essential that users be able to share data with others.

[Figure: a project tree projects/prj1 with subdirectories src, doc and include; doc contains *.tex and *.fig files plus temporary files; src contains module1 with header files and *.c files.]

Figure 2: Software development project

Example 3 As shown in Figure 2, a group of users works cooperatively on a software development project. Each member of the project works on a subdirectory for editing and compiling and needs to read the other subdirectories. Some members of the group work on mobile computers.

In this example, we assume that all source files, header files and documents have to be kept consistent. Suppose each member runs a C compiler to compile files and uses LaTeX as well. The C compiler, for instance cc, generates a number of temporary files during compilation and finally generates binary files and possibly an executable file. The question here is whether or not we need to manage consistency for all such files, including temporary files, binary files and executable files, in a mobile environment. To explain further, suppose a user has several C source files and header files to compile, with a total size of B bytes. After running the make facility, there will be several object files and a newly created executable file. In addition, many temporary files will be created and removed during compilation; we assume that the total size of all files is S bytes. The proportion B/S can be as small as 20-30%. This implies that 70% of the data transfer between the mobile computer on which a user runs make and the other replicas can be eliminated. If there are n replicas, the saving can be proportional to 0.7 × n × S in a mobile environment. In addition, when debugging is needed, the core dump file need not be a candidate for consistency management. The situation for running LaTeX is the same: it is unnecessary to transfer all the files except those actually needed for running LaTeX.

We introduce a user-level replication unit, called an adaptive volume. An adaptive volume can be built on top of existing system-level partition volumes. It is a minimal subtree which contains only the minimum number of directories from the root and the data files users want to replicate. An adaptive volume can be regarded as a filter, since it filters in the files a user has to replicate. The specification of such filtering for the first example is given below. We show only a simple specification for the user called user1 in Figure 1. The specification consists of three pairs, tentatively written as ((/a/b/c/d/e/p *) (/a/b/c/d/f/g/r *) (/a/b/c/h/i/j/s *)). Each pair specifies a path from the root and the files under that path. The wildcard * specifies all the files under the given path. Given the above specification, a minimal subtree is created as an adaptive volume on the mobile computer user1 is using. As can be seen, the minimal subtree contains only the minimal tree structure and the files. Figure 3 shows three adaptive volumes on three different mobile computers for the three users.

Figure 3: Inclusive Filtering

The major advantage of adaptive volumes is that users have the flexibility to replicate a minimal set of files structured as a minimal subtree. As a result, the file systems can use this information to avoid the communication needed to keep consistency for files that users will never touch. In addition to inclusive filtering, we also propose exclusive filtering, which simply indicates the files users do not want to have managed under consistency control. It implies that the mobile user does not want to inform the other sites of any updates on the files matched by the exclusive filter, and does not want to listen to any updates on these files made at other sites. A simple specification of exclusive filtering is (*.o). This filter filters out all files with a name matching *.o under certain directories.

The filtering we propose has the following advantages. First, from the viewpoint of users, users can replicate the minimal set of files needed to process their tasks on a mobile computer. The minimal set of files is specified with the filtering mechanisms, and such a specification can be given without awareness of the system-level partitions. Since it is a filter, users share the same name space used in the original file systems. The filing service is responsible for keeping consistency only for the files that mobile users are interested in. Users can work on an adaptive volume and then switch to another adaptive volume if needed. As an extreme case, a user can specify an adaptive volume that includes a whole file system. Second, from the system's point of view, the filtering mechanisms provide locality information from which the file servers can determine which files should be kept consistent. These filtering mechanisms can save a large amount of data transfer and therefore make better use of the network bandwidth.

It is worth noting that the adaptive volumes proposed in this paper differ from caching in several ways. First, a cache holds data at the level of files and is based on past accesses. Therefore, mobile users cannot access files they have not accessed so far on a mobile computer if the network is disconnected. Second, caching is designed to know nothing about the semantics of the files under its control. The situation is similar for data pre-fetching using either a profile [18] or a predictive mechanism [19]. Any change to data in a local cache has to be passed to and processed at the server. In contrast, adaptive volumes replicate all the files, including data files and directory files, and the filtering mechanism can filter out unnecessary data transfer.
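As a sketch of how a minimal subtree could be derived from such a specification, the following Python fragment computes the directories an adaptive volume must replicate and the common path shared by all subtree roots. The function and variable names are our own illustrative assumptions, not part of the paper:

```python
import os

def minimal_subtree(spec):
    """spec: list of (absolute_path, name_pattern) pairs, as in the
    user1 example.  Returns the set of directories that must be
    replicated so that every selected path stays connected to the root."""
    dirs = set()
    for path, _pattern in spec:
        parts = path.strip("/").split("/")
        # include every ancestor directory on the path from the root
        for i in range(1, len(parts) + 1):
            dirs.add("/" + "/".join(parts[:i]))
    return dirs

spec = [("/a/b/c/d/e/p", "*"),
        ("/a/b/c/d/f/g/r", "*"),
        ("/a/b/c/h/i/j/s", "*")]
tree = minimal_subtree(spec)
# The graft point is the common path of all the specified paths:
graft_point = os.path.commonpath([p for p, _ in spec])  # "/a/b/c"
```

For the user1 specification above, the resulting set contains exactly the directories on the three paths from /a down to p, r and s, matching the left-hand tree of Figure 3.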

4 Filtering

In this section, we first give definitions of the file system hierarchy. For simplicity, we assume that a file system is a rooted tree, defined as follows.¹

Definition 1 A file system is defined as a rooted directed tree G = (V, E) where V is a set of vertices and E ⊆ V × V. V can be divided into two subsets Vd and Vf where Vd ∩ Vf = ∅ and Vd ∪ Vf = V. We refer to the vertices in Vd as d-vertices (directories) and the vertices in Vf as f-vertices (data files).

Either d-vertices or f-vertices can be leaf vertices in G, but only d-vertices can be non-leaf vertices. In the following, we use vr to refer to the root vertex of G and call vr the root of the file system. In file systems, a file name can have a suffix, such as .c for a C source file and .o for an object file. A file descriptor keeps information about the corresponding file. For example, a file has a type, categorised as b, c, d, l, p or f for block special file, character special file, directory, symbolic link, fifo (named pipe), or plain file, respectively. A file belongs to the user who owns it and to a group. A file also has permission flags controlling read/write/execute permissions.

Definition 2 A filter, f, is a set of pairs (n, a) where n is a file name given as a simplified regular expression and a is a list of options specifying file attributes kept in file descriptors.

A specification of an inclusive filter is given below; it includes only the LaTeX text files, the figure files and the style files belonging to a user called mark.

*.{tex,sty,fig} -user mark

Similarly, a three-line specification of an exclusive filter is given below to filter out unnecessary files.

core
*.o
* -type f -perm 0111

The first line filters out the dump file core, the second line filters out all object files with suffix .o, and the last line filters out all executable files other than directory files.
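The following Python fragment is a minimal sketch of how such (name, attributes) filter rules might be evaluated against a file descriptor. The attribute keys ("type", "perm") and the matching semantics are our own assumptions, not the paper's implementation:

```python
# Each rule is a (name_pattern, attributes) pair in the spirit of
# Definition 2; a file is a dict of descriptor attributes.
import fnmatch

def matches(filter_rules, f):
    """Return True if any (pattern, attrs) rule matches file f."""
    for pattern, attrs in filter_rules:
        if not fnmatch.fnmatch(f["name"], pattern):
            continue
        # -perm is a bitmask test; other options compare for equality
        if all((f.get("perm", 0) & v) == v if k == "perm" else f.get(k) == v
               for k, v in attrs.items()):
            return True
    return False

# The exclusive filter from the text: the core dump file, *.o object
# files, and executable plain files (-type f -perm 0111).
exclusive = [("core", {}),
             ("*.o", {}),
             ("*", {"type": "f", "perm": 0o111})]

assert matches(exclusive, {"name": "main.o", "type": "f", "perm": 0o644})
assert matches(exclusive, {"name": "a.out", "type": "f", "perm": 0o755})
assert not matches(exclusive, {"name": "main.c", "type": "f", "perm": 0o644})
```

Note that shell-style patterns (as in fnmatch) are a simplification of the "simplified regular expression" of Definition 2; in particular, brace alternation such as {tex,sty,fig} would need to be expanded into separate rules.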

¹We will extend it to a DAG structure as future work.


Definition 3 A subtree, S, is specified by a triplet (p, fi, fe) where p is an absolute path from the root, vr, and fi and fe are an inclusive and an exclusive filter, respectively, as defined in Definition 2.

Here, the path p indicates the root of a subtree. The filter fi specifies the files to be included in the subtree, whereas the filter fe specifies the files not to be included. The files included in the subtree are those that satisfy the filter fi but not the filter fe. It is worth noting that if a directory satisfies fi but not fe, all files under that directory are automatically included in the subtree. Also, the filter fe applies to all the files under the path p.
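The inclusion rule above can be sketched as follows, simplified so that both filters are lists of shell-style name patterns (the paper's filters also carry attribute options); the helper names are assumptions:

```python
import fnmatch

def included(name, under_selected_dir, f_i, f_e):
    """A file is replicated iff it satisfies f_i but not f_e; files under
    a directory already selected by f_i are included automatically,
    while f_e applies to everything under the path p."""
    if any(fnmatch.fnmatch(name, pat) for pat in f_e):
        return False              # the exclusive filter always wins
    if under_selected_dir:
        return True               # inherited from a selected directory
    return any(fnmatch.fnmatch(name, pat) for pat in f_i)

assert included("paper.tex", False, ["*.tex", "*.fig"], ["*.o"])
assert included("notes.txt", True, ["*.tex"], ["*.o"])       # inherited
assert not included("paper.o", True, ["*"], ["*.o"])         # fe still applies
```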

Definition 4 An adaptive volume, V, is defined by a set of subtrees {s1, s2, ..., sn}. V is a minimal subtree in the sense that it cannot remain a connected tree including all the required files if any vertex is removed from it. The graft point is defined as the common path of all the paths pi given in the subtrees si = (pi, fi, fe).

Some comments on the adaptive volume follow.

- Since absolute paths are used, the adaptive volume is rooted at the root of the file system. The directory structure of the volume constructs a basic framework that provides a limited but efficient working space for mobile users.

- The graft point is a common path which must not be removed while any adaptive volume is in use.

- Changes to a path pi of si = (pi, fi, fe) in an adaptive volume should be managed under consistency control. If a directory in the path pi changes its name, all replicas using this path must be informed. If a directory in the path pi is removed, all replicas must also be informed. The exception is that adding new directories under pi does not necessarily require informing all the replicas.

- Any change to any file satisfying fi but not fe of si = (pi, fi, fe) in an adaptive volume should be managed under consistency control. This applies to both data files and directory files.

5 Implementation Issues

5.1 Client-Server Model

The goal of the adaptive volume is to allow mobile users to make a local copy of their working file set on a mobile computer in order to be able to continue their work. Different users can define their own adaptive volumes and use them simultaneously. Several servers in the wired networks act as servers for adaptive volume replicas. If there are multiple replicas of the same file system in the wired networks, these servers cooperatively manage consistency using existing approaches. A replica on a mobile computer only needs to communicate with a single server in the wired networks.

5.2 Initialisation

When setting up an adaptive volume on a system, a user specifies the files he/she wants to work on using inclusive and exclusive filters. A minimal subtree is then replicated on the mobile computer as requested. This initialisation process, which replicates all the requested directories and files on demand, is simple. Some entries in the directory files are unnecessary, but for simplicity we transfer whole directories to the mobile computer and remove such entries later.

5.3 Runtime Support

To keep the system simple, the graft point of an adaptive volume cannot be changed on the server system while replicas of the volume exist. An adaptive volume defines a scope of files and directories, and within this scope both client and server work cooperatively to keep the files consistent. Interestingly, using adaptive volumes, a mobile user can create files locally under the adaptive volumes. There are four kinds of changes, namely file addition/deletion and directory addition/deletion, on both server and client. Server and client systems have to abide by the following rules to keep adaptive volumes identical on both sides:

- Neither the client nor the server can delete or change the directories in graft points.

- Users can add a new directory in any path pi of si = (pi, fi, fe) in an adaptive volume on either client or server. This operation is handled only locally.

- Users can add a directory/file in an adaptive volume. If the newly added directory/file satisfies the inclusive filter and does not satisfy the exclusive filter, the change has to be passed to the other sites. Otherwise, the newly added directory/file is handled only locally.

- Users can delete a directory/file on either client or server. This operation removes the directory/file locally, and informs the other sites if the directory/file satisfies the inclusive filter and does not satisfy the corresponding exclusive filter. The other sites then remove the directory/file accordingly.

5.4 Sharing Semantics

An adaptive volume service system is responsible for maintaining consistency among the replicas of shared files, and thus for keeping the sharing semantics. First, we address the issues introduced by using adaptive volumes. In existing distributed file systems, a data file must be the same in any two replicas. This semantics is also supported in our system. To handle directories, existing distributed file systems require that two replicas have the same number of directory entries and the same attributes, except the inode number, for every entry. However, our filter mechanisms shield out some directory entries, so in our system two replicas are not the same in terms of directories. In our system, "same" means that the common portion of the entries defined by the adaptive volume is the same. For example, suppose a directory contains three file entries: a, b and c. If only a is included in user A's adaptive volume, any change to the files under this directory other than a is outside user A's interest. Adding a file d at another site does not affect the consistency of this directory as seen by user A. Updating a directory has an impact only on the users who are interested in it.

Strong consistency requires strong network connectivity. Given the network bandwidths available in mobile networks, the trade-off between consistency and bandwidth has been re-examined. For example, [13] proposes a variable consistency scheme for mobile environments, and [24] discusses different ways to keep consistency by efficiently exploiting different levels of network connectivity. In our system, we adopt two different sharing semantics. When a network connection is available, the strict close-to-open (CTO) semantics [23] is supported. If the client system goes off the network, it is impossible to keep CTO semantics with the server, so all file access operations are executed locally in an uncommitted status. When the client system connects to the server again, the adaptive volume is synchronised with the server immediately, using existing approaches to resolve conflicts [3, 25, 22]. In the following, we only address the situation in which a client can connect to a server in the wired network. We use different strategies to keep data files and directories consistent. We adopt the CTO semantics [23], also used in NFS [2] and Sprite [14], to keep data files consistent in adaptive volumes. Following CTO semantics, consistency checking is triggered only by read and write operations on data files. For directories, we assume that the structure of the file hierarchy changes less often than data files, and we use the callback mechanism of Coda [18, 26].

Parameters   Description                        Values
thinkTime    thinking interval time             60
workTime     working interval time              10
openFiles    number of open files               5
ctrlMsg      length of control message          1 unit (200 bytes)
updtMsg      length of update message           100 units (20K)
wfsSize      size of working file set           30, 60, 100
writProb     percentage of write access         20%
updtTime1    system updating interval time 1    180
updtTime2    system updating interval time 2    60

Table 1: Simulation Parameters
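As a minimal sketch of the close-to-open semantics adopted for data files, the following fragment checks validity against the server only on open and pushes local writes back only on close. The class and method names, and the (version, data) representation, are our own assumptions:

```python
class CTOClient:
    def __init__(self, server):
        self.server = server      # maps path -> (version, data)
        self.cache = {}           # maps path -> (version, data, dirty)

    def open(self, path):
        version, data = self.server[path]
        cached = self.cache.get(path)
        if cached is None or cached[0] != version:
            self.cache[path] = (version, data, False)   # refresh stale copy
        return path

    def write(self, path, data):
        version, _, _ = self.cache[path]
        self.cache[path] = (version, data, True)        # dirty, local only

    def close(self, path):
        version, data, dirty = self.cache[path]
        if dirty:
            self.server[path] = (version + 1, data)     # propagate on close
            self.cache[path] = (version + 1, data, False)

server = {"/a/b/f": (1, "old")}
c = CTOClient(server)
c.open("/a/b/f")
c.write("/a/b/f", "new")
c.close("/a/b/f")                 # server now holds (2, "new")
```

Between open and close, other clients see the old version; this is exactly the window the CTO semantics permits, which is why consistency checking need only be triggered by open and close.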

6 Performance Study

6.1 Simulation

In this section, we report on a simulation study of the filtering mechanisms. The parameters and default values used in the performance study are given in Table 1. An event-driven simulator was developed using the simulation package C++SIM [21].

6.1.1 Model

The mobile environment comprises mobile computers in the wireless network and workstations in the wired networks. As a first study of this topic, we only consider file sharing in a single system-level volume. There are n replicas in the wireless network and m replicas in the wired network for the system-level volume. On top of the system-level volume, a mobile user can have a single adaptive volume as a replica on his/her mobile computer. The mobile user shares the files with the other users. The client file system on a mobile computer in the wireless network and the file server on a workstation in the wired network are denoted Cm_i and Sw_j, respectively. A system Cm_i only needs to communicate with one server Sw_j. The server Sw_j communicates with all the other servers if any updates occur.

The mobile user works on a set of working files. We use the average size of the working file set and the average size of files given in [15]. Combining the data for productivity and programming environments, we use 30, 60 and 100 as the sizes of working file sets, including data files and directories. This is controlled by the parameter wfsSize. The average length of data files is 27K. We assume that 30% of a working file set is directory files and that the size of a directory file is 128 bytes; on average, the size of a file then becomes approximately 20K. The working pattern is modelled as two alternating periods, namely a working period, workTime, and a thinking period, thinkTime. In the working period, a user opens several files for read or write access. According to [8], 90% of files are closed within 10 seconds, so we use 10 seconds as the average working period. The server Sw_j periodically updates files, since multiple users work on the same volume in the wired network. These updates can affect the mobile user's working file set. The average interval between updates by the server Sw_j to a file inside the working file set is controlled by updtTime1; the average interval for a file outside the working file set is controlled by updtTime2. The write access rate of the system Cm_i is governed by writProb, which is assigned a value of 20%. The results of our simulation suggest that these values of updtTime1, updtTime2 and writProb are reasonable, since the rate of write update conflicts is about 0.07%, close to that reported in [15].

The CTO sharing semantics is chosen in our simulation. At the Sw_j site, when a file/directory in a volume is updated, the server Sw_j sends a control message to every Cm_i communicating with this Sw_j. If a system-level volume is used, all updates to directories have to be sent to Cm_i; if an adaptive volume is used, it is not necessary to send all such messages. Upon receiving a control message, the mobile system Cm_i sends back a confirmation message. At the Cm_i site, when opening a file, Cm_i first checks whether the file has been updated; if so, the local copy is updated. Before closing a file, Cm_i sends a new copy back to the server Sw_j if the file has been updated locally. We do not consider update conflicts, because their likelihood is small and conflict resolution does not affect our filtering gain.
We also assume the protocol control message is 200 bytes long. Two methods of updating files are simulated. One is data-based, in which a whole file is transferred over the network. The other is log-based, which needs to transfer only log information, assumed to be 10% of a file. In other words, 100 and 10 units of data, respectively, are transferred over the network.
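Under these assumptions, the per-update transfer cost of the two schemes can be sketched as follows; this is a toy model, and the function and constant names are ours, not from the paper:

```python
# Per-update transfer cost under the two update schemes described above.
# One "unit" corresponds to 1% of an average file, so a data-based
# (whole-file) update moves 100 units and a log-based update moves 10.
DATA_BASED_UNITS = 100   # whole file transferred
LOG_BASED_UNITS = 10     # only the update log (~10% of the file)

def update_traffic(n_updates, log_based):
    """Total data units moved to propagate n_updates file updates."""
    per_update = LOG_BASED_UNITS if log_based else DATA_BASED_UNITS
    return n_updates * per_update

print(update_traffic(50, log_based=True))    # 500
print(update_traffic(50, log_based=False))   # 5000
```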
[Figure 4: network traffic (log-based updating). Network traffic (units, 0-55000) plotted against the ratio of working file set to volume size (%, 20-100), with three curves for wfsSize = 30, 60 and 100.]

6.1.2 Experiment result

In this simulation, we analyse the network workload between the mobile client Cm1 and the server Sw1. An update at Sw1 also implies update propagation from the other servers Swi, for any i ≠ 1. In Figures 4, 5 and 6, the x-axis is the ratio between the size of the working file set and the size of the volume to be replicated. A ratio of 100% means that the adaptive volume contains only the files a user really needs; in this case, the size of the adaptive volume is the smallest possible. For any other ratio, for example 40%, it implies that 60% of the files in the adaptive volume are files the mobile user is not interested in.

In Figures 4 and 5, there are three lines for the three different wfsSizes: 30, 60 and 100. Each line falls as the ratio increases, and the line for wfsSize = 30 decreases faster than the other two. This means that a large volume causes a large amount of network traffic in order to keep files consistent. In both figures, the point at 100% on the x-axis is where the adaptive volume is smallest; at this point, network traffic reaches its minimum on each line. In Figure 4, all three lines decrease more quickly than those in Figure 5, which means the saving from filtering is higher when log-based updating is used. Since most of the network traffic is caused by file updating, our filtering mechanisms reduce the control messages, and therefore reduce unnecessary network traffic for updating files outside the working file set.
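The downward trend of these curves can be illustrated with a deliberately simple cost model, entirely our own construction with invented default parameters: updates inside the working file set always cost a full transfer, while files outside it only contribute update traffic for as long as they remain in the replicated volume.

```python
# Toy illustration (ours, with made-up numbers) of why traffic falls
# as the ratio of working file set to volume size grows.
def volume_traffic(wfs_size, ratio,
                   inside_updates=100, outside_rate=1.0,
                   update_units=10, control_units=1):
    """Traffic for one volume; ratio = wfs_size / volume_size, in (0, 1]."""
    volume_size = wfs_size / ratio
    outside_files = volume_size - wfs_size
    # Working-set updates must always be propagated; files outside the
    # working set generate traffic only because they are still replicated.
    return (inside_updates * update_units
            + outside_files * outside_rate * control_units)

# The smaller the volume (higher ratio), the lower the traffic:
print(volume_traffic(60, 0.2), volume_traffic(60, 0.4), volume_traffic(60, 1.0))
```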

[Figure 5: network traffic (data-based updating). Network traffic (units, 50000-130000) plotted against the ratio of working file set to volume size (%, 0-100), with three curves for wfsSize = 30, 60 and 100.]

In Figure 6, the potential performance gain is given for both the data-based and the log-based updating approaches. With the log-based approach, the potential performance improvement is larger when the working file set is correspondingly larger. A large working file set with a ratio other than 100% implies that unnecessary files are being replicated. Our filtering mechanism aims to push this ratio towards 100% as much as possible. A potential performance improvement of zero in Figure 6 means that the system has reached its optimal configuration.

Utilities                   Temporary files           Non-temporary files
LaTeX                       *.aux *.log *.dvi *.ps    *.tex *.bib
text editor (emacs or vi)   *~ #*# *.bak
xfig                        *.fig.bak                 *.fig *.eps *.ps
bibtex                      *.blg                     *.bst *.bib *.bbl

Table 2: Two file groups.

[Figure 6: Performance improvement. Performance improvement (%, 0-80) plotted against the ratio of working file set to volume size (%, 0-100), with four curves: log-based wfsSize=60, log-based wfsSize=100, data-based wfsSize=60, data-based wfsSize=100.]
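The quantity plotted in Figure 6 is, in effect, the relative traffic saved by shrinking the adaptive volume to the 100% point. A minimal sketch of the metric, using made-up readings in the range of Figure 4 (the function name is ours):

```python
# Potential performance improvement: the fraction of traffic at a given
# ratio that could be saved by moving to the 100% (smallest-volume) point.
def improvement(traffic_at_ratio, traffic_at_100):
    """Percentage improvement available relative to the 100% point."""
    return 100.0 * (traffic_at_ratio - traffic_at_100) / traffic_at_ratio

# Hypothetical readings: 55000 units at a low ratio, 10000 at 100%.
print(round(improvement(55000, 10000)))  # 82 (% potential saving)
```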

6.2 Exclusion of Unnecessary Files

[Figure 7: Temporary files. Percentage of file number and of file size taken by temporary files in each of 20 observed document directories.]

In this section, we report our findings on the necessity of excluding unnecessary files. We chose LaTeX utilities as a first example to investigate, since working with them will be common practice for mobile users. We assume that a mobile user uses a text editor, such as emacs, to edit text files and the LaTeX facilities to process them. The text editor and the LaTeX facilities create many temporary files that need not be managed under consistency control, including backup files, log files and intermediate output files. In Table 2, we list temporary and non-temporary files for document processing. We include *.ps in both groups, since users may want the postscript files exported by xfig to be managed consistently between the client and the server; the final postscript file, however, need not be kept consistent, since it can be reproduced at either the client or the server site.

We have investigated many people's document directories and found that the number and the size of temporary files cannot be ignored. Figure 7 shows the results of our investigation of 20 such document directories. On average, temporary files account for 49.8% of the number of files and 56.65% of their size. Therefore, our adaptive volume can reduce by up to 50% the total number of files

to be replicated and kept consistent. The total saving can be up to 50%.
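An exclusive filter of this kind reduces, in essence, to shell-style pattern matching against file names. The sketch below uses the temporary-file patterns of Table 2 (omitting *.ps, which the text places in both groups); the helper function is our illustration, not code from the system:

```python
import fnmatch

# Shell-style patterns for temporary files, taken from Table 2.
TEMPORARY_PATTERNS = [
    "*.aux", "*.log", "*.dvi",   # LaTeX intermediates
    "*~", "#*#", "*.bak",        # editor backups (emacs/vi)
    "*.fig.bak",                 # xfig backup
    "*.blg",                     # bibtex log
]

def is_temporary(filename):
    """True if the file matches an exclusion (temporary-file) pattern."""
    return any(fnmatch.fnmatch(filename, p) for p in TEMPORARY_PATTERNS)

print(is_temporary("report.aux"))    # True: excluded from replication
print(is_temporary("report.tex"))    # False: kept under consistency control
```

Files matching an exclusion pattern would simply never enter the adaptive volume, so they generate neither replication nor control-message traffic.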

7 Conclusion

In this paper, we propose an adaptive volume for data replication using two filter mechanisms, namely inclusive and exclusive filters. Our performance study shows that a large portion of the communication cost can be eliminated. Our adaptive volume can easily be extended to support file facilities in other distributed/mobile environments. We will extend our work in two directions. First, we plan to analyse the effectiveness of our adaptive volume when other sharing semantics and their corresponding protocols are used. Second, we plan to enhance the functionality of the adaptive volume and to allow adaptive volumes to be transferred freely between systems.

References

[1] E. Pitoura and B. Bhargava. Dealing with Mobility: Issues and Research Challenges. Technical Report CSD-TR-93-070, Department of Computer Sciences, Purdue University, November 1993.

[2] Russel Sandberg et al. Design and Implementation of the Sun Network File System. In USENIX Summer Conference Proceedings. USENIX Association, June 1985.

[3] M. Satyanarayanan, James J. Kistler, et al. Experience with Disconnected Operation in a Mobile Computing Environment. Technical Report CMU-CS-93-168, School of Computer Science, Carnegie Mellon University, June 1993.

[4] L. Kalwell Jr., S. Beckhardt, T. Halvorsen, et al. Replicated document management in a group communication system. In Proceedings of the Conference on Computer-Supported Cooperative Work, Portland, Oregon, September 1988.

[5] D.C. Oppen and Y.K. Dalal. The Clearinghouse: A decentralized agent for locating named objects in a distributed environment. ACM Transactions on Office Information Systems, 1(3):230-253, July 1983.

[6] C.S. Ellis and R.A. Floyd. The Roe File System. In Proceedings of the Third Symposium on Reliability in Distributed Software and Database Systems, pages 175-181. IEEE, October 1983.

[7] K. Marzullo and F. Schmuck. Supplying High Availability with a Standard Network File System. In Proceedings of the Eighth International Conference on Distributed Computing Systems, pages 447-453. IEEE, June 1988.

[8] J.K. Ousterhout, H.D. Costa, D. Harrison, et al. A trace-driven analysis of the UNIX 4.2 BSD file system. Technical Report UCB/CSD 85/230, University of California, Berkeley, 1985.

[9] James H. Morris, Mahadev Satyanarayanan, et al. Andrew: A Distributed Personal Computing Environment. Communications of the ACM, 29(3):184-201, March 1986.

[10] M. Satyanarayanan, J.J. Kistler, P. Kumar, et al. Coda: A Highly Available File System for a Distributed Workstation Environment. IEEE Transactions on Computers, 39(4):447-459, April 1990.

[11] Richard G. Guy, John S. Heidemann, Wai Mak, et al. Implementation of the Ficus Replicated File System. In USENIX Conference Proceedings, pages 63-71. USENIX, June 1990.

[12] T.W. Page Jr., R.G. Guy, J.S. Heidemann, et al. Management of Replicated Volume Location Data in the Ficus Replicated File System. In USENIX Conference Proceedings. USENIX, June 1991.

[13] C.D. Tait and D. Duchamp. An Efficient Variable-Consistency Replicated File Service. In Proceedings of the USENIX File System Workshop, pages 111-126, Ann Arbor, MI, USA, May 1992.

[14] J.K. Ousterhout, A.R. Cherenson, F. Douglis, et al. The Sprite network operating system. IEEE Computer, pages 23-36, February 1988.

[15] G.H. Kuenning, G.J. Popek, and P.L. Reiher. An Analysis of Trace Data for Predictive File Caching in Mobile Computing. In Proceedings of the 1994 Summer USENIX Conference.

[16] J.S. Heidemann, T.W. Page, R.G. Guy, and G.J. Popek. Primarily Disconnected Operation: Experiences with Ficus. In Proceedings of the Second Workshop on Management of Replicated Data. IEEE, November 1992.

[17] L.B. Huston. Disconnected Operation for AFS. CITI Technical Report 93-3, University of Michigan, USA.

[18] M. Satyanarayanan. Mobile Information Access. Technical Report CMU-CS-96-107, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA.

[19] Geoffrey H. Kuenning. The Design of the Seer Predictive Caching System. In Proceedings of Mobile Computing Systems and Applications 1994, Santa Cruz, CA, December 1994.

[20] Rumor project: http://ficus-www.cs.ucla.edu/rumor/

[21] M.C. Little and D.L. McCue. Construction and Use of a Simulation Package in C++.

[22] P. Kumar and M. Satyanarayanan. Flexible and Safe Resolution of File Conflicts. In Proceedings of the USENIX Winter 1995 Technical Conference, New Orleans, USA, January 1995.

[23] Dan Duchamp and Carl D. Tait. An Interface to Support Lazy Replicated File Service. In Proceedings of the Second Workshop on Management of Replicated Data, pages 6-8. IEEE, Monterey, CA, November 1992.

[24] P. Honeyman and L.B. Huston. Communications and Consistency in Mobile File Systems. CITI Technical Report 95-11, University of Michigan, Ann Arbor, USA.

[25] A.D. Joseph, A.F. deLespinasse, J.A. Tauber, et al. Rover: A Toolkit for Mobile Information Access. In Proceedings of the 15th Symposium on Operating Systems Principles, Copper Mountain Resort, USA, December 1995.

[26] L. Mummert and M. Satyanarayanan. Variable Granularity Cache Coherence. Operating Systems Review, 28(1):55-60, January 1994.

[27] L. Mummert and M. Satyanarayanan. Large Granularity Cache Coherence for Intermittent Connectivity. In Proceedings of the 1994 Summer USENIX Conference, Boston, USA, January 1994.