Energy-Efficient and Performance-Enhanced Disks Using Flash-Memory Cache∗

Jen-Wei Hsieh

Department of Computer Science and Information Engineering National Chiayi University, Chiayi, Taiwan 60004, ROC

[email protected]

Tei-Wei Kuo

Department of Computer Science and Information Engineering Graduate Institute of Networking and Multimedia National Taiwan University, Taipei, Taiwan 106, ROC

Po-Liang Wu

Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan 106, ROC

[email protected]

[email protected]

Yu-Chung Huang

Genesys Logic, Inc. Taipei, Taiwan 231, R.O.C.

[email protected]

ABSTRACT


This work explores the unique characteristics of flash memory in serving as a cache layer for disks. The experiments show that the proposed management scheme could save up to 20% of the energy consumption while reducing the read response time by two-thirds and the write response time by five-sixths, compared with their counterparts. The estimated lifetime of the flash-memory cache is significantly improved as well.

Categories and Subject Descriptors
C.0 [Computer Systems Organization]: General; B.3.2 [Memory Structure]: Design Styles—Cache memory

General Terms
Management

Keywords
Flash memory, cache, energy efficient, performance

1. INTRODUCTION
Flash memory has recently gained a lot of attention as a storage-system alternative (e.g., [1, 3, 4, 8]) and as a cache for hard disks. In particular, Windows ReadyBoost [6] lets users use a removable flash-memory device to improve system performance, while ReadyDrive [6] enables Windows Vista PCs equipped with a hybrid hard disk (a new type of disk with integrated non-volatile flash memory) to boot up faster, resume from hibernation in less time, preserve battery power, and improve disk reliability. However, flash memory does have several unique characteristics that introduce management challenges. A NAND flash memory is organized in terms of blocks, where each block consists of a fixed number of pages. Data must be written to the free space of flash memory: once a page is written, its space is no longer available unless it is erased. As a result, out-place update is usually adopted in flash-memory management. A block is the basic unit for erase operations, while reads and writes are processed in terms of pages.1 The typical block size and page size of a NAND flash memory are 16KB and 512B, respectively.2 After the processing of a large number of page writes, the number of free pages on flash memory becomes low, and garbage collection is needed to reclaim the invalid pages scattered over blocks (due to out-place updates) so that they can become free pages again. A flash-memory block also has a limit on erasures: a block erased over 10^6 times might suffer from frequent write errors. Wear-levelling is usually adopted to erase blocks evenly so that a longer overall lifetime is achieved.

One of the most pioneering works in adopting flash memory as a disk cache was done by Marsh et al. [5]. Reflecting the state of the art at that time, the study used a 20MB NOR flash memory as the cache for a 40MB hard disk, which implies that an efficient lookup mechanism to locate the cache space of given Logical Block Addresses (LBA's) over a large-capacity flash memory was not considered. Another issue in adopting flash memory as a disk cache is its robustness, since flash memory suffers from wear-out effects. Different from past work, this work is motivated by the needs of management in caching data for disks, especially when the characteristics of flash memory are considered. Note that well-known caching strategies, such as a direct mapped cache and a set associative cache [2], would suffer from significant deterioration in read/write performance (due to the write-once and wear-levelling features of flash memory) if they were implemented without considering the characteristics of flash memory. This paper presents an efficient lookup mechanism to locate the cached data of given LBA's over flash memory and integrates it with an LRU-based caching strategy. It also considers read and write requests jointly with respect to energy efficiency and performance. A garbage-collection strategy is proposed in an integrated way to consider the hotness of data and the system performance. The capability of the proposed strategies is evaluated by a series of experiments based on realistic workloads.

The rest of this paper is organized as follows: Section 2 presents our management schemes for a flash-memory cache, including a joint lookup and caching mechanism, a garbage-collection strategy, and a replacement policy. Other implementation remarks are also presented. The capability of the proposed management schemes is evaluated by a series of experiments in Section 3. Section 4 is the conclusion.

∗Supported in part by research grants from Taiwan, ROC National Science Council under Grants NSC95-2219-E-002-014 and NSC 95R0062-AE00-07.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED’07, August 27–29, 2007, Portland, Oregon, USA. Copyright 2007 ACM 978-1-59593-709-4/07/0008 ...$5.00.

1 Note that the terms “page” and “block” used here are different from those used for disks; as can be seen, a “page” refers to a smaller unit than a “block.”
2 Some flash memory adopts 128KB blocks and 2KB pages.




2. MANAGEMENT SCHEMES

2.1 Overview
The management of a flash-memory cache should consider the characteristics of flash memory and the access patterns of users over disks. Three potential situations in caching are considered: (1) When a read request arrives, the LBA of the request must be checked to see if the corresponding data are in the cache. If the answer is “yes,” the read request can be satisfied without accessing any hard disk. (2) If the answer is “no,” the data are retrieved from the corresponding hard disk and then cached in the flash memory for future access. (3) When a write request arrives, the data are cached in the flash memory. No extra action is taken unless a data write-back is required.

These three situations introduce several design and implementation issues. One critical issue is an efficient lookup strategy for a given LBA. Such a strategy is needed to look for any data corresponding to a given LBA on the flash memory, regardless of whether the request is a read or a write. When a write request is considered, we must invalidate any existing copy of the corresponding data in the cache. Another critical issue is the replacement strategy, invoked when the cache is full or the flash memory needs garbage collection. A good replacement strategy should reduce the chance of cache misses. Other important issues include an energy-efficient strategy for flushing written data to disks, cache robustness, and cache utilization. In Section 2.2, we present data structures and strategies for the management of the flash-memory cache, especially for efficient data lookup when the user access pattern changes dynamically. Section 2.3 proposes our garbage-collection and replacement strategies. Section 2.4 discusses a rebuilding procedure for the entry table.

2.2 Data Lookup and Caching

2.2.1 Management Information
The management of the flash-memory cache is based on the idea of set associativity [2]. An entry table is used to do bookkeeping for data in the cache. Each given LBA is hashed by a hash function to an entry in the table, where an example hash function is H(LBA) = (LBA/(K × NPB)) mod EN. Here the stretch factor K is any constant no less than 1, and NPB and EN are the number of pages in a block and the number of entries in the table, respectively. A link of caching buffers is attached to each entry, and the length of each link might change with the access patterns. Each caching buffer, which corresponds to a range of LBA's (LBA_i, LBA_i + R), consists of a primary block and, if one exists, an overflow block, where R is any fixed multiple of the number of pages in a block, e.g., K × NPB. (Note that each primary/overflow block maps to a physical flash-memory block.) The lookup of a given LBA fails if it is not in the LBA range of any caching buffer associated with the hash entry. The lookup of an LBA starts with a hashing to a specific hash entry, followed by a search of the caching buffers associated with the entry, as shown in Figure 1. The lookup is completed by hashing again with a pre-defined hash function to a specific page in the primary block of the corresponding caching buffer. An example hash function for locating the target page is PageIndex = LBA mod NPB. An overflow block is attached to a caching buffer if there is an attempt to overwrite the data in the hashed page of the primary block and no overflow block has been allocated yet. Free pages in an overflow block are written sequentially.

[Figure 1: The Organization of Management Information for Cached Data. The entry table, indexed by Hash1, links each entry to caching buffers covering LBA ranges (LBA_i, LBA_i + R); each caching buffer consists of a primary block and, after a collision, an overflow block, within which pages are located by Hash2. Legend: used/dead page, free page.]
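For concreteness, the two-level lookup can be sketched as follows. This is a minimal sketch: only the two hash functions are given by the design above, while the structure and field names (caching_buffer_t, entry_table, lba_start, range) are our assumptions.

#include <stdint.h>
#include <stddef.h>

#define K    1       /* stretch factor, any constant >= 1    */
#define NPB  32      /* pages per block: 16KB / 512B         */
#define EN   16384   /* number of entries in the entry table */

typedef struct caching_buffer {
    uint32_t lba_start;           /* LBA_i: start of the covered range */
    uint32_t range;               /* R: a fixed multiple of NPB        */
    struct caching_buffer *next;  /* link of buffers on the same entry */
} caching_buffer_t;

static caching_buffer_t *entry_table[EN];

/* Hash1: map an LBA to an entry of the entry table. */
static inline uint32_t entry_of(uint32_t lba)
{
    return (lba / (K * NPB)) % EN;
}

/* Hash2: locate the target page inside a primary block. */
static inline uint32_t page_index(uint32_t lba)
{
    return lba % NPB;
}

/* Return the caching buffer covering lba, or NULL on a lookup miss. */
caching_buffer_t *lookup(uint32_t lba)
{
    for (caching_buffer_t *b = entry_table[entry_of(lba)]; b != NULL; b = b->next)
        if (lba >= b->lba_start && lba < b->lba_start + b->range)
            return b;
    return NULL;  /* not cached; the request falls through to the disk */
}

Both hash computations are O(1), so the lookup cost is dominated by the length of the buffer link, which adapts to the access pattern.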

2.2.2 Read Requests and Write Requests
When a read request arrives, the LBA of the request is checked to see if the required data are cached in the flash memory. The corresponding entry of the given LBA is first derived by hashing. The corresponding caching buffer of the LBA is then derived by searching over the buffers associated with the entry. If the target caching buffer is not found, the data must be retrieved from the corresponding disk and cached for future access by allocating a new caching buffer to the entry. If such a caching buffer is found, then the given LBA is searched over the primary block and the overflow block to locate the data. If the data are available in the cache, the read request can be satisfied immediately without accessing any disk. If they are not found in either block, then a read operation to the proper disk is needed to retrieve the data, and the retrieved data must be cached. When data are retrieved from a disk, that information might also be useful in preventing disks from being disturbed out of spin-down status: since the system knows that the device is active, it might take the opportunity to write dirty data back to the corresponding disk.

When a write request arrives, it is checked to see if its LBA exists in any corresponding caching buffer. The corresponding caching buffer of the LBA is derived by searching over the associated buffers. If no such caching buffer exists, a new caching buffer is allocated and attached to the corresponding entry of the entry table. We always try to cache the data in the primary block first. If the corresponding page of the primary block is occupied by an old version of the data for the LBA, that page is invalidated. We then try to cache the data in the first available page of the overflow block, allocating an overflow block for the caching buffer if it does not exist. If an overflow block exists, it must be checked to see if any free page is available; garbage collection is invoked to reclaim the invalid pages of the primary and overflow blocks if no free page is left in the overflow block. When the data are cached in the overflow block, any page holding an old version of the data is invalidated. After garbage collection, the data are written to the primary and overflow blocks, as described above. Note that the corresponding page of the data in the primary block might still be occupied because of a hash collision; in other words, an overflow block might still be needed, and the data are then cached in the overflow block.
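The write path described above can be sketched as follows, again under hypothetical structures; the paper specifies only the policy (primary block first, then the overflow block, with garbage collection when the overflow block runs out of free pages), so allocate_block and garbage_collect are assumed helpers, and allocate_block is assumed to return a block with all pages free.

#include <stdint.h>
#include <stdbool.h>

#define NPB    32
#define NO_LBA 0xFFFFFFFFu

typedef struct {
    uint32_t page[NPB];   /* LBA held by each written page, or NO_LBA */
    bool     valid[NPB];  /* false: free or invalidated               */
    int      next_free;   /* overflow blocks are filled sequentially  */
} flash_block_t;

typedef struct {
    flash_block_t *primary;
    flash_block_t *overflow;   /* NULL until actually needed */
} cbuf_t;

extern flash_block_t *allocate_block(void);  /* free-block allocator (Section 2.3.2) */
extern void garbage_collect(cbuf_t *b);      /* Section 2.3.1 */

void cache_write(cbuf_t *b, uint32_t lba)
{
    uint32_t idx = lba % NPB;                /* Hash2 */
    flash_block_t *p = b->primary;

    /* First choice: the hashed page of the primary block, if free. */
    if (p->page[idx] == NO_LBA) {
        p->page[idx] = lba;
        p->valid[idx] = true;
        return;
    }
    /* The hashed page holds an old version of this LBA: invalidate it;
     * a written flash page cannot be rewritten in place. */
    if (p->page[idx] == lba)
        p->valid[idx] = false;

    /* Fall back to the overflow block, allocating it on demand. */
    if (b->overflow == NULL)
        b->overflow = allocate_block();
    if (b->overflow->next_free == NPB) {     /* no free page left */
        garbage_collect(b);
        cache_write(b, lba);                 /* retry after reclamation */
        return;
    }
    /* Invalidate any stale copy of this LBA already in the overflow block. */
    for (int i = 0; i < b->overflow->next_free; i++)
        if (b->overflow->page[i] == lba)
            b->overflow->valid[i] = false;

    flash_block_t *o = b->overflow;
    o->page[o->next_free] = lba;             /* free pages are used sequentially */
    o->valid[o->next_free] = true;
    o->next_free++;
}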


2.3 Garbage Collection and Data Replacement

2.3.1 Garbage Collection
When there is no free page left in an overflow block, garbage collection starts to reclaim the space occupied by invalid pages of the overflow block and its corresponding primary block. If the disk is not spun down or idle during garbage collection, data in the blocks that correspond to write requests should be written back to the disk. The proposed garbage-collection strategy is based on two major ideas: (1) If the disk is spun down or idle, the system should avoid writing data cached in the blocks back to the disk whenever possible. (2) When valid pages of the two blocks are written back to the caching buffer (with new primary and overflow blocks), they are written back in an LRU fashion: valid pages in the overflow block are written back to the buffer earlier than those in the primary block, and valid pages in the overflow block are written back from the bottom to the top of the overflow block. A new primary block is allocated and associated with the caching buffer, while an overflow block is not allocated until necessary. If the disk is spun down, then all of the data that correspond to writes are kept in the cache whenever possible. Otherwise, those data are written to the corresponding disks, and the remaining valid pages of the previous primary and overflow blocks (corresponding to reads) are written back to the new primary and overflow blocks of the caching buffer in an LRU fashion. The previous primary and overflow blocks are then inserted into a queue for erasure.
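A sketch of the garbage-collection pass for one caching buffer follows, reusing the hypothetical cbuf_t/flash_block_t structures from the Section 2.2.2 sketch; disk_is_spun_down, page_is_dirty, write_back_to_disk, and enqueue_for_erase are assumed helpers for policies the paper leaves at the system level.

#include <stdbool.h>
#include <stdint.h>

extern bool disk_is_spun_down(void);
extern bool page_is_dirty(uint32_t lba);          /* cached write, not yet on disk */
extern void write_back_to_disk(uint32_t lba);
extern void enqueue_for_erase(flash_block_t *blk);
extern flash_block_t *allocate_block(void);
extern void cache_write(cbuf_t *b, uint32_t lba); /* Section 2.2.2 */

static void relocate(cbuf_t *b, uint32_t lba, bool spun_down)
{
    if (!spun_down && page_is_dirty(lba))
        write_back_to_disk(lba);  /* rule (1): the disk is active, so flush the write */
    else
        cache_write(b, lba);      /* otherwise keep the page cached */
}

void garbage_collect(cbuf_t *b)
{
    flash_block_t *old_p = b->primary;
    flash_block_t *old_o = b->overflow;
    bool spun_down = disk_is_spun_down();

    b->primary  = allocate_block();  /* a fresh primary block ...           */
    b->overflow = NULL;              /* ... but no overflow until necessary */

    /* Rule (2): copy valid pages back in LRU fashion -- the overflow
     * block first, from bottom (most recently written) to top, and
     * only then the primary block. */
    if (old_o != NULL)
        for (int i = old_o->next_free - 1; i >= 0; i--)
            if (old_o->valid[i])
                relocate(b, old_o->page[i], spun_down);
    for (int i = 0; i < NPB; i++)
        if (old_p->valid[i])
            relocate(b, old_p->page[i], spun_down);

    enqueue_for_erase(old_p);        /* the old blocks are queued for erasure */
    if (old_o != NULL)
        enqueue_for_erase(old_o);
}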

2.3.2 Replacement Strategy
The entry table of caching buffers, which changes over time, is used to do bookkeeping for data in the cache. Whenever there is any problem in allocating a new block, we must execute a replacement strategy to recycle one or more caching buffers and their associated blocks. The basic idea is to pick the LRU caching buffer for replacement so as to reduce cache misses. Blocks of the flash memory are considered as a circular array, and a free pointer always points to a free block, as shown in Figure 2(a). Whenever a free block is needed, the free block pointed to by the pointer is returned, and the pointer moves to the next free block one-by-one along the circular array. Examples of the allocation of free blocks are shown in Figure 2(b).

[Figure 2: Allocations of Free Blocks. (a) Before allocation: the pointer of free blocks locates the first free block in the circular array of flash-memory blocks. (b) After allocation: five allocation requests have been served in order, and the pointer has advanced past the blocks now in use by caching buffers.]

To speed up the search for a free block and to help locate the LRU caching buffer, an access map, an array of bits, is introduced to keep the access record. Each bit in the access map corresponds to a unique block in the circular array. When any block of a caching buffer is accessed, the corresponding bit is set to 1. A replacement pointer, which initially equals the free pointer, moves along the circular queue whenever there is a need to locate an LRU caching buffer or to recycle used blocks. The replacement pointer stops at a bit with value 0; the caching buffer corresponding to that block is considered the LRU buffer and is recycled. If the replacement pointer moves over a bit with value 1, the bit is reset to 0, and the pointer moves to the next bit, as shown in Figure 3.

[Figure 3: The Access Map and Replacement. (a) Before replacement: the replacement pointer scans the access map, clearing the bits of recently accessed blocks. (b) After replacement: the pointer has stopped at a bit with value 0, and the corresponding caching buffer is replaced.]
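The access map amounts to a second-chance (clock) approximation of LRU at the granularity of flash-memory blocks. A minimal self-contained sketch, with hypothetical names and an example cache geometry:

#include <stdint.h>

#define NUM_BLOCKS 65536               /* e.g., a 1GB cache of 16KB blocks  */

static uint8_t access_map[NUM_BLOCKS]; /* 1 = accessed since the last sweep */
static int     replacement_ptr;        /* initially equals the free pointer */

/* Called whenever any block of a caching buffer is accessed. */
void mark_accessed(int block)
{
    access_map[block] = 1;
}

/* Find the next victim block: sweep the circular array, clearing bits
 * with value 1, and stop at the first bit with value 0; the caching
 * buffer owning that block is treated as the LRU buffer and recycled. */
int pick_victim(void)
{
    for (;;) {
        if (access_map[replacement_ptr] == 0) {
            int victim = replacement_ptr;
            replacement_ptr = (replacement_ptr + 1) % NUM_BLOCKS;
            return victim;
        }
        access_map[replacement_ptr] = 0;  /* give the block a second chance */
        replacement_ptr = (replacement_ptr + 1) % NUM_BLOCKS;
    }
}

The sweep always terminates: after one full pass every bit has been cleared, so a 0 bit is found within at most two passes over the array.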

2.4 Rebuilding Procedure of the Entry Table
When a computer shuts down normally, many strategies exist to accelerate the rebuilding of the entry table. This section illustrates a simple procedure for rebuilding the entry table by scanning blocks on the flash memory without any auxiliary information, as after a system crash.

To create the entry table, we examine blocks with valid pages. If the written pages in a block are scattered, then the block must be a primary block. We restore the information of the primary block for its corresponding caching buffer and then associate the buffer with the corresponding entry. On the other hand, if all of the written pages in a block are written in sequential order, then the block might be either a primary block or an overflow block. Each written page in the block must be checked to see if its page index is consistent with the one derived from the page-index hashing of its LBA. If there is any inconsistency, then the block must be an overflow block, and the information of the overflow block is restored for the corresponding caching buffer; otherwise, the block can be either a primary block or an overflow block, and the ambiguity is resolved upon the discovery of the other block associated with the same caching buffer.
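The classification rule can be written down directly; the sketch below reuses the hypothetical flash_block_t layout from the Section 2.2.2 sketch and assumes that the LBA recorded with each written page (e.g., in the page's spare area) is readable during the scan.

#include <stdbool.h>
#include <stdint.h>

typedef enum { PRIMARY, OVERFLOW_BLOCK, AMBIGUOUS } block_kind_t;

block_kind_t classify(const flash_block_t *blk)
{
    bool sequential = true;  /* do the written pages form a prefix?    */
    bool consistent = true;  /* does page index == LBA mod NPB always? */
    bool gap_seen = false;

    for (uint32_t i = 0; i < NPB; i++) {
        if (blk->page[i] == NO_LBA) {  /* a free page */
            gap_seen = true;
            continue;
        }
        if (gap_seen)
            sequential = false;        /* written page after a gap: scattered */
        if (blk->page[i] % NPB != i)
            consistent = false;        /* page index inconsistent with Hash2 */
    }

    if (!sequential)
        return PRIMARY;                /* scattered pages: must be a primary block */
    if (!consistent)
        return OVERFLOW_BLOCK;         /* sequential but hash-inconsistent */
    return AMBIGUOUS;                  /* settled when its buddy block is found */
}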


3. PERFORMANCE EVALUATION

3.1 Experiment Setup
This section evaluates the performance of the proposed implementation strategies in terms of energy efficiency, read/write response time, and the number of block erasures. Four different capacities of the flash-memory cache were simulated for the performance evaluation, and the impacts of the stretch factor K were explored. The size of a flash-memory block was 16KB, and the number of entries in the entry table was set to 16384. In addition to comparisons between different cache sizes and stretch factors, two well-known cache-management mechanisms, a direct mapped cache and a set associative cache, were simulated for comparison. The trace of data access for the performance evaluation was collected over an 80GB hard disk of a personal computer with 1GB of RAM and an AMD Athlon64 K8-3000+ 939 CPU. The operating system was Windows XP SP2, and the hard disk was formatted as NTFS. Traces were collected by DiskMon (http://www.sysinternals.com/Utilities/Diskmon.html), and the duration of trace collection was one month. The workload of the personal computer in accessing the hard disk corresponds to the daily use of most people, i.e., web surfing, movie playing, peer-to-peer file sharing, e-mail sending/receiving, and document typesetting/reading/editing. To evaluate the flash-memory cache in a steady state, we used the first week of the trace to fill up the flash-memory cache and collected statistics over the rest of the trace, so that the effect of garbage collection could be observed.

3.2 Experiment Results

3.2.1 The Total Idle Time
Before we demonstrate the energy efficiency under the various caching implementation strategies, the distribution of disk idle times, which affects the energy consumption, is worth noting. Figure 4 shows the impact of different implementation strategies on disk idle times. Time intervals between any two consecutive disk accesses were compiled and ranked into six ranges (2sec~15sec, 16sec~60sec, 61sec~240sec, 241sec~1200sec, 1201sec~2400sec, and 2401sec~) according to their length. Note that idle-time intervals of less than two seconds were filtered out, since spinning the disk down and then up again within two seconds does not help in power saving. As the cache size became larger, more data could be retained in the flash-memory cache. As a result, many access requests to the disk could be fulfilled by accessing the flash-memory cache, and the time intervals between two consecutive disk accesses could be prolonged. Different stretch factors K resulted in different data-placement behavior. As K became larger, the disk idle time improved. It can be observed that the total idle time achieved by setting K = 8 for a 1024MB flash-memory cache was almost comparable to that achieved by a 4096MB flash-memory cache with K = 1. This is because a large K can prevent a huge but infrequently accessed file, e.g., a movie clip, from spreading over numerous caching buffers. In other words, the chances of swapping out frequently accessed data when sequentially accessing such a huge file were reduced when the stretch factor was set large. Due to its flexible management of data placement, the proposed implementation strategy outperformed both a direct mapped cache and a set associative cache.

[Figure 4: The Distribution of Idle Times. Total idle time (sec) for each implementation strategy (no cache; 512MB~4096MB with K = 1; 1024MB with K = 2, 4, 8; 1024MB direct mapped; 1024MB set associative), broken down by idle-interval range.]

3.2.2 The Energy Efficiency
In our simulation, the energy consumption under the various implementation strategies was derived from the statistics of disk idle times, the number of disk spin-ups/spin-downs, and the number of flash-memory read/write/erase operations. To simplify the estimation, we assume that the disk has only two modes, namely active and standby. No matter what action (seek/rotation/transfer) the disk takes, we assume the consumed power is the same. When no action is taken for 30 seconds, the disk turns from the active mode into the standby mode. Note that a mode transition of the disk requires extra energy. The detailed power-consumption parameters are modelled in Table 1.

Table 1: Power Consumption Parameters.

IBM Ultrastar 36Z15 [9]: spin-down 13J, spin-up 135J, active 13.5W, standby 2.5W
Flash memory [7]: read 30mW, write 60mW, erase 60mW

Figure 5 illustrates the comparison of the energy efficiency under the various implementation strategies over the 23-day trace. Suppose the energy consumed by the disk without any flash-memory cache was x, and the energy consumed by the disk with some implementation strategy was y. The saved energy in the figure is x − y, and we accordingly derive the saved-energy ratio (x − y)/x. The energy efficiency was dominated by idle times. A long idle-time interval is superior to several short ones due to fewer spin-up and spin-down overheads, even when the total idle time is the same: the longer an idle-time interval a disk can stay in, the better the energy efficiency it can achieve. As shown in Figure 5, we could save about 20% of the energy consumption by adopting a 4GB flash-memory cache for an 80GB disk.

[Figure 5: Comparison of Energy Efficiencies. Saved energy (Joule) and saved-energy ratios for each implementation strategy; the ratios of the nine strategies range from 5.54% up to 19.94%, with the 4096MB cache with K = 1 saving about 20%.]
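As a sketch of this model, the energy attributable to one idle interval can be computed from the Table 1 parameters; the 30-second active wait before spinning down follows the assumption above, while treating every interval longer than the timeout as a spin-down opportunity is our simplification.

/* Parameters from Table 1 (IBM Ultrastar 36Z15 [9]). */
#define P_ACTIVE      13.5    /* W, regardless of seek/rotation/transfer */
#define P_STANDBY      2.5    /* W */
#define E_SPIN_DOWN   13.0    /* J per transition */
#define E_SPIN_UP    135.0    /* J per transition */
#define T_TIMEOUT     30.0    /* s of inactivity before spinning down */

/* Energy consumed by the disk during one idle interval of idle_s seconds. */
double idle_energy(double idle_s)
{
    if (idle_s <= T_TIMEOUT)                  /* too short: the disk stays active */
        return P_ACTIVE * idle_s;
    return P_ACTIVE * T_TIMEOUT               /* wait out the timeout             */
         + E_SPIN_DOWN + E_SPIN_UP            /* one pair of mode transitions     */
         + P_STANDBY * (idle_s - T_TIMEOUT);  /* sleep for the remainder          */
}

The 148J transition overhead is paid once per interval, which is why one long idle interval beats several short ones of the same total length.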


3.2.3 The Number of Block Erasures
Since flash memory has a limitation on the block-erasure count, the distribution of erase counts over flash-memory blocks is a major evaluation metric. The number of erasures of each flash-memory block was accumulated separately. According to their erase counts, flash-memory blocks were sorted into groups; the number of groups and the range of erase counts covered by each group indicate the quality of the achieved wear-levelling, from which the life cycle of a flash-memory cache can be estimated. A large cache size improved not only the idle time but also the quality of wear-levelling: erasures over flash-memory blocks were amortized when the cache size became large. Different from idle times, the impact of the cache size on the distribution of erase counts was more predictable. When the size of the flash-memory cache was doubled, the peak of the distribution roughly doubled, and the range of erase counts was roughly halved. On the other hand, although a large stretch factor was beneficial to idle times, it greatly deteriorated the quality of wear-levelling. When K became larger, the range of erase counts over flash-memory blocks expanded; in addition, the total number of erasures over flash-memory blocks increased as well. Table 2 lists the total erasures over the 23-day trace in the experiment. Since neither a direct mapped cache nor a set associative cache takes the out-place-update nature of flash memory into consideration, the deviations of erase counts among flash-memory blocks were large, and the erasure overheads they suffered were also enormous, as shown in Table 2. Note that when K was set large in the proposed strategy, the performance gap in idle times could widen while the erasure overhead remained superior to that of a direct mapped cache or a set associative cache.

Table 2: Comparison of Total Erasures.

Strategy / Total Erasures (23-day)
512MB, K = 1: 15,662,090
1024MB, K = 1: 14,995,330
2048MB, K = 1: 14,265,300
4096MB, K = 1: 14,317,930
1024MB, K = 2: 73,866,860
1024MB, K = 4: 272,317,700
1024MB, K = 8: 406,628,000
1024MB, Direct Mapped: 418,747,600
1024MB, Set Associative: 296,467,000

In the simulation over the 23-day trace, the maximum erase counts among all flash-memory blocks under the various implementation strategies are listed in Table 3. Based on this information, the life cycle of the flash-memory cache under the different implementation strategies can be estimated. The flash-memory cache under the proposed implementation strategy (for cache size = 1024MB and K = 1) could last over 203 years, while a direct mapped cache could only work for 2.4 months, and a set associative cache did not function well after 7 months.

Table 3: The Estimated Product Lifetime.

Strategy / Maximum Erase Count (23-day) / Estimated Product Lifetime
512MB, K = 1: 570 / 110.6 years
1024MB, K = 1: 310 / 203.3 years
2048MB, K = 1: 180 / 350 years
4096MB, K = 1: 150 / 420 years
1024MB, K = 2: 2,780 / 22.67 years
1024MB, K = 4: 26,600 / 28.4 months
1024MB, K = 8: 59,000 / 12.8 months
1024MB, Direct Mapped: 314,500 / 2.4 months
1024MB, Set Associative: 121,000 / 6.25 months
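The lifetime figures in Table 3 follow from simple scaling: if the most-erased block accumulated E erasures over the 23-day trace, a 10^6-erasure endurance limit is exhausted after roughly (10^6/E) × 23 days. A worked check, reproducing two of the table's entries:

#include <stdio.h>

/* Scale a trace-period maximum erase count up to the endurance limit. */
double lifetime_years(double max_erases, double trace_days)
{
    const double ENDURANCE = 1e6;  /* erase cycles per block */
    return ENDURANCE / max_erases * trace_days / 365.0;
}

int main(void)
{
    printf("%.1f years\n",  lifetime_years(310.0, 23.0));           /* ~203.3 years */
    printf("%.1f months\n", lifetime_years(314500.0, 23.0) * 12.0); /* ~2.4 months  */
    return 0;
}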

3.2.4 The Read/Write Response Time
Since flash memory is a kind of EEPROM, the flash-memory cache has intrinsic limitations in improving the performance of data access. In addition to the penalty of disk access during a cache miss, the flash-memory cache suffers from erasure overhead when the utilization of the cache space is high. Without proper space management, read/write requests could suffer from a series of page reads, page writes, block erasures, and even disk accesses. In the proposed implementation strategy, the garbage collection is designed such that block erasures can be postponed until a system idle time. To illustrate the read/write performance, the simulation adopts the access parameters of a Samsung K9F6408U0A 8MB NAND flash memory and a Western Digital Caviar WD800JB 80GB 7200RPM 8MB IDE Ultra ATA100 hard drive. Their performance characteristics are listed in Table 4.

Table 4: Performance Characteristics.

Operation / K9F6408U0A / Caviar WD800JB
Read: 36.55μs / 13.1ms
Write: 226.65μs / 13.1ms
Erase: 2ms / N/A

Figures 6(a) and 6(b) compare the average read/write response times among different cache sizes on a per-day basis. As the cache size became larger, better read/write response times were achieved. Figures 6(c) and 6(d) show the impact of the stretch factor on the average read/write response times. When the stretch factor became larger, the average read/write response times deteriorated quickly. As shown in the figure, when K = 8, the average write response time became even worse than that of the disk without any flash-memory cache, because a great number of erase operations were introduced. Figure 6(e) compares the average read response times among the different implementation strategies on a per-day basis. As shown in the figure, the proposed strategy could save up to two-thirds of the read response time, and one-third of the read response time on average. On the other hand, a direct mapped cache did not improve the read response time in most cases, while the average read performance of a set associative cache was between that of the proposed strategy and that of a direct mapped cache. Figure 6(f) illustrates the comparison of the average write response times among the different implementation strategies. As shown in the figure, the proposed strategy could save up to five-sixths of the write response time, and two-thirds of the write response time on average. Although a set associative cache is superior to a direct mapped cache in write response, both of their improvements in write response time were minor.

[Figure 6: The Impacts of Cache Size, Stretch Factor, and Different Implementation Strategies (Cache Size = 1024MB) on the Average Read/Write Response Time. Panels: (a) cache size/read, (b) cache size/write, (c) stretch factor/read, (d) stretch factor/write, (e) implementation/read, (f) implementation/write; each panel plots the average response time (ms) per day over days 8~30 of the trace.]
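The trends in Figure 6 can be related to the Table 4 parameters with a simple expected-cost model; the hit ratio and the probability that a write stalls on an erase are assumptions here (the figures report measured traces, not this closed form).

/* Parameters from Table 4. */
#define T_FLASH_READ   0.03655   /* ms: 36.55us per page read   */
#define T_FLASH_WRITE  0.22665   /* ms: 226.65us per page write */
#define T_ERASE        2.0       /* ms per block erasure        */
#define T_DISK        13.1       /* ms per disk access          */

/* Expected read response time for a given cache hit ratio. */
double avg_read_ms(double hit_ratio)
{
    return hit_ratio * T_FLASH_READ + (1.0 - hit_ratio) * T_DISK;
}

/* Expected write response time: writes always go to the cache, but a
 * request stalls on a block erasure with probability p_erase whenever
 * garbage collection cannot be postponed to an idle period. */
double avg_write_ms(double p_erase)
{
    return T_FLASH_WRITE + p_erase * T_ERASE;
}

This also helps explain the K = 8 write curve: once garbage collection forces erasures (and the extra disk traffic of write-backs) onto the critical path, the per-request cost can exceed that of the disk alone.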

4. CONCLUSION AND FUTURE WORK
This work targets the unique characteristics of flash memory in serving as a cache layer for disks. An efficient data lookup and caching strategy is proposed based on the idea of set associativity, but the proposed strategy is more flexible and takes the nature of flash memory into consideration. The coupled garbage-collection and replacement strategies are designed accordingly. Different stretch factors result in different data-placement behavior, providing a tunable trade-off between energy efficiency and life cycle. Our trace-driven simulation shows that the length and frequency of disk idle times could be improved under the proposed strategy, from which up to 20% of the energy consumption could be saved. In addition, the flash-memory cache under the proposed implementation strategy could last over 203 years, while a direct mapped cache could only work for less than three months, and a set associative cache did not function well after seven months. For data access, the proposed strategy could save up to two-thirds of the read response time on a per-day basis and one-third of the read response time on average over the 23-day trace. The performance improvement was even better for writes: the proposed strategy could save up to five-sixths of the write response time on a per-day basis and two-thirds of the write response time on average over the 23-day trace. For future work, we shall implement a prototype of the proposed flash-memory caching scheme, so that more realistic experimental results and comparisons with related work (such as ReadyBoost and ReadyDrive of Windows Vista) can be obtained.

5. REFERENCES
[1] Aleph One Company. Yet Another Flash Filing System.
[2] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, 1996.
[3] J.-W. Hsieh, L.-P. Chang, and T.-W. Kuo. Efficient On-Line Identification of Hot Data for Flash-Memory Management. In ACM SAC, pages 838–842, Mar. 2005.
[4] M-Systems. Flash-memory Translation Layer for NAND Flash (NFTL), 1998.
[5] B. Marsh, F. Douglis, and P. Krishnan. Flash Memory File Caching for Mobile Computers. In HICSS, pages 451–460, 1994.
[6] Microsoft Corporation. Windows Vista.
[7] Spansion. 3.0 Volt-only Flash Memory Technology.
[8] Y.-L. Tsai, J.-W. Hsieh, and T.-W. Kuo. Configurable NAND Flash Translation Layer. In IEEE SUTC, June 2006.
[9] J. Zedlewski, S. Sobti, N. Garg, F. Zheng, A. Krishnamurthy, and R. Wang. Modeling Hard-Disk Power Consumption. In FAST '03, pages 217–230, Mar. 2003.


