Energy-Efficient and Performance-Enhanced Disks Using Flash-Memory Cache∗

Jen-Wei Hsieh

Department of Computer Science and Information Engineering National Chiayi University, Chiayi, Taiwan 60004, ROC

[email protected]

Tei-Wei Kuo

Department of Computer Science and Information Engineering Graduate Institute of Networking and Multimedia National Taiwan University, Taipei, Taiwan 106, ROC

Po-Liang Wu

Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan 106, ROC

[email protected]

[email protected]

Yu-Chung Huang

Genesys Logic, Inc. Taipei, Taiwan 231, R.O.C.

[email protected]

ABSTRACT


This work explores the unique characteristics of flash memory in serving as a cache layer for disks. The experiments show that the proposed management scheme could save up to 20% of the energy consumption while reducing the read response time by two-thirds and the write response time by five-sixths, compared with their counterparts. The estimated lifetime of the flash-memory cache is significantly improved as well.

Categories and Subject Descriptors
C.0 [Computer Systems Organization]: General; B.3.2 [Memory Structure]: Design Styles—Cache memory

General Terms
Management

Keywords
Flash memory, cache, energy efficient, performance

1. INTRODUCTION
Flash memory has recently gained a lot of attention as a storage-system alternative (e.g., [1, 3, 4, 8]) and as a cache for hard disks. In particular, Windows ReadyBoost [6] lets users use a removable flash-memory device to improve system performance, while ReadyDrive [6] enables Windows Vista PCs equipped with a hybrid hard disk (a new type of disk with integrated non-volatile flash memory) to boot up faster, resume from hibernation in less time, preserve battery power, and improve disk reliability. However, flash memory does have several unique characteristics that introduce management challenges. A NAND flash memory is organized in terms of blocks, where each block consists of a fixed number of pages. Data must be written to the free space of flash memory: once a page is written, its space is no longer available unless it is erased. As a result, out-place update is usually adopted in flash-memory management. A block is the basic unit for erase operations, while reads and writes are processed in terms of pages.1 The typical block size and page size of a NAND flash memory are 16KB and 512B, respectively.2 After the processing of a large number of page writes, the number of free pages on flash memory becomes low, and garbage collection is needed to reclaim the invalid pages scattered over blocks (due to out-place updates) so that they can become free pages again. A flash-memory block also has a limit on erasures: a block erased over 10^6 times might suffer from frequent write errors. Wear-levelling is usually adopted to erase blocks evenly so that a longer overall lifetime is achieved.

One of the most pioneering works in adopting flash memory as a disk cache was done by Marsh et al. [5]. Reflecting the state of the art at that time, the study used a 20MB NOR flash memory as the cache for a 40MB hard disk, which implies that an efficient lookup mechanism to locate the cache space of given Logical Block Addresses (LBA's) over a large-capacity flash memory was not considered. Another issue in adopting flash memory as a disk cache is its robustness, since flash memory suffers from wear-out effects. Different from past work, this work is motivated by the needs of management in caching data for disks, especially when the characteristics of flash memory are considered. Note that well-known caching strategies, such as a direct mapped cache and a set associative cache [2], would suffer from significant deterioration in read/write performance (due to the write-once and wear-levelling features of flash memory) if they were implemented without considering the characteristics of flash memory. This paper presents an efficient lookup mechanism to locate the cached data of given LBA's over flash memory and integrates it with an LRU-based caching strategy. It also considers read and write requests jointly with respect to energy efficiency and performance. A garbage-collection strategy is proposed in an integrated way to consider the hotness of data and the system performance. The capability of the proposed strategies is evaluated by a series of experiments based on realistic workloads.

The rest of this paper is organized as follows: Section 2 presents our management schemes for a flash-memory cache, including a joint lookup and caching mechanism, a garbage-collection strategy, and a replacement policy. Other implementation remarks are also presented. The capability of the proposed management schemes is evaluated by a series of experiments in Section 3. Section 4 is the conclusion.

∗Supported in part by research grants from Taiwan, ROC National Science Council under Grants NSC95-2219-E-002-014 and NSC 95R0062-AE00-07.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED’07, August 27–29, 2007, Portland, Oregon, USA. Copyright 2007 ACM 978-1-59593-709-4/07/0008 ...$5.00.

1 Note that the terms “page” and “block” used here are different from those used for disks; as can be seen, a “page” refers to a smaller unit than a “block.”
2 Some flash memory adopts 128KB blocks and 2KB pages.




2. MANAGEMENT SCHEMES

2.1 Overview
The management of a flash-memory cache should consider the characteristics of flash memory and the access patterns of users over disks. Three potential situations in caching are considered: (1) When a read request arrives, the LBA of the request must be checked to see if the corresponding data are in the cache. If the answer is “yes,” the read request can be satisfied without accessing any hard disk. (2) If the answer is “no,” the data are retrieved from the corresponding hard disk and then cached in the flash memory for future access. (3) When a write request arrives, the data are cached in the flash memory. No extra action is taken unless a data write-back is required.

These three situations introduce several design and implementation issues. One critical issue is an efficient lookup strategy for a given LBA. Such a strategy is needed to look for any data corresponding to a given LBA on the flash memory, regardless of whether the request is a read or a write. When a write request is considered, we must invalidate any existing copy of the corresponding data in the cache. Another critical issue is the replacement strategy, invoked when the cache is full or the flash memory needs garbage collection. A good replacement strategy should reduce the chance of cache misses. Other important issues include an energy-efficient strategy for flushing written data to disks, cache robustness, and cache utilization. In Section 2.2, we present data structures and strategies for the management of the flash-memory cache, especially for efficient data lookup when the user access pattern changes dynamically. Section 2.3 proposes our garbage-collection and replacement strategies. Section 2.4 discusses a rebuilding procedure for the entry table.

2.2 Data Lookup and Caching

2.2.1 Management Information
The management of the flash-memory cache is based on the idea of set associativity [2]. An entry table is used to do bookkeeping for data in the cache. Each given LBA is hashed by a hash function to an entry in the table, where an example hash function is H(LBA) = (LBA/(K × NPB)) mod EN. Here the stretch factor K is any constant no less than 1, and NPB and EN are the number of pages in a block and the number of entries in the table, respectively. A link of caching buffers is attached to each entry, and the length of each link might change with the access patterns. Each caching buffer, which corresponds to a range of LBA's (LBA_i, LBA_i + R), consists of a primary block and, if one exists, an overflow block, where R is any fixed multiple of the number of pages in a block, e.g., K × NPB. (Note that each primary/overflow block maps to a physical flash-memory block.) The lookup of a given LBA fails if it is not in the LBA range of any caching buffer associated with the hash entry. The lookup of an LBA starts with a hashing to a specific hash entry, followed by a search of the caching buffers associated with the entry, as shown in Figure 1. The lookup is completed by hashing again with a pre-defined hash function to a specific page in the primary block of the corresponding caching buffer. An example hash function for locating the target page is PageIndex = LBA mod NPB. An overflow block is attached to a caching buffer if there is an attempt to overwrite the data in the hashed page of the primary block and no overflow block has been allocated yet. Free pages in an overflow block are written sequentially.

[Figure 1: The Organization of Management Information for Cached Data. The entry table, indexed by Hash1, links each entry to caching buffers covering LBA ranges (LBA_i, LBA_i + R); each caching buffer consists of a primary block and, after a collision, an overflow block, within which pages are located by Hash2. Legend: used/dead page, free page.]
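For concreteness, the two-level lookup can be sketched as follows. This is a minimal sketch: only the two hash functions are given by the design above, while the structure and field names (caching_buffer_t, entry_table, lba_start, range) are our assumptions.

#include <stdint.h>
#include <stddef.h>

#define K    1       /* stretch factor, any constant >= 1    */
#define NPB  32      /* pages per block: 16KB / 512B         */
#define EN   16384   /* number of entries in the entry table */

typedef struct caching_buffer {
    uint32_t lba_start;           /* LBA_i: start of the covered range */
    uint32_t range;               /* R: a fixed multiple of NPB        */
    struct caching_buffer *next;  /* link of buffers on the same entry */
} caching_buffer_t;

static caching_buffer_t *entry_table[EN];

/* Hash1: map an LBA to an entry of the entry table. */
static inline uint32_t entry_of(uint32_t lba)
{
    return (lba / (K * NPB)) % EN;
}

/* Hash2: locate the target page inside a primary block. */
static inline uint32_t page_index(uint32_t lba)
{
    return lba % NPB;
}

/* Return the caching buffer covering lba, or NULL on a lookup miss. */
caching_buffer_t *lookup(uint32_t lba)
{
    for (caching_buffer_t *b = entry_table[entry_of(lba)]; b != NULL; b = b->next)
        if (lba >= b->lba_start && lba < b->lba_start + b->range)
            return b;
    return NULL;  /* not cached; the request falls through to the disk */
}

Both hash computations are O(1), so the lookup cost is dominated by the length of the buffer link, which adapts to the access pattern.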

2.2.2 Read Requests and Write Requests
When a read request arrives, the LBA of the request is checked to see if the required data are cached in the flash memory. The corresponding entry of the given LBA is first derived by hashing. The corresponding caching buffer of the LBA is then derived by searching over the buffers associated with the entry. If the target caching buffer is not found, the data must be retrieved from the corresponding disk and cached for future access by allocating a new caching buffer to the entry. If such a caching buffer is found, then the given LBA is searched over the primary block and the overflow block to locate the data. If the data are available in the cache, the read request can be satisfied immediately without accessing any disk. If they are not found in either block, then a read operation to the proper disk is needed to retrieve the data, and the retrieved data must be cached. When data are retrieved from a disk, that information might also be useful in preventing disks from being disturbed out of spin-down status: since the system knows that the device is active, it might take the opportunity to write dirty data back to the corresponding disk.

When a write request arrives, it is checked to see if its LBA exists in any corresponding caching buffer. The corresponding caching buffer of the LBA is derived by searching over the associated buffers. If no such caching buffer exists, a new caching buffer is allocated and attached to the corresponding entry of the entry table. We always try to cache the data in the primary block first. If the corresponding page of the primary block is occupied by an old version of the data for the LBA, that page is invalidated. We then try to cache the data in the first available page of the overflow block, allocating an overflow block for the caching buffer if it does not exist. If an overflow block exists, it must be checked to see if any free page is available; garbage collection is invoked to reclaim the invalid pages of the primary and overflow blocks if no free page is left in the overflow block. When the data are cached in the overflow block, any page holding an old version of the data is invalidated. After garbage collection, the data are written to the primary and overflow blocks, as described above. Note that the corresponding page of the data in the primary block might still be occupied because of a hash collision; in other words, an overflow block might still be needed, and the data are then cached in the overflow block.
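The write path described above can be sketched as follows, again under hypothetical structures; the paper specifies only the policy (primary block first, then the overflow block, with garbage collection when the overflow block runs out of free pages), so allocate_block and garbage_collect are assumed helpers, and allocate_block is assumed to return a block with all pages free.

#include <stdint.h>
#include <stdbool.h>

#define NPB    32
#define NO_LBA 0xFFFFFFFFu

typedef struct {
    uint32_t page[NPB];   /* LBA held by each written page, or NO_LBA */
    bool     valid[NPB];  /* false: free or invalidated               */
    int      next_free;   /* overflow blocks are filled sequentially  */
} flash_block_t;

typedef struct {
    flash_block_t *primary;
    flash_block_t *overflow;   /* NULL until actually needed */
} cbuf_t;

extern flash_block_t *allocate_block(void);  /* free-block allocator (Section 2.3.2) */
extern void garbage_collect(cbuf_t *b);      /* Section 2.3.1 */

void cache_write(cbuf_t *b, uint32_t lba)
{
    uint32_t idx = lba % NPB;                /* Hash2 */
    flash_block_t *p = b->primary;

    /* First choice: the hashed page of the primary block, if free. */
    if (p->page[idx] == NO_LBA) {
        p->page[idx] = lba;
        p->valid[idx] = true;
        return;
    }
    /* The hashed page holds an old version of this LBA: invalidate it;
     * a written flash page cannot be rewritten in place. */
    if (p->page[idx] == lba)
        p->valid[idx] = false;

    /* Fall back to the overflow block, allocating it on demand. */
    if (b->overflow == NULL)
        b->overflow = allocate_block();
    if (b->overflow->next_free == NPB) {     /* no free page left */
        garbage_collect(b);
        cache_write(b, lba);                 /* retry after reclamation */
        return;
    }
    /* Invalidate any stale copy of this LBA already in the overflow block. */
    for (int i = 0; i < b->overflow->next_free; i++)
        if (b->overflow->page[i] == lba)
            b->overflow->valid[i] = false;

    flash_block_t *o = b->overflow;
    o->page[o->next_free] = lba;             /* free pages are used sequentially */
    o->valid[o->next_free] = true;
    o->next_free++;
}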


2.3 Garbage Collection and Data Replacement

2.3.1 Garbage Collection
When there is no free page left in an overflow block, garbage collection starts to reclaim the space occupied by invalid pages of the overflow block and its corresponding primary block. If the disk is not spun down or idle during garbage collection, data in the blocks that correspond to write requests should be written back to the disk. The proposed garbage-collection strategy is based on two major ideas: (1) If the disk is spun down or idle, the system should avoid writing data cached in the blocks back to the disk whenever possible. (2) When valid pages of the two blocks are written back to the caching buffer (with new primary and overflow blocks), they are written back in an LRU fashion: valid pages in the overflow block are written back to the buffer earlier than those in the primary block, and valid pages in the overflow block are written back from the bottom to the top of the overflow block. A new primary block is allocated and associated with the caching buffer, while an overflow block is not allocated until necessary. If the disk is spun down, then all of the data that correspond to writes are kept in the cache whenever possible. Otherwise, those data are written to the corresponding disks, and the remaining valid pages of the previous primary and overflow blocks (corresponding to reads) are written back to the new primary and overflow blocks of the caching buffer in an LRU fashion. The previous primary and overflow blocks are then inserted into a queue for erasure.
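A sketch of the garbage-collection pass for one caching buffer follows, reusing the hypothetical cbuf_t/flash_block_t structures from the Section 2.2.2 sketch; disk_is_spun_down, page_is_dirty, write_back_to_disk, and enqueue_for_erase are assumed helpers for policies the paper leaves at the system level.

#include <stdbool.h>
#include <stdint.h>

extern bool disk_is_spun_down(void);
extern bool page_is_dirty(uint32_t lba);          /* cached write, not yet on disk */
extern void write_back_to_disk(uint32_t lba);
extern void enqueue_for_erase(flash_block_t *blk);
extern flash_block_t *allocate_block(void);
extern void cache_write(cbuf_t *b, uint32_t lba); /* Section 2.2.2 */

static void relocate(cbuf_t *b, uint32_t lba, bool spun_down)
{
    if (!spun_down && page_is_dirty(lba))
        write_back_to_disk(lba);  /* rule (1): the disk is active, so flush the write */
    else
        cache_write(b, lba);      /* otherwise keep the page cached */
}

void garbage_collect(cbuf_t *b)
{
    flash_block_t *old_p = b->primary;
    flash_block_t *old_o = b->overflow;
    bool spun_down = disk_is_spun_down();

    b->primary  = allocate_block();  /* a fresh primary block ...           */
    b->overflow = NULL;              /* ... but no overflow until necessary */

    /* Rule (2): copy valid pages back in LRU fashion -- the overflow
     * block first, from bottom (most recently written) to top, and
     * only then the primary block. */
    if (old_o != NULL)
        for (int i = old_o->next_free - 1; i >= 0; i--)
            if (old_o->valid[i])
                relocate(b, old_o->page[i], spun_down);
    for (int i = 0; i < NPB; i++)
        if (old_p->valid[i])
            relocate(b, old_p->page[i], spun_down);

    enqueue_for_erase(old_p);        /* the old blocks are queued for erasure */
    if (old_o != NULL)
        enqueue_for_erase(old_o);
}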

2.3.2 Replacement Strategy
The entry table of caching buffers, which changes over time, is used to do bookkeeping for data in the cache. Whenever there is any problem in allocating a new block, we must execute a replacement strategy to recycle one or more caching buffers and their associated blocks. The basic idea is to pick the LRU caching buffer for replacement so as to reduce cache misses. Blocks of the flash memory are considered as a circular array, and a free pointer always points to a free block, as shown in Figure 2(a). Whenever a free block is needed, the free block pointed to by the pointer is returned, and the pointer moves to the next free block one-by-one along the circular array. Examples of the allocation of free blocks are shown in Figure 2(b).

[Figure 2: Allocations of Free Blocks. (a) Before allocation: the pointer of free blocks locates the first free block in the circular array of flash-memory blocks. (b) After allocation: five allocation requests have been served in order, and the pointer has advanced past the blocks now in use by caching buffers.]

To speed up the search for a free block and to help locate the LRU caching buffer, an access map, an array of bits, is introduced to keep the access record. Each bit in the access map corresponds to a unique block in the circular array. When any block of a caching buffer is accessed, the corresponding bit is set to 1. A replacement pointer, which initially equals the free pointer, moves along the circular queue whenever there is a need to locate an LRU caching buffer or to recycle used blocks. The replacement pointer stops at a bit with value 0; the caching buffer corresponding to that block is considered the LRU buffer and is recycled. If the replacement pointer moves over a bit with value 1, the bit is reset to 0, and the pointer moves to the next bit, as shown in Figure 3.

[Figure 3: The Access Map and Replacement. (a) Before replacement: the replacement pointer scans the access map, clearing the bits of recently accessed blocks. (b) After replacement: the pointer has stopped at a bit with value 0, and the corresponding caching buffer is replaced.]
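The access map amounts to a second-chance (clock) approximation of LRU at the granularity of flash-memory blocks. A minimal self-contained sketch, with hypothetical names and an example cache geometry:

#include <stdint.h>

#define NUM_BLOCKS 65536               /* e.g., a 1GB cache of 16KB blocks  */

static uint8_t access_map[NUM_BLOCKS]; /* 1 = accessed since the last sweep */
static int     replacement_ptr;        /* initially equals the free pointer */

/* Called whenever any block of a caching buffer is accessed. */
void mark_accessed(int block)
{
    access_map[block] = 1;
}

/* Find the next victim block: sweep the circular array, clearing bits
 * with value 1, and stop at the first bit with value 0; the caching
 * buffer owning that block is treated as the LRU buffer and recycled. */
int pick_victim(void)
{
    for (;;) {
        if (access_map[replacement_ptr] == 0) {
            int victim = replacement_ptr;
            replacement_ptr = (replacement_ptr + 1) % NUM_BLOCKS;
            return victim;
        }
        access_map[replacement_ptr] = 0;  /* give the block a second chance */
        replacement_ptr = (replacement_ptr + 1) % NUM_BLOCKS;
    }
}

The sweep always terminates: after one full pass every bit has been cleared, so a 0 bit is found within at most two passes over the array.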

2.4 Rebuilding Procedure of the Entry Table
When a computer shuts down normally, many strategies exist to accelerate the rebuilding of the entry table. This section illustrates a simple procedure for rebuilding the entry table by scanning blocks on the flash memory without any auxiliary information, as after a system crash.

To create the entry table, we examine blocks with valid pages. If the written pages in a block are scattered, then the block must be a primary block. We restore the information of the primary block for its corresponding caching buffer and then associate the buffer with the corresponding entry. On the other hand, if all of the written pages in a block are written in sequential order, then the block might be either a primary block or an overflow block. Each written page in the block must be checked to see if its page index is consistent with the one derived from the page-index hashing of its LBA. If there is any inconsistency, then the block must be an overflow block, and the information of the overflow block is restored for the corresponding caching buffer; otherwise, the block can be either a primary block or an overflow block, and the ambiguity is resolved upon the discovery of the other block associated with the same caching buffer.
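The classification rule can be written down directly; the sketch below reuses the hypothetical flash_block_t layout from the Section 2.2.2 sketch and assumes that the LBA recorded with each written page (e.g., in the page's spare area) is readable during the scan.

#include <stdbool.h>
#include <stdint.h>

typedef enum { PRIMARY, OVERFLOW_BLOCK, AMBIGUOUS } block_kind_t;

block_kind_t classify(const flash_block_t *blk)
{
    bool sequential = true;  /* do the written pages form a prefix?    */
    bool consistent = true;  /* does page index == LBA mod NPB always? */
    bool gap_seen = false;

    for (uint32_t i = 0; i < NPB; i++) {
        if (blk->page[i] == NO_LBA) {  /* a free page */
            gap_seen = true;
            continue;
        }
        if (gap_seen)
            sequential = false;        /* written page after a gap: scattered */
        if (blk->page[i] % NPB != i)
            consistent = false;        /* page index inconsistent with Hash2 */
    }

    if (!sequential)
        return PRIMARY;                /* scattered pages: must be a primary block */
    if (!consistent)
        return OVERFLOW_BLOCK;         /* sequential but hash-inconsistent */
    return AMBIGUOUS;                  /* settled when its buddy block is found */
}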


3. PERFORMANCE EVALUATION

3.1 Experiment Setup
This section evaluates the performance of the proposed implementation strategies in terms of energy efficiency, read/write response time, and the number of block erasures. Four different capacities of the flash-memory cache were simulated for the performance evaluation, and the impacts of the stretch factor K were explored. The size of a flash-memory block was 16KB, and the number of entries in the entry table was set to 16384. In addition to comparisons between different cache sizes and stretch factors, two well-known cache-management mechanisms, a direct mapped cache and a set associative cache, were simulated for comparison. The trace of data access for the performance evaluation was collected over an 80GB hard disk of a personal computer with 1GB of RAM and an AMD Athlon64 K8-3000+ 939 CPU. The operating system was Windows XP SP2, and the hard disk was formatted as NTFS. Traces were collected by DiskMon (http://www.sysinternals.com/Utilities/Diskmon.html), and the duration of trace collection was one month. The workload of the personal computer in accessing the hard disk corresponds to the daily use of most people, i.e., web surfing, movie playing, peer-to-peer file sharing, e-mail sending/receiving, and document typesetting/reading/editing. To evaluate the flash-memory cache in a steady state, we used the first week of the trace to fill up the flash-memory cache and collected statistics over the rest of the trace, so that the effect of garbage collection could be observed.

3.2 Experiment Results

3.2.1 The Total Idle Time
Before we demonstrate the energy efficiency under the various caching implementation strategies, the distribution of disk idle times, which affects the energy consumption, is worth noting. Figure 4 shows the impact of different implementation strategies on disk idle times. Time intervals between any two consecutive disk accesses were compiled and ranked into six ranges (2sec~15sec, 16sec~60sec, 61sec~240sec, 241sec~1200sec, 1201sec~2400sec, and 2401sec~) according to their length. Note that idle-time intervals of less than two seconds were filtered out, since spinning the disk down and then up again within two seconds does not help in power saving. As the cache size became larger, more data could be retained in the flash-memory cache. As a result, many access requests to the disk could be fulfilled by accessing the flash-memory cache, and the time intervals between two consecutive disk accesses could be prolonged. Different stretch factors K resulted in different data-placement behavior. As K became larger, the disk idle time improved. It can be observed that the total idle time achieved by setting K = 8 for a 1024MB flash-memory cache was almost comparable to that achieved by a 4096MB flash-memory cache with K = 1. This is because a large K can prevent a huge but infrequently accessed file, e.g., a movie clip, from spreading over numerous caching buffers. In other words, the chances of swapping out frequently accessed data when sequentially accessing such a huge file were reduced when the stretch factor was set large. Due to its flexible management of data placement, the proposed implementation strategy outperformed both a direct mapped cache and a set associative cache.

[Figure 4: The Distribution of Idle Times. Total idle time (sec) for each implementation strategy (no cache; 512MB~4096MB with K = 1; 1024MB with K = 2, 4, 8; 1024MB direct mapped; 1024MB set associative), broken down by idle-interval range.]

3.2.2 The Energy Efficiency
In our simulation, the energy consumption under the various implementation strategies was derived from the statistics of disk idle times, the number of disk spin-ups/spin-downs, and the number of flash-memory read/write/erase operations. To simplify the estimation, we assume that the disk has only two modes, namely active and standby. No matter what action (seek/rotation/transfer) the disk takes, we assume the consumed power is the same. When no action is taken for 30 seconds, the disk turns from the active mode into the standby mode. Note that a mode transition of the disk requires extra energy. The detailed power-consumption parameters are modelled in Table 1.

Table 1: Power Consumption Parameters.

IBM Ultrastar 36Z15 [9]: spin-down 13J, spin-up 135J, active 13.5W, standby 2.5W
Flash memory [7]: read 30mW, write 60mW, erase 60mW

Figure 5 illustrates the comparison of the energy efficiency under the various implementation strategies over the 23-day trace. Suppose the energy consumed by the disk without any flash-memory cache was x, and the energy consumed by the disk with some implementation strategy was y. The saved energy in the figure is x − y, and we accordingly derive the saved-energy ratio (x − y)/x. The energy efficiency was dominated by idle times. A long idle-time interval is superior to several short ones due to fewer spin-up and spin-down overheads, even when the total idle time is the same: the longer an idle-time interval a disk can stay in, the better the energy efficiency it can achieve. As shown in Figure 5, we could save about 20% of the energy consumption by adopting a 4GB flash-memory cache for an 80GB disk.

[Figure 5: Comparison of Energy Efficiencies. Saved energy (Joule) and saved-energy ratios for each implementation strategy; the ratios of the nine strategies range from 5.54% up to 19.94%, with the 4096MB cache with K = 1 saving about 20%.]
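As a sketch of this model, the energy attributable to one idle interval can be computed from the Table 1 parameters; the 30-second active wait before spinning down follows the assumption above, while treating every interval longer than the timeout as a spin-down opportunity is our simplification.

/* Parameters from Table 1 (IBM Ultrastar 36Z15 [9]). */
#define P_ACTIVE      13.5    /* W, regardless of seek/rotation/transfer */
#define P_STANDBY      2.5    /* W */
#define E_SPIN_DOWN   13.0    /* J per transition */
#define E_SPIN_UP    135.0    /* J per transition */
#define T_TIMEOUT     30.0    /* s of inactivity before spinning down */

/* Energy consumed by the disk during one idle interval of idle_s seconds. */
double idle_energy(double idle_s)
{
    if (idle_s <= T_TIMEOUT)                  /* too short: the disk stays active */
        return P_ACTIVE * idle_s;
    return P_ACTIVE * T_TIMEOUT               /* wait out the timeout             */
         + E_SPIN_DOWN + E_SPIN_UP            /* one pair of mode transitions     */
         + P_STANDBY * (idle_s - T_TIMEOUT);  /* sleep for the remainder          */
}

The 148J transition overhead is paid once per interval, which is why one long idle interval beats several short ones of the same total length.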


3.2.3 The Number of Block Erasures
Since flash memory has a limitation on the block-erasure count, the distribution of erase counts over flash-memory blocks is a major evaluation metric. The number of erasures of each flash-memory block was accumulated separately. According to their erase counts, flash-memory blocks were sorted into groups; the number of groups and the range of erase counts covered by each group indicate the quality of the achieved wear-levelling, from which the life cycle of a flash-memory cache can be estimated. A large cache size improved not only the idle time but also the quality of wear-levelling: erasures over flash-memory blocks were amortized when the cache size became large. Different from idle times, the impact of the cache size on the distribution of erase counts was more predictable. When the size of the flash-memory cache was doubled, the peak of the distribution roughly doubled, and the range of erase counts was roughly halved. On the other hand, although a large stretch factor was beneficial to idle times, it greatly deteriorated the quality of wear-levelling. When K became larger, the range of erase counts over flash-memory blocks expanded; in addition, the total number of erasures over flash-memory blocks increased as well. Table 2 lists the total erasures over the 23-day trace in the experiment. Since neither a direct mapped cache nor a set associative cache takes the out-place-update nature of flash memory into consideration, the deviations of erase counts among flash-memory blocks were large, and the erasure overheads they suffered were also enormous, as shown in Table 2. Note that when K was set large in the proposed strategy, the performance gap in idle times could widen while the erasure overhead remained superior to that of a direct mapped cache or a set associative cache.

Table 2: Comparison of Total Erasures.

Strategy / Total Erasures (23-day)
512MB, K = 1: 15,662,090
1024MB, K = 1: 14,995,330
2048MB, K = 1: 14,265,300
4096MB, K = 1: 14,317,930
1024MB, K = 2: 73,866,860
1024MB, K = 4: 272,317,700
1024MB, K = 8: 406,628,000
1024MB, Direct Mapped: 418,747,600
1024MB, Set Associative: 296,467,000

In the simulation over the 23-day trace, the maximum erase counts among all flash-memory blocks under the various implementation strategies are listed in Table 3. Based on this information, the life cycle of the flash-memory cache under the different implementation strategies can be estimated. The flash-memory cache under the proposed implementation strategy (for cache size = 1024MB and K = 1) could last over 203 years, while a direct mapped cache could only work for 2.4 months, and a set associative cache did not function well after 7 months.

Table 3: The Estimated Product Lifetime.

Strategy / Maximum Erase Count (23-day) / Estimated Product Lifetime
512MB, K = 1: 570 / 110.6 years
1024MB, K = 1: 310 / 203.3 years
2048MB, K = 1: 180 / 350 years
4096MB, K = 1: 150 / 420 years
1024MB, K = 2: 2,780 / 22.67 years
1024MB, K = 4: 26,600 / 28.4 months
1024MB, K = 8: 59,000 / 12.8 months
1024MB, Direct Mapped: 314,500 / 2.4 months
1024MB, Set Associative: 121,000 / 6.25 months
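The lifetime figures in Table 3 follow from simple scaling: if the most-erased block accumulated E erasures over the 23-day trace, a 10^6-erasure endurance limit is exhausted after roughly (10^6/E) × 23 days. A worked check, reproducing two of the table's entries:

#include <stdio.h>

/* Scale a trace-period maximum erase count up to the endurance limit. */
double lifetime_years(double max_erases, double trace_days)
{
    const double ENDURANCE = 1e6;  /* erase cycles per block */
    return ENDURANCE / max_erases * trace_days / 365.0;
}

int main(void)
{
    printf("%.1f years\n",  lifetime_years(310.0, 23.0));           /* ~203.3 years */
    printf("%.1f months\n", lifetime_years(314500.0, 23.0) * 12.0); /* ~2.4 months  */
    return 0;
}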

3.2.4 The Read/Write Response Time
Since flash memory is a kind of EEPROM, the flash-memory cache has intrinsic limitations in improving the performance of data access. In addition to the penalty of disk access during a cache miss, the flash-memory cache suffers from erasure overhead when the utilization of the cache space is high. Without proper space management, read/write requests could suffer from a series of page reads, page writes, block erasures, and even disk accesses. In the proposed implementation strategy, the garbage collection is designed such that block erasures can be postponed until a system idle time. To illustrate the read/write performance, the simulation adopts the access parameters of a Samsung K9F6408U0A 8MB NAND flash memory and a Western Digital Caviar WD800JB 80GB 7200RPM 8MB IDE Ultra ATA100 hard drive. Their performance characteristics are listed in Table 4.

Table 4: Performance Characteristics.

Operation / K9F6408U0A / Caviar WD800JB
Read: 36.55μs / 13.1ms
Write: 226.65μs / 13.1ms
Erase: 2ms / N/A

Figures 6(a) and 6(b) compare the average read/write response times among different cache sizes on a per-day basis. As the cache size became larger, better read/write response times were achieved. Figures 6(c) and 6(d) show the impact of the stretch factor on the average read/write response times. When the stretch factor became larger, the average read/write response times deteriorated quickly. As shown in the figure, when K = 8, the average write response time became even worse than that of the disk without any flash-memory cache, because a great number of erase operations were introduced. Figure 6(e) compares the average read response times among the different implementation strategies on a per-day basis. As shown in the figure, the proposed strategy could save up to two-thirds of the read response time, and one-third of the read response time on average. On the other hand, a direct mapped cache did not improve the read response time in most cases, while the average read performance of a set associative cache was between that of the proposed strategy and that of a direct mapped cache. Figure 6(f) illustrates the comparison of the average write response times among the different implementation strategies. As shown in the figure, the proposed strategy could save up to five-sixths of the write response time, and two-thirds of the write response time on average. Although a set associative cache is superior to a direct mapped cache in write response, both of their improvements in write response time were minor.

[Figure 6: The Impacts of Cache Size, Stretch Factor, and Different Implementation Strategies (Cache Size = 1024MB) on the Average Read/Write Response Time. Panels: (a) cache size/read, (b) cache size/write, (c) stretch factor/read, (d) stretch factor/write, (e) implementation/read, (f) implementation/write; each panel plots the average response time (ms) per day over days 8~30 of the trace.]
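The trends in Figure 6 can be related to the Table 4 parameters with a simple expected-cost model; the hit ratio and the probability that a write stalls on an erase are assumptions here (the figures report measured traces, not this closed form).

/* Parameters from Table 4. */
#define T_FLASH_READ   0.03655   /* ms: 36.55us per page read   */
#define T_FLASH_WRITE  0.22665   /* ms: 226.65us per page write */
#define T_ERASE        2.0       /* ms per block erasure        */
#define T_DISK        13.1       /* ms per disk access          */

/* Expected read response time for a given cache hit ratio. */
double avg_read_ms(double hit_ratio)
{
    return hit_ratio * T_FLASH_READ + (1.0 - hit_ratio) * T_DISK;
}

/* Expected write response time: writes always go to the cache, but a
 * request stalls on a block erasure with probability p_erase whenever
 * garbage collection cannot be postponed to an idle period. */
double avg_write_ms(double p_erase)
{
    return T_FLASH_WRITE + p_erase * T_ERASE;
}

This also helps explain the K = 8 write curve: once garbage collection forces erasures (and the extra disk traffic of write-backs) onto the critical path, the per-request cost can exceed that of the disk alone.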

4. CONCLUSION AND FUTURE WORK
This work targets the unique characteristics of flash memory in serving as a cache layer for disks. An efficient data lookup and caching strategy is proposed based on the idea of set associativity, but the proposed strategy is more flexible and takes the nature of flash memory into consideration. The coupled garbage-collection and replacement strategies are designed accordingly. Different stretch factors result in different data-placement behavior, providing a tunable trade-off between energy efficiency and life cycle. Our trace-driven simulation shows that the length and frequency of disk idle times could be improved under the proposed strategy, from which up to 20% of the energy consumption could be saved. In addition, the flash-memory cache under the proposed implementation strategy could last over 203 years, while a direct mapped cache could only work for less than three months, and a set associative cache did not function well after seven months. For data access, the proposed strategy could save up to two-thirds of the read response time on a per-day basis and one-third of the read response time on average over the 23-day trace. The performance improvement was even better for writes: the proposed strategy could save up to five-sixths of the write response time on a per-day basis and two-thirds of the write response time on average over the 23-day trace. For future work, we shall implement a prototype of the proposed flash-memory caching scheme, so that more realistic experimental results and comparisons with related work (such as ReadyBoost and ReadyDrive of Windows Vista) can be obtained.

5. REFERENCES
[1] Aleph One Company. Yet Another Flash Filing System.
[2] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, 1996.
[3] J.-W. Hsieh, L.-P. Chang, and T.-W. Kuo. Efficient On-Line Identification of Hot Data for Flash-Memory Management. In ACM SAC, pages 838–842, Mar. 2005.
[4] M-Systems. Flash-memory Translation Layer for NAND Flash (NFTL), 1998.
[5] B. Marsh, F. Douglis, and P. Krishnan. Flash Memory File Caching for Mobile Computers. In HICSS, pages 451–460, 1994.
[6] Microsoft Corporation. Windows Vista.
[7] Spansion. 3.0 Volt-only Flash Memory Technology.
[8] Y.-L. Tsai, J.-W. Hsieh, and T.-W. Kuo. Configurable NAND Flash Translation Layer. In IEEE SUTC, June 2006.
[9] J. Zedlewski, S. Sobti, N. Garg, F. Zheng, A. Krishnamurthy, and R. Wang. Modeling Hard-Disk Power Consumption. In FAST '03, pages 217–230, Mar. 2003.


