A Study of Linux File System Evolution

Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Shan Lu
Computer Sciences Department, University of Wisconsin, Madison

Abstract

We conduct a comprehensive study of file-system code evolution. By analyzing eight years of Linux file-system changes across 5079 patches, we derive numerous new (and sometimes surprising) insights into the file-system development process; our results should be useful for both the development of file systems themselves as well as the improvement of bug-finding tools.

1 Introduction

Open-source local file systems, such as Linux Ext4 [31], XFS [46], and Btrfs [30], remain a critical component in the world of modern storage. For example, many recent distributed file systems, such as Google GFS [17] and Hadoop DFS [43], all replicate data objects (and associated metadata) across local file systems. On smart phones, most user data is managed by a local file system; for example, Google Android phones use Ext4 [2, 23] and Apple's iOS devices use HFSX [34]. Finally, many desktop users still do not backup their data regularly [21, 29]; in this case, the local file system clearly plays a critical role as sole manager of user data.

Open-source local file systems remain a moving target. Developed by different teams with different goals, these file systems evolve rapidly to add new features, fix bugs, and improve performance and reliability, as one might expect in the open-source community [38]. Major new file systems are introduced every few years [12, 30, 32, 39, 46]; with recent technology changes (e.g., Flash [11, 18]), we can expect even more flux in this domain.

However, despite all the activity in local file system development, there is little quantitative understanding of their code bases. For example, where does the complexity of such systems lie? What types of bugs are common? Which performance features exist? Which reliability features are utilized? These questions are important to answer for different communities: for developers, so that they can improve current designs and implementations and create better systems; for tool builders, so that they can improve their tools to match reality (e.g., by finding the types of bugs that plague existing systems).

One way to garner insight into these questions is to study the artifacts themselves. Compared with proprietary software, open source projects provide a rich resource for source code and patch analysis. The fact that every version of Linux is available online, including a detailed set of patches which describe how one version transforms to the next, enables us to carefully analyze how file systems have changed over time. A new type of "systems software archeology" is now possible.

In this paper, we perform the first comprehensive study of the evolution of Linux file systems, focusing on six major and important ones: Ext3 [47], Ext4 [31], XFS [46], Btrfs [30], ReiserFS [13], and JFS [10]. These file systems represent diverse features, designs, implementations and even groups of developers. We examine every file-system patch in the Linux 2.6 series over a period of eight years including 5079 patches. By carefully studying each patch to understand its intention, and then labeling the patch accordingly along numerous important axes, we can gain deep quantitative insight into the file-system development process. We can then answer questions such as "what are most patches for?", "what types of bugs are common?", and in general gain a new level of insight into the common approaches and issues that underlie current file-system development and maintenance.

We make the following high-level observations (§3). A large number of patches (nearly 50%) are maintenance patches, reflecting the constant refactoring work needed to keep code simple and maintainable. The remaining dominant category is bugs (just under 40%, about 1800 bugs), showing how much effort is required to slowly inch towards a "correct" implementation; perhaps this hard labor explains why some have found that the quality of open source projects is better than the proprietary software average [1]. Interestingly, the number of bugs does not die down over time (even for stable file systems), rather ebbing and flowing over time.

Breaking down the bug category further (§4), we find that semantic bugs, which require an understanding of file-system semantics to find or fix, are the dominant bug category (over 50% of all bugs). These types of bugs are vexing, as most of them are hard to detect via generic bug detection tools [9, 35]; more complex model checking [52] or formal specification [24] may be needed. Concurrency bugs are the next most common (about 20% of bugs), more prevalent than in user-level software [26, 42, 45]. Within this group, atomicity violations and deadlocks dominate. Kernel deadlocks are common (many caused by incorrectly using blocking kernel functions), hinting that recent research [22, 49] might be needed in-kernel. The remaining bugs are split relatively evenly across memory bugs and improper error-code handling. In the memory bug category, memory leaks and null-pointer dereferences are common; in the error-code category, most bugs simply drop errors completely [19].

We also categorize bugs along other axes to gain further insight. For example, when broken down by consequence, we find that most of the bugs we studied lead to crashes or corruption, and hence are quite serious; this result holds across semantic, concurrency, memory, and error code bugs. When categorized by data structure, we find that B-trees, present in many file systems for scalability, have relatively few bugs per line of code. When classified by whether bugs occur on normal or failure-handling paths, we make the following important discovery: nearly 40% of all bugs occur on failure-handling paths. File systems, when trying to react to a failed memory allocation, I/O error, or some other unexpected condition, are highly likely to make further mistakes, such as incorrect state updates and missing resource releases. These mistakes can lead to corruption, crashes, deadlocks and leaks. Future system designs need better tool or language support to make these rarely-executed failure paths correct.

Finally, while bug patches comprise most of our study, performance and reliability patches are also prevalent, accounting for 8% and 7% of patches respectively (§5). The performance techniques used are relatively common and widespread (e.g., removing an unnecessary I/O, or downgrading a write lock to a read lock). About a quarter of performance patches reduce synchronization overheads; thus, while correctness is important, performance likely justifies the use of more complicated and time saving synchronization schemes. In contrast to performance techniques, reliability techniques seem to be added in a rather ad hoc fashion (e.g., most file systems apply sanity checks non-uniformly). Inclusion of a broader set of reliability techniques could harden all file systems.

Beyond these results, another outcome of our work is an annotated dataset of file-system patches, which we make publicly available for further study (at this URL: pages.cs.wisc.edu/~ll/fs-patch) by file-system developers, systems-language designers, and bug-finding tool builders. We show the utility of PatchDB by performing a case study (§6); specifically, we search the dataset to find bugs, performance fixes, and reliability techniques that are unusually common across all file systems. This example brings out one theme of our study, which is that there is a deep underlying similarity in Linux local file systems, even though these file systems are significantly different in nature (e.g., designs, features, and groups of developers). The commonalities we do find are good news: by studying past bug, performance, and reliability patches, and learning what issues and challenges lie therein, we can greatly improve the next generation of file systems and tools used to build them.

2 Methodology

In this section, we first give a brief description of our target file systems. Then, we illustrate how we analyze patches with a detailed example. Finally, we discuss the limitations of our methodology.

2.1 Target File Systems

Our goal in selecting a collection of disk-based file systems is to choose the most popular and important ones. The selected file systems should include diverse reliability features (e.g., physical journaling, logical journaling, checksumming, copy-on-write), data structures (e.g., hash tables, indirect blocks, extent maps, trees), performance optimizations (e.g., asynchronous thread pools, scalable algorithms, caching, block allocation for SSD devices), advanced features (e.g., pre-allocation, snapshot, resize, volumes), and even a range of maturity (e.g., stable, under development). For these reasons, we selected six file systems and their related modules: Ext3 with JBD [47], Ext4 with JBD2 [31], XFS [46], Btrfs [30], ReiserFS [13], and JFS [10]. Ext3, JFS, ReiserFS and XFS were all stable and in production use before the Linux 2.6 kernel. Ext4 was introduced in Linux 2.6.19 and marked stable in Linux 2.6.28. Btrfs was added into Linux 2.6.29 and is still under active development.

For each file system, we conduct a comprehensive study of its evolution by examining all patches from Linux 2.6.0 (Dec '03) to 2.6.39 (May '11). These are Linux mainline versions, which are released every three months with aggregate changes included in change logs. Patches consist of all formal modifications in each new kernel version, including new features, code maintenance, and bug fixes, and usually contain clear descriptions of their purpose and rich diagnostic information. In contrast, Linux Bugzilla [3] and mailing lists [4, 5] are not as well organized as final patches, and may contain only a subset or superset of the final changes merged into the kernel.

2.2 Classification of File System Patches

To better understand the evolution of different file systems, we conduct a broad study to answer three categories of fundamental questions:

• Overview: What are the common types of patches in file systems and how do patches change as file systems evolve? Do patches of different types have different sizes?

• Bugs: What types of bugs appear in file systems? Do some components of file systems contain more bugs than others? What types of consequences do different bugs have?

• Performance and Reliability: What techniques are used by file systems to improve performance? What common reliability enhancements are proposed in file systems?


Figure 1: An Example Patch. An Ext3 patch.

  [PATCH] fix possible NULL pointer in fs/ext3/super.c.
  In fs/ext3/super.c::ext3_get_journal() at line 1675
  'journal' can be NULL, but it is not handled right
  (detect by Coverity's checker).

  --- /fs/ext3/super.c
  +++ /fs/ext3/super.c
  @@ -1675,6 +1675,7 @@ journal_t *ext3_get_journal()
       if (!journal) {
           printk(KERN_ERR "EXT3: Could not load ... ");
           iput(journal_inode);
  +        return NULL;
       }
       journal->j_private = sb;

Table 1: Patch Type. This table describes the classification and definition of file-system patches.

  Bug          - Fix existing bugs
  Performance  - Propose more efficient designs or implementations to improve performance (e.g., reducing synchronization overhead or use tree structures)
  Reliability  - Improve file-system robustness (e.g., data integrity verification, user/kernel pointer annotations, access-permission checking)
  Feature      - Implement new features
  Maintenance  - Maintain the code and documentation (e.g., adding documentation, fix compiling error, changing APIs)


To answer these questions, we manually analyzed each patch to understand its purpose and functionality, examining 5079 patches from the selected Linux 2.6 file systems. Each patch contains a patch header, a description body, and source-code changes. The patch header is a high-level summary of the functionality of the patch (e.g., fixing a bug). The body contains more detail, such as steps to reproduce the bug, system configuration information, proposed solutions, and so forth. Given these details and our knowledge of file systems, we categorize each patch along a number of different axes, as described later.

Figure 1 shows a real Ext3 patch. We can infer from the header that this patch fixes a null-pointer dereference bug. The body explains the cause of the null-pointer dereference and the location within the code. The patch also indicates that the bug was detected with Coverity [9]. This patch is classified as a bug (type=bug). The size is 1 (size=1) as one line of code is added. From the related source file (super.c), we infer the bug belongs to Ext3's superblock management (data-structure=super). A null-pointer access is a memory bug (pattern=memory,nullptr) and can lead to a crash (consequence=crash).

However, some patches have less information, making our analysis harder. In these cases, we sought out other sources of information, including design documents, forum and mailing-list discussions, and source-code analysis. Most patches are analyzed with high confidence given all the available information and our domain knowledge. Examples are shown throughout to give more insight as to how the classification is performed.

Limitations: Our study is limited by the file systems we chose, which may not reflect the characteristics of other file systems, such as other non-Linux file systems and flash-device file systems. We only examined kernel patches included in Linux 2.6 mainline versions, thus omitting patches for Ext3, JFS, ReiserFS, and XFS from Linux 2.4. As for bug representativeness, we only studied the bugs reported and fixed in patches, which is a biased subset; there may be (many) other bugs not yet reported. A similar study may be needed for user-space utilities, such as mkfs and fsck [33].
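To make the labeling concrete, the sketch below shows, in C, the kind of per-patch annotation record such a classification yields; the field names and values are our own illustration and are not the schema of the released dataset.

  /* Hypothetical annotation record for one patch; names are illustrative only. */
  struct patch_record {
      const char *commit;       /* kernel commit identifier */
      const char *fs;           /* "ext3", "ext4", "xfs", "btrfs", "reiserfs", "jfs" */
      const char *type;         /* bug, performance, reliability, feature, maintenance */
      int         size;         /* lines added plus lines deleted */
      const char *component;    /* logical component, e.g., "super" (Table 2) */
      const char *pattern;      /* bug pattern, e.g., "memory,nullptr" (Table 3) */
      const char *consequence;  /* e.g., "crash" (Table 4) */
  };

  /* The Ext3 patch of Figure 1, expressed as such a record. */
  static const struct patch_record figure1_example = {
      .commit      = "fix possible NULL pointer in fs/ext3/super.c",
      .fs          = "ext3",
      .type        = "bug",
      .size        = 1,
      .component   = "super",
      .pattern     = "memory,nullptr",
      .consequence = "crash",
  };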

3 Patch Overview

File systems evolve through patches. A large number of patches are discussed and submitted to mailing lists, bug-report websites, and other forums. Some are used to implement new features, while others fix existing bugs. In this section, we investigate three general questions regarding file-system patches. First, what are file-system patch types? Second, how do patches change over time? Lastly, what is the distribution of patch sizes?

3.1 Patch Type

We classify patches into five categories (Table 1): bug fixes (bug), performance improvements (performance), reliability enhancements (reliability), new features (feature), and maintenance and refactoring (maintenance). Each patch usually belongs to a single category.

Figure 2(a) shows the number and relative percentages of patch types for each file system. Note that even though file systems exhibit significantly different levels of patch activity (shown by the total number of patches), the percentage breakdowns of patch types are relatively similar.

Maintenance patches are the largest group across all file systems (except Btrfs, a recent and not-yet-stable file system). These patches include changes to improve readability, simplify structure, and utilize cleaner abstractions; in general, these patches represent the necessary costs of keeping a complex open-source system well-maintained. Because maintenance patches are relatively uninteresting, we do not examine them further.

Bug patches have a significant presence, comprising nearly 40% of patches. Not surprisingly, Btrfs has a larger percentage of bug patches than others; however, stable and mature file systems (such as Ext3) also have a sizable percentage of bug patches, indicating that bug fixing is a constant in a file system's lifetime (Figure 5). Because this class of patch is critical for developers and tool builders, we characterize them in detail later (§4).

Both performance and reliability patches occur as well, although with less frequency than maintenance and bug patches. They reveal a variety of techniques used by different file systems, motivating further study (§5).


Figure 2: Patch Type and Bug Pattern. This figure shows the distribution of (a) patch types and (b) bug patterns. The total number of patches is on top of each bar.

Figure 3: Patch Size. This figure shows the size distribution for different patch types, in terms of lines of modifications.

Finally, feature patches account for a small percentage of total patches; as we will see, most feature patches contain more lines of code than other patches.

Summary: Nearly half of total patches are for code maintenance and documentation; a significant number of bugs exist in not only new file systems, but also stable file systems; all file systems make special efforts to improve their performance and reliability; feature patches account for a relatively small percentage of total patches.

3.2 Patch Trend

File systems change over time, integrating new features, fixing bugs, and enhancing reliability and performance. Does the percentage of different patch types increase or decrease with time?

We studied the changes in patches over time and found few changes (not shown). While the number of patches per version increased in general, the percentage of maintenance, bug, reliability, performance, and feature patches remained relatively stable. Although there were a few notable exceptions (e.g., Btrfs had a time where a large number of performance patches were added), the statistics shown in the previous section are relatively good summaries of the behavior at any given time. Perhaps most interestingly, bug patches do not decrease over time; living code bases constantly incorporate bug fixes (see §4).

Summary: The patch percentages are relatively stable over time; newer file systems (e.g., Btrfs) deviate occasionally; bug patches do not diminish despite stability.

3.3 Patch Size

Patch size is one approximate way to quantify the complexity of a patch, and is defined here as the sum of lines added and deleted by a patch. Figure 3 displays the size distribution of bug, performance, reliability, and feature patches. Most bug patches are small; 50% are less than 10 lines of code. However, more complex file systems tend to have larger bug patches (e.g., Btrfs and XFS) (not shown due to lack of space). Interestingly, feature patches are significantly larger than other patch types. Over 50% of these patches have more than 100 lines of code; 5% have over 1000 lines of code.

Summary: Bug patches are generally small; complicated file systems have larger bug patches; reliability and performance patches are medium-sized; feature patches are significantly larger than other patch types.

4 File System Bugs

In this section, we study file-system bugs in detail to understand their patterns and consequences comprehensively. First, we show the distribution of bugs in file-system logical components. Second, we describe our bug pattern classification, bug trends, and bug consequences. Finally, we analyze each type of bug with a more detailed classification and a number of real examples.

4.1 Correlation Between Code and Bugs

The code complexity of file systems is growing. FFS had only 1200 lines of code [32]; modern systems are notably larger, including Ext4 (29K LOC), Btrfs (47K LOC), and XFS (64K LOC). Several fundamental questions are germane: How is the code distributed among different logical components? Where are the bugs? Does each logical component have an equal degree of complexity?

File systems generally have similar logical components, such as inodes, superblocks, and journals. To enable fair comparison, we partition each file system into nine logical components (Table 2).

Table 2: Logical Components. This table shows the classification and definition of file-system logical components.

  balloc - Data block allocation and deallocation
  dir    - Directory management
  extent - Contiguous physical blocks mapping
  file   - File read and write operations
  inode  - Inode-related metadata management
  trans  - Journaling or other transactional support
  super  - Superblock-related metadata management
  tree   - Generic tree structure procedures
  other  - Other supporting components (e.g., xattr, ioctl, resize)

Figure 4: File System Code and Bug Correlation. This figure shows the correlation between code and bugs. The x-axis shows the average percent of code of each component (over all versions); the y-axis shows the percent of bugs of each component (over all versions).

Figure 4 shows the percentage of bugs versus the percentage of code for each of the logical components across all file systems and versions. Within a plot, if a point is above the y = x line, it means that a logical component (e.g., inodes) has more than its expected share of bugs, hinting at its complexity; a point below said line indicates a component (e.g., a tree) with relatively few bugs per line of code, thus hinting at its relative ease of implementation.

We make the following observations. First, for all file systems, the file, inode, and super components have a high bug density. The file component is high in bug density either due to bugs on the fsync path (Ext3) or custom file I/O routines added for higher performance (XFS, Ext4, ReiserFS, JFS), particularly so for XFS, which has a custom buffer cache and I/O manager for scalability [46]. The inode and superblock are core metadata structures with rich and important information for files and file systems, which are widely accessed and updated; thus, it is perhaps unsurprising that a large number of bugs arise therein (e.g., forgetting to update a time field in an inode, or not properly using a superblock configuration flag).

Second, transactional code represents a substantial percentage of each code base (as shown by the relatively high x-axis values) and, for most file systems, has a proportional amount of bugs. This relationship holds for Ext3 as well, even though Ext3 uses a separate journaling module (JBD); Ext4 (with JBD2) has a slightly lower percentage of bugs because it was built upon a more stable JBD from Linux 2.6.19. In summary, transactions continue to be a double-edged sword in file systems: while transactions improve data consistency in the presence of crashes, they often add many bugs due to their large code bases.

Third, the percentage of bugs in tree components of XFS, Btrfs, ReiserFS, and JFS is surprisingly small compared to code size. One reason may be the care taken to implement such trees (e.g., the tree code is the only portion of ReiserFS filled with assertions). File systems should be encouraged to use appropriate data structures, even if they are complex, because they do not induce an inordinate amount of bugs.

Although bug patches also relate to feature patches, it is difficult to correlate them precisely. Code changes partly or totally overlap each other over time. A bug patch may involve both old code and recent feature patches.

Summary: The file, inode, and superblock components contain a disproportionally large number of bugs; transactional code is large and has a proportionate number of bugs; tree structures are not particularly error-prone, and should be used when needed without much worry.

4.2 Bug Patterns

To build a more reliable file system, it is important to understand the type of bugs that are most prevalent and the typical patterns across file systems. Since different types of bugs require different approaches to detect and fix, these fine-grained bug patterns provide useful information to developers and tool builders alike.

Table 3: Bug Pattern Classification. This table shows the classification and definition of file-system bugs.

  Semantic
    State         - Incorrectly update or check file-system state
    Logic         - Wrong algorithm/assumption/implementation
    Config        - Missed configuration
    I/O Timing    - Wrong I/O requests order
    Generic       - Generic semantic bugs: wrong type, typo
  Concurrency
    Atomicity     - The atomic property for accesses is violated
    Order         - The order of multiple accesses is violated
    Deadlock      - Deadlock due to wrong locking order
    Miss unlock   - Miss a paired unlock
    Double unlock - Unlock twice
    Wrong lock    - Use the wrong lock
  Memory
    Resource leak - Fail to release memory resource
    Null pointer  - Dereference null pointer
    Dangling Pt   - Dereference freed memory
    Uninit read   - Read uninitialized variables
    Double free   - Free memory pointer twice
    Buf overflow  - Overrun a buffer boundary
  Error Code
    Miss Error    - Error code is not returned or checked
    Wrong Error   - Return or check wrong error code


We partition file-system bugs into four categories based on their root causes, as shown in Table 3. The four major categories are semantic [26, 44], concurrency [16, 28], memory [14, 26, 44], and error code bugs [19, 40].

Figure 2(b) (page 4) shows the total number and percentage of each type of bug across file systems. There are about 1800 total bugs, providing a great opportunity to explore bug patterns at scale. Semantic bugs dominate other types (except for ReiserFS). Most semantic bugs require file-system domain knowledge to understand, detect, and fix; generic bug-finding tools (e.g., Coverity [9]) may have a hard time finding these bugs. Concurrency bugs account for about 20% on average across file systems (except for ReiserFS), providing a stark contrast to user-level software where fewer than 3% of bugs are concurrency-related [26, 42, 45]. ReiserFS stands out along these measures because of its transition, in Linux 2.6.33, away from the Big Kernel Lock (BKL), which introduced a large number of concurrency bugs. There are also a fair number of memory-related bugs in all file systems; their percentages are lower than those reported in user-level software [26, 45]. Many research and commercial tools have been developed to detect memory bugs [9, 35], and some of them are used to detect file-system bugs. Error code bugs account for only 10% of total bugs.

Summary: Beyond maintenance, bug fixes are the most common patch type; over half of file-system bugs are semantic bugs, likely requiring domain knowledge to find and fix; file systems have a higher percentage of concurrency bugs compared with user-level software; memory and error code bugs arise but in smaller percentages.

Figure 5: Bug Pattern Evolution. This figure shows the bug pattern evolution for each file system over all versions.

4.3 Bug Trends

File systems mature from the initial development stage to the stable stage over time, by applying bug-fixing, performance and reliability patches. Various bug detection and testing tools are also proposed to improve file-system stability. A natural question arises: do file-system bug patterns change over time, and in what way?

Our results (Figure 5) show that within bugs, the relative percentage of semantic, concurrency, memory, and error code bugs varies over time, but does not converge; a great example is XFS, which under constant development goes through various cycles of higher and lower numbers of bugs. Interesting exceptions occasionally arise (e.g., the BKL removal from ReiserFS led to a large increase in concurrency bugs in 2.6.33). JFS does experience a decline in bug patches, perhaps due to its decreasing usage and development [6]. JFS and ReiserFS both have relatively small developer and user bases compared to the more active file systems XFS, Ext4 and Btrfs.

Summary: Bug patterns do not change significantly over time, increasing and decreasing cyclically; large deviations arise due to major structural changes.

4.4 Bug Consequences

As shown in Figure 2(b) (on page 4), there are a significant number of bugs in file systems. But how serious are these file-system bugs? We now categorize each bug by impact; such bug consequences include severe ones (data corruption, system crashes, unexpected errors, deadlocks, system hangs and resource leaks), and other wrong behaviors. Table 4 provides more detail on these categories.

Table 4: Bug Consequence Classification. This table shows the definitions of various bug consequences.

  Corruption - On-disk or in-memory data structures are corrupted (e.g., file data or metadata corruption, wrong statistics)
  Crash      - File system becomes unusable (e.g., dereference null pointer, assertion failures, panics)
  Error      - Operation failure or unexpected error code returned (e.g., failed write operation due to ENOSPC error)
  Deadlock   - Wait for resources in circular chain
  Hang       - File system makes no progress (e.g., infinite loop, live lock)
  Leak       - System resources are not freed after usage (e.g., forget to free allocated file-system objects)
  Wrong      - Diverts from expectation, excluding the above ones (e.g., undefined behavior, security vulnerability)

Figure 6: Bug Consequences. This figure displays the breakdown of bug consequences (a) by file system and (b) by bug pattern. The total number of consequences is shown on top of each bar. A single bug may cause multiple consequences; thus, the number of consequence instances is slightly higher than that of bugs in Figure 2(b).

Figure 6(a) shows the per-system breakdowns. Data corruption is the most predominant consequence (40%), even for well-tested and mature file systems. Crashes account for the second largest percentage (20%); most crashes are caused by explicit calls to BUG() or Assert() as well as null-pointer dereferences. If the patch mentions that the crash also causes corruption, then we classify this bug with multiple consequences. Unexpected errors and deadlocks occur quite frequently (just under 10% each on average), whereas other bug consequences arise less often. For example, exhibiting the wrong behavior without more serious consequences accounts for only 5-10% of consequences in file systems, whereas it is dominant in user applications [26].

Given that file-system bugs are serious bugs, we were curious: do certain bug types (e.g., semantic, concurrency, memory, or error code) exhibit different levels of severity? Figure 6(b) shows the relationship between consequences and bug patterns. Semantic bugs lead to a large percentage of corruptions, crashes, errors, hangs, and wrong behaviors. Concurrency bugs are responsible for nearly all deadlocks (almost by definition) and a fair percentage of corruptions and hangs. Memory bugs lead to many memory leaks (as expected) and a fair amount of crashes. Finally, error code bugs lead to a relatively small percentage of corruptions, crashes, and (unsurprisingly) errors.

Summary: File system bugs cause severe consequences; corruptions and crashes are most common; wrong behavior is uncommon; semantic bugs can lead to significant amounts of corruptions, crashes, errors, and hangs; all bug types have severe consequences.

4.5 Bug Pattern Examples and Analysis

To gain further insight into the different classes of bugs, we now describe each class in more detail. We present examples of each and further break down each major class (e.g., memory bugs) into smaller sub-classes (e.g., leaks, null-pointer dereferences, dangling pointers, uninitialized reads, double frees, and buffer overflows).

4.5.1 Semantic Bugs

Semantic bugs are dominant in file systems, as shown in Figure 2(b). Understanding these bugs often requires file-system domain knowledge. Semantic bugs usually are difficult to categorize in an informative and general way. However, we are the first to identify several common types of file-system-specific semantic bugs based on extensive analysis and careful generalization of many semantic bugs across file systems. These common types and typical patterns provide useful guidelines for analysis and detection of file-system semantic bugs. We partition the semantic bugs into five categories as described in Table 3, including state, logic, config, I/O timing and generic. Figure 7(a) shows the percentage breakdown and total number of semantic bugs; each is explained in detail below.

File systems maintain a large amount of in-memory and on-disk state. Generally, operations transform the file system from one consistent state to another; a mistaken state update or access may lead to serious consequences. As shown in Figure 7(a), these state bugs contribute to roughly 40% of semantic bugs. An example of a state bug is shown in S1 of Table 5 (on page 9), which misses an inode-field update. Specifically, the buggy version of ext3_rename() does not update the mtime and ctime of the directory into which the file is moved, leaving metadata in an incorrect state.

There are also numerous logic bugs, which arise via the use of wrong algorithms, bad assumptions, and incorrect implementations. An example of a wrong algorithm is shown in S2 of Table 5: find_group_other() tries to find a block group for inode allocation, but does not check all candidate groups; the result is a possible ENOSPC error even when the file system has free inodes.

File system behavior is also affected by various configuration parameters, such as mount options and special hardware support. Unfortunately, file systems often forget or misuse such configuration information (about 10% to 15% of semantic bugs are of this flavor). A semantic configuration bug is shown in S3 of Table 5; when Ext4 loads the journal from disk, it forgets to check if the device is read-only before updating the on-disk superblock.

Correct I/O request ordering is critical for crash consistency in file systems. The I/O timing category contains bugs involving incorrect I/O ordering. For example, in ordered journal mode, a bug may flush metadata to disk before the related data blocks are persisted.

Figure 7: Detailed Bug Patterns. The detailed classification for each bug pattern: (a) semantic bugs, (b) concurrency bugs, (c) memory bugs, and (d) error code bugs. The total number of bugs is shown on top of each bar.

We found that only a small percentage of semantic bugs (3-9%) are I/O timing bugs; however, these bugs can lead to potential data loss or corruption.

A fair amount of generic bugs also exist in all file systems, such as using the wrong variable type or simple typos. These bugs are general coding mistakes (such as comparing an unsigned variable with zero [48]), and may be fixed without much file-system knowledge.

Summary: Incorrect state update and logic mistakes dominate semantic bug patterns; configuration errors are also not uncommon; incorrect I/O orderings are rare (but can have serious consequences); generic bugs require the least file-system knowledge to understand.

4.5.2 Concurrency Bugs

Concurrency bugs have attracted a fair amount of attention in the research community as of late [16, 22, 28, 49, 50]. To better understand file-system concurrency bugs, we classify them into six types as shown in Table 3 (on page 5): atomicity violations, deadlocks, order violations, missed unlocks, double unlocks, and wrong locks. Figure 7(b) shows the percentage and total number of each category of concurrency bugs.

Atomicity violation bugs are usually caused by a lack of proper synchronization methods to ensure exclusive data access, often leading to data corruption. An example of an atomicity violation bug in Ext4 is shown in C1 of Table 5. For this bug, when two CPUs simultaneously allocate blocks, there is no protection for the i_cached_extent structure; this atomicity violation could thus cause the wrong location on disk to be read or written. A simple spin-lock resolves the bug.

There are a large number of deadlocks in file systems (about 40%). Two typical causes are the use of the wrong kernel memory allocation flag and calling a blocking function when holding a spin lock. These patterns are not common in application-level deadlocks, and thus are useful to both developers (who should be wary of such patterns) and tool builders (who should detect them).

Many deadlocks are found in ReiserFS, once again due to the BKL. The BKL could be acquired recursively; replacing it introduced a multitude of locking violations, many of which led to deadlock. A typical memory-related deadlock is shown in C2 of Table 5. Btrfs uses extent_readpages() to read free space information; however, it should not use the GFP_KERNEL flag to allocate memory, since the VM memory allocator kswapd will recursively call into file-system code to free memory. The fix changes the flag to GFP_NOFS to prevent VM re-entry into file-system code.

The remaining four categories account for a small percentage. Missing unlocks happen mostly in exit or failure paths (e.g., putting resource releases at the end of functions with goto statements). C3 of Table 5 shows a missing-unlock bug. ext3_group_add() locks the super block (line 1) but forgets to unlock on an error (line 4).

Summary: Concurrency bugs are much more common in file systems than in user-level software. Atomicity and deadlock bugs represent a significant majority of concurrency bugs; many deadlock bugs are caused by wrong kernel memory-allocation flags; most missing unlocks happen on exit or failure paths.
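To make the allocation-flag pattern above concrete, here is a minimal sketch of ours (not drawn from the patch dataset) of why file-system paths allocate with GFP_NOFS rather than GFP_KERNEL:

  /* Sketch (illustrative only): a GFP_KERNEL allocation may trigger memory
   * reclaim, and reclaim can re-enter file-system code (e.g., to write back
   * dirty pages) while this path already holds file-system locks, which can
   * deadlock. GFP_NOFS tells the allocator not to recurse into the file
   * system, which is the fix applied in C2 of Table 5. */
  #include <linux/gfp.h>
  #include <linux/slab.h>

  static void *fs_alloc_under_fs_lock(size_t size)
  {
          return kmalloc(size, GFP_NOFS);  /* safe under file-system locks */
  }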

4.5.3 Memory Bugs

Memory-related bugs are common in many source bases, and not surprisingly have been the focus of many bug detection tools [9, 35]. We classify memory bugs into six categories, as shown in Table 3: resource leaks, null pointer dereferences, dangling pointers, uninitialized reads, double frees, and buffer overflows.

Resource leaks are the most dominant, over 40% in aggregate; in contrast, studies of user-level programs show notably lower percentages [26, 42, 45]. We find that roughly 70% of resource leaks happen on exit or failure paths; we investigate this further later (§4.6). An example of a resource leak (M1 of Table 5) is found in btrfs_new_inode(), which allocates an inode but forgets to free it upon failure.


Table 5: Code Examples. This table shows the code examples of bug patterns and performance patches.

  ext3/namei.c, 2.6.26 - Semantic (S1)
    1   ext3_rename(...){
    2 +   new_dir->i_ctime = CURRENT_TIME_SEC;
    3 +   new_dir->i_mtime = CURRENT_TIME_SEC;
    4 +   ext3_mark_inode_dirty(handle, new_dir);

  ext3/ialloc.c, 2.6.4 - Semantic (S2)
    1   find_group_other(...){
    2     group = parent_group + 1;
    3     for (i = 2; i < ngroups; i++) {
    4 +   group = parent_group;
    5 +   for (i = 0; i < ngroups; i++) {

  ext4/super.c, 2.6.37 - Semantic (S3)
    1   ext4_load_journal(...){
    2     if (journal_devnum && ...)
    3 +   if (!read_only && journal_devnum ...)
    4       es->s_journal_dev = devnum;

  ext4/extents.c, 2.6.30 - Concurrency (C1)
    1   ext4_ext_put_in_cache(...){
    2 +   spin_lock(i_block_reservation_lock);
    3     cex = &EXT4_I(inode)->i_cached_extent;
    4-6   cex->ec_FOO = FOO; // elided for brevity
    7 +   spin_unlock(i_block_reservation_lock);

  btrfs/extent_io.c, 2.6.39 - Concurrency (C2)
    1   extent_readpages(...){
    2     if (!add_to_page_cache_lru(page, mapping,
    3           page->index, GFP_KERNEL)) {
    4 +         page->index, GFP_NOFS)) {
    5       extent_read_full_page(...);

  ext3/resize.c, 2.6.17 - Concurrency (C3)
    1   lock_super(sb);
    2   if (input->group != sbi->s_groups_count){
    3     ... ...
    4 +   unlock_super(sb);
    5     err = -EBUSY;
    6     goto exit_journal;

  btrfs/inode.c, 2.6.30 - Memory (M1)
    1   btrfs_new_inode(...){
    2     inode = new_inode(...);
    3     ret = btrfs_set_inode_index(...);
    4     if (ret)
    5       return ERR_PTR(ret);
    6 +   if (ret) {
    7 +     iput(inode); return ERR_PTR(ret);
    8 +   }

  ext3/super.c, 2.6.7 - Memory (M2)
    1   ext3_get_journal(...){
    2     if (!journal) {
    3       ... ...
    4 +     return NULL;
    5     }
    6     journal->j_private = sb;

  reiserfs/xattr_acl.c, 2.6.16 - Error Code (E1)
    1   reiserfs_get_acl(...){
    2     acl = posix_acl_from_disk(...);
    3     *p_acl = posix_acl_dup(acl);
    4 +   if (!IS_ERR(acl))
    5 +     *p_acl = posix_acl_dup(acl);

  jfs/jfs_imap.c, 2.6.27 - Error Code (E2)
    1   diAlloc(...){
    2     jfs_error(...);
    3     return EIO;
    4 +   return -EIO;

  ext4/extents.c, 2.6.31 - Performance (P1)
    1   ext4_fiemap(...){
    2     down_write(&EXT4_I(inode)->i_data_sem);
    3 +   down_read(&EXT4_I(inode)->i_data_sem);
    4     error = ext4_ext_walk_space(...);
    5     up_write(&EXT4_I(inode)->i_data_sem);
    6 +   up_read(&EXT4_I(inode)->i_data_sem);

  btrfs/free-space-cache.c, 2.6.39 - Performance (P2)
    1   btrfs_find_space_cluster(...){
    2 +   if (bg->free_space < min_bytes){
    3 +     spin_unlock(&bg->tree_lock);
    4 +     return -ENOSPC;
    5 +   }
    6     /* start to search for blocks */

As we see in Figure 7(c), null-pointer dereferences are also common in both mature and young file systems (the remaining memory bugs account for small percentages). An example is shown in M2 of Table 5; a return statement is missing, leading to a null-pointer dereference.

Summary: Resource leaks are the largest category of memory bug, significantly higher than that in user-level applications; null-pointer dereferences are also common; failure paths contribute strongly to these bugs; many of these bugs have simple fixes.
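Because so many leaks follow the same shape, a small sketch of ours (identifiers are illustrative, patterned loosely after M1) shows the failure-path cleanup idiom that avoids them:

  /* Illustrative only: allocate two resources, release them in reverse
   * order on failure. Forgetting either goto target is exactly the
   * "leak on failure path" pattern the study measures. */
  #include <linux/err.h>
  #include <linux/fs.h>
  #include <linux/slab.h>

  struct demo_ctx {
          struct inode *inode;
          void *buf;
  };

  static struct demo_ctx *demo_ctx_create(struct super_block *sb)
  {
          struct demo_ctx *ctx = kzalloc(sizeof(*ctx), GFP_NOFS);

          if (!ctx)
                  return ERR_PTR(-ENOMEM);

          ctx->inode = new_inode(sb);
          if (!ctx->inode)
                  goto out_free;          /* without this, ctx leaks */

          ctx->buf = kmalloc(4096, GFP_NOFS);
          if (!ctx->buf)
                  goto out_iput;          /* without this, the inode leaks */

          return ctx;

  out_iput:
          iput(ctx->inode);
  out_free:
          kfree(ctx);
          return ERR_PTR(-ENOMEM);
  }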

4.5.4 Error Code Bugs

File systems need to handle a wide range of errors, including memory-allocation failures, disk-block allocation failures, I/O failures [7, 8], and silent data corruption [37]. Handling such faults, and passing error codes through a complex code base, has proven challenging [19, 40]. Here, we further break down error-code errors.

We partition the error code bugs into missing error codes and wrong error codes, as described in Table 3. Figure 7(d) shows the breakdown of error code bugs. Missing errors are generally twice as prevalent as wrong errors (except for JFS, which has few of these bugs overall).

A missing error code example is shown in E1 of Table 5. The routine posix_acl_from_disk() could return an error code (line 2). However, without error checking, acl is accessed and thus the kernel crashes (line 3). An example of a wrong error code is shown in E2 of Table 5. diAlloc()'s return value should be -EIO. However, in line 3, the original code returns the close (but wrong) error code EIO; callers thus fail to detect the error.

Summary: Error handling bugs occur in two flavors, missing error handling or incorrect error handling; the bugs are relatively simple in nature.
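A minimal illustration of the convention behind the E2 fix (a sketch of ours, not taken from the dataset):

  /* Kernel convention: failures are reported as negative errno values.
   * Returning EIO (positive) instead of -EIO, as in E2, means callers that
   * test for "ret < 0" never see the failure. */
  #include <linux/errno.h>

  static int demo_read_block(int io_ok)
  {
          if (!io_ok)
                  return -EIO;    /* correct: negative errno */
          return 0;               /* success */
  }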

4.6 The Failure Path

Many bugs we found arose not in common-case code paths but rather in more unusual fault-handling cases [19, 52]. This type of error handling (i.e., reacting to disk or memory failures) is critical to robustness, since bugs on failure paths can lead to serious consequences.

We now quantify bug occurrences on failure paths; Tables 6 (a) and (b) present our accumulated results. As we can see from the first table, roughly a third of bugs are introduced on failure paths across all file systems. Even mature file systems such as Ext3 and XFS make a significant number of mistakes on these rarer code paths. When broken down by bug type in the second table, we see that roughly a quarter of semantic bugs occur on failure paths, usually in the previously-defined state and logic categories.


Table 6: Failure Related Bugs. This table shows the number and percentage of the bugs related to failures in file systems.

  (a) By File System
    XFS 200 (39.1%)   Ext4 149 (33.1%)   Btrfs 144 (40.2%)
    Ext3 88 (38.4%)   ReiserFS 63 (39.9%)   JFS 28 (35%)
  (b) By Bug Pattern
    Semantic 283 (27.7%)   Concurrency 93 (25.4%)   Memory 117 (53.4%)   Error Code 179 (100%)

Once a failure happens (e.g., an I/O fails), the file system needs to free allocated disk resources and update related metadata properly; however, it is easy to forget these updates, or perform them incorrectly, leading to many state bugs. In addition, wrong algorithms (logic bugs) are common; for example, when block allocation fails, most file systems return ENOSPC immediately instead of retrying after committing buffered transactions.

A quarter of concurrency bugs arise on failure paths. Sometimes, file systems forget to unlock locks, resulting in deadlock. Moreover, when file systems output errors to users, they sometimes forget to unlock before calling blocking error-output functions (deadlock). These types of mistakes rarely arise in user-level code [28]. For memory bugs, most resource-leak bugs stem from forgetting to release allocated resources when I/O or other failures happen. There are also numerous null-pointer dereference bugs which incorrectly assume certain pointers are still valid after a failure. Finally (and obviously), all error code bugs occur on failure paths (by definition).

It is difficult to fully test failure-handling paths to find all types of bugs. Most previous work has focused on memory resource leaks [41, 52], missing unlock [41, 52] and error codes [19, 40]; however, existing work can only detect a small portion of failure-handling errors, especially omitting a large amount of semantic bugs on failure paths. Our results provide strong motivation for improving the quality of failure-handling code in file systems.

Summary: A high fraction of bugs occur due to improper behavior in the presence of failures or errors across all file systems; memory-related errors are particularly common along these rarely-executed code paths; a quarter of semantic bugs are found on failure paths.

5 Performance and Reliability

A small but important set of patches improves performance and reliability; these patches are quantitatively different from bug patches (Figure 3). Performance and reliability patches account for 8% and 7% of patches respectively.

5.1 Performance Patches

Performance is critical for all file systems. Performance patches are proposed to improve existing designs or implementations. We partition these patches into six categories as shown in Table 7, including synchronization (sync), access optimization (access), scheduling (sched), scalability (scale), locality (locality), and other. Figure 8(a) shows the breakdown.

Table 7: Performance Patch Type. This table shows the classification and definition of performance patches.

  Synchronization     - Inefficient usage of synchronization methods (e.g., removing unnecessary locks, using smaller locks, using read/write locks)
  Access Optimization - Apply smarter access strategies (e.g., caching metadata and statistics, avoiding unnecessary I/O and computing)
  Schedule            - Improve I/O operations scheduling (e.g., batching writes, opportunistic readahead)
  Scalability         - Scale on-disk and in-memory data structures (e.g., using trees or hash tables, per block group structures, reducing memory usage of inodes)
  Locality            - Overcome sub-optimal data block allocations (e.g., reducing file fragmentation, clustered I/Os)
  Other               - Other performance improvement techniques (e.g., reducing function stack usage)

Synchronization-based performance improvements account for over a quarter of all performance patches across file systems. Typical solutions used include removing a pair of unnecessary locks, using finer-grained locking, and replacing write locks with read/write locks. A sync patch is shown in P1 of Table 5; ext4_fiemap() uses write instead of read semaphores, limiting concurrency.

Access patches use smarter strategies to optimize performance, including caching and work avoidance. For example, Ext3 caches metadata stats in memory, avoiding I/O. Figure 8(a) shows access patches are popular. An example Btrfs access patch is shown in P2 of Table 5; before searching for free blocks, it first checks whether there is enough free space, avoiding unnecessary work.

Sched patches improve I/O scheduling for better performance, such as batching of writes, opportunistic readahead, and avoiding unnecessary synchrony in I/O. As can be seen, sched has a similar percentage compared to sync and access. Scale patches utilize scalable on-disk and in-memory data structures, such as hash tables, trees, and per block-group structures. XFS has a large number of scale patches, as scalability was always its priority.

Summary: Performance patches exist in all file systems; sync, access, and sched each account for a quarter of the total; many of the techniques used are fairly standard (e.g., removing locks); while studying new synchronization primitives, we should not forget about performance.
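The P1-style synchronization fix can be illustrated with a short sketch of ours; the lock name is illustrative, while the real patch operates on ext4's i_data_sem:

  /* Read-only traversal only needs the semaphore in shared (read) mode;
   * taking it for writing, as the pre-patch ext4_fiemap() did, needlessly
   * serializes concurrent readers. */
  #include <linux/rwsem.h>

  static DECLARE_RWSEM(demo_data_sem);

  static void demo_walk_extents_readonly(void)
  {
          down_read(&demo_data_sem);    /* shared: many readers may proceed */
          /* ... read-only extent walk ... */
          up_read(&demo_data_sem);
  }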

5.2 Reliability Patches

Finally we study our last class of patch, those that aim to improve file-system reliability. Different from bug-fix patches, reliability patches are not utilized for correctness. Rather, for example, such a patch may check whether the super block is corrupted before mounting the file system; further, a reliability patch might enhance error propagation [19] or add more debugging information. Table 8 presents the classification of these reliability patches, including adding assertions and other functional robustness (robust), corruption defense (corruption), error enhancement (error), annotation (annotation), and debugging (debug). Figure 8(b) displays the distributions.

Table 8: Reliability Patch Type. This table shows the classification and definition of reliability patches.

  Robust              - Enhance file-system robustness (e.g., boundary limits and access permission checking, additional internal assertions)
  Corruption Defense  - Improve file systems' ability to handle various possible corruptions
  Error Enhancement   - Improve original error handling (e.g., gracefully handling failures, more detailed error codes)
  Annotation          - Add endianness, user/kernel space pointer and lock annotations for early bug detection
  Debug               - Add more internal debugging or tracing support

Figure 8: Performance and Reliability Patches. This figure shows the (a) performance and (b) reliability patterns. The total number of patches is shown on top of each bar.

Robust patches check permissions, enforce file-system limits, and handle extreme cases in a more friendly manner. Btrfs has the largest percentage of these patches, likely due to its early stage of development.

Corruption defense patches validate the integrity of metadata when reading from disk. For example, a patch to the JBD (used by Ext3) checks that the journal length is valid before performing recovery; similarly, a patch to Ext4 checks that a directory entry is valid before traversing that directory. In general, many corruption patches are found at the I/O boundary, when reading from disk.

Error enhancement patches improve error handling in a variety of ways, such as more detail in error codes, removing unnecessary error messages, and improving availability, for example by remounting read-only instead of crashing. This last class is common in all file systems, which each slowly replaced unnecessary BUG() and assertion statements with more graceful error handling.

Annotation patches label variables with additional type information (e.g., endianness) and locking rules to enable better static checking. ReiserFS uses lock annotations to help prevent deadlock, whereas XFS uses endianness annotations for numerous variable types. Debug patches simply add more diagnostic information at failure-handling points within the file system.

Interestingly, reliability patches appear more ad hoc than bug patches. For bug patches, most file systems have similar pattern breakdowns. In contrast, file systems make different choices for reliability, and do so in a generally non-uniform manner. For example, Btrfs focuses more on Robust patches, while Ext3 and Ext4 prefer to add more Corruption defense patches.

Summary: We find that reliability patches are added to file systems over time as part of hardening; most add simple checks, defend against corruption upon reading from disk, or improve availability by returning errors instead of crashing; annotations help find problems at compile time; debug patches add diagnostic information; reliability patch usage, across all file systems, seems ad hoc.
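A sketch of ours (the structure and magic value are invented for illustration) of the kind of corruption-defense check such patches add at the I/O boundary:

  /* Validate on-disk metadata before trusting it: reject a superblock whose
   * magic number or geometry is impossible, instead of crashing later. */
  #include <linux/errno.h>
  #include <linux/types.h>

  #define DEMO_SUPER_MAGIC 0xF5DEu        /* illustrative value */

  struct demo_super {
          u32 magic;               /* assume already converted from disk endianness */
          u32 block_count;
          u32 first_data_block;
  };

  static int demo_check_super(const struct demo_super *es)
  {
          if (es->magic != DEMO_SUPER_MAGIC)
                  return -EINVAL;  /* not our file system */
          if (es->first_data_block >= es->block_count)
                  return -EIO;     /* impossible geometry: likely corruption */
          return 0;
  }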

6 Case Study Using PatchDB

The patch dataset constructed from our analysis of 5079 patches contains fine-grained information, including characterization of bug patterns (e.g., which semantic bugs forget to synchronize data), detailed bug consequences (e.g., crashes caused by assertion failures or null-pointer dereferences), incorrect bug fixes (e.g., patches that are reverted after being accepted), performance techniques (e.g., how many performance patches remove unnecessary locks), and reliability enhancements (e.g., the location of metadata integrity checks). These details enable further study to improve file-system designs, propose new system language constructs, build custom bug-detection tools, and perform realistic fault injection.

In this section, we show the utility of PatchDB by examining which patches are common across all file systems. Due to space concerns, we only highlight a few interesting cases. A summary is found in Table 9.

We first discuss specific common bugs. Within semantic bugs is forget sync, in which a file system forgets to force data or metadata to disk. Most forget sync bugs relate to fsync. Even for stable file systems, there are a noticeable number of these bugs, leading to data loss or corruption under power failures. Another common mistake is forget config, in which mount options, feature sets, or hardware support are overlooked. File systems also return the ENOSPC error code despite the presence of free blocks (early enospc); Btrfs has the largest number of these bugs, and even refers to the Ext3 fix strategy in its patches. Even though semantic bugs are dominant in file systems, few tools can detect semantic bugs due to the difficulty of specifying correct behavior [15, 25, 27]. Fortunately, we find that many semantic bugs appear across file systems, which can be leveraged to improve bug detection.

For concurrency bugs, forgetting to lock an inode when updating it is common; perhaps a form of monitors [20] would help.


For memory bugs, leaks happen on failure or exit paths frequently. For error code bugs, there are a large number of missed I/O error bugs. For example, Ext3, JFS, ReiserFS and XFS all ignore write I/O errors on fsync before Linux 2.6.9 [37]; as a result, data could be lost even when fsync returned successfully. Memory allocation errors are also often ignored (especially in Btrfs). Three file systems mistakenly dereference error codes.

For performance patches, removing locks (without sacrificing correctness) is common. File systems also tend to write redundant data (e.g., fdatasync unnecessarily flushes metadata). Another common performance improvement case is check before work, in which missing condition checks cost unnecessary I/O or CPU overhead.

Finally, for reliability patches, metadata validation (i.e., of inodes, super blocks, directories, and the journal) is popular. Most of these patches occur in similar places (e.g., when mounting the file system, recovering from the journal, or reading an inode). Also common is replacing BUG() and Assert() calls with more graceful error handling.

Summary: Despite their diversity, file-system patches share many similarities across implementations; some examples occur quite frequently; PatchDB affords new opportunities to study such phenomena in great detail.

[Table 9: Common File System Patches. This table shows the classification and count of common patches across all file systems. Rows are grouped by patch type with their typical cases: Semantic (forget sync, forget config, early enospc, wrong log credit); Concurrency (lock inode update, lock sleep, wrong kmalloc flag, miss unlock); Memory (leak on failure, leak on exit); Error Code (miss I/O error, miss mem error, bad error access); Performance (remove lock, avoid redun write, check before work, save struct mem); Reliability (metadata validation, graceful handle). Counts are reported per file system: XFS, Ext4, Btrfs, Ext3, Reiser, and JFS.]
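To ground the miss I/O error and leak on failure patterns counted in Table 9, the sketch below (hypothetical sketchfs names; filemap_write_and_wait() is the standard kernel writeback helper) contrasts a buggy fsync-style path that drops a writeback error and leaks a buffer on failure with the conventional goto-based unwinding fix.

    /*
     * Hypothetical sketch of two Table 9 patterns: "miss I/O error"
     * (a writeback error is dropped) and "leak on failure" (an allocation
     * is not freed on an error path).
     */
    #include <linux/fs.h>
    #include <linux/pagemap.h>
    #include <linux/slab.h>
    #include <linux/errno.h>

    /* Hypothetical journal-flush helper; stands in for real commit logic. */
    static int sketchfs_flush_journal(void *scratch)
    {
            (void)scratch;
            return 0;
    }

    /* Buggy shape: the writeback result is ignored, and the scratch buffer
     * leaks if the journal flush fails. */
    static int sketchfs_fsync_buggy(struct file *file)
    {
            void *scratch = kmalloc(4096, GFP_KERNEL);

            if (!scratch)
                    return -ENOMEM;

            filemap_write_and_wait(file->f_mapping); /* BUG: error ignored      */
            if (sketchfs_flush_journal(scratch))     /* BUG: scratch leaks here */
                    return -EIO;

            kfree(scratch);
            return 0;
    }

    /* Fixed shape: every failure is propagated and unwound. */
    static int sketchfs_fsync_fixed(struct file *file)
    {
            void *scratch;
            int err;

            scratch = kmalloc(4096, GFP_KERNEL);
            if (!scratch)
                    return -ENOMEM;

            err = filemap_write_and_wait(file->f_mapping);
            if (err)
                    goto out;

            err = sketchfs_flush_journal(scratch);
    out:
            kfree(scratch);
            return err;
    }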

7 Related Work

Operating-System Bugs: Faults in Linux have been studied [14, 36]. Static analysis tools are used to find potential bugs in Linux 1.0 to 2.4.1 [14] and Linux 2.6.0 to 2.6.33 [36]. Most detected faults are generic memory and concurrency bugs. Both studies find that device drivers contain the most faults, while Palix et al. [36] also show that file-system errors are rising. Yin et al. [53] analyze incorrect bug-fixes in several operating systems. Our work embellishes these studies, focusing on all file-system bugs found and fixed over eight years and providing more detail on which bugs plague file systems.

User-Level Bugs: Various aspects of modern user-level open source software bugs have also been studied, including patterns, impacts, reproducibility, and fixes [16, 26, 28, 42, 50]. As our findings show, file-system bugs display different characteristics compared with user-level software bugs, both in their patterns and consequences (e.g., file-system bugs have more serious consequences than user-level bugs, and concurrency bugs are much more common). One other major difference is scale; the number of bugs (about 1800) we study is larger than in previous efforts [16, 26, 28, 42, 50].

File-System Bugs: Several research projects have been proposed to detect and analyze file-system bugs. For example, Yang et al. [51, 52] use model checking to detect file-system errors; Gunawi et al. [19] use static analysis techniques to determine how error codes are propagated in file systems; Rubio-Gonzalez et al. [40] utilize static analysis to detect similar problems; Prabhakaran et al. [37] study how file systems handle injected failures and corruptions. Our work complements this work with insights on bug patterns and root causes. Further, our public bug dataset provides useful hints and patterns to aid in the development of new file-system bug-detection tools.

8 Conclusions

We performed a comprehensive study of 5079 patches across six Linux file systems; our analysis includes one of the largest studies of bugs to date (nearly 1800 bugs). Our observations, summarized in the introduction and throughout, should be of utility to file-system developers, systems-language designers, and tool makers; the careful study of these results should result in a new generation of more robust, reliable, and performant file systems.

Acknowledgments

We thank Ric Wheeler (our shepherd) and the anonymous reviewers for their excellent feedback and suggestions. We also thank the members of the ADSL research group for their insightful comments. This material is based upon work supported by the National Science Foundation under the following grants: CNS-1218405, CCF-0937959, CSR-1017518, and CCF-1016924, as well as generous support from NetApp, EMC, and Google. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or other institutions.


References

[1] Coverity Scan: 2011 Open Source Integrity Report. http://www.coverity.com/library/pdf/coverity-scan2011-open-source-integrity-report.pdf. [2] First Galaxy Nexus Rom Available, Features Ext4 Support. http://androidspin.com/2011/12/06/first-galaxynexus-rom-available-features-ext4-support/. [3] Kernel Bug Tracker. http://bugzilla.kernel.org/. [4] Linux Filesystem Development List. http://marc.info/?l=linux-fsdevel. [5] Linux Kernel Mailing List. http://lkml.org/. [6] IBM Journaled File System. http://en.wikipedia.org/wiki/ JFS (file system), September 2012. [7] Lakshmi N. Bairavasundaram, Garth R. Goodson, Shankar Pasupathy, and Jiri Schindler. An Analysis of Latent Sector Errors in Disk Drives. In Proceedings of the 2007 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’07), San Diego, California, June 2007. [8] Lakshmi N. Bairavasundaram, Garth R. Goodson, Bianca Schroeder, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. An Analysis of Data Corruption in the Storage Stack. In Proceedings of the 6th USENIX Symposium on File and Storage Technologies (FAST ’08), pages 223–238, San Jose, California, February 2008. [9] Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, and Dawson Engler. A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World. Communications of the ACM, February 2010. [10] Steve Best. JFS Overview. http://jfs. sourceforge.net/project/pub/jfs.pdf, 2000. [11] Simona Boboila and Peter Desnoyers. Write Endurance in Flash Drives: Measurements and Analysis. In Proceedings of the 8th USENIX Symposium on File and Storage Technologies (FAST ’10), San Jose, California, February 2010. [12] Jeff Bonwick and Bill Moore. ZFS: The Last Word in File Systems. http://opensolaris.org/os/ community/zfs/docs/zfs_last.pdf, 2007. [13] Florian Buchholz. The structure of the Reiser file system. http://homes.cerias.purdue.edu/ ˜florian/reiser/reiserfs.php, January 2006. [14] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. An Empirical Study of Operating System Errors. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP ’01), pages 73– 88, Banff, Canada, October 2001. [15] Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf. Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP ’01), pages 57–72, Banff, Canada, October 2001. [16] Pedro Fonseca, Cheng Li, Vishal Singhal, and Rodrigo Rodrigues. A Study of the Internal and External Effects of Concurrency Bugs. In Proceedings of the International Conference on Dependable Systems and Networks (DSN ’10), Chicago, USA, June 2010. [17] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP ’03), pages 29–43, Bolton Landing, New York, October 2003. [18] L. M. Grupp, A. M. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, P. H. Siegel, and J. K. Wolf. Characterizing Flash Memory: Anomalies, Observations, and Applications. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’09), New York, New York, December 2009.

[19] Haryadi S. Gunawi, Cindy Rubio-Gonzalez, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Ben Liblit. EIO: Error Handling is Occasionally Correct. In Proceedings of the 6th USENIX Symposium on File and Storage Technologies (FAST ’08), pages 207–222, San Jose, California, February 2008. [20] C.A.R. Hoare. Monitors: An Operating System Structuring Construct. Communications of the ACM, 17(10), October 1974. [21] Steve Jobs, Bertrand Serlet, and Scott Forstall. Keynote Address. Apple World-wide Developers Conference, 2006. [22] Horatiu Jula, Daniel Tralamazza, Cristian Zamfir, and George Candea. Deadlock Immunity: Enabling Systems to Defend Against Deadlocks. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI ’08), San Diego, California, December 2008. [23] Hyojun Kim, Nitin Agrawal, and Cristian Ungureanu. Revisiting Storage for Smartphones. In Proceedings of the 10th USENIX Symposium on File and Storage Technologies (FAST ’12), San Jose, California, February 2012. [24] Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Michael Norrish, Rafal Kolanski, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: Formal Verification of an OS Kernel. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP ’09), Big Sky, Montana, October 2009. [25] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI ’04), San Francisco, California, December 2004. [26] Zhenmin Li, Lin Tan, Xuanhui Wang, Shan Lu, Yuanyuan Zhou, and Chengxiang Zhai. Have Things Changed Now? – An Empirical Study of Bug Characteristics in Modern Open Source Software. In Workshop on Architectural and System Support for Improving Software Dependability (ASID ’06), San Jose, California, October 2006. [27] Zhenmin Li and Yuanyuan Zhou. PR-Miner: Automatically Extracting Implicit Programming Rules and Detecting Violations in Large Software Code. In Proceedings of the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE ’05), Lisbon, Portugal, September 2005. [28] Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from Mistakes — A Comprehensive Study on Real World Concurrency Bug Characteristics. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIII), Seattle, Washington, March 2008. [29] Cathy Marshall. ”It’s like a fire. You just have to move on”: Rethinking Personal Digital Archiving. Keynote at FAST 2008, February 2008. [30] Chris Mason. The Btrfs Filesystem. oss.oracle. com/projects/btrfs/dist/documentation/ btrfs-ukuug.pdf, September 2007. [31] Avantika Mathur, Mingming Cao, Suparna Bhattacharya, Alex Tomas Andreas Dilge and, and Laurent Vivier. The New Ext4 filesystem: Current Status and Future Plans. In Ottawa Linux Symposium (OLS ’07), Ottawa, Canada, July 2007. [32] Marshall K. McKusick, William N. Joy, Sam J. Leffler, and Robert S. Fabry. A Fast File System for UNIX. ACM Transactions on Computer Systems, 2(3):181–197, August 1984. [33] Marshall Kirk McKusick, Willian N. Joy, Samuel J. Leffler, and Robert S. Fabry. Fsck - The UNIX File System


Check Program. Unix System Manager's Manual - 4.3 BSD Virtual VAX-11 Version, April 1986.

[34] Sean Morrissey. iOS Forensic Analysis: for iPhone, iPad, and iPod Touch. Apress, 2010.
[35] Yoann Padioleau, Julia Lawall, René Rydhof Hansen, and Gilles Muller. Documenting and Automating Collateral Evolutions in Linux Device Drivers. In Proceedings of the EuroSys Conference (EuroSys '08), Glasgow, Scotland UK, March 2008.
[36] Nicolas Palix, Gael Thomas, Suman Saha, Christophe Calves, Julia Lawall, and Gilles Muller. Faults in Linux: Ten Years Later. In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV), Newport Beach, California, March 2011.
[37] Vijayan Prabhakaran, Lakshmi N. Bairavasundaram, Nitin Agrawal, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. IRON File Systems. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP '05), pages 206-220, Brighton, United Kingdom, October 2005.
[38] Eric S. Raymond. The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. O'Reilly, October 1999.
[39] Mendel Rosenblum and John Ousterhout. The Design and Implementation of a Log-Structured File System. ACM Transactions on Computer Systems, 10(1):26-52, February 1992.
[40] Cindy Rubio-Gonzalez, Haryadi S. Gunawi, Ben Liblit, Remzi H. Arpaci-Dusseau, and Andrea C. Arpaci-Dusseau. Error Propagation Analysis for File Systems. In Proceedings of the ACM SIGPLAN 2009 Conference on Programming Language Design and Implementation (PLDI '09), Dublin, Ireland, June 2009.
[41] Suman Saha, Julia Lawall, and Gilles Muller. Finding Resource-Release Omission Faults in Linux. In Workshop on Programming Languages and Operating Systems (PLOS '11), Cascais, Portugal, October 2011.
[42] Swarup Kumar Sahoo, John Criswell, and Vikram Adve. An Empirical Study of Reported Bugs in Server Software with Implications for Automated Bug Diagnosis. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE '10), Cape Town, South Africa, May 2010.
[43] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The Hadoop Distributed File System. In Proceedings of the 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST '10), Incline Village, Nevada, May 2010.
[44] Mark Sullivan and Ram Chillarege. Software Defects and their Impact on System Availability - A Study of Field Failures in Operating Systems. In Proceedings of the 21st International Symposium on Fault-Tolerant Computing (FTCS-21), Montreal, Canada, June 1991.
[45] Mark Sullivan and Ram Chillarege. A Comparison of Software Defects in Database Management Systems and Operating Systems. In Proceedings of the 22nd International Symposium on Fault-Tolerant Computing (FTCS-22), pages 475-484, Boston, USA, July 1992.
[46] Adam Sweeney, Doug Doucette, Wei Hu, Curtis Anderson, Mike Nishimoto, and Geoff Peck. Scalability in the XFS File System. In Proceedings of the USENIX Annual Technical Conference (USENIX '96), San Diego, California, January 1996.
[47] Stephen C. Tweedie. Journaling the Linux ext2fs File System. In The Fourth Annual Linux Expo, Durham, North Carolina, May 1998.
[48] Xi Wang, Haogang Chen, Zhihao Jia, Nickolai Zeldovich, and M. Frans Kaashoek. Improving Integer Security for Systems. In Proceedings of the 10th Symposium on Operating Systems Design and Implementation (OSDI '12), Hollywood, California, October 2012.
[49] Yin Wang, Terence Kelly, Manjunath Kudlur, Stéphane Lafortune, and Scott Mahlke. Gadara: Dynamic Deadlock Avoidance for Multithreaded Programs. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI '08), San Diego, California, December 2008.
[50] Weiwei Xiong, Soyeon Park, Jiaqi Zhang, Yuanyuan Zhou, and Zhiqiang Ma. Ad Hoc Synchronization Considered Harmful. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI '10), Vancouver, Canada, December 2010.
[51] Junfeng Yang, Can Sar, and Dawson Engler. EXPLODE: A Lightweight, General System for Finding Serious Storage System Errors. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI '06), Seattle, Washington, November 2006.
[52] Junfeng Yang, Paul Twohey, Dawson Engler, and Madanlal Musuvathi. Using Model Checking to Find Serious File System Errors. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI '04), San Francisco, California, December 2004.
[53] Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram. How Do Fixes Become Bugs? - A Comprehensive Characteristic Study on Incorrect Fixes in Commercial and Open Source Operating Systems. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '11), Szeged, Hungary, September 2011.
