Linux Filesystems Performance for Databases

Author: Marshall Johns
Portland PostgreSQL Performance Pad

Selena Deckelmann [email protected] PostgreSQL Global Development Group twitter: @selenamarie

OSCON 2009

Do filesystems do what we expect?


We are volunteers.


We think you should run these tests.


We are:
• DBAs
• Sysadmins
• Performance tuners

How will this hardware perform?


How will this filesystem perform?


Why should you care about filesystem-specific performance?


Expectations


Where to start?


The Defaults.


Not addressing reliability


Very Narrow Use Case: A Relational Database


Need for periodic testing. (And we've got some hardware!)


★ Kernel differences
★ FS patch-level differences
★ Mount options
★ mkfs options

Focused on THROUGHPUT (Because that’s what people who buy large systems look for)


Later:
• Response time
• Operations per second

No, we will not be testing ZFS.

BtrFS (nope, not yet)

What do we expect?


Some conventional wisdom:


“RAID5 is the worst choice for a database.”

“LVM incurs too much overhead to use.”

“Striping doubles performance.”


“Turning off 'atime' is a big performance gain.”

“Getting rid of atime updates would give us more everyday Linux performance than all the pagecache speedups of the last 10 years, _combined_.”

“Journaling filesystems (ext3) will have worse performance than non-journaling filesystems (ext2).”

“Your read-ahead buffer is big enough.”

Now... on to the good stuff.


PostgreSQL’s Portland Performance Pad

Hosted by CommandPrompt, Inc.

Our machine:
• HP ProLiant DL380G5
• Smart Array p800
• 72GB 15,000 RPM SAS (up to 25 disks)
• 32GB RAM
• Linux: 2.6.25-gentoo-r6
*New tests being run with 2.6.28

Our machine: Chosen because of its low, low price.

Thank you, HP.

Our tests: fio
• 64 GB working set
• 8 threads
• no fadvise
• no direct i/o
• 8KB blocksize
• I/O elevator: deadline
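To confirm which I/O elevator a disk is actually using before a run, the kernel's sysfs scheduler file can be read; it lists the available schedulers and marks the active one in square brackets. A minimal sketch of parsing that text follows. The sample string is made up for illustration, and the function name is hypothetical.

```python
def active_scheduler(sysfs_text):
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


# Illustrative content of /sys/block/<device>/queue/scheduler (not captured from the test machine).
sample = "noop anticipatory [deadline] cfq"
print(active_scheduler(sample))
```

On a real system the text would come from reading the scheduler file of the device under test.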

Our tests: fio
• read (sequential, random)
• write (sequential, random)
• read-write (50/50 mix)
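The test parameters above map fairly directly onto an fio job file. The sketch below generates one; it is not the job file used for these results. Assumptions: the mount point `/mnt/test` is hypothetical, the 64 GB working set is split as 8 jobs of 8 GB each, and the 50/50 mix is modeled as a sequential mixed workload (the slides do not say whether it was sequential or random).

```python
# Options shared by every workload; None marks a flag-style option with no value.
GLOBAL = {
    "directory": "/mnt/test",  # hypothetical mount point of the filesystem under test
    "bs": "8k",                # 8KB blocksize
    "numjobs": 8,              # 8 threads
    "size": "8g",              # assumption: 8 jobs x 8 GB = 64 GB working set
    "direct": 0,               # no direct I/O
    "fadvise_hint": 0,         # no fadvise
    "group_reporting": None,
}

# (section name, fio rw= value) for the five workloads on the slide.
WORKLOADS = [
    ("seq-read", "read"),
    ("rand-read", "randread"),
    ("seq-write", "write"),
    ("rand-write", "randwrite"),
    ("mixed", "rw"),           # assumption: sequential mixed read/write
]


def render_section(name, options):
    """Render one [section] of an fio job file."""
    lines = ["[%s]" % name]
    for key, value in options.items():
        lines.append(key if value is None else "%s=%s" % (key, value))
    return "\n".join(lines)


def render_job_file():
    """Build the whole job file; stonewall makes the workloads run one after another."""
    sections = [render_section("global", GLOBAL)]
    for name, rw in WORKLOADS:
        options = {"rw": rw, "stonewall": None}
        if rw == "rw":
            options["rwmixread"] = 50  # 50/50 read-write mix
        sections.append(render_section(name, options))
    return "\n\n".join(sections) + "\n"


print(render_job_file())
```

The I/O elevator is a kernel setting rather than part of this job file, so it would be set separately before the run.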

Our stats:
• sar
• mpstat
• iostat
• vmstat
• readprofile

Our tests: Chosen because of their relevance to PostgreSQL

Filesystems Tested:
• ext2
• ext3
• jfs
• xfs
• reiserfs
• ext4 (but had trouble)

Disk configs tested:
• Single disk
• RAID-0
• RAID-1
• RAID-5
• RAID-10
• RAID-6

The Data: http://moourl.com/fsperf


Confessions:
• Results may have a high standard deviation (don’t know yet!)
• No filesystem tuning; all default create and mount options
• No software RAID or LVM (volume management) comparison for the 2.6.28 tests
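One way to find out whether the standard deviation is high is to repeat each run and summarize the spread. The sketch below does that for a list of throughput numbers. The values are made up for illustration and are not results from these tests; the function name is hypothetical.

```python
import statistics


def summarize(throughputs_mb_s):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(throughputs_mb_s)
    stdev = statistics.stdev(throughputs_mb_s)
    return mean, stdev, stdev / mean


# Made-up throughput figures (MB/s) from three repeats of the same test.
runs = [100.0, 110.0, 90.0]
mean, stdev, cv = summarize(runs)
print("mean=%.1f MB/s stdev=%.1f MB/s cv=%.1f%%" % (mean, stdev, cv * 100))
```

A coefficient of variation that is large relative to the difference between two filesystems suggests the comparison needs more repeats before drawing conclusions.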

Confessions:
• Some xfs runs had to be repeated and some ext4 runs did not complete successfully
• Only presenting throughput
• Interested in system performance for a specific application, not code performance

Confessions:
• I/O profiles don’t exhibit atime or partition alignment issues
• Disk controller firmware not at the latest version in 2.6.25 tests
• Software RAID is on top of 1-disk RAID 0 devices (HP SmartArray doesn’t have a JBOD option)

AUDIENCE PARTICIPATION Higher throughput: ext2 or ext3?


Seek bundling/batching in ext3 is better?


What if we add a disk?


AUDIENCE PARTICIPATION RAID 0 (stripe) versus RAID 1 (mirroring) performance?


What happens when we add disks to a RAID 0 (stripe) LUN?


Adding disks to a RAID 5 LUN


Only have 4 disks? What should you do?


In most cases, RAID 5 outperforms on sequential writes (xlog). Random-write throughput improves only on xfs and reiserfs.


Are software RAID and LVM slow?


The Read-ahead buffer


AUDIENCE PARTICIPATION
Readahead buffer: Default is 128 KB.
What do you think it should be?


And is there a cost to increasing the buffer that much?

http://moourl.com/readaheadconfirm


OLTP workload
• DBT-2 toolkit (fair-use derivative of TPC-C)
• Used 35 drives ultimately
• pgtune: http://pgfoundry.org/projects/pgtune/


7% improvement! :)


For more info...
• See Mark Wong’s blog: http://pugs.postgresql.org/blog/92
• Takeaway: for DBT-2, increasing checkpoint_segments had the largest impact (fewer checkpoints :)

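Why does checkpoint_segments mean fewer checkpoints? In PostgreSQL releases that have this setting, a checkpoint is forced once that many WAL segments have been written (or when checkpoint_timeout expires, whichever comes first), and a segment is 16 MB by default. A rough sketch of the arithmetic follows; the values 32 and 128 are illustrative only, not the settings used in the DBT-2 runs, and the function name is hypothetical.

```python
WAL_SEGMENT_MB = 16  # default WAL segment size


def wal_mb_between_checkpoints(checkpoint_segments):
    """WAL volume (MB) that can be written before a segment-triggered checkpoint."""
    return checkpoint_segments * WAL_SEGMENT_MB


# 3 is the default; the larger values are illustrative assumptions.
for segments in (3, 32, 128):
    print("checkpoint_segments=%d -> %d MB of WAL between checkpoints"
          % (segments, wal_mb_between_checkpoints(segments)))
```

With the default, a write-heavy OLTP load can hit the segment limit long before the timeout, so raising the setting spaces checkpoints further apart.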

Future Work
• OLTP system characterization, sizing (ongoing)
• Daily OLTP regression testing
• More presentations
• P5 - PostgreSQL Portland Performance Pad PRACTICE (done!)

MOAR Hardware?
Thanks again, HP!
MSA70, DL380 in late 2009 ??

Let’s recap...


“RAID5 is the worst choice for a database.”
Fast for sequential writes in our tests.

“LVM incurs too much overhead to use. Software RAID is slower.”
For reads, throughput is about the same, but we saw higher CPU usage.

“Turning off 'atime' is a big performance gain.”
Not in our tests. But 2-3% for “free”.
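Before collecting that 2-3%, it helps to see which filesystems are already mounted with atime updates reduced or disabled. The sketch below parses text in the format of /proc/mounts and reports the atime-related option per mount point. The sample lines and the function name are made up for illustration.

```python
def atime_settings(mounts_text):
    """Map each mount point to 'noatime', 'relatime', or 'atime' (neither option listed)."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        mode = next((m for m in ("noatime", "relatime") if m in options), "atime")
        result[mount_point] = mode
    return result


# Illustrative lines in /proc/mounts format (device, mount point, type, options, dump, pass).
sample = """/dev/sda1 / ext3 rw,relatime 0 0
/dev/sdb1 /var/lib/postgresql xfs rw,noatime 0 0
/dev/sdc1 /data ext2 rw 0 0"""

for mount_point, mode in atime_settings(sample).items():
    print(mount_point, mode)
```

On a live system the input would be the contents of /proc/mounts rather than the sample string.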

“Journaling filesystems will have worse performance than non-journaling filesystems.”
Turn the data journaling off on ext3, and you do see better performance, but there are edge cases and performance differences we could not explain.

“Striping doubles performance.”
Performance is better, but nowhere near double. Why?

“Your read-ahead buffer is big enough.”
Your read-ahead buffer IS NOT big enough. Make it 8MB.
And can we make that the default?
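How do you make it 8MB? `blockdev --setra` takes the read-ahead size in 512-byte sectors, not in bytes, so the number needs converting first. A small sketch of the conversion follows; the function name is hypothetical.

```python
SECTOR_BYTES = 512  # blockdev --setra counts 512-byte sectors


def readahead_sectors(kib):
    """Convert a read-ahead size in KiB to the sector count blockdev expects."""
    return kib * 1024 // SECTOR_BYTES


print("128 KB default:", readahead_sectors(128), "sectors")
print("8 MB target:   ", readahead_sectors(8 * 1024), "sectors")
```

The 8 MB result would then be passed as `blockdev --setra 16384 /dev/sdX`, where `/dev/sdX` is a placeholder for the device under test. The same setting is exposed in KB through `/sys/block/<device>/queue/read_ahead_kb`.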

Thank you!
Results: http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide
http://moourl.com/fsperf

Selena Deckelmann
[email protected]
twitter: @selenamarie