NonStop Performance Update 2011

ENERGIZE

Your NONSTOP ENVIRONMENT – NONSTOP PERFORMANCE UPDATE 2011 – REPEAT

Jim Smullen ©2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Deriving Optimum Performance from NonStop Servers
• Sizing and deployment of NonStop servers in production environments
• Rising transaction volumes, greater availability, lower cost of ownership


June 8, 2011

QUICK ADOPTION OF NEWEST INDUSTRY STANDARD PROCESSORS

• Introducing the newest member – NB54000c
  – Mere 6 months after the industry-wide Integrity "Tukwila" announcement
  – Almost doubling the performance of NB50000c (1.85x)
  – Lower cost of ownership
  – Same NonStop fundamentals – high availability, superior service, always on

NonStop Performance Highlights - 2011
• Introducing the highest performing NonStop Server yet -- NB54000c
  – Based on quad-core Intel Itanium 9300 series processors
  – 1.85x throughput performance improvement compared to NB50000c (dual-core)
• Introducing next generation Storage and IP CLIMs – G6 CLIMs
  – 20% better throughput performance than previous generation G5 CLIMs
• NSAA Servers on H06.21 - even performance compared to H06.20
• NB50000c Servers on J06.11 - slightly improved performance compared to J06.10
• Storage Encryption – G5 and G6 Storage CLIMs
• SQL/MX 3.0 – even performance compared to SQL/MX 2.3.4

Terminology
• CPU Cost – CPU time spent during an operation (CPU busy time / number of operations)
  – Units: microseconds or milliseconds
  – Smaller is better
• Elapsed Time – wall-clock time spent during an operation
  – Units: microseconds, milliseconds or seconds
  – Smaller is better
• Latency – wall-clock time to transfer a message from source to destination
  – Units: microseconds or milliseconds
  – Smaller is better
• Speedup ratios
  – CPU Speedup is the ratio of "Previous CPU cost value" to "New CPU cost value"
  – Throughput Speedup is the ratio of "New throughput value" to "Previous throughput value"
  – Values greater than 1 indicate performance improvement; larger is better
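The definitions above can be written as a few small helpers. This is a minimal sketch, not part of any NonStop tool: the function names are illustrative assumptions, and the sample figures are the SQL/MP Order-Entry CPU costs and the SQL/MX Order-Entry throughputs reported elsewhere in this deck.

```python
def cpu_cost(cpu_busy_time: float, operations: int) -> float:
    """CPU cost per operation: CPU busy time divided by number of operations."""
    if operations <= 0:
        raise ValueError("operations must be positive")
    return cpu_busy_time / operations


def cpu_speedup(previous_cost: float, new_cost: float) -> float:
    """Ratio of previous CPU cost to new CPU cost (costs: smaller is better)."""
    return previous_cost / new_cost


def throughput_speedup(previous_throughput: float, new_throughput: float) -> float:
    """Ratio of new throughput to previous throughput (larger is better)."""
    return new_throughput / previous_throughput


# CPU cost per transaction, 4.51 ms -> 4.85 ms: a value below 1 means higher cost.
print(round(cpu_speedup(4.51, 4.85), 2))        # 0.93
# Transactions per CPU, 285 -> 611: a value above 1 means more throughput.
print(round(throughput_speedup(285, 611), 2))   # 2.14
```

Note that the two ratios are inverted relative to each other so that, in both cases, a value greater than 1 reads as an improvement.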


NonStop Server System Characteristics

| Component | NS16000 | NB50000c | NB54000c |
|---|---|---|---|
| Processor | 1.6GHz/6M Intel Itanium2 | 1.66GHz/24M Intel 9100 | 1.73GHz/30M Intel 9300 |
| No. of Cores | Single | Dual | Quad |
| Memory | 16GB/CPU | 48GB/CPU | 48GB/CPU |
| Interconnect | ServerNet III (500MB/s) | ServerNet III (500MB/s) | ServerNet III (500MB/s) |
| I/O subsystem | IOAME cabinets | CLIM, IOAME | CLIM, IOAME |
| Disk subsystem | Fiber Channel | Storage CLIM/SAS, Fiber Channel (FCSA) | Storage CLIM/SAS, Fiber Channel (FCSA) |
| Networking | G4SA | IP CLIM, G4SA | IP CLIM, G4SA |
| Performance (relative) | 1.0x | 2.0x | 3.7x |

NB54000c Performance Characteristics
• Throughput
  – 1.85x throughput of NB50000c NonStop Server
  – 3.7x throughput capability of an NS16000 Server
• Equivalent latency for I/O operations compared to NB50000c
• CPU cost per operation
  – CPU costs are higher per operation on the NB54000c, but there is double the CPU processing power (from dual-core to quad-core) to allow more work to be done
  – This is the cost of SMP
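The trade-off between higher per-operation CPU cost and doubled core count can be checked with rough arithmetic. The sketch below assumes, as an idealization that ignores queueing and contention, that throughput per CPU scales with the number of cores divided by the CPU cost per transaction. The function name is illustrative; the inputs are the SQL/MP Order-Entry CPU costs from this deck (4.51 ms on the dual-core NB50000c, 4.85 ms on the quad-core NB54000c).

```python
def relative_throughput(cores_old: int, cost_old_ms: float,
                        cores_new: int, cost_new_ms: float) -> float:
    """Estimated throughput ratio, assuming throughput ~ cores / CPU cost."""
    return (cores_new / cost_new_ms) / (cores_old / cost_old_ms)


# Dual-core at 4.51 ms per transaction vs. quad-core at 4.85 ms per transaction.
estimate = relative_throughput(2, 4.51, 4, 4.85)
print(f"{estimate:.2f}")  # 1.86
```

Under that assumption the estimate lands close to the reported 1.85x: twice the cores, discounted by the 0.93 CPU speedup.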

SYSTEM LEVEL BENCHMARKS Embedded SQL Order-Entry

System Level Benchmarks
• Order-Entry is an internal implementation of the TPC-C specification. It is an I/O-intensive benchmark used extensively to characterize NonStop system performance
  – Embedded SQL version (COE)
  – Java version (JOE)
• Metrics generally reported in our presentations
  – Average CPU cost per transaction
  – Transactions per second

NB54000c Order-Entry Throughput/CPU SQL/MP

| RVU Release | OE Trans/CPU |
|---|---|
| NB50000c (J06.05) | 350 |
| NB54000c (J06.11) | 659 |

Throughput speedup: 1.85x

NB54000c Order-Entry CPU Cost details SQL/MP

CPU cost per transaction (milliseconds)

| Component | NB50000c (J06.05) | NB54000c (J06.11) | Speedup |
|---|---|---|---|
| Total CPU Cost | 4.51 | 4.85 | 0.93 |
| Interrupt | 1.25 | 1.10 | 1.14 |
| SQL Server | 0.97 | 1.10 | 0.88 |
| DB DP2s | 1.91 | 2.19 | 0.87 |
| Driver | 0.08 | 0.13 | 0.62 |
| Audit DP2s | 0.09 | 0.22 | 0.41 |
| TMF | 0.10 | 0.11 | 0.91 |
| Data Comm | 0.07 | 0.07 | 1.00 |

NB54000c Order-Entry Throughput/CPU SQL/MX – MX Tables

| RVU Release | OE Trans/CPU |
|---|---|
| NB50000c (J06.05) | 285 |
| NB54000c (J06.11) | 611 |

Throughput speedup: 2.14x

NB54000c Order-Entry CPU Cost details SQL/MX – MX Tables

CPU cost per transaction (milliseconds)

| Component | NB50000c (J06.05) | NB54000c (J06.11) | Speedup |
|---|---|---|---|
| Total CPU Cost | 5.58 | 5.22 | 1.07 |
| Interrupt | 0.67 | 0.90 | 0.74 |
| SQL Server | 1.48 | 1.10 | 1.35 |
| DB DP2s | 3.07 | 2.23 | 1.38 |
| Driver | 0.09 | 0.14 | 0.64 |
| Audit DP2s | 0.06 | 0.18 | 0.33 |
| TMF | 0.10 | 0.13 | 0.77 |
| Data Comm | 0.07 | 0.07 | 1.00 |

NB54000c system performance metrics

CPU Busy Time
• Reported as AVERAGE across 4 cores in MEASURE CPU entity (rated reports)
• Individual IPU CPU Busy percentage is reported in MEASURE CPU entity
• Total CPU time (unrated MEASURE CPU reports) for a CPU is the sum of the times of each IPU

Process Time
• Per-process time is the SAME as on single-core (e.g., NS-series, S-series) systems
• Sum of process times across all processes in an NB54000c CPU can be greater than 100% but will be less than 400% (due to quad-core)
• Sum of process time across all processes in a CPU (unrated) should be EQUAL to total CPU time (unrated MEASURE CPU reports)

CPU Q-time
• Reported as SUM of CPU Q-times per IPU in CPU entity
• Individual IPU Q-times are reported via IPU-Qtime counter in CPU entity

Some metric observations
• CPU Q-time per CPU
  – Can be 4x the values on single-core systems or 2x the values on dual-core systems. This is OK, since there are four cores with individual ready queues
• Dispatch rates per CPU
  – Can be 4x the values on single-core systems or 2x the values on dual-core systems. Again, this is OK on account of the four cores
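How the per-IPU figures roll up into the CPU-level figures can be sketched as follows. This is a rough illustration only: the `IpuSample` type, the `summarize_cpu` function and the sample numbers are all hypothetical, and nothing here reads MEASURE data or reflects its actual record layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IpuSample:
    busy_seconds: float   # CPU busy time accumulated by this IPU (hypothetical)
    qtime_seconds: float  # ready-queue time accumulated by this IPU (hypothetical)


def summarize_cpu(ipus: List[IpuSample], interval_seconds: float) -> dict:
    """Roll per-IPU samples up to CPU-level figures for one measurement interval."""
    total_busy = sum(i.busy_seconds for i in ipus)
    return {
        # Individual IPU busy percentages
        "ipu_busy_pct": [100.0 * i.busy_seconds / interval_seconds for i in ipus],
        # CPU busy as the AVERAGE across IPUs (rated view)
        "cpu_busy_pct_avg": 100.0 * total_busy / (interval_seconds * len(ipus)),
        # Total CPU time as the SUM of IPU times (unrated view)
        "cpu_busy_seconds_total": total_busy,
        # Sum of process time as a percentage of the interval; may exceed 100%
        "process_time_pct_sum": 100.0 * total_busy / interval_seconds,
        # CPU Q-time as the SUM of per-IPU Q-times
        "cpu_qtime_seconds": sum(i.qtime_seconds for i in ipus),
    }


# Four IPUs observed over a 60-second interval (made-up values).
samples = [IpuSample(30.0, 2.0), IpuSample(45.0, 3.0),
           IpuSample(15.0, 1.0), IpuSample(30.0, 2.0)]
summary = summarize_cpu(samples, 60.0)
print(summary["cpu_busy_pct_avg"])      # 50.0
print(summary["process_time_pct_sum"])  # 200.0
```

With these made-up values the averaged CPU busy is 50%, while the summed process time is 200% of the interval, which is why summed figures above 100% are expected on a quad-core CPU.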

I/O SUBSYSTEM BENCHMARKS Disk Subsystem Message Subsystem TCP/IP v4 Subsystem

NB54000c I/O subsystem
• G6 CLIM based
  – Storage
    • Adapter – Storage G6 CLIM
    • Serial Attached SCSI (SAS)
  – Networking
    • Adapter – IP G6 CLIM
    • 5*1Gb Ethernet ports or 3*1Gb copper & 2*1Gb fiber Ethernet ports
• IOAME based
  – For migration purposes

Sequential OPERATIONS Throughput

Throughput (MB/s)

| Performance Test | NB50000c (SAS) | NB54000c (SAS) | Speedup |
|---|---|---|---|
| 56KB Sequential Read | 78.6 | 105.0 | 1.34 |
| 56KB Sequential Write | 12.4 | 12.9 | 1.04 |

Random OPERATIONS Throughput

Throughput (I/Os per sec.)

| Performance Test | NB50000c (SAS) | NB54000c (SAS) | Speedup |
|---|---|---|---|
| 4KB Random Read | 255 | 296 | 1.16 |
| 4KB Random Write | 257 | 269 | 1.05 |

Disk I/O Operations CPU Costs

CPU Cost per I/O (microseconds)

| Performance Test | NB50000c (SAS) | NB54000c (SAS) | Speedup |
|---|---|---|---|
| 4KB Random Read | 84 | 65 | 1.29 |
| 4KB Random Write | 133 | 108 | 1.24 |
| 56KB Sequential Read | 95 | 67 | 1.42 |
| 56KB Sequential Write | 83 | 97 | 0.86 |
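A CPU cost per I/O becomes easier to interpret when combined with an I/O rate. The sketch below applies the usual utilization arithmetic (rate times cost). It assumes the NB54000c 4KB random read figures for throughput (296 I/Os per second) and CPU cost (65 microseconds) come from comparable test runs, which the slides do not state; the function name is illustrative.

```python
def cpu_utilization(ios_per_second: float, cpu_cost_us: float) -> float:
    """Fraction of one processor-second consumed per second of wall-clock time."""
    return ios_per_second * cpu_cost_us / 1_000_000


# 296 I/Os per second at 65 microseconds of CPU per I/O.
util = cpu_utilization(296, 65)
print(f"{util:.2%}")  # 1.92%
```

At these rates the host CPU time spent on the I/O path is small, so a change in cost per I/O matters mainly when many volumes are driven concurrently.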

I/O SUBSYSTEM BENCHMARKS Disk Subsystem Message Subsystem TCP/IP v4 Subsystem

Message System – intra-cpu CPU cost/msg

CPU Cost (microseconds)

| Request/Reply message sizes (bytes) | NB50000c (J06.10) | NB54000c (J06.11) | Speedup |
|---|---|---|---|
| 0/0 | 12.0 | 11.0 | 1.09 |
| 128/256 | 14.0 | 13.0 | 1.08 |
| 128/4K | 15.0 | 13.0 | 1.15 |
| 0/16K | 16.0 | 15.0 | 1.07 |
| 16K/0 | 16.0 | 15.0 | 1.07 |
| 56K/0 | 25.0 | 21.0 | 1.19 |

Message System – inter-cpu CPU cost/msg

CPU Cost (microseconds)

| Request/Reply message sizes (bytes) | NB50000c (J06.04) | NB54000c (J06.11) | Speedup |
|---|---|---|---|
| 0/0 | 18 | 18 | 1.00 |
| 128/256 | 22 | 20 | 1.10 |
| 128/4K | 24 | 22 | 1.09 |
| 0/16K | 32 | 26 | 1.23 |
| 16K/0 | 34 | 36 | 0.94 |
| 56K/0 | 51 | 71 | 0.72 |
| 56K/56K | 89 | 67 | 1.33 |

Message System – inter-cpu throughput

Throughput (MB/s)

| Message Sizes (Bytes) | NB50000c (J06.10) | NB54000c (J06.11) | Speedup |
|---|---|---|---|
| 128/256 | 7.48 | 7.36 | 0.98 |
| 128/4K | 50.64 | 53.44 | 1.06 |
| 0/16K | 120.19 | 115.29 | 0.96 |
| 16K/0 | 95.31 | 88.31 | 0.93 |
| 56K/0 | 131.72 | 119.79 | 0.91 |
| 56K/56K | 150.01 | 143.25 | 0.95 |

Message System – inter-cpu latency/msg

Latency per message (microseconds)

| Message Sizes (Bytes) | NB50000c (J06.10) | NB54000c (J06.11) | Speedup |
|---|---|---|---|
| 128/256 | 51 | 52 | 0.98 |
| 128/4K | 83 | 79 | 1.06 |
| 0/16K | 136 | 142 | 0.96 |
| 16K/0 | 430 | 422 | 0.93 |
| 56K/0 | 435 | 479 | 0.91 |
| 56K/56K | 765 | 801 | 0.95 |

I/O SUBSYSTEM BENCHMARKS Disk Subsystem Message Subsystem TCP/IP v4 Subsystem

TCP/IP v4 – G5 CLIM – OLTP CPU Cost

CPU Cost per message (microseconds)

| Message Size (Bytes) | NB50000c (J06.10) | NB54000c (J06.11) | Speedup |
|---|---|---|---|
| 64 | 28 | 28 | 1.00 |
| 128 | 27 | 27 | 1.00 |
| 256 | 27 | 27 | 1.00 |
| 512 | 30 | 30 | 1.00 |
| 1024 | 31 | 30 | 1.00 |
| 2048 | 32 | 32 | 1.00 |
| 4096 | 64 | 64 | 1.00 |
| 8192 | 112 | 106 | 1.06 |
| 16384 | 197 | 195 | 1.01 |

TCP/IP v4 – G5 CLIM – OLTP throughput

Throughput (messages/sec)

| Message Size (Bytes) | NB50000c (J06.10) | NB54000c (J06.11) | Speedup |
|---|---|---|---|
| 64 | 8766 | 9187 | 1.05 |
| 128 | 8693 | 9008 | 1.04 |
| 256 | 8048 | 8333 | 1.04 |
| 512 | 6372 | 6721 | 1.05 |
| 1024 | 5231 | 5462 | 1.04 |
| 2048 | 4192 | 4272 | 1.02 |
| 4096 | 3177 | 3255 | 1.02 |
| 8192 | 2171 | 2308 | 1.06 |
| 16384 | 1348 | 1452 | 1.08 |

TCP/IP v4 – G5 CLIM – Streaming CPU Cost

CPU Cost (microseconds)

| Message Size (Bytes) | NB50000c (J06.10) | NB54000c (J06.11) | Speedup |
|---|---|---|---|
| 1024 | 6 | 7 | 0.80 |
| 2048 | 11 | 15 | 0.73 |
| 4096 | 24 | 27 | 0.90 |
| 8192 | 42 | 47 | 0.89 |
| 16384 | 89 | 97 | 0.91 |
| 32768 | 164 | 191 | 0.86 |
| 51000 | 207 | 278 | 0.74 |

TCP/IP v4 – G5 CLIM – Streaming throughput

Throughput (MB/s)

| Message Size (Bytes) | NB50000c (J06.10) | NB54000c (J06.11) | Speedup |
|---|---|---|---|
| 1024 | 84.7 | 82.4 | 0.97 |
| 2048 | 88.9 | 87.5 | 0.98 |
| 4096 | 79.4 | 80.0 | 1.01 |
| 8192 | 83.1 | 84.7 | 1.02 |
| 16384 | 82.4 | 84.7 | 1.03 |
| 32768 | 83.5 | 85.2 | 1.02 |
| 51000 | 94.5 | 88.3 | 0.94 |

PROCESSOR BENCHMARKS Integer Benchmarks Java Benchmarks

NB54000c vs. NB50000c: INTEGER and JAVA Benchmarks

[Chart: speedup (axis 0.0 to 1.2) for the Integer Benchmark and the Java Benchmark]

• Integer benchmark – based on SPECint2000
• Java benchmark – based on SPECjbb2000

G6 CLIMS Storage CLIMs IP CLIMs

Sequential OPERATIONS Throughput, NB50000c

Throughput (MB/s)

| Performance Test | G5 CLIM - SAS 72G@15k | G6 CLIM - SAS 146G@15k | Speedup G6 vs G5 |
|---|---|---|---|
| 56k Seq Read | 150.3 | 202.0 | 1.34 |
| 56k Seq Write | 25.8 | 28.1 | 1.09 |

RANDOM OPERATIONS Throughput, NB50000c

Throughput (I/O per sec)

| Performance Test | G5 CLIM - SAS 72G@15k | G6 CLIM - SAS 146G@15k | Speedup G6 vs G5 |
|---|---|---|---|
| Random Read 4k | 588.3 | 653.0 | 1.14 |
| Random Write 4k | 480.4 | 584.0 | 1.21 |

Write Cache Enabled (WCE) – Random Operations

Throughput (I/O per sec)

| Performance Test | G5 CLIM - SAS 72G@15k - WCE | G6 CLIM - SAS 146G@15k - WCE | Speedup G6 vs G5 |
|---|---|---|---|
| Random Read 4k | 571.2 | 668.0 | 1.17 |
| Random Write 4k | 623.0 | 814.0 | 1.31 |

Write Cache Enabled (WCE) – Sequential Operations

Throughput (MB/s)

| Performance Test | G5 CLIM - SAS 72G@15k - WCE | G6 CLIM - SAS 146G@15k - WCE | Speedup G6 vs G5 |
|---|---|---|---|
| 56k Seq Read | 144.0 | 202.0 | 1.40 |
| 56k Seq Write | 144.7 | 183.0 | 1.26 |

Random I/O Throughput – 24 Volumes

Throughput (I/Os per sec.)

| Performance Test | G5 SAS (default) | G6 SAS (default) | Speedup G6 vs. G5 |
|---|---|---|---|
| Random Read 4k | 12567 | 14900 | 1.19 |
| Random Write 4k | 11644 | 13299 | 1.14 |

• Throughput increases linearly to 35 volumes for unstructured files and to 50 volumes for structured files
• Safe to configure 50 volumes between two CLIMs from a performance perspective

SEQUENTIAL I/O THROUGHPUT – 24 Volumes

Throughput (MB per sec.)

| Performance Test | G5 SAS (default) | G6 SAS (default) | Speedup G6 vs. G5 |
|---|---|---|---|
| Sequential Read 56k | 738.0 | 780.0 | 1.06 |
| Sequential Write 56k | 641.0 | 697.6 | 1.09 |

• Aggregate sequential throughput is almost the same on G5 and G6 due to ServerNet bandwidth

NSVLE: SEQUENTIAL I/O – 24 Volumes

Throughput (I/Os per sec)

| | RR4K | RW4K | SR56K | SW56K |
|---|---|---|---|---|
| Aggregate (Structured Files) | +16% | NA | NA | NA |
| Aggregate (Unstructured Files) | +60% | +70% | +60% | +67% |

• Performance improvement due to faster processors/more cores on the G6 CLIM
• CLIM processor utilization is the limiting factor for aggregate NSVLE performance

NSVLE SUBSYSTEM – PER CLIM

One CLIM with Unstructured Files, Random Read 4K

[Chart: I/Os per second (axis 0 to 30000) versus number of volumes (0 to 50) for four series: G5 No Encryption 1-CLIM, G5 XTS-AES(256) 1-CLIM, G6 No Encryption 1-CLIM, G6 XTS-AES(256) 1-CLIM]

G6 CLIMS Storage CLIMs IP CLIMs

SINGLE SOCKET: OLTP PROFILE (REQUEST/REPLY) – IPv4, NB50000c

Throughput (messages/sec)

| Message Size (Bytes) | G5 CLIM | G6 CLIM | Speedup |
|---|---|---|---|
| 64 | 9156 | 9533 | 1.04 |
| 128 | 8505 | 8981 | 1.06 |
| 256 | 7912 | 8271 | 1.05 |
| 512 | 6538 | 6685 | 1.02 |
| 1024 | 5361 | 5256 | 0.98 |
| 2048 | 4372 | 4080 | 0.93 |
| 4096 | 3392 | 3165 | 0.93 |
| 8192 | 2414 | 2213 | 0.92 |
| 16384 | 1506 | 1368 | 0.91 |

SINGLE SOCKET: STREAMING (BULK DATA TRANSFER) – IPv4

Throughput (MBytes/Sec)

| Message Size (Bytes) | G5 CLIM | G6 CLIM | Speedup |
|---|---|---|---|
| 512 | 75.22 | 59.91 | 0.80 |
| 1024 | 89.3 | 66.2 | 0.74 |
| 2048 | 93.4 | 77.9 | 0.83 |
| 4096 | 84.8 | 81.9 | 0.97 |
| 8192 | 88.8 | 83.5 | 0.94 |
| 16384 | 90.6 | 82.6 | 0.91 |
| 32768 | 90.8 | 82.8 | 0.91 |
| 51000 | 93.9 | 87.9 | 0.94 |

MULTIPLE SOCKETS: OLTP (REQUEST/REPLY) – IPv4

Throughput (messages/sec)

| Message Size (Bytes) | G5 CLIM | G6 CLIM | Speedup |
|---|---|---|---|
| 64 | 62183 | 69563 | 1.12 |
| 128 | 60335 | 68940 | 1.14 |
| 256 | 58387 | 67927 | 1.16 |
| 512 | 55469 | 63412 | 1.14 |
| 1024 | 51759 | 60758 | 1.17 |
| 2048 | 28290 | 28222 | 1.00 |
| 4096 | 18564 | 20436 | 1.10 |
| 8192 | 10925 | 13375 | 1.22 |
| 16384 | 5818 | 6811 | 1.17 |

MULTIPLE SOCKETS: STREAMING (BULK DATA TRANSFER) – IPv4

Throughput (MBytes/Sec)

| Message Size (Bytes) | G5 CLIM | G6 CLIM | Speedup |
|---|---|---|---|
| 512 | 82.71 | 82.27 | 0.99 |
| 1024 | 114.23 | 114.03 | 1.00 |
| 2048 | 114.22 | 114.21 | 1.00 |
| 4096 | 116.59 | 116.60 | 1.00 |
| 8192 | 116.51 | 116.54 | 1.00 |
| 16384 | 116.46 | 116.40 | 1.00 |
| 32768 | 116.71 | 116.53 | 1.00 |
| 51000 | 116.88 | 117.00 | 1.00 |

G6 IP CLIM AGGREGATE PERFORMANCE

| Throughput | Streaming (bulk), MB/sec | OLTP transfers (request/reply), messages/sec |
|---|---|---|
| G5 CLIM | 348 | 62,183 |
| G6 CLIM | 392 | 69,563 |

• G6 CLIM affords approx. 1.12x the aggregate throughput of G5 CLIM
• Performance measured across all 5 Gigabit Ethernet ports

NONSTOP NS2200 SERVER PERFORMANCE Embedded SQL Order-Entry

NS2200 Order-Entry Throughput/CPU SQL/MP

| Servers | Order-Entry Transactions/sec/CPU |
|---|---|
| NB54000c (J06.11) | 659 |
| NS2200 (J06.12.21) | 314.28 |
| NS2000 (J06.06) | 272.0 |
| NS1200 (H06.12) | 140.00 |

NS2200 throughput per CPU: 0.48x NB54000c, 1.15x NS2000, 2.24x NS1200

Sept 21, 2011

NS2200 vs. NS2000: Order-Entry CPU speedup – SQL/MP

CPU cost per transaction (milliseconds)

| Component | NS2000 (J06.06) | NS2200 (J06.12.21) | Speedup NS2200 vs NS2000 |
|---|---|---|---|
| Total CPU Cost | 5.73 | 5.11 | 1.12 |
| Interrupt | 1.13 | 1.09 | 1.04 |
| SQL Server | 1.34 | 1.18 | 1.14 |
| DB DP2s | 2.74 | 2.39 | 1.15 |
| Driver | 0.11 | 0.14 | 0.79 |
| Audit DP2s | 0.12 | 0.09 | 1.33 |
| TMF | 0.13 | 0.12 | 1.08 |
| Data Comm | 0.09 | 0.10 | 0.90 |

NONSTOP SERVER RELEASE PERFORMANCE SUMMARY J-Series H-Series

NB50000c/J-Series Release Performance

| Benchmarks | J06.11/J06.10 |
|---|---|
| Order Entry SQL/MP | 1.01 |
| Order Entry SQL/MX(MP) | 1.16 |
| Order Entry SQL/MX(Native) | 1.13 |
| MS: Intra-CPU Large Messages | 0.99 |
| MS: Inter-CPU Large Messages | 1.05 |
| MS: Intra-CPU Small Messages | 1.00 |
| MS: Inter-CPU Small Messages | 1.00 |
| Disk: Seq. Read | 1.06 |
| Disk: Seq. Write | 1.06 |
| Disk: Random Read | 1.19 |
| Disk: Random Write | 1.20 |
| TCP/IP Bulk Transfer | 1.00 |
| TCP/IP OLTP | 1.00 |
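Release-to-release ratios like these are easier to scan when sorted into improved, unchanged and regressed. The sketch below is one way to do that; the 2% tolerance is an arbitrary assumption rather than a threshold from this presentation, and the pairing of each ratio with a benchmark assumes the values are listed in the same order as the benchmark names.

```python
# J06.11/J06.10 ratios for NB50000c, paired with benchmarks in listed order
# (the pairing is an assumption based on that ordering).
RATIOS = {
    "Order Entry SQL/MP": 1.01,
    "Order Entry SQL/MX(MP)": 1.16,
    "Order Entry SQL/MX(Native)": 1.13,
    "MS: Intra-CPU Large Messages": 0.99,
    "MS: Inter-CPU Large Messages": 1.05,
    "MS: Intra-CPU Small Messages": 1.00,
    "MS: Inter-CPU Small Messages": 1.00,
    "Disk: Seq. Read": 1.06,
    "Disk: Seq. Write": 1.06,
    "Disk: Random Read": 1.19,
    "Disk: Random Write": 1.20,
    "TCP/IP Bulk Transfer": 1.00,
    "TCP/IP OLTP": 1.00,
}


def classify(ratios: dict, tolerance: float = 0.02):
    """Split benchmarks into improved / regressed, treating +/- tolerance as noise."""
    improved = [name for name, r in ratios.items() if r > 1 + tolerance]
    regressed = [name for name, r in ratios.items() if r < 1 - tolerance]
    return improved, regressed


improved, regressed = classify(RATIOS)
print(len(improved), "improved;", len(regressed), "regressed")
```

With this tolerance, the Order Entry SQL/MX and disk benchmarks show up as improved, and nothing falls into the regressed list.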

NS16000/H-Series Release Performance

| Benchmarks | H06.22/H06.21 |
|---|---|
| Order Entry SQL/MP | 1.00 |
| Order Entry SQL/MX(MP) | 1.02 |
| Order Entry SQL/MX(MX) | 1.01 |
| MS: Intra-CPU Large Messages | 1.00 |
| MS: Inter-CPU Large Messages | 0.96 |
| MS: Intra-CPU Small Messages | 0.96 |
| MS: Inter-CPU Small Messages | 0.93 |
| Disk: Seq. Read | 0.99 |
| Disk: Seq. Write | 1.02 |
| Disk: Random Read | 0.98 |
| Disk: Random Write | 1.03 |
| TCP/IP Bulk Transfer | 1.00 |
| TCP/IP OLTP | 1.00 |

NS16200/H-Series Release Performance

| Benchmarks | H06.22/H06.21 |
|---|---|
| Order Entry SQL/MP | 1.00 |
| Order Entry SQL/MX(MP) | 0.96 |
| Order Entry SQL/MX(MX) | 0.96 |
| MS: Intra-CPU Large Messages | 1.03 |
| MS: Inter-CPU Large Messages | 0.95 |
| MS: Intra-CPU Small Messages | 1.00 |
| MS: Inter-CPU Small Messages | 0.95 |
| Disk: Seq. Read | 0.99 |
| Disk: Seq. Write | 0.97 |
| Disk: Random Read | 0.99 |
| Disk: Random Write | 1.01 |

THANK YOU