Accelerating Ceph for Database Workloads with an all PCIe SSD Cluster

Reddy Chagam – Principal Engineer & Chief SDS Architect
Tushar Gohad – Senior Staff Engineer
Intel Corporation
April 19, 2016
Acknowledgements: Orlando Moreno, Dan Ferber (Intel)

Legal Disclaimer
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at http://intel.com.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Configurations: Ceph v0.94.3 Hammer, v10.1.2 Jewel Release, CentOS 7.2, 3.10-327 Kernel, CBT used for testing and data acquisition. OSD System Config: Intel Xeon E5-2699 v4 2x @ 2.20 GHz, 44 cores w/ HT, Cache 46080KB, 128GB DDR4. Each system with 4x P3700 800GB NVMe SSDs, partitioned into 4 OSDs each, 16 OSDs total per node. FIO Client Systems: Intel Xeon E5-2699 v3 2x @ 2.30 GHz, 36 cores w/ HT, Cache 46080KB, 128GB DDR4. Ceph public and cluster networks 2x 10GbE each. FIO 2.2.8 with LibRBD engine. Sysbench 0.5 for MySQL testing. Tests run by Intel DCG Storage Group in Intel lab. Ceph configuration and CBT YAML file provided in backup slides. For more information go to http://www.intel.com/performance.

Intel, Intel Inside and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. © 2016 Intel Corporation.


Agenda
• Transition to NVMe flash
• NVMe architecture with Ceph
• Database & Ceph – leading flash use case
• The “All NVMe” high-density Ceph cluster
• MySQL workload performance results
• Summary and next steps


Storage Evolution
[Diagram: storage technologies yesterday, today, and near term]
• Memory & Storage (near term): revolutionary storage class memory – 3D XPoint™ Technology based Apache Pass (AEP) for DDR4; ~100X latency improvement, ~1,000X size of data (3D XPoint™ memory media)
• Storage: NAND based Intel PCIe SSDs for NVMe (today); 3D NAND based Intel PCIe SSDs ramping in 2016; 3D XPoint™ Technology based Optane™ SSD for NVMe – the world’s fastest NVMe SSD (near term)
Next-gen NVM enables the world’s fastest NVMe SSD and revolutionary storage class memory.
NVMe SSDs accelerate performance for latency-sensitive workloads on Ceph.

Data Center Form Factors
• M.2 – 80mm and 110mm lengths; smallest footprint of PCIe; use for boot or for max storage density
• U.2 2.5in (SFF-8639) – 7mm and 15mm heights; 2.5in makes up the majority of SSDs sold today because of ease of deployment, hotplug, serviceability, and small form factor
• Add-in-card (AIC) – maximum system compatibility with existing servers and the most reliable compliance program; higher power envelope and options for height and length

Intel Platforms – Tick-Tock Development Model
• Thurley platform (Intel® microarchitecture codename Nehalem): Nehalem 45nm (tock, new microarchitecture), Westmere 32nm (tick, new process technology); Tylersburg PCH
• Romley platform (Intel® microarchitecture codename Sandy Bridge): Sandy Bridge 32nm (tock), Ivy Bridge 22nm (tick); Patsburg PCH
• Grantley platform, today (Intel® microarchitecture codename Haswell): Haswell 22nm (tock), Broadwell 14nm (tick); Wellsburg PCH
Xeon E5 v4 is socket compatible with the v3 series and improves Ceph performance.

Ceph Workloads
[Chart: Ceph workloads plotted by storage performance (IOPS, throughput; lower to higher) against storage capacity in PB (lower to higher), spanning block and object interfaces]
• Workloads shown include databases, remote disks, test & dev, VDI, boot volumes, HPC, cloud DVR, big data, CDN, mobile content depot, enterprise dropbox, app storage, and backup/archive
• The NVM focus is the high-performance (IOPS/throughput) end of the chart, led by database workloads on block storage

Ceph – NVM Usages (Today’s Focus)
[Diagram: client access paths and RADOS node storage layout, connected over 10-25 GbE via the RADOS protocol]
• Client side: NVM for client caching with write-through across all access paths – virtual machine guests (hypervisor Qemu/Virtio, librbd, RADOS), bare-metal user applications (kernel RBD, RADOS), and container applications (kernel RBD, RADOS)
• RADOS node, FileStore (production): NVM for the OSD journal, read cache, and file-system caching in front of OSD data
• RADOS node, BlueStore (tech preview): NVM for OSD data and metadata, with RocksDB running on BlueRocksEnv/BlueFS
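The BlueStore tech-preview path above is enabled entirely through ceph.conf; the lines below are taken from the backup configuration in this deck and show the Jewel-era experimental switches used to select BlueStore (with RocksDB metadata) and the async messenger.

[global]
enable experimental unrecoverable data corrupting features = bluestore rocksdb
osd objectstore = bluestore
ms_type = async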

Ceph and Percona Server MySQL Integration
[Diagram: MySQL running in guest VMs (hypervisor Qemu/Virtio, librbd/RADOS), in Linux containers, or on a bare-metal MySQL host (kernel RBD/RADOS), all reaching a Ceph storage cluster of SSD-backed OSD and MON nodes over the IP fabric]
Deployment considerations (a minimal kernel-RBD mapping sketch follows this slide):
• Bootable Ceph volumes (OS & MySQL data)
• MySQL RBD volumes (all in one, or separate)
Configurations:
• Good: NVMe SSD for journal/cache, HDDs as OSD data drives
• Better: NVMe SSD as journal, high-capacity SATA or 3D-NAND NVMe SSD as data drive
• Best: all NVMe SSD
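To illustrate the bare-metal/container path, here is a minimal sketch of exposing a Ceph RBD volume to a MySQL host through the kernel RBD driver. The pool and volume names (database, vol1) follow the RBD commands in the backup slides and the /data mount point matches the datadir in the backup my.cnf; the exact mapping steps are an assumption, since the deck does not show them.

# Image features beyond layering must be disabled before krbd can map the volume
# (see "RBD Commands" in the backup slides)
rbd map database/vol1          # returns a block device, e.g. /dev/rbd0
mkfs.xfs /dev/rbd0             # one-time format of the new volume
mount /dev/rbd0 /data          # /data is the MySQL datadir in the backup my.cnf
chown -R mysql:mysql /data     # hand the data directory to the mysql user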

An “All-NVMe” High-Density Ceph Cluster Configuration
[Diagram: 5-node all-NVMe Ceph cluster plus 10 client systems and 1 Ceph MON, connected over 2x 10GbE public and 2x 10GbE cluster networks]
• OSD nodes: Supermicro 1028U-TN10RT+, dual-socket Xeon E5-2699v4 @ 2.2GHz, 44 cores w/ HT, 128GB DDR4; CentOS 7.2, kernel 3.10-327; Ceph v10.1.2 with BlueStore and async messenger; 4x NVMe per node (NVMe1-4), 4 OSDs per NVMe, 16 OSDs per node (Ceph OSD1-OSD16)
• Client nodes: dual-socket Xeon E5-2699v3 @ 2.3GHz, 36 cores w/ HT, 128GB DDR4
• Test set 1: FIO (librbd) clients
• Test set 2: Docker containers with krbd – MySQL DB server containers and Sysbench client containers
• DB containers: 16 vCPUs, 32GB memory, 200GB RBD volume, 100GB MySQL dataset, InnoDB buffer cache 25GB (25%)
• Client containers: 16 vCPUs, 32GB RAM; FIO 2.8, Sysbench 0.5
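Before running the benchmarks, the cluster needs an RBD pool sized for the 80 OSDs. The sketch below mirrors the 2-replica, 8192-PG pool profile in the backup CBT YAML; the pool name rbdbench is an assumption (the MySQL tests in the backup slides use a pool named database instead).

ceph osd pool create rbdbench 8192 8192   # pg_num and pgp_num per the CBT '2rep' profile
ceph osd pool set rbdbench size 2         # replication factor 2, as used throughout this deck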

Multi-partitioning Flash Devices
• High-performance NVMe devices are capable of high parallelism at low latency
  - DC P3700 800GB raw performance: 460K read IOPS and 90K write IOPS at QD=128
• High resiliency of “Data Center” class NVMe devices
  - Power-loss protection, full data-path protection, device-level telemetry
  - At least 10 drive writes per day
• Multi-partitioning introduces the concept of multiple OSDs on the same physical device (Ceph OSD1-4 on one NVMe SSD; see the partitioning sketch after this slide)
  - Reduces lock contention within a single OSD process
  - Lower latency at all queue depths, with the biggest impact on random reads
  - By using multiple OSD partitions, Ceph performance scales linearly
  - Conceptually similar crushmap data-placement rules as managing disks in an enclosure
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
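A minimal sketch of how one P3700 could be split into four OSDs; the device name /dev/nvme0n1 and the use of Jewel-era ceph-disk tooling are assumptions, since the deck does not show the exact partitioning commands.

# Partition one NVMe SSD into four equal data partitions, then prepare one OSD per partition
parted -s /dev/nvme0n1 mklabel gpt
for i in 1 2 3 4; do
    parted -s /dev/nvme0n1 mkpart osd-$i $(( (i - 1) * 25 ))% $(( i * 25 ))%
done
for part in /dev/nvme0n1p1 /dev/nvme0n1p2 /dev/nvme0n1p3 /dev/nvme0n1p4; do
    ceph-disk prepare --cluster ceph "$part"   # journal colocated on the same partition
done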

Partitioning Multiple OSDs per NVMe
[Chart: latency vs IOPS, 4K random read, comparing 1, 2, and 4 OSDs per device – 5 nodes, 20/40/80 OSDs, Intel DC P3700, Xeon E5-2699v3 dual socket, 128GB RAM, 10GbE, Ceph 0.94.3 w/ JEMalloc; x-axis IOPS 0-1,200,000, y-axis average latency 0-12 ms]
[Chart: single-node CPU utilization, 4K random reads @ QD32, 4/8/16 OSDs (single/double/quad OSD per NVMe), same configuration; y-axis % CPU utilization 0-90]
Multiple OSDs per NVMe result in higher performance, lower latency, and better CPU utilization.

4K Random Read/Write Performance and Latency (Baseline FIO Test)
[Chart: IO-depth scaling, latency vs IOPS for 100% random read, 100% random write, and 70/30 random mix – 5 nodes, 80 OSDs, Xeon E5-2699v4 dual socket, 128GB RAM, 2x10GbE, Ceph 10.1.2 w/ BlueStore and async messenger, 6 RBD FIO clients; x-axis IOPS 0-1,800,000, y-axis average latency 0-12 ms]
• ~1.4M 100% 4K random read IOPS @ ~1 ms average latency; ~1.6M IOPS @ ~2.2 ms
• ~560K 70/30% (OLTP) random mix IOPS @ ~3 ms average latency
• ~220K 100% 4K random write IOPS @ ~5 ms average latency
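A minimal sketch of one point on these curves as a standalone FIO job using the librbd engine (the pool and image names are assumptions; the actual runs were driven by CBT with the YAML in the backup slides, sweeping rwmixread 0/70/100 and IO depths 4 through 128 at a 4KB block size).

# 4K random mix, 70% reads, queue depth 32, against an existing RBD image
fio --name=4k-randrw70 --ioengine=rbd --clientname=admin \
    --pool=rbdbench --rbdname=vol1 \
    --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 \
    --norandommap --time_based --ramp_time=300 --runtime=300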

Sysbench MySQL OLTP Performance (100% SELECT)
[Chart: Sysbench thread scaling, latency vs aggregate QPS, 100% read (point SELECTs) – 5 nodes, 80 OSDs, Xeon E5-2699v4 dual socket, 128GB RAM, 2x10GbE, Ceph 10.1.2 w/ BlueStore and async messenger, 20 Docker-rbd Sysbench clients (16 vCPUs, 32GB); x-axis aggregate QPS 0-1,400,000, y-axis average latency 0-35 ms]
• ~1.3 million QPS aggregate across 20 clients with 8 Sysbench threads
• 1 million QPS aggregate @ ~11 ms average latency
• ~55,000 QPS per client with 2 Sysbench threads
InnoDB buffer pool = 25%, SQL dataset = 100GB

Sysbench MySQL OLTP Performance (100% UPDATE, 70/30% SELECT/UPDATE)
[Chart: Sysbench thread scaling, latency vs aggregate QPS, 100% write (index UPDATEs) and 70/30% OLTP – 5 nodes, 80 OSDs, Xeon E5-2699v4 dual socket, 128GB RAM, 2x10GbE, Ceph 10.1.2 w/ BlueStore and async messenger, 20 Docker-rbd Sysbench clients (16 vCPUs, 32GB); x-axis aggregate QPS 0-600,000, y-axis average latency 0-500 ms]
• ~100K write QPS aggregate across 20 clients @ ~200 ms average latency; ~5,500 QPS with 1 Sysbench client (2-4 threads)
• ~400K 70/30% OLTP QPS @ ~50 ms average latency; ~25,000 QPS with 1 Sysbench client (4-8 threads)
InnoDB buffer pool = 25%, SQL dataset = 100GB

Summary & Conclusions
• NVMe flash storage is a strong fit for low-latency workloads
• Ceph makes a compelling case for database workloads
• With Ceph, 1.4 million random IOPS at ~1 ms latency is achievable today in 5U, and Ceph performance is only getting better
• Using Xeon E5 v4 standard high-volume servers and Intel NVMe SSDs, you can now deploy a high-performance Ceph cluster for database workloads
• Next steps:
  - Evaluation on a large-scale cluster
  - Ceph community collaboration on improving write latency


Thank you – any questions? Refer to the backup slides for additional configuration details.

Backup

Intel Ceph Contributions
Timeline of Intel contributions across the Giant (2014), Hammer, Infernalis (2015), and Jewel (2016) releases:
• CRUSH placement algorithm improvements (straw2 bucket type)
• New key/value store backend (RocksDB)
• Cache tiering with SSDs (read support, then write support)
• Erasure coding support with ISA-L
• RADOS I/O hinting (35% better EC write performance)
• Virtual Storage Manager (VSM) open sourced
• CeTune open sourced
• PMStore (NVM-optimized backend based on libpmem)
• BlueStore backend optimizations for NVM; BlueStore SPDK optimizations
• Client-side block cache (librbd)
• RGW and BlueStore compression and encryption (w/ ISA-L, QAT backend)
• Industry-first Ceph cluster to break 1 million 4K random IOPS

Configuration Detail – ceph.conf

[global]
enable experimental unrecoverable data corrupting features = bluestore rocksdb
osd objectstore = bluestore
ms_type = async
rbd readahead disable after bytes = 0
rbd readahead max bytes = 4194304
bluestore default buffered read = true
auth client required = none
auth cluster required = none
auth service required = none
filestore xattr use omap = true
cluster network = 192.168.142.0/24, 192.168.143.0/24
private network = 192.168.144.0/24, 192.168.145.0/24
log file = /var/log/ceph/$name.log
log to syslog = false
mon compact on trim = false
osd pg bits = 8
osd pgp bits = 8
mon pg warn max object skew = 100000
mon pg warn min per osd = 0
mon pg warn max per osd = 32768
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
perf = true
mutex_perf_counter = true
throttler_perf_counter = false
rbd cache = false

Configuration Detail – ceph.conf (continued)

[mon]
mon data = /home/bmpa/tmp_cbt/ceph/mon.$id
mon_max_pool_pg_num = 166496
mon_osd_max_split_count = 10000
mon_pg_warn_max_per_osd = 10000

[mon.a]
host = ft02
mon addr = 192.168.142.202:6789

[osd]
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog
osd_mkfs_options_xfs = -f -i size=2048
osd_op_threads = 32
filestore_queue_max_ops = 5000
filestore_queue_committing_max_ops = 5000
journal_max_write_entries = 1000
journal_queue_max_ops = 3000
objecter_inflight_ops = 102400
filestore_wbthrottle_enable = false
filestore_queue_max_bytes = 1048576000
filestore_queue_committing_max_bytes = 1048576000
journal_max_write_bytes = 1048576000
journal_queue_max_bytes = 1048576000
ms_dispatch_throttle_bytes = 1048576000
objecter_inflight_op_bytes = 1048576000
osd_mkfs_type = xfs
filestore_max_sync_interval = 10
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
filestore_op_threads = 6

Configuration Detail – CBT YAML File

cluster:
  user: "bmpa"
  head: "ft01"
  clients: ["ft01", "ft02", "ft03", "ft04", "ft05", "ft06"]
  osds: ["hswNode01", "hswNode02", "hswNode03", "hswNode04", "hswNode05"]
  mons:
    ft02:
      a: "192.168.142.202:6789"
  osds_per_node: 16
  fs: xfs
  mkfs_opts: '-f -i size=2048 -n size=64k'
  mount_opts: '-o inode64,noatime,logbsize=256k'
  conf_file: '/home/bmpa/cbt/ceph.conf'
  use_existing: False
  newstore_block: True
  rebuild_every_test: False
  clusterid: "ceph"
  iterations: 1
  tmp_dir: "/home/bmpa/tmp_cbt"
  pool_profiles:
    2rep:
      pg_size: 8192
      pgp_size: 8192
      replication: 2
benchmarks:
  librbdfio:
    time: 300
    ramp: 300
    vol_size: 10
    mode: ['randrw']
    rwmixread: [0,70,100]
    op_size: [4096]
    procs_per_volume: [1]
    volumes_per_client: [10]
    use_existing_volumes: False
    iodepth: [4,8,16,32,64,128]
    osd_ra: [4096]
    norandommap: True
    cmd_path: '/usr/local/bin/fio'
    pool_profile: '2rep'
    log_avg_msec: 250

Storage Node Diagram
Two CPU sockets, Socket 0 and Socket 1:
• Socket 0: 2 NVMe SSDs, Intel X540-AT2 (10Gbps) NIC, 64GB memory (8x 8GB 2133 DIMMs)
• Socket 1: 2 NVMe SSDs, 64GB memory (8x 8GB 2133 DIMMs)
Explore additional optimizations using cgroups and IRQ affinity.
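One way to act on the cgroups/IRQ-affinity suggestion above is NUMA pinning; the sketch below binds an OSD daemon to the socket that owns its NVMe device and steers a NIC interrupt to the same socket. The OSD id, NUMA node, and IRQ number are placeholders, not values from the deck.

# Bind OSD 0 to socket 0, which owns two of the NVMe SSDs and the X540 NIC in the diagram above
numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 --cluster ceph

# Steer one NIC interrupt (IRQ 63 is a placeholder) to socket-0 CPUs via its affinity mask
echo 0000ffff > /proc/irq/63/smp_affinity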

High Performance Ceph Node Hardware Building Blocks
• Generally available server designs built for high density and high performance
  - High-density 1U standard high-volume server
  - Dual-socket 3rd-generation Xeon E5 (2699v3)
  - 10 front-removable 2.5" form-factor drive slots with SFF-8639 connectors
  - Multiple 10Gb network ports, plus additional slots for 40Gb networking
• Intel DC P3700 NVMe drives are available in the 2.5" drive form factor, allowing easier service in a datacenter environment

MySQL configuration file (my.cnf)

[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0

[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
datadir = /data
basedir = /usr
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
bind-address = 0.0.0.0
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 8
query_cache_limit = 1M
query_cache_size = 16M
log_error = /var/log/mysql/error.log
expire_logs_days = 10
max_binlog_size = 100M
performance_schema = off
innodb_buffer_pool_size = 25G
innodb_flush_method = O_DIRECT
innodb_log_file_size = 4G
thread_cache_size = 16
innodb_file_per_table
innodb_checksums = 0
innodb_flush_log_at_trx_commit = 0
innodb_write_io_threads = 8
innodb_page_cleaners = 16
innodb_read_io_threads = 8
max_connections = 50000

[mysqldump]
quick
quote-names
max_allowed_packet = 16M

[mysql]

!includedir /etc/mysql/conf.d/

Sysbench commands

Prepare:
sysbench --test=/root/benchmarks/sysbench/sysbench/tests/db/parallel_prepare.lua --mysql-user=sbtest --mysql-password=sbtest --oltp-tables-count=32 --num-threads=128 --oltp-table-size=14000000 --mysql-table-engine=innodb --mysql-port=$1 --mysql-host=172.17.0.1 run

READ:
sysbench --mysql-host=${host} --mysql-port=${mysql_port} --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest --mysql-engine=innodb --oltp-tables-count=32 --oltp-table-size=14000000 --test=/root/benchmarks/sysbench/sysbench/tests/db/oltp.lua --oltp-read-only=on --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-point-selects=10 --rand-type=uniform --num-threads=${threads} --report-interval=60 --warmup-time=400 --max-time=300 --max-requests=0 --percentile=99 run

WRITE:
sysbench --mysql-host=${host} --mysql-port=${mysql_port} --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest --mysql-engine=innodb --oltp-tables-count=32 --oltp-table-size=14000000 --test=/root/benchmarks/sysbench/sysbench/tests/db/oltp.lua --oltp-read-only=off --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=100 --oltp-point-selects=0 --rand-type=uniform --num-threads=${threads} --report-interval=60 --warmup-time=400 --max-time=300 --max-requests=0 --percentile=99 run

Docker Commands

Database containers:
docker run -ti --privileged --volume /sys:/sys --volume /dev:/dev -d -p 2201:22 -p 13306:3306 --cpuset-cpus="1-16,36-43" -m 48G --oom-kill-disable --name database1 ubuntu:14.04.3_20160414db /bin/bash

Client containers:
docker run -ti -p 3301:22 -d --name client1 ubuntu:14.04.3_20160414-sysbench /bin/bash


RBD Commands

ceph osd pool create database 8192 8192
rbd create --size 204800 vol1 --pool database --image-feature layering
rbd snap create database/vol1@master
rbd snap ls database/vol1
rbd snap protect database/vol1@master
rbd clone database/vol1@master database/vol2
rbd feature disable database/vol2 exclusive-lock object-map fast-diff deep-flatten
rbd flatten database/vol2


An "All-NVMe” high-density Ceph Cluster Configuration SuperMicro FatTwin (1x dual-socket XeonE5 v3)

FIO/Sysbench FIO RBD Client FIO RBD Client FIO RBD Client

CBT / Zabbix / Monitoring

Ceph MON

SuperMicro FatTwin (4x dual-socket XeonE5 v3)

SuperMicro FatTwin (4x dual-socket XeonE5 v3)

Intel Xeon E5 v4 22 Core CPUs Intel P3700 NVMe PCI-e Flash

Intel PCSD (4x dual-socket Xeon E5 v3)

FIO/Sysbench FIO RBD Client FIO RBD Client

FIO/Sysbench FIO RBD Client FIO RBD Client FIO RBD Client

Ceph network (192.168.142.0/24) – 2x10Gbps Ceph cluster network (192.168.144.0/24) – 2x10Gbps

Ceph OSD16

NVMe1



Ceph OSD4

SuperMicro 1028U

Ceph OSD3

NVMe4

Ceph OSD2

SuperMicro 1028U

NVMe2

NVMe3

Ceph OSD1

NVMe4



Ceph OSD16

NVMe1

Ceph OSD4

Ceph OSD3

Ceph OSD2

NVMe3

Ceph OSD1

NVMe2



Ceph OSD16

NVMe1

Ceph OSD4

SuperMicro 1028U

Ceph OSD3

NVMe4

Ceph OSD2

NVMe3

Ceph OSD1

NVMe2



Ceph OSD16

SuperMicro 1028U

NVMe1

Ceph OSD4

NVMe4

Ceph OSD3

NVMe2

Ceph OSD2

NVMe3

Ceph OSD1

Ceph OSD16

Ceph OSD4

Ceph OSD3

Ceph OSD2

Ceph OSD1

NVMe1



NVMe3

NVMe2

NVMe4

SuperMicro 1028U

Ceph Storage Cluster

• 5-Node all-NVMe Ceph Cluster based on Intel Xeon E5-2699v4, 44 core HT, 128GB DDR4 • Storage: Each system with 4x P3700 800GB NVMe, partitioned into 4 OSD’s each, 16 OSD’s total per node • Networking: 2x10GbE public, 2x10GbE cluster, partitioned, replication factor 2 • Ceph 10.1.2 Jewel Release, CentOS 7.2, 3.10.0-327.13.1.el7 Kernel

• 10x FIO/Sysbench Clients: Intel Xeon E5-2699 v3 @ 2.30 GHz, 36 cores w/ HT, 128GB DDR4 • Docker with kernel RBD volumes – 2 database and 2 client containers per node • Database containers – 16 vCPUs, 32GB RAM, 250GB RBD volume Client containers – 16 vCPUs, *Other•names and brands may be claimed as the property of32GB others. RAM

Easily serviceable NVMe Drives