TURKCELL’s EXADATA Journey Part 1 & Part 2

Author: Nancy Barrett
Oct 2011

TURKCELL IT Manager: Metin YILMAZ
TURKCELL DBA: Ferhat ŞENGÖNÜL
http://ferhatsengonul.wordpress.com
http://twitter.com/ferhatsengonul

Who am I?
• 11 years in IT in the finance sector
• Oracle ACE
• Worked with (nearly) all DBs, from hierarchical to relational
• Found peace in Exadata
• One of the founders of the Turkish Oracle User Group (TROUG)
• http://ferhatsengonul.wordpress.com
• http://twitter.com/ferhatsengonul

Headlines
• Turkcell in numbers
• BI domain in numbers
• First project
  • Migration to V2 (8 nodes) (total uncompressed size 250 TB)
• Second project
  • Migration to 2 X2-2s (16 nodes)
  • Consolidation of 4 databases (total uncompressed size 600 TB)
  • Geographical (continental) change of data center
• Third project
  • 2 clusters in SATA V2

Turkcell Group Overview
Communications & technology leader in the region
• 61 M subscribers in 9 countries
• Leading operator in 5 countries
• Countries: Türkiye, TRNC, Belarus, Germany, Ukraine, Kazakhstan, Moldova, Georgia, Azerbaijan
• Listed on NYSE and ISE
• Market cap: $10 billion
• 2010 revenue: $6 billion
• Most Admired Company and Most Valuable Brand in Turkey

About Turkcell
• Leading communications and technology company in Turkey with 34.1 million subscribers (as of 30 Jun 2011)
• Third biggest mobile operator in Europe in terms of subscriber base
• Capital expenditures of TRY 17 billion (as of 31 December 2010)
• Covers almost 100% of Turkey with 25,500 base stations
• Among the leaders in the world in terms of 3G speed and network quality
• Global operator with international operations and worldwide roaming including international, 3G, GPRS and CAMEL
  • Operations in 8 countries besides Turkey, with 27 million subscribers
  • International roaming with 661 operators in 208 countries

Turkcell’s BI Environment
[Diagram labels: Source DBs; ETL (Ab Initio, Oracle ODI); Exadata; Other DWH DBs; Reporting (MSTR)]

Amount of Data
• 3 billion CDRs (call detail records) per day
• 600-1000 GB of raw data extracted from 20+ source databases
• 5 TB of data processed on the file system
• 2-3 TB loaded into databases, all into Exadata
• Approximately 600 TB of customer data stored in multiple databases
  • 600 TB (60 TB compressed) on Exadata

Turkcell’s BI Environment – Application Footprint

ETL Environment
• Ab Initio runs on an 8-node Linux cluster
• Data is loaded daily between 19:00 and 08:00 (SLA)
• SQL*Loader is used to load tables with 16 parallel threads
• Work on switching the loads to external tables is ongoing

Reporting Environment
• MSTR (MicroStrategy) is used mostly
• 4 nodes before migration, 3 nodes after migration
• 300+ users
• 3,000 distinct reports, with 50K runs per month
• Runs between 08:00 and 20:00 (SLA)

First Project (completed in July 2010)
• Turkcell’s largest DB, 100 TB (~250 TB uncompressed), was migrated to DBM V2; with the help of HCC it now takes only 25 TB on a full SAS rack
• Over 50K reports run every month on this DB; performance improved by up to 400x, 10x on average
• 1 rack instead of 11 racks

Business Drivers - Why Exadata?

[Diagram labels: EMC DMX-4 70 TB; Hitachi USP-V 50 TB; Sun M9000, SPARC 7, 176 threads; Oracle Exadata V2]

                                              OLD SYSTEM                        NEW SYSTEM
Server model                                  Sun M9000                         Oracle Exadata V2
CPU type                                      Sun SPARC 7, 2.52 GHz             Xeon E5540, 2.53 GHz
Number of CPU threads                         176                               128
Total main memory                             512 GB                            576 GB
Total storage capacity                        120 TB                            30 TB
Storage connection technology                 Fibre Channel (32 x 4 Gbit/s)     InfiniBand (8 x 40 Gbit/s)
Storage maximum I/O throughput                5 GB/s                            21 GB/s
Server + storage units, total power           57 kVA                            10 kVA
Server + storage units, total form factor     11 racks                          1 rack
Approximate data backup duration              44 hours                          14 hours
Number of backup tape cartridges per backup   159                               57

Business Drivers - Why Exadata?

Simplified Architecture
• A full rack replaces the Sun M9000 and ten (10) storage cabinets
• Single-vendor strategy
  • It took Turkcell a few years to perfect its existing environment; Exadata was up and running in 5 days
  • Only one party to deal with if a problem occurs

Effortless Scalability: One to Multi-Rack
• Data volume is exploding: data size doubles every year (45 TB to 100 TB)
• The old storage environment was maxed out, with no chance to scale out
• Old environment: 600+ disks (EMC DMX-4 and Hitachi USP-V) with a maximum I/O throughput of 5 GB/s, versus 168 SAS disks with a maximum I/O throughput of 21 GB/s
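To put the disk comparison above in per-disk terms, here is a small arithmetic sketch. It uses only the figures quoted on the slide; "600+" is taken as exactly 600 disks and 1 GB is taken as 1000 MB, both of which are assumptions.

```python
# Figures quoted on the slide; "600+" disks is treated as exactly 600 (assumption).
OLD_DISKS, OLD_GB_PER_S = 600, 5.0    # EMC DMX-4 + Hitachi USP-V
NEW_DISKS, NEW_GB_PER_S = 168, 21.0   # Exadata V2 full rack, SAS disks


def mb_per_s_per_disk(total_gb_per_s, disks):
    """Average throughput each disk contributes, in MB/s (1 GB = 1000 MB)."""
    return total_gb_per_s * 1000 / disks


old_per_disk = mb_per_s_per_disk(OLD_GB_PER_S, OLD_DISKS)
new_per_disk = mb_per_s_per_disk(NEW_GB_PER_S, NEW_DISKS)
total_gain = NEW_GB_PER_S / OLD_GB_PER_S
per_disk_gain = new_per_disk / old_per_disk

print(f"old: {old_per_disk:.1f} MB/s per disk, new: {new_per_disk:.1f} MB/s per disk")
print(f"total throughput gain: {total_gain:.1f}x, per-disk gain: {per_disk_gain:.1f}x")
```

Under these assumptions the new system delivers roughly four times the total throughput from about a quarter of the disks.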

Business Drivers - Why Exadata?
[Chart: Weekly Report Count & Run-time. Report counts per run-time bucket (0-5 min, 5-10 min, 10-30 min, 30-45 min, 45-60 min, 1-1.5 h, 1.5-2 h, 2-2.5 h, 2.5-3 h, 3-3.5 h, 3.5-4 h, over 4 h) for the weeks July 05-11, July 12-18, July 19-25, July 26-August 01, August 02-08 and August 09-15, with the weekly average run time and the Exadata migration marked.]

Performance Needs
• Over 50K reports run every month on this DB
• Average report run time was reduced from 27 minutes to 3 minutes!
• The share of reports completing in under 5 minutes rose from 45% to 90%
• Reports running for more than 4 hours dropped from 87 to 1

The Project Overview

Planned as a simple migration
• No application version or interface changes
• Migrate to the new Exadata V2 machine
• Upgrade to a new Oracle version (from 10.2.0.4 to 11.2.0.1)
• Move from single instance to RAC

4 months (Apr-July) of testing all the components
• Crash tests (working closely with support)
• Network transfer speed tests (10G vs IB)
• Backup/restore tests
• Performance tests (RAT, report run times, data load times)

Project Challenges
• Will we fit into 30 TB? (~100 TB 10g compressed)
• How to move that much data in 2 days?
  • 100 TB 10g compressed: how much of it can be moved before/after?
  • How much data needs to be moved during the migration window?
  • What kind of network infrastructure is needed to support such a transfer rate?
• 8-node RAC? (earlier attempts to use RAC did not go through)
  • Training needs for RAC and the Exadata storage architecture
• Rollback plan: parallel-run environment (load data into both DBs)

Migration Facts
• Insert/append over DB links
  • Platform and version change forced us to use insert over DB link
  • None of the other methods, such as TTS or ASM rebalance, was applicable
• Used an in-house PL/SQL utility to perform the migration
  • Metadata (tablespaces were reorganized)
  • Insert/append with 128 parallel sessions, partition-wise
• 40 TB of data was transferred during the migration window
  • Transfer rate of 300 MB/s, approximately 1 TB per hour
  • Completed in 36 hours
  • High CPU load on compute nodes because of HCC (expected)
  • SQL*Net compressed the data 2-3x (this was a nice finding)
• 20 TB of read-only data was transferred before the window, ~30 TB after
• Network infrastructure
  • 10Gb-to-IB gateway (Voltaire 4036E) used to connect the old system
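The transfer figures above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch: it assumes decimal units (1 TB = 1,000,000 MB) and a constant transfer rate, neither of which the slide states.

```python
MB_PER_TB = 1_000_000  # decimal units, an assumption


def tb_per_hour(mb_per_s):
    """Convert a sustained rate in MB/s into TB per hour."""
    return mb_per_s * 3600 / MB_PER_TB


def hours_to_move(volume_tb, mb_per_s):
    """Hours needed to move volume_tb at a constant rate."""
    return volume_tb / tb_per_hour(mb_per_s)


def required_mb_per_s(volume_tb, hours):
    """Constant rate needed to move volume_tb within the given hours."""
    return volume_tb * MB_PER_TB / (hours * 3600)


rate = tb_per_hour(300)               # slide: "around 1 TB per hour"
duration = hours_to_move(40, 300)     # slide: "completed in 36 hours"
needed = required_mb_per_s(40, 36)    # rate implied by 40 TB in 36 hours

print(f"{rate:.2f} TB/h, {duration:.1f} h for 40 TB, {needed:.0f} MB/s for a 36 h window")
```

At 300 MB/s the 40 TB would take about 37 hours, so the quoted 36 hours and "around 1 TB per hour" are consistent with each other to within rounding.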

Migration Facts
• After the test migration we ended up staying on Exadata
  • End users were so happy, they did not let us go back
  • We were live 3 weeks before the planned go-live date
• Parallel runs continued for a few weeks, until we felt completely comfortable on Exadata
  • Stability of the system under real load was proven for various load patterns
  • Backup/restore tests were completed

Compression in Action

Old System: 10gR2 Compression
• ~2-3x: ~250 TB of raw data compressed to 100 TB

Exadata V2 with EHCC
• 250 TB of raw data compressed to 25 TB (data) + 5 TB (temp) = 30 TB
• EHCC compression ratio ~7-10x
• Archive compression is efficient, but CPU consumption is high

SORT        COMPRESS    SIZE (GB)    RATIO
NOSORT      NOCOMP      137.59       1
NOSORT      Q_HIGH      21.21        6.48
SORT_A      Q_HIGH      12.18        11.29
SORT_B      Q_HIGH      15.37        8.95
SORT_A_B    Q_HIGH      11.64        11.80

http://ferhatsengonul.wordpress.com/2010/08/09/getting-the-most-from-hybrid-columnar-compression/
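The RATIO column in the table above is the uncompressed size divided by the compressed size. The sketch below recomputes it from the SIZE column, using only the numbers on the slide. It also derives the extra gain that sorting brings on top of Q_HIGH alone, a figure the slide does not state.

```python
# (sort order, compression, size in GB, ratio as reported on the slide)
rows = [
    ("NOSORT",   "NOCOMP", 137.59, 1.00),
    ("NOSORT",   "Q_HIGH", 21.21,  6.48),
    ("SORT_A",   "Q_HIGH", 12.18,  11.29),
    ("SORT_B",   "Q_HIGH", 15.37,  8.95),
    ("SORT_A_B", "Q_HIGH", 11.64,  11.80),
]

baseline_gb = rows[0][2]  # uncompressed, unsorted size

# Recompute each ratio as baseline size / compressed size.
computed = {(sort, comp): baseline_gb / size for sort, comp, size, _ in rows}

for sort, comp, size, reported in rows:
    print(f"{sort:<9}{comp:<7}{size:>8.2f} GB  computed {computed[(sort, comp)]:.2f}  reported {reported:.2f}")

# Extra gain from sorting on A and B, relative to Q_HIGH without sorting.
sort_gain = computed[("SORT_A_B", "Q_HIGH")] / computed[("NOSORT", "Q_HIGH")]
print(f"sorting on A,B shrinks the Q_HIGH segment by a further {sort_gain:.2f}x")
```

The recomputed ratios agree with the reported ones to within a few hundredths, which is what one would expect if the sizes on the slide were rounded.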

Performance Gains

Report Name                                      Old System    Exadata       Faster by
CRC Control Report                               0:15:48.73    0:05:06.07    X2
prepaid bireysel toplam harcanan kontor tl       8:02:10.59    1:51:33.20    X4.3
eom_equipment_utilization report                 0:38:17.77    0:00:23.34    X163
Eom equipment arpu report                        0:09:46.25    0:02:08.00    X4.5
Son 10 gun icinde yapilan gonderimler            0:17:57.95    0:00:37.61    X45.7
Rapor 01. Satış Kanalı Kırılımında Blackberry    0:03:22.24    0:00:00.66    X487
İnternet Paketleri Postpaid paketiçi ve paket    0:05:41.34    0:01:00.34    X4.3
Connectcard Fatura tutarları                     0:31:32.38    0:00:46.51    X66.3
Connectcard Aktivasyon                           0:25:21.00    0:00:54.88    X44.9

• Over 50K reports run every month
• Performance improvement is up to 400x for some reports, and 10x on average

User Feedback
• “We heard before that infrastructure changes would give us performance gains, but this time we were surprised: it was well beyond our expectations. Now we can act faster in this competitive environment.” Director of Marketing Department
• “XDDS is, in a single word, fantastic. None of the reports takes more than 10 minutes. A report that used to take 3-4 hours now completes in 3 minutes. It sounds unreal, but it is real.” Power end user from Finance Department
• “It was a never-ending race to match the business's performance and capacity needs. With the Database Machine V2, we have outperformed our users' expectations and we are prepared for future growth.” Veteran System Admin
• “You started to scare me MSTR”: an end user from the Marketing Department updated her status on Facebook

I had a great holiday

Second Project
• Monthly 1 TB increase in size
• Need for a second rack
• Management was satisfied and bought 2 racks instead of one
• Migration of the data center from Europe to Asia
• Consolidation on Exadata

BIS Infrastructure Roadmap
[Diagram labels: Operational Sources; Extract; Exadata X2-2 (ODS, DDS, RDS, CDRDM); Feed; SMARTCUBE - MicroStrategy]

Moving the data!
[Diagram labels: single DWH environment without single database; DDS 35 TB + 18 months; RDS 5 TB; ODS 5 TB; ZDDS 5 TB; CDRDM 15 TB; dev domain 5 TB; test domain 5 TB; high availability solution for DWH 50 TB; other DBs 25 TB; SAS 60 TB hot; SAS 20 TB cold; 100 TB cold]

Project Status (Feb 2011)
• RDS was migrated and became XRDS
  • Size: decreased from 15 TB to 3 TB
  • Compression ratio of a single table: 70x (it is full of numbers)
• Performance increased 3x, even though the ETL server and the DB are on different continents
  • And it runs on only one database node (a server pool that has only one node)

Project Status (March 2011)
• CDRDM was migrated and became XCDRDM
  • Size: decreased from 45 TB to 15 TB
• Our aim is to use this space for the ETL server migration

Project Status (April 2011)
• The existing XDDS was migrated in April
• ETL servers and reporting servers were migrated simultaneously

Migration method
• From Sun Solaris to Exadata
  • Insert/append over DB link
  • We still love our in-house code
• From Exadata V2 to Exadata X2
  • Incr0 backup in Europe / restore in Asia
  • Incr1 backup in Europe / restore in Asia
  • Source set read-only, last incr1 backup and final restore at the target
  • Upgrade and open in Asia
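The in-house PL/SQL utility is not shown in the slides. As a rough illustration of the insert/append-over-DB-link idea, run partition-wise in parallel sessions as in the first project, here is a Python sketch that only builds and dispatches the statements. Everything specific in it is hypothetical: the table, partition, DB link and key column names, the date-range predicate used to pick one partition's rows on the source side, and the `execute` callback that stands in for a real database session.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    table: str  # table name, assumed identical on source and target
    name: str   # target partition name (hypothetical)
    low: str    # inclusive lower bound of the partition key (ISO date)
    high: str   # exclusive upper bound of the partition key (ISO date)


def build_copy_sql(p, db_link="OLD_DWH", key="CALL_DATE"):
    """Direct-path insert of one partition's rows, read over the DB link."""
    return (
        f"INSERT /*+ APPEND */ INTO {p.table} PARTITION ({p.name}) "
        f"SELECT * FROM {p.table}@{db_link} "
        f"WHERE {key} >= DATE '{p.low}' AND {key} < DATE '{p.high}'"
    )


def migrate(partitions, execute, sessions=128):
    """Run one copy statement per partition on up to `sessions` workers.

    `execute` stands in for "run this SQL in its own database session and
    commit"; no real database driver is used in this sketch.
    """
    def copy(p):
        execute(build_copy_sql(p))
        return p.name

    with ThreadPoolExecutor(max_workers=sessions) as pool:
        # pool.map returns results in the order of the input partitions.
        return list(pool.map(copy, partitions))


# Dry run: collect the statements instead of sending them to a database.
partitions = [
    Partition("CDR", "P20100701", "2010-07-01", "2010-07-02"),
    Partition("CDR", "P20100702", "2010-07-02", "2010-07-03"),
]
statements = []
done = migrate(partitions, statements.append, sessions=2)
print(done)
```

Working one partition per session keeps each unit of work small and independently restartable, which is one plausible reason for the partition-wise approach; the slides do not give the actual rationale.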

GAINS and PROJECTION

DBNAME      BEFORE    AFTER    END of 2011    END of 2012
XDDS        35 TB     35 TB    40 TB          50 TB
XCDRDM      60 TB     13 TB    15 TB          20 TB
XRDS        15 TB     3 TB     5 TB           7 TB
ZDDS        15 TB     3 TB     5 TB           7 TB
NODS        6 TB      2 TB     5 TB           10 TB
SUBTOTAL    96 TB     21 TB    30 TB          44 TB
TOTAL       131 TB    56 TB    70 TB          94 TB
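The subtotal and total rows above can be re-derived from the per-database figures. The sketch below does that, reading the subtotal as covering the four databases other than XDDS. It also computes the overall space reduction, which the slide does not state explicitly.

```python
COLUMNS = ("before", "after", "end_2011", "end_2012")

# Sizes in TB, copied from the table on the slide.
sizes_tb = {
    "XDDS":   (35, 35, 40, 50),
    "XCDRDM": (60, 13, 15, 20),
    "XRDS":   (15, 3, 5, 7),
    "ZDDS":   (15, 3, 5, 7),
    "NODS":   (6, 2, 5, 10),
}


def column_sums(names):
    """Sum each column over the given databases."""
    return tuple(sum(sizes_tb[n][i] for n in names) for i in range(len(COLUMNS)))


subtotal = column_sums([n for n in sizes_tb if n != "XDDS"])  # all but XDDS
total = column_sums(list(sizes_tb))

saved_tb = total[0] - total[1]
saved_pct = 100 * saved_tb / total[0]

print("subtotal:", dict(zip(COLUMNS, subtotal)))
print("total:   ", dict(zip(COLUMNS, total)))
print(f"space saved by the migration: {saved_tb} TB ({saved_pct:.0f}%)")
```

The recomputed rows match the slide: 96/21/30/44 TB for the subtotal and 131/56/70/94 TB for the total.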

Storage Admins were our best friends! Now, we are their best friend.
• 120 TB (net space) of disk was given back with the first project
• 100 TB (net space) of disk was given back with the second project

GAINS on reporting
[Chart: average report run time; values shown: 6.42 min, 7.1 min, 3.28 min]
• Even though we are using only 8 nodes of the X2-2 cluster, performance increased

A little talk about server pools
• Quality of Service
  • 1 sec limit, unaware of parallel DML
• Server pools
  • Can still be used for dividing the nodes between servers
  • We do not want to run 2 different instances on the same node
  • But we want to increase or decrease the number of nodes between systems
  • We still want to have the chance to get all 16 nodes for 1 database

3rd project: Having a test environment
• Everything is OK on the production side
• What about testing?
• The V2 SAS disks have been replaced with SATA

3rd project: Having 2 clusters in one box
• Installation is easy, with one command
• Cluster 1: SATA test (happy developers)
  • 2 compute nodes
  • 3 storage cells (25 TB disk / 3 GB/s)
• Cluster 2: SATA prod (space for everything)
  • 6 compute nodes
  • 11 storage cells (100 TB disk / 11 GB/s)

Turkcell - Consolidation Journey
[Diagram: platform of each database (DDS, RDS, ODS, CDRDM, ZDDS) in 2009, 2010 and 2011. Platforms shown: hazar M9000 120 TB; Exadata V2 1 rack 30 TB; Exadata X2 2 racks 35 TB; everest ¼ E25K 20 TB; everest M5000 20 TB; Exadata X2 2 racks 5 TB; cunda HP rx8640 5 TB; verona 3-node x86; Exadata X2 2 racks 5 TB (Q3); kanuni E25K 60 TB; Amanos E25K 60 TB; 2-node x86; Exadata V2 13 TB; Exadata V2 5 TB]


More OOW Sessions
• Title: Maximize Your ROI with Oracle Database Cloud
  Time: Monday, 03:30 PM, Moscone South – 308
  Time: Thursday, 12:00 – 13:00, Marriott Marquis – Golden Gate C3
• Title: Oracle Exadata Hybrid Columnar Compression: Next-Generation Compression
  Time: Tuesday, 11:45 AM, Moscone South – 304

More OOW Sessions
• Ersin Unkar: 15701 - Event-Driven Patterns and Best Practices. Monday, 11:00 AM, Marriott Marquis - Golden Gate A
• Gürcan Orhan & Sanem Seren Sever: 19441 - Case Study: Turkcell Maximizes Integration Results with Oracle Data Integration. Monday, 12:30 PM, InterContinental - Sutter
• Emre Oka & Murat Yılmaz: 07122 - Data Warehouse Performance and Accuracy at a Telco with Oracle GoldenGate and Oracle Exadata. Tuesday, 11:45 AM, InterContinental - Telegraph Hill
• Aslı Filiz: 12241 - Customer Panel: Application Grid. Wednesday, 01:15 PM, Moscone South - 306
• Deniz Pazarcioglu & Deniz Seçilir: 07815 - Migration to Oracle Fusion Middleware 11g: Key Point of Success. Wednesday, 05:00 PM, Moscone West - 2018
• Onur Taner & Ali Yuksel: 07813 - Location-Based Data, Marketing, and Sales Services with Spatial Technologies. Thursday, 01:30 PM, Moscone South - 302
• Yunus Emre Baransel: 07182 - Effectively Using Oracle Active Data Guard for Multiple Purposes: Turkcell Ca... Tuesday, 03:30 PM, Moscone South - 307

To be continued…

Questions

Ferhat ŞENGÖNÜL http://ferhatsengonul.wordpress.com http://twitter.com/ferhatsengonul

www.turkcell.com.tr