Oracle NoSQL Database Overview

David Segleau, Director, Product Management

Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Agenda
• NoSQL Overview
• Oracle NoSQL Database
  – Architecture
  – Technical Overview
  – Benchmark Results
  – Use Cases

NoSQL – A Brief History
• Early 2000s: Web 2.0 companies started looking for “RDBMS alternatives”
• 2003: memcached (cached k-v store to reduce load on RDBMS)
• 2004: Google published MapReduce distributed processing paper
• 2006: Google published BigTable distributed database paper
• 2007: Amazon published Dynamo paper
• 2008+: Several open source projects are launched to productize NoSQL solutions
• 2009+: Local meetings to discuss and share RDBMS alternatives
• 2010+: Enterprises start to investigate NoSQL solutions

RDBMS vs NoSQL
• RDBMS
  – High value, high density, complex data
  – Complex data relationships
  – Schema-centric
  – Designed to scale up & out
  – Lots of general purpose features/functionality
  – High overhead ($ per operation)
• NoSQL architectures
  – Low value, low density, simple data
  – Very simple relationships
  – Schema-free, unstructured or semi-structured data
  – Distributed storage and processing
  – Stripped down, special purpose data store
  – Lower overhead ($ per operation)

What is NoSQL?
• Not-only-SQL (2009)
• Broad class of non-relational DBMS systems that typically
  – Provide horizontal/distributed scalability
  – Avoid joins
  – Have relaxed consistency guarantees
  – Don’t require a structured schema
  – Are application/developer-centric
• No standards
  – Rapidly evolving set of solutions (100+ on nosql-database.org)
  – Highly variable feature set
  – UnQL launched in July 2011, still a thought experiment
• Majority are open source

What problems does NoSQL try to address?
• Cost
  – TBs to PBs of low/unknown value, simple/unstructured data
  – Lower $ per operation (hardware and RDBMS license fees)
• Scalability
  – Scale out, don’t scale up
• Flexible schema
  – Diverse, changing data sets
• Performance
  – High rate of data capture
  – High volume of simple queries
  – Eliminate ORM overhead
• Availability
  – Low-cost, highly available, distributed data store
  – Move CAP more towards AP rather than CA

Agenda
• NoSQL Overview
• Oracle NoSQL Database
  – Architecture
  – Technical Overview
  – Benchmark Results
  – Use Cases

The NoSQL Challenge
Where to Start
• New, rapidly emerging database technology
• Simple data storage, typically non-SQL or Not-only-SQL
• Distributed (Cloud) storage
• Large amounts of data (Terabyte – Petabyte range)
• Solution categories
  – Storage for “Web Service” applications (our focus is here)
  – ETL Processing (MR & Hadoop) (… and we integrate here)
• Common data models
  – Key-Value (our focus is here)
  – Document, Columnar, Graph

Oracle NoSQL Database
Target Use Cases
• High-throughput event processing
• Customer profile management
• Click-through data processing
• Sensor & statistics data capture
• Social networks
• Personalization
• Mobile application backend infrastructure
• Authentication & Content management
• Archiving
Characteristics: simple queries, dynamic schema, high volume data interaction

Customer-Driven Requirements
• Terabytes to petabytes of unstructured or semi-structured data
• No single point of failure
• Cost effective, distributed storage. Scalable on commodity hardware
• Fast, predictable response time to simple queries
• Fast, reliable transactions
• Simple administration, enterprise support
• Commercial-grade NoSQL solution
  – Real 24x7 support
  – Real database expertise
  – Large vendor & dedicated resources building & testing the software

Oracle NoSQL Database
A Distributed, Scalable Key-Value Database
• Simple Data Model
• Small, distributed footprint
• Highly scalable, available
• Transparent load balancing
• Integrates with Oracle Stack
[Diagram: Applications, each with a NoSQL Database Driver, connected to Storage Nodes in Datacenter A and Datacenter B]

Architecture Summary
Scalable, Highly Available, Optimized
• Scalability
  – Dynamic data partitioning and distribution
  – Optimized data access via intelligent driver
• High availability
  – One or more replicas
  – Resilient to partition master failures
  – No single point of failure
  – Disaster recovery through location of replicas
• Transparent load balancing
  – Reads from master or replicas
  – Driver is network topology & latency aware

Simple Data Model
Key-value pairs
• Simple data model – key-value pair (major+minor-key paradigm)
• Simple operations – read/insert/update/delete, RMW support
• Scope of transaction – records within a major key, single API call
• Unordered scan of all data (non-transactional)
[Diagram: major and minor keys are strings, the value is a byte array. Major key: userid. Labels shown beneath it: subscriptions, expiration date, address, phone #, email id]
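The major+minor key paradigm can be illustrated with a toy in-memory model. This is a minimal sketch, not the product's API: the function names, the sample user IDs and the way the diagram labels are nested under minor keys are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy in-memory model of the major+minor key paradigm (NOT the product API).
# Keys are sequences of strings; values are bytes.
store = defaultdict(dict)  # major key path -> {minor key path -> value}

def put(major, minor, value):
    """Store a byte value under a (major, minor) key pair."""
    store[tuple(major)][tuple(minor)] = bytes(value)

def get(major, minor):
    """Return the value for a full key, or None if it is absent."""
    return store.get(tuple(major), {}).get(tuple(minor))

def records_for(major):
    """All records that share one major key, sorted by minor key."""
    return sorted(store.get(tuple(major), {}).items())

# Hypothetical sample data; the nesting of labels is an assumption.
put(["user42"], ["address"], b"1 Main St")
put(["user42"], ["email id"], b"u42@example.com")
put(["user42"], ["subscriptions", "expiration date"], b"2013-01-01")
put(["user77"], ["address"], b"9 Elm Rd")
```

Grouping every record of one user under a single major key is what lets the records be read or updated together, as the transaction-scope bullet describes.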

Simple Data Model
ACID Transactions
• ACID transactions by default
• Transaction Scope
  – Single API call
  – All records must have the same major key
  – Support for multiple operations within a transaction
• Can be relaxed for increased performance on a per-operation basis
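The transaction scope rule (one API call, one major key, several operations) can be sketched as follows. This is a single-process illustration under stated assumptions: the `execute` helper, the tuple layout of an operation and the `MajorKeyMismatch` exception are hypothetical and do not reproduce the product's API.

```python
class MajorKeyMismatch(Exception):
    """Raised when a batch spans more than one major key."""

def execute(store, operations):
    """Apply a list of (op, major, minor, value) tuples all-or-nothing.

    Every operation must share one major key; otherwise nothing is applied.
    """
    majors = {op[1] for op in operations}
    if len(majors) > 1:
        raise MajorKeyMismatch(sorted(majors))
    staged = dict(store)  # work on a copy so a failure leaves `store` untouched
    for op, major, minor, value in operations:
        if op == "put":
            staged[(major, minor)] = value
        elif op == "delete":
            staged.pop((major, minor), None)
        else:
            raise ValueError(f"unknown operation: {op}")
    store.clear()
    store.update(staged)
    return len(operations)

db = {}
execute(db, [
    ("put", "user42", "address", b"1 Main St"),
    ("put", "user42", "email id", b"u42@example.com"),
])
```

A batch that mixes `user42` and `user77` is rejected before any write is staged, which mirrors the "same major key" constraint on the slide.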

Simple Data Model
ACID Transactions – Configurability
• Configurable Durability Policy
• Configurable Consistency Policy

Scalability and Availability
• Replicated Application Servers
• Driver is linked into each Application
• Storage Nodes kept current via replication (Berkeley DB Java Edition HA)
• Storage Nodes across Data Centers
• Automatic SN failure handling
  – Graceful degradation
  – Automatic recovery
No Single Point of Failure

High Availability
• Automatic log-based replication
• Storage Node Failure
  – Node failures automatically detected, system continues to function
  – Rejoining nodes automatically synchronize with the master
  – Isolated nodes can still service reads
• Master Failover
  – Automatic election of new master, distributed 2-phase election algorithm (Paxos)
  – Master election based on highest LSN (log sequence number)
• Multi-node or Shard (replication group) failure
  – System continues to function using remaining replication groups
• System automatically maintains group membership and status
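The "highest LSN wins" criterion for master failover can be shown in isolation. The sketch below covers only that selection rule; it does not implement Paxos, quorums or the two-phase protocol, and the node names, the `reachable` flag and the tie-break by node name are assumptions.

```python
def elect_master(replicas):
    """Pick the reachable replica with the highest log sequence number (LSN).

    `replicas` maps node name -> (lsn, reachable). Ties are broken by node
    name, an assumption made here only to keep the result deterministic.
    """
    candidates = [(lsn, name) for name, (lsn, up) in replicas.items() if up]
    if not candidates:
        return None
    best_lsn = max(lsn for lsn, _ in candidates)
    return min(name for lsn, name in candidates if lsn == best_lsn)

# Hypothetical replication group: the old master sn1 has failed.
group = {"sn1": (1040, False), "sn2": (1037, True), "sn3": (1039, True)}
new_master = elect_master(group)
```

Choosing the replica with the highest LSN picks the survivor whose log is most up to date, so the fewest acknowledged writes are at risk.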

Transparent Load Balancing
• Application passes to the NoSQL DB Driver: Operation + Key[M,m] + Value + Transaction Policy
• The driver then
  – Hashes the Major Key to determine the Partition ID
  – Uses the Partition Map to map the Partition ID to a shard
  – Uses the State Table to map the shard to Storage Node(s)
  – Lets the Load Balancer select the best eligible Storage Node
  – Contacts the Storage Node directly
• Returned to the driver
  – Operation result
  – Partition Map changes
  – Storage Node stats
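The driver's routing steps (hash the major key, look up the shard, pick a storage node) can be sketched as below. Everything concrete here is an assumption for illustration: the partition count, the SHA-256 hash, the topology tables and the use of lowest latency as the "best eligible" criterion.

```python
import hashlib

N_PARTITIONS = 12  # hypothetical fixed partition count

# Hypothetical topology: partition -> shard, shard -> {storage node: latency in ms}.
partition_map = {p: f"shard{p % 3}" for p in range(N_PARTITIONS)}
state_table = {
    "shard0": {"sn1": 2.0, "sn2": 0.8, "sn3": 1.5},
    "shard1": {"sn4": 1.1, "sn5": 3.2, "sn6": 0.9},
    "shard2": {"sn7": 0.7, "sn8": 0.7, "sn9": 4.0},
}

def partition_id(major_key):
    """Hash the major key to a stable partition ID."""
    digest = hashlib.sha256(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % N_PARTITIONS

def route(major_key):
    """Return (partition, shard, node) for a request on `major_key`."""
    pid = partition_id(major_key)
    shard = partition_map[pid]
    nodes = state_table[shard]
    # Lowest latency wins; node name breaks ties deterministically.
    node = min(nodes, key=lambda n: (nodes[n], n))
    return pid, shard, node

pid, shard, node = route("user42")
```

Because only the major key is hashed, all records under one major key land in the same partition, and the client reaches the chosen node without an intermediate router.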

Simple Administration
• Web-based console and CLI commands
• Manages and Monitors
  – Configuration changes
  – Load: Number of operations, data size
  – Performance: Latency, throughput. Min, max, average, trailing, …
  – Events: Failover, recovery, load distribution
  – Alerts: Failure, poor performance, …

Oracle NoSQL Database
Differentiation
Integrates seamlessly with Oracle Stack (ODI, CEP, OLH)
• Commercial Grade Software and Support
  – General Purpose
  – Reliable – Based on proven Berkeley DB JE HA
  – Easy to Install & Configure
• Scalability and Availability
  – Intelligent Oracle NoSQL DB Driver
  – Evenly distributes Data
  – Sends operation to fastest node
  – Bounded network hops for all operations
  – Automatic replication and failover
• Simple Administration
  – Web-based Console and CLI commands
  – Manages and Monitors: Topology, Load, Performance, Events, Alerts
• Simple Data Model
  – Simple Major + Minor key and Value data structure
  – ACID transactions
  – Configurable consistency and durability

Benchmarking
• 1.6 billion records
• 94K insert/sec
• 25K read/update/sec
• Low latency
• Linear scalability

Oracle NoSQL Database Use Cases
Success Stories
• [company name not shown] provides PaaS for deploying applications over the cloud.
  – Oracle NoSQL Database exposed as a service through their cloud infrastructure.
• [company name not shown], Oracle Platinum Partner, built an online gaming application for their customer using Oracle NoSQL Database.

Oracle Confidential

Oracle NoSQL DB Use Cases
Cloud e-mailing Service
• Problem: Manage e-mail accounts for 10s of millions of customers and hundreds of Terabytes of data.
• Requirements:
  – Fast, Scalable, flexible data management solution
  – Highly Available, Easy to manage & monitor
• Solution: NoSQL DB

Oracle NoSQL DB Use Cases
Cloud Architecture Services
• Problem: Cloud-based infrastructure requires support services like Authentication, Authorization, Event Tracking
• Requirements:
  – Real time performance and high throughput
  – Simple data structures
• Solution: NoSQL DB

Oracle NoSQL DB Use Cases
Customer data aggregation, trend analysis
• Problem: Need to preserve OCEP event history. Aggregated customer experience data can be used to identify trends, offer promotions, provide better insight and customer service.
• Requirements:
  – Rich, flexible customer profile
  – Aggregate and store discrete OCEP event data over time
• Solution: OCEP + NoSQL DB

Oracle NoSQL Database
• Easy to use, easy to manage
• Scalable, Available, Predictable Latency
• A NoSQL Database from a vendor you trust

Oracle NoSQL DB Resources
Support
• Support via OTN forums and Oracle Support process
• OTN Forum:
  – Forum Home » Big Data » NoSQL Database
  – forums.oracle.com/forums/forum.jspa?forumID=1388
• Oracle.com:
  – www.oracle.com/us/products/database/nosql/overview/index.html
• OTN (including documentation and download):
  – www.oracle.com/technetwork/products/nosqldb/overview/index.html

Oracle NoSQL DB Resources
Documentation
• On OTN and in download
  – docs.oracle.com/cd/NOSQL/html/index.html
• Getting Started Guides
• Programmatic API
• Installation & Release Notes
• FAQ

Oracle Big Data DB Resources
External
• Big Data on O.com: http://www.oracle.com/us/technologies/big-data/index.html
• Big Data on OTN: http://www.oracle.com/technetwork/topics/bigdata/learnmore/index.html
  – Start here: “Big Data Essentials” webinar series

Questions

Q&A


APPENDIX


What is Big Data?
• Volume
• Velocity
• Variety
• Value
[Graphic: example data sources, including geodata, blogs and smart meters]

Why is Big Data important?
• US Health Care: increase industry value per year by $300 B
• US Retail: increase net margin by 60+%
• Manufacturing: decrease dev., assembly costs by –50%
• Global Personal Location Data: increase service provider revenue by $100 B
• Europe Public Sector Admin: increase industry value per year by €250 B

“In a big data world, a competitor that fails to sufficiently develop its capabilities will be left behind.”

Source: McKinsey Global Institute: Big Data – The next frontier for innovation, competition and productivity (May 2011)

Big Data Lifecycle
• Acquire
• Organize
• Analyze
• Decide
Make Better Decisions Using Big Data

Oracle Big Data Software Platform
[Diagram: software components laid out across the ACQUIRE, ORGANIZE, ANALYZE and DECIDE stages on Big Data Appliance, Exadata and Exalytics, linked by InfiniBand. Components shown: Oracle NoSQL Database, Hadoop, Open Source R, Applications, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Database, Data Warehouse, In-Database Analytics, Oracle Advanced Analytics, Analytic Applications, Alerts, Dashboards, MDAnalysis, Reports, Query, Web Services, BI Abstraction]

Oracle Engineered Systems for Big Data
[Diagram: Big Data Appliance, Exadata and Exalytics positioned along the ACQUIRE, ORGANIZE, ANALYZE, DECIDE lifecycle]

Big Data Use Cases
• Healthcare
  – Today’s challenge: Expensive office visits
  – New data: Remote patient monitoring
  – What’s possible: Preventive care, reduced hospitalization
• Manufacturing
  – Today’s challenge: In-person support
  – New data: Product sensors
  – What’s possible: Automated diagnosis, support
• Location-Based Services
  – Today’s challenge: Based on home zip code
  – New data: Real time location data
  – What’s possible: Geo-advertising, traffic, local search
• Utilities
  – Today’s challenge: Complex Distribution Grid
  – New data: Detailed consumption statistics
  – What’s possible: Increased availability, reduced cost, tiered metering plans
• Retail
  – Today’s challenge: One size fits all marketing
  – New data: Social media
  – What’s possible: Sentiment analysis segmentation

Big Data Characteristics
• Batch-Oriented
  – Process data to use
  – Bulk storage
  – Write once, read all
• Real-Time
  – Deliver a service
  – Fast access to specific record
  – Read, write, delete, update

Big Data Storage Choices
• Hadoop Distributed File System (HDFS)
  – File System
  – Parallel scanning
  – No inherent structure
  – High volume writes
  – Batch Oriented
• Oracle NoSQL Database
  – Database
  – Indexed storage
  – Simple data structure
  – High volume random reads and writes
  – Real-Time

Early Adopter Dilemma
• Time to Build?
• Expertise?
• Cost and Difficulty Maintaining?
• Product Support?

Oracle Big Data Appliance
Hardware
• 18 Sun X4270 M2 Servers per BDA
  – 864 GB memory
  – 216 cores
  – 648 TB storage
• 40 Gb/s InfiniBand Fabric
  – Inter-rack Connectivity
  – Inter-node Connectivity
• 10 Gb/s Ethernet Connectivity
  – Data center connectivity
Full Rack Configuration Only

Oracle NoSQL DB Licensing
Community vs Enterprise Edition
• Two versions
  – Oracle NoSQL Database Community Edition. Open Source. AGPL license.
  – Oracle NoSQL Database Enterprise Edition. Closed Source. Standard Oracle License.
• Community Edition has all of the basic functionality and APIs. Gets you started. Competes with other OS NoSQL solutions.
• Enterprise Edition for large, production, multi-data center, Oracle integration centric customers and/or non-GPL compliant customers.

Benchmarking
Configurations
• YCSB-based benchmark (Yahoo! Cloud Serving Benchmark)
  – Key ~13 bytes, Data ~1.1K
• Configurations of 3 (1 x 3) to 192 (64 x 3) storage nodes
  – Replication factor of 3 (master + 2 replicas)
  – 100m to 2.1b records, 100m-400m records per storage node
  – Intel Systems: 2.93 GHz Intel Westmere (wds024c) model x5670, dual socket with 6 cores/socket, 24GB of memory, single 300GB local disk and RedHat 2.6.18-164.11.1.el5.crt1
  – Cisco Systems: UCS C200 M2 & UCS C210 M2 systems (Intel 5600s), dual socket with 6 cores/socket, 18GB of memory, 4, 8 or 16 disks for total of 8-16TB

Benchmarking
Configurations
Systems configured to minimize I/O overhead
• B-tree fits in memory: one I/O per record read
• Writes are buffered + log-structured storage system: fast write throughput
• GC and File System tuning to optimize throughput

Oracle NoSQL DB API
CRUD Operations
[...] indicates optional args

put(Key, Value, [Durability, timeout])
putIfAbsent(K, V, [Durability, timeout])
putIfPresent(K, V, [Durability, timeout])
putIfVersion(K, V, Version, [Durability, timeout])
get(Key, [Consistency, timeout])
delete(Key, [Durability, timeout])
deleteIfVersion(Key, Version, [Durability, timeout])
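The conditional variants are easier to follow with a toy stand-in. The sketch below is an in-memory model written for illustration, not the product's client library: the class name, the snake_case method names, the integer versions and the return values (a new version or `None` for puts, a boolean for deletes) are assumptions, and the optional Durability, Consistency and timeout arguments are left out.

```python
import itertools

class ToyKVStore:
    """In-memory stand-in that mimics conditional put/delete semantics."""

    def __init__(self):
        self._data = {}                      # key -> (value, version)
        self._versions = itertools.count(1)  # monotonically increasing versions

    def _write(self, key, value):
        version = next(self._versions)
        self._data[key] = (value, version)
        return version

    def put(self, key, value):
        return self._write(key, value)

    def put_if_absent(self, key, value):
        return None if key in self._data else self._write(key, value)

    def put_if_present(self, key, value):
        return self._write(key, value) if key in self._data else None

    def put_if_version(self, key, value, version):
        current = self._data.get(key)
        if current is None or current[1] != version:
            return None  # someone else changed (or removed) the record
        return self._write(key, value)

    def get(self, key):
        return self._data.get(key)  # (value, version) or None

    def delete(self, key):
        return self._data.pop(key, None) is not None

    def delete_if_version(self, key, version):
        current = self._data.get(key)
        if current is None or current[1] != version:
            return False
        del self._data[key]
        return True
```

The version-checked calls give a read-modify-write pattern: read a record and its version, compute the new value, then write only if the version is unchanged, retrying otherwise.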

Oracle NoSQL DB API
Iteration Operations

iterator(Direction, int batchSize, [Key parentKey, KeyRange subRange, Depth, [Consistency, timeout]]) → Iterator
keysIterator(Direction, int batchSize, [Key parentKey, KeyRange subRange, Depth, [Consistency, timeout]]) → Iterator

Oracle NoSQL DB API
Sub-key “Multi” Operations

execute(List, [Durability, timeout]) → List
multiGet(K, KeyRange, Depth, [Consistency, timeout]) → SortedMap
multiGetKeys(K, KeyRange, [Consistency, timeout]) → SortedSet
multiDelete(K, KeyRange, Depth, [Durability, timeout]) → int
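The sub-key operations act on all records under one major key, optionally narrowed to a range of minor keys. A minimal sketch over a plain dictionary follows; the function names, the single-string minor keys, the inclusive range bounds and the omission of Depth, Consistency, Durability and timeout are all simplifying assumptions.

```python
def multi_get(store, major, lo=None, hi=None):
    """Return {minor: value} under `major` with lo <= minor <= hi, sorted."""
    hits = {
        minor: value
        for (maj, minor), value in store.items()
        if maj == major
        and (lo is None or minor >= lo)
        and (hi is None or minor <= hi)
    }
    return dict(sorted(hits.items()))

def multi_get_keys(store, major, lo=None, hi=None):
    """Return the matching minor keys in sorted order."""
    return list(multi_get(store, major, lo, hi))

def multi_delete(store, major, lo=None, hi=None):
    """Delete the matching records and return how many were removed."""
    doomed = multi_get_keys(store, major, lo, hi)
    for minor in doomed:
        del store[(major, minor)]
    return len(doomed)

# Hypothetical sample data keyed by (major, minor).
db = {
    ("user42", "address"): b"a",
    ("user42", "email id"): b"e",
    ("user42", "phone #"): b"p",
    ("user77", "address"): b"z",
}
```

The return shapes follow the slide: a sorted mapping for `multiGet`, a sorted collection of keys for `multiGetKeys`, and a count for `multiDelete`.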

Oracle NoSQL DB API
Hadoop Integration
• KVInputFormat class – Hadoop InputFormat class for reading data from Oracle NoSQL DB
• Static Methods:
  – setKVHelperHosts(String[] kvHelperHosts)
  – setKVStoreName(String kvStoreName)
  – setParentKey(Key parentKey)
  – setBatchsize(int batchSize)
  – setConsistency(Consistency consistency)
  – setDepth(Depth depth)
  – setDirection(Direction direction)
  – setSubRange(KeyRange subRange)
