Oracle NoSQL Database Overview
David Segleau, Director, Product Management
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
• NoSQL Overview
• Oracle NoSQL Database
  – Architecture
  – Technical Overview
  – Benchmark Results
  – Use Cases
NoSQL – A Brief History
• Early 2000s: Web 2.0 companies started looking for “RDBMS alternatives”
• 2003: memcached (cached k-v store to reduce load on RDBMS)
• 2004: Google published MapReduce distributed processing paper
• 2006: Google published BigTable distributed database paper
• 2007: Amazon published Dynamo paper
• 2008+: Several open source projects are launched to productize NoSQL solutions
• 2009+: Local meetings to discuss and share RDBMS alternatives
• 2010+: Enterprises start to investigate NoSQL solutions
RDBMS vs NoSQL
• RDBMS
  – High value, high density, complex data
  – Complex data relationships
  – Schema-centric
  – Designed to scale up & out
  – Lots of general purpose features/functionality
  – High overhead ($ per operation)
• NoSQL architectures
  – Low value, low density, simple data
  – Very simple relationships
  – Schema-free, unstructured or semi-structured data
  – Distributed storage and processing
  – Stripped down, special purpose data store
  – Lower overhead ($ per operation)
What is NoSQL?
• Not-only-SQL (2009)
• Broad class of non-relational database systems that typically
  – Provide horizontal/distributed scalability
  – Avoid joins
  – Have relaxed consistency guarantees
  – Don’t require a structured schema
  – Are application/developer-centric
• No standards
  – Rapidly evolving set of solutions (100+ on nosql-database.org)
  – Highly variable feature set
  – UnQL launched in July 2011, still a thought experiment
• Majority are open source
What problems does NoSQL try to address?
• Cost
  – TBs to PBs of low/unknown value, simple/unstructured data
  – Lower $ per operation (hardware and RDBMS license fees)
• Scalability
  – Scale out, don’t scale up
• Flexible schema
  – Diverse, changing data sets
• Performance
  – High rate of data capture
  – High volume of simple queries
  – Eliminate ORM overhead
• Availability
  – Low cost, highly available, distributed data store
  – Move CAP more towards AP rather than CA
The NoSQL Challenge
Where to Start
• New, rapidly emerging database technology
• Simple data storage, typically non-SQL or Not-only-SQL
• Distributed (Cloud) storage
• Large amounts of data (Terabyte – Petabyte range)
• Solution categories
  – Storage for “Web Service” applications (our focus is here)
  – ETL Processing (MR & Hadoop) (and we integrate here)
• Common data models
  – Key-Value (our focus is here)
  – Document, Columnar, Graph
Oracle NoSQL Database
Target Use Cases
• High-throughput event processing
• Customer profile management
• Click-through data processing
• Sensor & statistics data capture
• Social networks
• Personalization
• Mobile application backend infrastructure
• Authentication & Content management
• Archiving
Common characteristics: simple queries, dynamic schema, high volume data interaction
Customer-Driven Requirements
• Terabytes to petabytes of unstructured or semi-structured data
• No single point of failure
• Cost effective, distributed storage. Scalable on commodity hardware
• Fast, predictable response time to simple queries
• Fast, reliable transactions
• Simple administration, enterprise support
• Commercial-grade NoSQL solution
  – Real 24x7 support
  – Real database expertise
  – Large vendor & dedicated resources building & testing the software
Oracle NoSQL Database
A Distributed, Scalable Key-Value Database
• Simple Data Model
• Small, distributed footprint
• Highly scalable, available
• Transparent load balancing
• Integrates with Oracle Stack
[Diagram: Applications, each with a NoSQL Database Driver, connected to Storage Nodes in Datacenter A and Datacenter B]
Architecture Summary
Scalable, Highly Available, Optimized
• Scalability
  – Dynamic data partitioning and distribution
  – Optimized data access via intelligent driver
• High availability
  – One or more replicas
  – Resilient to partition master failures
  – No single point of failure
  – Disaster recovery through location of replicas
• Transparent load balancing
  – Reads from master or replicas
  – Driver is network topology & latency aware
Simple Data Model
Key-value pairs
• Simple data model – key-value pair (major+minor-key paradigm)
• Simple operations – read/insert/update/delete, read-modify-write (RMW) support
• Scope of transaction – records within a major key, single API call
• Unordered scan of all data (non-transactional)
• Example record layout
  – Major key (Strings): userid
  – Minor key (Strings): subscriptions, address, email id
  – Value (Byte Array): expiration date, phone #
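The major+minor key layout can be pictured with a small in-memory model. This is a hedged Python sketch, not the Oracle NoSQL Database API: the `KVStore` class, its method names and the sample user ids are assumptions made for illustration. It shows records grouped under a major key and addressed within that group by a minor key, with values stored as bytes.

```python
from collections import defaultdict


class KVStore:
    """Toy in-memory model of a major+minor key store (hypothetical, not the Oracle API)."""

    def __init__(self):
        # major key (tuple of strings) -> {minor key (tuple of strings): value bytes}
        self._data = defaultdict(dict)

    def put(self, major, minor, value: bytes):
        self._data[tuple(major)][tuple(minor)] = value

    def get(self, major, minor):
        # Returns None when either the major or the minor key is unknown.
        return self._data.get(tuple(major), {}).get(tuple(minor))

    def records_for(self, major):
        """All records sharing one major key, sorted by minor key."""
        return sorted(self._data.get(tuple(major), {}).items())


store = KVStore()
store.put(["user42"], ["address"], b"1 Main St")
store.put(["user42"], ["email id"], b"u42@example.com")
store.put(["user99"], ["address"], b"2 Side Rd")
```

Grouping by major key is what makes the "records within a major key" transaction scope natural: everything for one user lives together.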
Simple Data Model
ACID Transactions
• ACID transactions by default
• Transaction Scope
  – Single API call
  – All records must have the same major key
  – Support for multiple operations within a transaction
• Can be relaxed for increased performance on a per-operation basis
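The transaction scope described above (one call, several operations, one major key) can be sketched as follows. This is an illustrative Python model under stated assumptions, not the product's implementation: the `execute` function, the tuple-based operation format and the dict-backed store are all hypothetical.

```python
def execute(store: dict, operations):
    """Apply ("put" | "delete", major, minor, value) operations all-or-nothing.

    `store` maps (major, minor) -> value. Hypothetical helper for illustration.
    """
    majors = {op[1] for op in operations}
    if len(majors) > 1:
        raise ValueError("all operations must share the same major key")

    staged = dict(store)  # work on a copy so a failure leaves `store` untouched
    for kind, major, minor, value in operations:
        if kind == "put":
            staged[(major, minor)] = value
        elif kind == "delete":
            if (major, minor) not in staged:
                raise KeyError((major, minor))
            del staged[(major, minor)]
        else:
            raise ValueError(f"unknown operation {kind!r}")

    # Every operation succeeded: publish the staged state.
    store.clear()
    store.update(staged)


profile = {("user42", "address"): b"1 Main St"}
execute(profile, [
    ("put", "user42", "phone #", b"555-0100"),
    ("delete", "user42", "address", None),
])
```

A list that mixes major keys, or one whose delete targets a missing record, raises before anything is published, so the store is never left half-updated.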
Simple Data Model
ACID Transactions – Configurability
• Configurable Durability Policy
• Configurable Consistency Policy
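The slide does not spell out the policy options, so the following is only a hedged Python sketch of what "configurable" can mean. It assumes three replica acknowledgement levels (none, simple majority, all) and two read consistency levels (absolute, none required); the enum names and the rule that the master counts toward the majority are assumptions of this sketch, not a statement of the product's API.

```python
from enum import Enum


class AckPolicy(Enum):
    """Assumed write-durability levels: how many replicas must acknowledge."""
    NONE = "none"
    SIMPLE_MAJORITY = "simple_majority"
    ALL = "all"


class ReadConsistency(Enum):
    """Assumed read-consistency levels."""
    ABSOLUTE = "absolute"            # must read from the master
    NONE_REQUIRED = "none_required"  # any replica may answer


def replica_acks_needed(policy: AckPolicy, replication_factor: int) -> int:
    """Replica acknowledgements the master waits for before confirming a write."""
    if policy is AckPolicy.NONE:
        return 0
    if policy is AckPolicy.ALL:
        return replication_factor - 1
    # Majority of the group is rf // 2 + 1 nodes; the master itself counts as one.
    return replication_factor // 2


def can_serve_read(policy: ReadConsistency, is_master: bool) -> bool:
    return is_master or policy is ReadConsistency.NONE_REQUIRED
```

With a replication factor of 3 (master + 2 replicas), a simple-majority write waits for one replica, while an all-replicas write waits for both; relaxing either policy trades guarantees for latency on a per-operation basis.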
Scalability and Availability
• Replicated Application Servers
• Driver is linked into each Application
• Storage Nodes kept current via replication (Berkeley DB Java Edition HA)
• Storage Nodes across Data Centers
• Automatic SN failure handling
  – Graceful degradation
  – Automatic recovery
• No Single Point of Failure
High Availability
• Automatic log-based replication
• Storage Node Failure
  – Node failures automatically detected, system continues to function
  – Rejoining nodes automatically synchronize with the master
  – Isolated nodes can still service reads
• Master Failover
  – Automatic election of new master, distributed 2-phase election algorithm (Paxos)
  – Master election based on highest LSN (log sequence number)
• Multi-node or Shard (replication group) failure
  – System continues to function using remaining replication groups
• System automatically maintains group membership and status
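The "highest LSN wins" criterion can be illustrated with a few lines of Python. This sketch is not Paxos and not the product's election code: it only shows the selection rule, assuming that a majority of the replication group must be reachable and that ties are broken by node name. The `elect_master` function and the node names are hypothetical.

```python
def elect_master(reachable_lsns: dict, group_size: int):
    """Pick a new master among reachable nodes, or None without a quorum.

    `reachable_lsns` maps node name -> last log sequence number it has applied.
    """
    quorum = group_size // 2 + 1
    if len(reachable_lsns) < quorum:
        return None  # too few nodes to hold an election
    # The most up-to-date node wins; the name tie-break is an assumption of this sketch.
    return max(reachable_lsns, key=lambda name: (reachable_lsns[name], name))


# The old master sn1 has failed; sn3 has replayed more of the log than sn2.
new_master = elect_master({"sn2": 1040, "sn3": 1075}, group_size=3)
```

Choosing the node with the highest LSN minimizes the amount of acknowledged work that could be lost in a failover.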
Transparent Load Balancing
• Request from Application to NoSQL DB Driver: Operation + Key[M,m] + Value + Transaction Policy
• Driver steps
  – Hash Major Key to determine Partition ID
  – Partition Map maps Partition ID to a shard
  – State Table maps a shard to Storage Node(s)
  – Load Balancer selects best eligible Storage Node
  – Contact Storage Node directly
• Response from Storage Node
  – Operation result
  – Partition Map changes
  – Storage Node stats
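The driver's routing steps can be sketched in Python as below. Everything concrete here is an assumption for illustration: the MD5-based hash, the 12 partitions, the three shards, the node names and the latency figures are invented, and "best eligible" is taken to mean the lowest recent latency, with writes restricted to the shard's master.

```python
import hashlib

N_PARTITIONS = 12  # assumed; fixed when the store is created in this sketch


def partition_id(major_key: str) -> int:
    """Step 1: hash the major key to a partition id (the hash choice is an assumption)."""
    digest = hashlib.md5(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % N_PARTITIONS


# Step 2: partition map, partition id -> shard.
partition_map = {p: f"shard{p % 3}" for p in range(N_PARTITIONS)}

# Step 3: state table, shard -> [(storage node, is_master, recent latency in ms)].
state_table = {
    "shard0": [("sn1", True, 2.0), ("sn2", False, 0.8), ("sn3", False, 1.1)],
    "shard1": [("sn4", True, 0.9), ("sn5", False, 1.5), ("sn6", False, 1.2)],
    "shard2": [("sn7", False, 1.0), ("sn8", True, 1.3), ("sn9", False, 0.7)],
}


def route(major_key: str, is_write: bool) -> str:
    """Steps 4-5: choose the storage node to contact directly."""
    shard = partition_map[partition_id(major_key)]
    eligible = [n for n in state_table[shard] if n[1] or not is_write]
    return min(eligible, key=lambda n: n[2])[0]
```

Because the partition map and state table live in the driver, each request reaches the right storage node in a single hop; the response carries map changes and node statistics so the driver's view stays current.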
Simple Administration
• Web-based console and CLI commands
• Manages and Monitors
  – Configuration changes
  – Load: Number of operations, data size
  – Performance: Latency, throughput. Min, max, average, trailing, …
  – Events: Failover, recovery, load distribution
  – Alerts: Failure, poor performance, …
Oracle NoSQL Database Differentiation
• Integrates seamlessly with Oracle Stack (ODI, CEP, OLH)
• Commercial Grade Software and Support
  – General Purpose
  – Reliable – Based on proven Berkeley DB JE HA
  – Easy to Install & Configure
• Scalability and Availability
  – Intelligent Oracle NoSQL DB Driver
  – Evenly distributes data
  – Sends operation to fastest node
  – Bounded network hops for all operations
  – Automatic replication and failover
• Simple Administration
  – Web-based Console and CLI commands
  – Manages and Monitors: Topology, Load, Performance, Events, Alerts
• Simple Data Model
  – Simple Major + Minor key and Value data structure
  – ACID transactions
  – Configurable consistency and durability
Benchmarking
• 1.6 billion records
• 94K insert/sec
• 25K read/update/sec
• Low latency
• Linear scalability
Oracle NoSQL Database Use Cases
Success Stories
• [logo] provides PaaS for deploying applications over the cloud.
  – Oracle NoSQL Database exposed as a service through their cloud infrastructure.
• [logo], Oracle Platinum Partner, built an online gaming application for their customer using Oracle NoSQL Database.
Oracle Confidential
Oracle NoSQL DB Use Cases
Cloud e-mailing Service
• Problem: Manage e-mail accounts for 10s of millions of customers and hundreds of Terabytes of data.
• Requirements:
  – Fast, scalable, flexible data management solution
  – Highly available, easy to manage & monitor
• Solution: NoSQL DB
Oracle NoSQL DB Use Cases
Cloud Architecture Services
• Problem: Cloud-based infrastructure requires support services like Authentication, Authorization, Event Tracking
• Requirements:
  – Real time performance and high throughput
  – Simple data structures
• Solution: NoSQL DB
Oracle NoSQL DB Use Cases
Customer data aggregation, trend analysis
• Problem: Need to preserve OCEP event history. Aggregated customer experience data can be used to identify trends, offer promotions, provide better insight and customer service.
• Requirements:
  – Rich, flexible customer profile
  – Aggregate and store discrete OCEP event data over time
• Solution: OCEP + NoSQL DB
Oracle NoSQL Database
• Easy to use, easy to manage
• Scalable, Available, Predictable Latency
• A NoSQL Database from a vendor you trust
Oracle NoSQL DB Resources
Support
• Support via OTN forums and Oracle Support process
• OTN Forum:
  – Forum Home » Big Data » NoSQL Database
  – forums.oracle.com/forums/forum.jspa?forumID=1388
• Oracle.com:
  – www.oracle.com/us/products/database/nosql/overview/index.html
• OTN (including documentation and download):
  – www.oracle.com/technetwork/products/nosqldb/overview/index.html
Oracle NoSQL DB Resources
Documentation
• On OTN and in download
  – docs.oracle.com/cd/NOSQL/html/index.html
• Getting Started Guides
• Programmatic API
• Installation & Release Notes
• FAQ
Oracle Big Data DB Resources
External
• Big Data on O.com: http://www.oracle.com/us/technologies/big-data/index.html
• Big Data on OTN: http://www.oracle.com/technetwork/topics/bigdata/learnmore/index.html
  – Start here: “Big Data Essentials” webinar series
Questions
Q&A
APPENDIX
What is Big Data?
• Example sources: geodata, blogs, smart meters
• Characteristics: VOLUME, VELOCITY, VARIETY, VALUE
Why is Big Data important?
• US Health Care: increase industry value per year by $300 B
• US Retail: increase net margin by 60+%
• Manufacturing: decrease development and assembly costs by 50%
• Global Personal Location Data: increase service provider revenue by $100 B
• Europe Public Sector Admin: increase industry value per year by €250 B
“In a big data world, a competitor that fails to sufficiently develop its capabilities will be left behind.”
Source: McKinsey Global Institute: Big Data – The next frontier for innovation, competition and productivity (May 2011)
Big Data Lifecycle
ACQUIRE → ORGANIZE → ANALYZE → DECIDE
Make Better Decisions Using Big Data
Oracle Big Data Software Platform
[Diagram: components arranged across the ACQUIRE, ORGANIZE, ANALYZE and DECIDE stages]
• Engineered systems: Big Data Appliance, Exadata, Exalytics, linked by InfiniBand
• Components: Hadoop, Oracle NoSQL Database, Open Source R, Applications, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Database, Data Warehouse, In-Database Analytics, Oracle Advanced Analytics
• Analytic Applications: Alerts, Dashboards, MD Analysis, Reports, Query, Web Services, BI Abstraction
Oracle Engineered Systems for Big Data
• Systems: Big Data Appliance, Exadata, Exalytics
• Lifecycle stages covered: ACQUIRE, ORGANIZE, ANALYZE, DECIDE
Big Data Use Cases
• Healthcare
  – Today’s challenge: Expensive office visits
  – New data: Remote patient monitoring
  – What’s possible: Preventive care, reduced hospitalization
• Manufacturing
  – Today’s challenge: In-person support
  – New data: Product sensors
  – What’s possible: Automated diagnosis, support
• Location-Based Services
  – Today’s challenge: Based on home zip code
  – New data: Real time location data
  – What’s possible: Geo-advertising, traffic, local search
• Utilities
  – Today’s challenge: Complex distribution grid
  – New data: Detailed consumption statistics
  – What’s possible: Increased availability, reduced cost, tiered metering plans
• Retail
  – Today’s challenge: One size fits all marketing
  – New data: Social media
  – What’s possible: Sentiment analysis, segmentation
Big Data Characteristics
• Batch-Oriented
  – Process data to use
  – Bulk storage
  – Write once, read all
• Real-Time
  – Deliver a service
  – Fast access to specific record
  – Read, write, delete, update
Big Data Storage Choices
• Hadoop Distributed File System (HDFS)
  – File System
  – Parallel scanning
  – No inherent structure
  – High volume writes
  – Batch Oriented
• Oracle NoSQL Database
  – Database
  – Indexed storage
  – Simple data structure
  – High volume random reads and writes
  – Real-Time
Early Adopter Dilemma
• Time to Build?
• Expertise?
• Cost and Difficulty Maintaining?
• Product Support?
Oracle Big Data Appliance Hardware
Full Rack Configuration Only
• 18 Sun X4270 M2 Servers per BDA
  – 864 GB memory
  – 216 cores
  – 648 TB storage
• 40 Gb/s InfiniBand Fabric
  – Inter-rack Connectivity
  – Inter-node Connectivity
• 10 Gb/s Ethernet Connectivity
  – Data center connectivity
Oracle NoSQL DB Licensing
Community vs Enterprise Edition
• Two versions
  – Oracle NoSQL Database Community Edition. Open Source. AGPL license.
  – Oracle NoSQL Database Enterprise Edition. Closed Source. Standard Oracle License.
• Community Edition has all of the basic functionality and APIs. Gets you started. Competes with other open source NoSQL solutions.
• Enterprise Edition for large, production, multi-data center, Oracle integration centric customers and/or non-GPL compliant customers.
Benchmarking Configurations
• YCSB-based benchmark (Yahoo! Cloud Serving Benchmark)
  – Key ~13 bytes, Data ~1.1K
• Configurations from 3 (1 x 3) to 192 (64 x 3) storage nodes
  – Replication factor of 3 (master + 2 replicas)
  – 100m to 2.1b records, 100m-400m records per storage node
  – Intel Systems: 2.93 GHz Intel Westmere (wds024c) model x5670, dual socket with 6 cores/socket, 24GB of memory, single 300GB local disk and RedHat 2.6.18-164.11.1.el5.crt1
  – Cisco Systems: UCS C200 M2 & UCS C210 M2 systems (Intel 5600s), dual socket with 6 cores/socket, 18GB of memory, 4, 8 or 16 disks for total of 8-16TB.
Benchmarking Configurations
Systems configured to minimize I/O overhead
• Btree fits in memory → one I/O per record read
• Writes are buffered + log structured storage system → fast write throughput
• GC and File System tuning to optimize throughput
Oracle NoSQL DB API
CRUD Operations
[...] indicates optional args
• get(Key, [Consistency, timeout])
• put(Key, Value, [Durability, timeout])
• putIfAbsent(K, V, [Durability, timeout])
• putIfPresent(K, V, [Durability, timeout])
• putIfVersion(K, V, Version, [Durability, timeout])
• delete(Key, [Durability, timeout])
• deleteIfVersion(Key, Version, [Durability, timeout])
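The conditional variants of put and delete are what make read-modify-write safe without locks. The Python sketch below models their semantics on an in-memory dict. It is an illustration only, not the Oracle NoSQL DB driver: the `VersionedStore` class, the snake_case method names, the integer versions and the return values are assumptions, and the optional Durability, Consistency and timeout arguments are left out.

```python
import itertools


class VersionedStore:
    """Toy model of conditional put/delete semantics (hypothetical, not the Oracle API)."""

    def __init__(self):
        self._data = {}                       # key -> (value, version)
        self._versions = itertools.count(1)   # monotonically increasing version numbers

    def _write(self, key, value):
        version = next(self._versions)
        self._data[key] = (value, version)
        return version

    def get(self, key):
        return self._data.get(key)            # (value, version) or None

    def put(self, key, value):
        return self._write(key, value)        # unconditional write

    def put_if_absent(self, key, value):
        return None if key in self._data else self._write(key, value)

    def put_if_present(self, key, value):
        return self._write(key, value) if key in self._data else None

    def put_if_version(self, key, value, version):
        current = self._data.get(key)
        if current is None or current[1] != version:
            return None                       # someone else wrote in between
        return self._write(key, value)

    def delete(self, key):
        return self._data.pop(key, None) is not None

    def delete_if_version(self, key, version):
        current = self._data.get(key)
        if current is None or current[1] != version:
            return False
        del self._data[key]
        return True


store = VersionedStore()
v1 = store.put("user42/address", b"1 Main St")

# Read-modify-write: the update only succeeds if the record is unchanged since the read.
value, version = store.get("user42/address")
v2 = store.put_if_version("user42/address", value + b", Apt 2", version)

# A writer still holding the old version is rejected instead of overwriting.
stale = store.put_if_version("user42/address", b"other", v1)
```

A caller whose conditional write is rejected would typically re-read the record and retry.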
Oracle NoSQL DB API
Iteration Operations
• iterator(Direction, int batchSize, [Key parentKey, KeyRange subRange, Depth, [Consistency, timeout]]) → Iterator
• keysIterator(Direction, int batchSize, [Key parentKey, KeyRange subRange, Depth, [Consistency, timeout]]) → Iterator
Oracle NoSQL DB API
Sub-key “Multi” Operations
• execute(List, [Durability, timeout]) → List
• multiGet(K, KeyRange, Depth, [Consistency, timeout]) → SortedMap
• multiGetKeys(K, KeyRange, [Consistency, timeout]) → SortedSet
• multiDelete(K, KeyRange, Depth, [Durability, timeout]) → int
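A multiGet-style call returns every record under one major key whose minor key falls in a range, sorted by key. The Python sketch below mimics that behaviour on a plain dict. The `multi_get` function, its inclusive `start`/`end` bounds and the sample data are assumptions for illustration; the Depth, Consistency and timeout arguments are not modelled.

```python
def multi_get(store: dict, major: str, start=None, end=None) -> dict:
    """Records under `major` with start <= minor key <= end, ordered by minor key.

    `store` maps (major, minor) -> value; a bound of None means "unbounded".
    """
    selected = {
        minor: value
        for (m, minor), value in store.items()
        if m == major
        and (start is None or minor >= start)
        and (end is None or minor <= end)
    }
    return dict(sorted(selected.items()))  # dicts keep insertion order, so this is sorted


profile = {
    ("user42", "address"): b"1 Main St",
    ("user42", "email id"): b"u42@example.com",
    ("user42", "subscriptions"): b"2013-01-01",
    ("user99", "address"): b"2 Side Rd",
}
everything = multi_get(profile, "user42")
subset = multi_get(profile, "user42", start="b", end="f")
```

Because all of these records share a major key, a real store can serve the whole call from one shard in a single round trip.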
Oracle NoSQL DB API
Hadoop Integration
• KVInputFormat class – Hadoop InputFormat class for reading data from Oracle NoSQL DB
• Static Methods:
  – setKVHelperHosts(String[] kvHelperHosts)
  – setKVStoreName(String kvStoreName)
  – setParentKey(Key parentKey)
  – setBatchsize(int batchSize)
  – setConsistency(Consistency consistency)
  – setDepth(Depth depth)
  – setDirection(Direction direction)
  – setSubRange(KeyRange subRange)