Oracle and NoSQL: Can They Co-Exist?
Southeastern Oracle Users Conference, Nov. 2011
Paul Stallard
http://dcri.org
Disclaimer
An introduction to NoSQL concepts, solutions, and use cases
Not an expert; area of personal interest
I'm biased, having spent most of my professional career working with RDBMS
NoSQL - What does it mean?
Non-SQL or "Not only SQL" data access methods
Distributed: designed to run on anywhere from a few to thousands of nodes
Schema-less: no predefined schema; records can have a variable number of fields, and the fields can differ from record to record
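To make "schema-less" concrete, here is a minimal Python sketch that models records as plain dictionaries. The customer records and the helper names `field_names` and `get_field` are hypothetical and do not come from any particular NoSQL product. The point is that nothing forces two records to share the same fields, so the application has to decide what a missing field means.

```python
# Hypothetical records in a schema-less store: each record is a mapping of
# field names to values, and no two records need to share the same fields.
customers = [
    {"id": 1, "name": "Ann", "email": "ann@example.com"},
    {"id": 2, "name": "Bob", "phones": ["555-0100", "555-0101"]},
    {"id": 3, "name": "Cy", "email": "cy@example.com", "loyalty_tier": "gold"},
]


def field_names(records):
    """Return the union of field names seen across all records."""
    names = set()
    for record in records:
        names.update(record.keys())
    return names


def get_field(record, field, default=None):
    """Read a field that may be absent; the application, not the database,
    decides what a missing field means."""
    return record.get(field, default)


emails = [get_field(c, "email", "<no email>") for c in customers]
```

In a relational design, every row of a `customers` table would carry the same columns, with NULLs where data is missing.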
Elasticity: storage and processing capacity can be added on the fly; when nodes are added, the database begins giving them work to do
Shared-nothing architecture: nodes use local resources (e.g., storage, memory) instead of a common shared pool such as a SAN; potential cost reductions from the use of commodity hardware
Sharding: records are partitioned into shards spread across nodes, and each shard is typically replicated; sharding can be handled automatically by the database or managed by the application
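One common way to assign records to shards is consistent hashing. It is used, for example, by Dynamo-style stores such as Riak, but the slides do not say which scheme any given product uses, so treat this as an assumed technique. The minimal Python sketch below places each node at several points ("virtual nodes") on a hash ring and routes each key to the first point clockwise from the key's hash. The next distinct nodes around the ring hold the replicas. The class `HashRing`, the node names, and the `vnodes=64` default are hypothetical. The usage at the bottom also shows the elasticity property: when a node is added, only some keys move, and every key that moves goes to the new node.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Stable 64-bit hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring (hypothetical helper, not a library API)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted positions on the ring
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place the node at `vnodes` pseudo-random points on the ring."""
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def replicas(self, key, n=3):
        """Primary owner of the key's shard, then the next distinct nodes clockwise."""
        start = bisect.bisect(self._points, ring_hash(key))
        owners = []
        for i in range(len(self._points)):
            node = self._owner[self._points[(start + i) % len(self._points)]]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners

    def node_for(self, key):
        return self.replicas(key, n=1)[0]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"post:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")                      # add capacity on the fly
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

A simpler `hash(key) % number_of_nodes` scheme would also shard the data. However, changing the node count would then remap almost every key, which is why ring-based schemes are popular in elastic clusters.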
Asynchronous replication: writes are typically performed locally, so they complete more quickly. Side effect: data is not necessarily replicated to other nodes immediately
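The side effect of asynchronous replication can be shown with a toy primary/secondary pair. In this Python sketch, `AsyncReplicatedStore` and `Replica` are hypothetical names, and `replicate()` stands in for whatever background process a real store would use. A write returns as soon as the local copy is updated, so a read from the secondary can briefly return stale data.

```python
from collections import deque


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class AsyncReplicatedStore:
    """Hypothetical store: writes hit the primary immediately and are queued
    for the secondary, which only catches up when replicate() runs."""

    def __init__(self):
        self.primary = Replica("primary")
        self.secondary = Replica("secondary")
        self._pending = deque()   # replication log not yet shipped

    def write(self, key, value):
        self.primary.data[key] = value        # fast local write; returns at once
        self._pending.append((key, value))    # shipped to the secondary later

    def read(self, key, replica):
        return replica.data.get(key)

    def replicate(self):
        """Ship queued writes in order (e.g., from a background timer)."""
        while self._pending:
            key, value = self._pending.popleft()
            self.secondary.data[key] = value


store = AsyncReplicatedStore()
store.write("post:1", "Bike for sale")
stale = store.read("post:1", store.secondary)   # not replicated yet
store.replicate()
fresh = store.read("post:1", store.secondary)   # now caught up
```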
ACID versus BASE
Relational Model
Atomicity: data modifications are all or nothing; if part of a transaction fails, the whole transaction fails
Consistency: transactions must leave the database in a consistent state, maintaining integrity rules such as constraints
Isolation: no transaction should be able to interfere with another concurrently running transaction
Durability: once a transaction has been committed, it cannot be lost
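Atomicity and consistency can be seen in a few lines using SQLite through Python's standard `sqlite3` module. SQLite stands in here for an RDBMS; it is not Oracle. The `account` table and the `transfer` function are hypothetical. The credit is applied before the debit, so when the debit violates the `CHECK` constraint, the earlier update has already happened and must be rolled back. The connection's context manager does exactly that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds as one transaction: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit to dst was undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)        # succeeds
failed = transfer(conn, "alice", "bob", 500)   # would overdraw alice, so nothing changes
balances = dict(conn.execute("SELECT id, balance FROM account"))
```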
NoSQL Model
Basically Available: most data is available most of the time
Soft state: data may be in a “relaxed” state of consistency; e.g., an inventory count may be retrieved that is not exactly up to date, but is close enough for the given application environment
Eventually consistent: nodes are not required to hold identical copies of the data at every moment; each update just needs to reach every node within “some reasonable” time period
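The inventory example can be sketched as two replicas that disagree for a while and then converge. This Python sketch assumes one specific reconciliation rule, last-writer-wins by timestamp, applied during a periodic exchange between replicas ("anti-entropy"). Real stores use a variety of strategies. The names `Node` and `anti_entropy` are hypothetical, and the sketch also assumes the nodes' timestamps are comparable.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A replica holding key -> (timestamp, value)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange state between two replicas, keeping the entry with the newer
    timestamp for each key (last-writer-wins). Repeated often enough, all
    replicas converge to the same values."""
    for key in set(a.data) | set(b.data):
        newest = max(e for e in (a.data.get(key), b.data.get(key)) if e is not None)
        a.data[key] = newest
        b.data[key] = newest


east, west = Node("east"), Node("west")
east.write("inventory:widget", 40, ts=1)
west.write("inventory:widget", 38, ts=2)   # a later update recorded on another node

before = (east.read("inventory:widget"), west.read("inventory:widget"))  # soft state
anti_entropy(east, west)
after = (east.read("inventory:widget"), west.read("inventory:widget"))   # converged
```

Last-writer-wins silently discards the older of two conflicting writes. Some stores instead keep both versions and let the application reconcile them.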
Good, Fast, Cheap…pick two
CAP Theorem: a distributed data store cannot guarantee all three of the following at once
Consistency: at any point in time, all nodes see exactly the same data
Availability: the database remains operable (requests still get responses) if any one node fails
Partition tolerance: nodes remain functional even when communication with other groups of nodes is lost
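The "pick two" trade-off shows up most clearly while a partition is in progress. A node cut off from its peers must either refuse requests, keeping consistency at the cost of availability ("CP"), or answer them and risk serving or accepting divergent data ("AP"). The Python sketch below is a deliberately simplified, hypothetical model of that single decision. `PartitionedReplica` and its `mode` flag are illustrative names, not any product's API.

```python
class PartitionedReplica:
    """Hypothetical replica that must choose between C and A while partitioned."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = 0
        self.partitioned = False   # True while peers are unreachable

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm that peers saw the write: refuse rather than diverge.
            raise RuntimeError("unavailable during partition")
        self.value = value         # AP: accept locally, reconcile with peers later
        return "ok"

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")
        return self.value          # AP: may be stale relative to other nodes


cp, ap = PartitionedReplica("CP"), PartitionedReplica("AP")
cp.partitioned = ap.partitioned = True

ap_result = ap.write(7)
try:
    cp_result = cp.write(7)
except RuntimeError as exc:
    cp_result = str(exc)
```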
Types of NoSQL Databases
Key/Value: provide a persistent mapping of keys to values
Limited in functionality, since the only efficient way to access a value is by its key
Most of the more common NoSQL data stores are key/value based, with additional mechanisms (such as secondary indexes) for accessing values
Examples: Berkeley DB (Oracle), SimpleDB (Amazon), Redis (supported by VMware), Riak (Basho Technologies)
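Python's standard-library `dbm` module is a simple persistent key/value store. It is used below only as a stand-in; it is not Berkeley DB or any of the products listed. The sketch shows both halves of the slide: lookup by exact key is direct, but a question about values, such as "which users are named Bob?", requires scanning every entry unless the store adds a secondary mechanism. The key naming scheme `user:<id>` is hypothetical.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kvstore")

    # The backend dbm picks (e.g., gdbm, ndbm, sqlite3, or dbm.dumb) depends on
    # the platform and Python version; keys and values are stored as bytes.
    with dbm.open(path, "c") as db:          # "c": create the file if missing
        db["user:1001"] = "Ann"              # str is encoded to bytes on write
        db["user:1002"] = "Bob"

    with dbm.open(path, "r") as db:          # reopen: the mapping persisted
        name = db["user:1001"].decode()      # fast path: lookup by exact key
        # No secondary index: finding users by value means a full scan.
        bobs = [k.decode() for k in db.keys() if db[k] == b"Bob"]
```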
“Big Table” (aka Tabular or Record-Oriented): similar to an RDBMS in that they contain tables and rows with column values
Differ from relational tables:
Each row can have a different set of columns
Can have far more columns than a typical relational table: thousands or even millions
Support compound values or keys
Rows are versioned, typically by a system-generated timestamp
Data is stored and managed in shards
Examples: BigTable (Google), Azure Tables (Microsoft), HBase (Apache Hadoop), SimpleDB (Amazon)
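Two of the features listed for "Big Table" stores, sparse rows and timestamped cell versions, can be modeled with nested dictionaries. In this Python sketch, the class `VersionedTable`, the row and column names, and the three-version limit are hypothetical. Each row stores only the columns it actually has. Each cell keeps a short list of `(timestamp, value)` pairs, newest first, so a read can ask for the latest value or the value as of an earlier time.

```python
import time
from collections import defaultdict


class VersionedTable:
    """Toy wide-column table: row key -> column -> [(timestamp, value), ...]."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.rows = defaultdict(dict)   # rows only hold the columns they use

    def put(self, row, column, value, ts=None):
        ts = time.time_ns() if ts is None else ts   # system-generated by default
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)   # newest first
        del versions[self.max_versions:]                   # drop old versions

    def get(self, row, column, ts=None):
        """Latest value, or the latest value written at or before ts."""
        for version_ts, value in self.rows.get(row, {}).get(column, []):
            if ts is None or version_ts <= ts:
                return value
        return None


t = VersionedTable()
t.put("user:1001", "name", "Ann", ts=100)
t.put("user:1001", "name", "Ann Smith", ts=200)
t.put("user:1001", "pref:theme", "dark", ts=150)   # column only this row has
t.put("user:1002", "name", "Bob", ts=120)
```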
Document-Oriented: storage and access optimized around documents as opposed to rows
JSON is the common document storage format
The core of these data stores can fall under the other categories, but their APIs are centered around document structure
Some provide a SQL-like query language
Examples: CouchDB (Apache, Couchbase), MongoDB (10gen)
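The Python sketch below parses a few hypothetical JSON documents and queries them by field. The documents, their fields, and the `find` helper are illustrative assumptions. The first query is roughly what a MongoDB filter such as `{"tags": "bike", "price": {"$lt": 200}}` expresses, but the code uses only Python's `json` module and a full scan, not any database's API. Note that nested data, like the seller, stays inside its document instead of being split into another table.

```python
import json

# Hypothetical documents as they might be stored: JSON text whose nested
# structure stays together, and whose fields vary from document to document.
raw_docs = [
    '{"_id": "p1", "title": "Bike for sale", "price": 120, "tags": ["bike", "sports"]}',
    '{"_id": "p2", "title": "Sofa", "price": 300, "seller": {"name": "Ann", "city": "Raleigh"}}',
    '{"_id": "p3", "title": "Road bike", "price": 450, "tags": ["bike"]}',
]
docs = [json.loads(d) for d in raw_docs]


def find(docs, predicate):
    """Return the documents that match; a real store would use indexes, not a scan."""
    return [d for d in docs if predicate(d)]


cheap_bikes = find(docs, lambda d: "bike" in d.get("tags", []) and d["price"] < 200)
raleigh = find(docs, lambda d: d.get("seller", {}).get("city") == "Raleigh")
```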
Graph: model associative data access, such as many-to-many relationships, as nodes and the edges between them
Examples: Neo4j
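Graph stores keep relationships as direct links between nodes, so a question like "who is within two hops of Ann?" becomes a traversal instead of a chain of self-joins on a link table. Neo4j expresses such queries in its Cypher language. The sketch below is plain Python, not a Neo4j API, and the people and edges are hypothetical. It stores an undirected many-to-many "knows" relationship as adjacency sets and runs a breadth-first search up to a hop limit.

```python
from collections import defaultdict, deque

# Hypothetical many-to-many "knows" relationships, stored as adjacency sets.
edges = [("ann", "bob"), ("bob", "cy"), ("cy", "dee"), ("ann", "eve"), ("eve", "cy")]
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)   # undirected: each edge is visible from both ends


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: map each person reachable from start to their
    distance in hops, up to max_hops (start itself excluded)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue   # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen


two_hops = within_hops(graph, "ann", 2)
```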
Craigslist
100 million documents (~60 days of posts) in the production MySQL database
Moving the archive of posts (2 billion documents) from MySQL to a NoSQL store
Motivations:
Decouple production and archive (when the production schema changes, the archive has to be changed as well)
Hardware failure tolerance
Better support for data growth (fewer RAID reconfigurations)
Conclusions
Understand the application domain and data requirements
Relational → design around your data; NoSQL → design around your application
Many NoSQL solutions are well suited to write-once, read-many scenarios
Career Prospects in NoSQL