Red Stack Tech Ltd James Anthony Technology Director

NoSQL Part 2

1

… a view from the top

Contents

Introduction……………………………………………………………..………………….…Page 3 Oracle NoSQL Database.……………………………………..………….…………….…Page 6 Consistency and Durability Guarantees ………………………………………….……Page 11 Versions and Consistency……………………………………………….…….………..Page 12 Oracle NoSQL Differentiators - ACID………………………………..……………..Page 13 What’s new in version 3?.……………………………………………………………....Page 14 Contact Red Stack Tech……………………………………………………….…………..Page 15

2

Introduction

In the first part of this series we discussed the basic types of NoSQL Database, what I don’t think I made clear at the time is that even where a database might fall into the category (take for example Cassandra and HBase in the column family database) they aren’t architected the same and may have different capabilities. This shouldn’t come as a major surprise as anyone in the RDBMS world knows Oracle and SQL Server are two vastly different beasts that have the paradigm of being Relational Databases in common. In this article I’ll discuss first CAP and ACID and how these were perceived as weaknesses of the RDBMS and led to the development of NoSQL databases, then we’ll dive in further to explore the KeyValue type and in particular Oracle’s implementation, the Oracle NoSQL Database. I’m pretty sure that most people reading this article have at least some familiarity with both CAP and ACID, but let’s just do a quick recap. CAP theorem is based on Consistency, Availability and Persistence, and deals specifically with distributed databases. Up until this point I’d not really talked too much about distribution of databases, so at this juncture it’s worth bringing this up. NoSQL databases are typically (although not always) used in a distributed manner, with database servers that are physically separated coupled together to form a single logical entity. Giving an example from the relational world, you could think of Oracle mastermaster replication as just one example of a distributed database system, with multiple geo-graphically separated databases acting as a single entity. The need for distribution arose because many of the NoSQL databases in use today come from large internet scale organisations, and therefore the ability to distribute databases across multiple data centres, in multiple countries was a primary goal to ensure a) high availability, b) global data distribution for both locality of service and data protection and c) suitable load balancing of work to ensure no single location/DC/Server represents a pinch point. In CAP Theorem we discuss a system having these three guarantees; Consistency: When a system is clustered/distributed the ability of all nodes within the distributed system to see the same data at the same time Availability: The ability of the system to service requests for data Partition (Tolerance): The ability of the distributed system to deal with loss of some part of the overall solution such as a message or node What you’ll notice about these is they are functions of the distributed database in general. A full debate on the proof of CAP theorem is beyond this article, in fact this is a debate that still rages for many people and 3

looks set to continue to do so. Briefly put, the view was/is that only 2 of these 3 tenants can be maintained at any time. Let me give you an example in the traditional Oracle world to better illustrate. Take an example of a 2-server configuration with plain old Oracle replication configured. In this environment we have our first decision to make, namely that over consistency. If we want both nodes to always see the same data then we need to configure synchronous replication (2 Phase Commit - 2PC) such that any transaction that gets committed on either node is immediately and synchronously replicated to the other node. If we don’t, and chose asynchronous replication instead, we have relaxed one of our guarantees immediately and we risk an “inconsistent’ view of the data depending on which node is queried. 1)

2)

3)

Figure 1

Transaction A comes in from user to the left hand side database – Because no 2PC is in place the transaction is queued for later delivery to the second database Shortly after the transaction commits on the left hand side a user connects to the right hand side database and queries the information. They are given the “before” image Asynchronous replication now occurs and the right hand side is updated, BUT our user has seen an inconsistent view of the data

So now we have 2PC configured, and we adhere to our consistency guarantee, but we’ve broken one of our other guarantees… that of Partition tolerance. Why? Because in order to preserve consistency across the nodes 2PC requires that the data is committed on both sides. Therefore loss of one side has an impact on the ability of the other side to be able to process, stopping it from doing so! If we take the alternative model and choose asynchronous replication, we can continue to operate if one side is down (and therefore achieve partition tolerance) but we lose our consistency guarantee in order to uphold this. Therefore the general principle is that in CAP theorem we can provide a “2 of 3” model but must relax one of the guarantees. Not being completely relevant at this point, but worth introducing is a phrase you’ll hear a lot of working in and around NoSQL solutions… “Eventual consistency”. Many NoSQL databases relax the consistency guarantee, instead offering to make the system consistent over a period of time, such as handling a failure of a node (or separation of multiple nodes due to network outage) by synchronising the data upon restoration and therefore making the data eventually consistent. For those interested I thoroughly recommend reading the Amazon Dynamo paper and hope that like me you’ll be suitably impressed by the elegance of semantic and syntactic reconciliation. http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf Whilst CAP refers to the system in general, ACID concerns itself with transactions within the system. 4

Atomicity: Each transaction is all or nothing. If you are updating 50 rows, then all 50 rows update or the entire transaction is rolled back. Each transaction is “atomic”, that is to say indivisible. Consistency: Unlike the C in CAP consistency here refers to moving from one valid state to another, such that the transaction committed does not violate integrity constraints or other defined rules. Isolation: This ensures that multiple concurrent executions of transactions will result in the same end state that would occur should those transactions be processed in a serial manner Durability: Once committed a transaction will be permanent, across failures such as power or system crash. The Oracle redo log write ahead logging model is an excellent implementation of this. Now that we’ve done a quick recap of the CAP and ACID properties we will discuss how these were perceived (and notice my use of that word as particularly relevant in my opinion) as weaknesses in the relational database that led to the development of the NoSQL movement that is so strong today. Going back to the example I gave regarding CAP and 2PC, it is easy to see how a 2PC model is hard to envisage as a production solution for a global database with high volumes of traffic. Not only would loss of one database (or isolation due to network or other factors) stop the entire system processing, but the impact of network latency on each and every transaction would inevitably become a pinch point. Much is made in many of the NoSQL papers about the issues with 2PC and the need to move to a different paradigm, but in fact I see this as less of an issue. Oracle has always had asynchronous replication with conflict resolution rules, and more recently Streams and GoldenGate provide even more elegant solutions, all of which can be considered to provide eventual consistency with the ability to provide partition tolerance. In many cases the perceived weaknesses were in comparison to the release of MySQL that was available at the time. None the less, and leaving the pseudo-religious arguments I would no doubt start by carrying on along the rebuttal line, let’s keep discussing these drawbacks and the mechanisms deployed by NoSQL to resolve this.

1) 2)

3)

1)

Figure 2 5

A “put” operation changes data (insert, update or delete) Replication occurs from the node receiving the put to the upper of the replicas. For some reason the lower replica is unavailable (outage or network partition). In this case due to network partition meaning the lower replica is still running Read (get) requests occur at both locations, notice how the lower replica will serve an older version of the data

At some later time the network partition is resolved and the NoSQL solution will resolve the consistency by replicating the change to the remaining replica. The system has become eventually consistent

INSERT SOME EVENTUAL CONSISTENCY DIAGRAMS A second technical issue that NoSQL databases focus on is the need for a less rigid structure for data models than is enforced by relational models. This has seen the rise of document databases such as MongoDB and CouchBase. The standard table structure with a relatively rigid column format was seen as restrictive to rapid development models. Many of these databases adhere to a document data model, which stores all information in the form of documents (where all the information about an entity, such as a person, is held within a single document -- an entirely de-normalised approach) have to sacrifice ACID properties and do away with transactions across multiple documents entirely! Others needed to cope with a variable number of columns in each row. The classic example being the number of links within a web page, each column represents a different page but it is unclear how many links a given page might contain -therefore a flexible number of columns is required. Interestingly many people may now be aware that Oracle are extending the database in 12.1.0.2 to support JSON document models, but with all the benefits of the Oracle RDBMS in terms of ACID support, back and recovery, management etc. Other NoSQL databases (perhaps most notably the BigTable based databases) needed both highly distributed and fault tolerant solutions, as well as dealing with massive data volumes and a data model that needed to incorporate a multi-dimensional element (time). Whatever your personal view on these restrictions, whether technology both software and hardware has improved to get around some of these, it is unquestionable that the NoSQL database is here to stay. In the next part of this article we will discuss the Oracle NoSQL implementation.

Oracle NoSQL Database

Oracle NoSQL Database is a KeyValue (KV) store, much in the same vain as Amazon Dynamo and Voldemort but with some significant advantages we will discuss later on. In the previous article in this series we discussed what KV storage looks like, so hopefully you can remember that far back! The salient points are: the value can be anything we like, a simple alphanumeric value (a name perhaps), a serialisation of a session state (a really good use case), or an object such as a JSON document (which I will use in many of my examples). Data is queried using a “get” command (passing in the key) and written to the database using a “put" statement (passing in the key and value obviously), there is no query language in the vein of SQL.

6

Architecture Firstly, a big thanks up front to the Oracle NoSQL product management team, I’m going to be using their illustrations throughout this to save me the need to recreate them. One of the things you’ll notice when you look at the docs is a lot more diagrams than the RDBMS has in its documentation these days, and I think that’s a great way to illustrate what is a new topic to most people. Within the Oracle NoSQL database the “database” is referred to as the KVStore (KeyValue Store), with the KVStore consisting of multiple components, but at high level the store is broken down into multiple storage nodes. A storage node is a server (or a VM) with local disks, so unlike RAC there is no need for a clustered file system, SAN or NAS -- you just provision local disks allowing deployment on truly commodity based hardware. Within each storage node are replication nodes, we’re going to discuss replication shortly, but all you need to know at this point is that the number of replication nodes within a storage node is determined by its capacity. This gives Oracle NoSQL the ability to run across a bunch of nodes that have different capacities (based on CPU and IO specs), meaning you can start small and grow the cluster out with newer hardware without worrying about the new kit being constrained by the metrics of the older servers. Going down one more level, each replication node hosts one or more partitions (almost always more than one partition). So, let’s get back to some NoSQL concepts and explain partitions and sharing in this context. Partitioning We’ve already discussed how NoSQL solutions are typically distributed, therefore the question becomes how does the system decide how to distribute data. This is called partitioning. When a value is stored a hashing algorithm is applied to the key and a partition ID is derived based on this. A few relevant points are: • A single partition will contain multiple keys • A single replication node can contain multiple partitions This last point is especially relevant, and you will want multiple partitions per node. Why? Well let’s say we start with a 4-node cluster and we define 4 partitions. As key/value pairs are inserted the hashing algorithm will equally balance keys between the different nodes, all good so far. Then we decide we want to scale out and add more nodes, so we bring it two more servers, but how can we spread our 4 partitions across these now 6 nodes? The answer is we can’t as the number of partitions is fixed. Compare this to a situation

7

where we defined 24 partitions, then in the initial 4-node cluster each node would have 6 partitions (4 nodes * 6 = 24). Then as we expand the cluster with two additional nodes, the partitions just move around and we end up with 4 partitions on each node (6 nodes * 4 = 24). These different storage nodes are referred to as shards (as we have separated the data physically with each node having access only to this shard of data). Replication The observant amongst you are probably already thinking “if the data isn’t shared how is it protected”? This is where replication in NoSQL solutions comes in (here we will discuss the Oracle NoSQL implementation, but it’s similar for most of the NoSQL solutions out there). Within the Oracle NoSQL Database you configure a replication factor for each KVStore, which controls the number of other storage nodes onto which the key/value pair will be copied (this is why we have replication nodes inside storage nodes). Take a look at Error! Reference source not found. From this you can see how data is copied to two other nodes from the master (more on this in a moment) based on a replication factor of 3.

Figure 3 Like some other NoSQL implementations the Oracle solution has a concept of a single write master for data. For each shard a master node is elected to which all writes are performed, the master node then copies the data to the replica nodes (later on we will discuss how this can be controlled). Whilst write traffic is performed against this single node (and remember this is a single node per shard, having multiple shards means we balance write activity for different keys across multiple nodes), reads can be performed against any replica in the shard, which allows us to horizontally scale read workloads. Given this is Oracle you probably expect it anyway, but just to state it explicitly, a failed master node will automatically be detected and one of the replica nodes will then become the new master, all transparently happening in the background.

8

By balancing multiple shards and multiple partitions we can ensure we have sufficient capacity for write activity and future expansion. A key feature of the Oracle NoSQL Database is the ability to horizontally scale read workloads using this method, and scaling is indeed linear in this fashion. Major/Minor Keys One of the key features of the Oracle NoSQL database is support for multi-part keys, with the ability to specify both major and minor components to the key. Let’s build an example to illustrate, based on one of our real world deployments. Imagine we are processing an incoming feed of information relating to sporting events. We will use soccer (football to us Brits!) as the game in question. The feed sends us real time information on the events happening within the game, such as a goal, free kick, throw in etc. and we want to process some type of action within our application based on this. Firstly the feed provides us with an ID for the tournament or competition in question (World Cup, Premier League etc.), allowing us to identify which tournament any incoming entry is for. Each incoming entry also has a unique identified for the given match/fixture (we can have multiple matches happening at once and clearly will have a large number of matches over time). We can see that we’ve got two parts to our key already: Part 1: The competition ID Part 2: The match ID Within the Oracle NoSQL database this is easy to process, and I’m going to use some pseudo java code to build the example; // We create an array of String (VARCHAR2 style values) for the key ArrayList majorComponents = new ArrayList(); // Define the major path components for the key majorComponents.add(competitionId); majorComponents.add(matchId); // Create the key Key myKey = Key.createKey(majorComponents); // Do some work here to define the value we store … // Now store the key value with a simple put request NoSQLDataAccess.getDB().put(myKey, ));

However within each match I now have multiple events coming in, each again with its own unique event ID. This is where I can use the minor key component, adding this as the minor key element:

9

ArrayList majorComponents = new ArrayList(); // This time we also have a minor components element ArrayList minorComponents = new ArrayList(); // Define the major and minor path components for the key majorComponents.add(competitionId); majorComponents.add(matchId); minorComponents.add(eventId); // Create the key Key myKey = Key.createKey(majorComponents, minorComponents); // Do some work here to define the value we store … // Now store the key value with a simple put request NoSQLDataAccess.getDB().put(myKey, );

So what’s the advantage of using the minor key like this? Well, let’s take the situation where I later down the line want to get ALL the events for a given match…. what I can do now is execute an operation to “get” the events using just the major key… majorComponents.add(competitionId); majorComponents.add(matchId); // Create the retrieval key Key myKey = Key.createKey(majorComponents); // Now retrieve the records. SortedMap myRecords = NoSQLDataAccess.getDB().multiGet(myKey);

Or perhaps I want to get all of the match IDs for a given tournament (for example as part of the process to show a historical fixture list), in this example I can do a get operation using only the first part of the major key majorComponents.add(competitionId); // We no longer need the following line for the 2nd part of the key // majorComponents.add(matchId); // Create the retrieval key Key myKey = Key.createKey(majorComponents); // Now retrieve the records. SortedMap myRecords = NoSQLDataAccess.getDB().multiGet(myKey);

I can also use the full major and minor keys, again using an example; in this case to check if we’ve already received this event for the given match in the given competition (in order to perform duplicate checking) ArrayList majorComponents = new ArrayList(); ArrayList minorComponents = new ArrayList(); // Define the major and minor path components for the key majorComponents.add("Game"); majorComponents.add(competitionId); majorComponents.add(matchId); // Add the Event ID as the minor Key minorComponents.add(eventId);

10

// Create the key Key myKey = Key.createKey(majorComponents, minorComponents); // Now retrieve the record. We use a single get as opposed to a // multi-get here as we only expect one value ValueVersion vv = NoSQLDataAccess.getDB().get(myKey);

A quick but very important note: The V3 release of the Oracle NoSQL Database provides a table mapping feature and much of this key design goes away! We’ll discuss just how powerful this is in a future article.

Consistency and Durability Guarantees

Going back to the discussion on replication one of the features of many NoSQL databases is that unlike a traditional relational database it is possible to choose the level of durability (write persistence) and consistency (read consistency) of the data at both system level, and then override this per operation level. Let’s explore how that works. Firstly remember back to when we discussed replication and replication factors, and in our example we had a replication factor of 3. This means once the data had been written to the master node, it will then be written to two additional replica nodes. Clearly doing these extra write operations has an overhead, especially if the nodes are separated by any distance due to network latency. In certain cases we may not want our transaction to wait for these additional write operations to complete, so we can tune our durability policy and change this to a different value. If we choose a durability value of 1, then once the data is written to the master node, the operation will return control to the calling program, with the replication happening in the background. In the Oracle NoSQL database we have 3 acknowledgement-based durability models   

Master Node only All Nodes (in the replica set) – basically enforcing synchronous replication A majority of nodes (in the replica set)

Additionally the durability policy in Oracle NoSQL also allows you to control the level of write persistence for the write operations to the master and replica nodes. You can do this by choosing whether the data is written to a) the local in-memory buffer, b) the OS buffer cache, or c) whether we wait for it to be written all the way to disk. We will show some of these examples in a later article, but suffice for now to say they offer a great deal of flexibility in trading off performance and durability. The ability to control these at the transaction level allows for certain operations to be performed in a “fail safe” mode, whilst others can sacrifice durability in favour of performance. 11

Versions & Consistency

Another concept of NoSQL databases that is slightly different to that of the traditional relational database is that of Versions. When we insert data into the KV store it is implicitly given a version in the system, let’s illustrate with an example, where we insert a KV pair with a Key A, value of B, and it is then given a version of 1, we can represent this as;

Now, a process comes along an updates the KV pair (which really means an update to the Value), we aren’t concerned with what the data has changed to, just that its version has changed;

So far so good, but how does this relate to the real world usage? Let’s take an example where we have relaxed our durability guarantees and we’re using asynchronous replication. It’s conceivable that we have different versions of the value for the key on different nodes;

Now when we read back the value for key A, depending on which node we get the data from we get different values (remember at the start of the article we discussed eventual consistency… well here is the downside!). So how do we manage this?

12

Well this is where versions come in! When we retrieve the data we can choose to check the version number, since “get” operations return both the data and the record version. We can then compare this version number to what we held for the insert operation and only proceed if we are working from the “current” version. Again, without repeating myself, I seriously recommend you go and read that Amazon Dynamo paper, as see how semantic and syntactic reconciliation work. Oracle NoSQL also allows for other consistency guarantees based on;

Absolute: Read from the master node. Unlike other NoSQL solutions with no pre-defined master this is an option. We know writes for a key will go through a master for the shard, so servicing requests from this will always return that latest, most consistent value. None: Read from any node, without concern as to the most recent state of the data Time Based: This is based on the lag (as defined by time) of any replica from the master (for example if the replica is no more than 1 second out)

Oracle NoSQL Differentiators - ACID

Oracle NoSQL database has quite a few differentiators in my eyes, these include; Enterprise Grade Support: Always a tricky subject whether to pay for something! However, one of the problems for many organisations in supportability, after all Google is a search engine not a support tool. Being able to fall back on Oracle support means organisations looking to deploy in the brave new NoSQL world might have to rely on Oracle, but they can rely on Oracle. Proven storage engine: One of the attractions to us when first deploying NoSQL was that we were already POC’ing based on the Berkeley DB as we know the track record of that. Oracle NoSQL uses Berkeley DB as its persistence engine. ACID support: For me this is the big one. Whilst I get why people wanted to work around ACID – often stating they don’t need transactions (remember most NoSQL databases sacrifice transaction support). However in my experience you might not need them now but you will at some point.

13

Integration: Not unique amongst NoSQL databases is Hadoop integration, but additionally the Oracle NoSQL database can integrate directly with the Oracle RDBMS (using external tables) and Coherence. Being able to cross the chasm from NoSQL to SQL world is just a great feature. Allowing me to query across my data sources, and stop me from just getting another silo. Transparent Load Balancing: Again perhaps not unique but certainly not prevalent in the NoSQL is the fact that the NoSQL driver provided by Oracle performs all my load balancing for me (something we’ll discuss in a future article) Free: Yep! You read that right. Oracle offer the NoSQL database in two flavours. The paid for version (which is actually not that bad by Oracle terms), but also the community edition (CE) which is totally free!

What’s new in version 3?

Oracle recently announced the availability of Oracle NoSQL Database V3. Whilst we’ve had some time to look at this we’ve not deployed this into any of our existing implementations, but we plan to soon! The reason? Some of the new features provide significant benefit including; 4) Increased Security: OS-independent, cluster-wide password-based user authentication and Oracle Wallet integration enables greater protection from unauthorized access to sensitive data. Additionally, session-level Secure Sockets Layer (SSL) encryption and network port restrictions deliver greater protection from network intrusion. 5) Usability and Ease of Development: Support for tabular data models simplifies application design and enables seamless integration with familiar SQL-based applications. Secondary indexing delivers dramatically improved performance for queries. 6) Data Center Performance Enhancements: Automatic failover to metro-area secondary data centers enables greater business continuity for applications. Secondary server zones can also be used to offload read-only workloads, like analytics, report generation, and data exchange for improved workload management.

14

Contact Red Stack Tech for more information…

UK Headquarters: 3rd Floor Farr House 27 – 30 Railway Street Chelmsford Essex England CM1 1QS Main: 0844 811 3600

Direct: 01245 200 510

Australia Headquarters: Suite 3 Level 19 141 Queen Street, Brisbane, QLD 4000 Main: +61 (0) 7 3210 0132 Email: [email protected] Web: www.redstk.com Follow Red Stack Tech on Twitter: @redstacktech Media Enquiries: Elizabeth Spencer [email protected] 01245 200 532 Red Stack Tech Ltd 3rd Floor Farr House 27 – 30 Railway Street Chelmsford Essex England CM1 1QS

15