# Introduction to NoSQL
-----------------------------------------------------------------

In previous sessions we have looked at use cases for relational database management systems (RDBMS), which predominantly make use of SQL. Today's session provides an overview of NoSQL databases. **NoSQL** can be understood to mean "no SQL" or, alternatively, "not only SQL." NoSQL databases are non-relational, which in the simplest terms means they do not organize data into sets of related tables.

Topics we will cover include:

* Differences between SQL and NoSQL databases
* Types of NoSQL databases and their use cases
* Document database basics with MongoDB
* Graph database basics with Neo4j

## Why NoSQL?

The highly structured table schemas and relationships of an RDBMS make it an excellent choice for applications that require maximum consistency and integrity, such as banking. However, integrity comes with overhead. Scalability is a particular problem when using a traditional RDBMS for certain types of web-scale applications, such as social media and some aspects of e-commerce.

Transactions in a traditional RDBMS are ideally compliant with **ACID** properties:

* Atomicity - transactions succeed completely or fail completely. That is, if a transaction consists of multiple operations, all operations must succeed for the transaction to succeed.
* Consistency - all data are valid, satisfying all rules and constraints. Every transaction leaves the database in a valid state.
* Isolation - transactions do not interfere with or affect each other.
* Durability - completed transactions are persistent.
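
Atomicity and consistency can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a banking implementation: the `accounts` table, the account names, and the `transfer` helper are all assumptions made up for this example. The second transfer violates a constraint after its first operation has already run, so the whole transaction is rolled back.

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
# The CHECK constraint is the rule that keeps the data valid.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move `amount` between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # both updates succeed
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
print(ok, failed, balances(conn))
```

After the failed transfer, the credit to `bob` is undone along with the rejected debit, so the balances remain 70 and 80.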

NoSQL systems may be ACID compliant, but are more often designed around an alternative set of **BASE** properties:

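* Basically Available - the database stays available and responds to requests most of the time, even if a response does not reflect the latest write.
* Soft state - not every transaction has to leave the database in a consistent state; stored values may change over time as updates propagate.
* Eventual consistency - the database becomes consistent at some point after a transaction, once updates have reached every copy of the data.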
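
Eventual consistency can be illustrated with a toy simulation. The `Primary` and `Replica` classes below are hypothetical and do not correspond to any particular database; real systems replicate over a network and handle conflicts, which this sketch ignores.

```python
class Replica:
    """A hypothetical read replica that applies queued writes when it syncs."""

    def __init__(self):
        self.data = {}
        self.pending = []

    def receive(self, key, value):
        # The write is queued, not applied immediately.
        self.pending.append((key, value))

    def sync(self):
        # Apply queued writes in the order they were received.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class Primary:
    """A hypothetical primary node that accepts writes and forwards them."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            replica.receive(key, value)


replica = Replica()
primary = Primary([replica])

primary.write("likes:post42", 10)
stale_read = replica.data.get("likes:post42")  # replica has not synced yet
replica.sync()
fresh_read = replica.data.get("likes:post42")  # replica has caught up
```

Between the write and the sync, a read from the replica returns `None` while the primary already holds `10`. For a like counter that brief disagreement is usually acceptable; for an account balance it would not be.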

As noted, there are times when ACID compliance is essential:

* Online banking
* Stock exchanges
* Airline passenger data

And there are other times when we are less concerned with consistency:

* Retweets and likes
* Blog edits

Other important features of NoSQL databases include:

* Schemaless - Much of the benefit of NoSQL systems comes from their flexibility with regard to data schema. Entities within a single NoSQL datastore may have different attributes and properties. New fields or attributes can often be added at any time. Fields can also be nested.
* (Sometimes) simplified queries - NoSQL databases typically do not implement foreign keys. Related data is often stored together in a single record, so many queries can be executed without the need for complex JOIN statements.
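
Both features can be sketched with plain Python dictionaries standing in for records. The `customers` collection and its field names are assumptions invented for this example, not a real schema.

```python
# Two records in one hypothetical "customers" collection.
customers = [
    {
        "_id": 1,
        "name": "Ada",
        "email": "ada@example.org",
        # Nested field: orders live inside the customer record.
        "orders": [
            {"item": "keyboard", "qty": 1},
            {"item": "cable", "qty": 3},
        ],
    },
    {"_id": 2, "name": "Grace", "orders": []},  # this record has no email field
]

# A new attribute can be added to one record without altering the others.
customers[1]["loyalty_tier"] = "gold"


def items_ordered(customer):
    """List ordered items by reading the nested field; no JOIN is involved."""
    return [order["item"] for order in customer["orders"]]


ada_items = items_ordered(customers[0])
fields_per_record = [sorted(customer) for customer in customers]
```

In a relational design the orders would normally sit in their own table and be joined to customers through a foreign key. Here they are read directly from the record that contains them.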

MongoDB provides an [example comparison of an RDBMS ERM and a NoSQL document](https://image.slidesharecdn.com/11-7rdbmsmigrationbestpractices-131107171711-phpapp02/95/webinar-relational-databases-to-mongodb-migration-considerations-and-best-practices-10-1024.jpg?cb=1383844707).

## Types of NoSQL Databases

There are a variety of database models and systems which implement NoSQL principles. [Wikipedia](https://en.wikipedia.org/wiki/NoSQL) provides a useful breakdown, but common types of NoSQL applications include *Column*, *Key-value*, *Document*, and *Graph* databases.

### Column store

Column databases store data in a column-oriented fashion, as opposed to the row-orientation of traditional RDBMS. Advantages include:

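* Computational speed - statistics on column data can be computed more quickly, because all values of a column are stored together.
* Compression - repeated values within a column can be stored compactly.
* Faster data access - queries that need only a few columns do not have to read the rest of each row.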
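
The difference between the two layouts, and the compression advantage, can be sketched in a few lines of Python. The weather-style data is invented, and run-length encoding is only one of several compression schemes a column store might use.

```python
from itertools import groupby

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"city": "Oslo", "temp": 3},
    {"city": "Oslo", "temp": 5},
    {"city": "Oslo", "temp": 4},
    {"city": "Lima", "temp": 19},
    {"city": "Lima", "temp": 21},
]

# Column-oriented layout: each field keeps all of its values together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# A statistic on one column touches only that column's values.
mean_temp = sum(columns["temp"]) / len(columns["temp"])


def run_length_encode(values):
    """Collapse runs of consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


encoded_city = run_length_encode(columns["city"])
```

Here the five `city` values collapse to two pairs, `("Oslo", 3)` and `("Lima", 2)`. The same trick would not help the row layout, where repeated values are separated by other fields.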

### Key-value store

Key-value databases work like hash tables: they consist of key-value pairs. The values are schemaless, and so may include a variety of data types and structures. Values typically cannot be queried directly, but lookups by key are fast.
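
A Python dictionary is itself a hash table, so it gives a rough feel for the trade-off. The keys and values below are invented for illustration, and `find_keys_by_value` is a hypothetical helper showing what a search on values would cost.

```python
# A dictionary standing in for a key-value store.
store = {}

# Values are schemaless: a nested structure, a string, and a number.
store["session:42"] = {"user": "ada", "cart": ["keyboard"]}
store["page:/home"] = "<html>...</html>"
store["counter:visits"] = 1024

# Lookup by key goes straight to the value.
visits = store["counter:visits"]


def find_keys_by_value(store, predicate):
    """Search on values: every entry has to be inspected in turn."""
    return [key for key, value in store.items() if predicate(value)]


matching_keys = find_keys_by_value(store, lambda value: value == 1024)
```

The key lookup does not depend on how many entries the store holds, whereas the value search scans all of them.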

### Document store

Document databases are optimized for the storage and retrieval of semi-structured documents. Because they are schemaless, different documents may have entirely different structures, fields, etc. An illustrative if not exact example is a tweet. All tweets will have a creator and a text field, along with a timestamp and other such metadata. But some tweets have hashtags, some have user mentions, some are geocoded, etc. All tweets fall under a single document type and tweets do have structural definitions, but they do not all have the same structure.

Document stores offer:

* Fast and simple querying via a key-value structure
* Structured text encodings (JSON, BSON, YAML, XML)
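
The tweet example can be sketched as a small collection of documents. The field names here are simplified assumptions and do not reproduce the actual structure of tweet data; `find` is a hypothetical helper, not a function from any database driver.

```python
import json

# Three documents of the same type with different optional fields.
tweets = [
    {
        "creator": "ada",
        "text": "Hello #nosql",
        "timestamp": "2019-01-01T09:00:00Z",
        "hashtags": ["nosql"],
    },
    {
        "creator": "grace",
        "text": "Thanks @ada",
        "timestamp": "2019-01-01T09:05:00Z",
        "mentions": ["ada"],
    },
    {
        "creator": "alan",
        "text": "At the lab",
        "timestamp": "2019-01-01T09:10:00Z",
        "geo": {"lat": 51.99, "lon": -0.74},
    },
]


def find(documents, field):
    """Return the documents that contain the given field."""
    return [doc for doc in documents if field in doc]


geocoded = find(tweets, "geo")

# Each document can be serialized to a structured text encoding such as JSON.
encoded = json.dumps(tweets[2], sort_keys=True)
decoded = json.loads(encoded)
```

Every document has `creator`, `text`, and `timestamp`, but only one has `geo`, so a query on that field returns a single tweet.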

### Graph store

Graph databases represent relationships as graphs of connected nodes. Graph databases provide a means to identify relationships without requiring resource-heavy indexing operations or complex queries. A common use case for graph databases is a recommendation engine. Another is network analysis: finding the shortest path between nodes, or measuring other characteristics of a network such as reciprocity and centrality.
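
The shortest-path use case can be sketched with a breadth-first search over an adjacency list. The small friendship network is invented for this example, and a graph database would typically provide such traversals through its own query language instead of hand-written code.

```python
from collections import deque

# A hypothetical undirected friendship network as an adjacency list.
graph = {
    "ada": ["grace", "alan"],
    "grace": ["ada", "linus"],
    "alan": ["ada", "linus"],
    "linus": ["grace", "alan", "margaret"],
    "margaret": ["linus"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path(graph, "ada", "margaret")
no_path = shortest_path(graph, "ada", "nobody")
```

Breadth-first search explores the network one hop at a time, so the first path that reaches the goal uses the fewest possible edges. In this network `ada` reaches `margaret` in three hops.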

Wikipedia offers a vivid example of a [social network graph](https://en.wikipedia.org/wiki/Social_network_analysis#/media/File:Kencf0618FacebookNetwork.jpg).

Image credit: Kencf0618 [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons

## Resources

* Safari Books Online: [https://www.safaribooksonline.com/](https://www.safaribooksonline.com/)
* The Neo4j Bookshelf: [https://neo4j.com/books/](https://neo4j.com/books/)
