# What is MongoDB?

#### Some considerations for NoSQL databases

The title of this document is "What is MongoDB?", a question that requires us to address the characteristics of MongoDB. To answer it, we will begin by talking about SQL and NoSQL databases in general: how they came about, their characteristics and their benefits.

## SQL

SQL databases have been around for a while now, about 40 years. They have been tremendously useful to business and industry, and have been tested extensively in information-heavy sectors, e.g., government entities.

They were designed around a client-server application architecture, in which multiple clients (for example, desktop workstations) connect to a central server over a network. This was, and still is, the main practice for enterprise resource planning (ERP) applications. ERP data tends to be highly structured and static, which makes it well suited to a relational database.

The common applications for relational databases focused on centralizing data and making it easy to access. The master-slave architecture was the most prevalent model and satisfied the most common business cases. However, the master-slave architecture has a significant limitation: scaling. It permits applications to scale _up_ but not _out_.

Scaling a master-slave database up (vertically, i.e., making the database server bigger) is fairly simple if you have the budget: you purchase bigger hardware.

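However, scaling out (horizontally, i.e., spreading the database across many servers) is a different ball game altogether, and being able to scale out is one of the defining characteristics of cloud applications.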

### 5 Characteristics of Cloud Applications

> 1. **Context**: Applications need to provide contextually relevant information. This ensures a better user experience. Examples include nearest locations, a previous order at a restaurant, insight into customer buying trends, movie recommendations etc.
> 2. **Always-On**: In the past, companies used to schedule downtime for system upgrades. During that time customers have no access to the service, which can result in lost customers, a damaged brand image or a ruined reputation. For cloud applications, downtime is not an option. The data platform must be continuously available and updated, which means rolling updates in the data management layer with no planned downtime.
> 3. **Real Time**: Users expect a certain level of performance from the applications they use, and an application that responds with a noticeable delay instead of in real time has the same effect as one that is unavailable. Patience is lost. Frustration rises. Customers begin to consider other options.
> 4. **Distributed**: Modern applications need to be distributed across multiple servers in different locations so that users can access them from any geographical location with fast response times.
> 5. **Scalable**: In the current age, application use can expand almost instantly. If an application goes viral, it is crucial that the business be able to cope with the surge in use. An inability to cope with an instant rise in usage can be detrimental if not dangerous for the application.

**So, what's the problem with scaling out an SQL database?**

Scaling is probably the most urgent attribute for cloud applications, as failing to address it has perilous consequences. To scale out, you need to shard: divide a large database into smaller chunks spread across multiple hardware servers instead of a single large server. This becomes terribly messy for a master-slave architecture. Iteratively sharding various slaves results in more master-slave relationships, which quickly become complex, making it harder to manage growth and to recover in the event of a catastrophe. Complexity also raises the likelihood of system failure.
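To make "sharding" concrete, here is a minimal Python sketch of hash-based sharding. It assumes a hypothetical cluster in which each shard is just an in-memory dictionary standing in for a server, and a routing rule of `crc32(key) mod number_of_shards`. Real databases use more elaborate schemes (for example, key ranges or chunks that can be migrated between servers); this only illustrates the idea of splitting one dataset across several machines.

```python
import zlib


class ShardedStore:
    """Toy key-value store split across several in-memory 'servers' (shards)."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one hypothetical database server.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> dict:
        # crc32 is stable across runs, unlike Python's built-in hash() for str,
        # so the same key is always routed to the same shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key: str, value) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str):
        return self._shard_for(key).get(key)


store = ShardedStore(num_shards=3)
for i in range(9):
    store.put(f"user:{i}", {"id": i})

print(store.get("user:4"))             # {'id': 4}
print([len(s) for s in store.shards])  # how the 9 keys were spread over 3 shards
```

With this simple modulo rule, changing `num_shards` sends most keys to a different shard, so data has to be moved around. Doing that by hand across layers of master-slave pairs is one reason sharding gets messy.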

## NoSQL

**What's the deal with NoSQL?**

A NoSQL database is a non-relational, largely distributed database system that enables rapid, ad hoc organization and analysis of extremely high-volume, disparate data types.

NoSQL databases have become the default alternative to SQL databases because they have scalability, availability and fault tolerance built in. They are capable of auto-sharding, automatically spreading and balancing the data and query load across any number of servers. They also use data replication so that when one server goes down, another can quickly and easily take its place without disrupting the application, implying zero downtime.
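The replication and failover behaviour described above can be sketched as follows. This is a simplified, hypothetical Python model, not how any particular database implements it. Every write is copied to all healthy replicas, and when the current primary is marked as down, the first healthy replica is promoted to take its place. Real systems (MongoDB replica sets, for example) hold an election among the surviving members instead of using a fixed order.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)
    up: bool = True


class ReplicatedStore:
    """Toy replication: one primary serves requests, every replica holds a copy."""

    def __init__(self, names: list[str]) -> None:
        self.replicas = [Replica(n) for n in names]
        self.primary = self.replicas[0]

    def write(self, key: str, value) -> None:
        self._ensure_primary()
        # Copy the write to every replica that is currently up.
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value

    def read(self, key: str):
        self._ensure_primary()
        return self.primary.data.get(key)

    def _ensure_primary(self) -> None:
        # If the primary has failed, promote the first healthy replica
        # (a stand-in for a real election protocol).
        if not self.primary.up:
            healthy = [r for r in self.replicas if r.up]
            if not healthy:
                raise RuntimeError("no replicas available")
            self.primary = healthy[0]


store = ReplicatedStore(["node-a", "node-b", "node-c"])
store.write("order:1", {"item": "pizza"})

store.primary.up = False       # simulate node-a crashing
print(store.read("order:1"))   # {'item': 'pizza'}, now served by node-b
print(store.primary.name)      # node-b
```

Because the surviving replica already holds a copy of the data, reads and writes can continue without waiting for the failed node to be repaired.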


## NoSQL Data Models

NoSQL databases have four main types of data models, each designed to work well for specific types of tasks.

> 1. **Key-Value**: Designed for storing data in a schema-less way, a key-value database stores every item as a value indexed by a unique key, hence the name.
> 2. **Graph**: Based on graph theory, these databases are designed for data whose relations are best represented as a graph, with interconnected elements and an undetermined number of relations.
> 3. **Column**: These databases store data in columns rather than rows (the way relational databases store data), which allows the database to access precisely the data it needs to answer a query, improving performance and availability.
> 4. **Document**: These databases are designed for storing, retrieving and managing document-oriented information also known as semi-structured data.
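The following sketch illustrates how the four models differ by laying out the same kind of small, made-up data (users, their cities and their friendships) in each shape, using plain Python structures. It only shows how the data is *organized*. It does not model how Riak, Cassandra, MongoDB or Neo4j actually store data internally.

```python
from collections import deque

# 1. Key-value: an opaque value looked up by a unique key.
kv = {
    "user:1": '{"name": "Ada", "city": "London"}',
    "user:2": '{"name": "Alan", "city": "Manchester"}',
}

# 2. Column: each column's values are stored together, so a query that
#    only needs "city" touches just that column.
columns = {
    "id": [1, 2],
    "name": ["Ada", "Alan"],
    "city": ["London", "Manchester"],
}

# 3. Document: self-describing, nested records; documents in the same
#    collection may have different fields.
documents = [
    {"_id": 1, "name": "Ada", "address": {"city": "London"}, "tags": ["math"]},
    {"_id": 2, "name": "Alan", "address": {"city": "Manchester"}},
]

# 4. Graph: nodes plus edges (here an adjacency list of friendships).
graph = {
    "Ada": ["Alan"],
    "Alan": ["Ada", "Grace"],
    "Grace": ["Alan"],
}


def reachable(start: str) -> set[str]:
    """Breadth-first traversal: everyone connected to `start` by any number of hops."""
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in graph.get(queue.popleft(), []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}


print(kv["user:1"])                               # the whole value, as stored
print(columns["city"])                            # ['London', 'Manchester']
print([d["address"]["city"] for d in documents])  # ['London', 'Manchester']
print(sorted(reachable("Ada")))                   # ['Alan', 'Grace']
```

The graph traversal does not need to know in advance how many hops separate two people. This is the "undetermined number of relations" that graph databases are built for.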


| Data Model | Performance |   Scalability   | Flexibility | Complexity |  Functionality  |       Example      |
| :--------: | :---------: | :-------------: | :---------: | :--------: | :-------------: | :----------------: |
| Key-value  |     High    |        High     |     High    |    None    | Variable (None) |     Riak, Redis   |
|  Column    |     High    |        High     |   Moderate  |    Low     |      Minimal    |  Cassandra, HBase |
| Document   |     High    | Variable (High) |     High    |    Low     |  Variable (Low) |  MongoDB, CouchDB |
|   Graph    |   Variable  |     Variable    |     High    |    High    |   Graph Theory  |    Neo4j, Giraph  |

## So, what is MongoDB?

MongoDB is a document-based, distributed, portable database: users are free to run it on any platform or cloud infrastructure, which removes the fear of vendor lock-in. It provides high performance, with fast query execution through queries that are easy and intuitive to write. It is highly flexible, allowing users to change the structure of their data to suit their specifications. It is also possible to run operational and analytical workloads on the same database cluster, isolated from each other, so that one workload does not overload the other.
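Here is a toy, in-memory Python sketch of the document model and its flexible structure. The `find` and `get_path` helpers are hypothetical. They imitate the spirit of a MongoDB equality filter, including dot notation for nested fields, but they are not MongoDB's API or query engine. Against a real server you would use a driver such as PyMongo and its `Collection.find(filter)` method instead.

```python
from typing import Any

# Two documents in the same collection with different shapes: no schema
# change is needed before storing the second one.
restaurants = [
    {"_id": 1, "name": "Luigi's", "cuisine": "Italian",
     "address": {"city": "Nairobi", "street": "Moi Avenue"}},
    {"_id": 2, "name": "Spice Hub", "cuisine": "Indian",
     "address": {"city": "Mombasa"}, "delivery": True},
]


def get_path(doc: dict, path: str) -> Any:
    """Hypothetical helper: follow a dotted path such as 'address.city'."""
    value: Any = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def find(collection: list[dict], filter_: dict) -> list[dict]:
    """Hypothetical helper: return documents whose fields equal every filter value."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == expected for path, expected in filter_.items())
    ]


print([d["name"] for d in find(restaurants, {"address.city": "Mombasa"})])  # ['Spice Hub']
print([d["name"] for d in find(restaurants, {"delivery": True})])           # ['Spice Hub']
```

In MongoDB itself, an equivalent query is sent to the server as a filter document such as `{"address.city": "Mombasa"}`, and the server, not the application, does the matching.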