# FIT5148 - Big data management and processing

# Activity: MongoDB Fundamentals #

MongoDB is a leading NoSQL database that stores data as Binary JSON (BSON) documents. MongoDB can manipulate data flexibly because documents in the same collection are allowed to have different schemas. Thus, the schema can change as the needs of an application change.

MongoDB is one of the most widely adopted big data databases. Unlike relational databases (RDBMS), MongoDB does not require a predefined schema.

**In this activity, we will learn the following fundamentals in MongoDB:**
- MongoDB data structure
- How to use MongoDB shell and manage MongoDB documents. 
- How to create databases, as well as insert, update, and delete documents in MongoDB. 
- How to use different operators for retrieving documents in MongoDB with indexing.

Let's get started!

## 1. MongoDB Data Structure ##
MongoDB uses data structured according to the JSON (JavaScript Object Notation) standard. JSON is an open standard, readable by both humans and machines, for representing and exchanging data. It provides data types such as numbers, strings, and booleans, as well as more complex structures (e.g. arrays).

#### Binary JSON ####
In practice, MongoDB stores JSON documents in a binary-encoded format called BSON. BSON extends the JSON model with additional data types such as a Date type and a BinData type (binary data, which the shell displays in base64).

Let's take a look at two examples of JSON documents below:

**First Example**
```
{
    "sid": 123,
    "name": {
        "first": "Marie",
        "last": "Currie"
    }
},
...
```
Let's say the above example shows a simple student JSON document that has the fields `sid` and `name`. Note that this document has an embedded document structure: the `name` field consists of two sub-fields (`first` and `last`). These sub-fields and their values are also called an "embedded sub-document". MongoDB makes it possible to embed document structures in a field or an array within a document. This is one way in which MongoDB's schema is flexible.

To specify or access a field of an embedded document, we can use **dot notation (.)**. We will learn more about this later on.
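
As a rough illustration of how a dotted path such as `name.first` addresses an embedded field, here is a minimal sketch in plain JavaScript (runnable in Node.js, no MongoDB required). The helper `getByPath` is a hypothetical name introduced only for this example; it imitates the idea of dot notation and is not part of MongoDB.

```javascript
// A student document with an embedded "name" document (from the first example).
const student = {
  sid: 123,
  name: { first: "Marie", last: "Currie" }
};

// Hypothetical helper: walk the document one path segment at a time.
function getByPath(doc, path) {
  return path.split(".").reduce(
    (current, key) => (current == null ? undefined : current[key]),
    doc
  );
}

console.log(getByPath(student, "name.first"));  // Marie
console.log(getByPath(student, "sid"));         // 123
console.log(getByPath(student, "name.middle")); // undefined (no such field)
```

Each segment of the path steps one level deeper into the document, which is why `name.first` reaches the `first` field inside `name`.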

**Second Example**
```
{
    "unit_code": "FIT9132",
    "unit_name": "Database",
    "synopsis":"This unit will introduce the concept of data management in an organisation through relational database technology.",
    "semester": 1,
    "year": [2017, 2018]
}, 
...
```
Imagine this example shows a simple unit document that has various fields: `unit_code`, `unit_name`, `synopsis`, `semester`, and `year`. Note that the field `year` is a JSON array, so it can hold multiple values.


#### Identifier (_id) ####

In MongoDB, documents are made up of field-value pairs. In particular, there is a special field `_id` that must be unique across all documents in the same collection. It holds the document identifier. If you do not specify it explicitly, MongoDB generates it automatically for each document.

## 2. Running MongoDB ##

MongoDB can run on most platforms (e.g. Windows, Linux, macOS). This section will help you understand how MongoDB is run.

### 1. Open a terminal, run command 'mongod' to start the MongoDB database service (`mongod` is already running in VM).

#### mongod (primary daemon process)
`mongod` is the primary daemon process, the core database service of the MongoDB system. It handles data requests, manages data access, and performs background management operations. Clients such as the MongoDB shell, `mongo`, communicate with it. Please note that in the VM provided to you, `mongod` is already running as a startup service; by default, the data files are stored in /var/lib/mongodb and the log files in /var/log/mongodb.

### 2. Open another terminal, run command 'mongo' to start the mongo shell in the new terminal.
#### mongo (MongoDB shell)
Once `mongod` is running, we can start the MongoDB shell (or simply `mongo`) with:
```
mongo
```
This shell provides an interface to MongoDB through which we can manipulate the data (load, read, search, etc.).

Open another terminal or a command prompt and run `mongo`. If no parameters are specified, it connects to the default database named `test`. If you want to specify a database, you can enter:
```
mongo db_name
``` 
If the database does not exist, it will be created automatically once data is first stored in it. Make sure that `mongod` is running before you start `mongo`.

If you want to stop `mongod` or `mongo`, simply press Ctrl+C in the corresponding terminal or command prompt.

## 3. Working with mongo ##
Now, let's learn more about how to use `mongo` to manipulate documents stored in MongoDB. In this practice, we will learn how to use the basic CRUD operations (Create, Read, Update, and Delete) in MongoDB using `mongo`. You will also learn how various queries are run in MongoDB.

#### mongod, mongo
To use `mongo`, make sure that `mongod` is running.

#### db, show dbs, show collections

In the mongo shell, run the following commands:
- `db`: to show the current database you're working on (the default is `test`)
- `show dbs`: to show all available databases
- `show collections`: to show collections within the current database. A collection in MongoDB is analogous to a table in an RDBMS and holds a group of documents.


#### help()
At any point, help can be accessed using the `help()` method. For example, if you need help on any of the methods of a database or a collection, use `db.help()` or `db.CollectionName.help()`.

### Create
This section provides brief instructions on how to create a new database, a new collection, and a new document.

#### Create Database
Let's first create a new database. It's very simple. Run the command on the `mongo` shell.
```
use fit5148_db
```
The command `use` switches the current database to `fit5148_db`. If the database `fit5148_db` does not exist yet, MongoDB still switches to it and creates it once data is first stored in it.

<font color='blue'>
**Practice**: Check whether you're currently using fit5148_db.
</font><br>


<font color='blue'>
**Practice**: Show all databases available. Can you see fit5148_db? 
**Note** that although we have switched to fit5148_db, MongoDB doesn’t actually create it until a collection or document is added to the database. This reflects MongoDB’s dynamic approach to allocating namespaces.
</font>

#### Create Collection
Let's create a new collection in the database `fit5148_db`. Run the command:
```
db.createCollection("FIT")
```
If the operation is successful, you should see the following message: `{ "ok" : 1 }`

<font color='blue'>
**Practice**: Check whether you've created the collection FIT.
</font><br>

<font color='blue'>
**Practice**: Show all available databases again. Can you see fit5148_db now?
</font>



### Insert
Let's learn how to insert documents into a collection. You will create two unit documents, `FIT9131` and `FIT9132`, and insert them into the `FIT` collection that we've created above.

#### Insert one document 

Run the command below:
```
FIT9131={
  "unit_code": "FIT9131",
  "unit_name": "Programming",
  "synopsis":"This unit aims to provide students with the basic concepts involved in the development of well structured software using a programming language.",
  "semester": [1,2],
  "year": [2016, 2017, 2018],
  "mark": 80
}
```
Then, we should see the following (or similar):
```
{
	"unit_code" : "FIT9131",
	"unit_name" : "Programming",
	"synopsis" : "This unit aims to provide students with the basic concepts involved in the development of well structured software using a programming language.",
	"semester" : [
		1,
		2
	],
	"year" : [
		2016,
		2017,
		2018
	],
	"mark" : 80
}
```

Let's create another document. Run the command below:
```
FIT9132={
  "unit_code": "FIT9132",
  "unit_name": "Database",
  "synopsis":"This unit will introduce the concept of data management in an organisation through relational database technology.",
  "semester": 1,
  "year": [2017, 2018],
  "mark": 100
}
```

You should see the following (or similar):
```
{
	"unit_code" : "FIT9132",
	"unit_name" : "Database",
	"synopsis" : "This unit will introduce the concept of data management in an organisation through relational database technology.",
	"semester" : 1,
	"year" : [
		2017,
		2018
	],
	"mark" : 100
}
```

Let's add the above two documents one-by-one to the collection, `FIT`:
```
db.FIT.insert(FIT9131)
db.FIT.insert(FIT9132)
```

When we insert each document successfully, we will see the following message:
```
WriteResult({ "nInserted" : 1 })
```

**Note:** If the collection (i.e. `FIT`) does not exist, the above operation (i.e. insert) will **not only insert the two documents** into the `FIT` collection **but also create the collection**.

Use the command `db.FIT.find()` to display the documents in the collection; you will see all documents in the `FIT` collection. Note that there is a special `_id` field that is generated automatically for each document. The values of this `_id` field must be unique across all documents in the collection. Although you didn't explicitly insert an `_id` field, `find()` displays it with each document.

We will learn more about usage of `find()` later on in this activity.

#### Insert multiple documents
We will now look into how to insert multiple documents into a collection. 

Let's insert multiple documents using the `insertMany()` method. This is the syntax of using `insertMany()`:
```
db.collection.insertMany(
   [ <document 1> , <document 2>, ... ],
   {
      writeConcern: <document>,
      ordered: <boolean>
   }
)
```
Here, `<document 1>`, `<document 2>`, ... are the documents to be inserted into the collection. `writeConcern` is an optional document expressing the write concern; omit it to use the default write concern. `ordered` is an optional boolean specifying whether the `mongod` instance should perform an ordered or unordered insert; it defaults to true.


<font color='blue'>
**Practice**: Create two documents which have the same or similar fields with 'FIT9131' or 'FIT9132', and insert them into the 'FIT' collection using the 'insertMany()' method.
</font><br>

#### Import collection
Now let's learn how to use the `mongoimport` tool to import data into a MongoDB database. Let's first create a more complex form of our FIT data in JSON format. Open a text editor on your VM, enter the following, and save it as "FIT_COMPLEX.json" in the Documents directory:
```
{
    "sid": 123,
    "name": {
      "first": "Marie",
      "last": "Currie"
    },
    "course": "MIT",
    "result": [
      {
        "unit_code": "FIT9132",
        "unit_name": "Database",
        "synopsis":"This unit will introduce the concept of data management in an organisation through relational database technology.",
        "semester": 1,
        "year": [2017, 2018],
        "mark": 100
      },
      {
        "unit_code": "FIT9131",
        "unit_name": "Programming",
        "synopsis":"This unit aims to provide students with the basic concepts involved in the development of well structured software using a programming language.",
        "semester": [1,2],
        "year": [2016, 2017, 2018],
        "mark": 80
      }
    ]
  }
  {
    "sid": 124,
    "name": {
      "first": "Albert",
      "last": "Einstein"
    },
    "course": "MBIS",
    "result": [
      {
        "unit_code": "FIT9132",
        "unit_name": "Database",
        "synopsis":"This unit will introduce the concept of data management in an organisation through relational database technology. Theoretical foundation of relational model, analysis and design, implementation of relational database using SQL will be covered.",
        "semester": 2,
        "year": [2017],
        "mark": 100
      }
    ]
  }
``` 

As can be seen, there are two documents in the data. Now, we will import this data into the `FIT_COMPLEX` collection in the `fit5148_db` database. You need to know the path of FIT_COMPLEX.json; in this case, it is ~/Documents/FIT_COMPLEX.json.
Type the following command in a system terminal, for example the one where you started `mongod` (NOT inside the mongo shell):

```
myPrompt$ mongoimport --host localhost --db fit5148_db --collection FIT_COMPLEX --type json --file ~/Documents/FIT_COMPLEX.json
```
If the import succeeds, it will show "imported 2 documents". If it shows an error, please check that the path of the file is correct and that the database name is valid.

<font color='blue'>
**Practice**: Validate whether the collection is created and the data is imported.
</font><br>

### Read
Now, we will learn how to read data from a given collection. We will continue to use the collection `FIT`.

#### find()
First, let's use `find()` to retrieve data from the `FIT` collection. What's the result? You will see all documents stored in this collection, each enclosed in “{” and “}” curly braces.

#### count()
If we want to know how many documents a query retrieves, we can use the method `count()`. For example, if we want to know the total number of documents in the `FIT` collection, then execute:
```
db.FIT.find().count()
```

#### Selectors and Projectors
What if we want to retrieve only the documents that meet a condition? For this purpose, we can use **'selectors'** and **'projectors'**:
- A selector specifies a condition that filters the results.
- A projector specifies which data fields to display.

#### Selector
First, let's learn how to use the `selectors`. For example, the following example retrieves all documents whose `unit_name` is `Programming`:
```
db.FIT.find({"unit_name":"Programming"})
```

From the example above, we learn that we can specify a `field` and a `value` together as a condition to retrieve the documents that we want.

<font color='blue'>
**Practice**: Display the number of the documents satisfying the above condition.
</font><br>

Now, let's learn how we can specify a more complex condition based on our requirements. Conditions listed together in one selector are combined with a logical AND. If we want to retrieve documents whose `unit_code` is `FIT_1` and `unit_name` is `Database`, then we can perform:
```
db.FIT.find({"unit_code":"FIT_1", "unit_name":"Database"})
```

Next, if we want to find all documents whose `unit_name` is `Database` and `year` is either `2017` or `2018`, then execute:
```
db.FIT.find({"unit_name":"Database", $or:[{"year":2017}, {"year":2018}]})
```
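
When writing `$or`, each alternative needs to be its own object inside the array. The reason is a JavaScript rule rather than a MongoDB one: an object literal that repeats a key keeps only the last value. The short sketch below (plain JavaScript, runnable in Node.js; the variable names are illustrative) shows the difference.

```javascript
// Repeating a key inside one object literal: the last value silently wins.
const oneObject = [{ "year": 2017, "year": 2018 }];

// One object per alternative: both conditions survive.
const twoObjects = [{ "year": 2017 }, { "year": 2018 }];

console.log(JSON.stringify(oneObject));  // [{"year":2018}]
console.log(JSON.stringify(twoObjects)); // [{"year":2017},{"year":2018}]
```

With a single object, the `$or` array would effectively contain only the condition on `2018`, so documents matching only `2017` would be missed.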

#### Projector
Let’s find out how to use a projector when querying. A projector names the specific fields that need to be displayed. For example, if we want to display documents whose `unit_name` is `Database`, showing only the `unit_code` and `unit_name` fields, then use the example below. Note that the `_id` field is still displayed unless it is explicitly excluded with `"_id":0`.
```
db.FIT.find({"unit_name":"Database"}, {"unit_code":1,"unit_name":1}) // 1: show, 0: not show
```

#### Sort
What if we want to display documents in a particular order? In this case, we can use the `sort()` method. The sort direction for each field is specified as 1 for ascending or -1 for descending. In the above example, if you want to sort the documents by ascending order of `unit_code`, execute:
```
db.FIT.find({"unit_name":"Database"}, {"unit_code":1,"unit_name":1}).sort({"unit_code":1})
```

<font color='blue'>
**Practice**: Display the same information by descending order of 'unit_code'
</font><br>

#### Limit
We can also limit the documents to be retrieved using `limit()`. For example, using the previous query, if you want to limit the result set and return only 1 document, then execute:
```
db.FIT.find({"unit_name":"Database"}, {"unit_code":1,"unit_name":1}).sort({"unit_code":1}).limit(1)
```

This method can improve performance because it prevents MongoDB from returning more results than are required for processing.
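
To see how the selector, projector, `sort()`, and `limit()` fit together, the chain can be imitated on an in-memory array. This is only a sketch in plain JavaScript (runnable in Node.js, no MongoDB required): the three documents, including the unit code `FIT0001`, are made up for illustration, and the array methods merely stand in for what MongoDB does on the server.

```javascript
// Made-up unit documents for illustration only.
const units = [
  { _id: 1, unit_code: "FIT9132", unit_name: "Database", mark: 100 },
  { _id: 2, unit_code: "FIT9131", unit_name: "Programming", mark: 80 },
  { _id: 3, unit_code: "FIT0001", unit_name: "Database", mark: 70 }
];

const result = units
  // Selector: keep documents whose unit_name is "Database".
  .filter(doc => doc.unit_name === "Database")
  // Projector: keep unit_code and unit_name (plus _id, which is shown by default).
  .map(doc => ({ _id: doc._id, unit_code: doc.unit_code, unit_name: doc.unit_name }))
  // sort({"unit_code": 1}): ascending order of unit_code.
  .sort((a, b) => (a.unit_code < b.unit_code ? -1 : a.unit_code > b.unit_code ? 1 : 0))
  // limit(1): return at most one document.
  .slice(0, 1);

console.log(JSON.stringify(result));
// [{"_id":3,"unit_code":"FIT0001","unit_name":"Database"}]
```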

<font color='blue'>
**Practice**: Retrieve documents specifying a different numeric value for 'limit()'
</font><br>

#### findOne
There is a special form of `find()`: `findOne()`. It can take the same parameters as `find()`, but returns a **single document**. 

<font color='blue'>
**Practice**: Using the same query, use `findOne()` to see how it works. If you chain 'sort()' or 'limit()' after it, you will get an error, because `findOne()` returns a document rather than a cursor.
</font><br>

#### writingQuery

1. Find ...
2. 
3.
4.
5.
6.



### Update
Now, we will learn how to use the `update()` method to update documents of a collection. By default, this method updates a single document. We can also use the `multi` option to update all documents that match the criteria.

But note that we cannot update the `_id` field: it identifies the document and cannot be changed once the document is created.

Now let's update one document whose `unit_code` is `FIT9131`. We will update its `unit_name` (`Programming` -> `Advanced Programming`) and `mark` (80 -> 100) fields using the `$set` operator.

```
db.FIT.update(
    {"unit_code" : "FIT9131"},
    {
      $set: {"unit_name": "Advanced Programming", "mark":100}
    }
)
```

We can also update an embedded field. For this practice, we will create another collection, `users`. 

<font color='blue'>
**Practice**: Create a new collection, "users"
</font><br>

Then, create a user named `user1` based on the structure below:
```
user1 = {
    "sid": "your_id",
    "name": {
        "first": "your first name",
        "last": "your last name"
    }
}
```

<font color='blue'>
**Practice**: Insert the above user1 into the users collection. Check whether it has been correctly inserted.
</font><br>

Let's now update the first name of user1. Use the following code:
```
db.users.update(
  {"sid" : "your_id"},
  {$set: { "name.first": "your new name"}}
)
```

The update operation returns a `WriteResult` object that contains the status of the operation.
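
To make the effect of `$set` concrete, the following is a minimal plain JavaScript sketch (runnable in Node.js without MongoDB). The function `applySet` is a hypothetical stand-in that imitates how `$set` changes only the named fields, including dotted paths into embedded documents, and leaves every other field untouched. It is not MongoDB's implementation.

```javascript
// Hypothetical stand-in for $set: assign each "path: value" pair into the document.
function applySet(doc, changes) {
  for (const [path, value] of Object.entries(changes)) {
    const keys = path.split(".");
    const last = keys.pop();
    let target = doc;
    for (const key of keys) {
      // Create the embedded document if it does not exist yet.
      if (typeof target[key] !== "object" || target[key] === null) {
        target[key] = {};
      }
      target = target[key];
    }
    target[last] = value;
  }
  return doc;
}

const sampleUser = {
  sid: "your_id",
  name: { first: "your first name", last: "your last name" }
};
applySet(sampleUser, { "name.first": "your new name" });

console.log(sampleUser.name.first); // your new name
console.log(sampleUser.name.last);  // your last name (unchanged)
```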

As mentioned above, we can update multiple documents using `update()`. To do so, we pass the option `{multi: true}`. For this demonstration, let's create another user, named `user2`, that has the same last name.

<font color='blue'>
**Practice**: Create a new user, named "user2" that has the same last name of "user1". Insert "user2" into the "users" collection.
</font><br>

```
db.users.update(
    {"name.last":"your last name"}, 
    {$set: { "name.first": "new first name"}}, 
    {multi:true}
)
```


### Delete
We now focus on deleting a document in a collection using the `remove()` method. If you specify a condition, only the documents meeting the condition will be deleted.

As an example, we will delete the documents whose `unit_code` is `FIT_1`:
```
db.FIT.remove({"unit_code":"FIT_1"})
```

<font color='blue'>
**Practice**: Check the result using "find()". 
</font><br>

The following command will delete all documents in the `users` collection:
```
db.users.remove({})
```

<font color='blue'>
**Practice**: Check the result using "find()". 
</font><br>

Finally, if you want to drop the collection `users`, use this command:
```
db.users.drop()
```
<font color='blue'>
**Practice**: Check the result using "show collections". 
</font><br>

#### Using Cursor
There is an interesting method, `next()`, that can be used with `find()`. Note that `find()` returns the documents as a cursor object in MongoDB, so we can iterate over the results through the returned cursor.

For example, suppose we want to retrieve all documents whose `unit_name` is `Database`. Using `hasNext()`, `next()`, and a while loop, we can print the returned documents in JSON format:

```
var results = db.FIT.find({"unit_name":"Database"})
while(results.hasNext()) printjson(results.next())
```

Note that `printjson()` renders the output in JSON format. In the mongo shell, the cursor `results` can also be indexed like an array. So if we want to display the document at array index 1, use:
```
var results = db.FIT.find({"unit_name":"Database"})
printjson(results[1])
```
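
The cursor pattern above (`hasNext()` followed by `next()`) can be imitated in plain JavaScript to see what the loop is doing. The class `ArrayCursor` below is a hypothetical teaching aid over an in-memory array with made-up documents; it is not MongoDB's cursor, and it runs in Node.js without a database.

```javascript
// Hypothetical in-memory cursor exposing the same two methods used in the loop.
class ArrayCursor {
  constructor(docs) {
    this.docs = docs;
    this.position = 0;
  }
  // True while there are documents that have not been returned yet.
  hasNext() {
    return this.position < this.docs.length;
  }
  // Return the current document and advance by one.
  next() {
    if (!this.hasNext()) throw new Error("no more documents");
    return this.docs[this.position++];
  }
}

// Made-up documents standing in for a query result.
const cursor = new ArrayCursor([
  { unit_code: "FIT9132", unit_name: "Database" },
  { unit_code: "FIT0001", unit_name: "Database" }
]);

const seen = [];
while (cursor.hasNext()) {
  const doc = cursor.next();
  seen.push(doc.unit_code);
  console.log(JSON.stringify(doc));
}
```

Once the loop finishes, the cursor is exhausted: `hasNext()` returns false and the documents cannot be read from it again.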

#### Explain
The `explain()` method can be used to see what steps have been taken to execute a query. It takes an optional verbosity parameter that determines how much detail the output contains: `queryPlanner` (default), `executionStats`, or `allPlansExecution`. Run the following:
```
db.FIT.find({"unit_name":"Database"}).explain("allPlansExecution")
```
As you can see, `explain()` returns information regarding `queryPlanner`, `executionStats`, and `serverInfo`. 

###  Indexes

Let's now look into how we can use indexing on documents in a collection to provide efficient **read** operations. By default, whenever a collection is created and documents are added to it, an index is created on their `_id` field. Here, you will learn how to create indexes. 

Let’s insert 100,000 documents in a new collection called `indexTestCollection`.

```
for (var i = 1; i <= 100000; i++) {
    db.indexTestCollection.insert({
        "sid": "ID_" + i,
        "name": {
            "first": "First Name" + i,
            "last": "Last Name" + i
        },
        "age": Math.floor(Math.random() * 120)
    })
}
```

<font color='blue'>
**Practice**: Check the result using "count()". 
</font><br>

Run `explain()` to check what steps MongoDB performs to find the user whose `sid` is "ID_1000":

```
db.indexTestCollection.find({"sid":"ID_1000"}).explain("allPlansExecution")
```

In the output, can you find the value of the field "totalDocsExamined"? What about the value of the field "executionTimeMillis"? Without an index, the database scans the entire collection, which is inefficient. Can you improve its performance?

Let’s create an index on the `sid` field using `createIndex()`:
```
db.indexTestCollection.createIndex({"sid":1})
```

<font color='blue'>
**Practice**: Now run the same query for reading. Check the two fields: "totalDocsExamined" and "executionTimeMillis". What can you see now? Yes, there is no collection scan which is excellent. Now we see the value of using indexes.
</font><br>
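
The difference between a collection scan and an index lookup can be imitated in plain JavaScript. The sketch below (runnable in Node.js, no MongoDB required) builds 100,000 in-memory documents and counts how many documents each strategy has to look at. The `Map` merely stands in for an index (MongoDB's own indexes are B-tree structures), and the function names are illustrative assumptions.

```javascript
// Build an in-memory stand-in for indexTestCollection.
const docs = [];
for (let i = 1; i <= 100000; i++) {
  docs.push({ sid: "ID_" + i, age: i % 120 });
}

// Collection scan: every document is examined because nothing narrows the search.
function collectionScan(collection, sid) {
  let examined = 0;
  const matches = [];
  for (const doc of collection) {
    examined++;
    if (doc.sid === sid) matches.push(doc);
  }
  return { matches, examined };
}

// "Index": a map from each sid value to the documents holding it, built once.
const sidIndex = new Map();
for (const doc of docs) {
  if (!sidIndex.has(doc.sid)) sidIndex.set(doc.sid, []);
  sidIndex.get(doc.sid).push(doc);
}

// Index lookup: only the documents the index points to are examined.
function indexLookup(index, sid) {
  const matches = index.get(sid) || [];
  return { matches, examined: matches.length };
}

const scan = collectionScan(docs, "ID_1000");
const lookup = indexLookup(sidIndex, "ID_1000");
console.log(scan.examined);   // 100000
console.log(lookup.examined); // 1
```

Both strategies return the same document; the index simply avoids looking at the other 99,999. The price is the extra structure that has to be built and kept up to date.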


However, an index on a field does not by itself guarantee that the values of that field are unique; several documents can still share the same value. To enforce uniqueness, we need to set the `unique` option to true when creating the index.

For example, if we want to create an index on the embedded `first` field (addressed as `name.first` using dot notation) and enforce uniqueness, then do the following:
```
db.indexTestCollection.createIndex({"name.first": 1},{"unique":true})
```

Now if you try to insert a document with a duplicate first name into the collection, MongoDB returns an error.

<font color='blue'>
**Practice**: Try to insert two users with the same first name into the "indexTestCollection" collection.
</font><br>
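
The behaviour of a uniqueness constraint can be sketched in plain JavaScript (runnable in Node.js, no MongoDB required). Here a `Set` remembers which first names are already present, and the hypothetical function `insertUnique` refuses a document whose `name.first` has been seen before. The names and the error text are assumptions made for this illustration, not MongoDB's actual messages.

```javascript
const firstNames = new Set(); // values already present in the "unique index"
const collection = [];        // in-memory stand-in for the collection

// Hypothetical insert that enforces uniqueness on name.first.
function insertUnique(doc) {
  const key = doc.name.first;
  if (firstNames.has(key)) {
    throw new Error("duplicate key: " + key);
  }
  firstNames.add(key);
  collection.push(doc);
}

insertUnique({ sid: "ID_1", name: { first: "First Name1", last: "Last Name1" } });

let rejected = false;
try {
  // Same first name as the document above, so this insert is refused.
  insertUnique({ sid: "ID_2", name: { first: "First Name1", last: "Another Name" } });
} catch (err) {
  rejected = true;
  console.log(err.message); // duplicate key: First Name1
}

console.log(collection.length); // 1
```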


Also, we can rebuild all the indexes of a collection: 
```
db.indexTestCollection.reIndex()
```

#### Compound Index
We can also create a single index on multiple fields. For example, we can create a compound index on the `sid` and `name.first` fields:

```
db.indexTestCollection.createIndex({"sid":1, "name.first": 1})
```

**Congratulations on finishing this activity!**

Next week, we will learn advanced querying using conditional operators and regular expressions in the selector part. Each of these operators and regular expressions provides you with more control over the queries.