# MongoDB demo

## Intro

__NOTE:__ for this notebook, start your server with the `MongoDB environment`.

![MongoDB environment](images/mongodb_env.png)

[MongoDB](https://www.mongodb.com/) is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas.

## MongoDB client

The Python library [PyMongo](https://pymongo.readthedocs.io) is used to access the MongoDB demo database. The first step when working with PyMongo is to create a `MongoClient` connected to the running `mongod` instance:

In [None]:
from pymongo import MongoClient

client = MongoClient()  # defaults: host 'localhost', port 27017
client
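Called with no arguments, `MongoClient()` connects to `localhost` on port `27017`, so `MongoClient('localhost', 27017)` and `MongoClient('mongodb://localhost:27017/')` are equivalent spellings. The stdlib-only sketch below shows how those defaults fill in for a simple single-host connection string. The helper `resolve_host` is hypothetical (it is not part of PyMongo) and deliberately ignores multi-host replica-set URIs and `mongodb+srv://`.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Defaults that MongoClient() falls back to when no host/port is given.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 27017


def resolve_host(uri: Optional[str] = None) -> Tuple[str, int]:
    """Return (host, port) for a single-host mongodb:// URI.

    Hypothetical helper for illustration only: multi-host (replica set)
    and mongodb+srv:// URIs are intentionally not handled.
    """
    if uri is None:
        return DEFAULT_HOST, DEFAULT_PORT
    parts = urlsplit(uri)
    if parts.scheme != "mongodb":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Missing host or port falls back to the same defaults as MongoClient().
    return parts.hostname or DEFAULT_HOST, parts.port or DEFAULT_PORT


print(resolve_host())                              # ('localhost', 27017)
print(resolve_host("mongodb://localhost:27017/"))  # ('localhost', 27017)
print(resolve_host("mongodb://db.example.com/"))   # ('db.example.com', 27017)
```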

A single instance of MongoDB can support multiple independent databases. Get the `test-database` database (like collections, it is only created on the server when data is first written to it):

In [None]:
db = client['test-database']
db

A collection is a group of documents stored in MongoDB and can be thought of as roughly the equivalent of a table in a relational database. Here is how to get a collection:

In [None]:
collection = db['test_collection']
collection
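PyMongo also supports attribute-style access (`client.test_database`, `db.test_collection`), but only when the name is a valid Python identifier. `test-database` contains a hyphen, which is why dictionary-style access is used here; PyMongo also refuses attribute access for names that start with an underscore. The hypothetical stdlib-only helper below decides which spelling works for a given name.

```python
import keyword


def access_expression(parent: str, name: str) -> str:
    """Return the Python expression that reaches `name` under `parent`.

    Attribute access only works for valid, non-keyword identifiers that do
    not start with an underscore; everything else needs subscript access.
    """
    if name.isidentifier() and not keyword.iskeyword(name) and not name.startswith("_"):
        return f"{parent}.{name}"
    return f"{parent}[{name!r}]"


print(access_expression("client", "test-database"))  # client['test-database']
print(access_expression("db", "test_collection"))    # db.test_collection
print(access_expression("db", "class"))              # db['class']
```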

## Working with documents

### Insert a document

Data in MongoDB is represented using JSON-style documents (stored on the server as BSON, a binary JSON-like format). In PyMongo, we use dictionaries to represent documents:

In [None]:
import datetime

post = {
    'author': 'Mike',
    'text': 'My first blog post!',
    'tags': ['mongodb', 'python', 'pymongo'],
    'date': datetime.datetime.now(tz=datetime.timezone.utc)  # timezone-aware UTC timestamp
}
post
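MongoDB stores dates as BSON datetimes: a count of milliseconds since the Unix epoch, in UTC. PyMongo treats a naive `datetime` as already being UTC, so a naive local time from `datetime.now()` is stored shifted by the machine's UTC offset, and sub-millisecond precision is dropped. The sketch below approximates the value you should expect to read back; the helper name `as_stored` is hypothetical, and it rejects naive datetimes rather than guessing.

```python
from datetime import datetime, timedelta, timezone


def as_stored(dt: datetime) -> datetime:
    """Approximate the value MongoDB keeps for `dt`: UTC, millisecond precision.

    Naive datetimes are rejected here to avoid silently mixing local time and
    UTC (PyMongo itself would assume a naive value is already UTC).
    """
    if dt.tzinfo is None:
        raise ValueError("use an aware datetime, e.g. datetime.now(timezone.utc)")
    utc = dt.astimezone(timezone.utc)
    # BSON datetimes hold whole milliseconds; drop the sub-millisecond part.
    return utc.replace(microsecond=utc.microsecond // 1000 * 1000)


local = datetime(2024, 5, 1, 12, 0, 0, 123456, tzinfo=timezone(timedelta(hours=2)))
print(as_stored(local))  # 2024-05-01 10:00:00.123000+00:00
```

By default PyMongo hands dates back as naive UTC datetimes; passing `tz_aware=True` to `MongoClient` makes it return aware ones instead.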

To insert a document into the `posts` collection, we can use the `insert_one()` method:

In [None]:
# new collection 'posts'
posts = db['posts']
# insert a document
post_id = posts.insert_one(post).inserted_id
post_id
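Because the document had no `_id`, PyMongo generated a `bson.ObjectId` for it on the client. An ObjectId is 12 bytes, and its first 4 bytes are a big-endian Unix timestamp in seconds, which PyMongo exposes as `post_id.generation_time`. The stdlib-only sketch below (hypothetical helper `objectid_time`) decodes that timestamp from the 24-character hex form, such as `str(post_id)`.

```python
from datetime import datetime, timezone


def objectid_time(oid_hex: str) -> datetime:
    """Decode the creation time embedded in a 24-hex-digit ObjectId string."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    # Bytes 0-3: seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_time("507f1f77bcf86cd799439011"))  # 2012-10-17 21:13:27+00:00
```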

After the first document is inserted, the `posts` collection actually exists on the server. We can verify this by listing all of the collections in our database:

In [None]:
db.list_collection_names()

We can retrieve the inserted document with the `find_one()` method, which returns a single matching document (or `None` if nothing matches):

In [None]:
posts.find_one()

### Insert many documents

We can also perform bulk inserts by passing a list of documents as the first argument to `insert_many()`:

In [None]:
new_posts = [
    {
        'author': 'Mike',
        'text': 'Another post!',
        'tags': ['bulk', 'insert'],
        'date': datetime.datetime.now(tz=datetime.timezone.utc)
    },
    {
        'author': 'Eliot',
        'title': 'MongoDB is fun',
        'text': 'and pretty easy too!',
        'date': datetime.datetime.now(tz=datetime.timezone.utc)
    }
]
result = posts.insert_many(new_posts)
result.inserted_ids
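Like `insert_one()`, `insert_many()` adds an `_id` to every document that lacks one, and it does so by modifying your dictionaries in place. A practical consequence is that inserting the same dict objects a second time fails with a duplicate-key error. The toy function below models only that bookkeeping in plain Python: it is hypothetical, uses `uuid4().hex` as a stand-in for a real `ObjectId`, and never talks to a server.

```python
from uuid import uuid4


def fake_insert_many(store: list, documents: list) -> list:
    """Toy model of insert_many's _id handling (not PyMongo code)."""
    ids = []
    for doc in documents:
        # Assign an id only when the caller did not supply one; this mutates doc.
        doc.setdefault("_id", uuid4().hex)
        if any(existing["_id"] == doc["_id"] for existing in store):
            # Like an ordered insert, earlier documents stay inserted.
            raise ValueError(f"duplicate _id: {doc['_id']!r}")
        store.append(dict(doc))
        ids.append(doc["_id"])
    return ids


store = []
batch = [{"author": "Mike"}, {"_id": "custom-1", "author": "Eliot"}]
ids = fake_insert_many(store, batch)
print("_id" in batch[0])  # True: the caller's dict was modified
print(ids[1])             # custom-1
```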

Display all documents in the `posts` collection:

In [None]:
for post in posts.find():
    print(post)

We can also pass a filter to `find()` to return only the documents that match a condition:

In [None]:
for post in posts.find({'author': 'Mike'}):
    print(post)
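A filter such as `{'author': 'Mike'}` is an equality match, and several keys in one filter must all match. When the stored field is an array, equality also succeeds if any element equals the value, so `posts.find({'tags': 'mongodb'})` returns the first post. The plain-Python sketch below models just those rules; the `matches` helper is hypothetical and ignores query operators such as `$gt`, dot notation, and the special case where `None` also matches a missing field.

```python
def matches(doc: dict, query: dict) -> bool:
    """Toy equality matcher: every query key must match (implicit AND)."""
    for key, wanted in query.items():
        if key not in doc:
            return False
        value = doc[key]
        if isinstance(value, list):
            # Arrays match when equal as a whole or when any element equals `wanted`.
            if value != wanted and wanted not in value:
                return False
        elif value != wanted:
            return False
    return True


docs = [
    {"author": "Mike", "tags": ["mongodb", "python", "pymongo"]},
    {"author": "Mike", "tags": ["bulk", "insert"]},
    {"author": "Eliot", "title": "MongoDB is fun"},
]
print([d["author"] for d in docs if matches(d, {"tags": "mongodb"})])  # ['Mike']
print(len([d for d in docs if matches(d, {"author": "Mike"})]))        # 2
```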

If we only want to know how many documents match a query, we can call `count_documents()` instead of running a full query. An empty filter `{}` counts every document in the collection:

In [None]:
posts.count_documents({})
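PyMongo implements `count_documents()` with an aggregation on the server, so only a number comes back rather than the documents themselves. The sketch below builds a pipeline of roughly that shape: a `$match` stage with the filter, optional `$skip`/`$limit` stages, then a `$group` stage that adds 1 per document. The helper name `count_pipeline` is hypothetical, and the exact pipeline PyMongo sends may differ between versions.

```python
from typing import Optional


def count_pipeline(query: dict, skip: Optional[int] = None,
                   limit: Optional[int] = None) -> list:
    """Build an aggregation pipeline that counts documents matching `query`."""
    pipeline = [{"$match": query}]
    if skip is not None:
        pipeline.append({"$skip": skip})
    if limit is not None:
        pipeline.append({"$limit": limit})
    # One group for the whole result set; add 1 per matching document.
    pipeline.append({"$group": {"_id": 1, "n": {"$sum": 1}}})
    return pipeline


print(count_pipeline({"author": "Mike"}))
# [{'$match': {'author': 'Mike'}}, {'$group': {'_id': 1, 'n': {'$sum': 1}}}]
```

Against the live collection, `next(posts.aggregate(count_pipeline({'author': 'Mike'})), {'n': 0})['n']` should agree with `posts.count_documents({'author': 'Mike'})`; the default covers the case where nothing matches, because `$group` then emits no result.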