# Introduction to MongoDB via Python

Before we start, let's see what Wikipedia has to say about MongoDB:

> "MongoDB (from "hu**mongo**us") is an open source document-oriented database system developed and supported by 10gen. It is part of the NoSQL family of database systems. Instead of storing data in tables as is done in a "classical" relational database, MongoDB stores structured data as JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.

> 10gen began development of MongoDB in October 2007. The database is used by MTV Networks, Craigslist, Foursquare and UIDAI Aadhaar. MongoDB is the most popular NoSQL database management system."

![](https://image.slidesharecdn.com/thinkingindocuments1-150909190439-lva1-app6891/95/webinar-back-to-basics-thinking-in-documents-6-638.jpg)

Done! Now let's start. First, download and unpack MongoDB (the commands below fetch the 3.4.7 release for macOS):

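``` bash
mkdir -p ~/mongodb
cd ~/mongodb
curl -O https://fastdl.mongodb.org/osx/mongodb-osx-x86_64-3.4.7.tgz
tar -zxvf mongodb-osx-x86_64-3.4.7.tgz
# Copy the contents of the unpacked folder (including bin/) into ~/mongodb,
# without overwriting files that already exist (-n)
cp -R -n mongodb-osx-x86_64-3.4.7/ ~/mongodb
```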

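Next, put the MongoDB binaries on your `PATH`, create a data directory and start the server:

``` bash
# Append the MongoDB bin directory to PATH and reload the shell configuration
echo 'export PATH=~/mongodb/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
which mongod    # should print a path ending in mongodb/bin/mongod
# Create the directory where mongod keeps its data files
mkdir -p ~/mongodb/data/db
sudo chmod 755 ~/mongodb/data/
# Remove the downloaded archive and start the server
rm mongodb-osx-x86_64-3.4.7.tgz
mongod --dbpath ~/mongodb/data/db/
```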


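Leave `mongod` running, open a second terminal window and start the MongoDB shell by typing `mongo`. If you see something like the following (your version number will differ):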
```
MongoDB shell version: 2.2.2
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
	http://docs.mongodb.org/
Questions? Try the support group
	http://groups.google.com/group/mongodb-user
>
```
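then you are good to go! Type **exit** to quit the shell. To stop the server, go back to the terminal running `mongod` and press Ctrl+C; start it again with the same `mongod --dbpath` command before running the Python examples below, which expect a server on `localhost:27017`.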

## PyMongo

[PyMongo](https://api.mongodb.com/python/current) is a Python distribution containing tools for working with MongoDB.
* It is the recommended way to work with MongoDB from Python. You can install it with `conda install pymongo` or `pip install pymongo`.

### Connecting to MongoDB

We will use `MongoClient` to connect to MongoDB. Called without arguments, it connects to a server on `localhost` at the default port, `27017`.

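```python
import pymongo
from pymongo.errors import ConnectionFailure

# MongoClient() connects in the background, so creating it does not raise
# even when no server is running. The cheap "ping" command forces a round trip;
# if no server answers within the server selection timeout (30 s by default),
# a ConnectionFailure subclass is raised.
client = pymongo.MongoClient()
try:
    client.admin.command("ping")
    print("Hooray, we have connected to MongoDB successfully!")
except ConnectionFailure as e:
    print("Could not connect to MongoDB: %s" % e)
client
```

```
Hooray, we have connected to MongoDB successfully!

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)
```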
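Because the client connects lazily, it can be handy to check separately whether anything is listening on the default port at all. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper, not part of PyMongo, and an open port only shows that *some* process accepts connections there, not that it is a healthy `mongod`.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 27017, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper: success only means some process accepts connections
    on that port, not that it is a working MongoDB server.
    """
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host name could not be resolved.
        return False
```

For example, `port_is_open()` returns `False` when nothing is listening on `localhost:27017`, which usually means `mongod` is not running.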

What does `tz_aware` mean?

* `tz_aware` (optional): if `True`, `datetime` instances returned as values in a document by this `MongoClient` will be timezone aware (otherwise they will be naive).
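The naive/aware distinction is plain Python, so it can be shown with the standard library alone. In this sketch (no PyMongo involved), `naive` stands in for a `date` field read back with `tz_aware=False` and `aware` for the same field read with `tz_aware=True`; the values are made up to match the post used later in this notebook.

```python
from datetime import datetime, timedelta, timezone

# A stored "date" read back with tz_aware=False: a naive datetime
# (no tzinfo) whose value PyMongo treats as UTC.
naive = datetime(2017, 11, 14, 6, 25, 12)

# The same instant as an aware datetime, like a value read with tz_aware=True.
aware = naive.replace(tzinfo=timezone.utc)

print(naive.tzinfo)       # None
print(aware.utcoffset())  # 0:00:00

# Aware values can be converted to other time zones (here UTC+1)...
print(aware.astimezone(timezone(timedelta(hours=1))))  # 2017-11-14 07:25:12+01:00

# ...but ordering a naive value against an aware one raises TypeError.
try:
    is_earlier = naive < aware
except TypeError as err:
    print("TypeError:", err)
```

This is why mixing the two settings in one application tends to cause errors: values from a `tz_aware=True` client cannot be ordered against naive values from elsewhere.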

## Connected
We are now connected to our server.

![](https://www.codeproject.com/KB/database/1037052/image001.png)

### Databases

MongoDB creates databases and collections automatically the first time you store data in them, so you don't need to create them explicitly. A single instance of MongoDB can support multiple independent databases.

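```python
# Start from a clean slate: drop databases left over from earlier runs.
# Dropping a database that does not exist is harmless.
client.drop_database('mydb')
client.drop_database('sandpit-test')
client.drop_database('test-database')
```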


When working with PyMongo, you can access databases using attribute style access:

```python
db = client.test_database
db
```

```
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'test_database')
```
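Attribute style access is shorthand for item lookup, and it only works for names that are valid Python identifiers: `client.test-database` would parse as `client.test - database`. The toy class below is a hypothetical stand-in that mirrors the idea (it is not PyMongo's code), and `can_use_attribute` is a hypothetical helper for checking a name.

```python
import keyword


class ToyClient:
    """Hypothetical stand-in for MongoClient (not PyMongo's code): attribute
    access is forwarded to item access. Databases are plain strings here."""

    def __getitem__(self, name: str) -> str:
        # PyMongo would return a Database object; the toy returns a label.
        return f"Database({name!r})"

    def __getattr__(self, name: str) -> str:
        # Called only when normal attribute lookup fails.
        # Names starting with "_" are not treated as database names.
        if name.startswith("_"):
            raise AttributeError(name)
        return self[name]


def can_use_attribute(name: str) -> bool:
    """Hypothetical helper: can `client.<name>` be written as Python code?"""
    return name.isidentifier() and not keyword.iskeyword(name)


toy = ToyClient()
print(toy.test_database)     # Database('test_database')
print(toy["test-database"])  # Database('test-database')

for name in ["test_database", "test-database", "sandpit-test", "mydb", "class"]:
    print(f"{name}: {'attribute or dict' if can_use_attribute(name) else 'dict only'}")
```

The variable is named `toy` rather than `client` so that running it in this notebook does not replace the real client.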

If your database name is such that attribute style access won't work (like `db-name`), you can use dictionary style access instead:

```python
db = client['test_database']
db
```

```
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'test_database')
```

## Listing PyMongo Databases

If you need to know which databases are available, call `list_database_names()` on the client object:

```python
client.list_database_names()
```

```
[u'admin', u'local', u'test_database']
```

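We have now referred to the same database, `test_database`, in two different ways.

### Why don't new databases show up straight away?
A database or collection that you have only referenced from Python, but never written to, does not exist on the server yet, so it will **not** show up when you list the databases (or, for collections, when you list a database's collections). `test_database` is listed above only because it already contains data from an earlier run.

We'll check again once we have populated some collections.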
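The lazy-creation rule can be modelled in a few lines of plain Python. `ToyServer` below is a hypothetical in-memory model, not how MongoDB is implemented, and it ignores system databases such as `admin` and `local`: looking a name up creates nothing, and a name is only listed once something has been written under it.

```python
class ToyServer:
    """Hypothetical in-memory model of lazy creation (not how MongoDB works
    internally). System databases such as admin and local are ignored."""

    def __init__(self) -> None:
        self._data = {}  # database name -> {collection name -> [documents]}

    def insert_one(self, db: str, coll: str, doc: dict) -> None:
        # The first write is what creates the database and the collection.
        self._data.setdefault(db, {}).setdefault(coll, []).append(doc)

    def drop_database(self, db: str) -> None:
        # Dropping an unknown database is a no-op, as in MongoDB.
        self._data.pop(db, None)

    def list_database_names(self) -> list:
        return sorted(self._data)

    def list_collection_names(self, db: str) -> list:
        # Reading does not create anything.
        return sorted(self._data.get(db, {}))


server = ToyServer()
print(server.list_database_names())           # []
print(server.list_collection_names("dummy"))  # [] (only referenced, nothing written)
server.insert_one("dummy", "dummy_collection", {"greeting": "Hello World"})
print(server.list_database_names())           # ['dummy']
print(server.list_collection_names("dummy"))  # ['dummy_collection']
server.drop_database("dummy")
print(server.list_database_names())           # []
```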

### Collections

A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database.

In MongoDB, documents stored in a collection must have a unique ```_id``` field that acts as a primary key.

Getting a collection in PyMongo works the same as getting a database:

```python
collection = db.test_collection
collection
```

```
Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'test_database'), u'test_collection')
```

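Dictionary style access works too, and it is required for names that aren't valid Python identifiers. Note that `test-collection` (with a hyphen) is a different collection from `test_collection`:

```python
collection = db['test-collection']
collection
```

```
Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'test_database'), u'test-collection')
```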

### Documents

MongoDB stores structured data as JSON-like documents with dynamic schemas, rather than predefined schemas; the documents themselves are encoded in a binary format called BSON (Binary JSON). An element of data is called a document, and documents are stored in collections. One collection may have any number of documents.

Compared to relational databases, we could say collections are like tables, and documents are like records. But there is one big difference: every record in a table has the same fields (with, usually, differing values) in the same order, while each document in a collection can have completely different fields from the other documents. 

All you really need to know when you're using Python, however, is that documents are Python dictionaries. Their keys are strings, and their values can be primitive types (int, float, str, datetime) as well as other documents (Python dicts) and arrays (Python lists).
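Because documents are just dictionaries, the "different fields per document" point can be seen without a server. In this sketch the list `docs` is a hypothetical stand-in for a collection, and the second document (made-up data) has fields the first one lacks, including an embedded document and an array.

```python
import datetime

# Two documents that could live in the same collection: no shared schema required.
docs = [
    {"author": "Mike", "text": "My first blog post!",
     "tags": ["mongodb", "python", "pymongo"],
     "date": datetime.datetime(2017, 11, 14, 6, 25, 12)},
    {"author": "Eliot", "title": "MongoDB is fun",
     "meta": {"views": 42, "likes": 7},  # embedded document
     "comments": ["Great!", "Thanks"]},  # array
]

# Every field name that appears in at least one document...
all_fields = sorted(set().union(*(d.keys() for d in docs)))
# ...and the fields that every document has.
common_fields = sorted(set.intersection(*(set(d) for d in docs)))

print(all_fields)     # ['author', 'comments', 'date', 'meta', 'tags', 'text', 'title']
print(common_fields)  # ['author']
```

In a relational table both rows would need all seven columns; here each document carries only the fields it uses.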

```python
import datetime

post = {"author": "Mike",
        "text": "My first blog post!",
        "tags": ["mongodb", "python", "pymongo"],
        "date": datetime.datetime.utcnow()}
post
```

```
{'author': 'Mike',
 'date': datetime.datetime(2017, 11, 14, 6, 25, 12, 47501),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
```

```python
posts = db.posts
post_id = posts.insert_one(post).inserted_id
post_id
```

```
ObjectId('5a0a8c484237c71146125a1d')
```

The operation returns an ```InsertOneResult``` object, which includes an attribute ```inserted_id``` that contains the ```_id``` of the inserted document. 

```python
db.list_collection_names()
```

```
[u'test_database', u'posts']
```

The ```ObjectId``` of your inserted document will differ from the one shown.
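An `ObjectId` is 12 bytes, and its first 4 bytes are a big-endian Unix timestamp (in seconds) of when it was generated, so ids sort roughly in creation order. As a standard-library sketch, the timestamp can be recovered from the hex string printed above; with PyMongo you would normally read the `generation_time` attribute of `bson.objectid.ObjectId` instead.

```python
from datetime import datetime, timezone

oid_hex = "5a0a8c484237c71146125a1d"  # the ObjectId printed above; yours will differ
assert len(oid_hex) == 24  # 12 bytes, two hex digits per byte

# The first 4 bytes (8 hex digits) hold the creation time in seconds since the epoch.
seconds = int(oid_hex[:8], 16)
created = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(seconds)  # 1510640712
print(created)  # 2017-11-14 06:25:12+00:00
```

The decoded time matches, to the second, the `date` field stored in the post.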

Congratulations, you have created your first document!

If the document passed to the `insert_one()` method does not contain an `_id` field, PyMongo automatically adds one to the document (modifying the dictionary you passed in) and sets its value to a newly generated `ObjectId`.
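The `_id` rules (unique within a collection, generated when missing) can be sketched with a toy in-memory collection. Everything here is hypothetical: `ToyCollection` is not part of PyMongo, ids come from `uuid4()` rather than a real `ObjectId`, and a duplicate `_id` raises `KeyError` where PyMongo would raise `pymongo.errors.DuplicateKeyError`. Like PyMongo, it adds the generated `_id` to the dictionary you pass in.

```python
import uuid


class ToyCollection:
    """Hypothetical in-memory collection that enforces a unique _id."""

    def __init__(self) -> None:
        self._docs = {}  # _id -> document

    def insert_one(self, doc: dict):
        # Add an _id to `doc` in place only if the caller did not supply one
        # (a uuid4 hex string here, where PyMongo would generate an ObjectId).
        doc_id = doc.setdefault("_id", uuid.uuid4().hex)
        if doc_id in self._docs:
            # PyMongo raises pymongo.errors.DuplicateKeyError in this situation.
            raise KeyError(f"duplicate _id: {doc_id!r}")
        self._docs[doc_id] = doc
        return doc_id  # plays the role of InsertOneResult.inserted_id


posts_toy = ToyCollection()
post_a = {"author": "Mike"}
new_id = posts_toy.insert_one(post_a)
print(post_a["_id"] == new_id)  # True: the dict was modified in place

posts_toy.insert_one({"_id": 5678, "author": "Ann"})
try:
    posts_toy.insert_one({"_id": 5678, "author": "Bob"})
except KeyError as err:
    print("rejected:", err)
```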

```python
client.list_database_names()
```

```
[u'admin', u'local', u'test_database']
```

```python
db.list_collection_names()
```

```
[u'test_database', u'posts']
```

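```python
# Drop a database. Note the hyphen: "test-database" is not "test_database",
# and dropping a database that doesn't exist does nothing, so the list is unchanged.
client.drop_database("test-database")
client.list_database_names()
```

```
[u'admin', u'local', u'test_database']
```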

```python
# Inserting a document creates the "dummy" database and "dummy_collection" on the fly
client.dummy.dummy_collection.insert_one({"greeting": "Hello World"})
client.dummy.list_collection_names()
```

```
[u'dummy_collection']
```

```python
# Delete a collection
client.dummy.drop_collection("dummy_collection")
client.dummy.list_collection_names()
```

```
[]
```

```python
# Delete the dummy database
client.drop_database("dummy")
client.list_database_names()
```

```
[u'admin', u'local', u'test_database']
```

```python
# Remove documents: delete_one() removes at most one document matching the filter;
# delete_many() removes every match. The returned DeleteResult reports how many
# documents were removed in its deleted_count attribute.
posts.delete_one({"_id": 5678})
```

```
<pymongo.results.DeleteResult at 0x10443f730>
```

Passing an empty filter, as in `posts.delete_many({})`, removes every document in the collection. This only removes the documents; the collection and its indexes still exist.
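PyMongo's `delete_one()` removes at most one matching document, while `delete_many()` removes all of them. The toy sketch below makes the difference concrete over a plain list of dictionaries; the functions are hypothetical and support only exact-match filters such as `{"_id": 5678}` (real MongoDB filters also support operators like `$gt`). An empty filter `{}` matches every document.

```python
def matches(doc: dict, flt: dict) -> bool:
    """True if every field in the (exact-match) filter has the same value in doc."""
    return all(field in doc and doc[field] == value for field, value in flt.items())


def delete_one(docs: list, flt: dict) -> int:
    """Remove the first matching document; return how many were deleted (0 or 1)."""
    for i, doc in enumerate(docs):
        if matches(doc, flt):
            del docs[i]
            return 1
    return 0


def delete_many(docs: list, flt: dict) -> int:
    """Remove every matching document; return how many were deleted."""
    keep = [d for d in docs if not matches(d, flt)]
    deleted = len(docs) - len(keep)
    docs[:] = keep  # modify the list in place, like emptying a collection
    return deleted


docs = [{"_id": 1, "author": "Mike"}, {"_id": 2, "author": "Mike"}, {"_id": 3, "author": "Eliot"}]
print(delete_one(docs, {"_id": 5678}))       # 0: nothing matches
print(delete_one(docs, {"author": "Mike"}))  # 1: only the first match is removed
print(delete_many(docs, {}))                 # 2: the empty filter removes the rest
print(docs)                                  # []
```

The integers returned play the role of `DeleteResult.deleted_count` in PyMongo.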

## Useful Commands

To get some statistics about a database and its collections, run the `dbstats` command:

```python
db.command({'dbstats': 1})
```

```
{u'avgObjSize': 184.0,
 u'collections': 2,
 u'dataSize': 1472.0,
 u'db': u'test_database',
 u'indexSize': 53248.0,
 u'indexes': 2,
 u'numExtents': 0,
 u'objects': 8,
 u'ok': 1.0,
 u'storageSize': 53248.0,
 u'views': 0}
```
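Some of these fields are related: `avgObjSize` is the average document size in bytes, i.e. `dataSize` divided by `objects`. A quick check against the output above (values copied from that cell; yours will differ):

```python
# Values copied from the dbstats output above; yours will differ.
stats = {"avgObjSize": 184.0, "dataSize": 1472.0, "objects": 8}

# avgObjSize is the average document size in bytes: dataSize / objects.
avg = stats["dataSize"] / stats["objects"]
print(avg)  # 184.0
```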

To get statistics for a single collection, use the `collstats` command:

```python
db.command({'collstats': 'posts'})
```

```
{u'avgObjSize': 138,
 u'capped': False,
 u'count': 6,
 u'indexDetails': {u'_id_': {u'LSM': {u'bloom filter false positives': 0,
    u'bloom filter hits': 0,
    u'bloom filter misses': 0,
    u'bloom filter pages evicted from cache': 0,
    u'bloom filter pages read into cache': 0,
    u'bloom filters in the LSM tree': 0,
    u'chunks in the LSM tree': 0,
    u'highest merge generation in the LSM tree': 0,
    u'queries that could have benefited from a Bloom filter that did not exist': 0,
    u'sleep for LSM checkpoint throttle': 0,
    u'sleep for LSM merge throttle': 0,
    u'total size of bloom filters': 0},
   u'block-manager': {u'allocations requiring file extension': 0,
    u'blocks allocated': 4,
    u'blocks freed': 1,
    u'checkpoint size': 4096,
    u'file allocation unit size': 4096,
    u'file bytes available for reuse': 16384,
    u'file magic number': 120897,
    u'file major version number': 1,
    u'file size in bytes': 36864,
    u'minor version number': 0},
   u'btre
```

This notebook was adapted from: 
* https://github.com/Altons/pymongo-tutorial
* http://api.mongodb.com/python/current/tutorial.html 