# Section 08: Other Database Structures

## Document Stores

A **_Document Store_** is a database that stores each record as a separate document. These documents can be arbitrarily long, and can even contain other documents inside of them! The chat log example we saw above is a prime use case for a document store: we could store each message and its accompanying metadata as a document, and then embed each of those documents, in order, inside a chat document. This way, we can easily access the data as needed.
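
To make that concrete, here is a minimal sketch of what such a chat document might look like, using a plain Python dictionary to stand in for a document. The field names (`chat_id`, `participants`, `messages`, `sender`, `text`, `sent_at`) are hypothetical, chosen only for illustration.

```python
# A hypothetical chat document: each message is itself a small document
# (a dict), and the messages are embedded, in order, inside the chat document.
chat_document = {
    "chat_id": "chat-001",
    "participants": ["ana", "ben"],
    "messages": [
        {"sender": "ana", "text": "Are we still on for lunch?", "sent_at": "12:01"},
        {"sender": "ben", "text": "Yes! See you at noon.", "sent_at": "12:02"},
        {"sender": "ana", "text": "Great.", "sent_at": "12:03"},
    ],
}

# Because the messages are embedded, reading the whole conversation
# is a single lookup on one document; no joins are needed.
for message in chat_document["messages"]:
    print(f'{message["sender"]}: {message["text"]}')
```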

In these Document Stores, each document contains key-value pairs, with the actual data stored as the value. This makes Document Stores incredibly flexible, because each document can be unique: there is no constraint saying each document must have the same keys! This makes them great for working with data whose shape we don't know in advance (as we saw above, with chat logs that can be arbitrarily long or short), or when we don't know what data will be stored at all. This would be a problem in a relational database, because we would need to know which column the data belongs in before we could store it. With a Document Store, we can just create a key on the fly for the data that matters to us!
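
A quick sketch of that flexibility, again using plain Python dicts as stand-ins for documents (the keys `sender`, `text`, `attachment`, and `reactions` are hypothetical): two documents can sit side by side with different keys, and we can attach a brand-new key to one of them without declaring anything first.

```python
# Two hypothetical documents that could live side by side in the same
# document store: nothing forces them to share the same keys.
messages = [
    {"sender": "ana", "text": "Hello!"},
    {"sender": "ben", "text": "Look at this", "attachment": "photo.png"},
]

# Attach brand-new data to one document on the fly, without
# declaring a new "column" anywhere first.
messages[0]["reactions"] = ["thumbs-up"]

# Each document reports its own set of keys.
for doc in messages:
    print(sorted(doc))
```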

Note that while this flexibility makes it easy to store data on the fly, it also makes it harder to query the data and get exactly what we need. Since each document can potentially have its own **_Schema_**, we have to know exactly which keys we are looking for. This also means we have to be diligent about naming conventions, because keys are case-sensitive: `chatLog` is different from `ChatLog`. If we run a query across all documents to get all data with the key `chatLog`, we'll completely miss any data where the key is written as `ChatLog`!
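
In MongoDB itself, "find every document that has this key" is written with the `$exists` operator, e.g. `{'chatLog': {'$exists': True}}`. The plain-Python sketch below imitates that check over a list of dicts to show the case-sensitivity trap; `find_with_key` and the sample documents are hypothetical.

```python
# Hypothetical documents: one writer used `chatLog`, another used `ChatLog`.
documents = [
    {"user": "ana", "chatLog": ["hi", "bye"]},
    {"user": "ben", "ChatLog": ["hello"]},
    {"user": "cy", "chatLog": ["hey"]},
]

def find_with_key(docs, key):
    """Return every document containing `key` (exact, case-sensitive match)."""
    return [doc for doc in docs if key in doc]

found = find_with_key(documents, "chatLog")
print([doc["user"] for doc in found])  # prints ['ana', 'cy']: ben is silently missed
```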

## Installing MongoDB

In this lesson, we'll learn about the popular NoSQL database **_MongoDB_**, including how to install it on our machine, connect to a Mongo database, and use it to **_C_**reate / **_R_**ead / **_U_**pdate / **_D_**elete (**CRUD**) data!


This part is easy: to install MongoDB, we'll use our favorite package manager, `conda`! This step works the same regardless of which operating system you're running.

To install MongoDB on your machine, open a terminal or `conda` command prompt and just type: 

`conda install mongodb`

Next, we have to create a directory to store our Mongo data files (`/data/db` is the default location `mongod` looks in; these commands are for macOS and Linux):

`sudo mkdir -p /data/db`

Then give your user account ownership of that directory, so `mongod` can write to it:

``sudo chown -R `id -un` /data/db``


Now we're ready to run our server! In that same command prompt, just type `mongod`. You'll see the server start up almost instantly. Note that you must leave this terminal process running in order to use the MongoDB instance, so leave this window alone and open a new terminal or command prompt window whenever you need one.
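
If you're ever unsure whether the server is still up, one quick check from Python is to see whether anything is accepting connections on `mongod`'s default port, 27017. This is a minimal sketch using only the standard library; the helper name `mongod_is_listening` is hypothetical, and an open port doesn't prove that the process behind it is MongoDB.

```python
import socket

def mongod_is_listening(host="127.0.0.1", port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    27017 is mongod's default port; pass a different port if you started
    the server with one. This only checks that the port is open.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False

print(mongod_is_listening())
```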

## Working with MongoDB through Python with `pymongo`

Connecting to MongoDB through a Python library is going to feel very similar to the boilerplate code we used to connect to a SQLite database. To connect to our Mongo server through Python, we have to:

1. Import the `pymongo` library. 

2. Create a client connected to our running MongoDB server by using the `pymongo` library's `MongoClient` object and passing it the URL for the server (which the Mongo server told us in its output when we started it up at the very beginning).

3. Get the database we'll be working with from the client object (called `myclient` in the code below). This can include creating a new database by passing in its name as a key, just as if we were getting it from a Python dictionary.

We'll do this now in the cells below as an example. 

In [None]:
import pymongo

import requests
from bs4 import BeautifulSoup

In [None]:
myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/") # connecting to Mongo

In [None]:
print(myclient.list_database_names()) # these are databases that exist by default

## Getting Data from the Internet (Section 10 Preview)

Let's grab some data to store from http://quotes.toscrape.com/, a site built for practicing web scraping.

In [None]:
def get_some_quotes(url):
    # Make a GET request to retrieve the page, and stop early if it failed
    html_page = requests.get(url)
    html_page.raise_for_status()
    # Pass the page contents to Beautiful Soup for parsing
    soup = BeautifulSoup(html_page.content, 'html.parser')

    list_quotes = []
    # Every quote on the page sits inside an element with class="quote"
    for quote_block in soup.find_all(class_="quote"):
        quote = quote_block.find(class_="text").text
        author = quote_block.find(class_="author").text
        # Build one document (a Python dict) per quote
        list_quotes.append({'quote': quote, 'author': author})
    return list_quotes

In [None]:
quotes_for_mongo = get_some_quotes('http://quotes.toscrape.com/')
quotes_for_mongo

## Creating the Database and Inserting Data into Mongo

In [None]:
mydb = myclient['quote_database'] 
mycollection = mydb['quote_collection']

In [None]:
print(myclient.list_database_names())

In [None]:
reset_result = mycollection.delete_many({}) # removes every document (the empty filter {} matches all); the collection itself is kept. Use to reset the data

Note that we can get a full list of the names of every database we have by running our client object's `.list_database_names()` method. However, if we run this method right after creating `mydb` and `mycollection`, as we did above, we'll see that `quote_database` doesn't exist yet.

This is because MongoDB doesn't actually create a new database until we store some data in it. Putting off work until some other operation actually needs it is a programming concept called **_lazy execution_** (you'll also see it called lazy evaluation). Since our `quote_database` database doesn't contain any data yet, Mongo hasn't created it, so it doesn't show up in the output of our `.list_database_names()` call.

Just as a SQL database has tables, a mongo database has **_Collections_** of documents. We can get a collection or create a new one by passing its name to the database object we created, just like when we passed the database name to the client object. 
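
Here is a toy, in-memory imitation of that lazy behavior. The `ToyClient`, `ToyDatabase`, and `ToyCollection` classes are hypothetical and are not how pymongo is implemented; they only mimic two behaviors described above: databases and collections are fetched with dictionary-style lookups, and a database is only listed once it actually holds data.

```python
class ToyClient:
    """Hypothetical stand-in for a client that creates databases lazily."""

    def __init__(self):
        self._storage = {}  # {db_name: {collection_name: [documents]}}

    def __getitem__(self, db_name):
        # Looking up a database stores nothing; it just returns a handle.
        return ToyDatabase(self._storage, db_name)

    def list_database_names(self):
        return sorted(self._storage)


class ToyDatabase:
    def __init__(self, storage, name):
        self._storage = storage
        self.name = name

    def __getitem__(self, collection_name):
        # Same idea one level down: a handle, not a new collection.
        return ToyCollection(self._storage, self.name, collection_name)


class ToyCollection:
    def __init__(self, storage, db_name, name):
        self._storage = storage
        self._db_name = db_name
        self.name = name

    def insert_many(self, documents):
        documents = list(documents)
        # Only now, when there is data to store, do the database and
        # collection come into existence.
        db = self._storage.setdefault(self._db_name, {})
        db.setdefault(self.name, []).extend(documents)
        return len(documents)


client = ToyClient()
collection = client["quote_database"]["quote_collection"]
print(client.list_database_names())  # []
collection.insert_many([{"quote": "example quote", "author": "example author"}])
print(client.list_database_names())  # ['quote_database']
```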


In [None]:
results = mycollection.insert_many(quotes_for_mongo)

In [None]:
print(myclient.list_database_names())

In [None]:
results.inserted_ids
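
Where do these IDs come from? When a dictionary has no `_id` key, pymongo adds one to it before inserting, and it does so in place, on the very dicts in `quotes_for_mongo`. One consequence: re-running the `insert_many` cell on the same list will typically fail with a duplicate-key error, since those `_id`s are already stored (emptying the collection first with the `delete_many({})` reset cell avoids this). The function below is a hypothetical, in-memory imitation of that behavior; it uses `uuid4` hex strings where MongoDB would use an `ObjectId`.

```python
import uuid

def toy_insert_many(stored, documents):
    """Hypothetical imitation of in-place `_id` assignment on insert."""
    inserted_ids = []
    for doc in documents:
        if "_id" not in doc:
            doc["_id"] = uuid.uuid4().hex  # mutates the caller's dict
        if doc["_id"] in stored:
            raise ValueError(f"duplicate _id: {doc['_id']}")
        stored[doc["_id"]] = dict(doc)
        inserted_ids.append(doc["_id"])
    return inserted_ids

stored = {}
quotes = [{"quote": "example quote", "author": "example author"}]
ids = toy_insert_many(stored, quotes)
print("_id" in quotes[0])  # True: the original dict was changed

try:
    toy_insert_many(stored, quotes)  # same dicts, so same _ids
except ValueError as err:
    print("second insert failed:", err)
```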

## Querying Data from Mongo

In [None]:
mycollection.find({'author':'Albert Einstein'})

In [None]:
query = mycollection.find({'author':'Albert Einstein'})

for x in query:
    print(x)
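
Notice that the first cell above displays a `Cursor` object rather than any quotes: `find()` returns a pymongo `Cursor`, and the query isn't actually sent to the server until we start iterating over it. The sketch below imitates that idea with a plain Python generator; `toy_find` is a hypothetical helper that supports only exact-match filters like `{'author': 'Albert Einstein'}`, whereas MongoDB supports many more query operators.

```python
def toy_find(documents, query):
    """Yield each document whose fields equal every key/value pair in `query`.

    Hypothetical in-memory imitation of an exact-match find();
    an empty query {} matches every document.
    """
    for doc in documents:
        if all(doc.get(key) == value for key, value in query.items()):
            yield doc

docs = [
    {"quote": "quote one", "author": "Albert Einstein"},
    {"quote": "quote two", "author": "Jane Austen"},
    {"quote": "quote three", "author": "Albert Einstein"},
]

cursor_like = toy_find(docs, {"author": "Albert Einstein"})
print(cursor_like)       # a generator object, not the documents themselves
for doc in cursor_like:  # documents are produced only as we iterate
    print(doc["quote"])
```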

## Updating Data in Mongo

In [None]:
steve_tags = ['change', 'deep-thoughts', 'thinking', 'world']

In [None]:
update_steve = {'author': 'Steve Martin'} # filter: which document to update
steve_quote_tags = {'$set': {'quote_tags': steve_tags}} # update: add (or overwrite) the 'quote_tags' field

mycollection.update_one(update_steve, steve_quote_tags) # updates only the first matching document

In [None]:
query2 = mycollection.find({'author': 'Steve Martin'})
for item in query2:
    print(item)
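
`update_one` changes at most one document, the first one that matches the filter, while `update_many` changes every match; `$set` adds the field if the document doesn't have it yet and overwrites it if it does. A toy, in-memory imitation makes the difference visible; `toy_update` is hypothetical and handles only `$set` with exact-match filters.

```python
def toy_update(documents, query, update, many=False):
    """Apply a {'$set': {...}} update to the first match, or to all if many=True.

    Hypothetical imitation of update_one / update_many; returns how many
    documents matched and were updated.
    """
    updated = 0
    for doc in documents:
        if all(doc.get(k) == v for k, v in query.items()):
            doc.update(update["$set"])  # add or overwrite the given fields
            updated += 1
            if not many:
                break  # update_one stops at the first match
    return updated

docs = [
    {"author": "Albert Einstein", "quote": "quote one"},
    {"author": "Albert Einstein", "quote": "quote two"},
]
change = {"$set": {"quote_tags": ["thinking"]}}
print(toy_update(docs, {"author": "Albert Einstein"}, change))             # 1
print(toy_update(docs, {"author": "Albert Einstein"}, change, many=True))  # 2
```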

## Deleting Data from Mongo

In [None]:
# delete all Albert Einstein quotes (delete_one would remove only the first match)

deletion_1 = mycollection.delete_many({'author': 'Albert Einstein'})
print(deletion_1.deleted_count)

In [None]:
query = mycollection.find({'author':'Albert Einstein'})

for x in query:
    print(x)
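
The difference between `delete_one` and `delete_many`, and why `delete_many({})` empties a collection, is easy to see with another toy, in-memory imitation; `toy_delete` is hypothetical and supports only exact-match filters.

```python
def toy_delete(documents, query, many=False):
    """Remove the first match (or all matches if many=True); return how many were removed.

    Hypothetical imitation of delete_one / delete_many. An empty query {}
    matches every document, so toy_delete(docs, {}, many=True) empties the list.
    """
    keep, deleted = [], 0
    for doc in documents:
        matches = all(doc.get(k) == v for k, v in query.items())
        if matches and (many or deleted == 0):
            deleted += 1
        else:
            keep.append(doc)
    documents[:] = keep  # modify the caller's list in place
    return deleted

docs = [
    {"author": "Albert Einstein"},
    {"author": "Jane Austen"},
    {"author": "Albert Einstein"},
    {"author": "Albert Einstein"},
]
print(toy_delete(docs, {"author": "Albert Einstein"}))             # 1
print(toy_delete(docs, {"author": "Albert Einstein"}, many=True))  # 2
print(toy_delete(docs, {}, many=True))                             # 1
print(docs)                                                        # []
```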