The following code accompanies a blog post demonstrating how to navigate a MongoDB database. You can find the post [here](https://zach-a-greenberg.medium.com/79de6b2842a1).

In [1]:
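import pymongo
#host.py is a local module that defines `hostname`, the MongoDB host or connection string used below
from host import hostname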

In [2]:
client = pymongo.MongoClient(hostname)
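
The `host` module is not shown in this notebook. As a minimal sketch of what such a file might contain, the snippet below reads the connection string from an environment variable and falls back to a local server. The variable name `MONGODB_URI` and the default address are assumptions for illustration, not the author's actual setup.

```python
# Hypothetical host.py: the real file is not part of this notebook.
import os

# MONGODB_URI is an assumed environment variable name;
# the fallback points at a MongoDB server on the local machine.
hostname = os.environ.get('MONGODB_URI', 'mongodb://localhost:27017')
```

Keeping the connection string in a separate file or environment variable keeps credentials out of the notebook itself.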

In [3]:
#listing the database names. 'admin' and 'local' are built-in MongoDB databases.
client.list_database_names()

['movies', 'admin', 'local']

In [4]:
#creating a variable to access the movies db.
db = client['movies']

In [5]:
#viewing all of the collections in the movies db.
db.list_collection_names()

['animatedFilms']

This is the total number of documents in the animatedFilms collection of the movies db.

In [6]:
db.animatedFilms.count_documents({})

200

This is a sample of what two of the documents in the collection look like.

In [7]:
list(db.animatedFilms.find({}).limit(2))

[{'_id': ObjectId('617efa278a5dd277669ea133'),
  'rank': '1',
  'title': 'Spirited Away',
  'year': '2001',
  'rating': 'PG',
  'runtime': 125,
  'genre': ['Animation', 'Adventure', 'Family'],
  'score': 8.6,
  'directors': ['Hayao Miyazaki'],
  'cast': ['Daveigh Chase', 'Suzanne Pleshette', 'Miyu Irino', 'Rumi Hiiragi'],
  'gross': 10060000},
 {'_id': ObjectId('617efa278a5dd277669ea134'),
  'rank': '2',
  'title': 'The Lion King',
  'year': '1994',
  'rating': 'G',
  'runtime': 88,
  'genre': ['Animation', 'Adventure', 'Drama'],
  'score': 8.5,
  'directors': ['Roger Allers', 'Rob Minkoff'],
  'cast': ['Matthew Broderick',
   'Jeremy Irons',
   'James Earl',
   'Whoopi Goldberg'],
  'gross': 422780000}]

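The '$in' query operator filters for documents whose field matches any value in a given list. For an array field such as 'directors', a document matches if at least one element of the array is in that list. A list of other query operators can be found [here](https://docs.mongodb.com/manual/reference/operator/query/).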

In [8]:
directed_by = {'directors': {'$in':['Hayao Miyazaki']}}

In [9]:
db.animatedFilms.count_documents(directed_by)

28

In [10]:
#finding the names of the films directed by Hayao Miyazaki
cursor = db.animatedFilms.find(directed_by)

for doc in cursor:
    print(doc['title'])

Spirited Away
Princess Mononoke
Howl's Moving Castle
My Neighbor Totoro
Nausicaä of the Valley of the Wind
Castle in the Sky
Kiki's Delivery Service
Spirited Away
Princess Mononoke
Howl's Moving Castle
My Neighbor Totoro
Nausicaä of the Valley of the Wind
Castle in the Sky
Kiki's Delivery Service
Spirited Away
Princess Mononoke
Howl's Moving Castle
My Neighbor Totoro
Nausicaä of the Valley of the Wind
Castle in the Sky
Kiki's Delivery Service
Spirited Away
Princess Mononoke
Howl's Moving Castle
My Neighbor Totoro
Nausicaä of the Valley of the Wind
Castle in the Sky
Kiki's Delivery Service
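
To make the matching rule behind the '$in' filter concrete, here is a rough pure-Python emulation that runs without a database. The helper `matches_in` is a hypothetical name introduced for illustration, and it covers only plain values and lists, not everything the real operator supports.

```python
def matches_in(doc, field, allowed):
    """Rough emulation of {field: {'$in': allowed}} for a single document."""
    value = doc.get(field)
    if isinstance(value, list):
        # Array field: match if any element appears in the allowed list.
        return any(item in allowed for item in value)
    # Scalar field: match if the value itself appears in the allowed list.
    return value in allowed


# Two documents abbreviated from the sample output earlier in the notebook.
films = [
    {'title': 'Spirited Away', 'directors': ['Hayao Miyazaki']},
    {'title': 'The Lion King', 'directors': ['Roger Allers', 'Rob Minkoff']},
]

matched = [f['title'] for f in films if matches_in(f, 'directors', ['Hayao Miyazaki'])]
print(matched)  # ['Spirited Away']
```

The repeated titles in the output above suggest that each film is stored more than once in the collection. If only unique titles are wanted, `db.animatedFilms.distinct('title', directed_by)` should return each title once.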


In [11]:
high_gross = {'gross': {'$gte': 20000000}}

In [12]:
#finding the number of films that grossed at least 20 million ('$gte' means greater than or equal to)
db.animatedFilms.count_documents(high_gross)

96
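
Comparison operators such as '$gte' can be combined in one filter document to express a range. The sketch below builds such a filter as a plain dictionary. The helper name `gross_between` and the 100 million upper bound are illustrative choices, not taken from the blog.

```python
def gross_between(low, high):
    """Build a filter matching documents with low <= gross < high."""
    return {'gross': {'$gte': low, '$lt': high}}


mid_gross = gross_between(20_000_000, 100_000_000)
print(mid_gross)

# Hypothetical usage against the collection from this notebook:
# db.animatedFilms.count_documents(mid_gross)
```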

In [13]:
#getting a sorted list of the distinct years in the dataset and counting them
years = sorted(db.animatedFilms.distinct('year'))
count_years = len(years)
count_years, years[:5]

(27, ['1984', '1986', '1988', '1989', '1991'])
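
The years come back as strings, so `sorted` compares them character by character. That gives the right order here only because every year has four digits. A small sketch of the difference, using the five years shown above plus a made-up three-digit value:

```python
years = ['1984', '1986', '1988', '1989', '1991']

# Converting to int makes the ordering numeric and allows arithmetic.
years_as_int = sorted(int(y) for y in years)
span = years_as_int[-1] - years_as_int[0]
print(years_as_int, span)  # [1984, 1986, 1988, 1989, 1991] 7

# '999' is an invented value to show where string sorting goes wrong.
print(sorted(['999', '1984']))                    # ['1984', '999']
print(sorted(int(y) for y in ['999', '1984']))    # [999, 1984]
```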

In [14]:
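#setting up to get the films that do not have a gross reported; in this dataset those values appear to be stored as doubles (likely NaN), while reported grosses are integers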
missing_gross = db.animatedFilms.find({'gross': {'$type': 'double'}}, projection={'title':1, '_id':0})

In [15]:
for missing in missing_gross:
    print(missing)

{'title': 'Grave of the Fireflies'}
{'title': 'A Silent Voice: The Movie'}
{'title': 'Klaus'}
{'title': 'Soul'}
{'title': 'Wolf Children'}
{'title': 'Mary and Max'}
{'title': 'Neon Genesis Evangelion: The End of Evangelion'}
{'title': 'Wolfwalkers'}
{'title': 'Castle in the Sky'}
{'title': 'Whisper of the Heart'}
{'title': 'Ninja Scroll'}
{'title': "Kiki's Delivery Service"}
{'title': 'Grave of the Fireflies'}
{'title': 'A Silent Voice: The Movie'}
{'title': 'Klaus'}
{'title': 'Soul'}
{'title': 'Wolf Children'}
{'title': 'Mary and Max'}
{'title': 'Neon Genesis Evangelion: The End of Evangelion'}
{'title': 'Wolfwalkers'}
{'title': 'Castle in the Sky'}
{'title': 'Whisper of the Heart'}
{'title': 'Ninja Scroll'}
{'title': "Kiki's Delivery Service"}
{'title': 'Grave of the Fireflies'}
{'title': 'A Silent Voice: The Movie'}
{'title': 'Klaus'}
{'title': 'Soul'}
{'title': 'Wolf Children'}
{'title': 'Mary and Max'}
{'title': 'Neon Genesis Evangelion: The End of Evangelion'}
{'title': 'Wolfwalk
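
Why would a '$type' filter on 'double' find missing values? One plausible explanation, which is an assumption about how this dataset was loaded, is that missing grosses were inserted as `float('nan')`. PyMongo encodes Python floats as BSON doubles and Python ints as 32-bit or 64-bit integers, so the NaN entries would be the only doubles in an otherwise integer field. The helper `bson_number_alias` below is a hypothetical name that mirrors that mapping.

```python
def bson_number_alias(value):
    """Return the BSON type alias PyMongo would be expected to use for a Python number."""
    if isinstance(value, bool):
        return 'bool'      # bool is checked first because it subclasses int
    if isinstance(value, float):
        return 'double'    # includes float('nan')
    if isinstance(value, int):
        # Ints that fit in 32 bits are stored as 'int', larger ones as 'long'.
        return 'int' if -2**31 <= value <= 2**31 - 1 else 'long'
    raise TypeError(f'not a number: {value!r}')


print(bson_number_alias(422780000))      # int
print(bson_number_alias(float('nan')))   # double
```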

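This is how we perform aggregations. The '$group' stage always needs an '_id' to group by; setting it to None places every document in a single group, so the averages below are taken over the whole collection.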

In [16]:
list(db.animatedFilms.aggregate([{'$group': {'_id':None, 'score': {'$avg':"$score"} } }]))

[{'_id': None, 'score': 8.106}]

In [17]:
list(db.animatedFilms.aggregate([{'$group': {'_id':None, 'runtime': {'$avg':"$runtime"} } }]))

[{'_id': None, 'runtime': 101.66}]
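
Grouping by an actual field instead of None gives one result per distinct value of that field. The sketch below builds such a pipeline as plain Python data. The helper name `average_by` and the choice of averaging 'score' per 'rating' are illustrative, not from the blog.

```python
def average_by(group_field, value_field):
    """Build a pipeline that averages value_field for each distinct group_field."""
    return [
        # '$<name>' refers to a field of the incoming documents.
        {'$group': {'_id': f'${group_field}',
                    'average': {'$avg': f'${value_field}'}}},
        # Highest average first.
        {'$sort': {'average': -1}},
    ]


pipeline = average_by('rating', 'score')
print(pipeline)

# Hypothetical usage against the collection from this notebook:
# list(db.animatedFilms.aggregate(pipeline))
```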