# MongoDB

[MongoDB](http://www.mongodb.org/) is a widely used document-oriented NoSQL database. It's especially good for operating on large amounts of data and data that needs to be used by many users simultaneously, and so is used by large websites such as Craigslist, eBay, Foursquare, and LinkedIn.

The purpose of this tutorial is to show you the basics of working with MongoDB. We'll cover how to insert documents into a MongoDB database and how to get lists of documents back from the database that match a particular set of criteria. This is barely scratching the surface of MongoDB! By the end of this session you'll have enough literacy in how MongoDB works to explore its more advanced features on your own.

## Why use a database?

So what exactly is a database, and why might we need one? For our purpose, we can define a "database" as a piece of software whose main purpose is to make it possible for us to store data somewhere, then later retrieve it, usually in a way that pays attention to the structure of the data itself. There are several reasons we might want to put data into a database:

* *Persistence*: we download data, process it into a form that we like, draw conclusions from it...and then it disappears forever, once we close the notebook. To get that data again, we have to download and process it again, from scratch. This is fine for small amounts of data, but with larger amounts it can be very time consuming. Having a database allows us to store our data in a way that persists from one notebook session to the next.

* *Sharing*: another problem with downloading and processing data on demand is that it's difficult for us to share the result of our data processing with other data scientists. The data exists in our Jupyter notebook and nowhere else--there's no easy way to let someone else access it. A database like MongoDB, on the other hand, can be used by many people simultaneously. It's also easy to create a _dump_ of a MongoDB database and send it to a colleague, who can then reconstruct the data on their own server very easily.

* *Performance*: many databases, including MongoDB, offer features (like indexing, aggregation, and map-reduce) that can make accessing and processing data very fast, often faster than doing the same work in Python.

## Architecture

MongoDB is client-server software, which means that the software itself runs on a server somewhere, and various clients on other computers can access it. Each client talks to the server over the network, using a protocol specific to MongoDB. Most databases, including PostgreSQL and MySQL, work like this, but there are some exceptions, like SQLite, which works with files stored locally on your machine.

We're going to write our client software in a Jupyter Notebook, using a library called `pymongo`. The `pymongo` library gives us an easy way to write Python code that opens a network connection to the server, sends it commands using the MongoDB protocol, and interprets the results that come back.

The `pymongo` library can be installed on a SherlockML instance using `conda` with the following command:

    $ conda install -y pymongo
    
It can also be installed elsewhere with `pip` with the following command:

    $ pip install pymongo

You should generally favour `conda` over `pip` for installing database drivers, as they often have dependencies on shared libraries written in C.

In this workshop, the database server (MongoDB itself) and the client software (the Python code running in your Jupyter notebook) both live on the same machine, your SherlockML server. When you see the word `localhost` below, that's what it means: connect to a database running on the same machine as the client. Other than that, everything we'll learn here applies to connecting to MongoDB on remote servers.

## How MongoDB is structured

MongoDB is a document-oriented database, as we learnt about during the workshop presentation. MongoDB "documents" are like Python dictionaries: a list of key-value pairs that describe some particular thing. Documents are stored in a structure called a "collection," which is like a list of dictionaries in Python. Most of the work we do in MongoDB will be adding documents to a collection, and asking that collection to return documents that match particular queries.

Collections themselves are grouped into "databases," and each MongoDB server can support multiple databases.
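
As a rough mental model only (this is plain Python, not how MongoDB actually stores anything), the hierarchy of server, databases, collections, and documents can be pictured as nested dictionaries and lists. The names `server`, `fellowship`, and `kittens` below are illustrative.

```python
# A plain-Python picture of MongoDB's hierarchy (a mental model, not real storage).
server = {
    'fellowship': {            # a database: maps collection names to collections
        'kittens': [           # a collection: behaves like a list of documents
            {'name': 'Fluffy', 'lbs': 9.5},           # a document: key-value pairs
            {'name': 'Grandpa Pants', 'lbs': 14.1},
        ],
    },
}

database_names = list(server)
collection_names = list(server['fellowship'])
first_document = server['fellowship']['kittens'][0]
print(database_names, collection_names, first_document['name'])
```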

## Connecting to MongoDB and inserting a document

First we'll import the `MongoClient` class from the `pymongo` module, and instantiate a `MongoClient` object. We'll pass the string `localhost` as the first argument, which tells `pymongo` to connect to the MongoDB server running on this machine.

In [1]:
from __future__ import print_function
from pymongo import MongoClient

client = MongoClient('localhost')

print(type(client))

<class 'pymongo.mongo_client.MongoClient'>


In [2]:
# Delete the `fellowship` database when starting the notebook again from scratch.
client.drop_database('fellowship')

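One thing you can do with a `MongoClient` object is call its `.database_names()` method, which returns a list of all the databases on the server. (In PyMongo 3.6 and later the preferred spelling is `.list_database_names()`; `.database_names()` was removed in PyMongo 4.)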

In [3]:
print(client.database_names())

['local', 'admin']


You should see only MongoDB's built-in databases right now, such as `local` and `admin`. These are for MongoDB's internal use, so we won't mess with them. Instead, we'll use the `MongoClient` object to get another object that represents a new database, like so:

In [6]:
db = client['fellowship']

Note: We haven't done anything at this point to explicitly create the `fellowship` database! MongoDB automatically creates a database the first time you store data in it.

This `Database` object supports several interesting methods, among them `.collection_names()`, which shows all of the collections in this database (in PyMongo 3.6 and later, prefer `.list_collection_names()`):

In [8]:
print(db.collection_names())

[]


It's an empty list right now (except maybe for a `system.indexes` collection, which is for internal MongoDB use and which you can ignore if it exists), because we haven't made any collections yet! Using the `Database` object as a dictionary, we can get an object representing a collection:

In [9]:
coll = db['kittens']
print(coll)

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'fellowship'), 'kittens')


Now we're in business. Let's insert our first document into the collection, using the collection's `.insert_one()` method. Between the parentheses of `.insert_one()`, we supply an expression that evaluates to a Python dictionary. PyMongo converts this dictionary into a MongoDB document and adds that document to the collection. The call returns an `InsertOneResult` object, whose `inserted_id` attribute holds the automatically generated identifier that uniquely identifies the document we just added:

In [10]:
result = coll.insert_one({'name': 'Fluffy', 'favourite_colour': 'chartreuse', 'lbs': 9.5})
print(result.inserted_id)

59ba8427cab99e56f8e7dc7f


Let's insert a few more records!

In [11]:
coll.insert_one({'name': 'Monsieur Whiskeurs', 'favourite_colour': 'cerulean', 'lbs': 10.8})
coll.insert_one({'name': 'Grandpa Pants', 'favourite_colour': 'mauve', 'lbs': 14.1})
coll.insert_one({'name': 'Susan B. Meownthony', 'favourite_colour': 'cerulean', 'lbs': 9.0});

## Finding a document

Of course, inserting documents on its own is not very useful. We'd like to be able to retrieve them later. To do so, we can use the `.find_one()` method of a collection object. Between the parentheses of the `.find_one()` call, we give a Python dictionary that tells MongoDB which document to return. The `.find_one()` method returns a document that exactly matches every key-value pair specified in the dictionary. To demonstrate:

In [12]:
coll.find_one({'name': 'Monsieur Whiskeurs'})

{'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
 'favourite_colour': 'cerulean',
 'lbs': 10.8,
 'name': 'Monsieur Whiskeurs'}

If more than one document had the value `Monsieur Whiskeurs` for the key `name`, MongoDB would have returned only the first matching document. If no documents match, this happens:

In [13]:
val = coll.find_one({'name': 'Big Shoes'})
print(val)

None


... the method evaluates to `None`.
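
To make the matching rule concrete, here is a small plain-Python sketch of the behaviour described above: return the first document whose fields equal every key-value pair in the query, or `None` if nothing matches. The function `find_one` below is a stand-in written for this example, not PyMongo's implementation, and it assumes documents are scanned in list order (MongoDB itself does not promise any particular order without a sort).

```python
def find_one(documents, query):
    """Return the first document whose fields equal every key-value pair in query."""
    for doc in documents:
        if all(key in doc and doc[key] == value for key, value in query.items()):
            return doc
    return None  # nothing matched

kittens = [
    {'name': 'Fluffy', 'favourite_colour': 'chartreuse', 'lbs': 9.5},
    {'name': 'Monsieur Whiskeurs', 'favourite_colour': 'cerulean', 'lbs': 10.8},
    {'name': 'Susan B. Meownthony', 'favourite_colour': 'cerulean', 'lbs': 9.0},
]

first_cerulean = find_one(kittens, {'favourite_colour': 'cerulean'})
missing = find_one(kittens, {'name': 'Big Shoes'})
print(first_cerulean['name'], missing)
```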

You may have noticed the key `_id` in the document above. We didn't specify that key when we created the document, so where did it come from? It turns out that unless we specify the `_id` key manually, it is added automatically and given an automatically generated, unique `ObjectId` object as its value.
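
An `ObjectId` is not purely random. According to the MongoDB documentation, its 12 bytes begin with a 4-byte creation timestamp (seconds since the Unix epoch), followed by a random value and an incrementing counter. As a small sketch using only the standard library, the timestamp can be read back out of the hexadecimal string printed by the first insert; the helper name `objectid_timestamp` is made up for this example.

```python
from datetime import datetime, timezone

def objectid_timestamp(hex_id):
    """Return the creation time encoded in the first 4 bytes of an ObjectId."""
    seconds = int(hex_id[:8], 16)   # 8 hex characters = 4 bytes
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created = objectid_timestamp('59ba8427cab99e56f8e7dc7f')
print(created.isoformat())
```

With PyMongo itself, the same information is available as the `generation_time` attribute of an `ObjectId`.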

Let's do that `.find_one()` call again and see what else we can do with it.

In [14]:
doc = coll.find_one({'name': 'Monsieur Whiskeurs'})
print(type(doc))
print(doc['favourite_colour'])

<class 'dict'>
cerulean


As you can see, the value returned from `.find_one()` is just a Python dictionary. We can use it in any of the ways we usually use Python dictionaries--by getting a value for one of its keys, for example.

Use the `.find_one()` method to print out the `lbs` and `favourite_colour` values for our kitten named `Grandpa Pants`.

In [33]:
doc = coll.find_one({'name':'Grandpa Pants'})
print(doc['lbs'],doc['favourite_colour'])

14.1 mauve


In [49]:
doc['lbs']

14.1

## Finding more than one document

The collection object has a method `.find()` that allows you to access every document in the collection. It doesn't return a list, but a weird thing called a `Cursor`. To get data from a cursor, you can either use it in a `for` loop like this:

In [22]:
for doc in coll.find():
    print(doc)

{'_id': ObjectId('59ba8427cab99e56f8e7dc7f'), 'name': 'Fluffy', 'favourite_colour': 'chartreuse', 'lbs': 9.5}
{'_id': ObjectId('59ba843fcab99e56f8e7dc80'), 'name': 'Monsieur Whiskeurs', 'favourite_colour': 'cerulean', 'lbs': 10.8}
{'_id': ObjectId('59ba843fcab99e56f8e7dc81'), 'name': 'Grandpa Pants', 'favourite_colour': 'mauve', 'lbs': 14.1}
{'_id': ObjectId('59ba843fcab99e56f8e7dc82'), 'name': 'Susan B. Meownthony', 'favourite_colour': 'cerulean', 'lbs': 9.0}


... or explicitly convert it to a list, with the `list()` function:

In [23]:
documents = list(coll.find())
documents

[{'_id': ObjectId('59ba8427cab99e56f8e7dc7f'),
  'favourite_colour': 'chartreuse',
  'lbs': 9.5,
  'name': 'Fluffy'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc81'),
  'favourite_colour': 'mauve',
  'lbs': 14.1,
  'name': 'Grandpa Pants'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc82'),
  'favourite_colour': 'cerulean',
  'lbs': 9.0,
  'name': 'Susan B. Meownthony'}]

We can also pass a dictionary to `.find()` to tell MongoDB to only return a subset of documents, namely, only those documents that match the key-value pairs in the dictionary we put in the parentheses. For example, to fetch only those kittens whose `favourite_colour` is `cerulean`:

In [24]:
cerulean_lovers = list(coll.find({'favourite_colour': 'cerulean'}))
cerulean_lovers

[{'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc82'),
  'favourite_colour': 'cerulean',
  'lbs': 9.0,
  'name': 'Susan B. Meownthony'}]

Write a list comprehension that evaluates to a list of the names of kittens whose weight is over 10 lbs.


In [29]:
[kitten['name'] for kitten in coll.find() if kitten['lbs'] > 10]

['Monsieur Whiskeurs', 'Grandpa Pants']

The same filter can also be expressed in the query itself, using the `$gt` operator and a second argument that restricts which fields are returned (both are covered later in this tutorial):

In [69]:
heavy_cats = list(coll.find({'lbs':{'$gt': 10.}},{'name':1,'_id':0}))
heavy_cats

[{'name': 'Monsieur Whiskeurs'}, {'name': 'Grandpa Pants'}]

## Simple aggregations

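You can ask MongoDB how many documents are in a collection with the collection's `.count()` method (deprecated in PyMongo 3.7 and removed in PyMongo 4; on newer versions use `.count_documents({})` instead):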

In [30]:
coll.count()

4

It's also easy to get a list of the distinct values of a particular field, using the `.distinct()` method:

In [31]:
coll.distinct('favourite_colour')

['chartreuse', 'cerulean', 'mauve']

## Removing documents

You can remove documents from a collection with the `.delete_many()` method, passing in a dictionary that describes which documents you want to remove. For example, to remove any documents where the `name` key has the value `Fluffy`:

In [32]:
coll.delete_many({'name': 'Fluffy'})
list(coll.find())

[{'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc81'),
  'favourite_colour': 'mauve',
  'lbs': 14.1,
  'name': 'Grandpa Pants'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc82'),
  'favourite_colour': 'cerulean',
  'lbs': 9.0,
  'name': 'Susan B. Meownthony'}]

You can see that `Fluffy` has now gone missing. You can also easily remove *all* documents from a collection, using the `.delete_many()` method with an empty dictionary. **WARNING**: Don't run this cell unless you want to remove everything you've inserted so far!

In [None]:
# WARNING: Will delete *all* documents from the collection.
# coll.delete_many({})
# list(coll.find())

## More sophisticated queries

We can be a bit more specific about which documents we want from the collection using MongoDB *query selectors*. Query selectors take the form of dictionaries that we pass to the `.find()` method. Each key in this dictionary is a field that you want to match against, and its value is *another* dictionary, which has a MongoDB query operator (listed below) as its key and the value to compare against as its value. Here's an example to make this clearer, searching our collection of kittens for documents where the `lbs` field is greater than `10`:

In [34]:
list(coll.find({'lbs': {'$gt': 10}}))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc81'),
  'favourite_colour': 'mauve',
  'lbs': 14.1,
  'name': 'Grandpa Pants'}]

The supported comparison operators include ([full list here](https://docs.mongodb.com/manual/reference/operator/query/#query-selectors)):

* `$gt`: greater than
* `$gte`: greater than or equal to
* `$lt`: less than
* `$lte`: less than or equal to
* `$ne`: not equal to

You can combine more than one operator for a particular field, in which case MongoDB will find documents that match *all* criteria:

In [51]:
list(coll.find({'lbs': {'$gt': 9, '$lt': 11}}))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'}]

You can also include conditions for more than one field in the dictionary, in which case MongoDB will find documents that match those criteria for each respective field:

In [52]:
list(coll.find({'favourite_colour': 'cerulean'}))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc82'),
  'favourite_colour': 'cerulean',
  'lbs': 9.0,
  'name': 'Susan B. Meownthony'}]

In [53]:
list(coll.find({'favourite_colour': 'cerulean', 'name': {'$ne': 'Monsieur Whiskeurs'}}))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc82'),
  'favourite_colour': 'cerulean',
  'lbs': 9.0,
  'name': 'Susan B. Meownthony'}]
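
To see how these selectors combine, here is a plain-Python sketch of the matching rules described in this section: a plain value means an exact match, an operator dictionary means every operator must hold, and conditions on several fields are combined with an implicit AND. The function `matches` and the `OPERATORS` table are written for this example and are a simplification, not MongoDB's implementation.

```python
import operator

# Map a few MongoDB comparison operators onto Python's operator functions.
OPERATORS = {
    '$gt': operator.gt,
    '$gte': operator.ge,
    '$lt': operator.lt,
    '$lte': operator.le,
    '$ne': operator.ne,
}

def matches(doc, query):
    """Return True if doc satisfies every condition in query (an implicit AND)."""
    for field, condition in query.items():
        # Simplification: a missing field never matches here, whereas real
        # MongoDB treats missing fields differently for operators such as $ne.
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {'$gt': 9, '$lt': 11}: all operators must hold.
            if not all(OPERATORS[op](value, operand) for op, operand in condition.items()):
                return False
        elif value != condition:
            # Plain value: exact match required.
            return False
    return True

kittens = [
    {'name': 'Monsieur Whiskeurs', 'favourite_colour': 'cerulean', 'lbs': 10.8},
    {'name': 'Grandpa Pants', 'favourite_colour': 'mauve', 'lbs': 14.1},
    {'name': 'Susan B. Meownthony', 'favourite_colour': 'cerulean', 'lbs': 9.0},
]

in_range = [k['name'] for k in kittens if matches(k, {'lbs': {'$gt': 9, '$lt': 11}})]
other_cerulean = [k['name'] for k in kittens
                  if matches(k, {'favourite_colour': 'cerulean',
                                 'name': {'$ne': 'Monsieur Whiskeurs'}})]
print(in_range, other_cerulean)
```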

### Regular expression searches

Another valuable search criterion that MongoDB supports is `$regex`, which will return documents that match a regular expression for a particular field. For example, to find all kittens whose name ends with the letter `y`:

In [54]:
list(coll.find({'name': {'$regex': 'y$'}}))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc82'),
  'favourite_colour': 'cerulean',
  'lbs': 9.0,
  'name': 'Susan B. Meownthony'}]

Write a call to `.find()` that returns all kittens whose favourite colour begins with the letter `c`.

In [58]:
list(coll.find({'favourite_colour': {'$regex': '^c'}}))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc82'),
  'favourite_colour': 'cerulean',
  'lbs': 9.0,
  'name': 'Susan B. Meownthony'}]
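
Anchors matter in these patterns: without `^`, a pattern matches anywhere in the string rather than only at the start. MongoDB uses its own regular expression engine rather than Python's `re` module, but for simple anchors like these the behaviour is the same, so the difference can be checked locally. The colour `black` below is not in our collection; it is added only to show a string that contains a `c` without starting with one.

```python
import re

colours = ['chartreuse', 'cerulean', 'mauve', 'black']

# An unanchored pattern matches a 'c' anywhere in the string...
contains_c = [c for c in colours if re.search('c', c)]
# ...whereas '^c' only matches strings that start with 'c'.
starts_with_c = [c for c in colours if re.search('^c', c)]

print(contains_c)
print(starts_with_c)
```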

## Sorting and limiting

Results from `.find()` aren't guaranteed to come back in any particular order. For this reason, you may find it helpful to sort the results. You can specify a sort order for results from the `.find()` method by appending a call to the `.sort()` method. It looks like this:

In [59]:
list(coll.find().sort('lbs'))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc82'),
  'favourite_colour': 'cerulean',
  'lbs': 9.0,
  'name': 'Susan B. Meownthony'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc81'),
  'favourite_colour': 'mauve',
  'lbs': 14.1,
  'name': 'Grandpa Pants'}]

The parameter you pass to `.sort()` specifies which field the documents should be sorted by. Specifying descending order is easy:

In [60]:
list(coll.find().sort('lbs', -1))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc81'),
  'favourite_colour': 'mauve',
  'lbs': 14.1,
  'name': 'Grandpa Pants'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc82'),
  'favourite_colour': 'cerulean',
  'lbs': 9.0,
  'name': 'Susan B. Meownthony'}]

(The `-1` means 'in reverse order'.) The `.sort()` method works even if you've specified query selectors in the call to `.find()`:

In [61]:
list(coll.find({'lbs': {'$gt': 9.0}}).sort('name'))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc81'),
  'favourite_colour': 'mauve',
  'lbs': 14.1,
  'name': 'Grandpa Pants'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'}]

You can also limit the number of results returned from `.find()` using the `.limit()` method, which, like `.sort()`, gets called on the result of `.find()`. To return only two kittens:

In [62]:
list(coll.find().limit(2))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc80'),
  'favourite_colour': 'cerulean',
  'lbs': 10.8,
  'name': 'Monsieur Whiskeurs'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc81'),
  'favourite_colour': 'mauve',
  'lbs': 14.1,
  'name': 'Grandpa Pants'}]

Search for all kittens weighing less than 10 pounds, limit to one result:

In [63]:
list(coll.find({'lbs':{'$lt':10}}).limit(1))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc82'),
  'favourite_colour': 'cerulean',
  'lbs': 9.0,
  'name': 'Susan B. Meownthony'}]

You can put a `.limit()` after a `.sort()` to get only the first few results from a sorted list of documents. So, for example, to get only the heaviest cat:

In [64]:
list(coll.find().sort('lbs', -1).limit(1))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc81'),
  'favourite_colour': 'mauve',
  'lbs': 14.1,
  'name': 'Grandpa Pants'}]
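
In plain-Python terms, sorting and then limiting is roughly equivalent to `sorted(...)` followed by a slice. The helper `sort_and_limit` below is written for this example only; it mirrors the `1`/`-1` direction convention used by `.sort()`. PyMongo also provides the constants `pymongo.ASCENDING` and `pymongo.DESCENDING`, which are equal to `1` and `-1`.

```python
kittens = [
    {'name': 'Monsieur Whiskeurs', 'lbs': 10.8},
    {'name': 'Grandpa Pants', 'lbs': 14.1},
    {'name': 'Susan B. Meownthony', 'lbs': 9.0},
]

def sort_and_limit(documents, field, direction=1, limit=None):
    """Sort by field (1 = ascending, -1 = descending), then keep the first `limit`."""
    ordered = sorted(documents, key=lambda doc: doc[field], reverse=(direction == -1))
    return ordered if limit is None else ordered[:limit]

heaviest = sort_and_limit(kittens, 'lbs', direction=-1, limit=1)
lightest_two = sort_and_limit(kittens, 'lbs', limit=2)
print(heaviest[0]['name'], [k['name'] for k in lightest_two])
```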

## Only get certain fields

If we want our result to only include certain key-value pairs from the document, we can provide a second argument to the `.find()` method (MongoDB calls this a *projection*). This argument should be a dictionary whose keys are the fields we want included, and whose values are all `1`. For example, to find all kittens whose favourite colour is `cerulean`, but only return their names, we could do this:

In [65]:
list(coll.find({'favourite_colour': 'cerulean'}, {'name': 1}))

[{'_id': ObjectId('59ba843fcab99e56f8e7dc80'), 'name': 'Monsieur Whiskeurs'},
 {'_id': ObjectId('59ba843fcab99e56f8e7dc82'), 'name': 'Susan B. Meownthony'}]

The `_id` field is always included by default. If we want to get rid of it, we can include the `_id` key in our dictionary of fields, giving it a `0` (instead of a `1`):

In [66]:
list(coll.find({'favourite_colour': 'cerulean'}, {'name': 1, '_id': 0}))

[{'name': 'Monsieur Whiskeurs'}, {'name': 'Susan B. Meownthony'}]
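
The field-selection rule can be summarised in a few lines of plain Python. The function `project` below is a sketch written for this example and covers only the case shown above (fields marked `1`, with `_id` kept unless it is explicitly marked `0`); the `_id` value `'abc123'` is a placeholder rather than a real `ObjectId`.

```python
def project(doc, fields):
    """Keep only the fields marked 1; keep '_id' unless it is explicitly marked 0."""
    wanted = {key for key, flag in fields.items() if flag}
    if fields.get('_id', 1):
        wanted.add('_id')
    return {key: value for key, value in doc.items() if key in wanted}

doc = {'_id': 'abc123', 'name': 'Susan B. Meownthony',
       'favourite_colour': 'cerulean', 'lbs': 9.0}

with_id = project(doc, {'name': 1})
without_id = project(doc, {'name': 1, '_id': 0})
print(with_id, without_id)
```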

## Let's get real

I want to take you through a real-world example of consuming data from a source, putting it into MongoDB, then querying MongoDB to find interesting stuff in that data. Specifically, we're going to fetch a big CSV of historic data about congress members from the "Bulk Data" section of [govtrack.us](https://www.govtrack.us/developers/data). [Here's the file](https://www.govtrack.us/data/congress-legislators/legislators-historic.csv), which contains a row for every member of Congress in the history of the United States (who isn't currently a sitting member).

Let's play with the `csv` library's `DictReader` class to see what the data looks like.

In [70]:
import csv

# Read every row into memory; the `with` block closes the file afterwards.
with open('legislators-historic.csv', newline='') as f:
    all_rows = list(csv.DictReader(f))
all_rows[0]

OrderedDict([('last_name', 'Bassett'),
             ('first_name', 'Richard'),
             ('birthday', '1745-04-02'),
             ('gender', 'M'),
             ('type', 'sen'),
             ('state', 'DE'),
             ('district', ''),
             ('party', 'Anti-Administration'),
             ('url', ''),
             ('address', ''),
             ('phone', ''),
             ('contact_form', ''),
             ('rss_url', ''),
             ('twitter', ''),
             ('facebook', ''),
             ('facebook_id', ''),
             ('youtube', ''),
             ('youtube_id', ''),
             ('bioguide_id', 'B000226'),
             ('thomas_id', ''),
             ('opensecrets_id', ''),
             ('lis_id', ''),
             ('cspan_id', ''),
             ('govtrack_id', '401222'),
             ('votesmart_id', ''),
             ('ballotpedia_id', ''),
             ('washington_post_id', ''),
             ('icpsr_id', '507'),
             ('wikipedia_id', '')])

In [71]:
# Drop the `legislators` collection when starting the notebook again from scratch.
db.drop_collection('legislators')

{'errmsg': 'ns not found', 'ok': 0.0}

What we seem to have here is a dictionary that describes a member of congress. This happens to be one [Richard Bassett](http://en.wikipedia.org/wiki/Richard_Bassett_(politician)), born in 1745. So how about putting all those rows into MongoDB? Here's how it would go. It's pretty simple--we'll create a separate collection in our database for these legislators, called `legislators`.

In [72]:
coll = db['legislators']

Now, loop through the rows of the table and just insert each dictionary from `DictReader` straight into MongoDB:

In [73]:
for row in all_rows:
    coll.insert_one(row)
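
Inserting one row at a time means one round trip to the server per document. PyMongo collections also have an `.insert_many()` method that accepts a list of dictionaries, so a possible alternative is to send the rows in batches. The sketch below is one way to do that; the helper names `chunks` and `insert_in_batches` and the batch size of 1000 are arbitrary choices made for this example.

```python
def chunks(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def insert_in_batches(collection, rows, size=1000):
    """Insert rows with one insert_many call per batch; return the number inserted."""
    inserted = 0
    for batch in chunks(rows, size):
        result = collection.insert_many(batch)
        inserted += len(result.inserted_ids)
    return inserted
```

It would be used in place of the loop above, for example as `insert_in_batches(coll, all_rows)`.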

At this point, the number of documents in the database should match the number of rows in the CSV file. Let's make sure.

In [74]:
len(all_rows) == coll.count()

True

And how many exactly is that?

In [75]:
coll.count()

11807

Eleven thousand legislators. Not exactly "big data", but hopefully you can still see the benefit of having this data in one place without having to re-download and parse the data each time we want to use it.

### Meet the press

Let's do some queries on our data now. Make a list of all legislators who are women:

In [79]:
women = list(coll.find({'gender':'F'}))
len(women)

73

How about a list of legislators who are women, whose party is not `Democrat`?

In [80]:
women_nonD = list(coll.find({'gender':'F','party':{'$ne':'Democrat'}}))
len(women_nonD)

31

Make a list of these women, including only their names, states, and birthdays:

In [81]:
women_nonD_nsb = list(coll.find({'gender':'F','party':{'$ne':'Democrat'}},
                                {'first_name':1,'last_name':1,'state':1,'birthday':1,'_id':0}))
len(women_nonD_nsb)

31

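List the youngest five Republican legislators, as determined by their birthday. Note that the attempt below sorts in ascending order, so it returns the earliest birthdays first, and documents with an empty `birthday` string sort ahead of any real date; getting the youngest requires a descending sort (`-1`) that also excludes empty birthdays: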

In [89]:
list(coll.find({'party':'Republican'},
              {'first_name':1,'last_name':1,'_id':0,'birthday':1}).sort('birthday',1).limit(5))

[{'birthday': '', 'first_name': 'Alexander', 'last_name': 'Martin'},
 {'birthday': '', 'first_name': 'Timothy', 'last_name': 'Bloodworth'},
 {'birthday': '', 'first_name': 'Christopher', 'last_name': 'Greenup'},
 {'birthday': '', 'first_name': 'John', 'last_name': 'Sherburne'},
 {'birthday': '', 'first_name': 'Thomas', 'last_name': 'Sprigg'}]
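
The output above contains only empty birthdays because an ascending sort puts empty strings first. One possible fix is to exclude empty birthdays with `$ne` and sort in descending order. The sketch below shows the arguments that could be passed to `.find()`, `.sort()`, and `.limit()`, and applies the same logic in plain Python to a few invented rows (the names and dates are made up for illustration) so that it runs without a server.

```python
# Arguments one could pass to coll.find(...), .sort(...), and .limit(...).
query = {'party': 'Republican', 'birthday': {'$ne': ''}}
projection = {'first_name': 1, 'last_name': 1, 'birthday': 1, '_id': 0}
sort_field, sort_direction, limit = 'birthday', -1, 5
# e.g. list(coll.find(query, projection).sort(sort_field, sort_direction).limit(limit))

# The same logic in plain Python on invented sample rows.
sample = [
    {'last_name': 'Example-A', 'party': 'Republican', 'birthday': ''},
    {'last_name': 'Example-B', 'party': 'Republican', 'birthday': '1801-05-17'},
    {'last_name': 'Example-C', 'party': 'Republican', 'birthday': '1950-11-02'},
    {'last_name': 'Example-D', 'party': 'Whig', 'birthday': '1960-01-01'},
]
republicans = [r for r in sample
               if r['party'] == 'Republican' and r['birthday'] != '']
# ISO 'YYYY-MM-DD' strings sort chronologically, so a reverse string sort puts
# the most recent birthday (the youngest legislator) first.
youngest = sorted(republicans, key=lambda r: r[sort_field], reverse=True)[:limit]
print([r['last_name'] for r in youngest])
```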

Make a list of all distinct parties. (Witness the variety of American democracy!)

In [90]:
list(coll.distinct('party'))

['Anti-Administration',
 '',
 'Pro-Administration',
 'Republican',
 'Federalist',
 'Democratic Republican',
 'Unknown',
 'Adams',
 'Jackson',
 'Jackson Republican',
 'Crawford Republican',
 'Whig',
 'Anti-Jacksonian',
 'Adams Democrat',
 'Nullifier',
 'Anti Masonic',
 'Anti Jacksonian',
 'Jacksonian',
 'Democrat',
 'Anti Jackson',
 'Union Democrat',
 'Conservative',
 'Ind. Democrat',
 'Law and Order',
 'American',
 'Liberty',
 'Free Soil',
 'Independent',
 'Ind. Republican-Democrat',
 'Ind. Whig',
 'Unionist',
 'States Rights',
 'Anti-Lecompton Democrat',
 'Constitutional Unionist',
 'Independent Democrat',
 'Unconditional Unionist',
 'Conservative Republican',
 'Ind. Republican',
 'Liberal Republican',
 'National Greenbacker',
 'Readjuster Democrat',
 'Readjuster',
 'Union',
 'Union Labor',
 'Populist',
 'Silver Republican',
 'Free Silver',
 'Democratic and Union Labor',
 'Progressive Republican',
 'Progressive',
 'Prohibitionist',
 'Socialist',
 'Farmer-Labor',
 'Nonpartisan',
 'Coal

Read the documentation for [MongoDB's `$nin` operator](http://docs.mongodb.org/manual/reference/operator/query/nin/), and write a MongoDB query that returns a list of the names, states, and parties of all legislators whose party is neither Republican nor Democrat.

In [92]:
member = list(coll.find({'party':{'$nin':['Republican','Democrat']}},
               {'first_name':1,'last_name':1,'state':1,'party':1,'_id':0}))
len(member)

2096

## Where to go next

Great work--you've learned the basics. Where to go next?

* The [PyMongo tutorial](http://api.mongodb.org/python/current/tutorial.html) covers a lot of the same material that we've covered here, but it's always nice to have a different perspective on this.
* If you want to be able to modify your data after you've imported it, you may want to learn how to [update documents](http://docs.mongodb.org/manual/tutorial/modify-documents/) once they're already in the database.
* Once you're working with a sufficiently large amount of data, you'll be able to speed up your queries significantly using [indexes](http://docs.mongodb.org/manual/core/indexes-introduction/).
* We didn't cover some of MongoDB's most powerful features in this tutorial, including its [aggregation](http://docs.mongodb.org/manual/core/aggregation-introduction/) and [map-reduce](http://docs.mongodb.org/manual/core/map-reduce/) features. Both are very handy, but difficult to teach in a Python class, because they often require writing JavaScript. [Here's a good overview](http://blog.safaribooksonline.com/2013/06/21/aggregation-in-mongodb/) of MongoDB's aggregation framework.