### i0u19a - Data Processing - KU Leuven

# Python Mongo exercises

###### _Thomas Moerman, Jan Aerts_

![license](https://licensebuttons.net/l/by/3.0/88x31.png)

Hello and welcome to the tutorial on data processing with **Mongo DB**!
Make sure to have the mongo server running as well, e.g with `docker run -d -p 27017:27017 -p 28017:28017 jandot/mongo-i0u19a`

We'll be using Jupyter notebook again (you're looking at it) as a tool to walk you through a few examples. At the VDA-LAB, we like notebooks as a teaching tool because they allow you to experiment with code and data as you work your way through the document.

A few guidelines on the notebook itself:
* A notebook consists of *cells*, which are snippets of either text (markdown) or code (Python in this case).
* Cells can be executed by clicking the `[>]` "play" button, or by hitting shift-enter on the keyboard.
* You can navigate between cells either by clicking or by using the arrow buttons.

### Documentation

Check this [PyMongo Tutorial blog bost](http://connor-johnson.com/2014/08/17/getting-started-with-mongodb-and-python/).

PyMongo API documentation:
* http://api.mongodb.org/python/current/index.html#overview
* http://api.mongodb.org/python/current/tutorial.html

# Mongo client setup

We need a MongoClient to connect to a remote Mongo database. We connect to a mongo server node prepared with databases for this exercise session.

In [None]:
from pymongo import MongoClient

# connect to the mongo server running on your local machine
docker_machine_ip = '192.168.99.100' # you might have a different IP, see docker-image IP
client = MongoClient(docker_machine_ip, 27017)

In [None]:
client.database_names()

Let's connect to the i0u19a database.

In [None]:
db = client.i0u19a

Let's check which collections are present in the database.

In [470]:
db.collection_names()

['numberBeersPerBrewery', 'number_beers_per_brewery', 'beers']

It contains our familiar 'beers' collection, let's check what's in it

In [471]:
db.beers.find_one()

{'_id': ObjectId('57208d9d73ecfed3a86d9f55'),
 'alcoholpercentage': 6.0,
 'beer': '3 SchtÃ©ng',
 'brewery': "Brasserie Grain d'Orge",
 'type': ['hoge gisting']}

Great! Let's continue with some exercises.

# Exercises

We will do the exercises defined in: http://vda-lab.github.io/2016/04/mongodb-exercises.

## 1. Warm-up exercises

### 1.a How many beers are there in the database?

In [472]:
nr_beers = db.beers.count() # complete this

nr_beers

1691

In [473]:
assert nr_beers == 1691, "incorrect nr beers: %s" % nr_beers

### 1.b. Return the first 5 beers.

Working with a result set is slightly different than in the Mongo shell. When executing operations like `find()`, pymongo returns a `cursor`.

In [474]:
db.beers.find()

<pymongo.cursor.Cursor at 0x7f106573fa20>

A cursor is an interface to a collection that supports Python's "slice" operator, to select a range of results we are interested in. Slicing is a common operation in Python, for example:

In [475]:
int_list = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

int_list[0:3]

(1, 2, 3)

Now use the slice operator to select the first 5 beers

In [476]:
first_5_beers_cursor = db.beers.find()[0:5] # complete this

# read the items from the cursor into a list
first_5_beers = list(first_5_beers_cursor)

first_5_beers

[{'_id': ObjectId('57208d9d73ecfed3a86d9f55'),
  'alcoholpercentage': 6.0,
  'beer': '3 SchtÃ©ng',
  'brewery': "Brasserie Grain d'Orge",
  'type': ['hoge gisting']},
 {'_id': ObjectId('57208d9d73ecfed3a86d9f56'),
  'alcoholpercentage': 5.6,
  'beer': '400',
  'brewery': "'t Hofbrouwerijke voor Brouwerij Montaigu",
  'type': ['blond']},
 {'_id': ObjectId('57208d9d73ecfed3a86d9f57'),
  'alcoholpercentage': 6.5,
  'beer': 'IV Saison',
  'brewery': 'Brasserie de Jandrain-Jandrenouille',
  'type': ['saison']},
 {'_id': ObjectId('57208d9d73ecfed3a86d9f58'),
  'alcoholpercentage': 7.5,
  'beer': 'V Cense',
  'brewery': 'Brasserie de Jandrain-Jandrenouille',
  'type': ['hoge gisting', 'special belge']},
 {'_id': ObjectId('57208d9d73ecfed3a86d9f59'),
  'alcoholpercentage': 6.0,
  'beer': 'VI Wheat',
  'brewery': 'Brasserie de Jandrain-Jandrenouille',
  'type': ['hoge gisting', 'tarwebier']}]

In [477]:
assert 5 == len(first_5_beers), "incorrect number of beers: %s" % len(first_5_beers)

### 1.c How many beers in the database are of type 'blond troebel'?

Provide the `find(...)` method with a filter criterion.

In [478]:
nr_blond_troebel = db.beers.find({'type': 'blond troebel'}).count() # complete this

nr_blond_troebel

1

In [479]:
assert 1 == nr_blond_troebel, "Incorrect result for nr of 'blond troebel' beers: %s" % nr_blond_troebel 

### 1.d Of these 'blond troebel' beers, only return the name of the beer.

Provide the `find(...)` with a filter criterion and a projection.

In [480]:
troebel_blond_cursor = db.beers.find({'type': 'blond troebel'}, {'_id': 0, 'beer': 1}) # complete this

troebel_blond_names = list(troebel_blond_cursor)

troebel_blond_names

[{'beer': "L'Autruche Biere des Gilles"}]

NOTE: make sure that the item IDs are not included in the result!

In [481]:
troebel_blond_tuple_lengths = list(len(b) for b in troebel_blond_names)

assert (troebel_blond_tuple_lengths[0] == 1 and len(set(troebel_blond_tuple_lengths)) == 1), "incorrect projection"

### 1.e How many beers have a percentage alcohol of more than 8 degrees?

Provide the find(...) method with the correct filter predicate.

In [482]:
nr_strong_beers = db.beers.find({'alcoholpercentage': {'$gt': 8}}).count() # complete this

nr_strong_beers

399

In [483]:
assert 399 == nr_strong_beers, "incorrect nr of strong beers: %s" % nr_strong_beers

### 1.f How many beers have low alcohol ('alcoholarm')?

Provide the find(...) method with the correct filter predicate.

In [484]:
nr_lemonades = db.beers.find({'type': 'alcoholarm'}).count()

nr_lemonades

6

In [485]:
assert 6 == nr_lemonades, "incorrect nr of low alcohol beers: %s" % nr_lemonades

## 2. Aggregation pipeline exercises

Alrightie, we have learned some basic Mongo chops, let's now move on to one of Mongo DB's swiss army knife operators for data wrangling: the `aggregate` pipeline.

Check out the documentation here:
https://docs.mongodb.org/manual/reference/operator/aggregation-pipeline/.

We pass `aggregate` a list of pipeline commands, the data is sequentially operated upon by each command.

For example, let's select those beers that have a percentage of more than 8 degrees, get the average of these per brewery, and finally take a sample:

In [486]:
# notice the $ signs !

avg_strong_per_brewery_cursor = db.beers.aggregate([
  {'$match': {'alcoholpercentage': {'$gt': 8}}},
  {'$group': {'_id': '$brewery', 'avg': {'$avg': '$alcoholpercentage'}}},
  {'$sample': {'size': 5}}
])

list(avg_strong_per_brewery_cursor)

[{'_id': 'De Proefbrouwerij', 'avg': 9.0},
 {'_id': 'Brasserie de Cazeau', 'avg': 8.2},
 {'_id': 'De Proefbrouwerij voor Brouwerij The Musketeers',
  'avg': 8.733333333333333},
 {'_id': 'Brouwerij Caulier', 'avg': 10.0},
 {'_id': 'Brouwerij Kortrijk-dUtsel', 'avg': 8.5}]

We provided `aggregate` with a pipeline like this:

```
          +--------+     +--------+     +---------+
INPUT --> | $match | --> | $group | --> | $sample | --> RESULT
          +--------+     +--------+     +---------+
```

You can specify pipelines of arbitrary length and complexity, using different commands like (but not limited to):

* `$project`: reshape each document
* `$match`: filter the stream
* `$limit`: return only the first n documents
* `$unwind`: deconstruct a list in each document into separate documents
* `$group`: group documents by a given identifier
* `$sort`: reorder the document stream by a specified sort key
* `$sample`: take a random sample
* `$out`: write the results to a new collection. If used, this should be the last step of the pipeline.

Good, now proceed and wrangle your data `aggregate`-style!

### 2.a What is the average alcoholpercentage per brewery?

In [487]:
avg_pct_per_brewery_cursor = db.beers.aggregate([
        {'$group': {'_id': '$brewery',  
                    'avg': {'$avg': '$alcoholpercentage'}}} # complete this
    ])

avg_pct_per_brewery = list(avg_pct_per_brewery_cursor)

avg_pct_per_brewery

[{'_id': 'De Proefbrouwerij voor de gemeente Zwalm', 'avg': 8.0},
 {'_id': 'De Proefbrouwerij voor Zonderik Beer Company', 'avg': 8.25},
 {'_id': 'Brouwerij Lupus', 'avg': 8.299999999999999},
 {'_id': 'De Proefbrouwerij voor Oude Brouwerij van Zonnegem vzw', 'avg': 7.1},
 {'_id': 'Sint-Sixtusabdij van Westvleteren', 'avg': 7.55},
 {'_id': 'Brouwerij Westmalle', 'avg': 8.25},
 {'_id': "Brouwerij De Graal voor t'Drankorgel", 'avg': 8.3},
 {'_id': 'De Proefbrouwerij in opdracht van De Kale Ridders', 'avg': 7.125},
 {'_id': 'De Proefbrouwerij voor Brouwerij The Musketeers',
  'avg': 7.6800000000000015},
 {'_id': 'Brouwerij Anders! voor BOMBrewery', 'avg': 6.4},
 {'_id': 'Brouwerij en stokerij Wilderen', 'avg': 8.2},
 {'_id': 'De Proefbrouwerij voor Buitenlust', 'avg': 5.5},
 {'_id': 'Brouwerij Den ToetÃ«lÃ¨r', 'avg': 7.3999999999999995},
 {'_id': 'Brouwerij Ter Dolen', 'avg': 6.566666666666666},
 {'_id': 'Brasserie du Bocq voor Brouwerij Corsendonk', 'avg': 6.0},
 {'_id': 'Brasserie des Fa

In [488]:
# Verify by checking AB Inbev's average percentage.

AB_Inbev_avg = next(b.get('avg') for b in avg_pct_per_brewery if b.get('_id') == "AB Inbev")

assert 5.2 == AB_Inbev_avg, "incorrect average alcohol percentage for AB Inbev: %s" % AB_Inbev_avg

### 2.b Which breweries have an average alcohol percentage higher than 10 degrees?

Return these in descending order of alcoholpercentage.

In [489]:
avg_gt_10_per_brewery_cursor = db.beers.aggregate([
        {'$group': {'_id': '$brewery', 
                    'avg': {'$avg': '$alcoholpercentage'}}}, # complete this
        {'$match': {'avg': {'$gt': 10}}},
        {'$sort': {'avg': -1}}
    ])

avg_gt_10_per_brewery = list(avg_gt_10_per_brewery_cursor)

avg_gt_10_per_brewery

[{'_id': 'Staminee De Garre (Brouwerij Van Steenberge)', 'avg': 11.5},
 {'_id': 'Brouwerij Sint-Jozef voor brouwerij Kerkom', 'avg': 10.65},
 {'_id': 'Brouwerij Dubuisson', 'avg': 10.5},
 {'_id': 'De Struise Brouwers bij Brouwerij Deca', 'avg': 10.204545454545455}]

In [490]:
assert 4 == len(avg_gt_10_per_brewery), "incorrect nr of breweries with average alcohol percentage > 10: %s" % len(avg_gt_10_per_brewery)

assert 'Staminee De Garre (Brouwerij Van Steenberge)' == avg_gt_10_per_brewery[0]['_id'], "incorrect top brewery"

### 2.c What is the average alcoholpercentage per type of beer? 

Sort by alcoholpercentage (descending).

**HINT**: beers can have more than one type, check the list of pipeline commands again to find a command that helps dealing with this.

In [491]:
avg_per_type_cursor = db.beers.aggregate([
        {'$unwind': '$type'},
        {'$group': {'_id': '$type', 
                    'avg': {'$avg': '$alcoholpercentage'}}},
        {'$sort': {'avg': -1}}
    ])

avg_per_type = list(avg_per_type_cursor)

avg_per_type

[{'_id': 'Eisbockmethode', 'avg': 26.0},
 {'_id': 'Russian Imperial Stout', 'avg': 15.4},
 {'_id': 'Belgian Royal Stout', 'avg': 13.0},
 {'_id': 'amberkleurig speciaalbier', 'avg': 11.5},
 {'_id': 'robijnrood speciaalbier', 'avg': 11.0},
 {'_id': 'quadrupel', 'avg': 10.385714285714286},
 {'_id': 'blond kerstbier', 'avg': 10.0},
 {'_id': 'Belgo/american ale', 'avg': 10.0},
 {'_id': 'donkere tripel', 'avg': 10.0},
 {'_id': 'single hop', 'avg': 10.0},
 {'_id': 'donker seizoensbier', 'avg': 10.0},
 {'_id': 'Belgian strong pale ale', 'avg': 10.0},
 {'_id': 'donker speciaalbier', 'avg': 9.666666666666666},
 {'_id': 'oak aged', 'avg': 9.583333333333334},
 {'_id': 'blonde India Pale Ale', 'avg': 9.5},
 {'_id': 'abdijbier ale', 'avg': 9.5},
 {'_id': 'gerstewijn', 'avg': 9.5},
 {'_id': 'imperial stout', 'avg': 9.5},
 {'_id': 'tripel India Pale Ale', 'avg': 9.0},
 {'_id': 'Belgian Strong Ale', 'avg': 9.0},
 {'_id': 'fruit- en bloemenbier', 'avg': 9.0},
 {'_id': 'abdijbier bruin', 'avg': 9.0},
 {'

In [492]:
assert 'Eisbockmethode' == avg_per_type[0]['_id'], "incorrect top alcohol percentage beer type: '%s'" % avg_per_type[0]['_id']

### 2.d What is the range (max - min) of alcoholpercentage for beers per brewery that brews more than 1 beer?

Sort by range (descending).

This is a sophisticated aggregation that will involve some serious Mongo DB [kung-fu](https://www.youtube.com/watch?v=SncapPrTusA)!

**HINT**: in an early stage, you will need to collect all percentages per brewery. Find a '$group' [accumulator](https://docs.mongodb.org/manual/reference/operator/aggregation/group/) that allows you to do that.

In [493]:
range_per_brewery_cursor = db.beers.aggregate([ # complete this
        {'$group': {'_id': '$brewery', 'percentages': {'$push': '$alcoholpercentage'}}},
        {'$project': {'_id': 1, 
                      'percentages': 1, 
                      'count': {'$size': '$percentages'},
                      'min': {'$min': '$percentages'},
                      'max': {'$max': '$percentages'}
                     }},
        {'$match': {'count': {'$gt': 1}}},
        {'$project': {'range': {'$subtract': ['$max', '$min']}}},
        {'$sort': {'range': -1}}
    ])

range_per_brewery = list(range_per_brewery_cursor)

range_per_brewery

[{'_id': 'De Struise Brouwers bij Brouwerij Deca', 'range': 21.0},
 {'_id': 'Group John Martin', 'range': 12.0},
 {'_id': 'Brouwerij Van Steenberge', 'range': 11.5},
 {'_id': 'Brouwerij Alvinne', 'range': 11.2},
 {'_id': 'Brouwerij Van Honsebrouck', 'range': 11.0},
 {'_id': "Brasserie d'Ecaussinnes", 'range': 10.0},
 {'_id': 'De Proefbrouwerij', 'range': 9.5},
 {'_id': 'Brouwerij De Regenboog', 'range': 9.5},
 {'_id': 'Brouwerij Strubbe', 'range': 8.899999999999999},
 {'_id': 'Brasserie Du Bocq', 'range': 8.7},
 {'_id': 'Brouwerij Haacht', 'range': 8.1},
 {'_id': 'Brouwerij De Arend', 'range': 8.0},
 {'_id': 'Brouwerij Roman', 'range': 8.0},
 {'_id': 'Brouwerij Sint-Jozef', 'range': 8.0},
 {'_id': 'Brouwerij Bavik', 'range': 7.5},
 {'_id': 'Brouwerij Het Anker', 'range': 7.5},
 {'_id': 'Brouwerij Bosteels', 'range': 7.5},
 {'_id': 'Brouwerij Palm', 'range': 7.25},
 {'_id': 'De Proefbrouwerij in opdracht van De Kastelse Biervereniging',
  'range': 7.0},
 {'_id': 'Brouwerij Smisje', 'ran

In [494]:
assert 21.0 == range_per_brewery[0]['range'], "incorrect first range"

## 3. MapReduce in Mongo DB

Read the section on M/R in the [blog post](http://vda-lab.github.io/2016/04/mongodb-exercises).

With PyMongo, things can get a little awkward because we pass JavaScript functions to Mongo DB, using Python. The JavaScript functions are expressed as Strings and wrapped in a `Code` object, like this:

In [495]:
from bson.code import Code

In [496]:
map_fn = Code("""
function() {
  emit(this.brewery, 1);
};
""")

reduce_fn = Code("""
function(brewery, values) {
  return Array.sum(values)
};
""")

db.beers.map_reduce(
    map_fn,
    reduce_fn,
    'numberBeersPerBrewery')

Collection(Database(MongoClient(host=['vdalab-docker-mongo-35bb8a5c-1.1d9f54c9.cont.dockerapp.io:27017'], document_class=dict, tz_aware=False, connect=True), 'tmoerman'), 'numberBeersPerBrewery')

the output of the M/R operation is written to a new collection. Let's check the collections on our database again:

In [497]:
db.collection_names()

['numberBeersPerBrewery', 'number_beers_per_brewery', 'beers']

In [498]:
db.numberBeersPerBrewery.find_one()

{'_id': "'t Hofbrouwerijke", 'value': 15.0}

Okay, that seems to work.

Now it's up to you to complete the final exercises, good luck!

### 3.a Top 10 productive breweries
Using the `numberBeersPerBrewery` collection that you just generated, get the top-10 of the breweries. How can we sort from high to low?

Use an aggregation pipeline!

In [499]:
top_10_productive_cursor = db.numberBeersPerBrewery.aggregate([ #complete this
        {'$sort': {'value': -1}},
        {'$limit': 10}
    ])

top_10_productive = list(top_10_productive_cursor)

top_10_productive

[{'_id': 'Brouwerij Huyghe', 'value': 43.0},
 {'_id': 'Brouwerij Van Honsebrouck', 'value': 36.0},
 {'_id': 'Brouwerij Van Steenberge', 'value': 32.0},
 {'_id': 'Brouwerij De Regenboog', 'value': 31.0},
 {'_id': 'Brouwerij Alvinne', 'value': 30.0},
 {'_id': 'Brouwerij Haacht', 'value': 27.0},
 {'_id': 'Brouwerij Bavik', 'value': 24.0},
 {'_id': 'De Struise Brouwers bij Brouwerij Deca', 'value': 22.0},
 {'_id': 'Brouwerij Van Eecke', 'value': 21.0},
 {'_id': 'Brouwerij Strubbe', 'value': 21.0}]

In [500]:
top_10_result_size = len(top_10_productive)

assert 10 == top_10_result_size, "incorrect result size: %s" % top_10_result_size

assert 43 == top_10_productive[0]['value']

### 3.b String matching

Find all entries in the collection `numberBeersPerBrewery`, that contain the word 'Inbev' in the brewery field. You will probably get 3 results. However, there should be 9. Why? How can you solve that?

In [501]:
import re
inbev_matcher = re.compile("inbev", re.IGNORECASE) # complete this

inbev_like_count = db.numberBeersPerBrewery.find(
    {"_id": inbev_matcher} # complete this
).count()

In [502]:
assert 9 == inbev_like_count, "incorrect nr of 'inbev' like breweries found: %s" % inbev_like_count

### 3.c Map/Reduce aggregation: max
Using a single mapreduce on the beers collection, calculate the maximum alcohol percentage per type of beer.

In [503]:
def max_pct_MR(): # complete this
    
    map_fn = Code("""    
    function() {
        var beer = this;
        beer.type.forEach(function(t) {
            emit(t, beer.alcoholpercentage)
        });
    };
    """)
    
    red_fn = Code("""    
    function(key, values) {
        return Math.max.apply(null,values);
    };    
    """)
    
    # inline returns the result instead of making a collection
    return db.beers.inline_map_reduce(map_fn, red_fn)

max_per_type = max_pct_MR()

max_per_type

[{'_id': '0', 'value': 0.0},
 {'_id': '0 RosÃ©e', 'value': 0.0},
 {'_id': '00', 'value': 0.0},
 {'_id': '100% lambiek', 'value': 5.0},
 {'_id': 'Alcoholarm', 'value': 1.0},
 {'_id': 'Amber', 'value': 8.2},
 {'_id': 'Belgian IPA', 'value': 11.0},
 {'_id': 'Belgian Imperial Stout', 'value': 8.7},
 {'_id': 'Belgian Royal Stout', 'value': 13.0},
 {'_id': 'Belgian Strong Ale', 'value': 10.0},
 {'_id': 'Belgian ale', 'value': 9.0},
 {'_id': 'Belgian dark ale', 'value': 5.0},
 {'_id': 'Belgian strong ale', 'value': 7.7},
 {'_id': 'Belgian strong pale ale', 'value': 10.0},
 {'_id': 'Belgische ale', 'value': 5.1},
 {'_id': 'Belgische blonde ale', 'value': 7.0},
 {'_id': 'Belgische pale ale', 'value': 5.5},
 {'_id': 'Belgo/american ale', 'value': 10.0},
 {'_id': 'Blonde', 'value': 0.0},
 {'_id': 'Brown ale', 'value': 3.8},
 {'_id': 'Dortmunder', 'value': 6.3},
 {'_id': 'Double IPA', 'value': 9.0},
 {'_id': 'Eisbockmethode', 'value': 26.0},
 {'_id': 'Erkend Belgisch Abdijbier', 'value': 9.5},
 {'

In [504]:
max_pct_IPA = next(b['value'] for b in max_per_type if b.get('_id') == 'IPA')

assert 10 == max_pct_IPA, "incorrect max alcohol percentage for IPA: %s" % max_pct_IPA

### 3.d Map/Reduce aggregation: average
Using a single mapReduce on the beers collection, calculate the average alcohol percentage per type of beer. Remember that in order to calculate an average, you will first need a sum and a count. 

Hint: watch out, reduce will not run if there is only one element for a given key (see this [stackoverflow discussion](http://stackoverflow.com/questions/11021733/mongodb-mapreduce-emit-one-key-one-value-doesnt-call-reduce).

In [505]:
def avg_pct_MR():
    
    map_fn = Code("""    
    function() {
        var beer = this;        
        beer.type.forEach(function(t) {
            emit(t, {count: 1, total: beer.alcoholpercentage});            
        });
    };    
    """)
    
    red_fn = Code("""
    function(type, percentages) {
        acc = {count: 0, total: 0};
        
        percentages.forEach(function(p) {
            acc.count += p.count
            acc.total += p.total
        });
        
        return acc;
    };        
    """)
    
    fin_fn = Code("""   
    function(type, acc) {        
        return acc.total / acc.count;
    };
    """)
    
    return db.beers.inline_map_reduce(map_fn, red_fn, finalize = fin_fn) 

avg_per_type = avg_pct_MR()

avg_per_type

[{'_id': '0', 'value': 0.0},
 {'_id': '0 RosÃ©e', 'value': 0.0},
 {'_id': '00', 'value': 0.0},
 {'_id': '100% lambiek', 'value': 5.0},
 {'_id': 'Alcoholarm', 'value': 1.0},
 {'_id': 'Amber', 'value': 8.2},
 {'_id': 'Belgian IPA', 'value': 8.75},
 {'_id': 'Belgian Imperial Stout', 'value': 8.7},
 {'_id': 'Belgian Royal Stout', 'value': 13.0},
 {'_id': 'Belgian Strong Ale', 'value': 9.0},
 {'_id': 'Belgian ale', 'value': 7.75},
 {'_id': 'Belgian dark ale', 'value': 5.0},
 {'_id': 'Belgian strong ale', 'value': 7.7},
 {'_id': 'Belgian strong pale ale', 'value': 10.0},
 {'_id': 'Belgische ale', 'value': 5.1},
 {'_id': 'Belgische blonde ale', 'value': 6.5},
 {'_id': 'Belgische pale ale', 'value': 5.25},
 {'_id': 'Belgo/american ale', 'value': 10.0},
 {'_id': 'Blonde', 'value': 0.0},
 {'_id': 'Brown ale', 'value': 3.8},
 {'_id': 'Dortmunder', 'value': 5.8},
 {'_id': 'Double IPA', 'value': 9.0},
 {'_id': 'Eisbockmethode', 'value': 26.0},
 {'_id': 'Erkend Belgisch Abdijbier', 'value': 8.0},
 {

In [506]:
avg_pct_lager = next(b['value'] for b in avg_per_type if b.get('_id') == 'lager')

assert 5.45 == avg_pct_lager, "incorrect average alcohol percentage for lager: %s" % avg_pct_lager

If you've made it to here and completed all exercises correctly, pat yourself on the back, open up a cold Duvel and enjoy some well-earned rest!

[You've done it again](https://www.youtube.com/watch?v=n3UKJq_lxcM)! 
