|
1 |
| -## 📚 Intro to Aggregation |
| 1 | +## ➡️ 📚 Intro to Aggregation |
2 | 2 | ### 📚 Queries have implicit stages
|
3 | 3 | 
|
4 | 4 | ### 📚 Adding sort and skip stages
|
5 | 5 | 
|
6 | 6 | ### 📚 can i count?
|
7 | 7 | 
|
| 8 | + |
| 9 | +## 🦍 Sequencing stages |
| 10 | +Here is a cursor, followed by four aggregation pipeline stages: |
| 11 | + |
| 12 | + cursor = (db.laureates.find( |
| 13 | + projection={"firstname": 1, "prizes.year": 1, "_id": 0}, |
| 14 | + filter={"gender": "org"}) |
| 15 | + .limit(3).sort("prizes.year", -1)) |
| 16 | + |
| 17 | + project_stage = {"$project": {"firstname": 1, "prizes.year": 1, "_id": 0}} |
| 18 | + match_stage = {"$match": {"gender": "org"}} |
| 19 | + limit_stage = {"$limit": 3} |
| 20 | + sort_stage = {"$sort": {"prizes.year": -1}} |
| 21 | +> Which sequence `pipeline` of the four stages above produces a cursor `db.laureates.aggregate(pipeline)` equivalent to `cursor` above? |
| 22 | +Possible Answers |
| 23 | +- [ ] [project_stage, match_stage, limit_stage, sort_stage] |
| 24 | +- [ ] [project_stage, match_stage, sort_stage, limit_stage] |
| 25 | +- [ ] [match_stage, project_stage, limit_stage, sort_stage] |
| 26 | +- [x] [match_stage, project_stage, sort_stage, limit_stage] |
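
The checked answer places the sort before the limit. A `find` cursor applies its sort before its limit no matter how the cursor methods are chained, whereas pipeline stages run in the order they are listed. The sketch below is only a toy illustration of why that order matters: it uses plain Python lists and invented documents, and the helper names `sort_stage_fn` and `limit_stage_fn` are hypothetical stand-ins, not MongoDB or PyMongo calls.

```python
# Toy stand-ins for $sort and $limit on an in-memory list (not MongoDB).
# The documents below are invented for illustration only.
docs = [
    {"name": "A", "year": 1950},
    {"name": "B", "year": 2010},
    {"name": "C", "year": 1980},
    {"name": "D", "year": 2000},
]

def sort_stage_fn(documents, key, direction):
    """Return documents ordered by key; direction -1 means descending."""
    return sorted(documents, key=lambda d: d[key], reverse=(direction == -1))

def limit_stage_fn(documents, n):
    """Return only the first n documents."""
    return documents[:n]

# Sort first, then limit: the two most recent documents overall.
sort_then_limit = limit_stage_fn(sort_stage_fn(docs, "year", -1), 2)

# Limit first, then sort: the first two documents, merely reordered.
limit_then_sort = sort_stage_fn(limit_stage_fn(docs, 2), "year", -1)

print([d["name"] for d in sort_then_limit])  # ['B', 'D']
print([d["name"] for d in limit_then_sort])  # ['B', 'A']
```

The two results differ, which is why a pipeline that limits before sorting would not match the cursor.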
| 27 | + |
| 28 | +## 🦍 Aggregating a few individuals' country data |
| 29 | +The following query cursor yields birth-country and prize-affiliation-country information for three non-organization laureates: |
| 30 | + |
| 31 | + cursor = (db.laureates.find( |
| 32 | + {"gender": {"$ne": "org"}}, |
| 33 | + ["bornCountry", "prizes.affiliations.country"] |
| 34 | + ).limit(3)) |
| 35 | +- [x] Translate the above cursor `cursor` to an equivalent aggregation cursor, saving the pipeline stages to `pipeline`. Recall that the `find` collection method's "filter" parameter maps to the "$match" aggregation stage, its "projection" parameter maps to the "$project" stage, and its "limit" parameter (or cursor method) maps to the "$limit" stage. |
| 36 | +```py |
| 37 | +# Translate cursor to aggregation pipeline |
| 38 | +pipeline = [ |
| 39 | + {'$match': {'gender': {'$ne': 'org'}}}, |
| 40 | + {'$project': {'bornCountry': 1, 'prizes.affiliations.country': 1}}, |
| 41 | + {'$limit': 3} |
| 42 | +] |
| 43 | + |
| 44 | +for doc in db.laureates.aggregate(pipeline): |
| 45 | + print("{bornCountry}: {prizes}".format(**doc)) |
| 46 | +``` |
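
The mapping from `find` arguments to pipeline stages can be sketched as a small helper. This is a minimal sketch, not part of PyMongo: the function name `find_to_pipeline` and its parameter names are hypothetical, and it only covers the three arguments discussed here. It also assumes that a list-style projection means "include each named field".

```python
def find_to_pipeline(query=None, projection=None, limit=None):
    """Build an aggregation pipeline from find-style arguments.

    Hypothetical helper for illustration; it is not a PyMongo function.
    """
    pipeline = []
    if query:
        # find's filter corresponds to a $match stage
        pipeline.append({"$match": query})
    if projection:
        # A list of field names becomes {field: 1, ...}
        if not isinstance(projection, dict):
            projection = {field: 1 for field in projection}
        pipeline.append({"$project": projection})
    if limit:
        # limit (parameter or cursor method) corresponds to $limit
        pipeline.append({"$limit": limit})
    return pipeline

pipeline = find_to_pipeline(
    query={"gender": {"$ne": "org"}},
    projection=["bornCountry", "prizes.affiliations.country"],
    limit=3,
)
print(pipeline)
```

The resulting list has the same three stages, in the same order, as the hand-written pipeline for this exercise.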
| 47 | + |
| 48 | +## 🦍 Passing the aggregation baton to Python |
| 49 | +- [x] Save to `pipeline` an aggregation pipeline to collect prize documents as detailed above. Use Python's `collections.OrderedDict` to specify any sorting. |
| 50 | +```py |
| 51 | +from collections import OrderedDict |
| 52 | +from itertools import groupby |
| 53 | +from operator import itemgetter |
| 54 | + |
| 55 | +original_categories = set(db.prizes.distinct("category", {"year": "1901"})) |
| 56 | + |
| 57 | +# Save a pipeline to collect original-category prizes |
| 58 | +pipeline = [ |
| 59 | + {"$match": {"category": {"$in": list(original_categories)}}}, |
| 60 | + {"$project": {"category": 1, "year": 1}}, |
| 61 | + {"$sort": OrderedDict([("year", -1)])} |
| 62 | +] |
| 63 | +cursor = db.prizes.aggregate(pipeline) |
| 64 | +for key, group in groupby(cursor, key=itemgetter("year")): |
| 65 | + missing = original_categories - {doc["category"] for doc in group} |
| 66 | + if missing: |
| 67 | + print("{year}: {missing}".format( |
| 68 | + year=key, missing=", ".join(sorted(missing)))) |
| 69 | +``` |
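
The `$sort` stage matters here because `itertools.groupby` only groups consecutive items that share a key. The sketch below shows that behavior with a few invented in-memory documents standing in for the aggregation cursor; no database is involved.

```python
from itertools import groupby
from operator import itemgetter

# Invented documents standing in for the cursor's output.
prizes = [
    {"year": "1901", "category": "physics"},
    {"year": "1902", "category": "physics"},
    {"year": "1901", "category": "peace"},
]

# Without sorting, the year "1901" shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(prizes, key=itemgetter("year"))]

# Sorting by year first (descending, like the $sort stage) gives
# exactly one group per year.
sorted_prizes = sorted(prizes, key=itemgetter("year"), reverse=True)
sorted_keys = [key for key, _ in groupby(sorted_prizes, key=itemgetter("year"))]

print(unsorted_keys)  # ['1901', '1902', '1901']
print(sorted_keys)    # ['1902', '1901']
```

If the documents arrived unsorted, the missing-category check would run on partial groups and could report categories as missing when they are not.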