Commit 77c0662

Update 4. Aggregation Pipelines: Let the Server Do It For You.md
1 parent f2f2503 commit 77c0662

1 file changed: +28 -0 lines changed

Introduction to MongoDB in Python/4. Aggregation Pipelines: Let the Server Do It For You.md

Lines changed: 28 additions & 0 deletions
@@ -97,3 +97,31 @@ pipeline = [
print(list(db.laureates.aggregate(pipeline)))
```
[{'_id': None, 'n_prizes_total': 27}]
## 🦍 Gap years, aggregated
- [x] Make the $group stage output a document for each prize year (set "_id" to the field path for year) with the set of categories awarded that year.
- [x] Given your intermediate collection of year-keyed documents, $project a field named "missing" with the (original) categories not awarded that year. Again, mind your field paths!
- [x] Use a $match stage to only pass through documents with at least one missing prize category.
- [x] Finally, sort the documents in descending order by year.
```py
from collections import OrderedDict

# distinct() already returns unique values, so sorting them is enough.
original_categories = sorted(db.prizes.distinct("category", {"year": "1901"}))
pipeline = [
    # Keep only prizes in the original (1901) categories and the two fields used below.
    {"$match": {"category": {"$in": original_categories}}},
    {"$project": {"category": 1, "year": 1}},

    # Collect the set of category values for each prize year.
    {"$group": {"_id": "$year", "categories": {"$addToSet": "$category"}}},

    # Project categories *not* awarded (i.e., that are missing this year).
    {"$project": {"missing": {"$setDifference": [original_categories, "$categories"]}}},

    # Only include years with at least one missing category.
    {"$match": {"missing.0": {"$exists": True}}},

    # Sort in reverse chronological order. Note that "_id" is a distinct year at this stage.
    {"$sort": OrderedDict([("_id", -1)])},
]
for doc in db.prizes.aggregate(pipeline):
    print("{year}: {missing}".format(year=doc["_id"], missing=", ".join(sorted(doc["missing"]))))
```
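
To sanity-check the pipeline's logic without a server, the same steps can be mirrored in plain Python. This is only a sketch under stated assumptions: the `prizes` list is hypothetical toy data standing in for `db.prizes`, and `gap_years` is an illustrative helper, not part of the commit or of PyMongo.

```python
from collections import defaultdict

# Hypothetical toy data standing in for db.prizes (not the real collection).
prizes = [
    {"year": "1901", "category": "physics"},
    {"year": "1901", "category": "peace"},
    {"year": "1902", "category": "physics"},
    {"year": "1903", "category": "peace"},
    {"year": "1903", "category": "physics"},
    {"year": "1916", "category": "peace"},
    {"year": "1969", "category": "economics"},  # not an original category
]


def gap_years(docs, first_year="1901"):
    """Mirror the pipeline stages on an in-memory list of prize documents."""
    # Equivalent of distinct("category", {"year": first_year}).
    original = {d["category"] for d in docs if d["year"] == first_year}

    awarded = defaultdict(set)
    for d in docs:
        if d["category"] in original:              # first $match
            awarded[d["year"]].add(d["category"])  # $group with $addToSet

    # $setDifference: original categories minus those awarded that year.
    gaps = {year: original - cats for year, cats in awarded.items()}
    # Second $match: keep only years with at least one missing category.
    gaps = {year: missing for year, missing in gaps.items() if missing}
    # $sort on "_id" descending; years are four-digit strings, so string order works.
    return sorted(gaps.items(), key=lambda item: item[0], reverse=True)


result = gap_years(prizes)
for year, missing in result:
    print("{year}: {missing}".format(year=year, missing=", ".join(sorted(missing))))
```

On this toy data the loop prints `1916: physics` and then `1902: peace`. A year with no prize documents at all in the original categories produces no group, so neither this sketch nor the pipeline would list it.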
