Commit 4dfafbb

Update 4. Aggregation Pipelines: Let the Server Do It For You.md
1 parent 852ab8f commit 4dfafbb

1 file changed: +55 -1 lines changed

Introduction to MongoDB in Python/4. Aggregation Pipelines: Let the Server Do It For You.md

Lines changed: 55 additions & 1 deletion
@@ -136,4 +136,58 @@ Possible Answers
 - [ ] {"$expr": {"$in": ["$bornCountry", db.laureates.distinct("bornCountry")]}}
 - [ ] {"$expr": {"$eq": [{"$type": "$bornCountry"}, "string"]}}
 - [ ] {"bornCountry": {"$type": "string"}}
-- [ ] All of the above
+- [x] All of the above
+## 🦍 Here and elsewhere
+- [x] Use $unwind stages to ensure a single prize affiliation country per pipeline document.
+- [x] Filter out prize-affiliation-country values that are "empty" (null, not present, etc.) -- ensure values are "$in" the list of known values.
+- [x] Produce a count of documents for each value of "affilCountrySameAsBorn" (a field we've projected for you using the $indexOfBytes operator) by adding 1 to the running sum.
+```py
+key_ac = "prizes.affiliations.country"
+key_bc = "bornCountry"
+pipeline = [
+    {"$project": {key_bc: 1, key_ac: 1}},
+
+    # Ensure a single prize affiliation country per pipeline document
+    {"$unwind": "$prizes"},
+    {"$unwind": "$prizes.affiliations"},
+
+    # Ensure values in the list of distinct values (so not empty)
+    {"$match": {key_ac: {"$in": db.laureates.distinct(key_ac)}}},
+    {"$project": {"affilCountrySameAsBorn": {
+        "$gte": [{"$indexOfBytes": ["$"+key_ac, "$"+key_bc]}, 0]}}},
+
+    # Count by "$affilCountrySameAsBorn" value (True or False)
+    {"$group": {"_id": "$affilCountrySameAsBorn",
+                "count": {"$sum": 1 }}},
+]
+for doc in db.laureates.aggregate(pipeline): print(doc)
+```
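To see what each stage contributes without a running server, the following is a rough plain-Python sketch that mirrors the pipeline on a few made-up documents. The sample data and variable names are hypothetical, and the loops only approximate `$unwind`, `$match`, and `$indexOfBytes` (for example, `str.find` searches by character rather than by UTF-8 byte, which does not change the `>= 0` test). Note the argument order: the born country is searched for inside the affiliation country, not the other way around.

```python
from collections import Counter

# Hypothetical sample documents, loosely shaped like the laureates collection.
laureates = [
    {"bornCountry": "Germany",
     "prizes": [{"affiliations": [{"country": "Germany"}, {"country": "USA"}]}]},
    {"bornCountry": "Prussia (now Germany)",
     "prizes": [{"affiliations": [{"country": "Germany"}]}]},
    {"bornCountry": "France",
     "prizes": [{"affiliations": [{"name": "no country recorded"}]}]},
]

# Two nested loops stand in for the two $unwind stages:
# one row per (laureate, prize, affiliation) combination.
rows = []
for doc in laureates:
    for prize in doc.get("prizes", []):
        for affil in prize.get("affiliations", []):
            rows.append({"bornCountry": doc.get("bornCountry"),
                         "affilCountry": affil.get("country")})

# Stand-in for the $match stage: keep rows whose affiliation country
# is one of the known (non-empty) values.
known = {r["affilCountry"] for r in rows if isinstance(r["affilCountry"], str)}
matched = [r for r in rows if r["affilCountry"] in known]

# Stand-in for $indexOfBytes + $gte, then $group with {"$sum": 1}:
# is bornCountry a substring of the affiliation country?
counts = Counter(r["affilCountry"].find(r["bornCountry"]) >= 0 for r in matched)

for same, n in counts.items():
    print({"_id": same, "count": n})
```

In this toy data, the laureate born in "Prussia (now Germany)" counts as `False` even though the affiliation is "Germany", because the longer string is not contained in the shorter one.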
+## 🦍 Countries of birth by prize category
+- [x] $unwind the laureates array field to output one pipeline document for each array element.
+- [x] After pulling in laureate bios with a $lookup stage, unwind the new laureate_bios array field (each laureate has only a single biography document).
+- [x] Collect the set of bornCountries associated with each prize category.
+- [x] Project out the size of each category's set of bornCountries.
+```py
+pipeline = [
+    # Unwind the laureates array
+    {"$unwind": "$laureates"},
+    {"$lookup": {
+        "from": "laureates", "foreignField": "id",
+        "localField": "laureates.id", "as": "laureate_bios"}},
+
+    # Unwind the new laureate_bios array
+    {"$unwind": "$laureate_bios"},
+    {"$project": {"category": 1,
+                  "bornCountry": "$laureate_bios.bornCountry"}},
+
+    # Collect bornCountry values associated with each prize category
+    {"$group": {"_id": "$category",
+                "bornCountries": {"$addToSet": "$bornCountry"}}},
+
+    # Project out the size of each category's (set of) bornCountries
+    {"$project": {"category": 1,
+                  "nBornCountries": {"$size": "$bornCountries"}}},
+    {"$sort": {"nBornCountries": -1}},
+]
+for doc in db.prizes.aggregate(pipeline): print(doc)
+```
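As a way to reason about the join-and-count logic offline, here is a minimal plain-Python sketch of the same steps on made-up data. The two small lists are hypothetical stand-ins for the `prizes` and `laureates` collections, and the dictionary lookup only approximates `$lookup`. After the `$group` stage the category is carried in `_id`, so as far as I can tell the `"category": 1` entry in the final `$project` has no field left to include; the sketch therefore reports `_id` only.

```python
from collections import defaultdict

# Hypothetical miniature versions of the two collections.
prizes = [
    {"category": "physics", "laureates": [{"id": "1"}, {"id": "2"}]},
    {"category": "physics", "laureates": [{"id": "3"}]},
    {"category": "peace", "laureates": [{"id": "4"}]},
]
laureates = [
    {"id": "1", "bornCountry": "Netherlands"},
    {"id": "2", "bornCountry": "Netherlands"},
    {"id": "3", "bornCountry": "France"},
    {"id": "4", "bornCountry": "Switzerland"},
]

# Index biographies by id; this plays the role of the $lookup join.
bios_by_id = defaultdict(list)
for bio in laureates:
    bios_by_id[bio["id"]].append(bio)

# A set per category plays the role of $addToSet (duplicates collapse).
born_countries = defaultdict(set)
for prize in prizes:
    for laureate in prize["laureates"]:                # $unwind "$laureates"
        for bio in bios_by_id.get(laureate["id"], []): # $lookup, then $unwind "$laureate_bios"
            born_countries[prize["category"]].add(bio["bornCountry"])

# len() stands in for $size; sorting descending stands in for $sort.
result = sorted(
    ({"_id": cat, "nBornCountries": len(countries)}
     for cat, countries in born_countries.items()),
    key=lambda d: d["nBornCountries"],
    reverse=True,
)

for doc in result:
    print(doc)
```

The two physics laureates born in the Netherlands contribute a single entry to the physics set, which is the deduplication the `$addToSet` accumulator is there to provide.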
