### Problem statement

The data comes from a platform that hosts coding challenges. The platform has a unique business model: it crowdsources problems from various creators (authors). These authors create a problem and release it on the client's platform. Users then select the challenges they want to solve.

This dataset contains information about each coding problem. It contains information about the problem, about the author who created it and a list of users who have attempted the problem.

Below are the fields that can be found within each document in the collection:

- `challenge_id` - Unique id of the challenge problem

- `programming_language` - Programming language for the challenge

- `total_submissions` - Total submissions by all users

- `publish_date` - Publishing date for the challenge

- `author` - Embedded document about the author of the challenge.
> - `id` - Author id
> - `gender` - Author gender
> - `org_id` - Organisation id for the author

- `users` - List of users who have attempted the challenge
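
To make the schema concrete, the sketch below builds one document as a plain Python dict. The field names come from the list above; every value is a made-up placeholder, and the element type of `users` (strings here) and the encoding of `gender` are assumptions, since the description does not specify them.

```python
from datetime import datetime

# Hypothetical document illustrating the fields listed above.
# All values are placeholders, not real records from the dataset.
sample_challenge = {
    "challenge_id": "CI00001",
    "programming_language": 1,
    "total_submissions": 150,
    "publish_date": datetime(2009, 6, 15),
    "author": {                      # embedded document
        "id": "AI000001",
        "gender": "F",               # placeholder; actual encoding not shown
        "org_id": "AOI000001",
    },
    "users": ["U0001", "U0002"],     # assumed: a list of user ids
}

# Embedded fields are reached through the parent key.
print(sample_challenge["author"]["org_id"])
print(len(sample_challenge["users"]))
```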

----

### Connecting to MongoDB


----

In [1]:
# Importing the required libraries
import pymongo
import pprint as pp


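# Workaround: shadow `sorted` inside the pprint module so that dict keys
# are printed in their stored order instead of alphabetically.
pp.sorted = lambda x, key=None: x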

In [2]:
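# Replace <username> and <password> with your own credentials.
# Avoid committing real credentials to a shared notebook.
client = pymongo.MongoClient("mongodb+srv://<username>:<password>@vikas1234.3rgy0.mongodb.net/?retryWrites=true&w=majority&appName=Vikas1234")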

---
### Importing data

----

In [3]:
# # Restore database
# !mongorestore /home/avadmin/Desktop/Mongo/Content/Querying/Assignment/Data/querying_assignment

In [3]:
#db = client['querying_assignment']
db = client["querying"]
collection = db["quyeringproject"]

In [7]:
import os 
import bson
os.chdir(r"C:\Users\vicky\Downloads\Assignment-210715-134432\Assignment\Data\querying_assignment\querying_assignment")

In [8]:
os.listdir()

['challenge.bson', 'challenge.metadata.json']

In [9]:
with open('challenge.bson', "rb") as bson_file:
    for doc in bson.decode_file_iter(bson_file):
        collection.update_one({'_id': doc['_id']}, {'$set': doc}, upsert=True)


db.list_collection_names()

# The documents were loaded into `collection`, so read the sample from there.
pp.pprint(
    collection.find_one()
)

---
### Assignment Questions

----

### Q1. 

Find the number of documents in the collection

In [10]:
# Enter your code here
q1_result = collection.count_documents({})
print("Q1:", q1_result)

Q1: 5606


### Q2. 

Find the number of unique `programming_language` and `challenge_id` values

In [11]:
# Enter your code here
q2_result = collection.aggregate([
    {"$group": {"_id": None, "unique_languages": {"$addToSet": "$programming_language"}, "unique_challenges": {"$addToSet": "$challenge_id"}}},
    {"$project": {"unique_language_count": {"$size": "$unique_languages"}, "unique_challenge_count": {"$size": "$unique_challenges"}}}
])
print("Q2:", list(q2_result))

Q2: [{'_id': None, 'unique_language_count': 3, 'unique_challenge_count': 5606}]
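
For a single field, pymongo's `Collection.distinct` offers a shorter route than the aggregation pipeline. The helper below is a minimal sketch: `count_unique` is a hypothetical name, and it assumes the distinct values are few enough to be returned in one response, which should hold for a few thousand short ids.

```python
def count_unique(collection, field):
    """Return the number of distinct values stored under `field`.

    `collection` is expected to be a pymongo Collection, or any object
    that exposes a compatible `distinct(field)` method.
    """
    return len(collection.distinct(field))


# Usage with the notebook's `collection` object:
# count_unique(collection, "programming_language")
# count_unique(collection, "challenge_id")
```

Unlike the pipeline, this issues one request per field, so it trades a single round trip for simpler code.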


### Q3. 

How many documents are there where the challenge was created between `2009-01-01` and `2010-01-01`? 

In [13]:
from datetime import datetime

In [14]:
# Enter your code here
q3_result = collection.count_documents({
    "publish_date": {"$gte": datetime(2009, 1, 1), "$lt": datetime(2010, 1, 1)}
})
print("Q3:", q3_result)

Q3: 888
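
The filter above treats the range as half-open: `2009-01-01` is included and `2010-01-01` is excluded. The question's "between" does not say which boundaries count, so this is one reading. The sketch below wraps that choice in a small helper (`date_range_filter` is a hypothetical name) so the boundary behaviour is explicit; swapping `$lt` for `$lte` would make the end date inclusive.

```python
from datetime import datetime


def date_range_filter(field, start, end):
    """Build a MongoDB filter matching start <= field < end (end excluded)."""
    return {field: {"$gte": start, "$lt": end}}


q3_filter = date_range_filter(
    "publish_date", datetime(2009, 1, 1), datetime(2010, 1, 1)
)
print(q3_filter)

# The resulting dict can be passed to collection.count_documents(q3_filter).
```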


### Q4. 

How many challenges have been written by author `AI563576` in either `programming_language` `1` or `3` ?


In [15]:
# Enter your code here
q4_result = collection.count_documents({
    "author.id": "AI563576",
    "programming_language": {"$in": [1, 3]}
})
print("Q4:", q4_result)

Q4: 41
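
The Q4 filter relies on two ideas: dot notation (`author.id`) to reach into the embedded `author` document, and `$in` to accept any value from a list. The plain-Python sketch below restates what that filter matches, using made-up documents. `get_path` and `matches_q4` are hypothetical helpers, and the sketch ignores MongoDB's extra rules for dot notation over arrays.

```python
def get_path(doc, dotted_key):
    """Follow a dotted key such as "author.id" through nested dicts.

    Returns None if any step along the path is missing.
    """
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches_q4(doc):
    """Plain-Python restatement of the Q4 filter for scalar fields."""
    return (
        get_path(doc, "author.id") == "AI563576"
        and doc.get("programming_language") in (1, 3)
    )


# Made-up documents, for illustration only.
docs = [
    {"author": {"id": "AI563576"}, "programming_language": 3},
    {"author": {"id": "AI563576"}, "programming_language": 2},
    {"author": {"id": "AI000000"}, "programming_language": 1},
]
print([matches_q4(d) for d in docs])
```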


### Q5. 

How many documents are there where the challenge has been created by a female author and where the author belongs to either the 'AOI100013' organisation or the 'AOI100018' organisation?

In [16]:
# Enter your code here
q5_result = collection.count_documents({
    "author.gender": "female",
    "author.org_id": {"$in": ["AOI100013", "AOI100018"]}
})
print("Q5:", q5_result)

Q5: 0


### Q6.

Find the top 5 challenge ids where either the challenge has been attempted by exactly 100 `users` or where the `total_submissions` is between 100 and 200, both inclusive.

***Hint - Think of using the `$size` operator.***

In [17]:
# Enter your code here
q6_result = collection.find({
    "$or": [
        {"users": {"$size": 100}},
        {"total_submissions": {"$gte": 100, "$lte": 200}}
    ]
}, {"challenge_id": 1}).limit(5)
print("Q6:", list(q6_result))

Q6: [{'_id': ObjectId('60dab9f75945974466d8d651'), 'challenge_id': 'CI23482'}, {'_id': ObjectId('60dab9f75945974466d8d65d'), 'challenge_id': 'CI23494'}, {'_id': ObjectId('60dab9f75945974466d8d660'), 'challenge_id': 'CI23497'}, {'_id': ObjectId('60dab9f75945974466d8d663'), 'challenge_id': 'CI23500'}, {'_id': ObjectId('60dab9f75945974466d8d673'), 'challenge_id': 'CI23516'}]
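
The question does not say what "top" is ranked by, and the query above returns the first five matches in whatever order the server yields them. If a ranking is intended, one possible reading (an assumption, not something the assignment states) is highest `total_submissions` first. The sketch below applies that ordering; `top_challenge_ids` is a hypothetical helper.

```python
# Same filter as in the Q6 cell.
Q6_FILTER = {
    "$or": [
        {"users": {"$size": 100}},
        {"total_submissions": {"$gte": 100, "$lte": 200}},
    ]
}


def top_challenge_ids(collection, n=5):
    """Return up to `n` matching challenge ids, most submissions first.

    Assumes "top" means highest `total_submissions`; ties are returned
    in no guaranteed order.
    """
    cursor = (
        collection.find(
            Q6_FILTER,
            {"_id": 0, "challenge_id": 1, "total_submissions": 1},
        )
        .sort("total_submissions", -1)  # -1 is pymongo.DESCENDING
        .limit(n)
    )
    return [doc["challenge_id"] for doc in cursor]


# Usage with the notebook's `collection` object:
# top_challenge_ids(collection)
```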


### Q7. 

How many documents are there where either the `publish_date > 2010-01-01` and `total_submissions > 100`, or the `publish_date < 2000-01-01` and `total_submissions > 1000`?

In [18]:
# Enter your code here
q7_result = collection.count_documents({
    "$or": [
        {"publish_date": {"$gt": datetime(2010, 1, 1)}, "total_submissions": {"$gt": 100}},
        {"publish_date": {"$lt": datetime(2000, 1, 1)}, "total_submissions": {"$gt": 1000}}
    ]
})
print("Q7:", q7_result)

Q7: 45
