# **Theory Questions**

1. **Key Differences Between SQL and NoSQL Databases?**
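  > SQL: Data is stored in tables with a predefined schema. Relationships are expressed with joins, which makes SQL databases good for complex queries and multi-table transactions.

  > NoSQL: Data is stored as documents, key-value pairs, wide columns, or graphs. Schemas are flexible, and most NoSQL databases are designed to scale horizontally, which suits fast-changing, high-volume applications.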
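As an illustration, the same hypothetical order is modeled two ways below. The first is SQL-style rows spread across two tables, which need a join to reassemble. The second is a single MongoDB-style document with the line items embedded. All names and values are made up for the example.

```python
# Hypothetical order modeled two ways (illustrative data only)

# SQL style: two tables linked by order_id; reading a full order needs a join
orders_table = [
    {"order_id": 1, "customer": "Alice", "region": "West"},
]
order_items_table = [
    {"order_id": 1, "product": "Stapler", "quantity": 2},
    {"order_id": 1, "product": "Paper", "quantity": 5},
]

# Document style: the line items are embedded, so one read returns everything
order_document = {
    "_id": 1,
    "customer": "Alice",
    "region": "West",
    "items": [
        {"product": "Stapler", "quantity": 2},
        {"product": "Paper", "quantity": 5},
    ],
}


def join_order(order_id):
    """Emulate the SQL join: combine the order row with its item rows."""
    order = next(row for row in orders_table if row["order_id"] == order_id)
    items = [
        {"product": row["product"], "quantity": row["quantity"]}
        for row in order_items_table
        if row["order_id"] == order_id
    ]
    return {**order, "items": items}


print(join_order(1)["items"] == order_document["items"])  # True
```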

2. **What Makes MongoDB Good for Modern Applications?**
  > MongoDB's flexible document model, built-in horizontal scaling (sharding), and ability to handle large volumes of data make it a good fit for applications that grow quickly and change often.

3. **Concept of Collections in MongoDB?**
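  > A collection is a group of documents, similar to a table in SQL. Unlike a table, a collection does not enforce a schema by default, so documents in the same collection can have different fields.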

4. **How MongoDB Ensures High Availability Using Replication?**
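  > MongoDB uses replica sets. A replica set is a group of nodes that keep the same data set, with one primary and one or more secondaries. If the primary fails, the remaining members hold an election and an eligible secondary becomes the new primary, so the database stays available.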
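A client normally connects to a replica set by listing several members and naming the set. The driver then discovers which member is primary and follows the new primary after a failover. The helper below only builds such a connection string. The host names, the set name `rs0`, and the helper name are placeholders; `replicaSet` and `readPreference` are standard connection-string options.

```python
def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a MongoDB connection string for a replica set (hypothetical helper).

    hosts: list of "host:port" strings (placeholders in this example)
    replica_set: the replica set name configured on the servers
    read_preference: "primary", "primaryPreferred", "secondary",
        "secondaryPreferred", or "nearest"
    """
    if not hosts:
        raise ValueError("at least one host is required")
    host_list = ",".join(hosts)
    return f"mongodb://{host_list}/?replicaSet={replica_set}&readPreference={read_preference}"


uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    "rs0",
    read_preference="secondaryPreferred",
)
print(uri)
```

Passing this string to `pymongo.MongoClient(uri)` is enough. Writes always go to whichever member is currently primary, while `secondaryPreferred` lets reads use secondaries when they are available.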

5. **Main Benefits of MongoDB Atlas?**
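  > MongoDB Atlas is a fully managed cloud database service. It handles deployment, automated backups, scaling, monitoring, and security configuration, and it runs on AWS, Google Cloud, and Azure.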

6. **Role of Indexes in MongoDB?**
  > Indexes speed up queries. Without a suitable index, MongoDB must perform a collection scan, reading every document to find matches, which becomes slow as the collection grows.
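The difference between a collection scan and an index lookup can be illustrated with a toy model in plain Python. A sorted list of `(key, position)` pairs stands in for the index and is searched with binary search. This is only an analogy: MongoDB's real indexes are B-tree structures managed by the storage engine. In PyMongo, an index on the Superstore `Region` field would be created with `collection.create_index([("Region", 1)])`.

```python
import bisect

# Illustrative documents (not the real Superstore data)
docs = [{"Region": region} for region in ["West", "East", "South", "Central"] * 250]


def collection_scan(docs, field, value):
    """Check every document, like a COLLSCAN; return matches and documents examined."""
    matches = [i for i, doc in enumerate(docs) if doc.get(field) == value]
    return matches, len(docs)


def build_index(docs, field):
    """Toy ordered index: (field value, document position) pairs sorted by value."""
    return sorted((doc[field], i) for i, doc in enumerate(docs) if field in doc)


def index_lookup(index, value):
    """Binary-search to the first matching key, then walk only the matching run."""
    start = bisect.bisect_left(index, (value,))
    matches = []
    for key, position in index[start:]:
        if key != value:
            break
        matches.append(position)
    return matches, len(matches)


index = build_index(docs, "Region")
scan_hits, scanned = collection_scan(docs, "Region", "West")
index_hits, examined = index_lookup(index, "West")
print(scanned, examined)  # 1000 250
```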



7. **Stages of the MongoDB Aggregation Pipeline?**
  > An aggregation pipeline consists of one or more stages that process documents in sequence. Each stage performs an operation on its input documents and passes the results to the next stage. Examples: `$match`, `$group`, `$sort`, `$set`, `$unset`, `$limit`.
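In PyMongo, a pipeline is passed to `collection.aggregate()` as a list of stage documents, which are applied in order. The sketch below assumes the Superstore field names used in the practical section (`Region`, `Category`, `Sales`). Because the CSV import stores numbers as strings, it converts `Sales` with `$toDouble`.

```python
# Stages run top to bottom; each one receives the previous stage's output
pipeline = [
    {"$match": {"Region": "West"}},                       # keep only West orders
    {"$group": {                                          # total sales per category
        "_id": "$Category",
        "totalSales": {"$sum": {"$toDouble": "$Sales"}},
    }},
    {"$sort": {"totalSales": -1}},                        # largest total first
    {"$limit": 2},                                        # top two categories
]

stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)  # ['$match', '$group', '$sort', '$limit']
```

With a live connection, `list(collection.aggregate(pipeline))` returns the resulting documents.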

8. **What is Sharding in MongoDB? How is it Different from Replication?**
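  > Sharding: Partitions a collection's data across multiple servers (shards) using a shard key, so storage and load are spread horizontally.

  > Replication: Keeps copies of the same data on multiple servers for redundancy and high availability.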
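The difference can be illustrated with a toy model. Sharding sends each document to exactly one shard based on its shard key, while replication gives every member a full copy. The hashing below uses Python's `hashlib` purely for illustration. MongoDB's real hashed sharding and chunk balancing differ in detail, and the order IDs and node names are made up.

```python
import hashlib

orders = [
    {"Order ID": f"CA-{n}", "Region": region}
    for n, region in enumerate(["West", "East", "South", "Central"] * 5)
]
NUM_SHARDS = 3


def shard_for(key_value, num_shards=NUM_SHARDS):
    """Toy hashed partitioning: map a shard-key value to one shard number."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Sharding: each document lives on exactly one shard
shards = {i: [] for i in range(NUM_SHARDS)}
for order in orders:
    shards[shard_for(order["Order ID"])].append(order)

# Replication: every replica-set member holds the full data set
replicas = {name: list(orders) for name in ["primary", "secondary-1", "secondary-2"]}

print(sum(len(docs) for docs in shards.values()))  # 20
print({name: len(docs) for name, docs in replicas.items()})
# {'primary': 20, 'secondary-1': 20, 'secondary-2': 20}
```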

9. **What is PyMongo, and Why is it Used?**
  > PyMongo is the official MongoDB driver for Python. It is used to connect to MongoDB from Python applications and to run queries, updates, and aggregations.
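The following is a minimal PyMongo sketch. It creates a client and sends the `ping` command to check that a server is reachable before loading any data. The default URI assumes a local server on port 27017, as in the practical section, and `mongo_is_reachable` is a hypothetical helper name.

```python
def mongo_is_reachable(uri="mongodb://localhost:27017/", timeout_ms=2000):
    """Return True if a MongoDB server answers a ping at `uri` (hypothetical helper)."""
    try:
        from pymongo import MongoClient
        from pymongo.errors import PyMongoError
    except ImportError:  # PyMongo is not installed
        return False

    # MongoClient connects lazily; serverSelectionTimeoutMS bounds how long we wait
    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
    finally:
        client.close()
```

In an environment where nothing listens on `localhost:27017`, as in the notebook run below, `mongo_is_reachable()` would return `False` instead of raising.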



10. **ACID Properties in MongoDB Transactions?**
  > Multi-document transactions provide Atomicity, Consistency, Isolation, and Durability, as in SQL databases. A group of reads and writes across several documents either all succeed or are all rolled back.
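In PyMongo, a multi-document transaction runs inside a client session. `ClientSession.with_transaction()` commits the callback's writes together, or aborts them if the callback raises. The sketch below moves one order row from `store` into a hypothetical `archive` collection atomically. It assumes the Superstore `Order ID` field, and the helper name is hypothetical. Transactions require a replica set or sharded cluster; they are not available on a standalone server.

```python
def archive_order(client, order_id, db_name="superstore"):
    """Move one order row from `store` to a hypothetical `archive` collection atomically."""
    db = client[db_name]

    def callback(session):
        # Every operation passes session=session so it joins the transaction
        order = db["store"].find_one({"Order ID": order_id}, session=session)
        if order is None:
            return False
        db["archive"].insert_one(order, session=session)
        db["store"].delete_one({"_id": order["_id"]}, session=session)
        return True

    with client.start_session() as session:
        # Commits if the callback returns normally; aborts if it raises
        return session.with_transaction(callback)
```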


11. **Purpose of MongoDB’s explain() Function?**
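  > `explain()` shows how MongoDB executes a query. It reports the winning query plan and whether that plan uses an index (`IXSCAN`) or scans the whole collection (`COLLSCAN`). In `executionStats` mode, it also reports how many keys and documents were examined and how long execution took. This helps diagnose and optimize slow queries.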
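`explain()` returns a nested document. The part to check first is usually the winning plan's stage tree, where `COLLSCAN` means a full collection scan and `IXSCAN` means an index was used. The hypothetical helper below walks that tree. Its input here is a small hand-written sample shaped like `queryPlanner.winningPlan`, not real server output. Some newer server versions nest the tree under a `queryPlan` key, and the helper checks for that too.

```python
def plan_stages(explain_output):
    """Collect stage names from queryPlanner.winningPlan, outermost stage first."""
    plan = explain_output["queryPlanner"]["winningPlan"]
    plan = plan.get("queryPlan", plan)  # some server versions nest the tree here
    stages, pending = [], [plan]
    while pending:
        node = pending.pop(0)
        stages.append(node["stage"])
        if "inputStage" in node:
            pending.append(node["inputStage"])
        pending.extend(node.get("inputStages", []))
    return stages


# Hand-written sample, shaped like explain output for an indexed query
sample = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "Region_1"},
        }
    }
}
stages = plan_stages(sample)
print(stages)  # ['FETCH', 'IXSCAN']
print("uses index" if "IXSCAN" in stages else "collection scan")  # uses index
```

Against a live collection, the same helper could be called as `plan_stages(collection.find({"Region": "West"}).explain())`.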


12. **How MongoDB Handles Schema Validation?**
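  > It uses validation rules defined on a collection, usually as a `$jsonSchema` document. The rules can be set when the collection is created or added later with `collMod`, and they can require fields and restrict field types and values. With `validationAction: "error"` (the default), invalid inserts and updates are rejected; with `"warn"`, they are allowed but logged.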
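The validator below is a sketch for a hypothetical, cleaned `orders` collection in which `Sales` has already been converted to a number. The raw CSV import in the practical section stores it as a string, which this rule would reject. The allowed `Region` values are assumed from the US Superstore data. PyMongo's `create_collection` forwards extra keyword arguments such as `validator` and `validationAction` to the server's `create` command.

```python
# Validator for a hypothetical, cleaned "orders" collection where Sales is numeric
orders_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Order ID", "Region", "Sales"],
        "properties": {
            "Order ID": {"bsonType": "string", "description": "must be a string"},
            "Region": {"enum": ["Central", "East", "South", "West"]},
            "Sales": {"bsonType": "double", "minimum": 0},
        },
    }
}


def create_validated_orders(db):
    """Create the collection with the validator; invalid writes are then rejected."""
    return db.create_collection(
        "orders",
        validator=orders_validator,
        validationAction="error",  # "warn" would log violations instead of rejecting
    )
```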

13. **Difference Between Primary and Secondary Node in Replica Set?**
  > Primary: Accepts writes and reads.

  > Secondary: Replicates the primary's operations (via the oplog) and can serve read queries when the client's read preference allows it. It does not accept writes.

14. **Security Mechanisms MongoDB Provides?**
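  > MongoDB provides authentication (for example, SCRAM passwords and x.509 certificates), role-based authorization, TLS encryption for data in transit, encryption at rest, and auditing. Some features, such as auditing and built-in encryption at rest, require MongoDB Enterprise or Atlas.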

15. **Concept of Embedded Documents and When to Use Them?**
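  > Embedded documents store related data inside a single parent document, creating a hierarchical structure. Use them when the related data is usually read together with the parent and stays bounded in size, as in one-to-one or one-to-few relationships. Large or unbounded data is better kept in a separate collection and referenced, since a single document is limited to 16 MB.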
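The made-up customer document below embeds a one-to-one address and a one-to-few list of phone numbers. `get_path` is a hypothetical helper that mimics how MongoDB's dot notation reaches into embedded fields. The same path string is what a query filter such as `{"address.city": "Seattle"}` uses.

```python
# Hypothetical customer document with embedded data
customer = {
    "_id": "C-001",
    "name": "Example Customer",
    "address": {"city": "Seattle", "state": "Washington"},  # embedded one-to-one
    "phones": [                                              # embedded one-to-few
        {"type": "home", "number": "555-0100"},
        {"type": "work", "number": "555-0101"},
    ],
}


def get_path(doc, dotted_path):
    """Follow a dot-notation path such as "address.city" through nested dicts."""
    value = doc
    for part in dotted_path.split("."):
        value = value[part]
    return value


print(get_path(customer, "address.city"))  # Seattle
```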

16. **Purpose of MongoDB’s $lookup in Aggregation?**
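  > `$lookup` performs a left outer join with another collection in the same database, similar to a SQL LEFT JOIN. Each input document receives an array of the matching documents.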
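The stage below joins each Superstore order row with a hypothetical `returns` collection keyed by `Order ID`, then keeps only orders that were returned. `left_outer_join` is a hypothetical pure-Python analogue of the equality form of `$lookup`. It is included only to show the output shape: every input document is kept, with an array of matches that may be empty.

```python
# $lookup stage against a hypothetical "returns" collection
lookup_pipeline = [
    {"$lookup": {
        "from": "returns",            # collection to join (same database)
        "localField": "Order ID",     # field in the input (store) documents
        "foreignField": "Order ID",   # field in the returns documents
        "as": "return_info",          # output array; empty if nothing matches
    }},
    {"$match": {"return_info": {"$ne": []}}},  # keep only returned orders
]


def left_outer_join(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Pure-Python analogue of $lookup's equality match (left outer join)."""
    return [
        {**doc, as_field: [f for f in foreign_docs if f.get(foreign_field) == doc.get(local_field)]}
        for doc in local_docs
    ]


orders_sample = [{"Order ID": "A"}, {"Order ID": "B"}]
returns_sample = [{"Order ID": "A", "Returned": "Yes"}]
joined = left_outer_join(orders_sample, returns_sample, "Order ID", "Order ID", "return_info")
print(joined)
```

With a live connection, `collection.aggregate(lookup_pipeline)` would run the real join.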

17. **Common Use Cases for MongoDB?**
  > Real-time analytics, content management, IoT, mobile apps, product catalogs, user profiles.

18. **Advantages of MongoDB for Horizontal Scaling?**
  > MongoDB scales horizontally through sharding. It distributes data across multiple machines, so capacity for both data and traffic can grow by adding shards instead of upgrading a single server.

19. **How MongoDB Transactions Differ from SQL Transactions?**
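  > They provide the same ACID guarantees, but MongoDB added multi-document transactions relatively recently: version 4.0 for replica sets and 4.2 for sharded clusters. Operations on a single document are already atomic, so a well-designed document model often needs no explicit transaction. Single-document writes also avoid the extra overhead that transactions add.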

20. **Difference Between Capped Collections and Regular Collections?**
  > Capped: Capped collections have a fixed maximum size in bytes, and optionally a maximum number of documents, set at creation. They preserve insertion order, and once full, the oldest documents are overwritten by new ones.

  > Regular: Regular collections grow dynamically as documents are added, with no predefined size limit.
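In PyMongo, a capped collection is created by passing `capped=True` and a byte `size`, plus an optional `max` document count, to `create_collection`. The collection name `event_log` and the limits below are arbitrary. A `deque` with `maxlen` is used as an in-memory analogy for the way a full capped collection discards its oldest entries.

```python
from collections import deque

# Analogy: a deque with maxlen drops the oldest item when full, like a capped collection
log = deque(maxlen=3)
for event in ["start", "load", "query", "stop"]:
    log.append(event)
print(list(log))  # ['load', 'query', 'stop']


def create_event_log(db):
    """Create a hypothetical capped collection: at most 1 MB or 1000 documents."""
    return db.create_collection("event_log", capped=True, size=1024 * 1024, max=1000)
```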

21. **Purpose of $match in Aggregation Pipeline?**
  > It filters documents based on a specified condition. It uses the same query syntax as `find()` but runs inside the aggregation pipeline. Placing it early reduces the number of documents that later stages must process.
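Because `$match` takes the same query syntax as `find()`, one filter document can be reused in both. When `$match` is the first stage, MongoDB can also use an index for it. The field name `Region` comes from the practical section. `match_equality` is a hypothetical pure-Python analogue that handles only simple equality filters.

```python
west_filter = {"Region": "West"}

# collection.find(west_filter) returns the matching documents directly;
# collection.aggregate(count_pipeline) filters with the same document, then counts
count_pipeline = [
    {"$match": west_filter},
    {"$count": "westOrders"},  # emits one document: {"westOrders": <number of matches>}
]


def match_equality(docs, flt):
    """Pure-Python analogue of an equality-only $match or find() filter."""
    return [d for d in docs if all(d.get(k) == v for k, v in flt.items())]


sample = [{"Region": "West"}, {"Region": "East"}, {"Region": "West"}]
print(len(match_equality(sample, west_filter)))  # 2
```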

22. **How to Secure Access to a MongoDB Database?**
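  > Enable authentication and give each user only the roles they need (role-based access control). Encrypt connections with TLS and data at rest. Restrict network access with IP allowlists or bind addresses, and apply firewall or network security rules.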
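As a least-privilege example, the sketch below builds a `createUser` command that grants only the built-in `read` role on the `superstore` database. The user name, the helper name, and the `MONGO_ANALYST_PASSWORD` environment variable are hypothetical. The command name must be the first key; plain dicts keep insertion order in Python 3.7+.

```python
import os


def create_user_command(username, password, db_name="superstore"):
    """Build a createUser command granting read-only access to one database."""
    return {
        "createUser": username,  # command name must come first
        "pwd": password,
        "roles": [{"role": "read", "db": db_name}],
    }


# Read the password from an environment variable instead of hard-coding it
# (MONGO_ANALYST_PASSWORD is a hypothetical variable name)
cmd = create_user_command("analyst", os.environ.get("MONGO_ANALYST_PASSWORD", "placeholder"))
```

With an administrator connection, `client["superstore"].command(cmd)` would create the user. That user would then connect with a URI such as `mongodb://analyst:<password>@host:27017/?authSource=superstore`.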

23. **What is MongoDB’s WiredTiger Storage Engine and Why Important?**
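  > WiredTiger has been MongoDB's default storage engine since version 3.2. It matters because it provides document-level concurrency control, so many clients can write at the same time. It also compresses data and indexes, which reduces storage use and can improve performance.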

# **Practical Questions**

In [None]:
! pip install pymongo

Collecting pymongo
  Downloading pymongo-4.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.4/1.4 MB[0m [31m20.8 MB/s[0m eta [36m0:00:00[0m
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m313.6/313.6 kB[0m [31m20.9 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.14.0


In [None]:
'''
1. Write a Python script to load the Superstore dataset from a CSV file into
MongoDB
'''
import csv

import pymongo as mng

client = mng.MongoClient("mongodb://localhost:27017/")
db = client["superstore"]
collection = db["store"]

data_file = r"/content/superstore.csv"

try:
    # Read every CSV row as a dict keyed by the header names (all values are strings)
    with open(data_file, encoding="cp1252", newline="") as file:
        rows = list(csv.DictReader(file))
    # One bulk insert is much faster than calling insert_one for every row
    if rows:
        collection.insert_many(rows)
except Exception as e:
    print(e)
else:
    print(f"Data imported successfully ({len(rows)} rows)")

localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 68937fa1e24ba6f7d7c34ce2, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>


In [None]:
'''
2. Retrieve and print all documents from the Orders collection
'''
orders = collection.find({})
for order in orders:
  print(order)


In [None]:
'''
3. Count and display the total number of documents in the Orders collection
'''
total_documents = collection.count_documents({})
print(f"Total documents: {total_documents}")

In [None]:
'''
4. Write a query to fetch all orders from the "West" region
'''
west_region = collection.find({"Region": "West"})
for order in west_region:
  print(order)

In [None]:
'''
5. Write a query to find orders where Sales is greater than 500
'''
# Sales was imported as a string, so convert it to a number before comparing
high_sales = collection.find({"$expr": {"$gt": [{"$toDouble": "$Sales"}, 500]}})

for order in high_sales:
  print(order)


In [None]:
'''
6. Fetch the top 3 orders with the highest Profit
'''
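# Profit was imported as a string, and sorting strings is lexicographic ("9" > "10"),
# so convert it to a number inside an aggregation pipeline before sorting
top_orders = collection.aggregate([
    {"$addFields": {"ProfitValue": {"$toDouble": "$Profit"}}},
    {"$sort": {"ProfitValue": -1}},
    {"$limit": 3},
])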
for order in top_orders:
  print(order)


In [None]:
'''
7. Update all orders with Ship Mode "First Class" to "Premium Class".
'''
result = collection.update_many({"Ship Mode": "First Class"}, {"$set": {"Ship Mode": "Premium Class"}})
print(f"Updated {result.modified_count} orders")

In [None]:
'''
8. Delete all orders where Sales is less than 50
'''
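# Sales is stored as a string, so convert it to a number before comparing
result = collection.delete_many({
    "$expr": {
        "$lt": [{"$toDouble": "$Sales"}, 50]
    }
})
print(f"Deleted {result.deleted_count} orders")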

In [None]:
'''
9. Use aggregation to group orders by Region and calculate total sales per region
'''
region_sales = collection.aggregate([
    {
        "$group": {
            "_id": "$Region",
            "Total Sales": {
                "$sum": { "$toDouble": "$Sales" }
            }
        }
    }
])
for region in region_sales:
  print(region)

In [None]:
'''
10. Fetch all distinct values for Ship Mode from the collection
'''
ship_modes = collection.distinct("Ship Mode")
print(ship_modes)

In [None]:
'''
11. Count the number of orders for each category.
'''
category_counts = collection.aggregate([
    {"$group": {"_id": "$Category", "Count": {"$sum": 1}}}
])
for category in category_counts:
  print(category)