# **`MONGO DB`**

`Q1. What is MongoDB? Explain non-relational databases in short. In which scenarios it is preferred to use MongoDB over SQL databases?`

`NoSQL databases` (short for "not only SQL", and used synonymously with `non-relational databases`) are non-tabular databases that store data differently from relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They provide flexible schemas and scale easily with large amounts of data and high user loads. One such example is MongoDB.

`MongoDB` is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. Because it is based on a non-relational document model, it differs fundamentally from conventional relational databases such as Oracle, MySQL or Microsoft SQL Server. The name MongoDB is derived from the English word “humongous”, which roughly means “gigantic”. MongoDB was released in 2009 by co-founder and developer Eliot Horowitz, who stepped down as Chief Technology Officer and from the Board of Directors of MongoDB Inc. in 2020, but is still active as a technical consultant. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public License, which is deemed non-free by several distributions.

Key features are:
- MongoDB stores data in flexible, JSON-like documents, meaning fields can vary from document to document and data structure can be changed over time
- The document model maps to the objects in your application code, making data easy to work with
- Ad hoc queries, indexing, and real time aggregation provide powerful ways to access and analyze your data
- MongoDB is a distributed database at its core, so high availability, horizontal scaling, and geographic distribution are built in and easy to use
- MongoDB is free to use. Versions released prior to October 16, 2018 are published under the AGPL. All versions released after October 16, 2018, including patch fixes for prior versions, are published under the Server Side Public License (SSPL) v1.
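
The flexible-document idea from the first two points can be pictured with plain Python dictionaries. This is only a sketch: the dicts stand in for MongoDB documents, and the `address` sub-document is a made-up example of embedding, not data from this notebook.

```python
# Two documents destined for the same (hypothetical) collection.
# Plain dicts stand in for MongoDB documents here.
people = [
    {"name": "tj", "age": 38, "speciality": "MD Forensic Medicine"},
    {
        "name": "Shyam",
        "email": "shyamjaiswal@gmail.com",
        # Embedded document (hypothetical data): nested fields live inside the parent.
        "address": {"city": "Pune", "country": "IN"},
    },
]

# Fields can differ from document to document, so code that reads them
# should not assume a key exists; dict.get() returns None when it is missing.
field_sets = [sorted(doc) for doc in people]
emails = [doc.get("email") for doc in people]

print(field_sets)
print(emails)
```

Because no schema is enforced by default, handling missing fields becomes the job of the application code.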

Scenarios where MongoDB is preferred over SQL databases:
- `Integrating large amounts of diverse data` : If you are bringing together tens or hundreds of data sources, the flexibility and power of the document model can create a single unified view in ways that other databases cannot. MongoDB has succeeded in bringing such projects to life when approaches using other databases have failed.

- `Describing complex data structures that evolve` : Document databases allow embedding of documents to describe nested structures and easily tolerate variations in data in generations of documents. Specialized data formats like geospatial are efficiently supported. This results in a resilient repository that doesn’t break or need to be redesigned every time something changes.

- `Delivering data in high-performance applications` : MongoDB’s scale-out architecture can support huge numbers of transactions on humongous databases. Unlike other databases that either cannot support such scale or can only do so with massive amounts of engineering and additional components, MongoDB has a clear path to scalability because of the way it was designed. MongoDB is scalable out of the box.

- `Supporting hybrid and multi-cloud applications` : MongoDB can be deployed and run on a desktop, a massive cluster of computers in a data center, or in a public cloud, either as installed software or through MongoDB Atlas, a database-as-a-service product. If you have applications that need to run wherever they make sense, MongoDB supports any configuration now and in the future.

- `Supporting agile development and collaboration` : Document databases put developers in charge of the data. Data becomes like code that is friendly to developers. This is far different from making developers use a strange system that requires a specialist. Document databases also allow the evolution of the structure of the data as needs are better understood. Collaboration and governance can allow one team to control one part of a document and another team to control another part.


`Q2. State and Explain the features of MongoDB.`


`Top 5 features of MongoDB`:

1. `Ad-hoc queries for optimized, real-time analytics` : When designing the schema of a database, it is impossible to know in advance all the queries that will be performed by end users. An ad hoc query is a short-lived command whose value depends on a variable. Each time an ad hoc query is executed, the result may be different, depending on the variables in question. Optimizing the way in which ad-hoc queries are handled can make a significant difference at scale, when thousands to millions of variables may need to be considered. This is why MongoDB, a document-oriented, flexible schema database, stands apart as the cloud database platform of choice for enterprise applications that require real-time analytics. With ad-hoc query support that allows developers to update ad-hoc queries in real time, the improvement in performance can be game-changing. MongoDB supports field queries, range queries, and regular expression searches. Queries can return specific fields and also account for user-defined functions. This is made possible because MongoDB indexes BSON documents and uses the MongoDB Query Language (MQL).

2. `Indexing appropriately for better query executions` : The number one issue that many technical support teams fail to address with their users is indexing. Done right, indexes are intended to improve search speed and performance. A failure to properly define appropriate indices can and usually will lead to a myriad of accessibility issues, such as problems with query execution and load balancing. Without the right indices, a database is forced to scan documents one by one to identify the ones that match the query statement. But if an appropriate index exists for each query, user requests can be optimally executed by the server. MongoDB offers a broad range of indices and features with language-specific sort orders that support complex access patterns to datasets. Notably, MongoDB indices can be created on demand to accommodate real-time, ever-changing query patterns and application requirements. They can also be declared on any field within any of your documents, including those nested within arrays.

3. `Replication for better data availability and stability` : When your data only resides in a single database, it is exposed to multiple potential points of failure, such as a server crash, service interruptions, or even good old hardware failure. Any of these events would make accessing your data nearly impossible. Replication allows you to sidestep these vulnerabilities by deploying multiple servers for disaster recovery and backup. Horizontal scaling across multiple servers that house the same data (or shards of that same data) means greatly increased data availability and stability. Naturally, replication also helps with load balancing. When multiple users access the same data, the load can be distributed evenly across servers. In MongoDB, replica sets are employed for this purpose. A primary server or node accepts all write operations and applies those same operations across secondary servers, replicating the data. If the primary server should ever experience a critical failure, any one of the secondary servers can be elected to become the new primary node. And if the former primary node comes back online, it does so as a secondary server for the new primary node.

4. `Sharding` : When dealing with particularly large datasets, sharding (the process of splitting a large dataset across multiple distributed partitions, or “shards”) helps the database distribute and better execute what might otherwise be problematic and cumbersome queries. Without sharding, scaling a growing web application with millions of daily users is nearly impossible. Like replication via replica sets, sharding in MongoDB allows for much greater horizontal scalability. Horizontal scaling means that each shard in a cluster houses a portion of the dataset in question, essentially functioning as a separate database. The collection of distributed server shards forms a single, comprehensive database much better suited to handling the needs of a popular, growing application with zero downtime. All operations in a sharded environment are routed through a lightweight process called mongos. Mongos can direct queries to the correct shard based on the shard key. Naturally, proper sharding also contributes significantly to better load balancing.

5. `Load balancing` : At the end of the day, optimal load balancing remains one of the holy grails of large-scale database management for growing enterprise applications. Properly distributing millions of client requests to hundreds or thousands of servers can lead to a noticeable (and much appreciated) difference in performance. Fortunately, via horizontal scaling features like replication and sharding, MongoDB supports large-scale load balancing. The platform can handle multiple concurrent read and write requests for the same data with best-in-class concurrency control and locking protocols that ensure data consistency. There’s no need to add an external load balancer; MongoDB ensures that each and every user has a consistent view and quality experience with the data they need to access.
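
The routing role described for mongos in point 4 can be sketched in pure Python. This is a deliberately simplified illustration, not MongoDB's implementation: the shard names are hypothetical, and the hash-modulo rule stands in for MongoDB's actual scheme of assigning ranges of (optionally hashed) shard key values to shards.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def route(shard_key_value: str) -> str:
    """Pick a shard from a stable hash of the shard key value (simplified)."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Group a few documents by the shard their key would be routed to.
placement = {}
for name in ["Shyam", "Bob", "Jai", "Rob", "Jon"]:
    placement.setdefault(route(name), []).append(name)

print(placement)
```

The point to notice is that the same shard key value always lands on the same shard, so a query that includes the shard key can be sent to one shard instead of all of them.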

Q3. Write a code to connect MongoDB to Python. Also, create a database and a collection in MongoDB.

In [106]:
!pip install pymongo



In [107]:
import pymongo
client = pymongo.MongoClient("mongodb+srv://tejas05in:<password>@cluster0.cnbjrys.mongodb.net/?retryWrites=true&w=majority")
db = client.test
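
A practical note on the connection string: if the username or password contains characters such as `@` or `/`, they need to be percent-escaped before being placed in the URI. The sketch below uses only the standard library; `build_uri`, the user name, the password and the host are hypothetical placeholders, not values from this notebook.

```python
from urllib.parse import quote_plus


def build_uri(user: str, password: str, host: str) -> str:
    """Assemble a mongodb+srv URI with percent-escaped credentials."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/?retryWrites=true&w=majority"
    )


# Hypothetical credentials and host, for illustration only.
uri = build_uri("example_user", "p@ss/word", "cluster0.example.mongodb.net")
print(uri)
```

The resulting string can be passed to `pymongo.MongoClient(...)` in place of the hand-written one above. In real code the password would more sensibly come from an environment variable than from the notebook itself.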

In [108]:
db = client["PWSkills"]  # selecting the database 'PWSkills' (MongoDB creates it on the first write)

In [109]:
coll_create = db["mongo_nosql"]  # selecting the collection 'mongo_nosql' (also created on the first write)

In [110]:
data1 = {
    "name": "tj",
    "age": 38,
    "speciality": "MD Forensic Medicine",
    "Designation": "Professor"
}  # test data

In [111]:
coll_create.insert_one(data1)  # test data inserted and database and collection created

<pymongo.results.InsertOneResult at 0x7f7ff435bd60>

In [112]:
for i in coll_create.find(): # retrieving the collection
    print(i)

{'_id': ObjectId('63efc6f1271c845a8b6096a5'), 'name': 'tj', 'age': 38, 'speciality': 'MD Forensic Medicine', 'Designation': 'Professor'}


Q4. Using the database and the collection created in question number 3, write a code to insert one record,
and insert many records. Use the find() and find_one() methods to print the inserted record.

In [113]:
# test data with a single record
data2 = {
    "name" : "murali",
    "age" : 32,
    "speciality" : "MD Forensic Medicine",
    "Designation": "Assistant Professor"
} 

In [114]:
# test data with many records
data3 = [  
    {"name":"Shyam", "email":"shyamjaiswal@gmail.com"},  
    {"name":"Bob", "email":"bob32@gmail.com"},  
    {"name":"Jai", "email":"jai87@gmail.com"},
    {"name":"bai", "email":"bai23@gmail.com"},
    {"name":"Pai", "email":"pai34@gmail.com"},
    {"name":"Rob", "email":"rob28@gmail.com"},
    {"name":"Jon", "email":"jon65@gmail.com"}
] 

In [115]:
# inserting one record
X = coll_create.insert_one(data2) 

In [116]:
X.acknowledged

True

In [117]:
# inserting many records
X = coll_create.insert_many(data3)

In [118]:
X.acknowledged

True

In [119]:
for i in coll_create.find(): # retrieving the collection with find()
    print(i)

{'_id': ObjectId('63efc6f1271c845a8b6096a5'), 'name': 'tj', 'age': 38, 'speciality': 'MD Forensic Medicine', 'Designation': 'Professor'}
{'_id': ObjectId('63efc6f1271c845a8b6096a6'), 'name': 'murali', 'age': 32, 'speciality': 'MD Forensic Medicine', 'Designation': 'Assistant Professor'}
{'_id': ObjectId('63efc6f1271c845a8b6096a7'), 'name': 'Shyam', 'email': 'shyamjaiswal@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096a8'), 'name': 'Bob', 'email': 'bob32@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096a9'), 'name': 'Jai', 'email': 'jai87@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096aa'), 'name': 'bai', 'email': 'bai23@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096ab'), 'name': 'Pai', 'email': 'pai34@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096ac'), 'name': 'Rob', 'email': 'rob28@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096ad'), 'name': 'Jon', 'email': 'jon65@gmail.com'}


In [120]:
print(coll_create.find_one())  # find_one() with no filter returns the first document in the collection
print(coll_create.find_one({"name":"Rob"})) # can search for specific records with arguments just like find()

{'_id': ObjectId('63efc6f1271c845a8b6096a5'), 'name': 'tj', 'age': 38, 'speciality': 'MD Forensic Medicine', 'Designation': 'Professor'}
{'_id': ObjectId('63efc6f1271c845a8b6096ac'), 'name': 'Rob', 'email': 'rob28@gmail.com'}


Q5. Explain how you can use the find() method to query the MongoDB database. Write a simple code to
demonstrate this.

In MongoDB, the `find()` method is used to fetch documents from a collection, much as SELECT fetches rows from a table in SQL. Called with no arguments, it returns every document in the collection; otherwise it returns only the documents that match the query. The result is a cursor that can be iterated over. The find() method takes two optional parameters with which a particular record can be found.

`Syntax`:

db.collection_name.find(query, projection)

1. query: This is an optional parameter that defines the selection criteria. In simple words, it describes which documents you want to find in a collection.
2. projection: This is an optional parameter that defines which fields of each matching document are returned. In simple words, it selects the fields to include in or exclude from the result; if it is omitted, all fields are returned.
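
To make the two parameters concrete, here is a small pure-Python stand-in that mimics what an equality-only query and an inclusion projection do. The helper `find` below is hypothetical and far simpler than MongoDB's query engine (no operators, no `_id` handling). In PyMongo the comparable call would be `coll_create.find({"speciality": "MD Forensic Medicine"}, {"_id": 0, "name": 1})`.

```python
def find(docs, query=None, projection=None):
    """Yield docs matching an equality-only query, keeping only projected fields."""
    query = query or {}
    for doc in docs:
        # A document matches when every queried field is present and equal.
        if all(f in doc and doc[f] == v for f, v in query.items()):
            if projection is None:
                yield dict(doc)  # no projection: return every field
            else:
                yield {f: doc[f] for f in projection if f in doc}


docs = [
    {"name": "tj", "age": 38, "speciality": "MD Forensic Medicine"},
    {"name": "murali", "age": 32, "speciality": "MD Forensic Medicine"},
    {"name": "Bob", "email": "bob32@gmail.com"},
]

names = list(find(docs, {"speciality": "MD Forensic Medicine"}, ["name"]))
print(names)
```

The query narrows down which documents come back; the projection narrows down which fields of those documents come back.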

In [121]:
for i in coll_create.find({'speciality': 'MD Forensic Medicine'}): # retrieving the collection with a query 'speciality': 'MD Forensic Medicine'
    print(i)

{'_id': ObjectId('63efc6f1271c845a8b6096a5'), 'name': 'tj', 'age': 38, 'speciality': 'MD Forensic Medicine', 'Designation': 'Professor'}
{'_id': ObjectId('63efc6f1271c845a8b6096a6'), 'name': 'murali', 'age': 32, 'speciality': 'MD Forensic Medicine', 'Designation': 'Assistant Professor'}


Q6. Explain the sort() method. Give an example to demonstrate sorting in MongoDB.

*`The sort() Method`*


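To sort documents in MongoDB, you use the sort() method. In the mongo shell, the method accepts a document containing a list of fields along with their sorting order; in PyMongo, sort() is called on the cursor returned by find() and takes a field name and a direction, or a list of (field, direction) pairs. The sorting order is specified with 1 and -1: 1 is used for ascending order while -1 is used for descending order.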

In [122]:
# test data for demo of sorting
data4 = [
    {
        "name": "Mick",
        "Course": "btech",
        "batch_year": 2018,
        "language": ["c++", "java", "python"],
    },
    {
        "name": "Zoya",
        "Course": "BCA",
        "batch_year": 2020,
        "language": ["C#", "JavaScript"],
    },
    {
        "name": "Jonny",
        "Course": "MCA",
        "batch_year": 2019,
        "language": ["C#", "java", "PHP"],
    },
    {
        "name": "Oliver",
        "Course": "BA",
        "batch_year": 2017,
        "language": ["c", "PHP"],
    },
    {
        "name": "Mia",
        "Course": "btech",
        "batch_year": 2020,
        "language": ["HTML", "CSS", "PHP"],
    },
]

In [123]:
X = coll_create.insert_many(data4) # inserting test data

In [124]:
X.acknowledged

True

In [125]:
for i in coll_create.find({'batch_year': {"$gte":2016}}).sort("batch_year",1): # sorting the data by the batch year
    print(i)

{'_id': ObjectId('63efc6f1271c845a8b6096b1'), 'name': 'Oliver', 'Course': 'BA', 'batch_year': 2017, 'language': ['c', 'PHP']}
{'_id': ObjectId('63efc6f1271c845a8b6096ae'), 'name': 'Mick', 'Course': 'btech', 'batch_year': 2018, 'language': ['c++', 'java', 'python']}
{'_id': ObjectId('63efc6f1271c845a8b6096b0'), 'name': 'Jonny', 'Course': 'MCA', 'batch_year': 2019, 'language': ['C#', 'java', 'PHP']}
{'_id': ObjectId('63efc6f1271c845a8b6096af'), 'name': 'Zoya', 'Course': 'BCA', 'batch_year': 2020, 'language': ['C#', 'JavaScript']}
{'_id': ObjectId('63efc6f1271c845a8b6096b2'), 'name': 'Mia', 'Course': 'btech', 'batch_year': 2020, 'language': ['HTML', 'CSS', 'PHP']}
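
Sorting on more than one field follows the same idea. In PyMongo this would be written as `.sort([("batch_year", -1), ("name", 1)])`; the sketch below reproduces that ordering in plain Python so the result can be checked without a database. The helper `sort_docs` is hypothetical, and the documents are trimmed copies of the test data above.

```python
students = [
    {"name": "Mick", "batch_year": 2018},
    {"name": "Zoya", "batch_year": 2020},
    {"name": "Jonny", "batch_year": 2019},
    {"name": "Oliver", "batch_year": 2017},
    {"name": "Mia", "batch_year": 2020},
]


def sort_docs(docs, spec):
    """Order docs by a list of (field, direction) pairs; direction is 1 or -1."""
    ordered = list(docs)
    # Python's sort is stable, so applying the keys from last to first
    # leaves the first key as the primary ordering.
    for field, direction in reversed(spec):
        ordered.sort(key=lambda d: d[field], reverse=(direction == -1))
    return ordered


result = [d["name"] for d in sort_docs(students, [("batch_year", -1), ("name", 1)])]
print(result)  # ['Mia', 'Zoya', 'Jonny', 'Mick', 'Oliver']
```

The two students from 2020 come first (newest batch), and the tie between them is broken alphabetically by name.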


Q7. Explain why delete_one(), delete_many(), and drop() are used.

`delete_one()`:

In MongoDB, a single document can be deleted with the method delete_one(). The first parameter of the method is a query object that defines the document to be deleted. If multiple documents match the filter query, only the first matching document is deleted.

Note: Deleting a document is the same as deleting a record in the case of SQL.

In [126]:
# before delete_one()
for i in coll_create.find({'speciality': 'MD Forensic Medicine'}): 
    print(i)

{'_id': ObjectId('63efc6f1271c845a8b6096a5'), 'name': 'tj', 'age': 38, 'speciality': 'MD Forensic Medicine', 'Designation': 'Professor'}
{'_id': ObjectId('63efc6f1271c845a8b6096a6'), 'name': 'murali', 'age': 32, 'speciality': 'MD Forensic Medicine', 'Designation': 'Assistant Professor'}


In [127]:
# deleting with query {'speciality': 'MD Forensic Medicine'}
X = coll_create.delete_one({'speciality': 'MD Forensic Medicine'})

In [128]:
X.deleted_count

1

In [129]:
# after delete_one()
for i in coll_create.find({'speciality': 'MD Forensic Medicine'}):
    print(i)

{'_id': ObjectId('63efc6f1271c845a8b6096a6'), 'name': 'murali', 'age': 32, 'speciality': 'MD Forensic Medicine', 'Designation': 'Assistant Professor'}


`delete_many()` :

delete_many() is used when one needs to delete more than one document. A query object describing which documents are to be deleted is created and passed as the first parameter to delete_many(). Every document that matches the query is removed.

In [130]:
# before delete_many()
for i in coll_create.find():
    print(i)

{'_id': ObjectId('63efc6f1271c845a8b6096a6'), 'name': 'murali', 'age': 32, 'speciality': 'MD Forensic Medicine', 'Designation': 'Assistant Professor'}
{'_id': ObjectId('63efc6f1271c845a8b6096a7'), 'name': 'Shyam', 'email': 'shyamjaiswal@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096a8'), 'name': 'Bob', 'email': 'bob32@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096a9'), 'name': 'Jai', 'email': 'jai87@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096aa'), 'name': 'bai', 'email': 'bai23@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096ab'), 'name': 'Pai', 'email': 'pai34@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096ac'), 'name': 'Rob', 'email': 'rob28@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096ad'), 'name': 'Jon', 'email': 'jon65@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096ae'), 'name': 'Mick', 'Course': 'btech', 'batch_year': 2018, 'language': ['c++', 'java', 'python']}
{'_id': ObjectId('63efc6f1271c845a8b6096af'), 'name': 'Zoya', 'Course': 'BCA'

#### Using a regular expression within the MongoDB query to delete all documents whose names start with M

In [131]:
coll_create.delete_many({"name":{"$regex":"^M"}})

<pymongo.results.DeleteResult at 0x7f7ff4462920>

In [132]:
# After delete_many()
for i in coll_create.find():
    print(i)

{'_id': ObjectId('63efc6f1271c845a8b6096a6'), 'name': 'murali', 'age': 32, 'speciality': 'MD Forensic Medicine', 'Designation': 'Assistant Professor'}
{'_id': ObjectId('63efc6f1271c845a8b6096a7'), 'name': 'Shyam', 'email': 'shyamjaiswal@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096a8'), 'name': 'Bob', 'email': 'bob32@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096a9'), 'name': 'Jai', 'email': 'jai87@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096aa'), 'name': 'bai', 'email': 'bai23@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096ab'), 'name': 'Pai', 'email': 'pai34@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096ac'), 'name': 'Rob', 'email': 'rob28@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096ad'), 'name': 'Jon', 'email': 'jon65@gmail.com'}
{'_id': ObjectId('63efc6f1271c845a8b6096af'), 'name': 'Zoya', 'Course': 'BCA', 'batch_year': 2020, 'language': ['C#', 'JavaScript']}
{'_id': ObjectId('63efc6f1271c845a8b6096b0'), 'name': 'Jonny', 'Course': 'MCA', 'bat
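
The pattern `^M` is case-sensitive, which is why 'murali' survives the delete above. The sketch below checks the same pattern with Python's `re` module against a sample of the names in the collection. It assumes Python's regex engine and MongoDB's behave the same for a pattern this simple. In MongoDB, a case-insensitive match would be requested with `{"$regex": "^M", "$options": "i"}`.

```python
import re

# A sample of the names present in the collection before delete_many().
names = ["murali", "Shyam", "Bob", "Mick", "Zoya", "Jonny", "Oliver", "Mia"]

# Case-sensitive: only names beginning with an upper-case M match.
pattern = re.compile(r"^M")
deleted = [n for n in names if pattern.search(n)]
kept = [n for n in names if not pattern.search(n)]

# Case-insensitive variant, comparable to adding "$options": "i" in MongoDB.
insensitive = re.compile(r"^M", re.IGNORECASE)
would_also_match = [n for n in names if insensitive.search(n)]

print(deleted)           # ['Mick', 'Mia']
print(would_also_match)  # ['murali', 'Mick', 'Mia']
```

Running a find() with the same filter before calling delete_many() is a cheap way to preview exactly which documents are about to be removed.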

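`drop()`:

MongoDB's db.collection.drop() is used to drop a collection, together with its documents and indexes, from the database.

The basic syntax of the drop() command in the mongo shell is as follows:

db.COLLECTION_NAME.drop()

Returns (in the mongo shell):
- true when it successfully drops a collection.
- false when the collection to drop does not exist.

In PyMongo, `Collection.drop()` returns None, so the cell below prints nothing.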


In [135]:
# dropping the collection mongo_nosql
coll_create.drop()

In [137]:
# drops the entire database
# for this to work, the database user must have the 'atlasAdmin@admin' role (Database Access, under Security, in MongoDB Atlas)
client.drop_database("PWSkills")