# Workshop Objectives

Welcome to this instructional workshop focused on NoSQL database manipulation using Python. The primary objective of this workshop is to impart practical skills in database operations such as querying and inserting data. A secondary goal is to offer participants a tangible experience with a document-oriented database, simulating interactions commonly found in more advanced databases like MongoDB.

### Introduction to TinyDB

In this workshop we will be using [TinyDB](https://tinydb.readthedocs.io/en/latest/) - a minimalistic, document-oriented database. It was designed for ease of use and quick deployment, which makes it a good fit for Python developers interested in rapid development and prototyping. Because it is implemented in pure Python and has no external dependencies, it is easy to set up and use right away.

However, it is crucial to understand the limitations of TinyDB. As an embedded database, it lacks the performance optimizations, scalability, and rich feature set of server-based solutions such as [MongoDB](https://www.mongodb.com/). While it is efficient for small, standalone applications (like this workshop), TinyDB is not designed for high-throughput, data-intensive, or multi-user environments.

### Rationale for Using TinyDB

TinyDB's embedded nature obviates the need for a separate, cumbersome installation process, thereby making it highly accessible and minimizing the potential for technical complications. This is especially useful in educational settings where participants may not have administrative privileges on their computing systems. Furthermore, despite its limitations, TinyDB serves as an excellent introduction to document databases, providing an opportunity to grasp essential concepts and operations in a simplified environment.

# Workshop Setup: Initialization and Database Connection

Before diving into the specifics of database operations, let's establish the basic environment required for this workshop. This involves two key steps:

1. **Importing the Necessary Objects**: The TinyDB library provides various objects that facilitate database interactions. For this workshop, we'll specifically need **`TinyDB`** for database operations and **`Query`** for building search queries.

2. **Creating a Database Handle**: A handle (sometimes referred to as a database connection) is essentially a variable through which we can interact with the database. We'll call various methods on this handle to execute operations like insert, delete, and query.

### Workshop Database

For this workshop, we have already provided you with a pre-populated database file named `movies.json` containing JSON documents for 32,000+ movies dating back to the early 1900s. Your first task is simply to connect to this existing database. No need for complicated setups or installations!

### Initialization Code

The next cell contains the following code. Let's dissect what these statements do.

```python
from tinydb import TinyDB, Query  # Imports the required objects
db = TinyDB('movies.json')        # Creates a handle (connection) to the existing database file 'movies.json'
```

### What's Happening Here?

- `from tinydb import TinyDB, Query`: This line imports the `TinyDB` and `Query` objects from the TinyDB library. These objects contain methods that allow us to perform operations on our database.

- `db = TinyDB('movies.json')`: Here, we're creating a new database connection named `db` and linking it to our existing, pre-populated database file, `movies.json`. This connection is our primary object to interact with the database.

By executing this cell, you'll have successfully set up your TinyDB environment and connected to the provided database. You're now ready to venture into the world of document databases.

```python
!pip install tinydb
```

```
Collecting tinydb
  Downloading tinydb-4.8.0-py3-none-any.whl.metadata (6.2 kB)
Downloading tinydb-4.8.0-py3-none-any.whl (24 kB)
Installing collected packages: tinydb
Successfully installed tinydb-4.8.0
```


```python
from tinydb import TinyDB, Query
db = TinyDB('movies.json')
```

# Simple Queries 



TinyDB lets you run simple queries that find records matching certain conditions. The **`Query`** class is the helper for building these searches. Calling **`Query()`** gives you an object whose attributes stand for the fields in your documents, so you can compare those fields against values. For example, you could use it to look for all movies released in the year 2000. You can also combine several search criteria, such as finding all "Science Fiction" movies released after 1990. It is your go-to tool for searches that are as simple or as detailed as you need.


To find all films released in the year 2000, you would execute the following example:

```
Film = Query()
db.search(Film.year == 2000)
```

### Task 1

In the following cell, write a query using `db.search` to find all films that were released in the year 1995.

```python
Film = Query()
db.search(Film.year == 1995)
```

```
[{'title': '12 Monkeys',
  'year': 1995,
  'cast': ['Bruce Willis',
   'Madeleine Stowe',
   'Brad Pitt',
   'Christopher Plummer'],
  'genres': ['Science Fiction', 'Short'],
  'href': '12_Monkeys',
  'extract': "12 Monkeys is a 1995 American science fiction film directed by Terry Gilliam, inspired by Chris Marker's 1962 short film La Jetée, starring Bruce Willis, Madeleine Stowe, and Brad Pitt, with Christopher Plummer and David Morse in supporting roles. Set in a post-apocalyptic future devastated by an unknown disease, a convict (Willis) is sent back in time to investigate its origin.",
  'thumbnail': 'https://upload.wikimedia.org/wikipedia/en/c/cf/Twelve_monkeysmp.jpg',
  'thumbnail_width': 260,
  'thumbnail_height': 387},
 {'title': '3 Ninjas Knuckle Up',
  'year': 1995,
  'cast': ['Victor Wong',
   'Charles Napier',
   'Michael Treanor',
   'Max Elliott Slade'],
  'genres': ['Martial Arts', 'Comedy'],
  'href': '3_Ninjas_Knuckle_Up',
  'extract': '3 Ninjas Knuckle Up is a 1993 Am
```

# Logical Operations in Queries

You can combine multiple query conditions using logical operators like **`&`** (AND) and **`|`** (OR). For example, to find all films of the "Action" genre that were released after 1990:

```
Film = Query()
db.search((Film.year > 1990) & (Film.genres.any(['Action'])))
```

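The example above also shows how to query fields that contain lists with the **`.any()`** method. Given a list of values, `.any()` matches every record whose list field contains at least one of those values, so `Film.genres.any(['Action'])` checks whether "Action" appears in a film's `genres` array. Pass the values as a list. If you pass a bare string, TinyDB tests each element with Python's `in` operator against that string, which is a substring check rather than an exact match.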


### Task 2

In the following cell, use logical operators to find all films in the dataset that belong to the genre "Science Fiction" and were released in 1990.

```python
Film = Query()
db.search((Film.year == 1990) & (Film.genres.any(['Science Fiction'])))
```

```
[{'title': 'Back to the Future Part III',
  'year': 1990,
  'cast': ['Michael J. Fox',
   'Christopher Lloyd',
   'Mary Steenburgen',
   'Thomas F. Wilson'],
  'genres': ['Science Fiction', 'Western'],
  'href': 'Back_to_the_Future_Part_III',
  'extract': 'Back to the Future Part III is a 1990 American science fiction Western film and the final installment of the Back to the Future trilogy. The film was directed by Robert Zemeckis, and stars Michael J. Fox, Christopher Lloyd, Mary Steenburgen, Thomas F. Wilson, and Lea Thompson. The film continues immediately following Back to the Future Part II (1989); while stranded in 1955 during his time travel adventures, Marty McFly (Fox) discovers that his friend Dr. Emmett "Doc" Brown (Lloyd), trapped in 1885, was killed by Buford "Mad Dog" Tannen (Wilson), Biff\'s great-grandfather. Marty travels to 1885 to rescue Doc and return once again to 1985, but matters are complicated when Doc falls in love with Clara Clayton (Steenburgen).',
  'thumbn
```

# Working with Lists in Queries

As shown above, TinyDB provides the **`.any()`** method for querying fields that contain lists. For instance, to find all films that have "Willem Dafoe" OR "Frances McDormand" in their cast, you can use:

```
db.search(Film.cast.any(['Willem Dafoe', 'Frances McDormand']))
```

TinyDB also provides an **`.all()`** method for querying fields that contain lists. Use **`.all()`** to match lists that contain EVERY item you specify. For instance, to find all films that have both "Mark Hamill" and "Harrison Ford" in their cast, you can use the following:

```
db.search(Film.cast.all(['Mark Hamill', 'Harrison Ford']))
```


### Task 3

In the following cell, write a query to find all films that have "Comedy", "Thriller" and "Spy" in their `genres` list.

```python
Film = Query()
db.search(Film.genres.all(["Comedy", "Thriller", "Spy"]))
```

```
[{'title': 'Espionage',
  'year': 1937,
  'cast': ['Edmund Lowe', 'Madge Evans', 'Paul Lukas'],
  'genres': ['Drama',
   'Adventure',
   'Comedy',
   'Noir',
   'Romance',
   'Spy',
   'Thriller'],
  'href': 'Espionage_(1937_film)',
  'extract': 'Espionage is a 1937 American Proto-Noir, spy-film, adventure, drama, romance, comedy thriller film directed by Kurt Neumann and written by Leonard Lee, Ainsworth Morgan and Manuel Seff, based on the 1935 West End play Espionage by Walter C. Hackett. The film stars Edmund Lowe, Madge Evans, Paul Lukas, Ketti Gallian, Richard "Skeets" Gallagher, and Frank Reicher. The film was released February 26, 1937, by Metro-Goldwyn-Mayer.',
  'thumbnail': 'https://upload.wikimedia.org/wikipedia/commons/thumb/f/fc/Poster_-_Espionage_%281937%29_01.jpg/320px-Poster_-_Espionage_%281937%29_01.jpg',
  'thumbnail_width': 320,
  'thumbnail_height': 482},
 {'title': 'All Through the Night',
  'year': 1941,
  'cast': ['Humphrey Bogart', 'Kaaren Verne', 'Conrad Veidt
```

# Counts 

TinyDB lets you count the records that match a condition with the **`db.count()`** method. Like `db.search()`, it accepts conditions combined with logical operators, so you can count records matching multiple conditions:

```
db.count((Film.year >= 1980) & (Film.year <= 1985))
```

### Task 4

In the following cell, use **`db.count()`** to write a query that returns the number of films that have both the genres "Political" and "Comedy" and were released in or after 2000. Use only a single call to `db.count()`.

```python
# Released in or after 2000 AND tagged with both Political and Comedy
Film = Query()
db.count((Film.year >= 2000) & Film.genres.all(['Political', 'Comedy']))
```

```
10
```

```python
# Released in or after 2000 OR tagged with both Political and Comedy
Film = Query()
db.count((Film.year >= 2000) | Film.genres.all(['Political', 'Comedy']))
```

```
6110
```

# Advanced Query: Aggregation with Python

While TinyDB provides basic query and count functionality, it has no built-in aggregation operations such as grouping or summing. Instead, you perform these operations yourself in Python. For example, you can find the year with the most movie releases by looping over every record and counting occurrences in a dictionary:

```
from collections import defaultdict
year_count = defaultdict(int)

for film in db.all():
    y = film.get('year', 'Unknown')
    year_count[y] += 1

year_most_releases = max(year_count, key=year_count.get)
print(year_most_releases)
```

### Task 5

In the following cell, write Python code that finds every genre in the dataset and the number of movies in each.

```python
from collections import defaultdict
genre_count = defaultdict(int)

for film in db.all():
    # Default to a one-item list so a missing field is counted as 'Unknown'
    # instead of iterating over the characters of the string 'Unknown'.
    genres = film.get('genres', ['Unknown'])
    for genre in genres:
        genre_count[genre] += 1

for genre, count in genre_count.items():
    print('{} count: {}'.format(genre, count))
```

```
Silent count: 7131
Short count: 1023
Documentary count: 527
Comedy count: 10498
Fantasy count: 777
Western count: 4436
Crime count: 2612
Drama count: 14062
Adventure count: 1761
Romance count: 2857
Action count: 2241
Animated count: 897
Historical count: 567
Biography count: 752
Horror count: 1758
War count: 1694
Mystery count: 1163
Thriller count: 1998
Sports count: 459
Musical count: 1785
Teen count: 205
Sport count: 39
Family count: 435
Independent count: 319
Spy count: 214
Satire count: 119
Erotic count: 240
Science Fiction count: 1169
Dance count: 34
Disaster count: 85
Noir count: 1120
Supernatural count: 313
Political count: 110
Suspense count: 112
Superhero count: 221
Live Action count: 35
Legal count: 56
Performance count: 42
Slasher count: 231
Martial Arts count: 105
Found Footage count: 25
```


# Adding Records

Adding new records to your TinyDB database is straightforward. You'll create a Python dictionary containing all the information for the new movie, and then you'll use the **`insert()`** method to add that dictionary to your database. The keys of the dictionary should match the field names in your existing records.

Let's say you have a database containing information about cars, and you want to add a new car to this collection. Each record in your "Car" database might contain fields like make, model, year, color, and mileage.

Here's a generic example of inserting a new car into such a database:

```python
# Create a dictionary with a new car's information
new_car = {
    'make': 'Mazda',
    'model': 'Miata',
    'year': 2019,
    'color': 'Silver',
    'mileage': 31028
}

# Add the new car to the database
db.insert(new_car)
```

### Task 6

Now that you've seen how to do it, create a Python dictionary for the new movie "Oppenheimer" and insert it into your TinyDB database. Use the information provided below for the movie's details. 


- Title: Oppenheimer
- Year: 2023
- Cast: Cillian Murphy, Emily Blunt, Matt Damon, Robert Downey Jr., Rami Malek, Florence Pugh
- Genre: Biography
- HREF: "Oppenheimer_(film)"
- Extract: Oppenheimer is an upcoming biographical film written and directed by Christopher Nolan. It is based on American Prometheus, a biography written by Kai Bird and Martin J. Sherwin. The film stars Cillian Murphy as J. Robert Oppenheimer, an American theoretical physicist credited with being the "father of the atomic bomb" for his role in the Manhattan Project—the World War II undertaking that developed the first nuclear weapons, with a supporting ensemble cast that includes Emily Blunt, Robert Downey Jr., Matt Damon, Rami Malek, Florence Pugh, Benny Safdie, Michael Angarano, Josh Hartnett and Kenneth Branagh. It is a co-production between Universal Pictures, Syncopy Inc. and Atlas Entertainment, with Nolan producing the film alongside Emma Thomas and Charles Roven.
- Thumbnail: https://upload.wikimedia.org/wikipedia/en/4/4a/Oppenheimer_%28film%29.jpg
- Thumbnail width: 251
- Thumbnail height: 397


```python
new_movie = {
    'title': 'Oppenheimer',
    'year': 2023,
    'cast': ['Cillian Murphy', 'Emily Blunt', 'Matt Damon', 'Robert Downey Jr.', 'Rami Malek', 'Florence Pugh'],
    # Every other record stores genres as a list, so use a list here too
    'genres': ['Biography'],
    'href': "Oppenheimer_(film)",
    'extract': 'Oppenheimer is an upcoming biographical film written and directed by Christopher Nolan. It is based on American Prometheus, a biography written by Kai Bird and Martin J. Sherwin. The film stars Cillian Murphy as J. Robert Oppenheimer, an American theoretical physicist credited with being the "father of the atomic bomb" for his role in the Manhattan Project—the World War II undertaking that developed the first nuclear weapons, with a supporting ensemble cast that includes Emily Blunt, Robert Downey Jr., Matt Damon, Rami Malek, Florence Pugh, Benny Safdie, Michael Angarano, Josh Hartnett and Kenneth Branagh. It is a co-production between Universal Pictures, Syncopy Inc. and Atlas Entertainment, with Nolan producing the film alongside Emma Thomas and Charles Roven.',
    'thumbnail': 'https://upload.wikimedia.org/wikipedia/en/4/4a/Oppenheimer_%28film%29.jpg',
    'thumbnail_width': 251,
    'thumbnail_height': 397,
}
db.insert(new_movie)  # returns the new document's ID
```

```
36274
```

### Task 7

To confirm that the insert worked, use the cell below to run **`db.search`** and retrieve the "Oppenheimer" record you just added.

```python
Film = Query()
db.search(Film.title == 'Oppenheimer')
```

```
[{'title': 'Oppenheimer',
  'year': 2023,
  'cast': ['Cillian Murphy',
   'Emily Blunt',
   'Matt Damon',
   'Robert Downey Jr.',
   'Rami Malek',
   'Florence Pugh'],
  'genres': ['Biography'],
  'href': 'Oppenheimer_(film)',
  'extract': 'Oppenheimer is an upcoming biographical film written and directed by Christopher Nolan. It is based on American Prometheus, a biography written by Kai Bird and Martin J. Sherwin. The film stars Cillian Murphy as J. Robert Oppenheimer, an American theoretical physicist credited with being the "father of the atomic bomb" for his role in the Manhattan Project—the World War II undertaking that developed the first nuclear weapons, with a supporting ensemble cast that includes Emily Blunt, Robert Downey Jr., Matt Damon, Rami Malek, Florence Pugh, Benny Safdie, Michael Angarano, Josh Hartnett and Kenneth Branagh. It is a co-production between Universal Pictures, Syncopy Inc. and Atlas Entertainment, with Nolan producing the film alongside Emma Thomas and Cha
```