# Lab Instructions

Find a dataset that interests you. I'd recommend starting on [Kaggle](https://www.kaggle.com/). Read through all of the material about the dataset and download a .CSV file.

1. Write a short summary of the data.  Where did it come from?  How was it collected?  What are the features in the data?  Why is this dataset interesting to you?  

2. Identify 5 interesting questions about your data that you can answer using Pandas methods.  

3. Answer those questions!  You may use any method you want (including LLMs) to help you write your code; however, you should use Pandas to find the answers.  LLMs will not always write code in this way without specific instruction.  

4. Write the answer to your question in a text box underneath the code you used to calculate the answer.



Q1. I chose a movie dataset that contains information about a large collection of films, including their titles, release dates, genres, ratings, and popularity metrics. I downloaded it from Kaggle, where it was compiled from public movie databases and user‑generated rating platforms. Its features are the movie ID, original title, original language, genre, overview, popularity score, vote count, average rating (the `vote_average` column), and release date. I chose this dataset because I enjoy analyzing entertainment data, and movies provide a lot of interesting patterns to explore. The dataset also gives me the chance to practice filtering, grouping, and summarizing information using Pandas.

Here are the five questions I’m answering in my lab:

1. What is the average IMDb rating across all movies in the dataset?

2. Which movie has the highest rating?

3. How many movies were released in each decade?

4. What is the most common movie genre in the dataset?

5. What is the average rating for each genre?

Below are the Pandas methods and code I used to answer each one.

In [6]:
import pandas as pd 
df = pd.read_csv("Movies_dataset.csv")

df.head()
df.columns

Index(['id', 'original_title', 'original_language', 'genre', 'overview',
       'popularity', 'vote_count', 'vote_average', 'release_date'],
      dtype='object')

In [7]:
df['vote_average'].mean()

np.float64(6.714658)

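The average rating across all movies in the dataset is about 6.71. This is an unweighted mean of the vote_average column, so every movie counts equally no matter how many votes it received. It gives me a baseline for judging whether an individual film or genre is rated above or below the typical movie in the collection.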
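As a point of comparison, here is a minimal sketch of a vote-weighted average, which lets movies with more votes count for more. The `toy` DataFrame and its values are made up for illustration; only the column names `vote_average` and `vote_count` come from the dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the real dataset.
toy = pd.DataFrame({
    "vote_average": [9.0, 6.0],
    "vote_count": [10, 990],
})

# Plain mean: each movie counts once.
simple_mean = toy["vote_average"].mean()

# Weighted mean: each movie counts in proportion to its number of votes.
weighted_mean = (
    (toy["vote_average"] * toy["vote_count"]).sum() / toy["vote_count"].sum()
)

print(simple_mean, weighted_mean)
```

In this toy example the lightly voted movie pulls the plain mean up, while the weighted mean stays close to the rating of the heavily voted movie.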

In [8]:
df.loc[df['vote_average'].idxmax(), ['original_title', 'vote_average']]

original_title    The Shawshank Redemption
vote_average                         8.714
Name: 0, dtype: object

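The highest-rated movie in the dataset is The Shawshank Redemption, with a score of 8.714. I used idxmax() to get the row label of the maximum value in the vote_average column, then passed that label to .loc to pull out the title and score. If several movies shared the top score, idxmax() would return only the first of them.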
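Since idxmax() returns a single row label, a tie for the top score would go unnoticed. The sketch below shows one way to list every movie that shares the maximum; the `toy` DataFrame is invented for illustration and only reuses the dataset's column names.

```python
import pandas as pd

# Hypothetical toy data with a deliberate tie at the top.
toy = pd.DataFrame({
    "original_title": ["A", "B", "C"],
    "vote_average": [8.7, 8.7, 7.9],
})

# idxmax() reports only the first row label that holds the maximum.
first_label = toy["vote_average"].idxmax()

# A boolean mask keeps every row whose rating equals the maximum.
top_rated = toy[toy["vote_average"] == toy["vote_average"].max()]
tied_titles = top_rated["original_title"].tolist()

print(first_label, tied_titles)
```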

In [9]:
df['release_year'] = pd.to_datetime(df['release_date'], errors='coerce').dt.year
df['decade'] = (df['release_year'] // 10) * 10
df['decade'].value_counts().sort_index()



decade
1900.0       2
1910.0       4
1920.0      26
1930.0      45
1940.0      87
1950.0     156
1960.0     249
1970.0     367
1980.0     712
1990.0    1157
2000.0    2127
2010.0    3508
2020.0    1558
Name: count, dtype: int64

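This assigns each movie to a decade and counts how many movies in the dataset were released in each one. The counts rise every decade through the 2010s; the 2020s count is lower, presumably because that decade is still in progress. The decade labels show up as floats (1900.0 instead of 1900) because errors='coerce' turned at least one unparseable release date into a missing value, which makes the year column floating-point. Movies with a missing date are left out of the counts.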
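If whole-number decade labels are preferred, one option is pandas' nullable integer dtype. This is a sketch under assumptions: the `toy` DataFrame and its dates are invented, and casting to `"Int64"` is just one way to keep integers alongside missing values.

```python
import pandas as pd

# Hypothetical toy data; the third entry cannot be parsed as a date.
toy = pd.DataFrame({
    "release_date": ["1994-09-23", "2008-07-16", "not a date", "2001-12-19"],
})

# Unparseable dates become NaT, so the year column contains a missing value.
years = pd.to_datetime(toy["release_date"], errors="coerce").dt.year

# "Int64" (capital I) is the nullable integer dtype: it stores whole
# numbers and still allows missing entries.
decades = (years.astype("Int64") // 10) * 10

# value_counts() skips missing values by default.
counts = decades.value_counts().sort_index()
print(counts)
```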

In [10]:
df['genre'].value_counts().idxmax()


'Drama'

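This tells me which value of the genre column appears most often: 'Drama'. Many rows list several genres in one comma-separated string (for example, "Animation, Drama, War"), and value_counts() treats each distinct string as its own category. So this result counts movies labeled only as Drama, not every movie that includes Drama among its genres.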
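To count individual genres rather than whole genre strings, the strings can be split and exploded into one row per genre. The sketch below assumes genres are separated by commas, as in the dataset's output, and uses an invented `toy` DataFrame.

```python
import pandas as pd

# Hypothetical toy data mixing single and multi-genre labels.
toy = pd.DataFrame({
    "genre": ["Drama", "Drama, Crime", "Comedy", "Crime, Drama, Thriller", "Comedy"],
})

# Counting whole strings: "Comedy" wins because it appears twice verbatim.
most_common_string = toy["genre"].value_counts().idxmax()

# Split each string on commas, give every genre its own row,
# and strip the leftover spaces around each name.
individual = toy["genre"].str.split(",").explode().str.strip()
genre_counts = individual.value_counts()
most_common_genre = genre_counts.idxmax()

print(most_common_string, most_common_genre)
```

In this toy example the two approaches disagree: the most common string is "Comedy", while the most common individual genre is "Drama".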

In [11]:
df.groupby('genre')['vote_average'].mean().sort_values(ascending=False)


genre
Thriller, Crime, Comedy                                 8.487
Animation, Drama, War                                   8.446
Fantasy, Animation, Adventure                           8.390
Adventure, Fantasy, Animation                           8.324
Animation, Science Fiction, Family, Adventure, Drama    8.317
                                                        ...  
Comedy, Music, War                                      5.385
Family, Comedy, Action, Adventure, Fantasy              5.383
Comedy, Family, Adventure, Romance                      5.378
Action, Fantasy, Comedy                                 5.351
Crime, Drama, Action, Thriller, Science Fiction         5.346
Name: vote_average, Length: 2162, dtype: float64

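This compares the average rating for each value of the genre column. Because that column stores comma-separated combinations, the result has 2,162 groups (one per distinct combination) rather than one row per individual genre. Many of these combinations probably cover only a few movies, so the averages at the top and bottom of the list should be read with caution.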
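A per-genre average can be sketched by exploding the genre strings first, so that a movie contributes its rating to each genre it lists. The `toy` DataFrame is invented for illustration, and reporting the group size next to the mean is a design choice of this sketch, not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy data, not taken from the real dataset.
toy = pd.DataFrame({
    "genre": ["Drama", "Drama, Crime", "Comedy", "Crime, Thriller"],
    "vote_average": [8.0, 7.0, 6.0, 5.0],
})

per_genre = (
    toy.assign(genre=toy["genre"].str.split(","))       # string -> list of genres
       .explode("genre", ignore_index=True)             # one row per (movie, genre)
       .assign(genre=lambda d: d["genre"].str.strip())  # drop stray spaces
       .groupby("genre")["vote_average"]
       .agg(["mean", "count"])                          # average and group size
       .sort_values("mean", ascending=False)
)

print(per_genre)
```

The count column shows how many movies stand behind each average, which makes it easier to spot genres whose mean rests on only one or two films.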