# 🎬 Actor Performance Analysis of IMDb 2024 Statistics

This notebook demonstrates how to use the `actor_analysis` package to analyze and visualize the performance of the top 50 actors based on IMDb movie data.  
We compute metrics such as average budget, revenue, vote average, and profitability, both for the whole dataset and per actor, using IMDb 2024 data.

The dataset can be found here: https://www.kaggle.com/datasets/anandshaw2001/imdb-movies-and-tv-shows/data

We derived new statistics from this dataset and stored them in new columns, explained below:
- Actor: The actor's name.
- Budget_Mean_General: The mean of all movie budgets in the processed dataset.
- Revenue_Mean_General: The mean of all movie revenues in the processed dataset.
- Vote_Average_General: The mean of all vote averages in the processed dataset.
- Profit_Mean_General: The mean profit across all movies in the processed dataset, computed as (General Mean Revenue - General Mean Budget) / Size of the Dataset.
- Budget_Mean_Actor: The mean budget of the actor's movies in the processed dataset.
- Revenue_Mean_Actor: The mean revenue of the actor's movies in the processed dataset.
- Vote_Average_Actor: The mean vote average of the actor's movies in the processed dataset.
- Profit_Mean_Actor: The mean profit of the actor's movies in the processed dataset.
- isProfitable: Whether the actor in this row is more profitable than the general mean.
- isLiked: Whether the actor in this row has a higher vote average than the general mean.
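
To make the column definitions concrete, here is a minimal pandas sketch on toy data. It is not the `actor_analysis` implementation: the input column names (`budget`, `revenue`, `vote_average`, `cast`), the helper `actor_row`, and the numbers are all assumptions for illustration. For simplicity it takes profit as revenue minus budget per movie and then averages it, which may differ from the formula the package uses.

```python
import pandas as pd

# Toy data; column names and values are assumptions, not the real dataset schema.
movies = pd.DataFrame({
    "budget": [100.0, 50.0, 200.0, 10.0],
    "revenue": [300.0, 40.0, 500.0, 30.0],
    "vote_average": [7.0, 5.0, 8.0, 6.0],
    "cast": [["A", "B"], ["B"], ["A", "C"], ["C"]],
})
# Assumed definition of profit: revenue minus budget, per movie.
movies["profit"] = movies["revenue"] - movies["budget"]

# Dataset-wide ("general") means.
general = {
    "Budget_Mean_General": movies["budget"].mean(),
    "Revenue_Mean_General": movies["revenue"].mean(),
    "Vote_Average_General": movies["vote_average"].mean(),
    "Profit_Mean_General": movies["profit"].mean(),
}

def actor_row(actor):
    """Hypothetical helper: per-actor means plus the two comparison flags."""
    subset = movies[movies["cast"].apply(lambda names: actor in names)]
    profit_mean = subset["profit"].mean()
    vote_mean = subset["vote_average"].mean()
    return {
        "Actor": actor,
        "Budget_Mean_Actor": subset["budget"].mean(),
        "Revenue_Mean_Actor": subset["revenue"].mean(),
        "Vote_Average_Actor": vote_mean,
        "Profit_Mean_Actor": profit_mean,
        "isProfitable": bool(profit_mean > general["Profit_Mean_General"]),
        "isLiked": bool(vote_mean > general["Vote_Average_General"]),
    }

stats = pd.DataFrame([actor_row(a) for a in ["A", "B", "C"]])
print(stats[["Actor", "Profit_Mean_Actor", "isProfitable", "isLiked"]])
```

In this toy example an actor is flagged only when their own mean is strictly above the general mean, so an actor exactly at the baseline would count as neither profitable nor liked.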

In [None]:
# If not installed yet
!pip3 install -e .

In [None]:
from actor_analysis import (
    load_data,
    preprocess_data,
    extract_cast_list,
    calculate_general_statistics,
    get_top_actors,
    calculate_actor_statistics,
    plot_actor_statistics
)

### Step 1: Load and Clean the IMDb Dataset

In [None]:
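# Load the raw CSV, then clean it with the package's preprocessing step
df_raw = load_data('IMDb2024.csv')
df = preprocess_data(df_raw)
df.head()  # preview the first rows; df.size counts cells, not rows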

### Step 2: Extract Actor Names and Analyze Top 50

In [None]:
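# Collect actor names and keep the top 50
cast_list = extract_cast_list(df)
top_actors = get_top_actors(cast_list, top_n=50)

# Compute dataset-wide baselines, then per-actor statistics against them
general_stats = calculate_general_statistics(df)
actor_stats_df = calculate_actor_statistics(df, top_actors, general_stats)

actor_stats_df.head(len(actor_stats_df))  # show every actor row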
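
The notebook does not show how `get_top_actors` ranks actors. As a rough sketch only, assuming "top" means the actors credited in the most movies and that each cast entry is a comma-separated string, the selection could look like the following. The function name `top_actors_sketch` and the sample strings are hypothetical.

```python
from collections import Counter

# Hypothetical cast strings; the real dataset's format may differ.
cast_column = [
    "Pedro Pascal, Lupita Nyong'o",
    "Pedro Pascal, Seth Rogen",
    "Seth Rogen",
    "Pedro Pascal",
]

def top_actors_sketch(cast_strings, top_n):
    """Return the top_n most frequently credited names (assumed ranking rule)."""
    counts = Counter()
    for cast in cast_strings:
        # Use a set so a name repeated within one movie is counted once.
        names = {name.strip() for name in cast.split(",") if name.strip()}
        counts.update(names)
    return [name for name, _ in counts.most_common(top_n)]

print(top_actors_sketch(cast_column, top_n=2))
```

If the package ranks by a different criterion, such as revenue or rating, the resulting list of 50 actors would differ accordingly.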

### Step 3: Visualize Actor Profitability and Popularity

In [None]:
plot_actor_statistics(actor_stats_df, 'Profit_Mean_Actor')

In [None]:
plot_actor_statistics(actor_stats_df, 'Vote_Average_Actor')

### Summary & Insights

#### Profitability

Of the top 50 actors, 45 (90%) are more profitable than the general movie average; the remaining 5 (10%) are not.

Some of the most profitable actors include:
- James Austin Johnson (~$763M profit)
- Kyle MacLachlan (~$763M profit)
- Rachel House (~$666M profit)
- Seth Rogen (~$475M profit)

#### Popularity (Based on Vote Average)

40 of the top 50 actors (80%) have a vote average above the general vote average of 5.93, and top actors often reach 6.50 or more.

Actors with the highest average ratings include:
- Timothée Chalamet (7.50)
- Emma Corrin (7.00)
- Pedro Pascal (7.00)
- Lupita Nyong'o (7.00)

#### Balanced Stars (Profitable & Liked)

Actors such as Pedro Pascal, Timothée Chalamet, Lupita Nyong'o, Kyle MacLachlan, and James Austin Johnson stand out as both highly profitable and well liked.
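
The percentages and the "balanced" list in this summary can be recomputed from the `isProfitable` and `isLiked` columns. The sketch below uses a small made-up frame named `sample` as a stand-in for `actor_stats_df`; its values are illustrative and are not the notebook's results.

```python
import pandas as pd

# Hypothetical stand-in for actor_stats_df with made-up values.
sample = pd.DataFrame({
    "Actor": ["A", "B", "C", "D"],
    "isProfitable": [True, True, False, True],
    "isLiked": [True, False, False, True],
})

# The mean of a boolean column is the fraction of True values.
profitable_share = sample["isProfitable"].mean() * 100
liked_share = sample["isLiked"].mean() * 100

# Actors that satisfy both flags at once.
balanced = sample[sample["isProfitable"] & sample["isLiked"]]["Actor"].tolist()

print(f"{profitable_share:.0f}% profitable, {liked_share:.0f}% liked")
print("Profitable and liked:", balanced)
```

Running the same three expressions on the real `actor_stats_df` should reproduce the shares quoted above, provided the two flag columns hold booleans.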