# Q1: Baseline Analysis and Random Forest Regression

In this notebook, we investigate whether **console** and **genre**
are informative predictors of **global video game sales**.
We start with simple baseline averages before training a Random Forest model.


## Start by importing the dependencies

In [None]:
import sys
from pathlib import Path

ROOT = Path().resolve().parent
sys.path.append(str(ROOT / "py"))

DATA_PATH = ROOT / "data" / "VideoGames_Sales.xlsx"

## Importing the functions from functions.py

In [None]:
from functions import (
    load_sales_data,
    filter_consoles_and_genres,
    mean_by_console,
    mean_by_genre,
    mean_by_console_genre,
    encode_features,
    train_random_forest,
    evaluate_regression,
    predict_all_combinations,
)

## Loading and filtering the dataset

In [None]:
df = load_sales_data(DATA_PATH)
df_filtered = filter_consoles_and_genres(df)

The **mean** sales per console, per genre, and per console-genre pair serve as an empirical baseline for the ML approach.

In [None]:
print(mean_by_console(df_filtered))
print(mean_by_genre(df_filtered))
display(mean_by_console_genre(df_filtered).head(12))
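
The baseline helpers live in `functions.py` and are not shown here. As a rough illustration of what such group means look like, here is a minimal sketch using a pandas `groupby`. The toy data, the column names (`Console`, `Genre`, `Global_Sales`), and the helper `mean_sales_by` are assumptions made for this example, not the notebook's actual implementation.

```python
import pandas as pd

# Toy data; the column names are assumptions, not taken from the real dataset.
toy = pd.DataFrame({
    "Console": ["PS2", "PS2", "Wii", "Wii"],
    "Genre": ["Action", "Sports", "Action", "Sports"],
    "Global_Sales": [1.0, 3.0, 2.0, 6.0],
})


def mean_sales_by(frame, keys, target="Global_Sales"):
    """Hypothetical helper: mean of `target` per group, highest first."""
    return (
        frame.groupby(keys)[target]
        .mean()
        .sort_values(ascending=False)
        .reset_index()
    )


by_console = mean_sales_by(toy, ["Console"])
by_genre = mean_sales_by(toy, ["Genre"])
by_pair = mean_sales_by(toy, ["Console", "Genre"])

print(by_console)
print(by_genre)
print(by_pair)
```

Any model trained later should at least match these per-group averages; if it does not, the features add little beyond what a lookup table already provides.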

## Encoding the data and training the model

In [None]:
x_encoded, y = encode_features(df_filtered)
model, x_train, x_test, y_train, y_test = train_random_forest(x_encoded, y)
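
For readers without access to `functions.py`, the sketch below shows one common way to carry out this step: one-hot encode the two categorical columns, hold out a test split, and fit a Random Forest regressor. It assumes scikit-learn and `pd.get_dummies`; the toy data, column names, and hyperparameters are illustrative assumptions and may differ from what `encode_features` and `train_random_forest` actually do.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Toy data; the column names are assumptions for illustration only.
toy = pd.DataFrame({
    "Console": ["PS2", "PS2", "Wii", "Wii", "X360", "X360", "PS2", "Wii"],
    "Genre": ["Action", "Sports", "Action", "Sports",
              "Action", "Sports", "Racing", "Racing"],
    "Global_Sales": [1.0, 3.0, 2.0, 6.0, 1.5, 2.5, 0.8, 4.0],
})

# One-hot encode the categorical predictors (one column per category value).
features = pd.get_dummies(toy[["Console", "Genre"]])
target = toy["Global_Sales"]

# Hold out a quarter of the rows for evaluation; fixed seed for repeatability.
x_tr, x_te, y_tr, y_te = train_test_split(
    features, target, test_size=0.25, random_state=0
)

# Hyperparameters here are arbitrary choices for the sketch.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(x_tr, y_tr)

print(features.columns.tolist())
print(forest.predict(x_te))
```

One-hot encoding is used in this sketch because console and genre have no natural order, so integer labels would suggest a ranking that does not exist.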


## Evaluating the model

In [None]:
metrics = evaluate_regression(model, x_test, y_test)
print(metrics)
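
The exact metrics returned by `evaluate_regression` are not shown in this notebook. A minimal sketch of a typical regression report, assuming scikit-learn's metric functions and a hypothetical helper named `regression_report`, could look like this:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Hypothetical helper: return MAE, RMSE and R^2 as a plain dict."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "rmse": float(np.sqrt(mse)),
        "r2": float(r2_score(y_true, y_pred)),
    }


# Made-up values, used only to exercise the helper.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

report = regression_report(y_true, y_pred)
print(report)
```

MAE and RMSE are in the same units as the sales figures, while R² indicates how much of the variance the model explains relative to always predicting the mean.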

## Using the model to answer the question

In [None]:
top_combos = predict_all_combinations(model, df_filtered, x_encoded.columns)
top_combos.head(12)
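
The call above passes `x_encoded.columns` to the helper, which suggests that the prediction grid is aligned to the training columns. The sketch below illustrates that idea under stated assumptions: it builds every console-genre pair, one-hot encodes the grid, reindexes it to the training columns, and ranks the predictions. The toy data, column names, and model settings are hypothetical and not the notebook's actual `predict_all_combinations`.

```python
from itertools import product

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy data; the column names are assumptions for illustration only.
toy = pd.DataFrame({
    "Console": ["PS2", "PS2", "Wii", "Wii", "X360", "X360", "PS2", "Wii"],
    "Genre": ["Action", "Sports", "Action", "Sports",
              "Action", "Sports", "Racing", "Racing"],
    "Global_Sales": [1.0, 3.0, 2.0, 6.0, 1.5, 2.5, 0.8, 4.0],
})

train_x = pd.get_dummies(toy[["Console", "Genre"]])
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(train_x, toy["Global_Sales"])

# Cartesian product of every console with every genre.
consoles = sorted(toy["Console"].unique())
genres = sorted(toy["Genre"].unique())
grid = pd.DataFrame(list(product(consoles, genres)), columns=["Console", "Genre"])

# Encode the grid and align it to the training columns, so the model sees
# the same features in the same order as during fitting.
grid_x = pd.get_dummies(grid).reindex(columns=train_x.columns, fill_value=0)

grid["Predicted_Sales"] = forest.predict(grid_x)
ranked = grid.sort_values("Predicted_Sales", ascending=False).reset_index(drop=True)
print(ranked)
```

The grid includes pairs that never occur in the training data (here, X360 with Racing), so predictions for such pairs are extrapolations and should be read with caution.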

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")
