# How much should we budget for event sponsorship in 2018?

* [Exploring the Data](#Exploring-the-Data)
* [Creating a Visualization](#Creating-a-Visualization)
* [Modeling the Data](#Modeling-the-Data)

## Exploring the Data

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('data/sponsorship-budget.csv')
df.head()

In [None]:
df.columns = df.columns.str.replace(' ', '_')
df.head()

### Handling Missing Data 

See the package `missingno` at https://github.com/ResidentMario/missingno.

In [None]:
!pip install missingno

In [None]:
import missingno as msno

In [None]:
msno.matrix(df)

Fill in the missing values with a fixed placeholder (85 here). The number is arbitrary rather than derived from the data.

In [None]:
df.number_of_employees = df.number_of_employees.fillna(85)
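A hard-coded fill value ignores what the column actually looks like. As one possible alternative, here is a minimal sketch that fills gaps with the column median instead. The toy frame and its values are made up for illustration and are not taken from `sponsorship-budget.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the values from sponsorship-budget.csv.
toy = pd.DataFrame({'number_of_employees': [80.0, np.nan, 90.0, 100.0]})

# median() skips NaN by default, so it is computed from observed rows only.
median_employees = toy['number_of_employees'].median()
toy['number_of_employees'] = toy['number_of_employees'].fillna(median_employees)
```

Whether the median is a better choice than a constant depends on why the values are missing, so treat this as a starting point rather than a rule.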

In [None]:
df.head()

---

## Creating a Visualization

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
_, ax = plt.subplots(1, 2, figsize=(15, 6))
# Pass x and y by keyword; recent seaborn versions no longer accept them positionally.
sns.regplot(x='revenue', y='sponsor_budget', data=df, ax=ax[0])
sns.regplot(x='number_of_employees', y='sponsor_budget', data=df, ax=ax[1]);

---

## Modeling the Data

We'll try a very simple approach: use each year's data as features to predict the following year's budget. We then fit a linear regression model on those pairs and use it to predict the budget for 2018. Let's have a look at the data again.

In [None]:
df.head()
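The pairing of one year's features with the next year's budget can be sketched on a toy frame. This assumes one row per year from 2014 to 2017; the numbers are hypothetical, and the real file may cover different years.

```python
import pandas as pd

# Hypothetical yearly data; the real file may have different years and values.
toy = pd.DataFrame({
    'year': [2014, 2015, 2016, 2017],
    'revenue': [100, 120, 150, 170],
    'sponsor_budget': [10, 12, 15, 18],
})

# Features come from years up to 2016; targets come from 2015 onward.
X = toy[toy.year <= 2016][['revenue']].reset_index(drop=True)
y = toy[toy.year >= 2015]['sponsor_budget'].reset_index(drop=True)

# Row i now pairs the revenue of one year with the budget of the next year.
pairs = pd.concat([X, y.rename('next_year_budget')], axis=1)
```

The two filters only line up when both selections have the same number of rows, which is worth checking on the real data before fitting.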

### Importing the Model

In [None]:
from sklearn.linear_model import LinearRegression

### Selecting a Set of Features

In [None]:
import numpy as np
from sklearn.model_selection import cross_val_score

In [None]:
def get_rmse(scores):
    mse_scores = -scores
    rmse_scores = np.sqrt(mse_scores)
    return rmse_scores.mean()
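scikit-learn's `neg_mean_squared_error` scorer reports the MSE with its sign flipped so that larger scores are better, which is why `get_rmse` negates the scores before taking the square root. A small worked example with made-up fold scores:

```python
import numpy as np

def get_rmse(scores):
    mse_scores = -scores               # undo the sign flip applied by the scorer
    rmse_scores = np.sqrt(mse_scores)  # per-fold RMSE
    return rmse_scores.mean()          # average across folds

# Hypothetical scores from two folds: MSE of 4 and 16, reported as negatives.
fold_scores = np.array([-4.0, -16.0])
rmse = get_rmse(fold_scores)  # (2 + 4) / 2 = 3.0
```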

In [None]:
y = df[df.year >= 2015]['sponsor_budget']

In [None]:
X = df[df.year <= 2016][['revenue', 'number_of_employees']]
lr = LinearRegression()
scores = cross_val_score(lr, X, y, cv=2, scoring='neg_mean_squared_error')
get_rmse(scores)

In [None]:
X = df[df.year <= 2016][['revenue']]
lr = LinearRegression()
scores = cross_val_score(lr, X, y, cv=2, scoring='neg_mean_squared_error')
get_rmse(scores)

In [None]:
X = df[df.year <= 2016][['number_of_employees']]
lr = LinearRegression()
scores = cross_val_score(lr, X, y, cv=2, scoring='neg_mean_squared_error')
get_rmse(scores)

### Normalizing Data

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler_y = StandardScaler()
y_scaled = scaler_y.fit_transform(y.values.reshape(-1, 1))

In [None]:
X = df[df.year <= 2016][['revenue', 'number_of_employees']]
scaler_x = StandardScaler()
X_scaled = scaler_x.fit_transform(X)

lr = LinearRegression()
scores = cross_val_score(lr, X_scaled, y_scaled, cv=2, scoring='neg_mean_squared_error')
get_rmse(scores)

In [None]:
X = df[df.year <= 2016][['revenue']]
scaler_x = StandardScaler()
X_scaled = scaler_x.fit_transform(X)

lr = LinearRegression()
scores = cross_val_score(lr, X_scaled, y_scaled, cv=2, scoring='neg_mean_squared_error')
get_rmse(scores)

In [None]:
X = df[df.year <= 2016][['number_of_employees']]
scaler_x = StandardScaler()
X_scaled = scaler_x.fit_transform(X)

lr = LinearRegression()
scores = cross_val_score(lr, X_scaled, y_scaled, cv=2, scoring='neg_mean_squared_error')
get_rmse(scores)
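The cells above fit the scalers on all rows before cross-validating, so each validation fold influences the scaling applied to its training fold. One way to avoid that, sketched here on synthetic data rather than the notebook's `df`, is to wrap the scalers and the model together so that they are refit inside each fold. `X_toy` and `y_toy` are hypothetical stand-ins.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for the revenue feature and budget target.
rng = np.random.default_rng(0)
X_toy = rng.uniform(50, 200, size=(8, 1))
y_toy = 0.1 * X_toy[:, 0] + rng.normal(0, 1, size=8)

# The pipeline refits the feature scaler on each training fold, and
# TransformedTargetRegressor does the same for the target scaler.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), LinearRegression()),
    transformer=StandardScaler(),
)
scores = cross_val_score(model, X_toy, y_toy, cv=2, scoring='neg_mean_squared_error')
rmse = np.sqrt(-scores).mean()
```

Because `TransformedTargetRegressor` inverse-transforms its predictions, this RMSE is in the target's original units, whereas the scaled cells above report it in standard deviations of `y`.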

### Creating a Final Model

In [None]:
selected_features = ['revenue']
dependent_variable = ['sponsor_budget']

In [None]:
X = df[df.year <= 2016][selected_features]
y = df[df.year >= 2015][dependent_variable]

In [None]:
y

In [None]:
scaler_x = StandardScaler()
scaler_x.fit(X)
X_scaled = scaler_x.transform(X)

scaler_y = StandardScaler()
scaler_y.fit(y)
y_scaled = scaler_y.transform(y)
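A fitted `StandardScaler` keeps the mean and scale it learned, which is what lets `inverse_transform` turn a standardized prediction back into baht later on. A minimal round trip, using made-up budget figures:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical budgets in baht, shaped (n_samples, 1) as StandardScaler expects.
budgets = np.array([[10_000.0], [20_000.0], [30_000.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(budgets)

# inverse_transform multiplies by the stored scale and adds back the stored mean.
restored = scaler.inverse_transform(scaled)
```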

In [None]:
lr = LinearRegression()
lr.fit(X_scaled, y_scaled)

In [None]:
print(f'Coefficients: {lr.coef_}')
print(f'Independent Term: {lr.intercept_}')

### Making a Prediction

In [None]:
X_test = df[df.year == 2017][selected_features]
X_test.head()

In [None]:
X_test_scaled = scaler_x.transform(X_test)
X_test_scaled

In [None]:
results = lr.predict(X_test_scaled)

In [None]:
print(f"Let's prepare the event sponsorship budget (in 2018) around {scaler_y.inverse_transform(results)} baht!")