## Imports

In [None]:
import pandas as pd

from scipy.stats import uniform

from sklearn import set_config; set_config(display='diagram') # to visualize pipeline

from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC # C-Support Vector Classification


# Tuning Pipeline

Consider the following dataset.

In [None]:
df = pd.read_csv('data/nba_2.csv')
df.head()

- Each observation represents a player
- Each column represents a characteristic of a player's performance

The target `target_5y` indicates whether the player's professional career lasted less than 5 years (`0`) or 5 years or more (`1`).

In [None]:
X = df.drop(columns="target_5y")
y = df['target_5y']

## Pipeline

Let's start with a simple pipeline.

In [None]:
# Preprocessing pipe
preprocessor = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaling', MinMaxScaler())
])

# Final pipe
pipe = Pipeline([
    ('preprocessing', preprocessor),
    ('model_svm', SVC())
])

pipe

## Fine-Tuning

Our task is to assist in the recruitment process of promising young players.  
The model should **limit false alarms (false positives) as much as possible** to avoid recruiting players who will flop.
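A false alarm here is a player predicted to last 5 years or more (`1`) who actually does not. One metric that penalizes exactly this kind of error is precision, TP / (TP + FP). The sketch below computes it on made-up labels (the `y_true` and `y_pred` values are illustrative, not taken from the dataset).

```python
from sklearn.metrics import precision_score

# Illustrative labels only: 1 = lasted 5+ years, 0 = did not
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

# Two true positives (positions 0 and 2) and one false positive (position 1)
precision = precision_score(y_true, y_pred)
print(precision)
```

A higher precision means fewer recruited players turn out to be false alarms, at the possible cost of missing some good players.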

In [None]:
# Visualize the parameters of the pipeline
pipe.get_params()
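The keys returned by `get_params()` follow the `<step>__<parameter>` convention, with nested pipelines chaining the step names. As a small sketch, the pipeline is rebuilt here under the hypothetical name `pipe_demo` so that the snippet runs on its own:

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Same structure as the pipeline above, rebuilt so this cell is self-contained
pipe_demo = Pipeline([
    ('preprocessing', Pipeline([
        ('imputer', SimpleImputer(strategy='mean')),
        ('scaling', MinMaxScaler()),
    ])),
    ('model_svm', SVC()),
])

# Step names are joined to parameter names with double underscores
param_names = list(pipe_demo.get_params().keys())
tunable = [
    'preprocessing__imputer__strategy',
    'model_svm__kernel',
    'model_svm__C',
]
print([name for name in tunable if name in param_names])

# The same names can be used to set a parameter directly
pipe_demo.set_params(model_svm__C=0.5)
```

These are the names a search object expects as keys of its parameter grid or distributions.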

❓ **>>>** **Fine-tune this pipeline to maximize your objective**

- Use the `scoring` metric appropriate for the task
- Run a randomized search for the optimal
    - imputing `strategy`
    - `kernel`
    - regularization factor `C`
- Store your random search results in a `search` variable
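One possible shape for such a search is sketched below. It is not the reference solution: it runs on synthetic data from `make_classification` (with a few values blanked out) instead of the NBA dataset, and the candidate values, the `uniform(0.1, 10)` range for `C`, `n_iter=10`, and `cv=5` are all assumptions chosen for illustration. Precision is used as the `scoring` metric on the assumption that limiting false alarms means limiting false positives.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for X and y, with ~5% missing values for the imputer
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
rng = np.random.default_rng(0)
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan

pipe_demo = Pipeline([
    ('preprocessing', Pipeline([
        ('imputer', SimpleImputer(strategy='mean')),
        ('scaling', MinMaxScaler()),
    ])),
    ('model_svm', SVC()),
])

# Lists are sampled uniformly; scipy distributions are sampled via .rvs()
param_distributions = {
    'preprocessing__imputer__strategy': ['mean', 'median', 'most_frequent'],
    'model_svm__kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
    'model_svm__C': uniform(0.1, 10),  # C drawn from [0.1, 10.1]
}

search = RandomizedSearchCV(
    pipe_demo,
    param_distributions,
    n_iter=10,            # number of sampled parameter combinations
    scoring='precision',  # penalizes false positives
    cv=5,
    random_state=0,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(search.best_score_)
```

With the real data, `X_demo` and `y_demo` would be replaced by `X` and `y`, and `pipe_demo` by `pipe`.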

In [None]:
# Code here!



## Export

Once you have built your optimal pipeline, export it as a pickle file.
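As a rough sketch of the export step, the snippet below pickles a small fitted pipeline and loads it back. The file name `pipeline.pkl`, the temporary directory, and the synthetic data are assumptions for illustration; in this notebook the object to export would typically be `search.best_estimator_`.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Small fitted pipeline standing in for the tuned one
X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=0)
fitted_pipe = Pipeline([
    ('scaling', MinMaxScaler()),
    ('model_svm', SVC()),
]).fit(X_demo, y_demo)

# Hypothetical output path; any writable location works
path = os.path.join(tempfile.gettempdir(), 'pipeline.pkl')

# Pickle files must be opened in binary mode
with open(path, 'wb') as f:
    pickle.dump(fitted_pipe, f)

# Reload and check that predictions are unchanged
with open(path, 'rb') as f:
    loaded_pipe = pickle.load(f)

same_predictions = (loaded_pipe.predict(X_demo) == fitted_pipe.predict(X_demo)).all()
print(same_predictions)
```

A pickled pipeline should be loaded with the same scikit-learn version that created it, and only from sources you trust.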

In [None]:
import pickle

# Code here!

