# **Lab: Model Serving**



## Exercise 3: Sklearn Pipeline

The steps are:
1.   Setup Repository
2.   Load Dataset
3.   Build Pipeline
4.   Push Changes


### 1. Setup Repository

**[1.1]** Go to the folder you created previously

In [None]:
# Placeholder for student's code (1 command line)

In [None]:
# Solution:
cd ~/Projects/adv_mla_2025/adv_mla_lab_3

**[1.2]** Create a new git branch called `pipelines`

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git checkout -b pipelines

**[1.3]** Navigate to the `notebooks` folder and create a new Jupyter notebook called `2_pipelines.ipynb`

### 2.   Load Dataset

**[2.1]** Run the magic commands that automatically reload modules

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
%load_ext autoreload
%autoreload 2

**[2.2]** Import the pandas and numpy packages

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
import pandas as pd
import numpy as np

**[2.3]** Load the data into a DataFrame called `df`


In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
df = pd.read_csv('../data/raw/archive.zip')

**[2.4]** Create a copy of `df` and save it into a variable called `df_cleaned`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
df_cleaned = df.copy()

### 3. Build Pipeline

**[3.1]** Import `Pipeline` from `sklearn.pipeline`, `StandardScaler`, `OrdinalEncoder` and `OneHotEncoder` from `sklearn.preprocessing` and `SGDClassifier` from `sklearn.linear_model`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder, OrdinalEncoder
from sklearn.linear_model import SGDClassifier

**[3.2]** Create a `Pipeline` called `num_transformer` with one step that contains `StandardScaler`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
num_transformer = Pipeline(
    steps=[
        ('scaler', StandardScaler())
    ]
)

**[3.3]** Create a `Pipeline` called `cat_transformer` with one step that contains `OneHotEncoder`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
cat_transformer = Pipeline(
    steps=[
        ('one_hot_encoder', OneHotEncoder(sparse_output=False, drop='first'))
    ]
)

**[3.4]** Create a `Pipeline` called `age_ord_transformer` with one step that contains `OrdinalEncoder`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
age_ord_transformer = Pipeline(
    steps=[
        ('ordinal_encoder', OrdinalEncoder())
    ]
)

**[3.5]** Create a `Pipeline` called `health_ord_transformer` with one step that contains `OrdinalEncoder`, with the `General_Health` levels ordered from `Poor` to `Excellent`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
health_ord_transformer = Pipeline(
    steps=[
        ('ordinal_encoder', OrdinalEncoder(categories=[['Poor','Fair','Good','Very Good','Excellent']]))
    ]
)

**[3.6]** Create a `Pipeline` called `checkup_ord_transformer` with one step that contains `OrdinalEncoder`, with the `Checkup` levels ordered from `Within the past year` to `Never`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
checkup_ord_transformer = Pipeline(
    steps=[
        ('ordinal_encoder', OrdinalEncoder(categories=[['Within the past year','Within the past 2 years','Within the past 5 years','5 or more years ago','Never']]))
    ]
)
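
The `categories` argument used in the two encoders above fixes which integer each level receives. Without it, `OrdinalEncoder` infers the levels from the data and sorts them, which for strings means alphabetical order. The following sketch contrasts the two behaviours on a toy column that reuses the `General_Health` levels; the row order is arbitrary.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy column reusing the General_Health levels, in arbitrary row order
X = np.array([['Good'], ['Poor'], ['Excellent'], ['Fair'], ['Very Good']], dtype=object)

# Default behaviour: levels are inferred and sorted alphabetically
auto_enc = OrdinalEncoder()
auto_codes = auto_enc.fit_transform(X).ravel()

# Explicit order: codes follow the position in `levels`
levels = ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']
ordered_enc = OrdinalEncoder(categories=[levels])
ordered_codes = ordered_enc.fit_transform(X).ravel()

print(auto_codes)     # 'Excellent' gets code 0 under alphabetical sorting
print(ordered_codes)  # 'Poor' gets code 0, 'Excellent' gets code 4
```

Alphabetical sorting happens to work for age bands such as `18-24`, `25-29`, which is presumably why `age_ord_transformer` needs no explicit list, but it would scramble the health and checkup scales.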

**[3.7]** Create a list called `num_cols` containing the names of the numeric columns

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
num_cols = df_cleaned.select_dtypes(include=['float64']).columns

**[3.8]** Create a list called `cat_cols` containing the names of the categorical columns

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
cat_cols = ['Arthritis', 'Depression', 'Diabetes', 'Exercise', 'Other_Cancer', 'Sex', 'Skin_Cancer', 'Smoking_History']

**[3.9]** Import `ColumnTransformer` from `sklearn.compose`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
from sklearn.compose import ColumnTransformer

**[3.10]** Create a `ColumnTransformer` called `preprocessor` with the following transformers:
- `num_transformer` for `num_cols`
- `cat_transformer` for `cat_cols`
- `age_ord_transformer` for `Age_Category`
- `health_ord_transformer` for `General_Health`
- `checkup_ord_transformer` for `Checkup`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
preprocessor = ColumnTransformer(
    transformers=[
        ('num_cols', num_transformer, num_cols),
        ('cat_cols', cat_transformer, cat_cols),
        ('age_col', age_ord_transformer, ['Age_Category']),
        ('health_col', health_ord_transformer, ['General_Health']),
        ('checkup_col', checkup_ord_transformer, ['Checkup'])
    ]
)

**[3.11]** Create a `Pipeline` called `sgd_pipe` with 2 steps: `preprocessor`, followed by an `SGDClassifier` instantiated with the same parameters as previously

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
sgd_pipe = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('sgd', SGDClassifier(loss='log_loss', penalty='elasticnet', random_state=42))
    ]
)

**[3.12]** Fit `sgd_pipe` on `df_cleaned`, using `Heart_Disease` (mapped to 1/0) as the target

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
target = df_cleaned['Heart_Disease'].map({'Yes': 1, 'No': 0})
sgd_pipe.fit(df_cleaned, target)

**[3.13]** Make predictions on `df_cleaned`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
sgd_pipe.predict(df_cleaned)

**[3.14]** Transform the first observation of `df_cleaned` into a DataFrame called `obs` and make a prediction on it

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
obs = pd.DataFrame(df_cleaned.iloc[0]).transpose()
sgd_pipe.predict(obs)
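
A one-row DataFrame can be built either by transposing the row Series, as in this step, or by indexing with a list, `iloc[[0]]`. The two give the same shape but different column dtypes, which can matter when a served model receives single observations. The sketch below uses an invented two-column frame to show the difference.

```python
import pandas as pd

# Toy frame: invented values with one numeric and one string column
toy = pd.DataFrame({'BMI': [22.0, 31.5], 'Sex': ['Female', 'Male']})

# Row Series holds mixed values, so every column becomes object dtype
via_transpose = pd.DataFrame(toy.iloc[0]).transpose()

# List indexing returns a DataFrame directly and keeps each column's dtype
via_list_index = toy.iloc[[0]]

print(via_transpose.dtypes)
print(via_list_index.dtypes)
```

scikit-learn transformers such as `StandardScaler` generally coerce object columns holding numbers back to floats, so both forms are expected to work with the pipeline; `iloc[[0]]` simply avoids relying on that coercion.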

**[3.15]** Import `dump` from the `joblib` package and save `sgd_pipe` into the `models` folder

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution:
from joblib import dump

dump(sgd_pipe, '../models/sgd_pipeline.joblib')

### 4.   Push Changes

**[4.1]** Add your changes to the git staging area

In [None]:
# Placeholder for student's code (1 command line)

In [None]:
# Solution:
git add .

**[4.2]** Create the snapshot of your repository and add a description

In [None]:
# Placeholder for student's code (1 command line)

In [None]:
# Solution:
git commit -m "sgd pipeline"

**[4.3]** Push your snapshot to GitHub

In [None]:
# Placeholder for student's code (1 command line)

In [None]:
# Solution:
git push --set-upstream origin pipelines

**[4.4]** Go to GitHub, review the code, resolve any conflicts and merge the branch




**[4.5]** Check out the master branch

In [None]:
# Placeholder for student's code (1 command line)

In [None]:
# Solution:
git checkout master

**[4.6]** Pull the latest updates

In [None]:
# Placeholder for student's code (1 command line)

In [None]:
# Solution:
git pull