# Object-Oriented Programming - Practice

### Write a Standard Scaler from Scratch!

In this notebook, we walk through the process of coding SKLearn's `StandardScaler` class from scratch.

In [None]:
# Run this cell unchanged
# Import assignment packages
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Load data
df = pd.read_csv('data/auto-mpg.csv')
# Let's only use the numeric columns, so drop car name
df = df.drop(columns='car name')

# Output preview of data
df.head(2)

**Let's set up a train-test split for our dataset**

In [None]:
# Run this cell unchanged
X = df.drop(columns = 'mpg')
y = df['mpg']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2022)

## Your Task
1. Write `fit` and `transform` functions ***outside*** of a class

    - We want to get the code working _before_ we throw it into a class
    
    
2. Create a `StandardScaler` class with `fit` and `transform` methods

    - We will need to add the `self` variable during this step
    
    
Then we'll compare our results with SKLearn's!

### Step 1.1: `fit`

In the cell below, let's define a function called `fit`. 

**This function should receive 1 argument**
1. `X` - A pandas dataframe or numpy array

**This function should execute the following steps:**
1. Convert `X` to a numpy array by passing the input into `np.array`.
2. Loop over the columns of the numpy array.
3. For each column, calculate the mean and standard deviation.
4. Store the statistics in the `container` list as a tuple with the following format:
```python
(mean, standard_deviation)
```

**This function should not return anything.** It only appends to the `container` list defined outside the function.
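
As a hint for the loop and the statistics, here is a minimal sketch on made-up toy data (the array `toy` and its values are assumptions for illustration, not part of the dataset). Iterating over the transpose of a 2-D array yields one column at a time. Note that `np.std` defaults to the population standard deviation (`ddof=0`), which is the estimator SKLearn's `StandardScaler` uses, while pandas' `Series.std` defaults to `ddof=1`.

```python
import numpy as np

# Toy data (hypothetical values): 3 rows, 2 columns
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0]])

# Iterating over the transpose yields one column per iteration
for column in toy.T:
    print(column, column.mean(), column.std())

# The two common standard deviation conventions give different numbers
population_std = np.std(toy[:, 0])       # ddof=0, NumPy's default
sample_std = np.std(toy[:, 0], ddof=1)   # ddof=1, pandas' default
print(population_std, sample_std)
```

If the results later disagree with SKLearn's by a small constant factor, a mismatch in `ddof` is a likely cause.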

In [None]:
container = []

def fit(X):
    # Convert X to a numpy array by passing the input into np.array
    
    # Loop over the columns of the numpy array.
    
        # For each column, calculate the mean and standard deviation
        
        # Store the statistics AS A TUPLE in the container
        

### Step 1.2: `transform`

Below, we define a function called `transform`. 

**This function should receive 1 argument**
1. `X` - A pandas dataframe or numpy array

**This function should execute the following steps:**
1. Convert `X` to a numpy array by passing the input into `np.array`.
2. Loop over the columns of X
3. Access the mean and standard deviation that were created from the `fit` function and stored in the container variable.
4. Subtract the mean from the column and divide by the standard deviation.
5. Return the transformed version of X
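
To make step 4 concrete, here is a small sketch of the arithmetic on a single made-up column (the names `toy_column`, `toy_mean`, `toy_std`, and `toy_scaled` are hypothetical). After subtracting the mean and dividing by the standard deviation, the column should have a mean of about 0 and a standard deviation of about 1.

```python
import numpy as np

# One toy column (hypothetical values)
toy_column = np.array([2.0, 4.0, 6.0])

# Statistics that would normally come from the fit step
toy_mean = toy_column.mean()
toy_std = toy_column.std()

# Standardize: subtract the mean, then divide by the standard deviation
toy_scaled = (toy_column - toy_mean) / toy_std

print(toy_scaled)
print(toy_scaled.mean(), toy_scaled.std())
```

In `transform`, the mean and standard deviation come from the container filled by `fit` rather than being recomputed, so that test data is scaled with the training statistics.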

In [None]:
def transform(X):
    # Convert X to a numpy array by passing the input into np.array
    
    # Loop over the columns of X
    
        # Access the mean and standard deviation that were 
        # created from the fit function and stored in the container variable.
        
        # Subtract the mean from the column 
        # and divide by the standard deviation.
       
    # Return the transformed version of X  
    

In [None]:
# Run this cell to try it
container = []
fit(X_train)
X_train_scaled = transform(X_train)
X_train_scaled[:5]

## Step 2: Move our code into a `StandardScaler` class!

Adjust the two functions you wrote above to work as class methods.

Note that you'll want to add the `self` argument to each function, before the `X` argument it already takes. 

You'll also want to save the container as a `container` attribute of your class.
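
The general pattern of moving state from a global variable onto `self` can be sketched with a different, simpler transformer. The class `MeanCenterer` below is a hypothetical toy that only subtracts column means; it is not the scaler this exercise asks for, but it shows how a value learned in `fit` can be stored as an attribute and read back in `transform`.

```python
import numpy as np

class MeanCenterer:
    """Hypothetical toy class: subtracts each column's mean."""

    def fit(self, X):
        X = np.array(X)
        # State learned in fit lives on the instance, not in a global
        self.means = [column.mean() for column in X.T]

    def transform(self, X):
        # Work on a float copy so the caller's data is left untouched
        X = np.array(X, dtype=float)
        for i, mean in enumerate(self.means):
            X[:, i] = X[:, i] - mean
        return X

centerer = MeanCenterer()
centerer.fit([[1, 10], [3, 30]])
centered = centerer.transform([[1, 10], [3, 30]])
print(centered)
```

Because each instance carries its own attributes, two scalers fitted on different data do not overwrite each other's statistics, which a shared global container cannot guarantee.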

In [None]:
class StandardScaler:
    
    def fit():
        
    def transform():
        

## Now, compare our results with SKLearn's scaler!

Run the cells below **without changes** to check your work!

In [None]:
from sklearn.preprocessing import StandardScaler as SklearnScaler

In [None]:
# Create an instance of our scaler
our_scaler = StandardScaler()
our_scaler.fit(X_train)

In [None]:
# Create an instance of sklearn's scaler
sk_scaler = SklearnScaler()
sk_scaler.fit(X_train)

In [None]:
# Scale the training data with both scalers
our_scaled_train = our_scaler.transform(X_train)
sk_scaled_train = sk_scaler.transform(X_train)

# Scale the test data with both scalers
our_scaled_test = our_scaler.transform(X_test)
sk_scaled_test = sk_scaler.transform(X_test)

In [None]:
# Check if our scaled train matches sklearn's (within floating-point tolerance)
np.allclose(our_scaled_train, sk_scaled_train)

In [None]:
# Check if our scaled test matches sklearn's (within floating-point tolerance)
np.allclose(our_scaled_test, sk_scaled_test)
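
A note on comparing floating-point arrays: `==` demands bit-for-bit equality, so two correct computations that round differently can report `False`, whereas `np.allclose` compares within a tolerance. A small illustration with toy values (the names `a`, `b`, `exact_match`, and `close_match` are hypothetical):

```python
import numpy as np

# 0.1 + 0.2 is not exactly 0.3 in binary floating point
a = np.array([0.1 + 0.2])
b = np.array([0.3])

exact_match = np.all(a == b)      # strict, bit-for-bit comparison
close_match = np.allclose(a, b)   # comparison within a small tolerance

print(exact_match, close_match)
```

If a comparison fails, inspecting `np.abs(our_scaled_train - sk_scaled_train).max()` can help tell a rounding difference (a value near machine precision) from a real bug.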