<a href="https://colab.research.google.com/github/rhodes-byu/cs180-winter25/blob/main/notebooks/13-encoding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [35]:
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.datasets import fetch_openml
import seaborn as sns
import pandas as pd
import numpy as np

## Transforming Continuous (Numeric) Features

#### Standardization
Standardization is the process of scaling features to have a mean of 0 and a standard deviation of 1. The formula for standardization is:

$$ z = \frac{x - \mu}{\sigma} $$

where:
- $z$ is the standardized value  
- $x$ is the original value  
- $\mu$ is the mean of the feature  
- $\sigma$ is the standard deviation of the feature  
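
As a quick check on the formula, the sketch below standardizes a small made-up array by hand with NumPy. The values in `x` are illustrative only, and `np.std` is assumed to use its default population standard deviation (`ddof=0`).

```python
import numpy as np

# Illustrative values (not from the notebook's data)
x = np.array([2.0, 4.0, 6.0, 8.0])

mu = x.mean()          # mean of the feature
sigma = x.std()        # population standard deviation (ddof=0)
z = (x - mu) / sigma   # apply z = (x - mu) / sigma element-wise

# After standardization the mean is ~0 and the standard deviation is ~1
print(z.mean(), z.std())
```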

#### Normalization
Normalization is the process of scaling features to a range of [0, 1]. The formula for normalization is:

$$ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} $$

where:
- $x'$ is the normalized value  
- $x$ is the original value  
- $x_{\min}$ is the minimum value of the feature  
- $x_{\max}$ is the maximum value of the feature  
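
The same formula can be applied by hand. This is a minimal sketch on made-up values, chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Illustrative values (not from the notebook's data)
x = np.array([10.0, 15.0, 20.0, 30.0])

# x' = (x - x_min) / (x_max - x_min)
x_norm = (x - x.min()) / (x.max() - x.min())

# The minimum maps to 0, the maximum to 1, and 15 maps to (15 - 10) / 20 = 0.25
print(x_norm)
```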


## Normalization vs. Standardization

### **Use Normalization (Scaling to [0, 1] or [-1, 1]) When:**
- **Bounded Data**: Features have a fixed range (e.g., pixel values [0, 255]).
- **Deep Learning**: Neural networks often train more reliably when inputs are small and on a common scale.
- **Distance-Based Models**: k-NN, K-Means, and clustering methods rely on consistent feature scales.
- **Non-Gaussian Data**: Works even when data isn't normally distributed.
- **Interpretability**: A scaled value reads as a fraction of the feature's observed range (0 = minimum, 1 = maximum).

### **Use Standardization (Zero Mean, Unit Variance) When:**
- **Gaussian-Like Data**: Ideal for normally distributed features.
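- **Linear Models & PCA**: Regularized regression, SVMs, and PCA are sensitive to feature scale and usually work better with standardized inputs.
- **Outlier Robustness**: Less distorted by extreme values than min-max normalization, although the mean and standard deviation are still affected by outliers.
- **Different Units**: Useful when features have varying scales (e.g., income vs. age).
- **Optimization Stability**: Gradient-based optimizers (SGD, Adam) tend to converge faster on standardized features.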

### **Key Takeaways:**
- **Normalization**: Best for bounded data, deep learning, and distance-based models.
- **Standardization**: Best for Gaussian-like data, linear models, and handling different units.
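
To make the outlier point concrete, here is a small sketch comparing the two scalers on a made-up feature with one extreme value. The numbers are illustrative. The takeaway is hedged: min-max scaling squeezes the typical values into a narrow band near 0, and standardization is also affected, just less severely in terms of the output range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustrative feature with a single extreme value (100)
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0]).reshape(-1, 1)

x_minmax = MinMaxScaler().fit_transform(x).ravel()
x_std = StandardScaler().fit_transform(x).ravel()

# Min-max: the four typical values land in roughly [0, 0.03]
print(np.round(x_minmax, 3))
# Standardization: the typical values are all negative and close together
print(np.round(x_std, 3))
```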

### Sklearn Scaling / Normalizing

#### Scaling

In [36]:
X = np.random.normal(loc = 10, scale = 3, size = 1000)

In [37]:
np.mean(X), np.std(X)

(np.float64(9.956321861119374), np.float64(2.934210348380183))

In [38]:
scaler = StandardScaler()

# Note: Sklearn expects a 2D array (samples x features); reshape(-1, 1) turns the 1D array into a single column
X_scaled = scaler.fit_transform(X.reshape(-1, 1))

In [39]:
np.mean(X_scaled), np.std(X_scaled)

(np.float64(-1.438849039914203e-16), np.float64(0.9999999999999999))

#### Normalizing

In [40]:
normalizer = MinMaxScaler()

X_normalized = normalizer.fit_transform(X.reshape(-1, 1))

In [41]:
np.min(X_normalized), np.max(X_normalized)

(np.float64(0.0), np.float64(0.9999999999999999))

### Pandas Scaling

In [42]:
df = sns.load_dataset('iris')

In [43]:
# Sklearn's StandardScaler returns a NumPy array, not a DataFrame
scaler = StandardScaler()
scaler.fit_transform(df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']])

array([[-9.00681170e-01,  1.01900435e+00, -1.34022653e+00,
        -1.31544430e+00],
       [-1.14301691e+00, -1.31979479e-01, -1.34022653e+00,
        -1.31544430e+00],
       [-1.38535265e+00,  3.28414053e-01, -1.39706395e+00,
        -1.31544430e+00],
       [-1.50652052e+00,  9.82172869e-02, -1.28338910e+00,
        -1.31544430e+00],
       [-1.02184904e+00,  1.24920112e+00, -1.34022653e+00,
        -1.31544430e+00],
       [-5.37177559e-01,  1.93979142e+00, -1.16971425e+00,
        -1.05217993e+00],
       [-1.50652052e+00,  7.88807586e-01, -1.34022653e+00,
        -1.18381211e+00],
       [-1.02184904e+00,  7.88807586e-01, -1.28338910e+00,
        -1.31544430e+00],
       [-1.74885626e+00, -3.62176246e-01, -1.34022653e+00,
        -1.31544430e+00],
       [-1.14301691e+00,  9.82172869e-02, -1.28338910e+00,
        -1.44707648e+00],
       [-5.37177559e-01,  1.47939788e+00, -1.28338910e+00,
        -1.31544430e+00],
       [-1.26418478e+00,  7.88807586e-01, -1.22655167e+00,
      

In [44]:
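# Pandas apply keeps the result as a DataFrame; only float columns are standardized
# Note: pandas .std() uses the sample standard deviation (ddof=1), so the values
# differ slightly from StandardScaler, which uses ddof=0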
df_standardized = df.apply(lambda x: (x - x.mean()) / x.std() if x.dtype == 'float64' else x)
df_standardized.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|--------------|-------------|--------------|-------------|---------|
| 0 | -0.897674 | 1.015602 | -1.335752 | -1.311052 | setosa |
| 1 | -1.1392 | -0.131539 | -1.335752 | -1.311052 | setosa |
| 2 | -1.380727 | 0.327318 | -1.392399 | -1.311052 | setosa |
| 3 | -1.50149 | 0.097889 | -1.279104 | -1.311052 | setosa |
| 4 | -1.018437 | 1.24503 | -1.335752 | -1.311052 | setosa |


In [45]:
# Pandas normalization
df_normalized = df.apply(lambda x: (x - x.min()) / (x.max() - x.min()) if x.dtype == 'float64' else x)
df_normalized.head()

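|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|--------------|-------------|--------------|-------------|---------|
| 0 | 0.222222 | 0.625 | 0.067797 | 0.041667 | setosa |
| 1 | 0.166667 | 0.416667 | 0.067797 | 0.041667 | setosa |
| 2 | 0.111111 | 0.5 | 0.050847 | 0.041667 | setosa |
| 3 | 0.083333 | 0.458333 | 0.084746 | 0.041667 | setosa |
| 4 | 0.194444 | 0.666667 | 0.067797 | 0.041667 | setosa |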


## Processing Categorical Features

### Label Encoding

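Label encoding maps each category to an integer. It is typically used to encode the labels (targets) when they are categorical.

`LabelEncoder` from `sklearn.preprocessing` maps categories (e.g., strings) to integer values, assigning codes in sorted order of the category values.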
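
A minimal sketch of the mapping in both directions, using made-up labels. `classes_` and `inverse_transform` are part of the `LabelEncoder` API; the label list itself is only illustrative.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["dog", "cat", "fish", "cat"]  # illustrative labels

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

print(encoder.classes_)                  # categories in sorted order
print(codes)                             # integer code for each label
print(encoder.inverse_transform(codes))  # back to the original strings
```

Keeping the fitted encoder around is what makes it possible to turn model predictions back into readable category names.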


### One-Hot Encoding

One-hot encoding splits a single categorical feature (e.g., one with the values `['cat', 'dog', 'fish']`) into one binary column per category. Each row has a 1 in the column for its category and a 0 in all the others.

For example, an animal column with values `['cat', 'dog', 'fish', 'cat']` would map to:

| cat | dog | fish |
|-----|-----|------|
|  1  |  0  |  0   |
|  0  |  1  |  0   |
|  0  |  0  |  1   |
|  1  |  0  |  0   |
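
The table above can be reproduced with `pd.get_dummies`. This is a small sketch; the `animals` DataFrame is made up for illustration, and `dtype=int` is used so the columns show 0/1 rather than booleans.

```python
import pandas as pd

# Illustrative column matching the example above
animals = pd.DataFrame({"animal": ["cat", "dog", "fish", "cat"]})

# One binary column per category; dtype=int gives 0/1 instead of True/False
one_hot = pd.get_dummies(animals["animal"], dtype=int)
print(one_hot)
```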




### Sklearn Encoding

#### Label Encoding

In [50]:
y_str = ['zebra', 'dog', 'cat', 'fish', 'dog', 'cat', 'fish']

label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y_str)

In [51]:
print(y_encoded)

[3 1 0 2 1 0 2]


#### One Hot Encoding

In [52]:
one_hot_encoder = OneHotEncoder(sparse_output = False)
y_one_hot = one_hot_encoder.fit_transform(y_encoded.reshape(-1, 1))

In [53]:
print(y_one_hot)

[[0. 0. 0. 1.]
 [0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 1. 0.]]


### Pandas Encoding

#### Label Encoding

In [62]:
# Load in the titanic dataset
data = fetch_openml(data_id=40945, parser = 'auto')
titanic = data.frame
titanic.drop(['body', 'boat', 'name', 'ticket', 'home.dest', 'cabin'], axis = 1, inplace = True)
titanic.dropna(inplace = True)

In [63]:
titanic.head()

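|   | pclass | survived | sex | age | sibsp | parch | fare | embarked |
|---|--------|----------|-----|-----|-------|-------|------|----------|
| 0 | 1 | 1 | female | 29.0 | 0 | 0 | 211.3375 | S |
| 1 | 1 | 1 | male | 0.9167 | 1 | 2 | 151.55 | S |
| 2 | 1 | 0 | female | 2.0 | 1 | 2 | 151.55 | S |
| 3 | 1 | 0 | male | 30.0 | 1 | 2 | 151.55 | S |
| 4 | 1 | 0 | female | 25.0 | 1 | 2 | 151.55 | S |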


In [64]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1043 entries, 0 to 1308
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   pclass    1043 non-null   int64   
 1   survived  1043 non-null   category
 2   sex       1043 non-null   category
 3   age       1043 non-null   float64 
 4   sibsp     1043 non-null   int64   
 5   parch     1043 non-null   int64   
 6   fare      1043 non-null   float64 
 7   embarked  1043 non-null   category
dtypes: category(3), float64(2), int64(3)
memory usage: 52.3 KB


In [65]:
titanic_encoded = titanic.apply(lambda x: pd.Categorical(x).codes if x.dtype == 'category' else x)

In [66]:
titanic_encoded.head()

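|   | pclass | survived | sex | age | sibsp | parch | fare | embarked |
|---|--------|----------|-----|-----|-------|-------|------|----------|
| 0 | 1 | 1 | 0 | 29.0 | 0 | 0 | 211.3375 | 2 |
| 1 | 1 | 1 | 1 | 0.9167 | 1 | 2 | 151.55 | 2 |
| 2 | 1 | 0 | 0 | 2.0 | 1 | 2 | 151.55 | 2 |
| 3 | 1 | 0 | 1 | 30.0 | 1 | 2 | 151.55 | 2 |
| 4 | 1 | 0 | 0 | 25.0 | 1 | 2 | 151.55 | 2 |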


#### One-Hot Encoding

In [67]:
titanic_one_hot = pd.get_dummies(titanic)

In [68]:
titanic_one_hot.head()

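|   | pclass | age | sibsp | parch | fare | survived_0 | survived_1 | sex_female | sex_male | embarked_C | embarked_Q | embarked_S |
|---|--------|-----|-------|-------|------|------------|------------|------------|----------|------------|------------|------------|
| 0 | 1 | 29.0 | 0 | 0 | 211.3375 | False | True | True | False | False | False | True |
| 1 | 1 | 0.9167 | 1 | 2 | 151.55 | False | True | False | True | False | False | True |
| 2 | 1 | 2.0 | 1 | 2 | 151.55 | True | False | True | False | False | False | True |
| 3 | 1 | 30.0 | 1 | 2 | 151.55 | True | False | False | True | False | False | True |
| 4 | 1 | 25.0 | 1 | 2 | 151.55 | True | False | True | False | False | False | True |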


# **In-Class Activity: Predicting Obesity Levels from Eating Habits and Physical Condition**

In this activity, you will work with a dataset designed to predict obesity levels based on various eating habits and physical conditions. Your goal is to preprocess the data, experiment with different encoding strategies, and compare classification models.

---

## **Review the Dataset**
Before beginning, take some time to familiarize yourself with the dataset and its features. Feature descriptions can be found [here](https://archive.ics.uci.edu/dataset/544/estimation+of+obesity+levels+based+on+eating+habits+and+physical+condition).

Consider the following as you review the dataset:
- What types of features are present? *(Numerical, ordinal, categorical?)*  
- How should these features be encoded for use in machine learning models?
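
For ordinal features, where the categories have a natural order, one option is scikit-learn's `OrdinalEncoder` with an explicit category order. The sketch below uses a hypothetical `frequency` column; the order passed to `categories` is an assumption made for illustration and should be checked against the dataset's feature descriptions.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordinal column (not loaded from the obesity data)
toy = pd.DataFrame({"frequency": ["Sometimes", "no", "Always", "Frequently"]})

# Assumed ordering from lowest to highest
order = [["no", "Sometimes", "Frequently", "Always"]]

ordinal_encoder = OrdinalEncoder(categories=order)
toy["frequency_encoded"] = ordinal_encoder.fit_transform(toy[["frequency"]]).ravel()
print(toy)
```

Unlike `LabelEncoder`, which assigns codes alphabetically, this keeps the integer codes aligned with the intended order.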

---

## **Data Preprocessing**
- **Encoding:** Decide how to encode categorical and ordinal variables appropriately.
- **Splitting:** Divide the dataset into **80% training** and **20% testing** using `train_test_split` from `sklearn.model_selection` with `test_size=0.2`.
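
A minimal sketch of an 80/20 split with `train_test_split`. The arrays here are placeholders rather than the obesity data, and `random_state=42` and `stratify` are optional choices made for this example (stratifying keeps the class proportions the same in both sets).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples, 3 features, 4 balanced classes
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.repeat([0, 1, 2, 3], 25)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_train.shape, X_test.shape)  # 80 training rows, 20 test rows
```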


## **Model Training & Cross-Validation**
- Apply **cross-validation** on the training set to fine-tune hyperparameters and evaluate model performance.
- Compare the results of **$k$-Nearest Neighbors (k-NN) and Logistic Regression** using cross-validation scores.
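
One way to set up this comparison is sketched below. It uses the iris data as a stand-in for the encoded obesity features, and the pipeline steps and settings (`n_neighbors=5`, `max_iter=1000`, 5 folds) are assumptions chosen for illustration, not prescribed values.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (iris); replace with the encoded obesity features and labels
X_demo, y_demo = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Scaling lives inside each pipeline, so it is refit on every training fold
models = {
    "k-NN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# 5-fold cross-validation on the training set only; the test set is not used
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```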

### **Evaluation:**
1. Compare the models based on accuracy.
2. Consider hyperparameter tuning for both models:
   - For **k-NN**, experiment with different values of k, metrics, and weighting.
   - For **Logistic Regression**, consider trying different penalties. (View the documentation [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).)
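
The k-NN tuning described above could be sketched with `GridSearchCV`. Again, iris stands in for the training data, and the grid values are arbitrary examples rather than recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the training split of the obesity data
X_demo, y_demo = load_iris(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Example grid over k, the weighting scheme, and the distance metric
param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7, 9],
    "knn__weights": ["uniform", "distance"],
    "knn__metric": ["euclidean", "manhattan"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))  # mean cross-validated accuracy of the best setting
```

The same pattern applies to Logistic Regression by swapping the estimator and searching over, for example, the regularization strength `C`.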

## **Wait Before Testing!**
🚨 **Do NOT evaluate your model on the test set until instructed to do so!** 🚨  

- The test set should remain **unseen** throughout training and validation.
- We will use it **only once** to assess the final model’s performance.
- Keep track of your cross-validation results to decide which model to use for final testing.

### **Why is this important?**
Evaluating too early on the test set can lead to **data leakage** and **overfitting**, giving misleading performance estimates. The test set should serve as a final, unbiased evaluation of the model.
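
The same care applies to preprocessing: statistics such as the mean and standard deviation should be learned from the training data only. The sketch below uses placeholder data to show the pattern of fitting the scaler on the training split and reusing it on the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder features (not the obesity data)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=10, scale=3, size=(200, 2))

X_train, X_test = train_test_split(X_demo, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics on the test rows

print(X_train_scaled.mean(axis=0))  # approximately 0
print(X_test_scaled.mean(axis=0))   # close to 0, but generally not exactly 0
```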




In [61]:
# Here is the data:
df = pd.read_csv('https://raw.githubusercontent.com/rhodes-byu/cs180-winter25/refs/heads/main/data/obesity.csv')
df.head()

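|   | NObeyesdad | Gender | Age | Height | family_history_with_overweight | FAVC | FCVC | NCP | CAEC | SMOKE | CH2O | SCC | FAF | TUE | CALC | MTRANS |
|---|------------|--------|-----|--------|--------------------------------|------|------|-----|------|-------|------|-----|-----|-----|------|--------|
| 0 | Normal_Weight | Female | 21.0 | 1.62 | yes | no | 2.0 | 3.0 | Sometimes | no | 2.0 | no | 0.0 | 1.0 | no | Public_Transportation |
| 1 | Normal_Weight | Female | 21.0 | 1.52 | yes | no | 3.0 | 3.0 | Sometimes | yes | 3.0 | yes | 3.0 | 0.0 | Sometimes | Public_Transportation |
| 2 | Normal_Weight | Male | 23.0 | 1.8 | yes | no | 2.0 | 3.0 | Sometimes | no | 2.0 | no | 2.0 | 1.0 | Frequently | Public_Transportation |
| 3 | Overweight_Level_I | Male | 27.0 | 1.8 | no | no | 3.0 | 3.0 | Sometimes | no | 2.0 | no | 2.0 | 0.0 | Frequently | Walking |
| 4 | Overweight_Level_II | Male | 22.0 | 1.78 | no | no | 2.0 | 1.0 | Sometimes | no | 2.0 | no | 0.0 | 0.0 | Sometimes | Public_Transportation |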


In [76]:
# Step 1: Import required libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 2: Generate a synthetic dataset for predicting obesity levels
np.random.seed(42)
data_size = 500

# Continuous Features
age = np.random.randint(10, 70, data_size)  # Ages 10 to 69 (randint's upper bound is exclusive)
height = np.random.uniform(1.4, 2.0, data_size)  # Height in meters
weight = np.random.uniform(40, 120, data_size)  # Weight in kg
caloric_intake = np.random.randint(1200, 4000, data_size)  # Daily caloric intake
exercise_hours = np.random.uniform(0, 3, data_size)  # Hours of exercise per day

# Categorical Features
gender = np.random.choice(["Male", "Female"], data_size)
eating_frequency = np.random.choice(["Low", "Medium", "High"], data_size)
fast_food_intake = np.random.choice(["Never", "Rarely", "Often", "Always"], data_size)

# Target variable: Obesity Level
# Note: labels are drawn at random, independent of the features, so accuracy near 0.25 (chance for 4 classes) is expected
obesity_levels = np.random.choice(["Underweight", "Normal", "Overweight", "Obese"], data_size)

# Create DataFrame
df_obesity = pd.DataFrame({
    "Age": age,
    "Height": height,
    "Weight": weight,
    "Caloric_Intake": caloric_intake,
    "Exercise_Hours": exercise_hours,
    "Gender": gender,
    "Eating_Frequency": eating_frequency,
    "Fast_Food_Intake": fast_food_intake,
    "Obesity_Level": obesity_levels
})

# Step 3: Data Preprocessing
# Encode categorical variables using Label Encoding
# Note: LabelEncoder assigns codes alphabetically, so the natural order of ordinal features is not preserved
label_enc = LabelEncoder()
df_obesity["Gender"] = label_enc.fit_transform(df_obesity["Gender"])  # Female=0, Male=1
df_obesity["Eating_Frequency"] = label_enc.fit_transform(df_obesity["Eating_Frequency"])  # High=0, Low=1, Medium=2
df_obesity["Fast_Food_Intake"] = label_enc.fit_transform(df_obesity["Fast_Food_Intake"])  # Always=0, Never=1, Often=2, Rarely=3
df_obesity["Obesity_Level"] = label_enc.fit_transform(df_obesity["Obesity_Level"])  # Encode target variable

# Step 4: Standardization
scaler_standard = StandardScaler()

# Standardize numerical features
# Note: the scaler is fit on all rows before the split; fitting on the training rows only would avoid leakage
df_obesity[["Age", "Height", "Weight", "Caloric_Intake", "Exercise_Hours"]] = scaler_standard.fit_transform(
    df_obesity[["Age", "Height", "Weight", "Caloric_Intake", "Exercise_Hours"]]
)

# Step 5: Train-Test Split
X = df_obesity.drop(columns=["Obesity_Level"])
y = df_obesity["Obesity_Level"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 6: Model Training & Prediction
# Train a Decision Tree Classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Step 7: Evaluation
accuracy = accuracy_score(y_test, y_pred)

# Display results
print(f"Model Accuracy: {accuracy:.2f}")


Model Accuracy: 0.26


In [73]:
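numeric_features = df.select_dtypes(include='number').columns  # numeric columns of the obesity data
df_scaled = df.copy()  # Make a copy to avoid modifying the original
df_scaled[numeric_features] = StandardScaler().fit_transform(df[numeric_features])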

In [74]:
categorical_features = ['Gender', 'family_history_with_overweight', 'FAVC', 'CAEC', 'SMOKE', 'SCC', 'CALC', 'MTRANS']

In [75]:
scaler = StandardScaler()
# If X is the 1D array from the scaling example above, passing it directly raises
# "ValueError: Expected 2D array, got 1D array instead", so reshape it to a single column first.
X_scaled = scaler.fit_transform(X.reshape(-1, 1))

In [None]:
minmax_scaler = MinMaxScaler()
X_normalized = minmax_scaler.fit_transform(X.reshape(-1, 1))  # reshape assumes X is the 1D array from the scaling example

In [None]:
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

In [None]:
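one_hot_encoder = OneHotEncoder(sparse_output = False)  # `sparse` was renamed to `sparse_output` in scikit-learn 1.2
X_categorical = df[categorical_features]  # assumes the obesity DataFrame and the column list defined above
X_encoded = one_hot_encoder.fit_transform(X_categorical)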