### Scikit-learn

`MinMaxScaler` is a preprocessing tool in the scikit-learn library used for feature scaling. It rescales each feature of a dataset to a fixed range, which is [0, 1] by default and can be changed with the `feature_range` parameter. This process is known as min-max normalization or min-max scaling. Each feature (column) is transformed independently so that its minimum value becomes 0 and its maximum value becomes 1, using the following formula:

$$\text{X}_{\text{scaled}} = \frac{\text{X} - \text{X}_{\text{min}}}{\text{X}_{\text{max}} - \text{X}_{\text{min}}}$$
Where:
- $\text{X}$ is the original feature value.
- $\text{X}_{\text{min}}$ is the minimum value of the feature.
- $\text{X}_{\text{max}}$ is the maximum value of the feature.
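As a quick check of the formula, here is a minimal NumPy sketch that applies it column by column and compares the result with `MinMaxScaler`. The helper `min_max_scale` and the sample array are made up for illustration. Unlike scikit-learn, which guards against constant columns internally, this sketch would divide by zero for a column where $\text{X}_{\text{max}} = \text{X}_{\text{min}}$.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def min_max_scale(X):
    """Hypothetical helper: apply (X - X_min) / (X_max - X_min) to each column."""
    X = np.asarray(X, dtype=float)
    X_min = X.min(axis=0)  # per-feature (column) minimum
    X_max = X.max(axis=0)  # per-feature (column) maximum
    # No guard for constant columns: X_max - X_min == 0 would divide by zero
    return (X - X_min) / (X_max - X_min)

# Made-up data: two features on different scales
X = np.array([[1.0, 200.0],
              [3.0, 400.0],
              [5.0, 1000.0]])

manual = min_max_scale(X)
library = MinMaxScaler().fit_transform(X)
print(manual)
print(np.allclose(manual, library))  # True: both follow the same formula
```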

#### Example

Suppose a dataset has an `Age` feature whose values range from 20 to 60.
Applying `MinMaxScaler` to this feature transforms the values as follows:
- The minimum value (20) will be scaled to 0.
- The maximum value (60) will be scaled to 1.
- Any value in between will be scaled proportionally.

For example, the value 40 is scaled to (40 - 20) / (60 - 20) = 0.5.
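The `Age` example can be reproduced directly. The ages below are hypothetical; note that scikit-learn expects 2-D input with one column per feature, so a single feature is passed as a column vector. The last lines assume a target range of [-1, 1] via `feature_range`, to show that the values are first mapped to [0, 1] and then shifted linearly into the requested range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical ages; MinMaxScaler expects 2-D input (rows = samples, columns = features)
ages = np.array([[20], [30], [40], [50], [60]])

scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaled_ages = scaler.fit_transform(ages)
print(scaled_ages.ravel())  # 20 -> 0, 40 -> 0.5, 60 -> 1

# The fitted scaler remembers min=20 and max=60, so it can scale a single new value
print(scaler.transform([[40]]))  # 40 -> 0.5

# A different target range: values are scaled to [0, 1], then shifted into [-1, 1]
sym_scaler = MinMaxScaler(feature_range=(-1, 1))
sym_ages = sym_scaler.fit_transform(ages)
print(sym_ages.ravel())  # 20 -> -1, 40 -> 0, 60 -> 1
```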

#### Why Use `MinMaxScaler`?
1. Normalization: It puts all features on a similar scale, which can improve the performance of many machine learning algorithms, especially those that rely on distance calculations (e.g., k-nearest neighbors, support vector machines).
2. Avoiding Dominance: Features with larger numeric ranges (for example, income in the tens of thousands versus age in the tens) can dominate distance calculations, so the model effectively ignores the small-range features. Scaling mitigates this issue.
3. Consistency: Every scaled feature lies in the same bounded range, which makes features easier to compare side by side.
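To make the "avoiding dominance" point concrete, here is a small sketch with made-up [age, income] rows. It finds the nearest neighbor of the first row by Euclidean distance, before and after scaling; `nearest_to_first` is a hypothetical helper written only for this example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up [age, income] rows for illustration
X = np.array([[20,  50000],   # query point
              [60,  52000],   # very different age, similar income
              [22,  60000],   # similar age, somewhat different income
              [40, 150000]],  # extra row that widens the income range
             dtype=float)

def nearest_to_first(points):
    """Hypothetical helper: index of the row closest to row 0 (Euclidean), excluding row 0."""
    distances = np.linalg.norm(points[1:] - points[0], axis=1)
    return int(np.argmin(distances)) + 1

# Raw data: income differences swamp the 40-year age gap, so row 1 looks closest
print(nearest_to_first(X))  # 1

# After scaling, both features contribute comparably and row 2 becomes closest
X_scaled = MinMaxScaler().fit_transform(X)
print(nearest_to_first(X_scaled))  # 2
```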

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data: two features on very different scales
data = np.array([[20, 50000],
                 [30, 60000],
                 [40, 70000],
                 [50, 80000],
                 [60, 90000]])

# Create a MinMaxScaler object (default feature_range=(0, 1))
scaler = MinMaxScaler()

# Fit the scaler to the data (learn each column's min and max) and transform it
scaled_data = scaler.fit_transform(data)

print("Original Data:\n", data)
print("Scaled Data:\n", scaled_data)
```


```
Original Data:
 [[   20 50000]
 [   30 60000]
 [   40 70000]
 [   50 80000]
 [   60 90000]]
Scaled Data:
 [[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [0.75 0.75]
 [1.   1.  ]]
```
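The scaler is a fitted object, not a one-off function. In this minimal sketch, which reuses the same sample data, `fit` stores each column's minimum and maximum in the `data_min_` and `data_max_` attributes. `transform` then reuses those stored values, so rows outside the fitted range map outside [0, 1] under the default `clip=False`. Finally, `inverse_transform` undoes the scaling. The new row `[70, 40000]` is invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[20, 50000],
                 [30, 60000],
                 [40, 70000],
                 [50, 80000],
                 [60, 90000]])

scaler = MinMaxScaler().fit(data)

# fit() stores the per-column minimum and maximum it saw
print(scaler.data_min_)  # per-column minimum: 20 and 50000
print(scaler.data_max_)  # per-column maximum: 60 and 90000

# transform() reuses the stored values, so unseen data can fall outside [0, 1]
new_rows = np.array([[70, 40000]])
scaled_new = scaler.transform(new_rows)
print(scaled_new)  # age 70 -> 1.25, income 40000 -> -0.25

# inverse_transform() maps scaled values back to the original units
restored = scaler.inverse_transform(scaler.transform(data))
print(np.allclose(restored, data))  # True
```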


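```python
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame(data, columns=['X', 'Y'])
scaled_df = pd.DataFrame(scaled_data, columns=['X_scaled', 'Y_scaled'])

# Combine original and scaled data for plotting
combined_df = pd.concat([df, scaled_df], axis=1)

# Plot original (left) and scaled (right) values side by side with seaborn
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.scatterplot(data=combined_df, x='X', y='Y', color='blue', ax=axes[0])
axes[0].set_title('Original data')
sns.scatterplot(data=combined_df, x='X_scaled', y='Y_scaled', color='blue', ax=axes[1])
axes[1].set_title('Scaled data')
plt.tight_layout()
plt.show()
```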