**Handling imbalanced classes with Upsampling**

https://chrisalbon.com/machine_learning/preprocessing_structured_data/handling_imbalanced_classes_with_upsampling/

**Upsampling: a strategy for handling imbalanced classes by repeatedly sampling, with replacement, from the minority class until it is the same size as the majority class.**

In upsampling, for every observation in the majority class, we randomly select an observation from the minority class with replacement. The result is a dataset with the same number of minority and majority observations.
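The procedure can be sketched on a toy label vector before applying it to Iris. This is a minimal illustration, not the notebook's own code: the helper `upsample_minority` is hypothetical, it assumes a binary problem with a strict minority class, and it uses NumPy's seeded `Generator` API for reproducibility.

```python
import numpy as np

def upsample_minority(y, rng):
    """Hypothetical helper: return row indices that upsample the minority
    class of a binary label vector y to the size of the majority class.
    Assumes the two classes have different sizes."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    i_min = np.flatnonzero(y == minority)
    i_maj = np.flatnonzero(y == majority)
    # One minority draw per majority observation; replace=True lets
    # the same minority row be selected more than once.
    i_min_up = rng.choice(i_min, size=len(i_maj), replace=True)
    return np.concatenate((i_min_up, i_maj))

rng = np.random.default_rng(0)
y_toy = np.array([0, 0, 1, 1, 1, 1, 1, 1])  # 2 minority vs. 6 majority
idx = upsample_minority(y_toy, rng)
print(np.bincount(y_toy[idx]).tolist())  # [6, 6]
```

Returning indices rather than labels keeps the helper usable for a feature matrix too: indexing `X` and `y` with the same `idx` keeps every row paired with its label.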



**Preliminaries**

```python
# Load libraries
import numpy as np
from sklearn.datasets import load_iris
```

**Load Iris Dataset**

```python
# Load Iris data
iris = load_iris()

# Create feature matrix
X = iris.data

# Create target vector
y = iris.target
```

**Make Iris Dataset Imbalanced**

```python
# Remove the first 40 observations (40 of the 50 class 0 flowers)
X = X[40:, :]
y = y[40:]
```


```python
# Create binary target vector: 0 if the observation is class 0, otherwise 1
y = np.where(y == 0, 0, 1)

# Look at the imbalanced target vector
y
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
```
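Instead of reading the imbalance off the printed array, the class sizes can be counted with `np.bincount`. The sketch below rebuilds the steps above so it runs on its own; `y_binary` is a name introduced here for illustration. Because `load_iris` stores its 150 rows in class order (50 per class), dropping the first 40 rows leaves only 10 class 0 observations.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# The full Iris dataset is balanced: 50 observations per class.
print(np.bincount(y).tolist())  # [50, 50, 50]

# Dropping the first 40 rows removes 40 of the 50 class 0 rows.
X, y = X[40:, :], y[40:]
print(np.bincount(y).tolist())  # [10, 50, 50]

# Binarizing (class 0 vs. everything else) gives a 10:100 split.
y_binary = np.where(y == 0, 0, 1)
print(np.bincount(y_binary).tolist())  # [10, 100]
```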

**Upsampling minority class to match majority class**

```python
# Indices of each class's observations
i_class0 = np.where(y == 0)[0]
i_class1 = np.where(y == 1)[0]

# Number of observations in each class
n_class0 = len(i_class0)
n_class1 = len(i_class1)

# For every observation of class 1, randomly sample from class 0 with replacement
i_class0_upsampled = np.random.choice(i_class0, size=n_class1, replace=True)

# Join class 0's upsampled target vector with class 1's target vector
np.concatenate((y[i_class0_upsampled], y[i_class1]))
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
```
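The cell above only upsamples the target vector. To train a model, the feature matrix must be resampled with the *same* indices so that each row keeps its label. The sketch below does this under a few assumptions not in the original: a seeded `np.random.default_rng` generator replaces `np.random.choice` for reproducibility, the names `idx`, `X_upsampled`, and `y_upsampled` are introduced for illustration, and the final shuffle is an optional extra step.

```python
import numpy as np
from sklearn.datasets import load_iris

# Rebuild the imbalanced binary dataset (10 class 0 vs. 100 class 1).
iris = load_iris()
X = iris.data[40:, :]
y = np.where(iris.target[40:] == 0, 0, 1)

i_class0 = np.flatnonzero(y == 0)
i_class1 = np.flatnonzero(y == 1)

# Assumption: a seeded Generator so the draw is reproducible.
rng = np.random.default_rng(seed=0)
i_class0_upsampled = rng.choice(i_class0, size=len(i_class1), replace=True)

# Index X and y with the same index array so rows and labels stay paired.
idx = np.concatenate((i_class0_upsampled, i_class1))
X_upsampled = X[idx]
y_upsampled = y[idx]

# Optional: shuffle so the classes are not stored as two contiguous blocks.
perm = rng.permutation(len(idx))
X_upsampled, y_upsampled = X_upsampled[perm], y_upsampled[perm]

print(X_upsampled.shape)                  # (200, 4)
print(np.bincount(y_upsampled).tolist())  # [100, 100]
```

scikit-learn's `sklearn.utils.resample` (with `replace=True`, `n_samples`, and `random_state`) can resample several arrays together in one call as an alternative. Either way, upsampling is normally applied only to the training split, after the train/test split, so that duplicated minority rows do not leak into the evaluation data.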