# Random Sampling

In this activity, you will use the provided dataset from a bank's telemarketing campaign to compare the effectiveness of random resampling methods. You will fit a random forest to the original training data and to each resampled version of it, then compare each model's recall for the minority class.
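
Recall for the minority class is the fraction of true minority samples that the model labels correctly. The sketch below uses made-up labels (not the activity's data) and assumes class `1` is the minority; it shows why recall can be low even when accuracy looks good.

```python
from sklearn.metrics import recall_score

# Hypothetical labels for illustration only: class 1 is the minority (2 of 10)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Recall = true positives / (true positives + false negatives)
true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
manual_recall = true_pos / (true_pos + false_neg)

# The same quantity computed by scikit-learn
minority_recall = recall_score(y_true, y_pred, pos_label=1)

# Accuracy is dominated by the majority class
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"Minority recall: {minority_recall}, accuracy: {accuracy}")
```

Here the model is right on 8 of 10 samples but finds only one of the two minority samples, which is the kind of gap resampling is meant to narrow.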

## Instructions:

1. Read the CSV file into a Pandas DataFrame.

2. Separate the features `X` from the target `y`.

3. Encode the categorical variables from the features data using [`get_dummies`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html).

4. Separate the data into training and testing subsets.

5. Scale the data using [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html).

**RandomForestClassifier**

6. Create and fit a `RandomForestClassifier` to the **scaled** training data.

7. Make predictions using the scaled testing data.

**Random Undersampler**

8. Import `RandomUnderSampler` from `imblearn`.

9. Fit the random undersampler to the scaled training data.

10. Check the `value_counts` for the resampled target.

11. Create and fit a `RandomForestClassifier` to the **undersampled** training data.

12. Make predictions using the scaled testing data.

13. Generate and compare classification reports for each model.

**Random Oversampler**

14. Import `RandomOverSampler` from `imblearn`.

15. Fit the random oversampler to the scaled training data.

16. Check the `value_counts` for the resampled target.

17. Create and fit a `RandomForestClassifier` to the **oversampled** training data.

18. Make predictions using the scaled testing data.

19. Generate and compare classification reports for each model.

## Prepare the Data

In [None]:
# Import modules
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.preprocessing import StandardScaler

### 1. Read the CSV file into a Pandas DataFrame

In [None]:
# Read the CSV file into a Pandas DataFrame
df = # YOUR CODE HERE

# Review the DataFrame
# YOUR CODE HERE

### 2. Separate the features, `X`, from the target, `y`, data.

In [None]:
# Split the features and target data
y = # YOUR CODE HERE
X = # YOUR CODE HERE
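
As a rough illustration of separating a target column from the features, the sketch below uses a tiny made-up DataFrame. The column names (`age`, `balance`, `outcome`) are hypothetical and are not taken from the activity's CSV file.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration
toy = pd.DataFrame({
    "age": [34, 51, 28],
    "balance": [1200, 350, 90],
    "outcome": ["no", "yes", "no"],
})

# The target is a single column; the features are everything else
y_toy = toy["outcome"]
X_toy = toy.drop(columns="outcome")

print(X_toy.columns.tolist())
```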


### 3. Encode categorical variables with `get_dummies`

In [None]:
# Encode the features dataset's categorical variables using get_dummies
X = # YOUR CODE HERE

# Review the features DataFrame
# YOUR CODE HERE
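
To show what `get_dummies` does, here is a small sketch on a made-up DataFrame (the `age` and `job` columns are hypothetical). Numeric columns pass through unchanged, while each category of a text column becomes its own indicator column.

```python
import pandas as pd

# Hypothetical toy features; not the activity's dataset
toy = pd.DataFrame({
    "age": [34, 51, 28],
    "job": ["admin", "technician", "admin"],
})

# One indicator column is created per distinct value of "job"
encoded = pd.get_dummies(toy)

print(encoded.columns.tolist())
```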


### 4. Split the data into training and testing sets

In [None]:
# Split data
X_train, X_test, y_train, y_test = # YOUR CODE HERE


In [None]:
# Review the distinct values from y
# YOUR CODE HERE
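
With imbalanced targets, it can help to check that the training and testing sets keep similar class proportions. The sketch below uses synthetic data from `make_classification` rather than the bank dataset, and passing `stratify` is an optional choice that the activity does not require.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 90% / 10%), for illustration only
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=5, weights=[0.9, 0.1], random_state=1
)
y_demo = pd.Series(y_demo)

# stratify keeps the class proportions nearly equal in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=1
)

print(y_tr.value_counts(normalize=True))
print(y_te.value_counts(normalize=True))
```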


### 5. Scale the data using `StandardScaler`

In [None]:
# Instantiate a StandardScaler instance
scaler = # YOUR CODE HERE

# Fit the standard scaler to the training data
X_scaler = # YOUR CODE HERE

# Transform the training data using the scaler
X_train_scaled = # YOUR CODE HERE

# Transform the testing data using the scaler
X_test_scaled = # YOUR CODE HERE
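
The key idea in this step is that the scaler learns its mean and standard deviation from the training data only, and those same statistics are then applied to the testing data. A minimal sketch on randomly generated arrays (not the activity's data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Random stand-ins for training and testing features
rng = np.random.default_rng(1)
train = rng.normal(loc=50, scale=10, size=(200, 3))
test = rng.normal(loc=50, scale=10, size=(50, 3))

# Learn the mean and standard deviation from the training data only
demo_scaler = StandardScaler()
demo_scaler.fit(train)

# Apply the training statistics to both subsets
train_scaled = demo_scaler.transform(train)
test_scaled = demo_scaler.transform(test)

print(train_scaled.mean(axis=0).round(3), train_scaled.std(axis=0).round(3))
```

The scaled training columns have mean 0 and standard deviation 1; the scaled testing columns will be close to that but not exact, because they were not used to fit the scaler.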

---

## RandomForestClassifier

### 6. Create and fit a `RandomForestClassifier` to the **scaled** training data.

In [None]:
# Import the RandomForestClassifier from sklearn
from sklearn.ensemble import RandomForestClassifier

# Instantiate a RandomForestClassifier instance
model = # YOUR CODE HERE

# Fit the model to the scaled training data
# YOUR CODE HERE


### 7. Make predictions using the scaled testing data.

In [None]:
# Predict labels for original scaled testing features
y_pred = # YOUR CODE HERE
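
For reference, the general fit-then-predict pattern for a random forest looks like the sketch below. It runs on synthetic imbalanced data, and the `n_estimators` and `random_state` values are arbitrary choices for the example, not settings prescribed by this activity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 90% / 10%), for illustration only
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=1
)

# Fit on the training subset, then predict on the held-out subset
rf = RandomForestClassifier(n_estimators=50, random_state=1)
rf.fit(X_tr, y_tr)
preds = rf.predict(X_te)

# Recall for the minority class (label 1 in this synthetic data)
demo_recall = recall_score(y_te, preds, pos_label=1)
print(f"Minority recall: {demo_recall:.2f}")
```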

---

## Random Undersampler

### 8. Import `RandomUnderSampler` from `imblearn`.

In [None]:
# Import RandomUnderSampler from imblearn
# YOUR CODE HERE

# Instantiate a RandomUnderSampler instance
rus = # YOUR CODE HERE


### 9. Fit the random undersampler to the scaled training data.

In [None]:
# Fit the random undersampler to the scaled training data and resample it
X_undersampled, y_undersampled = # YOUR CODE HERE


### 10. Check the `value_counts` for the undersampled target.

In [None]:
# Count distinct values for the resampled target data
# YOUR CODE HERE


### 11. Create and fit a `RandomForestClassifier` to the **undersampled** training data.

In [None]:
# Instantiate a new RandomForestClassifier model
model_resampled = # YOUR CODE HERE

# Fit the new model to the undersampled data
# YOUR CODE HERE


### 12. Make predictions using the scaled testing data.

In [None]:
# Predict labels for the scaled testing features using the undersampled model
y_pred_undersampled = # YOUR CODE HERE

### 13. Generate and compare classification reports for each model.
  * Print a classification report for the model fitted to the original data
  * Print a classification report for the model fitted to the undersampled data

In [None]:
# Print classification reports
print("Classification Report - Original Data")
print(# YOUR CODE HERE)
print("---------")
print("Classification Report - Undersampled Data")
print(# YOUR CODE HERE)
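
When comparing reports, the row for the minority class is the one to watch. The sketch below runs `classification_report` on made-up labels; the `target_names` values are hypothetical, and `output_dict=True` is an optional way to pull out a single number for comparison.

```python
from sklearn.metrics import classification_report

# Hypothetical labels for illustration only: class 1 is the minority
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Text report with one row per class
report = classification_report(
    y_true, y_pred, target_names=["majority", "minority"]
)
print(report)

# Dictionary form, keyed by the label converted to a string
report_dict = classification_report(y_true, y_pred, output_dict=True)
print("Minority recall:", report_dict["1"]["recall"])
```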
    

---

## Random Oversampler

### 14. Import `RandomOverSampler` from `imblearn`.

In [None]:
# Import RandomOverSampler from imblearn
# YOUR CODE HERE

# Instantiate a RandomOverSampler instance
ros = # YOUR CODE HERE


### 15. Fit the random oversampler to the scaled training data.

In [None]:
# Fit the random oversampler to the scaled training data and resample it
X_oversampled, y_oversampled = # YOUR CODE HERE


### 16. Check the `value_counts` for the resampled target.

In [None]:
# Count distinct values for the oversampled target data
# YOUR CODE HERE
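
Conceptually, random oversampling draws minority-class rows at random, with replacement, until the classes are balanced. The sketch below imitates that idea with plain pandas on a made-up DataFrame; it is only an illustration of the concept, not `imblearn`'s implementation, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: 8 majority rows (0) and 2 minority rows (1)
toy = pd.DataFrame({"feature": range(10), "target": [0] * 8 + [1] * 2})
n_majority = toy["target"].value_counts().max()

parts = []
for label, group in toy.groupby("target"):
    if len(group) < n_majority:
        # Draw duplicates of existing minority rows, with replacement
        extra = group.sample(
            n=n_majority - len(group), replace=True, random_state=1
        )
        group = pd.concat([group, extra])
    parts.append(group)

oversampled = pd.concat(parts, ignore_index=True)
print(oversampled["target"].value_counts())
```

No new information is created: every added row is an exact copy of an existing minority row, which is why oversampled counts match the majority class while the set of distinct minority rows stays the same.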


### 17. Create and fit a `RandomForestClassifier` to the **oversampled** training data.

In [None]:
# Instantiate a new RandomForestClassifier model
model_oversampled = # YOUR CODE HERE

# Fit the new model to the oversampled data
# YOUR CODE HERE


### 18. Make predictions using the scaled testing data.

In [None]:
# Predict labels for the scaled testing features using the oversampled model
y_pred_oversampled = # YOUR CODE HERE


### 19. Generate and compare classification reports for each model.
  * Print a classification report for the model fitted to the original data
  * Print a classification report for the model fitted to the undersampled data
  * Print a classification report for the model fitted to the oversampled data

In [None]:
# Print classification reports
print("Classification Report - Original Data")
print(# YOUR CODE HERE)
print("---------")
print("Classification Report - Undersampled Data")
print(# YOUR CODE HERE)
print("---------")
print("Classification Report - Oversampled Data")
print(# YOUR CODE HERE)