# Notebook A: Ensemble Models

Use this notebook to work through predicting heart disease with ensemble models. Load the [Cleveland heart disease dataset](https://archive.ics.uci.edu/dataset/45/heart+disease), preprocess the data, perform a train-test split, and use a grid search to find the best parameters for a bagging classifier and an AdaBoost classifier.


### Setup imports

### Load data
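
A minimal sketch of the loading step, assuming the data comes from the header-less, comma-separated `processed.cleveland.data` file in the UCI archive and that its 14 columns carry the attribute names used below. The helper name `load_cleveland` and the two sample rows are hypothetical and only illustrate the call; they are not real records.

```python
import io

import pandas as pd

# Assumed attribute names for the 14 columns of the processed Cleveland file.
COLUMN_NAMES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def load_cleveland(source):
    """Read a header-less, comma-separated file into a DataFrame.

    `source` may be a local path, a URL, or a file-like object.
    Missing values are kept as the literal string "?" so that they
    can be removed explicitly in the preprocessing step.
    """
    return pd.read_csv(source, header=None, names=COLUMN_NAMES)


# Made-up rows in the assumed layout, used only to demonstrate the call.
sample = io.StringIO(
    "60.0,1.0,2.0,130.0,250.0,0.0,0.0,155.0,0.0,1.2,2.0,0.0,3.0,0\n"
    "55.0,0.0,4.0,140.0,210.0,0.0,2.0,140.0,1.0,0.8,1.0,?,7.0,1\n"
)
sample_df = load_cleveland(sample)
print(sample_df.shape)
```

With the real file, the same call would be `cleveland_df = load_cleveland(path_to_file)`, where `path_to_file` is wherever the downloaded data lives.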

In [None]:
assert not cleveland_df.empty, "DataFrame is empty"
assert cleveland_df.shape == (303, 14), "DataFrame has incorrect shape"

### Preprocess data
Remove the rows that contain question marks.
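
One possible way to drop those rows, shown on a small made-up DataFrame. The helper name `drop_question_mark_rows` is hypothetical, and the final conversion to numeric dtypes is an added assumption rather than something the notebook asks for.

```python
import pandas as pd


def drop_question_mark_rows(df):
    """Return a copy of `df` without any row that contains "?"."""
    # True for every row where at least one cell equals the string "?".
    has_missing = (df == "?").any(axis=1)
    cleaned = df.loc[~has_missing].reset_index(drop=True)
    # Assumption: columns that held "?" were read as strings, so convert
    # everything back to numeric dtypes before modelling.
    return cleaned.apply(pd.to_numeric)


# Made-up data: the second and third rows each contain a "?".
toy_df = pd.DataFrame({
    "age": [60.0, 55.0, 48.0],
    "ca": ["0.0", "?", "2.0"],
    "thal": ["3.0", "7.0", "?"],
    "num": [0, 1, 2],
})
cleaned_df = drop_question_mark_rows(toy_df)
print(cleaned_df.shape)
```

Applied to the notebook's data, this would read `cleveland_df = drop_question_mark_rows(cleveland_df)`.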

In [None]:
assert cleveland_df.shape == (297, 14), "DataFrame has incorrect shape after preprocessing"

### Train-Test Split
Use 80% of the data for training, and set the random state to 42.
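
A sketch of the split using scikit-learn's `train_test_split`, assuming the target is the last column and is named `num`. That column name, the helper `split_features_target`, and the toy data are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_target(df, target="num"):
    """Split `df` into train and test sets, holding out 20% for testing."""
    X = df.drop(columns=[target])  # every column except the target
    y = df[target]
    # test_size=0.2 leaves 80% of the rows for training.
    return train_test_split(X, y, test_size=0.2, random_state=42)


# Made-up data with ten rows, so the split is 8 train / 2 test.
toy_df = pd.DataFrame({
    "age": range(40, 50),
    "chol": range(200, 300, 10),
    "num": [0, 1] * 5,
})
X_train, X_test, y_train, y_test = split_features_target(toy_df)
print(X_train.shape, X_test.shape)
```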

In [None]:
# Ensure that the split was successful
assert X_train.shape[0] > 0 and X_test.shape[0] > 0, "Something went wrong in train-test split."

### Bagging Model Training
First, define a bagging classifier that uses a `KNeighborsClassifier` as the base estimator and has `random_state=42`.

Then, use a grid search with five-fold cross-validation to find the best parameters for the bagging classifier.

Use accuracy for scoring the parameter combinations, and use this parameter grid:

```python
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_samples': [0.5, 0.75, 1.0], 
    'max_features': [0.5, 0.75, 1.0] 
}
```
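
The steps above could be wired together roughly as follows. This is a sketch, not a reference solution: the helper name `fit_bagging_grid_search` is hypothetical, and the demonstration at the end runs on synthetic data with a deliberately reduced grid so that it finishes quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_samples': [0.5, 0.75, 1.0],
    'max_features': [0.5, 0.75, 1.0]
}


def fit_bagging_grid_search(X, y, grid=param_grid):
    """Grid-search a KNN-based bagging classifier with 5-fold CV."""
    # The base estimator is passed positionally because its keyword is
    # `estimator` in recent scikit-learn releases and `base_estimator`
    # in older ones.
    bagging = BaggingClassifier(KNeighborsClassifier(), random_state=42)
    search = GridSearchCV(bagging, grid, scoring="accuracy", cv=5)
    return search.fit(X, y)  # fit() returns the fitted GridSearchCV


# Demonstration on synthetic data with a small, made-up grid.
X_demo, y_demo = make_classification(n_samples=60, n_features=6, random_state=0)
demo_grid = {
    'n_estimators': [5],
    'max_samples': [0.5, 1.0],
    'max_features': [0.5, 1.0],
}
demo_search = fit_bagging_grid_search(X_demo, y_demo, grid=demo_grid)
print(demo_search.best_params_, demo_search.best_score_)
```

In the notebook itself the call would be `bagging_grid_search = fit_bagging_grid_search(X_train, y_train)`, which uses the full grid and therefore takes noticeably longer.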



In [None]:
assert bagging_grid_search.best_params_ == {'max_features': 0.5, 'max_samples': 0.75, 'n_estimators': 50}, "Incorrect best parameters"
assert bagging_grid_search.best_score_ > 0.560 and bagging_grid_search.best_score_ < 0.561, "Incorrect best score"

### AdaBoost Model Training
First, define an AdaBoost classifier that uses a decision tree classifier as the base estimator (the default behavior) and has `random_state=42`.

Then, use a grid search with five-fold cross-validation to find the best parameters for the AdaBoost classifier.

Use this parameter grid:

```python
param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1.0]
}
```

There may be warnings. You can ignore these.
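
A comparable sketch for the AdaBoost search. The helper name `fit_adaboost_grid_search` is hypothetical. Because no scoring metric is specified for this step, the sketch leaves `scoring` unset, in which case `GridSearchCV` falls back to the estimator's own `score` method (accuracy for classifiers). The demonstration again uses synthetic data and a reduced, made-up grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1.0]
}


def fit_adaboost_grid_search(X, y, grid=param_grid):
    """Grid-search an AdaBoost classifier with 5-fold CV."""
    # No base estimator is given, so AdaBoost uses its default:
    # a decision tree of depth 1 (a decision stump).
    ada = AdaBoostClassifier(random_state=42)
    search = GridSearchCV(ada, grid, cv=5)
    return search.fit(X, y)  # fit() returns the fitted GridSearchCV


# Demonstration on synthetic data with a small, made-up grid.
X_demo, y_demo = make_classification(n_samples=60, n_features=6, random_state=0)
demo_grid = {'n_estimators': [5, 10], 'learning_rate': [0.1, 1.0]}
demo_search = fit_adaboost_grid_search(X_demo, y_demo, grid=demo_grid)
print(demo_search.best_params_, demo_search.best_score_)
```

In the notebook itself the call would be `ada_grid_search = fit_adaboost_grid_search(X_train, y_train)`.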


In [None]:
assert ada_grid_search.best_params_ == {'learning_rate': 0.01, 'n_estimators': 100}, "Incorrect best parameters"
assert ada_grid_search.best_score_ > 0.579 and ada_grid_search.best_score_ < 0.580, "Incorrect best score"