# __Boosting__
Next, let us understand the ensemble technique of boosting.

## Step 1: Import Required Libraries and Load the Dataset

- Import pandas, NumPy, sklearn.metrics, sklearn.model_selection, and sklearn.ensemble libraries



In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import KFold
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier

- Load the breast cancer dataset and create a DataFrame df
- Assign the feature names of the dataset to columns and assign the target column to y
- Using the head() method, we can check the first 5 rows of the dataset. 

Let us load the data.

In [None]:
data = load_breast_cancer()  # load the dataset once and reuse it
df = pd.DataFrame(data['data'], columns=data['feature_names'])
df['y'] = data['target']  # target column
df.head(5)

__Observation:__
- The output shows the first five rows of the dataset: 30 numeric feature columns plus the target column y.

Let's check the data types and find information about the dataset.

In [None]:
df.info()

__Observation:__
- There are no missing or null values.

## Step 2: Perform K-Fold Cross-Validation and Fit the Model
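- Define X and y
- Assign the DataFrame df, except the target column, to X
- Assign the y column to variable y
- Create a K-fold
 - Iterate over the K-fold splits
- Split the data into training and validation sets
- The cells that follow use a simpler alternative to K-fold: a single 80/20 train/test split with train_test_split, after which an AdaBoostClassifier is fitted and scored with F1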

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
ada = AdaBoostClassifier(n_estimators=50)

# Hold out 20% of the rows as a test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="y"), df['y'], test_size=0.2, random_state=42)

In [None]:
ada.fit(X_train,y_train)

In [None]:
from sklearn.metrics import f1_score
f1_score(y_test,ada.predict(X_test))
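
The K-fold steps listed at the start of Step 2 are not exercised by the hold-out split above. A minimal sketch of what they might look like follows. The choice of 5 shuffled folds, the fixed seed, and the variable names (`X_all`, `fold_scores`, and so on) are assumptions made for illustration, not settings taken from this notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

# Load features and target as NumPy arrays
X_all, y_all = load_breast_cancer(return_X_y=True)

# Assumed settings: 5 shuffled folds with a fixed seed
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, val_idx in kf.split(X_all):
    # Split the data into training and validation sets for this fold
    X_fold_train, X_fold_val = X_all[train_idx], X_all[val_idx]
    y_fold_train, y_fold_val = y_all[train_idx], y_all[val_idx]

    # Fit a fresh model on each fold so no information leaks between folds
    fold_model = AdaBoostClassifier(n_estimators=50, random_state=42)
    fold_model.fit(X_fold_train, y_fold_train)
    fold_scores.append(f1_score(y_fold_val, fold_model.predict(X_fold_val)))

print("F1 per fold:", np.round(fold_scores, 3))
print("Mean F1:", round(float(np.mean(fold_scores)), 3))
```

Averaging over folds usually gives a steadier estimate than a single split, at the cost of fitting the model once per fold.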

# Gradient Boosting

In [None]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
X = np.random.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * np.random.randn(100)  # y = 3x² + Gaussian noise

tree_reg1 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg1.fit(X, y)

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(X,y,"o")
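
The regression tree fitted above can be read as the first stage of a hand-built gradient-boosting ensemble. The sketch below shows one way the idea could continue, assuming squared-error loss and a learning rate of 1: each new tree is trained on the residuals left by the previous stages, and the ensemble prediction is the sum of all trees. The names `tree_reg2`, `tree_reg3`, and `ensemble_predict` are introduced here for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same synthetic quadratic data as in the cell above
np.random.seed(42)
X = np.random.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * np.random.randn(100)

# Stage 1: fit a shallow tree to the targets
tree_reg1 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg1.fit(X, y)

# Stage 2: fit a second tree to the residual errors of stage 1
residuals1 = y - tree_reg1.predict(X)
tree_reg2 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg2.fit(X, residuals1)

# Stage 3: fit a third tree to what stages 1 and 2 still get wrong
residuals2 = residuals1 - tree_reg2.predict(X)
tree_reg3 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg3.fit(X, residuals2)

def ensemble_predict(X_new):
    """Sum the predictions of all three stages."""
    return sum(tree.predict(X_new) for tree in (tree_reg1, tree_reg2, tree_reg3))

# Compare training error of the first tree alone with the three-stage ensemble
mse_first = np.mean((y - tree_reg1.predict(X)) ** 2)
mse_ensemble = np.mean((y - ensemble_predict(X)) ** 2)
print("MSE, first tree only:", mse_first)
print("MSE, three-tree ensemble:", mse_ensemble)
```

Library implementations such as GradientBoostingRegressor follow the same residual-fitting pattern but shrink each tree's contribution by a learning rate.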

- Create a GradientBoostingClassifier object with a learning rate of 0.1


In [None]:
gradient_booster = GradientBoostingClassifier(learning_rate=0.1)
gradient_booster.get_params()

__Observation:__
- The output above lists the parameters that can be tuned. In this demo, we apply gradient boosting with the default values (the learning rate of 0.1 passed here is also the default).
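
Among the parameters returned by get_params(), n_estimators is one whose effect can be inspected without refitting. The sketch below is one possible way to do that, using `staged_predict` to score the model after each boosting stage. The split settings and the names `gb`, `staged_acc`, and `best_n` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42)

# Fit once with 100 trees (the default number of estimators)
gb = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100, random_state=42)
gb.fit(X_tr, y_tr)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees
staged_acc = [accuracy_score(y_te, y_pred) for y_pred in gb.staged_predict(X_te)]

# Number of trees at which held-out accuracy first peaks
best_n = int(np.argmax(staged_acc)) + 1
print("Stages evaluated:", len(staged_acc))
print("Best number of trees on this split:", best_n)
```

Picking `best_n` on the test set would leak information in a real project; a separate validation set or cross-validation would be the safer choice.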

Next, let's fit the model on the training data. 

In [None]:
gradient_booster.fit(X_train, y_train)
# Evaluate on the held-out test split created earlier (X_test, y_test)
print(classification_report(y_test, gradient_booster.predict(X_test)))

__Observations:__

- The accuracy is 0.96.
- Precision and recall are close to each other.

In [None]:
# AdaBoost

'''
1. Generally less prone to overfitting
2. Easy to implement
3. Works well with decision trees (shallow trees by default) as weak learners
4. Commonly used for binary classification
'''

In [None]:
# Gradient Boost

'''
1. Often achieves higher accuracy than AdaBoost
2. Offers better control during fine-tuning
3. Requires more time for hyperparameter tuning
4. Supports both regression and classification
5. Computationally expensive compared to AdaBoost
'''
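
The trade-offs noted above can be checked empirically. The following is a rough sketch, not a benchmark: it fits both classifiers on the same hold-out split and records F1 and fit time. The `candidates` and `results` dictionaries are names assumed for this example, and the numbers will vary by machine and library version.

```python
import time
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42)

candidates = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=42),
    "GradientBoosting": GradientBoostingClassifier(learning_rate=0.1, random_state=42),
}

results = {}
for name, clf in candidates.items():
    # Time only the fit call
    start = time.perf_counter()
    clf.fit(X_tr, y_tr)
    fit_seconds = time.perf_counter() - start

    results[name] = {
        "f1": f1_score(y_te, clf.predict(X_te)),
        "fit_seconds": fit_seconds,
    }
    print(f"{name}: F1={results[name]['f1']:.3f}, fit time={fit_seconds:.3f}s")
```

A single split on a small dataset is a weak basis for ranking the two methods; repeating the comparison across folds would be more informative.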

In [None]:
!pip install xgboost

In [None]:
import xgboost

In [None]:
model = xgboost.XGBClassifier()
model.fit(X_train,y_train)

In [None]:
f1_score(y_test,model.predict(X_test))

In [None]:
# MSE

In [None]:
# ovo and ovr technique
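
One-vs-rest (OvR) and one-vs-one (OvO) are two common ways to extend a binary classifier to multiclass problems. The sketch below assumes scikit-learn's `OneVsRestClassifier` and `OneVsOneClassifier` wrappers, with the 10-class digits dataset and a shallow decision tree chosen purely for illustration. It shows how many binary models each strategy trains.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

# Digits dataset: 10 classes (assumed here only as a multiclass example)
X_digits, y_digits = load_digits(return_X_y=True)

# Shallow tree as the underlying binary learner
base = DecisionTreeClassifier(max_depth=3, random_state=42)

# OvR trains one classifier per class (class k vs. everything else)
ovr = OneVsRestClassifier(base).fit(X_digits, y_digits)

# OvO trains one classifier per pair of classes
ovo = OneVsOneClassifier(base).fit(X_digits, y_digits)

print("OvR binary models:", len(ovr.estimators_))  # 10 classes -> 10 models
print("OvO binary models:", len(ovo.estimators_))  # 10 * 9 / 2 -> 45 models
```

OvO trains more models, but each one sees only the rows belonging to its two classes.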

In [None]:
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

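base_learner = [("rf", RandomForestClassifier()),
                # max_iter raised to reduce the chance of a convergence warning on unscaled features
                ("log", LogisticRegression(max_iter=10000)),
                ("dtree", DecisionTreeClassifier())]
meta_learner = xgboost.XGBClassifier()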

In [None]:
stacking_clf = StackingClassifier(estimators=base_learner, final_estimator=meta_learner)
stacking_clf.fit(X_train, y_train)

In [None]:
f1_score(y_test,stacking_clf.predict(X_test))