### Ensemble Methods
#### Definition and Objective of Analysis
Definition: Ensemble methods combine several classifiers (in this notebook, decision trees) to achieve better predictive performance than a single decision tree classifier. The main principle behind an ensemble model is that a group of weak learners comes together to form a strong learner, which typically increases the accuracy of the model.
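
The weak-to-strong principle can be illustrated with a minimal sketch: train several shallow trees on bootstrap samples of the same data and let them vote. This is plain bagging with a majority vote; the choice of 25 trees of depth 3 is an illustrative assumption, not something from the notebook. scikit-learn's `RandomForestClassifier` differs in two ways: it also samples features at each split, and it averages class probabilities instead of counting votes.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = make_moons(n_samples=10000, noise=0.5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(0)
n_models = 25  # odd number of voters, so a binary vote can never tie
votes = np.zeros((n_models, len(x_test)), dtype=int)

for i in range(n_models):
    # Bootstrap sample: draw len(x_train) row indices with replacement
    idx = rng.integers(0, len(x_train), size=len(x_train))
    # A shallow tree plays the role of a "weak" learner (illustrative depth)
    tree = DecisionTreeClassifier(max_depth=3, random_state=i)
    tree.fit(x_train[idx], y_train[idx])
    votes[i] = tree.predict(x_test)

# Majority vote: predict class 1 when more than half of the trees predict 1
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)

single_acc = accuracy_score(y_test, votes[0])
ensemble_acc = accuracy_score(y_test, ensemble_pred)
print(f"one tree: {single_acc:.3f}, majority vote of {n_models}: {ensemble_acc:.3f}")
```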

Objective of Analysis: Minimize risk and maximize profit for the bank. To minimize losses, the bank needs a decision rule for loan approval. Loan managers consider an applicant's demographic and socio-economic profile before deciding on his/her loan application. The dataset contains 13 variables for 1000 loan applicants, together with a label indicating whether each applicant is considered creditworthy (1) or not creditworthy (0). A predictive model developed on this data should guide the bank manager in deciding whether to approve a loan for a prospective applicant based on his/her profile.
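
The decision rule can be made concrete once a model outputs the probability p that an applicant is creditworthy. A minimal sketch, assuming hypothetical per-loan figures that the source does not give (a profit earned on a good loan and a loss incurred on a bad one, with rejection worth 0): approving has non-negative expected profit when p · profit − (1 − p) · loss ≥ 0, i.e. when p ≥ loss / (profit + loss). The function names below are illustrative.

```python
def approval_threshold(profit_good: float, loss_bad: float) -> float:
    """Smallest P(creditworthy) at which approving a loan has non-negative expected profit.

    Hypothetical cost model: approving a good applicant earns profit_good,
    approving a bad one loses loss_bad, rejecting earns/loses nothing.
    """
    # p * profit_good - (1 - p) * loss_bad >= 0  <=>  p >= loss_bad / (profit_good + loss_bad)
    return loss_bad / (profit_good + loss_bad)


def decide(p_good: list[float], profit_good: float, loss_bad: float) -> list[str]:
    """Turn predicted probabilities of creditworthiness into approve/reject decisions."""
    t = approval_threshold(profit_good, loss_bad)
    return ["approve" if p >= t else "reject" for p in p_good]


# Example: a bad loan costs five times what a good loan earns (illustrative numbers)
print(f"{approval_threshold(1.0, 5.0):.3f}")                      # → 0.833
print(decide([0.9, 0.8, 0.5], profit_good=1.0, loss_bad=5.0))     # → ['approve', 'reject', 'reject']
```

With asymmetric costs like these, the threshold moves well above the default 0.5 that `predict` implicitly uses.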



<img src= 'bb.png'>

In [3]:
```python
import pandas as pd
```

In [2]:
```python
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
```

In [3]:
```python
# Step 1: create a synthetic two-class data set (two interleaving half-moons)
x, y = make_moons(n_samples=10000, noise=0.5, random_state=0)
x.shape
```

(10000, 2)
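
`make_moons` generates two interleaving half-circles, and `noise=0.5` is the standard deviation of the Gaussian noise added to the points, so the two classes overlap considerably. A quick check (a sketch, not part of the original notebook) shows that the classes are balanced, which means 0.5 is the accuracy of a trivial majority-class guess and the baseline every model below should beat.

```python
import numpy as np
from sklearn.datasets import make_moons

# Same call as in Step 1
x, y = make_moons(n_samples=10000, noise=0.5, random_state=0)

# Number of samples in each class (labels are 0 and 1)
counts = np.bincount(y)
print(counts)  # → [5000 5000]

# Majority-class baseline: accuracy of always predicting the most common class
baseline = counts.max() / counts.sum()
print(baseline)  # → 0.5
```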

In [4]:
```python
# Step 2: split into training (80%) and test (20%) sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2,
                                                    random_state=42)
```
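
The split in Step 2 is purely random. When class proportions matter for evaluation, as they can in credit scoring where creditworthy and non-creditworthy applicants may be unbalanced, passing `stratify=y` to `train_test_split` keeps the label proportions the same in both parts. This is an optional variant sketched here, not what the notebook does.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

x, y = make_moons(n_samples=10000, noise=0.5, random_state=0)

# stratify=y preserves the class proportions of y in the training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

print(x_train.shape, x_test.shape)  # → (8000, 2) (2000, 2)
print(np.bincount(y_train), np.bincount(y_test))
```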

In [5]:
```python
# Step 3: fit a single decision tree as a baseline for comparison
dt = DecisionTreeClassifier()
dt.fit(x_train, y_train)
y_pred = dt.predict(x_test)
accuracy_score(y_test, y_pred)
```


0.754
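
A single, fully grown tree tends to memorize the noisy training points, which is the weakness the ensembles below address. Comparing training and test accuracy makes this visible. In this sketch, `max_depth=5` is an arbitrary illustrative value for a depth-limited tree, not one tuned in the notebook, and `random_state=0` is added for reproducibility.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = make_moons(n_samples=10000, noise=0.5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

results = {}
for depth in (None, 5):  # None = grow until every leaf is pure (the default)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(x_train, y_train)
    results[depth] = (
        accuracy_score(y_train, tree.predict(x_train)),  # training accuracy
        accuracy_score(y_test, tree.predict(x_test)),    # test accuracy
    )
    train_acc, test_acc = results[depth]
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A large gap between training and test accuracy for the unrestricted tree indicates overfitting (high variance).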

In [6]:
```python
# Step 4: fit a random forest
# n_estimators is the number of trees in the forest
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(x_train, y_train)
y_pred = rf.predict(x_test)
accuracy_score(y_test, y_pred)
```


0.7965
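
According to the scikit-learn documentation, `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than by a hard majority vote. The sketch below reproduces `rf.predict_proba` from the individual fitted trees in `rf.estimators_` to show what the forest actually computes.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

x, y = make_moons(n_samples=10000, noise=0.5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(x_train, y_train)

# One fitted decision tree per estimator
n_trees = len(rf.estimators_)

# Average the class-probability estimates of the individual trees
manual_proba = np.mean([tree.predict_proba(x_test) for tree in rf.estimators_], axis=0)
same = bool(np.allclose(manual_proba, rf.predict_proba(x_test)))

# The forest's predicted class is the one with the highest averaged probability
manual_pred = rf.classes_[np.argmax(manual_proba, axis=1)]
print(n_trees, same)
```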

In [7]:
```python
# Step 5: fit an AdaBoost model
# compared with the single decision tree, accuracy goes up
ab = AdaBoostClassifier(n_estimators=100)
ab.fit(x_train, y_train)
y_pred = ab.predict(x_test)
accuracy_score(y_test, y_pred)
```

0.833
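
AdaBoost builds its ensemble sequentially: each new weak learner is fit with sample weights that emphasize the training points the previous learners misclassified. In scikit-learn the default weak learner is a depth-1 decision tree (a "stump"). This sketch uses `staged_score` to show how test accuracy evolves as boosting rounds are added; `random_state=0` is an addition for reproducibility, not part of the original cell.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

x, y = make_moons(n_samples=10000, noise=0.5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

ab = AdaBoostClassifier(n_estimators=100, random_state=0)
ab.fit(x_train, y_train)

# Depth of the first weak learner (the default base estimator is a stump)
stump_depth = ab.estimators_[0].get_depth()

# Test accuracy of the ensemble after 1, 2, ..., n boosting rounds
staged = list(ab.staged_score(x_test, y_test))
print(f"stump depth: {stump_depth}")
print(f"after 1 round: {staged[0]:.3f}, after {len(staged)} rounds: {staged[-1]:.3f}")
```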

In [8]:
```python
# Step 6: fit a gradient boosting model
# compared with the single decision tree, accuracy goes up
gb = GradientBoostingClassifier(n_estimators=100)
gb.fit(x_train, y_train)
y_pred = gb.predict(x_test)
accuracy_score(y_test, y_pred)
```

0.8335
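
Gradient boosting also adds trees one at a time, but each new regression tree is fit to the negative gradient of the loss with respect to the current predictions (for the default log-loss, this is y − p, the label minus the predicted probability). Instead of fixing `n_estimators=100`, scikit-learn can stop early using a held-out slice of the training data. In this sketch the upper bound of 500 rounds and the patience of 10 rounds are illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

x, y = make_moons(n_samples=10000, noise=0.5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

gb = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on the number of boosting rounds
    learning_rate=0.1,        # shrinkage applied to each tree's contribution
    max_depth=3,              # depth of each regression tree
    validation_fraction=0.1,  # hold out 10% of the training data for early stopping
    n_iter_no_change=10,      # stop when the validation score stops improving for 10 rounds
    random_state=0,
)
gb.fit(x_train, y_train)

# n_estimators_ is the number of rounds actually fitted before stopping
test_acc = accuracy_score(y_test, gb.predict(x_test))
print(f"rounds fitted: {gb.n_estimators_}, test accuracy: {test_acc:.3f}")
```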

In [9]:
```
!pip install xgboost
```



In [11]:
```python
# Step 7: fit an XGBoost model with its default hyperparameters
from xgboost import XGBClassifier
xgb = XGBClassifier()
xgb.fit(x_train, y_train)
y_pred = xgb.predict(x_test)
accuracy_score(y_test, y_pred)
```

0.82
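
All accuracies above come from a single 80/20 split, so small differences such as 0.833 vs. 0.8335 may not be meaningful. A sketch of a more robust comparison uses 5-fold `cross_val_score` to report a mean and spread per model. Assumptions: a smaller sample of 2000 points keeps the runtime short, and XGBoost is left out so only scikit-learn is needed (`XGBClassifier` follows the scikit-learn estimator interface and could be added to the dictionary the same way).

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Smaller sample than the notebook (2000 vs. 10000 points) to keep runtime short
x, y = make_moons(n_samples=2000, noise=0.5, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # Accuracy on each of the 5 held-out folds
    scores = cross_val_score(model, x, y, cv=5, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:>18}: {scores.mean():.3f} ± {scores.std():.3f}")
```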