# Chapter 2: Decision Trees in Depth
This chapter covers the following main topics:
- Introducing decision trees with XGBoost
- Exploring decision trees
- Contrasting variance and bias
- Tuning decision tree hyperparameters
- Predicting heart disease: a case study

## Introducing decision trees with XGBoost
XGBoost is an ensemble method: it combines many machine learning models into a single, stronger predictor. The individual models in the ensemble are called *base learners*, and the most common type of base learner is the decision tree. Decision trees split data by asking questions about the values in the columns. Unfortunately, they are prone to overfitting: they can predict the training set very well yet generalize poorly to new data. XGBoost and random forests aggregate the predictions of many trees, which reduces overfitting and improves performance.
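
To make the idea of aggregation concrete, here is a minimal sketch in plain Python. The labels and the three sets of tree predictions are made-up values for illustration only, and the names `tree_a`, `tree_b`, `tree_c`, `accuracy`, and `majority_vote` are hypothetical. The sketch uses simple majority voting, which is closest in spirit to how a random forest combines classification trees; XGBoost instead adds the outputs of its trees together through boosting, so treat this as an intuition for ensembles rather than a description of XGBoost's algorithm.

```python
from collections import Counter

# Made-up ground truth and predictions from three hypothetical trees.
# Each tree makes mistakes, but on different rows.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
tree_preds = {
    "tree_a": [1, 0, 1, 0, 0, 1, 0, 1],
    "tree_b": [1, 1, 1, 1, 0, 1, 0, 0],
    "tree_c": [0, 0, 1, 1, 0, 0, 0, 0],
}

def accuracy(y_true, y_hat):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_hat)) / len(y_true)

def majority_vote(predictions):
    """For each row, return the label predicted by most of the models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

ensemble_pred = majority_vote(list(tree_preds.values()))

# Report the accuracy of each tree and of the combined vote
for name, preds in tree_preds.items():
    print(name, accuracy(y_true, preds))
print("ensemble", accuracy(y_true, ensemble_pred))
```

Because the individual errors in this toy data fall on different rows, the vote corrects each of them. Real trees tend to make correlated errors, so the gain from aggregation is usually smaller than in this constructed example.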

## Exploring decision trees
Decision trees split data into branches, which are followed down to leaves, where predictions are made.
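
The branch-and-leaf structure can be sketched with a small hand-written tree in plain Python. The feature names (`age`, `hours_per_week`), the thresholds, and the leaf values below are hypothetical and were not learned from the census data; the helper `predict_one` is likewise an illustrative name. Sending rows with `feature <= threshold` to the left branch mirrors the convention scikit-learn uses.

```python
# A hand-written toy tree; feature names and thresholds are hypothetical.
# Internal nodes ask a question about one column, leaves hold a prediction.
toy_tree = {
    "feature": "age", "threshold": 30,
    "left": {"leaf": 0},
    "right": {
        "feature": "hours_per_week", "threshold": 40,
        "left": {"leaf": 0},
        "right": {"leaf": 1},
    },
}

def predict_one(node, row):
    """Follow branches from the root until a leaf is reached."""
    while "leaf" not in node:
        # Go left when the value is at or below the threshold, else right
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

rows = [
    {"age": 25, "hours_per_week": 50},
    {"age": 45, "hours_per_week": 35},
    {"age": 45, "hours_per_week": 60},
]
toy_predictions = [predict_one(toy_tree, row) for row in rows]
print(toy_predictions)
```

A fitted `DecisionTreeClassifier` does the same kind of traversal internally; the difference is that it learns the questions and thresholds from the training data.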

In [1]:
# Let's build a decision tree, starting with the imports
import pandas as pd
import numpy as np

# Silence all warnings to keep the notebook output clean
import warnings
warnings.filterwarnings("ignore")

In [2]:
# Load the cleaned census dataset
df_census = pd.read_csv('census_cleaned.csv')

In [3]:
# Declare predictor and target columns
X = df_census.iloc[:, :-1]
y = df_census.iloc[:, -1]

In [4]:
# Split the dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, random_state=1)

In [5]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

In [6]:
# Initialize the decision tree classifier
clf = DecisionTreeClassifier(random_state=2)

# Fit the model
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Compare predictions to the test set (true labels first, then predictions)
print(accuracy_score(y_test, y_pred))

0.8212750276378823
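
The score above is measured on the test set only. Since decision trees are prone to overfitting, one quick check is to compare training accuracy with test accuracy. The sketch below does this on synthetic data from scikit-learn's `make_classification`, which stands in for the census file so the cell runs on its own. The dataset parameters and the names `X_demo`, `y_demo`, and `demo_clf` are assumptions chosen for illustration, not values from this chapter.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data (an assumption); flip_y adds label noise
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    flip_y=0.1, random_state=2)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)

# A tree with default settings grows until its leaves are pure
demo_clf = DecisionTreeClassifier(random_state=2)
demo_clf.fit(X_tr, y_tr)

# Score the same model on the data it saw and on the data it did not
train_acc = accuracy_score(y_tr, demo_clf.predict(X_tr))
test_acc = accuracy_score(y_te, demo_clf.predict(X_te))
print("train:", train_acc)
print("test: ", test_acc)
```

A large gap between the two numbers suggests the tree has memorized noise in the training set. The same comparison can be run on the census model by scoring `clf` on `X_train` and `y_train`.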
