# Machine Learning - Decision Trees

> Heuristics for learning decision trees and their theoretical properties. 

- toc: true
- hide: true
- badges: true
- comments: false
- categories: ['Machine Learning','Decision Trees','Random Forests']
- image: images/decision-tree-example.png

# Introduction

## Classification vs. Regression

We start our discussion of decision trees with a definition of *classification* and *classifier*.

> Definition: &nbsp; **Classification** is the process of grouping data into discrete categories (i.e. **class labels**).
<br>

We may contrast this definition with *regression*, which is the process of predicting a *continuous* (i.e. real- or complex-valued) output.

A common example of a classification problem is sorting emails into the binary categories *'spam'* and *'not spam'*. However, the labels in a classification problem need not be binary; they may come from any discrete set. A common example of regression, by contrast, is learning a linear (or non-linear) function that best fits a given dataset.

> Note: The line between classification and regression is sometimes blurred. For instance, *logistic regression* is a regression algorithm that outputs a prediction in the continuous probability range $[0,1]$. It's commonly paired with a *decision rule* that casts its output into discrete classes. Thus, even though it's a regression algorithm, it can easily be converted into a classification algorithm and is often used for classification problems in practice.
<br>
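To illustrate the note above, here is a minimal sketch of a logistic model paired with a threshold decision rule. The weights, bias, and the threshold of 0.5 are hypothetical values chosen by hand for illustration; nothing here is learned from data.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Regression step: a continuous probability estimate for input x."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score)

def classify(x, weights, bias, threshold=0.5):
    """Decision rule: cast the probability into a discrete class label."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0

# Hypothetical, hand-picked parameters (assumption: not fitted to any dataset).
weights, bias = [2.0, -1.0], 0.0
p = predict_proba([1.0, 0.5], weights, bias)   # continuous output
label = classify([1.0, 0.5], weights, bias)    # discrete output
```

The same model thus serves as a regressor through `predict_proba` and as a classifier through `classify`; only the final thresholding step differs.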

This leads us to the expected definition of a classifier, which is:

> Definition: &nbsp; A **classifier** is any algorithm that performs classification.
<br>

## Decision Trees

*Decision trees* are one type of classifier among many. 

The internal nodes of a decision tree correspond to tests on the *features* of the dataset, and its leaves correspond to the class labels. Each path from the root to a leaf corresponds to the *conjunction of features* that leads to the class label at that leaf.

To understand this, let's look at an example of a non-binary decision tree that's nonetheless very easy to understand because of the historical context of the data it's attempting to learn. 

**Example:**
![](my_icons/decision-tree-example.png "Description: Decision tree that predicts the survival chances of the passengers on the Titanic.")

The above decision tree identifies the three features that best predict whether a given Titanic passenger survived. In order of their effect on the accuracy of the prediction, these features are *gender*, *age*, and *sibsp* (the number of siblings or spouses aboard).

As we can infer from the tree, had you been a passenger on the Titanic, you would likely have survived if you were either female, or a male child (below the age of 9.5) with fewer than 3 siblings or spouses aboard (a conjunction of features).
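The reading of the tree given above can be sketched as a chain of nested conditions, where each `return` is a leaf and the conditions passed on the way to it form the conjunction of features. The function name and argument names are hypothetical, and the treatment of a passenger aged exactly 9.5 is an assumption, since the text only says "below".

```python
def predict_survival(is_male: bool, age: float, sibsp: int) -> bool:
    """Follow one root-to-leaf path of the example tree and return the leaf label."""
    if not is_male:
        return True    # leaf: female -> survived
    if age >= 9.5:     # assumption: age exactly 9.5 falls on this branch
        return False   # leaf: male AND not a young child -> died
    # male AND below 9.5: survived only with fewer than 3 siblings/spouses
    return sibsp < 3
```

For example, `predict_survival(True, 5, 1)` follows the path "male, below 9.5, fewer than 3 siblings or spouses" and lands on a "survived" leaf.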

# Setup

## Simplifying Assumptions

In the rest of this article, for simplicity, we assume that decision trees take binary input and produce binary output. That is, the training set is ${S = \{(x^1,y^1), \dots ,(x^k, y^k)\}}$ with ${x^i \in \{0,1\}^n}$ and ${y^i \in \{0,1\}}$ for all $i$. A decision tree is then simply a Boolean function ${f : \{0,1\}^n \to \{0,1\}}$.

The task is to learn this function.
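As a rough sketch of this setting, the snippet below writes down a small training set with $n = 3$ and $k = 4$ and a decision tree over binary features, then checks how often the tree disagrees with the labels. The data, the nested-tuple tree encoding, and the helper names `evaluate` and `training_error` are all hypothetical choices made for illustration.

```python
# Hypothetical training set S: pairs (x, y) with x in {0,1}^3 and y in {0,1}.
S = [
    ((0, 0, 1), 0),
    ((0, 1, 1), 1),
    ((1, 0, 0), 0),
    ((1, 1, 0), 0),
]

# Assumed encoding: a leaf is a label (0 or 1); an internal node is a tuple
# (feature_index, subtree_if_feature_is_0, subtree_if_feature_is_1).
tree = (1, 0, (0, 1, 0))  # split on x[1]; if it is 1, split on x[0]

def evaluate(tree, x):
    """Walk from the root to a leaf and return the label stored there."""
    while not isinstance(tree, int):
        feature, if_zero, if_one = tree
        tree = if_one if x[feature] == 1 else if_zero
    return tree

def training_error(tree, S):
    """Fraction of training examples that the tree labels incorrectly."""
    mistakes = sum(1 for x, y in S if evaluate(tree, x) != y)
    return mistakes / len(S)

error = training_error(tree, S)
```

Under this encoding, a single leaf such as `0` is itself a valid tree (the constant function), which gives a convenient baseline to compare against.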

## Potential Function