In [2]:
#include <iostream>
#include <algorithm>
#include <stdexcept>

# Binary Trees

- The arbitrary number of children in general trees is often unnecessary - many real-life trees are restricted to two branches
    - Expression trees using binary operators
    - An ancestral tree of an individual, parents, grandparents, etc.
    - Phylogenetic trees
    - Lossless encoding algorithms

- There are also issues with general trees
    - There is no natural order between a node and its children

## Definition

A **binary tree** is a restriction where each node has exactly two children
- Each child is either empty or another binary tree
- This restriction allows us to label the children as *left* and *right* subtrees

![](../img/btree1.png)

*Reminder:* Recall that $\lg(n) = \Theta(\log_b(n))$ for any base $b > 1$

- We will also refer to the two sub-trees as
    - <span style="color:red">The left-hand sub-tree</span>, and
    - <span style="color:blue">The right-hand sub-tree</span>
    
![](../img/btree2.png)

- Sample variations on binary trees with five nodes

![](../img/btree3.png)

- A **full** node is a node where both the left and right sub-trees are non-empty trees

![](../img/btree4.png)
![](../img/btree5.png)

- An **empty node** or a null sub-tree is any location where a new leaf node could be appended

![](../img/btree6.png)

- A **full binary tree** is a binary tree in which each node is
    - A full node, or
    - A leaf node

![](../img/btree7.png)

- These have applications in
    - Expression trees
    - Huffman encoding
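
The definition of a full binary tree can be checked recursively. The following is a minimal sketch, not part of the original class design: `FullCheckNode` and `is_full_tree` are hypothetical names introduced only for illustration, and an empty tree is assumed to count as full.

```cpp
// Hypothetical minimal node type, used only for this sketch
struct FullCheckNode {
    int value;
    FullCheckNode* left = nullptr;
    FullCheckNode* right = nullptr;
};

// Returns true if every node is either a full node or a leaf node.
// An empty sub-tree is treated as (vacuously) full.
bool is_full_tree( const FullCheckNode* n ) {
    if ( n == nullptr ) {
        return true;
    }

    bool has_left = ( n->left != nullptr );
    bool has_right = ( n->right != nullptr );

    if ( has_left != has_right ) {
        return false;  // exactly one child: neither full nor a leaf
    }

    return is_full_tree( n->left ) && is_full_tree( n->right );
}
```

A node with two leaf children passes the check, while a node with only a left child fails it.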

## Implementation: Binary Node

- The binary node class is similar to the single node class

In [2]:
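template <typename Type>
class BinaryTree; // forward declaration so that BinaryTree can be named as a friend

template <typename Type>
class BinaryNode {
    Type element;
    BinaryNode *left_tree;
    BinaryNode *right_tree;

public:
    BinaryNode( const Type& , BinaryNode<Type>* = nullptr, BinaryNode<Type>* = nullptr );

    Type value() const;
    BinaryNode* left() const;
    BinaryNode* right() const;
    bool is_leaf() const;
    
    int size() const;
    int height() const;
    
    void clear();

    // BinaryTree modifies left_tree and right_tree directly
    friend class BinaryTree<Type>;
};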

- We will usually only construct new leaf nodes

In [3]:
template <typename Type>
BinaryNode<Type>::BinaryNode( const Type &e, BinaryNode<Type>* l, BinaryNode<Type>* r):
    element( e ),
    left_tree( l ),
    right_tree( r )
{
    // Empty constructor
}

- The accessors are similar to those of `SinglyLinkedList`

In [4]:
template <typename Type>
Type BinaryNode<Type>::value() const {
    return element;
}

In [5]:
template <typename Type>
BinaryNode<Type>* BinaryNode<Type>::left() const {
    return left_tree;
}

In [6]:
template <typename Type>
BinaryNode<Type>* BinaryNode<Type>::right() const {
    return right_tree;
}

- Much of the basic functionality is very similar to `SimpleTree`

In [7]:
template <typename Type>
bool BinaryNode<Type>::is_leaf() const {
    return (left() == nullptr) && (right() == nullptr);
}

### `size`

- The recursive `size` function runs in $\Theta(n)$ time and $\Theta(h)$ memory, where $h$ is the height of the tree

In [8]:
template <typename Type>
int BinaryNode<Type>::size() const {    
    if ( left() == nullptr ) {
        return ( right() == nullptr ) ? 1 : 1 + right()->size();
    } else {
        return ( right() == nullptr ) ?
            1 + left()->size() :
            1 + left()->size() + right()->size();
    }
}

- Alternatively, we can maintain an integer `size` data member in the `BinaryTree` wrapper class and obtain a $\Theta(1)$ run time.

In [9]:
{    
    BinaryNode<int> c{4}, d{5};
    BinaryNode<int> a{2}, b{3, &c, &d};
    BinaryNode<int> r{1, &a, &b};
    std::cout << "Root: " << r.value() << std::endl;
    std::cout << "L: " << r.left()->value() << ", R: " << r.right()->value() << std::endl;    
    std::cout << "Tree size: " << r.size() << std::endl;
}

Root: 1
L: 2, R: 3
Tree size: 5


### `height`

- The recursive `height` function also runs in $\Theta(n)$ time and $\Theta(h)$ memory
    - Later we will implement this in $\Theta(1)$ time

In [10]:
template <typename Type>
int BinaryNode<Type>::height() const {
    // An empty sub-tree has height -1, so a leaf node has height 0
    return 1 + std::max(
        left() == nullptr ? -1 : left()->height(),
        right() == nullptr ? -1 : right()->height()
    );
}

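- Alternatively, we can store an integer `height` data member in each node, which makes the query $\Theta(1)$ at the cost of updating the stored heights along the path to the root whenever the tree changes.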

In [11]:
{
    BinaryNode c{4}, d{5};
    BinaryNode a{2}, b{3, &c, &d};
    BinaryNode r{1, &a, &b};
    
    std::cout << "height: " << r.height() << std::endl;
}

height: 2


### `clear`

- Removing all the nodes in a tree is similarly recursive

In [12]:
template <typename Type>
void BinaryNode<Type>::clear() {
    if ( left() != nullptr ) {
        left()->clear();
    }

    if ( right() != nullptr ) {
        right()->clear();
    }

    delete this;
}

## Binary Tree

- Because each node in a tree refers to its children, the binary tree class needs only a link to the tree root

In [13]:
template <typename Type>
class BinaryTree {
    BinaryNode<Type>* root_node;

public:
    BinaryTree(BinaryNode<Type>* = nullptr);
    ~BinaryTree();
    // Accessors
    bool empty() const;
    BinaryNode<Type>* root() const;
    int size() const;
    int height() const;
    // Mutators
    void expand( BinaryNode<Type>* );    
    BinaryNode<Type>* insert_left( const Type&, BinaryNode<Type>* );
    BinaryNode<Type>* insert_right( const Type&, BinaryNode<Type>* );
    void erase( BinaryNode<Type>* );
    void clear();
};

## Operations
- First, we want to create a binary tree

In [14]:
template <typename Type>
BinaryTree<Type>::BinaryTree(BinaryNode<Type>* root) : root_node{root} {
    // Empty constructor
}

- We also want to be able to manage the stored values in the binary tree
    - check the state,
    - access,
    - insert into, and   
    - erase from    

In [15]:
template <typename Type>
BinaryNode<Type>* BinaryTree<Type>::root() const {
    return root_node;
}

In [16]:
template <typename Type>
bool BinaryTree<Type>::empty() const {
    return (root() == nullptr); // avoids the linear-time traversal done by size()
}

In [17]:
template <typename Type>
int BinaryTree<Type>::size() const {
    return (root() == nullptr) ? 0 : root()->size();
}

- For adding nodes to the tree, we can expand a leaf node `p` making it into an internal one
    - by creating two new leaf nodes and making them the children of `p`
    - an error condition occurs if `p` is an internal node.

In [18]:
template <typename Type>
void BinaryTree<Type>::expand( BinaryNode<Type>* p) {
    if ( !p->is_leaf() ) {
        throw std::logic_error("Cannot expand internal node");
    }
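    // BinaryNode has no default constructor, so the new leaves hold a value-initialized Type
    p->left_tree = new BinaryNode<Type>( Type{} );
    p->right_tree = new BinaryNode<Type>( Type{} );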
}

- Alternatively, we can insert one child at a time using separate methods

In [19]:
template <typename Type>
BinaryNode<Type>* BinaryTree<Type>::insert_left( const Type &obj, BinaryNode<Type> *p ) {
    if (p->left() != nullptr)
        throw std::logic_error("Cannot insert to the left");
    p->left_tree = new BinaryNode<Type>(obj);
    return p->left_tree;
}

In [20]:
template <typename Type>
BinaryNode<Type>* BinaryTree<Type>::insert_right( const Type &obj, BinaryNode<Type> *p ){
    if (p->right() != nullptr)
        throw std::logic_error("Cannot insert to the right");
    p->right_tree = new BinaryNode<Type>(obj);
    return p->right_tree;
}

- Remove the external node `w` together with its parent `v`, replacing `v` with the sibling of `w`
    - an error condition occurs if `w` is an internal node or `w` is the root.
![](../img/erase-internal.png)

In [23]:
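template <typename Type>
void BinaryTree<Type>::erase( BinaryNode<Type>* w ) {
    // Assumes BinaryNode also stores a parent link: a `parent_node` data
    // member with a `parent()` accessor (not shown in the class above)
    if ( !w->is_leaf() || w == root() ) {
        throw std::logic_error("Cannot erase an internal node or the root");
    }
    auto v = w->parent();
    auto z = (w == v->left() ? v->right() : v->left()); // get the sibling of w
    if (v == root()) { // child of root?
        root_node = z; // then make the sibling the new root
        if (z != nullptr) z->parent_node = nullptr;
    } else {
        auto u = v->parent();
        if (v == u->left())  // replace parent by sibling
            u->left_tree = z;
        else
            u->right_tree = z;
        if (z != nullptr) z->parent_node = u;
    }
    // Delete the two nodes directly; clear() would also delete the sibling's sub-tree
    delete w; delete v;
}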

- For removing all the nodes in a tree, call recursive `clear` function on the root

In [25]:
template <typename Type>
void BinaryTree<Type>::clear() {
    if ( root() != nullptr ) {
        root()->clear();
        root_node = nullptr; // avoid a dangling pointer to the deleted root
    }
}

## Application: Expression Trees

- Any basic mathematical expression containing binary operators may be represented using a binary tree

- For example, $3(4a + b + c) + d/5 + (6 - e)$

![](../img/binexp.png)

- Observations
    - Internal nodes store operators
    - Leaf nodes store literals or variables
    - No node has just one sub-tree
    - The order is not relevant for
        - Addition and multiplication (commutative)
    - Order is relevant for
        - Subtraction and division (non-commutative)
    - It is possible to replace non-commutative operators using the unary negation and inversion
        - $(a/b) = a b^{-1}$
        - $(a - b) = a + (-b)$ 

- A post-order depth-first traversal converts such a tree to the reverse-Polish format
  
![](../img/binexp.png)


    3  4  a  ×  b  c  +  +  ×  d  5  ÷  6  e  -  +  +
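
The post-order traversal described above can be sketched as follows. This is an illustration under stated assumptions rather than the traversal used by `BinaryTree.h`: `ExprNode`, `to_rpn` and `rpn` are hypothetical names, and symbols are separated by a single space.

```cpp
#include <string>

// Hypothetical minimal expression node for this sketch
struct ExprNode {
    char symbol;
    ExprNode* left = nullptr;
    ExprNode* right = nullptr;
};

// Post-order: visit the left sub-tree, then the right sub-tree,
// and only then the node itself
void to_rpn( const ExprNode* n, std::string& out ) {
    if ( n == nullptr ) {
        return;
    }

    to_rpn( n->left, out );
    to_rpn( n->right, out );

    if ( !out.empty() ) {
        out += ' ';
    }
    out += n->symbol;
}

// Convenience wrapper returning the reverse-Polish string
std::string rpn( const ExprNode* root ) {
    std::string out;
    to_rpn( root, out );
    return out;
}
```

For the sub-expression $d/5 + (6 - e)$, operands are emitted before the operator that combines them, giving `d 5 / 6 e - +`.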

In [1]:
#include <iostream>
#include "../src/BinaryTree.h"

auto tree = new BinaryTree<char>( new BinaryNode<char>('+') );
auto tmp1 = tree->insert_left('*', tree->root());
tree->insert_left('3', tmp1);
tmp1 = tree->insert_right('+', tmp1);
auto tmp2 = tree->insert_left('*', tmp1);
tree->insert_left('4', tmp2); tree->insert_right('a', tmp2);
tmp2 = tree->insert_right('+', tmp1);
tree->insert_left('b', tmp2); tree->insert_right('c', tmp2);
auto tmp = tree->insert_right('+', tree->root());
tmp1 = tree->insert_left('/', tmp); tree->insert_left('d', tmp1); tree->insert_right('5', tmp1);
tmp2 = tree->insert_right('-', tmp); tree->insert_left('6', tmp2); tree->insert_right('e', tmp2);

In [2]:
std::cout <<  "Tree size: " << tree->size() << std::endl;
std::cout <<  "Tree height: " << tree->height() << std::endl;
std::cout << *tree << std::endl;

Tree size: 17
Tree height: 5
3  4  a  *  b  c  +  +  *  d  5  /  6  e  -  +  + 


In [3]:
tree->clear();
delete tree;

## Run Times

- Recall that with linked lists and arrays, some operations would run in $\Theta(n)$ time

- The run times of operations on binary trees, we will see, depend on the height of the tree

- We will see that:
    - The worst is clearly $\Theta(n)$
    - Under average conditions, the height is $\Theta(\sqrt{n})$
    - The best case is $\Theta(\ln(n))$
    
- If we can achieve and maintain a height $\Theta(\ln(n))$, we will see that many operations can run in $\Theta(\ln(n))$
    - Logarithmic time is not significantly worse than constant time

## Properties of Binary Trees

- Binary trees have several interesting properties dealing with relationships between their heights and number of nodes.

- Let $T$ be a nonempty binary tree, then it has the following properties:

$$h + 1 \leq n \leq 2^{h+1}-1$$
$$1 \leq n_l \leq 2^h$$
$$h \leq n_i \leq 2^h - 1$$
$$\lg(n + 1) - 1 \leq h \leq n - 1$$

- where $n$ is the number of nodes, $n_l$ is the number of external (leaf) nodes, $n_i$ is the number of internal nodes, and $h$ denotes the height of $T$.

- A binary tree is **proper** if each node has either zero or two children.
- If $T$ is proper, then it has the following properties:

$$2h + 1 \leq n \leq 2^{h+1}-1$$
$$h+1 \leq n_l \leq 2^h$$
$$h \leq n_i \leq 2^h - 1$$
$$\lg(n + 1) - 1 \leq h \leq (n - 1)/2$$

- In a nonempty proper binary tree $T$, the number of leaf nodes is one more than the number of internal nodes.
$$ n_l = n_i+1$$
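
The relationship between leaf and internal nodes in a proper binary tree can be checked by counting both kinds of node in one traversal. This is a minimal sketch; `CountNode` and `count_nodes` are hypothetical names introduced for illustration.

```cpp
#include <utility>

// Hypothetical minimal node type, used only for this sketch
struct CountNode {
    int value;
    CountNode* left = nullptr;
    CountNode* right = nullptr;
};

// Returns {number of leaf nodes, number of internal nodes}
std::pair<int, int> count_nodes( const CountNode* n ) {
    if ( n == nullptr ) {
        return { 0, 0 };
    }

    if ( n->left == nullptr && n->right == nullptr ) {
        return { 1, 0 };  // a leaf node
    }

    auto l = count_nodes( n->left );
    auto r = count_nodes( n->right );

    // This node is internal: add it to the internal count
    return { l.first + r.first, l.second + r.second + 1 };
}
```

For the five-node proper tree built earlier (root 1 with children 2 and 3, where 3 has children 4 and 5), this yields three leaf nodes and two internal nodes.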

## Perfect Binary Tree

- Standard definition:
    - A perfect binary tree of height $h$ is a binary tree where
        - All leaf nodes have the same depth $h$
        - All other nodes are full
    
- Recursive definition:
    - A binary tree of height $h = 0$ is perfect
    - A binary tree with height $h > 0$ is perfect if both sub-trees are perfect binary trees of height $h-1$
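
The recursive definition translates directly into a recursive check. The sketch below is one possible formulation, with hypothetical names (`PerfNode`, `perfect_height`, `is_perfect`); it assumes an empty tree has height $-1$ and uses $-2$ as a sentinel meaning "not perfect".

```cpp
// Hypothetical minimal node type, used only for this sketch
struct PerfNode {
    PerfNode* left = nullptr;
    PerfNode* right = nullptr;
};

// Returns the height of the tree rooted at n if it is perfect, or -2 otherwise.
// An empty tree is reported as height -1 so that a single node has height 0.
int perfect_height( const PerfNode* n ) {
    if ( n == nullptr ) {
        return -1;
    }

    int hl = perfect_height( n->left );
    int hr = perfect_height( n->right );

    // Both sub-trees must be perfect and of equal height
    if ( hl == -2 || hr == -2 || hl != hr ) {
        return -2;
    }

    return hl + 1;
}

bool is_perfect( const PerfNode* n ) {
    return perfect_height( n ) != -2;
}
```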

- Perfect binary trees of height h = 0, 1, 2, 3 and 4
![](../img/perfect-bt.png)

## Theorems 

- We will now look at four theorems that describe the properties of perfect binary trees:
    - A perfect tree of height $h$ has $2^{h + 1} - 1$ nodes
    - The height of a tree with $n$ nodes is $\Theta(\ln(n))$
    - There are $2^h$ leaf nodes
    - The average depth of a node is $\Theta(\ln(n))$

- The results of these theorems will allow us to determine the optimal run-time properties of operations on binary trees

## Logarithmic Height

- Theorem
    - A perfect binary tree with $n$ nodes has height $\lg(n + 1) - 1$

- Proof
    - Solving $n = 2^{h + 1} - 1$ for $h$
    
$$n + 1 = 2^{h + 1}$$
$$\lg(n + 1) = h + 1$$
$$h = \lg(n + 1) - 1$$

- Lemma
	$$\lg(n + 1) - 1 = \Theta(\ln(n))$$
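
- Justification (a sketch, assuming the limit test for $\Theta$)
    - Since $\lg(x) = \ln(x)/\ln(2)$, and since $\ln(n+1)/\ln(n) \to 1$ and $1/\ln(n) \to 0$ as $n \to \infty$,

$$\lim_{n \to \infty} \frac{\lg(n + 1) - 1}{\ln(n)} = \lim_{n \to \infty} \left( \frac{1}{\ln(2)} \cdot \frac{\ln(n + 1)}{\ln(n)} - \frac{1}{\ln(n)} \right) = \frac{1}{\ln(2)}$$

    - The limit is a positive finite constant, so both functions grow at the same rate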

## Leaf Nodes

- Theorem
    - A perfect binary tree with height $h$ has $2^h$ leaf nodes

- Proof (by induction)
    - When $h = 0$, there is $2^0 = 1$ leaf node.
    - Assume that a perfect binary tree of height $h$ has $2^h$ leaf nodes and observe that both sub-trees of a perfect binary tree of height $h + 1$ have $2^h$ leaf nodes, for a total of $2 \cdot 2^h = 2^{h+1}$ leaf nodes.

- Consequence
    - Over half of all nodes are leaf nodes
    $$ \frac{2^h}{2^{h+1}-1} > \frac{1}{2}$$

## Node Average Depth

- Consequence
    - There is a roughly 50/50 chance that a randomly selected node is a leaf node

- The average depth of a node in a perfect binary tree is

$$\frac{\sum_{k=0}^h k 2^k }{2^{h+1}-1} = \frac{h2^{h+1} - 2^{h+1} + 2}{2^{h+1}-1} = h-1+\frac{h+1}{2^{h+1}-1} \approx h-1 = \Theta(\ln(n))$$
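
As a sanity check, the average depth can be computed directly from the fact that depth $k$ holds $2^k$ nodes, and compared with the closed form $h - 1 + \frac{h+1}{2^{h+1}-1}$. The function names below are hypothetical and exist only for this sketch.

```cpp
#include <cmath>

// Average depth computed directly: depth k holds 2^k nodes
double average_depth( int h ) {
    double total_depth = 0.0;
    double nodes = 0.0;

    for ( int k = 0; k <= h; ++k ) {
        double at_depth = std::ldexp( 1.0, k );  // 2^k
        total_depth += k * at_depth;
        nodes += at_depth;
    }

    return total_depth / nodes;
}

// Closed form: h - 1 + (h + 1)/(2^(h+1) - 1)
double average_depth_closed( int h ) {
    double n = std::ldexp( 1.0, h + 1 ) - 1.0;  // number of nodes
    return ( h - 1 ) + ( h + 1 ) / n;
}
```

For $h = 1$ the three nodes have depths 0, 1 and 1, so both functions give $2/3$.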

## Applications

- Perfect binary trees are considered to be the *ideal* case
    - The height and average depth are both $\Theta(\ln(n))$

- We will attempt to find trees which are as close as possible to perfect binary trees

## Complete Binary Tree

- A perfect binary tree has ideal properties but is restricted in the number of nodes:  $n = 2^{h + 1} - 1$

- We require binary trees which are
    - Similar to perfect binary trees, but
    - Defined for **all** $n$

- A **complete** binary tree is filled at each depth from left to right    
![](../img/complete-bt.png)

- The order is identical to that of a breadth-first traversal

## Recursive Definition

- A binary tree with a single node is a complete binary tree of height $h = 0$, and a complete binary tree of height $h > 0$ is a tree where either:
    - The left sub-tree is a **complete tree** of height $h - 1$ and the right sub-tree is a *perfect tree* of height $h - 2$, or
    - The left sub-tree is a **perfect tree** of height $h - 1$ and the right sub-tree is a *complete tree* of height $h - 1$

![](../img/complete-bt2.png)

## Height

- Theorem    
    - The height of a complete binary tree with $n$ nodes is $h = \lfloor\lg(n)\rfloor$

- Proof
    - Base case
        - When $n = 1$ then $\lfloor\lg(1)\rfloor = 0$ and a tree with one node is a complete tree with height $h = 0$
    - Inductive step
        - Assume that a complete tree with $n$ nodes has height $\lfloor\lg(n)\rfloor$
        - Must show that $\lfloor\lg(n+1)\rfloor$ gives the height of a complete tree with $n + 1$ nodes
        - Two cases:
            - If the tree with $n$ nodes is perfect, and
            - If the tree with $n$ nodes is complete but not perfect

- Case 1 (the tree is perfect)
    - If it is a perfect tree then
        - Adding one more node must increase the height
    - Before the insertion, it had $n = 2^{h + 1} - 1$ nodes
    $$2^h \leq 2^{h+1}-1 < 2^{h+1}$$
    $$h = \lg(2^h) \leq \lg(2^{h+1}-1) <  \lg(2^{h + 1}) = h+1$$
    $$h \leq \lg(2^{h+1}-1) < h+1$$
    $$\lfloor\lg(n)\rfloor = h$$
    - Thus,
    $$\lfloor\lg(n+1)\rfloor = \lfloor\lg(2^{h+1} -1 + 1)\rfloor = \lfloor\lg(2^{h+1})\rfloor = h+1$$

- Case 2 (the tree is complete but not perfect)
    - If it is not a perfect tree of height $h$ then
    $$2^h \leq n < 2^{h+1}-1$$
    $$2^h+1 \leq n + 1 < 2^{h+1}$$
    $$h = \lg(2^h) < \lg(2^h+1) \leq \lg(n + 1)<  \lg(2^{h + 1}) = h+1$$
    $$h \leq \lg(n + 1) < h+1$$
    - Consequently, the height is unchanged: $\lfloor\lg(n+1)\rfloor = h$
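
The theorem gives a simple way to compute the height of a complete tree from its node count alone. A minimal sketch follows, with `complete_height` as a hypothetical name; it assumes $n \geq 1$ and computes $\lfloor\lg(n)\rfloor$ with integer shifts rather than floating-point logarithms.

```cpp
// Height of a complete binary tree with n >= 1 nodes: floor(lg(n)),
// found by counting how many times n can be halved before reaching 1
int complete_height( unsigned long n ) {
    int h = 0;

    while ( n > 1 ) {
        n >>= 1;
        ++h;
    }

    return h;
}
```

The height only increases when the node count reaches a power of two: 7 nodes give height 2, while 8 nodes give height 3.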

## Array Storage

- We are able to store a complete tree as an array
    - Traverse the tree in breadth-first order, placing the entries into the array
![](../img/bt-array1.png)

- To insert another node while maintaining the complete-binary-tree structure, insert the node into the next array location
    
![](../img/bt-array2.png)

- To remove a node while keeping the complete-tree structure, remove the last element from the array
![](../img/bt-array3.png)

- Leaving the first entry blank yields a bonus
    - The children of the node with index $k$ are in $2k$ and $2k + 1$
    - The parent of the node with index $k$ is in $\lfloor k/2 \rfloor$
![](../img/bt-array4.png)    

- With the first entry left blank, these index calculations reduce to bit operations in C++:
    ```cpp
    parent = k >> 1;
    left_child = k << 1;
    right_child = left_child | 1;
    ```

![](../img/bt-array4.png)    
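
Putting the pieces together, a complete tree stored in an array might look like the sketch below. `CompleteTreeArray` and its member names are hypothetical; the class assumes `int` values and keeps entry 0 unused so that the index formulas above apply.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical array-backed complete binary tree; entry 0 is left unused
class CompleteTreeArray {
    std::vector<int> data{ 0 };  // placeholder occupying entry 0

public:
    std::size_t size() const { return data.size() - 1; }

    // Inserting fills the next breadth-first position
    void push_back( int v ) { data.push_back( v ); }

    // Removing drops the last breadth-first position
    void pop_back() { if ( size() > 0 ) data.pop_back(); }

    int at( std::size_t k ) const { return data.at( k ); }

    // True if index k refers to a node that is currently stored
    bool has( std::size_t k ) const { return k >= 1 && k <= size(); }

    static std::size_t parent( std::size_t k ) { return k >> 1; }
    static std::size_t left_child( std::size_t k ) { return k << 1; }
    static std::size_t right_child( std::size_t k ) { return ( k << 1 ) | 1; }
};
```

After inserting five values, the node in entry 2 has its children in entries 4 and 5, and entry 6 does not yet exist.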

- Q: why not store any tree as an array using breadth-first traversals?

- A: There is a significant potential for a lot of wasted memory

- For example, this tree with 12 nodes would require an array of size 32
    - Adding a child to node K doubles the required memory 
![](../img/bt-array5.png)  

- In the worst case, an exponential amount of memory is required

![](../img/bt-array6.png)  

- These nodes would be stored in entries 1, 3, 6, 13, 26, 52, 105
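
The entries listed above follow from applying the child-index rule at each step: going left doubles the index, and going right doubles it and adds one. The sketch below uses a hypothetical helper, `index_for_path`, that takes the path from the root as a string of `'L'` and `'R'` characters. The indices 1, 3, 6, 13, 26, 52, 105 are consistent with the path right, left, right, left, left, right.

```cpp
#include <cstddef>
#include <string>

// Array index (1-based) of the node reached from the root by following
// a path of 'L' (left) and 'R' (right) steps
std::size_t index_for_path( const std::string& path ) {
    std::size_t k = 1;  // the root is stored in entry 1

    for ( char step : path ) {
        k = ( step == 'L' ) ? 2 * k : 2 * k + 1;
    }

    return k;
}
```

A chain of only six edges therefore already needs an array with at least 106 entries, and each additional level roughly doubles that requirement.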

Based on material provided by Douglas Wilhelm Harder