# Design Doc for Randomized Selective Inference for CART

Yiling Huang (yilingh@umich.edu)

## The Solver

### The `TreeNode` Class

In [None]:
class TreeNode:
    def __init__(self, feature_index=None, threshold=None, pos=None,
                 left=None, right=None, value=None, prev_branch=None,
                 prev_node=None, membership=None, depth=0, 
                 randomization=None, sd_rand=1.):
        self.feature_index = feature_index  # Index of the feature to split on
        self.threshold = threshold  # Threshold value to split on
        self.pos = pos  # Position (the ascending order) of the split value
        self.left = left  # Left child node
        self.right = right  # Right child node
        self.value = value  # Value for leaf nodes (mean of target values)
        self.prev_branch = prev_branch  # List of (j, s, e) depicting a previous branch
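        self.prev_node = prev_node  # Previous node leading to this node
        self.membership = membership  # n-dim indicator vector of the observations in this node
        self.depth = depth  # Depth of the node within the tree
        self.randomization = randomization  # (n_node - 1) x p matrix of realized randomization noise
        self.sd_rand = sd_rand  # Standard deviation of the randomization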

An instance of the `TreeNode` class represents a single node of a tree.
Its constructor takes the following parameters:
1. `feature_index`: the index (j) of the feature used to split the node
2. `threshold`: the covariate value used as the cutoff in the splitting rule
3. `pos`: the split position (s) of the j-th feature, i.e. the rank of the split value in ascending order
4. `left`: the left child of this node
5. `right`: the right child of this node
6. `value`: the mean of Y over the observations in this node
7. `prev_branch`: the previous branch (a list of (j, s, e) triples) that leads to this node
8. `prev_node`: the node immediately preceding this node in the tree
9. `membership`: an $n$-dimensional indicator vector recording whether each observation belongs to this node
10. `depth`: the depth of the node
11. `randomization`: a $(n_{\text{node}}-1) \times p$ matrix recording the realizations of the randomization noise at each split position
12. `sd_rand`: the standard deviation of the randomization
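
To make the parameter list concrete, here is a minimal sketch that builds a root node and two children by hand. It is an illustration only, not the solver's fitting procedure. The toy data, the 0-based split position, the coding of `e` as 0 for left and 1 for right, and reading $n_{\text{node}}$ as the number of observations in the node are all assumptions made for this example.

```python
import numpy as np


class TreeNode:
    # Copied from the class definition above so the sketch runs on its own.
    def __init__(self, feature_index=None, threshold=None, pos=None,
                 left=None, right=None, value=None, prev_branch=None,
                 prev_node=None, membership=None, depth=0,
                 randomization=None, sd_rand=1.):
        self.feature_index = feature_index
        self.threshold = threshold
        self.pos = pos
        self.left = left
        self.right = right
        self.value = value
        self.prev_branch = prev_branch
        self.prev_node = prev_node
        self.membership = membership
        self.depth = depth
        self.randomization = randomization
        self.sd_rand = sd_rand


# Toy data (hypothetical): n observations, p features.
rng = np.random.default_rng(0)
n, p = 6, 2
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Root node: every observation is a member and no branch leads to it.
root = TreeNode(value=y.mean(), prev_branch=[], membership=np.ones(n),
                depth=0, sd_rand=1.)

# One noise draw per (split position, feature) pair.
# Assumption: n_node is the number of observations in the node.
n_node = int(root.membership.sum())
root.randomization = rng.normal(scale=root.sd_rand, size=(n_node - 1, p))

# Hand-picked split for illustration: feature j at 0-based position s
# of its sorted values (the indexing convention is an assumption).
j, s = 0, 2
order = np.argsort(X[:, j])
root.feature_index, root.pos, root.threshold = j, s, X[order[s], j]

# Children inherit membership from the parent, restricted by the split rule.
left_mem = root.membership * (X[:, j] <= root.threshold)
right_mem = root.membership - left_mem

# Assumption: e = 0 marks the left branch and e = 1 the right branch.
root.left = TreeNode(value=y[left_mem.astype(bool)].mean(),
                     prev_branch=root.prev_branch + [(j, s, 0)],
                     prev_node=root, membership=left_mem,
                     depth=root.depth + 1)
root.right = TreeNode(value=y[right_mem.astype(bool)].mean(),
                      prev_branch=root.prev_branch + [(j, s, 1)],
                      prev_node=root, membership=right_mem,
                      depth=root.depth + 1)
```

In this sketch the children's `membership` vectors partition the root's, and each child's `prev_branch` extends its parent's list by one triple, so the path to any node can be read off without walking back through `prev_node`.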

### The `RegressionTree` Class