#### Construction of Index Benefit Graph (from Schnaitter's PhD Thesis, 2011)

***Definition***: The `IBG` of a query $q$ is a `DAG` in which each node $Y$ is a subset of $C$, the set of all relevant indexes that could ever be utilized in the execution of $q$. Node $Y$ also stores the following two quantities:

* $cost(q,Y)$ which is the query optimizer's estimated cost for executing $q$ under configuration $Y$  
* $used(q,Y)$ which is the subset of indexes from $Y$ that are included in the query plan


Recursive algorithm for constructing the IBG:

```python
construct_IBG(q, Y):
    if Y.built:
        return

    # obtain estimated cost and determine indexes used
    Y.cost = cost(q,Y)
    Y.used = used(q,Y)
    Y.built = True
    
    # create children (one for each index in Y.used)
    for a in Y.used:
        create child node: X = Y - {a}   # child node is set Y with index a removed
        X.built = False
        Y.add_child(X)
        # recursively construct IBG on children
        construct_IBG(q, X)

```




```python
# create root node
Y = C
Y.built = False

# call construct_IBG(q, Y)
construct_IBG(q, Y)
```


Several nodes may share the same child. Instead of creating a new node for that child under each parent, we keep a separate hash table of the nodes created so far and reuse an existing node whenever the same configuration comes up again.
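
The node sharing can be sketched on a toy example. Everything below is an assumption made for illustration: the index names `a1`, `a2`, `b1`, their costs, and the `toy_plan` function that stands in for the query optimizer (it picks the cheapest available index for each of two query "slots"). The `nodes` dictionary plays the role of the hash table.

```python
# Toy stand-in for the optimizer (illustrative assumption, not a real cost model):
# every index serves one "slot" of the query, and the plan uses the cheapest
# available index per slot, or falls back to a scan if none is available.
TOY_INDEXES = {"a1": ("A", 10), "a2": ("A", 30), "b1": ("B", 20)}  # name -> (slot, cost)
NO_INDEX_COST = {"A": 100, "B": 100}                               # fallback cost per slot


def toy_plan(config):
    """Return (cost, used) for a configuration given as a frozenset of index names."""
    cost, used = 0, set()
    for slot, fallback in NO_INDEX_COST.items():
        options = [(TOY_INDEXES[i][1], i) for i in config if TOY_INDEXES[i][0] == slot]
        if options:
            best_cost, best = min(options)
            cost += best_cost
            used.add(best)
        else:
            cost += fallback
    return cost, frozenset(used)


def build_ibg(config, nodes):
    """Build the IBG below `config`, reusing nodes already stored in `nodes`."""
    if config in nodes:  # hash-table hit: this node was created via another parent
        return
    cost, used = toy_plan(config)
    children = [config - {a} for a in used]  # one child per used index
    nodes[config] = {"cost": cost, "used": used, "children": children}
    for child in children:
        build_ibg(child, nodes)


nodes = {}
build_ibg(frozenset(TOY_INDEXES), nodes)
print(len(nodes), "IBG nodes for", 2 ** len(TOY_INDEXES), "possible subsets")
```

In this toy setup the node `{a2}` is reached from both `{a2, b1}` and `{a1, a2}`, but it is created and costed only once.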

Once the IBG has been constructed, we can use it to derive $cost(q, X)$ and $used(q, X)$ for any $X \subseteq C$, even if $X$ is not a node in the IBG, as follows. We start from the root node (which contains all indexes in $X$ and possibly some additional ones not in $X$). While the current node $Y$ uses an index $a \notin X$, we move down to the child $Y - \{a\}$. We stop at the first node $Y$ with $used(q,Y) \subseteq X \subseteq Y$. Any indexes of $Y$ that are not in $X$ are unused at that point, so, under the usual assumption that removing an unused index does not change the optimizer's plan, $cost(q,X) = cost(q,Y)$ and $used(q, X) = used(q,Y)$.
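
A minimal sketch of this lookup follows. The IBG is written out by hand for three hypothetical indexes with made-up costs, and the `lookup` function is an illustrative helper, not part of the notebook's code. Children are not stored explicitly, since the child for a used index `a` is always the configuration with `a` removed. The walk stops as soon as every index used at the current node belongs to `X`; indexes in the node that the plan does not use have no child edge and do not affect the cost.

```python
# Hand-written IBG (illustrative assumption): configuration -> (cost, used).
IBG_NODES = {
    frozenset({"a1", "a2", "b1"}): (30, frozenset({"a1", "b1"})),
    frozenset({"a2", "b1"}): (50, frozenset({"a2", "b1"})),
    frozenset({"a1", "a2"}): (110, frozenset({"a1"})),
    frozenset({"b1"}): (120, frozenset({"b1"})),
    frozenset({"a2"}): (130, frozenset({"a2"})),
    frozenset(): (200, frozenset()),
}
ROOT = frozenset({"a1", "a2", "b1"})


def lookup(nodes, root, X):
    """Return (cost, used) for any X that is a subset of root, by walking the IBG."""
    X = frozenset(X)
    if not X <= root:
        raise ValueError("X must be a subset of the candidate set")
    Y = root
    while True:
        cost, used = nodes[Y]
        extra = used - X  # used indexes that X does not contain
        if not extra:     # used(q, Y) is a subset of X: Y answers the question
            return cost, used
        # Follow the child edge for one such index; any element of `extra` works,
        # min() just makes the walk deterministic.
        Y = Y - {min(extra)}


# {a1} is not a node of this IBG, yet its cost is recovered from node {a1, a2}.
print(lookup(IBG_NODES, ROOT, {"a1"}))
```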

So the whole point of the IBG is that it gives us a compact representation of the power set of $C$: for any subset $X$ we can recover $cost(q, X)$ and $used(q, X)$ from the IBG, without having to store those quantities for every possible subset.

(Later on, we will also see how to use the IBG to derive information about index interactions.)



In [2]:
%load_ext autoreload
%autoreload 2

from ssb_qgen_class import *
from pg_utils import *

import time


In [3]:
# create an SSB query generator object
qg = QGEN()

In [18]:
class Node:
    def __init__(self, id, indexes):
        self.id = id
        self.indexes = indexes
        self.children = []
        self.parents = []
        self.built = False
        self.cost = None
        self.used = None


# class for creating and storing the IBG
class IBG:
    def __init__(self, query_string):
        self.q = query_string
        # get all candidate indexes
        self.C = extract_query_indexes(self.q, include_cols=True)
        #print(f"Candidate indexes: {self.C}")
        # map index_id to integer
        self.idx2id = {index.index_id:i for i, index in enumerate(self.C)}
        
        # create a hash table for keeping track of all created nodes
        self.nodes = {}
        # create a root node
        self.root = Node(self.get_configuration_id(self.C), self.C)
        self.nodes[self.root.id] = self.root
        print(f"Created root node with id: {self.root.id}")


    # assign unique string id to a configuration
    def get_configuration_id(self, indexes):
        # get sorted list of integer ids
        ids = sorted([self.idx2id[idx.index_id] for idx in indexes])
        return "_".join([str(i) for i in ids])
    

    def get_cost_used(self, indexes):
        conn = create_connection()
        # create hypothetical indexes
        hypo_indexes = bulk_create_hypothetical_indexes(conn, indexes)
        # map oid to index object
        oid2index = {}
        for i in range(len(hypo_indexes)):
            oid2index[hypo_indexes[i][0]] = indexes[i]
        # get cost and used indexes
        cost, indexes_used = get_query_cost_estimate_hypo_indexes(conn, self.q, show_plan=False)
        # map used index oids to index objects
        used = [oid2index[oid] for oid, scan_type, scan_cost in indexes_used]
        # drop hypothetical indexes
        bulk_drop_hypothetical_indexes(conn)
        close_connection(conn)   
        return cost, used

    # recursive IBG construction algorithm
    def construct_ibg(self, Y):
        if Y.built:
            return 
        
        # obtain the query optimizer's estimated cost and the indexes used in its plan
        cost, used = self.get_cost_used(Y.indexes)
        Y.cost = cost
        Y.used = used
        Y.built = True

        # create children
        for a in Y.used:
            # create a new configuration with index a removed from Y
            X_indexes = [index for index in Y.indexes if index != a]
            X_id = self.get_configuration_id(X_indexes)
            # if X is not in the hash table, create a new node and recursively build it
            if X_id not in self.nodes:
                X = Node(X_id, X_indexes)
                X.parents.append(Y)
                self.nodes[X_id] = X
                Y.children.append(X)
                self.construct_ibg(X)
            else:
                X = self.nodes[X_id]
                Y.children.append(X)
                X.parents.append(Y)


In [19]:
query = qg.generate_query(14)
print(query)

template id: 14, query: 
                SELECT lo_linenumber, lo_quantity, lo_orderdate  
                FROM lineorder
                WHERE lo_linenumber >= 5 AND lo_linenumber <= 6
                AND lo_quantity = 16;
            , payload: {'lineorder': ['lo_linenumber', 'lo_quantity', 'lo_orderdate']}, predicates: {'lineorder': ['lo_linenumber', 'lo_quantity']}, order by: {}, group by: {}


In [20]:
ibg = IBG(query)

Candidate indexes: [<pg_utils.Index object at 0x7fd2ca13e010>, <pg_utils.Index object at 0x7fd2ca162590>, <pg_utils.Index object at 0x7fd2ca161b90>, <pg_utils.Index object at 0x7fd2caf3c390>, <pg_utils.Index object at 0x7fd2caf3d3d0>, <pg_utils.Index object at 0x7fd2c9d61290>, <pg_utils.Index object at 0x7fd2c9d60610>, <pg_utils.Index object at 0x7fd2c9d63c90>, <pg_utils.Index object at 0x7fd2ca1218d0>, <pg_utils.Index object at 0x7fd2ca1208d0>, <pg_utils.Index object at 0x7fd2ca121210>, <pg_utils.Index object at 0x7fd2ca120a90>]
Created root node with id: 0_1_2_3_4_5_6_7_8_9_10_11
