# Generalised LR (GLR)
First introduced in 1965 by Knuth [1], LR is a bottom-up parsing technique that works by constructing an automaton, then traversing it while consuming the input one symbol at a time and maintaining a stack. A more detailed description of LR parsing can be found [here](https://rahul.gopinath.org/post/2024/07/01/lr-parsing/).

Writing a grammar that is LR(1), or even LR(k), can be difficult. Ideally, you want your grammar to be intuitive, easily understood, and readable. This is important because you'll make mistakes or change your mind in the future, and modifying a large grammar while making sure that it stays LR(1) is a painful process. Luckily, a more powerful parser that can handle all context-free grammars, called Generalised LR (GLR), is available. This post outlines the implementation details of two GLR variants (RNGLR and BRNGLR), both of which are presented in Economopoulos's PhD dissertation [2].
### Preliminary
This post assumes that the reader is already familiar with LR parsing and context-free grammar terminology (non-terminal, derivation, nullable, etc.).

The parser implementation here uses the fuzzingbook format for input grammars. For example, a grammar of the form
$$\begin{split} S &\rightarrow A+A\ |\ A-A\\
A &\rightarrow a\ |\ b\end{split}$$
is represented as

In [1]:
sample_grammar = {
	'<S>': [['<A>', '+', '<A>'],
			['<A>', '-', '<A>']],
	'<A>': [['a'], ['b']]
}
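
As a quick illustration of this format (a sketch, not part of the parser itself): each key is a non-terminal wrapped in angle brackets, and each value is a list of alternatives, where an alternative is a list of symbols. The helper name `expansions` is hypothetical and introduced only for this example.

```python
# Illustrative only: walking a grammar in the fuzzingbook format.
sample_grammar = {
    '<S>': [['<A>', '+', '<A>'],
            ['<A>', '-', '<A>']],
    '<A>': [['a'], ['b']],
}

def expansions(grammar):
    """Yield (lhs, rhs) pairs, one per alternative. Hypothetical helper."""
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            yield lhs, rhs

rules = list(expansions(sample_grammar))
for lhs, rhs in rules:
    print(lhs, '->', ' '.join(rhs))

# Terminals are the symbols that are not wrapped in angle brackets.
terminals = sorted({sym for _, rhs in rules for sym in rhs
                    if not (sym.startswith('<') and sym.endswith('>'))})
print(terminals)
```

Keeping each alternative as a list of symbols (rather than a single string) means multi-character terminals need no extra tokenisation step inside the grammar.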

#### Table Generator
This GLR implementation uses an LR(1) parse table as its base. Here we introduce all the components needed for an LR(1) parse table generator.
##### Import

In [2]:
from copy import copy
import csv

##### Types and helper function

In [3]:
Grammar = dict[str, list[list[str]]]
# State in the LR automaton
State = tuple[int, set["Item"]]

LOGGING = False

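def is_nt(k: str) -> bool:
    '''
        Check if k is a non-terminal, i.e. it is wrapped in angle brackets
    '''
    # startswith/endswith are also safe on the empty string, unlike k[0]
    return k.startswith('<') and k.endswith('>')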

##### Item class

In [4]:
class Item:
    def __init__(self, lhs: str, rhs: list[str], dot: int, look_ahead: str):
        '''
            An item is a production of the form A -> a·b
            
            Args:
                lhs: The left-hand side of the production (A)
                rhs: The right-hand side of the production (a b)
                dot: The position of the dot in the production (dot = 0 means A -> ·a b)
                look_ahead: look ahead symbol
        '''
        self.lhs = lhs
        self.rhs = rhs
        self.dot = dot
        self.look_ahead = look_ahead
    
    def __repr__(self):
        return f"{self.lhs} -> {' '.join(self.rhs[:self.dot])} · {' '.join(self.rhs[self.dot:])}, {self.look_ahead}"
        # format: A -> a · b, look_ahead
    
    def __eq__(self, other):
        return self.lhs == other.lhs and self.rhs == other.rhs and self.dot == other.dot and self.look_ahead == other.look_ahead

    def __hash__(self):
        return hash((self.lhs, tuple(self.rhs), self.dot, self.look_ahead))

    def __lt__(self, other):
        '''
            For sorting
        '''
        return (self.lhs, self.rhs, self.dot, self.look_ahead) < \
               (other.lhs, other.rhs, other.dot, other.look_ahead)

##### TableGenerator class

In [5]:
class TableGenerator:
    def __init__(self, grammar: Grammar, start: str):
        '''
            This class is responsible for generating the right-nulled parse table.
        '''

        self.end_of_input = "€"
        self.grammar = grammar
        self.start = self.add_new_start(start)

        self.rule_list = self.reformat_grammar(grammar)
        self.nullable = TableGenerator.get_nullable(grammar)
        self.first = self.init_FIRST(grammar)
        self.follow = self.init_FOLLOW(grammar, self.start)

        self.symbols = self.get_symbols(grammar)

        self.sppf = SPPF(grammar)
    
    def get_symbols(self, grammar: Grammar):
        '''
            Return the list of all terminals, and non-terminals
        '''
        symbols: set[str] = set()
        for nt in grammar:
            for production in grammar[nt]:
                for symbol in production:
                    symbols.add(symbol)
        
        return symbols
    
    def add_new_start(self, start:str):
        '''
            Augment new start symbol <S'>
        '''
        new_start = ''.join([start[:-1], "'", start[-1:]])
        new_production = [[start]]
        self.grammar[new_start] = new_production

        return new_start

    def reformat_grammar(self, grammar: Grammar) -> list[tuple[str, list[str]]]:
        '''
            Transform grammar from a dictionary to a list of rules

            args
                grammar: the given grammar, in dictionary form
            
            return
                A list of rules, a rule is a tuple with lhs and rhs 
        '''

        rule_list = []

        for nt in grammar.keys():
            for production in grammar[nt]:
                rule_list.append((nt, production))
        return rule_list
    
    @staticmethod
    def get_nullable(grammar: Grammar) -> set[str]:
        '''
            Get the nullable set; a non-terminal X is called nullable if X can derive epsilon

            return
                The set of nullable non-terminals
        '''
        res: set[str] = set()
        prev_size = -1
        while (prev_size != len(res)):
            prev_size = len(res)
            for nt in grammar.keys():
                for production in grammar[nt]:
                    if (len(production) == 0):
                        res.add(nt)
                        continue
                    
                    if (all((is_nt(symbol) and symbol in res) for symbol in production)):
                        res.add(nt)
        
        return res
    
    def init_FIRST(self, grammar: Grammar) -> dict[str, set[str]]:
        '''
            Calculate the FIRST set for all symbols
            FIRST(X) is the set of terminals that can start a string derived from X

            return
                A dictionary with all the non-terminal as keys and FIRST sets as values
        '''
        first: dict[str, set[str]] = {}
        for nt in grammar:
            first[nt] = set()
            if nt in self.nullable:
                first[nt].add("epsilon")
        
        changed = True
        while (changed):
            changed = False
            for lhs in grammar:
                for rhs in grammar[lhs]:
                    # Epsilon rule, already handled above
                    if (len(rhs) == 0):
                        continue

                    for symbol in rhs:
                        if (is_nt(symbol)):
                            for char in first[symbol]:
                                if char not in first[lhs]:
                                    first[lhs].add(char)
                                    changed = True
                            
                            if symbol not in self.nullable:
                                break
                        # Is terminal
                        else:
                            if symbol not in first[lhs]:
                                first[lhs].add(symbol)
                                changed = True
                            break
    
        return first
    
    def calculate_FIRST(self, symbol_list: list[str]) -> set[str]:
        '''
            Calculate FIRST(X), where X is one or more non-terminals/terminals sequence

            args
                symbol_list: list of symbols
            return
                The FIRST set
        '''

        if (len(symbol_list) == 0):
            return {"epsilon"}

        res = set()
        nullable = True
        for symbol in symbol_list:
            if is_nt(symbol):
                for char in self.first[symbol]:
                    if char != "epsilon":
                        res.add(char)
                
                if symbol not in self.nullable:
                    nullable = False
                    break
            else:
                nullable = False
                res.add(symbol)
                break

        if nullable:
            res.add("epsilon")
        
        return res

    
    def init_FOLLOW(self, grammar: Grammar, start: str) -> dict[str, set[str]]:
        '''
            Calculate the FOLLOW set for all symbols
            FOLLOW(A) is the set of terminals that can appear immediately after A
            Example: S -> A a B c then a is in FOLLOW(A)

            return
                A dictionary with all the non-terminal as keys and FOLLOW sets as values
        '''
        follow: dict[str, set[str]] = {}
        for nt in grammar:
            follow[nt] = set()
        follow[start].add(self.end_of_input)

        changed = True
        while (changed):
            changed = False
            for lhs in grammar:
                for rhs in grammar[lhs]:
                    for idx, symbol in enumerate(rhs):
                        if not is_nt(symbol):
                            continue
                        
                        # lhs -> ...By
                        # Adding FIRST(y) to FOLLOW(B)
                        # print(symbol + " " + str(rhs[idx + 1:]))
                        first_y = self.calculate_FIRST(rhs[idx + 1:])
                        for char in first_y:
                            if char == "epsilon":
                                continue
                            if char not in follow[symbol]:
                                changed = True
                                follow[symbol].add(char)
                        
                        if "epsilon" in first_y:
                            # Adding all symbol from FOLLOW(lhs) to FOLLOW(B)
                            for char in follow[lhs]:
                                if char not in follow[symbol]:
                                    changed = True
                                    follow[symbol].add(char)
                        
        return follow

    def find_closure(self, items: list[Item]) -> set[Item]:
        '''
            Find the closure of a list of items

            args
                items: item list

            return
                The closure set of input items
        '''
        res = set(items)
        prev_len = -1
        while (prev_len != len(res)):
            prev_len = len(res)
            for item in sorted(list(res.copy())):
                # dot at the end
                if (item.dot == len(item.rhs)):
                    continue

                next_sym = item.rhs[item.dot]
                if not is_nt(next_sym):
                    continue
                
                # X -> α·Yβ, a
                # Calculate FIRST(βa)
                first_set = self.calculate_FIRST(item.rhs[item.dot + 1:] + [item.look_ahead])
                
                for production in self.grammar[next_sym]:
                    for look_ahead in first_set:
                        res.add(Item(next_sym, production, 0, look_ahead))
        return res
    
    def transition(self, state: list[Item], next_sym: str) -> set[Item]:
        '''
            Calculate the transition from a state with <next_sym> edge

            arg
                state: current state
                next_sym: transition edge, can be terminal or non-terminal
            return
                Next state, with all items in a set
        '''        
        items = []
        for item in state:
            if item.dot + 1 > len(item.rhs):
                continue
            if (item.rhs[item.dot] == next_sym):
                new_item = copy(item)
                new_item.dot = new_item.dot + 1
                items.append(new_item)
        
        return self.find_closure(items)
    
    def generate_states(self) -> tuple[list[State], dict[tuple[int, str], int]]:
        '''
            This function generates all the states needed for the automata
                - State format: a tuple (state_id, set of Items)

            return
                A tuple consists of
                - A list of states
                - A GOTO map: a dictionary, with (id, symbol) as keys and next_id as values
                    Example: state_1 --A--> state_2, then GOTO[(1, "A")] = 2
        '''
        # Initial state has [<S'> -> ·<S>, €] and its closure (€ is self.end_of_input)
        initial_state = (0, self.find_closure([Item(self.start, 
                                                    self.grammar[self.start][0], 
                                                    0, 
                                                    self.end_of_input)]))
        states: list[State] = []
        states.append(initial_state)
        unprocessed_states = [initial_state]

        goto_map: dict[tuple[int, str], int] = {}
        while (len(unprocessed_states) > 0):
            top_state_id, top_state = unprocessed_states.pop()
            for item in sorted(list(top_state)):
                if (item.dot == len(item.rhs)):
                    continue

                next_sym = item.rhs[item.dot]
                next_state = self.transition(top_state, next_sym)

                # Check if state already exist
                duplicate = False
                dup_idx = 0
                for idx, state in states:
                    if (all([(item in state) for item in next_state]) and 
                        len(next_state) == len(state)):
                        dup_idx = idx
                        duplicate = True
                        break

                if not duplicate:
                    new_state = (len(states), next_state)
                    states.append(new_state)
                    unprocessed_states.append(new_state)
                    goto_map[(top_state_id, next_sym)] = new_state[0]
                else:
                    goto_map[(top_state_id, next_sym)] = dup_idx

        # Print result
        if (LOGGING):
            for state in states:
                print(f"State {state[0]}")
                for ins in state[1]:
                    print(ins)
                print("")
            
            print("---------------")
            print("Transition map")
            for key, value in goto_map.items():
                print(f"GOTO {key} = {value}")
        
        return (states, goto_map)
    
    def generate_parse_table(self):
        '''
            This function generates an LR(1) parse table for the given grammar

            return
                The parse table in the form of a 2-dimensional dictionary.
                Usage: T[state_id][symbol]; each entry is a list of possible actions, encoded as strings of the form "p<k>", "r<lhs><dot>.<index>" or "acc"
        '''
        states, goto_map = self.generate_states()

        row_entries = [state[0] for state in states]
        column_entries = (list(self.symbols) + [self.end_of_input])
        column_entries.sort()
        
        table: dict[int, dict[str, list[str]]] = {}
        # Init table
        for row in row_entries:
            table[row] = {}
            for col in column_entries:
                table[row][col] = []

        # Add shift and goto
        for state, symbol in goto_map.keys():
            table[state][symbol].append(f"p{goto_map[(state, symbol)]}")

        # Add reduce
        for state_id, state in states:
            for item in sorted(list(state)):
                # Dot at the end
                if (item.dot == len(item.rhs)):
                    if item.lhs == self.start:
                        table[state_id][self.end_of_input].append("acc")
                    else:
                        action = f"r{item.lhs}{item.dot}.{0}"
                        if (item.dot == 0):
                            action = f"r{item.lhs}{item.dot}.{self.sppf.I[item.lhs]}"
                        table[state_id][item.look_ahead].append(action)
                # Right-nulled
                else:
                    right_seq = item.rhs[item.dot:]
                    if (all([sym in self.nullable for sym in right_seq])):
                        label = ''.join(right_seq)
                        action = ""
                        if item.lhs == self.start:
                            action = "acc"
                        else:
                            action = f"r{item.lhs}{item.dot}.{self.sppf.I[label]}"
                        table[state_id][item.look_ahead].append(action)

        # print table
        if (LOGGING):
            print("\nParsing table:\n")
            frmt = "{:>12}" * len(column_entries)
            print(" ", frmt.format(*column_entries), "\n")
            for state_id in row_entries:
                print("{:>3}".format('I' + str(state_id)), end="")
                for symbol in column_entries:
                    print("{:>12}".format("/".join(table[state_id][symbol])), end="")
                print()
        return table
    
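    @staticmethod
    def export_to_csv(table, path: str):
        '''
            Write a parse table (as returned by generate_parse_table) to a CSV file
        '''
        row_entries = list(table.keys())
        symbols = list(table[0].keys())
        header = ["state"] + symbols

        # newline='' lets the csv module control line endings
        with open(path, 'w', newline='') as csv_file: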
            writer = csv.writer(csv_file, delimiter=',')
            writer.writerow(header)
            for state_id in row_entries:
                row = [str(state_id)]
                for symbol in symbols:
                    row.append('/'.join(table[state_id][symbol]))
                writer.writerow(row)
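
The fixpoint iteration behind `get_nullable` can be tried in isolation. The sketch below re-implements the idea as a free function; the name `nullable_set` and the example grammar are assumptions made for illustration, and this is not the class method itself.

```python
# A standalone sketch of the nullable fixpoint; not the class method itself.
def nullable_set(grammar):
    """Return the set of non-terminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            for rhs in alternatives:
                # An empty rhs is an epsilon rule; all() over an empty
                # list is True, so it is covered by the same test.
                # Terminals are never in `nullable`, so they fail the test.
                if all(sym in nullable for sym in rhs):
                    nullable.add(lhs)
                    changed = True
                    break
    return nullable

# Hypothetical grammar: <B> is directly nullable, <A> only through <B>.
example = {
    '<S>': [['<A>', 'x']],
    '<A>': [['<B>', '<B>']],
    '<B>': [['b'], []],
}
print(sorted(nullable_set(example)))
```

The loop repeats until a full pass adds nothing new, which is the same termination condition the class uses by comparing set sizes.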

##### Helper functions to visualise

In [6]:
class ChoiceNode:
    def __init__(self, parent, total):
        self._p, self._chosen = parent, 0
        self._total, self.next = total, None

    def __str__(self):
        return '(%s/%s %s)' % (str(self._chosen),
                               str(self._total), str(self.next))

    def __repr__(self): return repr((self._chosen, self._total))

    def chosen(self): return self._chosen

    def finished(self):
        return self._chosen >= self._total
    
    def increment(self):
        # as soon as we increment, next becomes invalid
        self.next = None
        self._chosen += 1
        if self.finished():
            if self._p is None: return None
            return self._p.increment()
        return self
    

class EnhancedExtractor:
    def __init__(self, forest):
        self.my_forest = forest
        self.choices = ChoiceNode(None, 1)

    def choose_path(self, arr_len, choices):
        if choices.next is not None:
            if choices.next.finished():
                return None, choices.next
        else:
            choices.next = ChoiceNode(choices, arr_len)
        next_choice = choices.next.chosen()
        return next_choice, choices.next
    
    def extract_a_node(self, forest_node, seen, choices):
        if isinstance(forest_node, SPPFNode):
            if not forest_node.children:
                return (forest_node.label, []), choices
            
            packing_node_children = isinstance(forest_node.children[0], PackingNode)

            # PackingNode child
            if packing_node_children:

                child_ind, new_choices = self.choose_path(len(forest_node.children), choices)
                
                # out of choice
                if child_ind is None:
                    return None, new_choices 
                if str(id(forest_node.children[child_ind])) in seen:
                    return None, new_choices
                
                n, newer_choices = self.extract_a_node(forest_node.children[child_ind], 
                                                       seen | {str(id(forest_node.children[child_ind]))}, 
                                                       new_choices)
            
                return (forest_node.label, n), newer_choices
            
            # SPPFNode child
            list_n = []
            for child in forest_node.children:
                n, newer_choices = self.extract_a_node(
                        child, seen | {str(id(child))}, choices)
            
                if n is None: return None, newer_choices
                list_n.append(n)

            return (forest_node.label, list_n), newer_choices


        elif isinstance(forest_node, PackingNode):
            cur_child_ind, new_choices = self.choose_path(len(forest_node.edges), choices)

            # out of choice
            if cur_child_ind is None:
                return None, new_choices
            if str(id(forest_node.edges[cur_child_ind])) in seen:
                return None, new_choices 

            packing_node_children = isinstance(forest_node.edges[0], PackingNode)

            # PackingNode child
            if packing_node_children:

                child_ind, new_choices = self.choose_path(len(forest_node.edges), choices)
                
                # out of choice
                if child_ind is None:
                    return None, new_choices
                if str(id(forest_node.edges[child_ind])) in seen:
                    return None, new_choices

                
                n, newer_choices = self.extract_a_node(forest_node.edges[child_ind], 
                                                       seen | {str(id(forest_node.edges[child_ind]))}, 
                                                       choices)
            
                return n, newer_choices
            
            # SPPFNode child
            list_n = []
            for child in forest_node.edges:
                n, newer_choices = self.extract_a_node(
                        child, seen | {str(id(child))}, choices)
            
                if n is None: return None, newer_choices
                list_n.append(n)

            return list_n, newer_choices
        
    def extract_a_tree(self):
        choices = self.choices
        while not self.choices.finished():
            parse_tree, choices = self.extract_a_node(
                    self.my_forest,
                    set(), self.choices)
            choices.increment()
            if parse_tree is not None:
                return parse_tree
        return None
    


class O:
    def __init__(self, **keys): self.__dict__.update(keys)

OPTIONS   = O(V='│', H='─', L='└', J = '├')

def format_node(node):
    key = node[0]
    if key and (key[0], key[-1]) ==  ('<', '>'): return key
    return repr(key)

def get_children(node):
    return node[1]

def display_tree(node, format_node=format_node, get_children=get_children,
                 options=OPTIONS):
    print(format_node(node))
    for line in format_tree(node, format_node, get_children, options):
        print(line)

def format_tree(node, format_node, get_children, options, prefix=''):
    children = get_children(node)
    if not children: return
    *children, last_child = children
    for child in children:
        next_prefix = prefix + options.V + '   '
        yield from format_child(child, next_prefix, format_node, get_children,
                                options, prefix, False)
    last_prefix = prefix + '    '
    yield from format_child(last_child, last_prefix, format_node, get_children,
                            options, prefix, True)

def format_child(child, next_prefix, format_node, get_children, options,
                 prefix, last):
    sep = (options.L if last else options.J)
    yield prefix + sep + options.H + ' ' + format_node(child)
    yield from format_tree(child, format_node, get_children, options, next_prefix)


## Extending LR Parser
### Eliminating Nondeterminism
If you are familiar with LR, you probably know about *shift/reduce conflicts* (choices between shifting and reducing) and *reduce/reduce conflicts* (choices between reducing by different rules). A normal LR parser cannot handle conflicts, because it does not know which choice to make. What we can do is incorporate a bit of breadth-first search so that the parser tries all options; that is the main idea behind GLR.

For example, consider the following ambiguous grammar:
$$\begin{split} S &\rightarrow a \ B \ c & \hspace{1cm} (1)\\
S &\rightarrow a\ D \ c &\hspace{1cm} (2) \\
B &\rightarrow b &\hspace{1cm} (3) \\
D & \rightarrow b &\hspace{1cm} (4)\end{split}$$
For this grammar, the LR(1) automaton is as below:
![LR(1) automaton](images/lr1_gram.png)
And the LR(1) parse table is:

| state | a   | b   | c               | $       | S   | B   | D   |
| ----- | --- | --- | --------------- | ------- | --- | --- | --- |
| 0     | p2  |     |                 |         | p1  |     |     |
| 1     |     |     |                 | acc     |     |     |     |
| 2     |     | p4  |                 |         |     | p5  | p3  |
| 3     |     |     | p7              |         |     |     |     |
| 4     |     |     | r(B, 3)/r(D, 4) |         |     |     |     |
| 5     |     |     | p6              |         |     |     |     |
| 6     |     |     |                 | r(S, 1) |     |     |     |
| 7     |     |     |                 | r(S, 2) |     |     |     |

In this table, $pk$ is the shift action, meaning "go to state $k$" (the same notation is used for the goto entries in the non-terminal columns), and $r(X, m)$ is the reduce action, meaning "reduce to symbol $X$ using rule number $m$". The symbol $\$$ denotes the end of the input. There is a reduce/reduce conflict in state 4. Let's see what happens when we try to parse the string "$abc$".

| Step | Input | State | Stack                     | Next operation    |
| ---- | ----- | ----- | ------------------------- | ----------------- |
| 0    | ""    | 0     | $\$, S_0$                 | $p2$              |
| 1    | "a"   | 2     | $\$, S_0, a, S_2$         | $p4$              |
| 2    | "ab"  | 4     | $\$, S_0, a, S_2, b, S_4$ | $r(B, 3)/r(D, 4)$ |

An ordinary LR parser now has to choose between two possible reductions ($B \rightarrow b$ and $D \rightarrow b$). A GLR parser can instead try all options, but how? The simplest solution is to duplicate the stack and treat each copy as a separate process. After performing $r(B, 3)$ the stack is $\{\$, S_0, a, S_2, B, S_5\}$; similarly, we obtain $\{\$, S_0, a, S_2, D, S_3\}$ when $r(D, 4)$ is applied. Now we have two stacks to manage, and the parse can continue with both of them. However, this approach is not ideal: the number of stacks can grow exponentially, so we need something more efficient.
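
The stack-copying strategy can be sketched in a few lines. This is only an illustration under stated assumptions: the `GOTO` and `RULES` dictionaries are transcribed by hand from the parse table above, the bottom marker $\$$ is omitted, and `reduce_on_copy` is a hypothetical helper name.

```python
# Sketch of the naive "copy the stack" strategy for state 4 of the example.
# GOTO entries and rule data are transcribed from the parse table above.
GOTO = {(2, 'B'): 5, (2, 'D'): 3}
RULES = {3: ('B', ['b']), 4: ('D', ['b'])}   # rule number -> (lhs, rhs)

def reduce_on_copy(stack, rule_no):
    """Apply one reduction to a copy of the stack and return the copy."""
    lhs, rhs = RULES[rule_no]
    new_stack = list(stack)
    # Each grammar symbol on the stack is paired with a state,
    # so popping |rhs| symbols removes 2 * |rhs| entries.
    del new_stack[len(new_stack) - 2 * len(rhs):]
    exposed_state = new_stack[-1]
    new_stack += [lhs, GOTO[(exposed_state, lhs)]]
    return new_stack

stack = [0, 'a', 2, 'b', 4]          # after shifting "a" and "b"
conflicting = [3, 4]                 # r(B, 3) / r(D, 4)
stacks = [reduce_on_copy(stack, r) for r in conflicting]
for s in stacks:
    print(s)
```

Every conflict multiplies the number of live stacks, and each copy repeats the shared prefix `[0, 'a', 2]`, which is exactly the redundancy a shared structure can avoid.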
### Graph-Structured Stack (GSS)
In the above example, notice that the first four elements are the same in both stacks, so we can share them in a single data structure. This is the *Graph-Structured Stack* (GSS), proposed by Tomita in his book [2]. 
![GSS_example](GSS_exam.png)

This image illustrates how the common prefix (states $S_0$ and $S_2$) is shared between the two stacks. As the name suggests, our stack is now a single graph, and each element in the stack is a node. In Tomita's original approach, the elements $a$, $D$ and $B$ are individual nodes, but here we simplify by making them edge labels between states. Each node carries a label, which is a state in the LR automaton (for example, node $v_4$ has label $S_3$, i.e. state 3 in the automaton).

The nodes are divided into $n+1$ *levels*, where $n$ is the length of the input string. We call $U_i$ the set of nodes in level $i$; in the above example, $U_0 = \{v_1\}$ and $U_1 = \{v_2, v_3, v_4\}$. The GSS is constructed level by level, and a new level is created upon a *shift* action. The GSSNode data structure is as follows:

In [7]:
class GSSNode:
    '''
        Represent a node in the GSS structure, nodes are identified by id
    '''
    def __init__(self, level: int, id: int, label):
        self.level = level
        self.id = id
        self.label = label
        self.children: list[tuple['GSSNode', 'SPPFNode']] = []

    def __repr__(self):
        return f"Node(v{self.id}, {self.label})"

    def __eq__(self, other):
        return self.id == other.id

    def __hash__(self):
        return hash(self.id)

    def add_child(self, child: 'GSSNode', edge):
        self.children.append((child, edge))

The GSS is a bit unusual. It does not perform the "pop" operation like an ordinary stack. Once a node is created, it is never removed. Instead of popping $m$ nodes out of the stack, we perform a traversal of length $m$ from the original node. For example, instead of popping 2 elements from node $v_4$, we traverse down the graph with length 2 and find that node $v_1$ is our target. We define a method to perform this operation:

In [8]:
class GSSNode(GSSNode):
    def find_paths_with_length(self, m: int) -> set[tuple['GSSNode',...]]:
        '''
            Find all paths of length m starting from this node.
            Returns a set of tuples; each tuple holds the m edge labels along a path, followed by the destination node
        '''
        
        res: set[tuple] = set()
        def dfs(node: GSSNode, path: list[GSSNode]):
            if (len(path) >= m):
                res.add(tuple(path + [node]))
                return

            for child, edge in node.children:
                dfs(child, path + [edge])

        dfs(self, [])
        return res
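
To make the traversal concrete, here is a small self-contained sketch. It uses a stripped-down stand-in class (`Node`, a hypothetical name) rather than `GSSNode`, and plain strings as edge labels instead of SPPF nodes. The graph is an assumption transcribed from the two example stacks, with state numbers taken from the parse table.

```python
class Node:
    """Minimal stand-in for a GSS node: a state label plus outgoing edges."""
    def __init__(self, label):
        self.label = label
        self.children = []          # list of (child_node, edge_label)

    def __repr__(self):
        return f"Node({self.label})"

def paths_of_length(start, m):
    """Return all (edge_1, ..., edge_m, destination) tuples from start."""
    found = []
    def dfs(node, edges):
        if len(edges) == m:
            found.append((*edges, node))
            return
        for child, edge in node.children:
            dfs(child, edges + [edge])
    dfs(start, [])
    return found

# State 0 at the bottom, state 2 reached via 'a', then the two
# alternatives: state 5 via 'B' and state 3 via 'D'.
s0, s2, s3, s5 = Node(0), Node(2), Node(3), Node(5)
s2.children.append((s0, 'a'))
s5.children.append((s2, 'B'))
s3.children.append((s2, 'D'))

print(paths_of_length(s3, 2))
print(paths_of_length(s5, 1))
```

Because nothing is removed from the graph, the same shared node (`s2` here) can serve as the target of traversals started from several different stack tops.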

The `find_paths_with_length()` method needs no explicit cycle detection: each recursive call extends the path by one edge and the search stops at length $m$, so it always terminates. Finally, we can define our GSS class:

In [9]:
class GSS:
    '''
        A Graph Structured Stack (GSS)
    '''
    def __init__(self):
        '''
            Initialize the graph. In RNGLR, a GSS has n + 1 levels, where n is the length of the input string

            Each level is a set of GSSNodes, levels are stored in a list
        '''
        self.level: list[set[GSSNode]] = []
        self.count = 0

    def resize(self, n: int):
        '''
            Resize the GSS to include n levels
        '''
        self.level = [set() for i in range(n)]

    def create_node(self, label, level: int):
        '''
            Create a new node with label in a specific level
        '''
        new_node = GSSNode(level, self.count, label)
        self.count += 1
        self.level[level].add(new_node)
        return new_node
    
    def find_node(self, label, level: int) -> GSSNode:
        '''
            Find a node with label and in a specific level

            return
                GSSNode object if found, else None is returned
        '''
        # Can be optimized further
        for node in self.level[level]:
            if (node.label == label):
                return node
        return None
    
    def __repr__(self):
        '''
            Print the GSS structure
        '''
        out = "GSS:\n"
        for idx, level in enumerate(self.level):
            out += f"Level {idx}:\n"
            for node in level:
                out += f"    {node}\n"
                for child, edge in node.children:
                    out += f"        {child} - {edge}\n"
        return out

### Shared Packed Parse Forest (SPPF)
For practical usage, we are more interested in a full parser rather than just a recogniser. While a recogniser's output is simply a yes/no answer, a parser has to provide a full derivation path (usually in the form of the parse tree). However, a parse tree is insufficient because we are dealing with all context-free grammars, which includes ambiguous grammars; thus, multiple derivations (or even infinite ones) are possible. Instead of a parse tree, a data structure called *Shared Packed Parse Forest* (SPPF) is used.

Consider the string "abc" in the above example. It has two possible derivations, resulting in two parse trees:
![parse tree](images/Parse_tree.png)
In an SPPF, we combine them into a single graph. The final result looks like this:
![sppf](images/SPPF.png)

Nodes like "S", "a", "b" and "c" are shared to reduce space. Since $S$ can be derived in two ways (either $S\rightarrow a\ B\ c$ or $S\rightarrow a\ D\ c$), two new black nodes are created to represent different choices. These are called **packing nodes**.
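
The idea of sharing and packing can be sketched with plain dictionaries. This is a simplified model, not the classes used in this post: every symbol node maps to a list of alternatives (one per packing node, even when there is only one), keys are `(label, start_position)` pairs, and the function `trees` is a hypothetical helper that assumes the forest has no cycles.

```python
from itertools import product

# Sketch: an SPPF as plain dicts. A symbol node maps to a list of
# alternatives ("packing nodes"); each alternative is a list of child keys.
# Leaves have no alternatives. Keys are (label, start_position).
sppf = {
    ('S', 0): [[('a', 0), ('B', 1), ('c', 2)],
               [('a', 0), ('D', 1), ('c', 2)]],
    ('B', 1): [[('b', 1)]],
    ('D', 1): [[('b', 1)]],
    ('a', 0): [], ('b', 1): [], ('c', 2): [],
}

def trees(key):
    """Enumerate every parse tree rooted at key as (label, children)."""
    alternatives = sppf[key]
    if not alternatives:                      # terminal leaf
        return [(key[0], [])]
    result = []
    for alt in alternatives:                  # one packing node per alternative
        for children in product(*(trees(child) for child in alt)):
            result.append((key[0], list(children)))
    return result

all_trees = trees(('S', 0))
print(len(all_trees))
for t in all_trees:
    print(t)
```

Both alternatives of `('S', 0)` refer to the same `('a', 0)`, `('b', 1)` and `('c', 2)` entries, which is the sharing that keeps the forest compact.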


In [10]:
class PackingNode:
    def __init__(self):
        self.edges = []

    def add_edge(self, node):
        self.edges.append(node)

    def __repr__(self):
        return f"PackingNode({self.edges})"

SPPF nodes are identified by `(label, start_position)`. Each node represents one symbol in the derivation process, and its children are the symbols produced by the corresponding derivation step. The `start_position` parameter distinguishes between occurrences of the same symbol at different positions in the input.

In [11]:
class SPPFNode:
    def __init__(self, id: int, label: str, start_pos:int = -1):
        '''
            start_pos = -1 means the node is in epsilon-SPPF
        '''
        self.id = id
        self.label = label
        self.start_pos = start_pos
        self.children: list['SPPFNode' | PackingNode] = []
    
    def add_child(self, node):
        self.children.append(node)
    
    def check_sequence_exists(self, nodes: list['SPPFNode']) -> bool:
        '''
            Check if a sequence of nodes already exists in the current node
        '''
        
        # If packing nodes exist
        if any(isinstance(child, PackingNode) for child in self.children):
            for child in self.children:
                if child.edges == nodes:
                    return True
            return False
        
        # No packing nodes case
        return self.children == nodes
    
    # Nodes are identified by (label, start_pos)
    def __hash__(self):
        return hash((self.label, self.start_pos))

    def __eq__(self, other):
        return (self.label == other.label and self.start_pos == other.start_pos)

    def __repr__(self):
        if self.label == "":
            return f"SPPF Node:(blank)"
        return f"SPPF Node:({self.label}, {self.start_pos})"

Our `SPPFNode` class also needs an `add_children()` method, whose purpose is to maintain the following properties for every node in the SPPF:
- Each choice is unique, there is no overlapping choice.
- If there is only one possible choice, no *packing node* is used.
- If there are at least 2 choices, all choices must be wrapped in *packing nodes*.

In [12]:
class SPPFNode(SPPFNode):
    def add_children(self, nodes: list['SPPFNode']):
        '''
            Add a list of nodes to the current node
        '''
        if len(self.children) == 0:
            for node in nodes:
                self.add_child(node)
            return
        
        # If already exists, we skip
        if self.check_sequence_exists(nodes):
            return
        
        # No packing node yet
        if not isinstance(self.children[0], PackingNode):
            z = PackingNode()
            for child in self.children:
                z.add_edge(child)
            self.children = [z]
        
        t = PackingNode()
        for node in nodes:
            t.add_edge(node)
        self.children.append(t)

And finally, we can define the SPPF class

In [13]:
class SPPF:
    def __init__(self, grammar: Grammar):
        self.grammar = grammar

        # Two dictionary node_id -> Node and node_label -> node_id
        self.epsilon_sppf, self.I = self.build_epsilon_sppf()

        self.nodes: list[SPPFNode] = []
        self.counter = 0
    
    def create_node(self, label: str, start_pos: int) -> SPPFNode:
        node = SPPFNode(self.counter, label, start_pos)
        self.counter += 1
        self.nodes.append(node)
        return node

    def __repr__(self):
        repr = "SPPF:\n"
        for node in self.nodes:
            if node.label == "":
                repr += f"    blank\n"
            else:
                repr += f"    {node.label}-{node.start_pos}\n"
            for child in node.children:
                if isinstance(child, PackingNode):
                    repr += f"        PackingNode\n"
                    for edge in child.edges:
                        repr += f"            {edge}\n"
                else: repr += f"        {child}\n"
        return repr

#### Epsilon SPPF
An SPPF tree for a nullable string or symbol is called an *epsilon-SPPF* (or $\epsilon$-SPPF). We precompute the $\epsilon$-SPPF trees for all nullable non-terminals ($A \overset{*}\rightarrow \epsilon$), because the parser needs them later. In addition to non-terminals, we also build an $\epsilon$-SPPF tree for every string $\beta$ such that $\beta\overset{*}\rightarrow \epsilon$ and there exists a rule $A \rightarrow \alpha \beta$ ($\alpha \neq \epsilon$) in the grammar. Such a string $\beta$ is also called *right-nullable*. Finally, we define $I$ as an index function that accepts a non-terminal or string and returns the id of the corresponding $\epsilon$-SPPF root.

Let's look at an example, grammar 5.3 in the dissertation by Economopoulos:
$$\begin{split} S &\rightarrow a \ B \ B \ C\\
B &\rightarrow b \ |\ \epsilon \\
C & \rightarrow \epsilon\end{split}$$

In [14]:
grammar_53 = {
    "<S>": [["a", "<B>", "<B>", "<C>"]],
    "<B>": [["b"], []],
    "<C>": [[]]
}
start_symbol="<S>"

We have $B$ and $C$ as nullable non-terminals, and the strings $BBC$ and $BC$ satisfy the conditions for $\beta$. Therefore we build the $\epsilon$-SPPF for $B$, $C$, $BBC$ and $BC$:
![sppf_epsilon](images/SPPFepsilon.png)
In this tree, vertices are indexed from 1 to 4, hence, our $I$ function can be defined with $I(B) = 1$, $I(C) = 2$, $I(BBC)=3$ and $I(BC)=4$.
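
As a rough, standalone illustration of which labels end up in the $\epsilon$-SPPF, the sketch below computes the nullable non-terminals with a fixed-point loop and then lists the nullable suffixes $\beta$ of each rule. The helper names `nullable_nonterminals` and `right_nullable_suffixes` are hypothetical; the notebook itself relies on `TableGenerator.get_nullable`, whose implementation may differ.

```python
Grammar = dict[str, list[list[str]]]

def nullable_nonterminals(grammar: Grammar) -> set[str]:
    """Fixed-point computation of the non-terminals that derive epsilon."""
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rules in grammar.items():
            if lhs in nullable:
                continue
            # An empty right-hand side is vacuously "all nullable"
            if any(all(sym in nullable for sym in rhs) for rhs in rules):
                nullable.add(lhs)
                changed = True
    return nullable

def right_nullable_suffixes(grammar: Grammar) -> list[str]:
    """Labels of the suffixes beta (with a non-empty alpha before them) that derive epsilon."""
    nullable = nullable_nonterminals(grammar)
    labels: list[str] = []
    for rules in grammar.values():
        for rhs in rules:
            for i in range(1, len(rhs)):          # i >= 1 keeps alpha non-empty
                suffix = rhs[i:]
                label = "".join(suffix)
                if all(sym in nullable for sym in suffix) and label not in labels:
                    labels.append(label)
    return labels

grammar_53: Grammar = {
    "<S>": [["a", "<B>", "<B>", "<C>"]],
    "<B>": [["b"], []],
    "<C>": [[]],
}

print(sorted(nullable_nonterminals(grammar_53)))  # ['<B>', '<C>']
print(right_nullable_suffixes(grammar_53))        # ['<B><B><C>', '<B><C>', '<C>']
```

The suffix `<C>` is a single nullable non-terminal, so it shares the node already created for $C$; only `<B><B><C>` and `<B><C>` need new roots, which matches the four indexed vertices above.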

In [15]:
class SPPF(SPPF):
    def build_epsilon_sppf(self) -> tuple[dict[int, SPPFNode], dict[str, int]]:
        '''
            Build an epsilon-SPPF tree

            return
                A tuple that contains
                - All SPPFNodes created, stored in a dict
                - The I function dictionary
        '''
        # key: node_id, value: SPPF Node
        epsilon_sppf: dict[int, SPPFNode] = {}

        # Create epsilon node
        eps_node = SPPFNode(0, "epsilon")
        epsilon_sppf[0] = eps_node
        counter = 1

        # Find a given node with label
        node_with_label: dict[str, SPPFNode] = {}

        nullable = TableGenerator.get_nullable(self.grammar)

        # Step 1, add all nullable symbols
        # Sorted to guarantee determinism
        for nt in sorted(nullable):
            node = SPPFNode(counter, nt)
            epsilon_sppf[counter] = node
            node_with_label[nt] = node
            counter += 1

        for lhs in self.grammar:
            for rhs in self.grammar[lhs]:
                # Epsilon rule
                if len(rhs) == 0:
                    node_with_label[lhs].add_child(eps_node)
                # Total nullable
                elif all(x in nullable for x in rhs):
                    node = PackingNode()
                    for nt in rhs:
                        node.add_edge(node_with_label[nt])
                    node_with_label[lhs].add_child(node)
                # Partial nullable
                else:
                    for i in range(1, len(rhs)):
                        partial_rhs = rhs[i:]
                        if len(partial_rhs) == 0:
                            continue

                        if all(x in nullable for x in partial_rhs):
                            label = ''.join(partial_rhs)
                            if label in node_with_label:
                                continue
                            node = SPPFNode(counter, label)
                            for x in partial_rhs:
                                node.add_child(node_with_label[x])
                            node_with_label[label] = node
                            epsilon_sppf[counter] = node
                            counter += 1

        # Construct the I indexing map label -> node_id
        I: dict[str, int] = {}
        for label, node in node_with_label.items():
            I[label] = node.id

        return (epsilon_sppf, I)

Using this code to build the $\epsilon$-SPPF for grammar 5.3:

In [16]:
sppf = SPPF(grammar_53)
for node_id, node in sppf.epsilon_sppf.items():
    print(f"{node_id}: {node.label}")
    for child in node.children:
        print(f"    {child.label}")

0: epsilon
1: <B>
    epsilon
2: <C>
    epsilon
3: <B><B><C>
    <B>
    <B>
    <C>
4: <B><C>
    <B>
    <C>


## Right-Nulled GLR (RNGLR)
In his book, Tomita introduced 4 different algorithms. The first one only works for grammars without $\epsilon$-rules. Algorithms 2 and 3 were intended to handle $\epsilon$-rules but fail on grammars with hidden left recursion. Algorithm 4 (which is the full parser) inherits the same problem from algorithms 2 and 3. RNGLR extends algorithm 1 to handle grammars with $\epsilon$-rules.
#### Right-nulled parse table
Our algorithm uses a slightly modified parse table, which is neither LR(1) nor LALR(1). This table is built upon the usual LR table, but adds new reductions for *right-nullable* rules; therefore it is called a *right-nulled parse table*. A *right-nullable* rule has the form $A\rightarrow \alpha\beta$, where $\beta$ can derive $\epsilon$. If an *item* is of the form ($A\rightarrow \alpha \cdot\beta, a$) with $\beta \overset{*}\rightarrow \epsilon$, we write $r(A, m, f)$ into the parse table under lookahead $a$, with $m=|\alpha|$, and $f=I(\beta)$ if $m\neq0$ or $f=I(A)$ if $m=0$. For an ordinary completed item, where $m\neq0$ and $\beta = \epsilon$, we write $f=0$.

Back to grammar 5.3 
$$\begin{split} S &\rightarrow a \ B \ B \ C\\
B &\rightarrow b \ |\ \epsilon \\
C & \rightarrow \epsilon\end{split}$$
![automaton](images/grammar_53_auto.png)

The regular LR(1) parse table for this grammar is 

| State | B   | C   | S   | a   | b             | $          |
| ----- | --- | --- | --- | --- | ------------- | ---------- |
| 0     |     |     | p1  | p2  |               |            |
| 1     |     |     |     |     |               | acc        |
| 2     | p4  |     |     |     | p3/r(B, 0, 1) | r(B, 0, 1) |
| 3     |     |     |     |     | r(B, 1, 0)    | r(B, 1, 0) |
| 4     | p6  |     |     |     | p5            | r(B, 0, 1) |
| 5     |     |     |     |     |               | r(B, 1, 0) |
| 6     |     | p7  |     |     |               | r(C, 0, 2) |
| 7     |     |     |     |     |               | r(S, 4, 0) |

At cell $T(6, \$)$, the parser performs the reduction $C \rightarrow \epsilon$. In this case $m = |\alpha| = 0$ and $I(C) = 2$, hence we write $r(C, 0, 2)$. To form a *right-nulled* parse table, we need to add more reductions for right-nullable items. As the strings $BBC$, $BC$ and $C$ are nullable, such items in this case are $S\rightarrow a\cdot B\ B\ C$, $S\rightarrow a\ B\cdot B\ C$ and $S\rightarrow a\ B\ B \cdot C$. Three new reductions are added to the *right-nulled table*:

| State | B   | C   | S   | a   | b             | $                     |
| ----- | --- | --- | --- | --- | ------------- | --------------------- |
| 0     |     |     | p1  | p2  |               |                       |
| 1     |     |     |     |     |               | acc                   |
| 2     | p4  |     |     |     | p3/r(B, 0, 1) | r(B, 0, 1)/r(S, 1, 3) |
| 3     |     |     |     |     | r(B, 1, 0)    | r(B, 1, 0)            |
| 4     | p6  |     |     |     | p5            | r(B, 0, 1)/r(S, 2, 4) |
| 5     |     |     |     |     |               | r(B, 1, 0)            |
| 6     |     | p7  |     |     |               | r(C, 0, 2)/r(S, 3, 2) |
| 7     |     |     |     |     |               | r(S, 4, 0)            |
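
The rule for choosing $m$ and $f$ can be sketched as a small standalone function. This is only an illustration of the definition given in the text, not the notebook's table generator: `reduction_entry` is a hypothetical name, and the `nullable` set and `I` map are typed in by hand from the grammar 5.3 example.

```python
from typing import Optional

def reduction_entry(lhs: str, rhs: list[str], dot: int,
                    nullable: set[str], I: dict[str, int]) -> Optional[tuple[str, int, int]]:
    """Return (A, m, f) for the item A -> alpha . beta, or None if beta is not nullable."""
    beta = rhs[dot:]
    if not all(sym in nullable for sym in beta):
        return None                  # beta cannot vanish, so no reduction here
    m = dot                          # m = |alpha|
    if m == 0:
        f = I[lhs]                   # nothing on the stack: use the epsilon-SPPF of A
    elif beta:
        f = I["".join(beta)]         # right-nulled part: epsilon-SPPF of beta
    else:
        f = 0                        # ordinary completed item, nothing was nulled
    return (lhs, m, f)

# Values taken from the grammar 5.3 example (hand-written here, not computed)
nullable = {"<B>", "<C>"}
I = {"<B>": 1, "<C>": 2, "<B><B><C>": 3, "<B><C>": 4}
s_rule = ["a", "<B>", "<B>", "<C>"]

for dot in range(len(s_rule) + 1):
    print(dot, reduction_entry("<S>", s_rule, dot, nullable, I))
```

For dot positions 1, 2, 3 and 4 this yields `r(S, 1, 3)`, `r(S, 2, 4)`, `r(S, 3, 2)` and `r(S, 4, 0)`, the same entries as in the right-nulled table. Position 0 yields no reduction because `a` is not nullable.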

The following method generates the right-nulled parse table:

In [17]:
class TableGenerator(TableGenerator):
    def generate_parse_table(self):
        '''
            This function generates a right-nulled LR(1) parse table for the given grammar

            return
                The parse table in the form of a 2-dimensional dictionary.
                Usage: T[<state_id>][<symbol>], each entry is a list of possible actions: "pk", "r<A>m.f" or "acc"
        '''
        states, goto_map = self.generate_states()

        row_entries = [state[0] for state in states]
        column_entries = (list(self.symbols) + [self.end_of_input])
        column_entries.sort()
        
        table: dict[int, dict[str, list[str]]] = {}
        # Init table
        for row in row_entries:
            table[row] = {}
            for col in column_entries:
                table[row][col] = []

        # Add shift and goto
        for state, symbol in goto_map.keys():
            table[state][symbol].append(f"p{goto_map[(state, symbol)]}")

        # Add reduce
        for state_id, state in states:
            for item in state:
                # Dot at the end
                if (item.dot == len(item.rhs)):
                    if item.lhs == self.start:
                        table[state_id][self.end_of_input].append("acc")
                    else:
                        action = f"r{item.lhs}{item.dot}.{0}"
                        if (item.dot == 0):
                            action = f"r{item.lhs}{item.dot}.{self.sppf.I[item.lhs]}"
                        table[state_id][item.look_ahead].append(action)
                # Right-nulled
                else:
                    right_seq = item.rhs[item.dot:]
                    if (all([sym in self.nullable for sym in right_seq])):
                        label = ''.join(right_seq)
                        action = ""
                        if item.lhs == self.start:
                            action = "acc"
                        else:
                            action = f"r{item.lhs}{item.dot}.{self.sppf.I[label]}"
                        table[state_id][item.look_ahead].append(action)

        # print table
        if (LOGGING):
            print("\nParsing table:\n")
            frmt = "{:<12}" * len(column_entries)
            print("    ", frmt.format(*column_entries), "\n")
            ptr = 0
            for state_id in row_entries:
                print(f"{{:<5}}".format('I'+str(state_id)), end="")
                for symbol in column_entries:
                    list_opp = []
                    for opp in table[state_id][symbol]:
                        word = ""
                        word += opp
                        list_opp.append(word)
                    print(f"{{:<12}}".format("/".join(list_opp)), end="")
                print()
        return table

Let us use it to generate a right-nulled parse table for grammar 5.3:

In [18]:
LOGGING = True

generator = TableGenerator(grammar_53, start_symbol)
table = generator.generate_parse_table()

State 0
<S'> ->  · <S>, €
<S> ->  · a <B> <B> <C>, €

State 1
<S'> -> <S> · , €

State 2
<B> ->  · b, b
<B> ->  · , b
<B> ->  · b, €
<S> -> a · <B> <B> <C>, €
<B> ->  · , €

State 3
<B> -> b · , b
<B> -> b · , €

State 4
<S> -> a <B> · <B> <C>, €
<B> ->  · b, €
<B> ->  · , €

State 5
<B> -> b · , €

State 6
<C> ->  · , €
<S> -> a <B> <B> · <C>, €

State 7
<S> -> a <B> <B> <C> · , €

---------------
Transition map
GOTO (0, '<S>') = 1
GOTO (0, 'a') = 2
GOTO (2, 'b') = 3
GOTO (2, '<B>') = 4
GOTO (4, 'b') = 5
GOTO (4, '<B>') = 6
GOTO (6, '<C>') = 7

Parsing table:

     <B>         <C>         <S>         a           b           €            

I0                           p1          p2                                  
I1                                                               acc         
I2   p4                                              p3/r<B>0.1  r<S>1.3/r<B>0.1
I3                                                   r<B>1.0     r<B>1.0     
I4   p6                              

In the code, reductions are stored as strings of the form `r<B>m.f`. The non-terminal `B` is enclosed in `<>` following the fuzzingbook format, and $m$ and $f$ are separated by a dot. We define a function to parse these action strings.

In [19]:
def get_action(action: str) -> tuple[str, ...]:
        '''
            Parse the action string
            
            args
                action: the action string, either "acc", "pk" or "r<A>m.f"
            
            return
                - acc -> ("acc",)
                - pk -> ("p", k)
                - r<A>m.f -> ("r", "<A>", m, f)
                - k, m, f are integers
        '''
        if (action == "acc"):
            return ("acc",)
        action_char = action[0]
        assert(action_char == 'p' or action_char == 'r')

        if (action_char == 'p'):
            number = int(action[1:])
            return (action_char, number)

        # Case "r"
        first_idx = action.find("<")
        last_idx = action.rfind(">")
        symbol = action[first_idx:last_idx + 1]
        dot_separator = action.rfind(".")
        number_1 = int(action[last_idx + 1:dot_separator])
        number_2 = int(action[dot_separator + 1:])
        return (action_char, symbol, number_1, number_2)
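
For comparison, the same string format can also be decoded with a regular expression. The sketch below is an alternative written for illustration, not the function used by the parser; `parse_action` is a hypothetical name, and it assumes actions are exactly `acc`, `p<k>` or `r<A>m.f`.

```python
import re

# "r<A>m.f": a non-terminal in angle brackets, then m, a dot, then f
_REDUCE = re.compile(r"r(<.+>)(\d+)\.(\d+)")

def parse_action(action: str) -> tuple:
    """Decode an action string into ("acc",), ("p", k) or ("r", "<A>", m, f)."""
    if action == "acc":
        return ("acc",)
    if action.startswith("p"):
        return ("p", int(action[1:]))
    match = _REDUCE.fullmatch(action)
    if match is None:
        raise ValueError(f"unrecognised action: {action!r}")
    symbol, m, f = match.groups()
    return ("r", symbol, int(m), int(f))

print(parse_action("p3"))       # ('p', 3)
print(parse_action("r<B>0.1"))  # ('r', '<B>', 0, 1)
print(parse_action("r<S>1.3"))  # ('r', '<S>', 1, 3)
```

Unlike the assertion-based version, this variant raises a `ValueError` on malformed input, which can be easier to debug when a table entry is written by hand.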

#### The RNGLR parser
With the GSS, the SPPF and the parse table ready, we can now build the RNGLR parser. The parser processes the input string one symbol at a time. For each symbol a new GSS level is created, and the parser performs every possible reduction before shifting. Reductions and shifts are performed by the $\mathrm{Reducer}$ and the $\mathrm{Shifter}$ respectively. Two special bookkeeping sets $\mathcal{Q}$ and $\mathcal{R}$ are used to store pending shift and reduction actions. In general:
- $\mathcal{Q}$ stores the shift actions in the form of $(v, k)$, which means "go from node $v$ to state $k$", where $k$ is a state in the LR automaton and $v$ is a node in the GSS. Elements in $\mathcal{Q}$ are processed by the $\textrm{Shifter}$.
- $\mathcal{R}$ stores the reduction actions. Whenever a new edge between $v$ and $w$ is created in the GSS, all applicable reductions from $v$ are processed. For a reduction $r(X, m, f)$, we add $(w, X, m, f, z)$ into $\mathcal{R}$, where $z$ is the SPPF node that labels the edge between $v$ and $w$. If $m = 0$ then $z$ is the $\epsilon$ node.

In addition to $\mathcal{Q}$ and $\mathcal{R}$, a set $\mathcal{N}$ keeps track of the SPPF nodes created so far in the current step. The set $\mathcal{N}$ is reset at each parser iteration.

In [20]:
class RNGLRParser:
    '''
        The RNGLR parser
    '''
    def __init__(self, start: str, grammar: Grammar, table: dict[int, dict[str, list[str]]]):
        '''
            Initialize RNGLR
        '''
        self.start = self.augment_start(start)
        self.grammar = grammar
        self.table = table

        self.input_str = ""
        self.end_of_input = "€"
        self.gss = GSS()

        # R and Q set, respectively
        self.reductions: list[tuple[GSSNode, str, int, int, SPPFNode]] = []
        self.shifts: list[tuple[GSSNode, int]] = []

        self.accept_states: set[int] = self.get_accept_states()
        
        self.sppf = SPPF(grammar)
        self.set_N: dict[tuple[str, int], SPPFNode] = {}
    
    def augment_start(self, start: str):
        '''
            Reformat the start symbol to its augmented form, e.g. <S> becomes <S'>
        '''
        new_start = ''.join([start[:-1], "'", start[-1:]])
        return new_start

    def get_accept_states(self) -> set[int]:
        '''
            Get the accept states from the parsing table
        '''
        ret = set()
        for state_id in self.table:
            if self.end_of_input in self.table[state_id]:
                for action in self.table[state_id][self.end_of_input]:
                    if action == "acc":
                        ret.add(state_id)
        return ret

    def add_reduction(self, v: GSSNode, X: str, m: int, f:int, z: 'SPPFNode'):
        # print("     Added reduction: ", v, X, m, f, z)
        self.reductions.append((v, X, m, f, z))

##### Pseudocode and implementation
$U_i$ is level $i$ in the GSS 
**Input:** string $a_0a_1\dots a_{n-1}$, start state $S_S$, accept state $S_A$, table $T$ 
**Parse($S$)**
- If $n$ is 0
	- If $acc \in T(S_S, \$)$
		- return success
	- return failure
- Else
	- Initialisation
	- Look at $T(S_S, a_0)$
		- Add all applicable shift actions to $\mathcal{Q}$
		- Add all applicable reduce actions to $\mathcal{R}$
	- For $i = 0$ to $n$ do
		- If $U_i$ is not empty
			- $\mathcal{N} \gets \emptyset$
			- While $\mathcal{R} \ne \emptyset$
				- $\textrm{Reducer}(i)$
			- $\textrm{Shifter}(i)$
	- If $S_A \in U_n$
		- set SPPF root
		- return success
	- return failure

In [21]:
class RNGLRParser(RNGLRParser):
    def parse(self, input_str: str):
        '''
            The RNGLR recogniser, implemented based on pseudocode by Giorgios Robert Economopoulos
        '''
        sppf_root = None
        result = False

        if (len(input_str) == 0):
            if "acc" in self.table[0][self.end_of_input]:
                sppf_root = self.sppf.epsilon_sppf[self.sppf.I[self.start]]
                result = True
        else:
            # Init step
            input_str = input_str + self.end_of_input
            self.input_str = input_str
            self.gss.resize(len(input_str))
            v_0 = self.gss.create_node(0, 0)

            # Check T(S, a_0)
            for action in self.table[0][input_str[0]]:
                action = get_action(action)
                if (action[0] == 'p'):
                    self.shifts.append((v_0, action[1]))
                if (action[0] == 'r'):
                    # Reduce
                    if (action[2] == 0):
                        # Add (v_0, X, 0, f, epsilon)
                        self.add_reduction(v_0, action[1], 0, action[3], self.sppf.epsilon_sppf[0])

            # Now we parse
            for i in range(len(input_str)):
                if len(self.gss.level[i]) > 0:
                    self.set_N = {}
                    while len(self.reductions) > 0:
                        self.reducer(i)
                    self.shifter(i)

                    # if len(self.reductions) == 0:
                    #     break
            # Check accept state
            for state in self.accept_states:
                acc_node = self.gss.find_node(state, len(input_str) - 1)
                if acc_node is not None:
                    result = True

                    # Find SPPF root
                    for child in acc_node.children:
                        if child[0].label == v_0.label:
                            sppf_root = child[1]
                    
                    # print(f"SPPF root: {sppf_root}")
                    break
        
        
        # print(self.gss)
        return (result, sppf_root)

**Reducer($i$)**
- Pop top element $(v, X, m, f, y)$ from $\mathcal{R}$
- Find all the paths in the GSS with length $\max(m - 1, 0)$ starting from $v$, call the set of paths $\mathcal{X}$
- For $path$ in $\mathcal{X}$
	- Let $u$ be the final node in $path$
	- if $m = 0$
		- $z\gets$ node $f$ in the $\epsilon$-SPPF tree
	- else
		- Let $c$ be the level of $u$ in GSS
		- Find SPPF node $z = (X, c)$ in $\mathcal{N}$
			- If not exist then create $z$ and add to $\mathcal{N}$
	- Let $k$ be the label of $u$ and $pl$ be the shift action in $T(k, X)$
	- If exists node $w$ with label $l$ in $U_i$
		- If edge $(w, u)$ does not exist
			- Create edge $(w, u)$ with label $z$
			- If $m \neq 0$
				- For $r(B, t, f) \in T(l, a_i)$
					- If $t \neq 0$ add $(u, B, t, f, z)$ to $\mathcal{R}$
	- Else
		- Create node $w$ with label $l$ in $U_i$
		- Create edge $(w, u)$ with label $z$
		- For action in $T(l, a_i)$
			- If shift action $ph$
				- Add $(w, h)$ to $\mathcal{Q}$
			- If reduce action $r(B, t, f)$
				- If $t = 0$
					- Add $(w, B, t, f, \epsilon)$ to $\mathcal{R}$
				- if $t\ne 0$ and $m\neq 0$
					- Add $(u, B, t, f, z)$ to $\mathcal{R}$
	- If $m\neq 0$
		- $nodeSequence \gets w_{m-1},\dots,w_1$, the edge labels collected along the path
		- Append $y$ to $nodeSequence$
		- If $f\ne 0$
			- Append $\epsilon$-SPPF node numbered $f$ to $nodeSequence$
		- Call $z.\textrm{addChildren}(nodeSequence)$
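
The path search and the construction of $nodeSequence$ can be sketched on a tiny hand-written GSS. The snippet below is a standalone illustration: the dictionary `edges` and the helpers `paths_of_length` and `child_sequences` are hypothetical and independent of the `GSS` class in this post. The graph fragment ($v_2 \rightarrow v_1 \rightarrow v_0$ with edge labels $u_1$ and $w_1$) is the one that appears in the parse example later in this post.

```python
from typing import Iterator

# GSS edges as adjacency lists: node -> list of (child, SPPF label on the edge)
edges: dict[str, list[tuple[str, str]]] = {
    "v2": [("v1", "u1")],
    "v1": [("v0", "w1")],
    "v0": [],
}

def paths_of_length(node: str, length: int) -> Iterator[tuple[str, list[str]]]:
    """Yield (end_node, edge_labels) for every path of exactly `length` edges from `node`."""
    if length == 0:
        yield node, []
        return
    for child, label in edges[node]:
        for end, labels in paths_of_length(child, length - 1):
            yield end, [label] + labels

def child_sequences(v: str, m: int, f: int, y: str) -> list[tuple[str, list[str]]]:
    """For a pending reduction (v, X, m, f, y) with m != 0, build nodeSequence per path."""
    result = []
    for u, labels in paths_of_length(v, max(m - 1, 0)):
        seq = list(reversed(labels))   # w_{m-1}, ..., w_1 in left-to-right order
        seq.append(y)                  # label of the edge that triggered the reduction
        if f != 0:
            seq.append(f"u{f}")        # stand-in name for epsilon-SPPF node number f
        result.append((u, seq))
    return result

print(child_sequences("v2", 3, 2, "w3"))  # [('v0', ['w1', 'u1', 'w3', 'u2'])]
print(child_sequences("v1", 2, 4, "w3"))  # [('v0', ['w1', 'w3', 'u4'])]
```

Both results end at $v_0$, so both reductions target the same SPPF node $(S, 0)$ with different child sequences. This is the situation in which `add_children` introduces packing nodes.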

In [22]:
class RNGLRParser(RNGLRParser):
    def reducer(self, i: int):
        '''
            The reducer, implemented based on pseudocode by Giorgios Robert Economopoulos
        '''
        v, X, m, f, y = self.reductions.pop()
        
        # Find the set of paths of length max(m - 1, 0) starting from v
        paths = v.find_paths_with_length(max(0, m - 1))
        z: SPPFNode = None
    
        for path in paths:
            u = path[-1]
            k = u.label
    
            if m == 0:
                z = self.sppf.epsilon_sppf[f]
            else:
                c = u.level
                if (X, c) not in self.set_N:
                    z = self.sppf.create_node(X, c)
                    self.set_N[X, c] = z
                else:
                    z = self.set_N[(X, c)]
            for action in self.table[k][X]:
                action_obj = get_action(action)
                if action_obj[0] == 'p':
                    w = self.gss.find_node(action_obj[1], i)
                    if w is not None:
                        if u not in [x[0] for x in w.children]:
                            w.add_child(u, z)
                            if m != 0:
                                for action in self.table[action_obj[1]][self.input_str[i]]:
                                    action_obj = get_action(action)
                                    if action_obj[0] == 'r' and action_obj[2] != 0:
                                        # (u, B, t, f, z)
                                        self.add_reduction(u, action_obj[1], action_obj[2], action_obj[3], z)
                    else:
                        w = self.gss.create_node(action_obj[1], i)
                        w.add_child(u, z)
                        for action in self.table[action_obj[1]][self.input_str[i]]:
                            action_obj = get_action(action)
                            if action_obj[0] == 'p':
                                self.shifts.append((w, action_obj[1]))
                            if action_obj[0] == 'r':
                                t = action_obj[2]
                                if t == 0:
                                    self.add_reduction(w, action_obj[1], 0, action_obj[3], self.sppf.epsilon_sppf[0])
                                elif (m != 0):
                                    self.add_reduction(u, action_obj[1], t, action_obj[3], z)
            
            if (m != 0):
                # Add children
                node_seq = list(path[:-1])
                node_seq.reverse()
                node_seq.append(y)
                if (f != 0):
                    node_seq.append(self.sppf.epsilon_sppf[f])
                
                z.add_children(node_seq)

**Shifter($i$)**
- $\mathcal{Q}' \gets\emptyset$
- Create SPPF node $z$ with label $(a_i, i)$
- While $\mathcal{Q} \ne \emptyset$
	- Pop $(v, k)$ at the top of $\mathcal{Q}$
	- If exists node $w$ with label $k$ in $U_{i+1}$
		- Create edge $(w, v)$ with label $z$
		- For $r(B, t, f)\in T(k, a_{i+1})$
			- If $t\ne 0$ add $(v, B, t, f, z)$ to $\mathcal{R}$
	- Else
		- Create node $w$ with label $k$ in $U_{i+1}$
		- Create edge $(w, v)$ with label $z$
		- For action in $T(k, a_{i + 1})$
			- If shift action $ph$
				- Add $(w, h)$ to $\mathcal{Q'}$
			- If reduce action $r(B, t, f)$
				- If $t=0$ add $(w, B, 0, f, \epsilon)$ to $\mathcal{R}$
				- if $t\ne 0$ add $(v, B, t, f, z)$ to $\mathcal{R}$
- $\mathcal{Q} \gets\mathcal{Q'}$

In [23]:
class RNGLRParser(RNGLRParser):
    def shifter(self, i: int):
        '''
            The shifter, implemented based on pseudocode by Giorgios Robert Economopoulos
        '''
        new_q: list[tuple[GSSNode, int]] = []
        z = self.sppf.create_node(self.input_str[i], i)

        while (len(self.shifts) > 0):
            v, k = self.shifts.pop()
            # print(f"[Shifter level ${i}] processing: ", v, k)
            node = self.gss.find_node(k, i + 1)
            if node is not None:
                node.add_child(v, z)
                for action in self.table[k][self.input_str[i + 1]]:
                    action_obj = get_action(action)
                    if action_obj[0] == 'r' and action_obj[2] != 0:
                        self.add_reduction(v, action_obj[1], action_obj[2], action_obj[3], z)
            else:
                new_node = self.gss.create_node(k, i + 1)
                new_node.add_child(v, z)

                for action in self.table[k][self.input_str[i + 1]]:
                    action_obj = get_action(action)
                    
                    if action_obj[0] == 'p':
                        # print("    Added shift: ", new_node, action_obj[1])
                        new_q.append((new_node, action_obj[1]))
                    if action_obj[0] == 'r':
                        # Reduce action
                        if action_obj[2] == 0:
                            self.add_reduction(new_node, action_obj[1], 0, action_obj[3], self.sppf.epsilon_sppf[0])
                        else:
                            self.add_reduction(v, action_obj[1], action_obj[2], action_obj[3], z)
        
        self.shifts = new_q

##### A parse example
**Grammar 5.3**
![automata](images/grammar_53_auto.png)

Parse table:

| State | B   | C   | S   | a   | b             | $                     |
| ----- | --- | --- | --- | --- | ------------- | --------------------- |
| 0     |     |     | p1  | p2  |               |                       |
| 1     |     |     |     |     |               | acc                   |
| 2     | p4  |     |     |     | p3/r(B, 0, 1) | r(B, 0, 1)/r(S, 1, 3) |
| 3     |     |     |     |     | r(B, 1, 0)    | r(B, 1, 0)            |
| 4     | p6  |     |     |     | p5            | r(B, 0, 1)/r(S, 2, 4) |
| 5     |     |     |     |     |               | r(B, 1, 0)            |
| 6     |     | p7  |     |     |               | r(C, 0, 2)/r(S, 3, 2) |
| 7     |     |     |     |     |               | r(S, 4, 0)            |

Let's see how the parser works with the input "ab".

Firstly, the $\epsilon$-SPPF tree is constructed as above, and the GSS is initialised with one node $v_0$ that has label $S_0$. We look at $T(0, a)$, which contains only one action, $p2$, so we add $(v_0, 2)$ to $\mathcal{Q}$. The $\mathrm{Shifter}$ processes $(v_0, 2)$ and creates a new SPPF node $w_1$ labelled $(a, 0)$ and a new GSS node $v_1$ with label $S_2$. Edge $(v_1, v_0)$ is then created with label $(a, w_1)$. Since a new node was created, we look at $T(2, b)$ and find a shift/reduce conflict, so we add $(v_1, 3)$ to $\mathcal{Q}$ and $(v_1, B, 0, 1, \epsilon)$ to $\mathcal{R}$. This finishes the construction of GSS level 1.
![parse_1](images/parse_example_1.png)
In the next iteration, the $\textrm{Reducer}$ finds $(v_1, B, 0, 1, \epsilon)$ on top. The length of the reduction is $m=0$, so the only possible path is $\{v_1\}$. The $\textrm{Reducer}$ then creates a new GSS node $v_2$ at the same level as $v_1$. Node $v_2$ has label $T(2, B) = S_4$ and an edge pointing to $v_1$. We label the edge with $(B, u_1)$, where $u_1$ is the $\epsilon$-SPPF node of the non-terminal $B$.
![parse_2](images/parse_example_2.png)

In $T(4, b)$ there is a shift action, so we add $(v_2, 5)$ to $\mathcal{Q}$. There are now 2 shift actions in $\mathcal{Q}$: $(v_1, 3)$ and $(v_2, 5)$. We process those shift actions and create 2 new nodes in the GSS ($v_3$ and $v_4$) along with one new node $w_2$ in the SPPF. Checking $T(3, \$)$ and $T(5, \$)$, we find the reduction $r(B, 1, 0)$ in both, so $(v_1, B, 1, 0, w_2)$ and $(v_2, B, 1, 0, w_2)$ are added to $\mathcal{R}$.
![parse_3](images/parse_example_3.png)
Now, the $\textrm{Reducer}$ is called once more. It gets the top element $(v_1, B, 1, 0, w_2)$, then searches for the path with length $m-1=0$, so the only path is $\{v_1\}$. Since $m\ne0$, in addition to new GSS node $v_5$ (label $S_4$), we also create a new SPPF node $w_3$ with label $(B, 1)$. Edge $(v_5, v_1)$ is labelled with $B, w_3$. In $T(4, \$)$ there is a reduce/reduce conflict between $r(B, 0, 1)$ and $r(S, 2, 4)$, so we add the newly found reductions $(v_5, B, 0, 1, \epsilon)$ and $(v_1, S, 2, 4, w_3)$ to $\mathcal{R}$. Finally, we add $w_2$ as node $w_3$'s child.
![parse_4](images/parse_example_4.png)
The next reduction is $(v_2, B, 1, 0, w_2)$. Since $m=1 \ne0$, we try to create SPPF node $(B, 1)$, but it already exists as $w_3$. In the GSS, a new node $v_6$ with label $S_6$ is created along with a new edge $(v_6, v_2)$, which has label $B, w_3$. In $T(6, \$)$ there is also a reduce/reduce conflict between $r(C, 0, 2)$ and $r(S, 3, 2)$. We add both $(v_6, C, 0, 2, \epsilon)$ and $(v_2, S, 3, 2, w_3)$ to $\mathcal{R}$.
![parse_5](images/parse_example_5.png)
By now we have $\mathcal{R} = \{(v_5, B, 0, 1, \epsilon), (v_1, S, 2, 4, w_3), (v_6, C, 0, 2, \epsilon), (v_2, S, 3, 2, w_3)\}$. We process the first reduction: $v_5$ has label $S_4$, and $T(4, B) = p6$, which matches the label of $v_6$, so a new edge $(v_6, v_5)$ is added with label $B, u_1$. We don't add a new reduction here because the reduction length is zero.

Next is $(v_1, S, 2, 4, w_3)$. We find all paths that start from $v_1$ and have length $m-1=1$. In this case, the only path is $\{v_1, v_0\}$, and along the path we also collect the SPPF node $w_1$ on edge $(v_1, v_0)$. We create a new GSS node $v_7$ with label $S_2$ and edge $(v_7, v_0)$. We also create a new SPPF node $w_4$ (label $S, 0$) with 3 children: $w_1$ (collected from the path), $w_3$ (from the top element) and $u_f=u_4$. We add $w_4$ to $\mathcal{N}$.
![parse_7](images/parse_example_7.png)
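The path search used in this step can be sketched in isolation. The snippet below is a minimal illustration, not the parser's own code: the `Node` class and `find_paths` function are hypothetical stand-ins, and the edge labels are plain strings named after the SPPF nodes in the walkthrough.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a GSS node: a name plus (child, edge_label) pairs."""
    name: str
    children: list = field(default_factory=list)


def find_paths(start, length):
    """Return (end_node, labels) for every path of exactly `length` edges from `start`.

    Labels are returned in input order, so the label nearest the end node comes first.
    """
    if length == 0:
        return [(start, [])]
    paths = []
    for child, label in start.children:
        for end, labels in find_paths(child, length - 1):
            paths.append((end, labels + [label]))
    return paths


# A GSS fragment shaped like the walkthrough: v2 -> v1 -> v0
v0 = Node("v0")
v1 = Node("v1", [(v0, "w1")])
v2 = Node("v2", [(v1, "u1")])

# Reduction of length m = 2 from v1 searches paths of length m - 1 = 1
print([(n.name, ls) for n, ls in find_paths(v1, 1)])  # [('v0', ['w1'])]
# Reduction of length m = 3 from v2 searches paths of length 2
print([(n.name, ls) for n, ls in find_paths(v2, 2)])  # [('v0', ['w1', 'u1'])]
```

Every path found this way yields one sequence of SPPF children, which is why the number of paths drives the cost of a reduction.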
Processing the reduction $(v_6, C, 0, 2, \epsilon)$, we create GSS node $v_8$, label $S_7$, with edge $(v_8, v_6)$. The new edge is labelled with $C, u_2$. No further reduction is added.

Finally, we process $(v_2, S, 3, 2, w_3)$. We search for the paths of length 2 starting from $v_2$. The only possible path is $\{v_2, v_1, v_0\}$, and we collect $u_1$ and $w_1$ along the way. The corresponding SPPF node is $(S, 0)$, which already exists in the set $\mathcal{N}$ as $w_4$. The sequence of children is $[w_1, u_1, w_3, u_2]$. Since $w_4$ already has a different sequence of children, we store both sequences under $w_4$ by creating two packing nodes.
![parse_8](images/parse_example_8.png)
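The way packing nodes appear only when a second derivation arrives can be illustrated with a small, simplified model. `SketchSPPFNode` below is a hypothetical class written for this illustration; it is not the `SPPFNode` used by the parser, and children are represented by their names only.

```python
class SketchSPPFNode:
    """Hypothetical, simplified SPPF node (not the parser's own class)."""

    def __init__(self, label):
        self.label = label
        self.families = []  # each entry is one alternative sequence of children

    def add_children(self, seq):
        # A derivation that has already been recorded is ignored.
        if list(seq) not in self.families:
            self.families.append(list(seq))

    def is_ambiguous(self):
        return len(self.families) > 1

    def render(self):
        # One family: the children hang directly off the node.
        # Several families: each one is wrapped in its own packing node.
        if not self.is_ambiguous():
            return {self.label: self.families[0] if self.families else []}
        return {self.label: [{"PackingNode": fam} for fam in self.families]}


w4 = SketchSPPFNode("<S>-0")
w4.add_children(["w1", "w3", "u4"])
print(w4.render())      # a single derivation, no packing nodes yet
w4.add_children(["w1", "u1", "w3", "u2"])
w4.add_children(["w1", "w3", "u4"])  # duplicate, ignored
print(w4.render())      # two derivations, each under a packing node
```

Keeping the alternatives as separate families is what lets the SPPF share one node per (symbol, position) pair while still representing every parse.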
By now we have no reduction left and the parse is finished. At GSS level 2, $S_2$ is the accept state so we accept the string "ab".

Let us parse the string "ab" with our implementation:

In [24]:
input_str = "ab"
parser = RNGLRParser(start_symbol, grammar_53, table)
result, sppf_root = parser.parse(input_str)
print(result)

True


The string is accepted, which is a good sign. Now we look at the internal GSS and SPPF:

In [25]:
print(parser.gss)
print(parser.sppf)

GSS:
Level 0:
    Node(v0, 0)
Level 1:
    Node(v1, 2)
        Node(v0, 0) - SPPF Node:(a, 0)
    Node(v2, 4)
        Node(v1, 2) - SPPF Node:(<B>, -1)
Level 2:
    Node(v3, 5)
        Node(v2, 4) - SPPF Node:(b, 1)
    Node(v4, 3)
        Node(v1, 2) - SPPF Node:(b, 1)
    Node(v5, 4)
        Node(v1, 2) - SPPF Node:(<B>, 1)
    Node(v6, 6)
        Node(v5, 4) - SPPF Node:(<B>, -1)
        Node(v2, 4) - SPPF Node:(<B>, 1)
    Node(v7, 7)
        Node(v6, 6) - SPPF Node:(<C>, -1)
    Node(v8, 1)
        Node(v0, 0) - SPPF Node:(<S>, 0)

SPPF:
    a-0
    b-1
    <B>-1
        SPPF Node:(b, 1)
    <S>-0
        PackingNode
            SPPF Node:(a, 0)
            SPPF Node:(<B>, 1)
            SPPF Node:(<B><C>, -1)
        PackingNode
            SPPF Node:(a, 0)
            SPPF Node:(<B>, -1)
            SPPF Node:(<B>, 1)
            SPPF Node:(<C>, -1)
    €-2



In this output, nodes are listed one by one. Their direct children are then listed on subsequent lines, indented by one level. For example, SPPF node `<S>-0` has two packing nodes as children, and the packing nodes also have their own children. SPPF nodes with position -1 are from the $\epsilon$-SPPF tree. The structure is the same as in the example, so the parse is successful. To visualise the parse trees, we use

In [26]:
ee = EnhancedExtractor(sppf_root)
while True:
    t = ee.extract_a_tree()
    if t is None: break
    display_tree(t)

<S>
├─ 'a'
├─ <B>
│   └─ 'b'
└─ <B><C>
    ├─ <B>
    │   └─ 'epsilon'
    └─ <C>
        └─ 'epsilon'
<S>
├─ 'a'
├─ <B>
│   └─ 'epsilon'
├─ <B>
│   └─ 'b'
└─ <C>
    └─ 'epsilon'


We can see that there are 2 possible parse trees in this example.

#### Time complexity of RNGLR
Even though RNGLR parses in an LR-like manner, its worst-case time complexity is $O(n^{M+1})$, where $M$ is the length of the longest reduction. This bound is polynomial for any fixed grammar, but its degree grows with the grammar, so there is no fixed polynomial bound that holds for all context-free grammars. The bottleneck comes from our path-searching function in the GSS, because the number of paths of length $m$ from a node can grow on the order of $n^m$. This is somewhat disappointing, since other general parsers like Earley or CYK have $O(n^3)$ time complexity in the worst case.
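To get a feel for how quickly the number of paths grows, the sketch below counts paths in an idealised GSS. The shape is an assumption made for illustration only: one node per level, with the node on level $j$ having an edge to the node on every earlier level. The function `count_paths` is hypothetical and not part of the parser.

```python
from functools import lru_cache
from math import comb


def count_paths(n: int, m: int) -> int:
    """Count paths of exactly m edges starting at the node on level n.

    Assumed shape: one node per level, and the node on level j has an
    edge to the node on every earlier level i < j.
    """
    @lru_cache(maxsize=None)
    def paths_from(level: int, remaining: int) -> int:
        if remaining == 0:
            return 1
        return sum(paths_from(prev, remaining - 1) for prev in range(level))

    return paths_from(n, m)


for n in (10, 20, 40):
    print(n, [count_paths(n, m) for m in (1, 2, 3)])
# 10 [10, 45, 120]
# 20 [20, 190, 1140]
# 40 [40, 780, 9880]

# In this idealised shape the count is the binomial coefficient C(n, m)
print(count_paths(20, 3) == comb(20, 3))  # True
```

Doubling $n$ multiplies the count for $m = 3$ by roughly eight, which is the $n^m$ behaviour. Since a search of this kind may be needed on each of the $n$ levels, the extra factor of $n$ gives the $n^{M+1}$ bound.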

To address this weakness of RNGLR, a simple approach is to refactor the grammar so that no rule is longer than 2 symbols (for example, by transforming it into Chomsky Normal Form). However, doing so has side effects: it increases the size of the parse table and makes it harder to build a parse tree for the original grammar. A more promising solution is BRNGLR. By never performing a reduction step longer than 2, it avoids the extensive path search and improves the worst case to $O(n^3)$, on par with the other algorithms.
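The grammar-refactoring approach can be sketched as follows. This is only an illustration of splitting long rules into binary ones, not a full Chomsky Normal Form conversion; the function `binarise` and the helper non-terminal names (`<S_1>`, `<S_2>`, ...) are made up for this example.

```python
def binarise(grammar):
    """Split every rule longer than 2 symbols into a chain of binary rules.

    Helper non-terminals are named <X_1>, <X_2>, ... (names invented for this sketch).
    """
    result = {}
    for lhs, alternatives in grammar.items():
        result.setdefault(lhs, [])
        counter = 0
        for rhs in alternatives:
            head, rest = lhs, list(rhs)
            while len(rest) > 2:
                counter += 1
                helper = f"<{lhs[1:-1]}_{counter}>"
                # head -> first symbol followed by a helper for the remainder
                result.setdefault(head, []).append([rest[0], helper])
                head, rest = helper, rest[1:]
            result.setdefault(head, []).append(rest)
    return result


long_rule_grammar = {
    "<S>": [["a", "b", "c", "d"], ["a", "b", "c", "<D>"]],
    "<D>": [["d"]],
}
binary_grammar = binarise(long_rule_grammar)
for lhs, alternatives in binary_grammar.items():
    print(lhs, alternatives)
# <S> [['a', '<S_1>'], ['a', '<S_3>']]
# <S_1> [['b', '<S_2>']]
# <S_2> [['c', 'd']]
# <S_3> [['b', '<S_4>']]
# <S_4> [['c', '<D>']]
# <D> [['d']]
```

Even for this tiny grammar, two non-terminals become six, and every helper non-terminal would show up in the parse table and in the resulting parse trees.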

## Binary Right-Nulled GLR (BRNGLR)

The main idea of BRNGLR is to only perform reductions of length at most 2. Consider the rule $S \rightarrow ABC$. Conceptually, it is split into $S\rightarrow AS_1$ and $S_1\rightarrow BC$: the parser first reduces $BC$ to $S_1$ and then $AS_1$ to $S$. However, we don't have to modify the grammar or the parse table. The long reduction is broken up "on-the-fly."

When the parser processes an element of the form $(v, X, m, f, y)$ in $\mathcal{R}$ with $m > 2$, it creates a bookkeeping node labelled $X_m$ in the current level. Next, for every child $u$ of $v$, an element $(u, X, m-1, 0, z)$ is added back into $\mathcal{R}$, where $z$ is a new, unlabelled SPPF node that groups the symbols reduced so far. This approach ensures that a reduction of length $m > 2$ is done in $m-1$ steps. The bookkeeping node prevents redundant path searching in later steps.
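The sequence of binary steps can be written down explicitly. The sketch below is a hypothetical helper (`reduction_steps` is not part of the parser) that lists, for a single rule, which two symbols are grouped at each step. Intermediate groups are named after the bookkeeping labels $X_m$ purely for readability.

```python
def reduction_steps(lhs, rhs):
    """List the binary grouping steps for a reduction by lhs -> rhs, rightmost first."""
    m = len(rhs)
    if m <= 2:
        return [(lhs, tuple(rhs))]
    # First step: group the last two symbols under the bookkeeping label lhs_m.
    right = f"{lhs}_{m}"
    steps = [(right, (rhs[-2], rhs[-1]))]
    # Middle steps: prepend one more symbol each time (lengths m-1 down to 3).
    for length in range(m - 1, 2, -1):
        label = f"{lhs}_{length}"
        steps.append((label, (rhs[length - 2], right)))
        right = label
    # Final step: an ordinary reduction of length 2 that produces lhs.
    steps.append((lhs, (rhs[0], right)))
    return steps


for step in reduction_steps("S", ["a", "b", "c", "d"]):
    print(step)
# ('S_4', ('c', 'd'))
# ('S_3', ('b', 'S_4'))
# ('S', ('a', 'S_3'))
```

A rule of length 4 is handled in $4 - 1 = 3$ steps, and each step looks at only one edge of the GSS.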

#### Parsing example
This time, we will use the following grammar
$$\begin{split}S&\rightarrow a\ b\ c\ d\ |\ a\ b\ c\ D\\
D &\rightarrow d\end{split}$$
Since the longest rule, $S\rightarrow a\ b\ c\ D$, has 4 symbols, let's call it `grammar_4`.

In [27]:
grammar_4 =  {
    "<S>": [["a", "b", "c", "d"], ["a", "b", "c", "<D>"]],
    "<D>": [["d"]]
}
start_4 = "<S>"

The parse table is

In [28]:
generator_4 = TableGenerator(grammar_4, start_4)
table_4 = generator_4.generate_parse_table()

State 0
<S> ->  · a b c <D>, €
<S'> ->  · <S>, €
<S> ->  · a b c d, €

State 1
<S'> -> <S> · , €

State 2
<S> -> a · b c <D>, €
<S> -> a · b c d, €

State 3
<S> -> a b · c <D>, €
<S> -> a b · c d, €

State 4
<S> -> a b c · <D>, €
<S> -> a b c · d, €
<D> ->  · d, €

State 5
<S> -> a b c d · , €
<D> -> d · , €

State 6
<S> -> a b c <D> · , €

---------------
Transition map
GOTO (0, '<S>') = 1
GOTO (0, 'a') = 2
GOTO (2, 'b') = 3
GOTO (3, 'c') = 4
GOTO (4, 'd') = 5
GOTO (4, '<D>') = 6

Parsing table:

     <D>         <S>         a           b           c           d           €            

I0               p1          p2                                                          
I1                                                                           acc         
I2                                       p3                                              
I3                                                   p4                                  
I4   p6                                      

In diagram form:
![grammar_4_automaton](images/grammar_4_automaton.png)

The input string is "abcd". As before, we first create the node $v_0$ with label 0 in the GSS. The only action in $T(0, a)$ is $p2$, so we add $(v_0, 2)$ to $\mathcal{Q}$. The $\textrm{Shifter}$ then processes this, creating a new SPPF node $w_1$ labelled $(a, 0)$. Repeating this shifting procedure for the next three symbols $b$, $c$, and $d$ gives us
![parse_4_1](images/parse_4_example_1.png)

At this point there is a reduce/reduce conflict in $T(5, \$)$ between $r(D, 1, 0)$ and $r(S, 4, 0)$, so $\mathcal{R}$ now contains $\{(v_3, S, 4, 0, w_4), (v_3, D, 1, 0, w_4)\}$. We process $(v_3, S, 4, 0, w_4)$ first. The reduction length is $m = 4 > 2$, so we create a new bookkeeping node $v_5$ with label $S_4$ in the GSS. Since $v_2$ is the only child of $v_3$, we connect the new $v_5$ node to $v_2$. In the SPPF, we also make a new node $w_5$. This node has an empty label, and its children are $w_4$ (from the item) and $w_3$ (collected on the path $v_3 \rightarrow v_2$). Finally, we label edge $(v_5, v_2)$ with $w_5$ and add $(v_2, S, 3, 0, w_5)$ back into $\mathcal{R}$.
![parse_4_2](images/parse_4_example_2.png)
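Conflicts such as the one in $T(5, \$)$ are simply table cells that hold more than one action. As a rough sketch, they can be listed by scanning a table of the shape `dict[int, dict[str, list[str]]]`. The function `find_conflicts` and the toy table are hypothetical, and the reduce entries are opaque placeholder strings because the exact action encoding is not assumed here.

```python
def find_conflicts(table):
    """Return {(state, symbol): actions} for every cell holding more than one action."""
    conflicts = {}
    for state, row in table.items():
        for symbol, actions in row.items():
            if len(actions) > 1:
                conflicts[(state, symbol)] = list(actions)
    return conflicts


# Toy fragment shaped like the table for grammar_4.
# The "reduce ..." strings are placeholders, not the generator's real encoding.
toy_table = {
    4: {"d": ["p5"], "<D>": ["p6"]},
    5: {"€": ["reduce <S> (length 4)", "reduce <D> (length 1)"]},
    6: {"€": ["reduce <S> (length 4)"]},
}

print(find_conflicts(toy_table))
# {(5, '€'): ['reduce <S> (length 4)', 'reduce <D> (length 1)']}
```

A deterministic LR parser would reject such a table, whereas a GLR parser keeps every action in the cell and explores all of them.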
The next reduce action is $(v_3, D, 1, 0, w_4)$. Here, the reduction length is 1, which is less than 2 so the $\textrm{Reducer}$ processes as usual. It creates a new GSS node $v_6$ (label 6) that links back to $v_3$, and then adds a new SPPF node $w_6$ with label $(D, 3)$, which only has one child $w_4$. In $T(6, \$)$ there is a reduction $r(S, 4, 0)$ so we add $(v_3, S, 4, 0, w_6)$ to $\mathcal{R}$.
![parse_4_3](images/parse_4_example_3.png)
Next we process $(v_2, S, 3, 0, w_5)$. Because the length 3 is larger than 2, we create a new GSS node $v_7$ with label $S_3$ and connect it to $v_1$, the only child of $v_2$. In the SPPF, we also create a new blank node $w_7$ with $w_5$ and $w_2$ as children. Finally, we add the reduction $(v_1, S, 2, 0, w_7)$ to $\mathcal{R}$.
![parse_4_4](images/parse_4_example_4.png)
Now $(v_3, S, 4, 0, w_6)$ is the top reduction. There is already an $S_4$ GSS node in this level, so we don't have to create a new one. However, we still update the SPPF node $w_5$ to reflect this reduction. Node $w_5$ now has another sequence of children, $[w_3, w_6]$, so we create two packing nodes to store the two alternatives.
![parse_4_5](images/parse_4_example_5.png)
The only reduction left is $(v_1, S, 2, 0, w_7)$. The length of the reduction is 2, so we create a new GSS node $v_8$ with label 1 and connect it to $v_0$, the only child of $v_1$. In the SPPF, node $w_8$ is created with label $(S, 0)$ and children $[w_1, w_7]$.
![parse_4_6](images/parse_4_example_6.png)
Now the parse is complete. We accept the string "abcd" because the accept state 1 is present in level 4 of the GSS, and the SPPF is successfully built as shown.

The initial reduction $(v_3, S, 4, 0, w_4)$ was performed in three steps: $(v_3, S, 4, 0, w_4)$, $(v_2, S, 3, 0, w_5)$, and $(v_1, S, 2, 0, w_7)$. This approach helps us minimise the necessary path searching in the GSS. We can observe this benefit when processing the reduction $(v_3, S, 4, 0, w_6)$: only the SPPF was updated. Although the GSS is larger in BRNGLR, it grows at a constant rate, making BRNGLR more efficient than RNGLR overall.

#### Pseudo code and implementation
First we define the BRNGLR class with its helper methods. For the most part this is similar to RNGLR.

In [29]:
class BRNGLRParser(RNGLRParser):
    '''
        The BRNGLR parser
    '''
    def __init__(self, start: str, grammar: Grammar, table: dict[int, dict[str, list[str]]]):
        super().__init__(start, grammar, table)

##### Parse function
The $\textrm{Parse}$ function is exactly the same as in RNGLR.

In [30]:
class BRNGLRParser(BRNGLRParser):
    def parse(self, input_str: str):
        # Same as RNGLR
        return super().parse(input_str)        

##### The Reducer
**Pseudocode**

Get $(v, X, m, f, y)$ from $\mathcal{R}$
- If $m\ge 2$
	- $\mathcal{X}$ is the set of $(u, x)$ where $u$ is child of $v$ and $x$ is the label of edge $(v, u)$
- Else $\mathcal{X} = \{(v, \epsilon)\}$
- If $m\le 2$
	- For every $(u, x) \in \mathcal{X}$
		- If $m = 0$
			- $z\gets$ node $f$ in the $\epsilon$-SPPF tree
		- Else
			- Let $c$ be the level of $u$ in GSS
			- Find SPPF node $z = (X, c)$ in $\mathcal{N}$
				- If does not exist then create $z$ and add to $\mathcal{N}$
		- Let $k$ be the label of $u$ and $pl$ be the shift action in $T(k, X)$
		- If exists node $w$ with label $l$ in $U_i$
			- If edge $(w, u)$ does not exist
				- Create edge $(w, u)$ with label $z$
				- If $m \ne 0$
					- For $r(B, t, f) \in T(l, a_i)$
						- If $t \neq 0$ add $(u, B, t, f, z)$ to $\mathcal{R}$
		- Else
			- Create node $w$
			- Create edge $(w, u)$ with label $z$
			- For action in $T(l, a_i)$
				- If shift action $ph$
					- Add $(w, h)$ to $\mathcal{Q}$
				- If reduce action $r(B, t, f)$
					- If $t = 0$
						- Add $(w, B, t, f, \epsilon)$ to $\mathcal{R}$
					- If $t\ne 0$ and $m\neq 0$
						- Add $(u, B, t, f, z)$ to $\mathcal{R}$
		- If $m = 1$
			- $nodeSequence = [y]$
		- If $m = 2$
			- $nodeSequence = [x, y]$
		- If $f\ne 0$
			- Append $u_f$ to $nodeSequence$
		- If $m\ne 0$
			- $z.\textrm{addChildren}(nodeSequence)$
- Else $(m > 2)$
	- If node $w$ with label $X_m$ doesn't exist in $U_i$
		- Create one
	- For $(u, x)\in \mathcal{X}$
		- If there isn't an edge $(w, u)$
			- Create empty SPPF node $z$
			- Create edge $(w, u)$ with label $z$
			- Add $(u, X, m -1, 0, z)$ to $\mathcal{R}$
		- Else
			- $z\gets$ the label of the existing edge $(w, u)$
		- $nodeSequence = [x, y]$
		- If $f\ne 0$
			- Append $u_f$ to $nodeSequence$
		- $z.\textrm{addChildren}(nodeSequence)$

Here we call $u_f$ the $\epsilon$-SPPF node with index $f$. The main reducer logic is divided into 2 cases, $m \le 2$ and $m > 2$. 

In the first case $m \le 2$, we don't have to perform a search on GSS anymore because the only possible lengths of paths are 0 and 1, either direct children or the node itself. Other than that, the reducer behaves similarly to RNGLR in this case.

In the second case, the $\textrm{Reducer}$ creates a new bookkeeping GSS node and an empty SPPF node (or reuses them if they already exist). It then adds a new reduction of length $m-1$ to $\mathcal{R}$.


In [31]:
class BRNGLRParser(BRNGLRParser):
    def reducer(self, i: int):
        '''
            The reducer, implemented based on pseudocode by Giorgios Robert Economopoulos
        '''
        v, X, m, f, y = self.reductions.pop()

        X_: list[tuple[GSSNode, SPPFNode]] = []
        z: SPPFNode = None
        if (m >= 2):
            for child in v.children:
                X_.append(child)
        else:
            X_ = [(v, self.sppf.epsilon_sppf[0])]
        
        if (m <= 2):
            for u, x in X_:
                k = u.label

                if m == 0:
                    z = self.sppf.epsilon_sppf[f]
                else:
                    c = u.level
                    if (X, c) not in self.set_N:
                        z = self.sppf.create_node(X, c)
                        self.set_N[X, c] = z
                    else:
                        z = self.set_N[(X, c)]
                
                for action in self.table[k][X]:
                    action_obj = get_action(action)
                    if action_obj[0] == 'p':
                        w = self.gss.find_node(action_obj[1], i)
                        if w is not None:
                            if u not in [x[0] for x in w.children]:
                                w.add_child(u, z)
                                if m != 0:
                                    for action in self.table[action_obj[1]][self.input_str[i]]:
                                        action_obj = get_action(action)
                                        if action_obj[0] == 'r' and action_obj[2] != 0:
                                            # (u, B, t, f, z)
                                            self.reductions.append((u, action_obj[1], action_obj[2], action_obj[3], z))
                        else:
                            w = self.gss.create_node(action_obj[1], i)
                            w.add_child(u, z)
                            for action in self.table[action_obj[1]][self.input_str[i]]:
                                action_obj = get_action(action)
                                if action_obj[0] == 'p':
                                    self.shifts.append((w, action_obj[1]))
                                if action_obj[0] == 'r':
                                    t = action_obj[2]
                                    if t == 0:
                                        self.reductions.append((w, action_obj[1], 0, action_obj[3], self.sppf.epsilon_sppf[0]))
                                    elif (m != 0):
                                        self.reductions.append((u, action_obj[1], t, action_obj[3], z))
                
                node_seq: list[SPPFNode] = []
                if (m == 1):
                    node_seq = [y]
                elif (m == 2):
                    node_seq = [x, y]
                if f != 0:
                    node_seq.append(self.sppf.epsilon_sppf[f])
                
                if m != 0:
                    z.add_children(node_seq)
        else:
            w = self.gss.find_node(f"{X}_{m}", i)
            if w is None:
                w = self.gss.create_node(f"{X}_{m}", i)
            
            for u, x in X_:
                z: SPPFNode = None
    
                for child, edge in w.children:
                    if child == u:
                        z = edge
                        break
                if z is None:
                    z = self.sppf.create_node("", u.level)
                    w.add_child(u, z)
                    self.add_reduction(u, X, m - 1, 0, z)
                
                node_seq: list[SPPFNode] = [x, y]
                if f != 0:
                    node_seq.append(self.sppf.epsilon_sppf[f])
                z.add_children(node_seq)

##### The shifter
We keep the same RNGLR logic for the shifter; there are no changes.

In [32]:
class BRNGLRParser(BRNGLRParser):
    def shifter(self, i: int):
        # Same as RNGLR
        super().shifter(i)

Now we can use the new BRNGLR parser to parse the string "abcd" in `grammar_4`:

In [33]:
input_str_4 = "abcd"
parser_4 = BRNGLRParser(start_4, grammar_4, table_4)
result_4, sppf_root_4 = parser_4.parse(input_str_4)
print(result_4)

True


The parser accepted it. Let's look at the GSS and SPPF:

In [34]:
print(parser_4.gss)
print(parser_4.sppf)

GSS:
Level 0:
    Node(v0, 0)
Level 1:
    Node(v1, 2)
        Node(v0, 0) - SPPF Node:(a, 0)
Level 2:
    Node(v2, 3)
        Node(v1, 2) - SPPF Node:(b, 1)
Level 3:
    Node(v3, 4)
        Node(v2, 3) - SPPF Node:(c, 2)
Level 4:
    Node(v4, 5)
        Node(v3, 4) - SPPF Node:(d, 3)
    Node(v5, 6)
        Node(v3, 4) - SPPF Node:(<D>, 3)
    Node(v6, <S>_4)
        Node(v2, 3) - SPPF Node:(blank)
    Node(v7, <S>_3)
        Node(v1, 2) - SPPF Node:(blank)
    Node(v8, 1)
        Node(v0, 0) - SPPF Node:(<S>, 0)

SPPF:
    a-0
    b-1
    c-2
    d-3
    <D>-3
        SPPF Node:(d, 3)
    blank
        PackingNode
            SPPF Node:(c, 2)
            SPPF Node:(<D>, 3)
        PackingNode
            SPPF Node:(c, 2)
            SPPF Node:(d, 3)
    blank
        SPPF Node:(b, 1)
        SPPF Node:(blank)
    <S>-0
        SPPF Node:(a, 0)
        SPPF Node:(blank)
    €-4



We can also visualise the parse tree:

In [35]:
ee = EnhancedExtractor(sppf_root_4)
while True:
    t = ee.extract_a_tree()
    if t is None: break
    display_tree(t)

<S>
├─ 'a'
└─ ''
    ├─ 'b'
    └─ ''
        ├─ 'c'
        └─ <D>
            └─ 'd'
<S>
├─ 'a'
└─ ''
    ├─ 'b'
    └─ ''
        ├─ 'c'
        └─ 'd'
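
The blank `''` nodes in these trees are the intermediate SPPF nodes created by the binary reductions. If a tree in the shape of the original grammar is wanted, they can be spliced out. The sketch below assumes trees are `(label, children)` tuples in the style of fuzzingbook derivation trees, which may differ from what `extract_a_tree` actually returns, and `flatten_blank` is a hypothetical helper.

```python
def flatten_blank(tree):
    """Splice the children of blank-labelled helper nodes into their parent."""
    label, children = tree
    flat = []
    for child in children:
        child = flatten_blank(child)
        if child[0] == "" and child[1]:
            flat.extend(child[1])  # blank helper node: lift its children
        else:
            flat.append(child)
    return (label, flat)


# Hand-written copy of the first tree above, as (label, children) tuples
binarised = ("<S>", [
    ("a", []),
    ("", [
        ("b", []),
        ("", [
            ("c", []),
            ("<D>", [("d", [])]),
        ]),
    ]),
])

print(flatten_blank(binarised))
# ('<S>', [('a', []), ('b', []), ('c', []), ('<D>', [('d', [])])])
```

After flattening, `<S>` has the four children `a`, `b`, `c`, `<D>`, matching the rule $S \rightarrow a\ b\ c\ D$.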


### References
[1] D. E. Knuth, “On the translation of languages from left to right,” _Information and Control_, vol. 8, no. 6, pp. 607–639, Dec. 1965, doi: [10.1016/S0019-9958(65)90426-2](https://doi.org/10.1016/S0019-9958\(65\)90426-2).

[2] G. R. Economopoulos, “Generalised LR parsing algorithms”. Retrieved from https://core.ac.uk/download/pdf/301667613.pdf

[3] M. Tomita, _Efficient Parsing for Natural Language_. Boston, MA: Springer US, 1986. doi: [10.1007/978-1-4757-1885-0](https://doi.org/10.1007/978-1-4757-1885-0).