# Decidability in Context-Free Grammars (CFGs)

## Introduction
This section explores several decidable problems for context-free grammars (CFGs). In formal language theory, a problem is decidable if there is an algorithm that halts with the correct answer on every input from the given class. Several important problems about CFGs are decidable in this sense. This section covers:

* Emptiness Problem
* Uselessness Problem
* Membership Problem
* Finiteness Problem


In [1]:
class CFG:
    def __init__(self, productions, start_symbol):
        """
        Initialize a context-free grammar.
        
        Args:
            productions: A dictionary mapping non-terminals to lists of productions.
                Each production is a list of symbols (terminals or non-terminals).
            start_symbol: The start symbol of the grammar.
        """
        self.productions = productions
        self.start_symbol = start_symbol
        
        # Compute the set of all non-terminals and terminals
        self.non_terminals = set(productions.keys())
        self.terminals = set()
        for lhs, rhs_list in productions.items():
            for rhs in rhs_list:
                for symbol in rhs:
                    if symbol not in self.non_terminals and symbol != "∧":
                        self.terminals.add(symbol)
    
    def __str__(self):
        """Return a string representation of the grammar."""
        result = f"Start symbol: {self.start_symbol}\n"
        result += "Productions:\n"
        for lhs, rhs_list in self.productions.items():
            for rhs in rhs_list:
                if not rhs:  # Empty production
                    result += f"  {lhs} -> ∧\n"
                else:
                    result += f"  {lhs} -> {' '.join(rhs)}\n"
        return result

# Example grammar for balanced parentheses
balanced_parens = CFG(
    productions={
        'S': [['(', 'S', ')'], ['S', 'S'], []]  # [] represents ∧ (the empty string)
    },
    start_symbol='S'
)

print(balanced_parens)


Start symbol: S
Productions:
  S -> ( S )
  S -> S S
  S -> ∧



## 1. Emptiness Problem
### 1.1 Definition
The emptiness problem asks: "Is the language generated by a given CFG empty?" A CFG generates the empty language if and only if its start symbol $S$ cannot derive any string of terminals. The problem is therefore decidable: a marking algorithm computes the set of all non-terminals that can derive a terminal string, and we then check whether $S$ is in that set.

Note: A language that consists solely of the empty string $\wedge$ is not an empty language.

### 1.2 Algorithm to Check Emptiness:

**Algorithm 1:**

* Initialize a set **derivable** with all non-terminals that have a production whose right-hand side consists only of terminals or is $\wedge$ (the empty string).
* Iterate: Add a non-terminal to **derivable** if it has a production where all symbols are either terminals or non-terminals already in **derivable**.
* Repeat until no more non-terminals can be added. If the start symbol $S$ is in **derivable**, the language is non-empty; otherwise, it's empty.

**Algorithm 2:**

* Identify directly derivable non-terminals:
    - Mark any non-terminal $N$ that has a production of the form $N \rightarrow t$, where $t$ is a string of terminals (a single terminal, several terminals, or $\wedge$).
    - Replace every occurrence of $N$ in the other productions with $t$, then delete $N$ together with all of its productions.
* Repeat the previous step until no more non-terminals can be eliminated:
    - If $S$ is eliminated, the language is non-empty.
    - If $S$ is not eliminated, the language is empty.

Note: Algorithm 1 maintains the original CFG structure and simply tracks which non-terminals can derive terminal strings. Algorithm 2 modifies the original CFG by eliminating non-terminals, which can be useful in further simplifications but changes the grammar structure. Both approaches correctly solve the Emptiness problem, but Algorithm 2's approach of modifying the grammar makes it less suitable for situations where you need to preserve the original grammar structure. However, the substitution method can be more intuitive for understanding why certain strings are derivable from the grammar.

**Comparison of Emptiness Problem Algorithms**

| Feature | Algorithm 1 | Algorithm 2 |
|---------|------------|-------------|
| **Grammar Structure** | Preserves the original CFG structure | Modifies the original CFG structure |
| **Non-terminal Handling** | Tracks which non-terminals can derive terminal strings | Eliminates non-terminals from the grammar |
| **Effect on Grammar** | Non-destructive - original grammar is unchanged | Destructive - transforms the grammar |
| **Use Cases** | Suitable when original grammar structure must be preserved | Useful for further grammar simplifications |
| **Intuitiveness** | More abstract - focuses on non-terminal properties | More intuitive for understanding string derivability |
| **Output** | Provides a yes/no answer about emptiness | Provides a simplified grammar (or determines emptiness) |


### 1.3 Examples
Consider this given CFG with productions: $S \rightarrow AB; A \rightarrow AX; B \rightarrow BY \mid XX; X \rightarrow a; Y \rightarrow b$

**Using Algorithm 1:**

* Step 1: initialize the set $derivable = \{X, Y\}$, since $X \rightarrow a$ and $Y \rightarrow b$ go directly to terminals.
* Step 2: add $B$ to the set, because of the production $B \rightarrow XX$ and $X$ is already in the set: $derivable = \{X, Y, B\}$
* Step 3: Repeat until no more non-terminals can be added. $A$ can never be added because its only production $A \rightarrow AX$ contains $A$ itself, and $S \rightarrow AB$ needs $A$, so the set stays at $derivable = \{X, Y, B\}$. Since $S$ is not in the set, the language is empty.

**Using Algorithm 2:**

* Step 1: Identify directly derivable non-terminals: $X, Y$, since they have productions directly to terminals.
* Step 2: Replace occurrences of $X, Y$ in the other productions with their terminals and eliminate them: $S \rightarrow AB; A \rightarrow Aa; B \rightarrow Bb \mid aa$
* Step 3: Repeat steps 1 and 2: $B$ is now directly derivable through $B \rightarrow aa$, so we replace it with $aa$ and eliminate it: $S \rightarrow Aaa; A \rightarrow Aa$.
* Step 4: Neither remaining production consists only of terminals, so nothing more can be eliminated. Since $S$ is not eliminated, the language is empty.
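
The substitution procedure of Algorithm 2 can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function name `is_empty_by_substitution` and the plain-dictionary grammar format are assumptions made for this example, and it assumes every non-terminal appears as a key of the dictionary. It works on a copy, so the caller's grammar is left unchanged.

```python
def is_empty_by_substitution(productions, start):
    """Algorithm 2 sketch: eliminate non-terminals by substituting terminal strings.

    productions: dict mapping each non-terminal to a list of right-hand sides,
                 each a list of symbols ([] stands for the empty string).
    Returns True if the language is empty, False otherwise.
    """
    nonterminals = set(productions)
    # Copy the grammar so the substitutions below do not touch the original.
    rules = {nt: [list(rhs) for rhs in alts] for nt, alts in productions.items()}

    def find_terminal_rule():
        # Look for a production N -> t where t contains terminals only.
        for nt, alts in rules.items():
            for rhs in alts:
                if all(sym not in nonterminals for sym in rhs):
                    return nt, rhs
        return None

    while True:
        found = find_terminal_rule()
        if found is None:
            return True          # nothing left to eliminate; S was never eliminated
        nt, t = found
        if nt == start:
            return False         # S derives the terminal string t
        del rules[nt]            # eliminate N together with all its productions
        for other in list(rules):
            # Replace each occurrence of N by the terminal string t.
            rules[other] = [
                [out for sym in rhs for out in (t if sym == nt else [sym])]
                for rhs in rules[other]
            ]


example = {
    "S": [["A", "B"]],
    "A": [["A", "X"]],
    "B": [["B", "Y"], ["X", "X"]],
    "X": [["a"]],
    "Y": [["b"]],
}
print(is_empty_by_substitution(example, "S"))  # True
```

On the example grammar the function eliminates $X$, then $B$, then $Y$, and stops with $S$ and $A$ still present, which matches the hand computation above.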

### 1.4 Example Python Implementation
Below, we'll demonstrate how to implement Algorithm 1 with Python code.

In [None]:
def is_language_empty(cfg):
    """
    Determine if the language generated by a CFG is empty.
    
    Args:
        cfg: A CFG object
        
    Returns:
        bool: True if the language is empty, False otherwise
    """
    # Step 1: Find all non-terminals with a production that is empty (∧)
    # or consists only of terminals
    derivable = set()
    for non_terminal, productions in cfg.productions.items():
        for production in productions:
            # An empty list represents ∧; otherwise every symbol must be a terminal
            if not production or all(symbol in cfg.terminals for symbol in production):
                derivable.add(non_terminal)
                break
    
    # Step 2: Iterate until no more non-terminals can be added
    changed = True
    while changed:
        changed = False
        for non_terminal, productions in cfg.productions.items():
            if non_terminal in derivable:
                continue
                
            for production in productions:
                if all(symbol in derivable or symbol in cfg.terminals for symbol in production):
                    derivable.add(non_terminal)
                    changed = True
                    break
    
    # Step 3: Check if the start symbol can generate a string
    return cfg.start_symbol not in derivable

In [None]:
# Test with balanced parentheses grammar
print(f"Is balanced_parens language empty? {is_language_empty(balanced_parens)}")

# Grammar that generates an empty language
empty_lang = CFG(
    productions={
        'S': [['A']],
        'A': [['S']]
    },
    start_symbol='S'
)
print(f"Is empty_lang language empty? {is_language_empty(empty_lang)}")

## 2. Uselessness Problem

### 2.1 Definition

The uselessness problem asks: "Which non-terminals in a CFG are useless?" A non-terminal is useless if:
* It cannot derive any string of terminals (non-generating), or
* It cannot be reached from the start symbol.

### 2.2 Algorithm to Identify Useless Non-terminals 

This problem is decidable and it can be solved using the following algorithm: 

* Step 1: find all derivable (generating) non-terminals, as in the emptiness problem; the remaining non-terminals are non-generating.
* Step 2: remove every production that involves a non-generating non-terminal. This must happen before the reachability check, because removing these productions can make other non-terminals unreachable.
* Step 3: in the revised CFG, find all non-terminals reachable from the start symbol.
* Step 4: a non-terminal is useful if and only if it is generating and is reachable in the revised CFG. Otherwise it is useless.

### 2.3 Examples
Find all useless non-terminals in this given CFG with productions: $S \rightarrow AB \mid CD; A \rightarrow aA \mid a; B \rightarrow bB \mid BD \mid b; C \rightarrow abC; D \rightarrow dD \mid d; E \rightarrow e$

* Step 1: find all derivable non-terminals to determine the non-generating non-terminals:
    - initialize a set $derivable = \{A, B, D, E\}$ since they have productions directly to terminals.
    - add $S$ to the set due to the production $S \rightarrow AB$ and $A, B$ are already in the set: $derivable = \{A, B, D, E, S\}$
    - Repeat until no more non-terminals can be added: $derivable = \{A, B, D, E, S\}$.
    - So $C$ is a non-generating non-terminal.
* Step 2: remove all productions that involve $C$, so the CFG becomes $S \rightarrow AB; A \rightarrow aA \mid a; B \rightarrow bB \mid BD \mid b; D \rightarrow dD \mid d; E \rightarrow e$
* Step 3: find all reachable non-terminals:
    - Starting from $S$, we can reach $A, B$
    - From $B$ we can reach $D$
    - Since no production contains $E$ on the right-hand side, $E$ is unreachable.
* Conclusion: in the given CFG, $C$ and $E$ are useless non-terminals.
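
The three steps can be checked mechanically on this example. The sketch below is self-contained and uses a plain dictionary instead of the `CFG` class; the function name `useless_nonterminals` is an assumption made for illustration. It prunes the non-generating productions before computing reachability, following the order of the steps above.

```python
def useless_nonterminals(productions, start):
    """Return the set of useless non-terminals (sketch of Steps 1-3 above)."""
    nonterminals = set(productions)

    def all_generating(rhs, generating):
        # True if every symbol is a terminal or an already-generating non-terminal.
        return all(s not in nonterminals or s in generating for s in rhs)

    # Step 1: generating non-terminals, computed as a fixed point.
    generating = set()
    changed = True
    while changed:
        changed = False
        for nt, alts in productions.items():
            if nt not in generating and any(all_generating(rhs, generating) for rhs in alts):
                generating.add(nt)
                changed = True

    # Step 2: drop non-generating non-terminals and every production using them.
    pruned = {
        nt: [rhs for rhs in alts if all_generating(rhs, generating)]
        for nt, alts in productions.items()
        if nt in generating
    }

    # Step 3: reachability from the start symbol in the pruned grammar.
    reachable = set()
    stack = [start] if start in pruned else []
    while stack:
        nt = stack.pop()
        if nt in reachable:
            continue
        reachable.add(nt)
        for rhs in pruned[nt]:
            stack.extend(s for s in rhs if s in pruned and s not in reachable)

    # Useful = generating and reachable in the pruned grammar.
    return nonterminals - (generating & reachable)


example = {
    "S": [["A", "B"], ["C", "D"]],
    "A": [["a", "A"], ["a"]],
    "B": [["b", "B"], ["B", "D"], ["b"]],
    "C": [["a", "b", "C"]],
    "D": [["d", "D"], ["d"]],
    "E": [["e"]],
}
print(sorted(useless_nonterminals(example, "S")))  # ['C', 'E']
```

The order of the steps matters. For a grammar such as $S \rightarrow a \mid AB; A \rightarrow a; B \rightarrow bB$, the non-terminal $A$ is generating and appears in a production of $S$, but it becomes unreachable once $S \rightarrow AB$ is removed, so it is useless as well.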

### 2.4 Example Python Implementation
Below, we'll demonstrate how to implement this algorithm with Python code.

In [None]:
def find_useless_symbols(cfg):
    """
    Find all useless non-terminals in a CFG.
    
    Args:
        cfg: A CFG object
        
    Returns:
        set: The set of useless non-terminals
    """
    # Step 1: Find all generating non-terminals
    generating = set()
    for non_terminal, productions in cfg.productions.items():
        for production in productions:
            # Check if production is empty (∧) or contains only terminals
            if not production or all(symbol in cfg.terminals for symbol in production):
                generating.add(non_terminal)
                break
    
    changed = True
    while changed:
        changed = False
        for non_terminal, productions in cfg.productions.items():
            if non_terminal in generating:
                continue
                
            for production in productions:
                if all(symbol in generating or symbol in cfg.terminals for symbol in production):
                    generating.add(non_terminal)
                    changed = True
                    break
    
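    # Step 2: Find all reachable non-terminals. Productions that mention a
    # non-generating non-terminal are skipped, which has the same effect as
    # removing them from the grammar before the reachability check.
    reachable = {cfg.start_symbol}
    changed = True
    while changed:
        changed = False
        new_reachable = set(reachable)
        
        for non_terminal in reachable:
            if non_terminal not in cfg.productions:
                continue
                
            for production in cfg.productions[non_terminal]:
                # Skip productions that involve a non-generating non-terminal
                if any(symbol in cfg.non_terminals and symbol not in generating
                       for symbol in production):
                    continue
                for symbol in production:
                    if symbol in cfg.non_terminals and symbol not in new_reachable:
                        new_reachable.add(symbol)
                        changed = True
        
        reachable = new_reachable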
    
    # Step 3: Identify useless non-terminals
    useless = set()
    for non_terminal in cfg.non_terminals:
        if non_terminal not in generating or non_terminal not in reachable:
            useless.add(non_terminal)
    
    return useless

In [None]:
# Create a grammar with useless symbols
grammar_with_useless = CFG(
    productions={
        'S': [['A'], ['B']],
        'A': [['a', 'A'], ['a']],
        'B': [['C']],
        'C': [['B']],
        'D': [['a']]  # D is unreachable
    },
    start_symbol='S'
)

useless = find_useless_symbols(grammar_with_useless)
print(f"Useless non-terminals: {useless}")

# We can also use this to eliminate useless symbols
def eliminate_useless_symbols(cfg):
    """
    Eliminate useless symbols from a CFG.
    
    Args:
        cfg: A CFG object
        
    Returns:
        CFG: A new CFG with useless symbols eliminated
    """
    useless = find_useless_symbols(cfg)
    
    # Create new productions without useless symbols
    new_productions = {}
    for non_terminal, productions in cfg.productions.items():
        if non_terminal not in useless:
            new_prods = []
            for production in productions:
                # Only keep productions that don't contain useless non-terminals
                if not any(symbol in useless for symbol in production):
                    new_prods.append(production)
            if new_prods:  # Only add if there are valid productions
                new_productions[non_terminal] = new_prods
    
    return CFG(new_productions, cfg.start_symbol)

# Test eliminating useless symbols
simplified_grammar = eliminate_useless_symbols(grammar_with_useless)
print("Grammar after eliminating useless symbols:")
print(simplified_grammar)

## 3. Membership Problem
### 3.1 Definition
The membership problem asks: "Does a given string belong to the language generated by a CFG?" This problem is decidable, and two standard algorithms solve it: the Cocke-Younger-Kasami (CYK) algorithm and Earley’s algorithm.

### 3.2 CYK Algorithm (Cocke-Younger-Kasami) for Membership Problem
The CYK algorithm is a dynamic programming approach that determines whether a given string is generated by a CFG. 

* Works only for grammars in Chomsky Normal Form (CNF); every CFG can be converted to an equivalent grammar in CNF first.
* Uses bottom-up parsing to check derivations efficiently.
* Runs in $O(n^3)$ time, where $n$ is the length of the input string.
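
Because CYK requires CNF, it can help to check a grammar's shape before running it. The following is a simplified sketch: the function name `is_in_cnf` and the dictionary format (each right-hand side is a list of symbols) are assumptions for this example. It accepts $S \rightarrow \wedge$ for the start symbol without checking the stricter condition that $S$ then does not occur on any right-hand side.

```python
def is_in_cnf(productions, start):
    """Check that every production has the form A -> BC, A -> a, or S -> ∧."""
    nonterminals = set(productions)
    for lhs, alts in productions.items():
        for rhs in alts:
            if len(rhs) == 2 and all(s in nonterminals for s in rhs):
                continue  # A -> BC with two non-terminals
            if len(rhs) == 1 and rhs[0] not in nonterminals:
                continue  # A -> a with a single terminal
            if len(rhs) == 0 and lhs == start:
                continue  # S -> ∧ (simplified check, see the note above)
            return False
    return True


cnf_grammar = {
    "S": [["A", "B"], ["B", "C"]],
    "A": [["B", "A"], ["a"]],
    "B": [["C", "C"], ["b"]],
    "C": [["A", "B"], ["a"]],
}
parens = {"S": [["(", "S", ")"], ["S", "S"], []]}
print(is_in_cnf(cnf_grammar, "S"))  # True
print(is_in_cnf(parens, "S"))       # False
```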

**Algorithm Description:** The CYK algorithm uses a bottom-up parsing approach with a dynamic programming table, answering the question for short substrings first and combining those answers for longer ones. For a string of length $n$, we create an $n \times n$ table where each cell $[i,j]$ stores the non-terminals that can generate the substring of length $j$ starting at position $i$. We fill the table from the smallest substrings (length 1) up to the full string. For each substring, we try every way of splitting it into two smaller parts; if one part is generated by a non-terminal $B$, the other by a non-terminal $C$, and there is a rule $A \rightarrow BC$, then $A$ generates the substring. The algorithm can be described as follows:

* Step 1: Initialize the Table
    - Let $n$ be the length of the input string $w = w_1w_2...w_n$.
    - Construct a table $T[i,j]$ to store the set of non-terminals that can derive the substring $w[i:j]$ (the substring of length $j$ starting at position $i$).
    - $i$ represents the starting position of a substring.
    - $j$ represents the length of the substring.
    - Rows ($i$) correspond to different starting positions in the input string.
    - Columns ($j$) correspond to substring lengths from $1$ to $n$ (the length of $w$).
    - The table is triangular: $T[i,j]$ is filled only when $i + j - 1 \leq n$, that is, when the substring fits inside $w$.
* Step 2: Filling single characters as the base cases
    - For each terminal $w_k$ in the input string, find all non-terminals $A$ such that $A \rightarrow w_k$ is a production.
    - Store these non-terminals in $T[k,1]$.
* Step 3: Fill the Table Bottom-Up
    - For each substring length $L$ from $2$ to $n$:
        - For each starting position $i$ from $1$ to $n-L+1$:
            - For each split length $s$ from $1$ to $L-1$, split the substring $w[i:L]$ into a left part $w[i:s]$ and a right part $w[i+s:L-s]$:
                - For each rule $A \rightarrow BC$ in the CFG, if $B$ is in $T[i,s]$ and $C$ is in $T[i+s,L-s]$, add $A$ to $T[i,L]$.
* Step 4: Check Membership
    - If the start symbol $S$ is in $T[1,n]$, $w \in L(G)$.
    - Otherwise, $w \notin L(G)$.
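
The four steps can be sketched directly with the (start, length) indexing used in the description. This is an illustrative sketch: the function name `cyk_table` and the rule format (terminal rules as pairs, binary rules as triples) are assumptions made here, and the empty string is not handled. The grammar in the usage example is the CNF grammar $S \rightarrow AB \mid BC; A \rightarrow BA \mid a; B \rightarrow CC \mid b; C \rightarrow AB \mid a$.

```python
def cyk_table(unit_rules, pair_rules, w):
    """Build the CYK table T, keyed by (start, length) with 1-based starts.

    unit_rules: list of (A, a) pairs for rules A -> a
    pair_rules: list of (A, B, C) triples for rules A -> BC
    """
    n = len(w)
    T = {}
    # Step 2: base case, substrings of length 1.
    for k in range(1, n + 1):
        T[(k, 1)] = {A for A, a in unit_rules if a == w[k - 1]}
    # Step 3: longer substrings, shortest first.
    for L in range(2, n + 1):
        for i in range(1, n - L + 2):
            cell = set()
            for s in range(1, L):  # s is the length of the left part
                left, right = T[(i, s)], T[(i + s, L - s)]
                cell |= {A for A, B, C in pair_rules if B in left and C in right}
            T[(i, L)] = cell
    return T


unit_rules = [("A", "a"), ("B", "b"), ("C", "a")]
pair_rules = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
              ("B", "C", "C"), ("C", "A", "B")]
T = cyk_table(unit_rules, pair_rules, "baaba")
# Step 4: the string is in the language if S derives the whole string.
print("S" in T[(1, 5)])  # True
```

The three nested loops over $L$, $i$ and $s$ are the source of the $O(n^3)$ running time (for a fixed grammar).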

### 3.3 CYK Algorithm Examples
Given a CFG in CNF: $S \rightarrow AB \mid BC; A \rightarrow BA \mid a; B \rightarrow CC \mid b; C \rightarrow AB \mid a$, decide whether $w = baaba$ can be generated by the given CFG.

* Step 1 initialize the table: Given $w = baaba$, a string of length $n = 5$, we construct a $5 \times 5$ table (at the start, all cells are empty):

| i/j | 1 | 2 | 3 | 4 | 5 |
|-----|---|---|---|---|---|
| 1   |   |   |   |   |   |
| 2   |   |   |   |   |   |
| 3   |   |   |   |   |   |
| 4   |   |   |   |   |   |
| 5   |   |   |   |   |   |

* Step 2 filling the table for substrings of length 1: for each single character in $w$, we find which non-terminals produce that terminal:
    - Since $A \rightarrow a$ and $C \rightarrow a$, $A, C$ can generate $a$;
    - $B \rightarrow b$, $B$ can generate $b$:

| i/j | 1 | 2 | 3 | 4 | 5 |
|-----|---|---|---|---|---|
| 1 |b: {B} |   |   |   |   |
| 2 |a: {A,C} |   |   |   |   |
| 3 |a: {A,C} |   |   |   |   |
| 4 |b: {B} |   |   |   |   |
| 5 |a: {A,C} |   |   |   |   |

* Step 3 filling the table for substrings of length 2: look at each pair of adjacent terminals (substrings of length 2), then apply productions of the form $A \rightarrow BC$ to see which non-terminals derive it.
    - $w[1:2] = ba$: from the length-1 entries, the candidate pairs are $BA$ and $BC$. Since $A \rightarrow BA$ and $S \rightarrow BC$, the set of non-terminals that produce $ba$ is $\{S,A\}$.
    - $w[2:2] = aa$: from the length-1 entries, the candidate pairs are $AA, AC, CA, CC$. Since $B \rightarrow CC$ is the only matching rule, the set of non-terminals that produce $aa$ is $\{B\}$.
    - $w[3:2] = ab$: from the length-1 entries, the candidate pairs are $AB, CB$. Since $S \rightarrow AB$ and $C \rightarrow AB$, the set of non-terminals that produce $ab$ is $\{S,C\}$.
    - $w[4:2] = ba$: this substring was analyzed above; the set of non-terminals that produce $ba$ is $\{S,A\}$.
    - $w[5:2]$: this substring does not exist, so we use "-" to mark an invalid entry in the table.
  
| i/j | 1 | 2 | 3 | 4 | 5 |
|-----|---|---|---|---|---|
| 1 |b: {B} |ba: {S,A}  |   |   |   |
| 2 |a: {A,C} |aa:{B}   |   |   |   |
| 3 |a: {A,C} |ab: {S,C}   |   |   |   |
| 4 |b: {B} |ba: {S,A}  |   |   |   |
| 5 |a: {A,C} |-  |   |   |   |

* Step 4 filling the table for substrings of length 3: 
    - $w[1:3] = baa$: from the length-1 and length-2 entries, $baa$ can be split into either $\{b, aa\}$ or $\{ba, a\}$. We only split a substring into two parts, never three or more, because every non-terminal rule of a CFG in CNF has exactly two non-terminals on its right-hand side.
        - for $\{b, aa\}$, the only candidate pair is $BB$. No rule has $BB$ on its right-hand side, so this split contributes nothing.
        - for $\{ba, a\}$, the candidate pairs are $SA,SC,AA,AC$. No rule has any of these on its right-hand side, so this split contributes nothing.
        - So no non-terminal can produce $baa$ in any split. We use the empty set $\{\}$ to indicate that in the table.
    - $w[2:3] = aab$: from the length-1 and length-2 entries, $aab$ can be split into either $\{a, ab\}$ or $\{aa, b\}$. 
        - for $\{a, ab\}$, the candidate pairs are $AS,AC,CS,CC$. Since $B \rightarrow CC$, we add $B$ to the set of non-terminals for $aab$.
        - for $\{aa, b\}$, the only candidate pair is $BB$. No rule has $BB$ on its right-hand side, so this split contributes nothing.
        - So the set of non-terminals that produce $aab$ is $\{B\}$.
    - $w[3:3] = aba$: from the length-1 and length-2 entries, $aba$ can be split into either $\{a, ba\}$ or $\{ab, a\}$. 
        - for $\{ab, a\}$, the candidate pairs are $SA,SC,CA,CC$. Since $B \rightarrow CC$, we add $B$ to the set of non-terminals for $aba$.
        - for $\{a, ba\}$, the candidate pairs are $AS,AA,CS,CA$. No rule has any of these on its right-hand side, so this split contributes nothing.
        - So the set of non-terminals that produce $aba$ is $\{B\}$.
    - $w[4:3],w[5:3]$: these are not valid substrings, so we use "-" in the table for these entries.

| i/j | 1 | 2 | 3 | 4 | 5 |
|-----|---|---|---|---|---|
| 1 |b: {B} |ba: {S,A}  |baa: {} |   |   |
| 2 |a: {A,C} |aa:{B}   |aab: {B} |   |   |
| 3 |a: {A,C} |ab: {S,C}   |aba: {B} |   |   |
| 4 |b: {B} |ba: {S,A}  |-   |   |   |
| 5 |a: {A,C} |-  |-   |   |   |

* Step 5 filling the table for substrings of length 4: 
    - $w[1:4] = baab$: it can be split into either $\{b, aab\}$, $\{ba, ab\}$, or $\{baa, b\}$.
        - for $\{b, aab\}$, the only candidate pair is $BB$. No rule has $BB$ on its right-hand side, so this split contributes nothing.
        - for $\{ba, ab\}$, the candidate pairs are $SS,SC,AS,AC$. No rule has any of these on its right-hand side, so this split contributes nothing.
        - for $\{baa, b\}$, no non-terminal can produce $baa$, so this split contributes nothing.
        - So no non-terminal can produce $baab$ in any split. We use the empty set $\{\}$ to indicate that in the table.
    - $w[2:4] = aaba$: it can be split into either $\{a, aba\}$, $\{aa, ba\}$, or $\{aab, a\}$. 
        - for $\{a, aba\}$, the candidate pairs are $AB,CB$. Since $S \rightarrow AB$ and $C \rightarrow AB$, we add $S,C$ to the set of non-terminals for $aaba$.
        - for $\{aa, ba\}$, the candidate pairs are $BS,BA$. Since $A \rightarrow BA$, we add $A$ to the set of non-terminals for $aaba$.
        - for $\{aab, a\}$, the candidate pairs are $BA,BC$. Since $A \rightarrow BA$ and $S \rightarrow BC$, we add $A,S$ to the set of non-terminals for $aaba$.
        - So the set of non-terminals that produce $aaba$ is $\{S,C,A\}$.
    - $w[3:4],w[4:4],w[5:4]$: these are not valid substrings, so we use "-" in the table for these entries.
 
| i/j | 1 | 2 | 3 | 4 | 5 |
|-----|---|---|---|---|---|
| 1 |b: {B} |ba: {S,A}  |baa: {} |baab:{} |   |
| 2 |a: {A,C} |aa:{B}   |aab: {B} |aaba: {S,C,A} |   |
| 3 |a: {A,C} |ab: {S,C}   |aba: {B} |-   |   |
| 4 |b: {B} |ba: {S,A}  |-   |-   |   |
| 5 |a: {A,C} |-  |-   |-   |   |

* Step 6 filling the table for substrings of length 5: 
    - $w[1:5] = baaba$: it can be split into either $\{b, aaba\}$, $\{ba, aba\}$, $\{baa, ba\}$, or $\{baab, a\}$.
        - for $\{b, aaba\}$, the left part gives $\{B\}$ and the right part gives $\{S,C,A\}$, so the candidate pairs are $BS,BC,BA$. Since $S \rightarrow BC$ and $A \rightarrow BA$, we add $S,A$ to the set of non-terminals for $baaba$.
        - for $\{ba, aba\}$, the candidate pairs are $SB,AB$. Since $S \rightarrow AB$ and $C \rightarrow AB$, we add $S,C$ to the set of non-terminals for $baaba$.
        - for $\{baa, ba\}$, no non-terminal can produce $baa$, so this split contributes nothing.
        - for $\{baab, a\}$, no non-terminal can produce $baab$, so this split contributes nothing.
        - So the set of non-terminals that produce $baaba$ is $\{S,A,C\}$.
    - $w[2:5],w[3:5],w[4:5],w[5:5]$: these are not valid substrings, so we use "-" in the table for these entries.

| i/j | 1 | 2 | 3 | 4 | 5 |
|-----|---|---|---|---|---|
| 1 |b: {B} |ba: {S,A}  |baa: {} |baab:{} |baaba:{S,A,C} |
| 2 |a: {A,C} |aa:{B}   |aab: {B} |aaba: {S,C,A} |-  |
| 3 |a: {A,C} |ab: {S,C}   |aba: {B} |-   |-  |
| 4 |b: {B} |ba: {S,A}  |-   |-   |-  |
| 5 |a: {A,C} |-  |-   |-   |-  |

* Step 7: Since $S$ appears in $T[1,5]$, we conclude that $baaba \in L(G)$.

### 3.4 Example Python Implementation 
Below, we'll demonstrate how to implement the CYK algorithm with Python code.

In [None]:
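def cyk_algorithm(grammar, start_symbol, word):
    """
    Decide whether a word is generated by a grammar in Chomsky Normal Form.

    Args:
        grammar: A dictionary mapping each non-terminal to a set of right-hand
            sides. A terminal rule is a one-character string; a binary rule
            is a 2-tuple of non-terminals.
        start_symbol: The start symbol of the grammar.
        word: The input string.

    Returns:
        bool: True if the word is in the language, False otherwise
    """
    n = len(word)
    if n == 0:
        # The empty string is accepted only if the start symbol has an empty
        # production, represented in this format by the empty string "".
        return "" in grammar.get(start_symbol, set())
    
    # table[i][j] holds the non-terminals deriving word[i..j] (0-based, inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    
    # Base case: substrings of length 1
    for i, symbol in enumerate(word):
        for lhs, rhs in grammar.items():
            if symbol in rhs:
                table[i][i].add(lhs)
    
    # Fill in table using dynamic programming, shortest spans first
    for length in range(2, n + 1):  # Length of span
        for i in range(n - length + 1):  # Start position
            j = i + length - 1  # End position
            for k in range(i, j):  # Last index of the left part
                for lhs, rhs in grammar.items():
                    for rule in rhs:
                        # Only binary rules A -> BC can combine two sub-spans
                        if (isinstance(rule, tuple) and len(rule) == 2
                                and rule[0] in table[i][k] and rule[1] in table[k + 1][j]):
                            table[i][j].add(lhs)
    
    # Check if start symbol is in the top-right cell
    return start_symbol in table[0][n - 1]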

# Example usage
grammar = {
    'S': {('A', 'B'), ('B', 'C')},
    'A': {('B', 'A'), 'a'},
    'B': {('C', 'C'), 'b'},
    'C': {('A', 'B'), 'a'}
}
start_symbol = 'S'
word = "baab"
print(word + ": " + str(cyk_algorithm(grammar, start_symbol, word)))  # baab: False
word = "baaba"
print(word + ": " + str(cyk_algorithm(grammar, start_symbol, word)))  # baaba: True


### 3.5 Earley’s Algorithm for Membership Problem
Earley’s algorithm is a chart parsing algorithm used for recognizing whether a given string belongs to a language defined by a CFG. 

* Works for any CFG, not just those in Chomsky Normal Form (CNF).
* Uses a top-down, dynamic programming approach that processes the input string one symbol at a time while maintaining a chart of parsing states.
* Time complexity: $O(n^3)$ in the worst case, $O(n^2)$ for unambiguous grammars, and $O(n)$ for many deterministic grammars, where $n$ is the length of the input string.

**Algorithm Description:** The fundamental unit in Earley's algorithm is the **Earley item**, which has the form: $[A \rightarrow \alpha \cdot \beta, i]$, where

* $A \rightarrow \alpha\beta$ is a production rule in the grammar
* $\cdot$ represents the current parsing position in the production rule
* $\alpha$ is the part of the production that has been recognized
* $\beta$ is the part of the production that still needs to be recognized
* $i$ is the starting position in the input string where recognition of this rule began

The algorithm maintains $n+1$ sets of Earley items for a string of length $n$, denoted $S_0, S_1, \cdots, S_n$. Each set $S_j$ contains items that represent possible parsing states after processing $j$ symbols of the input string. An Earley item $[A \rightarrow \alpha \cdot \beta, i]$ in $S_j$ means that we have recognized $\alpha$ starting from position $i$ and ending at position $j$. Earley's algorithm uses three main operations to build these sets:

* Predicting: for each item $[A \rightarrow \alpha \cdot B\beta, i]$ in $S_j$ where $B$ is a non-terminal, add $[B \rightarrow \cdot \gamma, j]$ to $S_j$ for every production rule $B \rightarrow \gamma$ in the grammar. This operation predicts what rules might be used next based on what we're expecting to see.
* Scanning: for each item $[A \rightarrow \alpha \cdot a\beta, i]$ in $S_j$ where $a$ is a terminal that matches the input at position $j$, add $[A \rightarrow \alpha a\cdot \beta, i]$ to $S_{j+1}$. This operation consumes an input symbol when it matches what we're expecting.
* Completing: for each item $[A \rightarrow \gamma \cdot, i]$ in $S_j$ which is a completed rule, for each item $[B \rightarrow \alpha \cdot A\beta, k]$ in $S_i$: add $[B \rightarrow \alpha A \cdot \beta, k]$ to $S_j$. This operation updates all rules that were waiting for the rule we just completed.

The algorithm follows these steps:

* Initialize $S_0$ with the item $[S'\rightarrow \cdot S, 0]$, where $S$ is the original start symbol and $S'$ is a new start symbol.
* For each position $j$ from $0$ to $n$:
    - Apply prediction, scanning, and completion until no new items can be added to $S_j$
    - Move to position $j+1$
* If the item $[S'\rightarrow S\cdot, 0]$ is in $S_n$, the input string is accepted; otherwise, it's rejected.

**Example:** Consider the CFG for balanced parentheses: $S \rightarrow SS \mid (S) \mid ()$. Decide if the string $(())$ can be derived by the given CFG. 

* Step 1: Initialize $S_0$: start with $[S' \rightarrow \cdot S, 0]$, add all $S$ productions by the predicting operation:
    - $[S \rightarrow \cdot SS, 0]$
    - $[S \rightarrow \cdot (S), 0]$
    - $[S \rightarrow \cdot (), 0]$
    - Finally $S_0$ contains the following items:
      $$[S' \rightarrow \cdot S, 0]$$
      $$[S \rightarrow \cdot SS, 0]$$
      $$[S \rightarrow \cdot (S), 0]$$
      $$[S \rightarrow \cdot (), 0]$$
* Step 2: Process $($ at position $0$:
    - Applying the scanning operation:
        - For item $[S \rightarrow \cdot(S), 0]$, add to $S_1$: $[S \rightarrow (\cdot S), 0]$
        - For item $[S \rightarrow \cdot(), 0]$, add to $S_1$: $[S \rightarrow (\cdot), 0]$
    - Applying the predicting operation: For item $[S \rightarrow (\cdot S), 0]$, add to $S_1$:
        - $[S \rightarrow \cdot SS, 1]$
        - $[S \rightarrow \cdot (S), 1]$
        - $[S \rightarrow \cdot (), 1]$
    - Finally $S_1$ contains the following items:
      $$[S \rightarrow (\cdot S), 0]$$
      $$[S \rightarrow (\cdot), 0]$$
      $$[S \rightarrow \cdot SS, 1]$$
      $$[S \rightarrow \cdot (S), 1]$$
      $$[S \rightarrow \cdot (), 1]$$
* Step 3: Process $($ at position $1$
    - Applying the scanning operation:
        - For item $[S \rightarrow \cdot (S), 1]$, add to $S_2$: $[S \rightarrow (\cdot S), 1]$
        - For item $[S \rightarrow \cdot(), 1]$, add to $S_2$: $[S \rightarrow (\cdot), 1]$
    - Applying the predicting operation: for item $[S \rightarrow (\cdot S), 1]$, add to $S_2$:
        - $[S \rightarrow \cdot SS, 2]$
        - $[S \rightarrow \cdot (S), 2]$
        - $[S \rightarrow \cdot (), 2]$
    - Finally $S_2$ contains the following items:
      $$[S \rightarrow (\cdot S), 1]$$
      $$[S \rightarrow (\cdot), 1]$$
      $$[S \rightarrow \cdot SS, 2]$$
      $$[S \rightarrow \cdot (S), 2]$$
      $$[S \rightarrow \cdot (), 2]$$
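* Step 4: Process $)$ at position $2$
    - Applying the scanning operation:
        - For item $[S \rightarrow (\cdot), 1]$ in $S_2$, add to $S_3$: $[S \rightarrow ()\cdot, 1]$
    - Applying the completing operation: for item $[S \rightarrow ()\cdot, 1]$, find items in $S_1$ expecting $S$:
        - for $[S \rightarrow (\cdot S), 0]$, add $[S \rightarrow (S \cdot), 0]$ to $S_3$.
        - for $[S \rightarrow \cdot SS, 1]$, add $[S \rightarrow S \cdot S, 1]$ to $S_3$.
    - Applying the predicting operation: for item $[S \rightarrow S \cdot S, 1]$, add to $S_3$:
        - $[S \rightarrow \cdot SS, 3]$
        - $[S \rightarrow \cdot (S), 3]$
        - $[S \rightarrow \cdot (), 3]$
    - Finally $S_3$ contains the following items:
      $$[S \rightarrow ()\cdot, 1]$$
      $$[S \rightarrow (S\cdot), 0]$$
      $$[S \rightarrow S \cdot S, 1]$$
      $$[S \rightarrow \cdot SS, 3]$$
      $$[S \rightarrow \cdot (S), 3]$$
      $$[S \rightarrow \cdot (), 3]$$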
* Step 5: Process $)$ at position $3$
    - Applying the scanning operation:
        - For item $[S \rightarrow (S\cdot), 0]$, add to $S_4$: $[S \rightarrow (S)\cdot, 0]$
    - Applying the completing operation: for item $[S \rightarrow (S)\cdot, 0]$, find items in $S_0$ expecting $S$:
        - for $[S' \rightarrow \cdot S, 0]$, add $[S' \rightarrow S \cdot, 0]$ to $S_4$.
        - for $[S \rightarrow \cdot SS, 0]$, add $[S \rightarrow S \cdot S, 0]$ to $S_4$.
    - Applying the predicting operation: for item $[S \rightarrow S \cdot S, 0]$, the three $S$ productions are predicted with origin $4$. The input is exhausted, so they can never be scanned and do not affect the result.
    - Finally $S_4$ contains the following items:
      $$[S \rightarrow (S)\cdot, 0]$$
      $$[S' \rightarrow S \cdot, 0]$$
      $$[S \rightarrow S \cdot S, 0]$$
      $$[S \rightarrow \cdot SS, 4]$$
      $$[S \rightarrow \cdot (S), 4]$$
      $$[S \rightarrow \cdot (), 4]$$
* Step 6: Since $S_4$ contains $[S' \rightarrow S \cdot, 0]$, the string $(())$ is accepted.

### 3.6 Example Python Implementation
The following Python code implements Earley's algorithm as a recognizer: it reports whether the input can be derived from the start symbol, without building a parse tree.

In [None]:
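def earley_parse(tokens, grammar, start_symbol):
    """
    Recognize `tokens` with Earley's algorithm.

    Each state is a tuple (lhs, rhs, dot, origin). Returns True if the
    token list can be derived from start_symbol. Empty (ε) productions
    may not be handled correctly by this simple completer.
    """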
    def predictor(state, k):
        non_terminal = state[1][state[2]]
        if non_terminal in grammar:
            for production in grammar[non_terminal]:
                new_state = (non_terminal, tuple(production), 0, k)
                if new_state not in chart[k]:
                    chart[k].add(new_state)
                    current_agenda.append(new_state)
    
    def scanner(state, k):
        if k < len(tokens):
            expected = state[1][state[2]]
            current_token = tokens[k]
            
            # Special handling for numeric tokens
            matches = (expected == current_token or 
                      (expected == 'num' and current_token.isdigit()))
            
            if matches:
                new_state = (state[0], state[1], state[2] + 1, state[3])
                if new_state not in chart[k + 1]:
                    chart[k + 1].add(new_state)
    
    def completer(state, k):
        origin = state[3]
        for prev_state in list(chart[origin]):
            if prev_state[2] < len(prev_state[1]) and prev_state[1][prev_state[2]] == state[0]:
                new_state = (prev_state[0], prev_state[1], prev_state[2] + 1, prev_state[3])
                if new_state not in chart[k]:
                    chart[k].add(new_state)
                    current_agenda.append(new_state)
    
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    
    # Initialize chart[0]
    for production in grammar[start_symbol]:
        initial_state = (start_symbol, tuple(production), 0, 0)
        chart[0].add(initial_state)
    
    # Process each chart position separately
    for k in range(n + 1):
        current_agenda = list(chart[k])  # Process only states from current chart
        i = 0
        
        while i < len(current_agenda):
            state = current_agenda[i]
            i += 1
            
            if state[2] < len(state[1]):  # Dot not at the end
                next_symbol = state[1][state[2]]
                if next_symbol in grammar:  # Non-terminal
                    predictor(state, k)
                else:  # Terminal
                    scanner(state, k)
            else:  # Dot at the end (completed state)
                completer(state, k)
    
    # Check if any complete parse exists
    return any(state[0] == start_symbol and state[2] == len(state[1]) and state[3] == 0 
               for state in chart[n])

# Example usage
grammar = {
    'S': [tuple(['S', 'S']), tuple(['(', 'S', ')']), tuple(['(', ')'])]
}
tokens = list("(()())")  # Convert the string into a list of characters
print(earley_parse(tokens, grammar, 'S'))  # Output: True

grammar = {
    'E': [tuple(['E', '+', 'T']), tuple(['E', '-', 'T']), tuple(['T'])],
    'T': [tuple(['T', '*', 'F']), tuple(['T', '/', 'F']), tuple(['F'])],
    'F': [tuple(['(', 'E', ')']), tuple(['num'])]
}
tokens = list("2+3*4")
print(earley_parse(tokens, grammar, 'E'))  # Output: True
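
Note that `list("2+3*4")` splits the input into single characters, so a multi-digit number such as `12` would arrive as two separate tokens and the expression would be rejected. One possible workaround is to tokenize with a regular expression first. The sketch below is only an illustration; the helper name `tokenize` is hypothetical and not part of the implementation above.

```python
import re

def tokenize(text):
    """Split an arithmetic expression into number and single-character tokens."""
    # \d+ keeps multi-digit numbers together; \S matches any other
    # non-whitespace character as its own token (whitespace is skipped)
    return re.findall(r"\d+|\S", text)

print(tokenize("12 + 3*(45-6)"))
# ['12', '+', '3', '*', '(', '45', '-', '6', ')']
```

Because the scanner accepts any token for which `isdigit()` is true as a `num`, the token `'12'` produced this way would match the `num` terminal.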



### 3.7 Comparison between Earley's and CYK Algorithms

| Feature | Earley's Algorithm | CYK Algorithm |
|---------|-------------------|---------------|
| **Grammar Requirements** | Any context-free grammar | Chomsky Normal Form only |
| **Time Complexity** | O(n³) worst case<br>O(n²) for unambiguous grammars<br>O(n) for many deterministic grammars, such as most LR(k) grammars | O(n³) always |
| **Space Complexity** | O(n²) | O(n²) |
| **Implementation Complexity** | Difficult | Simple |
| **Early Recognition** | Can recognize prefixes early | Must process entire input |
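
The grammar-requirements row is the main practical difference: a grammar must be converted to Chomsky Normal Form before CYK can use it, whereas Earley's algorithm works on the grammar as written. As a rough illustration, the sketch below checks whether a grammar in this chapter's dictionary representation is in CNF. The helper name `is_cnf` and the grammar `parens_cnf` are hypothetical, and the special allowance for $S \rightarrow \wedge$ is ignored for simplicity.

```python
def is_cnf(grammar):
    """Return True if every production is A -> B C or A -> a."""
    for productions in grammar.values():
        for prod in productions:
            if len(prod) == 1 and prod[0] not in grammar:
                continue  # A -> a (a single terminal)
            if len(prod) == 2 and all(sym in grammar for sym in prod):
                continue  # A -> B C (exactly two non-terminals)
            return False
    return True

# The parentheses grammar as used with Earley's algorithm
parens = {'S': [('S', 'S'), ('(', 'S', ')'), ('(', ')')]}

# A hand-converted CNF version that CYK could accept
parens_cnf = {
    'S': [('S', 'S'), ('L', 'X'), ('L', 'R')],
    'X': [('S', 'R')],
    'L': [('(',)],
    'R': [(')',)],
}

print(is_cnf(parens))      # False
print(is_cnf(parens_cnf))  # True
```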


## 4. Finiteness Problem
### 4.1 Definition
The finiteness problem asks: "Is the language generated by a CFG finite or infinite?"  For context-free grammars, determining whether the generated language is finite or infinite is decidable, meaning there exists an algorithm that always terminates with the correct answer.

### 4.2 Algorithm to Check Finiteness
The key observation is that a language is infinite exactly when some derivation can loop back to a non-terminal it started from while adding at least one symbol around it. The criterion is stated for grammars without $\wedge$-productions and unit productions (for example, grammars in Chomsky Normal Form); otherwise a loop might add only empty strings and pump nothing. Under that assumption, a CFG generates an infinite language if and only if there exists at least one non-terminal $A$ such that:

* $A$ is useful, meaning it is reachable from the start symbol and can derive a terminal string, and
* $A$ can derive a string that contains itself: $A \stackrel{*}\Rightarrow xAy$, where $\stackrel{*}\Rightarrow$ means deriving in zero or more steps, and
* $x$ and $y$ are not both empty.

If such a non-terminal exists, we call it a **self-embedded non-terminal**, and its recursive derivation path can be used to "pump" the derivation, enabling the generation of arbitrarily long words by repeatedly applying the recursive rule. Based on this observation, we can design an algorithm as follows:

#### Algorithm Steps:

* Step 1: Identify useful non-terminals (as previously discussed in Section 2 of this chapter)
* Step 2: Revise the CFG by eliminating all productions containing useless non-terminals
* Step 3: Detect self-embedded non-terminals. For each remaining non-terminal $A$:
    - Create a temporary marker symbol $A_M$
    - Replace $A$ with $A_M$ on the left side of every production of $A$ (occurrences of $A$ on right sides stay unchanged)
    - Initialize a set $Reachable = \{A\}$
    - Repeat until $Reachable$ stops changing:
        - For each symbol $B$ (a non-terminal or $A_M$) that has a production whose right side contains a member of $Reachable$: add $B$ to $Reachable$
    - If $A_M$ is in $Reachable$, then $A$ is self-embedded
    - Restore the original productions before testing the next non-terminal, and stop as soon as a self-embedded non-terminal is found
* Step 4: If any self-embedded non-terminal was found, the language is infinite; otherwise it is finite.
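
Step 3 can also be read as cycle detection: draw an edge from each non-terminal to every non-terminal on the right sides of its productions, and a non-terminal is self-embedded exactly when it lies on a cycle of that graph. The sketch below takes this view and uses depth-first search in place of the marker technique. It assumes the grammar has already been reduced to useful non-terminals (Steps 1 and 2) and has no unit or $\wedge$-productions; `has_self_embedded` is a hypothetical helper name.

```python
def has_self_embedded(grammar):
    """Return True if some non-terminal can derive a string containing itself."""
    # Edge A -> B whenever B occurs on the right side of a production of A
    edges = {
        nt: {sym for prod in prods for sym in prod if sym in grammar}
        for nt, prods in grammar.items()
    }
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {nt: WHITE for nt in grammar}

    def visit(nt):
        color[nt] = GRAY
        for nxt in edges[nt]:
            if color[nxt] == GRAY:          # back edge: a cycle was found
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[nt] = BLACK
        return False

    return any(color[nt] == WHITE and visit(nt) for nt in grammar)

finite_grammar = {'S': [['a', 'A'], ['b', 'B']], 'A': [['a'], ['b']], 'B': [['a', 'b']]}
infinite_grammar = {'S': [['a', 'S'], ['b']]}
print(has_self_embedded(finite_grammar))    # False
print(has_self_embedded(infinite_grammar))  # True
```

The marker technique needs one pass per non-terminal, while a single depth-first search covers the whole graph, but both answer the same question.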

### 4.3 Examples
**Example 1:** Consider the given CFG: $\{S \rightarrow aA \mid bB; A \rightarrow a \mid b; B \rightarrow ab\}$. Decide if its language is finite or infinite.

* Step 1: Identify useful non-terminals: all non-terminals $\{S, A, B\}$ are useful. Each can be reached from S and can derive terminal strings.
* Step 2: Remove useless non-terminals: no useless non-terminals to remove, the given CFG remains unchanged.
* Step 3: Detect self-embedded non-terminals:
    - For non-terminal $S$:
        - Replace $S$ with $S_M$ on left sides: $S_M \rightarrow aA \mid bB$
        - Initialize $Reachable = \{S\}$
        - Check productions: no right side contains $S$, so nothing is added
        - No changes: $Reachable = \{S\}$
        - $S_M$ is not in $Reachable$, so $S$ is not self-embedded
    - For non-terminal $A$:
        - Replace $A$ with $A_M$ on left sides: $A_M \rightarrow a \mid b$
        - Initialize $Reachable = \{A\}$
        - Check productions:
            - $S \rightarrow aA$ has $A$ on its right side: add $S$, so $Reachable = \{A, S\}$
            - No right side contains $S$, and $A_M \rightarrow a \mid b$ has no non-terminals on its right sides
        - No more changes to $Reachable$
        - $A_M$ is not in $Reachable$, so $A$ is not self-embedded
    - For non-terminal $B$:
        - Replace $B$ with $B_M$ on left sides: $B_M \rightarrow ab$
        - Initialize $Reachable = \{B\}$
        - Check productions:
            - $S \rightarrow bB$ has $B$ on its right side: add $S$, so $Reachable = \{B, S\}$
            - No right side contains $S$, and $B_M \rightarrow ab$ has no non-terminals on its right side
        - No more changes to $Reachable$
        - $B_M$ is not in $Reachable$, so $B$ is not self-embedded
* Step 4: Decision: No self-embedded non-terminals found, so the language is finite.
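
Because no non-terminal of this grammar is self-embedded, every derivation terminates and the whole language can be listed. The sketch below does so by recursive expansion; the helper name `enumerate_language` is hypothetical, and the function only terminates for grammars without self-embedded non-terminals.

```python
from itertools import product

def enumerate_language(grammar, symbol):
    """Return the set of terminal strings derivable from `symbol`."""
    if symbol not in grammar:      # a terminal derives only itself
        return {symbol}
    words = set()
    for prod in grammar[symbol]:
        # Expand every right-side symbol, then combine all choices
        parts = [enumerate_language(grammar, sym) for sym in prod]
        for combo in product(*parts):
            words.add(''.join(combo))
    return words

example1 = {
    'S': [['a', 'A'], ['b', 'B']],
    'A': [['a'], ['b']],
    'B': [['a', 'b']],
}
print(sorted(enumerate_language(example1, 'S')))  # ['aa', 'ab', 'bab']
```

The language has exactly three words, which agrees with the decision in Step 4.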

**Example 2:** Consider the given CFG: $\{S \rightarrow AB \mid a; A \rightarrow BC; B \rightarrow CA; C \rightarrow b\}$. Decide if its language is finite or infinite.

* Step 1: Identify useful non-terminals:
    - Generating non-terminals: $C$ (via $C \rightarrow b$) and $S$ (via $S \rightarrow a$). $A \rightarrow BC$ needs $B$ to terminate and $B \rightarrow CA$ needs $A$ to terminate, so neither $A$ nor $B$ can ever derive a terminal string.
    - $C$ is generating, but it appears only in the productions of $A$ and $B$, so it becomes unreachable once they are removed. The only useful non-terminal is $S$.
* Step 2: Remove useless non-terminals: drop $S \rightarrow AB$ and all productions of $A$, $B$, and $C$. The revised CFG is $\{S \rightarrow a\}$.
* Step 3: Detect self-embedded non-terminals:
    - For non-terminal $S$:
        - Replace $S$ with $S_M$ on left sides: $S_M \rightarrow a$
        - Initialize $Reachable = \{S\}$
        - Check productions: no non-terminals on right sides
        - No changes: $Reachable = \{S\}$
        - $S_M$ is not in $Reachable$, so $S$ is not self-embedded
* Step 4: Decision: No self-embedded non-terminals found, so the language is finite. In fact, it is $\{a\}$.

This example shows why Steps 1 and 2 cannot be skipped. Running Step 3 on the original grammar for $A$ gives $A_M \rightarrow BC$ and $Reachable = \{A\}$. The production $B \rightarrow CA$ has $A$ on its right side, so $B$ is added; $A_M \rightarrow BC$ then has $B$ on its right side, so $A_M$ is added. $A$ would be flagged as self-embedded and the language wrongly reported as infinite, even though the cycle $A \Rightarrow BC \Rightarrow CAC$ never produces a terminal string.
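
Step 1 can be cross-checked mechanically. The sketch below uses a hypothetical helper, `generating_non_terminals`, and this chapter's dictionary representation to compute which non-terminals of the Example 2 grammar can derive a terminal string. It reports only $S$ and $C$: $A$ and $B$ each need the other to terminate, so neither ever does.

```python
def generating_non_terminals(grammar):
    """Return the non-terminals that can derive some string of terminals."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            if nt in generating:
                continue
            # nt is generating if some production uses only terminals
            # and non-terminals already known to be generating
            if any(all(s in generating or s not in grammar for s in p)
                   for p in prods):
                generating.add(nt)
                changed = True
    return generating

example2 = {
    'S': [['A', 'B'], ['a']],
    'A': [['B', 'C']],
    'B': [['C', 'A']],
    'C': [['b']],
}
print(sorted(generating_non_terminals(example2)))  # ['C', 'S']
```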

### 4.4 Example Python Implementation
The following Python code implements the finiteness-checking algorithm described above.

In [14]:
def is_language_finite(grammar, start_symbol):
    """
    Determine if a context-free grammar generates a finite language using
    the algorithm to detect self-embedded non-terminals.
    
    Args:
        grammar: Dictionary mapping non-terminals to lists of productions
        start_symbol: The start symbol of the grammar
    
    Returns:
        str: "Finite" if the language is finite, "Infinite" if it's infinite
    """
    # Step 1: Identify useful non-terminals
    useful_symbols = find_useful_symbols(grammar, start_symbol)
    
    print(f"Useful symbols: {useful_symbols}")
    
    # Step 2: Remove useless non-terminals
    filtered_grammar = {}
    for nt, productions in grammar.items():
        if nt in useful_symbols:
            filtered_prods = []
            for prod in productions:
                # Keep only productions with all non-terminals useful
                if all(symbol in useful_symbols or symbol not in grammar for symbol in prod):
                    filtered_prods.append(prod)
            if filtered_prods:
                filtered_grammar[nt] = filtered_prods
    
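    # Step 3: Detect self-embedded non-terminals.
    # Test only non-terminals still reachable from the start symbol in the
    # filtered grammar: a symbol that was reachable solely through a removed
    # production no longer contributes to the language.
    live = {start_symbol} & set(filtered_grammar)
    pending = list(live)
    while pending:
        for prod in filtered_grammar[pending.pop()]:
            for symbol in prod:
                if symbol in filtered_grammar and symbol not in live:
                    live.add(symbol)
                    pending.append(symbol)

    for non_terminal in live:
        if is_self_embedded(non_terminal, filtered_grammar, useful_symbols):
            # Step 4: If any self-embedded non-terminal is found, the language is infinite
            return "Infinite"

    # No self-embedded non-terminals found, the language is finite
    # (this also covers the empty language, where no symbol is useful)
    return "Finite"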


def find_useful_symbols(grammar, start_symbol):
    """Find all useful non-terminals (both reachable and generating)."""
    # Find generating non-terminals
    generating = set()
    changed = True
    
    # Initial set: non-terminals that directly derive terminal strings
    for nt, productions in grammar.items():
        for prod in productions:
            if all(symbol not in grammar for symbol in prod):
                generating.add(nt)
                break
    
    # Find all non-terminals that can eventually derive terminal strings
    while changed:
        changed = False
        for nt, productions in grammar.items():
            if nt in generating:
                continue
                
            for prod in productions:
                if all(symbol in generating or symbol not in grammar for symbol in prod):
                    generating.add(nt)
                    changed = True
                    break
    
    # Find reachable non-terminals
    reachable = {start_symbol}
    changed = True
    
    while changed:
        changed = False
        new_reachable = set(reachable)
        
        for nt in reachable:
            if nt not in grammar:
                continue
                
            for prod in grammar[nt]:
                for symbol in prod:
                    if symbol in grammar and symbol not in new_reachable:
                        new_reachable.add(symbol)
                        changed = True
        
        reachable = new_reachable
    
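    # Useful symbols are both generating and reachable.
    # Caveat: reachability above is computed on the full grammar, so a symbol
    # reachable only through non-generating symbols is still reported here.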
    return generating.intersection(reachable)


def is_self_embedded(non_terminal, grammar, useful_symbols):
    """
    Check if a non-terminal is self-embedded using the algorithm.
    
    A non-terminal is self-embedded if it can derive a string that contains
    itself with non-empty symbols on at least one side.
    """
    # Create a temporary marker symbol
    marker = f"M_{non_terminal}"
    
    # Initialize Reachable = {A}
    reachable = {non_terminal}
    
    # Repeat until no changes to Reachable
    changed = True
    while changed:
        changed = False
        new_reachable = set(reachable)
        
        for left_side, productions in grammar.items():
            # We only care about useful non-terminals
            if left_side not in useful_symbols:
                continue
                
            # Check if this non-terminal has a production with a reachable symbol
            contains_reachable = False
            for prod in productions:
                # Check only non-terminal symbols that are useful
                useful_nt_in_prod = [symbol for symbol in prod if symbol in useful_symbols]
                if any(symbol in reachable for symbol in useful_nt_in_prod):
                    contains_reachable = True
                    break
            
            if contains_reachable:
                # If this is our target non-terminal, add the marker
                if left_side == non_terminal:
                    if marker not in new_reachable:
                        new_reachable.add(marker)
                        changed = True
                # Otherwise, add this non-terminal
                elif left_side not in new_reachable:
                    new_reachable.add(left_side)
                    changed = True
        
        reachable = new_reachable
    
    # If M_A is in Reachable, then A is self-embedded
    return marker in reachable


# Test examples
if __name__ == "__main__":
    # Example 1: Simple Finite Grammar {ac, ad, be}
    grammar1 = {
        'S': [['a', 'T'], ['b', 'U']],
        'T': [['c'], ['d']],
        'U': [['e']]
    }

    # Example 2: Simple Infinite Grammar {b, ab, aab, ...}
    grammar2 = {
        'S': [['a', 'S'], ['b']]
    }

    # Example 3: Cyclic but finite grammar (A and B never terminate, so L = {a})
    grammar3 = {
        'S': [['A', 'B'], ['a']],
        'A': [['B', 'C']],
        'B': [['C', 'A']],
        'C': [['b']]
    }

    # Example 4: Tricky Finite Grammar
    grammar4 = {
        'S': [['A', 'B'], ['a']],
        'A': [['C', 'D']],
        'B': [['E', 'F']],
        'C': [['g']],
        'D': [['h']],
        'E': [['i']],
        'F': [['j']]
    }

    # Example 5: a^n b^n grammar (same shape as nested parentheses)
    grammar5 = {
        'S': [['a', 'S', 'b'], []]  # [] represents empty string
    }

    # Example 6: Complex Grammar
    grammar6 = {
      'S': [['A', 'B', 'a'], ['b', 'A', 'Z'], ['b']],
      'A': [['X', 'b'], ['b', 'Z', 'A']], 
      'B': [['b', 'A', 'A']], 
      'X': [['a', 'Z', 'a'], ['b', 'A'], ['a', 'a', 'a']], 
      'Z': [['Z', 'A', 'b', 'A']]}
      
    print("Results of finiteness checking:")
    print(f"Example 1: {is_language_finite(grammar1, 'S')} (should be Finite)")
    print(f"Example 2: {is_language_finite(grammar2, 'S')} (should be Infinite)")
    print(f"Example 3: {is_language_finite(grammar3, 'S')} (should be Finite)")
    print(f"Example 4: {is_language_finite(grammar4, 'S')} (should be Finite)")
    print(f"Example 5: {is_language_finite(grammar5, 'S')} (should be Infinite)")
    print(f"Example 6: {is_language_finite(grammar6, 'S')} (should be Infinite)")
    
    # Examining which non-terminals are self-embedded
    for i, grammar in enumerate([grammar1, grammar2, grammar3, grammar4, grammar5, grammar6], 1):
        useful = find_useful_symbols(grammar, 'S')
        print(f"\nExample {i} self-embedded non-terminals (among useful symbols {useful}):")
        for nt in grammar:
            if nt in useful and is_self_embedded(nt, grammar, useful):
                print(f"  {nt} is self-embedded")

Results of finiteness checking:
Useful symbols: {'T', 'U', 'S'}
Example 1: Finite (should be Finite)
Useful symbols: {'S'}
Example 2: Infinite (should be Infinite)
Useful symbols: {'C', 'S'}
Example 3: Finite (should be Finite)
Useful symbols: {'C', 'A', 'F', 'B', 'D', 'S', 'E'}
Example 4: Finite (should be Finite)
Useful symbols: {'S'}
Example 5: Infinite (should be Infinite)
Useful symbols: {'B', 'A', 'X', 'S'}
Example 6: Infinite (should be Infinite)

Example 1 self-embedded non-terminals (among useful symbols {'T', 'U', 'S'}):

Example 2 self-embedded non-terminals (among useful symbols {'S'}):
  S is self-embedded

Example 3 self-embedded non-terminals (among useful symbols {'C', 'S'}):

Example 4 self-embedded non-terminals (among useful symbols {'C', 'A', 'F', 'B', 'D', 'S', 'E'}):

Example 5 self-embedded non-terminals (among useful symbols {'S'}):
  S is self-embedded

Example 6 self-embedded non-terminals (among useful symbols {'B', 'A', 'X', 'S'}):
  A is self-embedded
  X is self-embedded

## 5. Practice Exercises
### 5.1 Exercise 1: Short Answer Questions
* Explain the difference between a non-generating non-terminal and an unreachable non-terminal.
* Why does the CYK algorithm require the grammar to be in Chomsky Normal Form?
* What defines a self-embedded non-terminal and why is it important for language finiteness?
* How does Earley's algorithm differ from the CYK algorithm in terms of grammar requirements?

### 5.2 Exercise 2: Emptiness Problem
Determine whether the language generated by each of the following grammars is empty:

1. $\{S \rightarrow AB, A \rightarrow BC, B \rightarrow CA, C \rightarrow DA, D \rightarrow AD\}$
2. $\{S \rightarrow AB, A \rightarrow aA, B \rightarrow bB, C \rightarrow c\}$
3. $\{S \rightarrow aS \mid bS, A \rightarrow SbS\}$

### 5.3 Exercise 3: Uselessness Problem
Identify all useless non-terminals in each of the following grammars:

1. $\{S \rightarrow AB \mid CD, A \rightarrow aA \mid a, B \rightarrow bB \mid b, C \rightarrow cD, D \rightarrow d\}$
2. $\{S \rightarrow AB \mid aC, A \rightarrow BC, B \rightarrow bB, C \rightarrow c, D \rightarrow dD\}$
3. $\{S \rightarrow XY \mid a, X \rightarrow YZ, Y \rightarrow XZ, Z \rightarrow b\}$

### 5.4 Exercise 4: Membership Problem
1. Using the CYK algorithm, determine if the string $abbab$ is in the language of the following grammar.
$\{S \rightarrow AB \mid BB, A \rightarrow AB \mid a, B \rightarrow Ba \mid b\}$

2. Using Earley's algorithm, determine if the string $aab$ is in the language of the following grammar.
$\{S \rightarrow AA \mid B, A \rightarrow aA \mid a, B \rightarrow b\}$

### 5.5 Exercise 5: Finiteness Problem
Determine whether each of the following grammars generates a finite or an infinite language:

1. $\{S \rightarrow aS \mid A, A \rightarrow bA \mid b\}$
2. $\{S \rightarrow aSb \mid \wedge, A \rightarrow aAb \mid ab\}$
3. $\{S \rightarrow AB, A \rightarrow aA \mid a, B \rightarrow b\}$

### 5.6 Exercise 6: Integrated Problem
For the following grammar:

$$\{S \rightarrow AB \mid aS, A \rightarrow aA \mid a, B \rightarrow bB \mid BA \mid b\}$$

determine:

1. Is the language empty?
2. Which non-terminals are useless?
3. Is the string "abba" in the language?
4. Is the language finite?

### 5.7 Exercise 7: Algorithm Implementation
Write pseudocode or Python code for an algorithm that takes a CFG and returns a simplified equivalent grammar with no useless symbols.


## 6. Further Reading
* "Introduction to the Theory of Computation" by Michael Sipser, Section 4.1
* "Introduction to Computer Theory" by Daniel I.A. Cohen, Chapter 18
* "Introduction to Automata Theory, Languages, and Computation" by Hopcroft, Motwani, and Ullman, Chapter 7