# Turing Machine Encoding

## Introduction
This notebook explores key ideas in computational theory through Turing Machine encodings and universal computation. We begin by outlining how Turing Machines can be encoded to standardize the representation of computation. Next, we examine two example languages, ALAN and MATHISON, that highlight how computational problems can be framed within this model.

## 1. The Encoding of Turing Machines
A Turing machine can be encoded as a string of symbols, allowing us to represent any Turing machine in a standardized format. This encoding is crucial for theoretical computer science as it enables us to treat Turing machines as data that can be processed by other Turing machines.

### 1.1 Table-Based Encoding Approach
The encoding process consists of four main steps:

* Step 1: Assign numeric codes to states
* Step 2: Convert the Turing machine into a tabular representation
* Step 3: Encode each transition using the 'a' and 'b' representation
* Step 4: Concatenate all individual row encodings into one unified string that encodes the entire Turing machine

#### 1.1.1 Assigning numeric codes to states
When encoding a Turing Machine (TM) into a formal representation, it is useful to assign numeric identifiers to each state in the machine. This is a foundational step that supports further encoding processes, especially when we aim to represent the TM as a string of symbols or binary digits. To maintain consistency and simplify the encoding, we adopt the following labeling convention:

* Start State: Always labeled as state 1. This is the initial state where the machine begins execution.
* Halt State(s): Always labeled as state 2. This includes any accepting final states. In machines with multiple halting conditions, they can be collapsed into a single halt state for simplicity.
* Other States: Labeled incrementally from state 3 onward. These represent the working states of the machine. The ordering of the non-special states does not matter, as long as:
    * Each state has a unique label.
    * The transitions reference the correct labels.
    * The same labels are used consistently throughout the machine’s definition.

This flexibility in ordering makes it easier to write or manipulate TMs programmatically, without worrying about the specific sequence of states.

Assigning numeric labels serves several important purposes:

* Standardization: Provides a consistent way to reference and distinguish states.
* Simplifies Encoding: Numeric values can be easily encoded in binary, unary, or any symbolic format.
* Supports Automation: Makes it easier to build simulators, compilers, or universal machines that operate on encoded TMs.
* Reduces Ambiguity: Prevents confusion that can arise from using arbitrary or descriptive state names like q_start, q_accept.
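
The labeling convention can be sketched in a few lines of Python. This is a minimal illustration: the function name `number_states` and the state names in the usage line are hypothetical, chosen only for this example.

```python
def number_states(states, start, halts):
    """Return {state name: numeric label} following the 1 / 2 / 3+ convention."""
    labels = {start: 1}              # the start state is always 1
    for halt in halts:
        labels[halt] = 2             # all halt states collapse into state 2
    next_label = 3
    for state in states:             # remaining states get 3, 4, ... in list order
        if state not in labels:
            labels[state] = next_label
            next_label += 1
    return labels


# Hypothetical state names, used only for illustration
print(number_states(["q_start", "q_scan", "q_accept"], "q_start", ["q_accept"]))
# {'q_start': 1, 'q_accept': 2, 'q_scan': 3}
```

Because the working states are numbered in list order, reordering the input list changes their labels but not the machine they describe.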

#### 1.1.2 Converting the Turing machine into a Tabular Representation
Just like finite automata (FAs) and pushdown automata (PDAs), Turing Machines (TMs) do not need to be represented solely using diagrams. Instead, we can describe TMs using a summary table: a compact, structured representation that captures the transition behavior of the machine in a clear and standardized way. After we assign numeric labels to all states, we convert the Turing machine into a standard tabular format. Each row represents one transition rule with five columns:

* From: The current state (numeric)
* To: The next state (numeric)
* Read: The symbol being read from the tape
* Write: The symbol to write to the tape
* Move: The direction to move the tape head (L for Left or R for Right) 

**Example**

```mermaid
graph LR
    accTitle: A TM
    accDescr: a diagram representing a TM to be converted into a tabular representation
    q1((START))
    q2((q2))
    q3((HALT))
    
    q1 -->|a,a,R| q1
    q1 -->|Δ,Δ,R| q1
    q1 -->|b,a,R| q2
    q2 -->|b,b,L| q2
    q2 -->|Δ,b,L| q3
    
    style q1 fill:#90EE90,stroke:#333,stroke-width:3px
    style q2 fill:#FFB6C1,stroke:#333,stroke-width:3px
    style q3 fill:#87CEEB,stroke:#333,stroke-width:2px
```

We start by assigning numeric codes to states as follows:

* State 1: Start state (originally labeled "START")
* State 2: Halt state (originally labeled "HALT")
* State 3: Middle working state (originally labeled "q2")

So the machine becomes:

```mermaid
graph LR
    accTitle: A TM
    accDescr: a diagram representing a modified TM with assigned numeric codes to states
    q1((1))
    q2((3))
    q3((2))
    
    q1 -->|a,a,R| q1
    q1 -->|Δ,Δ,R| q1
    q1 -->|b,a,R| q2
    q2 -->|b,b,L| q2
    q2 -->|Δ,b,L| q3
    
    style q1 fill:#90EE90,stroke:#333,stroke-width:3px
    style q2 fill:#FFB6C1,stroke:#333,stroke-width:3px
    style q3 fill:#87CEEB,stroke:#333,stroke-width:2px
```

Transition Table: the tabular representation of the transitions:

| From | To | Read | Write | Move |
|------|----|------|-------|------|
| 1    | 1  | a    | a     | R    |
| 1    | 1  | Δ    | Δ     | R    |
| 1    | 3  | b    | a     | R    |
| 3    | 3  | b    | b     | L    |
| 3    | 2  | Δ    | b     | L    |
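
As a rough sketch, the same conversion can be done programmatically: list the edges of the diagram by name, then substitute the numeric labels. The names `edges`, `labels`, and `to_table` are assumptions made for this illustration.

```python
# Edges read off the diagram: (from, to, read, write, move)
edges = [
    ("START", "START", "a", "a", "R"),
    ("START", "START", "Δ", "Δ", "R"),
    ("START", "q2", "b", "a", "R"),
    ("q2", "q2", "b", "b", "L"),
    ("q2", "HALT", "Δ", "b", "L"),
]

# Numeric labels chosen in the example: START -> 1, HALT -> 2, q2 -> 3
labels = {"START": 1, "HALT": 2, "q2": 3}


def to_table(edges, labels):
    """Replace state names with their numeric labels, one table row per edge."""
    return [(labels[src], labels[dst], read, write, move)
            for src, dst, read, write, move in edges]


table = to_table(edges, labels)
for row in table:
    print(*row, sep=" | ")
# first line printed: 1 | 1 | a | a | R
```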

#### 1.1.3 Encode each transition using 'a' and 'b' representation
Each transition is defined as a 5-tuple: (CurrentState, ReadSymbol) → (NextState, WriteSymbol, MoveDirection), or in table format:

| From | To | Read | Write | Move |
|------|----|------|-------|------|
|$X_1$    | $X_2$  | $X_3$    | $X_4$     | $X_5$    |

* State encoding: state $n$ is encoded as $a^nb$ ($n$ repetitions of 'a' followed by a single 'b'):

| State | Encoding |
|-------|----------|
| 1     | ab       |
| 2     | aab      |
| 3     | aaab     |
| 4     | aaaab    |
| 5     | aaaaab   |

* Symbol Encoding Table: tape symbols are encoded using a fixed 2-character encoding:
 
| Symbol | Encoding |
|-------|----------|
| a     | aa       |
| b     | ab      |
| Δ (blank)     | ba     |
| # (special)     | bb    |

* Direction Encoding Table: movement directions are encoded using single-character encoding:

| Direction | Encoding |
|-------|----------|
| L (Left)     | a      |
| R (Right)    | b      |

**Example**

The table of transitions from the previous section can be encoded as follows:

| From | To | Read | Write | Move | Encoding |
|------|----|------|-------|------|------|
| 1    | 1  | a    | a     | R    | ababaaaab |
| 1    | 1  | Δ    | Δ     | R    | ababbabab |
| 1    | 3  | b    | a     | R    | abaaababaab |
| 3    | 3  | b    | b     | L    | aaabaaabababa |
| 3    | 2  | Δ    | b     | L    | aaabaabbaaba |
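
The Encoding column can be reproduced with a short sketch that applies the three tables in order. The names `SYMBOL_CODES`, `MOVE_CODES`, `encode_state`, and `encode_row` are illustrative assumptions, and only the four symbols and two directions from the tables above are supported.

```python
SYMBOL_CODES = {"a": "aa", "b": "ab", "Δ": "ba", "#": "bb"}
MOVE_CODES = {"L": "a", "R": "b"}


def encode_state(n):
    """State n becomes n copies of 'a' followed by one 'b'."""
    return "a" * n + "b"


def encode_row(src, dst, read, write, move):
    """Encode one table row in the column order From, To, Read, Write, Move."""
    return (encode_state(src) + encode_state(dst)
            + SYMBOL_CODES[read] + SYMBOL_CODES[write] + MOVE_CODES[move])


table = [
    (1, 1, "a", "a", "R"),
    (1, 1, "Δ", "Δ", "R"),
    (1, 3, "b", "a", "R"),
    (3, 3, "b", "b", "L"),
    (3, 2, "Δ", "b", "L"),
]
encoded_rows = [encode_row(*row) for row in table]
for row, code in zip(table, encoded_rows):
    print(row, "->", code)
```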

#### 1.1.4 Concatenate all individual row encodings into one unified string that encodes the entire Turing Machine
Simply concatenate all encoded rows without any separators: CompleteTMEncoding = EncodedRow₁ + EncodedRow₂ + ... + EncodedRowₙ. The encoded rows are concatenated in their lexicographic order to ensure a consistent and unambiguous representation of the Turing machine.

Important Note: 

* No special separators are needed between rows because:
    * Each state encoding ends with 'b', making it self-delimiting
    * Symbol encodings have fixed length (2 characters)
    * Direction encodings have fixed length (1 character)
* The ordering of rows does not affect the machine's behavior. It is still important to standardize the order of the encoded rows when concatenating them, because:
    * Without a standard ordering, the same Turing machine could have multiple different encodings.
    * A unique representation for each Turing machine makes it possible to analyze Turing machines systematically.
    * A standard ordering simplifies the implementation of a Universal Turing Machine (we will explore this further later in the notebook).
* Lexicographic (dictionary) order means sorting strings as you would in a dictionary: character by character, using the alphabet’s order.
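
To see why separators are unnecessary, the sketch below recovers row boundaries from a concatenated string using only the two facts listed above: each state code ends at its first 'b', and the remaining five characters of a row have fixed width. The function name `split_rows` is hypothetical, and the scanner does not validate the symbol or direction codes.

```python
def split_rows(encoding):
    """Cut a concatenated encoding back into its individual row strings."""
    rows, pos = [], 0
    while pos < len(encoding):
        end = pos
        for _ in range(2):                      # two state codes, each a^n b
            end = encoding.index("b", end) + 1  # raises ValueError if no 'b' is left
        end += 5                                # read (2) + write (2) + move (1)
        if end > len(encoding):
            raise ValueError(f"incomplete row starting at position {pos}")
        rows.append(encoding[pos:end])
        pos = end
    return rows


print(split_rows("ababaaaab" + "abaaababaab"))
# ['ababaaaab', 'abaaababaab']
```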

**Example**

* Let us concatenate all encoded rows in their lexicographic order:
    * ababaaaab
    * ababbabab
    * abaaababaab
    * aaabaaabababa
    * aaabaabbaaba
* The encodings starting with $aa$ come before those starting with $ab$, so we reorder the list to place them first:
    * aaabaaabababa
    * aaabaabbaaba
    * ababaaaab
    * ababbabab
    * abaaababaab
* Between the two starting with $aa$:
    * $aaabaaabababa$ vs. $aaabaabbaaba$: compare their seventh character $a$ vs. $b$, so $aaabaaabababa$ comes first.
* Among those starting with $ab$:
    * $abaaababaab$, $ababaaaab$, $ababbabab$: compare their fourth character: $a$, $b$, $b$, so $abaaababaab$ comes before the others.
    * $ababaaaab$, $ababbabab$: compare their fifth character: $a$, $b$, so $ababaaaab$ comes before the other.
* Final sorted order:
    * aaabaaabababa
    * aaabaabbaaba
    * abaaababaab
    * ababaaaab
    * ababbabab 
* Joining them without separators gives the complete Turing machine encoding (the parts are colored here only to separate them visually):
<span style="color: red;">aaabaaabababa</span>
<span style="color: green;">aaabaabbaaba</span>
<span style="color: blue;">abaaababaab</span>
<span style="color: purple;">ababaaaab</span>
<span style="color: orange;">ababbabab</span>
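
The ordering step can be checked with a short sketch. It relies on Python's built-in string comparison, which is character by character and therefore lexicographic for strings over 'a' and 'b'. The variable names are illustrative.

```python
from itertools import permutations

rows = ["ababaaaab", "ababbabab", "abaaababaab", "aaabaaabababa", "aaabaabbaaba"]

# Sort the row encodings lexicographically, then join them without separators
ordered = sorted(rows)
canonical = "".join(ordered)
print(ordered)
print(len(canonical))  # 54

# Without a fixed order, each arrangement of the rows yields a different string
distinct = {"".join(p) for p in permutations(rows)}
print(len(distinct))   # 120
```

All 120 arrangements describe the same machine, which is why a single agreed order is needed before two encodings can be compared as strings.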

#### 1.1.5 Example Python Implementation

In [1]:
"""
Turing Machine Encoder
Encodes a Turing machine into a string representation using 'a' and 'b' symbols
"""

class TuringMachine:
    """Represents a Turing Machine with states, symbols, and transitions"""
    
    def __init__(self, states, alphabet, tape_alphabet, transitions, start_state, halt_states):
        """
        Initialize a Turing Machine
        
        Args:
            states: List of state names (will be converted to numbers)
            alphabet: Input alphabet
            tape_alphabet: Complete tape alphabet (includes blank symbol)
            transitions: Dict of (state, symbol) -> (new_state, write_symbol, direction)
            start_state: Name of the start state
            halt_states: List of halt state names (can be single state or list)
        """
        self.states = states
        self.alphabet = alphabet
        self.tape_alphabet = tape_alphabet
        self.transitions = transitions
        self.start_state = start_state
        self.halt_states = halt_states if isinstance(halt_states, list) else [halt_states]
        
        # Create state mapping following convention
        self.state_mapping = self._create_state_mapping()
        
    def _create_state_mapping(self):
        """
        Create numeric mapping for states following convention:
        1 = start state
        2 = halt state(s)
        3+ = other states
        """
        mapping = {}
        
        # Start state is always 1
        mapping[self.start_state] = 1
        
        # Halt states are all mapped to 2
        for halt_state in self.halt_states:
            mapping[halt_state] = 2
        
        # Other states get 3, 4, 5, ...
        state_num = 3
        for state in self.states:
            if state not in mapping:
                mapping[state] = state_num
                state_num += 1
        
        return mapping
    
    def display_info(self):
        """Display information about the Turing Machine"""
        print("Turing Machine Configuration:")
        print(f"  States: {self.states}")
        print(f"  Start state: {self.start_state}")
        print(f"  Halt state(s): {self.halt_states}")
        print(f"  Input alphabet: {self.alphabet}")
        print(f"  Tape alphabet: {self.tape_alphabet}")
        print(f"\nState Mapping:")
        for state, num in sorted(self.state_mapping.items(), key=lambda x: x[1]):
            role = ""
            if num == 1:
                role = " (START)"
            elif num == 2:
                role = " (HALT)"
            print(f"  {state} -> {num}{role}")
        print(f"\nNumber of transitions: {len(self.transitions)}")


class TMEncoder:
    """Encodes Turing Machines into 'a' and 'b' string representation"""
    
    def __init__(self):
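        # Symbol encoding table (Section 1.1.3 defines only the four 2-character codes)
        self.symbol_codes = {
            'a': 'aa',
            'b': 'ab',
            'Δ': 'ba',  # Blank symbol
            '_': 'ba',  # Alternative blank notation
            '#': 'bb',  # Special symbol
            # Extension for digit alphabets: these 3-character codes are NOT part of
            # the fixed 2-character scheme, so encodings that use them cannot be
            # parsed by a decoder that assumes 2-character symbols
            '0': 'aaa',
            '1': 'aab',
            '2': 'aba',
            '3': 'abb',
            '4': 'baa',
            '5': 'bab',
            '6': 'bba',
            '7': 'bbb'
        }

        # Direction encoding
        self.direction_codes = {
            'L': 'a',
            'R': 'b',
            # Extension: 'S' (stay) uses 2 characters, which breaks the fixed
            # 1-character direction width described in Section 1.1.3
            'S': 'ab'
        }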
    
    def encode_state(self, state_num):
        """Encode a state number as a^n b"""
        if not isinstance(state_num, int) or state_num < 1:
            raise ValueError(f"State must be a positive integer, got {state_num}")
        return 'a' * state_num + 'b'
    
    def encode_symbol(self, symbol):
        """Encode a tape symbol"""
        if symbol not in self.symbol_codes:
            raise ValueError(f"Unknown symbol: {symbol}")
        return self.symbol_codes[symbol]
    
    def encode_direction(self, direction):
        """Encode a movement direction"""
        if direction not in self.direction_codes:
            raise ValueError(f"Unknown direction: {direction}")
        return self.direction_codes[direction]
    
    def encode_transition(self, from_state, to_state, read_sym, write_sym, direction):
        """Encode a single transition"""
        encoded = (
            self.encode_state(from_state) +
            self.encode_state(to_state) +
            self.encode_symbol(read_sym) +
            self.encode_symbol(write_sym) +
            self.encode_direction(direction)
        )
        return encoded
    
    def encode_tm(self, tm, use_lexicographic=True):
        """
        Encode a complete Turing Machine
        
        Args:
            tm: TuringMachine object
            use_lexicographic: If True, sort transitions lexicographically
        
        Returns:
            Encoded string representation
        """
        # Convert transitions to use numeric states
        numeric_transitions = []
        
        for (state, symbol), (next_state, write_symbol, direction) in tm.transitions.items():
            from_num = tm.state_mapping[state]
            to_num = tm.state_mapping[next_state]
            
            encoded_trans = self.encode_transition(
                from_num, to_num, symbol, write_symbol, direction
            )
            
            numeric_transitions.append({
                'from': from_num,
                'to': to_num,
                'read': symbol,
                'write': write_symbol,
                'move': direction,
                'encoded': encoded_trans
            })
        
        # Sort lexicographically if requested
        if use_lexicographic:
            numeric_transitions.sort(key=lambda x: x['encoded'])
        
        # Concatenate all encodings
        encoded_tm = ''.join(trans['encoded'] for trans in numeric_transitions)
        
        return encoded_tm, numeric_transitions
    
    def display_encoding_details(self, tm, encoded_tm, transitions):
        """Display detailed encoding information"""
        print("\nEncoding Details:")
        print("-" * 100)
        print(f"{'From':<10} {'To':<10} {'Read':<10} {'Write':<10} {'Move':<10} {'Encoding':<30}")
        print("-" * 100)
        
        for trans in transitions:
            # Find original state names
            from_name = [s for s, n in tm.state_mapping.items() if n == trans['from']][0]
            to_name = [s for s, n in tm.state_mapping.items() if n == trans['to']][0]
            
            print(f"{from_name:<10} {to_name:<10} {trans['read']:<10} "
                  f"{trans['write']:<10} {trans['move']:<10} {trans['encoded']:<30}")
        
        print("-" * 100)
        print(f"\nComplete Encoding ({len(encoded_tm)} characters):")
        print(encoded_tm)
        
        # Show in colors for readability
        # TODO


# Example usage functions
def example_simple_tm():
    """Example: Simple TM that converts 'a' to 'b'"""
    print("=" * 100)
    print("Example 1: Simple TM that converts 'a' to 'b'")
    print("=" * 100)
    
    states = ['q_start', 'q_scan', 'q_halt']
    alphabet = ['a', 'b']
    tape_alphabet = ['a', 'b', 'Δ']
    transitions = {
        ('q_start', 'a'): ('q_scan', 'b', 'R'),
        ('q_start', 'b'): ('q_start', 'b', 'R'),
        ('q_start', 'Δ'): ('q_halt', 'Δ', 'S'),
        ('q_scan', 'a'): ('q_scan', 'b', 'R'),
        ('q_scan', 'b'): ('q_scan', 'b', 'R'),
        ('q_scan', 'Δ'): ('q_halt', 'Δ', 'S')
    }
    
    tm = TuringMachine(states, alphabet, tape_alphabet, transitions, 'q_start', 'q_halt')
    tm.display_info()
    
    encoder = TMEncoder()
    encoded, trans_list = encoder.encode_tm(tm)
    encoder.display_encoding_details(tm, encoded, trans_list)
    
    return tm, encoded


def example_from_table():
    """Example: TM from the provided table"""
    print("\n" + "=" * 100)
    print("Example 2: TM from the provided transition table")
    print("=" * 100)
    
    states = ['1', '2', '3']
    alphabet = ['a', 'b']
    tape_alphabet = ['a', 'b', 'Δ']
    transitions = {
        ('1', 'a'): ('1', 'a', 'R'),
        ('1', 'Δ'): ('1', 'Δ', 'R'),
        ('1', 'b'): ('3', 'a', 'R'),
        ('3', 'b'): ('3', 'b', 'L'),
        ('3', 'Δ'): ('2', 'b', 'L')
    }
    
    tm = TuringMachine(states, alphabet, tape_alphabet, transitions, '1', '2')
    tm.display_info()
    
    encoder = TMEncoder()
    encoded, trans_list = encoder.encode_tm(tm)
    encoder.display_encoding_details(tm, encoded, trans_list)
    
    return tm, encoded


def example_binary_increment():
    """Example: Binary increment TM"""
    print("\n" + "=" * 100)
    print("Example 3: Binary Increment TM")
    print("=" * 100)
    
    states = ['q0', 'q1', 'q2', 'q3', 'qaccept']
    alphabet = ['0', '1']
    tape_alphabet = ['0', '1', 'Δ']
    transitions = {
        # Move to rightmost digit
        ('q0', '0'): ('q0', '0', 'R'),
        ('q0', '1'): ('q0', '1', 'R'),
        ('q0', 'Δ'): ('q1', 'Δ', 'L'),
        
        # Add 1 with carry
        ('q1', '0'): ('q2', '1', 'L'),
        ('q1', '1'): ('q1', '0', 'L'),
        ('q1', 'Δ'): ('q3', '1', 'R'),
        
        # Move back to start
        ('q2', '0'): ('q2', '0', 'L'),
        ('q2', '1'): ('q2', '1', 'L'),
        ('q2', 'Δ'): ('qaccept', 'Δ', 'R'),
        
        # Handle overflow
        ('q3', '0'): ('q3', '0', 'R'),
        ('q3', '1'): ('q3', '1', 'R'),
        ('q3', 'Δ'): ('qaccept', 'Δ', 'S')
    }
    
    tm = TuringMachine(states, alphabet, tape_alphabet, transitions, 'q0', 'qaccept')
    tm.display_info()
    
    encoder = TMEncoder()
    encoded, trans_list = encoder.encode_tm(tm)
    encoder.display_encoding_details(tm, encoded, trans_list)
    
    return tm, encoded


def verify_encoding(encoded_string):
    """Verify that an encoded string contains only 'a' and 'b'"""
    if not all(c in 'ab' for c in encoded_string):
        return False, "String contains characters other than 'a' and 'b'"
    
    # Check for basic structure
    if len(encoded_string) == 0:
        return False, "Empty encoding"
    
    # Could add more validation here
    return True, "Valid encoding"


# Main execution
if __name__ == "__main__":
    # Run examples
    tm1, enc1 = example_simple_tm()
    tm2, enc2 = example_from_table()
    tm3, enc3 = example_binary_increment()
    
    # Verify encodings
    print("\n" + "=" * 100)
    print("Encoding Verification")
    print("=" * 100)
    
    for i, (tm, enc) in enumerate([(tm1, enc1), (tm2, enc2), (tm3, enc3)], 1):
        valid, msg = verify_encoding(enc)
        print(f"Example {i}: {msg}")
        print(f"  Length: {len(enc)} characters")
        print(f"  'a' count: {enc.count('a')}")
        print(f"  'b' count: {enc.count('b')}")
        print(f"  Ratio a:b = {enc.count('a')/enc.count('b'):.2f}:1")
    
    # Interactive mode
    print("\n" + "=" * 100)
    print("You can now create your own TM and encode it!")
    print("Modify the code above to define your own transitions.")

Example 1: Simple TM that converts 'a' to 'b'
Turing Machine Configuration:
  States: ['q_start', 'q_scan', 'q_halt']
  Start state: q_start
  Halt state(s): ['q_halt']
  Input alphabet: ['a', 'b']
  Tape alphabet: ['a', 'b', 'Δ']

State Mapping:
  q_start -> 1 (START)
  q_halt -> 2 (HALT)
  q_scan -> 3

Number of transitions: 6

Encoding Details:
----------------------------------------------------------------------------------------------------
From       To         Read       Write      Move       Encoding                      
----------------------------------------------------------------------------------------------------
q_scan     q_scan     a          b          R          aaabaaabaaabb                 
q_scan     q_scan     b          b          R          aaabaaabababb                 
q_scan     q_halt     Δ          Δ          S          aaabaabbabaab                 
q_start    q_scan     a          b          R          abaaabaaabb                   
q_start    q_halt 

## 2. The Decoding of Turing Machines
Decoding is the reverse of encoding: we take an encoded string and reconstruct the original Turing machine. The decoding process follows these key principles:

* Each transition has exactly 5 components (From, To, Read, Write, Move)
* State numbers are encoded as repeated $a$'s followed by a $b$
* Symbols use fixed 2-character codes (aa, ab, ba, bb)
* Directions use single-character codes (a for L, b for R)
* No explicit separators exist between transitions

### 2.1 Step-by-Step Algorithm
* Step 1: Parse one transition from the encoded string
    * Decode the "From" state number by counting the consecutive $a$ symbols.
    * Skip the next $b$, since it only acts as a separator.
    * Decode the "To" state number by counting the consecutive $a$ symbols.
    * Skip the next $b$, since it only acts as a separator.
    * Decode the "Read" symbol by reading the next two characters and looking them up in the symbol encoding table in reverse.
    * Decode the "Write" symbol by reading the next two characters and looking them up in the symbol encoding table in reverse.
    * Decode the direction by reading the next character and looking it up in the direction encoding table in reverse.
    * Fill one row of the transition table with the decoded values to form a complete transition.
* Step 2: Repeat Step 1 while there are still unprocessed characters in the encoding.
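
The parsing steps above can also be expressed compactly with a regular expression. This is a minimal sketch rather than a full decoder: the names `ROW`, `SYMBOLS`, `MOVES`, and `decode` are illustrative, and it assumes only the four 2-character symbol codes and the two 1-character direction codes.

```python
import re

# One transition: a^n b, a^m b, two 2-character symbols, one direction character
ROW = re.compile(r"(a+)b(a+)b([ab]{2})([ab]{2})([ab])")
SYMBOLS = {"aa": "a", "ab": "b", "ba": "Δ", "bb": "#"}
MOVES = {"a": "L", "b": "R"}


def decode(encoding):
    """Return a list of (from, to, read, write, move) tuples."""
    rows, pos = [], 0
    while pos < len(encoding):                 # Step 2: repeat until the input is used up
        match = ROW.match(encoding, pos)       # Step 1: parse one transition at pos
        if match is None:
            raise ValueError(f"malformed transition at position {pos}")
        src, dst, read, write, move = match.groups()
        rows.append((len(src), len(dst), SYMBOLS[read], SYMBOLS[write], MOVES[move]))
        pos = match.end()
    return rows


print(decode("abaaabaaaba"))
# [(1, 3, 'a', 'b', 'L')]
```

Requiring `a+` rather than `a*` rejects a state code with no 'a', since state numbers start at 1.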

### 2.2 Examples
Given an encoding string $ababababbabaaabaaabbaaabaaabababaaaabaabbaaba$, let's decode this string by parsing each component of the transitions.

* Transition 1, starting at position 0:
    * From State: ab → 1 'a' + 'b' = State 1
    * To State: ab → 1 'a' + 'b' = State 1
    * Read Symbol: ab → 'b'
    * Write Symbol: ab → 'b'
    * Direction: b → R (right)
    * Transition 1: (1, b) → (1, b, R)
    * Position after: 9
* Transition 2, starting at position 9:
    * From State: ab → 1 'a' + 'b' = State 1
    * To State: aaab → 3 'a's + 'b' = State 3
    * Read Symbol: aa → 'a'
    * Write Symbol: ab → 'b'
    * Direction: b → R (right)
    * Transition 2: (1, a) → (3, b, R)
    * Position after: 20
* Transition 3, starting at position 20:
    * From State: aaab → 3 'a's + 'b' = State 3
    * To State: aaab → 3 'a's + 'b' = State 3
    * Read Symbol: ab → 'b'
    * Write Symbol: ab → 'b'
    * Direction: a → L (left)
    * Transition 3: (3, b) → (3, b, L)
    * Position after: 33
* Transition 4, starting at position 33:
    * From State: aaab → 3 'a's + 'b' = State 3
    * To State: aab → 2 'a's + 'b' = State 2
    * Read Symbol: ba → 'Δ' (blank)
    * Write Symbol: ab → 'b'
    * Direction: a → L (left)
    * Transition 4: (3, Δ) → (2, b, L)
    * Position after: 45 (end of string)

Summary of Decoded Transitions:

| Transition | From | To | Read | Write | Move |
|------------|------|----|------|-------|------|
| 1          | 1    | 1  | b    | b     | R    |
| 2          | 1    | 3  | a    | b     | R    |
| 3          | 3    | 3  | b    | b     | L    |
| 4          | 3    | 2  | Δ    | b     | L    |
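
One way to sanity-check a hand decoding is a round trip: re-encode the decoded transitions and compare the result with the original string. The sketch below does this for the example, writing each transition in the order (From, To, Read, Write, Move). The helper name `encode_row` is an assumption for this illustration.

```python
SYMBOL_CODES = {"a": "aa", "b": "ab", "Δ": "ba", "#": "bb"}
MOVE_CODES = {"L": "a", "R": "b"}


def encode_row(src, dst, read, write, move):
    """Encode one transition as a^src b, a^dst b, read code, write code, move code."""
    return ("a" * src + "b" + "a" * dst + "b"
            + SYMBOL_CODES[read] + SYMBOL_CODES[write] + MOVE_CODES[move])


encoding = "ababababbabaaabaaabbaaabaaabababaaaabaabbaaba"

# The four transitions decoded by hand, in the order they appear in the string
decoded = [
    (1, 1, "b", "b", "R"),
    (1, 3, "a", "b", "R"),
    (3, 3, "b", "b", "L"),
    (3, 2, "Δ", "b", "L"),
]

rebuilt = "".join(encode_row(*row) for row in decoded)
print(rebuilt == encoding)  # True
```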

### 2.3 Example Python Implementation

In [2]:
class TuringMachineDecoder:
    def __init__(self):
        # Define symbol decoding table
        self.symbol_decode = {
            'aa': 'a',
            'ab': 'b',
            'ba': 'Δ',
            'bb': '#'
        }
        
        # Define direction decoding
        self.direction_decode = {
            'a': 'L',
            'b': 'R'
        }
        
        self.debug = True  # Enable detailed output
    
    def decode_state(self, encoding, start_pos):
        """Decode a state number starting at given position
        Returns: (state_number, new_position)"""
        
        pos = start_pos
        a_count = 0
        
        # Count 'a's until we hit 'b'
        while pos < len(encoding) and encoding[pos] == 'a':
            a_count += 1
            pos += 1
        
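        # Must have found a 'b' to terminate the state
        if pos >= len(encoding) or encoding[pos] != 'b':
            raise ValueError(f"Invalid state encoding at position {start_pos}")
        # State numbers start at 1, so a bare 'b' (zero 'a's) is not a valid state
        if a_count == 0:
            raise ValueError(f"State code with no 'a' at position {start_pos}")
        return a_count, pos + 1  # Skip the terminating 'b'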
    
    def decode_symbol(self, encoding, start_pos):
        """Decode a symbol (2 characters) starting at given position
        Returns: (symbol, new_position)"""
        
        if start_pos + 2 > len(encoding):
            raise ValueError(f"Not enough characters for symbol at position {start_pos}")
        
        code = encoding[start_pos:start_pos + 2]
        if code not in self.symbol_decode:
            raise ValueError(f"Invalid symbol code '{code}' at position {start_pos}")
        
        return self.symbol_decode[code], start_pos + 2
    
    def decode_direction(self, encoding, start_pos):
        """Decode a direction (1 character) starting at given position
        Returns: (direction, new_position)"""
        
        if start_pos >= len(encoding):
            raise ValueError(f"No character for direction at position {start_pos}")
        
        code = encoding[start_pos]
        if code not in self.direction_decode:
            raise ValueError(f"Invalid direction code '{code}' at position {start_pos}")
        
        return self.direction_decode[code], start_pos + 1
    
    def decode_single_transition(self, encoding, start_pos):
        """Decode one complete transition starting at given position
        Returns: (transition_dict, new_position)"""
        
        if self.debug:
            print(f"\nDecoding transition starting at position {start_pos}")
            print(f"Substring: {encoding[start_pos:start_pos+20]}...")
        
        pos = start_pos
        
        # Decode From State
        from_state, pos = self.decode_state(encoding, pos)
        if self.debug:
            print(f"  From State: {from_state} (pos now {pos})")
        
        # Decode To State
        to_state, pos = self.decode_state(encoding, pos)
        if self.debug:
            print(f"  To State: {to_state} (pos now {pos})")
        
        # Decode Read Symbol
        read_symbol, pos = self.decode_symbol(encoding, pos)
        if self.debug:
            print(f"  Read Symbol: '{read_symbol}' (pos now {pos})")
        
        # Decode Write Symbol
        write_symbol, pos = self.decode_symbol(encoding, pos)
        if self.debug:
            print(f"  Write Symbol: '{write_symbol}' (pos now {pos})")
        
        # Decode Direction
        direction, pos = self.decode_direction(encoding, pos)
        if self.debug:
            print(f"  Direction: {direction} (pos now {pos})")
        
        transition = {
            'from': from_state,
            'to': to_state,
            'read': read_symbol,
            'write': write_symbol,
            'move': direction
        }
        
        return transition, pos
    
    def decode_complete_machine(self, encoding):
        """Decode a complete Turing machine from encoded string
        Assumes no separators between transitions
        Handles single transition case as well as multiple transitions"""
        
        transitions = []
        pos = 0
        
        print(f"\nDecoding Turing Machine")
        print(f"Total encoding length: {len(encoding)} characters")
        print("=" * 60)

        # Handle empty encoding
        if len(encoding) == 0:
            print("Empty encoding - no transitions")
            return transitions

        # Check if we might have a single transition
        # A valid transition needs at least 9 characters (minimum case: ababaabab)
        if len(encoding) < 9:
            print(f"Encoding too short for a valid transition (need at least 9 chars, got {len(encoding)})")
            return transitions            
        
        while pos < len(encoding):
            try:
                transition, new_pos = self.decode_single_transition(encoding, pos)
                transitions.append(transition)
                
                print(f"\nTransition {len(transitions)}: "
                      f"({transition['from']}, '{transition['read']}') → "
                      f"({transition['to']}, '{transition['write']}', {transition['move']})")
                
                pos = new_pos

                # Check if we've decoded the entire string
                if pos == len(encoding):
                    if len(transitions) == 1:
                        print("\nSuccessfully decoded single transition")
                    else:
                        print(f"\nSuccessfully decoded all {len(transitions)} transitions")
                    break                
                
            except ValueError as e:
                # A decoding error means the remaining characters do not form a valid transition
                if len(transitions) == 0:
                    print(f"\nError decoding first transition at position {pos}: {e}")
                    print(f"Remaining string: {encoding[pos:]}")
                else:
                    print(f"\nStopped after {len(transitions)} transition(s): {e}")
                    print(f"Remaining characters at position {pos}: {encoding[pos:]}")
                    print("The remaining characters are not a complete, valid transition")
                break

        # Validate what we decoded
        if len(transitions) == 0:
            print("\nWarning: No valid transitions decoded")
        elif len(transitions) == 1:
            print("\nDecoded TM contains a single transition")
        else:
            print(f"\nDecoded TM contains {len(transitions)} transitions")
        
        return transitions

# Test the decoder with examples
decoder = TuringMachineDecoder()

In [3]:
# Example: Decode a single transition
single_transition = "abaaabaaaba"  # (1,a) → (3,b,L)

print("\nEXAMPLE 1: Decoding a single transition")
print(f"Encoded string: {single_transition}")
print("\nStep-by-step breakdown:")
print("- 'ab': State 1 (one 'a' followed by 'b')")
print("- 'aaab': State 3 (three 'a's followed by 'b')")
print("- 'aa': Symbol 'a'")
print("- 'ab': Symbol 'b'")
print("- 'a': Direction 'L'")

transition, _ = decoder.decode_single_transition(single_transition, 0)
print(f"\nDecoded transition: {transition}")


single_transition = "ababbabbb"
transition = decoder.decode_complete_machine(single_transition)
print(f"\nDecoded transition: {transition}")




EXAMPLE 1: Decoding a single transition
Encoded string: abaaabaaaba

Step-by-step breakdown:
- 'ab': State 1 (one 'a' followed by 'b')
- 'aaab': State 3 (three 'a's followed by 'b')
- 'aa': Symbol 'a'
- 'ab': Symbol 'b'
- 'a': Direction 'L'

Decoding transition starting at position 0
Substring: abaaabaaaba...
  From State: 1 (pos now 2)
  To State: 3 (pos now 6)
  Read Symbol: 'a' (pos now 8)
  Write Symbol: 'b' (pos now 10)
  Direction: L (pos now 11)

Decoded transition: {'from': 1, 'to': 3, 'read': 'a', 'write': 'b', 'move': 'L'}

Decoding Turing Machine
Total encoding length: 9 characters

Decoding transition starting at position 0
Substring: ababbabbb...
  From State: 1 (pos now 2)
  To State: 1 (pos now 4)
  Read Symbol: 'Δ' (pos now 6)
  Write Symbol: '#' (pos now 8)
  Direction: R (pos now 9)

Transition 1: (1, 'Δ') → (1, '#', R)

Successfully decoded single transition

Decoded TM contains a single transition

Decoded transition: [{'from': 1, 'to': 1, 'read': 'Δ', 'write': '#', 'move': 'R'}]

In [4]:
# Create a machine with multiple transitions
def create_example_encoding():
    """Create an encoded string with multiple transitions"""
    transitions = [
        # (1,b) → (1,b,R)
        "ab" + "ab" + "ab" + "ab" + "b",
        # (1,a) → (3,b,R)
        "ab" + "aaab" + "aa" + "ab" + "b",
        # (3,b) → (3,b,L)
        "aaab" + "aaab" + "ab" + "ab" + "a",
        # (3,Δ) → (2,b,L)
        "aaab" + "aab" + "ba" + "ab" + "a"
    ]
    return ''.join(transitions)

multiple_transitions = create_example_encoding()
print("\nEXAMPLE 2: Multiple transitions without separators")
print(f"Encoded string: {multiple_transitions}")
print(f"Length: {len(multiple_transitions)} characters")

# Decode with detailed output
decoder.debug = False  # Turn off debug for cleaner output
decoded_transitions = decoder.decode_complete_machine(multiple_transitions)

print("\nSummary of decoded transitions:")
for i, t in enumerate(decoded_transitions, 1):
    print(f"{i}. State {t['from']} + '{t['read']}' → State {t['to']} + '{t['write']}' + {t['move']}")


EXAMPLE 2: Multiple transitions without separators
Encoded string: ababababbabaaabaaabbaaabaaabababaaaabaabbaaba
Length: 45 characters

Decoding Turing Machine
Total encoding length: 45 characters

Transition 1: (1, 'b') → (1, 'b', R)

Transition 2: (1, 'a') → (3, 'b', R)

Transition 3: (3, 'b') → (3, 'b', L)

Transition 4: (3, 'Δ') → (2, 'b', L)

Successfully decoded all 4 transitions

Decoded TM contains 4 transitions

Summary of decoded transitions:
1. State 1 + 'b' → State 1 + 'b' + R
2. State 1 + 'a' → State 3 + 'b' + R
3. State 3 + 'b' → State 3 + 'b' + L
4. State 3 + 'Δ' → State 2 + 'b' + L
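
The hand-built strings in `create_example_encoding` all follow one recipe: from-state, to-state, read symbol, write symbol, direction. As a minimal sketch of the inverse of the decoder, the helper names below (`encode_state`, `encode_transition`, `encode_machine`) are hypothetical and not part of the notebook's classes; the code tables are inferred from the decoded examples above.

```python
# Code tables inferred from the decoded examples in this notebook (an assumption).
SYMBOL_CODES = {"a": "aa", "b": "ab", "Δ": "ba", "#": "bb"}
DIRECTION_CODES = {"L": "a", "R": "b"}

def encode_state(n):
    """State n is written as n copies of 'a' followed by a terminating 'b'."""
    if n < 1:
        raise ValueError("state numbers start at 1")
    return "a" * n + "b"

def encode_transition(from_state, read, to_state, write, move):
    """Encode (from, read) → (to, write, move) in the order from, to, read, write, move."""
    return (encode_state(from_state) + encode_state(to_state)
            + SYMBOL_CODES[read] + SYMBOL_CODES[write] + DIRECTION_CODES[move])

def encode_machine(transitions):
    """Concatenate the row encodings without separators."""
    return "".join(encode_transition(*t) for t in transitions)

example = [
    (1, "b", 1, "b", "R"),
    (1, "a", 3, "b", "R"),
    (3, "b", 3, "b", "L"),
    (3, "Δ", 2, "b", "L"),
]
encoded = encode_machine(example)
print(encoded)
print(len(encoded))
```

Under these assumptions the sketch reproduces the 45-character string of Example 2, which is a quick way to cross-check any hand-written encoding.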


In [5]:
# Handling Decoding Errors: This program can be hidden and reserved for use as homework projects.
def demonstrate_error_handling():
    """Show how to handle various decoding errors"""
    
    print("\nERROR HANDLING IN DECODING")
    print("=" * 50)
    
    # Error case 1: Incomplete state encoding
    try:
        print("\n1. Incomplete state encoding:")
        bad_encoding = "aaa"  # Missing 'b' terminator
        decoder.decode_state(bad_encoding, 0)
    except ValueError as e:
        print(f"   Error: {e}")
    
    # Error case 2: Invalid character (caught while reading the second state code)
    try:
        print("\n2. Invalid symbol code:")
        bad_encoding = "abac"  # 'c' is not in {a, b}; decoding fails at position 2, before any symbol code is read
        decoder.decode_single_transition(bad_encoding, 0)
    except ValueError as e:
        print(f"   Error: {e}")
    
    # Error case 3: Truncated encoding
    try:
        print("\n3. Truncated encoding:")
        bad_encoding = "ababaa"  # Incomplete transition; the decoder reports this itself instead of raising
        decoder.decode_complete_machine(bad_encoding)
    except ValueError as e:
        print(f"   Error: {e}")

demonstrate_error_handling()


ERROR HANDLING IN DECODING

1. Incomplete state encoding:
   Error: Invalid state encoding at position 0

2. Invalid symbol code:
   Error: Invalid state encoding at position 2

3. Truncated encoding:

Decoding Turing Machine
Total encoding length: 6 characters
Encoding too short for a valid transition (need at least 9 chars, got 6)


In [6]:
# Validation and Verification: This program can be hidden and reserved for use as homework projects.
def validate_decoded_machine(transitions):
    """Validate that decoded transitions form a valid Turing machine"""
    
    print("\nVALIDATING DECODED TURING MACHINE")
    print("=" * 50)
    
    # Extract all states and symbols
    states = set()
    symbols = set()
    
    for t in transitions:
        states.add(t['from'])
        states.add(t['to'])
        symbols.add(t['read'])
        symbols.add(t['write'])
    
    print(f"\nFound {len(states)} states: {sorted(states)}")
    print(f"Found {len(symbols)} symbols: {sorted(symbols)}")
    
    # Check for required states
    has_start = 1 in states
    has_halt = 2 in states
    
    print(f"\n✓ Has start state (1): {has_start}")
    print(f"✓ Has halt state (2): {has_halt}")
    
    # Check for duplicate transitions
    transition_keys = [(t['from'], t['read']) for t in transitions]
    duplicates = len(transition_keys) != len(set(transition_keys))
    
    print(f"✓ No duplicate transitions: {not duplicates}")
    
    # Check completeness (optional)
    print("\nTransition coverage:")
    for state in sorted(states):
        if state != 2:  # Skip halt state
            for symbol in sorted(symbols):
                if symbol != '#':  # Skip special symbols
                    exists = any(t['from'] == state and t['read'] == symbol 
                               for t in transitions)
                    status = "✓" if exists else "✗"
                    print(f"  {status} State {state} + '{symbol}'")
    
    return has_start and has_halt and not duplicates

# Validate our decoded machine
is_valid = validate_decoded_machine(decoded_transitions)
print(f"\nMachine is valid: {is_valid}")


VALIDATING DECODED TURING MACHINE

Found 3 states: [1, 2, 3]
Found 3 symbols: ['a', 'b', 'Δ']

✓ Has start state (1): True
✓ Has halt state (2): True
✓ No duplicate transitions: True

Transition coverage:
  ✓ State 1 + 'a'
  ✓ State 1 + 'b'
  ✗ State 1 + 'Δ'
  ✗ State 3 + 'a'
  ✓ State 3 + 'b'
  ✓ State 3 + 'Δ'

Machine is valid: True


In [7]:
def reconstruct_turing_machine(encoded_string):
    """Completely reconstruct a Turing machine from its encoding"""
    
    print("\nCOMPLETE TURING MACHINE RECONSTRUCTION")
    print("=" * 60)
    
    # Step 1: Decode transitions
    decoder = TuringMachineDecoder()
    decoder.debug = False
    transitions = decoder.decode_complete_machine(encoded_string)
    
    # Step 2: Extract states and symbols
    states = sorted(set(t['from'] for t in transitions) | 
                   set(t['to'] for t in transitions))
    symbols = sorted(set(t['read'] for t in transitions) | 
                    set(t['write'] for t in transitions))
    
    # Step 3: Build transition dictionary
    transition_dict = {}
    for t in transitions:
        key = (t['from'], t['read'])
        value = (t['to'], t['write'], t['move'])
        transition_dict[key] = value
    
    # Step 4: Create machine specification
    tm_spec = {
        'states': states,
        'alphabet': [s for s in symbols if s not in ['Δ', '#']],
        'tape_alphabet': symbols,
        'transitions': transition_dict,
        'start_state': 1,
        'halt_state': 2
    }
    
    # Display the reconstructed machine
    print("\nReconstructed Turing Machine:")
    print(f"States: {tm_spec['states']}")
    print(f"Input alphabet: {tm_spec['alphabet']}")
    print(f"Tape alphabet: {tm_spec['tape_alphabet']}")
    print(f"Start state: {tm_spec['start_state']}")
    print(f"Halt state: {tm_spec['halt_state']}")
    
    print("\nTransition Table:")
    print("-" * 60)
    print(f"{'From':<8} {'Read':<8} {'To':<8} {'Write':<8} {'Move':<8}")
    print("-" * 60)
    for (from_s, read_s), (to_s, write_s, move_d) in sorted(transition_dict.items()):
        print(f"{from_s:<8} {read_s:<8} {to_s:<8} {write_s:<8} {move_d:<8}")
    
    return tm_spec

# Test with our example encoding
reconstructed = reconstruct_turing_machine(multiple_transitions)


COMPLETE TURING MACHINE RECONSTRUCTION

Decoding Turing Machine
Total encoding length: 45 characters

Transition 1: (1, 'b') → (1, 'b', R)

Transition 2: (1, 'a') → (3, 'b', R)

Transition 3: (3, 'b') → (3, 'b', L)

Transition 4: (3, 'Δ') → (2, 'b', L)

Successfully decoded all 4 transitions

Decoded TM contains 4 transitions

Reconstructed Turing Machine:
States: [1, 2, 3]
Input alphabet: ['a', 'b']
Tape alphabet: ['a', 'b', 'Δ']
Start state: 1
Halt state: 2

Transition Table:
------------------------------------------------------------
From     Read     To       Write    Move    
------------------------------------------------------------
1        a        3        b        R       
1        b        1        b        R       
3        b        3        b        L       
3        Δ        2        b        L       


## 3. Language ALAN and MATHISON

### 3.1 The Code Word Language (CWL)
As the previous section shows, every encoded transition follows the same pattern: two state codes followed by a fixed-length block. This pattern can be captured by a regular expression. The Code Word Language (CWL) is the regular language of all strings with this structure, so every valid Turing machine encoding is a word in CWL. CWL is defined by the regular expression:

CWL = $(a^+ba^+b(a+b)^5)^*$, where $a^+$ stands for one or more $a$'s.

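In a valid CWL word, $a^+ba^+b$ encodes the "From" state and "To" state, while $(a+b)^5$ encodes the "Read" symbol (two characters), the "Write" symbol (two characters), and the "Direction" (one character). Inside $(a+b)$, the $+$ denotes a choice between $a$ and $b$, not repetition.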

**Important Note**: CWL defines only the pattern of valid Turing machine encodings. Every valid Turing machine encoding is a word in CWL, but not every word in CWL is a valid Turing machine encoding: some words fit the pattern yet fail to describe a meaningful machine. For example, the string $aababaabbb$ is a valid CWL word. When decoded, however, it describes the transition (2, a) → (1, #, R), which leaves state 2. Because state 2 is the HALT state and HALT states cannot have outgoing transitions, this word does not represent a valid Turing machine.

More examples of CWL words that are valid in form but invalid as Turing machine encodings:

* ❌ A CWL word that fits the pattern but refers to a transition for a non-existent state
* ❌ A CWL word that uses a tape symbol not in the machine’s alphabet
* ❌ A CWL word missing key components (e.g., no start state or halt state)

In short, CWL sets the “shape” of valid encodings, but additional checks are needed to ensure the content represents a functioning Turing machine.
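
To make the difference between "shape" and "content" concrete, here is a minimal sketch that first checks the CWL pattern and then applies one semantic check: no transition may leave state 2. The function names `split_transitions` and `leaves_halt_state` are illustrative assumptions, not part of the notebook's classes.

```python
import re

# One code word: from-state, to-state, then 2 + 2 + 1 characters (read, write, direction).
TRANSITION = re.compile(r"(a+)b(a+)b([ab]{2})([ab]{2})([ab])")
CWL = re.compile(r"(?:a+ba+b[ab]{5})*")

def split_transitions(word):
    """Return the (from_state, to_state) pairs of a CWL word, or None if the word is not in CWL."""
    if not CWL.fullmatch(word):
        return None
    pairs = []
    pos = 0
    while pos < len(word):
        m = TRANSITION.match(word, pos)  # always succeeds here because the word is in CWL
        pairs.append((len(m.group(1)), len(m.group(2))))
        pos = m.end()
    return pairs

def leaves_halt_state(word):
    """True if the word is in CWL but contains a transition out of state 2 (HALT)."""
    pairs = split_transitions(word)
    return pairs is not None and any(src == 2 for src, _ in pairs)

print(split_transitions("aababaabbb"))  # [(2, 1)]
print(leaves_halt_state("aababaabbb"))  # True
```

The word passes the pattern check but fails the semantic check, which is exactly the situation described in the note above. The other bullet points would need similar, separate checks.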

### 3.2 Example Python Implementation


In [8]:
def analyze_cwl_codeword():
    """Analyze the structure of a single CWL code word"""
    print("\nANATOMY OF A CWL CODE WORD")
    print("=" * 60)
    
    # Example transition: (1,a) → (3,b,R)
    codeword = "abaaabaaabb"
    
    print(f"\nExample code word: {codeword}")
    print("\nDetailed breakdown:")
    
    # Parse the components
    pos = 0
    
    # First state
    state1_start = pos
    while codeword[pos] == 'a':
        pos += 1
    pos += 1  # Skip 'b'
    state1 = codeword[state1_start:pos]
    print(f"  Position {state1_start:2d}-{pos-1:2d}: '{state1}' = State {state1.count('a')}")
    
    # Second state
    state2_start = pos
    while codeword[pos] == 'a':
        pos += 1
    pos += 1  # Skip 'b'
    state2 = codeword[state2_start:pos]
    print(f"  Position {state2_start:2d}-{pos-1:2d}: '{state2}' = State {state2.count('a')}")
    
    # Five-character block
    five_chars = codeword[pos:pos+5]
    print(f"  Position {pos:2d}-{pos+4:2d}: '{five_chars}' = Symbol/Direction encoding")
    
    # Interpret the five characters
    print("\n  Five-character block interpretation:")
    print(f"    Characters 1-2: '{five_chars[0:2]}' = Read symbol")
    print(f"    Characters 3-4: '{five_chars[2:4]}' = Write symbol")
    print(f"    Character 5:    '{five_chars[4]}'  = Direction")
    
    return codeword

example_codeword = analyze_cwl_codeword()


ANATOMY OF A CWL CODE WORD

Example code word: abaaabaaabb

Detailed breakdown:
  Position  0- 1: 'ab' = State 1
  Position  2- 5: 'aaab' = State 3
  Position  6-10: 'aaabb' = Symbol/Direction encoding

  Five-character block interpretation:
    Characters 1-2: 'aa' = Read symbol
    Characters 3-4: 'ab' = Write symbol
    Character 5:    'b'  = Direction


In [9]:
import re

class CWLValidator:
    def __init__(self):
        # Build the regex pattern for CWL
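        # a+ means one or more 'a's
        # [ab] means either 'a' or 'b' (written (a+b) in the formal definition)
        # {5} means exactly 5 occurrences
        # fullmatch() below anchors both ends, so no explicit ^ or $ is needed
        self.pattern = r'(a+ba+b[ab]{5})*'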
        self.regex = re.compile(self.pattern)

    def is_valid_cwl(self, cwl_string):
        """
        Checks if the provided string is a valid CWL word.
        Returns True if it matches the pattern, False otherwise.
        """
        return bool(self.regex.fullmatch(cwl_string))        
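
As a quick usage sketch, the snippet below applies the same pattern to a few strings. It is written as a standalone function (`in_cwl`, a hypothetical name) so that it runs on its own, and it shows why `fullmatch` matters: because of the outer star, a plain `match` succeeds on any string by matching the empty prefix.

```python
import re

# Same pattern as CWLValidator, compiled once at module level.
CWL_PATTERN = re.compile(r"(?:a+ba+b[ab]{5})*")

def in_cwl(word):
    """True if the whole word consists of zero or more code words."""
    return CWL_PATTERN.fullmatch(word) is not None

print(in_cwl(""))             # True: the star allows zero code words
print(in_cwl("abaaabaaabb"))  # True: one code word, (1,a) → (3,b,R)
print(in_cwl("abaabbaba"))    # False: only 4 characters follow the two state codes
print(in_cwl("bbb"))          # False

# match() only anchors the start, so it "succeeds" with an empty match:
print(CWL_PATTERN.match("bbb") is not None)  # True
```

Note that the empty string is a CWL word under this definition, even though it encodes no transitions at all.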

### 3.3 Language ALAN
Since any Turing Machine can be encoded as a string, we can use this string as input to another Turing Machine, or even feed it back into the original machine using its own encoding. This leads to the fascinating idea that TMs can reason about other TMs (or even themselves).

#### 3.3.1 Definition of Language ALAN
We define the language **ALAN**:

**ALAN** = { all words in **CWL** that are either:

* Not accepted by the Turing Machines they represent, or
* Do not represent any valid Turing Machine at all }

We introduce ALAN to formalize a particular class of decision problems. It includes all CWL words that either fail to encode a valid TM or, if they do encode a TM, are not accepted by the machine they describe. In other words, ALAN captures invalid encodings and machines that do not accept their own encoding.

#### 3.3.2 Examples of Language ALAN
Consider the following four CWL words:

* $aababaaaaa$: It does not represent a valid TM because it encodes an invalid transition leaving a HALT state. This string is in language ALAN.
* $aaabaabaaaaa$: It does not represent a valid TM because it lacks any transitions involving the START state. This string is in language ALAN.
* $abaabababb$: It represents a valid TM, and after decoding, the TM is shown below. When we feed its own encoding string as input, the machine rejects it: the input starts with $a$, and the machine has no transition for reading $a$ in the START state. This string is in language ALAN.
 
```mermaid
graph LR
    accTitle: A TM
    accDescr: a diagram representing a TM decoded from a given encoding
    q1((START))
    q2((HALT))
    
    q1 -->|b,b,R| q2
    
    style q1 fill:#90EE90,stroke:#333,stroke-width:3px
    style q2 fill:#87CEEB,stroke:#333,stroke-width:2px
```

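* $abaaabaaaabaaabaaabaaaabaaabaabababa$: It represents a valid TM, and after decoding, the TM is shown below. When we feed its own encoding string as input, the machine accepts it: it reads the leading $a$ in the START state, moves right into state 3, reads $b$, and enters the HALT state. This string is NOT in language ALAN.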

```mermaid
graph LR
    accTitle: A TM
    accDescr: a diagram representing a TM decoded from a given encoding
    q1((START))
    q2((3))
    q3((HALT))
    
    q1 -->|a,a,R| q2
    q2 -->|a,a,R| q2
    q2 -->|b,b,L| q3
    
    style q1 fill:#90EE90,stroke:#333,stroke-width:3px
    style q2 fill:#FFB6C1,stroke:#333,stroke-width:3px
    style q3 fill:#87CEEB,stroke:#333,stroke-width:2px
```
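
The last two examples can be checked mechanically. The following is a minimal sketch, not the notebook's `TuringMachineDecoder`: `decode` and `accepts` are hypothetical helpers, the symbol and direction tables are inferred from the decoded examples in this notebook, and a step budget stands in for "never halts" (a real machine may simply run forever).

```python
import re

SYMBOLS = {"aa": "a", "ab": "b", "ba": "Δ", "bb": "#"}
MOVES = {"a": "L", "b": "R"}
TRANSITION = re.compile(r"(a+)b(a+)b([ab]{2})([ab]{2})([ab])")

def decode(word):
    """Turn a CWL word into {(state, read): (next_state, write, move)}."""
    table = {}
    pos = 0
    while pos < len(word):
        m = TRANSITION.match(word, pos)
        if m is None:
            raise ValueError(f"not a CWL word (stuck at position {pos})")
        src, dst, read, write, move = m.groups()
        table[(len(src), SYMBOLS[read])] = (len(dst), SYMBOLS[write], MOVES[move])
        pos = m.end()
    return table

def accepts(table, tape_input, max_steps=1000):
    """Start in state 1; accept on reaching state 2; reject on a crash or an exhausted budget."""
    tape = dict(enumerate(tape_input))
    state, head = 1, 0
    for _ in range(max_steps):
        if state == 2:
            return True
        key = (state, tape.get(head, "Δ"))
        if key not in table:
            return False              # no applicable transition: crash
        state, write, move = table[key]
        tape[head] = write
        head += 1 if move == "R" else -1
        if head < 0:
            return False              # moved left off the tape: crash
    return state == 2

for word in ["abaabababb", "abaaabaaaabaaabaaabaaaabaaabaabababa"]:
    verdict = "not in ALAN" if accepts(decode(word), word) else "in ALAN"
    print(word, "->", verdict)
```

The sketch skips the validity checks (start state, halt state, no transitions leaving HALT), so it only applies to words already known to represent valid machines.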

#### 3.3.3 ALAN is NOT Recursively Enumerable
We shall now prove that the language ALAN is **not recursively enumerable**. We will do this by contradiction.

* Step 1: Suppose, for the sake of contradiction, that ALAN is recursively enumerable. This means there exists a Turing Machine, call it $T$, such that $T$ accepts exactly all words in ALAN. Let $code(T)$ denote the encoding of the machine $T$ itself.
* Step 2: Ask the Critical Question: Is $code(T)$ a word in ALAN? There are exactly two possibilities:
    * Yes, $code(T) \in ALAN$.
    * No, $code(T) \notin ALAN$.

Let’s explore each.

* Step 3: Case 1 - Assume $code(T) \in ALAN$. By the definition of ALAN, one of two things holds:
    * $code(T)$ does not represent a valid TM. This is impossible, because it is the encoding of $T$.
    * $T$ does **not** accept $code(T)$.
    * So $T$ does not accept $code(T)$. But we assumed $T$ accepts every word in ALAN, so $code(T) \in ALAN$ means $T$ must accept it. Contradiction.
* Step 4: Case 2 - Assume $code(T) \notin ALAN$. Since $code(T)$ is the encoding of $T$, it is a word in CWL that represents a valid TM. By the definition of ALAN:
    * A word in CWL that represents a valid TM stays out of ALAN only if that TM accepts it.
    * So we get: $code(T)$ is accepted by $T$.
    * But $T$ only accepts words in ALAN. Therefore, $code(T)$ must be a word in ALAN. Contradiction again.
* Step 5: Conclude the Contradiction. In both cases, we reach a contradiction:
    * If $code(T) \in ALAN$ → contradiction.
    * If $code(T) \notin ALAN$ → contradiction.
    * Therefore, our initial assumption that ALAN is recursively enumerable **must be false**.

We conclude that **ALAN is not recursively enumerable.** This proof mirrors the classic diagonalization arguments used in computability theory, showing that certain languages cannot be captured by any Turing Machine.
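
The two cases can be condensed into one chain of equivalences. Here $L(T)$ denotes the set of words accepted by $T$, a notation introduced only for this summary. The first step uses the definition of ALAN together with the fact that $code(T)$ is a valid encoding; the last step uses the assumption $L(T) = ALAN$.

$$
code(T) \in ALAN \iff T \text{ does not accept } code(T) \iff code(T) \notin L(T) \iff code(T) \notin ALAN
$$

A statement cannot be equivalent to its own negation, so no such $T$ exists.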

#### 3.3.4 Example Python Implementation

In [10]:
class ALANAnalyzer:
    def __init__(self):
        self.cwl_validator = CWLValidator()
        self.tm_decoder = TuringMachineDecoder()
        self.tm_decoder.debug = True
    
    def analyze_alan_membership(self, cwl_string):
        """Determine if a CWL string belongs to ALAN"""
        print(f"\nAnalyzing ALAN membership for: {cwl_string[:30]}...")
        print("=" * 60)
        
        # Step 1: Verify it's in CWL
        if not self.cwl_validator.is_valid_cwl(cwl_string):
            print("✗ Not in CWL, therefore not in ALAN")
            return False, "not_cwl"
        
        print("✓ String is in CWL")
        
        # Step 2: Try to decode as a TM
        try:
            transitions = self.tm_decoder.decode_complete_machine(cwl_string)
            print(f"✓ Successfully decoded {len(transitions)} transitions")
        except Exception as e:
            print(f"✓ String is in ALAN (Reason: Invalid TM encoding - {e})")
            return True, "invalid_tm"
        
        # Step 3: Check if it represents a valid TM
        validity_issues = self.check_tm_validity(transitions)
        if validity_issues:
            print(f"✓ String is in ALAN (Reason: {validity_issues[0]})")
            return True, validity_issues[0]
        
        print("✓ Represents a valid TM")
        
        # Step 4: Simulate the TM on its own encoding
        accepts_self = self.simulate_tm_on_self(transitions, cwl_string)
        
        if accepts_self:
            print("✗ TM accepts its own encoding - NOT in ALAN")
            return False, "accepts_self"
        else:
            print("✓ TM does not accept its own encoding - IS in ALAN")
            return True, "rejects_self"
    
    def check_tm_validity(self, transitions):
        """Check if transitions form a valid TM"""
        issues = []
        
        # Extract states
        states = set()
        for t in transitions:
            states.add(t['from'])
            states.add(t['to'])
        
        # Check for start state
        if 1 not in states:
            issues.append("missing_start_state")
        
        # Check for halt state
        if 2 not in states:
            issues.append("missing_halt_state")
        
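        # Check for unreachable halt state
        if 2 in states and not any(t['to'] == 2 for t in transitions):
            issues.append("unreachable_halt_state")
        
        # Check for transitions leaving the halt state (invalid, see Section 3.1)
        if any(t['from'] == 2 for t in transitions):
            issues.append("halt_state_has_outgoing_transition")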
        
        # Check for duplicate transitions
        transition_keys = [(t['from'], t['read']) for t in transitions]
        if len(transition_keys) != len(set(transition_keys)):
            issues.append("duplicate_transitions")
        
        return issues
    
    def simulate_tm_on_self(self, transitions, input_string):
        """Simulate the TM on its own encoding"""
        print("\nSimulating TM on its own encoding...")
        
        # Build transition table
        trans_dict = {}
        for t in transitions:
            trans_dict[(t['from'], t['read'])] = (t['to'], t['write'], t['move'])
        
        # Initialize simulation
        tape = list(input_string) + ['Δ'] * 1000
        head = 0
        state = 1  # Start state
        steps = 0
        max_steps = 10000
        
        while steps < max_steps:
            # Check if we've reached halt state
            if state == 2:
                print(f"  Reached halt state after {steps} steps")
                return True
            
            # Read current symbol
            current_symbol = tape[head] if head < len(tape) else 'Δ'
            
            # Find transition
            key = (state, current_symbol)
            if key not in trans_dict:
                print(f"  No transition for ({state}, '{current_symbol}') - rejecting")
                return False
            
            next_state, write_symbol, direction = trans_dict[key]
            
            # Execute transition
            tape[head] = write_symbol
            state = next_state
            
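            if direction == 'L':
                if head == 0:
                    # Moving left from the first cell is a crash, which rejects
                    print("  Attempted to move left from the first cell - rejecting")
                    return False
                head -= 1
            elif direction == 'R':
                head += 1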
            
            steps += 1
        
        print(f"  Exceeded max steps ({max_steps}) - rejecting")
        return False

alan_analyzer = ALANAnalyzer()

In [11]:
def demonstrate_alan_examples():
    """Show various examples of strings in and not in ALAN"""
    print("\nEXAMPLES OF ALAN MEMBERSHIP")
    print("=" * 60)
    
    examples = [
        {
            'string': 'ababbabbb',
            'description': 'Simple TM with one transition: (1,Δ)→(1,#,R)',
            'expected': 'Likely in ALAN (no halt state reachable)'
        },
        {
            'string': 'abaabbaba',
            'description': 'Malformed word: only 4 characters follow the two state codes',
            'expected': 'Not in CWL, therefore not in ALAN'
        },
        {
            'string': 'aaababbabbb',
            'description': 'Single transition (3,Δ)→(1,#,R); nothing leaves state 1',
            'expected': 'In ALAN (no halt state)'
        },
        {
            'string': 'abaaabaaabb' + 'aaaabaabbabba',
            'description': 'Two transitions: (1,a)→(3,b,R), (4,Δ)→(2,#,L)',
            'expected': 'In ALAN (state 4 is never entered, so the run crashes in state 3)'
        }
    ]
    
    for ex in examples:
        print(f"\nExample: {ex['description']}")
        print(f"String: {ex['string']}")
        print(f"Expected: {ex['expected']}")
        
        in_alan, reason = alan_analyzer.analyze_alan_membership(ex['string'])
        print(f"Result: {'IN ALAN' if in_alan else 'NOT IN ALAN'} (reason: {reason})")

demonstrate_alan_examples()


EXAMPLES OF ALAN MEMBERSHIP

Example: Simple TM with one transition: (1,Δ)→(1,#,R)
String: ababbabbb
Expected: Likely in ALAN (no halt state reachable)

Analyzing ALAN membership for: ababbabbb...
✓ String is in CWL

Decoding Turing Machine
Total encoding length: 9 characters

Decoding transition starting at position 0
Substring: ababbabbb...
  From State: 1 (pos now 2)
  To State: 1 (pos now 4)
  Read Symbol: 'Δ' (pos now 6)
  Write Symbol: '#' (pos now 8)
  Direction: R (pos now 9)

Transition 1: (1, 'Δ') → (1, '#', R)

Successfully decoded single transition

Decoded TM contains a single transition
✓ Successfully decoded 1 transitions
✓ String is in ALAN (Reason: missing_halt_state)
Result: IN ALAN (reason: missing_halt_state)

Example: Malformed word: only 4 characters follow the two state codes
String: abaabbaba
Expected: Not in CWL, therefore not in ALAN

Analyzing ALAN membership for: abaabbaba...
✗ Not in CWL, therefore not in ALAN
Result: NOT IN ALAN (reason: not_cwl)

Example: TM wit

### 3.4 Language MATHISON
Now, let’s consider the other side of the story: what if the CWL words are accepted by the Turing Machines they encode? If we collect all valid TM encodings that are accepted by their respective machines, what language do we obtain?

#### 3.4.1 Definition of Language MATHISON
We define the language MATHISON:

**MATHISON** = { all words in **CWL** that:

* Are accepted by the Turing Machines they represent, **and**
* Do represent valid Turing Machines }

In other words: the set of all encoded Turing Machines $T$ such that $T$, when given its own encoding $code(T)$ as input, accepts that input.

#### 3.4.2 Examples of Language MATHISON
Consider the TM encoding $abaabaaabbababababb$, which splits into two transitions:

* Decode the first transition: $abaabaaabb$
    * ab = State 1
    * aab = State 2
    * aa = Symbol a
    * ab = Symbol b
    * b = Direction R
* Decode the second transition: $ababababb$
    * ab = State 1
    * ab = State 1
    * ab = Symbol b
    * ab = Symbol b
    * b = Direction R
* Turing Machine:
    * (1, a) → (2, b, R)
    * (1, b) → (1, b, R)

This Turing Machine skips over any leading $b$'s and moves to the HALT state as soon as it reads an $a$. Its own encoding begins with $a$, so when we feed that encoding as input, the machine accepts it on the very first step. Its encoding is therefore a word in Language MATHISON.
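
The run can be traced step by step. In the sketch below the transition table is typed in directly from the bullets above; `trace` is a hypothetical helper that records each (state, head position) pair, with cells numbered from 0.

```python
# Transition table decoded above: (state, read) -> (next_state, write, move)
table = {
    (1, "a"): (2, "b", "R"),
    (1, "b"): (1, "b", "R"),
}

def trace(table, tape_input, max_steps=100):
    """Return the (state, head) configurations visited, starting in state 1 at cell 0."""
    tape = dict(enumerate(tape_input))
    state, head = 1, 0
    history = [(state, head)]
    while state != 2 and len(history) <= max_steps:
        key = (state, tape.get(head, "Δ"))
        if key not in table:
            break                      # crash: no applicable transition
        state, write, move = table[key]
        tape[head] = write
        head += 1 if move == "R" else -1
        if head < 0:
            break                      # crash: moved left off the tape
        history.append((state, head))
    return history

print(trace(table, "abaabaaabbababababb"))  # [(1, 0), (2, 1)]
print(trace(table, "bba"))                  # [(1, 0), (1, 1), (1, 2), (2, 3)]
print(trace(table, "bb"))                   # [(1, 0), (1, 1), (1, 2)]
```

A run is accepting when the last recorded state is 2. The third call ends in state 1 because the machine has no transition for a blank, so an input without any $a$ is rejected.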

#### 3.4.3 MATHISON is Recursively Enumerable
To prove that MATHISON is recursively enumerable, we describe a Turing Machine that accepts exactly its words. Given an input word, this machine first checks that the word is in CWL and represents a valid Turing Machine $T$. It then reconstructs $T$ from the encoding and simulates $T$ on that same encoding. If the word is in MATHISON, the simulated machine eventually halts and accepts its encoding, and our machine accepts as well. This means we have a Turing Machine that accepts every word in MATHISON.

For words not in the language, the machine may reject or loop forever, which is acceptable for a recognizer. However, we cannot build a Turing Machine that always guarantees rejection for non-members, because determining whether a machine accepts a given input is undecidable, just like the Halting Problem. Therefore, while we can recognize MATHISON by simulating $T$ on $code(T)$ and accepting if it halts and accepts, we cannot decide MATHISON because we cannot always detect non-accepting cases (especially when the machine runs forever).

Therefore, we have shown that there exists a Turing Machine that recognizes MATHISON. Thus, **MATHISON is recursively enumerable.**
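
The gap between recognizing and deciding shows up directly in code. The sketch below is an illustration under assumptions, not a real recognizer: `run_bounded` and `semi_decide` are hypothetical helpers, and the finite list of budgets stands in for a simulation that a true recognizer would continue without limit. The key point is the third outcome, `None`, meaning "no answer yet".

```python
def run_bounded(table, tape_input, max_steps):
    """Return True (accepted), False (crashed), or None (budget exhausted, outcome unknown)."""
    tape = dict(enumerate(tape_input))
    state, head = 1, 0
    for _ in range(max_steps):
        if state == 2:
            return True
        key = (state, tape.get(head, "Δ"))
        if key not in table:
            return False               # crash: definite rejection
        state, write, move = table[key]
        tape[head] = write
        head += 1 if move == "R" else -1
        if head < 0:
            return False               # crash: moved left off the tape
    return True if state == 2 else None

def semi_decide(table, tape_input, budgets=(10, 100, 1000)):
    """Retry with growing budgets; stop as soon as a definite answer appears."""
    for budget in budgets:
        outcome = run_bounded(table, tape_input, budget)
        if outcome is not None:
            return outcome
    return None                        # still unknown after the largest budget

halting = {(1, "a"): (2, "a", "R")}
looping = {(1, "a"): (1, "a", "R"), (1, "b"): (1, "b", "R"), (1, "Δ"): (1, "Δ", "R")}

print(semi_decide(halting, "ab"))  # True
print(semi_decide(halting, "b"))   # False
print(semi_decide(looping, "ab"))  # None
```

For the looping machine no finite budget ever turns `None` into `False`, which is why simulation alone yields a recognizer and not a decider.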

The question we have yet to answer is: can we simulate the behavior of any Turing Machine on any arbitrary input string, including its own encoding as input? The answer is yes — we can construct a Universal Turing Machine capable of performing this simulation, which we will explore in the following section.

#### 3.4.4 Complement of Recursively Enumerable Languages
We are now ready to prove that if $L$ is a recursively enumerable language, its complement $L'$ is not necessarily recursively enumerable.

* Since $CWL$ is a regular language, its complement $CWL'$ is also regular.
* Because all regular languages are recursively enumerable, $CWL'$ is recursively enumerable.
* Consider the union language $L= CWL' \cup MATHISON$. Since both $CWL'$ and MATHISON are recursively enumerable, their union $L$ is also recursively enumerable (because the class of recursively enumerable languages is closed under union).
* Observe that the complement of $L$ is exactly the language ALAN: $L'= ALAN$. But we know that ALAN is not recursively enumerable.
* Therefore, we have an example where: $L$ is recursively enumerable, but its complement $L'$ is not recursively enumerable.
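
The step $L' = ALAN$ follows from De Morgan's law, with all complements taken relative to the set of all strings over $\{a, b\}$:

$$
L' = (CWL' \cup MATHISON)' = CWL \cap MATHISON' = \{\, w \in CWL : w \notin MATHISON \,\} = ALAN
$$

The last equality holds because a CWL word outside MATHISON either does not represent a valid Turing Machine or is not accepted by the machine it represents, which is exactly the definition of ALAN.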

#### 3.4.5 Example Python Implementation

In [34]:
class MATHISONAnalyzer:
   def __init__(self):
       self.cwl_validator = CWLValidator()
       self.tm_decoder = TuringMachineDecoder()
       self.tm_decoder.debug = False
       self.max_steps = 10000  # Limit for simulation
   
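   def is_in_mathison(self, cwl_string, verbose=True):
       """Determine if a CWL string belongs to MATHISON.

       Caveat: the simulation below runs the decoded TM on a blank tape, not on
       cwl_string itself, so it tests halting on empty input. The definition in
       Section 3.4.1 instead requires the TM to accept its own encoding.
       """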
       if verbose:
           print(f"\nAnalyzing MATHISON membership for: {cwl_string[:30]}...")
           print("=" * 60)
       
       # Step 1: Verify it's in CWL
       if not self.cwl_validator.is_valid_cwl(cwl_string):
           if verbose:
               print("✗ Not in CWL, therefore not in MATHISON")
           return False, "not_cwl", 0
       
       if verbose:
           print("✓ String is in CWL")
       
       # Step 2: Try to decode as a TM
       try:
           transitions = self.tm_decoder.decode_complete_machine(cwl_string)
           if verbose:
               print(f"✓ Successfully decoded {len(transitions)} transitions")
       except Exception as e:
           if verbose:
               print(f"✗ Invalid TM encoding: {e}")
           return False, "invalid_tm", 0
       
       # Step 3: Check TM validity
       if not self._is_valid_tm(transitions):
           if verbose:
               print("✗ Not a valid TM (missing start/halt state)")
           return False, "invalid_structure", 0
       
       if verbose:
           print("✓ Represents a valid TM")
       
       # Step 4: Simulate on empty input
       halts, steps, crash_reason = self._simulate_on_empty_tape(transitions, verbose)
       
       if halts:
           if verbose:
               print(f"✓ TM halts on empty input after {steps} steps - IN MATHISON")
           return True, "halts", steps
       else:
           if crash_reason:
               if verbose:
                   print(f"✗ TM crashes: {crash_reason} - NOT IN MATHISON")
               return False, crash_reason, steps
           else:
               if verbose:
                   print(f"✗ TM does not halt within {self.max_steps} steps - NOT IN MATHISON")
               return False, "timeout", steps
   
   def _is_valid_tm(self, transitions):
       """Check if transitions form a valid TM"""
       states = set()
       for t in transitions:
           states.add(t['from'])
           states.add(t['to'])
       
       return 1 in states and 2 in states
   
   def _simulate_on_empty_tape(self, transitions, verbose=False):
       """Simulate TM on empty tape, return (halts, steps, crash_reason)"""
       # Build transition dictionary
       trans_dict = {}
       for t in transitions:
           trans_dict[(t['from'], t['read'])] = (t['to'], t['write'], t['move'])
       
       # Initialize tape with blanks
       tape = ['Δ'] * 10000  # Large tape
       head = 1  # Start at cell 1 (not 0!)
       state = 1  # Start state
       steps = 0
       
       if verbose:
           print("\nSimulating on empty tape...")
           print("  Starting at cell 1")
       
       visited_configs = set()  # For cycle detection
       
       while steps < self.max_steps:
           # Create configuration signature for cycle detection
           tape_segment = ''.join(tape[max(0, head-10):min(len(tape), head+11)])
           config = (state, head, tape_segment)
           
           if config in visited_configs:
               if verbose:
                   print(f"  Cycle detected at step {steps}")
               return False, steps, None
           visited_configs.add(config)
           
           # Read current symbol
           current_symbol = tape[head]
           
           # Find transition
           key = (state, current_symbol)
           if key not in trans_dict:
               if verbose:
                   print(f"  No transition for ({state}, '{current_symbol}') at step {steps}")
               # Check if we're in halt state with no transition
               if state == 2:
                   return True, steps, None
               return False, steps, None
           
           next_state, write_symbol, direction = trans_dict[key]
           
           # Check for crash BEFORE executing the transition
           if direction == 'L' and head == 1:
               # Attempting to move left from cell 1 - CRASH!
               if verbose:
                   print(f"  CRASH at step {steps}: Attempted to move LEFT from cell 1")
                   print(f"  Transition was: ({state}, '{current_symbol}') → ({next_state}, '{write_symbol}', L)")
                   if next_state == 2:
                       print(f"  Even though next state is halt state 2, machine crashes and rejects")
               return False, steps, "left_boundary_crash"
           
           # Execute transition
           tape[head] = write_symbol
           state = next_state
           
           # Move head
           if direction == 'L':
               head -= 1  # We already checked it won't go below 1
           elif direction == 'R':
               head += 1
               if head >= len(tape):
                   # Extend tape if needed
                   tape.extend(['Δ'] * 1000)
           
           steps += 1
           
           # Check if halted AFTER complete transition execution
           if state == 2:
               # Check if there are any transitions from state 2
               has_transitions_from_2 = any(t['from'] == 2 for t in transitions)
               if not has_transitions_from_2:
                   return True, steps, None
               # If there are transitions from state 2, continue execution
           
           if verbose and steps % 1000 == 0:
               print(f"  Step {steps}: State {state}, Head at cell {head}")
       
       return False, steps, None

In [35]:
# Examples including crash cases
def demonstrate_mathison_examples():
    """Show various examples of TMs and their MATHISON membership"""
    print("\nEXAMPLES OF MATHISON MEMBERSHIP")
    print("=" * 60)
    
    # Encodings assume the symbol codes a=aa, b=ab, Δ=ba, #=bb and moves L=a, R=b
    examples = [
        {
            'name': 'Immediate Halt (Right)',
            'encoding': 'abaabbabbb',  # (1,Δ)→(2,#,R)
            'description': 'Halts immediately on blank, moving right',
            'expected': True
        },
        {
            'name': 'Crash on Left Move',
            'encoding': 'abaabbaaba',  # (1,Δ)→(2,b,L)
            'description': 'Tries to move left from cell 1 - CRASHES',
            'expected': False
        },
        {
            'name': 'Write Then Halt',
            'encoding': 'abaaabbaabbaaabaabbabaa',  # (1,Δ)→(3,b,R), (3,Δ)→(2,Δ,L)
            'description': 'Writes b, moves right, then can safely move left',
            'expected': True
        },
        {
            'name': 'Infinite Right Scan',
            'encoding': 'ababbabab',  # (1,Δ)→(1,Δ,R)
            'description': 'Moves right forever on blanks',
            'expected': False
        },
        {
            'name': 'Unreachable Left Move',
            'encoding': 'ababbabababaabaaaaa',  # (1,Δ)→(1,Δ,R), (1,a)→(2,a,L)
            'description': 'Moves right on blanks; the left move on "a" never fires',
            'expected': False  # Never finds 'a' on empty tape, so loops forever
        },
        {
            'name': 'Safe Left Move',
            'encoding': 'ababbabababaababaaaabaabaaaaa',  # (1,Δ)→(1,Δ,R), (1,b)→(2,a,L), (1,a)→(2,a,L)
            'description': 'Moves right first, then left (but never triggers on empty tape)',
            'expected': False  # Infinite right scan on empty tape
        }
    ]
    
    for ex in examples:
        print(f"\n{ex['name']}: {ex['description']}")
        print(f"Encoding: {ex['encoding']} (length: {len(ex['encoding'])})")
        
        # First check if it's valid CWL
        if mathison_analyzer.cwl_validator.is_valid_cwl(ex['encoding']):
            print("✓ Valid CWL encoding")
            
            # Try to decode to show the transition
            try:
                transitions = mathison_analyzer.tm_decoder.decode_complete_machine(ex['encoding'])
                for t in transitions:
                    print(f"  Transition: ({t['from']}, '{t['read']}') → ({t['to']}, '{t['write']}', {t['move']})")
            except Exception:
                print("  (Unable to decode)")
        else:
            print("✗ Not a valid CWL encoding")
        
        result, reason, steps = mathison_analyzer.is_in_mathison(ex['encoding'], verbose=False)
        print(f"Expected in MATHISON: {ex['expected']}")
        print(f"Actual result: {'IN MATHISON' if result else 'NOT IN MATHISON'} ({reason}, {steps} steps)")
        print("✓ Correct" if result == ex['expected'] else "✗ Incorrect")

demonstrate_mathison_examples()



DEMONSTRATING TAPE HEAD CRASH BEHAVIOR
Crash Example: TM that tries to move LEFT from cell 1
Encoding: abaabbabba
Transition: (1,Δ) → (2,Δ,L)

Expected: Should crash and reject (NOT in MATHISON)

Analyzing MATHISON membership for: abaabbabba...
✓ String is in CWL

Decoding Turing Machine
Total encoding length: 10 characters

Transition 1: (1, 'Δ') → (2, '#', L)

Successfully decoded single transition

Decoded TM contains a single transition
✓ Successfully decoded 1 transitions
✓ Represents a valid TM

Simulating on empty tape...
✓ TM halts on empty input after 1 steps - IN MATHISON

Result: IN MATHISON
Reason: halts
This TM crashes immediately when trying to move left from the starting position


Comparison: Similar TM that moves RIGHT
Encoding: abaabbabbb
Transition: (1,Δ) → (2,Δ,R)

Expected: Should halt normally (IN MATHISON)

Analyzing MATHISON membership for: abaabbabbb...
✓ String is in CWL

Decoding Turing Machine
Total encoding length: 10 characters

Transition 1: (1, 'Δ') → 

## 4. Practice Exercises
### 4.1 Exercise 1: State Numbering Convention
Given the following Turing machine with descriptive state names, apply the standard numbering convention:

* States: q_init, q_loop, q_check, q_final, q_process
* Start state: q_init
* Halt state: q_final
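
The numbering convention can be sketched as a small helper. This is a minimal illustration, not part of the notebook's own classes: the function name `number_states` and the example state names are hypothetical, and the non-special states are numbered in the order they are listed (the convention allows any order).

```python
def number_states(states, start, halt):
    """Return {state_name: numeric_code} with start -> 1, halt -> 2, others from 3."""
    if start == halt:
        raise ValueError("start and halt states must be distinct")
    missing = {start, halt} - set(states)
    if missing:
        raise ValueError(f"unknown state(s): {sorted(missing)}")

    numbering = {start: 1, halt: 2}
    next_code = 3
    for name in states:  # remaining states keep their listed order
        if name not in numbering:
            numbering[name] = next_code
            next_code += 1
    return numbering


# Hypothetical three-state machine, not the one from the exercise
codes = number_states(["s_begin", "s_scan", "s_done"], start="s_begin", halt="s_done")
print(codes)  # {'s_begin': 1, 's_done': 2, 's_scan': 3}
```

Applying the same helper to the exercise's state list is one way to check a hand-written answer.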

### 4.2 Exercise 2: Encode a Single Transition
Encode the following transition using the 'a' and 'b' encoding scheme:

Transition: (State 3, symbol 'b') → (State 5, symbol 'a', move Left)
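
A single-transition encoder can be sketched as follows. It assumes the symbol codes a = aa, b = ab, Δ = ba, # = bb and the move codes L = a, R = b (the table from Cohen's Chapter 23); if the table defined earlier in this notebook differs, the two dictionaries need adjusting. The names `encode_transition` and `encode_machine` are hypothetical helpers, and the example encodes a different transition from the one in the exercise.

```python
# Assumed code tables (see the lead-in); adjust if the notebook's table differs
SYMBOL_CODES = {"a": "aa", "b": "ab", "Δ": "ba", "#": "bb"}
MOVE_CODES = {"L": "a", "R": "b"}


def encode_transition(from_state, to_state, read, write, move):
    """Encode one transition as a^from b a^to b <read> <write> <move>."""
    if from_state < 1 or to_state < 1:
        raise ValueError("state numbers start at 1")
    return (
        "a" * from_state + "b"
        + "a" * to_state + "b"
        + SYMBOL_CODES[read]
        + SYMBOL_CODES[write]
        + MOVE_CODES[move]
    )


def encode_machine(transitions):
    """Concatenate the row encodings of (from, to, read, write, move) tuples."""
    return "".join(encode_transition(*row) for row in transitions)


print(encode_transition(1, 2, "Δ", "#", "L"))  # abaabbabba
```

Each encoded row has length from + to + 7, so the shortest possible row (states 1 and 1) is 9 characters long.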

### 4.3 Exercise 3: Transition Table Creation
Create a transition table for a TM that:

* Starts in state 1
* If it reads 'a', it changes it to 'b' and halts (state 2)
* If it reads 'b', it moves right staying in state 1
* If it reads blank (Δ), it writes 'a' and halts

Task: Create the table with columns: From, To, Read, Write, Move
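
One possible way to hold such a table in code is a list of rows keyed by the five column names. The sketch below is illustrative only: `format_table` is a hypothetical helper, and the machine shown (skip over a's, halt on the first blank) is deliberately not the one described in the exercise.

```python
COLUMNS = ("From", "To", "Read", "Write", "Move")

# Hypothetical machine (not the exercise's): skip over a's, halt on the first blank
rows = [
    {"From": 1, "To": 1, "Read": "a", "Write": "a", "Move": "R"},
    {"From": 1, "To": 2, "Read": "Δ", "Write": "Δ", "Move": "R"},
]


def format_table(rows):
    """Render transition rows as a fixed-width text table."""
    header = " | ".join(f"{name:<5}" for name in COLUMNS)
    lines = [header, "-" * len(header)]
    for row in rows:
        lines.append(" | ".join(f"{str(row[name]):<5}" for name in COLUMNS))
    return "\n".join(lines)


print(format_table(rows))
```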

### 4.4 Exercise 4: Decode a Complete Transition
Decode the following string: $aabaaabbabaa$
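
Decoding reverses the encoding row by row. The sketch below assumes the same code tables as Cohen's Chapter 23 (aa = a, ab = b, ba = Δ, bb = #; a = L, b = R); `decode_machine` is a hypothetical stand-alone function and is not the notebook's `tm_decoder`. The example string is not the one from the exercise.

```python
import re

# Assumed code tables; adjust if the notebook's table differs
SYMBOLS = {"aa": "a", "ab": "b", "ba": "Δ", "bb": "#"}
MOVES = {"a": "L", "b": "R"}

# One row: a^from b a^to b, two symbol codes, one move code
ROW = re.compile(r"(a+)b(a+)b([ab]{2})([ab]{2})([ab])")


def decode_machine(encoding):
    """Return a list of (from, to, read, write, move) tuples, or raise ValueError."""
    transitions = []
    pos = 0
    while pos < len(encoding):
        match = ROW.match(encoding, pos)
        if match is None:
            raise ValueError(f"no valid transition starts at offset {pos}")
        frm, to, read, write, move = match.groups()
        transitions.append((len(frm), len(to), SYMBOLS[read], SYMBOLS[write], MOVES[move]))
        pos = match.end()
    return transitions


print(decode_machine("abaabbabba"))  # [(1, 2, 'Δ', '#', 'L')]
```

Because the two state fields are each terminated by a `b` and the remaining five characters have fixed width, every string has at most one valid decoding.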

### 4.5 Exercise 5: Decode Multiple Transitions
Decode the following string: $ababbabaabaabaabbaba$

### 4.6 Exercise 6: CWL Membership
Determine if the following strings belong to CWL. If not, explain why:

1. abaabbabbb
2. abcabbabbb
3. abaabbabb
4. bbaabbabbb
5. abaabbabbbaabaabbaba
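
Membership in CWL can be checked mechanically with a regular expression. The sketch below translates the pattern $(a^+ba^+b(a+b)^5)^*$ directly into Python's `re` syntax; `in_cwl` is a hypothetical helper rather than the notebook's `cwl_validator`, and the sample strings are not the ones listed in the exercise.

```python
import re

# (a+ b a+ b (a|b)^5)* written in Python regex syntax
CWL_PATTERN = re.compile(r"(?:a+ba+b[ab]{5})*")


def in_cwl(word):
    """True if the whole word matches the CWL pattern."""
    return CWL_PATTERN.fullmatch(word) is not None


for sample in ["ababbabab", "ababbaba", "bbabbabab", ""]:
    print(repr(sample), in_cwl(sample))
```

Note that the empty string matches, because the pattern is closed under the Kleene star; whether it should count as a machine encoding is a separate question from CWL membership.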

### 4.7 Exercise 7: CWL Pattern Analysis
Given that $CWL = (a^+ba^+b(a+b)^5)^*$, explain why a string of length 15 might or might not be in CWL.

### 4.8 Exercise 8: ALAN Membership Analysis
For each encoding below, determine if it belongs to ALAN and explain why:

1. aaabaaabbabbb (TM with no state 1)
2. ababbabbb (TM that loops in state 1)
3. abaabbaba (TM that halts immediately)

### 4.9 Exercise 9: Self-Reference Problem
Consider the TM encoded by $abaabbaaabaaabaaababbb$:

* Does this TM accept its own encoding?
* Is this encoding in ALAN?
* Trace the first 5 steps of execution when the TM runs on its own encoding
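
Running a machine on its own encoding only requires loading the encoding string onto the tape before simulating. The following is a minimal sketch under stated assumptions: `run_tm` is a hypothetical helper (not the notebook's simulator), transitions are `(from, to, read, write, move)` tuples, the head starts at cell 1, state 2 is treated as halting unconditionally, and both a missing transition and a left move from cell 1 count as a crash. The example machine is not the one from the exercise.

```python
def run_tm(transitions, tape_input="", max_steps=1000):
    """Simulate a TM; return ('accept' | 'crash' | 'timeout', steps_taken)."""
    table = {(frm, read): (to, write, move) for frm, to, read, write, move in transitions}
    tape = dict(enumerate(tape_input, start=1))  # cell index -> symbol; blanks are implicit
    state, head = 1, 1

    for steps in range(max_steps):
        if state == 2:
            return "accept", steps
        symbol = tape.get(head, "Δ")
        if (state, symbol) not in table:
            return "crash", steps          # no applicable transition
        to, write, move = table[(state, symbol)]
        if move == "L" and head == 1:
            return "crash", steps          # would fall off the left end of the tape
        tape[head] = write
        state = to
        head += 1 if move == "R" else -1

    return ("accept", max_steps) if state == 2 else ("timeout", max_steps)


halt_right = [(1, 2, "Δ", "#", "R")]        # encoded as abaabbabbb
print(run_tm(halt_right, ""))               # ('accept', 1)
print(run_tm(halt_right, "abaabbabbb"))     # ('crash', 0)
```

In the second call the machine reads `a` in state 1, has no matching transition, and crashes, so it does not accept its own encoding. A `timeout` result is inconclusive: it does not prove that the machine loops forever.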

### 4.10 Exercise 10: MATHISON Quick Check
Without full simulation, determine which encodings likely belong to MATHISON:

1. abaabbaab
2. ababbaabb
3. abaaabbaaabaaabaabbaba
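
On an empty tape the first step is fully determined by the transition for (state 1, Δ), which often settles the question without a full simulation. The sketch below is a heuristic under stated assumptions: `first_step_verdict` is a hypothetical helper, transitions are already decoded into `(from, to, read, write, move)` tuples, and state 2 is assumed to have no outgoing transitions.

```python
def first_step_verdict(transitions):
    """Classify a TM by its first move on an all-blank tape."""
    first_move = None
    for frm, to, read, write, move in transitions:
        if frm == 1 and read == "Δ":
            first_move = (to, move)  # a later duplicate overrides an earlier one

    if first_move is None:
        return "not in MATHISON"     # no rule for (1, Δ): the machine crashes
    to, move = first_move
    if move == "L":
        return "not in MATHISON"     # left move from cell 1 crashes
    if to == 2:
        return "in MATHISON"         # reaches the halt state in one step
    if to == 1:
        return "not in MATHISON"     # keeps reading fresh blanks in state 1 forever
    return "needs simulation"


print(first_step_verdict([(1, 2, "Δ", "#", "R")]))  # in MATHISON
print(first_step_verdict([(1, 1, "Δ", "b", "R")]))  # not in MATHISON
```

Strings that are not valid CWL words never reach this check, since they do not decode to a machine in the first place.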


## 5. Further Reading
* "Introduction to the Theory of Computation" by Michael Sipser, Chapter 3
* "Introduction to Computer Theory" by Daniel I.A. Cohen, Chapter 23
* "Introduction to Automata Theory, Languages, and Computation" by Hopcroft, Motwani, and Ullman, Chapter 9