# Transduction Tutorial

This notebook walks through the core concepts and APIs of the `transduction`
library:

1. **Building FSTs**: define finite-state transducers
2. **Decomposition**: compute the quotient and remainder for a target prefix
3. **TransducedLM**: push a language model forward through an FST
4. **Decoding**: greedy and sampled generation
5. **FSA Operations**: regular-language operations on acceptors

## 1. Building an FST

An FST maps **source** strings to **target** strings.  Each arc carries an
input label (source side) and an output label (target side).  The empty
string `''` represents epsilon (no symbol consumed/produced).
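To make the arc semantics concrete, here is a minimal pure-Python sketch (not the library's implementation) that runs one source string through an FST given as a list of `(state, input, output, next_state)` arcs. The tuple layout, the function name `transduce`, and the breadth-first search over configurations are our own assumptions; `''` plays the role of epsilon, as in the notebook.

```python
from collections import deque

EPS = ''  # epsilon label, as in the notebook

def transduce(arcs, starts, stops, source):
    """Return every target string the FST can emit while consuming all of `source`.

    arcs: list of (state, input, output, next_state); EPS on either side means
    "no symbol". Assumes no cycle of epsilon-input arcs with non-empty output
    (such a cycle would make the output set infinite).
    """
    outputs = set()
    seen = set()
    queue = deque((q, 0, '') for q in starts)  # configurations (state, pos, output)
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, out = config
        if pos == len(source) and state in stops:
            outputs.add(out)
        for src, inp, outp, dst in arcs:
            if src != state:
                continue
            if inp == EPS:                                  # consume nothing
                queue.append((dst, pos, out + outp))
            elif pos < len(source) and source[pos] == inp:  # consume one symbol
                queue.append((dst, pos + 1, out + outp))
    return outputs

# Lowercase normalizer for {a, b}: one state, both cases map to lowercase.
lower_arcs = [(0, ch, ch, 0) for ch in 'ab'] + [(0, ch.upper(), ch, 0) for ch in 'ab']
print(transduce(lower_arcs, {0}, {0}, 'AbB'))  # {'abb'}

# An arc with empty output deletes its input symbol: drop every 'b'.
drop_b = [(0, 'a', 'a', 0), (0, 'b', EPS, 0)]
print(transduce(drop_b, {0}, {0}, 'abab'))     # {'aa'}
```

Tracking whole configurations `(state, position, output)` is what lets epsilon-input arcs emit output without consuming a source symbol.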

In [None]:
from transduction.fst import FST, EPSILON

# Build an FST by hand: lowercase normalizer for {a, b}
# Maps both 'A'->'a' and 'a'->'a' (and similarly for b)
fst = FST()
fst.add_start(0)
fst.add_stop(0)
for ch in 'ab':
    fst.add_arc(0, ch.lower(), ch.lower(), 0)  # 'a' -> 'a'
    fst.add_arc(0, ch.upper(), ch.lower(), 0)  # 'A' -> 'a'

fst  # renders as a Graphviz diagram in Jupyter

In [None]:
from transduction.viz import display_table

# Show the FST's relation (all source/target pairs up to length 3) as a rich table
pairs = sorted(fst.relation(3))
display_table(
    [[repr(src), repr(tgt)] for src, tgt in pairs],
    headings=['Source', 'Target'],
)
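For a functional, letter-to-letter FST such as the lowercase normalizer, the relation shown in the table can be enumerated by brute force. The sketch below assumes the length bound applies to the source string (for this FST, source and target lengths coincide anyway) and models the normalizer as Python's `str.lower`; `relation_up_to` is a hypothetical helper, not part of `transduction`.

```python
from itertools import product

def relation_up_to(alphabet, max_len, f):
    """Sorted (source, target) pairs for every source string of length <= max_len.

    `f` maps one source string to its single target string, so this only
    models functional FSTs such as the lowercase normalizer.
    """
    pairs = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            src = ''.join(chars)
            pairs.append((src, f(src)))
    return sorted(pairs)

pairs = relation_up_to('abAB', 3, str.lower)
print(len(pairs))   # 85 = 1 + 4 + 16 + 64
print(pairs[:3])    # [('', ''), ('A', 'a'), ('AA', 'aa')]
```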

There are also convenience constructors:

In [None]:
from IPython.display import display

# FST.from_string: identity transducer for a fixed string
id_fst = FST.from_string('hello')
display(id_fst)

# FST.from_pairs: mapping from (input, output) symbol pairs
replace_fst = FST.from_pairs([('a', 'x'), ('b', 'y')])
display(replace_fst)

The `examples` module provides several pre-built FSTs for testing:

In [None]:
from transduction import examples

# An FST with interesting decomposition behavior
fst = examples.small()
fst

In [None]:
# Its relation (all source/target pairs up to length 4)
display_table(
    [[repr(s), repr(t)] for s, t in sorted(fst.relation(4))],
    headings=['Source', 'Target'],
)

## 2. Decomposition: Quotient and Remainder

Given a target prefix **y**, the *precover decomposition* splits the set of
source strings that produce output beginning with **y** into:

- **Quotient** Q(y): sources that produced **y** and can still continue.
- **Remainder** R(y): sources that produced **y** and have terminated.
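For reference, here is the decomposition written out as in the usual precover formulation; we assume, without having checked the library's internals, that `transduction` follows it. Write $\mathcal{X}$ for the source alphabet, $f(\mathbf{x})$ for the target string produced by a complete source string $\mathbf{x}$ (assuming a functional FST), and $\preceq$ for "is a prefix of". With $Q(\mathbf{y})$ prefix-free and disjoint from $R(\mathbf{y})$:

$$
\mathcal{P}(\mathbf{y}) \;=\; \{\, \mathbf{x} \in \mathcal{X}^* : \mathbf{y} \preceq f(\mathbf{x}) \,\} \;=\; Q(\mathbf{y})\,\mathcal{X}^* \,\sqcup\, R(\mathbf{y})
$$

For a source language model with $\mathbf{x} \sim p$, where $p(\mathbf{x}')$ is the probability of generating exactly $\mathbf{x}'$ and $\overrightarrow{p}(\mathbf{x}')$ the probability of generating a string that begins with $\mathbf{x}'$, the target prefix probability splits the same way:

$$
\Pr\big[\mathbf{y} \preceq f(\mathbf{x})\big] \;=\; \sum_{\mathbf{x}' \in Q(\mathbf{y})} \overrightarrow{p}(\mathbf{x}') \;+\; \sum_{\mathbf{x}' \in R(\mathbf{y})} p(\mathbf{x}')
$$

Under this reading, every extension of a quotient string stays in the precover, while a remainder string counts only if the source stops exactly there, which is why Q is weighted by prefix probabilities and R by complete-string probabilities.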

Q and R are represented as **FSAs** (finite-state acceptors) over source
strings.  Let's visualize them:

In [None]:
from transduction.rust_bridge import RustDecomp
from IPython.display import display, HTML

result = RustDecomp(fst, 'x')

display(HTML('<h4>Quotient Q("x") — sources that can still continue:</h4>'))
display(result.quotient)

display(HTML('<h4>Remainder R("x") — sources that have terminated:</h4>'))
display(result.remainder)

The `>>` operator extends the target prefix **incrementally**, reusing
computation from the previous step:

In [None]:
from transduction.rust_bridge import RustDirtyState

state = RustDirtyState(fst)
state = state >> 'x'

display(HTML('<h4>After target "x" — Quotient:</h4>'))
display(state.quotient)
display(HTML('<h4>After target "x" — Remainder:</h4>'))
display(state.remainder)

In [None]:
state2 = state >> 'a'

display(HTML('<h4>After target "xa" — Quotient:</h4>'))
display(state2.quotient)
display(HTML('<h4>After target "xa" — Remainder:</h4>'))
display(state2.remainder)  # empty — no source string terminates here
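The incremental `>>` pattern can be illustrated with a much simpler, hypothetical `PrefixState` class. This sketch assumes every FST arc emits exactly one target symbol and records only the set of FST states reachable by source paths whose output is exactly the current target prefix; it is not how `RustDirtyState` works internally. What it shares with the notebook's API is the shape: `>>` builds a new state from its parent's data alone, and the parent remains valid afterwards.

```python
class PrefixState:
    """Hypothetical immutable state for a target prefix.

    `states` holds the FST states reachable by some source path whose output is
    exactly `prefix`. Arcs are (src, input, output, dst) with a one-symbol output.
    """

    def __init__(self, arcs, states, prefix=''):
        self.arcs = arcs
        self.states = frozenset(states)
        self.prefix = prefix

    def __rshift__(self, y):
        # Extend by one target symbol using only the parent's state set.
        nxt = {dst for (src, _inp, out, dst) in self.arcs
               if src in self.states and out == y}
        return PrefixState(self.arcs, nxt, self.prefix + y)

    def __repr__(self):
        return f'PrefixState({self.prefix!r}, states={sorted(self.states)})'


# Toy FST: 'a' or 'b' first emits 'x', then each branch copies its own letter.
arcs = [(0, 'a', 'x', 1), (0, 'b', 'x', 2), (1, 'a', 'a', 1), (2, 'b', 'b', 2)]
s0 = PrefixState(arcs, {0})
s1 = s0 >> 'x'
s2 = s1 >> 'a'
print(s1)  # PrefixState('x', states=[1, 2])
print(s2)  # PrefixState('xa', states=[1])
```

Because `s1` is left untouched by `s1 >> 'a'`, one can branch from it again (say `s1 >> 'b'`) without redoing `s0 >> 'x'`; an empty state set means no source string produces that target prefix.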

## 3. TransducedLM

The `TransducedLM` computes the **pushforward** of an inner language model
through an FST.  It maintains a beam of K particles (source-prefix
hypotheses) and uses the decomposition to score each next target symbol.

The API mirrors the inner LM:
```python
state = tlm >> 'h'          # advance by target symbol
p = state.logp_next['e']    # log P(e | target_so_far)
```
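To illustrate the beam idea, here is a toy sketch in plain Python. It swaps in deterministic top-K pruning of weighted source-prefix hypotheses for the library's particle scheme, ignores end-of-string, and uses a hypothetical bigram table `BIGRAM` as the source LM and lowercasing as the FST; none of these names come from `transduction`.

```python
import math

# Hypothetical toy source LM: next-symbol probabilities keyed on the last symbol.
BIGRAM = {
    '':  {'h': 0.3, 'H': 0.5, 'e': 0.2},
    'h': {'e': 0.9, 'E': 0.1},
    'H': {'e': 0.6, 'H': 0.4},
    'e': {'h': 0.5, 'e': 0.5},
    'E': {'h': 0.5, 'e': 0.5},
}

def source_next(prefix):
    return BIGRAM[prefix[-1:]]  # prefix[-1:] is '' for the empty prefix

def fst_map(c):
    return c.lower()  # the toy FST: lowercase each source symbol

def condition(target, K):
    """Top-K (source_prefix, log_weight) hypotheses whose output equals `target`."""
    beam = [('', 0.0)]
    for y in target:
        extended = [(x + c, w + math.log(p))
                    for x, w in beam
                    for c, p in source_next(x).items()
                    if fst_map(c) == y]
        beam = sorted(extended, key=lambda xw: xw[1], reverse=True)[:K]
    return beam

def next_target_dist(beam):
    """Estimate P(next target symbol | target prefix) from the surviving beam."""
    scores = {}
    for x, w in beam:
        for c, p in source_next(x).items():
            y = fst_map(c)
            scores[y] = scores.get(y, 0.0) + math.exp(w) * p
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()} if total > 0 else {}

print(next_target_dist(condition('', K=10)))   # h ≈ 0.8 (0.3 + 0.5), e ≈ 0.2
print(next_target_dist(condition('h', K=10)))  # e ≈ 0.75, h ≈ 0.25
print(next_target_dist(condition('h', K=1)))   # e ≈ 0.6, h ≈ 0.4 (only 'H' kept)
```

With `K=1` only the `'H'` hypothesis survives and the estimate shifts, which is the usual trade-off: a larger beam costs more computation but tracks the exact pushforward more closely.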

In [None]:
import numpy as np
from transduction.lm.ngram import CharNgramLM
from transduction.lm.transduced import TransducedLM

# Train a character-level n-gram LM on mixed-case text
inner_lm = CharNgramLM.train('Hello World hello world the hero held', n=3)

# Build the transduced LM (lowercase FST, K=50 particles)
fst = examples.lowercase()
tlm = TransducedLM(inner_lm, fst, K=50)
tlm

In [None]:
# The key insight: P_target('h') = P_source('h') + P_source('H')
# because both map to the same target symbol through the FST.
s0 = inner_lm.initial()
p_h = np.exp(s0.logp_next['h'])
p_H = np.exp(s0.logp_next['H'])

state = tlm.initial()
p_target_h = np.exp(state.logp_next['h'])

display_table(
    [
        ['P_source(h)', f'{p_h:.4f}'],
        ['P_source(H)', f'{p_H:.4f}'],
        ['sum', f'{p_h + p_H:.4f}'],
        ['P_target(h)', f'{p_target_h:.4f}'],
    ],
    headings=['Quantity', 'Value'],
)
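The same identity holds for whole strings: the pushforward assigns each target string the total probability of the source strings that map to it. The exact sketch below uses a hypothetical finite source distribution `P_SOURCE` and `str.lower` as the FST, and reproduces the first-symbol check from the cell above.

```python
from collections import defaultdict

# Hypothetical finite source distribution over complete strings (sums to 1).
P_SOURCE = {'he': 0.25, 'He': 0.30, 'HE': 0.10, 'hi': 0.15, 'Hi': 0.10, 'eh': 0.10}

def pushforward(p_source, f):
    """Exact pushforward: P_target(y) = sum of P_source(x) over all x with f(x) == y."""
    p_target = defaultdict(float)
    for x, p in p_source.items():
        p_target[f(x)] += p
    return dict(p_target)

def prefix_prob(dist, prefix):
    """Probability that a string drawn from `dist` starts with `prefix`."""
    return sum(p for s, p in dist.items() if s.startswith(prefix))

p_target = pushforward(P_SOURCE, str.lower)
print(p_target)  # 'he' ≈ 0.65, 'hi' ≈ 0.25, 'eh' ≈ 0.10

# First-symbol check: P_target(h) = P_source(h...) + P_source(H...)
print(prefix_prob(P_SOURCE, 'h') + prefix_prob(P_SOURCE, 'H'))  # ≈ 0.9 (0.40 + 0.50)
print(prefix_prob(p_target, 'h'))                                # ≈ 0.9

# Next-symbol conditionals are ratios of target prefix probabilities.
print(prefix_prob(p_target, 'he') / prefix_prob(p_target, 'h'))  # ≈ 0.722
```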

In [None]:
# Condition on target prefix "he" — TransducedState has rich HTML display
# showing particles, DFA states, and the next-symbol distribution.
state = tlm >> 'h' >> 'e'
state  # _repr_html_ renders particle table + logp_next distribution

## 4. Decoding

States of a `TransducedLM` support greedy and sampled decoding out of the box via `greedy_decode` and `sample_decode`:

In [None]:
# Greedy decode
tokens = tlm.initial().greedy_decode(max_len=20)
display(HTML(f'<b>Greedy:</b> <code>{"".join(tokens)}</code>'))

In [None]:
# Sample decode
np.random.seed(42)
samples = []
for i in range(5):
    tokens = tlm.initial().sample_decode(max_len=20)
    samples.append([str(i+1), repr(''.join(tokens))])

display_table(samples, headings=['#', 'Sample'])
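Both decoding loops follow the same shape: query the next-symbol distribution for the current prefix, pick a symbol, and stop at an end marker or `max_len`. The sketch below is a hypothetical stand-in, not the library's methods: `greedy`, `sample`, the `EOS` marker, and the toy distribution `toy_next` are all our own assumptions. It passes an explicit `random.Random` so sampling is reproducible without seeding a global generator.

```python
import random

EOS = '<eos>'  # hypothetical end-of-string marker

def greedy(next_dist, max_len):
    """Repeatedly append the most probable next symbol until EOS or max_len."""
    out = []
    for _ in range(max_len):
        dist = next_dist(''.join(out))
        y = max(dist, key=dist.get)
        if y == EOS:
            break
        out.append(y)
    return out

def sample(next_dist, max_len, rng):
    """Like greedy, but draw each symbol from the next-symbol distribution."""
    out = []
    for _ in range(max_len):
        dist = next_dist(''.join(out))
        symbols = list(dist)
        y = rng.choices(symbols, weights=[dist[s] for s in symbols])[0]
        if y == EOS:
            break
        out.append(y)
    return out

def toy_next(prefix):
    # Hypothetical target-side distribution: favour 'a' for three steps, then stop.
    if len(prefix) < 3:
        return {'a': 0.7, 'b': 0.2, EOS: 0.1}
    return {'a': 0.1, EOS: 0.9}

print(''.join(greedy(toy_next, max_len=20)))  # aaa
rng = random.Random(42)
print([''.join(sample(toy_next, 20, rng)) for _ in range(3)])
```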

## 5. FSA Operations

The `FSA` class supports the standard regular-language operations: union
(`+`), concatenation (`*`), Kleene star, determinization, minimization, and
equivalence checking.  The decomposition algorithms use these internally, but
they are also useful on their own.  All FSAs render as Graphviz diagrams:

In [None]:
from transduction.fsa import FSA

a = FSA.from_string('ab')
b = FSA.from_string('cd')

display(HTML('<h4>FSA for "ab":</h4>'))
display(a)

display(HTML('<h4>Union (ab | cd):</h4>'))
display(a + b)

display(HTML('<h4>Concatenation (ab · cd):</h4>'))
display(a * b)

display(HTML('<h4>Kleene star (ab)*:</h4>'))
display(a.star())
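Union, concatenation, and star can be built with epsilon arcs in the style of Thompson's construction. The hypothetical `NFA` class below is not the library's `FSA`; it only mirrors the notebook's operator choices (`+` for union, `*` for concatenation). The right operand of a binary operation is copied with fresh state ids, so combining an automaton with itself (e.g. `ab * ab`) does not merge states by accident.

```python
from itertools import count

_fresh_ids = count()  # global supply of unused state ids

class NFA:
    """Hypothetical epsilon-NFA; arcs are (src, label, dst) and '' is epsilon."""

    def __init__(self, start, finals, arcs):
        self.start, self.finals, self.arcs = start, frozenset(finals), list(arcs)

    @classmethod
    def from_string(cls, s):
        qs = [next(_fresh_ids) for _ in range(len(s) + 1)]
        return cls(qs[0], {qs[-1]}, [(qs[i], ch, qs[i + 1]) for i, ch in enumerate(s)])

    def renamed(self):
        """Copy with brand-new state ids, so two operands never share a state."""
        mapping = {}
        def r(q):
            if q not in mapping:
                mapping[q] = next(_fresh_ids)
            return mapping[q]
        arcs = [(r(p), a, r(q)) for p, a, q in self.arcs]
        return NFA(r(self.start), {r(f) for f in self.finals}, arcs)

    def __add__(self, other):  # union: new start with epsilon arcs to both operands
        other = other.renamed()
        s = next(_fresh_ids)
        arcs = self.arcs + other.arcs + [(s, '', self.start), (s, '', other.start)]
        return NFA(s, self.finals | other.finals, arcs)

    def __mul__(self, other):  # concatenation: epsilon from our finals to other's start
        other = other.renamed()
        arcs = self.arcs + other.arcs + [(f, '', other.start) for f in self.finals]
        return NFA(self.start, other.finals, arcs)

    def star(self):  # Kleene star: new accepting start that loops through the operand
        s = next(_fresh_ids)
        arcs = self.arcs + [(s, '', self.start)] + [(f, '', s) for f in self.finals]
        return NFA(s, {s}, arcs)

    def _closure(self, states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for p, a, r in self.arcs:
                if p == q and a == '' and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    def accepts(self, s):
        current = self._closure({self.start})
        for ch in s:
            current = self._closure({r for p, a, r in self.arcs if p in current and a == ch})
        return bool(current & self.finals)

ab, cd = NFA.from_string('ab'), NFA.from_string('cd')
print((ab + cd).accepts('cd'), (ab * cd).accepts('abcd'), ab.star().accepts('abab'))  # True True True
```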

In [None]:
# Determinization and minimization
nfa = a + b
dfa = nfa.det()
minimal = dfa.min_fast()

display_table(
    [
        ['NFA (union)', str(len(nfa.states))],
        ['DFA (det)', str(len(dfa.states))],
        ['Minimal', str(len(minimal.states))],
        ['Languages equal?', str(dfa.equal(minimal))],
    ],
    headings=['Automaton', 'States / Result'],
)

display(HTML('<h4>Minimized DFA:</h4>'))
display(minimal)
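Determinization is classically done by the subset construction: each DFA state is the epsilon-closed set of NFA states reachable on the input read so far. The sketch below is a hypothetical, self-contained version over `(src, label, dst)` arcs with `''` as epsilon; it omits the dead state, and we make no claim that `FSA.det()` is implemented this way or produces the same state counts.

```python
def eps_closure(states, arcs):
    """All states reachable from `states` by epsilon ('') arcs."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, r in arcs:
            if p == q and label == '' and r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def determinize(start, finals, arcs, alphabet):
    """Subset construction: each DFA state is a set of NFA states (dead state omitted)."""
    d_start = eps_closure({start}, arcs)
    d_arcs, d_finals = {}, set()
    todo, seen = [d_start], {d_start}
    while todo:
        S = todo.pop()
        if S & finals:
            d_finals.add(S)
        for a in alphabet:
            T = eps_closure({r for p, label, r in arcs if p in S and label == a}, arcs)
            if not T:
                continue
            d_arcs[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return d_start, d_finals, d_arcs, seen

def dfa_accepts(dfa, s):
    state, finals, arcs, _ = dfa
    for ch in s:
        state = arcs.get((state, ch))
        if state is None:
            return False
    return state in finals

# Union of "ab" and "cd": new start 0 with epsilon arcs into each branch (7 NFA states).
arcs = [(0, '', 1), (0, '', 4), (1, 'a', 2), (2, 'b', 3), (4, 'c', 5), (5, 'd', 6)]
dfa = determinize(0, {3, 6}, arcs, 'abcd')
print(len(dfa[3]))  # 5
print(dfa_accepts(dfa, 'ab'), dfa_accepts(dfa, 'cd'), dfa_accepts(dfa, 'ac'))  # True True False
```

In this sketch the DFA states `{3}` and `{6}` are both accepting dead ends, so a minimizer would merge them, leaving 4 states.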