# 🔄 nanochat Backward/Bidirectional Language Models

This notebook lets you run inference with nanochat models trained in different directions:

| Model | Direction | Description |
|-------|-----------|-------------|
| `nanochat-760M-backward` | Backward | Right-to-left prediction |
| `nanochat-760M-bidirectional` | Bidirectional | Both directions with special tokens |
| `nanochat-760M-backward-sft` | Backward | Chat fine-tuned backward model |

**Repository:** [github.com/traghav/onanchat](https://github.com/traghav/onanchat)

---

## 1. Setup & Installation

In [9]:
# Install dependencies
!pip install -q torch huggingface_hub tiktoken maturin

# Note: rustbpe is a custom Rust tokenizer that needs to be built
# We'll build it after cloning the repo

In [10]:
# Clone the onanchat repository for model code
!git clone https://github.com/traghav/onanchat.git
%cd onanchat

# Build the rustbpe tokenizer (requires Rust)
!curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
import os
os.environ["PATH"] = f"/root/.cargo/bin:{os.environ['PATH']}"

# Build rustbpe as a Python module
%cd rustbpe
!maturin develop --release
%cd ..

Cloning into 'onanchat'...
remote: Enumerating objects: 897, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (85/85), done.
remote: Total 897 (delta 75), reused 56 (delta 33), pack-reused 778 (from 3)
Receiving objects: 100% (897/897), 568.84 KiB | 9.98 MiB/s, done.
Resolving deltas: 100% (547/547), done.
/content/onanchat/onanchat
info: downloading installer
info: profile set to 'default'
info: default host triple is x86_64-unknown-linux-gnu
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
info: latest update on 2025-11-10, rust version 1.91.1 (ed61e7d7e 2025-11-07)
info: downloading component 'cargo'
info: downloading component 'clippy'
info: downloading component 'rust-docs'
info: downloading component 'rust-std'
info: downloading component 'rustc'

In [11]:
import os
import sys
import torch
import pickle
from huggingface_hub import hf_hub_download

# Add to path
sys.path.insert(0, '.')

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.9.0+cu126
CUDA available: True
GPU: Tesla T4


## 2. Select and Download Model

Choose which model to use:

In [12]:
#@title Select Model
MODEL_CHOICE = "nanochat-760M-backward" #@param ["nanochat-760M-backward", "nanochat-760M-bidirectional", "nanochat-760M-backward-sft"]

REPO_ID = f"raghavt/{MODEL_CHOICE}"
print(f"Selected model: {REPO_ID}")

Selected model: raghavt/nanochat-760M-backward


In [13]:
# Download model files from HuggingFace
print("Downloading model files...")

model_path = hf_hub_download(repo_id=REPO_ID, filename="model.pt")
meta_path = hf_hub_download(repo_id=REPO_ID, filename="meta.json")
tokenizer_path = hf_hub_download(repo_id=REPO_ID, filename="tokenizer.pkl")
token_bytes_path = hf_hub_download(repo_id=REPO_ID, filename="token_bytes.pt")

print(f"✓ Model downloaded to: {model_path}")
print(f"✓ Metadata downloaded to: {meta_path}")
print(f"✓ Tokenizer downloaded to: {tokenizer_path}")

Downloading model files...
✓ Model downloaded to: /root/.cache/huggingface/hub/models--raghavt--nanochat-760M-backward/snapshots/ff53bd7dfc098077b8cc4c81b6ab79d4a257f192/model.pt
✓ Metadata downloaded to: /root/.cache/huggingface/hub/models--raghavt--nanochat-760M-backward/snapshots/ff53bd7dfc098077b8cc4c81b6ab79d4a257f192/meta.json
✓ Tokenizer downloaded to: /root/.cache/huggingface/hub/models--raghavt--nanochat-760M-backward/snapshots/ff53bd7dfc098077b8cc4c81b6ab79d4a257f192/tokenizer.pkl


## 3. Load Model and Tokenizer

In [14]:
import json
from nanochat.gpt import GPT, GPTConfig

# Load metadata
with open(meta_path, 'r') as f:
    meta = json.load(f)

direction = meta.get('direction', 'forward')
model_config = meta['model_config']

print(f"Model direction: {direction}")
print(f"Model config: {json.dumps(model_config, indent=2)}")

Model direction: backward
Model config: {
  "sequence_len": 2048,
  "vocab_size": 65536,
  "n_layer": 20,
  "n_head": 10,
  "n_kv_head": 10,
  "n_embd": 1280
}
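
As a rough sanity check, the parameter count can be estimated from this config alone. The sketch below assumes a standard GPT block layout: four `n_embd x n_embd` attention projections (valid here because `n_kv_head == n_head`), an MLP that is 4x wide, no biases or learnable norm weights, and untied input and output embeddings. These are assumptions about the architecture, not something read from `nanochat.gpt`, and `estimate_gpt_params` is a hypothetical helper name.

```python
def estimate_gpt_params(n_layer: int, n_embd: int, vocab_size: int) -> int:
    """Estimate parameter count under the assumptions stated above."""
    attn = 4 * n_embd * n_embd            # q, k, v and output projections
    mlp = 2 * n_embd * (4 * n_embd)       # up-projection and down-projection
    embeddings = 2 * vocab_size * n_embd  # token embedding + separate output head
    return n_layer * (attn + mlp) + embeddings

estimate = estimate_gpt_params(n_layer=20, n_embd=1280, vocab_size=65536)
print(f"Estimated parameters: {estimate / 1e6:.1f}M")
```

Under these assumptions the estimate comes to about 561.0M, which agrees with the count printed once the weights are loaded.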


In [15]:
# Set device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

# Create model
config = GPTConfig(
    sequence_len=model_config['sequence_len'],
    vocab_size=model_config['vocab_size'],
    n_layer=model_config['n_layer'],
    n_head=model_config['n_head'],
    n_kv_head=model_config['n_kv_head'],
    n_embd=model_config['n_embd']
)

model = GPT(config)

# Load weights
print("Loading model weights...")
state_dict = torch.load(model_path, map_location=device, weights_only=False)
model.load_state_dict(state_dict)
model = model.to(device)
model.eval()

# Count parameters
num_params = sum(p.numel() for p in model.parameters())
print(f"✓ Model loaded: {num_params/1e6:.1f}M parameters")

Using device: cuda
Loading model weights...
✓ Model loaded: 561.0M parameters



## 4. Inference Functions

In [18]:
@torch.no_grad()
def generate(
    model,
    tokenizer,
    prompt: str,
    max_new_tokens: int = 100,
    temperature: float = 0.8,
    top_k: int = 50,
    direction: str = "forward"
):
    """
    Generate text from a prompt.

    For backward models:
    - Input prompt is the "ending" of the text
    - Model generates what came "before"
    - Output is reversed for display
    """
    model.eval()

    # Encode prompt
    tokens = tokenizer.encode(prompt)

    # For backward models, reverse the input tokens
    if direction == "backward":
        tokens = tokens[::-1]

    # For bidirectional, add direction token
    if direction == "bidirectional":
        # Use forward direction by default for bidirectional
        forward_token = tokenizer.get_forward_token_id()
        tokens = [forward_token] + tokens

    # Add BOS token
    bos_token = tokenizer.get_bos_token_id()
    tokens = [bos_token] + tokens

    # Convert to tensor
    x = torch.tensor([tokens], dtype=torch.long, device=device)

    # Generate
    generated_tokens = []
    for _ in range(max_new_tokens):
        # Get logits
        logits = model(x)[0, -1, :]  # Last position logits

        # Apply temperature
        logits = logits / temperature

        # Top-k sampling
        if top_k > 0:
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[-1]] = float('-inf')

        # Sample
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        generated_tokens.append(next_token.item())
        x = torch.cat([x, next_token.unsqueeze(0)], dim=1)

        # Truncate if too long
        if x.size(1) > config.sequence_len:
            x = x[:, -config.sequence_len:]

    # Decode generated tokens
    if direction == "backward":
        # Reverse the generated tokens for display
        generated_tokens = generated_tokens[::-1]
        generated_text = tokenizer.decode(generated_tokens)
        return generated_text + prompt  # Generated text comes before prompt
    else:
        generated_text = tokenizer.decode(generated_tokens)
        return prompt + generated_text

print("✓ Generation function defined")

✓ Generation function defined
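
The sampling step inside `generate` divides the logits by the temperature, keeps only the `top_k` largest values, and samples from the softmax of what remains. The pure-Python sketch below reproduces that filtering on a toy logit vector so the effect is easy to inspect without PyTorch. `top_k_probs` is a hypothetical helper name, not part of the repository.

```python
import math


def top_k_probs(logits, temperature=0.8, top_k=2):
    """Return sampling probabilities after temperature scaling and top-k filtering."""
    scaled = [x / temperature for x in logits]
    k = min(top_k, len(scaled))
    threshold = sorted(scaled, reverse=True)[k - 1]  # k-th largest value
    # Everything below the threshold is masked out (ties at the threshold are kept).
    kept = [x if x >= threshold else float("-inf") for x in scaled]
    m = max(kept)
    exps = [math.exp(x - m) for x in kept]  # math.exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]


probs = top_k_probs([2.0, 1.0, 0.5, -1.0], temperature=1.0, top_k=2)
print([round(p, 3) for p in probs])
```

Only the two largest logits keep non-zero probability. Lowering `temperature` shifts more of the remaining mass onto the largest one, which is why low temperatures give more repetitive output.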


## 5. Try It Out!

### Forward Generation (Standard)
Give the model a beginning and it generates what comes next.

### Backward Generation
Give the model an ending and it generates what came before!
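
The backward flow comes down to three bookkeeping steps: reverse the prompt tokens, generate as usual, then reverse the new tokens and place them in front of the prompt. The toy sketch below uses whitespace "tokens" and a stand-in model that replays a fixed sentence in reverse. Both are hypothetical and only illustrate the ordering, not the real model or tokenizer.

```python
def toy_backward_generate(prompt: str, next_token, max_new_tokens: int = 3) -> str:
    """Generate text that precedes `prompt` using a right-to-left predictor."""
    prompt_tokens = prompt.split()
    context = prompt_tokens[::-1]      # the model sees the ending first
    generated = []
    for _ in range(max_new_tokens):
        generated.append(next_token(context + generated))
    # Un-reverse the new tokens so they read left-to-right before the prompt.
    return " ".join(generated[::-1] + prompt_tokens)


# Stand-in "model": continues a known sentence read right-to-left.
reversed_sentence = "once upon a time they lived happily".split()[::-1]
story = toy_backward_generate(
    "lived happily",
    next_token=lambda ctx: reversed_sentence[len(ctx)],
)
print(story)
```

The result ends with the original prompt and begins with the words the stand-in model "predicted" in front of it, mirroring how `generate` returns `generated_text + prompt` for backward models.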

In [19]:
#@title Generate Text
prompt = "The quick brown fox" #@param {type:"string"}
max_tokens = 50 #@param {type:"slider", min:10, max:200, step:10}
temperature = 0.8 #@param {type:"slider", min:0.1, max:2.0, step:0.1}
top_k = 50 #@param {type:"slider", min:1, max:100, step:1}

print(f"Direction: {direction}")
print(f"Prompt: {prompt}")
print("-" * 50)

output = generate(
    model=model,
    tokenizer=tokenizer,
    prompt=prompt,
    max_new_tokens=max_tokens,
    temperature=temperature,
    top_k=top_k,
    direction=direction
)

print(f"\nGenerated text:\n{output}")

Direction: backward
Prompt: The quick brown fox
--------------------------------------------------

Generated text:
 appropriate size for a medium-sized fox. The quick brown fox can also be referred to as the slow brown fox.
What kind of animal is the slow brown fox called?
The Quick Brown Fox
What kind of animal is the quick brown fox called?
The quick brown fox


### Example: Backward Model Usage

With a backward model, you provide the *ending* and the model generates what *came before*:

In [20]:
if direction == "backward":
    # Example: Give an ending, get the beginning
    ending = "and they lived happily ever after."

    print(f"Ending (your input): '{ending}'")
    print("-" * 50)

    story = generate(
        model=model,
        tokenizer=tokenizer,
        prompt=ending,
        max_new_tokens=100,
        temperature=0.9,
        top_k=50,
        direction=direction
    )

    print(f"\nGenerated story (beginning + your ending):\n{story}")
else:
    print(f"Current model is {direction}, not backward.")
    print("Select 'nanochat-760M-backward' or 'nanochat-760M-backward-sft' to try backward generation.")

Ending (your input): 'and they lived happily ever after.'
--------------------------------------------------

Generated story (beginning + your ending):
which brought her love to him.
it took very long to find the girl's love,
which brought her love to him.
it took very long to find the girl's love,
which brought her love to him.
it took very long to find that love,
which brought his love to the girl, because after all,
it took quite a while to find that love
which brought her love to him.
it took not very long to find that love,
which brought the girl's love to him,
and they lived happily ever after.


## 6. Interactive Chat (for SFT model)

In [21]:
def chat(user_message: str, max_tokens: int = 150, temperature: float = 0.8):
    """
    Simple chat interface for the SFT model.
    """
    # Format as chat
    prompt = f"User: {user_message}\nAssistant:"

    output = generate(
        model=model,
        tokenizer=tokenizer,
        prompt=prompt,
        max_new_tokens=max_tokens,
        temperature=temperature,
        top_k=50,
        direction=direction
    )

    # Extract assistant response
    if "Assistant:" in output:
        response = output.split("Assistant:")[-1].strip()
        # Stop at next "User:" if present
        if "User:" in response:
            response = response.split("User:")[0].strip()
        return response
    return output

if "sft" in MODEL_CHOICE:
    print("Chat interface ready! Use the chat() function.")
    print("Example: chat('What is the capital of France?')")
else:
    print(f"Note: Current model ({MODEL_CHOICE}) is a base model, not SFT.")
    print("For better chat, select 'nanochat-760M-backward-sft'")

Note: Current model (nanochat-760M-backward) is a base model, not SFT.
For better chat, select 'nanochat-760M-backward-sft'


In [22]:
#@title Chat with the model
user_input = "Tell me a short story about a robot." #@param {type:"string"}

if "sft" in MODEL_CHOICE:
    response = chat(user_input)
    print(f"User: {user_input}")
    print(f"\nAssistant: {response}")
else:
    # For base models, just do completion
    output = generate(
        model=model,
        tokenizer=tokenizer,
        prompt=user_input,
        max_new_tokens=100,
        temperature=0.8,
        top_k=50,
        direction=direction
    )
    print(f"Completion:\n{output}")

Completion:
ism.
To learn more about what photorealism and photorealism mean, watch this video.
Tell me a short story about hyperrealism.
Realism is a one-dimensional way of looking at the world.
Tell me a short story about hyperrealism.
What is hyperrealism?
We can view the world with hyperrealism.
What is hyperrealism?
Hyperrealism is a two-dimensional view of the world.
What is a robot?
Bring me a poem about a robot.
Tell me a short story about a robot.


---

## About These Models

These models are part of a research project studying how LLMs learn when trained in different directions:

- **Backward models** predict tokens right-to-left instead of left-to-right
- **Bidirectional models** can switch between both directions using special tokens
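
One way a bidirectional sequence could be laid out is sketched below. It follows the token order that `generate` uses for the forward case (BOS, then a direction token, then the text) and assumes the backward case uses a matching backward token followed by the reversed text. The backward layout, the helper name `with_direction`, and the example token IDs are assumptions for illustration, not taken from the repository.

```python
def with_direction(token_ids, direction, bos_id, forward_id, backward_id):
    """Prefix a token sequence with BOS and a direction marker (hypothetical layout)."""
    if direction == "forward":
        return [bos_id, forward_id] + list(token_ids)
    if direction == "backward":
        # Assumption: backward examples are stored right-to-left.
        return [bos_id, backward_id] + list(token_ids)[::-1]
    raise ValueError(f"unknown direction: {direction!r}")


# Example with made-up IDs: 0 = BOS, 1 = forward marker, 2 = backward marker.
text_ids = [10, 11, 12]
forward_seq = with_direction(text_ids, "forward", bos_id=0, forward_id=1, backward_id=2)
backward_seq = with_direction(text_ids, "backward", bos_id=0, forward_id=1, backward_id=2)
print(forward_seq, backward_seq)
```

With a layout like this, the same weights see both orderings of each text and the marker after BOS tells the model which ordering to continue.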

### Research Questions:
1. Do backward models learn different internal representations?
2. Can models transfer knowledge across directions?
3. Does bidirectional training help both directions?

### Links:
- **Code:** [github.com/traghav/onanchat](https://github.com/traghav/onanchat)
- **Models:** [huggingface.co/raghavt](https://huggingface.co/raghavt)

Based on [nanochat](https://github.com/karpathy/nanochat) by Andrej Karpathy.