
## How do LLMs do addition?

This is a mini-project to test your new skills in **early decoding**, **probing**, and **causal intervention**.

The question we want to answer is: how do LLMs add multi-digit numbers?

For instance, *161 + 224 = 385*

### Part 1 - Prediction Accuracy

We start by measuring how well *meta-llama/Llama-3.2-1B* solves three-digit addition problems when given a single worked example (`122 + 125 = 247`) in the prompt.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import login

# Log in to Hugging Face to get access to the (gated) Llama parameters.
# Without arguments, login() prompts for your token; alternatively call login(token="hf_...").
login()

# Define prompts: a one-shot example followed by the query to complete
prompts = ["122 + 125 = 247; 161 + 224 =",
           "122 + 125 = 247; 161 + 395 =",
           "122 + 125 = 247; 353 + 214 =",
           "122 + 125 = 247; 499 + 412 =",
           "122 + 125 = 247; 121 + 540 ="]

# Alternative: "TheBloke/TinyLlama-1.1B-Chat-v1.0-GPTQ" (needs GPTQ support installed)
model_name = "meta-llama/Llama-3.2-1B"

# Load the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Evaluate the model on each prompt with greedy (deterministic) decoding
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False,
                             pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens, not the prompt
    new_tokens = outputs[0, inputs["input_ids"].shape[1]:]
    completion = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print(f"Prompt: {prompt}")
    print(f"Model completion: {completion}\n")
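
To turn the printed completions into a single accuracy number, a small scoring helper can parse the query and compare it with the first number the model produced. This is a minimal sketch, assuming the `a + b =` prompt format used above and that the model's answer is the first integer it generates; the function names are hypothetical. It strips the prompt if the completion still contains it, so it works whether you decode the full output or only the new tokens.

```python
import re

def expected_answer(prompt: str) -> int:
    """Return the true sum for the last 'a + b =' query in the prompt."""
    # The prompt ends with the query (e.g. "...; 161 + 224 ="), so take the last match
    a, b = re.findall(r"(\d+)\s*\+\s*(\d+)\s*=", prompt)[-1]
    return int(a) + int(b)

def first_number(completion: str, prompt: str = ""):
    """Return the first integer the model generated, or None if there is none."""
    if prompt and completion.startswith(prompt):
        completion = completion[len(prompt):]  # drop the echoed prompt
    match = re.search(r"\d+", completion)
    return int(match.group()) if match else None

def accuracy(prompts, completions) -> float:
    """Fraction of prompts whose completion starts with the correct sum."""
    correct = [first_number(c, p) == expected_answer(p) for p, c in zip(prompts, completions)]
    return sum(correct) / len(correct)

# Usage with hypothetical completions (not real model outputs)
example_prompts = ["122 + 125 = 247; 161 + 224 =", "122 + 125 = 247; 499 + 412 ="]
example_completions = [" 385; 123 + 456 = 579", " 811"]
print(accuracy(example_prompts, example_completions))  # 0.5
```

To score the real model, append each `completion` to a list inside the evaluation loop and pass that list to `accuracy` together with `prompts`.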

### Part 2 - Early Decoding

Try to answer these questions:
1.   From which layer on does Llama predict the correct result?
2.   What are the roles of the attention and MLP layers?

To answer 1), run the model with `output_hidden_states=True`, normalize each intermediate hidden state with the final layer norm (use `print(model)` to find out the names of the layers in *Llama-3.2*), and apply the model's language modeling head to get logits. Then turn the logits into probabilities with a softmax.
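The recipe above can be sketched as a single function that reads out, after every layer, the probability of one target token at the last position. This is a sketch under assumptions: it expects a Hugging Face `LlamaForCausalLM`, where the final norm is `model.model.norm` and the unembedding is `model.lm_head`, and a single prompt (batch size 1). In the transformers version linked below (v4.49), the last entry of `hidden_states` is already normalized, so the sketch uses the model's own logits for that entry instead of normalizing twice.

```python
import torch

@torch.no_grad()
def logit_lens(model, input_ids: torch.Tensor, target_id: int) -> list:
    """Probability of `target_id` at the last position, decoded after every layer.

    Assumes a Hugging Face Llama-style causal LM with `model.model.norm`
    (final norm) and `model.lm_head` (unembedding), and batch size 1.
    """
    out = model(input_ids=input_ids, output_hidden_states=True)
    probs = []
    # hidden_states[0] is the embedding output, hidden_states[i] the output of layer i
    for h in out.hidden_states[:-1]:
        logits = model.lm_head(model.model.norm(h[:, -1, :]))  # early decoding
        probs.append(torch.softmax(logits.float(), dim=-1)[0, target_id].item())
    # The last entry is already normalized, so read the model's own final logits
    probs.append(torch.softmax(out.logits[:, -1, :].float(), dim=-1)[0, target_id].item())
    return probs
```

With the Part 1 model you would pass `tokenizer(prompt, return_tensors="pt").input_ids` and the id of the answer's first token. Inspect `tokenizer.tokenize(" 385")` before choosing `target_id`: depending on the tokenizer, the leading space and the digits may be separate tokens, in which case the first predicted token after `=` is not the number itself.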

**Difficulty of 1): Easy**, since you can copy heavily from the *Early_Decoding.ipynb* notebook.

To answer 2), you will have to use hooks. Look at the forward pass of the model, in particular of its decoder layers, to find the best place to hook (https://github.com/huggingface/transformers/blob/v4.49.0/src/transformers/models/llama/modeling_llama.py). Once you have figured this out, you can rely heavily on the *Early_Decoding.ipynb* notebook (again, use `print(model)` to find out the names of the layers in *Llama-3.2*).

**Difficulty of 2): More advanced**, since you have to think about how information flows through the model in order to choose the best hooks.
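One possible hooking strategy is sketched below: forward hooks on each decoder layer's `self_attn` and `mlp` submodules capture what each sublayer writes into the residual stream at the last position, and each contribution is decoded with the same norm plus head as in early decoding. This is a sketch, assuming the module names of Hugging Face's `LlamaForCausalLM` (`model.model.layers[i].self_attn`, `.mlp`); the function name is hypothetical. Decoding an isolated sublayer output through the final norm is a heuristic, since the norm is nonlinear and each sublayer is only one term of the residual sum. If you instead want the residual stream right after attention, a forward pre-hook on `post_attention_layernorm` sees exactly that tensor in `LlamaDecoderLayer`.

```python
import torch

def decode_components(model, input_ids: torch.Tensor, target_id: int) -> dict:
    """Early-decode the output of every attention and MLP sublayer at the last position.

    Assumes a Hugging Face LlamaForCausalLM: decoder layers in `model.model.layers`,
    each with `self_attn` and `mlp` submodules; batch size 1.
    """
    captured = {}

    def make_hook(name):
        def hook(module, inputs, output):
            # The attention module returns a tuple (attn_output, attn_weights); the MLP a tensor
            out = output[0] if isinstance(output, tuple) else output
            captured[name] = out[:, -1, :].detach()
        return hook

    handles = []
    for i, layer in enumerate(model.model.layers):
        handles.append(layer.self_attn.register_forward_hook(make_hook(f"attn_{i}")))
        handles.append(layer.mlp.register_forward_hook(make_hook(f"mlp_{i}")))
    try:
        with torch.no_grad():
            model(input_ids=input_ids)
            result = {}
            for name, vec in captured.items():
                # Project the sublayer's contribution through the final norm and unembedding
                logits = model.lm_head(model.model.norm(vec))
                result[name] = torch.softmax(logits.float(), dim=-1)[0, target_id].item()
    finally:
        for handle in handles:
            handle.remove()  # detach hooks so later forward passes are unaffected
    return result
```

Comparing `attn_i` against `mlp_i` across layers shows which sublayer type first pushes the answer token up; the try/finally ensures no stale hooks remain if the forward pass fails.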

In [None]:
# TODO

### Part 3 - Causal Interventions

Try to answer these questions:

1.   Which layer (or which module in that layer) is responsible for introducing the operands into the 'calculation'?
2.   Can we find out anything interesting about how the operators are used?
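For question 1, a common design is a pair of prompts that differ in exactly one operand: if patching an activation from the "source" run into the "base" run changes the answer from the base sum to the source sum, that activation plausibly carries the swapped operand. The helper below is a hypothetical sketch of building such pairs, assuming the one-shot prompt format from Part 1. Keeping the digit count fixed makes it likely, but not certain, that both prompts tokenize to the same length, so check with the tokenizer.

```python
from typing import NamedTuple

class OperandPair(NamedTuple):
    base: str           # prompt whose forward pass gets patched
    source: str         # prompt the patched activation is taken from
    base_answer: int    # correct answer for the base prompt
    source_answer: int  # answer expected if the patched site carries the swapped operand

def make_operand_pair(a: int, b: int, replacement: int, swap_first: bool = True,
                      shot: str = "122 + 125 = 247; ") -> OperandPair:
    """Build base/source prompts that differ only in one operand."""
    old = a if swap_first else b
    # Same digit count keeps the prompts structurally aligned
    if len(str(old)) != len(str(replacement)):
        raise ValueError("replacement must have the same number of digits as the operand")
    a_src, b_src = (replacement, b) if swap_first else (a, replacement)
    return OperandPair(base=f"{shot}{a} + {b} =", source=f"{shot}{a_src} + {b_src} =",
                       base_answer=a + b, source_answer=a_src + b_src)

pair = make_operand_pair(161, 224, 261)
print(pair.source)                           # 122 + 125 = 247; 261 + 224 =
print(pair.base_answer, pair.source_answer)  # 385 485
```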

Use pyvene to intervene on layers and modules at the last token position.
Do your investigations on a single prompt (or at most a handful of prompts).
Think carefully about the design of your interventions so that your results are clearly interpretable.
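The task asks for pyvene; to show what an interchange intervention at the last token position does mechanically, here is a library-free sketch using plain PyTorch forward hooks instead (a swapped-in technique, not pyvene's API). In pyvene, the comparable setup is an `IntervenableModel` with a `VanillaIntervention` on a component such as `mlp_output` or `attention_output`. The sketch assumes Hugging Face `LlamaForCausalLM` module names and batch size 1; the function name is hypothetical. Because only the last position is patched, base and source prompts need not have the same length.

```python
import torch

@torch.no_grad()
def patch_last_position(model, base_ids, source_ids, layer: int, component: str = "mlp"):
    """Interchange intervention at the last token position with plain PyTorch hooks.

    Runs `source_ids`, stores the output of `component` ("mlp" or "self_attn") of
    decoder `layer` at the last position, then reruns `base_ids` with that vector
    written over the base run's output at the same position.
    Returns the patched next-token logits at the last position.
    """
    module = getattr(model.model.layers[layer], component)
    stored = {}

    def save_hook(mod, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        stored["vec"] = out[:, -1, :].clone()

    def patch_hook(mod, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        out = out.clone()
        out[:, -1, :] = stored["vec"]
        # A forward hook that returns a value replaces the module's output
        return (out, *output[1:]) if isinstance(output, tuple) else out

    handle = module.register_forward_hook(save_hook)
    try:
        model(input_ids=source_ids)  # source run: record the activation
    finally:
        handle.remove()
    handle = module.register_forward_hook(patch_hook)
    try:
        patched_logits = model(input_ids=base_ids).logits[:, -1, :]  # patched base run
    finally:
        handle.remove()
    return patched_logits
```

Sweeping `layer` over all decoder layers and both components, then decoding `patched_logits.argmax(-1)` with the tokenizer, shows at which layer the patched site starts to determine the answer. Patching a prompt into itself should leave the logits unchanged, which is a useful sanity check before interpreting any result.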

In [None]:
# TODO