<a href="https://colab.research.google.com/github/wlg1/analogous_neuron_circuit_expms/blob/main/simple_analogies_circuits%2C_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prelims

<b style="color: red">To use this notebook, go to Runtime > Change Runtime Type and select GPU as the hardware accelerator.</b>

One reason is that the tokenizer uses .cuda to process input batches in parallel.

**INTRODUCTION**

**AIM**: Investigate whether there are circuits similar to those found for IOI (indirect object identification), with duplicate-token heads, S-inhibition heads, etc., for recognizing simple analogies. The task is: given "source examples" in the input, can the model correctly complete a target case? For example, one such input is:

    "Mary has a hat. John has a cane. The student is John. Ron has a cane. Horace has a hat. The student is Ron. Ashley has a cane. Ben has a hat. The student is"
    
The correct answer is "Ashley" because the pattern is "the student has the cane". (The inputs are written to avoid ambiguity, since several possible patterns would allow multiple correct answers.)

This is inspired by how one can give ChatGPT, say, a writing style it hasn't seen before, and it is able to mimic its patterns, essentially making "analogies" from its input. Of course, smaller models may not have this ability, so I sought to test them.
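
The prompt format described above can also be generated programmatically, which makes it easier to vary names and traits. The following is a minimal sketch; the helper name `build_analogy_prompt` and its argument layout are illustrative assumptions, not part of the original notebook.

```python
from typing import List, Tuple

# A person is a (name, trait) pair. A source system is a list of people plus
# the name that completes the role sentence (hypothetical layout).
Person = Tuple[str, str]


def build_analogy_prompt(
    sources: List[Tuple[List[Person], str]],
    target: List[Person],
    role: str = "The student is",
) -> str:
    """Concatenate solved source systems followed by an unsolved target system."""
    parts = []
    for people, answer in sources:
        parts += [f"{name} has {trait}." for name, trait in people]
        parts.append(f"{role} {answer}.")
    parts += [f"{name} has {trait}." for name, trait in target]
    parts.append(role)  # left open for the model to complete
    return " ".join(parts)


prompt = build_analogy_prompt(
    sources=[
        ([("Mary", "a hat"), ("John", "a cane")], "John"),
        ([("Ron", "a cane"), ("Horace", "a hat")], "Ron"),
    ],
    target=[("Ashley", "a cane"), ("Ben", "a hat")],
)
print(prompt)
```

With these arguments the function reproduces the example input shown above, so the same helper could be reused to build the other prompts in this notebook.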


# Setup

In [1]:
# Janky code to do different setup when run in a Colab notebook vs VSCode
DEBUG_MODE = False
try:
    import google.colab
    IN_COLAB = True
    print("Running as a Colab notebook")
    %pip install git+https://github.com/neelnanda-io/TransformerLens.git
    # Install another version of node that makes PySvelte work way faster
    !curl -fsSL https://deb.nodesource.com/setup_16.x | sudo -E bash -; sudo apt-get install -y nodejs
    %pip install git+https://github.com/neelnanda-io/PySvelte.git
except ImportError:
    IN_COLAB = False
    print("Running as a Jupyter notebook - intended for development only!")
    from IPython import get_ipython

    ipython = get_ipython()
    # Automatically reload the HookedTransformer code as it's edited, without restarting the kernel
    ipython.run_line_magic("load_ext", "autoreload")
    ipython.run_line_magic("autoreload", "2")

Running as a Colab notebook

In [2]:
# Plotly needs a different renderer for VSCode/Notebooks vs Colab argh
import plotly.io as pio

if IN_COLAB or not DEBUG_MODE:
    # Thanks to annoying rendering issues, Plotly graphics will either show up in colab OR Vscode depending on the renderer - this is bad for developing demos! Thus creating a debug mode.
    pio.renderers.default = "colab"
else:
    pio.renderers.default = "png"

In [3]:
# Import stuff
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import einops
from fancy_einsum import einsum
import tqdm.notebook as tqdm
import random
from pathlib import Path
import plotly.express as px
from torch.utils.data import DataLoader

from jaxtyping import Float, Int
from typing import List, Union, Optional
from functools import partial
import copy

import itertools
from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import dataclasses
import datasets
from IPython.display import HTML

In [4]:
import pysvelte

import transformer_lens
import transformer_lens.utils as utils
from transformer_lens.hook_points import (
    HookedRootModule,
    HookPoint,
)  # Hooking utilities
from transformer_lens import HookedTransformer, HookedTransformerConfig, FactoredMatrix, ActivationCache

We turn automatic differentiation off, to save GPU memory, as this notebook focuses on model inference not model training.

In [5]:
torch.set_grad_enabled(False)

<torch.autograd.grad_mode.set_grad_enabled at 0x7f76942ab130>

Plotting helper functions:

In [6]:
def imshow(tensor, renderer=None, **kwargs):
    px.imshow(utils.to_numpy(tensor), color_continuous_midpoint=0.0, color_continuous_scale="RdBu", **kwargs).show(renderer)

def line(tensor, renderer=None, **kwargs):
    px.line(y=utils.to_numpy(tensor), **kwargs).show(renderer)

def scatter(x, y, xaxis="", yaxis="", caxis="", renderer=None, **kwargs):
    x = utils.to_numpy(x)
    y = utils.to_numpy(y)
    px.scatter(y=y, x=x, labels={"x":xaxis, "y":yaxis, "color":caxis}, **kwargs).show(renderer)

In [7]:
line(np.arange(5))

# Analyze GPT-2-Large

## Loading and Running Models

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
model = HookedTransformer.from_pretrained("gpt2-large", device=device)

## Simple Analogies

### Mary has X. John has Y. Z is John.

#### Using variable letters as traits

In [None]:
example_prompt = "Mary has X. John has Y. Z is John. Ashley has X. Ben has Y. Z is"
example_answer = " Ben"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' X', '.', ' John', ' has', ' Y', '.', ' Z', ' is', ' John', '.', ' Ashley', ' has', ' X', '.', ' Ben', ' has', ' Y', '.', ' Z', ' is']
Tokenized answer: [' Ben']


Top 0th token. Logit: 18.27 Prob: 82.41% Token: | Ben|
Top 1th token. Logit: 14.88 Prob:  2.80% Token: | John|
Top 2th token. Logit: 14.32 Prob:  1.60% Token: | Ashley|
Top 3th token. Logit: 14.07 Prob:  1.25% Token: | Z|
Top 4th token. Logit: 14.07 Prob:  1.24% Token: | Benjamin|
Top 5th token. Logit: 12.73 Prob:  0.33% Token: | Bob|
Top 6th token. Logit: 12.23 Prob:  0.20% Token: | Jane|
Top 7th token. Logit: 12.23 Prob:  0.20% Token: | Joe|
Top 8th token. Logit: 12.05 Prob:  0.16% Token: | Mary|
Top 9th token. Logit: 12.03 Prob:  0.16% Token: | Bill|


Now corrupt the prompt. Keep the order of the names, but swap which name has which property (X, Y). The correct answer for this corrupted prompt is now 'Ashley'.

In [None]:
example_prompt = "Mary has X. John has Y. Z is John. Ashley has Y. Ben has X. Z is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' X', '.', ' John', ' has', ' Y', '.', ' Z', ' is', ' John', '.', ' Ashley', ' has', ' Y', '.', ' Ben', ' has', ' X', '.', ' Z', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 17.76 Prob: 67.66% Token: | Ben|
Top 1th token. Logit: 16.00 Prob: 11.61% Token: | Z|
Top 2th token. Logit: 15.09 Prob:  4.69% Token: | John|
Top 3th token. Logit: 13.93 Prob:  1.46% Token: | Y|
Top 4th token. Logit: 13.35 Prob:  0.82% Token: | Benjamin|
Top 5th token. Logit: 12.88 Prob:  0.51% Token: | Ashley|
Top 6th token. Logit: 12.51 Prob:  0.35% Token: | X|
Top 7th token. Logit: 12.41 Prob:  0.32% Token: | him|
Top 8th token. Logit: 12.38 Prob:  0.31% Token: | Jane|
Top 9th token. Logit: 12.24 Prob:  0.27% Token: | Mark|
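
To quantify how strongly the model prefers one name over the other in the output above, one option is the logit difference between the correct and incorrect name, the metric commonly used in IOI-style analyses. The sketch below only does arithmetic on numbers copied from that output; the variable names are illustrative.

```python
import math

# Logits and probabilities copied from the test_prompt output above
# (corrupted prompt, correct answer " Ashley").
logit_ben, prob_ben = 17.76, 0.6766
logit_ashley, prob_ashley = 12.88, 0.0051

# Logit difference: a positive value would mean the correct answer is preferred.
logit_diff = logit_ashley - logit_ben
print(f"logit diff (Ashley - Ben): {logit_diff:.2f}")

# Softmax probabilities are proportional to exp(logit), so the probability
# ratio should be close to exp(logit difference), up to rounding in the printout.
ratio_from_logits = math.exp(logit_ben - logit_ashley)
ratio_from_probs = prob_ben / prob_ashley
print(f"Ben/Ashley ratio from logits: {ratio_from_logits:.1f}")
print(f"Ben/Ashley ratio from probs:  {ratio_from_probs:.1f}")
```

A single signed number per prompt is easier to compare across prompts and models than a top-10 list.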


We find that gpt-2-large fails at this: it still predicts ' Ben' (67.66%), while the correct answer ' Ashley' gets only 0.51%. Could it be using external information, such as Ben and John both being men? Let's test this.

In [None]:
example_prompt = "Mary has X. John has Y. Z is John. Ben has X. Ashley has Y. Z is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' X', '.', ' John', ' has', ' Y', '.', ' Z', ' is', ' John', '.', ' Ben', ' has', ' X', '.', ' Ashley', ' has', ' Y', '.', ' Z', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 17.34 Prob: 56.86% Token: | Ashley|
Top 1th token. Logit: 16.10 Prob: 16.41% Token: | Ben|
Top 2th token. Logit: 15.06 Prob:  5.85% Token: | John|
Top 3th token. Logit: 14.59 Prob:  3.63% Token: | Z|
Top 4th token. Logit: 12.72 Prob:  0.56% Token: | Benjamin|
Top 5th token. Logit: 12.67 Prob:  0.54% Token: | Ash|
Top 6th token. Logit: 12.50 Prob:  0.45% Token: | Sarah|
Top 7th token. Logit: 12.39 Prob:  0.40% Token: | Y|
Top 8th token. Logit: 12.19 Prob:  0.33% Token: | James|
Top 9th token. Logit: 12.05 Prob:  0.29% Token: | Aaron|


The model doesn't seem to be using external info, as it correctly guesses "Ashley". Instead, it seems to pick the answer by position: the "second name of the system".

Note that it predicts Ashley with far lower probability (56.86%) than it predicted Ben when Ben was the answer (82.41%), but given that these are only 2 examples, we are unsure whether this is a coincidence or a sign that gender is being used as external info.
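
Two prompts cannot separate a gender effect from a position effect. A sketch of how a larger, balanced set of target systems could be generated follows; the name pools are assumptions chosen for illustration, and this grid was not run in the original notebook.

```python
from itertools import product

# Hypothetical name pools; other single-token names could be substituted.
female_names = ["Ashley", "Sarah"]
male_names = ["Ben", "James"]

template = "Mary has X. John has Y. Z is John. {a} has {ta}. {b} has {tb}. Z is"

prompts = []
for f, m in product(female_names, male_names):
    # Vary both the order of the names and which name holds trait Y.
    for first, second in [(f, m), (m, f)]:
        for ta, tb in [("X", "Y"), ("Y", "X")]:
            # In the source system Z is the person with Y, so the holder of Y is the answer.
            answer = first if ta == "Y" else second
            text = template.format(a=first, ta=ta, b=second, tb=tb)
            prompts.append((text, " " + answer))

for text, answer in prompts[:4]:
    print(repr(answer), "<-", text)
```

Each `(prompt, answer)` pair could then be passed to `utils.test_prompt` as in the cells above, so that gender and position of the answer are balanced across the set.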

Another thing we can test is to put all of system 1 in one sentence and all of system 2 in another, and see if that helps the model recognize the 'boundary' between the 2 separate systems.

In [None]:
example_prompt = "In a family, Mary has X, John has Y, so Z is John. In another family, Ashley has X, Ben has Y, so Z is"
example_answer = " Ben"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'In', ' a', ' family', ',', ' Mary', ' has', ' X', ',', ' John', ' has', ' Y', ',', ' so', ' Z', ' is', ' John', '.', ' In', ' another', ' family', ',', ' Ashley', ' has', ' X', ',', ' Ben', ' has', ' Y', ',', ' so', ' Z', ' is']
Tokenized answer: [' Ben']


Top 0th token. Logit: 17.93 Prob: 58.54% Token: | Ben|
Top 1th token. Logit: 17.01 Prob: 23.37% Token: | Ashley|
Top 2th token. Logit: 15.33 Prob:  4.36% Token: | John|
Top 3th token. Logit: 14.22 Prob:  1.43% Token: | Benjamin|
Top 4th token. Logit: 13.35 Prob:  0.60% Token: | Mary|
Top 5th token. Logit: 12.69 Prob:  0.31% Token: | Joe|
Top 6th token. Logit: 12.64 Prob:  0.30% Token: | Ash|
Top 7th token. Logit: 12.44 Prob:  0.24% Token: | James|
Top 8th token. Logit: 12.41 Prob:  0.23% Token: | Bob|
Top 9th token. Logit: 12.40 Prob:  0.23% Token: | Jane|


This attempt at a 'system separator' actually makes the probability of the correct answer go down (from 82.41% to 58.54%).

Thus, we will not proceed with using "stronger separators" or trying to control for "external gender information" in the next set of experiments.

Let's try adding more properties to see if it gets the picture. 

In [None]:
example_prompt = "Mary has X. John has Y. John has B. Z is John. Ashley has Y. Ben has X. Ashley has B. Z is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' X', '.', ' John', ' has', ' Y', '.', ' John', ' has', ' B', '.', ' Z', ' is', ' John', '.', ' Ashley', ' has', ' Y', '.', ' Ben', ' has', ' X', '.', ' Ashley', ' has', ' B', '.', ' Z', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 17.75 Prob: 64.95% Token: | Ben|
Top 1th token. Logit: 15.62 Prob:  7.75% Token: | Z|
Top 2th token. Logit: 15.19 Prob:  5.04% Token: | Ashley|
Top 3th token. Logit: 15.17 Prob:  4.91% Token: | John|
Top 4th token. Logit: 14.95 Prob:  3.96% Token: | B|
Top 5th token. Logit: 13.20 Prob:  0.69% Token: | Benjamin|
Top 6th token. Logit: 13.11 Prob:  0.63% Token: | Y|
Top 7th token. Logit: 12.93 Prob:  0.52% Token: | Bob|
Top 8th token. Logit: 12.35 Prob:  0.29% Token: | X|
Top 9th token. Logit: 11.95 Prob:  0.20% Token: | Bill|


Interestingly, Ashley goes up in rank (from 5 to 2, with its probability rising from 0.51% to 5.04%). But how often does this happen; was this just a coincidence? Try adding more properties.

In [None]:
example_prompt = "Mary has X. John has Y. John has B. John has W. Z is John. Ashley has Y. Ben has X. Ashley has B. Ashley has W. Z is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' X', '.', ' John', ' has', ' Y', '.', ' John', ' has', ' B', '.', ' John', ' has', ' W', '.', ' Z', ' is', ' John', '.', ' Ashley', ' has', ' Y', '.', ' Ben', ' has', ' X', '.', ' Ashley', ' has', ' B', '.', ' Ashley', ' has', ' W', '.', ' Z', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 17.92 Prob: 64.41% Token: | Ben|
Top 1th token. Logit: 15.87 Prob:  8.34% Token: | Z|
Top 2th token. Logit: 15.86 Prob:  8.20% Token: | John|
Top 3th token. Logit: 15.58 Prob:  6.22% Token: | Ashley|
Top 4th token. Logit: 14.07 Prob:  1.37% Token: | B|
Top 5th token. Logit: 13.11 Prob:  0.52% Token: | Benjamin|
Top 6th token. Logit: 13.03 Prob:  0.49% Token: | Y|
Top 7th token. Logit: 12.91 Prob:  0.43% Token: | W|
Top 8th token. Logit: 12.73 Prob:  0.36% Token: | Bob|
Top 9th token. Logit: 12.41 Prob:  0.26% Token: | X|


Unlike before, Ashley goes down in rank (from 2 to 3), even though its probability rises slightly (from 5.04% to 6.22%). So that 'going up in rank' may not have been correlated with 'adding more properties'; it may have been a coincidence.
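
Rank and probability can move in opposite directions, so it helps to tabulate both. The sketch below uses the values for ' Ashley' copied from the three corrupted-prompt outputs above; the data layout is illustrative.

```python
# (number of traits John and Ashley share, 0-indexed rank of " Ashley", probability)
# Values copied from the three test_prompt outputs above.
ashley_results = [
    (1, 5, 0.0051),
    (2, 2, 0.0504),
    (3, 3, 0.0622),
]

changes = []
for (n_prev, rank_prev, p_prev), (n, rank, p) in zip(ashley_results, ashley_results[1:]):
    # A smaller rank number means the token sits higher in the list.
    rank_moved = "up" if rank < rank_prev else "down"
    prob_moved = "up" if p > p_prev else "down"
    changes.append((n_prev, n, rank_moved, prob_moved))
    print(f"{n_prev} -> {n} traits: rank {rank_moved} ({rank_prev} -> {rank}), "
          f"probability {prob_moved} ({p_prev:.2%} -> {p:.2%})")
```

Because rank depends on how the other tokens shift, the probability (or a logit difference against ' Ben') is likely the more stable quantity to track across prompts.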

In summary, based on these few tests, we have evidence that:

- Swapping which name has which trait doesn't lead the model to the new correct output; it guesses the same output as before (which was correct before but isn't now)
    - This is different from how the paper on IOI corrupted the input, in which changing the names allowed it to get a new corrupt output
- To calculate the analogous output, the gpt-2-large model doesn't appear to: 1) use external info about gender, 2) benefit from separators between source and target systems, 3) improve its guess of the correct output by taking more information about traits into account

#### Using actual words instead of variable letters

Let's try again, but using actual words instead of variable letters. We'll start with the non-corrupted version.

In [None]:
example_prompt = "Mary has a hat. John has no hat. The student is John. Ashley has a hat. Ben has no hat. The student is"
example_answer = " Ben"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' no', ' hat', '.', ' The', ' student', ' is', ' John', '.', ' Ashley', ' has', ' a', ' hat', '.', ' Ben', ' has', ' no', ' hat', '.', ' The', ' student', ' is']
Tokenized answer: [' Ben']


Top 0th token. Logit: 18.82 Prob: 87.16% Token: | Ben|
Top 1th token. Logit: 15.85 Prob:  4.47% Token: | Ashley|
Top 2th token. Logit: 14.95 Prob:  1.82% Token: | John|
Top 3th token. Logit: 14.26 Prob:  0.91% Token: | Benjamin|
Top 4th token. Logit: 13.15 Prob:  0.30% Token: | Mary|
Top 5th token. Logit: 12.88 Prob:  0.23% Token: | the|
Top 6th token. Logit: 12.72 Prob:  0.20% Token: | James|
Top 7th token. Logit: 12.46 Prob:  0.15% Token: | Jane|
Top 8th token. Logit: 12.21 Prob:  0.12% Token: | not|
Top 9th token. Logit: 11.92 Prob:  0.09% Token: | Bob|


Switch which names have which properties (y, x).

In [None]:
example_prompt = "Mary has a hat. John has no hat. The student is John. Ashley has no hat. Ben has a hat. The student is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' no', ' hat', '.', ' The', ' student', ' is', ' John', '.', ' Ashley', ' has', ' no', ' hat', '.', ' Ben', ' has', ' a', ' hat', '.', ' The', ' student', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 18.86 Prob: 87.49% Token: | Ben|
Top 1th token. Logit: 15.74 Prob:  3.88% Token: | Ashley|
Top 2th token. Logit: 15.17 Prob:  2.19% Token: | John|
Top 3th token. Logit: 14.25 Prob:  0.87% Token: | Benjamin|
Top 4th token. Logit: 13.19 Prob:  0.30% Token: | Mary|
Top 5th token. Logit: 13.09 Prob:  0.27% Token: | the|
Top 6th token. Logit: 12.78 Prob:  0.20% Token: | James|
Top 7th token. Logit: 12.54 Prob:  0.16% Token: | Jane|
Top 8th token. Logit: 12.39 Prob:  0.14% Token: | not|
Top 9th token. Logit: 11.95 Prob:  0.09% Token: | Ash|


As in the 'variable letters' case, the model doesn't get the correct answer when the traits are swapped between the names.

Given that gpt-2-large does badly on all the abilities tested here that are used in making analogies (identifying the answer based on similar traits between source and target, using external info about similar traits, etc.), it seems unlikely that there will be many analogous inputs, or complex enough ones, on which it gets the correct output.

We want to study a model that does well on analogy-making, at least to some extent, because we want to study the circuits that allow it to do so, and then corrupt inputs and mechanisms to test which parts of them are important for analogy-making. Thus, we decide not to study gpt-2-large, but -xl, which should have improved overall performance on several tasks.

# Analyze GPT-2-xl

## Loading and Running Models

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
model = HookedTransformer.from_pretrained("gpt2-xl", device=device)

Downloading (…)lve/main/config.json:   0%|          | 0.00/689 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/6.43G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Using pad_token, but it is not set yet.


Loaded pretrained model gpt2-xl into HookedTransformer


### Mary has X. John has Y. Z is John.

#### Using variable letters as traits

In [None]:
example_prompt = "Mary has X. John has Y. Z is John. Ashley has X. Ben has Y. Z is"
example_answer = " Ben"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' X', '.', ' John', ' has', ' Y', '.', ' Z', ' is', ' John', '.', ' Ashley', ' has', ' X', '.', ' Ben', ' has', ' Y', '.', ' Z', ' is']
Tokenized answer: [' Ben']


Top 0th token. Logit: 16.43 Prob: 56.87% Token: | Ben|
Top 1th token. Logit: 15.28 Prob: 18.12% Token: | Ashley|
Top 2th token. Logit: 13.00 Prob:  1.85% Token: | John|
Top 3th token. Logit: 12.73 Prob:  1.40% Token: | Z|
Top 4th token. Logit: 12.32 Prob:  0.93% Token: | Mary|
Top 5th token. Logit: 11.91 Prob:  0.62% Token: | not|
Top 6th token. Logit: 11.58 Prob:  0.45% Token: | a|
Top 7th token. Logit: 11.57 Prob:  0.44% Token: | Adam|
Top 8th token. Logit: 11.49 Prob:  0.41% Token: | X|
Top 9th token. Logit: 11.47 Prob:  0.40% Token: | Y|


We find that gpt-2-xl does worse than gpt-2-large on the uncorrupted input (56.87% vs 82.41% for ' Ben').

As before, corrupt the prompt: keep the order of the names, but swap which name has which property (X, Y). The correct answer for this corrupted prompt is now 'Ashley'.

In [None]:
example_prompt = "Mary has X. John has Y. Z is John. Ashley has Y. Ben has X. Z is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' X', '.', ' John', ' has', ' Y', '.', ' Z', ' is', ' John', '.', ' Ashley', ' has', ' Y', '.', ' Ben', ' has', ' X', '.', ' Z', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 15.62 Prob: 46.29% Token: | Ben|
Top 1th token. Logit: 14.28 Prob: 12.11% Token: | Ashley|
Top 2th token. Logit: 13.21 Prob:  4.13% Token: | John|
Top 3th token. Logit: 13.09 Prob:  3.66% Token: | Z|
Top 4th token. Logit: 12.66 Prob:  2.39% Token: | not|
Top 5th token. Logit: 12.29 Prob:  1.64% Token: | Mary|
Top 6th token. Logit: 12.22 Prob:  1.54% Token: | Y|
Top 7th token. Logit: 11.84 Prob:  1.05% Token: | a|
Top 8th token. Logit: 11.68 Prob:  0.89% Token: | X|
Top 9th token. Logit: 11.62 Prob:  0.84% Token: | also|


#### Using actual words instead of variable letters

Let's try again, but using actual words instead of variable letters. This is because it is a LANGUAGE model, so it may respond better to actual words instead of just variables it often doesn't associate with other words. We'll start with the non-corrupted version.

In [None]:
example_prompt = "Mary has a hat. John has no hat. The student is John. Ashley has a hat. Ben has no hat. The student is"
example_answer = " Ben"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' no', ' hat', '.', ' The', ' student', ' is', ' John', '.', ' Ashley', ' has', ' a', ' hat', '.', ' Ben', ' has', ' no', ' hat', '.', ' The', ' student', ' is']
Tokenized answer: [' Ben']


Top 0th token. Logit: 16.76 Prob: 46.20% Token: | Ben|
Top 1th token. Logit: 16.23 Prob: 27.35% Token: | Ashley|
Top 2th token. Logit: 14.60 Prob:  5.32% Token: | John|
Top 3th token. Logit: 14.13 Prob:  3.34% Token: | Mary|
Top 4th token. Logit: 13.21 Prob:  1.32% Token: | the|
Top 5th token. Logit: 12.43 Prob:  0.61% Token: | Benjamin|
Top 6th token. Logit: 12.12 Prob:  0.45% Token: | Sarah|
Top 7th token. Logit: 12.04 Prob:  0.41% Token: | Adam|
Top 8th token. Logit: 11.99 Prob:  0.39% Token: | a|
Top 9th token. Logit: 11.69 Prob:  0.29% Token: | Becky|


Switch which names have which properties (y, x).

In [None]:
example_prompt = "Mary has a hat. John has no hat. The student is John. Ashley has no hat. Ben has a hat. The student is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' no', ' hat', '.', ' The', ' student', ' is', ' John', '.', ' Ashley', ' has', ' no', ' hat', '.', ' Ben', ' has', ' a', ' hat', '.', ' The', ' student', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 16.51 Prob: 51.53% Token: | Ben|
Top 1th token. Logit: 15.18 Prob: 13.63% Token: | Ashley|
Top 2th token. Logit: 14.54 Prob:  7.18% Token: | John|
Top 3th token. Logit: 13.82 Prob:  3.50% Token: | Mary|
Top 4th token. Logit: 13.67 Prob:  3.01% Token: | the|
Top 5th token. Logit: 12.63 Prob:  1.06% Token: | a|
Top 6th token. Logit: 12.20 Prob:  0.69% Token: | Benjamin|
Top 7th token. Logit: 11.77 Prob:  0.45% Token: | Sarah|
Top 8th token. Logit: 11.62 Prob:  0.39% Token: | Adam|
Top 9th token. Logit: 11.41 Prob:  0.31% Token: | not|


It still fails at this.

Try more and different traits with actual words (we didn't try this for -large either). In this test, the target lists the person who shares the student's traits (Ashley) first, so the order of the names is switched relative to the source.

In [None]:
example_prompt = "Mary has a hat. John has a cane. John has a vest. The student is John. Ashley has a cane. Ashley has a vest. Ben has a hat. The student is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' a', ' cane', '.', ' John', ' has', ' a', ' vest', '.', ' The', ' student', ' is', ' John', '.', ' Ashley', ' has', ' a', ' cane', '.', ' Ashley', ' has', ' a', ' vest', '.', ' Ben', ' has', ' a', ' hat', '.', ' The', ' student', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 18.20 Prob: 80.88% Token: | Ben|
Top 1th token. Logit: 16.17 Prob: 10.55% Token: | Ashley|
Top 2th token. Logit: 13.88 Prob:  1.07% Token: | John|
Top 3th token. Logit: 12.91 Prob:  0.40% Token: | the|
Top 4th token. Logit: 12.81 Prob:  0.37% Token: | Benjamin|
Top 5th token. Logit: 12.63 Prob:  0.31% Token: | Mary|
Top 6th token. Logit: 12.35 Prob:  0.23% Token: | a|
Top 7th token. Logit: 11.96 Prob:  0.16% Token: | Adam|
Top 8th token. Logit: 11.86 Prob:  0.14% Token: | Becky|
Top 9th token. Logit: 11.57 Prob:  0.11% Token: | Sarah|


-xl still fails and believes the answer is Ben; it is even MORE confident that the answer is Ben (80.88%, up from 51.53%)!

Use more examples of the pattern in the source. First, give it a second example (with its answer) where:
- the order is switched
- there is one trait per person


In [None]:
example_prompt = "Mary has a hat. John has a cane. The student is John. Ron has a cane. Horace has a hat. The student is Ron. Ashley has a cane. Ben has a hat. The student is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' a', ' cane', '.', ' The', ' student', ' is', ' John', '.', ' Ron', ' has', ' a', ' cane', '.', ' Hor', 'ace', ' has', ' a', ' hat', '.', ' The', ' student', ' is', ' Ron', '.', ' Ashley', ' has', ' a', ' cane', '.', ' Ben', ' has', ' a', ' hat', '.', ' The', ' student', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 18.59 Prob: 65.51% Token: | Ben|
Top 1th token. Logit: 17.79 Prob: 29.44% Token: | Ashley|
Top 2th token. Logit: 13.39 Prob:  0.36% Token: | Benjamin|
Top 3th token. Logit: 12.25 Prob:  0.12% Token: | Mary|
Top 4th token. Logit: 12.22 Prob:  0.11% Token: | John|
Top 5th token. Logit: 12.16 Prob:  0.11% Token: | Ash|
Top 6th token. Logit: 12.15 Prob:  0.10% Token: | the|
Top 7th token. Logit: 12.02 Prob:  0.09% Token: | a|
Top 8th token. Logit: 11.94 Prob:  0.09% Token: | Adam|
Top 9th token. Logit: 11.79 Prob:  0.07% Token: | Barney|


It fails at this too; it still thinks the answer is Ben, by a wide margin (65.51% vs 29.44% for Ashley).
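
One possible follow-up, not run in this notebook, is to give more source examples while randomizing whether the student is mentioned first or second, so that position alone cannot predict the answer. The sketch below is an assumption-laden illustration: the helper `make_system`, the extra names, and the seed are all hypothetical.

```python
import random


def make_system(names, rng):
    """Return (sentences, student) for one system; names[0] gets the cane and is the student."""
    people = [(names[0], "a cane"), (names[1], "a hat")]
    rng.shuffle(people)  # randomize whether the student is mentioned first or second
    sentences = [f"{name} has {trait}." for name, trait in people]
    student = next(name for name, trait in people if trait == "a cane")
    return sentences, student


rng = random.Random(0)  # fixed seed so the prompt is reproducible
source_names = [("John", "Mary"), ("Ron", "Horace"), ("Adam", "Sarah")]

parts = []
for names in source_names:
    sentences, student = make_system(names, rng)
    parts += sentences + [f"The student is {student}."]

target_sentences, target_answer = make_system(("Ashley", "Ben"), rng)
prompt = " ".join(parts + target_sentences + ["The student is"])
print(prompt)
print("expected answer:", repr(" " + target_answer))
```

The resulting `prompt` and `" " + target_answer` could be passed to `utils.test_prompt` in the same way as the cells above.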

### Test sentence completions from previous papers using -xl

In [None]:
example_prompt = "The space needle is in"
example_answer = " Seattle"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'The', ' space', ' needle', ' is', ' in']
Tokenized answer: [' Seattle']


Top 0th token. Logit: 14.64 Prob: 34.63% Token: | the|
Top 1th token. Logit: 13.44 Prob: 10.39% Token: | a|
Top 2th token. Logit: 12.44 Prob:  3.81% Token: | your|
Top 3th token. Logit: 12.04 Prob:  2.57% Token: | space|
Top 4th token. Logit: 11.96 Prob:  2.36% Token: | our|
Top 5th token. Logit: 11.77 Prob:  1.96% Token: | orbit|
Top 6th token. Logit: 11.58 Prob:  1.62% Token: |.|
Top 7th token. Logit: 11.41 Prob:  1.36% Token: | an|
Top 8th token. Logit: 11.38 Prob:  1.33% Token: | fact|
Top 9th token. Logit: 11.31 Prob:  1.23% Token: | my|


In [None]:
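# Note: "Mirosoft" is a misspelling of "Microsoft"; the output below reflects the misspelled prompt.
example_prompt = "The founder of Mirosoft is"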
example_answer = " Bill"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'The', ' founder', ' of', ' M', 'iro', 'soft', ' is']
Tokenized answer: [' Bill']


Top 0th token. Logit: 14.26 Prob: 13.71% Token: | a|
Top 1th token. Logit: 13.26 Prob:  5.04% Token: | the|
Top 2th token. Logit: 12.86 Prob:  3.36% Token: | back|
Top 3th token. Logit: 12.80 Prob:  3.16% Token: | an|
Top 4th token. Logit: 12.68 Prob:  2.82% Token: | now|
Top 5th token. Logit: 12.64 Prob:  2.71% Token: | one|
Top 6th token. Logit: 12.18 Prob:  1.71% Token: | currently|
Top 7th token. Logit: 12.09 Prob:  1.55% Token: | looking|
Top 8th token. Logit: 12.02 Prob:  1.45% Token: | in|
Top 9th token. Logit: 11.99 Prob:  1.42% Token: | planning|


Try using GPT-2 from Hugging Face instead of TransformerLens.

In [None]:
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='gpt2-xl')
set_seed(42)

RuntimeError: ignored

It can't be loaded within the TransformerLens setup. Try a different notebook:

https://colab.research.google.com/drive/1-pUjv-gdcdClslXRI5eFBs8KM8pN8pjD#scrollTo=lohZXK5RXLb7

Try running -xl in the ROME demo:
https://colab.research.google.com/github/kmeng01/rome/blob/main/notebooks/rome.ipynb#scrollTo=b09f79fa

It has issues loading the transformers library from the repo, so modify it to pip install the libraries directly:

https://colab.research.google.com/drive/1ZExSRkPx1lUNfrrbM2d2Ge1nLut2RlAE#scrollTo=bb3c3c37

NameError: name 'init_empty_weights' is not defined

# Try loading bigger models: gpt-neo-2.7B

Check how much RAM is needed when TransformerLens loads Neo, J, etc. neo-2.7b requires an A100 GPU.

https://neelnanda-io.github.io/TransformerLens/model_properties_table.html

https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/main/demos/LLaMA.ipynb#scrollTo=5ShxdHDR0Hks

https://huggingface.co/EleutherAI/gpt-neo-2.7B
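
As a rough guide to the RAM question above, the memory needed for the weights alone can be estimated from the parameter count. This is a back-of-the-envelope sketch: the parameter counts are the nominal sizes suggested by the model names, and the estimate ignores activations, caches, and any temporary copies made while loading.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Nominal parameter counts (assumptions, not measured from the loaded models).
models = {"gpt2-xl": 1.5e9, "gpt-neo-2.7B": 2.7e9}

for name, n_params in models.items():
    fp32 = weight_memory_gb(n_params, bytes_per_param=4)
    fp16 = weight_memory_gb(n_params, bytes_per_param=2)
    print(f"{name}: ~{fp32:.1f} GB in float32, ~{fp16:.1f} GB in float16")
```

For comparison, the checkpoint downloads logged in this notebook are 6.43G for gpt2-xl and 10.7G for gpt-neo-2.7B, which is in line with a float32 estimate.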

In [8]:
device = "cuda" if torch.cuda.is_available() else "cpu"
model = HookedTransformer.from_pretrained("gpt-neo-2.7B", device=device) 

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/10.7G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/200 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/90.0 [00:00<?, ?B/s]

Using pad_token, but it is not set yet.


Loaded pretrained model gpt-neo-2.7B into HookedTransformer


### Mary has X. John has Y. Z is John.

In [9]:
example_prompt = "Mary has X. John has Y. Z is John. Ashley has Y. Ben has X. Z is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' X', '.', ' John', ' has', ' Y', '.', ' Z', ' is', ' John', '.', ' Ashley', ' has', ' Y', '.', ' Ben', ' has', ' X', '.', ' Z', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 15.06 Prob: 29.44% Token: | Ben|
Top 1th token. Logit: 14.56 Prob: 17.87% Token: | Ashley|
Top 2th token. Logit: 14.52 Prob: 17.19% Token: | John|
Top 3th token. Logit: 14.02 Prob: 10.42% Token: | Mary|
Top 4th token. Logit: 12.74 Prob:  2.90% Token: | X|
Top 5th token. Logit: 12.39 Prob:  2.04% Token: | Y|
Top 6th token. Logit: 12.29 Prob:  1.85% Token: | Z|
Top 7th token. Logit: 11.60 Prob:  0.93% Token: | not|
Top 8th token. Logit: 11.20 Prob:  0.62% Token: | a|
Top 9th token. Logit: 10.90 Prob:  0.46% Token: | also|


This model has 2.5b parameters compared to -xl's 1.5b, and 'Ashley' (17.87%) is closer in probability to the top token 'Ben' (29.44%) than it was for -xl (12.11% vs 46.29%).
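
To compare models on the same footing, one option is to renormalize the probability of the correct name over just the two candidate names. The sketch below applies this to the corrupted variable-letter prompt, using probabilities copied from the outputs in this notebook; the helper name `two_way_share` is an illustrative assumption.

```python
def two_way_share(p_correct: float, p_incorrect: float) -> float:
    """Probability of the correct name, renormalized over the two candidate names."""
    return p_correct / (p_correct + p_incorrect)


# (P(" Ashley"), P(" Ben")) on "... Ashley has Y. Ben has X. Z is" for each model.
results = {
    "gpt2-large": two_way_share(0.0051, 0.6766),
    "gpt2-xl": two_way_share(0.1211, 0.4629),
    "gpt-neo-2.7B": two_way_share(0.1787, 0.2944),
}

for name, share in results.items():
    print(f"{name}: share of correct answer = {share:.3f}")
```

A share above 0.5 would mean the model prefers the correct name; on this prompt all three models stay below that, although the share grows with model size.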

#### Using actual words instead of variable letters

In [10]:
example_prompt = "Mary has a hat. John has no hat. The student is John. Ashley has a hat. Ben has no hat. The student is"
example_answer = " Ben"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' no', ' hat', '.', ' The', ' student', ' is', ' John', '.', ' Ashley', ' has', ' a', ' hat', '.', ' Ben', ' has', ' no', ' hat', '.', ' The', ' student', ' is']
Tokenized answer: [' Ben']


Top 0th token. Logit: 15.43 Prob: 40.44% Token: | Ben|
Top 1th token. Logit: 15.02 Prob: 26.82% Token: | Ashley|
Top 2th token. Logit: 13.77 Prob:  7.72% Token: | Mary|
Top 3th token. Logit: 13.69 Prob:  7.13% Token: | John|
Top 4th token. Logit: 12.52 Prob:  2.21% Token: | not|
Top 5th token. Logit: 11.61 Prob:  0.88% Token: | the|
Top 6th token. Logit: 11.36 Prob:  0.69% Token: | a|
Top 7th token. Logit: 10.67 Prob:  0.35% Token: | also|
Top 8th token. Logit: 10.66 Prob:  0.34% Token: | Benjamin|
Top 9th token. Logit: 10.55 Prob:  0.31% Token: | no|


In [11]:
example_prompt = "Mary has a hat. John has no hat. The student is John. Ashley has no hat. Ben has a hat. The student is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' no', ' hat', '.', ' The', ' student', ' is', ' John', '.', ' Ashley', ' has', ' no', ' hat', '.', ' Ben', ' has', ' a', ' hat', '.', ' The', ' student', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 15.51 Prob: 46.94% Token: | Ben|
Top 1th token. Logit: 14.62 Prob: 19.33% Token: | Ashley|
Top 2th token. Logit: 13.88 Prob:  9.18% Token: | Mary|
Top 3th token. Logit: 13.34 Prob:  5.37% Token: | John|
Top 4th token. Logit: 12.49 Prob:  2.30% Token: | not|
Top 5th token. Logit: 11.78 Prob:  1.13% Token: | the|
Top 6th token. Logit: 11.41 Prob:  0.77% Token: | a|
Top 7th token. Logit: 10.83 Prob:  0.44% Token: | in|
Top 8th token. Logit: 10.81 Prob:  0.43% Token: | no|
Top 9th token. Logit: 10.79 Prob:  0.42% Token: | also|


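The above still fails the analogy: the model's top token is ' Ben' in both of the last two prompts, whether or not Ben is the one with no hat. Its correct answer on the first of the two may therefore reflect a preference for ' Ben' (possibly because it is the most recently mentioned name) rather than the pattern.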
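
The two prompts above form a counterbalanced pair: same names, same order, only the trait assignment swapped. A minimal sketch of building such pairs and classifying the outcome is shown below. `make_pair` and `diagnose` are hypothetical helpers written for illustration, and the `observed` list is copied by hand from the two outputs above rather than computed.

```python
def make_pair(name_a, name_b, trait, no_trait, role="The student"):
    """Return two (prompt, answer) tuples that differ only in which
    target name gets `no_trait`. The source example is fixed."""
    source = f"Mary {trait}. John {no_trait}. {role} is John."
    first = f"{source} {name_a} {trait}. {name_b} {no_trait}. {role} is"
    second = f"{source} {name_a} {no_trait}. {name_b} {trait}. {role} is"
    # The correct answer is whoever matches John's description.
    return [(first, " " + name_b), (second, " " + name_a)]


def diagnose(top_predictions, answers):
    """Classify behaviour on a counterbalanced pair of prompts."""
    if list(top_predictions) == list(answers):
        return "follows the pattern"
    if len(set(top_predictions)) == 1:
        return "same name in both orderings"
    return "mixed"


pair = make_pair("Ashley", "Ben", "has a hat", "has no hat")
answers = [answer for _, answer in pair]

# Top tokens reported in the two cells above (copied by hand).
observed = [" Ben", " Ben"]
print(diagnose(observed, answers))
```

A model that follows the pattern should flip its answer between the two prompts; one that does not flip is likely keying on something else, such as name position.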

In [12]:
example_prompt = "Mary has a hat. John has a cane. John has a vest. The student is John. Ashley has a cane. Ashley has a vest. Ben has a hat. The student is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' a', ' cane', '.', ' John', ' has', ' a', ' vest', '.', ' The', ' student', ' is', ' John', '.', ' Ashley', ' has', ' a', ' cane', '.', ' Ashley', ' has', ' a', ' vest', '.', ' Ben', ' has', ' a', ' hat', '.', ' The', ' student', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 15.71 Prob: 36.55% Token: | Ashley|
Top 1th token. Logit: 15.62 Prob: 33.67% Token: | Ben|
Top 2th token. Logit: 14.49 Prob: 10.82% Token: | John|
Top 3th token. Logit: 14.18 Prob:  7.91% Token: | Mary|
Top 4th token. Logit: 11.87 Prob:  0.79% Token: | the|
Top 5th token. Logit: 11.79 Prob:  0.73% Token: | not|
Top 6th token. Logit: 10.78 Prob:  0.27% Token: | Benjamin|
Top 7th token. Logit: 10.68 Prob:  0.24% Token: | also|
Top 8th token. Logit: 10.62 Prob:  0.23% Token: | Ash|
Top 9th token. Logit: 10.50 Prob:  0.20% Token: | Jack|


Amazingly, the model gets the analogy's correct answer (Ashley) here, but just barely: 36.55% for ' Ashley' versus 33.67% for ' Ben'.

Next, use more examples of the pattern in the source. First, give it a second example (with its answer) where:
- the order of the traits is switched
- each person has only one trait


In [13]:
example_prompt = "Mary has a hat. John has a cane. The student is John. Ron has a cane. Horace has a hat. The student is Ron. Ashley has a cane. Ben has a hat. The student is"
example_answer = " Ashley"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' a', ' cane', '.', ' The', ' student', ' is', ' John', '.', ' Ron', ' has', ' a', ' cane', '.', ' Hor', 'ace', ' has', ' a', ' hat', '.', ' The', ' student', ' is', ' Ron', '.', ' Ashley', ' has', ' a', ' cane', '.', ' Ben', ' has', ' a', ' hat', '.', ' The', ' student', ' is']
Tokenized answer: [' Ashley']


Top 0th token. Logit: 15.94 Prob: 41.88% Token: | Ben|
Top 1th token. Logit: 15.65 Prob: 31.33% Token: | Ashley|
Top 2th token. Logit: 14.55 Prob: 10.50% Token: | Hor|
Top 3th token. Logit: 13.34 Prob:  3.13% Token: | John|
Top 4th token. Logit: 13.28 Prob:  2.95% Token: | Ron|
Top 5th token. Logit: 12.46 Prob:  1.30% Token: | Mary|
Top 6th token. Logit: 11.46 Prob:  0.48% Token: | the|
Top 7th token. Logit: 11.28 Prob:  0.40% Token: | not|
Top 8th token. Logit: 11.15 Prob:  0.35% Token: | Ash|
Top 9th token. Logit: 10.36 Prob:  0.16% Token: | Benjamin|


Now it fails and predicts ' Ben'. Perhaps the previous success was a coincidence (for instance, ' Ashley' appeared twice in that prompt) rather than evidence that the model recognized the pattern.
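
One way to tell coincidence from a real effect is to evaluate many randomized prompts instead of a single hand-written one. The sketch below generates few-shot prompts in the same format, shuffling names and sentence order. `NAMES` and `make_prompt` are hypothetical, and the names are only assumed to be single tokens. That should be checked against the tokenizer, since ' Horace' above was split into ' Hor' and 'ace'.

```python
import random

# Assumed single-token names; verify with the model's tokenizer before use.
NAMES = ["Mary", "John", "Ron", "Ashley", "Ben", "Alice", "Tom", "Sarah"]


def make_prompt(rng, n_source=2, student_trait="a cane", other_trait="a hat"):
    """Build one prompt with `n_source` solved examples and one open target.

    In every example the student is the person holding `student_trait`;
    the order of the two sentences is shuffled independently each time.
    """
    names = rng.sample(NAMES, 2 * (n_source + 1))
    parts = []
    answer = None
    for i in range(n_source + 1):
        student, other = names[2 * i], names[2 * i + 1]
        sentences = [
            f"{student} has {student_trait}.",
            f"{other} has {other_trait}.",
        ]
        rng.shuffle(sentences)
        parts.extend(sentences)
        if i < n_source:
            parts.append(f"The student is {student}.")
        else:
            parts.append("The student is")
            answer = " " + student
    return " ".join(parts), answer


rng = random.Random(0)  # fixed seed so the prompts are reproducible
prompt, answer = make_prompt(rng)
print(prompt)
print(repr(answer))
```

Each generated prompt and answer can then be passed to `utils.test_prompt` as in the cells above, and the fraction of prompts where the answer is the top token gives a less anecdotal picture.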

### Test if the model can recognize who "has" a trait

Note that the model must first recognize “has”, that is, associate each name with its trait. Test whether it can do this at all before trying to chain it into systems of traits:

In [16]:
example_prompt = "Mary has a hat. John does not have a hat. The person who has the hat is"
example_answer = " Mary"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' does', ' not', ' have', ' a', ' hat', '.', ' The', ' person', ' who', ' has', ' the', ' hat', ' is']
Tokenized answer: [' Mary']


Top 0th token. Logit: 14.08 Prob: 19.60% Token: | Mary|
Top 1th token. Logit: 13.05 Prob:  7.03% Token: | the|
Top 2th token. Logit: 12.82 Prob:  5.56% Token: | not|
Top 3th token. Logit: 12.67 Prob:  4.79% Token: | a|
Top 4th token. Logit: 12.29 Prob:  3.29% Token: | wearing|
Top 5th token. Logit: 12.15 Prob:  2.85% Token: | John|
Top 6th token. Logit: 12.10 Prob:  2.72% Token: | called|
Top 7th token. Logit: 11.73 Prob:  1.87% Token: | more|
Top 8th token. Logit: 11.56 Prob:  1.58% Token: | happy|
Top 9th token. Logit: 11.50 Prob:  1.49% Token: | also|


In [18]:
example_prompt = "Mary has a hat. John has a cane. The person who has the hat is"
example_answer = " Mary"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'Mary', ' has', ' a', ' hat', '.', ' John', ' has', ' a', ' cane', '.', ' The', ' person', ' who', ' has', ' the', ' hat', ' is']
Tokenized answer: [' Mary']


Top 0th token. Logit: 12.85 Prob:  8.05% Token: | Mary|
Top 1th token. Logit: 12.61 Prob:  6.31% Token: | a|
Top 2th token. Logit: 12.52 Prob:  5.75% Token: | the|
Top 3th token. Logit: 11.81 Prob:  2.83% Token: | going|
Top 4th token. Logit: 11.75 Prob:  2.67% Token: | not|
Top 5th token. Logit: 11.69 Prob:  2.52% Token: | in|
Top 6th token. Logit: 11.55 Prob:  2.20% Token: | called|
Top 7th token. Logit: 11.36 Prob:  1.82% Token: | wearing|
Top 8th token. Logit: 11.12 Prob:  1.42% Token: | more|
Top 9th token. Logit: 11.05 Prob:  1.33% Token: | always|


In [19]:
example_prompt = "John has a cane. Mary has a hat. The person who has the hat is"
example_answer = " Mary"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'John', ' has', ' a', ' cane', '.', ' Mary', ' has', ' a', ' hat', '.', ' The', ' person', ' who', ' has', ' the', ' hat', ' is']
Tokenized answer: [' Mary']


Top 0th token. Logit: 13.18 Prob:  9.53% Token: | the|
Top 1th token. Logit: 12.35 Prob:  4.14% Token: | going|
Top 2th token. Logit: 12.34 Prob:  4.11% Token: | a|
Top 3th token. Logit: 12.29 Prob:  3.91% Token: | not|
Top 4th token. Logit: 12.23 Prob:  3.68% Token: | John|
Top 5th token. Logit: 11.98 Prob:  2.88% Token: | more|
Top 6th token. Logit: 11.87 Prob:  2.57% Token: | in|
Top 7th token. Logit: 11.78 Prob:  2.35% Token: | always|
Top 8th token. Logit: 11.56 Prob:  1.89% Token: | called|
Top 9th token. Logit: 11.55 Prob:  1.86% Token: | wearing|


In [33]:
example_prompt = "John has a cane. Mary has a hat. Who has the hat?"
example_answer = " Mary"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'John', ' has', ' a', ' cane', '.', ' Mary', ' has', ' a', ' hat', '.', ' Who', ' has', ' the', ' hat', '?']
Tokenized answer: [' Mary']


Top 0th token. Logit: 15.87 Prob: 50.61% Token: |\n|
Top 1th token. Logit: 13.24 Prob:  3.64% Token: | John|
Top 2th token. Logit: 13.12 Prob:  3.22% Token: | Who|
Top 3th token. Logit: 12.92 Prob:  2.65% Token: | The|
Top 4th token. Logit: 12.87 Prob:  2.50% Token: | Mary|
Top 5th token. Logit: 12.61 Prob:  1.93% Token: | And|
Top 6th token. Logit: 12.39 Prob:  1.55% Token: |\n\n|
Top 7th token. Logit: 12.17 Prob:  1.25% Token: | I|
Top 8th token. Logit: 12.03 Prob:  1.09% Token: | Well|
Top 9th token. Logit: 11.93 Prob:  0.98% Token: | It|


In [23]:
example_prompt = "John is tall. Mary is red. The person who is red is"
example_answer = " Mary"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'John', ' is', ' tall', '.', ' Mary', ' is', ' red', '.', ' The', ' person', ' who', ' is', ' red', ' is']
Tokenized answer: [' Mary']


Top 0th token. Logit: 14.14 Prob: 20.37% Token: | John|
Top 1th token. Logit: 13.04 Prob:  6.74% Token: | tall|
Top 2th token. Logit: 12.84 Prob:  5.52% Token: | the|
Top 3th token. Logit: 12.75 Prob:  5.06% Token: | taller|
Top 4th token. Logit: 12.64 Prob:  4.51% Token: | not|
Top 5th token. Logit: 12.32 Prob:  3.29% Token: | a|
Top 6th token. Logit: 12.22 Prob:  2.99% Token: | Mary|
Top 7th token. Logit: 11.75 Prob:  1.86% Token: | also|
Top 8th token. Logit: 11.51 Prob:  1.46% Token: | short|
Top 9th token. Logit: 11.43 Prob:  1.35% Token: | shorter|


In [32]:
example_prompt = "John is tall. Mary is red. Who is red?"
example_answer = " Mary"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'John', ' is', ' tall', '.', ' Mary', ' is', ' red', '.', ' Who', ' is', ' red', '?']
Tokenized answer: [' Mary']


Top 0th token. Logit: 14.78 Prob: 29.12% Token: |\n|
Top 1th token. Logit: 12.95 Prob:  4.65% Token: | John|
Top 2th token. Logit: 12.84 Prob:  4.19% Token: | The|
Top 3th token. Logit: 12.50 Prob:  2.98% Token: | Mary|
Top 4th token. Logit: 12.36 Prob:  2.58% Token: | Who|
Top 5th token. Logit: 12.28 Prob:  2.39% Token: | I|
Top 6th token. Logit: 12.23 Prob:  2.27% Token: | Well|
Top 7th token. Logit: 12.11 Prob:  2.02% Token: | You|
Top 8th token. Logit: 11.99 Prob:  1.78% Token: | That|
Top 9th token. Logit: 11.98 Prob:  1.77% Token: | Red|
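
In the two question-style prompts ("Who has the hat?", "Who is red?") the top token is a newline, so the full-vocabulary ranking mixes "will the model answer now?" with "which name does it prefer?". A rough way to isolate the second question is to renormalize over the candidate names only. The sketch below does this with logits copied from the outputs above; `candidate_probs` is a hypothetical helper, not a TransformerLens function.

```python
import math


def candidate_probs(logits_by_token):
    """Softmax restricted to the given tokens (max-subtracted for stability)."""
    m = max(logits_by_token.values())
    exps = {tok: math.exp(logit - m) for tok, logit in logits_by_token.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Logits copied by hand from the two question-style cells above.
who_has_hat = {" John": 13.24, " Mary": 12.87}
who_is_red = {" John": 12.95, " Mary": 12.50}

for label, logits in [("Who has the hat?", who_has_hat),
                      ("Who is red?", who_is_red)]:
    probs = candidate_probs(logits)
    summary = ", ".join(f"{tok!r}: {p:.2f}" for tok, p in probs.items())
    print(f"{label} -> {summary}")
```

In both cases ' John' has the higher logit even though ' Mary' is the correct answer, so restricting to the two names does not rescue the model on these prompts.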


### Test other prompts from previous papers

In [30]:
example_prompt = "The Empire State Building is in"
example_answer = " New"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'The', ' Empire', ' State', ' Building', ' is', ' in']
Tokenized answer: [' New']


Top 0th token. Logit: 15.11 Prob: 25.85% Token: | the|
Top 1th token. Logit: 14.16 Prob:  9.95% Token: | New|
Top 2th token. Logit: 14.12 Prob:  9.60% Token: | a|
Top 3th token. Logit: 13.67 Prob:  6.15% Token: | danger|
Top 4th token. Logit: 12.55 Prob:  1.99% Token: | fact|
Top 5th token. Logit: 12.27 Prob:  1.51% Token: | trouble|
Top 6th token. Logit: 12.21 Prob:  1.43% Token: | need|
Top 7th token. Logit: 12.20 Prob:  1.40% Token: | its|
Top 8th token. Logit: 12.18 Prob:  1.38% Token: | many|
Top 9th token. Logit: 12.17 Prob:  1.37% Token: | flames|


In [29]:
example_prompt = "The city the Empire State Building is in is"
example_answer = " New"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'The', ' city', ' the', ' Empire', ' State', ' Building', ' is', ' in', ' is']
Tokenized answer: [' New']


Top 0th token. Logit: 14.19 Prob: 19.54% Token: | New|
Top 1th token. Logit: 12.92 Prob:  5.52% Token: | not|
Top 2th token. Logit: 12.81 Prob:  4.92% Token: | a|
Top 3th token. Logit: 12.74 Prob:  4.59% Token: | called|
Top 4th token. Logit: 12.48 Prob:  3.55% Token: | the|
Top 5th token. Logit: 11.78 Prob:  1.76% Token: | now|
Top 6th token. Logit: 11.77 Prob:  1.74% Token: | in|
Top 7th token. Logit: 11.72 Prob:  1.66% Token: | one|
Top 8th token. Logit: 11.71 Prob:  1.64% Token: | Manhattan|
Top 9th token. Logit: 11.62 Prob:  1.50% Token: | named|


In [31]:
example_prompt = "The city the Empire State Building is in is New"
example_answer = " York"
utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'The', ' city', ' the', ' Empire', ' State', ' Building', ' is', ' in', ' is', ' New']
Tokenized answer: [' York']


Top 0th token. Logit: 20.59 Prob: 96.50% Token: | York|
Top 1th token. Logit: 16.72 Prob:  2.01% Token: | Jersey|
Top 2th token. Logit: 14.41 Prob:  0.20% Token: | Haven|
Top 3th token. Logit: 14.39 Prob:  0.20% Token: | Roc|
Top 4th token. Logit: 14.30 Prob:  0.18% Token: | Orleans|
Top 5th token. Logit: 13.56 Prob:  0.09% Token: | Yorkers|
Top 6th token. Logit: 13.18 Prob:  0.06% Token: | London|
Top 7th token. Logit: 13.10 Prob:  0.05% Token: | Albany|
Top 8th token. Logit: 13.08 Prob:  0.05% Token: |burgh|
Top 9th token. Logit: 13.02 Prob:  0.05% Token: | Brunswick|
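
Because the answer "New York" spans two tokens, the last two cells can be combined with the chain rule to estimate the probability of the full answer. This is a small arithmetic sketch using the probabilities printed above; the variable names are illustrative.

```python
import math

# Probabilities copied from the two cells above.
p_new = 0.1954             # P(" New" | "The city the Empire State Building is in is")
p_york_given_new = 0.9650  # P(" York" | same prompt followed by " New")

# Chain rule: P(" New York" | prompt) = P(" New" | prompt) * P(" York" | prompt + " New")
p_new_york = p_new * p_york_given_new

# Summing log-probabilities is the numerically safer form for longer answers.
log_p_new_york = math.log(p_new) + math.log(p_york_given_new)

print(f"P(' New York' | prompt) ~ {p_new_york:.4f}")
print(f"log-prob: {log_p_new_york:.4f}")
```

Almost all of the uncertainty sits in the first token: once ' New' is produced, ' York' follows with high probability.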
