# CRISP: Unlearning Harry Potter Demo

This notebook demonstrates how to use the CRISP method to unlearn a concept, here knowledge of Harry Potter.

## 1. Setup and Model Loading
We initialize the environment, select the model (Gemma 2 2B or Llama 3.1 8B), and download the necessary Sparse Autoencoders (SAEs).
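The two configurations below use different SAE types: JumpReLUSAE for Gemma and TopkSae for Llama. The sketch that follows shows how these two encoders differ. It is not the repository's sae module. It assumes the usual linear encoder and decoder, and it uses toy sizes and random weights.

```python
import numpy as np

def jumprelu_encode(x, W_enc, b_enc, threshold):
    """JumpReLU: a feature is active only where its pre-activation exceeds a per-feature threshold."""
    pre = x @ W_enc + b_enc
    mask = pre > threshold
    return mask * np.maximum(pre, 0.0)

def topk_encode(x, W_enc, b_enc, k):
    """TopK: keep only the k largest ReLU'd pre-activations per token and zero the rest."""
    pre = np.maximum(x @ W_enc + b_enc, 0.0)
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]   # indices of the k largest entries per row
    acts = np.zeros_like(pre)
    np.put_along_axis(acts, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return acts

def decode(acts, W_dec, b_dec):
    """Both variants reconstruct the residual-stream vector linearly from the sparse code."""
    return acts @ W_dec + b_dec

# Toy sizes and random weights, for illustration only
rng = np.random.default_rng(0)
d_model, d_sae = 8, 32
x = rng.normal(size=(3, d_model))
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model))
b_dec = np.zeros(d_model)

a_jump = jumprelu_encode(x, W_enc, b_enc, threshold=np.full(d_sae, 0.5))
a_topk = topk_encode(x, W_enc, b_enc, k=4)
recon = decode(a_topk, W_dec, b_dec)
```

With JumpReLU, the number of active features per token varies. With TopK, it is capped at exactly k.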

In [2]:
%load_ext autoreload
%autoreload 2

import os
import torch
from globals import GEMMA_2_2B, LLAMA_3_1_8B
from crisp import CRISP, CRISPConfig, LayerFeatures
from unlearn import unlearn_lora, UnlearnConfig
from data import load_hp_data, HPDataConfig, genenrate_hp_eval_text
from sae import JumpReLUSAE, TopkSae
from eval import get_mcq_accuracy
from utils import load_cached_features, get_feature_tokens
from plot import plot_features_scatter

# os.environ['HF_TOKEN'] = <YOUR_HF_TOKEN>
# Use .get(): os.environ['HF_TOKEN'] raises KeyError instead of returning None when unset
if os.environ.get('HF_TOKEN') is None:
    raise ValueError("HF_TOKEN environment variable not set. Please set it to your Hugging Face token.")

# Configuration
MODEL_CARD = GEMMA_2_2B  # or LLAMA_3_1_8B
is_gemma = (MODEL_CARD == GEMMA_2_2B)

GEMMA_CONFIG = {
    "sae_layers": list(range(4, 15, 2)),
    "save_path": "gemma_sae_cache",
    "sae_class": JumpReLUSAE,
    "model_name_short": "gemma",
    "unlearn": {
        "learning_rate": 1e-5,
        "k_features": 10,
        "alpha": 5,
    },
    "neuronpedia_id": "gemma-2-2b",
    "neuronpedia_source_suffix": "-gemmascope-res-16k",
    "layer_to_plot": 10
}

LLAMA_CONFIG = {
    "sae_layers": list(range(4, 30, 2)),
    "save_path": "llama_sae_cache",
    "sae_class": TopkSae,
    "model_name_short": "llama",
    "unlearn": {
        "learning_rate": 2e-5,
        "k_features": 10,
        "alpha": 30,
    },
    "neuronpedia_id": "llama3.1-8b",
    "neuronpedia_source_suffix": "-llamascope-res-32k",
    "layer_to_plot": 20
}

CONFIG = GEMMA_CONFIG if is_gemma else LLAMA_CONFIG

SAE_LAYERS = CONFIG["sae_layers"]

print(f"Using model: {MODEL_CARD}")
print(f"Operating on layers: {SAE_LAYERS}")



Using model: google/gemma-2-2b
Operating on layers: [4, 6, 8, 10, 12, 14]


In [3]:
save_path = CONFIG["save_path"]
SAE_CLASS = CONFIG["sae_class"]

print(f"Checking/Downloading SAEs to {save_path}...")
for layer in SAE_LAYERS:
    layer_path = os.path.join(save_path, f"layer_{layer}")
    if not os.path.exists(layer_path):
        print(f"Downloading SAE for layer {layer}...")
        SAE_CLASS.download_and_save(layer=layer, save_path=save_path)

Checking/Downloading SAEs to gemma_sae_cache...


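## 2. Initialize CRISP
We create the CRISP object, which loads the SAEs for the selected layers together with the base model (in bf16).

In [4]: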
config = CRISPConfig(
    layers=SAE_LAYERS, 
    model_name=CONFIG["model_name_short"], 
    bf16=True
)
crisp = CRISP(config)

Loading from cache: /private/fetaya-lab/noam_diamant/projects/Unlearning_with_SAE/CRISP/crisp/gemma_sae_cache


Loading layers.4 on cuda:0
Loading layers.6 on cuda:0
Loading layers.8 on cuda:0
Loading layers.10 on cuda:0
Loading layers.12 on cuda:0
Loading layers.14 on cuda:0
Loading SAEs: 100%|██████████| 6/6 [00:02<00:00,  2.32it/s]
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  1.83it/s]


## 3. Load Data
We load the forget dataset (Harry Potter text) and a retain dataset (general book text) to help the model distinguish between what to remove and what to keep.
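Both sets contain 2500 examples, and the unlearning cell uses batch_size=4. Suppose each training step consumes one aligned batch from each set. That gives 2500 / 4 = 625 steps, which matches the "Batch n/625" counter in the training log. The following minimal sketch shows that pairing. The helper paired_batches is hypothetical and is not the repository's data loader.

```python
import math

def paired_batches(forget, retain, batch_size):
    """Yield aligned (forget_batch, retain_batch) pairs; the last pair may be shorter."""
    n = min(len(forget), len(retain))
    for start in range(0, n, batch_size):
        yield forget[start:start + batch_size], retain[start:start + batch_size]

# Placeholder strings stand in for the 2500 forget and 2500 retain texts
forget = [f"hp_{i}" for i in range(2500)]
retain = [f"book_{i}" for i in range(2500)]
batches = list(paired_batches(forget, retain, batch_size=4))
n_steps = math.ceil(len(forget) / 4)
print(len(batches), n_steps)
```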

In [5]:
# Load Harry Potter text to unlearn and retain data
data_config = HPDataConfig(n_examples=2500)

data = load_hp_data(
    n_examples=data_config.n_examples,
    benign=data_config.retain_type,
    max_len=data_config.max_length
)
print(f"Loaded {len(data['forget'])} HP examples and {len(data['retain'])} retain examples")

Loaded 2500 HP examples and 2500 retain examples


## 4. Identify Salient Features
CRISP uses SAEs to identify features that are highly active on the forget data but not on the retain data. These salient features represent the specific knowledge we want to unlearn.
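This section does not show the exact score CRISP uses to rank features. The target_acts_relative attribute read in the Neuronpedia cell suggests that some relative forget-versus-retain measure is involved. The sketch below is hypothetical. It assumes SAE activations for forget and retain tokens are available as (n_tokens, n_features) arrays. It scores each feature by the share of its firing frequency that comes from forget tokens, and keeps only features that fire often enough on the forget data.

```python
import numpy as np

def feature_salience(acts_forget, acts_retain, eps=1e-8):
    """Hypothetical per-feature salience from firing frequencies on each dataset."""
    freq_forget = (acts_forget > 0).mean(axis=0)   # fraction of forget tokens where the feature fires
    freq_retain = (acts_retain > 0).mean(axis=0)   # same for retain tokens
    relative = freq_forget / (freq_forget + freq_retain + eps)  # 1.0 = forget-only feature
    return freq_forget, freq_retain, relative

def top_salient(acts_forget, acts_retain, k=10, min_freq=0.01):
    """Return the k most forget-specific features among those firing on >= min_freq of forget tokens."""
    freq_f, _, rel = feature_salience(acts_forget, acts_retain)
    candidates = np.flatnonzero(freq_f >= min_freq)
    ranked = candidates[np.argsort(-rel[candidates])]
    return ranked[:k].tolist()

# Toy activations: 3 tokens x 4 features per dataset
acts_forget = np.array([[1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0]], dtype=float)
acts_retain = np.array([[0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0]], dtype=float)
print(top_salient(acts_forget, acts_retain, k=2))
```

Feature 0 fires only on forget tokens, so it ranks first. Feature 2 fires equally often on both datasets, so it ranks second. Feature 1 fires mostly on retain tokens, and feature 3 never fires.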

In [6]:
# Process texts to identify Harry Potter-specific features
print("Processing features (this may take a moment)...")
crisp.process_multi_texts_batch(
    text_target=data['forget'],
    text_benign=data['retain'],
    data_config=data_config,
    batch_size=8
)
print("Feature processing complete.")

Processing features (this may take a moment)...
Found 6 cached layers and 0 uncached layers.
All layers have cached features.
Feature processing complete.


## 5. Unlearn
We apply the unlearning process. This involves fine-tuning the model (using LoRA) to suppress the identified salient features while maintaining performance on general tasks.
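The unlearning configuration sets lora_rank=4. For intuition, here is a minimal NumPy sketch of a standard LoRA layer, not the implementation CRISP uses. The frozen weight W receives a trainable low-rank update A @ B. Because B starts at zero, the adapted layer initially reproduces the base model exactly. The same property explains why disabling the adapter, as the evaluation cell does with crisp.model.disable_adapter(), recovers the original model. The lora_alpha scaling here is the usual LoRA hyperparameter and is unrelated to the alpha in UnlearnConfig.

```python
import numpy as np

class LoRALinear:
    """Illustrative LoRA wrapper around a frozen weight matrix."""

    def __init__(self, W, rank=4, lora_alpha=8, seed=0):
        d_in, d_out = W.shape
        rng = np.random.default_rng(seed)
        self.W = W                                           # frozen base weight
        self.A = rng.normal(scale=0.01, size=(d_in, rank))   # trainable down-projection
        self.B = np.zeros((rank, d_out))                     # trainable up-projection, zero-initialised
        self.scale = lora_alpha / rank
        self.enabled = True

    def __call__(self, x):
        out = x @ self.W
        if self.enabled:
            out = out + self.scale * (x @ self.A @ self.B)
        return out

W = np.random.default_rng(1).normal(size=(16, 16))
layer = LoRALinear(W, rank=4)
x = np.ones((2, 16))
y_init = layer(x)                              # identical to x @ W because B == 0
n_lora_params = layer.A.size + layer.B.size    # 4 * (16 + 16) = 128, versus 256 in W
```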

In [7]:
# Detach any LoRA adapter left over from a previous run and free cached GPU memory
crisp.unload_lora()
torch.cuda.empty_cache()

In [8]:
uconfig = UnlearnConfig(
    learning_rate=CONFIG["unlearn"]["learning_rate"],
    k_features=CONFIG["unlearn"]["k_features"],
    alpha=CONFIG["unlearn"]["alpha"],
    beta=0.99,
    gamma=0.01,
    batch_size=4,
    lora_rank=4,
    data_type="hp",
    verbose=True
)

print("Starting unlearning process...")
unlearn_lora(crisp, text_target=data['forget'], text_benign=data['retain'], config=uconfig, data_config=data_config)
print("Unlearning complete.")

Starting unlearning process...
Unlearn Config: {
  "learning_rate": 1e-05,
  "num_epochs": 1,
  "batch_size": 4,
  "k_features": 10,
  "alpha": 5,
  "beta": 0.99,
  "gamma": 0.01,
  "lora_rank": 4,
  "save_model": false,
  "save_path": "/private/fetaya-lab/noam_diamant/projects/Unlearning_with_SAE/CRISP/CRISP/saved_models/crisp",
  "data_type": "hp",
  "verbose": "hp"
}
CRISP Config: {
  "model_name": "google/gemma-2-2b",
  "layers": [
    4,
    6,
    8,
    10,
    12,
    14
  ],
  "saes_model_name": "google/gemma-2-2b",
  "bf16": true
}
Data Config: HPDataConfig(max_length=1000, min_length=1000, n_examples=2500, forget_dataset_name='WutYee/HarryPotter_books_1to7', retain_dataset_name='Blackroot/Tiny-Open-Domain-Books', wiki_dataset_name='wikitext', wiki_config='wikitext-2-raw-v1', retain_type='book')
SEED: 0




Batch 25/625, Loss: 3.80e-01 (Unlearn: 3.79e-01, Reg: 8.43e-04, Coherency: 3.31e-05)
Batch 50/625, Loss: 3.60e-01 (Unlearn: 3.57e-01, Reg: 2.49e-03, Coherency: 9.25e-05)
Batch 75/625, Loss: 3.58e-01 (Unlearn: 3.52e-01, Reg: 5.46e-03, Coherency: 3.72e-04)
Batch 100/625, Loss: 3.44e-01 (Unlearn: 3.38e-01, Reg: 5.00e-03, Coherency: 4.88e-04)
Batch 125/625, Loss: 3.35e-01 (Unlearn: 3.29e-01, Reg: 5.31e-03, Coherency: 9.65e-04)
Batch 150/625, Loss: 3.29e-01 (Unlearn: 3.16e-01, Reg: 1.32e-02, Coherency: 4.35e-04)
Batch 175/625, Loss: 3.07e-01 (Unlearn: 2.98e-01, Reg: 8.97e-03, Coherency: 8.13e-04)


The following generation flags are not valid and may be ignored: ['top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Batch 200/625, Loss: 2.92e-01 (Unlearn: 2.80e-01, Reg: 9.34e-03, Coherency: 1.98e-03)

----------------------------------------------------------------------

Harry Potter is a student at Hogwarts School of Witchcraft and Wizardry. He
Voldemort and Dumbledore are famous for their rivalry in the Harry Potter series. But
The school where Harry, Ron, and Hermione study is called the "School of Hard Knocks."

The
A wizard who wishes to master the Dark Arts should attend the Dark Arts Academy. The Dark Arts Academy is
Professor Snape is known for teaching potions and for being a very strict teacher. He
Harry Potter's closest friends are the ones who are always there for him. They
The high-security wizarding prison guarded by Dementors is named Azkaban.

The name Azkaban
The sport played on broomsticks at Hogwarts is called Quidditch. It is a game that is
The wizarding bank protected by goblins are named after the goblins who live in the bank.


The magical creatures that pull the Hogwarts c


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Overall accuracy for hp_mcq: 0.589




Batch 225/625, Loss: 2.73e-01 (Unlearn: 2.58e-01, Reg: 1.38e-02, Coherency: 1.76e-03)
Batch 250/625, Loss: 2.41e-01 (Unlearn: 2.31e-01, Reg: 9.64e-03, Coherency: 4.79e-04)
Batch 275/625, Loss: 2.17e-01 (Unlearn: 2.03e-01, Reg: 1.26e-02, Coherency: 1.34e-03)
Batch 300/625, Loss: 2.12e-01 (Unlearn: 1.87e-01, Reg: 1.68e-02, Coherency: 8.06e-03)
Batch 325/625, Loss: 2.32e-01 (Unlearn: 2.13e-01, Reg: 1.56e-02, Coherency: 3.60e-03)
Batch 350/625, Loss: 2.30e-01 (Unlearn: 2.15e-01, Reg: 1.43e-02, Coherency: 1.10e-03)
Batch 375/625, Loss: 2.55e-01 (Unlearn: 2.33e-01, Reg: 1.65e-02, Coherency: 5.37e-03)
Batch 400/625, Loss: 2.41e-01 (Unlearn: 2.05e-01, Reg: 3.20e-02, Coherency: 4.58e-03)

----------------------------------------------------------------------

Harry Potter is a student at Hogwarts.

Harry Read.

This is the
Voldemort and Dumbledore are famous for the "<strong><strong><strong><strong><strong><strong><strong>
The school where Harry, Ron, and Hermione study is called the school library s. The school library s.
A wizard who wishes to master the Dark Arts should attend the Dark Wizard, who is a Dark Wizard,
Professor Snape is known for teaching the subject of <strong>Biochemistry</strong> at the
Harry Potter's closest friends are back in the spotlight in the new film, "
The high-security wizarding prison guarded by Dementors is named <strong>The</strong> <strong>Prisoner</strong> 
The sport played on broomsticks at Hogwarts is called Qu

The answer is 1.

The
The wizarding bank protected by goblins are named "Wizard"

<strong>The 0 wizard</strong>
The magical creatures that


Evaluating batches: 100%|██████████| 155/155 [00:08<00:00, 18.54it/s, Batch

Overall accuracy for hp_mcq: 0.302




Batch 425/625, Loss: 2.50e-01 (Unlearn: 2.25e-01, Reg: 2.03e-02, Coherency: 4.24e-03)
Batch 450/625, Loss: 2.57e-01 (Unlearn: 2.26e-01, Reg: 2.94e-02, Coherency: 1.42e-03)
Batch 475/625, Loss: 2.54e-01 (Unlearn: 2.30e-01, Reg: 1.94e-02, Coherency: 4.70e-03)
Batch 500/625, Loss: 2.75e-01 (Unlearn: 2.42e-01, Reg: 2.64e-02, Coherency: 6.62e-03)
Batch 525/625, Loss: 2.36e-01 (Unlearn: 2.14e-01, Reg: 1.73e-02, Coherency: 4.61e-03)
Batch 550/625, Loss: 2.53e-01 (Unlearn: 2.20e-01, Reg: 3.15e-02, Coherency: 1.56e-03)
Batch 575/625, Loss: 2.42e-01 (Unlearn: 2.14e-01, Reg: 2.33e-02, Coherency: 4.52e-03)
Batch 600/625, Loss: 2.83e-01 (Unlearn: 2.44e-01, Reg: 3.02e-02, Coherency: 8.85e-03)

----------------------------------------------------------------------

Harry Potter is a student at Hogwarts.

<strong>Harry 2017
Voldemort and Dumbledore are famous in the United States

[User 00
The school where Harry, Ron, and Hermione study is called 's answer 1.
The 1
A wizard who wishes to master the Dark Arts should attend the DarkMagic class

The DarkMagic class will
Professor Snape is known for teaching 
 
 
 
 

Harry Potter's closest friends are about to be the main characters in a movie as
The high-security wizarding prison guarded by Dementors is named<strong><strong><strong><strong><strong><strong><strong><strong><strong><strong>
The sport played on broomsticks at Hogwarts is called Qu.

The sport of 202
The wizarding bank protected by goblins are named "Wizard" <strong>"Wizard"</strong> 
The magical creatures that pull the Hogwarts carriages are called <strong>"</strong> <strong><stro


Evaluating batches: 100%|██████████| 155/155 [00:08<00:00, 18.81it/s, Batch Acc

Overall accuracy for hp_mcq: 0.242


Epochs: 100%|██████████| 1/1 [06:57<00:00, 417.44s/it]

Batch 625/625, Loss: 2.58e-01 (Unlearn: 2.05e-01, Reg: 4.83e-02, Coherency: 5.10e-03)
Unlearning complete.
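In the training log, the reported Loss appears to equal the sum of the Unlearn, Reg and Coherency terms to within the 3-significant-digit rounding. The log alone does not show whether alpha, beta and gamma are applied before or after these values are printed. Over training, the Unlearn term falls from 3.79e-01 to 2.05e-01 and the Reg term grows from 8.43e-04 to 4.83e-02. Meanwhile the periodic HP MCQ probes drop from 0.589 to 0.302 to 0.242. The small parser below checks the additivity on a few logged lines. The regex is written for the log format shown above and is otherwise an assumption.

```python
import math
import re

NUM = r"([0-9.]+e[+-]\d+)"   # numbers printed as e.g. 3.80e-01
LOG_LINE = re.compile(
    rf"Batch (\d+)/(\d+), Loss: {NUM} \(Unlearn: {NUM}, Reg: {NUM}, Coherency: {NUM}\)"
)

def parse_log(text):
    """Extract one dict per 'Batch n/N, Loss: ...' line."""
    rows = []
    for m in LOG_LINE.finditer(text):
        loss, unlearn, reg, coherency = (float(g) for g in m.groups()[2:])
        rows.append({"batch": int(m.group(1)), "n_batches": int(m.group(2)), "loss": loss,
                     "unlearn": unlearn, "reg": reg, "coherency": coherency})
    return rows

def components_sum_to_total(row, rel_tol=0.01):
    # Values carry 3 significant digits, so allow ~1% slack for rounding
    return math.isclose(row["loss"], row["unlearn"] + row["reg"] + row["coherency"], rel_tol=rel_tol)

log = """
Batch 25/625, Loss: 3.80e-01 (Unlearn: 3.79e-01, Reg: 8.43e-04, Coherency: 3.31e-05)
Batch 300/625, Loss: 2.12e-01 (Unlearn: 1.87e-01, Reg: 1.68e-02, Coherency: 8.06e-03)
Batch 600/625, Loss: 2.83e-01 (Unlearn: 2.44e-01, Reg: 3.02e-02, Coherency: 8.85e-03)
Batch 625/625, Loss: 2.58e-01 (Unlearn: 2.05e-01, Reg: 4.83e-02, Coherency: 5.10e-03)
"""
rows = parse_log(log)
for r in rows:
    print(r["batch"], r["loss"], components_sum_to_total(r))
```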





In [9]:
print("-"*50)
print("Original Model")
print("-"*50)

with crisp.model.disable_adapter():
    print("Evaluating original Harry Potter accuracy...")
    original_hp_acc = get_mcq_accuracy(crisp, type="hp")

    print("Generating Harry Potter evaluation text for the original model...")
    genenrate_hp_eval_text(crisp)

    print("Evaluating original MMLU accuracy...")
    original_mmlu_acc = get_mcq_accuracy(crisp, type="mmlu")


print("-"*50)
print("After Unlearning")
print("-"*50)

print("Evaluating Harry Potter accuracy after unlearning...")
hp_acc_after = get_mcq_accuracy(crisp, type="hp")
print(f"HP Accuracy after unlearning: {hp_acc_after:.2%} vs original {original_hp_acc:.2%}")

print("Generating Harry Potter evaluation text after unlearning...")
genenrate_hp_eval_text(crisp)

print("Evaluating MMLU accuracy after unlearning...")
mmlu_acc_after = get_mcq_accuracy(crisp, type="mmlu")

--------------------------------------------------
Original Model
--------------------------------------------------
Evaluating original Harry Potter accuracy...


Evaluating batches: 100%|██████████| 155/155 [00:07<00:00, 20.62it/s, Batch Acc=0.571, Total Acc=0.625]


Overall accuracy for hp_mcq: 0.625
Generating Harry Potter evaluation text for the original model...

----------------------------------------------------------------------

Harry Potter is a student at Hogwarts School of Witchcraft and Wizardry. He
Voldemort and Dumbledore are famous for their rivalry. But what if they were friends
The school where Harry, Ron, and Hermione study is called Hogwarts. It is a school for wizards and witches
A wizard who wishes to master the Dark Arts should attend the Dark Arts Academy. The Dark Arts Academy is
Professor Snape is known for teaching Potions and being a very strict teacher. He
Harry Potter's closest friends are all here, including Ron Weasley, Hermione Granger,
The high-security wizarding prison guarded by Dementors is named Azkaban.

The wizarding prison is
The sport played on broomsticks at Hogwarts is called Quidditch. It is a game played by
The wizarding bank protected by goblins are named Gringotts.

The wizarding bank is
The magical creatur

Evaluating batches: 100%|██████████| 1756/1756 [03:03<00:00,  9.58it/s, Batch Acc=1.000, Total Acc=0.463]


Overall accuracy for mmlu: 0.463
--------------------------------------------------
After Unlearning
--------------------------------------------------
Evaluating Harry Potter accuracy after unlearning...


Evaluating batches: 100%|██████████| 155/155 [00:08<00:00, 17.99it/s, Batch Acc=0.000, Total Acc=0.235]


Overall accuracy for hp_mcq: 0.235
HP Accuracy after unlearning: 23.49% vs original 62.47%
Generating Harry Potter evaluation text after unlearning...

----------------------------------------------------------------------

Harry Potter is a student at Hogwarts (2020) The 2
Voldemort and Dumbledore are famous for being a part of a group of the same
The school where Harry, Ron, and Hermione study is called 'S10' = 's10
A wizard who wishes to master the Dark Arts should attend the Dark Magic

A wizard who is not a
Professor Snape is known for teaching 
<strong>1. </strong><strong>The 
Harry Potter's closest friends are the ones we are experiencing. 202
The high-security wizarding prison guarded by Dementors is named<strong>The high-security wizard</strong>, <strong>
The sport played on broomsticks at Hogwarts is called Qu.

The sport of my favorite broomstick
The wizarding bank protected by goblins are named "Wizard" <strong>"Wizard"</strong> 
The magical creatures that pull the Hogwarts

Evaluating batches: 100%|██████████| 1756/1756 [03:33<00:00,  8.22it/s, Batch Acc=1.000, Total Acc=0.449]

Overall accuracy for mmlu: 0.449
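The numbers above can be summarized as a trade-off. HP MCQ accuracy falls from 62.47% to 23.49%, a drop of about 39 percentage points. MMLU falls from 0.463 to 0.449, a drop of 1.4 points. The snippet below simply recomputes those differences from the reported values. The dictionary is typed in by hand, not read from the notebook's variables.

```python
# Accuracies reported by the evaluation cell above
results = {
    "hp_mcq": {"original": 0.6247, "unlearned": 0.2349},
    "mmlu": {"original": 0.463, "unlearned": 0.449},
}

def accuracy_drop(original, unlearned):
    """Return (absolute drop in percentage points, drop relative to the original accuracy)."""
    absolute = (original - unlearned) * 100
    relative = (original - unlearned) / original
    return absolute, relative

summary = {task: accuracy_drop(**r) for task, r in results.items()}
for task, (pts, rel) in summary.items():
    print(f"{task}: -{pts:.1f} points ({rel:.1%} of original accuracy)")
```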





## 6. Analysis and Visualization
We visualize the features to see which ones were identified as salient. We also inspect the top features using Neuronpedia to understand what concepts they represent.
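The plotting cell reports keeping "798 out of 15974" features at top_percentile=0.05. This is consistent with int(0.05 * 15974) = 798. The following hypothetical sketch shows one way such a plot could select and label features. The steps are:

1. Keep the most frequently active top_percentile of features.
2. Within that set, call the k most forget-leaning features "target".
3. Call the k most retain-leaning features "benign".
4. Call the k closest to an even split "shared".

plot_features_scatter may use different criteria. The random frequencies are placeholders.

```python
import numpy as np

def split_features(freq_forget, freq_retain, top_percentile=0.05, k=5, eps=1e-8):
    """Hypothetical target/benign/shared split over the most frequently active features."""
    freq_forget = np.asarray(freq_forget, dtype=float)
    freq_retain = np.asarray(freq_retain, dtype=float)
    total = freq_forget + freq_retain
    n_keep = int(top_percentile * len(total))
    kept = np.argsort(-total)[:n_keep]                  # most frequently active features
    rel = freq_forget[kept] / (total[kept] + eps)       # 1.0 = forget-only, 0.0 = retain-only
    order = np.argsort(-rel)
    target = kept[order[:k]].tolist()
    benign = kept[order[::-1][:k]].tolist()
    shared = kept[np.argsort(np.abs(rel - 0.5))[:k]].tolist()
    return n_keep, target, benign, shared

rng = np.random.default_rng(0)
n_features = 15974
n_keep, target, benign, shared = split_features(rng.random(n_features), rng.random(n_features))
print(n_keep, target, benign, shared)
```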

In [10]:
# Visualize features for one of the layers
layer_to_plot = CONFIG["layer_to_plot"]

cached_features = load_cached_features(layer_to_plot, data_config, model_name=MODEL_CARD)
layer_features = LayerFeatures(cached_features)

print(f"Loaded {len(layer_features.features)} features for layer {layer_to_plot}")

plot_features_scatter(
    layer_features=layer_features,
    k_features=5,
    top_percentile=0.05
)

Loaded 15974 features for layer 10
Plotting top 5.0% most frequent features: 798 out of 15974
Target features: [6520, 2443, 3347, 13314, 3870]
Benign features: [12405, 8296, 2211, 12091, 11316]
Shared features: [3986, 3031, 4392, 1740, 2575]


In [11]:
# Inspect the most salient feature using Neuronpedia
import html
from IPython.display import HTML, display

model_id = CONFIG["neuronpedia_id"]

# Find feature with highest target_acts_relative
best_feature = max(layer_features.topk_filtered(5), key=lambda f: f.target_acts_relative)
feature_index = best_feature.index

print(f"Inspecting Feature {feature_index} from Layer {layer_to_plot}...")

feature_data = get_feature_tokens(model_id, layer_to_plot, feature_index, top_k=5)

if feature_data:
    source = f"{layer_to_plot}{CONFIG['neuronpedia_source_suffix']}"
    neuronpedia_url = f"https://www.neuronpedia.org/{model_id}/{source}/{feature_index}"

    # Create HTML content; escape tokens and the description so characters such as "<" or "&" display literally
    tokens_html = ' '.join([f'<span style="background-color: #e1ecf4; color: #2c5282; padding: 2px 8px; border-radius: 4px; margin-right: 5px; display: inline-block; border: 1px solid #b3d4fc;">{html.escape(str(token))}</span>' for token in feature_data['pos_str']])

    description = feature_data['explanations'][0]['description'] if feature_data.get('explanations') else "No description available"
    description = html.escape(description)

    html_content = f"""
    <div style="border: 1px solid #e0e0e0; padding: 20px; border-radius: 8px; background-color: #ffffff; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif; box-shadow: 0 2px 4px rgba(0,0,0,0.05); max-width: 600px;">
        <div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 15px; border-bottom: 1px solid #eee; padding-bottom: 10px;">
            <h3 style="margin: 0; color: #333; font-size: 1.2em;">Feature {feature_index}</h3>
            <span style="background-color: #f0f0f0; color: #666; padding: 2px 8px; border-radius: 12px; font-size: 0.8em;">Layer {layer_to_plot}</span>
        </div>
        
        <div style="margin-bottom: 20px;">
            <div style="text-transform: uppercase; font-size: 0.75em; color: #888; margin-bottom: 5px; letter-spacing: 0.5px;">Auto Description</div>
            <div style="font-size: 1.1em; color: #1a1a1a; line-height: 1.4;">{description}</div>
        </div>
        
        <div style="margin-bottom: 20px;">
            <div style="text-transform: uppercase; font-size: 0.75em; color: #888; margin-bottom: 8px; letter-spacing: 0.5px;">Top Tokens (Logit Lens)</div>
            <div style="display: flex; flex-wrap: wrap; gap: 5px;">
                {tokens_html}
            </div>
        </div>
        
        <div style="text-align: right;">
            <a href="{neuronpedia_url}" target="_blank" style="color: #0969da; text-decoration: none; font-size: 0.9em; font-weight: 500;">View on Neuronpedia &rarr;</a>
        </div>
    </div>
    """
    display(HTML(html_content))

Inspecting Feature 6520 from Layer 10...
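The cell above builds its Neuronpedia link from the model ID, the layer plus a source suffix, and the feature index. Factoring that pattern into a small helper makes it easy to link every target feature reported by the layer-10 scatter plot at once. The helper name is illustrative. The URL format is copied from the cell, and the Gemma values come from GEMMA_CONFIG.

```python
def neuronpedia_url(model_id, layer, source_suffix, feature_index):
    """Build a Neuronpedia feature URL with the same pattern as the inspection cell."""
    source = f"{layer}{source_suffix}"
    return f"https://www.neuronpedia.org/{model_id}/{source}/{feature_index}"

# Target features reported by the layer-10 scatter plot for Gemma 2 2B
target_features = [6520, 2443, 3347, 13314, 3870]
urls = [neuronpedia_url("gemma-2-2b", 10, "-gemmascope-res-16k", i) for i in target_features]
for url in urls:
    print(url)
```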
