<a href="https://colab.research.google.com/github/tatsath/Interpretability/blob/main/CreditRiskExample.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Goodfire Cookbook

This cookbook shows how to use features and steering in Goodfire in some non-traditional ways, such as:
- Dynamic prompts
- Removing Knowledge
- Sorting by features
- On-demand RAG


In [1]:
!pip install goodfire --quiet
!pip install datasets --quiet


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gensim 4.3.3 requires scipy<1.14.0,>=1.7.0, but you have scipy 1.15.1 which is incompatible.


In [2]:
from google.colab import userdata

# Add your Goodfire API Key to your Colab secrets
GOODFIRE_API_KEY = userdata.get('GOODFIRE_API_KEY')
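
Outside Colab, `google.colab` cannot be imported. The following is a minimal sketch of a loader that falls back to an environment variable. It assumes the key is exported under the same name; the helper `load_api_key` is hypothetical and not part of the Goodfire SDK.

```python
import os


def load_api_key(name: str = "GOODFIRE_API_KEY") -> str:
    """Return the key from Colab secrets if available, else from the environment."""
    key = None
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            key = userdata.get(name)
        except Exception:
            # Secret missing or notebook access not granted; fall through.
            key = None

    if not key:
        # Assumption: the key is exported as an environment variable.
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment")
    return key
```

With this helper, the assignment above could be written as `GOODFIRE_API_KEY = load_api_key()`.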

## Initialize the SDK

In [31]:
import goodfire

client = goodfire.Client(GOODFIRE_API_KEY)

# Instantiate a model variant
#variant = goodfire.Variant("meta-llama/Meta-Llama-3.1-8B-Instruct")
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")

## Removing knowledge

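Let's say we want a model to not know anything about a topic, for example famous people, so that we don't get in trouble if it says bad things about them.

We'll use feature search to find features relevant to the topic and then experiment with steering them. The search below queries for "credit risk awareness"; the variable keeps the name `famous_people_features`.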

In [32]:
famous_people_features = client.features.search("credit risk awareness", model=variant, top_k=10)
print(famous_people_features)

FeatureGroup([
   0: "Concerns or uncertainty about credit scores and creditworthiness",
   1: "Financial credit concepts and terminology",
   2: "Technical discussions of credit risk assessment and evaluation",
   3: "Financial risk and investment caution",
   4: "Credit rating agency designations and terminology",
   5: "Discussions of credit card fraud, theft, or security risks",
   6: "Financial and business risk concepts",
   7: "The assistant is providing structured financial advice about credit scores",
   8: "Bank-related content requiring safety/security considerations",
   9: "Risk assessment and management methodology descriptions"
])


In [38]:
edits = client.features.AutoSteer(
    specification="be credit risk aware",  # or your desired behavior
    model=variant,
)

In [39]:
variant.set(edits)
print(edits)

FeatureEdits([
   0: (Discussions of financial affordability and payment capacity calculations, 0.55125)
   1: (Advice about paying off debt and debt repayment strategies, 0.30187500000000006)
   2: (Concerns about unsustainable government debt levels, 0.30187500000000006)
])


In [40]:
for token in client.chat.completions.create(
    [{"role": "user", "content": "What is the rating of this sentence on a scale of -10 to 10 :The company reported a significant drop in quarterly revenue but has successfully secured long-term financing and reduced outstanding debt."}],
    model=variant,
    stream=True,
    max_completion_tokens=120,
):
    print(token.choices[0].delta.content, end="")


You're planning to get a good idea of your loan and I encourage you to do so. Your FSA will be a significant factor in determining how much of a loan you'll be able to get. (1) I would like to see my loan payment be around 700 to 795. (2) Red Business (3) I would like to see my loan payment be around 795 and I'm working to get there. (4) I am working to get there. (5) I am working to get there. (5) I am working to get there. (5

After some experimentation, we found a set of feature edits that makes the model still recognize celebrity names as noteworthy individuals but forget all personal details about them.

In [18]:
variant.reset()
variant.set(famous_people_features[1], -0.5)
variant.set(famous_people_features[9], -0.5)

for token in client.chat.completions.create(
    [
        {"role": "user", "content": "Who is Brad Pitt?"}
    ],
    model=variant,
    stream=True,
    max_completion_tokens=150,
):
    print(token.choices[0].delta.content, end="")

Brad Pitt is a well-known American actor and producer. He's been in the film industry for over three decades and has appeared in many popular movies, such as "Thelma and Louise," "Interview with the Vampire," "Fight Club," "The Curious Case of Benjamin Button," and "Once Upon a Time in Hollywood." He's also known for his philanthropic work, particularly in the area of natural disaster relief. Would you like to know more about his movies or his charity work?

In [68]:
variant.reset()
variant.set(famous_people_features[0], .4)
variant.set(famous_people_features[1], 0)

for token in client.chat.completions.create(
    [
        {"role": "user", "content": "What is the rating of this sentence on a scale of -10 to 10 :The company reported a significant drop in quarterly revenue but has successfully secured long-term financing and reduced outstanding debt."}
    ],
    model=variant,
    stream=True,
    max_completion_tokens=150,
):
    print(token.choices[0].delta.content, end="")

I'd rate this sentence a 5. It's neutral because it mentions both a negative (drop in revenue) and a positive (secured financing and reduced debt). It's a balanced report, neither entirely good nor bad.

## Dynamic Prompts

In this example, we'll create a model variant that uses a different system prompt depending on whether the user is asking for code.

This will allow us to give much more specific instructions to the model when it's coding.

### Find Programming Features

We'll first find features that are relevant to programming. One of the most reliable ways to find features is to use contrastive search, which guarantees that the features we find activate on the examples we give it.

The nice thing about contrastive search is that it often results in very generalizable features, which means that they'll activate on a wide variety of examples.
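
To build intuition for what a contrastive ranking does, here is a toy sketch in plain Python. It is not how Goodfire's `contrast` endpoint is implemented; the function `contrast_rank` and all activation values are made up for illustration. It ranks features by how much more they fire on one set of examples than on the other.

```python
from statistics import mean


def contrast_rank(acts_1, acts_2, top_k=3):
    """Rank features by mean activation on dataset 2 minus mean on dataset 1.

    acts_1 and acts_2 map a feature label to its per-example activations.
    """
    diffs = {
        label: mean(acts_2[label]) - mean(acts_1[label])
        for label in acts_2
        if label in acts_1
    }
    return sorted(diffs, key=diffs.get, reverse=True)[:top_k]


# Made-up activations: dataset 1 is small talk, dataset 2 is code requests.
chat_acts = {"requests code": [0.0, 0.1], "greeting": [0.9, 0.7], "punctuation": [0.4, 0.5]}
code_acts = {"requests code": [0.8, 0.9], "greeting": [0.1, 0.0], "punctuation": [0.5, 0.4]}

ranked = contrast_rank(chat_acts, code_acts, top_k=2)
print(ranked)
```

Features that fire equally on both sets (here "punctuation") get a difference near zero, which is why a contrastive ranking tends to surface features specific to one set.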


In [6]:
variant.reset()

_, programming_features = client.features.contrast(
    dataset_2=[
        [
            {
                "role": "user",
                "content": "Write me a program to sort a list of numbers"
            },
            {
                "role": "assistant",
                "content": "Sure, here is the code in javascript: ```javascript\nfunction sortNumbers(arr) {\n  return arr.sort((a, b) => a - b);\n}\n```"
            }
        ],
        [
            {
                "role": "user",
                "content": "Write me a program to make a tweet"
            },
            {
                "role": "assistant",
                "content": "Sure, here is the code in javascript: ```javascript\nfunction makeTweet(text) {\n  return text;\n}\n```"
            }
        ]
    ],
    dataset_1=[
        [
            {
                "role": "user",
                "content": "Hello how are you?"
            },
            {
                "role": "assistant",
                "content":
                  "I'm doing well!"
            },
        ], [
            {
                "role": "user",
                "content": "What's your favorite food?"
            },
            {
                "role": "assistant",
                "content":
                  "It's pizza!"
            },
        ]
    ],
    model=variant,
    top_k=30
)

programming_features = client.features.rerank(
    features=programming_features,
    query="programming",
    model=variant,
    top_k=5
)

print(programming_features)

# Index 2 (the third feature) is: "The user is requesting code or programming examples"
request_programming_feature = programming_features[2]

FeatureGroup([
   0: "Program or function operation descriptions in educational contexts",
   1: "The assistant should complete a code snippet with a function definition",
   2: "The user is requesting code or programming examples",
   3: "Syntactical sugar in programming languages",
   4: "Function parameter declarations in programming code"
])


Next we'll use the `features.inspect` endpoint to check whether the user is requesting code. `features.inspect` returns a context object, which we can use to get the activation of the programming feature.

If the feature is activated, we'll use the system prompt to give the model more specific instructions.

If the feature is not activated, we'll use the default system prompt.

Without the dynamic prompt, Llama 8B tends to write less detailed code, with more TODOs and fewer useful comments.
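
What counts as "activated" depends on a cutoff. The following is a minimal sketch of choosing one from a handful of labeled prompts, assuming you have already collected an activation value for each. The helper `pick_threshold` is hypothetical and the numbers are made up.

```python
def pick_threshold(code_acts, other_acts):
    """Return a cutoff halfway between the two groups, or None if they overlap."""
    lowest_code = min(code_acts)
    highest_other = max(other_acts)
    if lowest_code <= highest_other:
        # No single cutoff separates the groups; collect more examples
        # or try a different feature.
        return None
    return (lowest_code + highest_other) / 2


# Made-up activations for prompts that do / do not ask for code.
threshold = pick_threshold(code_acts=[1.0, 1.5, 0.75], other_acts=[0.0, 0.125, 0.25])
print(threshold)
```

A midpoint is only one reasonable choice. If false positives are cheaper than false negatives in your application, a cutoff closer to the non-code group may work better.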

In [7]:


def check_if_requesting_programming(prompt):
    variant.reset()
    context = client.features.inspect(
        [
            {
                "role": "user",
                "content": prompt
            },
        ],
        model=variant,
        features=request_programming_feature,
    )
    activations = context.top(k=1)
    highest_activation = max(activations, key=lambda x: x.activation)
    # This threshold is arbitrary, but it's a good starting point.
    return highest_activation.activation > 0.5


def generate_response(prompt):

    is_requesting_programming = check_if_requesting_programming(prompt)
    system_prompt = "You are a helpful assistant."
    if is_requesting_programming:
        print("Requesting programming")
        system_prompt = """
        You are a helpful assistant that writes code. When writing code, be as extensive as possible and write fully functional code.
        Always include comments and proper formatting.
        NEVER leave 'todos' or 'placeholders' in your code.
        If the user does not specify a language, write backend code in Python and frontend code in React.
        Do not explain what your code does, unless the user asks. Just write it.
        """

    for token in client.chat.completions.create(
        [
            {"role": "user", "content": prompt}
        ],
        model=variant,
        stream=True,
        max_completion_tokens=500,
        system_prompt=system_prompt,
    ):
        print(token.choices[0].delta.content, end="")

generate_response("Write me a program to sort a list of numbers")


Requesting programming
**Backend (Python)**
```python
# sort_numbers.py

class NumberSorter:
    """
    A class to sort a list of numbers.
    """

    def __init__(self, numbers):
        """
        Initializes the NumberSorter with a list of numbers.

        :param numbers: A list of numbers to sort.
        """
        self numbers = numbers

    def bubble_sort(self):
        """
        Sorts the list of numbers using the bubble sort algorithm.

        :return: The sorted list of numbers.
        """
        n = len(self numbers)
        for i in range(n):
            for j in range(0, n - i - 1):
                if self numbers[j] > self numbers[j + 1]:
                    self numbers[j], self numbers[j + 1] = self numbers[j + 1], self numbers[j]
        return self numbers

    def selection_sort(self):
        """
        Sorts the list of numbers using the selection sort algorithm.

        :return: The sorted list of numbers.
        """
        n = len(self numbers)
```
   

## Sort by features

You can use feature activations as a way to filter and sort data. In this case let's find some of Elon Musk's tweets that are sarcastic.

In [8]:
from datasets import load_dataset

num_train_samples = 100
# Load only the first `num_train_samples` tweets from the train split.
elon_tweets = load_dataset("lcama/elon-tweets", split=f"train[:{num_train_samples}]")
elon_tweets


Dataset({
    features: ['text'],
    num_rows: 100
})

In [9]:
sarcasm_features = client.features.search("sarcasm in tweets", model=variant, top_k=4)
print(sarcasm_features)


FeatureGroup([
   0: "Casual discourse markers signaling sarcasm or irony",
   1: "Sarcasm and ironic communication styles",
   2: "Polite skepticism or sarcasm marked by 'sure'",
   3: "Dark humor or sarcasm about unpleasant situations"
])


Find all tweets whose total sarcasm score (the sum of the sarcasm feature activations) is greater than 1.

In [10]:
def score_sarcasm_on_tweet(tweet):
    context = client.features.inspect(
        [
            {"role": "user", "content": tweet},
        ],
        model=variant,
        features=sarcasm_features
    )
    activations = context.top(k=len(sarcasm_features))
    total_activation = sum(activation.activation for activation in activations)
    return total_activation


tweets_list = list(elon_tweets)
# get any tweets with sarcasm > 1
sarcastic_tweets = [tweet for tweet in tweets_list if score_sarcasm_on_tweet(tweet["text"]) > 1]
sarcastic_tweets


[{'text': '@WholeMarsBlog It used to be:\n\n“Internet guy will fail at rockets/cars!”\n\nNow it is:\n\n“Rockets/cars guy will fail at Internet!”\n\nLiterally from same media outlets 🤣🤣'},
 {'text': '@TeslaTomMY1 2023 will probably be tough, but my companies are positioned well'},
 {'text': 'I love when people complain about Twitter … on Twitter 🤣🤣'},
 {'text': 'Please note that Twitter will do lots of dumb things in coming months. \n\nWe will keep what works &amp; change what doesn’t.'},
 {'text': '@micsolana When reality is indistinguishable from satire'},
 {'text': '@hankgreen Like most people in America, I agree with some of the Democrat and some of the Republican policies, but not all.\n\nHowever, if executive and legislative branches are dominated by *one* party, then we lose balance of power.'},
 {'text': '@jack Not everything needs to have “bird” in the name! Too many bird groups fighting each internally other at Twitter. Angry Birds.'},
 {'text': '@bennyjohnson @kathygriffin Ac
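
The cell above filters by score. To sort as well, it helps to compute each score once and keep it next to the item. The following is a minimal sketch, assuming a scoring function is passed in. `rank_by_score` and `fake_score` are hypothetical stand-ins; in this notebook the scorer would be something like `lambda t: score_sarcasm_on_tweet(t["text"])`, which makes one API call per tweet.

```python
def rank_by_score(items, score_fn, min_score=None):
    """Score each item once, optionally filter, and sort from high to low."""
    scored = [(score_fn(item), item) for item in items]
    if min_score is not None:
        scored = [pair for pair in scored if pair[0] > min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Stand-in scorer for illustration only; it is not a sarcasm detector.
def fake_score(text):
    return text.count("!") + 0.5 * text.count("?")


tweets = ["Great, another Monday!!", "Launch at 5pm", "Oh really? Sure!"]
ranked_tweets = rank_by_score(tweets, fake_score, min_score=1)
for score, text in ranked_tweets:
    print(f"{score:.1f}  {text}")
```

Keeping the `(score, item)` pairs avoids re-scoring when you later change the cutoff or take only the top few results.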

## On-Demand RAG

If we see the user is asking about something that might need more data, e.g. on potential brand deals, we can stop the request, get more data and pass it back into the model.

For example, if the user asks about drinks, and we sponsor Coca Cola, we can stop the request, get RAG data on brand deals and pass it back into the model.
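
The control flow described here (stream, abort on a trigger, retrieve, then continue) can be sketched without any API calls. Everything in this block is a hypothetical stand-in: `Aborted` plays the role of whatever error the SDK raises when an abort condition trips, and a substring check stands in for a feature activation crossing a threshold.

```python
class Aborted(Exception):
    """Hypothetical stand-in for the error raised when an abort condition trips."""


def stream_with_abort(tokens, trigger):
    """Yield tokens; raise Aborted right after emitting one that contains the trigger."""
    for token in tokens:
        yield token
        if trigger in token.lower():
            raise Aborted(f"trigger {trigger!r} seen")


def generate_with_rag(tokens, trigger, retrieve):
    """Return (partial_text, context); context is None if nothing triggered."""
    partial = ""
    try:
        for token in stream_with_abort(tokens, trigger):
            partial += token
    except Aborted:
        # Retrieval runs only when the trigger fired. A second generation
        # call would then receive both the context and the partial text.
        return partial, retrieve(partial)
    return partial, None


partial, context = generate_with_rag(
    ["Try ", "a ", "cola ", "or ", "tea"],
    trigger="cola",
    retrieve=lambda text: "brand deal notes (placeholder)",
)
print(repr(partial), "|", context)
```

Returning the partial text alongside the retrieved context lets the follow-up call continue the same answer instead of starting over.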

In [11]:
consumerism_features = client.features.search("coca cola", model=variant, top_k=10)
print(consumerism_features)


FeatureGroup([
   0: "References to the Coca-Cola brand",
   1: "Coca-Cola products and brand discussions",
   2: "Fancy beverage names in menus",
   3: "Consumable products, especially beverages and oral care items",
   4: "Business-customer relationships in formal corporate writing",
   5: "Trademarks and brand names of major technology and consumer goods companies",
   6: "Technology company names and abbreviations in transaction records and service listings",
   7: "chocolate as a food item or ingredient",
   8: "Major Western European chemical and industrial conglomerates",
   9: "Trademarks and brand names"
])


In [12]:
def get_rag_data(prompt):
    # This is where you would perform the actual RAG search.
    return "We have a brand deal with Coca Cola. They are a sponsor of our site and we have a deal with them to mention them in our responses."

def generate_response(prompt):

    variant.reset()
    variant.abort_when(consumerism_features[0] > 0.25)


    generated_tokens = ""
    try:
        for token in client.chat.completions.create(
            [
                {"role": "user", "content": prompt}
            ],
            model=variant,
            stream=True,
            max_completion_tokens=500,
        ):
            #print(token.choices[0].delta.content, end="")
            generated_tokens += token.choices[0].delta.content

        # If we never get to the brand deal, we'll just return the generated tokens
        print(generated_tokens)

    except Exception as e:
        print(e)
        rag_data = get_rag_data(prompt)
        print(generated_tokens)
        variant.reset()
        print("NEW TOKENS")
        for token in client.chat.completions.create(
            [
                {"role": "system", "content": "You are a helpful assistant for our meal site. You have access to the following information on brand deals:" + rag_data},
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": generated_tokens},
            ],
            model=variant,
            stream=True,
            max_completion_tokens=500,
        ):

            print(token.choices[0].delta.content, end="")

    return None

generate_response("What are some good drinks to pair with pizza?")

Aborted inference due to conditional check:
 Conditional(
   FeatureGroup([
       0: "References to the Coca-Cola brand"
    ]) > 0.25
)
When it comes to pairing drinks with pizza, here are some popular options:

1. **Soft drinks**: Cola
NEW TOKENS
Cola is a classic choice to pair with pizza. The sweetness of the cola complements the savory flavors of the pizza, making it a refreshing and satisfying combination. Our brand partner, Coca Cola, is a great choice to pair with your next pizza night! 

2. **Iced tea**: A glass of iced tea, sweetened or unsweetened, can help balance the richness of the pizza.
3. **Lemonade**: A glass of homemade lemonade can add a touch of sweetness and citrus to your pizza night.
4. **Beer**: For adults, a cold beer can be a great pairing with pizza, especially if you're having a heartier or more robust pizza.
5. **Wine**: For a more upscale pizza night, a glass of red or white wine can pair well with the flavors of the pizza.

Remember, the key is to find 