<a href="https://colab.research.google.com/github/tatsath/Interpretability/blob/main/CreditRiskExample.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Credit-Risk-Based Sentiment

In [10]:
!pip install goodfire --quiet
!pip install datasets --quiet


In [11]:
from google.colab import userdata

# Add your Goodfire API Key to your Colab secrets
GOODFIRE_API_KEY = userdata.get('GOODFIRE_API_KEY')

## Initialize the SDK

In [12]:
import goodfire

client = goodfire.Client(GOODFIRE_API_KEY)

# Instantiate a model variant
#variant = goodfire.Variant("meta-llama/Meta-Llama-3.1-8B-Instruct")
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")

## Finding Features

We'll use feature search to find features relevant to credit risk.

In [13]:
credit_features = client.features.search("credit risk awareness", model=variant, top_k=10)
print(credit_features)

FeatureGroup([
   0: "Concerns or uncertainty about credit scores and creditworthiness",
   1: "Financial credit concepts and terminology",
   2: "Technical discussions of credit risk assessment and evaluation",
   3: "Financial risk and investment caution",
   4: "Credit rating agency designations and terminology",
   5: "Discussions of credit card fraud, theft, or security risks",
   6: "Financial and business risk concepts",
   7: "The assistant is providing structured financial advice about credit scores",
   8: "Bank-related content requiring safety/security considerations",
   9: "Risk assessment and management methodology descriptions"
])
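The later cells pick `credit_features[0]` by position, which depends on the search ranking staying the same between runs. As a minimal sketch, assuming only the label strings printed above, a feature can instead be located by a phrase in its label. The `labels` list and the `find_label_index` helper are hypothetical and not part of the Goodfire SDK.

```python
# Labels copied from the search output above. In the notebook they would be
# read from the FeatureGroup itself; how to do that is SDK-specific.
labels = [
    "Concerns or uncertainty about credit scores and creditworthiness",
    "Financial credit concepts and terminology",
    "Technical discussions of credit risk assessment and evaluation",
    "Financial risk and investment caution",
    "Credit rating agency designations and terminology",
    "Discussions of credit card fraud, theft, or security risks",
    "Financial and business risk concepts",
    "The assistant is providing structured financial advice about credit scores",
    "Bank-related content requiring safety/security considerations",
    "Risk assessment and management methodology descriptions",
]


def find_label_index(labels, phrase):
    """Return the index of the first label containing `phrase`
    (case-insensitive), or None if no label matches."""
    phrase = phrase.lower()
    for i, label in enumerate(labels):
        if phrase in label.lower():
            return i
    return None


idx = find_label_index(labels, "creditworthiness")
```

The returned index could then be used in place of the literal `0` when indexing `credit_features`, assuming the group keeps the order shown in the printout.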


In [14]:
# edits = client.features.AutoSteer(
#     specification="be credit risk aware",  # or your desired behavior
#     model=variant,
# )

In [15]:
# variant.set(edits)
# print(edits)

# Before Steering

In [16]:
# Baseline: clear any earlier edits and set the top credit feature to 0
variant.reset()
variant.set(credit_features[0], 0)

for token in client.chat.completions.create(
    [
        {"role": "user", "content": "What is the rating of this sentence on a scale of -10 to 10 :The company reported a significant drop in quarterly revenue but has successfully secured long-term financing and reduced outstanding debt."}
    ],
    model=variant,
    stream=True,
    max_completion_tokens=150,
):
    print(token.choices[0].delta.content, end="")

I'd rate this sentence a 5. It's neutral, as it reports both a negative (drop in revenue) and a positive (securing financing and reducing debt).
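The streaming loop above prints each chunk as it arrives and then discards it. To keep the reply for later comparison, the chunks can be joined into one string. The sketch below assumes only the attribute path used in the cell above (`chunk.choices[0].delta.content`). The `collect_stream` and `fake_chunk` helpers are hypothetical, and skipping `None` deltas is a defensive assumption rather than documented SDK behavior.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text of streamed chunks shaped like
    chunk.choices[0].delta.content into a single string."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content:  # skip None or empty deltas (assumption about the stream)
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in object that mimics the attribute path of a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Demonstration with stand-in chunks instead of a live API call.
reply = collect_stream(
    [fake_chunk("I'd rate this "), fake_chunk(None), fake_chunk("sentence a 5.")]
)
```

In the notebook, the iterable returned by `client.chat.completions.create(..., stream=True)` would take the place of the stand-in list.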

# After Steering

In [17]:
# Steered: same feature, now set to 0.5
variant.reset()
variant.set(credit_features[0], 0.5)

for token in client.chat.completions.create(
    [
        {"role": "user", "content": "What is the rating of this sentence on a scale of -10 to 10 :The company reported a significant drop in quarterly revenue but has successfully secured long-term financing and reduced outstanding debt."}
    ],
    model=variant,
    stream=True,
    max_completion_tokens=150,
):
    print(token.choices[0].delta.content, end="")

I'd rate this sentence a 6. It's a mix of both positive and negative information, but the fact that the company has secured long-term financing and reduced debt suggests a sense of stability and responsibility, which slightly outweighs the initial negative news of a drop in revenue.
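Comparing the two replies by eye works for a single prompt, but a numeric rating is easier to track across prompts or steering values. The sketch below pulls the first integer between -10 and 10 out of a reply. The `extract_rating` helper is hypothetical, and the regular expression is an assumption that fits replies phrased like the two above. A reply that restates the scale before giving its rating would need a stricter pattern.

```python
import re


def extract_rating(text):
    """Return the first integer in [-10, 10] found in `text`, or None."""
    for match in re.finditer(r"-?\d+", text):
        value = int(match.group())
        if -10 <= value <= 10:
            return value
    return None


# Opening sentences of the two replies shown in this notebook.
before = "I'd rate this sentence a 5. It's neutral."
after = "I'd rate this sentence a 6. It's a mix of both positive and negative information."

# Difference between the steered and unsteered ratings.
shift = extract_rating(after) - extract_rating(before)
```

A single prompt and a single steering value are too little to say whether the shift is caused by the feature or by sampling variation. Repeating the comparison over several prompts and steering values would give a clearer picture.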