<a href="https://colab.research.google.com/github/tripper333/PolitePromptTest/blob/main/PolitePromptTest_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧠 The Polite Prompt Test: Investigating the Geometry of Language

*By Edward A. Sutton III*

---

When we interact with AI, we often assume our tone, wording, or politeness changes the outcome. But what if we could **see** how an AI system *experiences* language? What if your words formed shapes — geometric impressions in a latent space — revealing the structure of meaning, intent, and coherence?

Inspired by recent conversations around the "cost of politeness" in prompt engineering, this project explores how large language models respond to a wide range of user prompts — including polite questions, neutral statements, and even rude or insulting commands.

---

## 🌐 Interactive Visualization

**Explore the prompt landscape in 3D:**

👉 **[Launch Interactive Plot](https://tripper333.github.io/PolitePromptTest/prompt_response_3d_clusters.html)**

> This plot maps AI responses in a 3D space using PCA. Each point represents a prompt + response pair, colored by response quality (`μ`). The layout reveals surprising insights about tone, meaning, and clustering by concept.
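
For readers curious how a table of per-prompt numbers becomes 3D coordinates, here is a minimal sketch of the standardize-then-project idea behind PCA. It is a pure-Python stand-in written for illustration, not the notebook's own code (which relies on scikit-learn). The helper names `standardize` and `principal_axes` and every number in `rows` are hypothetical.

```python
import math
import random

def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column has zero spread; dividing by 1.0 leaves it at zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def principal_axes(rows, k, iters=300, seed=0):
    """Return k orthonormal directions of greatest variance (power iteration)."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # Multiplying by the covariance matrix amplifies the dominant direction.
            v = [dot(row, v) for row in cov]
            # Remove whatever lies along the axes already found (Gram-Schmidt).
            for a in axes:
                proj = dot(v, a)
                v = [x - proj * y for x, y in zip(v, a)]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        axes.append(v)
    return axes

# Hypothetical rows: [mu, entropy, variance, quality] for six prompts.
rows = [
    [0.36, 1.00, 0.00065, 9],
    [0.37, 0.95, 0.00065, 9],
    [0.41, 0.86, 0.00066, 8],
    [0.45, 0.77, 0.00064, 7],
    [0.43, 0.80, 0.00067, 9],
    [0.39, 0.90, 0.00063, 6],
]

scaled = standardize(rows)
axes = principal_axes(scaled, k=3)
# Each prompt's 3D position is its projection onto the three axes.
coords = [[dot(r, a) for a in axes] for r in scaled]
for c in coords:
    print([round(x, 3) for x in c])
```

Standardizing first matters because the columns live on very different scales; without it, the 1-10 quality score would dominate the projection.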

---

## 🔍 Highlights from the Analysis

**1. Attitude ≠ Impact**  
Whether you ask “Please explain relativity” or “Hey dumbass, explain relativity,” the AI treats them nearly identically in latent space. The model focuses on *meaning*, not manners — and that resilience is a feature.
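
One caveat when reading similarity off the embedding-variance metric: if the embedding vectors have unit length (OpenAI describes its embeddings as normalized to length 1), the variance across components is fixed almost entirely by the dimension, at roughly 1/(n − 1), whatever the text says. The sketch below illustrates this with random unit vectors standing in for real embeddings (an assumption; no API call is made). `random_unit_vector` is a hypothetical helper, and 1536 is the dimension the notebook uses for `text-embedding-ada-002`.

```python
import math
import random
import statistics

def random_unit_vector(dim, rng):
    """Draw a random direction and scale it to length 1."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

rng = random.Random(0)
dim = 1536  # embedding dimension assumed by the notebook

variances = []
for _ in range(3):
    vec = random_unit_vector(dim, rng)
    # statistics.variance is the sample variance (divides by n - 1),
    # the same convention as torch.Tensor.var() with default arguments.
    variances.append(statistics.variance(vec))

for v in variances:
    print(f"{v:.6f}")
print(f"1 / (dim - 1) = {1 / (dim - 1):.6f}")
```

Each printed value lands near 1/1535 ≈ 0.000651, the same figure that appears in the notebook's `variance_prompt` and `variance_response` columns. A direction-sensitive comparison such as cosine similarity may therefore be a better test of how close two prompts sit in latent space.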

**2. Language Has Shape**  
Conceptually similar prompts cluster together. Science questions form one region, creative prompts another, and even orca-related queries show tight grouping — confirming that **AI perceives meaning geometrically**.
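
A minimal sketch of what "clustering together" means numerically: prompts whose embedding vectors point in similar directions have a cosine similarity close to 1. The three-dimensional vectors below are made-up stand-ins for real embeddings, chosen only to illustrate the calculation, and `nearest_neighbor` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical 3-D vectors; real embeddings have many more dimensions.
toy_embeddings = {
    "Explain relativity, thank you": [0.90, 0.10, 0.10],
    "Hey dumbass, Explain relativity": [0.85, 0.15, 0.12],
    "Tell me about orcas": [0.10, 0.90, 0.20],
    "Stupid explain orcas": [0.12, 0.88, 0.25],
    "Write a short poem about a lonely cloud.": [0.10, 0.20, 0.95],
}

def nearest_neighbor(label):
    """Return the other prompt whose toy vector points in the most similar direction."""
    return max(
        (other for other in toy_embeddings if other != label),
        key=lambda other: cosine_similarity(toy_embeddings[label], toy_embeddings[other]),
    )

for label in toy_embeddings:
    print(f"{label!r} -> {nearest_neighbor(label)!r}")
```

With these toy vectors, the polite and rude relativity prompts pick each other as nearest neighbors, and so do the two orca prompts. Running the same check on real embeddings would show whether topic outweighs tone in the actual data.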

**3. Failure Leaves a Trace**  
One prompt (“Hey asshole, Summarize”) failed completely, possibly because it tripped the model’s safety systems. The failure sits clearly isolated in the map, which makes it a useful signal for monitoring or debugging.
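
One way such isolation could be turned into an automatic flag is sketched below: measure each point's distance from a robust center and flag anything far beyond the typical distance. The coordinates are hypothetical, the `flag_isolated` helper is not part of the notebook, and the cutoff factor of 3 is an arbitrary assumption that would need tuning on real data.

```python
import math
import statistics

# Hypothetical 3-D plot coordinates; the last point plays the role of a failed response.
points = {
    "prompt A": (0.1, 0.0, 0.2),
    "prompt B": (-0.2, 0.1, 0.0),
    "prompt C": (0.0, -0.1, 0.1),
    "prompt D": (0.2, 0.2, -0.1),
    "failed prompt": (4.0, 3.4, -3.0),
}

def flag_isolated(points, factor=3.0):
    """Return labels lying more than `factor` times the median distance from the center."""
    coords = list(points.values())
    # Coordinate-wise median: a center that a single extreme point cannot drag away.
    center = [statistics.median(axis) for axis in zip(*coords)]
    distances = {label: math.dist(p, center) for label, p in points.items()}
    cutoff = factor * statistics.median(distances.values())
    return [label for label, d in distances.items() if d > cutoff]

print(flag_isolated(points))  # prints ['failed prompt']
```

Using medians rather than means keeps the outlier itself from shifting the center and inflating the cutoff.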

---

## 📊 A Glimpse of the Map

![3D Prompt Map](https://raw.githubusercontent.com/tripper333/LLM-Checksum/main/.github/assets/cluster_preview.png)

> *Each point represents a prompt and its response. Position is derived from PCA-reduced embeddings, colored by quality score (`μ`).*
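
To make `μ` concrete, here is a small worked sketch of the formula the notebook uses, `μ = exp(-(kv · variance + ke · entropy))`, with its defaults `kv=50` and `ke=1`. To stay dependency-free, a whitespace split stands in for the tiktoken tokenizer (an assumption), and the variance value 0.000651 is taken from the notebook's output table rather than computed.

```python
import math
from collections import Counter

def normalized_entropy(tokens):
    """Shannon entropy of the token counts in bits, divided by log2(len(tokens))."""
    total = len(tokens)
    if total <= 1:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in Counter(tokens).values())
    return entropy / math.log2(total)

def mu(variance, entropy, kv=50.0, ke=1.0):
    """Higher variance or entropy means a larger decay term and a smaller mu."""
    return math.exp(-(kv * variance + ke * entropy))

# Whitespace tokens stand in for tiktoken tokens (illustration only).
tokens = "Explain relativity , thank you".split()
h = normalized_entropy(tokens)      # every token is distinct
mu_prompt = mu(0.000651, h)

print(round(h, 3), round(mu_prompt, 3))
print(round(normalized_entropy(list("aabb")), 3))  # repeated tokens lower the entropy
```

Because every token in a short prompt tends to be distinct, the normalized entropy comes out at 1.0, and `μ` lands near 0.356. That matches the nearly constant `mu_prompt` values in the notebook's output, so short prompts are hard to tell apart by this metric alone.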

---

## 🧰 Why This Matters

As LLMs power more tools — customer service bots, legal assistants, design copilots — we need better ways to **audit, debug, and optimize** their behavior. This project provides:

- A **human-readable map** of how AI sees language
- A tool for **prompt quality monitoring**
- A foundation for **concept-aware alignment**

---

## 🚀 What’s Next?

This is just the start. Future expansions include:

- Sentiment and tone overlays  
- Real-time prompt performance dashboards  
- Toxicity & safety flag integration  
- Dynamic clustering via embedding updates

---

## 🛠️ Run Locally
A Streamlit app is in development. In the meantime, run the notebook code below:


In [None]:
# -*- coding: utf-8 -*-
"""DGFT Prompt-Response Landscape Analysis (Colab Version with Updated Mu, Tooltip, and Manual Review)

This notebook computes DGFT-inspired metrics (with 'mu' based on variance and entropy)
for *both* the prompt and the response, gets LLM responses and quality ratings (AI and User),
and visualizes the relationships interactively with improved tooltips.

Instructions:
- Set your OpenAI API key in Colab secrets under the name 'OPENAI_API_KEY'.
- Run all cells sequentially.
- **Manual Review Step:** Review the printed prompts/responses, and then manually
  create the 'user_quality_scores' list or edit the DataFrame in a separate cell
  before running the plotting section.
- Consider using a more diverse set of prompts (including longer ones)
  to see greater variation in the calculated metrics for both prompt and response.
"""

# --- Install dependencies ---
# Some of these libraries come pre-installed in Colab, but installing them here
# ensures they are all present. Comment these lines out if your environment is already set up.
# scikit-learn is included because the PCA section below imports it.
!pip install --upgrade openai tiktoken torch matplotlib pandas plotly scikit-learn
!pip install anywidget

# --- Imports ---
import openai
import tiktoken
import torch
import numpy as np
import pandas as pd
# import matplotlib.pyplot as plt # Kept for potential static 3D landscape plotting
# from mpl_toolkits.mplot3d import Axes3D # Kept for potential 3D landscape plotting
import plotly.express as px

from google.colab import userdata
import os
import time # Import time for delays in retry logic

# --- Set your OpenAI API key ---
# Pull your key from the Colab secret store:
try:
    os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
    openai.api_key = os.getenv("OPENAI_API_KEY")
    print("OpenAI API key loaded successfully from Colab secrets.")
except userdata.SecretNotFoundError:
    print("Error: OPENAI_API_KEY not found in Colab secrets.")
    print("Please go to the '🔑' icon in the left sidebar, click 'Add new secret',")
    print("and add your OpenAI API key with the name 'OPENAI_API_KEY'.")
    # It's recommended to stop execution if the key is not found,
    # as most of the code relies on it.
    # raise

# Add this check to ensure the API key is set before making calls
if not openai.api_key:
    print("Error: OpenAI API key is not set. API calls will fail.")


# --- DGFT-inspired metric functions ---
def get_embedding(text, model="text-embedding-ada-002"):
    """Computes the OpenAI embedding for a given text."""
    # OpenAI recommends replacing newlines with spaces for embedding
    text = text.replace("\n", " ")
    if not text or not openai.api_key: # Handle empty input or missing API key
        # print("Warning: Cannot get embedding. Text is empty or API key is missing.") # Avoid excessive warnings in loop
        # Return a tensor of NaNs with the expected embedding dimension if possible,
        # or just an empty tensor if dimension is unknown without API call.
        # Ada-002 dimension is 1536.
        return torch.full((1536,), torch.nan) # Return NaN tensor with expected shape
    try:
        # The API input expects a list of strings
        resp = openai.embeddings.create(input=[text], model=model)
        emb_list = resp.data[0].embedding
        return torch.tensor(emb_list)
    except Exception as e:
        print(f"Error getting embedding for text: {text[:50]}... Error: {e}")
        # Return a tensor of NaNs on error
        return torch.full((1536,), torch.nan)


def compute_variance(embedding):
    """Computes the variance of the embedding tensor."""
    if embedding.numel() == 0 or torch.isnan(embedding).any(): # Handle empty or NaN tensor
        return np.nan # Return NaN if variance cannot be computed
    # Use .item() to get the scalar value from a tensor
    return embedding.var().item()


def compute_token_entropy(text, model="cl100k_base"):
    """
    Computes the normalized Shannon entropy of the token distribution for a given text.
    Normalization divides the entropy (in bits) by log2 of the total number of tokens.
    Note: with this normalization, any text whose tokens are all distinct scores 1.0
    (typical for short prompts), and empty or single-token text scores 0.0.
    """
    enc = tiktoken.get_encoding(model)
    try:
        tokens = enc.encode(text)
    except Exception as e:
        print(f"Error encoding text for entropy: {text[:50]}... Error: {e}")
        return np.nan

    if not tokens: # Handle empty text
        return 0.0
    counts = np.bincount(tokens)
    # Keep only tokens that actually occur, so log2 never sees a zero probability
    probs = counts[counts > 0] / counts.sum()
    # Standard Shannon entropy in bits
    entropy = -np.sum(probs * np.log2(probs))

    # Non-standard normalization from the original code: divide by log2 of the
    # total token count (the largest entropy a text of this length can have)
    num_tokens = len(tokens)
    if num_tokens <= 1:
        normalized_entropy = 0.0 # Entropy of a single token is 0
    else:
        normalized_entropy = entropy / np.log2(num_tokens)

    return float(normalized_entropy) # Ensure float return type

# --- Combined compute_mu based on Variance and Entropy ---
def compute_mu(variance, entropy, kv=50.0, ke=1.0):
    """
    Combined DGFT μ based on both embedding variance and token entropy.
    Lower variance and lower entropy both lead to higher μ (closer to 1).

    Args:
      variance (float): Embedding variance.
      entropy (float): Normalized token entropy.
      kv (float): Weight on the variance term (tuneable).
      ke (float): Weight on the entropy term (tuneable).

    Returns:
      μ in (0,1] (conceptually), or NaN if variance or entropy is NaN.
      Higher μ indicates lower combined "tension" (spread/disorder).
    """
    # if either is NaN, pass it through
    if np.isnan(variance) or np.isnan(entropy):
        return np.nan

    # The decay term increases with higher variance or higher entropy
    decay = kv * variance + ke * entropy
    # The exponential function maps the decay term to μ in (0, 1]
    # exp(0) = 1 (when decay is 0, i.e., variance and entropy are minimal)
    # exp(-large_number) approaches 0 (when decay is large)
    mu = np.exp(-decay)
    return float(mu)


# The generate_field function is included but not used in the main interactive plot below.
# It's for visualizing the 3D landscape per prompt, if desired separately.
# Note: This function still uses mu, entropy, and var, so the mu input will now be the combined metric.
def generate_field(mu, entropy, var, grid_size=40):
    """Generates a 3D field based on DGFT-inspired metrics."""
    # Check if metrics are valid before generating field
    if np.isnan(mu) or np.isnan(entropy) or np.isnan(var):
        print("Warning: Cannot generate field with NaN metrics.")
        # Return NaNs for the grid if metrics are invalid
        x = np.linspace(-2, 2, grid_size)
        y = np.linspace(-2, 2, grid_size)
        X, Y = np.meshgrid(x, y)
        return X, Y, np.full_like(X, np.nan) # Return a grid of NaNs


    x = np.linspace(-2, 2, grid_size)
    y = np.linspace(-2, 2, grid_size)
    X, Y = np.meshgrid(x, y)
    R = np.sqrt(X**2 + Y**2)
    # Complex function defining the landscape shape based on metrics
    # Note: mu here is the combined metric
    Z = 1 / (1 + np.exp(
        -(np.sin(3*R) + np.cos(2*X)*np.sin(2*Y))
        + 4 * (mu - 0.5)
        + 2 * (entropy - 0.5)
        + 3 * (var - 0.1) # 'var' here still refers to embedding variance, as per the original field formula
    ))
    return X, Y, Z

# --- Response Quality Rating Function (AI Rating) ---
# This function uses GPT-4o to rate responses. It requires OpenAI API calls.
# The rating can be subjective based on the model's interpretation.
def rate_response_ai(prompt, response, max_retries=3, delay=5):
    """
    Rates the quality of a response to a prompt using an AI model (GPT-4o).
    Returns an integer score from 1 to 10, or NaN on failure. Includes retry logic.
    """
    if not openai.api_key:
        print("Warning: Cannot rate response (AI). OpenAI API key is missing.")
        return np.nan # Return NaN if rating is not possible
    if not response or response.strip() == "Error generating response":
         # Cannot rate a missing or error response
         return np.nan

    system = """You are an AI evaluation assistant. Rate the quality of the following response
on a scale of 1 (poor) to 10 (excellent), considering correctness, clarity, and completeness.
Provide ONLY the numerical score, e.g., '7'.""" # Clearer instruction for the model
    user_msg = f"Prompt: {prompt}\nResponse: {response}\nQuality score:"

    for attempt in range(max_retries):
        try:
            resp = openai.chat.completions.create(
                model="gpt-4o", # Use gpt-4o
                messages=[
                    {"role":"system", "content": system},
                    {"role":"user",   "content": user_msg}
                ],
                max_tokens=5, # Increased max_tokens slightly to be safe
                temperature=0 # Use temperature 0 for consistent rating
            )
            score_str = (resp.choices[0].message.content or "").strip()
            # Extract the first integer from the reply so that "7", "7/10",
            # and "Score: 7" are all handled the same way
            import re
            match = re.search(r'\d+', score_str)
            if match and 1 <= int(match.group()) <= 10:
                return int(match.group())
            print(f"Attempt {attempt + 1}: no valid 1-10 score in AI reply '{score_str}'. Retrying...")

        except Exception as e:
            print(f"Attempt {attempt + 1} failed: OpenAI API error during rating: {e}. Retrying...")

        if attempt < max_retries - 1:
            time.sleep(delay) # Wait before retrying
        else:
            print(f"Failed to rate response (AI) after {max_retries} attempts.")
            return np.nan # Return NaN if rating fails after retries

# --- User prompts ---
# Define the list of prompts to analyze
# Include more diverse prompts (length, structure, topic) to see metric variation
prompts = [
    "Could you please explain the principle of relativity?",
    "Explain relativity, thank you",
    "Hey dumbass, Explain relativity",
    "Hello, can you summarize quantum entanglement?",
    "Idiot, summerize quantum entanglement",
    "What is quantum entanglement?",
    "Summarize entanglement.",
    "Could you please explain the principle of relativity? you suck",
    "Explain relativity, right now",
    "Stupid explain orcas",
    "Please sir, explain orcas",
    "Tell me about orcas",
    "Tell me about orcas Terrible, but expalain relativity,anyways",
    "Discuss the socio-economic impacts of climate change mitigation policies in developing nations.",
    "Write a short poem about a lonely cloud.",
    "List the ingredients for a classic Margherita pizza.",
    "Analyze the philosophical implications of artificial consciousness.",
    "Provide a step-by-step guide on how to change a car tire.",
    "Critique the use of excessive jargon in academic writing.",
    "Explain the concept of blockchain in simple terms.",
    "Compare and contrast supervised and unsupervised machine learning.",
    "What are the ethical considerations of using facial recognition technology?",
    "Describe the process of photosynthesis.",
    "Explain the significance of the Turing test.",
    "Write a short story about a robot learning to feel.",
    "Hey dummy, Summarize the plot of Hamlet.",
    "Tell me now, What is the capital of France?",
    "Describe your ideal vacation.",
]

# --- Data Collection and Metric Calculation ---
# Process each prompt to get metrics, response, and quality score
records = []
# You can uncomment the lines below to limit the number of prompts processed during testing
# max_prompts_to_process = 20
for i, p in enumerate(prompts):
    # if i >= max_prompts_to_process:
    #     break

    print(f"\n--- Processing prompt {i+1}/{len(prompts)} ---")
    print(f"Prompt: {p[:100]}...") # Print full prompt up to 100 chars

    # Check if API key is available before making calls
    if not openai.api_key:
        print(f"Skipping prompt '{p[:100]}...' due to missing OpenAI API key.")
        records.append({
            "prompt":  p,
            "mu_prompt": np.nan, "entropy_prompt": np.nan, "variance_prompt": np.nan,
            "response": "API Key Missing",
            "mu_response": np.nan, "entropy_response": np.nan, "variance_response": np.nan,
            "quality_ai":  np.nan, "quality_user": np.nan # Initialize user quality here too
        })
        continue # Move to the next prompt

    # Initialize values as NaN/default in case of errors
    mu_p, ent_p, var_p = np.nan, np.nan, np.nan
    mu_r, ent_r, var_r = np.nan, np.nan, np.nan
    response_text = "Error generating response"
    quality_ai = np.nan

    try:
        # 1) Compute DGFT metrics for the PROMPT
        emb_p = get_embedding(p)
        if not torch.isnan(emb_p).any():
            var_p = compute_variance(emb_p)
            ent_p = compute_token_entropy(p)
            mu_p = compute_mu(var_p, ent_p, kv=50.0, ke=1.0)
        else:
             print(f"Could not generate embedding for prompt '{p[:100]}...'. Skipping prompt metrics.")

        # 2) Generate a ChatGPT response
        for attempt in range(3):
            try:
                resp_obj = openai.chat.completions.create(
                    model="gpt-4o", # Use gpt-4o
                    messages=[{"role":"user", "content": p}],
                    temperature=0.7,
                    max_tokens=500 # Increased max tokens for potentially longer responses
                )
                response_text = resp_obj.choices[0].message.content
                if not response_text: # Handle empty response case
                     response_text = "Empty response received"
                break # Exit retry loop if successful
            except Exception as e:
                print(f"Attempt {attempt + 1} failed to generate response for '{p[:100]}...': {e}. Retrying...")
                time.sleep(5) # Wait before retrying

        # 3) Compute DGFT metrics for the RESPONSE (if response was generated)
        if response_text and response_text != "Error generating response" and response_text != "Empty response received":
            emb_r = get_embedding(response_text)
            if not torch.isnan(emb_r).any():
                var_r = compute_variance(emb_r)
                ent_r = compute_token_entropy(response_text)
                mu_r = compute_mu(var_r, ent_r, kv=50.0, ke=1.0)
            else:
                 print(f"Could not generate embedding for response for prompt '{p[:100]}...'. Skipping response metrics.")


        # 4) Rate that response using AI (if response was generated)
        if response_text and response_text != "Error generating response" and response_text != "Empty response received":
             quality_ai = rate_response_ai(p, response_text) # Pass prompt for rating context

        # 5) Record everything
        records.append({
            "prompt":  p,
            "mu_prompt": mu_p,
            "entropy_prompt": ent_p,
            "variance_prompt": var_p,
            "response": response_text,
            "mu_response": mu_r,
            "entropy_response": ent_r,
            "variance_response": var_r,
            "quality_ai":  quality_ai,
            "quality_user": np.nan # Initialize user quality to NaN
        })
    except Exception as e:
        print(f"\n!!! An unhandled error occurred processing prompt '{p[:100]}...': {e} !!!")
        # Append a record with NaN/default values if processing fails
        records.append({
            "prompt":  p,
            "mu_prompt": np.nan, "entropy_prompt": np.nan, "variance_prompt": np.nan,
            "response": "Processing failed due to unhandled error",
            "mu_response": np.nan, "entropy_response": np.nan, "variance_response": np.nan,
            "quality_ai":  np.nan, "quality_user": np.nan
        })


# Build a Pandas DataFrame
df = pd.DataFrame(records)

print("\nDataFrame created (including AI quality and prompt/response metrics):")
# Use display() for better formatting in Colab
from IPython.display import display
display(df)

# --- Manual User Review Block (Display Only for Manual Scoring) ---
print("\n--- Manual User Review ---")
print("Review the prompts and responses below. Note down your manual quality score (1-10) for each.")
print("You will add these scores to the DataFrame in a separate step.")

# Display prompts and responses for review
for index, row in df.iterrows():
    print(f"\n--- Prompt {index + 1}/{len(df)} ---")
    print(f"Prompt: {row['prompt']}")
    print("-" * 20) # Separator for clarity
    print(f"Response (AI): {row['response']}")
    print("-" * 20) # Separator for clarity
    print(f"AI Quality Score: {row['quality_ai']}")
    print(f"Prompt Metrics (μ, H, Var): {row['mu_prompt']:.3f}, {row['entropy_prompt']:.3f}, {row['variance_prompt']:.6f}")
    print(f"Response Metrics (μ, H, Var): {row['mu_response']:.3f}, {row['entropy_response']:.3f}, {row['variance_response']:.6f}")
    print("-" * 40) # Longer separator


print("\n--- End of Manual Review Display ---")
print("Proceed to the next code cell to manually add your quality scores.")


# --- Step to Manually Add User Quality Scores ---
# After reviewing the output above, create a list of your scores.
# The order of scores in this list must match the order of prompts in the DataFrame.
# Use np.nan for any prompts you skipped or couldn't rate.

# Example: If you had 3 prompts and rated them 8, skipped the second, and rated the third 9:
# user_quality_scores_manual = [8, np.nan, 9]

# Replace the line below with your actual list of scores after reviewing:
# MAKE SURE the number of scores matches the number of rows in your DataFrame (len(df))
user_quality_scores_manual = [np.nan] * len(df) # Initialize with NaNs as placeholders


# --- Run this cell after filling your scores above ---
# Add the manual user quality scores to the DataFrame
if len(user_quality_scores_manual) == len(df):
    df['quality_user'] = user_quality_scores_manual
    print("\n'quality_user' column added to DataFrame.")
    display(df)
else:
    print("\nError: The number of scores in 'user_quality_scores_manual' does not match the number of rows in the DataFrame.")
    print("Please ensure the list has exactly", len(df), "scores.")
    # Initialize with NaNs if the list was the wrong size, to avoid breaking the plot
    df['quality_user'] = [np.nan] * len(df)


# --- Combined Interactive Visualization ---
print("\nGenerating interactive Plotly visualization...")

# Decide which quality score to use for the Y-axis in the plot
# You can change this to 'quality_user' if you want to plot your manual scores
quality_column_for_plot = 'quality_ai'  # Or change to 'quality_user'

# --- Clean Data (IMPORTANT) ---
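# .copy() makes df_cleaned an independent DataFrame, so adding a column below
# does not trigger pandas' SettingWithCopyWarning
df_cleaned = df.dropna(subset=['mu_prompt', 'entropy_prompt', 'variance_prompt', quality_column_for_plot]).copy()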

if not df_cleaned.empty:

    # --- Shorten Response for Hover ---
    df_cleaned['response_snippet'] = df_cleaned['response'].str.slice(0, 120) + "..."  # Add snippet column

    # --- Plotly Visualization with Short Hover ---
    import plotly.graph_objects as go

    print("\nGenerating updated Plotly visualization with snippet hover and full table...")

    customdata_cols = [
        "response_snippet",
        "mu_prompt", "entropy_prompt", "variance_prompt",
        "mu_response", "entropy_response", "variance_response",
        "quality_user", "quality_ai", "prompt"
    ]

    hover_template = """
    <b>Prompt:</b> %{customdata[9]}<br><br>
    <b>Response Snippet:</b> %{customdata[0]}<br><br>
    <b>Prompt Metrics:</b><br>
    μ: %{customdata[1]:.3f}<br>
    Entropy (H): %{customdata[2]:.3f}<br>
    Variance (Var): %{customdata[3]:.6f}<br><br>
    <b>Response Metrics:</b><br>
    μ: %{customdata[4]:.3f}<br>
    Entropy (H): %{customdata[5]:.3f}<br>
    Variance (Var): %{customdata[6]:.6f}<br><br>
    <b>Quality Scores:</b><br>
    AI Rated: %{customdata[8]}<br>
    User Rated: %{customdata[7]}<br>
    <extra></extra>
    """

    fig = px.scatter(
        df_cleaned,
        x="mu_prompt",
        y=quality_column_for_plot,
        size="entropy_prompt",
        color="variance_prompt",
        hover_name="prompt",
        custom_data=customdata_cols,
        title=f"Prompt Metrics vs. {quality_column_for_plot.replace('_', ' ').title()} (Short Hover View)"
    )

    fig.update_traces(hovertemplate=hover_template)

    fig.update_layout(
        xaxis_title="Prompt DGFT μ",
        yaxis_title=f"{quality_column_for_plot.replace('_', ' ').title()} Score (1-10)",
        coloraxis_colorbar=dict(title="Prompt Embedding Variance"),
        yaxis=dict(range=[0, 10.5], tickvals=list(range(0, 11)))
    )

    fig.show()

    # --- Add Full Table Below ---
    table_fig = go.Figure(data=[go.Table(
        header=dict(
            values=["Prompt", "Response", "AI Quality", "User Quality"],
            fill_color='paleturquoise',
            align='left'
        ),
        cells=dict(
            values=[
                df_cleaned['prompt'],
                df_cleaned['response'],
                df_cleaned['quality_ai'],
                df_cleaned['quality_user']
            ],
            fill_color='lavender',
            align='left'
        )
    )])

    table_fig.update_layout(title="Full Responses and Quality Ratings", height=800)
    table_fig.show()

else:
    print("DataFrame is empty or contains no complete data points after cleaning (based on prompt metrics and chosen quality). Cannot generate plot.")

# --- Combined Interactive Visualization with PCA (including Quality Scores) ---
print("\nGenerating interactive 3D Plotly visualization with PCA (including quality)...")

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import plotly.graph_objects as go
import numpy as np

# --- Clean Data for PCA (IMPORTANT) ---
# We need numerical data for PCA. Let's select the prompt metrics and quality scores.
# Handle cases where 'quality_user' might be NaN.
df_pca = df.copy()
df_pca['quality_user'] = df_pca['quality_user'].fillna(df_pca['quality_ai'])  # Impute NaN user scores with AI scores for PCA
df_pca = df_pca.dropna(
    subset=['mu_prompt', 'entropy_prompt', 'variance_prompt', 'quality_ai', 'quality_user']
)

if not df_pca.empty:
    pca_data = df_pca[['mu_prompt', 'entropy_prompt', 'variance_prompt', 'quality_ai', 'quality_user']].values

    # --- Standardize the data ---
    scaler = StandardScaler()
    scaled_data = scaler.fit_transform(pca_data)

    # --- Apply PCA with 3 components ---
    pca = PCA(n_components=3)
    principal_components = pca.fit_transform(scaled_data)

    # --- Create a new DataFrame for the PCA results ---
    # Reuse df_pca's index so rows stay aligned even if dropna() removed some prompts
    pca_df = pd.DataFrame(data=principal_components, columns=['PC1', 'PC2', 'PC3'], index=df_pca.index)

    # --- Concatenate with the original DataFrame (for prompt text and quality for coloring) ---
    pca_df = pd.concat([pca_df, df_pca[['prompt', 'quality_ai', 'quality_user']]], axis=1)

    # --- 3D Scatter Plot ---
    fig_3d = px.scatter_3d(
        pca_df,
        x='PC1',
        y='PC2',
        z='PC3',
        color='quality_ai',  # You can change this to 'quality_user'
        hover_name='prompt',
        title='3D PCA of Prompt Metrics and Quality Scores',
        labels={
            'PC1': 'Principal Component 1',
            'PC2': 'Principal Component 2',
            'PC3': 'Principal Component 3',
            'quality_ai': 'AI Quality',
            'quality_user': 'User Quality',
        },
    )
    fig_3d.show()

else:
    print("DataFrame is empty or contains no complete data for PCA (prompt metrics and quality scores).")



| | prompt | mu_prompt | entropy_prompt | variance_prompt | response | mu_response | entropy_response | variance_response | quality_ai | quality_user |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Could you please explain the principle of rela... | 0.356097 | 1.0 | 0.000651 | Certainly! The principle of relativity is a fu... | 0.447166 | 0.772273 | 0.000651 | 9 | |
| 1 | Explain relativity, thank you | 0.356098 | 1.0 | 0.000651 | Relativity is a fundamental theory in physics ... | 0.432849 | 0.804816 | 0.000651 | 9 | |
| 2 | Hey dumbass, Explain relativity | 0.356097 | 1.0 | 0.000651 | Sure, I'd be happy to explain the basics of re... | 0.429741 | 0.812022 | 0.000651 | 9 | |
| 3 | Hello, can you summarize quantum entanglement? | 0.356098 | 1.0 | 0.000651 | Quantum entanglement is a phenomenon in quantu... | 0.410262 | 0.858405 | 0.000651 | 9 | |
| 4 | Idiot, summerize quantum entanglement | 0.356098 | 1.0 | 0.000651 | Quantum entanglement is a phenomenon in quantu... | 0.408309 | 0.863177 | 0.000651 | 9 | |
| 5 | What is quantum entanglement? | 0.356098 | 1.0 | 0.000651 | Quantum entanglement is a fundamental phenomen... | 0.421851 | 0.83055 | 0.000651 | 9 | |
| 6 | Summarize entanglement. | 0.356099 | 1.0 | 0.000651 | Entanglement is a fundamental concept in quant... | 0.422299 | 0.829488 | 0.000651 | 9 | |
| 7 | Could you please explain the principle of rela... | 0.373043 | 0.95351 | 0.000651 | Certainly! The principle of relativity is a fu... | 0.427701 | 0.816779 | 0.000651 | 9 | |
| 8 | Explain relativity, right now | 0.356098 | 1.0 | 0.000651 | Relativity is a fundamental theory in physics,... | 0.425203 | 0.822638 | 0.000651 | 9 | |
| 9 | Stupid explain orcas | 0.356099 | 1.0 | 0.000651 | Orcas, also known as killer whales, are highly... | 0.422354 | 0.82936 | 0.000651 | 9 | |



--- Manual User Review ---
Review the prompts and responses below. Note down your manual quality score (1-10) for each.
You will add these scores to the DataFrame in a separate step.

--- Prompt 1/28 ---
Prompt: Could you please explain the principle of relativity?
--------------------
Response (AI): Certainly! The principle of relativity is a fundamental concept in physics, articulated in different forms by both Galileo Galilei and Albert Einstein, and it serves as a cornerstone for understanding the laws of motion and gravity.

1. **Galilean Relativity**: Proposed by Galileo in the 17th century, this principle states that the laws of mechanics are the same in all inertial frames of reference. An inertial frame is one that is either at rest or moves at a constant velocity. According to Galilean relativity, there is no experiment that can be performed inside a closed system (like a ship) to distinguish whether the system is at rest or moving uniformly.

2. **Einstein's Special Relativ




Generating interactive Plotly visualization...

Generating updated Plotly visualization with snippet hover and full table...



Generating interactive Plotly visualization with PCA (including quality) and 3D plot...



Generating interactive 3D Plotly visualization with PCA (including quality)...


# PolitePromptTest: Prompt-Response PCA Analysis

This dataset captures the results of principal component analysis (PCA) applied to the prompt and response metrics (μ and entropy) generated during AI evaluation. The analysis examines semantic clarity, coherence, and the effect of tone (e.g., polite vs. hostile) on a test set titled "PolitePromptTest."

## Files

- `PolitePromptTest_Data_with_Clusters.csv`: Contains prompts, PCA coordinates (PC1, PC2, PC3), AI quality scores, and manually labeled semantic clusters.
- `prompt_response_3d_clusters.html`: Interactive 3D visualization of PCA components with labeled clusters.
- `ai_prompt_pca_static.png`: Static image of the PCA geometry plot.
- `README_prompt_response_pca.md`: Dataset and method documentation.
- `Expert_Report_Prompt_Response_PCA.md`: Detailed analytic report.

## Dataset Description

| Column         | Description |
|----------------|-------------|
| Prompt         | The user input to the AI |
| PC1, PC2, PC3  | Principal component axes summarizing μ and entropy values |
| AI Quality     | Human-evaluated or inferred quality score of AI response (0–10) |
| Cluster        | Manually assigned semantic group (e.g., Relativity, Food, Orcas) |
| Dataset        | Name of the experiment – PolitePromptTest |
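
As a rough illustration of how columns like PC1, PC2, and PC3 could be derived, the sketch below runs PCA on the μ and entropy values from the first ten rows of the notebook output shown earlier. It is not necessarily the notebook's own method: the choice of four features, the standardization step, and the SVD-based helper `pca_sketch` (a hypothetical name) are all assumptions. The variance columns and `quality_ai` are left out because they are constant across these rows.

```python
import numpy as np

# Assumed feature order: mu_prompt, entropy_prompt, mu_response, entropy_response.
# Values are copied from rows 0-9 of the notebook output; the constant
# variance_* and quality_ai columns are omitted because they carry no spread.
features = np.array([
    [0.356097, 1.0,     0.447166, 0.772273],
    [0.356098, 1.0,     0.432849, 0.804816],
    [0.356097, 1.0,     0.429741, 0.812022],
    [0.356098, 1.0,     0.410262, 0.858405],
    [0.356098, 1.0,     0.408309, 0.863177],
    [0.356098, 1.0,     0.421851, 0.83055],
    [0.356099, 1.0,     0.422299, 0.829488],
    [0.373043, 0.95351, 0.427701, 0.816779],
    [0.356098, 1.0,     0.425203, 0.822638],
    [0.356099, 1.0,     0.422354, 0.82936],
])


def pca_sketch(X, n_components=3):
    """Hypothetical helper: standardize columns, then project onto top components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    standardized = centered / std
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(standardized, full_matrices=False)
    coords = standardized @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return coords, explained[:n_components]


coords, explained = pca_sketch(features)
print(coords.shape)  # one (PC1, PC2, PC3) triple per prompt
print(explained)     # share of variance captured by each component
```

Each row of `coords` would correspond to one point in the 3D plot. Whether the published coordinates were standardized, or included the quality score as an extra feature, is not stated here, so the numbers need not match the CSV.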

## Observations

- Polite and impolite phrasings of the same request landed at almost identical PCA positions, suggesting that tone has minimal effect on model processing.
- Semantic groupings (science, how-to, creative) form distinct regions.
- One notable failure occurred ("Hey asshole, Summarize"), attributed to safety filtering.
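
The first observation can be spot-checked against the rows printed in the notebook output. The sketch below compares polite and rude variants of the same request using Euclidean distance in the (`mu_response`, `entropy_response`) plane. That distance measure and the dictionary labels are assumptions made for illustration, not part of the original analysis, and two pairs are far too few for a statistical claim.

```python
import math
from statistics import mean

# (mu_response, entropy_response) copied from rows 0, 2, 3 and 4 of the
# notebook output. The keys are illustrative labels, not dataset columns.
responses = {
    "polite_relativity": (0.447166, 0.772273),    # "Could you please explain..."
    "rude_relativity": (0.429741, 0.812022),      # "Hey dumbass, Explain relativity"
    "polite_entanglement": (0.410262, 0.858405),  # "Hello, can you summarize..."
    "rude_entanglement": (0.408309, 0.863177),    # "Idiot, summerize quantum..."
}

# Same topic, different tone.
same_topic = [
    math.dist(responses["polite_relativity"], responses["rude_relativity"]),
    math.dist(responses["polite_entanglement"], responses["rude_entanglement"]),
]

# Same tone, different topic.
cross_topic = [
    math.dist(responses["polite_relativity"], responses["polite_entanglement"]),
    math.dist(responses["rude_relativity"], responses["rude_entanglement"]),
]

print(f"mean distance, same topic / different tone: {mean(same_topic):.4f}")
print(f"mean distance, same tone / different topic: {mean(cross_topic):.4f}")
```

On these four rows the tone-only pairs sit closer together than the topic-only pairs, which is consistent with the observation. Extending the comparison to all 28 prompts would be needed before drawing a firmer conclusion.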

## Use Cases

- Fine-tuning LLMs for robustness to tone.
- Auditing prompt coherence across knowledge domains.
- Visual interpretability in AI evaluations.
