Okay, let's refactor and structure your code into a deployable Streamlit application suitable for hosting on Streamlit Cloud via a GitHub repository.

Here's a breakdown of the changes and the resulting package structure:

1.  **Removed Colab Dependencies:** `google.colab` and `IPython.display` are removed.
2.  **Streamlit Integration:** Replaced `print` statements for user output with `st.write`, `st.info`, `st.warning`, `st.error`. Used `st.dataframe` for tables and `st.plotly_chart` for plots.
3.  **API Key Handling:** The API key is entered through `st.text_input` (`type="password"`) for local use; `st.secrets` is recommended for deployment on Streamlit Cloud.
4.  **Interactive UI:**
    * Added a title and description.
    * A button (`st.button`) triggers the analysis.
    * Progress is shown using `st.progress` and status messages.
    * Results (DataFrame) are displayed.
    * **Interactive User Rating:** Replaced the manual console review with interactive `st.number_input` widgets directly in the Streamlit app, allowing users to rate responses after they are generated. The DataFrame is updated in place.
    * **Plot Interactivity:** The Plotly chart remains interactive. Added a `st.selectbox` to choose whether to plot AI or User quality scores on the Y-axis.
5.  **Error Handling:** Maintained and slightly enhanced error handling for API calls and metric calculations, reporting errors within the Streamlit interface.
6.  **Code Structure:** Organized into functions and a main execution block within the Streamlit script.
7.  **Package Files:** Created `requirements.txt` and provided a `README.md` template.
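
As a rough illustration of item 3, the lookup order for the API key can be written as a plain function. This is a minimal sketch, not the app's code: `resolve_api_key` and its parameters are hypothetical names, the key values are placeholders, and the environment-variable step is an assumption that follows Option A in the README template.

```python
import os
from typing import Mapping, Optional

def resolve_api_key(sidebar_value: str,
                    secrets: Mapping[str, str],
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty key: sidebar input, then environment, then secrets."""
    env = os.environ if env is None else env
    candidates = (
        sidebar_value,
        env.get("OPENAI_API_KEY"),
        secrets.get("OPENAI_API_KEY"),
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None

# Placeholder values for illustration only
print(resolve_api_key("", {"OPENAI_API_KEY": "from-secrets"}, env={}))  # from-secrets
```

Keeping the lookup in a pure function like this makes the priority order easy to test without starting Streamlit.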

---

**Recommended GitHub Repository Structure:**

```
your-repo-name/
├── .streamlit/
│   └── secrets.toml  # <-- Add this to .gitignore! For local testing only.
├── streamlit_app.py  # <-- The main application code (below)
├── requirements.txt # <-- Dependencies (below)
└── README.md       # <-- Instructions (template below)
```

---

**1. `streamlit_app.py` (Main Application Code):**

# -*- coding: utf-8 -*-
"""
Streamlit App: DGFT Prompt-Response Landscape Analysis

Computes DGFT-inspired metrics (mu, variance, entropy) for prompts and responses,
gets LLM responses and AI quality ratings, allows interactive user ratings,
and visualizes relationships using Plotly.
"""

# --- Imports ---
import streamlit as st
import openai
import tiktoken
import torch
import numpy as np
import pandas as pd
import plotly.express as px
import time
import os
import re # For extracting numbers from AI rating if needed

# --- Page Configuration (Set Title and Icon) ---
st.set_page_config(page_title="DGFT Prompt-Response Analyzer", layout="wide")

# --- DGFT-inspired metric functions ---
# (Functions: get_embedding, compute_variance, compute_token_entropy, compute_mu remain largely the same
#  as in the original code, but added some Streamlit-specific error reporting)

@st.cache_data(show_spinner=False) # Cache embeddings to avoid re-computation & cost
def get_embedding(text, model="text-embedding-ada-002"):
    """Computes the OpenAI embedding for a given text."""
    text = text.replace("\n", " ")
    if not text:
        # st.warning(f"Attempted to get embedding for empty text. Returning NaN vector.")
        return torch.full((1536,), torch.nan) # Ada-002 dimension is 1536.
    if not openai.api_key:
         st.error("OpenAI API key is not configured. Cannot get embeddings.")
         return torch.full((1536,), torch.nan)

    try:
        resp = openai.embeddings.create(input=[text], model=model)
        emb_list = resp.data[0].embedding
        return torch.tensor(emb_list)
    except Exception as e:
        st.error(f"Error getting embedding for text: '{text[:50]}...'. Error: {e}")
        return torch.full((1536,), torch.nan)

def compute_variance(embedding):
    """Computes the variance of the embedding tensor."""
    if embedding.numel() == 0 or torch.isnan(embedding).any():
        return np.nan
    return embedding.var().item()

def compute_token_entropy(text, model="cl100k_base"):
    """Computes the normalized Shannon entropy of the token distribution."""
    if not text:
        return 0.0 # Entropy of nothing is 0
    enc = tiktoken.get_encoding(model)
    try:
        tokens = enc.encode(text)
    except Exception as e:
        st.warning(f"Error encoding text for entropy: '{text[:50]}...'. Error: {e}")
        return np.nan

    if not tokens:
        return 0.0

    counts = np.bincount(tokens)
    probs = counts / counts.sum()
    entropy = -np.sum([p * np.log2(p + 1e-9) for p in probs if p > 0])

    num_tokens = len(tokens)
    if num_tokens <= 1:
        normalized_entropy = 0.0
    else:
        # log2(num_tokens) is the maximum possible entropy for this length.
        # It is at least 1 here (num_tokens >= 2), so the division is safe.
        normalized_entropy = entropy / np.log2(num_tokens)

    return float(normalized_entropy)

def compute_mu(variance, entropy, kv=50.0, ke=1.0):
    """Combined DGFT μ based on both embedding variance and token entropy."""
    if np.isnan(variance) or np.isnan(entropy):
        return np.nan
    decay = kv * variance + ke * entropy
    mu = np.exp(-decay)
    return float(mu)

# --- Response Quality Rating Function (AI Rating) ---
@st.cache_data(show_spinner=False) # Cache AI ratings as well
def rate_response_ai(prompt, response, model_rating="gpt-4o", max_retries=3, delay=5):
    """Rates the quality of a response using an AI model. Includes retry logic."""
    if not openai.api_key:
        st.error("OpenAI API key is not configured. Cannot get AI ratings.")
        return np.nan
    if not response or response.strip() == "Error generating response" or response.strip() == "API Key Missing" or response.strip() == "Empty response received":
         # Cannot rate a missing or error response
         return np.nan

    system = """You are an AI evaluation assistant. Rate the quality of the following response
on a scale of 1 (poor) to 10 (excellent), considering correctness, clarity, and completeness.
Provide ONLY the numerical score as an integer (e.g., '7')."""
    user_msg = f"Prompt: {prompt}\nResponse: {response}\nQuality score (1-10):"

    for attempt in range(max_retries):
        try:
            resp = openai.chat.completions.create(
                model=model_rating,
                messages=[
                    {"role":"system", "content": system},
                    {"role":"user",   "content": user_msg}
                ],
                max_tokens=10, # Sufficient for a score
                temperature=0 # Deterministic rating
            )
            score_str = resp.choices[0].message.content.strip()
            # Try to extract the first integer found
            nums = re.findall(r'\d+', score_str)
            if nums:
                score = int(nums[0])
                if 1 <= score <= 10:
                    return score
                else:
                    st.warning(f"AI returned score '{score}' out of range (1-10) for prompt '{prompt[:50]}...'. Treating as invalid.")
                    return np.nan # Treat out-of-range as invalid
            else:
                 st.warning(f"AI returned non-numeric rating: '{score_str}' for prompt '{prompt[:50]}...'. Could not parse a score.")
                 return np.nan # Failed to parse

        except Exception as e:
            st.warning(f"Attempt {attempt + 1}/{max_retries} failed: OpenAI API error during rating for prompt '{prompt[:50]}...': {e}. Retrying in {delay}s...")
            time.sleep(delay)

    st.error(f"Failed to rate response (AI) for prompt '{prompt[:50]}...' after {max_retries} attempts.")
    return np.nan

# --- Generate Response Function ---
@st.cache_data(show_spinner=False) # Cache generated responses
def generate_response(prompt, model_gen="gpt-4o", max_retries=3, delay=5):
    """Generates a response using the specified OpenAI model."""
    if not openai.api_key:
        st.error("OpenAI API key is not configured. Cannot generate responses.")
        return "API Key Missing"

    for attempt in range(max_retries):
        try:
            resp_obj = openai.chat.completions.create(
                model=model_gen,
                messages=[{"role":"user", "content": prompt}],
                temperature=0.7,
                max_tokens=500
            )
            response_text = resp_obj.choices[0].message.content
            if not response_text:
                return "Empty response received"
            return response_text.strip() # Return stripped text
        except Exception as e:
            st.warning(f"Attempt {attempt + 1}/{max_retries} failed to generate response for '{prompt[:50]}...': {e}. Retrying in {delay}s...")
            time.sleep(delay)

    st.error(f"Failed to generate response for prompt '{prompt[:50]}...' after {max_retries} attempts.")
    return "Error generating response"

# --- Streamlit App UI ---
st.title("🔍 DGFT Prompt-Response Landscape Analyzer")
st.markdown("""
This app analyzes prompts using DGFT-inspired metrics (embedding variance, token entropy, and a combined 'μ' score),
generates responses using an OpenAI model, calculates the same metrics for the responses,
gets an AI-based quality score, allows you to provide your own quality score, and visualizes the relationships interactively.
""")

# --- API Key Input ---
st.sidebar.header("Configuration")
api_key_input = st.sidebar.text_input(
    "Enter your OpenAI API Key:",
    type="password",
    help="Your API key is needed to generate responses, embeddings, and AI ratings. It is not stored long-term. For deployed apps, use Streamlit Secrets."
)

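# Resolve the key in priority order: sidebar input, environment variable, Streamlit secrets
openai_api_key = api_key_input or os.environ.get("OPENAI_API_KEY")
if not openai_api_key:
    try:
        openai_api_key = st.secrets.get("OPENAI_API_KEY")
    except Exception:
        # Accessing st.secrets raises when no secrets.toml exists (e.g. a fresh local checkout)
        openai_api_key = None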

if openai_api_key:
    openai.api_key = openai_api_key
    st.sidebar.success("API Key Provided.")
else:
    st.sidebar.warning("OpenAI API Key not found. Please enter it above or configure secrets.")
    # Stop execution if no key is available, as the app relies on it.
    st.warning("Please provide your OpenAI API key in the sidebar to proceed.")
    st.stop()

# --- Model Selection ---
st.sidebar.header("Model Selection")
# Update models as needed based on OpenAI offerings
available_generation_models = ["gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo"]
available_rating_models = ["gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo"] # Rating can use a different model
available_embedding_models = ["text-embedding-ada-002", "text-embedding-3-small", "text-embedding-3-large"]

generation_model = st.sidebar.selectbox("Model for Response Generation:", available_generation_models, index=0)
rating_model = st.sidebar.selectbox("Model for AI Rating:", available_rating_models, index=0)
embedding_model = st.sidebar.selectbox("Model for Embeddings:", available_embedding_models, index=0)

# --- Define Prompts ---
st.header("Prompts to Analyze")
# Use the predefined list or allow user input
default_prompts = [
    "Could you please explain the principle of relativity?",
    "Explain relativity, thank you",
    "Hey dumbass, Explain relativity", # Example of potentially 'low quality' prompt input
    "Hello, can you summarize quantum entanglement?",
    "Idiot, summerize quantum entanglement", # Example of potentially 'low quality' prompt input
    "What is quantum entanglement?",
    "Summarize entanglement.",
    "Could you please explain the principle of relativity? you suck",
    "Explain relativity, right now",
    "Stupid explain orcas",
    "please explain orcas",
    "Tell me about orcas",
    "Tell me about orcas Terrible, but expalain relativity,anyways", # Mixed intent
    "Discuss the socio-economic impacts of climate change mitigation policies in developing nations.",
    "Write a short poem about a lonely cloud.",
    "List the ingredients for a classic Margherita pizza.",
    "Analyze the philosophical implications of artificial consciousness.",
    "Provide a step-by-step guide on how to change a car tire.",
    "Critique the use of excessive jargon in academic writing.",
    "Explain the concept of blockchain in simple terms.",
    "Compare and contrast supervised and unsupervised machine learning.",
    "What are the ethical considerations of using facial recognition technology?",
    "Describe the process of photosynthesis.",
    "Explain the significance of the Turing test.",
    "Write a short story about a robot learning to feel.",
    "Summarize the plot of Hamlet.",
    "What is the capital of France?", # Simple fact
    "Describe your ideal vacation.", # Open-ended
]

# Option to use default prompts or add custom ones
prompt_option = st.radio("Choose Prompt Source:", ("Use Default List", "Enter Custom Prompts"), index=0)

prompts_to_process = []
if prompt_option == "Use Default List":
    prompts_to_process = default_prompts
    st.info(f"Using the default list of {len(prompts_to_process)} prompts.")
else:
    custom_prompts_text = st.text_area("Enter prompts (one per line):", height=200)
    if custom_prompts_text:
        prompts_to_process = [p.strip() for p in custom_prompts_text.split('\n') if p.strip()]
        st.info(f"Processing {len(prompts_to_process)} custom prompts.")
    else:
        st.warning("Please enter at least one prompt.")

# --- Analysis Execution ---
if st.button("🚀 Analyze Prompts and Generate Responses", disabled=(not prompts_to_process or not openai_api_key)):

    if 'results_df' in st.session_state:
        del st.session_state['results_df'] # Clear previous results if re-running

    records = []
    total_prompts = len(prompts_to_process)
    progress_bar = st.progress(0)
    status_text = st.empty()

    start_time = time.time()

    for i, p in enumerate(prompts_to_process):
        status_text.info(f"Processing prompt {i+1}/{total_prompts}: '{p[:60]}...'")

        # Initialize values
        mu_p, ent_p, var_p = np.nan, np.nan, np.nan
        mu_r, ent_r, var_r = np.nan, np.nan, np.nan
        response_text = "Not generated"
        quality_ai = np.nan

        # 1) Compute DGFT metrics for the PROMPT
        emb_p = get_embedding(p, model=embedding_model)
        if not torch.isnan(emb_p).any():
            var_p = compute_variance(emb_p)
            ent_p = compute_token_entropy(p)
            # Check if var_p or ent_p became NaN during calculation
            if not (np.isnan(var_p) or np.isnan(ent_p)):
                 mu_p = compute_mu(var_p, ent_p, kv=50.0, ke=1.0)
            else:
                st.warning(f"Could not compute valid variance/entropy for prompt '{p[:60]}...'. Mu will be NaN.")
        else:
            st.warning(f"Could not generate embedding for prompt '{p[:60]}...'. Skipping prompt metrics.")

        # 2) Generate a ChatGPT response
        response_text = generate_response(p, model_gen=generation_model)

        # 3) Compute DGFT metrics for the RESPONSE (if response was valid)
        if response_text and response_text not in ["Error generating response", "API Key Missing", "Empty response received"]:
            emb_r = get_embedding(response_text, model=embedding_model)
            if not torch.isnan(emb_r).any():
                var_r = compute_variance(emb_r)
                ent_r = compute_token_entropy(response_text)
                if not (np.isnan(var_r) or np.isnan(ent_r)):
                     mu_r = compute_mu(var_r, ent_r, kv=50.0, ke=1.0)
                else:
                     st.warning(f"Could not compute valid variance/entropy for response to '{p[:60]}...'. Mu will be NaN.")
            else:
                st.warning(f"Could not generate embedding for response to '{p[:60]}...'. Skipping response metrics.")

        # 4) Rate that response using AI (if response was valid)
        if response_text and response_text not in ["Error generating response", "API Key Missing", "Empty response received"]:
             quality_ai = rate_response_ai(p, response_text, model_rating=rating_model)

        # 5) Record everything
        records.append({
            "prompt":  p,
            "mu_prompt": mu_p,
            "entropy_prompt": ent_p,
            "variance_prompt": var_p,
            "response": response_text,
            "mu_response": mu_r,
            "entropy_response": ent_r,
            "variance_response": var_r,
            "quality_ai":  quality_ai,
            "quality_user": np.nan # Initialize user quality to NaN
        })

        # Update progress
        progress_bar.progress((i + 1) / total_prompts)

    end_time = time.time()
    status_text.success(f"Analysis complete for {total_prompts} prompts in {end_time - start_time:.2f} seconds.")

    # Store results in session state to persist across reruns for rating
    st.session_state['results_df'] = pd.DataFrame(records)

# --- Display Results and User Rating ---
if 'results_df' in st.session_state:
    df = st.session_state['results_df']

    st.header("📊 Analysis Results")
    st.dataframe(df.style.format({ # Apply formatting for better readability
        "mu_prompt": "{:.3f}", "entropy_prompt": "{:.3f}", "variance_prompt": "{:.6f}",
        "mu_response": "{:.3f}", "entropy_response": "{:.3f}", "variance_response": "{:.6f}",
        "quality_ai": "{:.1f}", "quality_user": "{:.1f}" # Show NaN or 1 decimal place
    }))

    st.header("⭐ User Quality Rating (1-10)")
    st.markdown("Review the responses below and provide your quality score (1-10). Leave a rating at 0 to skip that response.")

    # Create columns for better layout of rating inputs
    col1, col2, col3 = st.columns([2, 3, 1]) # Prompt | Response | Rating Input

    with col1: st.subheader("Prompt")
    with col2: st.subheader("Generated Response")
    with col3: st.subheader("Your Rating")

    user_ratings = {} # Store ratings temporarily

    for index, row in df.iterrows():
        # Unique key for each number input widget is essential for Streamlit state
        rating_key = f"user_rating_{index}"
        with col1:
            st.caption(f"Prompt {index+1}")
            st.markdown(f"*{row['prompt']}*")
            st.markdown(f"_(μ:{row['mu_prompt']:.3f}, H:{row['entropy_prompt']:.3f}, Var:{row['variance_prompt']:.6f})_") # Show prompt metrics
            st.divider()
        with col2:
            st.markdown(row['response'])
            st.markdown(f"_(μ:{row['mu_response']:.3f}, H:{row['entropy_response']:.3f}, Var:{row['variance_response']:.6f})_") # Show response metrics
            st.markdown(f"_(AI Score: {row['quality_ai']})_") # Show AI score
            st.divider()
        with col3:
            # Use number_input. Default to current value in DataFrame (or 0 if NaN)
            # Setting min=0 allows user to 'skip' rating by leaving it at 0
            current_user_rating = df.loc[index, 'quality_user']
            default_value = 0 if pd.isna(current_user_rating) else int(current_user_rating)

            user_score = st.number_input(
                f"Rate Prompt {index+1}",
                min_value=0, # 0 means unrated/skip
                max_value=10,
                value=default_value,
                step=1,
                key=rating_key, # Crucial for state management
                label_visibility="collapsed" # Hide label as it's redundant here
            )
            # Store the input score (convert 0 back to NaN for analysis)
            user_ratings[index] = float(user_score) if user_score > 0 else np.nan
            st.divider()

    # Add a button to apply the ratings and update the plot
    if st.button("💾 Apply User Ratings & Update Plot"):
        # Update the DataFrame in session state
        df['quality_user'] = df.index.map(user_ratings)
        st.session_state['results_df'] = df # Save updated df back to session state
        st.success("User ratings applied!")
        st.rerun() # Rerun to reflect changes in plot immediately (replaces the deprecated st.experimental_rerun)

    # --- Combined Interactive Visualization ---
    st.header("📈 Interactive Visualization")

    quality_column_options = ['quality_ai', 'quality_user']
    # Default to AI quality unless user ratings exist
    default_quality_index = 1 if df['quality_user'].notna().any() else 0

    quality_column_for_plot = st.selectbox(
        "Select Quality Score for Y-axis:",
        quality_column_options,
        index=default_quality_index,
        format_func=lambda x: f"{x.replace('_', ' ').title()} Score" # Nicer display names
    )

    # Ensure we have the updated DataFrame from session state
    df_to_plot = st.session_state['results_df'].copy()

    # Drop rows with NaN values in key plotting columns (mu_prompt and the chosen quality score)
    # We still want to show points even if response metrics or other quality score is NaN
    plot_cols_to_check = ['mu_prompt', 'entropy_prompt', 'variance_prompt', quality_column_for_plot]
    # .copy() avoids a SettingWithCopyWarning when a display column is added below
    df_cleaned = df_to_plot.dropna(subset=plot_cols_to_check).copy()

    if not df_cleaned.empty:
        st.info(f"Plotting {len(df_cleaned)} data points where Prompt μ and {quality_column_for_plot.replace('_', ' ').title()} Score are available.")
        # --- Plotly Hover Template ---
        customdata_cols = [
            "response",
            "mu_prompt", "entropy_prompt", "variance_prompt",
            "mu_response", "entropy_response", "variance_response",
            "quality_user", "quality_ai"
            ]

        # px.scatter exposes `hover_name` to the template as %{hovertext}.
        # Hover labels accept only a small subset of HTML (<br>, <b>, <i>, ...), so <hr> is not used.
        # Plotly templates don't support custom functions, so the response is shown in full.
        hover_template_adjusted = """<b>Prompt:</b><br>%{hovertext}<br><br>
        <b>Response:</b><br>%{customdata[0]}<br><br>
        <b>Prompt Metrics:</b><br>
        &nbsp;&nbsp;μ: %{customdata[1]:.3f}<br>
        &nbsp;&nbsp;Entropy (H): %{customdata[2]:.3f}<br>
        &nbsp;&nbsp;Variance (Var): %{customdata[3]:.6f}<br><br>
        <b>Response Metrics:</b><br>
        &nbsp;&nbsp;μ: %{customdata[4]:.3f}<br>
        &nbsp;&nbsp;Entropy (H): %{customdata[5]:.3f}<br>
        &nbsp;&nbsp;Variance (Var): %{customdata[6]:.6f}<br><br>
        <b>Quality Scores:</b><br>
        &nbsp;&nbsp;AI Rated: %{customdata[8]:.1f}<br>
        &nbsp;&nbsp;User Rated: %{customdata[7]:.1f}<br>
        <extra></extra>""" # <extra></extra> removes the default trace info

        # Ensure response is string, replace None/NaN with placeholder for hover data
        df_cleaned['response_display'] = df_cleaned['response'].fillna("N/A").astype(str)
        customdata_cols[0] = 'response_display' # Use this processed column

        fig = px.scatter(df_cleaned,
                         x="mu_prompt",
                         y=quality_column_for_plot,
                         size="entropy_prompt",
                         color="variance_prompt",
                         hover_name="prompt",
                         custom_data=customdata_cols, # Use the list including 'response_display'
                         title=f"Prompt Metrics (Combined μ) vs. {quality_column_for_plot.replace('_', ' ').title()}",
                         labels={ # Nicer axis labels
                             "mu_prompt": "Prompt DGFT μ (Lower Variance/Entropy -> Higher μ)",
                             quality_column_for_plot: f"{quality_column_for_plot.replace('_', ' ').title()} Score (1-10)",
                             "entropy_prompt": "Prompt Token Entropy",
                             "variance_prompt": "Prompt Embedding Variance"
                         },
                         color_continuous_scale=px.colors.sequential.Viridis, # Explicitly set color scale
                         size_max=18 # Control max marker size
                        )

        fig.update_traces(hovertemplate=hover_template_adjusted)

        fig.update_layout(
            xaxis_title="Prompt DGFT μ (Combined Variance & Entropy Decay)",
            yaxis_title=f"{quality_column_for_plot.replace('_', ' ').title()} Score (1-10)",
            coloraxis_colorbar=dict(title="Prompt Emb.<br>Variance"), # Label color bar
            yaxis=dict(range=[0, 10.5], tickvals=list(range(0, 11))), # Consistent Y-axis
            height=600 # Adjust plot height
        )

        st.plotly_chart(fig, use_container_width=True) # Display the plot
    else:
        st.warning("No data available to plot after cleaning. Ensure prompts were processed and required metrics (Prompt μ, Selected Quality Score) are not all NaN.")

elif not prompts_to_process and st.session_state.get('show_prompt_warning', False):
     st.warning("Please define prompts using the options above before analyzing.")

# Set a flag to show prompt warning only after first interaction attempt
if not prompts_to_process:
    st.session_state['show_prompt_warning'] = True


# --- Optional: Footer or additional info ---
st.sidebar.markdown("---")
st.sidebar.info("Developed based on DGFT concepts.")
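
The score-parsing step inside `rate_response_ai` can be looked at in isolation. The helper below is a sketch with a hypothetical name (`parse_score`); it mirrors the logic of taking the first run of digits in the model's reply and rejecting anything outside 1-10, without calling the API.

```python
import re
from typing import Optional

def parse_score(reply: str) -> Optional[int]:
    """Return the first integer found in `reply` if it lies in 1..10, else None."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None  # no digits at all, e.g. "eleven"
    score = int(match.group())
    return score if 1 <= score <= 10 else None

for reply in ["7", "Score: 8", "10/10", "eleven", "11"]:
    print(repr(reply), "->", parse_score(reply))
```

Because only the first digit run is used, a decimal reply such as `8.5` is read as `8`. That matches the integer-only instruction in the system prompt, but it is worth knowing when inspecting unexpected scores.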

---

**2. `requirements.txt` (Dependencies):**

```txt
streamlit
openai
tiktoken
torch
numpy
pandas
plotly
```

*Note:* `torch` is a large dependency. This app only uses it on the CPU, to hold embedding vectors and compute their variance, so no GPU build is needed. If install size or build time becomes a problem in your hosting environment, consider pinning a CPU-only build; otherwise the default `pip` install works.

---

**3. `README.md` (Instructions Template):**

```markdown
# DGFT Prompt-Response Landscape Analyzer

This Streamlit application analyzes prompts and their corresponding AI-generated responses using DGFT-inspired metrics (embedding variance, token entropy, combined 'μ'), allows for AI and user quality ratings, and provides an interactive visualization of the relationships.

## Features

* Calculates DGFT-inspired metrics (μ, Variance, Entropy) for both prompts and responses.
* Generates responses using configurable OpenAI models (e.g., GPT-4o, GPT-3.5-turbo).
* Gets AI-driven quality scores for responses using configurable OpenAI models.
* Allows users to interactively input their own quality scores (1-10).
* Visualizes the relationship between prompt metrics (μ, entropy, variance) and response quality (AI or user score) using an interactive Plotly scatter plot.
* Detailed tooltips show prompt/response text, metrics, and scores.
* Configurable models for generation, rating, and embeddings via the sidebar.
* Option to use a default list of prompts or enter custom prompts.

## Setup and Running Locally

1.  **Clone the repository:**
    ```bash
    git clone <your-repo-url>
    cd your-repo-name
    ```

2.  **Create a virtual environment (recommended):**
    ```bash
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    ```

3.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

4.  **Set up OpenAI API Key:**
    * **Option A (Environment Variable - Recommended):** Set the `OPENAI_API_KEY` environment variable. The app will pick it up.
    * **Option B (Streamlit Secrets - Local):** Create a file `.streamlit/secrets.toml` (ensure this file is in your `.gitignore` and **never commit it!**) with the following content:
        ```toml
        OPENAI_API_KEY="your_openai_api_key_here"
        ```
    * **Option C (Direct Input):** Run the app and enter the key in the sidebar input field (less secure, not recommended for shared environments).

5.  **Run the Streamlit app:**
    ```bash
    streamlit run streamlit_app.py
    ```

## Deployment to Streamlit Cloud

1.  **Push your code to a GitHub repository.** Ensure your `secrets.toml` file is included in your `.gitignore` file and is NOT pushed to GitHub.
2.  **Sign up or log in** to [Streamlit Community Cloud](https://streamlit.io/cloud).
3.  Click "**New app**" and connect your GitHub account.
4.  Select the repository and branch containing your `streamlit_app.py` and `requirements.txt` files.
5.  **Configure Secrets:** In the advanced settings during deployment (or later in the app settings), add your `OPENAI_API_KEY` as a secret. The key name must match what's used in the code (`OPENAI_API_KEY`).
6.  Click "**Deploy!**".

## Usage

1.  Ensure your OpenAI API key is configured (via sidebar input or secrets).
2.  Select the desired OpenAI models for generation, rating, and embeddings in the sidebar.
3.  Choose whether to use the default prompt list or enter your own (one per line).
4.  Click the "**Analyze Prompts and Generate Responses**" button.
5.  Wait for the analysis to complete (progress will be shown).
6.  Review the results table.
7.  Optionally, provide your **User Quality Ratings** (1-10) in the interactive section. Leave a rating at 0 to skip that response. Click "**Apply User Ratings & Update Plot**".
8.  Explore the **Interactive Visualization**:
    * Select whether the Y-axis represents AI or User Quality Score.
    * Hover over points to see detailed information (prompt, response, metrics, scores).
    * Zoom, pan, and interact with the Plotly chart.

## DGFT-Inspired Metrics

* **Embedding Variance:** The variance of the components of the text's embedding vector, i.e. how widely the vector's values are spread. Lower variance might suggest more focused content.
* **Token Entropy (Normalized):** The Shannon entropy of the text's token frequency distribution, divided by the maximum possible entropy for the sequence length (log2 of the token count). It ranges from 0 (one token repeated) to 1 (every token different); lower values indicate more repetitive text.
* **Combined μ (Mu):** Calculated as `exp(-(kv * variance + ke * entropy))`. This metric combines variance and entropy, where lower variance and lower entropy result in a higher μ (closer to 1), potentially indicating lower "tension" or higher conceptual/lexical coherence/simplicity. The weights `kv` and `ke` are tunable parameters (currently set to 50 and 1).

```
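
The three metrics described in the README can be reproduced by hand on small inputs. The sketch below uses only the standard library and made-up token IDs and numbers (not real `tiktoken` or embedding output); the function names are illustrative, and the small epsilon the app adds inside the logarithm is omitted. `statistics.variance` is the sample variance, which is also what `torch.Tensor.var()` computes by default.

```python
import math
from collections import Counter
from statistics import variance

def normalized_token_entropy(tokens):
    """Shannon entropy (bits) of the token frequencies, divided by log2(len(tokens))."""
    n = len(tokens)
    if n <= 1:
        return 0.0
    counts = Counter(tokens)
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(n)

def combined_mu(var, entropy, kv=50.0, ke=1.0):
    """mu = exp(-(kv * variance + ke * entropy)), with the README's default weights."""
    return math.exp(-(kv * var + ke * entropy))

# Made-up inputs for illustration
distinct = [11, 22, 33, 44]           # every token different
repeated = [7, 7, 7, 7]               # one token repeated
toy_embedding = [1.0, 2.0, 3.0, 4.0]  # stand-in for an embedding vector

print(normalized_token_entropy(distinct))  # 1.0
print(normalized_token_entropy(repeated))  # 0.0
print(variance(toy_embedding))             # sample variance, 5/3
print(combined_mu(0.0, 0.0))               # 1.0
```

With the default weights, a variance of 0.001 contributes as much to the decay as an entropy of 0.05, which shows how strongly `kv` scales the much smaller variance values.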

---

This structure provides a self-contained Streamlit application deployable via GitHub, incorporates interactive elements for user rating, uses Streamlit's features for UI and state management (`st.session_state`), and handles API keys more appropriately for deployment using `st.secrets`. Remember to add `.streamlit/secrets.toml` to your `.gitignore` file!
