<a href="https://colab.research.google.com/github/violetxs16/TIM-175/blob/main/Copy_of_Week_4_Lab_Gemini_Workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Change your Runtime Type to T4 GPU

In the top-right corner you will see your connection status and a dropdown menu for “Additional connection options”. From there, change your runtime type to T4 GPU and save. GPUs are preferred over CPUs for AI tasks primarily because of their parallel processing capabilities, so this will make your notebook run more efficiently.


# Import necessary packages

Before going through the notebook, run the cell below. It installs the Gemini API package and imports the tools needed to read your secret key in the cells that follow.


In [None]:
#You only need to run this cell without making any changes
!pip install -q -U google-generativeai # install the Gemini API package
import pandas as pd
import time
from google.colab import files
from transformers import pipeline
from google.colab import userdata # gives access to the secret key stored in Colab
from IPython.display import clear_output
clear_output() # hide the pip install output

# Google Gemini API
Google's Gemini is a Large Language Model (LLM) that works similarly to ChatGPT. However, as we are using a free tier of Gemini, it is not as powerful as GPT-4 (paid), so you may need to generate responses a few times to achieve your desired output. This is also a good test to see how well-defined your prompt is!

The Gemini API allows you to interact with Google's Gemini LLM through your program.

Here we create a function that makes it easy to call the Gemini API and receive a response.

In [None]:
#You only need to run this cell without making any changes

import google.generativeai as genai

# Defining an API_call function. Here we are using Gemini because it is free!
def API_call(model, GOOGLE_API_KEY, prompt):
    """
    Creates a call to the API
    Takes in a model name to choose which API to call
    Returns the API response text (str)
    """
    if model.startswith("gemini"):
        genai.configure(api_key=GOOGLE_API_KEY)
        gemini_model = genai.GenerativeModel(model)
        response = gemini_model.generate_content(prompt)
        return response.text

    # Only Gemini models are supported by this function
    raise ValueError(f"Error, model {model} not found")


In [None]:
#You only need to run this cell without making any changes. If you did not set this up during the PreLab, please see how to add your token into the secrets in the prelab.
GOOGLE_API_KEY = userdata.get('GEMINI_TOKEN')

# Sleep Timers

If you make many Gemini API calls in sequence and want to avoid hitting rate limits or give the model a break between tasks, you can use Python's `time.sleep()` function to introduce delays between calls.

```
import time
for i in range(len(inputData)):
    print("Generating...")
    output = API_call("gemini-2.0-flash-lite", GOOGLE_API_KEY, prompt)
    print(output)
    time.sleep(2)
```

For instance, this loop adds a 2-second delay after each API call. This can help avoid rate-limiting errors if you start getting them.
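
If a fixed delay is not enough, one common alternative is to retry a failed call and wait longer each time. The sketch below is only an illustration and is not part of the lab's required code: `call_with_backoff`, `max_retries`, and `base_delay` are hypothetical names, and it catches any `Exception` because this notebook does not show which exception type a rate limit raises.

```python
import time

def call_with_backoff(fn, max_retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt seconds and retry.

    Hypothetical helper: fn is any zero-argument callable, for example
    lambda: API_call("gemini-2.0-flash-lite", GOOGLE_API_KEY, prompt).
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries, surface the original error
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ... with the defaults
```

The `sleep` parameter exists only so the waiting can be swapped out when experimenting; in normal use you would leave it at its default.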


# Processing Your Transcripts into Excerpts

Download your assigned transcripts as txt files and upload one file here for testing your workflow.

In [None]:
#You only need to run this cell without making any changes. It will prompt you to upload a file by showing a button below the cell.


# Upload the number of files you want
uploaded = files.upload()

# Read files into a list of strings
transcripts = []
for filename in uploaded:
    with open(filename, 'r', encoding='utf-8') as f:
        transcripts.append(f.read())

# Process Q&A pairs and store all in one list
def extract_qa_pairs(transcript):
    # Split the transcript at each new question
    parts = transcript.split("Interviewer")
    excerpts = []
    for part in parts[1:]:  # Skip anything before the first interviewer question
        excerpt = "Interviewer:" + part.strip()
        excerpts.append(excerpt)

    return excerpts

# Process each transcript
transcript_excerpts = []
for transcript in transcripts:
    transcript_excerpts.extend(extract_qa_pairs(transcript))

print("You have gotten", len(transcript_excerpts), "excerpts from this transcript")

Saving 078_Kayla Baumgardner Firefighter Paramedic.txt to 078_Kayla Baumgardner Firefighter Paramedic.txt
You have gotten 24 excerpts from this transcript
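
The cell above splits on the literal word "Interviewer", which also splits wherever that word appears inside an answer and rewrites the speaker line (for example `Interviewer:15:39`). As a hedged alternative, the sketch below splits only where a line starts with an interviewer turn. It assumes speaker lines look like `Interviewer  23:06`, as in the sample transcript; `TURN_START` and `split_on_interviewer_turns` are hypothetical names, not part of the lab code.

```python
import re

# Assumption: each interviewer turn starts on its own line as
# "Interviewer" + whitespace + a timestamp such as 23:06 or 1:02:03.
TURN_START = re.compile(r"^Interviewer\s+\d{1,2}:\d{2}(?::\d{2})?", re.MULTILINE)

def split_on_interviewer_turns(transcript):
    """Return one excerpt per interviewer turn, keeping the speaker line intact."""
    starts = [m.start() for m in TURN_START.finditer(transcript)]
    ends = starts[1:] + [len(transcript)]
    return [transcript[s:e].strip() for s, e in zip(starts, ends)]

# Small made-up transcript to show the behavior
sample = (
    "Intro text\n"
    "Interviewer  00:05\nHow did you start?\n"
    "Guest  00:09\nThe Interviewer before you asked that too.\n"
    "Interviewer  00:20\nWhat would you tell students?\n"
    "Guest  00:24\nStay curious.\n"
)
excerpts = split_on_interviewer_turns(sample)
print("Number of excerpts:", len(excerpts))
```

Text before the first interviewer turn is dropped, matching the behavior of `extract_qa_pairs`.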


# Extracting Codes/Quote Pairs

Now that you have your transcripts, we will extract the relevant quote/code pairs for the excerpts. You have already done this in previous weeks, so you just need to implement it in code. Check that it is working correctly for a few excerpts by changing the `transcript_excerpts` array index; you do not need to apply it to all the excerpts or transcripts.

In [None]:
# You need to make changes to this cell and then run it

# This gets a single transcript chunk from the array. Change the index to test out different transcript chunks
excerpt = transcript_excerpts[10] # you don't need to loop through all chunks; testing a handful is enough
print("EXCERPT: ", excerpt)

# This is where you should implement the code for extracting relevant passages (codes and quotes) for the given excerpt.
extraction_prompt = f'''
Analyze this transcript chunk from an interview with a professional about their career. Your task is to extract the most relevant quotes that answer the question:

“What career insights, perspectives or tips do professionals have that might be helpful for a young person who is just beginning their career journey?”

Each quote should have a short theme (maximum 5 words) that categorizes the insight. Only include quotes from the interviewee, not the interviewer. Prioritize quotes that are salient, evocative, and cross-disciplinary in nature.

If there are no relevant quotes that fit the question, **you must still return a valid JSON**—specifically, an array with one object where both fields are empty strings.

The output must be in valid JSON format, using this structure exactly:

[
  {{
    "initial_code": "<code or empty string>",
    "quote": "<quote or empty string>"
  }},
  ...
]

IMPORTANT:
- Do not return any text outside the JSON.
- If nothing relevant is found, return:
  [
    {{
      "initial_code": "",
      "quote": ""
    }}
  ]

Here is the transcript excerpt to analyze:
{excerpt}
'''


response = API_call("gemini-2.0-flash", GOOGLE_API_KEY, extraction_prompt)
print("RESPONSE: ", response)

# This is where you should add code for applying your evaluation prompts. Remember that you have two types of evaluation
# prompts: two that should be applied to the entire produced response and two that need to be applied to the individual
# extracted code-quote pairs within the response array. For the later two, you need to use a loop to apply your
# evaluation prompt to each code-quote pair.

# --- WHOLE RESPONSE EVALUATION PROMPTS ---
whole_eval_prompt_1 = f"""
You exist as a skilled interview analyzer with the goal of teaching students to better analyze interviews through a particular lens. You hate nothing more than quotes that don't actually answer the question.

In this assignment the student was asked to analyze question and answer pairs under the lens of answering the following question: “What career insights, perspectives or tips do professionals have that might be helpful for a young person who is just beginning their career journey?” Given the question and answer pair as well as all of the quotes and codes the student selected, determine whether or not the quotes they chose were relevant.

In this case relevance means whether or not every selected quote is actually relevant to the question asked.

If the quotes are all relevant, return 1, otherwise return 0. Follow your answer with brief reasoning for the classification. Remember that you are allowed to disagree with the reasoning of the student. In some cases the student may provide nothing in the initial code and quotes, which can be correct if the answer has nothing of note.

The data will be provided in this format.

[
  {{
    initial_code: “<the code>”,
    quote: “<the quote>”,
  }},
  {{
    initial_code: “<the code>”,
    quote: “<the quote>”,
  }},
  ...
]

For instance, here is a chunk of reading and an output that is done well.




#COMMUNITY SUPPORT


Interviewer  23:06
what are some little things that the community can do to help support firefighters and paramedics cope with all of the difficulties that you guys have to deal with?


Kayla Baumgartner  23:19
Oh, that's a great question. I just I don't know if there's necessarily any special or different treatment that is needed. I think that we actually get a lot of that, especially during this time, we've had a lot of support with people making masks for us, trying to donate food. We've got an abundance of treats brought to our door. If I can recommend anything, maybe just like less sugary treats, progressing into fried foods, so we can keep our firefighters alive, and how is for a longer period of time. But yeah, I mean, I think that we don't need to be treated any differently. I think that it's just more of support it's having. It's on us to really have, like, good support groups outside of work, have a good core group of friends, have supportive family members, and I think that's really what is needed the most, especially with PTSD, becoming a thing over the more recent years and becoming more of like a talked about subject, I think that it's on family members and friends to really just be nurturing and supporting, because We're usually that person at work, or the nurturing orders, so sometimes I need to be coddled as well. Sure.

[
  {{
    initial_code: “Value Personal Support Networks”,
    quote: “it's having ... good support groups outside of work, have a good core group of friends, have supportive family members, and I think that's really what is needed the most",
  }},
  {{
    initial_code: “Take Care of Mental Health”,
    quote: “especially with PTSD, becoming a thing over the more recent years and becoming more of like a talked about subject, I think that it's on family members and friends to really just be nurturing and supporting",
  }}
]


Evaluate the overall relevance, as described above, of the following output:
{response}
"""

whole_eval_prompt_2 = f"""
You are a professional QA data quality specialist who can verify data integrity. Your job is to determine if a thematic analysis is being performed correctly. The data passed to you is supposed to answer this key question:

“What career insights, perspectives or tips do professionals have that might be helpful for a young person who is just beginning their career journey?”


First, you will need to make sure the data is formatted correctly. It should be formatted like this with no extraneous input.

[
  {{
    initial_code: “<the code>”,
    quote: “<the quote>”,
  }},
  {{
    initial_code: “<the code>”,
    quote: “<the quote>”,
  }},
  ...
]


If there is no quote, it should not have anything in the brackets and be empty. You can think out loud for this one. Return a 1 if the format is correct and a 0 if the format is not correct. You may also see &quot in place of a quote but assume that to be correct as well.

For instance, here is a chunk of reading and an output that is done well.




#COMMUNITY SUPPORT


Interviewer  23:06
what are some little things that the community can do to help support firefighters and paramedics cope with all of the difficulties that you guys have to deal with?


Kayla Baumgartner  23:19
Oh, that's a great question. I just I don't know if there's necessarily any special or different treatment that is needed. I think that we actually get a lot of that, especially during this time, we've had a lot of support with people making masks for us, trying to donate food. We've got an abundance of treats brought to our door. If I can recommend anything, maybe just like less sugary treats, progressing into fried foods, so we can keep our firefighters alive, and how is for a longer period of time. But yeah, I mean, I think that we don't need to be treated any differently. I think that it's just more of support it's having. It's on us to really have, like, good support groups outside of work, have a good core group of friends, have supportive family members, and I think that's really what is needed the most, especially with PTSD, becoming a thing over the more recent years and becoming more of like a talked about subject, I think that it's on family members and friends to really just be nurturing and supporting, because We're usually that person at work, or the nurturing orders, so sometimes I need to be coddled as well. Sure.

[
  {{
    initial_code: “Value Personal Support Networks”,
    quote: “it's having ... good support groups outside of work, have a good core group of friends, have supportive family members, and I think that's really what is needed the most",
  }},
  {{
    initial_code: “Take Care of Mental Health”,
    quote: “especially with PTSD, becoming a thing over the more recent years and becoming more of like a talked about subject, I think that it's on family members and friends to really just be nurturing and supporting",
  }}
]
Evaluate the overall formatting and structure of the following output:
{response}
"""

# --- SEND WHOLE RESPONSE TO API ---
whole_eval_1_response = API_call("gemini-2.0-flash", GOOGLE_API_KEY, whole_eval_prompt_1)
whole_eval_2_response = API_call("gemini-2.0-flash", GOOGLE_API_KEY, whole_eval_prompt_2)

print("WHOLE RESPONSE EVAL 1: ", whole_eval_1_response)
print("WHOLE RESPONSE EVAL 2: ", whole_eval_2_response)


# --- PARSE INDIVIDUAL CODE-QUOTE PAIRS ---
import json

try:
    extracted_items = json.loads(response)
except json.JSONDecodeError:
    print("⚠️ Could not parse the response as JSON. Please check format.")
    extracted_items = []

# --- PER-ITEM EVALUATION PROMPTS ---
for i, item in enumerate(extracted_items):
    item_eval_prompt_1 = f"""
You exist as a skilled interview analyzer with the goal of teaching students to better analyze interviews through a particular lens. You also absolutely despise when quotes are longer than needed or too short to be usable.

In this assignment the student was asked to analyze question and answer pairs under the lens of answering the following question: “What career insights, perspectives or tips do professionals have that might be helpful for a young person who is just beginning their career journey?” Given the question and answer pair as well as one of the quotes and codes the student selected, determine whether or not the quote they selected is exactly as long as it needs to be. Understand that the quotes should focus on generally answering the question, so industry-specific quotes are to be ignored.

If the code and quote are succinct but still understandable, return 1, otherwise return 0. Follow your answer with brief reasoning for the classification. Remember that you are allowed to disagree with the reasoning of the student. In some cases the student may provide nothing in the initial code and quote sections, which can be correct if the answer has nothing of note. You cannot criticize a quote for being too long or short by providing an example of a quote that is not in the answer portion of the text.

Format:
Response: 1 or 0
Reasoning: ...

Task:
Question:
Answer:
Quote and Reasoning:


For instance, here is a chunk of reading and an output that is done well.




#COMMUNITY SUPPORT


Interviewer  23:06
what are some little things that the community can do to help support firefighters and paramedics cope with all of the difficulties that you guys have to deal with?


Kayla Baumgartner  23:19
Oh, that's a great question. I just I don't know if there's necessarily any special or different treatment that is needed. I think that we actually get a lot of that, especially during this time, we've had a lot of support with people making masks for us, trying to donate food. We've got an abundance of treats brought to our door. If I can recommend anything, maybe just like less sugary treats, progressing into fried foods, so we can keep our firefighters alive, and how is for a longer period of time. But yeah, I mean, I think that we don't need to be treated any differently. I think that it's just more of support it's having. It's on us to really have, like, good support groups outside of work, have a good core group of friends, have supportive family members, and I think that's really what is needed the most, especially with PTSD, becoming a thing over the more recent years and becoming more of like a talked about subject, I think that it's on family members and friends to really just be nurturing and supporting, because We're usually that person at work, or the nurturing orders, so sometimes I need to be coddled as well. Sure.


[
  {{
    initial_code: “Value Personal Support Networks”,
    quote: “it's having ... good support groups outside of work, have a good core group of friends, have supportive family members, and I think that's really what is needed the most",
  }},
  {{
    initial_code: “Take Care of Mental Health”,
    quote: “especially with PTSD, becoming a thing over the more recent years and becoming more of like a talked about subject, I think that it's on family members and friends to really just be nurturing and supporting",
  }}
]


In this example, multiple quotes and codes were selected, but you will only see one set (one quote and one code), and it is up to you to determine if the task is done correctly. It is also possible that you see no quotes and codes, and that is acceptable as well. You can think out loud for this one. When you are finished, you should return a 1 if the job is done correctly and a 0 if there is something wrong.


Evaluate the succinctness of the following code-quote pair:
{item}
"""

    item_eval_prompt_2 = f"""
You exist as a skilled interview analyzer with the goal of teaching students to better analyze interviews through a particular lens.

In this assignment the student was asked to analyze question and answer pairs under the lens of answering the following question: “What career insights, perspectives or tips do professionals have that might be helpful for a young person who is just beginning their career journey?” Given the question and answer pair as well as one of the quotes and codes the student selected, determine whether or not the code they chose is accurate both to the quote itself and to how it answers the question.

If the code and quote are accurate, return 1, otherwise return 0. Follow your answer with brief reasoning for the classification. Remember that you are allowed to disagree with the reasoning of the student. In some cases the student may provide an empty quote and code, which can be correct if the answer has nothing of note.

Format:
Response: 1 or 0
Reasoning: ...

Task:
Question:
Answer:
Quote and Reasoning:

For instance, here is a chunk of reading and an output that is done well.




#COMMUNITY SUPPORT


Interviewer  23:06
what are some little things that the community can do to help support firefighters and paramedics cope with all of the difficulties that you guys have to deal with?


Kayla Baumgartner  23:19
Oh, that's a great question. I just I don't know if there's necessarily any special or different treatment that is needed. I think that we actually get a lot of that, especially during this time, we've had a lot of support with people making masks for us, trying to donate food. We've got an abundance of treats brought to our door. If I can recommend anything, maybe just like less sugary treats, progressing into fried foods, so we can keep our firefighters alive, and how is for a longer period of time. But yeah, I mean, I think that we don't need to be treated any differently. I think that it's just more of support it's having. It's on us to really have, like, good support groups outside of work, have a good core group of friends, have supportive family members, and I think that's really what is needed the most, especially with PTSD, becoming a thing over the more recent years and becoming more of like a talked about subject, I think that it's on family members and friends to really just be nurturing and supporting, because We're usually that person at work, or the nurturing orders, so sometimes I need to be coddled as well. Sure.


[
  {{
    initial_code: “Value Personal Support Networks”,
    quote: “it's having ... good support groups outside of work, have a good core group of friends, have supportive family members, and I think that's really what is needed the most",
  }},
  {{
    initial_code: “Take Care of Mental Health”,
    quote: “especially with PTSD, becoming a thing over the more recent years and becoming more of like a talked about subject, I think that it's on family members and friends to really just be nurturing and supporting",
  }}
]


In this example, multiple quotes and codes were selected, but you will only see one set (one quote and one code), and it is up to you to determine if the task is done correctly. It is also possible that you see no quotes and codes, and that is acceptable as well. You can think out loud for this one. When you are finished, you should return a 1 if the job is done correctly and a 0 if there is something wrong.

Evaluate the accuracy of the following code-quote pair:
{item}
"""

    item_eval_1_response = API_call("gemini-2.0-flash", GOOGLE_API_KEY, item_eval_prompt_1)
    item_eval_2_response = API_call("gemini-2.0-flash", GOOGLE_API_KEY, item_eval_prompt_2)

    print(f"\n--- EVALUATIONS FOR ITEM #{i+1} ---")
    print("Item: ", item)
    print("Succinctness Eval: ", item_eval_1_response)
    print("Accuracy Eval: ", item_eval_2_response)


EXCERPT:  Interviewer:15:39  
Yeah, you had, you had the, the physical and mental preparedness to get into it,




Kayla Baumgartner  15:46  
I think so. And I'm not really sure where the mental part came from. It might just be the fact that I have, I may have gotten that from sport, sure. Yeah, sports all my life. I never, I mean, I played volleyball, so that was my big thing, and I went to college for that, so I've always been on a women's sports team, but I think that if you are competitive in nature, and you kind of,




#WORKING AS A FIREFIGHTER
RESPONSE:  [
  {
    "initial_code": "Kayla Baumgartner  15:46",
    "quote": "I think that if you are competitive in nature..."
  }
]

WHOLE RESPONSE EVAL 1:  0

The quote "I think that if you are competitive in nature..." provides no clear insights, perspectives, or tips that would be helpful for a young person beginning their career journey. It's an incomplete thought and lacks actionable advice.

WHOLE RESPONSE EVAL 2:  Okay, let's bre
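
Model replies are sometimes wrapped in a Markdown code fence (the theme output later in this notebook starts with one), and `json.loads` rejects that wrapper. The sketch below shows one way to tolerate it before parsing. `parse_json_response` and `FENCE` are hypothetical names, and the approach assumes the fence, when present, surrounds the whole reply.

```python
import json

FENCE = "`" * 3  # the three-backtick Markdown code fence marker

def parse_json_response(text):
    """Parse a model reply as JSON, tolerating an optional fence around it.

    Returns the parsed value, or an empty list if the reply is not valid JSON.
    """
    cleaned = text.strip()
    if cleaned.startswith(FENCE) and cleaned.endswith(FENCE):
        cleaned = cleaned[len(FENCE):-len(FENCE)]  # drop both fence markers
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]  # drop the language tag
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return []

# A fenced reply parses the same way as a bare JSON reply
fenced_reply = FENCE + 'json\n[{"initial_code": "A", "quote": "B"}]\n' + FENCE
print(parse_json_response(fenced_reply))
```

Returning an empty list on failure mirrors the fallback used in the cell above, so the per-item loop simply does nothing when the reply cannot be parsed.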

# Creating final list of Quote/Code Pairs

Since you already generated the code/quote pairs in previous weeks, and in order to save your Gemini API quota, we will not regenerate the outputs from the initial coding you did before. Instead, we will simply upload a CSV file containing your generated outputs from HW 3.

Copy the generated JSON array outputs for all the excerpts for your 6 transcripts and add them to a spreadsheet formatted with only ONE column containing those outputs and no other extra columns (use the provided template). Download it as a CSV file and then run the next cell to upload it:

In [None]:
print("📄 Please upload a CSV file with a single column containing your generated JSON array outputs for each transcript excerpt.")
uploaded = files.upload()
filenames = list(uploaded.keys())
filename = filenames[0]
df = pd.read_csv(filename)
column_as_list = df.iloc[:, 0].dropna().tolist()

list_of_codes_quotes = column_as_list
print("NUMBER OUTPUTS: ", len(list_of_codes_quotes))
print("FIRST OUTPUT: ", list_of_codes_quotes[0])

📄 Please upload a CSV file with a single column containing your generated JSON array outputs for each transcript excerpt.


Saving Copy of 4. Code_Quote Pairs Single Column Spreadsheet - Code_Quote Pairs Sheet.csv to Copy of 4. Code_Quote Pairs Single Column Spreadsheet - Code_Quote Pairs Sheet.csv
NUMBER OUTPUTS:  59
FIRST OUTPUT:  {
    initial_code: "Utilize Available Resources",
    quote: "The easiest way for someone at that age to enter into this field is through the contest, because they have the support system, they have the resources that are provided by the college consortium in this region."
  }
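
The first output above uses unquoted keys (`initial_code:` rather than `"initial_code":`), so `json.loads` would reject it. If you want the pairs as Python dictionaries, one option is a small regular expression. This is a minimal sketch under stated assumptions: `PAIR` and `extract_pairs` are hypothetical names, each pair is assumed to list `initial_code` before `quote`, and values are assumed to use straight double quotes with no double quote inside them.

```python
import re

# Assumption: initial_code: "..." is followed by quote: "...", keys may or may
# not be quoted, and the values contain no straight double-quote characters.
PAIR = re.compile(r'"?initial_code"?\s*:\s*"([^"]*)"\s*,\s*"?quote"?\s*:\s*"([^"]*)"')

def extract_pairs(rows):
    """Flatten spreadsheet cells into a list of {'initial_code', 'quote'} dicts."""
    pairs = []
    for cell in rows:
        for code, quote in PAIR.findall(str(cell)):
            if code or quote:  # skip the empty placeholder pair
                pairs.append({"initial_code": code, "quote": quote})
    return pairs

# Made-up cells in the two styles seen in this notebook
sample_rows = [
    '{\n    initial_code: "Utilize Available Resources",\n    quote: "Use the support system."\n  }',
    '[{"initial_code": "", "quote": ""}]',
    '[{"initial_code": "A", "quote": "B"}, {"initial_code": "C", "quote": "D"}]',
]
sample_pairs = extract_pairs(sample_rows)
print(len(sample_pairs), "pairs found")
```

In the notebook you could call `extract_pairs(list_of_codes_quotes)`; quotes that contain curly quotes or embedded double quotes would need a more careful parser.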


# Creating Overall Themes

Based on the codes that were generated, you will now generate themes (and potentially subthemes) that answer the research question. Decide how you want to break down your process of creating themes into smaller steps and make a prompt for each step.

Step 1 of your prompt workflow for creating themes:

In [None]:
#You need to make changes to this cell and then run it

prompt1 =  f"""
You are given a list of code and quote pairs extracted from interviews about career advice. Your task is to analyze these quotes and organize them into a hierarchy of:

1. High-level themes
2. Subthemes under each theme
3. A list of quotes associated with each subtheme

Each **subtheme** should group together quotes with a clearly related idea. Each **theme** should represent a broader career insight that connects multiple subthemes. Try to find distinct, coherent, and grounded groupings that answer the question: “What career insights, perspectives or tips do professionals have that might be helpful for a young person just beginning their career journey?”

Return the result in **JSON** format like this:

[
  {{
    "theme": "<Theme Name>",
    "description": "<Theme Description>",
    "subthemes": [
      {{
        "subtheme": "<Subtheme Name>",
        "quotes": ["<Quote 1>", "<Quote 2>", ...]
      }},
      ...
    ]
  }},
  ...
]

Use this list of code-quote pairs to create your structure:
{list_of_codes_quotes}
"""

response1 = API_call("gemini-2.0-flash", GOOGLE_API_KEY, prompt1)
print(response1)

```json
[
  {
    "theme": "Embrace Continuous Learning and Growth",
    "description": "Highlighting the importance of adaptability, curiosity, and a commitment to lifelong learning for sustained career success.",
    "subthemes": [
      {
        "subtheme": "Adaptability and Flexibility",
        "quotes": [
          "But then when we started working with schools, we realized that everyone has Chromebooks... And so we read about the application kind of around the Chromebooks and around tools that schools and teachers already had.",
          "I thought that I wanted to be a doctor, started as a chemistry major. Got a C in organic chemistry and was devastated. Thought I could never become a doctor and dropped all my science classes and took a random set of classes, and one of them being economics, and I enjoyed it.",
          "So as I said, it was a journey. So I bought a few businesses after that. Went back working for my dad, and then ended up at Verve after I sold those busines

Now use the output from the previous prompt as the input to the next step:

In [None]:
#You need to make changes to this cell and then run it

prompt2 = f"""
You are given a hierarchical structure of themes, subthemes, and their associated quotes. For each subtheme, select the **single most essence-capturing quote** — one that is evocative, illustrative, and clearly conveys the subtheme’s insight.

Present the final result in a readable format like:

<Theme Name>
* <Subtheme Name>: "<Best Quote>"

Use this data to generate your output:
{response1}
"""

response2 = API_call("gemini-2.0-flash", GOOGLE_API_KEY, prompt2)
print(response2)

**Embrace Continuous Learning and Growth**
* Adaptability and Flexibility: "I thought that I wanted to be a doctor, started as a chemistry major. Got a C in organic chemistry and was devastated. Thought I could never become a doctor and dropped all my science classes and took a random set of classes, and one of them being economics, and I enjoyed it."
* Continuous Skill Development: "I knew nothing about Excel and spreadsheets, and after working with him, I would I don't want to say I was a wizard, but I came away with knowing far more than nothing."
* Learning from Mistakes and Feedback: "When I got to college, and I took my first class, I failed miserably... My overconfidence really shot me in the foot. But I failed that class and a lot of it was just because I hadn't seen it before."
* Embrace Exploration and New Opportunities: "if you are willing to kind of raise your hand and take on new projects, or new challenges that you learn a lot, so I didn\'t expect to know, learn so much, 


The final output of your prompt workflow should be a hierarchical outline of your themes and subthemes, with one quote per subtheme chosen to be the most salient, essence-capturing, or evocative quote for that subtheme.
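
To make the target shape concrete, the sketch below turns a theme structure like the one requested in `prompt1` into a plain-text outline. It is only an illustration: `render_outline` and `pick_quote` are hypothetical names, the sample data is made up, and taking the first quote is a placeholder for the model-based selection that `prompt2` performs.

```python
def render_outline(themes, pick_quote=lambda quotes: quotes[0]):
    """Turn a list of theme dicts into a plain-text outline.

    pick_quote chooses one quote per subtheme; the default (first quote) is a
    stand-in for asking the model to pick the most evocative one.
    """
    lines = []
    for theme in themes:
        lines.append(theme["theme"])
        for sub in theme.get("subthemes", []):
            quotes = sub.get("quotes", [])
            quote = pick_quote(quotes) if quotes else ""
            lines.append(f'* {sub["subtheme"]}: "{quote}"')
    return "\n".join(lines)

# Made-up data following the JSON structure requested in prompt1
sample_themes = [
    {
        "theme": "Embrace Continuous Learning",
        "description": "Keep learning throughout a career.",
        "subthemes": [
            {"subtheme": "Adaptability", "quotes": ["Q1", "Q2"]},
            {"subtheme": "Feedback", "quotes": []},
        ],
    }
]
outline = render_outline(sample_themes)
print(outline)
```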

You can add more prompts if needed by copying and pasting the code below and creating more prompt and response variables. If you do not need them, you can ignore the cells below.


In [None]:
#You need to make changes to this cell and then run it

prompt3 = f'''
    INSERT PROMPT HERE


    using {response1}
    '''

response3 = API_call("gemini-2.0-flash", GOOGLE_API_KEY, prompt3)
print(response3)