## GP Transco assignment

Given an audio recording of a conversation between two people, you need to:

* Produce a concise summary encapsulating the essential points of the conversation.
* Rate the overall sentiment of the dialogue on a scale from 1 to 10, with 1 indicating a highly negative sentiment and 10 denoting a highly positive sentiment.
* Extract the key emotions in the call and score each one from 1 to 10, with 1 indicating a highly negative sentiment and 10 denoting a highly positive sentiment.
* Extract the delivery appointment time from the dialogue.

In [55]:
from openai import OpenAI
import pprint as pp

### Utility funcs

* Using a single function for completions, only manipulating the system prompt.
* You need to set your own `OPENAI_API_KEY` environment variable; `OpenAI()` reads the key from it by default.
* No error handling or reproducibility efforts as this is only a mock-up with a single data entry.
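
The notebook deliberately skips error handling. If it were added later, one minimal sketch is a generic retry helper with exponential backoff. The name `with_retries` and its parameters are hypothetical and not part of the OpenAI SDK; catching every `Exception` is a simplification for this mock-up.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() up to `attempts` times, backing off exponentially between failures.

    Hypothetical helper for illustration; `sleep` is injectable so the
    waiting can be skipped in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # re-raise the last failure instead of hiding it
            if attempt == attempts - 1:
                raise
            # wait base_delay, then 2x, 4x, ... before the next try
            sleep(base_delay * 2 ** attempt)


# Possible usage with the functions defined in this notebook:
# transcript = with_retries(lambda: get_transcript(audio_file_name))
```

In a real pipeline it would likely be better to catch only the transient error types of the client library rather than every exception.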

In [72]:
# initialize a global openai client instance
client = OpenAI()

# generate the transcript
def get_transcript(audio_file_name):
    # open in binary mode; the context manager closes the file after the request
    with open(audio_file_name, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            file=audio_file,
            model="whisper-1",
            response_format="text",
            # no need for the timestamps in this simple case
            # timestamp_granularities=["word"]
        )
    return transcript


# a generic function to get the gpt completion
# (the last parameter is named user_input to avoid shadowing the built-in input())
def get_completion(model_name, temperature, max_tokens, system_prompt, user_input):
    completion = client.chat.completions.create(
        model=model_name,
        temperature=temperature,
        max_tokens=max_tokens,
        messages=[
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": user_input
            }
        ]
    )

    # return only the text of the first (and only) choice
    return completion.choices[0].message.content


### 0. Generate a transcript of the call.

* Use the Whisper API (`whisper-1` model).

In [73]:
# try out the basic transcription first, whisper api
audio_file_name = "aud-20240305025406001766-5f28b652ac8760c054204aa095bc31e3-C862.wav"

transcript = get_transcript(audio_file_name)


print("The transcript: ")
# pprint to make it more readable in the notebook
pp.pprint(transcript)

The transcript: 
("Hey, Fidel, I'm sorry, this is Juan. Hey, I'm done with the truck. Okay, "
 'perfect, perfect. What time is my truck going to kick in tomorrow? Let me '
 'see real quick. For tomorrow it says open window, actually. Let me double '
 'check that real quick, Fidel. I have an 8 a.m. appointment time for the '
 'delivery. On Wednesday? On Wednesday, yes, sir. Alright. Yeah, yeah. But '
 "it's a 24-7 facility, so if you have any trouble on the route, just give me "
 "a call so we can fix it, okay? Yeah, I'm not leaving until tomorrow. I'm out "
 "of time, so I just use PC out here? No, no, no, it's fine, it's fine. Are "
 "you at HQ, right? Yeah, I'm out of time, so... No, it's fine, you can sit "
 'down at headquarters. So I roll with YARMUV? Do I use YARMUV? Oh, yeah, '
 "YARMUV, it's okay, it's fine. Okay, yeah, and you can also take the 15 "
 "minutes for the post-trip. It won't count like a violation, so you can take "
 'the 15 minutes for the post-trip. Oh, I can use

### 1. Produce a concise summary encapsulating the essential points of the conversation.

In [74]:
# let's go with gpt-4-turbo-preview
model_name = "gpt-4-turbo-preview"
# temperature 0 makes the output as deterministic as possible; the alternative knob is top_p,
# where the model only considers the tokens within the top_p probability mass.
temperature = 0
max_tokens = 1000
system_prompt = """
    You are a helpful assistant for a logistics company. Your task is to produce a 
    concise summary of the conversation between a truck driver and his manager in 
    the transcribed text encapsulating the essential points. Write down the main 
    bullet points summarizing the conversation. Use only the context provided, don't use abbreviations.
    """

summary = get_completion(model_name, temperature, max_tokens, system_prompt, transcript)

print("Concise summary: ")
print(summary)

Concise summary: 
- Juan informs Fidel that he has finished with the truck.
- Juan inquires about his next schedule, and Fidel confirms an 8 a.m. delivery appointment on Wednesday.
- Fidel mentions the delivery location is a 24-7 facility and offers assistance if Juan encounters any issues on the route.
- Juan states he won't leave until tomorrow due to being out of driving time and asks if he should use Personal Conveyance (PC) to move the truck.
- Fidel advises Juan to stay at headquarters instead of using PC and confirms it's fine to use Yard Move (YARMUV) for necessary movements.
- Fidel allows Juan to take 15 minutes for a post-trip inspection without it counting as a violation.
- Juan is instructed to switch from YARMUV to on-duty for the post-trip inspection, then to sleeper berth or off-duty status.


### 2. Rate the overall sentiment of the dialogue on a scale from 1 to 10, with 1 indicating a highly negative sentiment and 10 denoting a highly positive sentiment.

* Process the original transcript as the sentiment needs to be derived from the complete conversation, not the summary.
* One could also consider sentiment over time (both during the conversation and how the sentiment changes between different conversations in time).
* Only the overall sentiment is needed, not individual sentiments from each person.
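
Since the model is asked to reply with only a rating, the reply can be validated before it is used downstream. The following is a minimal sketch; `parse_rating` is a hypothetical helper that is not part of the original notebook, and it assumes the reply is a bare integer.

```python
def parse_rating(reply, low=1, high=10):
    """Convert a model reply such as "7" into an int and check the expected range.

    Hypothetical helper for illustration; raises ValueError on anything
    that is not a bare integer between `low` and `high`.
    """
    text = reply.strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected a bare integer rating, got {reply!r}")
    rating = int(text)
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} is outside the {low}-{high} scale")
    return rating


# Possible usage with the reply from get_completion:
# sentiment_score = parse_rating(sentiment)
```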

In [75]:
temperature = 0
# only need a single token in the completion
max_tokens = 1
model_name = "gpt-4-turbo-preview"
system_prompt = """
    You are a helpful sentiment analyzer assistant. Your task is to determine what is the
    sentiment conveyed by the text. Rate the overall sentiment of the dialogue on a scale 
    from 1 to 10, with 1 indicating a highly negative sentiment and 10 denoting a highly 
    positive sentiment. Provide only a sentiment rating in the reply.
    """

sentiment = get_completion(model_name, temperature, max_tokens, system_prompt, transcript)

In [76]:
print("Overall sentiment: ")
print(sentiment)

Overall sentiment: 
7


### 3. Extract the key emotions in the call and score each one from 1 to 10, with 1 indicating a highly negative sentiment and 10 denoting a highly positive sentiment.

* It may be necessary to define the set of possible/relevant key emotions first. For now, just go with whatever GPT provides.
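
One way to pin down the relevant emotions up front is to pass a fixed candidate list in the system prompt. The sketch below only builds such a prompt; the list `CANDIDATE_EMOTIONS` and the helper `build_emotion_prompt` are hypothetical and were not used to produce the results in this notebook.

```python
# Hypothetical candidate list; the right set of emotions would need to be
# agreed on for the actual use case.
CANDIDATE_EMOTIONS = ["cooperation", "reassurance", "concern", "frustration", "relief"]


def build_emotion_prompt(emotions):
    """Build a system prompt that restricts the model to a fixed list of emotions."""
    emotion_list = ", ".join(emotions)
    return (
        "You are an advanced sentiment and emotion analyzer assistant. "
        f"Consider only the following emotions: {emotion_list}. "
        "Rate each of them that is present in the conversation on a scale "
        "from 1 to 10, with 1 indicating a highly negative sentiment and 10 "
        "denoting a highly positive sentiment. Return a table with the name "
        "of the emotion and its rating in each row."
    )


# Possible usage with the generic completion function:
# key_emotions = get_completion(model_name, temperature, max_tokens,
#                               build_emotion_prompt(CANDIDATE_EMOTIONS), transcript)
```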


In [85]:
max_tokens = 1000
system_prompt = """
    You are an advanced sentiment and emotion analyzer assistant. Your task is
    to determine what are the strongest positive and negative emotions in the provided conversation text. Extract from 
    two to eight key emotions in the conversation and rate each of those emotions on a scale 
    from 1 to 10, with 1 indicating a highly negative sentiment and 10 denoting a highly positive
    sentiment. Return a table with the name of the emotion and its rating in each row.
    """

key_emotions = get_completion(model_name, temperature, max_tokens, system_prompt, transcript)

In [86]:
print("Key emotions: ")
print(key_emotions)
# Note: the emotion ratings do not seem to be consistent across multiple runs.
# Providing the model with a list of relevant emotions would likely improve consistency.

Key emotions: 
| Emotion         | Rating |
|-----------------|--------|
| Cooperation     | 9      |
| Reassurance     | 8      |
| Understanding   | 8      |
| Patience        | 7      |
| Concern         | 6      |
| Relief          | 7      |
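
To compare ratings across runs, the table reply can be turned into a dictionary. This is a minimal sketch that assumes the reply is a two-column pipe-delimited table like the one above; `parse_emotion_table` is a hypothetical helper, not part of the original notebook.

```python
def parse_emotion_table(table_text):
    """Parse a two-column "| Emotion | Rating |" table into {emotion: rating}.

    Hypothetical helper for illustration; the header row and the dashed
    separator row are skipped because their second cell is not a number.
    """
    ratings = {}
    for line in table_text.splitlines():
        # drop the outer pipes, then split the row into its cells
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        name, value = cells
        if not value.isdigit():
            continue
        ratings[name] = int(value)
    return ratings


# Possible usage with the reply from get_completion:
# emotion_scores = parse_emotion_table(key_emotions)
```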


### 4. Extract the delivery appointment time from the dialogue.

* Again, here we use the original transcript, not the summary.
* As there are no further details, return the delivery time in a sentence.
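
If a structured value were needed instead of a sentence, the reply could be post-processed with a regular expression. The sketch below assumes a 12-hour time such as "8 a.m." and an English weekday name appear in the sentence; `parse_appointment` and both patterns are hypothetical and not part of the original notebook.

```python
import re

# 12-hour clock time such as "8 a.m.", "2:30 PM" or "11am"
TIME_RE = re.compile(r"\b(1[0-2]|0?[1-9])(?::([0-5]\d))?\s*([ap])\.?m\.?", re.IGNORECASE)
WEEKDAY_RE = re.compile(
    r"\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b", re.IGNORECASE
)


def parse_appointment(sentence):
    """Extract weekday and 24-hour time from a sentence; return None if no time is found."""
    time_match = TIME_RE.search(sentence)
    if time_match is None:
        return None
    # 12 a.m. maps to hour 0 and 12 p.m. to hour 12
    hour = int(time_match.group(1)) % 12
    if time_match.group(3).lower() == "p":
        hour += 12
    minute = int(time_match.group(2) or 0)
    day_match = WEEKDAY_RE.search(sentence)
    weekday = day_match.group(1).capitalize() if day_match else None
    return {"weekday": weekday, "hour": hour, "minute": minute}


# Possible usage with the reply from get_completion:
# appointment = parse_appointment(delivery_time)
```

Asking the model for a fixed output format in the prompt would be an alternative to parsing free text.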

In [81]:
system_prompt = """
    You are a helpful assistant for a logistics company. Your task is
    to analyze the provided input conversation and extract the appointment 
    time to delivery from the conversation. Return the exact extracted delivery 
    appointment time in a brief summary sentence.
    """

delivery_time = get_completion(model_name, temperature, max_tokens, system_prompt, transcript)

In [82]:
print("Delivery time: ")
print(delivery_time)

Delivery time: 
The delivery appointment time is at 8 a.m. on Wednesday.
