Script Notes:
- created 102224

#### Load Model, Tokenizer


In [2]:
from mlx_lm import load, generate
from IPython.display import Markdown


    
# Dictionary of available models
all_models = {
    "llama3_8b": "mlx-community/Meta-Llama-3-8B-Instruct-8bit",
    "llama3_8b_1048k": "mlx-community/Llama-3-8B-Instruct-1048k-8bit",
    "llama3_70b_1048k": "mlx-community/Llama-3-70B-Instruct-Gradient-1048k-8bit",
}

def load_model(model_name="mlx-community/Meta-Llama-3-8B-Instruct-8bit"):
    model, tokenizer = load(model_name)
    return model, tokenizer

model, tokenizer = load_model()


#### SUMMARIZATION


NETRA 0918

1. Pull sentences into a single text
2. Summarize
3. Summarize the summaries
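
The three steps above can be sketched as a small two-pass routine. This is a minimal sketch, not the notebook's actual pipeline: `chunk_sentences`, `summarize_hierarchically`, the `max_chars` budget, and the `summarize_fn` callable are all hypothetical names introduced here for illustration. `summarize_fn` stands in for whatever function sends text to the model and returns a summary.

```python
from typing import Callable, List


def chunk_sentences(sentences: List[str], max_chars: int = 2000) -> List[str]:
    """Group consecutive sentences into chunks of roughly max_chars characters."""
    chunks, current, size = [], [], 0
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget
        if current and size + len(sentence) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(sentence)
        size += len(sentence) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_hierarchically(
    sentences: List[str],
    summarize_fn: Callable[[str], str],
    max_chars: int = 2000,
) -> str:
    """Summarize each chunk, then summarize the joined chunk summaries."""
    chunks = chunk_sentences(sentences, max_chars)
    if not chunks:
        return ""
    partial = [summarize_fn(chunk) for chunk in chunks]
    # A single chunk needs no second pass
    if len(partial) == 1:
        return partial[0]
    return summarize_fn(" ".join(partial))
```

In this notebook, `summarize_fn` could be a thin wrapper around the model call shown in the next cell; the chunk size would need tuning to the model's context window.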


In [12]:
# Define the role of the chatbot
system_prompt = "You are a helpful assistant, an expert at summarizing text down to its most important and relevant information."

# Sample text to summarize
text = """
The Industrial Revolution, which took place from the 18th to 19th centuries, was a period during which predominantly agrarian, rural societies in Europe and America became industrial and urban. Prior to the Industrial Revolution, which began in Britain in the late 1700s, manufacturing was often done in people's homes, using hand tools or basic machines. Industrialization marked a shift to powered, special-purpose machinery, factories and mass production. The iron and textile industries, along with the development of the steam engine, played central roles in the Industrial Revolution, which also saw improved systems of transportation, communication and banking. While industrialization brought about an increased volume and variety of manufactured goods and an improved standard of living for some, it also resulted in often grim employment and living conditions for the poor and working classes. 
"""

# Define the summarization request
user_prompt = f"Please summarize the following text in 2 sentences or less: {text}"

# Set up the chat scenario with roles
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Apply the chat template to format the input for the model
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Decode the tokenized input back to text format to be used as a prompt for the model
prompt = tokenizer.decode(input_ids)

# Generate a response using the model
response = generate(model, tokenizer, max_tokens=512, prompt=prompt)

Markdown(response)

Here is a summary of the text in 2 sentences or less:

The Industrial Revolution, which began in Britain in the late 1700s, marked a shift from manual labor and home-based manufacturing to powered machinery, factories, and mass production. While industrialization brought about increased goods and improved living standards for some, it also led to poor working and living conditions for the working class.

#### Transcribe


In [20]:
# def transcribe()

import requests
import json
import os

# file_path = "/Users/tristangardner/Documents/Programming/02_Media/Wayne Mayer/Full Proxies 240117/EXO_WM_S001_S001_T006_proxyWT.mp4"


def transcribe(file_path):
    # Update the URL to the correct endpoint
    url = "http://104.185.74.85:8443/transcribe-multiple"

    # Prepare the data to be sent in the POST request
    # get the file name from file_path
    file_name = os.path.basename(file_path)

    import mimetypes

    # Determine the MIME type of the file
    mime_type, _ = mimetypes.guess_type(file_path)

    # If MIME type couldn't be determined, default to 'application/octet-stream'
    if mime_type is None:
        mime_type = 'application/octet-stream'

    # Open the file in a context manager so the handle is closed after the upload
    with open(file_path, 'rb') as f:
        files = [('files', (file_name, f, mime_type))]

        # Send the POST request with the file
        response = requests.post(url, files=files)

    # Check if the request was successful
    if response.status_code == 200:
        # Parse the JSON response
        result = response.json()
        print("Transcription successful:")
        print(json.dumps(result, indent=4))
        return result
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None

Transcription successful:
[
  {
    "File": "file.mp4",
    "Transcription": [
      {
        "Sentence": "This introductory experience will help you comprehend and implement Yeah, it was hard.",
        "Start Time": 4.135,
        "End Time": 31.713
      },
      {
        "Sentence": "Can you start this one with the first line?",
        "Start Time": 31.953,
        "End Time": 35.115
      },
      {
        "Sentence": "Yeah, bring it up to the word, yeah, that would be good.",
        "Start Time": 35.135,
        "End Time": 38.098
      },
      {
        "Sentence": "Because I wasn't sure where I was going.",
        "Start Time": 38.919,
        "End Time": 40.68
      },
      {
        "Sentence": "That's where I noticed the stumble.",
        "Start Time": 42.201,
        "End Time": 43.222
      },
      {
        "Sentence": "Perfect, thank you.",
        "Start Time": 43.983,
        "End Time": 44.683
      },
      {
        "Sentence": "This introductory experienc

In [30]:
with open("transcription_wayne_test_0915.json", "r") as f:
    transcription = json.load(f)
transcription = transcription[0]
transcription

{'File': 'file.mp4',
 'Transcription': [{'Sentence': 'This introductory experience will help you comprehend and implement Yeah, it was hard.',
   'Start Time': 4.135,
   'End Time': 31.713},
  {'Sentence': 'Can you start this one with the first line?',
   'Start Time': 31.953,
   'End Time': 35.115},
  {'Sentence': 'Yeah, bring it up to the word, yeah, that would be good.',
   'Start Time': 35.135,
   'End Time': 38.098},
  {'Sentence': "Because I wasn't sure where I was going.",
   'Start Time': 38.919,
   'End Time': 40.68},
  {'Sentence': "That's where I noticed the stumble.",
   'Start Time': 42.201,
   'End Time': 43.222},
  {'Sentence': 'Perfect, thank you.',
   'Start Time': 43.983,
   'End Time': 44.683},
  {'Sentence': 'This introductory experience will help you comprehend and implement the concept of sustainability.',
   'Start Time': 48.046,
   'End Time': 53.271},
  {'Sentence': 'It examines how companies develop and implement sustainability as business strategy.',
   'Star
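
The cell above reloads the transcription from a JSON file, but the step that wrote that file is not shown. A minimal sketch of how the value returned by `transcribe()` could be saved follows; `save_transcription` is a hypothetical helper, not part of the original notebook.

```python
import json
from pathlib import Path


def save_transcription(result, path):
    """Write the parsed JSON response to disk so it can be reloaded later."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any non-ASCII characters readable in the file
        json.dump(result, f, indent=2, ensure_ascii=False)
    return path
```

Saving the result once avoids re-uploading the media file every time the notebook is rerun.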

Transcription DF


In [32]:
import pandas as pd

# Assuming 'transcription' contains your JSON data
df = pd.DataFrame(transcription['Transcription'])

# Display the dataframe
df

Unnamed: 0,Sentence,Start Time,End Time
0,This introductory experience will help you com...,4.135,31.713
1,Can you start this one with the first line?,31.953,35.115
2,"Yeah, bring it up to the word, yeah, that woul...",35.135,38.098
3,Because I wasn't sure where I was going.,38.919,40.68
4,That's where I noticed the stumble.,42.201,43.222
5,"Perfect, thank you.",43.983,44.683
6,This introductory experience will help you com...,48.046,53.271
7,It examines how companies develop and implemen...,54.071,59.236
8,You will investigate sustainability's environm...,60.114,72.465
9,"You will learn how businesses use governance, ...",73.926,81.572


text object


In [38]:
# Extract all sentences from the transcription
all_sentences = [item['Sentence'] for item in transcription['Transcription']]

# Combine all sentences into a single text
text_wayne = ' '.join(all_sentences)

print(text_wayne)

This introductory experience will help you comprehend and implement Yeah, it was hard. Can you start this one with the first line? Yeah, bring it up to the word, yeah, that would be good. Because I wasn't sure where I was going. That's where I noticed the stumble. Perfect, thank you. This introductory experience will help you comprehend and implement the concept of sustainability. It examines how companies develop and implement sustainability as business strategy. You will investigate sustainability's environmental, social, and governance factors, and learn how to apply tools of data analytics and communication to disclose non-financial risks and opportunities. You will learn how businesses use governance, culture, and leadership to integrate sustainable practices into their organizations. Yes. as a business strategy versus as a business strategy. You will learn how businesses use governance, culture, and leadership to integrate sustainable practices into their organizations. We'll exa

#### INFERENCING


In [3]:
# def prompt_llama()
from typing import List, Dict, Optional

def prompt_llama(
    model,
    tokenizer,
    system_prompt: str,
    user_prompt: str,
    max_tokens: int = 1000000,
    top_p: float = 0.9,
    additional_messages: Optional[List[Dict[str, str]]] = None
) -> str:
    """
    A template function to easily prompt the Llama model.

    Args:
    model: The loaded Llama model
    tokenizer: The tokenizer for the model
    system_prompt (str): The system prompt to set the context
    user_prompt (str): The user's prompt/question
    max_tokens (int): Maximum number of tokens to generate
    top_p (float): Controls diversity of generation. Lower values make output more focused.
    additional_messages (List[Dict[str, str]], optional): Earlier conversation turns, placed between the system prompt and the new user prompt

    Returns:
    str: The generated response from the model
    """
    # Start with the system prompt
    messages = [{"role": "system", "content": system_prompt}]

    # Insert earlier turns first so the conversation stays in chronological order
    if additional_messages:
        messages.extend(additional_messages)

    # The new user prompt is always the last message before generation
    messages.append({"role": "user", "content": user_prompt})

    # Apply the chat template to format the input for the model
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    # Decode the tokenized input back to text format to be used as a prompt for the model
    prompt = tokenizer.decode(input_ids)

    try:
        # Generate a response using the model
        response = generate(
            model, 
            tokenizer, 
            prompt=prompt,
            max_tokens=max_tokens,
            top_p=top_p
        )
        return response
    except Exception as e:
        return f"An error occurred: {str(e)}"

# Example usage:
# response = prompt_llama(
#     model,
#     tokenizer,
#     system_prompt="You are a helpful assistant.",
#     user_prompt="What is the capital of France?",
# )
# print(response)

##### Testing...


Strawberry


In [5]:
# Example usage:
response = prompt_llama(
    model,
    tokenizer,
    system_prompt="You are a helpful assistant that is great at analyzing english letters and text.",
    user_prompt="How many r's are in the word 'strawberry'?",
    max_tokens=1000000
)
print(response)

Let me count them for you!

The word "strawberry" contains 2 R's.


Language understanding and generation:


In [6]:
response = prompt_llama(
    model,
    tokenizer,
    system_prompt="You are a helpful assistant with expertise in linguistics.",
    user_prompt="Explain the difference between 'affect' and 'effect' and provide an example sentence for each.",
    max_tokens=10000
)

print(response)

The age-old conundrum!

In linguistics, "affect" and "effect" are two commonly confused words that have distinct meanings and uses.

**Affect** (verb) means to influence or have an impact on something or someone. It can also mean to pretend or feign something.

Example sentence: "The cold weather will affect the crops." (Here, "affect" means to influence the crops.)

**Effect** (noun) refers to the result or outcome of a particular action or set of circumstances.

Example sentence: "The effect of the cold weather on the crops was devastating." (Here, "effect" refers to the outcome or result of the cold weather on the crops.)

To help you remember the difference:

* If you're talking about something that is happening to someone or something (influence, impact), use "affect".
* If you're talking about the result of something that has happened (outcome, consequence), use "effect".

Here are some more examples to illustrate the difference:

* The rain will affect the parade. (The rain will

Mathematical reasoning:


In [7]:
response = prompt_llama(
    model,
    tokenizer,
    system_prompt="You are a math tutor capable of solving and explaining mathematical problems.",
    user_prompt="If a triangle has sides of length 3, 4, and 5, what is its area? Show your work.",
    max_tokens=100000
)
print(response)

A classic problem!

To find the area of a triangle, we can use Heron's Formula, which states that the area (A) of a triangle with sides of length a, b, and c is:

A = √(s(s-a)(s-b)(s-c))

where s is the semi-perimeter, which is half the perimeter of the triangle.

First, let's find the semi-perimeter (s):

s = (3 + 4 + 5) / 2
= 12 / 2
= 6

Now, we can plug in the values into Heron's Formula:

A = √(6(6-3)(6-4)(6-5))
= √(6 × 3 × 2 × 1)
= √(36)
= 6

So, the area of the triangle is 6 square units.

Here's a visual representation of the triangle:

```
  5
  / \
 3---4
```

I hope this helps! Let me know if you have any questions or need further clarification.


Creative Writing:


In [12]:
response = prompt_llama(
    model,
    tokenizer,
    system_prompt="You are a creative writer specializing in short stories.",
    user_prompt="Write a 100-word story that includes the following elements: a lighthouse, a mysterious letter, and a talking cat.",
    max_tokens=5000
)
print(response)


As the sun dipped into the sea, Emily climbed the winding stairs of the lighthouse, the scent of salt and seaweed filling her lungs. She had inherited the tower from a great aunt she never knew, along with a mysterious letter. The words danced before her eyes: "Meet me at the top, where the light meets the darkness." Suddenly, a sleek black cat appeared, its eyes glowing like lanterns. "Ah, you're the one," it said in a low, raspy voice. "I've been waiting. Your aunt's secrets are hidden within these walls. Are you ready to uncover them?" Emily's heart skipped a beat as the cat vanished, leaving her to ponder the mystery.


Historical Knowledge:


In [13]:
response = prompt_llama(
    model,
    tokenizer,
    system_prompt="You are a history expert with extensive knowledge of world events.",
    user_prompt="Summarize the main causes and consequences of World War I in 3-4 sentences.",
    max_tokens=25000
)
print(response)


The main causes of World War I were the complex system of alliances, imperialism, and nationalism that led to a chain reaction of events, culminating in the assassination of Archduke Franz Ferdinand in June 1914. The war had devastating consequences, including the loss of millions of lives, widespread destruction, and the redrawing of national borders. The war also led to the Russian Revolution, the rise of fascist and communist regimes in Europe, and the United States' emergence as a global superpower. The war's aftermath also led to the Treaty of Versailles, which imposed harsh penalties on Germany and contributed to the rise of Nazi Germany and the outbreak of World War II.


Scientific explanation:


In [14]:
response = prompt_llama(
    model,
    tokenizer,
    system_prompt="You are a science educator able to explain complex concepts in simple terms.",
    user_prompt="Explain how photosynthesis works and why it's important for life on Earth.",
    max_tokens=300000
)
print(response)

Photosynthesis! It's the magic that happens when plants, algae, and some bacteria convert sunlight into energy. Let me break it down for you in simple terms.

**The Process:**

Photosynthesis occurs in specialized organelles called chloroplasts, found in plant cells. It's a two-stage process:

1. **Light-dependent reactions:** Light energy from the sun is absorbed by pigments like chlorophyll and converted into ATP (adenosine triphosphate) and NADPH (nicotinamide adenine dinucleotide phosphate). Think of it like charging your phone with solar power!
2. **Light-independent reactions (Calvin cycle):** The ATP and NADPH produced in the first stage are used to convert carbon dioxide (CO2) and water (H2O) into glucose (a type of sugar) and oxygen (O2). This is like using the energy from your charged phone to power a device!

**Why Photosynthesis is Important:**

Photosynthesis is the backbone of life on Earth. Here's why:

1. **Food source:** Plants produce glucose, which is used as energy 

Tokenization:


In [15]:
response = prompt_llama(
    model,
    tokenizer,
    system_prompt="You are a helpful assistant skilled at analyzing text and following precise instructions.",
    user_prompt="""Take this sentence: "The quick brown fox jumps over the lazy dog."

Break it into words and start at the 3rd word. Then go forward two letters, back one letter, and then forward 5 letters. What letter are you on?

Please show your work step by step.""",
    max_tokens=300000
)
print(response)

Let's break down the sentence into words:

"The quick brown fox jumps over the lazy dog."

Here are the individual words:

1. The
2. quick
3. brown
4. fox
5. jumps
6. over
7. the
8. lazy
9. dog

Starting at the 3rd word, which is "brown":

1. Start at the 3rd word: "brown"
2. Go forward 2 letters: "b-r-o" -> "o"
3. Go back 1 letter: "o" -> "r"
4. Go forward 5 letters: "r-o-w-n-"

After these steps, I am on the letter "N".


#### CLASSIFICATION


##### Sentence Classification (v1)


In [106]:
df_test_short = df.copy()
# just copying transcription df to df_test_short

In [93]:
# def display_styled_df()
import pandas as pd
import textwrap
from IPython.display import display, HTML

def display_styled_df(df, max_rows=None, row_range=None, max_width=50):
    """
    Display a styled DataFrame with wrapped text and formatted numbers.
    
    Args:
    df (pd.DataFrame): The DataFrame to display
    max_rows (int, optional): Maximum number of rows to display
    row_range (tuple, optional): Specific range of rows to display (start, end)
    max_width (int): Maximum width for text wrapping
    """
    def format_text(text, max_width=max_width):
        wrapped = textwrap.wrap(str(text), max_width)
        return '<br>'.join(wrapped)

    def format_number(value):
        return f'{value:.3f}'

    # Prepare the DataFrame
    if row_range:
        start, end = row_range
        df_display = df.iloc[start:end]
    elif max_rows:
        df_display = df.head(max_rows)
    else:
        df_display = df

    # Adjust DataFrame display settings
    pd.set_option('display.max_colwidth', None)
    pd.set_option('display.expand_frame_repr', False)

    # Create a dictionary for formatting
    format_dict = {}
    for col in df_display.columns:
        if df_display[col].dtype == 'object':
            format_dict[col] = format_text
        elif pd.api.types.is_numeric_dtype(df_display[col]):
            format_dict[col] = format_number

    # Apply formatting to the DataFrame
    styled_df = df_display.style.format(format_dict)

    # Add CSS to left-align text in cells
    styled_df = styled_df.set_properties(**{
        'text-align': 'left',
        'white-space': 'pre-wrap'
    })

    # Display the styled DataFrame
    display(HTML(styled_df.to_html(escape=False)))

# Example usage:
# display_styled_df(df)  # Display all rows
# display_styled_df(df, max_rows=50)  # Display first 50 rows
# display_styled_df(df, row_range=(10, 20))  # Display rows 10 to 19

In [107]:
display_styled_df(df_test_short, row_range=(13, 23), max_width=85)

Unnamed: 0,Sentence,Start Time,End Time
13,We'll examine what it means to develop a culture of sustainability.,126.731,130.173
14,We'll also explore.,130.714,131.674
15,"Yeah, let's do that again.",131.694,139.08
16,Where are we starting from?,139.1,140.902
17,You will learn.,140.922,141.302
18,"You will learn how businesses use governance, culture, and leadership to integrate sustainable practices into their organizations.",143.49,150.092
19,We'll examine what it means to develop a culture of sustainability.,151.112,154.413
20,"We'll also explore how companies identify, map, and engage stakeholders in a variety of settings with diverse parties in situations that are often complex, technical, controversial, multi-jurisdictional, and involve varying numbers of people, from small groups to large public meetings.",155.173,172.757
21,"For this analysis, you learn how companies can engage stakeholders in ways that turn us-versus-them dynamics into all-of-us alliances.",173.797,186.862
22,"Upon successful completion of this experience, you will be able to identify ESG aspects, impacts, and target audiences, identify emerging issues and stakeholder concerns associated with the effectiveness of sustainability and ESG reports, select appropriate sustainability key performance indicators, and communicate how sustainability builds competitive advantage, drives innovation, and creates value.",197.481,224.237


In [123]:
json_labels_path = "generated_labels.json"
with open(json_labels_path, "r") as f:
    labels_dict = json.load(f)
# Add the 'Other' label to the labels_dict
labels_dict['Other'] = "Any sentence that is incomplete or otherwise seems like something said on the film set where the text was originally recorded that is not a part of the actual topic(s) presented in the text."

labels = json.dumps(labels_dict, indent=4)
display(Markdown(f"```json\n{labels}\n```"))

```json
{
    "Environmental Sustainability": "The text discusses the concept of sustainability, its environmental, social, and governance factors, and how companies develop and implement sustainability as a business strategy.",
    "Business Strategy": "The text highlights how businesses use governance, culture, and leadership to integrate sustainable practices into their organizations and how sustainability can build competitive advantage, drive innovation, and create value.",
    "Stakeholder Management": "The text explores how companies identify, map, and engage stakeholders in various settings, and how to turn 'us-versus-them' dynamics into 'all-of-us' alliances.",
    "Data Analytics": "The text mentions the use of data analytics tools to disclose non-financial risks and opportunities, and to select appropriate sustainability key performance indicators.",
    "Corporate Social Responsibility": "The text discusses the importance of corporate social responsibility, including the development of a culture of sustainability, and the need for companies to engage with stakeholders and address their concerns.",
    "Other": "Any sentence that is incomplete or otherwise seems like something said on the film set where the text was originally recorded that is not a part of the actual topic(s) presented in the text."
}
```

In [124]:
# Create a function to classify sentences using prompt_llama()
def classify_sentence(sentence, labels_dict):
    system_prompt = "You are an expert at classifying text into predefined categories. Given a sentence, classify it into the most appropriate category from the provided labels."
    
    user_prompt = f"""Classify the following sentence into one of the given categories. Return only the category name.

Sentence: "{sentence}"

Categories and their descriptions:
{json.dumps(labels_dict, indent=2)}

Classification:"""

    response = prompt_llama(model, tokenizer, system_prompt, user_prompt)
    return response.strip()

# Add a new column 'Label' to the dataframe
df_test_short['Label'] = ''

# Loop through the dataframe and classify each sentence
for index, row in df_test_short.iterrows():
    sentence = row['Sentence']
    label = classify_sentence(sentence, labels_dict)
    df_test_short.at[index, 'Label'] = label

# Display the updated dataframe
display_styled_df(df_test_short, max_width=85)

Unnamed: 0,Sentence,Start Time,End Time,Label
0,"This introductory experience will help you comprehend and implement Yeah, it was hard.",4.135,31.713,Business Strategy
1,Can you start this one with the first line?,31.953,35.115,Business Strategy
2,"Yeah, bring it up to the word, yeah, that would be good.",35.135,38.098,Other
3,Because I wasn't sure where I was going.,38.919,40.68,Other
4,That's where I noticed the stumble.,42.201,43.222,Other
5,"Perfect, thank you.",43.983,44.683,"""Other"""
6,This introductory experience will help you comprehend and implement the concept of sustainability.,48.046,53.271,Environmental Sustainability
7,It examines how companies develop and implement sustainability as business strategy.,54.071,59.236,Business Strategy
8,"You will investigate sustainability's environmental, social, and governance factors, and learn how to apply tools of data analytics and communication to disclose non- financial risks and opportunities.",60.114,72.465,Data Analytics
9,"You will learn how businesses use governance, culture, and leadership to integrate sustainable practices into their organizations.",73.926,81.572,Business Strategy
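
The output above shows the model sometimes returning the label wrapped in quotes (row 5). A minimal sketch of a post-processing step that maps raw responses onto the known label names follows. `normalize_label` and its fallback behaviour are assumptions added for illustration, not part of the original classification cell.

```python
from typing import Iterable


def normalize_label(raw: str, valid_labels: Iterable[str], fallback: str = "Other") -> str:
    """Map a raw model response onto one of the known labels."""
    # Strip whitespace, surrounding quotes/backticks, and a trailing period
    cleaned = raw.strip().strip("\"'`").strip().rstrip(".")
    lookup = {label.lower(): label for label in valid_labels}
    # Prefer an exact, case-insensitive match
    if cleaned.lower() in lookup:
        return lookup[cleaned.lower()]
    # Otherwise accept a known label that appears inside a longer response
    for key, label in lookup.items():
        if key in cleaned.lower():
            return label
    return fallback
```

It could be applied as `normalize_label(label, labels_dict.keys())` before writing to the `Label` column. It only cleans up formatting; it does not correct a wrong classification such as rows 0 and 1.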


##### Label Gen (v2)


save:
Include a label for 'Set Voices' for any sentence that doesn't fall into a valuable category or that is otherwise obviously words related to filming the audio file on set (such as 'cut' or 'action' or 'go back up to there yeah').


In [48]:
import json
import re
from IPython.display import Markdown, display

# Define the role of the chatbot for classification
system_prompt = "You are an expert at categorizing text into relevant labels. Given a piece of text, you will generate appropriate categorical labels that best describe the content."

# Sample text to categorize (truncate if too long)
max_chars = 100000  # Adjust as needed
text = text_wayne[:max_chars]  # slicing is a no-op when the text is already shorter

# Number of labels to generate (excluding 'Set Voices')
num_labels = 5

# Define the user prompt for classification
user_prompt = f"""Please generate {num_labels} categorical labels that best describe the following transcription of a raw video recording of a text about environmental social governance. This is for online learners, so keep your response focused on the main points of the text. Don't provide labels describing the intended use of the text or that it is a video recording, etc.

{text}

Provide the labels as keys of a JSON object whose values are your reasoning for why the label is relevant for the main points of the given text."""

# Set up the chat scenario with roles
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Apply the chat template to format the input for the model
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Decode the tokenized input back to text format to be used as a prompt for the model
prompt = tokenizer.decode(input_ids)

try:
    # Generate a response using the model
    response = generate(model, tokenizer, max_tokens=100000, prompt=prompt)
    
    # Extract JSON object from the response
    json_match = re.search(r'\{.*\}', response, re.DOTALL)
    if json_match:
        json_str = json_match.group()
        labels = json.loads(json_str)
        print("Generated labels:")
        for label, reason in labels.items():
            display(Markdown(f"**{label}**"))
            display(Markdown(f"{reason}\n"))
    else:
        print("Could not find a JSON object in the response.")
        display(Markdown(response))
except Exception as e:
    print(f"An error occurred: {str(e)}")

Generated labels:


**Environmental_Sustainability**

The text discusses the concept of sustainability, its environmental, social, and governance factors, and how companies develop and implement sustainability as a business strategy.


**Business_Strategy**

The text highlights how businesses use governance, culture, and leadership to integrate sustainable practices into their organizations and how sustainability can build competitive advantage, drive innovation, and create value.


**Stakeholder_Management**

The text explores how companies identify, map, and engage stakeholders in various settings, and how to turn 'us-versus-them' dynamics into 'all-of-us' alliances.


**Data_Analytics**

The text mentions the use of data analytics tools to disclose non-financial risks and opportunities, and to select appropriate sustainability key performance indicators.


**Corporate_Social_Responsibility**

The text discusses the importance of corporate social responsibility, including the development of a culture of sustainability, and the need for companies to engage with stakeholders and address their concerns.
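
The greedy regex in the cell above spans from the first `{` to the last `}` in the response, so it can fail when the model adds text containing braces after the JSON. A possible alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, stops at the end of the first valid object. `extract_first_json_object` is a hypothetical helper name.

```python
import json
from typing import Any, Optional


def extract_first_json_object(text: str) -> Optional[Any]:
    """Return the first decodable JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores anything that follows it
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one
            start = text.find("{", start + 1)
    return None
```

With this helper, the `re.search` / `json.loads` pair could be replaced by `labels = extract_first_json_object(response)`, followed by a check for `None`.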


Write labels json


In [71]:
# Write labels dictionary to JSON file
import json

# Ensure labels is a dictionary
if isinstance(labels, dict):
    # Define the output file name
    output_file = 'generated_labels.json'

    # Write the labels dictionary to a JSON file
    with open(output_file, 'w') as f:
        json.dump(labels, f, indent=4)

    print(f"Labels have been written to {output_file}")
else:
    print("Error: 'labels' is not a dictionary. Unable to write to JSON.")


Labels have been written to generated_labels.json


##### Label Gen and Sentence Sort (v1)


In [16]:
# Define the role of the chatbot for classification
system_prompt = "You are an expert at categorizing text into relevant labels. Given a piece of text, you will generate appropriate categorical labels that best describe the content."


# Sample text to categorize
text = """
The Industrial Revolution, which took place from the 18th to 19th centuries, was a period during which predominantly agrarian, rural societies in Europe and America became industrial and urban. Prior to the Industrial Revolution, which began in Britain in the late 1700s, manufacturing was often done in people's homes, using hand tools or basic machines. Industrialization marked a shift to powered, special-purpose machinery, factories and mass production. The iron and textile industries, along with the development of the steam engine, played central roles in the Industrial Revolution, which also saw improved systems of transportation, communication and banking. While industrialization brought about an increased volume and variety of manufactured goods and an improved standard of living for some, it also resulted in often grim employment and living conditions for the poor and working classes.
"""

# Define the user prompt for classification
user_prompt = f"Please generate 2-3 categorical labels that best describe the following text: {text}. Provide the labels as keys to a dictionary where the values are a list of the sentences in the text that relate to the label."

# Set up the chat scenario with roles
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Apply the chat template to format the input for the model
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Decode the tokenized input back to text format to be used as a prompt for the model
prompt = tokenizer.decode(input_ids)

# Generate a response using the model
response = generate(model, tokenizer, max_tokens=100000, prompt=prompt)

print("Generated labels: ")
Markdown(response)


Generated labels: 


Here are 3 categorical labels that best describe the text:

```
{
    "History": [
        "The Industrial Revolution, which took place from the 18th to 19th centuries, was a period during which predominantly agrarian, rural societies in Europe and America became industrial and urban.",
        "Prior to the Industrial Revolution, which began in Britain in the late 1700s, manufacturing was often done in people's homes, using hand tools or basic machines.",
        "Industrialization marked a shift to powered, special-purpose machinery, factories and mass production."
    ],
    "Economy": [
        "Industrialization marked a shift to powered, special-purpose machinery, factories and mass production.",
        "The iron and textile industries, along with the development of the steam engine, played central roles in the Industrial Revolution, which also saw improved systems of transportation, communication and banking.",
        "While industrialization brought about an increased volume and variety of manufactured goods and an improved standard of living for some, it also resulted in often grim employment and living conditions for the poor and working classes."
    ],
    "Society": [
        "While industrialization brought about an increased volume and variety of manufactured goods and an improved standard of living for some, it also resulted in often grim employment and living conditions for the poor and working classes."
    ]
}
```

These labels capture the main themes of the text, which are the historical context of the Industrial Revolution, its economic impact, and its social consequences.
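
The response above mixes prose with a fenced dictionary, so it has to be parsed before the labels can be used in code. The following is a minimal sketch of one way to do that, assuming the model returns valid JSON inside the first fenced block (or somewhere in the reply). The helper name `extract_label_dict` is hypothetical and not part of `mlx_lm`.

```python
import json
import re

# Build the fence marker programmatically so this snippet can itself live inside a fenced block
FENCE = "`" * 3

def extract_label_dict(response):
    """Return the label dictionary found in a model response, or None if parsing fails."""
    # Prefer the contents of the first fenced block; otherwise search the whole response
    pattern = FENCE + r"(?:\w+)?\s*(.*?)" + FENCE
    fenced = re.search(pattern, response, flags=re.DOTALL)
    candidate = fenced.group(1) if fenced else response

    # Take the outermost {...} span and try to parse it as JSON
    start = candidate.find("{")
    end = candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return None
```

Calling `extract_label_dict(response)` on the output above should yield a dictionary keyed by label. A `None` result signals that the reply was not parseable and the prompt may need to be retried.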

<br>
<br>

### Saved


#### Inference with Conversation History


In [None]:
# Initialize an empty conversation history
conversation_history = []

# First interaction
system_prompt = "You are a helpful assistant."
user_prompt = "What is the capital of France?"

response = prompt_llama(
    model,
    tokenizer,
    system_prompt=system_prompt,
    user_prompt=user_prompt,
    max_tokens=100
)

print("Assistant:", response)

# Add the user's message and the assistant's response to the conversation history
conversation_history.extend([
    {"role": "user", "content": user_prompt},
    {"role": "assistant", "content": response}
])

# Second interaction
user_prompt = "What's another famous city in that country?"

response = prompt_llama(
    model,
    tokenizer,
    system_prompt=system_prompt,
    user_prompt=user_prompt,
    max_tokens=100,
    additional_messages=conversation_history
)

print("Assistant:", response)

# Add the new interaction to the conversation history
conversation_history.extend([
    {"role": "user", "content": user_prompt},
    {"role": "assistant", "content": response}
])

# You can continue this pattern for more interactions

In this example:
1. We initialize an empty `conversation_history` list.
2. For the first interaction, we use the `prompt_llama` function as before.
3. After getting the response, we add both the user's message and the assistant's response to `conversation_history`.
4. For the second interaction, we use `prompt_llama` again, but this time we pass `conversation_history` as the `additional_messages` parameter.
5. We then add the new interaction to `conversation_history`.

You can continue this pattern for as many interactions as you want. The model will have context from all previous interactions in the conversation history.
Remember that there's a limit to how many tokens the model can process at once, so for very long conversations, you might need to truncate the history or use a sliding window approach.
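
One possible sliding-window approach is sketched below. It keeps the most recent messages that fit within a token budget. The function name `trim_history` and the `count_tokens` parameter are hypothetical, and the default whitespace word count is only a rough stand-in for a real tokenizer.

```python
def trim_history(history, max_tokens, count_tokens=None):
    """Keep the newest messages whose combined size fits within max_tokens."""
    if count_tokens is None:
        # Assumption: a whitespace word count roughly approximates the token count
        count_tokens = lambda text: len(text.split())

    kept = []
    total = 0
    # Walk backwards from the newest message and stop once the budget is exceeded
    for message in reversed(history):
        cost = count_tokens(message["content"])
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()

    # Drop a leading assistant message so the window starts on a user turn
    if kept and kept[0]["role"] == "assistant":
        kept = kept[1:]
    return kept
```

In this notebook the trimmed list could be passed as `additional_messages`. For a closer estimate, a callable such as `lambda s: len(tokenizer.encode(s))` might be supplied as `count_tokens`, assuming the loaded tokenizer exposes `encode`.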
Also, note that this approach doesn't maintain state between separate runs of your notebook cells. If you want to maintain state across cell executions, you'd need to store the conversation history in a variable that persists between cell executions, or save it to and load it from a file.
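
Saving the history to a file, as suggested above, could look like the following minimal sketch. The helper names `save_history` and `load_history` and the JSON file layout are assumptions for illustration.

```python
import json
from pathlib import Path

def save_history(history, path):
    """Write the list of message dictionaries to a JSON file."""
    Path(path).write_text(json.dumps(history, indent=2), encoding="utf-8")

def load_history(path):
    """Read the message list back, or return an empty history if the file is missing."""
    file_path = Path(path)
    if not file_path.exists():
        return []
    return json.loads(file_path.read_text(encoding="utf-8"))
```

A cell could start with `conversation_history = load_history("history.json")` and end with `save_history(conversation_history, "history.json")`, where the file name is a placeholder.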


#### Ingest Transcription JSONs


In [133]:
import csv
import json

jsonfile = "/Users/tristangardner/Documents/Programming/01_Apps/Llama/CB_AP10_24.json"
# Load the JSON file into Python objects
with open(jsonfile, "r") as f:
    asr = json.load(f)

# The top level is a list; its first element holds the list of sentence dictionaries
asr = asr[0]
asr = asr['Transcription']

# Make a CSV out of this list of dictionaries

# Define the CSV file path
csv_file_path = 'asr.csv'

# Write the list of dictionaries to a CSV file
with open(csv_file_path, 'w', newline='') as csvfile:
    fieldnames = ['Sentence', 'Start Time', 'End Time']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    
    # Write the header
    writer.writeheader()
    
    # Write each dictionary in the list as a row
    for item in asr:
        writer.writerow(item)
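
`csv.DictWriter` raises a `ValueError` if a row contains keys that are not in `fieldnames`. The sketch below shows a more tolerant variant, assuming each transcription item is a dictionary that may carry extra or missing keys. The helper name `rows_to_csv` is hypothetical, and it writes to an in-memory buffer so the result can be inspected before saving.

```python
import csv
import io

FIELDNAMES = ["Sentence", "Start Time", "End Time"]

def rows_to_csv(rows, fieldnames=FIELDNAMES):
    """Serialize a list of dictionaries to CSV text, tolerating extra or missing keys."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # silently skip keys that are not listed in fieldnames
        restval="",             # fill missing keys with an empty cell
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned string can then be written to disk in one step, for example with `open(csv_file_path, "w", newline="")`.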



#### Testing Intuitive Diarization on CB AP10 24


In [136]:
import pandas as pd

# Ingest the CSV
cb_df = pd.read_csv('/Users/tristangardner/Documents/Programming/01_Apps/Llama/CB_AP10_24.csv')

# Add a new column 'Status' to the dataframe
cb_df['Status'] = ''

system_prompt = "You are an expert at identifying professional voice-over content for a 30 second commercial versus casual speech or directions from the recording engineer and director."

def classify_sentence_diar(sentence):
    user_prompt = f"""Analyze the following sentence and classify it as either "Review" if it's likely a take of a commercial videoscript by a voice-over artist, or "Ignore" if it's likely a direction to the artist, an incomplete sentence, or casual speech. Return only "Review" or "Ignore".

Sentence: "{sentence}"

Classification:"""

    response = prompt_llama(model, tokenizer, system_prompt, user_prompt, max_tokens=10)
    print(response)
    return response.strip()

# Loop through the dataframe and classify each sentence
for index, row in cb_df.iterrows():
    sentence = row['Sentence']
    status = classify_sentence_diar(sentence)
    cb_df.at[index, 'Status'] = status

# Display the updated dataframe
display_styled_df(cb_df, max_width=85)

Review
Review
Review
Review
Review
Review
Review
Review
Review
Ignore
Ignore
Ignore
Review
Review
Ignore
Ignore
Ignore
Ignore
Ignore
Ignore
Ignore
Review
Review
Review
Ignore
Ignore
Review
Review
Review
Review
Review
Review
Review
Review
Ignore
Ignore
Ignore
Review
Ignore
Ignore
Review
Ignore
Review
Ignore
Review
Review
Ignore
Ignore
Review
Review
Review
Review
Review
Review
Review
Review
Ignore
Review
Ignore
Ignore
Ignore
Ignore
Ignore
Review
Review
Ignore
Review
Review
Ignore
Review
Ignore
Ignore
Ignore
Ignore
Review
Review
Review
Review
Ignore
Review
Ignore
Ignore
Ignore
Ignore
Ignore
Review
Ignore
Review
Review
Ignore
Review
Ignore
Review
Review
Review
Ignore
Ignore
Review
Review
Review
Ignore
Review
Review
Ignore
Ignore
Ignore
Ignore
Review
Review
Ignore
Ignore
Ignore
Ignore
Ignore
Review
Review
Ignore
Review
Ignore
Ignore
Review
Review
Ignore
Review
Review
Review
Review
Ignore
Review
Review
Ignore
Review
Review
Review
Review
Review
Review
Review
Review
Review
Review
Ignore
Ignore

Unnamed: 0,Sentence,Start Time,End Time,Status
0,Our best-selling Sky Blue is now available in 12 packs.,6.007,8.83,Review
1,That's why I'm excited to introduce our new Stargazer IPA as our latest seasonal offering.,11.692,15.856,Review
2,We used Alani hops to bring tropical aromas to this clear golden brew made with the best reverse osmosis water.,16.276,22.382,Review
3,That was perfect.,24.941,25.842,Review
4,The reverse made with the best reverse osmosis water.,26.062,29.665,Review
5,"It's the best reverse osmosis water process, right?",30.785,33.727,Review
6,Or something not like...,33.767,34.608,Review
7,"Well, no, the water is...",34.968,36.73,Review
8,It's RO water.,36.91,38.531,Review
9,"Okay, cool.",38.551,39.331,Ignore
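
With `max_tokens=10`, the model may occasionally return quotes, punctuation, or extra words around the label. The following is a minimal sketch of a normalization step, assuming only the two labels from the prompt are valid. The name `normalize_label` and the `"Unknown"` fallback are hypothetical choices.

```python
import re

def normalize_label(response, default="Unknown"):
    """Map a raw model reply to "Review" or "Ignore", or to a default if neither appears."""
    # Match either label as a whole word, regardless of case or surrounding punctuation
    match = re.search(r"\b(review|ignore)\b", response, flags=re.IGNORECASE)
    if match is None:
        return default
    return match.group(1).capitalize()
```

Returning `normalize_label(response)` from `classify_sentence_diar` would keep the `Status` column restricted to a small, known set of values, and rows marked `"Unknown"` could be re-run or checked by hand.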


In [137]:
cb_df.to_csv('cb_df_diar.csv', index=False)