# Voice Transcription with CPU-Friendly AI Models Example (Greatest Speeches of the 20th Century)

### Introduction

Given an audio file, is there any way we could identify the time stamps where specific words were said? Is there any way we could extract all the keywords related to a particular topic?

With AI 🤖, we can do all of this and much more! The key lies in being able to parse audio into text, allowing us to then harness the natural language processing capabilities of language models to perform sophisticated analyses and inferences on our data.

Whatever your role, this approach to audio transcription and analysis can change how you interact with audio files and extract knowledge from them.

Let's see how we can do this with llmware, using the Whisper model and the SLIM Extract Tool.

### For Google Colab Users

If you are using Colab for free, we highly recommend activating the T4 GPU hardware accelerator. Our models are designed to run with at least 16GB of RAM; activating T4 gives the notebook access to 16GB of GDDR6 GPU memory, as opposed to the ~13GB of system RAM Colab provides by default.

To activate T4:
1. click on the "Runtime" tab
2. click on "Change runtime type"
3. select T4 GPU under Hardware Accelerator

NOTE: There is a weekly usage limit on using the T4 GPU for free.
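
Before loading the models, it can help to confirm how much memory the runtime actually has. The sketch below is a minimal, standard-library check of total system RAM, assuming a Linux runtime such as Colab's (`os.sysconf` is not available on Windows, so the helper returns `None` there). The helper name `total_ram_gb` is hypothetical, and it reports system RAM only; GPU memory on a T4 is separate and can be inspected with `nvidia-smi`.

```python
import os
from typing import Optional


def total_ram_gb() -> Optional[float]:
    """Return total physical RAM in GB, or None if sysconf cannot report it."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
        num_pages = os.sysconf("SC_PHYS_PAGES")   # number of physical pages
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; some names are unsupported on some systems
        return None
    return page_size * num_pages / (1024 ** 3)


ram = total_ram_gb()
if ram is None:
    print("Could not determine total RAM on this platform.")
elif ram < 16:
    print(f"Total system RAM is {ram:.1f} GB, below the 16 GB the models are designed for.")
else:
    print(f"Total system RAM is {ram:.1f} GB.")
```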

### Installing and importing dependencies

In [None]:
%pip install llmware

In [None]:
import os

from llmware.parsers import Parser
from llmware.setup import Setup
from llmware.util import Utilities
from llmware.models import ModelCatalog

### Worker function

Our main worker function does the following:
1. Load the sample audio files with the `Setup` class
2. Parse the audio into text by calling the WhisperCPP model via the `Parser` class
3. Run a text search for the word "president" (the matching text chunks are used in the next step)
4. Use the SLIM Extract Tool to extract president names from these chunks
5. Check whether the extracted name matches someone in a specified list (`various_american_presidents`)
    - If there is a match, append the president's name, the source audio file, the start time stamp and the text chunk to `final_list`
6. Print the contents of `final_list` and return it.
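
Step 3 relies on `Utilities().fast_search_dicts`. To make the idea concrete, here is a minimal plain-Python sketch of a keyword search over parsed blocks, assuming each block is a dictionary with a `"text"` key (as the worker function's `res["text"]` access suggests). The function `simple_search_dicts` and the mock blocks are hypothetical; this is not llmware's implementation, which may tokenize, drop stop words, or annotate results differently.

```python
from typing import Any, Dict, List


def simple_search_dicts(keyword: str, blocks: List[Dict[str, Any]],
                        text_key: str = "text") -> List[Dict[str, Any]]:
    """Return the blocks whose text contains `keyword`, ignoring case.

    Hypothetical stand-in for a keyword search over parsed chunks.
    """
    needle = keyword.lower()
    return [block for block in blocks if needle in block.get(text_key, "").lower()]


# Invented blocks shaped like parser output (text, source file, start time).
mock_blocks = [
    {"text": "Thank you all for coming tonight", "file_source": "speech_a.wav", "coords_x": 12},
    {"text": "President Kennedy addressed the nation", "file_source": "speech_b.wav", "coords_x": 47},
    {"text": "The president spoke about the economy", "file_source": "speech_c.wav", "coords_x": 5},
]

hits = simple_search_dicts("president", mock_blocks)
for hit in hits:
    print(hit["file_source"], hit["coords_x"], hit["text"])
```

Plain substring matching also hits words such as "presidential", so every search result is only a candidate until the extraction model confirms a name in Step 4.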

In [None]:
def greatest_speeches_example():

    # four sample file sets - "famous_quotes" | "greatest_speeches" | "youtube_demos" | "earnings_calls"
    voice_sample_files = Setup().load_voice_sample_files(small_only=False)

    input_folder = os.path.join(voice_sample_files, "greatest_speeches")

    print("\nStep 1 - converting, parsing and text chunking ~56 WAV files")

    parser_output = Parser(chunk_size=400, max_chunk_size=600).parse_voice(input_folder,
                                                                           write_to_db=False,
                                                                           copy_to_library=False,
                                                                           remove_segment_markers=True,
                                                                           chunk_by_segment=True,
                                                                           real_time_progress=False)

    print("\nStep 2 - look at the text chunks with all of the metadata")

    for i, entries in enumerate(parser_output):
        print("all parsed blocks: ", i, entries)

    print("\nStep 3 - run an inline text search for 'president'")

    results = Utilities().fast_search_dicts("president", parser_output)

    for i, res in enumerate(results):
        print("search results: ", i, res)

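    print("\nStep 4 - use LLM to review each search result - and if specific U.S. presidents found, then display the source")

    # load the SLIM extraction model; sample=False with temperature=0.0 requests deterministic output
    extract_model = ModelCatalog().load_model("slim-extract-tool", sample=False, temperature=0.0, max_output=200)

    # surnames to look for in the extracted names (defined once, outside the loop)
    various_american_presidents = ["kennedy", "carter", "nixon", "reagan", "clinton", "obama"]

    final_list = []

    for i, res in enumerate(results):

        # ask the model to extract a "president name" field from the chunk text
        response = extract_model.function_call(res["text"], params=["president name"])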

        extracted_name = ""
        if "president_name" in response["llm_response"]:
            if len(response["llm_response"]["president_name"]) > 0:
                extracted_name = response["llm_response"]["president_name"][0].lower()
            else:
                print("\nupdate: skipping result - no president name found - ", response["llm_response"], res["text"])

        for president in various_american_presidents:
            if president in extracted_name:
                print("\nextracted american president text: ", i, extracted_name)
                print("file source: ", res["file_source"])
                print("time stamp: ", res["coords_x"], res["coords_y"], res["coords_cx"], res["coords_cy"])
                print("text: ", i, res["text"])
                final_list.append({"key": president, "source": res["file_source"], "time_start": res["coords_x"],
                                   "text": res["text"]})

    print("\nStep 5 - final results")
    for i, f in enumerate(final_list):
        print("final results: ", i, f)

    return final_list
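
The matching in Step 4 looks only at the first name the model returns, lowercases it, and checks whether any surname from the list appears inside it. The sketch below isolates that logic as a standalone function so it can be tried without loading a model. The function `match_presidents` and the sample `llm_response` dictionaries are hypothetical, shaped like the `response["llm_response"]` values the loop inspects.

```python
from typing import Any, Dict, List

# Same surnames as various_american_presidents in the worker function.
PRESIDENT_SURNAMES = ["kennedy", "carter", "nixon", "reagan", "clinton", "obama"]


def match_presidents(llm_response: Dict[str, Any],
                     surnames: List[str] = PRESIDENT_SURNAMES) -> List[str]:
    """Return every surname contained in the FIRST extracted president name."""
    names = llm_response.get("president_name", [])
    if not names:
        return []  # no name extracted, so nothing can match
    extracted = names[0].lower()
    return [surname for surname in surnames if surname in extracted]


print(match_presidents({"president_name": ["President John F. Kennedy"]}))  # ['kennedy']
print(match_presidents({"president_name": ["Lincoln", "Reagan"]}))         # []
print(match_presidents({}))                                                # []
```

The second call shows a consequence of checking only the first extracted name: a chunk that mentions a listed president after an unlisted one is skipped. Iterating over every entry in `names` would catch those cases, at the cost of possibly recording one chunk under several presidents.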

### Main block

Here, we call the worker function.

In [6]:
if __name__ == "__main__":

    final_list = greatest_speeches_example()

The output, `final_list`, is a list of Python dictionaries, one for each chunk where a president's name extracted from the audio matches one of the presidents in `various_american_presidents`. Each dictionary has the following keys:
- `key`: the lowercase surname of the matched president (e.g. `kennedy`)
- `source`: the audio file the mention was found in
- `time_start`: the time stamp, in seconds, where the president was mentioned
- `text`: the text chunk the name was found in
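
Once you have `final_list`, a common next step is to group the mentions by president and show the time stamps in a readable form. The sketch below assumes, as stated above, that `time_start` is a number of seconds (it is passed through `float()` in case it arrives as a string). The helpers `format_seconds` and `group_by_president` and the sample entries are hypothetical; the entries are invented and are not output from the notebook.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def format_seconds(seconds: float) -> str:
    """Convert a time in seconds to an mm:ss string (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def group_by_president(entries: List[Dict[str, Any]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group entries by their 'key' field into (source file, mm:ss) pairs."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for entry in entries:
        grouped[entry["key"]].append((entry["source"], format_seconds(float(entry["time_start"]))))
    return dict(grouped)


# Invented entries with the same keys as final_list.
sample_final_list = [
    {"key": "kennedy", "source": "speech_b.wav", "time_start": 47.5, "text": "..."},
    {"key": "kennedy", "source": "speech_d.wav", "time_start": 125, "text": "..."},
    {"key": "nixon", "source": "speech_e.wav", "time_start": 3, "text": "..."},
]

for president, mentions in group_by_president(sample_final_list).items():
    for source, stamp in mentions:
        print(f"{president}: {source} at {stamp}")
```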