In [1]:
import os
import azure.cognitiveservices.speech as speechsdk

from utils import save_wav_from_mp4

In [None]:
import time

def get_text_transcription_from_video(mp4_file, audio_file=None):
    # Extract the audio track only when no WAV file was supplied.
    if not audio_file:
        audio_file = save_wav_from_mp4(mp4_file=mp4_file)

    # The speech config is needed no matter where the audio came from.
    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ.get('AZURE_SUBSCRIPTION_KEY'),
        region=os.environ.get('AZURE_SERVICE_REGION'),
    )
    audio_config = speechsdk.AudioConfig(filename=audio_file)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    all_results = []
    done = False

    def stop_cb(evt):
        # Fired on session_stopped or canceled: tell the wait loop to exit.
        nonlocal done
        done = True

    def handle_final_result(evt):
        # Collect the text of each finalized utterance.
        all_results.append(evt.result.text)

    speech_recognizer.recognized.connect(handle_final_result)
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Continuous recognition runs in the background, so block until it ends.
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(0.5)
    speech_recognizer.stop_continuous_recognition()

    transcription = " ".join(all_results)
    return transcription
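
Continuous recognition delivers its results through callbacks on a background thread, so the caller has to wait for the session to end before joining the text. One common way to wait is a `threading.Event`. The sketch below shows only that pattern: `FakeRecognizer` and `transcribe_with_event` are hypothetical names, and the stand-in class merely mimics callbacks fired from another thread. It is not part of the Azure SDK.

```python
import threading


class FakeRecognizer:
    """Hypothetical stand-in that fires callbacks from a worker thread."""

    def __init__(self, utterances):
        self._utterances = utterances
        self.on_recognized = None  # called once per finalized utterance
        self.on_stopped = None     # called once when the session ends

    def start(self):
        # Return immediately, like a non-blocking "start recognition" call.
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        for text in self._utterances:
            self.on_recognized(text)
        self.on_stopped()


def transcribe_with_event(recognizer, timeout=5.0):
    results = []
    done = threading.Event()

    recognizer.on_recognized = results.append
    recognizer.on_stopped = done.set

    recognizer.start()
    # Block until the stop callback fires, or give up after `timeout` seconds.
    if not done.wait(timeout=timeout):
        raise TimeoutError("recognition did not finish in time")
    return " ".join(results)


demo_text = transcribe_with_event(FakeRecognizer(["So now,", "what is the problem?"]))
```

Compared with checking a flag in a sleep loop, `Event.wait` returns as soon as the callback fires and allows a timeout, so a session that never stops cannot hang the notebook.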

In [25]:
with open("transcription.txt", "w") as fp:
    fp.write(transcription)

In [3]:
with open("transcription.txt", "r") as fp:
    transcription = fp.read()

In [7]:
transcription

" So now, now what is the problem with the few short prompting, right? We just discussed right now that few short prompting we might be able to achieve a lot of things. But when there is a multi step reasoning that is required, right? We just prompting it by saying, OK, so here is a question, here is the answer. Now I'll give you the question. You do the task. OK. That's what few short prompting kind of you know, let's believe it right now. This does not work when the reasoning that is required requires multiple steps, right? So now you're just saying, OK, give some explanation and then just saying answer is 11 between this question to this answer, there are so many steps and thoughts that are required. To generate this right. So now can we explicitly get the model to generate those thoughts and then answer the question rather than trying to jump to, you know, some number based on the context that is provided by the question, right. That's the key idea behind. This one chain of thought

In [26]:
from dotenv import load_dotenv
from pathlib import Path
load_dotenv()

True

In [27]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

In [28]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="For a given transcription from a video to be published online, give a very short pre-read for the viewers on what to expect from the video"),
    HumanMessage(content=f"TRANSCRIPTION: {transcription}\nPREREAD:"),
]

response = model.invoke(messages)

In [29]:
response.content

'In this video, we explore the concept of "chain of thought" prompting in language models, focusing on its importance for multi-step reasoning tasks. You\'ll learn how traditional prompting methods can fall short when complex reasoning is required, and how the chain of thought technique can help induce logical thinking in language models. We\'ll discuss the method\'s implementation, including examples that illustrate how to structure prompts effectively to improve the accuracy of model responses. Join us as we unpack this influential prompting technique and its applications in enhancing language model performance!'

In [32]:
messages = [
    SystemMessage(content="For a given transcription from the video published online, give a summary of the video for the viewers"),
    HumanMessage(content=f"TRANSCRIPTION: {transcription}\nSUMMARY:"),
]

response = model.invoke(messages)
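
The prompts in these cells share one shape: a system instruction, then the transcription followed by a label for the expected output. A small helper can build that pair so the label and newline are written in one place. This is a sketch under assumptions: `build_messages` is a hypothetical name, and it returns `(role, content)` tuples, which LangChain chat models should accept in `invoke` in place of `SystemMessage` and `HumanMessage` objects.

```python
def build_messages(instruction, transcription, label):
    """Return a system/human message pair as (role, content) tuples."""
    return [
        ("system", instruction),
        # The label (e.g. SUMMARY) goes on its own line after the transcription.
        ("human", f"TRANSCRIPTION: {transcription}\n{label}:"),
    ]


example = build_messages(
    "For a given transcription from the video published online, "
    "give a summary of the video for the viewers",
    "hello world",
    "SUMMARY",
)
```

With the model defined earlier, the call would then be `model.invoke(build_messages(instruction, transcription, "SUMMARY"))`.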

In [33]:
response.content

'In this video, the speaker discusses the concept of "chain of thought prompting" as a technique to enhance the performance of language models, particularly in tasks that require multi-step reasoning. They explain that while traditional few-shot prompting may work for straightforward queries, it often falls short when complex reasoning is needed. The key idea is to encourage models to articulate their thought processes step-by-step rather than jumping directly to answers.\n\nThe speaker highlights the importance of structuring prompts in a way that illustrates the reasoning process, thereby enabling the model to generate more accurate responses. By providing detailed examples that break down the problem into intermediate steps, the model can learn to mimic this logical thinking in its future outputs.\n\nThe video emphasizes that this approach not only improves the accuracy of answers but also enhances interpretability, allowing users to understand the reasoning behind the model\'s conc

In [34]:
messages = [
    SystemMessage(content="You are an expert teacher in AI. For a given transcription from a video, generate 10 multiple choice questions that cover the topics discussed in this session. The questions are to be of recall-from-content type"),
    HumanMessage(content=f"TRANSCRIPTION: {transcription}\nQUESTIONS:"),
]

response = model.invoke(messages)
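
The questions come back as free text. To reuse them, for example in a quiz tool, they can be parsed into a list of dictionaries. The sketch below assumes the reply is a numbered list with options written as `- A) ...`; the model is not guaranteed to keep that layout, and `parse_mcq` is a hypothetical helper name.

```python
import re


def parse_mcq(text):
    """Parse numbered questions with '- A) ...' options into a list of dicts."""
    questions = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        question = re.match(r"^(\d+)\.\s+(.*)$", line)
        option = re.match(r"^-\s*([A-D])\)\s+(.*)$", line)
        if question:
            questions.append({
                "number": int(question.group(1)),
                "question": question.group(2),
                "options": {},
            })
        elif option and questions:
            # Attach the option to the most recent question.
            questions[-1]["options"][option.group(1)] = option.group(2)
    return questions


sample = (
    "1. What is the key idea?\n"
    "   - A) First\n"
    "   - B) Second\n"
    "\n"
    "2. Another question?\n"
    "   - A) Yes\n"
)
parsed = parse_mcq(sample)
```

Lines that match neither pattern, such as blank lines, are skipped, so a reply with extra commentary still parses as long as the questions keep the assumed layout.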

In [35]:
print(response.content)

1. What is the main problem with few-shot prompting when it comes to multi-step reasoning?
   - A) It generates too many answers
   - B) It does not effectively guide the model through the necessary reasoning steps
   - C) It requires too much data
   - D) It is not applicable to language models
   
2. What is the key idea behind the chain of thought prompting method?
   - A) To provide straightforward answers without reasoning
   - B) To induce logical, step-by-step thinking in the model
   - C) To simplify the input questions
   - D) To minimize the number of examples used in prompting

3. How does a language model like LLM generate tokens?
   - A) By implementing logical reasoning
   - B) Based on a series of previous tokens and their probabilities
   - C) Through external computational tools
   - D) By guessing the answers

4. What does the implementation of chain of thought prompting aim to achieve?
   - A) Generate random outputs
   - B) Enable language models to generate a serie