# 1. Initializing the OpenAI Client

This section loads the API key from the environment and initializes the OpenAI client, which the rest of the notebook uses to call OpenAI APIs such as Whisper (speech-to-text) and TTS (text-to-speech).

In [1]:
import os
from openai import OpenAI

from dotenv import load_dotenv, find_dotenv

# Load environment variables from a .env file
_ = load_dotenv(find_dotenv()) 

# Initialize the OpenAI client with the API key
client = OpenAI(
    api_key=os.environ['OPENAI_API_KEY'],   # Retrieves API key from environment variables
)

## Key Components:

- **Environment Variables**: We use `dotenv` to load environment variables. This is a secure way to manage sensitive information like API keys. The `.env` file should contain your `OPENAI_API_KEY`.

- **OpenAI Client Initialization**: We create an instance of the `OpenAI` class from the `openai` package, passing the API key from the environment variables. This client will be used to make requests to OpenAI services.

> 💡 **Tip:** Always keep your API keys secure. Never hardcode them into your scripts. Using environment variables as shown here is a best practice.
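
As a small illustration of the tip above, the following sketch wraps the environment lookup in a helper so that a missing key produces a readable error instead of a bare `KeyError`. The name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or raise a descriptive error.

    `require_env` is a hypothetical helper used for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value
```

It could stand in for the direct lookup, for example `OpenAI(api_key=require_env("OPENAI_API_KEY"))`.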


# 2. Function to Transcribe Audio to Text

This function, `get_transcript`, takes the path of an audio file and uses OpenAI's Whisper model to transcribe the audio to text.

In [2]:
def get_transcript(file_path):
    # Open the audio file in binary read mode; the `with` block closes it afterwards
    with open(file_path, "rb") as audio_file:
        # Use the OpenAI Whisper model to transcribe the audio
        transcript = client.audio.transcriptions.create(
            model="whisper-1",           # Specifies the Whisper model to use
            file=audio_file,             # Passes the audio file to the API
            response_format="text"       # Requests the transcription in text format
        )

    # Return the transcription
    return transcript

## Key Points:

- **Opening the File**: The audio file is opened in binary read mode (`"rb"`) because the API expects the raw bytes of the audio file rather than decoded text.

- **Transcription Request**: The `client.audio.transcriptions.create` method is used to send the audio file to OpenAI's Whisper API for transcription.

- **Model Specification**: Here, `"whisper-1"` is specified as the model. Depending on your needs and OpenAI's offerings, you might use a different model version.

- **Returning the Transcript**: The function returns the transcription result, which can then be used for further processing or displayed in the notebook.

> 💡 **Note:**  Ensure that the audio file format and content are compatible with the Whisper API's requirements for accurate transcription.
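
One way to act on the note above is to validate the file before sending it. The sketch below is a hedged example: the helper name `check_audio_file` is hypothetical, and the extension list and 25 MB limit are assumptions based on OpenAI's published limits at the time of writing, so verify them against the current documentation.

```python
from pathlib import Path

# Assumed values; check them against the current OpenAI documentation.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024


def check_audio_file(file_path):
    """Return the path as a Path object if it looks acceptable, else raise.

    Hypothetical helper: it inspects only the extension and the file size,
    not the actual audio content.
    """
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such audio file: {path}")
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio extension: {path.suffix!r}")
    if path.stat().st_size > MAX_BYTES:
        raise ValueError(f"File is larger than {MAX_BYTES} bytes: {path}")
    return path
```

Calling `check_audio_file(file_path)` at the top of `get_transcript` would surface these problems before any network request is made.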

# 3. Creating an AI Tutor Conversation Chain with Langchain

This section demonstrates how to create a conversation chain for an AI tutor using the Langchain library, integrated with OpenAI's GPT model. The AI tutor is designed to simulate a conversational English teaching experience, where it actively participates in dialogues with a human (student) and provides corrections and suggestions in a conversational manner.

In [3]:
# Import necessary classes from the langchain library
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts.prompt import PromptTemplate
# Note: this rebinds the name `OpenAI`, which previously referred to the client
# class from the `openai` package. The `client` object created earlier is unaffected.
# Newer LangChain releases provide this class via `from langchain_openai import OpenAI`.
from langchain.llms import OpenAI

# Initialize the OpenAI model with a specified temperature.
# Temperature set to 0 for deterministic, consistent responses.
llm = OpenAI(
    temperature=0
)

# Create a conversation buffer memory to keep track of the conversation.
# The prefixes distinguish the AI tutor from the human user. They are given
# without a trailing colon because LangChain adds ": " after each prefix.
memory = ConversationBufferMemory(
    ai_prefix="AI Tutor",
    human_prefix="Human",
)

# Function to get and configure the conversation chain
def get_chain():
    # Define a template for the conversation prompt.
    # This template sets the context for the conversation and instructions for the AI.
    prompt_template = """
    The following is a friendly conversation between a human and an AI.
    The AI is a top-tier English tutor with years of experience.
    The AI is talking to a student who wants to practice speaking English. 
    The AI is to help the student practice speaking English by having a conversation. 

    The AI should feel free to correct the student's grammar and pronunciation and/or suggest different words or phrases to use whenever the AI feels needed.
    And when the AI corrects the student, the AI must start the sentence with "it is better to put it this way"
    But even when you correct the student, try to make a conversation first, and then correct the student

    Current conversation:
    {history}
    Human: {input}
    AI Tutor:"""

    # Create a PromptTemplate object with the defined prompt template.
    # This template includes variables for the conversation history and the latest human input.
    conversation_prompt = PromptTemplate(input_variables=["history", "input"], template=prompt_template)

    # Initialize the conversation chain.
    # This chain uses the defined prompt, the language model (llm), and the conversation memory.
    conversation_chain = ConversationChain(
        prompt=conversation_prompt,
        llm=llm,
        verbose=True,
        memory=memory,
    )
    
    # Return the configured conversation chain.
    return conversation_chain




## Key Components:

- **Library Imports**: The script begins by importing the necessary classes from `langchain`, which are essential for setting up the conversation chain and integrating with OpenAI's model. In newer LangChain releases the `OpenAI` class is provided by the separate `langchain_openai` package.

- **OpenAI Model Initialization**: The `OpenAI` class is instantiated with a specified `temperature` parameter. A temperature of 0 makes responses as repeatable as possible, which suits an educational context where predictable, accurate outputs are preferred.

- **Conversation Memory Setup**: A `ConversationBufferMemory` instance is created, defining prefixes to differentiate between the AI tutor's and the human's dialogues. This memory buffer helps in maintaining the context and flow of the conversation.

- **Conversation Chain Function**: The `get_chain` function is defined to configure and return the conversation chain. Within this function:

    - A `prompt_template` is defined, setting the context for the AI tutor's role and guidelines for the conversation.
    - A `PromptTemplate` object is created using the defined template. This object facilitates the incorporation of conversation history and the latest input into the AI's response generation.
    - Finally, a `ConversationChain` is instantiated, linking the prompt template, the language model (`llm`), and the conversation memory (`memory`). The `verbose` parameter is set to `True`, so the chain prints the fully formatted prompt on every call, which is useful for debugging and for seeing exactly what is sent to the model.
        - Feel free to set `verbose` to `False` if you do not need to see the prompts that LangChain sends to the model.


> **💡 Tip:** This function plays a key role in maintaining the flow of conversation, ensuring that the AI's responses are contextually relevant and pedagogically sound.
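
To make the role of the `{history}` and `{input}` placeholders concrete, here is a minimal sketch that fills a shortened copy of the prompt with plain `str.format`. It does not call LangChain. `render_prompt` and the `(speaker, text)` turn format are hypothetical stand-ins for what the memory and prompt template do together.

```python
# Shortened copy of the notebook's prompt; only the placeholders matter here.
PROMPT_TEMPLATE = """The AI is an English tutor talking to a student.

Current conversation:
{history}
Human: {input}
AI Tutor:"""


def render_prompt(turns, user_input):
    """Fill the template from a list of (speaker, text) pairs and a new input.

    Hypothetical stand-in for the buffer memory plus prompt template.
    """
    history = "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    return PROMPT_TEMPLATE.format(history=history, input=user_input)


turns = [("Human", "Hello!"), ("AI Tutor", "Hi! How was your day?")]
print(render_prompt(turns, "I goed to school."))
```

The printed text ends with `AI Tutor:`, which is the cue for the model to continue the dialogue in the tutor's voice.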

# 4. Generating AI Responses for Conversational Interaction

This section generates the AI tutor's replies. It passes each input transcript through the conversation chain established earlier and returns a context-aware response, which is the core of an interactive conversation experience such as a language-learning application or chatbot.

In [4]:
def get_gpt_response(transcript):
    # Talk to the AI Tutor via langchain 
    conversation = get_chain()
    answer = conversation.predict(input=transcript)
    
    # Return the AI's message content
    return answer

## Key Points:
- **Function Definition**: The `get_gpt_response` function is defined to handle the processing of input transcripts and generate responses using the AI tutor.

- **Conversation Chain Retrieval**: Inside the function, `get_chain()` is called to build the conversation chain with the conversation context and rules defined in the previous code block. A new chain object is created on every call, but the `memory` object is defined at module level, so the conversation history carries over between calls.

- **Generating the AI Response**: The chain's `predict` method generates a response from the AI based on the given transcript (input text). The call includes the conversation history and the tutor instructions from the prompt, so responses stay relevant to the ongoing dialogue.

- **Return Statement**: The function returns the generated answer, allowing it to be used elsewhere in the application, such as in a user interface for displaying the AI's responses.
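
In the notebook code, `get_chain()` runs on every request while `memory` lives at module level. The sketch below imitates that arrangement with hypothetical stand-ins (`FakeMemory`, `FakeChain`, `get_response`) that call no model, to show why history should still accumulate across calls.

```python
class FakeMemory:
    """Hypothetical stand-in for ConversationBufferMemory: stores turns in a list."""

    def __init__(self):
        self.turns = []


class FakeChain:
    """Hypothetical stand-in for ConversationChain; it does not call any model."""

    def __init__(self, memory):
        self.memory = memory

    def predict(self, input):
        # Number the reply by how many turns the shared memory already holds.
        reply = f"(reply #{len(self.memory.turns) + 1})"
        self.memory.turns.append((input, reply))
        return reply


shared_memory = FakeMemory()  # module-level, like `memory` in the notebook


def get_chain():
    # A new chain object each time, but always the same memory object.
    return FakeChain(shared_memory)


def get_response(transcript):
    return get_chain().predict(input=transcript)


first = get_response("Hello")
second = get_response("How are you?")
print(first, second, len(shared_memory.turns))
```

If the memory were created inside `get_chain()` instead, every call would start with an empty history.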

# 5. Function to Play AI Tutor's Response Using Text-to-Speech

This function, `play_gpt_response_with_tts`, converts the AI tutor's textual response into speech using Text-to-Speech (TTS) and plays it aloud for the user.

In [5]:
import os
import subprocess

from pydub import AudioSegment
from pydub.playback import play

# Path of the raw TTS output and of the ffmpeg-converted file used for playback
speech_file_path = "./speech_llm.wav"
fixed_speech_file_path = "./speech_llm_converted.wav"

def play_gpt_response_with_tts(gpt_response):
    # Generate speech from the GPT response using TTS

    response = client.audio.speech.create(
        model="tts-1",          # Specifies the TTS model to use
        voice="alloy",          # Chooses a specific voice for the TTS
        input=gpt_response      # The text input to be converted to speech
    )
    
    try:
        # Stream the audio to a file
        response.stream_to_file(speech_file_path)
    except AttributeError as e:
        print("Wrong method:", e)
    except FileNotFoundError as e:
        print("Wrong file path:", e)
    except Exception as e:
        print("Other errors:", e)

    # Convert the TTS output with ffmpeg into a WAV file that pydub can play
    try:
        subprocess.run(
            # -y overwrites a leftover output file instead of prompting
            ['ffmpeg', '-y', '-i', speech_file_path, fixed_speech_file_path],
            check=True,  # This will raise an error if ffmpeg fails
            stdout=subprocess.PIPE,  # Capture stdout
            stderr=subprocess.PIPE  # Capture stderr
        )
        print(f"Conversion successful: {fixed_speech_file_path}")
    except subprocess.CalledProcessError as e:
        print(f"Error occurred: {e.stderr.decode('utf-8')}")
        return  # Nothing to play if the conversion failed

    # Play the converted speech audio
    audio = AudioSegment.from_wav(fixed_speech_file_path)
    play(audio)

    # Remove the converted speech file to clean up
    os.remove(fixed_speech_file_path)

## Key Points:

- **TTS Conversion**: The `client.audio.speech.create` method from the OpenAI API is used to convert the AI's textual response into speech. The `tts-1` model and `alloy` voice are specified here, but these can be adjusted based on your preferences.

- **Temporary Audio File Handling**: The generated speech is streamed to `speech_llm.wav`, and `ffmpeg` then converts it to `speech_llm_converted.wav`, a WAV file that the playback library can read.

- **Audio Playback**: `pydub` loads the converted file with `AudioSegment.from_wav` and plays it, allowing the user to hear the AI's response.

- **Cleanup**: After playback, the converted file is removed to avoid clutter. The raw `speech_llm.wav` file is left in place and overwritten on the next call.

> **💡 Note:** This function bridges the gap between textual AI responses and auditory output, making the interaction more engaging and accessible, especially for auditory learners.
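
The function above writes to fixed file names in the working directory. As an alternative, the following sketch uses the standard library's `tempfile` module and a context manager so that the file is removed even if playback raises. The name `temporary_audio_path` is hypothetical and not part of the original notebook.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_audio_path(suffix=".wav"):
    """Yield the path of a fresh temporary file and delete it on exit.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # Only the path is needed; callers reopen or overwrite the file.
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Inside `with temporary_audio_path() as path:`, the TTS output could be written to `path` and played from there, and cleanup would no longer need an explicit `os.remove` call.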

# 6. Facilitating Dialogue with the AI Tutor Using Speech-to-Text and Text-to-Speech

The function `talk_to_gpt` orchestrates the process of converting user speech to text, obtaining a response from the AI tutor, and then converting this response back to speech.

In [6]:
def talk_to_gpt(file_path):
    # Transcribe user speech to text
    user_transcript = get_transcript(file_path)

    # Get the GPT tutor's response to the user's transcript
    # The chain's ConversationBufferMemory supplies the full conversation history as context
    gpt_response = get_gpt_response(user_transcript)
    
    # Play the GPT response using OpenAI's TTS API
    play_gpt_response_with_tts(gpt_response=gpt_response)

## Key Components:

- **Function Definition**: The `talk_to_gpt` function is designed to handle the complete cycle of user interaction with the AI tutor, from speech input to spoken response.

- **Speech-to-Text Conversion**: The function begins by transcribing user speech into text. The `get_transcript` function is called with the `file_path` of the user's speech recording, converting the speech into a text transcript.

- **AI Tutor Response Generation**: The text transcript is then passed to the `get_gpt_response` function. This function, as defined earlier, generates a context-aware response from the AI tutor, considering the ongoing conversation.

- **Text-to-Speech Conversion**: The AI's textual response is then converted back into speech. The `play_gpt_response_with_tts` function takes the AI's response and uses OpenAI's Text-to-Speech (TTS) API to play the response. This creates an auditory output that the user can listen to.

- **End-to-End Interaction**: By combining these steps, the function enables a fluid, conversational interaction between the user and the AI tutor. It transforms user speech into text, processes it through the AI, and returns a spoken response, thus completing the dialogue loop.

> **💡 Note:** This function is central to the user interaction, seamlessly integrating speech-to-text, AI response generation, and text-to-speech to simulate a natural conversation flow.
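
Because `talk_to_gpt` calls three network- or hardware-dependent functions, it is hard to try out in isolation. The sketch below shows one possible variation in which the three steps are passed in as arguments, so the flow can be exercised with simple stand-ins. `run_turn` and its parameter names are hypothetical, not part of the original notebook.

```python
def run_turn(file_path, transcribe, respond, speak):
    """Run one speech -> text -> reply -> speech cycle and return both texts.

    Hypothetical variant of `talk_to_gpt` with injectable steps.
    """
    transcript = transcribe(file_path)
    reply = respond(transcript)
    speak(reply)
    return transcript, reply


# Stand-ins that need no microphone, API key, or speakers.
spoken = []
result = run_turn(
    "speech_human.wav",
    transcribe=lambda path: "I goed to school.",
    respond=lambda text: f"You said: {text}",
    speak=spoken.append,
)
print(result)
```

With the notebook's real functions, the equivalent call would pass `get_transcript`, `get_gpt_response`, and `play_gpt_response_with_tts` as the three steps.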

# 7. Audio Recording Class for User Input

The `AudioRecorder` class encapsulates the functionality needed to record audio from the user, which can then be processed for speech-to-text conversion.

In [7]:
import threading
import sounddevice as sd
import numpy as np
import wavio

class AudioRecorder:
    def __init__(self):
        self.is_recording = False      # Flag to control recording state
        self.audio_data = []           # List to store audio frames
        self.fs = 44100                # Sample rate (in Hz)
        self.channels = 1              # Number of audio channels

    def start_recording(self):
        self.is_recording = True
        self.audio_data = []
        # Start recording in a separate thread
        threading.Thread(target=self.record).start()

    def stop_recording(self):
        self.is_recording = False      # Stop the recording

    def record(self):
        # Set up the audio input stream
        with sd.InputStream(samplerate=self.fs, channels=self.channels) as stream:
            while self.is_recording:
                data, _ = stream.read(1024)  # Read audio data from the input stream
                self.audio_data.append(data)  # Append data to the audio_data list

    def save(self, filename='speech_human.wav'):
        # Save the recorded audio to a file
        if self.audio_data:
            wav_data = np.concatenate(self.audio_data, axis=0)  # Concatenate all audio frames
            wavio.write(filename, wav_data, self.fs, sampwidth=2)  # Write to WAV file
            print("Recording saved to", filename)
            return filename
        else:
            print("No recording data to save.")


## Key Features:

- **Initialization**: Sets up initial variables like sample rate, channels, and recording state.

- **Start and Stop Recording**: Methods to control the start and stop of audio recording.

- **Multithreading for Recording**: Uses a separate thread to handle audio input, ensuring the main program remains responsive.

- **Audio Data Collection**: Continuously reads audio data from the microphone and stores it in a list.

- **Saving the Recording**: Concatenates the recorded audio frames and saves them as a WAV file. This file can then be used for further processing like speech-to-text.

> **💡 Note:** This class provides a foundational audio input mechanism, crucial for capturing the user's speech in real-time.
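
In the class above, `stop_recording` only flips a flag, so `save` can run while the recording thread is still reading its last chunk. A minimal sketch of one way to avoid that uses `threading.Event` and `Thread.join`. `ChunkCollector` and `read_chunk` are hypothetical names, with `read_chunk` standing in for `stream.read(1024)`.

```python
import threading


class ChunkCollector:
    """Collect chunks on a background thread until stop() is called.

    Hypothetical sketch: `read_chunk` is any callable that returns one chunk.
    """

    def __init__(self, read_chunk):
        self.read_chunk = read_chunk
        self.chunks = []
        self._stop_event = threading.Event()
        self._thread = None

    def start(self):
        self.chunks = []
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._stop_event.is_set():
            self.chunks.append(self.read_chunk())

    def stop(self):
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()  # Wait until the worker has appended its last chunk
            self._thread = None
```

After `stop()` returns, no further chunks are appended, so the collected data can be concatenated and saved safely.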

# 8. Interactive Interface for Audio Recording and Processing

This section of the code creates an interactive interface using IPython widgets to control the audio recording and initiate conversation with the AI tutor.

In [8]:
import ipywidgets as widgets
from IPython.display import display

# Initialize the audio recorder
recorder = AudioRecorder()

# Create buttons for starting and stopping the recording
start_button = widgets.Button(description="Start Recording")
stop_button = widgets.Button(description="Stop Recording")

def on_start_clicked(b):
    # Function to handle start button click
    recorder.start_recording()  # Start recording audio
    print("Recording started...")

def on_stop_clicked(b):
    # Function to handle stop button click
    recorder.stop_recording()  # Stop recording audio
    file_name = recorder.save()  # Save the recorded audio (returns None if nothing was recorded)
    if file_name is None:
        return  # Nothing to process
    print("Recording stopped and saved.")
    talk_to_gpt(file_name)  # Process the audio file through the AI tutor
    # os.remove(file_name)  # Uncomment to delete the recording after processing

# Assign the click event handlers to the buttons
start_button.on_click(on_start_clicked)
stop_button.on_click(on_stop_clicked)

# Display the buttons in the Jupyter Notebook interface
display(start_button, stop_button)


## Key Components:

- **Button Widgets**: Two buttons are created using `ipywidgets` for starting and stopping the audio recording.

- **Event Handlers**: Functions `on_start_clicked` and `on_stop_clicked` are defined to handle the respective button clicks.
    - `on_start_clicked` starts the audio recording.
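    - `on_stop_clicked` stops the recording, saves the audio, and processes it through the AI tutor (`talk_to_gpt`). The line that deletes the recording afterwards is commented out, so `speech_human.wav` stays on disk until the next recording overwrites it.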
    
- **Display Widgets**: The `display` function from `IPython.display` is used to render the buttons in the Jupyter Notebook.

> **💡 Note:** This interactive setup allows users to easily control the recording process and seamlessly initiate interaction with the AI tutor, enhancing the user experience in the Jupyter Notebook.