- [ ] Write down all the libraries that we need to install
- [ ] Add try/except blocks for exception handling when moving to a .py file

## Important Notes
### Packages to install :
- pip install sounddevice
- pip install numpy
- Install ffmpeg (needed for installing the local version of whisper)
    - On Ubuntu or Debian
      sudo apt update && sudo apt install ffmpeg
    - On Arch Linux
      sudo pacman -S ffmpeg
    - On macOS
      brew install ffmpeg
    - On Windows
      winget install ffmpeg
- pip install -U openai-whisper
- pip install anthropic
- pip install python-dotenv
- pip install openai
- pip install pyaudio
- pip install keyboard
- pip install scipy
- pip install joblib
- Installing PyTorch
  - Install CUDA Toolkit (https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html) -> Only needed for GPU acceleration
  - Install cuDNN (https://developer.nvidia.com/cudnn) -> Only needed for GPU acceleration
  - Install PyTorch with or without CUDA (https://pytorch.org/get-started/locally/)
- pip install PyMuPDF -> To read PDF files such as CVs
### Instructions :
- There needs to be a .env file containing the API keys. The imports cell loads it from the parent of this notebook's working directory.
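
The cells below read `OPENAI_API_KEY` from the environment, so a missing or empty entry in the .env file only surfaces once the first API call fails. As a rough sketch, a check like the following could report the problem right after `load_dotenv` runs. `missing_keys` is a hypothetical helper, and `ANTHROPIC_API_KEY` is an assumption about what `AnthropicWrapper` expects; adjust the names to match the actual .env file.

```python
import os

# OPENAI_API_KEY is read by the text-to-speech cell. ANTHROPIC_API_KEY is an
# assumption about what AnthropicWrapper needs and may be named differently.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_keys()` after the .env file has been loaded returns an empty list when every key is present, and the names of the absent keys otherwise.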

In [None]:
# Imports
import sounddevice as sd
import numpy as np
import scipy.io.wavfile as wav
import whisper
from joblib import load, dump
from AnthropicWrapper import ClaudeChatCV
from dotenv import load_dotenv, find_dotenv
import os
from openai import OpenAI
import pyaudio
import threading
import tkinter as tk
from threading import Thread
import time
import keyboard

load_dotenv(os.path.join(os.path.dirname(os.getcwd()), ".env"))

In [None]:
# --- Settings ---
FS = 44100               # Sampling frequency
THRESHOLD = 50          # Volume threshold for silence (adjust this)
SILENCE_DURATION = 1.5   # Seconds of silence before stopping (adjust this)
CHUNK_SIZE = 1024        # Process audio in chunks for efficiency
SPEAKING_SPEED = 1.1     # Speed of speaking

In [None]:
# --- Globals ---
pause_loop = False
end_loop = False
unixtime = str(int(time.time()))  # Unix timestamp in whole seconds, used as the session ID
print(unixtime)
human_audio_n = 0

In [None]:
# -- Init Directory --
session_directory = "interviews/" + unixtime
audio_directory = session_directory + "/audio/"
pdf_directory = session_directory + "/pdfs/"
joblib_directory = session_directory + "/joblib/"
# exist_ok=True makes this cell safe to re-run and recreates any missing subfolder
for directory in (audio_directory, pdf_directory, joblib_directory):
    os.makedirs(directory, exist_ok=True)

In [None]:
job_role = "RAG AI Engineer"
candidate_skill = "Entry-Level"
role_description = """
Permanent

London (Hybrid)

Salary - £50,000 - £75,000 p/a + benefits

My client are on the cutting edge of digital reinvention, helping clients reimagine how they serve their connected customers and operate enterprises. As an experienced AI Engineer, you'll play a pivotal role in their revolution. You'll leverage deep learning, neuro-linguistic programming (NLP), computer vision, chatbots, and robotics to enhance business outcomes and drive innovation. Join their multidisciplinary team to shape their AI strategy and showcase the potential of AI through early-stage solutions.

Tasks

1. Enhance Retrieval and Generation:
Create and manage RAG pipelines to improve information retrieval and content generation tasks.
1. LLMs Optimization:
Understand the nuances between prompting and training large language models (LLMs) to enhance model performance.
1. LLM Evaluation:
Evaluate different LLMs to find the best fit for specific use cases.
1. Model Efficiency:
Address speed, performance, and cost-related issues in model implementation.
1. Collaboration and Innovation:
Work closely with cross-functional teams to integrate AI solutions into production environments.
Stay informed about the latest advancements in AI and machine learning to continuously enhance our solutions.
Requirements

4+ years of hands-on Python development experience, especially with machine learning frameworks (e.g., TensorFlow, PyTorch).
Proven experience setting up and optimizing retrieval-augmented generation (RAG) pipelines.
Strong understanding of large language models (LLMs) and the differences between prompting and training.
Production-level experience with AWS services.
Hands-on experience testing and comparing different LLMs (OpenAI, Llama, Claude, etc.).
Familiarity with model speed and cost optimization challenges.
Excellent problem-solving skills and attention to detail.
Strong communication and teamwork abilities.
Benefits

Endless Learning and Growth: Explore boundless opportunities for personal and professional development in our dynamic, AI-driven startup.
Inclusive and Supportive Environment: Join a collaborative culture that prioritizes transparency, trust, and open dialogue among team members.
Generous Benefits: Enjoy comprehensive perks, including unlimited annual leave, birthday leave, and exciting team trips.
Impactful Work: Contribute to the financial industry by working with cutting-edge AI technologies that make a difference.
Please apply for this exciting role ASAP!!
"""

In [None]:
system_prompt = f"""
You are a skilled interviewer who is conducting an initial phone screening interview for a candidate for a {candidate_skill} {job_role} role to see if the candidate is at minimum somewhat qualified for the role and worth the time to be fully interviewed. The role and company description is copypasted from the job posting as follows: {role_description}. Parse through it to extract any information you feel is relevant.
Your job is to begin a friendly discussion with the candidate, and ask questions relevant to the {job_role} role, which may or may not be based on the interviewee's CV, which you have access to. Be sure to stick to this topic even if the candidate tries to steer the conversation elsewhere. If the candidate has other experience on his CV, you can ask about it, but keep it within the context of the {job_role} role.
After the candidate responds to each of your questions, you should not summarise or provide feedback on their responses. THIS POINT IS KEY! You should not summarise or provide feedback on their responses. You must keep your responses short and concise without reiterating what is good about the candidate's response or experience when they reply.
You can ask follow-up questions if you wish.
Once you have asked sufficient questions such that you deem the candidate is or isn't fitting for the role, end the interview by thanking the candidate for their time and informing them that they will receive word soon on the outcome of the screening interview. If the candidate does not seem fitting for the role, or if something feels off, such as the candidate being unconfident or very vague, feel free to end the interview early. There is no need to inform them of your opinion of their performance, as this will be evaluated later.
The candidate will begin the interview by greeting you. You are to greet them back, and begin the interview.
For this specific run, keep the interview to a maximum of 4 questions.
"""

In [None]:
# Initialising Conversation Model
chat_model_name = "claude-3-5-sonnet-20240620"
cv_path = "docs/cvs/cv-deb.pdf"

chat_model = ClaudeChatCV(chat_model_name, system_prompt, cv_path)

In [None]:
# Initialising whisper
# With no explicit device, whisper uses CUDA when it is available and the CPU otherwise
stt_model = whisper.load_model("medium")

In [None]:
# Initialising text2speech
tts_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def stream_tts(input_string):
    """Stream the synthesised speech for input_string to the speakers (blocking)."""
    p = pyaudio.PyAudio()
    # The "pcm" response format is raw 16-bit mono audio at 24 kHz. Opening the
    # output at a higher rate speeds up the speech (and raises its pitch).
    stream = p.open(format=pyaudio.paInt16,
                    channels=1,
                    rate=round(24_000 * SPEAKING_SPEED),
                    output=True)
    try:
        with tts_client.audio.speech.with_streaming_response.create(
            model="tts-1",
            voice="nova",
            input=input_string,
            response_format="pcm"
        ) as response:
            for chunk in response.iter_bytes(1024):
                stream.write(chunk)
    finally:
        # Release the audio device even if the request fails
        stream.stop_stream()
        stream.close()
        p.terminate()
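
`stream_tts` applies `SPEAKING_SPEED` by opening the output stream at a higher sample rate than the audio was generated at, so the same samples are consumed faster. The arithmetic can be sketched as below. `playback_rate` and `playback_seconds` are hypothetical helpers, and the 24 kHz, 16-bit mono format is an assumption about the raw PCM stream.

```python
BASE_RATE = 24_000      # Assumed sample rate of the raw PCM audio, in Hz
BYTES_PER_SAMPLE = 2    # Assumed 16-bit mono samples


def playback_rate(speed, base_rate=BASE_RATE):
    """Sample rate to open the output stream with for a given speed factor."""
    return round(base_rate * speed)


def playback_seconds(n_bytes, speed, base_rate=BASE_RATE):
    """How long n_bytes of PCM audio last when played at the adjusted rate."""
    n_samples = n_bytes // BYTES_PER_SAMPLE
    return n_samples / playback_rate(speed, base_rate)
```

With `SPEAKING_SPEED = 1.1`, `playback_rate(1.1)` gives 26400 Hz. Because nothing is time-stretched, the voice also sounds slightly higher-pitched as the factor grows.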

In [None]:
# Initialising speech2text without silence detection
def record_speech():
    global human_audio_n
    print("Recording... Speak now!")
    chunks = []  # Collect chunks in a list; np.append would copy the whole array each time

    with sd.InputStream(samplerate=FS, channels=1, dtype='int16') as stream:
        while True:
            chunk, overflowed = stream.read(CHUNK_SIZE)
            if overflowed:
                print("Warning: Input overflowed!")
            chunks.append(chunk)

            # Stop on pause, and also on end so the recording cannot run forever
            if pause_loop or end_loop:
                break

    audio_data = np.concatenate(chunks).ravel()  # Flatten to a 1-D mono signal

    human_audio_n += 1
    # No leading slash: audio_directory already ends with one
    wavstring = f"audio_{human_audio_n}_{unixtime}.wav"
    wav.write(audio_directory + wavstring, FS, audio_data)

    return wavstring


def keyboard_listener():
    """Listen for control keys: '+' resumes, '-' pauses, '*' ends the interview."""
    def on_key_press(event):
        global pause_loop
        global end_loop
        if event.name == "+":
            pause_loop = False
            print("Set to continue on next loop")
        elif event.name == "-":
            pause_loop = True
            print("Set to pause on next loop")
        elif event.name == "*":
            end_loop = True
            print("Set to end on next loop")

    keyboard.on_press(on_key_press)
    keyboard.wait('esc')

# Daemon thread, so the listener does not keep the process alive on its own
listener_thread = threading.Thread(target=keyboard_listener, daemon=True)
listener_thread.start()
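
`record_speech` stops only when a key or button sets a flag, so the `THRESHOLD` and `SILENCE_DURATION` settings are currently unused. As a minimal sketch of how they could drive automatic stopping, the generator below passes chunks through until enough consecutive quiet chunks have been seen. `chunk_rms` and `chunks_until_silence` are hypothetical helpers, and treating `THRESHOLD` as an RMS level is an assumption; the source only calls it a volume threshold.

```python
import math

FS = 44100               # Sampling frequency, as in the settings cell
THRESHOLD = 50           # Assumed to be an RMS level; quieter chunks count as silence
SILENCE_DURATION = 1.5   # Seconds of continuous silence before stopping
CHUNK_SIZE = 1024        # Samples per chunk


def chunk_rms(chunk):
    """Root-mean-square volume of one flat chunk of integer samples."""
    if len(chunk) == 0:
        return 0.0
    # int() avoids overflow when the samples are fixed-width (e.g. int16)
    return math.sqrt(sum(int(s) ** 2 for s in chunk) / len(chunk))


def chunks_until_silence(chunks, threshold=THRESHOLD,
                         silence_duration=SILENCE_DURATION,
                         fs=FS, chunk_size=CHUNK_SIZE):
    """Yield chunks until `silence_duration` seconds of quiet audio have passed."""
    silent_chunks_needed = math.ceil(silence_duration * fs / chunk_size)
    silent_run = 0
    for chunk in chunks:
        yield chunk
        # A loud chunk resets the run, so only continuous silence counts
        silent_run = silent_run + 1 if chunk_rms(chunk) < threshold else 0
        if silent_run >= silent_chunks_needed:
            break
```

With the settings above, recording would stop after 65 consecutive quiet chunks. Chunks read from `sd.InputStream` have shape `(CHUNK_SIZE, 1)`, so they would need flattening before being passed to `chunk_rms`.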

In [None]:
def create_ui():
    global pause_loop, end_loop
    
    def toggle_pause():
        global pause_loop
        pause_loop = not pause_loop
        pause_button.config(text="Stop" if not pause_loop else "Speak now",
                            bg="red" if not pause_loop else "green")

    def end_program():
        global end_loop
        end_loop = True
        root.quit()
        root.destroy()

    root = tk.Tk()
    root.title("Control Panel")
    root.geometry("300x200")
    root.configure(bg='#f0f0f0')

    frame = tk.Frame(root, bg='#f0f0f0')
    frame.pack(expand=True, fill='both', padx=20, pady=20)

    pause_button = tk.Button(frame, text="Stop", command=toggle_pause, 
                             bg="red", fg="white", font=("Arial", 12), 
                             width=10, height=2)
    pause_button.pack(pady=10)

    end_button = tk.Button(frame, text="End Program", command=end_program, 
                           bg="gray", fg="white", font=("Arial", 12), 
                           width=10, height=2)
    end_button.pack(pady=10)

    root.protocol("WM_DELETE_WINDOW", end_program)
    root.mainloop()

# Start the UI in a separate thread
ui_thread = Thread(target=create_ui)
ui_thread.start()

In [None]:
while True:
    if not pause_loop:
        # --- Record Speech ---
        time.sleep(0.1)
        print("Recording speech...")
        wav_file = record_speech()

        if end_loop:
            break

        # --- Speech to Text ---
        print("Converting speech to text...")
        text = stt_model.transcribe(audio_directory + wav_file, language="en")
        print("You said: ", text.get("text"))

        if end_loop:
            break

        # --- Chatbot ---
        print("Chatting...")
        response = chat_model.chat_with_history_doc(text.get("text"))

        print("Chatbot: ", response)

        # --- Text to Speech ---
        print("Converting text to speech...")
        stream_tts(response)

    else:
        time.sleep(0.1)
        if end_loop:
            break
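
The loop above lets any exception from recording, transcription, the chat model, or playback end the whole interview. For the planned move to a .py file, one possible shape for the try/except handling is sketched below. `run_turn` is a hypothetical function, and its four callables are stand-ins for `record_speech`, the Whisper transcription, the chat model call, and `stream_tts`.

```python
def run_turn(record, transcribe, chat, speak):
    """Run one interview turn and return (ok, detail).

    On success, detail is the chatbot response. On failure, detail names
    the stage that raised and carries the error message.
    """
    stage = "record"
    try:
        wav_file = record()
        stage = "transcribe"
        text = transcribe(wav_file)
        stage = "chat"
        response = chat(text)
        stage = "speak"
        speak(response)
    except Exception as exc:  # Broad on purpose: any stage failure skips the turn
        return False, f"{stage} failed: {exc}"
    return True, response
```

Returning the failing stage instead of re-raising lets the caller decide whether to retry the turn, skip it, or end the session and still save the conversation.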

In [None]:
conversation = chat_model.get_message_history()
dump(conversation, joblib_directory + "conversation.joblib")


print(conversation)

In [None]:
import re
from fpdf import FPDF

# Create a PDF object
pdf = FPDF()

# Set margins (in millimeters) before adding the page so they apply to the first page too
pdf.set_margins(left=10, top=20, right=10)
pdf.add_page()

# Set the default body font
pdf.set_font("Helvetica", size=12)

# Calculate usable width for text wrapping (accounting for margins)
usable_width = pdf.w - pdf.l_margin - pdf.r_margin

# Add conversation to the PDF
for turn in conversation[2:]:
    # Skip turns with empty roles or content
    if turn['role'] and turn['content']:
        # Subheading style for "User" and "Assistant"
        pdf.set_font("Helvetica", style="B", size=14)
        pdf.cell(0, 10, txt=f"{turn['role'].capitalize()}:", ln=True)

        # Content with regular font and correct indentation
        pdf.set_font("Helvetica", size=12)
        pdf.x = pdf.l_margin  # Reset x-coordinate to the left margin
        pdf.multi_cell(usable_width, 6, txt=turn['content'])
        pdf.ln(3)

# Save the PDF
pdf.output(pdf_directory + "conversation.pdf")
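
The built-in Helvetica font only covers Latin-1, so characters such as curly quotes in a model response may make the PDF export fail. A minimal sketch of a pre-processing step follows. `to_latin1` and `printable_turns` are hypothetical helpers, `skip=2` mirrors the slice in the cell above, and each turn is assumed to be a dict whose `content` is a plain string.

```python
def to_latin1(text, placeholder="?"):
    """Replace characters that Latin-1 cannot encode with `placeholder`."""
    # Latin-1 covers exactly the code points 0 to 255
    return "".join(ch if ord(ch) < 256 else placeholder for ch in text)


def printable_turns(conversation, skip=2):
    """Yield (heading, body) pairs for turns that have a role and content."""
    for turn in conversation[skip:]:
        if turn.get("role") and turn.get("content"):
            yield f"{turn['role'].capitalize()}:", to_latin1(turn["content"])
```

Each pair could then be written with the same `pdf.cell` and `pdf.multi_cell` calls as above. Replacing unsupported characters loses information, so loading a Unicode font would be the alternative if the transcript must be exact.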