<a href="https://colab.research.google.com/github/tarunku/open_llm/blob/main/Whisper_LLaMA_Audio_Processing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Summary
This notebook enables:
1. **Audio transcription** using OpenAI Whisper and an open-source alternative.
2. **Meeting minutes generation** using LLaMA with structured summaries.
3. **Efficient inference** using 4-bit quantization.

Together, OpenAI Whisper and Meta LLaMA turn a meeting recording into a transcript and then into structured minutes in Markdown format.



## Setup and Dependencies
The notebook installs and configures the necessary dependencies:

In [None]:
!pip uninstall -y fsspec
!pip install fsspec==2025.3.0
!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2
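
Because the install cell pins `fsspec` and `httpx`, it can help to confirm which versions actually ended up in the runtime. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of the notebook.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Pins copied from the install cell above
expected = {"fsspec": "2025.3.0", "httpx": "0.27.2"}
for name, pin in expected.items():
    found = installed_version(name)
    status = "ok" if found == pin else "mismatch"
    print(f"{name}: expected {pin}, found {found} ({status})")
```

If a pin shows as a mismatch, restarting the Colab runtime after installation is usually the first thing to try.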

### Library Imports
The following key libraries are used:
- `requests`: For making HTTP requests
- `torch`: For deep learning model inference
- `bitsandbytes`: Quantization support for LLaMA model
- `transformers`: For NLP and ASR models
- `openai`: OpenAI API integration
- `google.colab`: For Colab-specific functions
- `huggingface_hub`: Hugging Face authentication

In [None]:
# imports

import os
import requests
from IPython.display import Markdown, display, update_display
from openai import OpenAI
from google.colab import drive
from huggingface_hub import login
from google.colab import userdata
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig, AutoModelForSpeechSeq2Seq
import torch

## Authentication and Drive Setup
The notebook retrieves the Hugging Face and OpenAI tokens from Google Colab's secret storage, then mounts Google Drive to locate the audio file:


In [None]:
hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

# Sign in to OpenAI using Secrets in Colab

openai_api_key = userdata.get('OPENAI_API_KEY')
openai = OpenAI(api_key=openai_api_key)

drive.mount("/content/drive")
audio_filename = "/content/drive/MyDrive/__tmp/llms/Meeting Recording.mp3"
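
`userdata.get` is only available inside Colab. As a hedged sketch, a small wrapper can fall back to environment variables when the notebook runs elsewhere. The function name `get_secret` is an assumption introduced here for illustration.

```python
import os


def get_secret(name: str) -> str:
    """Look up a secret in Colab's Secrets, falling back to an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook
        pass
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} not found in Colab Secrets or the environment")
    return value
```

With this in place, `get_secret("HF_TOKEN")` and `get_secret("OPENAI_API_KEY")` could stand in for the direct `userdata.get` calls.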

## Model Selection and Configuration
The notebook sets up OpenAI Whisper for transcription and Meta's LLaMA model for text generation:

In [None]:
# Constants
AUDIO_MODEL = "whisper-1"
LLAMA = "meta-llama/Meta-Llama-3.1-8B-Instruct"

### Audio Transcription with OpenAI Whisper
The notebook reads the audio file and converts it to text using OpenAI Whisper:

In [None]:
# Transcribe the audio with the hosted OpenAI Whisper model.
# An open-source alternative is provided at the bottom of this notebook.

# Open the file in a context manager so the handle is closed after the upload
with open(audio_filename, "rb") as audio_file:
    transcription = openai.audio.transcriptions.create(model=AUDIO_MODEL, file=audio_file, response_format="text")
print(transcription)
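
Long meeting recordings can exceed the upload size the hosted transcription endpoint accepts. The sketch below checks the file before sending it. It assumes the 25 MB limit documented for the OpenAI transcription API at the time of writing; both `MAX_UPLOAD_BYTES` and `check_audio_file` are hypothetical names, not part of the notebook.

```python
import os

# Assumption: 25 MB upload limit for the hosted transcription endpoint
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_audio_file(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, raising if the file is missing or too large."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Audio file not found: {path}")
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; the assumed limit is {max_bytes}")
    return size
```

Calling `check_audio_file(audio_filename)` before the transcription request fails fast with a clear message. A file over the limit would need to be compressed or split first.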

## Meeting Minutes Generation
A system message and a user prompt instruct the model to turn the transcript into structured meeting minutes in Markdown format:


In [None]:
system_message = "You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown."
user_prompt = f"Below is an extract from the transcript of a CPTS Architecture Forum meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\n{transcription}"

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
  ]
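
To reuse the same prompt for other recordings, the message construction can be wrapped in a function. This is a minimal sketch; `build_messages` and its `meeting_name` parameter are hypothetical and simply parameterize the prompt text used above.

```python
SYSTEM_MESSAGE = (
    "You are an assistant that produces minutes of meetings from transcripts, "
    "with summary, key discussion points, takeaways and action items with owners, "
    "in markdown."
)


def build_messages(transcript: str, meeting_name: str = "CPTS Architecture Forum") -> list:
    """Build the chat messages (system + user) for a given transcript."""
    user_prompt = (
        f"Below is an extract from the transcript of a {meeting_name} meeting. "
        "Please write minutes in markdown, including a summary with attendees, "
        "location and date; discussion points; takeaways; and action items with owners.\n"
        f"{transcript}"
    )
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": user_prompt},
    ]
```

The returned list has the same shape as `messages`, so it can be passed to `tokenizer.apply_chat_template` unchanged.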


## Model Configuration for Summarization
The notebook configures 4-bit NF4 quantization for LLaMA, with double quantization enabled and bfloat16 used as the compute dtype:

In [None]:
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)
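
As a rough illustration of why 4-bit loading matters on a Colab GPU, the arithmetic below estimates weight storage only. It assumes a round figure of 8 billion parameters and ignores quantization constants, any layers kept in higher precision, activations, and the KV cache, so real usage will be higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 8e9  # assumption: round figure for an 8B-parameter model

print(f"bf16 : {weight_memory_gb(params, 16):.1f} GB")  # prints "bf16 : 16.0 GB"
print(f"4-bit: {weight_memory_gb(params, 4):.1f} GB")   # prints "4-bit: 4.0 GB"
```

By this estimate the quantized weights need roughly a quarter of the bf16 footprint.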

## LLaMA Tokenization and Inference
The LLaMA model is loaded and used to process the transcript:

In [None]:

tokenizer = AutoTokenizer.from_pretrained(LLAMA)
tokenizer.pad_token = tokenizer.eos_token
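# add_generation_prompt=True appends the assistant header so the model answers rather than continuing the user turn
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")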
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map="auto", quantization_config=quant_config)
outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)


In [None]:
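# Decode only the newly generated tokens, dropping the prompt and special tokens
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
display(Markdown(response))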

## Alternative Open-Source Whisper Implementation
The notebook also provides an alternative transcription approach using an open-source Whisper model:

In [None]:
from transformers import pipeline

# Run Whisper locally through the Hugging Face ASR pipeline
whisper_transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-medium")

# With return_timestamps=True the pipeline returns a dict with "text" and timestamped "chunks"
result = whisper_transcriber(audio_filename, return_timestamps=True)
transcription = result["text"]
print(transcription)
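
With `return_timestamps=True`, the pipeline result is a dict containing the full `text` plus a `chunks` list, where each chunk carries a `(start, end)` timestamp in seconds. The sketch below formats such chunks into timestamped lines, which can make the minutes easier to cross-check against the recording. The `sample` data is made up for illustration, and `format_chunks` is a hypothetical helper.

```python
def format_timestamp(seconds) -> str:
    """Format seconds as MM:SS; the pipeline may report None for an open-ended chunk."""
    if seconds is None:
        return "??:??"
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_chunks(result: dict) -> str:
    """Render the pipeline's chunks as one '[start - end] text' line per chunk."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{format_timestamp(start)} - {format_timestamp(end)}] {chunk['text'].strip()}")
    return "\n".join(lines)


# Made-up sample in the shape the ASR pipeline returns
sample = {
    "text": " Welcome everyone. Let's begin.",
    "chunks": [
        {"timestamp": (0.0, 2.5), "text": " Welcome everyone."},
        {"timestamp": (2.5, 65.0), "text": " Let's begin."},
    ],
}
print(format_chunks(sample))
# [00:00 - 00:02] Welcome everyone.
# [00:02 - 01:05] Let's begin.
```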



