<a href="https://colab.research.google.com/github/vilasha/ollama-sandbox/blob/master/Meeting_to_actions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Convert meeting recording (audio only) to minutes and actions

This Colab notebook implements an end-to-end pipeline for transcribing a meeting audio file and generating structured meeting minutes with action items using open-source models.

Key components:

1. Speech-to-Text Transcription
   - Loads an English audio file (MP3) from Google Drive.
   - Uses OpenAI's `whisper-small.en` via Hugging Face's `pipeline("automatic-speech-recognition")` in FP16 on GPU to produce a full transcript (timestamps are available but unused).

2. Meeting Minutes Generation
   - Prompts Meta's `Llama-3.2-3B-Instruct` (loaded with 4-bit quantization via BitsAndBytes for efficient GPU inference) to convert the raw transcript into structured markdown minutes.
   - The prompt requests a summary (including attendees, location, date), discussion points, takeaways, and action items with owners.
   - Applies a chat template, uses a `TextStreamer` for live output, and renders the final response as Markdown.
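
The timestamps mentioned in step 1 are not used by the notebook, but they are easy to put to work. As a minimal sketch, assuming the ASR pipeline result exposes a `"chunks"` list whose entries carry a `"timestamp"` tuple of `(start, end)` seconds and a `"text"` string, the helpers below turn those chunks into time-stamped lines. The names `format_timestamp`, `format_chunks`, and `sample_chunks` are hypothetical, and the sample data is made up to mimic that shape.

```python
from typing import Optional


def format_timestamp(seconds: Optional[float]) -> str:
    """Render seconds as MM:SS; a missing end time is shown as ??:??."""
    if seconds is None:
        return "??:??"
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_chunks(chunks: list) -> list:
    """Turn ASR chunks into '[start - end] text' lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        span = f"{format_timestamp(start)} - {format_timestamp(end)}"
        lines.append(f"[{span}] {chunk['text'].strip()}")
    return lines


# Hypothetical sample that mimics the assumed shape of result["chunks"]
sample_chunks = [
    {"timestamp": (0.0, 4.5), "text": " Good evening, everyone."},
    {"timestamp": (4.5, 75.2), "text": " Let's begin with roll call."},
    {"timestamp": (75.2, None), "text": " Thank you."},
]

timestamped_lines = format_chunks(sample_chunks)
print("\n".join(timestamped_lines))
```

In the notebook, the same function could be called on the `"chunks"` entry of the transcription result, and the resulting lines could be passed to the LLM in place of the plain transcript if time references in the minutes are wanted.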

Supporting setup:

- Mounts Google Drive.
- Logs in to Hugging Face with a stored token.
- Installs or upgrades bitsandbytes and accelerate for quantization support.

The workflow is lightweight, runs on free Colab resources (enabled by 4-bit quantization of the 3B model), and combines a strong ASR model with a compact instruction-tuned LLM for summarization and action-item extraction.
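
To get a feel for why 4-bit quantization helps, here is a rough back-of-the-envelope sketch. It assumes about 3.2 billion parameters for the 3B model (an approximation) and counts only the weights. It ignores quantization constants, layers that stay unquantized, activations, and the KV cache, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: roughly 3.2 billion parameters for the 3B model
NUM_PARAMS = 3.2e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"FP16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the weights shrink to about a quarter of their FP16 size, which leaves more GPU memory for the long transcript prompt and for the Whisper model that is still loaded.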

In [None]:
!pip install -q --upgrade bitsandbytes accelerate
import os

import torch
from google.colab import drive, userdata
from huggingface_hub import login
from IPython.display import Markdown, display
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextStreamer,
    pipeline,
)

# Define the models to use: Whisper small (English-only) from OpenAI for speech
# recognition, and Llama 3.2 3B Instruct from Meta for turning the transcript
# into meeting minutes and action items
WHISPER = "openai/whisper-small.en"
LLAMA = "meta-llama/Llama-3.2-3B-Instruct"

# Mount Google Drive so it is accessible like a local disk.
# On the first run of the notebook, a pop-up asks for permission to connect
# to Google Drive.
drive.mount("/content/drive")
# I saved the file at https://drive.google.com/file/d/1P86u__2pIJ9E8GAdw4RHFU0Ry_Y4LPZE/view?usp=sharing
# In my Google Drive it resides at /Colab Notebooks/src/council_010421_2022101V.mp3
# The file is from one of the HuggingFace datasets: https://huggingface.co/datasets/huuuyeah/MeetingBank_Audio/tree/main
audio_filename = "/content/drive/MyDrive/Colab Notebooks/src/council_010421_2022101V.mp3"

# Sign in to HuggingFace Hub
hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

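# Fail early with a clear message if the audio file is not at the expected path.
# The ASR pipeline below reads the file from its path, so no file handle is needed.
if not os.path.isfile(audio_filename):
    raise FileNotFoundError(f"Audio file not found: {audio_filename}")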

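# Speech recognition. return_timestamps=True lets Whisper transcribe audio
# longer than 30 seconds; only the plain text is used afterwards.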
pipe = pipeline(
    "automatic-speech-recognition",
    model=WHISPER,
    dtype=torch.float16,
    device='cuda',
    return_timestamps=True
)

result = pipe(audio_filename)
transcription = result["text"]
print(transcription)

# Build the chat prompt that turns the transcript into minutes and action items
system_message = """
You produce minutes of meetings from transcripts, with summary, key discussion points,
takeaways and action items with owners, in markdown format without code blocks.
"""

user_prompt = f"""
Below is the transcript of a meeting.
Please write minutes in markdown without code blocks, including:
- a summary with attendees, location and date
- discussion points
- takeaways
- action items with owners

Transcription:
{transcription}
"""

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt},
]

# Quantize the Llama model to 4 bits on load so it needs far less GPU memory
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)
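tokenizer = AutoTokenizer.from_pretrained(LLAMA)
tokenizer.pad_token = tokenizer.eos_token
# add_generation_prompt=True appends the assistant header so the model answers
# instead of continuing the user turn; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")
# Stream only the newly generated text, without the prompt or special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map="auto", quantization_config=quant_config)
outputs = model.generate(**inputs, max_new_tokens=2000, streamer=streamer)
# Decode only the tokens generated after the prompt, so the rendered Markdown
# contains the minutes rather than the system message and the transcript
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
display(Markdown(response))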