# **SUMMARIX: Our project lets a user upload a video (or provide a YouTube URL) and get a transcript and a summary of its content**

## **Installing required packages**

In [1]:
!pip install yt-dlp moviepy pydub openai-whisper transformers pytube

Collecting yt-dlp
  Using cached yt_dlp-2025.3.31-py3-none-any.whl.metadata (172 kB)
Collecting moviepy
  Downloading moviepy-2.1.2-py3-none-any.whl.metadata (6.9 kB)
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting openai-whisper
  Using cached openai-whisper-20240930.tar.gz (800 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting transformers
  Downloading transformers-4.50.3-py3-none-any.whl.metadata (39 kB)
Collecting pytube
  Downloading pytube-15.0.0-py3-none-any.whl.metadata (5.0 kB)
Collecting imageio_ffmpeg>=0.2.0 (from moviepy)
  Downloading imageio_ffmpeg-0.6.0-py3-none-macosx_11_0_arm64.whl.metadata (1.5 kB)
Collecting proglog<=1.0.0 (from moviepy)
  Downloading proglog-0.1.11-py3-none-any.whl.metadata (794 bytes)
Collecting torch (from openai-whisper)
  Downloading torch-2.6.0-cp312-none-macosx_11
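Whisper shells out to the `ffmpeg` command-line tool to decode audio, and yt-dlp needs it to merge separate video and audio streams, but `pip` does not install it (Colab ships with it, as the later `[debug] exe versions: ffmpeg 4.4.2` line shows). A minimal pre-flight check, assuming only that `ffmpeg` should be on `PATH`; the helper name is hypothetical:

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    # On typical builds the first line reads "ffmpeg version <x.y.z> ..."
    return result.stdout.splitlines()[0] if result.stdout else None


version = ffmpeg_version()
print(version or "ffmpeg not found: install it (e.g. `apt-get install ffmpeg` on Debian/Ubuntu)")
```

Running this before the heavier cells turns a confusing failure deep inside Whisper into an explicit message.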

## **Importing the necessary libraries and defining file paths and directories**

In [None]:
import os
import tempfile
import yt_dlp
import whisper
import torch
from transformers import pipeline
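# MoviePy 2.x removed the moviepy.editor module; fall back to it only on MoviePy 1.x
try:
    from moviepy import VideoFileClip
except ImportError:
    from moviepy.editor import VideoFileClip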
from google.colab import files

# Default file paths and directories (the functions below use temporary files instead)
VIDEO_FILE = "video.mp4"
AUDIO_FILE = "audio.mp3"
FRAMES_DIR = "frames"
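`from google.colab import files` only works inside Google Colab; anywhere else the import fails and the upload/download interface cannot run. One way to make that dependency explicit, sketched here as an assumption rather than part of the original notebook, is to detect Colab once and branch on a flag (`IN_COLAB` is a hypothetical name):

```python
# Hypothetical guard: the notebook imports google.colab unconditionally,
# which raises ImportError (ModuleNotFoundError) outside Colab.
try:
    from google.colab import files  # Colab-only upload/download helpers
    IN_COLAB = True
except ImportError:
    files = None
    IN_COLAB = False

print(f"Running in Colab: {IN_COLAB}")
```

With this flag, the interface cell could ask for a local file path via `input()` when `IN_COLAB` is `False` instead of calling `files.upload()`.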

## **Function to extract audio from video**

In [7]:
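def extract_audio_from_video(video_path):
    print("Extracting audio from video...")
    # mkstemp creates the file safely (tempfile.mktemp is deprecated and race-prone)
    fd, temp_audio_path = tempfile.mkstemp(suffix='.wav')
    os.close(fd)
    video = VideoFileClip(video_path)
    try:
        if video.audio is None:  # e.g. a silent screen recording
            raise ValueError(f"{video_path} has no audio track")
        # 16-bit PCM WAV, which Whisper (via ffmpeg) can read directly
        video.audio.write_audiofile(temp_audio_path, codec='pcm_s16le')
    finally:
        video.close()  # release the file reader even if writing fails
    return temp_audio_path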
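MoviePy opens the whole clip just to write its audio track. Because `ffmpeg` is already required (Whisper's own loader calls it and resamples to 16 kHz mono), an alternative sketch, not what the notebook does, calls it directly. The function names are hypothetical and `ffmpeg` is assumed to be on `PATH`:

```python
import subprocess


def build_extract_audio_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes a mono 16-bit PCM WAV at `sample_rate` Hz."""
    return [
        "ffmpeg", "-y",           # overwrite wav_path if it already exists
        "-i", video_path,
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample (Whisper works at 16 kHz internally)
        "-c:a", "pcm_s16le",      # same codec the MoviePy version writes
        wav_path,
    ]


def extract_audio_with_ffmpeg(video_path, wav_path):
    # check=True raises CalledProcessError if ffmpeg exits with an error
    # (for example when the input has no audio stream to write)
    subprocess.run(build_extract_audio_cmd(video_path, wav_path),
                   check=True, capture_output=True)
    return wav_path
```

Writing 16 kHz mono up front also makes the temporary WAV far smaller than MoviePy's default 44.1 kHz stereo output.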

## **Function to transcribe audio using Whisper**

In [8]:
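def transcribe_audio(audio_path, model_name="base"):
    print("Transcribing audio... (this may take a while depending on video length)")
    # "base" is fast but error-prone; larger checkpoints such as "small" or "medium" are generally more accurate
    model = whisper.load_model(model_name)
    # FP16 only works on GPU; requesting it on CPU makes Whisper warn and fall back to FP32
    result = model.transcribe(audio_path, fp16=torch.cuda.is_available())
    return result["text"]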
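`model.transcribe` returns more than `result["text"]`: the same dictionary has a `"segments"` list whose entries carry `"start"` and `"end"` (in seconds) and `"text"`. A small sketch that keeps those timings, which makes it easier to check the transcript against the video; the helper names and the example values are made up for illustration:

```python
def format_timestamp(seconds):
    """Format a duration in seconds as HH:MM:SS (fractions are truncated)."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timestamped_transcript(result):
    """Turn a Whisper result's "segments" into one "[start - end] text" line each."""
    lines = []
    for seg in result.get("segments", []):
        start, end = format_timestamp(seg["start"]), format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


# Hand-written stand-in for model.transcribe(...) output
example = {"text": " Hi. Welcome.", "segments": [
    {"start": 0.0, "end": 2.5, "text": " Hi."},
    {"start": 2.5, "end": 3725.0, "text": " Welcome."},
]}
print(timestamped_transcript(example))
```

Inside `transcribe_audio`, returning `timestamped_transcript(result)` instead of `result["text"]` would switch the saved transcript to this format.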

## **Function to summarize text using BART**

In [9]:
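import textwrap

def summarize_text(text, max_length=500, min_length=100, max_chunk_chars=1024):
    print("Generating summary...")
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    # BART's input limit is 1024 *tokens*; 1024-character chunks stay well under it.
    # textwrap.wrap breaks at whitespace, so words are not cut in half between chunks.
    chunks = textwrap.wrap(text, width=max_chunk_chars)
    # Share the length budget across chunks, with floors (60/20 tokens, chosen
    # arbitrarily) so long transcripts don't produce near-empty per-chunk summaries.
    n = max(len(chunks), 1)
    per_max = max(max_length // n, 60)
    per_min = min(max(min_length // n, 20), per_max // 2)
    summaries = []
    for chunk in chunks:
        if len(chunk) > 100:  # skip a short trailing fragment that is too small to summarize
            result = summarizer(chunk, max_length=per_max, min_length=per_min, truncation=True)
            summaries.append(result[0]['summary_text'])
    return " ".join(summaries)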
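Slicing the transcript at fixed character offsets can cut a sentence in half, so BART ends up summarizing fragments. A sentence-aware alternative packs whole sentences into chunks of at most 1024 characters. This is a sketch: the regex is a simple heuristic, not a real sentence tokenizer, and `chunk_by_sentences` is a hypothetical name:

```python
import re


def chunk_by_sentences(text, max_chars=1024):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars still becomes its own (oversized) chunk.
    """
    # Naive split after ".", "!" or "?" followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "First sentence here. Second one! Third? Fourth sentence."
print(chunk_by_sentences(text, max_chars=40))
```

The returned list can stand in for `chunks` inside `summarize_text`; the per-chunk length budget stays the same.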

## **Function to process video: extract audio, transcribe, and summarize**

In [10]:
def process_video(video_path):
    audio_path = extract_audio_from_video(video_path)
    try:
        transcript = transcribe_audio(audio_path)
    finally:
        os.remove(audio_path)  # delete the temporary WAV even if transcription fails
    print("\n===== FULL TRANSCRIPT =====")
    print(transcript)
    summary = summarize_text(transcript)
    print("\n===== VIDEO SUMMARY =====")
    print(summary)
    return transcript, summary
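`process_video` hard-wires its three stages, so checking it needs real models and a real video. A sketch of the same flow with the stages passed in as functions (a design choice, not the notebook's) lets stub stages exercise the control flow and the temporary-file cleanup in milliseconds. All names here are hypothetical:

```python
import os
import tempfile


def run_pipeline(video_path, extract, transcribe, summarize):
    """Run extract -> transcribe -> summarize, always deleting the intermediate audio file."""
    audio_path = extract(video_path)
    try:
        transcript = transcribe(audio_path)
    finally:
        if os.path.exists(audio_path):
            os.remove(audio_path)
    return transcript, summarize(transcript)


# Stub stages standing in for MoviePy, Whisper and BART
created = []

def fake_extract(video_path):
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    created.append(path)
    return path

def fake_transcribe(audio_path):
    return f"transcript of {os.path.basename(audio_path)}"

def fake_summarize(text):
    return text.upper()[:10]


transcript, summary = run_pipeline("video.mp4", fake_extract, fake_transcribe, fake_summarize)
print(transcript, "|", summary)
```

In the notebook, `run_pipeline(path, extract_audio_from_video, transcribe_audio, summarize_text)` would behave like `process_video` minus the printing.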

## **Function to download and process a YouTube video using yt-dlp**


In [11]:
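import shutil

def process_youtube_video(youtube_url):
    print("Downloading YouTube video...")
    temp_dir = tempfile.mkdtemp()
    ydl_opts = {
        'outtmpl': os.path.join(temp_dir, '%(id)s.%(ext)s'),
        # Best MP4 video + M4A audio merged into one MP4 (needs ffmpeg); else a single MP4 file
        'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]',
        'merge_output_format': 'mp4',
        'verbose': True,  # prints yt-dlp's [debug] log; set to False for quieter output
    }
    # Use a Netscape-format cookies file only if one exists (e.g. for age-restricted videos)
    if os.path.exists('cookies.txt'):
        ydl_opts['cookiefile'] = 'cookies.txt'

    try:
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            info = ydl.extract_info(youtube_url, download=True)
            video_path = ydl.prepare_filename(info)
        return process_video(video_path)
    except Exception as e:
        print(f"Error processing YouTube video: {e}")
        return None, None
    finally:
        # Remove the download directory and everything in it, on success or failure
        shutil.rmtree(temp_dir, ignore_errors=True)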
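The recorded run below downloads a 114 MiB video stream (format 401) only to discard the picture. Since the pipeline needs just the audio, a sketch assuming yt-dlp's `FFmpegExtractAudio` postprocessor and `ffmpeg` on `PATH` fetches the audio stream alone and produces a WAV that Whisper can read directly. The function names are hypothetical:

```python
import os
import tempfile


def audio_only_ydl_opts(out_dir):
    """yt-dlp options that fetch only the best audio stream and convert it to WAV."""
    return {
        "outtmpl": os.path.join(out_dir, "%(id)s.%(ext)s"),
        "format": "bestaudio/best",  # audio-only stream if available
        "postprocessors": [{
            "key": "FFmpegExtractAudio",  # runs ffmpeg after the download
            "preferredcodec": "wav",
        }],
        "quiet": True,
    }


def download_youtube_audio(url, out_dir=None):
    import yt_dlp  # imported here so the options helper works without yt-dlp installed
    out_dir = out_dir or tempfile.mkdtemp()
    with yt_dlp.YoutubeDL(audio_only_ydl_opts(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        # The postprocessor replaces the downloaded file's extension with .wav
        return os.path.splitext(ydl.prepare_filename(info))[0] + ".wav"
```

Usage: `transcript = transcribe_audio(download_youtube_audio(url))` skips both the video download and the MoviePy step.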

## **User Interface for video processing**

In [14]:
def save_and_download(transcript, summary, prefix=""):
    # Save both outputs as UTF-8 text, then trigger browser downloads (Colab only)
    paths = {f"{prefix}transcript.txt": transcript, f"{prefix}summary.txt": summary}
    for path, text in paths.items():
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    print("\nTranscript and summary have been saved to text files.")
    for path in paths:
        files.download(path)

print("===== VIDEO SUMMARIZER =====")
print("Please choose an option:")
print("1. Upload a video file")
print("2. Use a YouTube URL")
option = input("Enter your choice (1 or 2): ").strip()

if option == "1":
    print("Upload your video file:")
    uploaded = files.upload()  # saves the uploaded files to the current working directory
    for filename in uploaded.keys():
        print(f"Processing {filename}...")
        transcript, summary = process_video(filename)
        # Prefix outputs with the video name so several uploads don't overwrite each other
        save_and_download(transcript, summary, prefix=f"{os.path.splitext(filename)[0]}_")
elif option == "2":
    youtube_url = input("Enter the YouTube URL: ").strip()
    transcript, summary = process_youtube_video(youtube_url)
    if transcript:
        save_and_download(transcript, summary)
else:
    print("Invalid option selected.")


===== VIDEO SUMMARIZER =====
Please choose an option:
1. Upload a video file
2. Use a YouTube URL
Enter your choice (1 or 2): 2
Enter the YouTube URL: https://www.youtube.com/watch?v=K27diMbCsuw&t=126s


[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2025.02.19 from yt-dlp/yt-dlp [4985a4041] (pip) API
[debug] params: {'outtmpl': '/tmp/tmp7h7h8jlz/%(id)s.%(ext)s', 'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]', 'merge_output_format': 'mp4', 'cookiefile': 'cookies.txt', 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.54 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.11.11 (CPython x86_64 64bit) - Linux-6.1.85+-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: certifi-2025.01.31, requests-2.32.3, secretstorage-3.3.1, sqlite3-

Downloading YouTube video...
[youtube] Extracting URL: https://www.youtube.com/watch?v=K27diMbCsuw&t=126s
[youtube] K27diMbCsuw: Downloading webpage
[youtube] K27diMbCsuw: Downloading tv client config
[youtube] K27diMbCsuw: Downloading player 74e4bb46
[youtube] K27diMbCsuw: Downloading tv player API JSON
[youtube] K27diMbCsuw: Downloading ios player API JSON


[debug] [youtube] Extracting signature function js_74e4bb46_102
[debug] Loading youtube-sigfuncs.js_74e4bb46_102 from cache
[debug] Loading youtube-nsig.74e4bb46 from cache
[debug] [youtube] Decrypted nsig rOSB8Sf4y1gpaR7zM6d => DANnCZtFfPBRnA
[debug] [youtube] Extracting signature function js_74e4bb46_106
[debug] Loading youtube-sigfuncs.js_74e4bb46_106 from cache
[debug] Loading youtube-nsig.74e4bb46 from cache
[debug] [youtube] Decrypted nsig 527GrVWL93xHHriiXIA => 675HQ87CGbLcvQ
[debug] [youtube] K27diMbCsuw: ios client https formats require a GVS PO Token which was not provided. They will be skipped as they may yield HTTP Error 403. You can manually pass a GVS PO Token for this client with --extractor-args "youtube:po_token=ios.gvs+XXX". For more information, refer to  https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide . To enable these broken formats anyway, pass --extractor-args "youtube:formats=missing_pot"


[youtube] K27diMbCsuw: Downloading m3u8 information


[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec, channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id


[info] K27diMbCsuw: Downloading 1 format(s): 401+140


[debug] Invoking http downloader on "https://rr4---sn-qxo7rn7k.googlevideo.com/videoplayback?expire=1741862099&ei=c2DSZ-HWD8zcybgP0Jr4qAw&ip=34.28.238.135&id=o-AKvMlwAGvV_GwmDz8jf8d5cANXLL2TyreglEXDU79a1J&itag=401&aitags=133%2C134%2C135%2C136%2C160%2C242%2C243%2C244%2C247%2C278%2C298%2C299%2C302%2C303%2C308%2C315%2C394%2C395%2C396%2C397%2C398%2C399%2C400%2C401&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&met=1741840499%2C&mh=cA&mm=31%2C26&mn=sn-qxo7rn7k%2Csn-a5mlrnlz&ms=au%2Conr&mv=m&mvi=4&pl=16&rms=au%2Cau&initcwndbps=9918750&bui=AUWDL3yGJ5_7ew6l6TPWtl1mtNcwLFJ8G8eXrYKHX_rHgcT9D2MhraSzRzzrSqKyAQWfaEIlQH9W6h_8&vprv=1&svpuc=1&mime=video%2Fmp4&ns=wZT7eIcSyIs5iov6UsIXae4Q&rqh=1&gir=yes&clen=120012187&dur=257.799&lmt=1741453808560980&mt=1741840125&fvip=5&keepalive=yes&lmw=1&fexp=51358317%2C51411872&c=TVHTML5&sefc=1&txp=5532534&n=675HQ87CGbLcvQ&sparams=expire%2Cei%2Cip%2Cid%2Caitags%2Csource%2Crequiressl%2Cxpc%2Cbui%2Cvprv%2Csvpuc%2Cmime%2Cns%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&lsparams=me

[download] Destination: /tmp/tmp7h7h8jlz/K27diMbCsuw.f401.mp4
[download] 100% of  114.45MiB in 00:00:02 at 43.76MiB/s  


[debug] Invoking http downloader on "https://rr4---sn-qxo7rn7k.googlevideo.com/videoplayback?expire=1741862099&ei=c2DSZ-HWD8zcybgP0Jr4qAw&ip=34.28.238.135&id=o-AKvMlwAGvV_GwmDz8jf8d5cANXLL2TyreglEXDU79a1J&itag=140&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&met=1741840499%2C&mh=cA&mm=31%2C26&mn=sn-qxo7rn7k%2Csn-a5mlrnlz&ms=au%2Conr&mv=m&mvi=4&pl=16&rms=au%2Cau&initcwndbps=9918750&bui=AUWDL3yGJ5_7ew6l6TPWtl1mtNcwLFJ8G8eXrYKHX_rHgcT9D2MhraSzRzzrSqKyAQWfaEIlQH9W6h_8&vprv=1&svpuc=1&mime=audio%2Fmp4&ns=wZT7eIcSyIs5iov6UsIXae4Q&rqh=1&gir=yes&clen=4175039&dur=257.927&lmt=1741448326473558&mt=1741840125&fvip=5&keepalive=yes&lmw=1&fexp=51358317%2C51411872&c=TVHTML5&sefc=1&txp=5532534&n=675HQ87CGbLcvQ&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cbui%2Cvprv%2Csvpuc%2Cmime%2Cns%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&lsparams=met%2Cmh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Crms%2Cinitcwndbps&lsig=AFVRHeAwRQIhAM1JkXDl58iZ1CvY1XpqkhC82C_aj-UG24vsILy5Fkf-AiBeaAM5YpbIIxWFv6nGzoIMYBwO5Zxnnk

[download] Destination: /tmp/tmp7h7h8jlz/K27diMbCsuw.f140.m4a
[download] 100% of    3.98MiB in 00:00:00 at 37.38MiB/s  
[Merger] Merging formats into "/tmp/tmp7h7h8jlz/K27diMbCsuw.mp4"


[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i file:/tmp/tmp7h7h8jlz/K27diMbCsuw.f401.mp4 -i file:/tmp/tmp7h7h8jlz/K27diMbCsuw.f140.m4a -c copy -map 0:v:0 -map 1:a:0 -movflags +faststart file:/tmp/tmp7h7h8jlz/K27diMbCsuw.temp.mp4


Deleting original file /tmp/tmp7h7h8jlz/K27diMbCsuw.f140.m4a (pass -k to keep)
Deleting original file /tmp/tmp7h7h8jlz/K27diMbCsuw.f401.mp4 (pass -k to keep)
Extracting audio from video...
MoviePy - Writing audio in /tmp/tmp2ar36gjx.wav
MoviePy - Done.
Transcribing audio... (this may take a while depending on video length)


  checkpoint = torch.load(fp, map_location=device)

===== FULL TRANSCRIPT =====
 Hi, I'm Pete from Manus AI. For the past year, we'll be quietly building what we believe is the next evolution in AI. And today, we're launching an early preview of Manus, the first general AI agent. This isn't just another chap-out of workflow. It's a truly autonomous agent that bridges the gap between conception and execution, what other AI stops at generating ideas, Manus delivers results. We see it as the next paradigm of human machine collaboration, and potentially, it glims into AI. Now, let me show you Manus in action across three completely different tasks. Let's start with an easy one. In this example, we ask Manus to help screen resumes. I've just said Manus is if file containing 10 resume documents. Since each Manus agent has its own computer, it can work like a human. First, I'm zipping the file, then browsing through each resume page by page, and recording important information to documents. Manus works asynchronously in the file, which means 

Device set to use cpu



===== VIDEO SUMMARY =====
Manus is the first general AI agent. It's a truly autonomous agent that bridges the gap between conception and execution. We see it as the next paradigm of human machine collaboration. For complex tasks, Manus first creeps down and creates a to-do list. Manus has his own knowledge of memory, so it can teach Manus that the next time it handles a similar task, it will deliver a spreadsheet. Manus can access authoritative data sources through APIs. After validating the required data, Manus begins writing code for data analysis and visualization. For Manus, coding isn't necessary to the goal, but rather a universal tool. Manus operates as a multi-agent system powered by several distinct models. Later this year, we're going to open source some of these models, specifically post-trained for Manus.

Transcript and summary have been saved to text files.
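The interface cell depends on `input()` and Colab's `files` helpers, so it cannot run unattended. A sketch of an equivalent command-line interface using the standard-library `argparse`; the flag names are hypothetical, and a script would then call `process_video(args.file)` or `process_youtube_video(args.url)`:

```python
import argparse


def parse_args(argv=None):
    """Command-line alternative to the input() menu, for running outside Colab."""
    parser = argparse.ArgumentParser(description="Transcribe and summarize a video.")
    source = parser.add_mutually_exclusive_group(required=True)  # exactly one source
    source.add_argument("--file", help="path to a local video file")
    source.add_argument("--url", help="YouTube URL to download and process")
    parser.add_argument("--out-dir", default=".", help="where to write transcript.txt and summary.txt")
    return parser.parse_args(argv)


# Passing an explicit list avoids parsing the notebook kernel's own sys.argv
args = parse_args(["--url", "https://www.youtube.com/watch?v=K27diMbCsuw"])
print(args.url, args.file, args.out_dir)
```

The mutually exclusive group mirrors the menu's "1 or 2" choice: argparse rejects a command line that passes both `--file` and `--url`, or neither.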
