## Fact Checker
I will receive a chunk of text (a statement) and need to verify whether it is fact or perception.


In [None]:
!pip install yt-dlp faster-whisper
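`to_wav_16k_mono` further down calls the `ffmpeg` command-line tool through `subprocess`, and `pip install` does not provide that binary. A quick standard-library check before running the pipeline can head off a confusing `FileNotFoundError`. `ffmpeg_available` is a hypothetical helper name, not part of any library.

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None

if not ffmpeg_available():
    # In this case subprocess.run(["ffmpeg", ...]) would raise FileNotFoundError
    print("ffmpeg not found on PATH; install it (e.g. via your OS package manager) before converting audio.")
```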

In [1]:
import os
import subprocess
import tempfile
from pathlib import Path

from yt_dlp import YoutubeDL
from faster_whisper import WhisperModel

In [2]:
def download_youtube_audio(youtube_url: str, out_dir: Path) -> Path:
    """
    Uses yt-dlp to download best audio as m4a/webm/opus, returns path to the audio file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # %(ext)s will be chosen automatically by yt-dlp
    out_tmpl = str(out_dir / "%(title).200B.%(ext)s")

    ydl_opts = {
        "format": "bestaudio/best",
        "outtmpl": out_tmpl,
        "quiet": True,
        "noplaylist": True,
        # If you want direct audio extraction & re-encode to mp3 via ffmpeg:
        # "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3", "preferredquality": "192"}],
    }

    with YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(youtube_url, download=True)
        filename = ydl.prepare_filename(info)  # actual downloaded file (likely .m4a or .webm)
        # If a postprocessor runs, extension could change; otherwise keep as-is.
        return Path(filename)

In [3]:
def to_wav_16k_mono(src_audio: Path, out_dir: Path) -> Path:
    """
    Converts audio to 16 kHz mono WAV via ffmpeg, the sample rate Whisper models expect.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    wav_path = out_dir / (src_audio.stem + ".16k.wav")
    cmd = [
        "ffmpeg", "-y",
        "-i", str(src_audio),
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz
        "-vn",
        str(wav_path),
    ]
    subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return wav_path
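Because `to_wav_16k_mono` captures ffmpeg's output rather than showing it, it can be useful to confirm the converted file really has the intended format. A minimal sketch using only the standard-library `wave` module, assuming ffmpeg's default WAV encoding of 16-bit PCM; `is_wav_16k_mono` is a hypothetical helper.

```python
import tempfile
import wave
from pathlib import Path

def is_wav_16k_mono(path: Path) -> bool:
    """Return True if `path` is a 16 kHz, single-channel, 16-bit PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        return (
            wf.getframerate() == 16000   # sample rate in Hz
            and wf.getnchannels() == 1   # mono
            and wf.getsampwidth() == 2   # 2 bytes per sample = 16-bit PCM
        )

# Self-contained demo: write one second of silence in the target format, then check it
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "silence.wav"
    with wave.open(str(demo), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(b"\x00\x00" * 16000)
    print(is_wav_16k_mono(demo))  # True
```

In the pipeline itself this could be a one-line `assert is_wav_16k_mono(wav)` right after the conversion step.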

In [4]:
def transcribe_with_faster_whisper(wav_path: Path, model_size: str = "small") -> str:
    """
    Transcribes with faster-whisper (CTranslate2). Choose model_size: tiny/base/small/medium/large-v3.
    """
    model = WhisperModel(model_size, device="auto", compute_type="auto")
    # `segments` is a lazy generator: decoding actually runs while we iterate over it below
    segments, _info = model.transcribe(str(wav_path), vad_filter=True)
    text = "".join(seg.text for seg in segments)
    return text.strip()
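For fact checking it can help to keep *where* in the video each statement was made, not only the joined text. faster-whisper segments expose `start` and `end` (in seconds) and `text`; the sketch below formats them as timestamped lines. `FakeSegment` is a hypothetical stand-in so the example runs without downloading a model, and `fmt_ts` / `segments_to_timestamped_lines` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class FakeSegment:
    """Hypothetical stand-in exposing the same start/end/text attributes as a faster-whisper segment."""
    start: float  # seconds from the beginning of the audio
    end: float
    text: str

def fmt_ts(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions of a second are truncated)."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

def segments_to_timestamped_lines(segments: Iterable) -> List[str]:
    """Render each segment as '[start - end] text'."""
    return [f"[{fmt_ts(seg.start)} - {fmt_ts(seg.end)}] {seg.text.strip()}" for seg in segments]

demo = [FakeSegment(0.0, 4.2, " First segment."), FakeSegment(4.2, 65.0, " Second segment.")]
for line in segments_to_timestamped_lines(demo):
    print(line)
# [00:00:00 - 00:00:04] First segment.
# [00:00:04 - 00:01:05] Second segment.
```

With the real model you would pass the `segments` returned by `model.transcribe(...)`, wrapping them in `list(...)` first if you also need the plain joined text.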

In [5]:
def youtube_to_text(youtube_url: str) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        tmpdir = Path(tmp)
        raw = download_youtube_audio(youtube_url, tmpdir)
        wav = to_wav_16k_mono(raw, tmpdir)
        return transcribe_with_faster_whisper(wav)
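`youtube_to_text` returns one long block of text, while the goal stated at the top is to judge individual statements. A naive first pass, assuming each sentence is roughly one statement, is to split on sentence-ending punctuation and drop very short fragments. This regex approach is a simplification (it will mis-split abbreviations such as "e.g." or "U.S."), and `split_into_statements` / `min_words` are hypothetical names.

```python
import re
from typing import List

def split_into_statements(text: str, min_words: int = 3) -> List[str]:
    """Naively split text into sentences and keep those with at least `min_words` words."""
    # Split wherever '.', '!' or '?' is followed by whitespace (the punctuation stays with its sentence)
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if len(p.split()) >= min_words]

print(split_into_statements("Ocean currents affect weather. Warm water rises. Yes."))
# ['Ocean currents affect weather.', 'Warm water rises.']
```

Applied to the notebook, this would be `statements = split_into_statements(youtube_to_text(url))`.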

In [6]:
url = "https://www.youtube.com/watch?v=UuGrBhK2c7U"
print(youtube_to_text(url))

The ocean conveyor belt and the Gulfstream. Ocean currents have a direct influence on our lives. They determine our weather, our climate and much more. The ocean currents and wind systems transport heat from the equator to the poles and operate like a large engine for the global climate. In the oceans there are numerous currents. The so-called ocean conveyor belt is very important for our climate. This term describes a combination of currents that result in four of the five global oceans exchanging water with each other. They form a worldwide circulation system. The conveyor belt is also called the thermohaline circulation, with thermo referring to the temperature and haline to the salt content of the water. Both determine the density of the water. While the masses of water may be moved in part by wind, primarily the different densities of the global oceans are responsible for their movement. Warm water has a lower density and rises, while cold water sinks. The water's density also inc

In [7]:
!pip install youtube-transcript-api

Collecting youtube-transcript-api
  Obtaining dependency information for youtube-transcript-api from https://files.pythonhosted.org/packages/41/92/3d1a580f0efcad926f45876cf6cb92b2c260e84ae75dae5463bbf38f92e7/youtube_transcript_api-1.2.2-py3-none-any.whl.metadata
  Downloading youtube_transcript_api-1.2.2-py3-none-any.whl.metadata (24 kB)
Downloading youtube_transcript_api-1.2.2-py3-none-any.whl (485 kB)
Installing collected packages: youtube-transcript-api
Successfully installed youtube-transcript-api-1.2.2


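The next cell passes a bare video ID to `youtube-transcript-api`, whereas the Whisper pipeline above takes a full URL. A small standard-library helper can derive one from the other. It recognizes only the `watch?v=`, `youtu.be/`, `/shorts/` and `/embed/` forms (an assumption, not an exhaustive list of YouTube URL formats), needs Python 3.9+ for `str.removeprefix`, and `extract_video_id` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

def extract_video_id(url: str) -> Optional[str]:
    """Return the YouTube video ID from a few common URL forms, or None if the form is not recognized."""
    parsed = urlparse(url)
    # Normalize the host: drop "www." or "m." so desktop and mobile links both match
    host = parsed.netloc.lower().removeprefix("www.").removeprefix("m.")
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        return parsed.path.strip("/").split("/")[0] or None
    if host == "youtube.com":
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith(("/shorts/", "/embed/")):
            # https://www.youtube.com/shorts/<id> or /embed/<id>
            return parsed.path.split("/")[2] or None
    return None

print(extract_video_id("https://www.youtube.com/watch?v=UuGrBhK2c7U"))  # UuGrBhK2c7U
```

With it, the transcript API call could take the same URL as the Whisper path: `YouTubeTranscriptApi().fetch(extract_video_id(url))`.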
In [19]:
from youtube_transcript_api import YouTubeTranscriptApi

video_id = "7hBMbQ9de1g"
transcript = YouTubeTranscriptApi().fetch(video_id)

# Join all snippet texts into one string, separated by single spaces
transcript_text = " ".join(snippet.text for snippet in transcript).strip()

# Print the result
print("Full transcript:")
print(transcript_text)

Full transcript:
- Most of the deadliest heat waves to date have been dry heat waves. But as our climate warms
a possibly even deadlier heat is on the rise. - It doesn't matter how breezy it is, it doesn't matter how
much shade you're under. It doesn't matter how much
water you're drinking. - These are extreme humid
heat waves, often referred to as wet-bulb events. And these are, in my opinion, one of the scariest byproducts
of climate change. They've been relatively rare historically, but a new study shows that
by two degrees Celsius of global warming, many of the most populated
places on Earth are likely to experience them. And the only way to protect ourselves from
these dangerous wet-bulb temperatures is to cool our environment. But what happens if that's not possible? - The time of year that always has the highest
stress on the grid is always just the hottest day of the year because so many people are using AC and so then the AC is liable to go out. - What happens if there's
anoth