In [18]:
config

{'system_prompt': 'You are a skilled Thai financial editor and summarizer.\nYour task is to read auto-generated Thai YouTube subtitles of a company\'s \'Opportunity Day\' presentation in the Thai stock market.\n\nThese subtitles may have:\n- Incorrect spacing or misspelled words.\n- Extra timestamps in seconds (e.g., "12.20s: ...").\n\nInstructions:\n1. Ignore timestamps. Only use the spoken text.\n2. Correct any obvious spelling or spacing mistakes that affect meaning.\n3. If you cannot confidently fix unclear text, mark it as [ไม่ชัดเจน].\n4. Produce a detailed Thai summary with sections:\n   • Company overview or updates\n   • Financial performance (รายได้ กำไร อัตราการเติบโต)\n   • Future plans, expansions, or investments\n   • Risk factors or challenges\n   • Management Q&A or analyst Q&A\n   • Specific numbers, dates, or KPIs\n5. Organize the summary in clear sections with bullet points.\n6. End with a 1-2 line insight about the company\'s situation and outlook.\nWrite in formal 

In [1]:
import requests
from bs4 import BeautifulSoup

url = "https://www.youtube.com/live/OBxvdajvyyM?si=SwDKNkg5wzqoS4uR"

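# Fetch the video page; fail fast on network or HTTP errors
response = requests.get(url, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# The page <title> has the form "<video title> - YouTube"
title = soup.title.string.replace(" - YouTube", "").strip()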

In [2]:
import re

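# Sanitize the video title for use in a file name:
# keep only digits, ASCII letters, Thai characters, and dots
def clean_title(text):
    return re.sub(r'[^0-9a-zA-Z\u0E00-\u0E7F\.]', '', text)
title = clean_title(text=title)

# Extract the 11-character video ID from the URL
match = re.search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", url)
if match:
    video_id = match.group(1)
else:
    # Without this, video_id would be undefined in the following cells
    raise ValueError(f"Could not extract a video ID from {url!r}")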
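
The regex above accepts any 11-character run that follows `v=` or a slash, which works for this URL but can pick up an unrelated path segment on other links. A stricter alternative, sketched here under the assumption that only `watch`, `live`, `embed`, `shorts`, and `youtu.be` links need to be handled, parses the URL first. The helper name `extract_video_id` is hypothetical and not part of the original notebook.

```python
import re
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    """Return the 11-character video ID from a few common YouTube URL shapes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = None

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Classic links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Path-style links: /live/<id>, /embed/<id>, /shorts/<id>
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"live", "embed", "shorts"}:
                candidate = parts[1]

    if not candidate or not re.fullmatch(r"[0-9A-Za-z_-]{11}", candidate):
        raise ValueError(f"Could not find a video ID in {url!r}")
    return candidate
```

With this helper, `video_id = extract_video_id(url)` could stand in for the regex search, and an unsupported link raises an error instead of leaving `video_id` unset.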

In [3]:
from youtube_transcript_api import YouTubeTranscriptApi

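# Fetch transcript (auto-captions or uploaded), preferring Thai, then English.
# Note: get_transcript() is the older static interface of youtube-transcript-api;
# newer releases may require an instance call such as YouTubeTranscriptApi().fetch(...),
# so check the installed version if this line fails.
transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['th', 'en'])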

# Optionally, save to file
file_name = f'{title}_{video_id}'
with open(f"{file_name}_subtitle.txt", "w", encoding="utf-8") as f:
    for entry in transcript:
        f.write(f"{entry['start']:.2f}s: {entry['text']}\n")
        # f.write(f"{entry['text']}")


In [4]:
title = file_name.split('_')[0]

In [5]:
# Read file content
with open(f"{file_name}_subtitle.txt", "r", encoding="utf-8") as f:
    text_content = f.read()
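
The saved file prefixes every line with a timestamp such as `12.20s: `, and the system prompt then asks the model to ignore those prefixes. A possible alternative is to remove them before building the prompt, which also shortens the input. The sketch below assumes the exact prefix format written by the save step above; `strip_timestamps` is a hypothetical helper name.

```python
import re

# Matches the "<seconds>s: " prefix written when the transcript was saved,
# for example "12.20s: ".
_TIMESTAMP_PREFIX = re.compile(r"^\d+(?:\.\d+)?s:\s?")

def strip_timestamps(subtitle_text: str) -> str:
    """Drop the leading timestamp from each line and skip empty lines."""
    cleaned_lines = []
    for line in subtitle_text.splitlines():
        cleaned = _TIMESTAMP_PREFIX.sub("", line).strip()
        if cleaned:
            cleaned_lines.append(cleaned)
    # Lines are rejoined with newlines rather than spaces, because Thai text
    # does not use spaces as word separators.
    return "\n".join(cleaned_lines)
```

Calling `text_content = strip_timestamps(text_content)` right after reading the file would pass only the spoken text to the model.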

In [19]:
# import ollama

# user_prompt = f"""
# You are an expert summarizer.
# Your task is to summarize the given YouTube video subtitle into clear, structured topics with as much important detail as possible.

# **Video Title:** {title}

# **Subtitle Text (may include timestamps):**
# {text_content}

# **Instructions:**
# - Ignore timestamps if present.
# - Identify all main topics or sections discussed in the video.
# - For each topic, write a clear **topic heading** and detailed bullet points (include sub-points if needed).
# - Remove filler words and repetitions.
# - Combine related lines into clear, complete thoughts.
# - Keep the style clear and easy to read.
# - Cover ALL important points in the subtitle — do not skip parts.
# - Finish with a short **Conclusion** section that summarizes the overall message in 2–4 sentences.
# - ⚠️ **Important:** DO NOT add any introduction, opening phrase, or explanation before the first topic. Start the output IMMEDIATELY with the first topic heading.
# - The first line must be EXACTLY in this format:
#   **Topic 1: [Title]**
# - Never write any other sentence before that line.

# **Format Example:**

# **Topic 1: Investment Strategy**
# - Point 1
# - Point 2
#   - Sub-point

# **Topic 2: Earnings Per Share (EPS)**
# - Point 1
# ...

# **Conclusion**
# Short final summary.
# """

# system_prompt = """
# You are a helpful, precise summarizer.
# Your role is to read YouTube subtitles and produce an accurate, detailed summary with clear topic headings and bullet points.
# Ignore timestamps and filler words.
# NEVER write an introduction or any opening sentence. Only output the formatted summary.
# Always start DIRECTLY with the first topic heading: **Topic 1: [Title]**
# Your summaries must be detailed, well-organized, and cover all key points.
# Always end with a short conclusion.
# """

# response = ollama.chat(
#     model="llama3",
#     messages=[
#         {"role": "system", "content": system_prompt},
#         {"role": "user", "content": user_prompt}
#     ]
# )

# summary = response["message"]["content"]

# print(summary)

In [26]:
import yaml

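# Load the prompt config. A separate variable name is used so that file_name
# (the transcript/summary base name set earlier) is not overwritten.
config_path = 'youtube_config.yaml'
with open(config_path, "r", encoding="utf-8") as file:
    config = yaml.safe_load(file)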
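
The next cell reads `config['system_prompt']` and `config['user_prompt']`, so a missing or empty key only surfaces when the prompt is built. A small check right after loading can report the problem earlier. This is a minimal sketch; `validate_config` and `REQUIRED_KEYS` are hypothetical names, and the only assumption about the YAML file is that it provides those two keys.

```python
REQUIRED_KEYS = ("system_prompt", "user_prompt")

def validate_config(config) -> None:
    """Raise a descriptive error if the loaded config cannot be used for prompting."""
    # yaml.safe_load returns None for an empty file, so check the type first.
    if not isinstance(config, dict):
        raise TypeError(f"Expected a mapping at the top level, got {type(config).__name__}")
    missing = [key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]
    if missing:
        raise ValueError(f"Config is missing or has empty values for: {', '.join(missing)}")
```

Running `validate_config(config)` after the load step fails fast with a readable message instead of a bare `KeyError` later on.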

In [27]:
import ollama

system_prompt = f"""
{config['system_prompt']}
"""

user_prompt = f"""
{config['user_prompt']}
{text_content}
"""

response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
)

summary = response["message"]["content"]

print(summary)

This is a Thai subtitle transcript from an Opportunity Day presentation for a Thai stock, specifically VGI Public Company Limited (VGI). The transcript appears to be a Q&A session with the company's executives.

Here are some key points mentioned in the transcript:

1. Financial performance: VGI reported a positive financial performance, with cash reserves of around 20 billion baht.
2. Business outlook: The company is optimistic about its future prospects, with plans to invest in digital services and expand its business operations.
3. Sustainability: VGI emphasized its commitment to sustainability, including reducing carbon emissions and promoting environmentally friendly practices.
4. Turtle Extra: The company mentioned plans to open new "Turtle Extra" stores, which will offer a wider range of products and services.
5. Virtual Bank: VGI discussed its virtual bank project, which aims to provide financial services online.
6. Plan B, C, D: The company mentioned three contingency plans (P
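
The summary above is short and in English even though the system prompt asks for a detailed Thai summary. One possible cause, which the notebook does not confirm, is that the full transcript is longer than the model's context window, so part of the input is dropped. A common workaround is to split the transcript into smaller pieces and summarize each one separately. The sketch below shows only the splitting step; `chunk_lines` and the default `max_chars` value are hypothetical, and a suitable limit depends on the model and how it is configured.

```python
from typing import List

def chunk_lines(text: str, max_chars: int = 6000) -> List[str]:
    """Group whole lines into chunks of at most max_chars characters.

    A single line longer than max_chars is kept intact as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in text.splitlines():
        # +1 accounts for the newline used when the lines are rejoined
        if current and size + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each element of `chunk_lines(text_content)` could then be sent through the same `ollama.chat` call, with the partial summaries combined in a final pass.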

Next step: take the points from this summary and search Google to research them further.

In [7]:
summary_text = response["message"]["content"]

# Save to a .txt file
with open(f"{file_name}_summary.txt", "w", encoding="utf-8") as file:
    file.write(summary_text)