
feat: add Threads Trending Now as content source via Playwright scraping #3

Merged

thaitien280401-stack merged 2 commits into MAIN from copilot/fetch-trending-now-articles on Apr 7, 2026

Conversation


Copilot AI commented Apr 7, 2026

The app currently only fetches posts from a specific user's Threads profile via the official API. This adds support for sourcing content from the Trending Now section on Threads' search page instead.

The official Threads API has no trending/search endpoint, so this uses Playwright (already a dependency) to scrape threads.net/search.

Changes

  • threads/trending.py (new) — Playwright-based scraper that:

    • Navigates to the Threads search page in a single browser session
    • Extracts trending topic links, visits each topic page, scrapes thread posts
    • scrape_thread_replies() fetches replies from thread pages (API only supports authenticated user's own threads)
  • threads/threads_client.py — Added a _get_trending_content() helper, wired into get_threads_posts() via a new source config check:

    • Falls back to user threads automatically if scraping fails
    • Extracted a _MAX_TITLE_LENGTH constant to replace the repeated magic number 200

  • utils/.config.template.toml — Added source option to [threads.thread]

Usage

[threads.thread]
source = "trending"   # or "user" (default, existing behavior)

Notes

Web scraping is inherently fragile — if Threads changes their DOM structure, the selectors in trending.py will need updating. The automatic fallback to user-based sourcing ensures the pipeline doesn't break silently.
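The scraping flow described above can be sketched roughly as follows. The URL pattern, selector, and helper names are illustrative assumptions, not the actual trending.py code; the link filtering is factored into a pure function so it can run without a browser:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.threads.net"

def filter_topic_links(hrefs):
    """Keep de-duplicated search-result links that look like trending
    topic pages. The '/search?q=' pattern is an assumption about how
    topic links appear in the DOM."""
    seen, topics = set(), []
    for href in hrefs:
        if not href or "/search?q=" not in href:
            continue
        url = urljoin(BASE_URL, href)
        if url not in seen:
            seen.add(url)
            topics.append(url)
    return topics

def scrape_trending_topics():
    """Open threads.net/search in a single browser session and
    collect candidate trending-topic URLs."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(f"{BASE_URL}/search", wait_until="networkidle")
        hrefs = [a.get_attribute("href") for a in page.query_selector_all("a")]
        browser.close()
    return filter_topic_links(hrefs)

if __name__ == "__main__":
    for url in scrape_trending_topics():
        print(url)
```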

Copilot AI and others added 2 commits April 7, 2026 17:44
- Create threads/trending.py: Playwright-based scraper for Threads
  trending topics and thread replies
- Modify threads/threads_client.py: add source config check, integrate
  trending scraper with fallback to user threads
- Update .config.template.toml: add source option (user/trending)

Agent-Logs-Url: https://github.com/thaitien280401-stack/RedditVideoMakerBot/sessions/01a85c1b-5157-4723-80f1-ca726e410a39

Co-authored-by: thaitien280401-stack <271128961+thaitien280401-stack@users.noreply.github.com>
…r setup

Agent-Logs-Url: https://github.com/thaitien280401-stack/RedditVideoMakerBot/sessions/01a85c1b-5157-4723-80f1-ca726e410a39

Co-authored-by: thaitien280401-stack <271128961+thaitien280401-stack@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Adds a new Threads content source that scrapes “Trending now” via Playwright, enabling the pipeline to generate videos from trending topics when the official Threads API can’t provide search/trending data.

Changes:

  • Added a Playwright-based trending scraper (threads/trending.py) that extracts trending topics, threads, and replies.
  • Wired trending sourcing into get_threads_posts() via a new [threads.thread].source config option, with fallback to user-based sourcing on scrape failure.
  • Replaced repeated magic 200 title slicing with _MAX_TITLE_LENGTH.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

  • utils/.config.template.toml — Introduces [threads.thread].source to select between user and trending.
  • threads/trending.py — New Playwright scraper for trending topics, thread posts, and replies.
  • threads/threads_client.py — Adds trending integration with fallback, and centralizes the title length constant.


Comment thread threads/trending.py
Comment on lines +119 to +137
container = link_handle.evaluate_handle(
    """el => {
        let node = el;
        for (let i = 0; i < 10; i++) {
            node = node.parentElement;
            if (!node) return el.parentElement || el;
            const text = node.innerText || '';
            if (text.length > 30 && (
                node.getAttribute('role') === 'article' ||
                node.tagName === 'ARTICLE' ||
                node.dataset && node.dataset.testid
            )) {
                return node;
            }
        }
        return el.parentElement ? el.parentElement.parentElement || el.parentElement : el;
    }"""
)
raw = container.inner_text().strip() if container else ""

Copilot AI Apr 7, 2026


_get_post_text() uses evaluate_handle() but then calls container.inner_text(). In Playwright for Python, evaluate_handle() returns a JSHandle, which is not guaranteed to have an inner_text() method; the handle needs to be converted via container_handle.as_element() (which may return None) before reading text, or the code should use link_handle.evaluate() to return the string directly. Otherwise the scraper will fail at runtime whenever the handle is not an ElementHandle.
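The second option can be sketched like this: have the JS walk return the text itself via evaluate(), so no JSHandle-to-ElementHandle conversion is needed. The function name and walk depth mirror the diff but are illustrative:

```python
def get_post_text(link_handle):
    """Return the surrounding post container's text for a trending link.

    evaluate() serializes the JS return value, so a plain Python string
    comes back instead of a JSHandle that may not be an ElementHandle.
    """
    raw = link_handle.evaluate(
        """el => {
            let node = el;
            for (let i = 0; i < 10; i++) {
                node = node.parentElement;
                if (!node) break;
                const text = node.innerText || '';
                if (text.length > 30 && (
                    node.getAttribute('role') === 'article' ||
                    node.tagName === 'ARTICLE' ||
                    (node.dataset && node.dataset.testid)
                )) {
                    return text;
                }
            }
            return el.parentElement ? el.parentElement.innerText || '' : '';
        }"""
    )
    return (raw or "").strip()
```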

Comment thread threads/threads_client.py
Comment on lines +445 to +456
if source == "trending" and not POST_ID:
    content = _get_trending_content(
        max_comment_length=max_comment_length,
        min_comment_length=min_comment_length,
    )
    if content is not None:
        return content
    # Fallback: if trending fails, continue with user threads
    print_substep(
        "⚠️ Trending unavailable, falling back to user threads...",
        style="bold yellow",
    )

Copilot AI Apr 7, 2026


The source == "trending" branch can return content even when too few replies were scraped (possibly zero). Downstream code (TTS/engine_wrapper and final_video) assumes at least one comment in comment mode, so the zero-replies case will leave comment_0.png/mp3 missing and break the pipeline. min_comments should be passed into _get_trending_content(), and when not in storymode it should only return content when len(comments) >= min_comments; otherwise it should return None so the flow falls back to user threads.
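A minimal guard along those lines, as a sketch; the function and parameter names are assumptions based on the diff, and the real check would live inside _get_trending_content():

```python
def accept_trending_content(content, storymode, min_comments):
    """Return content only if it can feed the downstream pipeline.

    In comment mode the video render expects at least min_comments
    scraped replies; otherwise return None to signal the caller to
    fall back to user threads.
    """
    if content is None:
        return None
    if not storymode and len(content.get("comments", [])) < min_comments:
        return None
    return content
```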

Comment thread threads/threads_client.py
Comment on lines +351 to +368
sanitised = sanitize_text(reply_text)
if not sanitised or sanitised.strip() == "":
    continue

if len(reply_text) > max_comment_length:
    continue
if len(reply_text) < min_comment_length:
    continue

content["comments"].append(
    {
        "comment_body": reply_text,
        "comment_url": "",
        "comment_id": re.sub(
            r"[^\w\s-]", "", f"trending_reply_{idx}"
        ),
        "comment_author": f"@{reply_username}",
    }

Copilot AI Apr 7, 2026


In _get_trending_content(), the variable sanitised = sanitize_text(reply_text) is only used for the emptiness check, while comment_body stores the unsanitized reply_text. If sanitize_text() strips emoji and special characters so TTS doesn't fail, storing the raw text means later steps (add_periods/process_text) will still encounter unwanted characters. comment_body should use sanitised (or the code should explain clearly why keeping the raw text is safe).
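A sketch of the suggested change, with a stand-in sanitize_text so it runs standalone (the project's real sanitizer presumably strips characters that break TTS, but its exact behavior is an assumption here):

```python
import re

def sanitize_text(text):
    # Stand-in for the project's sanitizer: keep only word characters,
    # whitespace, and basic punctuation.
    return re.sub(r"[^\w\s.,!?'-]", "", text).strip()

def build_comment(reply_text, reply_username, idx,
                  min_comment_length, max_comment_length):
    """Return a comment dict using the sanitized body, or None to skip."""
    sanitised = sanitize_text(reply_text)
    if not sanitised:
        return None
    if not (min_comment_length <= len(sanitised) <= max_comment_length):
        return None
    return {
        "comment_body": sanitised,  # store sanitized text, not the raw reply
        "comment_url": "",
        "comment_id": f"trending_reply_{idx}",
        "comment_author": f"@{reply_username}",
    }
```

Filtering on the sanitized length (rather than the raw length) also keeps the length limits consistent with what TTS actually receives.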

Comment thread threads/threads_client.py
Comment on lines +287 to +295
if not text or _contains_blocked_words(text):
    continue
title_candidate = text[:_MAX_TITLE_LENGTH]
if is_title_used(title_candidate):
    print_substep(
        f"Skipping trending topic already used for a video: {text[:50]}...",
        style="bold yellow",
    )
    continue

Copilot AI Apr 7, 2026


In the trending branch, is_title_used() checks title_candidate = text[:_MAX_TITLE_LENGTH], but the actual thread_title may end up being topic_title. This skews deduplication: the same topic_title can produce multiple videos because history is keyed by thread_title. The same value (e.g. display_title/topic_title when available) should be used for both the dedup check and thread_title.
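One way to keep dedup and the stored title consistent, sketched with assumed names (choose_display_title, should_skip, and the topic_title parameter are hypothetical, following the review's suggestion):

```python
def choose_display_title(text, topic_title, max_title_length=200):
    """Return the one canonical title used for BOTH the dedup check
    and the thread_title stored in history."""
    source = topic_title if topic_title else text
    return source[:max_title_length]

def should_skip(text, topic_title, used_titles, max_title_length=200):
    """Dedup against history using the same value that will be stored,
    so the same trending topic is never rendered twice under two keys."""
    return choose_display_title(text, topic_title, max_title_length) in used_titles
```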
