feat: add Threads Trending Now as content source via Playwright scraping #3
- Create `threads/trending.py`: Playwright-based scraper for Threads trending topics and thread replies
- Modify `threads/threads_client.py`: add source config check, integrate trending scraper with fallback to user threads
- Update `.config.template.toml`: add source option (user/trending)

Agent-Logs-Url: https://github.com/thaitien280401-stack/RedditVideoMakerBot/sessions/01a85c1b-5157-4723-80f1-ca726e410a39
Co-authored-by: thaitien280401-stack <271128961+thaitien280401-stack@users.noreply.github.com>
…r setup

Agent-Logs-Url: https://github.com/thaitien280401-stack/RedditVideoMakerBot/sessions/01a85c1b-5157-4723-80f1-ca726e410a39
Co-authored-by: thaitien280401-stack <271128961+thaitien280401-stack@users.noreply.github.com>
Pull request overview
Adds a new Threads content source that scrapes “Trending now” via Playwright, enabling the pipeline to generate videos from trending topics when the official Threads API can’t provide search/trending data.
Changes:
- Added a Playwright-based trending scraper (`threads/trending.py`) that extracts trending topics, threads, and replies.
- Wired trending sourcing into `get_threads_posts()` via a new `[threads.thread].source` config option, with fallback to user-based sourcing on scrape failure.
- Replaced the repeated magic `200` title slicing with `_MAX_TITLE_LENGTH`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `utils/.config.template.toml` | Introduces `[threads.thread].source` to select between `user` and `trending`. |
| `threads/trending.py` | New Playwright scraper for trending topics, thread posts, and replies. |
| `threads/threads_client.py` | Adds trending integration + fallback, and centralizes the title length constant. |
```python
container = link_handle.evaluate_handle(
    """el => {
        let node = el;
        for (let i = 0; i < 10; i++) {
            node = node.parentElement;
            if (!node) return el.parentElement || el;
            const text = node.innerText || '';
            if (text.length > 30 && (
                node.getAttribute('role') === 'article' ||
                node.tagName === 'ARTICLE' ||
                node.dataset && node.dataset.testid
            )) {
                return node;
            }
        }
        return el.parentElement ? el.parentElement.parentElement || el.parentElement : el;
    }"""
)
raw = container.inner_text().strip() if container else ""
```
`_get_post_text()` uses `evaluate_handle()` but then calls `container.inner_text()`. In Playwright for Python, `evaluate_handle()` returns a `JSHandle`, which is not guaranteed to have an `inner_text()` method; convert via `container_handle.as_element()` (which may return `None`) before reading the text, or use `link_handle.evaluate()` to return a string directly. Otherwise the scraper will fail at runtime whenever the handle is not an `ElementHandle`.
```python
if source == "trending" and not POST_ID:
    content = _get_trending_content(
        max_comment_length=max_comment_length,
        min_comment_length=min_comment_length,
    )
    if content is not None:
        return content
    # Fallback: if trending fails, continue with user threads
    print_substep(
        "⚠️ Trending unavailable, falling back to user threads...",
        style="bold yellow",
    )
```
The `source == "trending"` branch can return `content` even when not enough replies were scraped (possibly zero). Downstream code (TTS/`engine_wrapper` + `final_video`) assumes at least one comment in comment mode, so the zero-replies case leaves `comment_0.png`/`.mp3` missing and breaks the pipeline. Pass `min_comments` into `_get_trending_content()` and, when not in `storymode`, only return when `len(comments) >= min_comments`; otherwise return `None` so the code falls back to user threads.
```python
sanitised = sanitize_text(reply_text)
if not sanitised or sanitised.strip() == "":
    continue

if len(reply_text) > max_comment_length:
    continue
if len(reply_text) < min_comment_length:
    continue

content["comments"].append(
    {
        "comment_body": reply_text,
        "comment_url": "",
        "comment_id": re.sub(
            r"[^\w\s-]", "", f"trending_reply_{idx}"
        ),
        "comment_author": f"@{reply_username}",
    }
)
```
In `_get_trending_content()`, the variable `sanitised = sanitize_text(reply_text)` is only used for the emptiness check, while `comment_body` stores the raw, unsanitized `reply_text`. If `sanitize_text()` strips emoji and special characters so TTS doesn't fail, storing the raw text means later steps (`add_periods`/`process_text`) will still hit unwanted characters. Use `sanitised` as `comment_body` (or explain clearly why keeping the raw text is safe).
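A hedged sketch of the suggested change: build the comment dict from the sanitized text. The stand-in `sanitize_text` below only approximates the repo's real helper, and `build_comment` is an illustrative name, not from the PR:

```python
import re

def sanitize_text(text):
    # Stand-in for the repo's sanitize_text: drop characters that
    # commonly break TTS (emoji, stray symbols). Approximation only.
    return re.sub(r"[^\w\s.,!?'-]", "", text)

def build_comment(reply_text, reply_username, idx,
                  min_comment_length, max_comment_length):
    """Return a comment dict built from the sanitized body, or None to skip."""
    sanitised = sanitize_text(reply_text).strip()
    if not sanitised:
        return None
    # Length limits now apply to the text that will actually be spoken.
    if not (min_comment_length <= len(sanitised) <= max_comment_length):
        return None
    return {
        "comment_body": sanitised,  # sanitized, not the raw reply_text
        "comment_url": "",
        "comment_id": f"trending_reply_{idx}",
        "comment_author": f"@{reply_username}",
    }
```

Filtering on `sanitised` also keeps the length checks consistent with what TTS will actually receive.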
```python
if not text or _contains_blocked_words(text):
    continue
title_candidate = text[:_MAX_TITLE_LENGTH]
if is_title_used(title_candidate):
    print_substep(
        f"Skipping trending topic already used for a video: {text[:50]}...",
        style="bold yellow",
    )
    continue
```
In the trending branch, `is_title_used()` checks `title_candidate = text[:_MAX_TITLE_LENGTH]`, but the actual `thread_title` may end up being `topic_title`. This skews deduplication: the same `topic_title` can produce videos repeatedly because the history is keyed by `thread_title`. Use the same value (e.g. `display_title`/`topic_title` when available) for both the dedup check and `thread_title`.
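One way to implement the suggestion, as a sketch; `choose_title` is a hypothetical helper, and the precedence of `topic_title` over the thread text is an assumption:

```python
def choose_title(topic_title, text, max_len=200):
    """Pick one canonical title, used for BOTH dedup and thread_title.

    Prefer the scraped topic_title when available, otherwise fall back
    to the thread text, truncated as the original code does.
    """
    base = topic_title or text
    return base[:max_len]
```

The trending branch would then call `is_title_used(title)` and assign `content["thread_title"] = title` with the same value, so the history keys and the dedup check can no longer diverge.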
The app currently only fetches posts from a specific user's Threads profile via the official API. This adds support for sourcing content from the Trending Now section on Threads' search page instead.
The official Threads API has no trending/search endpoint, so this uses Playwright (already a dependency) to scrape `threads.net/search`.

Changes

- `threads/trending.py` (new): Playwright-based scraper; `scrape_thread_replies()` fetches replies from thread pages (the official API only exposes the authenticated user's own threads).
- `threads/threads_client.py`: added a `_get_trending_content()` helper, wired into `get_threads_posts()` via the new `source` config check; falls back to user threads automatically if scraping fails; extracted a `_MAX_TITLE_LENGTH` constant to replace the repeated magic `200`.
- `utils/.config.template.toml`: added the `source` option to `[threads.thread]`.

Usage
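The original Usage snippet did not survive extraction; below is a minimal sketch of what the option plausibly looks like in `utils/.config.template.toml`, based on the key names and values (`user`/`trending`) described in this PR, with `user` assumed to be the default:

```toml
[threads.thread]
# "user" = fetch a specific profile via the official API (existing behaviour, assumed default)
# "trending" = scrape the Trending Now section with Playwright
source = "trending"
```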
Notes
Web scraping is inherently fragile: if Threads changes their DOM structure, the selectors in `trending.py` will need updating. The automatic fallback to user-based sourcing ensures the pipeline doesn't break silently.