Scheduler: 1 video per 3h with title deduplication #1
thaitien280401-stack merged 6 commits into MAIN from
Agent-Logs-Url: https://github.com/thaitien280401-stack/RedditVideoMakerBot/sessions/b2183a86-2887-4db0-82aa-07d9da5aa1be Co-authored-by: thaitien280401-stack <271128961+thaitien280401-stack@users.noreply.github.com>
- Add threads/ module: Threads API client for fetching content from Meta Threads
- Add video_creation/threads_screenshot.py: Generate Threads-style images with PIL
- Add uploaders/ module: YouTube, TikTok, Facebook auto-upload support
- Add scheduler/ module: APScheduler-based automated pipeline
- Update main.py: New entry point with manual/auto/scheduled modes
- Update config template: Threads, uploaders, scheduler configuration
- Update dependencies: google-api-python-client, APScheduler
- Update references from Reddit to Threads across TTS and video modules
- Maintain backward compatibility with --reddit flag

Agent-Logs-Url: https://github.com/thaitien280401-stack/RedditVideoMakerBot/sessions/b2183a86-2887-4db0-82aa-07d9da5aa1be Co-authored-by: thaitien280401-stack <271128961+thaitien280401-stack@users.noreply.github.com>
…tant, dynamic upload timeout Agent-Logs-Url: https://github.com/thaitien280401-stack/RedditVideoMakerBot/sessions/b2183a86-2887-4db0-82aa-07d9da5aa1be Co-authored-by: thaitien280401-stack <271128961+thaitien280401-stack@users.noreply.github.com>
…ates
- Changed default cron from every 6h to every 3h (8 videos/day)
- Added utils/title_history.py: tracks used titles in JSON
- threads_client.py: skips threads with already-used titles
- scheduler/pipeline.py: saves title after successful video creation
- main.py: saves title in manual mode too
- Updated config template with new scheduler defaults

Agent-Logs-Url: https://github.com/thaitien280401-stack/RedditVideoMakerBot/sessions/17c7c41c-cf86-4279-88b8-01cba23ee763 Co-authored-by: thaitien280401-stack <271128961+thaitien280401-stack@users.noreply.github.com>

- Replace all unsafe eval() calls with safe _SAFE_TYPE_MAP in settings.py, console.py, gui_utils.py
- Replace hardcoded Flask secret key with os.urandom(24) in GUI.py
- Replace os.system() with subprocess.run() in TTS/engine_wrapper.py
- Fix install.sh Homebrew URL (was uninstall.sh instead of install.sh)
- Create HUONG_DAN_VPS.md - comprehensive VPS deployment guide in Vietnamese

Agent-Logs-Url: https://github.com/thaitien280401-stack/RedditVideoMakerBot/sessions/75dc3e66-3e99-4406-af07-9de9be3200b6 Co-authored-by: thaitien280401-stack <271128961+thaitien280401-stack@users.noreply.github.com>
@copilot review and pull merge
Agent-Logs-Url: https://github.com/thaitien280401-stack/RedditVideoMakerBot/sessions/29d5b341-a15e-4235-bb26-8e2de812f684 Co-authored-by: thaitien280401-stack <271128961+thaitien280401-stack@users.noreply.github.com>
PR is reviewed and ready for merge. The latest commit Summary of all changes:
Pull request overview
This PR shifts the project toward a Vietnam-focused “Threads Video Maker Bot” by adding a Threads-based content pipeline, scheduling, and multi-platform upload support, while also addressing several security issues (removing eval() usage, hardcoded Flask secret, and os.system()).
Changes:
- Added Threads content ingestion + Pillow-based “Threads UI” image rendering, plus title-history deduplication.
- Introduced an APScheduler-based pipeline runner and a new uploader framework (YouTube/TikTok/Facebook + manager).
- Updated config template, docs, and entrypoint (main.py) to support Threads-first operation and new modes.
Reviewed changes
Copilot reviewed 23 out of 28 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| video_creation/threads_screenshot.py | New Pillow renderer to generate Threads-style post/comment images. |
| video_creation/final_video.py | Supports both Threads and Reddit config keys for language and channel/subreddit naming. |
| utils/videos.py | Generalized “already done” check to work with Threads/Reddit config. |
| utils/title_history.py | New persisted title history used for title de-duplication. |
| utils/settings.py | Replaced eval() with safe type casting for config validation. |
| utils/gui_utils.py | Replaced eval() with safe type casting for GUI validation. |
| utils/console.py | Removed eval() in CLI input handling (but introduces a boolean parsing bug in options branch). |
| utils/.config.template.toml | Threads-first template + uploaders + scheduler config; legacy Reddit section kept. |
| uploaders/base_uploader.py | New base uploader + metadata model + retry wrapper. |
| uploaders/youtube_uploader.py | YouTube resumable upload implementation (missing auth header on upload PUT). |
| uploaders/tiktok_uploader.py | TikTok Content Posting API-based uploader. |
| uploaders/facebook_uploader.py | Facebook Graph resumable upload implementation (schedule time type mismatch). |
| uploaders/upload_manager.py | Orchestrates uploads across enabled platforms. |
| threads/threads_client.py | Threads API client + keyword filter + blocked words + title dedup selection. |
| scheduler/pipeline.py | End-to-end pipeline runner + cron scheduling with daily cap. |
| TTS/GTTS.py | Uses Threads/Reddit config language selection for gTTS. |
| TTS/engine_wrapper.py | Replaced os.system() with subprocess.run(); Threads/Reddit config language support. |
| requirements.txt | Added APScheduler and Google API libs (Google libs appear unused currently). |
| main.py | New modes (manual/auto/scheduled), Threads default, legacy Reddit mode retained. |
| README.md | Rewritten for Vietnam/Threads-first usage and setup instructions. |
| PLAN.md | Added migration plan documentation. |
| HUONG_DAN_VPS.md | Added Vietnamese VPS deployment/run guide (systemd/Docker/troubleshooting). |
| install.sh | Fixed Homebrew install URL. |
| GUI.py | Uses env/random Flask secret key instead of hardcoded bytes. |
| .gitignore | Ignores video_creation/data/title_history.json. |
```diff
         console.print(message, end="")
         user_input = input("").strip()
         if check_type is not False:
             try:
-                isinstance(eval(user_input), check_type)  # fixme: remove eval
+                check_type(user_input)
                 return check_type(user_input)
             except:
```
In the options branch, boolean input will be parsed incorrectly because bool('false') evaluates to True. This also calls check_type(user_input) twice. Consider using the same safe casting logic as in settings._safe_type_cast (special-case bool strings like true/false/1/0/yes/no), cast once, then validate against options before returning.
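A minimal sketch of the cast-once approach suggested above. The helper names `safe_cast` and `validate_option` and the exact set of accepted boolean strings are assumptions for illustration; the PR's own `settings._safe_type_cast` may differ in detail.

```python
from typing import Any, Callable, Iterable, Optional

# Hypothetical lookup sets; the accepted spellings are an assumption.
_TRUE_STRINGS = {"true", "1", "yes"}
_FALSE_STRINGS = {"false", "0", "no"}


def safe_cast(raw: str, check_type: Callable[[str], Any]) -> Any:
    """Cast a user-supplied string, special-casing booleans.

    bool("false") is True because any non-empty string is truthy,
    so boolean input has to be matched against known spellings.
    """
    if check_type is bool:
        lowered = raw.strip().lower()
        if lowered in _TRUE_STRINGS:
            return True
        if lowered in _FALSE_STRINGS:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    return check_type(raw)


def validate_option(
    raw: str,
    check_type: Callable[[str], Any],
    options: Optional[Iterable[Any]] = None,
) -> Any:
    """Cast exactly once, then validate the cast value against options."""
    value = safe_cast(raw, check_type)
    if options is not None and value not in list(options):
        raise ValueError(f"{value!r} is not one of the allowed options")
    return value
```

With this shape, the `try` block in the options branch would call the cast a single time and return its result, rather than casting twice.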
```python
text_lines = _wrap_text(text, body_font, content_width - avatar_size - 30)
line_height = 42
text_height = len(text_lines) * line_height

total_height = padding + avatar_size + 20 + text_height + 60 + padding

# Create image
img = Image.new("RGBA", (width, total_height), theme["bg_color"])
draw = ImageDraw.Draw(img)

y_cursor = padding

# Avatar
_draw_avatar(draw, padding, y_cursor, avatar_size, theme["accent_color"])

# Username
draw.text(
    (padding + avatar_size + 15, y_cursor + 5),
    author,
    fill=theme["text_color"],
    font=username_font,
)

# Timestamp
draw.text(
    (padding + avatar_size + 15, y_cursor + 35),
    "🧵 Threads",
    fill=theme["secondary_text"],
    font=meta_font,
)

y_cursor += avatar_size + 20

# Thread line (vertical line from avatar to content)
line_x = padding + avatar_size // 2
draw.line(
    [(line_x, padding + avatar_size + 5), (line_x, y_cursor - 5)],
    fill=theme["reply_line"],
    width=3,
)

# Body text
for line in text_lines:
    draw.text(
        (padding + 10, y_cursor),
        line,
        fill=theme["text_color"],
        font=body_font,
    )
    y_cursor += line_height
```
The body text X-position uses padding + 10, but wrapping width is computed as content_width - avatar_size - 30 (i.e., expecting text to start to the right of the avatar). As written, the rendered text will overlap the avatar/left margin area. Align the body text start X with the username area (e.g., padding + avatar_size + <gap>), and keep the wrap width consistent with that left margin.
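One way to keep the start position and the wrap width in step is to derive both from the same gap value. The sketch below assumes `content_width` is the drawable width that begins at `padding`; the helper name `body_text_layout` and the default gap of 15 pixels (borrowed from the username offset) are illustrative, not taken from the PR.

```python
from typing import Tuple


def body_text_layout(
    padding: int, avatar_size: int, content_width: int, gap: int = 15
) -> Tuple[int, int]:
    """Return (text_x, wrap_width) for body text placed right of the avatar.

    Assumption: the content area spans from `padding` to
    `padding + content_width`, so whatever is consumed on the left by the
    avatar and the gap must be subtracted from the wrap width.
    """
    text_x = padding + avatar_size + gap
    wrap_width = content_width - avatar_size - gap
    return text_x, wrap_width
```

The same `text_x` would then be passed to `draw.text(...)` and the same `wrap_width` to `_wrap_text(...)`, so the two cannot drift apart.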
```python
import textwrap
from pathlib import Path
from typing import Dict, Final, List, Optional, Tuple
```
Several imports appear unused in this module (textwrap, Dict, Optional), and H is assigned but never used. Removing unused imports and variables prevents lint warnings and keeps the screenshot generator easier to maintain.
```diff
-import textwrap
-from pathlib import Path
-from typing import Dict, Final, List, Optional, Tuple
+from pathlib import Path
+from typing import Final, List, Tuple
```
```python
from utils import settings
from utils.console import print_step, print_substep
from utils.title_history import is_title_used
from utils.videos import check_done
```
check_done is imported but never used in this module. Since the selection logic currently relies on title_history, either remove this import or integrate check_done() into the thread selection so already-rendered thread IDs are skipped consistently.
```diff
-from utils.videos import check_done
```
```python
if not threads_list:
    print_substep("Không tìm thấy threads nào!", style="bold red")  # "No threads found!"
    raise ValueError("No threads found")

# Filter by keywords, if any
keywords = thread_config.get("keywords", "")
if keywords:
    keyword_list = [k.strip() for k in keywords.split(",") if k.strip()]
    threads_list = client.search_threads_by_keyword(threads_list, keyword_list)

# Pick a suitable thread (no video created yet, enough replies, title not yet used)
thread = None
for t in threads_list:
    thread_id = t.get("id", "")
    # Check whether a video has already been created for this thread
    text = t.get("text", "")
    if not text or _contains_blocked_words(text):
        continue
    # Check whether the title has already been used (avoid duplicates)
    title_candidate = text[:200] if len(text) > 200 else text
    if is_title_used(title_candidate):
        print_substep(
            f"Bỏ qua thread đã tạo video: {text[:50]}...",  # "Skipping thread that already has a video"
            style="bold yellow",
        )
        continue
    # Check the number of replies
    try:
        replies = client.get_thread_replies(thread_id, limit=min_comments + 5)
        if len(replies) >= min_comments:
            thread = t
            break
```
If no thread meets the selection criteria, the fallback thread = threads_list[0] can bypass blocked-words filtering and title de-duplication (and may select a thread with too few replies). This undermines the purpose of blocked_words/is_title_used and can lead to duplicate or unwanted content. Prefer either (1) raising a clear error, or (2) selecting the first thread that still passes the safety/dedup filters while relaxing only the replies constraint, explicitly.
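A sketch of option (2), assuming the filters are passed in as callables. `select_thread` and its parameters are hypothetical names; in the PR the corresponding checks are `_contains_blocked_words`, `is_title_used`, and `client.get_thread_replies`.

```python
from typing import Callable, Dict, List, Optional


def select_thread(
    threads: List[Dict],
    is_blocked: Callable[[str], bool],
    is_used: Callable[[str], bool],
    reply_count: Callable[[Dict], int],
    min_comments: int,
    allow_fewer_replies: bool = False,
) -> Optional[Dict]:
    """Pick the first thread that passes safety and dedup filters.

    The replies constraint is relaxed only when the caller opts in;
    the blocked-words and used-title filters are never bypassed.
    """
    eligible = []
    for t in threads:
        text = t.get("text", "")
        if not text or is_blocked(text) or is_used(text[:200]):
            continue
        eligible.append(t)

    for t in eligible:
        if reply_count(t) >= min_comments:
            return t

    if allow_fewer_replies and eligible:
        return eligible[0]
    return None
```

A caller that gets `None` back can raise a clear error (option 1) instead of silently falling back to `threads_list[0]`.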
```python
    print_substep("YouTube: Không thể khởi tạo upload session", style="bold red")  # "Unable to initialize upload session"
    return None

# Step 2: Upload video file
file_size = os.path.getsize(metadata.file_path)
# Dynamic timeout: minimum 120s, add 60s per 100MB
upload_timeout = max(120, 60 * (file_size // (100 * 1024 * 1024) + 1))
with open(metadata.file_path, "rb") as video_file:
    upload_response = requests.put(
        upload_url,
        headers={
            "Content-Type": "video/mp4",
            "Content-Length": str(file_size),
```
The resumable upload PUT request is missing the Authorization: Bearer ... header. Resumable session URLs still require auth; without it the upload will typically fail with 401/403. Include the same Authorization header used in the init request when calling requests.put(upload_url, ...) (and ideally also include X-Upload-Content-Length/Type when initiating the session).
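The header sets described in this comment could be built by two small helpers, sketched below. The function names and the `access_token` parameter are assumptions; only the header names come from the comment and from the excerpt above.

```python
from typing import Dict


def build_init_headers(
    access_token: str, file_size: int, content_type: str = "video/mp4"
) -> Dict[str, str]:
    """Headers for the request that opens the resumable session."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json; charset=UTF-8",
        # Announce the size and type of the file that will follow.
        "X-Upload-Content-Length": str(file_size),
        "X-Upload-Content-Type": content_type,
    }


def build_upload_headers(
    access_token: str, file_size: int, content_type: str = "video/mp4"
) -> Dict[str, str]:
    """Headers for the PUT that sends the video bytes to the session URL."""
    return {
        # Same bearer token as the init request.
        "Authorization": f"Bearer {access_token}",
        "Content-Type": content_type,
        "Content-Length": str(file_size),
    }
```

The PUT in the excerpt would then pass `headers=build_upload_headers(token, file_size)`, where `token` is whatever credential the init request already uses.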
```python
    "description": description[: self.MAX_DESCRIPTION_LENGTH],
}
```
scheduled_publish_time expects a Unix timestamp (seconds) for Facebook’s API, but metadata.schedule_time is documented/typed as an ISO 8601 string in VideoMetadata/UploadManager. This mismatch will cause scheduling to fail when used. Either change schedule_time to an int timestamp throughout, or convert ISO 8601 → timestamp before assigning scheduled_publish_time.
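If `schedule_time` stays an ISO 8601 string, the conversion could look like the sketch below. `iso8601_to_unix` is a hypothetical helper, and treating a timestamp without an offset as UTC is an assumption that the project may want to replace with its configured timezone.

```python
from datetime import datetime, timezone


def iso8601_to_unix(schedule_time: str) -> int:
    """Convert an ISO 8601 string to a Unix timestamp in whole seconds."""
    # datetime.fromisoformat() on Python < 3.11 rejects a trailing "Z".
    if schedule_time.endswith("Z"):
        schedule_time = schedule_time[:-1] + "+00:00"
    dt = datetime.fromisoformat(schedule_time)
    if dt.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```

For example, `iso8601_to_unix("2024-01-01T07:00:00+07:00")` and `iso8601_to_unix("2024-01-01T00:00:00Z")` both give `1704067200`, which is the value that would be assigned to `scheduled_publish_time`.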
```python
timezone = scheduler_config.get("timezone", "Asia/Ho_Chi_Minh")
cron_expression = scheduler_config.get(
    "cron", "0 */3 * * *"
)  # Default: every 3 hours (8 runs/day: 00, 03, 06, 09, 12, 15, 18, 21h)
max_videos_per_day = scheduler_config.get("max_videos_per_day", 8)

# Parse cron expression
cron_parts = cron_expression.split()
if len(cron_parts) != 5:
    print_substep(
        "Cron expression không hợp lệ! Format: minute hour day month weekday", style="bold red"
    )  # "Invalid cron expression!"
    return

scheduler = BlockingScheduler(timezone=timezone)

videos_today = {"count": 0, "date": datetime.now().strftime("%Y-%m-%d")}

def scheduled_job():
    """Job run on the schedule."""
    current_date = datetime.now().strftime("%Y-%m-%d")

    # Reset the counter when a new day starts
    if current_date != videos_today["date"]:
        videos_today["count"] = 0
        videos_today["date"] = current_date

    if videos_today["count"] >= max_videos_per_day:
        print_substep(
            f"Đã đạt giới hạn {max_videos_per_day} video/ngày. Bỏ qua.",  # "Daily limit reached. Skipping."
            style="bold yellow",
```
Scheduler date/time bookkeeping uses datetime.now() (naive/local) while the APScheduler trigger is configured with an explicit timezone. If the host OS timezone differs (common on VPS set to UTC), the daily reset (videos_today['date']) and log timestamps will be wrong relative to the scheduler trigger. Use timezone-aware "now" based on the configured timezone (e.g., datetime.now(ZoneInfo(timezone))) for both the date reset and log output.
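A sketch of the timezone-aware bookkeeping, using the standard-library `zoneinfo` module (Python 3.9+, with time zone data available on the host). The helpers `local_date` and `reset_if_new_day` are illustrative names; the optional `now` argument exists only so the behaviour can be checked with a fixed instant and is expected to be timezone-aware.

```python
from datetime import datetime, timezone
from typing import Dict, Optional
from zoneinfo import ZoneInfo


def local_date(tz_name: str, now: Optional[datetime] = None) -> str:
    """Return today's date (YYYY-MM-DD) in the configured timezone."""
    tz = ZoneInfo(tz_name)
    current = now.astimezone(tz) if now is not None else datetime.now(tz)
    return current.strftime("%Y-%m-%d")


def reset_if_new_day(
    videos_today: Dict, tz_name: str, now: Optional[datetime] = None
) -> Dict:
    """Reset the daily counter when the date changes in the scheduler's zone."""
    today = local_date(tz_name, now)
    if today != videos_today["date"]:
        videos_today["count"] = 0
        videos_today["date"] = today
    return videos_today


# 18:00 UTC on 1 January is already 01:00 on 2 January in Asia/Ho_Chi_Minh,
# so a host running in UTC would otherwise reset the counter seven hours late.
utc_evening = datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)
state = reset_if_new_day({"count": 8, "date": "2024-01-01"}, "Asia/Ho_Chi_Minh", utc_evening)
```

Inside `scheduled_job`, `datetime.now().strftime(...)` would be replaced by `local_date(timezone)` with the same `timezone` string that is passed to `BlockingScheduler`.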
```python
import sys
from datetime import datetime
from os import name
from pathlib import Path
from subprocess import Popen
from typing import Optional

from utils import settings
from utils.cleanup import cleanup
from utils.console import print_markdown, print_step, print_substep
```
There are several unused imports at the top of this module (sys, name, Path, Popen, cleanup, print_markdown). save_title is used only in run_pipeline, while cleanup and print_markdown are not used at all. Cleaning these up reduces noise and avoids lint/type-check warnings.
```diff
-import sys
-from datetime import datetime
-from os import name
-from pathlib import Path
-from subprocess import Popen
-from typing import Optional
-from utils import settings
-from utils.cleanup import cleanup
-from utils.console import print_markdown, print_step, print_substep
+from datetime import datetime
+from typing import Optional
+from utils import settings
+from utils.console import print_step, print_substep
```
- eval() calls in utils/settings.py (lines 33, 81) - replaced with safe _SAFE_TYPE_MAP type mapping
- eval() call in utils/console.py (line 105) - replaced with direct check_type() call
- eval() call in utils/gui_utils.py (line 49) - replaced with safe _SAFE_TYPE_MAP type mapping
- Hardcoded Flask secret key in GUI.py (line 26) - replaced with os.urandom(24)
- os.system() in TTS/engine_wrapper.py (line 130) - replaced with subprocess.run() with argument list
- install.sh wrong Homebrew URL (line 53) - was uninstall.sh, fixed to install.sh
- HUONG_DAN_VPS.md - comprehensive VPS deployment guide in Vietnamese