Add loading status page and multi-source bypass fallbacks#43
Merged
wasi-master merged 2 commits on May 11, 2026
Conversation
Replaces the previous synchronous form-POST flow (which left users staring at a white screen for the duration of the upstream fetch) with an SSE-driven status page that shows real-time progress through each stage of the bypass attempt.

Bypass pipeline:
- Direct fetch with Googlebot UA (unchanged for non-Medium URLs)
- Detect Cloudflare-style anti-bot challenge pages in the response
- Fallback 1: Freedium (freedium-mirror.cfd, then freedium.cfd) for medium.com hosts — short-circuited before direct fetch since Cloudflare consistently blocks the Googlebot UA from non-Google IPs
- Fallback 2: archive.org Wayback Machine snapshot
- Fallback 3: archive.today / archive.ph / archive.is / archive.li mirrors (all four tried in sequence)
- Surface a clear error if every source fails

UI:
- Animated step list with spinner / checkmark per stage
- Progress bar and elapsed-time counter
- Honest error box with retry button instead of a blank page

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Owner

Thanks for the PR, let me clear the merge conflicts
Pull request overview
This PR improves the 13ft “portable” Flask app by replacing the synchronous form POST UX with an SSE-driven loading/status view, and by adding multiple fallback sources (Freedium + archive services) to recover content when direct fetches hit anti-bot challenge pages (e.g., Cloudflare).
Changes:
- Adds a client-side status UI that connects to a new SSE endpoint (/status/<url>) to show real-time progress and final content/error.
- Adds challenge-page detection plus fallback fetchers for Medium via Freedium and general URLs via archive.org and archive.today mirrors.
- Refactors the fetch worker to update job status and return results through the SSE stream.
```python
from flask import request, Response
from bs4 import BeautifulSoup
from flask import request   # review: duplicate import of `request`
from urllib.parse import urlparse, urljoin, quote
```
Comment on lines +343 to +345
```javascript
// review: interpolating `message` into innerHTML unescaped is an XSS vector
area.innerHTML = '<div class="error-box"><strong>Something went wrong</strong>' +
    message + '</div>' +
    '<a href="/" class="retry-btn">Try Another URL</a>';
```
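One way to close this hole without touching the client is to escape the message server-side before it ever goes out over the SSE stream. A minimal sketch (the helper name and event shape are illustrative, not the PR's exact code), using Python's stdlib `html.escape`:

```python
import html
import json

def sse_error_event(message: str) -> str:
    """Format an SSE 'error' frame, HTML-escaping the message so the
    client can insert it with innerHTML without running injected markup."""
    payload = json.dumps({"message": html.escape(message)})
    return f"event: error\ndata: {payload}\n\n"
```

Escaping at the boundary where untrusted text enters the HTML context means every client-side renderer downstream stays safe, at the cost of double-escaping if the client ever switches to `textContent`.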
Comment on lines +363 to +373
```javascript
const formData = new FormData();   // review: built but never used (dead code)
formData.append('link', link);

const evtSource = new EventSource('/status/' + encodeURIComponent(link));
let currentStep = 0;

evtSource.addEventListener('step', function(e) {
    const data = JSON.parse(e.data);
    const idx = STEPS.findIndex(s => s.id === data.step);
    if (idx >= 0) {
        currentStep = idx;
```
Comment on lines +510 to +512
```python
wayback_api = f"https://archive.org/wayback/available?url={url}"  # review: url is not percent-encoded
try:
    meta = requests.get(wayback_api, timeout=15).json()
```
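If the target URL carries its own query string, interpolating it raw means its `?` and `&` get parsed as extra parameters of the availability API. A small sketch of building the API URL safely with `urllib.parse.quote` (the helper name is illustrative):

```python
from urllib.parse import quote

WAYBACK_API = "https://archive.org/wayback/available?url="

def wayback_api_url(url: str) -> str:
    """Build the availability-API URL, percent-encoding the target so its
    own '?' and '&' aren't misread as additional API parameters."""
    return WAYBACK_API + quote(url, safe="")
```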
Comment on lines +520 to +525
```python
archived_url = archived_url.replace("/web/", "/web/", 1)  # review: no-op replace (dead code)
archived_url = archived_url.replace(
    archived_url.split("/web/")[1].split("/")[0],
    archived_url.split("/web/")[1].split("/")[0] + "id_",
    1,
)
```
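The split/replace dance above appends the Wayback `id_` flag to the timestamp so the snapshot is served raw, without archive.org's toolbar markup. The same transformation reads more directly as a single anchored regex substitution; a sketch (function name illustrative):

```python
import re

def to_raw_snapshot(archived_url: str) -> str:
    """Append Wayback's 'id_' flag to the timestamp path segment so the
    snapshot is served as raw original HTML, without the archive toolbar.
    Leaves URLs that already carry a flag (or have no timestamp) untouched."""
    return re.sub(r"(/web/\d+)(/)", r"\1id_\2", archived_url, count=1)
```

Because the pattern requires a `/` immediately after the digits, calling it twice on the same URL is a no-op rather than producing `id_id_`.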
```python
set_step(job_id, 'fallback_ph')
for mirror in ARCHIVE_PH_MIRRORS:
    try:
        newest_url = f"https://{mirror}/newest/{url}"
```
Comment on lines +683 to +687
```python
def generate():
    last_step = None
    while True:
        job = jobs.get(job_id)
        if not job:
```
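An unbounded `while True` over a shared dict will spin the CPU between updates and never release abandoned jobs. A sketch of a bounded polling generator with sleep and cleanup (parameter names, step values, and frame shapes are my assumptions, not the PR's exact code):

```python
import json
import time

def sse_stream(jobs: dict, job_id: str, poll: float = 0.2, timeout: float = 120.0):
    """Yield SSE frames whenever the job's step changes, sleeping between
    polls instead of busy-waiting, and drop the job entry once it finishes
    or the deadline passes."""
    last_step = None
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = jobs.get(job_id)
        if job is None:
            yield 'event: error\ndata: {"message": "unknown job"}\n\n'
            return
        if job["step"] != last_step:
            last_step = job["step"]
            finished = last_step in ("done", "error")
            if finished:
                jobs.pop(job_id, None)  # clean up before the final frame
            yield f"event: step\ndata: {json.dumps({'step': last_step})}\n\n"
            if finished:
                return
        time.sleep(poll)
    jobs.pop(job_id, None)  # give up after `timeout` seconds
    yield 'event: error\ndata: {"message": "timed out"}\n\n'
```

Wrapped in `Response(sse_stream(...), mimetype="text/event-stream")`, this also guarantees the `jobs` entry is removed on every exit path, which the PR notes rely on for memory cleanup.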
Comment on lines +596 to +622
```python
# review: both branches run the identical org -> ph sequence (deduplicated in a later commit)
if not medium:
    archived_html, archived_url = fetch_via_archive_org(url, job_id)
    if archived_html and not is_challenge_page(archived_html):
        html_text = archived_html
        final_url = archived_url
        recovered = True

    if not recovered:
        archived_html, archived_url = fetch_via_archive_ph(url, job_id)
        if archived_html and not is_challenge_page(archived_html):
            html_text = archived_html
            final_url = archived_url
            recovered = True
else:
    archived_html, archived_url = fetch_via_archive_org(url, job_id)
    if archived_html and not is_challenge_page(archived_html):
        html_text = archived_html
        final_url = archived_url
        recovered = True

    if not recovered:
        archived_html, archived_url = fetch_via_archive_ph(url, job_id)
        if archived_html and not is_challenge_page(archived_html):
            html_text = archived_html
            final_url = archived_url
            recovered = True
```
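The four near-identical recover blocks collapse naturally into one loop over a list of fetchers. A sketch of the deduplicated shape (signature and injected callables are illustrative; real callers would pass the PR's `fetch_via_archive_org` / `fetch_via_archive_ph` and `is_challenge_page`):

```python
def try_fallbacks(url, job_id, fetchers, is_challenge_page):
    """Run the archive fallbacks in order, stopping at the first response
    that is real content rather than a bot wall. Returns
    (html_text, final_url, recovered)."""
    for fetch in fetchers:
        html_text, final_url = fetch(url, job_id)
        if html_text and not is_challenge_page(html_text):
            return html_text, final_url, True
    return None, None, False
```

Adding a new archive source then becomes a one-line change to the fetcher list instead of another copy of the recover block.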
Owner

@copilot apply changes based on the comments in this thread
Owner

Copilot seems to be languid so I'll fix the issues myself. Thanks
Copilot AI added a commit that referenced this pull request on May 11, 2026

Agent-Logs-Url: https://github.com/wasi-master/13ft/sessions/62de0c61-bb98-4941-9db2-3e6b22b08033
Co-authored-by: wasi-master <63045920+wasi-master@users.noreply.github.com>
wasi-master added a commit that referenced this pull request on May 11, 2026

Fix review issues from PR #43: XSS, duplicate imports, dead code, URL encoding, threading, deduplication
Summary
Replaces the previous synchronous form-POST flow (which left users staring at a white screen for the duration of the upstream fetch) with an SSE-driven status page that shows real-time progress through each stage of the bypass attempt. Also adds three additional bypass sources that significantly improve success rates on Cloudflare-protected sites.
Motivation
When I tried fetching a Medium article (michalmalewicz.medium.com/vibe-coding-is-over-...) with the current 13ft, the page just sat blank — Medium's Cloudflare protection now validates that the Googlebot UA is actually coming from a Google-owned IP, so 13ft was silently being served a Cloudflare challenge page ("Just a moment..."). The user has no idea what's happening or why.

This PR addresses both the UX problem (silent failure) and the technical problem (Cloudflare blocks).
Changes
UX — Loading status page
- New SSE endpoint `/status/<url>` that streams step-by-step progress
- The existing `POST /article` endpoint is preserved for backwards compatibility

Bypass pipeline
- Challenge-page detection (`"just a moment"`, `"checking your browser"`, `cf-challenge`, etc.) so we know when the upstream returned a bot wall instead of content
- Freedium fallback for `*.medium.com` URLs (via `freedium-mirror.cfd`, then `freedium.cfd`). Short-circuits direct fetch for Medium since Cloudflare consistently blocks the Googlebot UA there. This is the same mechanism the OpenArticle Chrome extension uses
- archive.org Wayback Machine `id_` raw snapshot
- archive.today mirrors (`archive.ph`, `archive.today`, `archive.is`, `archive.li`) tried in sequence for resilience
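The detection step above can be sketched as a simple marker-phrase heuristic; the exact marker list and length threshold here are my assumptions, not the PR's precise values:

```python
# Phrases that Cloudflare-style interstitials reliably contain.
CHALLENGE_MARKERS = (
    "just a moment",
    "checking your browser",
    "cf-challenge",
)

def is_challenge_page(html_text: str) -> bool:
    """Heuristic bot-wall detector: challenge pages are short and contain
    one of the marker phrases, real articles are long. Empty responses
    count as failures too."""
    if not html_text:
        return True
    lowered = html_text.lower()
    return len(html_text) < 20_000 and any(m in lowered for m in CHALLENGE_MARKERS)
```

A length guard matters because some legitimate articles quote these phrases; a full-sized page that merely mentions "checking your browser" should still pass.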
- `example.com` — direct fetch path, completes through `fetch` → `done`
- The Medium article above goes through `fallback_freedium` and returns the full article in ~130ms
- The `POST /article` endpoint still works (unchanged)
- New stdlib imports: `re`, `uuid`, `threading`, `time`, plus the existing `flask`/`requests`/`bs4` dependencies
- The `jobs` dict is process-local and entries are cleaned up after each SSE stream closes; suitable for the single-worker Flask dev server but would need Redis/similar if you ever move to a multi-worker deployment
- Freedium mirrors (`freedium.cfd` was unreachable when I tested; `freedium-mirror.cfd` works). Both are tried for resilience

🤖 Generated with Claude Code