Add loading status page and multi-source bypass fallbacks #43

Merged
wasi-master merged 2 commits into wasi-master:main from JRS1986:feat/status-page-and-bypass-fallbacks on May 11, 2026

@JRS1986 commented May 11, 2026

Summary

Replaces the previous synchronous form-POST flow (which left users staring at a white screen for the duration of the upstream fetch) with an SSE-driven status page that shows real-time progress through each stage of the bypass attempt. Also adds three fallback sources (Freedium, the archive.org Wayback Machine, and archive.today) that recover articles when a Cloudflare-protected site blocks the direct fetch.

Motivation

When I tried fetching a Medium article (michalmalewicz.medium.com/vibe-coding-is-over-...) with the current 13ft, the page just sat blank. Medium's Cloudflare protection now checks that requests carrying the Googlebot UA actually come from a Google-owned IP, so 13ft was silently being served a Cloudflare challenge page ("Just a moment...") instead of the article. The user had no idea what was happening or why.

This PR addresses both the UX problem (silent failure) and the technical problem (Cloudflare blocks).

Changes

UX — Loading status page

  • New SSE endpoint /status/<url> that streams step-by-step progress
  • Animated step list with spinner → checkmark transitions
  • Progress bar and elapsed-time counter
  • Honest error box with a "Try Another URL" button when everything fails
  • Dark mode carries over from the existing setting
  • The legacy POST /article endpoint is preserved for backwards compatibility
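The status stream rides on standard Server-Sent Events framing: each message is a few "field: value" lines ended by a blank line, and EventSource dispatches it to the listener registered for its event name. A minimal sketch of one frame, assuming the endpoint sends named step events whose JSON body carries a step id (the client excerpt further down this page reads data.step); format_sse is a hypothetical helper, not necessarily how portable.py builds its frames:

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Events frame (hypothetical helper).

    The browser's EventSource routes the frame to listeners registered
    with addEventListener(event, ...).
    """
    # json.dumps escapes newlines inside strings, so the body always fits
    # on a single data: line and cannot terminate the frame early.
    data = json.dumps(payload)
    return f"event: {event}\ndata: {data}\n\n"


if __name__ == "__main__":
    print(format_sse("step", {"step": "fetch"}), end="")
```

In Flask, a generator yielding such strings can be returned as Response(gen(), mimetype="text/event-stream") so the frames are streamed to the browser as they are produced.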

Bypass pipeline

  • Challenge detection — scan response body for signatures ("just a moment", "checking your browser", cf-challenge, etc.) so we know when the upstream returned a bot wall instead of content
  • Freedium fallback for *.medium.com URLs (via freedium-mirror.cfd, then freedium.cfd). For Medium this skips the direct fetch entirely, since Cloudflare consistently blocks the Googlebot UA there. This is the same mechanism the OpenArticle Chrome extension uses.
  • archive.org Wayback Machine fallback — queries the availability API, fetches the id_ raw snapshot
  • archive.today fallback — tries all four mirrors (archive.ph, archive.today, archive.is, archive.li) in sequence for resilience
  • Clear final error message when no source has the article
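Challenge detection of this kind can be a case-insensitive search for the signatures listed above. A minimal sketch, assuming only the three markers named in this description; the function name is_challenge_page appears in the review excerpt below, but the pattern list here is illustrative, not the PR's actual list:

```python
import re

# Illustrative signatures taken from the PR description; the real list
# in portable.py may be longer or different.
CHALLENGE_PATTERNS = [
    r"just a moment",
    r"checking your browser",
    r"cf-challenge",
]
_CHALLENGE_RE = re.compile("|".join(CHALLENGE_PATTERNS), re.IGNORECASE)


def is_challenge_page(html_text: str) -> bool:
    """Return True if the body looks like an anti-bot interstitial."""
    if not html_text:
        # An empty body is a failed fetch, not a challenge page.
        return False
    return _CHALLENGE_RE.search(html_text) is not None
```

Because a phrase like "just a moment" can also occur in genuine article text, restricting the scan to the <title> or the first few kilobytes is one way to reduce false positives; that is a design choice the PR does not specify.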

Test plan

  • example.com: direct fetch path, completes through fetch → done
  • Cloudflare-protected Medium article — short-circuits to fallback_freedium and returns the full article in ~130ms
  • A URL with no archive anywhere — surfaces a clear error instead of a blank page
  • Existing POST /article endpoint still works (unchanged)
  • Manual browser test on dark and light mode
  • Test with sites that have archive.org snapshots but block the direct fetch via Cloudflare (e.g. some news sites)

Notes

  • No new Python dependencies — uses stdlib re, uuid, threading, time plus existing flask / requests / bs4
  • The in-memory jobs dict is process-local and entries are cleaned up after each SSE stream closes; suitable for the single-worker Flask dev server but would need Redis/similar if you ever move to a multi-worker deployment
  • Freedium is a third-party service that has had domain churn (the original freedium.cfd was unreachable when I tested; freedium-mirror.cfd works). Both are tried for resilience
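If the fetch worker thread and the SSE generator share the jobs dict, routing every access through one lock keeps a reader from seeing a half-applied update. Single dict operations happen to be safe under CPython's GIL, but that is an implementation detail, and a status change touching several fields is not guaranteed to be observed as a unit. A minimal sketch, with JobStore and its field names as hypothetical stand-ins for the PR's plain dict:

```python
import threading
import uuid


class JobStore:
    """Process-local job registry guarded by a single lock (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def create(self) -> str:
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"step": None, "done": False, "result": None}
        return job_id

    def update(self, job_id: str, **fields) -> None:
        # Several fields change together under the lock, so readers never
        # see, e.g., done=True without the matching result.
        with self._lock:
            if job_id in self._jobs:
                self._jobs[job_id].update(fields)

    def snapshot(self, job_id: str):
        """Return a copy of the job, or None if it does not exist."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job is not None else None

    def discard(self, job_id: str) -> None:
        """Drop the entry once its SSE stream has closed."""
        with self._lock:
            self._jobs.pop(job_id, None)
```

As the note says, this is still process-local; a multi-worker deployment would need an external store such as Redis.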

🤖 Generated with Claude Code

Replaces the previous synchronous form-POST flow (which left users
staring at a white screen for the duration of the upstream fetch)
with an SSE-driven status page that shows real-time progress through
each stage of the bypass attempt.

Bypass pipeline:
- Direct fetch with Googlebot UA (unchanged for non-Medium URLs)
- Detect Cloudflare-style anti-bot challenge pages in the response
- Fallback 1: Freedium (freedium-mirror.cfd, then freedium.cfd) for
  medium.com hosts — short-circuited before direct fetch since
  Cloudflare consistently blocks the Googlebot UA from non-Google IPs
- Fallback 2: archive.org Wayback Machine snapshot
- Fallback 3: archive.today / archive.ph / archive.is / archive.li
  mirrors (all four tried in sequence)
- Surface a clear error if every source fails
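The ordering in this list, including the Medium short-circuit, can be expressed as a small planning function. A sketch under stated assumptions: plan_sources, is_medium_host and freedium_url are hypothetical names, the source labels are illustrative, and the Freedium URL shape (the original article URL appended to the mirror's root) is an assumption about that third-party service rather than something this PR documents:

```python
from urllib.parse import urlparse

# Mirror order as given in the commit message.
FREEDIUM_HOSTS = ["freedium-mirror.cfd", "freedium.cfd"]


def is_medium_host(url: str) -> bool:
    """True for medium.com and any *.medium.com subdomain."""
    # urlparse().hostname is already lowercased.
    host = urlparse(url).hostname or ""
    return host == "medium.com" or host.endswith(".medium.com")


def plan_sources(url: str) -> list:
    """Ordered source labels to try for url (hypothetical labels)."""
    fallbacks = ["archive_org", "archive_ph"]
    if is_medium_host(url):
        # Direct fetch is skipped: Cloudflare blocks the Googlebot UA
        # from non-Google IPs on Medium.
        return [f"freedium:{h}" for h in FREEDIUM_HOSTS] + fallbacks
    return ["direct"] + fallbacks


def freedium_url(mirror: str, article_url: str) -> str:
    # Assumption: Freedium serves an article at its root plus the full original URL.
    return f"https://{mirror}/{article_url}"
```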

UI:
- Animated step list with spinner / checkmark per stage
- Progress bar and elapsed-time counter
- Honest error box with retry button instead of a blank page

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@wasi-master (Owner) commented May 11, 2026

Thanks for the PR; let me clear the merge conflicts.

Copilot AI left a comment

Pull request overview

This PR improves the 13ft “portable” Flask app by replacing the synchronous form POST UX with an SSE-driven loading/status view, and by adding multiple fallback sources (Freedium + archive services) to recover content when direct fetches hit anti-bot challenge pages (e.g., Cloudflare).

Changes:

  • Adds a client-side status UI that connects to a new SSE endpoint (/status/<url>) to show real-time progress and final content/error.
  • Adds challenge-page detection plus fallback fetchers for Medium via Freedium and general URLs via archive.org and archive.today mirrors.
  • Refactors the fetch worker to update job status and return results through the SSE stream.

Comment thread app/portable.py
from flask import request, Response
from bs4 import BeautifulSoup
from flask import request
from urllib.parse import urlparse, urljoin, quote
Comment thread app/portable.py
Comment on lines +343 to +345
area.innerHTML = '<div class="error-box"><strong>Something went wrong</strong>' +
message + '</div>' +
'<a href="/" class="retry-btn">Try Another URL</a>';
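Because message is concatenated straight into innerHTML, any markup it contains (for example, a URL or upstream error text echoed back) is parsed by the browser; the follow-up commit lists XSS among the issues it fixes. Setting the text with textContent on the client is the more direct remedy. If the innerHTML approach is kept instead, one server-side option is to escape the message before it goes into the SSE payload; error_event_payload is a hypothetical helper, not the fix that was merged:

```python
import html
import json


def error_event_payload(message: str) -> str:
    """JSON body for an SSE error event, with the message HTML-escaped.

    Hypothetical helper: only appropriate while the client inserts the
    message via innerHTML; combined with textContent it would display
    literal entities such as &lt;.
    """
    return json.dumps({"message": html.escape(message)})
```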
Comment thread app/portable.py
Comment on lines +363 to +373
const formData = new FormData();
formData.append('link', link);

const evtSource = new EventSource('/status/' + encodeURIComponent(link));
let currentStep = 0;

evtSource.addEventListener('step', function(e) {
    const data = JSON.parse(e.data);
    const idx = STEPS.findIndex(s => s.id === data.step);
    if (idx >= 0) {
        currentStep = idx;
Comment thread app/portable.py
Comment on lines +510 to +512
wayback_api = f"https://archive.org/wayback/available?url={url}"
try:
    meta = requests.get(wayback_api, timeout=15).json()
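Interpolating url directly into the query string means a target such as https://example.com/a?b=1&c=2 has its own & split off into a separate API parameter; the follow-up commit lists URL encoding among its fixes. A minimal sketch of an encoded query plus reading the closest snapshot, assuming the availability API's documented JSON shape (archived_snapshots → closest → available/url); wayback_query_url and closest_snapshot are hypothetical names:

```python
from typing import Optional
from urllib.parse import urlencode

WAYBACK_API = "https://archive.org/wayback/available"


def wayback_query_url(target: str) -> str:
    """Build the availability-API URL with the target percent-encoded."""
    # urlencode escapes ':', '/', '?', '&' and '=' in the value, so the
    # target's own query string stays inside the single url= parameter.
    query = urlencode({"url": target})
    return f"{WAYBACK_API}?{query}"


def closest_snapshot(meta: dict) -> Optional[str]:
    """Return the closest snapshot URL from an availability response, if any."""
    closest = meta.get("archived_snapshots", {}).get("closest") or {}
    if closest.get("available"):
        return closest.get("url")
    return None
```

With requests, passing params={"url": url} to requests.get has the same effect, since requests encodes the params dict itself.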
Comment thread app/portable.py
Comment on lines +520 to +525
archived_url = archived_url.replace("/web/", "/web/", 1)
archived_url = archived_url.replace(
    archived_url.split("/web/")[1].split("/")[0],
    archived_url.split("/web/")[1].split("/")[0] + "id_",
    1,
)
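The first replace above swaps "/web/" for the identical string, so it has no effect, and the second works only because the timestamp's first occurrence is the one right after /web/. The id_ flag placed after the timestamp asks the Wayback Machine for the archived page without its toolbar and link rewriting. A regex-based sketch of the same rewrite; to_raw_snapshot is a hypothetical name:

```python
import re

# Matches the "/web/<timestamp>/" segment of a Wayback URL; a URL that
# already carries "id_" after the timestamp does not match.
_WAYBACK_TS = re.compile(r"(/web/\d{1,14})/")


def to_raw_snapshot(archived_url: str) -> str:
    """Insert the id_ flag after the Wayback timestamp (first match only)."""
    return _WAYBACK_TS.sub(r"\1id_/", archived_url, count=1)
```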
Comment thread app/portable.py
set_step(job_id, 'fallback_ph')
for mirror in ARCHIVE_PH_MIRRORS:
    try:
        newest_url = f"https://{mirror}/newest/{url}"
Comment thread app/portable.py
Comment on lines +683 to +687
def generate():
    last_step = None
    while True:
        job = jobs.get(job_id)
        if not job:
Comment on lines +596 to +622
if not medium:
    archived_html, archived_url = fetch_via_archive_org(url, job_id)
    if archived_html and not is_challenge_page(archived_html):
        html_text = archived_html
        final_url = archived_url
        recovered = True

    if not recovered:
        archived_html, archived_url = fetch_via_archive_ph(url, job_id)
        if archived_html and not is_challenge_page(archived_html):
            html_text = archived_html
            final_url = archived_url
            recovered = True
else:
    archived_html, archived_url = fetch_via_archive_org(url, job_id)
    if archived_html and not is_challenge_page(archived_html):
        html_text = archived_html
        final_url = archived_url
        recovered = True

    if not recovered:
        archived_html, archived_url = fetch_via_archive_ph(url, job_id)
        if archived_html and not is_challenge_page(archived_html):
            html_text = archived_html
            final_url = archived_url
            recovered = True
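The two halves of this excerpt repeat the same fetch_via_archive_org / fetch_via_archive_ph sequence with identical checks, so the branch on medium appears to make no difference here; the follow-up commit lists deduplication among its fixes. One way to collapse the repetition is to iterate over the fetchers in order. A minimal sketch, assuming each fetcher has the (url, job_id) → (html, final_url) shape seen in the excerpt and returns a falsy html on failure; first_usable is a hypothetical name:

```python
from typing import Callable, Iterable, Optional, Tuple

# Shape inferred from the excerpt: fetcher(url, job_id) -> (html, final_url).
Fetcher = Callable[[str, str], Tuple[Optional[str], Optional[str]]]


def first_usable(url: str, job_id: str, fetchers: Iterable[Fetcher],
                 is_challenge: Callable[[str], bool]):
    """Try each fallback in order; return the first non-challenge result."""
    for fetch in fetchers:
        html_text, final_url = fetch(url, job_id)
        if html_text and not is_challenge(html_text):
            return html_text, final_url
    # Every source failed or returned a bot wall.
    return None, None
```

Called as first_usable(url, job_id, [fetch_via_archive_org, fetch_via_archive_ph], is_challenge_page), it would replace both branches, with recovered corresponding to a non-None html result.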

@wasi-master (Owner)

@copilot apply changes based on the comments in this thread

@wasi-master (Owner)

Copilot seems to be languid, so I'll fix the issues myself. Thanks!

wasi-master merged commit e1dd392 into wasi-master:main on May 11, 2026
5 checks passed
Copilot AI added a commit that referenced this pull request May 11, 2026
Agent-Logs-Url: https://github.com/wasi-master/13ft/sessions/62de0c61-bb98-4941-9db2-3e6b22b08033

Co-authored-by: wasi-master <63045920+wasi-master@users.noreply.github.com>
wasi-master added a commit that referenced this pull request May 11, 2026
Fix review issues from PR #43: XSS, duplicate imports, dead code, URL encoding, threading, deduplication

3 participants