🌐 Clone-It - Website Cloning Tool

A user-friendly command-line tool for cloning websites using wget with an interactive menu interface.

Features

Interactive Menu: Easy-to-use CLI interface with colored output
Domain Validation: Ensures valid domain format before proceeding
Project Organization: Creates organized folder structure in projects/domain.com
Live Output: Shows real-time wget progress and output
Post-Clone Options: View files, open folder, or clone another site
Error Handling: Graceful error handling with helpful messages

Prerequisites

wget: The tool requires wget to be installed
- macOS: brew install wget
- Ubuntu/Debian: sudo apt-get install wget
- CentOS/RHEL: sudo yum install wget

Installation

Clone or download the script to your desired location
Make it executable: chmod +x clone-it.sh
(Optional) Run the installer for easier access: ./install.sh

Usage

Direct execution:

./clone-it.sh

After installation:

clone-it

How It Works

Menu Interface: The script presents a clean, colored menu asking "What are we cloning today?"
Domain Input: Enter the domain you want to clone (e.g., example.com)
Validation: The script validates the domain format
Link Conversion Option: Choose whether to convert internal links to .html extension
Directory Creation: Creates projects/domain.com/ folder structure

Cloning Process: Runs wget with the following parameters:

wget --mirror -w 2 -p --html-extension --convert-links https://domain.com/

Link Processing: If enabled, converts internal links to work with .html extensions
Live Output: Shows the wget output in real-time within a bordered window
Completion Menu: Offers options to:
- Clone another website
- Open the project folder
- View project contents
- Exit
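
As a rough illustration of the validation step in the walkthrough above, the check could be a single regular expression applied before any directory is created. This is a minimal sketch, not the script's actual code: the function name is_valid_domain and the exact pattern are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if $1 looks like a bare domain (e.g. example.com).
is_valid_domain() {
    # Each label: letters, digits, hyphens; no leading or trailing hyphen.
    # The last label (TLD) must be at least two letters.
    printf '%s\n' "$1" |
        grep -Eq '^([A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,}$'
}

for candidate in example.com sub.example.co.uk http://example.com -bad.com example; do
    if is_valid_domain "$candidate"; then
        echo "valid:   $candidate"
    else
        echo "invalid: $candidate"
    fi
done
```

With this pattern, inputs that include a scheme such as http:// are rejected, which matches the "no http://" advice in Troubleshooting.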

wget Parameters Explained

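--mirror: Turns on recursion with unlimited depth and time-stamping, which suits mirroring a whole site
-w 2: Waits 2 seconds between downloads (respectful crawling)
-p: Downloads all page requisites (images, CSS, etc.)
--html-extension: Adds a .html extension to HTML files saved without one (newer wget releases call this option --adjust-extension)
--convert-links: Rewrites links in the downloaded files so they work for offline browsing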
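
For readers who prefer self-describing options, the same command can be spelled with wget's long option names. The sketch below only prints the command for a given domain rather than running it; the helper name print_wget_cmd is hypothetical and not part of clone-it.sh.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a long-option form of the command
# for a given domain instead of running it.
print_wget_cmd() {
    domain=$1
    # --wait=2            same as -w 2
    # --page-requisites   same as -p
    # --adjust-extension  current name of --html-extension
    printf 'wget --mirror --wait=2 --page-requisites --adjust-extension --convert-links https://%s/\n' "$domain"
}

print_wget_cmd example.com
```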

Project Structure

CLONE-IT/
├── clone-it.sh        # Main cloning script
├── fix-links.sh       # Link conversion utility
├── install.sh         # Installation helper
├── README.md          # This file
└── projects/          # Created when first used
    └── domain.com/    # Individual site folders
        └── domain.com/  # Actual site files
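
The nesting in the tree above can be reproduced with a single mkdir -p. In recursive mode wget stores files under a directory named after the host by default, which is presumably where the inner domain.com/ comes from. The sketch below assumes a helper named make_project_dir (hypothetical) and works in a throwaway directory.

```shell
#!/bin/sh
# Hypothetical helper: creates projects/<domain>/ under a base directory
# and prints the resulting path.
make_project_dir() {
    base=$1
    domain=$2
    dir="$base/projects/$domain"
    mkdir -p "$dir" || return 1
    printf '%s\n' "$dir"
}

# Demo in a temporary directory so nothing in the real tree is touched.
demo_base=$(mktemp -d)
trap 'rm -rf "$demo_base"' EXIT

project=$(make_project_dir "$demo_base" example.com)
echo "wget would run inside: $project"
```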

Error Handling

Checks for wget installation before running
Validates domain format
Handles existing directories with user confirmation
Reports wget exit codes and errors
Graceful handling of user cancellations
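
Two of the checks listed above are easy to sketch: testing that a command exists, and turning a wget exit status into a readable message. The function names below are hypothetical; the status meanings follow the exit statuses documented in the wget manual.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the named command is on PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Error: $1 not found. Install it with your package manager." >&2
        return 1
    fi
}

# Maps wget's documented exit statuses to short messages.
describe_wget_exit() {
    case $1 in
        0) echo "success" ;;
        1) echo "generic error" ;;
        2) echo "command-line parse error" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response" ;;
        *) echo "unknown exit status $1" ;;
    esac
}

require_cmd sh && echo "sh is available"
describe_wget_exit 4
describe_wget_exit 8
```

In a real run, the status passed to describe_wget_exit would be the value of $? captured immediately after the wget call.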

Colors and UI

The script uses ANSI color codes for better user experience:

🔵 Blue: Process information
🟢 Green: Success messages
🟡 Yellow: Warnings and prompts
🔴 Red: Error messages
🔷 Cyan: Headers and menus
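
A minimal sketch of how such a color scheme might be wired up, assuming the standard ANSI foreground codes (31 red, 32 green, 33 yellow, 34 blue, 36 cyan). The variable and function names are illustrative, and the actual script may use bold or bright variants instead.

```shell
#!/bin/sh
# Standard ANSI SGR foreground codes; names here are illustrative only.
ESC=$(printf '\033')
RED="${ESC}[31m"
GREEN="${ESC}[32m"
YELLOW="${ESC}[33m"
BLUE="${ESC}[34m"
CYAN="${ESC}[36m"
RESET="${ESC}[0m"

info()    { printf '%s%s%s\n' "$BLUE"   "$1" "$RESET"; }
success() { printf '%s%s%s\n' "$GREEN"  "$1" "$RESET"; }
warn()    { printf '%s%s%s\n' "$YELLOW" "$1" "$RESET"; }
error()   { printf '%s%s%s\n' "$RED"    "$1" "$RESET" >&2; }  # errors go to stderr
header()  { printf '%s%s%s\n' "$CYAN"   "$1" "$RESET"; }

header "What are we cloning today?"
info "Starting clone"
warn "Directory already exists"
success "Clone complete"
```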

Link Conversion Feature

The tool includes a link conversion feature to fix a common issue with cloned websites:

The Problem: When wget clones a site, it adds .html extensions to files that didn't originally have them. A page served at /office-visits is saved as /office-visits.html, but the HTML may still link to the original path without the extension, so the internal link breaks.

The Solution: When you choose "Y" for "Convert all internal links to .html?", the script will:

Scan all HTML files in the cloned site
Convert internal links from /page-name to /page-name.html
Handle both absolute and relative links
Preserve existing .html links unchanged
Create backups when using the standalone fix-links utility
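
The core of the steps above can be sketched with sed. This is a simplified illustration, not the code in fix-links.sh: the function names are hypothetical, and the pattern only handles root-relative href values (those starting with a single /), leaving relative links, query strings, and fragments alone.

```shell
#!/bin/sh
# Hypothetical sketch: append .html to extensionless root-relative hrefs in one file.
fix_links_in_file() {
    file=$1
    cp "$file" "$file.bak" || return 1   # keep a backup of the original
    # Match href="/path" where the path has no dot, query or fragment,
    # does not start with // and does not end with /.
    sed -E 's|href="(/[^"/.?#]([^".?#]*[^"/.?#])?)"|href="\1.html"|g' \
        "$file.bak" > "$file"
}

# Apply the fix to every .html file below a directory.
fix_links_in_dir() {
    find "$1" -type f -name '*.html' | while IFS= read -r f; do
        fix_links_in_file "$f"
    done
}

# Demo on a throwaway file.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf '%s\n' \
    '<a href="/office-visits">Visits</a>' \
    '<a href="/about.html">About</a>' \
    '<a href="/">Home</a>' \
    '<a href="https://other.example/x">External</a>' > "$demo/index.html"

fix_links_in_dir "$demo"
cat "$demo/index.html"
```

Only the first link in the demo file changes; links that already end in .html, the site root, and external URLs are left as they were.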

Standalone Link Fixer

For sites already cloned without link conversion, use the separate utility:

./fix-links.sh              # Interactive mode
./fix-links.sh example.com  # Direct mode

Tips

The script waits 2 seconds between requests to avoid overloading servers
Large sites may take considerable time to clone completely
Check robots.txt and site terms of service before cloning
The cloned site will work offline with converted links
Use link conversion if the original site used clean URLs without extensions

Troubleshooting

"wget not found": Install wget using your package manager
"Invalid domain": Ensure domain format like example.com (no http://)
"Permission denied": Make sure the script is executable (chmod +x)
Slow cloning: This is normal for large sites due to the respectful 2-second delay

Playwright Edition (bypasses JS challenges)

Some sites sit behind bot-mitigation challenges (Vercel, Cloudflare, etc.) that return HTTP 429 to wget because it can't execute JavaScript. For those sites, use clone-playwright.js, which drives a real Chromium browser so that the challenge is solved transparently.

Setup

npm install playwright
npx playwright install chromium

Usage

node clone-playwright.js <domain> [--max-pages=N] [--headful] [--delay=ms]

# Examples
node clone-playwright.js example.com
node clone-playwright.js example.com --max-pages=500
node clone-playwright.js example.com --headful   # watch it run

What it does

Launches headless Chromium with a real user-agent
BFS-crawls same-origin links from the landing page (cap: --max-pages)
Captures every successful response (CSS, JS, images, fonts, …) via the browser network layer and writes it to disk
Saves the rendered post-JS HTML for each page
Rewrites absolute URLs, root-relative paths, and internal links to relative local paths so the mirror works fully offline

Output

projects/<domain>/<domain>/
├── index.html
├── about-us.html
├── _next/...        # framework assets
└── ...
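
To make the layout above concrete, here is one way a URL path could map to a file name in the mirror. This is a guess at the scheme inferred from the example tree, written in shell for illustration; clone-playwright.js may apply different rules, and the function name local_path_for is hypothetical.

```shell
#!/bin/sh
# Hypothetical mapping from a URL path to a file name inside the mirror.
local_path_for() {
    path=${1#/}    # drop the leading slash
    case $path in
        ''|*/) printf '%sindex.html\n' "$path" ;;  # "/" or "/blog/" -> index.html in that directory
        *.*)   printf '%s\n' "$path" ;;            # crude check: any dot means "has an extension"
        *)     printf '%s.html\n' "$path" ;;       # clean URL -> add .html
    esac
}

local_path_for /
local_path_for /about-us
local_path_for /blog/
local_path_for /_next/static/app.css
```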

Flags

Flag	Default	Description
`--max-pages=N`	200	Hard cap on pages crawled
`--headful`	off	Show the browser window (debugging)
`--delay=ms`	500	Pause after `networkidle` per page

Viewing the clone

cd projects/<domain>/<domain>
python3 -m http.server 8000
# open http://localhost:8000/

Caveats

Lazy-loaded / scroll-triggered assets won't be captured unless you add a scroll step in the page loop.
URLs with query strings get a hashed suffix in the filename to avoid collisions (notably common with Next.js RSC payloads ?_rsc=...).
Use --headful the first time on a new site if anything looks wrong.
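
The hashed-suffix idea can be illustrated as follows. The naming pattern and the use of cksum are assumptions made for this sketch; the hash function and file name format used by clone-playwright.js are not specified here and may differ.

```shell
#!/bin/sh
# Hypothetical naming scheme: <path>__<checksum-of-query>.html
# cksum (the POSIX CRC utility) stands in for whatever hash the real script uses.
name_with_query_hash() {
    url_path=$1
    query=$2
    if [ -z "$query" ]; then
        printf '%s.html\n' "$url_path"
        return
    fi
    hash=$(printf '%s' "$query" | cksum | cut -d' ' -f1)
    printf '%s__%s.html\n' "$url_path" "$hash"
}

name_with_query_hash about ""
name_with_query_hash about "_rsc=abc"
name_with_query_hash about "_rsc=xyz"
```

The same path fetched with two different query strings gets two different file names, so one response does not overwrite the other.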

License

Free to use and modify as needed.
