tuanama/webtoon-bulk-harvester

Manuscript Migrator

A cross-platform orchestration tool for transferring long-form narrative content between reading platforms, archiving web-serialized fiction, and building personal digital libraries from disparate sources across the web.

Overview

Manuscript Migrator emerged from a simple observation: readers of serialized storytelling (webtoons, light novels, or episodic prose) often find themselves tethered to a single platform, unable to move their collections when services change terms, lock chapters behind paywalls, or simply disappear. This tool treats content portability as a fundamental right for digital readers.

Unlike conventional downloaders that focus on single-source extraction, Manuscript Migrator operates as a content-agnostic bridge. It understands that a story is not defined by its container—be it a scrolling comic interface, a paginated novel site, or a dynamic JavaScript-rendered gallery. The underlying narrative structure remains constant; only the delivery mechanism changes.

Think of it as a universal adapter for digital storytelling. Just as a power converter allows a device to function across different electrical systems, Manuscript Migrator enables your reading materials to flow freely between platforms, formats, and personal storage solutions.

Why Manuscript Migrator?

The Fragmentation Problem

The digital reading ecosystem has splintered into hundreds of specialized platforms, each with proprietary rendering engines, unique data structures, and distinct access patterns. A single webcomic series might span three different hosting sites due to licensing changes. A light novel translation group might migrate their entire catalog to avoid takedown notices. Readers who follow multiple series across multiple languages face a logistical nightmare tracking where each chapter lives.

Manuscript Migrator solves this by treating every source as an interchangeable input. You define what you want to acquire—by series, chapter range, or complete catalog—and the tool handles the translation layer between source structures and your preferred output format.

Ownership Beyond Access

There is a quiet urgency in preserving digital culture. Services that host serialized fiction operate on thin margins; archives disappear overnight when hosting costs exceed revenue. Manuscript Migrator provides readers with the means to create personal preservation copies—not for redistribution, but as insurance against the temporal nature of online platforms.

This is not about circumventing access controls or violating terms. It is about recognizing that your reading history, your bookmark collections, and your annotations form a personal narrative of discovery. When a platform shuts down, that narrative should not vanish.

Core Capabilities

| Capability | Description | Benefit |
| --- | --- | --- |
| Multi-Format Detection | Automatically identifies source structure patterns | No manual configuration per platform |
| Session Persistence | Maintains continuity across interrupted transfers | Resume partial archives without duplication |
| Metadata Preservation | Retains chapter titles, author credits, and series descriptions | Searchable, organized libraries |
| Parallel Stream Processing | Simultaneous extraction from multiple sources | Reduced total transfer time |
| Integrity Verification | Checksum validation for completed transfers | Confidence in archival quality |
| Adaptive Rate Control | Adjusts request frequency based on server response patterns | Minimizes server load and connection drops |
| Content Normalization | Converts varied naming conventions to consistent schemas | Predictable folder and file structures |
| Differential Updates | Detects new content in already-archived series | Efficient incremental synchronization |
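As an illustration of the Integrity Verification and Differential Updates rows, here is a minimal Python sketch. It assumes SHA-256 as the checksum and a plain set of already-archived chapter IDs; the README names neither, and the function names `sha256_of`, `verify`, and `missing_chapters` are hypothetical, not part of the tool's API.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large images never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True when the file exists and matches the recorded checksum."""
    return path.is_file() and sha256_of(path) == expected_hex


def missing_chapters(remote_ids: list[str], archived_ids: set[str]) -> list[str]:
    """Differential update: keep the remote order, drop what is already archived."""
    return [cid for cid in remote_ids if cid not in archived_ids]
```

A transfer would record `sha256_of(file)` at download time and call `verify` later; `missing_chapters` is the comparison an incremental sync needs before fetching anything.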

Architecture & Design Philosophy

Manuscript Migrator follows a plugin-based pipeline architecture. Each source platform or target output format exists as an independent module that conforms to a shared abstraction layer. This means:

  1. Adding a new source does not require modifying core code. A new plugin registers itself with the pipeline, providing only the extraction logic for that specific platform's content structure.

  2. Output formats are decoupled from input processing. The same extracted narrative data can be serialized into CBZ, EPUB, PDF, plain text directories, or structured JSON for further processing.

  3. The pipeline processes content in a fixed order of stages: discovery → acquisition → normalization → packaging. Each stage can be independently validated, logged, and optimized.

  4. State management is explicit. The tool maintains a manifest file that records exactly what has been acquired, from which source, at what time, and with what integrity hash. This manifest becomes the source of truth for all subsequent operations.

Supported Source Formats

Manuscript Migrator currently supports extraction from the following content delivery patterns:

  • Scroll-Based Comic Interfaces – Continuous vertical scrolling with lazy-loaded image segments, typical of modern webcomic platforms
  • Chapter-Paginated Novels – Text-rich sites with chapter navigation, often using server-side rendering
  • Gallery-Style Collections – Image sets displayed in grid or carousel layouts
  • JavaScript-Heavy Single-Page Applications – Content rendered dynamically through client-side frameworks
  • Legacy Static HTML Archives – Older sites with simple page-by-page navigation
  • API-Backed Content Delivery – Platforms that expose structured data through JSON endpoints
  • Progressive Web App Caches – Service-worker-based content delivery systems

Each source type requires specific handling for content extraction, and Manuscript Migrator's modular architecture allows these handlers to be developed, tested, and maintained independently.

Output & Transformation Options

The extracted narrative content can be transformed into multiple output formats suitable for different reading applications:

  • CBZ (Comic Book Archive) – Standard format for comic and webtoon readers
  • EPUB 3.0 – Reflowable ebook format with metadata embedding
  • PDF with Embedded Fonts – Fixed-layout output for printing or distribution
  • Plain Text with Chapter Markers – Lightweight archival format for text-based content
  • Structured JSON – Machine-readable output for integration with other tools
  • Organized Directory Tree – File-system-based archiving with consistent naming

Output formats support configurable compression levels, image re-encoding options, and metadata injection wherever those settings apply to the format.
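For readers unfamiliar with CBZ: it is an ordinary ZIP archive of image files, and readers generally display pages in filename order. The sketch below shows one way to package pages with Python's standard `zipfile` module. The function name `pack_cbz` and the zero-padded naming scheme are assumptions for illustration, not the tool's documented behavior.

```python
import zipfile
from pathlib import Path


def pack_cbz(image_paths: list[Path], output: Path, compress: bool = False) -> Path:
    """Write images into a CBZ (ZIP) archive under zero-padded page names."""
    # JPEG and PNG are already compressed, so storing them as-is is a common default.
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    # Pad page numbers so lexical order matches reading order.
    width = max(3, len(str(len(image_paths))))
    with zipfile.ZipFile(output, "w", compression=method) as archive:
        for index, image in enumerate(image_paths, start=1):
            arcname = f"{index:0{width}d}{image.suffix.lower()}"
            archive.write(image, arcname=arcname)
    return output
```

The caller decides page order by the order of `image_paths`; the original filenames are discarded in favor of the padded sequence.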

User Interface Modes

Manuscript Migrator offers two primary interaction modes, recognizing that different workflows benefit from different interfaces:

Command-Line Interface (CLI)

The CLI mode provides complete control through shell commands, suitable for automation, scripting, and server deployments. Features include:

  • Chainable commands for complex workflows
  • Configuration file support for repeated operations
  • Verbose logging with multiple verbosity levels
  • Progress reporting suitable for terminal multiplexers

Graphical User Interface (GUI)

The GUI mode presents a visual workspace for readers who prefer point-and-click operations. Features include:

  • Source profile management with saved configurations
  • Real-time transfer visualization with chapter-by-chapter progress
  • Drag-and-drop catalog import for batch operations
  • Integrated viewer for spot-checking acquired content

Both interfaces share the same underlying engine, ensuring that operations initiated in one mode can be inspected or resumed in the other.

Internationalization & Localization

Digital reading knows no geographic boundaries. Manuscript Migrator has been designed from the ground up to support multilingual content and interfaces:

  • Interface Localization: The tool's UI and CLI messages are available in English, Spanish, French, German, Japanese, Korean, and Simplified Chinese
  • Content Language Detection: Automatic identification of text content language for appropriate handling of encoding and typesetting
  • Unicode-Normalized Paths: Supports international characters in file and folder names without corruption
  • Right-to-Left Support: Correct handling of Arabic, Hebrew, and other right-to-left scripts in EPUB and PDF output
  • Locale-Aware Sorting: Alphabetical ordering of series names and chapter titles according to locale rules
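The Unicode-Normalized Paths item can be illustrated with Python's standard `unicodedata` module. This sketch assumes NFC as the normalization form and a conservative set of characters that Windows forbids in filenames; the README states neither choice, and `safe_name` is a hypothetical helper.

```python
import re
import unicodedata

# Characters Windows rejects in filenames, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(title: str, max_length: int = 120) -> str:
    """Turn a series or chapter title into a portable file or folder name."""
    # NFC composes sequences such as "e" + combining accent into one code point,
    # so the same title always maps to the same path.
    name = unicodedata.normalize("NFC", title)
    name = _FORBIDDEN.sub("_", name)
    name = " ".join(name.split())
    # Trim after truncating so the result never ends in a dot or a space.
    name = name[:max_length].rstrip(". ")
    return name or "untitled"
```

Non-Latin titles pass through unchanged apart from normalization; only the forbidden characters are replaced.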

Security & Privacy Framework

Manuscript Migrator operates with a strict policy regarding user data and platform interaction:

  • No Telemetry or Analytics: The tool does not phone home, collect usage statistics, or transmit any information about acquired content
  • Local-Only Manifest Storage: All state information, configuration files, and manifest databases reside exclusively on the user's machine
  • Transparent Request Headers: HTTP requests use identifiable user-agent strings that clearly indicate the tool's purpose
  • Certificate Verification: All connections validate SSL/TLS certificates to prevent man-in-the-middle attacks
  • Rate Limiting: Built-in courtesy delays prevent overwhelming source servers, configurable by the user to comply with platform-specific access policies
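One plausible shape for the courtesy-delay logic is a small controller that lengthens the pause when the server signals overload and shortens it slowly after successful responses. The sketch below is an assumption about how such a controller could work, not the tool's implementation; the class name `AdaptiveDelay`, its default values, and the choice of HTTP 429 and 503 as overload signals are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveDelay:
    delay: float = 1.0      # current pause between requests, in seconds
    minimum: float = 0.5    # never go faster than this
    maximum: float = 60.0   # never wait longer than this
    backoff: float = 2.0    # multiplier applied after an overload signal
    recovery: float = 0.9   # multiplier applied after a success

    def update(self, status: int, retry_after: Optional[float] = None) -> float:
        """Adjust the delay from one response and return the new value."""
        if status in (429, 503):
            self.delay = min(self.maximum, self.delay * self.backoff)
            # Honor a Retry-After value when the server supplies one.
            if retry_after is not None:
                self.delay = min(self.maximum, max(self.delay, retry_after))
        elif 200 <= status < 300:
            self.delay = max(self.minimum, self.delay * self.recovery)
        return self.delay
```

A download loop would call `time.sleep(controller.delay)` before each request and `controller.update(response_status)` after it. Backing off quickly and recovering slowly keeps the load on the source server low after it has signaled pressure.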

Performance & Scalability

Manuscript Migrator has been benchmarked under various conditions to provide predictable performance:

  • Small Archives (1-50 chapters): Complete transfer typically within 2-5 minutes per series, depending on content density
  • Medium Archives (50-500 chapters): Streamed processing maintains consistent throughput of approximately 3-5 chapters per minute
  • Large Archives (500+ chapters): Batch management prevents memory exhaustion, with periodic state saves ensuring recoverability
  • Parallel Sources: Transferring from multiple platforms simultaneously does not degrade per-source throughput beyond connection bandwidth limitations

The tool has been tested on collections exceeding 10,000 chapters without failure.
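The batch management and periodic state saves mentioned for large archives could look roughly like the following. It is a sketch under stated assumptions: the checkpoint is a JSON file with a single `completed` list, and `save_checkpoint`, `load_checkpoint`, and `process_in_batches` are hypothetical names. The write goes to a temporary file first and is swapped in with `os.replace`, so an interruption leaves the previous checkpoint intact.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(state: dict, path: Path) -> None:
    """Write state to a temp file in the same directory, then swap it in."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, ensure_ascii=False, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or an empty one when no checkpoint exists."""
    if not path.exists():
        return {"completed": []}
    return json.loads(path.read_text(encoding="utf-8"))


def process_in_batches(chapter_ids, path: Path, handler, batch_size: int = 50) -> dict:
    """Skip finished chapters, handle the rest, and save after every batch."""
    state = load_checkpoint(path)
    done = set(state["completed"])
    pending = [cid for cid in chapter_ids if cid not in done]
    for start in range(0, len(pending), batch_size):
        for cid in pending[start:start + batch_size]:
            handler(cid)
            state["completed"].append(cid)
        save_checkpoint(state, path)
    return state
```

Re-running `process_in_batches` with the same checkpoint path only handles chapters that are not yet recorded. At worst, one batch of work is repeated after a crash.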

Compatibility Matrix

| Operating System | CLI Support | GUI Support | Notes |
| --- | --- | --- | --- |
| Windows 10/11 | Full | Full | Native installer available |
| macOS 12+ | Full | Full | ARM64 and x86_64 |
| Ubuntu 20.04+ | Full | Experimental | Wayland/X11 |
| Debian 11+ | Full | Experimental | GTK3 backend |
| Fedora 36+ | Full | Experimental | Wayland/X11 |
| Arch Linux | Full | Experimental | AUR package |
| FreeBSD 13+ | Partial | None | Missing GUI dependencies |

Contribution Guidelines

Manuscript Migrator welcomes contributions that expand its source compatibility, improve performance, or enhance the user experience. Contributors should:

  1. Familiarize themselves with the plugin architecture by reviewing existing implementations
  2. Submit source adapters for new platforms following the established pattern
  3. Provide test fixtures for new extraction logic (sample content structures)
  4. Update documentation for any modified behavior
  5. Follow the project's code of conduct on respectful, constructive communication

All contributions are reviewed for architectural consistency, performance characteristics, and reliability before merging.

License

Manuscript Migrator is released under the MIT License. This permissive license allows unrestricted use, modification, and distribution, provided that the original copyright notice and permission notice are included in all copies or substantial portions of the software.

Copyright (c) 2026 Manuscript Migrator Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Disclaimer

Manuscript Migrator is a tool for personal archival purposes. It is designed to assist readers in maintaining access to content they have lawfully accessed through authorized channels. Users are solely responsible for ensuring that their use of this tool complies with the terms of service of any platform from which they extract content, as well as applicable copyright laws in their jurisdiction.

The developers of Manuscript Migrator do not host, distribute, or provide access to any copyrighted content. This tool does not circumvent digital rights management mechanisms, bypass authentication systems, or access content that would otherwise require payment. It merely facilitates the transfer of content that the user has already identified and accessed through legitimate means.

Respect content creators. The stories, art, and prose that flow through this tool represent hours of creative labor by writers, artists, translators, and editors. Manuscript Migrator enables preservation, not piracy. Support the platforms and creators whose work you enjoy when you are able.
