TextCleaner is a utility, available as a web app, a Chrome extension, and an Android app, for cleaning noisy copied text before you paste it into an LLM, issue comment, document, or note.
It removes obvious site chrome from copied content such as GitHub pull requests, GitHub issues, documentation pages, articles, and chat transcripts while preserving the lines that matter.
| Implementation | Best for | Install |
|---|---|---|
| Web app (GitHub Pages) | Quick one-off cleaning in a browser tab, no install required | Open TextCleaner |
| Chrome extension | Cleaning text directly from any tab without leaving your browser | Releases → Browser Extension – latest build |
| Android app | Cleaning text on mobile via share-target or direct paste | Releases → Android – latest build |
- Heuristic cleanup with no backend or hosted AI dependency
- Source auto-detection for:
- GitHub pull requests
- GitHub issues
- Documentation pages
- Articles
- Chat transcripts
- Generic copied text
- Multiple output modes:
- Cleaned text
- Markdown excerpt
- Reusable prompt text
- Local history in the web and Android apps
- Chrome popup workflow with active-tab text selection import
- Export helpers for copy and download/save
- Normalize pasted text (line endings, Unicode noise, no-break spaces).
- Detect the source type when auto-detect is enabled.
- Protect valuable context blocks before any removal runs: code fences are always protected, and rule sets can also declare preserveBlockPatterns (e.g. diff hunks) that are immune to all cleanup rules.
- Clean: remove noisy prefix/suffix chrome, structural bot sections (blockPatterns), and repeated UI lines anywhere in the body.
- Format: collapse blank lines, then return cleaned text, a Markdown excerpt, and a reusable LLM prompt.
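The normalize, protect, clean, and format steps above can be sketched roughly as follows. This is a minimal illustration, not the code in src/core/engine.ts: the function names are borrowed from the project layout, but the bodies, the signatures, and the `noise` parameter are assumptions made for the example.

```typescript
// Illustrative sketch only; the real engine may differ in every detail.

/** Normalize line endings, no-break spaces, and a few invisible characters. */
function normalizeText(input: string): string {
  return input
    .replace(/\r\n?/g, "\n") // CRLF and lone CR become LF
    .replace(/\u00a0/g, " ") // no-break space becomes a regular space
    .replace(/[\u200b\ufeff]/g, ""); // drop zero-width space and BOM
}

/** Mark every line that sits inside a fenced code block (fence lines included). */
function computeProtectedLines(lines: string[]): boolean[] {
  const fence = /^\s*`{3}/; // a line that opens or closes a code fence
  const isProtected: boolean[] = [];
  let inFence = false;
  for (let i = 0; i < lines.length; i++) {
    const isFenceLine = fence.test(lines[i]);
    isProtected.push(isFenceLine || inFence);
    if (isFenceLine) inFence = !inFence;
  }
  return isProtected;
}

/**
 * Drop unprotected lines that match a noise pattern, then collapse blank runs.
 * `noise` is a hypothetical parameter; pass non-global regular expressions.
 */
function cleanText(input: string, noise: RegExp[]): string {
  const lines = normalizeText(input).split("\n");
  const isProtected = computeProtectedLines(lines);
  const kept = lines.filter(
    (line, i) => isProtected[i] || !noise.some((re) => re.test(line.trim()))
  );
  return kept.join("\n").replace(/\n{3,}/g, "\n\n").trim();
}
```

Computing the protected lines before filtering is what keeps a noisy-looking line inside a code fence from being removed by a rule that would otherwise match it.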
- GitHub pull requests
- GitHub issues
- Documentation pages
- Articles
- Chat transcripts
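Source detection for these types could work along the following lines. This is a hedged sketch: the `SourceType` names and every signal pattern below are hypothetical and chosen only for illustration; the actual heuristics in src/core/detector.ts are not described here and may look quite different.

```typescript
// Hypothetical type names and signal patterns, for illustration only.
type SourceType = "github-pr" | "github-issue" | "docs" | "chat" | "generic";

interface Signal {
  type: SourceType;
  pattern: RegExp; // non-global, so test() keeps no state between calls
}

// Ordered from most to least specific; the first match wins.
const SIGNALS: Signal[] = [
  { type: "github-pr", pattern: /wants to merge \d+ commits?/i },
  { type: "github-issue", pattern: /opened this issue/i },
  { type: "chat", pattern: /^(You|User|Assistant):/m },
  { type: "docs", pattern: /^(On this page|Table of contents)$/im },
];

/** Return the first source type whose signal appears in the text. */
function detectSource(text: string): SourceType {
  for (const signal of SIGNALS) {
    if (signal.pattern.test(text)) return signal.type;
  }
  return "generic"; // fall back to the generic rule set
}
```

Articles are left out of this sketch because they have no single obvious marker; a real detector would likely combine several weak signals rather than rely on one pattern.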
- Cleanup is conservative by design and may leave some page-specific noise behind.
- Extremely unusual site layouts may still need manual touch-up.
- Markdown output is a cleaned wrapper around the extracted text, not a full semantic reconstruction of the original page.
npm ci
npm run dev
The Vite dev server starts on port 3000.
npm run lint
npm run build
npm test
The repository includes a GitHub Pages workflow that:
- installs dependencies,
- builds the Vite app,
- uploads the dist/ artifact, and
- deploys it to GitHub Pages.
The workflow sets VITE_BASE_PATH=/TextCleaner/ so built assets resolve correctly under the project page URL.
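As an illustration of why the base path matters, the helper below turns the environment value into a value suitable for Vite's `base` option. The helper name `resolveBasePath` and the way the project's Vite config reads the variable are assumptions; only the variable name and its value come from the workflow described above.

```typescript
// Hypothetical helper; the project's actual vite.config.ts may read the variable differently.
function resolveBasePath(env: Record<string, string | undefined>): string {
  const raw = (env.VITE_BASE_PATH || "").trim();
  if (raw === "") return "/"; // unset (e.g. local dev): serve from the site root

  // Make sure the path starts and ends with a slash so asset URLs join cleanly.
  const withLeading = raw.charAt(0) === "/" ? raw : "/" + raw;
  return withLeading.charAt(withLeading.length - 1) === "/"
    ? withLeading
    : withLeading + "/";
}
```

With `VITE_BASE_PATH=/TextCleaner/` the helper returns `/TextCleaner/`, so a built asset such as `assets/index.js` would be requested under the project page path instead of the domain root.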
The repository also includes a Chrome-based browser extension implementation in chome/.
Download the latest extension:
Go to Releases → Browser Extension – latest build and download TextCleaner-extension.zip.
Unzip it, open chrome://extensions, enable Developer mode, and click Load unpacked to select the folder.
npm ci
npm run dev:chrome
npm run build:chrome
Load chome/dist as an unpacked extension in a Chromium-based browser.
The popup can import the current tab selection, clean it with the shared TypeScript engine, keep a local history, and export the cleaned result.
- src/App.tsx: main UI
- chome/App.tsx: Chrome extension popup UI
- chome/public/manifest.json: Chrome extension manifest
- src/core/engine.ts: cleanup pipeline (normalizeText, computeProtectedLines, removeBlocks, cleanMiddle, cleanText)
- src/core/types.ts: shared types (BlockPattern, CleanupRuleSet with blockPatterns / preserveBlockPatterns)
- src/core/detector.ts: source detection
- src/core/rules/generic.ts: generic cleanup rules
- src/core/rules/github.ts: GitHub-specific cleanup rules
- src/core/rules/docs.ts: documentation cleanup rules
- src/core/rules/article.ts: article cleanup rules
- src/core/rules/chat.ts: chat cleanup rules
- src/core/__tests__/engine.test.ts: cleanup tests
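To make the relationship between blockPatterns and preserveBlockPatterns concrete, here is one way the shared types and block removal might fit together. The field names `start` and `end`, the helper `markBlocks`, and the signature of `removeBlocks` are assumptions for this sketch; the real definitions in src/core/types.ts and src/core/engine.ts may differ.

```typescript
// Assumed shapes, for illustration only.
interface BlockPattern {
  start: RegExp; // matches the first line of a block
  end: RegExp; // matches the last line of a block (inclusive)
}

interface CleanupRuleSet {
  blockPatterns: BlockPattern[]; // blocks to remove
  preserveBlockPatterns: BlockPattern[]; // blocks that must never be stripped
}

/** Mark each line that falls inside a complete start..end block. */
function markBlocks(lines: string[], patterns: BlockPattern[]): boolean[] {
  const marked = lines.map(() => false);
  for (const p of patterns) {
    let i = 0;
    while (i < lines.length) {
      if (!p.start.test(lines[i])) {
        i++;
        continue;
      }
      let j = i + 1;
      while (j < lines.length && !p.end.test(lines[j])) j++;
      if (j === lines.length) break; // unterminated block: leave it alone
      for (let k = i; k <= j; k++) marked[k] = true;
      i = j + 1;
    }
  }
  return marked;
}

/** Remove blockPatterns matches unless the line is also inside a preserved block. */
function removeBlocks(lines: string[], rules: CleanupRuleSet): string[] {
  const preserved = markBlocks(lines, rules.preserveBlockPatterns);
  const removable = markBlocks(lines, rules.blockPatterns);
  return lines.filter((_, i) => preserved[i] || !removable[i]);
}
```

Leaving unterminated blocks untouched matches the conservative stance of the tool: when the end of a block cannot be found, keeping too much is safer than deleting real content.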
Use this prompt after cleaning repository text with TextCleaner:
You are reviewing cleaned repository text. Identify the files that are most important for understanding the feature, bug, or change being discussed.
Return:
1. the likely key files,
2. why each file matters,
3. the order I should read them in,
4. any missing files or tests I should look for next.
Be concise and focus on implementation-critical files, not boilerplate.
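If you want to combine a prompt like this with cleaned text programmatically, a small helper is enough. The function name `buildPrompt` and the delimiter lines are hypothetical and are not part of TextCleaner's documented output format.

```typescript
// Hypothetical helper; the delimiters are an arbitrary choice for this example.
function buildPrompt(instructions: string, cleanedText: string): string {
  return [
    instructions.trim(),
    "",
    "--- BEGIN CLEANED TEXT ---",
    cleanedText.trim(),
    "--- END CLEANED TEXT ---",
  ].join("\n");
}
```

Explicit begin and end markers make it easier for the model to tell the instructions apart from the pasted material, which helps when the cleaned text itself contains imperative sentences.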
The repository includes a build-extension.yml workflow that:
- installs dependencies,
- builds the Chrome extension (chome/dist),
- zips chome/dist into TextCleaner-extension.zip,
- attaches it to the rolling pre-release tag extension-latest, and
- uploads it as a workflow artifact (retained for 30 days).
The release is updated automatically on every push to main.
A native Android app lives in the android/ directory.
Download the latest APK:
Go to Releases → Android – latest build and download TextCleaner.apk.
Enable Install unknown apps on your device before opening the file.
Requires JDK 17 and the Android SDK (API 35).
cd android
./gradlew testDebugUnitTest assembleCi
# APK: app/build/outputs/apk/ci/app-ci.apk
The repository includes a build-android.yml workflow that:
- runs the Android unit tests,
- builds a minified resource-shrunk CI APK signed with the debug key,
- attaches it to the rolling pre-release tag android-latest, and
- uploads it as a workflow artifact (retained for 30 days).
The release is updated automatically on every push to main.
- React + TypeScript frontend
- Chrome extension popup in chome/ (React + TypeScript, manifest v3)
- Vite build pipeline
- Pure rule-based cleanup engine in src/core/
- Static hosting via GitHub Pages
- Android native app in android/ (Kotlin + Jetpack Compose)
- Polish the web app
- Improve cleanup quality for real copied content
- Ship and maintain the GitHub Pages deployment
- Android native app (Kotlin + Jetpack Compose)
- Kotlin rewrite / port of the full cleanup engine
- Native share-target flow for Android (ACTION_SEND and PROCESS_TEXT intent filters)
- Explicit "protect blocks" pipeline step: computeProtectedLines runs before any removal; the preserveBlockPatterns field lets rule sets declare content that must never be stripped
- Browser extension CI pipeline with rolling pre-release artifact (extension-latest)
Issues, suggestions, and pull requests are welcome: