XPfilter

XPfilter is a media downloading pipeline for Telegram (with possible support for more platforms in the future) that filters downloads with a scorer.

Its core idea is: download candidate media as broadly as possible, score each item one by one with a scorer, and only materialize content that reaches the threshold into the target media library.
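
The gating idea can be sketched in a few lines. This is only an illustration under assumptions, not the project's implementation: the `gate` function, the `score_fn` callable, and the sample scores are hypothetical stand-ins for a trained scorer.

```python
from typing import Callable, Iterable, List, Tuple


def gate(
    candidates: Iterable[str],
    score_fn: Callable[[str], float],
    min_score: float,
) -> Tuple[List[str], List[str]]:
    """Split candidates into (kept, rejected) by comparing each score to min_score."""
    kept, rejected = [], []
    for item in candidates:
        # Items are scored one by one; only those reaching the threshold are kept.
        if score_fn(item) >= min_score:
            kept.append(item)
        else:
            rejected.append(item)
    return kept, rejected


# Hypothetical scores standing in for a trained scorer.
fake_scores = {"a.jpg": 8.2, "b.jpg": 3.1, "c.mp4": 7.0}
kept, rejected = gate(fake_scores, fake_scores.get, min_score=7.0)
print(kept)      # ['a.jpg', 'c.mp4']
print(rejected)  # ['b.jpg']
```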

What this project is really for

This project provides a complete toolchain covering labeling, training, inference, and the downloader.

The full workflow is:

Use the WebUI labeling tool to import and preview media from a specified folder, label them directly, store the labeling results in the database, and export them at any time.
Once the number of labeled items exceeds 1000 and you believe the high-score / low-score distribution is reasonably balanced, you can export to labels.json and run training. Training will produce a scorer aligned with your own labels.
The project provides multiple inference scripts that can create score-based soft-link buckets for local folders, helping you directly stratify content into layers like “which files in this folder I like” and “which files I do not like.” This can also be used to evaluate whether your scoring model matches your expectations.
The project integrates a Telegram downloader. The downloader continuously downloads all media files available to the account, then filters them locally through the scorer and keeps only media above a specific threshold, which gives you preference-based, high-quality media downloading.
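
The soft-link bucketing mentioned in the workflow above might look roughly like this. It is a sketch under assumptions: the bucket naming scheme, the bucket width of 2, and the helper names `bucket_name` and `link_into_buckets` are hypothetical, and the project's own inference scripts may organize buckets differently.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def bucket_name(score: float, width: int = 2) -> str:
    """Map a score to a bucket label such as 'score_6-8' (hypothetical naming)."""
    low = int(score // width) * width
    return f"score_{low}-{low + width}"


def link_into_buckets(scores: Dict[Path, float], bucket_root: Path) -> None:
    """Create one symlink per file inside the bucket that matches its score."""
    for src, score in scores.items():
        dest_dir = bucket_root / bucket_name(score)
        dest_dir.mkdir(parents=True, exist_ok=True)
        link = dest_dir / src.name
        if not link.is_symlink() and not link.exists():
            # Link to the absolute source path; the original file is never moved.
            os.symlink(src.resolve(), link)


# Demo on a throwaway directory with made-up scores.
with tempfile.TemporaryDirectory() as tmp:
    media = Path(tmp) / "media"
    media.mkdir()
    scores = {}
    for name, score in [("liked.jpg", 8.2), ("disliked.jpg", 3.1)]:
        path = media / name
        path.write_bytes(b"")
        scores[path] = score
    buckets = Path(tmp) / "buckets"
    link_into_buckets(scores, buckets)
    created = sorted(
        p.relative_to(tmp).as_posix() for p in buckets.rglob("*") if p.is_symlink()
    )
    print(created)
    # ['buckets/score_2-4/disliked.jpg', 'buckets/score_8-10/liked.jpg']
```

Because the buckets contain only links, browsing them is a cheap way to check whether the scorer's ranking matches your expectations without touching the source folder.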

The core is:

Use a scorer to drive download filtering and automatically build a cleaner, higher-quality media collection.

How to use it

The most suitable way to use this project is to connect it to an agent and let the agent provide the commands and instructions for each stage. The documentation in this project is detailed enough for an agent to work from.

Repository structure

configs/        runtime and training config
docs/           detailed documentation
scripts/        training / download / rebucket / cleanup scripts
src/            API, model, training, storage, services
tg_downloader/  Telegram gated download implementation
tests/          tests
webui/          frontend labeling and pipeline UI

Quick start

Install dependencies

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Before starting the frontend for the first time, install frontend dependencies:

cd webui
npm install
cd ..

Start backend + WebUI

./start.sh

Optional modes:

./start.sh api
./start.sh frontend
./start.sh --build-webui
./start.sh --del-cache

Default ports:

backend: 31211
frontend: 31212

Typical usage path

1. Label data

Open:

http://localhost:31212/label

Export labels.json.
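
Before training, it can help to check the size and balance of the exported labels. The sketch below assumes a schema that this README does not specify: a JSON list of objects with `path` and `score` keys. The `summarize_labels` helper and the 7.0 split point are likewise hypothetical, so adapt them to the actual export format.

```python
import json
from collections import Counter
from typing import Dict, List


def summarize_labels(records: List[dict], threshold: float = 7.0) -> Dict[str, int]:
    """Count labeled items at or above vs. below a score threshold."""
    counts = Counter("high" if r["score"] >= threshold else "low" for r in records)
    return {"total": len(records), "high": counts["high"], "low": counts["low"]}


# Assumed schema: a JSON list of {"path": ..., "score": ...} objects.
sample = json.loads(
    '[{"path": "a.jpg", "score": 9},'
    ' {"path": "b.jpg", "score": 2},'
    ' {"path": "c.jpg", "score": 7}]'
)
summary = summarize_labels(sample)
print(summary)  # {'total': 3, 'high': 2, 'low': 1}
```

If the real file matches this shape, the same function can be applied to the parsed contents of labels.json to confirm that the total exceeds 1000 and that neither side dominates.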

2. Train the scorer

python scripts/train_frozen_clip.py \
  --labels_path labels.json \
  --output_dir checkpoints/frozen_clip \
  --epochs 10 \
  --batch_size 16 \
  --learning_rate 1e-4 \
  --clip_model_name openai/clip-vit-large-patch14

3. Run gated download

python scripts/run_tg_gated_download.py --min-score 7.0

4. Run the full Telegram pipeline

python scripts/run_telegram_global_pipeline.py --min-score 7.0

The full orchestration can run, in order:

Telegram gated download
Optional backfill inference
Score-based rebucketing
Optional cleanup of files below the threshold
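
The stage ordering above can be pictured as a fixed list in which the optional stages are switched on or off. The stage names and the `build_stages` helper are hypothetical and only illustrate the ordering; they are not the options of `run_telegram_global_pipeline.py`.

```python
from typing import List


def build_stages(backfill: bool, cleanup: bool) -> List[str]:
    """Return the ordered stage names, skipping optional stages that are disabled."""
    stages = [
        ("gated_download", True),            # always runs first
        ("backfill_inference", backfill),    # optional
        ("rebucket", True),                  # always runs
        ("cleanup_below_threshold", cleanup),  # optional, runs last
    ]
    return [name for name, enabled in stages if enabled]


print(build_stages(backfill=False, cleanup=True))
# ['gated_download', 'rebucket', 'cleanup_below_threshold']
```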

Notes

This repository publishes the baseline implementation of Frozen CLIP scoring + Telegram gated download + API/WebUI.
Model checkpoints, Telegram sessions, local databases, and downloaded data are not distributed with the repository.
The current filtering mechanism is post-download filtering, because the model must first receive the complete file before it can score it.
You should distinguish between cache_root and target_root: low-score media may still remain in the cache layer, but by default only passing results are materialized into the target media library.
Although the repository includes training code, the final system-level goal is still:

A scorer-driven download filter, rather than a standalone training experiment repository.
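
The cache_root / target_root distinction from the notes can be illustrated as follows. This is a sketch under assumptions: the `materialize` helper is hypothetical, and it copies files, whereas the project may link or move them instead.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def materialize(
    cache_root: Path, target_root: Path, scores: Dict[str, float], min_score: float
) -> List[str]:
    """Copy passing files from the cache into the target library; leave the rest in the cache."""
    target_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for name, score in sorted(scores.items()):
        if score >= min_score:
            shutil.copy2(cache_root / name, target_root / name)
            copied.append(name)
    return copied


# Demo on a throwaway directory with made-up scores.
with tempfile.TemporaryDirectory() as tmp:
    cache_root, target_root = Path(tmp) / "cache", Path(tmp) / "library"
    cache_root.mkdir()
    for name in ("keep.jpg", "drop.jpg"):
        (cache_root / name).write_bytes(b"")
    copied = materialize(
        cache_root, target_root, {"keep.jpg": 8.0, "drop.jpg": 4.5}, min_score=7.0
    )
    in_cache = sorted(p.name for p in cache_root.iterdir())
    in_target = sorted(p.name for p in target_root.iterdir())
    print(copied)     # ['keep.jpg']
    print(in_cache)   # ['drop.jpg', 'keep.jpg']
    print(in_target)  # ['keep.jpg']
```

The low-score file stays in the cache until a separate cleanup step removes it, which matches the behavior described in the notes.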

Related docs

docs/frozen_clip_model.md
docs/telegram_global_pipeline.md
progress.md
SKILL.md
