A small pet project I built to practice web parsing end-to-end: a Flask service hands out one-time codes through a login-gated UI, and a Telegram bot drives a headless browser to log in, click the button, and read the code back to the user
- Flask service with login + dashboard + a
/generate_codeendpoint that returns a fresh 16-character alphanumeric code - Telegram bot that runs Chrome headless via Selenium, fills the login form, clicks the generate button, and reads the result out of the DOM
- Env-driven configuration: every credential, URL, port, and debug flag lives
in
.env - Pytest suite covering the Flask routes (login, redirects, generate, logout)
- Single CI workflow that lints with ruff and runs the test suite on Python 3.11 and 3.12
.
├── web_parsing_bot/ # Python package — both halves of the project
│ ├── __init__.py
│ ├── __main__.py # entry point for `python -m web_parsing_bot`
│ ├── bot.py # Telegram bot wiring (handlers + Application)
│ ├── config.py # env loading via python-dotenv
│ ├── parser.py # Selenium routines: log in, click, scrape
│ └── web/
│ ├── service.py # Flask app factory + routes
│ ├── templates/ # login.html, index.html
│ └── static/ # css + js for the UI
├── bin/ # convenience launchers (bash + .cmd for Windows)
├── tests/ # pytest specs for the Flask service and config
├── .env.example # documented env vars
├── pyproject.toml
├── requirements.txt
└── .github/workflows/ci.yml
The bot is the product — the Flask service exists so the bot has something
realistic to parse, and the two halves are deliberately kept in one package so
they share config.py and version metadata
- Python 3.11 or newer
- Google Chrome installed locally (Selenium Manager downloads a matching
chromedriver automatically on first run — see the
CHROMEDRIVER_PATHenv var if you prefer a pinned binary) - A Telegram bot token from
@BotFather
Copy .env.example to .env and fill in the values:
| Variable | Required | Description |
|---|---|---|
TELEGRAM_BOT_TOKEN |
yes | Token issued by @BotFather for your bot |
WEB_SERVICE_LOGIN |
yes | Login the bot uses to authenticate against the Flask service |
WEB_SERVICE_PASSWORD |
yes | Password paired with WEB_SERVICE_LOGIN |
WEB_SERVICE_URL |
no | Base URL of the Flask service (default http://localhost:3000) |
WEB_SERVICE_HOST |
no | Host the Flask service binds to (default 127.0.0.1) |
WEB_SERVICE_PORT |
no | Port the Flask service listens on (default 3000) |
FLASK_SECRET_KEY |
no | Secret key for Flask sessions (default value is for development only) |
FLASK_DEBUG |
no | Set to 1 to enable Flask debug mode |
CHROMEDRIVER_PATH |
no | Path to a chromedriver binary if you don't want Selenium Manager to pick |
python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux / macOS
source .venv/bin/activate
pip install -e ".[dev]"Two processes, two terminals — the Flask service first, then the bot:
# Terminal 1: Flask service on http://localhost:3000
./bin/run-web # Linux / macOS
bin\run-web.cmd # Windows
# Terminal 2: Telegram bot (long-polls Telegram and drives Chrome)
./bin/run-bot # Linux / macOS
bin\run-bot.cmd # WindowsOnce both are up, open Telegram and talk to your bot:
| Command | What it does |
|---|---|
/start |
Prints the welcome message |
/get_code |
Launches Chrome, logs into the Flask service, returns a code |
./bin/test # Linux / macOS
bin\test.cmd # WindowsThe suite mocks nothing fancy — it uses Flask's built-in test client to hit every route and asserts on status codes, redirects, and the response payloads
ruff check .
ruff format --check .Ruff is configured in pyproject.toml with a 120-character line length and a
small set of pyflakes / pyupgrade / bugbear / isort rules
Every push and pull request triggers .github/workflows/ci.yml, which runs
ruff check, ruff format --check, and pytest on a matrix of Python 3.11
and 3.12
This is a pet project meant for local experimentation. The default Flask
secret key, the default credentials in .env.example, and the open CORS
posture on /generate_code are all deliberately permissive. If you point this
at anything that matters, change every default and put the service behind a
real reverse proxy with TLS