Offline Windows app (local UI + CLI) to validate and transform Excel workbooks (old/new schema). Produces a standardized row-level workbook and a manifest. No network access required; packaging to .exe follows validation.
- Requires Python 3.10+ and Poetry
- Install deps: `poetry install`
- UI: `make run` (starts local NiceGUI at http://localhost:8080)
- CLI: `poetry run python -m treebot.main --input path\to\results.xlsx --classes configs\classes.yaml [--mapping mapping.xlsx] [--config configs\config.yaml] [--out runs] [--max-errors 50] [--quality-threshold 80] [--min-count 2] [--stage full|headers]`

Outputs are written to `./runs/<UTC timestamp>/`.
- `--input`: Results workbook (.xlsx)
- `--classes`: Path to `classes.yaml` (Compound -> Class mapping; keys must match normalized compound names)
- `--mapping` (optional): Species mapping workbook (columns: `Site`, `CartridgeNum`, `PlantSpecies`). Used to fill missing `Species` without overwriting existing values.
- `--config` (optional): YAML config file with runtime overrides
- `--out` (optional): Output base directory (`runs` by default)
- `--max-errors` (optional): Limit for errors shown in logs/reports
- `--quality-threshold` (optional): Minimum MatchScore for “high quality” groups in summary
- `--min-count` (optional): Minimum frequency per compound for summary sheets
- `--stage` (optional): `headers` to validate headers only, or `full` (default)
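For orientation, the flags above could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not TreeBot's actual parser: `build_parser` is an invented name, the numeric defaults simply reuse the example values from the usage line, and treating `--classes` as required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the project's code)."""
    parser = argparse.ArgumentParser(prog="treebot")
    parser.add_argument("--input", required=True, help="Results workbook (.xlsx)")
    # Assumption: --classes is required because the usage line shows it unbracketed.
    parser.add_argument("--classes", required=True, help="Path to classes.yaml")
    parser.add_argument("--mapping", default=None, help="Species mapping workbook")
    parser.add_argument("--config", default=None, help="YAML config with runtime overrides")
    parser.add_argument("--out", default="runs", help="Output base directory")
    # Assumption: these defaults are the example values from the usage line.
    parser.add_argument("--max-errors", type=int, default=50)
    parser.add_argument("--quality-threshold", type=int, default=80)
    parser.add_argument("--min-count", type=int, default=2)
    parser.add_argument("--stage", choices=["full", "headers"], default="full")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--input", "results.xlsx", "--classes", "configs/classes.yaml", "--stage", "headers"]
)
print(args.stage, args.out, args.max_errors)  # headers runs 50
```

Restricting `--stage` with `choices` makes an unknown stage fail at parse time rather than partway through a run.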
Build Windows exes with Poetry + PyInstaller.
- Install: `poetry install`
UI build (opens your browser)
- One-folder UI exe: `make exe` (pretty, colorized build output). The build prints a clickable path to the exe where supported.
  - Outputs: `dist/TreeBot-UI/TreeBot-UI.exe`
  - Configs bundled; installer (below) places writable configs and runs under `%APPDATA%\TreeBot`
- One-file UI exe (optional): `make exe-ui-onefile`

Windows installer (UI)
- Requires Inno Setup (iscc) installed on Windows.
- Build UI exe first: `make exe`
- Compile installer: `make installer-ui`
  - Output: `dist/installer/TreeBot-UI-Setup.exe`
  - Installs to per-user Program Files and creates a Start Menu/Desktop shortcut
  - Shortcut working directory is `%APPDATA%\TreeBot` to ensure logs and runs are writable

- `src/treebot/app/`: container, orchestrator, run manager, modular steps (sheet processing)
- `src/treebot/services/`: IO, validation rules+service, transform, aggregation (`aggregate/summary.py`), output utilities (manifest, summary writer)
- `src/treebot/domain/`: error categories and issue types
- `src/treebot/utils/`: logging, normalization helpers
- `src/treebot/main.py`: thin CLI entrypoint
- `configs/classes.yaml`: Compound -> Class map (keys must be normalized; see Normalization)
- `tests/`: unit + integration tests

See projectplan.md for design scope; note the “Current Implementation” section for deviations.
- File-based only (no env vars). Example: `configs/config.yaml`
- `configs/classes.yaml` maps normalized Compound -> Class
- Optional species mapping workbook (`--mapping`) can fill missing Species using `(Site, CartridgeNum) -> PlantSpecies` pairs.
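The fill-without-overwrite rule for the species mapping can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the project's implementation: `fill_missing_species` is a hypothetical helper, rows are modelled as dictionaries, and the results rows are assumed to carry `Site` and `CartridgeNum` values that match the mapping keys exactly.

```python
from typing import Optional

Row = dict[str, Optional[str]]


def fill_missing_species(rows: list[Row], mapping: dict[tuple[str, str], str]) -> list[Row]:
    """Fill empty Species from (Site, CartridgeNum) pairs; never overwrite a value."""
    filled: list[Row] = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        species = (new_row.get("Species") or "").strip()
        if not species:
            key = (new_row.get("Site") or "", new_row.get("CartridgeNum") or "")
            if key in mapping:
                new_row["Species"] = mapping[key]
        filled.append(new_row)
    return filled


# Illustrative data only; the species names are made up for the example.
rows: list[Row] = [
    {"Site": "A", "CartridgeNum": "1", "Species": None},
    {"Site": "A", "CartridgeNum": "2", "Species": "Quercus robur"},
    {"Site": "B", "CartridgeNum": "9", "Species": ""},
]
mapping = {("A", "1"): "Pinus sylvestris", ("A", "2"): "Betula pendula"}
result = fill_missing_species(rows, mapping)
print([r["Species"] for r in result])  # ['Pinus sylvestris', 'Quercus robur', '']
```

The second row keeps its existing value even though the mapping has an entry for it, and the third row stays empty because no pair matches.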
- `make check`: lint (ruff + mypy) then test with coverage (requires 100% coverage)
- `make run`: starts the local UI (NiceGUI)
- `make run-cli INPUT=... [CLASSES=...] [MAPPING=...] [OUT=runs] [CONFIG=...]`: run from terminal
- `make lint`: ruff fix/format + mypy
- `make test`: pytest with coverage and parallel execution
- `make lock`: update Poetry lock

- `standardized_*.xlsx` (row-level output, when no blocking errors). Summary sheets (`HQ Multiple`, `HQ Single`, `Lq Multiple`, `Lq Single`) are appended to this workbook.
- `run_manifest.yaml` (provenance, parameters)
- Logs: `latest_run.log` (human) and `logs.jsonl` (structured)
- Rich-styled sections for skipped sheets and validation warnings (empty Species, CartridgeNum, DataFolderName; empty Quality columns)
- Detect schema per sheet and normalize headers
- Forward-fill identity columns within a sheet (DataFolderName, CartridgeNum)
- Optionally fill missing `Species` via mapping workbook (no overwrite)
- For old schema sheets, transform to new schema and derive `Compound`, `Class`, and `MatchScore`
- Write `standardized_*.xlsx`
- Build and append summary sheets; write `run_manifest.yaml`
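The forward-fill step for identity columns can be sketched in plain Python. The project itself uses polars, so this is only an illustration of the behaviour, not the actual code: `forward_fill` is a hypothetical helper, and treating both `None` and blank strings as "missing" is an assumption.

```python
from typing import Optional

Row = dict[str, Optional[str]]


def forward_fill(rows: list[Row], columns: list[str]) -> list[Row]:
    """Carry the last non-empty value of each listed column down a sheet's rows."""
    last_seen: dict[str, str] = {}
    out: list[Row] = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for col in columns:
            value = new_row.get(col)
            if value is None or value.strip() == "":
                # Missing: reuse the previous value if one has been seen.
                if col in last_seen:
                    new_row[col] = last_seen[col]
            else:
                last_seen[col] = value
        out.append(new_row)
    return out


# Illustrative sheet: identity values appear only on the first row of each group.
sheet: list[Row] = [
    {"DataFolderName": "run_01", "CartridgeNum": "7", "Compound": "a"},
    {"DataFolderName": None, "CartridgeNum": "", "Compound": "b"},
    {"DataFolderName": "run_02", "CartridgeNum": None, "Compound": "c"},
]
filled_sheet = forward_fill(sheet, ["DataFolderName", "CartridgeNum"])
print([(r["DataFolderName"], r["CartridgeNum"]) for r in filled_sheet])
# [('run_01', '7'), ('run_01', '7'), ('run_02', '7')]
```

Calling the helper once per sheet keeps values from leaking across sheet boundaries, which matches the "within a sheet" wording above.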
- Compound names are normalized with a safe, non-destructive function (`normalize_compound_name`):
  - Unicode normalize (NFKC), fold Greek letters, lowercase/trim
  - Two-pass typo handling: embedded-safe replacements, then token-bounded regex fixes
  - Remove bare stereochem markers like `(r)`, `(s)`, unify spacing/hyphens, strip trailing noise
  - Does not change chemical identity or apply synonym mapping
- Class mapping keys in `classes.yaml` must match the normalized form.
- Name canonicalization (global synonym mapping) is not performed inside the transform pipeline.
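To make the normalization steps concrete, here is a rough standard-library sketch. It is not the project's `normalize_compound_name`: the function name `normalize_compound_name_sketch`, the Greek table, both typo tables, and the exact regexes are all assumptions chosen for illustration, and the step order is one plausible reading of the list above.

```python
import re
import unicodedata

# Hypothetical tables; the real project's replacements are not shown in this README.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta"}
EMBEDDED_FIXES = {"pinnene": "pinene"}    # pass 1: safe even inside longer names
TOKEN_FIXES = {r"\bacetat\b": "acetate"}  # pass 2: only on whole tokens


def normalize_compound_name_sketch(name: str) -> str:
    text = unicodedata.normalize("NFKC", name).lower()
    for greek, latin in GREEK.items():
        text = text.replace(greek, latin)
    text = text.strip()
    for wrong, right in EMBEDDED_FIXES.items():
        text = text.replace(wrong, right)
    for pattern, right in TOKEN_FIXES.items():
        text = re.sub(pattern, right, text)
    # Drop bare (r)/(s) markers; descriptors such as (1r,5s) are left alone.
    text = re.sub(r"\(\s*[rs]\s*\)\s*-?\s*", "", text)
    text = re.sub(r"[\u2010-\u2015\u2212]", "-", text)  # unify dash characters
    text = re.sub(r"\s*-\s*", "-", text)                # no spaces around hyphens
    text = re.sub(r"\s+", " ", text)                    # collapse whitespace
    return text.rstrip(" -.,;").strip()                 # strip trailing noise


print(normalize_compound_name_sketch("  α-Pinnene (R) "))  # alpha-pinene
print(normalize_compound_name_sketch("(S)-Limonene"))      # limonene
print(normalize_compound_name_sketch("Ethyl  Acetat"))     # ethyl acetate
```

Keys in `classes.yaml` would then be written in the output form (for example `alpha-pinene`), since lookups happen after normalization.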
- 100% test coverage (statements and branches) enforced via `make check`
- Strict typing: no `Any`, `cast`, or `type: ignore`
- Protocol-based typing for external dependencies (openpyxl, nicegui)
- Uses polars for DataFrame operations
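
Protocol-based typing for an external dependency might look like the sketch below. The names `WorksheetLike`, `header_row`, and `FakeSheet` are hypothetical and not taken from the project; the only external fact relied on is that openpyxl worksheets expose `iter_rows` with a `values_only` keyword.

```python
from typing import Iterable, Iterator, Protocol


class WorksheetLike(Protocol):
    """Hypothetical structural type: only the member this sketch needs."""

    def iter_rows(self, *, values_only: bool = ...) -> Iterable[tuple[object, ...]]: ...


def header_row(sheet: WorksheetLike) -> list[str]:
    """Return the first row as trimmed strings, or an empty list for an empty sheet."""
    for row in sheet.iter_rows(values_only=True):
        return ["" if cell is None else str(cell).strip() for cell in row]
    return []


class FakeSheet:
    """Test double that satisfies WorksheetLike without importing openpyxl."""

    def __init__(self, rows: list[tuple[object, ...]]) -> None:
        self._rows = rows

    def iter_rows(self, *, values_only: bool = False) -> Iterator[tuple[object, ...]]:
        return iter(self._rows)


print(header_row(FakeSheet([(" Compound ", None, "MatchScore")])))
# ['Compound', '', 'MatchScore']
```

Typing against a small protocol lets application code avoid `Any` and `cast` at the library boundary, and lets tests pass in simple fakes instead of real workbooks.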