🚧 Repository under construction: the core ingestion pipeline is functional; analysis modules, views, and query helpers are coming next.
DigiMuh consolidates ~8.9 GB of heterogeneous dairy-cow CSV sensor data into a single normalised SQLite database. The data spans 3.5 years (April 2021 – September 2024) of continuous monitoring from multiple on-farm systems:
| System | What it measures |
|---|---|
| smaXtec bolus | Rumen temperature, pH, activity, motility, rumination, water intake, estrus/calving indices |
| smaXtec barn sensors | Barn temperature, humidity, THI |
| HerdePlus | Milking events, MLP test-day results, calving/lactation records |
| HerdePlus diseases | Health events and diagnoses |
| Gouna | Respiration frequency |
| BCS | Body condition scores |
| LoRaWAN | Environmental sensor battery/current |
| HOBO | Weather station (temperature, humidity, solar radiation, wind, wetness) |
| DWD | German Weather Service THI and enthalpy |
The database uses a star schema: four dimension tables (animals, sensors,
barns, source_files) and twelve fact tables, connected by integer foreign
keys. Every row carries a file_id for full provenance tracing back to the
original CSV.
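To make the provenance idea concrete, here is a minimal, self-contained sketch of a star-schema join on a toy in-memory database. The table names `animals` and `source_files` and the `file_id` key come from the description above; the `toy_fact` table, every other column name, the file path, and the ear tag value are made-up placeholders, not the project's actual schema.

```python
import sqlite3

# Toy in-memory database; column names other than file_id are assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE source_files (file_id INTEGER PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE animals (animal_id INTEGER PRIMARY KEY, ear_tag TEXT NOT NULL UNIQUE);
CREATE TABLE toy_fact (
    animal_id INTEGER NOT NULL REFERENCES animals(animal_id),
    file_id   INTEGER NOT NULL REFERENCES source_files(file_id),
    ts        TEXT NOT NULL,
    value     REAL
);
""")

# One placeholder row per table.
con.execute("INSERT INTO source_files VALUES (1, 'outputs_smaxtec_derived/example.csv')")
con.execute("INSERT INTO animals VALUES (1, '276000000000001')")
con.execute("INSERT INTO toy_fact VALUES (1, 1, '2021-04-01T00:00:00', 39.1)")

# Resolve the integer keys: which animal, and which CSV did the row come from?
rows = con.execute("""
SELECT a.ear_tag, f.ts, f.value, s.path
FROM toy_fact AS f
JOIN animals AS a ON a.animal_id = f.animal_id
JOIN source_files AS s ON s.file_id = f.file_id
""").fetchall()
print(rows)
```

The fact row stores only small integer keys, and the joins recover the ear tag and the originating file on demand.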
See docs/database_structure.md for the full
schema and docs/column_dictionary.md for a
description of every column.
# Clone the repository
git clone https://github.com/zerotonin/digimuh.git
cd digimuh
# Option A: conda (recommended)
conda env create -f environment.yml
conda activate digimuh
# Option B: pip
pip install -e ".[dev]"
# 1. Smoke test with 5 files per folder (~1 min)
digimuh-ingest /path/to/DigiMuh-Export --db cow_test.db --test-n 5
# 2. Full ingestion (~2–3 hours)
rm cow_test.db
digimuh-ingest /path/to/DigiMuh-Export --db cow.db
# 3. Query the database
python -c "
import sqlite3
con = sqlite3.connect('cow.db')
cur = con.execute('SELECT COUNT(*) FROM smaxtec_derived')
print(f'smaxtec_derived rows: {cur.fetchone()[0]:,}')
"
The ingestion script expects the DigiMuh CSV export directory to have this structure:
DigiMuh-Export_2021-04-01_2024-09-30/
├── output_allocations/
│   └── allocations.csv
├── outputs_bcs/
│   └── {animal_id}_bcs_{date_range}.csv                ×715
├── outputs_gouna/
│   └── {animal_id}_gouna_{date_range}.csv              ×91
├── outputs_herdeplus_mlp_gemelk_kalbung/
│   └── {animal_id}_herdeplus_{date_range}.csv          ×965
├── outputs_hobo/
│   └── hobo_exports_{date_range}.csv
├── outputs_lorawan/
│   └── {sensor_name}_LoRaWAN_raw_{date_range}.csv      ×22
├── outputs_smaxtec_barns/
│   └── {barn_name}_smaxtec_raw_{date_range}.csv        ×4
├── outputs_smaxtec_derived/
│   └── {animal_id}_smaxtec_derived_{date_range}.csv    ×837
├── outputs_smaxtec_events/
│   └── {animal_id}_events.csv                          ×837
├── outputs_smaxtec_water_intake/
│   └── {animal_id}_smaxtec_derived_{date_range}.csv    ×837
├── herdeplus_diseases.csv
└── outputs_dwd.csv
Animal IDs are 15-digit EU ear tag numbers. The entity identifier is always the first underscore-delimited segment of each filename.
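The naming rule above can be sketched in a few lines of Python. The helper names `entity_id` and `looks_like_ear_tag`, the example ear tag, and the date-range spelling in the example file names are hypothetical; the ingestion script may well implement this differently.

```python
from pathlib import Path


def entity_id(filename: str) -> str:
    """Return the first underscore-delimited segment of a file name."""
    return Path(filename).name.split("_", 1)[0]


def looks_like_ear_tag(segment: str) -> bool:
    """True if the segment is exactly 15 ASCII digits (EU ear tag format)."""
    return len(segment) == 15 and segment.isascii() and segment.isdigit()


# Placeholder file names following the directory layout above.
animal_file = "outputs_bcs/276000000000001_bcs_2021-04-01_2024-09-30.csv"
station_file = "outputs_hobo/hobo_exports_2021-04-01_2024-09-30.csv"

print(entity_id(animal_file), looks_like_ear_tag(entity_id(animal_file)))
print(entity_id(station_file), looks_like_ear_tag(entity_id(station_file)))
```

Checking for 15 digits separates per-animal files from files keyed by a sensor, barn, or station name.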
digimuh-ingest [-h] [--db DB] [--chunk-size N] [--verbose] [--test-n N] root_dir
| Argument | Description |
|---|---|
| `root_dir` | Root directory containing all CSV folders |
| `--db` | Output SQLite path (default: `cow.db`) |
| `--chunk-size` | Rows per INSERT batch (default: 50 000) |
| `--test-n N` | Only ingest first N files per folder |
| `--verbose`, `-v` | Print CREATE TABLE SQL and debug info |
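As an illustration of what "rows per INSERT batch" means, the sketch below inserts rows in fixed-size chunks with the standard-library `sqlite3` module. It is an assumption about the general technique, not the ingestion script's actual code; `batched`, `insert_in_chunks`, and the `toy` table are hypothetical names.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator


def batched(rows: Iterable[tuple], size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def insert_in_chunks(con: sqlite3.Connection, sql: str,
                     rows: Iterable[tuple], chunk_size: int = 50_000) -> int:
    """Run `sql` via executemany, committing once per chunk; return row count."""
    total = 0
    for chunk in batched(rows, chunk_size):
        with con:  # commits on success, rolls back on error
            con.executemany(sql, chunk)
        total += len(chunk)
    return total


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE toy (x INTEGER)")
n = insert_in_chunks(con, "INSERT INTO toy VALUES (?)",
                     ((i,) for i in range(10)), chunk_size=4)
count = con.execute("SELECT COUNT(*) FROM toy").fetchone()[0]
print(n, count)
```

Larger chunks mean fewer commits but more rows held in memory at once, which is the trade-off a chunk-size option exposes.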
python -m pytest
After ingestion, six analysis scripts are available as CLI commands. Each creates analysis views on first run, queries the database, and writes results (CSV data + figures) to an output directory.
# Install with analysis dependencies
pip install -e ".[analysis]"
# 0. Individual heat stress thresholds (broken-stick regression)
digimuh-broken-stick --db cow.db --tierauswahl Tierauswahl.xlsx --out results/broken_stick
# 1. Subclinical ketosis risk: FPR × rumination × milk yield
digimuh-ketosis --db cow.db --out results/ketosis
# 2. Heat stress: rumen temp × THI × water × respiration
digimuh-heat --db cow.db --out results/heat
# 3. Digestive efficiency: motility × pH → milk composition (time-lagged)
digimuh-digestive --db cow.db --out results/digestive
# 4. Circadian disruption: 24h Fourier decomposition as welfare marker
digimuh-circadian --db cow.db --out results/circadian
# 5. Motility entropy: rumen HRV analogue via information theory
digimuh-entropy --db cow.db --out results/entropy
Each script writes:
- A CSV of the extracted features (for further analysis in R, Python, etc.)
- Publication-ready SVG + PNG figures
- A JSON summary of key results (where applicable)
See docs/database_structure.md for the SQL view
definitions that power these analyses.
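For readers unfamiliar with broken-stick regression, the sketch below fits one common variant, a flat baseline followed by a linear rise, by grid search over candidate breakpoints. It is not the `digimuh-broken-stick` implementation: the model form, the choice of THI as predictor and rumen temperature as response, the function names, and the noise-free synthetic data are all assumptions made for illustration.

```python
def fit_line(z, y):
    """Ordinary least squares for y = a + b*z with a single predictor."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    if szz == 0:  # predictor is constant: fall back to the mean
        return my, 0.0
    b = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz
    return my - b * mz, b


def broken_stick(x, y, candidates):
    """Fit y = a + b*max(0, x - c), choosing c with the smallest squared error."""
    best = None
    for c in candidates:
        z = [max(0.0, xi - c) for xi in x]  # hinge term: zero below the breakpoint
        a, b = fit_line(z, y)
        sse = sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[0]:
            best = (sse, c, a, b)
    sse, c, a, b = best
    return {"breakpoint": c, "baseline": a, "slope": b, "sse": sse}


# Synthetic example: temperature is flat at 39.0 until THI 68, then rises.
thi = list(range(50, 81))
temp = [39.0 + 0.05 * max(0, t - 68) for t in thi]
result = broken_stick(thi, temp, candidates=range(55, 76))
print(result["breakpoint"])
```

On real data the grid of candidate breakpoints and the handling of noise and outliers matter considerably; this sketch only shows the shape of the model.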
- CSV → SQLite ingestion with star schema
- SQL views for analysis (daily summaries + cross-table joins)
- Analysis: individual heat stress thresholds (broken-stick regression)
- Analysis: subclinical ketosis detection (FPR + RF classifier)
- Analysis: heat stress multi-sensor fusion
- Analysis: digestive efficiency (motility–pH coupling)
- Analysis: circadian rhythm disruption index
- Analysis: motility pattern entropy (novel)
- Data validation and quality-check reports
- Parallelised entropy computation for full dataset
- Sphinx documentation on GitHub Pages
Bart R. H. Geurten, Department of Zoology, University of Otago
MIT