Make saturation EMA time-weighted for sample-rate invariance by tomquist · Pull Request #321 · tomquist/AstraMeter

tomquist · 2026-04-12T06:51:23Z

Summary

This PR converts the saturation tracker from a per-sample EMA to a time-weighted EMA that produces consistent results regardless of polling cadence. This fixes a regression where V3 batteries (polling ~0.45s) and V2 batteries (polling ~3.1s) would converge to different saturation scores under identical physical conditions.

Key Changes

Core Algorithm Changes

- Time-weighted EMA formula: the effective alpha and decay factor are now computed as `1 - (1 - alpha) ** (dt / dt_ref)` and `decay_factor ** (dt / dt_ref)` respectively, where `dt` is the actual elapsed time and `dt_ref` is a reference interval (1.0 second).
- New constants:
  - `SATURATION_REFERENCE_DT = 1.0`: Reference poll interval at which the configured `alpha` and `decay_factor` apply one full step
  - `SATURATION_LONG_GAP_SECONDS = 30.0`: Threshold above which gaps between updates are dropped (re-seeded) rather than integrated into the EMA, to avoid huge spurious steps when batteries go offline
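
The two formulas above can be sketched as small helper functions. This is only an illustration under assumptions: the helper names (`effective_alpha`, `effective_decay`, `rise_step`) and the example `alpha` value are hypothetical and are not taken from `balancer.py`; only the formulas and the reference interval come from the description.

```python
SATURATION_REFERENCE_DT = 1.0  # seconds, as stated in the PR description


def effective_alpha(alpha: float, dt: float,
                    dt_ref: float = SATURATION_REFERENCE_DT) -> float:
    """Rise weight for one update that spans dt seconds (hypothetical helper)."""
    return 1.0 - (1.0 - alpha) ** (dt / dt_ref)


def effective_decay(decay_factor: float, dt: float,
                    dt_ref: float = SATURATION_REFERENCE_DT) -> float:
    """Decay multiplier for one update that spans dt seconds (hypothetical helper)."""
    return decay_factor ** (dt / dt_ref)


def rise_step(score: float, target: float, alpha: float, dt: float) -> float:
    """Move score toward target by the time-weighted rise weight."""
    return score + effective_alpha(alpha, dt) * (target - score)


# Six updates of 0.5 s and one update of 3.0 s cover the same 3 s window.
ALPHA = 0.1  # assumed value for illustration only
fast = 0.0
for _ in range(6):
    fast = rise_step(fast, 1.0, ALPHA, 0.5)
slow = rise_step(0.0, 1.0, ALPHA, 3.0)
```

With a constant input, each update multiplies the remaining distance to the target by `(1 - alpha) ** (dt / dt_ref)`, so the exponents add up and `fast` and `slow` agree up to floating-point rounding.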

State Tracking

- Added a `last_saturation_update` field to `BalancerConsumerState` to track the wall-clock timestamp of the most recent EMA step
- The first sample (when `last_saturation_update == 0.0`) is treated as a full reference-period step for proper cold-start behavior
- After a backwards clock correction (e.g. NTP), the elapsed time is clamped to zero; long gaps are dropped and re-seeded
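
The guards listed above might look roughly like the following. It is a sketch under assumptions: `step_dt` is a hypothetical helper, and returning `None` to signal "drop and re-seed" is an illustrative choice, not necessarily how `SaturationTracker` reports it.

```python
from typing import Optional

SATURATION_REFERENCE_DT = 1.0       # seconds, from the PR description
SATURATION_LONG_GAP_SECONDS = 30.0  # seconds, from the PR description


def step_dt(now: float, last_update: float) -> Optional[float]:
    """Return the elapsed time to integrate, or None if the step is dropped.

    Hypothetical helper: last_update == 0.0 is the "never updated" sentinel.
    """
    if last_update == 0.0:
        # Cold start: treat the first sample as one full reference period.
        return SATURATION_REFERENCE_DT
    # A backwards clock correction yields a negative difference; clamp it.
    dt = max(0.0, now - last_update)
    if dt > SATURATION_LONG_GAP_SECONDS:
        # Long gap (e.g. battery offline): the caller re-seeds instead.
        return None
    return dt
```

A caller would skip the EMA step when the result is `None` or `0.0` and otherwise feed the value into the time-weighted formulas.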

Deprioritization Logic

- Saturation score is now cleared when a consumer transitions from active → deprioritized, since the score reflects the previous role and is no longer relevant
- This prevents false-positive saturation spikes during the fade window from blocking promotion back to active via `_maybe_force_swap_saturated`
- Saturation updates are skipped for deprioritized consumers to avoid transient non-zero targets from the fade path triggering false saturation detection
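
A minimal sketch of this role-transition handling, assuming a simplified state object. `ConsumerState`, `set_deprioritized`, and `should_update_saturation` are hypothetical stand-ins and do not mirror the real `BalancerConsumerState` or balancer methods.

```python
from dataclasses import dataclass


@dataclass
class ConsumerState:
    """Hypothetical, reduced stand-in for BalancerConsumerState."""
    deprioritized: bool = False
    saturation: float = 0.0
    last_saturation_update: float = 0.0


def set_deprioritized(state: ConsumerState, deprioritized: bool) -> None:
    """Change the role and clear saturation state when the role flips."""
    if deprioritized != state.deprioritized:
        # The score describes the previous role, so start over in either
        # direction (active -> deprioritized and deprioritized -> active).
        state.saturation = 0.0
        state.last_saturation_update = 0.0
    state.deprioritized = deprioritized


def should_update_saturation(state: ConsumerState) -> bool:
    """Deprioritized consumers are steered to zero; skip their EMA updates."""
    return not state.deprioritized
```

Resetting the timestamp together with the score means the next update after a role change is treated as a cold start rather than as a step that spans the whole time spent in the other role.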

Test Coverage

- Added `_FakeClock` helper class for deterministic time-weighted EMA testing
- Added `_make_tracker_with_clock()` helper method
- New tests verify:
  - Sample-rate invariance for both rise and decay paths over 30-60 second windows
  - Long gap handling (gaps above threshold are dropped, not integrated)
  - Existing per-sample EMA behavior is preserved when `dt == SATURATION_REFERENCE_DT`
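
The PR does not show `_FakeClock` itself, so the following is only a guess at its shape: a callable that returns the current fake time and an `advance()` method, which matches how the commit messages refer to `clock()` and `clock.advance(dt)`. The class name `FakeClock` and the start value are assumptions.

```python
class FakeClock:
    """Hypothetical deterministic clock for tests; not the PR's _FakeClock."""

    def __init__(self, start: float = 1000.0) -> None:
        self.now = start

    def __call__(self) -> float:
        # Used wherever production code would call a real time function.
        return self.now

    def advance(self, seconds: float) -> None:
        # Move time forward without sleeping.
        self.now += seconds


clock = FakeClock()
t0 = clock()
clock.advance(0.5)   # advance first, then run the step under test
elapsed = clock() - t0
```

Injecting a clock like this lets a test choose the exact `dt` between updates, which is what makes the sample-rate-invariance assertions reproducible.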

Minor Fixes

- Changed `dedupe_time_window` config parsing from `getint()` to `getfloat()` to support fractional seconds
- Updated `SaturationTracker` docstring to document the time-weighted approach

https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD

Summary by CodeRabbit

Bug Fixes
- Saturation tracking now uses a time-weighted EMA, making score changes fair across powermeters with different update rates; raise EFFICIENCY_SATURATION_THRESHOLD (e.g., ~0.8) if you see unnecessary swaps with slow meters.
- Deduplication window now accepts fractional seconds for finer timing control.
Documentation
- Clarified EFFICIENCY_SATURATION_THRESHOLD behavior and guidance in README and example config.

coderabbitai · 2026-04-12T06:51:33Z

Walkthrough

The changes convert saturation tracking to a time-weighted exponential moving average (EMA) in the load balancer, adding a last-update timestamp and guards for zero and long gaps. Documentation and example config clarify the behavior and recommend threshold adjustments for slow powermeters. CT002 dedupe timing was made floating-point and tests were extended for time-weighted behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates: `README.md`, `config.ini.example` | Clarified that saturation is computed via a time-weighted EMA; noted slow powermeters (>10s) accumulate saturation faster per sample and suggested raising `EFFICIENCY_SATURATION_THRESHOLD` (e.g., ~0.8) to avoid unnecessary swaps. Comments only; no runtime behavior change. |
| Saturation EMA Implementation: `src/astrameter/ct002/balancer.py` | Reworked `SaturationTracker` to a time-weighted EMA using elapsed `dt` and reference interval constants (`SATURATION_REFERENCE_DT`, `SATURATION_LONG_GAP_SECONDS`). Added `last_saturation_update` to `BalancerConsumerState`. Skips or reseeds on `dt == 0` or long gaps; updated deprioritization to clear full saturation state and adjusted related logic/docstrings. |
| Type Precision Updates: `src/astrameter/ct002/ct002.py`, `src/astrameter/main.py`, `src/astrameter/web_config.py` | Changed `CT002.__init__` default `dedupe_time_window` from `0` to `0.0`. `main.py` now parses `DEDUPE_TIME_WINDOW` with `getfloat(..., fallback=0.0)`. Web config metadata updated: `CT002.DEDUPE_TIME_WINDOW` type changed from `integer` to `float` with `min: 0`. |
| Test Coverage Expansion: `tests/test_balancer.py` | Added `_FakeClock` and `_make_tracker_with_clock()` to control `dt` in tests. Updated EMA tests to advance time between updates and added regression tests for sample-rate invariance and long-gap reseed behavior. |

Sequence Diagram

sequenceDiagram
    participant Time as Time / Clock
    participant Tracker as SaturationTracker
    participant State as BalancerConsumerState
    participant LB as LoadBalancer

    Note over Time,LB: Initialization / Grace Clear
    Time->>State: last_saturation_update = 0

    Note over Time,Tracker: First update after reseed
    Time->>Tracker: update() at t1
    Tracker->>Tracker: dt = t1 - last_saturation_update (<=0) => use SATURATION_REFERENCE_DT
    Tracker->>Tracker: ratio = dt / SATURATION_REFERENCE_DT
    Tracker->>Tracker: apply decay: saturation *= decay_factor ** ratio
    Tracker->>State: last_saturation_update = t1

    Note over Time,Tracker: Subsequent normal updates
    Time->>Tracker: update() at t2
    Tracker->>Tracker: dt = t2 - last_saturation_update
    alt dt == 0
        Tracker->>Tracker: skip update
    else dt > 0 and dt ≤ SATURATION_LONG_GAP_SECONDS
        Tracker->>Tracker: ratio = dt / SATURATION_REFERENCE_DT
        Tracker->>Tracker: alpha_eff = 1 - (1 - alpha) ** ratio
        Tracker->>Tracker: apply time-weighted EMA (rise/decay)
        Tracker->>State: last_saturation_update = t2
    else dt > SATURATION_LONG_GAP_SECONDS
        Tracker->>Tracker: drop/re-seed (do not apply EMA)
        Tracker->>State: last_saturation_update = 0
    end

    Note over LB,Tracker: Decision flow
    LB->>Tracker: query saturation score
    Tracker-->>LB: current EMA value
    LB->>LB: decide swap / deprioritize based on threshold and candidate health

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.74%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: converting saturation EMA to time-weighted for sample-rate invariance, which is the core technical transformation across the changeset. |

Switch DEDUPE_TIME_WINDOW parsing from getint to getfloat so users can configure sub-second dedupe windows (e.g. 0.5). Previously any decimal value crashed the emulator at startup with a ValueError from configparser's int coercion. The comparisons in CT002 already operate on floats; only the config reader needed to change. Default remains 0.0 (no dedupe). https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD
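
The failure mode described in this commit message can be reproduced with the standard library alone. The section name `CT002` is inferred from the `CT002.DEDUPE_TIME_WINDOW` key mentioned in the walkthrough and should be read as an assumption about the config layout.

```python
import configparser

config = configparser.ConfigParser()
# Assumed layout: a [CT002] section holding the dedupe window in seconds.
config.read_string("[CT002]\nDEDUPE_TIME_WINDOW = 0.5\n")

# getint() hands the raw string to int(), which rejects "0.5". The fallback
# only applies when the option is missing, not when conversion fails.
try:
    config.getint("CT002", "DEDUPE_TIME_WINDOW", fallback=0)
    getint_error = None
except ValueError as exc:
    getint_error = exc

# getfloat() accepts both "0.5" and whole numbers such as "2".
window = config.getfloat("CT002", "DEDUPE_TIME_WINDOW", fallback=0.0)

# A missing key still falls back to 0.0 (no dedupe).
missing = config.getfloat("CT002", "NOT_CONFIGURED", fallback=0.0)
```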

The efficiency saturation tracker used a per-sample EMA whose rise and decay weights were baked into the alpha/decay_factor constants. For V3 Marstek batteries polling every ~0.45 s the EMA accumulated ~7x faster than for V2 batteries polling every ~3 s, so under identical physical conditions the two fleets converged to very different scores and could oscillate between probe/rotate decisions. In the field this showed up as a balancer that alternately promoted and demoted both batteries while the grid drifted uncompensated.
Rework the EMA to be time-weighted against a fixed reference period (SATURATION_REFERENCE_DT = 1.0 s): the effective per-update rise weight becomes ``1 - (1 - alpha) ** (dt / dt_ref)`` and the decay becomes ``decay_factor ** (dt / dt_ref)``. At dt == dt_ref both reduce to the previous per-sample formulas, so the tuned defaults keep their meaning.
Guard against pathologies: a long gap (battery offline for >30 s) drops the update rather than dosing the EMA with a huge step, and a backwards clock is clamped to zero.
Fix a related post-probe lockup exposed by the stronger EMA: during the efficiency fade window that follows a probe handoff, the deprioritized consumer's ``last_target`` still carried transient fade values, and feeding those into the saturation EMA raised a false "cannot follow target" spike high enough to stay above the swap threshold for many ticks. That left ``_maybe_force_swap_saturated`` unable to find a healthy backup and pinned the active battery at target = 0 while the grid imported. Skip saturation updates entirely for deprioritized consumers (they are being steered to zero, so the score has no meaningful interpretation there), and clear the saturation score on the active → deprioritized transition as well, mirroring the clear already used for deprioritized → active.
Tests: drive the existing per-sample tests off a FakeClock so they keep exercising the reference-period formula, and add sample-rate-invariance tests for both the rise and decay branches plus a regression guard for the long-gap re-seed.
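
To make the cadence dependence concrete, the sketch below compares a per-sample EMA with the time-weighted form over the same 30 s window with a constant input. The `alpha` value is assumed for illustration, and the cadences are rounded to 0.5 s and 3.0 s (instead of ~0.45 s and ~3 s) so both runs cover exactly 30 s; none of the function names come from the codebase.

```python
def per_sample_rise(alpha: float, steps: int, target: float = 1.0) -> float:
    """Old behavior: every poll applies the full alpha, whatever its spacing."""
    score = 0.0
    for _ in range(steps):
        score += alpha * (target - score)
    return score


def time_weighted_rise(alpha: float, dt: float, steps: int,
                       target: float = 1.0, dt_ref: float = 1.0) -> float:
    """New behavior: the weight per poll scales with the elapsed time dt."""
    weight = 1.0 - (1.0 - alpha) ** (dt / dt_ref)
    score = 0.0
    for _ in range(steps):
        score += weight * (target - score)
    return score


ALPHA = 0.05  # assumed value, not the project's tuned default

# 60 polls at 0.5 s and 10 polls at 3.0 s both span 30 s.
fast_old = per_sample_rise(ALPHA, 60)
slow_old = per_sample_rise(ALPHA, 10)
fast_new = time_weighted_rise(ALPHA, 0.5, 60)
slow_new = time_weighted_rise(ALPHA, 3.0, 10)

print(f"per-sample:    fast={fast_old:.3f} slow={slow_old:.3f}")
print(f"time-weighted: fast={fast_new:.3f} slow={slow_new:.3f}")
```

Under these assumptions the per-sample scores end up far apart, while the two time-weighted scores match up to rounding, which is the invariance the new tests check.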

Seed ``last_saturation_update = clock()`` before the loop so both fast and slow trackers cover exactly the same wall-clock window. Previously the first iteration used the reference-period bootstrap (dt=1.0) regardless of the test's dt, skewing effective EMA time by ~2.5 s between the two cadences. Also move ``clock.advance(dt)`` before ``tracker.update()`` so elapsed time is consumed before the EMA step, matching production order. https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD

The time-weighted saturation EMA accumulates faster per sample when the powermeter update interval is large (e.g. >10 s), which can cause unnecessary forced swaps. Note the workaround (raise the threshold) in both README.md and config.ini.example. https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD

The config loader was changed from getint to getfloat to accept fractional seconds (commit 9ff9f0d), but the web configuration editor schema still declared the key as integer. Update to float with min=0 so the editor renders a decimal input. https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/astrameter/web_config.py`:
- Line 277: The web schema added "DEDUPE_TIME_WINDOW" with min: 0 but runtime
still accepts negative values; update the runtime where dedupe_time_window is
consumed (e.g., in CT002.__init__) to validate and clamp/raise on invalid
values: read the incoming config value for dedupe_time_window (or use the
DEDUPE_TIME_WINDOW key), check if value is None or < 0, then either set it to 0
(clamp) or raise a clear ValueError/ConfigError, and ensure downstream code uses
this validated value; add a small unit test for CT002 to assert negative inputs
are rejected or clamped accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 48621ed5-ad2a-4976-bfb6-9ace2babe313

📥 Commits

Reviewing files that changed from the base of the PR and between 6c0eb73 and bcb29de.

📒 Files selected for processing (7)

README.md
config.ini.example
src/astrameter/ct002/balancer.py
src/astrameter/ct002/ct002.py
src/astrameter/main.py
src/astrameter/web_config.py
tests/test_balancer.py

✅ Files skipped from review due to trivial changes (3)

src/astrameter/main.py
config.ini.example
README.md

🚧 Files skipped from review as they are similar to previous changes (1)

src/astrameter/ct002/ct002.py

coderabbitai · 2026-04-12T14:30:29Z

        "UDP_PORT": {"type": "integer"},
        "WIFI_RSSI": {"type": "integer"},
-        "DEDUPE_TIME_WINDOW": {"type": "integer"},
+        "DEDUPE_TIME_WINDOW": {"type": "float", "min": 0},


⚠️ Potential issue | 🟡 Minor

min: 0 is schema-only unless runtime also enforces it

Good change on Line 277, but this constraint currently appears enforced only by the web editor metadata. CT002 logic still accepts direct negative values from config files, which can bypass dedupe behavior unexpectedly. Consider adding runtime validation/clamping where dedupe_time_window is consumed (e.g., in CT002.__init__).
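
One way the suggested runtime check could look is sketched below. `normalize_dedupe_time_window` is a hypothetical helper, not existing AstraMeter code, and it takes the "raise" option; clamping to `0.0` would be the other option the review mentions.

```python
import math
from typing import Optional, Union


def normalize_dedupe_time_window(value: Optional[Union[int, float]]) -> float:
    """Validate a dedupe window in seconds (hypothetical helper).

    None is treated as "no dedupe"; negative or NaN values are rejected.
    """
    if value is None:
        return 0.0
    window = float(value)
    if math.isnan(window) or window < 0.0:
        raise ValueError(
            f"dedupe_time_window must be >= 0 seconds, got {value!r}"
        )
    return window
```

Calling such a helper where `dedupe_time_window` is consumed (for example in `CT002.__init__`) would apply the same lower bound to values read from a config file as the web editor schema applies to values entered in the UI.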

tomquist marked this pull request as ready for review April 12, 2026 06:55

claude added 5 commits April 12, 2026 14:18

tomquist force-pushed the claude/fix-battery-polling-oscillations-BiixX branch from 6c0eb73 to bcb29de Compare April 12, 2026 14:20

coderabbitai Bot reviewed Apr 12, 2026

tomquist merged commit fdc8056 into develop Apr 12, 2026
13 checks passed

coderabbitai Bot mentioned this pull request Apr 19, 2026

Add request deduplication support to Shelly emulator #333

Merged
