
Make saturation EMA time-weighted for sample-rate invariance#321

Merged
tomquist merged 5 commits into develop from
claude/fix-battery-polling-oscillations-BiixX
Apr 12, 2026

@tomquist tomquist commented Apr 12, 2026

Summary

This PR converts the saturation tracker from a per-sample EMA to a time-weighted EMA that produces consistent results regardless of polling cadence. This fixes a regression where V3 batteries (polling ~0.45s) and V2 batteries (polling ~3.1s) would converge to different saturation scores under identical physical conditions.

Key Changes

Core Algorithm Changes

  • Time-weighted EMA formula: The effective alpha and decay factor are now computed as 1 - (1 - alpha) ** (dt / dt_ref) and decay_factor ** (dt / dt_ref) respectively, where dt is the actual elapsed time and dt_ref is a reference interval (1.0 second)
  • New constants:
    • SATURATION_REFERENCE_DT = 1.0: Reference poll interval at which configured alpha and decay_factor apply one full step
    • SATURATION_LONG_GAP_SECONDS = 30.0: Threshold above which gaps between updates are dropped (re-seeded) rather than integrated into the EMA to avoid huge spurious steps when batteries go offline
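
The rescaling described above can be checked numerically. This is a minimal sketch, not the project's `SaturationTracker`: the helper names are hypothetical, `ALPHA = 0.1` is an illustrative value rather than the configured default, and the rise step is modeled as a plain EMA toward 1.0.

```python
SATURATION_REFERENCE_DT = 1.0  # seconds, as stated in the PR description


def effective_alpha(alpha: float, dt: float,
                    dt_ref: float = SATURATION_REFERENCE_DT) -> float:
    """Rise weight for one update that covers dt seconds."""
    return 1.0 - (1.0 - alpha) ** (dt / dt_ref)


def effective_decay(decay_factor: float, dt: float,
                    dt_ref: float = SATURATION_REFERENCE_DT) -> float:
    """Decay multiplier for one update that covers dt seconds."""
    return decay_factor ** (dt / dt_ref)


def rise(score: float, alpha: float, dt: float, steps: int) -> float:
    """Apply `steps` saturated updates, each covering dt seconds."""
    a = effective_alpha(alpha, dt)
    for _ in range(steps):
        score += a * (1.0 - score)
    return score


ALPHA = 0.1  # hypothetical value, chosen only for illustration

# Both runs cover the same 30 s window at different polling cadences.
fast = rise(0.0, ALPHA, dt=0.5, steps=60)
slow = rise(0.0, ALPHA, dt=3.0, steps=10)
print(fast, slow)
```

Both runs land on the same score up to floating-point rounding, because n updates of length dt compose to a remaining weight of `(1 - alpha) ** (n * dt / dt_ref)`, which depends only on the total elapsed time.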

State Tracking

  • Added last_saturation_update field to BalancerConsumerState to track wall-clock timestamp of the most recent EMA step
  • First sample (when last_saturation_update == 0.0) is treated as a full reference-period step for proper cold-start behavior
  • Backwards clock corrections (NTP) are clamped to zero; long gaps are dropped and re-seeded
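
The three guards listed above can be sketched as a single function that decides how much elapsed time to integrate. This is an assumption-laden illustration: `elapsed_for_update` is a hypothetical helper, and how the real tracker handles a zero `dt` or which timestamp it stores after a long gap is not shown in this PR description.

```python
from typing import Optional

SATURATION_REFERENCE_DT = 1.0       # seconds, from the PR description
SATURATION_LONG_GAP_SECONDS = 30.0  # seconds, from the PR description


def elapsed_for_update(now: float, last_update: float) -> Optional[float]:
    """Return the dt to integrate, or None when the update should be dropped.

    Hypothetical helper; the real logic lives inside SaturationTracker.
    """
    if last_update == 0.0:
        # Cold start: treat the first sample as one full reference period.
        return SATURATION_REFERENCE_DT
    # A backwards clock correction (e.g. NTP) yields a negative difference;
    # clamp it so the EMA never runs in reverse.
    dt = max(0.0, now - last_update)
    if dt > SATURATION_LONG_GAP_SECONDS:
        # Long gap (battery offline): drop instead of taking one huge step.
        return None
    return dt
```

A caller would skip the EMA step and re-seed its timestamp when the function returns `None`.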

Deprioritization Logic

  • Saturation score is now cleared when a consumer transitions from active → deprioritized, since the score reflects the previous role and is no longer relevant
  • This prevents false-positive saturation spikes during the fade window from blocking promotion back to active via _maybe_force_swap_saturated
  • Saturation updates are skipped for deprioritized consumers to avoid transient non-zero targets from the fade path triggering false saturation detection
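
A rough sketch of the role-transition behavior described above. `ConsumerState` is a hypothetical stand-in for `BalancerConsumerState`, both helper functions are invented for illustration, and resetting the timestamp along with the score is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ConsumerState:
    """Hypothetical stand-in for BalancerConsumerState."""
    deprioritized: bool = False
    saturation: float = 0.0
    last_saturation_update: float = 0.0


def set_deprioritized(state: ConsumerState, deprioritized: bool) -> None:
    """Change the role and clear the score whenever the role actually flips."""
    if state.deprioritized != deprioritized:
        state.saturation = 0.0
        # Assumption: the timestamp is reset too, so the next sample re-seeds.
        state.last_saturation_update = 0.0
    state.deprioritized = deprioritized


def should_update_saturation(state: ConsumerState) -> bool:
    """Deprioritized consumers are steered to zero, so their samples are skipped."""
    return not state.deprioritized


state = ConsumerState(saturation=0.9, last_saturation_update=123.0)
set_deprioritized(state, True)   # active -> deprioritized clears the score
```

Because the score starts from zero after either transition, a stale value from the previous role cannot influence the swap decision.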

Test Coverage

  • Added _FakeClock helper class for deterministic time-weighted EMA testing
  • Added _make_tracker_with_clock() helper method
  • New tests verify:
    • Sample-rate invariance for both rise and decay paths over 30-60 second windows
    • Long gap handling (gaps above threshold are dropped, not integrated)
    • Existing per-sample EMA behavior is preserved when dt == SATURATION_REFERENCE_DT
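
The testing approach above can be illustrated with an injectable clock. The sketch below is not the project's `_FakeClock` or `_make_tracker_with_clock()`: `FakeClock`, `ToyTracker`, and `run` are hypothetical, the alpha value is illustrative, and only the rise path is modeled.

```python
class FakeClock:
    """Deterministic clock: call it to read the time, advance() to move it."""

    def __init__(self, start: float = 1000.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, dt: float) -> None:
        self.now += dt


class ToyTracker:
    """Toy time-weighted EMA that only models the rise toward 1.0."""

    def __init__(self, clock, alpha: float = 0.1) -> None:
        self.clock = clock
        self.alpha = alpha
        self.score = 0.0
        # Seed the timestamp so the first update integrates a real dt.
        self.last_update = clock()

    def update(self) -> None:
        now = self.clock()
        dt = now - self.last_update
        self.last_update = now
        a = 1.0 - (1.0 - self.alpha) ** dt  # reference interval of 1.0 s
        self.score += a * (1.0 - self.score)


def run(dt: float, window: float) -> float:
    """Feed saturated samples every dt seconds for `window` seconds."""
    clock = FakeClock()
    tracker = ToyTracker(clock)
    for _ in range(round(window / dt)):
        clock.advance(dt)   # consume elapsed time before the EMA step
        tracker.update()
    return tracker.score
```

Comparing `run(0.5, 30.0)` with `run(3.0, 30.0)` gives a sample-rate-invariance check that needs no real sleeping.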

Minor Fixes

  • Changed dedupe_time_window config parsing from getint() to getfloat() to support fractional seconds
  • Updated SaturationTracker docstring to document the time-weighted approach
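
The difference between the two `configparser` getters can be reproduced in isolation. This is a small sketch using an in-memory config; the `[CT002]` section name is assumed from the web config key `CT002.DEDUPE_TIME_WINDOW`, and the surrounding code is not the project's `main.py`.

```python
import configparser

parser = configparser.ConfigParser()
parser.read_string("[CT002]\nDEDUPE_TIME_WINDOW = 0.5\n")
section = parser["CT002"]

# getfloat accepts fractional seconds and honors the fallback for missing keys.
window = section.getfloat("DEDUPE_TIME_WINDOW", fallback=0.0)
missing = section.getfloat("NOT_CONFIGURED", fallback=0.0)

# getint passes the raw string to int(), which rejects "0.5".
try:
    section.getint("DEDUPE_TIME_WINDOW")
    getint_failed = False
except ValueError:
    getint_failed = True

print(window, missing, getint_failed)
```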

https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD

Summary by CodeRabbit

  • Bug Fixes

    • Saturation tracking now uses a time-weighted EMA, making score changes fair across powermeters with different update rates; raise EFFICIENCY_SATURATION_THRESHOLD (e.g., ~0.8) if you see unnecessary swaps with slow meters.
    • Deduplication window now accepts fractional seconds for finer timing control.
  • Documentation

    • Clarified EFFICIENCY_SATURATION_THRESHOLD behavior and guidance in README and example config.
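
The guidance for slow meters follows from the arithmetic of the time-weighted weight. The sketch below is illustrative only: the alpha of 0.1 and the thresholds of 0.5 and 0.8 are hypothetical values, not the project's defaults, and the helpers are invented for this example.

```python
def effective_alpha(alpha: float, dt: float, dt_ref: float = 1.0) -> float:
    """Rise weight for one update that covers dt seconds."""
    return 1.0 - (1.0 - alpha) ** (dt / dt_ref)


def samples_to_cross(threshold: float, alpha: float, dt: float) -> int:
    """Count consecutive saturated samples needed to reach the threshold."""
    a = effective_alpha(alpha, dt)
    score, n = 0.0, 0
    while score < threshold:
        score += a * (1.0 - score)
        n += 1
    return n


ALPHA = 0.1  # hypothetical value

per_sample_1s = effective_alpha(ALPHA, 1.0)    # weight of one 1 s sample
per_sample_10s = effective_alpha(ALPHA, 10.0)  # weight of one 10 s sample

low = samples_to_cross(0.5, ALPHA, 10.0)   # hypothetical lower threshold
high = samples_to_cross(0.8, ALPHA, 10.0)  # raised threshold
print(per_sample_1s, per_sample_10s, low, high)
```

With these illustrative numbers a single 10 s sample carries a weight of about 0.65, so one saturated reading crosses a threshold of 0.5, while a threshold of 0.8 requires two consecutive saturated readings.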

coderabbitai Bot commented Apr 12, 2026

Walkthrough

The changes convert saturation tracking in the load balancer to a time-weighted exponential moving average (EMA), adding a last-update timestamp and guards for zero-length and overly long gaps between updates. Documentation and the example config clarify the behavior and recommend threshold adjustments for slow powermeters. CT002 dedupe timing now accepts floating-point values, and tests were extended to cover the time-weighted behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation Updates: README.md, config.ini.example | Clarified that saturation is computed via a time-weighted EMA; noted slow powermeters (>10s) accumulate saturation faster per sample and suggested raising EFFICIENCY_SATURATION_THRESHOLD (e.g., ~0.8) to avoid unnecessary swaps. Comments only; no runtime behavior change. |
| Saturation EMA Implementation: src/astrameter/ct002/balancer.py | Reworked SaturationTracker to a time-weighted EMA using elapsed dt and reference interval constants (SATURATION_REFERENCE_DT, SATURATION_LONG_GAP_SECONDS). Added last_saturation_update to BalancerConsumerState. Skips or reseeds on dt == 0 or long gaps; updated deprioritization to clear full saturation state and adjusted related logic/docstrings. |
| Type Precision Updates: src/astrameter/ct002/ct002.py, src/astrameter/main.py, src/astrameter/web_config.py | Changed CT002.__init__ default dedupe_time_window from 0 to 0.0. main.py now parses DEDUPE_TIME_WINDOW with getfloat(..., fallback=0.0). Web config metadata updated: CT002.DEDUPE_TIME_WINDOW type changed from integer to float with min: 0. |
| Test Coverage Expansion: tests/test_balancer.py | Added _FakeClock and _make_tracker_with_clock() to control dt in tests. Updated EMA tests to advance time between updates and added regression tests for sample-rate invariance and long-gap reseed behavior. |

Sequence Diagram

sequenceDiagram
    participant Time as Time / Clock
    participant Tracker as SaturationTracker
    participant State as BalancerConsumerState
    participant LB as LoadBalancer

    Note over Time,LB: Initialization / Grace Clear
    Time->>State: last_saturation_update = 0

    Note over Time,Tracker: First update after reseed
    Time->>Tracker: update() at t1
    Tracker->>Tracker: last_saturation_update <= 0 (unseeded) => dt = SATURATION_REFERENCE_DT
    Tracker->>Tracker: ratio = dt / SATURATION_REFERENCE_DT
    Tracker->>Tracker: apply decay: saturation *= decay_factor ** ratio
    Tracker->>State: last_saturation_update = t1

    Note over Time,Tracker: Subsequent normal updates
    Time->>Tracker: update() at t2
    Tracker->>Tracker: dt = t2 - last_saturation_update
    alt dt == 0
        Tracker->>Tracker: skip update
    else dt > 0 and dt ≤ SATURATION_LONG_GAP_SECONDS
        Tracker->>Tracker: ratio = dt / SATURATION_REFERENCE_DT
        Tracker->>Tracker: alpha_eff = 1 - (1 - alpha) ** ratio
        Tracker->>Tracker: apply time-weighted EMA (rise/decay)
        Tracker->>State: last_saturation_update = t2
    else dt > SATURATION_LONG_GAP_SECONDS
        Tracker->>Tracker: drop/re-seed (do not apply EMA)
        Tracker->>State: last_saturation_update = 0
    end

    Note over LB,Tracker: Decision flow
    LB->>Tracker: query saturation score
    Tracker-->>LB: current EMA value
    LB->>LB: decide swap / deprioritize based on threshold and candidate health

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.74% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: converting saturation EMA to time-weighted for sample-rate invariance, which is the core technical transformation across the changeset. |

@tomquist tomquist marked this pull request as ready for review April 12, 2026 06:55
claude added 5 commits April 12, 2026 14:18
Switch DEDUPE_TIME_WINDOW parsing from getint to getfloat so users can
configure sub-second dedupe windows (e.g. 0.5). Previously any decimal
value crashed the emulator at startup with a ValueError from
configparser's int coercion.

The comparisons in CT002 already operate on floats; only the config
reader needed to change. Default remains 0.0 (no dedupe).

https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD

The efficiency saturation tracker used a per-sample EMA whose rise and
decay weights were baked into the alpha/decay_factor constants.  For
V3 Marstek batteries polling every ~0.45 s the EMA accumulated ~7x
faster than for V2 batteries polling every ~3 s, so under identical
physical conditions the two fleets converged to very different scores
and could oscillate between probe/rotate decisions — visible in the
field as a balancer that alternately promoted and demoted both
batteries while the grid drifted uncompensated.

Rework the EMA to be time-weighted against a fixed reference period
(SATURATION_REFERENCE_DT = 1.0 s): the effective per-update rise weight
becomes ``1 - (1 - alpha) ** (dt / dt_ref)`` and the decay becomes
``decay_factor ** (dt / dt_ref)``.  At dt == dt_ref both reduce to the
previous per-sample formulas, so the tuned defaults keep their
meaning.  Guard against pathologies: a long gap (battery offline for
>30 s) drops the update rather than dosing the EMA with a huge step,
and a backwards clock is clamped to zero.

Fix a related post-probe lockup exposed by the stronger EMA: during
the efficiency fade window that follows a probe handoff, the
deprioritized consumer's ``last_target`` still carried transient fade
values, and feeding those into the saturation EMA raised a false
"cannot follow target" spike high enough to stay above the swap
threshold for many ticks — leaving ``_maybe_force_swap_saturated``
unable to find a healthy backup and pinning the active battery at
target = 0 while the grid imported.  Skip saturation updates entirely
for deprioritized consumers (they are being steered to zero, so the
score has no meaningful interpretation there), and clear the
saturation score on the active → deprioritized transition as well,
mirroring the clear already used for deprioritized → active, so the
reset applies in both directions.

Tests: drive the existing per-sample tests off a FakeClock so they
keep exercising the reference-period formula, and add
sample-rate-invariance tests for both the rise and decay branches plus
a regression guard for the long-gap re-seed.

Seed ``last_saturation_update = clock()`` before the loop so both
fast and slow trackers cover exactly the same wall-clock window.
Previously the first iteration used the reference-period bootstrap
(dt=1.0) regardless of the test's dt, skewing effective EMA time
by ~2.5 s between the two cadences.

Also move ``clock.advance(dt)`` before ``tracker.update()`` so elapsed
time is consumed before the EMA step, matching production order.

https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD

The time-weighted saturation EMA accumulates faster per sample when
the powermeter update interval is large (e.g. >10 s), which can
cause unnecessary forced swaps.  Note the workaround (raise the
threshold) in both README.md and config.ini.example.

https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD

The config loader was changed from getint to getfloat to accept
fractional seconds (commit 9ff9f0d), but the web configuration
editor schema still declared the key as integer.  Update to float
with min=0 so the editor renders a decimal input.

https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD
@tomquist tomquist force-pushed the claude/fix-battery-polling-oscillations-BiixX branch from 6c0eb73 to bcb29de Compare April 12, 2026 14:20

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/astrameter/web_config.py`:
- Line 277: The web schema added "DEDUPE_TIME_WINDOW" with min: 0 but runtime
still accepts negative values; update the runtime where dedupe_time_window is
consumed (e.g., in CT002.__init__) to validate and clamp/raise on invalid
values: read the incoming config value for dedupe_time_window (or use the
DEDUPE_TIME_WINDOW key), check if value is None or < 0, then either set it to 0
(clamp) or raise a clear ValueError/ConfigError, and ensure downstream code uses
this validated value; add a small unit test for CT002 to assert negative inputs
are rejected or clamped accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 48621ed5-ad2a-4976-bfb6-9ace2babe313

📥 Commits

Reviewing files that changed from the base of the PR and between 6c0eb73 and bcb29de.

📒 Files selected for processing (7)
  • README.md
  • config.ini.example
  • src/astrameter/ct002/balancer.py
  • src/astrameter/ct002/ct002.py
  • src/astrameter/main.py
  • src/astrameter/web_config.py
  • tests/test_balancer.py
✅ Files skipped from review due to trivial changes (3)
  • src/astrameter/main.py
  • config.ini.example
  • README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/astrameter/ct002/ct002.py

  "UDP_PORT": {"type": "integer"},
  "WIFI_RSSI": {"type": "integer"},
- "DEDUPE_TIME_WINDOW": {"type": "integer"},
+ "DEDUPE_TIME_WINDOW": {"type": "float", "min": 0},

⚠️ Potential issue | 🟡 Minor

min: 0 is schema-only unless runtime also enforces it

Good change on Line 277, but this constraint currently appears enforced only by the web editor metadata. CT002 logic still accepts direct negative values from config files, which can bypass dedupe behavior unexpectedly. Consider adding runtime validation/clamping where dedupe_time_window is consumed (e.g., in CT002.__init__).
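
One way the suggested runtime check could look, as a minimal sketch. `validate_dedupe_time_window` is a hypothetical helper, not existing project code, and it takes the "raise" option; clamping to 0.0 would be the other option the comment mentions.

```python
import math
from typing import Optional


def validate_dedupe_time_window(value: Optional[float]) -> float:
    """Return a usable dedupe window in seconds, rejecting invalid input.

    Hypothetical helper; CT002.__init__ could call something like this
    before storing the value.
    """
    if value is None:
        return 0.0  # treat "not configured" as dedupe disabled
    window = float(value)
    if math.isnan(window) or window < 0:
        raise ValueError(
            f"DEDUPE_TIME_WINDOW must be >= 0 seconds, got {value!r}"
        )
    return window
```

Raising at startup surfaces a misconfigured file immediately, whereas clamping keeps the emulator running with dedupe disabled; which behavior fits better is a project decision.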

@tomquist tomquist merged commit fdc8056 into develop Apr 12, 2026
13 checks passed