Make saturation EMA time-weighted for sample-rate invariance#321
Make saturation EMA time-weighted for sample-rate invariance#321
Conversation
WalkthroughThe changes convert saturation tracking to a time-weighted exponential moving average (EMA) in the load balancer, adding a last-update timestamp and guards for zero and long gaps. Documentation and example config clarify the behavior and recommend threshold adjustments for slow powermeters. CT002 dedupe timing was made floating-point and tests were extended for time-weighted behavior. Changes
Sequence DiagramsequenceDiagram
participant Time as Time / Clock
participant Tracker as SaturationTracker
participant State as BalancerConsumerState
participant LB as LoadBalancer
Note over Time,LB: Initialization / Grace Clear
Time->>State: last_saturation_update = 0
Note over Time,Tracker: First update after reseed
Time->>Tracker: update() at t1
Tracker->>Tracker: dt = t1 - last_saturation_update (<=0) => use SATURATION_REFERENCE_DT
Tracker->>Tracker: ratio = dt / SATURATION_REFERENCE_DT
Tracker->>Tracker: apply decay: saturation *= decay_factor ** ratio
Tracker->>State: last_saturation_update = t1
Note over Time,Tracker: Subsequent normal updates
Time->>Tracker: update() at t2
Tracker->>Tracker: dt = t2 - last_saturation_update
alt dt == 0
Tracker->>Tracker: skip update
else dt > 0 and dt ≤ SATURATION_LONG_GAP_SECONDS
Tracker->>Tracker: ratio = dt / SATURATION_REFERENCE_DT
Tracker->>Tracker: alpha_eff = 1 - (1 - alpha) ** ratio
Tracker->>Tracker: apply time-weighted EMA (rise/decay)
Tracker->>State: last_saturation_update = t2
else dt > SATURATION_LONG_GAP_SECONDS
Tracker->>Tracker: drop/re-seed (do not apply EMA)
Tracker->>State: last_saturation_update = 0
end
Note over LB,Tracker: Decision flow
LB->>Tracker: query saturation score
Tracker-->>LB: current EMA value
LB->>LB: decide swap / deprioritize based on threshold and candidate health
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Switch DEDUPE_TIME_WINDOW parsing from getint to getfloat so users can configure sub-second dedupe windows (e.g. 0.5). Previously any decimal value crashed the emulator at startup with a ValueError from configparser's int coercion. The comparisons in CT002 already operate on floats; only the config reader needed to change. Default remains 0.0 (no dedupe). https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD
The efficiency saturation tracker used a per-sample EMA whose rise and decay weights were baked into the alpha/decay_factor constants. For V3 Marstek batteries polling every ~0.45 s the EMA accumulated ~7x faster than for V2 batteries polling every ~3 s, so under identical physical conditions the two fleets converged to very different scores and could oscillate between probe/rotate decisions — visible in the field as a balancer that alternately promoted and demoted both batteries while the grid drifted uncompensated. Rework the EMA to be time-weighted against a fixed reference period (SATURATION_REFERENCE_DT = 1.0 s): the effective per-update rise weight becomes ``1 - (1 - alpha) ** (dt / dt_ref)`` and the decay becomes ``decay_factor ** (dt / dt_ref)``. At dt == dt_ref both reduce to the previous per-sample formulas, so the tuned defaults keep their meaning. Guard against pathologies: a long gap (battery offline for >30 s) drops the update rather than dosing the EMA with a huge step, and a backwards clock is clamped to zero. Fix a related post-probe lockup exposed by the stronger EMA: during the efficiency fade window that follows a probe handoff, the deprioritized consumer's ``last_target`` still carried transient fade values, and feeding those into the saturation EMA raised a false "cannot follow target" spike high enough to stay above the swap threshold for many ticks — leaving ``_maybe_force_swap_saturated`` unable to find a healthy backup and pinning the active battery at target = 0 while the grid imported. Skip saturation updates entirely for deprioritized consumers (they are being steered to zero, so the score has no meaningful interpretation there), and clear the saturation score symmetrically on the active → deprioritized transition so the symmetric clear already used for deprioritized → active works in both directions. Tests: drive the existing per-sample tests off a FakeClock so they keep exercising the reference-period formula, and add sample-rate-invariance tests for both the rise and decay branches plus a regression guard for the long-gap re-seed.
Seed ``last_saturation_update = clock()`` before the loop so both fast and slow trackers cover exactly the same wall-clock window. Previously the first iteration used the reference-period bootstrap (dt=1.0) regardless of the test's dt, skewing effective EMA time by ~2.5 s between the two cadences. Also move ``clock.advance(dt)`` before ``tracker.update()`` so elapsed time is consumed before the EMA step, matching production order. https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD
The time-weighted saturation EMA accumulates faster per sample when the powermeter update interval is large (e.g. >10 s), which can cause unnecessary forced swaps. Note the workaround (raise the threshold) in both README.md and config.ini.example. https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD
The config loader was changed from getint to getfloat to accept fractional seconds (commit 9ff9f0d), but the web configuration editor schema still declared the key as integer. Update to float with min=0 so the editor renders a decimal input. https://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD
6c0eb73 to
bcb29de
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/astrameter/web_config.py`:
- Line 277: The web schema added "DEDUPE_TIME_WINDOW" with min: 0 but runtime
still accepts negative values; update the runtime where dedupe_time_window is
consumed (e.g., in CT002.__init__) to validate and clamp/raise on invalid
values: read the incoming config value for dedupe_time_window (or use the
DEDUPE_TIME_WINDOW key), check if value is None or < 0, then either set it to 0
(clamp) or raise a clear ValueError/ConfigError, and ensure downstream code uses
this validated value; add a small unit test for CT002 to assert negative inputs
are rejected or clamped accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 48621ed5-ad2a-4976-bfb6-9ace2babe313
📒 Files selected for processing (7)
README.mdconfig.ini.examplesrc/astrameter/ct002/balancer.pysrc/astrameter/ct002/ct002.pysrc/astrameter/main.pysrc/astrameter/web_config.pytests/test_balancer.py
✅ Files skipped from review due to trivial changes (3)
- src/astrameter/main.py
- config.ini.example
- README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- src/astrameter/ct002/ct002.py
| "UDP_PORT": {"type": "integer"}, | ||
| "WIFI_RSSI": {"type": "integer"}, | ||
| "DEDUPE_TIME_WINDOW": {"type": "integer"}, | ||
| "DEDUPE_TIME_WINDOW": {"type": "float", "min": 0}, |
There was a problem hiding this comment.
min: 0 is schema-only unless runtime also enforces it
Good change on Line 277, but this constraint currently appears enforced only by the web editor metadata. CT002 logic still accepts direct negative values from config files, which can bypass dedupe behavior unexpectedly. Consider adding runtime validation/clamping where dedupe_time_window is consumed (e.g., in CT002.__init__).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/astrameter/web_config.py` at line 277, The web schema added
"DEDUPE_TIME_WINDOW" with min: 0 but runtime still accepts negative values;
update the runtime where dedupe_time_window is consumed (e.g., in
CT002.__init__) to validate and clamp/raise on invalid values: read the incoming
config value for dedupe_time_window (or use the DEDUPE_TIME_WINDOW key), check
if value is None or < 0, then either set it to 0 (clamp) or raise a clear
ValueError/ConfigError, and ensure downstream code uses this validated value;
add a small unit test for CT002 to assert negative inputs are rejected or
clamped accordingly.
Summary
This PR converts the saturation tracker from a per-sample EMA to a time-weighted EMA that produces consistent results regardless of polling cadence. This fixes a regression where V3 batteries (polling ~0.45s) and V2 batteries (polling ~3.1s) would converge to different saturation scores under identical physical conditions.
Key Changes
Core Algorithm Changes
1 - (1 - alpha) ** (dt / dt_ref)anddecay_factor ** (dt / dt_ref)respectively, wheredtis the actual elapsed time anddt_refis a reference interval (1.0 second)SATURATION_REFERENCE_DT = 1.0: Reference poll interval at which configured alpha and decay_factor apply one full stepSATURATION_LONG_GAP_SECONDS = 30.0: Threshold above which gaps between updates are dropped (re-seeded) rather than integrated into the EMA to avoid huge spurious steps when batteries go offlineState Tracking
last_saturation_updatefield toBalancerConsumerStateto track wall-clock timestamp of the most recent EMA steplast_saturation_update == 0.0) is treated as a full reference-period step for proper cold-start behaviorDeprioritization Logic
_maybe_force_swap_saturatedTest Coverage
_FakeClockhelper class for deterministic time-weighted EMA testing_make_tracker_with_clock()helper methoddt == SATURATION_REFERENCE_DTMinor Fixes
dedupe_time_windowconfig parsing fromgetint()togetfloat()to support fractional secondsSaturationTrackerdocstring to document the time-weighted approachhttps://claude.ai/code/session_01AaF4EqZPib3pmM44w8DJXD
Summary by CodeRabbit
Bug Fixes
Documentation