<a href="https://colab.research.google.com/github/m-zaniolo/CEE690-ESAA/blob/main/Lab05_SDP_class.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 3 — Stochastic Dynamic Programming for Lake Operations

This notebook shows how to use **stochastic dynamic programming (SDP)** to compute a stationary release policy for a reservoir.

- Decision: **daily release** $r_t$  
- Uncertainty: **daily inflow** $q_t$

We optimize a trade-off between:

- **Hydropower production** from releasing water through the turbines
- **Meeting agricultural demand**, represented as a penalty for **unmet demand** relative to a fixed demand $D$ (set to $450\,\mathrm{m^3/s}$ in the code below)

For SDP we also need a stochastic inflow model, so we approximate inflow uncertainty with a simple empirical distribution built from the historical record.


## 1. Setup

In [None]:
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams['figure.dpi'] = 150
np.set_printoptions(precision=3, suppress=True)

# Data location (raw GitHub URLs work in Colab)
DATA_URL = "https://raw.githubusercontent.com/m-zaniolo/CEE690-ESAA/main/data/"

# Daily inflow time series [m3/s]
inflow = np.loadtxt(DATA_URL + "inflow.txt", delimiter="\t")

# A few example historical release strategies [m3/s] (used only for comparison plots)
release_hist = np.loadtxt(DATA_URL + "release1.txt", delimiter="\t")

# Cyclostationary net evaporation series [mm/day] (length 365)
# (The repository file name uses 'Gibe1', even though we apply it to this lake model.)
evap_mm_day = np.loadtxt(DATA_URL + "netevap_Gibe1.txt", delimiter=" ")

## 3. Reservoir simulation model

We use the same daily mass balance as in Lab 1, implemented in `mass_balance_gibeIII`.

$$
S_{t+1} = S_t + (q_{t+1} - r_{t+1} - e_t)\,\Delta t
$$

- $q$: inflow $[\mathrm{m^3/s}]$  
- $r$: release $[\mathrm{m^3/s}]$  
- $e$: evaporation expressed as an equivalent outflow $[\mathrm{m^3/s}]$  
- $\Delta t$: 1 day in seconds

Lab 1 also uses an empirical storage–level relationship to compute the level from storage.


In [None]:
# Reservoir parameters (same as Lab 1)
S_max = 1.47e10          # [m3] maximum storage
sim_step = 60*60*24            # [s] one day
stor_to_surface = 0.0142 # surface = stor_to_surface * storage

# Storage -> level relationship used in Lab 1 (empirical)
# l = 0.0521 * (s**0.3589)

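def mass_balance_gibeIII(s0, inflow, release, evap_mm_day, S):
    """Simulate daily storage [m3] and level [m] from inflow and release series.

    s0: initial storage [m3]; inflow, release: daily series [m3/s];
    evap_mm_day: cyclostationary net evaporation [mm/day] (length 365);
    S: maximum storage [m3]. Returns (s, l), each of length len(inflow) + 1.
    """
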
    # time convention (prepend a dummy value so we can use t+1 indexing)
    inflow_ = np.concatenate(([-999], np.asarray(inflow)))
    release_ = np.concatenate(([-999], np.asarray(release)))

    # initialize storage vector
    s = np.zeros(len(inflow_))
    s[0] = s0

    # define constants
    H = len(inflow)  # simulation horizon (days)

    for t in range(H):
        # evaporation expressed as an equivalent outflow [m3/s]
        evaporation_t = evap_mm_day[t % 365] / 1000.0 * s[t] * stor_to_surface / sim_step

        # mass balance
        s[t+1] = s[t] + (inflow_[t+1] - release_[t+1] - evaporation_t) * sim_step
        s[t+1] = min(max(0, s[t+1]), S)  # physical bounds

    # storage -> level (same as Lab 1)
    l = 0.0521 * (s ** 0.3589)
    return s, l


### Test the simulator

We simulate the historical release series as a sanity check.

In [None]:
s0 = 0.7 * S_max

# Use the Lab 1 simulator with a historical release series (for comparison)
s_hist, l_hist = mass_balance_gibeIII(s0, inflow, release_hist, evap_mm_day, S_max)

plt.figure()
plt.plot(s_hist)
plt.xlabel("Day")
plt.ylabel("Storage (m3)")
plt.ylim(0, S_max * 1.05)
plt.grid(True, alpha=0.3)
plt.show()

plt.figure()
plt.plot(l_hist)
plt.xlabel("Day")
plt.ylabel("Level (m)")
plt.grid(True, alpha=0.3)
plt.show()


## 4. Objectives

We define two **step costs** evaluated each day.

### Hydropower
We use the same algebraic form you will see again in Lab 2:

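$$
\texttt{hp1}_t = \ell_t\, r_t\, \texttt{efficiency}\, g\, d \cdot 10^{-4}
$$

Here $\ell_t$ is the level, $r_t$ is the release, $d$ is the water density, and $g$ is the gravitational acceleration. The factor $10^{-4}$ is a fixed scaling: the code divides by `10e3`, which equals $10^4$.
`hp1` is a **benefit**, so it enters the scalar step cost with a minus sign.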

### Agriculture
We penalize **unmet demand** relative to the demand $D$ $[\mathrm{m^3/s}]$:

$$
\texttt{food}_t = \max(D-r_t, 0)^2
$$



### Weighted sum
SDP needs a single scalar step cost to minimize. We use:

$$
g_t = \lambda\,\texttt{food}_t - (1-\lambda)\,\texttt{hp1}_t
$$
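
As a small illustration of the weighted sum, the sketch below evaluates the scalar step cost for a few candidate releases at a fixed level. The level of 200 m and the helper name `scalar_step_cost` are assumptions made for this example; the constants mirror the lab's objective cell.

```python
import numpy as np

D = 450.0          # [m3/s] demand, same value as in the lab's objective cell
efficiency = 0.90  # turbine efficiency
g = 9.81           # [m/s2]
d = 1000.0         # [kg/m3]

def scalar_step_cost(level, release, lam):
    """Weighted-sum step cost lam * food - (1 - lam) * hp1 (hypothetical helper)."""
    hp1 = level * release * efficiency * g * d / 10e3   # note: 10e3 == 1e4
    food = np.maximum(D - release, 0.0) ** 2
    return lam * food - (1.0 - lam) * hp1

releases = np.array([0.0, 225.0, 450.0, 900.0])            # candidate releases [m3/s]
cost_food_only = scalar_step_cost(200.0, releases, lam=1.0)  # agriculture only
cost_hp_only = scalar_step_cost(200.0, releases, lam=0.0)    # hydropower only

print(cost_food_only)
print(cost_hp_only)
```

With $\lambda=1$ the cost stops decreasing once the release reaches $D$, while with $\lambda=0$ it keeps decreasing as the release grows. Intermediate weights trade off these two behaviours.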




In [None]:
# Objectives (same as Lab 2)

D = 450            # [m3/s] agricultural demand
efficiency = 0.90  # turbine efficiency (assumed constant)
g = 9.81           # [m/s2]
d = 1000.0         # [kg/m3] water density


def step_cost_components(l, r):
    hp1 = l * r * efficiency * g * d / 10e3
    food = (np.maximum(D - r, 0.0)) ** 2
    return hp1, food


## 5. Stochastic inflow model

A basic SDP needs a probability model for inflow. Here we build an **empirical discrete distribution** from the historical inflow record.

*Option A:* equal-width bins. All bins have the same width but different probabilities.

In [None]:
def empirical_inflow_distribution_optionA(q_series, n_bins=25):
    q = np.asarray(q_series)
    q = q[np.isfinite(q)]

    edges = np.linspace(q.min(), q.max(), n_bins + 1)
    counts, _ = np.histogram(q, bins=edges)
    probs = counts / counts.sum()

    reps = 0.5 * (edges[:-1] + edges[1:])  # midpoint of each bin
    return reps, probs, edges


q_vals, q_probs, q_edges = empirical_inflow_distribution_optionA(inflow, n_bins=25)

plt.figure()
plt.bar(np.arange(len(q_vals)), q_probs)
plt.xlabel("Inflow bin")
plt.ylabel("Probability")
plt.show()



Equal-width bins are mathematically acceptable for SDP, but for hydrologic inflows they are often a poor numerical approximation.
If the inflow distribution is strongly right-skewed, equal-width bins put most samples in the first bin near zero, while the long tail is spread over many bins with tiny probability mass. That has two consequences:


*   You get very low resolution where the system spends most of its time. Many distinct low–moderate inflows that matter for day-to-day releases are collapsed into one dominant bin.
*   You get noisy information in the tail. High-flow bins have few samples, so their representative value is unstable. Yet those rare events can drive storage and hydropower outcomes.
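
To see the effect on a skewed sample, the sketch below compares the probability mass per bin under equal-width and equal-probability binning. It uses a synthetic lognormal sample (an assumption for illustration), not the lab's inflow record, and all names ending in `_demo` or `_probs` are local to this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic right-skewed "inflow" sample (assumed lognormal, illustration only)
q_demo = rng.lognormal(mean=4.0, sigma=1.0, size=5000)

n_bins = 10

# Equal-width bins: same width, very different probabilities
width_edges = np.linspace(q_demo.min(), q_demo.max(), n_bins + 1)
width_probs = np.histogram(q_demo, bins=width_edges)[0] / q_demo.size

# Equal-probability (quantile) bins: different widths, similar probabilities
quant_edges = np.quantile(q_demo, np.linspace(0, 1, n_bins + 1))
quant_probs = np.histogram(q_demo, bins=quant_edges)[0] / q_demo.size

print(width_probs.round(3))
print(quant_probs.round(3))
```

For a sample like this, the first equal-width bin holds most of the probability mass, whereas the quantile bins each hold roughly one tenth.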

*Option B:* equal-probability (quantile) bins. All bins have approximately the same probability but different widths.  

In [None]:
def empirical_inflow_distribution(q_series, n_bins=25, rep="mean"):
    q = np.asarray(q_series)
    q = q[np.isfinite(q)]
    q = q[q >= 0]

    # Quantile edges
    edges = np.quantile(q, np.linspace(0, 1, n_bins + 1))
    edges = np.unique(edges)  # handle ties that create repeated edges

    # Bin assignment
    idx = np.digitize(q, edges[1:-1], right=True)  # 0..(K-1)
    K = len(edges) - 1

    probs = np.array([(idx == k).mean() for k in range(K)])

    reps = np.zeros(K)
    for k in range(K):
        qk = q[idx == k]
        if len(qk) == 0:
            reps[k] = 0.5 * (edges[k] + edges[k+1])
        else:
            reps[k] = qk.mean() if rep == "mean" else np.median(qk)

    return reps, probs, edges

q_vals, q_probs, q_edges = empirical_inflow_distribution(inflow, n_bins=25)

# Bin geometry
widths = np.diff(q_edges)
lefts  = q_edges[:-1]

# Probability mass per bin (widths vary; heights are p_k)
plt.figure(figsize=(7,4))
plt.bar(lefts, q_probs, width=widths, align="edge", edgecolor="k")
plt.xlabel("Inflow q (m³/s)")
plt.ylabel("Probability mass in bin")
plt.title("Quantile bins: varying widths (each bin has ~equal probability)")
plt.show()



## 6. SDP formulation

**State:** storage $s$ $[\mathrm{m^3}]$

**Action:** release $r$ $[\mathrm{m^3/s}]$

**Transition:** one-day mass balance with a stochastic inflow draw $q$

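We compute the infinite-horizon discounted value function $H$, which satisfies the Bellman equation:

$$
H(s_t)=\min_{r_t}\; \mathbb{E}_{q}\left[g(s_t,r_t,q) + \gamma\, H(s_{t+1})\right]
$$

Here $g(\cdot)$ is the scalar step cost from Section 4 and $\gamma$ is the discount factor.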

We solve it by **value iteration** on a discretized storage grid.
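
Before applying value iteration to the lake, it can help to see it on a toy problem. The sketch below uses a made-up reservoir with five integer storage states, three releases, and three equally spaced inflows. All names and numbers are assumptions for illustration, and the discount factor is set below 1 so that the iteration contracts.

```python
import numpy as np

s_levels = np.arange(5)            # toy storage states 0..4 (arbitrary units)
r_levels = np.arange(3)            # toy releases 0..2
q_levels = np.array([0, 1, 2])     # toy inflows
q_p = np.array([0.3, 0.4, 0.3])    # inflow probabilities
demand = 2
gamma_demo = 0.95                  # discount factor < 1

H = np.zeros(len(s_levels))
for it in range(1000):
    H_new = np.empty_like(H)
    for i, s in enumerate(s_levels):
        best = np.inf
        for r in r_levels:
            if r > s:              # cannot release more than is stored
                continue
            # next storage for every inflow value, clipped to the grid
            s_next = np.clip(s + q_levels - r, 0, s_levels[-1])
            # step cost + discounted expected value-to-go
            cost = (demand - r) ** 2 + gamma_demo * np.dot(q_p, H[s_next])
            best = min(best, cost)
        H_new[i] = best
    err = np.max(np.abs(H_new - H))
    H = H_new
    if err < 1e-8:
        break

print(it + 1, H.round(3))
```

Because the states sit exactly on the grid, no interpolation is needed here. The lake problem has continuous storage, so the value at the next state has to be interpolated between grid nodes.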


In [None]:
# Discretization grids
n_s =
s_grid =
l_grid = 0.0521 * (s_grid ** 0.3589)


# Action grid:
r_max = 1200
n_r = 36
r_grid = np.linspace(0.0, r_max, n_r)

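# Discount factor. gamma = 1 means no discounting: H then typically keeps growing
# between iterations, so the tol-based stopping test is not met. Use a value
# slightly below 1 (long horizon) for the discounted problem.
gamma = 1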

q_vals_arr = np.asarray(q_vals, dtype=float)
q_probs_arr = np.asarray(q_probs, dtype=float)
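
One possible storage discretization is sketched below. The number of nodes (`n_s_demo = 51`) and the `_demo` names are assumptions for illustration, not values prescribed by the lab; a uniform grid from empty to full is only one reasonable choice.

```python
import numpy as np

S_max_demo = 1.47e10   # [m3] same value as S_max in the lab
n_s_demo = 51          # assumed number of storage nodes (not from the lab)

# Uniform, increasing grid from empty to full (np.interp needs increasing nodes)
s_grid_demo = np.linspace(0.0, S_max_demo, n_s_demo)

# Level at each node, using the lab's empirical storage -> level relation
l_grid_demo = 0.0521 * s_grid_demo ** 0.3589

print(s_grid_demo[:3], l_grid_demo[-1])
```

A finer grid reduces interpolation error in the value function but increases the cost of every value-iteration sweep.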


### Value iteration solver

In [None]:
def solve_sdp_value_iteration(lam, max_iter=1000, tol=1e-3, verbose=True):
    """Solve the discounted infinite-horizon SDP by value iteration.

    We minimize the expected discounted sum of scalar step costs.

    Parameters
    ----------
    lam : float
        Trade-off parameter in [0, 1].
        lam = 1   -> prioritize agriculture (minimize unmet demand)
        lam = 0   -> prioritize hydropower (maximize hp1)
    """
    lam = float(lam)

    n_s = len(s_grid)
    n_r = len(r_grid)

    H = np.zeros(n_s, dtype=float)
    policy = np.zeros(n_s, dtype=float)

    # Allocate these so we can print them
    Q = np.zeros((n_s, n_r), dtype=float)
    policy_idx = np.zeros(n_s, dtype=int)

    for it in range(max_iter):
        H_old = H.copy()

        for i, s in enumerate(s_grid):

            # feasibility check on release (cannot release more than the storage)
            r_feas_max = s / sim_step
            feasible = (r_grid <= r_feas_max)

            # cap release for the transition (so s_next never over-releases)
            r_use = np.minimum(r_grid, r_feas_max)

            # evaporation as in the lab loop (here: stationary approximation using mean evap)
            surface = s * stor_to_surface
            evap_outflow = (np.mean(evap_mm_day) / 1000.0) / sim_step * surface  # [m3/s]

            # transition for each (q_bin, action)
            s_next = s + (q_vals_arr[:, None] - r_use[None, :] - evap_outflow) * sim_step
            s_next = np.clip(s_next, 0.0, S_max)  # (K, n_r)

            # value-to-go at next state (linear interpolation on the storage grid)
            H_next = np.interp(s_next, s_grid, H_old)  # (K, n_r)

            # expected value across inflow uncertainty
            exp_future = (q_probs_arr[:, None] * H_next).sum(axis=0)  # (n_r,)

            # --- step cost computed for each release decision ---
            l_now = l_grid[i]  # level at the current storage node
            hp1 = l_now * r_grid * efficiency * g * d / 10e3
            food = (np.maximum(D - r_grid, 0.0)) ** 2
            step_cost_vec = lam * food - (1.0 - lam) * hp1           # (n_r,)

            # Bellman update for each action
            Q_actions = step_cost_vec + gamma * exp_future
            Q_actions[~feasible] = np.inf  # exclude infeasible releases from the argmin

            Q[i, :] = Q_actions

            u_star = int(np.argmin(Q_actions))
            H[i] = Q_actions[u_star]
            policy_idx[i] = u_star
            policy[i] = r_grid[u_star]

        err = float(np.max(np.abs(H - H_old)))
        if err < tol:
            break

    if verbose:
        np.set_printoptions(precision=4, suppress=True, linewidth=160)

        print("Storage grid s_grid (m3):")
        print("shape:", s_grid.shape)
        print(s_grid)

        print("\nRelease grid r_grid (m3/s):")
        print("shape:", r_grid.shape)
        print(r_grid)

        print("\nValue function H(s):")
        print("shape:", H.shape)
        print(H)

        print("\nQ-function Q(s,a): step cost + discounted expected value-to-go")
        print("shape:", Q.shape)
        print(Q)

        print("\nOptimal release per storage:")
        print("shape:", policy.shape)
        print(policy)

    return H, policy


## 7. Compute policies for several trade-offs

We solve the SDP for a set of $\lambda$ values and plot the resulting policies.


In [None]:
lam = 0.5
solutions = {}

H, pol = solve_sdp_value_iteration(lam)
solutions = {"H": H, "policy": pol}

plt.figure()
plt.plot(s_grid / S_max, solutions["policy"], label=f"λ={lam}")
plt.axhline(D, linestyle="--", color="gray", label="Demand D")
plt.xlabel("Storage / S_max")
plt.ylabel("Optimal release (m3/s)")
plt.ylim([0, r_max*1.1])
plt.legend()
plt.title('Policy')
plt.grid(True, alpha=0.3)
plt.show()

plt.figure()
plt.plot(s_grid / S_max, H, label=f"λ={lam}")
plt.xlabel("Storage / S_max")
plt.ylabel("H(s)")
plt.legend()
plt.grid(True, alpha=0.3)
plt.title('Optimal cost-to-go/penalty')
plt.show()



In [None]:
lams = [0.0, 0.10, 0.25, 0.5, 0.75, 0.9, 1.0]
solutions = {}

for k, lam in enumerate(lams):
    print("\nSolving for lambda =", lam)
    H, pol = solve_sdp_value_iteration(
        lam, verbose=False
    )
    solutions[lam] = {"H": H, "policy": pol}

plt.figure()
for lam in lams:
    plt.plot(s_grid / S_max, solutions[lam]["policy"], label=f"λ={lam}")
plt.xlabel("Storage / S_max")
plt.ylabel("Optimal release (m3/s)")
plt.legend()
plt.grid(True, alpha=0.3)
plt.title("Note: solutions may not be converged. Longer runtime needed")
plt.show()



## 8. Simulate and evaluate each policy

We evaluate each policy by simulating the reservoir over the historical inflow record with `mass_balance_gibeIII`.

We report:

- Mean hydropower benefit `hp1` (and the equivalent mean power in MW)
- Mean unmet-demand penalty `food`


In [None]:
def simulate_policy(policy_r_on_grid, s0, inflow_series, evap_series):
    """Simulate a policy over a given inflow record using the Lab 1 simulator."""
    inflow_series = np.asarray(inflow_series)
    T = len(inflow_series)

    # Build a release time series from the policy (storage -> release)
    r = np.zeros(T)
    s_tmp = float(s0)

    for t in range(T):
        r[t] = float(np.interp(s_tmp, s_grid, policy_r_on_grid))

        # One-step update (same mass balance as in mass_balance_gibeIII)
        evaporation_t = float(evap_series[t % 365]) / 1000.0 * s_tmp * stor_to_surface / sim_step
        s_tmp = s_tmp + (float(inflow_series[t]) - r[t] - evaporation_t) * sim_step
        s_tmp = float(np.clip(s_tmp, 0.0, S_max))

    # Full simulation with the constructed releases (Lab 1 function)
    s, l = mass_balance_gibeIII(s0, inflow_series, r, evap_series, S_max)

    # Step costs on decision days t=0..T-1, using level at time t
    hp1, food = step_cost_components(l[:-1], r)

    return s, l, r, hp1, food

def summarize_run(hp1, food):
    # hp1 = P / 10e3 = P / 1e4 with P in W, so MW = P / 1e6 = hp1 * 1e-2
    mean_power_MW = float(np.mean(hp1) * 1e-2)
    return {
        "mean_hp1": float(np.mean(hp1)),
        "mean_power_MW": mean_power_MW,
        "mean_food_penalty": float(np.mean(food)),
    }

results = []
s0 = 0.7 * S_max

for lam in lams:
    s_sim, l_sim, r_sim, hp1_sim, food_sim = simulate_policy(
        solutions[lam]["policy"], s0, inflow, evap_mm_day
    )
    summary = summarize_run(hp1_sim, food_sim)
    summary["lambda"] = lam
    results.append(summary)

results = sorted(results, key=lambda d: d["lambda"])
results


### Objective-space plot (trade-off curve)

In [None]:
mean_power = np.array([r["mean_power_MW"] for r in results])
mean_food = np.array([r["mean_food_penalty"] for r in results])

plt.figure()
plt.scatter(mean_food, mean_power)

for r in results:
    plt.text(r["mean_food_penalty"], r["mean_power_MW"], f"  λ={r['lambda']}")

plt.xlabel("Mean unmet-demand penalty  [max(D - r, 0)^2]")
plt.ylabel("Mean power (MW)")
plt.grid(True, alpha=0.3)
plt.show()


### Time series example

Below we plot storage, release, and power for one policy.

In [None]:
lam_show = 0.1
s_sim, l_sim, r_sim, hp1_sim, food_sim = simulate_policy(
    solutions[lam_show]["policy"], s0, inflow, evap_mm_day
)

plt.figure()
plt.plot(s_sim / S_max)
plt.xlabel("Day")
plt.ylabel("Storage / S_max")
plt.grid(True, alpha=0.3)
plt.show()

plt.figure()
plt.plot(r_sim, label="Optimized release")
plt.axhline(D, linestyle="--", label=f"Demand D={D:g}")
plt.xlabel("Day")
plt.ylabel("Release (m3/s)")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

plt.figure()
plt.plot(hp1_sim * 1e-2)  # hp1 = P[W] / 1e4, so MW = hp1 * 1e-2
plt.xlabel("Day")
plt.ylabel("Power (MW)")
plt.grid(True, alpha=0.3)
plt.show()

plt.figure()
plt.plot(food_sim)
plt.xlabel("Day")
plt.ylabel("Unmet-demand penalty  [max(D - r, 0)^2]")
plt.grid(True, alpha=0.3)
plt.show()


## 9. Discussion and extensions

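1. **Scaling and weights.** Hydropower and the unmet-demand penalty have different units and magnitudes. Apart from the constant scaling of `hp1`, the step cost above does not normalize them, so the effect of a given $\lambda$ depends on those magnitudes. Normalizing both terms to a comparable range makes $\lambda$ more interpretable.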

2. **Seasonality.** We used a *mean* evaporation term in the SDP to keep the problem stationary. A more realistic SDP can include a seasonal index in the state, for example $x_t=(s_t,\mathrm{day\_of\_year})$.

3. **Operational constraints.** You can add turbine capacity, minimum environmental flow, ramping limits, or piecewise costs.
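
One possible way to scale both terms to a comparable range is sketched below. It is not part of the lab's solver: the helper `normalized_step_cost` and the choice of normalizing by the largest value each term can take on the grids are assumptions for illustration.

```python
import numpy as np

D = 450.0                      # [m3/s] demand, as in the lab
efficiency, g, d = 0.90, 9.81, 1000.0
r_max = 1200.0                 # [m3/s] largest release on the action grid
l_max = 0.0521 * (1.47e10 ** 0.3589)   # level at full storage (lab's relation)

hp1_max = l_max * r_max * efficiency * g * d / 10e3   # largest hp1 on the grids
food_max = D ** 2                                      # largest penalty (release = 0)

def normalized_step_cost(level, release, lam):
    """Hypothetical variant of the step cost with both terms scaled to [0, 1]."""
    hp1 = level * release * efficiency * g * d / 10e3
    food = np.maximum(D - release, 0.0) ** 2
    return lam * food / food_max - (1.0 - lam) * hp1 / hp1_max

costs_demo = normalized_step_cost(l_max, np.array([0.0, 450.0, 1200.0]), lam=0.5)
print(costs_demo)
```

With this scaling, $\lambda=0.5$ gives the worst-case penalty and the best-case hydropower benefit the same weight in the step cost.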


## 10. Seasonal SDP extension

In the stationary SDP above, we approximate inflow as an i.i.d. draw from a single empirical distribution.

Here we show what changes if inflow is **seasonal**. The state becomes:

- storage $s$
- day of year $t \in \{1,\dots,365\}$

We keep the same step cost. We make the inflow distribution time-varying by estimating
$\Pr(q \mid t)$ from the historical inflow record using a moving window around each day of the year.
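
A moving window around a day of the year has to wrap around the end of the year, so that late-December observations count as neighbours of early January. A minimal sketch of that circular distance, assuming a 365-day year and a hypothetical helper name `circular_doy_distance`:

```python
import numpy as np

def circular_doy_distance(doy, target, period=365):
    """Shortest distance in days between day-of-year values on a circular calendar."""
    dist = np.abs(np.asarray(doy) - target)
    return np.minimum(dist, period - dist)

doy = np.array([1, 10, 180, 355, 365])
dist_demo = circular_doy_distance(doy, target=5)   # distance of each day to day 5
mask_demo = dist_demo <= 15                        # window half-width of 15 days (assumed)

print(dist_demo)
print(mask_demo)
```

Day 355 is 15 days from day 5 when measured across the year boundary, so it falls inside the window even though the plain difference is 350.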


## Seasonal inflow probabilities

Here we estimate a *day-of-year* inflow distribution, $P(q \mid t)$, using the same inflow support (`q_edges`, `q_vals`) used in the stationary SDP.  
For each day $t$, we pool observations from a $\pm 15$-day window to reduce sampling noise.


In [None]:
# --- Estimate daily inflow probabilities P(q | day) on the existing inflow support ---

lam_seasonal = 0.25  # pick one trade-off value for the seasonal demo
T = 365

# Day-of-year index for each inflow observation.
# The dataset is daily, so we map index -> day of year by modulo 365.
doy_obs = (np.arange(len(inflow)) % 365) + 1  # 1..365

window = 15  # +/- days around each target day
K = len(q_vals)

q_probs_by_day = np.zeros((T, K), dtype=float)

for t in range(1, T + 1):
    dist = np.abs(doy_obs - t)
    dist = np.minimum(dist, 365 - dist)  # circular distance on {1..365}
    mask = dist <= window

    counts, _ = np.histogram(np.asarray(inflow)[mask], bins=q_edges)

    if counts.sum() == 0:
        q_probs_by_day[t - 1, :] = q_probs_arr  # fallback to stationary
    else:
        q_probs_by_day[t - 1, :] = counts / counts.sum()

print("q_probs_by_day shape:", q_probs_by_day.shape)

# Plot: a heatmap of daily probabilities (day on x, inflow-bin index on y)
plt.figure(figsize=(9, 3.8))
plt.imshow(q_probs_by_day.T, aspect="auto", origin="lower",
           extent=[1, 365, 0, K-1])
plt.xlabel("Day of year")
plt.ylabel("Inflow bin index")
plt.title("Estimated inflow probabilities by day (15-day window)")
plt.colorbar(label="Probability")
plt.tight_layout()
plt.show()


## Seasonal SDP with value iteration

We now solve a periodic SDP where the state is $(t, s)$.  
The transition depends on the day through $P(q \mid t)$. We keep the same storage and release grids used above.


In [None]:
def solve_seasonal_sdp_value_iteration(lam, q_probs_by_day, max_iter=300, tol=1e-4, verbose=True):
    """Seasonal SDP by value iteration.

    State: (day of year t, storage s)
    Action: release r

    We minimize: step_cost + gamma * E[ H(t+1, s_next) ]
    using a periodic year (t_next = (t+1) mod 365).
    """
    lam = float(lam)

    T = q_probs_by_day.shape[0]
    K = q_probs_by_day.shape[1]
    n_s = len(s_grid)
    n_r = len(r_grid)

    H = np.zeros((T, n_s), dtype=float)
    policy = np.zeros((T, n_s), dtype=float)
    policy_idx = np.zeros((T, n_s), dtype=int)

    for it in range(max_iter):
        H_old = H.copy()

        for t in range(T):
            t_next = (t + 1) % T
            p = q_probs_by_day[t, :]  # (K,)

            for i, s in enumerate(s_grid):
                # Feasibility: cannot release more than current storage over one step
                r_feas_max = s / sim_step
                feasible = (r_grid <= r_feas_max)

                # Transition uses capped release (so s_next never “over-releases”)
                r_use = np.minimum(r_grid, r_feas_max)

                surface = s * stor_to_surface
                evap_outflow = (evap_mm_day[t % 365] / 1000.0) / sim_step * surface  # [m3/s]

                s_next = s + (q_vals_arr[:, None] - r_use[None, :] - evap_outflow) * sim_step
                s_next = np.clip(s_next, 0.0, S_max)  # (K, n_r)

                # Interpolate value at next storages for next day
                H_next = np.interp(s_next, s_grid, H_old[t_next])  # (K, n_r)

                exp_future = (p[:, None] * H_next).sum(axis=0)  # (n_r,)

                # --- step cost computed inside the loop (state-dependent) ---
                l_now = l_grid[i]  # or level_from_storage(s) if you prefer
                hp1 = l_now * r_grid * efficiency * g * d / 10e3
                food = (np.maximum(D - r_grid, 0.0)) ** 2
                step_cost_vec = lam * food - (1.0 - lam) * hp1  # (n_r,)

                Q_actions = step_cost_vec + gamma * exp_future

                # Enforce feasibility in argmin
                Q_actions = Q_actions.copy()
                Q_actions[~feasible] = np.inf

                j_star = int(np.argmin(Q_actions))
                H[t, i] = Q_actions[j_star]
                policy_idx[t, i] = j_star
                policy[t, i] = r_grid[j_star]

        err = float(np.max(np.abs(H - H_old)))
        if err < tol:
            if verbose:
                print(f"Seasonal SDP converged in {it+1} iterations (max ΔV = {err:.2e}).")
            break
    else:
        if verbose:
            print(f"Seasonal SDP reached max_iter={max_iter} (max ΔV = {err:.2e}).")

    # Compute and store the full Q matrix for Jan 1 (t=0) for visualization
    t0 = 0
    t1 = 1  # Jan 2
    p0 = q_probs_by_day[t0, :]

    Q_jan1_sa = np.zeros((n_s, n_r), dtype=float)  # rows: storage, cols: action

    for i, s in enumerate(s_grid):
        r_feas_max = s / sim_step
        feasible = (r_grid <= r_feas_max)
        r_use = np.minimum(r_grid, r_feas_max)

        surface = s * stor_to_surface
        evap_outflow = (evap_mm_day[t0 % 365] / 1000.0) / sim_step * surface  # Jan 1 evap

        s_next = s + (q_vals_arr[:, None] - r_use[None, :] - evap_outflow) * sim_step
        s_next = np.clip(s_next, 0.0, S_max)  # (K, n_r)

        H_next = np.interp(s_next, s_grid, H[t1])        # (K, n_r)
        exp_future = (p0[:, None] * H_next).sum(axis=0)  # (n_r,)

        l_now = l_grid[i]
        hp1 = l_now * r_grid * efficiency * g * d / 10e3
        food = (np.maximum(D - r_grid, 0.0)) ** 2
        step_cost_vec = lam * food - (1.0 - lam) * hp1

        Qa = step_cost_vec + gamma * exp_future
        Qa = Qa.copy()
        Qa[~feasible] = np.inf
        Q_jan1_sa[i, :] = Qa

    Q_jan1_as = Q_jan1_sa.T  # for printing as Q(a, s)

    if verbose:
        np.set_printoptions(precision=4, suppress=True, linewidth=160)

        print("\n==============================")
        print("SEASONAL SDP: ARRAY SIZES")
        print("==============================\n")

        print(f"Days (T): {T}")
        print(f"Storage states (n_s): {n_s}")
        print(f"Release actions (n_r): {n_r}")
        print(f"Inflow bins (K): {K}")

        print("\nH(t, s) shape:", H.shape)
        print("Q(Jan 1 as Q(a, s)) shape:", Q_jan1_as.shape)
        print("Policy(t, s) shape:", policy.shape)

    return H, policy, policy_idx, Q_jan1_as


In [None]:
H, policy, policy_idx, Q_jan1_as = solve_seasonal_sdp_value_iteration(lam_seasonal, q_probs_by_day)

## Plot the seasonal optimal policy

We plot one optimal release curve per day. With the `viridis` colormap, the color goes from dark (Jan 1) to light (Dec 31).


In [None]:
fig, ax = plt.subplots(figsize=(7, 5))
cmap = plt.cm.viridis

for t in range(T):
    shade = 0.15 + 0.85 * (t / (T - 1))
    ax.plot(s_grid / S_max, policy[t, :], color=cmap(shade), linewidth=0.7)

ax.set_xlabel("Storage / S_max")
ax.set_ylabel("Optimal release (m3/s)")
ax.set_title(f"Seasonal optimal release policy (one line per day), λ={lam_seasonal}")

import matplotlib as mpl
sm = mpl.cm.ScalarMappable(cmap=cmap, norm=mpl.colors.Normalize(vmin=1, vmax=365))
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax, pad=0.02)
cbar.set_label("Day of year (1=Jan 1, 365=Dec 31)")

ax.grid(True, alpha=0.2)
fig.tight_layout()
plt.show()
