The goal of the project is to **create a model that predicts which action to take based on the network, predict costs, learn good base stock levels, learn a good strategy, ...**


1. Minimize total cost over the simulation:
- Inventory holding costs
- Backlog penalty costs (when you miss customer demand)

2. Handle uncertainty in demand (it's random but follows patterns).

3. Respect capacity & lead time constraints (you can’t magically produce anything instantly or in infinite quantity).

4. Be interpretable & realistic for ASML's real-world context

**GOAL**: Create your own policy that makes smarter production decisions and leads to lower total costs (inventory + backlog penalties). Then, compare it to the existing ones.

In [12]:
import random
import math
import numpy as np

# Getting to know the dataset

In [2]:
from src.supply_chain_config import SupplyChainConfig
from src.dynamics import Dynamics
from src.policy import StaticBaseStockPolicyRandom
from src.evaluate import Evaluate


In [3]:
from src.supply_chain_config import SupplyChainConfig

config = SupplyChainConfig()

print("✅ Supply chain data loaded!\n")

print("🔧 All config attributes:")
for attr in dir(config):
    if not attr.startswith("_"):
        value = getattr(config, attr)
        if isinstance(value, (dict, list)):
            print(f"{attr}: {len(value)} items")
        else:
            print(f"{attr}: {value}")


✅ Supply chain data loaded!

🔧 All config attributes:
aggregate_demand_ar_params: 8 items
aggregate_demand_error_sigma: 2.0
aggregate_demand_mean: 15
aggregate_demand_num_lags: 8
avg_demand_per_comp: 14 items
avg_demand_per_product: 3 items
bill_of_materials:    Unnamed: 0  C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12  C13  C14
0          C1   0   0   0   0   0   1   1   1   0    0    0    0    0    0
1          C2   0   0   0   0   0   0   0   0   1    1    0    0    0    0
2          C3   0   0   0   0   0   0   0   0   0    0    1    0    0    0
3          C4   0   0   0   0   0   0   0   0   0    0    1    0    0    0
4          C5   0   0   0   0   0   0   0   0   0    0    1    0    0    0
5          C6   0   0   0   0   0   0   0   0   0    0    0    1    0    0
6          C7   0   0   0   0   0   0   0   0   0    0    0    0    1    0
7          C8   0   0   0   0   0   0   0   0   0    0    0    0    0    1
8          C9   0   0   0   0   0   0   0   0   0    0    0    1 

### 🔍 What this does:
This cell compares three predefined policies:
- **MeanDemandPolicy**: reacts to average demand
- **Shortfall policy**: adds buffer to cover likely stockouts
- **Random policy**: for benchmarking

The simulation runs 100 times, each for 60 periods, and computes the **average total cost** for each policy. This helps us evaluate baseline performance and identify which policy structure might be most promising to build upon.


In [105]:
from src.policy import MeanDemandPolicy, StaticBaseStockPolicyShortfall, StaticBaseStockPolicyRandom 
from src.dynamics import Dynamics

dynamics = Dynamics(config)

policy_mean_demand = MeanDemandPolicy(config, 0.1)
policy_shortfall = StaticBaseStockPolicyShortfall(config, 0.25)
policy_random = StaticBaseStockPolicyRandom(config, 0.25)

evaluator = Evaluate(config, dynamics, num_trajectories=100, periods_per_trajectory=60)

results = evaluator.compare_policies([policy_mean_demand, policy_shortfall, policy_random])
print(results)


[766.4589666666667, 484.5860666666667, 587.5089166666667]


# Build & Evaluate own Policy


Creating a Custom Policy policy.py 

In [101]:
import random
import math
import numpy as np

class MySmartPolicy:
    def __init__(self, config, hedging_parameter=0.1, shortage_history_length=20):
        self.config = config
        self.hedging_parameter = hedging_parameter
        self.component_to_node = self._map_component_to_node()
        self.base_stock_levels = self._get_base_stock_levels()
        self.num_nodes = self.config.num_components
        self.current_node = 0  # for cycling
        self.bottleneck_counts = {i: 0 for i in range(len(self.config.capacities))} # Initialize bottleneck counters
        self.component_shortage_counts = [0] * self.config.num_components
        self.component_hedging_parameters = [hedging_parameter] * self.config.num_components
        self.shortage_history_length = shortage_history_length
        self.shortage_history = [([False] * self.config.num_components) for _ in range(shortage_history_length)]
        self.history_index = 0

    def _map_component_to_node(self):
        mapping = {}
        for node_idx, comp_indices in enumerate(self.config.capacity_groups_indices):
            for comp_index in comp_indices:
                mapping[comp_index] = node_idx
        return mapping

    def _compute_dynamic_hedging(self, component_idx, echelon_inventory, state):
        """
        Dynamically adjust the hedging parameter based on the component's inventory and demand variability.
        More volatile components or those with high shortage will have a higher hedging factor.
        """
        # Calculate demand variability for the component
        demand_variability = self.compute_demand_variability(component_idx)

        # Calculate the current shortage for the component
        component_shortage = self.base_stock_levels[component_idx] - echelon_inventory[component_idx]

        # Increase hedging if the component has high demand variability and/or is in shortage
        dynamic_hedging = 1 + (self.hedging_parameter * demand_variability * 0.5)  # Reduced impact of volatility

        # If there's a shortage, increase hedging further, but less aggressively
        if component_shortage > 0:
            dynamic_hedging += self.hedging_parameter * (component_shortage / self.base_stock_levels[component_idx]) * 0.5  # Reduced sensitivity to shortages

        # Limit the maximum dynamic hedging to avoid over-hedging
        return min(dynamic_hedging, 1.5)  # Lower max hedging cap


    def compute_demand_variability(self, component_idx):
        """
        Calculate demand variability (for example, using historical demand data or predefined values).
        """
        # Assuming you have demand variability data available per component in your configuration
        # If the configuration doesn't include it, you could use a fixed value or a simple formula.
        # Here, we will just use predefined variability from the config as a placeholder.
        return self.config.demand_variability_per_comp[component_idx]


    def compute_demand_variability(self, component_idx):
        # This is where you would use historical data for variability or calculate it on the fly
        # For now, we assume a placeholder method that you can replace with actual logic
        # For example, we could use a default fixed value for variability
        return 0.1  # Placeholder value for demand variability

    #import numpy as np

    #def compute_demand_variability(self, component_idx, historical_demand_data):
        # Assuming historical_demand_data is a list of demand values for this component
        #return np.std(historical_demand_data)  # Standard deviation as a measure of variability

    def set_hedging_parameter(self, state):
        """
        Adjust the hedging parameter based on the current state of inventory and demand variability.
        """
        volatility_factor = self._compute_volatility_factor(state)
        # Adjust hedging less aggressively based on volatility
        self.hedging_parameter = min(0.4 + 0.2 * volatility_factor, 0.7)  # Reduce the cap and sensitivity


    def _compute_volatility_factor(self, state):
        """
        Calculate the volatility factor based on inventory levels and demand variability.
        """
        # Example: Increase hedging if there's significant demand volatility and low inventory
        volatility = sum(self.config.demand_variability_per_comp) / len(self.config.demand_variability_per_comp)
        low_inventory = sum(1 for i in range(self.config.num_components) if state.inventory_per_node[i] < self.base_stock_levels[i])

        return (volatility + low_inventory * 0.1)  # Simple combination of volatility and inventory level

    def _forecast_demand(self, component_idx, state, alpha=0.3):
        """
        Forecast demand using an Exponentially Weighted Moving Average (EWMA).
        The alpha parameter controls how much weight is given to recent demands.
        """
        # Placeholder: Calculate exponentially weighted average demand for the component
        historical_demand = self._get_historical_demand(component_idx, state)

        if len(historical_demand) == 0:
            return self.config.avg_demand_per_comp[component_idx]  # Default to average if no history

        # EWMA calculation: Start with the first value and apply the smoothing factor
        forecast = historical_demand[0]
        for demand in historical_demand[1:]:
            forecast = alpha * demand + (1 - alpha) * forecast

        return forecast


    def _forecast_demand(self, component_idx, state, window_size=5):
        """
        Forecast demand for the given component using a simple moving average.
        """
        # Placeholder: Calculate moving average demand over the last window_size time steps
        # You can adjust this based on your data and preferences.
        historical_demand = self._get_historical_demand(component_idx, state, window_size)

        if len(historical_demand) == 0:
            return self.config.avg_demand_per_comp[component_idx]  # Default to average if no history

        return sum(historical_demand) / len(historical_demand)

    def _get_historical_demand(self, component_idx, state, window_size):
        """
        Retrieve historical demand data for the given component from the state or data.
        """
        # This is a placeholder for retrieving historical demand data.
        # Ideally, this would pull demand data from state or other historical tracking.
        return [random.randint(50, 150) for _ in range(window_size)]  # Mocked random demand for now


    def set_action(self, state):
        echelon_inventory = self._compute_echelon_inventory_position(state)
        base_stock_levels = self._get_base_stock_levels()
        adaptive_hedging_parameter = self.hedging_parameter

        # Update shortage history
        current_shortages = [False] * self.config.num_components
        total_subcomponent_demand = {}
        num_sub_components_bom = self.config.bill_of_materials.shape[0]
        potential_action = [math.ceil(self.config.avg_demand_per_comp[i] * (1 + self.component_hedging_parameters[i]))
                            for i in range(self.config.num_components)]

        for predecessor_index in range(num_sub_components_bom):
            total_needed = 0
            predecessor_name = self.config.bill_of_materials.index[predecessor_index]
            for component_index in range(self.config.num_components):
                component_name = f"C{component_index + 1}"
                if component_name in self.config.bill_of_materials.columns and \
                   self.config.bill_of_materials.loc[predecessor_name, component_name] == 1 and potential_action[component_index] > 0:
                    total_needed += potential_action[component_index]
            if total_needed > 0 and predecessor_index < self.config.num_sub_components:
                total_subcomponent_demand[predecessor_index] = total_needed
                available = state.inventory_per_node[predecessor_index]
                if total_needed > available:
                    current_shortages[predecessor_index] = True

        self.shortage_history[self.history_index] = current_shortages
        self.history_index = (self.history_index + 1) % self.shortage_history_length

        # Adjust hedging parameters based on recent shortage history
        for i in range(self.config.num_components):
            shortage_frequency = sum(history[i] for history in self.shortage_history) / self.shortage_history_length
            if shortage_frequency > 0.2: # If short more than 20% of the time
                self.component_hedging_parameters[i] += 0.01 # Increase hedging
            elif shortage_frequency < 0.05 and self.component_hedging_parameters[i] > 0.01: # If short less than 5% and hedging is not too low
                self.component_hedging_parameters[i] -= 0.005 # Decrease hedging

        # Now, recalculate potential action with the adjusted hedging parameters
        potential_action = [math.ceil(self.config.avg_demand_per_comp[i] * (1 + self.component_hedging_parameters[i]))
                            for i in range(self.config.num_components)]


        # Now, recalculate potential action with the adjusted hedging parameters
        potential_action = [math.ceil(self.config.avg_demand_per_comp[i] * (1 + self.component_hedging_parameters[i]))
                            for i in range(self.config.num_components)]


        potential_action = [0] * self.config.num_components

        # 1. Determine a sensible order based on bottleneck frequency
        capacity_group_priority = sorted(self.bottleneck_counts.items(), key=lambda item: item[1], reverse=True)
        component_order = []
        processed_components = set()
        for group_index, _ in capacity_group_priority:
            for comp_index in self.config.capacity_groups_indices[group_index]:
                if comp_index not in processed_components:
                    component_order.append(comp_index)
                    processed_components.add(comp_index)
        # Add any remaining components
        for i in range(self.config.num_components):
            if i not in processed_components:
                component_order.append(i)

        # 2. Prioritize initial potential actions based on echelon inventory shortfall
        product_echelon_shortfalls = {}
        for i in range(self.config.num_products):
            final_product_index = self.config.num_sub_components + i
            shortfall = base_stock_levels[final_product_index] - echelon_inventory[final_product_index]
            product_echelon_shortfalls[final_product_index] = shortfall

        # Calculate initial potential actions, giving more weight to components of products with high shortfall
        weighted_potential_action = [0] * self.config.num_components
        for i in component_order:
            base_demand = math.ceil(self.config.avg_demand_per_comp[i] * (1 + adaptive_hedging_parameter))
            weight = 1.0
            # If this component is used in a final product with a high echelon shortfall, increase its potential action
            for product_index in range(self.config.num_products):
                final_product_index = self.config.num_sub_components + product_index
                component_name = f"C{i + 1}"
                product_name = self.config.bill_of_materials.index[final_product_index]
                if component_name in self.config.bill_of_materials.columns and \
                   self.config.bill_of_materials.loc[product_name, component_name] == 1:
                    weight += (product_echelon_shortfalls.get(final_product_index, 0) / base_stock_levels[final_product_index]) * 0.1 # Adjust the 0.1 factor

            weighted_potential_action[i] = math.ceil(base_demand * weight)
            potential_action[i] = weighted_potential_action[i] # For now, let's use the weighted action as our potential action

        total_subcomponent_demand = {}
        num_sub_components_bom = self.config.bill_of_materials.shape[0]
        for predecessor_index in range(num_sub_components_bom):
            total_needed = 0
            predecessor_name = self.config.bill_of_materials.index[predecessor_index]
            for component_index in range(self.config.num_components):
                component_name = f"C{component_index + 1}"
                if component_name in self.config.bill_of_materials.columns:
                    if self.config.bill_of_materials.loc[predecessor_name, component_name] == 1 and potential_action[component_index] > 0:
                        total_needed += potential_action[component_index]
            if total_needed > 0 and predecessor_index < self.config.num_sub_components:
                total_subcomponent_demand[predecessor_index] = total_needed

        print("\nTotal Sub-component Demand:")
        for sub_comp_index, total_needed in total_subcomponent_demand.items():
            available = state.inventory_per_node[sub_comp_index]
            print(f"Component {sub_comp_index}: Needed = {total_needed}, Available = {available}")
            if total_needed > available:
                shortage = total_needed - available
                print(f"  Shortage of {shortage} for component {sub_comp_index}!")
                predecessor_name = self.config.bill_of_materials.index[sub_comp_index]
                components_to_reduce = []
                product_priorities_shortage = {} # Priority based on shortage

                for component_index in range(self.config.num_components):
                    component_name = f"C{component_index + 1}"
                    if component_name in self.config.bill_of_materials.columns and \
                       self.config.bill_of_materials.loc[predecessor_name, component_name] == 1 and \
                       potential_action[component_index] > 0:
                        components_to_reduce.append(component_index)
                        if component_index >= self.config.num_sub_components: # It's a final product
                            product_index = component_index - self.config.num_sub_components
                            backlog = max(0, -state.inventory_per_node[component_index])
                            # Priority based on echelon inventory shortfall, backlog, and penalty
                            echelon_shortfall = base_stock_levels[component_index] - echelon_inventory[component_index]
                            priority = (self.config.p[product_index] * (backlog + 1)) + (echelon_shortfall * 0.5) # Adjust weights as needed
                            product_priorities_shortage[component_index] = priority
                        else:
                            product_priorities_shortage[component_index] = 0

                sorted_components_to_reduce = sorted(components_to_reduce,
                                                     key=lambda comp_index: product_priorities_shortage.get(comp_index, 0),
                                                     reverse=False)

                reduction_needed = shortage
                for comp_index_to_reduce in sorted_components_to_reduce:
                    reduce_by = min(potential_action[comp_index_to_reduce], reduction_needed)
                    potential_action[comp_index_to_reduce] = max(0, potential_action[comp_index_to_reduce] - reduce_by)
                    reduction_needed -= reduce_by
                    print(f"  Reduced production of component {comp_index_to_reduce} to {potential_action[comp_index_to_reduce]}")
                    if reduction_needed <= 0:
                        break

        action_per_node = list(potential_action)
        # (Your existing capacity constraint check and bottleneck count update)
        for i in range(self.config.num_components):
            capacity_group = self.config.capacity_groups[i]
            capacity = self.config.capacities[capacity_group]
            group_production = sum([action_per_node[j] for j in self.config.capacity_groups_indices[capacity_group]])
            if group_production > capacity:
                reduction_needed = group_production - capacity
                action_per_node[i] = max(0, action_per_node[i] - reduction_needed)
                self.bottleneck_counts[capacity_group] += 1

        print(f"\n🛠️ Smart Action (All Nodes): {action_per_node}, Hedging: {self.component_hedging_parameters}")
        return action_per_node

    def _is_material_available(self, state, comp_index, produce_amount):
        """Checks if enough materials are available for 'produce_amount' of 'comp_index'."""
        predecessors = self.config.predecessors[comp_index]
        for predecessor_index in predecessors:
            bom_rows = self.config.bill_of_materials[(self.config.bill_of_materials.iloc[:, 0] == comp_index) &
                                                    (self.config.bill_of_materials.iloc[:, 1] == predecessor_index)]
            if not bom_rows.empty:
                required_quantity = bom_rows.iloc[0, 2]
                if state.inventory_per_node[predecessor_index] < produce_amount * required_quantity:
                    print(f"  Not enough predecessor {predecessor_index} for component {comp_index}")
                    return False
        return True

    def _is_production_feasible(self, state, action, node_index, comp_index, produce_amount, current_production_per_node):
        """
        Check if producing 'produce_amount' of 'comp_index' at 'node_index' is feasible.
        """
        # Check material availability
        predecessors = self.config.predecessors[comp_index]
        for predecessor_index in predecessors:
            bom_rows = self.config.bill_of_materials[(self.config.bill_of_materials.iloc[:, 0] == comp_index) &
                                                    (self.config.bill_of_materials.iloc[:, 1] == predecessor_index)]
            if not bom_rows.empty:
                required_quantity = bom_rows.iloc[0, 2]
                if state.inventory_per_node[predecessor_index] < produce_amount * required_quantity:
                    print(f"  Not enough predecessor {predecessor_index} for component {comp_index}")
                    return False

        # Check capacity (considering current planned production at this node)
        capacity = self.config.capacities[node_index]
        planned_production_at_node = current_production_per_node.get(node_index, 0) + produce_amount
        if planned_production_at_node > capacity * 1.05:  # Keeping the 5% slack
            print(f"  Exceeds capacity at node {node_index} for component {comp_index}")
            return False

        return True


    def _is_feasible(self, state, action, node):
        """
        Check if the action is feasible, with added slack to account for minor shortages or excess.
        """
        for pred in self.config.predecessors[node]:
            total_required = sum(action[succ] for succ in self.config.successors[pred])
            # Allow some slack (5%) for minor discrepancies in demand
            if state.inventory_per_node[pred] < total_required * 0.95:
                print(f"Action infeasible due to predecessor {pred}: required {total_required}, available {state.inventory_per_node[pred]}")
                return False

        cap_group = self.config.capacity_groups[node]
        cap_indices = self.config.capacity_groups_indices[cap_group]
        # Allow some overage (5%) for capacity
        total_action = sum(action[j] for j in cap_indices)
        if total_action > self.config.capacities[cap_group] * 1.05:
            print(f"Action infeasible due to capacity group {cap_group}: total action {total_action}, capacity {self.config.capacities[cap_group]}")
            return False

        return True


    def _compute_echelon_inventory_position(self, state):
        total_inventory = [state.inventory_per_node[i] + sum(state.inventory_in_pipeline_per_node[i]) for i in range(self.config.num_components)]
        echelon_inventory = total_inventory[:]

        for i in reversed(range(self.config.num_components)):
            for pred in self.config.predecessors[i]:
                echelon_inventory[pred] += echelon_inventory[i]

        return echelon_inventory

    def _get_base_stock_levels(self):
        levels = [0] * self.config.num_components
        echelon_lead_times = self._get_echelon_lead_time_per_product()

        for i in range(self.config.num_products):
            for j in range(self.config.num_components):
                demand = self.config.avg_demand_per_product[i]
                lead_time = echelon_lead_times[i][j]

                # Base stock level should cover demand during lead time, plus some safety
                levels[j] += demand * (lead_time + 1) # Adding 1 to account for the current period

        return [math.ceil(level * (1 + self.hedging_parameter)) for level in levels]


    def _get_echelon_lead_time_per_product(self):
        L = [[0 for _ in range(self.config.num_components)] for _ in range(self.config.num_products)]

        for i in range(self.config.num_products):
            root = self.config.num_sub_components + i
            L[i][root] = self.config.lead_times[root]

            for pred in self.config.predecessors[root]:
                L[i][pred] = L[i][root] + self.config.lead_times[pred]

                for pred2 in self.config.predecessors[pred]:
                    L[i][pred2] = L[i][pred] + self.config.lead_times[pred2]

        for i in range(self.config.num_products):
            for j in range(self.config.num_components):
                if L[i][j] > 0:
                    L[i][j] += 1

        return L

## Instantiate the system (only once, before all policies)


In [102]:
from src.supply_chain_config import SupplyChainConfig
from src.dynamics import Dynamics
from src.evaluate import Evaluate

config = SupplyChainConfig()
dynamics = Dynamics(config)
evaluator = Evaluate(config, dynamics, num_trajectories=100, periods_per_trajectory=60)


## Instantiate your policy and the baseline policies

In [106]:
from src.supply_chain_config import SupplyChainConfig
from src.dynamics import Dynamics
from src.evaluate import Evaluate
from src.policy import MeanDemandPolicy, StaticBaseStockPolicyShortfall, StaticBaseStockPolicyRandom

config = SupplyChainConfig()
dynamics = Dynamics(config)
evaluator = Evaluate(config, dynamics, num_trajectories=100, periods_per_trajectory=60)

# Instantiate policies
mean_policy = MeanDemandPolicy(config, 0.1)
shortfall_policy = StaticBaseStockPolicyShortfall(config, 0.25)
random_policy = StaticBaseStockPolicyRandom(config, 0.25)
my_policy = MySmartPolicy(config, 0.25)

## Compare the policies

In [None]:
results = evaluator.compare_policies([
    mean_policy,
    shortfall_policy,
    random_policy,
    my_policy
])

policy_names = ["MeanDemand", "Shortfall", "Random", "policy_x", "MySmartPolicy"]
print("\n📊 Average Total Cost per Policy:")
for name, cost in zip(policy_names, results):
    print(f"{name}: €{cost:.2f}")


Total Sub-component Demand:
Component 0: Needed = 20, Available = 18
  Shortage of 2 for component 0!
  Reduced production of component 5 to 7
Component 1: Needed = 19, Available = 18
  Shortage of 1 for component 1!
  Reduced production of component 8 to 14
Component 2: Needed = 19, Available = 18
  Shortage of 1 for component 2!
  Reduced production of component 10 to 18
Component 3: Needed = 19, Available = 18
  Shortage of 1 for component 3!
  Reduced production of component 10 to 17
Component 4: Needed = 19, Available = 18
  Shortage of 1 for component 4!
  Reduced production of component 10 to 16
Component 5: Needed = 9, Available = 9
Component 6: Needed = 7, Available = 6
  Shortage of 1 for component 6!
  Reduced production of component 12 to 6
Component 7: Needed = 4, Available = 4
Component 8: Needed = 16, Available = 15
  Shortage of 1 for component 8!
  Reduced production of component 11 to 8
Component 9: Needed = 4, Available = 4
Component 10: Needed = 20, Available = 18


## 🧠 Custom Policy Development: `MySmartPolicy`

### ✅ What I Did

- **Understood the problem context**: This project aims to minimize total supply chain costs (inventory + backlog) while handling uncertain demand, lead times, and capacity constraints — in a way that's interpretable and implementable for ASML.
- **Explored and evaluated baseline policies**:
  - `MeanDemandPolicy`: Uses average demand per component
  - `StaticBaseStockPolicyShortfall`: Adds safety buffers
  - `StaticBaseStockPolicyRandom`: Acts as a benchmark
- **Created a custom policy (`MySmartPolicy`)**:
  - Started with a very simple policy that only ordered 1 unit of a single component (component 0).
  - Gradually debugged feasibility issues by ensuring the `action` dictionary includes all components (even those with 0 order).
  - Confirmed that the simulation runs successfully and yields a (very high) total cost — which is expected, since the policy barely fulfills demand.

### 🧪 What This Showed

- The simulator is highly sensitive to feasibility constraints — it expects a full action dict and enforces capacity limits strictly.
- Starting from a minimal working policy was key to eliminating persistent errors (`KeyError`, `Action is not feasible`, etc.).
- The current version of `MySmartPolicy` is a placeholder — it's valid, but inefficient.

### 🔜 What’s Next

1. **Gradually expand the policy**:
   - Start placing orders from more than just Node 0 (e.g., Nodes 0–2).
   - Increase quantities slightly, but stay within capacities.
2. **Incorporate smarter logic**:
   - Use average demand × lead time to calculate base stock levels.
   - Add simple backlog awareness (if demand is unmet, order more next time).
   - Optionally: forecast demand using moving averages.
3. **Track performance**:
   - Monitor how your policy compares to the baselines in total cost.
   - Test stability across different simulation runs (repeatability).
4. **Make it interpretable**:
   - Ensure your logic can be explained to someone at ASML without deep ML or math background.
   - Consider simple rules that generalize well to other network topologies.

---

