The goal of the project is to **create a model that predicts which action to take based on the state of the network: it should predict costs, learn good base stock levels, learn a good strategy, ...**


1. Minimize total cost over the simulation:
- Inventory holding costs
- Backlog penalty costs (when you miss customer demand)

2. Handle uncertainty in demand (it's random but follows patterns).

3. Respect capacity & lead time constraints (you can’t magically produce anything instantly or in infinite quantity).

4. Be interpretable and realistic for ASML's real-world context.

**GOAL**: Create your own policy that makes smarter production decisions and leads to lower total costs (inventory + backlog penalties). Then, compare it to the existing ones.
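
To make the objective concrete, here is a minimal sketch of how the total cost of one trajectory could be tallied. The unit holding cost `h`, the default values, and the function names are assumptions for illustration only. The penalty `p` mirrors `config.p` used later in this notebook, and negative inventory stands for backlog, but the actual cost accounting lives in `src.dynamics` and `src.evaluate` and may differ.

```python
def period_cost(inventory, h, p):
    """Holding cost on positive stock plus penalty on backlog (negative stock)."""
    holding = h * max(inventory, 0)
    backlog = p * max(-inventory, 0)
    return holding + backlog


def trajectory_cost(inventory_path, h=1.0, p=10.0):
    """Sum of per-period costs over one simulated trajectory (h and p are assumed values)."""
    return sum(period_cost(inv, h, p) for inv in inventory_path)


# Illustrative numbers only: 3 units on hand, then 0, then 2 units backlogged.
example_cost = trajectory_cost([3, 0, -2], h=1.0, p=10.0)
print(example_cost)
```

With a penalty much larger than the holding cost, a single backlogged period dominates the total, which is why the policies below carry safety stock.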

In [1]:
import random
import math
import numpy as np

# Getting to know the dataset

In [2]:
from src.supply_chain_config import SupplyChainConfig
from src.dynamics import Dynamics
from src.policy import StaticBaseStockPolicyRandom
from src.evaluate import Evaluate


In [3]:
from src.supply_chain_config import SupplyChainConfig

config = SupplyChainConfig()

print("✅ Supply chain data loaded!\n")

print("🔧 All config attributes:")
for attr in dir(config):
    if not attr.startswith("_"):
        value = getattr(config, attr)
        if isinstance(value, (dict, list)):
            print(f"{attr}: {len(value)} items")
        else:
            print(f"{attr}: {value}")


✅ Supply chain data loaded!

🔧 All config attributes:
aggregate_demand_ar_params: 8 items
aggregate_demand_error_sigma: 2.0
aggregate_demand_mean: 15
aggregate_demand_num_lags: 8
avg_demand_per_comp: 14 items
avg_demand_per_product: 3 items
bill_of_materials:    Unnamed: 0  C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12  C13  C14
0          C1   0   0   0   0   0   1   1   1   0    0    0    0    0    0
1          C2   0   0   0   0   0   0   0   0   1    1    0    0    0    0
2          C3   0   0   0   0   0   0   0   0   0    0    1    0    0    0
3          C4   0   0   0   0   0   0   0   0   0    0    1    0    0    0
4          C5   0   0   0   0   0   0   0   0   0    0    1    0    0    0
5          C6   0   0   0   0   0   0   0   0   0    0    0    1    0    0
6          C7   0   0   0   0   0   0   0   0   0    0    0    0    1    0
7          C8   0   0   0   0   0   0   0   0   0    0    0    0    0    1
8          C9   0   0   0   0   0   0   0   0   0    0    0    1 

### 🔍 What this does:
The next cell compares three predefined policies:
- **MeanDemandPolicy**: reacts to average demand
- **Shortfall policy** (`StaticBaseStockPolicyShortfall`): adds buffer to cover likely stockouts
- **Random policy** (`StaticBaseStockPolicyRandom`): for benchmarking

The evaluation simulates 100 trajectories of 60 periods each and computes the **average total cost** for each policy. This gives a baseline and shows which policy structure is most promising to build upon.
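
As a rough picture of what "average total cost over trajectories" means, the sketch below runs a toy single-item simulation many times and averages the result. It is not the `Evaluate` class: the one-period lead time, the cost parameters `h` and `p`, and the independent normal demand (mean 15, standard deviation 2, loosely borrowed from the config values above while ignoring the autoregressive structure) are all simplifying assumptions.

```python
import numpy as np


def simulate_total_cost(base_stock, periods, rng, h=1.0, p=10.0):
    """Toy single-item run: order up to base_stock, orders arrive one period later."""
    inventory = base_stock  # net inventory; negative values mean backlog
    pipeline = 0            # order placed last period, arriving now
    total = 0.0
    for _ in range(periods):
        inventory += pipeline                              # receive last order
        demand = max(0, int(round(rng.normal(15, 2))))     # assumed demand model
        inventory -= demand
        pipeline = max(0, base_stock - inventory)          # order up to base stock
        total += h * max(inventory, 0) + p * max(-inventory, 0)
    return total


def average_total_cost(base_stock, num_trajectories=100, periods=60, seed=0):
    """Average the total cost over independent trajectories (seeded for repeatability)."""
    rng = np.random.default_rng(seed)
    costs = [simulate_total_cost(base_stock, periods, rng) for _ in range(num_trajectories)]
    return float(np.mean(costs))


for level in (5, 20, 40):
    print(level, round(average_total_cost(level), 1))
```

Averaging over many trajectories smooths out the randomness of a single demand path, so differences between policies reflect the policy rather than one lucky or unlucky run.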


In [4]:
from src.policy import MeanDemandPolicy, StaticBaseStockPolicyShortfall, StaticBaseStockPolicyRandom 
from src.dynamics import Dynamics

dynamics = Dynamics(config)

policy_mean_demand = MeanDemandPolicy(config, 0.1)
policy_shortfall = StaticBaseStockPolicyShortfall(config, 0.25)
policy_random = StaticBaseStockPolicyRandom(config, 0.25)

evaluator = Evaluate(config, dynamics, num_trajectories=100, periods_per_trajectory=60)

results = evaluator.compare_policies([policy_mean_demand, policy_shortfall, policy_random])
print(results)


[809.5621333333333, 497.92259999999993, 614.8901333333333]
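
Assuming `compare_policies` returns costs in the same order as the policies were passed in (the comparison cell later in this notebook relies on the same assumption when it zips names with results), the baselines can be labelled and ranked as sketched here. The numbers are copied from the output above; the short names are labels chosen for this example.

```python
baseline_names = ["MeanDemand", "Shortfall", "Random"]
baseline_costs = [809.5621333333333, 497.92259999999993, 614.8901333333333]

# Sort ascending by cost so the cheapest policy comes first.
ranking = sorted(zip(baseline_names, baseline_costs), key=lambda pair: pair[1])
for name, cost in ranking:
    print(f"{name}: {cost:.2f}")
```

Under that ordering assumption, the shortfall policy is the cheapest baseline in this run and the mean-demand policy the most expensive.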


# Build & Evaluate Your Own Policy


Creating a custom policy (`policy.py`)
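
The core of the `MySmartPolicy` class in the next cell is its base stock calculation. The sketch below isolates that formula for a single component so the arithmetic is easy to follow. The function name and the example numbers are illustrative assumptions; the expression itself mirrors `_get_base_stock_levels`, where node importance has a floor of 0.1 so that a node without dependants gets a multiplier of exactly 1.

```python
import math


def base_stock_level(demands, lead_times, node_importance, hedging=0.1):
    """Base stock for one component, summed over products.

    demands[i]      average demand of product i
    lead_times[i]   echelon lead time of this component for product i
    """
    level = 0.0
    for demand, lead_time in zip(demands, lead_times):
        importance_multiplier = 1 + (node_importance - 0.1) * 2
        level += demand * (lead_time + 1) * importance_multiplier
    # Scale by the hedging parameter and round up to whole units.
    return math.ceil(level * (1 + hedging))


# Illustrative numbers: three products, importance at its 0.1 floor.
example_level = base_stock_level(demands=[5, 4, 6], lead_times=[2, 0, 3], node_importance=0.1)
print(example_level)
```

Note that a product with lead time 0 for this component still contributes one period of its demand, exactly as in the class.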

In [10]:
import random
import math
import numpy as np  # Not used directly yet; kept for later extensions

class MySmartPolicy:
    """
    Implements a smart policy for inventory management in a multi-echelon supply chain.

    This policy calculates production actions based on base stock levels,
    demand forecasting, and dynamic hedging, while considering capacity constraints
    and material availability. It also incorporates a mechanism to adjust hedging
    parameters based on shortage history, and it scales base stock levels by node importance.
    """

    def __init__(self, config, hedging_parameter=0.1, shortage_history_length=20):
        """
        Initializes the MySmartPolicy object.
    
        Args:
            config: Configuration object containing supply chain parameters.
            hedging_parameter: Initial hedging parameter.
            shortage_history_length: Length of the shortage history to track.
        """
        self.config = config
        self.hedging_parameter = hedging_parameter
        self.shortage_history_length = shortage_history_length
    
        # Pre-compute and store mappings for efficiency
        self.component_to_node = self._map_component_to_node()
        self.node_to_components = self._map_node_to_components()  # Inverse mapping: node -> components
    
        # Initialize dynamic attributes
        self.num_nodes = self.config.num_components
        self.current_node = 0
        self.bottleneck_counts = {i: 0 for i in range(len(self.config.capacities))}
        self.component_shortage_counts = [0] * self.config.num_components
        self.component_hedging_parameters = [hedging_parameter] * self.config.num_components
        self.shortage_history = [([False] * self.config.num_components) for _ in range(shortage_history_length)]
        self.history_index = 0
    
        self.component_importance = [1.0] * self.config.num_components  # Default importance
        self.node_shortage_counts = {i: 0 for i in range(len(self.config.capacities))}  # Track node shortages
        self.node_importance = [0.0] * len(self.config.capacities) # Initialize node importance
        self.demand_variability_per_comp = (
            self.config.demand_variability_per_comp
            if hasattr(self.config, "demand_variability_per_comp")
            else [0.1] * self.config.num_components
        )
    
        self.node_shortage_history = {i: [] for i in range(len(self.config.capacities))}  # Per-node shortage flags over time
        self.node_shortage_threshold = 3 # Example threshold for triggering increased base stock
    
        self.base_stock_levels = self._get_base_stock_levels()
    
    def _map_component_to_node(self):
        """
        Creates a mapping from component index to node index.
    
        This mapping is based on the capacity groups defined in the configuration.
        """
        mapping = {}
        for node_idx, comp_indices in enumerate(self.config.capacity_groups_indices):
            for comp_index in comp_indices:
                mapping[comp_index] = node_idx
        return mapping
    
    def _map_node_to_components(self):
        """
        Creates a mapping from node index to a list of component indices.
        """
        mapping = [[] for _ in range(len(self.config.capacities))]
        for comp_index, node_index in self.component_to_node.items():
            mapping[node_index].append(comp_index)
        return mapping
    
    def _calculate_node_importance(self):
        """
        Calculates the importance of each node based on how many other nodes
        depend on it (through its components).
        """
        node_dependencies = [0] * len(self.config.capacities)
        for node_index in range(len(self.config.capacities)):
            components_at_node = self.node_to_components[node_index]
            for comp_index in components_at_node:
                for successor_comp_index in self.config.successors[comp_index]:
                    successor_node_index = self.component_to_node[successor_comp_index]
                    if successor_node_index != node_index:  # Don't count self-dependency
                        node_dependencies[node_index] += 1
    
        # Normalize node importance, and add a small constant.
        max_dependencies = max(node_dependencies) if node_dependencies else 1
        self.node_importance = [(deps / max_dependencies) + 0.1 for deps in node_dependencies] # Ensure importance is > 0
    
    def _get_base_stock_levels(self):
        """
        Calculates the base stock levels for each component.
    
        Base stock level is calculated based on echelon lead times, average demand,
        and the hedging parameter.
        """
        echelon_lead_times = self._get_echelon_lead_time_per_product()
        levels = [0] * self.config.num_components

        # Refresh node importance before it is used as a multiplier.
        self._calculate_node_importance()

        for i in range(self.config.num_products):
            for j in range(self.config.num_components):
                demand = self.config.avg_demand_per_product[i]
                lead_time = echelon_lead_times[i][j]
                node_index = self.component_to_node[j]
                node_importance_factor = self.node_importance[node_index]
                # node_importance has a 0.1 floor; subtracting it gives nodes
                # without dependants a multiplier of exactly 1.
                levels[j] += demand * (lead_time + 1) * (1 + (node_importance_factor - 0.1) * 2)

        # Dynamically adjust based on node shortage history.
        # Note: node_shortage_history is never appended to elsewhere in this class,
        # so this boost stays inactive until that history is populated.
        for node_index in range(len(self.config.capacities)):
            if sum(self.node_shortage_history[node_index][-10:]) > self.node_shortage_threshold:  # Check recent shortages
                for j in self.node_to_components[node_index]:
                    levels[j] *= 1.15  # Temporarily increase base stock

        return [math.ceil(level * (1 + self.hedging_parameter)) for level in levels]
    
    

    def _get_echelon_lead_time_per_product(self):
        """
        Calculates the echelon lead time for each product and component.

        This method determines the longest path (lead time) from each component
        to the final product.
        """
        num_products = self.config.num_products
        num_components = self.config.num_components
        L = [[0 for _ in range(num_components)] for _ in range(num_products)]

        for i in range(num_products):
            root = self.config.num_sub_components + i
            L[i][root] = self.config.lead_times[root]

            for pred in self.config.predecessors[root]:
                L[i][pred] = L[i][root] + self.config.lead_times[pred]

                for pred2 in self.config.predecessors[pred]:
                    L[i][pred2] = L[i][pred] + self.config.lead_times[pred2]

        # Add 1 to every non-zero echelon lead time.
        for i in range(num_products):
            for j in range(num_components):
                if L[i][j] > 0:
                    L[i][j] += 1
        return L

    def _compute_echelon_inventory_position(self, state):
        """
        Calculates the echelon inventory position for each component.

        Echelon inventory is the total inventory at a given stage and all downstream stages.
        """
        total_inventory = [
            state.inventory_per_node[i] + sum(state.inventory_in_pipeline_per_node[i])
            for i in range(self.config.num_components)
        ]
        echelon_inventory = total_inventory[:]  # Copy the list

        for i in reversed(range(self.config.num_components)):
            for pred in self.config.predecessors[i]:
                echelon_inventory[pred] += echelon_inventory[i]
        return echelon_inventory

    def _forecast_demand(self, component_idx, state, window_size=5):
        """
        Forecasts demand for a component using a moving average.

        Args:
            component_idx: Index of the component to forecast demand for.
            state: Current state of the supply chain.
            window_size: Number of periods to use in the moving average.

        Returns:
            The forecasted demand for the component.
        """
        historical_demand = self._get_historical_demand(component_idx, state, window_size)
        if not historical_demand:
            return self.config.avg_demand_per_comp[component_idx]
        return sum(historical_demand) / len(historical_demand)

    def _get_historical_demand(self, component_idx, state, window_size):
        """
        Retrieves historical demand data for a component.

        This is a placeholder method.  In a real application, this would
        access a data source or the state object to get actual historical demand.

        Args:
            component_idx: The index of the component.
            state: The current state.
            window_size: The number of historical periods to retrieve.

        Returns:
            A list of historical demand values.  Returns an empty list if no data.
        """
        # Placeholder: Return a list of random demand values.  Replace this.
        return [random.randint(50, 150) for _ in range(window_size)]  # Mock data

    def _compute_demand_variability(self, component_idx):
        """
        Computes the demand variability for a component.

        This method calculates the variability of demand for a given component.
        It uses the pre-defined variability from the config if available.

        Args:
            component_idx: The index of the component.

        Returns:
            The demand variability for the component.  Higher = more variable.
        """
        return self.demand_variability_per_comp[component_idx]

    def _compute_dynamic_hedging(self, component_idx, echelon_inventory, state):
        """
        Dynamically adjusts the hedging parameter for a component.

        The hedging parameter is increased if the component has high demand
        variability or is experiencing a shortage.

        Args:
            component_idx: The index of the component.
            echelon_inventory: The current echelon inventory levels.
            state: The current state of the system.

        Returns:
            The adjusted hedging parameter for the component.
        """
        demand_variability = self._compute_demand_variability(component_idx)
        component_shortage = self.base_stock_levels[component_idx] - echelon_inventory[component_idx]

        importance_factor = self.component_importance[component_idx]
        dynamic_hedging = 1 + (self.hedging_parameter * demand_variability * 0.5) * importance_factor
        if component_shortage > 0:
            dynamic_hedging += self.hedging_parameter * (component_shortage / self.base_stock_levels[component_idx]) * 0.5 * importance_factor
        return min(dynamic_hedging, 1.5)

    def set_hedging_parameter(self, state):
        """
        Adjusts the global hedging parameter based on overall system state.

        This adjustment considers demand volatility and overall inventory levels.

        Args:
            state: The current state of the supply chain.
        """
        volatility_factor = self._compute_volatility_factor(state)
        self.hedging_parameter = min(0.4 + 0.2 * volatility_factor, 0.7)  # Reduced cap and sensitivity

    def _compute_volatility_factor(self, state):
        """
        Calculates a volatility factor based on inventory and demand.

        This factor is used to adjust the hedging parameter.

        Args:
            state: The current state of the supply chain.

        Returns:
            A value representing the overall volatility in the system.
        """
        volatility = sum(self.demand_variability_per_comp) / len(self.demand_variability_per_comp)
        low_inventory = sum(
            1 for i in range(self.config.num_components)
            if state.inventory_per_node[i] < self.base_stock_levels[i]
        )
        return volatility + low_inventory * 0.1  # Simple combination

    def _is_material_available(self, state, comp_index, produce_amount):
        """
        Checks if enough materials are available to produce a given amount of a component.

        Args:
            state: The current state of the supply chain.
            comp_index: The index of the component to produce.
            produce_amount: The amount of the component to produce.

        Returns:
            True if materials are sufficient, False otherwise.
        """
        # The BOM is a wide 0/1 matrix: row = predecessor, column = consuming
        # component ("C1", "C2", ...). Rows are assumed to follow component
        # order, which is the same assumption set_action makes.
        bom = self.config.bill_of_materials
        component_name = f"C{comp_index + 1}"
        if component_name not in bom.columns:
            return True
        for pred_index in self.config.predecessors[comp_index]:
            required_quantity = bom.loc[bom.index[pred_index], component_name]
            if required_quantity > 0 and state.inventory_per_node[pred_index] < produce_amount * required_quantity:
                print(f"  Not enough predecessor {pred_index} for component {comp_index}")
                return False
        return True

    def _is_production_feasible(self, state, action, node_index, comp_index, produce_amount, current_production_per_node):
        """
        Checks if producing a given amount of a component is feasible at a node.

        Considers both material availability and capacity constraints.

        Args:
            state: The current state of the supply chain.
            action: The planned production actions.  (Not used here, but kept for consistency).
            node_index: The index of the node where production is being considered.
            comp_index: The index of the component to produce.
            produce_amount: The amount of the component to produce.
            current_production_per_node: A dictionary tracking current production at each node.

        Returns:
            True if production is feasible, False otherwise.
        """
        # Check material availability
        if not self._is_material_available(state, comp_index, produce_amount):
            return False

        # Check capacity
        capacity = self.config.capacities[node_index]
        planned_production_at_node = current_production_per_node.get(node_index, 0) + produce_amount
        if planned_production_at_node > capacity * 1.05:  # 5% slack
            print(f"  Exceeds capacity at node {node_index} for component {comp_index}")
            return False
        return True

    def _is_feasible(self, state, action, node):
        """
        Checks if a given action is feasible at a node, considering predecessors and capacity.

        Args:
            state: The current state of the supply chain.
            action: The production action to check.
            node: The node index.

        Returns:
            True if the action is feasible, False otherwise.
        """
        for pred in self.config.predecessors[node]:
            total_required = sum(action[succ] for succ in self.config.successors[pred])
            if state.inventory_per_node[pred] < total_required * 0.95:  # 5% slack
                print(
                    f"Action infeasible due to predecessor {pred}: required {total_required}, "
                    f"available {state.inventory_per_node[pred]}"
                )
                return False

        cap_group = self.config.capacity_groups[node]
        cap_indices = self.config.capacity_groups_indices[cap_group]
        total_action = sum(action[j] for j in cap_indices)
        if total_action > self.config.capacities[cap_group] * 1.05:  # 5% slack
            print(
                f"Action infeasible due to capacity group {cap_group}: total action {total_action}, "
                f"capacity {self.config.capacities[cap_group]}"
            )
            return False
        return True

    def set_action(self, state):
        """
        Calculates the production actions for all components.
        """
        echelon_inventory = self._compute_echelon_inventory_position(state)
        base_stock_levels = self._get_base_stock_levels() # Get updated base stock levels
        adaptive_hedging_parameter = self.hedging_parameter

        # Update shortage history
        current_shortages = [False] * self.config.num_components
        num_sub_components_bom = self.config.bill_of_materials.shape[0]

        # Initial potential actions, using the component hedging parameters
        potential_action = [
            math.ceil(self.config.avg_demand_per_comp[i] * (1 + self.component_hedging_parameters[i]))
            for i in range(self.config.num_components)
        ]

        # Determine subcomponent demand and shortages.
        total_subcomponent_demand = {}
        for predecessor_index in range(num_sub_components_bom):
            total_needed = 0
            predecessor_name = self.config.bill_of_materials.index[predecessor_index]
            for component_index in range(self.config.num_components):
                component_name = f"C{component_index + 1}"
                if (
                    component_name in self.config.bill_of_materials.columns
                    and self.config.bill_of_materials.loc[predecessor_name, component_name] == 1
                    and potential_action[component_index] > 0
                ):
                    total_needed += potential_action[component_index]

            if total_needed > 0 and predecessor_index < self.config.num_sub_components:
                total_subcomponent_demand[predecessor_index] = total_needed
                available = state.inventory_per_node[predecessor_index]
                if total_needed > available:
                    current_shortages[predecessor_index] = True

        # Update shortage history.
        self.shortage_history[self.history_index] = current_shortages
        self.history_index = (self.history_index + 1) % self.shortage_history_length

        # Adjust component hedging parameters based on recent shortage history
        max_hedging_parameter = 2.0  # Introduce a maximum value
        shortage_threshold_increase = 0.2
        shortage_threshold_decrease = 0.05
        shortage_increase_factor = 0.01  # Smaller increase
        shortage_decrease_factor = 0.002 # Smaller decrease
        shortage_count_decay = 0.95     # Decay factor for shortage counts

        for i in range(self.config.num_components):
            shortage_frequency = sum(history[i] for history in self.shortage_history) / self.shortage_history_length
            node_index = self.component_to_node[i]

            # Adjust hedging parameter based on shortage frequency and inventory
            if shortage_frequency > shortage_threshold_increase:
                inventory_level = state.inventory_per_node[i]
                if inventory_level < base_stock_levels[i] * 1.1: # Don't increase if inventory is already high
                    self.component_hedging_parameters[i] += shortage_increase_factor * (shortage_frequency - shortage_threshold_increase)
                    self.node_shortage_counts[node_index] += 1
            elif shortage_frequency < shortage_threshold_decrease and self.component_hedging_parameters[i] > 0.01:
                self.component_hedging_parameters[i] -= shortage_decrease_factor
                self.node_shortage_counts[node_index] *= shortage_count_decay # Apply decay
            else:
                self.node_shortage_counts[node_index] *= shortage_count_decay # Apply decay even if no change

            self.component_hedging_parameters[i] = min(self.component_hedging_parameters[i], max_hedging_parameter)
            self.component_hedging_parameters[i] = max(self.component_hedging_parameters[i], 0.0) # Ensure it doesn't go negative

        # Re-calculate potential action with adjusted hedging
        potential_action = [
            math.ceil(self.config.avg_demand_per_comp[i] * (1 + self.component_hedging_parameters[i]))
            for i in range(self.config.num_components)
        ]

        # Prioritize component order based on bottleneck frequency
        capacity_group_priority = sorted(
            self.bottleneck_counts.items(), key=lambda item: item[1], reverse=True
        )
        component_order = []
        processed_components = set()
        for group_index, _ in capacity_group_priority:
            for comp_index in self.config.capacity_groups_indices[group_index]:
                if comp_index not in processed_components:
                    component_order.append(comp_index)
                    processed_components.add(comp_index)
        component_order.extend(
            i for i in range(self.config.num_components) if i not in processed_components
        )  # Add remaining

        # Prioritize initial potential actions based on echelon inventory shortfall
        product_echelon_shortfalls = {
            self.config.num_sub_components + i: base_stock_levels[self.config.num_sub_components + i]
            - echelon_inventory[self.config.num_sub_components + i]
            for i in range(self.config.num_products)
        }

        # Calculate weighted potential actions
        weighted_potential_action = [0] * self.config.num_components
        for i in component_order:
            base_demand = math.ceil(self.config.avg_demand_per_comp[i] * (1 + adaptive_hedging_parameter))
            weight = 1.0
            for product_index in range(self.config.num_products):
                final_product_index = self.config.num_sub_components + product_index
                component_name = f"C{i + 1}"
                product_name = self.config.bill_of_materials.index[final_product_index]
                if (
                    component_name in self.config.bill_of_materials.columns
                    and self.config.bill_of_materials.loc[product_name, component_name] == 1
                ):
                    weight += (
                        product_echelon_shortfalls.get(final_product_index, 0)
                        / (base_stock_levels[final_product_index] + 1e-9) # Avoid division by zero
                    ) * 0.05  # Reduced weight
            weighted_potential_action[i] = math.ceil(base_demand * weight)
            potential_action[i] = weighted_potential_action[i]  # Use weighted action

        # Adjust potential actions to handle subcomponent shortages
        total_subcomponent_demand = {}
        num_sub_components_bom = self.config.bill_of_materials.shape[0]
        for predecessor_index in range(num_sub_components_bom):
            total_needed = 0
            predecessor_name = self.config.bill_of_materials.index[predecessor_index]
            for component_index in range(self.config.num_components):
                component_name = f"C{component_index + 1}"
                if (
                    component_name in self.config.bill_of_materials.columns
                    and self.config.bill_of_materials.loc[predecessor_name, component_name] == 1
                    and potential_action[component_index] > 0
                ):
                    total_needed += potential_action[component_index]
            if total_needed > 0 and predecessor_index < self.config.num_sub_components:
                total_subcomponent_demand[predecessor_index] = total_needed

        print("\nTotal Sub-component Demand:")  # Debugging output
        for sub_comp_index, total_needed in total_subcomponent_demand.items():
            available = state.inventory_per_node[sub_comp_index]
            print(f"Component {sub_comp_index}: Needed = {total_needed}, Available = {available}")
            if total_needed > available:
                shortage = total_needed - available
                print(f"  Shortage of {shortage} for component {sub_comp_index}!")
                predecessor_name = self.config.bill_of_materials.index[sub_comp_index]
                components_to_reduce = []
                product_priorities_shortage = {}  # Priority based on shortage

                for component_index in range(self.config.num_components):
                    component_name = f"C{component_index + 1}"
                    if (
                        component_name in self.config.bill_of_materials.columns
                        and self.config.bill_of_materials.loc[predecessor_name, component_name] == 1
                        and potential_action[component_index] > 0
                    ):
                        components_to_reduce.append(component_index)
                        if component_index >= self.config.num_sub_components:  # It's a final product
                            product_index = component_index - self.config.num_sub_components
                            backlog = max(0, -state.inventory_per_node[component_index])
                            echelon_shortfall = (
                                base_stock_levels[component_index] - echelon_inventory[component_index]
                            )
                            # Priority based on echelon inventory shortfall, backlog, and penalty
                            priority = (
                                self.config.p[product_index] * (backlog + 1)
                            ) + (
                                echelon_shortfall * 0.3 # Adjust weights as needed
                            )
                            product_priorities_shortage[component_index] = priority
                        else:
                            product_priorities_shortage[component_index] = 0

                sorted_components_to_reduce = sorted(
                    components_to_reduce,
                    key=lambda comp_index: product_priorities_shortage.get(comp_index, 0),
                    reverse=False,
                )

                reduction_needed = shortage
                for comp_index_to_reduce in sorted_components_to_reduce:
                    reduce_by = min(potential_action[comp_index_to_reduce], reduction_needed)
                    potential_action[comp_index_to_reduce] = max(
                        0, potential_action[comp_index_to_reduce] - reduce_by
                    )
                    reduction_needed -= reduce_by
                    print(
                        f"  Reduced production of component {comp_index_to_reduce} to "
                        f"{potential_action[comp_index_to_reduce]}"
                    )  # Debug
                    if reduction_needed <= 0:
                        break

        # Capacity constraints check and bottleneck count update
        action_per_node = list(potential_action)  # Create a copy to avoid modifying original
        for i in range(self.config.num_components):
            capacity_group = self.config.capacity_groups[i]
            capacity = self.config.capacities[capacity_group]
            group_production = sum(
                [action_per_node[j] for j in self.config.capacity_groups_indices[capacity_group]]
            )
            if group_production > capacity:
                reduction_needed = group_production - capacity
                action_per_node[i] = max(0, action_per_node[i] - reduction_needed)
                self.bottleneck_counts[capacity_group] += 1

        print(
            f"\nSmart Action (All Nodes): {action_per_node}, Hedging: {self.component_hedging_parameters}"
        )  # Debug
        return action_per_node
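

The adaptive hedging step inside `set_action` can be hard to follow in context, so here is a simplified standalone sketch of it for one component. The function name is hypothetical, a `deque` replaces the manual circular index, and the extra inventory check the class performs before increasing is left out. The thresholds and step sizes are the ones used in the class.

```python
from collections import deque


def update_hedging(hedging, shortage_history, inc_threshold=0.2, dec_threshold=0.05,
                   inc_factor=0.01, dec_factor=0.002, max_hedging=2.0):
    """Nudge one component's hedging parameter based on its recent shortage frequency."""
    frequency = sum(shortage_history) / len(shortage_history)
    if frequency > inc_threshold:
        # Frequent shortages: raise hedging in proportion to the excess frequency.
        hedging += inc_factor * (frequency - inc_threshold)
    elif frequency < dec_threshold and hedging > 0.01:
        # Rare shortages: let hedging decay slowly.
        hedging -= dec_factor
    return min(max(hedging, 0.0), max_hedging)


# Fixed-length window of shortage flags; the oldest entry drops out automatically.
calm_history = deque([False] * 20, maxlen=20)
calm_hedging = update_hedging(0.1, calm_history)

stressed_history = deque([True] * 10 + [False] * 10, maxlen=20)
stressed_hedging = update_hedging(0.1, stressed_history)

print(calm_hedging, stressed_hedging)
```

With no recorded shortages the default 0.1 decays by 0.002, which is consistent with the `Hedging:` values of 0.098 printed in the debug output of the comparison cell.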


## Instantiate the system (only once, before all policies)


In [11]:
from src.supply_chain_config import SupplyChainConfig
from src.dynamics import Dynamics
from src.evaluate import Evaluate

config = SupplyChainConfig()
dynamics = Dynamics(config)
evaluator = Evaluate(config, dynamics, num_trajectories=100, periods_per_trajectory=60)


## Instantiate your policy and the baseline policies

In [12]:
from src.policy import MeanDemandPolicy, StaticBaseStockPolicyShortfall, StaticBaseStockPolicyRandom

# Reuse the config, dynamics, and evaluator created in the previous cell,
# so the system is instantiated only once and shared by all policies.

# Instantiate policies
mean_policy = MeanDemandPolicy(config, 0.1)
shortfall_policy = StaticBaseStockPolicyShortfall(config, 0.1)
random_policy = StaticBaseStockPolicyRandom(config, 0.1)
my_policy = MySmartPolicy(config, 0.1)

## Compare the policies

In [13]:
results = evaluator.compare_policies([
    mean_policy,
    shortfall_policy,
    random_policy,
    my_policy
])

policy_names = ["MeanDemand", "Shortfall", "Random", "MySmartPolicy"]
print("\n📊 Average Total Cost per Policy:")
for name, cost in zip(policy_names, results):
    print(f"{name}: €{cost:.2f}")


Total Sub-component Demand:
Component 0: Needed = 18, Available = 18
Component 1: Needed = 18, Available = 18
Component 2: Needed = 17, Available = 18
Component 3: Needed = 17, Available = 18
Component 4: Needed = 17, Available = 18
Component 5: Needed = 8, Available = 9
Component 6: Needed = 6, Available = 6
Component 7: Needed = 4, Available = 4
Component 8: Needed = 14, Available = 15
Component 9: Needed = 4, Available = 4
Component 10: Needed = 18, Available = 18

Smart Action (All Nodes): [17, 17, 17, 17, 17, 8, 6, 4, 14, 4, 17, 8, 6, 4], Hedging: [0.098, 0.098, 0.098, 0.098, 0.098, 0.098, 0.098, 0.098, 0.098, 0.098, 0.098, 0.098, 0.098, 0.098]

Total Sub-component Demand:
Component 0: Needed = 18, Available = 15
  Shortage of 3 for component 0!
  Reduced production of component 5 to 5
Component 1: Needed = 18, Available = 15
  Shortage of 3 for component 1!
  Reduced production of component 8 to 11
Component 2: Needed = 17, Available = 16
  Shortage of 1 for component 2!
  Reduc

## 🧠 Custom Policy Development: `MySmartPolicy`

### ✅ What I Did

- **Understood the problem context**: This project aims to minimize total supply chain costs (inventory + backlog) while handling uncertain demand, lead times, and capacity constraints — in a way that's interpretable and implementable for ASML.
- **Explored and evaluated baseline policies**:
  - `MeanDemandPolicy`: Uses average demand per component
  - `StaticBaseStockPolicyShortfall`: Adds safety buffers
  - `StaticBaseStockPolicyRandom`: Acts as a benchmark
- **Created a custom policy (`MySmartPolicy`)**:
  - Started with a very simple policy that only ordered 1 unit of a single component (component 0).
  - Resolved feasibility errors step by step by making sure the `action` dictionary includes every component, with 0 for components that are not ordered.
  - Confirmed that the simulation runs successfully and yields a (very high) total cost — which is expected, since the policy barely fulfills demand.

### 🧪 What This Showed

- The simulator is highly sensitive to feasibility constraints — it expects a full action dict and enforces capacity limits strictly.
- Starting from a minimal working policy was key to eliminating persistent errors (`KeyError`, `Action is not feasible`, etc.).
- The current version of `MySmartPolicy` is a placeholder — it's valid, but inefficient.
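
A minimal sketch of what "a full action dict that respects capacity" could look like is given below. The helpers `make_full_action` and `clip_to_capacity` are hypothetical and not part of `src`. They assume the action maps every component index to a non-negative integer and that capacity groups are passed as plain dictionaries, loosely mirroring `capacity_groups_indices` and `capacities` in the policy code.

```python
def make_full_action(num_components, orders):
    """Return an action with an entry for every component (0 if not ordered)."""
    return {i: max(0, int(orders.get(i, 0))) for i in range(num_components)}


def clip_to_capacity(action, group_members, capacities):
    """Reduce orders group by group until each group total fits its capacity.

    group_members: {group: [component indices]}  (assumed layout)
    capacities:    {group: max total production} (assumed layout)
    """
    clipped = dict(action)  # copy so the caller's action is not modified
    for group, members in group_members.items():
        excess = sum(clipped[m] for m in members) - capacities[group]
        for m in members:
            if excess <= 0:
                break
            cut = min(clipped[m], excess)
            clipped[m] -= cut
            excess -= cut
    return clipped


# The placeholder policy: 1 unit of component 0, 0 for everything else.
minimal_action = make_full_action(4, {0: 1})
print(minimal_action)  # {0: 1, 1: 0, 2: 0, 3: 0}

# Group "A" asks for 9 units in total but may only produce 6.
clipped_action = clip_to_capacity(
    {0: 5, 1: 4, 2: 3, 3: 0},
    group_members={"A": [0, 1], "B": [2, 3]},
    capacities={"A": 6, "B": 10},
)
print(clipped_action)  # {0: 2, 1: 4, 2: 3, 3: 0}
```

Cutting the first members of a group first is an arbitrary choice made for brevity; a priority-based order would slot into the inner loop.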

### 🔜 What’s Next

1. **Gradually expand the policy**:
   - Place orders for more than just component 0 (e.g., components 0–2).
   - Increase quantities slightly, but stay within capacities.
2. **Incorporate smarter logic**:
   - Use average demand × lead time to calculate base stock levels.
   - Add simple backlog awareness (if demand is unmet, order more next time).
   - Optionally: forecast demand using moving averages.
3. **Track performance**:
   - Monitor how the policy compares to the baselines in total cost.
   - Test stability across different simulation runs (repeatability).
4. **Make it interpretable**:
   - Ensure the logic can be explained to someone at ASML without a deep ML or math background.
   - Consider simple rules that generalize well to other network topologies.
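
The ideas in step 2 could be sketched as follows. This is an illustration under stated assumptions, not the policy used in this notebook: `MovingAverageBaseStock` and its parameters (`lead_time`, `window`, `safety_factor`) are hypothetical names, the class handles a single component, and it ignores capacity limits and the bill of materials.

```python
import math
from collections import deque


class MovingAverageBaseStock:
    """Single-component base-stock rule with a moving-average forecast."""

    def __init__(self, lead_time, window=4, safety_factor=0.1):
        self.lead_time = lead_time
        self.safety_factor = safety_factor
        self.history = deque(maxlen=window)  # keeps only the last `window` demands

    def observe(self, demand):
        """Record the demand seen in the latest period."""
        self.history.append(demand)

    def forecast(self):
        """Moving average of recent demand (0.0 before any observation)."""
        return sum(self.history) / len(self.history) if self.history else 0.0

    def base_stock_level(self):
        """Average demand x lead time, inflated by a safety buffer."""
        return self.forecast() * self.lead_time * (1 + self.safety_factor)

    def order_quantity(self, on_hand, in_transit, backlog):
        """Order up to the base-stock level; backlog lowers the position."""
        inventory_position = on_hand + in_transit - backlog
        return max(0, math.ceil(self.base_stock_level() - inventory_position))


policy = MovingAverageBaseStock(lead_time=2, window=3, safety_factor=0.0)
for demand in [10, 20, 30]:
    policy.observe(demand)

# Forecast 20 x lead time 2 = 40; position = 5 + 10 - 3 = 12; order 28.
print(policy.order_quantity(on_hand=5, in_transit=10, backlog=3))  # 28
```

Subtracting the backlog from the inventory position is what makes the rule "backlog aware": unmet demand in one period raises the order in the next. Every quantity in the rule can be read off directly, which keeps it explainable.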

---

