The goal of the project is to **create a model that decides which action to take given the state of the network, predicts costs, and learns good base stock levels and a good overall ordering strategy.**


1. Minimize total cost over the simulation:
   - Inventory holding costs
   - Backlog penalty costs (when you miss customer demand)

2. Handle uncertainty in demand (it's random but follows patterns).

3. Respect capacity & lead time constraints (you can’t magically produce anything instantly or in infinite quantity).

4. Be interpretable and realistic for ASML's real-world context.

**GOAL**: Create your own policy that makes smarter production decisions and leads to lower total costs (inventory + backlog penalties). Then, compare it to the existing ones.
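
To make the objective concrete, the cost of a single period can be sketched as a holding cost on positive net inventory plus a penalty on backlog. The function name and the cost rates below are illustrative assumptions, not values taken from `SupplyChainConfig` or from the simulator.

```python
def period_cost(net_inventory, holding_cost=1.0, backlog_cost=5.0):
    """Cost of one period for one item.

    net_inventory > 0 means stock on hand; net_inventory < 0 means backlog.
    The default cost rates are hypothetical placeholders.
    """
    on_hand = max(net_inventory, 0)
    backlog = max(-net_inventory, 0)
    return holding_cost * on_hand + backlog_cost * backlog


# Total cost of a short, made-up net inventory trajectory
trajectory = [3, 1, -2, 0, 4]
total_cost = sum(period_cost(x) for x in trajectory)
print(total_cost)
```

Because the backlog rate is usually higher than the holding rate, a policy that orders too little is punished more than one that orders slightly too much.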

# Getting to know the dataset

In [75]:
from src.supply_chain_config import SupplyChainConfig
from src.dynamics import Dynamics
from src.policy import StaticBaseStockPolicyRandom
from src.evaluate import Evaluate

In [76]:
from src.supply_chain_config import SupplyChainConfig

config = SupplyChainConfig()

print("✅ Supply chain data loaded!\n")

print("🔧 All config attributes:")
for attr in dir(config):
    if not attr.startswith("_"):
        value = getattr(config, attr)
        if isinstance(value, (dict, list)):
            print(f"{attr}: {len(value)} items")
        else:
            print(f"{attr}: {value}")


✅ Supply chain data loaded!

🔧 All config attributes:
aggregate_demand_ar_params: 8 items
aggregate_demand_error_sigma: 2.0
aggregate_demand_mean: 15
aggregate_demand_num_lags: 8
avg_demand_per_comp: 14 items
avg_demand_per_product: 3 items
bill_of_materials:    Unnamed: 0  C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12  C13  C14
0          C1   0   0   0   0   0   1   1   1   0    0    0    0    0    0
1          C2   0   0   0   0   0   0   0   0   1    1    0    0    0    0
2          C3   0   0   0   0   0   0   0   0   0    0    1    0    0    0
3          C4   0   0   0   0   0   0   0   0   0    0    1    0    0    0
4          C5   0   0   0   0   0   0   0   0   0    0    1    0    0    0
5          C6   0   0   0   0   0   0   0   0   0    0    0    1    0    0
6          C7   0   0   0   0   0   0   0   0   0    0    0    0    1    0
7          C8   0   0   0   0   0   0   0   0   0    0    0    0    0    1
8          C9   0   0   0   0   0   0   0   0   0    0    0    1 

### 🔍 What this does:
This cell compares three predefined policies:
- **`MeanDemandPolicy`**: reacts to average demand
- **`StaticBaseStockPolicyShortfall`**: adds a buffer to cover likely stockouts
- **`StaticBaseStockPolicyRandom`**: a random baseline for benchmarking

The simulation runs 100 times, each for 60 periods, and computes the **average total cost** for each policy. This helps us evaluate baseline performance and identify which policy structure might be most promising to build upon.
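
The averaging step can be illustrated without the simulator. The sketch below is an assumption about what "average total cost over 100 trajectories" involves, not the actual `Evaluate` implementation: it takes hypothetical per-trajectory costs and reports the mean together with its standard error, which helps judge whether the gap between two policies is larger than the sampling noise.

```python
import math
import random
import statistics


def summarize_costs(costs):
    """Return (mean, standard error) of per-trajectory total costs."""
    mean = statistics.fmean(costs)
    std_err = statistics.stdev(costs) / math.sqrt(len(costs))
    return mean, std_err


# Hypothetical per-trajectory costs standing in for simulator output
rng = random.Random(0)
costs_a = [rng.gauss(450, 40) for _ in range(100)]
costs_b = [rng.gauss(530, 40) for _ in range(100)]

for name, costs in [("policy A", costs_a), ("policy B", costs_b)]:
    mean, std_err = summarize_costs(costs)
    print(f"{name}: {mean:.1f} ± {1.96 * std_err:.1f} (approx. 95% interval)")
```

If the intervals of two policies overlap heavily, more trajectories are needed before declaring one of them better.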


In [77]:
from src.policy import MeanDemandPolicy, StaticBaseStockPolicyShortfall, StaticBaseStockPolicyRandom
from src.dynamics import Dynamics

dynamics = Dynamics(config)

policy_mean_demand = MeanDemandPolicy(config, 0.1)
policy_shortfall = StaticBaseStockPolicyShortfall(config, 0.25)
policy_random = StaticBaseStockPolicyRandom(config, 0.25)

evaluator = Evaluate(config, dynamics, num_trajectories=100, periods_per_trajectory=60)

results = evaluator.compare_policies([policy_mean_demand, policy_shortfall, policy_random])
print(results)


[726.4667, 445.03144999999995, 531.8817166666666]


# Build & Evaluate own Policy


Creating a custom policy (`policy.py`)

In [120]:
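class MySmartPolicy:
    """Minimal placeholder policy: order a fixed quantity from a single node."""

    def __init__(self, config):
        self.config = config
        self.node_to_test = 0  # Only order from Node 0
        self.component_to_node = self._map_component_to_node()
        self.base_stock_level = 1  # Used as a fixed order quantity for now

    def _map_component_to_node(self):
        """Map each component index to the index of its capacity group (node)."""
        mapping = {}
        for node_idx, comp_indices in enumerate(self.config.capacity_groups_indices):
            for comp_index in comp_indices:
                mapping[comp_index] = node_idx
        return mapping

    def set_action(self, state):
        """Return an order quantity for every component; `state` is ignored."""
        # The action must contain every component, so default all orders to 0
        action = {comp_index: 0 for comp_index in self.component_to_node}

        # Order only for components that belong to the node under test
        for comp_index, node in self.component_to_node.items():
            if node == self.node_to_test:
                action[comp_index] = self.base_stock_level  # Try 1 unit

        # Debug output: printed on every call to set_action
        print(f"\n🛠️ Trying only node {self.node_to_test}: action = {action}")
        return action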


## Instantiate the system (only once, before all policies)


In [121]:
from src.supply_chain_config import SupplyChainConfig
from src.dynamics import Dynamics
from src.evaluate import Evaluate

config = SupplyChainConfig()
dynamics = Dynamics(config)
evaluator = Evaluate(config, dynamics, num_trajectories=100, periods_per_trajectory=60)


## Instantiate your policy and the baseline policies

In [122]:
from src.policy import MeanDemandPolicy, StaticBaseStockPolicyShortfall, StaticBaseStockPolicyRandom

# config, dynamics and evaluator were already created in the previous cell

# Instantiate policies
mean_policy = MeanDemandPolicy(config, 0.1)
shortfall_policy = StaticBaseStockPolicyShortfall(config, 0.25)
random_policy = StaticBaseStockPolicyRandom(config, 0.25)
my_policy = MySmartPolicy(config)

## Compare the policies

In [123]:
results = evaluator.compare_policies([
    mean_policy,
    shortfall_policy,
    random_policy,
    my_policy
])

policy_names = ["MeanDemand", "Shortfall", "Random", "MySmartPolicy"]
print("\n📊 Average Total Cost per Policy:")
for name, cost in zip(policy_names, results):
    print(f"{name}: €{cost:.2f}")


🛠️ Trying only node 0: action = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0}

[The same line is printed on every call to `set_action`; remaining output truncated.]

## 🧠 Custom Policy Development: `MySmartPolicy`

### ✅ What I Did

- **Understood the problem context**: This project aims to minimize total supply chain costs (inventory + backlog) while handling uncertain demand, lead times, and capacity constraints — in a way that's interpretable and implementable for ASML.
- **Explored and evaluated baseline policies**:
  - `MeanDemandPolicy`: Uses average demand per component
  - `StaticBaseStockPolicyShortfall`: Adds safety buffers
  - `StaticBaseStockPolicyRandom`: Acts as a benchmark
- **Created a custom policy (`MySmartPolicy`)**:
  - Started with a very simple policy that only ordered 1 unit of a single component (component 0).
  - Gradually debugged feasibility issues by ensuring the `action` dictionary includes all components (even those with 0 order).
  - Confirmed that the simulation runs successfully and yields a very high total cost, which is expected because the policy orders almost nothing and therefore barely fulfills demand.

### 🧪 What This Showed

- The simulator is highly sensitive to feasibility constraints — it expects a full action dict and enforces capacity limits strictly.
- Starting from a minimal working policy was key to eliminating persistent errors (`KeyError`, `Action is not feasible`, etc.).
- The current version of `MySmartPolicy` is a placeholder — it's valid, but inefficient.
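
One way to avoid these errors is to build the action from a complete template and then scale it back to capacity. The sketch below assumes that each capacity group shares a single limit on the sum of its orders. The `capacities` list and the helper name are hypothetical; only `capacity_groups_indices` mirrors an attribute already used in `MySmartPolicy`.

```python
def make_feasible_action(desired, capacity_groups_indices, capacities):
    """Return a full action dict that respects per-group capacity.

    desired: dict {component_index: desired order quantity} (may be partial)
    capacity_groups_indices: list of lists of component indices, one per node
    capacities: hypothetical list with one total-order limit per node
    """
    action = {}
    for group, capacity in zip(capacity_groups_indices, capacities):
        # Missing components default to 0 so every key is present
        wanted = {c: max(int(desired.get(c, 0)), 0) for c in group}
        total = sum(wanted.values())
        if total > capacity:
            # Scale down proportionally, rounding down to stay within the limit
            wanted = {c: q * capacity // total for c, q in wanted.items()}
        action.update(wanted)
    return action


groups = [[0, 1], [2], [3, 4]]
caps = [10, 5, 6]
print(make_feasible_action({0: 8, 1: 8, 3: 2}, groups, caps))
```

Proportional scaling is only one possible choice; prioritizing the components with the largest backlog would be an equally simple alternative.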

### 🔜 What’s Next

1. **Gradually expand the policy**:
   - Start placing orders from more than just Node 0 (e.g., Nodes 0–2).
   - Increase quantities slightly, but stay within capacities.
2. **Incorporate smarter logic**:
   - Use average demand × lead time to calculate base stock levels.
   - Add simple backlog awareness (if demand is unmet, order more next time).
   - Optionally: forecast demand using moving averages.
3. **Track performance**:
   - Monitor how the policy compares to the baselines in average total cost.
   - Test stability across different simulation runs (repeatability).
4. **Make it interpretable**:
   - Ensure the logic can be explained to someone at ASML without a deep ML or math background.
   - Consider simple rules that generalize well to other network topologies.
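
The "smarter logic" step of this plan can be sketched as an order-up-to rule fed by a moving-average forecast. Everything here is an illustrative assumption: the function names, the `safety_factor` parameter and the state variables (`on_hand`, `in_transit`, `backlog`) are hypothetical and do not reflect the simulator's actual state layout.

```python
from collections import deque


def base_stock_level(avg_demand, lead_time, safety_factor=0.25):
    """Order-up-to level: expected lead-time demand plus a relative buffer."""
    return avg_demand * lead_time * (1 + safety_factor)


def order_quantity(level, on_hand, in_transit, backlog):
    """Order enough to raise the inventory position back to `level`."""
    inventory_position = on_hand + in_transit - backlog
    return max(level - inventory_position, 0)


class MovingAverage:
    """Moving average of the last `window` demand observations."""

    def __init__(self, window):
        self.values = deque(maxlen=window)

    def update(self, demand):
        self.values.append(demand)
        return sum(self.values) / len(self.values)


# Made-up demand history for a single component
forecast = MovingAverage(window=4)
for demand in [14, 16, 15, 19]:
    avg = forecast.update(demand)

level = base_stock_level(avg, lead_time=3)
print(order_quantity(level, on_hand=20, in_transit=25, backlog=5))
```

Subtracting the backlog from the inventory position is what makes the rule backlog-aware: unmet demand automatically increases the next order. The resulting quantities would still need to be checked against capacity before being returned as an action.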

---

