## Data Loading: Forecast Output

In [1]:
import pandas as pd
import numpy as np
forecast_df=pd.read_csv("Forecast.csv")

In [2]:
forecast_df

Unnamed: 0,part_id,part_type,location_id,region,forecast_date,forecasted_demand,forecast_lower_bound,forecast_upper_bound,unit_cost,revenue,...,z_value,demand_variance_during_lt,safety_stock_units,expected_demand_during_lt,reorder_point_units,cv_demand,intermittent_adjustment,expected_shortage_per_cycle,fill_rate,days_of_supply
0,BRAKE_PAD_1_005,BRAKE_PAD_TYPE_1,TX_3,TX,2011-01-29,2.000000,1.600000,2.400000,2.94,5.88,...,2.053749,4.792541,4.496041,7.100000,11.596041,0.580950,1.290475,0.105996,0.985071,2.248021
1,BRAKE_PAD_1_005,BRAKE_PAD_TYPE_1,TX_3,TX,2011-01-30,2.100000,1.680000,2.520000,2.94,8.82,...,2.053749,4.779041,4.489704,7.434000,11.923704,0.553286,1.276643,0.105847,0.985762,2.137954
2,BRAKE_PAD_1_005,BRAKE_PAD_TYPE_1,TX_3,TX,2011-01-31,2.190000,1.752000,2.628000,2.94,8.82,...,2.053749,4.495539,4.354499,7.292700,11.647199,0.530548,1.265274,0.102659,0.985923,1.988356
3,BRAKE_PAD_1_005,BRAKE_PAD_TYPE_1,TX_3,TX,2011-02-01,2.171000,1.736800,2.605200,2.94,5.88,...,2.053749,3.874533,4.042565,6.230770,10.273335,0.535191,1.267596,0.095305,0.984704,1.862075
4,BRAKE_PAD_1_005,BRAKE_PAD_TYPE_1,TX_3,TX,2011-02-02,1.953900,1.563120,2.344680,2.94,0.00,...,2.053749,3.685532,3.942733,5.334147,9.276880,0.594657,1.297328,0.092952,0.982574,2.017879
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
439867,LED_PANEL_2_149,LED_PANEL_TYPE_2,WI_2,WI,2016-04-20,0.240835,0.192668,0.289002,0.97,0.00,...,1.281552,1.661163,1.651742,1.724379,3.376121,2.000000,2.000000,0.226193,0.868826,6.858397
439868,LED_PANEL_2_149,LED_PANEL_TYPE_2,WI_2,WI,2016-04-21,0.216752,0.173401,0.260102,0.97,0.00,...,1.281552,1.345542,1.486568,1.551941,3.038509,2.000000,2.000000,0.203574,0.868826,6.858397
439869,LED_PANEL_2_149,LED_PANEL_TYPE_2,WI_2,WI,2016-04-22,0.195076,0.156061,0.234092,0.97,0.00,...,1.281552,1.089889,1.337911,1.396747,2.734658,2.000000,2.000000,0.183216,0.868826,6.858397
439870,LED_PANEL_2_149,LED_PANEL_TYPE_2,WI_2,WI,2016-04-23,0.175569,0.140455,0.210682,0.97,0.00,...,1.281552,1.014738,1.290961,1.444931,2.735892,2.000000,2.000000,0.176787,0.877650,7.353024


##  Geocoding Distributor Locations

This script fetches geographic coordinates for internal location IDs using OpenStreetMap's Nominatim geocoder via `geopy`.

###  Purpose
Enrich internal facility codes (e.g., `TX_1`) with city names and their latitude/longitude for:
- Network design
- Route optimization
- Simulation models (e.g., AnyLogic)

###  Process
1. Map internal IDs to city names (`location_map`)
2. Geocode each city using `geopy.Nominatim`
3. Pause 1 second between requests to respect Nominatim's rate limit, and store each result
4. Output as `coordinates_df` with:
   - `location_id`
   - `Assumed City`
   - `Latitude`, `Longitude`
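
The process above can also be sketched with the pacing and failure handling kept separate from the geocoder itself. This is a minimal sketch under stated assumptions, not the notebook's implementation: `geocode_all`, its parameters, and the offline stub `fake_geocode` are hypothetical names introduced for illustration. In real use, `geocode_fn` would wrap `Nominatim(...).geocode` so that it returns a `(latitude, longitude)` tuple. Failed lookups are recorded as `None` rather than `0.0`, because (0, 0) is a valid coordinate.

```python
import time

def geocode_all(location_map, geocode_fn, delay_seconds=1.0, max_retries=2):
    """Geocode every city in location_map, pausing between requests.

    geocode_fn is any callable taking a city string and returning a
    (latitude, longitude) tuple, or None when the city is not found.
    """
    rows = []
    for loc_id, city in location_map.items():
        coords = None
        for attempt in range(max_retries + 1):
            try:
                coords = geocode_fn(city)
                break  # call succeeded (coords may still be None: city not found)
            except Exception as exc:
                print(f"Attempt {attempt + 1} failed for {loc_id}: {exc}")
            finally:
                time.sleep(delay_seconds)  # pause after every request, even failed ones
        lat, lon = coords if coords else (None, None)
        rows.append({"location_id": loc_id, "Assumed City": city,
                     "Latitude": lat, "Longitude": lon})
    return rows


# Hypothetical offline stub standing in for a real geocoder.
def fake_geocode(city):
    known = {"Dallas, TX": (32.7763, -96.7969)}
    return known.get(city)

rows = geocode_all({"TX_1": "Dallas, TX", "XX_9": "Nowhere, ZZ"},
                   fake_geocode, delay_seconds=0.0)
print(rows)
```

Passing the geocoder in as a callable keeps the loop testable without network access, and the `finally` clause makes sure the pause also happens after a failed request.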




In [3]:
import pandas as pd
from geopy.geocoders import Nominatim
import time

# --- Create the mapping from your internal ID to a real city ---
location_map = {
    'TX_1': 'Dallas, TX',
    'TX_2': 'Houston, TX',
    'TX_3': 'Austin, TX',
    'CA_1': 'Los Angeles, CA',
    'CA_2': 'San Diego, CA',
    'CA_3': 'San Jose, CA',
    'CA_4': 'Sacramento, CA',
    'WI_1': 'Milwaukee, WI',
    'WI_2': 'Madison, WI',
    'WI_3': 'Green Bay, WI'
}


location_ids = ['TX_1', 'CA_4', 'WI_3', 'WI_2', 'CA_3', 'TX_2', 'TX_3', 'WI_1', 'CA_1', 'CA_2']


geolocator = Nominatim(user_agent="anylogistix_geocoder_app")

results = []
print("Fetching coordinates for each location...")

for loc_id in location_ids:
    city_name = location_map.get(loc_id)
    if city_name:
        try:
            # Get the location data
            location_data = geolocator.geocode(city_name)
            if location_data:
                results.append({
                    'location_id': loc_id,
                    'Assumed City': city_name,
                    'Latitude': round(location_data.latitude, 4),
                    'Longitude': round(location_data.longitude, 4)
                })
                print(f"  Successfully found coordinates for {loc_id} ({city_name})")
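            else:
                print(f"  Could not find coordinates for {loc_id} ({city_name})")

            # Pause between requests to respect Nominatim's rate limit
            time.sleep(1)

        except Exception as e:
            print(f"An error occurred while processing {loc_id}: {e}")
            # Record the failure with missing coordinates; (0.0, 0.0) is a real point
            results.append({
                'location_id': loc_id,
                'Assumed City': city_name,
                'Latitude': None,
                'Longitude': None
            })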

# Convert the results to a pandas DataFrame for easy viewing
coordinates_df = pd.DataFrame(results)
coordinates_df

Fetching coordinates for each location...
  Successfully found coordinates for TX_1 (Dallas, TX)
  Successfully found coordinates for CA_4 (Sacramento, CA)
  Successfully found coordinates for WI_3 (Green Bay, WI)
  Successfully found coordinates for WI_2 (Madison, WI)
  Successfully found coordinates for CA_3 (San Jose, CA)
  Successfully found coordinates for TX_2 (Houston, TX)
  Successfully found coordinates for TX_3 (Austin, TX)
  Successfully found coordinates for WI_1 (Milwaukee, WI)
  Successfully found coordinates for CA_1 (Los Angeles, CA)
  Successfully found coordinates for CA_2 (San Diego, CA)


Unnamed: 0,location_id,Assumed City,Latitude,Longitude
0,TX_1,"Dallas, TX",32.7763,-96.7969
1,CA_4,"Sacramento, CA",38.5811,-121.4939
2,WI_3,"Green Bay, WI",44.5126,-88.0126
3,WI_2,"Madison, WI",43.0748,-89.3838
4,CA_3,"San Jose, CA",37.3362,-121.8906
5,TX_2,"Houston, TX",29.7589,-95.3677
6,TX_3,"Austin, TX",30.2711,-97.7437
7,WI_1,"Milwaukee, WI",43.0386,-87.9091
8,CA_1,"Los Angeles, CA",34.0537,-118.2428
9,CA_2,"San Diego, CA",32.7174,-117.1628


In [4]:
forecast_df.rename(columns={
    'location_id': 'Location'
}, inplace=True)

### Define location-to-coordinates mapping

In [5]:
# Define location-to-coordinates mapping
location_coordinates = pd.DataFrame({
    'Location': ['TX_1', 'CA_4', 'WI_3', 'WI_2', 'CA_3', 'TX_2', 'TX_3', 'WI_1', 'CA_1', 'CA_2'],
    'City': ['Dallas, TX', 'Sacramento, CA', 'Green Bay, WI', 'Madison, WI', 'San Jose, CA',
             'Houston, TX', 'Austin, TX', 'Milwaukee, WI', 'Los Angeles, CA', 'San Diego, CA'],
    'Latitude': [32.7763, 38.5811, 44.5126, 43.0748, 37.3362, 29.7589, 30.2711, 43.0386, 34.0537, 32.7174],
    'Longitude': [-96.7969, -121.4939, -88.0126, -89.3838, -121.8906, -95.3677,
                  -97.7437, -87.9091, -118.2428, -117.1628]
})
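
The table above repeats values that `coordinates_df` already holds. As an alternative to retyping them, the same frame could be derived by renaming columns. This is a minimal sketch, assuming `coordinates_df` has the four columns shown in the geocoding output; a two-row stand-in is built here so the block runs on its own.

```python
import pandas as pd

# Stand-in for the geocoding result; in the notebook this frame already exists.
coordinates_df = pd.DataFrame({
    "location_id": ["TX_1", "CA_4"],
    "Assumed City": ["Dallas, TX", "Sacramento, CA"],
    "Latitude": [32.7763, 38.5811],
    "Longitude": [-96.7969, -121.4939],
})

# Rename to the column names used by the merge on `Location`.
location_coordinates = coordinates_df.rename(
    columns={"location_id": "Location", "Assumed City": "City"}
)

# Guard against duplicate keys, which would multiply rows in a later merge.
assert location_coordinates["Location"].is_unique
print(location_coordinates)
```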


### Merge `forecast_df` with `location_coordinates` for further analyses

In [6]:
forecast_df=forecast_df.merge(location_coordinates,on="Location",how="inner")
forecast_df.to_csv("FORECAST_BI.csv",index=False)
forecast_df

Unnamed: 0,part_id,part_type,Location,region,forecast_date,forecasted_demand,forecast_lower_bound,forecast_upper_bound,unit_cost,revenue,...,expected_demand_during_lt,reorder_point_units,cv_demand,intermittent_adjustment,expected_shortage_per_cycle,fill_rate,days_of_supply,City,Latitude,Longitude
0,BRAKE_PAD_1_005,BRAKE_PAD_TYPE_1,TX_3,TX,2011-01-29,2.000000,1.600000,2.400000,2.94,5.88,...,7.100000,11.596041,0.580950,1.290475,0.105996,0.985071,2.248021,"Austin, TX",30.2711,-97.7437
1,BRAKE_PAD_1_005,BRAKE_PAD_TYPE_1,TX_3,TX,2011-01-30,2.100000,1.680000,2.520000,2.94,8.82,...,7.434000,11.923704,0.553286,1.276643,0.105847,0.985762,2.137954,"Austin, TX",30.2711,-97.7437
2,BRAKE_PAD_1_005,BRAKE_PAD_TYPE_1,TX_3,TX,2011-01-31,2.190000,1.752000,2.628000,2.94,8.82,...,7.292700,11.647199,0.530548,1.265274,0.102659,0.985923,1.988356,"Austin, TX",30.2711,-97.7437
3,BRAKE_PAD_1_005,BRAKE_PAD_TYPE_1,TX_3,TX,2011-02-01,2.171000,1.736800,2.605200,2.94,5.88,...,6.230770,10.273335,0.535191,1.267596,0.095305,0.984704,1.862075,"Austin, TX",30.2711,-97.7437
4,BRAKE_PAD_1_005,BRAKE_PAD_TYPE_1,TX_3,TX,2011-02-02,1.953900,1.563120,2.344680,2.94,0.00,...,5.334147,9.276880,0.594657,1.297328,0.092952,0.982574,2.017879,"Austin, TX",30.2711,-97.7437
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
439867,LED_PANEL_2_149,LED_PANEL_TYPE_2,WI_2,WI,2016-04-20,0.240835,0.192668,0.289002,0.97,0.00,...,1.724379,3.376121,2.000000,2.000000,0.226193,0.868826,6.858397,"Madison, WI",43.0748,-89.3838
439868,LED_PANEL_2_149,LED_PANEL_TYPE_2,WI_2,WI,2016-04-21,0.216752,0.173401,0.260102,0.97,0.00,...,1.551941,3.038509,2.000000,2.000000,0.203574,0.868826,6.858397,"Madison, WI",43.0748,-89.3838
439869,LED_PANEL_2_149,LED_PANEL_TYPE_2,WI_2,WI,2016-04-22,0.195076,0.156061,0.234092,0.97,0.00,...,1.396747,2.734658,2.000000,2.000000,0.183216,0.868826,6.858397,"Madison, WI",43.0748,-89.3838
439870,LED_PANEL_2_149,LED_PANEL_TYPE_2,WI_2,WI,2016-04-23,0.175569,0.140455,0.210682,0.97,0.00,...,1.444931,2.735892,2.000000,2.000000,0.176787,0.877650,7.353024,"Madison, WI",43.0748,-89.3838
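
Because the merge uses `how="inner"`, any forecast row whose `Location` has no match in `location_coordinates` is dropped without warning. The sketch below shows one possible way to check for this on toy data, using the `validate` and `indicator` options of `DataFrame.merge`. The frames `forecast` and `coords` are small stand-ins, not the notebook's data.

```python
import pandas as pd

forecast = pd.DataFrame({
    "part_id": ["P1", "P2", "P3"],
    "Location": ["TX_3", "WI_2", "NY_1"],   # NY_1 has no coordinates
})
coords = pd.DataFrame({
    "Location": ["TX_3", "WI_2"],
    "City": ["Austin, TX", "Madison, WI"],
})

# A left merge keeps every forecast row; `indicator` marks unmatched ones and
# `validate` raises if `coords` contains duplicate Location keys.
checked = forecast.merge(coords, on="Location", how="left",
                         validate="many_to_one", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Location"].unique()
print("Locations without coordinates:", list(unmatched))

# Keep only matched rows, which is what the inner merge would return.
merged = checked[checked["_merge"] == "both"].drop(columns="_merge")
```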


In [7]:
forecast_df.columns

Index(['part_id', 'part_type', 'Location', 'region', 'forecast_date',
       'forecasted_demand', 'forecast_lower_bound', 'forecast_upper_bound',
       'unit_cost', 'revenue', 'method_used', 'demand_pattern',
       'replenishment_strategy', 'ABC_Class', 'XYZ_Class', 'volume_class',
       'volatility_class', 'mae', 'rmse', 'mase', 'rmsse', 'bias',
       'avg_cost_impact', 'daily_demand_units', 'base_lead_time',
       'adjusted_lead_time', 'lead_time_mean', 'lead_time_std_dev',
       'lead_time_min', 'lead_time_max', 'lead_time_p90', 'lead_time_p95',
       'forecast_date_str', 'part_prefix', 'customer_id', 'service_level',
       'z_value', 'demand_variance_during_lt', 'safety_stock_units',
       'expected_demand_during_lt', 'reorder_point_units', 'cv_demand',
       'intermittent_adjustment', 'expected_shortage_per_cycle', 'fill_rate',
       'days_of_supply', 'City', 'Latitude', 'Longitude'],
      dtype='object')

##  Forecast Feature Preprocessing Pipeline

This script prepares a clean and standardized feature matrix for machine learning or clustering using scikit-learn.

###  Purpose
To preprocess a supply chain forecast dataset by:
- Scaling numerical features
- Encoding categorical features

###  Features Used

**Numerical**
- `forecasted_demand`, `unit_cost`, `revenue`
- `lead_time_mean`, `lead_time_std_dev`, `cv_demand`
- `safety_stock_units`, `reorder_point_units`

**Categorical**
- `demand_pattern`, `ABC_Class`, `XYZ_Class`, `volatility_class`

###  Pipeline Steps
1. **Standardize** numerical values using `StandardScaler`
2. **Encode** categorical values with `OneHotEncoder` (`handle_unknown='ignore'` encodes categories not seen during fitting as all zeros)
3. **Transform** the combined features using `ColumnTransformer`
4. **Output**: `X_processed` – ready for ML models like KMeans, PCA, or classifiers
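
To illustrate what the steps above produce, here is a small sketch on made-up data. It assumes scikit-learn is installed. The toy frames `train` and `new`, and the choice of `sparse_threshold=0` (to force a dense array for printing), are assumptions of this example and not part of the notebook's pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

train = pd.DataFrame({
    "forecasted_demand": [1.0, 2.0, 3.0],
    "ABC_Class": ["A", "B", "A"],
})
new = pd.DataFrame({
    "forecasted_demand": [2.0],
    "ABC_Class": ["C"],          # category not seen during fit
})

preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["forecasted_demand"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ABC_Class"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X_train = preprocessor.fit_transform(train)
X_new = preprocessor.transform(new)

print(X_train.shape)  # one scaled column plus one column per seen category
print(X_new)          # the unseen category "C" is encoded as all zeros
```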


In [8]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

# Copy the base forecast DataFrame
df = forecast_df.copy()

# Define numerical and categorical features
numerical_features = [
    'forecasted_demand', 'unit_cost', 'revenue', 
    'lead_time_mean', 'lead_time_std_dev', 'cv_demand', 
     'safety_stock_units', 'reorder_point_units']

categorical_features = [
    'demand_pattern', 'ABC_Class', 'XYZ_Class', 'volatility_class'
]

# Define column transformer for preprocessing
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ]
)

# Prepare the feature matrix
X = df[numerical_features + categorical_features]
X_processed = preprocessor.fit_transform(X)


##  Demand Archetype Clustering with MiniBatchKMeans

This script segments the roughly 439,000 forecast rows into a manageable number of clusters for downstream analysis or simulation.

###  Objective
Group the forecast rows into **2000 demand archetypes** using `MiniBatchKMeans`, which scales to a dataset of this size.

### Key Steps
1. Set `n_clusters = 2000` to define the desired number of segments
2. Use `MiniBatchKMeans` for memory-efficient clustering
   - `batch_size=2048` sets how many rows each mini-batch update uses
   - `n_init='auto'` lets scikit-learn choose the number of initializations based on the `init` method
3. Assign resulting `cluster_id` to each row in `df`
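
A minimal, self-contained sketch of the same call pattern on synthetic data, assuming scikit-learn and NumPy are available. The three well-separated blobs and the small parameter values are chosen only so the example runs quickly; they are not the notebook's settings.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Three tight, well-separated groups of 200 points each in two dimensions.
centers = np.array([[0.0, 0.0], [100.0, 100.0], [200.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 2)) for c in centers])

kmeans = MiniBatchKMeans(
    n_clusters=3,
    batch_size=64,     # rows drawn per mini-batch update
    n_init=3,          # number of candidate initializations to try
    random_state=42,
)
labels = kmeans.fit_predict(X)

# Each blob should receive exactly one label.
for start in range(0, 600, 200):
    print(np.unique(labels[start:start + 200]))
```

On real data the clusters are rarely this clean, so it can be worth inspecting `kmeans.inertia_` or the cluster size distribution before relying on a particular `n_clusters`.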




In [9]:
from sklearn.cluster import MiniBatchKMeans

# Set your target number of clusters
n_clusters = 2000

# Initialize and run the clustering algorithm
kmeans = MiniBatchKMeans(
    n_clusters=n_clusters,
    random_state=42,
    batch_size=2048,  
    n_init='auto'
)

# Assign a cluster label to each of the 439k rows
df['cluster_id'] = kmeans.fit_predict(X_processed)

print(f"Successfully clustered 439k rows into {n_clusters} demand archetypes.")



Successfully clustered 439k rows into 2000 demand archetypes.


## Clustered Demand Profile Aggregation for AnyLogistix

This script generates a **pre-simulation dataset** by aggregating clustered SKU-level data into archetype-level demand profiles. The result is processed further to build the final AnyLogistix tables.

### 🔹 Purpose
To convert 439k+ SKU-level rows into **2000 representative demand archetypes**, suitable for scalable supply chain simulation in **AnyLogistix**.

---

### 🔹 Aggregation Logic

- **Categorical Columns**: Mode (`x.mode()[0]`)
- **Key Identifiers**: First or mode
- **Demand, Revenue**: Summed
- **Unit Cost and Other Numerical Fields**: Mean
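
The rules above can be tried on a few made-up rows. This sketch mirrors the aggregation dictionary in the code cell, where `unit_cost` is averaged. The helper name `mode_or_na` and the toy frame `rows` are introduced here for illustration only.

```python
import pandas as pd

def mode_or_na(series):
    """Return the most frequent value, or 'N/A' for an all-missing group."""
    modes = series.mode()
    return modes.iloc[0] if not modes.empty else "N/A"

rows = pd.DataFrame({
    "cluster_id": [0, 0, 0, 1, 1],
    "part_type": ["A", "A", "B", "C", "C"],
    "forecasted_demand": [1.0, 2.0, 3.0, 4.0, 5.0],
    "unit_cost": [2.0, 4.0, 6.0, 1.0, 3.0],
})

profile = rows.groupby("cluster_id").agg(
    Product=("part_type", mode_or_na),      # dominant category
    Demand=("forecasted_demand", "sum"),    # total demand
    unit_cost=("unit_cost", "mean"),        # average cost
).reset_index()
print(profile)
```

When two values tie for most frequent, `Series.mode()` returns them in sorted order, so this helper picks the first in sort order.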

---

### 🔹 Output: `simulation_demand_df`

| Field            | Description                                |
|------------------|--------------------------------------------|
| `Customer_ID`    | Cluster ID (e.g., `Demand_Archetype_12`)   |
| `Product`        | Dominant part type in cluster              |
| `Location`       | Most frequent location in cluster          |
| `Demand`         | Total forecasted demand for the cluster    |
| `Revenue`        | Total revenue for the cluster              |
| `ReorderPoint`   | Average reorder point for the cluster      |
| ...              | Additional aggregated features             |

---




In [10]:
pd.set_option('display.max_columns',None)

In [11]:
import pandas as pd

# Mode function for categorical variables ('N/A' if a group has no non-null values)
def mode_agg(x):
    modes = x.mode()
    return modes.iloc[0] if not modes.empty else 'N/A'

# Define column categories
categorical_cols = [
    'part_type', 'region', 'method_used', 'demand_pattern',
    'replenishment_strategy', 'ABC_Class', 'XYZ_Class',
    'volume_class', 'volatility_class', 'part_prefix'
]
identifier_cols = ['part_id', 'Location', 'customer_id']  # 'location_id' was renamed to 'Location' earlier
date_cols = ['forecast_date_str']
excluded_cols = categorical_cols + identifier_cols + date_cols + ['cluster_id']

# Detect valid numerical columns automatically
numerical_cols = [
    col for col in df.columns
    if col not in excluded_cols and pd.api.types.is_numeric_dtype(df[col])
]

# Build aggregation dictionary
agg_dict = {
    'cluster_id': 'first',
    'part_id': mode_agg,
    'part_type': mode_agg,
    'City': mode_agg,
    'customer_id': mode_agg,
    'forecast_date': mode_agg
}

# Mode aggregation for categorical columns
agg_dict.update({col: mode_agg for col in categorical_cols})

# Sum aggregation for demand and revenue; mean for unit_cost
agg_dict.update({
    'forecasted_demand': 'sum',
    'revenue': 'sum',
    'unit_cost': 'mean'
})
agg_dict.update({
    'Latitude': 'first',
    'Longitude': 'first'
})
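# Mean aggregation for the rest of the numerical columns
# (Latitude/Longitude are skipped so the 'first' rule above is not overwritten)
skip_mean = ['forecasted_demand', 'revenue', 'unit_cost', 'Latitude', 'Longitude']
agg_dict.update({
    col: 'mean' for col in numerical_cols if col not in skip_mean
})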

# Final grouped summary
simulation_demand_df = df.groupby('cluster_id').agg(agg_dict).reset_index(drop=True)

# Rename columns for AnyLogistix compatibility
simulation_demand_df.rename(columns={
    'cluster_id': 'Customer_ID',
    'part_type': 'Product',
    'City': 'Location',
    'forecasted_demand': 'Demand',
    'revenue': 'Revenue',
    'reorder_point_units': 'ReorderPoint'
}, inplace=True)

# Format Customer_ID as simulation-friendly string
simulation_demand_df['Customer_ID'] = simulation_demand_df['Customer_ID'].apply(
    lambda x: f"Demand_Archetype_{x}"
)

# Reorder key columns
priority_cols = ['Customer_ID', 'Product', 'Location', 'Demand', 'Revenue', 'ReorderPoint']
remaining_cols = [col for col in simulation_demand_df.columns if col not in priority_cols]
simulation_demand_df = simulation_demand_df[priority_cols + remaining_cols]
# Replace the aggregated coordinates with those of each archetype's modal city
simulation_demand_df = simulation_demand_df.drop(columns=['Latitude', 'Longitude'])
simulation_demand_df = simulation_demand_df.merge(
    location_coordinates[['City', 'Latitude', 'Longitude']],
    left_on='Location', right_on='City', how='inner'
).drop(columns='City')

print("\nFinal simulation demand profile ready for AnyLogistix:")
simulation_demand_df


Final simulation demand profile ready for AnyLogistix:


Unnamed: 0,Customer_ID,Product,Location,Demand,Revenue,ReorderPoint,part_id,customer_id,forecast_date,region,method_used,demand_pattern,replenishment_strategy,ABC_Class,XYZ_Class,volume_class,volatility_class,part_prefix,unit_cost,forecast_lower_bound,forecast_upper_bound,mae,rmse,mase,rmsse,bias,avg_cost_impact,daily_demand_units,base_lead_time,adjusted_lead_time,lead_time_mean,lead_time_std_dev,lead_time_min,lead_time_max,lead_time_p90,lead_time_p95,service_level,z_value,demand_variance_during_lt,safety_stock_units,expected_demand_during_lt,cv_demand,intermittent_adjustment,expected_shortage_per_cycle,fill_rate,days_of_supply,Latitude,Longitude
0,Demand_Archetype_0,BRAKE_PAD_TYPE_3,"Austin, TX",41.483947,282.87,3.334600,FAN_MOTOR_2_469,CUST_CACA1BP3_LUM_CZ_20141004_BRA_134985,2014-05-01,TX,TSB,Lumpy,Project-Based,C,Z,Low Volume,Variable,BRA,2.490189,0.313086,0.469630,0.440291,0.613270,0.847998,0.668387,0.010502,2.771226,1.113208,3.680283,4.346792,4.346792,1.086226,2.606226,8.695283,5.651698,6.520943,0.90,1.281552,1.711563,1.636789,1.697811,1.570207,1.785103,0.224145,0.867762,4.195035,30.2711,-97.7437
1,Demand_Archetype_1,FAN_MOTOR_TYPE_2,"San Jose, CA",65.937492,9.71,2.944722,LED_PANEL_1_165,CUST_CACA3FM2_LUM_BZ_20110901_FAN_296966,2015-05-14,CA,TSB,Lumpy,Project-Based,B,Z,Medium Volume,Variable,FAN,4.968015,0.201336,0.302004,0.453345,0.665155,0.896551,0.698128,0.006257,5.823168,0.007634,3.948206,4.660076,4.660076,1.165534,2.796794,9.321985,6.059313,6.991985,0.95,1.644854,1.251074,1.775091,1.169630,1.991433,1.995716,0.111302,0.904829,7.070622,37.3362,-121.8906
2,Demand_Archetype_2,BRAKE_PAD_TYPE_1,"Milwaukee, WI",512.751287,552.34,12.085945,BRAKE_PAD_1_124,CUST_CACA2BP3_LUM_AZ_20111225_BRA_139061,2015-07-01,WI,TSB,Lumpy,Project-Based,A,Z,High Volume,Stable,BRA,2.171393,1.142621,1.713932,0.938224,1.400893,0.758574,0.703238,0.006921,5.289749,0.724234,3.684624,4.298106,4.298106,1.075042,2.579248,8.599638,5.590028,6.449387,0.98,2.053749,8.539505,5.955953,6.129992,0.986082,1.493041,0.140415,0.976935,4.195734,43.0386,-87.9091
3,Demand_Archetype_3,FAN_MOTOR_TYPE_1,"Madison, WI",58.484399,0.00,4.725520,FAN_MOTOR_1_255,CUST_CACA1FM2_LUM_AZ_20150531_FAN_326118,2012-08-06,WI,TSB,Lumpy,Project-Based,A,Z,Medium Volume,Variable,FAN,9.296852,0.288812,0.433218,0.581135,0.794271,0.795301,0.662914,0.003084,14.051358,0.000000,4.070062,4.595123,4.595123,1.148086,2.756914,9.190309,5.972284,6.890494,0.98,2.053749,2.304286,3.069792,1.655729,1.943990,1.971995,0.072372,0.956064,8.558542,43.0748,-89.3838
4,Demand_Archetype_4,BRAKE_PAD_TYPE_3,"San Jose, CA",150.441013,338.18,4.588679,BRAKE_PAD_3_023,CUST_CACA1BP2_LUM_AZ_20120630_BRA_47944,2014-01-19,CA,TSB,Lumpy,Project-Based,A,Z,High Volume,Highly Volatile,BRA,3.555967,0.327937,0.491905,0.885930,1.286198,0.774293,0.696361,0.003747,7.934387,0.261580,2.787275,3.563869,3.563869,0.891417,2.138583,7.127766,4.632834,5.346240,0.98,2.053749,2.458597,3.132430,1.456249,1.980937,1.990469,0.073849,0.949104,7.676143,37.3362,-121.8906
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,Demand_Archetype_1995,FAN_MOTOR_TYPE_1,"Dallas, TX",120.579039,0.00,4.500376,FAN_MOTOR_1_224,CUST_CACA1BP2_LUM_AZ_20130121_BRA_56475,2012-06-24,TX,TSB,Lumpy,Project-Based,A,Z,Medium Volume,Variable,FAN,8.042687,0.424948,0.637422,0.463404,0.685677,0.825067,0.677904,0.002704,9.632819,0.000000,2.933348,3.513612,3.513612,0.877974,2.108590,7.027621,4.567357,5.270220,0.98,2.053749,1.762686,2.635377,1.864999,1.295487,1.647744,0.062130,0.966448,4.980996,32.7763,-96.7969
1996,Demand_Archetype_1996,FAN_MOTOR_TYPE_1,"San Jose, CA",43.054977,0.00,7.151624,FAN_MOTOR_1_467,CUST_CACA3FM1_LUM_AY_20111208_FAN_245746,2014-06-04,CA,TSB,Lumpy,Project-Based,A,Z,Medium Volume,Variable,FAN,12.224032,0.555548,0.833322,0.810952,1.080218,0.740584,0.674406,0.015003,24.569355,0.000000,3.434677,3.975000,3.975000,0.993387,2.385968,7.951935,5.167419,5.964355,0.98,2.053749,4.681726,4.408481,2.743143,1.572553,1.786276,0.103932,0.961752,6.437797,37.3362,-121.8906
1997,Demand_Archetype_1997,LED_PANEL_TYPE_1,"Green Bay, WI",2.492497,0.00,0.124932,FAN_MOTOR_2_074,CUST_CACA2LP1_LUM_BZ_20120917_LED_357871,2011-02-07,CA,TSB,Lumpy,Project-Based,B,Z,High Volume,Highly Stable,LED,0.757009,0.008902,0.013353,2.062656,3.139813,0.739194,0.673105,0.015516,3.246250,0.000000,3.829196,4.068482,4.068482,1.017321,2.441562,8.138973,5.290670,6.104464,0.95,1.644854,0.013472,0.077380,0.047552,2.000000,2.000000,0.004852,0.897678,6.634256,44.5126,-88.0126
1998,Demand_Archetype_1998,LED_PANEL_TYPE_2,"Houston, TX",0.887914,0.00,0.036065,LED_PANEL_2_114,CUST_CACA1BP3_LUM_CZ_20130915_BRA_134601,2013-07-22,TX,TSB,Lumpy,Project-Based,C,Z,Low Volume,Highly Stable,LED,2.252717,0.002574,0.003860,0.384029,0.680760,0.834530,0.674725,0.009952,2.147971,0.000000,3.979565,4.258659,4.258659,1.065362,2.555906,8.518152,5.537065,6.388768,0.90,1.281552,0.000900,0.019985,0.016080,2.000000,2.000000,0.002737,0.829860,5.288786,29.7589,-95.3677


##  Super-Customer Grouping for Simulation Scalability

This script clusters demand archetypes into **100 super-customers** for higher-level aggregation, aiding simulation and strategic analysis.

---

### 🔹 Objective
Simplify 2000 demand clusters into 100 super-groups (`SuperCustomer_0` to `SuperCustomer_99`) using **KMeans** based on demand, revenue, and unit cost.

---

### 🔹 Process Overview

1. **Feature Selection**:  
   `['Demand', 'Revenue', 'unit_cost']` → used for clustering.

2. **KMeans Clustering**:  
   Segments demand archetypes into `n_clusters = 100`.

3. **Weighted Cost Calculation**:  
   - `demand_weighted_cost = Demand × unit_cost`

4. **Aggregation Logic**:  
   - `Demand`, `Revenue`: sum  
   - `unit_cost`: both simple mean and weighted average  
   - `region`, `Location`, `volatility_class`, `ABC_Class`: mode  
   - `Customer_ID`: count of merged clusters

5. **Final Labeling**:  
   Adds `super_customer_name` for AnyLogistix-friendly identifiers.
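
The weighted average in steps 3 and 4 is the sum of `Demand × unit_cost` divided by total demand. A small pure-Python sketch with made-up numbers shows how it differs from the simple mean. The function name `weighted_unit_cost` and the zero-demand guard are assumptions of this example.

```python
def weighted_unit_cost(demands, unit_costs):
    """Demand-weighted average of unit costs: sum(d * c) / sum(d)."""
    total_demand = sum(demands)
    if total_demand == 0:
        return float("nan")  # undefined when the group has no demand
    weighted_sum = sum(d * c for d, c in zip(demands, unit_costs))
    return weighted_sum / total_demand

# Two hypothetical archetypes merged into one super-customer.
demands = [90.0, 10.0]
unit_costs = [2.0, 12.0]

simple_mean = sum(unit_costs) / len(unit_costs)
weighted = weighted_unit_cost(demands, unit_costs)
print(simple_mean, weighted)
```

The simple mean treats both archetypes equally, while the weighted value is pulled toward the cost of the archetype that carries most of the demand.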

---

### 🔹 Output: `df_super_customers`

| Column                          | Description                                 |
|---------------------------------|---------------------------------------------|
| `super_customer_id`             | Numeric ID from KMeans                      |
| `super_customer_name`           | String label like `SuperCustomer_12`        |
| `total_demand`                  | Total demand from all merged clusters       |
| `Revenue`                       | Total revenue                               |
| `unit_cost`                     | Simple (unweighted) mean of unit cost       |
| `weighted_unit_cost`            | Demand-weighted average cost                |
| `num_clusters_merged`           | Number of demand clusters combined          |
| `region`, `Location`            | Most frequent region and city               |
| `volatility_class`, `ABC_Class` | Most frequent class labels                  |

---

###  Use Case
Use `df_super_customers` as the **aggregate demand nodes** when building the final customer table.

In [12]:
import pandas as pd
from sklearn.cluster import KMeans

df_clusters = simulation_demand_df.copy()
# Step 1: Select features for grouping into super-customers
features = ['Demand', 'Revenue', 'unit_cost']
X = df_clusters[features]

# Step 2: KMeans clustering to get 100 super-customers
kmeans = KMeans(n_clusters=100, random_state=42)
df_clusters['super_customer_id'] = kmeans.fit_predict(X)

# Step 3: Compute weighted cost for later weighted average
df_clusters['demand_weighted_cost'] = df_clusters['Demand'] * df_clusters['unit_cost']

# Step 4: Aggregation logic to form 100 super-customers
agg_dict = {
    'Demand': 'sum',  # Total demand
    'Revenue': 'sum',
    'demand_weighted_cost': 'sum',
    'unit_cost': 'mean',  # Unweighted average (for comparison)
    'Customer_ID': 'count',
    'region': lambda x: x.mode().iloc[0] if not x.mode().empty else 'Mixed',
    'Location': lambda x: x.mode().iloc[0] if not x.mode().empty else 'Mixed',
    'volatility_class': lambda x: x.mode().iloc[0] if not x.mode().empty else 'Mixed',
    'ABC_Class': lambda x: x.mode().iloc[0] if not x.mode().empty else 'Mixed'
}

df_super_customers = df_clusters.groupby('super_customer_id').agg(agg_dict).reset_index()
df_super_customers = df_super_customers.rename(columns={
    'Demand': 'total_demand',
    'Customer_ID': 'num_clusters_merged'
})

# Step 5: Compute weighted unit cost
df_super_customers['weighted_unit_cost'] = (
    df_super_customers['demand_weighted_cost'] / df_super_customers['total_demand']
)

# Step 6: Add final simulation labels
df_super_customers['super_customer_name'] = 'SuperCustomer_' + df_super_customers['super_customer_id'].astype(str)

# Step 7: Final cleanup (drop demand_weighted_cost if not needed)
df_super_customers.drop(columns=['demand_weighted_cost'], inplace=True)


df_super_customers


Unnamed: 0,super_customer_id,total_demand,Revenue,unit_cost,num_clusters_merged,region,Location,volatility_class,ABC_Class,weighted_unit_cost,super_customer_name
0,0,5414.720012,39130.47,4.445661,10,CA,"Los Angeles, CA",Highly Stable,A,4.539026,SuperCustomer_0
1,1,4643.880044,14503.62,3.397137,43,WI,"Milwaukee, WI",Variable,B,3.124786,SuperCustomer_1
2,2,1275.701444,22361.07,11.441337,2,CA,"San Jose, CA",Highly Stable,A,11.498434,SuperCustomer_2
3,3,4716.088744,43919.33,5.354478,41,CA,"Dallas, TX",Variable,A,4.769020,SuperCustomer_3
4,4,406.152540,9012.45,10.092000,1,CA,"San Jose, CA",Highly Stable,A,10.092000,SuperCustomer_4
...,...,...,...,...,...,...,...,...,...,...,...
95,95,3548.370484,7894.82,1.818667,1,TX,"Austin, TX",Highly Stable,A,1.818667,SuperCustomer_95
96,96,3349.972253,3495.78,1.776210,1,TX,"Austin, TX",Highly Stable,A,1.776210,SuperCustomer_96
97,97,1566.912186,18356.34,7.302551,1,TX,"Houston, TX",Highly Stable,A,7.302551,SuperCustomer_97
98,98,3603.142363,6398.30,4.329977,83,WI,"Madison, WI",Highly Volatile,C,3.919358,SuperCustomer_98


### Super customer ID creation

In [13]:
simulation_demand_df['super_customer_id'] = 'SuperCustomer_' + df_clusters['super_customer_id'].astype(str)

In [14]:
simulation_demand_df

Unnamed: 0,Customer_ID,Product,Location,Demand,Revenue,ReorderPoint,part_id,customer_id,forecast_date,region,method_used,demand_pattern,replenishment_strategy,ABC_Class,XYZ_Class,volume_class,volatility_class,part_prefix,unit_cost,forecast_lower_bound,forecast_upper_bound,mae,rmse,mase,rmsse,bias,avg_cost_impact,daily_demand_units,base_lead_time,adjusted_lead_time,lead_time_mean,lead_time_std_dev,lead_time_min,lead_time_max,lead_time_p90,lead_time_p95,service_level,z_value,demand_variance_during_lt,safety_stock_units,expected_demand_during_lt,cv_demand,intermittent_adjustment,expected_shortage_per_cycle,fill_rate,days_of_supply,Latitude,Longitude,super_customer_id
0,Demand_Archetype_0,BRAKE_PAD_TYPE_3,"Austin, TX",41.483947,282.87,3.334600,FAN_MOTOR_2_469,CUST_CACA1BP3_LUM_CZ_20141004_BRA_134985,2014-05-01,TX,TSB,Lumpy,Project-Based,C,Z,Low Volume,Variable,BRA,2.490189,0.313086,0.469630,0.440291,0.613270,0.847998,0.668387,0.010502,2.771226,1.113208,3.680283,4.346792,4.346792,1.086226,2.606226,8.695283,5.651698,6.520943,0.90,1.281552,1.711563,1.636789,1.697811,1.570207,1.785103,0.224145,0.867762,4.195035,30.2711,-97.7437,SuperCustomer_1
1,Demand_Archetype_1,FAN_MOTOR_TYPE_2,"San Jose, CA",65.937492,9.71,2.944722,LED_PANEL_1_165,CUST_CACA3FM2_LUM_BZ_20110901_FAN_296966,2015-05-14,CA,TSB,Lumpy,Project-Based,B,Z,Medium Volume,Variable,FAN,4.968015,0.201336,0.302004,0.453345,0.665155,0.896551,0.698128,0.006257,5.823168,0.007634,3.948206,4.660076,4.660076,1.165534,2.796794,9.321985,6.059313,6.991985,0.95,1.644854,1.251074,1.775091,1.169630,1.991433,1.995716,0.111302,0.904829,7.070622,37.3362,-121.8906,SuperCustomer_12
2,Demand_Archetype_2,BRAKE_PAD_TYPE_1,"Milwaukee, WI",512.751287,552.34,12.085945,BRAKE_PAD_1_124,CUST_CACA2BP3_LUM_AZ_20111225_BRA_139061,2015-07-01,WI,TSB,Lumpy,Project-Based,A,Z,High Volume,Stable,BRA,2.171393,1.142621,1.713932,0.938224,1.400893,0.758574,0.703238,0.006921,5.289749,0.724234,3.684624,4.298106,4.298106,1.075042,2.579248,8.599638,5.590028,6.449387,0.98,2.053749,8.539505,5.955953,6.129992,0.986082,1.493041,0.140415,0.976935,4.195734,43.0386,-87.9091,SuperCustomer_93
3,Demand_Archetype_3,FAN_MOTOR_TYPE_1,"Madison, WI",58.484399,0.00,4.725520,FAN_MOTOR_1_255,CUST_CACA1FM2_LUM_AZ_20150531_FAN_326118,2012-08-06,WI,TSB,Lumpy,Project-Based,A,Z,Medium Volume,Variable,FAN,9.296852,0.288812,0.433218,0.581135,0.794271,0.795301,0.662914,0.003084,14.051358,0.000000,4.070062,4.595123,4.595123,1.148086,2.756914,9.190309,5.972284,6.890494,0.98,2.053749,2.304286,3.069792,1.655729,1.943990,1.971995,0.072372,0.956064,8.558542,43.0748,-89.3838,SuperCustomer_12
4,Demand_Archetype_4,BRAKE_PAD_TYPE_3,"San Jose, CA",150.441013,338.18,4.588679,BRAKE_PAD_3_023,CUST_CACA1BP2_LUM_AZ_20120630_BRA_47944,2014-01-19,CA,TSB,Lumpy,Project-Based,A,Z,High Volume,Highly Volatile,BRA,3.555967,0.327937,0.491905,0.885930,1.286198,0.774293,0.696361,0.003747,7.934387,0.261580,2.787275,3.563869,3.563869,0.891417,2.138583,7.127766,4.632834,5.346240,0.98,2.053749,2.458597,3.132430,1.456249,1.980937,1.990469,0.073849,0.949104,7.676143,37.3362,-121.8906,SuperCustomer_1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,Demand_Archetype_1995,FAN_MOTOR_TYPE_1,"Dallas, TX",120.579039,0.00,4.500376,FAN_MOTOR_1_224,CUST_CACA1BP2_LUM_AZ_20130121_BRA_56475,2012-06-24,TX,TSB,Lumpy,Project-Based,A,Z,Medium Volume,Variable,FAN,8.042687,0.424948,0.637422,0.463404,0.685677,0.825067,0.677904,0.002704,9.632819,0.000000,2.933348,3.513612,3.513612,0.877974,2.108590,7.027621,4.567357,5.270220,0.98,2.053749,1.762686,2.635377,1.864999,1.295487,1.647744,0.062130,0.966448,4.980996,32.7763,-96.7969,SuperCustomer_89
1996,Demand_Archetype_1996,FAN_MOTOR_TYPE_1,"San Jose, CA",43.054977,0.00,7.151624,FAN_MOTOR_1_467,CUST_CACA3FM1_LUM_AY_20111208_FAN_245746,2014-06-04,CA,TSB,Lumpy,Project-Based,A,Z,Medium Volume,Variable,FAN,12.224032,0.555548,0.833322,0.810952,1.080218,0.740584,0.674406,0.015003,24.569355,0.000000,3.434677,3.975000,3.975000,0.993387,2.385968,7.951935,5.167419,5.964355,0.98,2.053749,4.681726,4.408481,2.743143,1.572553,1.786276,0.103932,0.961752,6.437797,37.3362,-121.8906,SuperCustomer_12
1997,Demand_Archetype_1997,LED_PANEL_TYPE_1,"Green Bay, WI",2.492497,0.00,0.124932,FAN_MOTOR_2_074,CUST_CACA2LP1_LUM_BZ_20120917_LED_357871,2011-02-07,CA,TSB,Lumpy,Project-Based,B,Z,High Volume,Highly Stable,LED,0.757009,0.008902,0.013353,2.062656,3.139813,0.739194,0.673105,0.015516,3.246250,0.000000,3.829196,4.068482,4.068482,1.017321,2.441562,8.138973,5.290670,6.104464,0.95,1.644854,0.013472,0.077380,0.047552,2.000000,2.000000,0.004852,0.897678,6.634256,44.5126,-88.0126,SuperCustomer_73
1998,Demand_Archetype_1998,LED_PANEL_TYPE_2,"Houston, TX",0.887914,0.00,0.036065,LED_PANEL_2_114,CUST_CACA1BP3_LUM_CZ_20130915_BRA_134601,2013-07-22,TX,TSB,Lumpy,Project-Based,C,Z,Low Volume,Highly Stable,LED,2.252717,0.002574,0.003860,0.384029,0.680760,0.834530,0.674725,0.009952,2.147971,0.000000,3.979565,4.258659,4.258659,1.065362,2.555906,8.518152,5.537065,6.388768,0.90,1.281552,0.000900,0.019985,0.016080,2.000000,2.000000,0.002737,0.829860,5.288786,29.7589,-95.3677,SuperCustomer_73


### Part Type Classification Logic

In [15]:
import pandas as pd


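def map_part_type(code):
    """Map a product code such as 'BRAKE_PAD_TYPE_3' to its part family."""
    if "BRAKE_PAD" in code:
        return "BRAKE PAD"
    elif "LED_PANEL" in code:
        return "LED PANEL"
    elif "FAN_MOTOR" in code:
        return "FAN MOTOR"
    else:
        # Unknown codes pass through unchanged
        return code
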
simulation_demand_df['part_type'] = simulation_demand_df['Product'].apply(map_part_type)




##  AnyLogistix Customer Table Generator

This function creates a simulation-ready **Customers table** for AnyLogistix based on super-customer groupings.

---

### 🔹 Purpose
To generate a formatted customer entity file containing:
- Unique customer IDs
- Descriptive names
- City-based locations (mode)
- Type, inclusion flag, and icon

---

### 🔹 Key Steps

1. **City Aggregation**:  
   Get the most frequent (`mode`) `Location` per `super_customer_id`, falling back to `'Unknown'` when no value exists.

2. **Customer Entity Info**:  
   - `ID`: Named as `"Customer 1"` to `"Customer 100"`  
   - `Name`: Super customer ID  
   - `Type`: Set to `'Customer'`  
   - `Location`: Dominant city name  
   - `Inclusion Type`: `'Include'`  
   - `Icon`: Set to `116` (AnyLogistix visual)

3. **Final Output Columns**:  
   `['ID', 'Name', 'Type', 'Location', 'Inclusion Type', 'Icon']`

---

### 🔹 Output: `anylogistix_customers_table.csv`

A CSV file ready for import into **AnyLogistix → Customers Table**, enabling:
- Scenario visualization by city
- Demand mapping by super-customer nodes
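
As a rough illustration of the city-aggregation step, the sketch below applies the same mode-per-group idea to a small made-up frame. The toy data and the helper name `dominant_location` are hypothetical and not part of the notebook. `Series.mode()` returns its values sorted, so a tie resolves to the alphabetically first location.

```python
import pandas as pd

# Hypothetical toy data: three demand rows for one super customer, two for another
toy = pd.DataFrame({
    "super_customer_id": ["SC_1", "SC_1", "SC_1", "SC_2", "SC_2"],
    "Location": ["Dallas, TX", "Austin, TX", "Dallas, TX", "Houston, TX", "Austin, TX"],
})

def dominant_location(locations: pd.Series) -> str:
    """Return the most frequent location, or 'Unknown' if there is none."""
    modes = locations.mode()  # sorted; holds several entries when there is a tie
    return modes.iloc[0] if not modes.empty else "Unknown"

# One dominant location per super customer
city_mode = toy.groupby("super_customer_id")["Location"].agg(dominant_location)
print(city_mode.to_dict())
```

Here `SC_2` has a tie between two cities, which shows why the chosen "dominant" city can be arbitrary when a super customer is spread evenly across locations.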


In [16]:
import pandas as pd

def generate_anylogistix_customers_table(simulation_demand_df):
    # Step 1: Compute mode city for each super customer
    city_mode_df = (
        simulation_demand_df.groupby('super_customer_id')['Location']
        .agg(lambda x: x.mode().iloc[0] if not x.mode().empty else 'Unknown')
        .reset_index()
        .rename(columns={'Location': 'City_Mode'})
    )

    # Step 2: Get unique super_customer_ids
    super_customers = simulation_demand_df[['super_customer_id']].drop_duplicates().reset_index(drop=True)
    super_customers['ID'] = [f'Customer {i+1}' for i in range(len(super_customers))]
    super_customers['Name'] = super_customers['super_customer_id']
    super_customers['Type'] = 'Customer'

    # Step 3: Merge with mode city info
    super_customers = super_customers.merge(city_mode_df, on='super_customer_id', how='left')
    super_customers = super_customers.rename(columns={'City_Mode': 'Location'})

    # Step 4: Add required columns
    super_customers['Inclusion Type'] = 'Include'
    super_customers['Icon'] = 116

    # Reorder columns
    final_columns = ['ID', 'Name', 'Type', 'Location', 'Inclusion Type', 'Icon']
    customer_df = super_customers[final_columns]

    return customer_df

# Generate table
anylogix_customers_table = generate_anylogistix_customers_table(simulation_demand_df)

# Preview
print(anylogix_customers_table.head())

# Optional: Save
anylogix_customers_table.to_csv("anylogistix_customers_table.csv", index=False)


           ID              Name      Type       Location Inclusion Type  Icon
0  Customer 1   SuperCustomer_1  Customer  Milwaukee, WI        Include   116
1  Customer 2  SuperCustomer_12  Customer     Dallas, TX        Include   116
2  Customer 3  SuperCustomer_93  Customer  Milwaukee, WI        Include   116
3  Customer 4   SuperCustomer_7  Customer  Milwaukee, WI        Include   116
4  Customer 5  SuperCustomer_73  Customer     Dallas, TX        Include   116


In [17]:
pd.set_option('display.max_columns',None)

##  AnyLogistix pre-Demand Table Generator

This function generates an intermediate **demand input table** for AnyLogistix from the previously clustered and aggregated demand data. Its output is used to build the final demand table.

---

### 🔹 Objective
Transform `simulation_demand_df` into the **AnyLogistix-compliant demand format**, including:
- Structured demand ID
- Reorder points
- Periodic demand flags
- Revenue and lead time

---

### 🔹 Key Columns

| Column              | Description                                      |
|---------------------|--------------------------------------------------|
| `ID`                | Unique demand row (e.g., `Demand 1`, `Demand 2`) |
| `Customer`          | Demand archetype ID                              |
| `Super Customer`    | Grouped cluster (regional demand node)           |
| `Product`           | Part/Item name                                   |
| `Demand Type`       | Set as `'PeriodicDemand'`                        |
| `Col 6`             | Reorder point units (rounded integer)            |
| `Col 10`            | Demand quantity                                  |
| `Revenue`           | Total revenue from cluster                       |
| `Expected Lead Time`| Average lead time (float, in days)               |
| `Backorder Policy`  | `'AllowedTotal'` to enable backlog               |
| `Inclusion Type`    | `'Include'` to activate in simulation            |

---

### 🔹 Usage
Import `anylogistix_simulation_demand_table.csv` into **AnyLogistix → Demand Table** to simulate demand behavior at a super-customer level with:
- Periodic replenishment
- Clustered product demand
- Strategic backordering



In [18]:

# Pre Transformation for Demand Table 
def generate_anylogistix_demand_table(simulation_demand_df):
    anylogix_df = pd.DataFrame()

    # ID column: Demand 1, Demand 2, ...
    anylogix_df['ID'] = [f'Demand {i+1}' for i in range(len(simulation_demand_df))]

    # Required columns
    anylogix_df['Customer'] = simulation_demand_df['Customer_ID']
    anylogix_df['Super Customer'] = simulation_demand_df['super_customer_id']
    anylogix_df['Product'] = simulation_demand_df['Product']
    anylogix_df['Demand Type'] = 'PeriodicDemand'

    # Demand descriptors
    anylogix_df['Col 1'] = 'First occurrence'
    anylogix_df['Col 2'] = 'Next day after interval'

    # Col 3 is only the label of the order-interval parameter; the rounded reorder point is stored in Col 6
    anylogix_df['Col 3'] = 'Order interval, days'

    anylogix_df['Col 4'] = 'Value'
    anylogix_df['Col 5'] = 'Value'
    anylogix_df['Col 6'] = simulation_demand_df['ReorderPoint'].round().astype(int)
    anylogix_df['Col 7'] = 'Quantity'
    anylogix_df['Col 8'] = 'Value'
    anylogix_df['Col 9'] = 'Value'
    anylogix_df['Col 10'] = simulation_demand_df['Demand']

    # Time Period
    anylogix_df['Time Period'] = '(All periods)'

    # Revenue
    anylogix_df['Revenue'] = simulation_demand_df['Revenue'].round(6)

    # Currency and Lead Time Info
    anylogix_df['Currency'] = 'USD'
    anylogix_df['Expected Lead Time'] = simulation_demand_df['adjusted_lead_time'].round(6)
    anylogix_df['Time Unit'] = 'day'

    # Simulation-specific flags
    anylogix_df['Minimum Split Ratio'] = 1
    anylogix_df['Backorder Policy'] = 'AllowedTotal'
    anylogix_df['Inclusion Type'] = 'Include'

    return anylogix_df

# Generate the formatted demand table
anylogix_demand_table = generate_anylogistix_demand_table(simulation_demand_df)

anylogix_demand_table.to_csv("anylogistix_simulation_demand_table.csv", index=False)



In [19]:
anylogix_demand_table

Unnamed: 0,ID,Customer,Super Customer,Product,Demand Type,Col 1,Col 2,Col 3,Col 4,Col 5,Col 6,Col 7,Col 8,Col 9,Col 10,Time Period,Revenue,Currency,Expected Lead Time,Time Unit,Minimum Split Ratio,Backorder Policy,Inclusion Type
0,Demand 1,Demand_Archetype_0,SuperCustomer_1,BRAKE_PAD_TYPE_3,PeriodicDemand,First occurrence,Next day after interval,"Order interval, days",Value,Value,3,Quantity,Value,Value,41.483947,(All periods),282.87,USD,4.346792,day,1,AllowedTotal,Include
1,Demand 2,Demand_Archetype_1,SuperCustomer_12,FAN_MOTOR_TYPE_2,PeriodicDemand,First occurrence,Next day after interval,"Order interval, days",Value,Value,3,Quantity,Value,Value,65.937492,(All periods),9.71,USD,4.660076,day,1,AllowedTotal,Include
2,Demand 3,Demand_Archetype_2,SuperCustomer_93,BRAKE_PAD_TYPE_1,PeriodicDemand,First occurrence,Next day after interval,"Order interval, days",Value,Value,12,Quantity,Value,Value,512.751287,(All periods),552.34,USD,4.298106,day,1,AllowedTotal,Include
3,Demand 4,Demand_Archetype_3,SuperCustomer_12,FAN_MOTOR_TYPE_1,PeriodicDemand,First occurrence,Next day after interval,"Order interval, days",Value,Value,5,Quantity,Value,Value,58.484399,(All periods),0.00,USD,4.595123,day,1,AllowedTotal,Include
4,Demand 5,Demand_Archetype_4,SuperCustomer_1,BRAKE_PAD_TYPE_3,PeriodicDemand,First occurrence,Next day after interval,"Order interval, days",Value,Value,5,Quantity,Value,Value,150.441013,(All periods),338.18,USD,3.563869,day,1,AllowedTotal,Include
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,Demand 1996,Demand_Archetype_1995,SuperCustomer_89,FAN_MOTOR_TYPE_1,PeriodicDemand,First occurrence,Next day after interval,"Order interval, days",Value,Value,5,Quantity,Value,Value,120.579039,(All periods),0.00,USD,3.513612,day,1,AllowedTotal,Include
1996,Demand 1997,Demand_Archetype_1996,SuperCustomer_12,FAN_MOTOR_TYPE_1,PeriodicDemand,First occurrence,Next day after interval,"Order interval, days",Value,Value,7,Quantity,Value,Value,43.054977,(All periods),0.00,USD,3.975000,day,1,AllowedTotal,Include
1997,Demand 1998,Demand_Archetype_1997,SuperCustomer_73,LED_PANEL_TYPE_1,PeriodicDemand,First occurrence,Next day after interval,"Order interval, days",Value,Value,0,Quantity,Value,Value,2.492497,(All periods),0.00,USD,4.068482,day,1,AllowedTotal,Include
1998,Demand 1999,Demand_Archetype_1998,SuperCustomer_73,LED_PANEL_TYPE_2,PeriodicDemand,First occurrence,Next day after interval,"Order interval, days",Value,Value,0,Quantity,Value,Value,0.887914,(All periods),0.00,USD,4.258659,day,1,AllowedTotal,Include


##  Final AnyLogistix Demand Table (Grouped by Super Customer & Product)

This script generates the **final aggregated demand table** (`DEM1.csv`) for simulation in AnyLogistix. It groups demand data by **Super Customer** and **Product**, consolidating multiple entries into a clean, simulation-optimized format.

---

### Aggregation Logic

| Column Type          | Aggregation Method |
|----------------------|--------------------|
| `Col 6` (ROP)        | Mean → Rounded int |
| `Col 10` (Demand)    | Sum                |
| `Revenue`            | Sum                |
| `Expected Lead Time` | Mean               |
| Categorical cols     | Mode               |

---

###  Final Output Columns

| Column                | Description                                |
|------------------------|--------------------------------------------|
| `ID`                   | Aggregated demand ID                       |
| `Super Customer`       | Grouped demand cluster                     |
| `Product`              | End item                                   |
| `Col 6`                | Rounded reorder point                      |
| `Col 10`               | Total demand quantity                      |
| `Revenue`              | Aggregated revenue                         |
| `Expected Lead Time`   | Avg. lead time (days)                      |
| `Demand Type`, `Col 1`–`Col 5`, etc. | Preserved from base table |

---

### Usage
Import `DEM1.csv` into **AnyLogistix → Demand Table** to simulate periodic demand by **super customers** with consolidated demand quantities and financials.
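
The aggregation rules above can be tried on a toy frame before running them on the full table. This is a minimal sketch under assumed data: the rows, the `SC_*` names, and the helper `first_mode` are hypothetical, and only a subset of the real columns is included.

```python
import pandas as pd

# Hypothetical rows: two archetypes of the same product under one super customer
toy = pd.DataFrame({
    "Super Customer": ["SC_1", "SC_1", "SC_2"],
    "Product": ["FAN_MOTOR_TYPE_1", "FAN_MOTOR_TYPE_1", "LED_PANEL_TYPE_2"],
    "Col 6": [3, 7, 4],                  # reorder point
    "Col 10": [40.0, 60.0, 10.0],        # demand quantity
    "Revenue": [100.0, 50.0, 0.0],
    "Expected Lead Time": [4.0, 5.0, 3.0],
    "Currency": ["USD", "USD", "USD"],
})

def first_mode(series: pd.Series):
    """Most frequent value of a categorical column, or None if empty."""
    modes = series.mode()
    return modes.iloc[0] if not modes.empty else None

# Mean for reorder point and lead time, sum for quantities, mode for categoricals
grouped = toy.groupby(["Super Customer", "Product"], as_index=False).agg({
    "Col 6": "mean",
    "Col 10": "sum",
    "Revenue": "sum",
    "Expected Lead Time": "mean",
    "Currency": first_mode,
})
grouped["Col 6"] = grouped["Col 6"].round().astype(int)
print(grouped)
```

The two `SC_1` rows collapse into one, with demand and revenue added together while the reorder point and lead time are averaged.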



In [20]:
df3 = anylogix_demand_table.copy()

# Custom mode function
def mode_agg(series):
    modes = series.mode()
    return modes.iloc[0] if not modes.empty else None

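# Separate columns; the group keys are left out because groupby re-inserts them
group_keys = ['Super Customer', 'Product']
numeric_cols = df3.select_dtypes(include='number').columns.tolist()
non_numeric_cols = [
    col for col in df3.columns
    if col not in numeric_cols and col not in group_keys and col != 'Customer'
]

# Aggregation: mean for Col 6 (reorder point) and Expected Lead Time,
# first value for the constant Minimum Split Ratio flag (summing would inflate it),
# sum for all other numeric columns
mean_cols = ['Col 6', 'Expected Lead Time']
constant_cols = ['Minimum Split Ratio']
agg_dict1 = {
    col: 'mean' if col in mean_cols else 'first' if col in constant_cols else 'sum'
    for col in numeric_cols
}
agg_dict1.update({col: mode_agg for col in non_numeric_cols})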

# Group and aggregate
dem_11 = df3.groupby(['Super Customer', 'Product'], as_index=False).agg(agg_dict1)
dem_11['Col 6'] = dem_11['Col 6'].round().astype(int)
# Final column order as specified
final_column_order = [
    'ID', 'Super Customer', 'Product', 'Demand Type',
    'Col 1', 'Col 2', 'Col 3', 'Col 4', 'Col 5', 'Col 6',
    'Col 7', 'Col 8', 'Col 9', 'Col 10', 'Time Period',
    'Revenue', 'Currency', 'Expected Lead Time', 'Time Unit',
    'Minimum Split Ratio', 'Backorder Policy', 'Inclusion Type'
]

# Reorder columns
dem_11 = dem_11[final_column_order]

# Save output
dem_11.to_csv("DEM1.csv", index=False)


## Simulated Bill of Materials (BOM) Table Generator

This script creates a simplified **BOM (Bill of Materials)** table for use in **AnyLogistix**, based on forecasted product occurrences.

---

### 🔹 Objective
Estimate the number of finished products (as BOMs) by **counting occurrences** in the forecast data.

---

### 🔹 Output Columns

| Column        | Description                                |
|---------------|--------------------------------------------|
| `ID`          | Unique BOM ID (`<Product> BOM`)            |
| `Name`        | BOM Name (`<Product> BOM`)                 |
| `End Product` | Final product/part type                    |
| `Quantity`    | Occurrence count from `forecast_df`        |

---

### 🔹 Use Case
Import the generated `BOM.csv` into **AnyLogistix → Bill of Materials Table** to:
- Simulate simplified product structure
- Link demand to corresponding end items


In [21]:
# Count occurrences of each Product
product_counts = forecast_df['part_type'].value_counts().reset_index()
product_counts.columns = ['End Product', 'Quantity']

# Add BOM ID and Name
product_counts['ID'] = product_counts['End Product'] + ' BOM'
product_counts['Name'] = product_counts['End Product'] + ' BOM'

# Reorder columns
simulated_bom = product_counts[['ID', 'Name', 'End Product', 'Quantity']]
simulated_bom.to_csv("BOM.csv",index=False)
print(simulated_bom.head(5).to_markdown(index=False))

| ID                   | Name                 | End Product      |   Quantity |
|:---------------------|:---------------------|:-----------------|-----------:|
| BRAKE_PAD_TYPE_3 BOM | BRAKE_PAD_TYPE_3 BOM | BRAKE_PAD_TYPE_3 |     122557 |
| BRAKE_PAD_TYPE_2 BOM | BRAKE_PAD_TYPE_2 BOM | BRAKE_PAD_TYPE_2 |      68290 |
| LED_PANEL_TYPE_1 BOM | LED_PANEL_TYPE_1 BOM | LED_PANEL_TYPE_1 |      63156 |
| FAN_MOTOR_TYPE_2 BOM | FAN_MOTOR_TYPE_2 BOM | FAN_MOTOR_TYPE_2 |      61154 |
| FAN_MOTOR_TYPE_1 BOM | FAN_MOTOR_TYPE_1 BOM | FAN_MOTOR_TYPE_1 |      56729 |


##  AnyLogistix Distributor Table

This table defines **Distribution Centers (DCs)** for use in AnyLogistix based on the total simulated demand per city. Each DC is automatically assigned:

- A **20% buffer** on demand to determine its `Capacity`
- A unique **ID**, **Name**, and fixed **Icon (114)**
- Standardized fields for AnyLogistix simulation

---

###  Key Columns

| Column Name                    | Description                                |
|--------------------------------|--------------------------------------------|
| `ID`                           | Unique identifier for each DC              |
| `Name`                         | Readable name (e.g., `DC_Dallas, TX`)      |
| `Type`                         | Always `DC`                                |
| `Location`                     | City (based on simulation demand)          |
| `Capacity`                     | Total demand × 1.2 (rounded to 2 decimals) |
| `Capacity Unit`                | Always `m³`                                |
| `Initially Open`               | All set to `TRUE`                          |
| `Inclusion Type`               | All set to `Consider`                      |
| `Aggregate Orders by Location` | `FALSE` for independent routing            |
| `Priority`                     | Set to `EQUAL`                             |
| `Icon`                         | Fixed at `114`                             |

---

###  Output

Saved to: `anylogistix_distributor_table_formatted.csv`  
Format is ready to import into **AnyLogistix → Distributor Table**.


In [22]:
import pandas as pd

# Step 1: Aggregate demand per location
location_capacity = simulation_demand_df.groupby('Location')['Demand'].sum().reset_index()
location_capacity.columns = ['Location', 'Total_Demand']

# Step 2: Apply 20% buffer and round capacity
location_capacity['Capacity'] = (location_capacity['Total_Demand'] * 1.2).round(2)

# Step 3: Build final distributor table
distributor_table = pd.DataFrame({
    'ID': [f'Distribution Center {i+1}' for i in range(len(location_capacity))],
    'Name': [f'DC_{loc}' for loc in location_capacity['Location']],
    'Type': ['DC'] * len(location_capacity),
    'Location': location_capacity['Location'],
    'Initially Open': ['TRUE'] * len(location_capacity),
    'Inclusion Type': ['Consider'] * len(location_capacity),
    'Capacity': location_capacity['Capacity'],
    'Capacity Unit': ['m³'] * len(location_capacity),
    'Priority': ['EQUAL'] * len(location_capacity),
    'Aggregate Orders by Location': ['FALSE'] * len(location_capacity),
    'Icon': [114] * len(location_capacity)
})


print(distributor_table.head().to_markdown(index=False))


distributor_table.to_csv("anylogistix_distributor_table_formatted.csv", index=False)


| ID                    | Name               | Type   | Location        | Initially Open   | Inclusion Type   |   Capacity | Capacity Unit   | Priority   | Aggregate Orders by Location   |   Icon |
|:----------------------|:-------------------|:-------|:----------------|:-----------------|:-----------------|-----------:|:----------------|:-----------|:-------------------------------|-------:|
| Distribution Center 1 | DC_Austin, TX      | DC     | Austin, TX      | TRUE             | Consider         |    81203   | m³              | EQUAL      | FALSE                          |    114 |
| Distribution Center 2 | DC_Dallas, TX      | DC     | Dallas, TX      | TRUE             | Consider         |    82461.7 | m³              | EQUAL      | FALSE                          |    114 |
| Distribution Center 3 | DC_Green Bay, WI   | DC     | Green Bay, WI   | TRUE             | Consider         |    45145.8 | m³              | EQUAL      | FALSE                          |    114 |
| Distribu

##  Inventory Stock Policy Table (Product Level)

This table presents calculated inventory control parameters for each unique `Product` across super customers, based on historical demand variability and lead time uncertainty.

The policy follows a **continuous review (Q,R) model** with a service level of **95% (Z = 1.645)**.

---

###  Methodology Summary

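- **Daily Demand** = Total Demand / Time Period
- **Safety Stock** = `Z × σ(LT Demand)`  
- **σ(LT Demand)** = √(μ<sub>LT</sub> × σ²<sub>D</sub> + μ²<sub>D</sub> × σ²<sub>LT</sub>), where μ<sub>D</sub>, σ<sub>D</sub> are the mean and standard deviation of daily demand and μ<sub>LT</sub>, σ<sub>LT</sub> those of lead time
- **Min Stock** = Daily Demand × Average Lead Time
- **Reorder Point (ROP)** = Min Stock + Safety Stock
- **Max Stock** = ROP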
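
To make the safety stock formula concrete, the sketch below plugs in the rounded figures reported for `BRAKE_PAD_TYPE_1` in this section's output. The variable names are chosen here for illustration only.

```python
import math

# Rounded inputs from the BRAKE_PAD_TYPE_1 row of the inventory policy output
z = 1.645                  # 95% service level
mean_daily_demand = 55.55  # mu_D
std_daily_demand = 38.37   # sigma_D
avg_lead_time = 3.16       # mu_LT, days
std_lead_time = 0.72       # sigma_LT, days

# Variance of demand during lead time: mu_LT * sigma_D^2 + mu_D^2 * sigma_LT^2
variance_lt = avg_lead_time * std_daily_demand**2 + mean_daily_demand**2 * std_lead_time**2
sigma_lt = math.sqrt(variance_lt)
safety_stock = z * sigma_lt

print(round(sigma_lt, 2), round(safety_stock, 2))
```

The results land close to the 79.04 and 130.02 shown in the output table; the small gap comes from starting with inputs that were already rounded to two decimals.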

---

###  Output Columns

| Column Name                | Description                                                   |
|----------------------------|---------------------------------------------------------------|
| `ID`                       | First demand ID in the product group (for reporting only)     |
| `Super_Customer`           | First super customer in the product group (for reporting only)|
| `Product`                  | Part type or SKU                                              |
| `avg_time_period`          | Mean of `Col 6`, used as the time period in days              |
| `total_demand`             | Total aggregated demand over all periods                      |
| `daily_demand`             | Sum of per-row daily demand (demand ÷ time period)            |
| `mean_daily_demand`        | Mean of daily demand per product                              |
| `std_daily_demand`         | Standard deviation of daily demand                            |
| `cv_demand`                | Coefficient of variation for demand                           |
| `avg_lead_time`            | Mean lead time in days                                        |
| `std_lead_time`            | Standard deviation of lead time                               |
| `cv_lead_time`             | Coefficient of variation for lead time                        |
| `std_dev_demand_during_lt` | Standard deviation of demand during lead time                 |
| `Safety Stock`             | Buffer stock to meet service level                            |
| `Min Stock`                | Expected demand during lead time                              |
| `Reorder Point`            | Trigger level for replenishment                               |
| `Max Stock`                | Upper limit for inventory holding (equal to Reorder Point)    |

---




In [23]:
import pandas as pd
import numpy as np

def calculate_inventory_stocks_by_product(df, service_level_z=1.645):
    """
    Calculates inventory stock levels (Min, Max, Safety, Reorder Point) for each Product 
    based on grouped demand and lead time variability.

    Args:
        df (pd.DataFrame): Aggregated demand table (e.g. dem_11) with 'Col 6', 'Col 10',
            'Expected Lead Time', 'Product', 'Super Customer' and 'ID' columns.
        service_level_z (float): Z-score corresponding to desired service level (default=1.645 for 95%).

    Returns:
        pd.DataFrame: Inventory policy table at Product level.
    """

    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # --- Ensure numeric conversion ---
    df['demand'] = pd.to_numeric(df['Col 10'], errors='coerce')   # Demand
    df['lead_time_days'] = pd.to_numeric(df['Expected Lead Time'], errors='coerce')  # Lead time

    # Col 6 is used as the time period (in days) for each row
    df['time_period_days'] = df['Col 6']

    # --- Filter out invalid time periods (0 or NaN) ---
    df = df[df['time_period_days'] > 0].copy()
    df['daily_demand'] = df['demand'] / df['time_period_days']

    # --- Group by Product ---
    grouped = df.groupby('Product').agg(
        total_demand=('demand', 'sum'),
        daily_demand=('daily_demand', 'sum'),
        avg_time_period=('time_period_days',"mean"),
        mean_daily_demand=('daily_demand', 'mean'),
        std_daily_demand=('daily_demand', 'std'),
        avg_lead_time=('lead_time_days', 'mean'),
        std_lead_time=('lead_time_days', 'std'),
        Super_Customer=('Super Customer', 'first'),  # keep any one customer value for reporting
        ID=('ID', 'first')  # same for ID
    ).reset_index()

    # Calculate Coefficients of Variation
    grouped['cv_demand'] = grouped['std_daily_demand'] / grouped['mean_daily_demand']
    grouped['cv_lead_time'] = grouped['std_lead_time'] / grouped['avg_lead_time']

    # --- Safety Stock Calculation ---
    variance_demand_during_lt = (grouped['avg_lead_time'] * grouped['std_daily_demand']**2) + \
                                 (grouped['mean_daily_demand']**2 * grouped['std_lead_time']**2)

    grouped['std_dev_demand_during_lt'] = np.sqrt(variance_demand_during_lt)
    grouped['Safety Stock'] = service_level_z * grouped['std_dev_demand_during_lt']

    # --- Final Inventory Levels ---
    grouped['Min Stock'] = grouped['daily_demand'] * grouped['avg_lead_time']
    grouped['Max Stock'] = grouped['Min Stock'] + grouped['Safety Stock']
    grouped['Reorder Point'] = grouped['Min Stock'] + grouped['Safety Stock']

    output_cols = [
        'ID', 'Super_Customer', 'Product','avg_time_period',
        'total_demand','daily_demand',
        'mean_daily_demand', 'std_daily_demand', 'cv_demand',
        'avg_lead_time', 'std_lead_time', 'cv_lead_time',
        'std_dev_demand_during_lt',
        'Min Stock', 'Safety Stock', 'Max Stock', 'Reorder Point'
    ]

    result_df = grouped[output_cols].round(2)
    return result_df

    


inventory_policy = calculate_inventory_stocks_by_product(dem_11)
inventory_policy


Unnamed: 0,ID,Super_Customer,Product,avg_time_period,total_demand,daily_demand,mean_daily_demand,std_daily_demand,cv_demand,avg_lead_time,std_lead_time,cv_lead_time,std_dev_demand_during_lt,Min Stock,Safety Stock,Max Stock,Reorder Point
0,Demand 843,SuperCustomer_0,BRAKE_PAD_TYPE_1,13.14,18332.97,1555.45,55.55,38.37,0.69,3.16,0.72,0.23,79.04,4916.51,130.02,5046.52,5046.52
1,Demand 1250,SuperCustomer_0,BRAKE_PAD_TYPE_2,16.33,82481.74,6400.55,100.01,91.44,0.91,2.83,0.39,0.14,158.47,18082.5,260.68,18343.17,18343.17
2,Demand 149,SuperCustomer_0,BRAKE_PAD_TYPE_3,23.69,224666.95,17125.38,241.2,226.53,0.94,2.91,0.4,0.14,398.58,49847.39,655.66,50503.05,50503.05
3,Demand 1488,SuperCustomer_0,FAN_MOTOR_TYPE_1,20.25,83665.06,5452.51,83.88,82.62,0.98,3.58,0.92,0.26,174.2,19530.49,286.56,19817.05,19817.05
4,Demand 117,SuperCustomer_1,FAN_MOTOR_TYPE_2,6.55,24585.89,6419.48,194.53,405.67,2.09,3.85,0.84,0.22,813.08,24744.8,1337.51,26082.31,26082.31
5,Demand 1088,SuperCustomer_1,LED_PANEL_TYPE_1,12.87,66325.81,5915.96,125.87,153.51,1.22,5.03,1.41,0.28,387.55,29781.57,637.52,30419.09,30419.09
6,Demand 1118,SuperCustomer_1,LED_PANEL_TYPE_2,5.0,11337.79,3149.75,262.48,255.5,0.97,6.2,0.93,0.15,681.27,19541.31,1120.69,20661.99,20661.99


##  Final Inventory Policy Table (AnyLogistix Format)

This table converts SKU-level inventory planning parameters into the format expected by **AnyLogistix** for simulation and execution of inventory policies. It uses the **Min-Max-Safety Stock** logic with periodic review.

---

###  Key Settings:

| Field                     | Value/Logic                                                  |
|--------------------------|--------------------------------------------------------------|
| `Policy Type`            | `InventoryPolicyMinMaxSafetyStock`                          |
| `Initial Stock, units`   | `Min Stock + Safety Stock` (rounded to 2 decimals)           |
| `First Periodic Check`   | Starts from `2-June-2011`, one day increment per product     |
| `Facility`               | `(All sites)` (used because facility data is abstracted)     |
| `Periodic Check`         | Enabled (`TRUE`)                                             |
| `Time Unit`              | Day                                                          |

---

###  Columns Description:

| Column Name              | Description                                                      |
|--------------------------|------------------------------------------------------------------|
| `ID`                     | Unique identifier (Inventory 1, Inventory 2, …)                 |
| `Facility`               | Where policy applies (default: All sites)                       |
| `Product`                | SKU / Part Number                                                |
| `Policy Type`            | Strategy used (Min-Max-SafetyStock based)                       |
| `Col 1` to `Col 6`       | Structured values (Max, Min, Safety Stock and their values)     |
| `Initial Stock, units`   | Starting stock for simulation                                    |
| `Periodic Check`         | Whether inventory is reviewed periodically                       |
| `Period`                 | Review frequency (set to 1 day)                                 |
| `First Periodic Check`   | When review begins                                               |
| `Policy Basis`           | Quantity-based policy                                            |
| `Stock Calculation Window`| Set to 0 (non-rolling window)                                  |
| `Time Period`            | All periods considered                                           |
| `Inclusion Type`         | Include in simulation                                            |
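
The `First Periodic Check` values follow a `D-M-YY 00:00:00` pattern with no zero padding, starting on 2 June 2011 and advancing one day per product. One portable way to build such labels is sketched below; the helper name `first_check_label` is hypothetical. It avoids platform-specific `strftime` flags by formatting the day and month as plain integers, and it uses `timedelta` so that offsets past the end of June roll over into July.

```python
from datetime import datetime, timedelta

def first_check_label(offset_days: int, start: datetime = datetime(2011, 6, 2)) -> str:
    """Format start + offset as 'D-M-YY 00:00:00' without zero padding."""
    d = start + timedelta(days=offset_days)
    # d.day and d.month are plain ints, so no padding is added; %y gives the 2-digit year
    return f"{d.day}-{d.month}-{d:%y} 00:00:00"

labels = [first_check_label(i) for i in range(3)]
print(labels)
print(first_check_label(29))  # offset that crosses the month boundary
```

With only seven products the month boundary is never reached, but the date arithmetic keeps the labels valid if the product list grows.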

---




In [24]:
from datetime import datetime

def format_inventory_policy_table(inventory_df):
    """
    Converts inventory policy DataFrame to formatted policy table as per spec.
    """

    formatted_rows = []

    for i, row in inventory_df.iterrows():
        formatted_rows.append({
            'ID': f'Inventory {i + 1}',
            'Facility': '(All sites)',
            'Product': row['Product'],
            'Policy Type': 'InventoryPolicyMinMaxSafetyStock',
            'Col 1': 'Max',
            'Col 2': row['Max Stock'],
            'Col 3': 'Min',
            'Col 4': row['Min Stock'],
            'Col 5': 'Safety stock',
            'Col 6': row['Safety Stock'],
            'Initial Stock, units': round((row['Min Stock'] + row['Safety Stock']), 2),
            'Periodic Check': 'TRUE',
            'Period': 1,
            # Portable formatting: '%#d' only works on Windows, and a fixed day-of-month breaks once 2 + i exceeds 30
            'First Periodic Check': '{d.day}-{d.month}-{d:%y} 00:00:00'.format(
                d=pd.Timestamp(2011, 6, 2) + pd.Timedelta(days=int(i))
            ),
            'Policy Basis': 'Quantity',
            'Stock Calculation Window': 0,
            'Time Unit': 'day',
            'Minimum Split Ratio': 1,
            'Time Period': '(All periods)',
            'Inclusion Type': 'Include'
        })

    return pd.DataFrame(formatted_rows)

policy_table = format_inventory_policy_table(inventory_policy)





policy_table.to_excel("Final_Inventory_Policy_Table.xlsx", index=False)
policy_table

Unnamed: 0,ID,Facility,Product,Policy Type,Col 1,Col 2,Col 3,Col 4,Col 5,Col 6,"Initial Stock, units",Periodic Check,Period,First Periodic Check,Policy Basis,Stock Calculation Window,Time Unit,Minimum Split Ratio,Time Period,Inclusion Type
0,Inventory 1,(All sites),BRAKE_PAD_TYPE_1,InventoryPolicyMinMaxSafetyStock,Max,5046.52,Min,4916.51,Safety stock,130.02,5046.53,True,1,2-6-11 00:00:00,Quantity,0,day,1,(All periods),Include
1,Inventory 2,(All sites),BRAKE_PAD_TYPE_2,InventoryPolicyMinMaxSafetyStock,Max,18343.17,Min,18082.5,Safety stock,260.68,18343.18,True,1,3-6-11 00:00:00,Quantity,0,day,1,(All periods),Include
2,Inventory 3,(All sites),BRAKE_PAD_TYPE_3,InventoryPolicyMinMaxSafetyStock,Max,50503.05,Min,49847.39,Safety stock,655.66,50503.05,True,1,4-6-11 00:00:00,Quantity,0,day,1,(All periods),Include
3,Inventory 4,(All sites),FAN_MOTOR_TYPE_1,InventoryPolicyMinMaxSafetyStock,Max,19817.05,Min,19530.49,Safety stock,286.56,19817.05,True,1,5-6-11 00:00:00,Quantity,0,day,1,(All periods),Include
4,Inventory 5,(All sites),FAN_MOTOR_TYPE_2,InventoryPolicyMinMaxSafetyStock,Max,26082.31,Min,24744.8,Safety stock,1337.51,26082.31,True,1,6-6-11 00:00:00,Quantity,0,day,1,(All periods),Include
5,Inventory 6,(All sites),LED_PANEL_TYPE_1,InventoryPolicyMinMaxSafetyStock,Max,30419.09,Min,29781.57,Safety stock,637.52,30419.09,True,1,7-6-11 00:00:00,Quantity,0,day,1,(All periods),Include
6,Inventory 7,(All sites),LED_PANEL_TYPE_2,InventoryPolicyMinMaxSafetyStock,Max,20661.99,Min,19541.31,Safety stock,1120.69,20662.0,True,1,8-6-11 00:00:00,Quantity,0,day,1,(All periods),Include


##  Facility Expense Table (AnyLogistix Format)

This table defines **cost parameters** for each distribution center (DC) in your AnyLogistix simulation model. It captures three main types of facility-level expenses derived from your `forecast_df`.

---

###  Cost Components Computed

| Expense Type     | Formula / Logic                                                                                  |
|------------------|--------------------------------------------------------------------------------------------------|
| `carryingCost`   | `unit_cost × holding_cost_rate` (averaged per `Location`)                                        |
| `initialCost`    | `(365 × daily_demand ÷ reorder_point_units) × ordering_cost_per_order` (averaged per `Location`) |
| `otherCost`      | Stockout cost per unit (averaged per `Location`)                                                 |

---

###  Notes

- **Location → DC mapping**: each `Location` maps to a facility named `DC_<Location>`.
- `initialCost` is an annualized estimate based on reorder logic and is left unitless.
- Each facility gets **3 cost entries**:
  1. Carrying cost per unit per day
  2. Initial ordering cost per year
  3. Stockout cost per day per unit
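
The three cost formulas can be checked by hand on a single row. The sketch below uses made-up inputs that are not taken from the dataset; the ordering cost, holding rate, and stockout multiplier correspond to a class A brake pad with `Variable` volatility under the mappings used in this notebook.

```python
# Hypothetical inputs for one forecast row (illustrative values only)
unit_cost = 2.94                 # USD per unit
daily_demand = 2.0               # units per day
reorder_point_units = 11.6       # units
ordering_cost_per_order = 800.0  # USD, ABC class A
holding_cost_rate = 0.22         # 'Variable' volatility class
stockout_multiplier = 5          # brake pads are costed at 5x unit cost

# carryingCost: unit cost scaled by the holding cost rate
carrying_cost = unit_cost * holding_cost_rate

# initialCost: estimated orders per year times the cost of placing one order
orders_per_year = 365 * daily_demand / reorder_point_units
initial_cost = orders_per_year * ordering_cost_per_order

# otherCost: stockout cost per unit
stockout_cost = unit_cost * stockout_multiplier

print(round(carrying_cost, 3), round(initial_cost, 2), round(stockout_cost, 2))
```

In the notebook these per-row values are then averaged per `Location`, so a facility's figures blend the cost profiles of every part it handles.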

---


In [25]:
import numpy as np
import pandas as pd

def generate_facility_expense_table(forecast_df):
    df = forecast_df.copy()

    # Define ordering cost per order based on ABC classification
    df['ordering_cost_per_order'] = df['ABC_Class'].map({'A': 800, 'B': 600, 'C': 400}).fillna(400)

    # Daily demand
    df['daily_demand'] = df['forecasted_demand']

    # Avoid division by zero in reorder point (NaN keeps the column numeric)
    df['reorder_point_units'] = df['reorder_point_units'].replace(0, np.nan)

    # Estimate initial cost using reorder point logic
    df['initial_cost'] = (
        (365 * df['daily_demand']) / df['reorder_point_units']
    ) * df['ordering_cost_per_order']
    df['initial_cost'] = df['initial_cost'].fillna(0)

    # Carrying cost per day
    df['holding_cost_rate'] = df['volatility_class'].map({
        'Highly Volatile': 0.25,
        'Variable': 0.22,
        'Stable': 0.18
    }).fillna(0.15)
    df['carrying_cost_per_day'] = df['unit_cost'] * df['holding_cost_rate']

    # Stockout cost per unit: 5x unit cost for brake pads, 3x for fan motors, 2x otherwise
    is_brake_pad = df['part_type'].str.contains('BRAKE_PAD', na=False)
    is_fan_motor = df['part_type'].str.contains('FAN_MOTOR', na=False)
    df['stockout_cost_per_unit'] = df['unit_cost'] * np.where(
        is_brake_pad, 5, np.where(is_fan_motor, 3, 2)
    )

    # Grouped cost table by facility
    grouped = df.groupby('Location').agg({
        'carrying_cost_per_day': 'mean',
        'initial_cost': 'mean',
        'stockout_cost_per_unit': 'mean'
    }).reset_index()

    # Format records
    records = []
    for i, row in grouped.iterrows():
        facility = f"DC_{row['Location']}"
        records.extend([
            {
                "ID": f"Facility Expense {i*3+1}",
                "Facility": facility,
                "Expense Type": "carryingCost",
                "Value": round(row['carrying_cost_per_day'], 3),
                "Currency": "USD",
                "Time Unit": "day",
                "Product Unit": "m³",
                "Time Period": "(All periods)"
            },
            {
                "ID": f"Facility Expense {i*3+2}",
                "Facility": facility,
                "Expense Type": "initialCost",
                "Value": round(row['initial_cost'], 2),
                "Currency": "USD",
                "Time Unit": "",
                "Product Unit": "",
                "Time Period": "(All periods)"
            },
            {
                "ID": f"Facility Expense {i*3+3}",
                "Facility": facility,
                "Expense Type": "otherCost",
                "Value": round(row['stockout_cost_per_unit'], 2),
                "Currency": "USD",
                "Time Unit": "day",
                "Product Unit": "",
                "Time Period": "(All periods)"
            }
        ])

    return pd.DataFrame(records)

# Export
facility_expense_df = generate_facility_expense_table(forecast_df)
facility_expense_df.to_csv("facility_expense_table.csv", index=False)
facility_expense_df

Unnamed: 0,ID,Facility,Expense Type,Value,Currency,Time Unit,Product Unit,Time Period
0,Facility Expense 1,DC_CA_1,carryingCost,1.05,USD,day,m³,(All periods)
1,Facility Expense 2,DC_CA_1,initialCost,36281.44,USD,,,(All periods)
2,Facility Expense 3,DC_CA_1,otherCost,18.47,USD,day,,(All periods)
3,Facility Expense 4,DC_CA_2,carryingCost,0.798,USD,day,m³,(All periods)
4,Facility Expense 5,DC_CA_2,initialCost,32960.95,USD,,,(All periods)
5,Facility Expense 6,DC_CA_2,otherCost,14.09,USD,day,,(All periods)
6,Facility Expense 7,DC_CA_3,carryingCost,1.0,USD,day,m³,(All periods)
7,Facility Expense 8,DC_CA_3,initialCost,39126.43,USD,,,(All periods)
8,Facility Expense 9,DC_CA_3,otherCost,15.77,USD,day,,(All periods)
9,Facility Expense 10,DC_CA_4,carryingCost,0.828,USD,day,m³,(All periods)
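
Since each facility is expected to receive three cost entries, a quick completeness check on the exported table can catch gaps before import. The sketch below is a hypothetical helper (`incomplete_facilities` is not part of the notebook), run on a few rows copied from the preview above. Because that preview is cut off after the first `DC_CA_4` row, the sample flags `DC_CA_4` as missing `initialCost` and `otherCost`; that reflects the truncated sample, not the full table.

```python
import pandas as pd

EXPECTED_TYPES = {"carryingCost", "initialCost", "otherCost"}

def incomplete_facilities(expense_df):
    """Map each facility lacking an expected expense type to the types it is missing."""
    problems = {}
    for facility, group in expense_df.groupby("Facility"):
        missing = EXPECTED_TYPES - set(group["Expense Type"])
        if missing:
            problems[facility] = sorted(missing)
    return problems

# Rows copied from the truncated preview (DC_CA_4 shows only its first entry).
preview = pd.DataFrame(
    [
        ("DC_CA_1", "carryingCost", 1.05),
        ("DC_CA_1", "initialCost", 36281.44),
        ("DC_CA_1", "otherCost", 18.47),
        ("DC_CA_2", "carryingCost", 0.798),
        ("DC_CA_2", "initialCost", 32960.95),
        ("DC_CA_2", "otherCost", 14.09),
        ("DC_CA_4", "carryingCost", 0.828),
    ],
    columns=["Facility", "Expense Type", "Value"],
)
problems = incomplete_facilities(preview)
print(problems)
```

On the full data, the same call would be `incomplete_facilities(facility_expense_df)`, where an empty dictionary means every facility has all three entries.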


##  AnyLogistix Locations Table

The **Locations Table** is an essential input file for supply chain simulation in **AnyLogistix**. It defines the geographic and structural details of all key nodes (e.g., customers, DCs, suppliers) in the network.

---

###  Purpose

To serve as the master reference for all physical locations used in the model, ensuring correct distance calculations, mapping, and network visualization.

---

###  Field Descriptions

| Column Name              | Description                                                                 |
|--------------------------|-----------------------------------------------------------------------------|
| `ID`                     | Unique identifier for the location (usually same as city name).            |
| `Code`                   | Short internal code for the location. Matches `Location` from raw data.    |
| `Name`                   | Descriptive label for display. Often same as `City`.                       |
| `City`                   | Full city name with state abbreviation (e.g., `"Dallas, TX"`).             |
| `Region`                 | Extracted state code or region from the city name (e.g., `"TX"`).          |
| `Country`                | Country where the location is situated (e.g., `"USA"`).                    |
| `Address`                | Optional field for detailed address. Often left blank.                     |
| `Latitude`               | Geographic latitude (in decimal degrees).                                  |
| `Longitude`              | Geographic longitude (in decimal degrees).                                 |
| `Autofill Coordinates`   | Set to `"FALSE"` to prevent AnyLogistix from overwriting coordinates.      |

---
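
The field list above can double as a lightweight schema check before export. The following is a minimal sketch, assuming only the column names from the table; the helper `validate_locations` and its messages are hypothetical, and the range limits are the standard bounds for decimal-degree coordinates rather than anything AnyLogistix-specific.

```python
import pandas as pd

REQUIRED_COLUMNS = [
    "ID", "Code", "Name", "City", "Region", "Country",
    "Address", "Latitude", "Longitude", "Autofill Coordinates",
]

def validate_locations(table):
    """Return a list of human-readable problems found in a locations table."""
    missing = [c for c in REQUIRED_COLUMNS if c not in table.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if table["ID"].duplicated().any():
        problems.append("duplicate IDs")
    if not table["Latitude"].between(-90, 90).all():
        problems.append("latitude out of range")
    if not table["Longitude"].between(-180, 180).all():
        problems.append("longitude out of range")
    return problems

# One row in the layout described above, using the Dallas coordinates from this notebook.
sample = pd.DataFrame([{
    "ID": "Dallas, TX", "Code": "Dallas, TX", "Name": "Dallas, TX",
    "City": "Dallas, TX", "Region": "TX", "Country": "USA", "Address": "",
    "Latitude": 32.7763, "Longitude": -96.7969, "Autofill Coordinates": "FALSE",
}])
print(validate_locations(sample))
```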



### We build the city location table and the customer location table, then concatenate them into an export-ready locations table.



### City Location Table

In [26]:
# Create the Locations table for AnyLogistix
locations_table = simulation_demand_df[['Location', 'Latitude', 'Longitude']].drop_duplicates()

# Add static fields for required columns
locations_table['ID'] = locations_table['Location']
locations_table['Code'] = locations_table['Location']
locations_table['Name'] = locations_table['Location']
locations_table['City'] = locations_table['Location']
locations_table['Region'] = locations_table['Location'].apply(lambda x: x.split(',')[-1].strip() if ',' in x else 'N/A')
locations_table['Country'] = 'USA'
locations_table['Address'] = ''
locations_table['Autofill Coordinates'] = 'FALSE'

# Reorder columns to match AnyLogistix format
locations_table = locations_table[[
    'ID', 'Code', 'Name', 'City', 'Region', 'Country',
    'Address', 'Latitude', 'Longitude', 'Autofill Coordinates'
]]

# Optional: Save to CSV
locations_table.to_csv("locations_table_anylogistix.csv", index=False)

print(" AnyLogistix Locations Table Preview:")
locations_table

 AnyLogistix Locations Table Preview:


Unnamed: 0,ID,Code,Name,City,Region,Country,Address,Latitude,Longitude,Autofill Coordinates
0,"Austin, TX","Austin, TX","Austin, TX","Austin, TX",TX,USA,,30.2711,-97.7437,False
1,"San Jose, CA","San Jose, CA","San Jose, CA","San Jose, CA",CA,USA,,37.3362,-121.8906,False
2,"Milwaukee, WI","Milwaukee, WI","Milwaukee, WI","Milwaukee, WI",WI,USA,,43.0386,-87.9091,False
3,"Madison, WI","Madison, WI","Madison, WI","Madison, WI",WI,USA,,43.0748,-89.3838,False
6,"San Diego, CA","San Diego, CA","San Diego, CA","San Diego, CA",CA,USA,,32.7174,-117.1628,False
7,"Houston, TX","Houston, TX","Houston, TX","Houston, TX",TX,USA,,29.7589,-95.3677,False
8,"Dallas, TX","Dallas, TX","Dallas, TX","Dallas, TX",TX,USA,,32.7763,-96.7969,False
9,"Sacramento, CA","Sacramento, CA","Sacramento, CA","Sacramento, CA",CA,USA,,38.5811,-121.4939,False
29,"Green Bay, WI","Green Bay, WI","Green Bay, WI","Green Bay, WI",WI,USA,,44.5126,-88.0126,False
43,"Los Angeles, CA","Los Angeles, CA","Los Angeles, CA","Los Angeles, CA",CA,USA,,34.0537,-118.2428,False


### Customer Locations

In [27]:
import random
import pandas as pd
import re
import math

# 1. Define the anchor cities with their base coordinates
anchor_cities = [
    {"City": "Dallas, TX", "Lat": 32.7763, "Lon": -96.7969},
    {"City": "San Diego, CA", "Lat": 32.7174, "Lon": -117.1628},
    {"City": "Milwaukee, WI", "Lat": 43.0386, "Lon": -87.9091},
    {"City": "San Jose, CA", "Lat": 37.3362, "Lon": -121.8906},
    {"City": "Austin, TX", "Lat": 30.2711, "Lon": -97.7437},
    {"City": "Green Bay, WI", "Lat": 44.5126, "Lon": -88.0126},
    {"City": "Madison, WI", "Lat": 43.0748, "Lon": -89.3838},
    {"City": "Sacramento, CA", "Lat": 38.5811, "Lon": -121.4939},
    {"City": "Houston, TX", "Lat": 29.7589, "Lon": -95.3677},
    {"City": "Los Angeles, CA", "Lat": 34.0537, "Lon": -118.2428}
]

city_map = {c["City"]: c for c in anchor_cities}

# 2. The input sequence
location_sequence = """Dallas, TX Dallas, TX Dallas, TX Green Bay, WI Madison, WI Dallas, TX San Jose, CA 
Sacramento, CA Milwaukee, WI San Jose, CA Austin, TX Dallas, TX Madison, WI Dallas, TX 
Sacramento, CA Los Angeles, CA Milwaukee, WI San Diego, CA Dallas, TX Milwaukee, WI 
Dallas, TX Dallas, TX Los Angeles, CA Dallas, TX Sacramento, CA Dallas, TX Dallas, TX 
Madison, WI Madison, WI Sacramento, CA San Jose, CA Dallas, TX Houston, TX Los Angeles, CA 
Houston, TX Dallas, TX Green Bay, WI Milwaukee, WI Dallas, TX Los Angeles, CA Dallas, TX 
Dallas, TX San Jose, CA Sacramento, CA San Diego, CA Sacramento, CA San Jose, CA 
Dallas, TX Los Angeles, CA Green Bay, WI Dallas, TX Milwaukee, WI Dallas, TX Austin, TX 
San Jose, CA San Diego, CA Austin, TX Austin, TX Austin, TX Dallas, TX Madison, WI 
Houston, TX Green Bay, WI San Jose, CA Dallas, TX Los Angeles, CA San Jose, CA Austin, TX 
Los Angeles, CA Los Angeles, CA Los Angeles, CA Dallas, TX Dallas, TX Los Angeles, CA 
Sacramento, CA Austin, TX San Diego, CA Austin, TX San Jose, CA San Jose, CA Austin, TX 
San Jose, CA Austin, TX San Jose, CA Dallas, TX Los Angeles, CA Austin, TX Dallas, TX 
San Jose, CA Los Angeles, CA Austin, TX Austin, TX Houston, TX Green Bay, WI San Diego, CA 
Houston, TX San Jose, CA San Diego, CA San Diego, CA San Diego, CA"""

pattern = r'(Dallas, TX|San Diego, CA|Milwaukee, WI|San Jose, CA|Austin, TX|Green Bay, WI|Madison, WI|Sacramento, CA|Houston, TX|Los Angeles, CA)'
locations = re.findall(pattern, location_sequence)

random.seed(42)
simulated_data = []

# 250 km = ~2.25° lat, scale lon with cos(lat)
MAX_LAT_OFFSET = 2.25  # degrees

for idx, city_name in enumerate(locations):
    anchor = city_map.get(city_name)
    if anchor:
        base_lat = anchor["Lat"]
        base_lon = anchor["Lon"]

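        # Latitude offset is uniform within ±MAX_LAT_OFFSET degrees; the longitude offset
        # is divided by cos(latitude) so that it covers a similar east-west distance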
        lat_offset = random.uniform(-MAX_LAT_OFFSET, MAX_LAT_OFFSET)
        lon_offset = random.uniform(-MAX_LAT_OFFSET, MAX_LAT_OFFSET) / math.cos(math.radians(base_lat))

        new_lat = base_lat + lat_offset
        new_lon = base_lon + lon_offset

        simulated_data.append({
            "ID": f"Customer {idx + 1}",
            "Code": city_name,
            "Latitude": round(new_lat, 6),
            "Longitude": round(new_lon, 6)
        })

df = pd.DataFrame(simulated_data)

# Save the raw simulated coordinates, then attach customer metadata by ID
df.to_csv("simulated_customer_locations.csv", index=False)
df = df.merge(anylogix_customers_table, on='ID', how="inner")
df

Unnamed: 0,ID,Code,Latitude,Longitude,Name,Type,Location,Inclusion Type,Icon
0,Customer 1,"Dallas, TX",33.403721,-99.339093,SuperCustomer_1,Customer,"Milwaukee, WI",Include,116
1,Customer 2,"Dallas, TX",31.763932,-98.278305,SuperCustomer_12,Customer,"Dallas, TX",Include,116
2,Customer 3,"Dallas, TX",33.840420,-95.851186,SuperCustomer_93,Customer,"Milwaukee, WI",Include,116
3,Customer 4,"Green Bay, WI",46.277408,-90.619226,SuperCustomer_7,Customer,"Milwaukee, WI",Include,116
4,Customer 5,"Madison, WI",42.723448,-92.280474,SuperCustomer_73,Customer,"Dallas, TX",Include,116
...,...,...,...,...,...,...,...,...,...
95,Customer 96,"Houston, TX",29.407367,-96.515634,SuperCustomer_95,Customer,"Austin, TX",Include,116
96,Customer 97,"San Jose, CA",36.210329,-119.495030,SuperCustomer_85,Customer,"San Jose, CA",Include,116
97,Customer 98,"San Diego, CA",32.461488,-115.230100,SuperCustomer_41,Customer,"Houston, TX",Include,116
98,Customer 99,"San Diego, CA",32.943864,-119.566509,SuperCustomer_9,Customer,"Austin, TX",Include,116
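
Because the latitude and longitude offsets are drawn independently, simulated customers fall in a roughly square box of about ±250 km per axis around each anchor, so a corner point can sit up to about 250 × √2 ≈ 354 km away. The sketch below uses the haversine formula to measure how far Customer 1 from the preview above landed from the Dallas anchor. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions of this example, not part of the notebook.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

dallas = (32.7763, -96.7969)           # anchor coordinates from anchor_cities
customer_1 = (33.403721, -99.339093)   # first row of the preview above
distance = haversine_km(*dallas, *customer_1)
print(f"Customer 1 is about {distance:.0f} km from the Dallas anchor")
```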


In [28]:
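# df already carries the customer metadata merged in the previous cell, so it is not
# merged with anylogix_customers_table again (a repeat merge on 'ID' would only add
# suffixed duplicate columns such as Name_x / Name_y)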

# Format Name and ID to include "Location" suffix
df['ID'] = df['ID'] + ' Location'
df['Name'] = df['ID']  # Same as updated ID

# Extract city and assign region/country/address
df['City'] = df['Code']
df['Region'] = df['City'].str.extract(r',\s*(\w{2})')[0]  # e.g., 'TX' from 'Dallas, TX'
df['Region'] = df['Region'].replace({
    'TX': 'Texas',
    'CA': 'California',
    'WI': 'Wisconsin'
})
df['Country'] = 'United States'
df['Address'] = ''
df['Autofill Coordinates'] = 'FALSE'

# Final column order
df_final = df[[
    'ID', 'Code', 'Name', 'City', 'Region', 'Country', 'Address',
    'Latitude', 'Longitude', 'Autofill Coordinates'
]]

# Save to CSV
df_final.to_csv("formatted_customer_locations.csv", index=False)
df_final

Unnamed: 0,ID,Code,Name,City,Region,Country,Address,Latitude,Longitude,Autofill Coordinates
0,Customer 1 Location,"Dallas, TX",Customer 1 Location,"Dallas, TX",Texas,United States,,33.403721,-99.339093,FALSE
1,Customer 2 Location,"Dallas, TX",Customer 2 Location,"Dallas, TX",Texas,United States,,31.763932,-98.278305,FALSE
2,Customer 3 Location,"Dallas, TX",Customer 3 Location,"Dallas, TX",Texas,United States,,33.840420,-95.851186,FALSE
3,Customer 4 Location,"Green Bay, WI",Customer 4 Location,"Green Bay, WI",Wisconsin,United States,,46.277408,-90.619226,FALSE
4,Customer 5 Location,"Madison, WI",Customer 5 Location,"Madison, WI",Wisconsin,United States,,42.723448,-92.280474,FALSE
...,...,...,...,...,...,...,...,...,...,...
95,Customer 96 Location,"Houston, TX",Customer 96 Location,"Houston, TX",Texas,United States,,29.407367,-96.515634,FALSE
96,Customer 97 Location,"San Jose, CA",Customer 97 Location,"San Jose, CA",California,United States,,36.210329,-119.495030,FALSE
97,Customer 98 Location,"San Diego, CA",Customer 98 Location,"San Diego, CA",California,United States,,32.461488,-115.230100,FALSE
98,Customer 99 Location,"San Diego, CA",Customer 99 Location,"San Diego, CA",California,United States,,32.943864,-119.566509,FALSE


### Final Location Table for AnyLogistix

In [29]:
final = pd.concat([df_final, locations_table])
final

Unnamed: 0,ID,Code,Name,City,Region,Country,Address,Latitude,Longitude,Autofill Coordinates
0,Customer 1 Location,"Dallas, TX",Customer 1 Location,"Dallas, TX",Texas,United States,,33.403721,-99.339093,FALSE
1,Customer 2 Location,"Dallas, TX",Customer 2 Location,"Dallas, TX",Texas,United States,,31.763932,-98.278305,FALSE
2,Customer 3 Location,"Dallas, TX",Customer 3 Location,"Dallas, TX",Texas,United States,,33.840420,-95.851186,FALSE
3,Customer 4 Location,"Green Bay, WI",Customer 4 Location,"Green Bay, WI",Wisconsin,United States,,46.277408,-90.619226,FALSE
4,Customer 5 Location,"Madison, WI",Customer 5 Location,"Madison, WI",Wisconsin,United States,,42.723448,-92.280474,FALSE
...,...,...,...,...,...,...,...,...,...,...
7,"Houston, TX","Houston, TX","Houston, TX","Houston, TX",TX,USA,,29.758900,-95.367700,FALSE
8,"Dallas, TX","Dallas, TX","Dallas, TX","Dallas, TX",TX,USA,,32.776300,-96.796900,FALSE
9,"Sacramento, CA","Sacramento, CA","Sacramento, CA","Sacramento, CA",CA,USA,,38.581100,-121.493900,FALSE
29,"Green Bay, WI","Green Bay, WI","Green Bay, WI","Green Bay, WI",WI,USA,,44.512600,-88.012600,FALSE
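
The concatenated preview mixes two labeling conventions: customer rows use `Texas` / `United States`, while city rows use `TX` / `USA`, and the original row indices repeat. Whether AnyLogistix requires a single convention is not established here, so the following is only a sketch of how the table could be harmonized before export. The helper `harmonize_locations` and the choice of full state names are assumptions.

```python
import pandas as pd

STATE_NAMES = {"TX": "Texas", "CA": "California", "WI": "Wisconsin"}

def harmonize_locations(table):
    """Return a copy with full state names, one country label, and a fresh index."""
    out = table.copy()
    out["Region"] = out["Region"].replace(STATE_NAMES)
    out["Country"] = out["Country"].replace({"USA": "United States"})
    out = out.reset_index(drop=True)
    if out["ID"].duplicated().any():
        raise ValueError("Location IDs must be unique")
    return out

# Two rows mirroring the mixed conventions in the preview above.
sample = pd.DataFrame(
    {
        "ID": ["Customer 1 Location", "Dallas, TX"],
        "Region": ["Texas", "TX"],
        "Country": ["United States", "USA"],
    },
    index=[0, 8],
)
clean = harmonize_locations(sample)
print(clean)
```

Applied to the notebook's data, this would be `harmonize_locations(final)`, followed by a `to_csv` call if a single export file is wanted.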


##  Products Table for AnyLogistix

This **Products Table** defines the catalog of items to be used across your supply chain simulation. Each product includes essential cost and pricing data for financial analysis within AnyLogistix.

---

###  Table Details

| Column          | Description                                                        |
|-----------------|--------------------------------------------------------------------|
| `ID`            | Product type identifier (from `part_type`)                         |
| `Name`          | Same as ID; used for display purposes                              |
| `Unit`          | Product unit of measurement — set as `pcs`                         |
| `Selling Price` | Mean of the `revenue` column over rows kept after cleaning         |
| `Unit Cost`     | Mean per-unit procurement or production cost                       |
| `Currency`      | Currency used in simulation — set as `USD`                         |

---

###  Cleaning Rules Applied

- Excluded rows where `forecasted_demand` ≤ 1 or missing — ensures realistic per-unit values.
- Aggregated `unit_cost` and `revenue` (renamed to `selling_price`) at `part_type` level.
- Rounded monetary values to two decimal places.

---
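
To make the cleaning rules concrete, here is a small sketch on made-up data (the `toy` frame and its values are invented for illustration and are not taken from `Forecast.csv`). It applies the same three steps: drop rows with missing or ≤ 1 `forecasted_demand`, average `unit_cost` and `revenue` per `part_type`, and round to two decimals.

```python
import pandas as pd

# Invented rows: two are removed by the demand filter (0.5 and missing).
toy = pd.DataFrame({
    "part_type": ["A", "A", "A", "B", "B"],
    "forecasted_demand": [2.0, 0.5, None, 3.0, 1.5],
    "unit_cost": [2.0, 2.0, 2.0, 4.0, 4.0],
    "revenue": [6.0, 0.0, 3.0, 10.0, 5.0],
})

# Rule 1: keep rows with a present forecasted_demand greater than 1
kept = toy[toy["forecasted_demand"].notna() & (toy["forecasted_demand"] > 1)]

# Rules 2 and 3: aggregate per part_type, renaming revenue to selling_price, then round
summary = (
    kept.groupby("part_type")
        .agg(unit_cost=("unit_cost", "mean"), selling_price=("revenue", "mean"))
        .round(2)
        .reset_index()
)
print(summary)
```

In this toy case, part `A` keeps a single row and part `B` keeps two, so the filter visibly changes which rows feed each average.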




In [30]:
import pandas as pd

# Step 1: Check that the required columns are present
required_cols = ['part_id', 'part_type', 'unit_cost', 'revenue', 'forecasted_demand']
missing = [col for col in required_cols if col not in forecast_df.columns]
assert not missing, f"Missing columns: {missing}"

# Step 2: Filter on forecasted demand and set the selling price
# Keep only rows whose forecasted_demand is present and greater than 1.
# Note: this reassigns forecast_df, so later cells see only the filtered rows.
forecast_df = forecast_df[forecast_df['forecasted_demand'].notna() & (forecast_df['forecasted_demand'] > 1)].copy()

# Use the revenue column as the selling price (it is not divided by demand here)
forecast_df['selling_price'] = forecast_df['revenue']

# Step 3: Aggregate by product type
products_df = forecast_df.groupby('part_type').agg({
    'unit_cost': 'mean',
    'selling_price': 'mean',
}).reset_index()

# Step 4: Final formatting
products_df.rename(columns={
    'part_type': 'ID',
    'unit_cost': 'Unit Cost',
    'selling_price': 'Selling Price',
}, inplace=True)

products_df['Name'] = products_df['ID']
products_df['Unit'] = 'pcs'
products_df['Currency'] = 'USD'

# Reorder and round
products_df = products_df[['ID', 'Name', 'Unit', 'Selling Price', 'Unit Cost', 'Currency']]
products_df['Selling Price'] = products_df['Selling Price'].round(2)
products_df['Unit Cost'] = products_df['Unit Cost'].round(2)


# Sort alphabetically by product name
products_df = products_df.sort_values(by='Name', ascending=True).reset_index(drop=True)

# Save to CSV
products_df.to_csv("products_table_cleaned.csv", index=False)

# Final preview
products_df


Unnamed: 0,ID,Name,Unit,Selling Price,Unit Cost,Currency
0,BRAKE_PAD_TYPE_1,BRAKE_PAD_TYPE_1,pcs,5.74,2.4,USD
1,BRAKE_PAD_TYPE_2,BRAKE_PAD_TYPE_2,pcs,11.19,4.14,USD
2,BRAKE_PAD_TYPE_3,BRAKE_PAD_TYPE_3,pcs,7.07,2.48,USD
3,FAN_MOTOR_TYPE_1,FAN_MOTOR_TYPE_1,pcs,8.88,4.56,USD
4,FAN_MOTOR_TYPE_2,FAN_MOTOR_TYPE_2,pcs,5.3,3.25,USD
5,LED_PANEL_TYPE_1,LED_PANEL_TYPE_1,pcs,7.79,4.84,USD
6,LED_PANEL_TYPE_2,LED_PANEL_TYPE_2,pcs,2.96,2.02,USD


In [31]:
simulation_demand_df['Product'].value_counts()

Product
BRAKE_PAD_TYPE_3    639
LED_PANEL_TYPE_1    347
BRAKE_PAD_TYPE_2    268
FAN_MOTOR_TYPE_2    257
FAN_MOTOR_TYPE_1    228
LED_PANEL_TYPE_2    184
BRAKE_PAD_TYPE_1     77
Name: count, dtype: int64
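
The seven product names in the demand data should all exist in the Products table, otherwise demand rows would reference products the model does not know. A minimal sketch of that cross-check follows; the helper `unknown_products` is hypothetical, and the two lists are typed in from the outputs above rather than read from the DataFrames.

```python
def unknown_products(demand_products, catalog_ids):
    """Return demand products that are absent from the product catalog, sorted."""
    return sorted(set(demand_products) - set(catalog_ids))

# IDs from the Products table preview
catalog_ids = [
    "BRAKE_PAD_TYPE_1", "BRAKE_PAD_TYPE_2", "BRAKE_PAD_TYPE_3",
    "FAN_MOTOR_TYPE_1", "FAN_MOTOR_TYPE_2",
    "LED_PANEL_TYPE_1", "LED_PANEL_TYPE_2",
]
# Product names from the value_counts() output
demand_products = [
    "BRAKE_PAD_TYPE_3", "LED_PANEL_TYPE_1", "BRAKE_PAD_TYPE_2", "FAN_MOTOR_TYPE_2",
    "FAN_MOTOR_TYPE_1", "LED_PANEL_TYPE_2", "BRAKE_PAD_TYPE_1",
]
print(unknown_products(demand_products, catalog_ids))
```

With the notebook's own frames, the equivalent call would be `unknown_products(simulation_demand_df['Product'], products_df['ID'])`.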

#  Completing the AnyLogistix Simulation: Final Table Creation Strategy

With the core supply chain tables (**Locations**, **Facility Expenses**, **Products**, **Inventory Policies**, and **Customer Coordinates**) generated programmatically, this section outlines how the **remaining required tables** will be constructed to finalize the model for simulation in **AnyLogistix**.

---

##  Tables Already Created via Python Scripts

| Table Name             | Status  | Description |
|------------------------|---------|-------------|
| `Locations`            | ✅ Done | Geographic coordinates for all demand and supply nodes. |
| `Facility Expenses`    | ✅ Done | Captures carrying, initial, and stockout costs per facility. |
| `Products`             | ✅ Done | Cleaned product catalog with cost and selling price details. |
| `Inventory`            | ✅ Done | Min-Max-Safety stock policy table by product and facility. |
| `Customers`            | ✅ Done | Simulated customer list with coordinates and IDs. |
| `BOM`                  | ✅ Done | Defines the bill of materials for each finished product. |
| `Distribution Centers` | ✅ Done | Site types, capacities, and coordinates. |
| `Demand`               | ✅ Done | Historical/forecasted demand per customer-product-period. |

---

##  Remaining Tables and Construction Approach

All remaining tables will be created using a **hybrid method**:
- **Excel**: For structured tabular data input with formulas, dropdowns, and validation.
- **AnyLogistix GFA (Greenfield Analysis) and NO (Network Optimization) modules**: For configuring process-specific tables like factories, paths, production settings, and transportation logic.

| Table Name                     | Method     | Notes |
|--------------------------------|------------|-------|
| `Scenario Settings`            | Excel      | Define scenario name, units, and global parameters. |
| `Factories`                    | Excel      | Define manufacturing sites, capacities, and setup costs. |
| `Groups`, `Groups Customers`, `Groups Sites`, `Groups Suppliers` | Excel | For segmentation or zone-wise management. |
| `Unit Conversions`             | Excel      | If non-standard units (e.g., pallets, kg) are used. |
| `Paths`                        | ALX GFA    | Define allowed paths between sites. |
| `Periods`                      | Excel      | Define discrete simulation periods (e.g., days or weeks). |
| `Processing Cost`              | Excel      | Cost for value-adding activities like assembly, packaging. |
| `Production`                   | Excel      | Output capacity, cycle times, lot sizes. |
| `Shipping`                     | Excel      | Define shipping policies like lot size, shipment frequency. |
| `Shipping Destinations`        | Excel      | Explicit destinations and mapping for each DC/factory. |
| `Sourcing`                     | Excel      | Which facility sources which product from whom. |
| `Sourcing Sources`            | Excel      | Product-level mapping for inbound sourcing. |
| `Suppliers`                    | Excel      | Raw material or component suppliers and lead times. |
| `Tariffs`                      | Excel      | Cross-border or zone-wise tariffs. |
| `Vehicle Types`                | Excel      | Transport modes: truck, rail, air, etc. |
| `Experiments`, `Experiment Statistics Settings`, `Experiment Dashboards`, `Experiment Metrics` | ALX Interface | To configure and track simulation KPIs. |
| `Project Units`, `Project Unit Conversions` | Excel | Set currency, distance, volume, weight units for the model. |
| `Icons`                        | ALX Interface | For visual layer customization. |

---

##  Workflow Summary

1.  **Data-Driven Tables**: Generated using Python scripts (Locations, Products, Customers, Inventory, Expenses).
2.  **Excel-Driven Tables**: Structured manually based on data from Python exports and simulation planning logic.
3.  **AnyLogistix GUI (GFA & NO)**: Used to model paths, define experiments, set constraints, and run simulations.
4.  **Final Simulation**: After all tables are populated and imported, the model will be tested and validated using **AnyLogistix Experiment Modules** (Performance, Optimization, Risk Analysis, etc.).

---

## Final Deliverables

- `*.csv` files for all input tables
- `.alx` project file with simulation logic
- Documented assumptions, cost structures, and policies
- Dashboard exports and metrics after experiment runs

---
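
Before importing, it can help to confirm that the CSV exports actually exist and are non-empty. The sketch below checks for the file names written by the cells in this section; the helper `missing_exports` is hypothetical, and the list covers only these files, not the full set of AnyLogistix input tables.

```python
from pathlib import Path

# File names written by the to_csv calls in this section of the notebook
EXPECTED_FILES = (
    "facility_expense_table.csv",
    "locations_table_anylogistix.csv",
    "formatted_customer_locations.csv",
    "products_table_cleaned.csv",
)

def missing_exports(directory, expected=EXPECTED_FILES):
    """Return the expected CSV names that are absent or empty in `directory`."""
    base = Path(directory)
    return [
        name for name in expected
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]

# Check the current working directory, where the notebook saves its exports
print(missing_exports("."))
```

An empty list means every listed export is present; any names returned point to cells that still need to be run.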

By combining **programmatic automation**, **structured Excel planning**, and **AnyLogistix's GFA + NO interface**, this approach ensures a scalable and simulation-ready supply chain model for robust decision-making.
