# Product Analysis Notebook: for MES conversion effort estimation

This notebook demonstrates how to perform a multi-step product analysis for a certain planning area to prepare for MES roll-out project.

### Step 1: Setup all supporting modules

First, we import the necessary libraries and use our helper function to get a direct `pyodbc` connection to the database.

In [27]:
import pandas as pd
import time
from database_connections import get_db_connection
import warnings

from config import FACILITY_CODE, PLANNING_AREA, RND_GROUP, FILE_USER_DATA

# --- TIMER START ---
start_time = time.time()
print("Analysis started. Timer initiated...")
# -------------------

# Add this line to ignore the specific UserWarning from pandas
warnings.filterwarnings(
    'ignore',
    category=UserWarning,
    module='pandas'
)

Analysis started. Timer initiated...


### Step 2: Run Initial Query and Create item list table `T_A00`

We define and execute our main SQL query for applicable item based on set of filter requirements. The results are loaded into a pandas DataFrame called `T_A00`. This DataFrame is now our temporary table, stored locally in the notebook's memory.

In [28]:
# Get a connection to the 'v12live' DSN
connection = get_db_connection('v12live')

T_A00 = pd.DataFrame()
planning_area_str = "','".join(PLANNING_AREA)

sql_query = f"""
SELECT 
    T2.MBWHLO,
    T1.MMSTAT,
    T1.MMCHCD,
    T1.MMINDI,
    T1.MMITCL,
    T2.MBSTAT,
    T2.MBRESP,
    T2.MBPUIT,
    T1.MMACRF,
    T1.MMITNO AS POPRNO,
    T1.MMITDS,
    T1.MMFUDS,
    T1.MMCFI3 AS PLC
FROM 
    MVXCDTA.MITMAS AS T1 
INNER JOIN 
    MVXCDTA.MITBAL AS T2 ON T1.MMITNO = T2.MBITNO AND T1.MMCONO = T2.MBCONO
WHERE 
    T2.MBWHLO = '{FACILITY_CODE}' 
    AND T1.MMSTAT < '80' 
    AND T2.MBSTAT < '80' 
    AND T1.MMACRF <> 'CUSTREP'
    AND T2.MBRESP IN ('{planning_area_str}')
    AND T2.MBPUIT = 1
"""

if connection:
    try:
        print("--- Fetching initial data from database ---")
        T_A00 = pd.read_sql_query(sql_query, connection)
        
        # FIX: Standardize all column names to lowercase for consistency
        T_A00.columns = T_A00.columns.str.lower()
        
        print(f"✅ Query successful! Found {len(T_A00)} rows.")
        print("Data loaded into 'T_A00' DataFrame.")
    except Exception as e:
        print(f"❌ Error executing query: {e}")
    finally:
        connection.close()
        print("\n🔌 Connection closed.")
else:
    print("⚠️ Cannot run query, no active database connection.")
    
if not T_A00.empty:
    print("\n--- Enriching data with Product Group descriptions ---")
    
    # MODIFIED: Create the mapping DataFrame directly from the imported RND_GROUP
    df_product_group_map = pd.DataFrame(RND_GROUP)

    # Prepare the join key by stripping whitespace
    T_A00['mmitcl'] = T_A00['mmitcl'].str.strip()
    
    # Perform a left merge to add the 'Product' column to T_A00
    T_A00 = pd.merge(
        T_A00,
        df_product_group_map,
        left_on='mmitcl',
        right_on='ProductGroup',
        how='left'
    )
    
    # Drop the redundant 'ProductGroup' column
    T_A00 = T_A00.drop(columns=['ProductGroup'])
    
    print("✅ Join successful. 'Product' column added to T_A00.")
    display(T_A00.head())

print(f"The number of rows is: {len(T_A00)}")

✅ Connection successful to 'v12live' (v12Live).
--- Fetching initial data from database ---


  T_A00 = pd.read_sql_query(sql_query, connection)


✅ Query successful! Found 555 rows.
Data loaded into 'T_A00' DataFrame.

🔌 Connection closed.

--- Enriching data with Product Group descriptions ---
✅ Join successful. 'Product' column added to T_A00.


Unnamed: 0,mbwhlo,mmstat,mmchcd,mmindi,mmitcl,mbstat,mbresp,mbpuit,mmacrf,poprno,mmitds,mmfuds,plc,Product
0,MF1,20,0.0,3.0,FD00,20,MP-5310,1.0,.6S,104092,UC500-30GM-IUR2-V15,Ultrasonic sensor UC500-30GM-IUR2-...,310,Ultraschall
1,MF1,20,0.0,3.0,FD00,20,MP-5310,1.0,.6S,104093,UC2000-30GM-IUR2-V15,Ultrasonic sensor UC2000-30GM-IUR2...,310,Ultraschall
2,MF1,20,0.0,3.0,FD00,20,MP-5310,1.0,.6S,104094,UC4000-30GM-IUR2-V15,Ultrasonic sensor UC4000-30GM-IUR2...,310,Ultraschall
3,MF1,20,0.0,3.0,FD00,20,MP-5310,1.0,.6S,104095,UC6000-30GM-IUR2-V15,Ultrasonic sensor UC6000-30GM-IUR2...,310,Ultraschall
4,MF1,20,0.0,0.0,FD00,20,MP-5310,1.0,.4S,107386,Osc.Hd F43/F104,Neigungssensor 03-6546B ...,310,Ultraschall


The number of rows is: 555


### Step 3: Product structure check: single line, master or variant items in table `A01_Variant_Check`

This step groups item list based on planning area and product type in column `MMCHCD` to identify variant generator items

In [29]:
A01_Variant_Check = pd.DataFrame()
product_map = {
    0: "Single line items",
    2: 'Master items',
    3: "Variant items"
}
if not T_A00.empty:
    print("--- Creating A01_Variant_Check ---")
    
    # MODIFIED: Added 'Product' to the grouping
    A01_Variant_Check = T_A00.groupby(['mbresp', 'Product', 'mmchcd'])['poprno'].count().reset_index()
    A01_Variant_Check = A01_Variant_Check.rename(columns={'poprno': 'ItemCount', 'mmchcd': 'ProductType', 'mbresp': 'PlanningArea'})
    A01_Variant_Check['ProductType'] = A01_Variant_Check['ProductType'].replace(product_map)
    A01_Variant_Check = A01_Variant_Check.sort_values(by=['PlanningArea', 'Product', 'ProductType'], ascending=True)
    
    print("✅ Analysis complete.")
    display(A01_Variant_Check.head(10))
else:
    print("⚠️ T_A00 DataFrame is empty, skipping analysis.")
    
print(f"The number of rows is: {len(A01_Variant_Check)}")

--- Creating A01_Variant_Check ---
✅ Analysis complete.


Unnamed: 0,PlanningArea,Product,ProductType,ItemCount
0,MP-5310,FA-Optp Generally,Single line items,8
2,MP-5310,Radar,Master items,2
1,MP-5310,Radar,Single line items,1
3,MP-5310,Radar,Variant items,18
5,MP-5310,Ultraschall,Master items,3
4,MP-5310,Ultraschall,Single line items,366
6,MP-5310,Ultraschall,Variant items,44
7,MP-5320,Ultraschall,Single line items,94
8,MP-5330,Ultraschall,Single line items,18


The number of rows is: 9


### Step 4: Account control object check: in table `A02_ACO_Check`

This cell performs another aggregation on our initial DataFrame to count items based on planner code and the Acount Control Object `MMACRF` field to identify Finish Good (FG, .6x) and Semi-Finish Good (SFG, .4x)

In [30]:
A02_ACO_Check = pd.DataFrame()

if not T_A00.empty:
    print("--- Creating A02_ACO_Check ---")
    
    # MODIFIED: Added 'Product' to the grouping
    A02_ACO_Check = T_A00.groupby(['mbresp', 'Product', 'mmacrf'])['poprno'].count().reset_index()
    
    # Rename the columns to match the SQL 'AS' clauses
    A02_ACO_Check = A02_ACO_Check.rename(columns={
        'mbresp': 'PlannerCode',
        'mmacrf': 'ACO',
        'poprno': 'ItemCount'
    })
    
    # Sort the results by MMACRF
    A02_ACO_Check = A02_ACO_Check.sort_values(by=['PlannerCode', 'Product', 'ACO'])
    
    print("✅ Analysis complete.")
    display(A02_ACO_Check)
else:
    print("⚠️ T_A00 DataFrame is empty, skipping analysis.")
    
print(f"The number of rows is: {len(A02_ACO_Check)}")

--- Creating A02_ACO_Check ---
✅ Analysis complete.


Unnamed: 0,PlannerCode,Product,ACO,ItemCount
0,MP-5310,FA-Optp Generally,.4S,8
1,MP-5310,Radar,.4S,1
2,MP-5310,Radar,.6S,20
3,MP-5310,Ultraschall,.4M,4
4,MP-5310,Ultraschall,.4S,14
5,MP-5310,Ultraschall,.6M,64
6,MP-5310,Ultraschall,.6S,331
7,MP-5320,Ultraschall,.4S,91
8,MP-5320,Ultraschall,.6S,3
9,MP-5330,Ultraschall,.4S,18


The number of rows is: 10


### Step 5: Check item with lot control in table `A03_LotControled`

This step performs a more detailed aggregation, grouping by three columns to count items based on their lot control status (`MMINDI`). Any item with `MMINDI` = 0 is not yet lot controlled.

In [31]:
A03_LotControled = pd.DataFrame()

if not T_A00.empty:
    print("--- Creating A03_LotControled ---")
    
    # MODIFIED: Added 'Product' to the grouping
    A03_LotControled = T_A00.groupby(['mbresp', 'Product', 'mmacrf', 'mmindi'])['poprno'].count().reset_index()
    
    # Rename the columns
    A03_LotControled = A03_LotControled.rename(columns={
        'mbresp': 'PlannerCode',
        'mmacrf': 'ACO',
        'mmindi': 'LotCtrolMethod',
        'poprno': 'LotCtrolCount'
    })
    
    # Sort the results
    A03_LotControled = A03_LotControled.sort_values(by=['PlannerCode', 'Product', 'ACO', 'LotCtrolMethod'])
    
    print("✅ Analysis complete.")
    display(A03_LotControled)
else:
    print("⚠️ T_A00 DataFrame is empty, skipping analysis.")
    
print(f"The number of rows is: {len(A03_LotControled)}")

--- Creating A03_LotControled ---
✅ Analysis complete.


Unnamed: 0,PlannerCode,Product,ACO,LotCtrolMethod,LotCtrolCount
0,MP-5310,FA-Optp Generally,.4S,0.0,2
1,MP-5310,FA-Optp Generally,.4S,3.0,6
2,MP-5310,Radar,.4S,3.0,1
3,MP-5310,Radar,.6S,3.0,20
4,MP-5310,Ultraschall,.4M,0.0,2
5,MP-5310,Ultraschall,.4M,1.0,1
6,MP-5310,Ultraschall,.4M,3.0,1
7,MP-5310,Ultraschall,.4S,0.0,4
8,MP-5310,Ultraschall,.4S,3.0,10
9,MP-5310,Ultraschall,.6M,3.0,64


The number of rows is: 18


### Step 6: Check product life cycle in `A03_PLCCheck`

This aggregation counts the number of items for each combination of planner and product line code (`PLC`). Item with `PLC` = [311,411] are subjected to transfer, while items with PLC >=490 are phased out.

In [32]:
A03_PLCCheck = pd.DataFrame()

if not T_A00.empty:
    print("--- Creating A03_PLCCheck ---")
    
    # MODIFIED: Added 'Product' to the grouping
    A03_PLCCheck = T_A00.groupby(['mbresp', 'Product', 'plc'])['poprno'].count().reset_index()
    
    # Rename the columns
    A03_PLCCheck = A03_PLCCheck.rename(columns={
        'mbresp': 'PlannerCode',
        'plc': 'ProductLifeCycle',
        'poprno': 'ItemCount'
    })
    
    # Sort the results
    A03_PLCCheck = A03_PLCCheck.sort_values(by=['PlannerCode', 'Product', 'ProductLifeCycle'])
    
    print("✅ Analysis complete.")
    display(A03_PLCCheck)
else:
    print("⚠️ T_A00 DataFrame is empty, skipping analysis.")
    
print(f"The number of rows is: {len(A03_PLCCheck)}")

--- Creating A03_PLCCheck ---
✅ Analysis complete.


Unnamed: 0,PlannerCode,Product,ProductLifeCycle,ItemCount
0,MP-5310,FA-Optp Generally,310,7
1,MP-5310,FA-Optp Generally,490,1
2,MP-5310,Radar,200,13
3,MP-5310,Radar,310,8
4,MP-5310,Ultraschall,200,26
5,MP-5310,Ultraschall,221,1
6,MP-5310,Ultraschall,240,13
7,MP-5310,Ultraschall,300,1
8,MP-5310,Ultraschall,310,285
9,MP-5310,Ultraschall,311,64


The number of rows is: 22


### Step 7: Generate full product routing by linking item table `T_A0O` to database & create routing table `T_A10`

This step links our initial DataFrame (`T_A00`) with two new database tables (`MPDOPE` and `MPDHED`).

1.  We extract the unique item numbers (`POPRNO`) from our DataFrame.
2.  We use these item numbers to build a new SQL query that efficiently fetches only the required data from the database.
3.  We execute this query and get a new DataFrame with the details.
4.  Finally, we perform a `merge` (join) in pandas to combine our original data with the new details, creating the final `T_A10` DataFrame.

In [33]:
T_A10 = pd.DataFrame() # Initialize T_A10 as an empty DataFrame

if not T_A00.empty:
    print("--- Linking DataFrame to MPDOPE and MPDHED tables ---")
    
    # 1. Extract the unique keys from your DataFrame using the lowercase name
    item_numbers = T_A00['poprno'].unique().tolist()
    
    if item_numbers:
        # 2. Format the list of keys for the SQL 'IN' clause
        formatted_keys = ", ".join([f"'{item}'" for item in item_numbers])

        # 3. Construct the new SQL query
        details_sql_query = f"""
        SELECT 
            T_OPE.POPRNO, T_HED.PHSTAT, T_OPE.POSTRT, T_OPE.POOPNO, T_OPE.POPLGR, 
            T_OPE.POOPDS, T_OPE.POTXT1, T_OPE.POTXT2, T_OPE.PODOID, T_OPE.POAURP, 
            T_OPE.POCONO, T_OPE.POFACI
        FROM 
            MVXCDTA.MPDOPE AS T_OPE
        INNER JOIN 
            MVXCDTA.MPDHED AS T_HED ON T_OPE.POPRNO = T_HED.PHPRNO 
                                    AND T_OPE.POFACI = T_HED.PHFACI 
                                    AND T_OPE.POCONO = T_HED.PHCONO
        WHERE 
            T_OPE.POSTRT = 'STD'
            AND T_OPE.POCONO = 1
            AND T_OPE.POFACI = '{FACILITY_CODE}'
            AND T_OPE.POTDAT = 99999999
            AND T_OPE.POPRNO IN ({formatted_keys})
        """

        # 4. Execute the query to get details
        df_details = pd.DataFrame()
        connection = get_db_connection('v12live') # Re-open a connection
        if connection:
            try:
                print(f"Fetching details for {len(item_numbers)} items...")
                df_details = pd.read_sql_query(details_sql_query, connection)
                
                # Standardize details columns to lowercase
                df_details.columns = df_details.columns.str.lower()
                
                print(f"✅ Query successful! Found {len(df_details)} matching detail rows.")

                # 5. Perform the final join in pandas to create T_A10
                T_A10 = pd.merge(
                    left=T_A00,
                    right=df_details,
                    on='poprno', # Join on the lowercase common item number column
                    how='inner'  # Use 'inner' to match the SQL INNER JOIN
                )
                
                # MODIFIED: Select and reorder columns, adding 'Product'
                T_A10 = T_A10[[
                    'plc', 'poprno', 'mmacrf', 'mbpuit', 'phstat', 'postrt', 
                    'poopno', 'poplgr', 'poopds', 'potxt1', 'potxt2', 'podoid', 'poaurp', 
                    'pocono', 'pofaci', 'mbresp', 'Product'
                ]]

                print("\n--- Final T_A10 DataFrame created ---")
                display(T_A10.head())

            except Exception as e:
                print(f"❌ Error executing the details query: {e}")
            finally:
                connection.close()
                print("\n🔌 Connection closed.")
    else:
        print("⚠️ No item numbers in T_A00 to use for the next query.")
else:
    print("⚠️ T_A00 DataFrame is empty, skipping the linking step.")

--- Linking DataFrame to MPDOPE and MPDHED tables ---
✅ Connection successful to 'v12live' (v12Live).
Fetching details for 555 items...


  df_details = pd.read_sql_query(details_sql_query, connection)


✅ Query successful! Found 18504 matching detail rows.

--- Final T_A10 DataFrame created ---


Unnamed: 0,plc,poprno,mmacrf,mbpuit,phstat,postrt,poopno,poplgr,poopds,potxt1,potxt2,podoid,poaurp,pocono,pofaci,mbresp,Product
0,310,104092,.6S,1.0,20,STD,10.0,5310 M,CHANGE,ECN 31915 ...,...,,1.0,1.0,MF1,MP-5310,Ultraschall
1,310,104092,.6S,1.0,20,STD,15.0,5310 K,KANBAN KITTING ULTRASONIC-US1,...,...,,1.0,1.0,MF1,MP-5310,Ultraschall
2,310,104092,.6S,1.0,20,STD,20.0,0000 N,CIRCUIT DIAGRAM,ECN-55598 ...,...,01-A3W7B,1.0,1.0,MF1,MP-5310,Ultraschall
3,310,104092,.6S,1.0,20,STD,30.0,5345,LASER MARKING,ECN-56291 ...,...,57-7227N,2.0,1.0,MF1,MP-5310,Ultraschall
4,310,104092,.6S,1.0,20,STD,31.0,5345,INSTRUCTION,PlanTime CIF-A4J1 ...,...,57-9948,1.0,1.0,MF1,MP-5310,Ultraschall



🔌 Connection closed.


### Step 8: Consolidate total Document used inside routing in table `T_A11`

This step creates the `T_A11` table by aggregating the `T_A10` data. It trims whitespace from the document ID, groups by planner and the cleaned ID, and counts the number of document ID.

In [34]:
T_A11 = pd.DataFrame() # Initialize T_A11 as an empty DataFrame

if not T_A10.empty:
    print("--- Creating T_A11 by grouping T_A10 ---")
    
    # Create a working copy to avoid SettingWithCopyWarning
    temp_df = T_A10.copy()
    
    # Trim whitespace from the 'podoid' column, handling potential non-string data
    temp_df['m3doid'] = temp_df['podoid'].astype(str).str.strip()
    
    # MODIFIED: Added 'Product' to the grouping
    T_A11 = temp_df.groupby(['mbresp', 'Product', 'm3doid'])['poopno'].count().reset_index()
    
    # Rename the columns to match the desired output
    T_A11 = T_A11.rename(columns={
        'mbresp': 'PlannerCode',
        'm3doid': 'M3DOID',
        'poopno': 'UsageCnt'
    })
    
    print("✅ T_A11 DataFrame created successfully.")
    display(T_A11.head())
else:
    print("⚠️ T_A10 DataFrame is empty, skipping creation of T_A11.")
    
print(f"The number of rows is: {len(T_A11)}")

--- Creating T_A11 by grouping T_A10 ---
✅ T_A11 DataFrame created successfully.


Unnamed: 0,PlannerCode,Product,M3DOID,UsageCnt
0,MP-5310,FA-Optp Generally,,16
1,MP-5310,FA-Optp Generally,57-A2B5,2
2,MP-5310,FA-Optp Generally,57-A5A1C,12
3,MP-5310,FA-Optp Generally,65-1840A,6
4,MP-5310,FA-Optp Generally,65-6695,2


The number of rows is: 2765


### Step 9: Connect M3DOID to EDM database `EDMEWAREAD` to create `A12_DoKID_EDM`
This step connects to a different database (EDMEWAREAD) to enrich our T_A11 data with basic information `Info1` and `Info2`

1. We extract the unique document IDs (M3DOID) from T_A11.
2. We connect to the edw DSN.
3. We query the ADMEDP_EDM_DOCS table using these IDs, breaking them into chunks to avoid database errors (each trunk = 999 items because limit = 1000 items).
4. We perform a final merge (join) in pandas to combine the usage counts with the document details.

In [35]:
A12_DoKID_EDM = pd.DataFrame() # Initialize as empty
if not T_A11.empty:
    print("--- Joining T_A11 with ADMEDP_EDM_DOCS from EDW database ---")
    
    # 1. Extract the unique document IDs from T_A11 and filter out empty strings
    doc_ids = [doc_id for doc_id in T_A11['M3DOID'].unique().tolist() if doc_id]
    
    if doc_ids:
        # 2. Connect to the 'edw' database
        edw_connection = get_db_connection('edw')
        
        if edw_connection:
            df_docs_details_list = []
            # FIX: Break the list of IDs into chunks of 999 to avoid Oracle DB error
            chunk_size = 999
            id_chunks = [doc_ids[i:i + chunk_size] for i in range(0, len(doc_ids), chunk_size)]
            
            try:
                for i, chunk in enumerate(id_chunks):
                    print(f"Fetching document details from EDW: chunk {i+1}/{len(id_chunks)}")
                    formatted_doc_ids = ", ".join([f"'{doc_id}'" for doc_id in chunk])
                    
                    # 3. Construct the query for the EDW database
                    docs_sql_query = f"""
                    SELECT 
                        DOCNUMBER, 
                        INFO, 
                        INFO1
                    FROM 
                        ADMEDP.EDM_DOCS
                    WHERE 
                        DOCNUMBER IN ({formatted_doc_ids})
                    """
                    df_chunk = pd.read_sql_query(docs_sql_query, edw_connection)
                    df_docs_details_list.append(df_chunk)
                
                # Combine all chunks into one DataFrame
                df_docs_details = pd.concat(df_docs_details_list, ignore_index=True)
                df_docs_details.columns = df_docs_details.columns.str.lower()
                print(f"✅ Query successful! Found {len(df_docs_details)} matching documents in total.")
                
                # 4. Join the T_A11 data with the document details
                A12_DoKID_EDM = pd.merge(
                    left=T_A11,
                    right=df_docs_details,
                    left_on='M3DOID',
                    right_on='docnumber',
                    how='inner'
                )
                
                # Sort the final results
                A12_DoKID_EDM = A12_DoKID_EDM.sort_values(by=['PlannerCode', 'Product', 'M3DOID'])
                
                print("\n--- Final Enriched Data --- ")
                display(A12_DoKID_EDM.head())

            except Exception as e:
                print(f"❌ Error executing EDW query: {e}")
            finally:
                edw_connection.close()
                print("\n🔌 EDW Connection closed.")
else:
    print("⚠️ T_A11 DataFrame is empty, skipping final join.")
    
print(f"The number of rows is: {len(A12_DoKID_EDM)}")

--- Joining T_A11 with ADMEDP_EDM_DOCS from EDW database ---
✅ Connection successful to 'edw' (EDMEWAREAD).
Fetching document details from EDW: chunk 1/3


  df_chunk = pd.read_sql_query(docs_sql_query, edw_connection)


Fetching document details from EDW: chunk 2/3
Fetching document details from EDW: chunk 3/3
✅ Query successful! Found 2735 matching documents in total.

--- Final Enriched Data --- 


Unnamed: 0,PlannerCode,Product,M3DOID,UsageCnt,docnumber,info,info1
0,MP-5310,FA-Optp Generally,57-A2B5,2,57-A2B5,Laser marking instruction for potentiometer plug.,
1,MP-5310,FA-Optp Generally,57-A5A1C,12,57-A5A1C,GLV30-8 Filling,
2,MP-5310,FA-Optp Generally,65-1840A,6,65-1840A,CURING FIXTURE (M30)\r\nSIMILAR DETAIL REFER T...,F&T DESIGN
3,MP-5310,FA-Optp Generally,65-6695,2,65-6695,F12 Laser Marking Fixture,
4,MP-5310,FA-Optp Generally,T04-BDZ8AEN,2,T04-BDZ8AEN,Potentiometer plug with imprint (with imprinte...,Amendment by IMC.



🔌 EDW Connection closed.
The number of rows is: 2758


### Step 10: Get all files inside each document ID in table `A13_Doc_type`

This step enriches the data further by connecting to the `EDMEWAREAD` database again to get file-specific details from the `ADMEDP_EDM_FILES` table.

1.  We extract the unique document IDs (`M3DOID`) from `A12_DoKID_EDM`.
2.  We query the `ADMEDP_EDM_FILES` table for matching documents.
3.  We join the results and then create two new columns: `FileExt` by extracting the file extension from the filename

In [36]:
A13_Doc_type = pd.DataFrame() # Initialize as empty
if not A12_DoKID_EDM.empty:
    print("--- Joining with ADMEDP_EDM_FILES from EDW database ---")
    
    # 1. Extract the unique document IDs from A12_DoKID_EDM
    doc_ids = A12_DoKID_EDM['M3DOID'].unique().tolist()
    
    if doc_ids:
        # 2. Connect to the 'edw' database
        edw_connection = get_db_connection('edw')
        
        if edw_connection:
            df_files_details_list = []
            #Break the list of IDs into chunks of 999
            chunk_size = 999
            id_chunks = [doc_ids[i:i + chunk_size] for i in range(0, len(doc_ids), chunk_size)]
            
            try:
                for i, chunk in enumerate(id_chunks):
                    print(f"Fetching file details from EDW: chunk {i+1}/{len(id_chunks)}")
                    formatted_doc_ids = ", ".join([f"'{doc_id}'" for doc_id in chunk])
                    
                    # Construct the query for the EDW database
                    files_sql_query = f"""
                    SELECT 
                        DOCNUMBER, 
                        FILENAME, 
                        FILEUSER
                    FROM 
                        ADMEDP.EDM_FILES
                    WHERE 
                        DOCNUMBER IN ({formatted_doc_ids})
                    """
                    df_chunk = pd.read_sql_query(files_sql_query, edw_connection)
                    df_files_details_list.append(df_chunk)

                # Combine all chunks into one DataFrame
                df_files_details = pd.concat(df_files_details_list, ignore_index=True)
                df_files_details.columns = df_files_details.columns.str.lower()
                print(f"✅ Query successful! Found {len(df_files_details)} matching files.")
                
                # Join the A12 data with the file details
                merged_df = pd.merge(
                    left=A12_DoKID_EDM,
                    right=df_files_details,
                    left_on='M3DOID',
                    right_on='docnumber',
                    how='inner'
                )
                
                # 3. Create the new columns
                # Replicate: Mid([FILENAME],InStr(1,Trim([FILENAME]),".")+1,3)
                merged_df['FileExt'] = merged_df['filename'].str.strip().str.split('.').str[-1].str.slice(0, 3)            
                
                # MODIFIED: Select and reorder final columns, adding 'Product'
                A13_Doc_type = merged_df[[
                    'PlannerCode', 'Product', 'M3DOID', 'UsageCnt', 'info', 'info1', 
                    'FileExt', 'fileuser'
                ]]
                
                print("\n--- Final A13_Doc_type DataFrame Created ---")
                display(A13_Doc_type.head())

            except Exception as e:
                print(f"❌ Error executing EDW query: {e}")
            finally:
                edw_connection.close()
                print("\n🔌 EDW Connection closed.")
else:
    print("⚠️ A12_DoKID_EDM DataFrame is empty, skipping final join.")
    
print(f"The number of rows is: {len(A13_Doc_type)}")

--- Joining with ADMEDP_EDM_FILES from EDW database ---
✅ Connection successful to 'edw' (EDMEWAREAD).
Fetching file details from EDW: chunk 1/3


  df_chunk = pd.read_sql_query(files_sql_query, edw_connection)


Fetching file details from EDW: chunk 2/3
Fetching file details from EDW: chunk 3/3
✅ Query successful! Found 5626 matching files.

--- Final A13_Doc_type DataFrame Created ---


Unnamed: 0,PlannerCode,Product,M3DOID,UsageCnt,info,info1,FileExt,fileuser
0,MP-5310,FA-Optp Generally,57-A2B5,2,Laser marking instruction for potentiometer plug.,,pdf,6024.0
1,MP-5310,FA-Optp Generally,57-A2B5,2,Laser marking instruction for potentiometer plug.,,doc,6024.0
2,MP-5310,FA-Optp Generally,57-A5A1C,12,GLV30-8 Filling,,pdf,5915.0
3,MP-5310,FA-Optp Generally,57-A5A1C,12,GLV30-8 Filling,,doc,5915.0
4,MP-5310,FA-Optp Generally,65-1840A,6,CURING FIXTURE (M30)\r\nSIMILAR DETAIL REFER T...,F&T DESIGN,doc,5058.0



🔌 EDW Connection closed.
The number of rows is: 5684


### Step 11: Pivot Data to Analyze File Types in table `A14_Doc_type`
This step replicates a crosstab query (similar to a PIVOT in SQL) to transform the `A13_Doc_type` DataFrame. We will use the powerful pivot_table function from pandas to achieve this.

The operation reorganizes the data so that each unique file extension (FileExt) becomes its own column. The values in these new columns will represent the count of each file type for a given document, which is perfect for quickly analyzing the composition of files within each document ID.

In [37]:
A14_Doc_type = pd.DataFrame() # Initialize the final DataFrame

if not A13_Doc_type.empty:
    print("--- Grouping by M3DOID and counting file types ---")
    
    try:
        # Step 1: Create the indicator columns for each file type (like the previous step)
        df_with_dummies = pd.get_dummies(A13_Doc_type, columns=['FileExt'], prefix='', prefix_sep='')

        # Get the names of the newly created file extension columns
        original_cols = A13_Doc_type.columns.tolist()
        dummy_cols = [col for col in df_with_dummies.columns if col not in original_cols]

        # MODIFIED: Added 'Product' to the aggregation operations
        agg_operations = {
            'PlannerCode': 'first',
            'Product': 'first',
            'UsageCnt': 'first',
            'info': 'first',
            'info1': 'first',
            'fileuser': 'first',
        }
        for col in dummy_cols:
            agg_operations[col] = 'sum'
            
        # Step 3: Group by M3DOID and apply the aggregations
        A14_Doc_type = df_with_dummies.groupby('M3DOID').agg(agg_operations).reset_index()

        print("✅ Aggregation successful. A14_Doc_type created.")
        display(A14_Doc_type.head())

    except Exception as e:
        print(f"❌ Error during aggregation: {e}")
else:
    print("⚠️ A13_Doc_type is empty, skipping the aggregation.")

print(f"The number of rows is: {len(A14_Doc_type)}")

--- Grouping by M3DOID and counting file types ---
✅ Aggregation successful. A14_Doc_type created.


Unnamed: 0,M3DOID,PlannerCode,Product,UsageCnt,info,info1,fileuser,1,2,3,...,ste,stl,stp,txt,vlf,vlm,vsd,xls,xml,zip
0,01-3240C,MP-5310,Ultraschall,1,UJ...-30GM-E22,"""Knittel*Rehbein*15.07.93""",964.0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1,01-4767E,MP-5310,Ultraschall,8,Shematic for 05-3263... UC50...,,76.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,01-4785E,MP-5310,Ultraschall,1,Shematic for 05-3263... UC50...,,76.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,01-4786C,MP-5310,Ultraschall,2,Shematic for 05-3263... ...,,76.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,01-4927F,MP-5310,Ultraschall,1,Schematic to 05-3358... (UB300-F54-I),,76.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


The number of rows is: 2726


### Step 12: Create a Clean Copy as `A15_EDM_Doc_ext_type`
The provided SQL query simply selects all columns and rows from the `A14_Doc-type` table. The equivalent action in pandas is to create a copy of the `A14_Doc_type` DataFrame.

We'll name this new DataFrame `A15_EDM_Doc_ext_type` to save the results from the previous aggregation and prepare for the next stage of the analysis. It's important to use the .copy() method to ensure the new DataFrame is independent of the original.

In [38]:
A15_EDM_Doc_ext_type = pd.DataFrame() # Initialize the DataFrame

if not A14_Doc_type.empty:
    print("--- Creating a copy of A14_Doc_type ---")
    
    # The SQL query is effectively a SELECT *, so we just copy the DataFrame.
    A15_EDM_Doc_ext_type = A14_Doc_type.copy()
    
    print("✅ DataFrame copied successfully to A15_EDM_Doc_ext_type.")
    display(A15_EDM_Doc_ext_type.head())
else:
    print("⚠️ A14_Doc_type is empty, so an empty A15_EDM_Doc_ext_type was created.")

print(f"The number of rows is: {len(A15_EDM_Doc_ext_type)}")

--- Creating a copy of A14_Doc_type ---
✅ DataFrame copied successfully to A15_EDM_Doc_ext_type.


Unnamed: 0,M3DOID,PlannerCode,Product,UsageCnt,info,info1,fileuser,1,2,3,...,ste,stl,stp,txt,vlf,vlm,vsd,xls,xml,zip
0,01-3240C,MP-5310,Ultraschall,1,UJ...-30GM-E22,"""Knittel*Rehbein*15.07.93""",964.0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1,01-4767E,MP-5310,Ultraschall,8,Shematic for 05-3263... UC50...,,76.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,01-4785E,MP-5310,Ultraschall,1,Shematic for 05-3263... UC50...,,76.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,01-4786C,MP-5310,Ultraschall,2,Shematic for 05-3263... ...,,76.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,01-4927F,MP-5310,Ultraschall,1,Schematic to 05-3358... (UB300-F54-I),,76.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


The number of rows is: 2726


### Step 13: Join with User Data and Filter for CAB Files
This step enriches our data by joining it with a table of user information (`T_FileUser`). We'll then filter the results to isolate only the documents that contain `.cab` files (a common file type for labels or packages) and sort the final output by the user ID.

First, we need to create the `T_FileUser` DataFrame from the data you provided. Then, we'll perform the join and filter operations.

In [39]:
import pandas as pd

# Initialize the final DataFrame
A16_EDM_Doc_CAB_Labels = pd.DataFrame()

# --- Create the T_FileUser DataFrame from the imported config variable ---
T_FileUser = pd.DataFrame(FILE_USER_DATA)

if not A15_EDM_Doc_ext_type.empty:
    print("--- Joining with T_FileUser and filtering for 'cab' files ---")
    
    # Check if the 'cab' column exists before trying to filter
    if 'cab' in A15_EDM_Doc_ext_type.columns:
        
        # Create a working copy
        temp_df = A15_EDM_Doc_ext_type.copy()
        
        # --- 1. LEFT JOIN ---
        # Ensure the join keys have the same data type.
        temp_df['fileuser'] = temp_df['fileuser'].astype('Int64') # Handles potential missing values
        T_FileUser['fileuser'] = T_FileUser['fileuser'].astype('Int64')
        
        # Perform the left merge, equivalent to a LEFT JOIN
        merged_df = pd.merge(
            left=temp_df,
            right=T_FileUser,
            on='fileuser',
            how='left'
        )
        
        # --- 2. WHERE clause ---
        # Filter the rows where the 'cab' column has a value of 1
        filtered_df = merged_df[merged_df['cab'] >= 1].copy()
        
        # --- 3. ORDER BY clause ---
        # Sort the results by the FILEUSER column
        sorted_df = filtered_df.sort_values(by='fileuser')
        
        # MODIFIED: Added 'Product' to the final column list
        final_columns = [
            'PlannerCode', 'Product', 'M3DOID', 'UsageCnt', 'info', 'info1', 'fileuser', 'Name',
            'cab', 'cas', 'cdr', 'eti', 'fmt', 'jpg', 'pdf', 'png', 'zip', 'zzz'
        ]
        
        # Filter the list to only include columns that actually exist in the DataFrame
        existing_columns = [col for col in final_columns if col in sorted_df.columns]
        
        A16_EDM_Doc_CAB_Labels = sorted_df[existing_columns]
        A16_EDM_Doc_CAB_Labels = A16_EDM_Doc_CAB_Labels.sort_values([ 'PlannerCode', 'Product', 'M3DOID'])

        print("✅ Join and filter operation successful.")
        display(A16_EDM_Doc_CAB_Labels.head())

    else:
        print("⚠️ Column 'cab' not found in A15_EDM_Doc_ext_type. No filtering performed.")
        print("The resulting DataFrame will be empty.")

else:
    print("⚠️ A15_EDM_Doc_ext_type is empty, skipping operation.")

print(f"The number of rows is: {len(A16_EDM_Doc_CAB_Labels)}")

--- Joining with T_FileUser and filtering for 'cab' files ---
✅ Join and filter operation successful.


Unnamed: 0,PlannerCode,Product,M3DOID,UsageCnt,info,info1,fileuser,Name,cab,cas,cdr,eti,fmt,jpg,pdf,png,zip
730,MP-5310,Radar,10-D7X1A,2,"Standard packaging label FA with SN+DMC, with ...",new packaging label,7756,,2,2,2,2,2,0,2,2,0
741,MP-5310,Radar,10-DB47,2,"Standard packaging label with SN+DMC, with UL-...",new packaging label,7756,,1,1,1,1,1,0,1,1,0
500,MP-5310,Ultraschall,10-BLG1A,1,UB400-12GM-E5-V1-SOP #191074 with UKCA\n(...,new packaging label,155,,1,1,1,1,1,0,1,1,0
501,MP-5310,Ultraschall,10-BLG2A,1,UB400-12GM-I-V1-SOP #191075 with UKCA\n(UT...,new packaging label,155,,1,1,1,1,1,0,1,1,0
502,MP-5310,Ultraschall,10-BLG3A,1,UB300-18GM40-E5-V1-SOP #220358\n(UT 18-270-P...,new packaging label,155,,1,1,1,1,1,0,1,1,0


The number of rows is: 69


### Step 14: Filter for '916' Documents and Generate URLs
This step filters the dataset to find all documents where the ID (`M3DOID`) starts with "916". Then, it constructs a unique URL for each document by combining a base address with the document's ID. Finally, it selects a specific set of columns and sorts the result.

In [40]:
# Initialize the final DataFrame
A17_916_Files = pd.DataFrame()

if not A15_EDM_Doc_ext_type.empty:
    print("--- Filtering for M3DOID starting with '916' and creating URLs ---")
    
    try:
        # Create a working copy
        temp_df = A15_EDM_Doc_ext_type.copy()

        # --- 1. WHERE clause ---
        # Filter rows where M3DOID starts with "916". We use na=False to handle potential non-string values.
        filtered_df = temp_df[temp_df['M3DOID'].astype(str).str.startswith('916', na=False)].copy()

        # --- 2. Create the new URL column ---
        if not filtered_df.empty:
            base_url = "https://pfde-docs-prd.eu.p-f.biz/service-edocs/DocumentRestService.svc/document/"
            
            # Replicate the string manipulation from the SQL query using .str methods
            # Here, we remove the hyphen from the M3DOID for the last part of the URL
            doc_id_no_hyphen = filtered_df['M3DOID'].str.replace('-', '', n=1)
            
            # Use f-string formatting to build the full URL
            filtered_df['URL'] = (
                f"{base_url}" + 
                filtered_df['M3DOID'].str.strip() + 
                "/txt/" + 
                doc_id_no_hyphen
            )

            # --- 3. ORDER BY clause ---
            # Sort the DataFrame by PlannerCode
            sorted_df = filtered_df.sort_values(by='PlannerCode')
            
            # MODIFIED: Added 'Product' to the final column list
            final_columns = [
                'PlannerCode', 'Product', 'M3DOID', 'UsageCnt', 'info', 
                'info1', 'fileuser', 'pcx', 'txt', 'URL'
            ]
            
            # Ensure we only select columns that actually exist in the DataFrame to prevent errors
            existing_columns = [col for col in final_columns if col in sorted_df.columns]
            
            A17_916_Files = sorted_df[existing_columns]
            A17_916_Files = A17_916_Files.sort_values([ 'PlannerCode', 'Product', 'M3DOID'])
            
            print("✅ Filter, URL generation, and sort completed successfully.")
            display(A17_916_Files.head())
        else:
            print("ℹ️ No documents found with an M3DOID starting with '916'.")

    except Exception as e:
        print(f"❌ An error occurred: {e}")

else:
    print("⚠️ A15_EDM_Doc_ext_type is empty, skipping operation.")

print(f"The number of rows is: {len(A17_916_Files)}")

--- Filtering for M3DOID starting with '916' and creating URLs ---
✅ Filter, URL generation, and sort completed successfully.


Unnamed: 0,PlannerCode,Product,M3DOID,UsageCnt,info,info1,fileuser,pcx,txt,URL
2125,MP-5310,Radar,916-A013A,3,Universal Marriage for Ultrasonic Sensors: 14d...,Test Planning,5454.0,6,3,https://pfde-docs-prd.eu.p-f.biz/service-edocs...
2120,MP-5310,Ultraschall,916-0088,2,Universal marriage for serial number to MO num...,,335.0,1,1,https://pfde-docs-prd.eu.p-f.biz/service-edocs...
2121,MP-5310,Ultraschall,916-0111,2,Verheiratung von Wandler (10digit) mit der Ser...,,376.0,2,1,https://pfde-docs-prd.eu.p-f.biz/service-edocs...
2122,MP-5310,Ultraschall,916-0484C,1,Marriage 10-digit code on transducer with 10-d...,,1100.0,2,1,https://pfde-docs-prd.eu.p-f.biz/service-edocs...
2123,MP-5310,Ultraschall,916-0495,1,"US 18 GS - Marriage Housing (14digit, no meas ...",,334.0,2,1,https://pfde-docs-prd.eu.p-f.biz/service-edocs...


The number of rows is: 16


### Step 15: Select and Sort Key Document Data
This step creates a new, focused DataFrame named A18_EDM_Doc_KeyValue_File. The goal is to select a specific subset of columns from the A15_EDM_Doc_ext_type table and then sort the results based on the M3DOID to ensure a consistent order. This is a common step for cleaning up a table and preparing it for final review or export.

In [41]:
# Initialize the final DataFrame
A18_EDM_Doc_KeyValue_File = pd.DataFrame()

if not A15_EDM_Doc_ext_type.empty:
    print("--- Selecting specific columns and sorting by M3DOID ---")
    
    try:
        # MODIFIED: Added 'Product' to the final column list
        final_columns = [
            'PlannerCode', 'Product', 'M3DOID', 'UsageCnt', 'info', 'info1', 'fileuser',
            'bmp', 'cab', 'cas', 'doc', 'dwg', 'eti', 'fmt', 'jpg', 'pcx',
            'pdf', 'stp', 'txt', 'xls', 'zip', 'zzz'
        ]
        
        # We create a final list of columns that actually exist in the A15 DataFrame.
        # This prevents errors if some file extension columns (e.g., 'bmp') were never created.
        existing_columns = [col for col in final_columns if col in A15_EDM_Doc_ext_type.columns]
        
        # Create a new DataFrame with only the selected columns.
        selected_df = A15_EDM_Doc_ext_type[existing_columns]
        
        # --- 2. ORDER BY clause ---
        # Sort the new DataFrame by the M3DOID column.
        A18_EDM_Doc_KeyValue_File = selected_df.sort_values(by=['PlannerCode', 'Product', 'M3DOID']).reset_index(drop=True)
        
        print("✅ Column selection and sorting completed successfully.")
        display(A18_EDM_Doc_KeyValue_File.head())

    except Exception as e:
        print(f"❌ An error occurred: {e}")

else:
    print("⚠️ A15_EDM_Doc_ext_type is empty, skipping operation.")

print(f"The number of rows is: {len(A18_EDM_Doc_KeyValue_File)}")

--- Selecting specific columns and sorting by M3DOID ---
✅ Column selection and sorting completed successfully.


Unnamed: 0,PlannerCode,Product,M3DOID,UsageCnt,info,info1,fileuser,cab,cas,doc,dwg,eti,fmt,jpg,pcx,pdf,stp,txt,xls,zip
0,MP-5310,FA-Optp Generally,57-A2B5,2,Laser marking instruction for potentiometer plug.,,6024.0,0,0,1,0,0,0,0,0,1,0,0,0,0
1,MP-5310,FA-Optp Generally,57-A5A1C,12,GLV30-8 Filling,,5915.0,0,0,1,0,0,0,0,0,1,0,0,0,0
2,MP-5310,FA-Optp Generally,65-1840A,6,CURING FIXTURE (M30)\r\nSIMILAR DETAIL REFER T...,F&T DESIGN,5058.0,0,0,1,0,0,0,0,0,1,0,0,0,0
3,MP-5310,FA-Optp Generally,65-6695,2,F12 Laser Marking Fixture,,5819.0,0,0,1,0,0,0,0,0,1,0,0,0,0
4,MP-5310,FA-Optp Generally,T04-BDZ8AEN,2,Potentiometer plug with imprint (with imprinte...,Amendment by IMC.,151.0,0,0,0,0,0,0,0,0,1,0,0,0,0


The number of rows is: 2726


### Step 16: Link Operations to Work Center Descriptions to make AprisoWorkCenter AWC
This step identifies specific manufacturing operations by filtering `T_A10` for rows where `POAURP` is 2. It then joins this data with the `MVXCDTA_MPDWCT` database table to pull in the work center description (`PPPLGD`). This allows us to see the human-readable name for the work centers associated with these specific operations.

In [1]:
# Initialize the final DataFrame
A20_AprisoWorkCenter = pd.DataFrame()

if not T_A10.empty:
    print("--- Joining T_A10 with MPDWCT to get Work Center details ---")
    
    # --- 1. WHERE clause: Filter T_A10 first for efficiency ---
    t10_filtered = T_A10[T_A10['poaurp'] == 2].copy()
    
    if not t10_filtered.empty:
        # --- 2. Get the necessary data from the database ---
        # Get the unique work center codes (poplgr) to query the database efficiently
        work_center_codes = t10_filtered['poplgr'].unique().tolist()
        formatted_codes = ", ".join([f"'{code}'" for code in work_center_codes])

        # Construct the SQL query
        db_query = f"""
        SELECT PPPLGR, PPPLGD, PPCONO, PPFACI
        FROM MVXCDTA.MPDWCT
        WHERE PPPLGR IN ({formatted_codes})
        """
        
        df_mpdwct = pd.DataFrame()
        connection = get_db_connection('v12live') # Assuming 'v12live' is the correct DSN
        if connection:
            try:
                print(f"Fetching details for {len(work_center_codes)} work centers from the database...")
                df_mpdwct = pd.read_sql_query(db_query, connection)
                
                # Standardize column names to lowercase for consistent merging
                df_mpdwct.columns = df_mpdwct.columns.str.lower()
                print(f"✅ Found {len(df_mpdwct)} matching work centers.")
                
            except Exception as e:
                print(f"❌ Error executing database query: {e}")
            finally:
                connection.close()
                print("🔌 Connection closed.")
        
        # --- 3. INNER JOIN ---
        if not df_mpdwct.empty:
            A20_AprisoWorkCenter = pd.merge(
                left=t10_filtered,
                right=df_mpdwct,
                left_on=['pocono', 'pofaci', 'poplgr'],
                right_on=['ppcono', 'ppfaci', 'ppplgr'],
                how='inner'
            )
            
            # MODIFIED: Added 'Product' to final columns
            A20_AprisoWorkCenter = A20_AprisoWorkCenter.rename(columns={'mbresp': 'PlannerCode'})
            final_columns = ['PlannerCode', 'ppplgd', 'poopds', 'poaurp']
            A20_AprisoWorkCenter = A20_AprisoWorkCenter[final_columns]
            
            print("\n✅ Join successful. Final DataFrame created.")
            display(A20_AprisoWorkCenter.head())
        else:
            print("\n⚠️ Could not fetch work center data from the database. Final DataFrame is empty.")
    else:
        print("ℹ️ No rows found in T_A10 with POAURP = 2. Nothing to process.")

else:
    print("⚠️ T_A10 DataFrame is empty, skipping operation.")

print(f"The number of rows is: {len(A20_AprisoWorkCenter)}")

NameError: name 'pd' is not defined

### Step 17: Find Unique Apriso Work Center
This step creates a summary table named `A21_AprisoWorkCenter`. It distills the previous table down to only the unique combinations of planner codes, work center descriptions, and operation descriptions, and then sorts the result.

In [None]:
# Initialize the final DataFrame
A21_AprisoWorkCenter = pd.DataFrame()

if not A20_AprisoWorkCenter.empty:
    print("--- Finding unique combinations of PlannerCode, Work Center, and Operation ---")
    
    try:
        # MODIFIED: Added 'Product' to the columns to check for uniqueness
        columns_to_check = ['PlannerCode', 'ppplgd', 'poopds']
        
        # 1. Select the columns, drop duplicate rows, and sort the result
        A21_AprisoWorkCenter = (
            A20_AprisoWorkCenter[columns_to_check]
            .drop_duplicates()
            .sort_values(by=columns_to_check)
            .reset_index(drop=True)
        )
        A21_AprisoWorkCenter = A21_AprisoWorkCenter.sort_values([ 'PlannerCode', 'Product', 'ppplgd'])
        print("✅ Found unique combinations successfully.")
        display(A21_AprisoWorkCenter.head())

    except Exception as e:
        print(f"❌ An error occurred: {e}")

else:
    print("⚠️ A20_AprisoWorkCenter is empty, skipping operation.")

print(f"The number of rows is: {len(A21_AprisoWorkCenter)}")

--- Finding unique combinations of PlannerCode, Work Center, and Operation ---
✅ Found unique combinations successfully.


Unnamed: 0,PlannerCode,Product,ppplgd,poopds
0,MP-5310,FA-Optp Generally,0000R01D,DISPATCH
1,MP-5310,FA-Optp Generally,5345R01,LASER MARKING
2,MP-5310,FA-Optp Generally,5355R01,FINAL FILLING
3,MP-5310,FA-Optp Generally,5355R01,PRE-FILLING
4,MP-5310,Radar,0000R01D,DISPATCH


The number of rows is: 96


### Step 18: Extract '916' Merging Operations
This step creates a new DataFrame, `A20_Merging_XStep`, by filtering the main operations table `T_A10`. It isolates all rows where the operation is linked to a document ID (`PODOID`) that starts with "916", effectively creating a list of all "MERGING" steps for our analysis.

In [44]:
# Initialize the final DataFrame
A30_Merging_XStep = pd.DataFrame()

if not T_A10.empty:
    print("--- Filtering T_A10 for operations linked to '916' documents ---")
    
    try:
        # --- 1. WHERE clause ---
        # Filter rows where the 'podoid' column starts with "916"
        mask = T_A10['podoid'].astype(str).str.startswith('916', na=False)
        filtered_df = T_A10[mask]
        
        # MODIFIED: Added 'Product' to columns to rename
        columns_to_rename = {
            'mbresp': 'PlannerCode',
            'Product': 'Product',
            'pocono': 'POCONO',
            'pofaci': 'POFACI',
            'poprno': 'POPRNO',
            'poopno': 'POOPNO',
            'poplgr': 'POPLGR',
            'poopds': 'POOPDS',
            'podoid': 'PODOID'
        }
        
        A30_Merging_XStep = filtered_df.rename(columns=columns_to_rename)
        
        # Ensure we only keep the columns that were in the rename map
        final_columns = list(columns_to_rename.values())
        A30_Merging_XStep = A30_Merging_XStep[final_columns]
        
        print("✅ Filtering successful. A20_Merging_XStep DataFrame created.")
        display(A30_Merging_XStep.head())

    except Exception as e:
        print(f"❌ An error occurred: {e}")

else:
    print("⚠️ T_A10 DataFrame is empty, skipping operation.")

print(f"The number of rows is: {len(A30_Merging_XStep)}")

--- Filtering T_A10 for operations linked to '916' documents ---
✅ Filtering successful. A20_Merging_XStep DataFrame created.


Unnamed: 0,PlannerCode,Product,POCONO,POFACI,POPRNO,POOPNO,POPLGR,POOPDS,PODOID
22,MP-5310,Ultraschall,1.0,MF1,104092,225.0,5310,MARRIAGE,916-A013A
75,MP-5310,Ultraschall,1.0,MF1,104093,220.0,5310,MARRIAGE,916-A013A
125,MP-5310,Ultraschall,1.0,MF1,104094,220.0,5310,MARRIAGE,916-A013A
175,MP-5310,Ultraschall,1.0,MF1,104095,220.0,5310,MARRIAGE,916-A013A
226,MP-5310,Ultraschall,1.0,MF1,133053,225.0,5310,MARRIAGE,916-A013A


The number of rows is: 327


### Step 19: Count Merging Steps per Part Number
This step creates the `A31_Merging_XStep` DataFrame. It aggregates the results from the previous step by grouping by `PlannerCode`, `POCONO`, `POFACI`, and `POPRNO`, and then counts the number of merging steps (`PODOID`) within each group.

In [45]:
# Initialize the final DataFrame
A31_Merging_XStep = pd.DataFrame()

if not A30_Merging_XStep.empty:
    print("--- Grouping and counting merging steps per part number ---")
    
    try:
        # MODIFIED: Added 'Product' to the grouping columns
        grouping_cols = ['PlannerCode', 'Product', 'POCONO', 'POFACI', 'POPRNO']
        
        # Group by the specified columns, count the 'PODOID' for each group,
        # and then reset the index to turn the grouped columns back into regular columns.
        A31_Merging_XStep = (
            A30_Merging_XStep.groupby(grouping_cols)['PODOID']
            .count()
            .reset_index()
            .rename(columns={'PODOID': 'Cnt_MergingStep'})
        )
        A31_Merging_XStep = A31_Merging_XStep.sort_values([ 'PlannerCode', 'Product', 'POPRNO'])
        print("✅ Grouping and counting successful.")
        display(A31_Merging_XStep.head())
        print(f"The number of rows is: {len(A31_Merging_XStep)}")
        A31_Merging_XStep_More_Than_One = A31_Merging_XStep[A31_Merging_XStep['Cnt_MergingStep'] > 1]
        display(A31_Merging_XStep_More_Than_One.head())
        print(f"The number of rows is: {len(A31_Merging_XStep_More_Than_One)}")

    except Exception as e:
        print(f"❌ An error occurred: {e}")

else:
    print("⚠️ A20_Merging_XStep is empty, skipping operation.")

--- Grouping and counting merging steps per part number ---
✅ Grouping and counting successful.


Unnamed: 0,PlannerCode,Product,POCONO,POFACI,POPRNO,Cnt_MergingStep
0,MP-5310,Radar,1.0,MF1,70134318,2
1,MP-5310,Radar,1.0,MF1,70185537,1
2,MP-5310,Ultraschall,1.0,MF1,48481,1
3,MP-5310,Ultraschall,1.0,MF1,93950,1
4,MP-5310,Ultraschall,1.0,MF1,97966,1


The number of rows is: 279


Unnamed: 0,PlannerCode,Product,POCONO,POFACI,POPRNO,Cnt_MergingStep
0,MP-5310,Radar,1.0,MF1,70134318,2
19,MP-5310,Ultraschall,1.0,MF1,104715,2
20,MP-5310,Ultraschall,1.0,MF1,105512,2
22,MP-5310,Ultraschall,1.0,MF1,108158,2
23,MP-5310,Ultraschall,1.0,MF1,108159,2


The number of rows is: 44


### Step 20: Fetch Bill of Materials (BOM) for All Products
This step creates the `B10_BOM` DataFrame. It retrieves the component list (Bill of Materials) for every parent product in our initial `T_A00` list by querying the `MPDMAT` (BOM) and `MITMAS` (Item Master) tables from the database.

In [46]:
# Initialize the final DataFrame
B10_BOM = pd.DataFrame()

if not T_A00.empty:
    print("--- Fetching Bill of Materials data from the database ---")
    
    # --- 1. Get the list of parent products from T_A00 ---
    parent_products = T_A00['poprno'].unique().tolist()
    formatted_parents = ", ".join([f"'{p}'" for p in parent_products])
    
    # --- 2. Construct a targeted SQL query ---
    bom_sql_query = f"""
    SELECT
        T1.PMCONO, T1.PMFACI, T1.PMPRNO, T1.PMSTRT,
        T1.PMMSEQ, T1.PMFDAT, T1.PMTDAT, T1.PMMTNO, T1.PMCNQT,
        T2.MMITDS, T2.MMACRF, T2.MMINDI
    FROM
        MVXCDTA.MPDMAT AS T1
    INNER JOIN
        MVXCDTA.MITMAS AS T2 ON T1.PMMTNO = T2.MMITNO AND T1.PMCONO = T2.MMCONO
    WHERE
        T1.PMPRNO IN ({formatted_parents})
        AND T1.PMSTRT = 'STD'
        AND T1.PMTDAT = 99999999
    """
    
    df_bom_data = pd.DataFrame()
    connection = get_db_connection('v12live') # Re-establish connection
    if connection:
        try:
            print(f"Fetching BOM for {len(parent_products)} parent products...")
            df_bom_data = pd.read_sql_query(bom_sql_query, connection)
            df_bom_data.columns = df_bom_data.columns.str.lower()
            print(f"✅ Query successful! Found {len(df_bom_data)} BOM components.")
        except Exception as e:
            print(f"❌ Error executing database query: {e}")
        finally:
            connection.close()
            print("🔌 Connection closed.")
            
    # --- 3. Join database results with T_A00 to add PlannerCode and PLC ---
    if not df_bom_data.empty:
        # MODIFIED: Added 'Product' to the subset of columns from T_A00
        t_a00_subset = T_A00[['poprno', 'mbresp', 'plc', 'Product']]
        
        merged_df = pd.merge(
            left=df_bom_data,
            right=t_a00_subset,
            left_on='pmprno',
            right_on='poprno',
            how='inner'
        )
        
        # MODIFIED: Added 'Product' to the final column map
        final_columns_map = {
            'mbresp': 'PlannerCode',
            'Product': 'Product',
            'plc': 'PLC',
            'pmcono': 'PMCONO',
            'pmfaci': 'PMFACI',
            'pmprno': 'PMPRNO',
            'pmstrt': 'PMSTRT',
            'pmmseq': 'PMMSEQ',
            'pmfdat': 'PMFDAT',
            'pmtdat': 'PMTDAT',
            'pmmtno': 'PMMTNO',
            'mmitds': 'PMITDS',
            'pmcnqt': 'PMCNQT',
            'mmacrf': 'MMACRF',
            'mmindi': 'MMINDI'
        }
        
        B10_BOM = merged_df.rename(columns=final_columns_map)
        B10_BOM = B10_BOM.sort_values(['PlannerCode', 'Product', 'PMPRNO', 'PMMSEQ'])
        
        # Ensure correct column order and selection
        B10_BOM = B10_BOM[list(final_columns_map.values())]
        
        print("\n✅ Final B10_BOM DataFrame created successfully.")
        display(B10_BOM.head())
    else:
        print("\n⚠️ No BOM data was fetched from the database.")
else:
    print("⚠️ T_A00 is empty, cannot fetch BOM data.")

print(f"The number of rows is: {len(B10_BOM)}")

--- Fetching Bill of Materials data from the database ---
✅ Connection successful to 'v12live' (v12Live).
Fetching BOM for 555 parent products...


  df_bom_data = pd.read_sql_query(bom_sql_query, connection)


✅ Query successful! Found 10801 BOM components.
🔌 Connection closed.

✅ Final B10_BOM DataFrame created successfully.


Unnamed: 0,PlannerCode,Product,PLC,PMCONO,PMFACI,PMPRNO,PMSTRT,PMMSEQ,PMFDAT,PMTDAT,PMMTNO,PMITDS,PMCNQT,MMACRF,MMINDI
5205,MP-5310,FA-Optp Generally,310,1.0,MF1,181673,STD,10.0,0.0,99999999.0,189592,POTISTOPFEN O.A. DK12,1.0,.5P,0.0
8098,MP-5310,FA-Optp Generally,310,1.0,MF1,818459,STD,10.0,0.0,99999999.0,818239,Halbzg GLV30-8-2000,1.0,.4S,3.0
4221,MP-5310,FA-Optp Generally,310,1.0,MF1,818459,STD,50.0,0.0,99999999.0,413612,BLINDPLATTE RLK6 FRO,2.0,.5P,0.0
3174,MP-5310,FA-Optp Generally,310,1.0,MF1,818459,STD,100.0,0.0,99999999.0,109060,CHM RESN FERMADUR 180/18 PF,0.015,.0,0.0
2093,MP-5310,FA-Optp Generally,310,1.0,MF1,818459,STD,110.0,0.0,99999999.0,36495,CHM HARD FERMADUR B-174,0.015,.0,0.0


The number of rows is: 10801


### Step 21: Summarize Component Usage
This step creates the `B11_PMMTNO_group` DataFrame by aggregating the BOM data from the previous step. It groups by each unique component (`PMMTNO`) and its attributes, then calculates a UsageCnt to show how many different parent products use that specific component. The final result is sorted by the component's description.

In [47]:
# Initialize the final DataFrame
B11_PMMTNO_group = pd.DataFrame()

if not B10_BOM.empty:
    print("--- Grouping and counting component usage from BOM ---")
    
    try:
        # MODIFIED: Added 'Product' to grouping. This changes the aggregation to be per-component, per-product-group of parent.
        grouping_cols = [
            'PlannerCode', 
            'Product',
            'PMMTNO', 
            'PMITDS', 
            'MMACRF', 
            'MMINDI'
        ]
        
        # Group by the component details, count their usage, sort the result, and clean up the index
        B11_PMMTNO_group = (
            B10_BOM.groupby(grouping_cols)['PMCNQT']
            .count()
            .reset_index()
            .rename(columns={'PMCNQT': 'UsageCnt'})
            .sort_values(by=['PlannerCode', 'Product', 'PMMTNO'])
            .reset_index(drop=True)
        )
        
        print("✅ Component usage summary created successfully.")
        display(B11_PMMTNO_group.head())

    except Exception as e:
        print(f"❌ An error occurred: {e}")

else:
    print("⚠️ B10_BOM DataFrame is empty, skipping operation.")

print(f"The number of rows is: {len(B11_PMMTNO_group)}")

--- Grouping and counting component usage from BOM ---
✅ Component usage summary created successfully.


Unnamed: 0,PlannerCode,Product,PMMTNO,PMITDS,MMACRF,MMINDI,UsageCnt
0,MP-5310,FA-Optp Generally,36495,CHM HARD FERMADUR B-174,.0,0.0,6
1,MP-5310,FA-Optp Generally,109060,CHM RESN FERMADUR 180/18 PF,.0,0.0,6
2,MP-5310,FA-Optp Generally,189592,POTISTOPFEN O.A. DK12,.5P,0.0,1
3,MP-5310,FA-Optp Generally,413612,BLINDPLATTE RLK6 FRO,.5P,0.0,6
4,MP-5310,FA-Optp Generally,818239,Halbzg GLV30-8-2000,.4S,3.0,1


The number of rows is: 1253


### Step 22: Count Lot-Controlled Materials in BOM
This step creates the `B12_BOM_Lot_Ctr` DataFrame. Its purpose is to analyze the components from the previous summary and count how many of them are subject to lot control (where `MMINDI` is not 0). The result is a summary grouped by `PlannerCode`, `MMACRF` (the accounting control object), and the specific lot control method (`MMINDI`).

In [48]:
# Initialize the final DataFrame
B12_BOM_Lot_Ctr = pd.DataFrame()

if not B11_PMMTNO_group.empty:
    print("--- Counting lot-controlled components from the BOM summary ---")
    
    try:
        # --- 1. HAVING clause (applied as a WHERE filter first for efficiency) ---
        # Filter for rows where MMINDI is not 0
        filtered_df = B11_PMMTNO_group[B11_PMMTNO_group['MMINDI'] != 0].copy()

        # MODIFIED: Added 'Product' to the grouping columns
        grouping_cols = ['PlannerCode', 'Product', 'MMACRF', 'MMINDI']
        
        # --- 3. Chain all operations: Group, Count, Rename, and Sort ---
        B12_BOM_Lot_Ctr = (
            filtered_df.groupby(grouping_cols)['PMMTNO']
            .count()
            .reset_index()
            .rename(columns={'PMMTNO': 'CntLotCtrl'})
            .sort_values(by=['PlannerCode', 'Product', 'MMACRF', 'MMINDI'])
            .reset_index(drop=True)
        )
        
        print("✅ Summary of lot-controlled components created successfully.")
        display(B12_BOM_Lot_Ctr.head())

    except Exception as e:
        print(f"❌ An error occurred: {e}")

else:
    print("⚠️ B11_PMMTNO_group DataFrame is empty, skipping operation.")

print(f"The number of rows is: {len(B12_BOM_Lot_Ctr)}")

--- Counting lot-controlled components from the BOM summary ---
✅ Summary of lot-controlled components created successfully.


Unnamed: 0,PlannerCode,Product,MMACRF,MMINDI,CntLotCtrl
0,MP-5310,FA-Optp Generally,.4S,3.0,6
1,MP-5310,Radar,.2,3.0,1
2,MP-5310,Radar,.4B,1.0,1
3,MP-5310,Radar,.4B,3.0,2
4,MP-5310,Radar,.4S,3.0,11


The number of rows is: 25


### Step 23: Fetch Filtered Employee List
This step creates the E01_Employee DataFrame by querying the `MVXCDTA_CEAEMP` table from the database. The goal is to retrieve a specific list of employees who work in facility MF1 and belong to planner groups starting with "42", "43", or "45".

In [49]:
# Initialize the final DataFrame
E01_Employee = pd.DataFrame()

if PLANNING_AREA:
    employee_area_prefix = PLANNING_AREA[0][3:5]
else:
    employee_area_prefix = '' # Handle empty list case
    
# This SQL query will be executed directly against the database
employee_sql_query = f"""
SELECT 
    EACONO, EADIVI, EAEMNO, EAEMNM, EACANO, EAFACI, 
    EAPLGR, EADEPT, EAREAR, EAACEM
FROM 
    MVXCDTA.CEAEMP
WHERE 
    EACONO = 1
    AND EAFACI = '{FACILITY_CODE}'
     AND (EAPLGR LIKE '{employee_area_prefix}%')
    AND EAACEM = 1
"""

connection = get_db_connection('v12live') # Assuming 'v12live' is the correct DSN
if connection:
    try:
        print("--- Fetching employee data from the database ---")
        E01_Employee = pd.read_sql_query(employee_sql_query, connection)
        
        # Standardize column names to lowercase for consistency in pandas
        E01_Employee.columns = E01_Employee.columns.str.lower()
        
        print(f"✅ Query successful! Found {len(E01_Employee)} matching employees.")
        display(E01_Employee.head())
        
    except Exception as e:
        print(f"❌ Error executing database query: {e}")
    finally:
        connection.close()
        print("🔌 Connection closed.")
else:
    print("⚠️ Could not connect to the database. Cannot fetch employee data.")

print(f"The number of rows is: {len(E01_Employee)}")

✅ Connection successful to 'v12live' (v12Live).
--- Fetching employee data from the database ---


  E01_Employee = pd.read_sql_query(employee_sql_query, connection)


✅ Query successful! Found 87 matching employees.


Unnamed: 0,eacono,eadivi,eaemno,eaemnm,eacano,eafaci,eaplgr,eadept,earear,eaacem
0,1.0,100,MF10002781,Zalina Bte Ishak,2781.0,MF1,5310,A-US1,2-US,1.0
1,1.0,100,MF10002784,Soh Siew Eng,2784.0,MF1,5310,A-US1,2-US,1.0
2,1.0,100,MF10003654,Cheah Mary,3654.0,MF1,5310,A-US1,1-US,1.0
3,1.0,100,MF10003662,Seah Geok Moi,3662.0,MF1,5310,A-US1,1-US,1.0
4,1.0,100,MF10003670,Eng Di Hoon,3670.0,MF1,5310,A-US1,2-US,1.0


🔌 Connection closed.
The number of rows is: 87


### Step 24: Fetch Serial Number Request Data
This step creates the `S01_SerialNo_Request` DataFrame. It fetches all serial number requests made since the start of 2023 for the products in our `T_A00` list. This involves querying and joining two new tables, `PFXCDTA_SN5REQ` and `PFXCDTA_SN5RHD`, and then linking the results back to our initial product list.

In [50]:
# Initialize the final DataFrame
S01_SerialNo_Request = pd.DataFrame()

if not T_A00.empty:
    print("--- Fetching Serial Number Request data from the database ---")
    
    # 1. Get the list of products from T_A00 to use in our query
    products_in_scope = T_A00['poprno'].unique().tolist()
    formatted_products = ", ".join([f"'{p}'" for p in products_in_scope])

    # 2. Construct a targeted SQL query to get data from the two new tables
    sn_sql_query = f"""
    SELECT
        T1.SRRFUS, T1.SRRFID, T1.SRFACI, T1.SRWHLO, T1.SRRORC, T1.SRRONO,
        T1.SRITNO, T1.SRRGDT, T1.SRTPPF, T1.SRDOID, T1.SRTYP1, T1.SRTYP2,
        T1.SRTYP3, T1.SRTYP4, T1.SRNQTY, T1.SRREFT,
        T2.SAFRPF, T2.SATOPF, T2.SAFRT1, T2.SATOT1
    FROM
        PFXCDTA.SN5REQ AS T1
    INNER JOIN
        PFXCDTA.SN5RHD AS T2 ON T1.SRRFID = T2.SARFID
    WHERE
        T1.SRRGDT > 20230000
        AND T1.SRITNO IN ({formatted_products})
    """
    
    df_sn_data = pd.DataFrame()
    connection = get_db_connection('v12live') # Assuming 'v12live' DSN
    if connection:
        try:
            print(f"Fetching serial number requests for {len(products_in_scope)} products...")
            df_sn_data = pd.read_sql_query(sn_sql_query, connection)
            df_sn_data.columns = df_sn_data.columns.str.lower()
            print(f"✅ Query successful! Found {len(df_sn_data)} matching requests.")
        except Exception as e:
            print(f"❌ Error executing database query: {e}")
        finally:
            connection.close()
            print("🔌 Connection closed.")
            
    # 3. Join the new data with T_A00 to add PlannerCode and other details
    if not df_sn_data.empty:
        # MODIFIED: Added 'Product' to the subset
        t_a00_subset = T_A00[['poprno', 'mbresp', 'mmitds', 'Product']]
        
        merged_df = pd.merge(
            left=df_sn_data,
            right=t_a00_subset,
            left_on='sritno',
            right_on='poprno',
            how='inner'
        )
        
        # MODIFIED: Added 'Product' to the final column map
        final_columns_map = {
            'mbresp': 'PlannerCode', 'Product': 'Product', 'srrfus': 'SRRFUS', 'srrfid': 'SRRFID', 'srfaci': 'SRFACI',
            'srwhlo': 'SRWHLO', 'srrorc': 'SRRORC', 'srrono': 'SRRONO', 'sritno': 'SRITNO',
            'poprno': 'POPRNO', 'mmitds': 'MMITDS', 'srrgdt': 'SRRGDT', 'srtppf': 'SRTPPF',
            'srdoid': 'SRDOID', 'srtyp1': 'SRTYP1', 'srtyp2': 'SRTYP2', 'srtyp3': 'SRTYP3',
            'srtyp4': 'SRTYP4', 'srnqty': 'SRNQTY', 'srreft': 'SRREFT', 'safrpf': 'SAFRPF',
            'satopf': 'SATOPF', 'safrt1': 'CustomerSN', 'satot1': 'SATOT1'
        }
        
        S01_SerialNo_Request = merged_df.rename(columns=final_columns_map)
        
        # Ensure correct column order
        S01_SerialNo_Request = S01_SerialNo_Request[list(final_columns_map.values())]
        
        print("\n✅ Final S01_SerialNo_Request DataFrame created successfully.")
        display(S01_SerialNo_Request.head())
    else:
        print("\n⚠️ No serial number request data was fetched from the database.")
else:
    print("⚠️ T_A00 is empty, cannot fetch serial number data.")

print(f"The number of rows is: {len(S01_SerialNo_Request)}")

--- Fetching Serial Number Request data from the database ---
✅ Connection successful to 'v12live' (v12Live).
Fetching serial number requests for 555 products...


  df_sn_data = pd.read_sql_query(sn_sql_query, connection)


✅ Query successful! Found 17 matching requests.
🔌 Connection closed.

✅ Final S01_SerialNo_Request DataFrame created successfully.


Unnamed: 0,PlannerCode,Product,SRRFUS,SRRFID,SRFACI,SRWHLO,SRRORC,SRRONO,SRITNO,POPRNO,...,SRTYP1,SRTYP2,SRTYP3,SRTYP4,SRNQTY,SRREFT,SAFRPF,SATOPF,CustomerSN,SATOT1
0,MP-5310,Ultraschall,FBD4A9E57937,45BB3A1B-BCA0-4FEC-BD11-50AB9DC1AA6C,MF1,MF1,1.0,7003520000.0,304928-100001,304928-100001,...,0.0,0.0,0.0,0.0,1.0,SDC005,...,...,...,...
1,MP-5310,Ultraschall,FBD4A9E57937,C98A8576-F065-4E7D-83FF-57907C60EB82,MF1,MF1,1.0,7003520000.0,304928-100001,304928-100001,...,0.0,0.0,0.0,0.0,1.0,SDC005,...,...,...,...
2,MP-5310,Ultraschall,FBD4A9E57937,15CCDC17-8E6D-4C5D-B920-2D2FB02074CF,MF1,MF1,1.0,7003520000.0,304928-100001,304928-100001,...,0.0,0.0,0.0,0.0,1.0,SDC005,...,...,...,...
3,MP-5310,Ultraschall,FBD4A9E57937,D38559BB-E0A0-483E-A5F3-169694CF16B8,MF1,MF1,1.0,7003520000.0,304928-100001,304928-100001,...,0.0,0.0,0.0,0.0,1.0,SDC005,...,...,...,...
4,MP-5310,Ultraschall,FBD4A9E57937,D7A78F52-D194-41DD-BC31-C5E3C7735FA2,MF1,MF1,1.0,7003520000.0,304928-100001,304928-100001,...,0.0,0.0,0.0,0.0,1.0,SDC005,...,...,...,...


The number of rows is: 17


### Step 25: Count Product with print out serial number
This step creates the `S02_Product_Cnt_req` DataFrame. It aggregates the data from the previous step to count how many individual serial number requests (`SRRFUS`) were made for each unique combination of `PlannerCode`, `SRITNO` (Item Number), `MMITDS` (Description), and `SRDOID` (Document ID). From here we can identify how many item needs to print out serial number.

In [51]:
# Initialize the final DataFrame
S02_Product_Cnt_req = pd.DataFrame()

if not S01_SerialNo_Request.empty:
    print("--- Counting serial number requests per product and document ---")
    
    try:
        # MODIFIED: Added 'Product' to the grouping columns
        grouping_cols = [
            'PlannerCode', 
            'Product',
            'SRITNO', 
            'MMITDS', 
            'SRDOID'
        ]
        
        # Group by the specified columns, count the requests, and clean up the DataFrame
        S02_Product_Cnt_req = (
            S01_SerialNo_Request.groupby(grouping_cols)['SRRFUS']
            .count()
            .reset_index()
            .rename(columns={'SRRFUS': 'AnzahlvonSRRFUS'})
        )
        S02_Product_Cnt_req = S02_Product_Cnt_req.sort_values(['PlannerCode', 'Product', 'SRITNO'])
        print("✅ Request count summary created successfully.")
        display(S02_Product_Cnt_req.head())

    except Exception as e:
        print(f"❌ An error occurred: {e}")

else:
    print("⚠️ S01_SerialNo_Request DataFrame is empty, skipping operation.")

print(f"The number of rows is: {len(S02_Product_Cnt_req)}")

--- Counting serial number requests per product and document ---
✅ Request count summary created successfully.


Unnamed: 0,PlannerCode,Product,SRITNO,MMITDS,SRDOID,AnzahlvonSRRFUS
0,MP-5310,Ultraschall,304928-100001,UC500-18GS-IUEP-IO-V15,,8
1,MP-5310,Ultraschall,70146027,UB2000-F42-E4-V15-Y70146027,,9


The number of rows is: 2


### Step 26: Trim Whitespace from All DataFrames

Before exporting, this step ensures all text data is clean by removing any leading or trailing whitespace from every text column in every DataFrame that will be saved to Excel.

In [52]:
def trim_dataframe_strings(df):
    """Iterates over columns of a DataFrame and trims whitespace from object/string types."""
    df_copy = df.copy()
    for col in df_copy.columns:
        if df_copy[col].dtype == 'object':
            df_copy[col] = df_copy[col].str.strip()
    return df_copy

# A dictionary mapping the original DataFrame name to its trimmed version
dataframes_to_export = {
    # Initial Data
    "Product_List": T_A00,
    "Routing": T_A10,
    "Doc_Usage": T_A11,
    
    # Initial Analysis
    "Variant_Check": A01_Variant_Check,
    "ACO_Check": A02_ACO_Check,
    "LotControled": A03_LotControled,
    "PLCCheck": A03_PLCCheck,
    
    # EDM Document Analysis
    "Document": A12_DoKID_EDM,
    "Labels": A16_EDM_Doc_CAB_Labels,
    "916-": A17_916_Files,
    
    # Work Center Analysis
    "AWC": A21_AprisoWorkCenter,
    
    # Merging Step Analysis
    "MergingStep": A31_Merging_XStep,
    "MergingStepMoreThanOne": A31_Merging_XStep_More_Than_One,

    # Bill of Materials (BOM) Analysis
    "BOM": B10_BOM,
    "Raw Material with Lot control": B12_BOM_Lot_Ctr,
    
    # Employee Data
    "Employee": E01_Employee,
    
    # Serial Number Analysis
    "ProductWithSerial": S02_Product_Cnt_req
}

print("--- Trimming whitespace from all dataframes for export ---")
trimmed_dataframes = {}
for name, df in dataframes_to_export.items():
    if not df.empty:
        trimmed_dataframes[name] = trim_dataframe_strings(df)
    else:
        trimmed_dataframes[name] = df # Keep empty dataframes as they are
print("✅ Trimming complete.")

--- Trimming whitespace from all dataframes for export ---
✅ Trimming complete.


### Final Step: Export All Results to Excel
This last piece of code will gather every DataFrame we've created throughout this analysis and save them into one Excel file named `analysis_output.xlsx`. Each DataFrame will be placed on its own worksheet, with the sheet name corresponding to the DataFrame's name.

This creates a complete, organized report of your entire product analysis that you can easily share or review.

In [None]:
# --- Final Step: Update Excel File, Create Tables, and Protect Pivot Sheet ---

import pandas as pd
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.worksheet.table import Table, TableStyleInfo
import re
import os

# Use the 'trimmed_dataframes' dictionary from the previous step
# It already contains all the dataframes ready for export.

# Set the output filename
output_filename = 'analysis_output.xlsx'

# --- NEW: Check if the file exists. If not, create it with a blank 'pivot' sheet. ---
if not os.path.exists(output_filename):
    print(f"File '{output_filename}' not found. Creating a new file with a 'pivot' sheet.")
    # Create a new workbook object
    book = Workbook()
    # Create the pivot sheet
    book.create_sheet("pivot")
    # Remove the default "Sheet" that openpyxl creates
    if 'Sheet' in book.sheetnames:
        del book['Sheet']
    # Save the new, empty workbook structure
    book.save(output_filename)
# -------------------------------------------------------------------------------------

try:
    # We use mode='a' (append) to open the existing file.
    # 'if_sheet_exists='replace'' tells pandas to overwrite the sheet's contents if it exists.
    with pd.ExcelWriter(
        output_filename,
        engine='openpyxl',
        mode='a',
        if_sheet_exists='replace'
    ) as writer:

        print(f"--- Updating worksheets in {output_filename} ---")
        
        # Loop through each dataframe that needs to be exported
        for sheet_name, df in trimmed_dataframes.items():
            
            # --- REQUIREMENT: Do not touch the 'pivot' sheet ---
            if sheet_name.lower() in ('pivot', 'summary'):
                print(f"Skipping protected sheet: 'pivot'")
                continue

            if not df.empty:
                print(f"Writing sheet: {sheet_name} ({len(df)} rows)")
                # Write the dataframe to the specific sheet
                df.to_excel(writer, sheet_name=sheet_name, index=False)

                # --- REQUIREMENT: Create a named Excel Table ---
                worksheet = writer.sheets[sheet_name]
                last_col = get_column_letter(df.shape[1])
                last_row = df.shape[0] + 1
                table_range = f"A1:{last_col}{last_row}"

                # Create a robust, valid Excel table name
                # 1. Remove any character that is not a letter, number, or underscore
                table_name = re.sub(r'[^A-Za-z0-9_]', '', sheet_name)
                # 2. If the sanitized name starts with a number, prepend a letter
                if table_name and table_name[0].isdigit():
                    table_name = 'T_' + table_name

                # Create the Table object with the fully sanitized name.
                excel_table = Table(displayName=table_name, ref=table_range)

                # Add a default style to the table
                style = TableStyleInfo(
                    name="TableStyleMedium9", showFirstColumn=False, showLastColumn=False,
                    showRowStripes=True, showColumnStripes=False
                )
                excel_table.tableStyleInfo = style

                # Add the table to the worksheet
                worksheet.add_table(excel_table)
            else:
                print(f"Skipping empty sheet: {sheet_name}")
                # If a sheet for an empty dataframe exists in the file, we can choose to remove it
                if sheet_name in writer.book.sheetnames:
                    del writer.book[sheet_name]

    print(f"\n✅ Export complete! Your file '{output_filename}' has been updated.")

except Exception as e:
    print(f"❌ An error occurred during export: {e}")
    print("Please make sure you have 'openpyxl' installed.")
    
# --- TIMER END ---
end_time = time.time()
total_time = end_time - start_time
print("\n--- Total Execution Time ---")
print(f"The entire notebook took {total_time:.2f} seconds to run.")
# -----------------

File 'analysis_output.xlsx' not found. Creating a new file with a 'pivot' sheet.
--- Updating worksheets in analysis_output.xlsx ---
Writing sheet: Product_List (555 rows)
Writing sheet: Routing (18504 rows)
Writing sheet: Doc_Usage (2765 rows)
Writing sheet: Variant_Check (9 rows)
Writing sheet: ACO_Check (10 rows)
Writing sheet: LotControled (18 rows)
Writing sheet: PLCCheck (22 rows)
Writing sheet: Document (2758 rows)
Writing sheet: Labels (69 rows)
Writing sheet: 916- (16 rows)
Writing sheet: AWC (96 rows)
Writing sheet: MergingStep (279 rows)
Writing sheet: MergingStepMoreThanOne (44 rows)
Writing sheet: BOM (10801 rows)
Writing sheet: Raw Material with Lot control (25 rows)
Writing sheet: Employee (87 rows)
Writing sheet: ProductWithSerial (2 rows)

✅ Export complete! Your file 'analysis_output.xlsx' has been updated.

--- Total Execution Time ---
The entire notebook took 510.71 seconds to run.
