# DATA PREPROCESSING - MERGING THE SIMULATION DATA AND THE MOSFET PARAMETERS FROM THE DATASHEETS
#### Data Merging 
This is the main preprocessing and merging file.
For each MOSFET, the simulation data, the static simulation parameters, and the corresponding datasheet parameters are merged into a single file per device, with rows that contain missing values removed.
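
As a rough illustration of what "merged" means here, the sketch below repeats a single-row datasheet table once per simulation row and places it next to the simulation columns. The values and the `EXAMPLE_PART` identifier are made up for illustration and are not taken from any real file.

```python
import pandas as pd

# Toy stand-ins for the real CSV files; every value here is invented.
sim = pd.DataFrame({
    "Vbus": [400, 600, 800],
    "Rg": [2.5, 5.0, 10.0],
    "overshoot_pulse_1": [35.0, 52.0, 71.0],
})
datasheet = pd.DataFrame({
    "Part_Number": ["EXAMPLE_PART"],  # hypothetical identifier
    "VDS_max": [1200],
    "Ciss": [1000],
})

# Repeat the one datasheet row once per simulation row, then join the two
# tables column-wise so each simulation row carries the device parameters.
repeated = pd.concat([datasheet] * len(sim), ignore_index=True)
merged = pd.concat([sim.reset_index(drop=True), repeated], axis=1)

print(merged.shape)
```

Each simulation row ends up with the same datasheet values attached, which is what lets a later model see device properties and operating conditions in one table.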

---

#### 1. Simulation Data Inputs

| Parameter | Unit | Description |
|-----------|------|-------------|
| Vbus      | V    | DC bus voltage |
| Rg        | Ω    | Gate resistance |
| Ls4       | nH   | Parasitic inductance branch 4 |
| Ls5       | nH   | Parasitic inductance branch 5 |
| Ls6       | nH   | Parasitic inductance branch 6 |
| Ls7       | nH   | Parasitic inductance branch 7 |
| Ls8       | nH   | Parasitic inductance branch 8 |
| Ls9       | nH   | Parasitic inductance branch 9 |
| Ls10      | nH   | Parasitic inductance branch 10 |
| Ls11      | nH   | Parasitic inductance branch 11 |

---

#### 2. Simulation Outputs

| Parameter | Unit | Description |
|-----------|------|-------------|
| voltage_rise_time_pulse1  | ns  | Voltage rise time (Pulse 1) |
| voltage_rise_time_pulse2  | ns  | Voltage rise time (Pulse 2) |
| voltage_fall_time_pulse1  | ns  | Voltage fall time (Pulse 1) |
| voltage_fall_time_pulse2  | ns  | Voltage fall time (Pulse 2) |
| current_rise_time_pulse1  | ns  | Current rise time (Pulse 1) |
| current_rise_time_pulse2  | ns  | Current rise time (Pulse 2) |
| current_fall_time_pulse1  | ns  | Current fall time (Pulse 1) |
| current_fall_time_pulse2  | ns  | Current fall time (Pulse 2) |
| overshoot_pulse_1         | V   | Voltage overshoot (Pulse 1) |
| overshoot_pulse_2         | V   | Voltage overshoot (Pulse 2) |
| undershoot_pulse_1        | V   | Voltage undershoot (Pulse 1) |
| undershoot_pulse_2        | V   | Voltage undershoot (Pulse 2) |
| ringing_frequency_MHz     | MHz | Ringing frequency |

---

#### 3. Static Simulation Parameters

| Parameter | Unit | Description |
|-----------|------|-------------|
| Tp1    | ns  | Pulse 1 duration |
| Tp2    | ns  | Pulse 2 duration |
| Toff   | ns  | Turn-off time |
| Tstart | ns  | Simulation start time |
| L1     | nH  | Inductance L1 |
| L2     | nH  | Inductance L2 |
| L3     | nH  | Inductance L3 |
| R1     | Ω   | Resistance R1 |

---

#### 4. MOSFET Datasheet Parameters

| Parameter | Unit | Description |
|-----------|------|-------------|
| Part_Number   | -    | MOSFET identifier |
| VDS_max       | V    | Max drain-source voltage |
| ID_max_25C    | A    | Max drain current @ 25°C |
| RDS_on_typ    | mΩ   | Typical on-resistance |
| RDS_on_max    | mΩ   | Maximum on-resistance |
| VGS_th_min    | V    | Min gate threshold voltage |
| VGS_th_typ    | V    | Typical gate threshold voltage |
| VGS_th_max    | V    | Max gate threshold voltage |
| Qg_total      | nC   | Total gate charge |
| Qrr_typ       | nC   | Reverse recovery charge |
| Irrm_typ      | A    | Peak reverse recovery current |
| Eon_typ       | µJ   | Typical turn-on energy loss |
| Eoff_typ      | µJ   | Typical turn-off energy loss |
| Ciss          | pF   | Input capacitance |
| Coss          | pF   | Output capacitance |
| Crss          | pF   | Reverse transfer capacitance |
| Rth_JC_typ    | °C/W | Typical thermal resistance (junction–case) |
| Rth_JC_max    | °C/W | Max thermal resistance (junction–case) |
| Tj_max        | °C   | Max junction temperature |

---
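
The four tables above can double as a schema check on a merged file. The sketch below is one possible way to do that, assuming the CSV headers match the parameter names in the tables exactly; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical helper names, not part of the original workflow.

```python
import pandas as pd

# Column names copied from the four tables above, grouped by table.
# Assumption: the CSV headers use exactly these spellings.
EXPECTED_COLUMNS = {
    "simulation_inputs": ["Vbus", "Rg"] + [f"Ls{i}" for i in range(4, 12)],
    "simulation_outputs": [
        "voltage_rise_time_pulse1", "voltage_rise_time_pulse2",
        "voltage_fall_time_pulse1", "voltage_fall_time_pulse2",
        "current_rise_time_pulse1", "current_rise_time_pulse2",
        "current_fall_time_pulse1", "current_fall_time_pulse2",
        "overshoot_pulse_1", "overshoot_pulse_2",
        "undershoot_pulse_1", "undershoot_pulse_2",
        "ringing_frequency_MHz",
    ],
    "static_parameters": ["Tp1", "Tp2", "Toff", "Tstart", "L1", "L2", "L3", "R1"],
    "datasheet_parameters": [
        "Part_Number", "VDS_max", "ID_max_25C", "RDS_on_typ", "RDS_on_max",
        "VGS_th_min", "VGS_th_typ", "VGS_th_max", "Qg_total", "Qrr_typ",
        "Irrm_typ", "Eon_typ", "Eoff_typ", "Ciss", "Coss", "Crss",
        "Rth_JC_typ", "Rth_JC_max", "Tj_max",
    ],
}


def missing_columns(df):
    """Return {group: [absent column names]} for every group with gaps."""
    report = {}
    for group, names in EXPECTED_COLUMNS.items():
        absent = [name for name in names if name not in df.columns]
        if absent:
            report[group] = absent
    return report


# Usage on an empty frame that only has three of the expected columns.
demo = pd.DataFrame(columns=["Vbus", "Rg", "Part_Number"])
print(missing_columns(demo)["simulation_inputs"])
```

An empty dictionary from `missing_columns` would indicate that all listed columns are present in the frame.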


In [2]:
import os
import pandas as pd

# Input paths for the simulation data, the static parameters, and each MOSFET's datasheet parameters
dpt_folder = "DPT"
mosfet_param_folder = "MOSFET_Parameters"
dpt_static_file = "DPT_Static_Parameters.csv"
output_dir = "merged_mosfets_cleaned"

# The device ID is the part number
device_id_column = "Part_Number"

# Target columns to be predicted
target_columns = [
    'voltage_rise_time_pulse1', 'voltage_rise_time_pulse2',
    'voltage_fall_time_pulse1', 'voltage_fall_time_pulse2',
    'current_rise_time_pulse1', 'current_rise_time_pulse2',
    'current_fall_time_pulse1', 'current_fall_time_pulse2',
    'overshoot_pulse_1', 'overshoot_pulse_2',
    'undershoot_pulse_1', 'undershoot_pulse_2',
    'ringing_frequency_MHz'
]

# Create the output folder that stores the merged files
os.makedirs(output_dir, exist_ok=True)

# Load the DPT static values, which are appended to every simulation row
dpt_static = pd.read_csv(dpt_static_file)

# Process each MOSFET's simulation file in turn
print("Starting merge and clean for all MOSFETs...\n")

for filename in os.listdir(dpt_folder):
    if not filename.endswith(".csv"):
        continue

    part_name = filename.replace("DPT_", "").replace(".csv", "").strip()
    sim_path = os.path.join(dpt_folder, filename)
    mosfet_param_path = os.path.join(mosfet_param_folder, f"{part_name}_Parameters.csv")

    # Load the simulation data
    df = pd.read_csv(sim_path)

    # Separate the input columns from the target columns so that the
    # targets can be placed after all inputs in the merged file
    df_y = df[target_columns].copy()
    df_x = df.drop(columns=target_columns)

    # Repeat the static parameters once per simulation row and append them as columns
    dpt_static_repeated = pd.concat([dpt_static] * len(df_x), ignore_index=True)
    df_x = pd.concat([df_x.reset_index(drop=True), dpt_static_repeated], axis=1)

    # Load the MOSFET datasheet parameters and repeat them once per simulation row
    df_params = pd.read_csv(mosfet_param_path)
    df_params_repeated = pd.concat([df_params] * len(df_x), ignore_index=True)

    # Build the complete merged DataFrame: inputs, datasheet parameters, then targets
    df_full = pd.concat([df_x, df_params_repeated, df_y], axis=1)

    # Drop every row that contains a missing value
    before = len(df_full)
    df_full = df_full.dropna()
    after = len(df_full)
    dropped = before - after
    # Report the number of rows dropped for each MOSFET
    print(f" {part_name}: Dropped {dropped} rows with missing values. Remaining: {after}")

    # Save the cleaned file for review and manual checks in the next steps
    output_path = os.path.join(output_dir, f"{part_name}_merged.csv")
    df_full.to_csv(output_path, index=False)
    print(f"Saved cleaned file: {output_path}")

print("\n All MOSFET datasets processed and saved without missing values.")


Starting merge and clean for all MOSFETs...

 C2M0025120D: Dropped 384 rows with missing values. Remaining: 158082
Saved cleaned file: merged_mosfets_cleaned\C2M0025120D_merged.csv
 C2M0040120D: Dropped 1785 rows with missing values. Remaining: 404934
Saved cleaned file: merged_mosfets_cleaned\C2M0040120D_merged.csv
 C2M0080120D: Dropped 172 rows with missing values. Remaining: 410961
Saved cleaned file: merged_mosfets_cleaned\C2M0080120D_merged.csv
 C2M0160120D: Dropped 67 rows with missing values. Remaining: 158538
Saved cleaned file: merged_mosfets_cleaned\C2M0160120D_merged.csv
 C2M0280120D: Dropped 187 rows with missing values. Remaining: 429245
Saved cleaned file: merged_mosfets_cleaned\C2M0280120D_merged.csv
 C2M1000170D: Dropped 27 rows with missing values. Remaining: 174037
Saved cleaned file: merged_mosfets_cleaned\C2M1000170D_merged.csv

 All MOSFET datasets processed and saved without missing values.
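
The log above reports how many rows were dropped per device, but not which columns contained the missing values. A minimal sketch of a per-column report that could be run just before `dropna()` is shown below; `missing_value_report` is a hypothetical helper and the toy values are invented.

```python
import numpy as np
import pandas as pd


def missing_value_report(df):
    """Return the NaN count per column, only for columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Toy frame standing in for one merged device table; values are made up.
toy = pd.DataFrame({
    "Vbus": [400, 600, 800, 800],
    "overshoot_pulse_1": [35.0, np.nan, 71.0, np.nan],
    "ringing_frequency_MHz": [48.0, 51.0, np.nan, np.nan],
})

report = missing_value_report(toy)
rows_dropped = len(toy) - len(toy.dropna())

print(report)
print(f"Rows that dropna() would remove: {rows_dropped}")
```

A report like this can show whether the missing values cluster in a few output columns, for example a metric that could not be extracted from some waveforms, or are spread across the table.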
