<a href="https://colab.research.google.com/github/yashfirkedata/DL-Critical-Heat-Flux-Prediction/blob/main/Critical_Heat_Flux_Prediction_Modelling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import numpy as np

In [2]:
df= pd.read_csv('https://raw.githubusercontent.com/yashfirkedata/DL-Critical-Heat-Flux-Prediction/main/Data_CHF_Zhao_2020_ATE.csv')
df

| | id | author | geometry | pressure [MPa] | mass_flux [kg/m2-s] | x_e_out [-] | D_e [mm] | D_h [mm] | length [mm] | chf_exp [MW/m2] |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Inasaka | tube | 0.39 | 5600 | -0.1041 | 3.0 | 3.0 | 100 | 11.3 |
| 1 | 2 | Inasaka | tube | 0.31 | 6700 | -0.0596 | 3.0 | 3.0 | 100 | 10.6 |
| 2 | 3 | Inasaka | tube | 0.33 | 4300 | -0.0395 | 3.0 | 3.0 | 100 | 7.3 |
| 3 | 4 | Inasaka | tube | 0.62 | 6400 | -0.1460 | 3.0 | 3.0 | 100 | 12.8 |
| 4 | 5 | Inasaka | tube | 0.64 | 4700 | -0.0849 | 3.0 | 3.0 | 100 | 11.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1860 | 1861 | Richenderfer | plate | 1.01 | 1500 | -0.0218 | 15.0 | 120.0 | 10 | 9.4 |
| 1861 | 1862 | Richenderfer | plate | 1.01 | 1500 | -0.0434 | 15.0 | 120.0 | 10 | 10.4 |
| 1862 | 1863 | Richenderfer | plate | 1.01 | 2000 | -0.0109 | 15.0 | 120.0 | 10 | 10.8 |
| 1863 | 1864 | Richenderfer | plate | 1.01 | 2000 | -0.0218 | 15.0 | 120.0 | 10 | 10.9 |


# **Data Preprocessing**

In [3]:
df = df.drop(['id'],axis=1)

In [4]:
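# Replace the unit-annotated headers with plain snake_case names (same column order).
# Note: x_e_out is the outlet (exit) equilibrium quality; it is labelled 'exit_concentration' in this notebook.
new_column_names = ['author', 'geometry', 'pressure', 'mass_flux', 'exit_concentration', 'equivalent_diameter', 'hydraulic_diameter', 'channel_length', 'exp_critical_heat_flux']
df.columns = new_column_names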

In [5]:
df.drop(["author"],axis=1,inplace=True)
# EDA showed that three different authors each contributed the data for a separate geometry study, so 'author' can be dropped

In [6]:
df.drop(["geometry"],axis=1,inplace=True)
# EDA indicated that geometry is of little importance, so it is dropped as well
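
The EDA itself is not shown in this notebook, but the relationship between `author` and `geometry` could be checked with a cross-tabulation before the two columns are dropped. The sketch below uses a small hypothetical frame (`toy`), not the real data; only the column names are taken from the notebook.

```python
import pandas as pd

# Hypothetical toy rows; only the column names come from the notebook.
toy = pd.DataFrame({
    "author": ["Inasaka", "Inasaka", "Richenderfer", "Richenderfer", "Richenderfer"],
    "geometry": ["tube", "tube", "plate", "plate", "plate"],
})

# Rows are authors, columns are geometries, cells count the measurements.
counts = pd.crosstab(toy["author"], toy["geometry"])

# An author is tied to a single geometry if exactly one cell in its row is non-zero.
one_geometry_per_author = bool(((counts > 0).sum(axis=1) == 1).all())

print(counts)
print(one_geometry_per_author)
```

Running the same `pd.crosstab` call on the full DataFrame (before cells 5 and 6) would show how strongly the two columns overlap.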

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1865 entries, 0 to 1864
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   pressure                1865 non-null   float64
 1   mass_flux               1865 non-null   int64  
 2   exit_concentration      1865 non-null   float64
 3   equivalent_diameter     1865 non-null   float64
 4   hydraulic_diameter      1865 non-null   float64
 5   channel_length          1865 non-null   int64  
 6   exp_critical_heat_flux  1865 non-null   float64
dtypes: float64(5), int64(2)
memory usage: 102.1 KB


In [8]:
df.duplicated().any()

True

In [9]:
df.drop_duplicates(inplace = True)

In [10]:
df.duplicated().any()

False
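
Because `id`, `author` and `geometry` were dropped first, a "duplicate" here means a row that is identical across the seven remaining columns. The sketch below, on a hypothetical two-column frame, shows how the number of such rows could be counted before removal. It also shows that `drop_duplicates` keeps the original index labels unless the index is reset; resetting is optional and is not done in the notebook.

```python
import pandas as pd

# Hypothetical toy frame: rows 0 and 1 are exact duplicates.
toy = pd.DataFrame({
    "pressure": [0.39, 0.39, 0.31],
    "mass_flux": [5600, 5600, 6700],
})

# duplicated() flags every repeat after the first occurrence.
n_dupes = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence and the original index labels.
deduped = toy.drop_duplicates()
kept_labels = list(deduped.index)

# Optional: renumber the rows so the index is contiguous again.
deduped_reset = deduped.reset_index(drop=True)

print(n_dupes, kept_labels, list(deduped_reset.index))
```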

In [11]:
df.isna().sum().any()

False

**As suggested in the paper, the features should be standardized to approximately zero mean and unit variance.**
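
For reference, this is the usual z-score transformation. The symbols below are notation assumed here rather than taken from the paper: $x_{ij}$ is the value of feature $j$ in sample $i$, and $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature $j$ over the $N$ samples used to fit the scaler.

$$
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j},
\qquad
\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij},
\qquad
\sigma_j^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_{ij} - \mu_j\right)^2
$$

The means and variances printed in the next two cells show why this matters: the raw features differ in scale by several orders of magnitude (compare `mass_flux` with `exit_concentration`).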

In [12]:
# mean of all the columns
df.mean()

pressure                     9.863929
mass_flux                 2895.929774
exit_concentration           0.018447
equivalent_diameter          9.443993
hydraulic_diameter          16.736100
channel_length             938.381312
exp_critical_heat_flux       3.878062
dtype: float64

In [13]:
# variance of all the columns
df.var()

pressure                  1.870167e+01
mass_flux                 2.705712e+06
exit_concentration        1.356412e-02
equivalent_diameter       4.202919e+01
hydraulic_diameter        4.796055e+02
channel_length            5.556513e+05
exp_critical_heat_flux    4.078113e+00
dtype: float64

In [14]:
from sklearn.model_selection import train_test_split

X = df.drop('exp_critical_heat_flux', axis=1)
y = df['exp_critical_heat_flux']

X_train, X_val_test, y_train, y_val_test = train_test_split(X, y, test_size=0.2, random_state=101, shuffle=True)
# Note: shuffle=True shuffles the rows before splitting; 80% goes to training and 20% is held out.

# Split the 20% hold-out into validation and test sets. With test_size=0.3 this gives
# roughly 14% of the data for validation and 6% for testing (test_size=0.25 would give 15% / 5%).
X_val, X_test, y_val, y_test = train_test_split(X_val_test, y_val_test, test_size=0.3, random_state=101, shuffle=True)
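
The effective proportions of a two-stage split are easy to misjudge, because the second `test_size` applies to the hold-out set rather than to the full data. A minimal sketch that checks the fractions on a hypothetical array of 1000 rows, using the same `test_size` and `random_state` values as the cell above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the feature matrix: 1000 rows, 1 column.
X_demo = np.arange(1000).reshape(-1, 1)

# Stage 1: 80% train, 20% hold-out.
X_tr, X_hold = train_test_split(X_demo, test_size=0.2, random_state=101, shuffle=True)

# Stage 2: 30% of the hold-out becomes the test set, the rest is validation.
X_va, X_te = train_test_split(X_hold, test_size=0.3, random_state=101, shuffle=True)

# Express each part as a fraction of the full data set.
fractions = {
    "train": len(X_tr) / len(X_demo),
    "val": len(X_va) / len(X_demo),
    "test": len(X_te) / len(X_demo),
}
print(fractions)
```

The printed fractions should come out close to 0.80 / 0.14 / 0.06, since 0.2 × 0.7 = 0.14 and 0.2 × 0.3 = 0.06.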

In [15]:
# Standardize the features to approximately zero mean and unit standard deviation.
# The scaler is fit on the training split only and then applied unchanged to the validation and test splits.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_val_scaled = scaler.transform(X_val)
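
As a sanity check, the sketch below verifies on synthetic data (not the CHF dataset) that `StandardScaler` reproduces the manual `(x - mean) / std` computation with the population standard deviation (`ddof=0`). It also shows why only the training split ends up with exactly zero mean and unit variance: the validation data is transformed with the training statistics. The array names and distribution parameters are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data on very different scales (illustrative values only).
rng = np.random.default_rng(0)
train = rng.normal(loc=[10.0, 3000.0], scale=[4.0, 1600.0], size=(200, 2))
val = rng.normal(loc=[10.0, 3000.0], scale=[4.0, 1600.0], size=(50, 2))

# Fit on the training data only, then reuse the fitted statistics.
scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)
val_scaled = scaler.transform(val)

# Manual z-score; NumPy's std() defaults to ddof=0, matching StandardScaler.
manual = (train - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(train_scaled, manual))
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(val_scaled.mean(axis=0), val_scaled.std(axis=0))  # close to 0 and 1, not exact
```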

In [16]:
X_train_scaled_df = pd.DataFrame(X_train_scaled, columns=X_train.columns)
X_val_scaled_df = pd.DataFrame(X_val_scaled, columns=X_val.columns)
X_test_scaled_df = pd.DataFrame(X_test_scaled, columns=X_test.columns)

# Save the datasets to CSV files
X_train_scaled_df.to_csv('X_train.csv', index=False)
X_val_scaled_df.to_csv('X_val.csv', index=False)
X_test_scaled_df.to_csv('X_test.csv', index=False)
y_train.to_csv('y_train.csv', index=False)
y_val.to_csv('y_val.csv', index=False)
y_test.to_csv('y_test.csv', index=False)
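
Since every file is written with `index=False`, features and targets are matched purely by row position when they are loaded again. A minimal round-trip sketch, using hypothetical toy values and a temporary directory instead of the notebook's working directory:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical toy split; the non-contiguous index mimics a shuffled, deduplicated frame.
X_demo = pd.DataFrame({"pressure": [-1.0, 0.0, 1.0]})
y_demo = pd.Series([11.3, 10.6, 7.3], index=[5, 2, 9], name="exp_critical_heat_flux")

with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, "X_train.csv")
    y_path = os.path.join(tmp, "y_train.csv")

    # index=False drops the index labels; only the row order is preserved.
    X_demo.to_csv(x_path, index=False)
    y_demo.to_csv(y_path, index=False)

    X_back = pd.read_csv(x_path)
    y_back = pd.read_csv(y_path)["exp_critical_heat_flux"]

# Row i of X_back corresponds to row i of y_back.
same_length = len(X_back) == len(y_back)
values_match = bool(np.allclose(y_back.to_numpy(), y_demo.to_numpy()))
print(same_length, values_match, list(y_back.index))
```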

> Data preprocessing is complete, and the data is now ready for modelling.

# **Modelling**