### SEMICONDUCTOR MANUFACTURING PRODUCT QUALITY PREDICTION

**Highlights:**
 <br> 1. Unlike a classical machine-learning pipeline, this workflow
 <br> 1.A: applies no outlier treatment to the dataset, as the auto-encoder neural network is treated as insensitive to outliers
 <br> 1.B: does not address multicollinearity among the descriptors separately, as the auto-encoder performs a rigorous dimensionality reduction (quicker than PCA)
 <br> 2. Both types of Keras model are illustrated: the Sequential API and the Functional API
 <br> 3. All of the data is gathered from sensors in real time
 <br> 4. The dataset is labelled 'Product Quality', where 1: Good and 0: Bad
 <br> 5. An auto-encoder neural network is used as the dimensionality reduction technique
 <br> 6. A support vector machine uses the encoded data to predict product quality

Import the required libraries

In [234]:
import pandas as pd
import numpy as np
import tensorflow as tf
# import every Keras symbol from tensorflow.keras so the two Keras namespaces are not mixed
from tensorflow.keras import layers, regularizers
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model, Sequential

Functions used

In [136]:
def getRedundantColumns(X):
    """Return the names of columns that exactly duplicate an earlier column."""
    RedundantColumns = set()
    for loc in range(X.shape[1]):
        tocomparecolumn = X.iloc[:, loc]
        for nextloc in range(loc + 1, X.shape[1]):
            comparewithcolumn = X.iloc[:, nextloc]
            if tocomparecolumn.equals(comparewithcolumn):
                # the earlier column is kept; only the later copy is flagged
                RedundantColumns.add(X.columns.values[nextloc])
    return list(RedundantColumns)
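
As a quick check of what `getRedundantColumns` flags, the sketch below runs it on a small made-up DataFrame. The column names `a`, `b` and `c` are illustrative and do not come from the SMPQ data.

```python
import pandas as pd

def getRedundantColumns(X):
    """Return the names of columns that exactly duplicate an earlier column."""
    RedundantColumns = set()
    for loc in range(X.shape[1]):
        tocomparecolumn = X.iloc[:, loc]
        for nextloc in range(loc + 1, X.shape[1]):
            comparewithcolumn = X.iloc[:, nextloc]
            if tocomparecolumn.equals(comparewithcolumn):
                RedundantColumns.add(X.columns.values[nextloc])
    return list(RedundantColumns)

# toy data: column 'c' is an exact copy of column 'a'
toy = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [1, 2, 3]})
redundant = getRedundantColumns(toy)
print(redundant)
```

Only the later copy (`c`) is reported, so the first occurrence of each duplicated column survives. The nested loop compares every pair of columns, so the run time grows quadratically with the number of columns.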

#### Data Preparation

In [194]:
dataset = pd.read_csv('SMPQ.csv') 
dataset.isnull().sum()   # counts the missing values in each column

Time          0
0             6
1             7
2            14
3            14
             ..
586           1
587           1
588           1
589           1
Pass/Fail     0
Length: 592, dtype: int64

In [195]:
dataset.replace('', np.nan, inplace=True)    # replace missing values across the dataset with NaN
dataset = dataset.fillna(dataset.median(numeric_only=True))   # fill the NaN values in each numeric column with that column's median
# dataset.isnull().sum() # to check the number of missing values per column afterwards
print('Actual Dataset dimension:',dataset.shape)

Actual Dataset dimension: (1567, 592)
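
The median imputation can be illustrated on a tiny made-up frame. This is a minimal sketch: the columns `Time`, `s1` and `s2` and their values are invented for illustration, and `numeric_only=True` is used so that the text column is skipped when the medians are computed.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Time': ['t1', 't2', 't3', 't4'],      # non-numeric column, left untouched
    's1': [1.0, np.nan, 3.0, 5.0],         # median of the observed values is 3.0
    's2': [10.0, 20.0, np.nan, 40.0],      # median of the observed values is 20.0
})

medians = toy.median(numeric_only=True)    # one median per numeric column, NaN ignored
filled = toy.fillna(medians)               # each NaN takes the median of its own column
print(filled)
```

Because `fillna` receives a Series indexed by column name, every gap is filled with the median of its own column rather than with one global value.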


In [196]:
RedundantColumns = np.array(getRedundantColumns(dataset))   # Identifying the duplicate columns
RedundantColumns

array(['515', '237', '258', '380', '402', '374', '231', '313', '178',
       '508', '481', '191', '371', '414', '535', '514', '462', '534',
       '266', '229', '403', '533', '243', '502', '260', '373', '265',
       '97', '512', '397', '52', '236', '507', '240', '501', '505', '513',
       '263', '400', '186', '458', '329', '69', '325', '233', '315',
       '466', '451', '401', '179', '261', '230', '256', '504', '327',
       '372', '422', '190', '330', '226', '276', '503', '531', '328',
       '189', '192', '379', '404', '449', '529', '399', '464', '509',
       '194', '375', '369', '242', '463', '530', '536', '532', '370',
       '264', '141', '241', '394', '259', '322', '398', '193', '381',
       '450', '234', '528', '262', '537', '149', '395', '364', '465',
       '284', '314', '326', '257', '235', '498', '506', '232', '378',
       '538', '396', '461'], dtype='<U3')

In [197]:
dataset = dataset.T.drop_duplicates().T    # removes all of the above duplicate columns 
print('Dataset dimension after removing duplicate columns:',dataset.shape) 

Dataset dimension after removing duplicate columns: (1567, 480)
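
One side effect of the transpose round trip is worth knowing: when the frame mixes text and numeric columns, `dataset.T` has `object` dtype, and the columns stay `object` after transposing back. The sketch below, on made-up data, contrasts that with a boolean-mask variant that keeps the original dtypes. The mask approach is offered as an alternative and is not what the notebook uses.

```python
import pandas as pd

toy = pd.DataFrame({
    'Time': ['t1', 't2', 't3'],
    'a': [1, 2, 3],
    'b': [4, 5, 6],
    'c': [1, 2, 3],        # exact copy of 'a'
})

# notebook approach: transpose, drop duplicate rows, transpose back
round_trip = toy.T.drop_duplicates().T

# alternative: mark duplicated columns, then select the others
is_duplicate = toy.T.duplicated()          # boolean Series indexed by column name
deduped = toy.loc[:, ~is_duplicate]

print(round_trip.dtypes)
print(deduped.dtypes)
```

Both variants keep the same columns. In the notebook the later cast to `float32` restores a numeric dtype for the descriptors, so the difference matters mainly if the frame is inspected or used before that cast.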


In [198]:
dataset = dataset.drop(['Time'], axis=1)  # Drop the 'Time' column as it is not needed

dataconsistency = dataset.nunique()       # number of distinct values in each column
constant_columns = dataconsistency[dataconsistency == 1].index   # columns whose value is identical in every row
dataset = dataset.drop(constant_columns, axis=1)   # drop the columns with no data variation
print('Dataset dimension after removing useless columns:',dataset.shape)

Dataset dimension after removing useless columns: (1567, 475)
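
The constant-column filter can be checked on a small made-up frame. In this sketch the column names and values are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    's1': [0.5, 0.7, 0.9],
    's2': [100, 100, 100],     # same reading in every row, carries no information
    'label': [-1, 1, -1],
})

distinct_counts = toy.nunique()                                   # distinct values per column
constant_columns = distinct_counts[distinct_counts == 1].index    # columns with a single value
trimmed = toy.drop(constant_columns, axis=1)
print(list(constant_columns), trimmed.shape)
```

`nunique()` ignores NaN by default, which is harmless here because the missing values were already imputed in an earlier cell.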


In [226]:
# Separate the class variable from the descriptors, because normalization must be applied only to the non-class variables
Descriptors = dataset.iloc[:,:-1].values      
Class = pd.DataFrame(dataset.iloc[:,-1].values)
Class = Class.rename(columns={Class.columns[0]: 'Product Quality'})  # Renaming the label
Class['Product Quality'] = Class['Product Quality'].replace([-1,1],[1,0])    # 1: Good, 0: Bad
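
The relabelling turns the raw code -1 into 1 (Good) and the raw code 1 into 0 (Bad). A minimal sketch of the same mapping on made-up labels follows. The names `raw` and `label_map` are illustrative, and `Series.map` with an explicit dictionary is shown as an alternative spelling, not as the notebook's own call.

```python
import pandas as pd

raw = pd.Series([-1, -1, 1, -1], name='Product Quality')   # made-up raw labels

label_map = {-1: 1, 1: 0}      # -1 -> 1 (Good), 1 -> 0 (Bad)
quality = raw.map(label_map)   # each value is looked up once in the dictionary
print(quality.tolist())
```

With a dictionary, every original value is looked up exactly once, so a -1 that becomes 1 cannot be mapped a second time to 0.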

In [227]:
Descriptors = np.asarray(Descriptors).astype(np.float32)  # cast to float32 before normalization
layer = tf.keras.layers.Normalization(axis=None)  # axis=None: a single scalar mean and variance for the whole array
layer.adapt(Descriptors)
Normalised_Data = layer(Descriptors)  # performs the normalization
Descriptors = pd.DataFrame(Normalised_Data.numpy())
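
With `axis=None`, the Keras `Normalization` layer standardizes all elements with one scalar mean and variance instead of one pair per feature. The NumPy sketch below mimics both behaviours on made-up data to show the difference. It is an illustration under that assumption and does not call Keras itself.

```python
import numpy as np

# two made-up sensors on very different scales
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])

# one scalar mean and standard deviation for the whole array (axis=None behaviour)
global_scaled = (X - X.mean()) / X.std()

# one mean and standard deviation per column (per-feature behaviour)
per_feature = (X - X.mean(axis=0)) / X.std(axis=0)

print(global_scaled.std(axis=0))   # the small-scale sensor is squashed to almost no spread
print(per_feature.std(axis=0))     # every sensor ends up with unit spread
```

If the sensors differ widely in scale, per-feature scaling (`axis=-1` in the Keras layer) may be worth comparing against the global setting used in the notebook.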

In [229]:
dataframes = [Descriptors, Class]              # merge the class variable column with the normalized descriptors columns
modified_dataset = pd.concat(dataframes, axis=1)

In [179]:
modified_dataset.to_csv(r'SMPQ ModifiedDataset.csv')   # export the filtered dataset

#### Model Building

Train the auto-encoder network (Functional API model) so that its bottleneck layer can serve as the reduced-dimensionality input to the prediction engine

In [233]:
Guiding_layer = Input(shape =(Descriptors.shape[1], ))

en_layer1 = Dense(300, activation ='tanh',activity_regularizer = regularizers.l1(0.01))(Guiding_layer)
en_Layer2 = Dense(150, activation ='tanh',activity_regularizer = regularizers.l1(0.01))(en_layer1)
en_Layer3 = Dense(75, activation ='tanh',activity_regularizer = regularizers.l1(0.01))(en_Layer2)
en_Layer4 = Dense(32, activation ='tanh',activity_regularizer = regularizers.l1(0.01))(en_Layer3)

Bottleneck_layer = Dense(15, activation ='relu')(en_Layer4)       # Compressed dimensionality
  

de_layer1 = Dense(32, activation ='tanh')(Bottleneck_layer)
de_layer2 = Dense(75, activation ='tanh')(de_layer1)
de_layer3 = Dense(150, activation ='tanh')(de_layer2)
de_layer4 = Dense(300, activation ='tanh')(de_layer3)
  
reconstructed_layer = Dense(Descriptors.shape[1], activation ='relu')(de_layer4)

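
To make the layer widths concrete, the sketch below pushes a random batch through untrained dense layers of the same sizes and activations, using NumPy only. The weights are random and the biases and regularizers are omitted, so it shows how the shape narrows to the 15-unit bottleneck and widens again, and says nothing about reconstruction quality. The feature count of 474 assumes the 475 cleaned columns minus the label column. The names `layer_spec` and `code` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 474   # assumption: 475 columns after cleaning, minus the label column

# (units, activation) for each dense layer, mirroring the notebook's architecture
layer_spec = [
    (300, 'tanh'), (150, 'tanh'), (75, 'tanh'), (32, 'tanh'),   # encoder
    (15, 'relu'),                                               # bottleneck
    (32, 'tanh'), (75, 'tanh'), (150, 'tanh'), (300, 'tanh'),   # decoder
    (n_features, 'relu'),                                       # reconstruction
]

h = rng.standard_normal((8, n_features))   # made-up batch of 8 samples
shapes = [h.shape]
code = None
for units, activation in layer_spec:
    W = rng.standard_normal((h.shape[1], units)) * 0.05   # random, untrained weights
    z = h @ W
    h = np.tanh(z) if activation == 'tanh' else np.maximum(z, 0.0)
    shapes.append(h.shape)
    if units == 15:
        code = h   # the compressed representation a downstream classifier would use

print(shapes)
```

The array captured at the bottleneck has one row per sample and 15 columns, which is the encoded form the highlights describe as input to the support vector machine.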