<a href="https://www.kaggle.com/code/tanavbajaj/neural-network-basic-using-tensorflow?scriptVersionId=100220038" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Neural Network Basics Using TensorFlow


# Importing the libraries

In [1]:
import numpy as np 
import pandas as pd 
import tensorflow as tf

# Read the Dataset

In [2]:
data = pd.read_csv('../input/titanic/train.csv')

In [3]:
print(data.shape)
data.head()

(891, 12)


Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


Select the columns to use. Intuitively, class, fare, age and sex are the most important predictors.
Class matters because higher-class passengers were given priority over lower-class ones. Sex matters because women were given preference over men, and age matters because children were also given preference.
So, by my intuition, lower-class men were the most likely to die.

But we can't rely on intuition alone, so let's turn to a machine learning algorithm.


In [4]:
data = data[['Survived', 'Pclass', 'Sex', 'Age', 'Fare']]

In [5]:
data = data.dropna()
print(data.shape)

(714, 5)


In [6]:
target = data.pop('Survived')

Of the 891 passengers in the training set, about 550 died and the rest survived.

Now that rows with missing values have been dropped and the target is separated from the features, it is time to build the machine learning pipeline.

The pipeline one-hot encodes the categorical features and normalises the numeric ones.
Normalisation here means standardising each numeric column to zero mean and unit variance, which is what Keras's `Normalization` layer does; the values are not confined to the range 0 to 1.
One-hot encoding creates a new column for each category, filled with 0s and 1s. A 1 marks the category that the row belongs to.
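
The Keras `Normalization` layer used later in this notebook standardises each column (subtract the mean, divide by the standard deviation) rather than rescaling it to [0, 1]. The sketch below contrasts the two in plain NumPy. The fare values are made up for illustration, and the snippet is a rough picture of the arithmetic rather than the layer's actual implementation.

```python
import numpy as np

# Toy fares for illustration only; not read from the Titanic file.
fare = np.array([7.25, 71.28, 7.93, 53.10, 8.05], dtype=np.float32)

# Standardisation (z-score): zero mean, unit variance, values may be negative.
standardised = (fare - fare.mean()) / fare.std()

# Min-max scaling: squeezes the values into the range [0, 1].
min_max = (fare - fare.min()) / (fare.max() - fare.min())

print(standardised.mean(), standardised.std())  # approximately 0 and 1
print(min_max.min(), min_max.max())             # 0.0 and 1.0
```

Standardised values can fall below 0 or above 1, which is why the preprocessed row shown later in the notebook contains negative numbers.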


# Split the features into categorical and numeric groups


In [7]:
categorical_feature_names = ['Pclass','Sex']
numeric_feature_names = ['Fare', 'Age']
predicted_feature_name = ['Survived']

### To feed the dataset to TensorFlow it must be pre-processed in a certain way. The first task is to create the tensor dictionary.


In [8]:
def create_tensor_dict(data, categorical_feature_names):
    """Build one symbolic Keras input per column, keyed by column name."""
    inputs = {}
    for name, column in data.items():
        # Use positional access: after dropna() the index label 0 may be gone.
        if isinstance(column.iloc[0], str):
            dtype = tf.string
        elif name in categorical_feature_names:
            dtype = tf.int64
        else:
            dtype = tf.float32

        inputs[name] = tf.keras.Input(shape=(), name=name, dtype=dtype)
    return inputs

inputs = create_tensor_dict(data, categorical_feature_names)


Here each column is assigned a TensorFlow data type based on its current type, and a dictionary maps each column name to a symbolic input of that type.
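
To make the branching concrete, here is a small pandas-only sketch of the same decision rule. It returns dtype names as plain strings instead of creating Keras inputs, and both the toy rows and the helper name `pick_dtype_name` are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Toy rows shaped like the notebook's feature columns (values are illustrative).
toy = pd.DataFrame({
    "Pclass": [3, 1],
    "Sex": ["male", "female"],
    "Age": [22.0, 38.0],
    "Fare": [7.25, 71.28],
})
categorical = ["Pclass", "Sex"]

def pick_dtype_name(name, column):
    """Mirror the branching in create_tensor_dict, returning dtype names as text."""
    if isinstance(column.iloc[0], str):
        return "string"
    if name in categorical:
        return "int64"
    return "float32"

dtype_names = {name: pick_dtype_name(name, column) for name, column in toy.items()}
print(dtype_names)
```

String columns are checked first, so `Sex` is treated as a string even though it is also listed as categorical.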

Next up is normalising the dataset.
Before normalising the features, a helper function is needed that casts each column to a TensorFlow float and stacks the columns into one big tensor.


In [9]:
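def stack_dict(inputs, fun=tf.stack):
    """Cast every entry to float32 and combine them along the last axis."""
    values = []
    # Sorting the keys keeps the column order the same on every call.
    for key in sorted(inputs.keys()):
        values.append(tf.cast(inputs[key], tf.float32))

    return fun(values, axis=-1)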
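
One consequence of sorting the keys is that the stacked columns come out in alphabetical order, whatever order the dictionary was built in. A rough NumPy analogue of the helper, using made-up values, shows the resulting shape and column order:

```python
import numpy as np

# Deliberately listed as Fare then Age to show that sorting reorders them.
columns = {
    "Fare": np.array([7.25, 71.28, 7.93]),
    "Age": np.array([22.0, 38.0, 26.0]),
}

# Same idea as stack_dict: sort the keys, cast to float32, stack on the last axis.
stacked = np.stack(
    [columns[key].astype(np.float32) for key in sorted(columns)], axis=-1
)

print(stacked.shape)  # (3, 2)
print(stacked[0])     # first row: Age, then Fare
```

Because the same helper is used both when adapting the normalizer and when normalising the inputs, the two always agree on which column is which.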

# Next it's time to normalise using Keras's built-in normalizer.


In [10]:
def create_normalizer(numeric_feature_names, data):
    numeric_features = data[numeric_feature_names]
    
    normalizer = tf.keras.layers.Normalization(axis=-1)
    normalizer.adapt(stack_dict(dict(numeric_features)))  
    return normalizer

### Using stack_dict and create_normalizer, it is time to gather the numeric inputs in a form the normalizer can process.

In [11]:
def normalize_numeric_input(numeric_feature_names, inputs, normalizer):
    numeric_inputs = {}
    for name in numeric_feature_names:
      numeric_inputs[name]=inputs[name]

    numeric_inputs = stack_dict(numeric_inputs)
    numeric_normalized = normalizer(numeric_inputs) 
    return numeric_normalized

In [12]:
normalizer = create_normalizer(numeric_feature_names, data)
numeric_normalized = normalize_numeric_input(numeric_feature_names, inputs, normalizer)
print(numeric_normalized)

KerasTensor(type_spec=TensorSpec(shape=(None, 2), dtype=tf.float32, name=None), name='normalization/truediv:0', description="created by layer 'normalization'")


# Collecting the preprocessed features in a list

In [13]:
preprocessed = []
preprocessed.append(numeric_normalized)

# One Hot Encoding
Now that the numeric part of the dataset has been normalised, it is time to one-hot encode the categorical features.

Here we iterate through the categorical columns. String columns are encoded with a `StringLookup` layer and integer columns with an `IntegerLookup` layer, each fed by the matching placeholder from the input dictionary created above.
At the end, the function returns the list of one-hot encodings.


In [14]:
def one_hot_encode_categorical_features(categorical_feature_names, data, inputs):
    one_hot = []
    for name in categorical_feature_names:
        # The sorted unique values of the column form the lookup vocabulary.
        value = sorted(set(data[name]))

        if isinstance(value[0], str):
            lookup = tf.keras.layers.StringLookup(vocabulary=value, output_mode='one_hot')
        else:
            lookup = tf.keras.layers.IntegerLookup(vocabulary=value, output_mode='one_hot')

        # Add a trailing axis so each row is looked up as a single element.
        x = inputs[name][:, tf.newaxis]
        x = lookup(x)
        one_hot.append(x)
    return one_hot

## Adding the one-hot encoded features to the preprocessed list

In [15]:
one_hot = one_hot_encode_categorical_features(categorical_feature_names, data, inputs)
preprocessed = preprocessed + one_hot


In [16]:
preprocessed_result = tf.concat(preprocessed, axis=-1)


# Keras preprocessing before the model is constructed

In [17]:
preprocessor = tf.keras.Model(inputs, preprocessed_result)

In [18]:
preprocessor(dict(data.iloc[:1]))

<tf.Tensor: shape=(1, 9), dtype=float32, numpy=
array([[-0.5303766, -0.5189777,  0.       ,  0.       ,  0.       ,
         1.       ,  0.       ,  0.       ,  1.       ]], dtype=float32)>
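
The 9 columns in this output can be accounted for as 2 normalised numeric values, 4 slots for `Pclass` and 3 slots for `Sex`. The sketch below reproduces the one-hot part in NumPy, assuming the lookup layers keep their default of one out-of-vocabulary slot at index 0. The helper `one_hot_with_oov` is a hypothetical stand-in, not a Keras function.

```python
import numpy as np

def one_hot_with_oov(value, vocabulary):
    """One-hot encode `value`, reserving index 0 for out-of-vocabulary values."""
    encoded = np.zeros(len(vocabulary) + 1, dtype=np.float32)
    index = vocabulary.index(value) + 1 if value in vocabulary else 0
    encoded[index] = 1.0
    return encoded

pclass_vocab = [1, 2, 3]
sex_vocab = ["female", "male"]

# First passenger in the notebook's data: third class, male.
pclass_part = one_hot_with_oov(3, pclass_vocab)   # [0, 0, 0, 1]
sex_part = one_hot_with_oov("male", sex_vocab)    # [0, 0, 1]

# Two normalised numeric columns come first, then the two one-hot groups.
total_columns = 2 + len(pclass_part) + len(sex_part)
print(pclass_part, sex_part, total_columns)       # total_columns is 9
```

This matches the trailing `0, 0, 0, 1, 0, 0, 1` in the row above: the leading zero of each group is the unused out-of-vocabulary slot.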

# Build the Neural Network

Using Keras `Sequential`, it's time to define the neural network. We will use 2 dense hidden layers with 10 neurons each, both with the ReLU activation function, followed by a single output neuron without an activation, so the network emits a raw logit.


In [19]:
network = tf.keras.Sequential([
  tf.keras.layers.Dense(10, activation='relu'),
  tf.keras.layers.Dense(10, activation='relu'),
  tf.keras.layers.Dense(1)
])
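
To get a feel for what this stack computes, here is a NumPy sketch of the same forward pass with random weights. It assumes the 9-column preprocessed input shown earlier; the weights and the batch are made up, so only the shapes and the parameter count are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, weights, bias, relu=False):
    """Affine layer, optionally followed by ReLU."""
    out = x @ weights + bias
    return np.maximum(out, 0.0) if relu else out

n_features = 9  # width of the preprocessed row shown earlier
layer_sizes = [(n_features, 10), (10, 10), (10, 1)]
params = [(rng.normal(size=shape), np.zeros(shape[1])) for shape in layer_sizes]

x = rng.normal(size=(4, n_features))      # a batch of 4 made-up rows
h = dense(x, *params[0], relu=True)
h = dense(h, *params[1], relu=True)
logits = dense(h, *params[2])             # no activation: raw logits

n_params = sum(w.size + b.size for w, b in params)
print(logits.shape, n_params)             # (4, 1) 221
```

The count breaks down as 9×10+10 for the first layer, 10×10+10 for the second and 10×1+1 for the output, which is a small model by any standard.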

# Next, the preprocessor and network are tied together

In [20]:
x = preprocessor(inputs)
result = network(x)
model = tf.keras.Model(inputs, result)

# Compiling the Model
Finally, the entire model is compiled with the Adam optimizer (a common default), binary cross-entropy as the loss function and accuracy as the evaluation metric. Because the last layer has no activation, the loss is created with `from_logits=True`.

In [21]:
model.compile(optimizer='adam',
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=['accuracy'])
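
Since the network's last layer has no activation, the loss has to work on raw logits. The following NumPy sketch shows one numerically stable way to compute binary cross-entropy from logits and checks it against the textbook formula applied after a sigmoid. The labels and logits are invented, and this is an illustration of the idea rather than the Keras source.

```python
import numpy as np

def bce_from_logits(labels, logits):
    """Mean binary cross-entropy computed directly from logits (stable form)."""
    labels = np.asarray(labels, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    per_example = (
        np.maximum(logits, 0.0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    )
    return per_example.mean()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

labels = np.array([1.0, 0.0, 1.0])
logits = np.array([2.0, -1.0, 0.0])   # made-up network outputs

stable = bce_from_logits(labels, logits)
p = sigmoid(logits)
naive = -(labels * np.log(p) + (1 - labels) * np.log(1 - p)).mean()
print(stable, naive)  # the two values agree
```

The stable form avoids taking the log of a probability that has rounded to 0 or 1, which is the usual reason for preferring logits as the loss input.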

In [22]:
history = model.fit(dict(data), target, epochs=25, batch_size=8)

Epoch 1/25
...
Epoch 25/25


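# Split the dataset into training and validation datasets

Note that the model has already been fitted on every row in the previous cell, so the validation rows below are not unseen data and the validation scores will be optimistic.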

In [23]:
from sklearn.model_selection import train_test_split

train_data, val_data, train_target, val_target = train_test_split(data, target, train_size=0.8)
history = model.fit(dict(train_data), train_target, validation_data=(dict(val_data), val_target), epochs=20, batch_size=8)

Epoch 1/20
...
Epoch 20/20
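
The split above is random and unseeded, so the rows that land in each part change from run to run; `train_test_split` accepts a `random_state` argument if a repeatable split is wanted. As a rough picture of what an 80/20 shuffle split does to the 714 rows left after `dropna`, here is a NumPy sketch with a fixed seed (the seed value is arbitrary):

```python
import numpy as np

n_rows = 714                      # rows left after dropna in this notebook
rng = np.random.default_rng(42)   # fixed seed so the split is repeatable
shuffled = rng.permutation(n_rows)

n_train = int(0.8 * n_rows)       # 571 rows for training
train_idx, val_idx = shuffled[:n_train], shuffled[n_train:]

print(len(train_idx), len(val_idx))   # 571 143
```

Every row index ends up in exactly one of the two parts, which is the property that makes the validation score meaningful when the model is trained only on the training part.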


# Accuracy

In [24]:
results = model.evaluate(dict(train_data), train_target, batch_size=128)
print("training accuracy:", results[1]*100 , "%")

training accuracy: 81.08581304550171 %
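
The cell above scores `train_data`, that is, rows the model was fitted on, so the figure says little about unseen passengers; passing `dict(val_data)` and `val_target` to `model.evaluate` would score the held-out part instead. For intuition about what an accuracy figure on logits means, here is a minimal NumPy sketch that assumes a sigmoid followed by a 0.5 threshold. The labels and logits are made up, and the helper is hypothetical rather than a description of how Keras computes its metric.

```python
import numpy as np

def accuracy_from_logits(labels, logits):
    """Fraction of rows where the thresholded prediction matches the label."""
    probabilities = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    predictions = (probabilities > 0.5).astype(int)
    return float((predictions == np.asarray(labels)).mean())

# Made-up labels and logits for four held-out passengers.
val_labels = np.array([0, 1, 1, 0])
val_logits = np.array([-1.2, 0.8, -0.3, -2.0])

print(accuracy_from_logits(val_labels, val_logits))   # 0.75
```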
