#                                            Multi-Class Considerations

In [2]:
# Non-Exclusive Classes

# Eg; A photo with multiple tags on it. Eg; Beach, Family, Holiday etc

# The model can output multiple classes for the same input (multi-label classification).
# Each output uses a sigmoid function, which gives an independent probability between 0 and 1. If the model predicts the picture
# to be 0.6 beach and 0.8 vacation, it will output both classes. [If you specify to include all classes above 0.5]

# In essence, one data point fed into the model can come out with multiple classes(results/answers/classifications)

#  O(Picture) ---> O (Passes through function)--->O (0.6 Beach)
#                                             --->O (0.8 Vacation)
#                                             --->O (0.2 Cruise)
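
# A minimal sketch of the idea above, assuming NumPy is available. The raw scores (logits) and the 0.5 threshold
# are made-up values chosen so the probabilities come out close to the 0.6 / 0.8 / 0.2 in the diagram.

```python
import numpy as np

def sigmoid(z):
    """Squash each raw score independently into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores for one photo, one score per tag
tags = ["Beach", "Vacation", "Cruise"]
logits = np.array([0.4, 1.4, -1.4])

probs = sigmoid(logits)      # the probabilities do NOT have to sum to 1
threshold = 0.5              # assumed cut-off: keep every tag above it
predicted_tags = [tag for tag, p in zip(tags, probs) if p > threshold]

for tag, p in zip(tags, probs):
    print(f"{tag}: {p:.2f}")
print("Predicted tags:", predicted_tags)
```

# Because every tag gets its own sigmoid, more than one tag can pass the threshold at the same time.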

In [3]:
# Mutually Exclusive Classes

#Eg; Photo categorised as grayscale(black and white) or full color.
#    CANNOT BE BOTH AT THE SAME TIME

# Each data point can only have a single class(classification). Eg; Can only be Red, Blue or Green
# The Softmax Activation Function calculates the PROBABILITY of each class OVER ALL POSSIBLE target classes.
# Each probability is between 0 and 1 and the sum of all probabilities will be equal to 1.
# The model returns the probabilities of each class. Eg; 0.2 Green, 0.7 Blue, 0.1 Red [Note that they add up to 1]
# The predicted class is Blue, as it has the highest probability.
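
# A small sketch of softmax, assuming NumPy. The raw scores are made-up numbers picked to land near the
# 0.2 Green / 0.7 Blue / 0.1 Red example above.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)   # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

classes = ["Green", "Blue", "Red"]
logits = np.array([1.0, 2.25, 0.3])   # hypothetical raw scores

probs = softmax(logits)
predicted = classes[int(np.argmax(probs))]   # pick the single most likely class

for name, p in zip(classes, probs):
    print(f"{name}: {p:.2f}")
print("Predicted class:", predicted)
```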

#                                             Cost Function

In [5]:
# The cost must be an average over all the data points so that it comes out as a single value.
# If it output multiple values, there would be no single number to judge the performance of the model by.

In [8]:
# Using y: True Value, a: Neuron's prediction
# a = σ(Z) where Z = Weight*x + bias  & σ is the activation function used Eg; Sigmoid, ReLU(Rectified Linear Unit)
# We simply calculate the difference between the Real Values: y(x) and the Predicted Values: a(x)

# After getting the differences, we square them because
#(1) We are calculating the mean. If some differences are negative and others positive,
#    they can cancel out to 0, which hides the actual difference(cost)
#(2) Squaring punishes large errors, which could otherwise go undetected among many not so large errors
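
# Both reasons can be checked with a few made-up numbers. This is only an illustration, assuming NumPy.

```python
import numpy as np

# Reason (1): positive and negative errors cancel in a plain mean
y = np.array([3.0, 5.0, 7.0, 9.0])   # true values (made up)
a = np.array([5.0, 3.0, 7.0, 9.0])   # predictions (made up)
errors = y - a                        # [-2, 2, 0, 0]
mean_error = errors.mean()            # cancels out to 0
mse = (errors ** 2).mean()            # squaring keeps both misses visible

# Reason (2): same average absolute error, but one set hides a big miss
many_small = np.array([1.0, 1.0, 1.0, 1.0])
one_large = np.array([0.0, 0.0, 0.0, 4.0])
mse_small = (many_small ** 2).mean()
mse_large = (one_large ** 2).mean()

print("mean error:", mean_error, "| MSE:", mse)
print("MSE many small:", mse_small, "| MSE one large:", mse_large)
```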

In [None]:
# How do we figure out the best weights, the ones which lead us to the lowest cost, given that there are so many layers with
# so many weights involved?

# To find the weights that lead us to the lowest cost, we use gradient descent.

#                                         Gradient descent

In [7]:
# Imagine a graph with a U shaped line, with Weight on X axis and Cost on Y axis.
# We want the weight to land at the bottom of the U, where the Cost is the lowest.
# Starting from the tip of the U, we measure the gradient(slope) and take 'Steps' downhill, slowly working our way to
# the point where the gradient is close to 0
# But how big of a step should we take? A small step will take a lot of time to get there and a large step could risk over-stepping
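
# A toy version of the U shaped curve, using cost = (w - 3)^2 so the lowest point sits at w = 3.
# The cost function, starting weight and the three step sizes are all assumptions made for illustration.

```python
def gradient(w):
    """Slope of the toy cost (w - 3)^2 at weight w."""
    return 2.0 * (w - 3.0)

def gradient_descent(start, learning_rate, steps):
    """Repeatedly step against the slope and return the final weight."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w

too_small = gradient_descent(0.0, learning_rate=0.01, steps=20)  # still far from 3
reasonable = gradient_descent(0.0, learning_rate=0.1, steps=20)  # close to 3
too_big = gradient_descent(0.0, learning_rate=1.1, steps=20)     # over-steps and diverges

print(too_small, reasonable, too_big)
```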

# This is where Adaptive Gradient Descent comes in.

#                                    Adaptive Gradient Descent

In [10]:
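# In Adaptive Gradient Descent, the step size is adjusted as training goes: roughly, we start with larger steps which get
# smaller when the slope gets closer to 0.
# Adam (from the paper 'Adam: A Method for Stochastic Optimization') is the optimizer that will be used to search for these minimums.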
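
# A rough single-weight sketch of the Adam update rule from that paper, on the toy cost (w - 3)^2.
# The toy cost, the starting weight and lr=0.1 are assumptions for illustration. Keras handles all of this internally.

```python
import math

def gradient(w):
    """Slope of the toy cost (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def adam_minimize(w, steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = 0.0   # running average of the gradients
    v = 0.0   # running average of the squared gradients
    for t in range(1, steps + 1):
        g = gradient(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)   # step size adapts to gradient history
    return w

w_final = adam_minimize(0.0)
print(w_final)
```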

#                                         Train Test Split

In [1]:
# Importing
# from sklearn.model_selection import train_test_split

# X = df[['Feature1','Feature2']].values   ----> Returns the columns as a NumPy array, which is what the model requires
#                                          ----> Do note that the features are the input columns the model learns from
#                                               in order to predict the label Eg; The price of the house
#                                          ----> It is 2 brackets as we pass a list of columns, giving a 2D Array. Capital X denotes it

# Alternatively

# X = df.drop('price', axis=1).values ---> Works the same way as the above, more convenient as you don't have to
#                                          state all the features.



# y = df['Column you want to predict'].values  ----> Using the example of the house, it could be the price of the house


# Alternatively ver:2 [This one is from the YouTube video. pop() returns the label column as y
#                       and removes it from df at the same time. What remains in df is then used as X.]

# y = df.pop('Label column')
# X = df



# Input the code: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
#    test_size=0.3 ----> 30% of the samples go to the test set and the remaining 70% to the training set.
#    random_state ----> Seeds the random shuffling done before the split. Without it, every run gives a different split.
#    random_state=42 ----> The same seed will always give the same 'random' split. Eg; To follow the lecture's figures
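
# To see what the seed does, here is a hand-rolled split using only NumPy. simple_split is a hypothetical helper
# written for this note. It is NOT how scikit-learn implements train_test_split and will not give the same rows.

```python
import numpy as np

def simple_split(X, y, test_size, seed):
    """Shuffle the row indices with a seeded generator, then cut off the test rows."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(20).reshape(10, 2)   # 10 rows, 2 made-up features
y = np.arange(10)                  # 10 made-up labels

first = simple_split(X, y, test_size=0.3, seed=42)
second = simple_split(X, y, test_size=0.3, seed=42)

print("train rows:", len(first[0]), "| test rows:", len(first[1]))
print("same split both times:", np.array_equal(first[1], second[1]))
```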

#                                         MinMax Scaler

In [3]:
# As we are working with Weights and Biases, features with very large values can dominate the weight updates and make training unstable.
# To avoid that we normalise/scale the features.

# from sklearn.preprocessing import MinMaxScaler

# scaler = MinMaxScaler()
# scaler.fit(X_train) ----> Calculates the parameters it needs to perform the actual scaling later on
#                     ----> For MinMaxScaler these are the Min and Max of each feature
#                     ----> We fit only on the training set and not on the test set, to prevent Data Leakage
#                           We don't want to assume that we have prior information of the test set
#                     ----> If you fit on the test set, the scaler has seen its min and max, which is not what we want

# X_train = scaler.transform(X_train)
# X_train = scaler.fit_transform(X_train) ---> Alternative to the two steps above: does the fitting and transform in a single line

# X_test = scaler.transform(X_test)
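
# The same fit-then-transform idea done by hand with NumPy, using the min-max formula (x - min) / (max - min).
# The numbers are made up. This is a sketch of the arithmetic, not a replacement for MinMaxScaler.

```python
import numpy as np

X_train = np.array([[100.0, 1.0], [200.0, 3.0], [300.0, 5.0]])
X_test = np.array([[250.0, 2.0], [400.0, 6.0]])

# "fit": learn the min and max of each column from the training set only
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)

# "transform": apply the training statistics to both sets
X_train_scaled = (X_train - col_min) / (col_max - col_min)
X_test_scaled = (X_test - col_min) / (col_max - col_min)

print(X_train_scaled)
print(X_test_scaled)   # test values can fall outside 0 to 1, which is expected
```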

#                                   Creating Model with Keras Syntax

In [None]:
#Importing
# from tensorflow.keras.models import Sequential
# from tensorflow.keras.layers import Dense

# model = Sequential()                 ----> Dense(n): A layer('column') with n Neurons, each connected to every output of the previous layer
# model.add(Dense(4,activation='relu'))       Activation: Which activation function you want to use
# model.add(Dense(2,activation='relu'))       The copy and pasting: Each model.add() adds one more layer('column'), which in this case 
# model.add(Dense(1,activation='relu'))       makes three
#                                          [We usually pick the number of Neurons based on the number of features]

# model.add(Dense(1)) ----> There's only 1 Neuron in the final layer to output the result Eg; Predicted Price
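
# To picture what a stack of Dense layers computes, here is a NumPy-only forward pass with random, untrained weights.
# dense() is a hypothetical helper for this note, the 5x3 input is made up, and nothing here is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_units, activation=None):
    """One fully connected layer: every input feeds every one of the n_units neurons."""
    W = rng.normal(size=(x.shape[1], n_units))   # one weight per input/neuron pair
    b = np.zeros(n_units)                        # one bias per neuron
    z = x @ W + b
    return np.maximum(z, 0.0) if activation == "relu" else z

x = rng.normal(size=(5, 3))            # 5 rows, 3 features
h1 = dense(x, 4, activation="relu")    # like Dense(4, activation='relu')
h2 = dense(h1, 2, activation="relu")   # like Dense(2, activation='relu')
out = dense(h2, 1)                     # like Dense(1): one value per row

print(h1.shape, h2.shape, out.shape)
```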

# model.add(Dense(1,activation='sigmoid')) ----> For Binary Classification, we want the last activation to be Sigmoid

# model.compile(optimizer = 'rmsprop', loss = 'mse') ----> Since this is a Regression model[Predicting the price of house].
#                                                          We will use loss = 'mse'

# model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics=['accuracy'])   ----> For predicting binary problems[Eg; Yes or No]
# model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics=['accuracy']) ----> For multi-class problems with a softmax
#                                                                                                     final layer and one-hot labels Eg; 0.6 Beach 0.4 Vacation photo

# model.fit(x=X_train, y=y_train, validation_data=(X_test, y_test),batch_size=128, epochs=250) 

# x takes X_train, which is the data you want to train on & y takes y_train, which is the true output for that data
# An epoch is one cycle through the full training dataset. So for this case, it is running 250 cycles through the training set
# validation_data ---> After each epoch, it will quickly run on the test data and check the loss.
#                      It will not affect the weights and biases of the model. They are only adjusted based on the training set.
#                      It is only a check on the data, so as to identify if there is overfitting
# batch_size ---> The training data is fed in batches, with the weights updated after each batch. The smaller the batch, the longer
#                 each epoch will run, BUT the noisier updates often make overfitting less likely, which is good!
#            ---> The number is usually in powers of 2 Eg; 32, 64, 128, 256

#                                              Plotting the loss

In [None]:
# import pandas as pd
# import matplotlib.pyplot as plt

# loss_df = pd.DataFrame(model.history.history) ---> It will hold the loss on the training set & on the validation set if included
# loss_df.plot()                            ----> This will display a graph to evaluate how much the losses decrease over the epochs


# Alternatively

# plt.plot(r.history['loss'], label = 'loss')
# plt.plot(r.history['val_loss'], label = 'val_loss')  # Do note that r is the History object returned by r = model.fit(...)

#                                             Callbacks (Early Stopping)

In [None]:
# Prerequisite

# Perform these steps again from above before Early Stopping: model = Sequential()
#                                                             model.add()
#                                                             model.compile()


#Importing Early stopping

# from tensorflow.keras.callbacks import EarlyStopping
# help(EarlyStopping) ---> To read up on the guide

# early_stop = EarlyStopping(monitor='val_loss', mode = 'min')

# monitor: Refers to the Quantity being monitored. For this case above, it is the val_loss[Do note that val_loss is a Keras-defined name]

# patience: Number of epochs with no improvement to wait for before training is stopped.

# mode: 'min' = Training stops when the quantity monitored(val_loss in this case) has stopped decreasing.
#       'max' = Training stops when the quantity monitored(Eg; val_accuracy) has stopped increasing.
#       'auto' = Direction is inferred from the name of the monitored quantity

# Tip: val_loss usually pairs with mode = 'min'. If you are monitoring accuracy, go for mode = 'max' instead.
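
# The patience logic for mode = 'min' can be sketched in plain Python. stopping_epoch is a hypothetical helper and the
# losses are made up. It is a simplified picture (patience of at least 1 assumed), not the actual Keras implementation.

```python
def stopping_epoch(val_losses, patience):
    """Return the epoch index where training would stop early, or None if it runs to the end."""
    best = float("inf")
    wait = 0                      # epochs in a row without a new best loss
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss           # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

val_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]   # made-up validation losses

stop_short = stopping_epoch(val_losses, patience=2)   # gives up before the late improvement
stop_long = stopping_epoch(val_losses, patience=5)    # waits long enough to see 0.6

print(stop_short, stop_long)
```

# A low patience can stop before a late improvement arrives, which is the trade-off when choosing it.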


# Model 'Refitting'

# model.fit(x=X_train, y=y_train, validation_data=(X_test, y_test),batch_size=128, epochs=250, callbacks = [early_stop])
# Do note that the callbacks has been added which was defined above

# Reevaluating the model after Early Stopping

# loss_df = pd.DataFrame(model.history.history)
# loss_df.plot()

#                                             Callbacks (Scheduler)

In [None]:
## After performing the model compilation, define the scheduler and put it under callbacks when fitting the model
## The schedule says that once the epoch hits 50, the learning rate drops from 0.001 to 0.0001

# import tensorflow as tf

# def schedule(epoch, lr):
#   if epoch >= 50:
#     return 0.0001
#   return 0.001

# scheduler = tf.keras.callbacks.LearningRateScheduler(schedule)

# model.fit(x=X_train, y=y_train, validation_data=(X_test, y_test),batch_size=128, epochs=250, callbacks = [scheduler])

#                                   Adding Dropout Layers[Prevent Overfitting]

In [None]:
# Importing
# from tensorflow.keras.layers import Dropout

# You can perform this step when making the first model or after evaluating it and you see that it was Overfitting

# model = Sequential()

# model.add(Dense(4,activation='relu'))
# model.add(Dropout(0.5))              ---> This will turn off 50% of the Neurons at random on each training step
#                                      ---> Dropout is only active during training, not when predicting

# model.add(Dense(2,activation='relu'))
# model.add(Dropout(0.5))

# model.add(Dense(1,activation='relu'))       
                                          
# model.compile()                     ---> Don't copy this line as it is. Fill in the same optimizer and loss as in the compile step above.


# model.fit(x=X_train, y=y_train, validation_data=(X_test, y_test),batch_size=128, epochs=250, callbacks = [early_stop])
# Notice that Early Stopping has also been included here to further prevent Overfitting
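
# What Dropout(0.5) does to a layer's outputs can be sketched with NumPy. The dropout() helper and the all-ones
# activations are made up for illustration. Like Keras, the sketch scales the surviving values by 1 / (1 - rate).

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Zero out a random fraction of the values while training, pass them through otherwise."""
    if not training:
        return activations                            # nothing is dropped when predicting
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)     # rescale so the expected total is unchanged

rng = np.random.default_rng(0)
activations = np.ones(1000)   # pretend outputs of 1000 neurons

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
test_out = dropout(activations, rate=0.5, training=False, rng=rng)

print("fraction turned off:", (train_out == 0).mean())
print("values seen in training:", np.unique(train_out))
```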

#                                            Evaluating the model

In [1]:
#                                               [For Regression Task]

# model.evaluate(X_test,y_test,verbose=0) ----> This will return the loss on the test set, which is the Mean Squared Error as we compiled with loss = 'mse'
#                                         ----> The model that we trained is now put to the test with values X_test & y_test 
#                                               which it was never trained on.
#                                         ----> The lower the Mean Squared Error, the better the model.
# model.predict(X_test) ----> This will output an array of the model's predictions of y(Price in this case) using only
#                             the data presented in X_test.

# test_predictions = model.predict(X_test) 
# test_predictions = pd.Series(test_predictions.reshape(300,)) ----> We can then make it into a pandas Series to compare it
#                                                                    side by side with the actual y(Price) figure.
#                                                              ----> The 300 is the number of rows in the test set for this case
# pred_df = pd.DataFrame(y_test, columns=['Test True Y']) ----> This holds the true y(Price) values, to be concatenated 
#                                                               with the predicted y to see if the figures are close.
# pred_df = pd.concat([pred_df, test_predictions], axis=1) ----> Joining the two columns side by side
# pred_df.columns = ['Test True Y','Model Predictions'] ----> Naming the columns
# sns.scatterplot(x='Model Predictions', y= 'Test True Y', data=pred_df) ----> Creating a scatterplot for visualisation (needs import seaborn as sns)

#                       Mean Absolute Error / Mean Squared error / Explained Variance Score


# Importing
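# from sklearn.metrics import mean_absolute_error, mean_squared_error, explained_variance_score

# mean_absolute_error(pred_df['Test True Y'], pred_df['Model Predictions']) ----> This will output the mean absolute error, in the same units as y.
# mean_squared_error(pred_df['Test True Y'], pred_df['Model Predictions']) ----> This will output the mean squared error.
#                                                                          ----> Do note that the first parameter must be the true value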


# explained_variance_score(y_test, test_predictions) ---> Best score is 1.0.
#                                                    ---> It shows how much of the variance in y is explained by the model
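
# The three metrics written out by hand with NumPy on made-up numbers, to show what each one measures.
# Explained variance is computed here as 1 - Var(y - prediction) / Var(y).

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # made-up true values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # made-up predictions

errors = y_true - y_pred
mae = np.abs(errors).mean()                 # average size of a miss, in the units of y
mse = (errors ** 2).mean()                  # average squared miss, punishes large errors
explained_variance = 1.0 - errors.var() / y_true.var()

print("MAE:", mae)
print("MSE:", mse)
print("Explained variance:", round(explained_variance, 3))
```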


#                                             [For Classification Task]

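# predictions = (model.predict(X_test) > 0.5).astype('int32') ---> Binary case(sigmoid output): the class is 1 when the probability is above 0.5
# predictions = np.argmax(model.predict(X_test), axis=-1)     ---> Multi-class case(softmax output): the class with the highest probability
#                                                             ---> model.predict_classes(X_test) did this in older versions but was removed in TensorFlow 2.6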

# from sklearn.metrics import classification_report, confusion_matrix

# print(classification_report(y_test, predictions)) ---> This report takes in the true value of y and compare it with Predictions

# print(confusion_matrix(y_test, predictions)) ---> To check on the performance of the model.
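
# A NumPy-only sketch of turning predicted probabilities into class labels and counting a confusion matrix by hand.
# The labels and probabilities are made up. Rows are the true class and columns are the predicted class.

```python
import numpy as np

# Binary case: one probability per row, class 1 when it is above 0.5
y_test = np.array([0, 0, 1, 1, 1, 0])
probs = np.array([0.1, 0.7, 0.8, 0.4, 0.9, 0.2])
predictions = (probs > 0.5).astype(int)

matrix = np.zeros((2, 2), dtype=int)
for true_label, predicted_label in zip(y_test, predictions):
    matrix[true_label, predicted_label] += 1   # row = true, column = predicted

accuracy = np.trace(matrix) / matrix.sum()     # correct predictions sit on the diagonal

# Multi-class case: one probability per class, pick the largest
class_probs = np.array([[0.2, 0.7, 0.1],
                        [0.6, 0.3, 0.1]])
predicted_classes = np.argmax(class_probs, axis=1)

print(matrix)
print("accuracy:", accuracy)
print(predicted_classes)
```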



#                                          Predicting a 'New' item

In [None]:

# inputted_name = df.drop('Column to be predicted', axis = 1).iloc[0] ---> This takes a row from the existing set purely to test the process. 
#                                                                          If it is real new data, don't take it from df

# inputted_name = inputted_name.values.reshape(-1, Number of features)  ---> Converts to a 2D array with one row. 
#                                                                            The -1 lets NumPy work out the number of rows itself

# inputted_name = scaler.transform(inputted_name) ---> To scale it with the scaler that was fitted on the training set

# model.predict(inputted_name) ---> Returns the prediction done by the model Eg; Price of a house
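
# A quick look at what reshape(-1, n) does, assuming NumPy. The row values and n_features = 3 are made up.

```python
import numpy as np

n_features = 3
row = np.array([3.0, 2.0, 1500.0])          # one item as a flat 1D array

single = row.reshape(-1, n_features)        # -1: NumPy infers the number of rows
several = np.arange(6).reshape(-1, n_features)

print(row.shape, "->", single.shape)
print(several.shape)
```

# The model and the scaler expect 2D input (rows by features), which is why a single item still needs reshaping.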

#                                               Saving the model

In [None]:
# from tensorflow.keras.models import load_model
# model.save('Input name.h5')                         ----> The .h5 extension saves the model in the HDF5 format
# model_to_load = load_model('Input name.h5')         ----> This is to load the saved model, the name model_to_load is just an example