# A Simple Introduction to Deep Learning

## Import the essential libraries.
1. When keras is imported, it prints a message naming the backend it is currently using.
2. The available backends are
    1. TensorFlow (by Google),
    2. Theano (by LISA Lab at Université de Montréal). Theano is no longer under active development.
    3. CNTK (by Microsoft)

In [None]:
import keras
import sklearn
from sklearn import preprocessing, metrics
import numpy as np
from matplotlib import pyplot as plt

## Data preprocessing
1. Here, a publicly available benchmark dataset is downloaded via keras. (Reference: https://keras.io/datasets/#boston-housing-price-regression-dataset)

2. Then sklearn's StandardScaler class is used to scale the data so that each feature has zero mean and unit variance. That means the mean of each feature is subtracted from every value, and the result is divided by that feature's standard deviation. The scaler is fitted on the training data only, and the same statistics are then applied to the test data. (Reference: https://stackoverflow.com/a/40853967/2374160)
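
The scaling step described above can be sketched by hand with NumPy. This is only an illustration on made-up toy values (the arrays x_train and x_test below are hypothetical, not the Boston housing data); it assumes the population standard deviation, which is what StandardScaler uses.

```python
import numpy as np

# Toy data: 4 samples, 2 features on very different scales (made-up values)
x_train = np.array([[1.0, 100.0],
                    [2.0, 200.0],
                    [3.0, 300.0],
                    [4.0, 400.0]])
x_test = np.array([[2.5, 250.0]])

# The statistics come from the training data only ...
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)  # population standard deviation (ddof=0)

# ... and are reused unchanged for the test data
x_train_scaled = (x_train - mean) / std
x_test_scaled = (x_test - mean) / std

print(x_train_scaled.mean(axis=0))  # approximately zero per feature
print(x_train_scaled.std(axis=0))   # one per feature
print(x_test_scaled)
```

Reusing the training statistics for the test set is what the fit_transform / transform pair in the next cell does.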

In [None]:
def data_preprocess():
    # Load Data
    (xtrain, ytrain), (xtest, ytest) = keras.datasets.boston_housing.load_data()
    
    # Feature Scaling
    scaler = sklearn.preprocessing.StandardScaler()
    xtrain = scaler.fit_transform(xtrain)
    xtest  = scaler.transform(xtest)
    
    return (xtrain,ytrain,xtest,ytest)

## Build the ANN
1. The ANN is first initialized as an object of the class keras.models.Sequential.
2. Then layers are added. The input layer is created automatically when the first hidden layer is defined with its input_shape.
3. The final layer is a single output neuron. Linear activation is used since this is a regression problem.
4. Finally, the model is compiled with the loss function to compute and track, and the optimizer algorithm to follow while searching for a local minimum of that loss function.
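
To make the layer description concrete, here is a rough NumPy sketch of what a network with this shape computes in one forward pass (13 inputs, 64 sigmoid hidden neurons, 1 linear output). The weights are random placeholders and the initialization scale is an assumption; Keras uses its own initializers and also handles training, which this sketch omits.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_features, n_hidden = 13, 64

# Placeholder parameters (illustrative values, not trained weights)
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    hidden = sigmoid(x @ W1 + b1)  # hidden layer with sigmoid activation
    return hidden @ W2 + b2        # output layer with linear activation

x_batch = rng.normal(size=(5, n_features))  # 5 made-up samples
y_hat = forward(x_batch)

# Trainable parameters: 13*64 + 64 for the hidden layer, 64 + 1 for the output
n_params = W1.size + b1.size + W2.size + b2.size
print(y_hat.shape, n_params)  # (5, 1) 961
```

The parameter count of 961 should match what nnmodel.summary() reports for the model built in the next cell.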


In [None]:
def build_neuralnet():
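    # Initialize an empty sequential model (a linear stack of layers)
    nnmodel = keras.models.Sequential() 
    
    # Hidden layer: 64 sigmoid neurons; input_shape declares the 13 input features
    nnmodel.add(keras.layers.Dense(64, activation='sigmoid', input_shape=(13,)))
    # Output layer: one neuron with linear activation for regression
    nnmodel.add(keras.layers.Dense(1, activation='linear'))
    
    # Nadam optimizer minimizing the mean absolute error (mae)
    nnmodel.compile(optimizer=keras.optimizers.Nadam(lr=0.001), loss='mae')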
    return nnmodel

## Train the model
1. This function requires the training data (xtrain, ytrain), the features of the test data (xtest), and the untrained ANN model.
2. xtrain and ytrain are used to train the model (also termed fitting).
3. xtest is used to predict values for the test data. Notice that ytest, the original target values of the test data, is never shown to the ANN model during training.
4. nnmodel is the compiled ANN model object returned by the build_neuralnet function defined above.
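
The training cell below relies on early stopping with a patience value. The following plain-Python sketch shows the idea only; the function name early_stopping_epoch and the loss values are hypothetical, and it is not Keras's actual implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Training ran to the end without triggering early stopping
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: the best value occurs at epoch 2
losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.3]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

Because training continues for some epochs after the best one, the final weights are not the best ones. That is why the best model is saved by a checkpoint callback and reloaded before predicting.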

In [None]:
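def train(xtrain,ytrain,xtest,nnmodel):
    
    # Stop when the validation loss has not improved for 10 consecutive epochs
    earlystop = keras.callbacks.EarlyStopping(patience=10,verbose=1)
    # Save the model to disk each time the validation loss reaches a new best
    chkpt_mdl = keras.callbacks.ModelCheckpoint('best_model.h5',save_best_only=True,verbose=1)
    calbcks   = [earlystop,chkpt_mdl]
    
    # Hold out 20% of the training data for validation while fitting
    nnhistory = nnmodel.fit(xtrain,ytrain,callbacks=calbcks,epochs=500,
                    validation_split=0.2,batch_size=16,verbose=1)
    
    # Reload the best checkpoint and predict on the unseen test features
    best_nn   = keras.models.load_model('best_model.h5')
    ypredict  = best_nn.predict(xtest)
    
    return (ypredict,nnhistory)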

## Optional Functions (But strongly recommended)
1. plot_data plots the original and predicted target values of the test data against each other for comparison, and prints the error metrics.
2. mae: mean absolute error, mse: mean squared error
3. plot_history plots the training and validation loss per epoch to show the model's training history. This is recommended as a check for overfitting, especially if callbacks are not used.
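
For reference, the three metrics printed by plot_data can be written out with NumPy as below. This is a sketch on made-up numbers, and the helper name regression_metrics is hypothetical. The ravel calls flatten both arrays, since Keras predictions have shape (n, 1) while the targets have shape (n,).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    # Flatten both arrays so (n, 1) predictions line up with (n,) targets
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_true - y_pred

    mae = np.mean(np.abs(errors))  # mean absolute error
    mse = np.mean(errors ** 2)     # mean squared error

    # r2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2

mae, mse, r2 = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(mae, mse, round(r2, 3))
```

An r2 score of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the targets.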

In [None]:
def plot_data(ytest,ypredict):
    plt.scatter(ytest,ypredict)
    plt.xlabel('True Values [1000$]')
    plt.ylabel('Predictions [1000$]')
    plt.axis('equal')
    plt.xlim(plt.xlim())
    plt.ylim(plt.ylim())
    _ = plt.plot([-100, 100], [-100, 100])
    plt.show()
    
    # print() calls work in Python 3; each label names the metric it reports
    print("r2 score is: "+str(sklearn.metrics.r2_score(ytest,ypredict)))
    print("mse value is: "+str(sklearn.metrics.mean_squared_error(ytest,ypredict)))
    print("mae value is: "+str(sklearn.metrics.mean_absolute_error(ytest,ypredict)))
    
    
def plot_history(history):
    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Abs Error [1000$]')
    plt.plot(history.epoch, np.array(history.history['loss']),
             label='Train Loss')
    plt.plot(history.epoch, np.array(history.history['val_loss']),
             label='Val loss')
    plt.legend()
    plt.ylim([0, 50])
    plt.show()


## Now call everything!!
1. The random seed fixes the starting point of the random number generator. It is set only so that subsequent runs reproduce the same result, since ANN models assign their initial parameter values (weights, biases) randomly. Note that np.random.seed seeds NumPy only; depending on the backend, the backend's own random generator may also need to be seeded for exactly repeatable results.
2. The Keras graph is cleared to start with a fresh ANN graph, without any variables from previous runs. It is safe to run that line every time, especially when using Python notebooks.
3. The final lines call the functions that were defined above. They are self-explanatory.
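
The effect of the seed can be seen with NumPy alone. In this small sketch, the uniform draws merely stand in for randomly initialized weights (the range and shape are illustrative assumptions, not what Keras uses internally).

```python
import numpy as np

np.random.seed(7)
first = np.random.uniform(-0.05, 0.05, size=(13, 64))

# Reseeding with the same value restarts the same random sequence
np.random.seed(7)
second = np.random.uniform(-0.05, 0.05, size=(13, 64))

# Without reseeding, the generator simply continues and yields new values
third = np.random.uniform(-0.05, 0.05, size=(13, 64))

print(np.array_equal(first, second))  # True
print(np.array_equal(second, third))  # False
```

With the TensorFlow backend, the equivalent call is tf.random.set_seed in TensorFlow 2 or tf.set_random_seed in TensorFlow 1; which one applies depends on the installed version.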

In [None]:
np.random.seed(7)
keras.backend.clear_session()

xtrain,ytrain,xtest,ytest = data_preprocess()
nnmodel                   = build_neuralnet()
ypredict, nnhistory       = train(xtrain,ytrain,xtest,nnmodel)

plot_data(ytest,ypredict)
plot_history(nnhistory)

## Notes, Tips
1. When in doubt, X it. (X = Google, Bing, Yahoo, ...) 
2. Read the keras and sklearn documentation for details of specific functions and classes. Both are well maintained.
3. If something is not covered in the documentation, it is most probably on Stack Overflow.
4. If something is not on Stack Overflow, ask your question on Stack Overflow (didn't see that coming?)
5. If no one answers your question on Stack Overflow, a Google search lists many more links.
6. ANN modeling has many parameters to tune, including
    1. Optimizer: type, learning rate, momentum, ...
    2. Loss function: mae, mse, cross_entropy, ...
    3. Layers: number of layers, neurons per layer, activation function, initializers, regularizers, ...
    4. ...
7. Data preprocessing is important. Focus on gathering as much information as possible and on engineering features. Useful techniques include
    1. Principal Component Analysis
    2. Normalizer, Scaler
    3. Model-based feature selection
    4. ...
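
As one example of the preprocessing techniques listed above, Principal Component Analysis can be sketched with NumPy's SVD. The helper name pca_project and the random input are hypothetical; in practice sklearn.decomposition.PCA would normally be used instead.

```python
import numpy as np

def pca_project(x, n_components):
    """Project x onto its first n_components principal directions."""
    x_centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value
    _, singular_values, vt = np.linalg.svd(x_centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (x.shape[0] - 1)
    projected = x_centered @ vt[:n_components].T
    return projected, explained_variance[:n_components]

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 13))  # made-up data: 50 samples, 13 features
x_reduced, variances = pca_project(x, n_components=3)
print(x_reduced.shape)  # (50, 3)
```

The reduced features could then be fed to the network in place of the original 13, with input_shape adjusted to match.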