# Starting Off

How does sklearn utilize numpy?

# How to build a Deep Neural Network with Python

## Model building steps:

1. Specify Architecture

2. Compile

3. Fit 

4. Predict

## Specify the architecture

In [None]:
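# Imports needed for this snippet
from keras.models import Sequential
from keras.layers import Dense

# n_cols is the number of feature columns (it is defined later in this notebook)
# Instantiate the model
model = Sequential()
# First hidden layer; input_shape tells Keras how many features each row has
model.add(Dense(100, activation='relu', input_shape=(n_cols,)))
# Second hidden layer
model.add(Dense(100, activation='relu'))
# Output layer: a single unit with no activation, for regression
model.add(Dense(1))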
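
To make the layer stack above concrete, here is a minimal NumPy sketch of the forward pass that a stack of `Dense` layers computes: multiply by a weight matrix, add a bias, apply the activation. The input width, the random weights, and the helper names `relu` and `dense` are assumptions for illustration, not Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # ReLU keeps positive values and zeroes out the rest
    return np.maximum(z, 0.0)

def dense(x, weights, bias, activation=None):
    # A dense layer is an affine map followed by an optional activation
    z = x @ weights + bias
    return activation(z) if activation is not None else z

n_cols = 4                              # hypothetical number of input features
batch = rng.normal(size=(3, n_cols))    # three example rows

# Randomly initialised parameters for layers of 100, 100 and 1 units
w1, b1 = rng.normal(size=(n_cols, 100)) * 0.1, np.zeros(100)
w2, b2 = rng.normal(size=(100, 100)) * 0.1, np.zeros(100)
w3, b3 = rng.normal(size=(100, 1)) * 0.1, np.zeros(1)

hidden1 = dense(batch, w1, b1, relu)
hidden2 = dense(hidden1, w2, b2, relu)
output = dense(hidden2, w3, b3)         # no activation on the regression output

print(output.shape)
```

Each row of input produces one number, which is why the final layer has a single unit.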

## Compiling a model 

- Specify the optimizer
    - Many options, most of them mathematically complex
    - `'adam'` is usually a good choice
- Specify the loss function
    - `'mean_squared_error'` is common for regression

In [None]:
model.compile(optimizer='adam', loss='mean_squared_error')
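
For reference, mean squared error is just the average of the squared differences between targets and predictions. A small sketch with made-up numbers (the helper name `mean_squared_error` and the values are assumptions for illustration):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between targets and predictions
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]

# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4
mse = mean_squared_error(y_true, y_pred)
print(mse)
```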

## Fitting a model

- Applying backpropagation and gradient descent with your data to update the weights
- Scaling data before fitting can ease optimization
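
One common way to scale is standardization: subtract each column's mean and divide by its standard deviation. The sketch below uses made-up arrays and learns the statistics from the training rows only, then reuses them for the test rows; the variable names are assumptions.

```python
import numpy as np

# Hypothetical training and test features on very different scales
X_train = np.array([[1.0, 1000.0],
                    [2.0, 3000.0],
                    [3.0, 5000.0]])
X_test = np.array([[2.0, 2000.0]])

# Learn the scaling from the training data only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# Apply the same transformation to both sets
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled)
print(X_test_scaled)
```

After scaling, every training column has mean 0 and standard deviation 1, so no single feature dominates the gradient updates.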

In [None]:
# Train neural network
model.fit(features,  # Features
          target,  # Target
          epochs=15,  # Number of epochs
          verbose=2,  # Print one line per epoch
          batch_size=100,  # Number of observations per batch
          validation_data=(X_test, y_test))  # Data for evaluation
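
Roughly speaking, `fit` loops over the data in batches, computes the gradient of the loss on each batch, and nudges the weights against it. The sketch below does this by hand for a single linear layer with mean squared error. The synthetic data, learning rate and variable names are assumptions; Keras does the equivalent for every layer via backpropagation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic regression data: y = 2*x0 - 3*x1 + 1 plus a little noise
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 1.0 + rng.normal(scale=0.01, size=200)

w = np.zeros(2)
b = 0.0
learning_rate = 0.1
batch_size = 50

for epoch in range(100):
    order = rng.permutation(len(X))            # shuffle the rows each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        error = X[idx] @ w + b - y[idx]        # prediction minus target
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2 * X[idx].T @ error / len(idx)
        grad_b = 2 * error.mean()
        # Step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(w, b)
```

The learned `w` and `b` should end up close to the values used to generate the data.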

In [None]:
model.predict(features)

## Applied: Create a Regression NN

In [None]:
# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential
import pandas as pd

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/learn-co-students/nyc-mhtn-ds-042219-lectures/master/Module_4/kc_feat_engineering_project_revamp/kc_housing_data_for_feat_engineering_lab.csv', index_col = 0)

In [None]:
df.head()

In [None]:
features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot',
       'floors', 'waterfront', 'view', 'condition', 'grade', 'sqft_above',
       'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long',
       'sqft_living15', 'sqft_lot15', 'yr_old', 'since_sold',
       ]

In [None]:
y = df['price']
X = df[features]

In [None]:
len(features)

- Store the number of columns in the predictors data to `n_cols`. This has been done for you.
- Start by creating a `Sequential` model called `model`.
- Use the `.add()` method on `model` to add a `Dense` layer.
- Add 50 units, specify `activation='relu'`, and the `input_shape` parameter to be the tuple `(n_cols,)` which means it has `n_cols` items in each row of data, and any number of rows of data are acceptable as inputs.
- Add another `Dense` layer. This should have 32 units and a 'relu' activation.
- Finally, add an output layer, which is a `Dense` layer with a single node. Don't use any activation function here.

In [None]:
# Save the number of columns in predictors: n_cols
n_cols = len(features)

# Set up the model: model
model = ____

# Add the first layer
____.____(____(____, ____=____, ____=(____)))

# Add the second layer
____

# Add the output layer
____

- Compile the model using `model.compile()`. Your `optimizer` should be `'adam'` and the `loss` should be `'mean_squared_error'`.

In [None]:
# Compile the model
____

# Verify that model contains information from compiling
print("Loss function: " + model.loss)

- Fit the `model`. Remember that the first argument is the predictive features (`X` in this notebook) and the second argument is the data to be predicted (`y`).

In [None]:
# Fit the model
____

## Classification Models


- The `'categorical_crossentropy'` loss function is similar to log loss: lower is better
- Add `metrics=['accuracy']` to the compile step for easy-to-understand diagnostics
- The output layer has a separate node for each possible outcome and uses `'softmax'` activation
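
As a rough illustration of the points above, the sketch below computes softmax probabilities and the categorical cross-entropy by hand in NumPy. The raw outputs, targets and helper names are made up for the example.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    # Average negative log-probability assigned to the true class
    return float(-np.mean(np.sum(y_true * np.log(y_prob + eps), axis=1)))

logits = np.array([[2.0, 0.5],
                   [0.1, 1.2]])        # hypothetical raw outputs, one row per example
y_true = np.array([[1.0, 0.0],
                   [0.0, 1.0]])        # one-hot targets

probs = softmax(logits)                # each row sums to 1
loss = categorical_crossentropy(y_true, probs)

# More confident correct predictions give a lower loss
confident = np.array([[0.99, 0.01],
                      [0.01, 0.99]])
confident_loss = categorical_crossentropy(y_true, confident)

print(probs, loss, confident_loss)
```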

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/learn-co-students/nyc-mhtn-ds-042219-lectures/master/Module_4/cleaned_titanic.csv', index_col=0)
df.head()

In [None]:
predictors = df.drop(columns=['Survived'])
n_cols = predictors.shape[1]

- Convert `df.Survived` to a categorical variable using the `to_categorical()` function.

In [None]:
from keras.utils import to_categorical
# Convert the target to categorical: target
target = to_categorical(df.Survived)
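
To see what this conversion produces, here is a NumPy sketch of the same idea (one-hot encoding) on a made-up label array. It is an illustration of the output format, not the Keras implementation.

```python
import numpy as np

labels = np.array([0, 1, 1, 0])       # hypothetical 0/1 class labels
num_classes = labels.max() + 1

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```

Each row has a single 1 in the column of its class, which is the shape a 2-unit softmax output layer expects as a target.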

In [None]:
# Import `train_test_split` from `sklearn.model_selection`
from sklearn.model_selection import train_test_split


# Split the data up in train and test sets
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.2, random_state=42)

In [None]:
n_cols

- Specify a `Sequential` model called `model`.
- Add a `Dense` layer with 32 nodes. Use `'relu'` as the `activation` and `(n_cols,)` as the `input_shape`.
- Add the `Dense` output layer. Because there are two outcomes, it should have 2 units, and because it is a classification model, the `activation` should be `'softmax'`.
- Compile the model, using `'sgd'` as the `optimizer`, `'categorical_crossentropy'` as the loss function, and `metrics=['accuracy']` to see the accuracy (what fraction of predictions were correct) at the end of each epoch.
- Fit the model using the `X_train` and the `y_train`.

In [None]:
from keras.utils import to_categorical

# Convert the target to categorical: target
target = ____

# Set up the model
model = ____

# Add the first layer
____

# Add the output layer
____

# Compile the model
____

# Fit the model
____

## Saving, reloading and using your Model

In [None]:
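from keras.models import load_model

# Save the trained model to disk, then load it back from the same file
model.save('model_file.h5')
my_model = load_model('model_file.h5')

# Predict with the reloaded model; column 1 holds the probability of the positive class
predictions = my_model.predict(data_to_predict_with)
probability_true = predictions[:, 1]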

- Create your predictions using the model's `.predict()` method on `X_test`.
- Use NumPy indexing to find the column corresponding to predicted probabilities of survival being True. This is the second column (index `1`) of `predictions`. Store the result in `predicted_prob_true` and print it.
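
As a reminder of how NumPy column indexing works, here is a sketch on a small made-up array shaped like a two-class softmax output. The numbers are hypothetical.

```python
import numpy as np

# Hypothetical output for three rows: [P(class 0), P(class 1)]
example_output = np.array([[0.9, 0.1],
                           [0.3, 0.7],
                           [0.5, 0.5]])

# The colon selects all rows; the 1 selects the second column
second_column = example_output[:, 1]
print(second_column)
```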

In [None]:
# Calculate predictions: predictions
predictions = ____

# Calculate predicted probability of survival: predicted_prob_true
predicted_prob_true = ____

# print predicted_prob_true
print(predicted_prob_true)

## Verify your model structure

In [None]:
my_model.summary()
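
The parameter counts that `summary()` reports can be checked by hand: a `Dense` layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 10 input features and the 32-unit and 2-unit layers from the exercise; `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

n_cols = 10                  # hypothetical number of input features
layer_units = [32, 2]        # hidden layer followed by a 2-unit output layer

total = 0
n_inputs = n_cols
for units in layer_units:
    count = dense_param_count(n_inputs, units)
    print(f"Dense({units}): {count} parameters")
    total += count
    n_inputs = units         # this layer's outputs feed the next layer

print("Total:", total)
```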

## Let's play with Hyperparameter tuning

[Google Playground](https://developers.google.com/machine-learning/crash-course/introduction-to-neural-networks/playground-exercises)

## Hyperparameter Tuning 

    


- **Number of Hidden Layers**

*For many problems, you can start with just one or two hidden layers and it will work just fine. For more complex problems, you can gradually ramp up the number of hidden layers until your model starts to overfit. Very complex tasks, like image classification, typically need dozens of layers.*


- **Number of Neurons per layer**

*The number of neurons in the input and output layers depends on your data and the task. For hidden layers, a common practice is to create a funnel, with fewer and fewer neurons in each successive layer.*

*In general, you will get more bang for your buck by adding more layers than by adding more neurons per layer.*

- **[Activation Functions](https://towardsdatascience.com/exploring-activation-functions-for-neural-networks-73498da59b02)**
    - Linear
    - Sigmoid
    - Softmax
    - Tanh
    - ReLU
    - ELU

*In most cases you can use the ReLU activation function (or one of its variants) in the hidden layers. For the output layer, the softmax activation function is generally good for multiclass problems and the sigmoid function for binary classification problems. For regression tasks, you can simply use no activation function at all.*
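
For reference, the activation functions listed above can each be written in a line or two of NumPy. This is a sketch of the standard textbook formulas, not the Keras source; the `alpha=1.0` default for ELU is an assumption.

```python
import numpy as np

def linear(z):
    return z

def sigmoid(z):
    # Squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes values into (-1, 1)
    return np.tanh(z)

def relu(z):
    # Zero for negative inputs, identity for positive inputs
    return np.maximum(z, 0.0)

def elu(z, alpha=1.0):
    # Like ReLU, but smoothly approaches -alpha for very negative inputs
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1
    exp = np.exp(z - z.max())
    return exp / exp.sum()

z = np.array([-2.0, 0.0, 2.0])
for fn in (linear, sigmoid, tanh, relu, elu, softmax):
    print(fn.__name__, fn(z))
```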

- **[Selecting an optimizer](https://www.dlology.com/blog/quick-notes-on-how-to-choose-optimizer-in-keras/)**
    - Adam
    - SGD
    - RMSprop
    - Adagrad



- **Learning Rate**

*If you set it too low, training will eventually converge, but it will do so slowly.*
*If you set it too high, training might actually diverge.*
*If you set it slightly too high, training will make progress at first but then bounce around the optimum without settling into it.*
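
A toy sketch of these regimes: plain gradient descent on f(x) = x², whose minimum is at 0. The function, the starting point and the three learning rates are assumptions chosen only to show the behaviour.

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    # Minimise f(x) = x**2, whose gradient is 2*x and whose minimum is at 0
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x

slow = gradient_descent(0.01)      # too low: still far from 0 after 20 steps
good = gradient_descent(0.3)       # reasonable: lands very close to 0
diverged = gradient_descent(1.1)   # too high: overshoots further on every step

print(slow, good, diverged)
```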


- **Regularization** 
    - **L1 and L2**
    - **Dropout:**
        
        *Dropout is one of the most popular regularization techniques for deep neural networks. It is a fairly simple algorithm: at every training step, every neuron has a probability of being temporarily "dropped out," meaning it will be completely ignored during this training step, but it may be active during the next step.*
    
    - [Early Stopping](https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/)
    
    *Just interrupt training when the model's performance on the validation set starts dropping.*
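
For the L1 and L2 options above, the idea is to add a penalty on the size of the weights to the loss. A minimal sketch with made-up weights, a made-up data loss and an assumed strength of 0.01:

```python
import numpy as np

weights = np.array([0.5, -1.0, 2.0])   # hypothetical layer weights
data_loss = 0.8                        # hypothetical loss on the data
strength = 0.01                        # regularization strength

# L1 penalises the sum of absolute weights: 0.01 * 3.5
l1_penalty = strength * np.sum(np.abs(weights))
# L2 penalises the sum of squared weights: 0.01 * 5.25
l2_penalty = strength * np.sum(weights ** 2)

total_loss_l1 = data_loss + l1_penalty
total_loss_l2 = data_loss + l2_penalty

print(total_loss_l1, total_loss_l2)
```

Because large weights raise the total loss, the optimizer is pushed toward smaller weights, which tends to reduce overfitting.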
    
    


[Paper on selecting hyperparameters](https://arxiv.org/pdf/1206.5533v2.pdf)

# Fitting a Model with Keras

## Import  Modules 

In [None]:
# Create first network with Keras
from keras.layers import Dense, Dropout, Activation
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras import regularizers
from keras.optimizers import SGD


## Define Model
Models in Keras are defined as a sequence of layers.

We create a Sequential model and add layers one at a time until we are happy with our network topology.

In [None]:
network = Sequential()

# Add a dropout layer for input layer
network.add(Dropout(0.2, input_shape=(n_cols,)))
# Add fully connected layer with a ReLU activation function
network.add(Dense(units=16, activation='relu'))
# Add a dropout layer for previous hidden layer
network.add(Dropout(0.25))
# Add fully connected layer with a ReLU activation function and L2 regularization
network.add(Dense(units=16, kernel_regularizer=regularizers.l2(0.01),activation='relu'))
#Final Layer
network.add(Dense(2, activation='softmax'))
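
To show what a dropout layer does to its inputs during training, here is a NumPy sketch of "inverted" dropout: units are zeroed at random and the survivors are scaled up so the expected total stays the same. The helper name `dropout` and the all-ones input are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate, training=True):
    # At prediction time dropout does nothing
    if not training:
        return activations
    # Keep each unit with probability (1 - rate) ...
    keep_mask = rng.random(activations.shape) >= rate
    # ... and rescale the survivors so the expected sum is unchanged
    return activations * keep_mask / (1.0 - rate)

activations = np.ones((1000, 16))
dropped = dropout(activations, rate=0.25)

print("fraction zeroed:", (dropped == 0).mean())
print("mean activation:", dropped.mean())
```

Roughly a quarter of the values are zeroed, and the mean activation stays close to 1 because of the rescaling.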

[Using GridSearchCV to tune Neural Networks](https://chrisalbon.com/deep_learning/keras/tuning_neural_network_hyperparameters/)

## Compile model


In [None]:
# The 2-unit softmax output and one-hot targets pair with categorical cross-entropy
network.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

[Keras Implementation of optimizers](https://keras.io/optimizers/)

[Impact of Learning Rate on Model Performance](https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/)

In [None]:
# Set callback functions to early stop training and save the best model so far
callbacks = [EarlyStopping(monitor='val_loss', patience=3),
             ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=True)]
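
The `patience` setting can be understood with a simplified sketch of the stopping rule: stop once the validation loss has failed to improve for `patience` epochs in a row. The helper below and its loss values are hypothetical, and the real callback has more options than this.

```python
def early_stopping_epoch(val_losses, patience):
    # Returns the 1-based epoch at which training would stop,
    # or None if the patience is never exhausted
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up validation losses: the best value is reached at epoch 3
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.59]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch)
```

In this example training stops before the later, lower loss is ever seen, which is the trade-off a small `patience` makes.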

## Fit the Model

In [None]:
# Train neural network
history = network.fit(X_train, # Features
                      y_train, # Target
                      epochs=15, # Number of epochs
                      verbose=2, # Print one line per epoch
                      batch_size=100, # Number of observations per batch
                      callbacks=callbacks, # Early stopping and checkpointing defined above
                      validation_data=(X_test, y_test)) # Data for evaluation

In [None]:
X_test.shape

## Evaluate the Model

In [None]:
score = network.evaluate(X_test, y_test, batch_size=128)


In [None]:
print("\n%s: %.2f%%" % (network.metrics_names[1], score[1]*100))
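
The accuracy reported above is the fraction of rows where the most probable predicted class matches the true class. A sketch of that calculation on made-up probabilities and one-hot targets:

```python
import numpy as np

# Hypothetical softmax outputs and one-hot targets for four rows
probs = np.array([[0.8, 0.2],
                  [0.4, 0.6],
                  [0.3, 0.7],
                  [0.9, 0.1]])
y_true = np.array([[1, 0],
                   [0, 1],
                   [1, 0],
                   [1, 0]])

predicted_class = probs.argmax(axis=1)   # most probable class per row
true_class = y_true.argmax(axis=1)       # position of the 1 in each target row

# Three of the four rows match
accuracy = (predicted_class == true_class).mean()
print(accuracy)
```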

## Create predictions

In [None]:
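# Calculate predicted class probabilities (one column per class)
predictions = network.predict(X_test)
# Round the probability of class 1 (second column) to get a 0/1 prediction
rounded = [int(round(float(x[1]))) for x in predictions]
print(rounded)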

In [None]:
import matplotlib.pyplot as plt

# Get training and test loss histories
training_loss = history.history['loss']
test_loss = history.history['val_loss']

# Create count of the number of epochs
epoch_count = range(1, len(training_loss) + 1)

# Visualize loss history
plt.plot(epoch_count, training_loss, 'r--')
plt.plot(epoch_count, test_loss, 'b-')
plt.legend(['Training Loss', 'Test Loss'])
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show();

https://chrisalbon.com/deep_learning/keras/visualize_loss_history/

In [None]:
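# Get training and test accuracy histories
# (the key is 'acc' in older Keras releases and 'accuracy' in newer ones)
acc_key = 'accuracy' if 'accuracy' in history.history else 'acc'
training_accuracy = history.history[acc_key]
test_accuracy = history.history['val_' + acc_key]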

# Create count of the number of epochs
epoch_count = range(1, len(training_accuracy) + 1)

# Visualize accuracy history
plt.plot(epoch_count, training_accuracy, 'r--')
plt.plot(epoch_count, test_accuracy, 'b-')
plt.legend(['Training Accuracy', 'Test Accuracy'])
plt.xlabel('Epoch')
plt.ylabel('Accuracy Score')
plt.show();

https://chrisalbon.com/deep_learning/keras/visualize_performance_history/

In [None]:
# calculate predictions
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
print(rounded)

## Resources 

http://neuralnetworksanddeeplearning.com/
    
http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi

https://chrisalbon.com/deep_learning/keras/visualize_neural_network_architecture/