### Wide & Deep Model
Can be used for classification and regression (Google uses it as the recommendation algorithm for apps in Google Play).


### 1 Sparse feature vector
A sparse representation is used when feature vectors are expected to contain a large percentage of zeros, as opposed to dense vectors. <br>
**For example:** <br>
Subject = {computer_science, culture, math, etc} <br>
**One-hot:** <br>
Subject = [ 1, 0, 0, 0 ], where the 1 marks computer_science <br>
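
The one-hot encoding above can be sketched in a few lines of plain Python. The names `SUBJECTS` and `one_hot` are made up for this illustration and are not part of any library.

```python
# Vocabulary taken from the Subject example; the ordering is an assumption.
SUBJECTS = ["computer_science", "culture", "math", "etc"]

def one_hot(value, vocabulary):
    """Return a list of zeros with a single 1 at the position of `value`."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(value)] = 1
    return vector

encoded = one_hot("computer_science", SUBJECTS)
print(encoded)  # [1, 0, 0, 0]
```

With a real vocabulary of thousands of entries, almost every position is zero, which is why such vectors are called sparse.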


### 2 Feature multiplication
You can combine such a feature vector with other features by multiplying them (a feature cross) and represent the combinations together as a matrix. <br>

**+** very efficient <br>

**-** the crosses have to be designed manually <br>

**-** prone to overfitting: every feature is multiplied with the other features, so the model effectively memorizes all the combinations it has seen
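
One way to read the multiplication described above is as the outer product of two one-hot vectors, flattened into a single sparse vector. The sketch below assumes that reading; the second feature `LEVELS` and the helper names are hypothetical and only serve the illustration.

```python
from itertools import product

SUBJECTS = ["computer_science", "culture", "math", "etc"]
LEVELS = ["beginner", "advanced"]  # hypothetical second feature

def one_hot(value, vocabulary):
    """One-hot encode `value` against `vocabulary`."""
    return [1 if item == value else 0 for item in vocabulary]

def cross(vec_a, vec_b):
    """Flattened outer product: one slot per (a, b) combination."""
    return [a * b for a, b in product(vec_a, vec_b)]

crossed = cross(one_hot("math", SUBJECTS), one_hot("advanced", LEVELS))
print(len(crossed), crossed)
```

The crossed vector has one slot for every (subject, level) pair, and only the slot for (math, advanced) is set. This is what lets a linear model memorize specific combinations, and also why the number of slots grows quickly as more features are crossed.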


### 3 Dense feature vector
For the same example, Subject = {computer_science, culture, math, etc}, you can also represent each value as a dense feature vector, which lets you measure the distance between vectors:
<br>
computer_science = [ 0.3, 0.25, 0.1, 0.4 ] 
<br>
culture = [ 0.5, 0.2, 0.2, 0.1 ]
<br>
math = [ 0.33, 0.35, 0.1, 0.2 ]
<br>
etc = [ 0.4, 0.15, 0.7, 0.4 ]
<br> <br>
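
To make the idea of distance between dense vectors concrete, here is a minimal sketch that compares the four example vectors. Cosine similarity is used as the measure, which is an assumption (the text does not name one), and the helper names are made up for this illustration.

```python
import math

# Dense vectors copied from the example above.
EMBEDDINGS = {
    "computer_science": [0.3, 0.25, 0.1, 0.4],
    "culture": [0.5, 0.2, 0.2, 0.1],
    "math": [0.33, 0.35, 0.1, 0.2],
    "etc": [0.4, 0.15, 0.7, 0.4],
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(name):
    """Return the other entry whose vector is closest to that of `name`."""
    others = [key for key in EMBEDDINGS if key != name]
    return max(others, key=lambda key: cosine_similarity(EMBEDDINGS[name], EMBEDDINGS[key]))

print(most_similar("computer_science"))  # math
```

With these example values, math turns out to be the closest neighbour of computer_science, something a one-hot encoding cannot express because all one-hot vectors are equally far apart.
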
**3.1 Word2Vec**
<br>
Word2Vec uses exactly this approach to calculate the similarity of words. As a result we get, approximately:
<br>
man - woman = king - queen
<br><br>
**+** it also takes the meaning of the items into account <br>
**+** it also generalizes to information that did not appear in the training phase <br>
**+** less manual work <br>
**-** prone to underfitting (over-generalization): for example, it may recommend something you don't really want

### 4 Wide & Deep Model
![title](imag/wide_deep_learning.png)
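
As a rough sketch of the data flow, the forward pass below mirrors the Keras model built in section 5: the deep input passes through a hidden layer, the wide input skips it, and both are concatenated before a single linear output unit. It uses only one hidden layer for brevity, and all weights are hand-picked toy numbers rather than trained values.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def wide_deep_forward(x_wide, x_deep, hidden_w, hidden_b, out_w, out_b):
    hidden = relu(dense(x_deep, hidden_w, hidden_b))  # deep path
    concat = list(x_wide) + hidden                    # wide input skips the hidden layer
    return dense(concat, out_w, out_b)[0]             # single linear output unit

# Toy, hand-picked numbers (hypothetical, not trained weights).
prediction = wide_deep_forward(
    x_wide=[1.0, 2.0],
    x_deep=[0.5, -1.0],
    hidden_w=[[1.0, 0.0], [0.0, 1.0]],
    hidden_b=[0.0, 0.0],
    out_w=[[0.1, 0.2, 0.3, 0.4]],
    out_b=[0.0],
)
print(prediction)
```

The wide inputs reach the output through a single linear layer (memorization), while the deep inputs are transformed by the hidden layer first (generalization).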

### 5 Use the Wide & Deep model to predict California housing prices

In [None]:
import tensorflow as tf
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd
import os
import sys
import time
from tensorflow import keras
from tensorflow.keras.callbacks import History  # public API path instead of the private tensorflow.python module
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

In [None]:

# Loading datasets
housing = fetch_california_housing()
# Split the data into train, validation and test sets.
# By default train_test_split splits 3:1 (default parameter test_size = 0.25).
x_train_all, x_test, y_train_all, y_test = train_test_split(housing.data, housing.target, random_state = 7)
x_train, x_valid, y_train, y_valid = train_test_split(x_train_all, y_train_all, random_state = 11)

print(x_train.shape, y_train.shape)
print(x_valid.shape, y_valid.shape)
print(x_test.shape, y_test.shape)

# Data normalization
# before normalization
print(np.max(x_train), np.min(x_train))

# perform normalization
scaler = StandardScaler()
# The housing features are already a 2-D float array of shape [None, 8],
# so no dtype conversion or reshaping is needed.
# Fit the scaler on the training set only and reuse its statistics
# for the validation and test sets.
x_train_scaled = scaler.fit_transform(x_train)
x_valid_scaled = scaler.transform(x_valid)
x_test_scaled  = scaler.transform(x_test)

# after normalization
# print(np.max(x_train_scaled), np.min(x_train_scaled))

In [None]:
# A wide & deep model usually takes multiple inputs. For a simple example we
# build two overlapping subsets from the same 8-feature dataset:
# the wide input takes the first 5 columns and the deep input the last 6.
input_wide = keras.layers.Input(shape=[5]) # wide part

input_deep = keras.layers.Input(shape=[6]) # begin building the deep part
hidden1 = keras.layers.Dense(30, activation='relu')(input_deep)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)

# concatenate the wide input with the output of the deep part
concat = keras.layers.concatenate([input_wide, hidden2])
output = keras.layers.Dense(1)(concat)

# just like multiple inputs, a network with multiple outputs is also possible
'''
output2 = keras.layers.Dense(1)(hidden2)
model = keras.models.Model(inputs = [input_wide, input_deep],
                           outputs = [output, output2])
'''
model = keras.models.Model(inputs = [input_wide, input_deep],
                           outputs = output)
model.summary()
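# mean_squared_error as the loss makes this a regression model;
# accuracy is meaningless for regression, so track mean absolute error instead
model.compile(loss = "mean_squared_error", optimizer = "sgd", metrics = ["mean_absolute_error"])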

callbacks = [
    keras.callbacks.EarlyStopping(patience = 5, min_delta = 1e-2)
]


In [None]:
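# simulate wide and deep inputs by slicing the 8 feature columns:
# the first 5 columns feed the wide part, columns 2..7 (the last 6) the deep part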

x_train_scaled_wide = x_train_scaled[:, :5]
x_train_scaled_deep = x_train_scaled[:, 2:]
x_test_scaled_wide = x_test_scaled[:, :5]
x_test_scaled_deep = x_test_scaled[:, 2:]
x_valid_scaled_wide = x_valid_scaled[:, :5]
x_valid_scaled_deep = x_valid_scaled[:, 2:]

history = model.fit([x_train_scaled_wide, x_train_scaled_deep], 
                    y_train, 
                    validation_data=([x_valid_scaled_wide, x_valid_scaled_deep], y_valid),
                    epochs = 100, 
                    callbacks = callbacks)

In [None]:
def plot_learning_curves(history: History):
    pd.DataFrame(history.history).plot(figsize = (8, 5))
    plt.grid(True)
    plt.gca().set_ylim(0, 1)
    plt.show()

plot_learning_curves(history)

In [None]:
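# evaluate returns the loss followed by the metric chosen in compile
test_loss, test_metric = model.evaluate([x_test_scaled_wide, x_test_scaled_deep], y_test)
# predicted house values (regression output), one value per test sample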
predictions = model.predict([x_test_scaled_wide, x_test_scaled_deep])
num_samples = 40

# compare the true target with the prediction for the first samples
for indx in range(num_samples):
    print(y_test[indx], predictions[indx])