<a href="https://colab.research.google.com/github/zjzsu2000/CMPE258/blob/master/Ungraded_assignment_5/2%EF%BC%89_TensorFlow_Redo2_basic_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Ref:
https://www.tensorflow.org/tutorials/keras/regression

# TensorFlow_Redo2_basic_regression

In [0]:
# Use seaborn for pairplot
!pip install seaborn

# Use some functions from tensorflow_docs
!pip install git+https://github.com/tensorflow/docs

In [0]:
import pathlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

In [0]:
import tensorflow_docs as tfdocs
import tensorflow_docs.plots
import tensorflow_docs.modeling

## Using the Auto MPG dataset

The dataset is available from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/).


### Get the data


In [0]:
data_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
data_path

### Import it using pandas

In [0]:
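columns = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
           'Acceleration', 'Model Year', 'Origin']
# "?" marks missing values; the car name that follows the tab on each row is
# treated as a comment and skipped.
df = pd.read_csv(data_path, names=columns, na_values="?",
                 comment='\t', sep=" ", skipinitialspace=True)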

data = df.copy()
data.head()

### Clean the data


In [0]:
data.isna().sum()

### Drop the rows that contain NaN values

In [0]:
data = data.dropna()

### Convert the categorical column to one-hot encoding

In [0]:
data['Origin'] = data['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})

In [0]:
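# dtype=float keeps the indicator columns numeric, so that describe() and the
# normalization step below include them (newer pandas defaults to bool).
data = pd.get_dummies(data, prefix='', prefix_sep='', dtype=float)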
data.head()
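
To make the two steps above concrete, here is a minimal sketch on a hypothetical three-row frame (the name `toy` and its values are made up for illustration). It maps numeric `Origin` codes to country names and then expands them into indicator columns; `dtype=float` is passed here so the indicators come out numeric.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative, not taken from the dataset.
toy = pd.DataFrame({"MPG": [18.0, 26.0, 31.0], "Origin": [1, 2, 3]})

# Replace the numeric codes with readable labels.
toy["Origin"] = toy["Origin"].map({1: "USA", 2: "Europe", 3: "Japan"})

# Expand the label column into one indicator column per label.
toy_onehot = pd.get_dummies(toy, prefix="", prefix_sep="", dtype=float)

print(toy_onehot)
```

Each row ends up with exactly one of `Europe`, `Japan`, `USA` set to 1.0, so the model does not read the original codes 1, 2, 3 as an ordered quantity.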

### Split the data into train and test

In [0]:
train = data.sample(frac=0.8, random_state=0)
test = data.drop(train.index)
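
The `sample`/`drop` pattern can be checked on a small example. The sketch below uses a hypothetical ten-row frame (`toy_frame`, `toy_train` and `toy_test` are illustrative names) to show that the two splits are disjoint and together cover every row.

```python
import pandas as pd

# Hypothetical ten-row frame standing in for the cleaned dataset.
toy_frame = pd.DataFrame({"x": range(10)})

# Draw 80% of the rows at random (seeded for reproducibility) ...
toy_train = toy_frame.sample(frac=0.8, random_state=0)
# ... and keep the remaining rows, identified by index label, for testing.
toy_test = toy_frame.drop(toy_train.index)

print(len(toy_train), len(toy_test))  # 8 2
```

This relies on the index labels being unique, because `drop` removes rows by label rather than by position.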

### Inspect the data


In [0]:
sns.pairplot(train[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")

Also look at the overall statistics:

In [0]:
train_stats = train.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_stats

### Split features from labels


In [0]:
train_labels = train.pop('MPG')
test_labels = test.pop('MPG')

### Normalize

In [0]:
def normalize(X):
  return (X - train_stats['mean']) / train_stats['std']

In [0]:
train_normed = normalize(train)
test_normed = normalize(test)
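
As a small illustration of the normalization above, the sketch below standardizes a hypothetical one-column frame (the `Weight` values are made up). The point it shows is that the test split is scaled with the mean and standard deviation of the training split, not with its own statistics.

```python
import pandas as pd

# Hypothetical feature values, chosen only for illustration.
toy_train = pd.DataFrame({"Weight": [2000.0, 3000.0, 4000.0]})
toy_test = pd.DataFrame({"Weight": [3500.0]})

# Statistics come from the training split only.
toy_mean = toy_train.mean()
toy_std = toy_train.std()  # pandas uses the sample standard deviation (ddof=1)

toy_train_normed = (toy_train - toy_mean) / toy_std
toy_test_normed = (toy_test - toy_mean) / toy_std  # reuse the training statistics

print(toy_train_normed["Weight"].tolist())  # [-1.0, 0.0, 1.0]
print(toy_test_normed["Weight"].tolist())   # [0.5]
```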

## Modeling

### Build the model
Use a `Sequential` model with two densely connected hidden layers and an output layer that returns a single, continuous value.

In [0]:
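# Import from tensorflow.keras, not standalone keras, to stay consistent with
# the `keras` module imported above.
from tensorflow.keras.layers import Dense
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import RMSprop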

In [0]:
def build_model():
  model = Sequential([
    Dense(64, activation='relu', input_shape=[len(train.keys())]),
    Dense(64, activation='relu'),
    Dense(1)
  ])

  model.compile(loss='mse', optimizer=RMSprop(0.001), metrics=['mae', 'mse'])
  return model

In [0]:
model = build_model()

### Inspect the model


In [0]:
model.summary()
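
As a cross-check on the summary, the parameter count of a `Dense` layer is `inputs * units + units` (weights plus one bias per unit). The sketch below assumes nine input features, which is what remains once `MPG` is removed and `Origin` is expanded into three columns; `dense_params` is a hypothetical helper, not part of Keras.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 9            # assumption: six numeric columns + three origin columns
layer_sizes = [64, 64, 1]  # the units of the three Dense layers above

counts = []
n_in = n_features
for units in layer_sizes:
    counts.append(dense_params(n_in, units))
    n_in = units  # the output of one layer feeds the next

print(counts, sum(counts))  # [640, 4160, 65] 4865
```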

### Try out the model

Take a batch of `10` examples from the training data and call `model.predict` on it.

In [0]:
batch = train_normed[:10]
result = model.predict(batch)
result

### Train the model

Train the model for 1000 epochs, holding out 20% of the training data for validation.

In [0]:
history = model.fit(train_normed, train_labels, epochs=1000, 
                    validation_split = 0.2, verbose=0,
                    callbacks=[tfdocs.modeling.EpochDots()])

## Visualize the model's training progress using the stats stored in the `history` object

In [0]:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch

In [0]:
hist.head()

In [0]:
plotter = tfdocs.plots.HistoryPlotter(smoothing_std=2)

In [0]:
plotter.plot({'Basic': history}, metric = "mae")
plt.ylim([0, 10])
plt.ylabel('MAE [MPG]')

In [0]:
plotter.plot({'Basic': history}, metric = "mse")
plt.ylim([0, 20])
plt.ylabel('MSE [MPG^2]')

## Update the `model.fit` call to stop training automatically when the validation score doesn't improve

In [0]:
model = build_model()

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

early_history = model.fit(train_normed, train_labels, 
                    epochs=1000, validation_split = 0.2, verbose=0, 
                    callbacks=[early_stop, tfdocs.modeling.EpochDots()])
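
The idea behind the `patience` argument can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: `stopping_epoch` is a hypothetical function, the loss values are made up, and any decrease at all is assumed to count as an improvement.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # the loss kept improving often enough; never stopped


# Hypothetical validation losses, one per epoch.
losses = [5.0, 4.0, 3.5, 3.6, 3.7, 3.4, 3.5, 3.6, 3.8]
print(stopping_epoch(losses, patience=3))  # 8
print(stopping_epoch(losses, patience=2))  # 4
```

A larger `patience` tolerates longer plateaus before giving up, at the cost of running more epochs that may not help.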

In [0]:
plotter.plot({'Early Stopping': early_history}, metric = "mae")
plt.ylim([0, 10])
plt.ylabel('MAE [MPG]')

## Using the **test** set

In [0]:
loss, mae, mse = model.evaluate(test_normed, test_labels, verbose=2)

print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))
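
For reference, the two metrics reported by `evaluate` can be reproduced by hand. The sketch below uses hypothetical labels and predictions (not results from this model) to show how mean absolute error and mean squared error are computed with NumPy.

```python
import numpy as np

# Hypothetical true values and predictions, in MPG.
y_true = np.array([20.0, 30.0, 25.0, 15.0])
y_pred = np.array([22.0, 27.0, 25.0, 16.0])

errors = y_pred - y_true
toy_mae = np.mean(np.abs(errors))  # mean absolute error, in MPG
toy_mse = np.mean(errors ** 2)     # mean squared error, in MPG^2

print(toy_mae, toy_mse)  # 1.5 3.5
```

MAE is in the same units as the label, which makes it easier to interpret than MSE when judging how far off a typical prediction is.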

### Make predictions

Predict MPG values using the test dataset:

In [0]:
test_preds = model.predict(test_normed).flatten()




In [0]:
a = plt.axes(aspect='equal')
plt.scatter(test_labels, test_preds)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
lims = [0, 50]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)

The points appear to fall close to the diagonal, which suggests the model predicts reasonably well.

#### The error distribution.

In [0]:
error = test_preds - test_labels
plt.hist(error, bins = 25)
plt.xlabel("Prediction Error [MPG]")
_ = plt.ylabel("Count")

The distribution is not quite Gaussian, which is plausible given how few samples the test set contains.