## Step 0: Import Python packages

Let's start by importing some basic packages.

In [1]:
import pandas as pd
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

import numpy as np
import json

from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from google.cloud import bigquery
from witwidget.notebook.visualization import WitWidget, WitConfigBuilder


## Step 1: Download BigQuery data to our notebook


Click the home button of your Cloud Console and fill in the values for your Project Number and Project ID. We are going to use the BigQuery Python SDK to query our data warehouse for the Natality dataset.

In [2]:
import os

from google.cloud import bigquery

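# Find your project number and project ID on the Cloud Console home page
PROJECT_NUMBER = "your-project-number"  # ENTER PROJECT NUMBER HERE
PROJECT_ID = "your-project-id"  # ENTER PROJECT ID HERE
# the BigQuery client's project argument expects the project ID
client = bigquery.Client(project=PROJECT_ID)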
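If you would rather not hard-code the project ID, a small fallback helper can read it from the environment instead. This is a minimal sketch: resolve_project_id is a hypothetical helper, and it assumes the GOOGLE_CLOUD_PROJECT environment variable (a common Google Cloud convention) may or may not be set in your notebook environment.

```python
import os
from typing import Mapping, Optional


def resolve_project_id(explicit: Optional[str] = None,
                       env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: prefer an explicit ID, else GOOGLE_CLOUD_PROJECT."""
    project_id = explicit or env.get("GOOGLE_CLOUD_PROJECT")
    if not project_id:
        # Fail early with a clear message instead of a confusing API error later.
        raise ValueError("Set PROJECT_ID or the GOOGLE_CLOUD_PROJECT environment variable.")
    return project_id
```

Usage in the cell above would look like bigquery.Client(project=resolve_project_id(PROJECT_ID)).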

Using the Python SDK, we can run a SQL query to retrieve our dataset and load it into a pandas DataFrame.

In [3]:
query="""
SELECT
  weight_pounds,
  is_male,
  mother_age,
  plurality,
  gestation_weeks
FROM
  publicdata.samples.natality
WHERE year > 2000
LIMIT 10000
"""
df = client.query(query).to_dataframe()
df.head()

| | weight_pounds | is_male | mother_age | plurality | gestation_weeks |
|---|---|---|---|---|---|
| 0 | 7.568469 | True | 22 | 1 | 46.0 |
| 1 | 8.807467 | True | 39 | 1 | 42.0 |
| 2 | 8.313632 | True | 23 | 1 | 35.0 |
| 3 | 8.000575 | False | 27 | 1 | 40.0 |
| 4 | 6.563162 | False | 29 | 1 | 39.0 |


In [4]:
df.describe()

| | weight_pounds | mother_age | plurality | gestation_weeks |
|---|---|---|---|---|
| count | 9991.0 | 10000.0 | 10000.0 | 9888.0 |
| mean | 7.278609 | 27.3653 | 1.0303 | 38.681634 |
| std | 1.354406 | 6.235699 | 0.183808 | 2.622498 |
| min | 0.500449 | 12.0 | 1.0 | 19.0 |
| 25% | 6.624891 | 22.0 | 1.0 | 38.0 |
| 50% | 7.374463 | 27.0 | 1.0 | 39.0 |
| 75% | 8.124034 | 32.0 | 1.0 | 40.0 |
| max | 12.936726 | 51.0 | 4.0 | 47.0 |
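The count row already tells us how many nulls the next step will drop. The arithmetic below uses only the counts printed above (is_male is omitted by describe() because it is boolean, but its value counts sum to 10,000, so it has no nulls). Because dropna() removes a row if any column is null, we can only bound the surviving row count, not pin it down.

```python
# Non-null counts copied from the df.describe() output above (10,000 rows queried).
TOTAL_ROWS = 10000
non_null_counts = {
    "weight_pounds": 9991,
    "mother_age": 10000,
    "plurality": 10000,
    "gestation_weeks": 9888,
}

# Missing values per column = rows queried minus non-null count.
missing = {col: TOTAL_ROWS - n for col, n in non_null_counts.items()}
print(missing)

# Rows left after dropna(): at most the smallest non-null count (nulls fully overlap),
# at least TOTAL_ROWS minus all missing cells (nulls never share a row).
upper_bound = min(non_null_counts.values())
lower_bound = TOTAL_ROWS - sum(missing.values())
print(lower_bound, upper_bound)  # 9879 9888
```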


In [5]:
df['is_male'].value_counts()


True     5190
False    4810
Name: is_male, dtype: int64

## Step 2: Prepare the dataset for training

To save time and move through the lab quickly, we will do only very light preprocessing.

In [6]:
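# drop rows that contain any null values
df = df.dropna()

# shuffle is a function from sklearn.utils; random_state makes the order reproducible
df = shuffle(df, random_state=2)

# grab our target column
labels = df['weight_pounds']

# grab our features (every column except the target)
data = df.drop(columns=['weight_pounds'])

# encode the boolean is_male column as 0/1 integers
data['is_male'] = data['is_male'].astype(int)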
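To make each preprocessing step explicit, here is a pure-Python sketch of the same four operations on a list of row dictionaries. light_preprocess is a hypothetical helper, not part of the lab; it uses Python's random module in place of sklearn's shuffle, so the resulting row order will not match the pandas version.

```python
import random
from typing import Any, Dict, List, Tuple


def light_preprocess(rows: List[Dict[str, Any]], label: str = "weight_pounds",
                     seed: int = 2) -> Tuple[List[Dict[str, Any]], List[Any]]:
    """Hypothetical pure-Python mirror of the pandas preprocessing above."""
    # 1. Drop any row with a missing value (like df.dropna()).
    complete = [r for r in rows if all(v is not None for v in r.values())]
    # 2. Shuffle reproducibly (like shuffle(df, random_state=2)); the order
    #    differs from sklearn's because Python's RNG is used instead.
    random.Random(seed).shuffle(complete)
    # 3. Split the target column from the features.
    labels = [r[label] for r in complete]
    features = [{k: v for k, v in r.items() if k != label} for r in complete]
    # 4. Encode the boolean is_male flag as 0/1.
    for f in features:
        f["is_male"] = int(f["is_male"])
    return features, labels
```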

In [7]:
data.head()

| | is_male | mother_age | plurality | gestation_weeks |
|---|---|---|---|---|
| 39 | 1 | 25 | 1 | 40.0 |
| 6130 | 0 | 20 | 1 | 40.0 |
| 5986 | 0 | 26 | 1 | 38.0 |
| 7683 | 0 | 23 | 1 | 41.0 |
| 4914 | 0 | 24 | 1 | 39.0 |


## Step 3: Split the data into train and test sets

Once the data is prepped, the next step is to perform our train/test split and define our model. We will create a Keras Sequential model whose hidden layers use the ReLU activation function.

In [8]:
x,y = data,labels
x_train,x_test,y_train,y_test = train_test_split(x,y)
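Called without test_size, train_test_split holds out 25% of the rows for testing, and because no random_state is passed, the split changes on every run (pass random_state=... to make it reproducible). The sketch below approximates that behavior with the standard library: it rounds the test share up and takes the test rows from the front of a shuffled index permutation. simple_train_test_split is a hypothetical helper, and its shuffle will not reproduce sklearn's exact split.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def simple_train_test_split(items: Sequence[T], test_size: float = 0.25,
                            seed: int = 0) -> Tuple[List[T], List[T]]:
    # Round the test share up, as sklearn does for a fractional test_size.
    n_test = math.ceil(len(items) * test_size)
    permutation = list(range(len(items)))
    random.Random(seed).shuffle(permutation)
    # First n_test shuffled indices become the test set, the rest the train set.
    test = [items[i] for i in permutation[:n_test]]
    train = [items[i] for i in permutation[n_test:]]
    return train, test
```

For 10 rows, this yields 7 training rows and 3 test rows (ceil of 2.5).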

In [9]:
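model = Sequential([
    # two fully connected hidden layers with ReLU activations
    Dense(64, activation='relu', input_shape=(len(x_train.iloc[0]),)),
    Dense(32, activation='relu'),
    # a single linear output unit for the predicted weight
    Dense(1),
])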
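With four input features (is_male, mother_age, plurality, gestation_weeks), the size of this network can be worked out by hand: each Dense layer has one weight per input-output pair plus one bias per unit. dense_param_count is a hypothetical helper for the arithmetic; model.summary() should report the same total.

```python
from typing import List


def dense_param_count(input_dim: int, layer_units: List[int]) -> List[int]:
    """Trainable parameters per Dense layer: weights plus biases."""
    counts = []
    fan_in = input_dim
    for units in layer_units:
        counts.append(fan_in * units + units)  # weight matrix + bias vector
        fan_in = units                        # this layer's output feeds the next
    return counts


per_layer = dense_param_count(4, [64, 32, 1])
print(per_layer, sum(per_layer))  # [320, 2080, 33] 2433
```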


## Step 4: Train the model

Once the model is defined, the next step is to compile and train it.

In [10]:
model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['mae', 'mse'])
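RMSprop scales each step by a running root-mean-square of recent gradients, so the step size stays roughly constant regardless of the raw gradient's magnitude. The sketch below applies that rule to a single scalar weight. It is a simplified illustration (no momentum, no centering), not Keras's implementation; the defaults echo Keras's documented RMSprop defaults (learning rate 0.001, rho 0.9, epsilon 1e-7), and rmsprop_minimize is a hypothetical helper.

```python
import math
from typing import Callable


def rmsprop_minimize(grad_fn: Callable[[float], float], w0: float, lr: float = 0.001,
                     rho: float = 0.9, epsilon: float = 1e-7, steps: int = 1000) -> float:
    """Simplified scalar RMSprop (no momentum, not centered)."""
    w, v = w0, 0.0
    for _ in range(steps):
        g = grad_fn(w)
        v = rho * v + (1 - rho) * g * g          # moving average of squared gradients
        w -= lr * g / (math.sqrt(v) + epsilon)   # step scaled by the RMS of recent gradients
    return w


# Minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = rmsprop_minimize(lambda w: 2 * (w - 3), w0=0.0, lr=0.01, steps=2000)
print(round(w_final, 2))
```

Because each step is roughly lr in size, the weight settles into a small oscillation around the minimum rather than landing on it exactly.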

In [11]:
model.fit(x_train, y_train, epochs=10, validation_split=0.1)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f51f07b1f50>
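With validation_split=0.1, Keras holds out the last 10% of x_train and y_train as validation data, taking them from the end of the arrays before any shuffling. That is safe here because the rows were already shuffled in Step 2 and again by train_test_split. The sketch below approximates the split; keras_style_validation_split is a hypothetical helper, and the floor-based split point is an assumption about Keras internals.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def keras_style_validation_split(samples: Sequence[T],
                                 validation_split: float) -> Tuple[List[T], List[T]]:
    # Validation rows come from the *end* of the data, not from a random sample.
    split_at = int(math.floor(len(samples) * (1.0 - validation_split)))
    return list(samples[:split_at]), list(samples[split_at:])
```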

## Step 5: Evaluate the model 

We can grab a set of samples from our test dataset to run a quick evaluation on our predictions.

In [12]:
num_examples = 10
predictions = model.predict(x_test[:num_examples])



In [13]:
for i in range(num_examples):
    print('Predicted val: ', predictions[i][0])
    print('Actual val: ',y_test.iloc[i])
    print()

Predicted val:  6.6830378
Actual val:  6.3118345610599995

Predicted val:  6.485905
Actual val:  5.37486994756

Predicted val:  7.255404
Actual val:  6.9996768185

Predicted val:  7.660682
Actual val:  9.00808802532

Predicted val:  7.391691
Actual val:  7.43839671988

Predicted val:  7.0401845
Actual val:  6.2501051276999995

Predicted val:  6.839683
Actual val:  8.811876612139999

Predicted val:  7.0075073
Actual val:  5.3131405142

Predicted val:  6.839683
Actual val:  6.686620406459999

Predicted val:  6.7544136
Actual val:  7.29068700434
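The model was compiled with MAE and MSE as metrics, and the same quantities can be computed by hand from the ten predictions printed above. Ten samples are far too few for a reliable estimate; for the whole test set, model.evaluate(x_test, y_test) returns the loss together with the compiled metrics.

```python
# The ten local predictions and actual weights printed above (pounds).
predicted = [6.6830378, 6.485905, 7.255404, 7.660682, 7.391691,
             7.0401845, 6.839683, 7.0075073, 6.839683, 6.7544136]
actual = [6.3118345610599995, 5.37486994756, 6.9996768185, 9.00808802532,
          7.43839671988, 6.2501051276999995, 8.811876612139999, 5.3131405142,
          6.686620406459999, 7.29068700434]

errors = [p - a for p, a in zip(predicted, actual)]
mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error (pounds)
mse = sum(e * e for e in errors) / len(errors)   # mean squared error (pounds squared)
print(f"MAE: {mae:.3f} lb, MSE: {mse:.3f} lb^2")
```

An MAE of roughly 0.8 lb on these samples suggests the predictions cluster near the mean birth weight and miss the extremes, which matches the printed pairs.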



In [14]:
model

<keras.engine.sequential.Sequential at 0x7f5200128990>

## Step 6: Register and Deploy Model

In [15]:
import time
t = time.localtime()
current_time = time.strftime("%H-%M-%S", t)
print(current_time)

22-14-30


We are going to export our model artifacts to our Cloud Storage bucket. The destination path is stored in the export_dir variable; note that the code assumes a bucket named after your project ID (gs://PROJECT_ID) already exists.

In [16]:
export_dir = '{}/export/natality_{}'.format('gs://'+PROJECT_ID, time.strftime("%Y%m%d-%H%M%S"))
print('Exporting to {}'.format(export_dir))
tf.saved_model.save(model, export_dir)

Exporting to gs://vertex-ai-demo-351514/export/natality_20220622-221431
INFO:tensorflow:Assets written to: gs://vertex-ai-demo-351514/export/natality_20220622-221431/assets
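The export path combines the project ID with a fresh timestamp so that each export lands in its own folder. Pulling that into a function that takes the time as an argument makes the naming easy to check. build_export_dir is a hypothetical helper; reusing one timestamp for both export_dir and the model's display name (instead of computing current_time separately) would also keep the two aligned.

```python
from datetime import datetime


def build_export_dir(project_id: str, when: datetime) -> str:
    # Mirrors the notebook's pattern: gs://<project_id>/export/natality_<YYYYmmdd-HHMMSS>
    return "gs://{}/export/natality_{}".format(project_id, when.strftime("%Y%m%d-%H%M%S"))


print(build_export_dir("my-project", datetime(2022, 6, 22, 22, 14, 31)))
# gs://my-project/export/natality_20220622-221431
```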


Once the model artifacts are stored, we are going to use the SDK to upload the model to the Vertex AI Model Registry.

In [17]:
from google.cloud import aiplatform
my_model = aiplatform.Model.upload(display_name='natality-test'+current_time,
                                  artifact_uri=export_dir,
                                  serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest',
                                  project=PROJECT_ID)



Creating Model
Create Model backing LRO: projects/697349885677/locations/us-central1/models/8678423287803412480/operations/7203329002059071488
Model created. Resource name: projects/697349885677/locations/us-central1/models/8678423287803412480
To use this Model in another session:
model = aiplatform.Model('projects/697349885677/locations/us-central1/models/8678423287803412480')


If you go back to the Vertex AI section of the Cloud Console, you will see the model you just created. Next, let's deploy it to an endpoint.

In [18]:
endpoint = my_model.deploy(
     deployed_model_display_name='my-endpoint',
     traffic_split={"0": 100},
     machine_type="n1-standard-4",
     accelerator_count=0,
     min_replica_count=1,
     max_replica_count=1,
   )

Creating Endpoint
Create Endpoint backing LRO: projects/697349885677/locations/us-central1/endpoints/3447602271775358976/operations/8000466136103649280
Endpoint created. Resource name: projects/697349885677/locations/us-central1/endpoints/3447602271775358976
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/697349885677/locations/us-central1/endpoints/3447602271775358976')
Deploying model to Endpoint : projects/697349885677/locations/us-central1/endpoints/3447602271775358976
Deploy Endpoint model backing LRO: projects/697349885677/locations/us-central1/endpoints/3447602271775358976/operations/8423804501076475904
Endpoint model deployed. Resource name: projects/697349885677/locations/us-central1/endpoints/3447602271775358976
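In the deploy call, traffic_split={"0": 100} sends all traffic to the model being deployed; the key "0" stands for that model, while other keys would be the IDs of models already deployed to the endpoint, and the percentages should sum to 100. The check below is a client-side sketch under those assumptions; validate_traffic_split is a hypothetical helper, not part of the aiplatform SDK.

```python
from typing import Dict


def validate_traffic_split(traffic_split: Dict[str, int]) -> None:
    """Hypothetical sanity check before calling deploy()."""
    if any(p < 0 for p in traffic_split.values()):
        raise ValueError("traffic percentages must be non-negative")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"traffic split sums to {total}, expected 100")


validate_traffic_split({"0": 100})  # the split used above: all traffic to the new model
```

A canary rollout would instead use something like {"0": 10, "<existing-deployed-model-id>": 90}, where the second key is a placeholder for a real deployed model ID.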


In [43]:
num_examples

10

## Step 7: Call the REST Endpoint
Next, we convert our NumPy data to float32 and then to a Python list. The list conversion is needed because NumPy values are not JSON serializable, so they can't be sent in the body of our request as-is.

In [28]:
results = endpoint.predict(instances=  np.asarray(x_test[:num_examples]).astype(np.float32).tolist() ).predictions
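The prediction request body is JSON with the feature rows under an "instances" key, which is why plain Python numbers are required. The sketch below builds such a body with the standard library, using two feature rows from data.head() above in the column order is_male, mother_age, plurality, gestation_weeks. It only illustrates the payload shape; endpoint.predict builds and sends the request for you.

```python
import json

# Two feature rows from data.head(): [is_male, mother_age, plurality, gestation_weeks]
rows = [[1, 25, 1, 40.0], [0, 20, 1, 40.0]]

# Plain Python floats serialize cleanly, which is what .tolist() produces.
instances = [[float(v) for v in row] for row in rows]
body = json.dumps({"instances": instances})
print(body)
# {"instances": [[1.0, 25.0, 1.0, 40.0], [0.0, 20.0, 1.0, 40.0]]}
```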

In [42]:
for i in range(len(results)):
    print('Predicted val: ', results[i][0])
    print('Actual val: ',y_test.iloc[i])
    print()

Predicted val:  6.68303728
Actual val:  6.3118345610599995

Predicted val:  6.48590517
Actual val:  5.37486994756

Predicted val:  7.25540447
Actual val:  6.9996768185

Predicted val:  7.6606822
Actual val:  9.00808802532

Predicted val:  7.39169121
Actual val:  7.43839671988

Predicted val:  7.04018354
Actual val:  6.2501051276999995

Predicted val:  6.83968306
Actual val:  8.811876612139999

Predicted val:  7.00750637
Actual val:  5.3131405142

Predicted val:  6.83968306
Actual val:  6.686620406459999

Predicted val:  6.75441408
Actual val:  7.29068700434
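The endpoint's predictions match the local ones from Step 5 up to about the seventh significant digit. Differences of this size are most likely float32 rounding from running the same model in a different serving environment. The check below compares the two printed lists with a relative tolerance.

```python
import math

# Local predictions (Step 5) and endpoint predictions (Step 7), as printed above.
local = [6.6830378, 6.485905, 7.255404, 7.660682, 7.391691,
         7.0401845, 6.839683, 7.0075073, 6.839683, 6.7544136]
remote = [6.68303728, 6.48590517, 7.25540447, 7.6606822, 7.39169121,
          7.04018354, 6.83968306, 7.00750637, 6.83968306, 6.75441408]

max_diff = max(abs(a - b) for a, b in zip(local, remote))
agree = all(math.isclose(a, b, rel_tol=1e-5) for a, b in zip(local, remote))
print(f"max |local - endpoint| = {max_diff:.1e}, agree: {agree}")
```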

