<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>


---

## Model Submission Guide: Dates Classification Competition
Let's share our models on a centralized leaderboard so that we can collaborate and learn from each other's model experiments.

**Instructions:**
1. Get data in and set up X_train, X_test, y_train objects
2. Preprocess data using Sklearn / Write and Save Preprocessor function
3. Fit model on preprocessed data and save preprocessor function and model
4. Generate predictions from X_test data and submit model to competition
5. Repeat submission process to improve place on leaderboard



## 1. Get data in and set up X_train, X_test, y_train objects

In [2]:
#install aimodelshare library
! pip install aimodelshare --upgrade



In [3]:
# Get competition data
from aimodelshare import download_data
download_data('public.ecr.aws/i7c1w0k3/date_competition_data-repository:latest') 


Data downloaded successfully.


In [12]:
# Separate data into X_train, y_train, and X_test
import pandas as pd
y_train = pd.read_csv("date_competition_data/y_train.csv")
y_train_labels = y_train.idxmax(axis=1)

X_train = pd.read_csv("date_competition_data/X_train.csv")
X_test=pd.read_csv("date_competition_data/X_test.csv")

X_train.head()

Unnamed: 0,PERIMETER,MAJOR_AXIS,MINOR_AXIS,ECCENTRICITY,EQDIASQ,SOLIDITY,CONVEX_AREA,EXTENT,ASPECT_RATIO,ROUNDNESS,...,SkewRB,KurtosisRR,KurtosisRG,KurtosisRB,EntropyRR,EntropyRG,EntropyRB,ALLdaub4RR,ALLdaub4RG,ALLdaub4RB
0,2385.511,849.6287,548.4377,0.7638,681.0944,0.9874,368969,0.7596,1.5492,0.8045,...,0.3835,2.5413,2.701,4.7956,-45702240000.0,-31309285376,-31791349760,54.5422,46.1706,47.5687
1,2360.467,919.113,479.2314,0.8533,661.2138,0.9657,355591,0.6926,1.9179,0.7744,...,0.8084,13.2536,10.8915,5.0327,-6173153000.0,-10177111040,-13946608640,20.5354,27.7553,33.2095
2,1059.046,378.1419,270.6614,0.6983,319.2416,0.9883,80990,0.7521,1.3971,0.8968,...,-0.1599,2.2247,2.4509,2.2165,-9834114000.0,-9082079232,-8377428480,54.3915,53.0591,50.5
3,2279.7581,770.8477,654.0494,0.5292,708.2852,0.9945,396190,0.7632,1.1786,0.9527,...,1.6654,5.7796,9.1424,7.8056,-18411850000.0,-23242752000,-23591233536,34.5177,38.9835,38.4695
4,1843.916,647.9982,479.8817,0.672,556.0906,0.9906,245181,0.7818,1.3503,0.8977,...,-0.3694,2.4534,2.2759,2.5079,-46650240000.0,-40748339200,-32199612416,66.5225,62.6833,56.9092
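
The `y_train.idxmax(axis=1)` call above turns each one-hot row of `y_train` into the name of the column that holds the row's largest value. A minimal plain-Python sketch of that idea follows; the column names and rows are hypothetical placeholders, not the competition's actual labels.

```python
# Hypothetical column names standing in for the columns of y_train.csv.
columns = ["class_a", "class_b", "class_c"]

# Hypothetical one-hot rows: exactly one 1 per row.
one_hot_rows = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]

def row_to_label(row, column_names):
    """Return the name of the column holding the row's largest value."""
    best_index = max(range(len(row)), key=lambda i: row[i])
    return column_names[best_index]

labels = [row_to_label(row, columns) for row in one_hot_rows]
print(labels)
```

The same column-index-to-name mapping is what later turns model predictions back into label strings.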


## 2. Preprocess data using Sklearn / Write and Save Preprocessor function


In [13]:
# Simple Preprocessor with sklearn 

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

StandardScaler()

In [14]:
# Here is where we actually write the preprocessor function:
def preprocessor(data):
    preprocessed_data=scaler.transform(data)
    return preprocessed_data

In [15]:
# check shape of X data after preprocessing it using our new function
preprocessor(X_train).shape

(718, 33)
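
`StandardScaler` learns each column's mean and population standard deviation during `fit`, and `transform` then rescales values as `(x - mean) / std`. The sketch below reproduces that arithmetic for a single column in plain Python; the numbers and the `standardize` helper are hypothetical and only illustrate the idea.

```python
from statistics import fmean, pstdev

# Hypothetical values for one feature column (not taken from the dataset).
train_column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# "fit": learn the mean and population standard deviation from training data.
mean = fmean(train_column)
std = pstdev(train_column)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std using statistics learned during fit."""
    return [(x - mean) / std for x in values]

scaled_train = standardize(train_column, mean, std)

# New data reuses the training statistics rather than computing its own.
scaled_new = standardize([5.0, 11.0], mean, std)
print(mean, std, scaled_new)
```

Reusing the training statistics for new data is why the `preprocessor` function calls `scaler.transform` and never refits the scaler on `X_test`.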

## 3. Fit model on preprocessed data and save preprocessor function and model


In [21]:
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Activation

model = Sequential()
model.add(Dense(16, input_dim=33, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(32, activation='relu'))

model.add(Dense(7, activation='softmax')) 
                                            
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# Fitting the NN to the Training set
model.fit(preprocessor(X_train), y_train, 
               epochs = 15, validation_split=0.25) 

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x1a5cac7a5b0>
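
Keras's `validation_split` holds out the last fraction of the rows passed to `fit`, taken before any shuffling. The sketch below illustrates which rows that would be for the 718 training rows seen earlier, assuming the split point is computed as floor(n * (1 - validation_split)). The helper name is hypothetical and this is not Keras's own code.

```python
import math

def split_last_fraction(n_samples, validation_split):
    """Return (train_indices, val_indices), holding out the final rows."""
    # Assumption: the split point is floor(n * (1 - validation_split)).
    split_at = math.floor(n_samples * (1.0 - validation_split))
    indices = list(range(n_samples))
    return indices[:split_at], indices[split_at:]

train_idx, val_idx = split_last_fraction(718, 0.25)
print(len(train_idx), len(val_idx))
print(val_idx[0], val_idx[-1])
```

If the CSV happens to be sorted by class, the held-out rows may not be representative, so shuffling the rows before fitting may be worth considering.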

#### Save preprocessor function to local "preprocessor.zip" file

In [22]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"") 

Your preprocessor is now saved to 'preprocessor.zip'


#### Save model to local ".onnx" file

In [25]:
import os
os.environ["PATH"] += os.pathsep + 'C:/Program Files/Graphviz/bin'

import pydot_ng as pydot  # this is test code
print(pydot.find_graphviz())  # this is test code

{'dot': 'C:/Program Files/Graphviz/bin\\dot.exe', 'twopi': 'C:/Program Files/Graphviz/bin\\twopi.exe', 'neato': 'C:/Program Files/Graphviz/bin\\neato.exe', 'circo': 'C:/Program Files/Graphviz/bin\\circo.exe', 'fdp': 'C:/Program Files/Graphviz/bin\\fdp.exe', 'sfdp': 'C:/Program Files/Graphviz/bin\\sfdp.exe'}


In [26]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx
import pydot

onnx_model = model_to_onnx(model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

## 4. Generate predictions from X_test data and submit model to competition


In [27]:
#Set credentials using modelshare.org username/password

from aimodelshare.aws import set_credentials

#This is the unique REST API URL that powers this class section's Date Classification Playground -- make sure to update the apiurl for new competition deployments
apiurl="https://jl70pxwaek.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=apiurl)

In [None]:
#Instantiate Competition
import aimodelshare as ai
mycompetition= ai.Competition(apiurl)

In [None]:
#Submit Model 1: 

#-- Generate predicted y values (Model 1)
#Note: Keras predict returns class probabilities; argmax(axis=1) gives the column index of the most likely class
prediction_column_index=model.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

unexpected EOF while parsing (<unknown>, line 0)


ValueError: malformed node or string: {'model_metadata': ''}
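
Before each submission it can help to confirm that the prediction list has one entry per test row and contains only known class names. The helper below is a hypothetical sketch of such a check; it is not part of aimodelshare, it only inspects the prediction list, and it does not touch the model file or the API.

```python
from collections import Counter

def check_predictions(prediction_labels, allowed_labels, expected_count):
    """Raise ValueError if the predictions look malformed; else return class counts."""
    if len(prediction_labels) != expected_count:
        raise ValueError(
            f"expected {expected_count} predictions, got {len(prediction_labels)}"
        )
    unknown = set(prediction_labels) - set(allowed_labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return Counter(prediction_labels)

# Hypothetical stand-ins for y_train.columns and prediction_labels.
allowed = ["class_a", "class_b", "class_c"]
preds = ["class_a", "class_c", "class_a", "class_b"]

counts = check_predictions(preds, allowed, expected_count=4)
print(counts)
```

In this notebook the call would look like `check_predictions(prediction_labels, list(y_train.columns), len(X_test))`. A class count that is heavily skewed toward one label is often a sign that the model or preprocessing needs another look.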


In [None]:
# Get leaderboard to explore current best model architectures

# Get raw data in pandas data frame
data = mycompetition.get_leaderboard()

# Stylize leaderboard data
mycompetition.stylize_leaderboard(data)

## 5. Repeat submission process to improve place on leaderboard


In [None]:
# Train and submit model 2 using same preprocessor (note that you could save a new preprocessor, but we will use the same one for this example).
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

model_2 = Sequential()
model_2.add(Dense(128, input_dim=33, activation='relu'))
model_2.add(Dropout(.3))
model_2.add(Dense(64, activation='relu'))
model_2.add(Dense(64, activation='relu'))
model_2.add(Dropout(.3))
model_2.add(Dense(64, activation='relu'))

model_2.add(Dense(7, activation='softmax')) 
                                            
# Compile model
model_2.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# Fitting the NN to the Training set
model_2.fit(preprocessor(X_train), y_train, 
               epochs = 10, validation_split=0.25) 

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fd3003ca8e0>

In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(model_2, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model2.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [None]:
#Submit Model 2: 

#-- Generate predicted y values (Model 2)
#Note: Keras predict returns class probabilities; argmax(axis=1) gives the column index of the most likely class
prediction_column_index=model_2.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 2 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model2.onnx",
                                 prediction_submission=prediction_labels,
                                 preprocessor_filepath="preprocessor.zip")

In [None]:
# Compare two or more models
data=mycompetition.compare_models([5,6], verbose=1)
mycompetition.stylize_compare(data)

## Optional: Tune model within range of hyperparameters with Keras Tuner

*Simple example shown below. Consult [documentation](https://keras.io/guides/keras_tuner/getting_started/) to see full functionality.*

In [None]:
! pip install keras_tuner

In [None]:
#Separate validation data 
from sklearn.model_selection import train_test_split
x_train_split, x_val, y_train_split, y_val = train_test_split(
     X_train, y_train, test_size=0.2, random_state=42)

In [None]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, BatchNormalization
from keras.regularizers import l1, l2, l1_l2
import keras_tuner as kt


#Define model structure & parameter search space with function
def build_model(hp):
    model = keras.Sequential()
    model.add(Dense(64, input_dim=33, activation='relu', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
    model.add(Dense(64, activation='relu', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
    model.add(Dense(units=hp.Int("units", min_value=32, max_value=512, step=32), #range 32-512 inclusive, minimum step between tested values is 32
                    activation='relu', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
    model.add(Dense(7, activation='softmax')) 
    model.compile(
        optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"],  # multi-class softmax output with one-hot labels
    )
    return model

#initialize the tuner (which will search through parameters)
tuner = kt.RandomSearch(
    hypermodel=build_model, 
    objective="val_accuracy", # objective to optimize
    max_trials=3, #max number of trials to run during search
    executions_per_trial=3, #higher number reduces variance of results; gauges model performance more accurately 
    overwrite=True,
    directory="tuning_model",
    project_name="tuning_units",
)

tuner.search(preprocessor(x_train_split), y_train_split, epochs=1, validation_data=(preprocessor(x_val), y_val))
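
To get a feel for the size of this search, the sketch below lists the values that `hp.Int("units", min_value=32, max_value=512, step=32)` can take and counts the training runs implied by `max_trials=3` and `executions_per_trial=3`. The sampling shown is a simplified assumption made for illustration; how Keras Tuner actually picks values is an implementation detail not reproduced here.

```python
import random

# Candidate values for hp.Int("units", min_value=32, max_value=512, step=32).
candidates = list(range(32, 512 + 1, 32))

# Assumption: each trial draws one value independently from the candidates.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
trial_units = [rng.choice(candidates) for _ in range(3)]  # max_trials=3

# Each trial is trained executions_per_trial times, so total training runs:
total_runs = 3 * 3

print(len(candidates), candidates[0], candidates[-1])
print(total_runs)
```

With 16 candidate values and only 3 trials, most of the range goes untested, so raising `max_trials` (and `epochs`) is a natural next step once the pipeline runs end to end.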

In [None]:
# Build model with best hyperparameters

# Get the best hyperparameter sets (up to 5), ranked best first.
best_hps = tuner.get_best_hyperparameters(5)
# Build the model with the best hp.
tuned_model = build_model(best_hps[0])
# Fit with the entire dataset.
tuned_model.fit(x=preprocessor(X_train), y=y_train, epochs=5)

In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(tuned_model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("tuned_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [None]:
#Submit Model 3: 

#-- Generate predicted y values (Model 3)
prediction_column_index=tuned_model.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit to Competition Leaderboard
mycompetition.submit_model(model_filepath = "tuned_model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

In [None]:
# Get leaderboard

data = mycompetition.get_leaderboard()
mycompetition.stylize_leaderboard(data)

In [None]:
# Compare two or more models
data=mycompetition.compare_models([5, 6, 7], verbose=1)
mycompetition.stylize_compare(data)