<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>


---

## Model Submission Guide: Titanic Survival Competition
This guide walks through submitting models to a shared, centralized leaderboard so that we can collaborate and learn from each other's model experiments.

**Instructions:**
1. Get data in and set up X_train / X_test / y_train
2. Preprocess data using Sklearn Column Transformer / Write and Save Preprocessor function
3. Fit model on preprocessed data and save preprocessor function and model
4. Generate predictions from X_test data and submit model to competition
5. Repeat submission process to improve place on leaderboard



## 1. Get data in and set up X_train, X_test, y_train objects

In [1]:
# Install the aimodelshare library (uncomment the lines below to run)
# !pip install aimodelshare --upgrade
# !pip install pydot

In [3]:
# Get competition data
from aimodelshare import download_data
download_data('public.ecr.aws/y2e2a1d6/titanic_competition_data-repository:v1') 

Downloading [>                                                ]

Data downloaded successfully.


In [4]:
# Separate data into X_train, y_train, and X_test
import pandas as pd
training_data=pd.read_csv("titanic_competition_data/training_data.csv")
y_train_labels = training_data['survived']
X_train = training_data.drop(['survived'], axis=1)

X_test=pd.read_csv("titanic_competition_data/test_data.csv")

X_train.head()

| | pclass | sex | age | fare | embarked |
|---|---|---|---|---|---|
| 0 | 3 | male | 28.0 | 7.25 | S |
| 1 | 3 | female | 26.0 | 16.1 | S |
| 2 | 3 | female | 47.0 | 7.0 | S |
| 3 | 2 | male | 57.0 | 12.35 | Q |
| 4 | 3 | female | 37.0 | 9.5875 | S |
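
Before choosing imputation strategies in the next step, it can help to see which columns actually contain missing values. The sketch below uses a small hand-made frame (`toy`, a hypothetical stand-in, not the competition data) with the same column names as the preview above. On the real data you would call the same pandas methods on `X_train`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train: same column names, made-up rows.
toy = pd.DataFrame({
    "pclass": [3, 3, 1, 2],
    "sex": ["male", "female", "female", "male"],
    "age": [28.0, np.nan, 47.0, 57.0],
    "fare": [7.25, 16.1, 7.0, 12.35],
    "embarked": ["S", "S", np.nan, "Q"],
})

# Count missing values per column.
missing_counts = toy.isna().sum()
print(missing_counts)

# Columns that will need an imputation step.
columns_with_missing = missing_counts[missing_counts > 0].index.tolist()
print(columns_with_missing)
```

Columns that show up in `columns_with_missing` are the ones where the median or most-frequent imputers used later actually change values.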


## 2.   Preprocess data using Sklearn Column Transformer/ Write and Save Preprocessor function


In [5]:
# In this case we use Sklearn's Column transformer in our preprocessor function

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

#Preprocess data using sklearn's Column Transformer approach

# We create the preprocessing pipelines for both numeric and categorical data.
numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')), #'imputer' names the step
    ('scaler', StandardScaler())])

categorical_features = ['embarked', 'sex', 'pclass']

# Replace missing values with the modal (most frequent) value, then one-hot encode.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# Final preprocessor object set up with ColumnTransformer...

preprocess = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# fit preprocessor to your data
preprocess = preprocess.fit(X_train)

In [6]:
# Write the preprocessor function: it applies the fitted ColumnTransformer
# (`preprocess`) to any data frame that has the same columns as X_train.

def preprocessor(data):
    preprocessed_data=preprocess.transform(data)
    return preprocessed_data

In [7]:
# check shape of X data after preprocessing it using our new function
preprocessor(X_train).shape

(1047, 10)
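
The column count in the shape above is consistent with 2 scaled numeric features plus one indicator column per category level: assuming `embarked`, `sex`, and `pclass` have 3, 2, and 3 distinct values in the training data, that gives 2 + 3 + 2 + 3 = 10. The following is a minimal sketch of that arithmetic on a made-up frame (`toy` and `toy_preprocess` are hypothetical names, not part of the competition code).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for X_train with the same column names.
toy = pd.DataFrame({
    "pclass": [3, 3, 1, 2, 3, 1],
    "sex": ["male", "female", "female", "male", "female", "male"],
    "age": [28.0, 26.0, np.nan, 57.0, 37.0, 40.0],
    "fare": [7.25, 16.1, 7.0, 12.35, 9.5875, 80.0],
    "embarked": ["S", "S", "C", "Q", np.nan, "S"],
})

numeric_features = ["age", "fare"]
categorical_features = ["embarked", "sex", "pclass"]

# Same structure as the preprocessor defined earlier in this guide.
toy_preprocess = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler())]), numeric_features),
    ("cat", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_features),
])

transformed = toy_preprocess.fit_transform(toy)

# Numeric columns pass through one-to-one; each categorical column
# expands into one indicator column per distinct (non-missing) value.
expected_columns = len(numeric_features) + sum(
    toy[col].nunique() for col in categorical_features
)
print(transformed.shape, expected_columns)
```

If the two numbers disagree on your own data, a category that appears only in some rows (or only in the test set) is a likely cause.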

## 3. Fit model on preprocessed data and save preprocessor function and model 


In [8]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(C=10, penalty='l1', solver = 'liblinear')
model.fit(preprocessor(X_train), y_train_labels) # Fitting to the training set.
model.score(preprocessor(X_train), y_train_labels) # Fit score, 0-1 scale.

0.778414517669532
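
The score above is computed on the same rows the model was fit on, so it tends to be optimistic compared with leaderboard accuracy on held-out data. One common way to get a less biased estimate before submitting is k-fold cross-validation. The sketch below uses synthetic data from `make_classification` as a stand-in (an assumption for illustration). With the competition data you would pass `preprocessor(X_train)` and `y_train_labels` instead.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the preprocessed training data (10 features).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, random_state=0
)

# Same estimator settings as the model fit above.
cv_model = LogisticRegression(C=10, penalty="l1", solver="liblinear")

# 5-fold cross-validation: each fold is scored on rows the model did not see.
scores = cross_val_score(cv_model, X_demo, y_demo, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```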

#### Save preprocessor function to local "preprocessor.zip" file

In [9]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"") 

Your preprocessor is now saved to 'preprocessor.zip'


#### Save model to local ".onnx" file

In [10]:
# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

# Count the preprocessed input features; the ONNX input type needs this number
from skl2onnx.common.data_types import FloatTensorType

feature_count=preprocessor(X_test).shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  #Insert correct number of preprocessed features

onnx_model = model_to_onnx(model, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

## 4. Generate predictions from X_test data and submit model to competition


In [11]:
#Set credentials using modelshare.org username/password

from aimodelshare.aws import set_credentials

#This is the unique rest api that powers this Titanic Survival Playground -- make sure to update the apiurl for new competition deployments
apiurl="https://mzq4b9vwq5.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=apiurl)

AI Modelshare Username: ············
AI Modelshare Password: ···············


AI Model Share login credentials set successfully.


In [12]:
#Instantiate Competition
import aimodelshare as ai
mycompetition= ai.Competition(apiurl)

In [12]:
#Submit Model 1: 

#-- Generate predicted values (a list of predicted labels "survived" or "died") (Model 1)
prediction_labels = model.predict(preprocessor(X_test))

# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 10

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1656
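
Before calling `submit_model`, it can be worth checking that the predictions have one entry per test row and contain only the expected labels. The helper below, `check_predictions`, is a hypothetical utility written for this guide and is not part of aimodelshare. The label names follow the comment in the submission cell above, so adjust them if your training labels are encoded differently.

```python
from collections import Counter


def check_predictions(predictions, expected_rows, allowed_labels):
    """Return label counts, or raise ValueError if predictions look malformed."""
    predictions = list(predictions)
    if len(predictions) != expected_rows:
        raise ValueError(
            f"expected {expected_rows} predictions, got {len(predictions)}"
        )
    unexpected = set(predictions) - set(allowed_labels)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    return Counter(predictions)


# Made-up predictions for illustration only.
example_predictions = ["survived", "died", "died", "survived", "died"]
counts = check_predictions(
    example_predictions,
    expected_rows=5,
    allowed_labels={"survived", "died"},
)
print(counts)
```

With the competition objects this would be called as `check_predictions(prediction_labels, len(X_test), {"survived", "died"})`.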


In [13]:
# Get leaderboard to explore current best model architectures

# Get raw data in pandas data frame
data = mycompetition.get_leaderboard()

# Stylize leaderboard data
mycompetition.stylize_leaderboard(data)

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,transfer_learning,deep_learning,model_type,depth,num_params,dropout_layers,dense_layers,relu_act,softmax_act,loss,optimizer,memory_size,username,version
0,78.63%,77.36%,77.36%,77.36%,sklearn,False,False,LogisticRegression,,10.0,,,,,,liblinear,,AIModelShare,1
1,76.34%,73.01%,76.59%,72.06%,sklearn,False,False,LogisticRegression,,10.0,,,,,,lbfgs,,AIModelShare,2
2,57.25%,54.72%,54.72%,54.72%,sklearn,False,False,GradientBoostingClassifier,,,,,,,,,,AIModelShare,6
3,56.49%,53.73%,53.75%,53.72%,keras,False,True,Sequential,4.0,28450.0,,4.0,3.0,1.0,str,RMSprop,2102896.0,AIModelShare,9
4,62.60%,40.35%,81.15%,51.00%,keras,False,True,Sequential,7.0,18114.0,2.0,5.0,4.0,1.0,str,SGD,1560216.0,AIModelShare,8
5,56.49%,51.50%,52.03%,51.80%,sklearn,False,False,RandomForestClassifier,,,,,,,,,,AIModelShare,5
6,54.96%,51.71%,51.76%,51.72%,sklearn,False,False,LogisticRegression,,10.0,,,,,,liblinear,,AIModelShare,3
7,54.96%,51.71%,51.76%,51.72%,sklearn,False,False,LogisticRegression,,10.0,,,,,,liblinear,,newusertest,10
8,60.31%,43.80%,50.99%,50.30%,keras,False,True,Sequential,4.0,9154.0,,4.0,3.0,1.0,str,SGD,1323608.0,AIModelShare,7
9,54.96%,47.28%,48.24%,48.65%,sklearn,False,False,LogisticRegression,,10.0,,,,,,lbfgs,,AIModelShare,4


## 5. Repeat submission process to improve place on leaderboard


In [13]:
# Train and submit model 2 using same preprocessor (note that you could save a new preprocessor, but we will use the same one for this example).
from sklearn.linear_model import LogisticRegression

model_2 = LogisticRegression(C=.01, penalty='l2')
model_2.fit(preprocessor(X_train), y_train_labels) # Fitting to the training set.
model_2.score(preprocessor(X_train), y_train_labels) # Fit score, 0-1 scale.

0.775549188156638

In [14]:
# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

feature_count=preprocessor(X_test).shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Insert correct number of preprocessed features

onnx_model = model_to_onnx(model_2, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("model2.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [22]:
#Submit Model 2: 

#-- Generate predicted y values (Model 2)
prediction_labels = model_2.predict(preprocessor(X_test))

# Submit Model 2 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model2.onnx",
                                 prediction_submission=prediction_labels,
                                 preprocessor_filepath="preprocessor.zip")

100% [................................................................................] 5919 / 5919

Insert search tags to help users find your model (optional):  x
Provide any useful notes about your model (optional):  x



Your model has been submitted as model version 56

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1656


In [26]:
# Compare two or more models
data=mycompetition.compare_models([1,2], verbose=1)
mycompetition.stylize_compare(data)

| | param_name | default_value | model_version_1 | model_version_2 |
|---|---|---|---|---|
| 0 | C | 1.000000 | 10 | 0.010000 |
| 1 | class_weight | | | |
| 2 | dual | False | False | False |
| 3 | fit_intercept | True | True | True |
| 4 | intercept_scaling | 1 | 1 | 1 |
| 5 | l1_ratio | | | |
| 6 | max_iter | 100 | 100 | 100 |
| 7 | multi_class | auto | auto | auto |
| 8 | n_jobs | | | |
| 9 | penalty | l2 | l1 | l2 |
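
For two scikit-learn estimators held in memory, a similar side-by-side view can be approximated locally with `get_params()`. This is only a sketch of the idea and makes no claim about how `compare_models` works internally. The names `model_a` and `model_b` are placeholders that reuse the settings of the two logistic regressions fit in this guide.

```python
from sklearn.linear_model import LogisticRegression

# Placeholders with the same settings as the two models in this guide.
model_a = LogisticRegression(C=10, penalty="l1", solver="liblinear")
model_b = LogisticRegression(C=0.01, penalty="l2")

params_a = model_a.get_params()
params_b = model_b.get_params()

# Keep only the hyperparameters whose values differ between the two models.
differences = {
    name: (params_a[name], params_b[name])
    for name in sorted(params_a)
    if params_a[name] != params_b[name]
}

for name, (value_a, value_b) in differences.items():
    print(f"{name}: model_a={value_a!r}, model_b={value_b!r}")
```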







In [31]:
# Submit a third model using GridSearchCV

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
import numpy as np

param_grid = {'n_estimators': np.arange(100, 300, 50),'max_depth':[1, 3, 5]} # np.arange(100, 300, 50) yields the n_estimators values 100, 150, 200, 250

gridmodel = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, cv=10)

# GridSearchCV exposes the usual fit, score, and predict methods:
gridmodel.fit(preprocessor(X_train), y_train_labels)

# Extract the best score and parameters from the attributes "best_score_" and "best_params_"
print("best mean cross-validation score: {:.3f}".format(gridmodel.best_score_))
print("best parameters: {}".format(gridmodel.best_params_))


best mean cross-validation score: 0.799
best parameters: {'max_depth': 5, 'n_estimators': 100}
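
To get a feel for how much work this search does, the grid can be enumerated with scikit-learn's `ParameterGrid`. The sketch below rebuilds the same `param_grid` and counts candidates and model fits. The `+ 1` assumes the `GridSearchCV` default `refit=True`, which retrains the best candidate once on the full training set.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above.
param_grid = {"n_estimators": np.arange(100, 300, 50), "max_depth": [1, 3, 5]}

# Every combination of n_estimators and max_depth is one candidate.
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

cv_folds = 10
# One fit per candidate per fold, plus one final refit of the best candidate.
total_fits = n_candidates * cv_folds + 1

print("candidates:", n_candidates)
print("total fits:", total_fits)
```

Widening either list multiplies the number of fits, so it is usually worth starting with a coarse grid and refining around the best region.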


In [32]:
# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

feature_count=preprocessor(X_test).shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Insert correct number of preprocessed features

onnx_model = model_to_onnx(gridmodel, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("gridmodel.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [33]:
#Submit Model 3: 

#-- Generate predicted values
prediction_labels = gridmodel.predict(preprocessor(X_test))

# Submit to Competition Leaderboard
mycompetition.submit_model(model_filepath = "gridmodel.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

100% [................................................................................] 5919 / 5919

Insert search tags to help users find your model (optional):  x
Provide any useful notes about your model (optional):  



Your model has been submitted as model version 57

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1656


In [34]:
# Get leaderboard

data = mycompetition.get_leaderboard()
mycompetition.stylize_leaderboard(data)

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,deep_learning,model_type,depth,num_params,globalmaxpooling2d_layers,inputlayer_layers,upsampling2d_layers,dropout_layers,conv2dtranspose_layers,maxpooling2d_layers,dense_layers,conv2d_layers,reshape_layers,relu_act,softmax_act,loss,optimizer,memory_size,username,version
0,78.63%,77.36%,77.36%,77.36%,sklearn,,LogisticRegression,,10.0,,,,,,,,,,,,,liblinear,,AIModelShare,1
1,76.34%,73.01%,76.59%,72.06%,sklearn,,LogisticRegression,,10.0,,,,,,,,,,,,,lbfgs,,AIModelShare,2
2,58.02%,55.35%,55.38%,55.33%,sklearn,,GradientBoostingClassifier,,,,,,,,,,,,,,,,,itareque,23
3,58.02%,55.35%,55.38%,55.33%,sklearn,,GradientBoostingClassifier,,,,,,,,,,,,,,,,,itareque,25
4,57.25%,54.72%,54.72%,54.72%,sklearn,,GradientBoostingClassifier,,,,,,,,,,,,,,,,,qx2210,21
5,57.25%,54.72%,54.72%,54.72%,sklearn,,GradientBoostingClassifier,,,,,,,,,,,,,,,,,AIModelShare,6
6,57.25%,54.72%,54.72%,54.72%,sklearn,,GradientBoostingClassifier,,,,,,,,,,,,,,,,,Catherine_Xie,45
7,56.49%,54.41%,54.39%,54.48%,sklearn,,KNeighborsClassifier,,,,,,,,,,,,,,,,,breitnermak,44
8,56.49%,53.73%,53.75%,53.72%,keras,True,Sequential,4.0,28450.0,,,,,,,4.0,,,3.0,1.0,str,RMSprop,2102896.0,AIModelShare,9
9,58.02%,53.21%,53.86%,53.42%,sklearn,,RandomForestClassifier,,,,,,,,,,,,,,,,,Catherine_Xie,43


In [35]:
# Compare two or more models
data=mycompetition.compare_models([2,3], verbose=1)
mycompetition.stylize_compare(data)

| | param_name | default_value | model_version_2 | model_version_3 |
|---|---|---|---|---|
| 0 | C | 1.000000 | 0.010000 | 10 |
| 1 | class_weight | | | |
| 2 | dual | False | False | False |
| 3 | fit_intercept | True | True | True |
| 4 | intercept_scaling | 1 | 1 | 1 |
| 5 | l1_ratio | | | |
| 6 | max_iter | 100 | 100 | 100 |
| 7 | multi_class | auto | auto | auto |
| 8 | n_jobs | | | |
| 9 | penalty | l2 | l2 | l1 |







In [36]:
# Here are several classic ML model types you can experiment with next:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import GradientBoostingClassifier

#Example code to fit model:
model = GradientBoostingClassifier(n_estimators=50, learning_rate=1.0,
    max_depth=1, random_state=0).fit(preprocessor(X_train), y_train_labels)
model.score(preprocessor(X_train), y_train_labels)

# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

feature_count=preprocessor(X_test).shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Insert correct number of preprocessed features

onnx_model = model_to_onnx(model, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

#-- Generate predicted values (a list of predicted labels "survived" or "died")
prediction_labels = model.predict(preprocessor(X_test))

# Submit model to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)


100% [................................................................................] 5919 / 5919

Insert search tags to help users find your model (optional):  
Provide any useful notes about your model (optional):  



Your model has been submitted as model version 58

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1656
