In [1]:
import pickle
import json
import pandas as pd
import joblib



## Deserialize the model stuff

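Now here's a pretty cool thing: we are going to predict on a new
observation with only the imports we have above. Notice that we don't
import a single scikit-learn estimator or transformer up there.
scikit-learn still needs to be installed, though, because un-pickling
the pipeline imports its classes behind the scenes.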

If you remember from the previous notebook, there were three
things that we serialized, so now we need to deserialize them.

First, let's deserialize the columns from the super handy JSON format:

In [2]:
with open('columns.json', 'r') as fh:
    columns = json.load(fh)

And now let's un-pickle the dtypes as well:

In [3]:
with open('dtypes.pickle', 'rb') as fh:
    dtypes = pickle.load(fh)
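
The previous notebook isn't reproduced here, so the following is only a sketch of how the two files loaded above might have been written and read back. The toy `train` dataframe and the temporary directory are assumptions made for illustration; only the file names and the loading code come from the cells above.

```python
import json
import pickle
import tempfile
from pathlib import Path

import pandas as pd

# Toy training frame standing in for the real one (hypothetical data).
train = pd.DataFrame(
    {"Age": [22.0, 38.0], "Sex": ["male", "female"], "Pclass": [3, 1]}
)

with tempfile.TemporaryDirectory() as tmp:
    columns_path = Path(tmp) / "columns.json"
    dtypes_path = Path(tmp) / "dtypes.pickle"

    # Serialize: column names as a JSON list, dtypes as a pickled Series.
    with open(columns_path, "w") as fh:
        json.dump(train.columns.tolist(), fh)
    with open(dtypes_path, "wb") as fh:
        pickle.dump(train.dtypes, fh)

    # Deserialize, mirroring the two cells above.
    with open(columns_path, "r") as fh:
        columns = json.load(fh)
    with open(dtypes_path, "rb") as fh:
        dtypes = pickle.load(fh)

# Both round trips should give back exactly what was saved.
columns_match = columns == train.columns.tolist()
dtypes_match = dtypes.to_dict() == train.dtypes.to_dict()
print(columns_match, dtypes_match)
```

A JSON list keeps its order, which is what lets the column order be trusted at prediction time. Pickle is used for the dtypes because they are Python objects rather than plain strings.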

Finally, let's reload the pipeline:

In [4]:
pipeline = joblib.load('pipeline.pickle')

Okay, now we've got everything that we need in order to make
a new prediction! All that's missing is the new observation itself,
which should come in JSON format.

## Deserialize and prep the observation

This is exactly as we saw in the previous notebook:

In [5]:
new_obs_str = '{"Age": 22.0, "Cabin": null, "Embarked": "S", "Fare": 7.25, "Parch": 0, "Pclass": 3, "Sex": "male", "SibSp": 1}'
new_obs_dict = json.loads(new_obs_str)
obs = pd.DataFrame([new_obs_dict], columns=columns)
obs = obs.astype(dtypes)

Which leaves us with a dataframe containing a single observation:

In [6]:
obs

|   | Pclass | Sex | Age | SibSp | Parch | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|
| 0 | 3 | male | 22.0 | 1 | 0 | 7.25 |  | S |
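
To see why `columns=columns` is passed when building `obs`, here is a small sketch with a deliberately messy payload. The three-column layout and the `Ticket` field are made up for illustration; the real layout comes from `columns.json`.

```python
import json

import pandas as pd

# Hypothetical training layout; the real one comes from columns.json.
columns = ["Pclass", "Sex", "Age"]

# Keys arrive in a different order, "Sex" is missing, "Ticket" is extra.
payload = '{"Age": 22.0, "Ticket": "A/5 21171", "Pclass": 3}'
messy_obs = pd.DataFrame([json.loads(payload)], columns=columns)

# Column order follows `columns`, not the order of the JSON keys.
print(messy_obs.columns.tolist())
# The missing field shows up as a null value; the extra field is dropped.
print(messy_obs.isna().iloc[0].to_dict())
```

Missing fields come back as nulls rather than raising an error, so it may be worth validating the payload before the `astype` step, which can fail on a missing integer field. The single-row `obs` built earlier has every expected field.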


The obs dataframe can in turn be passed into predict_proba:

In [7]:
outcome = pipeline.predict_proba(obs)
outcome

array([[0.90499549, 0.09500451]])

The outcome array has two entries for each observation in a binary classification task.
For each observation, the 0th entry is the probability of being in the *negative class*
(0 in our case), while the 1st entry is the probability of being in the
*positive class* (1 in our case).
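
Rather than relying on memory for which column is which, the order can be checked on a fitted classifier through its `classes_` attribute, which lists the labels in the same order as the columns of predict_proba. The sketch below uses a toy `LogisticRegression` on made-up data as a stand-in for the Titanic pipeline; it assumes the real pipeline's final estimator exposes `classes_` in the same way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data standing in for the Titanic pipeline (hypothetical values).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)

# classes_ lists the labels in the same order as the predict_proba columns.
classes = clf.classes_.tolist()
positive_class_index = classes.index(1)

proba = clf.predict_proba(np.array([[2.5]]))
print(classes, proba[0, positive_class_index])
```

Looking the index up from `classes_` avoids hard-coding it, which helps if the labels are ever strings or arrive in a different order.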

So, if we want to know the likelihood of surviving, we need to look
at the probability of being in the positive class, which you can
access like this:

In [8]:
# there's only a single observation, so its index is 0
observation_index = 0
# this is the trick: go for the positive class index
positive_class_index = 1
# numpy arrays take the row and column index inside a single pair of
# brackets, unlike nested Python lists:
survival_probability = outcome[observation_index, positive_class_index]
print('the observation has {} probability of survival'.format(survival_probability))

the observation has 0.09500451145615225 probability of survival
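
The same indexing extends to several observations at once. As a minimal sketch, the array below is a made-up stand-in for a predict_proba result with three rows; the values are illustrative only.

```python
import numpy as np

# Hypothetical predict_proba output for three observations.
outcome = np.array([
    [0.90, 0.10],
    [0.25, 0.75],
    [0.60, 0.40],
])

# Column 1 holds the positive-class probability for every row.
survival_probabilities = outcome[:, 1]

# outcome[i, j] and outcome[i][j] select the same element.
same_element = outcome[1, 1] == outcome[1][1]

print(survival_probabilities, same_element)
```

The slice `outcome[:, 1]` returns one survival probability per observation, which is handy when scoring a batch instead of a single row.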
