## Exercise 3.6 Making predictions


In this exercise, you will write code to use the trained model to make predictions on new data.

You can download the titanic dataset from the following link:

[titanic_all_numeric.csv](https://drive.google.com/file/d/11nuYS-l3EXCsGJt81y4YTt3oTnFGaB68/view?usp=drive_link)

The data is loaded into a pandas DataFrame called `df`. We will divide it into two subsets: the first 800 rows for training and the remaining 91 rows for making predictions with the trained model.

The trained network from your previous coding exercise is now stored as `model`. New data for making predictions is stored in a NumPy array called `pred_data`. Use `model` to make predictions on this new data.

In this exercise, your predictions will be probabilities, which is the most common way for data scientists to communicate their predictions to colleagues.

## Instructions

* Create your predictions using the model's `predict()` method on `pred_data`.
* Use NumPy indexing to find the column corresponding to predicted probabilities of survival being `True`. This is the second column (index `1`) of `predictions`. Store the result in `predicted_prob_true` and print it.
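
The column selection in the second instruction can be tried on a small made-up array before running the real model. This is only a sketch: `example_predictions` and `example_prob_true` are hypothetical names, and the numbers are invented stand-ins for what `predict()` returns.

```python
import numpy as np

# Hypothetical stand-in for the output of model.predict(): one row per
# passenger, column 0 = probability of not surviving, column 1 = probability
# of surviving.
example_predictions = np.array([
    [0.75, 0.25],
    [0.10, 0.90],
    [0.50, 0.50],
])

# Select every row (:) and only the second column (index 1).
example_prob_true = example_predictions[:, 1]

print(example_prob_true.shape)
print(example_prob_true)
```

The result is a one-dimensional array with one survival probability per row of the input.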

## Code

Load the data and convert it to NumPy arrays:

In [1]:
import numpy as np
import pandas as pd

# Load csv file into the dataframe: df
df = pd.read_csv("titanic_all_numeric.csv")

# Convert the boolean values of the 'age_was_missing' column to integers
df['age_was_missing'] = df['age_was_missing'].astype(int)

# The dataframe df has 891 rows; we divide it into two parts
# The first 800 rows are used to create the predictors for training the model
# The remaining 91 rows are used to create pred_data for making predictions with the model
trainDF = df.iloc[:800,:]
predictDF = df.iloc[800:,:]
print(df.shape)
print(trainDF.shape)
print(predictDF.shape)

(891, 11)
(800, 11)
(91, 11)




In [2]:
# Import necessary modules
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical

# Create predictors NumPy array: predictors
predictors = trainDF.drop(['survived'], axis=1).values

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]

# Convert the target to categorical: target
target = to_categorical(trainDF['survived'])

# Create data for predictions NumPy array: pred_data
pred_data = predictDF.drop(['survived'], axis=1).values
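
`to_categorical` turns the integer `survived` column into a two-column one-hot array, which matches the two-unit output layer used below. The following sketch imitates that encoding with plain NumPy so the shape is easy to inspect; `example_labels` and `example_one_hot` are hypothetical names, and this is an illustration rather than the Keras implementation.

```python
import numpy as np

# Hypothetical labels: 0 = did not survive, 1 = survived.
example_labels = np.array([0, 1, 1, 0])

# Row i of the identity matrix is the one-hot vector for class i, so indexing
# the identity matrix with the labels one-hot encodes them.
num_classes = 2
example_one_hot = np.eye(num_classes)[example_labels]

print(example_one_hot)
```

Each row contains a single `1`, and its column index is the original label. This is why column `1` of the predictions corresponds to survival being `True`.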


Create the neural network, then compile and fit the model:

In [3]:
# Specify, compile, and fit the model
model = Sequential()
model.add(Dense(32, activation='relu', input_shape = (n_cols,)))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(predictors, target)




25/25 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.5903 - loss: 2.8660


<keras.src.callbacks.history.History at 0x78c44f9c2360>
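
The `softmax` activation on the final layer is what makes each prediction row a pair of probabilities. As a rough illustration, here is a minimal NumPy sketch of a row-wise softmax; the `softmax` helper and the `example_scores` values are assumptions made for this example, not part of the exercise code.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax of a 2-D array of raw scores."""
    # Subtracting each row's maximum improves numerical stability and does
    # not change the result.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical raw scores for three passengers and two classes.
example_scores = np.array([[2.0, 0.0], [-1.0, 1.0], [0.0, 0.0]])
example_probs = softmax(example_scores)

print(example_probs)
print(example_probs.sum(axis=1))
```

Every row is non-negative and sums to 1, so the second column can be read directly as the predicted probability of survival.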

Create the predictions using the trained model:

In [5]:
# Calculate predictions: predictions
predictions = model.predict(pred_data)

# Calculate predicted probability of survival: predicted_prob_true
predicted_prob_true = predictions[:, 1]

# Print predicted_prob_true
print(predicted_prob_true)

3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 23ms/step
[0.26519445 0.46989194 0.9876191  0.77379566 0.29167166 0.25303042
 0.02665492 0.4115497  0.20730023 0.68888015 0.32047388 0.42087317
 0.21775514 0.636439   0.26331177 0.04876453 0.33265227 0.6914919
 0.12300608 0.64746636 0.86695045 0.31962383 0.02933931 0.3722523
 0.7288241  0.23026301 0.84713846 0.8959384  0.2415219  0.7246822
 0.6429563  0.7134833  0.25173488 0.363336   0.4566754  0.94176817
 0.40879738 0.22866653 0.86461085 0.56666934 0.4104501  0.50963134
 0.6121427  0.22760515 0.49105775 0.14078562 0.5710162  0.24228072
 0.67025435 0.9341604  0.6582529  0.01118174 0.7013645  0.7909924
 0.30993307 0.45851374 0.9952606  0.23011896 0.5932515  0.25173488
 0.2218427  0.42308939 0.23082341 0.53610903 0.40736398 0.15770735
 0.38633144 0.7612075  0.24924855 0.64703035 0.32061517 0.5546202
 0.1344103  0.11735877 0.5372815  0.51131594 0.45409548 0.42556775
 0.22654952 0.8300265  0.60333604 0.23113123 0.3971958  0.3041

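The output should look similar to the following. The exact values differ from run to run because the network weights are initialized randomly: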

3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step

[0.67805415 0.67400146 0.9999853  0.3875375  0.5985795  0.6164698
 0.89714897 0.5290434  0.694053   0.883478   0.5992863  0.7111521
 0.670607   0.6320936  0.6159816  0.85379726 0.54823047 0.7421135
 0.6533071  0.54122615 0.97659034 0.60646343 0.8911979  0.6048734
 0.88516706 0.62198186 0.91774607 0.9476062  0.62564427 0.93983597
 0.50025886 0.71690863 0.59585077 0.58837867 0.5724051  0.9671306
 0.5849963  0.61918813 0.9129218  0.784304   0.57787883 0.6095583
 0.75754553 0.60315263 0.5694081  0.65394723 0.7943384  0.6117081
 0.77466637 0.9892311  0.6483037  0.77982235 0.47878534 0.8670235
 0.6792838  0.55752456 0.999963   0.79640836 0.6378785  0.59585077
 0.588558   0.58213407 0.7576602  0.80326736 0.6440964  0.6678745
 0.5529083  0.89298415 0.62583524 0.46835485 0.59932536 0.8583759
 0.6688276  0.67672044 0.6243774  0.49477854 0.5868015  0.5741109
 0.61847866 0.9318947  0.7129401  0.6240244  0.55705667 0.6472381
 0.591772   0.7761354  0.6546704  0.77318966 0.6351063  0.7619014
 0.62768227]
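
If class labels are needed instead of probabilities, one option is to apply a cutoff to the predicted probabilities. The sketch below assumes a threshold of 0.5, which is a common default rather than something the exercise prescribes, and it uses invented values in a hypothetical `example_prob_true` array.

```python
import numpy as np

# Hypothetical survival probabilities for four passengers.
example_prob_true = np.array([0.67, 0.39, 0.99, 0.47])

# Assumed cutoff: predict survival when the probability is at least 0.5.
threshold = 0.5
example_predicted_class = (example_prob_true >= threshold).astype(int)

print(example_predicted_class)  # [1 0 1 0]
```

Raising or lowering `threshold` trades false positives for false negatives, which is one reason to report the probabilities themselves.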
