Difference between an autoencoder and a replicator neural network (RNN here, not a recurrent neural network): https://datascience.stackexchange.com/questions/12219/difference-replicator-neural-network-vs-autoencoder?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

In [1]:
import pandas as pd
from collections import defaultdict

from keras.models import Model
from keras.layers import Dense, Input
from keras.callbacks import EarlyStopping
from keras import backend as K

from sklearn.metrics import mean_squared_error

from utils import analyze_outlier, calculate_roc_auc, X, Y

Using TensorFlow backend.


In [2]:
X.shape

(387, 30)

In [3]:
input_dim = X.shape[1]

In [4]:
early_stop = EarlyStopping(monitor="loss", min_delta=0, patience=5, mode="auto")
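 
With monitor="loss", the callback watches the training loss rather than a validation loss. The patience rule it applies can be sketched in plain Python. This is a simplified illustration of patience-based stopping in "min" mode, not Keras's actual implementation; the function name `stopped_epoch` and the loss values are made up for the example.

```python
def stopped_epoch(losses, patience=5, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified sketch: an epoch counts as an improvement only if its loss
    is lower than the best loss seen so far by more than min_delta.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical loss curve: the best value (0.8) is reached at epoch 2 and
# not beaten during the next five epochs, so training stops before the
# final value (0.7) is ever seen.
losses = [1.0, 0.9, 0.8, 0.85, 0.81, 0.8, 0.82, 0.83, 0.9, 0.7]
print(stopped_epoch(losses))
```

With min_delta=0, an epoch that merely ties the best loss does not reset the counter in this sketch.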

In [5]:
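def get_model():
    # Single-hidden-layer autoencoder: input_dim -> input_dim // 2 -> input_dim.
    inp = Input(shape=(input_dim, ))
    encoded = Dense(input_dim//2, activation="relu")(inp)
    # Sigmoid output with binary cross-entropy assumes the features of X lie in [0, 1].
    decoded = Dense(input_dim, activation="sigmoid")(encoded)

    model = Model(inputs=inp, outputs=decoded)
    model.compile(loss="binary_crossentropy", optimizer="adam")
    return model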

In [6]:
model = get_model()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 30)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 15)                465       
_________________________________________________________________
dense_2 (Dense)              (None, 30)                480       
=================================================================
Total params: 945
Trainable params: 945
Non-trainable params: 0
_________________________________________________________________
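 
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input/output pair plus one bias per output unit. A minimal sketch (the helper `dense_params` is hypothetical, introduced only for this check):

```python
def dense_params(n_in, n_out):
    # n_in * n_out weights plus one bias per output unit
    return n_in * n_out + n_out


input_dim = 30
hidden = input_dim // 2  # 15

encoder = dense_params(input_dim, hidden)  # 30 * 15 + 15 = 465
decoder = dense_params(hidden, input_dim)  # 15 * 30 + 30 = 480
print(encoder, decoder, encoder + decoder)
```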


In [7]:
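# Train the autoencoder to reconstruct its own input (the target is X itself).
hist = model.fit(X, X, epochs=500, callbacks=[early_stop], verbose=0)
# Training loss of the last epoch that ran before early stopping.
final_loss = hist.history["loss"][-1]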

In [8]:
pred = model.predict(X)

In [9]:
def calculate_outlier_factor(X, Y, pred):
    # Outlier factor (OF) of a row = mean squared reconstruction error of that row.
    outlier_factors = defaultdict(dict)
    for i in range(X.shape[0]):
        outlier_factors[i]["OF"] = mean_squared_error(X[i], pred[i])
        outlier_factors[i]["Y"] = Y[i]
    return outlier_factors

In [10]:
outlier_factors = calculate_outlier_factor(X, Y, pred)
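 
The per-row loop above computes the mean squared reconstruction error one sample at a time. Assuming X and pred are 2-D NumPy arrays of the same shape, the same values can be obtained in one vectorized step. A minimal sketch on random demo data (`outlier_factors_vectorized`, `X_demo` and `pred_demo` are illustrative names, not part of the notebook):

```python
import numpy as np


def outlier_factors_vectorized(X, pred):
    """Row-wise mean squared error between X and its reconstruction."""
    X = np.asarray(X, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # Average the squared differences over the feature axis.
    return np.mean((X - pred) ** 2, axis=1)


# Random stand-ins for the data and the model output.
rng = np.random.default_rng(0)
X_demo = rng.random((5, 4))
pred_demo = rng.random((5, 4))

of = outlier_factors_vectorized(X_demo, pred_demo)

# Same quantity computed row by row, as in the loop version.
of_loop = np.array([np.mean((X_demo[i] - pred_demo[i]) ** 2) for i in range(5)])
print(np.allclose(of, of_loop))
```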

In [11]:
df = pd.DataFrame.from_dict(outlier_factors, orient="index")

In [12]:
analyze_outlier(df)

Within the top 30 ranked cases (ranked according to the Outlier Factor), 23 of the malignant cases (the outliers), comprising 76.66666666666667% of all malignant cases, were identified.


In [13]:
calculate_roc_auc(df["Y"], df["OF"])

ROC AUC score: 0.034733893557422964
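 
An AUC this close to 0 looks inconsistent with the ranking result above, where 23 of the top 30 cases by outlier factor are malignant. One possible explanation, and it is only an assumption since the contents of utils are not shown, is that the label encoding treats the inliers as the positive class. In that case the AUC for detecting outliers would be 1 - 0.0347, roughly 0.965. The sketch below illustrates the identity with a plain-Python pairwise AUC on made-up labels and scores; `roc_auc` is a hypothetical helper, and sklearn.metrics.roc_auc_score computes the same quantity.

```python
def roc_auc(labels, scores, positive=1):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in the number of samples, so this
    is for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up data: label 1 marks the outliers, which mostly get high scores.
labels = [0, 0, 0, 1, 0, 1]
scores = [0.1, 0.2, 0.15, 0.9, 0.3, 0.25]

auc_outlier = roc_auc(labels, scores, positive=1)
auc_flipped = roc_auc(labels, scores, positive=0)
print(auc_outlier, auc_flipped)
```

Swapping which class counts as positive turns an AUC of a into 1 - a, so checking the encoding of Y and the argument order inside calculate_roc_auc is a reasonable first step.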
