# Jane Street Market Prediction (#5.2)
## Autoencoder

Loaded by [#5.3](https://www.kaggle.com/wendellavila/janestreet-dimensionality-reduction) & [#6](https://www.kaggle.com/wendellavila/janestreet-ensemble)

Notebook Navigation<br>
[All](https://www.kaggle.com/wendellavila/janestreet-index/) | [#1](https://www.kaggle.com/wendellavila/janestreet-model-selection/) | [#2.1](https://www.kaggle.com/wendellavila/janestreet-preprocessing-selection) | [#2.2](https://www.kaggle.com/wendellavila/janestreet-data-preprocessing) | [#3](https://www.kaggle.com/wendellavila/janestreet-regularization-selection) | [#4.1](https://www.kaggle.com/wendellavila/janestreet-hyperparameter-tuning) | [#4.2](https://www.kaggle.com/wendellavila/janestreet-hyperparameter-evaluation) | [#5.1](https://www.kaggle.com/wendellavila/janestreet-pca) | [#5.2](https://www.kaggle.com/wendellavila/janestreet-autoencoder) | [#5.3](https://www.kaggle.com/wendellavila/janestreet-dimensionality-reduction-evaluation) | [#6](https://www.kaggle.com/wendellavila/janestreet-ensemble)

## Imports

In [1]:
#import janestreet
import os
import glob
import IPython

import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_columns', 150)
%matplotlib inline

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import load_model
from tensorflow.keras.callbacks import Callback, ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

import sklearn
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

## Loading data

In [2]:
# Data is loaded from the preprocessing notebook (#2.2),
# where it was already preprocessed and downsized for faster loading:
# https://www.kaggle.com/code/wendellavila/janestreet-data-preprocessing/
train_data = pd.read_pickle('../input/janestreet-data-preprocessing/train-mean-indicator.pkl')
features = [col for col in train_data.columns if 'feature' in col]
resp_cols = [col for col in train_data.columns if 'resp' in col]

## Deep Autoencoder

### Defining Autoencoder

In [3]:
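def create_autoencoder(num_input, num_output, noise=0.05):
    # Encoder: normalize, corrupt the input with Gaussian noise, compress to 64 units
    i = layers.Input(shape=(num_input,))
    encoded = layers.BatchNormalization()(i)
    encoded = layers.GaussianNoise(noise)(encoded)
    encoded = layers.Dense(64, activation='relu')(encoded)
    # Decoder: reconstruct the original features from the 64-unit code
    decoded = layers.Dropout(0.2)(encoded)
    decoded = layers.Dense(num_input, name='decoded')(decoded)
    # Classification head stacked on the reconstruction, one sigmoid per target
    x = layers.Dense(32, activation='relu')(decoded)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(num_output, activation='sigmoid', name='label_output')(x)

    # Note: `encoder` outputs the reconstruction (`decoded`), not the 64-unit bottleneck (`encoded`)
    encoder = tf.keras.models.Model(inputs=i, outputs=decoded)
    autoencoder = tf.keras.models.Model(inputs=i, outputs=[decoded, x])

    # One loss per named output: reconstruction error plus label cross-entropy
    autoencoder.compile(optimizer=tf.keras.optimizers.Adam(0.001),
                        loss={'decoded': 'mse', 'label_output': 'binary_crossentropy'})
    return autoencoder, encoder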
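
The model above is compiled with one loss per named output and no explicit loss weights, in which case Keras optimizes the plain sum of the two losses. The sketch below reproduces that sum with NumPy only. The arrays are random stand-ins for the `decoded` and `label_output` predictions, and the helper names `mse` and `binary_crossentropy` are hypothetical, not part of the notebook.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error averaged over all samples and features
    return float(np.mean((y_true - y_pred) ** 2))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip probabilities to avoid log(0), then average over samples and targets
    p = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))                         # hypothetical batch: 8 rows, 5 features
x_hat = x + 0.1 * rng.normal(size=x.shape)          # stand-in for the 'decoded' output
y = rng.integers(0, 2, size=(8, 3)).astype(float)   # 3 binary targets per row
p = rng.uniform(0.05, 0.95, size=y.shape)           # stand-in for 'label_output'

reconstruction_loss = mse(x, x_hat)
label_loss = binary_crossentropy(y, p)
total_loss = reconstruction_loss + label_loss       # both loss weights assumed to be 1.0
print(reconstruction_loss, label_loss, total_loss)
```

Because the two terms are added unweighted, their relative scale decides which objective dominates training; passing `loss_weights` to `compile` would be one way to rebalance them.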

### Training Autoencoder

In [4]:
X_train = train_data.loc[:, features].values
# Multi-target labels: 1 where a response column is positive, else 0
y_train = np.stack([(train_data[col] > 0).astype('int') for col in resp_cols]).T
del train_data
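
To make the shapes concrete, here is a small sketch of the same column selection and target binarization on a toy frame. The column values and the names `toy`, `toy_features` and `toy_resp_cols` are made up for illustration; only the selection and stacking logic mirrors the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame that mimics the column naming of the training data
toy = pd.DataFrame({
    'feature_0': [0.1, -0.3, 1.2],
    'feature_1': [0.0, 0.5, -0.7],
    'resp':      [0.01, -0.02, 0.00],
    'resp_1':    [-0.01, 0.03, 0.02],
})
toy_features = [col for col in toy.columns if 'feature' in col]
toy_resp_cols = [col for col in toy.columns if 'resp' in col]

X = toy.loc[:, toy_features].values
# np.stack yields (n_targets, n_rows); the transpose gives one row per sample
y = np.stack([(toy[col] > 0).astype('int') for col in toy_resp_cols]).T

print(X.shape)  # (3, 2)
print(y.shape)  # (3, 2)
print(y)
```

Note that the comparison is strict, so a response of exactly zero is labeled 0.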

In [5]:
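autoencoder, encoder = create_autoencoder(len(features), len(resp_cols), noise=0.1)

# Targets follow the output order: reconstruction (X_train), then labels (y_train).
# The last 10% of rows are held out for validation; training stops after
# 12 epochs without a val_loss improvement and the best weights are restored.
autoencoder.fit(X_train, (X_train, y_train), epochs=1000, batch_size=4096, validation_split=0.1,
                callbacks=[EarlyStopping('val_loss', patience=12, restore_best_weights=True)])
encoder.save_weights('./encoder.h5')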

Epoch 1/1000
...
Epoch 57/1000
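
The run ends well short of the 1000-epoch budget because of the `EarlyStopping` callback. The sketch below is a plain-Python imitation of the patience rule, assuming `min_delta=0` and a metric that should decrease. The function `early_stopping_epoch` and the loss values are hypothetical and are not taken from the actual run.

```python
def early_stopping_epoch(val_losses, patience=12):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a list of validation losses."""
    best_loss = float('inf')
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Strict improvement: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every available epoch
    return len(val_losses), best_epoch

# Hypothetical curve: improves for 45 epochs, then plateaus at a worse value
hypothetical_val_losses = [1.0 / epoch for epoch in range(1, 46)] + [0.05] * 55
stop_epoch, best_epoch = early_stopping_epoch(hypothetical_val_losses, patience=12)
print(stop_epoch, best_epoch)  # 57 45
```

Under this rule, a stop at epoch 57 with `patience=12` would be consistent with the best `val_loss` having occurred at epoch 45, provided the epoch log above is complete. With `restore_best_weights=True`, the weights saved afterwards are those of the best epoch rather than the last one.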
