**Implementation of Softmax Regression (Classification) as Multi-layer Perceptron in TensorFlow 2**

This model classifies attributes of patterns, pattern languages or pattern sequences into a set of classes. The class with the highest probability is the prediction for a given sample in the training dataset.
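As a minimal sketch of how per-class probabilities arise, the softmax function below maps raw class scores to values that sum to 1. It is written in plain Python for illustration only; the scores are hypothetical and are not taken from the dataset, and the model itself relies on the Keras `softmax` activation rather than this helper.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes (illustration only)
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the position of the largest probability (0-based)
predicted = probs.index(max(probs))
print(probs, predicted)
```

The largest score always receives the largest probability, so picking the most probable class is equivalent to picking the highest raw score.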

Because the dataset of attributes is still under consultation, a sample dataset with 11 independent variables is provided. The dependent variable is the label (class). The dataset contains 10 classes; classes 5, 6 and 7 occur most often, because some patterns are used more often than others.

The following options are currently being considered for the attributes:
- term frequencies (tf)
- term frequency-inverse document frequency (tf-idf)
- probabilities (from our work Modelling of Organizational Pattern Sequences in Bayesian Network)
- Table 1 from section 3.3. in http://www2.fiit.stuba.sk/~vranic/pub/ExtractingRelations.pdf
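The first two options can be sketched as follows. This is only an illustration in plain Python: the pattern names and sequences are hypothetical, each sequence is treated as one "document", and the unsmoothed formula idf = log(N / df) is an assumption (libraries such as scikit-learn apply smoothed variants).

```python
import math
from collections import Counter

# Hypothetical pattern sequences; each inner list plays the role of one document
sequences = [
    ["A", "B", "A", "C"],
    ["B", "C"],
    ["A", "A", "D"],
]

def term_frequencies(seq):
    """Relative frequency of each pattern within one sequence."""
    counts = Counter(seq)
    return {term: n / len(seq) for term, n in counts.items()}

def inverse_document_frequencies(seqs):
    """Unsmoothed idf = log(N / df), where df counts sequences containing the term."""
    n_docs = len(seqs)
    doc_counts = Counter(term for seq in seqs for term in set(seq))
    return {term: math.log(n_docs / df) for term, df in doc_counts.items()}

idf = inverse_document_frequencies(sequences)
tfidf = [
    {term: tf * idf[term] for term, tf in term_frequencies(seq).items()}
    for seq in sequences
]
print(tfidf[0])
```

Patterns that appear in few sequences (here "D") get a higher idf, so tf-idf down-weights patterns that are common everywhere.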

<ins>Example of how to interpret the output of this neural network</ins>:

Let's say the prediction for the first row of our training dataset is the vector (0.08568677, 0.09945365, 0.08751229, 0.09474804, 0.1098659, 0.12171782, 0.10679027, 0.10450635, 0.10343555, 0.08628327). The first row has therefore been assigned to class 6, whose probability (0.12171782, the sixth value in the vector) is the highest. Class 6 can represent an organizational pattern, an organizational pattern language or a sequence of organizational patterns.
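The reading of this vector can be reproduced with a few lines of plain Python. The sketch only finds the position of the largest value; how that position maps to a class label depends on how the labels in the dataset are encoded, which is an assumption here.

```python
# Prediction vector quoted in the text above
probs = [0.08568677, 0.09945365, 0.08751229, 0.09474804, 0.1098659,
         0.12171782, 0.10679027, 0.10450635, 0.10343555, 0.08628327]

# Position of the largest probability (0-based, as an argmax would return it)
best_index = max(range(len(probs)), key=probs.__getitem__)

print(best_index)         # 5 (0-based index)
print(best_index + 1)     # 6 (sixth value in the vector)
print(probs[best_index])  # 0.12171782
```

Note that all ten values are close to 0.1, so the model is only slightly more confident in this class than in the others.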

Please note that a prediction is correct if it matches the actual class. Accuracy, the fraction of correct predictions, is one of the metrics used to evaluate the model.
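Accuracy can be computed by hand as in the following sketch. The label lists are hypothetical and the helper `accuracy` is introduced only for illustration; during training, Keras reports the same quantity through `metrics=['accuracy']`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical actual and predicted classes for five samples
actual = [5, 6, 7, 6, 5]
predicted = [5, 6, 6, 6, 7]

print(accuracy(actual, predicted))  # 3 of 5 predictions are correct
```

With imbalanced classes, accuracy alone can look good even when rare classes are never predicted, so it is worth reading it alongside per-class results.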

For further reference see: https://d2l.ai/chapter_linear-networks/softmax-regression.html

In [4]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
import seaborn as sns
from tensorflow import keras

tf.keras.backend.clear_session()

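# we can also use probabilities instead of n-gram frequencies (tf-idf or binary encoded)
frequencies = pd.read_csv('dataset.csv', sep=';')

# shuffle once, then split 60 % / 20 % / 20 % into train / validation / test
shuffled = frequencies.sample(frac=1, random_state=42)
train_end, val_end = int(.6 * len(shuffled)), int(.8 * len(shuffled))
train = shuffled.iloc[:train_end].copy()
val = shuffled.iloc[train_end:val_end].copy()
test = shuffled.iloc[val_end:].copy()

# number of training samples per class; classes 5, 6 and 7 are most present
class_counts = np.bincount(train['pattern'])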

model = Sequential([
    Dense(11, activation='relu'),
    Dense(350, activation='relu'), 
    Dense(50, activation='relu'), 
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# separate the labels from the features
y_train = train.pop("pattern")
y_val = val.pop("pattern")
y_test = test.pop("pattern")

# some classes are present more than the others. Our task here would be to classify imbalanced data
# 1. we normalize the features (the scaler is fitted on the training split only)

scaler = StandardScaler()
train_features = scaler.fit_transform(train)
val_features = scaler.transform(val)
test_features = scaler.transform(test)

# train on the scaled features and validate on the held-out validation split
model.fit(train_features, y_train, epochs=10,
          batch_size=250, verbose=1,
          validation_data=(val_features, y_val))

results = model.evaluate(test_features, y_test, verbose=1)
print('test loss, test acc:', results)

train_labels = np.array(y_train)
# boolean mask: True for every training sample whose label is not 0
bool_train_labels = train_labels != 0

# 2. then we're able to visualize their distributions

pos_df = pd.DataFrame(train_features[ bool_train_labels], columns=train.columns)
neg_df = pd.DataFrame(train_features[~bool_train_labels], columns=train.columns)

sns.jointplot(x=pos_df['freq3'], y=pos_df['freq5'], kind='hex', xlim=(-5,5), ylim=(-5,5))

model.summary()
model.predict(train_features[:10])
