# INNOV8: The Space Saga
This is the solution to **Part 1: Decoding and Classifying Alien Communications** of the INNOV8: The Space Saga challenge. It is implemented in Python with the TensorFlow and scikit-learn libraries.

## 1. Importing Libraries
We first import the essential libraries: pandas for data handling, NumPy for numerical operations, TensorFlow for building the neural network, and scikit-learn utilities for feature extraction, preprocessing, and class weighting.

```python
# Importing libraries
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.utils.class_weight import compute_class_weight
```

## 2. Loading and Pre-Processing Data
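We load the training and test datasets and convert the categorical `tail` column into binary values, mapping `yes` to 1 and everything else to 0. The text in `message` is vectorized with TF-IDF over unigrams and bigrams, keeping only terms that appear in at least two training messages. The numeric columns (`fingers` and `tail`) are standardized to zero mean and unit variance so they share a common scale. Every transformer is fitted on the training data only and then reused on the test data.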

```python
# Load the labelled alien data and the unlabelled test (submission) data
data = pd.read_csv("./data.csv")
submission = pd.read_csv("./test.csv")

# Map tail to binary: 'yes' -> 1, anything else -> 0
data['tail'] = (data['tail'] == 'yes').astype(int)
submission['tail'] = (submission['tail'] == 'yes').astype(int)

le = LabelEncoder()
scaler = StandardScaler()
# TF-IDF over unigrams and bigrams, ignoring terms found in fewer than 2 messages
vectoriser = TfidfVectorizer(ngram_range=(1, 2), min_df=2)

# Fit on the training data only, then apply the same transform to the test data.
# .toarray() makes the matrices dense; memory grows with the vocabulary size.
X_text = vectoriser.fit_transform(data['message']).toarray()
X_submission_text = vectoriser.transform(submission['message']).toarray()
X_numeric_scaled = scaler.fit_transform(data[['fingers', 'tail']].values)
X_submission_numeric_scaled = scaler.transform(submission[['fingers', 'tail']].values)

# Features = [TF-IDF columns | scaled fingers, tail]; species encoded as 0..k-1
encoded_data = np.concatenate((X_text, X_numeric_scaled), axis=1)
encoded_species = le.fit_transform(data['species'])
encoded_submission = np.concatenate((X_submission_text, X_submission_numeric_scaled), axis=1)
```
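
The notebook fits the vectoriser and the scaler separately and concatenates their outputs by hand. As an alternative that is not what the notebook does, scikit-learn's `ColumnTransformer` can bundle both steps so the same fitted transforms are always applied to training and test rows. This is a minimal sketch: the toy DataFrames `train` and `test` below are hypothetical stand-ins for `data.csv` and `test.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy rows with the same columns used in the notebook
train = pd.DataFrame({
    "message": ["zorp blip", "zorp blip kree", "kree vant", "blip vant"],
    "fingers": [4, 6, 3, 5],
    "tail": ["yes", "no", "yes", "no"],
})
test = pd.DataFrame({"message": ["zorp vant"], "fingers": [5], "tail": ["no"]})

# Same binary mapping as the notebook: 'yes' -> 1, anything else -> 0
for df in (train, test):
    df["tail"] = (df["tail"] == "yes").astype(int)

preprocess = ColumnTransformer(
    [
        # A single column name (not a list) passes 1-D text to the vectoriser
        ("text", TfidfVectorizer(ngram_range=(1, 2), min_df=2), "message"),
        ("numeric", StandardScaler(), ["fingers", "tail"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X_train = preprocess.fit_transform(train)  # fit on training rows only
X_test = preprocess.transform(test)        # reuse the fitted transforms
print(X_train.shape, X_test.shape)  # (4, 7) (1, 7)
```

Setting `sparse_threshold=0.0` forces a dense array, matching the `.toarray()` calls in the notebook. Leaving the default lets `ColumnTransformer` return a sparse matrix when the output is mostly zeros, which saves memory with large vocabularies.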

## 3. Training the Neural Network
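We define a feedforward neural network with two hidden layers (512 and 256 ReLU units, each followed by 30% dropout), compile it with the RMSprop optimizer and sparse categorical cross-entropy loss, and train it on the preprocessed data. Class weights compensate for class imbalance, and early stopping halts training once the validation loss stops improving, which limits overfitting.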
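
The `'balanced'` mode of `compute_class_weight` weights each class by `n_samples / (n_classes * count_of_that_class)`. The small check below, on hypothetical imbalanced labels, reproduces that formula by hand.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical encoded labels: six of class 0, three of class 1, one of class 2
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
classes = np.unique(y)

weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same numbers computed directly from the 'balanced' formula
manual = len(y) / (len(classes) * np.bincount(y))

print([round(float(w), 3) for w in weights])  # [0.556, 1.111, 3.333]
print(np.allclose(weights, manual))  # True
```

The rarest class receives the largest weight: here a misclassified class-2 sample costs six times as much in the loss as a misclassified class-0 sample, mirroring the 6:1 ratio of their counts.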

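```python
from sklearn.model_selection import train_test_split

# Hold out a stratified 10% of the labelled data. Without validation data,
# Keras never computes val_loss, so EarlyStopping could never trigger.
X_train, X_val, y_train, y_val = train_test_split(
    encoded_data, encoded_species, test_size=0.1, stratify=encoded_species, random_state=42
)

# 'balanced' weights give rarer species proportionally larger weights
class_weights = compute_class_weight(class_weight='balanced', classes=np.unique(y_train), y=y_train)
class_weights_dict = dict(enumerate(class_weights))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(256, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.001)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(len(le.classes_), activation='softmax')
])

# Stop after 3 epochs without val_loss improvement and restore the best weights
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)

model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=45, batch_size=4096,
          callbacks=[early_stopping], verbose=2, class_weight=class_weights_dict)
```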
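
To make `patience=3` and `restore_best_weights=True` concrete, here is a simplified pure-Python model of the rule that early stopping applies to a loss being minimized: training stops once the loss has failed to improve for `patience` consecutive epochs, and the weights from the best epoch are the ones kept. This is a sketch, not the Keras implementation; it ignores options such as `min_delta` and `baseline`, and the function name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (epoch training stops after, best epoch), both 0-indexed.

    Simplified model of patience-based early stopping on a loss to minimize.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, one per epoch
losses = [0.90, 0.70, 0.72, 0.71, 0.73, 0.60]
print(early_stopping_epoch(losses, patience=3))  # (4, 1)
```

With `patience=3` training stops after epoch 4 and rolls back to epoch 1, so the lower loss at epoch 5 is never reached. A larger patience tolerates longer plateaus at the cost of more epochs.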

## 4. Making Predictions
Once the model is trained, we use it to predict class probabilities for every message in the test set. The most probable class in each row is decoded back to its species name with the fitted label encoder, and the predictions are saved to `result.csv` for submission.

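```python
# Softmax probabilities for each test message, one column per species
predictions = model.predict(encoded_submission)
# `le` was already fitted on data['species'] during preprocessing, so no refit is needed
predicted_species = le.inverse_transform(np.argmax(predictions, axis=1))

submission['species'] = predicted_species
submission[['species']].to_csv('result.csv', index=False)
```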
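
The decoding step turns each row of softmax probabilities into a species name: `np.argmax` picks the index of the most probable class, and `LabelEncoder.inverse_transform` maps that index back to the original label. The species names and probabilities in this sketch are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical species labels; LabelEncoder sorts them, so classes_ is alphabetical
le = LabelEncoder()
encoded = le.fit_transform(["zeta", "alpha", "mira", "alpha"])
print(le.classes_.tolist())  # ['alpha', 'mira', 'zeta']

# Hypothetical softmax output for two test messages (each row sums to 1)
probabilities = np.array([
    [0.10, 0.70, 0.20],
    [0.55, 0.15, 0.30],
])
predicted = le.inverse_transform(np.argmax(probabilities, axis=1))
print(predicted.tolist())  # ['mira', 'alpha']
```

Because the column order of the model's output follows the encoded labels 0..k-1, the same fitted encoder must be used for both training and decoding; refitting on identical labels gives the same mapping, but it is redundant.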