- Loading the Dataset: The CSV file (thermoracleTrain.csv) is loaded into a Pandas DataFrame named df.
- The dataset contains 60,000 rows and 11 columns; the Prediction column, which takes values 0 to 9, is the target.
- Dataset Statistical Summary: The mean of every column (except Prediction) is close to zero, indicating the data is already scaled. Two categorical columns are present: Magnetic Field Strength and Radiation Levels.
- Rows where the Prediction column is null are dropped from the dataset.
- Remaining missing values in the numerical columns are filled with the respective column medians (see the sketch below).
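A minimal pandas sketch of these loading and cleaning steps:

```python
import pandas as pd

# Load the training data into a DataFrame named df, as described above
df = pd.read_csv("thermoracleTrain.csv")

# Drop rows where the target itself is missing
df = df.dropna(subset=["Prediction"])

# Fill remaining missing values in the numerical columns with column medians
num_cols = df.select_dtypes(include="number").columns.drop("Prediction")
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
```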
- The value counts of the target classes show that the data is fairly balanced.
- The categorical columns, Magnetic Field Strength and Radiation Levels, take values Category_1 to Category_20, which are mapped to the integers 1 to 20 (treating them as ordinal levels), converting both columns to numerical form.
- K-Nearest Neighbours imputation is then used to fill the missing values in these two columns (sketch below).
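A sketch of the mapping and imputation; the use of scikit-learn's KNNImputer and the choice of 5 neighbours are assumptions:

```python
import pandas as pd
from sklearn.impute import KNNImputer

cat_cols = ["Magnetic Field Strength", "Radiation Levels"]

# Map "Category_1" ... "Category_20" to the ordinal levels 1 ... 20
for col in cat_cols:
    df[col] = df[col].str.extract(r"Category_(\d+)", expand=False).astype(float)

# KNN imputation estimates the missing levels from the most similar rows
imputer = KNNImputer(n_neighbors=5)
df = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```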
- Distributions of all features are plotted to check for skewness and to understand their overall shape.
- As part of the bivariate analysis, pairwise correlations of all features are plotted in a correlation matrix.
- Surface Temperature and Mineral Abundance are seen to have strong positive correlations with the target column.
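These univariate and bivariate plots might be produced along the following lines (seaborn/matplotlib assumed):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Univariate view: one histogram per feature to inspect skewness
df.drop(columns=["Prediction"]).hist(bins=50, figsize=(14, 10))
plt.tight_layout()
plt.show()

# Bivariate view: pairwise correlation heatmap
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.show()
```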
- Rows with absolute z-scores below 3 are retained and the rest dropped, leaving 55,778 rows.
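A sketch of the z-score filter, assuming it is applied jointly across the feature columns:

```python
import numpy as np
from scipy import stats

# Keep a row only if every numerical feature has |z-score| < 3
feature_cols = df.columns.drop("Prediction")
z = np.abs(stats.zscore(df[feature_cols]))
df = df[(z < 3).all(axis=1)]
```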
- In the ensemble-based noise detection method, a list of classifiers is chosen to predict class labels on the dataset. These models are not pre-trained; rather, they are instantiated and used within a cross-validation framework to generate an out-of-fold prediction for each instance. The ensemble method then computes how often each instance is misclassified across all the classifiers: the more classifiers that disagree with the recorded label, the higher the noise score for that instance, suggesting it may be mislabeled or noisy.
- The top 20 percent of instances by noise score are removed, leaving 50,186 rows (see the sketch below).
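One way to implement this filter; the particular classifiers and the 5-fold scheme are assumptions, not the exact choices used here:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

X = df.drop(columns=["Prediction"]).values
y = df["Prediction"].values

# Placeholder list of classifiers for the ensemble
classifiers = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
]

# Noise score = fraction of classifiers whose out-of-fold prediction
# disagrees with the recorded label
noise_score = np.zeros(len(y))
for clf in classifiers:
    preds = cross_val_predict(clf, X, y, cv=5)
    noise_score += (preds != y)
noise_score /= len(classifiers)

# Keep the 80% of instances with the lowest noise scores
keep = np.argsort(noise_score)[: int(0.80 * len(y))]
X, y = X[keep], y[keep]
```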
We developed a custom neural network architecture for the classification task; its details are as follows.
This model uses a deep learning approach with residual connections to classify cosmic objects.
- Each residual block maintains a skip connection that allows gradients to flow through the network more efficiently.
- Uses L2 regularization to prevent overfitting.
- Implements BatchNormalization to stabilize learning.
- Uses LeakyReLU activation for better gradient flow.
- Includes dropout for additional regularization.
- Input Layer: Accepts 15 astronomical features.
- Initial Feature Extraction: 256 neurons with regularization.
- Residual Blocks: Four blocks with decreasing dimensions (256 → 192 → 128 → 64).
- Output Layer: 10 classification categories using softmax activation.
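A Keras sketch consistent with this description; the dropout rate, the L2 coefficient, and the exact shortcut projection are assumptions beyond what is stated above:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def residual_block(x, units, l2=1e-4, dropout=0.3):
    """Dense block with BatchNorm, LeakyReLU, Dropout and a skip connection."""
    shortcut = x
    x = layers.Dense(units, kernel_regularizer=regularizers.l2(l2))(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Dropout(dropout)(x)
    # Project the shortcut when its width differs from the block output
    if shortcut.shape[-1] != units:
        shortcut = layers.Dense(units)(shortcut)
    return layers.Add()([x, shortcut])

inputs = keras.Input(shape=(15,))                    # 15 astronomical features
x = layers.Dense(256, kernel_regularizer=regularizers.l2(1e-4))(inputs)
x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)
for units in (256, 192, 128, 64):                    # four residual blocks
    x = residual_block(x, units)
outputs = layers.Dense(10, activation="softmax")(x)  # 10 output classes
model = keras.Model(inputs, outputs)
```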
✅ SGD Optimizer: Uses momentum and Nesterov acceleration.
✅ Adaptive Learning Rate: Reduces learning rate when improvement plateaus.
✅ Early Stopping: Prevents overfitting by monitoring validation accuracy.
✅ Data Augmentation: Applies random noise to training data for better generalization.
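The training configuration might look like the following; the hyperparameter values (learning rate, momentum, patience, noise scale) and the variable names X_train, y_train, X_val, y_val are assumptions:

```python
import numpy as np
from tensorflow import keras

model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

callbacks = [
    # Reduce the learning rate when validation accuracy plateaus
    keras.callbacks.ReduceLROnPlateau(monitor="val_accuracy", factor=0.5, patience=5),
    # Stop early (restoring the best weights) if validation accuracy stalls
    keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=15,
                                  restore_best_weights=True),
]

# Augmentation: small Gaussian noise added to the training features
X_train_noisy = X_train + np.random.normal(0.0, 0.05, size=X_train.shape)

history = model.fit(X_train_noisy, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200, batch_size=256, callbacks=callbacks)
```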
This architecture combines modern deep learning techniques like residual connections, regularization, and normalization to create a robust classifier for cosmic objects while minimizing overfitting.
This deep learning model demonstrates strong performance across all classes with an overall accuracy of 95.85%.
📌 Overall Accuracy: 96.50% across all 10 planet classifications
📌 Average Precision: 96.36% (macro)
📌 Average Recall: 96.33%
📌 Average F1-Score: 96.34%
- Best Performing Class: Class 1 with 99.00% F1-score (99.60% recall, 98.42% precision)
- Lowest Performing Class: Class 9 with 91.99% F1-score (92.09% recall, 91.89% precision)
- Most Balanced Classes: Classes 0, 2, and 6 show nearly equal precision and recall
- Confusion Patterns: Class 9 has the most misclassifications, particularly with Class 4
- False Positives: Most prevalent in Classes 8 and 9
- False Negatives: Most common in Class 5 (8% of samples)
✅ The model maintains consistent performance across all classes with minimal variance in metrics.
✅ No significant bias toward any particular class, despite slight variations in support counts.
✅ Weighted averages closely match macro averages, indicating good class balance.
This balanced performance suggests that the model has successfully learned distinctive features for each planetary classification while maintaining generalizability. 🚀
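These metrics can be reproduced with scikit-learn's standard reports; X_test and y_test are hypothetical hold-out arrays:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_pred = np.argmax(model.predict(X_test), axis=1)

# Per-class precision/recall/F1 plus macro and weighted averages
print(classification_report(y_test, y_pred, digits=4))

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))
```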
- The training accuracy consistently increases, reaching around 95%.
- The validation accuracy surpasses training accuracy early and stabilizes around 96%.
- The training loss starts high but steadily decreases, indicating effective learning.
- The validation loss also decreases but remains slightly higher than training loss.
- Validation accuracy does not drop significantly, and validation loss does not increase at later epochs.
- The small gap between training and validation loss suggests good generalization.
- The model continues improving without drastic fluctuations, suggesting stable training.
- If validation loss starts increasing after more epochs, early stopping can be applied to prevent overfitting.
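The curves discussed above can be plotted directly from the Keras History object returned by model.fit:

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(history.history["accuracy"], label="train")
ax1.plot(history.history["val_accuracy"], label="validation")
ax1.set_title("Accuracy per epoch")
ax1.legend()

ax2.plot(history.history["loss"], label="train")
ax2.plot(history.history["val_loss"], label="validation")
ax2.set_title("Loss per epoch")
ax2.legend()

plt.show()
```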