# 🏷️ Soil Image Classification with Autoencoder (Anomaly Detection)

## 📌 Overview
This project tackles soil image classification using an unsupervised anomaly detection approach. A convolutional autoencoder is trained solely on normal (label = 1) images to learn their structure. At test time, high reconstruction error is used to flag anomalous images.

---

## 📂 Dataset
- Source: Kaggle - Soil Classification Part 2
- Files:
  - `train_labels.csv`: image_id + label (1 = normal, 0 = anomaly)
  - `test_ids.csv`: image_id only
  - `train/`, `test/`: image directories

---

## 🧹 Data Processing
- Only images with label = 1 are used for training.
- Images are resized to 128×128 and converted to tensors with torchvision transforms.
- Data is loaded through a custom PyTorch `Dataset` and a `DataLoader`.

---

## 🧠 Model Architecture
A convolutional autoencoder:
- **Encoder**: 3 convolutional layers that compress the input
- **Decoder**: 3 transposed-convolution (deconv) layers that reconstruct the image

```text
Input → Conv → ReLU → Conv → ReLU → Conv → ReLU
     → Deconv → ReLU → Deconv → ReLU → Deconv → Sigmoid → Output
```

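
The layer sequence above can be sketched in PyTorch as follows. The channel widths (16/32/64), kernel size 3, and stride 2 are assumptions, since the original does not specify them; they are chosen so that a 128×128 input is downsampled to 16×16 and then upsampled back to 128×128. The class name `ConvAutoencoder` is hypothetical.

```python
import torch
from torch import nn


class ConvAutoencoder(nn.Module):
    """Hypothetical 3-conv / 3-deconv autoencoder matching the diagram."""

    def __init__(self):
        super().__init__()
        # Encoder: each stride-2 conv halves the spatial size (128 -> 64 -> 32 -> 16).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Decoder: output_padding=1 makes each transposed conv exactly double the size
        # (16 -> 32 -> 64 -> 128); Sigmoid keeps outputs in [0, 1] like ToTensor inputs.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

Ending with a Sigmoid matches the `[0, 1]` pixel range produced by `ToTensor`, so the MSE compares values on the same scale.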
---

## 🏋️ Training

The autoencoder is trained to reconstruct only normal images (`label = 1`). It minimizes the pixel-wise **Mean Squared Error (MSE)** between input and output.

- **Loss Function**: `nn.MSELoss()`
- **Optimizer**: `torch.optim.Adam` with learning rate `1e-3`
- **Epochs**: 20
- **Batch Size**: 64
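
A minimal training loop using the settings listed above (MSE loss, Adam with learning rate `1e-3`, 20 epochs). The function name `train_autoencoder` and the per-epoch loss history it returns are illustrative assumptions; it works with any model whose output has the same shape as its input and any loader that yields image batches.

```python
import torch
from torch import nn


def train_autoencoder(model, loader, epochs=20, lr=1e-3, device="cpu"):
    """Hypothetical helper: trains `model` to reconstruct its input and
    returns the mean MSE of each epoch."""
    model.to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for epoch in range(epochs):
        model.train()
        total, count = 0.0, 0
        for batch in loader:
            # Accept either bare image tensors or (images, ...) tuples/lists.
            images = batch[0] if isinstance(batch, (list, tuple)) else batch
            images = images.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), images)  # the target is the input itself
            loss.backward()
            optimizer.step()
            # Weight by batch size so a smaller final batch is averaged correctly.
            total += loss.item() * images.size(0)
            count += images.size(0)
        history.append(total / count)
        print(f"epoch {epoch + 1}/{epochs}  loss {history[-1]:.5f}")
    return history
```

Because only normal images reach the loader, the model never sees anomalies during training, which is what makes their reconstruction error comparatively high at test time.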

---

## 🧪 Evaluation Strategy

After training, the model reconstructs each test image, and a reconstruction error is computed by comparing the original image with its reconstruction. The anomaly threshold is set to the mean plus two standard deviations of the reconstruction errors across all test images. Images whose error exceeds the threshold are labeled as anomalies (label = 0); the rest are labeled normal (label = 1).
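
The scoring and thresholding steps can be sketched as below. Two details are assumptions not stated in the original: the per-image error is taken as the mean squared pixel difference, and the standard deviation is the population standard deviation (`numpy.std` with its default `ddof=0`). The names `reconstruction_errors` and `predict_labels` are hypothetical.

```python
import numpy as np
import torch


@torch.no_grad()
def reconstruction_errors(model, loader, device="cpu"):
    """Return one reconstruction error per image, in loader order."""
    model.eval()
    errors = []
    for batch in loader:
        images = batch[0] if isinstance(batch, (list, tuple)) else batch
        images = images.to(device)
        recon = model(images)
        # Assumption: per-image MSE, averaged over channels, height and width.
        per_image = ((recon - images) ** 2).flatten(start_dim=1).mean(dim=1)
        errors.append(per_image.cpu())
    return torch.cat(errors).numpy()


def predict_labels(errors, k=2.0):
    """Label an image 0 (anomaly) if its error exceeds mean + k * std, else 1."""
    errors = np.asarray(errors, dtype=float)
    threshold = errors.mean() + k * errors.std()
    labels = np.where(errors > threshold, 0, 1)
    return labels, threshold
```

For example, with nine errors of `1.0` and one of `10.0`, the mean is 1.9 and the standard deviation is 2.7, so the threshold is 7.3 and only the last image is labeled 0. The test loader should be built with `shuffle=False` so that the error order matches the order of the test image IDs.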

---

## 📤 Submission

The predicted labels are combined with the test image IDs into a DataFrame, which is saved as a CSV file in the required submission format with two columns: `image_id` and `label`.
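
A short sketch of writing the submission file with pandas, assuming the label column is named `label` and that `labels` is in the same order as the IDs in `test_ids.csv`. The function name `write_submission` and the default output path `submission.csv` are hypothetical.

```python
import pandas as pd


def write_submission(image_ids, labels, path="submission.csv"):
    """Pair each test image ID with its predicted label and save as CSV."""
    submission = pd.DataFrame({
        "image_id": list(image_ids),
        "label": [int(v) for v in labels],  # plain ints: 1 = normal, 0 = anomaly
    })
    submission.to_csv(path, index=False)  # index=False keeps exactly two columns
    return submission
```

In practice, the IDs would come from `pd.read_csv("test_ids.csv")["image_id"]`, and `labels` from the thresholding step.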

---

## ✅ Conclusion

This project demonstrates how a convolutional autoencoder can be used for anomaly detection in image data. Because it is trained only on normal samples, the model flags outliers by their high reconstruction error, which makes it a practical approach for unsupervised image classification tasks.