## Students Information

Please enter the names and IDs of the two students below:

1. **Name**: Yasmine Ashraf Ghanem  
   **ID**: `9203707` 

2. **Name**: Yasmin Abdullah Nasser  
   **ID**: `9203717` 


## Students Instructions

This is your first graded lab assignment. As you put what you have studied in the lectures into action, please take this opportunity to deepen your understanding of the concepts and hone your skills. As you work on your assignment, please keep the following instructions in mind:

- Clearly state your personal information where indicated.
- Be ready with your work before the time of the next discussion slot in the schedule.
- Plagiarism will be met with penalties. Refrain from copying any answers so that you get the most out of the assignment. If any signs of plagiarism are detected, action will be taken.
- It is acceptable to share the workload of the assignment bearing the discussion in mind.
- Feel free to [reach out](mailto:cmpsy27@gmail.com) or post on the classroom if anything is ambiguous.

## Submission Instructions

To ensure a smooth evaluation process, please follow these steps for submitting your work:

1. **Prepare Your Submission:** Alongside your main notebook, include any additional files that are necessary for running the notebook successfully. This might include data files, images, or supplementary scripts.

2. **Rename Your Files:** Before submission, please rename your notebook to reflect the IDs of the two students working on this project. The format should be `ID1_ID2`, where `ID1` and `ID2` are the student IDs. For example, if the student IDs are `9123456` and `9876543`, then your notebook should be named `9123456_9876543.ipynb`.

3. **Check for Completeness:** Ensure that all required tasks are completed and that the notebook runs from start to finish without errors. This step is crucial for a smooth evaluation.

4. **Submit Your Work:** Once everything is in order, submit your notebook and any additional files via the designated submission link on Google Classroom **(code: 2yj6e24)**. Make sure you meet the submission deadline to avoid any late penalties.
5. Please note that the same student should submit the assignments for the pair throughout the semester.

Following these instructions carefully helps us evaluate your work efficiently and fairly; **failure to adhere to these guidelines can affect your grades**. If you encounter any difficulties or have questions about the submission process, please reach out as soon as possible.

We look forward to seeing your completed projects and wish you the best of luck!





## Installation Instructions

This lab assignment requires additional Python libraries for machine learning (ML) and satellite image analysis: Scikit-learn and Scikit-image.

- **Scikit-learn (Sklearn)** is a Python library for ML tasks, offering various algorithms for classification, regression, clustering, and model evaluation. It is extensively used for analyzing satellite imagery, enabling tasks such as land cover classification and environmental parameter prediction.
- **Scikit-image (Skimage)** provides tools for image processing and computer vision, facilitating tasks such as image preprocessing, feature extraction, and segmentation.

Install both libraries with:
```bash
pip install scikit-learn scikit-image
```
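After installing, a quick check like the one below can confirm that both packages are visible to the notebook's Python environment. It is a minimal sketch: `is_installed` is a hypothetical helper, and note that the import names (`sklearn`, `skimage`) differ from the pip package names.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None

# Import names differ from the pip names: scikit-learn -> sklearn, scikit-image -> skimage
for name in ("sklearn", "skimage"):
    print(name, "is installed" if is_installed(name) else "is missing")
```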


> **Note:** You are allowed to install any other necessary libraries you deem useful for solving the lab. Please ensure that any additional libraries are compatible with the project requirements and are properly documented in your submission.


## Maximum Likelihood Estimator (MLE) Classifier
The Maximum Likelihood Estimator (MLE) is a fundamental statistical approach used to infer the parameters of a given distribution that are most likely to result in the observed data. In the context of image classification, MLE helps to quantify the probability of observing the data within each predefined class based on their distinct statistical properties. This method is highly effective for classifying images into categories by comparing the likelihoods of the data under different model parameters, enabling the most probable class assignment.
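As a small illustration of the principle, the sketch below fits a one-dimensional Gaussian to synthetic samples and checks that the closed-form estimates give a higher log-likelihood than a perturbed parameter choice. The data and the helper `gaussian_log_likelihood` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=10_000)  # synthetic data, illustration only

# Closed-form maximum likelihood estimates for a Gaussian
mu_hat = samples.mean()
sigma2_hat = np.mean((samples - mu_hat) ** 2)  # normalizes by N, not N - 1

def gaussian_log_likelihood(x, mu, sigma2):
    """Total log-likelihood of the samples x under a Gaussian N(mu, sigma2)."""
    return np.sum(-0.5 * (np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2))

best = gaussian_log_likelihood(samples, mu_hat, sigma2_hat)
# Moving the mean away from its estimate lowers the log-likelihood
shifted = gaussian_log_likelihood(samples, mu_hat + 0.5, sigma2_hat)
print(mu_hat, sigma2_hat, best > shifted)
```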

1. **Calculate Class Priors**: Estimate the probability of each class based on the training dataset. This is expressed as:
   $$
   P(C_k) = \frac{N_k}{N}
   $$
   where $N_k$ is the number of samples of class $k$ and $N$ is the total number of samples.

2. **Estimate Class-specific Parameters**: For each class, estimate parameters such as the mean $\mu_k$ and covariance $\Sigma_k$ of features that describe the distribution of the data:
   $$
   \mu_k = \frac{1}{N_k} \sum_{x \in C_k} x
   $$
   $$
   \Sigma_k = \frac{1}{N_k} \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^T
   $$

3. **Compute Likelihoods**: For a given test instance $x$, compute the likelihood of that instance belonging to each class using the estimated parameters:
   $$
   p(x | C_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right)
   $$

4. **Classify Based on Maximum Likelihood**: Assign each test instance the class label with the highest likelihood weighted by the class prior:
   $$
   \hat{y} = \arg\max_{k} P(C_k) \cdot p(x | C_k)
   $$
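The four steps above can be sketched in NumPy as follows. This is a minimal illustration on synthetic two-dimensional data, not a reference solution: the function names `fit_gaussian_mle` and `predict_gaussian_mle` are hypothetical, and the scores are computed in the log domain, which leaves the arg max unchanged while avoiding numerical underflow.

```python
import numpy as np

def fit_gaussian_mle(X, y):
    """Steps 1-2: estimate the prior, mean, and covariance of each class."""
    params = {}
    for k in np.unique(y):
        X_k = X[y == k]
        prior = len(X_k) / len(X)                   # P(C_k) = N_k / N
        mu = X_k.mean(axis=0)                       # mu_k
        centered = X_k - mu
        sigma = centered.T @ centered / len(X_k)    # Sigma_k, normalized by N_k
        params[k] = (prior, mu, sigma)
    return params

def predict_gaussian_mle(params, x):
    """Steps 3-4: score each class and return the label with the highest score."""
    d = x.shape[0]
    scores = {}
    for k, (prior, mu, sigma) in params.items():
        diff = x - mu
        _, log_det = np.linalg.slogdet(sigma)
        mahalanobis = diff @ np.linalg.solve(sigma, diff)
        log_likelihood = -0.5 * (d * np.log(2 * np.pi) + log_det + mahalanobis)
        scores[k] = np.log(prior) + log_likelihood  # log of P(C_k) * p(x | C_k)
    return max(scores, key=scores.get)

# Synthetic two-class data, for illustration only
rng = np.random.default_rng(27)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(4.0, 1.0, size=(100, 2))])
y = np.array([0] * 200 + [1] * 100)

params = fit_gaussian_mle(X, y)
print(predict_gaussian_mle(params, np.array([0.2, -0.1])))
print(predict_gaussian_mle(params, np.array([3.8, 4.1])))
```

Solving the linear system with `np.linalg.solve` instead of inverting $\Sigma_k$ explicitly is a design choice for numerical stability; it still fails if the covariance is singular, which is one reason to keep the feature dimension small.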

The Naive Bayes classifier is perhaps the best-known application of the maximum likelihood principle in classification tasks. It assumes that the features are conditionally independent given the class, which simplifies the computation of likelihoods. While Naive Bayes is popular for its simplicity and efficiency, it is not the only technique that relies on MLE. Other classical alternatives include Logistic Regression, which uses MLE to estimate the parameters that best predict categorical outcomes, and Gaussian Mixture Models, which use MLE to estimate the parameters of multiple Gaussian distributions within the data. Students are encouraged to explore these models to gain a deeper understanding of statistical estimation techniques.
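To make the independence assumption concrete, the sketch below implements a Gaussian Naive Bayes classifier in NumPy. Under that assumption each class needs only a per-feature mean and variance (a diagonal covariance), and the joint log-likelihood becomes a sum over features. The function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_gaussian_naive_bayes(X, y):
    """Per class: prior, per-feature mean, and per-feature variance."""
    params = {}
    for k in np.unique(y):
        X_k = X[y == k]
        # var() normalizes by N_k by default, matching the MLE of the variance
        params[k] = (len(X_k) / len(X), X_k.mean(axis=0), X_k.var(axis=0))
    return params

def predict_gaussian_naive_bayes(params, x):
    """Return the class with the highest log prior plus log-likelihood."""
    scores = {}
    for k, (prior, mu, var) in params.items():
        # Independence turns the joint log-likelihood into a sum over features
        log_likelihood = np.sum(-0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var))
        scores[k] = np.log(prior) + log_likelihood
    return max(scores, key=scores.get)

# Synthetic two-class data with three features, for illustration only
rng = np.random.default_rng(27)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 3)),
               rng.normal(5.0, 2.0, size=(200, 3))])
y = np.array([0] * 200 + [1] * 200)

nb_params = fit_gaussian_naive_bayes(X, y)
print(predict_gaussian_naive_bayes(nb_params, np.array([0.1, -0.3, 0.2])))
print(predict_gaussian_naive_bayes(nb_params, np.array([5.2, 4.7, 6.0])))
```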


## Req- Image Classification for EuroSATallBands
Image classification is a key challenge in satellite imaging and remote sensing. As discussed in the lecture, this task is typically conducted on a pixel-wise basis because a single image can contain multiple textural elements belonging to different land-cover features. However, for this specific assignment, we will focus on identifying the dominant phenomenon in the image as the basis for classification.

- **Load the Images**: Load the images of the EuroSAT dataset that belong to the **residential**, **river**, and **forest** classes.

- **Split the Dataset**: Split the dataset such that 10% of each class is used as testing data, and the remainder is used for training your classifier. Use the indices provided by `np.random.choice` with the seed set to `27`. **The code is provided; do not change it**.

- **Feature Extraction**: Extract suitable features from the images that you think might be relevant in distinguishing each class from the others. Keep in mind the curse of dimensionality when selecting features.

- **Implement a Maximum Likelihood Estimator (MLE)**: Implement an MLE classifier fitted on your training data.
- **Report Accuracy and Average F1 Score**: After testing your classifier on the test set, report the **Accuracy** and **Average F1 Score** of your model.
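As one possible starting point for the feature extraction step, the sketch below reduces each image to the mean and standard deviation of every channel. Assuming 64 × 64 RGB patches, this replaces 12,288 raw pixel values with 6 features, which keeps the covariance matrices small and well conditioned. The function `channel_statistics` and the random stand-in images are assumptions; other features (texture, spectral indices) may separate the classes better.

```python
import numpy as np

def channel_statistics(images):
    """Map images of shape (N, H, W, C) to features of shape (N, 2 * C)."""
    images = np.asarray(images, dtype=np.float64)
    means = images.mean(axis=(1, 2))  # per-channel mean intensity
    stds = images.std(axis=(1, 2))    # per-channel spread (contrast)
    return np.concatenate([means, stds], axis=1)

# Random stand-in images, for illustration only
rng = np.random.default_rng(27)
fake_images = rng.integers(0, 256, size=(5, 64, 64, 3), dtype=np.uint8)

features = channel_statistics(fake_images)
print(features.shape)  # (5, 6)
```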


In [1]:
# Add your libraries here
import numpy as np
import os
import cv2
import skimage

In [2]:
# DO NOT CHANGE THIS CELL
## Training set indices.
np.random.seed(27)  # Set random seed for reproducibility

# Randomly select indices for the test sets for each class
residential_test_indices = np.random.choice(np.arange(3000), size=300, replace=False)
forest_test_indices = np.random.choice(np.arange(3000), size=300, replace=False)
river_test_indices = np.random.choice(np.arange(2500), size=250, replace=False)


In [3]:
def read_dataset(folder_path):
    content = os.listdir(folder_path)

    print(content)

    # Map each class folder name (lowercase) to its integer label
    class_labels = {'residential': 0, 'forest': 1, 'river': 2}

    # Read the images
    images = [] # List to store images
    labels = [] # 0: residential, 1: forest, 2: river

    for folder in content:
        label = class_labels.get(folder.lower())
        if label is None:
            continue  # skip the classes that are not part of this assignment

        class_folder = os.path.join(folder_path, folder)
        # Sort the file names so that the fixed test indices always select the same images
        for image in sorted(os.listdir(class_folder)):
            image_path = os.path.join(class_folder, image)
            images.append(cv2.imread(image_path))
            labels.append(label)

    return images, labels

def shuffle_dataset(images, labels):
    # shuffle the dataset
     
    # zip the images and labels together
    zipped_lists = list(zip(images, labels))

    # shuffle the zipped list
    np.random.shuffle(zipped_lists)

    # unzip the zipped list
    shuffled_images, shuffled_labels = zip(*zipped_lists)

    return shuffled_images, shuffled_labels

def split_dataset(images, labels):
    # Split each class into 90% training and 10% test using the fixed test indices
    images = np.array(images)
    labels = np.array(labels)

    class_test_indices = {0: residential_test_indices, 1: forest_test_indices, 2: river_test_indices}
    train_images, train_labels, test_images, test_labels = [], [], [], []

    for label, test_indices in class_test_indices.items():
        class_images = images[labels == label]

        # Boolean mask: True for test samples, False for training samples
        # (a mask avoids the index shifting that removing elements one by one would cause)
        is_test = np.zeros(len(class_images), dtype=bool)
        is_test[test_indices] = True

        test_images.append(class_images[is_test])
        test_labels.append(np.full(is_test.sum(), label))
        train_images.append(class_images[~is_test])
        train_labels.append(np.full((~is_test).sum(), label))

    return (np.concatenate(train_images), np.concatenate(train_labels),
            np.concatenate(test_images), np.concatenate(test_labels))

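def extract_features(images, labels):

    # Flatten each image into a single row (one feature vector per image)
    images = np.array(images)
    images_reshaped = np.reshape(images, (images.shape[0], -1))  # -1 automatically calculates the remaining dimension

    # Convert the labels to an array so that the comparisons below are element-wise
    labels = np.array(labels)

    # Get the images for each class
    residential_images = images_reshaped[labels == 0]
    forest_images = images_reshaped[labels == 1]
    river_images = images_reshaped[labels == 2]

    return residential_images, forest_images, river_images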


def maximum_likelihood_estimation(features, images, labels, image_to_classify):

    # Preprocessing #
    # flatten every image into one row so that all samples have the same dimensions
    images = np.array(images)
    images_reshaped = np.reshape(images, (images.shape[0], -1))  # -1 automatically calculates the remaining dimension
    labels = np.array(labels)  # element-wise comparisons below need an array, not a list
    x = np.ravel(image_to_classify).astype(np.float64)

    # d is the dimensionality of the feature vector
    d = images_reshaped.shape[1]

    log_posteriors = []
    for k in range(3):  # 0: residential, 1: forest, 2: river
        class_images = images_reshaped[labels == k].astype(np.float64)

        # 1. calculate the class prior P(C_k) = N_k / N
        prior = len(class_images) / len(labels)

        # 2. calculate the class mean and covariance matrix
        mean = np.mean(class_images, axis=0)
        # rowvar=False: each row is an observation (image) and each column is a variable (pixel).
        # bias=True normalizes by N_k, which is the maximum likelihood estimate.
        covariance = np.atleast_2d(np.cov(class_images, rowvar=False, bias=True))

        # 3. compute the Gaussian log-likelihood of the class
        # (logarithms avoid overflow and underflow of the determinant and the exponential;
        #  a singular covariance still requires fewer or regularized features)
        _, log_det = np.linalg.slogdet(covariance)
        diff = x - mean
        mahalanobis = diff @ np.linalg.solve(covariance, diff)
        log_likelihood = -0.5 * (d * np.log(2 * np.pi) + log_det + mahalanobis)

        log_posteriors.append(np.log(prior) + log_likelihood)

    # 4. classify based on maximum likelihood (weighted by the class prior)
    maximum_likelihood = np.argmax(log_posteriors)

    return maximum_likelihood

In [22]:
'''
    We have three classes in the dataset: residential, forest, and river.
    The image can be classified into one of these three classes.
    The dataset is divided into training and testing sets.
'''
# REQ1 For the RGB images dataset.

# Follow the steps 
# 1. Load the dataset 
images, labels = read_dataset('dataset/RGB')

# 2. Split the dataset into training and testing sets
split_dataset(images, labels)

# 3. Extract the features from the images


# 4. Implement the maximum likelihood estimation


['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial', 'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']
(2, 4, 3, 5, 1)
('b', 'd', 'c', 'e', 'a')


### Grading Rubric (Total: 10 Marks)

The lab is graded based on the following criteria:

1. **Data Loading and Preparation (2 Marks)**
   - Correctly loads images for the residential, river, and forest classes. (0.5 Marks)
   - Accurately splits the dataset into training and testing subsets and clearly shows this split. (1.5 Marks)

2. **Feature Extraction (2 Marks)**
   - Implements feature extraction appropriately, considering the curse of dimensionality. (1 Mark)
   - Extracts and justifies the selection of features relevant to distinguishing the classes. (1 Mark)

3. **Implementation of MLE Classifier (3 Marks)**
   - Correctly calculates and clearly shows class priors and class-specific parameters. (1 Mark)
   - Accurately computes likelihoods using the likelihood equation (probability density function) and classifies based on maximum likelihood. Must clearly show these calculations and explain the choice of likelihood equation. (2 Marks)

4. **Model Evaluation and Understanding (3 Marks)**
   - Shows **confusion matrix** and correctly calculates and clearly shows the calculations for Accuracy and Average F1 Score. (1 Mark)
   - **Comparison amongst your peers.** Compares the model's performance against those of peers to identify strengths and areas for improvement. (2 Marks)

Each section of the lab will be evaluated on completeness and on the correctness of its approach and analysis. Part of the rubric also covers the student's ability to explain and justify their choices and results.
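The evaluation metrics named in the rubric can be computed directly from a confusion matrix, as in the sketch below. It assumes that "Average F1 Score" means the macro average (the unweighted mean of the per-class F1 scores); the function names and the toy labels are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

def accuracy_and_macro_f1(matrix):
    tp = np.diag(matrix).astype(float)
    fp = matrix.sum(axis=0) - tp  # predicted as the class but belonging to another
    fn = matrix.sum(axis=1) - tp  # belonging to the class but predicted as another
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); classes with no samples and no predictions score 0
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return tp.sum() / matrix.sum(), f1.mean()

# Toy labels, for illustration only (0: residential, 1: forest, 2: river)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy, macro_f1 = accuracy_and_macro_f1(cm)
print(cm)
print(accuracy, macro_f1)
```

For these toy labels the per-class F1 scores are 0.5, 0.8, and 2/3, so four of six predictions are correct and the macro average is about 0.656.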


