# End-to-End Satellite Image Segmentation Pipeline

The end-to-end pipeline for satellite image segmentation is composed of the following steps:

1. Data Ingestion
2. Data Preprocessing
3. Model Training
4. Model Evaluation
5. API Inference

## Usage

### Model training 

To train the model locally, run:

```bash
python src/get_and_process_data.py
```
followed by:

```bash
python src/train_model.py
```

Alternatively, if you do not have a GPU, a Colab notebook is available at `notebooks/train_model.ipynb`. The notebook pulls the repo from GitHub, runs the ingestion and training pipeline, and periodically saves the model to Google Drive.

### Model deployment via API

#### Local development environment 

1. Ensure that the `MODEL_PATH` parameter in `conf/config.yaml` points to the trained model file, and that `MODEL_SOURCE` is set to `local`:

    ```yaml
    api:
      MODEL_PATH: # model path to use for inference
      MODEL_SOURCE: local
    ```

2. Ensure Docker is installed on your local machine and the Docker service is running.

3. Build the Docker image:

    ```bash
    docker build -t sat_img_seg -f ./docker/flask_app.dockerfile .
    ```

4. Run the Docker image:

    ```bash
    docker run -p 5000:5000 sat_img_seg
    ```

5. The API is now running on `http://localhost:5000/`
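
To confirm that the container is actually answering on port 5000, a quick reachability check can help. The sketch below is an assumption-laden illustration: the function name `api_is_up` is hypothetical, and because the API's routes are not documented here, it only checks that some HTTP server responds at the base URL (any status code counts as reachable).

```python
import urllib.error
import urllib.request


def api_is_up(base_url: str = "http://localhost:5000/", timeout: float = 3.0) -> bool:
    """Return True if any HTTP server answers at base_url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (e.g. 404 on "/").
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, etc. (URLError is an OSError).
        return False


# Usage: print("API reachable:", api_is_up())
```

A `False` result usually means the container is not running or the port mapping (`-p 5000:5000`) is missing.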




#### Model deployment to AWS

A GitHub Actions workflow is set up to automatically deploy the API to AWS EC2 whenever the `main` branch is updated.

1. Upload the trained model artifact (`.pth`) to Google Drive and obtain its file id. The file id can be read from the shareable link: `https://drive.google.com/file/d/<file_id>/view?usp=drive_link`

2. Spin up an EC2 instance on AWS, and ensure that Docker is installed and the Docker service is running.

3. Make sure that you have a Docker Hub account.

4. On the GitHub repo, ensure that the following secrets are set:
    - DOCKER_USERNAME
    - DOCKER_PASSWORD
    - AWS_HOST (public IP address of the EC2 instance)
    - AWS_USERNAME (username for the EC2 instance)
    - AWS_PRIVATE_KEY (private key for the EC2 instance; you download it as a `.pem` file when you create the instance, then copy and paste the file's content into the secret)

5. Once a change is pushed to the `main` branch, or the workflow is triggered manually, the GitHub Actions workflow runs and deploys the API to the EC2 instance.

6. The API is now running on `http://<AWS_HOST>:5000/`
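
Step 1 asks for the Google Drive file id from a shareable link. As a minimal sketch, the id can be pulled out with a regular expression. The function name `extract_drive_file_id` is hypothetical, and the pattern assumes that ids consist of letters, digits, hyphens, and underscores and follow `/file/` (optionally `/file/d/`) in the link.

```python
import re
from typing import Optional

# Matches ".../file/<file_id>/..." and ".../file/d/<file_id>/..." (assumed link shapes).
_DRIVE_FILE_ID = re.compile(r"/file/(?:d/)?([A-Za-z0-9_-]+)")


def extract_drive_file_id(share_link: str) -> Optional[str]:
    """Return the file id found in a Drive share link, or None if there is none."""
    match = _DRIVE_FILE_ID.search(share_link)
    return match.group(1) if match else None


# Usage with a made-up id:
# extract_drive_file_id("https://drive.google.com/file/d/1AbC_dEf-123/view?usp=drive_link")
```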

## Dataset

The dataset consists of 1366 rural and 1156 urban satellite images, with corresponding masks, taken from remote sensing imagery of Nanjing, Changzhou, and Wuhan. They are part of the [LoveDA](https://github.com/Junjue-Wang/LoveDA) dataset.

The masks contain 5 labels: unlabelled (0), building (1), woodland (2), water (3), and road (4). The unlabelled class covers all landcover types other than those in labels 1-4. Overall, more than half of all mask pixels are unlabelled, more than a third are woodland, and only a minority are building, water, or road.
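
The label ids above map naturally to a small lookup table, and the class balance of a mask can be measured by counting pixels per label. The following is an illustrative sketch only: `LABELS` and `class_fractions` are hypothetical names, and the mask is represented as a plain list of rows rather than whatever array type the pipeline actually uses.

```python
from collections import Counter

# Label ids as listed above.
LABELS = {0: "unlabelled", 1: "building", 2: "woodland", 3: "water", 4: "road"}


def class_fractions(mask):
    """Return {class name: fraction of pixels} for a mask given as rows of label ids."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mask is empty")
    return {name: counts.get(label, 0) / total for label, name in LABELS.items()}


# Toy 3x4 mask, not real data.
toy_mask = [
    [0, 0, 2, 2],
    [0, 1, 2, 3],
    [0, 0, 4, 2],
]
fractions = class_fractions(toy_mask)
```

Running the same count over all training masks is one way to check the imbalance described above before choosing a loss function or sampling strategy.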

The images and masks were split into smaller 256x256 tiles. Image augmentation was then applied: additional training samples were created by randomly varying the brightness and contrast of the source images.
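
The tiling and augmentation steps can be sketched in plain Python as follows. This is not the repo's implementation: the function names, the 1024x1024 example size, the contrast formula (scaling around mid-grey 128), and the parameter ranges are all assumptions chosen for illustration. A real pipeline would more likely operate on arrays with an imaging library.

```python
import random


def tile_boxes(width, height, tile=256):
    """Return (left, top, right, bottom) boxes for non-overlapping full tiles."""
    return [
        (x, y, x + tile, y + tile)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


def adjust_brightness_contrast(pixels, brightness=0.0, contrast=1.0):
    """Scale 0-255 pixel values around mid-grey, shift them, and clip to 0-255."""
    return [
        min(255, max(0, round(contrast * (p - 128) + 128 + brightness)))
        for p in pixels
    ]


def random_augment(pixels, rng, max_brightness=30.0, contrast_range=(0.8, 1.2)):
    """Apply a randomly drawn brightness shift and contrast factor (assumed ranges)."""
    brightness = rng.uniform(-max_brightness, max_brightness)
    contrast = rng.uniform(*contrast_range)
    return adjust_brightness_contrast(pixels, brightness, contrast)


boxes = tile_boxes(1024, 1024)               # hypothetical source image size
augmented = random_augment([0, 64, 128, 255], random.Random(0))
```

Only the image is augmented this way. The mask stays unchanged, because its pixel values are class ids rather than intensities.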

## Modelling

Two segmentation architectures were used: a basic encoder-decoder network and Unet (https://arxiv.org/pdf/1505.04597.pdf). The basic network achieved an IoU (Intersection over Union) of 0.72 and the Unet achieved 0.74 after about 30-40 epochs of training on Google Colab.
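
For reference, IoU for a class is the number of pixels where prediction and ground truth both have that class, divided by the number of pixels where either does. The sketch below computes it per class and averages over the classes that appear. How the scores above were averaged is not stated, so the unweighted mean here is an assumption, and the function names are hypothetical.

```python
def per_class_iou(pred, target, num_classes=5):
    """Return {class id: IoU} for flat, equal-length sequences of label ids."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = {}
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious[c] = intersection / union
    return ious


def mean_iou(pred, target, num_classes=5):
    """Unweighted mean of the per-class IoUs (assumed averaging scheme)."""
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious) if ious else float("nan")


# Toy example, not real model output.
toy_target = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_score = mean_iou(toy_pred, toy_target)
```

Because unlabelled and woodland pixels dominate the masks, per-class values are often more informative than a single averaged number.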

## Limitations and Future Exploration

### Limitations

- Does not handle overlapping objects well, e.g. a tree over a road
- Does not pick up areas where the separation between objects is not apparent
- The unlabelled class is heterogeneous: it includes grassland, concrete paving, shadow, etc.
- Some ground truth labels are inaccurate
- The dataset only covers a few areas (Nanjing, Changzhou, and Wuhan), with woodland dominating the labelled classes

### Future explorations

- Explore other architectures and pretrained models
- Increase the variety of datasets to include a fair representation of urban, rural, and different types of natural landcover
- Use better-labelled training sets


## Acknowledgement

The Unet architecture was based on the paper by Olaf Ronneberger, Philipp Fischer, and Thomas Brox: https://arxiv.org/pdf/1505.04597.pdf. 

In addition, the project draws on videos by DigitalSreeni: https://www.youtube.com/c/DigitalSreeni.