# **Photovoltaic panel segmentation on building facades**

Ayca Duran*, Pedram Mirabian, Panagiotis Karapiperis, Christoph Waibel,
Bernd Bickel and Arno Schlueter

## Step 3: Dataset Preparation

Following the data collection phase, the images were pre-processed to ensure high dataset quality. These steps were performed on the [Roboflow](https://roboflow.com) platform to facilitate collaboration.

Afterwards, the masks were further processed to account for occlusions and to remove non-PV areas that could not easily be labeled in Roboflow (patches of non-PV pixels within a PV mask).

Finally, the 

TODO IMAGE ROBOFLOW MAINSCREEN

### Image Cleaning

The images were manually inspected and, in some cases, cropped to remove non-essential surrounding features.

### Manual Labeling

Manual labeling was performed to obtain the ground-truth PV masks on the building facades, from which the dataset was created.

The cleaned and labeled images were then split into training, validation and test sets using a stratified sampling approach (detailed below) based on the metadata of the PV projects, to ensure a representative distribution of projects across the splits used for model training.
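A stratified split can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' procedure: the metadata field (`facade_type`), the 80/10/10 ratios and the seed are hypothetical placeholders. Each stratum is shuffled and split separately, so every stratum ends up in train, validation and test in roughly the same proportion.

```python
import random
from collections import defaultdict

def stratified_split(records, key, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split records into train/valid/test so that each stratum keeps the given ratios.

    records: list of dicts with per-image metadata (hypothetical schema)
    key:     metadata field used as the stratum (hypothetical, e.g. "facade_type")
    """
    # Group the records by their stratum value
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)

    rng = random.Random(seed)
    splits = {"train": [], "valid": [], "test": []}
    for group in strata.values():
        # Shuffle within the stratum, then cut it according to the ratios
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_valid = round(len(group) * ratios[1])
        splits["train"] += group[:n_train]
        splits["valid"] += group[n_train:n_train + n_valid]
        splits["test"] += group[n_train + n_valid:]
    return splits

# Usage with made-up metadata: 60 "curtain_wall" and 40 "cladding" images
records = [{"file": f"img_{i}.jpg", "facade_type": "curtain_wall" if i < 60 else "cladding"}
           for i in range(100)]
splits = stratified_split(records, "facade_type")
```

With several metadata fields, a combined key (e.g. a tuple of field values) can serve as the stratum, at the cost of smaller groups.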

### Data Augmentation

The training set was then augmented with the following methods, increasing its size to 7x the original:

* Flip
* Crop (0% ... 40% zoom)
* Saturation (-20% ... +20%)
* Brightness (-20% ... +20%)
* Exposure (-20% ... +20%)
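The augmentations listed above can be sketched for an image and its PV mask together. This is not Roboflow's implementation: the function names are hypothetical, brightness is modeled as an additive offset and exposure as a multiplicative gain (both assumptions), and the resize uses nearest-neighbour sampling for simplicity. The point that matters for segmentation is that geometric transforms (flip, crop) are applied identically to the mask, while photometric ones (saturation, brightness, exposure) leave it untouched.

```python
import numpy as np

def _resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of a 2-D mask or an (H, W, C) image via index arrays."""
    rows = (np.arange(out_h) * arr.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * arr.shape[1] / out_w).astype(int)
    return arr[rows][:, cols]

def augment_pair(image, mask, rng):
    """Return one randomly augmented (image, mask) pair (hypothetical helper).

    image: (H, W, 3) uint8, mask: (H, W) binary uint8.
    """
    h, w = mask.shape
    img = image.astype(np.float32)
    msk = mask

    # Flip horizontally with probability 0.5 (image and mask together)
    if rng.random() < 0.5:
        img, msk = img[:, ::-1], msk[:, ::-1]

    # Crop corresponding to a 0-40 % zoom, then resize back to the original size
    zoom = rng.uniform(0.0, 0.4)
    ch, cw = int(h / (1 + zoom)), int(w / (1 + zoom))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    img = _resize_nearest(img[top:top + ch, left:left + cw], h, w)
    msk = _resize_nearest(msk[top:top + ch, left:left + cw], h, w)

    # Saturation +-20 %: scale each pixel's distance from its grey value (channel mean)
    grey = img.mean(axis=2, keepdims=True)
    img = grey + (img - grey) * rng.uniform(0.8, 1.2)
    # Brightness +-20 %: additive offset relative to the 8-bit range (assumed formula)
    img = img + rng.uniform(-0.2, 0.2) * 255
    # Exposure +-20 %: multiplicative gain (assumed formula)
    img = img * rng.uniform(0.8, 1.2)

    return np.clip(img, 0, 255).astype(np.uint8), np.ascontiguousarray(msk)

# Usage on a synthetic image with one rectangular "PV" region, e.g. seven variants per image
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 80, 3), dtype=np.uint8)
mask = np.zeros((64, 80), dtype=np.uint8)
mask[10:30, 20:50] = 1
variants = [augment_pair(image, mask, rng) for _ in range(7)]
```

For real photographs, a bilinear resize for the image (keeping nearest-neighbour for the mask, so it stays binary) would usually be preferred.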

Labeling and augmentation were both performed with [Roboflow](https://roboflow.com/).

### Pre-processing Masks

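The following cells deal with holes in the masks (occluded PV areas and non-PV patches inside PV regions). The environment is set up with the Detectron2 library, and the masks themselves are processed with pycocotools and OpenCV.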
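The processing cell reads one `_annotations.coco.json` file per split and looks up three categories by name: `pv`, `pv_occluded` and `nonpv_facade`. A minimal, made-up example of such a file is sketched here to make the expected structure explicit; the ids, file name and coordinates are hypothetical. COCO polygons are flat `[x1, y1, x2, y2, ...]` lists, and the image `width`/`height` should match the actual image (the processing cell corrects them otherwise).

```python
# Minimal, hypothetical COCO dictionary with the three categories the mask processing expects
coco_example = {
    "images": [{"id": 0, "file_name": "facade_0001.jpg", "width": 640, "height": 480}],
    "categories": [
        {"id": 1, "name": "nonpv_facade"},
        {"id": 2, "name": "pv"},
        {"id": 3, "name": "pv_occluded"},
    ],
    "annotations": [
        # A PV polygon and a pv_occluded polygon overlapping part of it
        {"id": 1, "image_id": 0, "category_id": 2, "iscrowd": 0,
         "segmentation": [[100, 100, 300, 100, 300, 250, 100, 250]],
         "bbox": [100, 100, 200, 150], "area": 30000},
        {"id": 2, "image_id": 0, "category_id": 3, "iscrowd": 0,
         "segmentation": [[250, 200, 350, 200, 350, 300, 250, 300]],
         "bbox": [250, 200, 100, 100], "area": 10000},
    ],
}

# Same name-to-id lookup as in the processing cell
category_mapping = {cat["name"]: cat["id"] for cat in coco_example["categories"]}
missing = [name for name in ["pv", "pv_occluded", "nonpv_facade"] if name not in category_mapping]

# Annotations of one class, as selected per image by get_mask
pv_annotations = [a for a in coco_example["annotations"]
                  if a["image_id"] == 0 and a["category_id"] == category_mapping["pv"]]
```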

In [None]:
'''
GLOBAL SETUP
'''
from pathlib import Path

try:
    from google.colab import drive
except ImportError:
    raise EnvironmentError("Google Colab environment not detected. Please run this notebook in Google Colab to ensure all required workflows are functional.") from None

drive.mount('/content/drive', force_remount=False)
repo_path = Path("/content/drive/MyDrive/PVFINDER")

# Expected layout: <base_path>/{train,valid,test}/ with the images and _annotations.coco.json
base_path = repo_path / "files" / "02_datasets" / "dataset" / "02_stratAug"

In [None]:
#@title installation & imports

import sys
import os
import json
import pickle
import distutils.core

# Work around Colab's locale setting, which can make shell (!) commands fail
import locale
locale.getpreferredencoding = lambda: "UTF-8"

# Check if a GPU is available
!nvidia-smi

# Detectron2 (needs download on Google Colab)
!git clone 'https://github.com/facebookresearch/detectron2'
dist = distutils.core.run_setup("./detectron2/setup.py")
!python -m pip install {' '.join([f"'{x}'" for x in dist.install_requires])}
sys.path.insert(0, os.path.abspath('./detectron2'))

# Detectron2: model zoo, training/inference engine and checkpoints
import detectron2
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.engine import DefaultTrainer
from detectron2.engine.hooks import HookBase
from detectron2.checkpoint import DetectionCheckpointer

# Detectron2: evaluation and configuration
from detectron2.evaluation import inference_context
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.config import get_cfg

# Detectron2: data handling, visualization and logging
from detectron2.utils.visualizer import Visualizer, ColorMode
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.data import DatasetMapper, build_detection_test_loader
from detectron2.data.datasets import register_coco_instances
from detectron2.utils.events import EventStorage
from detectron2.utils.logger import setup_logger
from detectron2.utils.logger import log_every_n_seconds
import detectron2.utils.comm as comm
from detectron2 import structures
setup_logger()

# Logging
import time
import datetime
import logging
from tqdm import tqdm

# Required libraries
import csv
import numpy as np
import pandas as pd
import math, random
import matplotlib.pyplot as plt
from matplotlib import cm
import PIL
from PIL import Image
import cv2
from pycocotools import mask as mask_utils
import gc
import warnings

# PyTorch (pre-installed on Google Colab)
import torch
from torch.utils.data import Dataset, DataLoader
!nvcc --version
TORCH_VERSION = torch.__version__
print("torch: ", TORCH_VERSION)
print("detectron2:", detectron2.__version__)

In [None]:
#@title CREATE CASE 2) keep only visible PV: remove occluded and non-PV areas from the PV masks
import os
import json
import numpy as np
import cv2
import matplotlib.pyplot as plt
from pycocotools import mask as mask_utils
from PIL import Image
from tqdm.notebook import tqdm

# Create a binary mask for one class of one image
def get_mask(annotations, img_id, height, width, category_id):
    """Generates a binary (H, W) uint8 mask for a given image and category."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for ann in annotations:
        if ann['image_id'] == img_id and ann['category_id'] == category_id:
            # frPyObjects expects (height, width); polygon lists yield one RLE per polygon
            rles = mask_utils.frPyObjects(ann['segmentation'], height, width)
            decoded_mask = mask_utils.decode(rles)
            # Multi-polygon annotations decode to (H, W, N): merge them into one layer
            if decoded_mask.ndim == 3:
                decoded_mask = np.max(decoded_mask, axis=-1)
            mask |= decoded_mask
    return mask

for split in ["train", "valid", "test"]:
    json_path = os.path.join(base_path, split, "_annotations.coco.json")
    image_sizes_path = os.path.join(base_path, split, "image_sizes.json")
    output_path = os.path.join(base_path, split, "_annotations_case2.coco.json")

    # Load the COCO JSON data
    with open(json_path, 'r') as f:
        coco_data = json.load(f)

    image_sizes = {}
    if os.path.exists(image_sizes_path):
        try:
            with open(image_sizes_path, 'r') as f:
                image_sizes = json.load(f)
        except json.JSONDecodeError:
            print(f"WARNING: Could not read {image_sizes_path}. Resetting image sizes.")
            image_sizes = {}

    if not image_sizes:
        progress_bar = tqdm(total=len(coco_data["images"]), desc="Reading image sizes...", dynamic_ncols=True)
        for img in coco_data['images']:
            img_path = os.path.join(base_path, split, img['file_name'])
            if os.path.exists(img_path):
                with Image.open(img_path) as image:
                    width, height = image.size
                    image_sizes[img['file_name']] = {"width": width, "height": height}
            else:
                print(f"ERROR: Image file {img_path} not found!")
            progress_bar.update(1)

        with open(image_sizes_path, 'w') as f:
            json.dump(image_sizes, f, indent=2)
        progress_bar.close()
    else:
        print(f">>> Read {image_sizes_path}\n{len(image_sizes)} items.")

    for img in coco_data["images"]:
        width, height = image_sizes[img["file_name"]]["width"], image_sizes[img["file_name"]]["height"]
        if img['width'] != width or img['height'] != height:
            print(f"WARNING: Size mismatch in {img['file_name']} (COCO: {img['width']}x{img['height']}, Actual: {width}x{height})")
            img['width'], img['height'] = width, height  # Correct the values


    # Get COCO class IDs dynamically
    category_mapping = {cat['name']: cat['id'] for cat in coco_data['categories']}
    PV_ID = category_mapping.get("pv", None)
    PV_OCCLUDED_ID = category_mapping.get("pv_occluded", None)
    NONPV_FACADE_ID = category_mapping.get("nonpv_facade", None)

    if None in [PV_ID, PV_OCCLUDED_ID, NONPV_FACADE_ID]:
        raise ValueError("One or more required categories are missing from COCO JSON")

    new_annotations = []
    ann_id = 1  # start at 1: pycocotools' COCOeval stores matched ground-truth ids and treats 0 as unmatched

    progress_bar = tqdm(total=len(coco_data["images"]), desc="Processing masks...", dynamic_ncols=True)
    for img in coco_data['images']:
        img_id = img["id"]
        width, height = image_sizes[img['file_name']]["width"], image_sizes[img['file_name']]["height"]
        img_path = os.path.join(base_path, split, img['file_name'])

        pv_mask = get_mask(coco_data['annotations'], img_id, height, width, PV_ID)
        occlusion_mask = get_mask(coco_data['annotations'], img_id, height, width, PV_OCCLUDED_ID)
        nonpv_mask = get_mask(coco_data['annotations'], img_id, height, width, NONPV_FACADE_ID)

        # Remove occluded PV pixels and non-PV facade patches from the PV mask.
        # Boolean indexing is used instead of uint8 subtraction, which would wrap around (0 - 1 = 255).
        shared_occlusion = np.logical_and(pv_mask, occlusion_mask).astype(np.uint8)
        cleaned_pv_mask = pv_mask.copy()
        cleaned_pv_mask[shared_occlusion == 1] = 0  # Remove only the overlapping occlusions
        cleaned_pv_mask[nonpv_mask == 1] = 0  # Remove NonPV areas

        # Optional (disabled) morphological clean-up; uncomment to use
        # kernel = np.ones((3, 3), np.uint8)
        # # Opening removes small noise and thin lines
        # cleaned_pv_mask = cv2.morphologyEx(cleaned_pv_mask, cv2.MORPH_OPEN, kernel)
        # # Remove connected components below a minimum area (adjust min_area as needed)
        # num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(cleaned_pv_mask, connectivity=8)
        # min_area = 20
        # for i in range(1, num_labels):  # Ignore background (0)
        #     if stats[i, cv2.CC_STAT_AREA] < min_area:
        #         cleaned_pv_mask[labels == i] = 0
        # # Closing fills gaps left by removing thin lines
        # cleaned_pv_mask = cv2.morphologyEx(cleaned_pv_mask, cv2.MORPH_CLOSE, kernel)
        # # Additional thin-structure removal: detect thin lines with a Sobel filter and zero them out
        # # (boolean indexing avoids the uint8 wrap-around of cleaned_pv_mask - thin_lines)
        # edges = cv2.Sobel(cleaned_pv_mask, cv2.CV_8U, 1, 1, ksize=3)
        # thin_lines = (edges > 0).astype(np.uint8)
        # cleaned_pv_mask[thin_lines == 1] = 0

        # Visualization
        # fig, axes = plt.subplots(1, 5, figsize=(15, 5))
        # original_image = np.array(Image.open(img_path).convert("RGB")) if os.path.exists(img_path) else np.zeros((height, width, 3), dtype=np.uint8)

        # axes[0].imshow(original_image)
        # axes[0].set_title("Original Image")
        # axes[1].imshow(pv_mask, cmap="Reds", alpha=0.7)
        # axes[1].set_title("PV Mask")
        # axes[2].imshow(occlusion_mask, cmap="Blues", alpha=0.7)
        # axes[2].set_title("PV Occluded")
        # axes[3].imshow(nonpv_mask, cmap="Greens", alpha=0.7)
        # axes[3].set_title("NonPV Facade")
        # axes[4].imshow(cleaned_pv_mask, cmap="gray")
        # axes[4].set_title("Final PV Mask")

        # for ax in axes:
        #     ax.axis("off")
        # plt.show()

        # Find connected components to separate PV regions
        num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(cleaned_pv_mask.astype(np.uint8), connectivity=8)

        for i in range(1, num_labels):  # Ignore the background label (0)
            new_mask = (labels == i).astype(np.uint8)
            bbox_width = int(stats[i, cv2.CC_STAT_WIDTH])
            bbox_height = int(stats[i, cv2.CC_STAT_HEIGHT])
            if new_mask.sum() > 0 and bbox_width >= 10 and bbox_height >= 10:  # Filter small areas
                rle = mask_utils.encode(np.asfortranarray(new_mask))
                rle['counts'] = rle['counts'].decode('utf-8')  # Convert bytes to string for JSON compatibility
                new_annotations.append({
                    'id': ann_id,
                    'image_id': img_id,
                    'category_id': PV_ID,
                    'segmentation': rle,
                    'area': int(stats[i, cv2.CC_STAT_AREA]),
                    'bbox': [
                        int(stats[i, cv2.CC_STAT_LEFT]),
                        int(stats[i, cv2.CC_STAT_TOP]),
                        bbox_width,
                        bbox_height
                    ],
                    'iscrowd': 0
                })
                ann_id += 1
        progress_bar.update(1)
    progress_bar.close()

    # Update COCO JSON
    # coco_data["images"] is kept as is: image sizes were already corrected in place above
    coco_data['annotations'] = new_annotations
    coco_data['categories'] = [cat for cat in coco_data['categories'] if cat['id'] == PV_ID]

    with open(output_path, 'w') as f:
        json.dump(coco_data, f, indent=2)

    print(f"Processed and saved: {output_path}")


Reading image sizes...:   0%|          | 0/825 [00:00<?, ?it/s]

Processing masks...:   0%|          | 0/825 [00:00<?, ?it/s]

Processed and saved: /content/drive/MyDrive/DLforPVFacades/paper/dataset/7_dataset_final_PVandNonPV_NoAug_Unstratified/train/_annotations_case2.coco.json


Reading image sizes...:   0%|          | 0/104 [00:00<?, ?it/s]

Processing masks...:   0%|          | 0/104 [00:00<?, ?it/s]

Processed and saved: /content/drive/MyDrive/DLforPVFacades/paper/dataset/7_dataset_final_PVandNonPV_NoAug_Unstratified/valid/_annotations_case2.coco.json


Reading image sizes...:   0%|          | 0/104 [00:00<?, ?it/s]

Processing masks...:   0%|          | 0/104 [00:00<?, ?it/s]

Processed and saved: /content/drive/MyDrive/DLforPVFacades/paper/dataset/7_dataset_final_PVandNonPV_NoAug_Unstratified/test/_annotations_case2.coco.json
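The cleaning rule in the processing cell above (PV minus its overlap with `pv_occluded`, minus `nonpv_facade`) can be checked on toy arrays with NumPy alone. The sketch also shows why a subtraction-based variant such as `np.clip(pv_mask - shared_occlusion - nonpv_mask, 0, 1)` is unsafe for `uint8` masks: wherever a `nonpv_facade` pixel lies outside the PV area, 0 - 1 wraps around to 255, and the clip turns it into a spurious PV pixel. The array sizes and regions are made up.

```python
import numpy as np

# Toy 6x8 masks (uint8, like the masks built by get_mask)
pv_mask = np.zeros((6, 8), dtype=np.uint8)
pv_mask[1:5, 1:7] = 1                      # PV area (24 pixels)
occlusion_mask = np.zeros_like(pv_mask)
occlusion_mask[3:6, 5:8] = 1               # pv_occluded region overlapping 4 PV pixels
nonpv_mask = np.zeros_like(pv_mask)
nonpv_mask[2, 2:4] = 1                     # non-PV hole inside the PV area (2 pixels)
nonpv_mask[0, 0:3] = 1                     # non-PV facade outside the PV area

# Cleaning as in the processing cell: zero out occluded and non-PV pixels
shared_occlusion = np.logical_and(pv_mask, occlusion_mask).astype(np.uint8)
cleaned = pv_mask.copy()
cleaned[shared_occlusion == 1] = 0
cleaned[nonpv_mask == 1] = 0

# Unsafe variant: uint8 subtraction wraps around (0 - 1 = 255) before the clip
naive = np.clip(pv_mask - shared_occlusion - nonpv_mask, 0, 1)

print(int(cleaned.sum()), int(naive.sum()))  # 18 21: naive adds 3 spurious PV pixels in row 0
```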


### Encoding Masks for Occlusion

In [None]:
#@title def functions - just run this

def make_next(models_dir, d2model, d2epochs, d2lr, d2bs, d2wd, make=True) -> str:
    # Next model number = highest existing numeric prefix + 1 (hidden files are ignored)
    models_dir_models = [item for item in os.listdir(models_dir) if item[0] != "."]
    if len(models_dir_models) != 0:
        models = sorted(models_dir_models, key=lambda folder: int(folder.split('_')[0]))
        model_num = int(models[-1].split("_")[0]) + 1
    else:
        model_num = 0

    # Folder name: zero-padded number, architecture and main hyperparameters
    model_arch = d2model.split("/")[1]
    model_name = f"{model_num:03d}_{model_arch}_{d2epochs}epochs_{d2lr}lr_{d2bs}bs_{d2wd}wd"

    model_dir = os.path.join(models_dir, model_name)
    if make:
        os.makedirs(model_dir, exist_ok=True)
        print(f">>> created model folder at\n{model_dir}")
    else:
        print(f">>> if make == True, the folder will be created at\n{model_dir}")

    return model_dir

def model_name_from_num(model_num):
    # Find the folder in the global trained_models_folder whose name starts with the zero-padded number
    model_name = None
    for folder in os.listdir(trained_models_folder):
        if folder.split("_")[0] == f"{model_num:03d}":
            model_name = folder
    if model_name is None:
        raise FileNotFoundError(f"Model with number {model_num} not found in\n{trained_models_folder}")

    return model_name

class LossEvalHook(HookBase):
    def __init__(self, eval_period, model, data_loader):
        self._model = model
        self._period = eval_period
        self._data_loader = data_loader

    def _do_loss_eval(self):
        # Copying inference_on_dataset from evaluator.py
        total = len(self._data_loader)
        num_warmup = min(5, total - 1)

        start_time = time.perf_counter()
        total_compute_time = 0
        losses = []
        for idx, inputs in enumerate(self._data_loader):
            if idx == num_warmup:
                start_time = time.perf_counter()
                total_compute_time = 0
            start_compute_time = time.perf_counter()
            # Time the forward pass itself (previously the timer wrapped only the CUDA sync)
            loss_batch = self._get_loss(inputs)
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            total_compute_time += time.perf_counter() - start_compute_time
            iters_after_start = idx + 1 - num_warmup * int(idx >= num_warmup)
            seconds_per_img = total_compute_time / iters_after_start
            if idx >= num_warmup * 2 or seconds_per_img > 5:
                total_seconds_per_img = (time.perf_counter() - start_time) / iters_after_start
                eta = datetime.timedelta(seconds=int(total_seconds_per_img * (total - idx - 1)))
                log_every_n_seconds(
                    logging.INFO,
                    "Loss on Validation  done {}/{}. {:.4f} s / img. ETA={}".format(
                        idx + 1, total, seconds_per_img, str(eta)
                    ),
                    n=5,
                )
            losses.append(loss_batch)
        mean_loss = np.mean(losses)
        self.trainer.storage.put_scalar('validation_loss', mean_loss)
        comm.synchronize()

        return losses

    def _get_loss(self, data):
        # Same loss computation as in the training loop (the model is in training mode,
        # so it returns a dict of losses); no_grad avoids building an autograd graph
        with torch.no_grad():
            metrics_dict = self._model(data)
        metrics_dict = {
            k: v.detach().cpu().item() if isinstance(v, torch.Tensor) else float(v)
            for k, v in metrics_dict.items()
        }
        total_losses_reduced = sum(loss for loss in metrics_dict.values())
        return total_losses_reduced

    def after_step(self):
        next_iter = self.trainer.iter + 1
        is_starting = next_iter == 5
        is_final = next_iter == self.trainer.max_iter
        if is_starting or is_final or (self._period > 0 and next_iter % self._period == 0):
            self._do_loss_eval()

class MyTrainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        return COCOEvaluator(dataset_name, cfg, True, output_folder)

    def build_hooks(self):
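        hooks = super().build_hooks()
        # Add the validation-loss hook before the last default hook (the writer),
        # so the validation loss is written out together with the other metrics
        hooks.insert(-1, LossEvalHook(
            self.cfg.TEST.EVAL_PERIOD,
            self.model,
            build_detection_test_loader(
                self.cfg,
                self.cfg.DATASETS.TEST[0],
                DatasetMapper(self.cfg, True)  # is_train=True keeps the annotations needed for the loss
            )
        ))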
        return hooks

class MeanStdDataset(Dataset):
    def __init__(self, data_dir, pv_only=False, extension='.jpg', transform=None):
        self.data_dir = data_dir
        self.image_files = [f for f in os.listdir(data_dir) if f.endswith(extension)]
        if pv_only:
            # Keep only files tagged "_gsv_" or "_web_" in their names
            self.image_files = [f for f in self.image_files if "_gsv_" in f or "_web_" in f]
        self.transform = transform

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, index):
        # Open the image as RGB (so every image has 3 channels, including grayscale or RGBA files),
        # convert to a float32 numpy array, then to a torch tensor of shape (H, W, 3)
        path = os.path.join(self.data_dir, self.image_files[index])
        image = torch.from_numpy(np.array(PIL.Image.open(path).convert("RGB"), dtype=np.float32))
        if self.transform:
            image = self.transform(image)
        return image

def compute_mean_std(dataloader):
    '''
    Compute the per-channel mean and standard deviation for images that may have different sizes.
    This function iterates over images (even if in batches) and computes the running sum, squared sum and total pixel count.
    '''
    total_sum = None
    total_sq_sum = None
    total_pixels = 0

    for batch in tqdm(dataloader):
        # Each batch is a list of images (they are not stacked since their sizes vary)
        for image in batch:
            # image shape: (H, W, C) or (H, W) - but we've handled grayscale in __getitem__
            # Compute per-channel sum and squared sum over height and width.
            # If image is (H, W, C), then summing over dimensions 0 and 1 gives a vector of length C.
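            # Accumulate in float64 to keep E[X^2] - E[X]^2 numerically stable over many pixels
            image = image.double()
            img_sum = torch.sum(image, dim=[0, 1])
            img_sq_sum = torch.sum(image ** 2, dim=[0, 1])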
            # Total number of pixels per channel in this image (same for each channel)
            num_pixels = image.shape[0] * image.shape[1]

            if total_sum is None:
                total_sum = img_sum
                total_sq_sum = img_sq_sum
            else:
                total_sum += img_sum
                total_sq_sum += img_sq_sum

            total_pixels += num_pixels

    # The mean is computed by dividing the total per-channel sum by the total number of pixels.
    mean = (total_sum / total_pixels)
    # Standard deviation: sqrt(E[X^2] - (E[X])^2)
    std = torch.sqrt(total_sq_sum / total_pixels - mean ** 2)
    return mean, std

def custom_collate(batch):
    return batch

warnings.filterwarnings("ignore", message="torch.meshgrid: in an upcoming release")
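
`compute_mean_std` accumulates per-channel sums and squared sums over images of different sizes and then applies std = sqrt(E[X^2] - (E[X])^2). A quick NumPy check on synthetic images (not the PV dataset) shows that this streaming formula matches the mean and population standard deviation of all pixels pooled together. The helper name `streaming_mean_std` is hypothetical; accumulating in float64, as done here, tends to keep the subtraction more robust than float32 sums over hundreds of full-size images.

```python
import numpy as np

def streaming_mean_std(images):
    """Per-channel mean/std from running sums (NumPy analogue of compute_mean_std)."""
    total_sum = np.zeros(3, dtype=np.float64)
    total_sq_sum = np.zeros(3, dtype=np.float64)
    total_pixels = 0
    for image in images:                                      # image: (H, W, 3), sizes may differ
        pixels = image.reshape(-1, 3).astype(np.float64)
        total_sum += pixels.sum(axis=0)
        total_sq_sum += (pixels ** 2).sum(axis=0)
        total_pixels += pixels.shape[0]
    mean = total_sum / total_pixels
    std = np.sqrt(total_sq_sum / total_pixels - mean ** 2)    # sqrt(E[X^2] - E[X]^2)
    return mean, std

# Synthetic RGB images of different sizes
rng = np.random.default_rng(1)
images = [rng.integers(0, 256, size=(h, w, 3)).astype(np.float32)
          for h, w in [(40, 60), (75, 30), (20, 90)]]
mean, std = streaming_mean_std(images)

# Reference: pool all pixels and use NumPy's population statistics (ddof=0)
pooled = np.concatenate([img.reshape(-1, 3) for img in images]).astype(np.float64)
```

Note that images contribute in proportion to their pixel count, so larger images weigh more in the dataset statistics.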

### Dataset Mean and Standard Deviation

In [None]:
#@title calculate dataset mean std - and the resulting numbers

calculated_values = {
    5: {
        "mean": [122.1287, 129.9507, 136.4666],
        "std": [72.9773, 72.1729, 77.3321]
    }
}

# The above values are calculated using the method below.
# If you want to calculate them again (this might take a while),
# toggle the boolean below:
calc_mean_std = False

if calc_mean_std:
    # calculate mean and std for the selected dataset
    dataset = MeanStdDataset(os.path.join(dataset_dir, "train"), pv_only= pv_only)
    print(f"{len(dataset)} images (the progress bar counts batches of 4 images)")
    train_loader = DataLoader(dataset, batch_size=4, shuffle=False,
                                num_workers=2, collate_fn=custom_collate, pin_memory=True)
    total_mean, total_std = compute_mean_std(train_loader)
    print(f'\nmean (RGB):', total_mean)
    print('std (RGB): ', total_std)
else:
    # use pre-calculated values based on the selected dataset
    total_mean = calculated_values[5]["mean"]
    total_std = calculated_values[5]["std"]
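These statistics are computed from images loaded with PIL, i.e. in RGB channel order. If they are passed to Detectron2 as `cfg.MODEL.PIXEL_MEAN` / `cfg.MODEL.PIXEL_STD`, the channel order has to match `cfg.INPUT.FORMAT`, which defaults to `"BGR"` (Detectron2's default `PIXEL_MEAN` is likewise given in BGR order). The helper below is a hypothetical sketch; `to_input_format` is not part of Detectron2.

```python
def to_input_format(rgb_values, input_format="BGR"):
    """Reorder per-channel RGB statistics to match Detectron2's cfg.INPUT.FORMAT.

    Hypothetical helper: accepts lists or 1-D tensors/arrays of length 3.
    """
    values = [float(v) for v in rgb_values]
    if input_format == "BGR":
        return values[::-1]
    if input_format == "RGB":
        return values
    raise ValueError(f"Unsupported input format: {input_format}")

# Usage with the pre-calculated RGB statistics above
pixel_mean = to_input_format([122.1287, 129.9507, 136.4666])
pixel_std = to_input_format([72.9773, 72.1729, 77.3321])
# Possible use in a training config (not part of this notebook section):
# cfg.MODEL.PIXEL_MEAN = to_input_format(total_mean, cfg.INPUT.FORMAT)
# cfg.MODEL.PIXEL_STD = to_input_format(total_std, cfg.INPUT.FORMAT)
```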