# Automatic Labeling for Coastline Detection

This notebook demonstrates how to automatically classify the pixels of PNG images as water or mainland. K-Means clustering generates the initial labels, and an autoencoder learns image features that can be used for semi-supervised refinement. This approach avoids manually labeling the dataset.

## Steps
1. Preprocess Images
2. Apply K-Means Clustering
3. Post-Process the Clustering Results
4. Use Autoencoder for Semi-Supervised Classification

## Preprocessing Images
We will load the images, convert them from OpenCV's BGR channel order to RGB, resize them to a fixed size, and normalize the pixel values.
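The loader defined below resizes images but returns raw 8-bit values (0 to 255). The classification generators rescale pixels via `ImageDataGenerator(rescale=1./255)`, but arrays fed directly to a model need the same scaling. The following is a minimal sketch of that normalization step. `normalize_image` is a hypothetical helper, not part of the original notebook, and it assumes 8-bit RGB input.

```python
import numpy as np

def normalize_image(image: np.ndarray) -> np.ndarray:
    """Scale a uint8 RGB image (values 0-255) to float32 values in [0, 1].

    Hypothetical helper: assumes 8-bit input such as the output of cv2.imread.
    """
    if image.dtype != np.uint8:
        raise ValueError(f"expected uint8 input, got {image.dtype}")
    return image.astype(np.float32) / 255.0

# Usage with a synthetic 2x2 RGB image standing in for a loaded PNG
sample = np.array([[[0, 128, 255], [255, 255, 255]],
                   [[0, 0, 0], [64, 64, 64]]], dtype=np.uint8)
normalized = normalize_image(sample)
print(normalized.dtype, normalized.min(), normalized.max())
```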

In [None]:
import os
import json
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Input, GlobalAveragePooling2D, UpSampling2D
from tensorflow.keras.applications import VGG16, ResNet50, InceptionV3
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, TensorBoard, ModelCheckpoint
from tensorflow.keras.metrics import AUC, Precision, Recall
from sklearn.metrics import confusion_matrix, classification_report, f1_score, cohen_kappa_score, matthews_corrcoef
import seaborn as sns
from PIL import Image
from tqdm import tqdm
import cv2
from sklearn.cluster import KMeans

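# Set stdin encoding to UTF-8 (works around a Windows "charmap" codec error with undefined characters).
# Guarded because replacement streams in some environments may lack reconfigure().
import sys
if hasattr(sys.stdin, 'reconfigure'):
    sys.stdin.reconfigure(encoding='utf-8')
#sys.stdout.reconfigure(encoding='utf-8')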

In [1]:

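# Load an image, convert OpenCV's BGR channel order to RGB and resize it to 256x256
def load_and_preprocess_image(image_path):
    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on unreadable files
        raise FileNotFoundError(f"Could not read image: {image_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (256, 256))
    return image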


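# Verify images and delete any file PIL cannot open or validate
# (note: non-image files in the directory tree are deleted as well)
def verify_images(directory):
    for root, _, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(root, file)
            try:
                # The context manager closes the file before os.remove (required on Windows)
                with Image.open(file_path) as img:
                    img.verify()  # Check file integrity without decoding the full image
            except (IOError, SyntaxError):
                print(f"Deleting corrupted image: {file_path}")
                os.remove(file_path)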

# Normalization (rescaling pixels to [0, 1]) is applied by ImageDataGenerator(rescale=1./255) in the next cell

In [None]:
# Generator settings and dataset paths
# (150x150 here; the K-Means and autoencoder steps below use 256x256 images)
img_height, img_width = 150, 150
batch_size = 32
epochs = 50
start_epoch = 0
base_dir = 'classification'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

# Verify and clean images
verify_images(train_dir)
verify_images(validation_dir)

# Data preprocessing and augmentation by using ImageDataGenerator
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

validation_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical'
)

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical'
)


## Applying K-Means Clustering
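We will reshape each image into an array of pixel values, one row of three RGB values per pixel, and apply the K-Means algorithm to group the pixels into two clusters, which we interpret as water and mainland. K-Means numbers its clusters arbitrarily, so cluster 0 is not guaranteed to be water.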
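To make the clustering step concrete, the following is a plain NumPy sketch of Lloyd's algorithm, the iteration K-Means performs on those pixel rows. It is illustrative only: the notebook itself calls `sklearn.cluster.KMeans`, which additionally uses k-means++ initialization and several restarts. `kmeans_pixels` is a hypothetical name, and initializing from distinct colors is a simplification chosen here so that two centroids never start out identical.

```python
import numpy as np

def kmeans_pixels(image, n_clusters=2, n_iter=20, seed=0):
    """Cluster the pixels of an (H, W, 3) image with Lloyd's K-Means; returns (H, W) cluster ids."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float32)   # (H*W, 3): one RGB row per pixel
    colours = np.unique(pixels, axis=0)                # distinct colours, to avoid duplicate centroids
    if len(colours) < n_clusters:
        raise ValueError("image has fewer distinct colours than clusters")
    rng = np.random.default_rng(seed)
    centroids = colours[rng.choice(len(colours), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each pixel with its nearest centroid (squared Euclidean distance)
        dists = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its pixels (keep it if the cluster is empty)
        new_centroids = np.stack([
            pixels[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels.reshape(h, w)

# Usage on a synthetic 4x4 image: left half blue "water", right half sandy "land"
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :2] = (0, 0, 255)
img[:, 2:] = (200, 180, 120)
labels = kmeans_pixels(img)
print(labels)
```

Which of the two ids ends up on the water side depends on the random initialization, which is exactly why the cluster ids need to be interpreted after clustering.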

In [2]:

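# Cluster the pixels of an (H, W, 3) RGB image; returns an (H, W) map of cluster ids
def apply_kmeans(image, n_clusters=2):
    pixel_values = image.reshape((-1, 3))  # one row per pixel: (H*W, 3)
    pixel_values = np.float32(pixel_values)
    
    # n_init is set explicitly because its default changed across scikit-learn versions
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    labels = kmeans.fit_predict(pixel_values)
    segmented_image = labels.reshape(image.shape[:2])
    return segmented_image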



## Post-Processing
After clustering, we apply a morphological closing (a dilation followed by an erosion) to fill small holes inside regions and smooth their boundaries.
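A closing fills holes smaller than the kernel but does not remove isolated specks of the wrong class. That job belongs to an opening (an erosion followed by a dilation). The following NumPy sketch shows the effect of applying an opening and then a closing to a binary mask. It is for illustration only. With OpenCV, the same sequence would be `cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)` followed by `cv2.morphologyEx(..., cv2.MORPH_CLOSE, kernel)`. The helper names are hypothetical, and the border handling (padding so that the image edge neither erodes nor dilates the mask) is an assumption of this sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def erode(mask, k=5):
    """Binary erosion with a k x k square kernel (k odd)."""
    pad = k // 2
    # Pad with True so the image border does not erode the mask
    padded = np.pad(mask, pad, mode="constant", constant_values=True)
    return sliding_window_view(padded, (k, k)).all(axis=(-2, -1))

def dilate(mask, k=5):
    """Binary dilation with a k x k square kernel (k odd)."""
    pad = k // 2
    # Pad with False so the image border does not dilate the mask
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    return sliding_window_view(padded, (k, k)).any(axis=(-2, -1))

def open_then_close(mask, k=5):
    mask = mask.astype(bool)
    opened = dilate(erode(mask, k), k)   # opening: removes specks smaller than the kernel
    return erode(dilate(opened, k), k)   # closing: fills holes smaller than the kernel

# Usage: a 16x16 land region with a one-pixel hole, plus an isolated misclassified pixel
mask = np.zeros((30, 30), dtype=bool)
mask[5:21, 5:21] = True
mask[12, 12] = False   # small hole inside the region
mask[27, 27] = True    # isolated speck
cleaned = open_then_close(mask, k=5)
print(cleaned[12, 12], cleaned[27, 27], cleaned.sum())
```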

In [3]:
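# Post-process the segmented image with a 5x5 morphological closing
def post_process(segmented_image):
    kernel = np.ones((5, 5), np.uint8)
    # KMeans returns integer labels (int32), a depth cv2.morphologyEx does not support; convert to uint8
    mask = segmented_image.astype(np.uint8)
    processed_image = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return processed_image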

## Example of K-Means Clustering and Post-Processing
Load an image, apply K-Means clustering, and then post-process the results.
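Because K-Means numbers its clusters arbitrarily, cluster 0 may be water in one image and mainland in the next. This makes the raw label maps inconsistent across the dataset. The following is one possible way to fix the orientation, offered as a hypothetical heuristic rather than part of the original notebook: call the cluster whose pixels are, on average, most blue relative to red "water". This assumption can fail for dark, turbid, or shadowed water, so it should be checked on real images.

```python
import numpy as np

def water_mask_from_clusters(image, labels):
    """Return a boolean mask that is True for the cluster assumed to be water.

    Heuristic (an assumption, not from the original notebook): the water cluster
    is the one with the highest mean (blue - red) value. `image` must be RGB.
    """
    rgb = image.astype(np.float32)
    blueness = rgb[..., 2] - rgb[..., 0]          # channel 2 is blue, channel 0 is red in RGB
    cluster_ids = np.unique(labels)
    scores = [blueness[labels == k].mean() for k in cluster_ids]
    water_id = cluster_ids[int(np.argmax(scores))]
    return labels == water_id

# Usage: the mask is the same whichever id K-Means happened to give the water pixels
img = np.zeros((2, 4, 3), dtype=np.uint8)
img[:, :2] = (20, 80, 200)     # bluish water
img[:, 2:] = (210, 190, 140)   # sandy land
labels = np.zeros((2, 4), dtype=np.int32)
labels[:, :2] = 1
print(water_mask_from_clusters(img, labels))
print(water_mask_from_clusters(img, 1 - labels))
```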

In [None]:
import random

# Pick a random image from the training set's 'both' folder
both_dir = os.path.join(train_dir, 'both')
image_files = [f for f in os.listdir(both_dir) if os.path.isfile(os.path.join(both_dir, f))]
random_image = random.choice(image_files)
image_path = os.path.join(both_dir, random_image)

image = load_and_preprocess_image(image_path)

# Apply K-Means
segmented_image = apply_kmeans(image)

# Post-Process
processed_image = post_process(segmented_image)

# Display results
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title('Original Image')
plt.imshow(image)
plt.subplot(1, 2, 2)
plt.title('Segmented Image')
plt.imshow(processed_image, cmap='gray')
plt.show()

## Using Autoencoder for Semi-Supervised Classification
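Next, we train a convolutional autoencoder to further refine the classification. An autoencoder is trained without labels to reconstruct its input. In doing so, its encoder learns a compressed feature representation of the images, and these features, rather than raw RGB values, can then be used to separate water from mainland.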
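It helps to know how much the model defined in the next cell actually compresses its input. Each `MaxPooling2D((2, 2), padding='same')` layer reduces the spatial size to `ceil(size / 2)`, and the last encoder layer has 32 filters. The short calculation below (plain Python; `pooled_size` is a hypothetical helper) tracks the shapes.

```python
import math

def pooled_size(size, n_pools, pool=2):
    """Spatial size after n MaxPooling2D(pool, padding='same') layers: ceil(size / pool) each time."""
    for _ in range(n_pools):
        size = math.ceil(size / pool)
    return size

input_shape = (256, 256, 3)
side = pooled_size(256, n_pools=2)   # two 2x2 poolings in the encoder
encoded_shape = (side, side, 32)     # last encoder Conv2D has 32 filters
input_values = math.prod(input_shape)
encoded_values = math.prod(encoded_shape)
print(encoded_shape, input_values, encoded_values, encoded_values / input_values)
```

The bottleneck holds about two thirds as many values as the input, so it compresses only mildly. A weak bottleneck may let the network learn something close to an identity mapping. Using fewer filters or an extra pooling stage would force a more compact representation, at the cost of coarser spatial detail.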

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

# Define the autoencoder model (input: 256x256 RGB images scaled to [0, 1])
input_img = Input(shape=(256, 256, 3))

# Encoder: two conv + 2x2 max-pool stages, 256x256x3 -> 64x64x32
x = Conv2D(64, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# Decoder: mirrors the encoder, upsampling 64x64x32 back to 256x256x3
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)  # sigmoid keeps outputs in [0, 1]

autoencoder = Model(input_img, decoded)
# Binary cross-entropy treats each normalized pixel value as a target in [0, 1]
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Summary of the model
autoencoder.summary()

## Training the Autoencoder
We will train the autoencoder using a set of training images.
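The training cell expects `X_train` and `X_test`, which the notebook never defines. The classification generators above cannot be used directly: they yield 150x150 images paired with one-hot class labels, while the autoencoder needs 256x256 images as both input and target. The following sketch builds the two arrays from already-loaded images. `build_autoencoder_arrays`, the 20% test fraction, and the seed are assumptions of this sketch. A typical call might be `build_autoencoder_arrays([load_and_preprocess_image(p) for p in paths])`, where `paths` is a list of image files you collect yourself.

```python
import numpy as np

def build_autoencoder_arrays(images, test_fraction=0.2, seed=42):
    """Stack equally sized uint8 RGB images, scale them to [0, 1] and split them into train/test.

    `images` is any sequence of (H, W, 3) uint8 arrays, e.g. 256x256 outputs of
    load_and_preprocess_image (an assumption of this sketch).
    """
    data = np.stack(images).astype(np.float32) / 255.0   # (N, H, W, 3), values in [0, 1]
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))                    # shuffle before splitting
    n_test = max(1, int(round(len(data) * test_fraction)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return data[train_idx], data[test_idx]

# Usage with ten small synthetic images standing in for real 256x256 ones
images = [np.full((8, 8, 3), i * 10, dtype=np.uint8) for i in range(10)]
X_train, X_test = build_autoencoder_arrays(images)
print(X_train.shape, X_test.shape)
```

Another option is to keep using a generator: Keras' `flow_from_directory` accepts `class_mode='input'`, which yields each batch as both input and target. This is the form an autoencoder needs, and it avoids holding the whole dataset in memory.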

In [None]:
# X_train and X_test must be float arrays of shape (N, 256, 256, 3) with values in [0, 1].
# They are not defined elsewhere in this notebook: build them before running this cell
# (e.g. by stacking load_and_preprocess_image outputs and dividing by 255).

autoencoder.fit(X_train, X_train, epochs=50, batch_size=128, shuffle=True, validation_data=(X_test, X_test))

## Using the Trained Autoencoder for Classification
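Use the trained autoencoder to reconstruct test images and compare them with the originals. This shows what the encoder has learned. By itself, however, it does not assign a water or mainland label to each pixel.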
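One way to turn the learned features into per-pixel labels is to treat the K-Means labels as noisy pseudo-labels and use them to fit a nearest-centroid classifier in the encoder's feature space. This is offered as a hypothetical sketch, not the notebook's method. The encoder could be obtained as `encoder = Model(input_img, encoded)`, reusing the layers defined above. For one image, `feature_map = encoder.predict(image[None] / 255.0)[0]` has shape 64x64x32, and `apply_kmeans(image)` supplies the 256x256 pseudo-labels. The function below only assumes those array shapes. Its downsampling rule (one pixel per block) and the nearest-neighbor upsampling are simplifications.

```python
import numpy as np

def refine_with_encoder_features(feature_map, pseudo_labels):
    """Reclassify pixels with a nearest-centroid rule in the encoder's feature space.

    feature_map:   (h, w, c) encoder output for one image, e.g. 64x64x32
    pseudo_labels: (H, W) integer K-Means labels for the same image, with H = f*h, W = f*w
    Returns an (H, W) label map.
    """
    h, w, c = feature_map.shape
    H, W = pseudo_labels.shape
    f = H // h
    if f == 0 or H != f * h or W != f * w:
        raise ValueError("pseudo_labels must be an integer multiple of the feature map size")
    # Downsample the pseudo-labels to the feature grid (top-left pixel of each f x f block)
    coarse = pseudo_labels[::f, ::f]
    feats = feature_map.reshape(-1, c)
    flat = coarse.reshape(-1)
    classes = np.unique(flat)
    # One centroid per pseudo-class: the mean encoder feature of the cells carrying that label
    centroids = np.stack([feats[flat == k].mean(axis=0) for k in classes])
    # Assign every cell to the class with the nearest centroid (squared Euclidean distance)
    dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    refined = classes[dists.argmin(axis=1)].reshape(h, w)
    # Nearest-neighbour upsampling back to the input resolution
    return np.repeat(np.repeat(refined, f, axis=0), f, axis=1)

# Usage with synthetic data: features separate the halves cleanly, pseudo-labels contain one error
feature_map = np.zeros((4, 4, 2), dtype=np.float32)
feature_map[:, 2:] = 5.0
pseudo = np.zeros((8, 8), dtype=np.int64)
pseudo[:, 4:] = 1
pseudo[0, 4] = 0   # a wrong pseudo-label on a sampled pixel
refined = refine_with_encoder_features(feature_map, pseudo)
print(refined)
```

The design choice here is that the features, not the raw colors, decide the final label. A few wrong K-Means labels only shift the class centroids slightly, so they do not propagate to the output.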

In [None]:
# Reconstruct the test images with the trained autoencoder
decoded_imgs = autoencoder.predict(X_test)

# Display up to 10 originals (top row) and their reconstructions (bottom row)
n = min(10, len(X_test))
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(X_test[i])
    ax.axis('off')
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i])
    ax.axis('off')
plt.show()

## Conclusion
This notebook demonstrates how to use K-Means clustering and an autoencoder to automatically classify pixels as water or mainland. This approach can be extended and refined with more complex models and larger datasets for improved accuracy.