# Monet GAN Kaggle Competition

https://github.com/zkek44/Generate_Monet_GAN

## Problem and Data Description

The goal of this Kaggle competition is to generate Monet-style paintings using GANs (Generative Adversarial Networks). Specifically, I need to build a GAN-based model capable of translating real-world photographs into paintings that resemble the style of Claude Monet.

The dataset includes two domains of 256x256 RGB images:

- monet_jpg (300 images): paintings by Claude Monet, used as the target domain for the style transfer.
- photo_jpg (7,028 images): real-world photographs, used as the source domain to be converted into Monet-style images.

The same sets are also provided as TFRecords (monet_tfrec and photo_tfrec).

In [1]:
import shutil
import os

working_dir = "/kaggle/working"

for item in os.listdir(working_dir):
    path = os.path.join(working_dir, item)
    try:
        if os.path.isfile(path) or os.path.islink(path):
            os.unlink(path)
        elif os.path.isdir(path):
            shutil.rmtree(path)
    except Exception as e:
        print(f"Failed to delete {path}: {e}")

print("✅ /kaggle/working cleaned.")

✅ /kaggle/working cleaned.


## Import Libraries

In [None]:
import tensorflow as tf
from tensorflow.keras import layers
import matplotlib.pyplot as plt
from PIL import Image
import os
import numpy as np
from tqdm.notebook import tqdm
import glob
import shutil

## Load Data

In [None]:
monet_dir = '/kaggle/input/gan-getting-started/monet_jpg'
photo_dir = '/kaggle/input/gan-getting-started/photo_jpg'

In [None]:
IMG_HEIGHT = 256
IMG_WIDTH = 256
BATCH_SIZE = 1
AUTOTUNE = tf.data.AUTOTUNE

In [None]:
def normalize_img(img):
    img = tf.cast(img, tf.float32)
    return (img / 127.5) - 1  # Scale to [-1, 1]

In [None]:
def load_jpg(filename):
    img = tf.io.read_file(filename)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [IMG_HEIGHT, IMG_WIDTH])
    return normalize_img(img)

In [None]:
monet_files = tf.data.Dataset.list_files(monet_dir + '/*.jpg', seed=42)
photo_files = tf.data.Dataset.list_files(photo_dir + '/*.jpg', seed=42)

monet_ds = monet_files.map(load_jpg, num_parallel_calls=AUTOTUNE).cache().shuffle(300).batch(BATCH_SIZE).prefetch(AUTOTUNE)
photo_ds = photo_files.map(load_jpg, num_parallel_calls=AUTOTUNE).cache().shuffle(1000).batch(BATCH_SIZE).prefetch(AUTOTUNE)

## EDA

We can see that we have 300 Monet paintings that will be used to train the model and 7,038 photos to generate Monet-style paintings from.

There are no missing values, and since all of the images are already sized to 256x256, no further data cleaning is needed.
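The image counts can be checked directly against the extracted directories. The following is a minimal sketch using only the standard library; `count_images` is a hypothetical helper that is not part of the original notebook, and it only counts files without opening them.

```python
from pathlib import Path


def count_images(directory, pattern="*.jpg"):
    """Return the number of files in `directory` matching `pattern`.

    Hypothetical helper for a quick sanity check. It does not decode the
    images, so it cannot detect corrupt files or unexpected dimensions.
    """
    return sum(1 for path in Path(directory).glob(pattern) if path.is_file())
```

Calling `count_images(monet_dir)` and `count_images(photo_dir)` with the paths from the Load Data section is a quick way to confirm both totals before training.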

## Model Building

In [None]:
def resnet_block(x, filters):
    init = x
    x = layers.Conv2D(filters, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.add([x, init])
    return x

In [None]:
def build_generator():
    inputs = layers.Input(shape=[IMG_HEIGHT, IMG_WIDTH, 3])
    x = layers.Conv2D(64, 7, padding='same')(inputs)
    x = layers.ReLU()(x)
    x = layers.Conv2D(128, 3, strides=2, padding='same')(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(256, 3, strides=2, padding='same')(x)
    x = layers.ReLU()(x)
    for _ in range(9): x = resnet_block(x, 256)
    x = layers.Conv2DTranspose(128, 3, strides=2, padding='same')(x)
    x = layers.ReLU()(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding='same')(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(3, 7, padding='same', activation='tanh')(x)
    return tf.keras.Model(inputs, x)

In [None]:
def build_discriminator():
    inp = layers.Input(shape=[IMG_HEIGHT, IMG_WIDTH, 3])
    x = layers.Conv2D(64, 4, strides=2, padding='same')(inp)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(128, 4, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(256, 4, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(512, 4, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(1, 4, strides=1, padding='same')(x)
    return tf.keras.Model(inputs=inp, outputs=x)
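
To see what the two networks above do to the spatial resolution, the sketch below traces a 256-pixel side length through their strided layers. It assumes the standard Keras behavior for `padding='same'`: a `Conv2D` produces `ceil(input / stride)` and a `Conv2DTranspose` produces `input * stride`. The helper `trace_spatial_size` and the two op lists are illustrative names, not part of the notebook.

```python
import math


def trace_spatial_size(size, ops):
    """Return the spatial size after each op for 'same'-padded layers.

    `ops` is a list of (kind, stride) pairs, where kind is "conv" for
    Conv2D (output = ceil(input / stride)) or "deconv" for
    Conv2DTranspose (output = input * stride).
    """
    sizes = []
    for kind, stride in ops:
        if kind == "conv":
            size = math.ceil(size / stride)
        elif kind == "deconv":
            size = size * stride
        else:
            raise ValueError(f"unknown op kind: {kind}")
        sizes.append(size)
    return sizes


# Strides copied from build_generator (the residual blocks keep the size).
GENERATOR_OPS = [("conv", 1), ("conv", 2), ("conv", 2),
                 ("deconv", 2), ("deconv", 2), ("conv", 1)]
# Strides copied from build_discriminator.
DISCRIMINATOR_OPS = [("conv", 2), ("conv", 2), ("conv", 2),
                     ("conv", 1), ("conv", 1)]

generator_sizes = trace_spatial_size(256, GENERATOR_OPS)
discriminator_sizes = trace_spatial_size(256, DISCRIMINATOR_OPS)
print(generator_sizes)      # [256, 128, 64, 128, 256, 256]
print(discriminator_sizes)  # [128, 64, 32, 32, 32]
```

Under these assumptions the generator returns an image at the input resolution, while the discriminator returns a 32x32 grid of scores rather than a single number. This is the patch-level design often called PatchGAN in the CycleGAN literature, and it is why the loss functions compare the output against `tf.ones_like` and `tf.zeros_like` tensors.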

## Loss and Optimizer Functions

In [None]:
loss_obj = tf.keras.losses.MeanSquaredError()
LAMBDA_CYCLE = 10
LAMBDA_IDENTITY = 0.5 * LAMBDA_CYCLE

def generator_loss(fake_output):
    return loss_obj(tf.ones_like(fake_output), fake_output)

def discriminator_loss(real_output, fake_output):
    return 0.5 * (loss_obj(tf.ones_like(real_output), real_output) + 
                  loss_obj(tf.zeros_like(fake_output), fake_output))

def cycle_loss(real, cycled):
    return tf.reduce_mean(tf.abs(real - cycled)) * LAMBDA_CYCLE

def identity_loss(real, same):
    return tf.reduce_mean(tf.abs(real - same)) * LAMBDA_IDENTITY

In [None]:
generator_g = build_generator()
generator_f = build_generator()
discriminator_x = build_discriminator()
discriminator_y = build_discriminator()

generator_g_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
generator_f_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_x_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_y_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

## Model Training

In [None]:
@tf.function
def train_step(real_x, real_y):
    with tf.GradientTape(persistent=True) as tape:
        fake_y = generator_g(real_x, training=True)
        cycled_x = generator_f(fake_y, training=True)
        fake_x = generator_f(real_y, training=True)
        cycled_y = generator_g(fake_x, training=True)

        same_x = generator_f(real_x, training=True)
        same_y = generator_g(real_y, training=True)

        disc_real_x = discriminator_x(real_x, training=True)
        disc_real_y = discriminator_y(real_y, training=True)
        disc_fake_x = discriminator_x(fake_x, training=True)
        disc_fake_y = discriminator_y(fake_y, training=True)

        gen_g_loss = generator_loss(disc_fake_y)
        gen_f_loss = generator_loss(disc_fake_x)
        total_cycle_loss = cycle_loss(real_x, cycled_x) + cycle_loss(real_y, cycled_y)
        total_gen_g_loss = gen_g_loss + total_cycle_loss + identity_loss(real_y, same_y)
        total_gen_f_loss = gen_f_loss + total_cycle_loss + identity_loss(real_x, same_x)
        disc_x_loss = discriminator_loss(disc_real_x, disc_fake_x)
        disc_y_loss = discriminator_loss(disc_real_y, disc_fake_y)

    generator_g_optimizer.apply_gradients(zip(tape.gradient(total_gen_g_loss, generator_g.trainable_variables), generator_g.trainable_variables))
    generator_f_optimizer.apply_gradients(zip(tape.gradient(total_gen_f_loss, generator_f.trainable_variables), generator_f.trainable_variables))
    discriminator_x_optimizer.apply_gradients(zip(tape.gradient(disc_x_loss, discriminator_x.trainable_variables), discriminator_x.trainable_variables))
    discriminator_y_optimizer.apply_gradients(zip(tape.gradient(disc_y_loss, discriminator_y.trainable_variables), discriminator_y.trainable_variables))

In [None]:
import time

EPOCHS = 50

for epoch in range(EPOCHS):
    start = time.time()
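    # Pair photos (domain X) with Monet paintings (domain Y) so that
    # generator_g learns photo -> Monet, matching its use at inference time.
    for image_x, image_y in tf.data.Dataset.zip((photo_ds, monet_ds)):
        train_step(image_x, image_y)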
    print(f'Epoch {epoch+1} completed in {time.time()-start:.2f} sec')
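
Because `tf.data.Dataset.zip` stops as soon as the shorter dataset is exhausted, each epoch of the loop above runs for only as many steps as there are Monet batches. The sketch below works through that arithmetic in plain Python; `steps_per_epoch` is a hypothetical helper, and the photo count passed to it is illustrative.

```python
import math


def steps_per_epoch(n_x, n_y, batch_size=1):
    """Number of paired batches produced by zipping two batched datasets.

    Each dataset yields ceil(n / batch_size) batches (no drop_remainder),
    and zip stops when the shorter one runs out.
    """
    return min(math.ceil(n_x / batch_size), math.ceil(n_y / batch_size))


# 300 Monet paintings zipped with several thousand photos, batch size 1.
# Any photo count above 300 gives the same result.
print(steps_per_epoch(300, 7000))  # 300
```

With 50 epochs this amounts to 15,000 training steps, and each epoch sees only 300 of the available photos.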

## Submission

In [None]:
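# Write images outside /kaggle/working so that the zip archive created
# there later does not end up inside the directory being archived.
OUTPUT_DIR = '/kaggle/images'
os.makedirs(OUTPUT_DIR, exist_ok=True)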

In [None]:
def denormalize_img(img_tensor):
    # Convert pixel range from [-1, 1] to [0, 255]
    img = (img_tensor + 1) * 127.5
    img = tf.clip_by_value(img, 0, 255)
    return tf.cast(img, tf.uint8)

def generate_and_save_images(generator, photo_paths, output_dir):
    # enumerate supplies the running index used as the output file name
    for i, path in enumerate(tqdm(photo_paths)):
        image = load_jpg(path)
        image = tf.expand_dims(image, 0)  # Add batch dimension

        prediction = generator(image, training=False)
        prediction = denormalize_img(prediction[0])

        # Save as JPEG
        output_path = os.path.join(output_dir, f"{i}.jpg")
        Image.fromarray(prediction.numpy()).save(output_path)

In [None]:
photo_paths = sorted(glob.glob('/kaggle/input/gan-getting-started/photo_jpg/*.jpg'))[:7000]
generate_and_save_images(generator_g, photo_paths, OUTPUT_DIR)

In [None]:
shutil.make_archive('images', 'zip', OUTPUT_DIR)
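
Before submitting, it can be worth confirming that the archive holds the expected number of images. The following is a minimal sketch using the standard `zipfile` module; both helpers are hypothetical, and the 7,000 to 10,000 range is my understanding of the competition's submission requirement, so it should be checked against the current rules.

```python
import zipfile


def count_zip_images(zip_path, suffix=".jpg"):
    """Count archive members whose name ends with `suffix` (case-insensitive)."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(1 for name in archive.namelist()
                   if name.lower().endswith(suffix))


def is_valid_submission_size(n_images, minimum=7000, maximum=10000):
    """Check the image count against the assumed competition limits."""
    return minimum <= n_images <= maximum
```

For the archive created above, `is_valid_submission_size(count_zip_images('images.zip'))` should return `True` when exactly the 7,000 generated images are present.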

## Results

In [None]:
def show_sample(index):
    img = load_jpg(photo_paths[index])
    generated = generator_g(tf.expand_dims(img, 0), training=False)
    generated = (generated[0] + 1) * 127.5

    fig, axs = plt.subplots(1, 2, figsize=(8, 4))
    axs[0].imshow((img + 1) / 2)
    axs[0].set_title("Original Photo")
    axs[1].imshow(generated.numpy().astype('uint8'))
    axs[1].set_title("Monet-style Output")
    for ax in axs:
        ax.axis('off')
    plt.show()

show_sample(0)

I trained a CycleGAN model for 50 epochs using Monet paintings and real-world photos. The generator learned to map photo images to the Monet style using adversarial, cycle-consistency, and identity losses. The training was slow on CPU but accelerated substantially when moved to Google Colab with GPU support.
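
The three loss terms named above can be written out as follows. The notation is mine rather than the notebook's: $p$ is a photo, $m$ is a Monet painting, $G$ is the photo-to-Monet generator, $F$ is the reverse generator, and $D_M$ is the discriminator on the Monet domain. The norm $\lVert \cdot \rVert_1$ stands for the mean absolute difference over pixels, as computed by `tf.reduce_mean(tf.abs(...))`, and the squared terms correspond to the `MeanSquaredError` objective against targets of one and zero.

```latex
\begin{aligned}
\mathcal{L}_{G} &= \mathbb{E}_{p}\big[(D_M(G(p)) - 1)^2\big]
  + \lambda_{\text{cyc}} \Big( \mathbb{E}_{p}\big[\lVert F(G(p)) - p \rVert_1\big]
  + \mathbb{E}_{m}\big[\lVert G(F(m)) - m \rVert_1\big] \Big)
  + \lambda_{\text{id}}\, \mathbb{E}_{m}\big[\lVert G(m) - m \rVert_1\big] \\
\mathcal{L}_{D_M} &= \tfrac{1}{2} \Big( \mathbb{E}_{m}\big[(D_M(m) - 1)^2\big]
  + \mathbb{E}_{p}\big[D_M(G(p))^2\big] \Big)
\end{aligned}
```

Here $\lambda_{\text{cyc}} = 10$ and $\lambda_{\text{id}} = 5$, matching `LAMBDA_CYCLE` and `LAMBDA_IDENTITY`. The losses for $F$ and for the photo-domain discriminator follow by swapping the roles of the two domains.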

The model successfully captured the Monet style with soft brush strokes, pastel tones, and texture patterns. Generated images showed meaningful transformations while preserving content (e.g., shapes, trees, sky). Identity and cycle-consistency losses helped maintain structure and color integrity.

Hyperparameter Settings:

- Optimizers: Adam (lr=2e-4, β₁=0.5)
- Loss Weights: Cycle = 10, Identity = 5
- Batch Size: 1 (the usual setting for CycleGAN)
- Epochs: 50

What Helped:

- Using ResNet blocks in the generator improved detail preservation.
- Batch normalization stabilized training.
- Normalizing images to [-1, 1] was essential so that the training data matches the range of the generator's tanh output.
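
The [-1, 1] scaling and its inverse can be illustrated on single pixel values. This is a plain-Python sketch of the same arithmetic as `normalize_img` and `denormalize_img`; the function names are hypothetical. One difference is an assumption of the sketch: it rounds to the nearest integer, whereas a plain integer cast truncates.

```python
def normalize_pixel(value):
    """Map an 8-bit intensity in [0, 255] to [-1, 1]."""
    return value / 127.5 - 1.0


def denormalize_pixel(value):
    """Map a value in [-1, 1] back to an integer in [0, 255].

    Clips to the valid range, then rounds to the nearest integer so that
    tiny floating-point errors do not shift a value down by one.
    """
    scaled = (value + 1.0) * 127.5
    return int(round(min(255.0, max(0.0, scaled))))


print(normalize_pixel(0), normalize_pixel(255))         # -1.0 1.0
print(denormalize_pixel(-1.0), denormalize_pixel(1.0))  # 0 255
```

Since tanh outputs lie in [-1, 1], scaling the training images to the same interval lets the generator reproduce the full intensity range.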

Challenges:

- Training on CPU was infeasibly slow.
- GPU memory constraints forced me to use a batch size of 1.
- Without TFRecord + TPU acceleration, full convergence is slow.

Future Work:

- Use TFRecords with TPU for speed and scalability.
- Add perceptual loss for better visual fidelity.
- Use style attention layers for more nuanced style transfer.