# Flower Image Classification - Data Exploration

This notebook explores the flower image dataset and demonstrates basic data preprocessing steps for our image classification project.

In [None]:
import os
import sys
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# Make the project's modules importable.
# Adjust SRC_DIR if the notebook is moved.
SRC_DIR = Path("../src")
sys.path.append(str(SRC_DIR.resolve()))

In [None]:
# Import our custom modules
from data_loader import download_and_prepare_dataset, prepare_dataset_for_training, visualize_data
from model import create_cnn_model, create_transfer_learning_model

## 1. Load and Explore the Dataset

We use the TensorFlow Flowers dataset (`tf_flowers`), which contains about 3,670 photos across five flower classes: daisy, dandelion, roses, sunflowers, and tulips.

In [None]:
# Download and prepare the dataset
train_ds, val_ds, class_names, num_classes = download_and_prepare_dataset()

print(f"Number of classes: {num_classes}")
print(f"Class names: {class_names}")

## 2. Visualize Sample Images

Let's look at some examples from each class to understand our data better.

In [None]:
# Keep a reference to the raw dataset; the later analysis cells use it
raw_train_ds = train_ds
# Apply the training pipeline (caching disabled for exploration)
train_ds, val_ds = prepare_dataset_for_training(train_ds, val_ds, cache=False)

# The prepared dataset is batched, so unbatch it to get individual images
unbatched_ds = train_ds.unbatch()

# Visualize some examples
visualize_data(unbatched_ds, class_names)

## 3. Analyze Class Distribution

Let's check if our dataset is balanced across classes.

In [None]:
# Count examples per class
class_counts = {name: 0 for name in class_names}

for _, label in raw_train_ds:
    class_counts[class_names[label.numpy()]] += 1

# Plot class distribution
plt.figure(figsize=(10, 6))
plt.bar(class_counts.keys(), class_counts.values())
plt.title('Class Distribution in Training Dataset')
plt.xlabel('Flower Type')
plt.ylabel('Number of Images')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("Class distribution:")
for class_name, count in class_counts.items():
    print(f"{class_name}: {count} images")
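
If the counts turn out to be uneven, one common response is to weight the loss per class. The sketch below is illustrative only: it assumes a dictionary shaped like `class_counts` above, the counts and class names in it are made-up placeholders, and the helper names are hypothetical. It uses the familiar "balanced" heuristic, `total / (num_classes * class_count)`.

```python
def imbalance_ratio(counts):
    """Ratio of the largest class to the smallest; 1.0 means perfectly balanced."""
    values = list(counts.values())
    if min(values) == 0:
        raise ValueError("at least one class has no examples")
    return max(values) / min(values)


def balanced_class_weights(counts):
    """Weight each class by total / (num_classes * class_count)."""
    if any(n == 0 for n in counts.values()):
        raise ValueError("at least one class has no examples")
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


# Placeholder counts, not taken from the flowers dataset.
toy_counts = {"class_a": 100, "class_b": 50, "class_c": 50}

ratio = imbalance_ratio(toy_counts)
weights = balanced_class_weights(toy_counts)
print(f"Imbalance ratio: {ratio:.2f}")
for name, weight in weights.items():
    print(f"{name}: weight {weight:.3f}")
```

Keras's `Model.fit` accepts a `class_weight` dictionary keyed by integer class index, so weights keyed by name would need to be mapped back to indices before use.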

## 4. Examine Image Properties

We sample 100 training images and plot their dimensions and aspect ratios (width / height).

In [None]:
# Analyze image properties
image_sizes = []
aspect_ratios = []

# Sample some images for analysis
for image, _ in raw_train_ds.take(100):
    height, width, _ = image.shape
    image_sizes.append((height, width))
    aspect_ratios.append(width / height)

# Plot image size distribution
heights, widths = zip(*image_sizes)

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(widths, heights, alpha=0.5)
plt.title('Image Dimensions')
plt.xlabel('Width (pixels)')
plt.ylabel('Height (pixels)')
plt.grid(True)

plt.subplot(1, 2, 2)
plt.hist(aspect_ratios, bins=20, alpha=0.7)
plt.axvline(x=1, color='r', linestyle='--', label='Square')
plt.title('Aspect Ratio Distribution')
plt.xlabel('Aspect Ratio (width/height)')
plt.ylabel('Count')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

print(f"Average image size (height x width): {np.mean(heights):.1f} x {np.mean(widths):.1f} pixels")
print(f"Average aspect ratio: {np.mean(aspect_ratios):.2f}")
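
The aspect ratios matter because resizing every image to a fixed square stretches anything that is not already square. As a rough sketch of the alternative, the hypothetical helper below computes the size an image would have if it were scaled to fit inside a square target while keeping its aspect ratio, along with the padding needed to fill the rest. The 224-pixel target matches the size used later in this notebook; the function name and the example dimensions are assumptions for illustration.

```python
def fit_within(height, width, target=224):
    """Scale (height, width) to fit inside a target x target square.

    Returns (new_height, new_width, pad_height, pad_width), where the pad
    values are the total number of pixels needed to reach the target size.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    scale = target / max(height, width)
    new_height = max(1, round(height * scale))
    new_width = max(1, round(width * scale))
    return new_height, new_width, target - new_height, target - new_width


# A landscape image: the width reaches the target, the height is padded.
landscape = fit_within(240, 320)
# A square image needs no padding at all.
square = fit_within(100, 100)
print(landscape)
print(square)
```

TensorFlow offers `tf.image.resize_with_pad` for this kind of aspect-preserving resize; whether padding or plain stretching works better for this project is something to test rather than assume.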

## 5. Data Preprocessing Pipeline

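Preprocessing has two steps: each image is resized to 224x224 pixels, and its pixel values are scaled from [0, 255] to [0, 1]. The cell below shows a single image at each stage.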

In [None]:
# Get a single image for demonstration
for image, label in raw_train_ds.take(1):
    original_image = image.numpy()
    class_name = class_names[label.numpy()]
    
    # Preprocess the image
    # Resize to 224x224 (tf.image.resize returns float32, still in [0, 255])
    resized_image = tf.image.resize(image, [224, 224]).numpy()
    # Scale pixel values to [0, 1]
    normalized_image = resized_image / 255.0
    
    # Display the transformations
    plt.figure(figsize=(12, 4))
    
    plt.subplot(1, 3, 1)
    plt.imshow(original_image)
    plt.title(f'Original: {class_name}\n{original_image.shape}')
    plt.axis('off')
    
    plt.subplot(1, 3, 2)
    plt.imshow(resized_image.astype('uint8'))
    plt.title(f'Resized: 224x224\n{resized_image.shape}')
    plt.axis('off')
    
    plt.subplot(1, 3, 3)
    plt.imshow(normalized_image)
    plt.title(f'Normalized: [0, 1]\n{normalized_image.shape}')
    plt.axis('off')
    
    plt.tight_layout()
    plt.show()
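
The cell above casts the resized image to `uint8` before plotting because `imshow` expects either integers in [0, 255] or floats in [0, 1], and the resized image is a float array still in the [0, 255] range. The NumPy-only sketch below illustrates the scaling step in isolation. The tiny array is a stand-in for a decoded image, not real data, and `scale_to_unit_range` is a hypothetical helper name.

```python
import numpy as np


def scale_to_unit_range(image):
    """Convert pixel values from [0, 255] to float32 values in [0, 1]."""
    return image.astype(np.float32) / 255.0


# Stand-in for a decoded image: a 2x2 RGB array of uint8 pixels.
pixels = np.array(
    [[[0, 128, 255], [255, 255, 255]],
     [[64, 64, 64], [0, 0, 0]]],
    dtype=np.uint8,
)

scaled = scale_to_unit_range(pixels)
print(f"before: dtype={pixels.dtype}, min={pixels.min()}, max={pixels.max()}")
print(f"after:  dtype={scaled.dtype}, min={scaled.min()}, max={scaled.max()}")
```

The shape is unchanged by scaling; only the dtype and value range differ, which is why the second and third panels above look identical.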

## 6. Model Architecture Preview

We build both models, a basic CNN and a transfer learning model, and print their layer summaries.

In [None]:
# Create both models
basic_model = create_cnn_model(num_classes=num_classes)
transfer_model = create_transfer_learning_model(num_classes=num_classes)

# Print model summaries
print("Basic CNN Model:")
basic_model.summary()

print("\nTransfer Learning Model:")
transfer_model.summary()
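
When reading the summaries, it can help to know where the per-layer parameter counts come from. The sketch below gives the standard formulas for convolutional and dense layers. The layer sizes are hypothetical and are not taken from `create_cnn_model` or `create_transfer_learning_model`; substitute the shapes shown in the actual summaries.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable parameters in a 2D convolution layer.

    Each filter has kernel_h * kernel_w * in_channels weights plus one bias.
    """
    return (kernel_h * kernel_w * in_channels + int(use_bias)) * filters


def dense_params(in_features, units, use_bias=True):
    """Trainable parameters in a fully connected layer."""
    return (in_features + int(use_bias)) * units


# Hypothetical layers: a 3x3 convolution over RGB input with 32 filters,
# and a 5-way classification head on top of 128 features.
first_conv = conv2d_params(3, 3, 3, 32)
head = dense_params(128, 5)
print(f"3x3 conv, 3 -> 32 channels: {first_conv} parameters")
print(f"dense, 128 -> 5 units: {head} parameters")
```

In a transfer learning setup, most of the total typically sits in the pretrained base, so the trainable and non-trainable totals at the bottom of the summary are often the more informative numbers.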

## 7. Conclusion

In this notebook, we've:

1. Loaded and explored the flowers dataset
2. Visualized sample images from each class
3. Analyzed the class distribution
4. Examined image properties and sizes
5. Demonstrated the preprocessing pipeline
6. Previewed our model architectures

The next step is to train the models with our `train.py` script and evaluate their performance.

In [None]:
# That's it for exploration!
print("Notebook completed.")