# Exploratory Data Analysis (EDA)

This notebook performs exploratory data analysis (EDA) on the coconut leaf image dataset. The goal is to understand how the classes are distributed, inspect sample images, and spot any patterns or anomalies before modeling.

In [1]:
# Import necessary libraries
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image

# Set visualization style (set_theme is the preferred name; sns.set is an alias for it)
sns.set_theme(style='whitegrid')

In [2]:
# Define the path to the dataset
data_path = '../data/annotated/'  # Adjust the path as necessary

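# List all images in the dataset
# Match extensions case-insensitively (e.g. '.JPG') and sort for a stable order,
# since os.listdir returns entries in arbitrary order
image_files = sorted(f for f in os.listdir(data_path) if f.lower().endswith(('.jpg', '.png')))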
print(f'Total images found: {len(image_files)}')

In [3]:
# Display a few sample images
plt.figure(figsize=(15, 10))
for i, image_file in enumerate(image_files[:9]):  # Display first 9 images
    plt.subplot(3, 3, i + 1)
    # Open inside a context manager so the file handle is closed after drawing
    with Image.open(os.path.join(data_path, image_file)) as img:
        plt.imshow(img)
    plt.axis('off')
    plt.title(image_file)
plt.show()
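
Viewing nine samples gives a feel for the data, but it will not reveal odd resolutions, unexpected color modes, or corrupt files elsewhere in the folder. The sketch below is one possible way to check for those anomalies. The helper name `summarize_image_shapes` is hypothetical and not part of the original notebook; it relies only on `PIL.Image.open`, which reads the image header without decoding the full pixel data.

```python
import os
from collections import Counter

from PIL import Image


def summarize_image_shapes(folder, filenames):
    """Count (width, height, mode) combinations and collect unreadable files.

    Hypothetical helper for illustration, not part of the original notebook.
    """
    shapes = Counter()
    unreadable = []
    for name in filenames:
        try:
            # Image.open is lazy: size and mode come from the file header
            with Image.open(os.path.join(folder, name)) as img:
                shapes[(img.width, img.height, img.mode)] += 1
        except OSError:
            # Covers missing files and files PIL cannot identify as images
            unreadable.append(name)
    return shapes, unreadable


# Possible use in this notebook, with the variables defined in the cells above:
# shapes, unreadable = summarize_image_shapes(data_path, image_files)
# print(shapes.most_common(5))
# print(f'Unreadable files: {unreadable}')
```

If most images share one shape and mode, the rare combinations in `shapes` are the ones worth inspecting by hand.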

In [4]:
# Load labels and create a DataFrame
# Assuming labels are stored in a CSV file
labels_df = pd.read_csv('../data/annotations.csv')  # Adjust the path as necessary
print(labels_df.head())
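
Printing the first rows shows the table layout but not whether the annotations are complete. A minimal sketch of a few basic checks follows. The helper name `label_table_checks` is hypothetical, and it makes no assumption about which columns the CSV contains.

```python
import pandas as pd


def label_table_checks(labels):
    """Return row count, missing values per column, and fully duplicated rows.

    Hypothetical helper for illustration, not part of the original notebook.
    """
    return {
        'rows': len(labels),
        'missing_per_column': labels.isna().sum().to_dict(),
        'duplicate_rows': int(labels.duplicated().sum()),
    }


# Possible use in this notebook:
# print(label_table_checks(labels_df))
```

Missing labels or duplicated rows found here are worth resolving before plotting class counts, since both would distort the distribution.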

In [5]:
# Visualize the distribution of classes
plt.figure(figsize=(10, 6))
sns.countplot(data=labels_df, x='label')
plt.title('Distribution of Classes')
plt.xticks(rotation=45)
plt.show()
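
The count plot shows imbalance visually; exact numbers make it easier to judge how severe it is. The sketch below tabulates per-class counts and shares and reports the ratio between the largest and smallest class. The helper name `class_balance` is hypothetical, and it assumes the same `label` column that the plot above uses.

```python
import pandas as pd


def class_balance(labels, column='label'):
    """Return a per-class count/share table and the max-to-min class ratio.

    Hypothetical helper for illustration, not part of the original notebook.
    """
    counts = labels[column].value_counts()  # sorted from most to least frequent
    summary = pd.DataFrame({'count': counts, 'share': counts / counts.sum()})
    imbalance_ratio = counts.max() / counts.min()
    return summary, imbalance_ratio


# Possible use in this notebook:
# summary, ratio = class_balance(labels_df)
# print(summary)
# print(f'Largest class is {ratio:.1f}x the smallest')
```

A ratio close to 1 suggests the classes are roughly balanced, while a large ratio may call for resampling or class weights during training.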

## Conclusion

In this EDA, we loaded the coconut leaf image dataset, viewed a sample of images, and examined the distribution of classes. Natural next steps are to look more closely at image characteristics, such as resolution and color mode, and to account for any class imbalance when preparing the data for model training.