# Exploratory Data Analysis for Hair Type Classification

**Goals of This Notebook:**
- Conduct a structured exploratory analysis of the Hair Type dataset to understand its characteristics before building any classification or clustering models

- Examine class distributions, visual variability, image statistics, and basic texture patterns that may relate to curl tightness

- Use these insights as the foundation for the later stages of the project: (1) training a VGG16-based CNN to classify the four broad hair categories, and (2) extracting texture and curl features to explore whether finer curl patterns (similar to the Andre Walker types) emerge through unsupervised clustering

#### Note: Some cells include working notes and ideas intended for my report

In [7]:
# download kaggle dataset
import kagglehub
kaggle_hairtype_dataset_path = kagglehub.dataset_download('kavyasreeb/hair-type-dataset')

print("Dataset downloaded at: ", kaggle_hairtype_dataset_path)

data_dir = kaggle_hairtype_dataset_path


Dataset downloaded at:  C:\Users\nisa2\.cache\kagglehub\datasets\kavyasreeb\hair-type-dataset\versions\1


In [8]:
%pip install tensorflow

Note: you may need to restart the kernel to use updated packages.


In [11]:
import sys
print(sys.version)

3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]


In [12]:
# libraries
import numpy as np
import pandas as pd
import tensorflow as tf
import seaborn as sns
import os
import matplotlib.pyplot as plt


ImportError: Traceback (most recent call last):
  File "C:\Users\nisa2\anaconda3\Lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 73, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.


Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors for some common causes and solutions.
If you need help, create an issue at https://github.com/tensorflow/tensorflow/issues and include the entire stack trace above this error message.

In [None]:
# random seeds

np.random.seed(42)
tf.random.set_seed(42)

# Dataset Exploration (EDA)

## Load Dataset & Inspect Metadata

- for "Dataset Description" of report

In [None]:
# load dataset

# see labels

In [None]:
# dataset path

# print folder structure
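
A minimal sketch of the folder inspection, assuming the dataset uses one subfolder per class (a common Kaggle layout, not verified here). `list_images_by_class`, `print_folder_structure` and `IMAGE_EXTENSIONS` are illustrative names, not part of the dataset or of kagglehub.

```python
from pathlib import Path

# Assumption: images use one of these extensions.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}

def list_images_by_class(root):
    """Map each class folder name to a sorted list of image paths.

    Assumes a layout of root/<class_name>/<image files>.
    """
    images = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        images[class_dir.name] = sorted(
            p for p in class_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    return images

def print_folder_structure(images_by_class):
    """Print one line per class with its image count."""
    for name, paths in images_by_class.items():
        print(f"{name}/  ({len(paths)} images)")
```

Usage would be `print_folder_structure(list_images_by_class(data_dir))`. If the class folders sit one level deeper than `data_dir`, pass that subfolder instead.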

## Dataset Overview

- good for Experiment Settings and Benchmark Data section of report

In [None]:
# total number of images

# number of images per class

# class imbalance ratio
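
One possible way to fill in the three counts above, assuming `labels` is a plain list with one class label per image. `class_balance` is a hypothetical helper, and "imbalance ratio" is taken here to mean largest class divided by smallest class.

```python
from collections import Counter

def class_balance(labels):
    """Return (counts per class, total images, imbalance ratio).

    The imbalance ratio is largest class / smallest class,
    so 1.0 means perfectly balanced.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    ratio = max(counts.values()) / min(counts.values())
    return counts, total, ratio
```

A ratio close to 1 would suggest plain accuracy is a fair metric; a large ratio would be a reason to consider class weights or per-class metrics in the report.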

In [None]:
# bar chart visualization of class distribution
# (assumes `df` is a DataFrame with one row per image and a 'label' column)
sns.countplot(data=df, x='label')
plt.title("Class Distribution")
plt.show()

## Display Sample Images

Why:
- see data diversity
- see lighting/background issues (can be mentioned in report)
- see if curliness is visually apparent

figures would go well in report

(delete or rewrite later)

In [None]:
# One figure per class
# display 10 images per class

## Image Dimensions & Aspect Ratio

- Good for justifying resizing decisions before sending images into VGG16

In [None]:
# load images

# print shapes (height, width, channels)

# count frequency of unique dimensions

# visualize histogram of aspect ratios
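
A small sketch of the shape bookkeeping, assuming the image shapes have already been collected into a list of `(height, width, channels)` tuples. `summarize_shapes` is an illustrative name.

```python
from collections import Counter
import numpy as np

def summarize_shapes(shapes):
    """Count unique image shapes and compute width/height aspect ratios.

    `shapes` is a list of (height, width) or (height, width, channels) tuples.
    """
    shape_counts = Counter(tuple(s) for s in shapes)
    aspect_ratios = np.array([w / h for h, w, *_ in shapes], dtype=float)
    return shape_counts, aspect_ratios
```

With Pillow installed, each shape could come from `np.asarray(Image.open(path)).shape`, and the ratios can go straight into `plt.hist(aspect_ratios, bins=30)`. Ratios far from 1 would mean that a square resize distorts the hair pattern.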

## Image Statistics (to analyze class differences for CNN)

 - Average brightness
 - Average color histograms per channel
 - hair-region pixel proportion
 
 helps answer these questions, if I want to put them in the report:
 - does hair type correlate with brightness or color distributions?
 - do curly/kinky images have different texture intensity?

In [None]:
# convert sample images to grayscale

In [None]:
# compute average brightness per image
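
A sketch covering the grayscale conversion and the per-image brightness, assuming each image is already an `(H, W, 3)` RGB array with values 0-255. The function names are illustrative; the weights are the standard ITU-R BT.601 luma coefficients.

```python
import numpy as np

# ITU-R BT.601 luma weights for R, G, B.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to an (H, W) float grayscale array."""
    return np.asarray(rgb, dtype=float)[..., :3] @ LUMA_WEIGHTS

def average_brightness(rgb):
    """Mean grayscale intensity of one image (0-255 for 8-bit input)."""
    return float(to_grayscale(rgb).mean())
```

Collecting `average_brightness` per image alongside its label gives the data needed for a per-class brightness plot.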

In [None]:
# plot brightness distribution per image

In [None]:
# compute color histograms for RGB channels
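
A possible per-image histogram helper, assuming 8-bit RGB arrays. `channel_histograms` and the default of 32 bins are arbitrary choices for illustration.

```python
import numpy as np

def channel_histograms(rgb, bins=32):
    """Return a (3, bins) array of normalized R, G, B histograms for an 8-bit image."""
    rgb = np.asarray(rgb)
    hists = []
    for c in range(3):
        counts, _ = np.histogram(rgb[..., c], bins=bins, range=(0, 256))
        hists.append(counts / counts.sum())
    return np.stack(hists)
```

Averaging these arrays over all images of one class (for example with `np.mean(stack_of_hists, axis=0)`) gives the "average color histogram per channel" listed above.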

## Edge Detection Exploration

Edge maps give a first look at hair texture.

Insight goals (hypotheses to check):
- curly and kinky hair produce more edge pixels
- straight hair has smoother regions

-- if these hold, they support the reasoning for feature extraction and curl tightness

In [None]:
# apply Canny edge detection on sample images

In [None]:
# plot number of edge pixels per class
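
The sketch below is not Canny. It is a simpler stand-in (thresholded Sobel gradient magnitude, with no non-maximum suppression or hysteresis) that needs only NumPy, which is enough to compare edge density between classes. With OpenCV installed, `cv2.Canny` would be the direct route. The function names and the threshold of 100 are assumptions.

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude of a 2-D grayscale array using 3x3 Sobel kernels."""
    g = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weighted 1-2-1.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[1:-1, :-2] - g[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1.
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[:-2, 1:-1] - g[:-2, 2:])
    return np.hypot(gx, gy)

def edge_density(gray, threshold=100.0):
    """Fraction of pixels whose Sobel magnitude exceeds `threshold`."""
    return float((sobel_magnitude(gray) > threshold).mean())
```

Using a fraction rather than a raw edge-pixel count keeps images of different sizes comparable.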

## Texture Feature Exploration

-- to show why texture matters and motivate feature extraction

### Gabor Filter
- show frequency and orientation patterns
---- curly textures should produce distinctive responses

In [None]:
# show frequency and orientation patterns
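
A NumPy-only sketch of a Gabor response, assuming a 2-D grayscale array larger than the kernel. The names, `sigma=3.0` and `size=15` are illustrative defaults; the convolution is circular (FFT-based), which is acceptable for summary energies but not for pixel-accurate maps. scikit-image offers `skimage.filters.gabor` as a ready-made alternative.

```python
import numpy as np

def gabor_kernel(frequency, theta, sigma=3.0, size=15):
    """Real (cosine) Gabor kernel: a Gaussian envelope times an oriented sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the sinusoid varies along direction `theta`.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * frequency * x_rot)

def gabor_energy(gray, frequency, theta):
    """Mean squared filter response, computed with FFT-based circular convolution."""
    gray = np.asarray(gray, dtype=float)
    kernel = gabor_kernel(frequency, theta)
    kernel -= kernel.mean()  # remove DC so flat regions give zero response
    response = np.fft.irfft2(
        np.fft.rfft2(gray) * np.fft.rfft2(kernel, s=gray.shape), s=gray.shape
    )
    return float(np.mean(response**2))
```

Evaluating `gabor_energy` over a small grid of frequencies and orientations gives one feature vector per image; the hypothesis is that tighter curls spread energy across more orientations than straight hair.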

### Local Binary Patterns
- LBP is good for capturing strand texture
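
A minimal sketch of the basic 3x3 LBP operator in NumPy, assuming a 2-D grayscale array. This is the plain 8-neighbour variant, not the rotation-invariant "uniform" patterns that `skimage.feature.local_binary_pattern` can compute; the names are illustrative.

```python
import numpy as np

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(gray):
    """Basic 3x3 LBP code (0-255) for every interior pixel of a 2-D array."""
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(NEIGHBOUR_OFFSETS):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes, usable as a texture descriptor."""
    counts = np.bincount(lbp_image(gray).ravel(), minlength=256)
    return counts / counts.sum()
```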

### GLCM Metrics
- contrast, homogeneity, energy, entropy

In [None]:
# contrast

# homogeneity

# energy

# entropy
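
A NumPy sketch of the four metrics, assuming an 8-bit grayscale array and a single horizontal offset of one pixel. The formulas follow one common convention (energy as the square root of the angular second moment, entropy in bits); `glcm`, `glcm_metrics` and `levels=8` are illustrative choices. scikit-image provides `graycomatrix` and `graycoprops` for a fuller implementation.

```python
import numpy as np

def glcm(gray, levels=8):
    """Symmetric, normalized co-occurrence matrix for horizontally adjacent pixels.

    Assumes 8-bit input (0-255), quantized down to `levels` gray levels.
    """
    q = (np.asarray(gray, dtype=float) * levels / 256).astype(int).clip(0, levels - 1)
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (left, right), 1)
    counts += counts.T  # count each pair in both directions
    return counts / counts.sum()

def glcm_metrics(p):
    """Contrast, homogeneity, energy and entropy of a normalized GLCM `p`."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "energy": float(np.sqrt(np.sum(p ** 2))),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }
```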

## Feature DataFrame Preview

-- 40 random images for preview

Columns: `image`, `label`, `curl_tightness`, `glcm contrast`, `density`

-- for technical details

## Dimensionality Reduction Preview


-- this will help show that the planned approach has a reasonable chance of working

-- can show separation between classes and potential subclusters within each class

In [None]:
# run PCA on preview dataset
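
A sketch of PCA via SVD, assuming `features` is a numeric array with one row per image and one column per feature. Standardizing the columns first is a choice made here because the features have different units; `pca_project` is a hypothetical name. `sklearn.decomposition.PCA` would do the same job if scikit-learn is available.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project rows of `features` onto their top principal components.

    Returns (projected data, explained variance ratio per component).
    """
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    x = x / np.where(std > 0, std, 1.0)  # standardize; constant columns stay at 0
    _, singular_values, vt = np.linalg.svd(x, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return x @ vt[:n_components].T, explained[:n_components]
```

The two projected columns can then be scatter-plotted with one color per hair type.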

In [None]:
# plot points colored by class (straight, wavy, curly, kinky)

## Prepare Data for Classification (VGG16)

In [None]:
# image resize size

In [None]:
# train, val, and test split paths
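
One way the split could be done, assuming a dict that maps each class label to a list of image paths. The 15% validation and 15% test fractions are placeholder values, and `stratified_split` is an illustrative helper. Splitting inside each class keeps the class proportions the same across the three sets.

```python
import random

def stratified_split(images_by_class, val_frac=0.15, test_frac=0.15, seed=42):
    """Split each class's paths into train/val/test lists of (path, label) pairs."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, paths in images_by_class.items():
        paths = sorted(paths)  # fixed starting order so the seed is reproducible
        rng.shuffle(paths)
        n_val = int(len(paths) * val_frac)
        n_test = int(len(paths) * test_frac)
        splits["val"] += [(p, label) for p in paths[:n_val]]
        splits["test"] += [(p, label) for p in paths[n_val:n_val + n_test]]
        splits["train"] += [(p, label) for p in paths[n_val + n_test:]]
    return splits
```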

In [None]:
# preprocess input
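
A NumPy sketch of the Caffe-style preprocessing that VGG16 weights expect: RGB is flipped to BGR and the ImageNet channel means are subtracted, with no scaling. In Keras this is what `tf.keras.applications.vgg16.preprocess_input` is documented to do; the standalone version below is only an illustration that runs without TensorFlow, and its names are assumptions.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Caffe-style VGG preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])

def vgg16_style_preprocess(rgb_batch):
    """Convert RGB images (values 0-255) to zero-centered BGR, without scaling."""
    x = np.asarray(rgb_batch, dtype=float)
    bgr = x[..., ::-1]  # reverse the channel axis: RGB -> BGR
    return bgr - IMAGENET_BGR_MEAN
```

Images would be resized first; 224x224 is the default VGG16 input size in Keras.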