## **<h3 align="center"> Deep Learning - Project </h3>**
# **<h3 align="center"> Phylum Arthropoda - Steven</h3>**
**Group 4 members:**<br>
Alexandra Pinto - 20211599@novaims.unl.pt - 20211599<br>
Steven Carlson - 20240554@novaims.unl.pt - 20240554<br>
Sven Goerdes - 20240503@novaims.unl.pt - 20240503<br>
Tim Straub - 20240505@novaims.unl.pt - 20240505<br>
Zofia Wojcik  - 20240654@novaims.unl.pt - 20240654<br>

# Table of Contents
* [1. Introduction](#intro)
* [2. Setup](#setup)
* [3. Data Loading](#dataloading)
* [4. Image Preprocessing](#imagepreprocessing)
* [5. Neural Network Models](#nnmodels)



# 1. Introduction <a class="anchor" id="intro"></a>

In this second notebook, we will preprocess images from the **Arthropoda** phylum and develop a deep learning model to accurately classify them at the family level.

# 2. Setup <a class="anchor" id="setup"></a>
In this section, we import the libraries used throughout the notebook for data handling, image processing, and model building.

In [3]:
# Standard libraries
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import zipfile
import seaborn as sns

# Libraries for image processing
from glob import glob
from PIL import Image
import tensorflow as tf
from tensorflow.keras import layers, models


In [4]:
# Libraries from Keras
from keras.utils import image_dataset_from_directory

# 3. Data Loading <a class="anchor" id="dataloading"></a>

Let's load the train and test splits for the Arthropoda phylum.

In [5]:
# Load the DataFrame from the CSV file
arthropoda_train = pd.read_csv("train_test_splits/Arthropoda_train.csv")
arthropoda_train.head(3)

Unnamed: 0,eol_content_id,eol_page_id,kingdom,phylum,family,file_path
0,29600171,52691998,animalia,arthropoda,attelabidae,arthropoda_attelabidae/29600171_52691998_eol-f...
1,29911905,132324,animalia,arthropoda,platystictidae,arthropoda_platystictidae/29911905_132324_eol-...
2,28408134,1065346,animalia,arthropoda,apidae,arthropoda_apidae/28408134_1065346_eol-full-si...


In [6]:
# Load the DataFrame from the CSV file
arthropoda_test = pd.read_csv("train_test_splits/Arthropoda_test.csv")
arthropoda_test.head(3)

Unnamed: 0,eol_content_id,eol_page_id,kingdom,phylum,family,file_path
0,14644220,463474,animalia,arthropoda,formicidae,arthropoda_formicidae/14644220_463474_eol-full...
1,21071836,355546,animalia,arthropoda,gomphidae,arthropoda_gomphidae/21071836_355546_eol-full-...
2,29404596,1198625,animalia,arthropoda,pisauridae,arthropoda_pisauridae/29404596_1198625_eol-ful...


# 4. Image Preprocessing <a class="anchor" id="imagepreprocessing"></a>

In [24]:
# Define preprocessing constants
num_classes = arthropoda_train['family'].nunique()  # Number of classes = number of distinct families
batch_size = 64
input_shape = (596, 596, 3)
image_size = (596, 596)
value_range = (0.0, 1.0)


In [19]:
root_dir = r"C:\Users\sacar\OneDrive\Documents\Semester 2 NOVA\DL\DeepLearning2425\rare_species"

# Build the absolute path of each image by joining root_dir with its relative file_path
arthropoda_train['full_path'] = arthropoda_train['file_path'].apply(lambda x: os.path.normpath(os.path.join(root_dir, x)))

file_paths = arthropoda_train['full_path'].tolist()
labels = arthropoda_train['family'].tolist()

print(file_paths[:5])
print(labels[:5])

['C:\\Users\\sacar\\OneDrive\\Documents\\Semester 2 NOVA\\DL\\DeepLearning2425\\rare_species\\arthropoda_attelabidae\\29600171_52691998_eol-full-size-copy.jpg', 'C:\\Users\\sacar\\OneDrive\\Documents\\Semester 2 NOVA\\DL\\DeepLearning2425\\rare_species\\arthropoda_platystictidae\\29911905_132324_eol-full-size-copy.jpg', 'C:\\Users\\sacar\\OneDrive\\Documents\\Semester 2 NOVA\\DL\\DeepLearning2425\\rare_species\\arthropoda_apidae\\28408134_1065346_eol-full-size-copy.jpg', 'C:\\Users\\sacar\\OneDrive\\Documents\\Semester 2 NOVA\\DL\\DeepLearning2425\\rare_species\\arthropoda_tettigoniidae\\29822044_858738_eol-full-size-copy.jpg', 'C:\\Users\\sacar\\OneDrive\\Documents\\Semester 2 NOVA\\DL\\DeepLearning2425\\rare_species\\arthropoda_cerambycidae\\22539004_354871_eol-full-size-copy.jpg']
['attelabidae', 'platystictidae', 'apidae', 'tettigoniidae', 'cerambycidae']
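
Because `root_dir` above is a machine-specific path, it can help to confirm that every joined path actually points to a file before building the dataset. The following is a minimal sketch of such a check, not part of the original notebook: the helper name `resolve_paths` and the temporary directory layout are hypothetical and only serve to make the example self-contained.

```python
from pathlib import Path
import tempfile


def resolve_paths(root_dir, relative_paths):
    """Join each relative path to root_dir and report which files are missing."""
    root = Path(root_dir)
    full_paths = [root / rel for rel in relative_paths]
    missing = [p for p in full_paths if not p.is_file()]
    return full_paths, missing


# Demo on a throwaway directory (hypothetical layout mimicking one family folder)
with tempfile.TemporaryDirectory() as tmp:
    family_dir = Path(tmp) / "arthropoda_apidae"
    family_dir.mkdir()
    (family_dir / "example.jpg").touch()

    full_paths, missing = resolve_paths(
        tmp,
        ["arthropoda_apidae/example.jpg", "arthropoda_apidae/absent.jpg"],
    )
    n_missing = len(missing)
    missing_names = [p.name for p in missing]
    print(f"{n_missing} of {len(full_paths)} files are missing: {missing_names}")
```

In the notebook, the same helper could presumably be called with `root_dir` and `arthropoda_train['file_path']`, and an empty `missing` list would indicate that all images were found.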


In [None]:
# Create the TensorFlow dataset of (file path, label) pairs
train = tf.data.Dataset.from_tensor_slices((file_paths, labels))
print(train)

<_TensorSliceDataset element_spec=(TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.string, name=None))>
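
The `element_spec` above shows that the labels are still strings (`tf.string`), whereas a classifier typically expects integer class ids. One possible way to prepare them, sketched here in plain Python under the assumption that a fixed, sorted class order is acceptable, is to build a label-to-index mapping before creating the dataset. The helper name `build_label_index` and the short example list are hypothetical.

```python
def build_label_index(labels):
    """Map each distinct label to an integer id, using sorted order for reproducibility."""
    classes = sorted(set(labels))
    return {name: idx for idx, name in enumerate(classes)}


# Example labels (the first five families printed earlier, plus one repeat)
example_labels = [
    "attelabidae", "platystictidae", "apidae",
    "tettigoniidae", "cerambycidae", "apidae",
]

label_to_index = build_label_index(example_labels)
encoded = [label_to_index[name] for name in example_labels]

print(label_to_index)
print(encoded)
```

The mapping should be built once from the training labels and reused for the test split, so that the same family always receives the same id.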


In [22]:
# Function to load and preprocess a single image
def process_image(file_path, label):
    """Read a JPEG from disk, resize it to image_size, and scale pixels to [0, 1]."""
    image = tf.io.read_file(file_path)  # Read the raw file bytes
    image = tf.image.decode_jpeg(image, channels=3)  # Decode the JPEG into an RGB tensor
    image = tf.image.resize(image, image_size)  # Resize to the target size (aspect ratio is not preserved)
    image = tf.cast(image, tf.float32) / 255.0  # Normalize to [0, 1]
    return image, label

In [25]:
train = train.map(process_image, num_parallel_calls=tf.data.AUTOTUNE) # Map the function to the dataset

In [29]:
for image, label in train.take(3):
    print("Image shape:", image.numpy().shape)
    print("Label:", label.numpy())

Image shape: (596, 596, 3)
Label: b'attelabidae'
Image shape: (596, 596, 3)
Label: b'platystictidae'
Image shape: (596, 596, 3)
Label: b'apidae'
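
The dataset above still yields one image at a time, and `batch_size` has not been applied yet. A minimal sketch of how the mapped dataset could be shuffled, batched, and prefetched is shown below. It runs on small synthetic tensors so that it is self-contained; the helper name `make_batches`, the image size, and the buffer size are assumptions for illustration only.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the mapped dataset: 10 tiny "images" with integer labels
images = np.zeros((10, 8, 8, 3), dtype=np.float32)
labels = np.arange(10, dtype=np.int64) % 3
demo = tf.data.Dataset.from_tensor_slices((images, labels))


def make_batches(dataset, batch_size, shuffle_buffer=None, seed=0):
    """Optionally shuffle, then batch and prefetch a tf.data dataset."""
    if shuffle_buffer:
        dataset = dataset.shuffle(shuffle_buffer, seed=seed)
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)


batches = make_batches(demo, batch_size=4, shuffle_buffer=10)

# Collect the shape of every batch; the last batch is smaller (10 = 4 + 4 + 2)
batch_shapes = [x.numpy().shape for x, _ in batches]
print(batch_shapes)
```

With the real data, the same helper could presumably be applied to `train` with `batch_size=64`; shuffling is usually applied to the training split only.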


# 5. Neural Network Models <a class="anchor" id="nnmodels"></a>

NameError: name 'models' is not defined