# Activity 3: Household Waste Classification

> Household waste classification using Python and Keras.

## Challenge

Classify objects commonly found in household waste using the Kaggle _dataset_ available at https://www.kaggle.com/datasets/farzadnekouei/trash-type-image-dataset/.
The dataset has 6 classes (6 types of waste):

- 📦 Cardboard boxes;
- 🥂 Glass;
- 🛢️ Metal;
- 🗞️ Paper;
- 🥤 Plastic;
- 🗑️ Trash (packaging scraps, food, and other items that do not fit the previous categories).

## Authors

- Advisor: Elloá B. Guedes - [@elloa](https://github.com/elloa)
- Team:
  - Debora Souza Barros - [@Debby-Barros](https://github.com/Debby-Barros)
  - Diana Martins - [@ddianaom](https://github.com/ddianaom)
  - Gabriel Dos Santos Lima - [@gabrielSantosLima](https://github.com/gabrielSantosLima)
  - Thiago Marques - [@tmmarquess](https://github.com/tmmarquess)


## Step 0: Environment setup

Topics covered in this step:
* Importing the libraries
* Downloading the _dataset_ into the project's local directory

In [None]:
!pip install optuna keras_tuner "tensorflow[and-cuda]" kaggle

In [None]:
import os
import zipfile
import cv2
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

os.environ["KERAS_BACKEND"] = "tensorflow"

import keras
import keras_tuner as kt

from collections import Counter
from glob import glob
from keras_tuner import HyperModel
from keras.utils import to_categorical
from keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

In [None]:
# download the dataset from Kaggle (the kaggle CLI needs API credentials, e.g. ~/.kaggle/kaggle.json)
if not os.path.isdir('dataset'):
  !rm -rf sample_data
  !kaggle datasets download -d farzadnekouei/trash-type-image-dataset
  !unzip trash-type-image-dataset.zip
  !rm trash-type-image-dataset.zip
  !mv TrashType_Image_Dataset dataset
else:
  print("Dataset already exists in the current directory.")
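The shell commands above need `unzip` and a Colab-style shell. The same extract-and-rename step can also be done in plain Python with the `zipfile` module, which the notebook already imports. This is a hedged sketch. The function name `extract_dataset` is hypothetical, and the archive and folder names come from the commands above.

```python
import shutil
import zipfile
from pathlib import Path


def extract_dataset(zip_path, extracted_name="TrashType_Image_Dataset", target="dataset"):
    """Hypothetical helper: unzip the Kaggle archive and rename its top folder to `target`."""
    target_path = Path(target)
    if target_path.is_dir():
        print(f"{target_path} already exists; skipping extraction.")
        return target_path
    parent = target_path.parent
    # Equivalent of `!unzip trash-type-image-dataset.zip` (extracts next to the target)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(parent)
    # Equivalent of `!mv TrashType_Image_Dataset dataset`
    shutil.move(str(parent / extracted_name), str(target_path))
    # Equivalent of `!rm trash-type-image-dataset.zip`
    Path(zip_path).unlink()
    return target_path
```

Usage would be `extract_dataset("trash-type-image-dataset.zip")` after the `kaggle datasets download` command.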

## Step 1: Importing the dataset

Topics covered in this step:
* Import the _dataset_
* Check how many examples the _dataset_ contains

In [None]:
# dataset directory
base_dir = 'dataset'

# number of examples in the dataset (only files ending in .jpg are counted)
image_files = glob(os.path.join(base_dir, '**', '*.jpg'), recursive=True)
print(f'The dataset has {len(image_files)} images')
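The pattern `'*.jpg'` is case-sensitive on Linux and skips `.jpeg` or `.png` files, so the count above may miss some images. The minimal sketch below tallies every file by extension to confirm that nothing is left out. The helper `count_by_extension` is hypothetical.

```python
import os
from collections import Counter


def count_by_extension(base_dir):
    """Hypothetical helper: count files under base_dir grouped by lower-cased extension."""
    counts = Counter()
    for _root, _dirs, files in os.walk(base_dir):
        for name in files:
            counts[os.path.splitext(name)[1].lower()] += 1
    return counts
```

If `count_by_extension(base_dir)` reports only `.jpg`, the glob count above is complete.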

## Step 2: Exploratory analysis

Topics that may be covered in this step:
* Explore relevant information about the dataset. Some suggested questions as a starting point:
  * How many classes are there?
  * How many examples does each class have?
* Analyze the quality of the _dataset_ images and describe the limitations that may arise (with examples, if possible)

In [None]:
# number of classes in the dataset (one subdirectory per class; stray files are ignored)
count_classes = sum(
    1 for entry in os.listdir(base_dir)
    if os.path.isdir(os.path.join(base_dir, entry)))

print(f"The 'Trash type' dataset has {count_classes} classes")

In [None]:
# number of examples in each class
files_count = {}
for root, dirs, files in os.walk(base_dir):
  for class_dir in dirs:
    class_path = os.path.join(root, class_dir)
    files_count[class_dir] = len(os.listdir(class_path))


for key, item in files_count.items():
  print(f'Class "{key}" has {item} images')
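To answer "how many examples does each class have?" in relative terms, a small sketch can turn the `files_count` dictionary into per-class proportions and an imbalance ratio (largest class divided by smallest). The helper name is hypothetical, and the counts in the usage line are invented for illustration. They are not the dataset's real counts.

```python
def class_distribution(files_count):
    """Hypothetical helper: per-class proportions and max/min imbalance ratio."""
    total = sum(files_count.values())
    proportions = {name: n / total for name, n in files_count.items()}
    imbalance_ratio = max(files_count.values()) / min(files_count.values())
    return proportions, imbalance_ratio


# Invented counts, for illustration only
proportions, ratio = class_distribution({"a": 50, "b": 25, "c": 25})
print(proportions, ratio)
```

A ratio well above 1 suggests that per-class metrics, or a stratified split, deserve attention.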

In [None]:
# Image dimensions
def img_dimensions(img_dir):
    dims = []
    for file in os.listdir(img_dir):
        img = cv2.imread(os.path.join(img_dir, file))
        if img is None:  # cv2.imread returns None for unreadable or non-image files
            print(f"Could not read: {file}")
            continue
        height, width, _channels = img.shape
        dims.append((height, width))

    print("Most common dimensions:")
    for dim, freq in Counter(dims).most_common(15):
        print(f"Dimension (height x width): {dim}, Frequency: {freq}")
    print('\n')


for class_dir in os.listdir(base_dir):
  dir_path = os.path.join(base_dir, class_dir)
  if os.path.isdir(dir_path):
    print(f'Analyzing images in: {dir_path}')
    img_dimensions(dir_path)
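Step 3 stacks every image with `np.array(data)`. That only produces a 4-D array when all images share one shape. With mixed shapes, NumPy either raises an error or builds an object array, depending on its version. The hedged sketch below shows a check that could run before stacking. `common_shape` is a hypothetical helper.

```python
from collections import Counter

import numpy as np


def common_shape(images):
    """Hypothetical helper: return the single shape shared by all images, or raise."""
    images = list(images)
    if any(img is None for img in images):
        raise ValueError("At least one image could not be read (cv2.imread returned None)")
    shapes = Counter(np.asarray(img).shape for img in images)
    if len(shapes) != 1:
        raise ValueError(f"Found {len(shapes)} distinct shapes: {dict(shapes)}")
    return next(iter(shapes))
```

If it raises, the images would need resizing to a common size before `np.array(data)`.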

In [None]:
# plotting a few images from each class of the 'Trash Type' dataset
def plot_images_from_subfolders(base_dir, num_images=3):
    subfolders = [os.path.join(base_dir, folder) for folder in os.listdir(base_dir) if os.path.isdir(os.path.join(base_dir, folder))]

    for folder_path in subfolders:
        print(f"Images from: {folder_path}")
        fig, axes = plt.subplots(nrows=1, ncols=num_images, figsize=(15, 5))
        files = os.listdir(folder_path)

        for i in range(num_images):
            img_path = os.path.join(folder_path, files[i])
            img = cv2.imread(img_path)
            img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            axes[i].imshow(img_rgb)
            axes[i].axis('off')
        plt.show()

plot_images_from_subfolders(base_dir, 3)

## Step 3: Preprocessing

Topics that may be covered in this step:
* Define the size of the search grid to be explored
* Prepare the dataset for training using the _holdout_ validation strategy

In [None]:
# Using Keras: hyperparameter grid (CNNModel only samples 'activation', 'learning_rate' and 'batch_size')
param_grid_keras = {
    'units': [32, 64, 128, 256],
    'activation': ['relu', 'tanh', 'sigmoid'],
    'optimizer': ['adam', 'sgd', 'rmsprop'],
    'learning_rate': [0.01, 0.1, 0.4],
    'batch_size': [32, 64],
    'epochs': [10, 20, 30]
}
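The full grid above has 4 × 3 × 3 × 3 × 2 × 3 = 648 combinations. `CNNModel` only reads `activation`, `learning_rate` and `batch_size`, so its effective space is 3 × 3 × 2 = 18 combinations, and `RandomSearch` samples at most `max_trials` of them. `HyperResNet` uses its own built-in search space rather than this grid. The arithmetic as a short sketch:

```python
from math import prod

# Copy of the grid above, so the block runs on its own
param_grid_keras = {
    'units': [32, 64, 128, 256],
    'activation': ['relu', 'tanh', 'sigmoid'],
    'optimizer': ['adam', 'sgd', 'rmsprop'],
    'learning_rate': [0.01, 0.1, 0.4],
    'batch_size': [32, 64],
    'epochs': [10, 20, 30],
}

# Size of the full Cartesian grid
full_grid = prod(len(values) for values in param_grid_keras.values())

# Only these keys are read by CNNModel through hp.Choice
used_keys = ('activation', 'learning_rate', 'batch_size')
used_grid = prod(len(param_grid_keras[key]) for key in used_keys)

print(full_grid, used_grid)
```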

In [None]:
# Prepare the dataset for training with the holdout validation strategy:
# load every image (OpenCV returns a BGR uint8 array) and use its folder name as the label
data = []
labels = []

for root, dirs, files in os.walk(base_dir):
  for class_dir in dirs:
    for file in os.listdir(os.path.join(root, class_dir)):
      img_path = os.path.join(root, class_dir, file)
      img = cv2.imread(img_path)
      data.append(img)
      labels.append(class_dir)
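The images are kept as raw `uint8` pixels in the 0-255 range, in the BGR channel order returned by `cv2.imread`, and reach the networks without rescaling. Scaling to [0, 1] is a common alternative, but this notebook does not do it. It also costs memory: `float32` takes 4 bytes per value versus 1 for `uint8`. Below is a sketch under those assumptions. Both helpers are hypothetical.

```python
import numpy as np


def to_unit_range(images):
    """Hypothetical helper: convert a uint8 batch (0-255) to float32 in [0, 1]."""
    return np.asarray(images).astype(np.float32) / 255.0


def batch_nbytes(num_images, height, width, channels=3, itemsize=1):
    """Memory in bytes for num_images dense images (itemsize=1 for uint8, 4 for float32)."""
    return num_images * height * width * channels * itemsize


# The same batch takes 4x more memory once converted to float32
print(batch_nbytes(10, 100, 200, itemsize=1), batch_nbytes(10, 100, 200, itemsize=4))
```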

In [None]:
# Organizing some information about the dataset
num_classes = len(set(labels))  # 6 classes in Trash Type
image_shape = data[0].shape     # assumes every image has the same shape

In [None]:
# Encode the string labels as integers with LabelEncoder (one-hot encoding is applied next with to_categorical)
encoder = LabelEncoder()
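`LabelEncoder` maps each class name to its index in the sorted list of unique names. `to_categorical` then turns those indices into one-hot rows. The same mapping can be reproduced with NumPy alone, which makes the resulting encoding easy to inspect. The sample labels below are hypothetical.

```python
import numpy as np

labels = ["glass", "paper", "cardboard", "glass"]  # hypothetical sample labels

# Sorted unique classes and the index of each label (what LabelEncoder computes)
classes, int_labels = np.unique(labels, return_inverse=True)

# One row of the identity matrix per label (what to_categorical produces)
one_hot = np.eye(len(classes), dtype=np.float32)[int_labels]

print(classes, int_labels)
print(one_hot)
```

`np.argmax(one_hot, axis=1)` recovers the integer labels, which is what Step 5 does with `y_test`.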

In [None]:
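X = np.array(data)
y = to_categorical(encoder.fit_transform(np.array(labels)))

# Holdout 70/30 (train+validation / test); stratify keeps class proportions, random_state makes the split reproducible
x_train_temp, x_test, y_train_temp, y_test = train_test_split(X, y, test_size=.3, shuffle=True, stratify=y, random_state=42)
# Holdout 80/20 of the remaining 70% (train / validation), i.e. roughly 56/14/30 of the full dataset
x_train, x_val, y_train, y_val = train_test_split(x_train_temp, y_train_temp, test_size=.2, shuffle=True, stratify=y_train_temp, random_state=42)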
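The second split is taken from the 70% that remains after the first, so the final proportions are 56% train, 14% validation and 30% test. Below is a sketch of that arithmetic with exact fractions. The helper name is hypothetical. Actual counts can differ by one sample, because scikit-learn rounds the test partition up.

```python
from fractions import Fraction


def holdout_fractions(test_size, val_size):
    """Hypothetical helper: final train/val/test shares of a two-stage holdout split."""
    test = Fraction(test_size)
    remaining = 1 - test                   # share left after the first split
    val = remaining * Fraction(val_size)   # second split is taken from the remainder
    train = remaining - val
    return train, val, test


# Pass Fractions (not floats) to keep the arithmetic exact
train, val, test = holdout_fractions(Fraction(3, 10), Fraction(2, 10))
print(train, val, test)  # 14/25 7/50 3/10, i.e. 56% / 14% / 30%
```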

## Step 4: Model training and testing

Topics covered in this step:
* Define which model will be used and which architectures will be evaluated
* Prepare the model(s) for the search grid
* Training
* Testing the model(s)

In [None]:
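class CNNModel(HyperModel):
    def __init__(self, input_shape, num_classes, name=None, tunable=True):
        super().__init__(name, tunable)
        self.input_shape = input_shape
        self.num_classes = num_classes

    def __choice_param(self, param, hp):
        # Sample one value from param_grid_keras; repeated calls with the same
        # name return the same value within a trial (all layers share one activation)
        return hp.Choice(param, param_grid_keras[param])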

    def build(self, hp):
        model = keras.models.Sequential()

        # Input Layer
        model.add(keras.layers.Input(shape=self.input_shape))

        # Feature Layers
        model.add(keras.layers.Conv2D(64, kernel_size=(3, 3), activation=self.__choice_param('activation', hp)))
        model.add(keras.layers.Conv2D(64, kernel_size=(3, 3), activation=self.__choice_param('activation', hp)))
        model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
        model.add(keras.layers.Conv2D(128, kernel_size=(3, 3), activation=self.__choice_param('activation', hp)))
        model.add(keras.layers.Conv2D(128, kernel_size=(3, 3), activation=self.__choice_param('activation', hp)))
        model.add(keras.layers.GlobalAveragePooling2D())
        model.add(keras.layers.Dropout(0.5))

        # Dense Layers
        model.add(keras.layers.Dense(128, activation=self.__choice_param('activation', hp)))
        model.add(keras.layers.Dense(self.num_classes, activation='softmax'))

        # Preparing model to train
        model.compile(loss = 'categorical_crossentropy',
                      optimizer=keras.optimizers.Adam(learning_rate=self.__choice_param('learning_rate', hp)),
                      metrics=['accuracy'])
        return model

    def fit(self, hp, model, *args, **kwargs):
        return model.fit(
            *args,
            batch_size=self.__choice_param('batch_size', hp),
            **kwargs,
        )
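Below is a worked sketch of the custom architecture's shapes and parameter count. It assumes Keras defaults for `Conv2D` (stride 1, `'valid'` padding) and `MaxPooling2D` (stride equal to the pool size), plus 3-channel input. `GlobalAveragePooling2D` collapses the spatial dimensions, so the parameter count does not depend on the image size. All helper names are hypothetical.

```python
def conv_stack_shapes(height, width):
    """Spatial size after each feature layer of CNNModel (conv, conv, pool, conv, conv)."""
    shapes = []
    h, w = height, width
    for layer in ("conv", "conv", "pool", "conv", "conv"):
        if layer == "conv":
            h, w = h - 2, w - 2    # 3x3 kernel, 'valid' padding, stride 1
        else:
            h, w = h // 2, w // 2  # 2x2 max pooling, stride 2
        shapes.append((h, w))
    return shapes


def conv2d_params(kernel, in_channels, filters):
    # weights (kernel * kernel * in_channels per filter) + one bias per filter
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(in_units, units):
    return (in_units + 1) * units


def cnn_param_count(in_channels=3, num_classes=6):
    # Pooling and Dropout layers have no trainable parameters
    return (conv2d_params(3, in_channels, 64)
            + conv2d_params(3, 64, 64)
            + conv2d_params(3, 64, 128)
            + conv2d_params(3, 128, 128)
            + dense_params(128, 128)
            + dense_params(128, num_classes))


print(conv_stack_shapes(64, 64))  # [(62, 62), (60, 60), (30, 30), (28, 28), (26, 26)]
print(cnn_param_count())          # 277446
```

With 3-channel images and 6 classes, `custom_best_model.summary()` should report this same total.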

### Training

In [None]:
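epochs = 1       # very short run; with a single epoch, EarlyStopping(patience=3) can never trigger
max_trials = 5   # number of hyperparameter combinations sampled by RandomSearch
early_stopping_callback = EarlyStopping(monitor='val_loss',
                                        patience=3)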
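`EarlyStopping` halts training once `val_loss` has gone `patience` consecutive epochs without improving. The simplified simulation below shows why, with `epochs = 1`, the callback can never fire. It is a hypothetical helper that ignores `min_delta`, `baseline` and `restore_best_weights`.

```python
import math


def epochs_until_early_stop(val_losses, patience):
    """Hypothetical helper: epochs run before EarlyStopping(monitor='val_loss') stops training."""
    best = math.inf
    wait = 0
    for epoch, loss in enumerate(val_losses):
        wait += 1
        if loss < best:       # improvement: remember it and reset the counter
            best = loss
            wait = 0
            continue
        if wait >= patience and epoch > 0:
            return epoch + 1  # training stops at the end of this epoch
    return len(val_losses)    # never triggered: every planned epoch runs


# Three epochs without improvement after 0.8, so training stops after 5 of 6 epochs
print(epochs_until_early_stop([1.0, 0.8, 0.9, 0.85, 0.82, 0.7], patience=3))
```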

Training: custom network

In [None]:
tuner_custom = kt.RandomSearch(
    CNNModel(image_shape, num_classes),
    objective='val_accuracy',
    directory='models/custom',
    overwrite=True,
    max_trials=max_trials)

In [None]:
tuner_custom.search(
    x_train,
    y_train,
    epochs=epochs,
    validation_data=(x_val, y_val),
    callbacks=[early_stopping_callback])

Training: ResNet

In [None]:
batch_size = 32

In [None]:
tuner_resnet = kt.RandomSearch(
    kt.applications.HyperResNet(input_shape=image_shape, classes=num_classes),
    objective='val_accuracy',
    directory='models/resnet',
    overwrite=True,
    max_trials=max_trials)

In [None]:
tuner_resnet.search(
    x_train,
    y_train,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=(x_val, y_val),
    callbacks=[early_stopping_callback])

### Retrieving the best models

In [None]:
def get_best_model(tuner):
  best_models = tuner.get_best_models(num_models=1)
  return best_models[0]

In [None]:
custom_best_model = get_best_model(tuner_custom)
custom_best_model.summary()

In [None]:
resnet_best_model = get_best_model(tuner_resnet)
resnet_best_model.summary()

In [None]:
def print_history_of_model(model):
  # Note: this trains the already-tuned model for `epochs` more epochs (with
  # batch_size=32 for both models) and plots the accuracy of that extra run
  history = model.fit(
      x_train,
      y_train,
      batch_size=batch_size,
      epochs=epochs,
      validation_data=(x_val, y_val))

  plt.figure(figsize=(6,6))
  plt.plot(history.history['accuracy'], label='training accuracy')
  plt.plot(history.history['val_accuracy'], label='validation accuracy')
  plt.title('Accuracy history')
  plt.xlabel('Epochs')
  plt.ylabel('Accuracy')
  plt.legend()
  plt.show()
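These plots show training and validation accuracy side by side. A small helper can read the same `history.history` dictionary returned by `model.fit` and report two things: the epoch with the best validation accuracy, and the train/validation gap at that epoch, which is a rough overfitting signal. The helper and the sample values are hypothetical.

```python
def best_epoch(history, metric="val_accuracy", train_metric="accuracy"):
    """Hypothetical helper: (1-based epoch, best validation value, train - val gap)."""
    values = history[metric]
    idx = max(range(len(values)), key=values.__getitem__)
    gap = history[train_metric][idx] - values[idx]
    return idx + 1, values[idx], gap


# Invented history, for illustration only
epoch, val_acc, gap = best_epoch({"accuracy": [0.5, 0.8, 0.9],
                                  "val_accuracy": [0.4, 0.6, 0.55]})
print(epoch, val_acc, round(gap, 2))  # 2 0.6 0.2
```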

In [None]:
print_history_of_model(custom_best_model)

In [None]:
print_history_of_model(resnet_best_model)

### Saving the models

In [None]:
custom_best_model.save('model_custom.keras')

In [None]:
resnet_best_model.save('model_resnet.keras')

## Step 5: Quantitative and qualitative performance analysis of the evaluated models

Topics that may be covered in this step:
* Quantitative analysis of the model(s)
* Qualitative analysis of the model(s)
* Conclusion. Include in the discussion:
  * Suggestions for improvement;
  * Challenges;
  * Next steps.
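One hedged starting point for the qualitative analysis is to list the class pairs the model confuses most often. These come from the off-diagonal cells of the confusion matrix, where rows are true classes and columns are predictions, as in `sklearn.metrics.confusion_matrix`. Below is a sketch with a hypothetical helper and an invented matrix.

```python
import numpy as np


def top_confusions(cm, class_names, k=3):
    """Hypothetical helper: the k largest off-diagonal cells as (true, predicted, count)."""
    off_diag = np.array(cm, copy=True)
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    flat = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, off_diag.shape)
    return [(class_names[r], class_names[c], int(off_diag[r, c]))
            for r, c in zip(rows, cols) if off_diag[r, c] > 0]


# Invented 3-class matrix, for illustration only
print(top_confusions([[5, 2, 0], [1, 6, 3], [0, 0, 7]], ["a", "b", "c"], k=2))
```

Plotting a few misclassified test images for the top pairs can then show why the model confuses them.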

### Evaluating model quality

In [None]:
# Evaluation function
def show_metrics(y_true, y_pred):
    # Confusion matrix (rows: true class, columns: predicted class)
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(10, 8))
    plt.title('Confusion matrix')
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.xlabel('Predicted class')
    plt.ylabel('True class')
    plt.show()

    # Accuracy
    acc = accuracy_score(y_true, y_pred)
    print(f"\nAccuracy: {acc:.4f}")

    # F1-Score (weighted by the number of true samples per class)
    f_score = f1_score(y_true, y_pred, average='weighted')
    print(f"F1-Score: {f_score:.4f}")

    # Precision
    precision = precision_score(y_true, y_pred, average='weighted')
    print(f"Precision: {precision:.4f}")

    # Recall
    recall = recall_score(y_true, y_pred, average='weighted')
    print(f"Recall: {recall:.4f}")

def load_and_predict(model_path, x_test):
    model = keras.models.load_model(model_path)
    y_pred = model.predict(x_test)  # class probabilities, shape (n_samples, num_classes)
    return np.argmax(y_pred, axis=1)

def evaluate_model(model_name, y_test_classes, y_pred):
    print(f"Metrics for {model_name}:")
    show_metrics(y_test_classes, y_pred)
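The scores above use `average='weighted'`, meaning each per-class score is weighted by that class's number of true samples. The NumPy sketch below reproduces the four numbers from a confusion matrix. It assumes the scikit-learn convention of scoring 0 for a class with no predictions. It also shows that weighted recall always equals accuracy, since Σ(nᵢ/N)(TPᵢ/nᵢ) = ΣTPᵢ/N. The helper name is hypothetical.

```python
import numpy as np


def weighted_metrics_from_cm(cm):
    """Hypothetical helper: accuracy and support-weighted precision/recall/F1 from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)    # true samples per class (rows)
    predicted = cm.sum(axis=0)  # predicted samples per class (columns)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    weights = support / support.sum()
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "precision": float(weights @ precision),
        "recall": float(weights @ recall),
        "f1": float(weights @ f1),
    }


# Invented 2-class matrix: precision 0.72, recall = accuracy = 0.7
print(weighted_metrics_from_cm([[3, 1], [2, 4]]))
```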

In [None]:
# Load the saved models and predict on the test set
y_pred_custom = load_and_predict('model_custom.keras', x_test)
y_pred_resnet = load_and_predict('model_resnet.keras', x_test)

# Convert the one-hot y_test back to class indices
y_test_classes = np.argmax(y_test, axis=1)

In [None]:
evaluate_model("Custom model", y_test_classes, y_pred_custom)

In [None]:
evaluate_model("ResNet model", y_test_classes, y_pred_resnet)