# Task
Train an AI model for medical exam and clinical document processing with the following objectives: classify exams using Machine Learning on tabular data to diagnose whether a person has a disease, and optionally perform diagnosis with image data using Convolutional Neural Networks (CNN). The solution should involve selecting public medical datasets, exploring and preprocessing the data, building and evaluating classification models using at least two techniques, interpreting the results with techniques like feature importance and SHAP, and critically discussing the practical applicability of the model while emphasizing the physician's final decision.

## Escolha e carregamento dos dados

### Subtask:
Selecionar um ou mais datasets médicos públicos adequados para o problema de classificação de doenças e carregar os dados em um formato acessível (por exemplo, pandas DataFrame).


# Task
Desenvolver um modelo de classificação de imagens utilizando Redes Neurais Convolucionais (CNN) para diagnosticar tuberculose a partir de radiografias de tórax. O modelo deve ser treinado com imagens de exames normais e com tuberculose, disponíveis nos seguintes links do Google Drive: "https://drive.google.com/drive/folders/1jCPGtOqr--sK3lXdefqd0AcS3fKaIL2S?usp=drive_link" (exames normais) e "https://drive.google.com/drive/folders/13DxNeyOJIJvCIAHfkOtrfzZAS-_6OB9l?usp=drive_link" (exames com tuberculose). O projeto deve incluir o carregamento e organização das imagens, pré-processamento, construção e treinamento da CNN, avaliação do modelo com métricas adequadas e uma discussão crítica dos resultados e sua aplicabilidade.

## Montar o google drive

### Subtask:
Acessar os arquivos de imagem diretamente do Google Drive.


**Reasoning**:
Import the necessary library and mount Google Drive to access the image files.



In [None]:
import kagglehub
from pathlib import Path

dataset_path = Path(__file__).parent / "dataset"

# Download latest version
path = kagglehub.dataset_download("tawsifurrahman/tuberculosis-tb-chest-xray-dataset", path=dataset_path)

print("Path to dataset files:", path)



MessageError: Error: credential propagation was unsuccessful

## Carregar e organizar as imagens

### Subtask:
Carregar as imagens dos links fornecidos (radiografias normais e com tuberculose) e organizá-las em estruturas de diretório adequadas para o treinamento de modelos de Deep Learning.


**Reasoning**:
Create the necessary directories and copy the images from Google Drive to these directories, then count the number of images in each directory to verify the copy operation.



**Reasoning**:
Instantiate an ImageDataGenerator for training data with rescaling and data augmentation, and another for validation data with only rescaling, both with a validation split.

**Reasoning**:
Use the ImageDataGenerator objects to create data generators for training and validation by flowing from the image directory.

**Reasoning**:
Create the training and validation data generators using flow_from_directory and print the number of images found to verify the data loading.

In [None]:
import os

normal_dir = '/content/data/normal'
tuberculosis_dir = '/content/data/tuberculosis'

print(f"Contents of {normal_dir}:")
print(os.listdir(normal_dir)[:10]) # Print first 10 files to avoid long output
print(f"\nNumber of files in {normal_dir}: {len(os.listdir(normal_dir))}")

print(f"\nContents of {tuberculosis_dir}:")
print(os.listdir(tuberculosis_dir)[:10]) # Print first 10 files to avoid long output
print(f"\nNumber of files in {tuberculosis_dir}: {len(os.listdir(tuberculosis_dir))}")

Contents of /content/data/normal:
['Normal-2029.png', 'Normal-407.png', 'Normal-1936.png', 'Normal-2413.png', 'Normal-3410.png', 'Normal-859.png', 'Normal-490.png', 'Normal-934.png', 'Normal-973.png', 'Normal-1421.png']

Number of files in /content/data/normal: 3500

Contents of /content/data/tuberculosis:
['Tuberculosis-132.png', 'Tuberculosis-128.png', 'Tuberculosis-133.png', 'Tuberculosis-168.png', 'Tuberculosis-149.png', 'Tuberculosis-113.png', 'Tuberculosis-111.png', 'Tuberculosis-137.png', 'Tuberculosis-138.png']

Number of files in /content/data/tuberculosis: 9


**Reasoning**:
The previous attempts to load and organize the images failed because the source paths were incorrect. The user needs to update the `source_normal` and `source_tuberculosis` variables with the correct paths to their image folders in Google Drive. The provided code block includes instructions for the user to update these paths. Executing this block again with the correct paths should allow the images to be copied, thus completing the subtask. I am now updating the paths based on the user's previous input and adding code to clear the destination directories before copying.

**Reasoning**:
The previous command failed because the specified paths for the image folders in Google Drive were incorrect. I need to provide the correct paths to the folders based on the Google Drive links provided in the task description. Since I don't have direct access to the Google Drive file structure from the links, I will assume the user needs to update the `source_normal` and `source_tuberculosis` variables with the actual paths to the folders after mounting Google Drive. I will regenerate the code block with comments indicating where the user should update the paths.



In [None]:
import os
import shutil

# Create directories
os.makedirs('/content/data/normal', exist_ok=True)
os.makedirs('/content/data/tuberculosis', exist_ok=True)

# Define source directories in Google Drive.
# **IMPORTANT**: Replace the paths below with the actual paths to your image folders
# after mounting your Google Drive.
# You can find the correct paths by navigating your mounted Google Drive in the Colab file explorer.
source_normal = '/content/drive/MyDrive/path/to/your/normal/images' # Update this path
source_tuberculosis = '/content/drive/MyDrive/path/to/your/tuberculosis/images' # Update this path

# Copy images from Google Drive to the new directories
try:
    for filename in os.listdir(source_normal):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_normal, filename), '/content/data/normal')

    for filename in os.listdir(source_tuberculosis):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_tuberculosis, filename), '/content/data/tuberculosis')

    # Verify the copy operation by counting files
    num_normal = len(os.listdir('/content/data/normal'))
    num_tuberculosis = len(os.listdir('/content/data/tuberculosis'))

    print(f"Number of normal images copied: {num_normal}")
    print(f"Number of tuberculosis images copied: {num_tuberculosis}")

except FileNotFoundError as e:
    print(f"Error: {e}. Please ensure the source paths in Google Drive are correct.")
except Exception as e:
    print(f"An error occurred: {e}")


Error: [Errno 2] No such file or directory: '/content/drive/MyDrive/path/to/your/normal/images'. Please ensure the source paths in Google Drive are correct.


## Pré-processamento das imagens

### Subtask:
Realizar o pré-processamento das imagens, incluindo redimensionamento, normalização e possivelmente aumento de dados (data augmentation) para preparar o dataset para a CNN.


**Reasoning**:
Import necessary libraries and define parameters for image preprocessing.



In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np

IMG_HEIGHT = 224
IMG_WIDTH = 224
BATCH_SIZE = 32

**Reasoning**:
Create ImageDataGenerators for training and validation sets, including data augmentation for training.



In [None]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True,
    shear_range=0.2,
    height_shift_range=0.2,
    width_shift_range=0.2,
    validation_split=0.2 # Split data for validation
)

validation_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2 # Use the same split as training
)

**Reasoning**:
Use the ImageDataGenerators to load images from the directories, applying preprocessing and creating data generators for training and validation.



In [None]:
data_dir = '/content/data'

train_generator = train_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training',
    shuffle=True
)

validation_generator = validation_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation',
    shuffle=False # No need to shuffle validation data
)

Found 2808 images belonging to 2 classes.
Found 701 images belonging to 2 classes.


## Carregar e organizar as imagens

### Subtask:
Carregar as imagens dos links fornecidos (radiografias normais e com tuberculose) e organizá-las em estruturas de diretório adequadas para o treinamento de modelos de Deep Learning.


**Reasoning**:
The previous attempt to load and organize the images failed because the source paths were incorrect. I need to update the source paths to the correct location of the images in the mounted Google Drive, as instructed in the prompt, and then execute the code to copy the images.



In [None]:
import os
import shutil

# Create directories
os.makedirs('/content/data/normal', exist_ok=True)
os.makedirs('/content/data/tuberculosis', exist_ok=True)

# Define source directories in Google Drive.
# **IMPORTANT**: Replace the paths below with the actual paths to your image folders
# after mounting your Google Drive.
# You can find the correct paths by navigating your mounted Google Drive in the Colab file explorer.
source_normal = '/content/drive/MyDrive/normal' # Update this path
source_tuberculosis = '/content/drive/MyDrive/tuberculosis' # Update this path

# Copy images from Google Drive to the new directories
try:
    for filename in os.listdir(source_normal):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_normal, filename), '/content/data/normal')

    for filename in os.listdir(source_tuberculosis):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_tuberculosis, filename), '/content/data/tuberculosis')

    # Verify the copy operation by counting files
    num_normal = len(os.listdir('/content/data/normal'))
    num_tuberculosis = len(os.listdir('/content/data/tuberculosis'))

    print(f"Number of normal images copied: {num_normal}")
    print(f"Number of tuberculosis images copied: {num_tuberculosis}")

except FileNotFoundError as e:
    print(f"Error: {e}. Please ensure the source paths in Google Drive are correct.")
except Exception as e:
    print(f"An error occurred: {e}")

Error: [Errno 2] No such file or directory: '/content/drive/MyDrive/normal'. Please ensure the source paths in Google Drive are correct.


## Pré-processamento das imagens

### Subtask:
Realizar o pré-processamento das imagens, incluindo redimensionamento, normalização e possivelmente aumento de dados (data augmentation) para preparar o dataset para a CNN.


## Carregar e organizar as imagens

### Subtask:
Carregar as imagens dos links fornecidos (radiografias normais e com tuberculose) e organizá-las em estruturas de diretório adequadas para o treinamento de modelos de Deep Learning.


**Reasoning**:
The previous attempt to load and organize the images failed because the source paths were incorrect. The user needs to update the `source_normal` and `source_tuberculosis` variables with the correct paths to their image folders in Google Drive. The provided code block includes instructions for the user to update these paths. Executing this block again with the correct paths should allow the images to be copied, thus completing the subtask.



In [None]:
import os
import shutil

# Create directories
os.makedirs('/content/data/normal', exist_ok=True)
os.makedirs('/content/data/tuberculosis', exist_ok=True)

# Define source directories in Google Drive.
# **IMPORTANT**: Replace the paths below with the actual paths to your image folders
# after mounting your Google Drive.
# You can find the correct paths by navigating your mounted Google Drive in the Colab file explorer.
source_normal = '/content/drive/MyDrive/turbeculose/TB_Chest_Radiography_Database/Normal' # Updated path based on the new link
source_tuberculosis = '/content/drive/MyDrive/turbeculose/TB_Chest_Radiography_Database/Tuberculosis' # Updated path based on the new link


# Copy images from Google Drive to the new directories
try:
    print(f"Attempting to copy images from: {source_normal} and {source_tuberculosis}")
    # Clear the destination directories before copying to avoid duplicating files on subsequent runs
    shutil.rmtree('/content/data/normal', ignore_errors=True)
    shutil.rmtree('/content/data/tuberculosis', ignore_errors=True)
    os.makedirs('/content/data/normal', exist_ok=True)
    os.makedirs('/content/data/tuberculosis', exist_ok=True)


    for filename in os.listdir(source_normal):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_normal, filename), '/content/data/normal')

    for filename in os.listdir(source_tuberculosis):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_tuberculosis, filename), '/content/data/tuberculosis')

    # Verify the copy operation by counting files
    num_normal = len(os.listdir('/content/data/normal'))
    num_tuberculosis = len(os.listdir('/content/data/tuberculosis'))

    print(f"Number of normal images copied: {num_normal}")
    print(f"Number of tuberculosis images copied: {num_tuberculosis}")

except FileNotFoundError as e:
    print(f"Error: {e}. Please ensure the source paths in Google Drive are correct.")
except Exception as e:
    print(f"An error occurred: {e}")

Attempting to copy images from: /content/drive/MyDrive/turbeculose/TB_Chest_Radiography_Database/Normal and /content/drive/MyDrive/turbeculose/TB_Chest_Radiography_Database/Tuberculosis
Number of normal images copied: 3500
Number of tuberculosis images copied: 700


## Pré-processamento das imagens

### Subtask:
Realizar o pré-processamento das imagens, incluindo redimensionamento, normalização e possivelmente aumento de dados (data augmentation) para preparar o dataset para a CNN.


**Reasoning**:
Instantiate an ImageDataGenerator for training data with rescaling and data augmentation, and another for validation data with only rescaling, both with a validation split.



In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True,
    shear_range=0.2,
    height_shift_range=0.2,
    width_shift_range=0.2,
    validation_split=0.2
)

validation_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

**Reasoning**:
Use the ImageDataGenerator objects to create data generators for training and validation by flowing from the image directory.



In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np

# Define parameters
IMG_HEIGHT = 224
IMG_WIDTH = 224
BATCH_SIZE = 32

data_dir = '/content/data'

train_generator = train_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training',
    shuffle=True
)

validation_generator = validation_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation',
    shuffle=False
)

Found 3360 images belonging to 2 classes.
Found 840 images belonging to 2 classes.


## Construir o modelo cnn

### Subtask:
Definir a arquitetura da Rede Neural Convolucional (CNN) para a tarefa de classificação de imagens.


**Reasoning**:
Import the necessary modules from TensorFlow and Keras to define the CNN architecture.



In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

**Reasoning**:
Instantiate a Sequential model and add the convolutional and pooling layers as described in the instructions.



In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


**Reasoning**:
Flatten the output of the convolutional layers and add dense layers for classification, then print the model summary.



In [None]:
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()

## Separar os dados

### Subtask:
Dividir o dataset de imagens em conjuntos de treino, validação e teste.


**Reasoning**:
Create the test data generator using ImageDataGenerator, ensuring it uses the 'validation' subset and shuffle is set to False.



In [None]:
test_datagen = ImageDataGenerator(rescale=1./255) # Only rescaling needed for test data

test_generator = test_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation', # Use the validation subset for testing as per the instructions
    shuffle=False
)

Found 0 images belonging to 2 classes.


## Carregar e organizar as imagens

### Subtask:
Carregar as imagens dos links fornecidos (radiografias normais e com tuberculose) e organizá-las em estruturas de diretório adequadas para o treinamento de modelos de Deep Learning.


**Reasoning**:
The previous attempts to load and organize the images failed because the source paths in Google Drive were incorrect. I need to provide the code again with instructions for the user to update the paths and execute it.



In [None]:
import os
import shutil

# Create directories
os.makedirs('/content/data/normal', exist_ok=True)
os.makedirs('/content/data/tuberculosis', exist_ok=True)

# Define source directories in Google Drive.
# **IMPORTANT**: Replace the paths below with the actual paths to your image folders
# after mounting your Google Drive.
# You can find the correct paths by navigating your mounted Google Drive in the Colab file explorer.
source_normal = '/content/drive/MyDrive/Normal' # Update this path
source_tuberculosis = '/content/drive/MyDrive/Tuberculosis' # Update this path


# Copy images from Google Drive to the new directories
try:
    print(f"Attempting to copy images from: {source_normal} and {source_tuberculosis}")
    for filename in os.listdir(source_normal):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_normal, filename), '/content/data/normal')

    for filename in os.listdir(source_tuberculosis):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_tuberculosis, filename), '/content/data/tuberculosis')

    # Verify the copy operation by counting files
    num_normal = len(os.listdir('/content/data/normal'))
    num_tuberculosis = len(os.listdir('/content/data/tuberculosis'))

    print(f"Number of normal images copied: {num_normal}")
    print(f"Number of tuberculosis images copied: {num_tuberculosis}")

except FileNotFoundError as e:
    print(f"Error: {e}. Please ensure the source paths in Google Drive are correct.")
except Exception as e:
    print(f"An error occurred: {e}")


Attempting to copy images from: /content/drive/MyDrive/Normal and /content/drive/MyDrive/Tuberculosis
Error: [Errno 2] No such file or directory: '/content/drive/MyDrive/Normal'. Please ensure the source paths in Google Drive are correct.


**Reasoning**:
The previous attempt to copy the images failed again because the source paths were incorrect. I need to reiterate the instruction for the user to correct the paths in Google Drive and then try copying the files again. Since I cannot automatically find the correct paths, I must rely on the user to provide them.



In [None]:
import os
import shutil

# Create directories
os.makedirs('/content/data/normal', exist_ok=True)
os.makedirs('/content/data/tuberculosis', exist_ok=True)

# Define source directories in Google Drive.
# **IMPORTANT**: Replace the paths below with the actual paths to your image folders
# after mounting your Google Drive.
# You can find the correct paths by navigating your mounted Google Drive in the Colab file explorer.
source_normal = '/content/drive/MyDrive/Normal' # Update this path
source_tuberculosis = '/content/drive/MyDrive/Tuberculosis' # Update this path


# Copy images from Google Drive to the new directories
try:
    print(f"Attempting to copy images from: {source_normal} and {source_tuberculosis}")
    for filename in os.listdir(source_normal):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_normal, filename), '/content/data/normal')

    for filename in os.listdir(source_tuberculosis):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_tuberculosis, filename), '/content/data/tuberculosis')

    # Verify the copy operation by counting files
    num_normal = len(os.listdir('/content/data/normal'))
    num_tuberculosis = len(os.listdir('/content/data/tuberculosis'))

    print(f"Number of normal images copied: {num_normal}")
    print(f"Number of tuberculosis images copied: {num_tuberculosis}")

except FileNotFoundError as e:
    print(f"Error: {e}. Please ensure the source paths in Google Drive are correct.")
except Exception as e:
    print(f"An error occurred: {e}")

Attempting to copy images from: /content/drive/MyDrive/Normal and /content/drive/MyDrive/Tuberculosis
Error: [Errno 2] No such file or directory: '/content/drive/MyDrive/Normal'. Please ensure the source paths in Google Drive are correct.


## Pré-processamento das imagens

### Subtask:
Realizar o pré-processamento das imagens, incluindo redimensionamento, normalização e possivelmente aumento de dados (data augmentation) para preparar o dataset para a CNN.


**Reasoning**:
Import the necessary libraries for image preprocessing.



In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np

**Reasoning**:
Define the ImageDataGenerator for training and validation sets, including data augmentation for training and normalization for both.



In [None]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True,
    shear_range=0.2,
    height_shift_range=0.2,
    width_shift_range=0.2,
    validation_split=0.2 # Split data for validation
)

validation_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2 # Use the same split as training
)

**Reasoning**:
Create the training and validation data generators using flow_from_directory and print the number of images found to verify the data loading.



In [None]:
data_dir = '/content/data'

# Assuming train_datagen and validation_datagen are already defined and configured
# from previous steps

train_generator = train_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training',
    shuffle=True
)

validation_generator = validation_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation',
    shuffle=False # No need to shuffle validation data
)

print(f"Found {train_generator.samples} training images belonging to {train_generator.num_classes} classes.")
print(f"Found {validation_generator.samples} validation images belonging to {validation_generator.num_classes} classes.")

Found 3360 images belonging to 2 classes.
Found 840 images belonging to 2 classes.
Found 3360 training images belonging to 2 classes.
Found 840 validation images belonging to 2 classes.


## Construir o modelo cnn

### Subtask:
Definir a arquitetura da Rede Neural Convolucional (CNN) para a tarefa de classificação de imagens.


**Reasoning**:
Define the CNN architecture as per the instructions, including the input shape for the first convolutional layer and print the model summary.



In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.summary()

## Separar os dados

### Subtask:
Dividir o dataset de imagens em conjuntos de treino, validação e teste.


**Reasoning**:
Import the ImageDataGenerator class.



In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

**Reasoning**:
Instantiate an ImageDataGenerator for the test set and create the test generator using flow_from_directory with the specified parameters.



In [None]:
test_datagen = ImageDataGenerator(rescale=1./255) # Only rescaling needed for test data

test_generator = test_datagen.flow_from_directory(
    '/content/data',
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation', # Using validation subset as per instructions for the test generator
    shuffle=False
)

Found 0 images belonging to 2 classes.


**Reasoning**:
Print the number of images and classes found for the test set.



In [None]:
print(f"Found {test_generator.samples} test images belonging to {test_generator.num_classes} classes.")

Found 0 test images belonging to 2 classes.


## Carregar e organizar as imagens

### Subtask:
Carregar as imagens dos links fornecidos (radiografias normais e com tuberculose) e organizá-las em estruturas de diretório adequadas para o treinamento de modelos de Deep Learning.


**Reasoning**:
The previous attempts to load and organize the images failed because the source paths in Google Drive were incorrect. I need to provide the code again, emphasizing that the user must update the placeholder paths with the correct ones from their mounted Google Drive. This is the only way to proceed with the task of loading the data.



In [None]:
import os
import shutil

# Create directories
os.makedirs('/content/data/normal', exist_ok=True)
os.makedirs('/content/data/tuberculosis', exist_ok=True)

# Define source directories in Google Drive.
# **IMPORTANT**: Replace the paths below with the actual paths to your image folders
# after mounting your Google Drive.
# You can find the correct paths by navigating your mounted Google Drive in the Colab file explorer.
source_normal = '/content/drive/MyDrive/Normal' # Update this path
source_tuberculosis = '/content/drive/MyDrive/Tuberculosis' # Update this path


# Copy images from Google Drive to the new directories
try:
    print(f"Attempting to copy images from: {source_normal} and {source_tuberculosis}")
    for filename in os.listdir(source_normal):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_normal, filename), '/content/data/normal')

    for filename in os.listdir(source_tuberculosis):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_tuberculosis, filename), '/content/data/tuberculosis')

    # Verify the copy operation by counting files
    num_normal = len(os.listdir('/content/data/normal'))
    num_tuberculosis = len(os.listdir('/content/data/tuberculosis'))

    print(f"Number of normal images copied: {num_normal}")
    print(f"Number of tuberculosis images copied: {num_tuberculosis}")

except FileNotFoundError as e:
    print(f"Error: {e}. Please ensure the source paths in Google Drive are correct.")
except Exception as e:
    print(f"An error occurred: {e}")

Attempting to copy images from: /content/drive/MyDrive/Normal and /content/drive/MyDrive/Tuberculosis
Error: [Errno 2] No such file or directory: '/content/drive/MyDrive/Normal'. Please ensure the source paths in Google Drive are correct.


## Carregar e organizar as imagens

### Subtask:
Carregar as imagens dos links fornecidos (radiografias normais e com tuberculose) e organizá-las em estruturas de diretório adequadas para o treinamento de modelos de Deep Learning.


**Reasoning**:
The previous attempts to load and organize the images failed because the source paths in Google Drive were incorrect. I need to provide the code again, emphasizing that the user *must* manually update the `source_normal` and `source_tuberculosis` variables with the correct paths to their image folders in Google Drive. I will keep the rest of the logic for creating directories and copying files as before, including the error handling and verification steps.



In [None]:
import os
import shutil

# Create directories
os.makedirs('/content/data/normal', exist_ok=True)
os.makedirs('/content/data/tuberculosis', exist_ok=True)

# Define source directories in Google Drive.
# **IMPORTANT**: Replace the paths below with the actual paths to your image folders
# after mounting your Google Drive.
# You can find the correct paths by navigating your mounted Google Drive in the Colab file explorer.
source_normal = '/content/drive/MyDrive/turbeculose/TB_Chest_Radiography_Database/Normal' # Update this path with the correct path to your normal images
source_tuberculosis = '/content/drive/MyDrive/turbeculose/TB_Chest_Radiography_Database/Tuberculosis' # Updated path based on the new link


# Copy images from Google Drive to the new directories
try:
    print(f"Attempting to copy images from: {source_normal} and {source_tuberculosis}")
    # Clear the destination directories before copying to avoid duplicating files on subsequent runs
    shutil.rmtree('/content/data/normal', ignore_errors=True)
    shutil.rmtree('/content/data/tuberculosis', ignore_errors=True)
    os.makedirs('/content/data/normal', exist_ok=True)
    os.makedirs('/content/data/tuberculosis', exist_ok=True)


    for filename in os.listdir(source_normal):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_normal, filename), '/content/data/normal')

    for filename in os.listdir(source_tuberculosis):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            shutil.copy(os.path.join(source_tuberculosis, filename), '/content/data/tuberculosis')

    # Verify the copy operation by counting files
    num_normal = len(os.listdir('/content/data/normal'))
    num_tuberculosis = len(os.listdir('/content/data/tuberculosis'))

    print(f"Number of normal images copied: {num_normal}")
    print(f"Number of tuberculosis images copied: {num_tuberculosis}")

except FileNotFoundError as e:
    print(f"Error: {e}. Please ensure the source paths in Google Drive are correct.")
except Exception as e:
    print(f"An error occurred: {e}")

Attempting to copy images from: /content/drive/MyDrive/turbeculose/TB_Chest_Radiography_Database/Normal and /content/drive/MyDrive/turbeculose/TB_Chest_Radiography_Database/Tuberculosis
Number of normal images copied: 3500
Number of tuberculosis images copied: 700


## Pré-processamento das imagens

### Subtask:
Realizar o pré-processamento das imagens, incluindo redimensionamento, normalização e possivelmente aumento de dados (data augmentation) para preparar o dataset para a CNN.


**Reasoning**:
Instantiate the training and validation ImageDataGenerators and create the image data generators using flow_from_directory, then print the number of images found.



In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True,
    shear_range=0.2,
    height_shift_range=0.2,
    width_shift_range=0.2,
    validation_split=0.2
)

validation_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

data_dir = '/content/data'

train_generator = train_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training',
    shuffle=True
)

validation_generator = validation_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation',
    shuffle=False
)

print(f"Found {train_generator.samples} training images belonging to {train_generator.num_classes} classes.")
print(f"Found {validation_generator.samples} validation images belonging to {validation_generator.num_classes} classes.")

Found 3360 images belonging to 2 classes.
Found 840 images belonging to 2 classes.
Found 3360 training images belonging to 2 classes.
Found 840 validation images belonging to 2 classes.


## Treinar o modelo cnn

### Subtask:
Compilar e treinar o modelo CNN utilizando os dados de treino e validação.


**Reasoning**:
Compile the previously defined CNN model with the specified optimizer, loss function, and metrics, then train it using the training and validation data generators for a fixed number of epochs, storing the training history.



In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define parameters
IMG_HEIGHT = 224
IMG_WIDTH = 224
BATCH_SIZE = 32
data_dir = '/content/data'

# Create ImageDataGenerators for training and validation sets
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True,
    shear_range=0.2,
    height_shift_range=0.2,
    width_shift_range=0.2,
    validation_split=0.2 # Split data for validation
)

validation_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2 # Use the same split as training
)

# Create data generators using flow_from_directory
train_generator = train_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training',
    shuffle=True
)

validation_generator = validation_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation',
    shuffle=False # No need to shuffle validation data
)

# Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // BATCH_SIZE,
    epochs=15, # Training for 15 epochs
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // BATCH_SIZE
)

Found 3360 images belonging to 2 classes.
Found 840 images belonging to 2 classes.


  self._warn_if_super_not_called()


Epoch 1/15
[1m105/105[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m471s[0m 4s/step - accuracy: 0.8145 - loss: 0.5559 - val_accuracy: 0.8582 - val_loss: 0.3837
Epoch 2/15
[1m105/105[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m458s[0m 4s/step - accuracy: 0.9151 - loss: 0.2223 - val_accuracy: 0.8702 - val_loss: 0.2488
Epoch 3/15
[1m105/105[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m453s[0m 4s/step - accuracy: 0.9300 - loss: 0.1760 - val_accuracy: 0.8702 - val_loss: 0.2992
Epoch 4/15
[1m105/105[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m503s[0m 4s/step - accuracy: 0.9411 - loss: 0.1445 - val_accuracy: 0.8834 - val_loss: 0.2168
Epoch 5/15
[1m105/105[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m502s[0m 4s/step - accuracy: 0.9283 - loss: 0.1854 - val_accuracy: 0.8438 - val_loss: 0.2730
Epoch 6/15
[1m105/105[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m501s[0m 5s/step - accuracy: 0.8525 - loss: 0.3649 - val_accuracy: 0.8762 - val_loss: 0.2792
Epoch 7/15
[1m105/105

In [None]:
# Evaluate the model on the validation set
loss, accuracy = model.evaluate(validation_generator)

print(f"Validation Loss: {loss:.4f}")
print(f"Validation Accuracy: {accuracy:.4f}")

[1m27/27[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m33s[0m 1s/step - accuracy: 0.9778 - loss: 0.0599
Validation Loss: 0.1982
Validation Accuracy: 0.9000
