## Prepare Callbacks

## Summary of the Notebook

This notebook prepares the callbacks used in a TensorFlow/Keras training workflow. A callback is an object whose methods Keras invokes at specific stages of training (for example, at the end of each epoch or batch), which lets you customize the training process without rewriting the training loop. Callbacks can be used to:

- Save model checkpoints (e.g., after each epoch or when performance improves)
- Log training metrics for visualization (e.g., with TensorBoard)
- Adjust the learning rate dynamically
- Stop training early if performance stops improving (early stopping)
- Collect custom statistics or perform custom actions

In TensorFlow/Keras, callbacks are passed to the `model.fit()` method via the `callbacks` argument.

Below is a summary of the key components and variables used in this notebook:

### Key Components:
1. **PrepareCallbacksConfig**: A dataclass that holds configuration details for callbacks, including paths for TensorBoard logs and model checkpoints.
2. **ConfigurationManager**: A class responsible for reading configuration files (`config.yaml` and `params.yaml`) and creating necessary directories.
3. **PrepareCallback**: A class that creates TensorFlow/Keras callbacks for TensorBoard logging and model checkpointing.

### Key Variables:
- **CONFIG_FILE_PATH**: Path to the configuration file (`config/config.yaml`).
- **PARAMS_FILE_PATH**: Path to the parameters file (`params.yaml`).
- **callback_list**: A list of TensorFlow/Keras callbacks, including TensorBoard and ModelCheckpoint callbacks.
- **config**: An instance of `ConfigurationManager` that manages the configuration settings.
- **create_directories**: A utility function to create directories as needed.
- **prepare_callbacks**: An instance of `PrepareCallback` that generates the required callbacks.
- **prepare_callbacks_config**: An instance of `PrepareCallbacksConfig` containing paths for TensorBoard logs and model checkpoints.
- **read_yaml**: A utility function to read YAML configuration files.
- **timestamp**: A string representing the current timestamp, used for naming TensorBoard log directories.

This notebook integrates configuration management, directory creation, and callback preparation to streamline the training process of machine learning models.
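
To make the callback mechanism described in this summary concrete, here is a minimal pure-Python sketch of a training loop that fires a hook after each epoch. It is a toy illustration only, not how Keras is implemented: `Callback`, `HistoryRecorder`, `EarlyStopper`, and `toy_fit` are hypothetical names, and the loss values are made up.

```python
class Callback:
    """Base class with a no-op hook (hypothetical; loosely mirrors Keras-style hooks)."""

    def on_epoch_end(self, epoch, logs):
        pass


class HistoryRecorder(Callback):
    """Collects the loss reported at the end of every epoch."""

    def __init__(self):
        self.losses = []

    def on_epoch_end(self, epoch, logs):
        self.losses.append(logs["loss"])


class EarlyStopper(Callback):
    """Requests a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=1):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0
        self.stop_training = False

    def on_epoch_end(self, epoch, logs):
        if logs["loss"] < self.best:
            self.best = logs["loss"]
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stop_training = True


def toy_fit(losses_per_epoch, callbacks):
    """Pretend to train: one fake loss per epoch, hooks fire after each epoch."""
    epochs_run = 0
    for epoch, loss in enumerate(losses_per_epoch):
        epochs_run += 1
        for cb in callbacks:
            cb.on_epoch_end(epoch, {"loss": loss})
        # Stop as soon as any callback has asked for it
        if any(getattr(cb, "stop_training", False) for cb in callbacks):
            break
    return epochs_run


recorder = HistoryRecorder()
stopper = EarlyStopper(patience=2)
epochs_run = toy_fit([0.9, 0.7, 0.8, 0.75, 0.6], [recorder, stopper])
print(epochs_run, recorder.losses)
```

The training loop knows nothing about what each callback does; it only calls the hook. That separation is what lets logging, checkpointing, and early stopping be combined freely in one list.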

In [1]:
# Import os for directory operations
import os

In [2]:
# Print the current working directory (for debugging path issues)
%pwd

'/home/zkhechadoorian/CNNs_Cats_and_Dogs/research'

In [3]:
# (Optional) Change working directory up one level if needed for imports or file access
os.chdir("../")

In [4]:
# Print the new working directory after changing it
%pwd

'/home/zkhechadoorian/CNNs_Cats_and_Dogs'

In [5]:
# Define a dataclass for callback configuration
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class PrepareCallbacksConfig:
    root_dir: Path  # Directory for artifacts
    tensorboard_root_log_dir: Path  # Directory for TensorBoard logs
    checkpoint_model_filepath: Path  # Filepath for model checkpoints
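
Because the dataclass is declared with `frozen=True`, its fields cannot be reassigned after construction. The sketch below shows that behavior with a stand-in class. `DemoCallbacksConfig`, the `root_dir` value, and the file name `demo_model.h5` are assumptions made for illustration; only the two directory names are taken from the log output later in this notebook.

```python
from dataclasses import dataclass, FrozenInstanceError
from pathlib import Path


@dataclass(frozen=True)
class DemoCallbacksConfig:  # hypothetical stand-in for PrepareCallbacksConfig
    root_dir: Path
    tensorboard_root_log_dir: Path
    checkpoint_model_filepath: Path


demo = DemoCallbacksConfig(
    root_dir=Path("artifacts/prepare_callbacks"),  # assumed value
    tensorboard_root_log_dir=Path("artifacts/prepare_callbacks/tensorboard_log_dir"),
    # The checkpoint file name is hypothetical
    checkpoint_model_filepath=Path("artifacts/prepare_callbacks/checkpoint_dir/demo_model.h5"),
)

try:
    demo.root_dir = Path("elsewhere")  # assignment on a frozen dataclass fails
    reassigned = True
except FrozenInstanceError:
    reassigned = False

print(reassigned)
```

Freezing the config is a reasonable choice here: the paths are read once from YAML and then shared, so accidental mutation during training would be a bug rather than a feature.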

In [6]:
# Import project constants and utility functions for YAML reading and directory creation
from ImageClassification.constants import *
from ImageClassification.utils import read_yaml, create_directories

In [7]:
# ConfigurationManager reads config and params YAML files and prepares config objects
class ConfigurationManager:
    def __init__(
        self, 
        config_filepath = CONFIG_FILE_PATH,
        params_filepath = PARAMS_FILE_PATH):
        # Read configuration and parameter YAML files
        self.config = read_yaml(config_filepath)
        self.params = read_yaml(params_filepath)
        # Create the root artifact directory
        create_directories([self.config.artifacts_root])

    def get_prepare_callback_config(self) -> PrepareCallbacksConfig:
        # Extract the prepare_callbacks section from config
        config = self.config.prepare_callbacks
        model_ckpt_dir = os.path.dirname(config.checkpoint_model_filepath)
        # Create directories for checkpoints and TensorBoard logs
        create_directories([
            Path(model_ckpt_dir),
            Path(config.tensorboard_root_log_dir)
        ])

        # Build and return the config dataclass
        prepare_callback_config = PrepareCallbacksConfig(
            root_dir=Path(config.root_dir),
            tensorboard_root_log_dir=Path(config.tensorboard_root_log_dir),
            checkpoint_model_filepath=Path(config.checkpoint_model_filepath)
        )

        return prepare_callback_config
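
The directory handling in `get_prepare_callback_config` can be sketched with the standard library alone. The project's `create_directories` helper is not shown in this notebook, so `make_dirs` below is a hypothetical stand-in that assumes the helper simply creates each directory if it is missing. The file name `demo_model.h5` is also made up.

```python
import os
import tempfile
from pathlib import Path


def make_dirs(paths):
    """Hypothetical stand-in for create_directories: create each directory if missing."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    ckpt_file = Path(tmp) / "checkpoint_dir" / "demo_model.h5"
    tb_dir = Path(tmp) / "tensorboard_log_dir"

    # Only the parent directory of the checkpoint file is created, not the file itself
    ckpt_dir = os.path.dirname(ckpt_file)

    make_dirs([Path(ckpt_dir), tb_dir])
    make_dirs([Path(ckpt_dir), tb_dir])  # repeating the call is harmless

    dirs_exist = os.path.isdir(ckpt_dir) and tb_dir.is_dir()
    file_exists = ckpt_file.exists()

print(dirs_exist, file_exists)
```

Taking `os.path.dirname` of the checkpoint path matters because the config stores a file path, while the directory must exist before the first checkpoint is written.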

In [8]:
# Import TensorFlow and time, which the callback class below depends on
# (os was already imported in the first cell)
import tensorflow as tf
import time

2025-09-10 15:58:45.087352: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-09-10 15:58:45.087791: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-09-10 15:58:45.153874: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

In [9]:
# PrepareCallback creates TensorBoard and ModelCheckpoint callbacks
class PrepareCallback:
    def __init__(self, config: PrepareCallbacksConfig):
        self.config = config

    @property
    def _create_tb_callbacks(self):
        # Create a TensorBoard callback with a timestamped log directory
        timestamp = time.strftime("%Y-%m-%d-%H-%M-%S")
        tb_running_log_dir = os.path.join(
            self.config.tensorboard_root_log_dir,
            f"tb_logs_at_{timestamp}",
        )
        return tf.keras.callbacks.TensorBoard(log_dir=tb_running_log_dir)

    @property
    def _create_ckpt_callbacks(self):
        # Create a ModelCheckpoint callback that overwrites the saved model
        # only when the monitored metric (val_loss by default) improves
        return tf.keras.callbacks.ModelCheckpoint(
            filepath=str(self.config.checkpoint_model_filepath),  # Pass the Path as a plain string
            save_best_only=True
        )

    def get_tb_ckpt_callbacks(self):
        # Return a list of both callbacks for use in model training
        return [
            self._create_tb_callbacks,
            self._create_ckpt_callbacks
        ]
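
The effect of `save_best_only=True` can be pictured with a small pure-Python model. This is a sketch of the idea, not the Keras implementation: `BestOnlySaver` is a hypothetical class, the validation losses are invented, and it assumes a metric where lower is better (as with `val_loss`, the default metric monitored by `ModelCheckpoint`).

```python
class BestOnlySaver:
    """Toy model of save_best_only=True for a 'lower is better' metric."""

    def __init__(self):
        self.best = float("inf")
        self.saved_epochs = []  # epochs at which a checkpoint would be written

    def on_epoch_end(self, epoch, val_loss):
        # Save only when the monitored value beats the best seen so far
        if val_loss < self.best:
            self.best = val_loss
            self.saved_epochs.append(epoch)


saver = BestOnlySaver()
for epoch, val_loss in enumerate([0.62, 0.55, 0.58, 0.51, 0.53]):
    saver.on_epoch_end(epoch, val_loss)

print(saver.saved_epochs, saver.best)
```

Since each save overwrites the same file path, the file on disk at the end of training corresponds to the best epoch rather than the last one. This also means the monitored metric must actually be reported during training, which for `val_loss` requires validation data.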

In [10]:
# Example usage: Prepare and get the callbacks for training
# (any exception propagates unchanged, so no try/except wrapper is needed)
config = ConfigurationManager()
prepare_callbacks_config = config.get_prepare_callback_config()
prepare_callbacks = PrepareCallback(config=prepare_callbacks_config)
callback_list = prepare_callbacks.get_tb_ckpt_callbacks()
# callback_list can now be passed to model.fit()

[2025-09-10 15:58:47,721: INFO: common: yaml file: config/config.yaml loaded successfully]
[2025-09-10 15:58:47,723: INFO: common: yaml file: params.yaml loaded successfully]
[2025-09-10 15:58:47,723: INFO: common: created directory at: artifacts]
[2025-09-10 15:58:47,724: INFO: common: created directory at: artifacts/prepare_callbacks/checkpoint_dir]
[2025-09-10 15:58:47,725: INFO: common: created directory at: artifacts/prepare_callbacks/tensorboard_log_dir]


In [11]:
# Example: Get the directory name from a path (utility demonstration)
import os
os.path.dirname("x/y/z.txt")

'x/y'
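
Since the config class already stores `Path` objects, the same result is available through `pathlib`. The sketch below uses `PurePosixPath` so that the string form is identical on every platform; that choice is an assumption made for the example, not something the notebook requires.

```python
import os
from pathlib import PurePosixPath

p = PurePosixPath("x/y/z.txt")

# .parent is the pathlib counterpart of os.path.dirname
parent_str = str(p.parent)
same = parent_str == os.path.dirname("x/y/z.txt")

print(parent_str, same)
```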

In [12]:
# Example: Generate a timestamped string for log directories
import time
timestamp = time.strftime("%Y-%m-%d-%H-%M-%S")
f"tb_logs_at_{timestamp}"

'tb_logs_at_2025-09-10-15-58-47'
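
One useful property of this format is that the fields run from most to least significant and are zero-padded, so sorting the directory names alphabetically also sorts them chronologically. The sketch below checks that with two fixed UTC times instead of the current clock, so the result is reproducible. `FMT` and the variable names are illustrative only.

```python
import time

FMT = "%Y-%m-%d-%H-%M-%S"

# Fixed instants (Unix epoch and one day later) instead of the current time
epoch_name = f"tb_logs_at_{time.strftime(FMT, time.gmtime(0))}"
later_name = f"tb_logs_at_{time.strftime(FMT, time.gmtime(86400))}"

# Alphabetical order matches chronological order
names_sorted = sorted([later_name, epoch_name])

print(names_sorted)
```

This makes it easy to find the most recent run in the TensorBoard log directory by name alone.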