# Introduction to Keras Tuner

Adapted and expanded from the original code by Umberto Michelucci.

Original code Copyright 2020 The TensorFlow Authors

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


## Overview

Keras Tuner (the `keras_tuner` package, imported below as `kt`) is a library in the ecosystem around Keras, a popular open-source Python library for deep learning. Its primary purpose is hyperparameter tuning for Keras models, which is an essential step in optimizing machine learning models for better performance.

Here are the key aspects of `keras_tuner`:

1. **Functionality**: `keras_tuner` provides a simple and efficient way to search for good hyperparameter values for your Keras models. Hyperparameters include the number of layers, their types, the number of neurons in each layer, the learning rate, the activation functions, and more.

2. **Tuners Available**: It offers several tuning algorithms, including Random Search, Hyperband, and Bayesian Optimization. Each of these tuners has its own strategy for exploring the hyperparameter space.
   - **Random Search**: Tests a random selection of hyperparameter values within the predefined search space.
   - **Hyperband**: An optimized version of random search which uses adaptive resource allocation and early-stopping to quickly converge on a high-performing model.
   - **Bayesian Optimization**: Models the function mapping from hyperparameters to a target score and uses this model to select promising hyperparameters to evaluate next with an actual training run.

3. **Ease of Integration**: `keras_tuner` is designed to integrate with Keras models, which makes it relatively straightforward to add hyperparameter tuning to your existing model-building workflow.

4. **Customization**: Users can define their own search space for hyperparameters, allowing for extensive customization and experimentation. This includes setting parameters like the number of layers, the number of units in each layer, learning rates, and other model hyperparameters.

5. **Search Process**: During the search, `keras_tuner` tests different combinations of hyperparameters, chosen according to the selected tuner's strategy, to find the combination that yields the best performance on a validation dataset.

6. **Results Analysis**: After the tuning process, it provides detailed results about each trial (set of hyperparameters) including its performance, which can be analyzed to understand how different hyperparameters affect model performance.

7. **Practical Applications**: `keras_tuner` is used in deep learning projects where finding the right set of hyperparameters is crucial for model performance, such as image recognition, natural language processing, and predictive modeling.
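
To make the idea of a tuner concrete, here is a minimal sketch of random search in plain Python. It does not use `keras_tuner` at all: `toy_validation_score` is a hypothetical stand-in for "train a model and measure its validation accuracy", and the value ranges simply mirror the search space defined later in this tutorial.

```python
import random


def toy_validation_score(units, learning_rate):
    """Hypothetical stand-in for a validation metric (not a real model).

    The score is highest for units=256 and learning_rate=1e-3.
    """
    unit_penalty = abs(units - 256) / 512
    lr_penalty = 0.0 if learning_rate == 1e-3 else 0.1
    return 1.0 - unit_penalty - lr_penalty


def random_search(num_trials, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    unit_choices = list(range(32, 513, 32))   # 32, 64, ..., 512
    lr_choices = [1e-2, 1e-3, 1e-4]
    trials = []
    for _ in range(num_trials):
        config = {
            "units": rng.choice(unit_choices),
            "learning_rate": rng.choice(lr_choices),
        }
        trials.append((toy_validation_score(**config), config))
    best_score, best_config = max(trials, key=lambda t: t[0])
    return best_score, best_config, trials


best_score, best_config, trials = random_search(num_trials=20)
print(best_config, round(best_score, 3))
```

A real tuner replaces the toy score with an actual training run per trial, which is why smarter strategies such as Hyperband try to spend fewer epochs on unpromising configurations.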


## Setup

In [1]:
import tensorflow as tf
from tensorflow import keras

Install and import the Keras Tuner.

In [3]:
# Run this only if necessary
!pip install -q -U keras-tuner




In [4]:
import keras_tuner as kt

## Download and prepare the dataset

In this tutorial, you will use the Keras Tuner to find the best hyperparameters for a machine learning model that classifies images of clothing from the [Fashion MNIST dataset](https://github.com/zalandoresearch/fashion-mnist).

## Dataset Description

The Zalando MNIST dataset, better known as Fashion-MNIST, comprises 70,000 grayscale images of 10 different kinds of fashion products from Zalando, a large European e-commerce company. It was created as a more challenging replacement for the traditional MNIST dataset of handwritten digits. Here are some key details about the Fashion-MNIST dataset:

1. **Content**: The dataset contains 70,000 grayscale images, each 28x28 pixels, divided into 10 fashion categories such as T-shirts/tops, trousers, pullovers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots.

2. **Training and Testing Split**: Similar to the original MNIST, it includes 60,000 training images and 10,000 test images. This standard split facilitates consistent evaluation of machine learning models.

3. **Purpose**: Fashion-MNIST was designed to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It introduces more complexity compared to the original MNIST, making it a better representative of modern computer vision tasks.

4. **Use Cases**: It's widely used for machine learning and computer vision tasks like classification, image recognition, and machine learning model performance evaluation.

5. **Accessibility and Usability**: Like MNIST, Fashion-MNIST is easily accessible and can be used with common machine learning libraries. It's suitable for both beginners and advanced researchers, providing a more challenging dataset than MNIST while maintaining a similar size and structure.

6. **Benchmarking**: Since its introduction, Fashion-MNIST has been adopted by the machine learning community as a benchmark dataset, often used in academic papers and machine learning competitions to evaluate the performance of various algorithms.

7. **Educational Value**: For educational purposes, Fashion-MNIST offers a more complex challenge than MNIST while being more comprehensible and visually interpretable than more complex datasets like ImageNet.

Load the data.

In [5]:
(img_train, label_train), (img_test, label_test) = keras.datasets.fashion_mnist.load_data()

In [6]:
# Normalize pixel values between 0 and 1
img_train = img_train.astype('float32') / 255.0
img_test = img_test.astype('float32') / 255.0

## Define the model

When you build a model for hypertuning, you also define the hyperparameter search space in addition to the model architecture. The model you set up for hypertuning is called a *hypermodel*.

You can define a hypermodel through two approaches:

* By using a model builder function
* By subclassing the `HyperModel` class of the Keras Tuner API

You can also use two pre-defined [HyperModel](https://keras.io/api/keras_tuner/hypermodels/) classes - [HyperXception](https://keras.io/api/keras_tuner/hypermodels/hyper_xception/) and [HyperResNet](https://keras.io/api/keras_tuner/hypermodels/hyper_resnet/) for computer vision applications.

In this tutorial, you use a model builder function to define the image classification model. The model builder function returns a compiled model and uses hyperparameters you define inline to hypertune the model.

In [7]:
def model_builder(hp):
  model = keras.Sequential()
  model.add(keras.layers.Flatten(input_shape=(28, 28)))

  # Tune the number of units in the first Dense layer
  # Choose an optimal value between 32-512
  hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
  model.add(keras.layers.Dense(units=hp_units, activation='relu'))
  model.add(keras.layers.Dense(10))

  # Tune the learning rate for the optimizer
  # Choose an optimal value from 0.01, 0.001, or 0.0001
  hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

  model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])

  return model
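
As a quick sanity check on the size of this search space, the sketch below enumerates every combination that `model_builder` can produce. It uses only the standard library; the variable names are illustrative and not part of the Keras Tuner API.

```python
from itertools import product

# hp.Int('units', min_value=32, max_value=512, step=32) includes both endpoints.
unit_values = list(range(32, 512 + 1, 32))

# hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
learning_rates = [1e-2, 1e-3, 1e-4]

# Every (units, learning_rate) pair the tuner could try.
search_space = list(product(unit_values, learning_rates))

print(len(unit_values), len(learning_rates), len(search_space))  # 16 3 48
```

With only 48 combinations an exhaustive search would be feasible here; tuners such as Hyperband become more valuable as the number of hyperparameters grows.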

## Hyperparameter tuning with Hyperband

Instantiate the tuner to perform the hypertuning. The Keras Tuner has four tuners available: `RandomSearch`, `Hyperband`, `BayesianOptimization`, and `SklearnTuner`. In this tutorial, you use the [Hyperband](https://arxiv.org/pdf/1603.06560.pdf) tuner.

To instantiate the Hyperband tuner, you must specify the hypermodel, the `objective` to optimize and the maximum number of epochs to train (`max_epochs`).

Hyperband in Keras Tuner (`keras_tuner.Hyperband`) is an implementation of the Hyperband hyperparameter tuning algorithm for Keras models. It is an efficient method particularly suited to large hyperparameter spaces and complex models. Here's a detailed description of Hyperband in the context of Keras:

1. **Algorithm Overview**: 
   - Hyperband is based on the concept of adaptive resource allocation and early-stopping. It is an extension of Random Search but incorporates a systematic way to decide how many resources (like epochs) to allocate to each trial (set of hyperparameters) and when to stop underperforming trials.

2. **Efficient Exploration**: 
   - Unlike traditional methods that evaluate each hyperparameter combination for a fixed amount of resources, Hyperband dynamically allocates resources. It starts by evaluating many configurations with a small amount of resources and progressively gives more resources to promising configurations in subsequent rounds.

3. **Integration with Keras**: 
   - `Hyperband` is provided by the `keras_tuner` package. It is designed to work with Keras models, so the model architecture and the hyperparameters to tune are specified together in one place.

4. **Key Parameters**: 
   - `max_epochs`: The maximum number of epochs to train a single model. It's the upper limit of resources that Hyperband can allocate to any trial.
   - `objective`: The metric to be optimized, given either as the name of a metric (for example `'val_accuracy'`) or as a `keras_tuner.Objective` that states the metric name and whether to minimize or maximize it.
   - `factor`: The reduction factor that decides how much the number of configurations is reduced in each round.
   - `hyperband_iterations`: The number of times to run the hyperband algorithm (each with different random seeds).

5. **Process**: 
   - Hyperband runs in a series of "brackets". Each bracket comprises multiple rounds of training and evaluation, where each subsequent round trains fewer models for more epochs.
   - Initially, a large number of models are trained for a small number of epochs. Only the top-performing models (as per the specified `objective`) proceed to the next round, where they are trained for longer. This process repeats, reducing the number of models and increasing the epochs each time, until the best-performing models are identified.

6. **Advantages**: 
   - Hyperband is particularly effective when dealing with large datasets and complex models because it quickly discards poor-performing configurations.
   - It can significantly reduce the computational cost and time required for hyperparameter tuning compared to traditional methods.

7. **Usage in Keras**: 
   - To use Hyperband in Keras, you define a model-building function, specify the hyperparameter space, and then pass these to the `Hyperband` tuner. The tuner then manages the training and evaluation process, providing you with the best hyperparameters found.
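
The "many models for few epochs, then fewer models for more epochs" process described above can be sketched in plain Python. This is a simplified illustration of a single bracket of successive halving, not Keras Tuner's actual implementation: `successive_halving`, `toy_evaluate`, and the `quality` field are hypothetical names, and the learning curve is made up.

```python
import random


def successive_halving(configs, evaluate, min_epochs=1, max_epochs=10, factor=3):
    """Run one bracket: score all configs, keep the top 1/factor, train longer."""
    survivors = list(configs)
    epochs = min_epochs
    history = []  # (epochs, number of configs trained) for each round
    while True:
        scored = sorted(
            ((evaluate(c, epochs), c) for c in survivors),
            key=lambda t: t[0],
            reverse=True,
        )
        history.append((epochs, len(survivors)))
        if len(survivors) == 1 or epochs >= max_epochs:
            return scored[0][1], history
        keep = max(1, len(survivors) // factor)
        survivors = [c for _, c in scored[:keep]]
        epochs = min(max_epochs, epochs * factor)


def toy_evaluate(config, epochs):
    # Hypothetical learning curve: the score approaches config["quality"]
    # as the number of epochs grows.
    return config["quality"] * (1.0 - 0.5 ** epochs)


rng = random.Random(0)
configs = [{"id": i, "quality": rng.random()} for i in range(9)]
best, history = successive_halving(configs, toy_evaluate)
print(history)  # [(1, 9), (3, 3), (9, 1)]
print(best["id"])
```

Nine configurations are trained for 1 epoch, the best three for 3 epochs, and the single best for 9 epochs, so most of the budget goes to the candidates that looked promising early on.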


In [8]:
tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     max_epochs=10,
                     factor=3,
                     directory='my_dir',
                     project_name='intro_to_kt')



The Hyperband tuning algorithm uses adaptive resource allocation and early-stopping to quickly converge on a high-performing model. This is done using a sports championship style bracket. The algorithm trains a large number of models for a few epochs and carries forward only the top-performing fraction of models to the next round (roughly 1/`factor` of them, so about a third with `factor=3`). Hyperband determines the number of models to train in a bracket by computing 1 + log<sub>`factor`</sub>(`max_epochs`) and rounding it up to the nearest integer.
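
For the settings used in this tutorial, the expression 1 + log<sub>`factor`</sub>(`max_epochs`) can be evaluated directly. The helper below only evaluates that expression; `bracket_size` is an illustrative name, not a Keras Tuner function.

```python
import math


def bracket_size(max_epochs, factor):
    """Evaluate ceil(1 + log_factor(max_epochs)) as stated in the text."""
    return math.ceil(1 + math.log(max_epochs, factor))


# max_epochs=10 and factor=3, as in the tuner above:
# log_3(10) is about 2.096, so 1 + 2.096 = 3.096, which rounds up to 4.
print(bracket_size(max_epochs=10, factor=3))  # 4
```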

Early stopping is a technique, often loosely classed as regularization, used when training neural networks to prevent overfitting. It involves monitoring the performance of the model on a validation dataset and stopping the training process when the performance starts to degrade or stops improving. Here's a detailed description of the early stopping approach:

1. **Objective**: The primary goal of early stopping is to halt the training at the point when the model is generalized enough to perform well on unseen data, but before it starts to overfit the training data.

2. **How It Works**:
   - During training, the model's performance is continually evaluated on a separate validation dataset that is not used for the actual training.
   - After each epoch (or a set number of epochs), the algorithm checks how the model's performance on the validation set has changed.
   - If the model's performance on the validation set improves or remains the same, training continues.
   - If the model's performance on the validation set starts to worsen (e.g., the validation loss starts to increase), it's a sign that the model may be beginning to overfit the training data.

3. **Stopping Criteria**:
   - A common criterion is to stop training when the validation loss has not decreased for a specified number of epochs, often referred to as the "patience" parameter.
   - Alternatively, training can be stopped based on other metrics, such as accuracy or F1 score, depending on the specific task.

4. **Restoring the Best Model**:
   - When early stopping is triggered, it's common practice to restore the weights of the model to the state when it performed the best on the validation set. This ensures that the model retains the generalization capability it had before it started overfitting.

5. **Benefits**:
   - **Prevents Overfitting**: By stopping the training before the model overfits the data, early stopping helps in maintaining the model's ability to generalize to new data.
   - **Saves Time and Resources**: It reduces the number of unnecessary training epochs, saving computational resources and time.
   - **Automatic and Simple**: It's an automated approach that doesn't require manual intervention and is easy to implement in most deep learning frameworks.

6. **Implementation in Deep Learning Frameworks**:
   - Early stopping is supported in many deep learning frameworks as a built-in function. In frameworks like TensorFlow/Keras, it is implemented as a callback function that can be easily added to the training process.

7. **Tuning Early Stopping**:
   - The patience parameter and the specific metric to monitor are crucial aspects of early stopping and may require tuning based on the dataset and problem.
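
The patience rule described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras callback itself (it ignores options such as a minimum improvement threshold); `early_stopping_epoch` is a hypothetical helper and the validation losses are made-up values.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            wait = 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch  # never triggered


# Made-up validation losses: best value at epoch 3, then steadily worse.
losses = [0.40, 0.36, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.41]
print(early_stopping_epoch(losses, patience=5))  # (8, 3)
print(early_stopping_epoch(losses, patience=2))  # (5, 3)
```

The second value in each result is the epoch whose weights would be restored when the "restore the best model" option is used.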


Create a callback that stops training early when the validation loss has not improved for 5 consecutive epochs (`patience=5`).

In [9]:
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

Run the hyperparameter search. The arguments for the `search` method are the same as those used for `tf.keras.Model.fit`, plus the callback above.

In [10]:
# CAREFUL: the search takes some time!
tuner.search(img_train, label_train, epochs=50, validation_split=0.2, callbacks=[stop_early])

# Get the optimal hyperparameters
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")

Trial 30 Complete [00h 00m 09s]
val_accuracy: 0.8772500157356262

Best val_accuracy So Far: 0.890333354473114
Total elapsed time: 00h 03m 49s

The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is 288 and the optimal learning rate for the optimizer
is 0.001.



## Train the model

Find the optimal number of epochs to train the model with the hyperparameters obtained from the search.

In [11]:
# Build the model with the optimal hyperparameters and train it on the data for 50 epochs
model = tuner.hypermodel.build(best_hps)
history = model.fit(img_train, label_train, epochs=50, validation_split=0.2)

val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))

Epoch 1/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.7790 - loss: 0.6148 - val_accuracy: 0.8553 - val_loss: 0.4030
Epoch 2/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8650 - loss: 0.3748 - val_accuracy: 0.8727 - val_loss: 0.3568
Epoch 3/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8806 - loss: 0.3232 - val_accuracy: 0.8715 - val_loss: 0.3556
Epoch 4/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8894 - loss: 0.2996 - val_accuracy: 0.8759 - val_loss: 0.3352
Epoch 5/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8955 - loss: 0.2834 - val_accuracy: 0.8770 - val_loss: 0.3424
Epoch 6/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.9003 - loss: 0.2712 - val_accuracy: 0.8867 - val_loss: 0.3159
Epoch 7/50
...

Re-instantiate the hypermodel and train it with the optimal number of epochs from above.

In [12]:
hypermodel = tuner.hypermodel.build(best_hps)

# Retrain the model
hypermodel.fit(img_train, label_train, epochs=best_epoch, validation_split=0.2)

Epoch 1/17
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.7852 - loss: 0.6226 - val_accuracy: 0.8515 - val_loss: 0.3991
Epoch 2/17
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8628 - loss: 0.3777 - val_accuracy: 0.8687 - val_loss: 0.3674
Epoch 3/17
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8783 - loss: 0.3329 - val_accuracy: 0.8780 - val_loss: 0.3448
Epoch 4/17
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8859 - loss: 0.3091 - val_accuracy: 0.8833 - val_loss: 0.3243
Epoch 5/17
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8942 - loss: 0.2866 - val_accuracy: 0.8808 - val_loss: 0.3307
Epoch 6/17
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8997 - loss: 0.2670 - val_accuracy: 0.8709 - val_loss: 0.3522
Epoch 7/17
...

<keras.src.callbacks.history.History at 0x38853c580>

To finish this tutorial, evaluate the hypermodel on the test data.

In [13]:
eval_result = hypermodel.evaluate(img_test, label_test)
print("[test loss, test accuracy]:", eval_result)


[test loss, test accuracy]: [0.5223038792610168, 0.8871999979019165]


The `my_dir/intro_to_kt` directory contains detailed logs and checkpoints for every trial (model configuration) run during the hyperparameter search. If you re-run the hyperparameter search, the Keras Tuner uses the existing state from these logs to resume the search. To disable this behavior, pass an additional `overwrite=True` argument while instantiating the tuner.

## Summary

In this tutorial, you learned how to use the Keras Tuner to tune hyperparameters for a model. To learn more about the Keras Tuner, check out these additional resources:

* [Keras Tuner on the TensorFlow blog](https://blog.tensorflow.org/2020/01/hyperparameter-tuning-with-keras-tuner.html)
* [Keras Tuner website](https://keras-team.github.io/keras-tuner/)

Also check out the [HParams Dashboard](https://www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams) in TensorBoard to interactively tune your model hyperparameters (in case you are using TensorBoard).