Keras models fit() method not converging with eager execution #18642

Closed
fantauzzi opened this issue Apr 18, 2018 · 1 comment
fantauzzi commented Apr 18, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Yes, see below.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Ubuntu 16.04
  • TensorFlow installed from (source or binary):
    Binary, with pip3
  • TensorFlow version (use command below):
    v1.7.0-3-g024aecf414 1.7.0
  • Python version:
    3.5.2
  • Bazel version (if compiling from source):
    N/A
  • GCC/Compiler version (if compiling from source):
    N/A
  • CUDA/cuDNN version:
    9.0.176/7.0.5
  • GPU model and memory:
    GTX 1080 (Armor MSI) with 8 GB VRAM
  • Exact command to reproduce:
    Run the program below

Describe the problem

After enabling eager execution, the tf.keras.models.Sequential.fit() method does not converge: the computed loss does not decrease (it stays around 9 to 11). In addition, fit() does not report the requested "accuracy" metric: it neither prints it to the console nor returns it in the History object.

After disabling eager execution, the same optimization converges (the loss drops to around 1), and fit() correctly reports the requested "accuracy" metric, both to the console and in the returned History object.
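For reference, the missing-metric symptom can be reproduced without the wine dataset. The following is a minimal sketch with random stand-in data (the sample count, layer sizes, and epoch count are placeholders, not the original experiment); it should exhibit the same behavior described above, with history.history lacking the 'acc' key when eager execution is enabled.

import numpy as np
import tensorflow as tf

tf.enable_eager_execution()  # remove this line to see the expected behavior

# Random stand-in data: 512 samples, 11 features, 11 one-hot-encoded classes
X = np.random.rand(512, 11).astype(np.float32)
y = tf.keras.utils.to_categorical(np.random.randint(0, 11, size=512), num_classes=11)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(11,)),
    tf.keras.layers.Dense(11, activation='softmax')
])
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(X, y, epochs=5, batch_size=64, verbose=2)
print(sorted(history.history.keys()))  # 'acc' is missing under eager execution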

Source code / logs

import os
import pandas as pd
import numpy as np
import time
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.contrib.eager as tfe

"""
Check the beginning of main() for parameters
"""


def load_data(folder):
    file_name = os.path.join(folder, 'winequality-data.csv')
    if os.path.isfile(file_name):
        data = pd.read_csv(file_name)
    else:
        print('File {} not found.'.format(file_name))
        print('Dataset can be downloaded from https://www.kaggle.com/c/uci-wine-quality-dataset/data')
        exit(1)
    # solutions = pd.read_csv(os.path.join(folder, 'winequality-solution-input.csv'))
    return data


def train_input_fn(features, labels, batch_size):
    features_tensor = tf.constant(features)
    labels_tensor = tf.constant(labels)
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((features_tensor, labels_tensor))

    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(len(labels)).repeat(count=1).batch(batch_size)

    # Return the dataset.
    return dataset


def loss(model, X, y):
    logits = model(X)
    the_loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
    return the_loss


def grad(model, inputs, targets):
    with tfe.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return tape.gradient(loss_value, model.variables)


def train(model, X, y, batch_size, epochs):
    train_ds = train_input_fn(X, y, batch_size=batch_size)
    optimizer = tf.train.AdamOptimizer()

    loss_by_epoch = []
    accuracy_by_epoch = []

    for epoch in range(epochs):
        epoch_loss_avg = tfe.metrics.Mean()
        epoch_accuracy = tfe.metrics.Mean()  # running mean of the per-batch accuracy
        for batch, (batch_X, batch_y) in enumerate(tfe.Iterator(train_ds)):
            grads = grad(model, batch_X, batch_y)
            optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())
            epoch_loss_avg(loss(model, batch_X, batch_y))
            correct_prediction = tf.equal(tf.argmax(model(batch_X), axis=1, output_type=tf.int32), batch_y)
            epoch_accuracy(tf.reduce_mean(tf.cast(correct_prediction, tf.float32)))
        print('Epoch {}:  loss={}  accuracy={}'.format(epoch, epoch_loss_avg.result(), epoch_accuracy.result()))
        loss_by_epoch.append(epoch_loss_avg.result())
        accuracy_by_epoch.append(epoch_accuracy.result())

    return loss_by_epoch, accuracy_by_epoch


def main():
    # Comment out the next line to disable eager execution
    tf.enable_eager_execution()

    """
    Set use_fit to True to optimize by calling tf.keras.models.Sequential.fit(),
    set to False to use tfe.GradientTape() instead. Note that in order to use tfe.Gradient.tape(),
    eager execution must be enabled
    """
    use_fit = True

    epochs = 200
    batch_size = 64
    dataset_folder = '.'

    # Load dataset and convert it to numpy arrays
    data = load_data(dataset_folder)
    train_X = data.iloc[:, 0:11].values.astype(np.float32)
    train_y = data.iloc[:, 11].values.astype(np.int32)

    if use_fit:  # train_y needs to be one-hot encoded for use with model.fit()
        train_y = tf.keras.utils.to_categorical(train_y, num_classes=11)

    model = tf.keras.models.Sequential([
        tf.keras.layers.InputLayer(input_shape=(11,)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(11, activation='softmax') if use_fit else tf.keras.layers.Dense(11)
    ])

    start = time.time()

    if use_fit:
        optimizer = tf.train.AdamOptimizer()
        model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
        history = model.fit(x=train_X, y=train_y, epochs=epochs, batch_size=batch_size, verbose=2)
        loss_by_epoch = history.history['loss']
        accuracy_by_epoch = history.history['acc'] if 'acc' in history.history else []

    else:
        loss_by_epoch, accuracy_by_epoch = train(model=model, X=train_X, y=train_y, batch_size=batch_size,
                                                 epochs=epochs)

    end = time.time()
    print('It took {} seconds'.format(end - start))

    # Chart loss and error
    fig, axes = plt.subplots(2, sharex=True, figsize=(12, 8))
    fig.suptitle('Training Metrics')

    axes[0].set_ylabel("Loss", fontsize=14)
    axes[0].plot(loss_by_epoch)

    axes[1].set_ylabel("Accuracy", fontsize=14)
    axes[1].set_xlabel("Epoch", fontsize=14)
    axes[1].plot(accuracy_by_epoch)

    plt.show()


if __name__ == '__main__':
    main()

@asimshankar added the comp:eager and type:bug labels and removed the comp:eager label on Apr 18, 2018
fchollet (Member) commented
Thank you for the bug report. We have a fix for this issue, which will show up on GitHub soon.
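Until the fix lands, one possible interim workaround (a sketch, assuming graph-mode training is acceptable for the fit() path) is to enable eager execution only when using the GradientTape loop from the report, and to keep fit() in graph mode:

import tensorflow as tf

use_fit = True
if not use_fit:
    # Eager execution is required only by the tfe.GradientTape() loop;
    # leaving it disabled lets fit() converge and report 'accuracy' as expected.
    tf.enable_eager_execution()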
