Keras models fit() method not converging with eager execution #18642

Closed
fantauzzi opened this issue Apr 18, 2018 · 1 comment
fantauzzi commented Apr 18, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Yes, see below.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Ubuntu 16.04
  • TensorFlow installed from (source or binary):
    Binary, with pip3
  • TensorFlow version (use command below):
    v1.7.0-3-g024aecf414 1.7.0
  • Python version:
    3.5.2
  • Bazel version (if compiling from source):
    N/A
  • GCC/Compiler version (if compiling from source):
    N/A
  • CUDA/cuDNN version:
    9.0.176/7.0.5
  • GPU model and memory:
    GTX 1080 (Armor MSI) with 8 GB VRAM
  • Exact command to reproduce:
    Run the program below

Describe the problem

After enabling eager execution, the tf.keras.models.Sequential.fit() method does not converge: the computed loss does not decrease (it stays around 9 to 11). In addition, fit() does not report the requested "accuracy" metric: it neither prints it to the console nor returns it in the History object.

After disabling eager execution, the same optimization converges (the loss drops to around 1), and fit() correctly reports the requested "accuracy" metric, both to the console and in the returned History object.
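For reference, the missing-metric symptom can be reproduced without the wine dataset. The following is a minimal sketch with random stand-in data (the sample count, layer sizes, and epoch count are placeholders, not the original experiment); it should exhibit the same behavior described above, with history.history lacking the 'acc' key when eager execution is enabled.

import numpy as np
import tensorflow as tf

tf.enable_eager_execution()  # remove this line to see the expected behavior

# Random stand-in data: 512 samples, 11 features, 11 one-hot-encoded classes
X = np.random.rand(512, 11).astype(np.float32)
y = tf.keras.utils.to_categorical(np.random.randint(0, 11, size=512), num_classes=11)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(11,)),
    tf.keras.layers.Dense(11, activation='softmax')
])
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(X, y, epochs=5, batch_size=64, verbose=2)
print(sorted(history.history.keys()))  # 'acc' is missing under eager execution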

Source code / logs

import os
import pandas as pd
import numpy as np
import time
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.contrib.eager as tfe

"""
Check the beginning of main() for parameters
"""


def load_data(folder):
    file_name = os.path.join(folder, 'winequality-data.csv')
    if os.path.isfile(file_name):
        data = pd.read_csv(file_name)
    else:
        print('File {} not found.'.format(file_name))
        print('Dataset can be downloaded from https://www.kaggle.com/c/uci-wine-quality-dataset/data')
        exit(1)
    # solutions = pd.read_csv(os.path.join(folder, 'winequality-solution-input.csv'))
    return data


def train_input_fn(features, labels, batch_size):
    features_tensor = tf.constant(features)
    labels_tensor = tf.constant(labels)
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((features_tensor, labels_tensor))

    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(len(labels)).repeat(count=1).batch(batch_size)

    # Return the dataset.
    return dataset


def loss(model, X, y):
    logits = model(X)
    the_loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
    return the_loss


def grad(model, inputs, targets):
    with tfe.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return tape.gradient(loss_value, model.variables)


def train(model, X, y, batch_size, epochs):
    train_ds = train_input_fn(X, y, batch_size=batch_size)
    optimizer = tf.train.AdamOptimizer()

    loss_by_epoch = []
    accuracy_by_epoch = []

    for epoch in range(epochs):
        epoch_loss_avg = tfe.metrics.Mean()
        epoch_accuracy = tfe.metrics.Mean()  # running mean of the per-batch accuracy
        for batch, (batch_X, batch_y) in enumerate(tfe.Iterator(train_ds)):
            grads = grad(model, batch_X, batch_y)
            optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())
            epoch_loss_avg(loss(model, batch_X, batch_y))
            correct_prediction = tf.equal(tf.argmax(model(batch_X), axis=1, output_type=tf.int32), batch_y)
            epoch_accuracy(tf.reduce_mean(tf.cast(correct_prediction, tf.float32)))
        print('Epoch {}:  loss={}  accuracy={}'.format(epoch, epoch_loss_avg.result(), epoch_accuracy.result()))
        loss_by_epoch.append(epoch_loss_avg.result())
        accuracy_by_epoch.append(epoch_accuracy.result())

    return loss_by_epoch, accuracy_by_epoch


def main():
    # Comment out the next line to disable eager execution
    tf.enable_eager_execution()

    """
    Set use_fit to True to optimize by calling tf.keras.models.Sequential.fit(),
    set to False to use tfe.GradientTape() instead. Note that in order to use tfe.Gradient.tape(),
    eager execution must be enabled
    """
    use_fit = True

    epochs = 200
    batch_size = 64
    dataset_folder = '.'

    # Load dataset and convert it to numpy arrays
    data = load_data(dataset_folder)
    train_X = data.iloc[:, 0:11].values.astype(np.float32)
    train_y = data.iloc[:, 11].values.astype(np.int32)

    if use_fit:  # train_y needs to be one-hot encoded for use with model.fit()
        train_y = tf.keras.utils.to_categorical(train_y, num_classes=11)

    model = tf.keras.models.Sequential([
        tf.keras.layers.InputLayer(input_shape=(11,)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(11, activation='softmax') if use_fit else tf.keras.layers.Dense(11)
    ])

    start = time.time()

    if use_fit:
        optimizer = tf.train.AdamOptimizer()
        model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
        history = model.fit(x=train_X, y=train_y, epochs=epochs, batch_size=batch_size, verbose=2)
        loss_by_epoch = history.history['loss']
        accuracy_by_epoch = history.history['acc'] if 'acc' in history.history else []

    else:
        loss_by_epoch, accuracy_by_epoch = train(model=model, X=train_X, y=train_y, batch_size=batch_size,
                                                 epochs=epochs)

    end = time.time()
    print('It took {} seconds'.format(end - start))

    # Chart loss and error
    fig, axes = plt.subplots(2, sharex=True, figsize=(12, 8))
    fig.suptitle('Training Metrics')

    axes[0].set_ylabel("Loss", fontsize=14)
    axes[0].plot(loss_by_epoch)

    axes[1].set_ylabel("Accuracy", fontsize=14)
    axes[1].set_xlabel("Epoch", fontsize=14)
    axes[1].plot(accuracy_by_epoch)

    plt.show()


if __name__ == '__main__':
    main()

@asimshankar added the comp:eager and type:bug labels and removed the comp:eager label on Apr 18, 2018
fchollet (Member) commented
Thank you for the bug report. We have a fix for this issue, which will show up on GitHub soon.
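Until the fix lands, one possible interim workaround (a sketch, assuming graph-mode training is acceptable for the fit() path) is to enable eager execution only when using the GradientTape loop from the report, and to keep fit() in graph mode:

import tensorflow as tf

use_fit = True
if not use_fit:
    # Eager execution is required only by the tfe.GradientTape() loop;
    # leaving it disabled lets fit() converge and report 'accuracy' as expected.
    tf.enable_eager_execution()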
