
Model performs differently for model.fit and custom training loop #41957

Open
arshagarwal opened this issue Aug 1, 2020 · 5 comments
Labels: comp:keras (Keras related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.10 · type:bug (Bug)

Comments

arshagarwal commented Aug 1, 2020

System information

  • OS platform: Linux Ubuntu 16.04
  • TensorFlow version: 2.2.0
  • Python version: 3.6.9

Custom Training Loop vs Model.fit code

import numpy as np
import tensorflow as tf

# `opt`, `train_set`, `y_train`, `test_set`, `y_test` and the model class
# come from the rest of the script (argument parsing / data loading).
model = model()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanAbsoluteError()
batch_size = opt.b_size
n_batches = int(len(train_set) / opt.b_size)

for i in range(opt.epochs):
    loss_t = 0
    it = 0
    for j in range(n_batches):
        with tf.GradientTape() as tape:
            # trainable variables are watched automatically by the tape
            curr = train_set[it:it + batch_size]
            forward = model(curr, training=True)
            loss = loss_fn(y_train[it:it + batch_size], forward)
            loss_t += loss

        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        it += batch_size

    # shuffle the test set
    index = np.arange(0, len(test_set))
    np.random.shuffle(index)
    test_set = test_set[index]
    y_test = y_test[index]
    loss_t = loss_t.numpy()

    # validation loss, computed on a single batch of the shuffled test set
    forward_v = model(test_set[:batch_size], training=False)
    loss_v = loss_fn(y_test[:batch_size], forward_v).numpy()
    loss_t /= n_batches
    print("Loss: {} Validation loss: {}".format(round(loss_t, 4), round(loss_v, 4)))

The above code is the custom training loop for the model.
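One difference worth noting (an observation, not confirmed in this thread as the cause): Model.fit wraps its train step in a tf.function by default and runs it as a graph, while the loop above runs eagerly. A minimal sketch of an equivalent graph-compiled step, assuming the same model, optimizer, and loss as above:

```python
import tensorflow as tf

loss_fn = tf.keras.losses.MeanAbsoluteError()

@tf.function  # Model.fit compiles its train step into a graph like this by default
def train_step(model, optimizer, x_batch, y_batch):
    with tf.GradientTape() as tape:
        preds = model(x_batch, training=True)
        loss = loss_fn(y_batch, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```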

model = model()
model.compile(tf.keras.optimizers.Adam(),
              tf.keras.losses.MeanAbsoluteError(),
              metrics=['accuracy'])
model.fit(train_set, y_train,
          batch_size=opt.b_size,
          epochs=opt.epochs,
          validation_data=(test_set, y_test))

The above code uses the model.fit method for training.
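Another default worth keeping in mind when comparing the two: Model.fit shuffles the training data every epoch, while the custom loop iterates batches in a fixed order. A self-contained sketch (using a tiny synthetic dataset as a stand-in for the issue's data) of how one might fix the seed and disable shuffling to bring the runs closer together:

```python
import numpy as np
import tensorflow as tf

tf.keras.utils.set_random_seed(0)  # seeds Python, NumPy and TF in one call

# tiny synthetic regression problem (stand-in for the issue's dataset)
x = np.random.rand(100, 4).astype("float32")
y = x.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                             tf.keras.layers.Dense(1)])
model.compile(tf.keras.optimizers.Adam(), tf.keras.losses.MeanAbsoluteError())
history = model.fit(x, y, batch_size=25, epochs=2,
                    shuffle=False,  # match the custom loop's fixed batch order
                    validation_data=(x, y), verbose=0)
```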

Behaviour and code to replicate the results
When I run both versions on the same training and validation datasets with identical parameters, the validation losses obtained in the two cases are very different.
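One likely contributor to the gap (an assumption, not confirmed in this thread): the custom loop evaluates the validation loss on only the first batch of the shuffled test set, while Model.fit averages the loss over every validation batch. A NumPy sketch of the difference, using hypothetical data:

```python
import numpy as np

def mae(y_true, y_pred):
    # mean absolute error, matching tf.keras.losses.MeanAbsoluteError
    return np.mean(np.abs(y_true - y_pred))

rng = np.random.default_rng(0)
y_test = rng.normal(size=(200, 1))
preds = y_test + rng.normal(scale=0.5, size=(200, 1))  # imperfect predictions

batch_size = 50
# what the custom loop reports: loss on a single batch
single_batch_loss = mae(y_test[:batch_size], preds[:batch_size])
# what Model.fit reports: loss averaged over all validation batches
full_losses = [mae(y_test[i:i + batch_size], preds[i:i + batch_size])
               for i in range(0, len(y_test), batch_size)]
full_loss = np.mean(full_losses)
print(single_batch_loss, full_loss)  # generally not equal
```

With equally sized batches, the batch-averaged loss equals the MAE over the whole set, so the single-batch number fluctuates around it depending on which samples the shuffle put first.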

  • Here is the google colab gist to replicate the results.
  • Alternatively, to replicate the results locally, clone my GitHub repository and run train.py and train2.py as follows:
  1. bash import_weights.sh
  2. python train.py --n_samples 1000 --epochs 100 --b_size 50 (model.fit)
  3. python train2.py --n_samples 1000 --epochs 100 --b_size 50 (custom training loop)
@arshagarwal arshagarwal added the type:bug Bug label Aug 1, 2020
@Saduf2019 Saduf2019 added the TF 2.2 Issues related to TF 2.2 label Aug 2, 2020
Saduf2019 (Contributor)

@arshagarwal
Could you please provide a Colab gist reproducing the issue so we can analyse it?

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Aug 3, 2020

arshagarwal commented Aug 6, 2020

> @arshagarwal
> Could you please provide a Colab gist reproducing the issue so we can analyse it?

Thanks for taking the time; here is the google colab gist.

Saduf2019 (Contributor)

@arshagarwal
I ran the code you shared; please find the gist here.
Please confirm whether this replicates the issue.

arshagarwal (Author)

> @arshagarwal
> I ran the code you shared; please find the gist here.
> Please confirm whether this replicates the issue.

Yes, this is exactly what I am trying to report.

@Saduf2019 Saduf2019 added comp:keras Keras related issues and removed stat:awaiting response Status - Awaiting response from author labels Aug 6, 2020
@Saduf2019 Saduf2019 assigned ymodak and unassigned Saduf2019 Aug 6, 2020
@ymodak ymodak assigned omalleyt12 and unassigned ymodak Aug 17, 2020

sushreebarsa commented May 30, 2021

I was able to replicate the issue in TF 2.12.0-dev20221114; please find the gist here. Thank you!

@sushreebarsa sushreebarsa self-assigned this Dec 13, 2021
@sushreebarsa sushreebarsa added TF 2.7 Issues related to TF 2.7.0 and removed TF 2.2 Issues related to TF 2.2 labels Dec 13, 2021
@sushreebarsa sushreebarsa removed their assignment Mar 25, 2022
@tilakrayal tilakrayal added TF 2.10 stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed TF 2.7 Issues related to TF 2.7.0 labels Nov 16, 2022
6 participants