
Model performs differently for model.fit and custom training loop #41957

Open
arshagarwal opened this issue Aug 1, 2020 · 5 comments
Labels: comp:keras (Keras related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.10 · type:bug (Bug)

Comments

arshagarwal commented Aug 1, 2020

System information

  • OS platform: Linux Ubuntu 16.04
  • TensorFlow version: 2.2.0
  • Python version: 3.6.9

Custom Training Loop vs Model.fit code

import numpy as np
import tensorflow as tf

# `opt`, `train_set`, `y_train`, `test_set`, `y_test` and the model class
# come from the rest of the script (argument parsing / data loading).
model = model()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanAbsoluteError()
batch_size = opt.b_size
n_batches = int(len(train_set) / opt.b_size)

for i in range(opt.epochs):
    loss_t = 0
    it = 0
    for j in range(n_batches):
        with tf.GradientTape() as tape:
            # trainable variables are watched automatically by the tape
            curr = train_set[it:it + batch_size]
            forward = model(curr, training=True)
            loss = loss_fn(y_train[it:it + batch_size], forward)
            loss_t += loss

        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        it += batch_size

    # shuffle the test set
    index = np.arange(0, len(test_set))
    np.random.shuffle(index)
    test_set = test_set[index]
    y_test = y_test[index]
    loss_t = loss_t.numpy()

    # validation loss, computed on a single batch of the shuffled test set
    forward_v = model(test_set[:batch_size], training=False)
    loss_v = loss_fn(y_test[:batch_size], forward_v).numpy()
    loss_t /= n_batches
    print("Loss: {} Validation loss: {}".format(round(loss_t, 4), round(loss_v, 4)))

The above code is the custom training loop for the model.
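One difference worth noting (an observation, not confirmed in this thread as the cause): Model.fit wraps its train step in a tf.function by default and runs it as a graph, while the loop above runs eagerly. A minimal sketch of an equivalent graph-compiled step, assuming the same model, optimizer, and loss as above:

```python
import tensorflow as tf

loss_fn = tf.keras.losses.MeanAbsoluteError()

@tf.function  # Model.fit compiles its train step into a graph like this by default
def train_step(model, optimizer, x_batch, y_batch):
    with tf.GradientTape() as tape:
        preds = model(x_batch, training=True)
        loss = loss_fn(y_batch, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```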

model = model()
model.compile(tf.keras.optimizers.Adam(),
              tf.keras.losses.MeanAbsoluteError(),
              metrics=['accuracy'])
model.fit(train_set, y_train,
          batch_size=opt.b_size,
          epochs=opt.epochs,
          validation_data=(test_set, y_test))

The above code uses the model.fit method for training.
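Another default worth keeping in mind when comparing the two: Model.fit shuffles the training data every epoch, while the custom loop iterates batches in a fixed order. A self-contained sketch (using a tiny synthetic dataset as a stand-in for the issue's data) of how one might fix the seed and disable shuffling to bring the runs closer together:

```python
import numpy as np
import tensorflow as tf

tf.keras.utils.set_random_seed(0)  # seeds Python, NumPy and TF in one call

# tiny synthetic regression problem (stand-in for the issue's dataset)
x = np.random.rand(100, 4).astype("float32")
y = x.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                             tf.keras.layers.Dense(1)])
model.compile(tf.keras.optimizers.Adam(), tf.keras.losses.MeanAbsoluteError())
history = model.fit(x, y, batch_size=25, epochs=2,
                    shuffle=False,  # match the custom loop's fixed batch order
                    validation_data=(x, y), verbose=0)
```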

Behaviour and code to replicate the results
When I run both versions on the same training and validation datasets with identical parameters, the validation losses obtained in the two cases are very different.
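One likely contributor to the gap (an assumption, not confirmed in this thread): the custom loop evaluates the validation loss on only the first batch of the shuffled test set, while Model.fit averages the loss over every validation batch. A NumPy sketch of the difference, using hypothetical data:

```python
import numpy as np

def mae(y_true, y_pred):
    # mean absolute error, matching tf.keras.losses.MeanAbsoluteError
    return np.mean(np.abs(y_true - y_pred))

rng = np.random.default_rng(0)
y_test = rng.normal(size=(200, 1))
preds = y_test + rng.normal(scale=0.5, size=(200, 1))  # imperfect predictions

batch_size = 50
# what the custom loop reports: loss on a single batch
single_batch_loss = mae(y_test[:batch_size], preds[:batch_size])
# what Model.fit reports: loss averaged over all validation batches
full_losses = [mae(y_test[i:i + batch_size], preds[i:i + batch_size])
               for i in range(0, len(y_test), batch_size)]
full_loss = np.mean(full_losses)
print(single_batch_loss, full_loss)  # generally not equal
```

With equally sized batches, the batch-averaged loss equals the MAE over the whole set, so the single-batch number fluctuates around it depending on which samples the shuffle put first.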

  • Here is the google colab gist to replicate the results.
  • Alternatively, to replicate the results locally, clone my GitHub repository and run train.py and train2.py as follows:
  1. bash import_weights.sh
  2. python train.py --n_samples 1000 --epochs 100 --b_size 50 (model.fit)
  3. python train2.py --n_samples 1000 --epochs 100 --b_size 50 (custom training loop)
@arshagarwal arshagarwal added the type:bug Bug label Aug 1, 2020
@Saduf2019 Saduf2019 added the TF 2.2 Issues related to TF 2.2 label Aug 2, 2020
Saduf2019 (Contributor)

@arshagarwal
Could you please provide a Colab gist reproducing the issue so we can analyse it?

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Aug 3, 2020

arshagarwal commented Aug 6, 2020

> @arshagarwal
> Could you please provide a Colab gist reproducing the issue so we can analyse it?

Thanks for taking the time; here is the google colab gist.

Saduf2019 (Contributor)

@arshagarwal
I ran the code you shared; please find the gist here.
Please confirm whether this replicates the issue.

arshagarwal (Author)

> @arshagarwal
> I ran the code you shared; please find the gist here.
> Please confirm whether this replicates the issue.

Yes, this is exactly what I am trying to report.

@Saduf2019 Saduf2019 added comp:keras Keras related issues and removed stat:awaiting response Status - Awaiting response from author labels Aug 6, 2020
@Saduf2019 Saduf2019 assigned ymodak and unassigned Saduf2019 Aug 6, 2020
@ymodak ymodak assigned omalleyt12 and unassigned ymodak Aug 17, 2020

sushreebarsa commented May 30, 2021

I was able to replicate the issue in TF 2.12.0-dev20221114; please find the gist here. Thank you!

@sushreebarsa sushreebarsa self-assigned this Dec 13, 2021
@sushreebarsa sushreebarsa added TF 2.7 Issues related to TF 2.7.0 and removed TF 2.2 Issues related to TF 2.2 labels Dec 13, 2021
@sushreebarsa sushreebarsa removed their assignment Mar 25, 2022
@tilakrayal tilakrayal added TF 2.10 stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed TF 2.7 Issues related to TF 2.7.0 labels Nov 16, 2022
6 participants