
Unexpected behaviour with dataset.mapAsync and fitDataset #1450

Closed

tafsiri opened this issue Mar 27, 2019 · 4 comments

Comments

@tafsiri
Contributor

tafsiri commented Mar 27, 2019

TensorFlow.js version

"@tensorflow/tfjs": "1.0.0",

Browser version

Node 11.11

Describe the problem or feature request

1. After creating a dataset pipeline with batch and mapAsync operations and passing it to model.fitDataset, the mapAsync function runs over the whole dataset once, and then once again for each batch.

I've made a repo to reproduce this, which I'll link below, but here is some of the output I get.

The code looks like this https://github.com/tafsiri/use-text-classifier/blob/fitdataset/training/train.js#L63
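For reference, since the linked file can move, here is a minimal sketch of the shape of that pipeline. The model, the 512-wide embedding size, epochs: 5, and the loadExamples/embed helpers are all assumptions for illustration, not the original code.

const tf = require('@tensorflow/tfjs');

// Hypothetical 10-class classifier over 512-d sentence embeddings (assumed sizes).
const model = tf.sequential({
  layers: [tf.layers.dense({inputShape: [512], units: 10, activation: 'softmax'})],
});
model.compile({optimizer: 'adam', loss: 'categoricalCrossentropy', metrics: ['accuracy']});

let counter = 0;

async function main() {
  const dataset = tf.data
    .array(await loadExamples())        // hypothetical: yields {xs, ys} records
    .batch(4)
    .mapAsync(async (batch) => {
      console.log('counter', counter++);
      console.time('Embedding a batch');
      const xs = await embed(batch.xs); // hypothetical async embedding step
      console.timeEnd('Embedding a batch');
      return {xs, ys: batch.ys};
    });

  await model.fitDataset(dataset, {
    epochs: 5,
    callbacks: {
      onBatchEnd: (batch, logs) => console.log('onBatchEnd', batch, logs),
    },
  });
}

main();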

counter 0
Embedding a batch: 1729.892ms
counter 1
Embedding a batch: 1643.588ms
counter 2
Embedding a batch: 1329.843ms
counter 3
Embedding a batch: 1706.367ms
counter 4
Embedding a batch: 1489.202ms
counter 5
Embedding a batch: 1704.473ms
counter 6
Embedding a batch: 1431.451ms
counter 7
Embedding a batch: 1136.306ms
counter 8
Embedding a batch: 919.442ms
counter 9
Embedding a batch: 1476.211ms
counter 10
Embedding a batch: 1453.095ms
counter 11
// This continues until all the batches are consumed, and then I get
counter 32
Embedding a batch: 1770.664ms
onBatchEnd 0 { batch: 0, size: 4, loss: 2.2711334228515625, acc: 0.5 }
counter 33
Embedding a batch: 2392.485ms
onBatchEnd 1 { batch: 1, size: 4, loss: 2.256931781768799, acc: 0.75 }
counter 34
Embedding a batch: 1544.857ms
onBatchEnd 2 { batch: 2, size: 4, loss: 2.3282790184020996, acc: 0 }
counter 35

This initial pass through all the batches happens at the start of each epoch.

2. When using mapAsync without batch, it seems to not always wait for the mapped function to finish. I'll link to sample code below, but in this case I get:

counter 0
Embedding a sentence: 843.928ms
counter 1
Embedding a sentence: 536.952ms
counter 2
counter 3
counter 4
counter 5
Embedding a sentence: 1144.132ms
counter 6
Embedding a sentence: 42.071ms
counter 7
counter 8
counter 9
counter 10
Embedding a sentence: 992.742ms
counter 11
Embedding a sentence: 242.740ms
counter 12
Embedding a sentence: 318.635ms

The counter seems to increment multiple times before some of the tensor operations complete.

The code looks like this https://github.com/tafsiri/use-text-classifier/blob/fitdataset/training/train.js#L23
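The unbatched variant has roughly this shape (same hypothetical helpers as the sketch above, inside the same async context). One reading of the interleaved output above is that the dataset iterator requests the next element before the previous mapAsync promise has settled, though that is speculation about the internals.

  const unbatched = tf.data
    .array(await loadExamples())        // hypothetical: yields {xs, ys} records
    .mapAsync(async (example) => {
      console.log('counter', counter++);
      console.time('Embedding a sentence');
      const xs = await embed(example.xs); // hypothetical per-sentence embedding
      console.timeEnd('Embedding a sentence');
      return {xs, ys: example.ys};
    })
    .batch(4);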

Code to reproduce the bug / link to feature request

To reproduce 1: clone this repo https://github.com/tafsiri/use-text-classifier/tree/fitdataset, switch to the fitdataset branch, go into the training folder, and run node train.js.

To reproduce 2: do the same as above, but comment out lines 124-130 in train.js and uncomment lines 133-140.

@shmishra99
Contributor

Hi @tafsiri,
Apologies for the late response.
Please let me know if your issue is resolved in the latest npm version, 4.4.0. If it is not resolved yet, kindly share reproducible code; the code link you provided is broken. Thanks!

@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you.

@google-ml-butler

Closing as stale. Please @mention us if this needs more attention.

