Use a data generator with the federated framework when training on a large dataset #793

Closed
zm17943 opened this issue Jan 21, 2020 · 9 comments

@zm17943

zm17943 commented Jan 21, 2020

Hi!

I was very glad to adapt my own data and model to the federated interfaces, and the training converged!

Now I am stuck on an issue: in an image classification task, the whole dataset is extremely large, so it cannot be stored in a single federated_train_data object or loaded into memory all at once. I need to load the data from the hard disk into memory in batches on the fly, the way one would use Keras model.fit_generator instead of model.fit when dealing with large data.

As I understand it, in the iterative_process shown in the image classification tutorial, the model is fit on a fixed set of data. Is there any way to adjust the code so that it fits on a data generator? I have looked into the source code but am still quite confused. I would be incredibly grateful for any hints.
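For concreteness, here is a minimal sketch of what streaming from disk could look like, assuming a roughly 2020-era TF/TFF API. The image shape, client paths, and the stand-in generator are hypothetical; the point is that iterative_process.next only needs a list of tf.data.Dataset objects, and a dataset built with tf.data.Dataset.from_generator pulls examples lazily instead of holding everything in memory:

```python
import numpy as np
import tensorflow as tf

IMG_SHAPE = (64, 64, 3)  # assumed image size, purely for illustration

def make_client_dataset(client_dir, batch_size=20):
    """Builds a tf.data.Dataset that streams one client's examples lazily."""
    def gen():
        # Stand-in for reading and decoding image files from `client_dir`;
        # in practice each iteration would load one example from disk.
        for _ in range(100):
            yield np.random.rand(*IMG_SHAPE).astype("float32"), np.int32(0)

    return (tf.data.Dataset.from_generator(
                gen,
                output_types=(tf.float32, tf.int32),
                output_shapes=(IMG_SHAPE, ()))
            .batch(batch_size))

# federated_train_data is just a list of per-client datasets; nothing here
# forces the whole corpus into memory at once.
federated_train_data = [make_client_dataset(f"clients/{i}") for i in range(2)]
# state, metrics = iterative_process.next(state, federated_train_data)
```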

@jkr26
Collaborator

jkr26 commented Jan 22, 2020

Thanks for asking on SO! Dropping a link here, will close when there is an accepted answer.

@jkr26 jkr26 self-assigned this Jan 22, 2020
@zm17943
Author

zm17943 commented Feb 10, 2020

Hi! Thanks so much for your reply on SO! Unfortunately I still can't work it out. Do you know any way to change the model training process on local clients?

@ZacharyGarrett
Collaborator

@zm17943 could you take a look at the example in this StackOverflow answer? It does not load all clients at once; only the clients participating in one round of computation are used at a time, and each tf.data.Dataset also loads only a subset of its data, as needed, for training.
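Roughly, the per-round pattern described there might look like the sketch below (the client IDs, the placeholder loader, and the round count are made up for illustration); only the sampled clients' datasets are constructed in any given round:

```python
import random
import numpy as np
import tensorflow as tf

CLIENTS_PER_ROUND = 5
all_client_ids = [f"client_{i}" for i in range(100)]  # placeholder IDs

def make_client_dataset(client_id, batch_size=20):
    # Placeholder loader: in practice this would read `client_id`'s files from disk.
    def gen():
        for _ in range(50):
            yield np.random.rand(64, 64, 3).astype("float32"), np.int32(0)
    return tf.data.Dataset.from_generator(
        gen,
        output_types=(tf.float32, tf.int32),
        output_shapes=((64, 64, 3), ())).batch(batch_size)

# state = iterative_process.initialize()
for round_num in range(10):
    # Only the sampled clients' data is materialized for this round.
    sampled = random.sample(all_client_ids, CLIENTS_PER_ROUND)
    round_data = [make_client_dataset(cid) for cid in sampled]
    # state, metrics = iterative_process.next(state, round_data)
```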

@zm17943
Author

zm17943 commented Mar 2, 2020

Thank you! I have looked into the StackOverflow answer and adjusted my code to load one client at a time. However, I am still unsure about real-time data augmentation; for example, can I use tf.data.Dataset.from_generator to load data into TFF?
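One possible shape for this, as an illustrative sketch rather than a confirmed recipe (example_generator and the image shape are hypothetical): build the dataset with tf.data.Dataset.from_generator and attach the augmentation with .map so it runs as the data streams:

```python
import numpy as np
import tensorflow as tf

def example_generator(client_id):
    # Placeholder for a generator that reads this client's images from disk.
    for _ in range(50):
        yield np.random.rand(64, 64, 3).astype("float32"), np.int32(0)

def augment(image, label):
    # Real-time augmentation applied as examples stream through the pipeline.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label

def make_augmented_dataset(client_id, batch_size=20, batches_per_round=10):
    ds = tf.data.Dataset.from_generator(
        lambda: example_generator(client_id),
        output_types=(tf.float32, tf.int32),
        output_shapes=((64, 64, 3), ()))
    # .take(...) keeps the dataset finite, which matters for TFF training.
    return ds.map(augment).batch(batch_size).take(batches_per_round)
```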

@aslfu

aslfu commented Mar 2, 2020

Hi, I tried to use tf.data.Dataset.from_generator to train a federated model, but this step took forever:

iterative_process.next

I tried reducing the batch_size and the number of trainable parameters to speed it up, but it still hangs. How can I diagnose the training process?

@zm17943
Author

zm17943 commented Mar 2, 2020

> Hi, I tried to use tf.data.Dataset.from_generator to train a federated model, but this step took forever: iterative_process.next. I tried reducing the batch_size and the number of trainable parameters to speed it up, but it still hangs. How can I diagnose the training process?

I have exactly the same issue!

@jkr26
Collaborator

jkr26 commented Mar 2, 2020

One thing that I might investigate here: try adding a ds.take(1) to your dataset constructors, or raise a StopIteration exception from your generator after yielding e.g. a single element.

If TFF is given an infinite tf.data.Dataset, it will likely reduce forever, continually pulling elements out of the dataset.

I am thinking this way because if your generator never raises StopIteration, I believe from_generator will treat the dataset as infinite. The docs just linked reference Python's iterator protocol, which states:

If there are no further items, raise the StopIteration exception.

which therefore implies: if there is no StopIteration, there are further items.
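An illustrative sketch of both suggestions (the generator here is a made-up stand-in for whatever is actually feeding from_generator):

```python
import numpy as np
import tensorflow as tf

def my_infinite_generator():
    # Placeholder for a generator that never terminates.
    while True:
        yield np.random.rand(64, 64, 3).astype("float32"), np.int32(0)

ds = tf.data.Dataset.from_generator(
    my_infinite_generator,
    output_types=(tf.float32, tf.int32),
    output_shapes=((64, 64, 3), ()))

# Option 1: bound the dataset itself, regardless of what the generator does.
finite_ds = ds.batch(20).take(10)  # TFF stops reducing after 10 batches

# Option 2: make the generator itself finite.
def finite_generator(max_examples=100):
    for i, example in enumerate(my_infinite_generator()):
        if i >= max_examples:
            return  # ends iteration (raises StopIteration under the hood)
        yield example
```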

@aslfu

aslfu commented Mar 3, 2020

Yes, I am using ImageDataGenerator to create my generator, and it produces batches indefinitely.
How can I add StopIteration to ImageDataGenerator? Or do I need to use a different generator?
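One possible workaround, sketched under the assumption of a directory-per-class layout and a 2020-era Keras API (the path and batch counts are placeholders): wrap the Keras iterator in a plain Python generator that stops after a fixed number of batches, so the resulting tf.data.Dataset is finite:

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

STEPS_PER_ROUND = 10  # how many batches one client contributes per round

def make_client_dataset(client_dir):
    # `client_dir` is a placeholder path with one subdirectory per class.
    keras_iter = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
        client_dir, target_size=(64, 64), batch_size=20)

    def bounded_generator():
        # ImageDataGenerator yields batches forever, so stop explicitly
        # after a fixed number of batches; returning ends the iteration.
        for _ in range(STEPS_PER_ROUND):
            yield next(keras_iter)

    return tf.data.Dataset.from_generator(
        bounded_generator,
        output_types=(tf.float32, tf.float32),
        output_shapes=((None, 64, 64, 3), (None, None)))
```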

@jkr26
Collaborator

jkr26 commented Mar 4, 2020

We seem to be getting this question through multiple channels, so for ease of discoverability we would prefer to consolidate on Stack Overflow. Please see the discussion here, and open a question there if that does not suit your needs.

Thanks!

@jkr26 jkr26 closed this as completed Mar 6, 2020