
Guidance on initializable iterators w/ numpy arrays #138

Closed
jperl opened this issue Mar 8, 2019 · 5 comments

Comments

jperl commented Mar 8, 2019

I am collecting data during the training process and using Dataset.from_tensor_slices with placeholders and an initializable iterator to refresh the dataset. The dataset then does further preprocessing on the tensor slices.

As new data is collected, I reinitialize the iterator's placeholders with the new numpy array data.

Since initializable iterators are now deprecated, how do you recommend I seed the dataset with the dynamic numpy arrays? Should I switch to using a generator?
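For reference, the deprecated pattern being described might look like this minimal TF1-style sketch (run through tf.compat.v1; shapes and data are illustrative, not the actual training setup):

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style API; eager execution must be off

tf.disable_eager_execution()

# Placeholder that is re-fed with fresh numpy data on each reinitialization
features_ph = tf.placeholder(tf.float32, shape=[None, 4])
dataset = tf.data.Dataset.from_tensor_slices(features_ph).batch(2)
iterator = tf.data.make_initializable_iterator(dataset)
next_batch = iterator.get_next()

with tf.Session() as sess:
    for refresh in range(2):
        # Stand-in for newly collected data on this round of training
        data = np.arange(24, dtype=np.float32).reshape(6, 4) + refresh
        sess.run(iterator.initializer, feed_dict={features_ph: data})
        batches = []
        while True:
            try:
                batches.append(sess.run(next_batch))
            except tf.errors.OutOfRangeError:
                break
```

Each call to iterator.initializer restarts the pipeline over whatever numpy array is fed in; this is the behavior that needs a replacement once initializable iterators go away.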

jperl (Author) commented Mar 8, 2019

Relates to #68

jperl changed the title from "Guidance on initializable iterators" to "Guidance on initializable iterators w/ numpy arrays" on Mar 8, 2019
suphoff (Contributor) commented Mar 8, 2019

It looks like the Iterator class has completely disappeared from the r2.0 API documentation.
As such, I would concentrate on creating a dynamically modifiable dataset pipeline. This could be a custom dataset op, or a dataset pipeline using TF variables.
For a good example of the latter, take a look at the replay buffer implemented in tensorflow/agents:
https://github.com/tensorflow/agents/blob/master/tf_agents/replay_buffers/tf_uniform_replay_buffer.py
https://github.com/tensorflow/agents/blob/master/tf_agents/replay_buffers/table.py
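A minimal TF2 sketch of the variable-backed idea (assuming a fixed buffer shape; the replay buffers linked above additionally handle capacity and a write cursor). Because the flat_map function reads the variable at iteration time rather than at pipeline-construction time, assigning new data refreshes what the dataset yields on the next pass:

```python
import numpy as np
import tensorflow as tf

# Mutable buffer held in a tf.Variable with a fixed shape
buffer = tf.Variable(np.zeros((4, 2), np.float32), trainable=False)

# range(1).flat_map defers reading the variable until each iteration,
# so the same dataset object always reflects the buffer's current value
dataset = tf.data.Dataset.range(1).flat_map(
    lambda _: tf.data.Dataset.from_tensor_slices(buffer.read_value()))

def refresh(new_data):
    buffer.assign(new_data)  # swap in freshly collected data
```

After refresh(...), simply iterating over dataset again yields the new rows, with no reinitialization step.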

jperl (Author) commented Mar 8, 2019

Using variables is a good idea, thank you!

jperl closed this as completed Mar 8, 2019
yongtang (Member) commented Mar 8, 2019

@jperl I think it depends on whether your data is already in memory (numpy arrays) or not. If the data is already in numpy arrays, then I think we could help implement a Dataset to map the numpy arrays into tensors as data continues to feed in. If you have other sources of input (like a file or a stream), we could also help implement a Dataset that takes the input directly, without necessarily reading it into numpy arrays first.

Maybe you could share some details or some boilerplate code to show what the data input format is?
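Without a custom dataset op, one stock option for in-memory data is Dataset.from_generator, which pulls numpy data lazily. Here fetch_rows is a hypothetical stand-in for whatever produces the arrays; output_signature requires TF 2.4+ (earlier versions use output_types/output_shapes):

```python
import numpy as np
import tensorflow as tf

def fetch_rows():
    # Hypothetical stand-in for data that keeps arriving in memory
    for i in range(3):
        yield np.full((4,), float(i), np.float32)

# The generator is re-invoked on every pass over the dataset, so each
# epoch sees the latest state of the in-memory source
dataset = tf.data.Dataset.from_generator(
    fetch_rows, output_signature=tf.TensorSpec(shape=(4,), dtype=tf.float32))
```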

jperl (Author) commented Mar 8, 2019

The data is paths to images stored on a local filesystem. It is queried from a database and then held in memory as an array of strings. After each training loop, we requery the database for the latest paths (and other metadata) and reinitialize the dataset.

I believe we can accomplish this similarly to the TFUniformReplayBuffer example from @suphoff, using a Counter, map, and variables, or a generator to provide the latest numpy arrays.
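The generator route for this paths-from-a-database setup could be sketched as follows; query_latest_paths is a hypothetical stand-in for the database query, and the commented-out map shows where image decoding would go in a real pipeline:

```python
import tensorflow as tf

def query_latest_paths():
    # Hypothetical stand-in for re-querying the database; returns the
    # current list of image paths as python strings
    return ["img_0.png", "img_1.png", "img_2.png"]

def path_gen():
    yield from query_latest_paths()

# The generator runs again on every pass over the dataset, so each
# training loop sees the freshly queried paths
paths = tf.data.Dataset.from_generator(
    path_gen, output_signature=tf.TensorSpec(shape=(), dtype=tf.string))

# In the real pipeline, a map would load and decode each image, e.g.:
# dataset = paths.map(lambda p: tf.io.decode_png(tf.io.read_file(p)))
```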


3 participants