Skip to content
This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Conversation

@cwbeitel
Copy link
Contributor

only outstanding issue blocking this should be that the test in allen_brain_test.py that uses tf.eager breaks many/all other t2t tests not compatible with eager mode.

squashed these:

commit 76c8cd3f105c07919f84d70b02b1fbd38b5376fe
Author: cwbeitel me@cwbeitel-github.com
Date: Fri Jun 15 18:35:13 2018 +0000

fix mock_raw_data

commit f0dc3a0
Merge: 3e17473 4b7360d
Author: cwbeitel me@cwbeitel-github.com
Date: Thu Jun 14 22:54:33 2018 +0000

Merge branch 'allen-deps' of https://github.com/cwbeitel/tensor2tensor into allen-deps

commit 3e17473
Author: cwbeitel me@cwbeitel-github.com
Date: Fri Jun 8 17:04:39 2018 +0000

cleanup

commit 3e8ef9d
Author: cwbeitel me@cwbeitel-github.com
Date: Wed Jun 6 23:07:28 2018 +0000

removed image summary writing from metrics

training was 10x slower than CIFAR img2img problem, profiled and found 91% of time was being sp

ent writing ImageSummary's, removed that from metrics and saw the speed increase as expected.

commit 4814556
Author: cwbeitel me@cwbeitel-github.com
Date: Tue May 29 16:00:17 2018 +0000

basic img2img prob. defs that maybe get data

commit 320f06c
Author: cwbeitel me@cwbeitel-github.com
Date: Thu May 24 18:53:02 2018 +0000

test without accessing network, follow rcfile

* made test that accesses remote API optional, off by default
* included test that runs sub-image extractor on mocked raw input image within mocked directory

structure mirroring what would be produced by running actual data downloader
* refactoring according to rcfile

commit 503123b
Author: cwbeitel me@cwbeitel-github.com
Date: Wed May 23 22:09:31 2018 +0000

add [allen] deps to travis setup

commit 239188b
Author: cwbeitel me@cwbeitel-github.com
Date: Wed May 23 21:39:13 2018 +0000

data download and sub-image gen utility

* if additional [allen] deps not present suggest running `pip install tensor2tensor[allen]`
* obtains a list of section dataset ID's from Allen Institute API
* downloads specified number of raw images from specified number of section datasets
* produces specified number of sub-images of specified size from raw images.
* simple e2e test that runs the data downloader and sub-image generator with default parameters

commit af34cc6
Author: cwbeitel me@cwbeitel-github.com
Date: Wed May 23 17:47:35 2018 +0000

deps. for allen brain img probs. via extras

only outstanding issue blocking this should be that the test in allen_brain_test.py that uses tf.eager breaks many/all other t2t tests not compatible with eager mode.

squashed these:

commit 76c8cd3f105c07919f84d70b02b1fbd38b5376fe
Author: cwbeitel <me@cwbeitel-github.com>
Date:   Fri Jun 15 18:35:13 2018 +0000

    fix mock_raw_data

commit f0dc3a0
Merge: 3e17473 4b7360d
Author: cwbeitel <me@cwbeitel-github.com>
Date:   Thu Jun 14 22:54:33 2018 +0000

    Merge branch 'allen-deps' of https://github.com/cwbeitel/tensor2tensor into allen-deps

commit 3e17473
Author: cwbeitel <me@cwbeitel-github.com>
Date:   Fri Jun 8 17:04:39 2018 +0000

    cleanup

commit 3e8ef9d
Author: cwbeitel <me@cwbeitel-github.com>
Date:   Wed Jun 6 23:07:28 2018 +0000

    removed image summary writing from metrics

    training was 10x slower than CIFAR img2img problem, profiled and found 91% of time was being sp
ent writing ImageSummary's, removed that from metrics and saw the speed increase as expected.

commit 4814556
Author: cwbeitel <me@cwbeitel-github.com>
Date:   Tue May 29 16:00:17 2018 +0000

    basic img2img prob. defs that maybe get data

commit 320f06c
Author: cwbeitel <me@cwbeitel-github.com>
Date:   Thu May 24 18:53:02 2018 +0000

    test without accessing network, follow rcfile

    * made test that accesses remote API optional, off by default
    * included test that runs sub-image extractor on mocked raw input image within mocked directory
 structure mirroring what would be produced by running actual data downloader
    * refactoring according to rcfile

commit 503123b
Author: cwbeitel <me@cwbeitel-github.com>
Date:   Wed May 23 22:09:31 2018 +0000

    add [allen] deps to travis setup

commit 239188b
Author: cwbeitel <me@cwbeitel-github.com>
Date:   Wed May 23 21:39:13 2018 +0000

    data download and sub-image gen utility

    * if additional [allen] deps not present suggest running `pip install tensor2tensor[allen]`
    * obtains a list of section dataset ID's from Allen Institute API
    * downloads specified number of raw images from specified number of section datasets
    * produces specified number of sub-images of specified size from raw images.
    * simple e2e test that runs the data downloader and sub-image generator with default parameters

commit af34cc6
Author: cwbeitel <me@cwbeitel-github.com>
Date:   Wed May 23 17:47:35 2018 +0000

    deps. for allen brain img probs. via extras
@googlebot googlebot added the cla: yes PR author has signed CLA label Jun 15, 2018
@cwbeitel
Copy link
Contributor Author

@rsepassi Noticed many other tests were failing outside of allen_brain* tests with a bunch of errors about Eager mode, guessing the problem for those is my test that uses Eager.

@rsepassi
Copy link
Contributor

Where are you seeing the errors about Eager mode? I don't see them in the Travis test logs.

When you run the tests locally with pytest, do they pass?

We'll want to ensure these all pass on Travis. One idea is that maybe it's problematic running the allen tests alongside all the others and we'll want to run them separately. You can see in the .travis.yml file examples of a test being ignored in the main run and then run separately afterwards.

@cwbeitel cwbeitel changed the title [WIP] img2img_allen_brain problem defs img2img_allen_brain problem defs Jun 15, 2018
@cwbeitel cwbeitel changed the title img2img_allen_brain problem defs [WIP] img2img_allen_brain problem defs Jun 15, 2018
@cwbeitel
Copy link
Contributor Author

@rsepassi The verbose output that mentions Eager doesn't show on Travis (only lots of failed tests, see https://travis-ci.org/tensorflow/tensor2tensor/jobs/392846781). It will probably fix it to just ignore allen tests in main pytest call and call these separately as you were suggesting.

Ah there we go maybe before the reason it wasn't triggering the CI on the last PR was because there was a merge conflict, fixed that here and it proceeded.

@rsepassi
Copy link
Contributor

Looks like the allen tests are failing.

@cwbeitel
Copy link
Contributor Author

@rsepassi Yeah hmm taking a look. Those pass on my local walking through the CI commands. Removed -q from pip installs and put in commands to check if pandas is getting dropped at some point or never installed but it looks like the CI is using the old version of .travis.yml.

@cwbeitel
Copy link
Contributor Author

@rsepassi Those changes appear to have fixed everything except for a linter error in gym_utils, https://travis-ci.org/tensorflow/tensor2tensor/jobs/394214469#L2613,

pylint -j 2 tensor2tensor
Using config file /mnt/nfs-east1-d/work/prs/problem-def/pylintrc
************* Module tensor2tensor.data_generators.gym_utils
E1123: 31 WarmupWrapper.__init__: Unexpected keyword argument 'dtype' in constructor call [unexpected-keyword-arg]
W0221: 48 WarmupWrapper.reset: Parameters differ from overridden 'reset' method [arguments-differ]
E1123: 70 PongWrapper.__init__: Unexpected keyword argument 'dtype' in constructor call [unexpected-keyword-arg]
E1123:149 BreakoutWrapper.__init__: Unexpected keyword argument 'dtype' in constructor call [unexpected-keyword-arg]

Verified the allen_brain* files pass the linter locally.

@cwbeitel
Copy link
Contributor Author

@rsepassi My bad, there was a little linter fix I made and forgot about. I can make those changes to gym_utils if that simplifies things.

@cwbeitel
Copy link
Contributor Author

Hmm looks like those may not have actually been part of the problem on travis.

@cwbeitel cwbeitel changed the title [WIP] img2img_allen_brain problem defs img2img_allen_brain problem defs Jun 19, 2018
@cwbeitel
Copy link
Contributor Author

@rsepassi Let me know if you're waiting on me for anything, otherwise no pressure!

@rsepassi
Copy link
Contributor

Was just going through to merge it but noticed that AllenSDK has a non-permissive license which actually makes it hard to import and use the code within Google. It may still be possible but it would be a lot of work and I just don't have the cycles to go through something like that right now unfortunately.

Now I don't want your work to go to waste so here's what I'm thinking: you can maintain a fork of tensor2tensor with the Allen problems so that people can generate the data and we'd be happy to have the Problems in tensor2tensor just without any AllenSDK dependency/import and the generate_data method could print out "To generate the data for XXX go to github/cwbeitel/tensor2tensor and run ....". Or something like that.

Very sorry about this, wish I noticed it earlier. What do you think?

@cwbeitel
Copy link
Contributor Author

No worries yeah that sounds fine for now, thanks for the cycles thus far. I'll make that change.

As well as look into getting permission to host the data in a requester-pays GCS bucket. The code for scraping their API seems like it will be fragile over the long-term and I think a better approach than having to maintain this is to have it work once and provide that result as a pre-processed dataset in the style of things like imagenet, celeba, etc.

* Simplify allen_brain_utils* to download smaller static list of images, avoiding allensdk dependency.

* Simplify _generator given input is already in [0, 255]

* _generator ignores likely non-tissue/background regions based on max intensity

* Fractional in-painting step added to pre-processor, depends on inpaint_fraction property, randomly positioned square region

* Random crop to target dimensions instead of downsampling leaving full resolution of target intact

* Re-using same generated data for different problems that share examples (differing in terms of how these are pre-processed)
@cwbeitel
Copy link
Contributor Author

@rsepassi Hey! I removed the allensdk dependency but kept the data download utils which is far simpler if maintaining a list of image IDs (instead of building these through various queries to the API). There are 100 IDs included here randomly sampled across the full set of Mouse data. Given that we're considering 64 x 64px regions of images that are like 8.5k x 15k that's enough to generate around 3M examples. If that's not enough I referenced the API docs and a gist that includes the old download code.

@rsepassi
Copy link
Contributor

rsepassi commented Aug 3, 2018

Thank you @cwbeitel and sorry for the delay. Will merge shortly.

@rsepassi rsepassi merged commit 7967b44 into tensorflow:master Aug 3, 2018
@cwbeitel
Copy link
Contributor Author

cwbeitel commented Aug 3, 2018

@rsepassi No worries. Thank you.

tensorflow-copybara pushed a commit that referenced this pull request Aug 3, 2018
PiperOrigin-RevId: 207290554
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla: yes PR author has signed CLA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants