img2img_allen_brain problem defs #872

cwbeitel · 2018-06-15T18:59:48Z

only outstanding issue blocking this should be that the test in allen_brain_test.py that uses tf.eager breaks many/all other t2t tests not compatible with eager mode.

squashed these:

commit 76c8cd3f105c07919f84d70b02b1fbd38b5376fe
Author: cwbeitel me@cwbeitel-github.com
Date: Fri Jun 15 18:35:13 2018 +0000

fix mock_raw_data

commit f0dc3a0
Merge: 3e17473 4b7360d
Author: cwbeitel me@cwbeitel-github.com
Date: Thu Jun 14 22:54:33 2018 +0000

Merge branch 'allen-deps' of https://github.com/cwbeitel/tensor2tensor into allen-deps

commit 3e17473
Author: cwbeitel me@cwbeitel-github.com
Date: Fri Jun 8 17:04:39 2018 +0000

cleanup

commit 3e8ef9d
Author: cwbeitel me@cwbeitel-github.com
Date: Wed Jun 6 23:07:28 2018 +0000

removed image summary writing from metrics

training was 10x slower than CIFAR img2img problem, profiled and found 91% of time was being sp

ent writing ImageSummary's, removed that from metrics and saw the speed increase as expected.

commit 4814556
Author: cwbeitel me@cwbeitel-github.com
Date: Tue May 29 16:00:17 2018 +0000

basic img2img prob. defs that maybe get data

commit 320f06c
Author: cwbeitel me@cwbeitel-github.com
Date: Thu May 24 18:53:02 2018 +0000

test without accessing network, follow rcfile

* made test that accesses remote API optional, off by default
* included test that runs sub-image extractor on mocked raw input image within mocked directory

structure mirroring what would be produced by running actual data downloader
* refactoring according to rcfile

commit 503123b
Author: cwbeitel me@cwbeitel-github.com
Date: Wed May 23 22:09:31 2018 +0000

add [allen] deps to travis setup

commit 239188b
Author: cwbeitel me@cwbeitel-github.com
Date: Wed May 23 21:39:13 2018 +0000

data download and sub-image gen utility

* if additional [allen] deps not present suggest running `pip install tensor2tensor[allen]`
* obtains a list of section dataset ID's from Allen Institute API
* downloads specified number of raw images from specified number of section datasets
* produces specified number of sub-images of specified size from raw images.
* simple e2e test that runs the data downloader and sub-image generator with default parameters

commit af34cc6
Author: cwbeitel me@cwbeitel-github.com
Date: Wed May 23 17:47:35 2018 +0000

deps. for allen brain img probs. via extras

only outstanding issue blocking this should be that the test in allen_brain_test.py that uses tf.eager breaks many/all other t2t tests not compatible with eager mode. squashed these: commit 76c8cd3f105c07919f84d70b02b1fbd38b5376fe Author: cwbeitel <me@cwbeitel-github.com> Date: Fri Jun 15 18:35:13 2018 +0000 fix mock_raw_data commit f0dc3a0 Merge: 3e17473 4b7360d Author: cwbeitel <me@cwbeitel-github.com> Date: Thu Jun 14 22:54:33 2018 +0000 Merge branch 'allen-deps' of https://github.com/cwbeitel/tensor2tensor into allen-deps commit 3e17473 Author: cwbeitel <me@cwbeitel-github.com> Date: Fri Jun 8 17:04:39 2018 +0000 cleanup commit 3e8ef9d Author: cwbeitel <me@cwbeitel-github.com> Date: Wed Jun 6 23:07:28 2018 +0000 removed image summary writing from metrics training was 10x slower than CIFAR img2img problem, profiled and found 91% of time was being sp ent writing ImageSummary's, removed that from metrics and saw the speed increase as expected. commit 4814556 Author: cwbeitel <me@cwbeitel-github.com> Date: Tue May 29 16:00:17 2018 +0000 basic img2img prob. defs that maybe get data commit 320f06c Author: cwbeitel <me@cwbeitel-github.com> Date: Thu May 24 18:53:02 2018 +0000 test without accessing network, follow rcfile * made test that accesses remote API optional, off by default * included test that runs sub-image extractor on mocked raw input image within mocked directory structure mirroring what would be produced by running actual data downloader * refactoring according to rcfile commit 503123b Author: cwbeitel <me@cwbeitel-github.com> Date: Wed May 23 22:09:31 2018 +0000 add [allen] deps to travis setup commit 239188b Author: cwbeitel <me@cwbeitel-github.com> Date: Wed May 23 21:39:13 2018 +0000 data download and sub-image gen utility * if additional [allen] deps not present suggest running `pip install tensor2tensor[allen]` * obtains a list of section dataset ID's from Allen Institute API * downloads specified number of raw images from specified number of section datasets * produces specified number of sub-images of specified size from raw images. * simple e2e test that runs the data downloader and sub-image generator with default parameters commit af34cc6 Author: cwbeitel <me@cwbeitel-github.com> Date: Wed May 23 17:47:35 2018 +0000 deps. for allen brain img probs. via extras

cwbeitel · 2018-06-15T19:03:07Z

@rsepassi Noticed many other tests were failing outside of allen_brain* tests with a bunch of errors about Eager mode, guessing the problem for those is my test that uses Eager.

rsepassi · 2018-06-15T19:16:03Z

Where are you seeing the errors about Eager mode? I don't see them in the Travis test logs.

When you run the tests locally with pytest, do they pass?

We'll want to ensure these all pass on Travis. One idea is that maybe it's problematic running the allen tests alongside all the others and we'll want to run them separately. You can see in the .travis.yml file examples of a test being ignored in the main run and then run separately afterwards.

cwbeitel · 2018-06-15T22:25:03Z

@rsepassi The verbose output that mentions Eager doesn't show on Travis (only lots of failed tests, see https://travis-ci.org/tensorflow/tensor2tensor/jobs/392846781). It will probably fix it to just ignore allen tests in main pytest call and call these separately as you were suggesting.

Ah there we go maybe before the reason it wasn't triggering the CI on the last PR was because there was a merge conflict, fixed that here and it proceeded.

rsepassi · 2018-06-19T00:58:21Z

Looks like the allen tests are failing.

cwbeitel · 2018-06-19T16:25:02Z

@rsepassi Yeah hmm taking a look. Those pass on my local walking through the CI commands. Removed -q from pip installs and put in commands to check if pandas is getting dropped at some point or never installed but it looks like the CI is using the old version of .travis.yml.

cwbeitel · 2018-06-19T19:22:53Z

@rsepassi Those changes appear to have fixed everything except for a linter error in gym_utils, https://travis-ci.org/tensorflow/tensor2tensor/jobs/394214469#L2613,

pylint -j 2 tensor2tensor
Using config file /mnt/nfs-east1-d/work/prs/problem-def/pylintrc
************* Module tensor2tensor.data_generators.gym_utils
E1123: 31 WarmupWrapper.__init__: Unexpected keyword argument 'dtype' in constructor call [unexpected-keyword-arg]
W0221: 48 WarmupWrapper.reset: Parameters differ from overridden 'reset' method [arguments-differ]
E1123: 70 PongWrapper.__init__: Unexpected keyword argument 'dtype' in constructor call [unexpected-keyword-arg]
E1123:149 BreakoutWrapper.__init__: Unexpected keyword argument 'dtype' in constructor call [unexpected-keyword-arg]

Verified the allen_brain* files pass the linter locally.

cwbeitel · 2018-06-19T19:46:50Z

@rsepassi My bad, there was a little linter fix I made and forgot about. I can make those changes to gym_utils if that simplifies things.

cwbeitel · 2018-06-19T20:15:24Z

Hmm looks like those may not have actually been part of the problem on travis.

cwbeitel · 2018-06-21T18:43:43Z

@rsepassi Let me know if you're waiting on me for anything, otherwise no pressure!

rsepassi · 2018-06-21T22:40:16Z

Was just going through to merge it but noticed that AllenSDK has a non-permissive license which actually makes it hard to import and use the code within Google. It may still be possible but it would be a lot of work and I just don't have the cycles to go through something like that right now unfortunately.

Now I don't want your work to go to waste so here's what I'm thinking: you can maintain a fork of tensor2tensor with the Allen problems so that people can generate the data and we'd be happy to have the Problems in tensor2tensor just without any AllenSDK dependency/import and the generate_data method could print out "To generate the data for XXX go to github/cwbeitel/tensor2tensor and run ....". Or something like that.

Very sorry about this, wish I noticed it earlier. What do you think?

cwbeitel · 2018-06-21T23:27:32Z

No worries yeah that sounds fine for now, thanks for the cycles thus far. I'll make that change.

As well as look into getting permission to host the data in a requester-pays GCS bucket. The code for scraping their API seems like it will be fragile over the long-term and I think a better approach than having to maintain this is to have it work once and provide that result as a pre-processed dataset in the style of things like imagenet, celeba, etc.

* Simplify allen_brain_utils* to download smaller static list of images, avoiding allensdk dependency. * Simplify _generator given input is already in [0, 255] * _generator ignores likely non-tissue/background regions based on max intensity * Fractional in-painting step added to pre-processor, depends on inpaint_fraction property, randomly positioned square region * Random crop to target dimensions instead of downsampling leaving full resolution of target intact * Re-using same generated data for different problems that share examples (differing in terms of how these are pre-processed)

cwbeitel · 2018-07-12T23:03:03Z

@rsepassi Hey! I removed the allensdk dependency but kept the data download utils which is far simpler if maintaining a list of image IDs (instead of building these through various queries to the API). There are 100 IDs included here randomly sampled across the full set of Mouse data. Given that we're considering 64 x 64px regions of images that are like 8.5k x 15k that's enough to generate around 3M examples. If that's not enough I referenced the API docs and a gist that includes the old download code.

rsepassi · 2018-08-03T01:09:50Z

Thank you @cwbeitel and sorry for the delay. Will merge shortly.

cwbeitel · 2018-08-03T14:31:39Z

@rsepassi No worries. Thank you.

PiperOrigin-RevId: 207290554

googlebot added the cla: yes PR author has signed CLA label Jun 15, 2018

run allen tests separately on ci

d6eb9c1

cwbeitel changed the title ~~[WIP] img2img_allen_brain problem defs~~ img2img_allen_brain problem defs Jun 15, 2018

cwbeitel changed the title ~~img2img_allen_brain problem defs~~ [WIP] img2img_allen_brain problem defs Jun 15, 2018

Merge branch 'master' into allen-brain-problem-defs

2d4093e

travis debug

a9e8661

cwbeitel added 2 commits June 19, 2018 17:10

include previously excluded setup.py that defines [allen] step

0825eef

smaller e2e test, maybe addresses travis OOM

5731010

linter fixes

b294668

cwbeitel changed the title ~~[WIP] img2img_allen_brain problem defs~~ img2img_allen_brain problem defs Jun 19, 2018

cwbeitel and others added 3 commits July 12, 2018 16:03

Merge branch 'master' into allen-brain-problem-defs

4b963d7

cleanup

4eca1b7

Merge branch 'master' into allen-brain-problem-defs

1b81131

rsepassi merged commit 7967b44 into tensorflow:master Aug 3, 2018

tensorflow-copybara pushed a commit that referenced this pull request Aug 3, 2018

internal merge of PR #872

ac8f6e3

PiperOrigin-RevId: 207290554

img2img_allen_brain problem defs #872

img2img_allen_brain problem defs #872

Uh oh!

Conversation

cwbeitel commented Jun 15, 2018

Uh oh!

cwbeitel commented Jun 15, 2018

Uh oh!

rsepassi commented Jun 15, 2018

Uh oh!

cwbeitel commented Jun 15, 2018

Uh oh!

rsepassi commented Jun 19, 2018

Uh oh!

cwbeitel commented Jun 19, 2018

Uh oh!

cwbeitel commented Jun 19, 2018

Uh oh!

cwbeitel commented Jun 19, 2018

Uh oh!

cwbeitel commented Jun 19, 2018

Uh oh!

cwbeitel commented Jun 21, 2018

Uh oh!

rsepassi commented Jun 21, 2018

Uh oh!

cwbeitel commented Jun 21, 2018

Uh oh!

cwbeitel commented Jul 12, 2018

Uh oh!

rsepassi commented Aug 3, 2018

Uh oh!

cwbeitel commented Aug 3, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants