
support multiple time points per image file #89

Merged
merged 14 commits into thunder-project:master on Feb 12, 2015

Conversation

industrial-sloth
Contributor

This PR adds an nplanes option to the main Images-loading methods. If nplanes is specified, then a single input file will be interpreted as containing multiple image volumes, each with size nplanes in its final dimension. For instance, a single binary stack file loaded with arguments dims=(x, y, 8), nplanes=2 would turn into 4 separate records in an Images RDD, each with size (x, y, 2). In general, images that are loaded with z planes and a positive nplanes argument will result in z / nplanes time points, each with nplanes planes.
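To make the splitting concrete, here is a minimal standalone sketch of the semantics described above (plain numpy only, not the loader's actual code; the array sizes are just for illustration):

```python
import numpy as np

# Illustration of the nplanes splitting described above, not the loader's code.
# One stack file with dims=(x, y, 8) and nplanes=2 yields 4 records of (x, y, 2).
x, y, z, nplanes = 64, 64, 8, 2
stack = np.zeros((x, y, z), dtype='int16')

assert z % nplanes == 0  # nplanes must evenly divide the number of planes
records = [(i, stack[:, :, i * nplanes:(i + 1) * nplanes])
           for i in range(z // nplanes)]

assert len(records) == z // nplanes            # 4 records
assert records[0][1].shape == (x, y, nplanes)  # each of size (x, y, 2)
```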

Initially, with nplanes specified, keys for Images RDDs were (fileidx, pointInFile) integer pairs. However, this caused problems downstream in blocks conversion. This commit changes the keys back to plain integer indices. The downside is that for tifs, the entire tif must be loaded into memory in order to count the number of planes, whereas previously this could be pipelined lazily.
Also fixes a bug with Images properties not being set correctly when nplanes is passed.
@industrial-sloth
Contributor Author

Meh. This has some overlap with PR #87, which changes the expected dimensions of images with only a single z-plane from (x, y, 1) to just (x, y). I've written this code anticipating this new behavior, but forgot to update all the tests accordingly. Will pull in the relevant changes from PR #87 and put those in a new commit here as well to get the tests passing again.

This is consistent with the new behavior defined in PR #87 (the tifs-and-tif-stacks branch). New imagesloader code in this branch already anticipates that behavior; this commit just updates seriesloader for consistency with imagesloader and changes test expectations accordingly.
@freeman-lab
Member

@industrial-sloth you were right, merging the other tif PR created a conflict with this one.

Conflicts:
	python/test/test_context.py
	python/test/test_imagesloader.py
	python/thunder/rdds/fileio/imagesloader.py
	python/thunder/utils/context.py
@industrial-sloth
Contributor Author

Whoops, missed changing some function names in the merge, fixing.

@@ -75,34 +76,82 @@ def fromStack(self, dataPath, dims, dtype='int16', ext='stack', startIdx=None, s
If true, will recursively descend directories rooted at datapath, loading all files in the tree that
have an extension matching 'ext'. Recursive loading is currently only implemented for local filesystems
(not s3).

nplanes: positive integer, default None
If passed, will cause a single binary stack file to be subdivided into multiple time points. Every
@freeman-lab
Member

Let's avoid the explicit references to time. I just checked and we're generally good about this, but in fixing it here it would be great if you caught any other uses. There's certainly a notion of the images being /ordered/, both here and elsewhere, but that ordering need not relate to time specifically.

@freeman-lab
Member

@industrial-sloth this is looking good, left a couple of minor notes. One other question: how does the ordering work? Am I right in assuming that, given three (X,Y,Z) image files, call them 1, 2, 3, and setting nplanes=Z/2, you would end up with an ordering of 1_1, 1_2, 2_1, 2_2, 3_1, 3_2, where 1_1 and 1_2 are the first and second chunks of the first image, etc.? In other words, it'll be ordered first by image file, then by chunk within the file. That seems like what we want, just making sure.

@industrial-sloth
Copy link
Contributor Author

@freeman-lab yes, that's the behavior. For concreteness, if we had N 10-page tif files and set nplanes=5, the first five pages of the first tif file would end up as an Images record with key 0. The second five pages of the first file would be the record with key 1, the first five pages of the second file would be record 2, and so on. (I'm pretty sure we're saying the same thing here, but just wanted to check. :)
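In other words, the resulting integer keys follow file-major order. A small sketch of that numbering (an illustration only; it assumes every file has the same page count, and `record_key` is not a function in the codebase):

```python
# Illustration of the key ordering described above; not the loader's actual code.
# Assumes every input file has the same number of pages.
def record_key(file_idx, chunk_idx, pages_per_file, nplanes):
    chunks_per_file = pages_per_file // nplanes
    return file_idx * chunks_per_file + chunk_idx

# Three 10-page tif files with nplanes=5 -> six records, keyed 0..5,
# ordered first by file and then by chunk within the file.
keys = [record_key(f, c, 10, 5) for f in range(3) for c in range(2)]
assert keys == [0, 1, 2, 3, 4, 5]
```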

Previously docs referred to 'time points', now describe more
general 'records'.
@industrial-sloth
Contributor Author

Ok, I've changed the docstrings to talk about generic "records" rather than "time points". Looks like there's a merge conflict to resolve, will fix that momentarily.

@industrial-sloth
Contributor Author

I've added checks to the tif and binary stack readers that nplanes evenly divides the number of pages / planes in an image file. A slight oddity: for the tif reader, the most natural home for this check is on the executors, not the driver, since the driver doesn't know how many pages a tif file has. At the moment the executors throw ValueError if nplanes doesn't divide the total page count evenly. This leads to a somewhat weird-looking stack dump, but it's still interpretable, I think.

The flat binary stack file case is just fine, since there the driver knows what the dimensions are ahead of time and can do the check before launching jobs.
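For illustration, a check along these lines might look like the following (a hedged sketch only; the function name and error message are assumptions, not the actual reader code):

```python
def check_nplanes_divides(npages, nplanes, path):
    # Sketch of the divisibility check described above (not the actual reader code).
    # For tifs this has to run where the file is read (the executors), since only
    # there is the page count known; for flat binary stacks the driver already
    # knows the dims and can raise before any jobs are launched.
    if nplanes is not None and npages % nplanes != 0:
        raise ValueError("nplanes (%d) does not evenly divide the number of "
                         "pages (%d) in file '%s'" % (nplanes, npages, path))
```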

@freeman-lab
Member

Thanks @industrial-sloth, agreed that doing the check on the executors for the tifs is kind of the only option for now, short of some convoluted process by which we load one file to check and then proceed if it's ok. I think this is fine, will merge this version in.

freeman-lab added a commit that referenced this pull request Feb 12, 2015
support multiple time points per image file
@freeman-lab freeman-lab merged commit d2b16c0 into thunder-project:master Feb 12, 2015
@industrial-sloth industrial-sloth deleted the megastacks branch February 12, 2015 23:51