added map_as_series #256

Merged
merged 4 commits into thunder-project:1.0.0 on Mar 7, 2016

Conversation

jwittenbach
Contributor

Adds an Images.map_as_series method that uses Blocks to apply a function to each series in an Images object and then turns the result back into an Images object. This avoids transforming the data all the way to a Series representation, which can be quite expensive to convert back into Images: the data becomes highly fragmented when the total size of the spatial dimensions greatly exceeds the size of the temporal dimension.
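For readers unfamiliar with the idea, here is a minimal local sketch of a block-wise series map in plain NumPy. It is not the code in this PR and does not use Spark or the Blocks class; the function name `map_as_series_blockwise` and the `block_shape` parameter are hypothetical, and the data is assumed to be a single `(time, x, y)` array. Each block keeps the full time axis but only a spatial tile, so the function sees whole time series without the data ever being split into one record per pixel.

```python
import numpy as np


def map_as_series_blockwise(images, func, block_shape):
    """Apply `func` to every pixel's time series, one spatial block at a time.

    images      : array of shape (time, x, y)
    func        : maps a 1-D time series to a 1-D array of fixed length
    block_shape : (bx, by) spatial size of each block (hypothetical parameter)
    """
    n_time, nx, ny = images.shape
    bx, by = block_shape

    # Probe the output length once so the result can be preallocated.
    n_out = np.asarray(func(images[:, 0, 0])).shape[0]
    # Results are stored as floats in this sketch.
    result = np.empty((n_out, nx, ny), dtype=float)

    for x0 in range(0, nx, bx):
        for y0 in range(0, ny, by):
            # A block holds all time points for one spatial tile.
            block = images[:, x0:x0 + bx, y0:y0 + by]
            # Run func on each pixel's time series (axis 0) within the tile.
            result[:, x0:x0 + bx, y0:y0 + by] = np.apply_along_axis(func, 0, block)

    return result


# Toy data: 4 time points, 6 x 5 pixels; blocks need not divide the image evenly.
images = np.arange(4 * 6 * 5, dtype=float).reshape(4, 6, 5)
detrended = map_as_series_blockwise(images, lambda ts: ts - ts.mean(), (4, 2))
```

The output is again image-shaped, which is the point of the approach: no round trip through a per-pixel representation is needed to get back to images.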

@d-v-b
Contributor

d-v-b commented Mar 7, 2016

This sounds great. Any estimate of the performance boost over Images -> Series -> Images?

@jwittenbach
Contributor Author

@d-v-b from my anecdotal experience, it can be the difference between the job completely failing and being able to run to completion!

We decided to go this route rather than implementing a stand-alone thunder-movie package.

@freeman-lab
Member

Would be really cool to report at least one benchmark alongside this change. Just pick some fairly big representative workflow and time the two methods, though obviously not one so big that the older method fails.

@jwittenbach
Contributor Author

tests pass...merging

I'll get back with some stats on how this performs compared to Images.toseries().map(f).toimages()
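A comparison like this could be timed with a small harness along the following lines. This is only a sketch: `time_call`, `old_route`, and `new_route` are hypothetical names, and the two stand-in functions do trivial arithmetic rather than calling Thunder. In a real run they would wrap the two pipelines named in this thread, and on a lazy backend each would need to end with an action that forces evaluation.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Stand-ins only: replace these bodies with the old route
# (Images -> Series -> map -> Images) and the new map_as_series route.
def old_route():
    return sum(i * i for i in range(200_000))


def new_route():
    return sum(i * i for i in range(50_000))


timings = {name: time_call(fn) for name, fn in [("old", old_route), ("new", new_route)]}
print(timings)
```

Taking the best of several repeats reduces noise from scheduling and caching, though for multi-minute cluster jobs a single run per method may be all that is practical.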

jwittenbach added a commit that referenced this pull request Mar 7, 2016
jwittenbach merged commit 8ba4720 into thunder-project:1.0.0 Mar 7, 2016
@jwittenbach
Contributor Author

@d-v-b @freeman-lab @sofroniewn

Preliminary analysis using a 20-node cluster (19 workers): https://gist.github.com/jwittenbach/dca311743395d904c3d7

The last cell is still running...25 minutes later

@jwittenbach
Contributor Author

Also bumped the number of nodes up to 40 (39 workers). The new map_as_series approach shows a linear speedup (2x nodes => 2x speed). The old approach still stalls out and never completes.
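For anyone repeating this scaling check, "linear" can be quantified as speedup divided by the ratio of worker counts. The sketch below uses the worker counts from this thread (19 and 39); the run times are made-up placeholders, not measurements, and `scaling_efficiency` is a hypothetical helper.

```python
def scaling_efficiency(t_small, t_large, workers_small, workers_large):
    """Speedup and parallel efficiency when growing a cluster.

    speedup    = t_small / t_large
    efficiency = speedup / (workers_large / workers_small); 1.0 means linear.
    """
    speedup = t_small / t_large
    ideal = workers_large / workers_small
    return speedup, speedup / ideal


# Worker counts come from the thread; the timings (in seconds) are placeholders.
speedup, efficiency = scaling_efficiency(
    t_small=600.0, t_large=300.0, workers_small=19, workers_large=39
)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

Note that going from 19 to 39 workers is slightly more than a doubling, so an exact 2x speedup corresponds to an efficiency just under 1.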

@sofroniewn

Nice, that's great!

@d-v-b
Contributor

d-v-b commented Mar 8, 2016

Nice, I hope we can kiss those hanging stages goodbye

4 participants