Refactor (WIP) #237

freeman-lab · 2015-11-15T22:32:23Z

This is a huge refactoring of Thunder, and will be the basis of an upcoming new release. We'd normally break it up into multiple PRs, but this touches so much of the code base that it was easier to do all at once.

There are three primary goals, based on a year of community experience and feedback, and consideration of the current ecosystem:

Loosen the dependency on Spark. This is a big one. Many superficial issues, including installation problems and complexity for new users and contributors, are due to Thunder's hard dependence on Spark. We will definitely continue to support Spark, but we also want to enable working seamlessly across local and distributed environments, and against a variety of execution engines, including Spark but also newer libraries like Dask. This PR begins that effort through some fundamental but necessary refactoring.
Modularize the components. Thunder has started absorbing a wide variety of algorithms / analyses, especially with recent additions to image registration and spatiotemporal source extraction. These components are at different levels of maturity and specificity, and are better off as pluggable, composable pieces living in separate repos.
Modernize the codebase, and make it friendlier to the Python ecosystem, in particular by ensuring Python 3 compatibility, using py.test for unit tests, and adopting Pythonic naming conventions.

refactoring

new packages (inside thunder-project)

rime - source extraction
sleet - image registration
thundercloud - manage cluster on ec2

new packages (external)

station - context manager for distributed backends
checkist - minimal argument checking
showit - simple display of images and tiled images
serdeme - custom class serialization/deserialization

d-v-b · 2015-11-15T23:00:47Z

Looks great so far. I'm curious to hear more about how Thunder will abstract over parallelization engines.

Regarding the new packages, would usage be something like from thunder.sleet import fancy_registration?

freeman-lab · 2015-11-15T23:29:51Z

Thanks @d-v-b !

For engine switching, there's a new user-facing global context for switching between backends, and internally we condition on its state (it mainly only matters during loading).

For example, you'll be able to load using spark with

import thunder
thunder.setup(spark=True)
data = thunder.series.fromBinary('path')

and load locally with

import thunder
thunder.setup()
data = thunder.series.fromBinary('path')

or because local is the default you can just do

import thunder
data = thunder.series.fromBinary('path')

In all cases the returned object data will (eventually) know how to talk local or distributed. As of now only the Spark paths will work, but the scaffolding is there to support others.
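
As a rough illustration of the "global context plus conditioning at load time" idea described above, here is a minimal sketch. Every name in it (`_state`, `_LOADERS`, `fromlist`, and the dictionary return values) is hypothetical and is not taken from Thunder's code; the Spark branch is a placeholder that does not touch Spark at all.

```python
# Hypothetical sketch of a user-facing global context that loaders consult.
# None of these names come from Thunder itself.

_state = {"engine": "local"}  # local is the default backend


def setup(spark=False):
    """Select the backend used by subsequent loading calls."""
    _state["engine"] = "spark" if spark else "local"


def _load_local(values):
    # Local path: keep the values in memory.
    return {"engine": "local", "values": list(values)}


def _load_spark(values):
    # Placeholder only: a real Spark path would distribute the values
    # (for example as an RDD) instead of building a list.
    return {"engine": "spark", "values": list(values)}


_LOADERS = {"local": _load_local, "spark": _load_spark}


def fromlist(values):
    """Toy loader that conditions on the current global context."""
    return _LOADERS[_state["engine"]](values)
```

With this shape, user code stays the same regardless of backend; only the `setup` call changes which loader runs.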

freeman-lab · 2015-11-15T23:31:35Z

Oh and I haven't decided yet about the external libraries, could alias them as you suggest, in which case thunder acts as a meta-package (on top of its core functionality), or just do manual imports as in

from rime.algorithms import NMF
from sleet.algorithms import CrossCorr

open to suggestions here!

d-v-b · 2015-11-16T00:01:22Z

Regarding the external libraries, I'm not sure what would be best. If you envision people using the registration and source extraction methods outside of Thunder, then separate libs makes sense. Is this your intention?

In the other case, I don't see a big difference between
from thunder.registration import registration_method
and
from thunder.sleet.algorithms import registration_method

freeman-lab · 2015-11-16T01:33:07Z

@d-v-b we'll get the "using it outside Thunder" thing regardless, that's the nice thing about making them separate, assuming we structure them correctly 👍 the aliasing is just sugar in case people forget the names. I'll look at how some other projects do it...
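
One way the aliasing sugar could work is sketched below, assuming the separate packages are optional dependencies. The helper `alias_optional` and the namespace object are hypothetical; the standard-library `json` module stands in for an installed package so that the snippet runs anywhere.

```python
import importlib
import types


def alias_optional(namespace, aliases):
    """Attach each importable package to `namespace` under its alias.

    `aliases` maps alias name -> package name. Packages that are not
    installed are skipped; their aliases are returned so the caller can
    report them.
    """
    missing = []
    for alias, package in aliases.items():
        try:
            module = importlib.import_module(package)
        except ImportError:
            missing.append(alias)
            continue
        setattr(namespace, alias, module)
    return missing


# Stand-in for the top-level package namespace (hypothetical).
meta = types.SimpleNamespace()
# `json` plays the role of an installed component such as sleet or rime.
skipped = alias_optional(meta, {"serialization": "json"})
```

Because the components remain ordinary packages, a direct import of the component keeps working whether or not the alias exists.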

freeman-lab · 2015-11-27T20:00:17Z

@d-v-b awesome suggestion! Didn't realize it was included with skimage, that's perfect. Just made the change for reading, seems to work great. For writing, how do you think we should handle dimensions and bit-depth? After playing with the options, here's a rough proposal:

assume all channels are luminance channels, so an (x, y) image will be saved as a single grayscale image and an (x, y, n) image will always be n grayscale image pages, as opposed to treating it as RGB or RGBA in the special case of n == 3 or n == 4 (i'd rather just add proper support for color channels down the road)
for both tif and png, make bit-depth a parameter (probably 8 for png, and 8/16/32 for tif), scale the values between a min and a max (an optional parameter), and convert to the appropriate numerical type (uint8, uint16, uint32) before writing.
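
The first point of the proposal (every trailing channel becomes its own grayscale page) can be sketched with NumPy as follows. The function name `to_pages` is hypothetical, and the actual writing step is left out.

```python
import numpy as np


def to_pages(image):
    """Split an image into a list of 2D grayscale pages.

    An (x, y) array yields one page; an (x, y, n) array yields n pages,
    including when n is 3 or 4 (no RGB/RGBA special case).
    """
    image = np.asarray(image)
    if image.ndim == 2:
        return [image]
    if image.ndim == 3:
        # Move the channel axis to the front so iteration yields (x, y) pages.
        return list(np.moveaxis(image, -1, 0))
    raise ValueError("expected a 2D or 3D array, got %d dimensions" % image.ndim)
```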

freeman-lab · 2015-11-27T20:05:52Z

And while I'm renaming... the current plan is as follows:

for two-word names where the first word is four characters or less, make it one word, e.g.

thunder.images.frompng
thunder.series.fromarray
data.topng
data.tobinary

for names where the first word is longer than four characters, or there are more than two words, use snake case, e.g.

data.apply_values
data.filter_keys
data.group_by_window

Exceptions would only be to ensure consistency with closely associated Python libraries (e.g. numpy).
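
The rule can be written down as a small function, which may help when checking names mechanically. This is only a sketch: the helper `rename` is hypothetical, the camelCase inputs are illustrative, and the numpy-consistency exceptions mentioned above are not handled.

```python
import re


def rename(name):
    """Convert a camelCase name to the proposed convention."""
    # Split into words: "groupByWindow" -> ["group", "by", "window"].
    words = [w.lower() for w in re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)]
    # Two words with a short first word (four characters or fewer): join directly.
    if len(words) == 2 and len(words[0]) <= 4:
        return "".join(words)
    # Otherwise use snake case.
    return "_".join(words)


print(rename("fromBinary"))     # frombinary
print(rename("applyValues"))    # apply_values
print(rename("groupByWindow"))  # group_by_window
```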

Let me know if anyone disagrees!

d-v-b · 2015-11-27T20:38:24Z

@freeman-lab
I agree that 3d arrays should be saved as stacks. How would a user save a 3D array as an RGB image under this regime? Perhaps by making a 4D array with dims = [t, z, y, x] where z is singleton? It's not critical -- in the worst case, the user can fall back to a custom implementation of tifffile.py.

Regarding bit-depth and data type, we should be sure to allow everything that tifs can hold (or everything fiji can read...) -- I think tifs can contain 32-bit floats, and for fractional data (e.g. dff timeseries) this is pretty critical. Also I'm fairly sure signed integer types are allowed, so the handling may need to be nuanced. Numpy allows arrays with dtype float16 and float64, neither of which can be read by fiji, although iirc I've saved tifs with these data before.

Perhaps for recasting data, there could be some kwargs to specify how this should be done, e.g.
data.totif(fname, dtype='uint16', rescale=True) if the user wants the data to be cast as int and rescaled to fill the bit depth. Rescale should probably default to False since the default, non-rescaling casting operation is what users should expect. Sorry to be predictably pedantic about dtypes 😄
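
A minimal sketch of the casting behaviour discussed here, assuming NumPy and covering only the conversion step (not the file writing). The function `cast` and its `vmin`/`vmax` parameters are hypothetical; rescaling is only applied for integer targets and is meant for 8/16/32-bit types, since 64-bit ranges are not exactly representable in float64.

```python
import numpy as np


def cast(values, dtype="uint16", rescale=False, vmin=None, vmax=None):
    """Cast values to dtype, optionally rescaling to fill an integer range."""
    values = np.asarray(values)
    target = np.dtype(dtype)
    # Default: plain cast, no rescaling. Float targets are never rescaled.
    if not rescale or target.kind not in "ui":
        return values.astype(target)
    lo = values.min() if vmin is None else vmin
    hi = values.max() if vmax is None else vmax
    info = np.iinfo(target)
    if hi == lo:
        # Constant input: map everything to the bottom of the range.
        return np.full(values.shape, info.min, dtype=target)
    # Clip to [lo, hi], map to [0, 1], then stretch over the integer range.
    unit = (np.clip(values, lo, hi).astype("float64") - lo) / (hi - lo)
    scaled = unit * (float(info.max) - float(info.min)) + float(info.min)
    return np.round(scaled).astype(target)
```

Keeping `rescale=False` as the default means a float32 array passed with `dtype='float32'` goes through unchanged, which matters for fractional data.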

freeman-lab · 2015-11-27T20:51:21Z

Thanks @d-v-b , I was counting on your pedantry 👍

re: RGB, we could for now just add a rgb=True flag that treats 3D images as RGB
re: bit-depth, i like the rescale=True idea, we can put off deciding the default but I'm fine with making it false for now (which puts the onus on the user to do the right thing). The one catch is that png handling will differ from tif, because as far as I can tell the most sane, dependency-free png saving method is scipy.misc.imsave, which itself does scaling, so there's no way to skip it.

d-v-b · 2015-11-28T18:57:49Z

@freeman-lab,
I completely support the renaming! Just expect confusion from lots of users who aren't active on github / gitter after the update...

freeman-lab · 2015-11-28T19:36:30Z

@d-v-b 👍 careful documentation and explanation of changes (esp. breaking ones!) will be a high priority as soon as this is done

auvipy · 2015-12-08T07:17:00Z

thunder and scikit-image integration?

freeman-lab · 2016-01-08T00:20:16Z

closing in favor of #243

Starting refactor (WIP)

f49769a

freeman-lab changed the title ~~Starting refactor (WIP)~~ Refactor (WIP) Nov 15, 2015

freeman-lab added 24 commits November 25, 2015 01:34

Rewriting tests (WIP)

a16380b

Add timeseries tests

de530d3

Update travis

b57df1a

Move version

9fc5ed6

Move version again

1d0e9a6

Remove package

b4c6711

Move tests

63b795b

Add cd

054a9d5

Add pytest to conda

87a6730

Add cd back

b3662d8

Move tests again

d0d8d20

Reorganize config

b00dc61

Debug conftest

52ea283

Fix missing spark paths

f9bcd4c

Add data tests

ac0a7d5

Test cleanup

342b1e4

Standardize use of keys

d5cb184

Add images tests

ca3f0d8

Clean up

d548547

Add tests for readers

25c1487

Add block tests

e226672

Add key and shape tests

c288709

Add image io tests

7e8498c

Remove uneccessary pil array code

7627211

freeman-lab added 2 commits November 27, 2015 16:07

Continue renaming

15cca2e

More renaming

e931cd7

freeman-lab added 15 commits December 1, 2015 21:29

Use Spark 1.5 on travis

3262b2b

Add tests for example loading

6d6020e

Fix references to Spark version

8929b2e

Fix requirements

1d1bcb6

Name changes

034bbc8

Use both to predownload series test data

76b16af

Clean up example loading

8e7d574

More renaming (almost done!)

46c3c00

Clean up

a2a6933

Doc tweaks

40b134f

More naming updates

efa7d09

Move base read/write modules

c2450be

Restore series to images methods

cde559d

Travis updates to reflect requirements

374eb7c

Formatting cleanup

25e2fad

This was referenced Dec 10, 2015

Support for Python 3.4 #209

Closed

Support compressed tifs for read/write #239

Open

freeman-lab closed this Jan 8, 2016

freeman-lab deleted the refactor branch April 5, 2016 22:07
