Release Fifth development release · thunder-project/thunder

We are pleased to announce the release of Thunder 0.5.0. This release introduces several new features, including a new framework for image registration algorithms, performance improvements for core data conversions, improved EC2 deployment, and many bug fixes. This release requires Spark 1.1.0 or later, and is compatible with the most recent Spark release, 1.3.0.

Major features

A new image registration API inside the new thunder.imgprocessing package. See the tutorial.
Significant performance improvements to the Images to Series conversion, including a Blocks object as an intermediate stage. The inverse conversion, from Series back to Images, is now supported.
Support for TIFF image files as an input format has been expanded and made more robust. Multiple image volumes can now be read from a single input file via the nplanes argument in the loading functions, and files can be read from nested directory trees using the recursive=True flag.
New methods for working with multi-level indexing on Series objects, including selectByIndex and seriesStatByIndex. See the tutorial.
Convenient new getter methods for extracting individual records or small sets of records using bracket notation, as in Series[(x,y,z)] or Images[k].
A new serializable decorator to make it easy to save/load small objects (e.g. models) to JSON, including handling of numpy arrays. See saving/loading of RegistrationModel for an example.
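
To illustrate the idea behind a serializable decorator, here is a minimal sketch of a class decorator that adds JSON save/load helpers. It is not Thunder's implementation: the method names to_json and from_json and the ShiftModel class are hypothetical, and unlike the decorator described above this sketch does not handle numpy arrays (they would first need converting to lists).

```python
import json


def serializable(cls):
    """Class decorator (hypothetical sketch, not Thunder's implementation)
    that adds JSON save/load helpers based on the instance's attributes."""

    def to_json(self):
        # Serialize the instance's attribute dictionary; values must
        # themselves be JSON-compatible (numbers, strings, lists, dicts).
        return json.dumps(self.__dict__, sort_keys=True)

    def from_json(klass, text):
        # Rebuild an instance without calling __init__, then restore
        # the saved attributes.
        obj = klass.__new__(klass)
        obj.__dict__.update(json.loads(text))
        return obj

    cls.to_json = to_json
    cls.from_json = classmethod(from_json)
    return cls


@serializable
class ShiftModel(object):
    """Toy model holding one translation per image (illustrative only)."""

    def __init__(self, shifts):
        self.shifts = shifts


model = ShiftModel({"0": [1, -2], "1": [0, 3]})
restored = ShiftModel.from_json(model.to_json())
```

The round trip returns a new ShiftModel whose shifts attribute equals the original. Keeping the helpers on the class means a small model can be written to disk and reloaded without a separate serializer per class.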

Minor features

Parameters can be loaded from a file with a simple JSON schema (useful for working with covariates), using ThunderContext.loadParams
A new method ThunderContext.setAWSCredentials handles AWS credential settings in managed cluster environments (where it may not be possible to modify system config files)
An Images object can be saved to a collection of binary files using Images.saveAsBinaryImages
Data objects now have a consistent __repr__ method, displaying uniform and informative results when these objects are printed.
Images and Series objects now each offer a meanByRegions() method, which calculates a mean over one or more regions specified either by a set of indices or a mask image.
TimeSeries has a new convolve() method.
The thunder and thunder-submit executables have been modified to better expose the options available in the underlying pyspark and spark-submit Spark executable scripts.
An improved and streamlined Colorize with new colorization options.
Load data hosted by the Open Connectome Project with the loadImagesOCP method.
New example data sets available, both for local testing and on S3
New tutorials: regression, image registration, multi-level indexing
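
The two ways of specifying a region for a regional mean (a set of indices or a mask image) can be sketched in plain Python. This is only an illustration of the computation on a single small image held as nested lists. The function names mean_by_mask and mean_by_indices are hypothetical and are not the meanByRegions() signature.

```python
def mean_by_mask(image, mask):
    """Mean of the pixels in `image` where `mask` is truthy."""
    values = [pixel
              for image_row, mask_row in zip(image, mask)
              for pixel, keep in zip(image_row, mask_row)
              if keep]
    return sum(values) / float(len(values))


def mean_by_indices(image, indices):
    """Mean of the pixels at the given (row, column) positions."""
    values = [image[r][c] for r, c in indices]
    return sum(values) / float(len(values))


image = [[1, 2, 3],
         [4, 5, 6]]
mask = [[1, 0, 0],
        [1, 0, 1]]

# Both calls select the same three pixels: 1, 4 and 6.
by_mask = mean_by_mask(image, mask)
by_indices = mean_by_indices(image, [(0, 0), (1, 0), (1, 2)])
```

Both variables hold the same mean, since the mask and the index list describe the same region. An empty region would raise ZeroDivisionError in this sketch.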

Transition guide

Some keyword parameters have been changed for consistency with the Thunder style guide naming conventions. Examples are the inputformat, startidx, and stopidx parameters on the ThunderContext loading methods, which are now inputFormat, startIdx, and stopIdx, respectively. We expect minimal future changes in existing method and parameter names.
The Series methods normalize() and detrend() have been moved to TimeSeries objects, which can be created by the Series.toTimeSeries() method.
The default file extension for the binary stack format is now bin instead of stack. If you need to load files with the stack extension, you can use the ext='stack' keyword argument of loadImages.
export is now a method on the ThunderContext instead of a standalone function, and now supports exporting to S3.
The loadImagesAsSeries and convertImagesToSeries methods on ThunderContext now default to shuffle=True, making use of a revised execution path that should improve performance.
The method for loading example data has been renamed from loadExampleEC2 to loadExampleS3
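
For scripts that build keyword arguments programmatically, the renames listed above can be applied mechanically. The helper below is a hypothetical sketch, not part of Thunder. It covers only the three keyword names given in this guide, and the argument values are placeholders.

```python
# Old-to-new keyword names taken from the transition guide above.
RENAMED_KEYWORDS = {
    "inputformat": "inputFormat",
    "startidx": "startIdx",
    "stopidx": "stopIdx",
}


def migrate_kwargs(kwargs):
    """Return a copy of `kwargs` with pre-0.5.0 keyword names updated.

    Names that are not in RENAMED_KEYWORDS pass through unchanged.
    """
    return dict((RENAMED_KEYWORDS.get(name, name), value)
                for name, value in kwargs.items())


# Placeholder values, for illustration only.
old_call = {"inputformat": "stack", "startidx": 0, "stopidx": 10}
new_call = migrate_kwargs(old_call)
```

The resulting dictionary could then be unpacked into one of the ThunderContext loading methods in place of the old keyword names.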

Deployment and development

Anaconda is now the default Python installation on EC2 deployments, as well as on our Travis server for testing.
EC2 scripts and unit tests provide quieter and prettier status outputs.
Egg files are now included with official releases, so that a pip install of thunder-python can immediately be deployed on a cluster without cloning the repo and building an egg.

Contributions

Andrew Osheroff (data getter improvements)
Ben Poole (optimized window normalization, image registration)
Jascha Swisher (images to series conversion, serializable class, tif handling, get and meanBy methods, bug fixes)
Jason Wittenbach (new series indexing functionality, regression and indexing tutorials, bug fixes)
Jeremy Freeman (image registration, EC2 deployment, exporting, colorizing, bug fixes)
Kunal Lillaney (loading from OCP)
Michael Broxton (serializable class, new series statistics, improved EC2 deployment)
Noah Young (improved EC2 deployment)
Tom Sainsbury (image filtering, PNG saving options)
Uri Laserson (submit scripts, Hadoop versioning)

Roadmap

Moving forward, we will do a code freeze and cut a release every three months. The next release is planned for June 30th.

For 0.6.0 we will focus on the following components:

A source extraction / segmentation API
New capabilities for regression and GLM model fitting
New image registration algorithms (including volumetric methods)
Latent factor and network models
Improved performance on single-core workflows
Bug fixes and performance improvements throughout

If you are interested in contributing, let us know! Check out the existing issues or join us in the chatroom.
