Fifth development release
We are pleased to announce the release of Thunder 0.5.0. This release introduces several new features, including a new framework for image registration algorithms, performance improvements for core data conversions, improved EC2 deployment, and many bug fixes. This release requires Spark 1.1.0 or later, and is compatible with the most recent Spark release, 1.3.0.
Major features
- A new image registration API inside the new thunder.imgprocessing package. See the tutorial.
- Significant performance improvements to the Images to Series conversion, including a Blocks object as an intermediate stage. The inverse conversion, from Series back to Images, is now supported.
- Support for TIFF image files as an input format has been expanded and made more robust. Multiple image volumes can now be read from a single input file via the nplanes argument in the loading functions, and files can be read from nested directory trees using the recursive=True flag.
- New methods for working with multi-level indexing on Series objects, including selectByIndex and seriesStatByIndex. See the tutorial.
- Convenient new getter methods for extracting individual records or small sets of records using bracket notation, as in Series[(x,y,z)] or Images[k].
- A new serializable decorator to make it easy to save/load small objects (e.g. models) to JSON, including handling of numpy arrays. See saving/loading of RegistrationModel for an example.
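The idea behind multi-level index selection can be sketched in plain Python. This is only an illustration of the concept, not Thunder's implementation: the helper names select_by_index and stat_by_index, their signatures, and the (trial, condition) index layout are all assumptions made for this example, and no Spark or Thunder installation is needed to run it.

```python
from collections import defaultdict
from statistics import mean


def select_by_index(values, index, val, level=0):
    """Keep the entries of `values` whose index tuple equals `val` at `level`.

    Hypothetical helper for illustration; not part of Thunder.
    """
    return [v for v, idx in zip(values, index) if idx[level] == val]


def stat_by_index(values, index, stat, level=0):
    """Group `values` by the index entry at `level` and apply `stat` to each group.

    Hypothetical helper for illustration; not part of Thunder.
    """
    groups = defaultdict(list)
    for v, idx in zip(values, index):
        groups[idx[level]].append(v)
    return {key: stat(group) for key, group in sorted(groups.items())}


# One record with six time points, each labeled by an assumed (trial, condition) pair.
index = [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Select the time points belonging to condition 'a' (level 1 of the index).
condition_a = select_by_index(values, index, 'a', level=1)

# Average within each trial (level 0 of the index).
trial_means = stat_by_index(values, index, mean, level=0)
```

In this toy layout, selecting on level 1 picks out a condition across all trials, while a per-group statistic on level 0 collapses each trial to a single number. The tutorial describes how the actual Series methods behave.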
Minor features
- Parameters can be loaded from a file with a simple JSON schema (useful for working with covariates), using ThunderContext.loadParams.
- A new method ThunderContext.setAWSCredentials handles AWS credential settings in managed cluster environments (where it may not be possible to modify system config files).
- An Images object can be saved to a collection of binary files using Images.saveAsBinaryImages.
- Data objects now have a consistent __repr__ method, displaying uniform and informative results when these objects are printed.
- Images and Series objects now each offer a meanByRegions() method, which calculates a mean over one or more regions specified either by a set of indices or a mask image.
- TimeSeries has a new convolve() method.
- The thunder and thunder-submit executables have been modified to better expose the options available in the underlying pyspark and spark-submit Spark executable scripts.
- An improved and streamlined Colorize with new colorization options.
- Load data hosted by the Open Connectome Project with the loadImagesOCP method.
- New example data sets are available, both for local testing and on S3.
- New tutorials: regression, image registration, multi-level indexing.
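The two ways of specifying a region for a regional mean, a set of indices or a mask image, can be sketched on a small nested-list image. This is a conceptual illustration only, not the meanByRegions() implementation: the function names are hypothetical, and the convention that a mask holds integer labels with 0 meaning "no region" is an assumption made for this example.

```python
def mean_by_mask(image, mask):
    """Average the pixels of `image` within each nonzero label of `mask`.

    Hypothetical helper; assumes 0 in the mask means "not in any region".
    """
    totals, counts = {}, {}
    for image_row, mask_row in zip(image, mask):
        for pixel, label in zip(image_row, mask_row):
            if label == 0:
                continue
            totals[label] = totals.get(label, 0.0) + pixel
            counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


def mean_by_indices(image, indices):
    """Average the pixels of `image` at the given (row, column) positions.

    Hypothetical helper for illustration; not part of Thunder.
    """
    positions = list(indices)
    return sum(image[r][c] for r, c in positions) / len(positions)


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]

# Mask image: label 1 covers two pixels, label 2 covers the bottom row.
mask = [[1, 1, 0],
        [2, 2, 2]]

mask_means = mean_by_mask(image, mask)
column_mean = mean_by_indices(image, [(0, 2), (1, 2)])
```

A mask is convenient when regions come from an existing segmentation, while explicit indices suit a handful of hand-picked pixels.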
Transition guide
- Some keyword parameters have been changed for consistency with the Thunder style guide naming conventions. Examples are the inputformat, startidx, and stopidx parameters on the ThunderContext loading methods, which are now inputFormat, startIdx, and stopIdx, respectively. We expect minimal future changes in existing method and parameter names.
- The Series methods normalize() and detrend() have been moved to TimeSeries objects, which can be created by the Series.toTimeSeries() method.
- The default file extension for the binary stack format is now bin instead of stack. If you need to load files with the stack extension, you can use the ext='stack' keyword argument of loadImages.
- export is now a method on the ThunderContext instead of a standalone function, and now supports exporting to S3.
- The loadImagesAsSeries and convertImagesToSeries methods on ThunderContext now default to shuffle=True, making use of a revised execution path that should improve performance.
- The method for loading example data has been renamed from loadExampleEC2 to loadExampleS3.
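When migrating scripts written against earlier releases, the keyword renames listed above can be applied mechanically. The following is a minimal sketch, not a tool shipped with Thunder: the helper update_keywords is hypothetical, and the mapping covers only the three renames named in this guide, so other changed keywords would need to be added by hand.

```python
# Old keyword name -> new keyword name, taken from the examples in this guide.
# The list is not exhaustive.
RENAMED_KEYWORDS = {
    'inputformat': 'inputFormat',
    'startidx': 'startIdx',
    'stopidx': 'stopIdx',
}


def update_keywords(kwargs):
    """Return a copy of `kwargs` with pre-0.5.0 keyword names replaced.

    Hypothetical migration helper; keywords that were not renamed pass through.
    """
    return {RENAMED_KEYWORDS.get(name, name): value
            for name, value in kwargs.items()}


# Keyword arguments as they might appear in a script written for an older release.
old_call = {'inputformat': 'tif', 'startidx': 0, 'stopidx': 10, 'recursive': True}
new_call = update_keywords(old_call)
```

The updated dictionary could then be unpacked into one of the ThunderContext loading methods with the usual ** syntax, assuming the method in question accepts those keywords.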
Deployment and development
- Anaconda is now the default Python installation on EC2 deployments, as well as on our Travis server for testing.
- EC2 scripts and unit tests provide quieter and prettier status outputs.
- Egg files are now included with official releases, so that a pip install of thunder-python can be deployed on a cluster immediately, without cloning the repo and building an egg.
Contributions:
- Andrew Osheroff (data getter improvements)
- Ben Poole (optimized window normalization, image registration)
- Jascha Swisher (images to series conversion, serializable class, tif handling, get and meanBy methods, bug fixes)
- Jason Wittenbach (new series indexing functionality, regression and indexing tutorials, bug fixes)
- Jeremy Freeman (image registration, EC2 deployment, exporting, colorizing, bug fixes)
- Kunal Lillaney (loading from OCP)
- Michael Broxton (serializable class, new series statistics, improved EC2 deployment)
- Noah Young (improved EC2 deployment)
- Tom Sainsbury (image filtering, PNG saving options)
- Uri Laserson (submit scripts, Hadoop versioning)
Roadmap
Moving forward, we will do a code freeze and cut a release every three months. The next release will be on June 30th.
For 0.6.0 we will focus on the following components:
- A source extraction / segmentation API
- New capabilities for regression and GLM model fitting
- New image registration algorithms (including volumetric methods)
- Latent factor and network models
- Improved performance on single-core workflows
- Bug fixes and performance improvements throughout
If you are interested in contributing, let us know! Check out the existing issues or join us in the chatroom.