Third development release

@freeman-lab freeman-lab released this 23 Aug 22:04

This release adds new functionality for loading data, changes the data loading API, and includes a number of smaller bug fixes.

API changes

  • All data loading now goes through the new Thunder Context, a thin wrapper around a Spark Context. The context is created automatically when Thunder starts, and it provides methods for loading data from different input sources.
  • tsc.loadText behaves identically to load in previous versions.
  • Example data sets can now be loaded with tsc.makeExample, tsc.loadExample, and tsc.loadExampleEC2.
  • The output of the pack operation now preserves the xy definition, but it is transposed relative to previous versions.
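
The "thin wrapper" idea behind the Thunder Context can be illustrated with a minimal sketch. This is not Thunder's implementation: FakeSparkContext, ContextWrapper, and demo are hypothetical names, the stand-in context returns a plain list rather than an RDD, and the input format (whitespace-separated numbers, one record per line) is assumed only for illustration. The one name taken from these notes is loadText.

```python
import os
import tempfile


class FakeSparkContext:
    """Hypothetical stand-in for a Spark context, so the sketch runs without Spark."""

    def textFile(self, path):
        # A real SparkContext.textFile returns an RDD of lines;
        # this stand-in returns a plain list of lines instead.
        with open(path) as f:
            return [line.rstrip("\n") for line in f]


class ContextWrapper:
    """Thin wrapper: keeps a reference to the underlying context
    and exposes loading methods on top of it."""

    def __init__(self, sc):
        self._sc = sc

    def loadText(self, path):
        # Assumed format (illustration only): whitespace-separated numbers per line.
        lines = self._sc.textFile(path)
        return [[float(v) for v in line.split()] for line in lines if line.strip()]


def demo():
    # Write a small temporary text file, load it through the wrapper, then clean up.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("1 2 3\n4 5 6\n")
        path = f.name
    try:
        tsc = ContextWrapper(FakeSparkContext())
        return tsc.loadText(path)
    finally:
        os.remove(path)
```

The wrapper delegates file access to the underlying context and only adds parsing, so user code calls loading methods on a single object instead of passing a Spark Context to each loading function.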

New features

  • Include design matrix with example data set on EC2
  • Faster NMF implementation by changing update equation order (#15)
  • Support for loading local MAT files into RDDs through tsc.loadMatLocal
  • Preliminary support for loading binary files from HDFS using tsc.loadBinary (depends on features currently only available in Spark's master branch)

Bug fixes

  • Used Pillow instead of PIL (#11)
  • Fixed important typo in documentation page (#18)
  • Fixed sorting bug in local correlations