
0.7.4 Release notes

Daniel Smith edited this page Jan 20, 2017 · 1 revision

#General remarks

TAP 0.7.4 is a minor release following versions 0.7.0/0.7.1/0.7.2/0.7.3.

TAP 0.7.4 adds new features to 0.7.3.

All new features are supported for spark-tk only. TAP 0.7.4 no longer includes support for the Analytics Toolkit (ATK).

It is recommended to switch from Analytics Toolkit to spark-tk to take advantage of the new features.

It is not mandatory to upgrade to 0.7.4 from 0.7.0/0.7.1/0.7.2/0.7.3.

All GitHub repositories included in this TAP release were tagged with a 0.7.4 tag.

#New features

##New model training features

TAP version 0.7.4 introduces support for best model/hyperparameter selection. This support reduces the manual, time-consuming process of determining which algorithm and parameters are best suited to solve a given problem. This first implementation supports:

- Classification algorithms only: logistic regression, SVM, Naïve-Bayes, and random forest
- Performance measured only by the accuracy metric
- K-fold cross validation
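To illustrate the idea behind this feature, the following is a minimal, single-machine sketch of selecting the best candidate by mean accuracy under k-fold cross validation. It is not the spark-tk API: every name here (`cross_validate`, `select_best`, `threshold_trainer`, `majority_trainer`) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[float, int]               # (feature value, class label)
Model = Callable[[float], int]           # a trained predictor
Trainer = Callable[[Sequence[Sample]], Model]


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k disjoint folds (round-robin assignment)."""
    return [list(range(i, n, k)) for i in range(k)]


def accuracy(model: Model, samples: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for x, y in samples if model(x) == y)
    return correct / len(samples)


def cross_validate(trainer: Trainer, data: Sequence[Sample], k: int) -> float:
    """Train on k-1 folds, score on the held-out fold, and average the scores."""
    scores = []
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        train = [s for i, s in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        scores.append(accuracy(trainer(train), test))
    return sum(scores) / k


def select_best(candidates: Dict[str, Trainer], data: Sequence[Sample],
                k: int) -> Tuple[str, Dict[str, float]]:
    """Return the candidate with the highest mean accuracy, plus all scores."""
    results = {name: cross_validate(t, data, k) for name, t in candidates.items()}
    return max(results, key=results.get), results


# Hypothetical toy candidates: a fixed-threshold classifier (the threshold
# plays the role of a hyperparameter) and a majority-class baseline.
def threshold_trainer(threshold: float) -> Trainer:
    def train(samples: Sequence[Sample]) -> Model:
        return lambda x: 1 if x >= threshold else 0
    return train


def majority_trainer(samples: Sequence[Sample]) -> Model:
    ones = sum(y for _, y in samples)
    label = 1 if ones * 2 >= len(samples) else 0
    return lambda x: label


data = [(float(x), int(x >= 5)) for x in range(10)]
candidates = {
    "threshold=2": threshold_trainer(2.0),
    "threshold=5": threshold_trainer(5.0),
    "majority": majority_trainer,
}
best, scores = select_best(candidates, data, k=5)
print(best, scores)
```

The same loop structure applies when the candidates are real classifiers with a grid of hyperparameters; only the trainers and the data source change.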

In future releases, we will expand the coverage of algorithms and performance metrics. More information is at https://github.com/trustedanalytics/spark-tk.

##DAAL-tk library

This release includes an initial version of the DAAL-tk library. The DAAL-tk library, based on Intel’s DAAL (Data Analytics Acceleration Library) work, provides high-performance algorithms for creating and manipulating frames of data with the spark-tk library. More information is at https://github.com/trustedanalytics/daal-tk.

##DICOM support

This release adds distributed DICOM image processing support to the spark-tk library, with handling for images of different sizes. Initially, only uncompressed MRI DICOM images are supported. DICOM support also includes performance optimizations for feature engineering (such as PCA) and has been scale-tested. You can filter images based on metadata and write custom filters to meet your specific use cases. More information is at https://github.com/trustedanalytics/spark-tk/blob/master/python/sparktk/dicom/dicom.py.

##OrientDB supported in Jupyter notebooks

Access to OrientDB is now supported in Jupyter notebooks. From your Jupyter notebook, you can import/export graphs to/from OrientDB using a simple API. See more at 0.7.4-OrientDB-in-Jupyter.

##New spark-tk algorithms

- Single-source shortest path graph algorithm
- Betweenness centrality graph algorithm
- Highly-scalable random forest
- Cox proportional hazards model

See the spark-tk reference information for details on these algorithms, at http://trustedanalytics.github.io/spark-tk/.
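For readers unfamiliar with the single-source shortest path problem named above, here is a small single-machine sketch using Dijkstra's algorithm. It is not the spark-tk implementation or its API, which operates on distributed graphs; the function name `single_source_shortest_paths`, the adjacency-list format, and the sample graph are assumptions made for illustration. The sketch assumes non-negative edge weights.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: node -> list of (neighbor, edge weight) pairs.
Graph = Dict[str, List[Tuple[str, float]]]


def single_source_shortest_paths(graph: Graph, source: str) -> Dict[str, float]:
    """Return the shortest distance from source to every reachable node.

    Uses Dijkstra's algorithm, so all edge weights must be non-negative.
    Unreachable nodes are omitted from the result.
    """
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical sample graph; node "e" has no incoming edges from "a".
sample_graph: Graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0)],
    "c": [("d", 1.0)],
    "d": [],
    "e": [("a", 1.0)],
}
print(single_source_shortest_paths(sample_graph, "a"))
```

The path from `a` to `c` through `b` (cost 3) beats the direct edge (cost 4), which is the kind of result a shortest-path query is meant to surface.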

##New Scoring Engine support for revised models

The Scoring Engine can now be updated in place with a revised model of the same type that uses the same I/O parameters, without redeploying the Scoring Engine. This allows you to focus more on analysis and less on process. This feature also supports forcing the use of a revised model that may be incompatible with the previous revision. Details are provided at https://github.com/trustedanalytics/scoring-engine.

##End-to-end Spark streaming and scoring use case example

A new end-to-end Spark streaming and scoring example is available that predicts home energy usage over a 48-hour window, based on current weather forecast data. Download and work through this use case example at https://github.com/trustedanalytics/wenergy-demo.

#Upgrade information

You can use this release to install a fresh instance of TAP 0.7.4 or to upgrade an existing instance of TAP 0.7.3. (If you have a version of TAP earlier than 0.7.3, you must first upgrade to 0.7.3.)

Note: This release cannot be used with TAP versions earlier than 0.7.0.

##Fresh Installation Information

To install a new TAP 0.7.4 instance, follow these instructions:

##Upgrade Instructions

To apply the TAP 0.7.4 upgrade, follow the instructions at 0.7.4-upgrade-procedure.

#Applications Changed

| Application name | TAP 0.7.3 | TAP 0.7.4 |
| --- | --- | --- |
| atk | 0.7.3 | (no longer supported) |
| spark-tk | 0.7.3 | 0.7.4 |
| scoring-engine-for-spark-tk | 0.7.3 | 0.7.4 |
| jupyter | 0.7.3 | 0.7.4 |

#Fixed Problems and Issues

| Description |
| --- |
| Created troubleshooting note for connecting spark-tk to databases with JDBC drivers. |
| Fixed logsearch boshrelease version. |
| Fixed failure when applying PCA to DICOM images with rectangular dimensions. |

#Known Issues and Limitations

None
