

Examples of WhizzML scripts and libraries


Each script or library lives in its own directory in this folder. Each directory always contains a readme explaining the artifact's purpose and usage, the actual WhizzML code in a .whizzml file, and the JSON metadata needed to create BigML resources.

By convention, when the artifact is a library, the files are called library.whizzml and metadata.json, while for a script we use script.whizzml and metadata.json.

Examples

  • covariate-shift Determine if there is a shift in data distribution between two datasets.
  • model-or-ensemble Decide whether to use models or ensembles for predictions, given an input source.
  • deduplicate Removes contiguous duplicate rows of a dataset, where "duplicate" means a concrete field's value is the same for a set of contiguous rows.
  • remove-anomalies Using Flatline and an anomaly detector, remove from an input dataset its anomalous rows.
  • smacdown-branin Simple example of SMACdown, using the Branin function as evaluator.
  • smacdown-ensemble Use SMACdown to discover the best possible ensemble to model a given dataset id.
  • find-neighbors Using cluster distances as a metric, find instances in a dataset close to a given row.
  • stacked-generalization Simple stacking using decision trees, ensembles and logistic regression.
  • best-first Feature selection using a greedy algorithm.
  • gradient-boosting Boosting algorithm using gradient descent.
  • model-per-cluster Scripts and library to model data after clustering and make predictions using the resulting per-cluster model.
  • model-per-category Scripts to model and predict from an input dataset with a hand-picked root node field.
  • best-k Scripts and library implementing the Pham-Dimov-Nguyen algorithm for choosing the best k in k-means clusters.
  • seeded-best-k Scripts and library implementing the Pham-Dimov-Nguyen algorithm for choosing the best k in k-means clusters, with user-provided seeds.
  • anomaly-shift Calculate the average anomaly between two given datasets.
  • cross-validation Scripts for performing k-fold cross-validation.
  • clean-dataset Scripts and library for cleaning up a dataset.
  • boruta Script for feature selection using the Boruta algorithm.
  • cluster-classification Script that determines which input fields are most important for differentiating between clusters.
  • anomaly-benchmarking Script that takes any dataset (classification or regression) and turns it into a binary classification problem with the two classes "normal" and "anomalous".
  • sliding-window Script that extends a dataset with new fields containing row-shifted values from numeric fields. Useful for casting time series forecasting as a supervised learning problem.
  • unify-optype Script that matches the field optypes to a given dataset.
  • stratified-sampling Script that implements the stratified sampling technique.

How to install

There are three kinds of installable WhizzML artifacts in this repo, identified by the field "kind" in their metadata: libraries, scripts and packages. The latter are compounds of libraries and scripts, possibly with interdependencies, meant to be installed together.
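
As a rough illustration of the "kind" field, the following shell sketch reads it from a metadata.json file. It assumes the field is written on a single line in the form "kind": "value". The helper names artifact_kind and list_kinds are hypothetical and not part of this repository.

```shell
# Print the value of a "kind" field found in a metadata.json file.
# Assumption: the field sits on one line, written as "kind": "value".
artifact_kind() {
    sed -n 's/.*"kind"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Print "<directory>: <kind>" for every subdirectory of the given
# directory (default: current directory) that has a metadata.json.
list_kinds() {
    for meta in "${1:-.}"/*/metadata.json; do
        [ -f "$meta" ] || continue
        dir=$(dirname "$meta")
        printf '%s: %s\n' "$(basename "$dir")" "$(artifact_kind "$meta")"
    done
}
```

Running list_kinds at the top level of a checkout would then show which examples are libraries, scripts or packages, assuming their metadata follows the layout described above.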

Libraries and scripts are easily installed at the BigML dashboard. To install a script, navigate to 'Scripts' and then hover over the installation dropdown. Choose 'Import script from GitHub' and paste in the url to the example's folder. To install a library, first navigate to 'Libraries', and the rest of the process is the same.

Packages can be installed in either of the following ways:

Using bigmler

If you have bigmler installed on your system, just check out the repository 'whizzml/examples' and, at its top level, issue the command:

    make compile PKG=example-name

replacing example-name with the actual example name. That will create all of the example's scripts and libraries for you.
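
Put together, the checkout and compile steps might look like the sketch below. The helper name install_package and the default local directory name are hypothetical, and the clone URL assumes the repository's usual GitHub location. bigmler must already be installed and configured with your BigML credentials.

```shell
# Hypothetical helper: compile one package from a local checkout,
# cloning the repository first if the checkout does not exist yet.
#   $1: package (example) name, e.g. best-k
#   $2: checkout directory (default: ./examples)
install_package() {
    pkg="$1"
    repo_dir="${2:-examples}"
    if [ -z "$pkg" ]; then
        echo "usage: install_package PKG [REPO_DIR]" >&2
        return 2
    fi
    if [ ! -d "$repo_dir" ]; then
        git clone https://github.com/whizzml/examples.git "$repo_dir" || return 1
    fi
    # Same command as above, run from the top level of the checkout.
    make -C "$repo_dir" compile PKG="$pkg"
}
```

For instance, install_package best-k would clone into ./examples if that directory is missing and then run make compile PKG=best-k there.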

Using the web UI

  • Install each of the libraries separately, using the URLs to each of their folders. (For example, https://github.com/whizzml/examples/tree/master/clean-dataset/clean-data)

  • Install each of the scripts separately, using the URLs to each of their folders.

  • If a script requires a library, you will get the error message 'Library ../my-library not loaded.' Load the library by clicking in the textbox above the error message and typing the first few letters of the library's name. Select the library, then create the script as usual.
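
Judging from the clean-dataset URL given above, folder URLs appear to follow a fixed pattern. The sketch below builds such a URL from an example name and an optional component folder. The helper name folder_url is hypothetical, and the pattern (including the master branch) is an assumption generalized from that single example.

```shell
# Hypothetical helper: build the GitHub folder URL to paste into the
# dashboard's import dialog.
#   $1: example directory, e.g. clean-dataset
#   $2: optional component folder inside it, e.g. clean-data
folder_url() {
    url="https://github.com/whizzml/examples/tree/master/$1"
    if [ -n "${2:-}" ]; then
        url="$url/$2"
    fi
    printf '%s\n' "$url"
}
```

Calling folder_url clean-dataset clean-data reproduces the URL from the example above.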

Compiling packages and running tests

The makefile at the top level provides targets to register packages and run tests (when they're available). It needs a working installation of bigmler. Just type

    make help

for a list of possibilities, including:

  • tests to run all available test scripts (which live in the test subdirectory of some packages), which typically use bigmler.

  • compile to use bigmler to register in BigML the resources associated with one or more packages in the repository.

  • clean to delete resources and outputs (both remote and local) created by compile.

  • distcheck combines most of the above to check that all the scripts in the repository are working: this target should build cleanly before merging into

The verbosity of the test output can be controlled with the variable VERBOSITY, which runs from 0 (the default, mostly silent) to 2. E.g.:

    make tests VERBOSITY=1

If you write your own test scripts, include test-utils.sh for shared utilities.
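
One possible way to include it from a test script, wherever the script sits in the tree, is sketched below. The helper name load_test_utils is hypothetical. The sketch assumes only that test-utils.sh sits in some parent directory of the test script, and it makes no assumption about which functions that file defines.

```shell
# Hypothetical helper: walk up from a starting directory (default: the
# current one) until a test-utils.sh is found, then source it.
load_test_utils() {
    dir=$(cd "${1:-.}" && pwd) || return 1
    while [ "$dir" != "/" ]; do
        if [ -f "$dir/test-utils.sh" ]; then
            . "$dir/test-utils.sh"
            return 0
        fi
        dir=$(dirname "$dir")
    done
    echo "test-utils.sh not found" >&2
    return 1
}
```

A test script could then start with load_test_utils "$(dirname "$0")" || exit 1 before using any of the shared utilities.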