additional transforms or trainers to project machine learning

Custom Extensions to ML.net

This project proposes some extensions to machinelearning (ML.net), written in C#. It is a work in progress.

TravisCI Build status CircleCI


Build

On Windows: build.cmd, or build ml to force rebuilding machinelearning.

On Linux: build.sh.

The documentation can be built with: doxygen conf.dox.

Documentation

Example 1: Inner API

This example relies on the inner API, mostly used inside components of ML.net.

var env = new TlcEnvironment();
var iris = "iris.txt";

// We read the text data and create a dataframe / dataview.
var df = DataFrameIO.ReadCsv(iris, sep: '\t',
                             dtypes: new DataKind?[] { DataKind.R4 });

// We add a transform to concatenate two features into one vector column.
var conc = env.CreateTransform("Concat{col=Feature:Sepal_length,Sepal_width}", df);

// We create training data by mapping roles to columns.
var trainingData = env.CreateExamples(conc, "Feature", label: "Label");

// We create a trainer, here a One Versus Rest with a logistic regression as inner model.
var trainer = env.CreateTrainer("ova{p=lr}");

using (var ch = env.Start("test"))
{
    // We train the model.
    var pred = trainer.Train(env, ch, trainingData);

    // We compute the predictions (here on the training data, although a separate test set should be used in practice).
    var scorer = ScoreUtils.GetScorer(pred, trainingData, env, null);

    // We store the predictions in a file.
    DataFrame.ViewToCsv(scorer, "iris_predictions.txt", host: env);

    // Or we could put the predictions into a dataframe.
    var dfout = DataFrameIO.ReadView(scorer);

    // And access one value...
    var v = dfout.iloc[0, 7];
    Console.WriteLine("PredictedLabel: {0}", v);
}

The current interface of DataFrame is not rich. It will improve in the future.
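
The trainer string ova{p=lr} in the example above requests a One Versus Rest model with logistic regression as the inner model. As a rough illustration of what that strategy computes, here is a minimal sketch in plain C#, independent of ML.net. The class OneVersusRestSketch, its methods, and the weight layout are hypothetical names chosen for this illustration; they are not part of the library, and the library's actual implementation may differ.

```csharp
using System;

// Hypothetical illustration only: none of these names belong to ML.net.
public static class OneVersusRestSketch
{
    // Logistic (sigmoid) function used by each binary classifier.
    public static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    // Score of one binary "class k versus the rest" logistic model.
    public static double Score(double[] weights, double bias, double[] features)
    {
        double z = bias;
        for (int i = 0; i < features.Length; i++)
            z += weights[i] * features[i];
        return Sigmoid(z);
    }

    // One-versus-rest prediction: evaluate one binary model per class
    // and return the index of the class with the highest score.
    public static int PredictClass(double[][] weights, double[] biases, double[] features)
    {
        int best = 0;
        double bestScore = double.NegativeInfinity;
        for (int k = 0; k < weights.Length; k++)
        {
            double s = Score(weights[k], biases[k], features);
            if (s > bestScore)
            {
                bestScore = s;
                best = k;
            }
        }
        return best;
    }
}
```

With three classes, the sketch holds three weight vectors (one per binary model), and PredictClass returns the index of the model that is most confident for the given feature vector.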

Example 2: Inner API like Scikit-Learn

This is the same example, written with a ScikitPipeline whose interface resembles scikit-learn.

var env = new TlcEnvironment();
var iris = "iris.txt";

// We read the text data and create a dataframe / dataview.
var df = DataFrameIO.ReadCsv(iris, sep: '\t',
                             dtypes: new DataKind?[] { DataKind.R4 });

var pipe = new ScikitPipeline(new[] { "Concat{col=Feature:Sepal_length,Sepal_width}" }, "ova{p=lr}");
pipe.Train(df, feature: "Feature", label: "Label");

var scorer = pipe.Predict(df);

var dfout = DataFrameIO.ReadView(scorer);

// And access one value...
var v = dfout.iloc[0, 7];
Console.WriteLine("PredictedLabel: {0}", v);

Example 3: DataFrame in C#

The DataFrame class replicates some of the functionality data scientists are used to in other languages such as Python or R. Basic operations on columns are possible:

var text = "AA,BB,CC\n0,1,text\n1,1.1,text2";
var df = DataFrameIO.ReadStr(text);
df["AA+BB"] = df["AA"] + df["BB"];
Console.WriteLine(df.ToString());

AA,BB,CC,AA+BB
0,1,text,1
1,1.1,text2,2.1

Or:

df["AA2"] = df["AA"] + 10;
Console.WriteLine(df.ToString());

AA,BB,CC,AA+BB,AA2
0,1,text,1,10
1,1.1,text2,2.1,11

The next instruction changes one value based on a condition.

df.loc[df["AA"].Filter<DvInt4>(c => (int)c == 1), "CC"] = "changed";
Console.WriteLine(df.ToString());

AA,BB,CC,AA+BB,AA2
0,1,text,1,10
1,1.1,changed,2.1,11

A specific set of columns or rows can be extracted:

var view = df[df.ALL, new [] {"AA", "CC"}];
Console.WriteLine(view.ToString());

AA,CC
0,text
1,changed

The dataframe also allows basic filtering:

var view = df[df["AA"] == 0];
Console.WriteLine(view.ToString());

AA,BB,CC,AA+BB,AA2
0,1,text,1,10
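
For readers less familiar with dataframe semantics, the column operations shown in this example (element-wise addition, conditional assignment, and row filtering) can be sketched with plain arrays and LINQ. This is only an illustration of what the operations mean; the class ColumnOpsSketch and its methods are hypothetical and say nothing about how DataFrame is implemented.

```csharp
using System;
using System.Linq;

// Hypothetical illustration only: plain arrays stand in for DataFrame columns.
public static class ColumnOpsSketch
{
    // Element-wise sum of two columns, similar in spirit to df["AA"] + df["BB"].
    public static double[] Add(int[] a, double[] b) =>
        a.Zip(b, (x, y) => x + y).ToArray();

    // Replace the values of `target` on rows where `mask` is true,
    // similar in spirit to df.loc[condition, "CC"] = value.
    // Returns a new array and leaves `target` unchanged.
    public static string[] AssignWhere(string[] target, bool[] mask, string value) =>
        target.Select((t, i) => mask[i] ? value : t).ToArray();

    // Indices of the rows whose value satisfies the predicate,
    // similar in spirit to df[df["AA"] == 0].
    public static int[] RowsWhere(int[] column, Func<int, bool> predicate) =>
        Enumerable.Range(0, column.Length).Where(i => predicate(column[i])).ToArray();
}
```

Applied to the columns of the example (AA = 0, 1 and CC = text, text2), AssignWhere with a mask selecting AA == 1 changes only the second entry of CC, and RowsWhere with the predicate v => v == 0 selects only the first row.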