# Support Vector Machines

Support vector machines (SVMs) let us predict categories. This exercise will demonstrate a simple support vector machine that can predict a category from a small number of features.

Our problem is that we want to be able to categorise which type of tree a new specimen belongs to. To do this, we will use features of two different types of trees to train an SVM.

## Step 1 - Import NuGet packages

The NuGet packages we need can be imported into a Jupyter Notebook with the following code. In this case we need Microsoft.ML for the model and XPlot.Plotly for the graphics.

In [1]:
// ML.NET Nuget packages installation
#r "nuget:Microsoft.ML,1.4"
    
//Install XPlot package
#r "nuget:XPlot.Plotly,2.0.0"

using Microsoft.ML;
using Microsoft.ML.Data;
using XPlot.Plotly;

## Step 2 - Storing and Visualizing the data

First of all, you need to create a class suited to storing the information you have. Once this is done, you can load structured data into this class and verify it by showing the schema and some rows of the DataView.

In [2]:
public class TreeInput
{
    [LoadColumn(0)]
    [ColumnName("LeafWidth")]
    public float LeafWidth { get; set; } 

    [LoadColumn(1)]
    [ColumnName("LeafLength")]
    public float LeafLength { get; set; }

    [LoadColumn(2)]
    [ColumnName("TrunkGirth")]
    public float TrunkGirth { get; set; }

    [LoadColumn(3)]
    [ColumnName("TrunkHeight")]
    public float TrunkHeight { get; set; }

    [LoadColumn(4)]
    [ColumnName("Label")]
    public bool Label { get; set; }
}

public class TreeOutput
{
    [ColumnName("PredictedLabel")]
    public bool TreeType { get; set; }
    public float Score { get; set; }
    public float Probability { get; set; }
}

First we need to define the path of the data file that we are going to use in the exercises.

In [3]:
string TrainDataPath = "./Data/trees.txt";

The next step is to add the following code to create the MLContext and the TextLoader to read the training data, as done in the previous exercises.

In [4]:
MLContext mlContext = new MLContext(seed:0);
IDataView dataView = mlContext.Data.LoadFromTextFile<TreeInput>(path: TrainDataPath, hasHeader: true, separatorChar:'\t');

display(dataView.Schema);

| index | Name | Index | IsHidden | Type | Annotations |
|---|---|---|---|---|---|
| 0 | LeafWidth | 0 | False | { Microsoft.ML.Data.NumberDataViewType: RawType: System.Single } | { Microsoft.ML.DataViewSchema+Annotations: Schema: [ ] } |
| 1 | LeafLength | 1 | False | { Microsoft.ML.Data.NumberDataViewType: RawType: System.Single } | { Microsoft.ML.DataViewSchema+Annotations: Schema: [ ] } |
| 2 | TrunkGirth | 2 | False | { Microsoft.ML.Data.NumberDataViewType: RawType: System.Single } | { Microsoft.ML.DataViewSchema+Annotations: Schema: [ ] } |
| 3 | TrunkHeight | 3 | False | { Microsoft.ML.Data.NumberDataViewType: RawType: System.Single } | { Microsoft.ML.DataViewSchema+Annotations: Schema: [ ] } |
| 4 | Label | 4 | False | { Microsoft.ML.Data.BooleanDataViewType: RawType: System.Boolean } | { Microsoft.ML.DataViewSchema+Annotations: Schema: [ ] } |


Once again, we can also have a preview of the data stored in the IDataView structure.

In the cell below replace the text `<PrintDataHere>` with `display(fewRows);`

In [5]:
public static List<TreeInput> Head(MLContext mlContext, IDataView dataView, int numberOfRows = 4)
{
    string msg = string.Format("DataView: Showing {0} rows with the columns", numberOfRows.ToString());
    display(msg);
          
    var rows = mlContext.Data.CreateEnumerable<TreeInput>(dataView, reuseRowObject: false)
                    .Take(numberOfRows)
                    .ToList();
    
    return rows;
}

display(h4("Showing a few rows from training DataView:"));

var fewRows = Head(mlContext, dataView, 10);

/*
 REPLACE <PrintDataHere> WITH display(fewRows);
*/
<PrintDataHere>
//

DataView: Showing 10 rows with the columns

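| index | LeafWidth | LeafLength | TrunkGirth | TrunkHeight | Label |
|---|---|---|---|---|---|
| 0 | 5.13 | 6.18 | 8.26 | 8.74 | False |
| 1 | 7.49 | 4.02 | 8.07 | 6.78 | False |
| 2 | 9.22 | 4.16 | 5.46 | 8.45 | True |
| 3 | 3.46 | 5.19 | 8.72 | 10.4 | False |
| 4 | 4.55 | 5.15 | 9.01 | 9.64 | False |
| 5 | 7.64 | 2.58 | 9.73 | 7.75 | False |
| 6 | 8.69 | 4.35 | 4.37 | 8.82 | True |
| 7 | 7.21 | 3.62 | 8.71 | 7.43 | False |
| 8 | 8.52 | 3.67 | 5.99 | 9.7 | True |
| 9 | 6.61 | 5.29 | 6.8 | 6.07 | False |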


To chart this data, we first need to extract the columns we want to show into arrays. Each array holds the values of one feature for every row, and will later supply the coordinates along one axis of a chart.

In [6]:
int numberOfRows = 150;
float[] leaf_width = dataView.GetColumn<float>("LeafWidth").Take(numberOfRows).ToArray();
float[] leaf_length= dataView.GetColumn<float>("LeafLength").Take(numberOfRows).ToArray();
float[] trunk_girth = dataView.GetColumn<float>("TrunkGirth").Take(numberOfRows).ToArray();
float[] trunk_height = dataView.GetColumn<float>("TrunkHeight").Take(numberOfRows).ToArray();
bool[] label = dataView.GetColumn<bool>("Label").Take(numberOfRows).ToArray();

In this case we want to represent the data not only by its values on a two-dimensional chart, but also by class. To do this, we will split the data by class into different lists.

In [7]:
List<float> leaf_width0 = new List<float>();
List<float> leaf_width1 = new List<float>();
List<float> leaf_length0 = new List<float>();
List<float> leaf_length1 = new List<float>();
List<float> trunk_girth0 = new List<float>();
List<float> trunk_girth1 = new List<float>();
List<float> trunk_height0 = new List<float>();
List<float> trunk_height1 = new List<float>();

for (int i=0; i<label.Length; i++)
{
    if (label[i] == false)
    {
        leaf_width0.Add(leaf_width[i]);
        leaf_length0.Add(leaf_length[i]);
        trunk_girth0.Add(trunk_girth[i]);
        trunk_height0.Add(trunk_height[i]);  
    }
    else
    {
        leaf_width1.Add(leaf_width[i]);
        leaf_length1.Add(leaf_length[i]);
        trunk_girth1.Add(trunk_girth[i]);
        trunk_height1.Add(trunk_height[i]);
    }
}
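
The same split can also be expressed with LINQ's index-aware `Where` overload. The following is only a sketch of that alternative: the variable names are hypothetical, and the values are the first four `LeafWidth` and `Label` entries from the preview above rather than the full data set.

```csharp
using System;
using System.Linq;

// First four LeafWidth values and labels from the preview table (illustrative subset).
float[] sampleLeafWidth = { 5.13f, 7.49f, 9.22f, 3.46f };
bool[] sampleLabels = { false, false, true, false };

// Keep a value when the label at the same index matches the wanted class.
float[] class0Widths = sampleLeafWidth.Where((value, i) => !sampleLabels[i]).ToArray();
float[] class1Widths = sampleLeafWidth.Where((value, i) => sampleLabels[i]).ToArray();

Console.WriteLine($"Class 0: {class0Widths.Length} values, Class 1: {class1Widths.Length} values");
```

The explicit loop in the cell above does the same work in a single pass over all four features, which is why the exercise keeps it.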

Once the data is split by class, we can draw each class as its own scatter trace and then combine the traces into a single chart. The two charts below show the leaf features and the trunk features by class.

In the cell below replace the text `<PrintDataHere>` with `display(lchart);`

In [8]:
var lchart1 = new Graph.Scatter()
    {
        x = leaf_width0,
        y = leaf_length0,
        mode = "markers",
        marker = new Graph.Marker()
        {
            color = "blue",
        }
    };

var lchart2 = new Graph.Scatter()
    {
        x = leaf_width1,
        y = leaf_length1,
        mode = "markers",
        marker = new Graph.Marker()
        {
            color = "red",
        }
    };

var lchart = Chart.Plot(new[] {lchart1, lchart2});

var layout = new Layout.Layout(){title="Leaf features"};
lchart.WithLayout(layout);
lchart.WithXTitle("Leaf Width");
lchart.WithYTitle("Leaf Length");
lchart.WithLabels(new[]{"Class 0", "Class 1"});
lchart.Width = 700;
lchart.Height = 500;
lchart.WithLegend(true);


/*
 REPLACE <PrintDataHere> WITH display(lchart);
*/
<PrintDataHere>
//

In the cell below replace the text `<PrintDataHere>` with `display(tchart);`

In [9]:
var tchart1 = new Graph.Scatter()
    {
        x = trunk_girth0,
        y = trunk_height0,
        mode = "markers",
        marker = new Graph.Marker()
        {
            color = "blue",
        }
    };

var tchart2 = new Graph.Scatter()
    {
        x = trunk_girth1,
        y = trunk_height1,
        mode = "markers",
        marker = new Graph.Marker()
        {
            color = "red",
        }
    };
var tchart = Chart.Plot(new[] {tchart1, tchart2});

var layout = new Layout.Layout(){title="Trunk features"};
tchart.WithLayout(layout);
tchart.WithXTitle("Trunk Girth");
tchart.WithYTitle("Trunk Height");
tchart.WithLabels(new[]{"Class 0", "Class 1"});
tchart.WithLegend(true);
tchart.Width = 700;
tchart.Height = 500;

/*
 REPLACE <PrintDataHere> WITH display(tchart);
*/
<PrintDataHere>
//

## Step 3 - Building an SVM model and running a prediction

Let's make a support vector machine. SVMs are a family of algorithms for classification, regression, transduction, novelty detection and semi-supervised learning. An SVM is trained from features and labels. In ML.NET the trainer reads the features from a single vector column, named `Features` by default, and the labels from the `Label` column.
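
At prediction time, a linear SVM boils down to a weighted sum of the features plus a bias, and the sign of that sum selects the class. The sketch below illustrates the idea only: the weights, the bias, and the `Score` helper are hypothetical and are not taken from any trained model. The feature values are those of the first row of the data.

```csharp
using System;

// Hypothetical weights and bias, chosen only for illustration.
// A trained model learns its own values from the data.
float[] weights = { 0.5f, -0.25f, -1.0f, 0.1f };
float bias = 0.2f;

// Score = w . x + b
float Score(float[] w, float b, float[] x)
{
    float sum = b;
    for (int i = 0; i < w.Length; i++)
    {
        sum += w[i] * x[i];
    }
    return sum;
}

// LeafWidth, LeafLength, TrunkGirth, TrunkHeight of the first row.
float[] sample = { 5.13f, 6.18f, 8.26f, 8.74f };
float score = Score(weights, bias, sample);

// A positive score maps to the true label, a non-positive score to the false label.
bool predictedLabel = score > 0;
Console.WriteLine($"Score: {score}, PredictedLabel: {predictedLabel}");
```

Training is the process of finding the weights and bias that separate the two classes with the widest possible margin; the trainer used below does that for us.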

Let's first use the following code to create a pipeline. In this case we create a pipeline with the `LinearSvm` trainer, and we use all four features: the two leaf features and the two trunk features.

In the cell below replace the text `<ConcatenateHere>` with **"Features", "LeafWidth", "LeafLength", "TrunkGirth", "TrunkHeight"**

In [10]:
// Specify the support vector machine trainer
var pipeline = 
/*
 REPLACE <ConcatenateHere> WITH "Features", "LeafWidth", "LeafLength", "TrunkGirth", "TrunkHeight"
*/
    mlContext.Transforms.Concatenate(<ConcatenateHere>)
//    
    .Append(mlContext.BinaryClassification.Trainers.LinearSvm());

The next step is to train our model by passing our training data to the `Fit` method.

In [11]:
// Train the model
var model = pipeline.Fit(dataView);

Let's use the model to get a prediction for a single tree from its four features. To do this, we call the method `CreatePredictionEngine` to generate our prediction engine.

With that done, we can finally proceed to our prediction.

In the cell below replace:
     `<LeafWidth>`   WITH 5.13E+00f
     `<LeafLength>`  WITH 6.18E+00f
     `<TrunkGirth>`  WITH 8.26E+00f
     `<TrunkHeight>` WITH 8.74E+00f

In [12]:
// Use the trained model for one-time prediction
var predictionEngine = mlContext.Model.CreatePredictionEngine<TreeInput, TreeOutput>(model);

// Obtain the prediction
var prediction = predictionEngine.Predict(new TreeInput
{
// Features to include in the prediction
/*
 REPLACE <LeafWidth>   WITH 5.13E+00f
 REPLACE <LeafLength>  WITH 6.18E+00f
 REPLACE <TrunkGirth>  WITH 8.26E+00f
 REPLACE <TrunkHeight> WITH 8.74E+00f
*/
    LeafWidth = <LeafWidth>,
    LeafLength = <LeafLength>,
    TrunkGirth = <TrunkGirth>,
    TrunkHeight = <TrunkHeight>
// 
});

Console.WriteLine($"*************************************");
Console.WriteLine("Tree type {0}", prediction.TreeType ? "1" : "0");
Console.WriteLine($"Score: {prediction.Score}");
Console.WriteLine($"Probability: {prediction.Probability}");
Console.WriteLine($"*************************************");

*************************************
Tree type 0
Score: -164,64635
Probability: 0
*************************************
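
The `Probability: 0` in the output above is most likely not a real probability: `LinearSvm` produces an uncalibrated score and, as far as we know, no `Probability` column, so the property keeps its default value. One way to obtain a probability is to append a Platt calibrator to the pipeline. The sketch below assumes that approach. The class names `TreeSample` and `CalibratedTreeOutput` are hypothetical, and the six rows are copied from the preview table so that the example does not depend on the data file. With so few rows the calibrated values are not meaningful; the point is only the shape of the pipeline.

```csharp
#r "nuget:Microsoft.ML,1.4"

using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical in-memory row type, used only for this sketch.
public class TreeSample
{
    public float LeafWidth { get; set; }
    public float LeafLength { get; set; }
    public float TrunkGirth { get; set; }
    public float TrunkHeight { get; set; }
    public bool Label { get; set; }
}

// Hypothetical output type; Probability is filled in by the calibrator.
public class CalibratedTreeOutput
{
    [ColumnName("PredictedLabel")]
    public bool TreeType { get; set; }
    public float Score { get; set; }
    public float Probability { get; set; }
}

// Six rows taken from the preview of the training data (three per class).
var samples = new[]
{
    new TreeSample { LeafWidth = 5.13f, LeafLength = 6.18f, TrunkGirth = 8.26f, TrunkHeight = 8.74f, Label = false },
    new TreeSample { LeafWidth = 7.49f, LeafLength = 4.02f, TrunkGirth = 8.07f, TrunkHeight = 6.78f, Label = false },
    new TreeSample { LeafWidth = 9.22f, LeafLength = 4.16f, TrunkGirth = 5.46f, TrunkHeight = 8.45f, Label = true },
    new TreeSample { LeafWidth = 3.46f, LeafLength = 5.19f, TrunkGirth = 8.72f, TrunkHeight = 10.4f, Label = false },
    new TreeSample { LeafWidth = 8.69f, LeafLength = 4.35f, TrunkGirth = 4.37f, TrunkHeight = 8.82f, Label = true },
    new TreeSample { LeafWidth = 8.52f, LeafLength = 3.67f, TrunkGirth = 5.99f, TrunkHeight = 9.7f, Label = true },
};

var context = new MLContext(seed: 0);
IDataView data = context.Data.LoadFromEnumerable(samples);

// Same pipeline as in the exercise, with a Platt calibrator appended
// to map the raw SVM score to a probability.
var calibratedPipeline = context.Transforms
    .Concatenate("Features", "LeafWidth", "LeafLength", "TrunkGirth", "TrunkHeight")
    .Append(context.BinaryClassification.Trainers.LinearSvm())
    .Append(context.BinaryClassification.Calibrators.Platt());

var calibratedModel = calibratedPipeline.Fit(data);
var engine = context.Model.CreatePredictionEngine<TreeSample, CalibratedTreeOutput>(calibratedModel);
var result = engine.Predict(samples[0]);

Console.WriteLine($"Score: {result.Score}, Probability: {result.Probability}");
```

In the exercise itself, the same effect could be had by appending the calibrator to `pipeline` before calling `Fit`.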


## Conclusion

And that's it! You've made a simple support vector machine that can predict the type of tree based on the leaf and trunk measurements!