# Linear Regression

We want to know how to make our chocolate-bar customers happier. To do this, we need to know which chocolate bar features predict customer happiness. For example, customers may be happier when chocolate bars are bigger, or when they contain more cocoa.

We have data on customer happiness when eating chocolate bars with different features. Let's look at the relationship between happiness and cocoa percentage.

## Step 1 - Import NuGet packages

The NuGet packages we need can be imported directly into a Jupyter Notebook with the following code. In this case we need Microsoft.ML for the model and XPlot.Plotly for the graphics.

In [None]:
// Install the ML.NET NuGet package
#r "nuget:Microsoft.ML,1.4"

// Install the XPlot NuGet package
#r "nuget:XPlot.Plotly,2.0.0"

using Microsoft.ML;
using Microsoft.ML.Data;
using XPlot.Plotly;

## Step 2 - Storing the data

First, create a class that matches the shape of your data. Once this is done, you can load the structured data into this class and verify it by displaying the schema and a few rows of the DataView. A second class, `ChocolateOutput`, receives the model's prediction from the `Score` column.

In [None]:
public class ChocolateInput
{
    [LoadColumn(0)]
    public float weight;

    [LoadColumn(1)]
    public float cocoa_percent;

    [LoadColumn(2)]
    public float sugar_percent;

    [LoadColumn(3)]
    public float milk_percent;

    [LoadColumn(4)]
    public float customer_happiness;

}

public class ChocolateOutput
{
    [ColumnName("Score")]
    public float CustomerHappiness { get; set; }
}


First we define the path of the data file that we are going to use in the exercises.

Then we create the MLContext, which is the starting point of every ML.NET project. It provides logging as well as the entry points for training, prediction, model operations, and more.

Finally, we load our data into an IDataView structure.

In [None]:
string TrainDataPath = "./Data/chocolate-data.txt";

MLContext mlContext = new MLContext(seed:0);
IDataView dataView = mlContext.Data.LoadFromTextFile<ChocolateInput>(path: TrainDataPath, hasHeader: true, separatorChar:'\t');

display(dataView.Schema);
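
If `display` does not render the schema readably in your notebook front end, the columns can also be listed by hand. The cell below is a minimal sketch that assumes the `dataView` variable from the previous cell; it relies on `DataViewSchema` being enumerable, with each column exposing `Index`, `Name`, and `Type`.

```csharp
using System;
using Microsoft.ML;

// Assumes `dataView` was created in the previous cell.
// Print one line per column: position, name, and data type.
foreach (DataViewSchema.Column column in dataView.Schema)
{
    Console.WriteLine($"{column.Index}: {column.Name} ({column.Type})");
}
```

The five fields of `ChocolateInput` should each appear once, in the order given by their `LoadColumn` attributes.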

We can also review the content of the data in this structure.

In the cell below, replace the text `<PrintDataHere>` with `display(fewRows);` and then press Run in the toolbar above (or press Shift+Enter).

In [None]:
public static List<ChocolateInput> Head(MLContext mlContext, IDataView dataView, int numberOfRows = 4)
{
    // Announce how many rows are about to be returned
    display($"DataView: showing the first {numberOfRows} rows");

    var rows = mlContext.Data.CreateEnumerable<ChocolateInput>(dataView, reuseRowObject: false)
                    .Take(numberOfRows)
                    .ToList();
    
    return rows;
}

display(h4("Showing a few rows from training DataView:"));

var fewRows = Head(mlContext, dataView, 10);

/*
 REPLACE <PrintDataHere> WITH display(fewRows);
*/
<PrintDataHere>
//

The data represents 100 different variations of chocolate bars and the measured customer happiness for each one.

## Step 3 - Building a model and running a prediction

Now we use the ML.NET regression trainer LbfgsPoissonRegression to train a model that predicts customer happiness from the cocoa percentage.

The first thing to do in any ML.NET project is to create a pipeline. The general idea is that you chain operations such as data loading, transformations, and model building together into a 'pipeline'. In this case the pipeline uses the LbfgsPoissonRegression algorithm, which fits a Poisson regression. This is a generalized linear model and a close relative of ordinary linear regression: it is linear in the logarithm of the predicted value rather than in the value itself. We use `Concatenate` to copy our single input column, `cocoa_percent`, into the `Features` column, and then append the Poisson regression trainer to the end.

In [None]:
// Copy cocoa_percent into the "Features" column, then append the
// Poisson regression trainer with customer_happiness as the label
var pipeline = mlContext.Transforms.Concatenate("Features", "cocoa_percent")
    .Append(mlContext.Regression.Trainers.LbfgsPoissonRegression(labelColumnName: "customer_happiness"));
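
For reference, the model that a Poisson regression fits can be written down in one line. With a single feature $x$ (here `cocoa_percent`), a weight $w$, and a bias $b$, the textbook form is shown below. The symbols $x$, $y$, $w$, and $b$ are notation introduced here for illustration; how ML.NET regularizes and optimizes the parameters internally is not covered.

$$
\mathbb{E}[y \mid x] = \exp(w x + b)
\quad\Longleftrightarrow\quad
\log \mathbb{E}[y \mid x] = w x + b
$$

Two consequences follow directly from this form: the predicted happiness is always positive, and it changes by a constant factor, not a constant amount, for each additional percentage point of cocoa.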

The next step is to train our model by passing our training data to the `Fit` method.

In [None]:
//Train the model
var model = pipeline.Fit(dataView);
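
Before using the model, it can help to check how closely it fits the data. The cell below is a rough sketch, assuming the `mlContext`, `model`, and `dataView` variables from the cells above. It scores the training data and computes standard regression metrics. Because the same rows were used for training, these numbers are optimistic; a held-out test set would give a fairer estimate.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Assumes `mlContext`, `model`, and `dataView` from the cells above.
// Add a "Score" column with the model's prediction for every row.
IDataView scored = model.Transform(dataView);

// Compare the "Score" column against the label column.
RegressionMetrics metrics = mlContext.Regression.Evaluate(
    scored, labelColumnName: "customer_happiness", scoreColumnName: "Score");

Console.WriteLine($"R-squared: {metrics.RSquared:F3}");
Console.WriteLine($"RMSE:      {metrics.RootMeanSquaredError:F3}");
```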

The final step is to use the model to get a prediction. We do this by calling the `CreatePredictionEngine` method to generate our prediction engine. Then we can pass in a `ChocolateInput` object with a cocoa percentage of 65 and see what the model predicts for customer happiness.

In the cell below, replace the text `<CocoaPercent>` with 65.

In [None]:
// Create a prediction engine that uses the trained model for single predictions
var predictionEngine = mlContext.Model.CreatePredictionEngine<ChocolateInput, ChocolateOutput>(model);


/*
 REPLACE <CocoaPercent> WITH 65
*/
var prediction = predictionEngine.Predict(new ChocolateInput
{
    cocoa_percent = <CocoaPercent>,
});

//
// Obtain the prediction

prediction
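
The same prediction engine can be reused for as many inputs as needed. The cell below is a small sketch, assuming the `predictionEngine` variable and the `ChocolateInput` class from the cells above; the three cocoa percentages are arbitrary illustration values, not taken from the data.

```csharp
using System;

// Assumes `predictionEngine` and ChocolateInput from the cells above.
// The cocoa percentages below are arbitrary illustration values.
foreach (float cocoa in new[] { 50f, 65f, 80f })
{
    var result = predictionEngine.Predict(new ChocolateInput { cocoa_percent = cocoa });
    Console.WriteLine($"cocoa {cocoa}% -> predicted happiness {result.CustomerHappiness:F2}");
}
```

Only `cocoa_percent` needs to be set, because it is the only column the pipeline copies into `Features`.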

## Step 4 - Graphing prediction data

We want to know which chocolate bar features make customers happy. Here we are going to generate a graph from the training data.

First we need to extract the information we want to show into arrays. Each array holds the values plotted along one axis, with one element per chocolate bar.

In [None]:
// Upper bound on the rows to plot; the data set has 100 rows, so all of them are used
int numberOfRows = 1000;
float[] cocoa_percent = dataView.GetColumn<float>("cocoa_percent").Take(numberOfRows).ToArray();
float[] customer_happiness = dataView.GetColumn<float>("customer_happiness").Take(numberOfRows).ToArray();

Then, we set the chart and the layout options and display the result.

In [None]:
// Plot Cocoa-Percent vs Customer Happiness

var chart = Chart.Plot(
    new Graph.Scatter()
    {
        x = cocoa_percent,
        y = customer_happiness,
        mode = "markers",
        marker = new Graph.Marker()
        {
            color = customer_happiness,
            colorscale = "Jet"
        }
    }
);

var layout = new Layout.Layout(){title="Cocoa Percent vs Customer Happiness"};
chart.WithLayout(layout);
chart.WithXTitle("Cocoa Percent");
chart.WithYTitle("Customer Happiness");
chart.WithLegend(true);
chart.Width = 700;
chart.Height = 500;

display(chart);
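
To see the model's predictions next to the observed data, the fitted curve can be drawn over the scatter plot. The cell below is a sketch under a few assumptions: it reuses `predictionEngine`, `cocoa_percent`, and `customer_happiness` from the cells above, and the names `sortedCocoa`, `predicted`, `observed`, `fitted`, and `fitChart` are introduced here for illustration only.

```csharp
using System.Linq;
using XPlot.Plotly;

// Assumes `predictionEngine`, `cocoa_percent`, and `customer_happiness` from the cells above.
// Sort the cocoa values so the prediction line is drawn from left to right.
float[] sortedCocoa = cocoa_percent.OrderBy(c => c).ToArray();

// Ask the model for a prediction at every observed cocoa percentage.
float[] predicted = sortedCocoa
    .Select(c => predictionEngine.Predict(new ChocolateInput { cocoa_percent = c }).CustomerHappiness)
    .ToArray();

// One trace for the measured data, one for the model's predictions.
var observed = new Graph.Scatter() { x = cocoa_percent, y = customer_happiness, mode = "markers", name = "Observed" };
var fitted = new Graph.Scatter() { x = sortedCocoa, y = predicted, mode = "lines", name = "Predicted" };

var fitChart = Chart.Plot(new[] { observed, fitted });
fitChart.WithXTitle("Cocoa Percent");
fitChart.WithYTitle("Customer Happiness");
fitChart.WithLegend(true);

display(fitChart);
```

If the line follows the overall trend of the markers, the cocoa percentage alone explains a good part of customer happiness; large systematic gaps suggest that other features matter as well.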