# Multiple Linear Regression

From the previous exercise, we know that customers are happier with chocolate bars that are large and have high amounts of cocoa. Customers may feel differently when they have to pay for these bars, though.

In this exercise, we will try to find the chocolate bar that best suits customers, taking into account the cocoa content, size, and price.

## Step 1 - Import NuGet packages

The NuGet packages we need can be imported into a Jupyter Notebook with the following code. In this case we need Microsoft.ML for the model and XPlot.Plotly for the graphics.

In [None]:
// ML.NET Nuget packages installation
#r "nuget:Microsoft.ML,1.4"
    
//Install XPlot package
#r "nuget:XPlot.Plotly,2.0.0"

using Microsoft.ML;
using Microsoft.ML.Data;
using XPlot.Plotly;

## Step 2 - Storing the data

First of all, you need to create a class suited to storing the information you have. Once this is done, you can load structured information into this class and verify it by showing the schema and some rows of the DataView.

In [None]:
public class ChocolateInput
{
    [LoadColumn(0)]
    public float weight;

    [LoadColumn(1)]
    public float cocoa_percent;

    [LoadColumn(2)]
    public float cost;

    [LoadColumn(3)]
    public float customer_happiness;

}

public class ChocolateOutput
{
    [ColumnName("Score")]
    public float CustomerHappiness { get; set; }
}

First we need to define the path of the data file that we are going to use in the exercises.

In [None]:
string TrainDataPath = "./Data/chocolate-data-multiple-linear-regression.txt";

The next step is to add the following code to create the MLContext and the TextLoader that reads the training data. This is explained in more detail in the previous Linear Regression exercise.

In [None]:
MLContext mlContext = new MLContext(seed:0);
IDataView dataView = mlContext.Data.LoadFromTextFile<ChocolateInput>(path: TrainDataPath, hasHeader: true, separatorChar:'\t');

display(dataView.Schema);

Once again, we can also have a preview of the data stored in the IDataView structure.

In the cell below, replace the text `<PrintDataHere>` with `display(fewRows);` and then press Run in the toolbar above (or press Shift+Enter).

In [None]:
public static List<ChocolateInput> Head(MLContext mlContext, IDataView dataView, int numberOfRows = 4)
{
    string msg = string.Format("DataView: Showing {0} rows with the columns", numberOfRows.ToString());
    display(msg);
          
    var rows = mlContext.Data.CreateEnumerable<ChocolateInput>(dataView, reuseRowObject: false)
                    .Take(numberOfRows)
                    .ToList();
    
    return rows;
}

display(h4("Showing a few rows from training DataView:"));

var fewRows = Head(mlContext, dataView, 10);

/*
 REPLACE <PrintDataHere> WITH display(fewRows);
*/
<PrintDataHere>
//

## Step 3 - Perform a multiple linear regression

Previously we found that customers like a high percentage of cocoa and heavier bars of chocolate. Large bars of chocolate cost more money, though, which might make customers less inclined to purchase them.

Let's fit a regression to see how customer happiness relates to the chocolate bar's weight, cocoa percentage, and cost, now that the survey takes the cost of the chocolate into consideration.

The first thing you need to do is create a pipeline. In this case we are creating a pipeline using the `LbfgsPoissonRegression` trainer. Poisson regression is a generalized linear model, a close relative of linear regression. For this pipeline we'll consider three features: `weight`, `cocoa_percent`, and `cost`.

In the cell below replace the text `<ConcatenateHere>` with **"Features", "weight", "cocoa_percent", "cost"** and then press Run in the toolbar above (or press Shift+Enter).

In [None]:
var pipeline =
// Features to include in the prediction
/*
 REPLACE <ConcatenateHere> WITH "Features", "weight", "cocoa_percent", "cost"
*/
mlContext.Transforms.Concatenate(<ConcatenateHere>)
//    
// Specify the regression trainer
.Append(mlContext.Regression.Trainers.LbfgsPoissonRegression("customer_happiness"));

The next step is to train our model by passing our training data to the `Fit` method.

In [None]:
// Train the model
var model = pipeline.Fit(dataView);
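
Once the model is trained, it can also score a single chocolate bar. The cell below is a minimal sketch, assuming the earlier cells have already run so that `mlContext`, `model`, `ChocolateInput`, and `ChocolateOutput` exist. The sample values are made up for illustration and are assumed to use the same units as the training file.

```csharp
// Wrap the trained model in a prediction engine for one-at-a-time scoring
var predictionEngine = mlContext.Model.CreatePredictionEngine<ChocolateInput, ChocolateOutput>(model);

// Hypothetical chocolate bar (illustrative values, not taken from the data set)
var sample = new ChocolateInput { weight = 100f, cocoa_percent = 70f, cost = 2.5f };

// The "Score" column produced by the trainer is mapped to ChocolateOutput.CustomerHappiness
var prediction = predictionEngine.Predict(sample);
Console.WriteLine($"Predicted customer happiness: {prediction.CustomerHappiness:0.##}");
```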

The final step is to use the model to get the regression coefficients and the intercept. We will use these variables to build a graph and better visualize the data.

In [None]:
// The model's feature weight coefficients
var regressionModel = model.LastTransformer.Model;
var weights = regressionModel.Weights;
var intercept = regressionModel.Bias;
Console.WriteLine($"Coefficients: Weight={weights[0]:0.##}, CocoaPercent={weights[1]:0.##}, Cost={weights[2]:0.##}");
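
To help read these coefficients, keep in mind that a Poisson regression model predicts the exponential of a weighted sum rather than the weighted sum itself. The following is a sketch of that relationship, where the symbols are introduced here for illustration: $x_1$, $x_2$, $x_3$ stand for `weight`, `cocoa_percent`, and `cost`, $w_1$, $w_2$, $w_3$ for `weights[0]`, `weights[1]`, `weights[2]`, and $b$ for `intercept`.

$$
\hat{y} = \exp\left(b + w_1 x_1 + w_2 x_2 + w_3 x_3\right)
$$

Under this form, a positive coefficient raises the predicted happiness $\hat{y}$ and a negative one lowers it. Increasing a feature by one unit multiplies the prediction by $e^{w}$, where $w$ is that feature's coefficient.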

## Step 4 - Graphing prediction data

We want to know which chocolate bar features make customers happy while taking the cost into account. Here we are going to generate a graph from the training data.

First we need to extract the information we want to show into arrays. Each array holds the values of one column, which we can then plot along an axis.

In [None]:
int numberOfRows = 1000;
float[] weight = dataView.GetColumn<float>("weight").Take(numberOfRows).ToArray();
float[] cocoa_percent = dataView.GetColumn<float>("cocoa_percent").Take(numberOfRows).ToArray();
float[] customer_happiness = dataView.GetColumn<float>("customer_happiness").Take(numberOfRows).ToArray();

Then, we set the chart and the layout options and display the result.

In the cell below, replace the text `<PrintDataHere>` with `display(chart);` and then press Run in the toolbar above (or press Shift+Enter).

In [None]:
// Plot Cocoa-Percent vs Customer Happiness

var chart = Chart.Plot(
    new Graph.Scatter()
    {
        x = cocoa_percent,
        y = customer_happiness,
        mode = "markers",
        marker = new Graph.Marker()
        {
            color = customer_happiness,
            colorscale = "Jet"
        }
    }
);

var layout = new Layout.Layout(){title="Cocoa Percent vs Customer Happiness"};
chart.WithLayout(layout);
chart.WithXTitle("Cocoa Percent");
chart.WithYTitle("Customer Happiness");
chart.WithLegend(true);
chart.Width = 700;
chart.Height = 500;

/*
 REPLACE <PrintDataHere> WITH display(chart);
*/
<PrintDataHere>
//
