# Polynomial Regression (ML Task: Time series)

Sometimes our data doesn't have a linear relationship, but we still want to predict an outcome.

Suppose we want to predict how satisfied people will be with a piece of fruit. We would expect satisfaction to be low if the fruit is underripe or overripe, and high when it is somewhere in between.

This is not something linear regression can capture, so we turn to polynomial regression to make predictions for these more complex non-linear relationships!
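Written out, a polynomial regression of degree $d$ models the label $y$ as a polynomial in the feature $x$ (the $\beta$ notation for the coefficients is chosen here for illustration):

$$
y \approx \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d
$$

The model is still linear in the coefficients $\beta_0, \dots, \beta_d$, which is why it can be fitted like a multiple linear regression on the features $x, x^2, \dots, x^d$. For the fruit example, a degree-2 curve with $\beta_2 < 0$ rises and then falls, peaking where the derivative $\beta_1 + 2\beta_2 x$ is zero, at $x = -\beta_1 / (2\beta_2)$.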

In [None]:
// Install the ML.NET NuGet package
#r "nuget:Microsoft.ML,1.4"
    
// Install the XPlot.Plotly package
#r "nuget:XPlot.Plotly,2.0.0"
    
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using XPlot.Plotly;

## Step 2 - Storing the data

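First of all, you need to create a class to store each row of the data. The `LoadColumn` attributes map column 0 of the file to `Time`, columns 1 to 6 to the six-element `HistoricalMeasures` vector, and column 7 to the label (`AverageMeasure`); the `TrafficPrediction` class receives the model's output from the `Score` column. Once the classes are defined, you can load the data and verify it by displaying the schema and a few rows of the `IDataView`.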

In [None]:
public class TrafficData
{
    [LoadColumn(0)]
    [ColumnName("Time")]
    public float Time { get; set; }

    [LoadColumn(1, 6)]
    [VectorType(6)]
    [ColumnName("HistoricalMeasures")]
    public float[] HistoricalMeasures { get; set; }

    [LoadColumn(7)]
    [ColumnName("Label")]
    public float AverageMeasure { get; set; }
}

public class TrafficPrediction
{
    [ColumnName("Score")]
    public float InternetTraffic { get; set; }
}
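To make the column mapping concrete, here is a plain C# sketch (no ML.NET involved) of how one line of the file would be split according to the `LoadColumn` attributes above. It assumes the file has 8 tab-separated columns, as the attributes and `separatorChar:'\t'` suggest; the sample line is hypothetical and simply reuses the values from the prediction example in Step 4. ML.NET's loader does this work for you; the point is only that `LoadColumn(1, 6)` is an inclusive range of six columns.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical tab-separated row with 8 columns: Time, six historical
// measures and their average (values borrowed from the Step 4 example).
string line = "12.5\t43.5\t45.3\t41.9\t40.3\t31.5\t44.6\t41.18";

// Parse with the invariant culture so "12.5" is read the same on every machine
float[] cols = line.Split('\t')
    .Select(s => float.Parse(s, CultureInfo.InvariantCulture))
    .ToArray();

float time = cols[0];                                 // [LoadColumn(0)] -> "Time"
float[] historical = cols.Skip(1).Take(6).ToArray();  // [LoadColumn(1, 6)]: columns 1..6 inclusive, 6 values
float label = cols[7];                                // [LoadColumn(7)] -> "Label"

Console.WriteLine($"Time={time}, HistoricalMeasures=[{string.Join(", ", historical)}], Label={label}");
```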

Next, we define the path of the data file used in this exercise.

In [None]:
string TrainDataPath = "./Data/traffic_by_hour.csv";

The next step is to create the `MLContext` and a `TextLoader` (through `LoadFromTextFile`) to read the training data. This is explained in more detail in the previous Linear Regression and Multiple Linear Regression exercises.

In [None]:
MLContext mlContext = new MLContext(seed:0);
IDataView dataView = mlContext.Data.LoadFromTextFile<TrafficData>(path: TrainDataPath, hasHeader: false, separatorChar:'\t');

display(dataView.Schema);

As before, we can also preview the data stored in the `IDataView`.

In the cell below, replace the text `<PrintDataHere>` with `display(fewRows);` and then press Run in the toolbar above (or press Shift+Enter).

In [None]:
public static List<TrafficData> Head(MLContext mlContext, IDataView dataView, int numberOfRows = 4)
{
    // Print a short header, then return the first rows as a list
    display($"DataView: showing the first {numberOfRows} rows");
          
    var rows = mlContext.Data.CreateEnumerable<TrafficData>(dataView, reuseRowObject: false)
                    .Take(numberOfRows)
                    .ToList();
    
    return rows;
}

display(h4("Showing a few rows from training DataView:"));

var fewRows = Head(mlContext, dataView, 24);

/*
 REPLACE <PrintDataHere> WITH display(fewRows);
*/
<PrintDataHere>
//

Finally, we split the loaded data into two parts: roughly 80% for training and 20% for testing the model (`testFraction: 0.2`). Then we use the `Preview` extension method on each `IDataView` to get a summary of the data and write it to the console.

In [None]:
var trainTestData = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
IDataView trainDataView = trainTestData.TrainSet;
IDataView testDataView = trainTestData.TestSet;

Console.WriteLine($"Training set preview: { trainDataView.Preview().ToString()}");
Console.WriteLine($"Test set preview: { testDataView.Preview().ToString()}");
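To show what a train/test split does, here is a simplified plain C# stand-in: shuffle the row indices with a seeded `Random` (Fisher-Yates) and hold out `testFraction` of them. `SplitIndices` is a hypothetical helper, and the row count of 24 (one row per hour) is an assumption. ML.NET's `TrainTestSplit` samples rows its own way, so the exact rows, and possibly the exact counts, it picks will differ.

```csharp
using System;
using System.Linq;

// Hypothetical helper: shuffle row indices with a seeded Random
// (Fisher-Yates) and hold out testFraction of them for testing.
static (int[] Train, int[] Test) SplitIndices(int rowCount, double testFraction, int seed)
{
    var rng = new Random(seed);
    int[] idx = Enumerable.Range(0, rowCount).ToArray();
    for (int i = idx.Length - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);               // pick a position in [0, i]
        (idx[i], idx[j]) = (idx[j], idx[i]);   // swap it into place
    }
    int testCount = (int)Math.Round(rowCount * testFraction);
    return (idx.Skip(testCount).ToArray(), idx.Take(testCount).ToArray());
}

// 24 rows is an assumption; testFraction matches the notebook
var (train, test) = SplitIndices(rowCount: 24, testFraction: 0.2, seed: 0);
Console.WriteLine($"{train.Length} training rows, {test.Length} test rows");
```

Fixing the seed makes the split reproducible between runs, which is the same reason the notebook creates its `MLContext` with `seed:0`.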

## Step 3 - Data Visualization: Mean values for each hour

Let's see if we can draw out a clearer pattern by taking the average values for each hour.

In [None]:
public class PlotChartPoint
{
    public double X { get; set; }

    public double Y { get; set; }
}

Then, we create a function that turns each row of the data into a chart point, using the time of day as X and the average measure as Y.

In [None]:
public static IEnumerable<PlotChartPoint> GetAvgChartPointsFromData(IEnumerable<TrafficData> data)
{
    return data
        .Select(x => new PlotChartPoint()
        {
            X = x.Time,
            Y = x.AverageMeasure
        });
}

We use this function to get the points from the training set, then split them into one sequence per axis so we can plot them.

In [None]:
// Turn the training rows into chart points (materialized once with ToList)
var avgPoints = GetAvgChartPointsFromData(mlContext.Data.CreateEnumerable<TrafficData>(trainDataView, reuseRowObject: true)).ToList();
var avgX = avgPoints.Select(point => point.X);
var avgY = avgPoints.Select(point => point.Y);
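The notebook doesn't say how `AverageMeasure` is produced. Judging from the example row used in Step 4, where (43.5 + 45.3 + 41.9 + 40.3 + 31.5 + 44.6) / 6 ≈ 41.18, it looks like the mean of the six `HistoricalMeasures`; treat that as an assumption. Under that assumption, this plain C# sketch computes the per-row averages directly. The first row reuses the Step 4 values; the second row is made up.

```csharp
using System;
using System.Linq;

// Hypothetical rows: (Time, six historical measures). The first row reuses
// the values from the Step 4 prediction example; the second is made up.
var rows = new (float Time, float[] Measures)[]
{
    (12.5f, new[] { 43.5f, 45.3f, 41.9f, 40.3f, 31.5f, 44.6f }),
    (3.0f,  new[] { 10f, 12f, 11f, 9f, 13f, 11f }),
};

// One chart point per row: X = time of day, Y = mean of the historical measures
// (assumed to be what the AverageMeasure column holds)
var points = rows
    .Select(r => (X: (double)r.Time, Y: r.Measures.Average(m => (double)m)))
    .ToList();

foreach (var p in points)
    Console.WriteLine($"t={p.X}: mean of measures={p.Y:F2}");
```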

Then, we configure the chart and its layout options and display the result.

In the cell below, replace the text `<PrintDataHere>` with `display(chart);` and then press Run in the toolbar above (or press Shift+Enter).

In [None]:
var chart = Chart.Plot(
    new Graph.Scatter()
    {
        x = avgX,
        y = avgY,
        mode = "markers",
        marker = new Graph.Marker()
        {
            color = avgY,
            colorscale = "Jet"
        }
    }
);

var layout = new Layout.Layout(){title="Avg Internet Traffic on time"};
chart.WithLayout(layout);
chart.WithXTitle("Time of the Day");
chart.WithYTitle("Average Traffic");
chart.WithLegend(true);
chart.Width = 700;
chart.Height = 500;

/*
 REPLACE <PrintDataHere> WITH display(chart);
*/
<PrintDataHere>
//

## Step 4 - Use a model to make a prediction

Let's model the relationship between the time of day and the amount of internet traffic, so we can also predict the traffic at times that fall between the hours (for example, 12.5).

We specify a feature $x$ (the time of day) and a label $y$ (the amount of traffic).

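The first thing you need to do is create a pipeline. Polynomial regression is a special case of multiple linear regression: the model is still linear in its coefficients, and only the features are powers of $x$ ($x$, $x^2$, $x^3$, ...). So here we create the pipeline with the `LbfgsPoissonRegression` trainer, the one we used for Multiple Linear Regression. Keep in mind that the trainer only sees the columns concatenated into `Features`, so any polynomial terms have to be supplied as extra feature columns. (Poisson regression fits the logarithm of the expected label as a linear function of the features.)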

In [None]:
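// Create the pipeline. Only the time of day goes into "Features":
// the label must never be a feature, or the model would simply copy it.
var pipeline =
    mlContext.Transforms.Concatenate("Features", "Time")
    .Append(mlContext.Regression.Trainers.LbfgsPoissonRegression());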
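To illustrate the idea of polynomial regression as multiple linear regression, here is a plain C# sketch that expands each $x$ into the features $[1, x, x^2]$ and solves the least-squares normal equations. Note that this swaps in ordinary least squares, not the Poisson regression used by the pipeline above; `FitPolynomial` and `EvaluatePolynomial` are hypothetical helpers, not ML.NET APIs, and the data is synthetic, generated from $y = 2 + 3x - 0.5x^2$ so that it rises and then falls like the fruit example.

```csharp
using System;
using System.Linq;

// Hypothetical helper: ordinary least squares fit of
// y ≈ c[0] + c[1]*x + ... + c[degree]*x^degree.
// Each x is expanded into the features [1, x, x^2, ...], and the normal
// equations (X^T X) c = X^T y are solved by Gauss-Jordan elimination.
static double[] FitPolynomial(double[] xs, double[] ys, int degree)
{
    int n = degree + 1;
    var a = new double[n, n + 1]; // augmented matrix [X^T X | X^T y]
    for (int k = 0; k < xs.Length; k++)
    {
        var phi = new double[n]; // polynomial features of sample k
        phi[0] = 1.0;
        for (int p = 1; p < n; p++) phi[p] = phi[p - 1] * xs[k];

        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++) a[i, j] += phi[i] * phi[j];
            a[i, n] += phi[i] * ys[k];
        }
    }

    for (int col = 0; col < n; col++)
    {
        // Partial pivoting: move the largest remaining entry into the pivot row
        int pivot = col;
        for (int r = col + 1; r < n; r++)
            if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
        for (int c = 0; c <= n; c++)
            (a[col, c], a[pivot, c]) = (a[pivot, c], a[col, c]);

        // Eliminate this column from every other row
        for (int r = 0; r < n; r++)
        {
            if (r == col) continue;
            double factor = a[r, col] / a[col, col];
            for (int c = col; c <= n; c++) a[r, c] -= factor * a[col, c];
        }
    }

    // The left block is now diagonal, so each coefficient is one division
    return Enumerable.Range(0, n).Select(i => a[i, n] / a[i, i]).ToArray();
}

// Evaluate the fitted polynomial at x with Horner's method
static double EvaluatePolynomial(double[] coeffs, double x)
{
    double acc = 0.0;
    for (int i = coeffs.Length - 1; i >= 0; i--) acc = acc * x + coeffs[i];
    return acc;
}

// Synthetic rise-then-fall data from y = 2 + 3x - 0.5x^2
double[] sampleX = { 0, 1, 2, 3, 4, 5, 6 };
double[] sampleY = sampleX.Select(x => 2 + 3 * x - 0.5 * x * x).ToArray();

double[] fitted = FitPolynomial(sampleX, sampleY, degree: 2);
Console.WriteLine($"c0={fitted[0]:F3}, c1={fitted[1]:F3}, c2={fitted[2]:F3}");
Console.WriteLine($"prediction at x=2.5: {EvaluatePolynomial(fitted, 2.5):F3}");
```

With `degree: 1` the same helper fits a straight line, which cannot follow the rise and fall; adding the $x^2$ column is what lets an otherwise linear solver bend.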

The next step is to train the model by passing the training set to the `Fit` method.

In [None]:
// Train the model on the training split only; the test split stays unseen
var model = pipeline.Fit(trainDataView);
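Because part of the data was held out, the trained model could be scored on `testDataView`. ML.NET provides `mlContext.Regression.Evaluate` for this, which returns metrics such as `RootMeanSquaredError` and `RSquared`. As a plain C# sketch of what those two numbers measure, here they are computed by hand; `Rmse` and `RSquared` are hypothetical helpers, and the labels and predictions are made up.

```csharp
using System;
using System.Linq;

// Root-mean-square error: typical size of a prediction error, in label units
static double Rmse(double[] actual, double[] predicted) =>
    Math.Sqrt(actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Average());

// Coefficient of determination: 1 - SS_res / SS_tot
static double RSquared(double[] actual, double[] predicted)
{
    double mean = actual.Average();
    double ssRes = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
    double ssTot = actual.Sum(a => (a - mean) * (a - mean));
    return 1.0 - ssRes / ssTot;
}

// Made-up labels and predictions for a 4-row test set
double[] testLabels = { 10, 20, 30, 40 };
double[] testScores = { 12, 18, 33, 39 };
Console.WriteLine($"RMSE = {Rmse(testLabels, testScores):F3}, R^2 = {RSquared(testLabels, testScores):F3}");
```

An $R^2$ close to 1 means the model explains most of the variation in the test labels; a model that always predicts the mean scores 0.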

Now let's use this model to predict the traffic for a time of day between 0 and 24.

In the cell below, replace the text `<HourHere>` with `12.5f`.

In [None]:
// Use the trained model to predict the internet traffic
var predictionEngine = mlContext.Model.CreatePredictionEngine<TrafficData, TrafficPrediction>(model);

// Time of day to predict for
/*
 REPLACE <HourHere> WITH 12.5f
*/
var time = 12.5f;
//

// Obtain the prediction
var prediction = predictionEngine.Predict(new TrafficData
{
    Time = time,
    HistoricalMeasures = new float[] { 43.5f, 45.3f, 41.9f, 40.3f, 31.5f, 44.6f },
    AverageMeasure = 41.18f
});

Console.WriteLine($"At t={time}, predicted internet traffic is {prediction.InternetTraffic} Gbps.");

## Conclusion

And there we have it! You have made a polynomial regression model and used it for analysis! This model gives us a prediction for the level of internet traffic we should expect to see at any given time of day.