# Polynomial Regression Library with Python 

* Change the current working directory into `./demo`

In [2]:
%cd demo

F:\SEU\SEE\PySEE\home\notebook\demo


In [None]:
%pwd

## 3 Polynomial Regression Library 

**Polynomial Regression** 

https://en.wikipedia.org/wiki/Polynomial_regression
    
### 3.1 Polynomial regression: Matrix form and calculation of estimates

#### 3.1.1 The polynomial regression model

$${\displaystyle y_{i}\,=\,\beta _{0}+\beta _{1}x_{i}+\beta _{2}x_{i}^{2}+\cdots +\beta _{m}x_{i}^{m}+\varepsilon _{i}\ (i=1,2,\dots ,n)}$$

can be expressed in matrix form in terms of a design matrix ${\displaystyle \mathbf {X}}$, a response vector ${\displaystyle {\vec {y}}}$, a parameter vector ${\displaystyle {\vec {\beta }}}$, and a vector ${\displaystyle {\vec {\varepsilon }}}$ of random errors.

The $i$-th rows of ${\displaystyle \mathbf {X} }$ and ${\displaystyle {\vec {y}}}$ contain the $x$ and $y$ values of the $i$-th data sample. The model can then be written as a system of linear equations:

$${\displaystyle {\begin{bmatrix}y_{1}\\y_{2}\\y_{3}\\\vdots \\y_{n}\end{bmatrix}}={\begin{bmatrix}1&x_{1}&x_{1}^{2}&\dots &x_{1}^{m}\\1&x_{2}&x_{2}^{2}&\dots &x_{2}^{m}\\1&x_{3}&x_{3}^{2}&\dots &x_{3}^{m}\\\vdots &\vdots &\vdots &\ddots &\vdots \\1&x_{n}&x_{n}^{2}&\dots &x_{n}^{m}\end{bmatrix}}{\begin{bmatrix}\beta _{0}\\\beta _{1}\\\beta _{2}\\\vdots \\\beta _{m}\end{bmatrix}}+{\begin{bmatrix}\varepsilon _{1}\\\varepsilon _{2}\\\varepsilon _{3}\\\vdots \\\varepsilon _{n}\end{bmatrix}},}$$

which when using pure matrix notation is written as
$${\displaystyle {\vec {y}}=\mathbf {X} {\vec {\beta }}+{\vec {\varepsilon }}.\,}$$

The vector of estimated polynomial regression coefficients (using [ordinary least squares estimation](https://en.wikipedia.org/wiki/Estimation)) is

$${\displaystyle {\widehat {\vec {\beta }}}=(\mathbf {X} ^{\mathsf {T}}\mathbf {X} )^{-1}\;\mathbf {X} ^{\mathsf {T}}{\vec {y}},\,}$$

assuming $m < n$, which is required for ${\displaystyle \mathbf {X} ^{\mathsf {T}}\mathbf {X} }$ to be invertible. Since ${\displaystyle \mathbf {X} }$ is a [Vandermonde matrix](https://en.wikipedia.org/wiki/Vandermonde_matrix), the invertibility condition is guaranteed to hold if all the ${\displaystyle x_{i}}$ values are distinct. This is the unique **least-squares solution**.
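As a minimal sketch of the estimate above, the following NumPy snippet builds the Vandermonde design matrix, solves the normal equations, and compares the result with `np.polyfit`. The sample data and the variable names are illustrative assumptions, not part of the library described below.

```python
import numpy as np

# Illustrative data (an assumption for this sketch): y = 1 + 2x + 3x^2, no noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x**2
m = 2  # polynomial degree, with m < n = 5

# Design matrix with columns 1, x, x^2, ..., x^m
X = np.vander(x, m + 1, increasing=True)

# Solve (X^T X) beta = X^T y instead of forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.polyfit returns the highest power first, so reverse it for comparison
beta_ref = np.polyfit(x, y, m)[::-1]

print(beta_hat)
print(beta_ref)
```

Solving the linear system is used here in place of the explicit inverse in the formula; both give the same estimate when $\mathbf{X}^{\mathsf{T}}\mathbf{X}$ is invertible.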

#### 3.1.2 Least-squares solution
 
The model function has the form $f(x,{\boldsymbol {\beta }})$, where the adjustable parameters (for the polynomial model above, the $m+1$ coefficients $\beta_0,\dots,\beta_m$) are held in the vector ${\boldsymbol {\beta }}$.

The goal is to find the parameter values for which the model "best" fits the data. The fit of a model to a data point is measured by its residual, defined as the difference between the actual value of the dependent variable and the value predicted by the model:

$$r_{i}=y_{i}-f(x_{i},{\boldsymbol  \beta })$$

The least-squares method finds the optimal parameter values by **minimizing** the sum $S$ of squared residuals:

$${\displaystyle S=\sum _{i=1}^{n}{r_{i}}^{2}}$$
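The definition of $S$ can be checked numerically. The sketch below evaluates the residuals of a straight-line model on a small made-up data set (the data values and the helper name `sum_squared_residuals` are assumptions for illustration) and shows that moving the coefficients away from the least-squares values increases $S$.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def sum_squared_residuals(beta, x, y):
    """Return S = sum_i (y_i - f(x_i, beta))^2 for a polynomial f.

    beta[k] is the coefficient of x^k (lowest power first).
    """
    r = y - P.polyval(x, beta)  # residuals r_i = y_i - f(x_i, beta)
    return float(np.sum(r**2))


# Illustrative data (an assumption for this sketch)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Least-squares line; np.polyfit returns the highest power first
beta_ls = np.polyfit(x, y, 1)[::-1]

S_min = sum_squared_residuals(beta_ls, x, y)
# Any other parameter vector gives a larger sum of squared residuals
S_other = sum_squared_residuals(beta_ls + np.array([0.1, -0.05]), x, y)

print(S_min, S_other)
```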

**Reference** 

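* Zhou Jianhua, Chen Jianlong, Zhang Xiaoxiang (eds.): *Geometry and Algebra* (《几何与代数》), Science Press, 2012
  
  * Section 4.6: Least-squares solutions of systems of linear equations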


### 3.2 The shared library

#### 3.2.1 The C code


In [3]:
%%file ./include/curvefit.h
#ifndef CURVEFIT_H
#define CURVEFIT_H

#ifdef __cplusplus
extern "C"
{
#endif
 
void c_polyfit(double x[], double y[], int size, int n, double a[]);

#ifdef __cplusplus
} // extern "C"
#endif

#endif /* !CURVEFIT_H */


Overwriting ./include/curvefit.h


> **extern "C"**
>
>The guard above lets the same header be included from both C and C++ code:
>
>* `extern "C"` gives the declared function C linkage, so a C++ compiler does not mangle its name and the function can be linked against the C library that defines it.
>
>* The `extern "C"` modifier may also be applied to multiple function declarations in a block.


## Using Gauss lib

### C

In [15]:
%%file ./src/curvefit.c
/*
   c_polyfit: least-squares polynomial fit using the normal equations

   x[size], y[size] are the data samples
   n is the degree of the polynomial
   a[n+1] receives the polynomial regression coefficients (a[k] multiplies x^k)
*/

#include <math.h>
#include <stdlib.h>
#include "curvefit.h"
#include "eqlinear.h"

void c_polyfit(double x[], double y[], int size, int n, double a[])
{
    int i, j;
    double X[2 * n + 1]; // stores N, sigma(xi), sigma(xi^2), ..., sigma(xi^2n)
    double Y[n + 1];     // stores sigma(yi), sigma(xi*yi), sigma(xi^2*yi), ..., sigma(xi^n*yi)

    // B is the augmented normal matrix: (n+1) rows of (n+2) columns
    double **B = (double **)malloc(sizeof(double *) * (n + 1));
    for (i = 0; i < n + 1; i++)
        B[i] = (double *)malloc(sizeof(double) * (n + 2));

    for (i = 0; i < 2 * n + 1; i++)
    {
        X[i] = 0;
        for (j = 0; j < size; j++)
            X[i] = X[i] + pow(x[j], i);
    }

    // build the normal matrix (all columns of B except the last one)
    for (i = 0; i <= n; i++)
        for (j = 0; j <= n; j++)
            B[i][j] = X[i + j];

    for (i = 0; i < n + 1; i++)
    {
        Y[i] = 0;
        for (j = 0; j < size; j++)
            Y[i] = Y[i] + pow(x[j], i) * y[j];
    }

    // load the values of Y as the last column of B (the augmented normal matrix)
    for (i = 0; i <= n; i++)
        B[i][n + 1] = Y[i];

    // solve the augmented system for the coefficients a[0..n]
    gauss_am_pivoting(n + 1, B, a);

    for (i = 0; i < n + 1; i++)
        free(B[i]);
    free(B);
}

Overwriting ./src/curvefit.c
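The steps of `c_polyfit` can be mirrored in a few lines of pure Python, which is convenient for checking the C result on small inputs. This is a sketch under stated assumptions: the source of `gauss_am_pivoting` is not shown in this section, so the elimination below is a generic Gaussian elimination with partial pivoting on the augmented matrix, used as a stand-in. The function name `polyfit_normal_equations` and the test data are hypothetical.

```python
def polyfit_normal_equations(x, y, n):
    """Fit a degree-n polynomial; returns a[0..n], where a[k] multiplies x^k."""
    # Power sums: X[k] = sum_j x_j^k for k = 0..2n
    X = [sum(xj**k for xj in x) for k in range(2 * n + 1)]
    # Right-hand side: Y[k] = sum_j x_j^k * y_j for k = 0..n
    Y = [sum(xj**k * yj for xj, yj in zip(x, y)) for k in range(n + 1)]
    # Augmented normal matrix: B[i][j] = X[i+j], last column is Y
    B = [[X[i + j] for j in range(n + 1)] + [Y[i]] for i in range(n + 1)]

    size = n + 1
    # Forward elimination with partial pivoting (stand-in for gauss_am_pivoting)
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(B[r][col]))
        B[col], B[pivot] = B[pivot], B[col]
        for r in range(col + 1, size):
            factor = B[r][col] / B[col][col]
            for c in range(col, size + 1):
                B[r][c] -= factor * B[col][c]

    # Back substitution on the upper-triangular system
    a = [0.0] * size
    for i in range(size - 1, -1, -1):
        s = sum(B[i][j] * a[j] for j in range(i + 1, size))
        a[i] = (B[i][size] - s) / B[i][i]
    return a


# Illustrative data (an assumption): exact samples of y = 2 - x + 0.5 x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 - v + 0.5 * v * v for v in xs]
coeffs = polyfit_normal_equations(xs, ys, 2)
print(coeffs)
```

Forming the normal equations from power sums keeps the code short, but the resulting matrix becomes poorly conditioned as the degree grows, so this approach is best kept to low-degree fits.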


### makefile

In [5]:
%%file ./makefile-libcurvefit.mk
CC=gcc
CFLAGS=-O3 -Wall -fPIC

SRCDIR= ./src/
OBJDIR= ./obj/
BINDIR= ./bin/
INCDIR= ./include/

# Linux
# LIB=libcurvefit.so 
LIB=libcurvefit.dll

SRCS=$(SRCDIR)curvefit.c  

# non-path filename
filename=$(notdir $(SRCS))

# the obj target of a source code using the pattern rule
OBJS=$(patsubst %.c,$(OBJDIR)%.o,$(filename))

all:$(LIB)
    
$(LIB): $(OBJS)  
	$(CC) -shared -o $(BINDIR)$@ $(OBJS) -L./bin/ -leqlin

# the pattern rule: one step rule for multiple source files
$(OBJS):$(SRCS)
	$(CC) $(CFLAGS) -o $(OBJDIR)$(notdir $@) -c $(patsubst  %.o,$(SRCDIR)%.c,$(notdir $@))  -I$(INCDIR) 

Overwriting ./makefile-libcurvefit.mk


In [6]:
!make -f makefile-libcurvefit.mk

gcc -O3 -Wall -fPIC -o ./obj/curvefit.o -c ./src/curvefit.c  -I./include/ 
gcc -shared -o ./bin/libcurvefit.dll ./obj/curvefit.o -L./bin/ -leqlin


##  Call Library in C

In [7]:
%%file ./data/springData.csv
Distance(m),Mass(kg)
0.0865,0.1
0.1015,0.15
0.1106,0.2
0.1279,0.25
0.1892,0.3
0.2695,0.35
0.2888,0.4
0.2425,0.45
0.3465,0.5
0.3225,0.55
0.3764,0.6
0.4263,0.65
0.4562,0.7
0.4502,0.75
0.4499,0.8
0.4534,0.85
0.4416,0.9
0.4304,0.95
0.437,1.0

Overwriting ./data/springData.csv


#####  strtok_r()

POSIX provides the function `strtok_r()`, the reentrant variant of the standard C function `strtok()`, for splitting a string into tokens separated by delimiter characters:

```c
char *strtok_r(char *str, const char *delim, char **saveptr);
```

* The third argument `saveptr` is a pointer to a `char *` variable that `strtok_r()` uses internally to maintain context between successive calls.

* On the first call to `strtok_r()`, `str` should point to the string to be parsed, and the value of `saveptr` is ignored.

* In subsequent calls, `str` should be `NULL`, and `saveptr` should be unchanged since the previous call.


In [14]:
%%file ./src/DemoPolyFit.c
/*
   The Demo of Simple PolyFit 
*/

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "curvefit.h"


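int main()
{

   int size = 19; // number of data rows in springData.csv
   double distances[size];
   double forces[size];

   FILE *fp = fopen("./data/springData.csv", "r");
   if (fp == NULL)
   {
      fprintf(stderr, "failed to open file for reading\n");
      return 1;
   }

   char line[1024];
   // skip the header line: Distance(m),Mass(kg)
   if (fgets(line, sizeof(line), fp) == NULL)
   {
      fprintf(stderr, "failed to read the header line\n");
      fclose(fp);
      return 1;
   }
   int i = 0;
   // stop at size rows so the arrays cannot overflow
   while (i < size && fgets(line, sizeof(line), fp))
   {
      char *save_ptr;
      // The first call to strtok_r(): str points to the line to be parsed
      char *d = strtok_r(line, ",", &save_ptr);
      // In subsequent calls, str is NULL and save_ptr is unchanged since the previous call
      char *m = strtok_r(NULL, ",", &save_ptr);
      if (d == NULL || m == NULL)
         continue; // skip blank or malformed lines
      distances[i] = atof(d);
      forces[i] = atof(m) * 9.81; // mass (kg) * 9.81 -> force (N)
      i++;
   }
   fclose(fp);

   int n = 1; // n is the degree of the polynomial
   double a[n + 1];
   // fit distance = a[0] + a[1] * force, leaving out the last 6 samples
   c_polyfit(forces, distances, size - 6, n, a);
   printf(" PolynomialFit: k = %.6f \n", 1 / a[1]);
   return 0;
}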

Overwriting ./src/DemoPolyFit.c


**Windows**

In [16]:
!gcc -o  ./bin/DemoPolyFit ./src/DemoPolyFit.c -L./bin/ -lcurvefit -I./include

In [17]:
!.\bin\DemoPolyFit

 PolynomialFit: k = 15.453365 
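The demo fits distance as a function of force and prints `1/a[1]` rather than a coefficient itself. Assuming the spring is modelled by Hooke's law (an interpretation of the demo; the code does not state it), with $F$ the force, $x$ the distance and $k$ the spring constant, the reasoning can be written as:

$$F = k\,x \quad\Longrightarrow\quad x = \frac{1}{k}\,F$$

$$x \approx a_{0} + a_{1}F \quad\Longrightarrow\quad k \approx \frac{1}{a_{1}}$$

Here $a_0$ and $a_1$ correspond to `a[0]` and `a[1]` returned by `c_polyfit` for degree `n = 1`. Because the call passes `size - 6`, only the first 13 of the 19 samples enter the fit.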


#### 3.2.4 Call in Python

[UNDERSTANDING_EXPERIMENTAL_DATA: The Behavior of Springs](Unit2-3-UNDERSTANDING_EXPERIMENTAL_DATA.ipynb)

In [25]:
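import numpy as np
from ctypes import POINTER, c_double, c_int, cdll


def getData(fileName):
    """Read the spring data and return (forces, distances)."""
    distances = []
    forces = []
    with open(fileName, 'r') as dataFile:
        dataFile.readline()  # discard the header: Distance(m),Mass(kg)
        for line in dataFile:
            d, m = line.split(',')
            distances.append(float(d))
            forces.append(float(m) * 9.81)  # mass (kg) * 9.81 -> force (N)
    return (forces, distances)


inputFile = './data/springData.csv'
forces, distances = getData(inputFile)

# leave out the last 6 samples, as in the C demo
forces = forces[:-6]
distances = distances[:-6]

# call the PolynomialFit library through ctypes
flib = cdll.LoadLibrary("./bin/libcurvefit.dll")
# declare the C signature: void c_polyfit(double x[], double y[], int size, int n, double a[])
flib.c_polyfit.argtypes = [POINTER(c_double), POINTER(c_double), c_int, c_int, POINTER(c_double)]
flib.c_polyfit.restype = None

size = len(forces)
x = (c_double * size)(*forces)
y = (c_double * size)(*distances)
n = 1  # degree of the polynomial
c = (c_double * (n + 1))()  # output buffer for the n+1 coefficients

flib.c_polyfit(x, y, size, n, c)

print("Linear Fit:")
print("\tPolynomialFit:k =", 1 / c[1])
# np.polyfit returns the highest power first, so a is the slope
a, b = np.polyfit(forces, distances, 1)
print("\tnp.polyfit: k =", 1 / a)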


Linear Fit:
	PolynomialFit:k = 15.453365184877422
	np.polyfit: k = 15.453365184877441


## Reference

Python ctypes http://docs.python.org/3/library/ctypes.html

