# Curve Fitting

## Objective and Prerequisites

Try this Jupyter Notebook Modeling Example to learn how you can fit a function to a set of observations. We will formulate this regression problem as a linear programming problem using the Gurobi Python API and then solve it with the Gurobi Optimizer.

This model is Example 11 from the fifth edition of *Model Building in Mathematical Programming* by H. Paul Williams (pages 266 and 319-320).

This modeling example is at the beginner level, where we assume that you know Python and that you have some knowledge about building mathematical optimization models. The reader should also consult the [documentation](https://www.gurobi.com/resources/?category-filter=documentation) of the Gurobi Python API.

**Download the Repository** <br />
You can download the repository containing this and other examples by clicking [here](https://github.com/Gurobi/modeling-examples/archive/master.zip). 

## Model Formulation

### Sets and Indices

$i \in \text{Observations}=\{1, \dots, n\}$.


### Parameters

$x_{i} \in \mathbb{R}$: Independent variable value at observation $i$.

$y_{i} \in \mathbb{R}$: Dependent variable value at observation $i$.

### Decision Variables

$a \in \mathbb{R}$: Value of the constant term in the function that explains the values of $y$ in terms of the values of $x$.

$b \in \mathbb{R}$: Coefficient of the linear term in the function that explains the values of $y$ in terms of the values of $x$.

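$u_{i} \in \mathbb{R}^+$: Positive deviation at observation $i$, that is, the amount by which the observed value $y_{i}$ exceeds the value $a + bx_{i}$ proposed by the function.

$v_{i} \in \mathbb{R}^+$: Negative deviation at observation $i$, that is, the amount by which the value $a + bx_{i}$ proposed by the function exceeds the observed value $y_{i}$.

$z \in \mathbb{R}^+$: Value of the maximum deviation. This variable is only needed for the second goal.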

We model the problem for the first goal:

* Fit a line $y=a+bx$ to the given data set in order to minimize the sum of absolute deviations of each observed value of $y$ from the value predicted by the linear relationship.


### Constraints Problem 1

**Deviation**: Each pair of corresponding data values $(x_{i},y_{i})$ gives rise to the following constraint.

\begin{equation}
bx_{i} + a + u_{i} - v_{i} = y_{i}  \quad \forall i \in \text{Observations}
\end{equation}

Here $x_{i}$ and $y_{i}$ are the given values in the set of observations, and $a$, $b$, $u_{i}$, and $v_{i}$ are variables.
The positive deviation $u_{i}$ and the negative deviation $v_{i}$ give the amounts by which the observed value $y_{i}$ lies above or below the value proposed by the linear expression.
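As a small illustration of how one residual is split into the two non-negative deviations, the sketch below uses a hypothetical line (`a = 0.5`, `b = 0.6`, chosen only for this example) and three of the observations. The helper name `split_deviation` is not part of the notebook.

```python
def split_deviation(a, b, x_i, y_i):
    """Return (u_i, v_i) with b*x_i + a + u_i - v_i == y_i and at most one of them positive."""
    residual = y_i - (a + b * x_i)   # observed minus proposed value
    u_i = max(residual, 0.0)         # observed value lies above the line
    v_i = max(-residual, 0.0)        # observed value lies below the line
    return u_i, v_i


a, b = 0.5, 0.6                      # hypothetical coefficients, for illustration only
sample = [(0, 1), (1, 0.7), (3, 3.2)]
splits = [split_deviation(a, b, x_i, y_i) for x_i, y_i in sample]

for (x_i, y_i), (u_i, v_i) in zip(sample, splits):
    print(f"x={x_i}, y={y_i}: u={u_i:.2f}, v={v_i:.2f}")
```

For each observation at most one of the two deviations is positive, and substituting them back into the deviation constraint reproduces $y_{i}$.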

### Objective Function Problem 1

**Total deviation**: The objective is to minimize the total positive and negative deviations.

\begin{equation}
\text{Minimize} \quad \sum_{i \in \text{Observations}} (u_{i} + v_{i})
\end{equation}
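A short argument shows why this linear objective measures the sum of absolute deviations. Write $r_{i} = y_{i} - a - bx_{i}$ for the residual at observation $i$ (the symbol $r_{i}$ is shorthand introduced only for this remark). The deviation constraint states $u_{i} - v_{i} = r_{i}$ with $u_{i}, v_{i} \geq 0$, so

\begin{equation}
u_{i} + v_{i} \geq |u_{i} - v_{i}| = |r_{i}|  \quad \forall i \in \text{Observations}
\end{equation}

with equality exactly when at least one of $u_{i}$, $v_{i}$ is zero. Because the objective pushes every $u_{i} + v_{i}$ down, an optimal solution satisfies $u_{i} + v_{i} = |r_{i}|$, and the optimal objective value equals

\begin{equation}
\min_{a,b} \quad \sum_{i \in \text{Observations}} |y_{i} - a - bx_{i}|
\end{equation}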

We now provide a model formulation for the second goal:

* Fit a line $y=a+bx$ to the given data set in order to  minimize the maximum deviation of all the observed values of $y$ from the value predicted by the linear relationship.

For this new formulation, in addition to the "Deviation constraints", we need to include the following constraints.

### Constraints Problem 2

**Maximum deviation**: The following constraints ensure that the decision variable $z$ takes the value of the maximum deviation.

\begin{equation}
z \geq u_{i}  \quad \forall i \in \text{Observations}
\end{equation}

\begin{equation}
z \geq v_{i}  \quad \forall i \in \text{Observations}
\end{equation}

### Objective Function Problem 2

**Minimax deviation**: The objective is to minimize the maximum deviation.

\begin{equation}
\text{Minimize} \quad z
\end{equation}
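The link between $z$ and the largest absolute deviation can be sketched as follows, writing $r_{i} = y_{i} - a - bx_{i}$ for the residual (shorthand introduced only for this remark). From $u_{i} - v_{i} = r_{i}$ and $u_{i}, v_{i} \geq 0$ it follows that $\max(u_{i}, v_{i}) \geq |r_{i}|$, so the maximum deviation constraints give

\begin{equation}
z \geq \max(u_{i}, v_{i}) \geq |r_{i}|  \quad \forall i \in \text{Observations}
\end{equation}

The bound is attained by choosing $u_{i} = \max(r_{i}, 0)$, $v_{i} = \max(-r_{i}, 0)$, and $z = \max_{i} |r_{i}|$. Minimizing $z$ is therefore equivalent to

\begin{equation}
\min_{a,b} \quad \max_{i \in \text{Observations}} |y_{i} - a - bx_{i}|
\end{equation}

Note that $z$ equals the maximum deviation at an optimal solution; the constraints alone only bound it from below.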

## Python Implementation

We install the `gurobipy` package and import the Gurobi Python module.

In [None]:
%pip install gurobipy

In [1]:
import gurobipy as gp
from gurobipy import GRB

# tested with Python 3.7.0 & Gurobi 9.1.0

## Input data

We define the corresponding values for $x$ and $y$ in the set of observations.

In [2]:
# Sample data: values of independent variable x and dependent variable y

observations, x, y = gp.multidict({
    ('1'): [0,1],
    ('2'): [0.5,0.9],
    ('3'): [1,0.7],
    ('4'): [1.5,1.5],
    ('5'): [1.9,2],
    ('6'): [2.5,2.4],
    ('7'): [3,3.2],
    ('8'): [3.5,2],
    ('9'): [4,2.7],
    ('10'): [4.5,3.5],
    ('11'): [5,1],
    ('12'): [5.5,4],
    ('13'): [6,3.6],
    ('14'): [6.6,2.7],
    ('15'): [7,5.7],
    ('16'): [7.6,4.6],
    ('17'): [8.5,6],
    ('18'): [9,6.8],
    ('19'): [10,7.3]
})
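To make the shape of the three returned objects concrete, the sketch below mimics the call above in plain Python for the first three observations. The helper `split_columns` is hypothetical and only stands in for `gp.multidict`; Gurobi returns its own `tuplelist` and `tupledict` types rather than a plain `list` and `dict`.

```python
def split_columns(table):
    """Hypothetical stand-in: split {key: [x, y]} into a key list and one dict per column."""
    keys = list(table)
    first = {key: table[key][0] for key in keys}
    second = {key: table[key][1] for key in keys}
    return keys, first, second


observations, x, y = split_columns({
    '1': [0, 1],
    '2': [0.5, 0.9],
    '3': [1, 0.7],
})

print(observations)      # keys identifying the observations
print(x['2'], y['2'])    # independent and dependent value at observation '2'
```

This is why the later cells can index the data as `x[i]` and `y[i]` while iterating over `observations`.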

## Model Deployment

We create a model and the variables. The variables of the model are the constant term and coefficient of the linear term of the function f(x), the positive and negative deviations, and the maximum deviation.

In [3]:
model = gp.Model('CurveFitting')

# Constant term of the function f(x). This is a free continuous variable that can take positive and negative values. 
a = model.addVar(lb=-GRB.INFINITY, ub=GRB.INFINITY, vtype=GRB.CONTINUOUS, name="a")

# Coefficient of the linear term of the function f(x). This is a free continuous variable that can take positive 
# and negative values.
b = model.addVar(lb=-GRB.INFINITY, ub=GRB.INFINITY, vtype=GRB.CONTINUOUS, name="b")

# Non-negative continuous variables that capture the positive deviations
u = model.addVars(observations, vtype=GRB.CONTINUOUS, name="u")

# Non-negative continuous variables that capture the negative deviations
v = model.addVars(observations, vtype=GRB.CONTINUOUS, name="v")

# Non-negative continuous variable that captures the value of the maximum deviation
z = model.addVar(vtype=GRB.CONTINUOUS, name="z")

Using license file c:\gurobi\gurobi.lic


Each pair of corresponding data values $x_{i}$ and $y_{i}$ gives rise to a constraint.

In [4]:
# Deviation constraints

deviations = model.addConstrs( (b*x[i] + a + u[i] - v[i] == y[i] for i in observations), name='deviations')

The objective function of problem 1 is to minimize the total positive and negative deviations.

In [5]:
# Objective function of problem 1

model.setObjective(u.sum('*') + v.sum('*'), GRB.MINIMIZE)

In [6]:
# Verify model formulation

model.write('CurveFitting.lp')

# Run optimization engine

model.optimize()

Gurobi Optimizer version 9.1.0 build v9.1.0rc0 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 19 rows, 41 columns and 75 nonzeros
Model fingerprint: 0x0bec2f7b
Coefficient statistics:
  Matrix range     [5e-01, 1e+01]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [7e-01, 7e+00]
Presolve removed 0 rows and 1 columns
Presolve time: 0.01s
Presolved: 19 rows, 40 columns, 75 nonzeros

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0      handle free variables                          0s
      20    1.1466250e+01   0.000000e+00   0.000000e+00      0s

Solved in 20 iterations and 0.01 seconds
Optimal objective  1.146625000e+01


In [7]:
# Output report

print("\n\n_________________________________________________________________________________")
print("The best straight line that minimizes the absolute value of the deviations is:")
print("_________________________________________________________________________________")
print(f"y = {b.x:.4f}x + ({a.x:.4f})")



_________________________________________________________________________________
The best straight line that minimizes the absolute value of the deviations is:
_________________________________________________________________________________
y = 0.6375x + (0.5813)
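The reported line can be checked against the solver log without Gurobi. The sketch below recomputes the sum of absolute deviations in plain Python. It assumes the unrounded intercept is 0.58125 (the notebook prints it rounded to 0.5813); that value is the intercept of the line with slope 0.6375 through observation 2.

```python
# (x, y) pairs of observations 1..19, copied from the input data cell
data = [
    (0, 1), (0.5, 0.9), (1, 0.7), (1.5, 1.5), (1.9, 2), (2.5, 2.4), (3, 3.2),
    (3.5, 2), (4, 2.7), (4.5, 3.5), (5, 1), (5.5, 4), (6, 3.6), (6.6, 2.7),
    (7, 5.7), (7.6, 4.6), (8.5, 6), (9, 6.8), (10, 7.3),
]

a, b = 0.58125, 0.6375   # assumed unrounded intercept, reported slope

abs_dev = [abs(y_i - (a + b * x_i)) for x_i, y_i in data]
total = sum(abs_dev)

# Observations that the fitted line passes through (zero deviation)
on_line = [i + 1 for i, d in enumerate(abs_dev) if d < 1e-9]

print(f"total absolute deviation = {total:.5f}")
print(f"observations on the line: {on_line}")
```

Under this assumption the total matches the optimal objective in the log (11.46625), and the line passes exactly through two observations, which is typical of a vertex solution of this linear program.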


For Problem 2, the variable $z$ created earlier captures the value of the maximum deviation. We now add the constraints that link $z$ to the deviations $u_{i}$ and $v_{i}$.

In [8]:
# Maximum deviation constraints

maxPositive_deviation = model.addConstrs( (z >= u[i] for i in observations), name='maxPositive_deviation')

maxNegative_deviation = model.addConstrs( (z >= v[i] for i in observations), name='maxNegative_deviation')

The objective function for Problem 2 is to minimize the maximum deviation.

In [9]:
# Objective function for Problem 2

model.setObjective(z, GRB.MINIMIZE)

# Run optimization engine

model.optimize()

Gurobi Optimizer version 9.1.0 build v9.1.0rc0 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 57 rows, 41 columns and 151 nonzeros
Coefficient statistics:
  Matrix range     [5e-01, 1e+01]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [7e-01, 7e+00]
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    0.0000000e+00   1.146625e+01   0.000000e+00      0s
      11    1.7250000e+00   0.000000e+00   0.000000e+00      0s

Solved in 11 iterations and 0.01 seconds
Optimal objective  1.725000000e+00
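The row, column, and nonzero counts in the two solver logs can be reproduced from the formulation. The sketch below counts them in plain Python, assuming that a coefficient equal to zero (here $b \cdot x_{1}$ with $x_{1} = 0$) is not stored as a nonzero.

```python
# x values of observations 1..19, copied from the input data cell
x_values = [0, 0.5, 1, 1.5, 1.9, 2.5, 3, 3.5, 4, 4.5,
            5, 5.5, 6, 6.6, 7, 7.6, 8.5, 9, 10]
n = len(x_values)

# Columns: a, b, one u and one v per observation, and z
columns = 2 + 2 * n + 1

# Problem 1: one deviation constraint per observation.
# Each row holds coefficients for a, u_i, v_i, plus b when x_i is nonzero.
rows_1 = n
nonzeros_1 = sum(3 + (1 if x_i != 0 else 0) for x_i in x_values)

# Problem 2 adds z >= u_i and z >= v_i, each with two coefficients.
rows_2 = rows_1 + 2 * n
nonzeros_2 = nonzeros_1 + 2 * (2 * n)

print(rows_1, columns, nonzeros_1)
print(rows_2, columns, nonzeros_2)
```

The counts agree with the logs: 19 rows, 41 columns, and 75 nonzeros for Problem 1, and 57 rows, 41 columns, and 151 nonzeros after the maximum deviation constraints are added.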


In [10]:
# Output report

print("\n\n_________________________________________________________________________________")
print("The best straight line that minimizes the maximum deviation is:")
print("_________________________________________________________________________________")
print(f"y = {b.x:.4f}x + ({a.x:.4f})")



_________________________________________________________________________________
The best straight line that minimizes the maximum deviation is:
_________________________________________________________________________________
y = 0.6250x + (-0.4000)
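The two fitted lines optimize different criteria, so each one should do worse on the other's criterion. The sketch below evaluates both criteria for both lines in plain Python. It assumes the unrounded coefficients are `a = 0.58125`, `b = 0.6375` for Problem 1 and `a = -0.4`, `b = 0.625` for Problem 2; the notebook only prints four decimals.

```python
# (x, y) pairs of observations 1..19, copied from the input data cell
data = [
    (0, 1), (0.5, 0.9), (1, 0.7), (1.5, 1.5), (1.9, 2), (2.5, 2.4), (3, 3.2),
    (3.5, 2), (4, 2.7), (4.5, 3.5), (5, 1), (5.5, 4), (6, 3.6), (6.6, 2.7),
    (7, 5.7), (7.6, 4.6), (8.5, 6), (9, 6.8), (10, 7.3),
]

# Assumed unrounded coefficients (a, b) of the two reported lines
lines = {
    "Problem 1 (min sum)": (0.58125, 0.6375),
    "Problem 2 (min max)": (-0.4, 0.625),
}

summary = {}
for label, (a, b) in lines.items():
    abs_dev = [abs(y_i - (a + b * x_i)) for x_i, y_i in data]
    summary[label] = (sum(abs_dev), max(abs_dev))

for label, (total, worst) in summary.items():
    print(f"{label}: sum of deviations = {total:.5f}, maximum deviation = {worst:.5f}")
```

The Problem 1 line has the smaller sum of absolute deviations, and the Problem 2 line has the smaller maximum deviation. The second line is pulled toward outlying observations such as observation 11, which the first line largely ignores.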


---
## References

H. Paul Williams, Model Building in Mathematical Programming, fifth edition.

Copyright © 2020 Gurobi Optimization, LLC