<a href="https://colab.research.google.com/github/veillette/jupyterNotebooks/blob/main/Optics/SineFitUsingCurveFit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook will demonstrate how to use the scipy.optimize.curve_fit function to fit a sine curve to data.

The scipy.optimize.curve_fit function is a powerful tool for fitting curves to data. It can be used to fit a wide variety of curves, including sine curves.

The scipy.optimize.curve_fit function takes three required arguments:

- f: The function that you want to fit to the data.
- xdata: The x-values of the data.
- ydata: The y-values of the data.

The f argument must be a callable whose first argument is the independent variable and whose remaining arguments are the parameters to fit, for example f(x, A, k). The xdata and ydata arguments must be arrays of the same length.

The scipy.optimize.curve_fit function returns two values:

- popt: The optimal parameters for the curve.
- pcov: The covariance matrix of the parameters.

The popt array contains the values of the parameters that best fit the data. The pcov matrix contains the covariance of the parameters. The covariance matrix can be used to determine the uncertainty of the parameters.

Once you have the optimal parameters, you can use them to plot the fitted curve.
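
As a minimal sketch of the calling convention described above, the example below fits a straight line to made-up, noise-free data. The model `line` and the data values are illustrative assumptions and are not part of this notebook's sine example.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model: the independent variable comes first, then the parameters.
def line(x, m, b):
    return m * x + b

# Made-up data lying exactly on y = 2x + 1.
xdata = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ydata = 2.0 * xdata + 1.0

popt, pcov = curve_fit(line, xdata, ydata)

m_fit, b_fit = popt  # same order as the parameters of line
print(popt)          # values close to 2 and 1
print(pcov.shape)    # (2, 2): one row and one column per parameter
```

The order of the values in popt follows the order of the parameters in the model's signature, which is why m comes before b here.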

In [None]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

Create some data for the x and y arrays

In [None]:
x = np.array([0,1,2,3,4,5,6,7,8])

In [None]:
y = np.array([0.4,1.9,1.5,0.2,-1.6,-1.8,-0.4,1.4,1.9])

Check that the two arrays have the same length. Notice the double equals sign, which tests for equality rather than assigning a value. The expression evaluates to True or False.

In [None]:
# Sanity Check
len(x)==len(y)

For your information, a sanity check is a quick and simple check that is performed to ensure that something is working as expected. Sanity checks are used to help identify problems early in the development process, which can save time and effort in the long run.

Plot the data to see that it looks like a sinusoidal function.

In [None]:
plt.plot(x,y,'r+')
plt.show()

In order to carry out our fit, we need to define a callable function that takes the independent variable as its first argument, followed by the fitting parameters, and returns the model value. This is our fit function. For our purposes, let's pick

$$y(x) = A \sin ( k x) $$

where $A$ and $k$ are our fitting parameters.

In [None]:
# Let's define a fitting function with a name, say myFit.
# Note that it is crucial that the first argument of the callable function be your independent variable, in our case x.
# The other arguments of the function must be your fitting parameters, in our case A and k.
# curve_fit will return the optimized values in the order of the fitting function's arguments, so A first and k second in our case.

def myFit(x, A, k):
  return A* np.sin( k*x )

Looking at our graph above, we can already guess that $A$ should be approximately 2 and $k$ should be about 1 (since the function has a period of about $2\pi$).
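
If you prefer not to read the guesses off the graph, here is one possible sketch for estimating them from the data. It assumes the data oscillate around zero and cross zero at least twice; the names `A_guess`, `k_guess` and `crossings` are introduced here for illustration only.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([0.4, 1.9, 1.5, 0.2, -1.6, -1.8, -0.4, 1.4, 1.9])

# Amplitude guess: the largest absolute y value.
A_guess = np.max(np.abs(y))

# Find where y changes sign, then interpolate linearly to estimate each zero crossing.
idx = np.where(np.sign(y[:-1]) != np.sign(y[1:]))[0]
crossings = x[idx] - y[idx] * (x[idx + 1] - x[idx]) / (y[idx + 1] - y[idx])

# Consecutive zero crossings of a sine are half a period apart, so k = pi / spacing.
half_period = np.mean(np.diff(crossings))
k_guess = np.pi / half_period

print(f'A is roughly {A_guess:.1f} and k is roughly {k_guess:.1f}')
```

These are only rough estimates, and they agree with the values of about 2 and about 1 read off the graph.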

A pretty common mistake is to define a fitting function that does not evaluate to a number.
Let's assign arbitrary values to x, A and k, and check that myFit outputs a number as expected.

In [None]:
# Sanity check
myFit(3,1.1,0.3)
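
Since the fit will be run on the whole x array, a second check is also worthwhile: does the function accept an array and return an array of the same length? A minimal sketch, using arbitrary test values:

```python
import numpy as np

def myFit(x, A, k):
    return A * np.sin(k * x)

# A single x value gives a single number.
single = myFit(3, 1.1, 0.3)

# An array of x values gives an array with one y value per x value.
many = myFit(np.array([0, 1, 2]), 1.1, 0.3)

print(single)
print(many.shape)  # (3,)
```

This works because np.sin operates element by element. A model written with math.sin instead would fail on array input.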

Now we are ready to perform the fit.

The purpose of curve fitting is to find the parameters of a function that best fit a set of data points. The scipy.optimize.curve_fit function takes three required parameters.

The first parameter is the function that defines the model or curve you want to fit to your data. The curve_fit function will try to adjust the parameters of this function to minimize the difference between the predicted values of the function and the actual data points.

The second and third parameters are arrays or lists containing the independent and dependent variables respectively. In other words, x represents the input data, while y represents the corresponding output data that you want to fit the curve to.

The scipy.optimize.curve_fit function returns two values:

- The optimal parameters for the curve.
- The covariance matrix of the parameters.

You can name these two values anything you want, but I like to use explicit names.

In [None]:
optimalParameters, covarianceMatrix = curve_fit(myFit, x, y)

If the fit fails or raises an error, supplying initial guess parameters can help. These are suggested values that are close to the actual parameters of the function you're trying to fit, and they give the optimization algorithm a good starting point from which to converge to the optimal solution. When no guess is given, curve_fit starts every parameter at 1.
For our case, good initial guess parameters would be A=2 and k=1.

The p0 parameter is an optional argument for the curve_fit function.
It takes an array of initial guess parameters. So, we will create an array ``guess=[2,1]``
containing the initial guesses for A and k.

We then need to apply p0 in curve_fit:
```
optimalParameters, covarianceMatrix = curve_fit(myFit, x, y, p0=guess)
```
By providing the initial guess parameters, you're guiding the curve fitting process, potentially helping it converge to a better solution.
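
Put together, a self-contained version of the fit with an initial guess might look like the sketch below. It repeats the data and the fit function from this notebook so that it can run on its own.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([0.4, 1.9, 1.5, 0.2, -1.6, -1.8, -0.4, 1.4, 1.9])

def myFit(x, A, k):
    return A * np.sin(k * x)

# Initial guesses, in the same order as the fitting parameters: A first, then k.
guess = [2, 1]

optimalParameters, covarianceMatrix = curve_fit(myFit, x, y, p0=guess)
print(optimalParameters)
```

The entries of guess must follow the same order as the parameters in the definition of myFit.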

In [None]:
print(optimalParameters)

A better way to access the values is to *destructure* (unpack) the array into named variables. This is a useful trick to know.

In [None]:
ABest, kBest = optimalParameters
print( 'After the fit, we find that the optimal values for A and k are the following:')
print( 'A is ', ABest)
print( 'k is ', kBest)

Here is a better way to present the results, which is useful to know. Inside an f-string, a format specifier such as ``.2f`` fixes the number of decimal places shown for a numeric value.

In [None]:
print(f'A is equal to {ABest:.2f}, whereas k is equal to {kBest:.2f}')
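
For reference, here is a short sketch of a few common format specifiers applied to an arbitrary number. The value 1234.5678 is made up for illustration.

```python
value = 1234.5678

print(f'{value:.2f}')  # fixed point, two decimal places: 1234.57
print(f'{value:.3g}')  # three significant figures: 1.23e+03
print(f'{value:.2e}')  # scientific notation, two decimal places: 1.23e+03
```

The ``f`` specifier controls decimal places, whereas ``g`` controls significant figures and switches to scientific notation for large or small numbers.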

One neat aspect of curve_fit is that it yields the uncertainties of the best-fit parameters.

I will not go into the details, so you can blindly follow the method below.

In [None]:
# Calculate the standard deviation of the parameters
std_dev = np.sqrt(np.diag(covarianceMatrix))
print("Show the standard deviation (A.K.A. error) of the two fitting parameters, A and k")
print(std_dev)
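
To see what the line with np.sqrt and np.diag does, here is a small sketch with a made-up covariance matrix. The numbers are chosen for illustration and do not come from the fit.

```python
import numpy as np

# Made-up covariance matrix for two parameters (illustration only).
cov = np.array([[0.04, 0.001],
                [0.001, 0.0009]])

variances = np.diag(cov)         # diagonal entries: the variance of each parameter
example_std = np.sqrt(variances) # standard deviations: 0.2 and 0.03
print(example_std)
```

The diagonal of the covariance matrix holds the variance of each parameter, so its square root gives the standard deviations in the same order as the parameters.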

In [None]:
A_std_dev, k_std_dev = std_dev

print(f'A is equal to {ABest:.2f} +/- {A_std_dev:.2f} whereas k is equal to {kBest:.2f} +/- {k_std_dev:.2f}')

In theory, we are done, but it is neat to show the graph of the data with the best fit line.

Let's create a best-fit line with 100 points ranging from xmin = 0 to xmax = 9.

In [None]:
xLine = np.linspace(0,9,100)
#print(xLine)

# now we need the y values, with the optimal A and k parameters.
# We use the broadcasting properties for the x values, which will return an array of y values.
yLine= myFit(xLine,ABest,kBest)
#print(yLine)

Let's show the two plots on the same graph.

In [None]:
plt.plot(x,y,'r+')
plt.plot(xLine,yLine)
plt.show()

At this point, you should be able to create your own fit function and use it for your own purposes. You can always refer back to this notebook when you need to.
