# Model Fitting


Model fitting is one of the most common tasks in scientific computing; Numerical Recipes devotes several chapters to it. Python packages that can be used for this include:

* scipy "curve_fit"
* astropy.modeling
* lmfit (emcee)
* pyspeckit

We will also need to read data. We will use pickle (-01) and an ASCII file reader (-02).

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import math
import matplotlib.mlab as mlab


# pickle
You may need to revisit n6503-case1, reset **peakpos** in **In[18]**, and execute **In[20]** and **In[19]** to get the spectrum file.

In [None]:
!ls -l n6503-*.p


In [None]:
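try:
    import cPickle as pickle    # Python 2
except ImportError:
    import pickle               # Python 3

# load the spectrum saved earlier; the with-block closes the file again
with open("n6503-sp.p", "rb") as f:
    sp = pickle.load(f)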
# print(sp.keys())
velz = sp['z']
flux = sp['i']
plt.plot(velz,flux)
plt.xlabel(sp['zunit'])
plt.ylabel(sp['iunit'])
plt.title("Pos: %s %s" % (str(sp['xpos']),str(sp['ypos'])))
#  plt.title("Pos: %d %d" % (sp['xpos'],sp['ypos']))

In [None]:
# compute moments of this spectrum to get an idea what the mean and dispersion of this signal is
# recall this method can easily result in zdisp < 0 for noisy data
tmp1 = flux*velz 
tmp2 = flux*velz*velz
zmean = tmp1.sum()/flux.sum()
zdisp = tmp2.sum()/flux.sum() - zmean*zmean

print("mean,var:",zmean,zdisp)
if zdisp > 0:
    sigma = math.sqrt(zdisp)
    print("sigma,FWHM:",sigma,sigma*2.355)

Note the conversion factor from $\sigma$ to FWHM is $2\sqrt{2\ln{2}} \approx 2.355$, see also
https://en.wikipedia.org/wiki/Full_width_at_half_maximum
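As a sanity check on both the moment method and the conversion factor, here is a minimal sketch on a noise-free synthetic Gaussian. The values of `mu_true` and `sigma_true` are made up for illustration and have nothing to do with the n6503 spectrum. For clean data the intensity-weighted moments recover the mean and dispersion, and $\sigma \times 2\sqrt{2\ln 2}$ agrees with the FWHM measured directly from the half-maximum crossings (to within one grid step).

```python
import numpy as np

# FWHM / sigma for a Gaussian: 2*sqrt(2*ln 2)
fwhm_factor = 2.0 * np.sqrt(2.0 * np.log(2.0))

# synthetic, noise-free Gaussian (illustrative values, not the n6503 data)
x = np.linspace(-100.0, 100.0, 2001)
mu_true, sigma_true = 5.0, 12.0
y = np.exp(-(x - mu_true)**2 / (2.0 * sigma_true**2))

# intensity-weighted moments, computed the same way as for the spectrum
mean = (y * x).sum() / y.sum()
var = (y * x * x).sum() / y.sum() - mean**2
sigma = np.sqrt(var)

# measure the FWHM directly: extent of the points at or above half maximum
above = x[y >= 0.5 * y.max()]
fwhm_measured = above[-1] - above[0]

print("factor                   :", fwhm_factor)
print("mean, sigma              :", mean, sigma)
print("FWHM from sigma, measured:", sigma * fwhm_factor, fwhm_measured)
```

With noise added, the same sums pick up contributions from channels far from the line, which is why the variance can come out negative for real spectra.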

In [None]:
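# noisy spectra can easily result in bogus values for mean and dispersion. Let's try something else:
# take the moments only over a small window of 2*nn+1 channels centered on the peak
imax = flux.argmax()
print("Max at %d: %g %g" % (imax, velz[imax], flux[imax]))
nn = 3
i0 = max(imax-nn, 0)            # avoid a negative start index wrapping around
flux1 = flux[i0:imax+nn+1]      # +1 so the window is symmetric around imax
velz1 = velz[i0:imax+nn+1]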

tmp1 = flux1*velz1 
tmp2 = flux1*velz1*velz1
zmean1 = tmp1.sum()/flux1.sum()
zdisp1 = tmp2.sum()/flux1.sum() - zmean1*zmean1

print("mean,var:",zmean1,zdisp1)
if zdisp1 > 0:
    sigma1 = math.sqrt(zdisp1)
    print("sigma,FWHM:",sigma1,sigma1*2.355)

The [scipy](https://docs.scipy.org/doc/scipy/reference/) module has a large number of optimization and fitting routines that work with numpy arrays. Many of them wrap compiled C or Fortran routines, which generally makes fitting a fast process.

To fit an actual Gaussian, instead of using moments, we use the [curve_fit](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html#scipy.optimize.curve_fit)
function in [scipy.optimize](https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html):


In [None]:
from scipy.optimize import curve_fit

def gauss(x, *p):
    # if len(p) != 3: raise ValueError("Error, found %d, (%s), need 3" % (len(p),str(p)))
    A, mu, sigma = p
    return A*np.exp(-(x-mu)**2/(2.*sigma**2)) 

# p0 is the initial guess for the fitting coefficients (A, mu and sigma above, in that order)
# does it matter what the initial conditions are?
p0 = [0.01, 10, 10]
#p0 = [0.004, 24, 6]
#p0 = [0.007, 100, 22, 0]

coeff, cm = curve_fit(gauss, velz, flux, p0=p0)
flux_fit = gauss(velz, *coeff)

plt.plot(velz,flux,     label='Data')
plt.plot(velz,flux_fit, label='Fit')
plt.legend()

print("Fitted amp            :",coeff[0])
print("Fitted mean           :",coeff[1])
print("Fitted sigma and FWHM :",coeff[2], coeff[2]*2.355)
print("Covariance Matrix     :\n",cm)
# what are now the errors in the fitted values?
print("error amp:",math.sqrt(cm[0][0]))
print("error mean:",math.sqrt(cm[1][1]))
print("error sigma:",math.sqrt(cm[2][2]))

Q1: Extend the gauss model to include a baseline that is not 0.
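One possible approach to Q1, shown here as a sketch on synthetic data rather than on the n6503 spectrum: add a fourth parameter for a constant baseline and give `p0` four entries. The function name `gauss_base`, the parameter `B`, and all numerical values below are illustrative assumptions. The 1-sigma parameter errors are taken as the square roots of the diagonal of the covariance matrix returned by `curve_fit`.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss_base(x, A, mu, sigma, B):
    """Gaussian (amplitude A, mean mu, dispersion sigma) on a constant baseline B."""
    return A * np.exp(-(x - mu)**2 / (2.0 * sigma**2)) + B

# synthetic test spectrum (hypothetical values, not the n6503 data)
rng = np.random.default_rng(123)
x = np.linspace(-100.0, 200.0, 150)
truth = (0.007, 30.0, 20.0, 0.001)
y = gauss_base(x, *truth) + rng.normal(0.0, 0.0003, x.size)

# the initial guess now needs four entries: A, mu, sigma, B
p0 = [0.005, 20.0, 15.0, 0.0]
coeff, cm = curve_fit(gauss_base, x, y, p0=p0)

# 1-sigma errors from the diagonal of the covariance matrix
errors = np.sqrt(np.diag(cm))

for name, value, err in zip(("amp", "mean", "sigma", "base"), coeff, errors):
    print("%-5s: %g +/- %g" % (name, value, err))
```

To try this on the real spectrum, replace `x` and `y` with `velz` and `flux` and pick a `p0` close to the moment estimates computed earlier.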

# PySpecKit

PySpecKit is an extensible spectroscopic analysis toolkit for astronomy. See
http://pyspeckit.bitbucket.org/html/sphinx/index.html

Installing this with
```
    pip install pyspeckit
```
resulted in an error 
```
    AttributeError: module 'distutils.config' has no attribute 'ConfigParser'
```
It turns out that a Python 2/3 compatibility fix was needed, which the released version of pyspeckit at the time did not include. A manual install of the development version solved this, although an update to pip may be needed as well:
```
    pip install --upgrade pip
    pip install https://bitbucket.org/pyspeckit/pyspeckit/get/master.tar.gz
```


### modify to run

The cell below is an adapted version of a Gaussian fit example from the PySpecKit manual. By default, it creates synthetic data with known parameters and added noise. Copy the cell and change it to work with our spectrum from n6503. How does it compare to scipy's curve_fit? Note that this method can generate its own initial guesses, which helps make the Gaussian fit more robust.

In [None]:
import pyspeckit

# set up a gauss (amp,center,sigma)
xaxis = np.linspace(-50.0,150.0,100)
amp = 1.0
sigma = 10.0
center = 50.0
synth_data = amp*np.exp(-(xaxis-center)**2/(sigma**2 * 2.))

# Add 10% noise (but fix the random seed)
np.random.seed(123)
stddev = 0.1
noise = np.random.randn(xaxis.size)*stddev
error = stddev*np.ones_like(synth_data)
data = noise+synth_data

# this will give a "blank header" warning, which is fine
sp = pyspeckit.Spectrum(data=data, error=error, xarr=xaxis,
                        xarrkwargs={'unit':'km/s'},
                        unit='erg/s/cm^2/AA')

sp.plotter()

# Fit with automatic guesses
sp.specfit(fittype='gaussian')

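# Fit with input guesses
# The guesses initialize the fitter
# This approach uses the peak value, the 1st moment, and the 0th moment (integrated flux)
amplitude_guess = data.max()
center_guess = (data*xaxis).sum()/data.sum()
# integral = amp*sigma*sqrt(2*pi); multiply the sum by the channel width to get sigma in km/s
channel_width = xaxis[1] - xaxis[0]
width_guess = data.sum() * channel_width / amplitude_guess / np.sqrt(2*np.pi)
guesses = [amplitude_guess, center_guess, width_guess]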
sp.specfit(fittype='gaussian', guesses=guesses)

sp.plotter(errstyle='fill')
sp.specfit.plot_fit()
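The input guesses rest on a property of the Gaussian: the integral of $A e^{-(x-\mu)^2/(2\sigma^2)}$ is $A\sigma\sqrt{2\pi}$, so the integrated flux divided by the peak and by $\sqrt{2\pi}$ estimates $\sigma$. The sketch below checks this with numpy only, on a noise-free version of the synthetic line. The name `dx` and the assumption of an evenly spaced axis are mine. Note that the sum over channels has to be multiplied by the channel width to give a width in axis units; without it the estimate is in channels.

```python
import numpy as np

# same synthetic line as in the pyspeckit cell, but without noise
xaxis = np.linspace(-50.0, 150.0, 100)
amp, center, sigma = 1.0, 50.0, 10.0
data = amp * np.exp(-(xaxis - center)**2 / (2.0 * sigma**2))

dx = xaxis[1] - xaxis[0]    # channel width, assumes an evenly spaced axis

amplitude_guess = data.max()
center_guess = (data * xaxis).sum() / data.sum()

# integral of a Gaussian is A*sigma*sqrt(2*pi); approximate it by sum*dx
width_guess = data.sum() * dx / (amplitude_guess * np.sqrt(2.0 * np.pi))

# the same estimate without dx comes out in units of channels
width_channels = data.sum() / (amplitude_guess * np.sqrt(2.0 * np.pi))

print("amplitude guess     :", amplitude_guess)
print("center guess        :", center_guess)
print("width guess (km/s)  :", width_guess)
print("width guess (chan)  :", width_channels)
```

These are only starting values for the fitter, so a rough estimate is usually enough, but for noisy data the moment sums become unreliable in the same way as shown earlier in this notebook.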