# Model comparison / model selection
## using the Akaike Information Criterion and Bayesian Information Criterion

In [None]:
# import all packages we will need in this notebook
import pandas
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize  # load the optimize submodule explicitly; we call scipy.optimize.minimize() below

In [None]:
# read in some data we created for this example (.dat is a generic file extension; it's just a text file)
data_filename='https://raw.githubusercontent.com/uofscphysics/STEM_Python_Course/Summer2020/02_Week2/Data/1D_intro_examples.dat'
# the file is comma-separated (sep=',') and its first line contains the names of the columns (header=0)
example_data_1D = pandas.read_csv(data_filename, sep=',', header=0)
print(example_data_1D.head())

In [None]:
#Let's plot the data, with error bars, that we read from file (See Day 2)
plt.errorbar(example_data_1D['x'], #x,y,and error are the column names
             example_data_1D['y'], 
             yerr=example_data_1D['error'],#yerr denotes an error in the y-direction for plotting
             fmt='.') #fmt is "format", saying that I want data marked by "points"
plt.xlabel('Days since I left the honey jar out') #set the x-axis label 
plt.ylabel('Number of ants') #set the y-axis label
plt.show()

In [None]:
#The data were generated with a simple quadratic equation:
#ax^2+bx+c. 

# FILL IN :  define three models, a quadratic, cubic and exponential

# def modelA_quadratic
#    """A quadratic in x (this happens to be the true model)"""

#def modelB_cubic
#    """A third-order polynomial model."""
    
#def modelC_exponential
#    """An exponential model"""
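One possible way to fill in the three models is sketched below. The call signature `(x, theta)` and the parameter ordering inside `theta` are assumptions made for this sketch, as is the particular exponential form `a * exp(b * x)`; the notebook does not prescribe them.

```python
import numpy as np

def modelA_quadratic(x, theta):
    """A quadratic in x: a*x**2 + b*x + c, with theta = [a, b, c]."""
    a, b, c = theta
    return a * x**2 + b * x + c

def modelB_cubic(x, theta):
    """A third-order polynomial, with theta = [a, b, c, d]."""
    a, b, c, d = theta
    return a * x**3 + b * x**2 + c * x + d

def modelC_exponential(x, theta):
    """An exponential a*exp(b*x), with theta = [a, b] (assumed form)."""
    a, b = theta
    return a * np.exp(b * x)

# Quick check on a few made-up x values
x = np.array([0.0, 1.0, 2.0])
print(modelA_quadratic(x, [1.0, 2.0, 3.0]))  # values 3, 6, 11
```

Note that the three models have 3, 4, and 2 free parameters respectively; that count is the `k` that enters the AIC and BIC later on.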



In [None]:
# For Model A define a function that returns 
# the negative log likelihood:  -ln( p(data|theta,Model) )

#def neg_ln_likelihoodA_quadratic(theta, args):
#    """ This function accepts an argument "theta", which is 
#    a list of parameter values [a,b,c] for model A.
#    It then calculates the negative log-likelihood by computing the 
#    chi-squared statistic (i.e., assuming Gaussian uncertainties), 
#    which compares the observations and errors (provided in args) 
#    to model A.
#    """   

# FILL IN THE FUNCTION DEFINITION HERE
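Here is a minimal sketch of what such a function could look like for Model A. It assumes that `args` is a tuple `(x, y, err)` and that the uncertainties are independent and Gaussian; both are assumptions of this sketch. The normalization term is kept so that the returned value is the full negative log likelihood rather than just chi-squared / 2.

```python
import numpy as np

def modelA_quadratic(x, theta):
    """A quadratic in x: a*x**2 + b*x + c, with theta = [a, b, c]."""
    a, b, c = theta
    return a * x**2 + b * x + c

def neg_ln_likelihoodA_quadratic(theta, args):
    """Negative log likelihood of the data given theta for Model A.

    args is assumed to be a tuple (x, y, err) of equal-length arrays.
    """
    x, y, err = args
    # chi-squared: squared residuals in units of the measurement error
    chi2 = np.sum(((y - modelA_quadratic(x, theta)) / err) ** 2)
    # Gaussian normalization: sum over points of ln( sqrt(2 pi) * sigma_i )
    norm = np.sum(np.log(np.sqrt(2.0 * np.pi) * err))
    return 0.5 * chi2 + norm

# Made-up data lying exactly on the model, so chi2 = 0 at the true parameters
x = np.array([0.0, 1.0, 2.0])
y = np.array([3.0, 6.0, 11.0])
err = np.ones_like(x)
print(neg_ln_likelihoodA_quadratic([1.0, 2.0, 3.0], (x, y, err)))
```

Because the normalization term does not depend on `theta`, minimizing this function is the same as minimizing chi-squared. The term does matter for the absolute value of ln(L), though it cancels when AIC or BIC values of models fitted to the same data are differenced.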

In [None]:
# Repeat for Model B
 
# def neg_ln_likelihoodB_cubic(theta, args):


In [None]:
# Repeat for Model C
# def neg_ln_likelihoodC_exponential(theta, args):


Let's see what these models look like, using some parameters that are in the right ballpark.

In [None]:
# Make a function that plots the data (as above) and 
# all three models (given some single set of parameters for each)


#def plot_three_models(thetaA, thetaB, thetaC):


In [None]:
# Show us a plot : 



In [None]:
# Use the scipy.optimize.minimize function to
# do a maximum likelihood estimation 
# (equivalent to chi2 minimization) 
# to find the best parameters for each model

# FILL IN THE ARGUMENTS TO EACH CALL OF MINIMIZE()

# maxlike_resultA = scipy.optimize.minimize( )

# maxlike_resultB = scipy.optimize.minimize( )

# maxlike_resultC = scipy.optimize.minimize( )
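As a rough illustration of how one of these calls might be filled in, the sketch below fits Model A. To stay self-contained it uses synthetic stand-in data (not the course data file); the true parameters, starting guess `x0`, and the name `data_args` are all made up for this example. Since the likelihood function takes a single `args` parameter, the data tuple is wrapped in a one-element tuple when passed to `minimize()`.

```python
import numpy as np
import scipy.optimize

def modelA_quadratic(x, theta):
    """A quadratic in x: a*x**2 + b*x + c, with theta = [a, b, c]."""
    a, b, c = theta
    return a * x**2 + b * x + c

def neg_ln_likelihoodA_quadratic(theta, args):
    """Gaussian negative log likelihood; args is assumed to be (x, y, err)."""
    x, y, err = args
    chi2 = np.sum(((y - modelA_quadratic(x, theta)) / err) ** 2)
    return 0.5 * chi2 + np.sum(np.log(np.sqrt(2.0 * np.pi) * err))

# Synthetic stand-in data (NOT the course data file)
rng = np.random.default_rng(0)
true_theta = [2.0, -3.0, 5.0]          # made-up "true" parameters
x = np.linspace(0.0, 10.0, 20)
err = np.ones_like(x)                   # unit error bars on every point
y = modelA_quadratic(x, true_theta) + rng.normal(0.0, err)

data_args = (x, y, err)
maxlike_resultA = scipy.optimize.minimize(
    neg_ln_likelihoodA_quadratic,       # function to minimize
    x0=[1.0, 1.0, 1.0],                 # starting guess for [a, b, c]
    args=(data_args,),                  # passed to the function as its "args"
)

print(maxlike_resultA.x)                # best-fit parameters
print(maxlike_resultA.fun)              # minimum of -ln(L)
max_ln_likelihoodA = -maxlike_resultA.fun
```

Models B and C would follow the same pattern, each with its own likelihood function and a starting guess of the right length (4 and 2 parameters respectively, if the models are parameterized as a cubic polynomial and a two-parameter exponential).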


In [None]:
# Print the results (the best-fit parameters and the minimum of -ln(L) for each model)

In [None]:
# Use the plotting function defined above to show 
# a plot of the three models with their 'best-fit' parameters


# Model comparison :  the AIC and BIC 

The Akaike information criterion is defined as:

### AIC = 2 k - 2 ln(L)

where L is the maximum value of the likelihood and k is the number of free parameters in the model. The AIC balances a model's ability to fit the data (measured by L) against the number of parameters it requires.  A smaller value of the AIC indicates a better model (i.e., one that matches the data well without being unnecessarily complex).

The Bayesian information criterion is very similar. It replaces the 2 in the first term with ln(n), where n is the number of data points.  Since ln(n) > 2 for n of 8 or more, the BIC penalizes complexity more heavily than the AIC for all but the smallest samples, and the penalty per parameter grows with the sample size.  As with the AIC, smaller is better.

### BIC = k ln( n ) - 2 ln( L )

These two metrics are the most commonly used, but many others exist, with subtle differences in their properties.  Take care to choose a criterion that suits the data, the models, and the problem at hand.
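The two definitions translate directly into a small helper. This is only a sketch: the function name `compute_aic_bic` and its argument names are made up here, and the numbers in the example call are illustrative rather than taken from any fit.

```python
import math

def compute_aic_bic(max_ln_likelihood, n_params, n_data):
    """Return (AIC, BIC) given ln(L) at its maximum, k parameters, n data points."""
    aic = 2.0 * n_params - 2.0 * max_ln_likelihood
    bic = n_params * math.log(n_data) - 2.0 * max_ln_likelihood
    return aic, bic

# Illustrative numbers only: ln(L) = -10, k = 3 parameters, n = 20 points
aic, bic = compute_aic_bic(max_ln_likelihood=-10.0, n_params=3, n_data=20)
print(aic, bic)
```

Note that the function expects the maximum of ln(L), not the minimum of -ln(L) that the optimizer reports, so the sign must be flipped before calling it.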

In [None]:
# Let us define a function that computes the AIC and BIC for each of our three models



In [None]:
# Compute each max likelihood value and report.  

# Remember that we have found the minimum
# of the negative log likelihood for each
# function. This is reported as the 'fun'
# entry in our set of results from the 
# scipy.optimize.minimize() function calls.

# The negative of that minimum is our maximum log likelihood.



### Compute the AIC and BIC for each model

In [None]:
# Collect them into 3-element lists for convenience


In [None]:
# Make a nice pandas DataFrame table 


In [None]:
# Compute the AIC and BIC weights :  e^(-0.5*delta), where delta = AIC_i - AIC_min (and likewise for the BIC)


# Compute the AIC / BIC odds ratios :  weight_max / weight_i
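A minimal sketch of the weight and odds-ratio calculation follows. The helper name `information_criterion_weights` and the three AIC values are made up for illustration. The sketch also normalizes the weights so they sum to 1, which is a common convention but an assumption here; the odds ratios are the same with or without that normalization.

```python
import numpy as np

def information_criterion_weights(ic_values):
    """Turn a list of AIC (or BIC) values into normalized weights."""
    ic = np.asarray(ic_values, dtype=float)
    delta = ic - ic.min()                 # difference from the best (smallest) value
    relative = np.exp(-0.5 * delta)       # relative likelihood of each model
    return relative / relative.sum()      # normalize so the weights sum to 1

# Made-up AIC values for three hypothetical models
aic_values = [100.0, 102.0, 110.0]
aic_weights = information_criterion_weights(aic_values)

# Odds ratio of the best model against each model: weight_max / weight_i
aic_odds = aic_weights.max() / aic_weights

print(aic_weights)
print(aic_odds)
```

With these numbers the best model is favored over the second by a factor of e^1 and over the third by a factor of e^5, since the odds ratio reduces to e^(0.5*delta).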


In [None]:
# print the updated pandas table


### next topic :  using the Bayesian evidence (Bayes factors) to compare models considering the entire parameter space

#### Reading list

A good, broad book on Bayesian data analysis:
* Sivia, D. and Skilling, J. "Data Analysis: A Bayesian Tutorial"
https://books.google.com/books/about/Data_Analysis.html?id=lYMSDAAAQBAJ

Some summary papers: 

* Wagenmakers and Farrell 2004
https://link.springer.com/content/pdf/10.3758/BF03206482.pdf

* Symonds and Moussalli 2010
http://byrneslab.net/classes/biol607/readings/Symonds_and_Moussalli_2010_behav_ecol.pdf