## Notes on Minnow

This notebook collects notes on how the pulsar timing code [minnow](https://github.com/meyers-academic/minnow/tree/main) works, based on helpful input from Pat Meyers. Some components of `minnow` will be essential if we are to run the Kalman state-space algorithms on real PTA data with `Argus`.

For all that follows we will assume that there is just a single pulsar, $N_{\rm psr} =1$. The extension to general $N_{\rm psr}$ should be straightforward. 

For the purposes of running state-space algorithms, we need to define the following:

* $\boldsymbol{X}$ : the state vector, dimension $n_X$
* $\boldsymbol{Y}$ : the observation vector, dimension $n_Y$
* $\boldsymbol{F}$ : the state transition matrix, dimension $n_X \times n_X$
* $\boldsymbol{H}$ : the measurement matrix, dimension $n_Y \times n_X$
* $\boldsymbol{Q}$ : the process noise covariance matrix, dimension $n_X \times n_X$
* $\boldsymbol{R}$ : the measurement noise covariance matrix, dimension $n_Y \times n_Y$
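These quantities enter the standard linear Kalman filter recursion. As a hedged sketch (the textbook filter written with NumPy, not necessarily how `Argus` implements it), one predict/update cycle with state transition `F`, measurement matrix `H`, process noise covariance `Q` and measurement noise covariance `R` looks like this:

```python
import numpy as np


def kalman_predict(x, P, F, Q):
    """Propagate the state mean x (n_X) and covariance P (n_X x n_X) one step."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


def kalman_update(x_pred, P_pred, y, H, R):
    """Condition the predicted state on the observation vector y (n_Y)."""
    innovation = y - H @ x_pred           # data minus predicted measurement
    S = H @ P_pred @ H.T + R              # innovation covariance, n_Y x n_Y
    K = np.linalg.solve(S, H @ P_pred).T  # Kalman gain P H^T S^{-1}, n_X x n_Y
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new


# Toy example: one random-walk state seen in two frequency bands.
x0, P0 = np.zeros(1), np.eye(1)
F, Q = np.eye(1), 0.1 * np.eye(1)
H, R = np.ones((2, 1)), 0.5 * np.eye(2)
x_pred, P_pred = kalman_predict(x0, P0, F, Q)
x_new, P_new = kalman_update(x_pred, P_pred, np.array([1.0, 1.0]), H, R)
```

Solving with `S` rather than forming its inverse explicitly is the usual numerically safer choice; the covariance update uses the simple form rather than the Joseph form.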

### 1. Observations

The 'raw' pulsar data comes in the form of a `.toa` file and a `.par` file.

The `.toa` file gives the pulse times of arrival (TOAs), while the `.par` file gives the best estimates of pulsar parameters such as the spin frequency and sky position. Note that these parameters are very well known a priori.

The `.toa` and `.par` files are passed through timing software such as `TEMPO` or `PINT` to produce **timing residuals**: the differences between the measured TOAs and the arrival times predicted by the timing model.
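To make the definition concrete, here is a toy version of the computation, assuming a pure spin-down model with spin frequency `f0` and spin-down rate `f1`, and ignoring every delay term (Roemer, Shapiro, dispersion, binary) that `TEMPO` and `PINT` actually model. The residual is the offset of each TOA from the nearest predicted pulse, converted to seconds. The function name and parameters are hypothetical.

```python
import numpy as np


def toy_timing_residuals(toas, f0, f1=0.0, t0=0.0):
    """Timing residuals (s) for a pure spin-down phase model.

    Residuals are only defined modulo one pulse period: a TOA offset by
    more than half a period wraps around to the neighbouring pulse.
    """
    dt = np.asarray(toas, dtype=float) - t0
    phase = f0 * dt + 0.5 * f1 * dt**2   # predicted pulse phase in cycles
    frac = phase - np.round(phase)       # offset from the nearest whole pulse
    return frac / f0                     # convert cycles to seconds


# Pulses of a 100 Hz pulsar, two of them arriving slightly early or late.
res = toy_timing_residuals(np.array([0.0, 1.0001, 1.9998, 3.0]), f0=100.0)
```

Real timing packages evaluate the phase in extended precision (`PINT` uses `numpy.longdouble`), because float64 rounding of the phase of a millisecond pulsar over realistic time spans is not negligible compared with the TOA uncertainties.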

**Question 1.1: I am not exactly clear on how different observation frequency bands come into play here, but I don't think it is a problem for us. One can imagine getting multiple `.toa` files at different observation frequencies, and producing different timing residuals for each**.

For now I will use the heuristic that a single data file holds the TOAs and timing residuals at all frequencies. This seems to be how the `.feather` data files in `minnow` are structured; see e.g. the code block below.


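```python
import pyarrow.feather as feather

# Load the per-TOA table for PSR B1855+09 as a pandas DataFrame. Each row is
# one TOA, with its uncertainty, timing residual, observing frequency,
# backend, design-matrix columns (Mmat_*) and assorted flags.
df = feather.read_feather('../data/data_from_minnow/v1p1_de440_pint_bipm2019-B1855+09.feather')
df
```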


| | toas | stoas | toaerrs | residuals | freqs | backend_flags | Mmat_0 | Mmat_1 | Mmat_2 | Mmat_3 | ... | flags_wt | flags_flux | flags_fluxe | flags_proc | flags_pta | flags_ver | flags_to | flags_pout_gibbs | flags_clkcorr | flags_simul |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.610194e+09 | 4.610194e+09 | 1.212000e-06 | -1.570598e-06 | 1386.037543 | L-wide_ASP | 0.005362 | 1.318638e+06 | -1.621386e+14 | -1.0 | ... | 2.6903 | 3.25644 | 0.021 | 15y | NANOGrav | 2021.08.25-9d8d617 | -0.839e-6 | 0.012491908440860854 | 2.630754008821355e-05 | |
| 1 | 4.610194e+09 | 4.610194e+09 | 1.017600e-05 | -1.715160e-06 | 1410.038193 | L-wide_ASP | 0.005362 | 1.318638e+06 | -1.621386e+14 | -1.0 | ... | 1.086 | 1.02708 | 0.053 | 15y | NANOGrav | 2021.08.25-9d8d617 | -0.839e-6 | 0.012905991732230643 | 2.6307540088161908e-05 | |
| 2 | 4.610194e+09 | 4.610194e+09 | 1.705000e-06 | -8.360033e-07 | 1422.038518 | L-wide_ASP | 0.005362 | 1.318638e+06 | -1.621386e+14 | -1.0 | ... | 2.5761 | 2.44634 | 0.023 | 15y | NANOGrav | 2021.08.25-9d8d617 | -0.839e-6 | 0.012457192575780427 | 2.630754008813714e-05 | |
| 3 | 4.610194e+09 | 4.610194e+09 | 9.520000e-07 | -5.140002e-07 | 1414.038302 | L-wide_ASP | 0.005362 | 1.318638e+06 | -1.621386e+14 | -1.0 | ... | 2.8314 | 4.10634 | 0.022 | 15y | NANOGrav | 2021.08.25-9d8d617 | -0.839e-6 | 0.01234654947538391 | 2.630754008815364e-05 | |
| 4 | 4.610194e+09 | 4.610194e+09 | 1.968000e-06 | -2.606156e-08 | 1402.037977 | L-wide_ASP | 0.005362 | 1.318638e+06 | -1.621386e+14 | -1.0 | ... | 2.6525 | 2.10044 | 0.022 | 15y | NANOGrav | 2021.08.25-9d8d617 | -0.839e-6 | 0.01288848991589095 | 2.6307540088178913e-05 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7753 | 5.102021e+09 | 5.102021e+09 | 2.059000e-06 | 8.940844e-07 | 1722.979881 | L-wide_PUPPI | 0.005362 | -1.318589e+06 | -1.621266e+14 | -1.0 | ... | 15.068 | 0.673438 | 0.0079 | 15y | NANOGrav | 2021.08.25-9d8d617 | | 0.01296979848578532 | 2.776931610305224e-05 | |
| 7754 | 5.102021e+09 | 5.102021e+09 | 6.560000e-07 | 1.781750e-06 | 1686.731252 | L-wide_PUPPI | 0.005362 | -1.318589e+06 | -1.621266e+14 | -1.0 | ... | 11.499 | 3.57581 | 0.013 | 15y | NANOGrav | 2021.08.25-9d8d617 | | 0.012244980649401821 | 2.776931610308031e-05 | |
| 7755 | 5.102021e+09 | 5.102021e+09 | 1.982000e-06 | 2.220510e-06 | 1674.821045 | L-wide_PUPPI | 0.005362 | -1.318589e+06 | -1.621266e+14 | -1.0 | ... | 10.409 | 1.12112 | 0.013 | 15y | NANOGrav | 2021.08.25-9d8d617 | | 0.012503069874448748 | 2.7769316103089936e-05 | |
| 7756 | 5.102021e+09 | 5.102021e+09 | 6.450000e-07 | 1.934482e-06 | 1749.682344 | L-wide_PUPPI | 0.005362 | -1.318589e+06 | -1.621266e+14 | -1.0 | ... | 8.843 | 4.59824 | 0.017 | 15y | NANOGrav | 2021.08.25-9d8d617 | | 0.012712839123538945 | 2.7769316103032728e-05 | |
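Reading the table, the columns appear to have the following units (inferred from the values, not from `minnow` documentation): `toas` in seconds on an MJD-based scale (4.610194e+09 s / 86400 ≈ MJD 53359, i.e. late 2004; 5.102021e+09 s ≈ MJD 59051, mid-2020), `residuals` and `toaerrs` in seconds at the microsecond level, and `freqs` in MHz (1386 to 1750 MHz in the rows shown, i.e. L band). A small hypothetical helper, assuming those units, turns the columns into a quick summary:

```python
import numpy as np

SECONDS_PER_DAY = 86400.0
DAYS_PER_YEAR = 365.25


def summarize_timing_data(toas_s, residuals_s, toaerrs_s):
    """Quick summary of one pulsar's data, assuming every input is in seconds."""
    toas_mjd = np.asarray(toas_s, dtype=float) / SECONDS_PER_DAY  # seconds -> MJD
    r = np.asarray(residuals_s, dtype=float)
    w = 1.0 / np.asarray(toaerrs_s, dtype=float) ** 2              # inverse-variance weights
    wrms = np.sqrt(np.sum(w * r**2) / np.sum(w))                   # weighted RMS about zero
    return {
        "first_mjd": toas_mjd.min(),
        "last_mjd": toas_mjd.max(),
        "span_yr": (toas_mjd.max() - toas_mjd.min()) / DAYS_PER_YEAR,
        "wrms_us": wrms * 1e6,                                     # seconds -> microseconds
    }
```

Applied to the DataFrame above this would be `summarize_timing_data(df["toas"], df["residuals"], df["toaerrs"])`.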


The observation vector $\boldsymbol{Y}$ is a vector of timing residuals $\delta t$, where the dimension $n_Y$ is set by the number of observation frequency bands:

$$\boldsymbol{Y} = \left[\delta t(\nu_1), \delta t(\nu_2), \dots, \delta t(\nu_{n_Y})\right]^{\top}$$
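One hedged way to turn the flat per-TOA table into a time series of $\boldsymbol{Y}$ vectors is sketched below. The choices are assumptions, not what `minnow` does: TOAs closer together than a hypothetical `epoch_gap` form one epoch (one Kalman time step); bands are set by hypothetical `band_edges` in MHz, so that $n_Y$ is `len(band_edges) - 1`; several TOAs falling in the same epoch and band are merged by an inverse-variance weighted mean; and empty cells are NaN.

```python
import numpy as np


def build_observation_vectors(toas, residuals, toaerrs, freqs, band_edges, epoch_gap):
    """Return (epoch_times, Y, sigma); Y and sigma have shape (n_epochs, n_Y).

    Hypothetical helper: epochs are runs of sorted TOAs whose spacing is at
    most epoch_gap; bands come from binning freqs with band_edges.
    """
    order = np.argsort(toas)
    t, r, e, f = (np.asarray(a, dtype=float)[order]
                  for a in (toas, residuals, toaerrs, freqs))
    # A new epoch starts wherever the gap to the previous TOA exceeds epoch_gap.
    epoch_id = np.concatenate([[0], np.cumsum(np.diff(t) > epoch_gap)])
    band_id = np.digitize(f, band_edges) - 1       # 0 .. n_Y-1 inside the edges
    n_epochs, n_Y = epoch_id[-1] + 1, len(band_edges) - 1
    inside = (band_id >= 0) & (band_id < n_Y)      # drop TOAs outside all bands
    w = 1.0 / e**2                                 # inverse-variance weights
    wsum = np.zeros((n_epochs, n_Y))
    wrsum = np.zeros((n_epochs, n_Y))
    np.add.at(wsum, (epoch_id[inside], band_id[inside]), w[inside])
    np.add.at(wrsum, (epoch_id[inside], band_id[inside]), (w * r)[inside])
    with np.errstate(invalid="ignore", divide="ignore"):
        Y = np.where(wsum > 0, wrsum / wsum, np.nan)            # weighted mean residual
        sigma = np.where(wsum > 0, 1.0 / np.sqrt(wsum), np.nan)  # its uncertainty
    epoch_times = np.array([t[epoch_id == k].mean() for k in range(n_epochs)])
    return epoch_times, Y, sigma
```

For the B1855+09 file one might try, e.g., `band_edges = [1000.0, 1500.0, 2000.0]` and `epoch_gap = 86400.0` (one day); both values are illustrative only. Weighted averaging discards sub-band structure; keeping every TOA as its own component of $\boldsymbol{Y}$ would be the alternative.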

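Note that at a given timestep there may be missing data in a particular frequency band. The Kalman filter handles this naturally, since the update step only needs the components of $\boldsymbol{Y}$ that were actually observed: the corresponding rows of $\boldsymbol{H}$, and the corresponding rows and columns of the measurement noise covariance matrix, are simply dropped for that step. Similarly, different pulsars may be observed at different times, which the algorithm handles in the same way.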
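A common way to implement this, sketched here assuming missing components of $\boldsymbol{Y}$ are stored as NaN: keep only the observed entries of $\boldsymbol{Y}$, the matching rows of $\boldsymbol{H}$ and the matching rows and columns of the measurement noise covariance `R`, then run the usual update. The same masking covers the multi-pulsar case, where $\boldsymbol{Y}$ would stack residuals from several pulsars and only some are observed at a given step. The function name is hypothetical.

```python
import numpy as np


def kalman_update_with_missing(x_pred, P_pred, y, H, R):
    """Kalman update using only the observed (non-NaN) entries of y.

    If nothing is observed at this step, the prediction is returned unchanged.
    """
    observed = ~np.isnan(y)
    if not observed.any():
        return x_pred, P_pred                    # no data: skip the update
    y_o = y[observed]
    H_o = H[observed, :]                         # rows of H for observed bands
    R_o = R[np.ix_(observed, observed)]          # matching block of R
    S = H_o @ P_pred @ H_o.T + R_o               # innovation covariance
    K = np.linalg.solve(S, H_o @ P_pred).T       # Kalman gain P H_o^T S^{-1}
    x_new = x_pred + K @ (y_o - H_o @ x_pred)
    P_new = P_pred - K @ H_o @ P_pred
    return x_new, P_new
```

Because the masking happens inside the update, $\boldsymbol{H}$ and `R` can be defined once for all $n_Y$ bands and reused at every step.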

### 2. Hidden states