# Comparing Spectra

This notebook demonstrates how you can use the χ² metric to compare spectra.

In [None]:
using NeXLSpectrum
using DataFrames, Gadfly, InvertedIndices

In [None]:
# Load the five K412 spectra (file indices 0 through 4) from the test directory
k412 = [ loadspectrum(joinpath(@__DIR__, "..","test","K412 spectra","III-E K412[$i][4].msa")) for i in 0:4 ]

In [None]:
# Build a detector model matching the first spectrum (132 eV resolution)
det = matching(k412[1], 132.0)

In [None]:
set_default_plot_size(8inch, 4inch)
plot(k412..., xmax=10.0e3)

I'll present two different ways to compare spectra.
  * Direct spectrum to spectrum comparison (`χ²(...)`)
  * Comparing a spectrum to the sum of the other spectra (`measure_dissimilarity(...)`).

First, `χ²`.  This metric is approximately unity when the spectra differ only by count statistics.
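To make the "approximately unity" claim concrete, here is a minimal sketch in plain Julia. It does not reproduce the NeXLSpectrum implementation: it assumes a reduced χ² of the form mean((a-b)²/(a+b)) for two spectra collected at equal dose, and the names `noisy` and `reduced_chi2` are hypothetical. Poisson counting noise is approximated by Gaussian noise with variance equal to the mean.

```julia
using Random, Statistics

# Hypothetical helper: approximate Poisson counting noise with a Gaussian whose
# variance equals the mean (reasonable when the mean counts are large).
noisy(rng, mu) = max(0.0, round(mu + sqrt(mu) * randn(rng)))

# Assumed form of a reduced χ² for two spectra collected at equal dose.
# Channels in which both spectra are empty are skipped.
function reduced_chi2(a, b)
    terms = [(x - y)^2 / (x + y) for (x, y) in zip(a, b) if x + y > 0]
    return mean(terms)
end

rng = MersenneTwister(1234)
# Synthetic "true" spectrum: flat background plus one Gaussian peak
truth = [1000.0 + 500.0 * exp(-0.5 * ((ch - 1000) / 20)^2) for ch in 1:2000]
s1 = [noisy(rng, mu) for mu in truth]
s2 = [noisy(rng, mu) for mu in truth]
s3 = [noisy(rng, 1.1 * mu) for mu in truth]  # 10 % more intense everywhere

same = reduced_chi2(s1, s2)   # spectra differ only by count statistics
diff = reduced_chi2(s1, s3)   # spectra differ by a real change in intensity
println((same, diff))
```

Under these assumptions `same` should land close to one, while `diff` should be clearly larger because the squared intensity difference exceeds what counting noise alone can explain.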

Overall, the spectra compare very well with one another.  The largest `χ²` value is 1.10 when comparing one spectrum to another over a large ROI.

In [None]:
fullroi = channel(100.0, k412[1]):channel(10.0e3, k412[1])
χ²(k412, fullroi)

However, individual peaks can compare less well.

In [None]:
χ²(k412, NeXLSpectrum.fwhmroi(k412[1], n"Si K-L3"))

In [None]:
χ²(k412, NeXLSpectrum.fwhmroi(k412[1], n"Fe K-L3"))

In [None]:
χ²(k412, NeXLSpectrum.fwhmroi(k412[1], n"O K-L3"))

In [None]:
χ²(k412, NeXLSpectrum.fwhmroi(k412[1], n"Mg K-L3"))

However, the `χ²` matrices can be hard to interpret.  Which spectrum is the "problem child"?   What we really want to know is how each spectrum compares with the mean of the others.

We want to retain the spectra that are most similar to the mean.  That is what `measure_dissimilarity(...)` is used for.
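The leave-one-out idea can be sketched in plain Julia as follows. This is only an illustration of comparing each spectrum with the sum of the others, not the `measure_dissimilarity(...)` implementation: the χ² form, the equal-dose scaling, and the names `scaled_chi2`, `leave_one_out` and `gauss_counts` are all assumptions made for this example.

```julia
using Random

# Assumed reduced χ² between spectrum `a` and a reference `b` that was
# collected with `k` times the dose of `a` (so `b / k` estimates `a`).
function scaled_chi2(a, b, k)
    total, n = 0.0, 0
    for (x, y) in zip(a, b)
        v = x + y / k^2              # estimated variance of x - y/k for Poisson counts
        v > 0 || continue            # skip channels with no counts at all
        total += (x - y / k)^2 / v
        n += 1
    end
    return total / n
end

# Compare each spectrum with the sum of all the others (leave-one-out).
function leave_one_out(specs)
    n = length(specs)
    return map(eachindex(specs)) do i
        others = sum(specs[j] for j in eachindex(specs) if j != i)
        scaled_chi2(specs[i], others, n - 1)
    end
end

rng = MersenneTwister(42)
# Gaussian stand-in for Poisson noise (variance equal to the mean)
gauss_counts(mu) = max(0.0, round(mu + sqrt(mu) * randn(rng)))

specs = [[gauss_counts(2000.0) for _ in 1:500] for _ in 1:5]
specs[3] = [gauss_counts(2300.0) for _ in 1:500]   # the "problem child"
scores = leave_one_out(specs)
println(round.(scores; digits=2))
```

In this synthetic case the third score stands out, which identifies the outlier directly instead of requiring a reading of a full pairwise matrix. Note that the outlier also contaminates the reference for the other spectra, so their scores rise somewhat above one as well.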

In [None]:
NeXLSpectrum.measure_dissimilarity(k412, det, n"O")

We expect a bit of variation in O since soft X-rays are quite susceptible to absorption and topography.

Let's remove spectra 1 and 4 and see what happens.

In [None]:
# The Boolean mask keeps spectra 2, 3 and 5 (drops 1 and 4)
NeXLSpectrum.measure_dissimilarity(k412[[false, true, true, false, true]], det, n"O")

As we increase the X-ray energy, the variability decreases.

In [None]:
NeXLSpectrum.measure_dissimilarity(k412, det, n"Mg"), NeXLSpectrum.measure_dissimilarity(k412[[true, false, true, true, false]], det, n"Mg")

In [None]:
NeXLSpectrum.measure_dissimilarity(k412, det, n"Al"), NeXLSpectrum.measure_dissimilarity(k412[[true, true, false, true, false]], det, n"Al")

In [None]:
NeXLSpectrum.measure_dissimilarity(k412, det, n"Si"), NeXLSpectrum.measure_dissimilarity(k412[[false, true, true, false, true]], det, n"Si")

In [None]:
NeXLSpectrum.measure_dissimilarity(k412, det, n"Ca"), NeXLSpectrum.measure_dissimilarity(k412[[true, true, true, true, false]], det, n"Ca")

In [None]:
NeXLSpectrum.measure_dissimilarity(k412, det, n"Fe"), NeXLSpectrum.measure_dissimilarity(k412[[true, true, true, false, true]], det, n"Fe")

Let's try applying these functions to spectra that we know should compare well because they are sub-samplings of the same source.

  * `subdivide(...)` takes a single spectrum and distributes its counts at random among N spectra, creating N spectra that sum to the original spectrum.
  * `subsample(...)` takes a single spectrum and emulates acquiring a fraction of the same live-time.  The results won't necessarily sum to the original.
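The difference between the two operations can be sketched on bare count vectors. The functions below are hypothetical stand-ins written for this illustration, not the NeXLSpectrum code; in particular, modelling `subsample(...)` as keeping each count independently with probability `f` is an assumption.

```julia
using Random

# Hypothetical stand-in for `subdivide`: assign every count in every channel
# to one of `n` spectra at random, so the pieces sum exactly to the original.
function subdivide_counts(rng, counts::Vector{Int}, n::Int)
    parts = [zeros(Int, length(counts)) for _ in 1:n]
    for (ch, c) in enumerate(counts), _ in 1:c
        parts[rand(rng, 1:n)][ch] += 1
    end
    return parts
end

# Hypothetical stand-in for `subsample`: keep each count independently with
# probability `f`, emulating a fraction `f` of the live-time.
function subsample_counts(rng, counts::Vector{Int}, f::Float64)
    return [count(_ -> rand(rng) < f, 1:c) for c in counts]
end

rng = MersenneTwister(7)
orig = [120, 340, 980, 410, 150]
parts = subdivide_counts(rng, orig, 8)
subs = [subsample_counts(rng, orig, 0.1) for _ in 1:8]

println(sum(parts) == orig)   # the pieces of a subdivision sum to the original
println(sum(subs))            # independent subsamples generally do not
```

The subdivided pieces are constrained to add up to the original, so they are not statistically independent of one another, whereas each subsample is drawn independently from the same parent spectrum.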

In [None]:
# Six independent subdivisions of spectrum 2 into 8 pieces each (48 spectra in total)
sd = mapreduce(_->subdivide(k412[2], 8), append!, 1:6)

In [None]:
describe(DataFrame(
    :Spectrum=>eachindex(sd),
    [ Symbol(symbol(elm))=>NeXLSpectrum.measure_dissimilarity(sd, det, elm) for elm in [n"O", n"Mg", n"Al", n"Si", n"Ca", n"Fe"] ]...
))

In [None]:
# 10 batches of 8 subsamples of spectrum 1, each a 0.1 fraction (80 spectra in total)
sd2 = mapreduce(_->map(i->subsample(k412[1], 0.1), 1:8), append!, 1:10)

In [None]:
describe(DataFrame(
    :Spectrum=>eachindex(sd2),
    [ Symbol(symbol(elm))=>NeXLSpectrum.measure_dissimilarity(sd2, det, elm) for elm in [n"O", n"Mg", n"Al", n"Si", n"Ca", n"Fe"] ]...
))

Interestingly, these are consistently slightly less than unity.  Why?