RMS quality control check for Non-LOFAR images #512

AntoniaR · 2016-05-31T08:43:13Z

The RMS noise check has proven to be the most powerful method for identifying bad images. This method works best by modelling the typical RMS noise in the dataset and then using a sigma threshold.

TraP now stores this number for all images, making it fairly straight forward to implement a check for non-LOFAR images. For LOFAR images, we use a comparison to the theoretical noise but this is not appropriate for non-LOFAR images.

Enable image rejection based on a maximum and minimum RMS value (could potentially be the first step in implementing, with the max and min values given in the pipeline configuration file)
In off-line mode, TraP measures the RMS values of all the images at the start of the pipeline run. Then it could fit these numbers with a Gaussian and conduct a +/- sigma clip, where the sigma value is given in the pipeline configuration file. The mean value and actual thresholds used will then need storing in the database and quoted on Banana.
In an on-line mode, this would not work, but maybe then only use step 1 with a future extension to dynamically model the distribution as images are inserted.
Future extension - enable a sky region specific threshold. But this would require the thresholds used for each sky region to be stored somewhere in the database.

As this is the most useful check, I have already written an off-line version of this strategy which works really well for MWA data and Yvette has used on AARTFAAC data. It produces an images_to_process.py file that can be read by TraP and a plot of the typical RMS noise values. I will upload the most recent version of this script to GitHub later today.

AntoniaR · 2016-05-31T10:11:29Z

I've just uploaded my standalone script which conducts this test here: https://github.com/transientskp/scripts/tree/master/TraP_fits_QC

gijzelaerr · 2016-06-23T10:54:48Z

The script contains an error on line 38 and it does not run. sigma is not defined. https://github.com/transientskp/scripts/blob/master/TraP_fits_QC/TraP_fits_QC.py#L38

gijzelaerr · 2016-06-23T10:57:23Z

what do you mean with 'use identical value to that in TraP settings' for f?

AntoniaR · 2016-06-23T11:06:43Z

Will look at error when I find a moment later, the version I use on Struis works fine.

In the TraP job_params.cfg when I wrote this code there was a parameter just known as "f". Fairly recently you or Tim improved the cfg file and "f" was renamed as "rms_est_fraction".

AntoniaR · 2016-07-01T14:02:05Z

When the option on the command line is set to F (my usual setting) it runs completely fine so the main functionality of the script (i.e. how it produces / fits / filters) that is required for this issue are there.

The thing that is broken is the loop that repeats the fitting / filtering separately on individual frequency bands - important for this issue but maybe you have a more optimal method for this anyway.

I can only work on this for a short time longer today, but will attempt to fix the = T setting and upload it to github before I leave.

AntoniaR · 2016-07-01T14:14:21Z

Ok... Fully working version uploaded.

It was actually a really simple fix, this line https://github.com/transientskp/scripts/blob/master/TraP_fits_QC/TraP_fits_QC.py#L38 was just meant to be a duplicate of this line https://github.com/transientskp/scripts/blob/master/TraP_fits_QC/TraP_fits_QC.py#L48
I changed how I stored the information at one point (array to dictionary) as it made life easier but forgot to filter the change through.

gijzelaerr · 2016-07-18T12:33:08Z

Ok, so for educational purposes I post this here. It took me a while to figure out what was required or not but I think in the end everything can reduced to a couple of lines. Basically you can use scipy.stats.norm to calculate mu and sigma values, which you can then use to calculate t_low and t_high (reject threshold values) which you can then use to make database queries. This looks like this:

from tkp.db.model import Image
from scipy.stats import norm
from sqlalchemy import not_

def reject_bad_rms(sqlalchemy_session, dataset_id, est_sigma):
    images = sqlalchemy_session.query(Image.rms_qc).filter(Image.dataset_id == dataset_id).all()
    mu, sigma = norm.fit(images)
    t_low = mu - sigma * est_sigma
    t_high = mu + sigma * est_sigma
    bad_images = sqlalchemy_session.query(Image).filter(not_(Image.rms_qc.between(t_low, t_high))).all()

gijzelaerr · 2016-07-21T10:30:42Z

I've added the code that does a RMS check per image based on RMS for the full dataset:

https://github.com/transientskp/tkp/pull/518/files#diff-9a872d7b6b3ffd498e95b8ec585d9166R83

Question now is, how do we integrate this properly? We just rewrote the internal main logic of TraP to support streaming data, and process data in steps per timestep. This is in conflict with this feature request I realize now. To solve this we could make a different main flow for batch processing and stream processing, where in the case of batch processing first all images are inserted in the database, then a quality check is ran and then the rest of TraP follows. I'm not really sure if that is the smart way to go, since I'm afraid the two flows will diverge too much and decrease maintainability. @timstaley what do you think?

AntoniaR · 2016-10-19T13:53:10Z

Is there any progress on this issue? From what I understand, the RMS clip is also really useful for AARTFAAC as well as all the other non-LOFAR telescopes. So, as a minimum, can we get my original option 1 implemented ASAP?

Enable image rejection based on a maximum and minimum RMS value (could potentially be the first step in implementing, with the max and min values given in the pipeline configuration file)

gijzelaerr · 2016-10-19T13:57:04Z

this has been implemented in banana and has been waiting your feedback for a while now. If the numbers seem to make sense I can try to implement it into TraP, but there is not a lot of time left me working at API. I don't really like hacking features into TraP last minute.

gijzelaerr · 2016-10-19T14:00:11Z

correcting myself, the gaussian histogram fitting has been implemented. If the numbers and graphs make sense that could become the quality check.

AntoniaR · 2016-10-19T14:03:23Z

I had already looked at the Banana outputs and said it was fine and closed the issue (transientskp/banana#71). However, this quality check needs implementing in TraP and not Banana as we do not want to be running source extraction, source association or updating runcat source parameters using the images that fail quality control.

But my understanding is that in Banana it is the full fitting strategy and not the strategy that would work in a streaming mode for AARTFAAC images. My option 1 would work for AARTFAAC images and could be improved in the longer term.

gijzelaerr · 2016-10-19T14:23:33Z

I'm quite sure you didn't look at the banana output, this issue is about
the notebook.

Op 19 okt. 2016 4:06 p.m. schreef "Antonia Rowlinson" <
notifications@github.com>:

I had already looked at the Banana outputs and said it was fine and closed
the issue (transientskp/banana#71
transientskp/banana#71). However, this
quality check needs implementing in TraP and not Banana as we do not want
to be running source extraction, source association or updating runcat
source parameters using the images that fail quality control.

But my understanding is that in Banana it is the full fitting strategy and
not the strategy that would work in a streaming mode for AARTFAAC images.
My option 1 would work for AARTFAAC images and could be improved in the
longer term.

—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
#512 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAT6pCEu02T2K3FkBGakbJJGSntjfDJ2ks5q1iMtgaJpZM4IqTZD
.

AntoniaR · 2016-10-19T14:25:19Z

I looked at Banana and did not use the notebook as it was not necessary.

gijzelaerr · 2016-10-19T14:33:03Z

Ok in that case the numbers are probably sound and will use this method to
build the QC.

Op 19 okt. 2016 4:26 p.m. schreef "Antonia Rowlinson" <
notifications@github.com>:

I looked at Banana and did not use the notebook as it was not necessary.

—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
#512 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAT6pHDo2HhlIYRPtXp1AZt3blCg0tBTks5q1ihPgaJpZM4IqTZD
.

AntoniaR · 2018-07-11T14:40:15Z

I have now confirmed this is roughly working but has some issues for users. These changes to be made for R4.1:

if an image already fails this test

tkp/tkp/quality/rms.py

Line 109 in 0e117a1

if not rms_min < image.rms_qc < rms_max:

it should be rejected immediately, this will save time as we don't need to query the image database or refit the histogram. It will also stop us wasting time fitting sources etc in the image. I recommend moving lines 109-111 up to line 99
we need an option to turn off the history and sigma clipping of a Gaussian. For instance there may be datasets where we want to specify specific rms_max and rms_min value. I suggest around line 99 (after the initial rejection in point 1) setting an option like:
if history == 0: return false
Then only the global max/min thresholds are applied and we can specify it easily in the existing job_params.cfg file
est_sigma is currently using the sigma clipping value used for calculating the rms (typically = 4). This needs changing as the sigma clipping value and the image rejection sigma value are typically not the same. i.e. my standard settings are est_sigma=4 and image rejection sigma=2. This should be a new user defined parameter in the job_params.cfg (called maybe img_rej_sigma?) file with a default value of 2.

tkp/tkp/quality/rms.py

Lines 106 to 107 in 0e117a1

t_low = mu - sigma * est_sigma

t_high = mu + sigma * est_sigma

AntoniaR · 2020-07-15T08:38:12Z

Fixing the set_sigma and getting this fully working is a goal for Release 5.0

AntoniaR · 2021-07-26T10:16:44Z

Started initial work on this.

Introduced a new parameter in job_params.cfg to reject images based on the distribution of the image rms values from the histogram. Default is 3 sigma, i.e. an image whose rms is more (or less) than 3 sigma from the mean of the rms of the other images will be rejected:

rms_rej_sigma = 3	     ; threshold for rejecting images using rms histogram

The code is in place in branch Issue512 but needs testing. https://github.com/transientskp/tkp/tree/Issue512

AntoniaR · 2021-08-02T10:00:24Z

This issue is completed for Streaming mode and can be closed with the pull request #591

AntoniaR assigned gijzelaerr and AntoniaR May 31, 2016

AntoniaR added the priority high label May 31, 2016

gijzelaerr mentioned this issue Jun 2, 2016

Add more AARTFAAC quality control checks #420

Closed

gijzelaerr added upcoming discussion and removed upcoming discussion labels Jun 2, 2016

gijzelaerr added this to the 3.3 milestone Jun 23, 2016

gijzelaerr mentioned this issue Jun 23, 2016

Quality Control for non-LOFAR images #412

Closed

AntoniaR mentioned this issue Jun 23, 2016

Histogram displaying typical RMS noise properties of images transientskp/banana#71

Closed

gijzelaerr mentioned this issue Jul 18, 2016

RMS check based on avera dataset RMS #518

Merged

gijzelaerr modified the milestones: 4.1, 3.3 Aug 5, 2016

AntoniaR assigned mkuiack Oct 19, 2016

gijzelaerr removed their assignment Feb 6, 2017

AntoniaR modified the milestones: 4.1, 5.0 Jul 15, 2020

AntoniaR mentioned this issue Jul 15, 2020

restructure quality checks to support other telescopes #349

Closed

This was referenced Jul 29, 2021

Quality check not being completed correctly on images #588

Open

Introduction of new threshold for rms histogram and fixing errors #589

Merged

Fixing the quality control tests for all images #591

Merged

AntoniaR linked a pull request Aug 2, 2021 that will close this issue

Fixing the quality control tests for all images #591

Merged

DavidRuhe closed this as completed in #591 Aug 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RMS quality control check for Non-LOFAR images #512

RMS quality control check for Non-LOFAR images #512

AntoniaR commented May 31, 2016

AntoniaR commented May 31, 2016

gijzelaerr commented Jun 23, 2016

gijzelaerr commented Jun 23, 2016

AntoniaR commented Jun 23, 2016

AntoniaR commented Jul 1, 2016

AntoniaR commented Jul 1, 2016

gijzelaerr commented Jul 18, 2016 •

edited

gijzelaerr commented Jul 21, 2016

AntoniaR commented Oct 19, 2016

gijzelaerr commented Oct 19, 2016

gijzelaerr commented Oct 19, 2016

AntoniaR commented Oct 19, 2016

gijzelaerr commented Oct 19, 2016

AntoniaR commented Oct 19, 2016

gijzelaerr commented Oct 19, 2016

AntoniaR commented Jul 11, 2018

AntoniaR commented Jul 15, 2020

AntoniaR commented Jul 26, 2021

AntoniaR commented Aug 2, 2021

RMS quality control check for Non-LOFAR images #512

RMS quality control check for Non-LOFAR images #512

Comments

AntoniaR commented May 31, 2016

AntoniaR commented May 31, 2016

gijzelaerr commented Jun 23, 2016

gijzelaerr commented Jun 23, 2016

AntoniaR commented Jun 23, 2016

AntoniaR commented Jul 1, 2016

AntoniaR commented Jul 1, 2016

gijzelaerr commented Jul 18, 2016 • edited

gijzelaerr commented Jul 21, 2016

AntoniaR commented Oct 19, 2016

gijzelaerr commented Oct 19, 2016

gijzelaerr commented Oct 19, 2016

AntoniaR commented Oct 19, 2016

gijzelaerr commented Oct 19, 2016

AntoniaR commented Oct 19, 2016

gijzelaerr commented Oct 19, 2016

AntoniaR commented Jul 11, 2018

AntoniaR commented Jul 15, 2020

AntoniaR commented Jul 26, 2021

AntoniaR commented Aug 2, 2021

gijzelaerr commented Jul 18, 2016 •

edited