Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RMS quality control check for Non-LOFAR images #512

Closed
AntoniaR opened this issue May 31, 2016 · 19 comments · Fixed by #591
Closed

RMS quality control check for Non-LOFAR images #512

AntoniaR opened this issue May 31, 2016 · 19 comments · Fixed by #591
Assignees
Milestone

Comments

@AntoniaR
Copy link
Contributor

The RMS noise check has proven to be the most powerful method for identifying bad images. This method works best by modelling the typical RMS noise in the dataset and then using a sigma threshold.

TraP now stores this number for all images, making it fairly straight forward to implement a check for non-LOFAR images. For LOFAR images, we use a comparison to the theoretical noise but this is not appropriate for non-LOFAR images.

  1. Enable image rejection based on a maximum and minimum RMS value (could potentially be the first step in implementing, with the max and min values given in the pipeline configuration file)
  2. In off-line mode, TraP measures the RMS values of all the images at the start of the pipeline run. Then it could fit these numbers with a Gaussian and conduct a +/- sigma clip, where the sigma value is given in the pipeline configuration file. The mean value and actual thresholds used will then need storing in the database and quoted on Banana.
  3. In an on-line mode, this would not work, but maybe then only use step 1 with a future extension to dynamically model the distribution as images are inserted.
  4. Future extension - enable a sky region specific threshold. But this would require the thresholds used for each sky region to be stored somewhere in the database.

As this is the most useful check, I have already written an off-line version of this strategy which works really well for MWA data and Yvette has used on AARTFAAC data. It produces an images_to_process.py file that can be read by TraP and a plot of the typical RMS noise values. I will upload the most recent version of this script to GitHub later today.

@AntoniaR
Copy link
Contributor Author

I've just uploaded my standalone script which conducts this test here: https://github.com/transientskp/scripts/tree/master/TraP_fits_QC

@gijzelaerr
Copy link
Collaborator

The script contains an error on line 38 and it does not run. sigma is not defined. https://github.com/transientskp/scripts/blob/master/TraP_fits_QC/TraP_fits_QC.py#L38

@gijzelaerr
Copy link
Collaborator

what do you mean with 'use identical value to that in TraP settings' for f?

@AntoniaR
Copy link
Contributor Author

Will look at error when I find a moment later, the version I use on Struis works fine.

In the TraP job_params.cfg when I wrote this code there was a parameter just known as "f". Fairly recently you or Tim improved the cfg file and "f" was renamed as "rms_est_fraction".

@AntoniaR
Copy link
Contributor Author

AntoniaR commented Jul 1, 2016

When the option on the command line is set to F (my usual setting) it runs completely fine so the main functionality of the script (i.e. how it produces / fits / filters) that is required for this issue are there.

The thing that is broken is the loop that repeats the fitting / filtering separately on individual frequency bands - important for this issue but maybe you have a more optimal method for this anyway.

I can only work on this for a short time longer today, but will attempt to fix the = T setting and upload it to github before I leave.

@AntoniaR
Copy link
Contributor Author

AntoniaR commented Jul 1, 2016

Ok... Fully working version uploaded.

It was actually a really simple fix, this line https://github.com/transientskp/scripts/blob/master/TraP_fits_QC/TraP_fits_QC.py#L38 was just meant to be a duplicate of this line https://github.com/transientskp/scripts/blob/master/TraP_fits_QC/TraP_fits_QC.py#L48
I changed how I stored the information at one point (array to dictionary) as it made life easier but forgot to filter the change through.

@gijzelaerr
Copy link
Collaborator

gijzelaerr commented Jul 18, 2016

Ok, so for educational purposes I post this here. It took me a while to figure out what was required or not but I think in the end everything can reduced to a couple of lines. Basically you can use scipy.stats.norm to calculate mu and sigma values, which you can then use to calculate t_low and t_high (reject threshold values) which you can then use to make database queries. This looks like this:

from tkp.db.model import Image
from scipy.stats import norm
from sqlalchemy import not_

def reject_bad_rms(sqlalchemy_session, dataset_id, est_sigma):
    images = sqlalchemy_session.query(Image.rms_qc).filter(Image.dataset_id == dataset_id).all()
    mu, sigma = norm.fit(images)
    t_low = mu - sigma * est_sigma
    t_high = mu + sigma * est_sigma
    bad_images = sqlalchemy_session.query(Image).filter(not_(Image.rms_qc.between(t_low, t_high))).all()

@gijzelaerr
Copy link
Collaborator

I've added the code that does a RMS check per image based on RMS for the full dataset:

https://github.com/transientskp/tkp/pull/518/files#diff-9a872d7b6b3ffd498e95b8ec585d9166R83

Question now is, how do we integrate this properly? We just rewrote the internal main logic of TraP to support streaming data, and process data in steps per timestep. This is in conflict with this feature request I realize now. To solve this we could make a different main flow for batch processing and stream processing, where in the case of batch processing first all images are inserted in the database, then a quality check is ran and then the rest of TraP follows. I'm not really sure if that is the smart way to go, since I'm afraid the two flows will diverge too much and decrease maintainability. @timstaley what do you think?

@gijzelaerr gijzelaerr modified the milestones: 4.1, 3.3 Aug 5, 2016
@AntoniaR
Copy link
Contributor Author

Is there any progress on this issue? From what I understand, the RMS clip is also really useful for AARTFAAC as well as all the other non-LOFAR telescopes. So, as a minimum, can we get my original option 1 implemented ASAP?

Enable image rejection based on a maximum and minimum RMS value (could potentially be the first step in implementing, with the max and min values given in the pipeline configuration file)

@gijzelaerr
Copy link
Collaborator

this has been implemented in banana and has been waiting your feedback for a while now. If the numbers seem to make sense I can try to implement it into TraP, but there is not a lot of time left me working at API. I don't really like hacking features into TraP last minute.

@gijzelaerr
Copy link
Collaborator

correcting myself, the gaussian histogram fitting has been implemented. If the numbers and graphs make sense that could become the quality check.

@AntoniaR
Copy link
Contributor Author

I had already looked at the Banana outputs and said it was fine and closed the issue (transientskp/banana#71). However, this quality check needs implementing in TraP and not Banana as we do not want to be running source extraction, source association or updating runcat source parameters using the images that fail quality control.

But my understanding is that in Banana it is the full fitting strategy and not the strategy that would work in a streaming mode for AARTFAAC images. My option 1 would work for AARTFAAC images and could be improved in the longer term.

@gijzelaerr
Copy link
Collaborator

I'm quite sure you didn't look at the banana output, this issue is about
the notebook.

Op 19 okt. 2016 4:06 p.m. schreef "Antonia Rowlinson" <
notifications@github.com>:

I had already looked at the Banana outputs and said it was fine and closed
the issue (transientskp/banana#71
transientskp/banana#71). However, this
quality check needs implementing in TraP and not Banana as we do not want
to be running source extraction, source association or updating runcat
source parameters using the images that fail quality control.

But my understanding is that in Banana it is the full fitting strategy and
not the strategy that would work in a streaming mode for AARTFAAC images.
My option 1 would work for AARTFAAC images and could be improved in the
longer term.


You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
#512 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAT6pCEu02T2K3FkBGakbJJGSntjfDJ2ks5q1iMtgaJpZM4IqTZD
.

@AntoniaR
Copy link
Contributor Author

I looked at Banana and did not use the notebook as it was not necessary.

@gijzelaerr
Copy link
Collaborator

Ok in that case the numbers are probably sound and will use this method to
build the QC.

Op 19 okt. 2016 4:26 p.m. schreef "Antonia Rowlinson" <
notifications@github.com>:

I looked at Banana and did not use the notebook as it was not necessary.


You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
#512 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAT6pHDo2HhlIYRPtXp1AZt3blCg0tBTks5q1ihPgaJpZM4IqTZD
.

@gijzelaerr gijzelaerr removed their assignment Feb 6, 2017
@AntoniaR
Copy link
Contributor Author

I have now confirmed this is roughly working but has some issues for users. These changes to be made for R4.1:

  1. if an image already fails this test

    if not rms_min < image.rms_qc < rms_max:
    it should be rejected immediately, this will save time as we don't need to query the image database or refit the histogram. It will also stop us wasting time fitting sources etc in the image. I recommend moving lines 109-111 up to line 99

  2. we need an option to turn off the history and sigma clipping of a Gaussian. For instance there may be datasets where we want to specify specific rms_max and rms_min value. I suggest around line 99 (after the initial rejection in point 1) setting an option like:
    if history == 0: return false
    Then only the global max/min thresholds are applied and we can specify it easily in the existing job_params.cfg file

  3. est_sigma is currently using the sigma clipping value used for calculating the rms (typically = 4). This needs changing as the sigma clipping value and the image rejection sigma value are typically not the same. i.e. my standard settings are est_sigma=4 and image rejection sigma=2. This should be a new user defined parameter in the job_params.cfg (called maybe img_rej_sigma?) file with a default value of 2.

    tkp/tkp/quality/rms.py

    Lines 106 to 107 in 0e117a1

    t_low = mu - sigma * est_sigma
    t_high = mu + sigma * est_sigma

@AntoniaR AntoniaR modified the milestones: 4.1, 5.0 Jul 15, 2020
@AntoniaR
Copy link
Contributor Author

Fixing the set_sigma and getting this fully working is a goal for Release 5.0

@AntoniaR
Copy link
Contributor Author

Started initial work on this.

Introduced a new parameter in job_params.cfg to reject images based on the distribution of the image rms values from the histogram. Default is 3 sigma, i.e. an image whose rms is more (or less) than 3 sigma from the mean of the rms of the other images will be rejected:

rms_rej_sigma = 3	     ; threshold for rejecting images using rms histogram

The code is in place in branch Issue512 but needs testing. https://github.com/transientskp/tkp/tree/Issue512

@AntoniaR
Copy link
Contributor Author

AntoniaR commented Aug 2, 2021

This issue is completed for Streaming mode and can be closed with the pull request #591

@AntoniaR AntoniaR linked a pull request Aug 2, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants