# Demo: Applying Quality Control
In this example, we apply Quality Control (QC) to the demo data.

## Create your dataset
We start by creating a dataset.

In [None]:
%config InlineBackend.print_figure_kwargs = {'bbox_inches':None}  # otherwise the legend is cut off in IPython inline plots

In [None]:
import metobs_toolkit
your_dataset = metobs_toolkit.Dataset()
your_dataset.update_file_paths(
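    input_data_file=metobs_toolkit.demo_datafile,  # path to the data file
    input_metadata_file=metobs_toolkit.demo_metadatafile,  # path to the metadata file
    template_file=metobs_toolkit.demo_template,  # path to the template file
)

your_dataset.import_data_from_file()  # read the data; the timestamp checks run at this step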

A number of quality control methods are available in the toolkit. We can classify them into two groups:

1. **Quality control for missing/duplicated or invalid timestamps**. This is applied to the raw data and is based not on the observed value but only on the presence of a record.
2. **Quality control for bad observations**. These checks are not executed automatically. They are applied as a sequence of specific checks, each looking for the signature of a typical bad observation.
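Running checks "in a sequence" can be illustrated with a small, self-contained sketch. It assumes (this is an assumption, not a description of the toolkit's internals) that each check only sees the records that no earlier check has flagged, and that a record keeps the label of the first check that flagged it. The two checks used here are simplified versions of the gross value and spike checks described further below; the thresholds and label names are hypothetical.

```python
# Minimal sketch of applying QC checks in sequence (assumed behaviour, not the toolkit's code).

def gross_value(pairs, min_value=-15.0, max_value=26.3):
    """Flag the indices of (index, value) pairs outside [min_value, max_value]."""
    return [i for i, v in pairs if not min_value <= v <= max_value]


def spike(pairs, max_step=5.0):
    """Flag a value that jumps by more than max_step and then jumps back."""
    flagged = []
    for (_, prev), (i, cur), (_, nxt) in zip(pairs, pairs[1:], pairs[2:]):
        up, down = cur - prev, nxt - cur
        if abs(up) > max_step and abs(down) > max_step and up * down < 0:
            flagged.append(i)
    return flagged


def apply_in_sequence(values, checks):
    """Run (label, check) pairs in order; each check only sees still-unflagged records."""
    labels = ["ok"] * len(values)
    for label, check in checks:
        remaining = [(i, v) for i, v in enumerate(values) if labels[i] == "ok"]
        for i in check(remaining):
            labels[i] = label
    return labels


values = [18.0, 18.2, 40.0, 18.4, 12.0, 18.5, 18.6]  # hypothetical temperatures
labels = apply_in_sequence(values, [("gross_value", gross_value), ("spike", spike)])
print(labels)
# ['ok', 'ok', 'gross_value', 'ok', 'spike', 'ok', 'ok']
```

Because the gross value check runs first, the value 40.0 is labelled `gross_value` and is no longer a neighbour in the spike check. With the order reversed, the spike check would claim that value instead, so the order of the checks affects the labels.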

## Quality control for missing/duplicated and invalid timestamps
Since this is applied to the raw data, the following quality control checks are automatically performed when reading the data:

* NaN check: Test if the value of an observation can be converted to a numeric value.
* Gap finder: Test if there are missing records. These are stored as `metobs_toolkit.Gap()`.
* Duplicate check: Test if each observation (station name, timestamp, observation type) is unique.
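The three timestamp checks above can be sketched in plain Python on a list of hypothetical raw records. This only illustrates the ideas, assuming a single station with a fixed 5-minute recording frequency; it is not how the toolkit implements them, and the station name, timestamps and values are made up.

```python
from datetime import datetime, timedelta

# Hypothetical raw records: (station name, timestamp, observation type, raw value).
t0 = datetime(2022, 9, 1, 0, 0)
records = [
    ("station_A", t0, "temp", "18.2"),
    ("station_A", t0 + timedelta(minutes=5), "temp", "18.1"),
    ("station_A", t0 + timedelta(minutes=5), "temp", "18.1"),   # duplicated record
    ("station_A", t0 + timedelta(minutes=10), "temp", "n.a."),  # not numeric
    ("station_A", t0 + timedelta(minutes=25), "temp", "17.9"),  # 00:15 and 00:20 missing
]


def nan_check(records):
    """Return indices of records whose value cannot be converted to a number."""
    bad = []
    for i, (_, _, _, raw) in enumerate(records):
        try:
            float(raw)
        except ValueError:
            bad.append(i)
    return bad


def duplicate_check(records):
    """Return indices of records whose (station, timestamp, obstype) was already seen."""
    seen, duplicates = set(), []
    for i, (name, ts, obstype, _) in enumerate(records):
        key = (name, ts, obstype)
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


def find_gaps(timestamps, freq):
    """Return (first missing, last missing) timestamps of each gap, given the expected frequency."""
    stamps = sorted(set(timestamps))
    return [(prev + freq, nxt - freq)
            for prev, nxt in zip(stamps, stamps[1:])
            if nxt - prev > freq]


print(nan_check(records))        # [3]
print(duplicate_check(records))  # [2]
print(find_gaps([r[1] for r in records], timedelta(minutes=5)))
```

The non-numeric record at 00:10 is still a record, so it is not part of the gap: the gap finder only looks at the presence of timestamps, not at their values.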

For example, the time series of some stations contain missing timestamps (gaps), as the plot of station `vlinder02` shows:

In [None]:
your_dataset.get_station('vlinder02').make_plot(colorby='label')


## Quality control for bad observations
The following checks are available:

* *Gross value check*: Test if observations lie between a minimum and a maximum threshold.
* *Persistence check*: Test if observations change within a specific time window.
* *Repetitions check*: Test if an observation changes after a given number of consecutive records.
* *Spike check*: Test if observations produce spikes in the time series.
* *Window variation check*: Test if the variation within a moving time window exceeds a threshold.
* *Toolkit Buddy check*: Spatial buddy check.
* *TITAN Buddy check*: The [Titanlib version of the buddy check](https://github.com/metno/titanlib/wiki/Buddy-check).
* *TITAN Spatial consistency test*: Apply the Titanlib (robust) [Spatial-Consistency-Test](https://github.com/metno/titanlib/wiki/Spatial-consistency-test-resistant) (SCT).
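The temporal checks in this list can be sketched as simple functions on a regularly spaced series of values. These are simplified illustrations under stated assumptions: windows are counted in records rather than in time, and the spike check uses one common formulation (a jump followed by a jump back). The thresholds and function names are hypothetical and do not correspond to the toolkit's implementation or default settings; the spatial checks are not covered here.

```python
def gross_value_check(values, min_value, max_value):
    """Flag values outside [min_value, max_value]."""
    return [i for i, v in enumerate(values) if not min_value <= v <= max_value]


def repetitions_check(values, max_repeats):
    """Flag runs of identical consecutive values longer than max_repeats records."""
    flagged, run_start = [], 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[run_start]:
            if i - run_start > max_repeats:
                flagged.extend(range(run_start, i))
            run_start = i
    return flagged


def spike_check(values, max_step):
    """Flag a value that jumps by more than max_step and then jumps back."""
    flagged = []
    for i in range(1, len(values) - 1):
        up, down = values[i] - values[i - 1], values[i + 1] - values[i]
        if abs(up) > max_step and abs(down) > max_step and up * down < 0:
            flagged.append(i)
    return flagged


def window_variation_check(values, n_window, max_variation):
    """Flag every value in a moving window whose max - min exceeds max_variation."""
    flagged = set()
    for start in range(len(values) - n_window + 1):
        window = values[start:start + n_window]
        if max(window) - min(window) > max_variation:
            flagged.update(range(start, start + n_window))
    return sorted(flagged)


temps = [18.0, 18.2, 18.4, 35.0, 18.6, 18.6, 18.6, 18.6, 18.7]  # hypothetical series
print(gross_value_check(temps, -15.0, 26.3))   # [3]
print(repetitions_check(temps, 3))             # [4, 5, 6, 7]
print(spike_check(temps, 5.0))                 # [3]
print(window_variation_check(temps, 3, 5.0))   # [1, 2, 3, 4, 5]
```

A persistence check works like the repetitions check, but over a time window instead of a record count. Note that the window variation check also flags the neighbours of the 35.0 reading, because every window containing it has a large variation.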

Each check requires a set of specific settings, often stored per observation type. Default settings for temperature observations are stored in the settings of each dataset. Use the ``get_info()`` method and scroll to the QC section to see all QC settings.


In [None]:
your_dataset.settings.get_info()


Use the ``update_qc_settings()`` method to update the default settings.

In [None]:
your_dataset.update_qc_settings(obstype='temp',
                                gross_value_max_value=26.3,  # upper threshold of the gross value check
                                persis_time_win_to_check='30min'  # time window of the persistence check (30 minutes)
                                )
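The `persis_time_win_to_check='30min'` setting above sets the time window of the persistence check. A minimal sketch of a time-based persistence check follows. It assumes that all records in a window of that length are flagged when the value never changes within it and the window holds at least a minimum number of records; the `min_records` parameter and the function name are hypothetical, not toolkit settings.

```python
from datetime import datetime, timedelta


def persistence_check(timestamps, values, window, min_records):
    """Flag records in any time window of length `window` (ending at a record)
    that holds at least `min_records` records and no change in value.
    Assumes `timestamps` is sorted in ascending order."""
    flagged = set()
    for end in range(len(values)):
        # Indices of records in the half-open window (t_end - window, t_end].
        in_window = [i for i in range(end + 1)
                     if timestamps[end] - timestamps[i] < window]
        if len(in_window) >= min_records and len({values[i] for i in in_window}) == 1:
            flagged.update(in_window)
    return sorted(flagged)


t0 = datetime(2022, 9, 1, 0, 0)
timestamps = [t0 + timedelta(minutes=10 * k) for k in range(6)]  # every 10 minutes
values = [20.0, 20.1, 20.1, 20.1, 20.1, 20.3]                    # hypothetical temperatures
print(persistence_check(timestamps, values, timedelta(minutes=30), min_records=3))
# [1, 2, 3, 4]
```

The minimum number of records guards against flagging a window that, because of gaps, contains too few observations to judge whether the value is stuck.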

To apply the quality control on the full dataset use the ``apply_quality_control()`` method. Spatial quality control checks can be applied by using the ``apply_buddy_check()``, ``apply_titan_buddy_check()`` and ``apply_titan_sct_resistant_check()`` methods.

In [None]:
your_dataset.apply_quality_control(
    obstype="temp",  # which observations to check
    gross_value=True,  # apply the gross value check?
    persistence=True,  # apply the persistence check?
    step=True,  # apply the step check?
    window_variation=True,  # apply the window variation check?
)
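The cell above only runs the temporal checks. The spatial checks (``apply_buddy_check()`` and the Titanlib-based methods) compare each station with its neighbours. A generic buddy check can be sketched as follows: a station is flagged when its value deviates from the mean of the stations within a given radius by more than `threshold` times their standard deviation. The radius, minimum number of buddies, threshold, the standard-deviation floor `min_std` and the station coordinates are all hypothetical, and this is not the toolkit's or Titanlib's implementation.

```python
import math
from statistics import mean, pstdev


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def buddy_check(stations, radius_km, min_buddies, threshold, min_std=1.0):
    """stations maps name -> (lat, lon, value); return the names of flagged stations."""
    flagged = []
    for name, (lat, lon, value) in stations.items():
        buddies = [v for other, (la, lo, v) in stations.items()
                   if other != name and haversine_km(lat, lon, la, lo) <= radius_km]
        if len(buddies) < min_buddies:
            continue  # too few neighbours to judge this station
        spread = max(pstdev(buddies), min_std)
        if abs(value - mean(buddies)) > threshold * spread:
            flagged.append(name)
    return flagged


stations = {  # hypothetical stations a few km apart
    "A": (51.05, 3.72, 18.0),
    "B": (51.06, 3.73, 18.4),
    "C": (51.04, 3.70, 17.8),
    "D": (51.05, 3.75, 18.2),
    "E": (51.07, 3.71, 25.0),
}
print(buddy_check(stations, radius_km=10.0, min_buddies=2, threshold=3.0))  # ['E']
```

The `min_std` floor keeps the check from flagging small differences when the buddies happen to agree very closely.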

Use the ``show()`` method or the time series plotting methods to inspect the effect of the quality control.

In [None]:
your_dataset.make_plot(obstype='temp', colorby='label')

To evaluate the applied QC, use the ``get_qc_stats()`` method to get an overview of the frequency statistics of the QC labels.
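At its core, a frequency overview counts how often each label occurs. Here is a minimal sketch with `collections.Counter` on a hypothetical list of QC labels (the label names are illustrative and not necessarily those used by the toolkit):

```python
from collections import Counter


def label_frequencies(labels):
    """Return the percentage of records carrying each QC label."""
    counts = Counter(labels)
    return {label: 100.0 * n / len(labels) for label, n in counts.items()}


labels = ["ok"] * 7 + ["gross_value", "spike", "repetitions"]  # hypothetical QC labels
print(label_frequencies(labels))
# {'ok': 70.0, 'gross_value': 10.0, 'spike': 10.0, 'repetitions': 10.0}
```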

In [None]:
your_dataset.get_qc_stats(obstype='temp', make_plot=True)