# QC flags

By default, QC flags are applied: for each numeric data column that has an associated QC flag column, values whose QC flag is not "0" are set to NaN. Pass `apply_qc=False` to keep the original values.
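
The masking rule described above can be illustrated with plain pandas. This is only a minimal sketch on invented data, not the library's actual implementation: the toy values and the `flag_columns` mapping are assumptions made for illustration (in the real data frame, the flag column names are available from `df.attrs`, as shown further down).

```python
import pandas as pd

# Toy frame: the column names mimic USCRN hourly variables, the values are made up.
toy = pd.DataFrame(
    {
        "solarad": [120.0, 135.0, 90.0],
        "solarad_flag": ["0", "3", "0"],
    }
)

# Hypothetical mapping from each value column to its QC flag column.
flag_columns = {"solarad": "solarad_flag"}

masked = toy.copy()
for value_col, flag_col in flag_columns.items():
    # Keep values whose flag is "0"; everything else becomes NaN.
    masked[value_col] = masked[value_col].where(masked[flag_col] == "0")

print(masked)
```

Only the second row, whose flag is "3", ends up as NaN; the flag column itself is left untouched.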

See {doc}`select-sites` for more information about selecting sites and
{doc}`daily` / {func}`uscrn.get_data` and {doc}`nrt` / {func}`uscrn.get_nrt_data` for more information about loading data.

In [11]:
import pandas as pd

import uscrn

In [57]:
station_id = "1045"  # Boulder, CO

df = uscrn.get_data(2019, "hourly", station_id=station_id, n_jobs=1)
df_no_qc = uscrn.get_data(2019, "hourly", station_id=station_id, apply_qc=False, n_jobs=1)

Discovering files...
1 file(s) found
https://www.ncei.noaa.gov/pub/data/uscrn/products/hourly02/2019/CRNH0203-2019-CO_Boulder_14_W.txt
Reading files...


[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.8s


Discovering files...
1 file(s) found
https://www.ncei.noaa.gov/pub/data/uscrn/products/hourly02/2019/CRNH0203-2019-CO_Boulder_14_W.txt
Reading files...


[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.8s


In [58]:
qc_vns = [k for k, v in df.attrs["attrs"].items() if v["qc_flag_name"]]

counts = []
for vn in qc_vns:
    fn = df.attrs["attrs"][vn]["qc_flag_name"]
    counts.append(df[fn].value_counts().convert_dtypes().rename(vn))

counts = pd.DataFrame(counts)
counts

                 0     3
solarad       8756   4.0
solarad_max   8750  10.0
solarad_min   8756   4.0
sur_temp      8756   4.0
sur_temp_max  8756   4.0
sur_temp_min  8756   4.0
rh_hr_avg     8760   NaN


In [36]:
vn = counts.sort_values(by="0").iloc[0].name

pd.concat(
    [
        df[vn].isnull().value_counts().rename("qc"),
        df_no_qc[vn].isnull().value_counts().rename("no qc"),
    ],
    axis=1,
)

solarad_max    qc  no qc
False        8749   8759
True           11      1
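
The two results are consistent with each other: `solarad_max` has 10 hours flagged "3", which matches the difference between the 11 missing values with QC applied and the 1 missing value without. A minimal sketch of that cross-check on invented data follows; the frames `raw` and `qc` and the flag column name `solarad_max_flag` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the data loaded without QC; the values are invented.
raw = pd.DataFrame(
    {
        "solarad_max": [500.0, np.nan, 480.0, 510.0],
        "solarad_max_flag": ["0", "0", "3", "3"],
    }
)

# Toy stand-in for the QC-applied data: values with a non-"0" flag become NaN.
qc = raw.copy()
qc["solarad_max"] = qc["solarad_max"].where(qc["solarad_max_flag"] == "0")

# Values present in the raw data but missing after QC were removed by QC.
n_masked = int((qc["solarad_max"].isna() & raw["solarad_max"].notna()).sum())

# Number of rows carrying a non-"0" flag.
n_flagged = int((raw["solarad_max_flag"] != "0").sum())

print(n_masked, n_flagged)
```

The two counts agree whenever every flagged value was non-missing to begin with, which is the situation the comparison above suggests.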


In [38]:
df.sur_temp_type.value_counts()

sur_temp_type
C    8759
U       1
Name: count, dtype: int64

## IR surface measurement type

Near-real-time (NRT) data are presumably more likely to contain non-corrected values.

In [42]:
df = uscrn.get_nrt_data((-4, None), "hourly", n_jobs=2)

Discovering files...
  Looking for files in these years
  - 2025
Found 4 file(s) to load
https://www.ncei.noaa.gov/pub/data/uscrn/products/hourly02/updates/2025/CRN60H0203-202504061500.txt
...
https://www.ncei.noaa.gov/pub/data/uscrn/products/hourly02/updates/2025/CRN60H0203-202504061800.txt
Reading files...


[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.2s
[Parallel(n_jobs=2)]: Done   2 out of   4 | elapsed:    0.2s remaining:    0.2s
[Parallel(n_jobs=2)]: Done   4 out of   4 | elapsed:    0.6s finished


In [43]:
df.sur_temp_type.value_counts()

sur_temp_type
C    567
U     60
Name: count, dtype: int64

In [50]:
wbans = sorted(df.query("sur_temp_type == 'U'").wban.unique())
print(wbans)
print(len(wbans))

['23801', '23802', '63862', '63867', '63868', '63891', '63892', '63893', '63894', '63895', '63897', '63899', '73801', '73802', '73803']
15
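
To see how the "U" rows are distributed across stations rather than just which stations are affected, the rows can be grouped by `wban`. The sketch below uses an invented frame with the same two column names; the values are made up and the variable names are hypothetical.

```python
import pandas as pd

# Toy frame with the same column names as above; the values are invented.
toy = pd.DataFrame(
    {
        "wban": ["23801", "23801", "63862", "63862", "73801"],
        "sur_temp_type": ["U", "U", "C", "U", "C"],
    }
)

# Count "U" rows per station, largest first.
u_counts = (
    toy.loc[toy["sur_temp_type"] == "U"]
    .groupby("wban")
    .size()
    .sort_values(ascending=False)
)

print(u_counts)
```

In the NRT result above, 60 "U" rows over 15 stations averages four per station, the same as the number of hourly files loaded, so a per-station count like this would help confirm whether those stations report "U" for every hour.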
