Let's see how to check the quality of the data in ASPEN-processed files.

First, we import the necessary modules:

In [4]:
from halodrops import sonde
from halodrops.helper import paths
from halodrops.qc import profile

We will check the QC for all ASPEN-processed files from the HALO flight on 1 April 2022. First, we get a dictionary of all sondes in the flight. It is called `Sondes`; its keys are sonde IDs and its values are the corresponding instances of the `Sonde` class.

In [5]:
data_directory = '/Users/geet/Documents/Repositories/Owned/halodrops/sample/'
flight_id = '20220401'

# Instantiate paths object
f0401 = paths.Paths(data_directory,flight_id)
# Create Sondes dictionary
Sondes = f0401.populate_sonde_instances()

The post-ASPEN file for 213450447 with filename D20220401_101259QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213341449 with filename D20220401_093402QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213450599 with filename D20220401_125710QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213010063 with filename D20220401_101634QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an at

Let's start by looking at data from one sonde from the flight. 

In [6]:
ds = Sondes['210430717'].aspen_ds

First, we'll check the profile fullness of the `u_wind` variable.

The profile fullness (or profile coverage) is the fraction of timestamps that have a non-NaN value for the variable. A variable that provides a measurement at every timestamp (i.e. at every coordinate of the independent time dimension) would therefore have a value of 1.
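The definition above can be sketched in a few lines. This is only an illustration under stated assumptions: `fullness_sketch` is a hypothetical helper that works on a plain list of floats, whereas `profile.fullness` takes a dataset and a variable name, and its internals may differ.

```python
import math


def fullness_sketch(values):
    """Fraction of entries that are not NaN.

    Hypothetical helper for illustration only; not the halodrops API.
    """
    if not values:
        raise ValueError("values must not be empty")
    # Count the time steps that carry an actual measurement.
    valid = sum(1 for v in values if not math.isnan(v))
    return valid / len(values)


# Eight time steps, six of which have data.
u_wind = [1.2, 1.3, math.nan, 1.1, 1.0, math.nan, 0.9, 0.8]
print(f"{fullness_sketch(u_wind):.02f}: Profile coverage of u_wind")
```

With six valid values out of eight, the sketch prints a coverage of 0.75.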

In [7]:
var = 'u_wind'
print(f'{profile.fullness(ds,var):.02f}: Profile coverage of {var}')

0.93: Profile coverage of u_wind


That's nice. This means that 93% of the timestamps in the dataset have a non-NaN measurement of `u_wind` associated with them. Now, let's check for `tdry`, which is the dry air temperature.

In [8]:
var = 'tdry'
print(f'{profile.fullness(ds,var):.02f}: Profile coverage of {var}')

0.48: Profile coverage of tdry


Oof! That looks bad. That's roughly half the coverage of `u_wind`.

But there's a catch. The temperature sensor in the RD-41 sonde has a sampling frequency of 2 Hz, whereas the GPS sensor (from which the horizontal winds are derived) has a sampling frequency of 4 Hz. The time coordinates are the same for both variables and are spaced 0.25 seconds apart, which matches the GPS sensor frequency exactly. It is therefore a bit unfair to compare the number of temperature values against all time coordinates, since the temperature sensor is not designed to measure that often. A better approach is to weight the profile coverage by the sampling frequency.

So, if the temperature sensor only measures at every other time coordinate, its profile coverage should be computed over only half of the time coordinates or, equivalently, multiplied by two. This is exactly what the [weighted_fullness](../apidocs/halodrops/halodrops.qc.profile.md#halodrops.qc.profile.weighted_fullness) function does.
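The weighting can be sketched as follows. This is a minimal illustration, not the library's implementation: `weighted_fullness_sketch` and its `grid_frequency` parameter are hypothetical, and the sketch assumes the time coordinates sit on a 4 Hz grid as described above. How `profile.weighted_fullness` obtains the grid frequency is not shown here.

```python
import math


def weighted_fullness_sketch(values, sampling_frequency, grid_frequency=4):
    """Coverage scaled by the ratio of grid frequency to sensor frequency.

    Hypothetical helper for illustration only; not the halodrops API.
    grid_frequency (Hz) is an assumed parameter describing the spacing of
    the time coordinates (4 Hz corresponds to 0.25 s).
    """
    if not values:
        raise ValueError("values must not be empty")
    valid = sum(1 for v in values if not math.isnan(v))
    unweighted = valid / len(values)
    # A 2 Hz sensor on a 4 Hz grid can fill at most half of the time steps,
    # so its coverage is scaled by 4 / 2 = 2.
    return unweighted * grid_frequency / sampling_frequency


# Eight time steps on the 4 Hz grid; a 2 Hz sensor could fill four of them
# and actually filled three.
tdry = [20.1, math.nan, 20.0, math.nan, 19.8, math.nan, math.nan, math.nan]
print(f"{weighted_fullness_sketch(tdry, sampling_frequency=2):.02f}")
```

Here the unweighted coverage is 3/8 = 0.375, and the weighted coverage is 0.375 × 2 = 0.75, i.e. three of the four possible measurements are present.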

In [9]:
var = 'tdry'
sampling_frequency = 2 # in hertz
print(f'{profile.weighted_fullness(ds,var,sampling_frequency):.02f}: Profile coverage of {var}')

0.97: Profile coverage of tdry


Now, that doesn't look too bad, does it? It's actually performing better than the `u_wind` variable, accounting for sensor sampling frequencies.

Checking for some other variables now...

In [12]:
# `variables` avoids shadowing Python's built-in vars()
variables = ['rh','u_wind','pres']
freqs = (2,4,2) # sampling frequencies in hertz

for var,freq in zip(variables,freqs):
    print('---')
    print(f'{profile.fullness(ds,var):.02f}: Profile coverage of {var}')
    print(f'{profile.weighted_fullness(ds,var,freq):.02f}: Weighted profile coverage of {var}')

---
0.44: Profile coverage of rh
0.89: Weighted profile coverage of rh
---
0.93: Profile coverage of u_wind
0.93: Weighted profile coverage of u_wind
---
0.48: Profile coverage of pres
0.96: Weighted profile coverage of pres
