# Merging processed data
This notebook relies on the data from the previous notebook, but there is no need to run the previous notebook for this one to work.

In [1]:
import gnssvod as gv
import pandas as pd

## Merge
In the previous notebook, we processed raw RINEX observation files individually for each receiver and saved the results in corresponding NetCDF files.

In a GNSS-VOD setup, receivers are analysed in pairs. One receiver sits above the forest canopy and provides a clear-sky reference; the other sits below the canopy and measures the forest attenuation.

Here we merge the data from these two receivers before making any plots. We also save the merged data in chunks of fixed length (for example, daily chunks). This makes the data easier to manipulate and avoids relying on the temporal chunks in which the data were initially logged (here, hourly log files that span from xx:07 to xx+1:06).
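
To illustrate why fixed chunks are convenient, here is a minimal sketch (not part of gnssvod) that works out which daily chunks an hourly log file overlaps. The function name `daily_chunks_for_log` and the example timestamps are hypothetical, and the log span is treated as half-open, so a file whose last epoch is at 00:06 has an exclusive end of 00:07.

```python
from datetime import datetime, timedelta

def daily_chunks_for_log(log_start, log_end):
    """Return the start of each daily chunk [day, day + 1 day) that overlaps
    the half-open log span [log_start, log_end)."""
    day = log_start.replace(hour=0, minute=0, second=0, microsecond=0)
    chunks = []
    while day < log_end:
        chunks.append(day)
        day += timedelta(days=1)
    return chunks

# Hypothetical hourly log that starts at 23:07 and runs past midnight
straddling = daily_chunks_for_log(datetime(2021, 4, 28, 23, 7),
                                  datetime(2021, 4, 29, 0, 7))

# Hypothetical hourly log that lies fully inside one day
inside = daily_chunks_for_log(datetime(2021, 4, 28, 21, 7),
                              datetime(2021, 4, 28, 22, 7))

print(straddling)
print(inside)
```

A log file that straddles midnight contributes to two daily chunks, which is why re-chunking on fixed boundaries decouples later analysis from the logging schedule.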

### gv.gather_stations()
This function does several things:
- It reads processed observation files that were saved in NetCDF format (output of "preprocess").
- It combines data from the various receivers/stations according to user-specified pairing rules.
- It only processes data belonging to the requested time interval.
- It saves paired data in temporal chunks specified by the time interval.
- If requested, it also returns the paired data as an object.

#### Specifying input files

In [2]:
# first let's indicate where to find the data for each receiver
pattern={'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/nc/*.nc',
         'Dav1_Grnd':'data_RINEX2.11/Dav1_Grnd/nc/*.nc'}

#### Specifying time interval
Then we need to define the time interval to process and the temporal chunks we want for the output data.

Here we decide to process 2 days of data starting on '28-04-2021', that is, all data from '28-04-2021' and '29-04-2021'.

In [3]:
startday = pd.to_datetime('28-04-2021',format='%d-%m-%Y')
timeintervals=pd.interval_range(start=startday, periods=2, freq='D', closed='left')
timeintervals

IntervalIndex([[2021-04-28 00:00:00, 2021-04-29 00:00:00), [2021-04-29 00:00:00, 2021-04-30 00:00:00)], dtype='interval[datetime64[ns], left]')

Using the timeintervals above will save the results in chunks of 1 day. If we wanted the results in hourly chunks, we could have written instead:

`timeintervals=pd.interval_range(start=startday, periods=48, freq='H', closed='left')`
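
As a quick check of how these intervals behave, the sketch below builds both the daily and the hourly version and tests which chunk a midnight timestamp falls into. It uses only pandas; the hourly frequency is passed as a `pd.Timedelta` rather than a string alias, since newer pandas releases deprecate the 'H' alias in favour of 'h'.

```python
import pandas as pd

startday = pd.to_datetime('28-04-2021', format='%d-%m-%Y')

# Daily chunks, as used in this notebook
daily = pd.interval_range(start=startday, periods=2, freq='D', closed='left')

# Hourly chunks over the same two days; a Timedelta avoids depending on
# which hourly string alias the installed pandas version accepts
hourly = pd.interval_range(start=startday, periods=48,
                           freq=pd.Timedelta(hours=1), closed='left')

# Both sets of intervals cover the same overall span
same_span = (daily[0].left == hourly[0].left) and (daily[-1].right == hourly[-1].right)

# With closed='left', midnight belongs to the chunk that starts at midnight
midnight = pd.Timestamp('2021-04-29 00:00:00')
in_first_day = midnight in daily[0]
in_second_day = midnight in daily[1]

print(same_span, in_first_day, in_second_day)
```

Because the intervals are closed on the left, every epoch belongs to exactly one chunk, so no observation is duplicated across output files.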

Now the only thing left is to define how to combine the stations, using the same dictionary keys as in 'pattern'.

In [4]:
# define how to make pairs, always give reference station first, matching the dictionary keys of 'pattern'
pairings={'Dav':('Dav2_Twr','Dav1_Grnd')}

# run function
out = gv.gather_stations(pattern,pairings,timeintervals,outputresult=True)

Extracting Epochs from files
----- Processing Dav
-- Processing interval [2021-04-28 00:00:00, 2021-04-29 00:00:00)
Found 3 file(s) for Dav2_Twr
Reading
Found 3 file(s) for Dav1_Grnd
Reading
Concatenating stations
-- Processing interval [2021-04-29 00:00:00, 2021-04-30 00:00:00)
Found 4 file(s) for Dav2_Twr
Reading
Found 4 file(s) for Dav1_Grnd
Reading
Concatenating stations


If outputresult was set to 'True' (default is 'False'), the returned result is of the form

out = {key: pd.DataFrame, key: pd.DataFrame}

In our case, something like:

out = {'Dav':pd.DataFrame}

In [5]:
out

{'Dav':                                           S1         S2         S7  \
 Station   Epoch               SV                                     
 Dav2_Twr  2021-04-28 21:07:00 C06  38.000000  38.000000  31.000000   
                               C09  41.000000  41.000000  36.000000   
                               C11  43.428571  43.428571  41.000000   
                               C14  45.000000  45.000000  42.285714   
                               C16  38.000000  38.000000  33.000000   
 ...                                      ...        ...        ...   
 Dav1_Grnd 2021-04-29 03:07:00 R16  32.200000  31.700000        NaN   
                               R23  27.700000        NaN        NaN   
                               S23  36.000000        NaN        NaN   
                               S27  29.100000        NaN        NaN   
                               S36  35.000000        NaN        NaN   
 
                                       Azimuth  Elevation  
 Station

We can see that a new MultiIndex level named 'Station' has been added. Data from both stations now appear in the same table, with aligned Epochs and SV numbers.
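
To show how the new 'Station' level can be used, the following sketch builds a small stand-in for `out['Dav']` with the same index levels and then puts the two stations side by side. The satellite names and signal values are made up for illustration; only the index layout follows the output above.

```python
import pandas as pd

# Toy stand-in for out['Dav']; the values are hypothetical
epoch = pd.Timestamp('2021-04-28 21:07:00')
index = pd.MultiIndex.from_tuples(
    [('Dav2_Twr', epoch, 'G01'), ('Dav2_Twr', epoch, 'G02'),
     ('Dav1_Grnd', epoch, 'G01'), ('Dav1_Grnd', epoch, 'G02')],
    names=['Station', 'Epoch', 'SV'])
paired = pd.DataFrame({'S1': [45.0, 43.0, 38.0, 40.0]}, index=index)

# Select each station; xs() drops the 'Station' level, leaving (Epoch, SV)
ref = paired.xs('Dav2_Twr', level='Station')
grn = paired.xs('Dav1_Grnd', level='Station')

# Put the two stations side by side, matched on Epoch and SV
side_by_side = ref.join(grn, lsuffix='_ref', rsuffix='_grn')
print(side_by_side)
```

Because both selections share the same (Epoch, SV) index, the join lines up observations of the same satellite at the same time from the two receivers.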

#### Specifying output destination
Instead of only returning the result as an output of the function, we can specify where to save it. Again, it may also be useful to drop variables that are not needed in order to reduce file size.

In [6]:
# define where to save output data, matching the dictionary keys in 'pairings'
outputdir = {'Dav':'data_RINEX2.11/Dav_paired/'}
# define which variables to keep
keepvars = ['S*','Azimuth','Elevation']

# run function
out = gv.gather_stations(pattern,pairings,timeintervals,keepvars=keepvars,outputdir=outputdir)

Extracting Epochs from files
----- Processing Dav
-- Processing interval [2021-04-28 00:00:00, 2021-04-29 00:00:00)
Found 3 file(s) for Dav2_Twr
Reading
Found 3 file(s) for Dav1_Grnd
Reading
Concatenating stations
Saving result in data_RINEX2.11/Dav_paired/
Saved 43172 observations in Dav_20210428000000_20210429000000.nc
-- Processing interval [2021-04-29 00:00:00, 2021-04-30 00:00:00)
Found 4 file(s) for Dav2_Twr
Reading
Found 4 file(s) for Dav1_Grnd
Reading
Concatenating stations
Saving result in data_RINEX2.11/Dav_paired/
Saved 46164 observations in Dav_20210429000000_20210430000000.nc
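
The `keepvars` list above mixes exact names with a wildcard ('S*'). The sketch below shows one way such patterns could select columns, assuming shell-style wildcard matching; this is an assumption for illustration, and the matching done inside gnssvod may differ. The helper `select_columns` and the column list are hypothetical.

```python
from fnmatch import fnmatchcase

def select_columns(columns, patterns):
    """Keep the columns that match at least one shell-style pattern."""
    return [c for c in columns if any(fnmatchcase(c, p) for p in patterns)]

# Hypothetical column names of a processed observation file
columns = ['S1', 'S2', 'S7', 'C1', 'L1', 'Azimuth', 'Elevation']
keepvars = ['S*', 'Azimuth', 'Elevation']

kept = select_columns(columns, keepvars)
print(kept)
```

Under this assumption, 'S*' keeps every signal-strength column while the code and phase columns are dropped.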


As requested, the results have been saved as daily files, even though the input files are hourly. The file names are generated from the key in the 'pairings' argument (here 'Dav') and the specified time intervals.
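
The naming pattern seen in the log can be reproduced with a short sketch. The helper `paired_filename` is hypothetical and the format is inferred from the file names printed above, not taken from the gnssvod source.

```python
from datetime import datetime

def paired_filename(key, start, end):
    """Build a name of the form <key>_<start>_<end>.nc, mirroring the log output."""
    stamp = '%Y%m%d%H%M%S'
    return f"{key}_{start.strftime(stamp)}_{end.strftime(stamp)}.nc"

name = paired_filename('Dav', datetime(2021, 4, 28), datetime(2021, 4, 29))
print(name)
```

This can be handy for locating the file that covers a given day without listing the output directory.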