# Demonstration notebook for processing raw RINEX data
In this notebook, we process example RINEX files to demonstrate gnssvod.

In [1]:
import gnssvod as gv

## gv.preprocess()
The main pre-processing function is `preprocess()`. It does several things:
- It will read RINEX observation files as pandas data frames
- It can aggregate the raw data to a lower temporal rate if specified.
- It will by default download orbit and clock files for the corresponding days from the GSSC ESA server
- From the orbit and clock files, it will calculate azimuth and elevation for each measurement
- It can save each processed file as a NetCDF file in the `outputdir` folder and/or return the results as a dictionary

### specifying input files
The function only reads RINEX observation files, which typically end with the extension '.yyO', where yy is the last two digits of the year. The function can process a single file, a group of files, or several groups of files corresponding to several receivers, as shown in the examples below. All of this is done by passing a dictionary of file patterns as the first argument.
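As an illustration of the '.yyO' naming convention, here is a small sketch that recognises an observation-file extension and recovers the four-digit year. The helper `rinex2_obs_year` is hypothetical (not part of gnssvod); it also accepts a lowercase 'o', which is an assumption, and applies the RINEX 2 rule for two-digit years (80-99 map to 1980-1999, 00-79 map to 2000-2079).

```python
import re

# Matches a RINEX 2 observation-file extension such as ".21O" (also accepts ".21o")
OBS_EXT = re.compile(r"\.(\d{2})[oO]$")


def rinex2_obs_year(filename):
    """Hypothetical helper: return the year encoded in a '.yyO' extension, or None."""
    match = OBS_EXT.search(filename)
    if match is None:
        return None
    yy = int(match.group(1))
    # RINEX 2 convention for two-digit years: 80-99 -> 1980-1999, 00-79 -> 2000-2079
    return 1900 + yy if yy >= 80 else 2000 + yy


print(rinex2_obs_year("Reach_Dav2_Twr-raw_202104282106.21O"))  # 2021
print(rinex2_obs_year("Reach_Dav2_Twr-raw_202104282106.nc"))   # None
```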

### specifying output destinations
Results are saved as NetCDF files when an output directory is specified with `outputdir`, and/or returned as a dictionary when `outputresult=True` is passed.

Let's begin by reading a single file from the example data.

In [2]:
pattern = {'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O'}
result = gv.preprocess(pattern,outputresult=True)

Created a temporary directory at /tmp/tmp0cpgz6ok
data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O exists | Reading...
Observation file  data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O  is read in 5.33 seconds.
Processing 112382 individual observations
Calculating Azimuth and Elevation
This file does not exist: /tmp/tmp0cpgz6ok/GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3
Downloading: GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3.gz

GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3.gz: 0.98MB [00:00, 1.25MB/s]                            


 | Download completed for GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3.gz
/tmp/tmp0cpgz6ok/GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3 file is read in 1.57 seconds
This file does not exist: /tmp/tmp0cpgz6ok/GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3
Downloading: GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3.gz

GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3.gz: 0.98MB [00:00, 1.95MB/s]                            


 | Download completed for GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3.gz
/tmp/tmp0cpgz6ok/GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3 file is read in 1.36 seconds
This file does not exist: /tmp/tmp0cpgz6ok/GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK
Downloading: GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK.gz

GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK.gz: 4.54MB [00:00, 7.04MB/s]                            


 | Download completed for GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK.gz
/tmp/tmp0cpgz6ok/GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK file is read in 3.74 seconds
This file does not exist: /tmp/tmp0cpgz6ok/GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK
Downloading: GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK.gz

GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK.gz: 4.55MB [00:00, 6.83MB/s]                            


 | Download completed for GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK.gz
/tmp/tmp0cpgz6ok/GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK file is read in 3.83 seconds
SP3 interpolation is done in 22.82 seconds
Removed the temporary directory at /tmp/tmp0cpgz6ok


The default log output shows how many observations were read from the file.

The log also shows that orbit (SP3) and clock (CLK) files were downloaded. These files are necessary to calculate the azimuth and elevation of the satellites. A temporary folder is automatically created to store and process them, and it is removed at the end of the run; in the logs shown here, the files are therefore downloaded again in each subsequent run.

If you process very recent data (less than 3 days old), the orbit and clock files may not yet be available on the ESA server, and the function will then fail with an error.
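To check which orbit and clock products a given day requires, it helps to read the file names in the log. They follow the IGS long product-name convention: analysis centre and product (`GFZ0MGXRAP`, GFZ's rapid multi-GNSS products), start epoch as year, day of year, hour and minute (`20211180000` is 28 April 2021, day 118), span (`01D`), sampling (`05M` for orbits, `30S` for clocks) and content (`ORB`, `CLK`). The log shows that files for both day 118 and day 119 were fetched for data recorded on the evening of 28 April. The helper below is a hypothetical sketch that only rebuilds those names; it does not reproduce how gnssvod selects or downloads them.

```python
from datetime import date, timedelta


def gfz_product_names(day):
    """Hypothetical helper: rebuild the GFZ orbit and clock file names seen in the log."""
    yyyyddd = day.strftime("%Y%j")  # four-digit year + zero-padded day of year
    return (f"GFZ0MGXRAP_{yyyyddd}0000_01D_05M_ORB.SP3",
            f"GFZ0MGXRAP_{yyyyddd}0000_01D_30S_CLK.CLK")


obs_day = date(2021, 4, 28)
for d in (obs_day, obs_day + timedelta(days=1)):
    print(gfz_product_names(d))
```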

The function returns a dictionary that maps each station name to a list of Observation objects, one per processed file.

In [3]:
result

{'Dav2_Twr': [<gnssvod.io.io.Observation at 0x7f3f70e6fbe0>]}

Since we processed one file, the list contains a single Observation object. Let us access it.

In [4]:
obs = result['Dav2_Twr'][0]
obs

<gnssvod.io.io.Observation at 0x7f3f70e6fbe0>

The Observation class was introduced in the `gnsspy` package by Mustafa Serkan Işık and Volkan Özbey. A significant number of base functions in `gnssvod` are based on gnsspy.

Observation objects have the following attributes:
- obs.filename          = the name of the source file
- obs.epoch             = a datetime indicating the day at the start of the record
- obs.observation       = a pandas data frame containing all measurements
- obs.approx_position   = the approximate receiver position [X,Y,Z] as provided in the RINEX file
- obs.receiver_type     = the receiver type, if provided in the RINEX file
- obs.antenna_type      = the antenna type, if provided in the RINEX file
- obs.interval          = the measurement interval, in seconds
- obs.receiver_clock    = the receiver clock, if provided in the RINEX file
- obs.version           = the RINEX version of the file
- obs.observation_types = the observation types, reported as columns in obs.observation

Let's look at the data:

In [None]:
obs.observation

| Epoch | SV | C1 | L1 | D1 | S1 | C2 | L2 | D2 | S2 | C7 | L7 | D7 | S7 | Azimuth | Elevation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-04-28 21:07:08 | G01 | 2.300005e+07 | 1.208661e+08 | -2545.050 | 47.0 | 2.300006e+07 | 9.418140e+07 | -1983.223 | 44.0 | | | | | 132.372864 | 59.902810 |
| 2021-04-28 21:07:08 | G03 | 2.257959e+07 | 1.186567e+08 | -528.921 | 47.0 | 2.257959e+07 | 9.245973e+07 | -412.197 | 42.0 | | | | | -16.744176 | 79.170432 |
| 2021-04-28 21:07:08 | G04 | 2.373431e+07 | 1.247247e+08 | 1971.157 | 45.0 | 2.373431e+07 | 9.718807e+07 | 1535.983 | 42.0 | | | | | -169.822432 | 50.693499 |
| 2021-04-28 21:07:08 | G09 | 2.620530e+07 | 1.377098e+08 | 3155.798 | 40.0 | 2.620531e+07 | 1.073063e+08 | 2459.033 | 40.0 | | | | | -150.328510 | 19.288991 |
| 2021-04-28 21:07:08 | G17 | 2.423057e+07 | 1.273325e+08 | -403.684 | 49.0 | 2.423058e+07 | 9.922014e+07 | -314.543 | 40.0 | | | | | -81.696037 | 43.789922 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-04-28 22:07:07 | C16 | | 2.282536e+08 | -1695.972 | 36.0 | 4.383370e+07 | 2.282536e+08 | -1695.972 | 36.0 | 4.383372e+07 | | -1311.197 | 31.0 | 41.172558 | 7.343129 |
| 2021-04-28 22:07:07 | C27 | | 1.432542e+08 | 1694.596 | 44.0 | 2.751047e+07 | 1.432542e+08 | 1694.596 | 44.0 | | | | | -42.395746 | 20.938009 |
| 2021-04-28 22:07:07 | C28 | | 1.326840e+08 | -1191.327 | 49.0 | 2.548056e+07 | 1.326840e+08 | -1191.327 | 49.0 | | | | | -99.551780 | 49.140759 |
| 2021-04-28 22:07:07 | S23 | 4.111082e+07 | 2.160388e+08 | -516.361 | 45.0 | | | | | | | | | | |


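The pandas data frame has a MultiIndex with two levels, Epoch and SV. Epoch is the time stamp of the measurement as recorded in the RINEX file (RINEX epochs are usually in GPS time rather than local time). SV identifies the satellite with a constellation letter (in this file G = GPS, R = GLONASS, C = BeiDou, S = SBAS) followed by its number (also called PRN).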
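Because of this two-level index, standard pandas indexing can select one satellite, one constellation, or an elevation range. The sketch below builds a small stand-in frame with the same (Epoch, SV) layout, using a few values copied from the table above; with the real data, replace `df` by `obs.observation`. The 20° elevation threshold is only an example.

```python
import pandas as pd

# Small stand-in with the same (Epoch, SV) MultiIndex; values copied from the table above
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2021-04-28 21:07:08"), "G01"),
     (pd.Timestamp("2021-04-28 21:07:08"), "G09"),
     (pd.Timestamp("2021-04-28 22:07:07"), "C27")],
    names=["Epoch", "SV"],
)
df = pd.DataFrame(
    {"S1": [47.0, 40.0, 44.0], "Elevation": [59.902810, 19.288991, 20.938009]},
    index=idx,
)

# Time series of a single satellite (the SV level is dropped from the result)
g01 = df.xs("G01", level="SV")

# All GPS satellites: SV identifiers starting with "G"
gps = df[df.index.get_level_values("SV").str.startswith("G")]

# Observations above an example elevation threshold of 20 degrees
high = df[df["Elevation"] > 20]

print(g01, gps, high, sep="\n\n")
```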

The columns correspond to:
- C# = Pseudorange from the receiver to the satellite, in meters
- L# = Carrier phase, in cycles
- D# = Doppler, in Hz
- S# = Carrier-to-noise density ratio C/N$_0$, usually in dB-Hz (receiver-dependent)

The digits (as in S1, S2, S7) indicate the frequency band of the signal.
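A worked example helps relate these units. Multiplying carrier phase (cycles) or Doppler (Hz) by the carrier wavelength gives metres or metres per second. The sketch uses the GPS L1 frequency (1575.42 MHz) and the G01 values from the first row of the table above; the sign follows the RINEX convention that Doppler is positive for approaching satellites. Here the phase in metres lands within a few metres of C1, but in general carrier phase contains an unknown integer ambiguity, so this comparison is only indicative.

```python
# GPS L1 constants
C_LIGHT = 299_792_458.0          # speed of light in vacuum, m/s
F_L1 = 1575.42e6                 # GPS L1 carrier frequency, Hz
WAVELENGTH_L1 = C_LIGHT / F_L1   # about 0.1903 m

# G01 at 2021-04-28 21:07:08, copied from the table above
c1_m = 2.300005e07        # C1: pseudorange, m
l1_cycles = 1.208661e08   # L1: carrier phase, cycles
d1_hz = -2545.050         # D1: Doppler, Hz

# Carrier phase expressed in metres
l1_m = l1_cycles * WAVELENGTH_L1

# Range rate in m/s; positive means the satellite is moving away from the receiver
range_rate = -d1_hz * WAVELENGTH_L1

print(f"L1 phase: {l1_m:.0f} m, C1: {c1_m:.0f} m")
print(f"range rate: {range_rate:.1f} m/s")
```

For G01, the Doppler of -2545 Hz corresponds to the satellite receding at roughly 484 m/s.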

The azimuth and elevation of the satellite with respect to the receiver are expressed in degrees. The computation time for the azimuth and elevation depends on your hardware. Most of this time is spent interpolating the orbit parameters to the time stamp of each measurement, which is why it is often useful to resample high-frequency data (here, one measurement per second) to, for instance, one measurement every 15 seconds.

### resampling

We can pass `interval='15s'` to resample the data during preprocessing. The returned data frame will be smaller, and the calculation of the azimuths and elevations (reported as "SP3 interpolation" in the log) will be faster.

In [6]:
pattern = {'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O'}
result = gv.preprocess(pattern,interval='15s',outputresult=True)
# and show data frame
result['Dav2_Twr'][0].observation

Created a temporary directory at /tmp/tmpzpvxc68f
data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O exists | Reading...
Observation file  data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O  is read in 5.28 seconds.
Processing 112382 individual observations
Calculating Azimuth and Elevation
This file does not exist: /tmp/tmpzpvxc68f/GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3
Downloading: GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3.gz

GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3.gz: 0.98MB [00:00, 2.26MB/s]                            


 | Download completed for GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3.gz
/tmp/tmpzpvxc68f/GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3 file is read in 1.17 seconds
This file does not exist: /tmp/tmpzpvxc68f/GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3
Downloading: GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3.gz

GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3.gz: 0.98MB [00:00, 2.02MB/s]                            


 | Download completed for GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3.gz
/tmp/tmpzpvxc68f/GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3 file is read in 1.20 seconds
This file does not exist: /tmp/tmpzpvxc68f/GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK
Downloading: GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK.gz

GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK.gz: 4.54MB [00:00, 8.54MB/s]                            


 | Download completed for GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK.gz
/tmp/tmpzpvxc68f/GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK file is read in 3.91 seconds
This file does not exist: /tmp/tmpzpvxc68f/GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK
Downloading: GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK.gz

GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK.gz: 4.55MB [00:01, 4.56MB/s]                            


 | Download completed for GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK.gz
/tmp/tmpzpvxc68f/GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK file is read in 3.86 seconds
SP3 interpolation is done in 4.27 seconds
Removed the temporary directory at /tmp/tmpzpvxc68f


| Epoch | SV | C1 | C2 | C7 | D1 | D2 | D7 | L1 | L2 | L7 | S1 | S2 | S7 | Azimuth | Elevation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-04-28 21:07:00 | C06 | | 4.307927e+07 | 4.307927e+07 | -1539.026429 | -1539.026429 | -1189.841571 | 2.243250e+08 | 2.243250e+08 | 1.734648e+08 | 38.000000 | 38.000000 | 31.000000 | 36.614495 | 10.132689 |
| 2021-04-28 21:07:00 | C09 | | 4.080972e+07 | 4.080973e+07 | -1076.832857 | -1076.832857 | -832.746571 | 2.125070e+08 | 2.125070e+08 | 1.643239e+08 | 41.000000 | 41.000000 | 36.000000 | 49.034472 | 32.742503 |
| 2021-04-28 21:07:00 | C11 | | 2.596665e+07 | 2.596666e+07 | -3555.085714 | -3555.085714 | -2748.967571 | 1.352152e+08 | 1.352152e+08 | 1.045570e+08 | 43.428571 | 43.428571 | 41.000000 | 177.234549 | 35.079888 |
| 2021-04-28 21:07:00 | C14 | | 2.382462e+07 | 2.382462e+07 | 32.760429 | 32.760429 | 25.282429 | 1.240611e+08 | 1.240611e+08 | 9.593200e+07 | 45.000000 | 45.000000 | 42.285714 | -96.373353 | 76.785189 |
| 2021-04-28 21:07:00 | C16 | | 4.268828e+07 | 4.268830e+07 | -1595.855857 | -1595.855857 | -1234.117000 | 2.222891e+08 | 2.222891e+08 | 1.718881e+08 | 38.000000 | 38.000000 | 33.000000 | 38.266603 | 15.249211 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-04-28 22:07:00 | R18 | 2.326705e+07 | 2.326705e+07 | | -3848.505125 | -2993.250750 | | 1.242018e+08 | 9.660093e+07 | | 46.375000 | 42.000000 | | -167.160718 | 45.315939 |
| 2021-04-28 22:07:00 | R19 | 2.237872e+07 | 2.237872e+07 | | -52.467125 | -40.688500 | | 1.197111e+08 | 9.310873e+07 | | 37.375000 | 38.875000 | | -74.711561 | 63.385367 |
| 2021-04-28 22:07:00 | R20 | 2.596147e+07 | 2.596148e+07 | | 3565.564875 | 2773.281125 | | 1.388277e+08 | 1.079770e+08 | | 37.875000 | 34.750000 | | -26.146660 | 13.545577 |
| 2021-04-28 22:07:00 | S23 | 4.111048e+07 | | | -515.803000 | | | 2.160370e+08 | | | 44.750000 | | | | |


There are now fewer rows in the data frame, and the SP3 interpolation took 4.27 seconds instead of 22.82 seconds.
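The resampled epochs start at 21:07:00, although the first raw epoch was 21:07:08. Also, values such as 43.428571 (304/7) match an average over the seven 1 s epochs from 21:07:08 to 21:07:14. Both observations are consistent with averaging each satellite's observations within 15 s bins labelled by their start time. The sketch below reproduces that idea with plain pandas on made-up 1 Hz data. It is an assumption about the behaviour, not gnssvod's actual implementation.

```python
import numpy as np
import pandas as pd

# Made-up 1 Hz C/N0 series for two satellites over 30 seconds
epochs = pd.date_range("2021-04-28 21:07:00", periods=30, freq="s")
idx = pd.MultiIndex.from_product([epochs, ["G01", "G03"]], names=["Epoch", "SV"])
df = pd.DataFrame({"S1": np.arange(60, dtype=float)}, index=idx)

# Floor each epoch to the start of its 15 s bin, then average per (bin, satellite)
binned = df.reset_index()
binned["Epoch"] = binned["Epoch"].dt.floor("15s")
resampled = binned.groupby(["Epoch", "SV"]).mean()

print(resampled)  # 2 bins x 2 satellites = 4 rows instead of 60
```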

## Batch processing
We now use the preprocessing function to process many files and save the outputs as NetCDF files instead of returning them as objects. When processing hundreds of files, the combined outputs may not fit in memory, so it makes sense to save each processed file as a NetCDF file.

### Specifying several groups of files
Instead of specifying a single file, we give a UNIX-style pattern in the dictionary, and all files matching it are processed. Several groups of files (for example, one group per receiver) can be processed at once by giving one pattern per group (see below).
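Before launching a long batch run, it can be worth previewing which files a pattern will pick up. The sketch below uses Python's `glob` module on a temporary folder of empty placeholder files with hypothetical names. It assumes gnssvod's matching behaves like `glob`; the text above only states that patterns are UNIX-style. Note that `*.*O` is case-sensitive on Linux and skips, for example, a `.21N` navigation file.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as rinex_dir:
    # Empty placeholder files with hypothetical names
    for name in ["Reach_Dav2_Twr-raw_202104282106.21O",
                 "Reach_Dav2_Twr-raw_202104282206.21O",
                 "Reach_Dav2_Twr-raw_202104282106.21N"]:
        open(os.path.join(rinex_dir, name), "w").close()

    # Same UNIX-style pattern as in the batch example
    matched = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(rinex_dir, "*.*O")))

print(matched)
```

On the example data, `glob.glob('data_RINEX2.11/Dav2_Twr/rinex/*.*O')` gives the same kind of preview.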

### Specifying where to save data
As for the inputs, a dictionary indicates where to save the data, with one destination folder per group of files (using the same keys as the pattern dictionary). The function creates the destination folder if it does not exist.

### Specifying a list of variables to save
To calculate GNSS-VOD, we only need the "S" variables. The size of the saved NetCDF files can be reduced by discarding the other variables with the `keepvars` argument, which keeps only the variables matching the patterns in the passed list. These patterns support UNIX-style matching (e.g. 'S*' matches all variables starting with 'S').
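The list `['S?', 'S??']` used in the batch example keeps names made of "S" followed by exactly one or exactly two characters. The sketch below applies this UNIX-style matching to a list of column names with Python's `fnmatch`. It is not gnssvod's own code, and "S1C" is a hypothetical RINEX 3-style name that does not occur in this RINEX 2.11 example.

```python
import fnmatch

# Column names as they might appear in obs.observation ("S1C" is hypothetical)
columns = ["C1", "L1", "D1", "S1", "C2", "L2", "D2", "S2", "S7", "S1C",
           "Azimuth", "Elevation"]
keepvars = ["S?", "S??"]

# Keep a column if it matches at least one pattern (case-sensitive matching)
kept = [c for c in columns if any(fnmatch.fnmatchcase(c, p) for p in keepvars)]
print(kept)  # ['S1', 'S2', 'S7', 'S1C']
```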

### Compression
Unless `encoding=None` is passed as an argument, `gv.preprocess()` compresses all S* variables, as well as Azimuth and Elevation, when saving to NetCDF. These variables are encoded as Int16 with a scale factor of 0.1, and the decoding is applied automatically when the data are read with xarray.
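The arithmetic behind this packing can be sketched with NumPy. Values are divided by the scale factor, rounded and stored as 16-bit integers. On reading, they are multiplied back by the scale factor, following the CF convention used by xarray. In xarray, such packing is normally requested through the `encoding` argument of `Dataset.to_netcdf`, e.g. `{'S1': {'dtype': 'int16', 'scale_factor': 0.1}}`; whether gnssvod builds exactly this dictionary, and which fill value it uses, are assumptions (the `-32767` below is hypothetical).

```python
import numpy as np

SCALE_FACTOR = 0.1
FILL_VALUE = -32767  # hypothetical marker for missing values

# C/N0 values (dB-Hz) from the resampled table above, plus one gap
s1 = np.array([43.428571, 38.0, np.nan])

# Pack: divide by the scale factor, round to the nearest integer, store as int16
packed = np.where(np.isnan(s1), FILL_VALUE, np.round(s1 / SCALE_FACTOR)).astype(np.int16)

# Unpack: value = packed * scale_factor, with NaN restored for gaps
decoded = np.where(packed == FILL_VALUE, np.nan, packed * SCALE_FACTOR)

print(packed.tolist())  # [434, 380, -32767]
print(decoded)
```

Values are thus kept to the nearest 0.1 (dB-Hz or degrees), and the int16 range limits them to roughly ±3276.7, which comfortably covers C/N0, azimuth and elevation.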

In [9]:
# use gnssvod to batch process the observation RINEX files 
# (files with extension .yyO for each station)
# pattern = {'choice_of_name_for_station1':'pattern to match (UNIX-style)',
#            'choice_of_name_for_station2':'pattern to match (UNIX-style)',
#             ...}
#
pattern = {'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/rinex/*.*O',
          'Dav1_Grnd':'data_RINEX2.11/Dav1_Grnd/rinex/*.*O'}
outputdir = {'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/nc/',
            'Dav1_Grnd':'data_RINEX2.11/Dav1_Grnd/nc/'}
# what variables should be kept
keepvars = ['S?','S??']

gv.preprocess(pattern,interval='15s',keepvars=keepvars,outputdir=outputdir)

Created a temporary directory at /tmp/tmppiyvotr2
Could not find any files matching the pattern data_RINEX2.11/Dav2_Twr/nc/*.nc
data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O exists | Reading...
Observation file  data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282106.21O  is read in 5.36 seconds.
Processing 112382 individual observations
Calculating Azimuth and Elevation
This file does not exist: /tmp/tmppiyvotr2/GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3
Downloading: GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3.gz

GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3.gz: 0.98MB [00:00, 2.24MB/s]                            


 | Download completed for GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3.gz
/tmp/tmppiyvotr2/GFZ0MGXRAP_20211180000_01D_05M_ORB.SP3 file is read in 1.25 seconds
This file does not exist: /tmp/tmppiyvotr2/GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3
Downloading: GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3.gz

GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3.gz: 0.98MB [00:00, 1.98MB/s]                            


 | Download completed for GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3.gz
/tmp/tmppiyvotr2/GFZ0MGXRAP_20211190000_01D_05M_ORB.SP3 file is read in 1.22 seconds
This file does not exist: /tmp/tmppiyvotr2/GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK
Downloading: GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK.gz

GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK.gz: 4.54MB [00:00, 7.73MB/s]                            


 | Download completed for GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK.gz
/tmp/tmppiyvotr2/GFZ0MGXRAP_20211180000_01D_30S_CLK.CLK file is read in 4.46 seconds
This file does not exist: /tmp/tmppiyvotr2/GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK
Downloading: GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK.gz

GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK.gz: 4.55MB [00:00, 7.66MB/s]                            


 | Download completed for GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK.gz
/tmp/tmppiyvotr2/GFZ0MGXRAP_20211190000_01D_30S_CLK.CLK file is read in 3.66 seconds
SP3 interpolation is done in 4.28 seconds
Saved 7550 individual observations in data_RINEX2.11/Dav2_Twr/nc/Reach_Dav2_Twr-raw_202104282106.nc
data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282206.21O exists | Reading...
Observation file  data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282206.21O  is read in 5.40 seconds.
Processing 112113 individual observations
Calculating Azimuth and Elevation
Saved 7533 individual observations in data_RINEX2.11/Dav2_Twr/nc/Reach_Dav2_Twr-raw_202104282206.nc
data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282306.21O exists | Reading...
Observation file  data_RINEX2.11/Dav2_Twr/rinex/Reach_Dav2_Twr-raw_202104282306.21O  is read in 5.47 seconds.
Processing 115429 individual observations
Calculating Azimuth and Elevation
Saved 7756 individual observations in data_RINEX2.11/Dav2_Twr

### Skipping existing files by default
The preprocess function scans the destination folder for existing NetCDF files. Input files that have already been processed are skipped unless `overwrite=True` is passed.

Here, because the destination folder was empty, the log above contains a warning that can safely be ignored ("Could not find any files matching the pattern data_RINEX2.11/Dav2_Twr/nc/*.nc").
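The skipping logic can be pictured as comparing file names without their extensions. The log shows `Reach_Dav2_Twr-raw_202104282106.21O` being saved as `Reach_Dav2_Twr-raw_202104282106.nc`. The helper below, `files_to_process`, is hypothetical and assumes exactly that naming rule; gnssvod's own check may work differently.

```python
from pathlib import Path


def files_to_process(rinex_files, existing_nc_files, overwrite=False):
    """Hypothetical helper: keep only RINEX files without a matching NetCDF output.

    Assumes the output name is the RINEX file name with its extension
    replaced by '.nc', as seen in the log above.
    """
    if overwrite:
        return list(rinex_files)
    done = {Path(f).stem for f in existing_nc_files}
    return [f for f in rinex_files if Path(f).stem not in done]


rinex = ["rinex/Reach_Dav2_Twr-raw_202104282106.21O",
         "rinex/Reach_Dav2_Twr-raw_202104282206.21O"]
existing = ["nc/Reach_Dav2_Twr-raw_202104282106.nc"]

print(files_to_process(rinex, existing))                  # only the 2206 file is left
print(files_to_process(rinex, existing, overwrite=True))  # both files
```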