### Basic zip archive data access and usage
For simplicity, the following example loads only 200 sweeps directly from the zipfile into pandas DataFrame objects. These operations can be significantly accelerated using dask instead (see `dask_processing.ipynb`).

To run this yourself, download a zip file archive and set `data_path` to its location.

In [1]:
# this is a development environment hack!
# after pip install of sea_ingest, you'd just import sea_ingest
import __init__ as sea_ingest
from labbench import stopwatch

data_path = 'data/NIT-2022-12-13.zip'
with stopwatch('ziparchive read'):
    dfs = sea_ingest.read_seamf_zipfile(data_path, allow=200, tz="America/New_York")

INFO 2023-04-04 09:41:01.833 • labbench: ziparchive read 3.057 s elapsed


### Returned dictionary structure
The data are returned as a dictionary of `pd.DataFrame` objects, keyed by data product or metadata type.
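
As a quick way to see what such a dictionary contains, the sketch below lists each table with its shape. It uses a small stand-in dictionary rather than real data; only the `'pfp'` key appears in this notebook, and `'example_metadata'` is a hypothetical key used for illustration.

```python
import pandas as pd

# Stand-in for the dictionary returned by read_seamf_zipfile.
# Only 'pfp' appears in this notebook; 'example_metadata' is a hypothetical key.
dfs = {
    "pfp": pd.DataFrame([[-100.3125, -100.5]], columns=[0.0, 0.000018]),
    "example_metadata": pd.DataFrame({"value": [1.0]}),
}

# Summarize each table by name and shape (rows, columns)
shapes = {name: df.shape for name, df in dfs.items()}
for name, shape in shapes.items():
    print(f"{name}: {shape[0]} rows x {shape[1]} columns")
```

The same loop should work on the real `dfs` once the archive has been read.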

### DataFrame structure
The data products are arranged as tables.
* The trace axis (time elapsed, FFT bin frequency, etc.) is given by the `columns` attribute
* The trace index fields (timestamp, RF center frequency, and any trace specifications such as the detector) are arranged as levels of a multilevel index

Some advantages to arranging the table this way:
* All data values (below, `dfs['pfp'].values`) are the same kind of quantity, in this case dBm/10 MHz. (TODO: attach units with pint :))
    - This means that operations like `10**(dfs['pfp']/10)` act only on the data values and leave the index untouched
* We can use _any_ of the indexing metadata fields to query subsets of the data quickly
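
A minimal sketch of the first point, using a synthetic table in place of `dfs['pfp']` (the constant -100 dBm values are made up): the conversion from dBm to linear power changes only the data values, and the index levels are carried through unchanged.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for dfs['pfp']: a constant -100 dBm in every cell
index = pd.MultiIndex.from_product(
    [[3.555e9, 3.695e9], ["max", "min"], ["peak", "rms"]],
    names=["frequency", "capture_statistic", "detector"],
)
columns = pd.Index([0.0, 0.000018, 0.000036], name="Frame time elapsed (s)")
pfp_dbm = pd.DataFrame(-100.0, index=index, columns=columns)

# Convert dBm to linear power in mW; only the data values change
pfp_mw = 10 ** (pfp_dbm / 10)

# Average each trace in linear units, then convert back to dBm
mean_dbm = 10 * np.log10(pfp_mw.mean(axis=1))

print(pfp_mw.index.equals(pfp_dbm.index))
```

Averaging in linear units before converting back to dB is one common choice; whether it is the right statistic depends on the analysis.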

In [2]:
dfs['pfp']

(columns: Frame time elapsed (s))
datetime,frequency,capture_statistic,detector,0.000000,0.000018,0.000036,0.000054,0.000071,0.000089,0.000107,0.000125,0.000143,0.000161,...,0.009821,0.009839,0.009857,0.009875,0.009893,0.009911,0.009929,0.009946,0.009964,0.009982
2022-10-26 20:50:03.701000-04:00,3.555000e+09,min,rms,-101.5625,-101.5000,-101.5000,-101.5625,-101.4375,-101.5000,-101.3750,-101.5625,-101.5000,-101.3750,...,-101.6875,-101.6250,-101.4375,-101.4375,-101.4375,-101.4375,-101.5000,-101.5625,-101.5000,-101.5000
2022-10-26 20:50:03.701000-04:00,3.555000e+09,max,rms,-100.3125,-100.5000,-100.0625,-100.2500,-100.3125,-100.2500,-100.3750,-100.2500,-100.3750,-100.2500,...,-100.4375,-100.2500,-100.3125,-100.3750,-100.3125,-100.3125,-100.3125,-100.3125,-100.3125,-100.3750
2022-10-26 20:50:03.701000-04:00,3.555000e+09,mean,rms,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,...,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750
2022-10-26 20:50:03.701000-04:00,3.555000e+09,min,peak,-94.8125,-94.3750,-94.6250,-94.5000,-94.4375,-94.1875,-94.6250,-94.6875,-95.1250,-94.7500,...,-95.1875,-94.6875,-94.7500,-94.8750,-94.4375,-94.7500,-94.5625,-94.8125,-94.7500,-94.8750
2022-10-26 20:50:03.701000-04:00,3.555000e+09,max,peak,-89.0625,-90.2500,-89.8750,-89.1875,-90.0625,-90.8750,-90.1250,-89.8750,-90.1250,-89.8750,...,-90.1250,-90.5625,-89.3125,-89.9375,-89.4375,-90.1875,-89.8125,-89.6875,-89.8750,-90.1875
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-10-27 01:41:27.014000-04:00,3.695000e+09,max,rms,-100.1250,-100.1250,-100.2500,-100.1250,-100.1250,-100.1875,-100.0000,-100.0625,-100.1875,-100.1250,...,-100.0625,-100.1875,-100.1250,-100.0625,-100.0625,-100.1250,-100.1250,-99.9375,-100.1250,-100.1250
2022-10-27 01:41:27.014000-04:00,3.695000e+09,mean,rms,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,...,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875,-100.6875
2022-10-27 01:41:27.014000-04:00,3.695000e+09,min,peak,-94.2500,-94.5000,-94.6875,-94.3750,-94.1250,-94.5000,-95.0000,-94.3750,-94.5000,-94.6250,...,-94.2500,-94.3125,-94.4375,-94.3750,-94.1875,-94.5000,-94.6875,-94.4375,-94.5000,-95.0000
2022-10-27 01:41:27.014000-04:00,3.695000e+09,max,peak,-90.4375,-90.2500,-89.8750,-89.6250,-90.2500,-89.5000,-90.0000,-89.8125,-90.3750,-89.3750,...,-90.3125,-88.9375,-90.3125,-89.9375,-90.1875,-89.3750,-90.1875,-89.6250,-90.1875,-90.1875


### Quick indexing tutorial

You can select rows by the values of each index level. One way is the `.loc` accessor with `axis=0`, which applies every slice to the index levels (otherwise the second field would apply to the columns). For example:

In [3]:
dfs['pfp'].loc(axis=0)[:'2022-10-26 20:55', 3.555e9, 'max', 'rms']

(columns: Frame time elapsed (s))
datetime,frequency,capture_statistic,detector,0.000000,0.000018,0.000036,0.000054,0.000071,0.000089,0.000107,0.000125,0.000143,0.000161,...,0.009821,0.009839,0.009857,0.009875,0.009893,0.009911,0.009929,0.009946,0.009964,0.009982
2022-10-26 20:50:03.701000-04:00,3555000000.0,max,rms,-100.3125,-100.5,-100.0625,-100.25,-100.3125,-100.25,-100.375,-100.25,-100.375,-100.25,...,-100.4375,-100.25,-100.3125,-100.375,-100.3125,-100.3125,-100.3125,-100.3125,-100.3125,-100.375
2022-10-26 20:51:27.659000-04:00,3555000000.0,max,rms,-78.75,-77.625,-76.9375,-78.5,-77.5,-79.625,-85.6875,-86.8125,-88.1875,-88.25,...,-85.8125,-88.25,-88.0,-88.0625,-86.875,-83.0625,-82.25,-78.5625,-77.625,-78.25
2022-10-26 20:52:51.837000-04:00,3555000000.0,max,rms,-100.1875,-83.6875,-82.8125,-81.3125,-79.25,-77.3125,-79.0625,-78.875,-78.625,-77.125,...,-100.3125,-100.375,-100.0625,-100.25,-100.25,-100.3125,-100.3125,-100.125,-100.25,-100.375
2022-10-26 20:54:16.029000-04:00,3555000000.0,max,rms,-80.8125,-78.9375,-79.375,-78.25,-79.4375,-82.9375,-82.6875,-83.0,-82.5,-83.125,...,-77.9375,-78.4375,-80.8125,-84.0,-86.625,-87.5625,-83.9375,-82.125,-82.625,-81.3125
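
For readers without the archive at hand, here is a self-contained sketch of the same kind of query on a small synthetic table (timestamps and values are made up). It also shows what should be an equivalent spelling with `pd.IndexSlice`, where the row key and the column key are passed separately.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for dfs['pfp'] (values are arbitrary, not measured data)
times = pd.DatetimeIndex(
    ["2022-10-26 20:50:03", "2022-10-26 20:52:51", "2022-10-26 21:01:00"],
    tz="America/New_York",
)
index = pd.MultiIndex.from_product(
    [times, [3.555e9, 3.695e9], ["max", "min"], ["peak", "rms"]],
    names=["datetime", "frequency", "capture_statistic", "detector"],
)
columns = pd.Index([0.0, 0.000018], name="Frame time elapsed (s)")
data = np.arange(len(index) * len(columns), dtype=float).reshape(len(index), -1)

# Slicing a MultiIndex requires a sorted index
df = pd.DataFrame(data, index=index, columns=columns).sort_index()

# axis=0: every element of the key applies to an index level
by_axis = df.loc(axis=0)[:"2022-10-26 20:55", 3.555e9, "max", "rms"]

# Same selection with IndexSlice for the rows and ':' for all columns
idx = pd.IndexSlice
by_slice = df.loc[idx[:"2022-10-26 20:55", 3.555e9, "max", "rms"], :]

print(by_axis)
```

Both forms keep all four index levels in the result, as in the output above.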


In many cases, we'd like to select a single value in a given level, especially for categorical data like the string-valued `capture_statistic` and `detector` fields. For this, pandas provides the `.xs` (cross-section) method.

In [4]:
dfs['pfp'].xs(key=3.555e9, level='frequency')

(columns: Frame time elapsed (s))
datetime,capture_statistic,detector,0.000000,0.000018,0.000036,0.000054,0.000071,0.000089,0.000107,0.000125,0.000143,0.000161,...,0.009821,0.009839,0.009857,0.009875,0.009893,0.009911,0.009929,0.009946,0.009964,0.009982
2022-10-26 20:50:03.701000-04:00,min,rms,-101.5625,-101.5000,-101.5000,-101.5625,-101.4375,-101.5000,-101.3750,-101.5625,-101.5000,-101.375,...,-101.6875,-101.6250,-101.4375,-101.4375,-101.4375,-101.4375,-101.5000,-101.5625,-101.5000,-101.5000
2022-10-26 20:50:03.701000-04:00,max,rms,-100.3125,-100.5000,-100.0625,-100.2500,-100.3125,-100.2500,-100.3750,-100.2500,-100.3750,-100.250,...,-100.4375,-100.2500,-100.3125,-100.3750,-100.3125,-100.3125,-100.3125,-100.3125,-100.3125,-100.3750
2022-10-26 20:50:03.701000-04:00,mean,rms,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.875,...,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750,-100.8750
2022-10-26 20:50:03.701000-04:00,min,peak,-94.8125,-94.3750,-94.6250,-94.5000,-94.4375,-94.1875,-94.6250,-94.6875,-95.1250,-94.750,...,-95.1875,-94.6875,-94.7500,-94.8750,-94.4375,-94.7500,-94.5625,-94.8125,-94.7500,-94.8750
2022-10-26 20:50:03.701000-04:00,max,peak,-89.0625,-90.2500,-89.8750,-89.1875,-90.0625,-90.8750,-90.1250,-89.8750,-90.1250,-89.875,...,-90.1250,-90.5625,-89.3125,-89.9375,-89.4375,-90.1875,-89.8125,-89.6875,-89.8750,-90.1875
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-10-27 01:40:14.976000-04:00,max,rms,-77.9375,-77.8125,-78.9375,-79.5000,-83.5000,-85.9375,-85.5000,-87.0000,-88.3750,-85.500,...,-79.3125,-79.9375,-79.3750,-77.9375,-79.1250,-78.7500,-76.6875,-77.1250,-76.5000,-77.5625
2022-10-27 01:40:14.976000-04:00,mean,rms,-83.7500,-83.9375,-84.1875,-86.1250,-87.3125,-88.6250,-88.1875,-89.3750,-90.1250,-89.500,...,-82.8125,-82.8125,-82.5625,-82.8750,-82.8750,-84.2500,-83.8750,-83.8125,-83.9375,-83.5625
2022-10-27 01:40:14.976000-04:00,min,peak,-80.0625,-79.8750,-79.8125,-80.8750,-79.9375,-80.1875,-80.8125,-80.3750,-80.4375,-80.500,...,-77.8750,-77.5625,-78.2500,-78.5625,-78.1875,-82.0000,-85.2500,-85.1875,-84.5625,-83.5000
2022-10-27 01:40:14.976000-04:00,max,peak,-71.8750,-70.8750,-72.3750,-73.5000,-73.8750,-73.8750,-73.8750,-73.8750,-74.7500,-74.000,...,-68.9375,-70.8750,-69.7500,-69.0000,-70.1875,-71.1875,-73.4375,-74.0625,-72.7500,-72.3750
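
Note that the `frequency` level no longer appears in the output above: by default `.xs` drops the level it selects on. The sketch below, on a synthetic table with made-up values, contrasts the default with `drop_level=False`, which keeps the level in the result.

```python
import pandas as pd

# Synthetic stand-in for dfs['pfp'] without the datetime level
index = pd.MultiIndex.from_product(
    [[3.555e9, 3.695e9], ["max", "min"], ["peak", "rms"]],
    names=["frequency", "capture_statistic", "detector"],
)
columns = pd.Index([0.0, 0.000018], name="Frame time elapsed (s)")
df = pd.DataFrame(-100.0, index=index, columns=columns).sort_index()

# Default: the selected level is removed from the index
dropped = df.xs(key=3.555e9, level="frequency")

# drop_level=False keeps 'frequency' as an index level
kept = df.xs(key=3.555e9, level="frequency", drop_level=False)

print(list(dropped.index.names))
print(list(kept.index.names))
```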


To select single values on several index levels of a given data product at once, you can also use `sea_ingest.trace`. In the following example, notice that this drops the selected levels from the index.

In [5]:
sea_ingest.trace(dfs, 'pfp', frequency=3.555e9, capture_statistic='max', detector='rms').loc[:'2022-10-26 20:55']

(columns: Frame time elapsed (s))
datetime,0.000000,0.000018,0.000036,0.000054,0.000071,0.000089,0.000107,0.000125,0.000143,0.000161,...,0.009821,0.009839,0.009857,0.009875,0.009893,0.009911,0.009929,0.009946,0.009964,0.009982
2022-10-26 20:50:03.701000-04:00,-100.3125,-100.5,-100.0625,-100.25,-100.3125,-100.25,-100.375,-100.25,-100.375,-100.25,...,-100.4375,-100.25,-100.3125,-100.375,-100.3125,-100.3125,-100.3125,-100.3125,-100.3125,-100.375
2022-10-26 20:51:27.659000-04:00,-78.75,-77.625,-76.9375,-78.5,-77.5,-79.625,-85.6875,-86.8125,-88.1875,-88.25,...,-85.8125,-88.25,-88.0,-88.0625,-86.875,-83.0625,-82.25,-78.5625,-77.625,-78.25
2022-10-26 20:52:51.837000-04:00,-100.1875,-83.6875,-82.8125,-81.3125,-79.25,-77.3125,-79.0625,-78.875,-78.625,-77.125,...,-100.3125,-100.375,-100.0625,-100.25,-100.25,-100.3125,-100.3125,-100.125,-100.25,-100.375
2022-10-26 20:54:16.029000-04:00,-80.8125,-78.9375,-79.375,-78.25,-79.4375,-82.9375,-82.6875,-83.0,-82.5,-83.125,...,-77.9375,-78.4375,-80.8125,-84.0,-86.625,-87.5625,-83.9375,-82.125,-82.625,-81.3125
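
To give a feel for this kind of multi-level selection, here is a rough sketch built on plain pandas `.xs`. The helper `select_trace` is hypothetical and is not the `sea_ingest` implementation; it only mimics the behavior seen above (select one value per named level, then drop those levels) on a synthetic table with made-up values.

```python
import numpy as np
import pandas as pd


def select_trace(df: pd.DataFrame, **level_values) -> pd.DataFrame:
    """Hypothetical helper: select one value on each named index level.

    The selected levels are dropped from the index of the result.
    """
    return df.xs(key=tuple(level_values.values()), level=list(level_values.keys()))


# Synthetic stand-in for dfs['pfp'] (values are arbitrary, not measured data)
times = pd.DatetimeIndex(
    ["2022-10-26 20:50:03", "2022-10-26 20:52:51", "2022-10-26 21:01:00"],
    tz="America/New_York",
)
index = pd.MultiIndex.from_product(
    [times, [3.555e9, 3.695e9], ["max", "min"], ["peak", "rms"]],
    names=["datetime", "frequency", "capture_statistic", "detector"],
)
columns = pd.Index([0.0, 0.000018], name="Frame time elapsed (s)")
data = np.arange(len(index) * len(columns), dtype=float).reshape(len(index), -1)
df = pd.DataFrame(data, index=index, columns=columns).sort_index()

# Only the datetime level remains, so a plain time slice works afterwards
result = select_trace(df, frequency=3.555e9, capture_statistic="max", detector="rms")
early = result.loc[:"2022-10-26 20:55"]

print(early)
```

Passing the levels as keyword arguments keeps the call readable and independent of the order of the index levels.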
