In [12]:
import datetime
import pprint
import pyucrio

rio = pyucrio.PyUCRio()

# Downloading data

PyUCRio allows you to download data for a given dataset, time frame, and optionally the site. A progress bar is shown by default, and it can be disabled or modified using the optional parameters. The output path of the downloaded data can be modified when you initialize the `pyucrio.PyUCRio()` object. We show an example of this near the bottom of this crib sheet.

To figure out the dataset name that we want to download data for, we can use the `rio.data.list_datasets()` function, or navigate to the [Dataset Descriptions](https://data.phys.ucalgary.ca/about_datasets) page and dive into a particular instrument array page.

Below we are going to download some NORSTAR riometer data. We will use the `NORSTAR_RIOMETER_K0_TXT` dataset name, and the `start`, `end`, and `site_uid` parameters to filter further.

In [3]:
# download a few days of NORSTAR riometer data from Churchill
dataset_name = "NORSTAR_RIOMETER_K0_TXT"
start_dt = datetime.datetime(2021, 11, 3, 0, 0)
end_dt = datetime.datetime(2021, 11, 5, 23, 59)
site_uid = "chur"
r = rio.data.ucalgary.download(dataset_name, start_dt, end_dt, site_uid=site_uid)

Downloading NORSTAR_RIOMETER_K0_TXT files:   0%|          | 0.00/1.61M [00:00<?, ?B/s]

In [4]:
# view information about the downloaded data
r.pretty_print()

FileListingResponse:
  count             : 3
  dataset           : Dataset(name=NORSTAR_RIOMETER_K0_TXT, short_description='NORSTAR Single Frequency Riometers K0 raw data, in ASCII format', provider='UCalgary', level='L0', doi_details='https://commons.datacite.org/doi.org/10.11575/afyx-m516', ...)
  filenames         : [3 filenames]
  output_root_path  : /home/darrenc/pyucrio_data/NORSTAR_RIOMETER_K0_TXT
  total_bytes       : 0


In [5]:
# download a few days of data from all NORSTAR riometer sites
dataset_name = "NORSTAR_RIOMETER_K0_TXT"
start_dt = datetime.datetime(2021, 11, 3, 0, 0)
end_dt = datetime.datetime(2021, 11, 5, 23, 59)
_ = rio.data.ucalgary.download(dataset_name, start_dt, end_dt)

Downloading NORSTAR_RIOMETER_K0_TXT files:   0%|          | 0.00/17.7M [00:00<?, ?B/s]

## Changing the download location

To change where data is downloaded, set the `download_output_root_path` attribute on the `PyUCRio` object that was initialized at the beginning of this crib sheet.

Note that the code below is intentionally commented out; we only want to show how to do this, not actually do it.

In [6]:
# NOTE: the path you set can be a regular string path (nice for Linux and Mac)
# or a pathlib Path() object (nice for Windows).

#------------------
# rio.download_output_root_path = "some path"
#
# import pathlib
# rio.download_output_root_path = pathlib.Path("some path")

## Customizing the progress bar

You also have some control over the progress bar: certain methods accept additional parameters that let you customize it.

For the `download()` method, the following are available to you:

- `progress_bar_disable`: Disable the progress bar.
- `progress_bar_ncols`: Set the width of the progress bar.
- `progress_bar_ascii`: Change the ASCII character used in the progress bar.
- `progress_bar_desc`: Change the description prefix for the progress bar.

Excluding `progress_bar_disable`, all of the `progress_bar_*` parameters are passed straight through to the tqdm progress bar function. Further details can be found in the [tqdm documentation](https://tqdm.github.io/docs/tqdm/#tqdm-objects).

You can also change the progress bar style globally, using the `rio.progress_bar_backend` attribute.

Note that the code below is intentionally commented out; we only want to show how to do this, not actually do it.

In [7]:
# disable the progress bar in a download() call
# -----------------------------------------------
# r = rio.data.ucalgary.download(dataset_name, start_dt, end_dt, progress_bar_disable=True)

# globally set the progress bar style
# --------------------------------------
# rio.progress_bar_backend = "standard"
# rio.progress_bar_backend = "notebook"
# rio.progress_bar_backend = "auto"  # the default


# NOTE: Just a heads up, if you're working in Spyder, the tqdm progress bar PyUCRio uses doesn't 
# get detected properly. So setting the progress bar to 'standard' is recommended in this circumstance.

# Read data

Downloading data is only one part of the process. So that you do not have to download data repeatedly, downloading and reading are split into two separate functions, `download()` and `read()`.

The data reading routines are simple at the core. They take in a list of filenames on your computer, read in those files, and return the results back as an object. Be sure to pass in only one type of data at a time, otherwise the read routine will get rather confused!

The advantage of this is that the read function just needs filenames. You can download data to any storage medium and manually use `glob`-like functions to get the filenames. This can be beneficial if you don't have an internet connection at the time but have already downloaded the data. Alternatively, you can simply run the `download()` function repeatedly; it will not re-download data you already have unless the `overwrite` parameter is enabled.
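
As a minimal sketch of the offline workflow described above, the following collects filenames with the standard library's `pathlib` globbing. The directory layout, the `.txt` extension, and the `find_data_files` helper are assumptions made for illustration; they are not part of PyUCRio. A temporary directory stands in for your download location so the snippet runs on its own.

```python
import pathlib
import tempfile


def find_data_files(root, pattern="*.txt"):
    """Recursively collect files under `root` matching `pattern`, sorted by path.

    Hypothetical helper for illustration; not part of PyUCRio.
    """
    return sorted(str(p) for p in pathlib.Path(root).rglob(pattern) if p.is_file())


# build a small stand-in directory tree (assumed layout, for demonstration only)
with tempfile.TemporaryDirectory() as tmp_dir:
    day_dir = pathlib.Path(tmp_dir) / "2021" / "11" / "03"
    day_dir.mkdir(parents=True)
    (day_dir / "example_b.txt").write_text("placeholder")
    (day_dir / "example_a.txt").write_text("placeholder")
    (day_dir / "notes.log").write_text("not a data file")

    # only the .txt files are picked up, in sorted order
    filenames = find_data_files(tmp_dir)
    found_names = [pathlib.Path(f).name for f in filenames]
    print(found_names)
```

A list built this way could then be passed to the read function in place of `r.filenames`, as long as every file in it belongs to the same dataset.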

There are two methods that can be used for reading data:

1) using the generic method
2) using a specific dataset read function

The generic method is the recommended way as it is simpler. However, if more control is wanted then you can use the specific read functions directly. The generic method simply uses the dataset name to determine which specific read function to use.

In [8]:
# we will show the generic method first, since it is the easiest way
#
# NOTE: we are reading the few days of data from Churchill that we downloaded
# earlier, using 2 parallel processes to improve performance
data = rio.data.ucalgary.read(r.dataset, r.filenames, n_parallel=2)

print(data)
print()

data.pretty_print()

Data(data=[3 RiometerData objects], timestamp=[3 datetimes], metadata=[3 dictionaries], problematic_files=[], calibrated_data=None, dataset=Dataset(name=NORSTAR_RIOMETER_K0_TXT, short_description='NORSTAR Single Fre...))

Data:
  data               : [3 RiometerData objects]
  timestamp          : [3 datetimes]
  metadata           : [3 dictionaries]
  problematic_files  : []
  calibrated_data    : None
  dataset            : Dataset(name=NORSTAR_RIOMETER_K0_TXT, short_description='NORSTAR Single Fre...)


In [9]:
# Since we know we're reading in NORSTAR riometer k0 data, we can also use 
# the specific read routine. Use these specific read functions if you want
# more control than the simpler read() function.
data = rio.data.ucalgary.readers.read_norstar_riometer(r.filenames, n_parallel=2, dataset=r.dataset)

print(data)
print()

data.pretty_print()

Data(data=[3 RiometerData objects], timestamp=[3 datetimes], metadata=[3 dictionaries], problematic_files=[], calibrated_data=None, dataset=Dataset(name=NORSTAR_RIOMETER_K0_TXT, short_description='NORSTAR Single Fre...))

Data:
  data               : [3 RiometerData objects]
  timestamp          : [3 datetimes]
  metadata           : [3 dictionaries]
  problematic_files  : []
  calibrated_data    : None
  dataset            : Dataset(name=NORSTAR_RIOMETER_K0_TXT, short_description='NORSTAR Single Fre...)


# Reading portions of data

You can use the `start_time` and `end_time` parameters to help control data reading to certain periods of time, even though you pass in files with more data in them.

Let's have a look at an example of this.

In [21]:
# download a few days of data from the Churchill riometer site
dataset_name = "NORSTAR_RIOMETER_K2_TXT"
start_dt = datetime.datetime(2008, 3, 1, 0, 0)
end_dt = datetime.datetime(2008, 3, 5, 23, 59)
site_uid = "chur"
r = rio.data.ucalgary.download(dataset_name, start_dt, end_dt, site_uid=site_uid)

# now let's only read in a bit of that data
start_dt = datetime.datetime(2008, 3, 2, 12, 0)
end_dt = datetime.datetime(2008, 3, 3, 11, 59)
data = rio.data.ucalgary.read(r.dataset, r.filenames, n_parallel=2, start_time=start_dt, end_time=end_dt)

# show the data objects
print(data)
print()
data.pretty_print()
print()
data.data[0].pretty_print()

# show the first and last 5 timestamps
print()
pprint.pprint(data.data[0].timestamp[0:5])
print()
pprint.pprint(data.data[-1].timestamp[-5:])

Downloading NORSTAR_RIOMETER_K2_TXT files:   0%|          | 0.00/3.81M [00:00<?, ?B/s]

Data(data=[2 RiometerData objects], timestamp=[2 datetimes], metadata=[2 dictionaries], problematic_files=[], calibrated_data=None, dataset=Dataset(name=NORSTAR_RIOMETER_K2_TXT, short_description='NORSTAR Single Fre...))

Data:
  data               : [2 RiometerData objects]
  timestamp          : [2 datetimes]
  metadata           : [2 dictionaries]
  problematic_files  : []
  calibrated_data    : None
  dataset            : Dataset(name=NORSTAR_RIOMETER_K2_TXT, short_description='NORSTAR Single Fre...)

RiometerData:
  timestamp   : array(dims=(8640,), dtype=object)
  raw_signal  : array(dims=(8640,), dtype=float32)
  absorption  : array(dims=(8640,), dtype=float32)

array([datetime.datetime(2008, 3, 2, 12, 0),
       datetime.datetime(2008, 3, 2, 12, 0, 5),
       datetime.datetime(2008, 3, 2, 12, 0, 10),
       datetime.datetime(2008, 3, 2, 12, 0, 15),
       datetime.datetime(2008, 3, 2, 12, 0, 20)], dtype=object)

array([datetime.datetime(2008, 3, 3, 11, 58, 40),
       datetime.
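
As a quick sanity check on the array sizes shown above, the sketch below rebuilds a 5-second timestamp grid for one day and keeps only the records inside the requested window. The 5-second cadence is inferred from the printed timestamps; the one-file-per-day layout and the inclusive comparison are assumptions for illustration, not a description of how PyUCRio filters internally.

```python
import datetime

# assumed: one record every 5 seconds, one file per day
cadence = datetime.timedelta(seconds=5)
day_start = datetime.datetime(2008, 3, 2, 0, 0)
records_per_day = int(datetime.timedelta(days=1) / cadence)
timestamps = [day_start + i * cadence for i in range(records_per_day)]

# keep only the records inside the requested window
start_dt = datetime.datetime(2008, 3, 2, 12, 0)
end_dt = datetime.datetime(2008, 3, 3, 11, 59)
selected = [t for t in timestamps if start_dt <= t <= end_dt]

print(len(timestamps), len(selected))
print(selected[0], selected[-1])
```

Half a day at this cadence is 8640 records, which matches the `dims=(8640,)` reported for the first `RiometerData` object.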

# Managing downloaded data

Managing data is hard! For the riometer data, the biggest concern to keep in mind is the available storage. The riometer data is quite small in comparison to other datasets (like All-sky Imager data used in PyAuroraX), but is still something to pay attention to. 

To help with this, we have some utility functions at your fingertips. The `show_data_usage()` function can help show you how much data is on your computer in PyUCRio's download output root path. Then `purge_download_output_root_path()` can delete all the data in that directory.

In [13]:
# to view the amount of data that is currently downloaded, do the following
rio.show_data_usage()

Dataset name              Size   
NORSTAR_RIOMETER_K0_TXT   17.7 MB
SWAN_HSR_K0_H5            7.4 MB 
NORSTAR_RIOMETER_K2_TXT   2.3 MB 

Total size: 27.4 MB


In [14]:
# alternatively, you can get the data usage information returned as a dictionary
data_usage_dict = rio.show_data_usage(return_dict=True)
pprint.pprint(data_usage_dict)

{'NORSTAR_RIOMETER_K0_TXT': {'path_obj': PosixPath('/home/darrenc/pyucrio_data/NORSTAR_RIOMETER_K0_TXT'),
                             'size_bytes': 17700276,
                             'size_str': '17.7 MB'},
 'NORSTAR_RIOMETER_K2_TXT': {'path_obj': PosixPath('/home/darrenc/pyucrio_data/NORSTAR_RIOMETER_K2_TXT'),
                             'size_bytes': 2283291,
                             'size_str': '2.3 MB'},
 'SWAN_HSR_K0_H5': {'path_obj': PosixPath('/home/darrenc/pyucrio_data/SWAN_HSR_K0_H5'),
                    'size_bytes': 7412137,
                    'size_str': '7.4 MB'}}
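
Because the dictionary carries raw byte counts, it is straightforward to post-process. The sketch below uses a literal copy of the values printed above (with the `path_obj` entries left out) rather than calling PyUCRio, and computes the total size and the largest dataset.

```python
# literal copy of the sizes printed above; in practice this would come from
# rio.show_data_usage(return_dict=True)
data_usage_dict = {
    "NORSTAR_RIOMETER_K0_TXT": {"size_bytes": 17700276, "size_str": "17.7 MB"},
    "NORSTAR_RIOMETER_K2_TXT": {"size_bytes": 2283291, "size_str": "2.3 MB"},
    "SWAN_HSR_K0_H5": {"size_bytes": 7412137, "size_str": "7.4 MB"},
}

# sum the byte counts and find the dataset using the most space
total_bytes = sum(entry["size_bytes"] for entry in data_usage_dict.values())
largest_name = max(data_usage_dict, key=lambda name: data_usage_dict[name]["size_bytes"])

print(f"Total: {total_bytes / 1e6:.1f} MB")
print(f"Largest dataset: {largest_name}")
```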


In [15]:
# to clean up all data we've downloaded, you can delete
# the data using a helper function, or manually delete
# it yourself
#
# delete all data
rio.purge_download_output_root_path()

# delete data for a single specific dataset
# rio.purge_download_output_root_path(dataset_name="NORSTAR_RIOMETER_K0_TXT")
