# Supplementary: Pre-Loading Data
## Goals
- Loading the data and arranging it into time series format is very time-consuming. This notebook performs that step once and saves the result to file.

# Note on Data
All data is in **0000UTC**, whereas we are in **0600UTC**. Data was also measured in 0600UTC.

**Sol irr data** is split across days at 0600UTC.

**Cloud data** is split across days at 0000UTC. However, since the cloud data is measured via camera, the value is set to -100 when it is too dark. Additionally, no cloud data exists at all for a good portion of the night.

For our purposes, we ignore every time step at which cloud data does not exist or is equal to -100. Thus, **we only include daylight data**.
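
The filtering rule above can be sketched with a boolean mask. This is only an illustration, not the code inside `pre_load_utility`: the function name `daylight_mask`, the use of NaN to stand in for "no data", and the sample values are all assumptions made for the example.

```python
import numpy as np

DARK_SENTINEL = -100  # value recorded by the camera when it is too dark (per the note above)

def daylight_mask(cloud):
    """Return True where cloud data exists and is not the darkness sentinel.

    Hypothetical helper: missing readings are assumed to be encoded as NaN.
    """
    cloud = np.asarray(cloud, dtype=float)
    return ~np.isnan(cloud) & (cloud != DARK_SENTINEL)

# made-up sample values, aligned on the same time steps
cloud = np.array([np.nan, -100.0, 0.25, 0.80, -100.0])
sol_irr = np.array([0.0, 0.0, 310.0, 450.0, 5.0])

mask = daylight_mask(cloud)
cloud_day = cloud[mask]      # cloud readings kept
sol_irr_day = sol_irr[mask]  # solar irradiance kept at the same time steps
```

Applying the same mask to both series keeps them aligned, so each remaining solar irradiance value still has a matching cloud value.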

In [1]:
from IPython.display import Audio, display

# play a sound whenever an exception is hit
def play_sound(self, etype, value, tb, tb_offset=None):
    self.showtraceback((etype, value, tb), tb_offset=tb_offset)
    display(Audio("../../sfx.mp3", autoplay=True))
get_ipython().set_custom_exc((Exception,), play_sound)

# put this function at the end of a long cell to play a sound when completed
def beep_completed():
    display(Audio("../../sfx.mp3", autoplay=True))

In [2]:
import numpy as np
import xarray as xr
import pickle

from datetime import date
from pre_load_utility import data
from sklearn.preprocessing import MinMaxScaler

In [3]:
import tensorflow as tf

# let TensorFlow allocate GPU memory on demand instead of reserving it all up front
gpu_devices = tf.config.experimental.list_physical_devices("GPU")
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)

# All Data!

In [3]:
data(n_steps_in=3, n_steps_out=2, date_ranges=None, scale=False, resample="15Min",
     prefix="../../!data/pre-loaded/02all_data_3in_2out/")
beep_completed()

---------paths to open determined
---------sol irr data opened
---------cloud coverage data opened
---------data merged
---------data loaded
---------data preprocessed
---------time series set up
---------written to file, filename: ../../!data/pre-loaded/02all_data_3in_2out/all_dates.3.2.15Min.unscaled. followed by X/y
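
The "time series set up" step is not shown in this notebook. A minimal sketch of a typical sliding-window split, using the same `n_steps_in=3` and `n_steps_out=2`, might look like the following. The function `split_sequences` is hypothetical and assumes a plain contiguous window; the actual `data` function may handle day boundaries and gaps differently.

```python
import numpy as np

def split_sequences(values, n_steps_in, n_steps_out):
    """Slice a series into (input window, output window) pairs.

    Hypothetical helper: each X row holds n_steps_in consecutive samples,
    and the matching y row holds the n_steps_out samples that follow.
    """
    values = np.asarray(values)
    n_windows = len(values) - n_steps_in - n_steps_out + 1
    if n_windows <= 0:
        # series too short to form even one window
        tail = values.shape[1:]
        return np.empty((0, n_steps_in) + tail), np.empty((0, n_steps_out) + tail)
    X = np.stack([values[i:i + n_steps_in] for i in range(n_windows)])
    y = np.stack([values[i + n_steps_in:i + n_steps_in + n_steps_out]
                  for i in range(n_windows)])
    return X, y

series = np.arange(10)  # made-up stand-in for one resampled variable
X, y = split_sequences(series, n_steps_in=3, n_steps_out=2)
print(X.shape, y.shape)  # (6, 3) (6, 2)
print(X[0], y[0])        # [0 1 2] [3 4]
```

A series of length N yields N - n_steps_in - n_steps_out + 1 windows, which is why the longer configurations below produce fewer samples from the same data.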


In [6]:
data(n_steps_in=57*6, n_steps_out=57, date_ranges=None, scale=False, resample="15Min",
     prefix="../../!data/pre-loaded/03all_data_57*6in_57out/")
beep_completed()

---------paths to open determined
---------sol irr data opened
---------cloud coverage data opened
---------data merged
---------data loaded
---------data preprocessed
---------time series set up
---------written to file, filename: ../../!data/pre-loaded/03all_data_57*6in_57out/all_dates.342.57.15Min.unscaled. followed by X/y


In [7]:
data(n_steps_in=4*4, n_steps_out=4, date_ranges=None, scale=False, resample="15Min",
     prefix="../../!data/pre-loaded/04all_data_16in_4out/")
beep_completed()

---------paths to open determined
---------sol irr data opened
---------cloud coverage data opened
---------data merged
---------data loaded
---------data preprocessed
---------time series set up
---------written to file, filename: ../../!data/pre-loaded/04all_data_16in_4out/all_dates.16.4.15Min.unscaled. followed by X/y


In [3]:
data(n_steps_in=57*14, n_steps_out=57*7, date_ranges=None, scale=False, resample="15Min",
     prefix="../../!data/pre-loaded/05all_data_57*14in_57*7out/")
beep_completed()

---------paths to open determined
---------sol irr data opened
---------cloud coverage data opened
---------data merged
---------data loaded
---------data preprocessed
---------time series set up
---------written to file, filename: ../../!data/pre-loaded/05all_data_57*14in_57*7out/all_dates.798.399.15Min.unscaled. followed by X/y
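
To locate the saved files later, the filename stem can be rebuilt from the call arguments. The sketch below only mirrors the pattern visible in the log lines above for unscaled runs with `date_ranges=None`; the helper `output_stem` is hypothetical, and the naming used for scaled data or explicit date ranges is not shown in this notebook.

```python
def output_stem(prefix, n_steps_in, n_steps_out, resample):
    """Rebuild the logged filename stem for an unscaled, all-dates run.

    Hypothetical helper inferred from the log output; the files themselves
    are this stem followed by "X" or "y".
    """
    return f"{prefix}all_dates.{n_steps_in}.{n_steps_out}.{resample}.unscaled."

stem = output_stem("../../!data/pre-loaded/04all_data_16in_4out/", 4 * 4, 4, "15Min")
print(stem)  # ../../!data/pre-loaded/04all_data_16in_4out/all_dates.16.4.15Min.unscaled.
```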
