tb - 10/18/2021 - Reproduce results from Ryan and output (1) the number of free parameters and (2) the mean error statistics of the net shortwave flux prediction (in $W\,m^{-2}$) from the best-performing model.

Kind instruction email from Ryan L below:

•	The Python library, called ml4rt, is available here.  It's under the MIT licence, so feel free to use/modify for any non-commercial purposes.
•	To read the files with learning examples (all NetCDF files in the directories tropical_sites and non_tropical_sites), use example_io.read_file, which you can find here.  I recommend always using the options exclude_summit_greenland = True (the RRTM, which is the model we're trying to emulate, made some errors on these profiles, so the labels are inaccurate) and max_heating_rate_k_day = numpy.inf (in other words, don't filter by heating rate).  example_io.read_file returns a dictionary, and the format of this dictionary is documented in the method string for example_io.read_file.
•	One file (non_tropical_sites/learning_examples_20170101-20181224.nc) contains normalization parameters, used to normalize data before input to the neural net.
•	To normalize data, use the method normalization.normalize_data (here), where new_example_dict contains the examples you want to normalize and training_example_dict contains normalization parameters (read from non_tropical_sites/learning_examples_20170101-20181224.nc).  For some example usage of normalization.normalize_data, see neural_net.read_file_for_generator (here), where the method is called 3 times.
•	To read the model (a U-net++), use the method neural_net.read_model (here).  Make sure the metafile (model_metadata.dill) is always in the same directory as the actual model (model.h5).
•	To read the data and prepare it for input to the neural net (i.e., pre-process it the same way the training data was pre-processed), use neural_net.create_data (here) or neural_net.data_generator (here).
•	To apply the trained neural net to some data, use neural_net.create_data or neural_net.data_generator to read/pre-process the data, then use neural_net.apply_model (here) with net_type_string = "u_net".  I recommend verbose = True and num_examples_per_batch = 5000 (but you may need it smaller if you have a computer with limited RAM).  predictor_matrix will be the predictor_matrix returned by neural_net.create_data or neural_net.data_generator.

I hope this is enough to get you started.  Please let me know if you have any questions about the code.  It's a bit complicated because [a] the neural net has both scalar and vector inputs (the scalars being solar zenith angle and surface albedo, the vectors being 1-D profiles of other variables); [b] the neural net has both scalar and vector outputs (the scalars being TOA upwelling flux, sfc downwelling flux, and net flux; the vector being a 1-D profile of heating rates; other possible vector outputs are full profiles of upwelling and downwelling flux, but I've since stopped predicting these, since NWP models don't require them from a paramzn).

# Imports

In [22]:
from ml4rt.io.example_io import *
from ml4rt.utils.normalization import *
from ml4rt.machine_learning.neural_net import *
#from ml4rt.machine_learning.u_net_architecture import *
from ml4rt.machine_learning.u_net_pp_architecture import *

import dill

import numpy as np

# Read File

In [2]:
read_file

<function ml4rt.io.example_io.read_file(netcdf_file_name, exclude_summit_greenland=False, max_heating_rate_k_day=41.5, id_strings_to_read=None, allow_missing_ids=False)>

In [3]:
path_data = '/work/FAC/FGSE/IDYST/tbeucler/default/tbeucler/2021_Ryan_Lagerquist_SW/'

In [4]:
read2020 = read_file(path_data+'tropical_sites/learning_examples_2020.nc',
                     exclude_summit_greenland=True,
                     max_heating_rate_k_day=np.inf)



In [5]:
read2020.keys()

dict_keys(['example_id_strings', 'scalar_predictor_names', 'vector_predictor_names', 'scalar_target_names', 'vector_target_names', 'scalar_predictor_matrix', 'vector_predictor_matrix', 'scalar_target_matrix', 'vector_target_matrix', 'valid_times_unix_sec', 'standard_atmo_flags', 'heights_m_agl'])

In [6]:
read2020['scalar_predictor_matrix'].shape

(264638, 6)

In [7]:
read2020['scalar_predictor_names']

['zenith_angle_radians',
 'albedo',
 'latitude_deg_n',
 'longitude_deg_e',
 'column_liquid_water_path_kg_m02',
 'column_ice_water_path_kg_m02']

In [8]:
read2020['vector_predictor_matrix'].shape

(264638, 73, 12)

In [9]:
read2020['vector_predictor_names']

['pressure_pascals',
 'temperature_kelvins',
 'specific_humidity_kg_kg01',
 'liquid_water_content_kg_m03',
 'ice_water_content_kg_m03',
 'liquid_water_path_kg_m02',
 'ice_water_path_kg_m02',
 'vapour_path_kg_m02',
 'upward_liquid_water_path_kg_m02',
 'upward_ice_water_path_kg_m02',
 'upward_vapour_path_kg_m02',
 'relative_humidity_unitless']

In [10]:
read2020['scalar_target_names']

['shortwave_surface_down_flux_w_m02', 'shortwave_toa_up_flux_w_m02']

In [11]:
read2020['scalar_target_matrix'].shape

(264638, 2)
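
The two scalar targets listed above are the surface downwelling and TOA upwelling fluxes; the net flux named in the header note is not stored as its own column. As a minimal sketch, assuming the net flux is defined as surface downwelling minus TOA upwelling (an assumption to check against ml4rt), the mean error statistics could be computed as below. The function names and toy values are hypothetical, and plain lists stand in for columns of scalar_target_matrix.

```python
import math


def net_flux(surface_down, toa_up):
    """Net flux under the ASSUMED convention: surface down minus TOA up."""
    return [down - up for down, up in zip(surface_down, toa_up)]


def error_stats(predicted, actual):
    """Return bias, mean absolute error and RMSE (same units as the inputs)."""
    errors = [p - a for p, a in zip(predicted, actual)]
    num_examples = len(errors)
    return {
        'bias': sum(errors) / num_examples,
        'mae': sum(abs(e) for e in errors) / num_examples,
        'rmse': math.sqrt(sum(e * e for e in errors) / num_examples),
    }


# Toy values in W m^-2 (not taken from the data set).
actual_down = [500.0, 300.0, 100.0]
actual_up = [100.0, 80.0, 20.0]
predicted_down = [510.0, 290.0, 100.0]
predicted_up = [100.0, 80.0, 30.0]

stats = error_stats(
    net_flux(predicted_down, predicted_up),
    net_flux(actual_down, actual_up),
)
print(stats)
```

With NumPy arrays, the same statistics follow from the element-wise difference of the two columns.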

In [12]:
read2020['vector_target_names']

['shortwave_down_flux_w_m02',
 'shortwave_up_flux_w_m02',
 'shortwave_heating_rate_k_day01',
 'shortwave_down_flux_increment_w_m03',
 'shortwave_up_flux_increment_w_m03']

In [13]:
read2020['vector_target_matrix'].shape

(264638, 73, 5)

# Normalize data

## Load normalization parameters from non-tropical sites

In [14]:
path_norm_param = path_data+'non_tropical_sites/learning_examples_20170101-20181224.nc'

In [15]:
norm_param = read_file(path_norm_param,
                       exclude_summit_greenland=True,
                       max_heating_rate_k_day=np.inf)



In [16]:
norm_param.keys()

dict_keys(['example_id_strings', 'scalar_predictor_names', 'vector_predictor_names', 'scalar_target_names', 'vector_target_names', 'scalar_predictor_matrix', 'vector_predictor_matrix', 'scalar_target_matrix', 'vector_target_matrix', 'valid_times_unix_sec', 'standard_atmo_flags', 'heights_m_agl'])

## Apply normalization 
Following [https://github.com/thunderhoser/ml4rt/blob/8ad36b52ead1c4870bdccaa804dee9a52144cce0/ml4rt/machine_learning/neural_net.py#L213]

In [17]:
normalize_data

<function ml4rt.utils.normalization.normalize_data(new_example_dict, training_example_dict, normalization_type_string, min_normalized_value=-1.0, max_normalized_value=1.0, separate_heights=False, apply_to_predictors=True, apply_to_vector_targets=True, apply_to_scalar_targets=True)>

The normalization is automatically applied as part of data_generator (see _read_file_for_generator)
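
The signature above takes a normalization_type_string and a pair of bounds that default to -1 and 1. The sketch below illustrates the general idea with two common schemes, z-score and min-max, where the parameters come from the training examples and are then applied to new examples. It is a simplified stand-in, not ml4rt's implementation, and the function names are hypothetical.

```python
import statistics


def z_score_normalize(new_values, training_values):
    """Subtract the training mean and divide by the training standard deviation."""
    mean = statistics.mean(training_values)
    stdev = statistics.pstdev(training_values)
    return [(value - mean) / stdev for value in new_values]


def minmax_normalize(new_values, training_values,
                     min_normalized_value=-1.0, max_normalized_value=1.0):
    """Linearly map the training range onto [min_normalized_value, max_normalized_value]."""
    train_min = min(training_values)
    train_max = max(training_values)
    scale = (max_normalized_value - min_normalized_value) / (train_max - train_min)
    return [min_normalized_value + (value - train_min) * scale
            for value in new_values]


# Toy temperatures in kelvins (not taken from the data set).
training_temperatures = [200.0, 300.0]
new_temperatures = [200.0, 250.0, 300.0]

minmax_result = minmax_normalize(new_temperatures, training_temperatures)
z_result = z_score_normalize([2.0, 3.0], [1.0, 2.0, 3.0])
print(minmax_result, z_result)
```

The point of passing training_example_dict separately is visible here: new data never contribute to the mean, standard deviation, or range.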

In [18]:
data_generator

<function ml4rt.machine_learning.neural_net.data_generator(option_dict, for_inference, net_type_string)>

# Read UNet++ trained by Ryan

In [20]:
read_model

<function ml4rt.machine_learning.neural_net.read_model(hdf5_file_name)>

In [21]:
path_model = path_data + 'actual_model/model.h5'

In [23]:
# Use a context manager so the metadata file is closed after loading
with open(path_data+'actual_model/model_metadata.dill','rb') as f:
    model_arc = dill.load(f)

In [24]:
model_arc.keys()

dict_keys(['num_epochs', 'num_training_batches_per_epoch', 'training_option_dict', 'num_validation_batches_per_epoch', 'validation_option_dict', 'net_type_string', 'loss_function_or_dict', 'do_early_stopping', 'plateau_lr_multiplier'])

In [27]:
model_arc['net_type_string']

'u_net'

In [42]:
model_arc['training_option_dict']

{'scalar_predictor_names': ['zenith_angle_radians', 'albedo'],
 'vector_predictor_names': ['pressure_pascals',
  'temperature_kelvins',
  'specific_humidity_kg_kg01',
  'liquid_water_content_kg_m03',
  'ice_water_content_kg_m03',
  'relative_humidity_unitless',
  'liquid_water_path_kg_m02',
  'ice_water_path_kg_m02',
  'vapour_path_kg_m02',
  'upward_liquid_water_path_kg_m02',
  'upward_ice_water_path_kg_m02',
  'upward_vapour_path_kg_m02'],
 'scalar_target_names': ['shortwave_surface_down_flux_w_m02',
  'shortwave_toa_up_flux_w_m02'],
 'vector_target_names': ['shortwave_heating_rate_k_day01'],
 'heights_m_agl': array([1.00e+01, 2.00e+01, 4.00e+01, 6.00e+01, 8.00e+01, 1.00e+02,
        1.20e+02, 1.40e+02, 1.60e+02, 1.80e+02, 2.00e+02, 2.25e+02,
        2.50e+02, 2.75e+02, 3.00e+02, 3.50e+02, 4.00e+02, 4.50e+02,
        5.00e+02, 6.00e+02, 7.00e+02, 8.00e+02, 9.00e+02, 1.00e+03,
        1.10e+03, 1.20e+03, 1.30e+03, 1.40e+03, 1.50e+03, 1.60e+03,
        1.70e+03, 1.80e+03, 1.90e+03, 2.0
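
Comparing the name lists: the file read earlier stores relative_humidity_unitless as the last of its 12 vector predictors, whereas the training options list it sixth, and the model uses only 2 of the file's 6 scalar predictors. The sketch below builds the index mapping by name, which is one way to avoid a silent channel mismatch when indexing the matrices by hand. ml4rt's own readers may already subset by name; this is only an illustration, and the variable names are hypothetical.

```python
# Order of vector predictors in the file (from read2020['vector_predictor_names']).
file_names = [
    'pressure_pascals', 'temperature_kelvins', 'specific_humidity_kg_kg01',
    'liquid_water_content_kg_m03', 'ice_water_content_kg_m03',
    'liquid_water_path_kg_m02', 'ice_water_path_kg_m02', 'vapour_path_kg_m02',
    'upward_liquid_water_path_kg_m02', 'upward_ice_water_path_kg_m02',
    'upward_vapour_path_kg_m02', 'relative_humidity_unitless',
]

# Order expected by the model (from model_arc['training_option_dict']).
model_names = [
    'pressure_pascals', 'temperature_kelvins', 'specific_humidity_kg_kg01',
    'liquid_water_content_kg_m03', 'ice_water_content_kg_m03',
    'relative_humidity_unitless',
    'liquid_water_path_kg_m02', 'ice_water_path_kg_m02', 'vapour_path_kg_m02',
    'upward_liquid_water_path_kg_m02', 'upward_ice_water_path_kg_m02',
    'upward_vapour_path_kg_m02',
]

# For each channel the model expects, find its position in the file.
channel_indices = [file_names.index(name) for name in model_names]
reordered_names = [file_names[i] for i in channel_indices]

# Same idea for the scalar predictors: keep only the two the model uses.
file_scalar_names = [
    'zenith_angle_radians', 'albedo', 'latitude_deg_n', 'longitude_deg_e',
    'column_liquid_water_path_kg_m02', 'column_ice_water_path_kg_m02',
]
scalar_indices = [file_scalar_names.index(name)
                  for name in ['zenith_angle_radians', 'albedo']]

print(channel_indices, scalar_indices)
# With NumPy, the reordering would be vector_predictor_matrix[..., channel_indices].
```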

In [45]:
dic_custom = model_arc['training_option_dict'].copy()

In [47]:
dic_custom['vector_target_names'] = 

{'scalar_predictor_names': ['zenith_angle_radians', 'albedo'],
 'vector_predictor_names': ['pressure_pascals',
  'temperature_kelvins',
  'specific_humidity_kg_kg01',
  'liquid_water_content_kg_m03',
  'ice_water_content_kg_m03',
  'relative_humidity_unitless',
  'liquid_water_path_kg_m02',
  'ice_water_path_kg_m02',
  'vapour_path_kg_m02',
  'upward_liquid_water_path_kg_m02',
  'upward_ice_water_path_kg_m02',
  'upward_vapour_path_kg_m02'],
 'scalar_target_names': ['shortwave_surface_down_flux_w_m02',
  'shortwave_toa_up_flux_w_m02'],
 'vector_target_names': ['shortwave_heating_rate_k_day01'],
 'heights_m_agl': array([1.00e+01, 2.00e+01, 4.00e+01, 6.00e+01, 8.00e+01, 1.00e+02,
        1.20e+02, 1.40e+02, 1.60e+02, 1.80e+02, 2.00e+02, 2.25e+02,
        2.50e+02, 2.75e+02, 3.00e+02, 3.50e+02, 4.00e+02, 4.50e+02,
        5.00e+02, 6.00e+02, 7.00e+02, 8.00e+02, 9.00e+02, 1.00e+03,
        1.10e+03, 1.20e+03, 1.30e+03, 1.40e+03, 1.50e+03, 1.60e+03,
        1.70e+03, 1.80e+03, 1.90e+03, 2.0
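
The assignment in In [47] has no right-hand side in this record, so the intended target list is unknown. Separately, the dict.copy() call in In [45] is a shallow copy: the nested lists are shared with model_arc['training_option_dict'], so in-place edits such as append would also change the original. The sketch below shows the difference on a toy dictionary and uses copy.deepcopy to get an independent copy. The replacement list is purely illustrative (names taken from the vector_target_names read earlier), not the author's choice.

```python
import copy

# Toy stand-in for model_arc['training_option_dict'] (only two keys kept).
original_options = {
    'scalar_target_names': ['shortwave_surface_down_flux_w_m02',
                            'shortwave_toa_up_flux_w_m02'],
    'vector_target_names': ['shortwave_heating_rate_k_day01'],
}

# Shallow copy: the inner list is shared, so append() leaks into the original.
shallow_options = original_options.copy()
shallow_options['vector_target_names'].append('shortwave_down_flux_w_m02')
original_after_shallow_edit = list(original_options['vector_target_names'])

# Reset the toy original, then take a deep copy before customizing.
original_options['vector_target_names'] = ['shortwave_heating_rate_k_day01']
custom_options = copy.deepcopy(original_options)
custom_options['vector_target_names'] = [   # illustrative choice only
    'shortwave_down_flux_w_m02',
    'shortwave_up_flux_w_m02',
]

print(original_after_shallow_edit)
print(original_options['vector_target_names'])
print(custom_options['vector_target_names'])
```

Replacing a key by assignment, as In [47] appears to intend, is safe even with a shallow copy; the deep copy only matters once nested values are modified in place.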

# Design own UNet++ to predict shortwave radiative flux

In [29]:
data_gen_test = data_generator(model_arc['training_option_dict'],
                               for_inference=False,
                               net_type_string='u_net_pp')

In [37]:
dir(data_gen_test)

['__class__',
 '__del__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__name__',
 '__ne__',
 '__new__',
 '__next__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'gi_yieldfrom',
 'send',
 'throw']
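
The attribute list above (__iter__, __next__, gi_code, gi_frame, send, throw) is that of a Python generator object, so batches are obtained with next() rather than through attributes. A generator function's body does not run until the first next() call, which would explain why creating data_gen_test raised no error even though no file had been read yet. The toy generator below is a hypothetical stand-in, not ml4rt code, and its option keys are invented for illustration.

```python
def toy_data_generator(option_dict, num_batches=2):
    """Hypothetical batch generator; none of this body runs before next()."""
    if not option_dict.get('file_exists', False):
        raise FileNotFoundError(option_dict['example_file_name'])

    for batch_index in range(num_batches):
        predictors = [[float(batch_index)] * 3]   # toy 1 x 3 predictor batch
        targets = [[float(batch_index)]]          # toy 1 x 1 target batch
        yield predictors, targets


# Creating the generator succeeds even though the "file" is missing.
bad_options = {'example_file_name': '/no/such/file.nc', 'file_exists': False}
generator = toy_data_generator(bad_options)

# The error only surfaces when the first batch is requested.
try:
    next(generator)
    error_raised = False
except FileNotFoundError:
    error_raised = True

# With valid options, next() returns the first (predictors, targets) pair.
good_options = {'example_file_name': 'toy.nc', 'file_exists': True}
first_predictors, first_targets = next(toy_data_generator(good_options))
print(error_raised, first_predictors, first_targets)
```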

In [41]:
create_data_test = create_data(model_arc['training_option_dict'],False,'u_net',exclude_summit_greenland=True)

Reading training examples (for normalization) from: "/scratch1/RDARCH/rda-ghpcs/Ryan.Lagerquist/ml4rt_project/examples/non_tropical_sites/learning_examples_20190101-20201031.nc"...


FileNotFoundError: [Errno 2] No such file or directory: b'/scratch1/RDARCH/rda-ghpcs/Ryan.Lagerquist/ml4rt_project/examples/non_tropical_sites/learning_examples_20190101-20201031.nc'
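
The error shows that the stored training options still point to paths on the cluster where the model was trained. One possible workaround, sketched below under stated assumptions, is to rewrite that prefix in a deep copy of the option dictionary before calling create_data. The helper repoint_paths and the key name in the toy dictionary are hypothetical, since the full option dictionary is truncated in the output above.

```python
import copy


def repoint_paths(option_dict, old_prefix, new_prefix):
    """Return a deep copy in which string values starting with old_prefix use new_prefix."""
    new_dict = copy.deepcopy(option_dict)
    for key, value in new_dict.items():
        if isinstance(value, str) and value.startswith(old_prefix):
            new_dict[key] = new_prefix + value[len(old_prefix):]
    return new_dict


old_prefix = ('/scratch1/RDARCH/rda-ghpcs/Ryan.Lagerquist/'
              'ml4rt_project/examples/')
path_data = ('/work/FAC/FGSE/IDYST/tbeucler/default/tbeucler/'
             '2021_Ryan_Lagerquist_SW/')

# Toy option dictionary; 'normalization_file_name' is an ASSUMED key name.
toy_options = {
    'normalization_file_name':
        old_prefix + 'non_tropical_sites/learning_examples_20190101-20201031.nc',
    'vector_target_names': ['shortwave_heating_rate_k_day01'],
}

local_options = repoint_paths(toy_options, old_prefix, path_data)
print(local_options['normalization_file_name'])
```

Note that the file requested in the error message (learning_examples_20190101-20201031.nc) is not the one named in the email (learning_examples_20170101-20181224.nc), so it is worth checking that it exists locally before relying on a prefix rewrite alone.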