
Stage 2: Handling Dictable Config Files

j-derrico edited this page Feb 5, 2023 · 3 revisions

Background

The term "Dictable" here refers to any of a set of configuration files or information that store data as keys and values, and that can be ingested with Python as a dictionary. There are several standard file formats that fit this description and that Python knows how to ingest – JSON and INI from the standard library, and YAML and Fortran namelists from widely-used 3rd party libraries. UFS Components and workflows use many of these types of files, along with several others that could be treated similarly but that don't follow a standard. Here we will focus on enabling a tool to ingest those file types that already have an interface available in a Python library.

Here is an overview of the use of the files that will be targeted for this set of tools:

  • Fortran Namelists

    • Most of the UFS Component software leverages this type of file for configuration.
    • This format allows for named sections of key/value pairs, translating to a two-tiered dict of dicts in Python.
    • Requires the 3rd party library f90nml to ingest the files.
    • Basic example:
      &salad
        base = 'kale'
        vegetable = 'tomatoes', 'cucumbers'
        dressing = 'balsamic'
      /
      &dessert
        type = 'cake'
        flavor = 'chocolate'
      /
      
  • YAML Files

    • Workflows typically use this type of configuration file because it provides a human-readable, simple syntax for configuring multiple components, platforms, user settings, etc. in a single data structure.
    • There is no limit to the depth of the dict data structure that results from this use case.
    • Requires the 3rd party library pyyaml to ingest the files.
    • Basic example:
      salad:
        base: kale
        vegetable:
          - tomatoes
          - cucumbers
        dressing: balsamic
      dessert:
        type: cake
        flavor: chocolate
      
  • INI Files

    • Some workflows use this file format because it provides bash-like syntax and supports sections for high-level organization.
    • This format allows for named sections of key/value pairs, translating to a two-tiered dict of dicts in Python.
    • Parsed by the Python standard library module configparser.
    • Basic example:
      [salad]
      base = kale
      vegetable = tomatoes, cucumbers
      dressing = balsamic
      
      [dessert]
      type = cake
      flavor = chocolate
      
  • Bash Files

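    • Some workflows use this file format because they are written primarily in bash.
    • Similar to INI files, but without sections, resulting in a single layer of key/value pairs.
    • One major downside of this file type is the lack of sections, which leads to a proliferation of variables whose names must carry a prefix indicating their use.
    • Can be parsed by the Python standard library module configparser, but only for simple key=value lines, and only if a section header is supplied (for example, by prepending one before parsing), since configparser by default rejects input that has no section header.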
    • Basic example:
      salad_base=kale
      salad_vegetable=tomatoes,cucumbers
      salad_dressing=balsamic
      
      
      dessert_type=cake
      dessert_flavor=chocolate
      
  • JSON Files

    • This format may not be used in the UFS, but it is a standard file type, so it is included here for reference and completeness.
    • It is very similar to YAML, but its syntax looks more like a Python dictionary literal.
    • Parsed by the Python standard library module json.
    • Basic example:
      {
        "salad": {
          "base": "kale",
          "vegetable": [
            "tomatoes",
            "cucumbers"
          ],
          "dressing": "balsamic"
        },
        "dessert": {
          "type": "cake",
          "flavor": "chocolate"
        }
      }
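
As a rough illustration of what "dictable" means in practice, the sketch below ingests the INI, bash-style, and JSON examples above using only the standard library. The helper names ini_to_dict and bash_to_dict and the placeholder section name "top" are assumptions made for this example, not part of any existing tool. YAML and Fortran namelists would follow the same pattern through pyyaml and f90nml, which are omitted here so that the snippet has no third-party dependencies.

```python
import configparser
import json

INI_TEXT = """
[salad]
base = kale
vegetable = tomatoes, cucumbers
dressing = balsamic

[dessert]
type = cake
flavor = chocolate
"""

BASH_TEXT = """
salad_base=kale
salad_vegetable=tomatoes,cucumbers
"""

JSON_TEXT = """
{
  "salad": {"base": "kale", "vegetable": ["tomatoes", "cucumbers"]},
  "dessert": {"type": "cake", "flavor": "chocolate"}
}
"""


def ini_to_dict(text: str) -> dict:
    """Parse INI text into a two-tiered dict of dicts (all values stay strings)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def bash_to_dict(text: str, section: str = "top") -> dict:
    """Parse sectionless key=value text by prepending a placeholder section header.

    Assumption: the file holds only plain key=value lines (no export, quoting,
    or variable expansion), which real bash files often violate.
    """
    return ini_to_dict(f"[{section}]\n{text}")[section]


ini_cfg = ini_to_dict(INI_TEXT)
bash_cfg = bash_to_dict(BASH_TEXT)
json_cfg = json.loads(JSON_TEXT)

print(ini_cfg["salad"]["vegetable"])   # a single string; INI has no list type
print(json_cfg["salad"]["vegetable"])  # a real Python list
print(bash_cfg)
```

Note that the INI and bash-style results keep every value as a string, while JSON preserves lists, so a tool converting between formats would need to decide how to handle that difference.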
      

Description

Stage 2 of the development of tools to manage dictable configuration files enables a user to update a dictable configuration file by providing a set of key/value pairs that override those in the original file. While some use cases may still prefer templates, this added feature would allow a user to override any setting, even one that is hard-coded in a template.

In the basic example above describing menu items via a Fortran namelist, suppose a user would like a salad with spring mix instead of kale but does not wish to change any other part of the salad. The user may provide a YAML file that defines this new configuration:

salad:
  base: spring mix

The config management tool would then provide the user with a working, updated namelist like this:

&salad
  base = 'spring mix'
  vegetable = 'tomatoes', 'cucumbers'
  dressing = 'balsamic'
/
&dessert
  type = 'cake'
  flavor = 'chocolate'
/
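
Once both files have been parsed into dictionaries, the override step amounts to a recursive merge. The following is a minimal sketch under that assumption; update_values is a hypothetical helper name, and the existing Config class may well implement this differently.

```python
import copy


def update_values(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides merged in, recursing into sub-dicts."""
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are sections: merge them instead of replacing wholesale.
            result[key] = update_values(result[key], value)
        else:
            # Scalars, lists, and brand-new keys simply take the user's value.
            result[key] = copy.deepcopy(value)
    return result


base = {
    "salad": {"base": "kale", "vegetable": ["tomatoes", "cucumbers"], "dressing": "balsamic"},
    "dessert": {"type": "cake", "flavor": "chocolate"},
}
user = {"salad": {"base": "spring mix"}}

updated = update_values(base, user)
print(updated["salad"])
```

Returning a new dictionary rather than modifying the base in place keeps the original values available for the validation report and for comparisons.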

To achieve this goal, the user provides a dictable configuration file (or command-line configuration information such as salad.base=spring mix) along with the original namelist to be modified. Both the user-preferred dictable and the base configuration file are parsed into Config objects. The tool then validates that the user-provided keys already exist in the base configuration and reports the result to the user. Next, the base configuration is updated with the user-preferred values. Finally, the tool writes the updated config file in the user-preferred format, which defaults to the input format. For example, in the use case above the user may wish to specify that the final result be in YAML, or even INI, format.
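
A command-line override such as salad.base=spring mix can be turned into the same nested structure that a YAML file would produce. The sketch below assumes a dot separates the levels of the key path and that everything after the first "=" is the value; parse_override is a hypothetical name, and no type conversion is attempted.

```python
def parse_override(item: str) -> dict:
    """Convert 'section.key=value' into a nested dict such as {section: {key: value}}."""
    key_path, sep, value = item.partition("=")
    if not sep or not key_path.strip():
        raise ValueError(f"Expected 'key.path=value', got {item!r}")
    nested = value.strip()
    # Wrap the value from the innermost key outward.
    for key in reversed(key_path.strip().split(".")):
        nested = {key.strip(): nested}
    return nested


override = parse_override("salad.base=spring mix")
print(override)
```

The resulting dictionary can then be handled exactly like a user-provided dictable file, so the rest of the workflow does not need a separate code path for command-line input.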

[Figure: image2022-8-14_11-14-48]

In the validation step, the main goal is to report to the user whether the sections and keys provided to update the base dictable are already present. When they are not present, the tool should issue a very clear, stern warning that the user is doing something a bit shady by adding new sections or keys. We'll allow it since the user knows best, but we definitely want to report on it.
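
One way to sketch this validation is to walk the user dictionary and collect every key path that the base lacks. The function names and the warning wording below are assumptions for illustration only.

```python
def find_new_keys(base: dict, user: dict, path: tuple = ()) -> list:
    """Return the key paths (as tuples) that appear in user but not in base."""
    missing = []
    for key, value in user.items():
        here = path + (key,)
        if key not in base:
            # A new section or key: report it once and do not descend further.
            missing.append(here)
        elif isinstance(value, dict) and isinstance(base[key], dict):
            missing.extend(find_new_keys(base[key], value, here))
    return missing


def report_new_keys(base: dict, user: dict) -> list:
    """Build one warning message per user key path that the base config lacks."""
    messages = []
    for key_path in find_new_keys(base, user):
        name = ".".join(str(part) for part in key_path)
        messages.append(f"WARNING: '{name}' is not in the base config and will be ADDED.")
    return messages


base = {"salad": {"base": "kale", "dressing": "balsamic"}}
user = {"salad": {"base": "spring mix", "cheese": "feta"}, "soup": {"type": "minestrone"}}

for message in report_new_keys(base, user):
    print(message)
```

The update still proceeds after these warnings are printed, in keeping with the decision to let the user add new sections or keys.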

Areas for future expansion include telling a user that similar sections or keys exist and asking whether they made a mistake. This is similar to how git responds to a mistyped command such as "git comit" by suggesting "git commit".
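
The standard library already offers a building block for such suggestions in difflib.get_close_matches. The sketch below is one possible approach, with suggest_key as a hypothetical helper name and 0.6 (the difflib default) as the similarity cutoff.

```python
import difflib


def suggest_key(unknown: str, known_keys, cutoff: float = 0.6):
    """Return the known key most similar to unknown, or None if nothing is close."""
    matches = difflib.get_close_matches(unknown, list(known_keys), n=1, cutoff=cutoff)
    return matches[0] if matches else None


salad_keys = ["base", "vegetable", "dressing"]

suggestion = suggest_key("dresing", salad_keys)
if suggestion is not None:
    print(f"'dresing' is not in the base config. Did you mean '{suggestion}'?")
```

A key with no close match, such as a genuinely new setting, returns None and would fall through to the ordinary warning.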

This tool should only need to take advantage of the existing Config class to create the Config objects, so no new classes are explicitly defined here.

The workflow described above gives us an additional feature for free: the ability to compare two dictables. The UFS community often runs into the debugging case of "your namelist worked, why does mine fail?" When the two namelists define their keys and sections in arbitrary order, comparing them by hand becomes tedious. Here, a user can provide two namelists and a "compare" flag, and the "stern warning" becomes an account of all the differences between the two namelists.
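
Because dictionaries are compared by key rather than by position, the ordering problem disappears once both files are parsed. A minimal sketch of such a comparison follows; the function names and the layout of the returned report are assumptions, and empty sections are ignored for simplicity.

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {'section.key': value} for easy comparison."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def compare(first: dict, second: dict) -> dict:
    """Report keys unique to each config and keys whose values differ."""
    a, b = flatten(first), flatten(second)
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "changed": {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]},
    }


mine = {"salad": {"base": "kale", "dressing": "balsamic"}, "dessert": {"type": "cake"}}
# Same settings in a different order, plus one changed and one extra key.
yours = {"dessert": {"flavor": "apple", "type": "pie"}, "salad": {"dressing": "balsamic", "base": "kale"}}

diff = compare(mine, yours)
print(diff)
```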

User Interface

The user should always provide at least two dictable objects: the user dict and the base dict. We can allow the user to provide a section of a more deeply nested dict, or multiple dictable files to be parsed into the same Config object. The base dictable should always be a single file. The user should also provide an output location for the resulting updated file. An optional argument for the output file type will allow the user to request an output type different from that of the input.

By default, the tool infers a dictable's format from its file extension, but the UI can also provide an optional flag to set the format explicitly for files that lack an expected extension.
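
The extension-based default with an explicit override could look something like the sketch below. The extension table and the format labels are assumptions chosen for this example; the real tool may recognize a different set.

```python
from pathlib import Path

# Hypothetical mapping from file extension to format label.
EXTENSION_FORMATS = {
    ".nml": "nml",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".ini": "ini",
    ".sh": "bash",
    ".json": "json",
}


def detect_format(path: str, explicit: str = None) -> str:
    """Return the explicit format if given, else infer it from the file extension."""
    if explicit is not None:
        return explicit
    suffix = Path(path).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"Cannot infer a format from {path!r}; please specify one explicitly.")
    return EXTENSION_FORMATS[suffix]


print(detect_format("input.nml"))
print(detect_format("config_file", explicit="yaml"))
```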

The user can set a "compare" flag if they only need to compare two input dictables. In that case, the user still provides the user-preferred dictables as described above.

The name of the tool can be set_config.py or something like it.
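
To make the interface described in this section concrete, here is a hedged argparse sketch. Every flag name (--base, --user, --output, --input-type, --output-type, --compare) and the list of format labels are placeholders invented for this example; the actual interface is still to be decided.

```python
import argparse

FORMATS = ["nml", "yaml", "ini", "bash", "json"]  # assumed format labels


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="set_config.py",
        description="Update or compare dictable config files.",
    )
    parser.add_argument("--base", required=True, help="the single base dictable file")
    parser.add_argument("--user", nargs="+", required=True,
                        help="one or more user dictable files with preferred values")
    parser.add_argument("--output", help="location of the updated file")
    parser.add_argument("--input-type", choices=FORMATS,
                        help="format of the input files if the extension is not informative")
    parser.add_argument("--output-type", choices=FORMATS,
                        help="format of the output file (defaults to the input format)")
    parser.add_argument("--compare", action="store_true",
                        help="only report differences between the inputs")
    return parser


def parse_args(argv: list) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # An output location is needed only when a file will actually be written.
    if not args.compare and args.output is None:
        parser.error("--output is required unless --compare is set")
    return args


args = parse_args(["--base", "input.nml", "--user", "menu.yaml", "--output", "updated.nml"])
print(args)
```

With these placeholder flags, a comparison run would pass --compare together with --base and --user and omit --output.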

Stage 2 Discussion Page