
Python Library

William W. Kimball, Jr., MBA, MSIS edited this page Jan 9, 2023 · 8 revisions
  1. Introduction
  2. Examples
    1. Parser Setup
    2. Getting Data From The Document
    3. Changing Data In The Document
    4. Merging Documents

Introduction

The YAML Path project makes a reference implementation of its standard available as a Python library (module). This library may be used by other projects to easily integrate YAML Path capabilities. While there are several supporting library files like enumerations, types, and exceptions, the most interesting library files include:

  • yamlpath.py: Encapsulates a YAML Path and its parsing logic.
  • processor.py: Processes YAMLPath instances to read or write data in YAML and YAML-compatible (for example, JSON) sources.
  • eyamlprocessor.py: Extends the Processor class to support EYAML data encryption and decryption.
  • merger.py: Merges multiple documents together.

Examples

The following are rudimentary examples of using the core classes of the YAMLPath library.

Note that these examples use ConsolePrinter to handle STDOUT and STDERR messaging. Even if you won't be using STDOUT or STDERR, you must pass some kind of logger to these libraries so they can write their messages somewhere. These messages carry important information about issues, often with remediation suggestions. Your custom message handler or logger must either subclass ConsolePrinter or provide the same API; review the header documentation in consoleprinter.py for details.
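If you would rather route these messages through Python's standard logging module than print them to the console, a thin adapter is one option. The sketch below is hypothetical: LoggingPrinter is not part of yamlpath, and it assumes ConsolePrinter's public methods are info, verbose, warning, error, critical, and debug, with error and critical accepting an exit code as the log.critical(ex, 119) calls in later examples suggest. Confirm the exact signatures in consoleprinter.py before relying on it.

```python
import logging
import sys
from typing import Any, Optional


class LoggingPrinter:
    """Hypothetical ConsolePrinter stand-in that forwards to Python logging.

    Method names are assumed to mirror ConsolePrinter's public API; check
    consoleprinter.py for the authoritative signatures.
    """

    def __init__(self, logger: logging.Logger) -> None:
        self._logger = logger

    def info(self, message: Any) -> None:
        self._logger.info(message)

    def verbose(self, message: Any) -> None:
        # logging has no VERBOSE level, so this sketch maps it to DEBUG.
        self._logger.debug(message)

    def warning(self, message: Any) -> None:
        self._logger.warning(message)

    def error(self, message: Any, exit_code: Optional[int] = None) -> None:
        self._logger.error(message)
        if exit_code is not None:
            sys.exit(exit_code)

    def critical(self, message: Any, exit_code: int = 1) -> None:
        # Critical messages are treated as unrecoverable: log, then exit.
        self._logger.critical(message)
        sys.exit(exit_code)

    def debug(self, message: Any, **kwargs: Any) -> None:
        # Any extra keyword options (headers, prefixes, etc.) are ignored here.
        self._logger.debug(message)


# Usage: build one and pass it wherever these examples pass `log`.
logging.basicConfig(level=logging.WARNING)
log = LoggingPrinter(logging.getLogger("my-yaml-tool"))
```

Mapping verbose onto DEBUG is a design choice of this sketch; a custom log level would work just as well.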

Parser Setup

This library is based on ruamel.yaml and thus requires a prepared instance of its YAML parser. The Parsers helper class provides a convenience method, Parsers.get_yaml_editor(), that prepares this parser for you.

As mentioned above, you must also provide some kind of logging facility, even if you won't be writing a command-line tool. This example illustrates how to set up a quiet version of the ConsolePrinter, which emits only critical failure messages.

from types import SimpleNamespace

from yamlpath.common import Parsers
from yamlpath.wrappers import ConsolePrinter
from yamlpath import Processor


# The various classes of this library must be able to write messages somewhere
# when things go bad.  This project provides a CLI-centric logging class named
# ConsolePrinter.  Even when not writing a CLI tool, you must still configure
# and pass ConsolePrinter or a class of your own with the same public API.  For
# just muting logging output -- except for unrecoverable errors -- you can use
# this simple configuration object:
logging_args = SimpleNamespace(quiet=True, verbose=False, debug=False)
log = ConsolePrinter(logging_args)

# Prep the YAML parser and round-trip editor (tweak to your needs).  You do not
# have to use Parsers.get_yaml_editor() but you must create a properly-
# configured instance of ruamel.yaml.YAML.
yaml = Parsers.get_yaml_editor()

# At this point, you'd load or parse your YAML file, stream, or string.  This
# example demonstrates loading YAML data from an external file.  You could also
# use the same function to load data from STDIN or even a String variable.  See
# the Parsers class for more detail.
yaml_file = "your-file.yaml"
(yaml_data, doc_loaded) = Parsers.get_yaml_data(yaml, log, yaml_file)
if not doc_loaded:
    # There was an issue loading the file; an error message has already been
    # printed via ConsolePrinter.
    raise SystemExit(1)

# Pass the logging facility and parsed YAML data to the YAMLPath Processor
processor = Processor(log, yaml_data)

# At this point, the Processor is ready to handle YAML Paths

Getting Data From The Document

These libraries use Generators to get nodes from parsed YAML data. This enables your code to receive and process data as it is found within the document -- one node at a time -- rather than being forced to wait for all matching data to be found. Note that when your YAML Path contains a Collector, all matches for that Collector must first be gathered before the entire List of collected nodes is returned, so there is some delay in this case.
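The difference between streaming results and collecting them can be seen without yamlpath at all. The following is a toy, stdlib-only search over plain dicts and lists; find_scalars and collect are hypothetical names, not library APIs, and the traversal is not yamlpath's implementation. The first match is available from next() before the rest of the document has been searched, while the Collector-like step returns nothing until every match is found.

```python
from typing import Any, Iterator, List, Tuple


def find_scalars(data: Any, wanted: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, Any]]:
    """Lazily yield (path, value) for every scalar equal to `wanted`, depth-first."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from find_scalars(value, wanted, path + (key,))
    elif isinstance(data, list):
        for index, value in enumerate(data):
            yield from find_scalars(value, wanted, path + (index,))
    elif data == wanted:
        yield path, data


def collect(matches: Iterator) -> List:
    """Collector-like step: returns only after every match has been found."""
    return list(matches)


doc = {"hosts": [{"name": "web", "role": "app"}, {"name": "db", "role": "app"}]}
stream = find_scalars(doc, "app")
first = next(stream)  # available immediately; the second host is not searched yet
print(first)          # (('hosts', 0, 'role'), 'app')
print(collect(find_scalars(doc, "app")))
```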

Identify which node(s) to get from your document via a YAML Path. Be aware that, unless you set mustexist=True, the get_nodes method will attempt to create any missing document structure in order to match the supplied YAML Path. The default is mustexist=False only to ensure parity with the default behavior of the set_value method. So either set mustexist=True or, when you deliberately want missing structure created, provide a default value for the new nodes.
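To make the mustexist trade-off concrete, here is a toy model for plain nested dicts. It is a sketch only: get_node and PathNotFound are hypothetical stand-ins (for get_nodes and YAMLPathException respectively), and yamlpath's real creation logic handles far more node types. It shows that a read with mustexist=False can quietly change the document.

```python
from typing import Any, Dict, List


class PathNotFound(KeyError):
    """Hypothetical stand-in for yamlpath's YAMLPathException."""


def get_node(data: Dict[str, Any], segments: List[str],
             mustexist: bool = False, default_value: Any = None) -> Any:
    """Toy model of the get_nodes trade-off, for nested dicts only."""
    node = data
    for i, key in enumerate(segments):
        if key not in node:
            if mustexist:
                # Refuse to touch the document; report the missing path.
                raise PathNotFound(".".join(segments))
            # Otherwise create the missing structure: hashes along the way,
            # and the default value at the leaf.
            is_leaf = i == len(segments) - 1
            node[key] = default_value if is_leaf else {}
        node = node[key]
    return node


doc = {"server": {"port": 8080}}
try:
    get_node(doc, ["server", "host"], mustexist=True)
except PathNotFound:
    print("missing, document untouched:", doc)
print(get_node(doc, ["server", "host"]))  # None
print(doc)  # {'server': {'port': 8080, 'host': None}}
```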

You should catch yamlpath.exceptions.YAMLPathException unless you prefer that your users see Python's native stack traces. When using EYAML, you should also catch yamlpath.eyaml.exceptions.EYAMLCommandException for the same reason. Whether you are working with a single result or many, consume the Generator output with a pattern similar to:

from yamlpath import YAMLPath
from yamlpath.exceptions import YAMLPathException

yaml_path = YAMLPath("see.segments.of.a.yaml.path")
try:
    for node_coordinate in processor.get_nodes(yaml_path, mustexist=True):
        log.debug("Got {} from '{}'.".format(node_coordinate, yaml_path))
        # Do something with each node_coordinate.node (the actual data)
except YAMLPathException as ex:
    # If merely retrieving data, this exception may be deemed non-critical
    # unless your later code absolutely depends upon a result.
    log.error(ex)

Changing Data In The Document

At its simplest, you need only supply the YAML Path to one or more nodes to update and the value to apply to them. Catching yamlpath.exceptions.YAMLPathException is optional but usually preferable to letting Python dump the call stack in front of your users. When using EYAML, the same applies to yamlpath.eyaml.exceptions.EYAMLCommandException.

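from yamlpath.exceptions import YAMLPathException
from yamlpath.eyaml.exceptions import EYAMLCommandException

# Reuses processor, log, and yaml_path from the examples above; new_value is
# whatever you want written to every node the YAML Path matches.
new_value = "replacement value"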

try:
    processor.set_value(yaml_path, new_value)
except YAMLPathException as ex:
    log.critical(ex, 119)
except EYAMLCommandException as ex:
    log.critical(ex, 120)
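Because one YAML Path may match several nodes, a single set_value call can change many places at once. The toy below illustrates that idea on plain dicts and lists. It is hypothetical throughout: set_all is not a yamlpath function, its "*" segment is only this sketch's convention for "every child" and not a statement of YAML Path syntax, and unlike the real set_value it does not create missing structure.

```python
from typing import Any, List


def set_all(data: Any, segments: List[str], value: Any) -> int:
    """Toy model: assign `value` to every node the segments match.

    A "*" segment matches every child of a dict or list (a convention of this
    sketch only).  Returns the number of nodes changed.
    """
    head, rest = segments[0], segments[1:]
    if isinstance(data, dict):
        keys = list(data) if head == "*" else ([head] if head in data else [])
    elif isinstance(data, list):
        keys = list(range(len(data))) if head == "*" else []
    else:
        return 0  # Scalars have no children to match.
    changed = 0
    for key in keys:
        if rest:
            changed += set_all(data[key], rest, value)
        else:
            data[key] = value
            changed += 1
    return changed


doc = {"hosts": [{"port": 80}, {"port": 8080}]}
print(set_all(doc, ["hosts", "*", "port"], 443))  # 2
print(doc)  # {'hosts': [{'port': 443}, {'port': 443}]}
```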

Merging Documents

A document merge naturally requires at least two documents. At the code level, this means two populated DOM objects (populated instances of yaml_data from above). You do not need a Processor for merging. In the least amount of code, a merge looks like this:

from types import SimpleNamespace

from yamlpath.exceptions import YAMLPathException
from yamlpath.merger.exceptions import MergeException
from yamlpath.merger import Merger, MergerConfig

# Obtain or build the lhs_data and rhs_data objects using
# Parsers.get_yaml_data() or equivalent.

# You'll still need to supply a logger and some arguments used by the merge
# engine.  For purely default behavior, args can be a bare SimpleNamespace.
args = SimpleNamespace()

# Initialize the new Merger instance with the LHS document.
merger = Merger(log, lhs_data, MergerConfig(log, args))

# Merge RHS into LHS
try:
    merger.merge_with(rhs_data)
except MergeException as mex:
    log.critical(mex, 129)
except YAMLPathException as yex:
    log.critical(yex, 130)

# At this point, merger.data is the merged result; do what you will with it,
# including merging more data into it.  When you are ready to dump (write)
# out the merged data, you must prepare the document and your
# ruamel.yaml.YAML instance (usually the one obtained from
# Parsers.get_yaml_editor() during Parser Setup) like this:
merger.prepare_for_dump(yaml)
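For intuition about what "merge RHS into LHS" means, the toy below performs one simple kind of deep merge on plain Python data. It is not yamlpath's Merger: merge is a hypothetical helper, and its rules (hashes merge key by key, lists concatenate, any other RHS value replaces the LHS value) are assumptions of this sketch. yamlpath's actual behavior is governed by MergerConfig and may differ. Like the Merger described above, it updates the LHS in place, so you can call it again to merge further documents.

```python
from typing import Any


def merge(lhs: Any, rhs: Any) -> Any:
    """Toy deep merge of RHS into LHS (hypothetical, not yamlpath's Merger).

    Assumed rules: dicts merge key by key recursively (LHS is updated in
    place), lists are concatenated, and any other RHS value wins.
    """
    if isinstance(lhs, dict) and isinstance(rhs, dict):
        for key, rvalue in rhs.items():
            lhs[key] = merge(lhs[key], rvalue) if key in lhs else rvalue
        return lhs
    if isinstance(lhs, list) and isinstance(rhs, list):
        return lhs + rhs
    return rhs


base = {"app": {"name": "demo", "ports": [80]}, "debug": False}
override = {"app": {"ports": [443]}, "debug": True}
print(merge(base, override))
# {'app': {'name': 'demo', 'ports': [80, 443]}, 'debug': True}
```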