
yaml merge Array of Hash Options

William W. Kimball, Jr., MBA, MSIS edited this page Oct 25, 2020 · 8 revisions
  1. Introduction
  2. Merging All Records Together
  3. Deeply Merge Records Together
  4. Block RHS Records
  5. Overwrite LHS Records
  6. Accept Only Unique Records
  7. Configuration File Options
    1. Configuration File Section: defaults
    2. Configuration File Section: rules
    3. Configuration File Section: keys

This document is part of the body of knowledge about yaml-merge, one of the reference command-line tools provided by the YAML Path project.

Introduction

The yaml-merge command-line tool enables users to control how it merges Arrays-of-Hashes (AoH). This is different from merging regular Arrays, discussed elsewhere. By default, every Hash element of the RHS document's Array will be appended to the LHS AoH at the same location. The available AoH merge options include:

  1. all (the default) is as described above; every Hash element is appended from the RHS document to the LHS document.
  2. deep treats all Hashes as records with a mandatory identity key, deeply merging records with matching identity keys and simply appending the remainder from the RHS document to the LHS document.
  3. left causes RHS AoHs to be ignored when an AoH already exists at the same location within the LHS document. RHS AoHs are retained only where there is no LHS AoH at the same location.
  4. right causes LHS AoHs to be entirely replaced by the RHS AoH when an AoH exists at the same location within both documents. LHS AoHs are retained only where there is no RHS AoH at the same location.
  5. unique ignores RHS AoH elements which already exist -- in their entirety -- within the LHS AoH, appending the rest. There is no identity key comparison for this mode; rather, each entire RHS Hash is compared for an LHS equivalent.

Each of these options will be explored in the following sections. All sections will use these two documents for their discussions:

File: LHS.yaml

---
baubles:
  - name: Doohickey
    sku: 0-000-1
    price: 4.75
    weight: 2.7g
  - name: Doodad
    sku: 0-000-2
    price: 10.5
    weight: 5g
  - name: Oddball
    sku: 0-000-3
    price: 25.99
    weight: 25kg
coordinates:
  - x: 4
    y: -2
  - x: 1
    y: -1
  - x: 0
    y: 0
lhs_exclusive:
  - step: 1
    action: echo Hello, lefties of the World!
  - step: 2
    action: exit 0

File: RHS.yaml

---
baubles:
  - name: Fob
    sku: 0-000-4
    price: 0.99
    weight: 18mg
  - name: Doohickey
    price: 10.5
  - name: Oddball
    sku: 0-000-3
    description: This ball is odd
coordinates:
  - x: 0
    y: 0
  - x: 1
    y: 1
  - x: 4
    y: 2
rhs_exclusive:
  - step: 1
    action: echo Hello, righties of the World!
  - step: 2
    action: exit 0

Merging All Records Together

This is the default mode and ensures that every AoH element from every merge document is retained, including duplicates. No effort is made to identify or use an identity key. When the two example documents are merged with --aoh=all or -O all, the resulting document becomes:

---
baubles:
  - name: Doohickey
    sku: 0-000-1
    price: 4.75
    weight: 2.7g
  - name: Doodad
    sku: 0-000-2
    price: 10.5
    weight: 5g
  - name: Oddball
    sku: 0-000-3
    price: 25.99
    weight: 25kg
  - name: Fob
    sku: 0-000-4
    price: 0.99
    weight: 18mg
  - name: Doohickey
    price: 10.5
  - name: Oddball
    sku: 0-000-3
    description: This ball is odd
coordinates:
  - x: 4
    y: -2
  - x: 1
    y: -1
  - x: 0
    y: 0
  - x: 0
    y: 0
  - x: 1
    y: 1
  - x: 4
    y: 2
lhs_exclusive:
  - step: 1
    action: echo Hello, lefties of the World!
  - step: 2
    action: exit 0
rhs_exclusive:
  - step: 1
    action: echo Hello, righties of the World!
  - step: 2
    action: exit 0

For this particular data-set, this merge mode isn't especially helpful. However, it is ideal for any data-set in which duplicate AoH elements are desired.
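
The append-everything behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch, not yaml-merge's implementation: the merge_all function and the trimmed-down sample data are hypothetical, and only top-level AoHs are handled.

```python
from copy import deepcopy


def merge_all(lhs_doc, rhs_doc):
    """Hypothetical model of --aoh=all for top-level AoHs only."""
    merged = deepcopy(lhs_doc)
    for key, rhs_records in rhs_doc.items():
        # setdefault creates an empty AoH when the LHS has none at this key,
        # so RHS-only AoHs are carried over unchanged.
        merged.setdefault(key, []).extend(deepcopy(rhs_records))
    return merged


# Trimmed-down stand-ins for LHS.yaml and RHS.yaml.
lhs = {"coordinates": [{"x": 0, "y": 0}], "lhs_exclusive": [{"step": 1}]}
rhs = {"coordinates": [{"x": 0, "y": 0}], "rhs_exclusive": [{"step": 1}]}

result = merge_all(lhs, rhs)
print(result["coordinates"])  # the duplicate record appears twice
```

No record is compared against any other here, which is why duplicates survive.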

Deeply Merge Records Together

For the sample data-set, this is the most useful AoH merge mode for baubles but it is destructive to coordinates, so use this very powerful mode with deliberate intent. A deep merge treats each AoH element as if it were a data record. Each such record is uniquely identified by an identity key which is used during the merge to match records together.

An identity key is required for this mode. If any record is found without this key name, the merge is aborted and an error emitted. Under default conditions, an implicit identity key is determined; the first key name of the first RHS AoH record becomes the identity key for the entire merge. Users can explicitly instruct yaml-merge to use a different identity key for each AoH in the data by using a configuration file.
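
To make the identity-key rules concrete, here is a minimal Python sketch of a deep merge over one AoH. It is an approximation under stated assumptions, not the tool's actual code: deep_merge_aoh is a hypothetical name, records are treated as flat Hashes (nested values are overwritten rather than merged recursively), and the error type is chosen only for illustration.

```python
from copy import deepcopy


def deep_merge_aoh(lhs_records, rhs_records, identity_key=None):
    """Hypothetical model of --aoh=deep for a single AoH of flat records."""
    if not rhs_records:
        return deepcopy(lhs_records)
    if identity_key is None:
        # Implicit identity key: the first key name of the first RHS record.
        identity_key = next(iter(rhs_records[0]), None)

    merged = deepcopy(lhs_records)
    # Every record must carry the identity key, otherwise the merge aborts.
    for record in merged + rhs_records:
        if identity_key not in record:
            raise KeyError(f"record {record!r} lacks identity key {identity_key!r}")

    index = {record[identity_key]: record for record in merged}
    for rhs_record in rhs_records:
        match = index.get(rhs_record[identity_key])
        if match is None:
            # No LHS record shares this identity: append the RHS record.
            new_record = deepcopy(rhs_record)
            merged.append(new_record)
            index[new_record[identity_key]] = new_record
        else:
            # Matching identity: RHS fields overwrite or extend the LHS record.
            match.update(deepcopy(rhs_record))
    return merged


# The coordinates AoHs from LHS.yaml and RHS.yaml.
lhs_coordinates = [{"x": 4, "y": -2}, {"x": 1, "y": -1}, {"x": 0, "y": 0}]
rhs_coordinates = [{"x": 0, "y": 0}, {"x": 1, "y": 1}, {"x": 4, "y": 2}]

merged = deep_merge_aoh(lhs_coordinates, rhs_coordinates)
print(merged)  # [{'x': 4, 'y': 2}, {'x': 1, 'y': 1}, {'x': 0, 'y': 0}]
```

Because x is the first key of the first RHS record, it becomes the implicit identity key, and each RHS y value overwrites the LHS y value with the same x.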

When --aoh=deep or -O deep is set, the sample documents produce this result:

---
baubles:
  - name: Doohickey
    sku: 0-000-1
    price: 10.5
    weight: 2.7g
  - name: Doodad
    sku: 0-000-2
    price: 10.5
    weight: 5g
  - name: Oddball
    sku: 0-000-3
    price: 25.99
    weight: 25kg
    description: This ball is odd
  - name: Fob
    sku: 0-000-4
    price: 0.99
    weight: 18mg
coordinates:
  - x: 4
    y: 2
  - x: 1
    y: 1
  - x: 0
    y: 0
lhs_exclusive:
  - step: 1
    action: echo Hello, lefties of the World!
  - step: 2
    action: exit 0
rhs_exclusive:
  - step: 1
    action: echo Hello, righties of the World!
  - step: 2
    action: exit 0

In this case, name was implicitly determined to be the identity key for the baubles merge and x for coordinates. This worked especially well for the baubles, reducing 6 records to just 4 by correctly merging updates from the RHS document into the LHS document and appending one novel record to the end of the LHS set. However, using x as the identity key for the coordinates destroyed half of the records. They weren't really destroyed; the records were just merged by matching x values, overwriting the y field at each match. As both lhs_exclusive and rhs_exclusive had no counterparts in the opposite merge documents, they were brought into the final result without change.

Block RHS Records

It is possible to entirely discard all AoH records from the RHS document at any location where the LHS document already presents AoH records. For any parent location present in both documents, the records in the RHS document are not evaluated; they are simply discarded. Any AoH records located in the RHS document where the LHS document has none are added.

Using --aoh=left or -O left with the sample documents produces this result:

---
baubles:
  - name: Doohickey
    sku: 0-000-1
    price: 4.75
    weight: 2.7g
  - name: Doodad
    sku: 0-000-2
    price: 10.5
    weight: 5g
  - name: Oddball
    sku: 0-000-3
    price: 25.99
    weight: 25kg
coordinates:
  - x: 4
    y: -2
  - x: 1
    y: -1
  - x: 0
    y: 0
lhs_exclusive:
  - step: 1
    action: echo Hello, lefties of the World!
  - step: 2
    action: exit 0
rhs_exclusive:
  - step: 1
    action: echo Hello, righties of the World!
  - step: 2
    action: exit 0

As you can see, none of the baubles or coordinates records from the RHS document appear in the outcome. Because both lhs_exclusive and rhs_exclusive had no counterpart in their opposite documents, they were added unchanged to the merged document.

Overwrite LHS Records

It is also possible to completely discard all AoH records in the LHS document wherever the RHS document also has AoH records at the identical location. Setting --aoh=right or -O right produces this result:

---
baubles:
  - name: Fob
    sku: 0-000-4
    price: 0.99
    weight: 18mg
  - name: Doohickey
    price: 10.5
  - name: Oddball
    sku: 0-000-3
    description: This ball is odd
coordinates:
  - x: 0
    y: 0
  - x: 1
    y: 1
  - x: 4
    y: 2
lhs_exclusive:
  - step: 1
    action: echo Hello, lefties of the World!
  - step: 2
    action: exit 0
rhs_exclusive:
  - step: 1
    action: echo Hello, righties of the World!
  - step: 2
    action: exit 0

Inverting the behavior of Block RHS Records, this merge discards every baubles and coordinates AoH record from the LHS document because the RHS document also defined the same AoHs. Again, because both lhs_exclusive and rhs_exclusive had no counterpart in their opposite documents, they were added unchanged to the merged document.
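
The left and right modes differ only in which side wins when both documents define an AoH at the same location. The following Python sketch models that choice for top-level AoHs. The merge_blocking function and its keep parameter are hypothetical names used for illustration and are not part of yaml-merge.

```python
from copy import deepcopy


def merge_blocking(lhs_doc, rhs_doc, keep="left"):
    """Hypothetical model of --aoh=left and --aoh=right for top-level AoHs."""
    if keep not in ("left", "right"):
        raise ValueError("keep must be 'left' or 'right'")
    merged = deepcopy(lhs_doc)
    for key, rhs_records in rhs_doc.items():
        # RHS-only AoHs are always added; shared AoHs are replaced
        # wholesale only when the RHS side wins.
        if key not in merged or keep == "right":
            merged[key] = deepcopy(rhs_records)
    return merged


# Trimmed-down stand-ins for LHS.yaml and RHS.yaml.
lhs = {"coordinates": [{"x": 4, "y": -2}], "lhs_exclusive": [{"step": 1}]}
rhs = {"coordinates": [{"x": 0, "y": 0}], "rhs_exclusive": [{"step": 1}]}

left = merge_blocking(lhs, rhs, keep="left")
right = merge_blocking(lhs, rhs, keep="right")
print(left["coordinates"])   # [{'x': 4, 'y': -2}]
print(right["coordinates"])  # [{'x': 0, 'y': 0}]
```

In both cases the individual records are never inspected; the whole AoH is kept or replaced as a unit.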

Accept Only Unique Records

Some data-sets simply need to be de-duplicated. In this mode, each RHS record is compared in its entirety against every record in the LHS AoH. When the RHS record differs in any way from every LHS record, it is appended to the LHS AoH set. Otherwise, the record is discarded.
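
Whole-record comparison is simple to model in Python, where two dicts are equal only when every key and value matches. The merge_unique function below is a hypothetical sketch of the described behavior, not the tool's implementation. It compares each RHS record only against the original LHS records, as the text above describes.

```python
from copy import deepcopy


def merge_unique(lhs_records, rhs_records):
    """Hypothetical model of --aoh=unique for a single AoH."""
    merged = deepcopy(lhs_records)
    for rhs_record in rhs_records:
        # Dict equality compares the entire record; no identity key is used.
        if rhs_record not in lhs_records:
            merged.append(deepcopy(rhs_record))
    return merged


# The coordinates AoHs from LHS.yaml and RHS.yaml.
lhs_coordinates = [{"x": 4, "y": -2}, {"x": 1, "y": -1}, {"x": 0, "y": 0}]
rhs_coordinates = [{"x": 0, "y": 0}, {"x": 1, "y": 1}, {"x": 4, "y": 2}]

unique = merge_unique(lhs_coordinates, rhs_coordinates)
print(len(unique))  # 5: only the duplicate {x: 0, y: 0} record is dropped
```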

With --aoh=unique or -O unique set, the sample documents produce this result:

---
baubles:
  - name: Doohickey
    sku: 0-000-1
    price: 4.75
    weight: 2.7g
  - name: Doodad
    sku: 0-000-2
    price: 10.5
    weight: 5g
  - name: Oddball
    sku: 0-000-3
    price: 25.99
    weight: 25kg
  - name: Fob
    sku: 0-000-4
    price: 0.99
    weight: 18mg
  - name: Doohickey
    price: 10.5
  - name: Oddball
    sku: 0-000-3
    description: This ball is odd
coordinates:
  - x: 4
    y: -2
  - x: 1
    y: -1
  - x: 0
    y: 0
  - x: 1
    y: 1
  - x: 4
    y: 2
lhs_exclusive:
  - step: 1
    action: echo Hello, lefties of the World!
  - step: 2
    action: exit 0
rhs_exclusive:
  - step: 1
    action: echo Hello, righties of the World!
  - step: 2
    action: exit 0

This result is more interesting for the coordinates AoH than for baubles or the two exclusive AoH sets. You can see that for coordinates, the duplicate {x = 0, y = 0} record was discarded. As there were no entirely duplicate AoH records for baubles -- recall that the unique mode gives no consideration to identity keys -- every record from RHS was appended to the LHS list.

Configuration File Options

The yaml-merge tool can read per-YAML Path merging options from an INI-style configuration file via its --config (-c) argument. Whereas the --aoh (-O) argument supplies an overarching mode for merging AoHs, using a configuration file permits far more precise control whenever you need a different mode for specific parts of the merge documents.

Configuration File Section: defaults

The [defaults] section permits a key named aoh, which behaves identically to the --aoh (-O) command-line argument to the yaml-merge tool. The [defaults]aoh setting is overridden by the same-named command-line argument, when supplied. In practice, this file may look like:

File: merge-options.ini

[defaults]
aoh = all

Note that the spaces around the = sign are optional, but only an = sign may be used to separate each key from its value.
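
The precedence between the [defaults] section and the command line can be sketched with Python's standard configparser module. This is an assumption-laden illustration rather than a description of how yaml-merge reads its configuration: resolve_default_aoh, cli_aoh, and VALID_MODES are hypothetical names, and the parser is restricted to = as its only delimiter to mirror the note above.

```python
import configparser

# Contents of the example merge-options.ini file.
INI_TEXT = """\
[defaults]
aoh = all
"""

# The five AoH merge modes listed in the Introduction.
VALID_MODES = {"all", "deep", "left", "right", "unique"}


def resolve_default_aoh(ini_text, cli_aoh=None):
    """Hypothetical helper: a command-line value overrides [defaults]aoh."""
    # delimiters=("=",) means a colon is not treated as a key/value separator.
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.read_string(ini_text)
    # Fall back to "all", the documented default mode, when nothing is set.
    mode = cli_aoh or parser.get("defaults", "aoh", fallback="all")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown AoH merge mode: {mode!r}")
    return mode


print(resolve_default_aoh(INI_TEXT))                  # all
print(resolve_default_aoh(INI_TEXT, cli_aoh="deep"))  # deep
```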

Configuration File Section: rules

The [rules] section takes any YAML Paths as keys and, as values, any of the AoH merge modes that are available to the --aoh (-O) command-line argument. This enables extremely fine-grained control over where each mode applies.

Using the same two documents as all prior examples, adding a configuration file with these contents:

[rules]
baubles = deep
/coordinates = unique

... would produce this merged document:

---
baubles:
  - name: Doohickey
    sku: 0-000-1
    price: 10.5
    weight: 2.7g
  - name: Doodad
    sku: 0-000-2
    price: 10.5
    weight: 5g
  - name: Oddball
    sku: 0-000-3
    price: 25.99
    weight: 25kg
    description: This ball is odd
  - name: Fob
    sku: 0-000-4
    price: 0.99
    weight: 18mg
coordinates:
  - x: 4
    y: -2
  - x: 1
    y: -1
  - x: 0
    y: 0
  - x: 1
    y: 1
  - x: 4
    y: 2
lhs_exclusive:
  - step: 1
    action: echo Hello, lefties of the World!
  - step: 2
    action: exit 0
rhs_exclusive:
  - step: 1
    action: echo Hello, righties of the World!
  - step: 2
    action: exit 0

Notice:

  1. The baubles AoH was deeply merged, producing the ideal result whereby the RHS records updated matching LHS records and new RHS records were appended to the LHS set.
  2. The /coordinates AoH was uniquely merged, producing the ideal result for it whereby duplicate coordinates were removed.
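
The per-path selection shown above amounts to a lookup with a fallback. The Python sketch below is a deliberate simplification: it treats each rule key as a literal top-level path that may or may not start with a slash, whereas real YAML Paths are far more expressive. The mode_for function is a hypothetical name, not part of the YAML Path project.

```python
def mode_for(path, rules, default_mode="all"):
    """Hypothetical lookup: a matching [rules] entry beats the default mode."""
    # Simplification: "baubles" and "/baubles" are treated as the same path.
    normalized = {"/" + key.lstrip("/"): mode for key, mode in rules.items()}
    return normalized.get("/" + path.lstrip("/"), default_mode)


# The [rules] section from the example configuration file.
rules = {"baubles": "deep", "/coordinates": "unique"}

print(mode_for("/baubles", rules))        # deep
print(mode_for("coordinates", rules))     # unique
print(mode_for("/lhs_exclusive", rules))  # all
```

AoHs without a rule, such as lhs_exclusive and rhs_exclusive, fall back to the overarching mode.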

Configuration File Section: keys

Like the [rules] section, the [keys] section takes any YAML Paths as keys. In contrast, each entry specifies the identity key for the AoH at the specified YAML Path, overriding implicit identity key detection for the targeted AoHs.

Consider the baubles AoH in the previous samples. Using only yaml-merge, how would you rename the Doohickey to a Whatchamacallit? There are at least two ways.

Case-by-Case Identity Keys

First, create a new merge document, RHS2.yaml, with this content:

---
baubles:
  - name: Whatchamacallit
    sku: 0-000-1

Then, change the INI configuration file to:

[rules]
baubles = deep
/coordinates = unique

[keys]
/baubles[name = Whatchamacallit] = sku

Finally, merge all three documents together and verify the results. yaml-merge --config=config.ini LHS.yaml RHS.yaml RHS2.yaml produces:

---
baubles:
  - name: Whatchamacallit
    sku: 0-000-1
    price: 10.5
    weight: 2.7g
  - name: Doodad
    sku: 0-000-2
    price: 10.5
    weight: 5g
  - name: Oddball
    sku: 0-000-3
    price: 25.99
    weight: 25kg
    description: This ball is odd
  - name: Fob
    sku: 0-000-4
    price: 0.99
    weight: 18mg
coordinates:
  - x: 4
    y: -2
  - x: 1
    y: -1
  - x: 0
    y: 0
  - x: 1
    y: 1
  - x: 4
    y: 2
lhs_exclusive:
  - step: 1
    action: echo Hello, lefties of the World!
  - step: 2
    action: exit 0
rhs_exclusive:
  - step: 1
    action: echo Hello, righties of the World!
  - step: 2
    action: exit 0

Notice that all updates from RHS.yaml were applied to the baubles in LHS.yaml by merging on name and then RHS2.yaml successfully turned the Doohickey into a Whatchamacallit by merging on sku.
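
The two-stage effect can be reproduced with a small Python model that merges one AoH on an explicitly chosen key. As before, this is a sketch under assumptions: merge_on is a hypothetical function, the records are trimmed to a single bauble, and only flat records are handled.

```python
from copy import deepcopy


def merge_on(lhs_records, rhs_records, key):
    """Hypothetical deep merge of flat records matched on an explicit key."""
    merged = deepcopy(lhs_records)
    index = {record[key]: record for record in merged}
    for rhs_record in rhs_records:
        match = index.get(rhs_record[key])
        if match is None:
            # No record shares this key value: start a new record.
            match = {}
            merged.append(match)
            index[rhs_record[key]] = match
        match.update(deepcopy(rhs_record))
    return merged


# One bauble from LHS.yaml, its update from RHS.yaml, and the RHS2.yaml record.
lhs = [{"name": "Doohickey", "sku": "0-000-1", "price": 4.75}]
rhs = [{"name": "Doohickey", "price": 10.5}]
rhs2 = [{"name": "Whatchamacallit", "sku": "0-000-1"}]

step1 = merge_on(lhs, rhs, "name")    # RHS.yaml merges on name
step2 = merge_on(step1, rhs2, "sku")  # RHS2.yaml merges on sku
print(step2)  # [{'name': 'Whatchamacallit', 'sku': '0-000-1', 'price': 10.5}]

# Merging RHS2.yaml on name instead finds no match and appends a second record.
wrong = merge_on(step1, rhs2, "name")
print(len(wrong))  # 2
```

The contrast at the end shows why the identity key has to change for the rename: matching on name can never pair the Whatchamacallit record with the Doohickey record.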

Direct Edit

We could alternatively just take the challenge literally and employ yaml-merge like a fancy version of yaml-set:

# Begin by displaying the starting name of each bauble:
yaml-get --query='(/baubles/name)' LHS.yaml 
["Doohickey", "Doodad", "Oddball"]

# Then, rename only the bauble record named Doohickey:
echo Whatchamacallit | yaml-merge --mergeat='/baubles[name=Doohickey]/name' LHS.yaml - | yaml-get --query='(/baubles/name)' -
["Whatchamacallit", "Doodad", "Oddball"]