Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yaml-merge is very slow or stuck when merging Arrays or Arrays-of-Hashes into a nonexistent path #220

Closed
abramov-oleg opened this issue Oct 17, 2023 · 5 comments · Fixed by #222
Assignees
Labels
bug Something isn't working
Milestone

Comments

@abramov-oleg
Copy link

Operating System

Ubuntu 20.04.6 LTS

Version of Python and packages in use at the time of the issue.

Python 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00)
[GCC 11.4.0] on linux

Version of yamlpath installed: 3.8.0
Version of ruamel.yaml installed: 0.17.21

Minimum sample of YAML (or compatible) data necessary to trigger the issue

merge_test_base.yaml

key: value

merge_test_mod.yaml

array:
  - element

Alternatively, for array-of-hashes.
merge_test_mod.yaml

array_of_hash:
  - name: one
    val: some

Complete steps to reproduce the issue when triggered via:

  1. Command-Line Tools (yaml-get, yaml-set, or eyaml-rotate-keys): Precise command-line arguments which trigger the defect.
yaml-merge -w merge_test_result.yaml -m new_key merge_test_base.yaml merge_test_mod.yaml

Expected Outcome

Immediate completion, with contents of merge_test_result.yaml

---
key: value
new_key:
  array:
    - element

Actual Outcome

Script does not complete after several minutes, didn't wait more than that.

@wwkimball wwkimball added the bug Something isn't working label Oct 17, 2023
@wwkimball
Copy link
Owner

Wow! There are innumerable tests for the Merger class and the yaml-merge CLI tool. All of them pass every day. None of them pick up this edge-case. Thank you for discovering and reporting this!

I'm in the midst of a major refactoring effort to adopt some significant changes to ruamel.yaml's internal workings. I'll try to get to a good pause in that effort to put out a hotfix for this issue, still using the older ruamel.yaml.

@wwkimball wwkimball added this to the 3.8.1 milestone Oct 17, 2023
@abramov-oleg
Copy link
Author

Wow! There are innumerable tests for the Merger class and the yaml-merge CLI tool. All of them pass every day. None of them pick up this edge-case. Thank you for discovering and reporting this!

I'm in the midst of a major refactoring effort to adopt some significant changes to ruamel.yaml's internal workings. I'll try to get to a good pause in that effort to put out a hotfix for this issue, still using the older ruamel.yaml.

I've arrived at this case as I was looking for a workaround for Impossible to add Array data to non-Array destination-type errors while using the Merger. Can you please give me a quick tip, is it possible to do a merge where lhs keys are completely replaced (without checks for data types) by rhs? Sorry if I missed it in the docs.

@wwkimball
Copy link
Owner

So I can be sure I understand, is this what you're trying to accomplish:

lhs.yaml

root:
  is_a:  hash

rhs.yaml

root:
  - is_an
  - array

output.yaml

root:
  - is_an
  - array

@abramov-oleg
Copy link
Author

Yes, or like this

lhs.yaml

root:
  key: value

rhs.yaml

root:
  key:
    - element

output.yaml

root:
  key:
    - element

@wwkimball
Copy link
Owner

wwkimball commented Oct 18, 2023

Yes, or like this

lhs.yaml

root:
  key: value

rhs.yaml

root:
  key:
    - element

output.yaml

root:
  key:
    - element

There is a way, but I have to caution users against doing this without taking great care to be precise with how this technique is utilized. You are deliberately corrupting data (stomping a Hash into an Array). In most cases, this is undesirable. However, for specific, carefully curated cases, it can be exactly what you need.

I'll present the Best Practice first, then the High Risk (But Simpler) method. In both, assume we are using the lhs.yaml and rhs.yaml in the quoted message above.

Best Practice

Use a configuration file to precisely identify which Hash elements within LHS are allowed to be stomped. For this method, create a new INI-style configuration file, like this:

stompies.yaml

[rules]
/root = right

Note that each [rules] entry is a YAML Path and the merge rule to apply to it. right means, "Discard the LHS content (at that specific YAML Path) and entirely replace it with the RHS content (at the same YAML Path), no matter what it is." This is not a merge. Rather, this is a wholesale replacement of the LHS node with the RHS node. Even when boths nodes are Hashes, the LHS node will still be entirely replaced with the RHS content. There are other options available besides right, all discussed in the help documentation (yaml-merge --help) and the Wiki.

Using this configuration file approach, you can add any number of rules to specifically identify which LHS nodes you know need to be specially handled without affecting any other nodes in the document. You can also setup other useful rules like AoH merge keys, and more.

Command: yaml-merge -c stompies.yaml lhs.yaml rhs.yaml

Produces:

---
root:
  key:
    - element

High Risk (But Simpler)

You can also set a global configuration rule to just stomp every Hash in LHS with whatever is in RHS at the same path. Note that while this would achieve the immediate goal, it exposes your operation to perhaps unexpected results. Remember that a right replacement isn't a merge; it is a wholesale replacement of whatever was in LHS with whatever is in RHS.

Command: yaml-merge -H right lhs.yaml rhs.yaml

Produces:

---
root:
  key:
    - element

wwkimball added a commit that referenced this issue Oct 18, 2023
wwkimball added a commit that referenced this issue Oct 18, 2023
This was referenced Oct 18, 2023
@wwkimball wwkimball linked a pull request Oct 18, 2023 that will close this issue
wwkimball added a commit that referenced this issue Oct 18, 2023
* Fix #220
* Mention #220 in CHANGES
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants