Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple components with same name in a block #433

Closed
keckler opened this issue Oct 6, 2021 · 11 comments · Fixed by #1668
Closed

Multiple components with same name in a block #433

keckler opened this issue Oct 6, 2021 · 11 comments · Fixed by #1668
Assignees
Labels
bug Something is wrong: Highest Priority complex Expected to be a complex issue

Comments

@keckler
Copy link
Member

keckler commented Oct 6, 2021

As best as I can tell, it seems that if one defines multiple components that have the same name within given block, previous components with that name will be overwritten by components further down in the input file. Please correct me if I am wrong on this, though. I am having a hard time convincing myself, but some simple testing makes me inclined to believe.

Further, I believe that no warning is given to the user in this case.

The relevant code, I think, is here, where components are being added to a KeyedList using their names. I think a check could be put in there to first see if a component with the same name has already been added.

Some might argue that this is a feature, rather than a bug. That could be debated, but my opinion is that a warning message would be helpful in this scenario for reducing input bugs.

@youngmit
Copy link
Contributor

youngmit commented Oct 6, 2021

This is actually a little bit tricky. According to the YAML Spec, mapping keys MUST be unique. So in this case, we technically have invalid YAML. However, the YAML parser is not seeing this as an error, and therefore not complying with the spec. This is being tracked in yaml/pyyaml#165, which is itself a duplicate of yaml/pyyaml#41, which was submitted back in 2016 by our very own @ntouran. Not sure how best to proceed, since by the time we see any data, its already merged the duplicate keys somehow outside of our control.

@youngmit
Copy link
Contributor

youngmit commented Oct 6, 2021

This is an issue in the dependency upstream, though we could considering using some workarounds (e.g., https://gist.github.com/pypt/94d747fe5180851196eb). We should keep this open to monitor issue status.

@ntouran
Copy link
Member

ntouran commented Oct 6, 2021

Wait a minute though, aren't we not using pyyaml anymore? Blueprints uses ruamel which originally was a fork of pyyaml. I seem to have some recollection of being able to detect duplicates at another level nowadays. E.g. see this ruamel doc about duplicates..

In any case I agree that we would prefer to crash or at the very least warn in this scenario if possible. I guess updating the construct method linked above to check for an existing name wouldn't work because we think the yaml data has already over-written the duplicated key by the time this code executes?

@youngmit
Copy link
Contributor

youngmit commented Oct 6, 2021

Ooh, good points! I pulled those comments from our internal issue tracker, so they may be outdated. I can try experimenting with the allow_duplicate_keys setting, but as it describes itself, it seems like this should already be an error 🤔

@keckler
Copy link
Member Author

keckler commented Oct 19, 2021

Just to document it, I think the same type of thing is occurring when multiple assembly definitions have the same specifier. When the assemblies get put into a core based on the specifiers, no warning is printed and only one of the assembly definitions is actually used.

Probably this same thing is happening in other places as well where items are hashed by some identifier.

@john-science john-science added the bug Something is wrong: Highest Priority label Oct 19, 2021
@john-science
Copy link
Member

john-science commented Oct 19, 2021

@keckler Yeah, it stands to reason a similar thing would be happening everywhere we are using this YAML library.

While I can't find any examples of this hurting people, it certainly doesn't make me happy.

I am raising this ticket to a bug.

@jakehader
Copy link
Member

This would be great to fix in the nearer-term as this can lead to user input bugs.

@john-science
Copy link
Member

john-science commented Aug 22, 2023

For future reference, the problem is in our Blueprint.load() method:

def load(cls, stream, roundTrip=False):
"""This class method is a wrapper around the `yamlize.Object.load()` method.
The reason for the wrapper is to allow us to default to `Cloader`. Essentially,
the `CLoader` class is 10x faster, but doesn't allow for "round trip" (read-
write) access to YAMLs; for that we have the `RoundTripLoader`.
"""
loader = RoundTripLoader if roundTrip else CLoader
return super().load(stream, Loader=loader)

We need to inject some custom logic to detect duplicate sub-keys in YAML files.

I thought the problem was yamlize, then I thought it was ruamel.yaml, but no. The problem is the official YAML spec. Apparently, the official YAML spec defines this as a duplicate key, and thus an error:

blocks:
    # error
blocks:
    # error

But if the duplicate keys are NOT at top level, they aren't an error:

blocks:
    fuel:
        duct:
            # not an error
        duct:
            # not an error

After extensive testing, ruamel.yaml, pyyaml, and yamllint all silently pass on this and they have no tools to throw errors or even warnings in this case.

So, now we are in the extremely unenviable position that if we want to catch such things we need to write our own YAML-text-parsing tool to go through and find these not-top-level duplicates.

@john-science john-science added the complex Expected to be a complex issue label Mar 19, 2024
@john-science
Copy link
Member

I'll be honest, considering this "bug" is latent in all YAML readers, and in the YAML spec, I'm not pumped about trying to fix this in ARMI.

The "solution" is to write our own YAML parser, like yaml.CLoader. A quick looks shows that class has three base classes, totalling several thousand lines of code. I do not want to copy/paste all of that into ARMI to make a single tweak. But I also can't find immediately find the line in that repo that enforces this part of the spec.

I consider this more a limitation of the technology choice we made (YAML), than I do a "bug" in ARMI.

@john-science
Copy link
Member

john-science commented Mar 20, 2024

As an alternative to changing the behavior of something that is so widely supported in the YAML spec, I propose we just add some documentation to somewhere in the User Docs, Inputs section:

******
Inputs
******

A warning/note at the top about duplicates in the YAML spec would seem to do the job.

@keckler
Copy link
Member Author

keckler commented Mar 20, 2024

That would certainly be the easiest path forward, but others please chime in if they think this wouldn't be acceptable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong: Highest Priority complex Expected to be a complex issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants