Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pyhocon can't parse nested YAML maps #50

Closed
deleted opened this issue Jul 22, 2022 · 10 comments · Fixed by #51
Closed

pyhocon can't parse nested YAML maps #50

deleted opened this issue Jul 22, 2022 · 10 comments · Fixed by #51

Comments

@deleted
Copy link

deleted commented Jul 22, 2022

I discovered this while trying to use dataconf to parse a YAML config file with one level of nesting.

For example, adding the following to test_parse.py:

    def test_yaml_nested(self) -> None:

        @dataclass
        class B:
            c: Text

        @dataclass
        class A:
            b: B

        conf = """
        b:
          c: test
        """
        assert loads(conf, A) == A(b=B(c="test"))

This test should pass, but results in the following test failure:

=================================================== FAILURES ====================================================
__________________________________________ TestParser.test_yaml_nested __________________________________________

self = <tests.test_parse.TestParser object at 0x7f440c099f90>

    def test_yaml_nested(self) -> None:
    
        @dataclass
        class B:
            c: Text
    
        @dataclass
        class A:
            b: B
    
        conf = """
        b:
          c: test
        """
>       assert loads(conf, A) == A(b=B(c="test"))

tests/test_parse.py:298: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dataconf/main.py:102: in loads
    return string(s, clazz, **kwargs)
dataconf/main.py:82: in string
    return multi.string(s, **kwargs).on(clazz)
dataconf/main.py:67: in on
    return parse(conf, clazz, self.strict, **self.kwargs)
dataconf/main.py:17: in parse
    return utils.__parse(conf, clazz, "", strict, ignore_unexpected)
dataconf/utils.py:76: in __parse
    fs[f.name] = __parse(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

value = 'c: test', clazz = <class 'tests.test_parse.TestParser.test_yaml_nested.<locals>.B'>, path = '.b'
strict = True, ignore_unexpected = False

    def __parse(value: any, clazz: Type, path: str, strict: bool, ignore_unexpected: bool):
    
        if is_dataclass(clazz):
    
            if not isinstance(value, ConfigTree):
>               raise TypeConfigException(
                    f"expected type {clazz} at {path}, got {type(value)}"
                )
E               dataconf.exceptions.TypeConfigException: expected type <class 'tests.test_parse.TestParser.test_yaml_nested.<locals>.B'> at .b, got <class 'str'>

dataconf/utils.py:55: TypeConfigException
============================================ short test summary info ============================================
FAILED tests/test_parse.py::TestParser::test_yaml_nested - dataconf.exceptions.TypeConfigException: expected t...
========================================= 1 failed, 30 passed in 0.40s ==========================================

Changing the input to:

        conf = """
        b:
            {c: test}
        """

causes the test to pass, but I suspect this is because it is coincedentally also valid HOCON.

@dwsmith1983
Copy link
Collaborator

@deleted thanks for the feedback. We will take a look at this.

@dwsmith1983
Copy link
Collaborator

dwsmith1983 commented Jul 22, 2022

@deleted the format in yaml is interpreted has a nested dictionary as opposed to nested dataclass. I am not sure if this would be bug. @zifeo any thoughts?

@deleted
Copy link
Author

deleted commented Jul 22, 2022

@dwsmith1983 I'm not sure I'm quite understanding your comment above:

the format in yaml is interrupted has a nested dictionary as opposed to nested dataclass.

Can you clarify?

I will attempt to reframe the issue: The test case here is, in YAML terms, a top-level mapping with one key (b) whose value is another mapping with one key c. If we were to parse it with pyyaml, we'd get a dictionary containing a nested dictionary: {'b': {'c': 'test'}}. If we feed that result into dataconf.dict() with the dataclasses defined in the test, we get the expected result:

In [17]: conf = """
    ...:         b:
    ...:           c: test
    ...:        """

In [18]: dataconf.dict(yaml.load(conf, Loader=yaml.SafeLoader), A)
Out[18]: A(b=B(c='test'))

But if we call

dataconf.loads(conf, A)

we get that TypeConfigException

@dwsmith1983
Copy link
Collaborator

@zifeo I was saying that it is read as nested dictionaries not dataclasses in that yaml format.

@zifeo
Copy link
Owner

zifeo commented Jul 25, 2022

@deleted @dwsmith1983 Thanks for the report.

I have been playing a bit more with the PyHocon library to which we are delegating the parsing. It turns out that yaml support within Dataconf is a bit a misnomer (load KO, dump OK). Both yaml and hocon can be seen a superset of json (hocon with some java .properties features), however the relation between them is not well defined (e.g. not mixeable).

Thus PyHocon does not currently support yaml:

>>> from pyhocon import ConfigFactory
>>> ConfigFactory.parse_string("""
... b:
...   {c: test}
... """)
ConfigTree([('b', ConfigTree([('c', 'test')]))])
>>> ConfigFactory.parse_string("""
... b:
...   c: test
... """)
ConfigTree([('b', 'c: test')])

I see two options:

  1. commit and attempt to parse the string with yaml then sending the result through PyHocon, or fallback to the later (however nested values might be challenging)
  2. remove yaml support "branding"

@deleted I am curious about your use case. Can you tell me a bit more? Would hocon be an option?

@dwsmith1983
Copy link
Collaborator

dwsmith1983 commented Jul 25, 2022

@zifeo I think two is better. YAML to HOCON or JSON instead isn't a big change on the users end.

@deleted
Copy link
Author

deleted commented Jul 26, 2022

@zifeo HOCON seems pretty nice, I'd prefer YAML because it a more standard config file format in the community at large. I'd be concerned that some of my users might balk at having to learn a new config language.

Another option I'll throw out there for you guys is that you could add a dataconf.yaml() method that would parse input with pyyaml first, then pass it to dataconf.dict(). pyyaml is already a transitive dependency, according to your poetry.lock It is a dev dependency only. My bad.

@zifeo
Copy link
Owner

zifeo commented Jul 28, 2022

@deleted While I am not a huge fan of adding yaml in addition to string (the current api focuses more on the medium than the content), I like your idea of having different loader. This makes even more sens if we want to support toml one day. #51 Is open with a suggested API, feel free to have a look.

@zifeo zifeo closed this as completed in #51 Jul 31, 2022
@zifeo
Copy link
Owner

zifeo commented Aug 1, 2022

@deleted released in 2.1.0, cheers

@deleted
Copy link
Author

deleted commented Aug 2, 2022

Woot! Thanks @zifeo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants