Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scala sealed trait ability (mostly) in dataconf using dataclass #10

Merged
merged 17 commits into from Sep 20, 2021

Conversation

dwsmith1983
Copy link
Collaborator

@zifeo added the ability for nested dataclass methods (sealed trait pureconfig functionality in Scala), updated README, incremented to 0.1.6, and created version.py which will provide version and get the latest from the pyproject.toml

Also, on my local, the following test always fails. How do I get it to pass?

tests/test_parser.py:186 (TestParser.test_missing_type)
self = <tests.test_parser.TestParser object at 0x7fe398c417c0>

    def test_missing_type(self) -> None:
    
        with pytest.raises(MissingTypeException):
>           loads("", Dict)
E           Failed: DID NOT RAISE <class 'dataconf.exceptions.MissingTypeException'>

@dwsmith1983
Copy link
Collaborator Author

@zifeo the lint test failed. Not sure why

Cache Size: ~13 MB (13738048 B)
/usr/bin/tar --use-compress-program zstd -d -xf /home/runner/work/_temp/a9c1d847-2022-40d2-a351-a81aa3882ad2/cache.tzst -P -C /home/runner/work/dataconf/dataconf
Error: The operation was canceled.

@dwsmith1983
Copy link
Collaborator Author

@zifeo I see that from dataconf.version import __version__ is the failing behavior for zimports. Is there anyway to add this to a safe list? This one allows anyone to call the version (pretty common)

import dataconf
print(dataconf.__version__)

Copy link
Owner

@zifeo zifeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. pytest raise failure: which env/python version are you using? I see there are gone now.
  2. linting: some colons were missing. You can run the lint locally with pre-commit run --all-files and before every commit with pre-commit install (run once for setup)
  3. z-imports is a bit tricky, but including it in the explicit listing should do it.

Now regarding the global pull request, thank you for suggesting that change. Although I never used pure config, I get the idea. I see that you went for data classes nested under a classical class. Does this provide any advantage over simple inheritance (InputType.__subclasses__()) or union type (InputType = Union[StringImpl, IntImpl])?

tests/test_parser.py Outdated Show resolved Hide resolved
dataconf/version.py Show resolved Hide resolved
dataconf/utils.py Outdated Show resolved Hide resolved
@dwsmith1983
Copy link
Collaborator Author

dwsmith1983 commented Aug 26, 2021

Does this provide any advantage over simple inheritance (InputType.__subclasses__()) or union type (InputType = Union[StringImpl, IntImpl])?

I was replicating the idea one would have using pureconfig with Hocon when I wrote input_source: InputType(). In Scala, we just specify the trait. We dont have traits in Python so I used the class to replicate the functionality. I like the .__subclasses__() approach since it looks cleaner.

Why I picked this method was just solely based on my experience from Scala. Anyone on data engineering side working with Scala would easily pick this up where I work since it would mimic our dataflow library usage. That was the motivation. If there is a better way, let me know.
https://pureconfig.github.io/docs/overriding-behavior-for-sealed-families.html

@zifeo __subclasses__() looks cleaner but fails but Union works fine. Union[InputType.StringImpl, InputType.IntImpl]

…st comp from the for loop arg break down the for loop resulting in 0.4-0.35 test times to 0.25
@dwsmith1983 dwsmith1983 requested a review from zifeo August 26, 2021 15:22
@zifeo
Copy link
Owner

zifeo commented Aug 29, 2021

@dwsmith1983 What kind of failure do you get from the subclass approach? Currently if you use the union type, it does not use https://github.com/zifeo/dataconf/pull/10/files#diff-4198c13639f9dd31ef7b259b7e7167468728b79898427be0fa2e3be56f4526acR116-R127, right?

@dwsmith1983
Copy link
Collaborator Author

dwsmith1983 commented Aug 30, 2021

@dwsmith1983 What kind of failure do you get from the subclass approach? Currently if you use the union type, it does not use https://github.com/zifeo/dataconf/pull/10/files#diff-4198c13639f9dd31ef7b259b7e7167468728b79898427be0fa2e3be56f4526acR116-R127, right?

With union, correct it skips it. With __subclass__(), clazz == []. It looks like the only the usage of InputType() so far allows for single type parameter since dir(clazz) will bring up InputType. Any ideas? In some cases, InputType() could have many subclasses so automatic parsing would be nicer for the user than a long union list.

@zifeo
Copy link
Owner

zifeo commented Aug 30, 2021

@dwsmith1983 I was thinking of something like

class Parent:
  pass

@dataclass
class StringImpl(Parent):
   name: Text
   age: Text

@dataclass
class IntImpl(Parent):
  area_code: int
  phone_num: Text

@dataclass
 class Base:
     location: Text
     input_source: Parent

str_conf = """
 {
     location: Europe
     input_source {
         name: Thailand
         age: "12"
     }
 }
 """

@dwsmith1983
Copy link
Collaborator Author

@zifeo wouldn't need to make parent then an ABC class which would be nice with the abstract methods and overriding. That would be seamless transition from Scala/Java hocon with traits to Python.

https://docs.python.org/3/library/abc.html

@zifeo
Copy link
Owner

zifeo commented Aug 31, 2021

@dwsmith1983 not really fan of the register syntax, but it could support definitely both!

@dwsmith1983
Copy link
Collaborator Author

dwsmith1983 commented Sep 1, 2021

@zifeo have you tested the syntax you suggested? I just gave it try. I didn't step through it but from the error, it looks like the same is occurring dir(clazz) == []. Did it work for you?

I just stepped through it. It is class but the overriding classes of StringImp and IntImpl aren't seen from dir(clazz). That is, we see the config tree but no matching dataclass.

@zifeo
Copy link
Owner

zifeo commented Sep 1, 2021

@dwsmith1983 can you push your changes? I will give it a try.

@dwsmith1983
Copy link
Collaborator Author

@zifeo pushed

@dwsmith1983
Copy link
Collaborator Author

@zifeo any updates?

@zifeo
Copy link
Owner

zifeo commented Sep 13, 2021

@dwsmith1983 What about the following?

diff --git a/dataconf/utils.py b/dataconf/utils.py
index cb1082a..4fd3ee3 100644
--- a/dataconf/utils.py
+++ b/dataconf/utils.py
@@ -118,14 +118,19 @@ def __parse(value: any, clazz, path):
     #       if subclasses are a dataclass parse values
     #       the idea here is to replicate parsing of a sealed trait in Scala
     #       when using pureconfig
-    for key in dir(clazz):
-        nested_class = getattr(clazz, key)
-        if is_dataclass(nested_class):
-            field_keys = set(
-                [i.name for i in fields(getattr(clazz, nested_class.__name__))]
-            )
-            if field_keys == set(value.keys()):
-                return __parse(value, getattr(clazz, nested_class.__name__), "")
+    child_failures = []
+    for child_clazz in sorted(clazz.__subclasses__(), key=lambda c: c.__name__):
+        if is_dataclass(child_clazz):
+            try:
+                return __parse(value, child_clazz, path)
+            except TypeConfigException as f:
+                child_failures.append(f)
+
+    if len(child_failures) > 0:
+        fails = '\n- '.join(child_failures)
+        raise TypeConfigException(
+            f"expected type {clazz} at {path}, failed subclasses:{fails}"
+        )
 
     raise TypeConfigException(f"expected type {clazz} at {path}, got {type(value)}")
 
diff --git a/tests/test_parser.py b/tests/test_parser.py
index 32b2088..87d280a 100644
--- a/tests/test_parser.py
+++ b/tests/test_parser.py
@@ -249,7 +249,7 @@ class TestParser:
         @dataclass
         class Base:
             location: Text
-            input_source: InputType()
+            input_source: InputType
 
         str_conf = """
         {

@dwsmith1983
Copy link
Collaborator Author

dwsmith1983 commented Sep 16, 2021

@zifeo I added it over. I removed len(child_failures) to if child_failures since if list is empty it will return false. No need for the length check.

child_failures.append(str(f))

# no need to check length; false if empty
if child_failures:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I removed the length check @zifeo

try:
return __parse(value, child_clazz, path)
except TypeConfigException as f:
child_failures.append(str(f))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zifeo also changed f to str(f) for the join. We will need a test for failure too to make sure all is working.

Copy link
Owner

@zifeo zifeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update the doc and all good.

@dwsmith1983
Copy link
Collaborator Author

@zifeo updated doc and added a fail test check for the abstract class trait

@zifeo zifeo merged commit 1a5257a into zifeo:master Sep 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants