-
-
Notifications
You must be signed in to change notification settings - Fork 118
Allow disambiguating union types by unique literal types #273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think doing something like this would be an awesome feature. Not sure if I would enable it by default due to backwards compatibility, but it should definitely be easily accessible. It'd also be great if the tag could come from other places, like the name of the class or something like that. This will probably be the focus of the release after this one. |
Disambiguating by modifying unstructuring to add the class name as a tag also makes a lot of sense and has advantages in having less repetition in the class definition. The way I did it has the advantage that it's translatable 1:1 to and from TypedDict definitions or TypeScript definitions of the same data structures and doesn't require external information (the configuration of cattrs) to understand how ser/deser will work. My variant is implemented in a way that (should) be backwards compatible since it only disambiguates a case that would otherwise throw an error (if the disambiguation function is added behind the existing unique-property disambiguator). |
So I've been thinking about this since we have a lot of these kinds of situations at work. I think in some situations we can do better than a literal field though. I think that piece of information is unnecessary on the Python level, so the field is just pure overhead. The field should be Here's a counterproposal. @attrs.define
class MatchedThing:
xxxxxx: str
@attrs.define
class UnmatchedThing:
yyyyyy: str
@attrs.define
class MaybeSomething:
something: MatchedThing | UnmatchedThing
# Set up the converter to handle MaybeSomething as a tagged union.
configure_tagged_union(
converter,
MaybeSomething,
"matched", # the field name
{True: MatchedThing, False: UnmatchedThing}, # the mapping of the field name values to classes, and reverse
) What do you think? |
What you're saying makes sense with the disambiguating attribute not really being a class member at all. I think the core issue is that there's no real standard integrated way to represent tagged unions / sumtypes in python. So that results in different workarounds people use to create them. The reason I did it as above is that I started from JSON data that I already had and started representing that (without "structuring") as TypedDicts: class MatchedThing(TypedDict):
matched: Literal[True] = True
xxxxxx: str
class UnmatchedThing(TypedDict):
matched: Literal[False] = False
yyyyyy: str And disambiguated them by checking the value Only later did I move to attrs. I can think of three issues with your suggestion with the last being the largest:
@attrs.define
class MatchedThing:
matched: Literal[True] = True
xxxxxx: str
@attrs.define
class UnmatchedThing:
matched: Literal[False] = False
yyyyyy: str
@attrs.define
class ThirdThing:
matched: Literal["specialcase"] = "specialcase"
zzzzzz: str
MaybeMatched = MatchedThing | UnmatchedThing
@attrs.define
class Foo:
x: int
y: int
something: MaybeMatched
@attrs.define
class Bar:
greeting: str
first_thing: MatchedThing
second_thing: MaybeMatched
unmatched_thing: Optional[UnmatchedThing]
different_thing: MaybeMatched | ThirdThing How would this look with your idea? |
You might have noticed cattrs generally puts logic on converters instead of on the models; this is a conscious design decision. The idea is the same model should be reusable in many different contexts/converters. That said, practicality beats purity, so this isn't set in stone ;) Point 1: I think you'd be using the match statement to process these, or a chain of isinstance() checks on lower Python versions. Point 2: yeah, you'd have to look at the converter and not the model. But the model could be reused in different contexts and with different union rules. Point 3: I'm not sure if I agree or not, it's a difficult question. You can't have a union of one type in Python's type system (i.e. As for your example, I think it'd work out of the box without support for tagged unions since all your unions have unique required fields, but that's beside the point ;) Here's your example adjusted: @define
class MatchedThing:
xxxxxx: str
@define
class UnmatchedThing:
yyyyyy: str
@define
class ThirdThing:
zzzzzz: str
MaybeMatched = MatchedThing | UnmatchedThing
@define
class Foo:
x: int
y: int
something: MaybeMatched
@define
class Bar:
greeting: str
first_thing: MatchedThing
second_thing: MaybeMatched
unmatched_thing: Optional[UnmatchedThing]
different_thing: MaybeMatched | ThirdThing
converter = Converter()
configure_tagged_union(
converter,
MaybeMatched,
"matched",
{True: MatchedThing, False: UnmatchedThing}
)
configure_tagged_union(
converter,
MaybeMatched | ThirdThing,
"matched",
{True: MatchedThing, False: UnmatchedThing, "specialcase": ThirdThing}
) I see your point though, there's repetition. But on the other hand, you don't need to prepare the classes to be used in unions, you just need to prepare the converter. Maybe the verboseness could be solved with a hook factory or something. Needs more consideration ;) |
Your example is actually better than I thought, because now the second parameter of your Not having the disambiguating member be part of the model might still cause an issue for some use cases. It seems weird that the model serializes differently depending on which union it is a part of. It might be good to look at other languages or libraries to see how they do it. The typedload library for example disambiguates by simply running the structuring variants one after another until one doesn't throw an error. As more "inspiration" the "serde" library of rust uses the type name as the "tag", and is configurable in whether the tag is added inside the object or outside of it. See https://serde.rs/enum-representations.html . It's easier for Rust though because unions are always tagged. |
We'll probably need to support both having the tag name inside the class and not, and combinations. I was playing around with this idea: @define
class MatchedThing:
matched: Final[bool] = True
xxxxxx: str According to https://peps.python.org/pep-0591/, a final variable defined in the class body with a default should be a class variable. I like this approach since it's clean. But attrs needs to be upgraded to support this; it doesn't yet. |
i tried to achieve something very similar, and was about to post a new issue/question, but then i encountered this issue 🙃 i have multiple each however, while it seems to work, mypy doesn't seem too keen about it for unclear reasons. full example below: from __future__ import annotations
import enum
from typing import Literal
import attrs
import cattrs
class E(enum.StrEnum):
a = enum.auto()
b = enum.auto()
@attrs.define(kw_only=True)
class A:
type: Literal[E.a]
@attrs.define(kw_only=True)
class B:
type: Literal[E.b]
extra: int
d = {"type": "a"}
# Annotating the result is needed, otherwise mypy complains:
# error: Need type annotation for "result" [var-annotated]
result: A | B
# mypy complains about the .structure() statement below:
# error: Argument 2 has incompatible type "UnionType"; expected "type[Never]" [arg-type]
result = cattrs.structure(d, A | B)
print(result) thoughts or advice welcome! |
@wbolster You're not doing anything wrong, it's just a limitation of the Python typing system. The method is annotated to accept |
Description
Right now, it seems like union types are only supported if (a) the options are all attrs classes or None and (b) the attrs classes have unique attributes.
I use a ton of "tagged unions", so it would be great there was support for those. (They have one attribute that has an unambiguous value). Here's an example:
In this example, I'm using a literal boolean but much more common is to use a literal string, for example
type: Literal["request", "response", "error"]
. Other possibilities are literal integers (covered by the below code) and literal enum values (not sure if those work with the below code)This pattern is very common in JSON files.
Here's an implementation that seems to work for me (attrs classes only though the same should work with typing.get_type_hints(anything))`:
It should probably be added to the existing disambiguation.py machinery, but those parts aren't publicly exposed
The text was updated successfully, but these errors were encountered: