-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New @strict_dataclass
decorator for dataclass validation
#2895
base: main
Are you sure you want to change the base?
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This man is coooking 👨🏻🍳
@Wauplin A I've run into a constraint: some validated fields are class-dependent multiple-choice values, or have an admissible range. We can write a validator for each of these cases but, ideally, a single composable validator would be shared across models. An example: (validator) def choice_str(value: str, choices: Iterable[str]):
"""Ensures that `value` is one of the choices in `choices`."""
if value not in choices:
raise ValueError(f"Value must be one of {choices}, got {value}") (validated field, what I would like to be able to do) position_embedding_type: str = validated_field(
choice_str, choices=["absolute", "relative_key", "relative_key_query"], default="absolute"
) An alternative would be to define the model-specific validator as albert_position_embedding_type_validator = functools.partial(choice_str, choices=...) But I was wondering whether it would be possible to pipe validator arguments from |
I would prefer to not to pipe arguments into the validator on demand. It brings a level of complexity more to def one_of(choices: List):
def _inner(value: Any):
"""Check `value` is one of the options in `choices`."""
if value not in choices:
raise ValueError(f"Value must be one of {choices}, got {value}.")
return _inner
@strict_dataclass
class AlbertConfig:
position_embedding_type: str = validated_field(one_of(["absolute", "relative_key", "relative_key_query"]), default="absolute")
... I've renamed |
@gante I pushed b1720fa to support @strict_dataclass
class AlbertConfig:
position_embedding_type: Literal["absolute", "relative_key", "relative_key_query"] = "absolute"
... (much better for readability + IDE autocompletion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice and clean PR 🔥
(I had to update my review, I just saw you added Literal
type annotation 👍 )
I'm going to redo the transformers-side PR with the latest suggestions 👍 |
Co-authored-by: Célina <hanouticelina@gmail.com>
Follow-up after huggingface/transformers#36329 and slack discussions (private).
The idea is to add a layer of validation on top of Python's built-in dataclasses.
Example
What it does ?
@strict_dataclass
built on top of@dataclass
. When decorated, class values are validated.Dict[str, List[Optional[DummyClass]]
is correctly validated)validated_field
(built on top offield
)What it doesn't do (yet) ?
__post_init__
and then "on-demand" with a.validate()
method?. We cannot run them on each field-assignment as it would prevent modifying related value (if values A and B must be coherent, we want to be able to change both A and B and then validate).Why not chose pydantic ? (or
attrs
? ormarshmallow_dataclass
?)pydantic
as a base requirement to 🤗transformers
transformers#36329 related to adding pydantic as a new dependency. Would be an heavy addition + require careful logic to support both v1 and v2.@strict_dataclass
is not meant for heavy load where performances is critical. Common use case will be to validate a model configuration (only done once and very neglectable compared to running a model). This allows us to keep code minimal.Plan:
transformers
@gante)We won't push for it / release it until we are sure at least the
transformers
use case is covered.Notes:
This
@strict_dataclass
might be useful inhuggingface_hub
itself in the future but that's not its primary goal for now.