-
Notifications
You must be signed in to change notification settings - Fork 750
New @strict
decorator for dataclass validation
#2895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This man is coooking 👨🏻🍳
@Wauplin A I've run into a constraint: some validated fields are class-dependent multiple-choice values, or have an admissible range. We can write a validator for each of these cases but, ideally, a single composable validator would be shared across models. An example: (validator) def choice_str(value: str, choices: Iterable[str]):
"""Ensures that `value` is one of the choices in `choices`."""
if value not in choices:
raise ValueError(f"Value must be one of {choices}, got {value}") (validated field, what I would like to be able to do) position_embedding_type: str = validated_field(
choice_str, choices=["absolute", "relative_key", "relative_key_query"], default="absolute"
) An alternative would be to define the model-specific validator as albert_position_embedding_type_validator = functools.partial(choice_str, choices=...) But I was wondering whether it would be possible to pipe validator arguments from |
I would prefer to not to pipe arguments into the validator on demand. It brings a level of complexity more to def one_of(choices: List):
def _inner(value: Any):
"""Check `value` is one of the options in `choices`."""
if value not in choices:
raise ValueError(f"Value must be one of {choices}, got {value}.")
return _inner
@strict_dataclass
class AlbertConfig:
position_embedding_type: str = validated_field(one_of(["absolute", "relative_key", "relative_key_query"]), default="absolute")
... I've renamed |
@gante I pushed b1720fa to support @strict_dataclass
class AlbertConfig:
position_embedding_type: Literal["absolute", "relative_key", "relative_key_query"] = "absolute"
... (much better for readability + IDE autocompletion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice and clean PR 🔥
(I had to update my review, I just saw you added Literal
type annotation 👍 )
I'm going to redo the transformers-side PR with the latest suggestions 👍 |
Co-authored-by: Célina <hanouticelina@gmail.com>
@strict_dataclass
decorator for dataclass validation@strict
decorator for dataclass validation
@strict
decorator for dataclass validation@strict
decorator for dataclass validation
@hanouticelina @gante @ArthurZucker PR is in its final state and is ready for review :) Documentation's here if needed: https://moon-ci-docs.huggingface.co/docs/huggingface_hub/pr_2895/en/package_reference/dataclasses. Implementation currently used in huggingface/transformers#36534 as a test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool PR! 🔥
Co-authored-by: célina <hanouticelina@gmail.com>
Co-authored-by: célina <hanouticelina@gmail.com>
…face/huggingface_hub into first-draft-for-strict-dataclass
Thanks for the thorough review @hanouticelina! I've addressed all of your comments and updated the tests/docs accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🙌 (but I'm not confident to add an approval to the technical parts :) )
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's goo 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very happy with it!
merging! |
* New @strict_dataclass decorator * expose main methods * typog * Support Literal[...] type * Update src/huggingface_hub/utils/_strict_dataclass.py Co-authored-by: Célina <hanouticelina@gmail.com> * nit * accept kwargs * Accept kwargs, move to huggingface.dataclasses, fix autocompletion, add tests, add docs * docs * @as_validated_field decorator * code quality * class validators * inherit class validators from not strict classes * Update docs/source/en/package_reference/dataclasses.md Co-authored-by: célina <hanouticelina@gmail.com> * remove duplicated definition of _setattr * Update docs/source/en/package_reference/dataclasses.md Co-authored-by: célina <hanouticelina@gmail.com> * optional is an alias for union[, None] * dumb tests * Raise if already defined by user * docs * Update docs/source/en/package_reference/dataclasses.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update docs/source/en/package_reference/dataclasses.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * doc --------- Co-authored-by: Célina <hanouticelina@gmail.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Documentation https://moon-ci-docs.huggingface.co/docs/huggingface_hub/pr_2895/en/package_reference/dataclasses
Follow-up after huggingface/transformers#36329 and slack discussions (private).
The idea is to add a layer of validation on top of Python's built-in dataclasses.
Example
How to test
Test it with:
What it does ?
@strict
to be used on top of@dataclass
. When decorated, class values are validated.Dict[str, List[Optional[DummyClass]]
is correctly validated)validated_field
(built on top offield
)validate_***
are considered as class validators and executed at initialization after__post_init__
.obj.validate()
.Why not chose pydantic ? (or
attrs
? ormarshmallow_dataclass
?)pydantic
as a base requirement to 🤗transformers
transformers#36329 related to adding pydantic as a new dependency. Would be an heavy addition + require careful logic to support both v1 and v2.@strict_dataclass
is not meant for heavy load where performances is critical. Common use case will be to validate a model configuration (only done once and very neglectable compared to running a model). This allows us to keep code minimal.Plan:
transformers
@gante)We won't push for it / release it until we are sure at least the
transformers
use case is covered.Notes:
This
@strict
might be useful inhuggingface_hub
itself in the future but that's not its primary goal for now.