Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More types & some refactoring #80

Merged
merged 18 commits into from Feb 9, 2021
Merged

More types & some refactoring #80

merged 18 commits into from Feb 9, 2021

Conversation

ydylla
Copy link
Contributor

@ydylla ydylla commented Jan 31, 2021

This pull request adds support for more types and also utilizes the code generation capabilities for them.

The following types where added:

  • uuid.UUID
  • PosixPath, WindowsPath, PurePath, PurePosixPath, PureWindowsPath from pathlib
  • IPv4Address, IPv6Address, IPv4Network, IPv6Network, IPv4Interface, IPv6Interface from ipaddress
  • date & datetime from datetime

I needed these types because I wanted to use pyserde in combination with asyncpg.
asyncpg returns dict like objects which can be converted to dataclasses by pyserde's from_dict function.

Since asyncpg also supports the above mentioned types, the dicts already contain instances of them.
That is the reason why I also added a new argument reuse_instances for from_dict & from_tuple.
When this is set to True these functions check if the field already contains an instances of the target type and reuse it when possible.

This is faster for Path & IPAddress then calling the constructor again.
For UUID it is also the only way to handle existing instances because uuid.UUID() does not accept them as an argument.
It is also possible to change the default value of reuse_instances via the serialize & deserialize decorators.

To not cause slowdowns when serializing or deserializing to json/msgpack/toml/yaml reuse_instances is always set to False there, because we will never see existing instances there.

That is the part where this pull request drifted into refactoring.
Since I had to add the reuse_instances argument at all these places, I also used the opportunity to remove some unused arguments (named & strict) and renamed asdict to to_dict and astuple to to_tuple.
These are breaking changes, but I felt it was worth the gained similarity between serialization and deserialization code.
In my option the public exposed functions are now also named more uniformly.

For msgpack external types I also changed the behaviour slightly. It is not required anymore that the dataclasses have a special meaning _type attribute.
Instead to_msgpack & from_msgpack search the ext_dict for the correct type or type code and also throw exceptions if they can not find them.

I am sorry that this became so intertwined, I understand if you don't want to merge the breaking changes.
I tried to make the commits cherry-pickable so maybe you only want to use some commits.

During development, I also noticed that Unions do not work properly for these types.
I will make a separate pull request to fix that.

Finally, I have a question: What is the purpose of setting SE_NAME and why is it used in is_serializable?
Could it be removed? is_serializable could use TO_ITER and TO_DICT for the check like is_deserializable.

@codecov
Copy link

codecov bot commented Jan 31, 2021

Codecov Report

Merging #80 (6a086f4) into master (56b3f53) will decrease coverage by 2.60%.
The diff coverage is 73.70%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #80      +/-   ##
==========================================
- Coverage   88.35%   85.74%   -2.61%     
==========================================
  Files          10       11       +1     
  Lines         773      863      +90     
  Branches      162      178      +16     
==========================================
+ Hits          683      740      +57     
- Misses         61       83      +22     
- Partials       29       40      +11     
Impacted Files Coverage Δ
serde/core.py 82.20% <ø> (-0.15%) ⬇️
serde/py36_datetime_compat.py 48.71% <48.71%> (ø)
serde/de.py 91.41% <74.64%> (+0.89%) ⬆️
serde/se.py 94.08% <84.00%> (+0.67%) ⬆️
serde/json.py 100.00% <100.00%> (ø)
serde/more_types.py 100.00% <100.00%> (+30.00%) ⬆️
serde/msgpack.py 100.00% <100.00%> (ø)
serde/toml.py 100.00% <100.00%> (ø)
serde/yaml.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56b3f53...6a086f4. Read the comment docs.

@ydylla
Copy link
Contributor Author

ydylla commented Jan 31, 2021

I totally forgot that this also adds support for date & datetime from datetime 😃

python 3.6 has no fromisoformat functions so we have to ship our own (copied from python 3.8)
@yukinarit
Copy link
Owner

Hi @ydylla,

Thank you for the huge contribution! Let me check little by little 👍

@yukinarit
Copy link
Owner

@ydylla

Finally, I have a question: What is the purpose of setting SE_NAME and why is it used in is_serializable?
Could it be removed? is_serializable could use TO_ITER and TO_DICT for the check like is_deserializable.

SE_NAME was used because it was just added before TO_ITER and TO_DICT as far as remember. You can remove it and use TO_ITER and TO_DICT instead 🙂

Copy link
Owner

@yukinarit yukinarit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ydylla
I have reviewed the first half of commits.
Please check the review commits, thank you!

.gitignore Outdated Show resolved Hide resolved
serde/msgpack.py Show resolved Hide resolved
serde/__init__.py Show resolved Hide resolved
@ydylla
Copy link
Contributor Author

ydylla commented Feb 1, 2021

SE_NAME was used because it was just added before TO_ITER and TO_DICT as far as remember. You can remove it and use TO_ITER and TO_DICT instead 🙂

Okay I will change it. I also guessed this was the reason 😄

serde/msgpack.py Outdated
ext_type_code = None
if ext_dict is not None:
obj_type = type(obj)
ext_type_code = next((code for code, ext_type in ext_dict.items() if obj_type is ext_type), None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ext_type_code = ext_dict.get(obj_type, None)

seems simpler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know these kind of one liners are bad practice.
But I wanted both functions to accept the same ext_dict: Dict[int, Type] (like in the test).
And for to_msgpack we have to get the key (integer type code) of the dict by searching with its value (the type).
A simple ext_dict.get( only works for the other direction. See from_msgpack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thoughts I think it would be more efficient if the from_msgpack accepted a reversed dict. That way, the app would do the one time reversal and lookups would be efficient in both directions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to_msgpack because from_msgpack already uses a simple lookup.

Yes two dicts would be more efficient/faster. Another way could be to add the ext_type_code directly to the to_msgpack function as argument. Maybe the app already knows which type code to use.

In the original code the type code was saved as _type attribute. In my opinion the knowledge about the name of this special attribute should not be part of pyserde.
But with the ext_type_code as argument one could use it like this:

d = DerivedA(i=1, s="a", j=10)
to_msgpack(d, ext_type_code=d._type)

So basically the app has to implement the lookup itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yukinarit Thanks for the review.

I changed to_msgpack to use a Dict[Type, int] but I am not really happy with it
I guess it will confuse at least some users and also forces the application to save two dicts with basically the same information. But for know it is good enough, maybe someone else has a better idea in future.

Also keep in mind this is now a breaking change.

serde/msgpack.py Show resolved Hide resolved
serde/de.py Show resolved Hide resolved
serde/__init__.py Show resolved Hide resolved
@yukinarit
Copy link
Owner

@ydylla
Also, please update supported types section in README.md

@adsharma
Copy link
Contributor

adsharma commented Feb 2, 2021 via email

@ydylla
Copy link
Contributor Author

ydylla commented Feb 2, 2021

Also, please update supported types section in README.md

Done, I found it to verbose to add all types of ipaddress & pathlib separately. But feel free to change it if you want a longer list 😃

@jfuechsl
Copy link
Contributor

jfuechsl commented Feb 2, 2021

I'd say it's a question of preference. as* has a more declarational conotation (which I like) and to_* is more imperative (which would be technically "more correct").
If you decide to go with to_*, I would ask to keep the original names as aliases for backwards compatibility.

@yukinarit
Copy link
Owner

@adsharma @jfuechsl
Thanks for your input!

I'd say it's a question of preference. as* has a more declarational conotation (which I like) and to_* is more imperative (which would be technically "more correct").

Yeah, from_dict/to_dict naming sounds more consistent with other formats but renaming at this time is confusing for the exiting users (although pyserde user base is still small).

@ydylla
Thank you for the good suggestion, but I would like not to change. Could you please remove the commit?

@ydylla
Copy link
Contributor Author

ydylla commented Feb 3, 2021

I readded asdict & astuple in ae28a33 with their old signatures and behavior.
I did not revert the commit because I wanted to keep the to_obj function, it simplifies the code.
Another reason is in principle you seam to agree with me, so I would suggest to keep the to_* functions and also document them as the new default one.

@adsharma
Copy link
Contributor

adsharma commented Feb 3, 2021 via email

@yukinarit
Copy link
Owner

LGTM 👍

@yukinarit
Copy link
Owner

I will merge once this thread is resolved 👍

A single ext_type_code argument would probably be even better, so that the application code can implement the lookup however it likes
@yukinarit
Copy link
Owner

@adsharma Is everything ok? Could you check if your reviews were all addressed?

@adsharma
Copy link
Contributor

adsharma commented Feb 9, 2021 via email

@yukinarit yukinarit merged commit c7750a5 into yukinarit:master Feb 9, 2021
github-actions bot pushed a commit that referenced this pull request Feb 9, 2021
@ydylla
Copy link
Contributor Author

ydylla commented Feb 9, 2021

Nice. Thanks @yukinarit and @adsharma.

@ydylla ydylla deleted the more-types branch May 8, 2021 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants