Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization rework #2503

Merged
merged 9 commits into from Nov 30, 2020
Merged

Serialization rework #2503

merged 9 commits into from Nov 30, 2020

Conversation

s-ludwig
Copy link
Member

This follows up on #2492 and #2493 and addresses the remaining issues. @ferencdg, please comment on any issues you might see, I could have broken a use case that I didn't see here.

Copy link
Contributor

@Geod24 Geod24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! A couple comments. Would this also allow one to use std_data_json instead of vibe.data.json ? Maybe an example of this could be useful for user code.

stream/vibe/stream/wrapper.d Outdated Show resolved Hide resolved
stream/vibe/stream/wrapper.d Show resolved Hide resolved
web/vibe/web/common.d Show resolved Hide resolved
Comment on lines 514 to 529
static void serialize(R)(ref R output_range, ref Point value)
{
output_range.put(nativeToBigEndian(value.x));
output_range.put(nativeToBigEndian(value.y));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you extend the example to use an overload set (on value) ?
Also, maybe make value be const ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it const and turned it into a template parameter. Overload set would also be possible with the latest changes, that could maybe be added as a second example? But it does get quite lengthy then.

web/vibe/web/rest.d Show resolved Hide resolved
data/vibe/data/json.d Show resolved Hide resolved
@s-ludwig
Copy link
Member Author

Hm, while writing the parameter documentation for resultSerializer I realized that the definition should probably changed, so that deserialize is called either as deserialize!T, or as deserialize(range, [ref] result) to allow generic serializers.

@s-ludwig
Copy link
Member Author

Would this also allow one to use std_data_json instead of vibe.data.json?

In theory yes, but is there a serializer implementation for it?

@ferencdg
Copy link
Contributor

thanks for cleaning up my PR @s-ludwig

Now follows the same structure as the other new stream APIs.
Previously required a random access range.
- Adds a documentation comment and an example to resultSerializer
- Both, serialize() and deserialize(), work on ranges of ubyte now

Note that this might later be extended to allow ranges of char, but for now this is the more generic baseline implementation.
- Renamed isSerializerSupportSinkType to doesSerializerSupportStringSink and made it independent of the serialized type
- Adds support for output range based string sink serialization in addition to delegate based
- Sink serialization takes precedence independently of the used serializer to enforce consistent semantics
- Fixes JsonStringSerializer to escape strings written with the sink approach
- Fixes the documentation order of string serialization vs. sink serialization
version(unittest) caused some nasty linker issues in the past.
User code might want to ensure that custom types match this trait.
Allows to use overload sets and template functions for generic deserialization.
@Geod24
Copy link
Contributor

Geod24 commented Nov 28, 2020

I don't think the deserializer should take its argument by ref.
This creates tons of issues, as it:

  1. Forces to have a valid state on init (e.g. invariant) and that default construction not be disabled;
  2. Might require copy because NRVO is less likely to be performed;
  3. Prevents working with qualified types (const / immutable).

@s-ludwig
Copy link
Member Author

I don't think the deserializer should take its argument by ref.
This creates tons of issues, as it:

  1. Forces to have a valid state on init (e.g. invariant) and that default construction not be disabled;
  2. Might require copy because NRVO is less likely to be performed;
  3. Prevents working with qualified types (const / immutable).

All of these would only work under the assumption that the serializer performs a recursive construction of the result value using constructor calls. I don't think that this is realistic for a generic serializer for multiple reasons.

Also, the init state is practically mandated by the language to be valid anyway. Returning large structs by value, where performing an extra copy is a performance issue (in the same order of magnitude as the serialization and the network transfer), doesn't sound like a good idea, too, at least I wouldn't count on the library handling it with NRVO throughout if I were a user.

So while I do understand that these are issues in theory, I'm not really convinced that they are in practice. The counter argument is pretty simple, but maybe you have a better idea how to solve it:

Having it as a return value means that deserialize must be a template with the value type as its first template argument, which in turn means that overload sets won't work and type-specific serializers need to awkwardly define their fixed type as a template argument. Or we revert it to what it was before and generic serializers don't work.

@Geod24
Copy link
Contributor

Geod24 commented Nov 29, 2020

All of these would only work under the assumption that the serializer performs a recursive construction of the result value using constructor calls. I don't think that this is realistic for a generic serializer for multiple reasons.

Well, constructor is one thing, but the serializer an also define a hook (a la toString) for user-customization.

Also, the init state is practically mandated by the language to be valid anyway.

It's mandated to exist. But some types need to be constructed to be valid, which is a common complain in D.

So while I do understand that these are issues in theory, I'm not really convinced that they are in practice. The counter argument is pretty simple, but maybe you have a better idea how to solve it:

Well, those are drawn from experience writing a deserializer. Namely, this.

The reason for doing so was that we needed to have, in our REST API, types that are not under our control. Those types are extern(C++) and have copy constructors / destructors which we do not want triggered. Currently we just didn't declare the copy ctors (so they don't get called, ever), but that's a hack.

The TL;DR from my experience was: If you want a generic serializer, it needs to handle const/immutable/shared.
The only proper way to handle const/immutable/shared is to be able to initialize your structs to the value you want them to be.

Having it as a return value means that deserialize must be a template with the value type as its first template argument, which in turn means that overload sets won't work and type-specific serializers need to awkwardly define their fixed type as a template argument. Or we revert it to what it was before and generic serializers don't work.

I'm not sure I follow. If you can provide an instance of the type to be filled, you can provide the typeof of it. Do you have some (pseudo) code example of the issue ?

@s-ludwig
Copy link
Member Author

If the copy constructor is the issue, then why not just move the value instead of copying?

In your deserializeFull you make the assumption that there is a constructor that takes all fields in order as its arguments, even though that's the default this sounds like a dangerous assumption when any kind of type may be thrown at the serializer. I'm not sure whether that is a good compromise for an open set of serializable types. For a closed set, it would at least be a possible to define a mixin within each struct to make this safe.

Example:

int deserialize(R)(R range);
float deserialize(R)(R range);
T deserialize(T, R)(R range) if(is(T == struct));

How to invoke deserialize in each case alone or in a combination of those cases being defined?

@Geod24
Copy link
Contributor

Geod24 commented Nov 29, 2020

If the copy constructor is the issue, then why not just move the value instead of copying?

It breaks internal pointers. Which are invalid in D, but not in C++. And even in D, I know Weka is using them for example.

In your deserializeFull you make the assumption that there is a constructor that takes all fields in order as its arguments, even though that's the default this sounds like a dangerous assumption when any kind of type may be thrown at the serializer.

Not only. Any type is able to explicitly support the serialization by specifying the static fromBinary methods, which will override the default behavior. It's true that there is no single assumption you can make that will hold for all types, so the contract here is that the default behavior is a good default, and we have a mean to override it if needed.

I would really have to have a way to provide a pattern matcher to the function, but I haven't found quite the right approach yet.

How to invoke deserialize in each case alone or in a combination of those cases being defined?

The API I went with requires you to provide the type you want as template parameter. E.g. auto x = deserializeFull!MyStruct(rng);.

@s-ludwig
Copy link
Member Author

Serialization of possibly immutable C++ structs with internal pointers, set up in D, which may choose to move the struct implicitly, still sounds like a rather esoteric and dangerous combination to me (although I see how this could come up in custom IPC scenarios), and nothing to endorse. But since you have practical evidence I obviously can't really dismiss it.

So then that means it's template-only serialization, which, since it matches the general serialization API, is really not a big deal. Being able to work with concrete types and overload sets would have been nice, though.

Allows to construct types with assignment restrictions with the trade-off of requiring generic serializer definitions instead of overload sets.
Copy link
Contributor

@Geod24 Geod24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. What's your timeline for a release ? We're going to try it out in our app very soon.

@Geod24 Geod24 merged commit e052b61 into master Nov 30, 2020
@Geod24 Geod24 deleted the serialization_rework branch November 30, 2020 14:50
@s-ludwig
Copy link
Member Author

We can do a beta release immediately and I'll update the change log ASAP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants