-
Notifications
You must be signed in to change notification settings - Fork 505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support zero-copy string deserialization via ByteString. #752
Comments
I don't think adding arbitrary support for things like |
i agree with you in general, regarding swapping out generated types However. ByteString is just a newtype around bytes::Bytes. we can roll our own type and export if, if you dont want to depend on other repos. strings are very heavily used, and it feels a shame to give up this much performance. its not exactly niche use case. |
I totally get that, in the short term if you have control over your proto files you can just swap it for |
Hi, just stumbled on this and I think it makes sense to include such a feature. As @VladimirBramstedt pointed out, this is definitely not a niche use case and providing a reasonable default implementation for no copy strings looks like an important feature. I have no specific opinions about In my use case it would be a bit problematic to swap the type definition in proto files, as I share my proto with other teams which develop the "client side" of the project in various languages, so letting people manually deal with string/bytes conversion might be problematic and error prone. I can still go and manually replace the I would make it optional though, as done with |
Right I am not disagreeing with wanting to add this but I don't want to add another ad-hoc type to the config. I want to solve this generically and spend effort/review/design time on that instead. Even though the initial cost is more than just adding bytestring support, it will enable many other users to do what they want and reduce complexity at the same time. |
i dont disagree either. i think your approach is the correct long term solution, i just wonder how long. for me this specifically affects a production workload, and these kind of saving translate to money. if long term is years, i guess i have no choice but to maintain a fork. Now, as for the long-term solution, maybe this isnt the correct ticket to discuss this, but my thought was to actually use the capabilities build into protobufs to do type mapping. protobufs support custom options, see https://developers.google.com/protocol-buffers/docs/proto#customoptions message Test {
optional string s = 1 [(prost.gen_types) = "bytestring::ByteString"]; // use bytestring
repeated string rs = 2; [(prost.gen_types) { type = "std::collections::HashSet<String>", decode: "::my::custom::decoder::func"]; //or maybe something even more free
optional OtherMessage foo = 3 [(prost.wrap) = "Box"]; // maybe even use it to allow wrappers here.
} where normal rust types can be implemented in prost, and "custom types" can be decoded using either their Message impl if possible (for your own type custom types), or a user provided custom function to allow users to decode to types that are not defined in their own crate (similar to how serde does with the bonus point is that since you can use custom options on mesage or file level too, it can make the information that is currently split between the build.rs file (for what prost will do) and the actual protobuf, reside in the same file for a better overview. thats my quick take, and if you think thats a good idea id love to drive this forward. same idea used for validation, for an example https://github.com/bufbuild/protoc-gen-validate/blob/main/README.md |
If maintaining a fork is what is needed then that is fair, I think if it saves you money then it makes sense to contribute upstream to help the project out. I think going the extensions/custom options method is probably the best way, but that will be a while since the extensions support is still in early design phase. I think the quickest way to have generic support is to abstract how we have bytes/btree support by having a map that maps path -> string type name. Then the user can implement Message for custom types and use them in their generated code by just changing the build config. I think this solution would take less than a few weeks to get shipped as I don't think its actually that complicated and we can then remove bytes/btree support specifically. |
so instead of having like let typemap = BtreeMap::from([(".package.msg.field", "std::string::String"),
(".package.msg.field2","bytestring::BytesString")]);
config.type_map(typemap) i think thats doable, as a medium term solution. |
I don't think we even need BtreeMap? I think we could just have a config option that is just |
As for the types that implement Message in the |
thats a start. i could play with the type override approach and see how far i get, if you wish. |
Hiya. there have been a few PRs which allowed prost to use the
bytes::Bytes
type for the protobufbytes
type, allowing them to just be slices of a biggerBytes
buffer. Strings however dont quite work. https://docs.rs/bytestring/latest/bytestring/ is just a wrapper aroundbytes::Bytes
with added UTF-8 validity checks. it feels like it would be a good addition (as a features, same as enabling usingbytes::Bytes
instead ofVec<u8>
for byte types. we should be able to reuse all the code from that feature.hopefully we can go from something like
to
to being able zero-copy this when deserializing from Bytes.
Would that be something we are interested in?
edit: did a little benching with proto that contains a lot of strings.
when decoding from
bytes::Bytes
, much faster:decode 1000000 items
stdstring:6181ms
bytestring: 2922ms
when decoding from `&[u8], somewhat slower:
decode 1000000 items
stdstring:6349ms
bytestring: 6974ms
because in the &[u8] case we need to copy anyway, and then i guess all the extra indirection of using Bytes shows.
seems to me like quite a big performance boost for string-heavy scenarios.
The text was updated successfully, but these errors were encountered: