Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redeclaring imported protos #74

Closed
tsloughter opened this issue Aug 13, 2016 · 15 comments
Closed

Redeclaring imported protos #74

tsloughter opened this issue Aug 13, 2016 · 15 comments

Comments

@tsloughter
Copy link
Contributor

Why are imported protos redeclared in the generated code for the protos that import them?

Meaning for proto a.proto with import b.proto where b.proto contains message B { ...} and I have msg_name_prefix set to the name of the proto file I find that in a.erl and a.hrl the record #a_b{} which is a duplicate of #b{}.

Shouldn't it simply use b.hrl and b.erl from a? Or at the very least ignore the msg_name_prefix when importing protos?

@tomas-abrahamsson
Copy link
Owner

Is this a correct understanding of your scenario: You compile a.proto with msg_name_prefix set to a_ and you compile b.proto with no such option?

It is a good question, and it ties in with name resolution, knowing if a referred-to type, possibly in an imported proto file, is an enum or another message, for example. The gpb works the way it reads and imports all protos to get one coherent view of all definitions, then it generates one .erl/hrl file for all of this view. Thus, in your scenario, if I've understood it correctly, one would need only to compile a.proto, and the generated code would also contain the stuff for b.proto since that is imported.

A different approach to the code generation altogether could of course be to generate one .erl/.hrl file for each .proto, and on encoding/decoding makes calls between the modules. I don't know whether such an approach would have an impact on performance or not. In the early days of gpb, performance was an important factor, and it still is for some.

@tomas-abrahamsson
Copy link
Owner

I should maybe also add that the current way of reading all imported proto files to get one unified view, and then generate only one .erl/hrl from that makes some things simpler.

Generating one .erl/hrl for each proto can be tricky in some cases, since for ordinary proto files, there's also a path involved. For examples of naming collisions, consider a.proto importing dir1/b.proto and dir2/b.proto. Contrived, yes, but not impossible in a large project. Or consider a project with two sub projects/deps p1 and p2, where each such sub project imports their own shipped b.proto, but these two differ, for example different versions.

It is of course possible to support one .erl/hrl per proto approach, but it would probably mean some thinking and work to get good ways to handle and resolve naming collisions.

@tsloughter
Copy link
Contributor Author

Right. This is a big complicated to explain, hehe, I'm looking to convert our plugin at work to use gpb instead of a mix of erlang_protobuffs and custom code. Currently it does create a .erl/.hrl per message and it does a check by reading in all the messages and types to error or warn (configurable) on confict, but we can live without that -- and I have a mostly equivalent thing working by using msg_name_prefix on proto files with multiple messages in them, but it breaks when importing protos.

Essentially we have a b.proto that defines message B and an a.proto that defines message Request and message Response. And in Request one of the fields is of type B.

What we do for protos with multiple messages defined is automatically namespace them with the name of the proto, so we get #a_request{} and #a_response{} but using the msg_name_prefix feature in gpb with how it imports means that it also defines #a_b{} and now our Erlang code can't pass a #b{} record in the #a_request{} for encoding.

Kind of hacky :) but it works pretty well for namespacing in Erlang for us.

@tomas-abrahamsson
Copy link
Owner

Hmm... I think I see the problem you're facing. I was first thinking a more general name transformation function could be useful (some addition to the snake_case option), but a problem is that such name transformations are applied after all imports have been read---in order for resolving to work and see the right names---and at that point, after resolving, there is no longer any info on which proto file a given message was found. It is of course possible to add this, but the really difficult issue is to come up with good names to the options to control the mechanism...

@tsloughter
Copy link
Contributor Author

Yea, sadly it seems because msgs are just a tuple {{msg, Name}, Fields} there isn't a way to add a tag saying it is imported from X to allow for optional alternative ways of encoding (like calling out to X:encode_msg/1) without modifying a lot of code to handle this new msg tuple :(

@tomas-abrahamsson
Copy link
Owner

That's right (maps would have been welcome, had they existed when I began with gpb :) )

A way could be to do it in a way similar to how proto3 messages are tracked in a {proto3_msgs,MsgNames}, so we could add a new tuple, for example {{msg_containment,ProtoName},MsgNames}. Maybe msg_containment isn't the best of names, and possibly the mapping should be the other way around, depending on which way one would most often want to look up things. (It is possible to mix proto3 and proto2 syntax files with imports, and they need to be treated differently later on, that's why there was a need to keep track of whether a message is defined using proto3 syntax or not)

@tomas-abrahamsson
Copy link
Owner

I was toying with an idea to have an option, {msg_name_prefix_by_proto,[{"A.proto","a_"},{"B.proto","b_"},{"*","other_"}]} or something along those lines, though I have no code for it. It could even be made to work from the command line, something like -msgprefixbyproto A.proto=a_,B.proto=b_,*=other_. One could imagine filename-glob or maybe regexp matching, and an implicit match-all case {"*",""} tacked on to the end of the list.

@tomas-abrahamsson
Copy link
Owner

Was thinking that msg_name_prefix_by_proto was a but unwieldy, maybe it would be possible to bake this into the existing msg_name_prefix option, having the option take either a string or a list of 2-tuples. Even as a command line option, the formats would be sufficiently different to not go into ambiguities for all common practical use cases. Or a bit prettier if the option values would be either a string() or {by_proto,[{...,...}...]} or something like that.

@tsloughter
Copy link
Contributor Author

Hm, yea, that would be nice and workable.

@tsloughter
Copy link
Contributor Author

That worked perfectly and was easily implemented :).

I'll clean up all my 3 changes (I also have the snake_case update) and get you a PR of 3 commits today most likely.

@tsloughter
Copy link
Contributor Author

Curious, why not support something like:

option erlang_prefix = "some_prefix";

Directly in the .proto? Then apply such an option to each message def when it is read in, there by having the same prefix for imports as they would when parsed originally from their own .proto?

@tsloughter
Copy link
Contributor Author

Argh, does protoc not ignore options it doesn't understand? :(

@tomas-abrahamsson
Copy link
Owner

It does save some proto options, and just throws away most, unfortunately. But I figure it might be better to have this kind of configuration outside of the proto file for two reasons: (a) it might differ from use case to use case --- I originally added prefixing and suffixing imgaining it as a way to avoid name collisions eg in a larger project combining two proto sub projects, and (b) I think people sometimes want to include the proto files to the Erlang verbatim with absolutely no change (eg easier comparison that two sides are running with the same definitions), yet still they might want to have prefixing or suffixing or such.

@tomas-abrahamsson
Copy link
Owner

does protoc not ignore options it doesn't understand?

Sorry, re-reading, I understand you mean Google's protobuf compiler, I've no idea what it does, but I, too would expect it to just ignore unknown options. There are also some kind of custom options that might be related, but I haven't spent much time looking into that.

@tomas-abrahamsson
Copy link
Owner

I merged your by_proto prefix option in #76 and just pushed it as 3.25.0. (Closing)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants