Avoid forking "interface" packages #1

Open · dsnet opened this issue Dec 27, 2020 · 10 comments

dsnet commented Dec 27, 2020

Hi, thanks for your interest in possibly upgrading gogo/protobuf to work with the new APIs.

TL;DR: The gogogo project should only consist of a generator and the runtime implementation. There should be no need to fork any of the other packages since a proper implementation of the protobuf reflection API will make gogogo-generated messages compatible with the google.golang.org/protobuf module.

I recommend not forking anything except:

* [`cmd/protoc-gen-go`](https://pkg.go.dev/google.golang.org/protobuf/cmd/protoc-gen-go) (perhaps renamed as `protoc-gen-gogogo`)
* [`runtime/protoimpl`](https://pkg.go.dev/google.golang.org/protobuf/runtime/protoimpl) (the runtime support machinery for generated message types)

Fundamentally, gogogo should avoid forking reflect/protoreflect and reflect/protodesc, since these are the interfaces that define what a protobuf message is. If gogogo uses a different definition of a protobuf message, then gogogo doesn't solve the compatibility issues that were afflicting the original gogo project.

Related, I believe gogogo should avoid forking reflect/protoregistry or anything under types/..., since this leads to gnarly diamond-dependency problems. When a .proto file imports google/protobuf/descriptor.proto, there should be only one obvious Go package path that represents that .proto file. To avoid ambiguous dependency selection, a given .proto file should be generated by either protoc-gen-go or protoc-gen-gogogo, but not both. We should aim for a world where it is theoretically possible for a protoc-gen-gogogo-generated package to depend on a protoc-gen-go-generated package and vice versa. The generation of a given package should not care about the exact API representation of any of its dependencies (other than the naming of certain types).

The gogogo project should not need to fork proto, encoding/protojson, or encoding/prototext. These packages are implemented in terms of protobuf reflection. If the messages generated by protoc-gen-gogogo do not work with them, that suggests gogogo's implementation of the protoreflect interfaces is somehow broken. The runtime/protoiface package defines a set of interfaces that a generated message may optionally implement to provide a specialized fast-path implementation (without relying on protobuf reflection). However, the specialized implementation using protoiface should be semantically equivalent to doing the same thing with pure protobuf reflection (just much faster).
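
As a sketch of why no fork is needed (the gogogopb package and Hello message below are hypothetical placeholders for protoc-gen-gogogo output), both marshalers operate purely through the message's ProtoReflect() value:

package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"

	// Hypothetical package generated by protoc-gen-gogogo; any message with
	// a correct ProtoReflect() implementation behaves the same way here.
	gogogopb "example.com/gen/gogogopb"
)

func main() {
	m := &gogogopb.Hello{Name: "world"}

	// Neither call knows anything about the concrete generated struct; both
	// work entirely via protobuf reflection (or the optional protoiface fast
	// path, when the message provides one).
	wire, err := proto.Marshal(m)
	if err != nil {
		panic(err)
	}
	js, err := protojson.Marshal(m)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(wire), string(js))
}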

xen0n (Owner) commented Dec 27, 2020

Hi, thanks for such a quick response to my attempt at reviving gogo! A bit of explanation follows.

My original intent was to re-implement the gogo extensions on top of APIv2, to the extent possible with the public API surface. The cmd/protoc-gen-go package is the obvious place to start, but then more and more internal packages get pulled in. DoNotImplement interfaces popped up here and there; it's effectively impossible to write an alternative generator, at feature parity with protoc-gen-go, using only public protocolbuffers/protobuf-go APIs. In particular, various concrete types that appear identical are not compatible under Go's type system, because they come from different packages, even without the DoNotImplement marker.

I may not have enough time today to detail the type incompatibilities I encountered (gotta enjoy the weekend in some other way than writing Go code), but you surely have a better understanding of them than I do. Maybe the intrusiveness needed to replicate gogo behavior is too deep to allow into protobuf-go, but then a migration story is needed, be it auto-generated aliases or some officially approved way to inject tags or alter types. I'll continue tinkering in the meantime and come back with, perhaps, a better understanding of the whole picture.

And after all, I'm quite pleased to know that upstream is actually watching and helping. Thanks for reaching out and starting the conversation!

xen0n (Owner) commented Dec 27, 2020

Also note that as of commit a5f1ba0, the generated code does not use gogogo packages or types.

I plan to introduce some mechanism to automatically sync upstream changes, and to ensure the non-codegen-related packages are never touched. Of course, if deeply intrusive extensions like moretags could somehow be implemented without forking the majority of the upstream packages, I'd happily drop the extra weight. I fully realize that a full-fork approach is too heavy to be maintainable, and that gogo development bottlenecked and stagnated precisely because of that approach.

dsnet (Author) commented Dec 27, 2020

> DoNotImplement interfaces popped up here and there; it's effectively impossible to write an alternative generator using only public protocolbuffers/protobuf-go APIs, that is also at feature-parity with protoc-gen-go

The reason for DoNotImplement is that we're in a catch-22 here. In Go, an interface with only public methods cannot have methods added to it without breaking compatibility. In other words, once the interface is defined, it is frozen forever. However, this is problematic for our use case, since the protoreflect package is by nature trying to represent the protobuf language, which is not frozen and continually evolves over time (e.g., the addition of maps and oneofs in 2014, and the recent addition of proto3 optional in 2020). Fundamentally, we need the freedom to add methods without breaking all implementations. This is why we made the protobuf type-specific interfaces (e.g., MessageDescriptor) defined in protoreflect non-implementable by default.
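
The general Go pattern behind this (the names below are illustrative; this is not the actual protoreflect source) is an interface that embeds a requirement only the defining package can satisfy:

// Illustrative only. Outside packages cannot satisfy doNotImplement, because
// they cannot reference the unexported method (in protoreflect's case, the
// method takes a type from an internal package instead).
type doNotImplement interface {
	mustEmbedDoNotImplement()
}

type MessageDescriptor interface {
	doNotImplement

	FullName() string
	// New methods can be added in later releases without breaking third-party
	// code, because nobody else implements this interface directly.
}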

Thus, there are two ways to construct a protoreflect descriptor:

  1. Use the protodesc package to convert a descriptorpb.FileDescriptorProto to a protoreflect.FileDescriptor. The descriptorpb messages are by nature extensible, and so using that as the starting point ensures that the interfaces it implements are always up-to-date.
  2. Embed the protoreflect interfaces in a struct, so that the struct implements the interface, while manually implementing all the methods known today. If a new method is added to an interface in protoreflect, an application may panic if it tries to call that method, but at least the program will build. So long as the concrete implementations are held in gogogo, upgrading to a newer version of gogogo that implements the newly added method will fix the problem.
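
A minimal sketch of approach 1 (the example.proto contents are made up for illustration): build a descriptorpb.FileDescriptorProto and let protodesc turn it into a protoreflect.FileDescriptor, so the resulting descriptor always reflects whatever the current protoreflect interfaces define:

import (
	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/reflect/protodesc"
	"google.golang.org/protobuf/reflect/protoreflect"
	"google.golang.org/protobuf/reflect/protoregistry"
	"google.golang.org/protobuf/types/descriptorpb"
)

// buildExampleDescriptor constructs a descriptor for a tiny hypothetical
// example.proto without hand-writing any concrete descriptor type.
func buildExampleDescriptor() (protoreflect.FileDescriptor, error) {
	fdp := &descriptorpb.FileDescriptorProto{
		Name:    proto.String("example.proto"),
		Syntax:  proto.String("proto3"),
		Package: proto.String("example"),
		MessageType: []*descriptorpb.DescriptorProto{{
			Name: proto.String("Hello"),
			Field: []*descriptorpb.FieldDescriptorProto{{
				Name:     proto.String("name"),
				JsonName: proto.String("name"),
				Number:   proto.Int32(1),
				Type:     descriptorpb.FieldDescriptorProto_TYPE_STRING.Enum(),
				Label:    descriptorpb.FieldDescriptorProto_LABEL_OPTIONAL.Enum(),
			}},
		}},
	}
	// GlobalFiles is used to resolve imports; example.proto has none, so the
	// default registry is sufficient.
	return protodesc.NewFile(fdp, protoregistry.GlobalFiles)
}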

The internal/filedesc package doesn't use approach 2 since it has the ability to directly implement the DoNotImplement method. However, it should be relatively easy to patch the local fork of internal/filedesc to use the embedding trick for approach 2.
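
And a sketch of the embedding trick for approach 2 (the struct and field names are illustrative, not the internal/filedesc layout): the embedded interface satisfies any methods added to protoreflect in the future, at the cost of a runtime panic if an unimplemented one is ever called:

import "google.golang.org/protobuf/reflect/protoreflect"

// messageDesc is a hand-rolled descriptor. Embedding the interface keeps the
// struct compiling even if protoreflect gains new methods; calling a method
// that isn't written out below panics (the embedded interface is nil) rather
// than failing the build.
type messageDesc struct {
	protoreflect.MessageDescriptor // supplies any method not implemented below

	fullName protoreflect.FullName
	fields   protoreflect.FieldDescriptors
}

func (d *messageDesc) FullName() protoreflect.FullName       { return d.fullName }
func (d *messageDesc) Fields() protoreflect.FieldDescriptors { return d.fields }

// ...and so on for every method known today.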

dsnet (Author) commented Dec 27, 2020

> I plan to introduce some mechanism to automatically sync upstream changes, and to ensure the non-codegen-related packages are never touched.

I strongly recommend against this, since the point of external APIs like those in protoreflect is that different implementations can cooperate so long as they comply with a common set of interfaces. Automatically synchronizing with the internal implementation of some other code base goes against this principle, since it means that this project is now tied to the internal implementation of google.golang.org/protobuf. One of the major sources of problems between gogo/protobuf and golang/protobuf was that the former regularly relied on internal implementation details of the latter, such that the latter kept breaking the former whenever it changed what was fundamentally supposed to be an internal implementation detail. It would be a mistake for gogogo to repeat this problem.

While I argue against automatic syncs, it's fine to manually cherry-pick changes as deemed appropriate. In order for gogogo to support the vast array of features that gogo provided, it's highly likely that the internal implementation of gogogo will diverge enough that synchronization won't even be feasible (and that's okay, assuming the gogogo project owners are willing to maintain the implementation).

silasdavis commented Jan 14, 2021

I am working on supporting something equivalent to gogoproto.customtype (if it comes to anything, I would be interested in adding it here). @dsnet I read your comments above, and I agree it would be preferable to fork as little as possible. So I have made a start with:

> I recommend not forking anything except:
>
> * [`cmd/protoc-gen-go`](https://pkg.go.dev/google.golang.org/protobuf/cmd/protoc-gen-go) (perhaps renamed as `protoc-gen-gogogo`)
> * [`runtime/protoimpl`](https://pkg.go.dev/google.golang.org/protobuf/runtime/protoimpl) (the runtime support machinery for generated message types)

There are a few internal imports in internal_gengo:

https://github.com/protocolbuffers/protobuf-go/blob/master/cmd/protoc-gen-go/internal_gengo/main.go#L20-L21

The genid package comes from files generated by the generate-proto tool from descriptor.proto, and can be pulled in and eventually self-hosted. However, google.golang.org/protobuf/internal/encoding/tag pulls in a large sub-tree of internal packages. Perhaps that package just needs to be rewritten, or its specific dependencies pulled in.
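
For reference, the text that tag package handles is the legacy struct tag emitted on each generated field; a minimal hand-written illustration (field names made up):

// Illustrative only: the `protobuf:"..."` struct tag encodes the wire type,
// field number, cardinality, field name, and syntax, and is the text that
// internal/encoding/tag converts to and from a protoreflect.FieldDescriptor.
type Example struct {
	Hash []byte `protobuf:"bytes,1,opt,name=hash,proto3" json:"hash,omitempty"`
}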

It sounds like we will also need internal/filedesc. It does seem like we are going to end up pulling in a significant fraction of internal this way (it looks like it will be easier to delete unused code than to selectively fork individual packages).

I will report back if I get anywhere with this fork-avoiding approach...

silasdavis commented Jan 14, 2021

Right, so it's only the Unmarshal side of tag.go that pulls in a lot of internal packages; I've been able to contain that: https://github.com/monax/peptide/blob/e5cb03ff81961d5f006fa9b92f015fb9a7a28af4/internal/encoding/tag/tag/tag.go

xen0n (Owner) commented Jan 15, 2021

I'm still interested in porting gogo but simply have limited time. I haven't tried the less-forking approach just yet (a lot of things going on at $DAY_JOB), but that's certainly the way to go.

@silasdavis Is your code open-source yet? If so, could you post a link here so that I and other interested users can have a glimpse of it?

silasdavis commented Jan 15, 2021

Hi @xen0n, thanks for starting this off. I've just made the repo public here: https://github.com/monax/peptide (I'm not remotely wedded to the name or the repo).

It's a bit rough around the edges as a proof of concept; mostly what I've done is stand up a subset of the protobuf-go tests (which pass ./test.bash) and shamelessly loot your moretags implementation (https://github.com/monax/peptide/blob/e5cb03ff81961d5f006fa9b92f015fb9a7a28af4/cmd/protoc-gen-go-peptide/internal_gengo/main.go#L416-L422) just to see if it would compile. Next on my list is to add the other extensions you have supported, plus a basic test proto for them. After that I'd like to work on gogoproto.customtype support using protoreflect.

I have no desire to further fragment efforts, so I think your repository should be the home of future work, if you think my approach can be integrated. Please let me know how I can help and I will endeavour to do so, as far as is compatible with $DAY_JOB :).

I think maybe I need to go a little further down the implementation rabbit hole to prove out whether the minimal 🍴 approach is viable (I suspect I may hit more of the issues with the internal API that you have already encountered, but we'll see).

If we get anywhere, I offer my help as co-maintainer, though I would like to take an incremental approach to supporting all of gogoproto; possibly we can find a way to open up custom code generation for users while somehow avoiding the kitchen-sink approach (I'm not sure).

Some other notes:

silasdavis commented Jan 17, 2021

monax/peptide#1 adds the 3 extensions implemented here and a test file gogogo.proto that uses each of them; they compile and survive a serialisation round trip (I had to add another couple of lines to handle the default value for custom enums). It also has the tests running on GitHub Actions.

silasdavis commented

The approach I had hoped might work for custom types, namely embedding a protoimpl.MessageState, implementing a custom ProtoReflect() protoreflect.Message, and then delegating to the MessageState while intercepting custom types, unfortunately does not work. It's a shame, because it looked like it might, and it would not have required forking protoimpl. The problem is that protoimpl is by design not reusable; in particular, it lazily initialises a Converter based on Go reflection over the type, and this converter is hard-coded (starting here: https://github.com/protocolbuffers/protobuf-go/blob/master/internal/impl/message.go#L77-L95).

There is no way via the protoreflect interfaces to communicate the mapping between the Go and protobuf types. Yes, you can implement the entire reflection interface, but that is a pretty big surface area, particularly if you want to handle concurrency and the fast path properly. I had really hoped there was a middle way. I understand the protobuf team's desire to limit its public API, but if the aim of protoreflect is, amongst other things, to enable custom code generation, I really think there should be a small incision into protoreflect to allow it to be used as a base. My hunch is that if there were some way for an implementation to provide its own Converter to messageInfo (the reflective MessageState implementation), it might fit the bill:

// A Converter converts to/from Go reflect.Value types and protobuf protoreflect.Value types.
type Converter interface {
	// PBValueOf converts a reflect.Value to a protoreflect.Value.
	PBValueOf(reflect.Value) pref.Value

	// GoValueOf converts a protoreflect.Value to a reflect.Value.
	GoValueOf(pref.Value) reflect.Value

	// IsValidPB returns whether a protoreflect.Value is compatible with this type.
	IsValidPB(pref.Value) bool

	// IsValidGo returns whether a reflect.Value is compatible with this type.
	IsValidGo(reflect.Value) bool

	// New returns a new field value.
	// For scalars, it returns the default value of the field.
	// For composite types, it returns a new mutable value.
	New() pref.Value

	// Zero returns a new field value.
	// For scalars, it returns the default value of the field.
	// For composite types, it returns an immutable, empty value.
	Zero() pref.Value
}

The nice thing about being able to use the MessageState created by protoimpl is that you can use the existing code generation with little modification and benefit from the allocation-free magic it uses, with something like:

// Extension interfaces
// The Extend interface is intended to be implemented on a generated type from non-generated code in the same package.
//
// Generators should wrap their existing ProtoReflect() function with a call to ProtoExtend(), like:
//
//  func (x *TestMessage) ProtoReflect() protoreflect.Message {
//  	return x.ProtoExtend(x.protoReflect())
//  }
//
// Where `TestMessage` is a generated protobuf message and `protoReflect()` is the 'original' function implementing
// proto.Message.
//
// This allows the default protoreflect.Message implementation to be extended without reimplementing the entire interface.
type Extend interface {
	ProtoExtend(message protoreflect.Message) protoreflect.Message
}

// This struct implements Extend and is intended to be embedded into a generated message type by the generator.
// If other code implements ProtoExtend() directly on the generated struct, then that implementation will be used;
// otherwise the behaviour is unchanged from the generated code.
type NoopExtend struct {
}

func (_ NoopExtend) ProtoExtend(message protoreflect.Message) protoreflect.Message {
	// Pass the generated reflection value through unchanged.
	return message
}

type CustomTypeMessage struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields
     
	peptide.NoopExtend

	Hash *Hash `protobuf:"bytes,1,opt,name=Hash,proto3" json:"Hash,omitempty"`
}

func (x *CustomTypeMessage) ProtoReflect() protoreflect.Message {
	// Call the decorator (NoopExtend by default, or a custom ProtoExtend).
	return x.ProtoExtend(x.protoReflect())
}

func (x *CustomTypeMessage) protoReflect() protoreflect.Message {
	mi := &file_cmd_protoc_gen_go_peptide_testdata_gogo_customtype_proto_msgTypes[0]
	if protoimpl.UnsafeEnabled && x != nil {
		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
		if ms.LoadMessageInfo() == nil {
			ms.StoreMessageInfo(mi)
		}
		return ms
	}
	return mi.MessageOf(x)
}

// Now in non-generated code:

func (x *CustomTypeMessage) ProtoExtend(message protoreflect.Message) protoreflect.Message {
	// Wrap the protoimpl-backed Message, which does most of the heavy lifting.
	return &customTypeMessage{message}
}

type customTypeMessage struct {
	protoreflect.Message
}

func (c customTypeMessage) ProtoMethods() *protoiface.Methods {
	return nil
}

// Override methods selectively.
func (c customTypeMessage) Range(f func(protoreflect.FieldDescriptor, protoreflect.Value) bool) {
	// Alas, `messageInfo.initOnce` will panic for custom types because the `Converter` is hard-coded.
	c.Message.Range(f)
}

If protoimpl.TypeBuilder took a pluggable Converter, we might be on to something...
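
Purely as a sketch of the kind of hook I mean (nothing like this exists in the current API; every name below is hypothetical):

import (
	"reflect"

	"google.golang.org/protobuf/reflect/protoreflect"
)

// Hypothetical extension of protoimpl.TypeBuilder, not the real type.
type TypeBuilder struct {
	// ...existing fields elided...

	// ConverterFor, if non-nil, would be consulted before the hard-coded
	// converter selection in internal/impl. Returning ok == false falls back
	// to the built-in behaviour, so existing generated code is unaffected.
	ConverterFor func(t reflect.Type, fd protoreflect.FieldDescriptor) (c Converter, ok bool)
}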

What I have most recently tried, therefore, is forking protoimpl. I've stripped out legacy support, but it still pulls in a few internal packages. The biggest issue is that if you want to fork filetype/build.go, it pulls in most of the repo, so at that point you are back to forking quite a lot of packages. Here is my current partial attempt: https://github.com/monax/peptide/tree/c09f8f30359583ad627e8d8651129e426c2a6c8f

I didn't intend to write so much; this obviously ought to be reported upstream to see if the protobuf team might be willing to provide an additional public API. I was thinking that if I could turn up with a working proof of concept using Converter, that would help. However, I need to consider how much time I want to sink into this; if my 'minimal fork' repo becomes unsatisfyingly non-minimal, then maybe it is better to work in a full fork, define the new public interface, and propose a version of that interface upstream.
