-
Notifications
You must be signed in to change notification settings - Fork 652
Open
Labels
Description
I would like to support preserving unknown fields for ProtoBuf, here is an issue for discussion beforehand.
Motivation
Basically it is required in protobuf spec, so it needs to be implemented to make kotlinx.serialization
a valid protobuf parser.
Basic design
A new helper class will be introduced
interface UnknownFieldsHolder: Map<ProtoId, ProtoField>
Users can add to class property according to their needs:
@Serializable
data class ForwardCompatibleData(
@ProtoNumber(1) val name: String,
// No @ProtoNumber needed
// Maybe introducing a new annotation?
val unknownFields: UnknownFieldsHolder,
)
Behaviours
- The ProtoBuf format should keeps all undeclare fields in the
UnknownFieldsHolder
when deserializing; - All data in the
UnknownFieldsHolder
should be written back to wire as is during serializing; - If new fields are added in
oneof
field, the new value will be stored inUnknownFieldsHolder
, leaving the @ProtoOneOf property to be null; - Other formats should not deseralize or serialize this property.
Further more
Enumeration
Enums in protobuf can also have unknown values
in old version of code, but it is much more easier to store it in sealed interface / objects styles:
sealed interface IPhoneType {
val value: Int
}
data object WorkPhone: IPhoneType {
override val value: Int
get() = 1
}
data object HomePhone: IPhoneType {
override val value: Int
get() = 2
}
data class UnknownPhoneType(override val value: Int): IPhoneType
Other formats
Do other formats need similar functionality?
bcmedeiros
Metadata
Metadata
Assignees
Labels
Type
Projects
Milestone
Relationships
Development
Select code repository
Activity
pdvrieze commentedon Apr 26, 2024
@xiaozhikang0916 XML has this option. Both for tags and attributes (but they need to be stored separately). What you need is:
JsonElement
or xmlNode
)xiaozhikang0916 commentedon Apr 28, 2024
It is required for this feature, but I will keep it an internal interface, because I believe using protobuf in active code should be restricted in the scope of serializable class, and map-like structure ( with integer key of proto id ) is not dev-friendly.
pdvrieze commentedon Apr 29, 2024
@xiaozhikang0916 You probably want some way to specify which numbers could be omitted (deprecated fields), or whether to write these values at all (changes in known attributes may change the validity of the opaque unsupported attributes).
xiaozhikang0916 commentedon Apr 29, 2024
@pdvrieze I got your point, but IMO an option to omit, or ignore any fields ( deprecated or not ) is also a breach of the protobuf spec.
If a deprecated field is not intented to be used by developers, just omit it in the declaration or keep the property internal/private. In anyway, this field should present in the wire date, and the wire data is not accessible to the user.
pdvrieze commentedon Apr 29, 2024
@xiaozhikang0916 What I'm considering is the case where the data is read, processed and serialized. In some cases it is semantically valid to retain the (extra) data, in others it is not. It may be possible to make a copy of the container without the "unknown field" data to avoid serializing invalid data, but this may be challenging for deeply nested hierarchies.
As a protocol the protobuf specification can not say anything of program semantics. The semantics are per definition program specific, only the developers of the program can decide what the correct way is to handle unknown data. The protobuf specification only states that fields are preserved during parsing and serialization, this is not about reading in data, doing some application specific stuff and then saving it again, rather it is to do with transformation of data through different paths.
As to the deprecated fields, an example there is that you have a deprecated localtime field, as well as a current utcTime field. In the example the localTime is no longer present in the data type. When then the utcTime is updated the localTime is no longer valid (the time changed), but because the application no longer knows about localtime it cannot update this "unknown field". The only correct behaviour in that case is to not emit unknown (but possibly incorrect) fields.
In other cases, say you want to add security headers to a message, it is perfectly valid to just encapsulate an opaque message and preserve all fields, it depends on the context. What I'm suggesting is a way (either using annotations or as a format option) to configure the behaviour.
Flavien commentedon Jun 4, 2024
This is kind of critical, and should also work for JSON. Most JSON libraries, have first-class support for this (e.g. in .NET)
pdvrieze commentedon Jun 4, 2024
@Flavien The challenge here is that the serialization library (and its formats) is designed to support serialization and deserialization (like most serialization libraries there is a bias towards the code side, not the wire format, although the bias is small).
Deserialization in this context is seen as reading data in a format (eg. protobuf or Json) and reading that data in as a native datastructure to be used in the program. Serialization is the reverse (writing a readable version of the data structure to a wire/storage format). Handling arbitrary data can be supported, but this is by necessity format (and data structure) specific. In the case of Json you can do it by having
JsonObject
properties that can contain arbitrary nested Json. Protobuf can do the same thing.Note that for Json the library provides direct access to the underlying Json parser. I am not sure that is true for Protobuf.
xiaozhikang0916 commentedon Jun 5, 2024
It is pretty good to have such feature in more formats, but it needs efforts on every formats.
And if it will be implemented in multiple formats e.g. xml (already have), protobuf (planned in this issue) and json (requested in #2698 and #2701), it is worthy to have some support in the base interface defined in core lib.
pdvrieze commentedon Jun 5, 2024
@xiaozhikang0916 In principle library support is good, but in this case the only element I can really see is to provide an annotation. The problem is that the actual data type that needs to be used is format dependent, so I don't think this can be supported well in a format/agnostic way (if a generic supertype is used then it can't be encoded - especially by a different format)