-
Notifications
You must be signed in to change notification settings - Fork 23
Conversation
common/bytes_test.go
Outdated
// {"D4_a__1=", url}, | ||
// {"D4/a++1=", std}, | ||
// {"D4_a0-9=", url}, | ||
// {"Finey-1=", url}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a second I thought this was an ode to Hal Finney but then noticed it was just one n
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jaekwon for this change, I have posted some thoughts and suggestions. I could definitely be wrong but I think we can simplify Byte by implementing HexByte, Base64Byte etc, instead of HexEncoder, Base64Encoder etc, and I show the problem with an example. Please take a look.
common/bytes.go
Outdated
// } | ||
type ByteEncoder interface { | ||
Marshal(bytes []byte) ([]byte, error) | ||
Unmarshal(dst *[]byte, src []byte) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pros: Good work on the Unmarshal! Your signature will help gut out allocations and byte slice copies everytime, by enabling byte slice reuse.
var sav []byte
if err := cmn.ByteCodec.Unmarshal(&sav, data); err != nil {
}
// More stuff
if err := cmn.ByteCodec.Unmarshal(&sav, pipedIn); err != nil {
}
as opposed to the traditional style as from the Go stdlib which will always allocate a slice no matter what
var sav []byte
blob, err := cmn.ByteCodec.Unmarshal(data)
if err != nil {
}
copy(sav, blob)
// More stuff
if err := cmn.ByteCodec.Unmarshal(&sav, pipedIn); err != nil {
}
However, the user doesn't need to worry about pre-allocating a slice before Unmarshaling.
Cons:
- With the commendation above though, I haven't yet seen any codec employing byte slice reuse. HexEncoder does
*dst, err = hex.DecodeString
which still performs the byte slice allocation, assigning to*dst
and completely not reusing the preallocated slice(if that was the intention of the user). - The new signature if employing byte slice reuse makes it so that the user has to think about how many bytes that they are expecting. If we aren't going to be resuing pre-allocated slices, perhaps let's just revert to the old style?
Perhaps in the future, let's work on fulfilling the purpose of that signature (*[]byte, []byte)
. The approach is still commendable though.
common/bytes.go
Outdated
Unmarshal(dst *[]byte, src []byte) error | ||
} | ||
|
||
// hexEncoder implements ByteEncoder encoding the slice as a hexidecimal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Encoder/Codec/g as we do both Encoding and Decoding
common/bytes.go
Outdated
|
||
// String gets a simple string for printing (the json output minus quotes) | ||
func (b Bytes) String() string { | ||
raw, err := encoder.Marshal(b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that this method is fixated on the json encoder.
common/bytes.go
Outdated
return encoder.Marshal(b) | ||
} | ||
|
||
func (b *Bytes) UnmarshalJSON(data []byte) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely could be wrong but here are some thoughts:
I am not convinced that we should have a default codec and then use it in here magically on behalf of the user.
Why not let the user call say HexCodec.UnmarshalJSON(data)
? What happens when we have two different programs that each import this package but might want to differently encode Byte?
For example if I have a struct with the address stored in hex but the digest, stored in base64
type Data struct {
Address cmn.Byte `json:"addr"` // <- to be stored in hex
PassphraseDigest cmn.Byte `json:"pass_addr"` // <-- to be stored in Base64
}
then oops in order to successfully encode/decode that struct, I will have to implement a custom MarshalJSON, UnmarshalJSON that has to set the codec after each field has been traversed or invoke a separate codec for each field. Moreover this most likely would be done concurrently encoding and decoding other structs. It also would require the user to be an advanced Go user to successfully use the data types thus a higher mental load on the user. For example, this is what I think it will take for a user to successfully marshal the struct defined above that has fields that both are of type cmn.Bytes but they want the representations to be different on the wire:
func (d *Data) MarshalJSON() ([]byte, error) {
var err error
repr := make(map[string]interface{})
if repr["addr"], err = cmn.HexCodec.MarshalJSON(repr.Address); err != nil {
return nil, err
}
if repr["pass_digest"], err = cmn.Base64Codec.MarshalJSON(repr.PassphraseDigest); err != nil {
return nil, err
}
// Then for the other fields of the Data struct
overallMap := make(map[string]interface{})
overallBlob, err := json.Marshal(d)
if err != nil {
return nil, err
}
if err := json.Unmarshal(overallBlob, &overallMap); err != nil {
return nil, err
}
// Now overwrite the keys of the custom base64 and hex keys encoded differently together
for k, v := range repr {
overallMap[k] = v
}
return json.Marshal(overallMap)
}
and notice how that requires advanced knowledge of the encoding/json package but also that the complexity increases with every single custom field added plus it'll do double work in the step of encoding the non-cmn.Bytes fields
Proposed change
I defined separate types with different codecs, I now only need to define the struct as:
type Data struct {
Address cmn.HexBytes `json:"addr"`
PassphraseDigest cmn.Base64Bytes `json:"pass_digest"`
}
Instead of implementing *Coder, perhaps we couldimplement different *Byte types, each with its respective Marshal and String methods. This way the user doesn't have to first set a global codec everytime, plus they now have better control in structs.
type JSONBytes []byte
var _ json.Codec = (*JSONBytes)(nil)
var _ fmt.Stringer = (*JSONBytes)(nil)
type HexBytes baseBytes
var _ json.Codec = (*HexBytes)(nil)
var _ fmt.Stringer = (*HexBytes)(nil)
type Base64Bytes baseBytes
var _ json.Codec = (*Base64Bytes)(nil)
var _ fmt.Stringer = (*Base64Bytes)(nil)
// Implement the respective *arshal* and String methods for each type
sm.sorted = true | ||
} | ||
|
||
// Returns a copy of sorted KVPairs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really a copy ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. good catch...
@odeke-em I agree with you that the original encoding logic was problematic in many ways. For now, all I need in common/bytes is a universal way to refer to Much of your original comments apply to the github.com/tendermint/go-wire/data module, so please submit a new Issue on that repo with what you've written here, w/ consideration for this new cmn.Bytes. |
Cool, thank you @jaekwon for the PR and for considering those comments. For sure, I've opened tendermint/go-amino#65. |
* Canonical KVPair in common * Simplify common/Bytes to just hex encode
Instead of defining your own KVPair, always use this one.