New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upstream serialization improvements #3180
Conversation
These changes decode valid SIGHASH types on signatures in assembly (asm) representations of scriptSig scripts. This squashed commit incorporates substantial helpful feedback from jtimon, laanwj, and sipa.
CExtPubKey should be serializable like CPubKey
Zcash: Excludes changes to CBanEntry and CHDChain, which we don't have yet.
Also add a variadic CDataStream ctor for ease-of-use.
CDataStream and CAutoFile had a ReadVersion and WriteVersion method that was never used. Remove them.
The stream implementations had two cascading layers (the upper one with operator<< and operator>>, and a lower one with read and write). The lower layer's functions are never cascaded (nor should they, as they should only be used from the higher layer), so make them return void instead.
Make the various stream implementations' nType and nVersion private and const (except in CDataStream where we really need a setter).
Given that in default GetSerializeSize implementations created by ADD_SERIALIZE_METHODS we're already using CSizeComputer(), get rid of the specialized GetSerializeSize methods everywhere, and just use CSizeComputer. This removes a lot of code which isn't actually used anywhere. For CCompactSize and CVarInt this actually removes a more efficient size computing algorithm, which is brought back in a later commit.
Remove the nType and nVersion as parameters to all serialization methods and functions. There is only one place where it's read and has an impact (in CAddress), and even there it does not impact any of the recursively invoked serializers. Instead, the few places that need nType or nVersion are changed to read it directly from the stream object, through GetType() and GetVersion() methods which are added to all stream classes.
Suggested by Pavel Janik.
The CSerAction's ForRead() method does not depend on any runtime data, so guarantee that requests to it can be optimized out by making it constexpr. Suggested by Cory Fields.
To get the advantages of faster GetSerializeSize() implementations back that were removed in "Make GetSerializeSize a wrapper on top of CSizeComputer", reintroduce them in the few places in the form of a specialized Serialize() implementation. This actually gets us in a better state than before, as these even get used when they're invoked indirectly in the serialization of another object.
Dbwrapper used GetSerializeSize() to compute the size of the buffer to preallocate. For some cases (specifically: CCoins) this requires a costly compression call. Avoid this by just using fixed size preallocations instead.
@zkbot try |
⌛ Trying commit c7d7198 with merge 3b68ab255fcbb233af50a76a684b7673738e3c0d... |
Upstream serialization improvements Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#5264 - bitcoin/bitcoin#6914 - bitcoin/bitcoin#6215 - bitcoin/bitcoin#8068 - Only the `COMPACTSIZE` wrapper commit - bitcoin/bitcoin#8658 - bitcoin/bitcoin#8708 - Only the serializer variadics commit - bitcoin/bitcoin#9039 - bitcoin/bitcoin#9125 - Only the first two commits (the last two block on other upstream PRs) Part of #2074.
☀️ Test successful - pr-try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK once @arielgabizon comment addressed: #3180 (comment)
@zkbot r+ |
📌 Commit c7d7198 has been approved by |
Upstream serialization improvements Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#5264 - bitcoin/bitcoin#6914 - bitcoin/bitcoin#6215 - bitcoin/bitcoin#8068 - Only the `COMPACTSIZE` wrapper commit - bitcoin/bitcoin#8658 - bitcoin/bitcoin#8708 - Only the serializer variadics commit - bitcoin/bitcoin#9039 - bitcoin/bitcoin#9125 - Only the first two commits (the last two block on other upstream PRs) Part of #2074.
src/serialize.h
Outdated
@@ -1146,4 +1020,16 @@ inline void SerReadWriteMany(Stream& s, int nType, int nVersion, CSerActionUnser | |||
::UnserializeMany(s, nType, nVersion, args...); | |||
} | |||
|
|||
template <typename T> | |||
size_t GetSerializeSize(const T& t, int nType, int nVersion = 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to have the default value of 0 for nVersion
?
Seems worth thinking of especially with all our transaction versions (though I don't remember if we are using nVersion
for those)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A side benefit of reviewing this is that I finally understood what that weird ADD_SERIALIZE_METHODS
macro is!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetSerializeSize
is not just for transactions; it is used by all serialization operations in zcashd
. The justification for leaving the default as 0 is that changing it could likely break a bunch of other parsers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, yeah now I see a default of 0 in all the old versions of the method.
@@ -380,7 +380,7 @@ class CDiskBlockIndex : public CBlockIndex | |||
|
|||
// Only read/write nSproutValue if the client version used to create | |||
// this index was storing them. | |||
if ((nType & SER_DISK) && (nVersion >= SPROUT_VALUE_VERSION)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the previous commit has errors nType
not defined here and doesn't compile.
This change might be moved to the previous commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh OK, this line wasn't in the bitcoin code, that's why it's still wrong here.
Previously we had both nVersion as a class parameter *and* a serialization argument, and in several inherited serializers the latter was set to the former, in order to pass the serialized object's version into underlying parsers. zcash#3180 pulled in the upstream changes to clean this up, and in doing so these lines became no-ops - setting the class parameter to itself. Clang throws warnings on this, which turn into errors on the MacOS builder. We can just remove these, because upstream already had done so in earlier PRs, indicating that they were not being relied on by underlying parsers.
postmerge ack |
src/key.h
Outdated
{ | ||
unsigned int len = ::ReadCompactSize(s); | ||
unsigned char code[BIP32_EXTKEY_SIZE]; | ||
s.read((char *)&code[0], len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This differs from the near-duplicate code in pubkey.h by not checking len
. Why is there near-duplicate code and why does it differ?
ssPriv >> privCheck; | ||
|
||
BOOST_CHECK(pubCheck == pubkeyNew); | ||
BOOST_CHECK(privCheck == keyNew); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test that reads an invalid serialization, e.g. with a wrong length.
* - T* indirect: a pointer to an array of capacity elements of type T | ||
* (only the first _size are initialized). | ||
* | ||
* The data type T must be movable by memmove/realloc(). Once we switch to C++, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably meant to say C++11.
}; | ||
|
||
private: | ||
size_type _size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document how this encodes the size: 29a8ade#diff-99a4340a7ab38fb07a2365d0e8d030daR263
T* indirect = indirect_ptr(0); | ||
T* src = indirect; | ||
T* dst = direct_ptr(0); | ||
memcpy(dst, src, size() * sizeof(T)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to have UB if new_capacity < size()
. Is that intentional? I would add an assertion new_capacity >= size()
at the top if that is supposed to be a precondition of this method. (I know it's private, but still.)
src/serialize.h
Outdated
} | ||
|
||
template<typename Stream, typename Arg, typename... Args> | ||
void SerializeMany(Stream& s, int nType, int nVersion, Arg&& arg, Args&&... args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's that klunking sound I hear? Is it the sound of language features being duplicated?
if (!(nType & SER_GETHASH)) | ||
inline void SerializationOp(Stream& s, Operation ser_action) { | ||
int nVersion = s.GetVersion(); | ||
if (!(s.GetType() & SER_GETHASH)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the version excluded when computing the hash? Don't we want the hash to change if the version does?
(I know this is not affected by this commit.)
if (!(nType & SER_GETHASH)) | ||
inline void SerializationOp(Stream& s, Operation ser_action) { | ||
int nVersion = s.GetVersion(); | ||
if (!(s.GetType() & SER_GETHASH)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the version excluded when computing the hash? Don't we want the hash to change if the version does?
(I know this is not affected by this commit.)
@@ -380,7 +380,7 @@ class CDiskBlockIndex : public CBlockIndex | |||
|
|||
// Only read/write nSproutValue if the client version used to create | |||
// this index was storing them. | |||
if ((nType & SER_DISK) && (nVersion >= SPROUT_VALUE_VERSION)) { | |||
if ((s.GetType() & SER_DISK) && (nVersion >= SPROUT_VALUE_VERSION)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nVersion
here refers to s.GetVersion()
, which may be different from the block header version this->nVersion
. I suggest replacing nVersion
with s.GetVersion()
here and on line 346 to make it more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, although the local doesn't technically shadow the member variable, there is a way to refer to the member variable (as just nVersion
) that would usually be correct and here means something different. That's confusing.
template <typename Stream> | ||
void Unserialize(Stream& s) | ||
{ | ||
unsigned int len = ::ReadCompactSize(s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This differs from the near-duplicate code in pubkey.h by not checking len
. Why is there near-duplicate code and why does it differ?
Remove now-unshadowed serialization lines that do nothing Previously we had both nVersion as a class parameter *and* a serialization argument, and in several inherited serializers the latter was set to the former, in order to pass the serialized object's version into underlying parsers. #3180 pulled in the upstream changes to clean this up, and in doing so these lines became no-ops - setting the class parameter to itself. Clang throws warnings on this, which turn into errors on the MacOS builder. We can just remove these, because upstream already had done so in earlier PRs, indicating that they were not being relied on by underlying parsers.
Originally excluded from zcash#3180 due to these PRs not having been backported yet.
Originally excluded from zcash#3180 due to these PRs not having been backported yet.
Backport of bitcoin/bitcoin#8020 Cherry-picked from bitcoin/bitcoin#8020 - cherry-pick bitcoin/bitcoin@a68ec21f7e - conflict with bitcoin/bitcoin@957e5d2 - cherry-pick bitcoin/bitcoin@8cc9cfe160 - cherry-pick bitcoin/bitcoin@382c871d28 - replace CCoinsKeyHasher with SaltedTxidHasher - cherry-pick bitcoin/bitcoin@0b1295b066 - self-conflict with previously-applied bitcoin/bitcoin@b8a657936 (#3180)
Cherry-picked from the following upstream PRs:
COMPACTSIZE
wrapper commitPart of #2074.