Make log servers return an empty version range only when it is correct to do so #12188
In the context of version vector/unicast, make log servers return an empty version range (on peeks) only when it is correct to do so. This is so that the receiver will receive all versions that it is supposed to receive (even though the sender is sending an empty version range).
Changes:
- Make the cluster controller collect the set of log servers that participated in recovery and propagate that information to the other processes.
- Extend the ServerPeekCursor to take a flag that tells the source log server whether it can return an empty version range or not (in the context of version vector/unicast), and make the source log server return an empty version range only when this flag is set.
- Make the peek APIs set the flag appropriately when initializing ServerPeekCursors.
- Also, take the internal logic used by SetPeekCursor and MergedPeekCursor into account while initializing the above-mentioned flag.
Result of foundationdb-pr-clang-ide on Linux RHEL 9
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
Result of foundationdb-pr-clang-arm on Linux CentOS 7
Result of foundationdb-pr-clang on Linux RHEL 9
Result of foundationdb-pr on Linux RHEL 9
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
int bestSet,
int& bestServer,
Optional<Version> end,
Optional<std::map<uint8_t, std::vector<uint16_t>>> knownLockedTLogIds) {
Todo: do both of these have to be Optional?
TagPartitionedLogSystem::peekLogRouter(), an invoker of this method, takes these arguments as Optional. Outside the recovery context, I don't think these arguments need to be specified, so I think it makes sense to keep them Optional. I also plan to feed the locked server list from the serialized TagPartitionedLogSystem to this method from TagPartitionedLogSystem::peekLogRouter() in a follow-up PR, so keeping this argument Optional makes sense for that reason too. The "Optional end" argument has a value assigned to it in both contexts that this method is currently called from, so it may not need to be Optional (but we may not want to change this purely based on the current invokers).
Some more context: I added the argument Optional<Version> end to TagPartitionedLogSystem::peekLogRouter(). In that iteration, end would have to be present and not equal to -1 to indicate that we are in recovery and should return up to end.
It would be good not to add any extra work for the non-version-vector case, i.e. only call resetBestServerIfNotLocked if version vector is enabled, as was done for resetBestServerIfNotAvailable().
I think knownLockedTLogIds could be a reference, maybe const.
Done.
@@ -1997,11 +2038,13 @@ Optional<DurableVersionInfo> TagPartitionedLogSystem::getDurableVersion(UID dbgi
std::vector<TLogLockResult> results;
std::string sServerState;
LocalityGroup unResponsiveSet;
std::vector<uint16_t> lockedTLogIds;
We can call lockedTLogIds.reserve(logSet->logServers.size()); to prevent repeated reallocations.
Done.
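For readers following the thread, here is a minimal sketch of the reserve() suggestion. It is not the code from this PR: collectIds and its contents are hypothetical, and only the std::vector calls are real.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: collects the ids 0..n-1, standing in for the loop
// that fills lockedTLogIds with one entry per log server.
std::vector<uint16_t> collectIds(std::size_t n) {
    std::vector<uint16_t> ids;
    // One allocation up front; the push_back calls below never reallocate
    // because the size never exceeds the reserved capacity.
    ids.reserve(n);
    assert(ids.capacity() >= n);
    for (std::size_t i = 0; i < n; ++i) {
        ids.push_back(static_cast<uint16_t>(i));
    }
    return ids;
}
```

Calling collectIds(8) yields a vector of size 8 whose capacity was fixed before the first push_back.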
results[new_safe_range_begin].end,
results,
failedLogsCompletePolicy,
lockedTLogIds);
To avoid copying the vector into the constructor here, I think you can define DurableVersionInfo::std::vector<uint16_t> knownLockedTLogIds; to be DurableVersionInfo::std::vector<uint16_t> &&knownLockedTLogIds;. Or alternatively, make a constructor for DurableVersionInfo(...lockedTLogIds) that sets lockedTLogIds = std::move(lockedTLogIds);
The same could be done with results.
Good suggestion! Done now.
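A minimal sketch of the constructor variant of this suggestion, under stated assumptions: DurableVersionInfoSketch is a hypothetical stand-in that models only one member, and the thread does not show which variant the PR actually adopted. The sketch uses the constructor route because an rvalue-reference member would refer to the caller's vector rather than own its contents.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for DurableVersionInfo.
struct DurableVersionInfoSketch {
    std::vector<uint16_t> knownLockedTLogIds;

    // Take the argument by value, then move it into the member. An rvalue
    // argument (a temporary or std::move(x)) is moved all the way through,
    // so the vector's buffer is never copied.
    explicit DurableVersionInfoSketch(std::vector<uint16_t> lockedTLogIds)
      : knownLockedTLogIds(std::move(lockedTLogIds)) {}
};

// Hypothetical caller: builds the id list locally and hands it over.
DurableVersionInfoSketch makeInfo() {
    std::vector<uint16_t> lockedTLogIds{ 1, 2, 3 };
    DurableVersionInfoSketch info(std::move(lockedTLogIds));
    assert(info.knownLockedTLogIds.size() == 3);
    return info;
}
```

The same pattern would apply to results: pass by value at the constructor boundary and std::move at the call site.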
  TLogPeekRequest(Version begin,
  Tag tag,
  bool returnIfBlocked,
  bool onlySpilled,
  Optional<std::pair<UID, int>> sequence = Optional<std::pair<UID, int>>(),
- Optional<Version> end = Optional<Version>())
+ Optional<Version> end = Optional<Version>(),
+ Optional<bool> returnEmptyIfStopped = Optional<bool>())
Is the new variable returnEmptyIfStopped necessary? Prior to the PR, Optional end being present indicated "return empty if stopped" and end should only be set in that case. The Optional end could be renamed to clarify that intention.
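To make the two encodings in this question concrete, here is a rough sketch. PeekRequestSketch and both helper functions are hypothetical, std::optional stands in for FDB's Optional, and treating an absent flag as false is an assumption rather than something the thread states.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using Version = int64_t;

// Hypothetical stand-in for TLogPeekRequest; only the two fields discussed.
struct PeekRequestSketch {
    std::optional<Version> end;
    std::optional<bool> returnEmptyIfStopped;
};

// Encoding described for the code before this PR: the presence of `end`
// doubles as the "return empty if stopped" signal.
bool mayReturnEmptyImplicit(const PeekRequestSketch& req) {
    return req.end.has_value();
}

// Encoding with the explicit flag.
// ASSUMPTION: an absent flag is treated as false.
bool mayReturnEmptyExplicit(const PeekRequestSketch& req) {
    return req.returnEmptyIfStopped.value_or(false);
}
```

With the explicit flag, a request can carry an end version without also granting permission to return an empty range, which the implicit encoding cannot express.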
As we discussed, added a @todo item for this in the code. Will investigate this in a follow-up PR, thanks!
nit: I think FDB's convention is to use "SOMEDAY" so that todos are easy to find.
Tag tag,
bool useSatellite,
Optional<Version> end,
Optional<std::map<uint8_t, std::vector<uint16_t>>> knownStoppedTLogIds) {
I think knownStoppedTLogIds can be a reference, maybe pipe it through as a const as well.
Done (I think we have to make it const in order to make it a reference, since we assign a default value to it in the constructor).
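The point about const can be illustrated with a small sketch. countLocked and the LockedIds alias are hypothetical, and std::optional stands in for FDB's Optional; the language rule itself (a temporary binds to a const lvalue reference but not to a non-const one) is standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

using LockedIds = std::map<uint8_t, std::vector<uint16_t>>;

// The default argument is a temporary. A const reference can bind to it;
// dropping the const would make this declaration fail to compile.
std::size_t countLocked(const std::optional<LockedIds>& knownLockedTLogIds = std::optional<LockedIds>()) {
    if (!knownLockedTLogIds.has_value()) {
        return 0; // nothing known, e.g. outside the recovery context
    }
    std::size_t n = 0;
    for (const auto& entry : *knownLockedTLogIds) {
        n += entry.second.size();
    }
    return n;
}
```

Callers that have no locked-server information can omit the argument, while callers that do pass it without the large map being copied into the function.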
quorumResults.push_back(quorum(tLogCommitResults, tLogCommitResults.size() - it->tLogWriteAntiQuorum));
logGroupLocal++;
}

return minVersionWhenReady(waitForAll(quorumResults), allReplies);
}

void TagPartitionedLogSystem::resetBestServerIfNotLocked(
Can you add a few comments describing this function's purpose, so maintainers do not have to dig through the PR history? Thanks.
Done.
nit: this phrase is repeated: "the peek logic" (in "This is so the peek logic the peek logic will").
Also, mention explicitly that this only applies during recovery?
Done.
@@ -667,13 +668,31 @@ Future<Version> TagPartitionedLogSystem::push(const ILogSystem::PushVersionSet&
tLogCommitResults.push_back(commitSuccess);
location++;
}

nit remove
Done.
@@ -2656,6 +2705,7 @@ ACTOR Future<Void> TagPartitionedLogSystem::recruitOldLogRouters(TagPartitionedL
req.tLogPolicy = tLogPolicy;
req.locality = locality;
req.recoverAt = self->recoverAt.get();
req.knownLockedTLogIds = self->knownLockedTLogIds;
do we need to do the same for log router request below as well?
I don't think we need to for the same reason that I mentioned above (that they are for previous/old epochs).
@@ -708,8 +731,10 @@ Reference<ILogSystem::IPeekCursor> TagPartitionedLogSystem::peekAll(UID dbgid,
.detail("Begin", begin)
.detail("End", end)
.detail("BestLogs", localSets[bestSet]->logServerString());
int bestServer = localSets[bestSet]->bestLocationFor(tag);
resetBestServerIfNotLocked(bestSetIdx, bestServer, end, knownLockedTLogIds);
There are other call sites where SetPeekCursor is used. Should we call this function at those places as well?
I don't think we need to. At a high level, we need to call this only when we are peeking from the latest epoch. The reason is that the peek's end-epoch is equal to the recovery version for the latest epoch, and some tLogs in that epoch may not have received mutations/versions up to the recovery version. But the end-epoch of a previous/old epoch is equal to the "knownCommittedVersion" of that epoch, and all tLogs in that epoch are guaranteed to have all (relevant) versions up to that version. So there is no need to call this on an old epoch.
@@ -564,6 +573,17 @@ Version ILogSystem::ServerPeekCursor::popped() const {
return poppedVersion;
}

void resetBestServerIfNotAvailable(std::vector<Reference<AsyncVar<OptionalInterface<TLogInterface>>>> const& logServers,
Maybe add static to the function.
Done.
@@ -594,9 +614,17 @@ ILogSystem::MergedPeekCursor::MergedPeekCursor(
logSet->updateLocalitySet(logSet->tLogLocalities);
}

if (SERVER_KNOBS->ENABLE_VERSION_VECTOR_TLOG_UNICAST) {
resetBestServerIfNotAvailable(logServers, bestServer, end);
}
NOTE: Need to do "this->bestServer = bestServer;" after invoking "resetBestServerIfNotAvailable()".
Done.
end(end) {
if (SERVER_KNOBS->ENABLE_VERSION_VECTOR_TLOG_UNICAST && bestSet >= 0) {
resetBestServerIfNotAvailable(logSets[bestSet]->logServers, bestServer, end);
}
Same change here too: Need to do "this->bestServer = bestServer;" after invoking "resetBestServerIfNotAvailable()".
in the final version of the PR I think renaming the parameter would be cleanest
Done.
nit: instead of having a parameter bestServer with the same name as the member variable bestServer, rename the parameter to bestServerIn or something; that way you do not need line 893, and it avoids a little confusion.
Done.
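The shadowing issue behind these comments can be sketched as follows. Everything here is hypothetical: resetIfNotAvailable is a simplified stand-in for resetBestServerIfNotAvailable(), and falling back to -1 is an assumed behavior chosen only to make the effect visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: if the chosen server is unavailable, reset the
// caller's variable to -1 (ASSUMED fallback value).
static void resetIfNotAvailable(const std::vector<bool>& available, int& bestServer) {
    assert(bestServer < static_cast<int>(available.size()));
    if (bestServer >= 0 && !available[static_cast<std::size_t>(bestServer)]) {
        bestServer = -1;
    }
}

struct ShadowedCursor {
    int bestServer;
    ShadowedCursor(const std::vector<bool>& available, int bestServer) : bestServer(bestServer) {
        // In the constructor body the bare name refers to the parameter.
        resetIfNotAvailable(available, bestServer);
        this->bestServer = bestServer; // needed, or the member keeps the old value
    }
};

struct RenamedCursor {
    int bestServer;
    RenamedCursor(const std::vector<bool>& available, int bestServerIn) : bestServer(bestServerIn) {
        // With the parameter renamed, the bare name is the member itself.
        resetIfNotAvailable(available, bestServer);
    }
};
```

Both structs end up with the same member value; the renamed version simply removes the extra assignment and the chance of forgetting it.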
looks good, a few nits
In the context of version vector/unicast, make log servers return an empty version range (on peeks) only when it is correct to do so. This is so the receiver will receive all versions (even though the sender is sending an empty version range) that it is supposed to receive.
Changes:
Testing:
Joshua id (with version vector disabled): 20250714-225445-sre-c46e945698f06a44 (no failures).