New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WT-10862 Support read-only fast-truncate data format change in MongoDB 6.0 release #9396
Conversation
@agorrod @kommiharibabu Would appreciate any feedback you might have on this approach. I'm also not sure if this is the right point in time to think about upgrade/downgrade testing. If so, I can look into doing some manual checks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Monica - this looks like it's in the right direction.
I've got a question about the implementation, and I wonder about testing. Does our existing compatibility testing cover this backport, or do we need to add something specific?
@@ -845,6 +848,28 @@ __wt_cell_unpack_safe(WT_SESSION_IMPL *session, const WT_PAGE_HEADER *dsk, WT_CE | |||
break; | |||
} | |||
|
|||
/* | |||
* Unpack any fast-truncate information. Note that there is no way to write fast-truncate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just skip the fast truncate information, rather than unpacking it?
I guess it's a question of whether information that was generated by a newer release should be parsed out and maybe, therefore, alter decisions made at runtime.
It feels as though it is going to be hard to test that combination (i.e: run with a newer version, generate fast truncate information, downgrade, read fast truncate information, ensure correct decisions are being made).
So, I'd be inclined towards ignoring the information as early as possible, rather than using it to populate data structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Alex, but I believe we need to unpack it, otherwise, we may get the problem. Instead of saving the unpacked data into the unpack_addr, just save them into a local variable and ignore them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I don't believe it's possible to simply skip the information - I think it could disrupt other information that is being unpacked as it would affect the position of the pointer. I can definitely unpack it into a local WT_PAGE_DELETED structure rather than into a structure inside the unpack_addr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move this logic into a dedicated function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@quchenhao Would there be any specific advantages of having a separate function for this? We're only unpacking the information once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make the code cleaner.
I don't think our existing compatibility testing would cover this scenario. From what I understand, our testing takes changes to the develop branch and tests them against older branches.
Yes, I think any sort of testing may have to be manual. I've done a basic test of generating fast truncate information with a newer version, downgrading, and then attempting to verify the database file. I didn't run into any problems but I also suspect it might not be hitting the unpack logic. |
src/include/cell_inline.h
Outdated
page_del.prepare_state = 0; /* No prepare can have been in progress. */ | ||
page_del.previous_state = WT_REF_DISK; /* The leaf page is on disk. */ | ||
page_del.committed = 1; /* There is no running transaction. */ | ||
|
||
/* Avoid a stale transaction ID on restart. */ | ||
if (dsk->write_gen <= S2BT(session)->base_write_gen && | ||
!F_ISSET(session, WT_SESSION_DEBUG_DO_NOT_CLEAR_TXN_ID)) | ||
page_del.txnid = WT_TXN_NONE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is no longer required as we are just unpacking the existing page_del
info from the cell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need this information in the downgraded version. If the fast truncate is stable and visible, the downgraded version should be able to read that and return not found for any data in the subtree. I don't think we can directly ignore this information. @kommiharibabu @agorrod @mm-ng I think this change may not be enough. In addition, we also need the code in 6.0 to cleanup the fast truncated tree.
I assume we will always downgrade from a clean shut down. Therefore, there is no need to handle the fast truncate information in rts. But I think we better to still do that unless we have a way to prevent the user to directly downgrade from an unclean shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@quchenhao My intention for this change was only to be able to read database files from newer versions which may or may not write the additional fast-truncate information. There would be no guarantee of whether all the fast-truncate behavior would be the same as newer versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should address the issue of a potential disk leak elsewhere, as it should also be backported to older versions and I feel that it is separate from this change. The disk leak should not be an issue currently as MongoDB does not write out fast-truncate information. I've created a new ticket WT-11321 to track the issue, and it should be completed before WT-10307 but doesn't need to be completed for this current ticket.
src/include/cell.h
Outdated
@@ -197,6 +198,8 @@ struct __wt_cell_unpack_addr { | |||
WT_CELL_COMMON_FIELDS; | |||
|
|||
WT_TIME_AGGREGATE ta; /* Address validity window */ | |||
|
|||
WT_PAGE_DELETED page_del; /* Fast-truncate information */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the latest code change, this variable also no longer needed.
src/include/cell_inline.h
Outdated
WT_RET(__wt_vunpack_uint(&p, end == NULL ? 0 : WT_PTRDIFF(end, p), &page_del.timestamp)); | ||
WT_RET( | ||
__wt_vunpack_uint(&p, end == NULL ? 0 : WT_PTRDIFF(end, p), &page_del.durable_timestamp)); | ||
page_del.prepare_state = 0; /* No prepare can have been in progress. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use WT_PREPARE_INIT flag here.
src/include/cell_inline.h
Outdated
__wt_vunpack_uint(&p, end == NULL ? 0 : WT_PTRDIFF(end, p), &page_del.durable_timestamp)); | ||
page_del.prepare_state = 0; /* No prepare can have been in progress. */ | ||
page_del.previous_state = WT_REF_DISK; /* The leaf page is on disk. */ | ||
page_del.committed = 1; /* There is no running transaction. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use true instead of 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks Monica. I want you to perform some manual downgrade/upgrade scenario before merging the change.
Thanks Hari. I have performed some manual upgrade/downgrade testing and have described the steps taken in the Jira ticket. @agorrod @quchenhao let me know if you have any further thoughts on the PR, otherwise I think this can be merged soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be more comfortable knowing that there was automated testing covering this scenario somewhere.
I'm guessing that automated testing actually belongs in develop (or latest), since it involves running with that version as well as this older version. Could you create a new ticket that captures upgrade/downgrade testing with fast truncate aimed at ensuring this backport is correct?
((*upd_entry)->type == WT_UPDATE_TOMBSTONE && (*upd_entry)->txnid == WT_TS_NONE && | ||
(*upd_entry)->start_ts == WT_TS_NONE)) || | ||
((*upd_entry)->type == WT_UPDATE_TOMBSTONE && | ||
(*upd_entry)->txnid == WT_TXN_NONE && (*upd_entry)->start_ts == WT_TS_NONE)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch - do we need to be concerned about this? It looks like a bug regardless of the backport here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it looks like a typo that I decided to just fix here. I think it's actually ok because WT_TS_NONE
and WT_TXN_NONE
are actually the same value in the codebase, so I don't know if we should call it a bug. It's just confusing to read in the condition.
src/include/cell_inline.h
Outdated
/* | ||
* Unpack any fast-truncate information. Note that there is no way to write fast-truncate | ||
* information to disk in versions before 6.1, but this information may still exist in the | ||
* database files if we are downgrading from newer versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this deserves another sentence along the lines of "If the information is present, skip past it and ignore the values. That is the equivalent of assuming the content is globally visible, which matches the existing behavior in 6.1."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agorrod I don't quite understand the second part of your sentence when you say it matches the existing behavior in 6.1. Do you mean that by ignoring the fast truncate information in 6.0, and is equivalent to how 6.1 treats globally visible truncates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think I just had my versions wrong. I meant ignoring version information is equivalent to the version that doesn't ever include visibility when writing out truncated content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can definitely create a ticket to add in automated testing for fast-truncate if that is what you are after. Edit: Created WT-11331. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
evergreen merge |
No description provided.