Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WT-10862 Support read-only fast-truncate data format change in MongoDB 6.0 release #9396

Merged
merged 10 commits into from Jul 21, 2023

Conversation

mm-ng
Copy link
Contributor

@mm-ng mm-ng commented Jul 13, 2023

No description provided.

@mm-ng mm-ng marked this pull request as ready for review July 14, 2023 05:19
@mm-ng mm-ng changed the title WT-10862 (read support only) WT-10862 Support read-only truncate data format change in MongoDB 6.0 release Jul 14, 2023
@mm-ng
Copy link
Contributor Author

mm-ng commented Jul 14, 2023

@agorrod @kommiharibabu Would appreciate any feedback you might have on this approach. I'm also not sure if this is the right point in time to think about upgrade/downgrade testing. If so, I can look into doing some manual checks.

src/include/connection.h Outdated Show resolved Hide resolved
src/include/cell_inline.h Outdated Show resolved Hide resolved
Copy link
Member

@agorrod agorrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Monica - this looks like it's in the right direction.

I've got a question about the implementation, and I wonder about testing. Does our existing compatibility testing cover this backport, or do we need to add something specific?

@@ -845,6 +848,28 @@ __wt_cell_unpack_safe(WT_SESSION_IMPL *session, const WT_PAGE_HEADER *dsk, WT_CE
break;
}

/*
* Unpack any fast-truncate information. Note that there is no way to write fast-truncate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this just skip the fast truncate information, rather than unpacking it?

I guess it's a question of whether information that was generated by a newer release should be parsed out and maybe, therefore, alter decisions made at runtime.

It feels as though it is going to be hard to test that combination (i.e: run with a newer version, generate fast truncate information, downgrade, read fast truncate information, ensure correct decisions are being made).

So, I'd be inclined towards ignoring the information as early as possible, rather than using it to populate data structures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Alex, but I believe we need to unpack it, otherwise, we may get the problem. Instead of saving the unpacked data into the unpack_addr, just save them into a local variable and ignore them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I don't believe it's possible to simply skip the information - I think it could disrupt other information that is being unpacked as it would affect the position of the pointer. I can definitely unpack it into a local WT_PAGE_DELETED structure rather than into a structure inside the unpack_addr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this logic into a dedicated function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@quchenhao Would there be any specific advantages of having a separate function for this? We're only unpacking the information once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make the code cleaner.

@mm-ng
Copy link
Contributor Author

mm-ng commented Jul 17, 2023

Thanks Monica - this looks like it's in the right direction.

I've got a question about the implementation, and I wonder about testing. Does our existing compatibility testing cover this backport, or do we need to add something specific?

I don't think our existing compatibility testing would cover this scenario. From what I understand, our testing takes changes to the develop branch and tests them against older branches.

It feels as though it is going to be hard to test that combination (i.e: run with a newer version, generate fast truncate information, downgrade, read fast truncate information, ensure correct decisions are being made).

Yes, I think any sort of testing may have to be manual. I've done a basic test of generating fast truncate information with a newer version, downgrading, and then attempting to verify the database file. I didn't run into any problems but I also suspect it might not be hitting the unpack logic.

Comment on lines 862 to 869
page_del.prepare_state = 0; /* No prepare can have been in progress. */
page_del.previous_state = WT_REF_DISK; /* The leaf page is on disk. */
page_del.committed = 1; /* There is no running transaction. */

/* Avoid a stale transaction ID on restart. */
if (dsk->write_gen <= S2BT(session)->base_write_gen &&
!F_ISSET(session, WT_SESSION_DEBUG_DO_NOT_CLEAR_TXN_ID))
page_del.txnid = WT_TXN_NONE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is no longer required as we are just unpacking the existing page_del info from the cell.

Copy link
Contributor

@quchenhao quchenhao Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need this information in the downgraded version. If the fast truncate is stable and visible, the downgraded version should be able to read that and return not found for any data in the subtree. I don't think we can directly ignore this information. @kommiharibabu @agorrod @mm-ng I think this change may not be enough. In addition, we also need the code in 6.0 to cleanup the fast truncated tree.

I assume we will always downgrade from a clean shut down. Therefore, there is no need to handle the fast truncate information in rts. But I think we better to still do that unless we have a way to prevent the user to directly downgrade from an unclean shutdown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@quchenhao My intention for this change was only to be able to read database files from newer versions which may or may not write the additional fast-truncate information. There would be no guarantee of whether all the fast-truncate behavior would be the same as newer versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should address the issue of a potential disk leak elsewhere, as it should also be backported to older versions and I feel that it is separate from this change. The disk leak should not be an issue currently as MongoDB does not write out fast-truncate information. I've created a new ticket WT-11321 to track the issue, and it should be completed before WT-10307 but doesn't need to be completed for this current ticket.

@@ -197,6 +198,8 @@ struct __wt_cell_unpack_addr {
WT_CELL_COMMON_FIELDS;

WT_TIME_AGGREGATE ta; /* Address validity window */

WT_PAGE_DELETED page_del; /* Fast-truncate information */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the latest code change, this variable also no longer needed.

WT_RET(__wt_vunpack_uint(&p, end == NULL ? 0 : WT_PTRDIFF(end, p), &page_del.timestamp));
WT_RET(
__wt_vunpack_uint(&p, end == NULL ? 0 : WT_PTRDIFF(end, p), &page_del.durable_timestamp));
page_del.prepare_state = 0; /* No prepare can have been in progress. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use WT_PREPARE_INIT flag here.

__wt_vunpack_uint(&p, end == NULL ? 0 : WT_PTRDIFF(end, p), &page_del.durable_timestamp));
page_del.prepare_state = 0; /* No prepare can have been in progress. */
page_del.previous_state = WT_REF_DISK; /* The leaf page is on disk. */
page_del.committed = 1; /* There is no running transaction. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use true instead of 1.

Copy link
Contributor

@kommiharibabu kommiharibabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Monica. I want you to perform some manual downgrade/upgrade scenario before merging the change.

@mm-ng
Copy link
Contributor Author

mm-ng commented Jul 20, 2023

LGTM. Thanks Monica. I want you to perform some manual downgrade/upgrade scenario before merging the change.

Thanks Hari. I have performed some manual upgrade/downgrade testing and have described the steps taken in the Jira ticket.

@agorrod @quchenhao let me know if you have any further thoughts on the PR, otherwise I think this can be merged soon.

@mm-ng mm-ng requested review from agorrod and quchenhao July 20, 2023 03:12
Copy link
Contributor

@quchenhao quchenhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@agorrod agorrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be more comfortable knowing that there was automated testing covering this scenario somewhere.

I'm guessing that automated testing actually belongs in develop (or latest), since it involves running with that version as well as this older version. Could you create a new ticket that captures upgrade/downgrade testing with fast truncate aimed at ensuring this backport is correct?

((*upd_entry)->type == WT_UPDATE_TOMBSTONE && (*upd_entry)->txnid == WT_TS_NONE &&
(*upd_entry)->start_ts == WT_TS_NONE)) ||
((*upd_entry)->type == WT_UPDATE_TOMBSTONE &&
(*upd_entry)->txnid == WT_TXN_NONE && (*upd_entry)->start_ts == WT_TS_NONE)) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch - do we need to be concerned about this? It looks like a bug regardless of the backport here?

Copy link
Contributor Author

@mm-ng mm-ng Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it looks like a typo that I decided to just fix here. I think it's actually ok because WT_TS_NONE and WT_TXN_NONE are actually the same value in the codebase, so I don't know if we should call it a bug. It's just confusing to read in the condition.

/*
* Unpack any fast-truncate information. Note that there is no way to write fast-truncate
* information to disk in versions before 6.1, but this information may still exist in the
* database files if we are downgrading from newer versions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this deserves another sentence along the lines of "If the information is present, skip past it and ignore the values. That is the equivalent of assuming the content is globally visible, which matches the existing behavior in 6.1."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agorrod I don't quite understand the second part of your sentence when you say it matches the existing behavior in 6.1. Do you mean that by ignoring the fast truncate information in 6.0, and is equivalent to how 6.1 treats globally visible truncates?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I just had my versions wrong. I meant ignoring version information is equivalent to the version that doesn't ever include visibility when writing out truncated content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agorrod I've updated the comment in 20f851a, let me know if you'd like me to make any more changes to the wording.

@mm-ng
Copy link
Contributor Author

mm-ng commented Jul 20, 2023

I'd be more comfortable knowing that there was automated testing covering this scenario somewhere.

I'm guessing that automated testing actually belongs in develop (or latest), since it involves running with that version as well as this older version. Could you create a new ticket that captures upgrade/downgrade testing with fast truncate aimed at ensuring this backport is correct?

I can definitely create a ticket to add in automated testing for fast-truncate if that is what you are after.

Edit: Created WT-11331.

@mm-ng mm-ng requested a review from agorrod July 20, 2023 22:58
@mm-ng mm-ng changed the title WT-10862 Support read-only truncate data format change in MongoDB 6.0 release WT-10862 Support read-only fast-truncate data format change in MongoDB 6.0 release Jul 21, 2023
Copy link
Member

@agorrod agorrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mm-ng
Copy link
Contributor Author

mm-ng commented Jul 21, 2023

evergreen merge

@mm-ng mm-ng merged commit 6b52dd5 into mongodb-6.0 Jul 21, 2023
5 checks passed
@mm-ng mm-ng deleted the wt-10862-truncate-read-support-only branch July 21, 2023 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants