Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] PITR - Single file compaction does not reset Hybrid Time filter set during restore #17261

Closed
1 task done
sanketkedia opened this issue May 10, 2023 · 0 comments
Closed
1 task done
Assignees
Projects

Comments

@sanketkedia
Copy link
Contributor

sanketkedia commented May 10, 2023

Jira Link: DB-6496

Description

If there's only one input file for compaction then the output file generated by compaction still contains the hybrid time filter. It is not reset. This is because the largest and smallest frontiers are cloned as is from the first input file and then iterating over the subsequent input files, we update them. During update, we reset the filter. So if there's only one input file the filter is not reset.

for (size_t level_idx = 0; level_idx < compact_->compaction->num_input_levels(); level_idx++) {
    for (FileMetaData *fmd : *compact_->compaction->inputs(level_idx) ) {
      out.meta.UpdateBoundariesExceptKey(fmd->smallest, UpdateBoundariesType::kSmallest);
      out.meta.UpdateBoundariesExceptKey(fmd->largest, UpdateBoundariesType::kLargest);
    }
}
void UserFrontier::Update(const UserFrontier* rhs, UpdateUserValueType type, UserFrontierPtr* out) {
  if (!rhs) {
    return;
  }
  if (*out) {
    (**out).Update(*rhs, type);
  } else {
    *out = rhs->Clone();
  }
}

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@sanketkedia sanketkedia added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels May 10, 2023
@sanketkedia sanketkedia self-assigned this May 10, 2023
@sanketkedia sanketkedia added this to To do in PITR via automation May 10, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels May 10, 2023
sanketkedia added a commit that referenced this issue May 12, 2023
…e filter

Summary:
If there's only one input file for compaction then the output file generated by compaction still contains the hybrid time filter. It is not reset. This is because the largest and smallest
frontiers are cloned as is from the first input file and then iterating over the subsequent input files, we update them. During update, we reset the filter. So if there's only one input file
the filter is not reset.

This is a perf fix rather than correctness. If we have this filter then we go through the hassle of creating a filtering iterator wrapper over the regular rocksdb iterator and comparing every key against it which is not necessary given that there are no records in the file that will match
Jira: DB-6496

Test Plan:
ybd release --cxx_test docdb-test --gtest-filter DocDBTests/DocDBTestWrapper.SetHybridTimeFilterSingleFile/0
ybd release --cxx_test docdb-test --gtest-filter DocDBTests/DocDBTestWrapper.SetHybridTimeFilterSingleFile/1

Reviewers: bogdan, sergei

Reviewed By: sergei

Subscribers: zdrudi, mhaddad, slingam, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D25216
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label May 16, 2023
sanketkedia added a commit that referenced this issue Jun 13, 2023
…eset Hybrid Time filter

Summary:
Original commit: 639133f / D25216
If there's only one input file for compaction then the output file generated by compaction still contains the hybrid time filter. It is not reset. This is because the largest and smallest
frontiers are cloned as is from the first input file and then iterating over the subsequent input files, we update them. During update, we reset the filter. So if there's only one input file
the filter is not reset.

This is a perf fix rather than correctness. If we have this filter then we go through the hassle of creating a filtering iterator wrapper over the regular rocksdb iterator and comparing every key against it which is not necessary given that there are no records in the file that will match
Jira: DB-6496

Test Plan:
ybd release --cxx_test docdb-test --gtest-filter DocDBTests/DocDBTestWrapper.SetHybridTimeFilterSingleFile/0
ybd release --cxx_test docdb-test --gtest-filter DocDBTests/DocDBTestWrapper.SetHybridTimeFilterSingleFile/1

Reviewers: sergei, bogdan

Reviewed By: bogdan

Subscribers: ybase, slingam, mhaddad, zdrudi

Differential Revision: https://phorge.dev.yugabyte.com/D26093
sanketkedia added a commit that referenced this issue Jun 13, 2023
…eset Hybrid Time filter

Summary:
Original commit: 639133f / D25216
If there's only one input file for compaction then the output file generated by compaction still contains the hybrid time filter. It is not reset. This is because the largest and smallest
frontiers are cloned as is from the first input file and then iterating over the subsequent input files, we update them. During update, we reset the filter. So if there's only one input file
the filter is not reset.

This is a perf fix rather than correctness. If we have this filter then we go through the hassle of creating a filtering iterator wrapper over the regular rocksdb iterator and comparing every key against it which is not necessary given that there are no records in the file that will match
Jira: DB-6496

Test Plan:
ybd release --cxx_test docdb-test --gtest-filter DocDBTests/DocDBTestWrapper.SetHybridTimeFilterSingleFile/0
ybd release --cxx_test docdb-test --gtest-filter DocDBTests/DocDBTestWrapper.SetHybridTimeFilterSingleFile/1

Reviewers: sergei, bogdan

Reviewed By: bogdan

Subscribers: zdrudi, mhaddad, slingam, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D26092
sanketkedia added a commit that referenced this issue Jun 13, 2023
…eset Hybrid Time filter

Summary:
Original commit: 639133f / D25216
If there's only one input file for compaction then the output file generated by compaction still contains the hybrid time filter. It is not reset. This is because the largest and smallest
frontiers are cloned as is from the first input file and then iterating over the subsequent input files, we update them. During update, we reset the filter. So if there's only one input file
the filter is not reset.

This is a perf fix rather than correctness. If we have this filter then we go through the hassle of creating a filtering iterator wrapper over the regular rocksdb iterator and comparing every key against it which is not necessary given that there are no records in the file that will match
Jira: DB-6496

Test Plan:
ybd release --cxx_test docdb-test --gtest-filter DocDBTests/DocDBTestWrapper.SetHybridTimeFilterSingleFile/0
ybd release --cxx_test docdb-test --gtest-filter DocDBTests/DocDBTestWrapper.SetHybridTimeFilterSingleFile/1

Reviewers: sergei, bogdan

Reviewed By: bogdan

Subscribers: ybase, slingam, mhaddad, zdrudi

Differential Revision: https://phorge.dev.yugabyte.com/D26091
PITR automation moved this from To do to Done Jun 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
PITR
Done
Development

No branches or pull requests

3 participants