Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix getGarbageLen to retrun correct size #714

Merged
merged 9 commits into from Jan 4, 2024
Merged

Conversation

devleejb
Copy link
Member

@devleejb devleejb commented Jan 3, 2024

What this PR does / why we need it?

  • To ensure that the getGarbageLen function returns the correct length, I used a Set to eliminate duplicate counting
  • Add test case for getGarbageLen

Any background context you want to provide?

Refer to issue yorkie/#688 for details.

What are the relevant tickets?

Fixes # yorkie-team/yorkie#688

Checklist

  • Added relevant tests or not required
  • Didn't break anything

Copy link

codecov bot commented Jan 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (5ef1e8f) 81.75% compared to head (4b8a34c) 81.76%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #714   +/-   ##
=======================================
  Coverage   81.75%   81.76%           
=======================================
  Files          57       57           
  Lines        4161     4163    +2     
  Branches      805      805           
=======================================
+ Hits         3402     3404    +2     
  Misses        502      502           
  Partials      257      257           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@hackerwins hackerwins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution.

It seems like an intuitive solution. Because of Set, we use space for GC targets. but I'm curious if there are any ideas for further improvement.

CC) @justiceHui

@devleejb
Copy link
Member Author

devleejb commented Jan 3, 2024

Improved time complexity for getGarbageLen using sort

Discussed this topic with @justiceHui .

@devleejb
Copy link
Member Author

devleejb commented Jan 3, 2024

I have executed the benchmark, and the results have changed as shown below. I am curious to know if there is a significant difference that can be observed.

Before sorting
image

After sorting
image

@devleejb
Copy link
Member Author

devleejb commented Jan 3, 2024

@hackerwins @justiceHui
I have newly written a benchmark to measure the performance of getGarbageLen with @justiceHui . To simulate the worst-case scenario, I created a scenario where deeply nested objects are generated and deleted starting from the children.

I expect the following code will pass the tests successfully. However, although getGarbageLen operates as expected, garbageCollect returns a value of 2^(size - 1) + size. I would like to discuss the reasons behind this discrepancy.

Additionally, when running the test without reversing the childList, garbageCollect returns 2^size.

const benchmarkGCLen = (size: number) => {
  const doc = new Document<{ a: any }>('test-doc');

  doc.update((root) => {
    let temp = root;
    for (let i = 0; i < size; i++) {
      temp.a = {};
      temp = temp.a;
    }
  });
  doc.update((root) => {
    let temp = root;
    const childList = [];
    for (let i = 0; i < size; i++) {
      temp.a = {};
      childList.push(temp);
      temp = temp.a;
    }

    for (const child of childList.reverse()) {
      delete child.a;
    }
  });
  const expect = 2 * size;
  assert.equal(expect, doc.getGarbageLen());
  assert.equal(expect, doc.garbageCollect(MaxTimeTicket));
};

@devleejb
Copy link
Member Author

devleejb commented Jan 4, 2024

@hackerwins As discussed during today's daily sync, I have restored the commit to the state where duplicate counting is eliminated using a Set.

Regarding performance optimization, we agreed to proceed after @chacha912 submit the related issue.

Copy link
Contributor

@chacha912 chacha912 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! I left a brief comment.
And thanks for reporting the bug in the GC logic. I will create a separate issue for this bug.

test/integration/gc_test.ts Outdated Show resolved Hide resolved
@justiceHui
Copy link
Member

justiceHui commented Jan 4, 2024

I modified the benchmark code with @devleejb test for the worst-case scenario of getGarbageLen using the Set, and benchmarked it with O(n log n) time implementation using sorting. In the image below, GC represents the getGarbageLen using Set, and new GC represents using sorting.

In this scenario, garbageCollect returns expected value, but if delete nodes without reverse the childList, it will return 2^(size-1).

const benchmarkSkewTreeGC = (size: number) => {
  const doc = new Document<{ a: any }>('test-doc');

  doc.update((root) => {
    let temp = root;
    const childList = [];
    for (let i = 0; i < size; i++) {
      temp.a = {};
      childList.push(temp);
      temp = temp.a;
    }
    for (const child of childList.reverse()) {
      delete child.a;
    }
  });

  assert.equal(size, doc.getGarbageLen());
  assert.equal(size, doc.garbageCollect(MaxTimeTicket));
};
image

Copy link
Contributor

@chacha912 chacha912 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@hackerwins hackerwins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@hackerwins hackerwins merged commit 1c65dd9 into yorkie-team:main Jan 4, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants