Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getGarbageLen() does not match the actual number of elements for garbage collection #688

Closed
chacha912 opened this issue Nov 23, 2023 · 5 comments · Fixed by #743
Closed
Assignees
Labels
bug 🐞 Something isn't working good first issue 🐤 Good for newcomers sdk ⚒️

Comments

@chacha912
Copy link
Contributor

What happened:

The getGarbageLen() function iterates through elements in root.removedElementSetByCreatedAt, counting the descendants' numbers. This causes elements to be counted multiple times, leading to a difference between the count provided by getGarbageLen() and the actual number of elements being garbage collected.

What you expected to happen:

The counts by getGarbageLen() and the actual number of elements being garbage collected should match.

How to reproduce it (as minimally and precisely as possible):

Execute the following test code in the yorkie-js-sdk:

// gc-test.ts
it.only('gc-test', async function ({
  task,
}) {
  type TestDoc = { point?: { x?: number } };
  const docKey = toDocKey(`${task.name}-${new Date().getTime()}`);
  const doc1 = new yorkie.Document<TestDoc>(docKey);
  const doc2 = new yorkie.Document<TestDoc>(docKey);

  const client1 = new yorkie.Client(testRPCAddr);
  const client2 = new yorkie.Client(testRPCAddr);

  await client1.activate();
  await client2.activate();

  await client1.attach(doc1, { isRealtimeSync: false });
  doc1.update((root) => (root.point = { x: 0 }));
  await client1.sync();
  await client2.attach(doc2, { isRealtimeSync: false });

  doc1.update((root) => {
    delete root.point;
  });
  assert.equal(doc1.getGarbageLen(), 2);

  doc2.update((root) => {
    delete root.point?.x;
  });
  assert.equal(doc2.getGarbageLen(), 1);

  await client1.sync();
  await client2.sync();
  await client1.sync();

  // to-be: getGarbageLen() should also be 2.
  assert.equal(doc1.getGarbageLen(), 3);
  assert.equal(doc2.getGarbageLen(), 3);

  assert.equal(doc1.garbageCollect(MaxTimeTicket), 2);
  assert.equal(doc2.garbageCollect(MaxTimeTicket), 2);

  await client1.deactivate();
  await client2.deactivate();
});

Anything else we need to know?:

Environment:

  • Operating system:
  • Browser and version:
  • Yorkie version (use yorkie version): v0.4.8
  • Yorkie JS SDK version: v0.4.8
@chacha912 chacha912 added the bug 🐞 Something isn't working label Nov 23, 2023
@hackerwins hackerwins added the good first issue 🐤 Good for newcomers label Dec 5, 2023
@devleejb
Copy link
Member

devleejb commented Jan 2, 2024

Looking at the code below, I think your test case generates overlap between list this.removedElementSetByCreatedAt and list this.removedElementSetByCreatedAt element's descendants. Is it the right reason for the bug?
This code is in src/document/crdt/root.ts at yorkie-js-sdk

    for (const createdAt of this.removedElementSetByCreatedAt) {
      count++;
      const pair = this.elementPairMapByCreatedAt.get(createdAt)!;
      if (pair.element instanceof CRDTContainer) {
        pair.element.getDescendants(() => {
          count++;
          return false;
        });
      }
    }

This screenshot is the values when calling assert.equal(doc2.getGarbageLen(), 3);(Last getGarbageLen in your test code). 2:6593d7b935bb65f993625525:2 is duplicated.

image

@chacha912
Copy link
Contributor Author

@devleejb Yes. The removedElementSet can potentially include descendants separately, which can lead to issues with duplicate counting of descendant elements. One possible solution is to exclude descendants from the count when they are present in the removedElementSet.

For instance, you can also consider the following test case:

it.only('gc-test', async function ({ task }) {
    type TestDoc = { point?: { x?: number; y?: number } };
    const docKey = toDocKey(`${task.name}-${new Date().getTime()}`);
    const doc1 = new yorkie.Document<TestDoc>(docKey);
    const doc2 = new yorkie.Document<TestDoc>(docKey);

    const client1 = new yorkie.Client(testRPCAddr);
    const client2 = new yorkie.Client(testRPCAddr);

    await client1.activate();
    await client2.activate();

    // 1. initial state
    await client1.attach(doc1, { isRealtimeSync: false });
    doc1.update((root) => (root.point = { x: 0, y: 0 }));
    await client1.sync();
    await client2.attach(doc2, { isRealtimeSync: false });

    // 2. client1 updates doc
    doc1.update((root) => {
      delete root.point;
    });
    assert.equal(doc1.getGarbageLen(), 3); // point, x, y

    // 3. client2 updates doc
    doc2.update((root) => {
      delete root.point?.x;
    });
    assert.equal(doc2.getGarbageLen(), 1); // x

    await client1.sync();
    await client2.sync();
    await client1.sync();

    // 4. getGarbageLen() should be 3.
    assert.equal(doc1.getGarbageLen(), 4); // point, x, y, x
    assert.equal(doc2.getGarbageLen(), 4); // x, point, x, y

    // Actual garbage-collected nodes
    assert.equal(doc1.garbageCollect(MaxTimeTicket), 3); // point, x, y
    assert.equal(doc2.garbageCollect(MaxTimeTicket), 3); // point, x, y

    await client1.deactivate();
    await client2.deactivate();
  });

image

@devleejb
Copy link
Member

devleejb commented Jan 3, 2024

@chacha912
Thank you for your response.

I will make the modification in the direction of eliminating duplicate counting when calling getGarbageLen(). While I haven't identified any problems, are there any side effects to consider when making this modification?

Additionally, I considered not adding duplicates to removedElementSetByCreatedAt. However, I excluded this solution as I deemed the overhead during the registerRemovedElement operation .

@chacha912
Copy link
Contributor Author

@devleejb, I agree that avoiding adding duplicates to the removedElementSet can introduce overhead during the registerRemovedElement operation, as you mentioned. Since getGarbageLen is mainly used for testing purposes, I think your suggested approach is good.

@devleejb
Copy link
Member

devleejb commented Jan 3, 2024

@chacha912
OK. I will implement the approach of removing duplicates when counting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐞 Something isn't working good first issue 🐤 Good for newcomers sdk ⚒️
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants