Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix resource group keep setting incorrect softMemoryLimit #19626

Conversation

brandboat
Copy link
Member

@brandboat brandboat commented Nov 3, 2023

Description

When using database resource group manager, if softMemoryLimit was setup to 80% and then modified to 100MB. The softMemoryLimit in resource group is always 80% instead of 100MB.
The root cause is after using fraction(i.e. percentage) in softMemoryLimit in resource group, SqlQueryManager start updating softMemoryLimit to resource group per second. And the background update will not stop even if a static value is set in softMemoryLimit.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# General
* Fix resource groups not noticing updated `softMemoryLimit` if changed from a percent-based value to exact value. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Nov 3, 2023
@brandboat
Copy link
Member Author

https://github.com/trinodb/trino/actions/runs/6747796781/job/18345256146?pr=19626
failed test TestMinWorkerRequirement.testMultipleRequiredWorkerNodesSessionOverride in ci is flaky #19446

@wendigo wendigo requested a review from hashhar November 4, 2023 08:14
MILLISECONDS.sleep(2000);
ResourceGroupInfo groupInfo = manager.tryGetResourceGroupInfo(new ResourceGroupId(new ResourceGroupId("global"), "bi-user"))
.orElseThrow(() -> new IllegalStateException("Resource group not found"));
assertEquals(groupInfo.getSoftMemoryLimit(), DataSize.of(123, MEGABYTE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also assert that the initial change to percent based value also took effect using an assert similar to this.

Copy link
Member Author

@brandboat brandboat Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also assert that the initial change to percent based value also took effect using an assert similar to this.

Not sure if there is any better way to do assertion, but I found this might cause flaky tests. (the Math.round part)

        ResourceGroupInfo groupInfo =
                manager.tryGetResourceGroupInfo(new ResourceGroupId(new ResourceGroupId("global"), "bi-user"))
                        .orElseThrow(() -> new IllegalStateException("Resource group not found"));
        assertEquals(
                groupInfo.getSoftMemoryLimit(),
                DataSize.of(Math.round(queryRunner.getCoordinator().getClusterMemoryManager().getClusterMemoryBytes() * 0.2), BYTE));

(updated) : make softMemoryLimit percentage to 100% to avoid Round-off Error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(updated) : make softMemoryLimit percentage to 100% to avoid Round-off Error

You could have used org.testng.Assert#assertEquals(double, double, double) (with delta) to avoid exact match

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tips !

Comment on lines 383 to 384
dao.updateResourceGroup(2, "bi-${USER}", "1MB", 3, 2, 2, null, null, null, null, null, 1L, TEST_ENVIRONMENT);
dbConfigurationManager.load();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this "required" for the test? AFAIK we only care about switching from fraction based to exact value so whatever was used before fraction based doesn't seem to matter.

Copy link
Member Author

@brandboat brandboat Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for the comments !

is this "required" for the test?

We don't need that actually, will get rid of this line.

@hashhar hashhar requested a review from losipiuk November 6, 2023 07:30
@brandboat brandboat force-pushed the bugfix/fix-incorrect-resource-group-softmemorylimit branch from da0fe02 to 75d01bc Compare November 6, 2023 08:06
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@losipiuk / @sopel39 can you PTAL as well since I'm not the expert here.

dbConfigurationManager.load();

// wait for SqlQueryManager which enforce memory limits per second
MILLISECONDS.sleep(2000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would prefer to do active wait with shorter sleeps, to finish test as quickly as possible. But given other tests use long sleeps already you can leave it as is. Test methods can be cleaned up in batch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to do active wait with shorter sleeps, to finish test as quickly as possible.

Did you mean MILLISECONDS.sleep(1000); ? If so, I can change that in this pr. While I won't touch other test methods as they should be take care in a separate pr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean MILLISECONDS.sleep(1000); ? If so, I can change that in this pr. While I won't touch other test methods as they should be take care in a separate pr.

This has a risk of becoming flaky then, right? I agree with @losipiuk that active loop with shorter sleeps would be more reliable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a risk of becoming flaky then, right? I agree with losipiuk that active loop with shorter sleeps would be more reliable.

Sorry about misunderstanding... did you mean something like

for (int i = 0; i < 2000; i++) {
  MILLISECONDS.sleep(1);
}

Copy link
Member

@losipiuk losipiuk Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use assertEventually

        assertEventually(
                new Duration(2, TimeUnit.SECONDS),
                new Duration(100, TimeUnit.MILLISECONDS),
                () -> {.... code which does assertions});

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not an expert but LGTM

@brandboat brandboat force-pushed the bugfix/fix-incorrect-resource-group-softmemorylimit branch from 75d01bc to 83a11ef Compare November 9, 2023 01:27
@hashhar hashhar merged commit 77aea84 into trinodb:master Nov 9, 2023
22 checks passed
@github-actions github-actions bot added this to the 433 milestone Nov 9, 2023
@brandboat brandboat deleted the bugfix/fix-incorrect-resource-group-softmemorylimit branch November 9, 2023 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants