YARN-11678. Update CGroupElasticMemoryController for cgroup v2 support #7430

Open
wants to merge 1 commit into
base: trunk

laysfire

Description of PR

Currently, CGroupElasticMemoryController's implementation is based on cgroup v1. This PR updates CGroupElasticMemoryController to support cgroup v2.

The current CGroupElasticMemoryController implementation works as follows:

  1. Disable the OOM killer by writing 1 to the memory.oom_control file.
  2. Update the memory limit: memory.memsw.limit_in_bytes for virtual memory control, and memory.limit_in_bytes for physical memory control.
  3. Launch the OOM-Listener subprocess to listen for OOM events through cgroup.event_control.
  4. When an OOM happens, OOM-Listener notifies the NM.
  5. The NM calls DefaultOOMHandler to resolve the OOM.
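
As a rough illustration of steps 1 and 2 above, the control-file writes can be sketched as an ordered plan. This is a minimal sketch, not the code in this patch: the class name `CgroupV1ElasticSettings`, the method `plan`, and the 8 GiB limit are hypothetical, and it only builds the list of writes instead of touching the cgroup filesystem.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: lists the cgroup v1 writes
// described in steps 1 and 2, in the order they would be applied.
final class CgroupV1ElasticSettings {

  static Map<String, String> plan(long limitBytes, boolean virtualMemory) {
    Map<String, String> writes = new LinkedHashMap<>();
    // Step 1: writing 1 to memory.oom_control disables the kernel OOM killer.
    writes.put("memory.oom_control", "1");
    // Step 2: pick the limit file according to the kind of memory control.
    String limitFile = virtualMemory
        ? "memory.memsw.limit_in_bytes"   // memory + swap (virtual memory control)
        : "memory.limit_in_bytes";        // physical memory control
    writes.put(limitFile, Long.toString(limitBytes));
    return writes;
  }

  public static void main(String[] args) {
    // 8 GiB is an arbitrary example value.
    plan(8L << 30, false).forEach((file, value) ->
        System.out.println(file + " <- " + value));
  }
}
```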

In cgroup v2, however, there is no way to disable the OOM killer. Once memory usage exceeds the limit, the kernel decides which processes to kill, so containers are killed unpredictably and the NM cannot intervene.
However, cgroup v2 provides a throttling mechanism. memory.high is the memory usage throttle limit: if a cgroup's memory use goes over the high boundary specified here, the cgroup's processes are throttled and put under heavy reclaim pressure (see https://facebookmicrosites.github.io/cgroup2/docs/memory-controller.html#:~:text=memory.max-,memory.,put%20under%20heavy%20reclaim%20pressure.).
cgroup v2 also provides PSI (Pressure Stall Information): by writing a trigger to memory.pressure, we can be notified when memory pressure crosses a threshold (see https://docs.kernel.org/accounting/psi.html).
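
For reference, reading memory.pressure yields lines such as `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`, as described in the kernel PSI documentation linked above. The following is a minimal sketch of parsing one such line; the class `PsiLine` is hypothetical and is not part of this patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for one line of a PSI file such as memory.pressure.
final class PsiLine {
  final String kind;                                   // "some" or "full"
  final Map<String, Double> fields = new HashMap<>();  // avg10, avg60, avg300, total

  private PsiLine(String kind) {
    this.kind = kind;
  }

  static PsiLine parse(String line) {
    String[] parts = line.trim().split("\\s+");
    PsiLine parsed = new PsiLine(parts[0]);
    for (int i = 1; i < parts.length; i++) {
      // Each remaining token has the form key=value.
      String[] kv = parts[i].split("=", 2);
      if (kv.length != 2) {
        throw new IllegalArgumentException("Malformed PSI field: " + parts[i]);
      }
      parsed.fields.put(kv[0], Double.parseDouble(kv[1]));
    }
    return parsed;
  }

  public static void main(String[] args) {
    // Sample input with made-up numbers.
    PsiLine p = parse("some avg10=1.25 avg60=0.40 avg300=0.10 total=123456");
    System.out.println(p.kind + " avg10=" + p.fields.get("avg10"));
  }
}
```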

So the cgroup v2 based implementation can work as follows:

  1. Update the memory limit: memory.swap.max for virtual memory control, and memory.high and memory.max for physical memory control (memory.max should be slightly higher than memory.high, maybe by 5 GB).
  2. Launch the OOM-Listener subprocess to monitor memory pressure by writing a trigger such as "some 150000 1000000" (150 ms of stall within a 1 s window, both values in microseconds) to the memory.pressure file.
  3. Once memory usage goes over memory.high, the processes are throttled, which means that some tasks will stall.
  4. OOM-Listener is woken up and notifies the NM.
  5. The NM calls DefaultOOMHandler to resolve the OOM.
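
The physical-memory part of step 1 and the trigger string of step 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in this patch: the class `CgroupV2ElasticSettings`, its methods, and the example sizes are hypothetical. The validation reflects the kernel PSI documentation, which limits the window to between 500 ms and 10 s; the check that the stall threshold does not exceed the window is an additional sanity check assumed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class CgroupV2ElasticSettings {
  static final long MIN_WINDOW_US = 500_000L;     // 500 ms
  static final long MAX_WINDOW_US = 10_000_000L;  // 10 s

  // memory.high is the throttle limit; memory.max sits a little above it.
  static Map<String, String> plan(long highBytes, long headroomBytes) {
    if (highBytes <= 0 || headroomBytes < 0) {
      throw new IllegalArgumentException("Invalid limit or headroom");
    }
    Map<String, String> writes = new LinkedHashMap<>();
    writes.put("memory.high", Long.toString(highBytes));
    writes.put("memory.max", Long.toString(Math.addExact(highBytes, headroomBytes)));
    return writes;
  }

  // Builds a PSI trigger: "<some|full> <stall in us> <window in us>".
  static String psiTrigger(String kind, long stallUs, long windowUs) {
    if (!kind.equals("some") && !kind.equals("full")) {
      throw new IllegalArgumentException("kind must be some or full");
    }
    if (windowUs < MIN_WINDOW_US || windowUs > MAX_WINDOW_US) {
      throw new IllegalArgumentException("window must be within 500 ms and 10 s");
    }
    if (stallUs <= 0 || stallUs > windowUs) {
      throw new IllegalArgumentException("stall must be positive and at most the window");
    }
    return kind + " " + stallUs + " " + windowUs;
  }

  public static void main(String[] args) {
    // 64 GiB throttle limit with 5 GiB of headroom: arbitrary example values.
    plan(64L << 30, 5L << 30).forEach((file, value) ->
        System.out.println(file + " <- " + value));
    System.out.println("memory.pressure <- " + psiTrigger("some", 150_000L, 1_000_000L));
  }
}
```

Keeping memory.max above memory.high leaves room for throttling and the PSI notification to take effect before the hard limit is reached.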

How was this patch tested?

Unit tests and manual testing.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 17m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
-1 ❌ mvninstall 22m 49s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 0m 48s trunk passed
+1 💚 checkstyle 0m 26s trunk passed
+1 💚 mvnsite 0m 29s trunk passed
+1 💚 javadoc 0m 29s trunk passed
+1 💚 spotbugs 0m 58s trunk passed
+1 💚 shadedclient 19m 32s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 20s the patch passed
+1 💚 compile 0m 42s the patch passed
-1 ❌ cc 0m 42s /results-compile-cc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 4 new + 51 unchanged - 0 fixed = 55 total (was 51)
+1 💚 golang 0m 42s the patch passed
+1 💚 javac 0m 42s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 17s /results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 3 unchanged - 0 fixed = 7 total (was 3)
+1 💚 mvnsite 0m 22s the patch passed
+1 💚 javadoc 0m 19s the patch passed
+1 💚 spotbugs 0m 51s the patch passed
+1 💚 shadedclient 19m 27s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 55s hadoop-yarn-server-nodemanager in the patch passed.
-1 ❌ asflicense 0m 24s /results-asflicense.txt The patch generated 2 ASF License warnings.
87m 12s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/2/artifact/out/Dockerfile
GITHUB PR #7430
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc golang
uname Linux 150800d52082 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0ed2817
Default Java Red Hat, Inc.-1.8.0_412-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/2/testReport/
Max. process+thread count 545 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/2/console
versions git=2.9.5 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 35m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
-1 ❌ mvninstall 38m 34s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 1m 25s trunk passed
+1 💚 checkstyle 0m 39s trunk passed
+1 💚 mvnsite 0m 45s trunk passed
+1 💚 javadoc 0m 45s trunk passed
+1 💚 spotbugs 1m 28s trunk passed
+1 💚 shadedclient 36m 37s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 33s the patch passed
+1 💚 compile 1m 15s the patch passed
-1 ❌ cc 1m 15s /results-compile-cc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 4 new + 51 unchanged - 0 fixed = 55 total (was 51)
+1 💚 golang 1m 15s the patch passed
+1 💚 javac 1m 15s the patch passed
+1 💚 blanks 0m 1s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 29s /results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 3 unchanged - 0 fixed = 7 total (was 3)
+1 💚 mvnsite 0m 35s the patch passed
+1 💚 javadoc 0m 31s the patch passed
+1 💚 spotbugs 1m 27s the patch passed
+1 💚 shadedclient 38m 13s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 33s hadoop-yarn-server-nodemanager in the patch passed.
-1 ❌ asflicense 0m 37s /results-asflicense.txt The patch generated 2 ASF License warnings.
161m 28s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/1/artifact/out/Dockerfile
GITHUB PR #7430
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc golang
uname Linux d566c71d49d4 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0ed2817
Default Java Red Hat, Inc.-1.8.0_412-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/1/testReport/
Max. process+thread count 681 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/1/console
versions git=2.9.5 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 32m 6s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
-1 ❌ mvninstall 35m 47s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 1m 18s trunk passed
+1 💚 checkstyle 0m 40s trunk passed
+1 💚 mvnsite 0m 45s trunk passed
+1 💚 javadoc 0m 46s trunk passed
+1 💚 spotbugs 1m 26s trunk passed
+1 💚 shadedclient 33m 59s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 32s the patch passed
+1 💚 compile 1m 11s the patch passed
+1 💚 cc 1m 11s the patch passed
+1 💚 golang 1m 11s the patch passed
+1 💚 javac 1m 11s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 28s /results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 3 unchanged - 0 fixed = 7 total (was 3)
+1 💚 mvnsite 0m 36s the patch passed
+1 💚 javadoc 0m 32s the patch passed
+1 💚 spotbugs 1m 26s the patch passed
+1 💚 shadedclient 33m 24s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 30s hadoop-yarn-server-nodemanager in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
147m 26s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/3/artifact/out/Dockerfile
GITHUB PR #7430
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc golang
uname Linux 2617d1dbc8b1 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 68a8d8c
Default Java Red Hat, Inc.-1.8.0_412-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/3/testReport/
Max. process+thread count 555 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/3/console
versions git=2.9.5 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
-1 ❌ mvninstall 38m 23s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 1m 22s trunk passed
+1 💚 checkstyle 0m 38s trunk passed
+1 💚 mvnsite 0m 45s trunk passed
+1 💚 javadoc 0m 44s trunk passed
+1 💚 spotbugs 1m 29s trunk passed
+1 💚 shadedclient 36m 0s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 34s the patch passed
+1 💚 compile 1m 11s the patch passed
+1 💚 cc 1m 11s the patch passed
+1 💚 golang 1m 11s the patch passed
+1 💚 javac 1m 11s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 26s /results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 3 unchanged - 0 fixed = 7 total (was 3)
+1 💚 mvnsite 0m 37s the patch passed
+1 💚 javadoc 0m 32s the patch passed
+1 💚 spotbugs 1m 25s the patch passed
+1 💚 shadedclient 36m 39s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 35s hadoop-yarn-server-nodemanager in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
123m 58s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/4/artifact/out/Dockerfile
GITHUB PR #7430
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc golang
uname Linux 9381bf63476a 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 68a8d8c
Default Java Red Hat, Inc.-1.8.0_412-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/4/testReport/
Max. process+thread count 707 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/4/console
versions git=2.9.5 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
-1 ❌ mvninstall 42m 26s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 1m 23s trunk passed
+1 💚 checkstyle 0m 38s trunk passed
+1 💚 mvnsite 0m 46s trunk passed
+1 💚 javadoc 0m 45s trunk passed
+1 💚 spotbugs 1m 30s trunk passed
+1 💚 shadedclient 38m 11s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 33s the patch passed
+1 💚 compile 1m 15s the patch passed
+1 💚 cc 1m 15s the patch passed
+1 💚 golang 1m 15s the patch passed
+1 💚 javac 1m 15s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 29s /results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 3 unchanged - 0 fixed = 7 total (was 3)
+1 💚 mvnsite 0m 37s the patch passed
+1 💚 javadoc 0m 32s the patch passed
+1 💚 spotbugs 1m 30s the patch passed
+1 💚 shadedclient 40m 14s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 28m 18s hadoop-yarn-server-nodemanager in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
160m 12s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/5/artifact/out/Dockerfile
GITHUB PR #7430
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc golang
uname Linux b32895b0c877 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 34fc0eb
Default Java Red Hat, Inc.-1.8.0_412-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/5/testReport/
Max. process+thread count 559 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7430/5/console
versions git=2.9.5 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@laysfire
Author

@brumi1024 Hi, would you mind helping to review this patch?


2 participants