[Clang] Make `--lto-partitions` only default for HIP #133164

jhuber6 · 2025-03-26T21:07:31Z

Summary:
The default behavior for LTO on other targets does not specify the
number of LTO partitions. Recent changes made this default to 8 on
AMDGPU which had some issues with the libc project. The option to
disable this is HIP only so I think for now we should restrict this just
to HIP.

I'm definitely on board with getting some more parallelism here, but I
think it should probably be restricted to just offloading languages. The
new driver goes through the --target=amdgcn-amd-amdhsa for its output,
which means we'd need to forward the default somehow.

Summary: The default behavior for LTO on other targets does not specify the number of LTO partitions. Recent changes made this default to 8 on AMDGPU which had some issues with the `libc` project. The option to disable this is HIP only so I think for now we should restrict this just to HIP. I'm definitely on board with getting some more parallelism here, but I think it should probably be restricted to just offloading languages. The new driver goes through the `--target=amdgcn-amd-amdhsa` for its output, which means we'd need to forward the default somehow.

llvmbot · 2025-03-26T21:08:08Z

@llvm/pr-subscribers-libc
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: Joseph Huber (jhuber6)

Changes

Summary:
The default behavior for LTO on other targets does not specify the
number of LTO partitions. Recent changes made this default to 8 on
AMDGPU which had some issues with the libc project. The option to
disable this is HIP only so I think for now we should restrict this just
to HIP.

I'm definitely on board with getting some more parallelism here, but I
think it should probably be restricted to just offloading languages. The
new driver goes through the --target=amdgcn-amd-amdhsa for its output,
which means we'd need to forward the default somehow.

Full diff: https://github.com/llvm/llvm-project/pull/133164.diff

3 Files Affected:

(modified) clang/lib/Driver/ToolChains/AMDGPU.cpp (+8-14)
(modified) clang/test/Driver/amdgpu-toolchain.c (+1-15)
(modified) libc/startup/gpu/CMakeLists.txt (+2-8)

diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index e919f4e941f47..c754cae2f96b2 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -625,6 +625,7 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
     CmdArgs.push_back("-shared");
   }
 
+  llvm::errs() << JA.getOffloadingDeviceKind() << "\n";
   addLinkerCompressDebugSectionsOption(getToolChain(), Args, CmdArgs);
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
@@ -633,7 +634,7 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
     const bool ThinLTO = (C.getDriver().getLTOMode() == LTOK_Thin);
     addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0], ThinLTO);
 
-    if (!ThinLTO)
+    if (!ThinLTO && JA.getOffloadingDeviceKind() == Action::OFK_HIP)
       addFullLTOPartitionOption(C.getDriver(), Args, CmdArgs);
   } else if (Args.hasArg(options::OPT_mcpu_EQ)) {
     CmdArgs.push_back(Args.MakeArgString(
@@ -712,14 +713,12 @@ void amdgpu::getAMDGPUTargetFeatures(const Driver &D,
 }
 
 static unsigned getFullLTOPartitions(const Driver &D, const ArgList &Args) {
-  const Arg *A = Args.getLastArg(options::OPT_flto_partitions_EQ);
-  // In the absence of an option, use 8 as the default.
-  if (!A)
-    return 8;
   int Value = 0;
-  if (StringRef(A->getValue()).getAsInteger(10, Value) || (Value < 1)) {
+  StringRef A = Args.getLastArgValue(options::OPT_flto_partitions_EQ, "8");
+  if (A.getAsInteger(10, Value) || (Value < 1)) {
+    Arg *Arg = Args.getLastArg(options::OPT_flto_partitions_EQ);
     D.Diag(diag::err_drv_invalid_int_value)
-        << A->getAsString(Args) << A->getValue();
+        << Arg->getAsString(Args) << Arg->getValue();
     return 1;
   }
 
@@ -729,13 +728,8 @@ static unsigned getFullLTOPartitions(const Driver &D, const ArgList &Args) {
 void amdgpu::addFullLTOPartitionOption(const Driver &D,
                                        const llvm::opt::ArgList &Args,
                                        llvm::opt::ArgStringList &CmdArgs) {
-  // TODO: Should this be restricted to fgpu-rdc only ? Currently we'll
-  //       also do it for non gpu-rdc LTO
-
-  if (unsigned NumParts = getFullLTOPartitions(D, Args); NumParts > 1) {
-    CmdArgs.push_back(
-        Args.MakeArgString("--lto-partitions=" + Twine(NumParts)));
-  }
+  CmdArgs.push_back(Args.MakeArgString("--lto-partitions=" +
+                                       Twine(getFullLTOPartitions(D, Args))));
 }
 
 /// AMDGPU Toolchain
diff --git a/clang/test/Driver/amdgpu-toolchain.c b/clang/test/Driver/amdgpu-toolchain.c
index 6617108e59fcf..20656f1fadeb7 100644
--- a/clang/test/Driver/amdgpu-toolchain.c
+++ b/clang/test/Driver/amdgpu-toolchain.c
@@ -21,7 +21,7 @@
 // RUN: %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+:sramecc- -nogpulib \
 // RUN:   -L. -flto -fconvergent-functions %s 2>&1 | FileCheck -check-prefix=LTO %s
 // LTO: clang{{.*}} "-flto=full"{{.*}}"-fconvergent-functions"
-// LTO: ld.lld{{.*}}"-L."{{.*}}"-plugin-opt=mcpu=gfx90a"{{.*}}"--lto-partitions={{[0-9]+}}"{{.*}}"-plugin-opt=-mattr=-sramecc,+xnack"
+// LTO: ld.lld{{.*}}"-L."{{.*}}"-plugin-opt=mcpu=gfx90a"{{.*}}"-plugin-opt=-mattr=-sramecc,+xnack"
 
 // RUN: %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+:sramecc- -nogpulib \
 // RUN:   -L. -fconvergent-functions %s 2>&1 | FileCheck -check-prefix=MCPU %s
@@ -38,17 +38,3 @@
 // RUN: %clang -target amdgcn-amd-amdhsa -march=gfx90a -stdlib -startfiles \
 // RUN:   -nogpulib -nogpuinc -### %s 2>&1 | FileCheck -check-prefix=STARTUP %s
 // STARTUP: ld.lld{{.*}}"-lc" "-lm" "{{.*}}crt1.o"
-
-// Check --flto-partitions
-
-// RUN: %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib \
-// RUN:   -L. -flto --flto-partitions=42 %s 2>&1 | FileCheck -check-prefix=LTO_PARTS %s
-// LTO_PARTS: ld.lld{{.*}}"-L."{{.*}}"-plugin-opt=mcpu=gfx90a"{{.*}}"--lto-partitions=42"
-
-// RUN: not %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib \
-// RUN:   -L. -flto --flto-partitions=a %s 2>&1 | FileCheck -check-prefix=LTO_PARTS_INV0 %s
-// LTO_PARTS_INV0: clang: error: invalid integral value 'a' in '--flto-partitions=a'
-
-// RUN: not %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib \
-// RUN:   -L. -flto --flto-partitions=0 %s 2>&1 | FileCheck -check-prefix=LTO_PARTS_INV1 %s
-// LTO_PARTS_INV1: clang: error: invalid integral value '0' in '--flto-partitions=0'
diff --git a/libc/startup/gpu/CMakeLists.txt b/libc/startup/gpu/CMakeLists.txt
index a30e6f07a66a3..fa326ef46a9d1 100644
--- a/libc/startup/gpu/CMakeLists.txt
+++ b/libc/startup/gpu/CMakeLists.txt
@@ -33,14 +33,8 @@ function(add_startup_object name)
     set_target_properties(${fq_target_name}.exe PROPERTIES
       RUNTIME_OUTPUT_DIRECTORY ${LIBC_LIBRARY_DIR}
       RUNTIME_OUTPUT_NAME ${name}.o)
-    # FIXME: A bug in the AMDGPU LTO pass is incorrectly removing the kernels.
-    if(LIBC_TARGET_ARCHITECTURE_IS_NVPTX)
-      target_link_options(${fq_target_name}.exe PRIVATE
-                          "-r" "-nostdlib" "-flto" "-Wl,--lto-emit-llvm")
-    else()
-      target_link_options(${fq_target_name}.exe PRIVATE
-                          "-r" "-nostdlib" "-Wl,--lto-emit-llvm")
-    endif()
+    target_link_options(${fq_target_name}.exe PRIVATE
+                        "-r" "-nostdlib" "-flto" "-Wl,--lto-emit-llvm")
   endif()
 endfunction()

shiltian · 2025-03-26T21:38:21Z

libc/startup/gpu/CMakeLists.txt

-      target_link_options(${fq_target_name}.exe PRIVATE
-                          "-r" "-nostdlib" "-Wl,--lto-emit-llvm")
-    endif()
+    target_link_options(${fq_target_name}.exe PRIVATE


Related, this works around what this patch disables.

shiltian · 2025-03-26T21:38:49Z

clang/test/Driver/amdgpu-toolchain.c

-
-// RUN: not %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib \
-// RUN:   -L. -flto --flto-partitions=0 %s 2>&1 | FileCheck -check-prefix=LTO_PARTS_INV1 %s
-// LTO_PARTS_INV1: clang: error: invalid integral value '0' in '--flto-partitions=0'


can you move these checks to a hip file then?

clang/test/Driver/amdgpu-toolchain.c

jhuber6 · 2025-03-26T22:08:36Z

I'll clean up the option later as well. --flto with double dashes is wrong.

…lvm#133164) Summary: The default behavior for LTO on other targets does not specify the number of LTO partitions. Recent changes made this default to 8 on AMDGPU which had some issues with the `libc` project. The option to disable this is HIP only so I think for now we should restrict this just to HIP. I'm definitely on board with getting some more parallelism here, but I think it should probably be restricted to just offloading languages. The new driver goes through the `--target=amdgcn-amd-amdhsa` for its output, which means we'd need to forward the default somehow.

jhuber6 requested review from arsenm, Pierre-vh, scchan, shiltian and yxsamliu March 26, 2025 21:07

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' libc labels Mar 26, 2025

shiltian reviewed Mar 26, 2025

View reviewed changes

test

05c09d3

shiltian reviewed Mar 26, 2025

View reviewed changes

clang/test/Driver/amdgpu-toolchain.c Show resolved Hide resolved

shiltian approved these changes Mar 26, 2025

View reviewed changes

update

56552f1

jhuber6 force-pushed the Partitions branch from 00abe86 to 56552f1 Compare March 26, 2025 23:40

Pierre-vh approved these changes Mar 27, 2025

View reviewed changes

jhuber6 merged commit d0aa1f9 into llvm:main Mar 27, 2025
17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Clang] Make `--lto-partitions` only default for HIP #133164

[Clang] Make `--lto-partitions` only default for HIP #133164

Uh oh!

jhuber6 commented Mar 26, 2025

Uh oh!

llvmbot commented Mar 26, 2025 •

edited

Loading

Uh oh!

shiltian Mar 26, 2025

Uh oh!

jhuber6 Mar 26, 2025

Uh oh!

shiltian Mar 26, 2025

Uh oh!

Uh oh!

jhuber6 commented Mar 26, 2025

Uh oh!

Uh oh!

Uh oh!

[Clang] Make --lto-partitions only default for HIP #133164

[Clang] Make --lto-partitions only default for HIP #133164

Uh oh!

Conversation

jhuber6 commented Mar 26, 2025

Uh oh!

llvmbot commented Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shiltian Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

jhuber6 Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

shiltian Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jhuber6 commented Mar 26, 2025

Uh oh!

Uh oh!

Uh oh!

[Clang] Make `--lto-partitions` only default for HIP #133164

[Clang] Make `--lto-partitions` only default for HIP #133164

llvmbot commented Mar 26, 2025 •

edited

Loading