Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Clang] Make --lto-partitions only default for HIP #133164

Merged
merged 3 commits into from
Mar 27, 2025
Merged

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Mar 26, 2025

Summary:
The default behavior for LTO on other targets does not specify the
number of LTO partitions. Recent changes made this default to 8 on
AMDGPU which had some issues with the libc project. The option to
disable this is HIP only so I think for now we should restrict this just
to HIP.

I'm definitely on board with getting some more parallelism here, but I
think it should probably be restricted to just offloading languages. The
new driver goes through the --target=amdgcn-amd-amdhsa for its output,
which means we'd need to forward the default somehow.

Summary:
The default behavior for LTO on other targets does not specify the
number of LTO partitions. Recent changes made this default to 8 on
AMDGPU which had some issues with the `libc` project. The option to
disable this is HIP only so I think for now we should restrict this just
to HIP.

I'm definitely on board with getting some more parallelism here, but I
think it should probably be restricted to just offloading languages. The
new driver goes through the `--target=amdgcn-amd-amdhsa` for its output,
which means we'd need to forward the default somehow.
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' libc labels Mar 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 26, 2025

@llvm/pr-subscribers-libc
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: Joseph Huber (jhuber6)

Changes

Summary:
The default behavior for LTO on other targets does not specify the
number of LTO partitions. Recent changes made this default to 8 on
AMDGPU which had some issues with the libc project. The option to
disable this is HIP only so I think for now we should restrict this just
to HIP.

I'm definitely on board with getting some more parallelism here, but I
think it should probably be restricted to just offloading languages. The
new driver goes through the --target=amdgcn-amd-amdhsa for its output,
which means we'd need to forward the default somehow.


Full diff: https://github.com/llvm/llvm-project/pull/133164.diff

3 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/AMDGPU.cpp (+8-14)
  • (modified) clang/test/Driver/amdgpu-toolchain.c (+1-15)
  • (modified) libc/startup/gpu/CMakeLists.txt (+2-8)
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index e919f4e941f47..c754cae2f96b2 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -625,6 +625,7 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
     CmdArgs.push_back("-shared");
   }
 
+  llvm::errs() << JA.getOffloadingDeviceKind() << "\n";
   addLinkerCompressDebugSectionsOption(getToolChain(), Args, CmdArgs);
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
@@ -633,7 +634,7 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
     const bool ThinLTO = (C.getDriver().getLTOMode() == LTOK_Thin);
     addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0], ThinLTO);
 
-    if (!ThinLTO)
+    if (!ThinLTO && JA.getOffloadingDeviceKind() == Action::OFK_HIP)
       addFullLTOPartitionOption(C.getDriver(), Args, CmdArgs);
   } else if (Args.hasArg(options::OPT_mcpu_EQ)) {
     CmdArgs.push_back(Args.MakeArgString(
@@ -712,14 +713,12 @@ void amdgpu::getAMDGPUTargetFeatures(const Driver &D,
 }
 
 static unsigned getFullLTOPartitions(const Driver &D, const ArgList &Args) {
-  const Arg *A = Args.getLastArg(options::OPT_flto_partitions_EQ);
-  // In the absence of an option, use 8 as the default.
-  if (!A)
-    return 8;
   int Value = 0;
-  if (StringRef(A->getValue()).getAsInteger(10, Value) || (Value < 1)) {
+  StringRef A = Args.getLastArgValue(options::OPT_flto_partitions_EQ, "8");
+  if (A.getAsInteger(10, Value) || (Value < 1)) {
+    Arg *Arg = Args.getLastArg(options::OPT_flto_partitions_EQ);
     D.Diag(diag::err_drv_invalid_int_value)
-        << A->getAsString(Args) << A->getValue();
+        << Arg->getAsString(Args) << Arg->getValue();
     return 1;
   }
 
@@ -729,13 +728,8 @@ static unsigned getFullLTOPartitions(const Driver &D, const ArgList &Args) {
 void amdgpu::addFullLTOPartitionOption(const Driver &D,
                                        const llvm::opt::ArgList &Args,
                                        llvm::opt::ArgStringList &CmdArgs) {
-  // TODO: Should this be restricted to fgpu-rdc only ? Currently we'll
-  //       also do it for non gpu-rdc LTO
-
-  if (unsigned NumParts = getFullLTOPartitions(D, Args); NumParts > 1) {
-    CmdArgs.push_back(
-        Args.MakeArgString("--lto-partitions=" + Twine(NumParts)));
-  }
+  CmdArgs.push_back(Args.MakeArgString("--lto-partitions=" +
+                                       Twine(getFullLTOPartitions(D, Args))));
 }
 
 /// AMDGPU Toolchain
diff --git a/clang/test/Driver/amdgpu-toolchain.c b/clang/test/Driver/amdgpu-toolchain.c
index 6617108e59fcf..20656f1fadeb7 100644
--- a/clang/test/Driver/amdgpu-toolchain.c
+++ b/clang/test/Driver/amdgpu-toolchain.c
@@ -21,7 +21,7 @@
 // RUN: %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+:sramecc- -nogpulib \
 // RUN:   -L. -flto -fconvergent-functions %s 2>&1 | FileCheck -check-prefix=LTO %s
 // LTO: clang{{.*}} "-flto=full"{{.*}}"-fconvergent-functions"
-// LTO: ld.lld{{.*}}"-L."{{.*}}"-plugin-opt=mcpu=gfx90a"{{.*}}"--lto-partitions={{[0-9]+}}"{{.*}}"-plugin-opt=-mattr=-sramecc,+xnack"
+// LTO: ld.lld{{.*}}"-L."{{.*}}"-plugin-opt=mcpu=gfx90a"{{.*}}"-plugin-opt=-mattr=-sramecc,+xnack"
 
 // RUN: %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+:sramecc- -nogpulib \
 // RUN:   -L. -fconvergent-functions %s 2>&1 | FileCheck -check-prefix=MCPU %s
@@ -38,17 +38,3 @@
 // RUN: %clang -target amdgcn-amd-amdhsa -march=gfx90a -stdlib -startfiles \
 // RUN:   -nogpulib -nogpuinc -### %s 2>&1 | FileCheck -check-prefix=STARTUP %s
 // STARTUP: ld.lld{{.*}}"-lc" "-lm" "{{.*}}crt1.o"
-
-// Check --flto-partitions
-
-// RUN: %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib \
-// RUN:   -L. -flto --flto-partitions=42 %s 2>&1 | FileCheck -check-prefix=LTO_PARTS %s
-// LTO_PARTS: ld.lld{{.*}}"-L."{{.*}}"-plugin-opt=mcpu=gfx90a"{{.*}}"--lto-partitions=42"
-
-// RUN: not %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib \
-// RUN:   -L. -flto --flto-partitions=a %s 2>&1 | FileCheck -check-prefix=LTO_PARTS_INV0 %s
-// LTO_PARTS_INV0: clang: error: invalid integral value 'a' in '--flto-partitions=a'
-
-// RUN: not %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib \
-// RUN:   -L. -flto --flto-partitions=0 %s 2>&1 | FileCheck -check-prefix=LTO_PARTS_INV1 %s
-// LTO_PARTS_INV1: clang: error: invalid integral value '0' in '--flto-partitions=0'
diff --git a/libc/startup/gpu/CMakeLists.txt b/libc/startup/gpu/CMakeLists.txt
index a30e6f07a66a3..fa326ef46a9d1 100644
--- a/libc/startup/gpu/CMakeLists.txt
+++ b/libc/startup/gpu/CMakeLists.txt
@@ -33,14 +33,8 @@ function(add_startup_object name)
     set_target_properties(${fq_target_name}.exe PROPERTIES
       RUNTIME_OUTPUT_DIRECTORY ${LIBC_LIBRARY_DIR}
       RUNTIME_OUTPUT_NAME ${name}.o)
-    # FIXME: A bug in the AMDGPU LTO pass is incorrectly removing the kernels.
-    if(LIBC_TARGET_ARCHITECTURE_IS_NVPTX)
-      target_link_options(${fq_target_name}.exe PRIVATE
-                          "-r" "-nostdlib" "-flto" "-Wl,--lto-emit-llvm")
-    else()
-      target_link_options(${fq_target_name}.exe PRIVATE
-                          "-r" "-nostdlib" "-Wl,--lto-emit-llvm")
-    endif()
+    target_link_options(${fq_target_name}.exe PRIVATE
+                        "-r" "-nostdlib" "-flto" "-Wl,--lto-emit-llvm")
   endif()
 endfunction()
 

target_link_options(${fq_target_name}.exe PRIVATE
"-r" "-nostdlib" "-Wl,--lto-emit-llvm")
endif()
target_link_options(${fq_target_name}.exe PRIVATE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related, this works around what this patch disables.


// RUN: not %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib \
// RUN: -L. -flto --flto-partitions=0 %s 2>&1 | FileCheck -check-prefix=LTO_PARTS_INV1 %s
// LTO_PARTS_INV1: clang: error: invalid integral value '0' in '--flto-partitions=0'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move these checks to a hip file then?

@jhuber6
Copy link
Contributor Author

jhuber6 commented Mar 26, 2025

I'll clean up the option later as well. --flto with double dashes is wrong.

@jhuber6 jhuber6 merged commit d0aa1f9 into llvm:main Mar 27, 2025
17 checks passed
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 10, 2025
…lvm#133164)

Summary:
The default behavior for LTO on other targets does not specify the
number of LTO partitions. Recent changes made this default to 8 on
AMDGPU which had some issues with the `libc` project. The option to
disable this is HIP only so I think for now we should restrict this just
to HIP.

I'm definitely on board with getting some more parallelism here, but I
think it should probably be restricted to just offloading languages. The
new driver goes through the `--target=amdgcn-amd-amdhsa` for its output,
which means we'd need to forward the default somehow.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category libc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants