[ItaniumDemangle] Add customizable printLeft/printRight APIs to OutputBuffer #133249

Open
wants to merge 4 commits into
base: main

Michael137
Member

@Michael137 Michael137 commented Mar 27, 2025

This patch includes the necessary changes for the LLDB feature proposed in https://discourse.llvm.org/t/rfc-lldb-highlighting-function-names-in-lldb-backtraces/85309. The TL;DR is that we want to track where certain parts of a demangled name begin/end so we can highlight them in backtraces.

We introduce a new printLeft/printRight API on OutputBuffer that a client (in our case LLDB) can implement to track state while printing the demangle tree. This requires redirecting all calls to printLeft/printRight to the OutputBuffer. One quirk with the new API is that Utility.h would now depend on ItaniumDemangle.h and vice versa. To keep these files header-only, I made the definitions inline and implemented the new APIs in ItaniumDemangle.h (so the definition of Node is available to them).
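
To illustrate the redirection and the header-only workaround described above, here is a minimal sketch. It assumes heavily simplified stand-ins: `Node`, `OutputBuffer`, `NameNode`, `PointerNode` and `CountingBuffer` below are illustrative types, not the real ItaniumDemangle classes, and the buffer is a plain `std::string`. The point is only the shape: the buffer declares the virtual hooks while `Node` is still incomplete, and defines them inline once `Node` is known.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

struct Node; // forward declaration: the buffer only needs the name here

// Hypothetical, simplified stand-in for the demangler's output stream.
class OutputBuffer {
  std::string Buf;

public:
  virtual ~OutputBuffer() = default;

  // Customization points: a client may override these to observe printing.
  virtual void printLeft(const Node &N);
  virtual void printRight(const Node &N);

  OutputBuffer &operator+=(std::string_view S) {
    Buf += S;
    return *this;
  }
  std::size_t getCurrentPosition() const { return Buf.size(); }
  const std::string &str() const { return Buf; }
};

struct Node {
  virtual ~Node() = default;
  // Entry point used by callers: always goes through the buffer.
  void print(OutputBuffer &OB) const {
    OB.printLeft(*this);
    OB.printRight(*this);
  }
  virtual void printLeft(OutputBuffer &OB) const = 0;
  virtual void printRight(OutputBuffer &) const {}
};

// Defined inline once Node is complete, so everything stays header-only.
inline void OutputBuffer::printLeft(const Node &N) { N.printLeft(*this); }
inline void OutputBuffer::printRight(const Node &N) { N.printRight(*this); }

struct NameNode final : Node {
  std::string_view Name;
  explicit NameNode(std::string_view N) : Name(N) {}
  void printLeft(OutputBuffer &OB) const override { OB += Name; }
};

struct PointerNode final : Node {
  const Node *Pointee;
  explicit PointerNode(const Node *P) : Pointee(P) {}
  void printLeft(OutputBuffer &OB) const override {
    OB.printLeft(*Pointee); // redirected: not Pointee->printLeft(OB)
    OB += "*";
  }
  void printRight(OutputBuffer &OB) const override { OB.printRight(*Pointee); }
};

// Example client buffer: counts how many printLeft calls pass through it.
struct CountingBuffer final : OutputBuffer {
  unsigned LeftCalls = 0;
  void printLeft(const Node &N) override {
    ++LeftCalls;
    OutputBuffer::printLeft(N);
  }
};

// Prints "int*" through a CountingBuffer and reports the observed calls.
inline unsigned countLeftCalls(std::string &Out) {
  NameNode Int("int");
  PointerNode Ptr(&Int);
  CountingBuffer OB;
  Ptr.print(OB);
  Out = OB.str();
  return OB.LeftCalls;
}
```

Because every nested print goes through the buffer, the client sees both the outer pointer node and the inner name node without any node knowing about the client.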

@Michael137 Michael137 requested a review from a team as a code owner March 27, 2025 13:21
@llvmbot llvmbot added the libc++abi libc++abi C++ Runtime Library. Not libc++. label Mar 27, 2025
@llvmbot
Member

llvmbot commented Mar 27, 2025

@llvm/pr-subscribers-lldb

@llvm/pr-subscribers-libcxxabi

Author: Michael Buch (Michael137)

Changes

This patch includes the necessary changes for the LLDB feature proposed in https://discourse.llvm.org/t/rfc-lldb-highlighting-function-names-in-lldb-backtraces/85309. The TL;DR is that we want to track where certain parts of a demangled name begin/end so we can highlight them in backtraces.

The idea is that a function name can be decomposed into <scope, base, arguments>. The assumption is that, given the ranges of those three elements and the demangled name, LLDB will be able to reconstruct the full demangled name. Tracking those ranges inside the demangler is pretty simple. We don’t ever deal with nesting, so whenever we recurse into a template argument list or another function type, we just stop tracking any positions. Once we have recursed out of those and are back to printing the top-level function name, we continue tracking the positions.
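
As a rough illustration of how a consumer could use such ranges, the sketch below slices a demangled string by three half-open `[start, end)` offsets and wraps the basename in markers. The `NameRanges` struct, the `slice` and `highlightBasename` helpers, and the bracket markers (standing in for terminal colour codes) are all assumptions made for this example; they are not part of the patch or of LLDB.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical container for the three tracked ranges; each pair is a
// half-open [start, end) offset into the demangled string.
struct NameRanges {
  std::pair<std::size_t, std::size_t> Scope;
  std::pair<std::size_t, std::size_t> Basename;
  std::pair<std::size_t, std::size_t> Arguments;
};

// Returns the part of S covered by the half-open range R.
inline std::string_view slice(std::string_view S,
                              std::pair<std::size_t, std::size_t> R) {
  return S.substr(R.first, R.second - R.first);
}

// Wraps the basename in [ ] and copies everything else through unchanged.
inline std::string highlightBasename(std::string_view Demangled,
                                     const NameRanges &R) {
  std::string Out;
  Out += Demangled.substr(0, R.Basename.first); // return type and scope
  Out += "[";
  Out += slice(Demangled, R.Basename);
  Out += "]";
  Out += Demangled.substr(R.Basename.second); // template args, params, ...
  return Out;
}
```

For the string `ns::Foo::bar(int, char)` with scope `[0, 9)`, basename `[9, 12)` and arguments `[12, 23)`, `highlightBasename` yields `ns::Foo::[bar](int, char)`.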

The current implementation introduces a new structure FunctionNameInfo that holds all this information and is stored in the llvm::itanium_demangle::OutputBuffer class, which is unfortunately the only way to keep state while printing the demangle tree (it already contains other kinds of information similar to this tracking). In [RFC][ItaniumDemangler] New option to print compact C++ names we propose to refactor this, but that shouldn’t be a blocker unless people feel otherwise.
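
The "stop tracking while nested" behaviour relies on a depth counter kept on the buffer. The sketch below reuses the shape of the `ScopedOverride` helper and the `enterFunctionTypePrinting` / `isPrintingTopLevelFunctionType` names from the patch, but `DepthTracker` and `innerIsTopLevel` are hypothetical stand-ins written for this example. It assumes C++17, where returning a non-copyable guard from a braced list is allowed.

```cpp
#include <cassert>
#include <utility>

// Same shape as the ScopedOverride helper in Utility.h: installs a new
// value and restores the original when the guard goes out of scope.
template <class T> class ScopedOverride {
  T &Loc;
  T Original;

public:
  ScopedOverride(T &Loc_, T NewVal) : Loc(Loc_), Original(Loc_) {
    Loc_ = std::move(NewVal);
  }
  ~ScopedOverride() { Loc = std::move(Original); }

  ScopedOverride(const ScopedOverride &) = delete;
  ScopedOverride &operator=(const ScopedOverride &) = delete;
};

// Hypothetical stand-in for the state the patch keeps on OutputBuffer.
struct DepthTracker {
  unsigned FunctionPrintingDepth = 0;

  [[nodiscard]] ScopedOverride<unsigned> enterFunctionTypePrinting() {
    return {FunctionPrintingDepth, FunctionPrintingDepth + 1};
  }
  bool isPrintingTopLevelFunctionType() const {
    return FunctionPrintingDepth == 1;
  }
};

// Simulates printing a function whose parameter is itself a function type.
// Returns whether the inner level counts as top-level (it should not).
inline bool innerIsTopLevel(DepthTracker &D, bool &OuterIsTopLevel) {
  auto Outer = D.enterFunctionTypePrinting(); // depth 0 -> 1
  OuterIsTopLevel = D.isPrintingTopLevelFunctionType();
  auto Inner = D.enterFunctionTypePrinting(); // depth 1 -> 2
  return D.isPrintingTopLevelFunctionType();
}
```

The guards unwind in reverse order, so the depth returns to 0 once both levels have finished printing.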

I added the tracking implementation to a new Utility.cpp, so I had to update the sync script. Currently libcxxabi fails to link because I haven't figured out how to build/link this new object file. If someone has any ideas, that'd be appreciated. Or if we prefer to keep this header-only, I'm happy to do that too.

Tests are in ItaniumDemangleTest.cpp.


Patch is 30.26 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133249.diff

10 Files Affected:

  • (modified) libcxxabi/src/demangle/ItaniumDemangle.h (+21)
  • (added) libcxxabi/src/demangle/Utility.cpp (+112)
  • (modified) libcxxabi/src/demangle/Utility.h (+75-16)
  • (modified) libcxxabi/src/demangle/cp-to-llvm.sh (+45-17)
  • (modified) llvm/include/llvm/Demangle/ItaniumDemangle.h (+21)
  • (modified) llvm/include/llvm/Demangle/Utility.h (+75-16)
  • (modified) llvm/lib/Demangle/CMakeLists.txt (+1)
  • (added) llvm/lib/Demangle/README.txt (+61)
  • (added) llvm/lib/Demangle/Utility.cpp (+112)
  • (modified) llvm/unittests/Demangle/ItaniumDemangleTest.cpp (+112)
diff --git a/libcxxabi/src/demangle/ItaniumDemangle.h b/libcxxabi/src/demangle/ItaniumDemangle.h
index 3df41b5f4d7d0..b5a0a86b119f4 100644
--- a/libcxxabi/src/demangle/ItaniumDemangle.h
+++ b/libcxxabi/src/demangle/ItaniumDemangle.h
@@ -851,11 +851,13 @@ class FunctionType final : public Node {
   // by printing out the return types's left, then print our parameters, then
   // finally print right of the return type.
   void printLeft(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
     Ret->printLeft(OB);
     OB += " ";
   }
 
   void printRight(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
     OB.printOpen();
     Params.printWithComma(OB);
     OB.printClose();
@@ -971,18 +973,32 @@ class FunctionEncoding final : public Node {
   const Node *getName() const { return Name; }
 
   void printLeft(OutputBuffer &OB) const override {
+    // Nested FunctionEncoding parsing can happen with following productions:
+    // * <local-name>
+    // * <expr-primary>
+    auto Scoped = OB.enterFunctionTypePrinting();
+
     if (Ret) {
       Ret->printLeft(OB);
       if (!Ret->hasRHSComponent(OB))
         OB += " ";
     }
+
+    OB.FunctionInfo.updateScopeStart(OB);
+
     Name->print(OB);
   }
 
   void printRight(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
+    OB.FunctionInfo.finalizeStart(OB);
+
     OB.printOpen();
     Params.printWithComma(OB);
     OB.printClose();
+
+    OB.FunctionInfo.finalizeArgumentEnd(OB);
+
     if (Ret)
       Ret->printRight(OB);
 
@@ -1005,6 +1021,8 @@ class FunctionEncoding final : public Node {
       OB += " requires ";
       Requires->print(OB);
     }
+
+    OB.FunctionInfo.finalizeEnd(OB);
   }
 };
 
@@ -1072,7 +1090,9 @@ struct NestedName : Node {
   void printLeft(OutputBuffer &OB) const override {
     Qual->print(OB);
     OB += "::";
+    OB.FunctionInfo.updateScopeEnd(OB);
     Name->print(OB);
+    OB.FunctionInfo.updateBasenameEnd(OB);
   }
 };
 
@@ -1633,6 +1653,7 @@ struct NameWithTemplateArgs : Node {
 
   void printLeft(OutputBuffer &OB) const override {
     Name->print(OB);
+    OB.FunctionInfo.updateBasenameEnd(OB);
     TemplateArgs->print(OB);
   }
 };
diff --git a/libcxxabi/src/demangle/Utility.cpp b/libcxxabi/src/demangle/Utility.cpp
new file mode 100644
index 0000000000000..04516082b3443
--- /dev/null
+++ b/libcxxabi/src/demangle/Utility.cpp
@@ -0,0 +1,112 @@
+//===--- Utility.cpp ------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Provide some utility classes for use in the demangler.
+// There are two copies of this file in the source tree.  The one in libcxxabi
+// is the original and the one in llvm is the copy.  Use cp-to-llvm.sh to update
+// the copy.  See README.txt for more details.
+//
+//===----------------------------------------------------------------------===//
+
+#include "Utility.h"
+#include "DemangleConfig.h"
+
+DEMANGLE_NAMESPACE_BEGIN
+
+bool FunctionNameInfo::startedPrintingArguments() const {
+  return ArgumentLocs.first > 0;
+}
+
+bool FunctionNameInfo::shouldTrack(OutputBuffer &OB) const {
+  if (!OB.isPrintingTopLevelFunctionType())
+    return false;
+
+  if (OB.isGtInsideTemplateArgs())
+    return false;
+
+  if (startedPrintingArguments())
+    return false;
+
+  return true;
+}
+
+bool FunctionNameInfo::canFinalize(OutputBuffer &OB) const {
+  if (!OB.isPrintingTopLevelFunctionType())
+    return false;
+
+  if (OB.isGtInsideTemplateArgs())
+    return false;
+
+  if (!startedPrintingArguments())
+    return false;
+
+  return true;
+}
+
+void FunctionNameInfo::updateBasenameEnd(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  BasenameLocs.second = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::updateScopeStart(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  ScopeLocs.first = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::updateScopeEnd(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  ScopeLocs.second = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::finalizeArgumentEnd(OutputBuffer &OB) {
+  if (!canFinalize(OB))
+    return;
+
+  OB.FunctionInfo.ArgumentLocs.second = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::finalizeStart(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  OB.FunctionInfo.ArgumentLocs.first = OB.getCurrentPosition();
+
+  // If nothing has set the end of the basename yet (for example when
+  // printing templates), then the beginning of the arguments is the end of
+  // the basename.
+  if (BasenameLocs.second == 0)
+    OB.FunctionInfo.BasenameLocs.second = OB.getCurrentPosition();
+
+  DEMANGLE_ASSERT(!shouldTrack(OB), "");
+  DEMANGLE_ASSERT(canFinalize(OB), "");
+}
+
+void FunctionNameInfo::finalizeEnd(OutputBuffer &OB) {
+  if (!canFinalize(OB))
+    return;
+
+  if (ScopeLocs.first > OB.FunctionInfo.ScopeLocs.second)
+    ScopeLocs.second = OB.FunctionInfo.ScopeLocs.first;
+  BasenameLocs.first = OB.FunctionInfo.ScopeLocs.second;
+}
+
+bool FunctionNameInfo::hasBasename() const {
+  return BasenameLocs.first != BasenameLocs.second && BasenameLocs.second > 0;
+}
+
+ScopedOverride<unsigned> OutputBuffer::enterFunctionTypePrinting() {
+  return {FunctionPrintingDepth, FunctionPrintingDepth + 1};
+}
+
+DEMANGLE_NAMESPACE_END
diff --git a/libcxxabi/src/demangle/Utility.h b/libcxxabi/src/demangle/Utility.h
index f1fad35d60d98..3b9ff8ea1f82b 100644
--- a/libcxxabi/src/demangle/Utility.h
+++ b/libcxxabi/src/demangle/Utility.h
@@ -27,6 +27,66 @@
 
 DEMANGLE_NAMESPACE_BEGIN
 
+template <class T> class ScopedOverride {
+  T &Loc;
+  T Original;
+
+public:
+  ScopedOverride(T &Loc_) : ScopedOverride(Loc_, Loc_) {}
+
+  ScopedOverride(T &Loc_, T NewVal) : Loc(Loc_), Original(Loc_) {
+    Loc_ = std::move(NewVal);
+  }
+  ~ScopedOverride() { Loc = std::move(Original); }
+
+  ScopedOverride(const ScopedOverride &) = delete;
+  ScopedOverride &operator=(const ScopedOverride &) = delete;
+};
+
+class OutputBuffer;
+
+// Stores information about parts of a demangled function name.
+struct FunctionNameInfo {
+  /// A [start, end) pair for the function basename.
+  /// The basename is the name without scope qualifiers
+  /// and without template parameters. E.g.,
+  /// \code{.cpp}
+  ///    void foo::bar<int>::someFunc<float>(int) const &&
+  ///                        ^       ^
+  ///                      Start    End
+  /// \endcode
+  std::pair<size_t, size_t> BasenameLocs;
+
+  /// A [start, end) pair for the function scope qualifiers.
+  /// E.g., for
+  /// \code{.cpp}
+  ///    void foo::bar<int>::qux<float>(int) const &&
+  ///         ^              ^
+  ///       Start           End
+  /// \endcode
+  std::pair<size_t, size_t> ScopeLocs;
+
+  /// Indicates the [start, end) of the function argument list.
+  /// E.g.,
+  /// \code{.cpp}
+  ///    int (*getFunc<float>(float, double))(int, int)
+  ///                        ^              ^
+  ///                      start           end
+  /// \endcode
+  std::pair<size_t, size_t> ArgumentLocs;
+
+  bool startedPrintingArguments() const;
+  bool shouldTrack(OutputBuffer &OB) const;
+  bool canFinalize(OutputBuffer &OB) const;
+  void updateBasenameEnd(OutputBuffer &OB);
+  void updateScopeStart(OutputBuffer &OB);
+  void updateScopeEnd(OutputBuffer &OB);
+  void finalizeArgumentEnd(OutputBuffer &OB);
+  void finalizeStart(OutputBuffer &OB);
+  void finalizeEnd(OutputBuffer &OB);
+  bool hasBasename() const;
+};
+
 // Stream that AST nodes write their string representation into after the AST
 // has been parsed.
 class OutputBuffer {
@@ -34,6 +94,10 @@ class OutputBuffer {
   size_t CurrentPosition = 0;
   size_t BufferCapacity = 0;
 
+  /// When a function type is being printed this value is incremented.
+  /// When printing of the type is finished the value is decremented.
+  unsigned FunctionPrintingDepth = 0;
+
   // Ensure there are at least N more positions in the buffer.
   void grow(size_t N) {
     size_t Need = N + CurrentPosition;
@@ -92,8 +156,19 @@ class OutputBuffer {
   /// Use a counter so we can simply increment inside parentheses.
   unsigned GtIsGt = 1;
 
+  /// When printing the mangle tree, this object will hold information about
+  /// the function name being printed (if any).
+  FunctionNameInfo FunctionInfo;
+
+  /// Called when we start printing a function type.
+  [[nodiscard]] ScopedOverride<unsigned> enterFunctionTypePrinting();
+
   bool isGtInsideTemplateArgs() const { return GtIsGt == 0; }
 
+  bool isPrintingTopLevelFunctionType() const {
+    return FunctionPrintingDepth == 1;
+  }
+
   void printOpen(char Open = '(') {
     GtIsGt++;
     *this += Open;
@@ -182,22 +257,6 @@ class OutputBuffer {
   size_t getBufferCapacity() const { return BufferCapacity; }
 };
 
-template <class T> class ScopedOverride {
-  T &Loc;
-  T Original;
-
-public:
-  ScopedOverride(T &Loc_) : ScopedOverride(Loc_, Loc_) {}
-
-  ScopedOverride(T &Loc_, T NewVal) : Loc(Loc_), Original(Loc_) {
-    Loc_ = std::move(NewVal);
-  }
-  ~ScopedOverride() { Loc = std::move(Original); }
-
-  ScopedOverride(const ScopedOverride &) = delete;
-  ScopedOverride &operator=(const ScopedOverride &) = delete;
-};
-
 DEMANGLE_NAMESPACE_END
 
 #endif
diff --git a/libcxxabi/src/demangle/cp-to-llvm.sh b/libcxxabi/src/demangle/cp-to-llvm.sh
index f8b3585a5fa37..4d76a1e110687 100755
--- a/libcxxabi/src/demangle/cp-to-llvm.sh
+++ b/libcxxabi/src/demangle/cp-to-llvm.sh
@@ -7,30 +7,58 @@ set -e
 
 cd $(dirname $0)
 HDRS="ItaniumDemangle.h ItaniumNodes.def StringViewExtras.h Utility.h"
-LLVM_DEMANGLE_DIR=$1
+SRCS="Utility.cpp"
+LLVM_DEMANGLE_INCLUDE_DIR=$1
+LLVM_DEMANGLE_SOURCE_DIR=$2
 
-if [[ -z "$LLVM_DEMANGLE_DIR" ]]; then
-    LLVM_DEMANGLE_DIR="../../../llvm/include/llvm/Demangle"
+if [[ -z "$LLVM_DEMANGLE_INCLUDE_DIR" ]]; then
+    LLVM_DEMANGLE_INCLUDE_DIR="../../../llvm/include/llvm/Demangle"
 fi
 
-if [[ ! -d "$LLVM_DEMANGLE_DIR" ]]; then
-    echo "No such directory: $LLVM_DEMANGLE_DIR" >&2
+if [[ -z "$LLVM_DEMANGLE_SOURCE_DIR" ]]; then
+    LLVM_DEMANGLE_SOURCE_DIR="../../../llvm/lib/Demangle"
+fi
+
+if [[ ! -d "$LLVM_DEMANGLE_INCLUDE_DIR" ]]; then
+    echo "No such directory: $LLVM_DEMANGLE_INCLUDE_DIR" >&2
+    exit 1
+fi
+
+if [[ ! -d "$LLVM_DEMANGLE_SOURCE_DIR" ]]; then
+    echo "No such directory: $LLVM_DEMANGLE_SOURCE_DIR" >&2
     exit 1
 fi
 
-read -p "This will overwrite the copies of $HDRS in $LLVM_DEMANGLE_DIR; are you sure? [y/N]" -n 1 -r ANSWER
+read -p "This will overwrite the copies of $HDRS in $LLVM_DEMANGLE_INCLUDE_DIR and $SRCS in $LLVM_DEMANGLE_SOURCE_DIR; are you sure? [y/N]" -n 1 -r ANSWER
 echo
 
-if [[ $ANSWER =~ ^[Yy]$ ]]; then
-    cp -f README.txt $LLVM_DEMANGLE_DIR
-    chmod -w $LLVM_DEMANGLE_DIR/README.txt
-    for I in $HDRS ; do
-	rm -f $LLVM_DEMANGLE_DIR/$I
-	dash=$(echo "$I---------------------------" | cut -c -27 |\
-		   sed 's|[^-]*||')
-	sed -e '1s|^//=*-* .*\..* -*.*=*// *$|//===--- '"$I $dash"'-*- mode:c++;eval:(read-only-mode) -*-===//|' \
-	    -e '2s|^// *$|//       Do not edit! See README.txt.|' \
-	    $I >$LLVM_DEMANGLE_DIR/$I
-	chmod -w $LLVM_DEMANGLE_DIR/$I
+function copy_files() {
+    local dest_dir=$1
+    local files=$2
+    local adjust_include_paths=$3
+
+    cp -f README.txt $dest_dir
+    chmod -w $dest_dir/README.txt
+    for I in $files ; do
+    rm -f $dest_dir/$I
+    dash=$(echo "$I---------------------------" | cut -c -27 |\
+    	   sed 's|[^-]*||')
+    sed -e '1s|^//=*-* .*\..* -*.*=*// *$|//===--- '"$I $dash"'-*- mode:c++;eval:(read-only-mode) -*-===//|' \
+        -e '2s|^// *$|//       Do not edit! See README.txt.|' \
+        $I >$dest_dir/$I
+
+    if [[ "$adjust_include_paths" = true ]]; then
+        sed -i '' \
+            -e 's|#include "DemangleConfig.h"|#include "llvm/Demangle/DemangleConfig.h"|' \
+            -e 's|#include "Utility.h"|#include "llvm/Demangle/Utility.h"|' \
+            $dest_dir/$I
+    fi
+
+    chmod -w $dest_dir/$I
     done
+}
+
+if [[ $ANSWER =~ ^[Yy]$ ]]; then
+  copy_files $LLVM_DEMANGLE_INCLUDE_DIR "$HDRS" false
+  copy_files $LLVM_DEMANGLE_SOURCE_DIR "$SRCS" true
 fi
diff --git a/llvm/include/llvm/Demangle/ItaniumDemangle.h b/llvm/include/llvm/Demangle/ItaniumDemangle.h
index b0363c1a7a786..2b51be306203d 100644
--- a/llvm/include/llvm/Demangle/ItaniumDemangle.h
+++ b/llvm/include/llvm/Demangle/ItaniumDemangle.h
@@ -851,11 +851,13 @@ class FunctionType final : public Node {
   // by printing out the return types's left, then print our parameters, then
   // finally print right of the return type.
   void printLeft(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
     Ret->printLeft(OB);
     OB += " ";
   }
 
   void printRight(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
     OB.printOpen();
     Params.printWithComma(OB);
     OB.printClose();
@@ -971,18 +973,32 @@ class FunctionEncoding final : public Node {
   const Node *getName() const { return Name; }
 
   void printLeft(OutputBuffer &OB) const override {
+    // Nested FunctionEncoding parsing can happen with following productions:
+    // * <local-name>
+    // * <expr-primary>
+    auto Scoped = OB.enterFunctionTypePrinting();
+
     if (Ret) {
       Ret->printLeft(OB);
       if (!Ret->hasRHSComponent(OB))
         OB += " ";
     }
+
+    OB.FunctionInfo.updateScopeStart(OB);
+
     Name->print(OB);
   }
 
   void printRight(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
+    OB.FunctionInfo.finalizeStart(OB);
+
     OB.printOpen();
     Params.printWithComma(OB);
     OB.printClose();
+
+    OB.FunctionInfo.finalizeArgumentEnd(OB);
+
     if (Ret)
       Ret->printRight(OB);
 
@@ -1005,6 +1021,8 @@ class FunctionEncoding final : public Node {
       OB += " requires ";
       Requires->print(OB);
     }
+
+    OB.FunctionInfo.finalizeEnd(OB);
   }
 };
 
@@ -1072,7 +1090,9 @@ struct NestedName : Node {
   void printLeft(OutputBuffer &OB) const override {
     Qual->print(OB);
     OB += "::";
+    OB.FunctionInfo.updateScopeEnd(OB);
     Name->print(OB);
+    OB.FunctionInfo.updateBasenameEnd(OB);
   }
 };
 
@@ -1633,6 +1653,7 @@ struct NameWithTemplateArgs : Node {
 
   void printLeft(OutputBuffer &OB) const override {
     Name->print(OB);
+    OB.FunctionInfo.updateBasenameEnd(OB);
     TemplateArgs->print(OB);
   }
 };
diff --git a/llvm/include/llvm/Demangle/Utility.h b/llvm/include/llvm/Demangle/Utility.h
index e893cceea2cdc..4e69c3623b480 100644
--- a/llvm/include/llvm/Demangle/Utility.h
+++ b/llvm/include/llvm/Demangle/Utility.h
@@ -27,6 +27,66 @@
 
 DEMANGLE_NAMESPACE_BEGIN
 
+template <class T> class ScopedOverride {
+  T &Loc;
+  T Original;
+
+public:
+  ScopedOverride(T &Loc_) : ScopedOverride(Loc_, Loc_) {}
+
+  ScopedOverride(T &Loc_, T NewVal) : Loc(Loc_), Original(Loc_) {
+    Loc_ = std::move(NewVal);
+  }
+  ~ScopedOverride() { Loc = std::move(Original); }
+
+  ScopedOverride(const ScopedOverride &) = delete;
+  ScopedOverride &operator=(const ScopedOverride &) = delete;
+};
+
+class OutputBuffer;
+
+// Stores information about parts of a demangled function name.
+struct FunctionNameInfo {
+  /// A [start, end) pair for the function basename.
+  /// The basename is the name without scope qualifiers
+  /// and without template parameters. E.g.,
+  /// \code{.cpp}
+  ///    void foo::bar<int>::someFunc<float>(int) const &&
+  ///                        ^       ^
+  ///                      Start    End
+  /// \endcode
+  std::pair<size_t, size_t> BasenameLocs;
+
+  /// A [start, end) pair for the function scope qualifiers.
+  /// E.g., for
+  /// \code{.cpp}
+  ///    void foo::bar<int>::qux<float>(int) const &&
+  ///         ^              ^
+  ///       Start           End
+  /// \endcode
+  std::pair<size_t, size_t> ScopeLocs;
+
+  /// Indicates the [start, end) of the function argument list.
+  /// E.g.,
+  /// \code{.cpp}
+  ///    int (*getFunc<float>(float, double))(int, int)
+  ///                        ^              ^
+  ///                      start           end
+  /// \endcode
+  std::pair<size_t, size_t> ArgumentLocs;
+
+  bool startedPrintingArguments() const;
+  bool shouldTrack(OutputBuffer &OB) const;
+  bool canFinalize(OutputBuffer &OB) const;
+  void updateBasenameEnd(OutputBuffer &OB);
+  void updateScopeStart(OutputBuffer &OB);
+  void updateScopeEnd(OutputBuffer &OB);
+  void finalizeArgumentEnd(OutputBuffer &OB);
+  void finalizeStart(OutputBuffer &OB);
+  void finalizeEnd(OutputBuffer &OB);
+  bool hasBasename() const;
+};
+
 // Stream that AST nodes write their string representation into after the AST
 // has been parsed.
 class OutputBuffer {
@@ -34,6 +94,10 @@ class OutputBuffer {
   size_t CurrentPosition = 0;
   size_t BufferCapacity = 0;
 
+  /// When a function type is being printed this value is incremented.
+  /// When printing of the type is finished the value is decremented.
+  unsigned FunctionPrintingDepth = 0;
+
   // Ensure there are at least N more positions in the buffer.
   void grow(size_t N) {
     size_t Need = N + CurrentPosition;
@@ -92,8 +156,19 @@ class OutputBuffer {
   /// Use a counter so we can simply increment inside parentheses.
   unsigned GtIsGt = 1;
 
+  /// When printing the mangle tree, this object will hold information about
+  /// the function name being printed (if any).
+  FunctionNameInfo FunctionInfo;
+
+  /// Called when we start printing a function type.
+  [[nodiscard]] ScopedOverride<unsigned> enterFunctionTypePrinting();
+
   bool isGtInsideTemplateArgs() const { return GtIsGt == 0; }
 
+  bool isPrintingTopLevelFunctionType() const {
+    return FunctionPrintingDepth == 1;
+  }
+
   void printOpen(char Open = '(') {
     GtIsGt++;
     *this += Open;
@@ -182,22 +257,6 @@ class OutputBuffer {
   size_t getBufferCapacity() const { return BufferCapacity; }
 };
 
-template <class T> class ScopedOverride {
-  T &Loc;
-  T Original;
-
-public:
-  ScopedOverride(T &Loc_) : ScopedOverride(Loc_, Loc_) {}
-
-  ScopedOverride(T &Loc_, T NewVal) : Loc(Loc_), Original(Loc_) {
-    Loc_ = std::move(NewVal);
-  }
-  ~ScopedOverride() { Loc = std::move(Original); }
-
-  ScopedOverride(const ScopedOverride &) = delete;
-  ScopedOverride &operator=(const ScopedOverride &) = delete;
-};
-
 DEMANGLE_NAMESPACE_END
 
 #endif
diff --git a/llvm/lib/Demangle/CMakeLists.txt b/llvm/lib/Demangle/CMakeLists.txt
index eb7d212a02449..0da6f6b89ad54 100644
--- a/llvm/lib/Demangle/CMakeLists.txt
+++ b/llvm/lib/Demangle/CMakeLists.txt
@@ -1,4 +1,5 @@
 add_llvm_component_library(LLVMDemangle
+  Utility.cpp
   Demangle.cpp
   ItaniumDemangle.cpp
   MicrosoftDemangle.cpp
diff --git a/llvm/lib/Demangle/README.txt b/llvm/lib/Demangle/README.txt
new file mode 100644
index 0000000000000..c3f49e57b8d16
--- /dev/null
+++ b/llvm/lib/Demangle/README.txt
@@ -0,0 +1,61 @@
+Itanium Name Demangler Library
+==============================
+
+Introduction
+------------
+
+This directory contains the generic itanium name demangler
+library. The main purpose of the library is to demangle C++ symbols,
+i.e. convert the string "_Z1fv" into "f()". You can also use the CRTP
+base ManglingParser to perform some simple analysis on the mangled
+name, or (in LLVM) use the opaque ItaniumPartialDemangler to query the
+demangled AST.
+
+Why are there multiple copies of this library in the source tree?
+-----------------------------------------------------------------
+
+The canonical sources are in libcxxabi/src/demangle and some of the
+files are copied to llvm/include/llvm/Demangle.  The simple reason for
+this comes from before the monorepo, and both [sub]projects need to
+demangle symbols, but neither can depend on each other.
+
+* libcxxabi needs the demangler to implement __cxa_demangle, which is
+  part of the itanium ABI spec.
+
+* LLVM needs a copy for a bunch of places, and cannot rely on the
+  system's __cxa_demangle because it a) might not be available (i.e.,
+  on Windows), and b) may not be up-to-date on the latest language
+  features.
+
+The copy of the demangler in LLVM has some extra stuff that aren't
+ne...
[truncated]

@Michael137 Michael137 requested a review from ldionne March 27, 2025 14:23
@zygoloid
Collaborator

This approach seems very specific to this particular use case. That doesn't seem like a great fit here; the demangler has generally been designed to be pretty neutral to its use case, including things like the fuzzy matcher that's used for profile info being implemented outside the demangler through a few hooks. I'd imagine other users of the demangler might want something similar to this, but different -- for example, full syntax highlighting of demangled names, or eliding template argument lists and function parameter types when printing demangled names -- so I think we shouldn't be adding something this coupled to the lldb use case here. How would you feel about a somewhat different approach:

  • Instead of having each Node::print* function call other Node::print* functions, make it call a new OutputStream::print*(Node*) function.
  • Make that new function virtual, with a default implementation that just calls back into the Nodes, and ensure that clients of the demangler can pass in their own custom OutputStream.
  • Override this part of OutputStream from lldb and add a small state machine to track which parts of the output correspond to which nodes.

You'll also need to add hooks to OutputBuffer::insert and OutputBuffer::prepend, which can insert text at a position that's not the end of the buffer, and OutputBuffer::setCurrentPosition, which effectively erases from the new current position to the old end of the buffer.
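
A minimal sketch of the approach proposed in the list above, under stated assumptions: `OutputStream`, `Node`, `Name`, `Nested` and `RangeTracker` are simplified, hypothetical types (the stream is a `std::string`, and there is a single `print` hook rather than separate left/right ones). The client override here is a plain range recorder; a real client would likely layer more state on top.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct Node;

// Hypothetical stand-in; the name follows the comment above.
class OutputStream {
  std::string Buf;

public:
  virtual ~OutputStream() = default;
  virtual void print(const Node &N); // default: call back into the node
  OutputStream &operator+=(std::string_view S) {
    Buf += S;
    return *this;
  }
  std::size_t size() const { return Buf.size(); }
  const std::string &str() const { return Buf; }
};

struct Node {
  enum Kind { KName, KNested };
  Kind K;
  explicit Node(Kind K_) : K(K_) {}
  virtual ~Node() = default;
  virtual void printImpl(OutputStream &OS) const = 0;
};

inline void OutputStream::print(const Node &N) { N.printImpl(*this); }

struct Name final : Node {
  std::string_view Text;
  explicit Name(std::string_view T) : Node(KName), Text(T) {}
  void printImpl(OutputStream &OS) const override { OS += Text; }
};

struct Nested final : Node {
  const Node *Qual;
  const Node *Last;
  Nested(const Node *Q, const Node *L) : Node(KNested), Qual(Q), Last(L) {}
  void printImpl(OutputStream &OS) const override {
    OS.print(*Qual); // children are printed via the stream, not directly
    OS += "::";
    OS.print(*Last);
  }
};

// Client side: records the [start, end) range that each Name node produced.
struct RangeTracker final : OutputStream {
  std::vector<std::pair<std::size_t, std::size_t>> NameRanges;
  void print(const Node &N) override {
    std::size_t Start = size();
    OutputStream::print(N);
    if (N.K == Node::KName)
      NameRanges.emplace_back(Start, size());
  }
};
```

Printing a `Nested` node built from `foo` and `bar` through a `RangeTracker` produces `foo::bar` and records the ranges `[0, 3)` and `[5, 8)`, without the node classes knowing anything about the tracker.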

@Michael137
Member Author

Yea I agree this would be very specific to how LLDB uses this API. Happy to make it more generic as you propose. Let me give that a shot!

@Michael137 Michael137 requested a review from JDevlieghere as a code owner April 3, 2025 10:03
@llvmbot llvmbot added the lldb label Apr 3, 2025
@Michael137 Michael137 changed the title [WIP] [libcxxabi][ItaniumDemangle] Add infrastructure to track location information of parts of a demangled function name [WIP] [ItaniumDemangle] Add infrastructure to track location information of parts of a demangled function name Apr 3, 2025
@Michael137 Michael137 force-pushed the lldb/function-name-highlighting-demangler branch 3 times, most recently from 3fc4026 to 081bfaa Compare April 8, 2025 14:18
@Michael137 Michael137 changed the title [WIP] [ItaniumDemangle] Add infrastructure to track location information of parts of a demangled function name [ItaniumDemangle] Add infrastructure to track location information of parts of a demangled function name Apr 8, 2025
@Michael137 Michael137 changed the title [ItaniumDemangle] Add infrastructure to track location information of parts of a demangled function name [ItaniumDemangle] Add customizable printLeft/printRight APIs to OutputBuffer Apr 8, 2025
@Michael137
Member Author

Test failures look unrelated.

Mind having another look, @zygoloid (or anyone else)? The LLDB PR that makes use of this is mostly ready.

Collaborator

@zygoloid zygoloid left a comment

Looks good.

However: for your use case, I think you'll likely also need to add a couple more hooks to OutputBuffer in order to properly track the locations of parts of the demangled name. Specifically:

  • For prepend and insert, some virtual function call to indicate that an insertion has happened in the buffer (eg, specifying the insertion position and size).
  • For setCurrentPosition, some virtual function call to indicate that an erase is about to happen in the buffer (eg, specifying the position and size) -- right now the only kind of erase we perform here is a truncation, but it may be a bit more future-proof for the extension mechanism to support an arbitrary erase.

If you don't handle those, any locations that you are tracking on the lldb side may be wrong when they happen.
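
The two kinds of hook could look roughly like the sketch below. The hook names `notifyInsertion` and `notifyDeletion` and their parameter names follow the ones that appear later in this thread; everything else (`std::string` storage, the `LoggingBuffer` client, the event strings) is an assumption for illustration, not the actual OutputBuffer implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Simplified stand-in for the demangler's OutputBuffer (std::string storage
// instead of the real manually managed buffer).
class OutputBuffer {
  std::string Buf;

public:
  virtual ~OutputBuffer() = default;

  // Hooks: no-ops by default, so existing clients are unaffected.
  virtual void notifyInsertion(std::size_t /*Position*/,
                               std::size_t /*Count*/) {}
  virtual void notifyDeletion(std::size_t /*OldPos*/,
                              std::size_t /*NewPos*/) {}

  OutputBuffer &operator+=(std::string_view S) {
    Buf += S; // appending never moves existing text, so no hook is needed
    return *this;
  }

  OutputBuffer &prepend(std::string_view S) {
    Buf.insert(0, S.data(), S.size());
    notifyInsertion(0, S.size());
    return *this;
  }

  // Truncates the buffer: everything in [NewPos, old end) is erased.
  void setCurrentPosition(std::size_t NewPos) {
    notifyDeletion(Buf.size(), NewPos);
    Buf.resize(NewPos);
  }

  const std::string &str() const { return Buf; }
};

// Example client that simply logs the events it is told about.
struct LoggingBuffer final : OutputBuffer {
  std::vector<std::string> Events;

  void notifyInsertion(std::size_t Pos, std::size_t Count) override {
    Events.push_back("insert " + std::to_string(Pos) + " " +
                     std::to_string(Count));
  }
  void notifyDeletion(std::size_t OldPos, std::size_t NewPos) override {
    Events.push_back("erase " + std::to_string(NewPos) + " " +
                     std::to_string(OldPos));
  }
};
```

Plain appends need no notification because they cannot invalidate offsets that were recorded earlier; only insertions before the end and truncations can.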

@Michael137
Member Author

Makes sense! Yea, looks like insert/prepend are only used for D and Rust, but I'll add a hook for future-proofing. setCurrentPosition seems like it could already break our LLDB tracking when parameter packs are involved. I'll try to test this.

@Michael137 Michael137 force-pushed the lldb/function-name-highlighting-demangler branch from c47f48b to 8526be4 Compare April 16, 2025 15:07
@Michael137
Member Author

@zygoloid Added the suggested hooks in the latest iteration. On the LLDB PR, insert and prepend are pretty easy to handle (we just have to shift all the locations after the position that we've inserted into).

setCurrentPosition is trickier because we don't actually know what should happen to the tracked locations. I chose to just reset all the locations when that is called.
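
The client-side handling described here (shift on insertion, reset on deletion) might be sketched as follows. `TrackedRange`, `LocationTracker`, `onInsertion` and `onDeletion` are hypothetical names, and the boundary rule (an insertion exactly at a range's end does not grow that range) is an assumption of this example rather than something the LLDB PR is known to do.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical half-open [Start, End) range tracked by the client.
struct TrackedRange {
  std::size_t Start = 0;
  std::size_t End = 0;
};

struct LocationTracker {
  TrackedRange Basename;
  TrackedRange Arguments;

  // Count bytes were inserted at Position: offsets at or after the
  // insertion point move right by Count.
  void onInsertion(std::size_t Position, std::size_t Count) {
    shift(Basename, Position, Count);
    shift(Arguments, Position, Count);
  }

  // The buffer was truncated; we cannot tell which ranges are still
  // meaningful, so drop everything.
  void onDeletion() {
    Basename = {};
    Arguments = {};
  }

private:
  static void shift(TrackedRange &R, std::size_t Position,
                    std::size_t Count) {
    bool StartMoves = R.Start >= Position;
    // The end moves if the whole range moves or the insertion is inside it.
    if (StartMoves || R.End > Position)
      R.End += Count;
    if (StartMoves)
      R.Start += Count;
  }
};
```

For `bar(int)` with basename `[0, 3)` and arguments `[3, 8)`, prepending the five bytes of `foo::` moves the ranges to `[5, 8)` and `[8, 13)`.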

Collaborator

@zygoloid zygoloid left a comment

A couple of minor things but I think generally this looks good.

Comment on lines 99 to 100
  /// Called when we reset the \c CurrentPosition of this object.
  virtual void notifyPositionChanged(size_t /*OldPos*/, size_t /*NewPos*/) {}
Collaborator

I think notifyDeletion / notifyErasure would make more sense here. setCurrentPosition is an erase operation, because CurrentPosition is treated as the end of the buffer.

Collaborator

Or notifyResize if you'd prefer to not provide the more general "erase" notification, given that we don't currently use it.

Member Author

I'll go with notifyDeletion for now, since "resize" might get confused with resizing of the underlying buffer.

@@ -121,6 +137,8 @@ class OutputBuffer {
  OutputBuffer &prepend(std::string_view R) {
    size_t Size = R.size();

    notifyInsertion(/*Position=*/0, /*Count=*/Size);
Collaborator

I think this should be called after we perform the insert, so that the callback can inspect the added bytes.
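
The ordering point can be sketched as below: if the hook runs after the buffer has been modified, an override can read the freshly inserted bytes back out of the buffer. This assumes a simplified `std::string`-backed `OutputBuffer` with a hypothetical `view()` accessor and an illustrative `InspectingBuffer` client; only the `prepend` and `notifyInsertion` names come from the snippet under review.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Simplified stand-in, not the real OutputBuffer.
class OutputBuffer {
  std::string Buf;

public:
  virtual ~OutputBuffer() = default;
  virtual void notifyInsertion(std::size_t /*Position*/,
                               std::size_t /*Count*/) {}

  OutputBuffer &prepend(std::string_view R) {
    Buf.insert(0, R.data(), R.size()); // 1. modify the buffer
    notifyInsertion(0, R.size());      // 2. then tell the client
    return *this;
  }

  OutputBuffer &operator+=(std::string_view R) {
    Buf += R;
    return *this;
  }

  // Hypothetical read-only accessor used by the client below.
  std::string_view view() const { return Buf; }
};

// Client that copies out whatever was just inserted.
struct InspectingBuffer final : OutputBuffer {
  std::string LastInserted;
  void notifyInsertion(std::size_t Position, std::size_t Count) override {
    LastInserted = std::string(view().substr(Position, Count));
  }
};
```

Had the notification been issued before the insertion, `view().substr(Position, Count)` would have returned the old contents at that position instead of the new bytes.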

@@ -161,14 +179,20 @@ class OutputBuffer {
    DEMANGLE_ASSERT(Pos <= CurrentPosition, "");
    if (N == 0)
      return;

    notifyInsertion(Pos, N);
Collaborator

(Likewise.)

@Michael137 Michael137 force-pushed the lldb/function-name-highlighting-demangler branch from 7392bd6 to 94128c9 Compare April 17, 2025 05:47