Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge test #1

Closed
wants to merge 0 commits into from
Closed

Merge test #1

wants to merge 0 commits into from

Conversation

tstellar
Copy link
Owner

No description provided.

@tstellar tstellar closed this Mar 18, 2019
tstellar pushed a commit that referenced this pull request Jul 1, 2019
On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is
mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break
point instruction on ARM64 which is recognized by Windows debugger and
exception handling code, so llvm.debugtrap should map to it instead of
redirecting to llvm.trap (brk #1) as the default implementation.

Differential Revision: https://reviews.llvm.org/D63635

llvm-svn: 364115
tstellar pushed a commit that referenced this pull request Jul 1, 2019
This fixes a failing testcase on Fedora 30 x86_64 (regression Fedora 29->30):

PASS:
./bin/lldb ./lldb-test-build.noindex/functionalities/unwind/noreturn/TestNoreturnUnwind.test_dwarf/a.out -o 'settings set symbols.enable-external-lookup false' -o r -o bt -o quit
  * frame #0: 0x00007ffff7aa6e75 libc.so.6`__GI_raise + 325
    frame #1: 0x00007ffff7a91895 libc.so.6`__GI_abort + 295
    frame #2: 0x0000000000401140 a.out`func_c at main.c:12:2
    frame #3: 0x000000000040113a a.out`func_b at main.c:18:2
    frame #4: 0x0000000000401134 a.out`func_a at main.c:26:2
    frame #5: 0x000000000040112e a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:32:2
    frame #6: 0x00007ffff7a92f33 libc.so.6`__libc_start_main + 243
    frame #7: 0x000000000040106e a.out`_start + 46

vs.

FAIL - unrecognized abort() function:
./bin/lldb ./lldb-test-build.noindex/functionalities/unwind/noreturn/TestNoreturnUnwind.test_dwarf/a.out -o 'settings set symbols.enable-external-lookup false' -o r -o bt -o quit
  * frame #0: 0x00007ffff7aa6e75 libc.so.6`.annobin_raise.c + 325
    frame #1: 0x00007ffff7a91895 libc.so.6`.annobin_loadmsgcat.c_end.unlikely + 295
    frame #2: 0x0000000000401140 a.out`func_c at main.c:12:2
    frame #3: 0x000000000040113a a.out`func_b at main.c:18:2
    frame #4: 0x0000000000401134 a.out`func_a at main.c:26:2
    frame #5: 0x000000000040112e a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:32:2
    frame #6: 0x00007ffff7a92f33 libc.so.6`.annobin_libc_start.c + 243
    frame #7: 0x000000000040106e a.out`.annobin_init.c.hot + 46

The extra ELF symbols are there due to Annobin (I did not investigate why this problem happened specifically since F-30 and not since F-28).
It is due to:

Symbol table '.dynsym' contains 2361 entries:
Valu e          Size Type   Bind   Vis     Name
0000000000022769   5 FUNC   LOCAL  DEFAULT _nl_load_domain.cold
000000000002276e   0 NOTYPE LOCAL  HIDDEN  .annobin_abort.c.unlikely
...
000000000002276e   0 NOTYPE LOCAL  HIDDEN  .annobin_loadmsgcat.c_end.unlikely
...
000000000002276e   0 NOTYPE LOCAL  HIDDEN  .annobin_textdomain.c_end.unlikely
000000000002276e 548 FUNC   GLOBAL DEFAULT abort
000000000002276e 548 FUNC   GLOBAL DEFAULT abort@@GLIBC_2.2.5
000000000002276e 548 FUNC   LOCAL  DEFAULT __GI_abort
0000000000022992   0 NOTYPE LOCAL  HIDDEN  .annobin_abort.c_end.unlikely

Differential Revision: https://reviews.llvm.org/D63540

llvm-svn: 364773
tstellar pushed a commit that referenced this pull request Oct 25, 2019
Summary:
If the .symtab section is stripped from the binary it might be that
there's a .gnu_debugdata section which contains a smaller .symtab in
order to provide enough information to create a backtrace with function
names or to set and hit a breakpoint on a function name.

This change looks for a .gnu_debugdata section in the ELF object file.
The .gnu_debugdata section contains a xz-compressed ELF file with a
.symtab section inside. Symbols from that compressed .symtab section
are merged with the main object file's .dynsym symbols (if any).
In addition we always load the .dynsym even if there's a .symtab
section.

For example, the Fedora and RHEL operating systems strip their binaries
but keep a .gnu_debugdata section. While gdb already can read this
section, LLDB until this patch couldn't. To test this patch on a
Fedora or RHEL operating system, try to set a breakpoint on the "help"
symbol in the "zip" binary. Before this patch, only GDB can set this
breakpoint; now LLDB also can do so without installing extra debug
symbols:

    lldb /usr/bin/zip -b -o "b help" -o "r" -o "bt" -- -h

The above line runs LLDB in batch mode and on the "/usr/bin/zip -h"
target:

    (lldb) target create "/usr/bin/zip"
    Current executable set to '/usr/bin/zip' (x86_64).
    (lldb) settings set -- target.run-args  "-h"

Before the program starts, we set a breakpoint on the "help" symbol:

    (lldb) b help
    Breakpoint 1: where = zip`help, address = 0x00000000004093b0

Once the program is run and has hit the breakpoint we ask for a
backtrace:

    (lldb) r
    Process 10073 stopped
    * thread #1, name = 'zip', stop reason = breakpoint 1.1
        frame #0: 0x00000000004093b0 zip`help
    zip`help:
    ->  0x4093b0 <+0>:  pushq  %r12
        0x4093b2 <+2>:  movq   0x2af5f(%rip), %rsi       ;  + 4056
        0x4093b9 <+9>:  movl   $0x1, %edi
        0x4093be <+14>: xorl   %eax, %eax

    Process 10073 launched: '/usr/bin/zip' (x86_64)
    (lldb) bt
    * thread #1, name = 'zip', stop reason = breakpoint 1.1
      * frame #0: 0x00000000004093b0 zip`help
        frame #1: 0x0000000000403970 zip`main + 3248
        frame #2: 0x00007ffff7d8bf33 libc.so.6`__libc_start_main + 243
        frame #3: 0x0000000000408cee zip`_start + 46

In order to support the .gnu_debugdata section, one has to have LZMA
development headers installed. The CMake section, that controls this
part looks for the LZMA headers and enables .gnu_debugdata support by
default if they are found; otherwise or if explicitly requested, the
minidebuginfo support is disabled.

GDB supports the "mini debuginfo" section .gnu_debugdata since v7.6
(2013).

Reviewers: espindola, labath, jankratochvil, alexshap

Reviewed By: labath

Subscribers: rnkovacs, wuzish, shafik, emaste, mgorny, arichardson, hiraditya, MaskRay, lldb-commits

Tags: #lldb, #llvm

Differential Revision: https://reviews.llvm.org/D66791

llvm-svn: 373891
tstellar pushed a commit that referenced this pull request Oct 25, 2019
This test is not defined.

FAIL: LLVM-Unit :: ADT/./ADTTests/ArrayRefTest.SizeTSizedOperations (178 of 33926)
******************** TEST 'LLVM-Unit :: ADT/./ADTTests/ArrayRefTest.SizeTSizedOperations' FAILED ********************
Note: Google Test filter = ArrayRefTest.SizeTSizedOperations
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ArrayRefTest
[ RUN      ] ArrayRefTest.SizeTSizedOperations
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:180:32: runtime error: applying non-zero offset 9223372036854775806 to null pointer
    #0 0x5ae8dc in llvm::ArrayRef<char>::slice(unsigned long, unsigned long) const /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:180:32
    #1 0x5ae44c in (anonymous namespace)::ArrayRefTest_SizeTSizedOperations_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/unittests/ADT/ArrayRefTest.cpp:85:3
    #2 0x928a96 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2474:5
    #3 0x929793 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2656:11
    #4 0x92a152 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2774:28
    #5 0x9319d2 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4649:43
    #6 0x931416 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4257:10
    #7 0x920ac3 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46
    #8 0x920ac3 in main /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50:10
    #9 0x7f66135b72e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
    #10 0x472c19 in _start (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/unittests/ADT/ADTTests+0x472c19)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:180:32 in
llvm-svn: 374327
tstellar pushed a commit that referenced this pull request Oct 25, 2019
…the branch where it's used

The existing code is not defined, you are not allowed
to produce non-null pointer from null pointer (F->FileSortedDecls here).
That being said, i'm not really confident this is fix-enough, but we'll see.

FAIL: Clang :: Modules/no-module-map.cpp (6879 of 16079)
******************** TEST 'Clang :: Modules/no-module-map.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';   /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -fmodules-ts -fmodule-name=ab -x c++-header /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map/a.h /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map/b.h -emit-header-module -o /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm
: 'RUN: at line 2';   /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -fmodules-ts -fmodule-file=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map -verify
: 'RUN: at line 3';   /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -fmodules-ts -fmodule-file=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map -verify -DA
: 'RUN: at line 4';   /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -fmodules-ts -fmodule-file=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map -verify -DB
: 'RUN: at line 5';   /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -fmodules-ts -fmodule-file=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map -verify -DA -DB
: 'RUN: at line 7';   /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -E /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm -o - | /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp
: 'RUN: at line 8';   /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -frewrite-imports -E /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm -o - | /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp
--
Exit Code: 2

Command Output (stderr):
--
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:1526:50: runtime error: applying non-zero offset 8 to null pointer
    #0 0x3a9bd0c in clang::ASTReader::ReadSLocEntry(int) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:1526:50
    #1 0x328b6f8 in clang::SourceManager::loadSLocEntry(unsigned int, bool*) const /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Basic/SourceManager.cpp:461:28
    #2 0x328b351 in clang::SourceManager::initializeForReplay(clang::SourceManager const&) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Basic/SourceManager.cpp:399:11
    #3 0x3996c71 in clang::FrontendAction::BeginSourceFile(clang::CompilerInstance&, clang::FrontendInputFile const&) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Frontend/FrontendAction.cpp:581:27
    #4 0x394f341 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:956:13
    #5 0x3a8a92b in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:290:25
    #6 0xaf8d62 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/tools/driver/cc1_main.cpp:250:15
    #7 0xaf1602 in ExecuteCC1Tool /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/tools/driver/driver.cpp:309:12
    #8 0xaf1602 in main /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/tools/driver/driver.cpp:382:12
    #9 0x7f2c1eecc2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
    #10 0xad57f9 in _start (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang-10+0xad57f9)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:1526:50 in
llvm-svn: 374328
tstellar pushed a commit that referenced this pull request Nov 5, 2019
Currently, clang emits subprograms for declared functions when the
target debugger or DWARF standard is known to support entry values
(DW_OP_entry_value & the GNU equivalent).

Treat DW_AT_tail_call the same way to allow debuggers to follow cross-TU
tail calls.

Pre-patch debug session with a cross-TU tail call:

```
  * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
    frame #1: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```

Post-patch (note that the tail-calling frame, "helper", is visible):

```
  * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
    frame #1: 0x0000000100000f80 main`helper [opt] [artificial]
    frame #2: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```

rdar://46577651

Differential Revision: https://reviews.llvm.org/D69743
tstellar pushed a commit that referenced this pull request Nov 8, 2019
…_call is understood"

This caused Chromium builds to fail with "inlinable function call in a function
with debug info must have a !dbg location" errors. See
https://bugs.chromium.org/p/chromium/issues/detail?id=1022296#c1 for a
reproducer.

> Currently, clang emits subprograms for declared functions when the
> target debugger or DWARF standard is known to support entry values
> (DW_OP_entry_value & the GNU equivalent).
>
> Treat DW_AT_tail_call the same way to allow debuggers to follow cross-TU
> tail calls.
>
> Pre-patch debug session with a cross-TU tail call:
>
> ```
>   * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
>     frame #1: 0x0000000100000f99 main`main at a.c:8:10 [opt]
> ```
>
> Post-patch (note that the tail-calling frame, "helper", is visible):
>
> ```
>   * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
>     frame #1: 0x0000000100000f80 main`helper [opt] [artificial]
>     frame #2: 0x0000000100000f99 main`main at a.c:8:10 [opt]
> ```
>
> rdar://46577651
>
> Differential Revision: https://reviews.llvm.org/D69743
tstellar pushed a commit that referenced this pull request Nov 14, 2019
During register coalescing, we update the live-intervals on-the-fly.
To do that we are in this strange mode where the live-intervals can
be slightly out-of-sync (more precisely they are forward looking)
compared to what the IR actually represents.
This happens because the register coalescer only updates the IR when
it is done with updating the live-intervals and it has to do it this
way because updating the IR on-the-fly would actually clobber some
information on how the live-ranges that are being updated look like.

This is problematic for updates that rely on the IR to accurately
represents the state of the live-ranges. Right now, we have only
one of those: stripValuesNotDefiningMask.
To reconcile this need of out-of-sync IR, this patch introduces a
new argument to LiveInterval::refineSubRanges that allows the code
doing the live range updates to reason about how the code should
look like after the coalescer will have rewritten the registers.
Essentially this captures how a subregister index with be offseted
to match its position in a new register class.

E.g., let say we want to merge:
    V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>

We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
Put differently we align V2's sub3 with V1's sub1:
    V2: sub0 sub1 sub2 sub3
    V1: <offset>  sub0 sub1

This offset will look like a composed subregidx in the the class:
     V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
 =>  V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>

Now if we didn't rewrite the uses and def of V1, all the checks for V1
need to account for this offset to match what the live intervals intend
to capture.

Prior to this patch, we would fail to recognize the uses and def of V1
and would end up with machine verifier errors: No live segment at def.
This could lead to miscompile as we would drop some live-ranges and
thus, miss some interferences.

For this problem to trigger, we need to reach stripValuesNotDefiningMask
while having a mismatch between the IR and the live-ranges (i.e.,
we have to apply a subreg offset to the IR.)

This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the
   subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.

looking at the IR to decide what is alive and what is not, i.e., calling
stripValuesNotDefiningMask.
coalescer maintains for the live-ranges information.

None of the targets that currently use subreg liveness (i.e., the targets
that fulfill #3, Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1 and
and #2, so this patch also artificial enables subreg liveness for ARM,
so that a nice test case can be attached.
tstellar pushed a commit that referenced this pull request Nov 21, 2019
…ood (reland with fixes)

Currently, clang emits subprograms for declared functions when the
target debugger or DWARF standard is known to support entry values
(DW_OP_entry_value & the GNU equivalent).

Treat DW_AT_tail_call the same way to allow debuggers to follow cross-TU
tail calls.

Pre-patch debug session with a cross-TU tail call:

```
  * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
    frame #1: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```

Post-patch (note that the tail-calling frame, "helper", is visible):

```
  * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
    frame #1: 0x0000000100000f80 main`helper [opt] [artificial]
    frame #2: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```

This was reverted in 5b9a072 because it attached declaration
subprograms to inlinable builtin calls, which interacted badly with the
MergeICmps pass. The fix is to not attach declarations to builtins.

rdar://46577651

Differential Revision: https://reviews.llvm.org/D69743
tstellar pushed a commit that referenced this pull request Dec 5, 2019
…iant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151"

Summary:
Revert "[DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151"

 This reverts commit 5f026b6.

We're (tensorflow.org/xla team) seeing some misscompiles with the new change, only at -O3, with fast math disabled.

I'm still trying to come up with a useful/small/external example, but for now, the following IR:

```
; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

@0 = private unnamed_addr constant [4 x i8] c"\DB\0F\C9@"
@1 = private unnamed_addr constant [4 x i8] c"\00\00\00?"

; Function Attrs: uwtable
define void @jit_wrapped_fun.31(i8* %retval, i8* noalias %run_options, i8** noalias %params, i8** noalias %buffer_table, i64* noalias %prof_counters) #0 {
entry:
  %fusion.invar_address.dim.2 = alloca i64
  %fusion.invar_address.dim.1 = alloca i64
  %fusion.invar_address.dim.0 = alloca i64
  %fusion.1.invar_address.dim.2 = alloca i64
  %fusion.1.invar_address.dim.1 = alloca i64
  %fusion.1.invar_address.dim.0 = alloca i64
  %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1
  %1 = load i8*, i8** %0, !invariant.load !0, !dereferenceable !1, !align !2
  %parameter.3 = bitcast i8* %1 to [2 x [1 x [4 x float]]]*
  %2 = getelementptr inbounds i8*, i8** %buffer_table, i64 5
  %3 = load i8*, i8** %2, !invariant.load !0, !dereferenceable !1, !align !2
  %fusion.1 = bitcast i8* %3 to [2 x [1 x [4 x float]]]*
  store i64 0, i64* %fusion.1.invar_address.dim.0
  br label %fusion.1.loop_header.dim.0

fusion.1.loop_header.dim.0:                       ; preds = %fusion.1.loop_exit.dim.1, %entry
  %fusion.1.indvar.dim.0 = load i64, i64* %fusion.1.invar_address.dim.0
  %4 = icmp uge i64 %fusion.1.indvar.dim.0, 2
  br i1 %4, label %fusion.1.loop_exit.dim.0, label %fusion.1.loop_body.dim.0

fusion.1.loop_body.dim.0:                         ; preds = %fusion.1.loop_header.dim.0
  store i64 0, i64* %fusion.1.invar_address.dim.1
  br label %fusion.1.loop_header.dim.1

fusion.1.loop_header.dim.1:                       ; preds = %fusion.1.loop_exit.dim.2, %fusion.1.loop_body.dim.0
  %fusion.1.indvar.dim.1 = load i64, i64* %fusion.1.invar_address.dim.1
  %5 = icmp uge i64 %fusion.1.indvar.dim.1, 1
  br i1 %5, label %fusion.1.loop_exit.dim.1, label %fusion.1.loop_body.dim.1

fusion.1.loop_body.dim.1:                         ; preds = %fusion.1.loop_header.dim.1
  store i64 0, i64* %fusion.1.invar_address.dim.2
  br label %fusion.1.loop_header.dim.2

fusion.1.loop_header.dim.2:                       ; preds = %fusion.1.loop_body.dim.2, %fusion.1.loop_body.dim.1
  %fusion.1.indvar.dim.2 = load i64, i64* %fusion.1.invar_address.dim.2
  %6 = icmp uge i64 %fusion.1.indvar.dim.2, 4
  br i1 %6, label %fusion.1.loop_exit.dim.2, label %fusion.1.loop_body.dim.2

fusion.1.loop_body.dim.2:                         ; preds = %fusion.1.loop_header.dim.2
  %7 = load float, float* bitcast ([4 x i8]* @0 to float*)
  %8 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2
  %9 = load float, float* %8, !invariant.load !0, !noalias !3
  %10 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2
  %11 = load float, float* %10, !invariant.load !0, !noalias !3
  %12 = fmul float %9, %11
  %13 = fmul float %7, %12
  %14 = call float @llvm.log.f32(float %13)
  %15 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2
  store float %14, float* %15, !alias.scope !7, !noalias !8
  %invar.inc2 = add nuw nsw i64 %fusion.1.indvar.dim.2, 1
  store i64 %invar.inc2, i64* %fusion.1.invar_address.dim.2
  br label %fusion.1.loop_header.dim.2

fusion.1.loop_exit.dim.2:                         ; preds = %fusion.1.loop_header.dim.2
  %invar.inc1 = add nuw nsw i64 %fusion.1.indvar.dim.1, 1
  store i64 %invar.inc1, i64* %fusion.1.invar_address.dim.1
  br label %fusion.1.loop_header.dim.1

fusion.1.loop_exit.dim.1:                         ; preds = %fusion.1.loop_header.dim.1
  %invar.inc = add nuw nsw i64 %fusion.1.indvar.dim.0, 1
  store i64 %invar.inc, i64* %fusion.1.invar_address.dim.0
  br label %fusion.1.loop_header.dim.0

fusion.1.loop_exit.dim.0:                         ; preds = %fusion.1.loop_header.dim.0
  %16 = getelementptr inbounds i8*, i8** %buffer_table, i64 4
  %17 = load i8*, i8** %16, !invariant.load !0, !dereferenceable !9, !align !2
  %parameter.1 = bitcast i8* %17 to float*
  %18 = getelementptr inbounds i8*, i8** %buffer_table, i64 2
  %19 = load i8*, i8** %18, !invariant.load !0, !dereferenceable !10, !align !2
  %parameter.2 = bitcast i8* %19 to [3 x [1 x float]]*
  %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 0
  %21 = load i8*, i8** %20, !invariant.load !0, !dereferenceable !11, !align !2
  %fusion = bitcast i8* %21 to [2 x [3 x [4 x float]]]*
  store i64 0, i64* %fusion.invar_address.dim.0
  br label %fusion.loop_header.dim.0

fusion.loop_header.dim.0:                         ; preds = %fusion.loop_exit.dim.1, %fusion.1.loop_exit.dim.0
  %fusion.indvar.dim.0 = load i64, i64* %fusion.invar_address.dim.0
  %22 = icmp uge i64 %fusion.indvar.dim.0, 2
  br i1 %22, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0

fusion.loop_body.dim.0:                           ; preds = %fusion.loop_header.dim.0
  store i64 0, i64* %fusion.invar_address.dim.1
  br label %fusion.loop_header.dim.1

fusion.loop_header.dim.1:                         ; preds = %fusion.loop_exit.dim.2, %fusion.loop_body.dim.0
  %fusion.indvar.dim.1 = load i64, i64* %fusion.invar_address.dim.1
  %23 = icmp uge i64 %fusion.indvar.dim.1, 3
  br i1 %23, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1

fusion.loop_body.dim.1:                           ; preds = %fusion.loop_header.dim.1
  store i64 0, i64* %fusion.invar_address.dim.2
  br label %fusion.loop_header.dim.2

fusion.loop_header.dim.2:                         ; preds = %fusion.loop_body.dim.2, %fusion.loop_body.dim.1
  %fusion.indvar.dim.2 = load i64, i64* %fusion.invar_address.dim.2
  %24 = icmp uge i64 %fusion.indvar.dim.2, 4
  br i1 %24, label %fusion.loop_exit.dim.2, label %fusion.loop_body.dim.2

fusion.loop_body.dim.2:                           ; preds = %fusion.loop_header.dim.2
  %25 = mul nuw nsw i64 %fusion.indvar.dim.2, 1
  %26 = add nuw nsw i64 0, %25
  %27 = udiv i64 %26, 4
  %28 = mul nuw nsw i64 %fusion.indvar.dim.0, 1
  %29 = add nuw nsw i64 0, %28
  %30 = udiv i64 %29, 2
  %31 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %29, i64 0, i64 %26
  %32 = load float, float* %31, !alias.scope !7, !noalias !8
  %33 = mul nuw nsw i64 %fusion.indvar.dim.1, 1
  %34 = add nuw nsw i64 0, %33
  %35 = udiv i64 %34, 3
  %36 = load float, float* %parameter.1, !invariant.load !0, !noalias !3
  %37 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %parameter.2, i64 0, i64 %34, i64 0
  %38 = load float, float* %37, !invariant.load !0, !noalias !3
  %39 = fsub float %36, %38
  %40 = fmul float %39, %39
  %41 = mul nuw nsw i64 %fusion.indvar.dim.2, 1
  %42 = add nuw nsw i64 0, %41
  %43 = udiv i64 %42, 4
  %44 = mul nuw nsw i64 %fusion.indvar.dim.0, 1
  %45 = add nuw nsw i64 0, %44
  %46 = udiv i64 %45, 2
  %47 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42
  %48 = load float, float* %47, !invariant.load !0, !noalias !3
  %49 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42
  %50 = load float, float* %49, !invariant.load !0, !noalias !3
  %51 = fmul float %48, %50
  %52 = fdiv float %40, %51
  %53 = fadd float %32, %52
  %54 = fneg float %53
  %55 = load float, float* bitcast ([4 x i8]* @1 to float*)
  %56 = fmul float %54, %55
  %57 = getelementptr inbounds [2 x [3 x [4 x float]]], [2 x [3 x [4 x float]]]* %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 %fusion.indvar.dim.1, i64 %fusion.indvar.dim.2
  store float %56, float* %57, !alias.scope !8, !noalias !12
  %invar.inc5 = add nuw nsw i64 %fusion.indvar.dim.2, 1
  store i64 %invar.inc5, i64* %fusion.invar_address.dim.2
  br label %fusion.loop_header.dim.2

fusion.loop_exit.dim.2:                           ; preds = %fusion.loop_header.dim.2
  %invar.inc4 = add nuw nsw i64 %fusion.indvar.dim.1, 1
  store i64 %invar.inc4, i64* %fusion.invar_address.dim.1
  br label %fusion.loop_header.dim.1

fusion.loop_exit.dim.1:                           ; preds = %fusion.loop_header.dim.1
  %invar.inc3 = add nuw nsw i64 %fusion.indvar.dim.0, 1
  store i64 %invar.inc3, i64* %fusion.invar_address.dim.0
  br label %fusion.loop_header.dim.0

fusion.loop_exit.dim.0:                           ; preds = %fusion.loop_header.dim.0
  %58 = getelementptr inbounds i8*, i8** %buffer_table, i64 3
  %59 = load i8*, i8** %58, !invariant.load !0, !dereferenceable !2, !align !2
  %tuple.30 = bitcast i8* %59 to [1 x i8*]*
  %60 = bitcast [2 x [3 x [4 x float]]]* %fusion to i8*
  %61 = getelementptr inbounds [1 x i8*], [1 x i8*]* %tuple.30, i64 0, i64 0
  store i8* %60, i8** %61, !alias.scope !14, !noalias !8
  ret void
}

; Function Attrs: nounwind readnone speculatable willreturn
declare float @llvm.log.f32(float) #1

attributes #0 = { uwtable "no-frame-pointer-elim"="false" }
attributes #1 = { nounwind readnone speculatable willreturn }

!0 = !{}
!1 = !{i64 32}
!2 = !{i64 8}
!3 = !{!4, !6}
!4 = !{!"buffer: {index:0, offset:0, size:96}", !5}
!5 = !{!"XLA global AA domain"}
!6 = !{!"buffer: {index:5, offset:0, size:32}", !5}
!7 = !{!6}
!8 = !{!4}
!9 = !{i64 4}
!10 = !{i64 12}
!11 = !{i64 96}
!12 = !{!13, !6}
!13 = !{!"buffer: {index:3, offset:0, size:8}", !5}
!14 = !{!13}
```

gets (correctly) optimized to the one below without the change:

```
; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

; Function Attrs: nofree nounwind uwtable
define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 {
entry:
  %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1
  %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]**
  %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5
  %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]**
  %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>*
  %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3
  %8 = fmul <4 x float> %7, %7
  %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000>
  %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9)
  %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>*
  store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8
  %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0
  %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0
  %14 = bitcast float* %12 to <4 x float>*
  %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3
  %16 = fmul <4 x float> %15, %15
  %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000>
  %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17)
  %19 = bitcast float* %13 to <4 x float>*
  store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8
  %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4
  %21 = bitcast i8** %20 to float**
  %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2
  %23 = getelementptr inbounds i8*, i8** %buffer_table, i64 2
  %24 = bitcast i8** %23 to [3 x [1 x float]]**
  %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2
  %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2
  %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3
  %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0
  %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3
  %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>*
  %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3
  %30 = insertelement <2 x float> undef, float %27, i32 0
  %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer
  %32 = fsub <2 x float> %31, %29
  %33 = fmul <2 x float> %32, %32
  %shuffle30 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>
  %34 = fsub float %27, %.pre29
  %35 = fmul float %34, %34
  %36 = insertelement <4 x float> undef, float %35, i32 0
  %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer
  %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %38 = fmul <4 x float> %7, %7
  %shuffle31 = shufflevector <4 x float> %38, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %39 = fdiv <8 x float> %shuffle30, %shuffle31
  %40 = fadd <8 x float> %shuffle, %39
  %41 = fmul <8 x float> %40, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %42 = bitcast i8* %26 to <8 x float>*
  store <8 x float> %41, <8 x float>* %42, align 8, !alias.scope !8, !noalias !12
  %43 = getelementptr inbounds i8, i8* %26, i64 32
  %44 = fdiv <4 x float> %37, %38
  %45 = fadd <4 x float> %10, %44
  %46 = fmul <4 x float> %45, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %47 = bitcast i8* %43 to <4 x float>*
  store <4 x float> %46, <4 x float>* %47, align 8, !alias.scope !8, !noalias !12
  %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0
  %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0
  %48 = bitcast float* %.phi.trans.insert to <4 x float>*
  %49 = load <4 x float>, <4 x float>* %48, align 8, !alias.scope !7, !noalias !8
  %50 = bitcast float* %.phi.trans.insert12 to <4 x float>*
  %51 = load <4 x float>, <4 x float>* %50, align 8, !invariant.load !0, !noalias !3
  %shuffle.1 = shufflevector <4 x float> %49, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %52 = getelementptr inbounds i8, i8* %26, i64 48
  %53 = fmul <4 x float> %51, %51
  %shuffle31.1 = shufflevector <4 x float> %53, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %54 = fdiv <8 x float> %shuffle30, %shuffle31.1
  %55 = fadd <8 x float> %shuffle.1, %54
  %56 = fmul <8 x float> %55, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %57 = bitcast i8* %52 to <8 x float>*
  store <8 x float> %56, <8 x float>* %57, align 8, !alias.scope !8, !noalias !12
  %58 = getelementptr inbounds i8, i8* %26, i64 80
  %59 = fdiv <4 x float> %37, %53
  %60 = fadd <4 x float> %49, %59
  %61 = fmul <4 x float> %60, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %62 = bitcast i8* %58 to <4 x float>*
  store <4 x float> %61, <4 x float>* %62, align 8, !alias.scope !8, !noalias !12
  %63 = getelementptr inbounds i8*, i8** %buffer_table, i64 3
  %64 = bitcast i8** %63 to [1 x i8*]**
  %65 = load [1 x i8*]*, [1 x i8*]** %64, align 8, !invariant.load !0, !dereferenceable !2, !align !2
  %66 = getelementptr inbounds [1 x i8*], [1 x i8*]* %65, i64 0, i64 0
  store i8* %26, i8** %66, align 8, !alias.scope !14, !noalias !8
  ret void
}

; Function Attrs: nounwind readnone speculatable willreturn
declare <4 x float> @llvm.log.v4f32(<4 x float>) #1

attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" }
attributes #1 = { nounwind readnone speculatable willreturn }

!0 = !{}
!1 = !{i64 32}
!2 = !{i64 8}
!3 = !{!4, !6}
!4 = !{!"buffer: {index:0, offset:0, size:96}", !5}
!5 = !{!"XLA global AA domain"}
!6 = !{!"buffer: {index:5, offset:0, size:32}", !5}
!7 = !{!6}
!8 = !{!4}
!9 = !{i64 4}
!10 = !{i64 12}
!11 = !{i64 96}
!12 = !{!13, !6}
!13 = !{!"buffer: {index:3, offset:0, size:8}", !5}
!14 = !{!13}

```

and (incorrectly) optimized to the one below with the change:

```
; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

; Function Attrs: nofree nounwind uwtable
define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 {
entry:
  %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1
  %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]**
  %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5
  %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]**
  %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>*
  %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3
  %8 = fmul <4 x float> %7, %7
  %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000>
  %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9)
  %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>*
  store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8
  %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0
  %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0
  %14 = bitcast float* %12 to <4 x float>*
  %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3
  %16 = fmul <4 x float> %15, %15
  %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000>
  %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17)
  %19 = bitcast float* %13 to <4 x float>*
  store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8
  %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4
  %21 = bitcast i8** %20 to float**
  %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2
  %23 = getelementptr inbounds i8*, i8** %buffer_table, i64 2
  %24 = bitcast i8** %23 to [3 x [1 x float]]**
  %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2
  %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2
  %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3
  %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0
  %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3
  %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>*
  %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3
  %30 = insertelement <2 x float> undef, float %27, i32 0
  %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer
  %32 = fsub <2 x float> %31, %29
  %33 = fmul <2 x float> %32, %32
  %shuffle32 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>
  %34 = fsub float %27, %.pre29
  %35 = fmul float %34, %34
  %36 = insertelement <4 x float> undef, float %35, i32 0
  %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer
  %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %38 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 0, i64 0, i64 3
  %39 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 0, i64 0, i64 3
  %40 = fmul <4 x float> %7, %7
  %41 = shufflevector <4 x float> %40, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
  %42 = fdiv <8 x float> %shuffle32, %41
  %43 = fadd <8 x float> %shuffle, %42
  %44 = fmul <8 x float> %43, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %45 = bitcast i8* %26 to <8 x float>*
  store <8 x float> %44, <8 x float>* %45, align 8, !alias.scope !8, !noalias !12
  %46 = extractelement <4 x float> %10, i32 0
  %47 = getelementptr inbounds i8, i8* %26, i64 32
  %48 = extractelement <4 x float> %10, i32 1
  %49 = extractelement <4 x float> %10, i32 2
  %50 = load float, float* %38, align 4, !alias.scope !7, !noalias !8
  %51 = load float, float* %39, align 4, !invariant.load !0, !noalias !3
  %52 = fmul float %51, %51
  %53 = insertelement <4 x float> undef, float %52, i32 3
  %54 = fdiv <4 x float> %37, %53
  %55 = insertelement <4 x float> undef, float %46, i32 0
  %56 = insertelement <4 x float> %55, float %48, i32 1
  %57 = insertelement <4 x float> %56, float %49, i32 2
  %58 = insertelement <4 x float> %57, float %50, i32 3
  %59 = fadd <4 x float> %58, %54
  %60 = fmul <4 x float> %59, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %61 = bitcast i8* %47 to <4 x float>*
  store <4 x float> %60, <4 x float>* %61, align 8, !alias.scope !8, !noalias !12
  %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0
  %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0
  %62 = bitcast float* %.phi.trans.insert to <4 x float>*
  %63 = load <4 x float>, <4 x float>* %62, align 8, !alias.scope !7, !noalias !8
  %64 = bitcast float* %.phi.trans.insert12 to <4 x float>*
  %65 = load <4 x float>, <4 x float>* %64, align 8, !invariant.load !0, !noalias !3
  %shuffle.1 = shufflevector <4 x float> %63, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %66 = getelementptr inbounds i8, i8* %26, i64 48
  %67 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 3
  %68 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 3
  %69 = fmul <4 x float> %65, %65
  %70 = shufflevector <4 x float> %69, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %71 = fdiv <8 x float> %shuffle32, %70
  %72 = fadd <8 x float> %shuffle.1, %71
  %73 = fmul <8 x float> %72, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %74 = bitcast i8* %66 to <8 x float>*
  store <8 x float> %73, <8 x float>* %74, align 8, !alias.scope !8, !noalias !12
  %75 = extractelement <4 x float> %69, i32 0
  %76 = extractelement <4 x float> %63, i32 0
  %77 = getelementptr inbounds i8, i8* %26, i64 80
  %78 = extractelement <4 x float> %69, i32 1
  %79 = extractelement <4 x float> %63, i32 1
  %80 = extractelement <4 x float> %69, i32 2
  %81 = extractelement <4 x float> %63, i32 2
  %82 = load float, float* %67, align 4, !alias.scope !7, !noalias !8
  %83 = load float, float* %68, align 4, !invariant.load !0, !noalias !3
  %84 = fmul float %83, %83
  %85 = insertelement <4 x float> undef, float %75, i32 0
  %86 = insertelement <4 x float> %85, float %78, i32 1
  %87 = insertelement <4 x float> %86, float %80, i32 2
  %88 = insertelement <4 x float> %87, float %84, i32 3
  %89 = fdiv <4 x float> %37, %88
  %90 = insertelement <4 x float> undef, float %76, i32 0
  %91 = insertelement <4 x float> %90, float %79, i32 1
  %92 = insertelement <4 x float> %91, float %81, i32 2
  %93 = insertelement <4 x float> %92, float %82, i32 3
  %94 = fadd <4 x float> %93, %89
  %95 = fmul <4 x float> %94, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %96 = bitcast i8* %77 to <4 x float>*
  store <4 x float> %95, <4 x float>* %96, align 8, !alias.scope !8, !noalias !12
  %97 = getelementptr inbounds i8*, i8** %buffer_table, i64 3
  %98 = bitcast i8** %97 to [1 x i8*]**
  %99 = load [1 x i8*]*, [1 x i8*]** %98, align 8, !invariant.load !0, !dereferenceable !2, !align !2
  %100 = getelementptr inbounds [1 x i8*], [1 x i8*]* %99, i64 0, i64 0
  store i8* %26, i8** %100, align 8, !alias.scope !14, !noalias !8
  ret void
}

; Function Attrs: nounwind readnone speculatable willreturn
declare <4 x float> @llvm.log.v4f32(<4 x float>) #1

attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" }
attributes #1 = { nounwind readnone speculatable willreturn }

!0 = !{}
!1 = !{i64 32}
!2 = !{i64 8}
!3 = !{!4, !6}
!4 = !{!"buffer: {index:0, offset:0, size:96}", !5}
!5 = !{!"XLA global AA domain"}
!6 = !{!"buffer: {index:5, offset:0, size:32}", !5}
!7 = !{!6}
!8 = !{!4}
!9 = !{i64 4}
!10 = !{i64 12}
!11 = !{i64 96}
!12 = !{!13, !6}
!13 = !{!"buffer: {index:3, offset:0, size:8}", !5}
!14 = !{!13}

```

This results in bad numerical answers when used through XLA.
Again, it's not that easy to give a small fully-reproducible example, but the misscompare is:

```
Expected literal:
(
f32[2,3,4] {
{
  { nan, -inf, -3181.35, -inf },
  { nan, -inf, -28.2577019, -inf },
  { nan, -inf, -28.2577019, -inf }
},
{
  { -inf, -inf, -inf, -inf },
  { -6.60753046e+28, -1.47314833e+23, -inf, -inf },
  { -2.43504347e+30, -5.42892693e+24, -inf, -inf }
}
}
)

Actual literal:
(
f32[2,3,4] {
{
  { nan, -inf, -3181.35, -inf },
  { nan, -inf, -inf, -inf },
  { inf, -inf, -28.2577019, -inf }
},
{
  { -inf, -inf, -inf, -inf },
  { -6.60753046e+28, -1.47314833e+23, -inf, -inf },
  { -2.43504347e+30, -5.42892693e+24, -inf, -inf }
}
}
)
```

Reviewers: sanjoy.google, sanjoy, ebrevnov, jdoerfert, reames, chandlerc

Subscribers: hiraditya, Charusso, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70516
tstellar pushed a commit that referenced this pull request Dec 5, 2019
This patch adds the following intrinsics for gather loads with 64-bit offsets:
      * @llvm.aarch64.sve.ld1.gather (unscaled offset)
      * @llvm.aarch64.sve.ld1.gather.index (scaled offset)

These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words):
      * ld1h { z0.d }, p0/z, [x0, z0.d]
      * ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1]

Committing on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70542
tstellar pushed a commit that referenced this pull request Dec 5, 2019
This patch adds intrinsics for SVE gather loads for which the offsets are 32-bits wide and are:
* unscaled
  * @llvm.aarch64.sve.ld1.gather.sxtw
  * @llvm.aarch64.sve.ld1.gather.uxtw
* scaled (offsets become indices)
  * @llvm.arch64.sve.ld1.gather.sxtw.index
  * @llvm.arch64.sve.ld1.gather.uxtw.index
The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits.

These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words):
* unscaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw]
* scaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1]

Committed on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma

Reviewed By: sdesmalen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70782
tstellar pushed a commit that referenced this pull request Feb 1, 2021
… disabled for a PCH vs a module file

This addresses an issue with how the PCH preable works, specifically:

1. When using a PCH/preamble the module hash changes and a different cache directory is used
2. When the preamble is used, PCH & PCM validation is disabled.

Due to combination of #1 and #2, reparsing with preamble enabled can end up loading a stale module file before a header change and using it without updating it because validation is disabled and it doesn’t check that the header has changed and the module file is out-of-date.

rdar://72611253

Differential Revision: https://reviews.llvm.org/D95159
tstellar pushed a commit that referenced this pull request Apr 5, 2021
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
github-actions bot pushed a commit that referenced this pull request Aug 4, 2021
In the latest Linux kernels synchronous tag faults
include the tag bits in their address.
This change adds logical and allocation tags to the
description of synchronous tag faults.
(asynchronous faults have no address)

Process 1626 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSEGV: sync tag check fault (fault address: 0x900fffff7ff9010 logical tag: 0x9 allocation tag: 0x0)

This extends the existing description and will
show as much as it can on the rare occasion something
fails.

This change supports AArch64 MTE only but other
architectures could be added by extending the
switch at the start of AnnotateSyncTagCheckFault.
The rest of the function is generic code.

Tests have been added for synchronous and asynchronous
MTE faults.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D105178

(cherry picked from commit d510b5f)
github-actions bot pushed a commit that referenced this pull request Oct 27, 2021
This patch re-introduces the fix in the commit llvm@66b0cebf7f736 by @yrnkrn

> In DwarfEHPrepare, after all passes are run, RewindFunction may be a dangling
>
> pointer to a dead function. To make sure it's valid, doFinalization nullptrs
> RewindFunction just like the constructor and so it will be found on next run.
>
> llvm-svn: 217737

It seems that the fix was not migrated to `DwarfEHPrepareLegacyPass`.

This patch also updates `llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` to include `-run-twice` to exercise the cleanup. Without this patch `llvm-lit -v llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` fails with

```
-- Testing: 1 tests, 1 workers --
FAIL: LLVM :: CodeGen/X86/dwarf-eh-prepare.ll (1 of 1)
******************** TEST 'LLVM :: CodeGen/X86/dwarf-eh-prepare.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice < /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll -S | /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll
--
Exit Code: 2

Command Output (stderr):
--
Referencing function in another module!
  call void @_Unwind_Resume(i8* %ehptr) #1
; ModuleID = '<stdin>'
void (i8*)* @_Unwind_Resume
; ModuleID = '<stdin>'
in function simple_cleanup_catch
LLVM ERROR: Broken function found, compilation aborted!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice -S
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Module Verifier' on function '@simple_cleanup_catch'
 #0 0x000056121b570a2c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:0
 #1 0x000056121b56eb64 llvm::sys::RunSignalHandlers() /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Signals.cpp:97:0
 #2 0x000056121b56f28e SignalHandler(int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:397:0
 #3 0x00007fc7e9b22980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #4 0x00007fc7e87d3fb7 raise /build/glibc-S7xCS9/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #5 0x00007fc7e87d5921 abort /build/glibc-S7xCS9/glibc-2.27/stdlib/abort.c:81:0
 #6 0x000056121b4e1386 llvm::raw_svector_ostream::raw_svector_ostream(llvm::SmallVectorImpl<char>&) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:674:0
 #7 0x000056121b4e1386 llvm::report_fatal_error(llvm::Twine const&, bool) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/ErrorHandling.cpp:114:0
 #8 0x000056121b4e1528 (/home/arakaki/build/llvm-project/main/bin/opt+0x29e3528)
 #9 0x000056121adfd03f llvm::raw_ostream::operator<<(llvm::StringRef) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:218:0
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll

--

********************
********************
Failed Tests (1):
  LLVM :: CodeGen/X86/dwarf-eh-prepare.ll

Testing Time: 0.22s
  Failed: 1
```

Reviewed By: loladiro

Differential Revision: https://reviews.llvm.org/D110979

(cherry picked from commit e8806d7)
github-actions bot pushed a commit that referenced this pull request Nov 2, 2021
This patch re-introduces the fix in the commit llvm@66b0cebf7f736 by @yrnkrn

> In DwarfEHPrepare, after all passes are run, RewindFunction may be a dangling
>
> pointer to a dead function. To make sure it's valid, doFinalization nullptrs
> RewindFunction just like the constructor and so it will be found on next run.
>
> llvm-svn: 217737

It seems that the fix was not migrated to `DwarfEHPrepareLegacyPass`.

This patch also updates `llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` to include `-run-twice` to exercise the cleanup. Without this patch `llvm-lit -v llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` fails with

```
-- Testing: 1 tests, 1 workers --
FAIL: LLVM :: CodeGen/X86/dwarf-eh-prepare.ll (1 of 1)
******************** TEST 'LLVM :: CodeGen/X86/dwarf-eh-prepare.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice < /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll -S | /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll
--
Exit Code: 2

Command Output (stderr):
--
Referencing function in another module!
  call void @_Unwind_Resume(i8* %ehptr) #1
; ModuleID = '<stdin>'
void (i8*)* @_Unwind_Resume
; ModuleID = '<stdin>'
in function simple_cleanup_catch
LLVM ERROR: Broken function found, compilation aborted!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice -S
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Module Verifier' on function '@simple_cleanup_catch'
 #0 0x000056121b570a2c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:0
 #1 0x000056121b56eb64 llvm::sys::RunSignalHandlers() /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Signals.cpp:97:0
 #2 0x000056121b56f28e SignalHandler(int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:397:0
 #3 0x00007fc7e9b22980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #4 0x00007fc7e87d3fb7 raise /build/glibc-S7xCS9/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #5 0x00007fc7e87d5921 abort /build/glibc-S7xCS9/glibc-2.27/stdlib/abort.c:81:0
 #6 0x000056121b4e1386 llvm::raw_svector_ostream::raw_svector_ostream(llvm::SmallVectorImpl<char>&) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:674:0
 #7 0x000056121b4e1386 llvm::report_fatal_error(llvm::Twine const&, bool) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/ErrorHandling.cpp:114:0
 #8 0x000056121b4e1528 (/home/arakaki/build/llvm-project/main/bin/opt+0x29e3528)
 #9 0x000056121adfd03f llvm::raw_ostream::operator<<(llvm::StringRef) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:218:0
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll

--

********************
********************
Failed Tests (1):
  LLVM :: CodeGen/X86/dwarf-eh-prepare.ll

Testing Time: 0.22s
  Failed: 1
```

Reviewed By: loladiro

Differential Revision: https://reviews.llvm.org/D110979

(cherry picked from commit e8806d7)
tstellar pushed a commit that referenced this pull request Feb 10, 2022
llvm.insertvalue and llvm.extractvalue need LLVM primitive type
for the indexing operands. While upstreaming the TargetRewrite pass the change
was made from i32 to index without knowing this restriction. This patch reverts
back the types used for indexing in the two ops created in this pass.

the error you will receive when lowering to LLVM IR with the current code
is the following:

```
 'llvm.insertvalue' op operand #1 must be primitive LLVM type, but got 'index'
```

Reviewed By: jeanPerier, schweitz

Differential Revision: https://reviews.llvm.org/D119253
tstellar pushed a commit that referenced this pull request Feb 15, 2022
There is a clangd crash at `__memcmp_avx2_movbe`. Short problem description is below.

The method `HeaderIncludes::addExistingInclude` stores `Include` objects by reference at 2 places: `ExistingIncludes` (primary storage) and `IncludesByPriority` (pointer to the object's location at ExistingIncludes). `ExistingIncludes` is a map where value is a `SmallVector`. A new element is inserted by `push_back`. The operation might do resize. As result pointers stored at `IncludesByPriority` might become invalid.

Typical stack trace
```
    frame #0: 0x00007f11460dcd94 libc.so.6`__memcmp_avx2_movbe + 308
    frame #1: 0x00000000004782b8 clangd`llvm::StringRef::compareMemory(Lhs="
\"t2.h\"", Rhs="", Length=6) at StringRef.h:76:22
    frame #2: 0x0000000000701253 clangd`llvm::StringRef::compare(this=0x0000
7f10de7d8610, RHS=(Data = "", Length = 7166742329480737377)) const at String
Ref.h:206:34
  * frame #3: 0x00000000007603ab clangd`llvm::operator<(llvm::StringRef, llv
m::StringRef)(LHS=(Data = "\"t2.h\"", Length = 6), RHS=(Data = "", Length =
7166742329480737377)) at StringRef.h:907:23
    frame #4: 0x0000000002d0ad9f clangd`clang::tooling::HeaderIncludes::inse
rt(this=0x00007f10de7fb1a0, IncludeName=(Data = "t2.h\"", Length = 4), IsAng
led=false) const at HeaderIncludes.cpp:365:22
    frame #5: 0x00000000012ebfdd clangd`clang::clangd::IncludeInserter::inse
rt(this=0x00007f10de7fb148, VerbatimHeader=(Data = "\"t2.h\"", Length = 6))
const at Headers.cpp:262:70
```

A unit test test for the crash was created (`HeaderIncludesTest.RepeatedIncludes`). The proposed solution is to use std::list instead of llvm::SmallVector

Test Plan
```
./tools/clang/unittests/Tooling/ToolingTests --gtest_filter=HeaderIncludesTest.RepeatedIncludes
```

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D118755
tstellar pushed a commit that referenced this pull request Feb 15, 2022
A LUI instruction with flag RISCVII::MO_HI is usually used in conjunction
with ADDI, and jointly complete address computation. To bind the cost
evaluation of address computation, the LUI should not be regarded as a cheap
 move separately, which is consistent with ADDI.

In this test case, it improves the unroll-loop code that the rematerialization
of array's base address miss MachineCSE with Heuristics #1 at isProfitableToCSE.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D118216
tstellar pushed a commit that referenced this pull request Feb 15, 2022
We experienced some deadlocks when we used multiple threads for logging
using `scan-builds` intercept-build tool when we used multiple threads by
e.g. logging `make -j16`

```
(gdb) bt
#0  0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f2bb3d152e4 in ?? ()
#3  0x00007ffcc5f0cc80 in ?? ()
#4  0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2
#5  0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f2bb3d144ee in ?? ()
#8  0x746e692f706d742f in ?? ()
#9  0x692d747065637265 in ?? ()
#10 0x2f653631326b3034 in ?? ()
#11 0x646d632e35353532 in ?? ()
#12 0x0000000000000000 in ?? ()
```

I think the gcc's exit call caused the injected `libear.so` to be unloaded
by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`.
That tried to acquire an already locked mutex which was left locked in the
`bear_report_call()` call, that probably encountered some error and
returned early when it forgot to unlock the mutex.

All of these are speculation since from the backtrace I could not verify
if frames 2 and 3 are in fact corresponding to the `libear.so` module.
But I think it's a fairly safe bet.

So, hereby I'm releasing the held mutex on *all paths*, even if some failure
happens.

PS: I would use lock_guards, but it's C.

Reviewed-by: NoQ

Differential Revision: https://reviews.llvm.org/D118439

(cherry picked from commit d919d02)
tstellar pushed a commit that referenced this pull request Jul 28, 2022
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.

Sample output:

```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
   26   };
   27
   28   int main() {
-> 29     std::vector<int> vvv;
   30     for (int i = 0; i < 100; i++)
   31       vvv.push_back(i);
   32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
    frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
   57     map<int, int> m;
   58     m[3] = 4;
   59
-> 60     map<string, string> m2;
   61     m2["5"] = "6";
   62
   63     std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
    0: [379567.000 ns] (event) HW clock tick [48599428476224707]
    1: [379569.000 ns] (event) CPU core changed [new CPU=2]
    2: [390487.000 ns] (event) HW clock tick [48599428476246495]
    3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
    4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
  libc.so.6`malloc
    5: [1662746.995 ns] 0x00007ffff7176660    endbr64
    6: [1662748.991 ns] 0x00007ffff7176664    movq   0x32387d(%rip), %rax      ;  + 408
    7: [1662750.986 ns] 0x00007ffff717666b    pushq  %r12
    8: [1662752.981 ns] 0x00007ffff717666d    pushq  %rbp
    9: [1662754.977 ns] 0x00007ffff717666e    pushq  %rbx
    10: [1662756.972 ns] 0x00007ffff717666f    movq   (%rax), %rax
    11: [1662758.967 ns] 0x00007ffff7176672    testq  %rax, %rax
    12: [1662760.963 ns] 0x00007ffff7176675    jne    0x9c7e0                   ; <+384>
    13: [1662762.958 ns] 0x00007ffff717667b    leaq   0x17(%rdi), %rax
    14: [1662764.953 ns] 0x00007ffff717667f    cmpq   $0x1f, %rax
    15: [1662766.949 ns] 0x00007ffff7176683    ja     0x9c730                   ; <+208>
    16: [1662768.944 ns] 0x00007ffff7176730    andq   $-0x10, %rax
    17: [1662770.939 ns] 0x00007ffff7176734    cmpq   $-0x41, %rax
    18: [1662772.935 ns] 0x00007ffff7176738    seta   %dl
    19: [1662774.930 ns] 0x00007ffff717673b    jmp    0x9c690                   ; <+48>
    20: [1662776.925 ns] 0x00007ffff7176690    cmpq   %rdi, %rax
    21: [1662778.921 ns] 0x00007ffff7176693    jb     0x9c7b0                   ; <+336>
    22: [1662780.916 ns] 0x00007ffff7176699    testb  %dl, %dl
    23: [1662782.911 ns] 0x00007ffff717669b    jne    0x9c7b0                   ; <+336>
    24: [1662784.906 ns] 0x00007ffff71766a1    movq   0x3236c0(%rip), %r12      ;  + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
  {
    "id": 0,
    "timestamp_ns": "379567.000000",
    "event": "HW clock tick",
    "hwClock": 48599428476224707
  },
  {
    "id": 1,
    "timestamp_ns": "379569.000000",
    "event": "CPU core changed",
    "cpuId": 2
  },
  {
    "id": 2,
    "timestamp_ns": "390487.000000",
    "event": "HW clock tick",
    "hwClock": 48599428476246495
  },
  {
    "id": 3,
    "timestamp_ns": "1602508.000000",
    "event": "HW clock tick",
    "hwClock": 48599428478664855
  },
  {
    "id": 4,
    "timestamp_ns": "1662745.000000",
    "event": "HW clock tick",
    "hwClock": 48599428478785046
  },
  {
    "id": 5,
    "timestamp_ns": "1662746.995324",
    "loadAddress": "0x7ffff7176660",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "endbr64"
  },
  {
    "id": 6,
    "timestamp_ns": "1662748.990648",
    "loadAddress": "0x7ffff7176664",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "movq"
  },
  {
    "id": 7,
    "timestamp_ns": "1662750.985972",
    "loadAddress": "0x7ffff717666b",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "pushq"
  },
  {
    "id": 8,
    "timestamp_ns": "1662752.981296",
    "loadAddress": "0x7ffff717666d",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "pushq"
  }
]
```

Differential Revision: https://reviews.llvm.org/D130054
tstellar pushed a commit that referenced this pull request Jul 28, 2022
Refactor the string conversion of the `lldb::InstructionControlFlowKind` enum out
of `Instruction::Dump` to enable reuse of this logic by the
JSON TraceDumper (to be implemented in separate diff).

Will coordinate the landing of this change with D130320 since there will be a minor merge conflict between
these changes.

Test Plan:
Run unittests
```
> ninja check-lldb
[4/5] Running lldb unit test suite

Testing Time: 10.13s
  Passed: 1084
```

Verify '-k' flag's output
```
(lldb) thread trace dump instructions -k
thread #1: tid = 1375377
  libstdc++.so.6`std::ostream::flush() + 43
    7048: 0x00007ffff7b54dab    return      retq
    7047: 0x00007ffff7b54daa    other       popq   %rbx
    7046: 0x00007ffff7b54da7    other       movq   %rbx, %rax
    7045: 0x00007ffff7b54da5    cond jump   je     0x11adb0                  ; <+48>
    7044: 0x00007ffff7b54da2    other       cmpl   $-0x1, %eax
  libc.so.6`_IO_fflush + 249
    7043: 0x00007ffff7161729    return      retq
    7042: 0x00007ffff7161728    other       popq   %rbp
    7041: 0x00007ffff7161727    other       popq   %rbx
    7040: 0x00007ffff7161725    other       movl   %edx, %eax
    7039: 0x00007ffff7161721    other       addq   $0x8, %rsp
    7038: 0x00007ffff7161709    cond jump   je     0x87721                   ; <+241>
    7037: 0x00007ffff7161707    other       decl   (%rsi)
    7036: 0x00007ffff71616fe    cond jump   je     0x87707                   ; <+215>
    7035: 0x00007ffff71616f7    other       cmpl   $0x0, 0x33de92(%rip)      ; __libc_multiple_threads
    7034: 0x00007ffff71616ef    other       movq   $0x0, 0x8(%rsi)
    7033: 0x00007ffff71616ed    cond jump   jne    0x87721                   ; <+241>
    7032: 0x00007ffff71616e9    other       subl   $0x1, 0x4(%rsi)
    7031: 0x00007ffff71616e2    other       movq   0x88(%rbx), %rsi
    7030: 0x00007ffff71616e0    cond jump   jne    0x87721                   ; <+241>
    7029: 0x00007ffff71616da    other       testl  $0x8000, (%rbx)           ; imm = 0x8000
```

Differential Revision: https://reviews.llvm.org/D130580
tstellar pushed a commit that referenced this pull request Aug 8, 2022
This reverts commit 5fb4134.

This patch is causing crashes when building llvm-test-suite when
optimizing for CPUs with AVX512.

Reproducer crashing with llc:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-apple-macosx"

    define i32 @test(<32 x i32> %0) #0 {
    entry:
      %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
      %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1)
      ret i32 %2
    }

    ; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
    declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

    attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" }
    attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }

(cherry picked from commit f912bab)
tstellar pushed a commit that referenced this pull request Aug 9, 2022
This reverts commit 5fb4134.

This patch is causing crashes when building llvm-test-suite when
optimizing for CPUs with AVX512.

Reproducer crashing with llc:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-apple-macosx"

    define i32 @test(<32 x i32> %0) #0 {
    entry:
      %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
      %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1)
      ret i32 %2
    }

    ; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
    declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

    attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" }
    attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }
tstellar pushed a commit that referenced this pull request Aug 9, 2022
This diff uncovers an ASAN leak in getOrCreateJumpTable:
```
Indirect leak of 264 byte(s) in 1 object(s) allocated from:
    #1 0x4f6e48c in llvm::bolt::BinaryContext::getOrCreateJumpTable ...
```
The removal of an assertion needs to be accompanied by proper deallocation of
a `JumpTable` object for which `analyzeJumpTable` was unsuccessful.

This reverts commit 52cd00c.
tstellar pushed a commit that referenced this pull request Dec 21, 2023
We'd like a way to select the current thread by its thread ID (rather
than its internal LLDB thread index).

This PR adds a `-t` option (`--thread_id` long option) that tells the
`thread select` command to interpret the `<thread-index>` argument as a
thread ID.

Here's an example of it working:
```
michristensen@devbig356 llvm/llvm-project (thread-select-tid) » ../Debug/bin/lldb ~/scratch/cpp/threading/a.out
(lldb) target create "/home/michristensen/scratch/cpp/threading/a.out"
Current executable set to '/home/michristensen/scratch/cpp/threading/a.out' (x86_64).
(lldb) b 18
Breakpoint 1: where = a.out`main + 80 at main.cpp:18:12, address = 0x0000000000000850
(lldb) run
Process 215715 launched: '/home/michristensen/scratch/cpp/threading/a.out' (x86_64)
This is a thread, i=1
This is a thread, i=2
This is a thread, i=3
This is a thread, i=4
This is a thread, i=5
Process 215715 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000555555400850 a.out`main at main.cpp:18:12
   15     for (int i = 0; i < 5; i++) {
   16       pthread_create(&thread_ids[i], NULL, foo, NULL);
   17     }
-> 18     for (int i = 0; i < 5; i++) {
   19       pthread_join(thread_ids[i], NULL);
   20     }
   21     return 0;
(lldb) thread select 2
* thread #2, name = 'a.out'
    frame #0: 0x00007ffff68f9918 libc.so.6`__nanosleep + 72
libc.so.6`__nanosleep:
->  0x7ffff68f9918 <+72>: cmpq   $-0x1000, %rax ; imm = 0xF000
    0x7ffff68f991e <+78>: ja     0x7ffff68f9952 ; <+130>
    0x7ffff68f9920 <+80>: movl   %edx, %edi
    0x7ffff68f9922 <+82>: movl   %eax, 0xc(%rsp)
(lldb) thread info
thread #2: tid = 216047, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out'

(lldb) thread list
Process 215715 stopped
  thread #1: tid = 215715, 0x0000555555400850 a.out`main at main.cpp:18:12, name = 'a.out', stop reason = breakpoint 1.1
* thread #2: tid = 216047, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out'
  thread #3: tid = 216048, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out'
  thread #4: tid = 216049, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out'
  thread #5: tid = 216050, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out'
  thread #6: tid = 216051, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out'
(lldb) thread select 215715
error: invalid thread #215715.
(lldb) thread select -t 215715
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000555555400850 a.out`main at main.cpp:18:12
   15     for (int i = 0; i < 5; i++) {
   16       pthread_create(&thread_ids[i], NULL, foo, NULL);
   17     }
-> 18     for (int i = 0; i < 5; i++) {
   19       pthread_join(thread_ids[i], NULL);
   20     }
   21     return 0;
(lldb) thread select -t 216051
* thread #6, name = 'a.out'
    frame #0: 0x00007ffff68f9918 libc.so.6`__nanosleep + 72
libc.so.6`__nanosleep:
->  0x7ffff68f9918 <+72>: cmpq   $-0x1000, %rax ; imm = 0xF000
    0x7ffff68f991e <+78>: ja     0x7ffff68f9952 ; <+130>
    0x7ffff68f9920 <+80>: movl   %edx, %edi
    0x7ffff68f9922 <+82>: movl   %eax, 0xc(%rsp)
(lldb) thread select 3
* thread #3, name = 'a.out'
    frame #0: 0x00007ffff68f9918 libc.so.6`__nanosleep + 72
libc.so.6`__nanosleep:
->  0x7ffff68f9918 <+72>: cmpq   $-0x1000, %rax ; imm = 0xF000
    0x7ffff68f991e <+78>: ja     0x7ffff68f9952 ; <+130>
    0x7ffff68f9920 <+80>: movl   %edx, %edi
    0x7ffff68f9922 <+82>: movl   %eax, 0xc(%rsp)
(lldb) thread select -t 216048
* thread #3, name = 'a.out'
    frame #0: 0x00007ffff68f9918 libc.so.6`__nanosleep + 72
libc.so.6`__nanosleep:
->  0x7ffff68f9918 <+72>: cmpq   $-0x1000, %rax ; imm = 0xF000
    0x7ffff68f991e <+78>: ja     0x7ffff68f9952 ; <+130>
    0x7ffff68f9920 <+80>: movl   %edx, %edi
    0x7ffff68f9922 <+82>: movl   %eax, 0xc(%rsp)
(lldb) thread select --thread_id 216048
* thread #3, name = 'a.out'
    frame #0: 0x00007ffff68f9918 libc.so.6`__nanosleep + 72
libc.so.6`__nanosleep:
->  0x7ffff68f9918 <+72>: cmpq   $-0x1000, %rax ; imm = 0xF000
    0x7ffff68f991e <+78>: ja     0x7ffff68f9952 ; <+130>
    0x7ffff68f9920 <+80>: movl   %edx, %edi
    0x7ffff68f9922 <+82>: movl   %eax, 0xc(%rsp)
(lldb) help thread select
Change the currently selected thread.

Syntax: thread select <cmd-options> <thread-index>

Command Options Usage:
  thread select [-t] <thread-index>

       -t ( --thread_id )
            Provide a thread ID instead of a thread index.

     This command takes options and free-form arguments.  If your arguments
     resemble option specifiers (i.e., they start with a - or --), you must use
     ' -- ' between the end of the command options and the beginning of the
     arguments.
(lldb) c
Process 215715 resuming
Process 215715 exited with status = 0 (0x00000000)
```
tstellar pushed a commit that referenced this pull request Jan 4, 2024
This has been flaky for a while, for example
https://lab.llvm.org/buildbot/#/builders/96/builds/50350

```
Command Output (stdout):
--
lldb version 18.0.0git (https://github.com/llvm/llvm-project.git revision 3974d89)
  clang revision 3974d89
  llvm revision 3974d89
"can't evaluate expressions when the process is running."
```

```
  PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
   #0 0x0000ffffa46191a0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x529a1a0)
   #1 0x0000ffffa4617144 llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x5298144)
   #2 0x0000ffffa46198d0 SignalHandler(int) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x529a8d0)
   #3 0x0000ffffab25b7dc (linux-vdso.so.1+0x7dc)
   #4 0x0000ffffab13d050 /build/glibc-Q8DG8B/glibc-2.31/string/../sysdeps/aarch64/multiarch/memcpy_advsimd.S:92:0
   #5 0x0000ffffa446f420 lldb_private::process_gdb_remote::GDBRemoteRegisterContext::PrivateSetRegisterValue(unsigned int, llvm::ArrayRef<unsigned char>) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x50f0420)
   #6 0x0000ffffa446f7b8 lldb_private::process_gdb_remote::GDBRemoteRegisterContext::GetPrimordialRegister(lldb_private::RegisterInfo const*, lldb_private::process_gdb_remote::GDBRemoteCommunicationClient&) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x50f07b8)
   #7 0x0000ffffa446f308 lldb_private::process_gdb_remote::GDBRemoteRegisterContext::ReadRegisterBytes(lldb_private::RegisterInfo const*) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x50f0308)
   #8 0x0000ffffa446ec1c lldb_private::process_gdb_remote::GDBRemoteRegisterContext::ReadRegister(lldb_private::RegisterInfo const*, lldb_private::RegisterValue&) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x50efc1c)
   #9 0x0000ffffa412eaa4 lldb_private::RegisterContext::ReadRegisterAsUnsigned(lldb_private::RegisterInfo const*, unsigned long) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4dafaa4)
  #10 0x0000ffffa420861c ReadLinuxProcessAddressMask(std::shared_ptr<lldb_private::Process>, llvm::StringRef) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4e8961c)
  #11 0x0000ffffa4208430 ABISysV_arm64::FixCodeAddress(unsigned long) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4e89430)
```

Judging by the backtrace something is trying to read the pointer authentication address/code mask
registers. This explains why I've not seen this issue locally, as the buildbot runs on Graviton
3 with has the pointer authentication extension.

I will try to reproduce, fix and re-enable the test.
tstellar pushed a commit that referenced this pull request Jan 4, 2024
This PR adds support for thread names in lldb on Windows.

```
(lldb) thr list
Process 2960 stopped
  thread #53: tid = 0x03a0, 0x00007ff84582db34 ntdll.dll`NtWaitForMultipleObjects + 20
  thread #29: tid = 0x04ec, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'SPUW.6'
  thread #89: tid = 0x057c, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'PPU[0x1000019] physics[main]'
  thread #3: tid = 0x0648, 0x00007ff843c2cafe combase.dll`InternalDoATClassCreate + 39518
  thread #93: tid = 0x0688, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'PPU[0x100501d] uMovie::StreamingThread'
  thread #1: tid = 0x087c, 0x00007ff842e7a104 win32u.dll`NtUserMsgWaitForMultipleObjectsEx + 20
  thread #96: tid = 0x0890, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'PPU[0x1002020] HLE Video Decoder'
<...>
```
tstellar pushed a commit that referenced this pull request Jan 4, 2024
The upstream test relies on jump-tables, which are lowered in
dramatically different ways with later arm64e/ptrauth patches.

Concretely, it's failing for at least two reasons:
- ptrauth removes x16/x17 from tcGPR64 to prevent indirect tail-calls
  from using either register as the callee, conflicting with their usage
  as scratch for the tail-call LR auth checking sequence.  In the
  1/2_available_regs_left tests, this causes the MI scheduler to move
  the load up across some of the inlineasm register clobbers.

- ptrauth adds an x16/x17-using pseudo for jump-table dispatch, which
  looks somewhat different from the regular jump-table dispatch codegen
  by itself, but also prevents compression currently.

They seem like sensible changes.  But they mean the tests aren't really
testing what they're intented to, because there's always an implicit
x16/x17 clobber when using jump-tables.

This updates the test in a way that should work identically regardless
of ptrauth support, with one exception, #1 above, which merely reorders
the load/inlineasm w.r.t. eachother.
I verified the tests still fail the live-reg assertions when
applicable.
tstellar pushed a commit that referenced this pull request Jan 13, 2024
…vm#75394)

Calling one of pthread join/detach interceptor on an already
joined/detached thread causes asserts such as:

AddressSanitizer: CHECK failed: sanitizer_thread_arg_retval.cpp:56
"((t)) != (0)" (0x0, 0x0) (tid=1236094)
#0 0x555555634f8b in __asan::CheckUnwind()
compiler-rt/lib/asan/asan_rtl.cpp:69:3
#1 0x55555564e06e in __sanitizer::CheckFailed(char const*, int, char
const*, unsigned long long, unsigned long long)
compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:86:24
#2 0x5555556491df in __sanitizer::ThreadArgRetval::BeforeJoin(unsigned
long) const
compiler-rt/lib/sanitizer_common/sanitizer_thread_arg_retval.cpp:56:3
#3 0x5555556198ed in Join<___interceptor_pthread_tryjoin_np(void*,
void**)::<lambda()> >
compiler-rt/lib/asan/../sanitizer_common/sanitizer_thread_arg_retval.h:74:26
#4 0x5555556198ed in pthread_tryjoin_np
compiler-rt/lib/asan/asan_interceptors.cpp:311:29

The assert are replaced by error codes.
tstellar pushed a commit that referenced this pull request Feb 1, 2024
…ass template explict specializations (llvm#78720)

According to [[dcl.type.elab]
p2](http://eel.is/c++draft/dcl.type.elab#2):
> If an
[elaborated-type-specifier](http://eel.is/c++draft/dcl.type.elab#nt:elaborated-type-specifier)
is the sole constituent of a declaration, the declaration is ill-formed
unless it is an explicit specialization, an explicit instantiation or it
has one of the following forms [...]

Consider the following:
```cpp
template<typename T>
struct A 
{
    template<typename U>
    struct B;
};

template<>
template<typename U>
struct A<int>::B; // #1
```
The _elaborated-type-specifier_ at `#1` declares an explicit
specialization (which is itself a template). We currently (incorrectly)
reject this, and this PR fixes that.

I moved the point at which _elaborated-type-specifiers_ with
_nested-name-specifiers_ are diagnosed from `ParsedFreeStandingDeclSpec`
to `ActOnTag` for two reasons: `ActOnTag` isn't called for explicit
instantiations and partial/explicit specializations, and because it's
where we determine if a member specialization is being declared.

With respect to diagnostics, I am currently issuing the diagnostic
without marking the declaration as invalid or returning early, which
results in more diagnostics that I think is necessary. I would like
feedback regarding what the "correct" behavior should be here.
tstellar pushed a commit that referenced this pull request Feb 2, 2024
…ing bound ops (llvm#80317)

`getDataOperandBaseAddr` retrieve the address of a value when we need to
generate bound operations. When switching to HLFIR, we did not really
handle the fact that this value was then pointing to the result of a
hlfir.declare. Because of that the `#1` value was being used. `#0` value
is carrying the correct information about lowerbounds and should be
used. This patch updates the `getDataOperandBaseAddr` function to use
the correct result value from hlfir.declare.
tstellar pushed a commit that referenced this pull request Feb 16, 2024
The concurrent tests all do a pthread_join at the end, and
concurrent_base.py stops after that pthread_join and sanity checks that
only 1 thread is running. On macOS, after pthread_join() has completed,
there can be an extra thread still running which is completing the
details of that task asynchronously; this causes testsuite failures.
When this happens, we see the second thread is in

```
frame #0: 0x0000000180ce7700 libsystem_kernel.dylib`__ulock_wake + 8
frame #1: 0x0000000180d25ad4 libsystem_pthread.dylib`_pthread_joiner_wake + 52
frame #2: 0x0000000180d23c18 libsystem_pthread.dylib`_pthread_terminate + 384
frame #3: 0x0000000180d23a98 libsystem_pthread.dylib`_pthread_terminate_invoke + 92
frame #4: 0x0000000180d26740 libsystem_pthread.dylib`_pthread_exit + 112
frame #5: 0x0000000180d26040 libsystem_pthread.dylib`_pthread_start + 148
```

there are none of the functions from the test file present on this
thread.

In this patch, instead of counting the number of threads, I iterate over
the threads looking for functions from our test file (by name) and only
count threads that have at least one of them.

It's a lower frequency failure than the darwin kernel bug causing an
extra step instruction mach exception when hardware
breakpoint/watchpoints are used, but once I fixed that, this came up as
the next most common failure for these tests.

rdar://110555062
tstellar pushed a commit that referenced this pull request Feb 23, 2024
…lvm#80904)"

This reverts commit b1ac052.

This commit breaks coroutine splitting for non-swift calling convention
functions. In this example:

```ll
; ModuleID = 'repro.ll'
source_filename = "stdlib/test/runtime/test_llcl.mojo"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@0 = internal constant { i32, i32 } { i32 trunc (i64 sub (i64 ptrtoint (ptr @craSH to i64), i64 ptrtoint (ptr getelementptr inbounds ({ i32, i32 }, ptr @0, i32 0, i32 1) to i64)) to i32), i32 64 }

define dso_local void @af_suspend_fn(ptr %0, i64 %1, ptr %2) #0 {
  ret void
}

define dso_local void @craSH(ptr %0) #0 {
  %2 = call token @llvm.coro.id.async(i32 64, i32 8, i32 0, ptr @0)
  %3 = call ptr @llvm.coro.begin(token %2, ptr null)
  %4 = getelementptr inbounds { ptr, { ptr, ptr }, i64, { ptr, i1 }, i64, i64 }, ptr poison, i32 0, i32 0
  %5 = call ptr @llvm.coro.async.resume()
  store ptr %5, ptr %4, align 8
  %6 = call { ptr, ptr, ptr } (i32, ptr, ptr, ...) @llvm.coro.suspend.async.sl_p0p0p0s(i32 0, ptr %5, ptr @ctxt_proj_fn, ptr @af_suspend_fn, ptr poison, i64 -1, ptr poison)
  ret void
}

define dso_local ptr @ctxt_proj_fn(ptr %0) #0 {
  ret ptr %0
}

; Function Attrs: nomerge nounwind
declare { ptr, ptr, ptr } @llvm.coro.suspend.async.sl_p0p0p0s(i32, ptr, ptr, ...) #1

; Function Attrs: nounwind
declare token @llvm.coro.id.async(i32, i32, i32, ptr) #2

; Function Attrs: nounwind
declare ptr @llvm.coro.begin(token, ptr writeonly) #2

; Function Attrs: nomerge nounwind
declare ptr @llvm.coro.async.resume() #1

attributes #0 = { "target-features"="+adx,+aes,+avx,+avx2,+bmi,+bmi2,+clflushopt,+clwb,+clzero,+crc32,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+mwaitx,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" }
attributes #1 = { nomerge nounwind }
attributes #2 = { nounwind }
```

This verifier crashes after the `coro-split` pass with

```
cannot guarantee tail call due to mismatched parameter counts
  musttail call void @af_suspend_fn(ptr poison, i64 -1, ptr poison)
LLVM ERROR: Broken function
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: opt ../../../reduced.ll -O0
 #0 0x00007f1d89645c0e __interceptor_backtrace.part.0 /build/gcc-11-XeT9lY/gcc-11-11.4.0/build/x86_64-linux-gnu/libsanitizer/asan/../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4193:28
 #1 0x0000556d94d254f7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:22
 #2 0x0000556d94d19a2f llvm::sys::RunSignalHandlers() /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Signals.cpp:105:20
 #3 0x0000556d94d1aa42 SignalHandler(int) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Unix/Signals.inc:371:36
 #4 0x00007f1d88e42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x00007f1d88e969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #6 0x00007f1d88e969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #7 0x00007f1d88e969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #8 0x00007f1d88e42476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #9 0x00007f1d88e287f3 abort ./stdlib/abort.c:81:7
 #10 0x0000556d8944be01 std::vector<llvm::json::Value, std::allocator<llvm::json::Value>>::size() const /usr/include/c++/11/bits/stl_vector.h:919:40
 #11 0x0000556d8944be01 bool std::operator==<llvm::json::Value, std::allocator<llvm::json::Value>>(std::vector<llvm::json::Value, std::allocator<llvm::json::Value>> const&, std::vector<llvm::json::Value, std::allocator<llvm::json::Value>> const&) /usr/include/c++/11/bits/stl_vector.h:1893:23
 #12 0x0000556d8944be01 llvm::json::operator==(llvm::json::Array const&, llvm::json::Array const&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/Support/JSON.h:572:69
 #13 0x0000556d8944be01 llvm::json::operator==(llvm::json::Value const&, llvm::json::Value const&) (.cold) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/JSON.cpp:204:28
 #14 0x0000556d949ed2bd llvm::report_fatal_error(char const*, bool) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/ErrorHandling.cpp:82:70
 #15 0x0000556d8e37e876 llvm::SmallVectorBase<unsigned int>::size() const /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:91:32
 #16 0x0000556d8e37e876 llvm::SmallVectorTemplateCommon<llvm::DiagnosticInfoOptimizationBase::Argument, void>::end() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:282:41
 #17 0x0000556d8e37e876 llvm::SmallVector<llvm::DiagnosticInfoOptimizationBase::Argument, 4u>::~SmallVector() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:1215:24
 #18 0x0000556d8e37e876 llvm::DiagnosticInfoOptimizationBase::~DiagnosticInfoOptimizationBase() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:413:7
 #19 0x0000556d8e37e876 llvm::DiagnosticInfoIROptimization::~DiagnosticInfoIROptimization() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:622:7
 #20 0x0000556d8e37e876 llvm::OptimizationRemark::~OptimizationRemark() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:689:7
 #21 0x0000556d8e37e876 operator() /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2213:14
 #22 0x0000556d8e37e876 emit<llvm::CoroSplitPass::run(llvm::LazyCallGraph::SCC&, llvm::CGSCCAnalysisManager&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)::<lambda()> > /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h:83:12
 #23 0x0000556d8e37e876 llvm::CoroSplitPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2212:13
 #24 0x0000556d8c36ecb1 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CoroSplitPass, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3
 #25 0x0000556d91c1a84f llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:90:12
 #26 0x0000556d8c3690d1 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3
 #27 0x0000556d91c2162d llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:278:18
 #28 0x0000556d8c369035 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3
 #29 0x0000556d9457abc5 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManager.h:247:20
 #30 0x0000556d8e30979e llvm::CoroConditionalWrapper::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroConditionalWrapper.cpp:19:74
 #31 0x0000556d8c365755 llvm::detail::PassModel<llvm::Module, llvm::CoroConditionalWrapper, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3
 #32 0x0000556d9457abc5 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManager.h:247:20
 #33 0x0000556d89818556 llvm::SmallPtrSetImplBase::isSmall() const /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:196:33
 #34 0x0000556d89818556 llvm::SmallPtrSetImplBase::~SmallPtrSetImplBase() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:84:17
 #35 0x0000556d89818556 llvm::SmallPtrSetImpl<llvm::AnalysisKey*>::~SmallPtrSetImpl() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:321:7
 #36 0x0000556d89818556 llvm::SmallPtrSet<llvm::AnalysisKey*, 2u>::~SmallPtrSet() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:427:7
 #37 0x0000556d89818556 llvm::PreservedAnalyses::~PreservedAnalyses() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/Analysis.h:109:7
 #38 0x0000556d89818556 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/NewPMDriver.cpp:532:10
 #39 0x0000556d897e3939 optMain /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/optdriver.cpp:737:27
 #40 0x0000556d89455461 main /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/opt.cpp:25:33
 #41 0x00007f1d88e29d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
 #42 0x00007f1d88e29e40 call_init ./csu/../csu/libc-start.c:128:20
 #43 0x00007f1d88e29e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
 #44 0x0000556d897b6335 _start (/home/ubuntu/modular/.derived/third-party/llvm-project/build-relwithdebinfo-asan/bin/opt+0x150c335)
Aborted (core dumped)
tstellar pushed a commit that referenced this pull request Mar 2, 2024
…ter partial ordering when determining primary template (llvm#82417)

Consider the following:
```
struct A {
  static constexpr bool x = true;
};

template<typename T, typename U>
void f(T, U) noexcept(T::y); // #1, error: no member named 'y' in 'A'

template<typename T, typename U>
void f(T, U*) noexcept(T::x); // #2

template<>
void f(A, int*) noexcept; // explicit specialization of #2
```

We currently instantiate the exception specification of all candidate
function template specializations when deducting template arguments for
an explicit specialization, which results in a error despite `#1` not
being selected by partial ordering as the most specialized template.
According to [except.spec] p13:
> An exception specification is considered to be needed when: 
> - [...]
> - the exception specification is compared to that of another
declaration (e.g., an explicit specialization or an overriding virtual
function);

Assuming that "comparing declarations" means "determining whether the
declarations correspond and declare the same entity" (per [basic.scope.scope] p4 and
[basic.link] p11.1, respectively), the exception specification does _not_ need to be
instantiated until _after_ partial ordering, at which point we determine
whether the implicitly instantiated specialization and the explicit
specialization declare the same entity (the determination of whether two
functions/function templates correspond does not consider the exception
specifications).

This patch defers the instantiation of the exception specification until
a single function template specialization is selected via partial
ordering, matching the behavior of GCC, EDG, and
MSVC: see https://godbolt.org/z/Ebb6GTcWE.
tstellar pushed a commit that referenced this pull request Mar 11, 2024
TestCases/Misc/Linux/sigaction.cpp fails because dlsym() may call malloc
on failure. And then the wrapped malloc appears to access thread local
storage using global dynamic accesses, thus calling
___interceptor___tls_get_addr, before REAL(__tls_get_addr) has
been set, so we get a crash inside ___interceptor___tls_get_addr. For
example, this can happen when looking up __isoc23_scanf which might not
exist in some libcs.

Fix this by marking the thread local variable accessed inside the
debug checks as "initial-exec", which does not require __tls_get_addr.

This is probably a better alternative to llvm#83886.

This fixes a different crash but is related to llvm#46204.

Backtrace:
```
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff6a9d89e in ___interceptor___tls_get_addr (arg=0x7ffff6b27be8) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:2759
#2 0x00007ffff6a46bc6 in __sanitizer::CheckedMutex::LockImpl (this=0x7ffff6b27be8, pc=140737331846066) at /path/to/llvm/compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp:218
#3 0x00007ffff6a448b2 in __sanitizer::CheckedMutex::Lock (this=0x7ffff6b27be8, this@entry=0x730000000580) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:129
#4 __sanitizer::Mutex::Lock (this=0x7ffff6b27be8, this@entry=0x730000000580) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:167
#5 0x00007ffff6abdbb2 in __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock (mu=0x730000000580, this=<optimized out>) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:383
#6 __sanitizer::SizeClassAllocator64<__tsan::AP64>::GetFromAllocator (this=0x7ffff7487dc0 <__tsan::allocator_placeholder>, stat=stat@entry=0x7ffff570db68, class_id=11, chunks=chunks@entry=0x7ffff5702cc8, n_chunks=n_chunks@entry=128) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_primary64.h:207
#7 0x00007ffff6abdaa0 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >::Refill (this=<optimized out>, c=c@entry=0x7ffff5702cb8, allocator=<optimized out>, class_id=<optimized out>)
 at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_local_cache.h:103
#8 0x00007ffff6abd731 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >::Allocate (this=0x7ffff6b27be8, allocator=0x7ffff5702cc8, class_id=140737311157448)
 at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_local_cache.h:39
#9 0x00007ffff6abc397 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__tsan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate (this=0x7ffff5702cc8, cache=0x7ffff6b27be8, size=<optimized out>, size@entry=175, alignment=alignment@entry=16)
 at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_combined.h:69
#10 0x00007ffff6abaa6a in __tsan::user_alloc_internal (thr=0x7ffff7ebd980, pc=140737331499943, sz=sz@entry=175, align=align@entry=16, signal=true) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cpp:198
#11 0x00007ffff6abb0d1 in __tsan::user_alloc (thr=0x7ffff6b27be8, pc=140737331846066, sz=11, sz@entry=175) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cpp:223
#12 0x00007ffff6a693b5 in ___interceptor_malloc (size=175) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:666
#13 0x00007ffff7fce7f2 in malloc (size=175) at ../include/rtld-malloc.h:56
#14 __GI__dl_exception_create_format (exception=exception@entry=0x7fffffffd0d0, objname=0x7ffff7fc3550 "/path/to/llvm/compiler-rt/cmake-build-all-sanitizers/lib/linux/libclang_rt.tsan-x86_64.so",
 fmt=fmt@entry=0x7ffff7ff2db9 "undefined symbol: %s%s%s") at ./elf/dl-exception.c:157
#15 0x00007ffff7fd50e8 in _dl_lookup_symbol_x (undef_name=0x7ffff6af868b "__isoc23_scanf", undef_map=<optimized out>, ref=0x7fffffffd148, symbol_scope=<optimized out>, version=<optimized out>, type_class=0, flags=2, skip_map=0x7ffff7fc35e0) at ./elf/dl-lookup.c:793
--Type <RET> for more, q to quit, c to continue without paging--
#16 0x00007ffff656d6ed in do_sym (handle=<optimized out>, name=0x7ffff6af868b "__isoc23_scanf", who=0x7ffff6a3bb84 <__interception::InterceptFunction(char const*, unsigned long*, unsigned long, unsigned long)+36>, vers=vers@entry=0x0, flags=flags@entry=2) at ./elf/dl-sym.c:146
#17 0x00007ffff656d9dd in _dl_sym (handle=<optimized out>, name=<optimized out>, who=<optimized out>) at ./elf/dl-sym.c:195
#18 0x00007ffff64a2854 in dlsym_doit (a=a@entry=0x7fffffffd3b0) at ./dlfcn/dlsym.c:40
#19 0x00007ffff7fcc489 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffd310, operate=0x7ffff64a2840 <dlsym_doit>, args=0x7fffffffd3b0) at ./elf/dl-catch.c:237
#20 0x00007ffff7fcc5af in _dl_catch_error (objname=0x7fffffffd368, errstring=0x7fffffffd370, mallocedp=0x7fffffffd367, operate=<optimized out>, args=<optimized out>) at ./elf/dl-catch.c:256
#21 0x00007ffff64a2257 in _dlerror_run (operate=operate@entry=0x7ffff64a2840 <dlsym_doit>, args=args@entry=0x7fffffffd3b0) at ./dlfcn/dlerror.c:138
#22 0x00007ffff64a28e5 in dlsym_implementation (dl_caller=<optimized out>, name=<optimized out>, handle=<optimized out>) at ./dlfcn/dlsym.c:54
#23 ___dlsym (handle=<optimized out>, name=<optimized out>) at ./dlfcn/dlsym.c:68
#24 0x00007ffff6a3bb84 in __interception::GetFuncAddr (name=0x7ffff6af868b "__isoc23_scanf", trampoline=140737311157448) at /path/to/llvm/compiler-rt/lib/interception/interception_linux.cpp:42
#25 __interception::InterceptFunction (name=0x7ffff6af868b "__isoc23_scanf", ptr_to_real=0x7ffff74850e8 <__interception::real___isoc23_scanf>, func=11, trampoline=140737311157448)
 at /path/to/llvm/compiler-rt/lib/interception/interception_linux.cpp:61
#26 0x00007ffff6a9f2d9 in InitializeCommonInterceptors () at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors.inc:10315
```

Reviewed By: vitalybuka, MaskRay

Pull Request: llvm#83890
tstellar pushed a commit that referenced this pull request Mar 11, 2024
Modifies the privatization logic so that the emitted code only used the
HLFIR base (i.e. SSA value `#0` returned from `hlfir.declare`). Before
that, that emitted privatization logic was a mix of using `#0` and `#1`
which leads to some difficulties trying to move to delayed privatization
(see the discussion on llvm#84033).
tstellar pushed a commit that referenced this pull request Mar 13, 2024
…p canonicalization (llvm#84225)

The current canonicalization of `memref.dim` operating on the result of
`memref.reshape` into `memref.load` is incorrect as it doesn't check
whether the `index` operand of `memref.dim` dominates the source
`memref.reshape` op. It always introduces `memref.load` right after
`memref.reshape` to ensure the `memref` is not mutated before the
`memref.load` call. As a result, the following error is observed:

```
$> mlir-opt --canonicalize input.mlir

func.func @reshape_dim(%arg0: memref<*xf32>, %arg1: memref<?xindex>, %arg2: index) -> index {
    %c4 = arith.constant 4 : index
    %reshape = memref.reshape %arg0(%arg1) : (memref<*xf32>, memref<?xindex>) -> memref<*xf32>
    %0 = arith.muli %arg2, %c4 : index
    %dim = memref.dim %reshape, %0 : memref<*xf32>
    return %dim : index
  }
```

results in:

```
dominator.mlir:22:12: error: operand #1 does not dominate this use
    %dim = memref.dim %reshape, %0 : memref<*xf32>
           ^
dominator.mlir:22:12: note: see current operation: %1 = "memref.load"(%arg1, %2) <{nontemporal = false}> : (memref<?xindex>, index) -> index
dominator.mlir:21:10: note: operand defined here (op in the same block)
    %0 = arith.muli %arg2, %c4 : index
```

Properly fixing this issue requires a dominator analysis which is
expensive to run within a canonicalization pattern. So, this patch fixes
the canonicalization pattern by being more strict/conservative about the
legality condition in which we perform this canonicalization.
The more general pattern is also added to `tensor.dim`. Since tensors are
immutable we don't need to worry about where to introduce the
`tensor.extract` call after canonicalization.
tstellar pushed a commit that referenced this pull request Mar 22, 2024
…lvm#85653)

This reverts commit daebe5c.

This commit causes the following asan issue:

```
<snip>/llvm-project/build/bin/mlir-opt <snip>/llvm-project/mlir/test/Dialect/XeGPU/XeGPUOps.mlir | <snip>/llvm-project/build/bin/FileCheck <snip>/llvm-project/mlir/test/Dialect/XeGPU/XeGPUOps.mlir
# executed command: <snip>/llvm-project/build/bin/mlir-opt <snip>/llvm-project/mlir/test/Dialect/XeGPU/XeGPUOps.mlir
# .---command stderr------------
# | =================================================================
# | ==2772558==ERROR: AddressSanitizer: stack-use-after-return on address 0x7fd2c2c42b90 at pc 0x55e406d54614 bp 0x7ffc810e4070 sp 0x7ffc810e4068
# | READ of size 8 at 0x7fd2c2c42b90 thread T0
# |     #0 0x55e406d54613 in operator()<long int const*> /usr/include/c++/13/bits/predefined_ops.h:318
# |     #1 0x55e406d54613 in __count_if<long int const*, __gnu_cxx::__ops::_Iter_pred<mlir::verifyListOfOperandsOrIntegers(Operation*, llvm::StringRef, unsigned int, llvm::ArrayRef<long int>, ValueRange)::<lambda(int64_t)> > > /usr/include/c++/13/bits/stl_algobase.h:2125
# |     #2 0x55e406d54613 in count_if<long int const*, mlir::verifyListOfOperandsOrIntegers(Operation*, 
...
```
tstellar pushed a commit that referenced this pull request Mar 22, 2024
…oint. (llvm#83821)"

This reverts commit c2c1e6e. It creates
a use after free.

==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08
READ of size 1 at 0x50f000001760 thread T0
 #0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13
 #1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>)::$_0::operator()(llvm::VPRecipeBase*) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23
 #2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Can be reproduced with asan on
Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
Transforms/LoopVectorize/X86/pr81872.ll
Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
tstellar pushed a commit that referenced this pull request May 2, 2024
Builder alerted me to the failing test, attempt #1 in the blind.
tstellar pushed a commit that referenced this pull request May 10, 2024
…e exception specification of a function (llvm#90760)

[temp.deduct.general] p6 states:
> At certain points in the template argument deduction process it is
necessary to take a function type that makes use of template parameters
and replace those template parameters with the corresponding template
arguments.
This is done at the beginning of template argument deduction when any
explicitly specified template arguments are substituted into the
function type, and again at the end of template argument deduction when
any template arguments that were deduced or obtained from default
arguments are substituted.

[temp.deduct.general] p7 goes on to say:
> The _deduction substitution loci_ are
> - the function type outside of the _noexcept-specifier_,
> - the explicit-specifier,
> - the template parameter declarations, and
> - the template argument list of a partial specialization
>
> The substitution occurs in all types and expressions that are used in
the deduction substitution loci. [...]

Consider the following:
```cpp
struct A
{
    static constexpr bool x = true;
};

template<typename T, typename U>
void f(T, U) noexcept(T::x); // #1

template<typename T, typename U>
void f(T, U*) noexcept(T::y); // #2

template<>
void f<A>(A, int*) noexcept; // clang currently accepts, GCC and EDG reject
```

Currently, `Sema::SubstituteExplicitTemplateArguments` will substitute
into the _noexcept-specifier_ when deducing template arguments from a
function declaration or when deducing template arguments for taking the
address of a function template (and the substitution is treated as a
SFINAE context). In the above example, `#1` is selected as the primary
template because substitution of the explicit template arguments into
the _noexcept-specifier_ of `#2` failed, which resulted in the candidate
being ignored.

This behavior is incorrect ([temp.deduct.general] note 4 says as much), and
this patch corrects it by deferring all substitution into the
_noexcept-specifier_ until it is instantiated.

As part of the necessary changes to make this patch work, the
instantiation of the exception specification of a function template
specialization when taking the address of a function template is changed
to only occur for the function selected by overload resolution per
[except.spec] p13.1 (as opposed to being instantiated for every candidate).
tstellar pushed a commit that referenced this pull request May 10, 2024
…ined member functions & member function templates (llvm#88963)

Consider the following snippet from the discussion of CWG2847 on the core reflector:
```
template<typename T>
concept C = sizeof(T) <= sizeof(long);

template<typename T>
struct A 
{
    template<typename U>
    void f(U) requires C<U>; // #1, declares a function template 

    void g() requires C<T>; // #2, declares a function

    template<>
    void f(char);  // #3, an explicit specialization of a function template that declares a function
};

template<>
template<typename U>
void A<short>::f(U) requires C<U>; // #4, an explicit specialization of a function template that declares a function template

template<>
template<>
void A<int>::f(int); // #5, an explicit specialization of a function template that declares a function

template<>
void A<long>::g(); // #6, an explicit specialization of a function that declares a function
```

A number of problems exist:
- Clang rejects `#4` because the trailing _requires-clause_ has `U`
substituted with the wrong template parameter depth when
`Sema::AreConstraintExpressionsEqual` is called to determine whether it
matches the trailing _requires-clause_ of the implicitly instantiated
function template.
- Clang rejects `#5` because the function template specialization
instantiated from `A<int>::f` has a trailing _requires-clause_, but `#5`
does not (nor can it have one as it isn't a templated function).
- Clang rejects `#6` for the same reasons it rejects `#5`.

This patch resolves these issues by making the following changes:
- To fix `#4`, `Sema::AreConstraintExpressionsEqual` is passed
`FunctionTemplateDecl`s when comparing the trailing _requires-clauses_
of `#4` and the function template instantiated from `#1`.
- To fix `#5` and `#6`, the trailing _requires-clauses_ are not compared
for explicit specializations that declare functions.

In addition to these changes, `CheckMemberSpecialization` now considers
constraint satisfaction/constraint partial ordering when determining
which member function is specialized by an explicit specialization of a
member function for an implicit instantiation of a class template (we
previously would select the first function that has the same type as the
explicit specialization). With constraints taken under consideration, we
match EDG's behavior for these declarations.
tstellar pushed a commit that referenced this pull request May 15, 2024
...which caused issues like

> ==42==ERROR: AddressSanitizer failed to deallocate 0x32 (50) bytes at
address 0x117e0000 (error code: 28)
> ==42==Cannot dump memory map on emscriptenAddressSanitizer: CHECK
failed: sanitizer_common.cpp:81 "((0 && "unable to unmmap")) != (0)"
(0x0, 0x0) (tid=288045824)
> #0 0x14f73b0c in __asan::CheckUnwind()+0x14f73b0c
(this.program+0x14f73b0c)
> #1 0x14f8a3c2 in __sanitizer::CheckFailed(char const*, int, char
const*, unsigned long long, unsigned long long)+0x14f8a3c2
(this.program+0x14f8a3c2)
> #2 0x14f7d6e1 in __sanitizer::ReportMunmapFailureAndDie(void*,
unsigned long, int, bool)+0x14f7d6e1 (this.program+0x14f7d6e1)
> #3 0x14f81fbd in __sanitizer::UnmapOrDie(void*, unsigned
long)+0x14f81fbd (this.program+0x14f81fbd)
> #4 0x14f875df in __sanitizer::SuppressionContext::ParseFromFile(char
const*)+0x14f875df (this.program+0x14f875df)
> #5 0x14f74eab in __asan::InitializeSuppressions()+0x14f74eab
(this.program+0x14f74eab)
> #6 0x14f73a1a in __asan::AsanInitInternal()+0x14f73a1a
(this.program+0x14f73a1a)

when trying to use an ASan suppressions file under Emscripten: Even
though it would be considered OK by SUSv4, the Emscripten runtime states
"We don't support partial munmapping" (see

<emscripten-core/emscripten@f4115eb>
"Implement MAP_ANONYMOUS on top of malloc in STANDALONE_WASM mode
(llvm#16289)").

Co-authored-by: Stephan Bergmann <stephan.bergmann@allotropia.de>
tstellar pushed a commit that referenced this pull request May 17, 2024
…ication as used during partial ordering (llvm#91534)

We do not deduce template arguments from the exception specification
when determining the primary template of a function template
specialization or when taking the address of a function template.
Therefore, this patch changes `isAtLeastAsSpecializedAs` such that we do
not mark template parameters in the exception specification as 'used'
during partial ordering (per [temp.deduct.partial]
p12) to prevent the following from being ambiguous:

```
template<typename T, typename U>
void f(U) noexcept(noexcept(T())); // #1

template<typename T>
void f(T*) noexcept; // #2

template<>
void f<int>(int*) noexcept; // currently ambiguous, selects #2 with this patch applied 
```

Although there is no corresponding wording in the standard (see core issue filed here
cplusplus/CWG#537), this seems
to be the intended behavior given the definition of _deduction
substitution loci_ in [temp.deduct.general] p7 (and EDG does the same thing).
tstellar pushed a commit that referenced this pull request May 17, 2024
…erSize (llvm#67657)"

This reverts commit f0b3654.

This commit triggers UB by reading an uninitialized variable.

`UP.PartialThreshold` is used uninitialized in `getUnrollingPreferences()` when
it is called from `LoopVectorizationPlanner::executePlan()`. In this case the
`UP` variable is created on the stack and its fields are not initialized.

```
==8802==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x557c0b081b99 in llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h
    #1 0x557c0b07a40c in llvm::TargetTransformInfo::Model<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:2277:17
    #2 0x557c0f5d69ee in llvm::TargetTransformInfo::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) const llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp:387:19
    #3 0x557c0e6b96a0 in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7624:7
    #4 0x557c0e6e4b63 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10253:13
    #5 0x557c0e6f2429 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo*, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10344:30
    #6 0x557c0e6f2f97 in llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10383:9

[...]

  Uninitialized value was created by an allocation of 'UP' in the stack frame
    #0 0x557c0e6b961e in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:3
```
tstellar pushed a commit that referenced this pull request May 17, 2024
…vm#90820)

This solves some ambuguity introduced in P0522 regarding how
template template parameters are partially ordered, and should reduce
the negative impact of enabling `-frelaxed-template-template-args`
by default.

When performing template argument deduction, a template template
parameter
containing no packs should be more specialized than one that does.

Given the following example:
```C++
template<class T2> struct A;
template<template<class ...T3s> class TT1, class T4> struct A<TT1<T4>>; // #1
template<template<class    T5 > class TT2, class T6> struct A<TT2<T6>>; // #2

template<class T1> struct B;
template struct A<B<char>>;
```

Prior to P0522, candidate `#2` would be more specialized.
After P0522, neither is more specialized, so this becomes ambiguous.
With this change, `#2` becomes more specialized again,
maintaining compatibility with pre-P0522 implementations.

The problem is that in P0522, candidates are at least as specialized
when matching packs to fixed-size lists both ways, whereas before,
a fixed-size list is more specialized.

This patch keeps the original behavior when checking template arguments
outside deduction, but restores this aspect of pre-P0522 matching
during deduction.

---

Since this changes provisional implementation of CWG2398 which has
not been released yet, and already contains a changelog entry,
we don't provide a changelog entry here.
tstellar pushed a commit that referenced this pull request May 22, 2024
…llvm#92855)

This solves some ambuguity introduced in P0522 regarding how template
template parameters are partially ordered, and should reduce the
negative impact of enabling `-frelaxed-template-template-args` by
default.

When performing template argument deduction, we extend the provisional
wording introduced in llvm#89807 so
it also covers deduction of class templates.

Given the following example:
```C++
template <class T1, class T2 = float> struct A;
template <class T3> struct B;

template <template <class T4> class TT1, class T5> struct B<TT1<T5>>;   // #1
template <class T6, class T7>                      struct B<A<T6, T7>>; // #2

template struct B<A<int>>;
```
Prior to P0522, `#2` was picked. Afterwards, this became ambiguous. This
patch restores the pre-P0522 behavior, `#2` is picked again.

This has the beneficial side effect of making the following code valid:
```C++
template<class T, class U> struct A {};
A<int, float> v;
template<template<class> class TT> void f(TT<int>);

// OK: TT picks 'float' as the default argument for the second parameter.
void g() { f(v); }
```

---

Since this changes provisional implementation of CWG2398 which has not
been released yet, and already contains a changelog entry, we don't
provide a changelog entry here.
tstellar pushed a commit that referenced this pull request May 30, 2024
The problematic program is as follows:

```shell
#define pre_a 0
#define PRE(x) pre_##x

void f(void) {
    PRE(a) && 0;
}

int main(void) { return 0; }
```

in which after token concatenation (`##`), there's another nested macro
`pre_a`.

Currently only the outer expansion region will be produced. ([compiler
explorer
link](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:29,endLineNumber:8,positionColumn:29,positionLineNumber:8,selectionStartColumn:29,selectionStartLineNumber:8,startColumn:29,startLineNumber:8),source:'%23define+pre_a+0%0A%23define+PRE(x)+pre_%23%23x%0A%0Avoid+f(void)+%7B%0A++++PRE(a)+%26%26+0%3B%0A%7D%0A%0Aint+main(void)+%7B+return+0%3B+%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:51.69491525423727,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:cclang_assertions_trunk,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'1',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:2,lang:___c,libs:!(),options:'-fprofile-instr-generate+-fcoverage-mapping+-fcoverage-mcdc+-Xclang+-dump-coverage-mapping+',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+x86-64+clang+(assertions+trunk)+(Editor+%231)',t:'0')),k:34.5741843594503,l:'4',m:28.903654485049834,n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compilerName:'x86-64+clang+(trunk)',editorid:1,fontScale:14,fontUsePx:'0',j:2,wrap:'1'),l:'5',n:'0',o:'Output+of+x86-64+clang+(assertions+trunk)+(Compiler+%232)',t:'0')),header:(),l:'4',m:71.09634551495017,n:'0',o:'',s:0,t:'0')),k:48.30508474576271,l:'3',n:'0',o:'',t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4))

```text
f:
  File 0, 4:14 -> 6:2 = #0
  Decision,File 0, 5:5 -> 5:16 = M:0, C:2
  Expansion,File 0, 5:5 -> 5:8 = #0 (Expanded file = 1)
  File 0, 5:15 -> 5:16 = #1
  Branch,File 0, 5:15 -> 5:16 = 0, 0 [2,0,0] 
  File 1, 2:16 -> 2:23 = #0
  File 2, 1:15 -> 1:16 = #0
  File 2, 1:15 -> 1:16 = #0
  Branch,File 2, 1:15 -> 1:16 = 0, 0 [1,2,0] 
```

The inner expansion region isn't produced because:

1. In the range-based for loop quoted below, each sloc is processed and
possibly emit a corresponding expansion region.
2. For our sloc in question, its direct parent returned by
`getIncludeOrExpansionLoc()` is a `<scratch space>`, because that's how
`##` is processed.


https://github.com/llvm/llvm-project/blob/88b6186af3908c55b357858eb348b5143f21c289/clang/lib/CodeGen/CoverageMappingGen.cpp#L518-L520

3. This `<scratch space>` cannot be found in the FileID mapping so
`ParentFileID` will be assigned an `std::nullopt`


https://github.com/llvm/llvm-project/blob/88b6186af3908c55b357858eb348b5143f21c289/clang/lib/CodeGen/CoverageMappingGen.cpp#L521-L526

4. As a result this iteration of for loop finishes early and no
expansion region is added for the sloc.

This problem gets worse with MC/DC: as the example shows, there's a
branch from File 2 but File 2 itself is missing. This will trigger
assertion failures.

The fix is more or less a workaround and takes a similar approach as
llvm#89573.

~~Depends on llvm#89573.~~ This includes llvm#89573. Kudos to @chapuni!
This and llvm#89573 together fix llvm#87000: I tested locally, both the reduced
program and my original use case (fwiw, Linux kernel) can run
successfully.

---------

Co-authored-by: NAKAMURA Takumi <geek4civic@gmail.com>
tstellar pushed a commit that referenced this pull request Jun 11, 2024
…des (llvm#94453)

LSR will generate chains of related instructions with a known increment
between them. With SVE, in the case of the test case, this can include
increments like 'vscale * 16 + 8'. The idea of this patch is if we have
a '+8' increment already calculated in the chain, we can generate a
(legal) '+ vscale*16' addressing mode from it, allowing us to use the
'[x16, #1, mul vl]' addressing mode instructions.

In order to do this we keep track of the known 'bases' when generating
chains in GenerateIVChain, checking for each if the accumulated
increment expression from the base neatly folds into a legal addressing
mode. If they do not we fall back to the existing LeftOverExpr, whether
it is legal or not.

This is mostly orthogonal to llvm#88124, dealing with the generation of
chains as opposed to rest of LSR. The existing vscale addressing mode
work has greatly helped compared to the last time I looked at this,
allowing us to check that the addressing modes are indeed legal.
tstellar pushed a commit that referenced this pull request Jun 24, 2024
…on (llvm#94752)

Fixes llvm#62925.

The following code:
```cpp
#include <map>

int main() {
   std::map m1 = {std::pair{"foo", 2}, {"bar", 3}}; // guide #2
   std::map m2(m1.begin(), m1.end()); // guide #1
}
```
Is rejected by clang, but accepted by both gcc and msvc:
https://godbolt.org/z/6v4fvabb5 .

So basically CTAD with copy-list-initialization is rejected.

Note that this exact code is also used in a cppreference article:
https://en.cppreference.com/w/cpp/container/map/deduction_guides

I checked the C++11 and C++20 standard drafts to see whether suppressing
user conversion is the correct thing to do for user conversions. Based
on the standard I don't think that it is correct.

```
13.3.1.4 Copy-initialization of class by user-defined conversion [over.match.copy]
Under the conditions specified in 8.5, as part of a copy-initialization of an object of class type, a user-defined
conversion can be invoked to convert an initializer expression to the type of the object being initialized.
Overload resolution is used to select the user-defined conversion to be invoked
```
So we could use user defined conversions according to the standard.

```
If a narrowing conversion is required to initialize any of the elements, the
program is ill-formed.
```
We should not do narrowing.

```
In copy-list-initialization, if an explicit constructor is chosen, the initialization is ill-formed.
```
We should not use explicit constructors.
tstellar pushed a commit that referenced this pull request Jun 24, 2024
`rethrow` instruction is a terminator, but when when its DAG is built in
`SelectionDAGBuilder` in a custom routine, it was NOT treated as such.

```ll
rethrow:                                          ; preds = %catch.start
  invoke void @llvm.wasm.rethrow() #1 [ "funclet"(token %1) ]
          to label %unreachable unwind label %ehcleanup

ehcleanup:                                        ; preds = %rethrow, %catch.dispatch
  %tmp = phi i32 [ 10, %catch.dispatch ], [ 20, %rethrow ]
  ...
```

In this bitcode, because of the `phi`, a `CONST_I32` will be created in
the `rethrow` BB. Without this patch, the DAG for the `rethrow` BB looks
like this:
```
  t0: ch,glue = EntryToken
      t3: ch = CopyToReg t0, Register:i32 %9, Constant:i32<20>
      t5: ch = llvm.wasm.rethrow t0, TargetConstant:i32<12161>
    t6: ch = TokenFactor t3, t5
  t8: ch = br t6, BasicBlock:ch<unreachable 0x562532e43c50>
```
Note that `CopyToReg` and `llvm.wasm.rethrow` don't have dependence so
either can come first in the selected code, which can result in the code
like
```mir
bb.3.rethrow:
  RETHROW 0, implicit-def dead $arguments
  %9:i32 = CONST_I32 20, implicit-def dead $arguments
  BR %bb.6, implicit-def dead $arguments
```

After this patch, `llvm.wasm.rethrow` is treated as a terminator, and
the DAG will look like
```
        t0: ch,glue = EntryToken
      t3: ch = CopyToReg t0, Register:i32 %9, Constant:i32<20>
    t5: ch = llvm.wasm.rethrow t3, TargetConstant:i32<12161>
  t7: ch = br t5, BasicBlock:ch<unreachable 0x5555e3d32c70>
```
Note that now `rethrow` takes a token from `CopyToReg`, so `rethrow` has
to come after `CopyToReg`. And the resulting code will be
```mir
bb.3.rethrow:
  %9:i32 = CONST_I32 20, implicit-def dead $arguments
  RETHROW 0, implicit-def dead $arguments
  BR %bb.6, implicit-def dead $arguments
```

I'm not very familiar with the internals of `getRoot` vs.
`getControlRoot`, but other terminator instructions seem to use the
latter, and using it for `rethrow` too worked.
tstellar pushed a commit that referenced this pull request Jul 10, 2024
…arallel fusion llvm#94391 (llvm#97607)"

This reverts commit edbc0e3.

Reason for rollback. ASAN complains about this PR:

==4320==ERROR: AddressSanitizer: heap-use-after-free on address 0x502000006cd8 at pc 0x55e2978d63cf bp 0x7ffe6431c2b0 sp 0x7ffe6431c2a8
READ of size 8 at 0x502000006cd8 thread T0
    #0 0x55e2978d63ce in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr> mlir/include/mlir/IR/IRMapping.h:40:11
    #1 0x55e2978d63ce in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:156:11
    #2 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43
    #3 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17
    #4 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #5 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #6 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #7 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15
    #8 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10
    #9 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #10 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #11 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #12 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation*, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16
    #13 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10
    #14 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16
    #15 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17
    #16 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #17 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #18 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7
    #19 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21
    #20 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16
    #21 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10
    #22 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation*) mlir/lib/Pass/Pass.cpp:884:60
    #23 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17
    #24 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
    #25 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
    #26 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #27 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #28 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
    #29 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
    #30 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
    #31 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
    #32 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
    #33 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
    #34 0x55e291eb15f8 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10
    #35 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33
    #36 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94)
    #37 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120

0x502000006cd8 is located 8 bytes inside of 16-byte region [0x502000006cd0,0x502000006ce0)
freed by thread T0 here:
    #0 0x55e29130b7e2 in operator delete(void*, unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:155:3
    #1 0x55e2979eb657 in __libcpp_operator_delete<void *, unsigned long>
    #2 0x55e2979eb657 in __do_deallocate_handle_size<>
    #3 0x55e2979eb657 in __libcpp_deallocate
    #4 0x55e2979eb657 in deallocate
    #5 0x55e2979eb657 in deallocate
    #6 0x55e2979eb657 in operator()
    #7 0x55e2979eb657 in ~vector
    #8 0x55e2979eb657 in mlir::Block::~Block() mlir/lib/IR/Block.cpp:24:1
    #9 0x55e2979ebc17 in deleteNode llvm/include/llvm/ADT/ilist.h:42:39
    #10 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:205:5
    #11 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:209:39
    #12 0x55e2979ebc17 in mlir::Block::erase() mlir/lib/IR/Block.cpp:67:28
    #13 0x55e297aef978 in mlir::RewriterBase::eraseBlock(mlir::Block*) mlir/lib/IR/PatternMatch.cpp:245:10
    #14 0x55e297af0563 in mlir::RewriterBase::inlineBlockBefore(mlir::Block*, mlir::Block*, llvm::ilist_iterator<llvm::ilist_detail::node_options<mlir::Operation, false, false, void, false, void>, false, false>, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:331:3
    #15 0x55e297af06d8 in mlir::RewriterBase::mergeBlocks(mlir::Block*, mlir::Block*, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:341:3
    #16 0x55e297036608 in mlir::scf::ForOp::replaceWithAdditionalYields(mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) mlir/lib/Dialect/SCF/IR/SCF.cpp:575:12
    #17 0x55e2970673ca in mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Model<mlir::scf::ForOp>::replaceWithAdditionalYields(mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.h.inc:658:56
    #18 0x55e2978d5feb in replaceWithAdditionalYields blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.cpp.inc:105:14
    #19 0x55e2978d5feb in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:135:14
    #20 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43
    #21 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17
    #22 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #23 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #24 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #25 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15
    #26 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10
    #27 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #28 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #29 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #30 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation*, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16
    #31 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10
    #32 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16
    #33 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17
    #34 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #35 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #36 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7
    #37 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21
    #38 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16
    #39 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10
    #40 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation*) mlir/lib/Pass/Pass.cpp:884:60
    #41 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17
    #42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
    #43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
    #44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
    #47 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
    #48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
    #49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
    #50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
    #51 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14

previously allocated by thread T0 here:
    #0 0x55e29130ab5d in operator new(unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:86:3
    #1 0x55e2979ed5d4 in __libcpp_operator_new<unsigned long>
    #2 0x55e2979ed5d4 in __libcpp_allocate
    #3 0x55e2979ed5d4 in allocate
    #4 0x55e2979ed5d4 in __allocate_at_least<std::__u::allocator<mlir::BlockArgument> >
    #5 0x55e2979ed5d4 in __split_buffer
    #6 0x55e2979ed5d4 in mlir::BlockArgument* std::__u::vector<mlir::BlockArgument, std::__u::allocator<mlir::BlockArgument>>::__push_back_slow_path<mlir::BlockArgument const&>(mlir::BlockArgument const&)
    #7 0x55e2979ec0f2 in push_back
    #8 0x55e2979ec0f2 in mlir::Block::addArgument(mlir::Type, mlir::Location) mlir/lib/IR/Block.cpp:154:13
    #9 0x55e29796e457 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2172:34
    #10 0x55e29796e457 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7
    #11 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16
    #12 0x55e297035742 in mlir::scf::ForOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/SCF/IR/SCF.cpp:521:14
    #13 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult (*)(mlir::OpAsmParser&, mlir::OperationState&)>(void*, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12
    #14 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12
    #15 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #16 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #17 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9
    #18 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19
    #19 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10
    #20 0x55e297971d20 in parseBlockBody mlir/lib/AsmParser/Parser.cpp:2296:9
    #21 0x55e297971d20 in (anonymous namespace)::OperationParser::parseBlock(mlir::Block*&) mlir/lib/AsmParser/Parser.cpp:2226:12
    #22 0x55e29796e4f5 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2184:7
    #23 0x55e29796e4f5 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7
    #24 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16
    #25 0x55e29796b2cf in (anonymous namespace)::CustomOpAsmParser::parseOptionalRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1796:12
    #26 0x55e2978d89ff in mlir::function_interface_impl::parseFunctionOp(mlir::OpAsmParser&, mlir::OperationState&, bool, mlir::StringAttr, llvm::function_ref<mlir::Type (mlir::Builder&, llvm::ArrayRef<mlir::Type>, llvm::ArrayRef<mlir::Type>, mlir::function_interface_impl::VariadicFlag, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>&)>, mlir::StringAttr, mlir::StringAttr) mlir/lib/Interfaces/FunctionImplementation.cpp:232:14
    #27 0x55e2969ba41d in mlir::func::FuncOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/Func/IR/FuncOps.cpp:203:10
    #28 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult (*)(mlir::OpAsmParser&, mlir::OperationState&)>(void*, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12
    #29 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12
    #30 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #31 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #32 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9
    #33 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19
    #34 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10
    #35 0x55e297959b78 in parse mlir/lib/AsmParser/Parser.cpp:2725:20
    #36 0x55e297959b78 in mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block*, mlir::ParserConfig const&, mlir::AsmParserState*, mlir::AsmParserCodeCompleteContext*) mlir/lib/AsmParser/Parser.cpp:2785:41
    #37 0x55e29790d5c2 in mlir::parseSourceFile(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::Block*, mlir::ParserConfig const&, mlir::LocationAttr*) mlir/lib/Parser/Parser.cpp:46:10
    #38 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp, const std::__u::shared_ptr<llvm::SourceMgr> &> mlir/include/mlir/Parser/Parser.h:159:14
    #39 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp> mlir/include/mlir/Parser/Parser.h:189:10
    #40 0x55e291ebbfe2 in mlir::parseSourceFileForTool(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::ParserConfig const&, bool) mlir/include/mlir/Tools/ParseUtilities.h:31:12
    #41 0x55e291ebb263 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:383:33
    #42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
    #43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
    #44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
    #47 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
    #48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
    #49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
    #50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
    #51 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
    #52 0x55e291eb15f8 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10
    #53 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33
    #54 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94)
    #55 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120

SUMMARY: AddressSanitizer: heap-use-after-free mlir/include/mlir/IR/IRMapping.h:40:11 in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr>
Shadow bytes around the buggy address:
  0x502000006a00: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa
  0x502000006a80: fa fa 00 fa fa fa 00 00 fa fa 00 00 fa fa 00 00
  0x502000006b00: fa fa 00 00 fa fa 00 00 fa fa 00 fa fa fa 00 fa
  0x502000006b80: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 00
  0x502000006c00: fa fa 00 00 fa fa 00 00 fa fa 00 00 fa fa fd fa
=>0x502000006c80: fa fa fd fa fa fa fd fd fa fa fd[fd]fa fa fd fd
  0x502000006d00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
  0x502000006d80: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
  0x502000006e00: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 fa
  0x502000006e80: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa
  0x502000006f00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==4320==ABORTING
tstellar pushed a commit that referenced this pull request Jul 10, 2024
This test is currently flaky on a local Windows amd64 build. The reason
is that it relies on the order of `process.threads` but this order is
nondeterministic:

If we print lldb's inputs and outputs while running, we can see that the
breakpoints are always being set correctly, and always being hit:

```sh
runCmd: breakpoint set -f "main.c" -l 2
output: Breakpoint 1: where = a.out`func_inner + 1 at main.c:2:9, address = 0x0000000140001001

runCmd: breakpoint set -f "main.c" -l 7
output: Breakpoint 2: where = a.out`main + 17 at main.c:7:5, address = 0x0000000140001021

runCmd: run
output: Process 52328 launched: 'C:\workspace\llvm-project\llvm\build\lldb-test-build.noindex\functionalities\unwind\zeroth_frame\TestZerothFrame.test_dwarf\a.out' (x86_64)
Process 52328 stopped
* thread #1, stop reason = breakpoint 1.1
    frame #0: 0x00007ff68f6b1001 a.out`func_inner at main.c:2:9
   1    void func_inner() {
-> 2        int a = 1;  // Set breakpoint 1 here
                ^
   3    }
   4
   5    int main() {
   6        func_inner();
   7        return 0; // Set breakpoint 2 here
```

However, sometimes the backtrace printed in this test shows that the
process is stopped inside NtWaitForWorkViaWorkerFactory from
`ntdll.dll`:

```sh
Backtrace at the first breakpoint:
frame #0: 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
frame #1: 0x00007ffecc74585e ntdll.dll`RtlClearThreadWorkOnBehalfTicket + 862
frame #2: 0x00007ffecc3e257d kernel32.dll`BaseThreadInitThunk + 29
frame #3: 0x00007ffecc76af28 ntdll.dll`RtlUserThreadStart + 40
```

When this happens, the test fails with an assertion error that the
stopped thread's zeroth frame's current line number does not match the
expected line number. This is because the test is looking at the wrong
thread: `process.threads[0]`.

If we print the list of threads each time the test is run, we notice
that threads are sometimes in a different order, within
`process.threads`:

```sh
Thread 0: thread #4: tid = 0x9c38, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 1: thread #2: tid = 0xa950, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 2: thread #1: tid = 0xab18, 0x00007ff64bc81001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
Thread 3: thread #3: tid = 0xc514, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20

Thread 0: thread #3: tid = 0x018c, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 1: thread #1: tid = 0x85c8, 0x00007ff7130c1001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
Thread 2: thread #2: tid = 0xf344, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 3: thread #4: tid = 0x6a50, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
```

Use `self.thread()` to consistently select the correct thread, instead.

Co-authored-by: kendal <kendal@thebrowser.company>
tstellar pushed a commit that referenced this pull request Jul 10, 2024
…izations of function templates to USRGenerator (llvm#98027)

Given the following:
```
template<typename T>
struct A
{
    void f(int); // #1
    
    template<typename U>
    void f(U); // #2
    
    template<>
    void f<int>(int); // #3
};
```
Clang will generate the same USR for `#1` and `#2`. This patch fixes the
issue by including the template arguments of dependent class scope
explicit specializations in their USRs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant