Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only require ivy bridge when compiling with gcc 13. #25786

Merged

Conversation

toregge
Copy link
Member

@toregge toregge commented Jan 29, 2023

@baldersheim : please review

Copy link
Member

@baldersheim baldersheim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ?

@toregge
Copy link
Member Author

toregge commented Jan 29, 2023

Issues with generated binaries, causing unique_store_string_allocator_test_app to fail.

The following test program fails when requiring haswell:

// Copyright Yahoo. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.

#include <gtest/gtest.h>

struct Stats
{
    size_t _used;
    size_t _hold;
    size_t _dead;
    size_t _extra_used;
    Stats() : _used(0), _hold(0), _dead(0), _extra_used(0) {}
    Stats used(size_t val) { _used += val; return *this; }
    Stats hold(size_t val) { _hold += val; return *this; }
};

void
assert_stats(size_t exp_used, size_t exp_hold, size_t exp_dead,
             size_t exp_extra_used, const Stats stats)
{
    EXPECT_EQ(exp_used, stats._used);
    EXPECT_EQ(exp_hold, stats._hold);
    EXPECT_EQ(exp_dead, stats._dead);
    EXPECT_EQ(exp_extra_used, stats._extra_used);
}

TEST(Lifetime, temporary_stats_object_lifetime)
{
    assert_stats(16, 0, 0, 0, Stats().used(16).hold(0));
    assert_stats(16, 16, 0, 0, Stats().used(16).hold(16));
}

int main(int argc, char* argv[])
{
    ::testing::InitGoogleTest(&argc, argv);
    return RUN_ALL_TESTS();
}

vespadev-rawhide-desk2$ g++ -march=ivybridge -O3 -g -o lifetime lifetime.cpp -lgtest
vespadev-rawhide-desk2$ ./lifetime 
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Lifetime
[ RUN      ] Lifetime.temporary_stats_object_lifetime
[       OK ] Lifetime.temporary_stats_object_lifetime (0 ms)
[----------] 1 test from Lifetime (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 1 test.
vespadev-rawhide-desk2$ g++ -march=haswell -O3 -g -o lifetime lifetime.cpp -lgtest
vespadev-rawhide-desk2$ ./lifetime 
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Lifetime
[ RUN      ] Lifetime.temporary_stats_object_lifetime
lifetime.cpp:22: Failure
Expected equality of these values:
  exp_dead
    Which is: 0
  stats._dead
    Which is: 16
lifetime.cpp:23: Failure
Expected equality of these values:
  exp_extra_used
    Which is: 0
  stats._extra_used
    Which is: 16
[  FAILED  ] Lifetime.temporary_stats_object_lifetime (0 ms)
[----------] 1 test from Lifetime (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Lifetime.temporary_stats_object_lifetime

 1 FAILED TEST
vespadev-rawhide-desk2$ 

@toregge
Copy link
Member Author

toregge commented Jan 29, 2023

Note, the same test program also fails with gcc 12 on CentOS Stream 8 when tuning for icelake-server:

vespadev-cs8-desk2$ g++ -I /opt/vespa-deps/include -L /opt/vespa-deps/lib64 -Wl,-rpath,/opt/vespa-deps/lib64  -march=haswell -mtune=icelake-server -O3 -g -o lifetime lifetime.cpp -lgtest
vespadev-cs8-desk2$ ./lifetime 
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Lifetime
[ RUN      ] Lifetime.temporary_stats_object_lifetime
lifetime.cpp:22: Failure
Expected equality of these values:
  exp_dead
    Which is: 0
  stats._dead
    Which is: 16
lifetime.cpp:23: Failure
Expected equality of these values:
  exp_extra_used
    Which is: 0
  stats._extra_used
    Which is: 16
[  FAILED  ] Lifetime.temporary_stats_object_lifetime (0 ms)
[----------] 1 test from Lifetime (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Lifetime.temporary_stats_object_lifetime

 1 FAILED TEST
vespadev-cs8-desk2$ 

@baldersheim
Copy link
Member

baldersheim commented Jan 30, 2023

A smaller test program for verifying issue.
gcc-12:
Works: c++ -g -O3 -march=haswell skylake_test.cpp -o skylake_test && ./skylake_test
Fails: c++ -g -O3 -march=haswell -mtune=skylake-avx512 skylake_test.cpp -o skylake_test && ./skylake_test

#include <cstddef>
#include <cassert>

struct Stats
{
    size_t _used;
    size_t _hold;
    size_t _dead;
    size_t _extra_used;
    Stats() : _used(0), _hold(0), _dead(0), _extra_used(0) {}
    Stats used(size_t val) { _used += val; return *this; }
    Stats hold(size_t val) { _hold += val; return *this; }
};

void
assert_stats(size_t exp_used, size_t exp_hold, size_t exp_dead,
             size_t exp_extra_used, const Stats stats)
{
    assert(exp_used == stats._used);
    assert(exp_hold == stats._hold);
    assert(exp_dead == stats._dead);                // <===== Assert fails
    assert(exp_extra_used == stats._extra_used);
}

int main(int , char* [])
{
    assert_stats(16, 0, 0, 0, Stats().used(16).hold(0));
    assert_stats(16, 16, 0, 0, Stats().used(16).hold(16));    // <========= Causes assert to fail
    return 0;
}

@baldersheim baldersheim merged commit b229fe2 into master Jan 30, 2023
@baldersheim baldersheim deleted the toregge/only-require-ivybridge-when-compiling-with-gcc-13 branch January 30, 2023 12:44
@vekterli
Copy link
Member

GCC looks to be emitting a semantically invalid AVX2 register broadcast instruction.

Prologue of assert_stats:

Dump of assembler code for function _Z12assert_statsmmmm5Stats:
   0x00000000004011a0 <+0>:	sub    rsp,0x8
   0x00000000004011a4 <+4>:	cmp    QWORD PTR [rsp+0x10],rdi
   0x00000000004011a9 <+9>:	jne    0x4011c5 <_Z12assert_statsmmmm5Stats+37>
   0x00000000004011ab <+11>:	cmp    QWORD PTR [rsp+0x18],rsi
   0x00000000004011b0 <+16>:	jne    0x401210 <_Z12assert_statsmmmm5Stats+112>
   0x00000000004011b2 <+18>:	cmp    QWORD PTR [rsp+0x20],rdx
   0x00000000004011b7 <+23>:	jne    0x4011f7 <_Z12assert_statsmmmm5Stats+87>
   0x00000000004011b9 <+25>:	cmp    QWORD PTR [rsp+0x28],rcx
   0x00000000004011be <+30>:	jne    0x4011de <_Z12assert_statsmmmm5Stats+62>
(...)

ABI conventions state that rdx is the 3rd integer argument, i.e. exp_dead. The input stats is passed on the stack, starting in memory at rsp+0x10. _dead is the 3rd field at rsp+0x20. Everything here looks as expected.

main looks fine for the first assert_stats invocation:

Dump of assembler code for function main(int, char**):
   0x0000000000401040 <+0>:	lea    r10,[rsp+0x8]
   0x0000000000401045 <+5>:	and    rsp,0xffffffffffffffe0
   0x0000000000401049 <+9>:	push   QWORD PTR [r10-0x8]
   0x000000000040104d <+13>:	xor    ecx,ecx
   0x000000000040104f <+15>:	xor    edx,edx
   0x0000000000401051 <+17>:	push   rbp
   0x0000000000401052 <+18>:	xor    esi,esi
   0x0000000000401054 <+20>:	mov    edi,0x10
   0x0000000000401059 <+25>:	mov    rbp,rsp
   0x000000000040105c <+28>:	push   r10
   0x000000000040105e <+30>:	sub    rsp,0x68
   0x0000000000401062 <+34>:	vmovdqa ymm1,YMMWORD PTR [rip+0x1076]        # 0x4020e0
   0x000000000040106a <+42>:	vmovdqu YMMWORD PTR [rsp],ymm1
   0x000000000040106f <+47>:	vzeroupper 
   0x0000000000401072 <+50>:	call   0x4011a0 <_Z12assert_statsmmmm5Stats>

Contents of rip+0x1076 (0x4020e0) is (as 4x u64) {0x0000000000000010, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000}, i.e. {16, 0, 0, 0}. This is first loaded into ymm1, then stored into [rsp] and subsequently checked in assert_stats. This matches the expected contents of the input Stats struct.

The funkiness starts at the next call, where we want to compare against a Stats struct with field data {16, 16, 0, 0}. Here with a breakpoint inserted after the vpbroadcastq instruction that immediately follows the above call.

   0x0000000000401077 <+55>:	vpbroadcastq ymm0,QWORD PTR [rip+0x1060]        # 0x4020e0
   0x0000000000401080 <+64>:	xor    ecx,ecx
=> 0x0000000000401082 <+66>:	vmovdqu YMMWORD PTR [rsp],ymm0
   0x0000000000401087 <+71>:	xor    edx,edx
   0x0000000000401089 <+73>:	mov    esi,0x10
   0x000000000040108e <+78>:	mov    edi,0x10
   0x0000000000401093 <+83>:	vzeroupper 
   0x0000000000401096 <+86>:	call   0x4011a0 <_Z12assert_statsmmmm5Stats>
   0x000000000040109b <+91>:	mov    r10,QWORD PTR [rbp-0x8]
   0x000000000040109f <+95>:	add    rsp,0x20
   0x00000000004010a3 <+99>:	xor    eax,eax
   0x00000000004010a5 <+101>:	leave  
   0x00000000004010a6 <+102>:	lea    rsp,[r10-0x8]
   0x00000000004010aa <+106>:	ret 

Since this uses an YMM output register, this would be the "Broadcast a qword element in source operand to four locations in ymm1" version of VPBROADCASTQ. Note that we're broadcasting a single u64 into 4x u64 register "regions". This is reusing the same constant memory as the previous call (0x4020e0), so we know this will be 0x0000000000000010. Inspecting ymm0 shows this:

(gdb) i r $ymm0
ymm0           {
v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0x10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v16_int16 = {0x10, 0x0, 0x0, 0x0, 0x10, 0x0, 0x0, 0x0, 0x10, 0x0, 0x0, 0x0, 0x10, 0x0, 0x0, 0x0}, v8_int32 = {0x10, 0x0, 0x10, 0x0, 0x10, 0x0, 0x10, 0x0},
v4_int64 = {0x10, 0x10, 0x10, 0x10}, <===
v2_int128 = {0x100000000000000010, 0x100000000000000010}
}

ymm0 is then stored verbatim into rsp as with the first call to assert_stats and the input integer registers are initialized to {rdi=16, rsi=16, rdx=0, rcx=0}. The latter is as expected, the former is not.

Breakpointing in assert_stats also shows how the YMM register is verbatim used as the Stats fields:

Dump of assembler code for function _Z12assert_statsmmmm5Stats:
   0x00000000004011a0 <+0>:	sub    rsp,0x8
   0x00000000004011a4 <+4>:	cmp    QWORD PTR [rsp+0x10],rdi
   0x00000000004011a9 <+9>:	jne    0x4011c5 <_Z12assert_statsmmmm5Stats+37>
   0x00000000004011ab <+11>:	cmp    QWORD PTR [rsp+0x18],rsi
   0x00000000004011b0 <+16>:	jne    0x401210 <_Z12assert_statsmmmm5Stats+112>
   0x00000000004011b2 <+18>:	cmp    QWORD PTR [rsp+0x20],rdx
=> 0x00000000004011b7 <+23>:	jne    0x4011f7 <_Z12assert_statsmmmm5Stats+87>
(gdb) x/4gx $rsp+0x10
0x7fffffffdce0:	0x0000000000000010	0x0000000000000010
0x7fffffffdcf0:	0x0000000000000010	0x0000000000000010

There is an AVX-512 variant of VPBROADCASTQ that has a write mask ("VPBROADCASTQ ymm1 {k1}{z}, xmm2/m64 Broadcast a qword element in source operand to locations in ymm1 subject to writemask k1."). Write mask registers are only available on AVX-512, so I'm wondering if GCC is somehow getting confused and trying to half-heartedly emit such a construction in AVX2-only code? Possibly in conjunction with compile-time constant handling of types that happen to be possible to entirely "alias" in an YMM register?

@baldersheim
Copy link
Member

Reported as GCC bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108599

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants