Add LoongArch support #11
Conversation
There is still one unanswered question about this patch: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030119.html
Fix a deadlock for auto-resize hash tables when cds_lfht_destroy is called with RCU read-side lock held.

Example stack trace of a hang:

```
Thread 2 (Thread 0x7f21ba876700 (LWP 26114)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f21beba7aa0 in futex (val3=0, uaddr2=0x0, timeout=0x0, val=-1, op=0, uaddr=0x7f21bedac308 <urcu_memb_gp+8>) at ../include/urcu/futex.h:81
#2  futex_noasync (timeout=0x0, uaddr2=0x0, val3=0, val=-1, op=0, uaddr=0x7f21bedac308 <urcu_memb_gp+8>) at ../include/urcu/futex.h:90
#3  wait_gp () at urcu.c:265
#4  wait_for_readers (input_readers=input_readers@entry=0x7f21ba8751b0, cur_snap_readers=cur_snap_readers@entry=0x0, qsreaders=qsreaders@entry=0x7f21ba8751c0) at urcu.c:357
#5  0x00007f21beba8339 in urcu_memb_synchronize_rcu () at urcu.c:498
#6  0x00007f21be99f93f in fini_table (last_order=<optimized out>, first_order=13, ht=0x5651cec75400) at rculfhash.c:1489
#7  _do_cds_lfht_shrink (new_size=<optimized out>, old_size=<optimized out>, ht=0x5651cec75400) at rculfhash.c:2001
#8  _do_cds_lfht_resize (ht=ht@entry=0x5651cec75400) at rculfhash.c:2023
#9  0x00007f21be99fa26 in do_resize_cb (work=0x5651e20621a0) at rculfhash.c:2063
#10 0x00007f21be99dbfd in workqueue_thread (arg=0x5651cec74a00) at workqueue.c:234
#11 0x00007f21bd7c06db in start_thread (arg=0x7f21ba876700) at pthread_create.c:463
#12 0x00007f21bd4e961f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f21bf285300 (LWP 26098)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f21be99d8b7 in futex (val3=0, uaddr2=0x0, timeout=0x0, val=-1, op=0, uaddr=0x5651d8b38584) at ../include/urcu/futex.h:81
#2  futex_async (timeout=0x0, uaddr2=0x0, val3=0, val=-1, op=0, uaddr=0x5651d8b38584) at ../include/urcu/futex.h:113
#3  futex_wait (futex=futex@entry=0x5651d8b38584) at workqueue.c:135
#4  0x00007f21be99e2c8 in urcu_workqueue_wait_completion (completion=completion@entry=0x5651d8b38580) at workqueue.c:423
#5  0x00007f21be99e3f9 in urcu_workqueue_flush_queued_work (workqueue=0x5651cec74a00) at workqueue.c:452
#6  0x00007f21be9a0c83 in cds_lfht_destroy (ht=0x5651d8b2fcf0, attr=attr@entry=0x0) at rculfhash.c:1906
```

This deadlock is easy to reproduce by rapidly adding a large number of entries to the cds_lfht, removing them, and calling cds_lfht_destroy(). The deadlock occurs if the call to cds_lfht_destroy() takes place while a resize of the hash table is ongoing.

Fix this by moving the teardown of the lfht worker thread to the libcds library destructor, so it does not have to wait on synchronize_rcu from a resize callback from within a read-side critical section. As a consequence, the atfork callbacks are left registered within each urcu flavor for which a resizeable hash table is created until the end of the executable's lifetime.

The other part of the fix is to move the hash table destruction to the worker thread for auto-resize hash tables. This prevents having to wait for resize callbacks from within an RCU read-side critical section, and is guaranteed by the fact that the worker thread serializes previously queued resize callbacks before the destroy callback.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: If8b1c3c8063dc7b9846dc5c3fc452efd917eab4d
Does this issue still block the merge of the pull request? If it does, we can provide the hardware for you to perform verification.
Yes, this still blocks the merge of the pull request. I need to understand the architecture design well enough to merge and maintain support for an architecture. For this I would need either answers to my questions, or boards available for testing and access to the architecture documentation. Testing-wise, the ideal scenario would be to add at least two test boards to our test rack at EfficiOS, so they can be wired into our CI. We can discuss access to hardware by email. Please contact me at mathieu.desnoyers@efficios.com
LoongArch is already supported by QEMU, so CI should be possible without real hardware. FWIW, this missing patch is now blocking other packages in Debian. I am adding the patch locally for now.
The proposed patch has this set in include/urcu/uatomic/loongarch.h:

```c
#define UATOMIC_HAS_ATOMIC_BYTE
```

The architecture manual states:

> "The access address of an AM* atomic access instruction is the value of the general register rj. The access address of an AM* atomic access instruction always requires natural alignment, and failure to meet this condition will trigger a non-alignment exception. Atomic access instructions ending in .W and .WU read and write memory and intermediate operations with a data length of 32 bits, while atomic access instructions ending in .D and .DU read and write memory and intermediate operations with a data length of 64 bits. Whether ending in .W or .WU, the data of a word retrieved from memory by an atomic access instruction is symbolically extended and written to the general register rd."

So there is a discrepancy between the patch implementation and the architecture manual: the AM* instructions only operate on naturally aligned 32-bit and 64-bit data, yet the patch claims byte atomics.

There is a test for this under tests/unit/test_uatomic.c. I suspect the main reason this test does not fail is that the addresses of the byte and short variables happen to be naturally aligned on the word size.

So the reason I have not merged this patch is that I think it has a bug, and that if the urcu tests all pass with it, there is a hole in the testing that needs to be filled. I am not in favor of this patch being picked up as-is in Debian without these issues being addressed first. I have voiced my concerns repeatedly to the patch submitter and they were never addressed.
I have pushed a commit into the liburcu master branch which should help trigger the "unaligned atomic trap" detailed in the Loongson architecture manual. Please try it out: commit cac31bf
Testing liburcu in a QEMU environment is not sufficient to validate the correctness of the memory barriers, as this inherently depends on the hardware implementation of the processor. Typically emulators will fall back on a stronger memory consistency model compared to the emulated hardware, which makes things "work" but fail to cover the various race conditions that can happen if the memory barriers are wrong. So no, testing within QEMU is not enough for liburcu.
The patch is already in the »unreleased« suite of Debian, as it would otherwise block many other packages.
There are two LoongArch machines in the GCC compile farm, to which any open source developer can obtain access. The two machines are currently offline, but I will reach out to my contacts at Loongson to get them back online.
Short-term, if you really need to deploy this patch, I recommend that you remove the UATOMIC_HAS_ATOMIC_BYTE and UATOMIC_HAS_ATOMIC_SHORT defines from include/urcu/uatomic/loongarch.h. It still needs to be tested on real hardware, but my main underlying concern is the presence of those two defines in public headers, which contradict the architecture reference manual.
I would be interested to have liburcu tested on real Loongson boards before I merge its support into liburcu. Please let me know how it goes.
Test passed. Results:

```
Test atomic ops on byte with 0 byte offset from long alignment
ok 1 - uatomic_read(&vals.c[i]) == 10
Test atomic ops on byte with 1 byte offset from long alignment
ok 18 - uatomic_read(&vals.c[i]) == 10
Test atomic ops on byte with 2 byte offset from long alignment
ok 35 - uatomic_read(&vals.c[i]) == 10
Test atomic ops on byte with 3 byte offset from long alignment
ok 52 - uatomic_read(&vals.c[i]) == 10
Test atomic ops on byte with 4 byte offset from long alignment
ok 69 - uatomic_read(&vals.c[i]) == 10
Test atomic ops on byte with 5 byte offset from long alignment
ok 86 - uatomic_read(&vals.c[i]) == 10
Test atomic ops on byte with 6 byte offset from long alignment
ok 103 - uatomic_read(&vals.c[i]) == 10
Test atomic ops on byte with 7 byte offset from long alignment
ok 120 - uatomic_read(&vals.c[i]) == 10
Test atomic ops on short with 0 byte offset from long alignment
ok 137 - uatomic_read(&vals.s[i]) == 10
Test atomic ops on short with 2 byte offset from long alignment
ok 154 - uatomic_read(&vals.s[i]) == 10
Test atomic ops on short with 4 byte offset from long alignment
ok 171 - uatomic_read(&vals.s[i]) == 10
Test atomic ops on short with 6 byte offset from long alignment
ok 188 - uatomic_read(&vals.s[i]) == 10
Test atomic ops on int with 0 byte offset from long alignment
ok 205 - uatomic_read(&vals.i[i]) == 10
Test atomic ops on int with 4 byte offset from long alignment
ok 222 - uatomic_read(&vals.i[i]) == 10
Test atomic ops on long
ok 239 - uatomic_read(&vals.l) == 10
```
The LoongArch machine in the GCC compilation farm is currently being prepared, and there will be a notification when it goes online. |
Yes. char and short atomics are implemented through the ll/sc instructions.
Perfect, this makes sense. I'll apply the patch and add an extra comment stating that 8-bit and 16-bit atomic accesses are performed through ll/sc, and that the ll/sc loop may retry if the cache line is modified concurrently. This can be relevant for API users relying on strong forward progress guarantees. |
Please add a commit message describing the change, and a "Signed-off-by:" line with your email at the end, and I will be able to merge it.
This commit completes LoongArch support. LoongArch supports byte and short atomic operations, and defines UATOMIC_HAS_ATOMIC_BYTE and UATOMIC_HAS_ATOMIC_SHORT.

Signed-off-by: Wang Jing <wangjing@loongson.cn>
Change-Id: I335e654939bfc90994275f2a4fad550c95f3eba4
Modified and completed. Thanks.
…LL/SC

Based on the LoongArch Reference Manual:
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html

Section 2.2.7 "Atomic Memory Access Instructions" only lists atomic operations for 32-bit and 64-bit integers. As detailed in Section 2.2.7.1, LL/SC instructions operating on 32-bit and 64-bit integers are also available. Those are used by the compiler to support atomics on byte and short types.

This means atomics on 32-bit and 64-bit types have stronger forward progress guarantees than those operating on 8-bit and 16-bit types.

Link: #11 (comment)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I01569b718f7300a46d984c34065c0bbfbd2f7cc6
It is now merged into the liburcu master branch, thanks for your contribution!
Please review. Thanks.