Description
Original report: https://www.mail-archive.com/gem5-users@gem5.org/msg16677.html
At lkmc 99180e6, gem5 08c79a194d1a3430801c04f37d13216cc9ec1da3 (May 2019):
./run --arch aarch64 --emulator gem5 --cpu 2 -- \
--cpu-type=HPI --caches --l2cache --l1d_size=64kB --l1i_size=64kB --l2_size=256kB
The simulator fails with:
Exiting @ tick 18446744073709551615 because simulate() limit reached
and the last log message is:
<6>[ 0.103629] printk: console [ttyAMA0] enabled
With 1 core, boot works, and the next message would have been:
<3>[ 0.103586] OF: amba_device_add() failed (-2) for /kmi@1c070000
With TimingSimpleCPU, boot went further but also failed with those options, just after init finished. The last message is:
root@buildroot#
and once again we have:
Exiting @ tick 18446744073709551615 because simulate() limit reached
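As a side note, the tick in that exit message is exactly 2^64 - 1, the largest unsigned 64-bit value, which as far as I can tell is what gem5 uses as its maximum tick. That would be consistent with the simulation running out of useful events rather than exiting at a meaningful time, but I have not confirmed that. A quick check of the arithmetic (the variable names are just for illustration):

```python
# Tick value copied from the "Exiting @ tick ..." messages above.
exit_tick = 18446744073709551615

# Largest value representable in an unsigned 64-bit integer.
u64_max = 2**64 - 1

# Both comparisons are plain integer arithmetic, no gem5 needed.
print(exit_tick == u64_max)
print(hex(exit_tick))
```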
I tried to bisect and reached a commit. However, when I then tried on another machine, the problem could not be reproduced. This suggests undefined behavior, so the bisect result may well be bogus...
Bisection start:
- gem5 master 9af53ddaec43653d915649173660dc5c71f06a72: hangs at printk
- gem5 default 08c79a194d1a3430801c04f37d13216cc9ec1da3: blows up with simulate() limit reached
- gem5 good 91195ae7f637d1d4879cc3bf0860147333846e75
Bisection progress from git bisect start 9af53ddaec43653d915649173660dc5c71f06a72 91195ae7f637d1d4879cc3bf0860147333846e75. I'm interactively inspecting each run and marking it good if it gets past the printk message, since otherwise runs just take forever:
- a4f30167f676fa45192fc3322c96e24b83f5e96f good
- 8529cdad6bb6d6aceb2e12e785607f0d409c0d76 good
- c1e040d81aa6fe0dcd9ecce1d4bf4b3cef60f894 bad, hangs at printk, requires git cherry-pick -n 4c38c7c02aca9922d7f30f2f399bbe94c034eb59 to even start
- 39896bd265cfab20ab512cf4bceed7b38eca9d91 good
- ea088f5150d03d4481555ecbbfa2afba3a87468a good
- f5cf6d5f5ef8df0fedcba9d3cf3c16d76a6dceae good
- aece7fcdf97d2864fbb31e02940bfcdd470db7b9 bad, hangs at printk, requires git cherry-pick -n 4c38c7c02aca9922d7f30f2f399bbe94c034eb59 to even start
- 2574dc41a6b420f0101d0ecf2a3205091ef96940 bad, same as above. Also tested single core, and that works.
- f2be9f195c5aa226fa546e79c9acf95c8a800915 bad, same as above
Bisection result: f2be9f195c5aa226fa546e79c9acf95c8a800915 is the first bad commit, "mem: Option to toggle DRAM low-power states", which does not look evil.
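The manual good/bad marking above could probably be automated. This is a minimal sketch, not what I actually ran: it assumes the terminal output of each run has already been captured as text, and it only decides whether the kernel printed anything after the console message. The helper name and the skip rule are hypothetical; the exit codes are the ones `git bisect run` documents (0 good, 125 skip, other values from 1 to 127 bad).

```python
import re

# Last kernel message seen before the hang in the report above.
CONSOLE_MARKER = "printk: console [ttyAMA0] enabled"
# Kernel log lines look like "<6>[ 0.103629] ...".
KERNEL_LINE = re.compile(r"^<\d>\[\s*\d+\.\d+\]")

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_exit_code(log_text):
    """Map one captured terminal log to a `git bisect run` exit code (hypothetical helper)."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if CONSOLE_MARKER in line:
            # Good if the kernel printed anything after the console came up.
            if any(KERNEL_LINE.match(later) for later in lines[i + 1:]):
                return GOOD
            return BAD
    # Assumption: no marker at all means the run never got going, so skip it.
    return SKIP


# Two sample logs built from the messages quoted in this report.
hang_log = "<6>[ 0.103629] printk: console [ttyAMA0] enabled\n"
pass_log = hang_log + "<3>[ 0.103586] OF: amba_device_add() failed (-2) for /kmi@1c070000\n"

print(bisect_exit_code(hang_log), bisect_exit_code(pass_log), bisect_exit_code(""))
```

A wrapper would still have to run gem5 under a timeout and capture the terminal output before calling this; that part is not shown here. Given that the hang did not reproduce on a second machine, an automated bisect would be just as suspect as the manual one.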
I then tried at later master commits:
- 16eeee5356585441a49d05c78abc328ef09f7ace: TimingSimpleCPU boot worked, HPI stuck at:
<5>[ 0.347375] sd 0:0:0:0: [sda] Attached SCSI disk
In the HPI run, stdout contains a few UBSan messages:
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/sim/probe/probe.hh:270:33: runtime error: downcast of address 0x563f5fc6da40 which does not point to an object of type 'ProbeListenerArgBase'
0x563f5fc6da40: note: object is of type 'ProbeListener'
00 00 00 00 98 06 89 44 3f 56 00 00 d0 1b 5e 5d 3f 56 00 00 60 da c6 5f 3f 56 00 00 04 00 00 00
^~~~~~~~~~~~~~~~~~~~~~~
vptr for 'ProbeListener'
info: Using bootloader at address 0x10
info: Using kernel entry physical address at 0x80080000
info: Loading DTB file: /mnt/sda3/linux-kernel-module-cheat-out/run/gem5/aarch64/0/m5out/system.dtb at address 0x88000000
**** REAL SIMULATION ****
warn: Existing EnergyCtrl, but no enabled DVFSHandler found.
info: Entering event queue @ 0. Starting simulation...
warn: SCReg: Access to unknown device dcc0:site0:pos0:fn7:dev0
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/arch/arm/generated/exec-ns.cc.inc:89416:46: runtime error: load of value 475, which is not a valid value for type 'IntRegIndex'
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/arch/arm/generated/exec-ns.cc.inc:89553:46: runtime error: load of value 463, which is not a valid value for type 'IntRegIndex'
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/arch/arm/generated/decoder-ns.cc.inc:52295:52: runtime error: load of value 427, which is not a valid value for type 'IntRegIndex'
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/arch/arm/generated/exec-ns.cc.inc:89620:40: runtime error: load of value 427, which is not a valid value for type 'IntRegIndex'
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/arch/arm/generated/exec-ns.cc.inc:89683:40: runtime error: load of value 430, which is not a valid value for type 'IntRegIndex'
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/arch/arm/generated/exec-ns.cc.inc:89035:46: runtime error: load of value 471, which is not a valid value for type 'IntRegIndex'
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/mem/cache/prefetch/queued.hh:57:12: runtime error: load of value 66, which is not a valid value for type 'bool'
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/arch/arm/generated/exec-ns.cc.inc:89650:40: runtime error: load of value 430, which is not a valid value for type 'IntRegIndex'
warn: instruction 'csdb' unimplemented
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/arch/arm/generated/exec-ns.cc.inc:89162:46: runtime error: load of value 474, which is not a valid value for type 'IntRegIndex'
warn: GIC APRn write ignored because not implemented: 0xd0
warn: GIC APRn write ignored because not implemented: 0xd4
warn: GIC APRn write ignored because not implemented: 0xd8
warn: GIC APRn write ignored because not implemented: 0xdc
warn: GIC APRn write ignored because not implemented: 0xd0
warn: GIC APRn write ignored because not implemented: 0xd4
warn: GIC APRn write ignored because not implemented: 0xd8
warn: GIC APRn write ignored because not implemented: 0xdc
/home/ciro/bak/git/linux-kernel-module-cheat/out/gem5/san/build/ARM/arch/arm/generated/exec-ns.cc.inc:89289:46: runtime error: load of value 473, which is not a valid value for type 'IntRegIndex'
but most of those errors don't seem to point at the right line, maybe because this is an opt build rather than a debug one?
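To see how many distinct problems are hiding in that output, the UBSan lines can be grouped by source file and offending type. This is a rough sketch under the assumption that every report has the `<file>:<line>:<col>: runtime error: <message>` shape seen above; the function name is made up, and the sample paths are shortened from the log.

```python
import re
from collections import Counter

# UBSan reports have the shape "<file>:<line>:<col>: runtime error: <message>".
UBSAN_LINE = re.compile(
    r"^(?P<file>\S+):(?P<line>\d+):(?P<col>\d+): runtime error: (?P<msg>.*)$"
)
# The offending type is quoted in the message, e.g. "... for type 'bool'".
TYPE_NAME = re.compile(r"type '([^']+)'")


def summarize_ubsan(log_text):
    """Count UBSan reports per (source file basename, type name)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = UBSAN_LINE.match(line)
        if not m:
            continue  # gem5 info:/warn: lines and UBSan note lines are ignored
        t = TYPE_NAME.search(m.group("msg"))
        basename = m.group("file").rsplit("/", 1)[-1]
        counts[(basename, t.group(1) if t else "?")] += 1
    return counts


# A few lines from the output above, with the build path prefix shortened.
sample = "\n".join([
    "build/ARM/sim/probe/probe.hh:270:33: runtime error: downcast of address 0x563f5fc6da40 which does not point to an object of type 'ProbeListenerArgBase'",
    "build/ARM/arch/arm/generated/exec-ns.cc.inc:89416:46: runtime error: load of value 475, which is not a valid value for type 'IntRegIndex'",
    "build/ARM/arch/arm/generated/exec-ns.cc.inc:89553:46: runtime error: load of value 463, which is not a valid value for type 'IntRegIndex'",
    "build/ARM/mem/cache/prefetch/queued.hh:57:12: runtime error: load of value 66, which is not a valid value for type 'bool'",
    "warn: instruction 'csdb' unimplemented",
])

summary = summarize_ubsan(sample)
for (filename, type_name), n in sorted(summary.items()):
    print(n, filename, type_name)
```

On the full output this would collapse the many IntRegIndex loads into a couple of entries, leaving the probe.hh downcast and the queued.hh bool load as the other distinct reports.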