RISC-V: Add support of RAS for RISC-V architecture #11209
Draft
RISC-V: Add support of RAS for RISCV64 architecture
The RAS (Reliability, Availability, Serviceability) stack helps catch and diagnose hardware errors. It is required on
server-class platforms that need high availability and reliability. This patch adds support for generating the HEST table required for RAS.
NO
NO
How This Was Tested
Running
qemu/build/riscv64-softmmu/qemu-system-riscv64 \
-accel tcg -m 4096 -smp 2 \
-serial mon:stdio \
-d guest_errors -D ./qemu.log \
-bios <PATH/TO/OPENSBI/fw_dynamic.bin> \
-device virtio-gpu-pci -full-screen \
-device qemu-xhci \
-device usb-kbd \
-blockdev node-name=pflash0,driver=file,read-only=on,filename=<PATH/TO/RISCV_VIRT_CODE.fd> \
-blockdev node-name=pflash1,driver=file,filename=<PATH/TO/RISCV_VIRT_VARS.fd> \
-M virt,pflash0=pflash0,pflash1=pflash1,rpmi=true,ras=true,aia=aplic-imsic \
-kernel <PATH/TO/KERNEL/Image> \
-initrd <PATH/TO/ROOTFS> \
-append "root=/dev/ram rw console=ttyS0 earlycon=uart8250,mmio,0x10000000"
NOTE: Replace every <PATH/TO/...> placeholder with the actual location of the corresponding binary.
Error Injection Using the devmem Utility
Currently, only HART errors are supported; the EINJ framework is not.
Errors are injected by using the devmem utility to write directly to the RERI
device address space. The following are two examples of HART errors.
devmem 0x4010040 32 0x2a1
devmem 0x4010048 32 0x9001404
devmem 0x4010044 8 1
Sample Output:
[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[Hardware Error]: event severity: recoverable
[Hardware Error]: Error 0, type: recoverable
[Hardware Error]: section_type: general processor error
[Hardware Error]: processor_type: 3, RISCV
[Hardware Error]: processor_isa: 6, RISCV64
[Hardware Error]: error_type: 0x02
[Hardware Error]: TLB error
[Hardware Error]: operation: 1, data read
[Hardware Error]: target_address: 0x0000000000000000
Internal HART Error
devmem 0x4010048 32 0xC001702
devmem 0x4010044 8 1
Sample Output:
[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[Hardware Error]: It has been corrected by h/w and requires no further action
[Hardware Error]: event severity: corrected
[Hardware Error]: Error 0, type: corrected
[Hardware Error]: section_type: general processor error
[Hardware Error]: processor_type: 3, RISCV
[Hardware Error]: processor_isa: 6, RISCV64
[Hardware Error]: error_type: 0x08
[Hardware Error]: micro-architectural error
[Hardware Error]: operation: 2, data write
[Hardware Error]: target_address: 0x0000000000000000
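The two injection sequences above follow the same pattern, so they can be wrapped in a small helper for repeated testing. The following is a minimal sketch and not part of this patch: the function and variable names are hypothetical, the addresses are copied from the examples above, and the comments describe only the order of the writes shown there, not the register semantics. Setting DRY_RUN=1 prints the devmem commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR). The addresses are the ones used
# in the devmem examples above for the QEMU virt machine.
RERI_SETUP_ADDR=0x4010040    # written once in the first example (value 0x2a1)
RERI_ERROR_ADDR=0x4010048    # receives the 32-bit error value
RERI_TRIGGER_ADDR=0x4010044  # an 8-bit write of 1 follows each error value

# Run devmem, or only print the command when DRY_RUN=1.
reri_write() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "devmem $1 $2 $3"
    else
        devmem "$1" "$2" "$3"
    fi
}

# Usage: inject_hart_error ERROR_VALUE [SETUP_VALUE]
# Replays the write order from the examples: optional setup write,
# then the error value, then the final 8-bit write of 1.
inject_hart_error() {
    error_value=$1
    setup_value=${2:-}
    if [ -n "$setup_value" ]; then
        reri_write "$RERI_SETUP_ADDR" 32 "$setup_value"
    fi
    reri_write "$RERI_ERROR_ADDR" 32 "$error_value"
    reri_write "$RERI_TRIGGER_ADDR" 8 1
}
```

With this sketch, `inject_hart_error 0x9001404 0x2a1` replays the first example and `inject_hart_error 0xC001702` replays the second.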
Integration Instructions
N/A