
RISC-V: Add support of RAS for RISC-V architecture #11209


Draft: wants to merge 8 commits into base: master


hschauhan

RISC-V: Add support of RAS for RISCV64 architecture

The RAS (Reliability, Availability, and Serviceability) stack helps catch and diagnose hardware errors. It is required on
server-class platforms that need high availability and reliability. This patch adds support for generating the HEST table required for RAS.

  • Breaking change?
    NO
  • Impacts security?
    NO
  • Includes tests?
    • Tests - Does this PR include any explicit test code?
    • Examples: Unit tests or integration tests.

How This Was Tested

Run QEMU with the following command:

qemu/build/riscv64-softmmu/qemu-system-riscv64 \
  -accel tcg -m 4096 -smp 2 \
  -serial mon:stdio \
  -d guest_errors -D ./qemu.log \
  -bios <PATH/TO/OPENSBI/fw_dynamic.bin> \
  -device virtio-gpu-pci -full-screen \
  -device qemu-xhci \
  -device usb-kbd \
  -blockdev node-name=pflash0,driver=file,read-only=on,filename=<PATH/TO/RISCV_VIRT_CODE.fd> \
  -blockdev node-name=pflash1,driver=file,filename=<PATH/TO/RISCV_VIRT_VARS.fd> \
  -M virt,pflash0=pflash0,pflash1=pflash1,rpmi=true,ras=true,aia=aplic-imsic \
  -kernel <PATH/TO/KERNEL/Image> \
  -initrd <PATH/TO/ROOTFS> \
  -append "root=/dev/ram rw console=ttyS0 earlycon=uart8250,mmio,0x10000000"

NOTE: Replace every <PATH/TO/...> placeholder with the correct location of the corresponding binary.

Error Injection Using the devmem Utility

Currently, only HART errors are supported; the EINJ framework is not yet supported.
Errors are injected by using the devmem utility to write directly to the RERI
device address space. The RERI config register is programmed first, followed by two examples of HART errors.

  1. RERI Config Register Programming

devmem 0x4010040 32 0x2a1

  2. TLB Error

devmem 0x4010048 32 0x9001404
devmem 0x4010044 8 1

Sample Output:

[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[Hardware Error]: event severity: recoverable
[Hardware Error]: Error 0, type: recoverable
[Hardware Error]: section_type: general processor error
[Hardware Error]: processor_type: 3, RISCV
[Hardware Error]: processor_isa: 6, RISCV64
[Hardware Error]: error_type: 0x02
[Hardware Error]: TLB error
[Hardware Error]: operation: 1, data read
[Hardware Error]: target_address: 0x0000000000000000

Internal HART Error

devmem 0x4010048 32 0xC001702
devmem 0x4010044 8 1

Sample Output:

[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[Hardware Error]: It has been corrected by h/w and requires no further action
[Hardware Error]: event severity: corrected
[Hardware Error]: Error 0, type: corrected
[Hardware Error]: section_type: general processor error
[Hardware Error]: processor_type: 3, RISCV
[Hardware Error]: processor_isa: 6, RISCV64
[Hardware Error]: error_type: 0x08
[Hardware Error]: micro-architectural error
[Hardware Error]: operation: 2, data write
[Hardware Error]: target_address: 0x0000000000000000

Integration Instructions

N/A

Add the new error codes defined by the SBI specification, such as those relating to shared memory.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>

Handle the new SBI error codes in translate_error and map them
to suitable EFI error codes.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
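
The error-code mapping described in the commit above can be illustrated with a small standalone sketch. This is not the code from the PR: the `SKETCH_*` status values stand in for EFI status codes, the function name `sketch_translate_error` is hypothetical, and the choice of target status for each SBI error is an assumption made for illustration. The SBI error values are the ones listed in the SBI specification and should be checked against the version in use.

```c
/* SBI error codes as listed in the SBI specification.
 * SBI_ERR_NO_SHMEM is the shared-memory error added in newer revisions. */
#define SBI_SUCCESS                 0
#define SBI_ERR_FAILED             (-1)
#define SBI_ERR_NOT_SUPPORTED      (-2)
#define SBI_ERR_INVALID_PARAM      (-3)
#define SBI_ERR_DENIED             (-4)
#define SBI_ERR_INVALID_ADDRESS    (-5)
#define SBI_ERR_ALREADY_AVAILABLE  (-6)
#define SBI_ERR_ALREADY_STARTED    (-7)
#define SBI_ERR_ALREADY_STOPPED    (-8)
#define SBI_ERR_NO_SHMEM           (-9)

/* Hypothetical stand-ins for EFI status codes (assumption, not EDK2 types). */
typedef enum {
    SKETCH_SUCCESS,
    SKETCH_DEVICE_ERROR,
    SKETCH_UNSUPPORTED,
    SKETCH_INVALID_PARAMETER,
    SKETCH_ACCESS_DENIED,
    SKETCH_LOAD_ERROR,
    SKETCH_ALREADY_STARTED,
    SKETCH_ABORTED,
    SKETCH_OUT_OF_RESOURCES
} sketch_status;

/* Map an SBI error code to a firmware status code.
 * The pairings below are illustrative assumptions. */
sketch_status sketch_translate_error(long sbi_error)
{
    switch (sbi_error) {
    case SBI_SUCCESS:               return SKETCH_SUCCESS;
    case SBI_ERR_FAILED:            return SKETCH_DEVICE_ERROR;
    case SBI_ERR_NOT_SUPPORTED:     return SKETCH_UNSUPPORTED;
    case SBI_ERR_INVALID_PARAM:     return SKETCH_INVALID_PARAMETER;
    case SBI_ERR_DENIED:            return SKETCH_ACCESS_DENIED;
    case SBI_ERR_INVALID_ADDRESS:   return SKETCH_LOAD_ERROR;
    case SBI_ERR_ALREADY_AVAILABLE: return SKETCH_ALREADY_STARTED;
    case SBI_ERR_ALREADY_STARTED:   return SKETCH_ALREADY_STARTED;
    case SBI_ERR_ALREADY_STOPPED:   return SKETCH_ABORTED;
    case SBI_ERR_NO_SHMEM:          return SKETCH_OUT_OF_RESOURCES;
    default:                        return SKETCH_DEVICE_ERROR; /* unknown code */
    }
}
```

Routing unknown codes to a generic device error keeps callers from treating an unrecognized SBI failure as success.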

Extensions must be detected before they are used. Add a function
that detects support for an SBI extension using the SBI probe-extension
function.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
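
A minimal sketch of the detection step described in the commit above, assuming the SBI Base extension (EID 0x10) and its probe-extension function (FID 3), which returns 0 when an extension is not available. So that it can run off-target, the actual `ecall` is replaced by a function pointer; `sbi_call_fn`, `sbi_extension_present`, and `mock_sbi_call` are hypothetical names and are not taken from the PR.

```c
#include <stdbool.h>

#define SBI_EXT_BASE            0x10UL        /* Base extension EID */
#define SBI_BASE_PROBE_EXT_FID  3UL           /* probe-extension FID */
#define SBI_EXT_MPXY            0x4D505859UL  /* ASCII "MPXY" */

/* An SBI call returns an (error, value) pair. */
typedef struct {
    long error;
    long value;
} sbi_ret;

/* Hypothetical hook standing in for the real ecall wrapper. */
typedef sbi_ret (*sbi_call_fn)(unsigned long eid, unsigned long fid,
                               unsigned long arg0);

/* Returns true only if the probe call succeeds and reports a
 * non-zero value for the requested extension ID. */
bool sbi_extension_present(sbi_call_fn call, unsigned long ext_id)
{
    sbi_ret ret = call(SBI_EXT_BASE, SBI_BASE_PROBE_EXT_FID, ext_id);
    return ret.error == 0 && ret.value != 0;
}

/* Mock firmware for off-target testing: it answers only the probe
 * call and claims that MPXY is the sole implemented extension. */
sbi_ret mock_sbi_call(unsigned long eid, unsigned long fid,
                      unsigned long arg0)
{
    sbi_ret ret = { 0, 0 };
    if (eid != SBI_EXT_BASE || fid != SBI_BASE_PROBE_EXT_FID) {
        ret.error = -2; /* SBI_ERR_NOT_SUPPORTED */
        return ret;
    }
    ret.value = (arg0 == SBI_EXT_MPXY) ? 1 : 0;
    return ret;
}
```

A caller would typically run this check once during initialization and skip MPXY setup entirely when the probe reports the extension as absent.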

Add the SBI MPXY extension and its function macros.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>

Add an SBI MPXY client library to communicate over RPMI or other
protocols using the SBI MPXY extension.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>

Add the RAS agent client library, which is used to communicate with
a remote RAS agent. The RAS agent can be queried for the details
of the various hardware error sources in the system.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>

Add support for generating the HEST table with the help of a remote
RAS agent. The HEST table itself is generated here, but the error
descriptors (in GHESv2 or another platform-specific format) are fetched
from the RAS agent.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
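
To make the split between the table and its descriptors concrete, here is a rough sketch of assembling a HEST image: a standard 36-byte ACPI table header, a 32-bit error source count, and then descriptor bytes appended as-is. In the PR the descriptors come from the RAS agent; here they are an opaque caller-supplied buffer. The names `hest_header`, `hest_build`, and `acpi_byte_sum`, as well as the OEM strings, are hypothetical, and the sketch assumes a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
typedef struct {
    char     signature[4];        /* "HEST" */
    uint32_t length;              /* whole table, header included */
    uint8_t  revision;
    uint8_t  checksum;            /* makes the byte sum of the table 0 */
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    uint32_t creator_id;
    uint32_t creator_revision;
    uint32_t error_source_count;  /* HEST-specific field */
} hest_header;
#pragma pack(pop)

/* Sum of all bytes modulo 256, as used by the ACPI table checksum. */
uint8_t acpi_byte_sum(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint8_t sum = 0;
    while (len--) {
        sum = (uint8_t)(sum + *p++);
    }
    return sum;
}

/* Build a HEST image in out[]: header followed by the descriptor bytes.
 * Returns the table length, or 0 if the output buffer is too small. */
size_t hest_build(uint8_t *out, size_t cap,
                  const uint8_t *descs, size_t descs_len, uint32_t count)
{
    hest_header hdr;
    size_t total = sizeof(hdr) + descs_len;

    if (total > cap) {
        return 0;
    }
    memset(&hdr, 0, sizeof(hdr));
    memcpy(hdr.signature, "HEST", 4);
    hdr.length = (uint32_t)total;
    hdr.revision = 1;
    memcpy(hdr.oem_id, "SKETCH", 6);          /* placeholder OEM ID */
    memcpy(hdr.oem_table_id, "EXAMPLE ", 8);  /* placeholder table ID */
    hdr.error_source_count = count;

    memcpy(out, &hdr, sizeof(hdr));
    if (descs_len != 0) {
        memcpy(out + sizeof(hdr), descs, descs_len);
    }
    /* The checksum byte is still 0 here; set it so the total sum is 0. */
    out[offsetof(hest_header, checksum)] =
        (uint8_t)(0u - acpi_byte_sum(out, total));
    return total;
}
```

Because the descriptors are treated as opaque bytes, the table builder stays independent of whether the agent reports GHESv2 or a platform-specific format.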

Enable ACPI Hardware Error Source Table (HEST) generation support
for the RISC-V QEMU virt platform.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>