[RISCV] Add SiFive X390 scheduling model #143938

mshockwave · 2025-06-12T17:24:10Z

This patch adds the scheduling model for sifive-x390. X390 is a dual-issue, in-order CPU. It has two scalar and two vector pipes, with VLEN=1024 and DLEN=512.
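
For intuition about what the VLEN/DLEN pair implies, here is a rough standalone TableGen sketch. It assumes that occupancy scales as VLEN * LMUL / DLEN, which is a simplification for illustration and not a formula taken from this patch; `ToyDatapathPasses` and the record names are hypothetical.

```tablegen
// Hypothetical helper, not part of the patch: the number of DLEN-wide passes
// needed to push one register group of VLEN * LMUL bits through the datapath.
class ToyDatapathPasses<int VLEN, int DLEN, int LMUL> {
  int c = !div(!mul(VLEN, LMUL), DLEN);
}

// With the X390 parameters from the description (VLEN=1024, DLEN=512):
def ToyX390M1 : ToyDatapathPasses<1024, 512, 1>; // c = 2
def ToyX390M8 : ToyDatapathPasses<1024, 512, 8>; // c = 16
```

Under that assumption, a full-width LMUL=1 operation occupies its pipe for two passes, because a vector register is twice as wide as the datapath.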

This patch is stacked on top of the new abstraction layer introduced in #144442.

I noticed that we probably need to update a few numbers in the model, but I prefer to do it incrementally in the future.

llvmbot · 2025-06-12T17:24:42Z

@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

This patch adds the scheduling model for sifive-x390. X390 is a dual-issue, in-order CPU. It has two scalar and two vector pipes, with VLEN=1024 and DLEN=512.

This is a relatively big patch because it tries to reuse the existing SiFive7 scheduling model. Let me break down some of the biggest changes:

- Processor resource definitions (i.e. pipes) are factored out into a multiclass, SiFive7ProcResources. Similarly, WriteRes entries and bypass entries (i.e. ReadAdvance) are factored out into their own multiclasses: SiFive7WriteResBase and SiFive7ReadAdvance, respectively.
- These three components, SiFive7ProcResources, SiFive7WriteResBase, and SiFive7ReadAdvance, are encapsulated in a bigger multiclass, SiFive7SchedResources, which configures them with values passed in as template arguments. One example of such a config value is whether there is an extra vector ALU (i.e. extraVALU).
- SiFive7's SchedMachineModel carries not only standard fields like issue width, but also the concrete config values for the processor. For instance, X390 has extraVALU set to true.
- In the final phase, we "bind" the SchedMachineModel of each processor to a SiFive7SchedResources that is instantiated from that SchedMachineModel's config values.
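
The structure described above can be sketched in a few lines of standalone TableGen. This is only a toy illustration of the pattern, not the code in this patch: `ToyPipe`, `ToyPipeGroup`, `ToyModel`, `ToyProcResources`, and all record names are hypothetical stand-ins for ProcResource, ProcResGroup, SchedMachineModel, and SiFive7ProcResources, and the WriteRes / ReadAdvance parts are left out.

```tablegen
// Toy stand-ins for ProcResource / ProcResGroup (hypothetical, not LLVM's).
class ToyPipe<int n> { int NumUnits = n; }
class ToyPipeGroup<list<ToyPipe> pipes> { list<ToyPipe> Members = pipes; }

// A "machine model" that carries a config value next to a standard field.
class ToyModel<bit extraVALU> {
  int IssueWidth = 2;
  bit HasExtraVALU = extraVALU;
}

// Pipes factored out into a multiclass; the config value selects between
// one and two vector arithmetic sequencers.
multiclass ToyProcResources<bit extraVALU = false> {
  def PipeA : ToyPipe<1>;
  def PipeB : ToyPipe<1>;
  if extraVALU then {
    def VA1 : ToyPipe<1>;
    def VA2 : ToyPipe<1>;
  } else {
    def VA : ToyPipe<1>;
  }
  // NAME is the defm prefix, so each instantiation groups its own pipes.
  def PipeAB : ToyPipeGroup<[!cast<ToyPipe>(NAME#"PipeA"),
                             !cast<ToyPipe>(NAME#"PipeB")]>;
}

def ToySevenModel : ToyModel<false>;
def ToyX390Model  : ToyModel<true>;

// "Bind": instantiate the resources from each model's config value.
defm ToySeven : ToyProcResources<ToySevenModel.HasExtraVALU>;
defm ToyX390  : ToyProcResources<ToyX390Model.HasExtraVALU>;
```

Feeding this file to llvm-tblgen and printing the records should show ToySevenVA for the first instantiation and ToyX390VA1 / ToyX390VA2 for the second, which is the same mechanism that gives X390 its extra vector ALU.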

I noticed that we probably need to update a few numbers in the model, but I prefer to do it incrementally in the future.

Patch is 1.08 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143938.diff

41 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVProcessors.td (+2-1)
(modified) llvm/lib/Target/RISCV/RISCVSchedSiFive7.td (+1140-1002)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/div-fdiv.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/gpr-bypass-c.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/gpr-bypass.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/instruction-tables-tests.s (+132-132)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/jump.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/different-lmul-instruments.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/different-sew-instruments.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/disable-im.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/fractional-lmul-data.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/lmul-instrument-at-start.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/lmul-instrument-in-middle.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/lmul-instrument-in-region.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/lmul-instrument-straddles-region.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/multiple-same-lmul-instruments.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/multiple-same-sew-instruments.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/needs-sew-but-only-lmul.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/no-vsetvli-to-start.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/reductions.s (+9-9)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/sew-instrument-at-start.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/sew-instrument-in-middle.s (+9-9)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/sew-instrument-in-region.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/sew-instrument-straddles-region.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/strided-load-store.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/strided-load-x0.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/vector-integer-arithmetic.s (+9-9)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/vle-vse.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/vsetivli-lmul-instrument.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/vsetivli-lmul-sew-instrument.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/vsetvli-lmul-instrument.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFiveX280/vsetvli-lmul-sew-instrument.s (+8-8)
(added) llvm/test/tools/llvm-mca/RISCV/SiFiveX390/div-fdiv.s (+54)
(added) llvm/test/tools/llvm-mca/RISCV/SiFiveX390/fractional-lmul-data.s (+62)
(added) llvm/test/tools/llvm-mca/RISCV/SiFiveX390/reductions.s (+678)
(added) llvm/test/tools/llvm-mca/RISCV/SiFiveX390/strided-load-store.s (+368)
(added) llvm/test/tools/llvm-mca/RISCV/SiFiveX390/strided-load-x0.s (+132)
(added) llvm/test/tools/llvm-mca/RISCV/SiFiveX390/vector-fp.s (+4851)
(added) llvm/test/tools/llvm-mca/RISCV/SiFiveX390/vector-integer-arithmetic.s (+2272)
(added) llvm/test/tools/llvm-mca/RISCV/SiFiveX390/vgather-vcompress.s (+317)
(added) llvm/test/tools/llvm-mca/RISCV/SiFiveX390/vle-vse.s (+1256)

diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index de6f0ecfce737..b2a74987de66f 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -292,7 +292,8 @@ def SIFIVE_X280 : RISCVProcessorModel<"sifive-x280", SiFive7Model,
                                        FeatureStdExtZbb],
                                       SiFiveIntelligenceTuneFeatures>;
 
-def SIFIVE_X390 : RISCVProcessorModel<"sifive-x390", NoSchedModel,
+def SIFIVE_X390 : RISCVProcessorModel<"sifive-x390",
+                                      SiFiveX390Model,
                                       [Feature64Bit,
                                        FeatureStdExtI,
                                        FeatureStdExtM,
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
index c1d7cd4a716e7..78a176fcf18d9 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
@@ -169,6 +169,12 @@ class SiFive7GetOrderedReductionCycles<string mx, int sew, int VLEN> {
   int c = !mul(6, VLUpperBound);
 }
 
+class SiFive7FPLatencies {
+  int BasicFP16ALU;
+  int BasicFP32ALU;
+  int BasicFP64ALU;
+}
+
 class SiFive7AnyToGPRBypass<SchedRead read, int cycles = 2>
     : ReadAdvance<read, cycles, [WriteIALU, WriteIALU32,
                                  WriteShiftImm, WriteShiftImm32,
@@ -186,1121 +192,1253 @@ class SiFive7AnyToGPRBypass<SchedRead read, int cycles = 2>
                                  WriteIRem, WriteIRem32,
                                  WriteLDB, WriteLDH, WriteLDW, WriteLDD]>;
 
-// SiFive7 machine model for scheduling and other instruction cost heuristics.
-def SiFive7Model : SchedMachineModel {
-  let MicroOpBufferSize = 0; // Explicitly set to zero since SiFive7 is in-order.
-  let IssueWidth = 2;        // 2 micro-ops are dispatched per cycle.
-  let LoadLatency = 3;
-  let MispredictPenalty = 3;
-  let CompleteModel = 0;
-  let EnableIntervals = true;
-  let UnsupportedFeatures = [HasStdExtZbkb, HasStdExtZbkc, HasStdExtZbkx,
-                             HasStdExtZcmt, HasStdExtZknd, HasStdExtZkne,
-                             HasStdExtZknh, HasStdExtZksed, HasStdExtZksh,
-                             HasStdExtZkr];
-}
-
-// The SiFive7 microarchitecture has three pipelines: A, B, V.
+// The SiFive7 microarchitecture has three kinds of pipelines: A, B, V.
 // Pipe A can handle memory, integer alu and vector operations.
 // Pipe B can handle integer alu, control flow, integer multiply and divide,
 // and floating point computation.
-// The V pipeline is modeled by the VCQ, VA, VL, and VS resources.
-let SchedModel = SiFive7Model in {
-let BufferSize = 0 in {
-def SiFive7PipeA       : ProcResource<1>;
-def SiFive7PipeB       : ProcResource<1>;
-def SiFive7IDiv        : ProcResource<1>; // Int Division
-def SiFive7FDiv        : ProcResource<1>; // FP Division/Sqrt
-def SiFive7VA          : ProcResource<1>; // Arithmetic sequencer
-def SiFive7VL          : ProcResource<1>; // Load sequencer
-def SiFive7VS          : ProcResource<1>; // Store sequencer
-// The VCQ accepts instructions from the the A Pipe and holds them until the
-// vector unit is ready to dequeue them. The unit dequeues up to one instruction
-// per cycle, in order, as soon as the sequencer for that type of instruction is
-// available. This resource is meant to be used for 1 cycle by all vector
-// instructions, to model that only one vector instruction may be dequeued at a
-// time. The actual dequeueing into the sequencer is modeled by the VA, VL, and
-// VS sequencer resources below. Each of them will only accept a single
-// instruction at a time and remain busy for the number of cycles associated
-// with that instruction.
-def SiFive7VCQ         : ProcResource<1>; // Vector Command Queue
-}
-
-def SiFive7PipeAB : ProcResGroup<[SiFive7PipeA, SiFive7PipeB]>;
-
-defvar SiFive7VLEN = 512;
+// The V pipeline is modeled by the VCQ, VA, VL, and VS resources. There can
+// be one or two VA (Vector Arithmetic).
+multiclass SiFive7ProcResources<bit extraVALU = false> {
+  let BufferSize = 0 in {
+    def PipeA     : ProcResource<1>;
+    def PipeB     : ProcResource<1>;
+
+    def IDiv      : ProcResource<1>; // Int Division
+    def FDiv      : ProcResource<1>; // FP Division/Sqrt
+
+    // Arithmetic sequencer(s)
+    if extraVALU then {
+      // VA1 can handle any vector airthmetic instruction.
+      def VA1     : ProcResource<1>;
+      // VA2 generally can only handle simple vector arithmetic.
+      def VA2     : ProcResource<1>;
+    } else {
+      def VA      : ProcResource<1>;
+    }
 
-// Branching
-let Latency = 3 in {
-def : WriteRes<WriteJmp, [SiFive7PipeB]>;
-def : WriteRes<WriteJal, [SiFive7PipeB]>;
-def : WriteRes<WriteJalr, [SiFive7PipeB]>;
-}
+    def VL        : ProcResource<1>; // Load sequencer
+    def VS        : ProcResource<1>; // Store sequencer
+    // The VCQ accepts instructions from the the A Pipe and holds them until the
+    // vector unit is ready to dequeue them. The unit dequeues up to one instruction
+    // per cycle, in order, as soon as the sequencer for that type of instruction is
+    // available. This resource is meant to be used for 1 cycle by all vector
+    // instructions, to model that only one vector instruction may be dequeued at a
+    // time. The actual dequeueing into the sequencer is modeled by the VA, VL, and
+    // VS sequencer resources below. Each of them will only accept a single
+    // instruction at a time and remain busy for the number of cycles associated
+    // with that instruction.
+    def VCQ       : ProcResource<1>; // Vector Command Queue
+  }
 
-//Short forward branch
-def : WriteRes<WriteSFB, [SiFive7PipeA, SiFive7PipeB]> {
-  let Latency = 3;
-  let NumMicroOps = 2;
-}
+  def PipeAB : ProcResGroup<[!cast<ProcResource>(NAME#"PipeA"),
+                             !cast<ProcResource>(NAME#"PipeB")]>;
 
-// Integer arithmetic and logic
-let Latency = 3 in {
-def : WriteRes<WriteIALU, [SiFive7PipeAB]>;
-def : WriteRes<WriteIALU32, [SiFive7PipeAB]>;
-def : WriteRes<WriteShiftImm, [SiFive7PipeAB]>;
-def : WriteRes<WriteShiftImm32, [SiFive7PipeAB]>;
-def : WriteRes<WriteShiftReg, [SiFive7PipeAB]>;
-def : WriteRes<WriteShiftReg32, [SiFive7PipeAB]>;
+  if extraVALU then
+  def VA1OrVA2 : ProcResGroup<[!cast<ProcResource>(NAME#"VA1"),
+                               !cast<ProcResource>(NAME#"VA2")]>;
 }
 
-// Integer multiplication
-let Latency = 3 in {
-def : WriteRes<WriteIMul, [SiFive7PipeB]>;
-def : WriteRes<WriteIMul32, [SiFive7PipeB]>;
-}
+multiclass SiFive7WriteResBase<int VLEN,
+    ProcResourceKind PipeA, ProcResourceKind PipeB, ProcResourceKind PipeAB,
+    ProcResourceKind IDiv, ProcResourceKind FDiv,
+    ProcResourceKind VA1, ProcResourceKind VA1OrVA2,
+    ProcResourceKind VL, ProcResourceKind VS,
+    ProcResourceKind VCQ,
+    SiFive7FPLatencies fpLatencies,
+    bit isFP64Throttled = false> {
 
-// Integer division
-def : WriteRes<WriteIDiv, [SiFive7PipeB, SiFive7IDiv]> {
-  let Latency = 66;
-  let ReleaseAtCycles = [1, 65];
-}
-def : WriteRes<WriteIDiv32,  [SiFive7PipeB, SiFive7IDiv]> {
-  let Latency = 34;
-  let ReleaseAtCycles = [1, 33];
-}
-
-// Integer remainder
-def : WriteRes<WriteIRem, [SiFive7PipeB, SiFive7IDiv]> {
-  let Latency = 66;
-  let ReleaseAtCycles = [1, 65];
-}
-def : WriteRes<WriteIRem32,  [SiFive7PipeB, SiFive7IDiv]> {
-  let Latency = 34;
-  let ReleaseAtCycles = [1, 33];
-}
+  // Branching
+  let Latency = 3 in {
+    def : WriteRes<WriteJmp, [PipeB]>;
+    def : WriteRes<WriteJal, [PipeB]>;
+    def : WriteRes<WriteJalr, [PipeB]>;
+  }
 
-// Bitmanip
-let Latency = 3 in {
-// Rotates are in the late-B ALU.
-def : WriteRes<WriteRotateImm, [SiFive7PipeB]>;
-def : WriteRes<WriteRotateImm32, [SiFive7PipeB]>;
-def : WriteRes<WriteRotateReg, [SiFive7PipeB]>;
-def : WriteRes<WriteRotateReg32, [SiFive7PipeB]>;
+  //Short forward branch
+  def : WriteRes<WriteSFB, [PipeA, PipeB]> {
+    let Latency = 3;
+    let NumMicroOps = 2;
+  }
 
-// clz[w]/ctz[w] are in the late-B ALU.
-def : WriteRes<WriteCLZ, [SiFive7PipeB]>;
-def : WriteRes<WriteCLZ32, [SiFive7PipeB]>;
-def : WriteRes<WriteCTZ, [SiFive7PipeB]>;
-def : WriteRes<WriteCTZ32, [SiFive7PipeB]>;
+  // Integer arithmetic and logic
+  let Latency = 3 in {
+    def : WriteRes<WriteIALU, [PipeAB]>;
+    def : WriteRes<WriteIALU32, [PipeAB]>;
+    def : WriteRes<WriteShiftImm, [PipeAB]>;
+    def : WriteRes<WriteShiftImm32, [PipeAB]>;
+    def : WriteRes<WriteShiftReg, [PipeAB]>;
+    def : WriteRes<WriteShiftReg32, [PipeAB]>;
+  }
 
-// cpop[w] look exactly like multiply.
-def : WriteRes<WriteCPOP, [SiFive7PipeB]>;
-def : WriteRes<WriteCPOP32, [SiFive7PipeB]>;
+  // Integer multiplication
+  let Latency = 3 in {
+    def : WriteRes<WriteIMul, [PipeB]>;
+    def : WriteRes<WriteIMul32, [PipeB]>;
+  }
 
-// orc.b is in the late-B ALU.
-def : WriteRes<WriteORCB, [SiFive7PipeB]>;
+  // Integer division
+  def : WriteRes<WriteIDiv, [PipeB, IDiv]> {
+    let Latency = 66;
+    let ReleaseAtCycles = [1, 65];
+  }
+  def : WriteRes<WriteIDiv32,  [PipeB, IDiv]> {
+    let Latency = 34;
+    let ReleaseAtCycles = [1, 33];
+  }
 
-// min/max are in the late-B ALU
-def : WriteRes<WriteIMinMax, [SiFive7PipeB]>;
+  // Integer remainder
+  def : WriteRes<WriteIRem, [PipeB, IDiv]> {
+    let Latency = 66;
+    let ReleaseAtCycles = [1, 65];
+  }
+  def : WriteRes<WriteIRem32,  [PipeB, IDiv]> {
+    let Latency = 34;
+    let ReleaseAtCycles = [1, 33];
+  }
 
-// rev8 is in the late-A and late-B ALUs.
-def : WriteRes<WriteREV8, [SiFive7PipeAB]>;
+  // Bitmanip
+  let Latency = 3 in {
+    // Rotates are in the late-B ALU.
+    def : WriteRes<WriteRotateImm, [PipeB]>;
+    def : WriteRes<WriteRotateImm32, [PipeB]>;
+    def : WriteRes<WriteRotateReg, [PipeB]>;
+    def : WriteRes<WriteRotateReg32, [PipeB]>;
 
-// shNadd[.uw] is on the early-B and late-B ALUs.
-def : WriteRes<WriteSHXADD, [SiFive7PipeB]>;
-def : WriteRes<WriteSHXADD32, [SiFive7PipeB]>;
-}
+    // clz[w]/ctz[w] are in the late-B ALU.
+    def : WriteRes<WriteCLZ, [PipeB]>;
+    def : WriteRes<WriteCLZ32, [PipeB]>;
+    def : WriteRes<WriteCTZ, [PipeB]>;
+    def : WriteRes<WriteCTZ32, [PipeB]>;
 
-// Single-bit instructions
-// BEXT[I] instruction is available on all ALUs and the other instructions
-// are only available on the SiFive7B pipe.
-let Latency = 3 in {
-def : WriteRes<WriteSingleBit, [SiFive7PipeB]>;
-def : WriteRes<WriteSingleBitImm, [SiFive7PipeB]>;
-def : WriteRes<WriteBEXT, [SiFive7PipeAB]>;
-def : WriteRes<WriteBEXTI, [SiFive7PipeAB]>;
-}
+    // cpop[w] look exactly like multiply.
+    def : WriteRes<WriteCPOP, [PipeB]>;
+    def : WriteRes<WriteCPOP32, [PipeB]>;
 
-// Memory
-def : WriteRes<WriteSTB, [SiFive7PipeA]>;
-def : WriteRes<WriteSTH, [SiFive7PipeA]>;
-def : WriteRes<WriteSTW, [SiFive7PipeA]>;
-def : WriteRes<WriteSTD, [SiFive7PipeA]>;
-def : WriteRes<WriteFST16, [SiFive7PipeA]>;
-def : WriteRes<WriteFST32, [SiFive7PipeA]>;
-def : WriteRes<WriteFST64, [SiFive7PipeA]>;
-
-let Latency = 3 in {
-def : WriteRes<WriteLDB, [SiFive7PipeA]>;
-def : WriteRes<WriteLDH, [SiFive7PipeA]>;
-def : WriteRes<WriteLDW, [SiFive7PipeA]>;
-def : WriteRes<WriteLDD, [SiFive7PipeA]>;
-}
+    // orc.b is in the late-B ALU.
+    def : WriteRes<WriteORCB, [PipeB]>;
 
-let Latency = 2 in {
-def : WriteRes<WriteFLD16, [SiFive7PipeA]>;
-def : WriteRes<WriteFLD32, [SiFive7PipeA]>;
-def : WriteRes<WriteFLD64, [SiFive7PipeA]>;
-}
+    // min/max are in the late-B ALU
+    def : WriteRes<WriteIMinMax, [PipeB]>;
 
-// Atomic memory
-def : WriteRes<WriteAtomicSTW, [SiFive7PipeA]>;
-def : WriteRes<WriteAtomicSTD, [SiFive7PipeA]>;
+    // rev8 is in the late-A and late-B ALUs.
+    def : WriteRes<WriteREV8, [PipeAB]>;
 
-let Latency = 3 in {
-def : WriteRes<WriteAtomicW, [SiFive7PipeA]>;
-def : WriteRes<WriteAtomicD, [SiFive7PipeA]>;
-def : WriteRes<WriteAtomicLDW, [SiFive7PipeA]>;
-def : WriteRes<WriteAtomicLDD, [SiFive7PipeA]>;
-}
+    // shNadd[.uw] is on the early-B and late-B ALUs.
+    def : WriteRes<WriteSHXADD, [PipeB]>;
+    def : WriteRes<WriteSHXADD32, [PipeB]>;
+  }
 
-// Half precision.
-let Latency = 5 in {
-def : WriteRes<WriteFAdd16, [SiFive7PipeB]>;
-def : WriteRes<WriteFMul16, [SiFive7PipeB]>;
-def : WriteRes<WriteFMA16, [SiFive7PipeB]>;
-}
-let Latency = 3 in {
-def : WriteRes<WriteFSGNJ16, [SiFive7PipeB]>;
-def : WriteRes<WriteFMinMax16, [SiFive7PipeB]>;
-}
+  // Single-bit instructions
+  // BEXT[I] instruction is available on all ALUs and the other instructions
+  // are only available on the B pipe.
+  let Latency = 3 in {
+    def : WriteRes<WriteSingleBit, [PipeB]>;
+    def : WriteRes<WriteSingleBitImm, [PipeB]>;
+    def : WriteRes<WriteBEXT, [PipeAB]>;
+    def : WriteRes<WriteBEXTI, [PipeAB]>;
+  }
 
-let Latency = 14, ReleaseAtCycles = [1, 13] in {
-def :  WriteRes<WriteFDiv16, [SiFive7PipeB, SiFive7FDiv]>;
-def :  WriteRes<WriteFSqrt16, [SiFive7PipeB, SiFive7FDiv]>;
-}
+  // Memory
+  def : WriteRes<WriteSTB, [PipeA]>;
+  def : WriteRes<WriteSTH, [PipeA]>;
+  def : WriteRes<WriteSTW, [PipeA]>;
+  def : WriteRes<WriteSTD, [PipeA]>;
+  def : WriteRes<WriteFST16, [PipeA]>;
+  def : WriteRes<WriteFST32, [PipeA]>;
+  def : WriteRes<WriteFST64, [PipeA]>;
+
+  let Latency = 3 in {
+  def : WriteRes<WriteLDB, [PipeA]>;
+  def : WriteRes<WriteLDH, [PipeA]>;
+  def : WriteRes<WriteLDW, [PipeA]>;
+  def : WriteRes<WriteLDD, [PipeA]>;
+  }
 
-// Single precision.
-let Latency = 5 in {
-def : WriteRes<WriteFAdd32, [SiFive7PipeB]>;
-def : WriteRes<WriteFMul32, [SiFive7PipeB]>;
-def : WriteRes<WriteFMA32, [SiFive7PipeB]>;
-}
-let Latency = 3 in {
-def : WriteRes<WriteFSGNJ32, [SiFive7PipeB]>;
-def : WriteRes<WriteFMinMax32, [SiFive7PipeB]>;
-}
+  let Latency = 2 in {
+  def : WriteRes<WriteFLD16, [PipeA]>;
+  def : WriteRes<WriteFLD32, [PipeA]>;
+  def : WriteRes<WriteFLD64, [PipeA]>;
+  }
 
-def : WriteRes<WriteFDiv32, [SiFive7PipeB, SiFive7FDiv]> { let Latency = 27;
-                                                         let ReleaseAtCycles = [1, 26]; }
-def : WriteRes<WriteFSqrt32, [SiFive7PipeB, SiFive7FDiv]> { let Latency = 27;
-                                                          let ReleaseAtCycles = [1, 26]; }
+  // Atomic memory
+  def : WriteRes<WriteAtomicSTW, [PipeA]>;
+  def : WriteRes<WriteAtomicSTD, [PipeA]>;
 
-// Double precision
-let Latency = 7 in {
-def : WriteRes<WriteFAdd64, [SiFive7PipeB]>;
-def : WriteRes<WriteFMul64, [SiFive7PipeB]>;
-def : WriteRes<WriteFMA64, [SiFive7PipeB]>;
-}
-let Latency = 3 in {
-def : WriteRes<WriteFSGNJ64, [SiFive7PipeB]>;
-def : WriteRes<WriteFMinMax64, [SiFive7PipeB]>;
-}
+  let Latency = 3 in {
+  def : WriteRes<WriteAtomicW, [PipeA]>;
+  def : WriteRes<WriteAtomicD, [PipeA]>;
+  def : WriteRes<WriteAtomicLDW, [PipeA]>;
+  def : WriteRes<WriteAtomicLDD, [PipeA]>;
+  }
 
-def : WriteRes<WriteFDiv64, [SiFive7PipeB, SiFive7FDiv]> { let Latency = 56;
-                                                         let ReleaseAtCycles = [1, 55]; }
-def : WriteRes<WriteFSqrt64, [SiFive7PipeB, SiFive7FDiv]> { let Latency = 56;
-                                                          let ReleaseAtCycles = [1, 55]; }
-
-// Conversions
-let Latency = 3 in {
-def : WriteRes<WriteFCvtI32ToF16, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtI32ToF32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtI32ToF64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtI64ToF16, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtI64ToF32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtI64ToF64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF16ToI32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF16ToI64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF16ToF32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF16ToF64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF32ToI32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF32ToI64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF32ToF16, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF32ToF64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF64ToI32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF64ToI64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF64ToF16, [SiFive7PipeB]>;
-def : WriteRes<WriteFCvtF64ToF32, [SiFive7PipeB]>;
-
-def : WriteRes<WriteFClass16, [SiFive7PipeB]>;
-def : WriteRes<WriteFClass32, [SiFive7PipeB]>;
-def : WriteRes<WriteFClass64, [SiFive7PipeB]>;
-def : WriteRes<WriteFCmp16, [SiFive7PipeB]>;
-def : WriteRes<WriteFCmp32, [SiFive7PipeB]>;
-def : WriteRes<WriteFCmp64, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovI16ToF16, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovF16ToI16, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovI32ToF32, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovF32ToI32, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovI64ToF64, [SiFive7PipeB]>;
-def : WriteRes<WriteFMovF64ToI64, [SiFive7PipeB]>;
-}
+  // Half precision.
+  let Latency = fpLatencies.BasicFP16ALU in {
+  def : WriteRes<WriteFAdd16, [PipeB]>;
+  def : WriteRes<WriteFMul16, [PipeB]>;
+  def : WriteRes<WriteFMA16, [PipeB]>;
+  }
+  let Latency = 3 in {
+  def : WriteRes<WriteFSGNJ16, [PipeB]>;
+  def : WriteRes<WriteFMinMax16, [PipeB]>;
+  }
 
-// 6. Configuration-Setting Instructions
-let Latency = 3 in {
-def : WriteRes<WriteVSETVLI, [SiFive7PipeA]>;
-def : WriteRes<WriteVSETIVLI, [SiFive7PipeA]>;
-def : WriteRes<WriteVSETVL, [SiFive7PipeA]>;
-}
+  let Latency = 14, ReleaseAtCycles = [1, 13] in {
+  def :  WriteRes<WriteFDiv16, [PipeB, FDiv]>;
+  def :  WriteRes<WriteFSqrt16, [PipeB, FDiv]>;
+  }
 
-// 7. Vector Loads and Stores
-// Unit-stride loads and stores can operate at the full bandwidth of the memory
-// pipe. The memory pipe is DLEN bits wide on x280.
-foreach mx = SchedMxList in {
-  defvar Cycles = SiFive7GetCyclesDefault<mx>.c;
-  defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
-  let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
-    defm "" : LMULWriteResMX<"WriteVLDE",    [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVLDFF",   [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+  // Single precision.
+  let Latency = fpLatencies.BasicFP32ALU in {
+    def : WriteRes<WriteFAdd32, [PipeB]>;
+    def : WriteRes<WriteFMul32, [PipeB]>;
+    def : WriteRes<WriteFMA32, [PipeB]>;
+  }
+  let Latency = 3 in {
+    def : WriteRes<WriteFSGNJ32, [PipeB]>;
+    def : WriteRes<WriteFMinMax32, [PipeB]>;
   }
-  let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
-  defm "" : LMULWriteResMX<"WriteVSTE",    [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
-}
 
-foreach mx = SchedMxList in {
-  defvar Cycles = SiFive7GetMaskLoadStoreCycles<mx>.c;
-  defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
-  let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
-  defm "" : LMULWriteResMX<"WriteVLDM",    [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
-  let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
-  defm "" : LMULWriteResMX<"WriteVSTM",    [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
-}
+  def : WriteRes<WriteFDiv32, [PipeB, FDiv]> {
+    let Latency = 27;
+    let ReleaseAtCycles = [1, 26];
+  }
+  def : WriteRes<WriteFSqrt32, [PipeB, FDiv]> {
+    let Latency = 27;
+    let ReleaseAtCycles = [1, 26];
+  }
 
-// Strided loads and stores operate at one element per cycle and should be
-// scheduled accordingly. Indexed loads and stores operate at one element per
-// cycle, and they stall the machine until all addresses have been generated,
-// so they cannot be scheduled. Indexed and strided loads and stores have LMUL
-// specific suffixes, but since SEW is already encoded in the name of the
-// resource, we do not need to use LMULSEWXXX constructors. However, we do
-// use the SEW from the name to determine the number of Cycles.
-
-foreach mx = SchedMxList in {
-  defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
-  defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 8, SiFive7VLEN>.c;
-  defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
-  defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS8",  VLDSX0Pred, [SiFive7VCQ, SiFive7VL],
-                                       4, [0, 1], [1, !add(1, VLDSX0Cycles)], !add(3, Cycles),
-                                       [0, 1], [1, !add(1, Cycles)], mx, IsWorstCase>;
-  let Latency = !add(3, Cycles), AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1,...
[truncated]

wangpc-pp

Just a general comment: would the abstraction make the code harder to maintain compared to putting the X390 model into a standalone file?

mshockwave · 2025-06-13T16:03:15Z

> Just a general comment: would the abstraction make the code harder to maintain compared to putting the X390 model into a standalone file?

The idea behind a common abstraction is that any update to the scheduling models of X390 and the SiFive7 cores only needs to be made once, since they share many performance characteristics. So it actually lowers the maintenance cost.

wangpc-pp

Can you separate this PR into at least 2 PRs? It is massive and hard to review:

1. Adding abstraction.
2. Adding X390.

mshockwave · 2025-06-16T22:02:42Z

> Can you separate this PR into at least 2 PRs? It is massive and hard to review:
>
> 1. Adding abstraction.
> 2. Adding X390.

Good idea, the abstraction layer is now in #144442.

wangpc-pp

LGTM.
I still feel uncomfortable about the complexity (too many conditions, and I personally prefer straightforward definitions) but I think we can improve it progressively.

mshockwave · 2025-06-17T16:28:08Z

> LGTM. I still feel uncomfortable about the complexity (too many conditions, and I personally prefer straightforward definitions) but I think we can improve it progressively.

Yeah, I agree the complexity is a bit daunting. I feel like the scheduling model infrastructure should fundamentally be more ergonomic for creating hierarchical scheduling models. In fact, many moons ago, at the inception of the current instruction scheduling model framework, I think people were considering a way to have a "base" / parent scheduling model (there is a comment left somewhere in the TableGen code that shows this), but it evidently went nowhere.

wangpc-pp · 2025-06-18T03:08:22Z

> > LGTM. I still feel uncomfortable about the complexity (too many conditions, and I personally prefer straightforward definitions) but I think we can improve it progressively.
>
> Yeah, I agree the complexity is a bit daunting. I feel like the scheduling model infrastructure should fundamentally be more ergonomic for creating hierarchical scheduling models. In fact, many moons ago, at the inception of the current instruction scheduling model framework, I think people were considering a way to have a "base" / parent scheduling model (there is a comment left somewhere in the TableGen code that shows this), but it evidently went nowhere.

I had the same plan but I didn't have much time to give it a try. Please feel free to tag me when you are trying to fix these scheduling model issues. :-)

mshockwave · 2025-06-20T16:04:33Z

Any comments from other reviewers? If there aren't any, I intend to merge this series of patches early next week.

Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>

mshockwave requested review from lenary, preames, tclin914, topperc and wangpc-pp June 12, 2025 17:24

llvmbot added the backend:RISC-V label Jun 12, 2025

wangpc-pp reviewed Jun 13, 2025

wangpc-pp reviewed Jun 16, 2025

mshockwave mentioned this pull request Jun 16, 2025

[RISCV] Factor out common SiFive7 scheduling model into an abstraction layer #144442

Merged

mshockwave force-pushed the patch/rvv/x390-sched-model branch from d8c0efa to d123b88 Compare June 16, 2025 22:01

wangpc-pp approved these changes Jun 17, 2025

[RISCV] Add X390 scheduling model

dc41b77

Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>

mshockwave force-pushed the patch/rvv/x390-sched-model branch from d123b88 to dc41b77 Compare June 23, 2025 16:50

mshockwave merged commit f40909f into llvm:main Jun 23, 2025
5 of 7 checks passed

mshockwave deleted the patch/rvv/x390-sched-model branch June 23, 2025 17:06
