ORCA is an implementation of RISC-V. It is intended to target FPGAs and can be configured as either an RV32I or an RV32IM core.
ORCA can be used as a standalone processor, but was built to be a host to VectorBlox's proprietary Lightweight Vector Extensions (LVE) or full-fledged matrix processor (MXP).
It has optional AXI3/4 instruction and data caches, a separate AXI4-Lite interface for uncached transactions, and an auxiliary interface that can be configured as either WISHBONE, Intel Avalon, or Xilinx LMB.
A QSYS component is provided for easy integration into the Intel toolchain, as well as a Vivado IPI component for Xilinx integration.
GitHub Release Note (No Cache Support)
The GitHub release of ORCA currently does not support caches; if they are enabled, the RTL will fail to build with an assertion error. Cache support will be released later; for now it is recommended to use low-latency on-chip RAM for instructions and often-used data memory (stack segments, etc.).
To build the toolchain, set RISCV_INSTALL to the destination directory for the toolchain (for instance /opt/riscv/), then run
We have sample projects for different vendor platforms in the systems/ subdirectory.
The zedboard directory contains a Xilinx Vivado sample project that targets the ZedBoard development board using the Zynq-7000 XC7Z020-CLG484-1 SoC.
The de2-115 directory contains an Intel QSYS/Quartus project that targets the DE2-115 (found in the TPad or Veek development systems).
In addition to these example systems we provide a system (in the sim directory) for use in debug and automated tests using Modelsim. We use Intel QSYS to help maintain these systems and generate interconnect. The example systems can be simulated in full if desired; see the README in each individual directory for details.
ORCA Core Generics
Below is an overview of the generics exposed from the top level to configure the ORCA core (ORCA external memory interface generics are in a separate section below). If you use Intel Quartus/QSYS or Xilinx Vivado, a graphical interface is provided that simplifies setting the generics.
REGISTER_SIZE (default = 32)
Reserved for future RV64 support; currently only a value of 32 is supported.
RESET_VECTOR (default = 0x0000 0000)
Address of the first instruction executed after reset.
INTERRUPT_VECTOR (default = 0x0000 0200)
Address that will be jumped to when an interrupt is received.
MAX_IFETCHES_IN_FLIGHT (default = 1)
Number of outstanding instruction fetches in flight supported; a higher number will use more area but get more throughput if there are multiple cycles of delay to instruction RAM.
BTB_ENTRIES (default = 0)
Number of Branch Target Buffer (BTB) entries. If set to 0 there is no branch prediction (equivalent to all branches predicted untaken). If set to 1 or more, branch prediction is done using a simple direct-mapped 1-bit BTB. BTB entries are stored in simple dual-port distributed RAMs (LUTRAMs) on architectures that support them, so the count should not be set excessively high (it is currently limited to a maximum of 64 entries). If your architecture does not support distributed RAMs, flip-flops will be used and the number of BTB entries should be kept to the low single digits.
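To illustrate what "direct-mapped 1-bit" prediction means, here is a minimal behavioral sketch in Python. It is not taken from the ORCA RTL: the class name `DirectMappedBTB`, indexing by the low bits of the word address, storing the full PC as a tag, and the power-of-two size restriction are all assumptions made for illustration.

```python
class DirectMappedBTB:
    """Toy model of a direct-mapped, 1-bit branch target buffer (hypothetical)."""

    def __init__(self, entries):
        # entries == 0 models "no branch prediction" (always predict untaken).
        if entries < 0 or entries > 64:
            raise ValueError("BTB_ENTRIES must be between 0 and 64")
        if entries & (entries - 1):
            raise ValueError("this sketch assumes a power-of-two size")
        self.entries = entries
        self.table = [None] * entries  # each slot holds (pc, target) or None

    def _index(self, pc):
        # Assumption: index with the low bits of the word-aligned address.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return the predicted target, or None for 'predict untaken'."""
        if self.entries == 0:
            return None
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None

    def update(self, pc, taken, target):
        """1-bit behavior: only the most recent outcome is remembered."""
        if self.entries == 0:
            return
        i = self._index(pc)
        if taken:
            self.table[i] = (pc, target)  # may evict a conflicting branch
        elif self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None  # last outcome was untaken, so forget it
```

In this model two branches whose addresses map to the same slot evict each other, which is why a larger BTB_ENTRIES value can help code with many hot branches.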
MULTIPLY_ENABLE (default = 0)
Enable hardware multiplication. If set to 0 an illegal instruction exception will be thrown for multiply instructions.
DIVIDE_ENABLE (default = 0)
Enable hardware division (32 cycles/instruction). If set to 0 an illegal instruction exception will be thrown for divide instructions.
SHIFTER_MAX_CYCLES (default = 1)
How many cycles a shift operation will take, with lower values using more logic.
Valid values are 1, 8 and 32. If MULTIPLY_ENABLE is set to 1 this configuration option is ignored and the shifter uses the multiplier.
POWER_OPTIMIZED (default = 0)
If this is set to 1, then extra gates are added to improve power usage at the expense of area and maximum frequency.
COUNTER_LENGTH (default = 0)
How many bits the mtime register contains. The RISC-V standard dictates a 64-bit counter, but ORCA allows 32-bit and 0-bit (disabled) values to save area in embedded systems that do not need a full 64-bit counter.
ENABLE_EXCEPTIONS (default = 1)
If this is set to 1, then logic is added to allow the processor to support
illegal instruction traps and the mret instruction. If external interrupts are
required, this must be set to 1, as must ENABLE_EXT_INTERRUPTS.
PIPELINE_STAGES (default = 5)
Legal values are 4 and 5. If set to 4, the registers on the output side of the register file are eliminated to save area at the expense of maximum frequency.
VCP_ENABLE (default = 0)
Enable the Vector Coprocessor Port (VCP); this connects to VectorBlox's
proprietary Lightweight Vector Extensions (LVE). A value of 1 enables the
32-bit variant of the VCP instructions, while a value of 2 enables both 32-bit
and 64-bit VCP instructions (no other 64-bit instructions are supported even
when VCP_ENABLE is set to 2).
ENABLE_EXT_INTERRUPTS (default = 0)
Allow external interrupt lines to interrupt the processor.
ENABLE_EXCEPTIONS must be set to 1 when setting this to 1.
NUM_EXT_INTERRUPTS (default = 1)
If ENABLE_EXT_INTERRUPTS is set to 1, this selects how many external interrupts are supported.
FAMILY (default = GENERIC)
Enables certain portability workarounds and optimizations when using a specific FPGA family. Currently "GENERIC", "INTEL", "LATTICE", "MICROSEMI", and "XILINX" are supported.
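The constraints stated in the generic descriptions above can be collected into a small pre-synthesis sanity check. The following Python sketch is only an illustration: the function name `check_core_generics` and the dictionary representation are hypothetical, and it checks only the rules spelled out in this document (the upper bound of 32 on NUM_EXT_INTERRUPTS comes from the interrupt section further down).

```python
VALID_FAMILIES = ("GENERIC", "INTEL", "LATTICE", "MICROSEMI", "XILINX")

# Defaults as documented above; only generics with stated constraints appear.
DEFAULTS = {
    "REGISTER_SIZE": 32, "BTB_ENTRIES": 0, "SHIFTER_MAX_CYCLES": 1,
    "COUNTER_LENGTH": 0, "ENABLE_EXCEPTIONS": 1, "PIPELINE_STAGES": 5,
    "VCP_ENABLE": 0, "ENABLE_EXT_INTERRUPTS": 0, "NUM_EXT_INTERRUPTS": 1,
    "FAMILY": "GENERIC",
}


def check_core_generics(generics):
    """Return a list of problems for a dict of generic name -> value."""
    g = dict(DEFAULTS)
    g.update(generics)
    problems = []
    if g["REGISTER_SIZE"] != 32:
        problems.append("REGISTER_SIZE: only 32 is supported")
    if not 0 <= g["BTB_ENTRIES"] <= 64:
        problems.append("BTB_ENTRIES: must be between 0 and 64")
    if g["SHIFTER_MAX_CYCLES"] not in (1, 8, 32):
        problems.append("SHIFTER_MAX_CYCLES: valid values are 1, 8 and 32")
    if g["COUNTER_LENGTH"] not in (0, 32, 64):
        problems.append("COUNTER_LENGTH: valid values are 0, 32 and 64")
    if g["PIPELINE_STAGES"] not in (4, 5):
        problems.append("PIPELINE_STAGES: legal values are 4 and 5")
    if g["VCP_ENABLE"] not in (0, 1, 2):
        problems.append("VCP_ENABLE: valid values are 0, 1 and 2")
    if g["ENABLE_EXT_INTERRUPTS"] == 1 and g["ENABLE_EXCEPTIONS"] != 1:
        problems.append("ENABLE_EXT_INTERRUPTS requires ENABLE_EXCEPTIONS = 1")
    if g["NUM_EXT_INTERRUPTS"] > 32:
        problems.append("NUM_EXT_INTERRUPTS: at most 32 are supported")
    if g["FAMILY"] not in VALID_FAMILIES:
        problems.append("FAMILY: unsupported value")
    return problems
```

For example, `check_core_generics({"ENABLE_EXT_INTERRUPTS": 1, "ENABLE_EXCEPTIONS": 0})` reports the missing exception support, while an empty dictionary (all defaults) yields no problems.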
ORCA supports multiple memory interfaces to interoperate with different FPGA IP. Accesses may be cached (if caches are enabled) or uncached. Cached instruction or data accesses go out over the IC or DC AXI interface respectively. Uncached accesses go over either the uncached AXI4-Lite (IUC/DUC) or auxiliary interfaces (WISHBONE, Avalon, or LMB).
Which memory interface is used depends on the values of the optional Auxiliary
Memory Region (AMR) and Uncached Memory Region (UMR) CSRs. AMRs have the
highest priority: if one or more AMRs are instantiated (by setting the
AUX_MEMORY_REGIONS generic to 1 or more) and the address being accessed is
greater than or equal to AMRx_ADDR_BASE and less than or equal to
AMRx_ADDR_LAST for any AMR x, then the access uses the auxiliary memory
interface. Because these bounds are inclusive, to disable an AMR you must set
its base address greater than its last address. UMRs have the next highest
priority and function the same way as AMRs. Finally, if the access matches
neither an AMR nor a UMR, it will try to hit in the cache (if one is
instantiated by setting the corresponding cache size generic, such as
DCACHE_SIZE, to a non-zero value). The caches currently only have
write-through support with no allocation on write misses, though other modes
will be supported in the future.
|CSR Name||CSR Number||Access|
Normally AMRs and UMRs can be changed at run-time via CSR writes. However, if
neither instruction nor data caches are specified and UC_MEMORY_REGIONS is 0,
all accesses will go over the auxiliary interface and the AMRs will be
disabled to save area. Likewise, with no caches, UC_MEMORY_REGIONS set to
non-zero, and AUX_MEMORY_REGIONS set to 0, all accesses will go over the
uncached interface and all UMRs will be disabled.
Disabled AMRs/UMRs return 0 when read and cannot be written.
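The priority rules above can be summarized in a short behavioral model. This Python sketch is an illustration only, not the RTL: the function name `select_interface`, the `(base, last)` tuple representation, and the returned labels are hypothetical. The case of no caches with both AMRs and UMRs instantiated and an address matching neither is not described in this document, so the sketch returns `None` there.

```python
def select_interface(addr, amrs, umrs, cache_enabled):
    """Pick the interface for one access.

    amrs / umrs: lists of (base, last) pairs, both bounds inclusive.
    An empty list models AUX_MEMORY_REGIONS or UC_MEMORY_REGIONS set to 0.
    """
    if not cache_enabled:
        # Build-time shortcuts: the region CSRs are disabled to save area.
        if not umrs:
            return "auxiliary"
        if not amrs:
            return "uncached"
    # AMRs have the highest priority.
    if any(base <= addr <= last for base, last in amrs):
        return "auxiliary"
    # UMRs come next and match the same way.
    if any(base <= addr <= last for base, last in umrs):
        return "uncached"
    # Everything else tries the cache; behavior without one is unspecified here.
    return "cache" if cache_enabled else None


# A region with base > last can never match, which is how one is disabled.
DISABLED_REGION = (1, 0)
```

With `amrs=[(0x1000, 0x1FFF)]` and `umrs=[(0x0, 0xFFFF)]`, address 0x1FFF goes to the auxiliary interface even though it also lies inside the UMR, because AMRs are checked first.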
ORCA uses AXI3 or AXI4 for its cached accesses and AXI4-Lite for uncached
accesses but exposes a full AXI3 master for all interfaces because some system
building tools require it. To use AXI4 interfaces for the cached masters set
LOG2_BURSTLENGTH to 8 (4 for AXI3) and do not connect the WID signals. To use
AXI4-Lite interfaces for the uncached masters only connect the AXI4-Lite
signals; the rest can be safely left unconnected.
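The values 8 and 4 follow from the protocol burst limits: AXI4 allows incrementing bursts of up to 256 beats, while AXI3 allows at most 16. A small Python sketch of that arithmetic (the helper names `burst_beats` and `fits_protocol` are hypothetical):

```python
# Maximum beats per incrementing burst allowed by each protocol version.
AXI_MAX_BEATS = {"AXI3": 16, "AXI4": 256}


def burst_beats(log2_burstlength):
    """Number of data beats in one burst for a given LOG2_BURSTLENGTH."""
    return 1 << log2_burstlength


def fits_protocol(log2_burstlength, protocol):
    """True if the resulting burst length is legal for the protocol."""
    return burst_beats(log2_burstlength) <= AXI_MAX_BEATS[protocol]
```

Here `burst_beats(8)` is 256 and `burst_beats(4)` is 16, so a setting of 8 is legal only on an AXI4 interconnect.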
ORCA Memory Generics
The base 2 logarithm of the burst length used for cached accesses (has no effect if caches aren't enabled).
The AXI ID width for all AXI interfaces. ORCA does not use multiple AXI IDs (the xID signals will always be set to 0's); this generic is there simply for interfaces that require ID signals from a master.
Three generics select which protocol the auxiliary memory interface uses
(WISHBONE, Avalon, or LMB); at most one of them may be set to 1. In Intel QSYS
this is not exposed and Avalon is enabled, and likewise in Xilinx Vivado IPI
this is not exposed and LMB is enabled. These can be safely ignored if the
auxiliary memory interface is not used (AUX_MEMORY_REGIONS set to 0).
The number of auxiliary/uncached memory regions and corresponding AMRs/UMRs; refer to the above Memory Interfaces section for more detail.
Initial values for the first AMR/UMR CSRs. These can be set via generics for the common case that there needs to be a single contiguous WISHBONE/Avalon/LMB interface and/or a single contiguous uncached interface for peripherals and on-chip memories. If multiple non-contiguous regions are needed they must be set at run-time.
Size in bytes of the instruction cache and data cache. Setting to
0 disables the cache; each cache can be sized differently and it is possible to
use only instruction or only data caches if desired. There is a CSR for
determining if caches are enabled:
|CSR Name||CSR Number||Access|
Bit 0 is set to 1 if the instruction cache is enabled, bit 1 if the data cache is enabled.
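Decoding that value is a two-bit mask test. A minimal Python sketch, assuming the CSR value has already been read by some means (the function name `decode_cache_csr` is hypothetical):

```python
def decode_cache_csr(value):
    """Split the cache-enable CSR value into per-cache flags."""
    return {
        "icache_enabled": bool(value & 0x1),  # bit 0: instruction cache
        "dcache_enabled": bool(value & 0x2),  # bit 1: data cache
    }
```

A value of 2, for example, would mean that only the data cache is enabled.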
Data cache write policy. Set to 0 for write-through and 1 for write-back. Write-through immediately puts all writes onto the DC bus and does not allocate on write misses. Write-back writes dirty cache lines back on eviction (conflict, IFENCE, or FLUSH/INVALIDATE) and allocates on write misses.
Size in bytes of cache lines in the instruction cache and data cache.
Size in bits of the external memory interface for the instruction cache and data cache.
Registers for increasing maximum frequency at the expense of adding latency to memory accesses. These affect all instruction or data accesses (including cache hits), as opposed to the below generics which are per-interface.
REQUEST registers affect the outgoing path from the master to the slave (reads and writes). They can be set to 0 for 'disabled', 1 for 'light', and 2 for 'full'. A light register does not add latency if the interface does not exert backpressure, and isolates the combinational path through the READY/waitrequest signal, which is often the critical path. A full register always adds one cycle of latency but maintains full throughput when the interface exerts backpressure, and isolates the combinational path through all signals.
RETURN registers affect the returning data from the slave back to the master on reads. They can only be set to 0 for 'disabled' and 1 for 'enabled'. They add a single cycle of latency to reads when enabled.
If instruction registers are necessary to increase frequency, be sure to set
the MAX_IFETCHES_IN_FLIGHT generic to at least the full latency of the
instruction fetch path. For instance, using a full instruction request
register (INSTRUCTION_REQUEST_REGISTER set to 2) and an instruction return
register (INSTRUCTION_RETURN_REGISTER set to 1) adds two cycles of delay to
instruction fetch. As such, even when connected to a single-cycle on-chip RAM,
all instruction fetches will take three cycles. In that case, with
MAX_IFETCHES_IN_FLIGHT set to 3 ORCA will still be able to run at 1 cycle per
instruction (aside from branch misprediction, load latency, multicycle
instructions, etc.), whereas with MAX_IFETCHES_IN_FLIGHT set to 1 it would
only be able to execute at 3 cycles per instruction at best.
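The arithmetic in this example can be written out as a rough model. The Python sketch below is an approximation under stated assumptions, not a description of the pipeline: the function names are hypothetical, a light request register is assumed to add no latency (no backpressure), and the best-case cycles per instruction is modeled simply as fetch latency divided by the number of fetches in flight, with a floor of 1.

```python
def ifetch_latency(request_register, return_register, ram_latency=1):
    """Cycles from issuing a fetch to receiving the instruction.

    request_register: 0 = disabled, 1 = light, 2 = full.
    return_register:  0 = disabled, 1 = enabled.
    """
    # Only a full request register always costs a cycle; a light one is
    # assumed free here because the interface exerts no backpressure.
    request_cycles = 1 if request_register == 2 else 0
    return ram_latency + request_cycles + return_register


def best_case_cpi(latency, max_ifetches_in_flight):
    """Idealized cycles per instruction, ignoring stalls and mispredictions."""
    return max(1.0, latency / max_ifetches_in_flight)
```

With a full request register and a return register in front of a single-cycle RAM, `ifetch_latency(2, 1)` gives 3, and `best_case_cpi(3, 3)` and `best_case_cpi(3, 1)` give 1.0 and 3.0, matching the figures in the text.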
Interface specific memory interface registers. See the above generic
description for specifics. Note that the cached
(IC/DC)_(REQUEST/RETURN)_REGISTER registers apply to the cache master and
therefore are only used on cache misses, whereas the
(INSTRUCTION/DATA)_(REQUEST/RETURN)_REGISTER registers are in between the ORCA
core and the cache and so also affect cache hits.
We found that the Platform-Level Interrupt Controller (PLIC) described in the Privileged Specification v1.9 was too complex for our needs, so ORCA has a non-standard but very simple interrupt controller.
We define two more 32-bit CSRs, MEIMASK and MEIPEND. Because these registers
are a maximum of 32 bits wide, ORCA only supports 32 interrupts natively.
MEIMASK is a mask for the external interrupts. If bit n of MEIMASK is
set, then the nth interrupt is enabled. MEIPEND is connected to the
external interrupt lines, and as such is read-only.
|CSR Name||CSR Number||Access|
The processor is interrupted by an external interrupt on line n if and only if
the MIE bit of the MSTATUS CSR is 1 and bit n of the MEIMASK CSR is set
to 1. Interrupts are level sensitive and active high; as such it is the
responsibility of the interrupt handler to clear the interrupt (generally by
communicating with the responsible peripheral over memory-mapped I/O) before
re-enabling the MIE bit. If the interrupt line is still high when the MIE
bit is set again (typically via an mret instruction) then the interrupt will be
immediately taken again.
If interrupts are not enabled (ENABLE_EXT_INTERRUPTS is set to 0 or
ENABLE_EXCEPTIONS is set to 0) then MEIMASK and MEIPEND will both be
read-only and return 0.
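The interrupt condition can be expressed as a small behavioral model. This Python sketch is illustrative only: the function name `interrupting_lines` and its arguments are hypothetical, and it assumes that bit n of MEIPEND directly reflects the level of external interrupt line n.

```python
def interrupting_lines(mie, meimask, meipend, ext_interrupts_enabled=True):
    """Return the external interrupt lines that would interrupt the core.

    mie:     value of the MIE bit in MSTATUS (0 or 1).
    meimask: 32-bit enable mask.
    meipend: 32-bit snapshot of the (active-high) interrupt lines.
    """
    if not ext_interrupts_enabled:
        # With interrupts compiled out, both CSRs read as 0.
        meimask = 0
        meipend = 0
    if not mie:
        return []
    active = meimask & meipend & 0xFFFFFFFF
    return [n for n in range(32) if (active >> n) & 1]
```

Because the lines are level sensitive, calling this model again with the same `meipend` after setting MIE returns the same lines, which mirrors the requirement that the handler clear the source at the peripheral first.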