MaRGA (MARcos fpGA) - FPGA firmware for the MaRCoS MRI platform

Overview

This is a centralised repo for the MaRGA HDL files, Icarus Verilog testbenches, Verilator simulation framework and Vivado IP core XML description. MaRGA is designed for the STEMlab-122.88 device, which uses the Xilinx Zynq Z-7020 chip, but it would be easily portable to smaller chips if the main memory or FIFOs were reduced in size. Rather than timing TX and gradient outputs on a fixed raster, MaRGA times each output independently relative to previous events, with cycle-accurate repeatability. In principle this allows more flexibility and a single general approach to event management.
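As a rough illustration of event-relative timing (not MaRGA's actual sequence format), the sketch below converts per-channel lists of absolute event times, in clock cycles, into delays relative to the previous event on the same channel. The channel names and the function name are hypothetical.

```python
def relative_delays(events):
    """Convert absolute event times to per-channel relative delays.

    events: dict mapping a (hypothetical) channel name to a list of
    absolute event times in clock cycles.
    Returns a dict mapping each channel to the delay between
    consecutive events, with the first delay measured from cycle 0.
    """
    result = {}
    for channel, times in events.items():
        previous = 0
        delays = []
        for t in sorted(times):
            delays.append(t - previous)
            previous = t
        result[channel] = delays
    return result


example = relative_delays({"tx0": [10, 13, 250], "grad_x": [100, 400]})
print(example)
```

Because each channel carries its own delays, no common raster period has to divide all of the event times.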

Hardware introduction

MaRGA supports two RX and two TX channels, the OCRA1 and GPA-FHDO gradient DAC boards, and multiple digital I/O including TX and RX blanking bits and trigger I/O.

For the TX, it contains three 24-bit phase outputs (31-bit internal accumulators) to be used with DDS sin/cos LUTs, and two pairs of 16-bit I/Q outputs. Currently the modulation must be done externally, for example using Xilinx’s IP.
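To make the phase-output arithmetic concrete, here is a minimal software model of one phase accumulator. It assumes a 122.88 MHz clock and that the 24-bit output is simply the top 24 bits of the 31-bit accumulator; both are assumptions for illustration, not confirmed details of the HDL.

```python
ACC_BITS = 31        # internal accumulator width (from the text above)
OUT_BITS = 24        # phase output width (from the text above)
CLOCK_HZ = 122.88e6  # assumed clock frequency


def phase_increment(freq_hz, clock_hz=CLOCK_HZ):
    """Per-cycle accumulator increment for a desired output frequency."""
    return round(freq_hz / clock_hz * (1 << ACC_BITS)) % (1 << ACC_BITS)


def phase_outputs(increment, n_cycles):
    """Run the accumulator for n_cycles and return the truncated outputs."""
    mask = (1 << ACC_BITS) - 1
    acc = 0
    outputs = []
    for _ in range(n_cycles):
        acc = (acc + increment) & mask                # wrap at 2**31
        outputs.append(acc >> (ACC_BITS - OUT_BITS))  # assumed: keep top 24 bits
    return outputs


inc = phase_increment(CLOCK_HZ / 4)  # a quarter of the clock rate
print(inc, phase_outputs(inc, 4))
```

Under these assumptions the frequency resolution is the clock frequency divided by 2^31, and the extra 7 accumulator bits only refine the frequency, not the phase presented to the LUT.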

For the RX, it has control outputs for two pairs of CIC filters, including synchronous reset and rate changes, as well as two output streams containing I/Q samples from the external DDSes that are driven by the TX phase outputs. The three external DDSes can be multiplexed to any combination of these output streams. For an example project using MaRGA, please see the block diagram TODO.

The MaRGA core has a single unified memory space for synchronously executing sequences. There are no branch or loop instructions; flow control is accomplished by pauses. The inner FSM, which handles the pausing and memory reads, sends data to multiple small buffers, along with delays that specify when in the future each buffer should output its data to downstream logic. Combined with the longer delays that the central FSM can run, these short buffer delays allow up to hundreds of buffers to update their outputs synchronously. The buffers serve a range of purposes, and control all the external peripherals with which MaRGA is used. See mardecode.v for more information.
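The effect of the per-buffer delays can be sketched with a toy cycle-by-cycle model. It assumes the FSM issues one write per cycle and that each buffer presents its data exactly `delay` cycles after the write is issued; the buffer names and the `simulate` function are hypothetical and ignore the pipeline latency of the real design.

```python
def simulate(writes, n_cycles):
    """Toy model of delayed buffer updates.

    writes: list of (issue_cycle, buffer_name, delay, value) tuples.
    Returns a list with one dict per cycle, holding each buffer's output.
    """
    pending = {}
    for issue_cycle, name, delay, value in writes:
        # The buffer presents the value `delay` cycles after it is issued.
        pending.setdefault(issue_cycle + delay, []).append((name, value))

    outputs = {}
    history = []
    for cycle in range(n_cycles):
        for name, value in pending.get(cycle, []):
            outputs[name] = value
        history.append(dict(outputs))
    return history


# Three writes issued on consecutive cycles, with decreasing delays,
# all take effect on the same cycle (cycle 3).
writes = [(0, "tx0", 3, 100), (1, "tx1", 2, 200), (2, "grad", 1, 300)]
history = simulate(writes, 5)
print(history[2], history[3])
```

This is why instructions written one after another can still produce simultaneous output updates: the delay attached to each write absorbs the difference in issue time.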

Although it can be operated standalone with any logic that can write to its instruction memory, MaRGA is designed to work with a real-time server running on the Zynq PS. The server tracks MaRGA’s read address in the memory space, and dynamically rewrites parts of the memory not yet being used. The execution can thus wrap multiple times around the address space during extended sequences. See the MaRCoS server project for an example of such a server.
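One generic way to think about the wrap-around rewriting is as a ring buffer: the server may only overwrite addresses that MaRGA has already read. The sketch below is not the MaRCoS server's code; the function names, the one-slot guard and the tiny memory size are assumptions chosen for illustration.

```python
def writable_words(read_addr, write_addr, mem_size):
    """Number of words the server can safely overwrite.

    Assumes addresses in [read_addr, write_addr), modulo mem_size, hold
    instructions that have not been executed yet, and that one slot is
    kept free so a full memory is distinguishable from an empty one.
    """
    return (read_addr - write_addr - 1) % mem_size


def refill(memory, read_addr, write_addr, new_words):
    """Write as many new words as currently fit; return (new write_addr, count)."""
    count = min(len(new_words), writable_words(read_addr, write_addr, len(memory)))
    for word in new_words[:count]:
        memory[write_addr] = word
        write_addr = (write_addr + 1) % len(memory)  # wrap around the address space
    return write_addr, count


memory = [0] * 8
print(refill(memory, read_addr=2, write_addr=6, new_words=[1, 2, 3, 4, 5]), memory)
```

In this model, a sequence longer than the memory is fed in repeated `refill` calls as the read address advances.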

While MaRGA is executing, the server must also monitor the two RX FIFOs and continuously process the incoming data; if the data bandwidth is too large, the system can become starved of resources and overrun its valid memory bounds. The server can recognise such events and notify the user that the sequence execution failed.

Running unit tests with Verilator

These require building against the MaRCoS server. Execute the following in a terminal starting in a folder where you want to store your MaRCoS-related projects. Make sure you at least roughly understand what the commands are doing.

Tested with Verilator 4.106, Python 3.9.1, GCC 10.2.0.

git clone https://github.com/vnegnev/marga.git
git clone https://github.com/vnegnev/marcos_server.git
git clone https://github.com/vnegnev/marcos_client.git
cd marga
mkdir build
cd build
cmake ../src
make -j4 # build using 4 cores
fallocate -l 516KiB /tmp/marcos_server_mem # temporary file to simulate hardware memory space
./marga_sim csv & # assuming you want CSV output
cd ../../marcos_client
cp local_config.py.example local_config.py
# edit local_config.py to suit your setup, i.e. using localhost
python test_marga_model.py # all unit tests should pass

You can run experiment scripts as well; marga_sim writes the resulting trace to marga_sim.fst inside the build folder. A reasonable .sav file is in the src folder; open the trace with gtkwave marga_sim.fst ../src/marga_sim.sav. Note that the server must be gracefully closed by a network command (see the test_exit() testcase in test_server.py for an example) before the FST file is finalised.

[OLD, WILL BE REMOVED AND PUT INTO THE WIKI] Instruction set [WIP]

General thoughts

Single device: 32 ADC inputs (0b each, just ‘acquire’) -> 5b information, 32 RF outputs (16b each) -> 21b information, 32 grad outputs (24b each) -> 29b information.

Also want a time delay built into the instruction, at least sufficient to provide simultaneous updates on every channel if they’re written in a row (i.e. 32 + 32 + 32 = 96 channels, which needs 7 bits)
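The bit counts in these notes follow from simple arithmetic: selecting one of N channels takes ceil(log2(N)) address bits, which is added to the data width. A small check, with a hypothetical helper name:

```python
def address_bits(n_channels):
    """Bits needed to select one of n_channels (ceil(log2(n)), minimum 1)."""
    return max(1, (n_channels - 1).bit_length())


# Address bits plus data bits for each channel group.
adc_bits = address_bits(32) + 0    # ADC: address only, no payload
rf_bits = address_bits(32) + 16    # RF: 16b data
grad_bits = address_bits(32) + 24  # gradient: 24b data

# Delay field wide enough to cover one write per channel, issued in a row.
delay_bits = address_bits(32 + 32 + 32)

print(adc_bits, rf_bits, grad_bits, delay_bits)
```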

Buffers/blockers of 1-element depth for each output stream

Up to 128 unique 16b output buffers (start with far fewer).

For now, just 24 unique 16b output buffers; 16 are real-time with flow control/error checking, 8 are for intermediate configuration and setting changes.

Basic types, all 32 bits

Type A: 1b zero, 7b instruction, 24b payload (internal delays, external trigger, config, memory offset, start/stop execution, other settings)

Type B: 1b one, 7b target, 8b time, 16b payload (external buffers and their delays)

  • Pipelined dataflow for the external buffers; extra latency but this is compensated in the timeouts/ready flags coming back to FSM
  • One instruction per cycle
  • Type A: exclusive main FSM timing, containing everything needed
  • Type B: exclusive external-buffer data/timing
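The two draft word formats can be packed and unpacked as below. The field order (type bit in the MSB, then the fields in the order listed above) is an assumption, the function names are hypothetical, and since this section describes an old draft the layout may not match the current firmware.

```python
def _check(value, bits, name):
    """Reject values that do not fit in an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def encode_type_a(instruction, payload):
    """Type A: 1b zero, 7b instruction, 24b payload (assumed MSB-first)."""
    _check(instruction, 7, "instruction")
    _check(payload, 24, "payload")
    return (instruction << 24) | payload


def encode_type_b(target, time, payload):
    """Type B: 1b one, 7b target, 8b time, 16b payload (assumed MSB-first)."""
    _check(target, 7, "target")
    _check(time, 8, "time")
    _check(payload, 16, "payload")
    return (1 << 31) | (target << 24) | (time << 16) | payload


def decode(word):
    """Split a 32-bit word back into its fields."""
    if word >> 31:
        return ("B", (word >> 24) & 0x7F, (word >> 16) & 0xFF, word & 0xFFFF)
    return ("A", (word >> 24) & 0x7F, word & 0xFFFFFF)


print(hex(encode_type_a(0x05, 0x000123)), hex(encode_type_b(3, 10, 0xBEEF)))
```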

External buffers can themselves internally have some FIFO depth (and timers), in case bursts are desired - e.g. for occasional really rapid TX or RX sequences (though the RX will probably just have a separate FSM for timing itself).

Each external buffer can flag an error if too much data is pushed to it, and this will make its way up a chain of ORs to a central error register.

Different output buffers

  • TX buffer inputs: 16b data, 8b time, 1b valid, outputs: 16b data, 1b valid (maybe unused), 1b error. [Initially, 4 TX buffers, whose outputs will just go to the existing complex multipliers.]
  • Grad buffer inputs: 16b data, 8b time (MSB interpreted as hi/lo), 1b valid.
  • General buffer inputs: like TX

Write registers

  • 24b of memory space

Read-only registers

  • Current address (24b memory space)
  • Cycles since start of execution
  • Errors (latch each input bit until reset occurs)
  • Status (no latch, just allow for read-out)
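The difference between the latched error register and the unlatched status register can be modelled in a few lines. The class name, the method name and the choice that reset takes priority over new errors are assumptions for illustration.

```python
class ErrorRegister:
    """Sticky register: each bit stays set once its input has been high."""

    def __init__(self):
        self.value = 0

    def clock(self, error_inputs, reset=False):
        """Advance one cycle; error_inputs is a bitmask of the OR-ed error flags."""
        if reset:
            self.value = 0  # assumed: reset wins over incoming errors
        else:
            self.value |= error_inputs
        return self.value


reg = ErrorRegister()
trace = [reg.clock(0b0100), reg.clock(0), reg.clock(0b0001), reg.clock(0, reset=True)]
print(trace)
```

A status register, by contrast, would simply return the current inputs each cycle without remembering earlier values.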
