The main purpose of this project is to create a general-purpose video card using FPGA technology. It is part of a bigger project: a computer based on the Z80. The project uses a DE10-Nano Cyclone V FPGA board.
- general purpose (programmable)
- CUDA-like architecture
- video output to VGA and/or HDMI
- simple 32-bit cores
- simple RISC ISA
- each core is mostly independent and supports branches
- uses the HPS component to prepare data for the videocard
For usage:
- Quartus Prime, version 18+
- minicom, PuTTY, or other serial terminal software
- Python 3
- DE10-Nano Cyclone V FPGA board
- Linux on the HPS component (download here)
Additional (for development):
- Open the project in Quartus Prime (project file: ./MCCP/ucu_gpu.qpf)
- Connect the DE10-Nano board to power and to the USB Blaster
- Open the Programmer. Choose the board in Hardware Setup, add the .sof file, press Auto-Detect, tick Program/Configure, and press Start
- Connect the board's UART-to-USB port to your computer. Open minicom/PuTTY, set the baud rate to 115200, and log in to Linux.
- The videocard's memory is mapped to [0xC0000000 - 0xC003fffc].
# write to memory
memtool address=value
# read
memtool address number_of_words #(word has 32 bits)
- Fill RAM/ROM from a text file: either a .txt file with whitespace-separated numbers, or a .out file with whitespace-separated binary numbers (for programs)
RAM=0xC0000000
ROM=0xC0040000
./linux_code/mem_write ram.txt $RAM
./linux_code/mem_write my_program.out $ROM
- Before starting, activate the required number of cores:
./scripts/activate_cores.sh 4
- To start the videocard:
- clear the finish interrupt from the videocard
memtool -8 0xFF200001=0x0
- send the start interrupt to the videocard
memtool -8 0xFF200000=0x1
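The load-and-start steps above can be collected into one script. The following is a minimal sketch that only builds the command lines from those steps as strings and prints them; it does not execute anything. The function name `start_sequence` and its parameters are hypothetical, while the paths, addresses, and `memtool` arguments are copied from the steps above.

```python
RAM_BASE = 0xC0000000    # videocard RAM as seen from Linux
ROM_BASE = 0xC0040000    # program ROM as seen from Linux
IRQ_START = 0xFF200000   # start interrupt register
IRQ_FINISH = 0xFF200001  # finish interrupt register


def start_sequence(data_file, program_file, cores):
    """Return the shell commands, in order, that load and start the videocard."""
    if cores < 1:
        raise ValueError("at least one core must be activated")
    return [
        f"./linux_code/mem_write {data_file} 0x{RAM_BASE:08X}",
        f"./linux_code/mem_write {program_file} 0x{ROM_BASE:08X}",
        f"./scripts/activate_cores.sh {cores}",
        f"memtool -8 0x{IRQ_FINISH:08X}=0x0",  # clear the finish interrupt
        f"memtool -8 0x{IRQ_START:08X}=0x1",   # raise the start interrupt
    ]


if __name__ == "__main__":
    # Hypothetical file names; substitute your own data and program files.
    for command in start_sequence("ram.txt", "program.out", 4):
        print(command)
```

Printing the commands rather than running them keeps the sketch usable on a development machine; on the board the same list could be passed to a shell one line at a time.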
- The videocard can address only 64Kb of FPGA memory.
- The videocard's memory is mapped into the Linux address space through the HPS-to-FPGA interface
- A special module reads start interrupts at 0xFF200000 through the lightweight interface. When the work is done, the videocard raises an interrupt at 0xFF200001. Note that this module clears the interrupt after 60 ticks
- Each core can access any address in this 64Kb memory.
- An arbiter running on a higher-frequency clock manages all memory access requests from the cores.
- The program lives in FPGA ROM. The program is hardcoded and cannot be changed at runtime. ./assembler contains an extremely simple bootloader that lets you choose the program start address for each core, so each core can run a different program.
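The address figures quoted in this README can be checked with a little arithmetic. The sketch below computes the size of the RAM window mapped into Linux and converts a 32-bit word index into the Linux-side byte address. It assumes the window is a contiguous array of 32-bit words starting at 0xC0000000; the helper name `word_to_linux_address` is hypothetical, and whether the cores themselves use word indices is an assumption, not something stated here.

```python
RAM_FIRST = 0xC0000000  # first mapped word, from the usage steps
RAM_LAST = 0xC003FFFC   # last mapped word, from the usage steps
ROM_BASE = 0xC0040000   # ROM base address, from the usage steps
WORD_BYTES = 4          # one word is 32 bits

# Size of the mapped RAM window, in bytes and in 32-bit words.
WINDOW_BYTES = RAM_LAST - RAM_FIRST + WORD_BYTES
WINDOW_WORDS = WINDOW_BYTES // WORD_BYTES


def word_to_linux_address(index):
    """Map a word index (assumed 0-based) to its byte address on the Linux side."""
    if not 0 <= index < WINDOW_WORDS:
        raise IndexError(f"word index {index} is outside the mapped window")
    return RAM_FIRST + index * WORD_BYTES


if __name__ == "__main__":
    print(WINDOW_BYTES, WINDOW_WORDS)
    print(hex(word_to_linux_address(WINDOW_WORDS - 1)))
    print(RAM_FIRST + WINDOW_BYTES == ROM_BASE)
```

Under these assumptions the window holds 65536 words (262144 bytes), and the ROM base 0xC0040000 starts immediately after the last RAM word.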
ISA documentation here
- Each core can be treated as an independent core.
- After finishing its work, a core sends interrupt number 1 to the interrupt controller, which counts these interrupts. Once all cores have sent the finish signal, the interrupt controller writes to 0xFF200001 (see usage), which signals to Linux that the work is done.
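The finish-counting behaviour described above can be illustrated with a small software model. This is only a behavioural sketch in Python, not the HDL used in the project: the class name `FinishCounter`, its methods, and the flag value of 1 are all assumptions made for illustration.

```python
class FinishCounter:
    """Behavioural model of an interrupt controller that counts finish interrupts."""

    def __init__(self, active_cores):
        if active_cores < 1:
            raise ValueError("at least one core must be active")
        self.active_cores = active_cores
        self.count = 0
        # Models the value Linux would read at 0xFF200001 (assumed 0 or 1).
        self.finish_flag = 0

    def core_finished(self):
        """Record one finish interrupt and return the current finish flag."""
        self.count += 1
        if self.count == self.active_cores:
            self.finish_flag = 1  # all active cores are done
        return self.finish_flag


if __name__ == "__main__":
    controller = FinishCounter(active_cores=4)
    for _ in range(4):
        print(controller.core_finished())
```

The flag stays at 0 until the last active core reports, which is why Linux only needs to watch a single address to know that all cores are done.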
How matrix multiplication scales