

## ZNN U7-Core Experimental Chip

Neural-network matrix arithmetic on the open-source RISC-V ISA. *Zeeshan Hooda, 01/21/2020*

## Intro

The ZNN U7-Core is an open-source chip designed specifically for complex workloads involving convolutional neural networks and reinforcement learning. It features a dedicated hardware-level matrix arithmetic logic unit alongside the usual arithmetic unit. The ZNN U7 is based on the SiFive U74 standard core, which in turn implements the open-source RISC-V instruction set architecture (ISA). Because RISC-V is a reduced instruction set computing (RISC) design, similar to ARM but without the intellectual-property restrictions, it can perform tasks very efficiently. Compared with a standard x86\_64 ISA, RISC-V can provide equivalent performance at less than one tenth of the power consumption.

This makes RISC-V chips a strong fit for tasks where low power consumption and efficiency are imperative; it also opens up opportunities to increase efficiency in more complex, power-hungry workloads. Running these computationally intensive tasks on a specialized matrix unit could allow convolutional neural networks to be run with greater parallelism, or even enable pseudo-quantum emulation through dedicated qubit registers/dictionaries. The ZNN U7-Core pairs a matrix arithmetic unit with a 64-bit word architecture, with the goal of matching the performance of today's x86\_64 matrix arithmetic chips.
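To make the target workload concrete, the sketch below shows the software baseline that a hardware matrix unit is meant to offload: a matrix product built entirely from scalar multiply-accumulate steps. This document does not define the matrix unit's instructions, so nothing here models the hardware itself; the function name `matmul_software` and the operation counter are illustrative assumptions.

```python
def matmul_software(a, b):
    """Multiply two matrices (lists of rows) using scalar operations only.

    Returns (product, mac_count), where mac_count is the number of scalar
    multiply-accumulate steps executed. Hypothetical helper, for illustration.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")

    product = [[0] * cols for _ in range(rows)]
    mac_count = 0
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one scalar multiply-accumulate
                mac_count += 1
            product[i][j] = acc
    return product, mac_count


product, macs = matmul_software([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # [[19, 22], [43, 50]]
print(macs)     # 8
```

For square n-by-n operands the scalar path executes n³ multiply-accumulate steps, which is the cost a dedicated matrix unit would aim to absorb.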



Figure 1.1: Core Complex of the ZNN U7 Core

While the ZNN U7-Core is still very much in the idea phase, emulations of the specialized hardware indicate that it is a viable alternative to software matrix manipulations. Free of the excess instructions of x86\_64, and able to implement custom instructions without having to navigate IP laws, the ZNN U7-Core could be a next-generation chip for developers and industry, with the ease of use and apparent scalability of the RISC-V instruction set architecture. The proposed specifications for the ZNN U7-Core EXP version are listed in the tables that follow.

| Exp Chip Essentials        |                     |
|----------------------------|---------------------|
| Processor Number           | RV64IMAFDC-EXP      |
| Lithography                | 16 nm               |
| Number of Cores            | 8                   |
| Number of Threads          | 8                   |
| Processor Base Frequency   | 1.74 GHz            |
| Processor Boost Frequency  | 2.30 GHz            |
| Cache                      | 4 MiB L2 (32-way, 4 banks) |
| TDP                        | 5 W                 |
| Max Memory Size            | 256 GB              |
| Memory Types               | DDR4-2666 ECC       |
| Processor Graphics         | No                  |
| I/O                        | 64-bit AXI4         |
| System Port Size           | 512 MiB             |
| Physical Memory Protection | 16 Regions          |
| Debug Module               | Present             |
| Clock Gate Extraction      | Present             |

## More Specifications

| MODES & ISA                                      |            |      |
|--------------------------------------------------|------------|------|
| Number of Cores                                  | 8          | 1    |
| Machine Mode                                     | Present    |      |
| User Mode                                        | Present    |      |
| Supervisor Mode                                  | Present    |      |
| Multiply                                         | Present    |      |
| Atomics                                          | Present    |      |
| Floating Point                                   | FP (F & D) |      |
| SiFive Custom<br>Instruction Extension<br>(SCIE) | Present    | None |

| ON-CHIP MEMORY                   |                                      |         |
|----------------------------------|--------------------------------------|---------|
| Instr. Cache Size                | 32 KiB<br>Evaluation RTL:<br>16 KiB  | 32 KiB  |
| Instr. Cache Assoc               | 8-way                                |         |
| Data Cache                       | Present                              |         |
| Data Cache Size                  | 64 KiB<br>Evaluation RTL:<br>16 KiB  | 32 KiB  |
| Data Cache Base<br>Address       | 0x6000_0000                          |         |
| Data Cache Assoc                 | 16-way                               | 8-way   |
| Data Local Store                 | Present                              | None    |
| Data Local Store Size            | 512 KiB<br>Evaluation RTL:<br>16 KiB | None    |
| Data Local Store<br>Base Address | 0x7000_0000                          | None    |
| L2 Cache                         | Present                              |         |
| L2 Cache Size                    | 4 MiB<br>Evaluation RTL:<br>128 KiB  | 128 KiB |
| L2 Cache Assoc<br>(Ways)         | 32                                   | 8       |
| L2 Cache Banks                   | 4                                    | 1       |
| ITIM                             | Present                              | None    |
| ITIM Size                        | 4 KiB                                |         |
| ITIM Base Address                | 0x0180_0000                          | None    |
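The sizes, associativities, and bank count in the table above fix the number of sets in each cache once a line size is chosen. The line size is not stated in this document, so the sketch below assumes 64-byte lines; `LINE_BYTES` and `cache_sets` are illustrative names, not part of the specification.

```python
KiB = 1024
MiB = 1024 * KiB

LINE_BYTES = 64  # ASSUMPTION: cache line size is not given in this document


def cache_sets(size_bytes, ways, banks=1, line_bytes=LINE_BYTES):
    """Return the number of sets per bank for a set-associative cache."""
    per_bank = size_bytes // banks
    sets, remainder = divmod(per_bank, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a whole number of sets")
    return sets


# (size, ways, banks) taken from the ZNN U7-Core column of the table
caches = {
    "Instr. Cache": (32 * KiB, 8, 1),
    "Data Cache": (64 * KiB, 16, 1),
    "L2 Cache": (4 * MiB, 32, 4),
}

for name, (size, ways, banks) in caches.items():
    print(name, cache_sets(size, ways, banks))
# Instr. Cache 64
# Data Cache 64
# L2 Cache 512
```

Under that assumption both first-level caches have 64 sets, and each of the four L2 banks has 512 sets.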

| PORTS                           |                                       |         |
|---------------------------------|---------------------------------------|---------|
| Front Port                      | Present                               |         |
| Front Port Protocol             | AXI4                                  |         |
| Front Port Width                | 64-bit                                |         |
| System Port                     | Present                               |         |
| System Port Protocol            | AXI4                                  |         |
| System Port Width               | 64-bit                                |         |
| System Port Base<br>Address     | 0x4000_0000                           |         |
| System Port Size                | 512 MiB<br>Evaluation RTL:<br>8 KiB   | 512 MiB |
| Peripheral Port                 | Present                               |         |
| Peripheral Port<br>Protocol     | AXI4                                  |         |
| Peripheral Port Width           | 64-bit                                |         |
| Peripheral Port Base<br>Address | 0x2000_0000                           |         |
| Peripheral Port Size            | 512 MiB<br>Evaluation RTL:<br>8 KiB   | 512 MiB |
| Memory Port                     | Present                               |         |
| Memory Port Protocol            | AXI4                                  |         |
| Memory Port Width               | 128-bit                               |         |
| Memory Port Base<br>Address     | 0x8000_0000                           |         |
| Memory Port Size                | 512 MiB<br>Evaluation RTL:<br>128 KiB | 512 MiB |
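The base addresses and sizes listed in the on-chip memory and port tables can be checked for consistency with a small script. The sketch below is illustrative only: it uses the full-size (non-evaluation) values, covers just the regions for which both a base and a size are given, and the names `Region`, `MEMORY_MAP`, `overlapping_pairs`, and `lookup` are hypothetical.

```python
from dataclasses import dataclass

KiB = 1024
MiB = 1024 * KiB


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int

    @property
    def end(self):
        """First address past the region (exclusive upper bound)."""
        return self.base + self.size


# Values copied from the tables in this document (full-size configuration).
MEMORY_MAP = [
    Region("ITIM", 0x0180_0000, 4 * KiB),
    Region("Peripheral Port", 0x2000_0000, 512 * MiB),
    Region("System Port", 0x4000_0000, 512 * MiB),
    Region("Data Local Store", 0x7000_0000, 512 * KiB),
    Region("Memory Port", 0x8000_0000, 512 * MiB),
]


def overlapping_pairs(regions):
    """Return neighbouring regions (sorted by base) that overlap.

    An empty list means no two regions in the map overlap.
    """
    ordered = sorted(regions, key=lambda r: r.base)
    return [(a.name, b.name) for a, b in zip(ordered, ordered[1:]) if a.end > b.base]


def lookup(addr, regions=MEMORY_MAP):
    """Return the name of the region containing addr, or None."""
    for region in regions:
        if region.base <= addr < region.end:
            return region.name
    return None


print(overlapping_pairs(MEMORY_MAP))  # []
print(lookup(0x4000_0000))            # System Port
```

With these values the Peripheral Port ends exactly where the System Port begins (0x4000_0000), and no listed regions overlap.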

| SECURITY                      |         |   |
|-------------------------------|---------|---|
| Physical Memory<br>Protection | Present |   |
| PMP Regions                   | 16      | 8 |
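Each PMP region is programmed through an address register. As a hedged illustration, the sketch below encodes and decodes the naturally aligned power-of-two (NAPOT) form defined by the RISC-V privileged specification, on the assumption that the ZNN U7-Core uses standard PMP unchanged. The function names are hypothetical, and the example region is the Data Local Store from the on-chip memory table.

```python
def pmp_napot_addr(base, size):
    """Encode a NAPOT region as a pmpaddr register value.

    The matching pmpcfg entry must select NAPOT address matching.
    """
    if size < 8 or size & (size - 1):
        raise ValueError("NAPOT size must be a power of two of at least 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to the region size")
    # pmpaddr holds address bits [.. : 2]; the low bits become a run of ones
    # whose length encodes the region size.
    return (base >> 2) | ((size >> 3) - 1)


def pmp_napot_decode(pmpaddr):
    """Recover (base, size) from a NAPOT-encoded pmpaddr value."""
    trailing_ones = 0
    while (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 1 << (trailing_ones + 3)
    base = (pmpaddr & ~((1 << trailing_ones) - 1)) << 2
    return base, size


# Data Local Store: base 0x7000_0000, size 512 KiB
value = pmp_napot_addr(0x7000_0000, 512 * 1024)
print(hex(value))  # 0x1c00ffff
```

Decoding `0x1c00ffff` returns the original base and size, which gives a quick round-trip check before writing the register.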

| DEBUG                               |         |
|-------------------------------------|---------|
| Debug Module                        | Present |
| Debug Interface                     | JTAG    |
| Hardware<br>Breakpoints             | 2       |
| External Triggers                   | 0       |
| System Bus Access                   | Present |
| Raw Instruction Trace Port          | None    |
| Performance<br>Counters             | 2       |
| Nexus Trace<br>Encoder (TE)         | None    |
| Multi-core TE                       | None    |
| Send Trace To                       | None    |
| TE Timestamp                        | None    |
| Trace Timestamp<br>Width (Bits)     | None    |
| Trace Timestamp<br>Source           | None    |
| External Trigger Inputs to TE       | None    |
| External Trigger<br>Outputs from TE | None    |
| On-chip Trace Buffer<br>Size        | None    |
| Instrumentation Trace Component     | None    |

| INTERRUPTS        |         |   |
|-------------------|---------|---|
| PLIC              | Present |   |
| Priority Levels   | 7       |   |
| Global Interrupts | 127     |   |
| Local Interrupts  | 16      | 0 |
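With 127 global interrupt sources and 7 priority levels, the PLIC must arbitrate between simultaneously pending sources. The sketch below models that arbitration in software, assuming the standard RISC-V PLIC rules apply: priority 0 means "never interrupt", a source is delivered only if its priority exceeds the target's threshold, and ties go to the lowest source ID. The function `claim` and its set/dict arguments are illustrative, not a register-level interface.

```python
NUM_SOURCES = 127   # global interrupts, from the table above
MAX_PRIORITY = 7    # priority levels, from the table above


def claim(pending, enabled, priority, threshold=0):
    """Return the source ID a claim would yield, or 0 if none qualifies.

    pending, enabled: sets of source IDs (1..NUM_SOURCES)
    priority: dict mapping source ID -> priority (0..MAX_PRIORITY)
    """
    best_id, best_priority = 0, threshold
    # Ascending ID order plus a strict comparison gives ties to the lowest ID.
    for source in sorted(pending & enabled):
        if not 1 <= source <= NUM_SOURCES:
            raise ValueError(f"invalid source ID {source}")
        level = priority.get(source, 0)
        if not 0 <= level <= MAX_PRIORITY:
            raise ValueError(f"invalid priority {level}")
        if level > best_priority:
            best_id, best_priority = source, level
    return best_id


print(claim({3, 10, 42}, {3, 10, 42}, {3: 2, 10: 5, 42: 5}))  # 10
```

Sources 10 and 42 share the highest priority here, so the lower ID is returned.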

| DESIGN FOR TEST          |         |      |
|--------------------------|---------|------|
| SRAM Macro<br>Extraction | Present |      |
| Clock Gate Extraction    | Present | None |
| Group and Wrap           | Present | None |

| POWER MANAGEMENT |         |
|------------------|---------|
| Clock Gating     | Present |