# NYCU-EE IC LAB - Spring 2023

## Lab04 Exercise

# Design: Simple Recurrent Neural Network

## **Data Preparation**

- 1. Extract the test data from the TA's directory:
  - % tar xvf ~iclabta01/Lab04.tar
- 2. The extracted LAB directory contains:
  - a. 00\_TESTBED
  - b. 01\_RTL
  - c. 02\_SYN
  - d. 03\_GATE

# System Integration I 🗸

# **Design Description**

A *Recurrent Neural Network* (RNN) is a class of artificial neural networks where connections between nodes form a directed or undirected graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. Recurrent neural networks are theoretically Turing complete and can run arbitrary programs to process arbitrary sequences of inputs.

In this exercise, you are asked to implement the simple RNN model shown in Fig 1.



Fig 1. Simple RNN

$$f(\mathbf{x}) = \max(0.1\mathbf{x}, \mathbf{x})$$

Fig 3.1. The Leaky ReLU activation function

$$g(\mathbf{x}) = \sigma(\mathbf{x}), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$

Fig 3.2. The sigmoid activation function
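The sketch below can help you check intermediate values against nWave. It is a minimal Python version of the two activation functions, and the names `leaky_relu` and `sigmoid` are hypothetical. Both functions act element-wise on a list.

The sigmoid branches on the sign of x so that `math.exp` is only ever called with a non-positive argument and never overflows. The sketch computes in double precision, not the single precision of the DesignWare IPs, so expect tiny differences.

```python
import math

def leaky_relu(v):
    """f(x) = max(0.1x, x), applied element-wise."""
    return [max(0.1 * x, x) for x in v]

def sigmoid(v):
    """g(x) = 1 / (1 + e^(-x)), applied element-wise without exp overflow."""
    out = []
    for x in v:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            e = math.exp(x)  # x < 0, so e is in (0, 1)
            out.append(e / (1.0 + e))
    return out

print(leaky_relu([-2.0, 0.0, 3.0]))  # [-0.2, 0.0, 3.0]
print(sigmoid([0.0]))                # [0.5]
```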

$$\mathsf{U} = \begin{bmatrix} u_{00} & u_{01} & u_{02} \\ u_{10} & u_{11} & u_{12} \\ u_{20} & u_{21} & u_{22} \end{bmatrix} \quad \mathsf{W} = \begin{bmatrix} w_{00} & w_{01} & w_{02} \\ w_{10} & w_{11} & w_{12} \\ w_{20} & w_{21} & w_{22} \end{bmatrix} \quad \mathsf{V} = \begin{bmatrix} v_{00} & v_{01} & v_{02} \\ v_{10} & v_{11} & v_{12} \\ v_{20} & v_{21} & v_{22} \end{bmatrix}$$

Fig 4. Weight matrices

$$\mathbf{x}_{1} = \begin{bmatrix} x_{10} \\ x_{11} \\ x_{12} \end{bmatrix} \quad \mathbf{x}_{2} = \begin{bmatrix} x_{20} \\ x_{21} \\ x_{22} \end{bmatrix} \quad \mathbf{x}_{3} = \begin{bmatrix} x_{30} \\ x_{31} \\ x_{32} \end{bmatrix}$$

Fig 5. Input x vector

$$\mathbf{h}_{0} = \begin{bmatrix} h_{00} \\ h_{01} \\ h_{02} \end{bmatrix}$$

Fig 6. h<sub>0</sub> vector

$$\mathbf{y}_{1} = \begin{bmatrix} y_{10} \\ y_{11} \\ y_{12} \end{bmatrix} \quad \mathbf{y}_{2} = \begin{bmatrix} y_{20} \\ y_{21} \\ y_{22} \end{bmatrix} \quad \mathbf{y}_{3} = \begin{bmatrix} y_{30} \\ y_{31} \\ y_{32} \end{bmatrix}$$

Fig 7. Output y vector

$$\begin{bmatrix} u_{00} & u_{01} & u_{02} \\ u_{10} & u_{11} & u_{12} \\ u_{20} & u_{21} & u_{22} \end{bmatrix} * \begin{bmatrix} x_{10} \\ x_{11} \\ x_{12} \end{bmatrix} = \begin{bmatrix} u_{00} * x_{10} + u_{01} * x_{11} + u_{02} * x_{12} \\ u_{10} * x_{10} + u_{11} * x_{11} + u_{12} * x_{12} \\ u_{20} * x_{10} + u_{21} * x_{11} + u_{22} * x_{12} \end{bmatrix}$$

Fig 8. Example of matrix operation
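The sketch below is a Python golden model for the whole network. Fig 1 is not reproduced in this text, so the recurrence is an assumption: the standard simple-RNN form h<sub>t</sub> = f(U x<sub>t</sub> + W h<sub>t-1</sub>) and y<sub>t</sub> = g(V h<sub>t</sub>) for t = 1, 2, 3.

Verify the recurrence against Fig 1 before relying on this model. `matvec` follows the row-by-row product of Fig 8. All helper names are hypothetical, and the model computes in double precision.

```python
import math

def matvec(m, v):
    """3x3 matrix times 3x1 vector, row by row as in Fig 8."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def vadd(a, b):
    """Element-wise sum of two 3x1 vectors."""
    return [p + q for p, q in zip(a, b)]

def leaky_relu(v):
    """f: max(0.1x, x), element-wise."""
    return [max(0.1 * x, x) for x in v]

def sigmoid(v):
    """g: 1 / (1 + e^(-x)), element-wise, written to avoid exp overflow."""
    out = []
    for x in v:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            e = math.exp(x)
            out.append(e / (1.0 + e))
    return out

def rnn_forward(U, W, V, xs, h0):
    """Return [y1, y2, y3] for xs = [x1, x2, x3].

    ASSUMED recurrence (Fig 1 is not reproduced in this text):
        h_t = f(U x_t + W h_{t-1}),   y_t = g(V h_t)
    """
    h = h0
    ys = []
    for x in xs:
        h = leaky_relu(vadd(matvec(U, x), matvec(W, h)))
        ys.append(sigmoid(matvec(V, h)))
    return ys

# Usage: with all-zero inputs and state, every output is sigmoid(0) = 0.5.
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(rnn_forward(I, I, I, [[0.0] * 3] * 3, [0.0] * 3))
# [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
```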

## Description

#### **Inputs and Outputs**

The following are the definitions of the input signals:

| Input Signals                | Bit Width | Definition                                                                                                                                                                                                   |
|------------------------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| clk                          | 1         | Clock.                                                                                                                                                                                                       |
| rst_n                        | 1         | Asynchronous active-low reset.                                                                                                                                                                               |
| in_valid                     | 1         | High when all inputs are valid.                                                                                                                                                                              |
| weight_u, weight_w, weight_v | 32        | The input weight signals. There are 9 signals for each weight matrix. The arithmetic representation follows the IEEE-754 floating-point number format.                                                     |
| data_x                       | 32        | The input x signals. There are 3*3=9 signals (x<sub>1</sub>: first 3 signals, x<sub>2</sub>: middle 3 signals, x<sub>3</sub>: last 3 signals). The arithmetic representation follows the IEEE-754 floating-point number format. |
| data_h                       | 32        | The input h<sub>0</sub> signals. There are 3 signals. The arithmetic representation follows the IEEE-754 floating-point number format.                                                                      |

The following are the definitions of the output signals:

| Output Signals | Bit Width | Definition                                                                                                                                                                                                                       |
|----------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| out_valid      | 1         | High when out is valid.                                                                                                                                                                                                          |
| out            | 32        | The output y signals. There are 3*3=9 signals (y<sub>1</sub>: first 3 signals, y<sub>2</sub>: middle 3 signals, y<sub>3</sub>: last 3 signals). The arithmetic representation follows the IEEE-754 floating-point number format. |

Each time you output a result, our pattern checks its correctness. If you follow the formulas and use the IEEE floating-point IPs, you should get the same result as our answer. However, we relax this constraint: your result may have an error of up to 0.0005 after conversion to a floating-point number. That is, we convert your output from binary format into a real floating-point number and compare it with our answer.

The error is the absolute value of (golden - ans) / golden. If the error is higher than 0.0005, you will fail this lab.
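The check described above can be sketched in Python as follows. This is not the pattern's actual code; `TOLERANCE`, `relative_error` and `passes` are hypothetical names.

The spec says a result fails only when the error is *higher than* 0.0005, so this sketch treats an error of exactly 0.0005 as passing; that boundary case is an interpretation. Division by `golden` is safe in exact arithmetic because every output goes through the sigmoid, whose value lies strictly between 0 and 1.

```python
TOLERANCE = 0.0005  # maximum relative error allowed by the spec

def relative_error(golden: float, ans: float) -> float:
    """Absolute value of (golden - ans) / golden."""
    return abs((golden - ans) / golden)

def passes(golden: float, ans: float) -> bool:
    """A result fails only when its error is higher than TOLERANCE."""
    return relative_error(golden, ans) <= TOLERANCE

print(passes(0.08721975, 0.0872))  # True  (error is about 2.3e-4)
print(passes(0.5, 0.5003))         # False (error is about 6e-4)
```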

Binary form: 00111101101100101010000001000101

IEEE floating number: 0.08721975

nWave: 8.72198e-02 (where $\text{e-02} = 10^{-2}$)

nWave rounds the number for display, but the computation itself is not affected. The following numbers are all taken from nWave, so the computations were performed by the IPs in Verilog.
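To reproduce the conversion above outside the simulator, you can reinterpret the 32 bits as an IEEE-754 single-precision value with Python's standard `struct` module. The helper names `bits_to_float` and `float_to_bits` are hypothetical.

Printing with `.5e` gives the same six significant digits as the nWave display in this example.

```python
import struct

def bits_to_float(bits: str) -> float:
    """Interpret a 32-digit binary string as an IEEE-754 single-precision number."""
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

def float_to_bits(value: float) -> str:
    """Round value to single precision and return its 32-bit pattern."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    return format(word, "032b")

x = bits_to_float("00111101101100101010000001000101")
print(f"{x:.8f}")                 # 0.08721975
print(f"{x:.5e}")                 # 8.72198e-02
print(float_to_bits(0.08721975))  # 00111101101100101010000001000101
```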

- 1. The input signals weight\_u, weight\_w, weight\_v, and data\_x are delivered for 9 cycles continuously. When in\_valid is low, these inputs are tied to the unknown state.
- 2. The input signal **data\_h** is delivered for **3 cycles** continuously. After these three cycles, it is tied to the unknown state.
- 3. The input signals weight\_u, weight\_w, weight\_v, data\_x, and data\_h all start being given at the same time.
- 4. All input signals are synchronized at the negative edge of the clock.
- 5. The output signal **out** must be delivered for **9 cycles continuously**, and **out\_valid** should be **high** during the same cycles.
- 6. You don't need to worry about infinity in the calculation process.
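The delivery rules above imply that one pattern arrives over 9 cycles and must be reassembled into matrices and vectors. The sketch below shows that reassembly in Python.

The row-major order for the weights (u<sub>00</sub>, u<sub>01</sub>, u<sub>02</sub>, u<sub>10</sub>, ...) is an assumption, because the spec does not state it. The x<sub>1</sub>/x<sub>2</sub>/x<sub>3</sub> split and the 3-cycle data\_h window come from the spec. `collect_inputs` and the example stream are hypothetical.

```python
def collect_inputs(cycles):
    """Rebuild U, W, V, [x1, x2, x3] and h0 from one pattern's serial inputs.

    cycles: 9 tuples (weight_u, weight_w, weight_v, data_x, data_h), one per
    cycle while in_valid is high. data_h is read only in the first 3 cycles;
    later entries are unknown on the bus and are ignored here.
    """
    if len(cycles) != 9:
        raise ValueError("the inputs are delivered for exactly 9 cycles")
    us, ws, vs, xs, hs = zip(*cycles)

    def rows(flat):
        # ASSUMED row-major order for weights: value k -> row k // 3, column k % 3.
        # For data_x this split matches the spec: x1, x2, x3 = first, middle, last 3.
        return [list(flat[3 * r:3 * r + 3]) for r in range(3)]

    return rows(us), rows(ws), rows(vs), rows(xs), list(hs[:3])

# Hypothetical pattern: each signal carries an offset plus the cycle index k.
stream = [(k, 10 + k, 20 + k, 30 + k, 40 + k if k < 3 else None) for k in range(9)]
U, W, V, xs, h0 = collect_inputs(stream)
print(U)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(xs)  # [[30, 31, 32], [33, 34, 35], [36, 37, 38]]
print(h0)  # [40, 41, 42]
```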



## **Specifications**

- 1. Top module name: NN (design file name: NN.v)
- 2. Your result must have an error under 0.0005 after conversion to a floating-point number. If the error is higher than this value, you will fail this lab.
- 3. The design uses an asynchronous, active-low reset. If you use a synchronous reset (a reset that only takes effect once the clock is running) in your design, you may fail to reset signals.
- 4. The reset signal (rst\_n) is given only once, at the beginning of the simulation. All output signals should be reset after the reset signal is asserted.
- 5. **out** should be reset after your **out\_valid** is pulled low.
- 6. The latency is the number of clock cycles between the first falling edge of in\_valid and the last rising edge of out\_valid. The rising edge of out\_valid should arrive within 100 cycles after the falling edge of in\_valid.
- 7. The area is limited to **2500000**. Also, the synthesis time should be less than **1.5** hours.
- 8. You can choose your own clock period, but the maximum period is 50 ns. The clock period has a precision of 0.1; for example, 4.5 is allowed, but 4.55 is not.
- 9. The input delay is set to **0.5\*(clock period)**.
- 10. The output delay is set to **0.5\*(clock period)**, and the output loading is set to **0.05**.
- 11. The synthesis result **cannot** include any **latches**.
- 12. After synthesis, you can check NN.area and NN.timing. The area report is valid only when the slack at the end of the timing report is **non-negative (MET)**.
- 13. In this lab, you must use at least one IEEE floating-point IP from DesignWare. We will check for it in NN.resource in 02\_SYN/Report/. An example is shown in the following figure.
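Several of the numeric rules above can be checked together before you submit. The Python sketch below does this. `check_constraints` and its parameters are hypothetical; you supply the values by reading your own simulation output and NN.area / NN.timing reports.

The sketch assumes that the area limit and the 100-cycle limit are inclusive. It does not check synthesis time or latches. The period is passed as a string so that `decimal.Decimal` can test the 0.1 precision exactly, which binary floats cannot do.

```python
from decimal import Decimal

MAX_PERIOD_NS = Decimal("50")
MAX_WAIT_CYCLES = 100   # in_valid falling edge -> out_valid rising edge
MAX_AREA = 2_500_000

def check_constraints(period_ns: str, wait_cycles: int, area: float, slack: float):
    """Return a list of violated rules (an empty list means every check passed)."""
    problems = []
    period = Decimal(period_ns)
    if not Decimal("0") < period <= MAX_PERIOD_NS:
        problems.append("clock period must be positive and at most 50 ns")
    if period != period.quantize(Decimal("0.1")):
        problems.append("clock period must have a precision of 0.1")
    if wait_cycles > MAX_WAIT_CYCLES:
        problems.append("out_valid must rise within 100 cycles")
    if area > MAX_AREA:
        problems.append("area exceeds 2500000")
    if slack < 0:
        problems.append("timing slack is negative (VIOLATED)")
    return problems

print(check_constraints("4.5", 30, 1_200_000, 0.0))   # []
print(check_constraints("4.55", 30, 1_200_000, 0.0))  # ['clock period must have a precision of 0.1']
```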



### **Grading Policy**

1. Function Validity: 70%

2. Performance: 30%
   - Area \* Computation time: 30%
   - Computation time = Latency \* clock cycle time
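The performance metric above is a product of three numbers. A quick Python sketch helps compare candidate design points. The function names and the two example designs are hypothetical, and a smaller product is taken to mean better performance.

```python
def computation_time_ns(latency_cycles: int, period_ns: float) -> float:
    """Computation time = Latency * clock cycle time."""
    return latency_cycles * period_ns

def performance_cost(area: float, latency_cycles: int, period_ns: float) -> float:
    """Area * Computation time; a smaller product means better performance."""
    return area * computation_time_ns(latency_cycles, period_ns)

# Two hypothetical design points with the same 10 ns clock:
small_slow = performance_cost(800_000, 60, 10.0)    # 800000 * 600 ns
large_fast = performance_cost(1_000_000, 40, 10.0)  # 1000000 * 400 ns
print(small_slow, large_fast)  # 480000000.0 400000000.0
```

In this made-up comparison, the larger but faster design scores better. Spending area to cut cycles can pay off, as long as the area limit and timing are still met.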

#### **Block diagram**



#### Note

- 1. Please submit the following file under 09\_SUBMIT before 12:00 noon on March 20:
  - NN.v
  - If the uploaded files violate the naming rule, you will receive a 5-point deduction.
  - In this lab, you can adjust your clock cycle time. Therefore, make sure to key in your clock cycle time after the command, as in the figure below. The TA will demo your design under this clock cycle time.



- The 2nd demo deadline is 12:00 noon on March 22.
- Check whether any wire, reg, or submodule is named "error", "fail", "pass", "congratulation", "latch", or "DW\_fp". If you use any of these names, you will fail the lab.
- 2. Template folders and reference commands:

```
01_RTL/   (RTL simulation)          ./01_run
02_SYN/   (Synthesis)               ./01_run_dc
          (Check if there is any latch in your design in syn.log)
          (Check the timing of the design in /Report/NN.timing)
03_GATE/  (Gate-level simulation)   ./01_run
```

**Make sure the three clock period values are identical in** 00\_TESTBED/Pattern.v && 02\_SYN/syn.tcl:
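The naming rule in the Note above can be pre-checked with the rough Python script below. `find_forbidden_names` is hypothetical and is not the TA's checker.

The script strips `//` and `/* */` comments and then reports every identifier that contains a forbidden word, ignoring case. This substring match is a deliberately conservative assumption; for example, it would also flag `bypass`.

DesignWare module names you instantiate on purpose, such as `DW_fp_add`, contain "DW\_fp" by design, so pass them in `allowed`. String literals are not stripped.

```python
import re

FORBIDDEN = ("error", "fail", "pass", "congratulation", "latch", "dw_fp")

def find_forbidden_names(verilog_source: str, allowed=()) -> list:
    """Return identifiers that contain a forbidden word, ignoring case."""
    # Drop /* ... */ and // comments so that comment text is not reported.
    src = re.sub(r"/\*.*?\*/", " ", verilog_source, flags=re.S)
    src = re.sub(r"//[^\n]*", " ", src)
    hits = set()
    for ident in re.findall(r"\b[A-Za-z_][A-Za-z0-9_$]*", src):
        if ident in allowed:
            continue
        if any(word in ident.lower() for word in FORBIDDEN):
            hits.add(ident)
    return sorted(hits)

design = """
module NN(input clk);
  // this comment mentions an error and is ignored
  wire error_flag;
  reg [31:0] tmp;
  DW_fp_add #(23, 8, 0) u_add (.a(tmp), .b(tmp), .rnd(3'b000), .z(), .status());
endmodule
"""
print(find_forbidden_names(design, allowed={"DW_fp_add"}))  # ['error_flag']
```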

## **Sample Waveform**



Fig 1. Input waveform

