# CS 61C

# RISC-V Single Cycle Datapath

## **Summer 2020**

Discussion 7: July 13, 2020

### 1 Pre-Check

This section is designed as a conceptual check for you to determine if you conceptually understand and have any misconceptions about this topic. Please answer true/false to the following questions, and include an explanation:

1.1 The single cycle datapath makes use of all hardware units for each instruction.

False. All units are active in each cycle, but their output may be ignored (gated) by control signals.

1.2 It is possible to execute the stages of the single cycle datapath in parallel to speed up execution of a single instruction.

False. Each stage depends on the value produced by the stage before it (e.g., instruction decode depends on the instruction fetched).

1.3 Combinational logic is only used in the instruction decode stage.

False. Other stages executes combinatorial logic too (muxes for instruction fetch, memory write, register updates; ALU operations during execute).

### 2 Single-Cycle CPU

- [2.1] For this worksheet, we will be working with the single-cycle CPU datapath on the last page.
  - (a) On the datapath, fill in each **round** box with the name of the datapath component, and each **square** box with the name of the control signal.
  - (b) Explain what happens in each datapath stage.
    - **IF** Instruction Fetch

Send address to the instruction memory (IMEM), and read IMEM at that address.

#### **ID** Instruction Decode

Generate control signals from the instruction bits, generate the immediate, and read registers from the RegFile.

#### EX Execute

Perform ALU operations, and do branch comparison.

#### **MEM** Memory

Read from or write to the data memory (DMEM).

#### WB Writeback

Write back either PC + 4, the result of the ALU operation, or data from memory to the RegFile.

2.2 Fill out the following table with the control signals for each instruction based on the datapath on the previous page. Wherever possible, use \* to indicate that what this signal is does not matter (as in, letting the value be whatever it wants won't affect the execution of the instruction). If the value of the signal does matter for correct execution, but can vary, list all of the values (for example, for a signal that matters with possible values of 0 and 1, write 0/1).

|      | BrEq | $\operatorname{BrLT}$ | PCSel      | ImmSel        | $\operatorname{BrUn}$ | ASel    | $\operatorname{BSel}$ | ALUSel | MemRW | RegWEn | WBSel      |
|------|------|-----------------------|------------|---------------|-----------------------|---------|-----------------------|--------|-------|--------|------------|
| add  | *    | *                     | 0 (PC + 4) | *             | *                     | 0 (Reg) | 0 (Reg)               | add    | 0     | 1      | 1 (ALU)    |
| ori  | *    | *                     | 0          | I             | *                     | 0 (Reg) | 1 (Imm)               | or     | 0     | 1      | 1 (ALU)    |
| lw   | *    | *                     | 0          | I             | *                     | 0 (Reg) | 1 (Imm)               | add    | 0     | 1      | 2  (MEM)   |
| sw   | *    | *                     | 0          | $\mathbf{S}$  | *                     | 0 (Reg) | 1 (Imm)               | add    | 1     | 0      | *          |
| beq  | 1/0  | *                     | 1/0        | $_{ m SB}$    | *                     | 1 (PC)  | 1 (Imm)               | add    | 0     | 0      | *          |
| jal  | *    | *                     | 1 (ALU)    | $\mathrm{UJ}$ | *                     | 1 (PC)  | 1 (Imm)               | add    | 0     | 1      | 0 (PC + 4) |
| bltu | *    | 1/0                   | 1/0        | $_{ m SB}$    | 1                     | 1 (PC)  | 1 (Imm)               | add    | 0     | 0      | *          |

#### 2.3 Clocking Methodology

- A **state element** is an element connected to the clock (denoted by a triangle at the bottom). The **input signal** to each state element must stabilize before each **rising edge**.
- The **critical path** is the longest delay path between state elements in the circuit. The circuit cannot be clocked faster than this, since anything faster would mean that the correct value is not guaranteed to reach the state element in the alloted time. If we place registers in the critical path, we can shorten the period by **reducing the amount of logic between registers**.

For this exercise, assume the delay for each stage in the datapath is as follows:

IF: 200 ps ID: 100 ps EX: 200 ps MEM: 200 ps WB: 100 ps

(a) Mark the stages of the datapath that the following instructions use and calculate the total time needed to execute the instruction.

|                        | IF | ID | $\mathbf{E}\mathbf{X}$ | MEM | WB | Total Time |
|------------------------|----|----|------------------------|-----|----|------------|
| add                    | X  | X  | X                      |     | X  | 600 ps     |
| ori                    | X  | X  | X                      |     | X  | 600  ps    |
| lw                     | X  | X  | X                      | X   | X  | 800 ps     |
| $\mathbf{s}\mathbf{w}$ | X  | X  | X                      | X   |    | 700 ps     |
| beq                    | X  | X  | X                      |     |    | 500  ps    |
| jal                    | X  | X  | X                      |     | X  | 600  ps    |
| bltu                   | X  | X  | X                      |     |    | 500 ps     |

(b) Which instruction(s) exercise the critical path?

Load word (lw), which uses all 5 stages.

(c) What is the fastest you could clock this single cycle datapath?

$$\frac{1}{800}$$
 picoseconds =  $\frac{1}{800*10^{-12}}$  seconds = 1, 250, 000, 000 $s^{-1}$  = 1.25 $GHz$ 

(d) Why is the single cycle datapath inefficient?

At any given time, most of the parts of the single cycle datapath are sitting unused. Also, even though not every instruction exercises the critical path, the datapath can only be clocked as fast as the slowest instruction.

(e) How can you improve its performance? What is the purpose of pipelining?

Performance can be improved with pipelining, or putting registers between stages so that the amount of combinational logic between registers is reduced, allowing for a faster clock time.



Control Logic