## VE370 Homework 5

```
lw R2,0(R1)
Label1: beq R2,R0,Label2 ; Not taken once, then taken
    lw R3,0(R2)
    beq R3,R0,Label1 ; Taken
    add R1,R3,R1
Label2: sw R1,0(R2)
```

## Q1

- 1. The hazard detection unit must check whether the branch currently in the ID stage depends on the result of an earlier instruction. While the branch is in ID, the immediately preceding instruction is in EX (a hazard if it is an R-type instruction or a lw), and the instruction before that is in MEM (a hazard only if it is a lw).
  - The first branch, beq R2,R0,Label2, reads R2, which is written by the immediately preceding lw R2,0(R1). The loaded value Mem[0+R1] exists only at the end of the load's MEM stage, so the branch can compare it in ID no earlier than two cycles after it first reaches ID. So two NOPs are needed.
  - If the lw is two instructions before the beq, only one NOP is needed.
  - If the beq depends on the immediately preceding R-type instruction, one NOP is needed.
- 2. Assume branches are predicted not taken.

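**MEM Stage:** the executed instruction sequence is lw, beq (not taken), lw, beq (taken), beq (taken), sw, i.e., 6 instructions, which need 6 + 4 cycles to pass through the 5-stage pipeline. Each lw feeding the following beq needs one NOP (two in total), and each of the two taken branches flushes 3 cycles. Then the total is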

$$6+4+2+6=18.$$

**ID Stage:** each lw feeding the following beq now needs two NOPs (four in total), and each taken branch flushes only 1 cycle. Then the total is

$$6+4+4+2=16.$$

So resolving branches in ID instead of MEM saves 2 clock cycles.
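To double-check the two totals, the cycle count can be replayed with a small timing model. This is a sketch under my own assumptions, not part of the original solution: each instruction enters ID one cycle after the previous one, a lw result can be forwarded from the cycle after its MEM stage and an ALU result from the cycle after its EX stage, a branch resolved in ID compares in ID while every other instruction (including a branch resolved in MEM) is treated as reading its operands in EX, and a taken branch adds 1 (ID) or 3 (MEM) bubbles. The names `total_cycles`, `TRACE`, and `Instr` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (opcode, destination register or None, source registers, branch taken?)
Instr = Tuple[str, Optional[str], Tuple[str, ...], bool]

# Dynamic trace: the first beq R2 falls through once, beq R3 jumps back to
# Label1, the same beq R2 then jumps to Label2, and sw executes last.
TRACE: List[Instr] = [
    ("lw", "R2", ("R1",), False),
    ("beq", None, ("R2", "R0"), False),
    ("lw", "R3", ("R2",), False),
    ("beq", None, ("R3", "R0"), True),
    ("beq", None, ("R2", "R0"), True),
    ("sw", None, ("R1", "R2"), False),
]


def total_cycles(trace: List[Instr], branch_stage: str) -> int:
    """Cycles for `trace` on a 5-stage pipeline with full forwarding and
    predict-not-taken, with branches resolved in "ID" or "MEM"."""
    flush = {"ID": 1, "MEM": 3}[branch_stage]  # bubbles after a taken branch
    ready: Dict[str, int] = {}  # reg -> cycle at whose end its new value exists
    t = 1                       # ID cycle of the previous instruction
    penalty = 0                 # flush bubbles caused by the previous instruction
    for op, dest, srcs, taken in trace:
        t += 1 + penalty        # earliest ID cycle for this instruction
        # A branch resolved in ID compares in ID; everything else is treated
        # as reading its operands in EX, one cycle after ID.
        reads_in_id = op == "beq" and branch_stage == "ID"
        for reg in srcs:
            if reg in ready:
                # Stall until the forwarded value exists in the cycle it is read.
                t = max(t, ready[reg] + (1 if reads_in_id else 0))
        if dest is not None:
            # lw: value exists at the end of MEM (ID + 2); ALU op: end of EX (ID + 1)
            ready[dest] = t + (2 if op == "lw" else 1)
        penalty = flush if (op == "beq" and taken) else 0
    return t + 3                # the last instruction still needs EX, MEM, WB


print(total_cycles(TRACE, "MEM"), total_cycles(TRACE, "ID"))  # 18 16
```

Because the model derives stalls from when each value becomes available rather than from a fixed table, it also reproduces the NOP counts from part 1: two for a lw right before a branch resolved in ID, one when the branch is resolved in MEM.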

- 3. Register R2 between lw R2,0(R1) and beq R2,R0,Label2.

  Register R3 between lw R3,0(R2) and beq R3,R0,Label1.
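- 4. With the branch resolved in **ID**, if an earlier instruction has not yet produced a register the branch needs, the pipeline must first stall until the value exists and then forward it to the comparator:
  - if the branch depends on a preceding R-type instruction, the value is forwarded from the EX/MEM or MEM/WB pipeline register (the producer is in MEM or WB);
  - if the branch depends on a preceding load, the value is forwarded from MEM/WB (the load is in WB).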

Notice that the forwarding unit for the ID-stage comparator takes the same kinds of inputs as the EX-stage forwarding unit (MEM/WB.RegWrite, MEM/WB.RegisterRd, EX/MEM.RegWrite, EX/MEM.RegisterRd, and two source register numbers), except that the source registers are those of the branch in ID, i.e., IF/ID.RegisterRs and IF/ID.RegisterRt. Its **output** is likewise two 2-bit control signals, which drive two MUXes whose outputs are the comparator inputs.

Each MUX selects among the value from the *register file*, the *ALUResult in EX/MEM*, and the *data in MEM/WB*.

**beq R2,R0,Label2** is stalled for two cycles; then the forwarding unit detects IF/ID.RegisterRs (R2) == MEM/WB.RegisterRd (R2) with MEM/WB.RegWrite = 1, and there is no forward from EX/MEM (it holds a bubble). So ForwardA selects the **data from MEM/WB** (encoded 01, as in the EX-stage unit), while ForwardB stays 00.
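The selection logic can be written out as a small function. This is a hedged sketch that mirrors the usual EX-stage forwarding conditions (EX/MEM takes priority over MEM/WB, and register 0 is never forwarded); `forward_select` and its parameter names are hypothetical, registers are given as plain numbers, and `src` stands for one source register of the branch being compared in ID.

```python
def forward_select(src: int, ex_mem_regwrite: bool, ex_mem_rd: int,
                   mem_wb_regwrite: bool, mem_wb_rd: int) -> str:
    """Return the 2-bit select for one comparator-input MUX.

    "10": ALUResult from EX/MEM, "01": data from MEM/WB, "00": register file.
    """
    # The most recent producer (in MEM) has priority, as in the EX-stage unit.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return "10"
    # Otherwise an older producer in WB (ALU result or loaded data).
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return "01"
    # No match in EX/MEM or MEM/WB: use the register file output
    # (a producer still in EX must be handled by stalling, not here).
    return "00"


# beq R2,R0,Label2 after its two stall cycles: lw R2,0(R1) is in WB and
# EX/MEM holds a bubble (RegWrite = 0).
forward_a = forward_select(2, ex_mem_regwrite=False, ex_mem_rd=0,
                           mem_wb_regwrite=True, mem_wb_rd=2)   # rs = R2
forward_b = forward_select(0, ex_mem_regwrite=False, ex_mem_rd=0,
                           mem_wb_regwrite=True, mem_wb_rd=2)   # rt = R0
print(forward_a, forward_b)  # 01 00
```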

## Q2

**EX stage:** a mispredicted branch costs two extra clock cycles.

- 1. The extra CPI is $30\% \times (1-45\%) \times 2 = 0.33$.
- 2. Assume instructions are always fetched sequentially (not taken), so every jump is also mispredicted. The jump target is known in the ID stage, so each jump costs one extra clock cycle: $30\% \times (1-55\%) \times 2+5\% \times 1=0.32$.
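Both expressions follow one formula: extra CPI = branch fraction × misprediction rate × branch penalty + jump fraction × jump penalty. A minimal sketch with the numbers used above (the helper `extra_cpi` is a hypothetical name):

```python
def extra_cpi(branch_frac: float, accuracy: float, branch_penalty: int,
              jump_frac: float = 0.0, jump_penalty: int = 0) -> float:
    """Extra CPI from mispredicted branches plus (optionally) jump bubbles."""
    return branch_frac * (1 - accuracy) * branch_penalty + jump_frac * jump_penalty


print(round(extra_cpi(0.30, 0.45, 2), 2))           # part 1: 0.33
print(round(extra_cpi(0.30, 0.55, 2, 0.05, 1), 2))  # part 2: 0.32
```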

## Q3

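States: SNT = strongly not taken, NT = weakly not taken, T = weakly taken, ST = strongly taken; the predictor starts in SNT. Each column gives the branch outcome, the state after that outcome, and whether the prediction made from that state for the next outcome is correct.

| Outcome | T | NT | NT | T | T | T | T | NT | (2) T | NT | NT | T | T | T | T | NT | (3) T |
|---------|---|----|----|---|---|---|---|----|-------|----|----|---|---|---|---|----|-------|
| State after | NT | SNT | SNT | NT | T | ST | ST | T | ST | T | NT | T | ST | ST | ST | T | ST |
| Next prediction correct | True | True | False | False | True | True | False | True | False | False | False | True | True | True | False | True | |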

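From the second pass through the 8-outcome pattern onward, the predictor enters every pass in the same state (T), so the state sequence repeats and each pass gets 4 of the 8 predictions right. So if the loop repeats forever, the accuracy is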

$$\lim_{n\rightarrow\infty}\frac{3+4n}{8n}=0.5$$
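The steady-state claim can be checked by simulation. The sketch below assumes a standard 2-bit saturating counter (SNT, NT, T, ST stored as 0 to 3, predicting taken in T or ST) that starts in SNT, and replays the pattern T, NT, NT, T, T, T, T, NT many times; `simulate`, `PATTERN`, and `NAMES` are hypothetical names.

```python
from typing import List, Tuple

PATTERN = [True, False, False, True, True, True, True, False]  # T NT NT T T T T NT
NAMES = ["SNT", "NT", "T", "ST"]  # counter values 0..3; predict taken when >= 2


def simulate(passes: int, start: int = 0) -> Tuple[List[int], List[str]]:
    """Run a 2-bit saturating counter over `passes` repetitions of PATTERN.

    Returns the number of correct predictions in each pass and the
    predictor state at the start of each pass.
    """
    state = start
    correct_per_pass: List[int] = []
    start_states: List[str] = []
    for _ in range(passes):
        start_states.append(NAMES[state])
        correct = 0
        for taken in PATTERN:
            predicted_taken = state >= 2
            correct += int(predicted_taken == taken)
            # Saturating update: move toward ST on taken, toward SNT otherwise.
            state = min(state + 1, 3) if taken else max(state - 1, 0)
        correct_per_pass.append(correct)
    return correct_per_pass, start_states


counts, starts = simulate(1000)
print(starts[:3], counts[1:3])          # ['SNT', 'T', 'T'] [4, 4]
print(sum(counts) / (8 * len(counts)))  # 0.5
```

The start state is the same (T) from the second pass on, which is the repetition that makes the long-run accuracy 4/8 = 0.5.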