# **Solution for Example Midterm Exam**

**1.** (**Pipelining**) Unpipelined processor: An instruction completes every 20.2 ns, so the throughput is 1/20.2 ns ≈ 0.05 BIPS.

10-stage pipelined processor: Gap between independent instructions = 20 ns/10 + 0.2 ns = 2.2 ns. Gap between dependent instructions = 5 × 2.2 ns = 11 ns. Throughput = 2/13.2 ns ≈ 0.15 BIPS.
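The arithmetic above can be checked with a short script. This is only a sketch: the function and parameter names are illustrative, and it assumes what the 2/13.2 ns figure implies, namely that each pair of instructions contributes one independent gap and one dependent gap.

```python
def unpipelined_throughput_bips(logic_ns=20.0, latch_ns=0.2):
    """Throughput in instructions per ns, i.e. billions of instructions per second."""
    return 1.0 / (logic_ns + latch_ns)


def pipelined_throughput_bips(logic_ns=20.0, latch_ns=0.2,
                              stages=10, dependent_gap_cycles=5):
    """Throughput when one independent and one dependent gap occur per pair.

    Assumption (taken from the 2/13.2 ns figure in the solution): two
    instructions complete for every independent gap plus dependent gap.
    """
    cycle_ns = logic_ns / stages + latch_ns          # 2.2 ns per stage
    independent_gap_ns = cycle_ns                    # back-to-back issue
    dependent_gap_ns = dependent_gap_cycles * cycle_ns
    return 2.0 / (independent_gap_ns + dependent_gap_ns)


if __name__ == "__main__":
    print(f"unpipelined: {unpipelined_throughput_bips():.2f} BIPS")
    print(f"pipelined:   {pipelined_throughput_bips():.2f} BIPS")
```

Changing `stages` or `dependent_gap_cycles` shows how quickly dependent instructions erode the benefit of deeper pipelining.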

**2.** (**Power and Energy**) The power for the whole system is $P = S + Df^3 + Lf$, where $S$ is the constant power dissipated in other system components, $Df^3$ is the dynamic power in the processor, and $Lf$ is the leakage power in the processor. Voltage and frequency are proportional, so the dynamic power is a function of $f^3$ and the leakage power is a function of $f$. The execution time of the program is a function of $1/f$, so energy (power × time) is $E = S/f + Df^2 + L$. To find the minimum value of $E$, we differentiate $E$ with respect to $f$ and set the result to zero.

```
dE/df = -S/f^2 + 2Df = 0
f = (S/2D)^{1/3}
```

With S = 30 W and D = 40 W, f = (30/80)^{1/3} ≈ 0.72. Energy is therefore minimized when the frequency is $0.72 \times 3$ GHz = 2.16 GHz.
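As a numeric cross-check of the derivation, the sketch below evaluates the closed-form optimum and confirms that nearby frequencies cost more energy. The function names are illustrative. The leakage constant `L` is not given in the solution, so it defaults to 0 here; it is an additive constant in $E$ and does not affect where the minimum lies.

```python
def optimal_frequency(S, D):
    """Normalized frequency that minimizes E(f) = S/f + D*f**2 + L."""
    return (S / (2.0 * D)) ** (1.0 / 3.0)


def energy(f, S, D, L=0.0):
    """Energy as a function of normalized frequency f (L is an assumed placeholder)."""
    return S / f + D * f ** 2 + L


S, D = 30.0, 40.0          # watts, values from the solution
BASE_GHZ = 3.0             # baseline frequency used in the solution

f_opt = optimal_frequency(S, D)

if __name__ == "__main__":
    print(f"normalized f = {f_opt:.2f}")
    print(f"frequency    = {f_opt * BASE_GHZ:.2f} GHz")
```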

**3.** (**Forwarding/Bypassing**)

Without bypassing:

With bypassing:

(a) Stalls: 2

BP :IC :DEC:DEC:RR :FP1:FP2:FP3:FP4:WB
BP :IC :DEC:DEC:DEC:DEC:RR :EfA:DC1:DC2:WB

(b) Stalls: 2

BP :IC :DEC:DEC:RR :EfA:DC1:DC2:WB
BP :IC :DEC:DEC:DEC:DEC:RR :EfA:DC1:DC2:WB

(c) Stalls: 2

BP :IC :DEC:DEC:RR :EfA:DC1:DC2:WB
BP :IC :DEC:DEC:DEC:DEC:RR :FP1:FP2:FP3:FP4:WB

**4.** (**Branch Predictors**)

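$\text{Selector} = 32\text{K} \times 2\text{b} = 64\text{ Kb}$.

$\text{Global} = 2^{15} \times 3\text{b} = 96\text{ Kb}$.

$\text{Local} = 2^{11} \times 12\text{b} + 2^{14} \times 2\text{b} = 24 + 32 = 56\text{ Kb}$.

Total = 64 + 96 + 56 = 216 Kb.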
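The storage budget can be tallied with a few lines of code. This is a minimal sketch: the helper `table_bits` and the variable names are illustrative, and the table sizes are taken directly from the figures above.

```python
KBIT = 1024  # bits per Kb


def table_bits(entries, bits_per_entry):
    """Total storage of a table with the given number of entries."""
    return entries * bits_per_entry


selector_bits = table_bits(32 * 1024, 2)                      # 64 Kb
global_bits = table_bits(2 ** 15, 3)                          # 96 Kb
local_bits = table_bits(2 ** 11, 12) + table_bits(2 ** 14, 2)  # 24 Kb + 32 Kb
total_bits = selector_bits + global_bits + local_bits

if __name__ == "__main__":
    for name, bits in [("selector", selector_bits),
                       ("global", global_bits),
                       ("local", local_bits),
                       ("total", total_bits)]:
        print(f"{name:8s} {bits // KBIT} Kb")
```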

**5.** (**Loads and Stores**)

| LD/ST | The register for the address calculation is made available | The register that must be stored into memory is made available | The calculated effective address | Address<br>calculation<br>happens | Data memory accessed |
|-------|------------------------------------------------------------|----------------------------------------------------------------|----------------------------------|-----------------------------------|----------------------|
| LD    | 5                                                          | -                                                              | 0xabcd                           | 6                                 | 7                    |
| ST    | 7                                                          | 4                                                              | 0xabbb                           | 8                                 | @commit              |
| LD    | 1                                                          | -                                                              | 0xabbb                           | 2                                 | 3/9                  |
| LD    | 3                                                          | -                                                              | 0xabcd                           | 4                                 | 5                    |
| ST    | 12                                                         | 17                                                             | 0xabbb                           | 13                                | @commit              |
| LD    | 2                                                          | -                                                              | 0xabbb                           | 3                                 | 4/9/18               |

**6.** (**OoO Execution**)

The renamed version of the code:

```
ADD PR33, PR1, PR2
ADD PR34, PR33, PR33
LD PR35, 8(PR34)
SD PR34, 16(PR4)
ADD PR36, PR34, PR33
ADD PR1, PR35, PR36
SD PR1, 32(PR4)
```

**7.** (**Loop Scheduling**)

(a)

```
Loop: L.D F1, 0(R1)
      L.D F2, 0(R2)
      (stall)
      MUL F1, F1, F2
      DADDUI R1, R1, #-8
      DADDUI R2, R2, #-8
      BNE R1, R3, Loop
      (stall)
      S.D F1, 0(R1)
```

(b)

```
Loop: S.D F3, 16(R1)
      MUL F3, F1, F2
      L.D F1, 0(R1)
      L.D F2, 0(R2)
      DADDUI R1, R1, #-8
      DADDUI R2, R2, #-8
      BNE R1, R3, Loop
```