# ECE 4750 PSET 4

Tim Yao (ty252)

Nov 25, 2015

Worked with Gautam Ramaswamy, Gaurab Bhattacharya, and Sacheth Hegde.

# 1 Tree Network Topologies

#### 1.a Baseline I3L Microarchitecture

| Cycle:            | 0 | 1 | 2 | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
|-------------------|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| mul r1, r2, r3    | F | D | I | Y0 | Y1 | Y2 | Y3 | W  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| mul r4, r1, r5    |   | F | D | I  | I  | I  | I  | Y0 | Y1 | Y2 | Y3 | W  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| div r6, r7, r8    |   |   | F | D  | D  | D  | D  | I  | Z  | Z  | Z  | Z  | W  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| div r9, r10, r11  |   |   |   | F  | F  | F  | F  | D  | I  | I  | I  | I  | Z  | Z  | Z  | Z  | W  |    |    |    |    |    |    |    |    |    |    |    |    |
| div r12, r13, r14 |   |   |   |    |    |    |    | F  | D  | D  | D  | D  | I  | I  | I  | I  | Z  | Z  | Z  | Z  | W  |    |    |    |    |    |    |    |    |
| mul r15, r12, r16 |   |   |   |    |    |    |    |    | F  | F  | F  | F  | D  | D  | D  | D  | I  | I  | I  | I  | Y0 | Y1 | Y2 | Y3 | W  |    |    |    |    |
| mul r17, r15, r18 |   |   |   |    |    |    |    |    |    |    |    |    | F  | F  | F  | F  | D  | D  | D  | D  | I  | I  | I  | I  | Y0 | Y1 | Y2 | Y3 | W  |

Figure 1: Pipeline Diagram for Baseline I3L Architecture

The total issue-to-commit cycle count is 27 (cycles 2 through 28).

#### 1.b Schedule Oldest Ready Instruction First on IO2L Microarchitecture

| Cycle:            | 0 | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
|-------------------|---|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| mul r1, r2, r3    | I | Y0 | Y1 | Y2 | Y3 | W  | C  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| mul r4, r1, r5    |   |    |    |    | I  | Y0 | Y1 | Y2 | Y3 | W  | C  |    |    |    |    |    |    |    |    |    |    |    |    |    |
| div r6, r7, r8    |   | I  | Z  | Z  | Z  | Z  | W  | r  | r  | r  | r  | C  |    |    |    |    |    |    |    |    |    |    |    |    |
| div r9, r10, r11  |   |    |    |    |    | I  | Z  | Z  | Z  | Z  | W  | r  | C  |    |    |    |    |    |    |    |    |    |    |    |
| div r12, r13, r14 |   |    |    |    |    |    |    |    |    | I  | Z  | Z  | Z  | Z  | W  | C  |    |    |    |    |    |    |    |    |
| mul r15, r12, r16 |   |    |    |    |    |    |    |    |    |    |    |    |    | I  | Y0 | Y1 | Y2 | Y3 | W  | C  |    |    |    |    |
| mul r17, r15, r18 |   |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    | I  | Y0 | Y1 | Y2 | Y3 | W  | C  |

Figure 2: Pipeline Diagram for IO2L Architecture

The total issue-to-commit cycle count is 24 (cycles 0 through 23).

#### 1.c Optimal Scheduling on IO2L Microarchitecture

| Cycle:            | 0 | 1 | 2  | 3  | 4  | 5  | 6   | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|-------------------|---|---|----|----|----|----|-----|----|----|----|----|----|----|----|----|----|----|----|
| mul r1, r2, r3    |   | I | Y0 | Y1 | Y2 | Y3 | W   | C  |    |    |    |    |    |    |    |    |    |    |
| mul r4, r1, r5    |   |   |    |    |    | I  | Y0  | Y1 | Y2 | Y3 | W  | C  |    |    |    |    |    |    |
| div r6, r7, r8    |   |   |    |    | I  | Z  | Z   | Z  | Z  | W  | r  | r  | C  |    |    |    |    |    |
| div r9, r10, r11  |   |   |    |    |    |    |     |    | I  | Z  | Z  | Z  | Z  | W  | C  |    |    |    |
| div r12, r13, r14 | I | Z | Z  | Z  | Z  | W  | r   | r  | r  | r  | r  | r  | r  | r  | r  | C  |    |    |
| mul r15, r12, r16 |   |   |    |    |    |    | I   | Y0 | Y1 | Y2 | Y3 | W  | r  | r  | r  | r  | C  |    |
| mul r17, r15, r18 |   |   |    |    |    |    |     |    |    |    | I  | Y0 | Y1 | Y2 | Y3 | W  | r  | C  |

Figure 3: Pipeline Diagram for IO2L Architecture with Optimized Scheduling

The total issue-to-commit cycle count is 18 (cycles 0 through 17).

#### 1.d Scheduling Comparison

TODO!

# 2 Register Renaming

#### 2.a Architectural RAW, WAW, and WAR Dependencies

```
1 mul r1, r2, r3
2 mul r4, r1, r5
3 addu r6, r7, r8
4 mul r1, r2, r5
5 addu r6, r6, r9
```
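The hazards in this five-instruction sequence can be enumerated mechanically. The sketch below is a minimal illustration, not part of the original solution; the tuple encoding and the `find_dependencies` helper are hypothetical. For each instruction it reports a RAW hazard on each source against the nearest earlier writer of that register, and WAR and WAW hazards on its destination against earlier readers and the nearest earlier writer.

```python
# Program from 2.a as (destination, source1, source2); list index + 1 = instruction number.
PROGRAM = [
    ("r1", "r2", "r3"),  # 1: mul  r1, r2, r3
    ("r4", "r1", "r5"),  # 2: mul  r4, r1, r5
    ("r6", "r7", "r8"),  # 3: addu r6, r7, r8
    ("r1", "r2", "r5"),  # 4: mul  r1, r2, r5
    ("r6", "r6", "r9"),  # 5: addu r6, r6, r9
]


def find_dependencies(program):
    """Return (kind, earlier, later, register) tuples with 1-based instruction numbers."""
    deps = []
    for j, (dst_j, *srcs_j) in enumerate(program):
        # RAW: each distinct source depends on the nearest earlier writer of that register.
        for reg in dict.fromkeys(srcs_j):
            for i in range(j - 1, -1, -1):
                if program[i][0] == reg:
                    deps.append(("RAW", i + 1, j + 1, reg))
                    break
        # WAR / WAW on the destination: scan back until the nearest earlier writer.
        for i in range(j - 1, -1, -1):
            dst_i, *srcs_i = program[i]
            if dst_j in srcs_i:
                deps.append(("WAR", i + 1, j + 1, dst_j))
            if dst_i == dst_j:
                deps.append(("WAW", i + 1, j + 1, dst_j))
                break
    return deps


for kind, earlier, later, reg in find_dependencies(PROGRAM):
    print(f"{kind}: {earlier} -> {later} on {reg}")
```

Run on the sequence above, it reports RAW hazards 1 -> 2 (r1) and 3 -> 5 (r6), WAW hazards 1 -> 4 (r1) and 3 -> 5 (r6), and one WAR hazard 2 -> 4 (r1).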

#### 2.b Pipeline Diagram with Register Renaming

| Cycle:          | 0 | 1 | 2 | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 |
|-----------------|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|----|
| mul r1, r2, r3  | F | D | I | Y0 | Y1 | Y2 | Y3 | W  | C  |    |    |    |    |    |    |    |
| mul r4, r1, r5  |   | F | D | i  | i  | i  | I  | Y0 | Y1 | Y2 | Y3 | W  | C  |    |    |    |
| addu r6, r7, r8 |   |   | F | D  | I  | X  | W  | r  | r  | r  | r  | r  | r  | C  |    |    |
| mul r1, r2, r5  |   |   |   | F  | D  | I  | Y0 | Y1 | Y2 | Y3 | W  | r  | r  | r  | C  |    |
| addu r6, r6, r9 |   |   |   |    | F  | D  | i  | I  | X  | W  | r  | r  | r  | r  | r  | C  |

Figure 4: Pipeline Diagram with Register Renaming

#### 2.c Register Renaming with Pointers in the IQ/ROB

| Cycle | D | I | W | C | r1  | r2 | r3 | r4  | r5 | r6  | r7 | r8 | r9 | Free List      | IQ        |
|-------|---|---|---|---|-----|----|----|-----|----|-----|----|----|----|----------------|-----------|
| 0     |   |   |   |   | p0  | p1 | p2 | p3  | p4 | p5  | p6 | p7 | p8 | p9,pA,pB,pC,pD |           |
| 1     | 1 |   |   |   | :   | :  | :  | :   | :  | :   | :  | :  | :  | p9,pA,pB,pC,pD |           |
| 2     | 2 | 1 |   |   | p9* | :  | :  | :   | :  | :   | :  | :  | :  | pA,pB,pC,pD    | p9/p1/p2  |
| 3     | 3 |   |   |   | :   | :  | :  | pA* | :  | :   | :  | :  | :  | pB,pC,pD       | pA/p9*/p4 |
| 4     | 4 | 3 |   |   | :   | :  | :  | :   | :  | pB* | :  | :  | :  | pC,pD          | pB/p6/p7  |
| 5     | 5 | 4 |   |   | pC* | :  | :  | :   | :  | :   | :  | :  | :  | pD             | pC/p1/p4  |
| 6     |   | 2 | 3 |   | :   | :  | :  | :   | :  | pD* | :  | :  | :  |                | pD/pB*/p8 |
| 7     |   | 5 | 1 |   | :   | :  | :  | :   | :  | :   | :  | :  | :  |                |           |
| 8     |   |   |   | 1 | :   | :  | :  | :   | :  | :   | :  | :  | :  |                |           |
| 9     |   |   | 5 |   | :   | :  | :  | :   | :  | :   | :  | :  | :  | p0             |           |
| 10    |   |   | 4 |   | :   | :  | :  | :   | :  | pD  | :  | :  | :  | p0             |           |
| 11    |   |   | 2 |   | pC  | :  | :  | :   | :  | :   | :  | :  | :  | p0             |           |
| 12    |   |   |   | 2 | :   | :  | :  | pA  | :  | :   | :  | :  | :  | p0             |           |
| 13    |   |   |   | 3 | :   | :  | :  | :   | :  | :   | :  | :  | :  | p0,p3          |           |
| 14    |   |   |   | 4 | :   | :  | :  | :   | :  | :   | :  | :  | :  | p0,p3,p5       |           |
| 15    |   |   |   | 5 | :   | :  | :  | :   | :  | :   | :  | :  | :  | p0,p3,p5,p9    |           |
| 16    |   |   |   |   | :   | :  | :  | :   | :  | :   | :  | :  | :  | p0,p3,p5,p9,pB |           |

Figure 5: Microarchitectural State (RT/FL/IQ) for Reg Renaming with Pointers in the IQ/ROB

| Cycle | ROB 0     | ROB 1     | ROB 2     | ROB 3      | ROB 4      |
|-------|-----------|-----------|-----------|------------|------------|
| 0     |           |           |           |            |            |
| 1     |           |           |           |            |            |
| 2     | p9*/r1/p0 |           |           |            |            |
| 3     |           | pA*/r4/p3 |           |            |            |
| 4     |           |           | pB*/r6/p5 |            |            |
| 5     |           |           |           | pC*/r1/p9* |            |
| 6     |           |           |           |            | pD*/r6/pB* |
| 7     |           |           | pB/r6/p5  |            | pD*/r6/pB  |
| 8     | p9/r1/p0  |           |           | pC*/r1/p9  |            |
| 9     |           |           |           |            |            |
| 10    |           |           |           |            | pD/r6/pB   |
| 11    |           |           |           | pC/r1/p9   |            |
| 12    |           | pA/r4/p3  |           |            |            |
| 13    |           |           | •         |            |            |
| 14    |           |           |           | •          |            |
| 15    |           |           |           |            | •          |

Figure 6: Microarchitectural State (ROB) for Reg Renaming with Pointers in the IQ/ROB
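The rename-stage bookkeeping behind Figures 5 and 6 can be sketched in a few lines of Python. This is a minimal model, assuming the initial state of Figure 5 (r1 through r9 mapped to p0 through p8, free list p9 through pD), renaming in program order, and in-order commit; it ignores timing and the pending (`*`) bits, and the helper names `preg` and `rename` are hypothetical. Each IQ entry is destination/source1/source2 in physical registers, each ROB entry is new physical register/architectural register/old physical register, and committing an instruction returns its old physical register to the free list.

```python
from collections import deque

# Program from 2.a as (destination, source1, source2).
PROGRAM = [
    ("r1", "r2", "r3"),  # 1: mul
    ("r4", "r1", "r5"),  # 2: mul
    ("r6", "r7", "r8"),  # 3: addu
    ("r1", "r2", "r5"),  # 4: mul
    ("r6", "r6", "r9"),  # 5: addu
]


def preg(n):
    """Format physical register n in the document's hex style (10 -> 'pA')."""
    return f"p{n:X}"


def rename(program, num_arch=9, num_phys=14):
    # Rename table r1..r9 -> p0..p8; the free list holds the remaining pregs in order.
    rt = {f"r{k + 1}": preg(k) for k in range(num_arch)}
    free_list = deque(preg(k) for k in range(num_arch, num_phys))
    iq, rob = [], []
    for dst, src1, src2 in program:
        # Read source mappings BEFORE remapping the destination (matters for inst 5).
        p1, p2 = rt[src1], rt[src2]
        p_new = free_list.popleft()
        rob.append((p_new, dst, rt[dst]))  # remember the displaced preg for commit
        rt[dst] = p_new
        iq.append(f"{p_new}/{p1}/{p2}")
    # In-order commit: each instruction frees the preg it displaced.
    for _, _, p_old in rob:
        free_list.append(p_old)
    rob_strs = [f"{p}/{r}/{o}" for p, r, o in rob]
    return iq, rob_strs, list(free_list), rt


iq, rob, free_list, rt = rename(PROGRAM)
print(iq)
print(rob)
print(free_list)
```

The resulting IQ entries, ROB entries, and final free list (p0, p3, p5, p9, pB) agree with Figures 5 and 6 once the pending bits are dropped.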

#### 2.d Register Renaming with Values in the IQ/ROB

| Cycle | D | I | W | C | r1  | r2 | r3 | r4  | r5 | r6  | r7 | r8 | r9 | IQ        | ROB 0  | ROB 1  | ROB 2  | ROB 3  | ROB 4  |
|-------|---|---|---|---|-----|----|----|-----|----|-----|----|----|----|-----------|--------|--------|--------|--------|--------|
| 0     |   |   |   |   |     |    |    |     |    |     |    |    |    |           |        |        |        |        |        |
| 1     | 1 |   |   |   |     |    |    |     |    |     |    |    |    |           |        |        |        |        |        |
| 2     | 2 | 1 |   |   | p0* |    |    |     |    |     |    |    |    | p0/r2/r3  | p0*/r1 |        |        |        |        |
| 3     | 3 |   |   |   |     |    |    | p1* |    |     |    |    |    | p1/p0*/r5 |        | p1*/r4 |        |        |        |
| 4     | 4 | 3 |   |   |     |    |    |     |    | p2* |    |    |    | p2/r7/r8  |        |        | p2*/r6 |        |        |
| 5     | 5 | 4 |   |   | p3* |    |    |     |    |     |    |    |    | p3/r2/r5  |        |        |        | p3*/r1 |        |
| 6     |   | 2 | 3 |   |     |    |    |     |    | p4* |    |    |    | p4/p2*/r9 |        |        |        |        | p4*/r6 |
| 7     |   | 5 | 1 |   |     |    |    |     |    |     |    |    |    |           |        |        | p2/r6  |        |        |
| 8     |   |   |   | 1 |     |    |    |     |    |     |    |    |    |           | p0/r1  |        |        |        |        |
| 9     |   |   | 5 |   |     |    |    |     |    |     |    |    |    |           |        |        |        |        |        |
| 10    |   |   | 4 |   |     |    |    |     |    | p4  |    |    |    |           |        |        |        |        | p4/r6  |
| 11    |   |   | 2 |   | p3  |    |    |     |    |     |    |    |    |           |        |        |        | p3/r1  |        |
| 12    |   |   |   | 2 |     |    |    |     |    |     |    |    |    |           |        | p1/r4  |        |        |        |
| 13    |   |   |   | 3 |     |    |    |     |    |     |    |    |    |           |        |        |        |        |        |
| 14    |   |   |   | 4 |     |    |    |     |    |     |    |    |    |           |        |        | •      |        |        |
| 15    |   |   |   | 5 |     |    |    |     |    |     |    |    |    |           |        |        |        | •      |        |
| 16    |   |   |   |   |     |    |    |     |    | •   |    |    |    |           |        |        |        |        | •      |

Figure 7: Microarchitectural State for Reg Renaming with Values in the IQ/ROB

# 3 In-Order vs. Out-of-Order Superscalar Processors

#### 3.a Performance of In-Order Dual-Issue Processor

| Cycle:           | 0 | 1 | 2 | 3  | 4  | 5  | 6  | 7  | 8  | 9 | 10 | 11 | 12 | 13 |
|------------------|---|---|---|----|----|----|----|----|----|---|----|----|----|----|
| lw r1, 0(r2)     | F | D | I | L0 | L1 | W  | C  |    |    |   |    |    |    |    |
| mul r3, r1, r4   | F | D | I | I  | I  | Y0 | Y1 | Y2 | Y3 | W | C  |    |    |    |
| sw r3, 0(r5)     |   | F | D | D  | D  | D  | D  | D  | I  | S | W  | C  |    |    |
| addiu r2, r2, 4  |   | F | D | D  | D  | D  | D  | D  | I  | A | W  | C  |    |    |
| addiu r5, r5, 4  |   |   | F | F  | F  | F  | F  | F  | D  | I | A  | W  | C  |    |
| addiu r6, r6, -1 |   |   | F | F  | F  | F  | F  | F  | D  | I | B  | W  | C  |    |
| bne r6, r0, loop |   |   |   |    |    |    |    |    | F  | D | I  | A  | W  | C  |
| opA              |   |   |   |    |    |    |    |    | F  | * | *  | *  | *  | *  |

Figure 8: Pipeline Diagram for In-Order Dual-Issue Processor

In steady state, each loop iteration takes 9 cycles. The lw instruction's W stage is counted because, once the loop is running, it adds one extra cycle of delay between the last commit of the previous iteration and the first commit of the current iteration.

Therefore, executing 64 iterations takes 9\*64 = 576 cycles.

#### 3.b Performance of Out-of-Order Dual-Issue Processor

| Cycle:            | 0 | 1 | 2 | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
|-------------------|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| lw r1, 0(r2)      | F | D | I | L0 | L1 | W  | C  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| mul r3, r1, r4    | F | D | i | i  | I  | Y0 | Y1 | Y2 | Y3 | W  | C  |    |    |    |    |    |    |    |    |    |    |    |
| sw r3, 0(r5)      |   | F | D | i  | i  | i  | i  | i  | I  | S  | W  | C  |    |    |    |    |    |    |    |    |    |    |
| addiu r2, r2, 4   |   | F | D | I  | A  | W  | r  | r  | r  | r  | r  | C  |    |    |    |    |    |    |    |    |    |    |
| addiu r5, r5, 4   |   |   | F | D  | i  | I  | A  | W  | r  | r  | r  | r  | r  | C  |    |    |    |    |    |    |    |    |
| addiu r6, r6, -1  |   |   | F | D  | i  | I  | B  | W  | r  | r  | r  | r  | r  | C  |    |    |    |    |    |    |    |    |
| bne r6, r0, loop  |   |   |   | F  | D  | i  | I  | A  | W  | r  | r  | r  | r  | r  | C  |    |    |    |    |    |    |    |
| opA               |   |   |   | F  | *  | *  | *  | *  | *  | *  | *  | *  | *  | *  |    |    |    |    |    |    |    |    |
| lw r1, 0(r2)      |   |   |   |    | F  | D  | I  | L0 | L1 | W  | r  | r  | r  | r  | C  |    |    |    |    |    |    |    |
| mul r3, r1, r4    |   |   |   |    | F  | D  | i  | i  | I  | Y0 | Y1 | Y2 | Y3 | W  | r  | C  |    |    |    |    |    |    |
| sw r3, 0(r5)      |   |   |   |    |    | F  | D  | i  | i  | i  | i  | i  | I  | S  | W  | C  |    |    |    |    |    |    |
| addiu r2, r2, 4   |   |   |   |    |    | F  | D  | I  | A  | W  | r  | r  | r  | r  | r  | r  | C  |    |    |    |    |    |
| addiu r5, r5, 4   |   |   |   |    |    |    | F  | D  | i  | I  | A  | W  | r  | r  | r  | r  | C  |    |    |    |    |    |
| addiu r6, r6, -1  |   |   |   |    |    |    | F  | D  | i  | I  | B  | W  | r  | r  | r  | r  | r  | C  |    |    |    |    |
| bne r6, r0, loop  |   |   |   |    |    |    |    | F  | D  | i  | I  | A  | W  | r  | r  | r  | r  | C  |    |    |    |    |
| opA               |   |   |   |    |    |    |    | F  | *  | *  | *  | *  | *  | *  | *  | *  | *  |    |    |    |    |    |
| lw r1, 0(r2)      |   |   |   |    |    |    |    |    | F  | D  | I  | L0 | L1 | W  | r  | r  | r  | r  | C  |    |    |    |
| mul r3, r1, r4    |   |   |   |    |    |    |    |    | F  | D  | i  | i  | I  | Y0 | Y1 | Y2 | Y3 | W  |    |    |    |    |
| sw r3, 0(r5)      |   |   |   |    |    |    |    |    |    | F  | D  | i  | i  | i  | i  | i  | I  | S  | W  | C  |    |    |
| addiu r2, r2, 4   |   |   |   |    |    |    |    |    |    | F  | D  | I  | A  | W  | r  | r  | r  | r  | r  | C  |    |    |
| addiu r5, r5, 4   |   |   |   |    |    |    |    |    |    |    | F  | D  | i  | I  | A  | W  | r  | r  | r  | r  | C  |    |
| addiu r6, r6, -1  |   |   |   |    |    |    |    |    |    |    | F  | D  | i  | I  | B  | W  | r  | r  | r  | r  | C  |    |
| bne r6, r0, loop  |   |   |   |    |    |    |    |    |    |    |    | F  | D  | i  | I  | A  | W  | r  | r  | r  | r  | C  |
| opA               |   |   |   |    |    |    |    |    |    |    |    | F  | *  | *  | *  | *  | *  | *  | *  | *  | *  |    |

Figure 9: Pipeline Diagram for Out-of-Order Dual-Issue Processor

In steady state, two loop iterations take 8 cycles (4 cycles per iteration). Therefore, executing 64 iterations takes 8\*32 = 256 cycles.
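Putting the two steady-state estimates side by side gives the out-of-order speedup for this loop. A minimal sketch that uses only the per-iteration rates stated in 3.a and 3.b and, like those estimates, ignores pipeline fill and drain (the constant names are illustrative):

```python
ITERATIONS = 64

# Steady-state rates from 3.a and 3.b; pipeline fill and drain are ignored.
IN_ORDER_CYCLES_PER_ITER = 9     # in-order dual-issue: 9 cycles per iteration
OOO_CYCLES_PER_TWO_ITERS = 8     # out-of-order dual-issue: 8 cycles per 2 iterations

in_order_cycles = IN_ORDER_CYCLES_PER_ITER * ITERATIONS
ooo_cycles = OOO_CYCLES_PER_TWO_ITERS * (ITERATIONS // 2)
speedup = in_order_cycles / ooo_cycles

print(in_order_cycles, ooo_cycles, speedup)  # 576 256 2.25
```

Under these assumptions the out-of-order core runs the loop 2.25 times faster.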

#### 3.c Dual-Issue In-Order versus Dual-Issue Out-of-Order

# 4 Branch Prediction

#### 4.a Two-Bit Saturating Counter Branch History Table

| i  | src[i] | B0 BHT | B0 P | B0 A | B1 BHT | B1 P | B1 A | B2 BHT | B2 P | B2 A |
|----|--------|--------|------|------|--------|------|------|--------|------|------|
| 0  | 0      | WT     | T    | T    | WT     | T    | T    | WT     | T    | T    |
| 1  | 0      | ST     | T    | T    | ST     | T    | T    | ST     | T    | T    |
| 2  | 12     | ST     | T    | NT   | ST     | T    | NT   | ST     | T    | T    |
| 3  | 15     | WT     | T    | NT   | WT     | T    | NT   | ST     | T    | T    |
| 4  | 0      | WNT    | NT   | T    | WNT    | NT   | T    | ST     | T    | T    |
| 5  | 0      | WT     | T    | T    | WT     | T    | T    | ST     | T    | T    |
| 6  | 11     | ST     | T    | NT   | ST     | T    | NT   | ST     | T    | T    |
| 7  | 17     | WT     | T    | NT   | WT     | T    | NT   | ST     | T    | T    |
| 8  | 0      | WNT    | NT   | T    | WNT    | NT   | T    | ST     | T    | T    |
| 9  | 0      | WT     | T    | T    | WT     | T    | T    | ST     | T    | T    |
| 10 | 11     | ST     | T    | NT   | ST     | T    | NT   | ST     | T    | T    |
| 11 | 13     | WT     | T    | NT   | WT     | T    | NT   | ST     | T    | T    |
| 12 | 9      | WNT    | NT   | T    | WNT    | NT   | NT   | ST     | T    | T    |
| 13 | 0      | WT     | T    | T    | SNT    | NT   | T    | ST     | T    | T    |
| 14 | 12     | ST     | T    | NT   | WNT    | NT   | NT   | ST     | T    | T    |
| 15 | 15     | WT     | T    | NT   | SNT    | NT   | NT   | ST     | T    | T    |
| 16 | 0      | WNT    | NT   | T    | SNT    | NT   | T    | ST     | T    | T    |
| 17 | 8      | WT     | T    | T    | WNT    | NT   | NT   | ST     | T    | T    |
| 18 | 12     | ST     | T    | NT   | SNT    | NT   | NT   | ST     | T    | T    |
| 19 | 18     | WT     | T    | NT   | SNT    | NT   | NT   | ST     | T    | NT   |

Figure 10: Two-Bit Saturating Counter BHT Execution
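The predictions in Figure 10 can be reproduced with a short simulation. This is a minimal sketch assuming, as the table does, that each branch has its own two-bit saturating counter initialized to WT; the outcome lists are the A columns of Figure 10, while the 0 to 3 state encoding and the `two_bit_correct` name are illustrative choices.

```python
# Actual outcomes (1 = taken) from the "A" columns of Figure 10.
B0 = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
B1 = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
B2 = [1] * 19 + [0]

SNT, WNT, WT, ST = 0, 1, 2, 3  # two-bit saturating counter states


def two_bit_correct(outcomes, init=WT):
    """Count correct predictions made by one two-bit saturating counter."""
    state, correct = init, 0
    for taken in outcomes:
        correct += (state >= WT) == bool(taken)  # predict taken in WT or ST
        state = min(state + 1, ST) if taken else max(state - 1, SNT)
    return correct


for name, seq in (("B0", B0), ("B1", B1), ("B2", B2)):
    print(f"{name}: {two_bit_correct(seq)}/{len(seq)} correct")
```

It counts 6, 10, and 19 correct predictions out of 20 for B0, B1, and B2, which are the 30%, 50%, and 95% listed in Figure 13.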

#### 4.b Two-Level Adaptive Branch Predictor to Exploit Temporal Correlation

| i  | src[i] | B0 BHSRT | B0 BHT | B0 P | B0 A | B1 BHSRT | B1 BHT | B1 P | B1 A | B2 BHSRT | B2 BHT | B2 P | B2 A |
|----|--------|----------|--------|------|------|----------|--------|------|------|----------|--------|------|------|
| 0  | 0      | 000      | WT     | T    | T    | 000      | WT     | T    | T    | 000      | WT     | T    | T    |
| 1  | 0      | 001      | WT     | T    | T    | 001      | WT     | T    | T    | 001      | WT     | T    | T    |
| 2  | 12     | 011      | WT     | T    | NT   | 011      | WT     | T    | NT   | 011      | WT     | T    | T    |
| 3  | 15     | 110      | WT     | T    | NT   | 110      | WT     | T    | NT   | 111      | WT     | T    | T    |
| 4  | 0      | 100      | WT     | T    | T    | 100      | WT     | T    | T    | 111      | ST     | T    | T    |
| 5  | 0      | 001      | ST     | T    | T    | 001      | ST     | T    | T    | 111      | ST     | T    | T    |
| 6  | 11     | 011      | WNT    | NT   | NT   | 011      | WNT    | NT   | NT   | 111      | ST     | T    | T    |
| 7  | 17     | 110      | WNT    | NT   | NT   | 110      | WNT    | NT   | NT   | 111      | ST     | T    | T    |
| 8  | 0      | 100      | ST     | T    | T    | 100      | ST     | T    | T    | 111      | ST     | T    | T    |
| 9  | 0      | 001      | ST     | T    | T    | 001      | ST     | T    | T    | 111      | ST     | T    | T    |
| 10 | 11     | 011      | SNT    | NT   | NT   | 011      | SNT    | NT   | NT   | 111      | ST     | T    | T    |
| 11 | 13     | 110      | SNT    | NT   | NT   | 110      | SNT    | NT   | NT   | 111      | ST     | T    | T    |
| 12 | 9      | 100      | ST     | T    | T    | 100      | ST     | T    | NT   | 111      | ST     | T    | T    |
| 13 | 0      | 001      | ST     | T    | T    | 000      | ST     | T    | T    | 111      | ST     | T    | T    |
| 14 | 12     | 011      | SNT    | NT   | NT   | 001      | ST     | T    | NT   | 111      | ST     | T    | T    |
| 15 | 15     | 110      | SNT    | NT   | NT   | 010      | WT     | T    | NT   | 111      | ST     | T    | T    |
| 16 | 0      | 100      | ST     | T    | T    | 100      | WT     | T    | T    | 111      | ST     | T    | T    |
| 17 | 8      | 001      | ST     | T    | T    | 001      | WT     | T    | NT   | 111      | ST     | T    | T    |
| 18 | 12     | 011      | SNT    | NT   | NT   | 010      | WNT    | NT   | NT   | 111      | ST     | T    | T    |
| 19 | 18     | 110      | SNT    | NT   | NT   | 100      | ST     | T    | NT   | 111      | ST     | T    | NT   |

Figure 11: Two-Level BHT for Temporal Correlation Execution
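A similar sketch reproduces the temporal-correlation predictor. It assumes what Figure 11 appears to model: each branch has its own 3-bit history register (starting at 000, with the newest outcome shifted into the least significant bit) that indexes that branch's own eight-entry table of two-bit counters, all starting at WT. The function name `local_two_level_correct` is hypothetical.

```python
# Actual outcomes (1 = taken) from the "A" columns of Figure 11.
B0 = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
B1 = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
B2 = [1] * 19 + [0]

SNT, WNT, WT, ST = 0, 1, 2, 3  # two-bit saturating counter states


def local_two_level_correct(outcomes, hist_bits=3):
    """Per-branch history register indexing that branch's own table of 2-bit counters."""
    mask = (1 << hist_bits) - 1
    history = 0                       # history register starts at 000
    table = [WT] * (1 << hist_bits)   # every counter starts at WT
    correct = 0
    for taken in outcomes:
        state = table[history]
        correct += (state >= WT) == bool(taken)
        table[history] = min(state + 1, ST) if taken else max(state - 1, SNT)
        history = ((history << 1) | taken) & mask  # shift newest outcome into the LSB
    return correct


for name, seq in (("B0", B0), ("B1", B1), ("B2", B2)):
    print(f"{name}: {local_two_level_correct(seq)}/{len(seq)} correct")
```

It yields 18, 13, and 19 correct predictions out of 20, consistent with the 90%, 65%, and 95% reported in Figure 13.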

#### 4.c Two-Level Adaptive Branch Predictor to Exploit Spatial Correlation

| i  | src[i] | B0 BHSR | B0 BHT | B0 P | B0 A | B1 BHSR | B1 BHT | B1 P | B1 A | B2 BHSR | B2 BHT | B2 P | B2 A |
|----|--------|---------|--------|------|------|---------|--------|------|------|---------|--------|------|------|
| 0  | 0      | 0       | WT     | T    | T    | 1       | WT     | T    | T    | 1       | WT     | T    | T    |
| 1  | 0      | 1       | WT     | T    | T    | 1       | ST     | T    | T    | 1       | ST     | T    | T    |
| 2  | 12     | 1       | ST     | T    | NT   | 0       | WT     | T    | NT   | 0       | WT     | T    | T    |
| 3  | 15     | 1       | WT     | T    | NT   | 0       | WNT    | NT   | NT   | 0       | ST     | T    | T    |
| 4  | 0      | 1       | WNT    | NT   | T    | 1       | ST     | T    | T    | 1       | ST     | T    | T    |
| 5  | 0      | 1       | WT     | T    | T    | 1       | ST     | T    | T    | 1       | ST     | T    | T    |
| 6  | 11     | 1       | ST     | T    | NT   | 0       | SNT    | NT   | NT   | 0       | ST     | T    | T    |
| 7  | 17     | 1       | WT     | T    | NT   | 0       | SNT    | NT   | NT   | 0       | ST     | T    | T    |
| 8  | 0      | 1       | WNT    | NT   | T    | 1       | ST     | T    | T    | 1       | ST     | T    | T    |
| 9  | 0      | 1       | WT     | T    | T    | 1       | ST     | T    | T    | 1       | ST     | T    | T    |
| 10 | 11     | 1       | ST     | T    | NT   | 0       | SNT    | NT   | NT   | 0       | ST     | T    | T    |
| 11 | 13     | 1       | WT     | T    | NT   | 0       | SNT    | NT   | NT   | 0       | ST     | T    | T    |
| 12 | 9      | 1       | WNT    | NT   | T    | 1       | ST     | T    | NT   | 0       | ST     | T    | T    |
| 13 | 0      | 1       | WT     | T    | T    | 1       | WT     | T    | T    | 1       | ST     | T    | T    |
| 14 | 12     | 1       | ST     | T    | NT   | 0       | SNT    | NT   | NT   | 0       | ST     | T    | T    |
| 15 | 15     | 1       | WT     | T    | NT   | 0       | SNT    | NT   | NT   | 0       | ST     | T    | T    |
| 16 | 0      | 1       | WNT    | NT   | T    | 1       | ST     | T    | T    | 1       | ST     | T    | T    |
| 17 | 8      | 1       | WT     | T    | T    | 1       | ST     | T    | NT   | 0       | ST     | T    | T    |
| 18 | 12     | 1       | ST     | T    | NT   | 0       | SNT    | NT   | NT   | 0       | ST     | T    | T    |
| 19 | 18     | 1       | WT     | T    | NT   | 0       | SNT    | NT   | NT   | 0       | ST     | T    | NT   |

Figure 12: Two-Level BHT for Spatial Correlation Execution
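For the spatial-correlation predictor, the sketch below assumes what Figure 12 appears to model: a single 1-bit global history register holding the most recent outcome of any branch (initially 0), with B0, B1, and B2 executed in that order each iteration, and each branch owning two two-bit counters (both starting at WT) selected by that history bit. The function name `global_one_bit_correct` is hypothetical.

```python
# Actual outcomes (1 = taken) from the "A" columns of Figure 12.
B0 = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
B1 = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
B2 = [1] * 19 + [0]

SNT, WNT, WT, ST = 0, 1, 2, 3  # two-bit saturating counter states


def global_one_bit_correct(branches, init_history=0):
    """One global history bit selects one of two 2-bit counters private to each branch."""
    history = init_history
    tables = [[WT, WT] for _ in branches]
    correct = [0] * len(branches)
    for i in range(len(branches[0])):
        # Branches execute in program order (B0, B1, B2) within each iteration.
        for b, outcomes in enumerate(branches):
            taken = outcomes[i]
            state = tables[b][history]
            correct[b] += (state >= WT) == bool(taken)
            tables[b][history] = min(state + 1, ST) if taken else max(state - 1, SNT)
            history = taken  # the next branch sees this outcome
    return correct


print(global_one_bit_correct([B0, B1, B2]))
```

It returns 6, 17, and 19 correct predictions, matching the 30%, 85%, and 95% in Figure 13. B1 benefits most because, in every iteration where B0 is not taken, B1 is also not taken.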

#### 4.d Branch Predictor Comparison

| Branch       | Two-Bit FSM Accuracy | Two-Level Temporal Accuracy | Two-Level Spatial Accuracy |
|--------------|----------------------|-----------------------------|----------------------------|
| Branch B0    | 30%                  | 90%                         | 30%                        |
| Branch B1    | 50%                  | 65%                         | 85%                        |
| Branch B2    | 95%                  | 95%                         | 95%                        |
| All Branches | 58%                  | 83%                         | 70%                        |

Figure 13: Summary of Branch Predictor Accuracies
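The All Branches row weights every dynamic branch equally: each of the three branches executes 20 times, so the overall accuracy is the total number of correct predictions divided by 60. A minimal sketch, with the per-branch correct counts back-computed from the percentages above (for example, 30% of 20 is 6); the names are illustrative:

```python
# Correct predictions out of 20 for (B0, B1, B2), from the percentages in Figure 13.
CORRECT = {
    "Two-Bit FSM": (6, 10, 19),
    "Two-Level Temporal": (18, 13, 19),
    "Two-Level Spatial": (6, 17, 19),
}
RUNS_PER_BRANCH = 20


def overall_accuracy(counts, runs=RUNS_PER_BRANCH):
    """Fraction of all dynamic branch predictions that were correct."""
    return sum(counts) / (runs * len(counts))


for name, counts in CORRECT.items():
    print(f"{name}: {overall_accuracy(counts):.0%}")
```

This prints 58%, 83%, and 70%, the values in the All Branches row.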