#### HARDWARE SPECULATION

Mahdi Nazm Bojnordi

**Assistant Professor** 

School of Computing

University of Utah



#### Overview

- Announcement
  - Mid-term exam: Mar. 5<sup>th</sup>
  - No homework till after Mar. 14<sup>th</sup>
- This lecture
  - Out-of-order pipeline
    - Issue queue
    - Register renaming
    - Branch recovery
    - Speculated execution

#### Recall: Out-of-Order Execution

- Producer-consumer chains on the fly
  - Register renaming: remove anti-/output-dependences via register tags
  - Limited by the number of instructions in the instruction window (ROB)
- Out-of-order issue (dispatch)
  - Broadcast tags to waiting instructions
  - Wake up ready instructions and select among them
- Out-of-order execute/complete
- In-order fetch/decode and commit

## Out-of-Order Pipelines

Distributed reservation stations



# Out-of-Order Pipelines

Out of order issue/dispatch to functional units



#### Out-of-Order Issue Queue



### Register Renaming

Register aliasing table for fast lookup



## Register Renaming

Free register list for fast register renaming
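The interaction between the register aliasing table (RAT) and the free register list can be sketched in a few lines. This is a minimal illustration, not the exact hardware from the slides: the function name `rename`, the register counts, and the initial mapping (Ri to Pi) are assumptions made for the example.

```python
from collections import deque

def rename(instructions, num_arch_regs=4, num_phys_regs=8):
    """Rename (dest, src1, src2) architectural register triples.

    Assumption: architectural register Ri initially maps to physical Pi,
    and the remaining physical registers start on the free list.
    """
    rat = {f"R{i}": f"P{i}" for i in range(num_arch_regs)}
    free_list = deque(f"P{i}" for i in range(num_arch_regs, num_phys_regs))
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the current mapping (one RAT lookup each).
        p_src1, p_src2 = rat[src1], rat[src2]
        if not free_list:
            raise RuntimeError("no free physical register: rename must stall")
        # The destination gets a fresh register, removing anti-/output-dependences.
        p_dest = free_list.popleft()
        rat[dest] = p_dest
        renamed.append((p_dest, p_src1, p_src2))
    return renamed, rat

program = [
    ("R1", "R2", "R3"),  # R1 <- R2 op R3
    ("R2", "R1", "R3"),  # true dependence on R1, anti-dependence on R2
    ("R1", "R2", "R2"),  # output dependence on R1
]
renamed, rat = rename(program)
```

After renaming, only the true dependences remain: the second instruction reads the physical register written by the first, while the two writes to R1 land in different physical registers.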



## **Example: Instruction Commit**

- Update value in ROB



## **Example: Instruction Commit**

- Register file write



## **Example: Instruction Commit**

- Update tables and ROB



- Allocate entries on ROB, IQ, and FL






- Instruction has to wait for free resources



- Issue ready instruction if free FU exists



- Out-of-order issue is now possible






- Wakeup and select



Keep the program order to avoid starvation
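Wakeup and select with an oldest-first policy can be sketched as follows. This is an illustrative model only: the entry layout (`name`, `age`, `waiting`) and the tag name `T0` are assumptions, and real hardware performs both steps with parallel comparators rather than loops.

```python
def wakeup(issue_queue, completed_tag):
    """Broadcast a completed tag; matching source operands become ready."""
    for entry in issue_queue:
        entry["waiting"].discard(completed_tag)

def select(issue_queue, free_units):
    """Issue up to free_units ready entries, oldest (lowest age) first."""
    ready = sorted((e for e in issue_queue if not e["waiting"]),
                   key=lambda e: e["age"])
    chosen = ready[:free_units]
    for e in chosen:
        issue_queue.remove(e)
    return [e["name"] for e in chosen]

iq = [
    {"name": "I1", "age": 1, "waiting": {"T0"}},
    {"name": "I2", "age": 2, "waiting": set()},
    {"name": "I3", "age": 3, "waiting": {"T0"}},
    {"name": "I4", "age": 4, "waiting": set()},
]
first = select(iq, free_units=1)   # I2 issues ahead of the older, still-waiting I1
wakeup(iq, "T0")                   # tag broadcast wakes I1 and I3
second = select(iq, free_units=2)  # the oldest ready entries are picked first
```

Sorting by age means a ready instruction is never passed over indefinitely by younger ones, which is the starvation argument for keeping program order in the selection logic.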



## **Example: Instruction Execute**

- Issue ready instructions



# **Example: Instruction Execute**

- Update ROB



# Register Renaming Example

- Where are values stored?

#### Issue Queue









#### **Decode Queue**



#### Reorder Buffer

| ROB 1 | T0 | R1 |
|-------|----|----|
| ROB 2 | T1 | R2 |
| ROB 3 | T2 |    |
| ROB 4 | T3 | R3 |
| ROB 5 | T4 | R1 |

# Branch Recovery



#### Revisit Branch Prediction

- Problem: find the average number of stall cycles per instruction caused by branches in a pipeline where the branch misprediction penalty is 20 cycles, the branch predictor accuracy is 90%, and the branch target buffer hit rate is 80%. Every fifth instruction is a branch; 30% of branches are actually taken.


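- A taken branch is handled correctly only if the direction prediction is right and the BTB hits; a not-taken branch needs only the right direction.
- Average misses $= 1 - (0.3 \times 0.9 \times 0.8 + 0.7 \times 0.9) = 0.154$
- Average stalls $= 20 \times 0.2 \times 0.154 \approx 0.6$ cycles per instruction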
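The same calculation can be written as a small function to try other parameter values. This is a sketch that evaluates the expression from the solution; the function and parameter names are made up for illustration.

```python
def branch_stalls_per_instruction(penalty, accuracy, btb_hit_rate,
                                  branch_fraction, taken_fraction):
    # A taken branch is handled correctly only if the direction is right
    # and the BTB supplies the target; a not-taken branch needs only the direction.
    correct = (taken_fraction * accuracy * btb_hit_rate
               + (1 - taken_fraction) * accuracy)
    miss_rate = 1 - correct
    # Stall cycles averaged over all instructions, not just branches.
    return miss_rate, penalty * branch_fraction * miss_rate

# Values from the problem: 20-cycle penalty, 90% accuracy, 80% BTB hit rate,
# one branch every five instructions, 30% of branches taken.
miss_rate, stalls = branch_stalls_per_instruction(20, 0.9, 0.8, 0.2, 0.3)
```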

## Speculated Execution

- Problem: branches may significantly limit performance
  - A branch may be the consumer of a load or another long-latency instruction
- Solution: speculative instruction execution
  - Fetch and decode instructions speculatively
  - Issue and execute speculative instructions
  - Branch resolution
    - Nullify the impact of speculative instructions if mispredicted
    - Commit speculative instructions (writes to register file/memory) only if prediction was correct

# Branch Recovery

Squash all mispredicted entries
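One common way to squash is to remove every ROB entry younger than the mispredicted branch and walk those entries from youngest to oldest, undoing their renames. The sketch below assumes each ROB entry records the mapping it installed (`new_preg`) and the one it displaced (`old_preg`); these field names and the sample instructions are hypothetical, and the slides may use a different recovery scheme.

```python
def squash_after_branch(rob, rat, free_list, branch_index):
    """Squash ROB entries younger than the branch and undo their renames.

    rob is ordered oldest to youngest. Each entry is a dict with 'dest'
    (architectural register or None), 'new_preg' (mapping installed at
    rename) and 'old_preg' (mapping that was displaced).
    """
    squashed = rob[branch_index + 1:]
    del rob[branch_index + 1:]
    # Walk youngest to oldest so the oldest displaced mapping is restored last.
    for entry in reversed(squashed):
        if entry["dest"] is not None:
            rat[entry["dest"]] = entry["old_preg"]
            free_list.append(entry["new_preg"])
    return squashed

rob = [
    {"op": "add", "dest": "R1", "new_preg": "P4", "old_preg": "P1"},
    {"op": "beq", "dest": None, "new_preg": None, "old_preg": None},
    {"op": "sub", "dest": "R1", "new_preg": "P5", "old_preg": "P4"},
    {"op": "mul", "dest": "R2", "new_preg": "P6", "old_preg": "P2"},
]
rat = {"R1": "P5", "R2": "P6", "R3": "P3"}
free_list = ["P7"]
squashed = squash_after_branch(rob, rat, free_list, branch_index=1)
```

After the call, the ROB holds only the `add` and the branch, R1 maps back to the register written by `add`, and the registers allocated to the squashed instructions are back on the free list.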



# Physical Register File

Avoid copying register values multiple times



Note 1: only a subset of the physical register file is committed at any time.

Note 2: there is no need to store values in the ROB or IQ.

#### Double RAT Architecture

Figure (not reproduced): front RAT, retire RAT, decode queue, issue queue, functional units, reorder buffer, and physical register file (P0 to P5).

# Physical Register Release

- Example: when is it safe to free p30 (R1)?
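A standard answer is that p30 can be freed when a later instruction that also writes R1 commits: because commit is in order, every older instruction that might read p30 has already committed, and younger ones read the new mapping. The sketch below models this with a retire RAT; the replacement register `p31` and the function name are assumptions for illustration.

```python
def commit(entry, retire_rat, free_list):
    """Commit one instruction and free the register its destination displaces."""
    dest, new_preg = entry
    old_preg = retire_rat[dest]   # previous committed mapping of dest
    retire_rat[dest] = new_preg   # new_preg now holds the architectural value
    free_list.append(old_preg)    # in-order commit: no older reader remains
    return old_preg

retire_rat = {"R1": "p30"}
free_list = []
# A later writer of R1, renamed to the hypothetical register p31, commits.
freed = commit(("R1", "p31"), retire_rat, free_list)
```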



