# Instruction-Level Parallelism

Presented by: Han Trung Dinh

## References and Evaluation

#### References:

- Kai H., Naresh J., Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 2008
- John L. H., David A. P., Computer Architecture: A Quantitative Approach, 6<sup>th</sup> edition, Elsevier, 2019
- Ata E., Computer Systems: Digital Design, Fundamentals of Computer Architecture and Assembly Language, Springer, 2018
- Igor Z., Low-Level Programming: C, Assembly, and Program Execution on Intel® 64 Architecture, Apress, 2017
- Larry D. P., Modern Assembly Language Programming with the ARM Processor, Elsevier, 2016
- Barry B. B., The Intel Microprocessors: Architecture, Programming, and Interfacing, 8<sup>th</sup> edition, Prentice Hall, 2008
- Scoreboard & Tomasulo slides from http://www.csie.nuk.edu.tw/~wuch/course/eef011/

#### (\*) Note that this course uses a lot of sources from the Internet

#### Evaluation:

- Projects/paper presentation/writing papers: 50%
- Final exam: 50%

#### Prerequisites:

- Students should have fundamental knowledge of Computer Architecture

# What is pipelining?

### Pipelining:

- is an implementation technique whereby multiple instructions are overlapped in execution
- takes advantage of parallelism that exists among the actions needed to execute an instruction

# Pipeline Stages

- 1. Instruction fetch cycle (IF)
- 2. Instruction decode/register fetch cycle (ID)
- 3. Execution/effective address cycle (EX)
- 4. Memory access (MEM)
- 5. Write-back cycle (WB)

Pipeline stage occupied by each instruction in each clock cycle:

| Instr. No. | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 | Cycle 6 | Cycle 7 |
|------------|---------|---------|---------|---------|---------|---------|---------|
| 1          | IF      | ID      | EX      | MEM     | WB      |         |         |
| 2          |         | IF      | ID      | EX      | MEM     | WB      |         |
| 3          |         |         | IF      | ID      | EX      | MEM     | WB      |
| 4          |         |         |         | IF      | ID      | EX      | MEM     |
| 5          |         |         |         |         | IF      | ID      | EX      |

# Pipeline Stages (cont.)



# The Major Hurdle of Pipelining—Pipeline Hazards

- 1. Structural hazards: arise from resource conflicts when the hardware cannot support all possible combinations of instructions simultaneously in overlapped execution.
- 2. Data hazards: arise when an instruction depends on the results of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
- 3. Control hazards: arise from the pipelining of branches and other instructions that change the PC.

# Performance of Pipelines With Stalls

$$\text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{Pipeline stall cycles per instruction}}$$

One important simple case is where all instructions take the same number of cycles, which must then equal the number of pipeline stages (the pipeline depth):

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}$$
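As a quick numeric check of the simple-case formula, the following is a minimal sketch. The function name `pipeline_speedup` and the sample stall rate of 0.25 cycles per instruction are illustrative assumptions, not values from the slides.

```python
def pipeline_speedup(pipeline_depth: int, stall_cycles_per_instr: float) -> float:
    """Speedup over the unpipelined machine.

    Assumes an ideal pipelined CPI of 1 and that every instruction takes
    `pipeline_depth` cycles when unpipelined (the simple case above).
    """
    if pipeline_depth < 1 or stall_cycles_per_instr < 0:
        raise ValueError("depth must be >= 1 and stall cycles must be >= 0")
    return pipeline_depth / (1.0 + stall_cycles_per_instr)


# A 5-stage pipeline with no stalls reaches the ideal speedup of 5.
print(pipeline_speedup(5, 0.0))   # 5.0
# Hypothetical stall rate: 0.25 stall cycles per instruction.
print(pipeline_speedup(5, 0.25))  # 4.0
```

With no stalls the speedup equals the pipeline depth; every stall cycle per instruction enlarges the denominator and lowers the speedup.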

### Data Hazards

Assume instruction i occurs in program order before instruction j and both instructions use register x. Then three different types of hazards can occur between i and j:

- 1. Read After Write (RAW) hazard: this hazard occurs when a read of register x by instruction j occurs before the write of register x by instruction i. If this hazard were not prevented, instruction j would use the wrong value of x.
- 2. Write After Read (WAR) hazard: this hazard occurs when a read of register x by instruction i occurs after a write of register x by instruction j. In this case, instruction i would use the wrong value of x.
- 3. Write After Write (WAW) hazard: this hazard occurs when a write of register x by instruction i occurs after a write of register x by instruction j. When this occurs, register x will have the wrong value going forward.
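The three cases can be sketched as a small check on the register names of two instructions. This is only an illustration: it finds the register dependences that could become hazards, and whether a hazard actually occurs depends on the pipeline timing. The `Instr` type and the function name `data_hazards` are assumptions made for this example.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    dest: Optional[str]      # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]    # registers read


def data_hazards(i: Instr, j: Instr) -> Set[str]:
    """Potential hazards between i (earlier) and j (later in program order)."""
    hazards = set()
    if i.dest is not None and i.dest in j.srcs:
        hazards.add("RAW")   # j reads what i writes
    if j.dest is not None and j.dest in i.srcs:
        hazards.add("WAR")   # j writes what i reads
    if i.dest is not None and i.dest == j.dest:
        hazards.add("WAW")   # both write the same register
    return hazards


# ld x1,0(x2) followed by sub x4,x1,x5: sub reads x1 written by ld.
print(data_hazards(Instr("x1", ("x2",)), Instr("x4", ("x1", "x5"))))  # {'RAW'}
```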

# An Example of Data Hazards



# Minimizing Data Hazard Stalls by Forwarding



# Forwarding of operand required by stores during MEM



# Cannot forward the result to negative time (LD -> Sub)



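# An Example of Why a Stall Is Needed

Without a stall (`sub` would need x1 in its EX stage before the load has finished MEM):

| Instruction  | 1  | 2  | 3  | 4   | 5   | 6   | 7   | 8  | 9 |
|--------------|----|----|----|-----|-----|-----|-----|----|---|
| ld x1,0(x2)  | IF | ID | EX | MEM | WB  |     |     |    |   |
| sub x4,x1,x5 |    | IF | ID | EX  | MEM | WB  |     |    |   |
| and x6,x1,x7 |    |    | IF | ID  | EX  | MEM | WB  |    |   |
| or x8,x1,x9  |    |    |    | IF  | ID  | EX  | MEM | WB |   |

With a one-cycle stall:

| Instruction  | 1  | 2  | 3  | 4     | 5   | 6   | 7   | 8   | 9  |
|--------------|----|----|----|-------|-----|-----|-----|-----|----|
| ld x1,0(x2)  | IF | ID | EX | MEM   | WB  |     |     |     |    |
| sub x4,x1,x5 |    | IF | ID | Stall | EX  | MEM | WB  |     |    |
| and x6,x1,x7 |    |    | IF | Stall | ID  | EX  | MEM | WB  |    |
| or x8,x1,x9  |    |    |    | Stall | IF  | ID  | EX  | MEM | WB |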
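The stall in this example can be counted with a short sketch. It assumes a 5-stage pipeline with full forwarding, so the only remaining stall is the load-use case: an instruction that immediately follows a load and reads the load's destination. The tuple encoding of instructions and both function names are illustrative assumptions.

```python
PIPELINE_DEPTH = 5  # IF, ID, EX, MEM, WB


def load_use_stalls(program):
    """Count one-cycle load-use stalls in a list of (op, dest, srcs) tuples."""
    stalls = 0
    for (prev_op, prev_dest, _), (_, _, cur_srcs) in zip(program, program[1:]):
        if prev_op == "ld" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


def total_cycles(program):
    """Cycles until the last instruction leaves WB, including stalls."""
    return PIPELINE_DEPTH + len(program) - 1 + load_use_stalls(program)


program = [
    ("ld", "x1", ("x2",)),
    ("sub", "x4", ("x1", "x5")),
    ("and", "x6", ("x1", "x7")),
    ("or", "x8", ("x1", "x9")),
]
print(load_use_stalls(program), total_cycles(program))  # 1 9
```

The result agrees with the stalled schedule, in which `or` reaches WB in clock cycle 9.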

# Control Hazards (Branch Hazards)

Untaken branch:

| Instruction                | 1  | 2  | 3  | 4   | 5   | 6   | 7   | 8   | 9  |
|----------------------------|----|----|----|-----|-----|-----|-----|-----|----|
| Untaken branch instruction | IF | ID | EX | MEM | WB  |     |     |     |    |
| Instruction i+1            |    | IF | ID | EX  | MEM | WB  |     |     |    |
| Instruction i+2            |    |    | IF | ID  | EX  | MEM | WB  |     |    |
| Instruction i+3            |    |    |    | IF  | ID  | EX  | MEM | WB  |    |
| Instruction i+4            |    |    |    |     | IF  | ID  | EX  | MEM | WB |

Taken branch:

| Instruction              | 1  | 2  | 3    | 4    | 5    | 6    | 7   | 8   | 9  |
|--------------------------|----|----|------|------|------|------|-----|-----|----|
| Taken branch instruction | IF | ID | EX   | MEM  | WB   |      |     |     |    |
| Instruction i+1          |    | IF | idle | idle | idle | idle |     |     |    |
| Branch target            |    |    | IF   | ID   | EX   | MEM  | WB  |     |    |
| Branch target+1          |    |    |      | IF   | ID   | EX   | MEM | WB  |    |
| Branch target+2          |    |    |      |      | IF   | ID   | EX  | MEM | WB |
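In the taken-branch case the fetch of instruction i+1 is wasted, so each taken branch costs one cycle. A minimal sketch of the effect on CPI follows. The branch frequency and taken fraction are hypothetical workload numbers, not data from the slides, and the function name is an assumption.

```python
def cpi_with_branches(branch_freq: float, taken_fraction: float,
                      taken_penalty: int = 1) -> float:
    """Ideal CPI of 1 plus the branch stall cycles per instruction.

    Assumes untaken branches cost nothing and each taken branch costs
    `taken_penalty` cycles, as in the two cases shown above.
    """
    return 1.0 + branch_freq * taken_fraction * taken_penalty


# Hypothetical workload: 20% branches, 60% of them taken.
cpi = cpi_with_branches(0.2, 0.6)
print(round(cpi, 2))
```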

# Example: Eliminating WAR and WAW by Register Renaming

### Original

```
DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8
```

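There is a WAR hazard on F8 between ADD.D and SUB.D, and a WAW hazard on F6 between ADD.D and MUL.D. These hazards can arise because DIV.D takes many cycles to produce F0, so ADD.D may still be waiting while the later instructions are ready to execute.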

### Register renaming

```
DIV.D F0, F2, F4
ADD.D S, F0, F8
S.D S, 0(R1)
SUB.D T, F10, F14
MUL.D F6, F10, T
```
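The renaming idea can be sketched in a few lines. Unlike the hand-renamed version above, which renames only F6 and F8 (to S and T), this sketch gives every destination a fresh name, which is one simple way to guarantee that no WAR or WAW dependence remains. The names `P0`, `P1`, ... and the tuple encoding are assumptions made for the example.

```python
from itertools import count


def rename_registers(program):
    """Rename every destination in a list of (op, dest, srcs) tuples.

    dest is None for instructions that write no register (e.g. S.D).
    """
    fresh = (f"P{n}" for n in count())
    mapping = {}              # architectural register -> latest new name
    renamed = []
    for op, dest, srcs in program:
        # Sources read the most recent name of each register.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D", None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]
for instr in rename_registers(program):
    print(instr)
```

After renaming, ADD.D and MUL.D write different registers and SUB.D no longer writes a register that ADD.D reads, while the true (RAW) dependences through F0, F6, and F8 are preserved.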

# Dynamic Scheduling with a Scoreboard

- Out-of-order completion => WAR, WAW hazards?
- Solutions for WAR
  - Queue both the operation and copies of its operands
  - Read registers only during Read Operands stage
- For WAW, must detect hazard: stall until other completes
- Need to have multiple instructions in execution phase
   => multiple execution units or pipelined execution units
- Scoreboard keeps track of dependencies and the state of operations
- Scoreboard replaces ID, EX, WB with 4 stages

### Four Stages of Scoreboard Control

- 1. Issue: decode instructions & check for structural hazards (ID1)
  - If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure.
  - If a structural or WAW hazard exists, the instruction issue stalls, and no further instructions will issue until these hazards are cleared.
- 2. Read operands: wait until no data hazards, then read operands (ID2)
  - A source operand is available if no earlier issued active instruction is going to write it (RAW), or if the register containing the operand is being written by a currently active functional unit.
  - When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution.
  - The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.
- 3. Execution: operate on operands (EX)
  - The functional unit begins execution upon receiving operands.
  - When the result is ready, it notifies the scoreboard that it has completed execution.
- 4. Write result: finish execution (WB)
  - Once the scoreboard is aware that the functional unit has completed execution, the scoreboard checks for WAR hazards.
  - If there is none, it writes the result.
  - If there is a WAR hazard, it stalls the instruction.

#### Example:

- `DIVD F0,F2,F4`
- `ADDD F10,F0,F8`
- `SUBD F8,F8,F14`

The scoreboard would stall the write of SUBD until ADDD has read its operands (WAR hazard on F8).

### Three Parts of the Scoreboard

- 1. Instruction status: indicates which of the 4 steps the instruction is in.
- 2. Functional unit status: indicates the state of the functional unit (FU). There are 9 fields for each functional unit:
  - Busy: indicates whether the unit is busy or not
  - Op: operation to perform in the unit (e.g., + or -)
  - Fi: destination register
  - Fj, Fk: source-register numbers
  - Qj, Qk: functional units producing source registers Fj, Fk
  - Rj, Rk: flags indicating when Fj, Fk are ready and not yet read; set to No after operands are read
- 3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.

# Detailed Scoreboard Pipeline Control

| Instruction status | Wait until | Bookkeeping |
|--------------------|------------|-------------|
| Issue              | Not busy(FU) and not Result('D') (structural, WAW) | Busy(FU)← Yes; Op(FU)← op;<br>Fi(FU)← 'D';<br>Fj(FU)← 'S1'; Fk(FU)← 'S2';<br>Qj← Result('S1'); Qk← Result('S2');<br>Rj← not Qj; Rk← not Qk;<br>Result('D')← FU; |
| Read operands      | Rj and Rk (RAW) | Rj← No; Rk← No |
| Execution complete | Functional unit done | |
| Write result       | ∀f((Fj(f)≠Fi(FU) or Rj(f)=No) & (Fk(f)≠Fi(FU) or Rk(f)=No)) (WAR) | ∀f(if Qj(f)=FU then Rj(f)← Yes);<br>∀f(if Qk(f)=FU then Rk(f)← Yes);<br>Result(Fi(FU))← 0; Busy(FU)← No |
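The wait conditions and bookkeeping can be sketched as plain functions over the functional unit status and register result status. This is a simplified illustration, not a cycle-accurate simulator: there is no clock and no execution latency, and clearing Qj/Qk when the producing unit writes is an added assumption that the table does not spell out. The class name `FU` and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FU:
    """Functional unit status: Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj, Rk."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None
    fj: Optional[str] = None
    fk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    rj: bool = False
    rk: bool = False


def can_issue(units, result, fu, dest):
    # Structural hazard: unit busy. WAW hazard: dest has a pending writer.
    return not units[fu].busy and result.get(dest) is None


def issue(units, result, fu, op, dest, s1, s2):
    u = units[fu]
    u.busy, u.op, u.fi, u.fj, u.fk = True, op, dest, s1, s2
    u.qj, u.qk = result.get(s1), result.get(s2)
    u.rj, u.rk = u.qj is None, u.qk is None
    result[dest] = fu


def can_read_operands(units, fu):
    return units[fu].rj and units[fu].rk      # RAW check


def read_operands(units, fu):
    units[fu].rj = units[fu].rk = False


def can_write_result(units, fu):
    fi = units[fu].fi                         # WAR check over all units
    return all((f.fj != fi or not f.rj) and (f.fk != fi or not f.rk)
               for f in units.values())


def write_result(units, result, fu):
    for f in units.values():
        if f.qj == fu:
            f.qj, f.rj = None, True           # assumption: clear Qj here
        if f.qk == fu:
            f.qk, f.rk = None, True           # assumption: clear Qk here
    result[units[fu].fi] = None
    units[fu] = FU()                          # unit is free again


def demo():
    units = {"Mult1": FU(), "Add": FU(), "Divide": FU()}
    result = {}
    issue(units, result, "Mult1", "MULT", "F0", "F2", "F4")
    issue(units, result, "Divide", "DIVD", "F10", "F0", "F6")
    issue(units, result, "Add", "ADDD", "F6", "F8", "F2")
    read_operands(units, "Add")
    seen = {
        "divd_can_read": can_read_operands(units, "Divide"),  # RAW on F0
        "addd_can_write": can_write_result(units, "Add"),     # WAR on F6
        "add_unit_free": can_issue(units, result, "Add", "F8"),
    }
    read_operands(units, "Mult1")
    write_result(units, result, "Mult1")
    read_operands(units, "Divide")
    seen["addd_can_write_later"] = can_write_result(units, "Add")
    return seen


print(demo())
```

In `demo`, DIVD cannot read its operands while MULT is still producing F0, and ADDD cannot write F6 until DIVD has read it; once MULT writes and DIVD reads, the WAR condition clears.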

### Scoreboard Example

This is the sample code we'll be working with in the example:

| Instruction | Operands    |
|-------------|-------------|
| LD          | F6, 34(R2)  |
| LD          | F2, 45(R3)  |
| MULT        | F0, F2, F4  |
| SUBD        | F8, F6, F2  |
| DIVD        | F10, F0, F6 |
| ADDD        | F6, F8, F2  |

| Instruction | Latency (clock cycles) |
|-------------|------------------------|
| LD          | 1                      |
| MULT        | 10                     |
| SUBD        | 2                      |
| DIVD        | 40                     |
| ADDD        | 2                      |

What are the hazards in this code?

Without pipelining:

$$3 * 6 (ID1,ID2,WB) + (1 + 1 + 10 + 2 + 40 + 2) (EX) = 18 + 56 = 74 \text{ cycles}$$

Ideal single-FU pipelining:

$$3 (ID1 + ID2 + WB) + (1 + 1 + 10 + 2 + 40 + 2) (EX) = 3 + 56 = 59 \text{ cycles}$$
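Both totals follow directly from the latency table. A minimal sketch of the arithmetic, with variable and function names chosen only for this example:

```python
LATENCY = {"LD": 1, "MULT": 10, "SUBD": 2, "DIVD": 40, "ADDD": 2}
PROGRAM = ["LD", "LD", "MULT", "SUBD", "DIVD", "ADDD"]
NON_EX_STAGES = 3  # ID1, ID2, WB: one cycle each


def cycles_without_pipelining(program):
    # Every instruction pays all three non-EX stages plus its own EX latency.
    return NON_EX_STAGES * len(program) + sum(LATENCY[op] for op in program)


def cycles_ideal_single_fu(program):
    # EX phases run back to back on one unit; the non-EX stages overlap.
    return NON_EX_STAGES + sum(LATENCY[op] for op in program)


print(cycles_without_pipelining(PROGRAM))  # 74
print(cycles_ideal_single_fu(PROGRAM))     # 59
```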

### Scoreboard Example (cont.)

Clock cycle 7: MULT can't read its operands (F2) because LD #2 hasn't finished.

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | |
| MULT F0 | F2 | F4 | 6 | | | |
| SUBD F8 | F6 | F2 | 7 | | | |
| DIVD F10 | F0 | F6 | | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F2 | | R3 | | | | Yes |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | | No | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Sub | F8 | F6 | F2 | | Integer | Yes | No |
| | Divide | No | | | | | | | | |

Register result status (FU that will write each register):

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|
| 7 | Mult1 | Integer | | | Add | | | | |

Clock cycle 8: DIVD issues. MULT and SUBD are both waiting for F2.

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | |
| MULT F0 | F2 | F4 | 6 | | | |
| SUBD F8 | F6 | F2 | 7 | | | |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F2 | | R3 | | | | Yes |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | | No | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Sub | F8 | F6 | F2 | | Integer | Yes | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status (FU that will write each register):

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|
| 8 | Mult1 | Integer | | | Add | Divide | | | |

Clock cycle 8 (continued): LD #2 writes F2.

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | | | |
| SUBD F8 | F6 | F2 | 7 | | | |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Sub | F8 | F6 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status (FU that will write each register):

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|
| 8 | Mult1 | | | | Add | Divide | | | |

Clock cycle 9: Now MULT and SUBD can both read F2. How can both instructions do this at the same time?

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | | |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 10 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| 2 | Add | Yes | Sub | F8 | F6 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status (FU that will write each register):

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|
| 9 | Mult1 | | | | Add | Divide | | | |

Clock cycle 11: ADDD can't start because the add unit is busy.

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 8 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| 0 | Add | Yes | Sub | F8 | F6 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status (FU that will write each register):

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|
| 11 | Mult1 | | | | Add | Divide | | | |



Clock cycle 13: ADDD issues.

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | 13 | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 6 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status (FU that will write each register):

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|
| 13 | Mult1 | | | Add | | Divide | | | |

Clock cycle 14:

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | 13 | 14 | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 5 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| 2 | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status (FU that will write each register):

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|
| 14 | Mult1 | | | Add | | Divide | | | |

Clock cycle 15:

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULT F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | 13 | 14 | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 4 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| 1 | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status (FU that will write each register):

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|
| 15 | Mult1 | | | Add | | Divide | | | |

| Instruction status (dest, j, k) | Issue | Read operands | Execution complete | Write result |    |          |          |     |     |
|------------------------|-------|---------|-----------|-------|----|----------|----------|-----|-----|
| LD F6 34+ R2           | 1     | 2       | 3         | 4     |    |          |          |     |     |
| LD F2 45+ R3           | 5     | 6       | 7         | 8     |    |          |          |     |     |
| MULT F0 F2 F4          | 6     | 9       |           |       |    |          |          |     |     |
| SUBD F8 F6 F2          | 7     | 9       | 11        | 12    |    |          |          |     |     |
| DIVD F10 F0 F6         | 8     |         |           |       |    |          |          |     |     |
| ADDD F6 F8 F2          | 13    | 14      | 16        |       |    |          |          |     |     |
| Functional unit status |       |         | dest      | S1    | S2 | FU for j | FU for k | Fj? | Fk? |
| Time Name              | Busy  | Op      | Fi        | Fj    | Fk | Qj       | Qk       | Rj  | Rk  |
| Integer                | No    |         |           |       |    |          |          |     |     |
| 3 Mult1                | Yes   | Mult    | F0        | F2    | F4 |          |          | Yes | Yes |
| Mult2                  | No    |         |           |       |    |          |          |     |     |
| 0 Add                  | Yes   | Add     | F6        | F8    | F2 |          |          | Yes | Yes |
| Divide                 | Yes   | Div     | F10       | F0    | F6 | Mult1    |          | No  | Yes |
| Register result status |       |         |           |       |    |          |          |     |     |
| Clock                  | F0    | F2      | F4        | F6    | F8 | F10      | F12      | ... | F30 |
| 16 FU                  | Mult1 |         |           | Add   |    | Divide   |          |     |     |

ADDD can't write its result: DIVD has not yet read F6 (WAR hazard).

| Instruction status (dest, j, k) | Issue | Read operands | Execution complete | Write result |    |          |          |     |     |
|------------------------|-------|---------|-----------|-------|----|----------|----------|-----|-----|
| LD F6 34+ R2           | 1     | 2       | 3         | 4     |    |          |          |     |     |
| LD F2 45+ R3           | 5     | 6       | 7         | 8     |    |          |          |     |     |
| MULT F0 F2 F4          | 6     | 9       |           |       |    |          |          |     |     |
| SUBD F8 F6 F2          | 7     | 9       | 11        | 12    |    |          |          |     |     |
| DIVD F10 F0 F6         | 8     |         |           |       |    |          |          |     |     |
| ADDD F6 F8 F2          | 13    | 14      | 16        |       |    |          |          |     |     |
| Functional unit status |       |         | dest      | S1    | S2 | FU for j | FU for k | Fj? | Fk? |
| Time Name              | Busy  | Op      | Fi        | Fj    | Fk | Qj       | Qk       | Rj  | Rk  |
| Integer                | No    |         |           |       |    |          |          |     |     |
| 2 Mult1                | Yes   | Mult    | F0        | F2    | F4 |          |          | Yes | Yes |
| Mult2                  | No    |         |           |       |    |          |          |     |     |
| Add                    | Yes   | Add     | F6        | F8    | F2 |          |          | Yes | Yes |
| Divide                 | Yes   | Div     | F10       | F0    | F6 | Mult1    |          | No  | Yes |
| Register result status |       |         |           |       |    |          |          |     |     |
| Clock                  | F0    | F2      | F4        | F6    | F8 | F10      | F12      | ... | F30 |
| 17 FU                  | Mult1 |         |           | Add   |    | Divide   |          |     |     |
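The check that stalls ADDD at clock 17 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not code from the course: `FUStatus` and `can_write_result` are hypothetical names, and the rule modeled is the usual scoreboard write-result condition, namely that a unit may not overwrite its destination while another busy unit still holds that register as a source it has not read (its R flag is Yes).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table (hypothetical model)."""
    name: str
    busy: bool = False
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source register 1
    fk: Optional[str] = None   # source register 2
    rj: bool = False           # True: Fj is ready but has not been read yet
    rk: bool = False           # True: Fk is ready but has not been read yet


def can_write_result(unit: FUStatus, units: List[FUStatus]) -> bool:
    """Return False while another busy unit still has to read unit.fi."""
    for other in units:
        if other is unit or not other.busy:
            continue
        if other.fj == unit.fi and other.rj:
            return False
        if other.fk == unit.fi and other.rk:
            return False
    return True


# Functional unit status copied from the clock 17 table.
mult1 = FUStatus("Mult1", True, "F0", "F2", "F4", True, True)
add = FUStatus("Add", True, "F6", "F8", "F2", True, True)
divide = FUStatus("Divide", True, "F10", "F0", "F6", False, True)
units = [mult1, add, divide]

at_clock_17 = can_write_result(add, units)

# Assumption: once DIVD has read its operands (clock 21), its R flags are cleared.
divide.rj = divide.rk = False
after_divd_read = can_write_result(add, units)

print(at_clock_17, after_divd_read)
```

With the clock 17 values, the Divide unit still lists F6 as an unread source, so the Add unit has to hold its result. Once DIVD has read its operands the same check passes, which matches ADDD writing at clock 22 in the final table.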

Nothing happens!!

| Instruction status (dest, j, k) | Issue | Read operands | Execution complete | Write result |    |          |          |     |     |
|------------------------|-------|---------|-----------|-------|----|----------|----------|-----|-----|
| LD F6 34+ R2           | 1     | 2       | 3         | 4     |    |          |          |     |     |
| LD F2 45+ R3           | 5     | 6       | 7         | 8     |    |          |          |     |     |
| MULT F0 F2 F4          | 6     | 9       |           |       |    |          |          |     |     |
| SUBD F8 F6 F2          | 7     | 9       | 11        | 12    |    |          |          |     |     |
| DIVD F10 F0 F6         | 8     |         |           |       |    |          |          |     |     |
| ADDD F6 F8 F2          | 13    | 14      | 16        |       |    |          |          |     |     |
| Functional unit status |       |         | dest      | S1    | S2 | FU for j | FU for k | Fj? | Fk? |
| Time Name              | Busy  | Op      | Fi        | Fj    | Fk | Qj       | Qk       | Rj  | Rk  |
| Integer                | No    |         |           |       |    |          |          |     |     |
| 1 Mult1                | Yes   | Mult    | F0        | F2    | F4 |          |          | Yes | Yes |
| Mult2                  | No    |         |           |       |    |          |          |     |     |
| Add                    | Yes   | Add     | F6        | F8    | F2 |          |          | Yes | Yes |
| Divide                 | Yes   | Div     | F10       | F0    | F6 | Mult1    |          | No  | Yes |
| Register result status |       |         |           |       |    |          |          |     |     |
| Clock                  | F0    | F2      | F4        | F6    | F8 | F10      | F12      | ... | F30 |
| 18 FU                  | Mult1 |         |           | Add   |    | Divide   |          |     |     |

MULT completes execution.

| Instruction status (dest, j, k) | Issue | Read operands | Execution complete | Write result |    |          |          |     |     |
|------------------------|-------|---------|-----------|-------|----|----------|----------|-----|-----|
| LD F6 34+ R2           | 1     | 2       | 3         | 4     |    |          |          |     |     |
| LD F2 45+ R3           | 5     | 6       | 7         | 8     |    |          |          |     |     |
| MULT F0 F2 F4          | 6     | 9       | 19        |       |    |          |          |     |     |
| SUBD F8 F6 F2          | 7     | 9       | 11        | 12    |    |          |          |     |     |
| DIVD F10 F0 F6         | 8     |         |           |       |    |          |          |     |     |
| ADDD F6 F8 F2          | 13    | 14      | 16        |       |    |          |          |     |     |
| Functional unit status |       |         | dest      | S1    | S2 | FU for j | FU for k | Fj? | Fk? |
| Time Name              | Busy  | Op      | Fi        | Fj    | Fk | Qj       | Qk       | Rj  | Rk  |
| Integer                | No    |         |           |       |    |          |          |     |     |
| 0 Mult1                | Yes   | Mult    | F0        | F2    | F4 |          |          | Yes | Yes |
| Mult2                  | No    |         |           |       |    |          |          |     |     |
| Add                    | Yes   | Add     | F6        | F8    | F2 |          |          | Yes | Yes |
| Divide                 | Yes   | Div     | F10       | F0    | F6 | Mult1    |          | No  | Yes |
| Register result status |       |         |           |       |    |          |          |     |     |
| Clock                  | F0    | F2      | F4        | F6    | F8 | F10      | F12      | ... | F30 |
| 19 FU                  | Mult1 |         |           | Add   |    | Divide   |          |     |     |

MULT writes.

| Instruction status (dest, j, k) | Issue | Read operands | Execution complete | Write result |    |          |          |     |     |
|------------------------|-------|---------|-----------|-------|----|----------|----------|-----|-----|
| LD F6 34+ R2           | 1     | 2       | 3         | 4     |    |          |          |     |     |
| LD F2 45+ R3           | 5     | 6       | 7         | 8     |    |          |          |     |     |
| MULT F0 F2 F4          | 6     | 9       | 19        | 20    |    |          |          |     |     |
| SUBD F8 F6 F2          | 7     | 9       | 11        | 12    |    |          |          |     |     |
| DIVD F10 F0 F6         | 8     |         |           |       |    |          |          |     |     |
| ADDD F6 F8 F2          | 13    | 14      | 16        |       |    |          |          |     |     |
| Functional unit status |       |         | dest      | S1    | S2 | FU for j | FU for k | Fj? | Fk? |
| Time Name              | Busy  | Op      | Fi        | Fj    | Fk | Qj       | Qk       | Rj  | Rk  |
| Integer                | No    |         |           |       |    |          |          |     |     |
| Mult1                  | No    |         |           |       |    |          |          |     |     |
| Mult2                  | No    |         |           |       |    |          |          |     |     |
| Add                    | Yes   | Add     | F6        | F8    | F2 |          |          | Yes | Yes |
| Divide                 | Yes   | Div     | F10       | F0    | F6 |          |          | Yes | Yes |
| Register result status |       |         |           |       |    |          |          |     |     |
| Clock                  | F0    | F2      | F4        | F6    | F8 | F10      | F12      | ... | F30 |
| 20 FU                  |       |         |           | Add   |    | Divide   |          |     |     |

DIVD loads operands.

| Instruction status (dest, j, k) | Issue | Read operands | Execution complete | Write result |    |          |          |     |     |
|------------------------|-------|---------|-----------|-------|----|----------|----------|-----|-----|
| LD F6 34+ R2           | 1     | 2       | 3         | 4     |    |          |          |     |     |
| LD F2 45+ R3           | 5     | 6       | 7         | 8     |    |          |          |     |     |
| MULT F0 F2 F4          | 6     | 9       | 19        | 20    |    |          |          |     |     |
| SUBD F8 F6 F2          | 7     | 9       | 11        | 12    |    |          |          |     |     |
| DIVD F10 F0 F6         | 8     | 21      |           |       |    |          |          |     |     |
| ADDD F6 F8 F2          | 13    | 14      | 16        |       |    |          |          |     |     |
| Functional unit status |       |         | dest      | S1    | S2 | FU for j | FU for k | Fj? | Fk? |
| Time Name              | Busy  | Op      | Fi        | Fj    | Fk | Qj       | Qk       | Rj  | Rk  |
| Integer                | No    |         |           |       |    |          |          |     |     |
| Mult1                  | No    |         |           |       |    |          |          |     |     |
| Mult2                  | No    |         |           |       |    |          |          |     |     |
| Add                    | Yes   | Add     | F6        | F8    | F2 |          |          | Yes | Yes |
| Divide                 | Yes   | Div     | F10       | F0    | F6 |          |          | Yes | Yes |
| Register result status |       |         |           |       |    |          |          |     |     |
| Clock                  | F0    | F2      | F4        | F6    | F8 | F10      | F12      | ... | F30 |
| 21 FU                  |       |         |           | Add   |    | Divide   |          |     |     |





DONE!!

| Instruction status (dest, j, k) | Issue | Read operands | Execution complete | Write result |    |          |          |     |     |
|------------------------|-------|---------|-----------|-------|----|----------|----------|-----|-----|
| LD F6 34+ R2           | 1     | 2       | 3         | 4     |    |          |          |     |     |
| LD F2 45+ R3           | 5     | 6       | 7         | 8     |    |          |          |     |     |
| MULT F0 F2 F4          | 6     | 9       | 19        | 20    |    |          |          |     |     |
| SUBD F8 F6 F2          | 7     | 9       | 11        | 12    |    |          |          |     |     |
| DIVD F10 F0 F6         | 8     | 21      | 61        | 62    |    |          |          |     |     |
| ADDD F6 F8 F2          | 13    | 14      | 16        | 22    |    |          |          |     |     |
| Functional unit status |       |         | dest      | S1    | S2 | FU for j | FU for k | Fj? | Fk? |
| Time Name              | Busy  | Op      | Fi        | Fj    | Fk | Qj       | Qk       | Rj  | Rk  |
| Integer                | No    |         |           |       |    |          |          |     |     |
| Mult1                  | No    |         |           |       |    |          |          |     |     |
| Mult2                  | No    |         |           |       |    |          |          |     |     |
| Add                    | No    |         |           |       |    |          |          |     |     |
| 0 Divide               | No    |         |           |       |    |          |          |     |     |
| Register result status |       |         |           |       |    |          |          |     |     |
| Clock                  | F0    | F2      | F4        | F6    | F8 | F10      | F12      | ... | F30 |
| 62 FU                  |       |         |           |       |    |          |          |     |     |

#### Tomasulo Algorithm

#### Register renaming provided

- by reservation stations, which buffer the operands of instructions waiting to issue
- by the issue logic

#### Basic idea:

- a reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from a register (WAR)
- pending instructions designate the reservation station that will provide their input (RAW)
- when successive writes to a register overlap in execution, only the last one is actually used to update the register (WAW)

As instructions are issued, the register specifiers for pending operands are renamed to the names of the reservation stations that will produce them; this is what provides register renaming.

There can be more reservation stations than real registers.
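The renaming idea can be illustrated with a minimal Python sketch. It makes simplifying assumptions: every instruction gets its own tag (`RS1`, `RS2`, ...), no instruction completes while the sequence issues, and only register operands are tracked. The `rename` function and the tuple layout are hypothetical; the six instructions are the sequence used in the worked examples of these slides.

```python
def rename(program):
    """Replace register names by the tag of the instruction that produces them."""
    producer = {}   # architectural register -> tag of its latest pending writer
    renamed = []
    for number, (op, dest, sources) in enumerate(program, start=1):
        tag = f"RS{number}"
        # A source with a pending writer is read from that writer's tag (RAW);
        # otherwise the value is already available in the register itself.
        new_sources = tuple(producer.get(src, src) for src in sources)
        renamed.append((op, tag, new_sources))
        # Later readers of dest must use this tag; earlier readers keep theirs.
        producer[dest] = tag
    return renamed, producer


program = [
    ("LD", "F6", ("R2",)),
    ("LD", "F2", ("R3",)),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]

renamed, producer = rename(program)
for instruction in renamed:
    print(instruction)
```

In the renamed sequence DIVD reads `RS1` (the first value of F6) while ADDD produces `RS6`, so the WAR hazard on F6 between the two disappears. Only the last writer of F6 stays in `producer`, which is the WAW rule from the list above.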

#### Properties of Tomasulo Algorithm

- 1. Control & buffers distributed with Functional Units (FU)
  - Hazard detection and execution control are distributed
  - FU buffers are called "reservation stations" (RS); they hold pending operands
  - Registers in instructions are replaced by values or pointers to reservation stations
    - » a form of register renaming that avoids WAR and WAW hazards
- 2. Bypassing: results are passed directly to the FUs from the RS, not through registers, over the <u>Common Data Bus</u> (CDB)
  - the CDB broadcasts results to all FUs, so all units waiting for an operand can load it simultaneously
- 3. Loads and stores are treated as FUs with RSs as well
- 4. Integer instructions can go past branches, allowing FP ops beyond the basic block in the FP queue

# Figure 3.2 Basic structure of a MIPS floating-point unit using Tomasulo's algorithm

#### Load buffers:

- 1. hold the components of the effective address until it is computed
- 2. track outstanding loads that are waiting on the memory
- 3. hold the results of completed loads that are waiting for the CDB

#### Store buffers:

- 1. hold the components of the effective address until it is computed
- 2. hold the destination memory addresses of outstanding stores that are waiting for the data value to store
- 3. hold the address and value to store until the memory unit is available




#### Three Stages of Tomasulo Algorithm

- 1. Issue: get an instruction from the head of the instruction queue
  - If a reservation station is free (no structural hazard), control issues the instruction and sends the operand values that are available (renames registers)
  - No free RS => structural hazard, so the instruction stalls
  - If the operands are not in the registers, keep track of the FUs that will produce them
    - » This step renames registers, eliminating WAR and WAW hazards
- 2. Execute: operate on operands (EX)
  - When both operands are ready (placed into the RS), execute; if not ready, monitor the Common Data Bus for the result
  - By delaying EX until the operands are available, RAW hazards are avoided
- 3. Write result: finish execution (WB)
  - Write on the Common Data Bus to the registers and to the RS of all waiting units; mark the reservation station available

- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of Functional Unit source address
  - A waiting unit captures the value if the source matches the Functional Unit it expects to produce its operand
  - The bus broadcasts each result to all waiting units
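The write-result stage and the "come from" nature of the Common Data Bus can be sketched as follows. This is a minimal illustration, not the slides' implementation: the `broadcast` function, the dictionary layout, and the numeric values are assumptions, and the station names `Add1` and `Add2` are only examples.

```python
def broadcast(tag, value, stations, reg_status, reg_values):
    """Deliver (tag, value) to every station and register waiting on tag."""
    for rs in stations.values():
        # A station captures the value if it was waiting for this source tag.
        if rs["Qj"] == tag:
            rs["Vj"], rs["Qj"] = value, None
        if rs["Qk"] == tag:
            rs["Vk"], rs["Qk"] = value, None
    for reg, producer in list(reg_status.items()):
        # Only registers still waiting on this tag are updated.
        if producer == tag:
            reg_values[reg] = value
            del reg_status[reg]
    stations[tag]["Busy"] = False   # the producing station becomes available


# Hypothetical state: Add1 holds a SUBD that has finished, Add2 waits for it.
stations = {
    "Add1": {"Busy": True, "Op": "SUBD", "Vj": 7.0, "Vk": 3.0, "Qj": None, "Qk": None},
    "Add2": {"Busy": True, "Op": "ADDD", "Vj": None, "Vk": 3.0, "Qj": "Add1", "Qk": None},
}
reg_status = {"F8": "Add1", "F6": "Add2"}   # register -> producing station
reg_values = {}

broadcast("Add1", 7.0 - 3.0, stations, reg_status, reg_values)
print(stations["Add2"], reg_values, reg_status)
```

The bus carries the name of the producer rather than a destination, so a single broadcast updates both `Add2` and register F8, while F6 (still owned by `Add2`) is left alone.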

#### 7 Components of Reservation Station

- Op: Operation to perform in the unit (e.g., + or -)
- Qj, Qk: Reservation stations producing the corresponding source operand
  - Note: Qj, Qk = 0 => operand ready or unnecessary
  - Store buffers only have Qi for the RS producing the result
- Vj, Vk: Values of the source operands
  - For each operand, only one of the V field or the Q field is valid
  - Store buffers have a V field: the result to be stored
- A: Holds information for the memory address calculation of a load or a store
- Busy: Indicates that the reservation station or FU is busy

Register result status Qi: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
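The fields above map naturally onto a small record type. The sketch below is one possible Python model, with assumptions stated up front: `ReservationStation` and `issue` are hypothetical names, `None` stands in for the "0" (ready) encoding of the Q fields, and the register values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # value of source operand j
    vk: Optional[float] = None   # value of source operand k
    qj: Optional[str] = None     # station that will produce operand j
    qk: Optional[str] = None     # station that will produce operand k
    a: Optional[int] = None      # address information for loads and stores

    def ready(self) -> bool:
        """Both operands are present, so execution may start."""
        return self.busy and self.qj is None and self.qk is None


def issue(rs, op, dest, src_j, src_k, reg_values, reg_status):
    """Fill a free station, taking each operand as a value or as a tag."""
    rs.busy, rs.op = True, op
    for src, v_field, q_field in ((src_j, "vj", "qj"), (src_k, "vk", "qk")):
        producer = reg_status.get(src)
        if producer is None:
            setattr(rs, v_field, reg_values[src])   # value is in the register
        else:
            setattr(rs, q_field, producer)          # wait for this station
    reg_status[dest] = rs.name                      # rs now produces dest


reg_values = {"F2": 0.0, "F4": 2.5}   # illustrative register contents
reg_status = {"F2": "Load2"}           # F2 is still being loaded by Load2
mult1 = ReservationStation("Mult1")
issue(mult1, "MULTD", "F0", "F2", "F4", reg_values, reg_status)
print(mult1, mult1.ready(), reg_status)
```

For each operand exactly one of the V and Q fields ends up set, which is the invariant stated in the list, and the register result status now names `Mult1` as the producer of F0.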

#### Tomasulo Example







Note: Can have multiple loads outstanding



- Note: registers names are removed ("renamed") in Reservation Stations; MULT issued
- Load1 completing; what is waiting for Load1?



Load2 completing; what is waiting for Load2?



Timer starts counting down for Add1, Mult1



Issue ADDD here despite name dependency on F6?

```
Instruction status:
                                    Exec   Write
   Instruction   j     k     Issue  Comp   Result          Busy  Address
   LD      F6    34+   R2    1      3      4       Load1   No
   LD      F2    45+   R3    2      4      5       Load2   No
   MULTD   F0    F2    F4    3                     Load3   No
   SUBD    F8    F6    F2    4      7
   DIVD    F10   F0    F6    5
   ADDD    F6    F8    F2    6

Reservation Stations:
                              S1      S2      RS      RS
   Time  Name   Busy  Op      Vj      Vk      Qj      Qk
   0     Add1   Yes   SUBD    M(A1)   M(A2)
         Add2   Yes   ADDD            M(A2)   Add1
         Add3   No
   8     Mult1  Yes   MULTD   M(A2)   R(F4)
         Mult2  Yes   DIVD            M(A1)   Mult1

Register result status:
   Clock        F0     F2     F4     F6     F8     F10    F12    ...    F30
   7       FU   Mult1  M(A2)         Add2   Add1   Mult2
```

Add1 (SUBD) completing; what is waiting for it?

```
Instruction status:
                                    Exec   Write
   Instruction   j     k     Issue  Comp   Result          Busy  Address
   LD      F6    34+   R2    1      3      4       Load1   No
   LD      F2    45+   R3    2      4      5       Load2   No
   MULTD   F0    F2    F4    3                     Load3   No
   SUBD    F8    F6    F2    4      7      8
   DIVD    F10   F0    F6    5
   ADDD    F6    F8    F2    6

Reservation Stations:
                              S1      S2      RS      RS
   Time  Name   Busy  Op      Vj      Vk      Qj      Qk
         Add1   No
   2     Add2   Yes   ADDD    (M-M)   M(A2)
         Add3   No
   7     Mult1  Yes   MULTD   M(A2)   R(F4)
         Mult2  Yes   DIVD            M(A1)   Mult1

Register result status:
   Clock        F0     F2     F4     F6     F8     F10    F12    ...    F30
   8       FU   Mult1  M(A2)         Add2   (M-M)  Mult2
```

```
Instruction status:
                                    Exec   Write
   Instruction   j     k     Issue  Comp   Result          Busy  Address
   LD      F6    34+   R2    1      3      4       Load1   No
   LD      F2    45+   R3    2      4      5       Load2   No
   MULTD   F0    F2    F4    3                     Load3   No
   SUBD    F8    F6    F2    4      7      8
   DIVD    F10   F0    F6    5
   ADDD    F6    F8    F2    6

Reservation Stations:
                              S1      S2      RS      RS
   Time  Name   Busy  Op      Vj      Vk      Qj      Qk
         Add1   No
   1     Add2   Yes   ADDD    (M-M)   M(A2)
         Add3   No
   6     Mult1  Yes   MULTD   M(A2)   R(F4)
         Mult2  Yes   DIVD            M(A1)   Mult1

Register result status:
   Clock        F0     F2     F4     F6     F8     F10    F12    ...    F30
   9       FU   Mult1  M(A2)         Add2   (M-M)  Mult2
```

```
Instruction status:
                                 Exec   Write
Instruction   j     k     Issue  Comp   Result          Busy  Address
LD      F6    34+   R2    1      3      4       Load1   No
LD      F2    45+   R3    2      4      5       Load2   No
MULTD   F0    F2    F4    3                     Load3   No
SUBD    F8    F6    F2    4      7      8
DIVD    F10   F0    F6    5
ADDD    F6    F8    F2    6      10

Reservation Stations:
                            S1      S2      RS      RS
Time  Name    Busy  Op      Vj      Vk      Qj      Qk
      Add1    No
0     Add2    Yes   ADDD    (M-M)   M(A2)
      Add3    No
5     Mult1   Yes   MULTD   M(A2)   R(F4)
      Mult2   Yes   DIVD            M(A1)   Mult1

Register result status:
Clock      F0      F2      F4      F6       F8      F10     F12     ...     F30
10     FU  Mult1   M(A2)           Add2     (M-M)   Mult2
```

Add2 (ADDD) completing; what is waiting for it?



- Write result of ADDD here?
- All quick instructions complete in this cycle!

```
Instruction status:
                                 Exec   Write
Instruction   j     k     Issue  Comp   Result          Busy  Address
LD      F6    34+   R2    1      3      4       Load1   No
LD      F2    45+   R3    2      4      5       Load2   No
MULTD   F0    F2    F4    3                     Load3   No
SUBD    F8    F6    F2    4      7      8
DIVD    F10   F0    F6    5
ADDD    F6    F8    F2    6      10     11

Reservation Stations:
                            S1      S2      RS      RS
Time  Name    Busy  Op      Vj      Vk      Qj      Qk
      Add1    No
      Add2    No
      Add3    No
3     Mult1   Yes   MULTD   M(A2)   R(F4)
      Mult2   Yes   DIVD            M(A1)   Mult1

Register result status:
Clock      F0      F2      F4      F6       F8      F10     F12     ...     F30
12     FU  Mult1   M(A2)           (M-M+M)  (M-M)   Mult2
```

```
Instruction status:
                                 Exec   Write
Instruction   j     k     Issue  Comp   Result          Busy  Address
LD      F6    34+   R2    1      3      4       Load1   No
LD      F2    45+   R3    2      4      5       Load2   No
MULTD   F0    F2    F4    3                     Load3   No
SUBD    F8    F6    F2    4      7      8
DIVD    F10   F0    F6    5
ADDD    F6    F8    F2    6      10     11

Reservation Stations:
                            S1      S2      RS      RS
Time  Name    Busy  Op      Vj      Vk      Qj      Qk
      Add1    No
      Add2    No
      Add3    No
2     Mult1   Yes   MULTD   M(A2)   R(F4)
      Mult2   Yes   DIVD            M(A1)   Mult1

Register result status:
Clock      F0      F2      F4      F6       F8      F10     F12     ...     F30
13     FU  Mult1   M(A2)           (M-M+M)  (M-M)   Mult2
```

```
Instruction status:
                                 Exec   Write
Instruction   j     k     Issue  Comp   Result          Busy  Address
LD      F6    34+   R2    1      3      4       Load1   No
LD      F2    45+   R3    2      4      5       Load2   No
MULTD   F0    F2    F4    3                     Load3   No
SUBD    F8    F6    F2    4      7      8
DIVD    F10   F0    F6    5
ADDD    F6    F8    F2    6      10     11

Reservation Stations:
                            S1      S2      RS      RS
Time  Name    Busy  Op      Vj      Vk      Qj      Qk
      Add1    No
      Add2    No
      Add3    No
1     Mult1   Yes   MULTD   M(A2)   R(F4)
      Mult2   Yes   DIVD            M(A1)   Mult1

Register result status:
Clock      F0      F2      F4      F6       F8      F10     F12     ...     F30
14     FU  Mult1   M(A2)           (M-M+M)  (M-M)   Mult2
```

```
Instruction status:
                                 Exec   Write
Instruction   j     k     Issue  Comp   Result          Busy  Address
LD      F6    34+   R2    1      3      4       Load1   No
LD      F2    45+   R3    2      4      5       Load2   No
MULTD   F0    F2    F4    3      15             Load3   No
SUBD    F8    F6    F2    4      7      8
DIVD    F10   F0    F6    5
ADDD    F6    F8    F2    6      10     11

Reservation Stations:
                            S1      S2      RS      RS
Time  Name    Busy  Op      Vj      Vk      Qj      Qk
      Add1    No
      Add2    No
      Add3    No
0     Mult1   Yes   MULTD   M(A2)   R(F4)
      Mult2   Yes   DIVD            M(A1)   Mult1

Register result status:
Clock      F0      F2      F4      F6       F8      F10     F12     ...     F30
15     FU  Mult1   M(A2)           (M-M+M)  (M-M)   Mult2
```

Mult1 (MULTD) completing; what is waiting for it?
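The answer comes from the Common Data Bus broadcast: every reservation station whose `Qj` or `Qk` holds the tag `Mult1` captures the value, and every register whose status entry names `Mult1` is written. The following is a minimal Python sketch of that step, assuming the cycle-15 state from the tables; the `Station` class and the `broadcast` function are hypothetical names chosen for illustration, and values are kept as the symbolic strings the slides use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    """One reservation station entry (hypothetical model of a table row)."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None  # value of first operand, if available
    vk: Optional[str] = None  # value of second operand, if available
    qj: Optional[str] = None  # tag of the station producing the first operand
    qk: Optional[str] = None  # tag of the station producing the second operand


def broadcast(tag, value, stations, reg_status, reg_file):
    """Deliver (tag, value) on the CDB to every waiting consumer."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    # Registers still renamed to this tag receive the value.
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            reg_file[reg] = value
            del reg_status[reg]
    # The producing station is released.
    for rs in stations:
        if rs.name == tag:
            rs.busy, rs.op, rs.vj, rs.vk = False, None, None, None


# State at clock 15, taken from the tables (symbolic values).
mult1 = Station("Mult1", True, "MULTD", vj="M(A2)", vk="R(F4)")
mult2 = Station("Mult2", True, "DIVD", vk="M(A1)", qj="Mult1")
reg_status = {"F0": "Mult1", "F10": "Mult2"}
reg_file = {"F2": "M(A2)", "F6": "(M-M+M)", "F8": "(M-M)"}

broadcast("Mult1", "M*F4", [mult1, mult2], reg_status, reg_file)
print(mult2)      # DIVD now holds both operands
print(reg_file)   # F0 now holds M*F4
```

After the call, `Mult2` holds `M*F4` and `M(A1)` with no pending tags, which is the state the clock-16 table shows.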

```
Instruction status:
                                 Exec   Write
Instruction   j     k     Issue  Comp   Result          Busy  Address
LD      F6    34+   R2    1      3      4       Load1   No
LD      F2    45+   R3    2      4      5       Load2   No
MULTD   F0    F2    F4    3      15     16      Load3   No
SUBD    F8    F6    F2    4      7      8
DIVD    F10   F0    F6    5
ADDD    F6    F8    F2    6      10     11

Reservation Stations:
                            S1      S2      RS      RS
Time  Name    Busy  Op      Vj      Vk      Qj      Qk
      Add1    No
      Add2    No
      Add3    No
      Mult1   No
40    Mult2   Yes   DIVD    M*F4    M(A1)

Register result status:
Clock      F0      F2      F4      F6       F8      F10     F12     ...     F30
16     FU  M*F4    M(A2)           (M-M+M)  (M-M)   Mult2
```

Just waiting for Mult2 (DIVD) to complete
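The jump from clock 16 to clock 55 follows directly from the `Time` column, which counts down one per cycle while a station executes. As a small sanity check (a sketch only; `time_left` is a hypothetical helper, not part of the slides), the countdown can be reproduced from the table values:

```python
def time_left(latency_left, at_clock, clock):
    """Remaining execution cycles at `clock`, given `latency_left` at `at_clock`."""
    return max(0, latency_left - (clock - at_clock))


# Mult2 (DIVD) shows Time = 40 at clock 16.
divd_at_55 = time_left(40, 16, 55)
divd_at_56 = time_left(40, 16, 56)
# Mult1 (MULTD) shows Time = 5 at clock 10.
multd_at_12 = time_left(5, 10, 12)
multd_at_15 = time_left(5, 10, 15)

print(divd_at_55, divd_at_56)    # 1 0
print(multd_at_12, multd_at_15)  # 3 0
```

These match the `Time` entries in the clock-12, clock-15, clock-55 and clock-56 tables.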

```
Instruction status:
                                 Exec   Write
Instruction   j     k     Issue  Comp   Result          Busy  Address
LD      F6    34+   R2    1      3      4       Load1   No
LD      F2    45+   R3    2      4      5       Load2   No
MULTD   F0    F2    F4    3      15     16      Load3   No
SUBD    F8    F6    F2    4      7      8
DIVD    F10   F0    F6    5
ADDD    F6    F8    F2    6      10     11

Reservation Stations:
                            S1      S2      RS      RS
Time  Name    Busy  Op      Vj      Vk      Qj      Qk
      Add1    No
      Add2    No
      Add3    No
      Mult1   No
1     Mult2   Yes   DIVD    M*F4    M(A1)

Register result status:
Clock      F0      F2      F4      F6       F8      F10     F12     ...     F30
55     FU  M*F4    M(A2)           (M-M+M)  (M-M)   Mult2
```

```
Instruction status:
                                 Exec   Write
Instruction   j     k     Issue  Comp   Result          Busy  Address
LD      F6    34+   R2    1      3      4       Load1   No
LD      F2    45+   R3    2      4      5       Load2   No
MULTD   F0    F2    F4    3      15     16      Load3   No
SUBD    F8    F6    F2    4      7      8
DIVD    F10   F0    F6    5      56
ADDD    F6    F8    F2    6      10     11

Reservation Stations:
                            S1      S2      RS      RS
Time  Name    Busy  Op      Vj      Vk      Qj      Qk
      Add1    No
      Add2    No
      Add3    No
      Mult1   No
0     Mult2   Yes   DIVD    M*F4    M(A1)

Register result status:
Clock      F0      F2      F4      F6       F8      F10     F12     ...     F30
56     FU  M*F4    M(A2)           (M-M+M)  (M-M)   Mult2
```

Mult2 (DIVD) is completing; what is waiting for it?



 Once again: In-order issue, out-of-order execution and out-of-order completion.
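The summary can be checked against the timing columns of the example. The sketch below uses the Issue and Exec Comp cycles from the clock-56 table; the issue cycle of the second load is taken to be 2, which is an assumption consistent with one in-order issue per cycle. The variable names are illustrative only.

```python
# (instruction, issue cycle, execution-complete cycle), in program order
timeline = [
    ("LD F6",    1, 3),
    ("LD F2",    2, 4),   # issue cycle 2 is assumed (one issue per cycle)
    ("MULTD F0", 3, 15),
    ("SUBD F8",  4, 7),
    ("DIVD F10", 5, 56),
    ("ADDD F6",  6, 10),
]

program_order = [name for name, _, _ in timeline]
issue_order = [name for name, _, _ in sorted(timeline, key=lambda r: r[1])]
completion_order = [name for name, _, _ in sorted(timeline, key=lambda r: r[2])]

print(issue_order == program_order)  # True: issue is in order
print(completion_order)
# ['LD F6', 'LD F2', 'SUBD F8', 'ADDD F6', 'MULTD F0', 'DIVD F10']
```

SUBD and ADDD finish before the MULTD and DIVD that precede them in program order, which is the out-of-order completion the slide refers to.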

#### Tomasulo Drawbacks

- Complexity
  - delays of 360/91, MIPS R10000, Alpha 21264, IBM PPC 620 in Computer Architecture: A Quantitative Approach, 2nd edition, but not in silicon!
- Many associative stores (CDB) at high speed
- Performance limited by Common Data Bus
  - Each CDB must go to multiple functional units ⇒ high capacitance, high wiring density
  - Number of functional units that can complete per cycle limited to one!
    - Multiple CDBs ⇒ more FU logic for parallel associative stores
- Non-precise interrupts!
  - We will address this later
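To illustrate the one-completion-per-cycle limit, here is a minimal Python sketch of write-back arbitration. Everything in it is an assumption made for illustration: the function name `schedule_writebacks`, the tie-breaking rule (earliest ready cycle first, then station name), and the three stations that happen to finish in the same cycle. The slides do not specify an arbitration policy.

```python
def schedule_writebacks(ready, num_cdbs=1):
    """Assign each finished unit a broadcast cycle.

    ready: dict mapping station tag -> first cycle its result could be broadcast.
    Returns a dict mapping tag -> cycle in which it actually gets a CDB.
    """
    granted = {}
    used = {}  # cycle -> number of CDBs already taken in that cycle
    for tag, cycle in sorted(ready.items(), key=lambda kv: (kv[1], kv[0])):
        while used.get(cycle, 0) >= num_cdbs:
            cycle += 1  # bus busy: this unit stalls one more cycle
        used[cycle] = used.get(cycle, 0) + 1
        granted[tag] = cycle
    return granted


# Hypothetical case: three units finish execution in the same cycle.
ready = {"Add1": 10, "Mult1": 10, "Load1": 10}
one_bus = schedule_writebacks(ready, num_cdbs=1)
two_bus = schedule_writebacks(ready, num_cdbs=2)

print(one_bus)  # {'Add1': 10, 'Load1': 11, 'Mult1': 12}
print(two_bus)  # {'Add1': 10, 'Load1': 10, 'Mult1': 11}
```

With a single bus the results serialize over three cycles. A second bus shortens that, at the cost of the extra associative-match logic in every station noted in the list above.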

#### Tomasulo Loop Example

| Label | Op    | Operand 1 | Operand 2 | Operand 3 |
|-------|-------|-----------|-----------|-----------|
| Loop: | LD    | F0        | 0         | R1        |
|       | MULTD | F4        | F0        | F2        |
|       | SD    | F4        | 0         | R1        |
|       | SUBI  | R1        | R1        | #8        |
|       | BNEZ  | R1        | Loop      |           |

- This time assume multiply takes 4 clocks
- Assume the 1st load takes 8 clocks (L1 cache miss) and the 2nd load takes 1 clock (hit)
- To be clear, will show clocks for SUBI, BNEZ
  - Reality: integer instructions run ahead of floating-point instructions
- Show 2 iterations

#### Loop Example



Value of Register used for address, iteration control







- Implicit renaming sets up data flow graph

```
Instruction status:
                                       Exec   Write
ITER  Instruction   j     k     Issue  Comp   Result           Busy  Addr  Fu
1     LD      F0    0     R1    1                      Load1   Yes   80
1     MULTD   F4    F0    F2    2                      Load2   No
1     SD      F4    0     R1    3                      Load3   No
                                                       Store1  Yes   80    Mult1
                                                       Store2  No
                                                       Store3  No

Reservation Stations:
                            S1      S2      RS      RS
Time  Name    Busy  Op      Vj      Vk      Qj      Qk        Code:
      Add1    No                                              LD     F0  0   R1
      Add2    No                                              MULTD  F4  F0  F2
      Add3    No                                              SD     F4  0   R1
      Mult1   Yes   Multd           R(F2)   Load1             SUBI   R1  R1  #8
      Mult2   No                                              BNEZ   R1  Loop

Register result status
Clock  R1         F0      F2      F4      F6      F8      F10     F12     ...     F30
4      80     Fu  Load1           Mult1
```

- Dispatching SUBI instruction (not in FP queue)



- And, BNEZ instruction (not in FP queue)



- Notice that F0 never sees the load from location 80

- Register file completely detached from computation
- First and second iterations completely overlapped
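The overlap of the two iterations comes from renaming at issue time: each instruction reads the current producer tag of its sources, then redirects its destination register to its own station. The sketch below replays the six FP instructions of two iterations. It is an illustration under stated assumptions: `issue` is a hypothetical helper, station assignment is passed in by hand rather than allocated, and only FP registers are tracked.

```python
def issue(op, dest, sources, station, reg_status):
    """Issue one instruction: read source tags, then rename `dest` to `station`."""
    # A source is either the tag of a pending producer or a ready register value.
    operands = [reg_status.get(src, f"R({src})") for src in sources]
    if dest is not None:
        reg_status[dest] = station
    return {"station": station, "op": op, "operands": operands}


reg_status = {}  # register -> tag of the station that will write it

# Iteration 1
it1_ld = issue("LD", "F0", [], "Load1", reg_status)
it1_mul = issue("MULTD", "F4", ["F0", "F2"], "Mult1", reg_status)
it1_sd = issue("SD", None, ["F4"], "Store1", reg_status)
# Iteration 2 reuses the same architectural registers
it2_ld = issue("LD", "F0", [], "Load2", reg_status)
it2_mul = issue("MULTD", "F4", ["F0", "F2"], "Mult2", reg_status)
it2_sd = issue("SD", None, ["F4"], "Store2", reg_status)

print(it1_mul["operands"])  # ['Load1', 'R(F2)']
print(it2_mul["operands"])  # ['Load2', 'R(F2)']
print(it1_sd["operands"], it2_sd["operands"])  # ['Mult1'] ['Mult2']
print(reg_status)           # {'F0': 'Load2', 'F4': 'Mult2'}
```

Each MULTD waits on its own load and each store on its own multiply, so the iterations share no tags. The final `reg_status` points F0 at `Load2`, which is why F0 never receives the value loaded from location 80.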

```
Instruction status:
                                       Exec   Write
ITER  Instruction   j     k     Issue  Comp   Result           Busy  Addr  Fu
1     LD      F0    0     R1    1                      Load1   Yes   80
1     MULTD   F4    F0    F2    2                      Load2   Yes   72
1     SD      F4    0     R1    3                      Load3   No
2     LD      F0    0     R1    6                      Store1  Yes   80    Mult1
2     MULTD   F4    F0    F2    7                      Store2  Yes   72    Mult2
2     SD      F4    0     R1    8                      Store3  No

Reservation Stations:
                            S1      S2      RS      RS
Time  Name    Busy  Op      Vj      Vk      Qj      Qk        Code:
      Add1    No                                              LD     F0  0   R1
      Add2    No                                              MULTD  F4  F0  F2
      Add3    No                                              SD     F4  0   R1
      Mult1   Yes   Multd           R(F2)   Load1             SUBI   R1  R1  #8
      Mult2   Yes   Multd           R(F2)   Load2             BNEZ   R1  Loop

Register result status
Clock  R1         F0      F2      F4      F6      F8      F10     F12     ...     F30
8      72     Fu  Load2           Mult2
```

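Instruction status (clock 9):

| ITER | Instruction | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu    |
|------|-------------|----|----|-------|-----------|--------------|--------|------|------|-------|
| 1    | LD F0       | 0  | R1 | 1     | 9         |              | Load1  | Yes  | 80   |       |
| 1    | MULTD F4    | F0 | F2 | 2     |           |              | Load2  | Yes  | 72   |       |
| 1    | SD F4       | 0  | R1 | 3     |           |              | Load3  | No   |      |       |
| 2    | LD F0       | 0  | R1 | 6     |           |              | Store1 | Yes  | 80   | Mult1 |
| 2    | MULTD F4    | F0 | F2 | 7     |           |              | Store2 | Yes  | 72   | Mult2 |
| 2    | SD F4       | 0  | R1 | 8     |           |              | Store3 | No   |      |       |

Reservation stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) | Code           |
|------|-------|------|-------|---------|---------|---------|---------|----------------|
|      | Add1  | No   |       |         |         |         |         | LD F0 0 R1     |
|      | Add2  | No   |       |         |         |         |         | MULTD F4 F0 F2 |
|      | Add3  | No   |       |         |         |         |         | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |         | R(F2)   | Load1   |         | SUBI R1 R1 #8  |
|      | Mult2 | Yes  | Multd |         | R(F2)   | Load2   |         | BNEZ R1 Loop   |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 9     | 72 | Fu | Load2 |    | Mult2 |    |    |     |     |     |     |

- Load1 completing: who is waiting?
- Note: Dispatching SUBI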





Next load in sequence

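Instruction status (clock 12):

| ITER | Instruction | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu    |
|------|-------------|----|----|-------|-----------|--------------|--------|------|------|-------|
| 1    | LD F0       | 0  | R1 | 1     | 9         | 10           | Load1  | No   |      |       |
| 1    | MULTD F4    | F0 | F2 | 2     |           |              | Load2  | No   |      |       |
| 1    | SD F4       | 0  | R1 | 3     |           |              | Load3  | Yes  | 64   |       |
| 2    | LD F0       | 0  | R1 | 6     | 10        | 11           | Store1 | Yes  | 80   | Mult1 |
| 2    | MULTD F4    | F0 | F2 | 7     |           |              | Store2 | Yes  | 72   | Mult2 |
| 2    | SD F4       | 0  | R1 | 8     |           |              | Store3 | No   |      |       |

Reservation stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) | Code           |
|------|-------|------|-------|---------|---------|---------|---------|----------------|
|      | Add1  | No   |       |         |         |         |         | LD F0 0 R1     |
|      | Add2  | No   |       |         |         |         |         | MULTD F4 F0 F2 |
|      | Add3  | No   |       |         |         |         |         | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd | M[80]   | R(F2)   |         |         | SUBI R1 R1 #8  |
|      | Mult2 | Yes  | Multd | M[72]   | R(F2)   |         |         | BNEZ R1 Loop   |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 12    | 64 | Fu | Load3 |    | Mult2 |    |    |     |     |     |     |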

Why not issue third multiply?

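Instruction status (clock 13):

| ITER | Instruction | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu    |
|------|-------------|----|----|-------|-----------|--------------|--------|------|------|-------|
| 1    | LD F0       | 0  | R1 | 1     | 9         | 10           | Load1  | No   |      |       |
| 1    | MULTD F4    | F0 | F2 | 2     |           |              | Load2  | No   |      |       |
| 1    | SD F4       | 0  | R1 | 3     |           |              | Load3  | Yes  | 64   |       |
| 2    | LD F0       | 0  | R1 | 6     | 10        | 11           | Store1 | Yes  | 80   | Mult1 |
| 2    | MULTD F4    | F0 | F2 | 7     |           |              | Store2 | Yes  | 72   | Mult2 |
| 2    | SD F4       | 0  | R1 | 8     |           |              | Store3 | No   |      |       |

Reservation stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) | Code           |
|------|-------|------|-------|---------|---------|---------|---------|----------------|
|      | Add1  | No   |       |         |         |         |         | LD F0 0 R1     |
|      | Add2  | No   |       |         |         |         |         | MULTD F4 F0 F2 |
|      | Add3  | No   |       |         |         |         |         | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd | M[80]   | R(F2)   |         |         | SUBI R1 R1 #8  |
|      | Mult2 | Yes  | Multd | M[72]   | R(F2)   |         |         | BNEZ R1 Loop   |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 13    | 64 | Fu | Load3 |    | Mult2 |    |    |     |     |     |     |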

Why not issue third store?

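Instruction status (clock 14):

| ITER | Instruction | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu    |
|------|-------------|----|----|-------|-----------|--------------|--------|------|------|-------|
| 1    | LD F0       | 0  | R1 | 1     | 9         | 10           | Load1  | No   |      |       |
| 1    | MULTD F4    | F0 | F2 | 2     | 14        |              | Load2  | No   |      |       |
| 1    | SD F4       | 0  | R1 | 3     |           |              | Load3  | Yes  | 64   |       |
| 2    | LD F0       | 0  | R1 | 6     | 10        | 11           | Store1 | Yes  | 80   | Mult1 |
| 2    | MULTD F4    | F0 | F2 | 7     |           |              | Store2 | Yes  | 72   | Mult2 |
| 2    | SD F4       | 0  | R1 | 8     |           |              | Store3 | No   |      |       |

Reservation stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) | Code           |
|------|-------|------|-------|---------|---------|---------|---------|----------------|
|      | Add1  | No   |       |         |         |         |         | LD F0 0 R1     |
|      | Add2  | No   |       |         |         |         |         | MULTD F4 F0 F2 |
|      | Add3  | No   |       |         |         |         |         | SD F4 0 R1     |
| 0    | Mult1 | Yes  | Multd | M[80]   | R(F2)   |         |         | SUBI R1 R1 #8  |
|      | Mult2 | Yes  | Multd | M[72]   | R(F2)   |         |         | BNEZ R1 Loop   |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 14    | 64 | Fu | Load3 |    | Mult2 |    |    |     |     |     |     |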

Mult1 completing. Who is waiting?



Mult2 completing. Who is waiting?



Instruction status (clock 17):

| ITER | Instruction | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu      |
|------|-------------|----|----|-------|-----------|--------------|--------|------|------|---------|
| 1    | LD F0       | 0  | R1 | 1     | 9         | 10           | Load1  | No   |      |         |
| 1    | MULTD F4    | F0 | F2 | 2     | 14        | 15           | Load2  | No   |      |         |
| 1    | SD F4       | 0  | R1 | 3     |           |              | Load3  | Yes  | 64   |         |
| 2    | LD F0       | 0  | R1 | 6     | 10        | 11           | Store1 | Yes  | 80   | [80]*R2 |
| 2    | MULTD F4    | F0 | F2 | 7     | 15        | 16           | Store2 | Yes  | 72   | [72]*R2 |
| 2    | SD F4       | 0  | R1 | 8     |           |              | Store3 | Yes  | 64   | Mult1   |

Reservation stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) | Code           |
|------|-------|------|-------|---------|---------|---------|---------|----------------|
|      | Add1  | No   |       |         |         |         |         | LD F0 0 R1     |
|      | Add2  | No   |       |         |         |         |         | MULTD F4 F0 F2 |
|      | Add3  | No   |       |         |         |         |         | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |         | R(F2)   | Load3   |         | SUBI R1 R1 #8  |
|      | Mult2 | No   |       |         |         |         |         | BNEZ R1 Loop   |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 17    | 64 | Fu | Load3 |    | Mult1 |    |    |     |     |     |     |

Instruction status (clock 18):

| ITER | Instruction | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu      |
|------|-------------|----|----|-------|-----------|--------------|--------|------|------|---------|
| 1    | LD F0       | 0  | R1 | 1     | 9         | 10           | Load1  | No   |      |         |
| 1    | MULTD F4    | F0 | F2 | 2     | 14        | 15           | Load2  | No   |      |         |
| 1    | SD F4       | 0  | R1 | 3     | 18        |              | Load3  | Yes  | 64   |         |
| 2    | LD F0       | 0  | R1 | 6     | 10        | 11           | Store1 | Yes  | 80   | [80]*R2 |
| 2    | MULTD F4    | F0 | F2 | 7     | 15        | 16           | Store2 | Yes  | 72   | [72]*R2 |
| 2    | SD F4       | 0  | R1 | 8     |           |              | Store3 | Yes  | 64   | Mult1   |

Reservation stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) | Code           |
|------|-------|------|-------|---------|---------|---------|---------|----------------|
|      | Add1  | No   |       |         |         |         |         | LD F0 0 R1     |
|      | Add2  | No   |       |         |         |         |         | MULTD F4 F0 F2 |
|      | Add3  | No   |       |         |         |         |         | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |         | R(F2)   | Load3   |         | SUBI R1 R1 #8  |
|      | Mult2 | No   |       |         |         |         |         | BNEZ R1 Loop   |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 18    | 64 | Fu | Load3 |    | Mult1 |    |    |     |     |     |     |





Once again: In-order issue, out-of-order execution and out-of-order completion.
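
The three properties can be checked against the cycle numbers recorded in the clock-18 snapshot. The following is a minimal Python sketch: the `timeline` tuples are copied from that table, while the helper name `overtakes` and the tuple layout are illustrative assumptions, not part of the original slides.

```python
# (iteration, instruction, issue, exec_comp, write_result), listed in program order.
# None marks a cell that is still empty in the clock-18 snapshot.
timeline = [
    (1, "LD",    1, 9,    10),
    (1, "MULTD", 2, 14,   15),
    (1, "SD",    3, 18,   None),
    (2, "LD",    6, 10,   11),
    (2, "MULTD", 7, 15,   16),
    (2, "SD",    8, None, None),
]

# In-order issue: issue cycles never decrease along program order.
issue_cycles = [row[2] for row in timeline]
in_order_issue = issue_cycles == sorted(issue_cycles)


def overtakes(rows, field):
    """Return (earlier, later) pairs where the later instruction finishes `field` first."""
    pairs = []
    for i, early in enumerate(rows):
        for late in rows[i + 1:]:
            if early[field] is None or late[field] is None:
                continue
            if late[field] < early[field]:
                pairs.append(((early[0], early[1]), (late[0], late[1])))
    return pairs


print("in-order issue:", in_order_issue)
print("out-of-order execution:", overtakes(timeline, 3))
print("out-of-order completion:", overtakes(timeline, 4))
```

Any non-empty list from `overtakes` shows a later instruction passing an earlier one; here the second-iteration LD writes its result before the first-iteration MULTD does.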

#### Why can Tomasulo overlap iterations of loops?

#### Register renaming

- Multiple iterations use different physical destinations for registers (dynamic loop unrolling).
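
The renaming effect can be illustrated with a small Python sketch. It is a simplification under stated assumptions: stations are handed out in order and never freed, the integer instructions (SUBI, BNEZ) and the R1 address operand are left out, and the names `POOLS` and `issue_with_renaming` are hypothetical.

```python
# Station tags available to each operation type (names follow the slides' tables).
POOLS = {
    "LD": ["Load1", "Load2", "Load3"],
    "MULTD": ["Mult1", "Mult2"],
    "SD": ["Store1", "Store2", "Store3"],
}


def issue_with_renaming(program):
    """Issue in program order, replacing each pending source register by a station tag."""
    free = {op: list(tags) for op, tags in POOLS.items()}
    producer = {}  # architectural register -> tag of the station that will write it
    trace = []
    for op, dest, sources in program:
        # A source with a pending producer is read by tag, otherwise by register name.
        operands = [producer.get(src, src) for src in sources]
        tag = free[op].pop(0)
        if dest is not None:
            producer[dest] = tag  # later readers of dest now wait on this tag
        trace.append((tag, op, operands))
    return trace


# One loop body: LD F0; MULTD F4 <- F0 * F2; SD F4 (address operand omitted).
loop_body = [("LD", "F0", []), ("MULTD", "F4", ["F0", "F2"]), ("SD", None, ["F4"])]
trace = issue_with_renaming(loop_body * 2)
for row in trace:
    print(row)
```

Although both iterations name F0 and F4, the first MULTD waits on Load1 and the second on Load2, which is the dynamic loop unrolling described above.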

#### Reservation stations

- Permit instruction issue to advance past integer control-flow operations.
- Also buffer old register values, which completely avoids the WAR stall seen in the scoreboard.
- Another perspective: Tomasulo's algorithm builds the data-flow dependency graph on the fly.
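
How buffering removes the WAR hazard can be sketched in Python as follows. This is an assumed, minimal model: the `Station` class, the `issue` helper and the register values are made up for illustration, and only the Vj/Vk/Qj/Qk fields from the tables are represented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Station:
    op: str
    vj: Optional[float] = None  # first operand value, if available at issue
    vk: Optional[float] = None  # second operand value, if available at issue
    qj: Optional[str] = None    # tag of the station producing the first operand
    qk: Optional[str] = None    # tag of the station producing the second operand


def issue(op: str, src_j: str, src_k: str,
          regs: Dict[str, float], producer: Dict[str, str]) -> Station:
    """Copy ready operands into the station and record tags for pending ones."""
    st = Station(op)
    if src_j in producer:
        st.qj = producer[src_j]
    else:
        st.vj = regs[src_j]
    if src_k in producer:
        st.qk = producer[src_k]
    else:
        st.vk = regs[src_k]
    return st


regs = {"F0": 0.0, "F2": 2.0}  # hypothetical register contents
producer = {"F0": "Load1"}     # F0 is still being loaded
mult1 = issue("MULTD", "F0", "F2", regs, producer)

regs["F2"] = 99.0              # a later instruction overwrites F2
print(mult1)                   # the station still holds the old F2 value in vk
```

Because the station captured F2 at issue time, the later write to F2 does not have to wait for the multiply to read its operand.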

#### Tomasulo's scheme offers 2 major advantages

- (1) Distribution of the hazard detection logic
  - Achieved through the distributed reservation stations and the CDB.
  - If multiple instructions are waiting on a single result, and each already has its other operand, they can all be released simultaneously by the broadcast on the CDB.
  - If a centralized register file were used, the units would have to read their results from the registers when register buses are available.
- (2) Elimination of the stalls for WAW and WAR hazards that occur in the scoreboard
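
The simultaneous release described in point (1) can be sketched in Python. The function name `broadcast`, the dictionary layout and the station contents below are assumptions chosen for illustration; a real CDB is a hardware bus, not a loop.

```python
def broadcast(tag, value, stations, register_status, registers):
    """Deliver one (tag, value) result to every waiter; return stations that became ready."""
    ready = []
    for name, st in stations.items():
        captured = False
        for v, q in (("Vj", "Qj"), ("Vk", "Qk")):
            if st[q] == tag:
                st[v], st[q] = value, None  # capture the value and clear the tag
                captured = True
        if captured and st["Qj"] is None and st["Qk"] is None:
            ready.append(name)
    # The register file snoops the same broadcast.
    for reg, pending in register_status.items():
        if pending == tag:
            registers[reg] = value
            register_status[reg] = None
    return ready


# Hypothetical state: two stations wait on Load1, one waits on Load2.
stations = {
    "Add1":  {"Vj": None, "Vk": 1.0, "Qj": "Load1", "Qk": None},
    "Mult1": {"Vj": None, "Vk": 2.0, "Qj": "Load1", "Qk": None},
    "Mult2": {"Vj": None, "Vk": 2.0, "Qj": "Load2", "Qk": None},
}
register_status = {"F0": "Load1", "F4": "Mult1"}
registers = {"F0": 0.0, "F4": 0.0}

ready = broadcast("Load1", 3.5, stations, register_status, registers)
print(ready)
```

A single broadcast wakes both Add1 and Mult1 without either of them reading the register file, while Mult2 keeps waiting for Load2.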

## What you have learned

- Pipelining
- Hazard problems
- Dynamic scheduling using a scoreboard
- Tomasulo's algorithm