# Supporting Loads



# R+I Arithmetic/Logic Datapath









#### Add lw

RISC-V Assembly Instruction (I-type): lw x14, 8(x2)



- The 12-bit signed immediate is added to the base address in register rs1 to form the memory address
  - This is very similar to the add-immediate operation, but the sum is used as a memory address rather than as the final result
- The value loaded from memory is stored in register rd
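The two bullets above can be sketched as a small Python model. This is only an illustration under stated assumptions: the register file is a list of 32 integers, memory is a byte-addressed little-endian dictionary, and `execute_lw` and `sign_extend` are hypothetical helper names, not part of the slides.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def execute_lw(regs, memory, rd, rs1, imm12):
    """Model lw rd, imm(rs1): the ALU forms the address, memory supplies the data."""
    # Same adder as addi, but the sum is used as an address.
    address = (regs[rs1] + sign_extend(imm12, 12)) & MASK32
    # Assemble a little-endian word from four bytes (missing bytes read as 0).
    word = sum(memory.get(address + i, 0) << (8 * i) for i in range(4))
    if rd != 0:  # x0 is hard-wired to zero
        regs[rd] = word
    return address

regs = [0] * 32
regs[2] = 0x1000
memory = {0x1008: 0xEF, 0x1009: 0xBE, 0x100A: 0xAD, 0x100B: 0xDE}
execute_lw(regs, memory, rd=14, rs1=2, imm12=8)  # lw x14, 8(x2)
```

After the call, `regs[14]` holds the word stored at address `0x1008`.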
















#### **All RV32 Load Instructions**

| imm[11:0] | rs1 | funct3 | rd | opcode  | Instruction |
|-----------|-----|--------|----|---------|-------------|
| imm[11:0] | rs1 | 000    | rd | 0000011 | lb          |
| imm[11:0] | rs1 | 001    | rd | 0000011 | lh          |
| imm[11:0] | rs1 | 010    | rd | 0000011 | lw          |
| imm[11:0] | rs1 | 100    | rd | 0000011 | lbu         |
| imm[11:0] | rs1 | 101    | rd | 0000011 | lhu         |

funct3 field encodes size and 'signedness' of load data

- Supporting the narrower loads requires additional logic to extract the correct byte/halfword from the value loaded from memory, and sign- or zero-extend the result to 32 bits before writing back to register file.
  - It is just a mux + a few gates
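A minimal sketch of that extraction logic, assuming the memory returns an aligned 32-bit little-endian word and the low two address bits select the byte lane. The function name `load_extend` and its parameters are hypothetical; the funct3 values are the ones from the load table.

```python
def load_extend(word, addr_low2, funct3):
    """Pick the addressed byte/halfword out of a 32-bit word and extend it to 32 bits.

    funct3: 000 lb, 001 lh, 010 lw, 100 lbu, 101 lhu.
    """
    size = funct3 & 0b011            # 0 = byte, 1 = halfword, 2 = word
    unsigned = bool(funct3 & 0b100)  # top funct3 bit selects zero-extension
    bits = 8 << size                 # 8, 16, or 32
    # The "mux": shift the selected byte lane down to bit 0 and mask to width.
    value = (word >> (8 * addr_low2)) & ((1 << bits) - 1)
    # The "few gates": replicate the sign bit unless the load is unsigned.
    if not unsigned and value & (1 << (bits - 1)):
        value |= 0xFFFFFFFF & ~((1 << bits) - 1)
    return value
```

For example, with `word = 0x1234F680`, `lb` at lane 0 sees the byte `0x80` and sign-extends it, while `lbu` at the same lane returns `0x80` unchanged.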





# Datapath for Stores



#### Adding sw instruction

- **sw**: Reads two registers, rs1 for the base memory address and rs2 for the data to be stored, as well as an immediate offset!

sw x14, 8(x2)






# Datapath with lw









#### Adding sw to Datapath


















#### **I+S Immediate Generation**



- Just need a 5-bit mux to select between two positions where low five bits of immediate can reside in instruction
- Other bits in immediate are wired to fixed positions in instruction
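The mux described above can be modelled in a few lines of Python. This is a sketch rather than the slides' hardware: `imm_gen_i_s` and `is_store` are hypothetical names, and the bit positions follow the standard I-type and S-type layouts (imm[4:0] in inst[24:20] for I, in inst[11:7] for S).

```python
def imm_gen_i_s(inst, is_store):
    """Build the 32-bit sign-extended immediate for an I-type or S-type instruction."""
    # 5-bit 2-way mux: the low five immediate bits sit in one of two places.
    low5 = (inst >> 7) & 0x1F if is_store else (inst >> 20) & 0x1F
    # Fixed wiring shared by both formats: imm[10:5] = inst[30:25], sign = inst[31].
    mid6 = (inst >> 25) & 0x3F
    sign = (inst >> 31) & 1
    imm = (mid6 << 5) | low5
    if sign:
        imm |= 0xFFFFF800  # replicate inst[31] into bits 31:11
    return imm

# lw x14, 8(x2) encodes as 0x00812703; sw x14, 8(x2) encodes as 0x00E12423.
lw_imm = imm_gen_i_s(0x00812703, is_store=False)
sw_imm = imm_gen_i_s(0x00E12423, is_store=True)
```

Both calls recover the offset 8, even though the low five bits come from different instruction fields.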







#### **All RV32 Store Instructions**

- Store byte writes the low byte to memory
- Store halfword writes the lower two bytes to memory

| imm[11:5] | rs2 | rs1 | funct3 (width) | imm[4:0] | opcode  | Instruction |
|-----------|-----|-----|----------------|----------|---------|-------------|
| imm[11:5] | rs2 | rs1 | 000            | imm[4:0] | 0100011 | sb          |
| imm[11:5] | rs2 | rs1 | 001            | imm[4:0] | 0100011 | sh          |
| imm[11:5] | rs2 | rs1 | 010            | imm[4:0] | 0100011 | sw          |
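As a rough illustration of how funct3 selects the store width, the sketch below writes the low 1, 2, or 4 bytes of rs2 into a byte-addressed, little-endian memory. The dictionary-based memory and the function name `store` are assumptions made for this example.

```python
def store(memory, address, rs2_value, funct3):
    """Write the low bytes of rs2 to memory; funct3 000 = sb, 001 = sh, 010 = sw."""
    num_bytes = 1 << funct3  # 1, 2, or 4 bytes
    for i in range(num_bytes):
        # Little-endian: the least significant byte goes to the lowest address.
        memory[address + i] = (rs2_value >> (8 * i)) & 0xFF
    return memory
```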





# Implementing Branches



#### **RISC-V B-Format for Branches**



- B-format is mostly the same as S-format, with two register sources (rs1/rs2) and a 12-bit immediate imm[12:1]
- But now the immediate represents byte offsets from -4096 to +4094 in 2-byte increments
- The 12 immediate bits encode even 13-bit signed byte offsets (the lowest bit of the offset is always zero, so there is no need to store it)







# Datapath So Far









# **To Add Branches**

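Different change to the state: a branch updates the PC, to PC + immediate if the branch is taken and to PC + 4 otherwise.

- Six branch instructions: beq, bne, blt, bge, bltu, bgeu
- Need to compute PC + immediate and to compare the values of rs1 and rs2
  - But we have only one ALU, so we need more hardware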







#### **Adding Branches**









#### **Branch Comparator**



- **BrEq** = 1 if A = B
- **BrLT** = 1 if A < B
- **BrUn** = 1 selects unsigned comparison for **BrLT**, 0 = signed
- For bge, A ≥ B is computed as !(A < B)
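The comparator signals can be sketched in Python as follows. The function names `branch_comp` and `branch_taken` are hypothetical, and the funct3 values in `branch_taken` come from the RISC-V base ISA encoding, which is not shown in this section. The control logic is assumed to set BrUn = 1 for bltu and bgeu.

```python
def branch_comp(a, b, br_un):
    """Return (BrEq, BrLT) for two 32-bit register values."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    br_eq = a == b
    if not br_un:
        # Signed compare: flipping the sign bit maps two's-complement
        # order onto unsigned order.
        a ^= 0x80000000
        b ^= 0x80000000
    br_lt = a < b
    return br_eq, br_lt

def branch_taken(funct3, br_eq, br_lt):
    """Decide whether a branch is taken from the comparator outputs."""
    return {
        0b000: br_eq,      # beq
        0b001: not br_eq,  # bne
        0b100: br_lt,      # blt
        0b101: not br_lt,  # bge:  A >= B is !(A < B)
        0b110: br_lt,      # bltu (with BrUn = 1)
        0b111: not br_lt,  # bgeu (with BrUn = 1)
    }[funct3]
```

Note that only two comparator outputs are needed: bne and bge reuse BrEq and BrLT through an inversion.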







#### **Branch Immediates (In Other ISAs)**

12-bit immediate encodes PC-relative offset of -4096 to +4094 bytes in multiples of 2 bytes

Standard approach: Treat immediate as in range -2048..+2047, then shift left by 1 bit to multiply by 2 for branches



Each instruction immediate bit can appear in one of two places in output immediate value – so need one 2-way mux per bit







#### **Branch Immediates (In RISC-V)**

RISC-V approach: keep 11 immediate bits in fixed position in output value

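| Output bits 31:12 | Output bit 11    | Output bits 10:5 | Output bits 4:0 | Immediate                     |
|-------------------|------------------|------------------|-----------------|-------------------------------|
| sign = imm[11]    | imm[11] (sign)   | imm[10:5]        | imm[4:0]        | S-Immediate                   |
| sign = imm[12]    | imm[11]          | imm[10:5]        | imm[4:1], 0     | B-Immediate (shift left by 1) |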

Only one bit changes position between S and B, so only need a single-bit 2-way mux







#### **RISC-V Immediate Encoding**

Instruction encodings, inst[31:0]

| inst[31:25]        | inst[24:20] | inst[19:15] | inst[14:12] | inst[11:7]        | inst[6:0] | Format |
|--------------------|-------------|-------------|-------------|-------------------|-----------|--------|
| imm[11:5]          | imm[4:0]    | rs1         | funct3      | rd                | opcode    | I-type |
| imm[11:5]          | rs2         | rs1         | funct3      | imm[4:0]          | opcode    | S-type |
| imm[12], imm[10:5] | rs2         | rs1         | funct3      | imm[4:1], imm[11] | opcode    | B-type |







Upper bits sign-extended from inst[31] always

Only bit 7 of instruction changes role in immediate between S and B
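To make the role of instruction bit 7 concrete, here is a sketch of a combined S/B immediate generator. The name `imm_gen_s_b` and the `is_branch` flag are hypothetical; the field positions are the S-type and B-type layouts from the RISC-V base ISA.

```python
def imm_gen_s_b(inst, is_branch):
    """Build the 32-bit sign-extended immediate for an S-type or B-type instruction."""
    sign = (inst >> 31) & 1            # imm[11] for S, imm[12] for B
    imm_10_5 = (inst >> 25) & 0x3F     # same position in both formats
    imm_4_1 = (inst >> 8) & 0xF        # same position in both formats
    bit7 = (inst >> 7) & 1             # imm[0] for S, imm[11] for B
    if is_branch:
        bit0, bit11 = 0, bit7          # offset is always even
    else:
        bit0, bit11 = bit7, sign
    imm = (bit11 << 11) | (imm_10_5 << 5) | (imm_4_1 << 1) | bit0
    if sign:
        imm |= 0xFFFFF000              # bits 31:12 always replicate inst[31]
    return imm

# beq x1, x2, 8 encodes as 0x00208463; beq x0, x0, -4 encodes as 0xFE000EE3.
forward = imm_gen_s_b(0x00208463, is_branch=True)
backward = imm_gen_s_b(0xFE000EE3, is_branch=True)
```

Only `bit0` and `bit11` depend on the format, which matches the observation that one instruction bit changes role between S and B.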






# **Lighting Up Branch Path**







# Adding JALR to Datapath



#### Let's Add JALR (I-Format)

| inst[31:20]  | inst[19:15] | inst[14:12] | inst[11:7] | inst[6:0] |
|--------------|-------------|-------------|------------|-----------|
| imm[11:0]    | rs1         | funct3      | rd         | opcode    |
| 12 bits      | 5 bits      | 3 bits      | 5 bits     | 7 bits    |
| offset[11:0] | base        | 0           | dest       | JALR      |

- JALR rd, rs1, immediate
- Two changes to the state
  - Writes PC+4 to rd (return address)
  - Sets PC = rs1 + immediate
- Uses same immediates as arithmetic and loads
  - no multiplication by 2 bytes
  - LSB of the computed target is ignored (cleared to zero)
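The JALR behaviour described above can be sketched as a small Python function. The list-based register file and the names `execute_jalr` and `sign_extend` are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def execute_jalr(regs, pc, rd, rs1, imm12):
    """Model jalr rd, rs1, imm: return the new PC and write the return address."""
    # Compute the target first so that rd == rs1 still uses the old rs1 value.
    # The immediate is a plain byte offset (no shift); the result's LSB is cleared.
    target = (regs[rs1] + sign_extend(imm12, 12)) & MASK32 & ~1
    if rd != 0:  # x0 is hard-wired to zero
        regs[rd] = (pc + 4) & MASK32
    return target
```

A function return is the common special case `jalr x0, x1, 0`: it jumps to the address in x1 and discards the link value by writing to x0.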





#### Datapath So Far, With Branches









# Datapath With JALR
















# Adding JAL



#### J-Format for Jump Instructions



- Two changes to the state
  - jal saves PC+4 in register rd (the return address)
  - Set PC = PC + offset (PC-relative jump)
- Target somewhere within ±2<sup>19</sup> locations, 2 bytes apart
  - ±2<sup>18</sup> 32-bit instructions
- Immediate encoding optimized similarly to branch instruction to reduce hardware cost
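As an illustration of the J-format immediate, the sketch below reassembles the 21-bit signed byte offset from an instruction word. The bit positions are the J-type layout from the RISC-V base ISA (the format figure is not reproduced in this section), and `j_immediate` is a hypothetical helper name.

```python
def j_immediate(inst):
    """Return the signed byte offset encoded in a J-type instruction."""
    imm = (
        (((inst >> 31) & 0x1) << 20)      # imm[20]    = inst[31] (sign)
        | (((inst >> 12) & 0xFF) << 12)   # imm[19:12] = inst[19:12]
        | (((inst >> 20) & 0x1) << 11)    # imm[11]    = inst[20]
        | (((inst >> 21) & 0x3FF) << 1)   # imm[10:1]  = inst[30:21]
    )                                     # imm[0] is always 0
    # Sign-extend from 21 bits to a Python integer.
    return imm - (1 << 21) if imm & (1 << 20) else imm

# jal x1, 8 encodes as 0x008000EF; jal x0, -4 encodes as 0xFFDFF06F.
forward = j_immediate(0x008000EF)
backward = j_immediate(0xFFDFF06F)
```

The largest forward offset is 2<sup>20</sup> - 2 bytes, which is the ±2<sup>19</sup> two-byte locations quoted above.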







# Datapath with JAL









# Light Up JAL Path







# Adding U-Types



#### **U-Format for "Upper Immediate" Instructions**



- Has 20-bit immediate in upper 20 bits of 32-bit instruction word
- One destination register, rd
- Used for two instructions
  - lui: Load Upper Immediate
  - auipc: Add Upper Immediate to PC
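A minimal sketch of the two U-type instructions, assuming the upper 20 instruction bits become bits 31:12 of the immediate and the low 12 bits are zero. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def u_immediate(inst):
    """Upper 20 bits of the instruction, with the low 12 bits cleared."""
    return inst & 0xFFFFF000

def lui_result(inst):
    """Value written to rd by lui: the U-immediate itself."""
    return u_immediate(inst)

def auipc_result(inst, pc):
    """Value written to rd by auipc: PC plus the U-immediate, wrapped to 32 bits."""
    return (pc + u_immediate(inst)) & MASK32

# lui x5, 0x12345 encodes as 0x123452B7; auipc x5, 1 encodes as 0x00001297.
lui_value = lui_result(0x123452B7)
auipc_value = auipc_result(0x00001297, pc=0x400)
```

Both instructions write only rd; the PC still advances by 4 as for arithmetic instructions.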







# Datapath With LUI, AUIPC









# Lighting Up LUI









#### Lighting Up AUIPC







# "And In Conclusion..."



# Complete RV32I Datapath!









# Complete RV32I ISA!

RISC-V Reference Card, Base Integer Instructions: RV32I

| Category   | Name                    | Fmt | RV32I Base        | Category    | Name                 | Fmt | RV32I Base       |
|------------|-------------------------|-----|-------------------|-------------|----------------------|-----|------------------|
| Shifts     | Shift Left Logical      | R   | SLL rd,rs1,rs2    | Loads       | Load Byte            | I   | LB rd,rs1,imm    |
|            | Shift Left Log. Imm.    | I   | SLLI rd,rs1,shamt |             | Load Halfword        | I   | LH rd,rs1,imm    |
|            | Shift Right Logical     | R   | SRL rd,rs1,rs2    |             | Load Byte Unsigned   | I   | LBU rd,rs1,imm   |
|            | Shift Right Log. Imm.   | I   | SRLI rd,rs1,shamt |             | Load Half Unsigned   | I   | LHU rd,rs1,imm   |
|            | Shift Right Arithmetic  | R   | SRA rd,rs1,rs2    |             | Load Word            | I   | LW rd,rs1,imm    |
|            | Shift Right Arith. Imm. | I   | SRAI rd,rs1,shamt | Stores      | Store Byte           | S   | SB rs1,rs2,imm   |
| Arithmetic | ADD                     | R   | ADD rd,rs1,rs2    |             | Store Halfword       | S   | SH rs1,rs2,imm   |
|            | ADD Immediate           | I   | ADDI rd,rs1,imm   |             | Store Word           | S   | SW rs1,rs2,imm   |
|            | SUBtract                | R   | SUB rd,rs1,rs2    | Branches    | Branch =             | B   | BEQ rs1,rs2,imm  |
|            | Load Upper Imm          | U   | LUI rd,imm        |             | Branch ≠             | B   | BNE rs1,rs2,imm  |
|            | Add Upper Imm to PC     | U   | AUIPC rd,imm      |             | Branch <             | B   | BLT rs1,rs2,imm  |
| Logical    | XOR                     | R   | XOR rd,rs1,rs2    |             | Branch ≥             | B   | BGE rs1,rs2,imm  |
|            | XOR Immediate           | I   | XORI rd,rs1,imm   |             | Branch < Unsigned    | B   | BLTU rs1,rs2,imm |
|            | OR                      | R   | OR rd,rs1,rs2     |             | Branch ≥ Unsigned    | B   | BGEU rs1,rs2,imm |
|            | OR Immediate            | I   | ORI rd,rs1,imm    | Jump & Link | J&L                  | J   | JAL rd,imm       |
|            | AND                     | R   | AND rd,rs1,rs2    |             | Jump & Link Register | I   | JALR rd,rs1,imm  |
|            | AND Immediate           | I   | ANDI rd,rs1,imm   |             |                      |     |                  |
| Compare    | Set <                   | R   | SLT rd,rs1,rs2    | Synch       | Synch thread         | I   | FENCE            |
|            | Set < Immediate         | I   | SLTI rd,rs1,imm   |             |                      |     |                  |
|            | Set < Unsigned          | R   | SLTU rd,rs1,rs2   | Environment | CALL                 | I   | ECALL            |
|            | Set < Imm Unsigned      | I   | SLTIU rd,rs1,imm  |             | BREAK                | I   | EBREAK           |

The slide annotates the Synch and Environment rows (FENCE, ECALL, EBREAK) as "Not in 61C".







#### Review

- We have designed a complete datapath
  - Capable of executing all RISC-V instructions in one cycle each
  - Not all units (hardware) used by all instructions
- 5 Phases of execution
  - IF, ID, EX, MEM, WB
  - Not all instructions are active in all phases
- Controller specifies how to execute instructions
  - We still need to design it



