

# 计算机组成与设计

# Computer Organization & Design The Hardware/Software Interface

Chapter 2

Instructions: Language of the Machine

Ma De 马德

College of Computer Science and Technology Zhejiang University made@zju.edu.cn

### The process of compiling



- Machine language(机器语言)
  - Computers only understands electrical signalson/off
  - Binary numbers express machine instructions
    - ex. 1000110010100000 means to add two numbers
- □ Assembly language(汇编语言)
  - Symbolic notations ex. add A, B
  - The assembler translates them into machine instruction
- □ High-level programming language(高级编程语言)
  - Notations more closer to the natural language ex. A + B
  - The compiler translates into assembly language
  - Advantages over assembly language
    - □ Think in a more natural language
    - Programs can be independent of hardware

High-level language program (in C)

Assembly language program (for MIPS)

Binary machine

language

program

(for MIPS)

temp = v[k]; v[k] = v[k+1]; v[k+1] = temp; } C compiler swap: muli \$2, \$5,4 add \$2, \$4,\$2 lw \$15, 0(\$2) lw \$16, 4(\$2)

swap(int v∏, int k)

{int temp;



ir \$31

\$16, 0(\$2) \$15, 4(\$2)

浙江大学系统结构与网络安全研究所

# **Outline**



- **■** Introduction
- □ Operations of the computer hardware (计算机硬件的操作)
- □ Operands of the computer hardware(计算机硬件的操作数)
- □ Signed and unsigned numbers (有符号和无符号数)
- □ Representing instructions in the computer(计算机中指令的表示)
- □ Logical operations(逻辑操作)
- □ Instructions for making decision(决策指令)
- □ Supporting procedures in computer hardware(计算机对过程的支持)
- □ Instruction addressing (指令的寻址)



### 2.1 Introduction



- **□** Language of the machine
  - Instructions → Statement
  - Instruction set  $\rightarrow$  Syntax
- Design goals
  - Maximize performance
  - Minimize cost
  - Reduce design time
- **□** Our chosen instruction set: RISC-V
  - Developed at Berkeley starting in 2010



# The RISC-V Instruction Set



- □ Used as the example throughout the book
- **□** Developed at UC Berkeley as open ISA
- □ Now managed by the RISC-V Foundation (<u>riscv.org</u>)
- **□** Typical of many modern ISAs
  - See RISC-V Reference Data tear-out card
- □ Similar ISAs have a large share of embedded core market
  - Applications in consumer electronics, network/storage equipment, cameras, printers, ...



# Instruction characteristics wide variety



| Operators | wide variety |  |
|-----------|--------------|--|
| Ор        | Operands     |  |

- **■** Type of internal storage in processor
- **□** The number of the memory operand In the instruction
- Operations in the instruction Set
- **□** Type and Size of Operands
- **□** Representing Instructions in the Computer
  - Encoding





系统结构与网络安全研究所

# **Stored-program concept**



- □ Today's computers are built on 2 key principles: (Stored-program concept)
  - ①Instruction are represented as numbers.
  - ②Programs can be stored in memory to be read or written just like numbers.

Von Neumann' Computer



# 2.2 Operations

# of the Computer Hardware



- Every computer must be able to perform arithmetic
  - Only one operation per instruction
  - Exactly three variables add a,b,c a←b+c
- All arithmetic operations have this form
- □ Design Principle 1: Simplicity favors regularity
  - Regularity makes implementation simpler
  - Simplicity enables higher performance at lower cost



### **Arithmetic**



### **■ Example 2.2** Compiling a complex C statement

C code:

$$f = (g + h) - (i + j);$$

RISC-V code:

```
add t0, g, h # temporary variable t0 contains g + h add t1, i, j # temporary variable t1 contains i + j sub f, t0, t1 # f gets t0 - t1
```

#### RISC-V assembly language

| Category   | Instruction | Example   | Meaning        | Comments             |
|------------|-------------|-----------|----------------|----------------------|
| Arithmetic | add         | add a,b,c | a←b+c          | Always three operand |
| Anumeuc    |             | sub a,b,c | a←b <b>-</b> c | Always three operand |



# 2.3 Operands of the Computer Hardware



□ Arithmetic instructions use register operands

### □ RISC-V has a 32 × 64-bit register file

- Use for frequently accessed data
- 64-bit data is called a "doubleword"
  32 x 64-bit general purpose registers x0 to x31
- 32-bit data is called a "word"

### □ *Design Principle 2:* Smaller is faster

• c.f. main memory: millions of locations



# **RISC-V Registers**



| Name    | Register | Usage                         | Preserved |
|---------|----------|-------------------------------|-----------|
|         | name     |                               | On call?  |
| x0      | 0        | The constant value 0          | n.a.      |
| x1(ra)  | 1        | Return address(link register) | yes       |
| x2(sp)  | 2        | Stack pointer                 | yes       |
| x3(gp)  | 3        | Global pointer                | yes       |
| x4(tp)  | 4        | Thread pointer                | yes       |
| x5-x7   | 5-7      | Temporaries                   | no        |
| x8-x9   | 8-9      | Saved                         | yes       |
| x10-x17 | 10-17    | Arguments/results             | no        |
| x18-x27 | 18-27    | Saved                         | yes       |
| x28-x31 | 28-31    | Temporaries                   | no        |



# **Register Operand Example**



#### □ C code:

$$f = (g + h) - (i + j);$$
  
• f, ..., j in x19, x20, ..., x23

# **□** Compiled RISC-V code:

```
add x5, x20, x21
add x6, x22, x23
sub x19, x5, x6
```

# **Memory Operands**



- **■** Main memory used for composite data
  - Arrays, structures, dynamic data
- **□** To apply arithmetic operations
  - Load values from memory into registers
  - Store result from register to memory
- **■** Memory is byte addressed
  - Each address identifies an 8-bit byte
- **□** RISC-V is Little Endian
  - Least-significant byte at least address of a word
  - c.f. Big Endian: most-significant byte at least address
- RISC-V does not require words to be aligned in memory
  - Unlike some other ISAs





# **Memory Alignment**



#### 正确

struct {
 int a;
 char b;
 char c[2];
 char d[3]
 float e;

| e    |      |        |        |  |  |
|------|------|--------|--------|--|--|
| d[1] | d[2] | No use | No use |  |  |
| b    | c[0] | c[1]   | d[0]   |  |  |
| a    |      |        |        |  |  |

#### 错误

| e    |      | No use    | No use |  |  |
|------|------|-----------|--------|--|--|
| d[1] | d[2] | e         |        |  |  |
| b    | c[0] | c[1] d[0] |        |  |  |
| a    |      |           |        |  |  |

因为一次只能读出4字 节内存中的一行 这样布局,e变量不能 一次读出



# **Endianness/byte order**



- □ Big end: Left Least Addr.
  - PowerPC
  - Byte1:01 Byte0:02 = 513
- □ Little end: Right Least Addr.
  - RISC-V
  - Byte1:01 Byte0:02 = 258
- **□** Bi-endian
  - MIPS, ARM, Alpha, SPARC



#### Memory





# **Memory Operand Example**



### □ C code:

$$A[12] = h + A[8];$$

h in x21, base address of A in x22

# **□** Compiled RISC-V code:

- Index 8 requires offset of 64
  - 8 bytes per doubleword
- Offset: the constant in a data transfer instruction
- Base register: the register added to form the address

```
1d x9, 64(x22)
add x9, x21, x9
sd x9, 96(x22)
```



# Register vs. Memory



- □ Registers are faster to access than memory
- □ Operating on memory data requires loads and stores
  - More instructions to be executed
- □ Compiler must use registers for variables as much as possible
  - Only spill to memory for less frequently used variables
  - Register optimization is important!



# Discussion: How to represent?



Constant

g = h + 55

Many time a program will use a constant in an operation

### **Constant or immediate Operands**



- Many time a program will use a constant in an operation
  - Incrementing index to point to next element of array
  - Add the constant 4 to register x9
  - Assuming AddrConstants 4 is address pointer of constant 55

ld x9, AddrConstant4(x3) // x9=constant 4 add x22, x22, x9



**x**3

- **■** Immediate: Other method for adding constant 4 to x22
  - Avoids the load instruction
  - Offer versions of the instruction

addi x22, x22, 4 // x22 = x22 + 4

- Constant zero: a register x0
- **■** Design Principle 3: Make common case fast (Why?)



# **Brief summary**



#### **RISC-V** operands

| Name                | Example                          | Comments                                                                                                    |
|---------------------|----------------------------------|-------------------------------------------------------------------------------------------------------------|
| 32 register         | x0~x31                           | Fast locations for data. In RISC-V, data must be in registers to perform arithmetic.                        |
|                     |                                  | Register x0 always equals 0.                                                                                |
| 2 <sup>61</sup>     | Memory[0], Memory[8],,           | Accessed only by data transfer instructions. RISC-V uses byte addresses,                                    |
| memory double words | Memory[18446744073709<br>551608] | so sequential doubleword accesses differ by 8. Memory holds data structures, arrays, and spilled registers. |

#### RISC-V assembly language

| Category      | Instruction         | Example       | Meaning          | Comments                                   |
|---------------|---------------------|---------------|------------------|--------------------------------------------|
|               | add                 | add x5,x6,x7  | x5=x6 + x7       | Add two source register operands           |
| Arithmetic    | subtract            | sub x5,x6,x7  | x5=x6 - x7       | First source register subtracts second one |
|               | add immediate       | addi x5,x6,20 | x5=x6+20         | Used to add constants                      |
| Data transfer | load doubleword     | ld x5, 40(x6) | x5=Memory[x6+40] | doubleword from memory to register         |
| Data transfer | store<br>doubleword | sd x5, 40(x6) | Memory[x6+40]=x5 | doubleword from register to memory         |



沙人曾系统结构与网络安全研究所

# 2.4 Signed and unsigned numbers



- Bits are just bits (no inherent meaning): conventions define relationship between bits and numbers
- □ Binary numbers (base 2) 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001... decimal: 0...2<sup>n</sup>-1
- □ Of course it gets more complicated: numbers are finite (overflow) fractions and real numbers negative numbers
- How do we represent negative numbers? which bit patterns will represent which numbers?



# **Unsigned Binary Integers**



#### □ Given an n-bit number

$$x = x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \dots + x_1 2^1 + x_0 2^0$$

- $\square$  Range: 0 to  $+2^n 1$
- **■** Example
  - **0000 0000 ... 0000 1011**<sub>2</sub>  $= 0 + ... + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$  $= 0 + ... + 8 + 0 + 2 + 1 = 11_{10}$
- □ Using 64 bits: 0 to +18,446,774,073,709,551,615

# 2s-Complement Signed Integers



#### □ Given an n-bit number

$$x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \dots + x_12^1 + x_02^0$$

- □ Range:  $-2^{n-1}$  to  $+2^{n-1}-1$
- **■** Example
  - 1111 1111 ... 1111 1100<sub>2</sub>  $=-1\times2^{31}+1\times2^{30}+...+1\times2^{2}+0\times2^{1}+0\times2^{0}$  $=-2,147,483,648+2,147,483,644=-4_{10}$
- □ Using 64 bits: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807



# **2s-Complement Signed Integers**



- □ Bit 63 is sign bit
  - 1 for negative numbers
  - 0 for non-negative numbers
- $\Box$  -(-2<sup>n-1</sup>) can't be represented
- Non-negative numbers have the same unsigned and 2s-complement representation
- **□** Some specific numbers
  - **0**: 0000 0000 ... 0000
  - −1: 1111 1111 ... 1111
  - Most-negative: 1000 0000 ... 0000
  - Most-positive: 0111 1111 ... 1111



# **Signed Negation**



# □ Complement and add 1

■ Complement means  $1 \rightarrow 0, 0 \rightarrow 1$ 

$$x + x = 1111...111_2 = -1$$
  
 $x + 1 = -x$ 

# **■ Example:** negate +2

 $- +2 = 0000 \ 0000 \ \dots \ 0010_{\text{two}}$ 

$$-2 = 1111 \ 1111 \dots 1101_{\text{two}} + 1$$
  
= 1111 1111 \dots 1110\_{\text{two}}



### 2.5 Representing Instructions in the computer



- □ All information in computer consists of binary bits
- **□** Instructions are encoded in binary
  - Called machine code
- **■** Mapping registers into numbers
  - map registers x0 to x31 onto registers 0 to 31

#### **□** RISC-V instructions

- Encoded as 32-bit instruction words
- Small number of formats encoding operation code (opcode), register numbers, ...
- Regularity



### **Example: Translating assembly code**



### ■ (p81) Translating assembly into machine instruction

RISC-V code

add x9, x20, x21

Decimal version of machine code

0 21 20 0 9 51

Binary version of machine code



### Hexadecimal



#### **□ Base 16**

- Compact representation of bit strings
- 4 bits per hex digit

| 0 | 0000 | 4 | 0100 | 8 | 1000 | С | 1100 |
|---|------|---|------|---|------|---|------|
| 1 | 0001 | 5 | 0101 | 9 | 1001 | d | 1101 |
| 2 | 0010 | 6 | 0110 | а | 1010 | е | 1110 |
| 3 | 0011 | 7 | 0111 | b | 1011 | f | 1111 |

### **■ Example: eca8 6420**

**1110 1100 1010 1000 0110 0100 0010 0000** 



### **RISC-V R-Format Instructions**



| funct7 | rs2    | rs1    | funct3 | rd     | opcode |
|--------|--------|--------|--------|--------|--------|
| 7 bits | 5 bits | 5 bits | 3 bits | 5 bits | 7 bits |

#### **■** Instruction fields

- opcode: operation code
- rd: destination register number
- funct3: 3-bit function code (additional opcode)
- rs1: the first source register number
- rs2: the second source register number
- funct7: 7-bit function code (additional opcode)
- Design Principle 3
  - Good design demands good compromises
- All instructions in RISC-V have the same length
  - Conflict: same length ←--→ single instruction



# R-format Example



| funct7 | rs2    | rs1    | funct3 | rd     | opcode |
|--------|--------|--------|--------|--------|--------|
| 7 bits | 5 bits | 5 bits | 3 bits | 5 bits | 7 bits |

add x9, x20, x21

| 0       | 21    | 20    | 0   | 9     | 51      |
|---------|-------|-------|-----|-------|---------|
| 0000000 | 10101 | 10100 | 000 | 01001 | 0110011 |

0000 0001 0101 1010 0000 0100 1011  $0011_{two} = 015A04B3_{16}$ 



### **RISC-V I-Format Instructions**



| immediate | rs1    | funct3 | rd     | opcode |
|-----------|--------|--------|--------|--------|
| 12 bits   | 5 bits | 3 bits | 5 bits | 7 bits |

#### **■** Immediate arithmetic and load instructions

- rs1: source or base address register number
- immediate: constant operand, or offset added to base address
  - 2s-complement, sign extended

#### □ Design Principle 3: Good design demands good compromises

- Different formats complicate decoding, but allow 32-bit instructions uniformly
- Keep formats as similar as possible

#### $\square$ Example: Id x9, 64(x22)

- 22 (x22) is placed rs1;
- 64 is placed immediate
- 9 (x9) is placed rd



### **RISC-V S-Format Instructions**



| imm[11:5] | rs2    | rs1    | funct3 | imm[4:0] | opcode |
|-----------|--------|--------|--------|----------|--------|
| 7 bits    | 5 bits | 5 bits | 3 bits | 5 bits   | 7 bits |

#### **□** Different immediate format for store instructions

- rs1: base address register number
- rs2: source operand register number
- immediate: offset added to base address
  - □ Split so that rs1 and rs2 fields always in the same place

#### $\square$ Example: sd x9, 64(x22)

- 22 (x22) is placed rs1;
- 64 is placed immediate
- 9 (x9) is placed rs2



# **RISC-V** instruction encoding



| Name | Format | Example |     |   |   |   | Comment |                 |
|------|--------|---------|-----|---|---|---|---------|-----------------|
| add  | R      | 0       | 3   | 2 | 0 | 1 | 51      | add x1, x2, x3  |
| sub  | R      | 32      | 3   | 2 | 0 | 1 | 51      | sub x1, x2, x3  |
| addi | I      | 10      | 000 | 2 | 0 | 1 | 19      | addi x1,x2,1000 |
| ld   | I      | 1000    |     | 2 | 3 | 1 | 3       | ld x1, 1000(x2) |
| sd   | S      | 63      | 1   | 2 | 3 | 8 | 35      | sd x1, 1000(x2) |

# **Example**



# ■ Example(p85) Translating assembly into machine instruction

#### C code:

```
A[30] = h + A[30] + 1;
(Assume: h ---- x21 base address of A ---- x10)
```

#### RISC-V assembly code:

```
ld x9, 240(x10)  // temporary reg x9 gets A[30]
add x9, x21, x9  // temporary reg x9 gets h + A[30]
addi x9, x9, 1  // temporary reg x9 gets h + A[30] + 1
sd x9, 240(x10)  // stores h + A[30] + 1 back into A[30]
```

#### RISC-V machine language code:

Decimal version

| ld | <b>immediate</b> | rs1 | funct3 | rd | opcode |
|----|------------------|-----|--------|----|--------|
|    | 240              | 10  | 3      | 9  | 3      |





| add  | funct7   | rs2  | rs1    | l fur        | ict3 | rd      | opcode |
|------|----------|------|--------|--------------|------|---------|--------|
|      | 0        | 9    | 21     | 0            |      | 9       | 51     |
| addi | immedia  | te r | s1   f | unct3        | rd   | opco    | ode    |
|      | 1        | 9    |        | 0            | 9    | 19      |        |
| sd   | im[11:5] | rs2  | rs1    | <b>funct</b> | 3    | im[4:0] | opcode |
|      | 7        | 9    | 10     | 3            |      | 16      | 35     |

### **■** Two key principles of today's computers

- Instructions are represented as numbers
- Programs can be stored in memory like numbers





# RISC-V fields (format)

THE WAS UNINTED

Imm Region: ±2<sup>12</sup>

| Name       |                 | Comments           |                    |             |                   |                           |                                |
|------------|-----------------|--------------------|--------------------|-------------|-------------------|---------------------------|--------------------------------|
| Field size | 31 7bits 25     | 24 <i>5bits</i> 20 | 19 <i>5bits</i> 15 | 14 3bits 12 | 11 <b>5bits</b> 7 | 6 7bits 0                 | All RISC-V instruction 32 bits |
| R-type     | funct7          | rs2                | rs1                | funct3      | rd                | opcode                    | Arithmetic instruction format  |
| I-type     | immediate[11:0] |                    | rs1                | funct3      | rd                | opcode                    | Loads & immediate arithmetic   |
| S-type     | immed[11:5]     | rs2                | rs1                | funct3      | immed[4:0]        | opcode                    | Stores                         |
| SB-type    | imm[12,10:5]    | rs2                | rs1                | funct3      | imm[4:1,11]       | opcode                    | Conditional branch format      |
| UJ-type    | immed           | iate[20,10:        | 1,11,19:12         | rd          | opcode            | Unconditional jump format |                                |
| U-type     | iı              | mmediate[3         | 31:12]             | rd          | opcode            | Upper immediate format    |                                |

**op:** basic operation of the instruction, traditionally called the opcode.

**rd:** *destination register number.* 

funct3: 3-bit function code (additional opcode).

**rs1:** *the first register source operand.* 

**rs2:** the second register source operand.

□ **funct7** 7-bit function code (additional opcode).



### MIPS fields (format)



| Field size | 6bits | 5bits | 5bits                                    | 5bits 5bits 6bits All MIPS instruction    |  | All MIPS instruction 32 bits |                               |
|------------|-------|-------|------------------------------------------|-------------------------------------------|--|------------------------------|-------------------------------|
| R-format   | ор    | rs    | rt                                       | rd shamt funct Arithmetic instruction for |  |                              | Arithmetic instruction format |
| i-format   | ор    | rs    | rt                                       | Imm/Word address Data transfer ,branch    |  | Data transfer ,branch format |                               |
| J-format   | ор    |       | target address (word) Unconditional jump |                                           |  |                              |                               |

Region: ±2<sup>15</sup>

□ op: basic operation of the instruction, traditionally called the opcode.

□ rs: the first register source operand.

**the second register source operand.** 

**rd**: the register destination operand.

□ shamt: shift amount.

■ **funct**: function, this field selects the specific variant of the operation in the op field.



### **Stored Program Computer**



#### **The BIG Picture**

**Memory** Accounting program (machine code) Editor program (machine code) C compiler (machine code) Payroll data Book text Source code in C for editor program

■ Instructions represented in binary, just like data

■ Instructions and data stored in memory

Programs can operate on programs

• e.g., compilers, linkers, ...

■ Binary compatibility allows compiled programs to work on different computers

Standardized ISAs



Processor

### 2.6 Logical Operations



#### **■** Instructions for bitwise manipulation

| Operation      | С               | Java            | RISC-V    |
|----------------|-----------------|-----------------|-----------|
| Shift left     | <b>&lt;&lt;</b> | <b>&lt;&lt;</b> | slli      |
| Shift right    | >>              | >>>             | srli      |
| Bit-by-bit AND | &               | &               | and, andi |
| Bit-by-bit OR  |                 |                 | or, ori   |
| Bit-by-bit XOR | ۸               | ۸               | xor, xori |
| Bit-by-bit NOT | ~               | ~               |           |

□ Useful for extracting and inserting groups of bits in a word



### **Shift Operations**



| funct6 | immed  | rs1    | funct3 | rd     | opcode |
|--------|--------|--------|--------|--------|--------|
| 6 bits | 6 bits | 5 bits | 3 bits | 5 bits | 7 bits |

- □ immed: how many positions to shift
- □ Shift left logical
  - Shift left and fill with 0 bits
  - slli by *i* bits multiplies by  $2^i$
- □ Shift right logical
  - Shift right and fill with 0 bits
  - **srli** by *i* bits divides by  $2^i$  (unsigned only)



### **AND Operations**



#### □ Useful to mask bits in a word

Select some bits, clear others to 0

and x9,x10,x11

| x10 | 00000000 00000000 00000000 00000000 0000 | 0011    | 01 11000000 |
|-----|------------------------------------------|---------|-------------|
| 4.4 |                                          |         |             |
| x11 | 00000000 00000000 00000000 00000000 0000 | 1111    | 00 00000000 |
|     | 00000000 00000000 00000000 00000000 0000 | <u></u> | 00 0000000  |
| x9  | 0000000 0000000 0000000 0000000 0000000  | 0011    | 00 0000000  |



### **OR Operations**



#### □ Useful to include bits in a word

Set some bits to 1, leave others unchanged

or x9, x10, x11

| x10 | 00000000 00000000 00000000 00000000 0000 | 001101 11  | 000000 |
|-----|------------------------------------------|------------|--------|
|     |                                          |            |        |
| x11 | 00000000 00000000 00000000 00000000 0000 | 111100 00  | 000000 |
|     |                                          |            |        |
| x9  | 00000000 00000000 00000000 00000000 0000 | 111101 110 | 000000 |



### **XOR Operations**



#### **□** Differencing operation

Set some bits to 1, leave others unchanged

xor x9, x10, x12 // NOT operation

| x10 | 0000000  | 00000000 | 00000000 | 00000000 | 00000000 | 00000000 | 0000 | 1101 | 11(             | 000000 |
|-----|----------|----------|----------|----------|----------|----------|------|------|-----------------|--------|
| x12 | 11111111 | 11111111 | 11111111 | 11111111 | 11111111 | 11111111 | 1111 | 1111 | 11 <sup>-</sup> | 111111 |
| x9  | 11111111 | 11111111 | 11111111 | 11111111 | 11111111 | 11111111 | 1111 | 0010 | 00              | 111111 |



#### **RISC-V** operands

| Name                                   | Example                                                     | Comments                                                                                                                                                                              |
|----------------------------------------|-------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 32 registers                           | x0-x31                                                      | Fast locations for data. In RISC-V, data must be in registers to perform arithmetic. Register x0 always equals 0.                                                                     |
| 2 <sup>61</sup> memory<br>double words | Memory[0], Memory[8],, Memory[18,446,744,073,7 09,551,608]] | Accessed only by data transfer instructions. RISC-V uses byte addresses, so sequential double word accesses differ by 8. Memory holds data structures, arrays, and spilled registers. |

RISC-V assembly language

|          |                              |                | . · · · · · · · · · · · · · · · · · · · |                                    |
|----------|------------------------------|----------------|-----------------------------------------|------------------------------------|
| Category | Instruction                  | Example        | Meaning                                 | Comments                           |
|          | and                          | and x5, x6, 3  | x5=x6 & 3                               | Arithmetic shift right by register |
|          | inclusive or                 | or x5,x6,x7    | x5=x6   x7                              | Bit-by-bit OR                      |
|          | exclusive or                 | xor x5,x6,x7   | x5=x6 ^ x7                              | Bit-by-bit XOR                     |
| Logical  | and immediate                | andi x5,x6,20  | x5=x6 & 20                              | Bit-by-bit AND reg. with constant  |
|          | inclusive or immediate       | ori x5,x6,20   | x5=x6   20                              | Bit-by-bit OR reg. with constant   |
|          | exclusive or immediate       | xori x5,x6,20  | X5=x6 ^ 20                              | Bit-by-bit XOR reg. with constant  |
|          | shift left logical           | sll x5, x6, x7 | x5=x6 << x7                             | Shift left by register             |
| Shift    | shift right logical          | srl x5, x6, x7 | x5=x6 >> x7                             | Shift right by register            |
|          | shift right arithmetic       | sra x5, x6, x7 | x5=x6 >> x7                             | Arithmetic shift right by register |
|          | shift left logical immediate | slli x5, x6, 3 | x5=x6 << 3                              | Shift left by immediate            |



浙江人乡系统结构与网络安全研究所

### 2.7 Instructions for making decisions



#### **■** Branch instructions

- beq register1, register2, L1
- bne register1, register2, L1

# ■ Example 2.9 Compiling an *if* statement to a branch (Assume: $f \sim j - x19 \sim x23$ )

C code:

```
if(i == j) goto L1;

f = g + h;

L1: f = f - i;
```

RISC-V assembly code:

```
beq x21, x22, L1 # go to L1 if i equals j add x19, x20, x21 # f = g + h (skipped if i equals j) L1: sub x19, x19, x22 # f = f - i (always executed)
```



#### Branch



 $i \uparrow j$ 

F=g-h

i==j?

#### **□** Example 2.10

Compiling *if-then-else* into Conditional Branches

(Assume:  $f \sim j$  ---- x19 ~ x23)

C code:

(RISCV assembly code:

beq x0, x0, EXIT / # go to Exit

..... statement

bne x22, x23, Else 
$$/\#$$
 go to Else if i != j  $/\#$  Exit:  
add x19, x20, x21  $/\#$  f = g + h (Executed if i == j  $/\#$  if)

F=g+h

Else: sub x19, x20, x21 # f = g - h (Executed if  $i \neq j$  else)

# the first instruction of the next C



**Exit:** 

### **Supports LOOPs**



■ Example 2.11 Compiling a loop with variable array index

```
(Assume: g \sim j ---- x19 \sim x23 base of A[i] ---- x25)
```

C code:

```
Loop: g = g + A[i]; // A is an array of 100 words i = i + j; if (i != h) goto Loop;
```

RISC-V assembly code:

```
Loop: slli x10, x22, 3 # temp reg x10 = 8 * i

add x10, x10, x25 # x10 = address of A[i]

ld x19 $t0, 0(x10) # temp reg x19 = A[i]

add x20, x20, x19 # g = g + A[i]

add x22, x22, x23 # i = i + j

bne x22, x21, Loop # go to Loop if i != h
```

### **Supports while**



#### ■ Example 2.12 Compiling a while loop

```
(Assume: i \sim k---- x22 and x24 base of save ---- x25)
```

C code:

```
while ( save[i] = = k )

i = +i;
```

RISCV assembly code:

```
Loop: slli x10, x22, 3
add x10, x10, x25
ld x9, 0(x10)
bne x9, x24, Exit
addi x22, x22, 1
beq x0, x0, Loop

Exit:
```

```
# temp reg $t1 = 8 * i
# x10 = address of save[i]
# x9 gets save[i]
# go to Exit if save[i] != k
# i += 1
# go to Loop
```

#### **Most popular Compare Operation**



#### -- set on less than : slt

- □ set on less than -slt
  - If the first reg. is less than second reg. then sets third reg to 1 slt x5, x19, x20 # x5=1 if x19 < x20
- Example 2.13 Compiling a less than test

```
(Assume: a - s0 b - s1)
```

C language:

if 
$$(a < b)$$
, goto Less

RISCV assembly code:

```
slt x5, x8, x9 \#x5 = 1 if x8 < x9 (a < b)
bne x5, zero, Less \# go to Less if x5 != 0 (that is, if a < b)
```

Less:



### **More Conditional Operations**



- □ blt rs1, rs2, L1
  - if (rs1 < rs2) branch to instruction labeled L1
- **□** bge rs1, rs2, L1
  - if  $(rs1 \ge rs2)$  branch to instruction labeled L1
- **■** Example
  - if (a > b) a += 1;
  - a in x22, b in x23
     bge x23, x22, Exit # branch if b >= a
     addi x22, x22, 1

Exit:



### Signed vs. Unsigned



- □ Signed comparison: blt, bge
- □ Unsigned comparison: bltu, bgeu
- **■** Example

  - x22 < x23 # signed □-1 < +1

### **Hold out Case/Switch**



- used to select one of many alternatives
- **■ Example 2.14**

```
Compiling a switch using jump address table (Assume: f \sim k --x20 \sim x25 x5 contains 4/8)

C code:
   switch ( k ) {
      case 0: f = i + j; break; /* k = 0 */
      case 1: f = g + h; break; /* k = 1 */
      case 2: f = g - h; break; /* k = 2 */
      case 3: f = i - j; break; /* k = 3 */
}
```

### Jump register & jump address table





### **RISC-V** assembly code:



```
x25, x0, Exit
                                              # test if k < 0
     Boundary
                      bge x25, x5, Exit
                                              \# if k \ge 4, go to Exit
                                              # temp reg x7 = 8 * k (0 \le k \le 3)
                      slli
                           x7, x25, 3
                      add x7, x7, x6
                                             \# x7 = address of JumpTable[k]
                           x7, 0(x7)
                                              # temp reg x7 gets JumpTable[k]
                      ld
                     jalr x1, 0(x7)
                                              # jump based on register x7(entrance)
              Exit:
jump address table
 x7 = x6 + 8 * k:
                     L0:
                          add
                               $s0,<mark>\$</mark>s3, $s4
                                                   \# k = 0 so f gets i + j
                                                   # end of this case so go to Exit
                               x0, 0(x1)
                          jalr
  L0:address
                          add
                               $s0, $s1, $s2
                                                   \# k = 1 so f gets g + h
  L1:address
                               x0, 0(x1)
                                                  # end of this case so go to Exit
                          jalr
                     L2:
                               $s0, $s1, $s2
                                                  \# k = 2 so f gets g - h
                          sub
  L2: address
                               x0, 0(x1)
                                                  # end of this case so go to Exit
                     L3:
                          sub $s0, $s3, $s4
                                                  \# k = 3 so f gets i - j
  L3:address
                          jalr x0, 0(x1) Memo
                                                  # end of switch statement
```



系统结构与网络安全研究所

### Important conception-Basic Blocks



#### ■ A basic block is a sequence of instructions with

- No embedded branches (except at end)
- No branch targets/branch lables (except at beginning)



- A compiler identifies basic blocks for optimization
- An advanced processor can accelerate execution of basic blocks

#### 2.8 Supporting Procedures



#### in Computer Hardware

#### Procedure/function be used to structure programs

- A stored subroutine that performs a specific task based on the parameters with which it is provided
  - easier to understand, allow code to be reused
- Six step
- 1. Place Parameters in a place where the procedure can access them
- 2. Transfer control to the procedure: jump to
- 3. Acquire the storage resources needed for the procedure
- 4. **Perform** the desired task
- 5. Place the result value in a place where the calling program can access it
- 6. Return control to the point of origin



### **Procedure Call Instructions**



- Instruction for procedures: jal (jump-and-link)Caller jal x1, ProcedureAddress
  - Address of following instruction put in x1
  - Jumps to target address

**PC+4** → ra

- □ Procedure return: jump and link register
   Callee jalr x0, 0(x1)
  - Like jal, but jumps to  $0 + address in x \hat{1}$
  - Use x0 as rd (x0 cannot be changed)
  - Can also be used for computed jumps
    - e.g., for case/switch statements

Special registers



### **Using More Registers**



#### **■** More Registers for procedure calling

- $a0 \sim a7(x10-x17)$ : eight argument registers to pass parameters & return values
- ra/x1: one return address register to return to origin point

addi sp,sp,-8 sd ...,8(sp)

#### □ Stack

- ideal data structure for spilling registers
  - □ Push, pop
  - □ Stack pointer (sp)
- Stack grow from higher address to lower address
  - Push: sp= sp 8
  - Pop: sp = sp + 8  $\begin{cases} ld & ..., 8(sp) \\ addi & sp, sp, 8 \end{cases}$

High address



Low address



沙人学系统结构与网络安全研究所



#### ■ Example 2.15 Compiling a leaf procedure

(Assume: g, ..., j in x10, ..., x13 and f in x20) **High address** C code: (\$t1) long long int leaf example ( (\$t0) long long int g, long long int h, (\$s0) \$sp(-24) long long int i, long long int j){ long long int f; f = (g + h) - (i + j);Low address return f; Save value **Return value** 

#### RISC-V assembly code:

# adjust stack to make room for 3 items sd  $\times 5$ , 16(sp) #These three instructions save three sd  $\times 6$ , 8(sp) # register  $\times 5$ ,  $\times 6$ ,  $\times 20$ , 0(sp) # Let's consider why it need to be done.





```
add x5,x10,x11 # register x5contains g + h
add x6,x12,x1 # register x6 contains i + j
sub x20,x5,x6 # f = x5-x6, which is (g + h)-(i + j)
addi x10,x20,0 # copy f to return register (x10 = x20 + 0)
```

#### But maybe some of the three are not used by the caller

- So, this way might be inefficient to save x5, x6, x20 on stack
- Two classes of registers
  - $\Box$  t0 ~ t6: 7 temporary registers, by the callee not preserved
  - $= s0 \sim s11$ : 12 saved registers, must be preserved If used



### **RISC-V** operands



| Name                            | Example                                                           | Comments                                                                                                                                                                              |
|---------------------------------|-------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 32 registers                    | x0-x31                                                            | Fast locations for data. In RISC-V, data must be in registers to perform arithmetic. Register x0 always equals 0.                                                                     |
| 2 <sup>61</sup> memory<br>words | Memory[0], Memory[8],,<br>Memory[18,446,744,073,7<br>09,551,608]] | Accessed only by data transfer instructions. RISC-V uses byte addresses, so sequential double word accesses differ by 8. Memory holds data structures, arrays, and spilled registers. |

| Name                             | Register no. | Usage                            | Preserved on call |
|----------------------------------|--------------|----------------------------------|-------------------|
| x0(zero)                         | 0            | The constant value 0             | n.a.              |
| <i>x</i> 1( <b>ra</b> )          | 1            | Return address(link register)    | yes               |
| $x2(\mathbf{sp})$                | 2            | Stack pointer                    | yes               |
| <i>x</i> 3(gp)                   | 3            | Global pointer                   | yes               |
| <i>x</i> 4(tp)                   | 4            | Thread pointer                   | yes               |
| x5-x7(t0-t2)                     | 5-7          | Temporaries                      | no                |
| x8(s0/fp)                        | 8            | Saved/frame point                | Yes               |
| <i>x</i> 9(s1)                   | 9            | Saved                            | Yes               |
| <i>x</i> 10- <i>x</i> 17(a0-a7)  | 10-17        | Arguments/results                | no                |
| <i>x</i> 18- <i>x</i> 27(s2-s11) | 18-27        | Saved                            | yes               |
| <i>x</i> 28- <i>x</i> 31(t3-t6)  | 28-31        | Temporaries                      | No                |
| PC                               | -            | Auipc(Add Upper Immediate to PC) | Yes               |

ZheJiang University

# The values of the stack pointer and stack before, during and after procedure call in Example 2.15





#### **Nested Procedures**



#### **■ Example 2.16** Compiling a recursive procedure (Assume: n -- a0)

```
C code for n!
 int fact (int n)
                                             Argument n in a0
     if (n < 1) return (1);
                                             Result in a0
        else return (n * fact(n - 1));
RISC-V assembly code
 fact: addi sp, sp, 16
                                    # adjust stack for 2 items
        ra, 8(sp)
                                    # save the return address: x1
        sd a0, 0(\$sp)
                                    # save the argument n: x10
        addi t0, a0, -1
                                    \# x5 = n - 1
                                    \# if n \ge 1, go to L1(else)
        bge t0, zero, L1
        addi a0, zero, 1
                                    # return 1 if n < 1
       addi sp, sp, 16
                            # Recover sp (Why not recover x1and x10?)
       jalr zero, 0(ra)
                                    # return to caller
```

#### **Nested Procedures-Continue**



```
\# n \ge 1: argument gets (n - 1)
L1: addi a0, a0, -1
                                \# call fact with (n - 1)
    jal
        ra, fact
                                #move result of fact(n - 1) to x6(t1)
    add t1, a0, zero
    1d a0, 0(\$sp)
                                # return from jal: restore argument n
    ld ra, 8($sp)
                               # restore the return address
    add sp, sp, 16
                                # adjust stack pointer to pop 2 items
          a0, $a0, t1
                               # return n*fact(n-1)
    mul
                               # return to the caller
     jalr
          zero, 0(ra)
```

- Why a0 is saved? Why ra is saved?
- Preserved things across a procedure call

Saved registers ( $s0 \sim s11$ ), stack pointer register (sp), return address register (ra/x1), stack above the stack pointer

■ Not preserved things across a procedure call

Temporary registers ( $t0 \sim t7$ ), argument registers ( $a0 \sim a7$ ), return value registers ( $a0 \sim a7$ ), stack below the stack pointer



### What is and what is not preserved across a procedure call



| Preserved                       | Not preserved                       |
|---------------------------------|-------------------------------------|
| Saved registers: x8-x9, x18-x27 | Temporary registers: x5-x7, x28-x31 |
| Stack pointer register: x2(sp)  | Argument/result registers: x10-x17  |
| Frame pointer: x8(fp)           |                                     |
| Return address: x1(ra)          |                                     |
| Stack above the stack pointer   | Stack below the stack pointer       |







C.

#### **Memory Layout**



- **□** Text: program code
- **□** Static data: global variables
  - e.g., static variables in C, constant arrays and strings
  - x3 (global pointer) initialized to address allowing ± offsets into this segment

    SP → 0000 003f ffff fff0hex Stock
- Dynamic data: heap
  - E.g., malloc in C, new in Java
- **■** Stack: automatic storage
- **■** Storage class of C variables
  - automatic
  - static



0

PC → 0000 0000 0040 0000<sub>hex</sub>

Text

Reserved





#### **Local Data on the Stack**



#### **□** Allocating Space for New Data on the **Stack**

- Procedure frame/activation record
  - The segment of stack containing a procedure's saved registers and local variables
- Frame pointer
  - A value denoting the location of saved register and local variables for a given procedure

    High address





系统结构与网络安全研究所

### **RISC-V** operands



| Name                            | Example                                                           | Comments                                                                                                                                                                              |
|---------------------------------|-------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 32 registers                    | x0-x31                                                            | Fast locations for data. In RISC-V, data must be in registers to perform arithmetic. Register x0 always equals 0.                                                                     |
| 2 <sup>61</sup> memory<br>words | Memory[0], Memory[8],,<br>Memory[18,446,744,073,7<br>09,551,608]] | Accessed only by data transfer instructions. RISC-V uses byte addresses, so sequential double word accesses differ by 8. Memory holds data structures, arrays, and spilled registers. |

| Name                             | Register no. | Usage                            | Preserved on call |
|----------------------------------|--------------|----------------------------------|-------------------|
| x0(zero)                         | 0            | The constant value 0             | n.a.              |
| <i>x</i> 1( <b>ra</b> )          | 1            | Return address(link register)    | yes               |
| $x2(\mathbf{sp})$                | 2            | Stack pointer                    | yes               |
| <i>x</i> 3(gp)                   | 3            | Global pointer                   | yes               |
| <i>x</i> 4(tp)                   | 4            | Thread pointer                   | yes               |
| x5-x7(t0-t2)                     | 5-7          | Temporaries no                   |                   |
| x8(s0/fp)                        | 8            | Saved/frame point Yes            |                   |
| <i>x</i> 9(s1)                   | 9            | Saved Yes                        |                   |
| <i>x</i> 10- <i>x</i> 17(a0-a7)  | 10-17        | Arguments/results no             |                   |
| <i>x</i> 18- <i>x</i> 27(s2-s11) | 18-27        | Saved yes                        |                   |
| <i>x</i> 28- <i>x</i> 31(t3-t6)  | 28-31        | Temporaries No                   |                   |
| PC                               | -            | Auipc(Add Upper Immediate to PC) | Yes               |

ZheJiang University

## RISC-V assembly language



| Category      | Instruction                | Example         | Meaning                    | Comments                                    |
|---------------|----------------------------|-----------------|----------------------------|---------------------------------------------|
| Arithmetic    | add                        | add x5,x6,x7    | x5=x6 + x7                 | Add two source register operands            |
|               | subtract                   | sub x5,x6,x7    | x5=x6 - x7                 | First source register subtracts second one  |
|               | add immediate              | addi x5,x6,20   | x5=x6+20                   | Used to add constants                       |
| Data transfer | load doubleword            | ld x5, 40(x6)   | x5=Memory[x6+40]           | doubleword from memory to register          |
|               | store doubleword           | sd x5, 40(x6)   | Memory[x6+40]=x5           | doubleword from register to memory          |
|               | load word                  | lw x5, 40(x6)   | x5=Memory[x6+40]           | word from memory to register                |
|               | load word,<br>unsigned     | lwu x5, 40(x6)  | x5=Memory[x6+40]           | Unsigned word from memory to register       |
|               | store word                 | sw x5, 40(x6)   | Memory[x6+40]=x5           | word from register to memory                |
|               | load halfword              | lh x5, 40(x6)   | x5=Memory[x6+40]           | Halfword from memory to register            |
| Data transfer | load halfword,<br>unsigned | lhu x5, 40(x6)  | x5=Memory[x6+40]           | Unsigned halfword from memory to register   |
|               | store halfword             | sh x5, 40(x6)   | Memory[x6+40]=x5           | halfword from register to memory            |
|               | load byte                  | lb x5, 40(x6)   | x5=Memory[x6+40]           | byte from memory to register                |
|               | load word,<br>unsigned     | lbu x5, 40(x6)  | x5=Memory[x6+40]           | Unsigned byte from memory to register       |
|               | store byte                 | sb x5, 40(x6)   | Memory[x6+40]=x5           | byte from register to memory                |
|               | load reserved              | lr.d x5,(x6)    | x5=Memory[x6]              | Load;1st half of atomic swap                |
|               | store conditional          | sc.d x7,x5,(x6) | Memory[x6]=x5;<br>x7 = 0/1 | Store;2nd half of atomic swap               |
|               | Load upper<br>immediate    | lui x5,0x12345  | x5=0x12345000              | Loads 20-bits constant shifted left 12 bits |

RISC-V assembly language

| Category                | Instruction                          | Example          | Meaning                        | Comments                                                   |
|-------------------------|--------------------------------------|------------------|--------------------------------|------------------------------------------------------------|
| Logical                 | and                                  | and x5, x6, 3    | x5=x6 & 3                      | Arithmetic shift right by register                         |
|                         | inclusive or                         | or x5,x6,x7      | x5=x6   x7                     | Bit-by-bit OR                                              |
|                         | exclusive or                         | xor x5,x6,x7     | x5=x6 ^ x7                     | Bit-by-bit XOR                                             |
|                         | and immediate                        | andi x5,x6,20    | x5=x6 & 20                     | Bit-by-bit AND reg. with constant                          |
|                         | inclusive or immediate               | ori x5,x6,20     | x5=x6   20                     | Bit-by-bit OR reg. with constant                           |
|                         | exclusive or immediate               | xori x5,x6,20    | X5=x6 ^ 20                     | Bit-by-bit XOR reg. with constant                          |
| Shift                   | shift left logical                   | sll x5, x6, x7   | x5=x6 << x7                    | Shift left by register                                     |
|                         | shift right logical                  | srl x5, x6, x7   | x5=x6 >> x7                    | Shift right by register                                    |
|                         | shift right arithmetic               | sra x5, x6, x7   | x5=x6 >> x7                    | Arithmetic shift right by register                         |
|                         | shift left logical immediate         | slli x5, x6, 3   | x5=x6 << 3                     | Shift left by immediate                                    |
| Shift                   | shift right logical immediate        | srli x5,x6,3     | x5=x6 >> 3                     | Shift right by immediate                                   |
|                         | shift right arithmetic immediate     | srai x5,x6,3     | x5=x6 >> 3                     | Arithmetic shift right by immediate                        |
| Conditional<br>branch   | branch if equal                      | beq x5, x6, 100  | if(x5 == x6) go to $PC+100$    | PC-relative branch if registers equal                      |
|                         | branch if not equal                  | bne x5, x6, 100  | if(x5 != x6) go to $PC+100$    | PC-relative branch if registers not equal                  |
|                         | branch if less than                  | blt x5, x6, 100  | if( $x5 < x6$ ) go to PC+100   | PC-relative branch if registers less                       |
|                         | branch if greater or equal           | bge x5, x6, 100  | if(x5 >= x6) go to PC+100      | PC-relative branch if registers greater or equal           |
|                         | branch if less, unsigned             | bltu x5, x6, 100 | if( $x5 \ge x6$ ) go to PC+100 | PC-relative branch if registers less, unsigned             |
|                         | branch if greater or equal, unsigned | bgeu x5, x6, 100 | if( $x5 \ge x6$ ) go to PC+100 | PC-relative branch if registers greater or equal, unsigned |
| Unconditional<br>branch | jump and link                        | jal x1, 100      | x1 = PC + 4;<br>go to $PC+100$ | PC-relative procedure call                                 |
|                         | jump and link register               | jalr x1, 100(x5) | x1 = PC + 4;<br>go to $x5+100$ | procedure return; indirect call                            |



#### **■** Byte-encoded character sets

- ASCII (American Standard Code for Information Interchange)
  - 128 characters: 95 graphic, 33 control
- Latin-1: 256 characters
  - ASCII, +96 more graphic characters

#### □ Unicode: 32-bit character set

- Used in Java, C++ wide characters, ...
- Most of the world's alphabets, plus symbols
- UTF-8, UTF-16: variable-length encodings



### Byte/Halfword/Word Operations



#### ■ RISC-V byte/halfword/word load/store

- Load byte/halfword/word: Sign extend to 64 bits in rd
  - □ lb rd, offset(rs1)
  - □ lh rd, offset(rs1)
  - □ lw rd, offset(rs1)
- Load byte/halfword/word unsigned: 0 extend to 64 bits in rd
  - □ lbu rd, offset(rs1)
  - □ lhu rd, offset(rs1)
  - □ lwu rd, offset(rs1)
- Store byte/halfword/word: Store rightmost 8/16/32 bits
  - □ sb rs2, offset(rs1)
  - □ sh rs2, offset(rs1)
  - □ sw rs2, offset(rs1)



### String Copy Example



■ Example 2.17 Compiling a string copy procedure

(Assume: base addresses for i - x19, x's base --x10, y's base --x11)

C code: Y→X
void strcpy(char x[], char y[])
{ size\_t i;
 i = 0;
 while((x[i] = y[i])!='\0') /\* copy and test byte \*/
 i += 1;
}

RISC-V assembly code:



系统结构与网络安全研究所



```
beq t1, zero, L2 \# if y[i] == 0 then exit
addi s3, s3, 1 \# i = i + 1
jal zero, L1 \# next iteration of loop
L2: ld s3, 0(sp) \# restore saved old s3
addi sp, sp, 8 \# pop 1 double word from stack
jalr zero 0(x1) \# return
```

#### Optimization for example 2.17

- strcpy is a leaf procedure
- Allocate i to a temporary register s3/x18

#### ■ For a leaf procedure

- The compiler exhausts all temporary registers
- Then use the registers it must save



# 2.10 RISC V Addressing for 32-Bit Immediate and Addresses



- Wide Bit Immediate addressing
  - most constants is short and fit into 12-bit field
  - Set upper 20 bits of a constants in a register with load upper immediate (lui rd, constant)
- □ instruction format (U-type)

| _      | 31                    |                          | 12 11 | 76    | 0         |
|--------|-----------------------|--------------------------|-------|-------|-----------|
|        |                       | immediate[31:12]         |       | rd    | Opcode    |
|        |                       | 20bits                   |       | 5bits | 7bits     |
|        | <ul><li>lui</li></ul> | x19, 976 # 0x003D0       |       |       |           |
|        |                       | 31                       | 12    | 11 7  | 6 0       |
| Instru | action                | 0000 0000 0011 1101 0000 |       | 10011 | 011 0111  |
|        |                       | •                        |       | Fil   | ling zero |
| Regis  | ster                  | 0000 0000 0011 1101 0000 |       | 00000 | 0000000   |



系统结构与网络安全研究所

### **32-bit Constants**



- Example 2.19 Loading a 32-bit constant
  - The 32-bit constant:

RISC V code:

The value of s3 afterward is:

**■** Note: Why does it need two steps?



## **Branch Addressing**



- Addressing in branches
  - Branch instructions specify
    - Opcode, two registers, target address
  - Most branch targets are near branch
    - □ Forward or backward

**SB-type:** bne x10, x11, 2000,  $\frac{1}{2000} = 0111 \ 1101 \ 0000$ 

inst[31:0]

| 0       | 111110    | 01011 | 01010 | 001    | 1000     | 0       | 1100011 |
|---------|-----------|-------|-------|--------|----------|---------|---------|
| imm[12] | imm[10:5] | rs2   | rs1   | funct3 | imm[4:1] | imm[11] | opcode  |

imm[31:0]

|                        |         |             | · · · · · · · · · · · · · · · · · · · |      |
|------------------------|---------|-------------|---------------------------------------|------|
| Inst[31] signextension | Inst[7] | Inst[30:25] | Inst[11:8]                            | 1'B0 |
|                        |         |             |                                       |      |

PC-relative addressing

$$=$$
 PC + immediate  $\times$  2



系统结构与网络安全研究所

### **Jump Addressing**



- □ Jump and link (jal)
  - target uses 20-bit immediate for larger range
- UJ = {{11{inst[31]}}, inst[19:12], inst[20], inst[30:21],1'b0};

  31 1bit 10bits 1bit 8bits 1211 7 6 0 inn[20] imm[10:1] imm[11] imm[19:12] rd Opcode

  20bit 5bit 7bit
  - $\square$  jal x0, 2000 # 2000<sub>10</sub> = (0 00000000 0 111 1101 000<sub>0</sub>)<sub>2</sub>

| 31 1bit | 10bits     | 1bit | 8bits 12 | 7     | 6 0     |
|---------|------------|------|----------|-------|---------|
| 0       | 1111101000 | 0    | 00000000 | 00000 | 1101111 |
|         | 20bit      | 5hit | 7bit     |       |         |

- □ For more long jumps: eg, to 32-bit absolute address
  - lui: load address[31:12] to temp register
  - jalr: add address[11:0] and jump to target



### Show branch offset in machine language



### **■ Example 2.20** P116/p94

C language:

```
while (save[i]==k) i=i+1;
```

RISC-V assembler code in Example 2.12:

```
Loop: slli a0, s6, 3  # temp reg x10 = 8 * i

add a0, a0, s9  # x10 = address of save[i]

ld s1, 0(a0)  # temp reg x9 = save[i]

bne s1, s8, Exit  # go to Exit if save[i] != k

addi s6, s6, 1  # i = i + j

beq zero,zero, Loop # go to Loop

Exit:
```

## **Instructions Addressing and their Offset**



|            |         | instructions Code with Binary |       |       |      |                        | Hex     |          |
|------------|---------|-------------------------------|-------|-------|------|------------------------|---------|----------|
|            | Address | fun7                          | rs2   | rs1   | fun3 | rd/offset              | OP      | 1100     |
| Loop: slli | 80000   | 000000                        | 00011 | 10110 | 001  | 01010                  | 0010011 | 003B1513 |
| add        | 80004   | 0000000                       | 11001 | 01010 | 000  | 01010                  | 0110011 | 01950533 |
| ld         | 80008   | 0000000                       | 00000 | 01010 | 011  | 01001                  | 0000011 | 00053483 |
| bne        | 80012   | 0000000                       | 11000 | 01001 | 001  | 01100                  | 1100011 | 01849663 |
| addi       | 80016   | 0000000                       | 00001 | 10110 | 000  | 10110                  | 0010011 | 001B0B13 |
| beq        | 80020   | 1111111                       | 00000 | 00000 | 000  | <b>.</b> 0110 <b>1</b> | 1100011 | FE0006E3 |
| Exit:      | 80024   | •••••                         |       |       |      |                        |         |          |
|            | •       | -10                           |       |       | _    |                        | 6 ₹0110 |          |
|            |         |                               |       |       |      |                        |         |          |

**-20 = 80000 - 80020** 

PC + offset : 12 = 80024 - 80012

#### Modification:

- □ All RISC-V instructions are 4 bytes long
- PC-relative addressing refers to the number of halfwords
  - The address field at 80012 above should be 6 instead of 12



系统结构与网络安全研究所

### While branch target is far away



- □ Inserts an unconditional jump to target
  - Invert the condition so that the branch decides whether to skip the jump
- Example 2.21 p117: Branching far away
  - Given a branch:

```
beq a0, zero, L1
```

Rewrite it to offer a much greater branching distance:

```
bne a0, zero, L2 jal zero, L1
```



L2:

### **Summary of RISC-V architecture in Ch. 2**



#### RISC-V Instruction Format and Their Operands

| Name           |                                        |                 | Comments          |                                                  |                                              |                    |           |                               |
|----------------|----------------------------------------|-----------------|-------------------|--------------------------------------------------|----------------------------------------------|--------------------|-----------|-------------------------------|
| Field size     | 31                                     | 7bits 25 24     | 5bits 20          | 19 <b>5bits</b> 15                               | 14 3bits 12                                  | 11 <b>5bits</b> 7  | 6 7bits 0 | All RISC-V instruction 32 b   |
| R-type         | fı                                     | ınct7           | rs2               | rs1                                              | funct3                                       | rd                 | opcode    | Arithmetic instruction format |
| I-type         | i                                      | mmediate[1      | 1:0]              | rs1                                              | funct3                                       | rd                 | opcode    | Loads & immediate arithmetic  |
| S-type         | imı                                    | med[11:5]       | rs2               | rs1                                              | funct3                                       | immed[4:0 <i>]</i> | opcode    | Stores                        |
| SB-type        | imn                                    | n[12,10:5]      | rs2               | rs1                                              | funct3                                       | imm[4:1,11]        | opcode    | Conditional branch format     |
| UJ-type        |                                        | immedi          | ate[20,1          | 0:1,11,19:12                                     | 7                                            | rd                 | opcode    | Unconditional jump format     |
| U-type         |                                        | ir              | nmediate          | e[31:12]                                         |                                              | rd                 | opcode    | Upper immediate format        |
| Name           |                                        |                 |                   |                                                  |                                              | erands             |           |                               |
| 32 register    |                                        |                 |                   |                                                  | \$zero, ra, sp, gp, tp, t0-t6, s0~s11, a0~a7 |                    |           |                               |
| Mem word       | S                                      | Men             | nory[0], N        | /lemory[8], Me                                   | y[8], Memory[10], , Memory[18,446,744,07     |                    |           | 3,709,551,608]                |
|                |                                        |                 |                   |                                                  |                                              |                    |           |                               |
| x0(zero)       | )                                      |                 | 0 The constant va |                                                  |                                              | ue 0               |           | n.a.                          |
| x1(ra)         |                                        | 1 Return addre  |                   |                                                  | address(li                                   | nk register)       |           | yes                           |
| x2(sp)         |                                        |                 | 2 Stack pointer   |                                                  |                                              |                    |           | yes                           |
| <i>x</i> 3(gp) |                                        |                 | 3                 | Global                                           | Global pointer                               |                    |           | yes                           |
| x4(tp)         |                                        |                 | 4                 | Thread                                           | Thread pointer                               |                    |           | yes                           |
| x5-x7(t0-t)    | 2)                                     | 5-7 Temporaries |                   |                                                  | raries                                       |                    | no        |                               |
| x8(s0/fp)      | )                                      | 8 Saved/frame p |                   |                                                  | frame poin                                   | t                  | Yes       |                               |
| x9(s1)         | · · · · · ·                            |                 | Saved             |                                                  |                                              |                    | Yes       |                               |
| x10-x17(a0-    | <i>x</i> 10- <i>x</i> 17(a0-a7) 10-17  |                 | Argume            | Arguments/results                                |                                              |                    | no        |                               |
|                | <i>x</i> 18- <i>x</i> 27(s2-s11) 18-27 |                 | Saved             | <del>                                     </del> |                                              |                    | yes       |                               |
| x28-x31(t3-    |                                        | 2               | 28-31             | Tempoi                                           | Temporaries                                  |                    |           | No                            |
| PC             | ,                                      |                 | -                 |                                                  | Auipc(Add Upper Immediate to PC)             |                    |           |                               |

## RISC-V Addressing Summary

统结构与网络安全研究





## **RISC-V Disassembly**



- **Example 2.22 P120: Decoding machine code** 
  - Machine instruction(0x00578833)

#### **Decoding**

□ Determine the operation from opcode
 opcode: 0110011 → R-type arithmetic instruction

| funct7   | rs2   | rs1   | funct3 | rd    | opcode  |
|----------|-------|-------|--------|-------|---------|
| 000 0000 | 00101 | 01111 | 000    | 10000 | 0110011 |
| -        |       | -     | )      |       | /       |

funct7 and funct3 are all  $0 \rightarrow add$  instruction

p107-P119

□ Determine other fields

rs2: x5/t0; rs1: x15/a5; rd: x16/a7

■ Show the assembly instruction:

add a7, a5, t0 (Note: add rd,rs,rt)



系统结构与网络安全研究所

### **Summary of RISC-V instruction encoding**



| Format | Instruction | Opcode  | Funct3 | Funct6/7 |
|--------|-------------|---------|--------|----------|
|        | add         | 0110011 | 000    | 0000000  |
|        | sub         | 0110011 | 000    | 0100000  |
|        | sll         | 0110011 | 001    | 0000000  |
|        | xor         | 0110011 | 100    | 0000000  |
| D type | srl         | 0110011 | 101    | 0000000  |
| R-type | sra         | 0110011 | 101    | 0000000  |
|        | or          | 0110011 | 110    | 0000000  |
|        | and         | 0110011 | 111    | 0000000  |
|        | lr.d        | 0110011 | 011    | 0001000  |
|        | sc.d        | 0110011 | 011    | 0001100  |



### **Summary of RISC-V instruction encoding**



| Format | Instruction | Opcode  | Funct3 | Funct6/7 |
|--------|-------------|---------|--------|----------|
|        | lb          | 0000011 | 000    | n.a.     |
|        | lh          | 0000011 | 001    | n.a.     |
|        | lw          | 0000011 | 010    | n.a.     |
|        | ld          | 0000011 | 011    | n.a.     |
|        | lbu         | 0000011 | 100    | n.a.     |
|        | lhu         | 0000011 | 101    | n.a.     |
|        | lwu         | 0000011 | 110    | n.a.     |
| l-type | addi        | 0010011 | 000    | n.a.     |
|        | slli        | 0010011 | 001    | 000000   |
|        | xori        | 0010011 | 100    | n.a.     |
|        | srli        | 0010011 | 101    | 000000   |
|        | srai        | 0010011 | 101    | 010000   |
|        | ori         | 0010011 | 110    | n.a.     |
|        | andi        | 0010011 | 111    | n.a.     |
|        | jalr        | 1100111 | 000    | n.a.     |



### **Summary of RISC-V instruction encoding**



| Format  | Instruction | Opcode  | Funct3 | Funct6/7 |
|---------|-------------|---------|--------|----------|
|         | sb          | 0100011 | 000    | n.a.     |
| C tuno  | sh          | 0100011 | 001    | n.a.     |
| S-type  | SW          | 0100011 | 010    | n.a.     |
|         | sd          | 0100011 | 111    | n.a.     |
|         | beq         | 1100111 | 000    | n.a.     |
|         | bne         | 1100111 | 001    | n.a.     |
| SP type | blt         | 1100111 | 100    | n.a.     |
| SB-type | bge         | 1100111 | 101    | n.a.     |
|         | bltu        | 1100111 | 110    | n.a.     |
|         | bgeu        | 1100111 | 111    | n.a.     |
| U-type  | lui         | 0110111 | n.a.   | n.a.     |
| UJ-type | jal         | 1101111 | n.a.   | n.a.     |



## 2.11 Synchronization in RISC-V



- **□** Two processors sharing an area of memory
  - P1 writes, then P2 reads
  - Data race if P1 and P2 don't synchronize
    - Result depends of order of accesses

#### **□** Hardware support required

- Atomic read/write memory operation
- No other access to the location allowed between the read and write

### **□** Could be a single instruction

- E.g., atomic swap of register  $\leftrightarrow$  memory
- Or an atomic pair of instructions



## **Synchronization in RISC-V**



- □ Load reserved: lr.d rd, (rs1)
  - Load from address in rs1 to rd
  - Place reservation on memory address
- □ Store conditional: sc.d rd, (rs1), rs2
  - Store from rs2 to address in rs1
  - Succeeds if location not changed since the lr.d
    - □ Returns 0 in rd
  - Fails if location is changed
    - □ Returns non-zero value in rd

### **Synchronization in RISC-V**



**■** Example 1: atomic swap (to test/set lock variable)

```
again: lr.d x10,(x20)
sc.d x11,(x20),x23 // x11 = status
bne x11,x0,again // branch if store failed
addi x23,x10,0 // x23 = loaded value
```

#### **■ Example 2: lock**

```
addi x12,x0,1 // copy locked value
again: lr.d x10,(x20) // read lock
bne x10,x0,again // check if it is 0 yet
sc.d x11,(x20),x12 // attempt to store
bne x11,x0,again // branch if fails
```

#### □ Unlock:

```
sd x0.0(x20) // free lock
```



### 2.12 Translating and starting a program







### Producing an Object Module



- Assembler (or compiler) translates program into machine instructions
- □ Provides information for building a complete program from the pieces
  - Header: described contents of object module
  - Text segment: translated instructions
  - Static data segment: data allocated for the life of the program
  - Relocation info: for contents that depend on absolute location of loaded program
  - Symbol table: global definitions and external refs
  - Debug info: for associating with source code





| Object file head       | ler       |                    |            |
|------------------------|-----------|--------------------|------------|
|                        | Name      | Procedure A        |            |
|                        | Text size | 100 <sub>hex</sub> |            |
|                        | Data size | 20 <sub>hex</sub>  |            |
| Text segment           | Address   | instruction        |            |
|                        | 0         | ld x10, 0(gp)      |            |
|                        | 4         | jal x1, 0          |            |
|                        |           |                    |            |
| Data segment           | 0         | (X)                |            |
|                        |           |                    |            |
| Relocation information | Address   | Instruction type   | Dependency |
|                        | 0         | ld                 | X          |
|                        | 4         | jal                | В          |
| Symbol table           | label     | Address            |            |
|                        | х         |                    |            |
|                        | В         |                    |            |



### Link



- Object modules(including library routine) → executable program
- 3 step of Link
  - Place code and data modules symbolically in memory
  - Determine the addresses of data and instruction labels
  - Patch both the internal and external references (Address of invoke)





### Loading a Program



### **■** Load from image file on disk into memory

- 1. Read header to determine segment sizes
- 2. Create virtual address space
- 3. Copy text and initialized data into memory
  - □ Or set page table entries so they can be faulted in
- 4. Set up arguments on stack
- 5. Initialize registers (including sp, fp, gp)
- 6. Jump to startup routine
  - □ Copies arguments to x10, ... and calls main
  - When main returns, do exit syscall



## **Dynamic Linking**



# □ Only link/load library procedure when it is called

- Requires procedure code to be relocatable
- Avoids image bloat caused by static linking of all (transitively) referenced libraries
- Automatically picks up new library versions

## Lazy Linkage



Indirection table

Stub: Loads routine ID, Jump to linker/loader

Linker/loader code

Dynamically mapped code



(a) First call to DLL routine



(b) Subsequent calls to DLL routine



### **Starting Java Applications**







### 2.13 A C Sort Example To Put it All Together



- □ Three general steps for translating C procedures
  - Allocate registers to program variables
  - Produce code for the body of the procedures
  - Preserve registers across the procedures invocation

#### □ Procedure swap

```
C code
void swap (long long v[], size_t k)
{
    long lon temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
```



### The Procedure Swap



Register allocation for swap

```
v ---- x10 k ---- x11 temp ---- x5
```

- swap is a leaf procedure, nothing to preserve
- RISC-V code for the procedure swap

#### **□** Procedure body

### The Sort Procedure in C



#### □ Procedure *sort*

C code

```
void sort (long long v[], size_t n)
{
    size_t i, j;
    for (i = 0; i < n; i+= 1) {
        for (j = i - 1; j >= 0 && v[j] > v[j+1]; j == 1)
            swap (v, j);
    }
}
```

Register allocation for sort

- Passing parameters in sort
- Preserving registers in sort x1, x19, x20, x21, x22



### The Outer Loop



#### **□** Skeleton of outer loop:

```
• for (i = 0; i < n; i += 1)
  1i \quad x19,0 \quad // i = 0
for1tst:
  bge x19,x11,exit1 // go to exit1 if x19 \geq x11 (i\geqn)
  (body of outer for-loop)
  addi x19, x19, 1 // i += 1
       for1tst
                       // branch to test of outer loop
exit1:
```

### The Inner Loop



#### **□** Skeleton of inner loop:

```
• for (i = i - 1; i >= 0 \&\& v[i] > v[i + 1]; i -= 1)
     addi x20, x19, -1 // j = i -1
for2tst:
   blt x20,x0,exit2 // go to exit2 if x20 < 0 (j < 0)
   slli x5,x20,3 // reg x5 = j * 8
   add x5,x10,x5 // reg x5 = v + (j * 8)
   1d x6,0(x5) // reg x6 = v[j]
   1d x7,8(x5) // reg x7 = v[j + 1]
   ble x6,x7,exit2 // go to exit2 if x6 \leq x7
   mv x21, x10 // copy parameter x10 into x21
   mv x22, x11 // copy parameter x11 into x22
   mv x10, x21 // first swap parameter is v
        x11, x20 // second swap parameter is j
   mv
     jal x1, swap // call swap
      addi x20, x20, -1 // j -= 1
          for2tst // branch to test of inner loop
 exit2:
```

### **Preserving Registers**



#### **□** Saving registers

```
sort: addi sp, sp, -40 // make room on stack for 5 registers sd x1, 32(sp) // save return address on stack sd x22, 24(sp) // save x22 on stack sd x21, 16(sp) // save x21 on stack sd x20, 8(sp) // save x20 on stack sd x19, 0(sp) // save x19 on stack
```

- Procedure body{Outer loop {Inner loop} }
- **□** Restoring registers

```
exit1: ld x19, 0(sp) // restore x19 from stack
ld x20, 8(sp) // restore x20 from stack
ld x21, 16(sp) // restore x21 from stack
ld x22, 24(sp) // restore x22 from stack
ld x1, 32(sp) // restore return address from stack
addi sp, sp, 40 // restore stack pointer
```

#### ■ Procedure return

```
jalr x0, 0(x0) // return to calling routine
```

## 2.14 Arrays versus Pointers



- **□** Array indexing involves
  - Multiplying index by element size
  - Adding to array base address
- □ Pointers correspond directly to memory addresses
  - Can avoid indexing complexity

### **Example: Clearing an Array**



```
clear1(int array[], int size) {
                                        clear2(int *array, int size) {
 int i:
                                          int *p:
 for (i = 0; i < size; i += 1)
                                          for (p = \&array[0]; p < \&array[size];
   array[i] = 0;
                                               p = p + 1
                                            p = 0;
       x5.0 // i = 0
  lί
                                           mv x5, x10 // p = address
loop1:
                                                          // of array[0]
  x6, x5, 3 // x6 = i * 8
                                           slli x6, x11, 3 // x6 = size * 8
  add x7,x10,x6 // x7 = address
                                           add x7, x10, x6 // x7 = address
                 // of array[i]
                                                          // of array[size]
  x0,0(x7) // array[i] = 0
                                        loop2:
                                           sd x0,0(x5) // Memory[p] = 0
  addi x5, x5, 1 // i = i + 1
  blt x5,x11,loop1 // if (i<size)
                                           addi x5, x5, 8 // p = p + 8
                                           bltu x5,x7,loop2
                     // go to loop1
                                                          // if (p<&array[size])</pre>
                                                          // go to loop2
```

## Comparison of Array vs. Ptr



- Multiply "strength reduced" to shift
- □ Array version requires shift to be inside loop
  - Part of index calculation for incremented i
  - c.f. incrementing pointer
- □ Compiler can achieve same effect as manual use of pointers
  - Induction variable elimination
  - Better to make program clearer and safer



# 2.16 Real Stuff: MIPS Instructions



- **MIPS: commercial predecessor to RISC-V**
- **□** Similar basic set of instructions
  - 32-bit instructions
  - 32 general purpose registers, register 0 is always 0
  - 32 floating-point registers
  - Memory accessed only by load/store instructions
    - □ Consistent use of addressing modes for all data sizes

### **□** Different conditional branches

- For <, <=, >, >=
- RISC-V: blt, bge, bltu, bgeu
- MIPS: slt, sltu (set less than, result is 0 or 1)
  - □ Then use beq, bne to complete the branch



# **Instruction Encoding**



| _ |     |     |    |     |     |     |    |
|---|-----|-----|----|-----|-----|-----|----|
| ĸ | ല   | 191 | ei | -re | ומי | IST | ei |
|   | ~ ~ |     |    |     |     | -   | •  |

|        | 31        | 25 2  | 24 20  | 19     | 15 | 14 12     | 11      | 7  | 6 |           | 0 |
|--------|-----------|-------|--------|--------|----|-----------|---------|----|---|-----------|---|
| RISC-V | funct7(7) |       | rs2(5) | rs1(5) |    | funct3(3) | rd(5)   |    |   | opcode(7) |   |
|        | 31        | 26 25 | 21 20  | 16     | 15 |           | 11 10   |    | 6 | 5         | 0 |
| MIPS   | Op(6)     |       | Rs1(5) | Rs2(5) |    | Rd(5)     | Const(5 | 5) |   | Opx(6)    |   |

#### Load

|        | 31    |          | 20 | 19 1   | 5 14   | 12 11 | 7       | 6  |           | 0 |
|--------|-------|----------|----|--------|--------|-------|---------|----|-----------|---|
| RISC-V | immed | iate(12) |    | rs1(5) | funct3 | (3)   | rd(5)   |    | opcode(7) |   |
|        | 31 26 | 25 21    | 20 | 16 1   | 5      |       |         |    |           | 0 |
| MIPS   | Op(6) | Rs1(5)   |    | Rs2(5) |        |       | Const(1 | 6) |           |   |

#### **Store**

|        | 31           | 25 24  | 20 19  | 15 14 12  | 11 7         | 6 0       |
|--------|--------------|--------|--------|-----------|--------------|-----------|
| RISC-V | immediate(7) | rs2(5) | rs1(5) | funct3(3) | immediate(5) | opcode(7) |
|        | 31 26        | 25 21  | 20 16  | 15        |              | 0         |
| MIPS   | Op(6)        | Rs1(5) | Rs2(5) |           | Const(16     | 5)        |

#### **Branch**

|        | 31           | 25 24  | 20 19      | 15 14 12  | 11 7         | 6 0       |
|--------|--------------|--------|------------|-----------|--------------|-----------|
| RISC-V | immediate(7) | rs2(5) | rs1(5)     | funct3(3) | immediate(5) | opcode(7) |
|        | 31 26        | 25 21  | 20 16      | 15        |              | 0         |
| MIPS   | Op(6)        | Rs1(5) | Opx/Rs2(5) |           | Const(16     | 6)        |



# 2.17 Real Stuff: The Intel x86 ISA



#### **■** Evolution with backward compatibility

- 8080 (1974): 8-bit microprocessor
  - Accumulator, plus 3 index-register pairs
- 8086 (1978): 16-bit extension to 8080
  - □ Complex instruction set (CISC)
- 8087 (1980): floating-point coprocessor
  - Adds FP instructions and register stack
- 80286 (1982): 24-bit addresses, MMU
  - Segmented memory mapping and protection
- 80386 (1985): 32-bit extension (now IA-32)
  - Additional addressing modes and operations
  - □ Paged memory mapping as well as segments



# The Intel x86 ISA



#### **□** Further evolution...

- i486 (1989): pipelined, on-chip caches and FPU
  - □ Compatible competitors: AMD, Cyrix, ...
- Pentium (1993): superscalar, 64-bit datapath
  - Later versions added MMX (Multi-Media eXtension) instructions
  - □ The infamous FDIV bug
- Pentium Pro (1995), Pentium II (1997)
  - New microarchitecture (see Colwell, *The Pentium Chronicles*)
- Pentium III (1999)
  - Added SSE (Streaming SIMD Extensions) and associated registers
- Pentium 4 (2001)
  - New microarchitecture
  - Added SSE2 instructions



## The Intel x86 ISA



#### □ And further...

- AMD64 (2003): extended architecture to 64 bits
- EM64T Extended Memory 64 Technology (2004)
  - AMD64 adopted by Intel (with refinements)
  - Added SSE3 instructions
- Intel Core (2006)
  - □ Added SSE4 instructions, virtual machine support
- AMD64 (announced 2007): SSE5 instructions
  - □ Intel declined to follow, instead...
- Advanced Vector Extension (announced 2008)
  - Longer SSE registers, more instructions
- ☐ If Intel didn't extend with compatibility, its competitors would!
  - Technical elegance ≠ market success



# Basic x86 Registers





# Basic x86 Addressing Modes



## **■** Two operands per instruction

| Source/dest operand | Second source operand |
|---------------------|-----------------------|
| Register            | Register              |
| Register            | Immediate             |
| Register            | Memory                |
| Memory              | Register              |
| Memory              | Immediate             |

### **■** Memory addressing modes

- Address in register
- $\blacksquare$  Address =  $R_{base}$  + displacement
- Address =  $R_{base} + 2^{scale} \times R_{index}$  (scale = 0, 1, 2, or 3)
- Address =  $R_{base} + 2^{scale} \times R_{index} + displacement$



# **x86 Instruction Encoding**





# □ Variable length encoding

- Postfix bytes specify addressing mode
- Prefix bytes modify operation
  - Operand length, repetition, locking, ...

# **Implementing IA-32**



# □ Complex instruction set makes implementation difficult

- Hardware translates instructions to simpler microoperations
  - Simple instructions: 1–1
  - □ Complex instructions: 1—many
- Microengine similar to RISC
- Market share makes this economically viable
- **□** Comparable performance to RISC
  - Compilers avoid complex instructions



# 2.18 Other RISC-V Instructions



# **□** Base integer instructions (RV64I)

- Those previously described, plus
- auipc rd, immed // rd = (imm<<12) + pc</li>
   follow by jalr (adds 12-bit immed) for long jump
- slt, sltu, slti, sltui: set less than (like MIPS)
- addw, subw, addiw: 32-bit add/sub
- sllw, srlw, srlw, slliw, srliw, sraiw: 32-bit shift

#### □ 32-bit variant: RV32I

registers are 32-bits wide, 32-bit operations



# **Instruction Set Extensions**



- M: integer multiply, divide, remainder
- **□** A: atomic memory operations
- □ F: single-precision floating point
- **□ D**: double-precision floating point
- □ C: compressed instructions
  - 16-bit encoding for frequently used instructions

## **Fallacies**



### $\square$ Powerful instruction $\Rightarrow$ higher performance

- Fewer instructions required
- But complex instructions are hard to implement
  - May slow down all instructions, including simple ones
- Compilers are good at making fast code from simple instructions

### **□** Use assembly code for high performance

- But modern compilers are better at dealing with modern processors
- More lines of code  $\Rightarrow$  more errors and less productivity



# **Fallacies**



# ■ Backward compatibility ⇒ instruction set doesn't change

But they do accrete more instructions



x86 instruction set



# **Pitfalls**



- Sequential words are not at sequential addresses
  - Increment by 4, not by 1!
- □ Keeping a pointer to an automatic variable after procedure returns
  - e.g., passing pointer back via an argument
  - Pointer becomes invalid when stack popped

# Summary



## **□** Design principles

- 1. Simplicity favors regularity
- 2. Smaller is faster
- 3. Good design demands good compromises
- Make the common case fast
- **■** Layers of software/hardware
  - Compiler, assembler, hardware
- **□ RISC-V:** typical of RISC ISAs
  - c.f. x86



## Homework



 $\square$  2.4, 2.8, 2.12, 2.14, 2.17, 2.22, 2.24, 2.29