

### University College London

DEPARTMENT OF ELECTRONIC AND ELECTRICAL ENGINEERING

## Performance characterisation of 8-bit RISC and OISC architectures

Author:
Mindaugas

JARMOLOVIČIUS
zceemja@ucl.ac.uk

Supervisor:
Prof. Robert
KILLEY
r.killey@ucl.ac.uk

Second Assessor:
Dr. Ed
ROMANS
e.romans@ucl.ac.uk

A BEng Project Final Report

 $March\ 27,\ 2020$ 

## 1 Abstract

### 2 Introduction

Since the 70s there has been a rise of many processor architectures that try to fulfil specific performance and power application constraints. One of more noticeable cases are ARM's RISC architecture being used in mobile devices instead of the more popular and robust x86 CISC (Complex Instruction Set Computer) architecture in favour of simplicity, cost and lower power consumption [11, 3]. It has been shown that in low power applications, such as IoTs (Internet of Things), OISC implementation can be superior in power and data throughput comparing to traditional RISC architectures [22, 2]. This project proposes to compare two novel RISC and OISC 8bit architectures and compare their performance, design complexity and efficiency.

Project 3 aims:

- 1. Compare how well OISC MOVE architecture would perform in low performance microcontroller application comparing to equivalent RISC architecture.
- 2. Study and explore RISC and OISC architectures.
- 3. View an alternative method of using OISC MOVE as SISO (single instruction, single operation) architecture.

The project has 3 main objectives:

- 1. Design and build a RISC based processor. As this is aimed for low power and performance applications it will be 8bit word processor with 4 general purpose registers, structure is similar to MIPS.
- 2. Design and build an OISC based processor. There are many different types of OISC processor, MOVE variant has been selected which is described in Referencessec:theory chapter. This type of OISC variant is also known as TTA (Transport Triggered Architecture).

3. Design a fair benchmark that both processors could execute. This benchmark include different algorithms that are commonly used in controllers, IoT devices or similar low power microprocessor applications.

## 3 Goals and Objective

This is just a list of research papers and relative context:

[19] - TTA instruction redundancy remoal method with base plus offset addressing load/store function unit (LSFU) [21] - TTA code compression using arithmetic coding [1] - Novel processor for Multiple Instruction Multiple Data packet triggered architecture for pipeline and parallel processing. [8] - Another template based compression method to improve code density

[18] - Scalable FIR filtering on TTA [5] - MOVE32INT TTA implementation. Achieved parallel processing with 80MHz 320Mops/s comparing to RISC 20MHz 20Mops/s. Includes automated design [16] - Parallel programming of a TTA for LDPC encoding application [6] - TTA for encryption specific ASIP [12] - Focuses on software pipelining and solved with GNU Linear Programming Kit (Very interesting) [15] - Try to reduce power by encoding buses thus reducing switching (read a bit more) [20] - Reducing VLIW interconnects to achieve 10% core energy in 4-issue VLIW [14] - Implementing TTA for SDR and focuses on power optimisations. It show 24.8-26.1% decrease in power consumption with 3.3% area increase. [13] - Using soft cores in comparision to VLIW to have 67% of resources with up to 88% improvement in execution time and 21-49% cost in program size. Low power implementation TTA for FFT [17] - Compressive Sensing Applications on ARM Cortex-A15, NIOS II and TTA architectures. TTA has lowest time and power consumption, however about 2.5 higher area to NIOS II

[10] - Implemented TTA that is efficient on RSA calculations, 3 1024bit pairs/s at 100MHz [4] - ASIP TTA for matrix inversion. [23] - Introduce Test space exploration costs for TTA templates. [7] - A novel microachitecture that combines VLIW and TTA for different applications. Takes less area than existing TTA and VLIW [9] - Instruction template based compression method for TTA processors

# 4 Theory and Analytical Bases

RISC that this paper will be exploring is classical SISO (single instruction, single operation) processor. TTAs are usually of type SIMT (single instruction, multiple transports) [5]; A middle between these two classes is SIMO type (single instruction, multiple operation)

#### Decided design criteria:

- Minimal instruction size
- Minimalistic design

## 5 Technical Method

This section describes methods and design choices used to construct two processors.

#### 5.1 Machine Code

#### 5.1.1 RISC

As the aim of instruction size to be as minimal as possible, RISC instruction decided to be 8bits with optional additional immediate value from 1 to 3 bytes. Immediate values are explained in section 5.4.

Decision was made to have instruction compose of operation code two operands - source/destination and source, which is similar to x86 architecture rather than MIPS. Three possible combinations of register address sizes are possible in such case from one to three bits. Two was selected as

it allow having four general purpose registers which is sufficient for most applications, and allow four bits for operation code - allowing up to 16 instructions.

Due to small amount of available operation codes and not all instructions requiring two operands (for example JUMP instruction may not need any operands or could use one operand to have address offset), other two type instructions are added to the design - with one and zero operands. See figure 5.1.1. This enabled processor to have 45 different instructions while maintaining minimal instruction size. Final design has:

- 8 2-operand instructions
- 32 1-operand instructions
- **5** 0-operand instructions

Full list of RISC instructions are listed in table 9.1.1 in Appendix section.



Figure 5.1.1: RISC instructions composition. Number inside box represents bit index. Destination (dst.) bits represents of source and destination register address.

#### 5.1.2 OISC

As OISC requires only a single instruction, composition of instruction mainly requires two parts - source and destination. To allow higher instruction flexibility a immediate bit has been added to replace source address by immediate value. Composition of finalised machine code is shown in figure 5.1.2.



Figure 5.1.2: OISC instruction composition. Number inside box represents bit index.

Decision was made to have source address to be eight bits to allow it be replaced with immediate value. Destination address was chosen to be as minimal as possible, leaving only four bits or 16 possible destinations. Final design has 15 destination and 41 source addresses. This is not the most space efficient design as 41 source addresses would require only six bits for address, wasting two bits every time non-immediate source is used.

Full list of OISC sources and destinations are listed in table 9.1.2 in Appendix section.

## 5.2 Arithmetic Logic Unit

This section will discuss ALU implementations of both processors. For fair comparison between OISC and RISC, ALU in both system will have the same capabilities described in table 5.2.1.

| Name | Description                      |
|------|----------------------------------|
| ADD  | Arithmetic addition (inc. carry) |
| SUB  | Arithmetic subtraction (inc.     |
|      | carry)                           |
| AND  | Bitwise AND                      |
| OR   | Bitwise OR                       |
| XOR  | Bitwise XOR                      |
| SLL  | Shift left logical               |
| SRL  | Shift right logical              |
| ROL  | Shifted carry from previous SLL  |
| ROR  | Shifted carry from previous SRL  |
| MUL  | Arithmetic multiplication        |
| DIV  | Arithmetic division              |
| MOD  | Arithmetic modulus               |

Table 5.2.1: Supported ALU commands for both processors

#### 5.2.1 OISC

Due to the structure of OISC processor, ALU source A and B are two latches that are written into when ALU0 or ALU1 destination address is present. ALU sources are connected with every ALU operator and



Figure 5.2.1: Digital diagram of OISC partial ALU logic

performed in single clock cycle. This value is stored in register so that it would immediately available in a next clock cycle as a source data. Figure 5.2.1 represents logic diagram of ALU with only addition and multiplication operators present. Note that output of EQ3 is connected to enable of REG3, enabling output of carry to be only read after ADD source is requested. This previous source memory is also used for SUB, ROL and ROR operations. This allows processor to perform other operations such as store or load values, before accessing carry bit, or carried byte for ROL and ROR operations.

#### 5.2.2 RISC

RISC processor has very similar structure to OISC with two exceptions. Inputs to ALU comes from logic router that decided how to route data in datapath. Output buffers are replaced by one multiplexer that selects single output from all ALU operations. Another point is that RISC ALU output is 16bit, higher byte saved in "ALU high byte register" for MUL, MOD, ROL and ROR operations. This register is accessible with GETAH instruction.

### 5.3 Memory

This section describes how instruction memory (ROM) is implemented for both processors.

#### 5.3.1 RISC

In order to allow dynamic instruction size from one to four bytes a special memory arrangement is made. A system was required to access word (8bits) from memory and next three words. To achieve this four ROM blocks been utilised, each containing one fourth of sliced original data. Input address is offset by adders *ADDER1-3* and further divided by four by removing two least significant bits at **addr0-3**. Before concatenating output of each ROM block

into final four bytes, ROM outputs  $\mathbf{q0-3}$  are rearranged depending on  $\mathbf{ar}$  signal. Note that MUX1-4 each input is different, this may be better visualised with Verilog code in listing 1.

Listing 1: RISC sliced ROM memory multiplexer arrangement Verilog code

```
case(ar)
2'b00: data={q3,q2,q1,q0};
2'b01: data={q0,q3,q2,q1};
2'b10: data={q1,q0,q3,q2};
2'b11: data={q2,q1,q0,q3};
endcase
```

#### 5.3.2 OISC

OISC instructions are fixed 13 bits, which causes different problems to RISC sliced memory - non-standard memory word size. To implement ROM in FPGA, Altera Cyclone IV M9K memory configurable blocks were used. Each blocks as 9kB of memory each allowing 1024x9bit configuration. Combining three of such blocks together yields 27bits if readable data in single clock cycle. To store instruction code to such configuration, pairs of instruction machine code sliced into three parts plus one bit for parity check, see figure 5.3.2. Circuit extracting each instruction is fairly simple, shown in figure 5.3.3.

## 5.4 Instruction decoding

This section describes RISC and OISC differences between instruction decoding and immediate value handling.

#### 5.4.1 RISC

Already described in previous section 5.3, instruction from memory comes as 4 bytes. Least significant byte is sent to control block, other three bytes are sent to immediate override block (IMO) shown in figure 5.4.1. These three bytes are labelled as **immr**.



Figure 5.3.1: Digital diagram of RISC sliced ROM memory logic

IMO block is a solution to change immediate value which enabled dynamically calculated memory pointers, branches dependant on register value or any other function that needs instruction immediate value been replaced by calculated register value. IMO is controlled by control block and cdi.imoctl signal, which is changed by CIO, CI1 and CI2 instructions. When signal is 0h, this block is transparent connecting immr directly to imm. When any of CI instructions executed, one of IMO register is overridden by reg1 value from register file. In order to override two or three bytes of immediate, CI instructions need to be executed in order. Only for one next instruction after last CI will have immediate bytes changed depending on what are values in IMO registers.

This circuit has two disadvantages:

1. Overriding immediate bytes takes one or more clock cycles,

2. At override, **immr** bytes are ignored therefore they are wasting instruction memory space.

Second point can be resolved by designing a circuit that would subtract the amount of overridden IMO bytes from pc\_off signal (program counter offset that is dependant on i-size value) at the program counter, thus effectively saving instruction memory space. This solution however would introduce a complication with the assembler as additional checks would need to be done during compiling to check if IMO instruction are used.

#### 5.4.2 **OISC**

OISC immediate value is set in instruction decoder shown in figure 5.4.2. Decoder operation is simple - instruction machine code is split into three parts as described in 5.1.2. If instruction source address is 00h, connect data bus with constant 0 via MUX2.



Figure 5.3.2: OISC three memory words composition. Number inside box represents bit index.



Figure 5.3.3: Digital diagram of OISC instruction ROM logic



Figure 5.4.1: Digital diagram of RISC immediate override system



Figure 5.4.2: Digital diagram of OISC instruction decoder

If immediate bit is 1, set source address to 00h (to make sure no other buffer source connects to data bus), and connect instruction source address (immediate value) to databus via MUX2 and BUF1.

## 6 Results and Analysis

# 6.1 FPGA logic component composition

This subsection looks at test and its results to find how much FPGA logic components each processor takes and what is composition of each part.

Test was performed with Quartus synthesis tool and viewing flow summary report. This report includes synthesised design metrics including total logic elements, registers, memory bits and other FPGA resources. Test will only look at logic elements and registers. Total number of logic elements was found out by synthesising full processors, then commenting relevant parts of code, re-synthesising and viewing changes in total logic elements. Such method may not be the most accurate, because during HDL synthesis circuit is optimised an unused connections removed. This means that more logic may be not synthesised than intended.

There are four parts of each processor that will be tested:

- 1. Common processor auxiliary logic that is used by both processors. It includes communication block with UART, RAM and PLL (Phase-Locked Loop, for master clock generation).
- 2. **ALU** as described in section 5.2, both processors have slightly different implementation of ALU.
- 3. **Memory** processors memory management, including stack.
- 4. Other reminding logic of processor that was not analysed.



Figure 6.1.1: Bar graph of FPGA logic components taken by each processor.

Results of a test are shown in figures 6.1.1 and 6.1.2. Common logic uses 293 logic elements and 170 registers. OISC uses 1705 logic elements, while RISC uses 3218. Excluding common logic, OISC takes 48.3% of RISC's logic elements.



Figure 6.1.2: Bar graph of FPGA register resources taken by each processor.

OISC uses 726 logic elements, while RISC uses 407. Excluding common logic, OISC uses 78.4% more registers than RISC.

Looking at composition, OISC ALU takes 30.2% more logic gates. Looking at figure 6.1.2, high number of OISC ALU registers can be observed which concludes, that higher resource usage is OISC ALU code include buffer logic.

Memory logic elements composition of OISC is only 34.4% of RISC's and 7% lower Function composition register resources, comparing to RISC. following code:

This indicate that by removing memory logic for RISC, synthesis tool may removed also other parts of processor, possibly part of control block because it mostly contains combinational logic.

Other logic includes instruction decoding with ROM, register file, program counter. RISC exclusively has control block. Note that OISC uses only three ROM memory blocks whereas RISC uses four as explained in section 5.3, however this should make a minimal difference as M9K memory blocks are not included in FPGA logic element or register count. Comparing both processors, OISC has only 37% of other logic components to RISC, however it has 2.28 times more registers. This shows a logic component - register trade-off. OISC buffer and common registers logic that connects bus require many more registers whereas RISC uses combination logic in control block in order to control same data in datapath.

Much higher logic components in RISC can be also explained more complicated register file, ROM memory logic and program counter. All of these components has some additional logic for timing correction or additional functionality required by these blocks integration into datapath.

## 6.2 Benchmark Programs

#### 6.2.1 Number of instructions

#### 6.2.2 Instruction composition

Function composition was executed with following code:

Listing 2: RISC assembly frame for executring tests

```
setup:

JUMP .start
.done:

JUMP .done
.start:

; Setup values
; Call function
JUMP .done
```

Listing 3: OISC assembly frame for executring tests

```
setup:

BR1 .start @1

BR0 .start @0

BRZ 0x00

.done:

BRZ 0x00

.start:

; Setup values
; Call function

BR1 .done @1

BR0 .done @0

BRZ 0x00
```

#### 6.3 Maximum clock frequency

6.4

### 7 Conclusion

#### 8 References

## References

[1] F. Adamec and T. Fryza. "Introduction to the new Packet Triggered Architecture for pipelined and parallel data processing". In: *Proceedings of 21st International Conference Radioelektronika 2011*. 2011, pp. 1–4.

- [2] Tanvir Ahmed et al. "Synthesizable-from-C Embedded Processor Based on MIPS-ISA and OISC". In: 2015. DOI: 10.1109/euc.2015.23.
- [3] E. Blem, J. Menon, and K. Sankaralingam. "Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures". In: 2013. DOI: 10.1109/hpca.2013.6522302.
- [4] A. Burian, P. Salmela, and J. Takala. "Complex fixed-point matrix inversion using transport triggered architecture". In: 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP'05). 2005, pp. 107–112.
- [5] H. Corporaal. "Design of transport triggered architectures". In: *Proceedings of 4th Great Lakes Symposium on VLSI*. 1994, pp. 130–135.
- [6] P. Hamalainen et al. "Implementation of encryption algorithms on transport triggered architectures". In: ISCAS 2001. The 2001 IEEE International Symposium on Circuits and Systems (Cat. No.01CH37196). Vol. 4. 2001, 726–729 vol. 4.
- [7] S. Hauser, N. Moser, and B. Juurlink. "SynZEN: A hybrid TTA/VLIW architecture with a distributed register file". In: *NORCHIP 2012*. 2012, pp. 1–4.
- [8] J. Heikkinen et al. "Evaluating template-based instruction compression on transport triggered architectures". In: The 3rd IEEE International Workshop on System-on-Chip for Real-Time Applications, 2003. Proceedings. 2003, pp. 192–195.
- [9] J. Helkala et al. "Variable length instruction compression on Transport Triggered Architectures". In: 2014 International Conference on Embedded Computer Systems: Architectures,

- Modeling, and Simulation (SAMOS [17] XIV). 2014, pp. 149–155.
- [10] J. Hu et al. "A Novel Architecture for Fast RSA Key Generation Based on RNS". In: 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming. 2011, pp. 345–349.
- [11] T. Jamil. "RISC versus CISC". In: vol. 14. 3. 1995, pp. 13–16. DOI: 10. 1109/45.464688.
- [12] L. Jiang, Y. Zhu, and Y. Wei. "Software Pipelining with Minimal Loop Overhead on Transport Triggered Architecture". In: 2008 International Conference on Embedded Software and Systems. 2008, pp. 451–458.
- [13] P. Jääskeläinen et al. "Transport-Triggered Soft Cores". In: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 2018, pp. 83–90.
- [14] J. Multanen et al. "Power optimizations for transport triggered SIMD processors". In: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS). 2015, pp. 303–309.
- [15] T. Pionteck et al. "Hardware evaluation of low power communication mechanisms for transport-triggered architectures". In: 14th IEEE International Workshop on Rapid Systems Prototyping, 2003. Proceedings. 2003, pp. 141–147.
- [16] B. Rister et al. "Parallel programming of a symmetric transport-triggered architecture with applications in flexible LDPC encoding". In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2014, pp. 8380–8384.

- M. Safarpour, I. Hautala, and O. Silvén. "An Embedded Programmable Processor for Compressive Sensing Applications". In: 2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC). 2018, pp. 1–5.
- [18] P. Salmela et al. "Scalable FIR filtering on transport triggered architecture processor". In: International Symposium on Signals, Circuits and Systems, 2005. ISSCS 2005. Vol. 2. 2005, 493–496 Vol. 2.
- [19] Su Wang et al. "An instruction redundancy removal method on a transport triggered architecture processor". In:

  Proceedings of the 2009 12th International Symposium on Integrated Circuits. 2009, pp. 602–604.
- [20] T. Viitanen et al. "Heuristics for greedy transport triggered architecture interconnect exploration". In: 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES). 2014, pp. 1–7.
- [21] J. Wei et al. "Program Compression Based on Arithmetic Coding on Transport Triggered Architecture". In: 2008 International Conference on Embedded Software and Systems Symposia. 2008, pp. 126–131.
- [22] Minato Yokota, Kaoru Saso, and Yuko Hara-Azumi. "One-instruction set computer-based multicore processors for energy-efficient streaming data processing". In: 2017. DOI: 10. 1145/3130265.3130318.
- [23] V. A. Zivkovic, R. J. W. T. Tangelder, and H. G. Kerkhoff. "Design and test space exploration of transport-triggered architectures". In: *Proceedings Design, Automation and Test in Europe Conference and Exhibi-*

tion 2000 (Cat. No. PR00537). 2000, pp. 146–151.

[24] J. ádník and J. Takala. "Low-power Programmable Processor for Fast Fourier Transform Based on Transport Triggered Architecture". In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2019, pp. 1423–1427.

## 9 Appendix

## 9.1 Processor instruction set tables

Table 9.1.1: Instruction set for RISC processor. \* Required immediate size in bytes

| Instr. | Description                                          | I-size * |  |  |
|--------|------------------------------------------------------|----------|--|--|
|        | 2 register instructions                              |          |  |  |
| MOVE   | Copy value from one register to other                | 0        |  |  |
| ADD    | Arithmetical addition                                | 0        |  |  |
| SUB    | Arithmetical subtraction                             | 0        |  |  |
| AND    | Logical AND                                          | 0        |  |  |
| OR     | Logical OR                                           | 0        |  |  |
| XOR    | Logical XOR                                          | 0        |  |  |
| MUL    | Arithmetical multiplication                          | 0        |  |  |
| DIV    | Arithmetical division (inc. modulus)                 | 0        |  |  |
|        | 1 register instructions                              |          |  |  |
| COPY0  | Copy intimidate to a register 0                      | 1        |  |  |
| COPY1  | Copy intimidate to a register 1                      | 1        |  |  |
| COPY2  | Copy intimidate to a register 2                      | 1        |  |  |
| COPY3  | Copy intimidate to a register 3                      | 1        |  |  |
| ADDC   | Arithmetical addition with carry bit                 | 0        |  |  |
| ADDI   | Arithmetical addition with immediate                 | 1        |  |  |
| SUBC   | Arithmetical subtraction with carry bit              | 0        |  |  |
| SUBI   | Arithmetical subtraction with immediate              | 1        |  |  |
| ANDI   | Logical AND with immediate                           | 1        |  |  |
| ORI    | Logical OR with immediate                            | 1        |  |  |
| XORI   | Logical XOR with immediate                           | 1        |  |  |
| CI0    | Replace intimidate value byte 0 for next instruction | 1        |  |  |
| CI1    | Replace intimidate value byte 1 for next instruction | 1        |  |  |
| CI2    | Replace intimidate value byte 2 for next instruction | 1        |  |  |
| SLL    | Shift left logical                                   | 1        |  |  |
| SRL    | Shift right logical                                  | 1        |  |  |
| SRA    | Shift right arithmetical                             | 1        |  |  |
| LWHI   | Load word (high byte)                                | 3        |  |  |
| SWHI   | Store word (high byte, reg. only)                    | 0        |  |  |
| LWLO   | Load word (low byte)                                 | 3        |  |  |
| SWLO   | Store word (low byte, stores high byte reg.)         | 3        |  |  |
| INC    | Increase by 1                                        | 0        |  |  |
| DEC    | Decrease by 1                                        | 0        |  |  |
| GETAH  | Get ALU high byte reg. (only for MUL & DIV & ROL &   | 0        |  |  |
|        | ROR)                                                 |          |  |  |
| GETIF  | Get interrupt flags                                  | 0        |  |  |
| PUSH   | Push to stack                                        | 0        |  |  |
| POP    | Pop from stack                                       | 0        |  |  |
| COM    | Send/Receive to/from com. block                      | 1        |  |  |
| BEQ    | Branch on equal                                      | 3        |  |  |
| BGT    | Branch on greater than                               | 3        |  |  |

Table 9.1.1: Instruction set for RISC processor. \* Required immediate size in bytes

| Instr.                  | Description                        | I-size * |
|-------------------------|------------------------------------|----------|
| BGE                     | Branch on greater equal than       | 3        |
| BZ                      | Branch on zero                     | 2        |
| 0 register instructions |                                    |          |
| CALL                    | Call function, put return to stack | 2        |
| RET                     | Return from function               | 0        |
| JUMP                    | Jump to address                    | 2        |
| RETI                    | Return from interrupt              | 0        |
| INTRE                   | Set interrupt entry pointer        | 2        |

Table 9.1.2: Instructions for OISC processor.

| Name                  | Description                                                 |  |
|-----------------------|-------------------------------------------------------------|--|
| Destination Addresses |                                                             |  |
| ACC0                  | Set ALU source A accumulator                                |  |
| ACC1                  | Set ALU source B accumulator                                |  |
| BR0                   | Set Branch pointer register (low byte)                      |  |
| BR1                   | Set Branch pointer register (high byte)                     |  |
| BRZ                   | If source value is 0, set program counter to branch pointer |  |
| STACK                 | Push value to stack                                         |  |
| MEM0                  | Set Memory pointer register (low byte)                      |  |
| MEM1                  | Set Memory pointer register (middle byte)                   |  |
| MEM2                  | Set Memory pointer register (high byte)                     |  |
| MEMHI                 | Save high byte to memory at memory pointer                  |  |
| MEMLO                 | Save low byte to memory at memory pointer                   |  |
| COMA                  | Set communication block address register                    |  |
| COMD                  | Send value to communication block                           |  |
| REG0                  | Set general purpose register 0                              |  |
| REG1                  | set general purpose register 1                              |  |
|                       | Source Addresses                                            |  |
| NULL                  | Get constant 0                                              |  |
| ALU0                  | Get value at ALU source A accumulator                       |  |
| ALU1                  | Get value at ALU source B accumulator                       |  |
| ADD                   | Get Arithmetical addition of ALU sources                    |  |
| ADDC                  | Get Arithmetical addition carry                             |  |
| ADC                   | Get Arithmetical addition of ALU sources and carry          |  |
| SUB                   | Get Arithmetical subtraction of ALU sources                 |  |
| SUBC                  | Get Arithmetical subtraction carry                          |  |
| SBC                   | Get Arithmetical subtraction of ALU sources and carry       |  |
| AND                   | Get Logical AND of ALU sources                              |  |
| OR                    | Get Logical OR of ALU sources                               |  |
| XOR                   | Get Logical XOR of ALU sources                              |  |
| SLL                   | Get ALU source A shifted left by source B                   |  |
| SRL                   | Get ALU source A shifted right by source B                  |  |
| ROL                   | Get rolled off value from previous SLL instance             |  |
| ROR                   | Get rolled off value from previous SRL instance             |  |

Table 9.1.2: Instructions for OISC processor.

| Name  | Description                                                |
|-------|------------------------------------------------------------|
| MULLO | Get Arithmetical multiplication of ALU sources (low byte)  |
| MULHI | Get Arithmetical multiplication of ALU sources (high byte) |
| DIV   | Get Arithmetical division of ALU sources                   |
| MOD   | Get Arithmetical modulus of ALU sources                    |
| EQ    | Check if ALU source A is equal to source B                 |
| GT    | Check if ALU source A is greater than source B             |
| GE    | Check if ALU source A is greater or equal to source B      |
| NE    | Check if ALU source A is not equal to source B             |
| LT    | Check if ALU source A is less than source B                |
| LE    | Check if ALU source A is less or equal to to source B      |
| BR0   | Get Branch pointer register value (low byte)               |
| BR1   | Get Branch pointer register value (high byte)              |
| PC0   | Get Program counter value (low byte)                       |
| PC1   | Get Program counter value (high byte)                      |
| MEM0  | Get Memory pointer register value (low byte)               |
| MEM1  | Get Memory pointer register value (middle byte)            |
| MEM2  | Get Memory pointer register value (high byte)              |
| MEMHI | Load high byte from memory at memory pointer               |
| MEMLO | Load low byte from memory at memory pointer                |
| STACK | Pop value from stack                                       |
| ST0   | Get stack address value (low byte)                         |
| ST1   | Get stack address value (high byte)                        |
| COMA  | Get communication block address register value             |
| COMD  | Read value from communication block                        |
| REG0  | Get value from general purpose register 0                  |
| REG1  | Get value from general purpose register 1                  |