

### University College London

DEPARTMENT OF ELECTRONIC AND ELECTRICAL ENGINEERING

### Performance characterisation of 8-bit RISC and OISC architectures

Author:
Mindaugas
JARMOLOVICIUS
zceemja@ucl.ac.uk

Supervisor:
Prof. Robert
KILLEY
r.killey@ucl.ac.uk

Second Assessor:
Dr. Ed
ROMANS
e.romans@ucl.ac.uk

A BEng Project Interim Report

December 13, 2019

### 1 Abstract

This project investigates area / performance / power / complexity benefits and trade-offs of One Instruction Set Computer (OISC¹) in comparison to traditional Reduced Instruction Set Computer (RISC) architectures in a general computing application. It is shown that SUBLEQ OISC performs better in special cases, however there is a lack of research info more complicated OISC architectures designed for general computing. At this point the project is mostly on track with the benchmark being behind schedule.

### 2 Introduction

Since the 80s there has been a rise of many processor architectures that try to fulfil specific performance and power application constraints. One of more noticeable cases are ARM's RISC architecture being used in mobile devices instead of the more popular and robust x86 CISC (Complex Instruction Set Computer) architecture in favour of simplicity, cost and lower power consumption [6, 2]. It has been shown that in low power applications, such as IoTs (Internet of Things), OISC implementation can be superior in power and data throughput comparing to traditional RISC architectures [10, 1]. This project proposes to compare two novel RISC and OISC architectures and compare their performance, design complexity and efficiency.

The project has 3 main objectives:

- Design and build a RISC based processor. As this is aimed for low power and performance applications it will be 8bit word processor with 4 general purpose registers.
- Design and build an OISC based processor. There are multiple different implementations that are still under consideration, such as SUBLEQ or one proposed in chapter 3.1.

• Design a fair benchmark that both processors could execute. This benchmark may include different algorithms that are commonly used in controllers, IoT devices or similar low power microprocessor applications.

The following chapters will describe the estimated project outcome, project schedule and work completed so far.

# 3 The Work Performed to Date

### 3.1 Supporting Theory

This section explains the theory and predictions of RISC and OISC architectures.

Figure 3.1.1 represents simplified diagrams of RISC and OISC architectures. In RISC and CISC architecture, program data travels from program memory to the control block where instruction is decoded. Then control block further decides how data is directed in the datapath block which is described in section 3.3.6. Such structure requires a complicated control block and additional data routing blocks. In order to increase performance of one such processor you would need to add pipelining or multiple cores. Both methods have disadvantages: multicore processor requires software adjustments and each core doubles the control and datapath blocks, substantially increasing transistor count; pipelinig allows operation at higher frequencies however it brings design complications such as complicated hazard prevention logic and instruction lookup. RISC architecture in this project is mainly based on theory in [5]. The simplicity of OISC architecture overcomes these disadvantages:

Pipelining can be done by individual blocks and programmibly waiting for results, this is represented in figure 3.1.1b Adder and Multiply vertical blocks, multicore can be simulated by adding more data and instruc-

<sup>&</sup>lt;sup>1</sup>Also known as URISC (Ultimate Reduced Instruction Set Computer)

tion buses, hazards can be prevented with software and/or integrated into address registers.

ALU and other processor components can be divided by adding different address registers. This allow utilisation of multiple components at the same time given that multiple data buses are used. This is represented in figure 3.1.1b Arithmetic Unit horizontal blocks. Assuming 4 data and instructions buses are used, **AND** and **OR** blocks sources A and B can all be written during one cycle utilising both blocks at the same time.

These principal functions should allow OISC architecture to have advantage in performance and power consumption while having lower transistor count and this is the main motivation behind the project. This expectation is supported mainly by the following papers:

- Using OISC SUBLEQ as a coprocessor for the MIPS-ISA processor to emulate the functionality of different classes shows desirable area/performance/power trade-offs [1].
- Comparing OISC SUBLEQ multicore to RISC achieves better performance and lower energy for streaming data processing [10].

There is a lack of research investigating and comparing more general purpose OISC non-SUBLEQ processor. The main theory for OISC architecture used in this project will be based on [9, 4, 7, 3].



(b) Single data bus OISC microarchitecture diagram  $\,$ 

Reg

0

Reg

i

Register File

Reg

i+1

**INSTR. BUS** 

· DATA BUS

data

INSTR. BUS ----

DATA BUS \*\*\*

**RAM** 

read data

reg

Memory

Figure 3.1.1: Simplified diagrams of both architectures. Blue lines indicate control/instruction buses and black - data buses

### 3.2 Project Scheduling

As it can be seen in table 1 below, the project is mainly split into Term 1 which is dedicated to RISC and Term 2 which is dedicated to OISC implementation. Approximately 3 weeks before the final report submission are left to have enough spare time to finish all tests and complete the poster & the final report itself. It is also expected a lot of coursework to be assigned around this time. Currently the benchmark development is behind schedule, more information in section 4.1.1.



Table 1: Project schedule Grantt chart

### 3.3 Progress to Date

This section mainly includes progress on RISC processor and its components.

### **3.3.1** Memory

The initial plan was to use 32MiB 16bit SDRAM chip located on FPGA board. After successfully simulating most of the processor functions, the next step was to synthesise and run it on the FPGA which brought 2 problems:

- (1) Flashing program to ROM was not simple because generic Verilog's unpacked register array cannot be initialised from file. The solution to this problem was solved by using FPGA's built-in M9K memory that allows flexible RAM/ROM configuration. In addition, M9K memory can be read from/written to via JTAG connection without affecting FPGA's operation which enabled a quick method to upload programs without the need to resynthesise processor HDL code.
- (2) Used SDRAM memory controller runs at much higher frequency than the processor (at 100MHz versus 1MHz). The initial implementation of the interface between the two was multiple 1-word length FIFO (First-In-First-Out) registers which caused memory read operation to take 2 processor cycles. 3 possible solutions were considered suspend the processor clock while memory data is read which would harm performance; remove FIFO registers and rely on SDRAM clock being much greater than the

processor clock which is not ideal if higher processor clocks are intended to be used; use M9K memory. The last option was chosen due to ease of implementation and ability to read RAM content via JTAG for debugging purposes.

#### 3.3.2 Instructions

Table 2 below represents RISC processor instructions that been implemented so far. In this processor design, instructions are 8 bit size, where 5th and 6th bits points to register 1 address and 7th and 8th bits to register 2 address. Instructions are separated in 3 groups:

- 2 register uses both registers,
- 1 register uses only register 1,
- 0 register do not use registers.

Such grouping allowed to encode more instructions in 8 bits. At every processor cycle 4 bytes of instructions are fetched, first byte encodes instruction, last 3 may encode immediate value. This allows jump and conditional branch instructions to use 16bit jump address. Memory instructions LWHI, SWHI, LWLO and SWLO can use 24bit address that allows it to access all addresses of SDRAM chip that been described in section 3.3.1. MOVE instruction is a special case, if register addresses 1 and 2 are different, data will be copied from register address 2 to 1. If these register address values are the same, the value from the immediate will be copied to register 1 address.

Table 2: Instruction set for RISC processor. \* Required immediate size in bytes

| Instr.                  | Description                 | I-size * | Completed |  |
|-------------------------|-----------------------------|----------|-----------|--|
| 2 register instructions |                             |          |           |  |
| MOVE                    | Copy intimidate or register | 1 or 0   | X         |  |
| ADD                     | Arithmetical addition       | 0        | X         |  |
| SUB                     | Arithmetical subtraction    | 0        | x         |  |
| AND                     | Logical AND                 | 0        | X         |  |
| OR                      | Logical OR                  | 0        | X         |  |
| XOR                     | Logical XOR                 | 0        | X         |  |
| MUL                     | Arithmetical multiplication | 0        | x         |  |

Table 2: Instruction set for RISC processor. \* Required immediate size in bytes

| Instr.                  | Description                                          | I-size * | Completed |  |  |  |
|-------------------------|------------------------------------------------------|----------|-----------|--|--|--|
| DIV                     | Arithmetical division (inc. modulus)                 | 0        | X         |  |  |  |
|                         | 1 register instructions                              |          |           |  |  |  |
| CI0                     | Replace intimidate value byte 0 for next instruction | 1        | X         |  |  |  |
| CI1                     | Replace intimidate value byte 1 for next instruction | 1        | X         |  |  |  |
| CI2                     | Replace intimidate value byte 2 for next instruction | 1        | X         |  |  |  |
| SLL                     | Shift left logical                                   | 1        |           |  |  |  |
| SRL                     | Shift right logical                                  | 1        |           |  |  |  |
| SRA                     | Shift right arithmetical                             | 1        |           |  |  |  |
| LWHI                    | Load word (high byte)                                | 3        | X         |  |  |  |
| SWHI                    | Store word (high byte, reg. only)                    | 0        | X         |  |  |  |
| LWLO                    | Load word (low byte)                                 | 3        | X         |  |  |  |
| SWLO                    | Store word (low byte, stores high byte reg.)         | 3        | X         |  |  |  |
| INC                     | Increase by 1                                        | 0        | X         |  |  |  |
| DEC                     | Decrease by 1                                        | 0        | X         |  |  |  |
| GETAH                   | Get ALU high byte reg. (only for MUL & DIV)          | 0        | X         |  |  |  |
| GETIF                   | Get interrupt flags                                  | 0        | X         |  |  |  |
| PUSH                    | Push to stack                                        | 0        | X         |  |  |  |
| POP                     | Pop from stack                                       | 0        | X         |  |  |  |
| COM                     | Send/Receive to/from com. block                      | 1        | X         |  |  |  |
| ADDI                    | Arithmetical addition with intimidate                | 1        | X         |  |  |  |
| SUBI                    | Arithmetical subtraction with intimidate             | 1        | X         |  |  |  |
| ANDI                    | Logical AND with intimidate                          | 1        | X         |  |  |  |
| ORI                     | Logical OR with intimidate                           | 1        | X         |  |  |  |
| XORI                    | Logical XOR with intimidate                          | 1        | X         |  |  |  |
| BEQ                     | Branch on equal                                      | 3        | X         |  |  |  |
| BGT                     | Branch on greater than                               | 3        | X         |  |  |  |
| BGE                     | Branch on greater equal than                         | 3        | X         |  |  |  |
| BZ                      | Branch on zero                                       | 2        | X         |  |  |  |
| 0 register instructions |                                                      |          |           |  |  |  |
| CALL                    | Call function, put return to stack                   | 2        | X         |  |  |  |
| RET                     | Return from function                                 | 0        | X         |  |  |  |
| JUMP                    | Jump to address                                      | 2        | X         |  |  |  |
| RJUMP                   | Relative jump                                        | 2        |           |  |  |  |
| RETI                    | Return from interrupt                                | 0        | X         |  |  |  |
| INTRE                   | Set interrupt entry pointer                          | 2        | X         |  |  |  |
| CLC                     | Clear ALU carry-in                                   | 0        |           |  |  |  |
| SETC                    | Set ALU carry-in                                     | 0        |           |  |  |  |
| CLS                     | Clear ALU sign                                       | 0        |           |  |  |  |
| SETS                    | Set ALU sign                                         | 0        |           |  |  |  |
| SSETS                   | Enable ALU sign                                      | 0        |           |  |  |  |
| CLN                     | Clear ALU negative                                   | 0        |           |  |  |  |
| SETN                    | Set ALU negative                                     | 0        |           |  |  |  |
| SSETN                   | Enable ALU negative                                  | 0        |           |  |  |  |

#### 3.3.3 Program Counter

Figure 3.3.1 represents the digital diagram for program counter. There are a few key features about this design: it can take values from memory for RET instruction; immediate value ( $PC\_IMM2$  is shifted by 1 byte to allow BEQ, BGT, BGE instructions as first immediate byte used as ALU source B); can jump to interrupt address; produces  $pc\_halted$  signal when memory is read (RET instruction takes 2 cycles, because cycle one fetches the address from stack and second cycle fetches the instruction from the instruction memory).

#### 3.3.4 Immediate Override

Due to the limited amount of instructions available and processor data width being 1 byte comparing to 2 bytes for instruction address and 3 bytes for memory address, there is no easy way to operate program pointers that can be dynamically calculated which limits program flexibility. Solution to this is IMO (Immediate Override), the logic for which is represented in figure 3.3.2.

This circuit is connected between instruction memory's output higher 3 bytes (the immediate value, *immr* bus) and immediate bus (*imm*) at datapath. Override bytes can be written with CIO, CI1 and CI2 instructions. On CIO a flag for the next clock cycle is enabled that replaces *imm* value by ones stored in IMO registers. This circuit has two disadvantages:

- 1. Overriding immediate bytes takes one or more clock cycles,
- 2. At override, *immr* bytes are ignored therefore they are wasting instruction memory space.

Point 2 can be resolved by designing a circuit that would subtract the amount of overridden IMO bytes from pc\_off signal (program counter offset that is dependant on i-size value) at the program counter, thus effectively saving instruction memory space. This solution however would introduce a complication with the assembler as additional checks would need to be done during compiling to check if IMO instruction are used.



Figure 3.3.1: Digital diagram of RISC program counter



Figure 3.3.2: Digital diagram of RISC immediate override logic

#### 3.3.5 Stack Pointer

RISC processor implements the stack pointer that is used in PUSH, POP, CALL and RET instructions. The stack pointer's initial address starts at the highest memory address (0xFFFF) and subtracts 1 when data is put to stack. Figure 3.3.3 represents the digital diagram for stack pointer. Note that the stack is only 16bit in size and the most significant byte is set to 0xFF. The stack pointer circuit also supports  $pc\_halted$  signal from program counter to prevent the

stack pointer from being added by 1 twice during RET instruction.

One of the problems with the current stack pointer implementation is 8bit data stored in 16bit memory address, wasting a byte. This can be avoided by adding a high byte register, however then it would cause problems when a 16bit program pointer is stored with CALL instruction. This can still be improved with a more complex circuit, or by using memory cache with 8bit data input. The solution to this may be implemented in future.



Figure 3.3.3: Digital diagram of RISC stack pointer circuit

#### 3.3.6 Datapath



Figure 3.3.4: Digital diagram of RISC8 datapath

Figure 3.3.4 above represents partial datapath. Program counter, Stack pointer and Immediate Override logics are represented in figures 3.3.1, 3.3.3 and 3.3.2 respectively. CDI (Control-Data Interface) is HDL concept that connects datapath and control unit. Immediate value to datapath is provided by IMO block described in section 3.3.4.

Data to register file is selected and saved with MUX0. This data is delayed 1 cycle with R2 to match timing that of data is taken from memory. If LWLO or LWHI is executed, MUX1 select high or low byte from memory to read. To compensate for timing as value written to register file is delayed by 1 cycle, register file has internal logic that outputs  $wr_data$  to  $rd_data1$  or/and  $rd_data2$  immediately if  $wr_en$  is high and  $rd_addr1$  or/and  $rd_addr2$  matches  $wr_addr$ .

MUX2 allows override ALU source B, R3 and MUX3 enables control unit to enable ALU carry in allowing multi-byte number addition/subtraction. This function is not fully implemented yet. MUX4 and MUX5 allows to send data to COM block with COM instruction, if other instruction performed then  $\theta x\theta\theta$  byte for COM address and data is sent, indicating no action. Data is stored in memory only with SWLO instruction writing to high byte whatever is stored in R4 buffer. This buffer can be written to using SWHI instruction. MUX6 selects memory address value from imm or stack pointer.

### 4 Summary of Difficulties and Issues

This chapter focuses on any difficulties and issues that are hindering the project from moving forward.

#### 4.1 List of Difficulties

This subsection will list difficulties currently encountered:

- Benchmark
- Assembler/Compiler

#### 4.1.1 Benchmark

One of the difficulties is to design an appropriate benchmark that could test scenarios used in actual such processor applications. Other issues comes with writing the benchmark itself, for instance the benchmark test to finding prime numbers may use many different algorithms, where one of the fastest and used in actual industry may be "Sieve of Atkin" algorithm [8], however is it fairly complicated and time consuming especially when written in assembly.

#### 4.1.2 Assembler/Compiler

In order to write more complicated code for the benchmark, a proper compiler is required. The current assembler supports definitions and labels, however not macros or imports from other files, nor linked libraries. Ideally, a common language compiler needs to be developed so that the benchmark programs could be imported without the need to completely rewrite them in assembly.

However such compiler might take more time to develop than writing benchmarks in assembly. Therefore it needs to be decided which option to proceed with.

#### 4.2 Failure Assessment

This section describes the likely possibilities of project failures:

As of the current schedule OISC processor will be implemented in Term 2, however due to personal schedule, the project will be given about twice as less time than in Term 1 which may result in not finishing OISC processor or developing all the benchmarks for it on time. Mitigation for this is to closely follow the schedule and adjust OISC design in such way that would take less time to implement the benchmark.

Another possible failure may be FPGA failure which would delay testing and benchmarking processors. This has already been encountered before, JTAG has caused errors while communicating with FPGA which indicated that FPGA chip is damaged. After long investigation it has been discovered that the problem was caused by a bug in a Linux JTAG Daemon service that needed to be simply restarted. Mitigation for any further issues is to have access to a backup FPGA board.

## 4.3 Updated Safety Risk Assessment

There are no changes to Safety Risk Assessment.

### 5 Appendix A: Safety Risk Assessment

RiskNet report is appended at the end of this document.

### 6 Appendix B: Computer Code

All code to HDL processor implementation and assembler are in git repository that can be accessed in https://gogs.infcof.com/min/ucl project y3

### 7 References

### References

- [1] Tanvir Ahmed et al. "Synthesizable-from-C Embedded Processor Based on MIPS-ISA and OISC". In: 2015 IEEE 13th International Conference on Embedded and Ubiquitous Computing (2015). DOI: 10.1109/euc.2015.23.
- [2] E. Blem, J. Menon, and K. Sankaralingam. "Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures". In: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA) (2013). DOI: 10.1109/hpca.2013.6522302.
- [3] K. S. Dharshana, Kannan Balasubramanian, and M. Arun. "Encrypted computation on a one instruction set architecture". In: 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) (2016). DOI: 10.1109/iccpct.2016.7530376.
- [4] William F Gilreath and Phillip A Laplante. Computer Architecture: A Minimalist Perspective. Kluwer Academic Publishers, 2003.
- [5] David Money Harris and Sarah L Harris. Digital design and computer architecture. 2nd ed. Elsevier, 2013.
- [6] T. Jamil. "RISC versus CISC". In: *IEEE Potentials* 14.3 (1995), pp. 13–16. DOI: 10.1109/45.464688.
- [7] J. H. Kong et al. "Minimal Instruction Set FPGA AES processor using Handel". In: 2010 International Conference on Computer Applications and Industrial Electronics (2010). DOI: 10.1109/iccaie.2010.5735100.
- [8] François Morain. "Atkin's Test: News from the Front". In: Lecture Notes in Computer Science (1989), pp. 626–635. DOI: 10.1007/3-540-46885-4 59.
- [9] Jia Jan Ong, L.-M. Ang, and K. P. Seng. "Implementation of (15, 9) Reed Solomon Minimal Instruction Set Computing on FPGA using Handel-C". In: 2010 International Conference on Computer Applications and Industrial Electronics (2010). DOI: 10.1109/iccaie.2010.5735103.
- [10] Minato Yokota, Kaoru Saso, and Yuko Hara-Azumi. "One-instruction set computer-based multicore processors for energy-efficient streaming data processing". In: Proceedings of the 28th International Symposium on Rapid System Prototyping Shortening the Path from Specification to Prototype RSP '17 (2017). DOI: 10.1145/3130265.3130318.



### Summary

Reference: RA030726/1 Sign-off Status: Authorised

| Date Created:                                                                                         | 09/10/2019                                                                                             | Confidential?                     | No               |  |  |  |
|-------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|-----------------------------------|------------------|--|--|--|
| Assessment Title: 3rd year project: Performance characterisation of 8-bit RISC and OISC architectures |                                                                                                        |                                   |                  |  |  |  |
| Assessment Outline: New activity                                                                      |                                                                                                        |                                   |                  |  |  |  |
| Area Responsible (for management of risks)                                                            |                                                                                                        | Location of Risks                 | On-Site          |  |  |  |
| Division, School, Faculty, Institute:                                                                 | Faculty of Engineering Science                                                                         | Building:                         | Roberts Building |  |  |  |
| Department:                                                                                           | Dept of Electronic & Electrical Eng                                                                    | Area:                             | Ground and Above |  |  |  |
| Group/Unit:                                                                                           | All Groups/Units                                                                                       | Sub Area:                         | Laboratory       |  |  |  |
| Further Location Information:  Roberts building Rooms 704, 905. Also working from home.               |                                                                                                        |                                   |                  |  |  |  |
| Assessment<br>Start Date:                                                                             | 09/10/2019                                                                                             | Review or 31/03/2020<br>End Date: |                  |  |  |  |
| Relevant Attachments:                                                                                 |                                                                                                        |                                   |                  |  |  |  |
|                                                                                                       | Description of attachments:                                                                            |                                   |                  |  |  |  |
|                                                                                                       |                                                                                                        |                                   |                  |  |  |  |
|                                                                                                       | Location of non-electronic documents:                                                                  |                                   |                  |  |  |  |
|                                                                                                       |                                                                                                        |                                   |                  |  |  |  |
| Assessor(s):                                                                                          | Jarmolovicius, Min                                                                                     |                                   |                  |  |  |  |
| Approver(s):                                                                                          | Approver(s): ROBERT KILLEY                                                                             |                                   |                  |  |  |  |
| Signed Off:                                                                                           | Signed Off: ROBERT KILLEY (09/10/2019 12:56)                                                           |                                   |                  |  |  |  |
| Distribution List:                                                                                    | Gerald McBrearty (g.mcbrearty@ucl.ac.uk) - 09/10/2019 ANDREW MOSS (andrew.moss@ucl.ac.uk) - 09/10/2019 |                                   |                  |  |  |  |
| PEOPLE AT RISK (from the Activities covered by this Risk Assessment)                                  |                                                                                                        |                                   |                  |  |  |  |
| CATEGORY                                                                                              |                                                                                                        |                                   |                  |  |  |  |
| Undergraduates                                                                                        |                                                                                                        |                                   |                  |  |  |  |



### **Activities, Hazards, Controls**

Reference: RA030726/1 Sign-off Status: Authorised

| 1. Working in a lab                              |                                                                                                   |  |  |  |
|--------------------------------------------------|---------------------------------------------------------------------------------------------------|--|--|--|
| Description of Activity:                         |                                                                                                   |  |  |  |
| Hazard 1. Using computer                         |                                                                                                   |  |  |  |
| RSI Eye strain Bad posture                       | Existing Control Measures Properly use mouse/keyboard Make constant breaks Make sure sit properly |  |  |  |
| Hazard 2. Testing FPGA                           |                                                                                                   |  |  |  |
| Burns from hot wires/chips due to short-circuits | Existing Control Measures Use current limit on power supplies                                     |  |  |  |



09/10/2019 12:56 - Page 2 of 2 **Powered By OSHENS**