

#### Computer Engineering

Mekelweg 4, 2628 CD Delft The Netherlands http://ce.et.tudelft.nl/

## **MSc THESIS**

## LLVM based $\rho\text{-VEX}$ compiler

Maurice Daverveldt

Abstract

CE-MS-2014



# LLVM based $\rho$ -VEX compiler asd

#### THESIS

submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

in

COMPUTER ENGINEERING

by

Maurice Daverveldt born in Utrecht, Netherlands

Computer Engineering Department of Electrical Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology

## LLVM based $\rho\text{-VEX}$ compiler

#### by Maurice Daverveldt

#### Abstract

**Laboratory** : Computer Engineering

Codenumber : CE-MS-2014

Committee Members :

Advisor: dr.ir. Stephan Wong, CE, TU Delft

Chairperson: dr.ir. K.L.M. Bertels, CE, TU Delft

Member: dr.ir. A. van Genderen, CE, TU Delft

Member: dr.ir. Guido Wachsmuth, CE, TU Delft

Dedicated to my family and friends



## Contents

| Li           | st of | Figures                       | ⁄ii |
|--------------|-------|-------------------------------|-----|
| Li           | st of | Γables                        | ix  |
| Li           | st of | Acronyms                      | хi  |
| $\mathbf{A}$ | cknov | ledgements                    | iii |
| 1            | Intr  | oduction                      | 1   |
|              | 1.1   | Motivation                    | 1   |
|              |       | 1.1.1 Futore of $\rho$ -VEX   | 1   |
|              | 1.2   | Problem statement             | 2   |
|              |       | 1.2.1 Goals                   | 3   |
|              |       | 1.2.2 Subgoals                | 3   |
|              | 1.3   | Methodology                   | 3   |
|              | 1.4   | Thesis overview               | 3   |
| <b>2</b>     | Bac   | ${f ground}$                  | 5   |
|              | 2.1   | VEX System                    | 5   |
|              |       | 2.1.1 Architecture            | 5   |
|              |       | 2.1.2 ISA                     | 5   |
|              |       | 2.1.3 Registers               | 5   |
|              |       | 2.1.4 Run-time architecture   | 5   |
|              | 2.2   | LLVM Compiler infrastructure  | 5   |
|              |       | 2.2.1 LLVM                    | 6   |
| 3            | Imp   | ementation                    | 9   |
|              | 3.1   | Tablegen                      | 9   |
|              | 3.2   | Code generation               | 9   |
|              |       | 3.2.1 Instruction selection   | 9   |
|              |       | 3.2.2 Scheduling              | 9   |
|              |       | 3.2.3 Register allocation     | 9   |
|              |       | 3.2.4 Pipeline description    | 9   |
|              |       | 3.2.5 VLIW Packetizer         | 9   |
|              |       | 3.2.6 Assembly emitter        | 9   |
|              | 3.3   | LLVM Improvements             | 9   |
|              |       | 3.3.1 Reconfigurable compiler | 9   |

| 4             | Results                       | 11 |
|---------------|-------------------------------|----|
|               | 4.1 Benchmark results         |    |
|               | 4.2 Generic binary            | 11 |
| 5             | Conclusion                    | 13 |
| Bi            | ibliography                   | 15 |
| $\mathbf{Li}$ | st of Definitions             | 17 |
| $\mathbf{A}$  | LLVM Quickstart guide         | 19 |
|               | A.1 LLVM toolchain            | 19 |
|               | A.2 XSTsim                    | 19 |
| В             | LLVM Development guide        | 21 |
|               | B.1 Building LLVM from source | 21 |

# List of Figures

| 2.1 | Basic compiler | structure |  |  |  |  |  |  |  |  |  |  |  |  |  |  | ٦ | ó |
|-----|----------------|-----------|--|--|--|--|--|--|--|--|--|--|--|--|--|--|---|---|
|     |                |           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   |   |



## List of Tables



# List of Acronyms



# Acknowledgements

Maurice Daverveldt Delft, The Netherlands January 20, 2014



Introduction

#### 1.1 Motivation

In 2008 Thijs van As designed the first version of the  $\rho$ -VEX processor [1]. This processor uses a VLIW design and is based on the VEX ISA. The VEX ISA is a derivative of the Lx family of embedded VLIW processors [2] from HP/STMicroelectronics.

Around this processor a set of tools has been developed in collaboration with the TU Delft, IBM, STMicroelectronics and other universities. Currently the  $\rho$ -VEX 2.0 tool suite include a synthesizable core, a compiler system and a processor simulator. A GCC based VLIW compiler has been developed by IBM.

VLIW differs from multiple issue in that parallelism is found during compile-time instead of during run-time. This results in a processor that can be made significantly simpler because OoO does not need to be implemented. The compiler is responsible for finding all the parallelism and for scheduling code in an efficient way. This also enables the compiler to execute additional optimization because the compiler has got a higher level view of the code that needs to be executed. Optimizations such as *swing modulo scheduling* and loop vectorization are nearly impossible to achieve in hardware because the higher level structure is no longer available. A compiler can interpret the higher level structure of a program and optimize the output for better scheduling.

The origins of the VEX ISA can be traced to the company Multiflow and John Fisher, one of the inventors of VLIW processors at Yale University [3]. Multiflow designed a computer that used VLIW processors to execute instructions up to 1024-bit in size. Along with these computers Multiflow also designed a compiler system that used trace based scheduling to extract ILP from programs. Reportedly the code base for the Multiflow compiler has been used in modern compiler such as Intel C Compiler (ICC) and HP VEX compiler because of the robustness and the amount of ILP that could be exposed by the compiler [4].

#### 1.1.1 Futore of $\rho$ -VEX

asd

- Runtime reconfiguration of processor and of compiler.
- Possible JITing of code
- Generic binaries, one size fits all

#### 1.2 Problem statement

Currently both HP-VEX and GCC can be used to generate code for the rvex processor. Both compilers have got a number of advantages and disadvantages that will be explored. The compilers will be judged on the following subjects: Code quality, support, languages support, backend supported and customization possibilities.

#### **HP-VEX:**

- Code quality: Excellent code quality and ILP extraction.
- **Support:** Bad, no active community.
- Front-end: Bad, only support for C.
- Back-end: Not applicable since compiler is specifically targeted to one architecture.
- Customization: Customization possible through machine description. Further research on optimization strategies not possible because compiler is proprietary and closed source. Because of this expanding the functionality of the compiler is impossible.

#### GCC:

- Code quality: Excellent code quality with performance approaching that of commercial compilers (CITATION NEEDED).
- Support: There is a very active development community around GCC.
- Front-end: GCC supports a large number of programming languages including C, C++, Fortran and Java
- Back-end: Support exists for a large number of processors including x86, ARM, MIPS and ofcourse VEX
- Customization: Because GCC is open source the compiler can be customized to support new passes, optimizations and instructions.

Unfortunately GCC has a number of disadvantages that need mentioning.

- **VEX code quality:** The VEX backend for GCC has not been optimized and the quality of the code is quite low. Performance of GCC executables is lower than code compiled by the HP-VEX compiler. Some programs do not function correctly when compiled by GCC. Some programs are unable to be compiled by GCC.
- VEX reconfiguration: The current GCC VEX compiler does not support runtime reconfiguration. The compiler has been set to a 4 issue width  $\rho$ -VEX and this cannot be changed without rebuilding GCC.
- Bloated: GCC consists of millions of lines of code and is arguable one of the most complex programs in existence. This makes understanding GCC and developing for GCC very hard.

• Complexity: GCC is written in C. Design is complex, not very modular and documentation is not very good. Different parts of the compiler are linked in a complex way and it is very difficult to obtain a general picture on how the compiler operates. Because of the complexity it is difficult to achieve high performance in GCC.

The comparison shows that both the HP-VEX and GCC compilers have serious disadvantages. The fact that HP-VEX cannot be customized excludes it from further development for the  $\rho$ -VEX project. Bringing the GCC compiler performance and features up to the same level as HP-VEX will be very difficult because of the complexity involved with GCC development.

#### 1.2.1 Goals

asd

#### 1.2.2 Subgoals

asd

#### 1.3 Methodology

asd

#### 1.4 Thesis overview

asd

Background

#### 2.1 VEX System

asd

2.1.1 Architecture

asd

2.1.2 ISA

asd

2.1.3 Registers

asd

#### 2.1.4 Run-time architecture

asd

#### 2.2 LLVM Compiler infrastructure

LLVM is based on the classic three-stage compiler architecture shown in figure 2.1. The compiler uses a number language specific front-ends, an optimizer and target specific backends. This modular design enables compiler designers to work on different parts of the compiler as a separate part. Support for a new processor can be added by building a new back-end. The existing front-end and optimizer can be reused for the new compiler.



Figure 2.1: Basic compiler structure

The front-end is used to transform the plain text source code of a program into an intermediate representation that will be used during compilation process. This transformation is achieved by performing the following steps:

- 1. **Lexical analysis:** Break input into individual tokens.
- 2. **Syntax analysis:** Using a grammer the sequence of tokens is transformed into a parse tree which represents the structure of the program.
- 3. **Semantic analysis:** Semantic information is added to the parse tree, type checking is performed and a symbol table is built.

The resulting abstract syntax tree (AST) is transformed into LLVM IR and passed to the optimizer and backend of the compiler. These parts of the compilation process are completely language agnostic and do not require any other information from the front-end.

The optimizer is used to analyze and optimize the program. Optimization such as dead code elimination and copy propagation are performed during this phase but also more advanced operations that extract ILP (loop vectorization) can be enabled.

The back-end optimizes and generates code for a specific architecture. The LLVM IR is transformed into processor specific assembly instructions, allocates registers and schedules code for better performance.

#### 2.2.1 LLVM

asd

#### 2.2.1.1 Front-end

The modular design of LLVM enables the compiler to be used as a part of the existing GCC compiler. For example, the dragonegg GCC plugin is designed to replace the GCC code generator and optimizer with the LLVM backend. This would enable LLVM to be able to use the existing GCC based front-ends and supported languages.

Clang has been developed to allow LLVM to operate independently of GCC. Clang is a front-end supporting C, C++ and ObjC. The front-end is designed to be closely integrated with the Integrated Development Environment (IDE) allowing more expressive diagnostic messages. In addition Clang also aims to provide faster compilation and lower memory usage [5].

#### 2.2.1.2 LLVM IR

The front-end transforms a source code into the LLVM internal representation (LLVM IR). The LLVM IR is used to represent a high level language cleanly in a target independent way and is used during all phases of compilation. Instructions are similar to RISC instructions and can use three operands. Control flow instructions and type specific load/store instructions are used and an infinite amount of registers are available in Single Static Assignment (SSA) form. The LLVM IR is available as human readable text, in memory and in binary form [6].

The LLVM IR is designed to expose high-level information for further optimization. Examples of high-level information include dataflow through SSA form, control-flow graphs, language independent type information and explicit use of pointer arithmetic.

Primitives such as voids, floats and integers are natively supported in the LLVM IR. The bitwidth of the integers can be defined manually. Pointers, arrays, structures and functions are derived from these basic types.

Object oriented constructs such as classes and virtual methods are not natively supported but can be built using the existing type system. For example a C++ class can be represented by a struct and a list of methods.

The SSA based dataflow form allows the compiler to efficiently perform code optimizations such as dead code elimination and constant propagation.

Figure 2.1 shows an example program in C. The equivalent LLVM IR representation is shown in figure 2.2.

```
int main() {
   int sum = 1;

   while(sum < 10)
   {
      sum = sum + 1;
   }
   return sum;
}</pre>
```

Listing 2.1: C example program

```
define i32 @main() nounwind ssp uwtable {
  %1 = alloca i32, align 4
  %sum = alloca i32, align 4
  store i32 0, i32* %1
  store i32 1, i32* %sum, align 4
  br label %2
                                                        ; preds = \frac{1}{5}, \frac{1}{5}0
 <label>:2
  %3 = load i32* %sum, align 4
  %4 = icmp slt i32 %3, 10
  br i1 %4, label %5, label %8
 <label>:5
                                                        ; preds = %2
  %6 = load i32* %sum, align 4
  \frac{%7}{} = add nsw i32 \frac{%6}{}, 1
  store i32 %7, i32* %sum, align 4
  br label %2
; <label>:8
                                                        ; preds = \frac{1}{2}
  %9 = load i32* %sum, align 4
  ret i32 %9
```

Listing 2.2: LLVM Intermediate representation

#### 2.2.1.3 Code generation

During code generation the optimized LLVM IR is translated into machine specific assembly instructions. The modular design of LLVM enables generic algorithms to be used for this process.

A backend is described in a domain specific language (DSL) called tablegen. The tablegen files describe properties of a backend such as available instructions, registers,

calling convention and pipeline structure. During compilation of LLVM the tablegen files are converted into a C++ description of the backend. Tablegen has been specifically designed to describe the backend structure in a flexible and generic way. Common features can be more easily described using tablegen. For example the Add and Sub instruction are almost identical and using tablegen can be described in a more generic way. This results in less repetition and reduces the chance of error.

Because of the generic description of the backend large amount of code can be reused by each backend. Algorithms such as register allocation and instruction selection operate on the generic tablegen descriptions and do not require target specific hooks to operate correctly. An additional advantage of this approach is that multiple algorithms are available to achieve certain functionality. For example, LLVM offers the developer a choice between four different register allocation algorithms. Each algorithm has a number of advantages and disadvantages and the developer can choose between an algorithm which matches the target processor best.

At the moment not all parts of the backend can be described in Tablegen and hand written C++ code is still needed for a backend to operate. As LLVM develops more parts of the backend description should be integrated into the backend.

The following sequence is completed to translate LLVM IR into target specific assembly language:

- 1. Instruction selection
- 2. Scheduling
- 3. Register allocation
- 4. Prologue / epilogue insertion
- 5. Assembly printing

Implementation

#### Tablegen 3.1 asd Code generation 3.2asd 3.2.1 Instruction selection asd Scheduling 3.2.2 asd 3.2.3 Register allocation asd 3.2.4 Pipeline description asd **VLIW Packetizer** 3.2.5 asd Assembly emitter 3.2.6 asd LLVM Improvements 3.3 asd 3.3.1 Reconfigurable compiler asd

#### 3.3.1.1 Machine description

asd

#### 3.3.1.2 Compiler run-time reconfiguration

asd

#### 3.3.1.3 Optimize VLIW scheduling

asd

#### 3.3.1.4 Generic binary support

 $\operatorname{asd}$ 

Results

#### 4.1 Benchmark results

Comparison to GCC and HP-VEX Different issue widths Different optimization levels

#### 4.2 Generic binary

Comparison to GCC Different issue widths Conclusion

## Bibliography

- [1] T. van As, "-vex: A reconfigurable and extensible vliw processor," Master's thesis, Technical university Delft, Mekelweg 4, 2628 CD, Delft, The Netherlands, 2008.
- [2] P. Faraboschi, G. Brown, J. Fisher, G. Desoll, and F. Homewood, "Lx: a technology platform for customizable vliw embedded processing," in *Computer Architecture*, 2000. Proceedings of the 27th International Symposium on, 2000, pp. 203–213.
- [3] J. A. Fisher, "Very long instruction word architectures and the eli-512," SIGARCH Comput. Archit. News, vol. 11, no. 3, pp. 140–150, Jun. 1983. [Online]. Available: http://dl.acm.org/citation.cfm?id=1067651.801649
- [4] P. G. Lowney, P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'donnell, and J. C. Ruttenberg, "The multiflow trace scheduling compiler," *JOURNAL OF SUPERCOMPUTING*, vol. 7, pp. 51–142, 1993.
- [5] Clang, "Clang features and goals."
- [6] V. A. Olatunji Ruwase, Chris Lattner and G. P. David Koes, "The llvm compiler framework and infrastructure (part 1)," Presentation.

<u>16</u> BIBLIOGRAPHY

# List of definitions

·· ...

# A

# LLVM Quickstart guide

asd

A.1 LLVM toolchain

 $\operatorname{asd}$ 

A.2 XSTsim

asd

LLVM Development guide

asd

### B.1 Building LLVM from source

 $\operatorname{asd}$