# 國立臺灣大學電子工程學研究所 碩士論文

Graduate Institute of Electronic Engineering
National Taiwan University
Master Thesis

# 支援 Xilinx AXI DMA 的 Linux UIO 驅動程式 Linux UIO driver for Xilinx AXI DMA

劉宇唐 Yu-Tang Liu

指導教授:鄭振牟博士

Advisor: Chen-Mou Cheng, Ph.D.

中華民國 107 年 7 月 July 2018 近年來,由於 AI、VR 產業的崛起,FPGA 產業越來越受到重視。為了簡化 FPGA 的開發流程,使用嵌入式 Linux 會是一個不錯的方法。透過 Linux Kernel 提供的 UIO 驅動程式,我們可以把我們在硬體端設計出來的 IP 視為一個外部裝置,然後在 Linux 使用者空間裡的程式中,輕鬆地開發軟體端的應用。然而,有些硬體端的設計,卻無法透過同樣的方法,利用 UIO 驅動程式,建立裝置節點,而帶有直接記憶體存取 IP 的設計就是其中之一。由於 UIO 驅動程式並無法支援此種設計,我們必須擁有"root"權限,才能使用我們的設計,但是提供"root"給一般使用者並不是一個好方法。在此論文中,我們修改了 Linux 內建的 UIO 驅動程式,使得一般用戶也能在使用者空間中使用帶有 DMA 的硬體設計。

關鍵字:賽靈思,直接記憶體存取,AXI,Linux UIO 驅動程式

Abstract

In recent year, increasingly importance has been attached to FPGAs with the development

of AI,VR. To simplify the development process on FPGAs, embedded Linux on FPGAs

will be a good way. With UIO driver provided in Linux Kernel, we can mount our block

design, that is, custom IP(Intellectual Property) core in Vivado as a device node, and pro-

gram it in Linux user space. However, there are some designs that UIO driver cannot

recognizes. The design with DMAs(Direct Memory Access) is the one of them. With

this kind of design, because UIO driver is not work, we need "root" to control our IP, and

providing root privileges to users is never a good solution. In this thesis, we modify UIO

driver so that users can easily use designs with DMA in user-space.

**Keywords:** Xilinx, DMA, AXI, Linux UIO driver

# **Contents**

| 1 | Intr | duction                      | 1 |
|---|------|------------------------------|---|
|   | 1.1  | Motivation                   | 1 |
|   | 1.2  | Contribution                 | 1 |
| 2 | Prel | minaries                     | 2 |
|   | 2.1  | Embedded Linux               | 2 |
|   |      | 2.1.1 Device Tree            | 2 |
|   |      | 2.1.2 Linux Kernel Driver    | 3 |
|   | 2.2  | UIO                          | 3 |
|   | 2.3  | AXI Bus                      | 4 |
|   |      | 2.3.1 AMBA                   | 4 |
|   |      | 2.3.2 AXI4                   | 4 |
|   | 2.4  | DMA                          | 4 |
|   |      | 2.4.1 DMA Engine             | 4 |
| 3 | Proj | osed solution                | 5 |
|   | 3.1  | Linux UIO driver for AXI DMA | 5 |
|   |      | 3.1.1 Scatter Gather         | 5 |
|   |      | 3.1.2 Cache Coherency        | 6 |
|   | 3.2  | Implementation               | 6 |
| 4 | Env  | ronment Framework            | 8 |

| 5  | Ana    | lysis   |                                 | 9  |
|----|--------|---------|---------------------------------|----|
|    | 5.1    | AXI F   | IFO                             | 9  |
|    | 5.2    | Custon  | n Stream IP                     | 9  |
|    |        | 5.2.1   | DMA with OpenCores tinyAES      | 9  |
|    |        | 5.2.2   | DMA with ECDSA(Curve secp256k1) | 9  |
|    | 5.3    | Compa   | arison                          | 9  |
| 6  | Con    | clusion |                                 | 11 |
| Re | eferen | ces     |                                 | 12 |

# **List of Figures**

| 2.1 | The UIO way.                           | 3 |
|-----|----------------------------------------|---|
| 3.1 | Custom IP with AXI4-Lite/Full Register | 5 |
| 3.2 | Cache Coherency Problems.              | 7 |

# **List of Tables**

| 5 1 | Top 20 malware families in the dataset. |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   | 1 | 0 |
|-----|-----------------------------------------|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| J.1 | 10p 20 marware rammes in the dataset.   | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • |   | v |

# Introduction

FPGA(Field Programmable Gate Array) is

- 1.1 Motivation
- 1.2 Contribution

### **Preliminaries**

In this chapter, we introduce the background technology for our work. Embedded Linux, UIO Driver, AXI Bus, and DMA.

### 2.1 Embedded Linux

Embedded Linux is a kernel and set of libraries and utilitied designed to run on an embedded system(for example:router).

#### 2.1.1 Device Tree

Device Tree is a mechanism to describe all hardware and devices of a system. In early Linux kernel, hardware description is hardcode in kernel file, so porting kernel to different ARM-CPU based system is painful. To solve this problem, Device Tree is introduced. Like x86 based system, we should consider Linux kernel image is a black box, and give the hardware informations of system to kernel.

In FPGA development flow, the whole system keeps almost the same, the =only thing changes is our design in PL(Programmable Logic) side. To boot Linux with different PL design, only a little modification of devicetree file is needed.

#### 2.1.2 Linux Kernel Driver

### **2.2** UIO

For many types of devices, creating a Linux kernel driver is overkill. All that is really needed is some way to handle an interrupt and provide access to the memory space of the device. To address this situation, the userspace I/O system (UIO) was designed. Hardware that is ideally suited for an UIO driver fulfills all of the following:

- The device has memory that can be mapped.
- The device can be controlled completely by writing to this memory.
- The device usually generates interrupts.
- The device does not fit into one of the standard kernel subsystems.

Figure 2.1 shows how the UIO system works, in software-side of FPGA development, we only care about the value in the hardware register and when we can get the correct value, so memory -mapping to user-speae application and interrupt handler is realy enough in our design flow.



Figure 2.1: The UIO way.

### 2.3 AXI Bus

Advanced eXtensible Interface (AXI) protocol is part of ARM AMBA(Advanced Microcontroller Bus Architecture). It is a on-chip bus interface that is targeted at high performance, high clock frequency system designs and includes features that make it suitable for high speed sub-micrometer interconnect:

- separate address/control and data phases
- support for unaligned data transfers using byte strobes
- burst based transactions with only start address issued
- issuing of multiple outstanding addresses with out of order responses
- easy addition of register stages to provide timing closure.

#### 2.3.1 AMBA

#### 2.3.2 **AXI4**

AXI4

- AXI4:
- AXI4-Lite:
- AXI4-Stream:

### 2.4 **DMA**

### 2.4.1 DMA Engine

### **Proposed solution**



Figure 3.1: Custom IP with AXI4-Lite/Full Register.

### 3.1 Linux UIO driver for AXI DMA

#### 3.1.1 Scatter Gather

In tranditional DMA transaction, it can only accept a contiguous (nonsegmented) block of physical memory, so, if we want to use DMA in userspace, and we can not get a contiguous memory space(like CMA), then we need to use DMA Scatter/Gather mode. This mode allows non- contiguous (nonsegmented) block of physical memory and this mode need to be turn on in Vivado design first. In this mode, DMA controller automatically give the start address of the segmented of memory after the previous transaction of segmented memory

is completed. To apply this mode, we need to construct a special data structure, Scatterlist, which collects start address and lengths of segmented block of user buffer memory. DMA engine will do the transaction according to this list.

### 3.1.2 Cache Coherency

While using DMA to do the data transfering, it may lead cache coherency problems. Figure 3.2 shows the cache coherency problem, both read and write will lead this problem, so if we want to transfer correct data, we must solve this problem.

My first trial is try to solve this problem in user application, that is, announce "volatile" buffer to prevents an optimizing compiler from optimizing away subsequent reads or writes and thus incorrectly reusing a stale value or omitting writes.

### 3.2 Implementation



Figure 3.2: Cache Coherency Problems.

### **Environment Framework**

In our work,

- ZedBoard() as our target FGPA platform.
- Linux Kernel is compiled from the github repository provided by Xilinx.
- Linaro as our file system.
- Custom IP is designed(and provided) in Vivado 2016.04.
- Device Tree file is generated by SDK 2016.04, and need some little but significant changes.

# **Analysis**

This chapter, we use three different IP in our hardware designs. DMA with AXI Stream FIFO, DMA with OpenCores tinyAES and DMA with ECDSA(Curve secp256k1).

### 5.1 AXI FIFO

- 5.2 Custom Stream IP
- 5.2.1 DMA with OpenCores tinyAES
- 5.2.2 DMA with ECDSA(Curve secp256k1)
- 5.3 Comparison

| Id | Family       | #   | Id | Family          | #  |
|----|--------------|-----|----|-----------------|----|
| A  | DroidKungFu  | 473 | K  | FakePlayer      | 74 |
| В  | DorDrae      | 420 | L  | Wroba           | 74 |
| C  | Meds         | 221 | M  | Plankton        | 63 |
| D  | Fakeguard    | 203 | N  | DroidDreamLight | 52 |
| E  | Boxer        | 202 | О  | Cawitt          | 51 |
| F  | Kmin         | 183 | P  | Badao           | 46 |
| G  | Rooter       | 117 | Q  | Fake10086       | 46 |
| Η  | Boqx         | 114 | R  | Cupi            | 39 |
| I  | DroidAp      | 106 | S  | Coogos          | 39 |
| J  | DroidKungFu3 | 93  | T  | DroidDream      | 39 |

Table 5.1: Top 20 malware families in the dataset.

# **Conclusion**

# References