# Microprocessor Systems: Principles and Implementation



Chun-Jen Tsai NYCU 09/15/2023

#### Before We Start ...

- What is "hacker spirit?"
- □ Jargon File: a hacker is a person who enjoys exploring the details of programmable systems and stretching their capabilities, as opposed to most users, who prefer to learn only the minimum necessary.
- □ RFC-1392: a hacker is a person who delights in having an intimate understanding of the internal workings of a system, computers and computer networks in particular.

# A Human Being Has No Limit

- □ A senior US high school student, Sam Zeloof, built an IC (an OP AMP in 5µm process) in a garage in 2018
- □ In 2021, he made an IC with the same 10µm polysilicon gate process used by Intel's 4004 CPU



(source URL: sam.zeloof.xyz)

#### Introduction to the Course

- □ Lecture outline
  - Introduction to the Course and Target Platform
  - RISC-V Instruction Set Architecture
  - Microprocessor Design: History and Review
  - Application Processors and Aquila SoC
  - Memory Subsystem of Microprocessors
  - I/O Subsystem of Microprocessors
  - Operating System Support of Microprocessors
  - Multicore Organization of Microprocessors

#### Homework & Grading

- □ Homework (based on RISC-V processor for FPGA):
  - 0: Simulation of a HW-SW platform
  - 1: Real-time debugging of a HW-SW platform
  - 2: Branch predictor analysis and improvement
  - 3: Cache analysis and improvement
  - 4: Multithread synchronization improvement under RTOS
  - 5: Domain-specific accelerator for application processors
- □ Grading
  - HW# 1 ~ 5: 70%
  - Midterm online test: 5%, Final online test: 25%

# The Open-Source Aquila SoC

- □ In this course, labs are based on the open-source Aquila SoC: http://github.com/eisl-nctu/aquila
  - However, we will use a simplified version of Aquila
- ☐ The full version Aquila SoC is an open-source processor core:
  - Developed at the EISLab, Dept. of CS, NYCU (NCTU)
  - RISC-V RV32-IMA with zicsr, zifencei, and SV32 MMU
  - In-order, single-issue, five-stage pipeline microarchitecture
  - Capable of running RTOS and Linux 5.19.x
  - 0.96 Dhrystone MIPS/MHz compiled with GCC 12.2.0 –O2

#### A Short Table of DMIPS/MHz

#### □ Dhrystone is one of the oldest CPU benchmarks:

| Computer/CPU                                      | Year | Clock (MHz) | DMIPS/MHz             |
|---------------------------------------------------|------|-------------|-----------------------|
| UNIVAC I (first ISA computer)                     | 1951 | 2.25        | 0.0008                |
| Intel 4004 (first microprocessor)                 | 1971 | 0.74        | 0.124                 |
| IBM PC/Intel 8088 (begins the fall of mainframes) | 1979 | 4.77        | 0.145                 |
| PDP-11/70 (where UNIX & C were created)           | 1970 | ? (~ 10)    | 0.15                  |
| Apple II (origin of Taiwan's PC industry)         | 1977 | 1           | 0.43                  |
| Alpha 21064 (descendent used in SunWay)           | 1993 | 150         | 0.675                 |
| ARM 7 (popular for 2G mobile phones)              | 1994 | 45          | 0.889<br>← Aquila (0. |
| ARM Cortex M4 (intended for microcontrollers)     | 2010 | 200         | 1.25                  |
| Cray I (first super computer)                     | 1975 | 80          | 2.0 Falco (1.8        |
| ARM Cortex A53 (popular in 4G smartphones)        | 2014 | 1500        | 2.24 MediumE          |
| ARM Cortex A9 (popular for 3G feature phones)     | 2009 | 1500        | 2.5 ← Boom (3.9       |
| ARM Cortex A76 (popular in 4G smartphones)        | 2018 | 3000        | 12.4                  |
| Intel i7-12700                                    | 2018 | 4900        | 8.69                  |

#### Skill Requirements

- □ Verilog programming for HW design
  - Using FPGA as digital design target
  - Using waveform simulator for functional debugging
  - Using embedded logic analyzer for live debugging
- C programming for SW design
  - Using GCC toolchain for SW development
  - Using linker script usage for SW memory organization
  - Tracing assembly code for SW debugging

#### **Board-Level System Implementation**

- □ Integrate different ICs on a printed circuit board (PCB):
  - A PCB has multiple layers
  - Trace layers: connecting IC ports to other IC ports
  - Power layers: supply power to circuit components on the PCB
  - Ground layers: provide electrical grounds for signals
- □ Each IC implement analog and/or digital functions
  - Digital ICs: data processing, buffering, signal arbitration and bridging, etc.
  - Analog ICs: voltage regulation, AD/DA, signal amplification, etc.

#### Chip-Level System Implementation

- A System-on-Chip (SoC) is a complex IC that integrates the major functional components of a computing system into a single chip
- □ The SoC design typically incorporates several computing components (aka IPs)
  - Embedded processor cores (RISC/DSP/GPU/AI)
  - On-chip memory blocks
  - Accelerator Logic blocks
  - I/O logic blocks
  - Embedded software



# "Board-level" vs. "Chip-level" Design

- □ Board-level Design
  - More flexible in development cycle
  - High manufacturing cost for large quantities
  - Larger form factor and higher power consumption
- □ Chip-level Design (i.e. SoC)
  - More demanding in design, debug, and verification
  - Low cost for large quantities
  - Smaller form factor and lower power consumption
- □ A new trend: package-level design, aka system-inpackage (SiP)

#### **Application Processor Concept**

- □ You probably have learned the following terms:
  - CPU
  - Microprocessor
  - Microcontroller
  - **I** . . .
- ☐ The concept of "Application Processor (AP)" comes from the big boom of smartphones
- □ An AP is an SoC that contains:
  - Processor cores
  - Optimized domain-specific accelerators (DSAs)
  - An efficient memory subsystem

#### The Architecture of the Aquila SoC

- ☐ The Aquila SoC specification on Arty A7-100T:
  - A 32-bit RISC-V core @ 41.6667 MHz
    - L1 4-way set associative I/D-caches
  - 64KB tightly-coupled memory (TCM) / 256MB DDR3 DRAM
  - Integrated UART, Timer, and device controllers



#### The Target Development Board

- □ Arty A7-100T contains an XC7A100T FPGA:
  - 63,400 LUTs, 126,800 FFs, 600KB BRAMs, and 220 DSPs
  - Aquila core uses 5,300 LUTs, 3,000 FFs, and 4 DSPs
  - SoC uses 37% LUTs, 19% FFs, 26% BRAMs, and 2% DSPs



# The System HW-SW Layers

☐ The layers of the HW-SW system in this course:



- BSP stands for Board Support Package, it contains a basic set of I/O routines, usually for C programs (similar to BIOS for PC)
- FPGAs can initialize the on-chip RAM with a boot code.

# Implementation of Computing Systems

- □ All computing systems are used to compute functions
  - A function can be implemented in software, hardware, or both
  - Software: processor instructions or FPGA program bits



Hardware: analog circuits or digital circuits

# System Convergence

- □ Today, smart devices have many things in common:
  - Powerful processors and accelerators
  - Sensor integrations
  - Audio/video recording and/or playback
  - Complex feedback control of motors
  - Wireless communication capabilities
- □ One platform for all devices (Drones, ADAS, etc.)?
  - Application Processor
  - GPS
  - AV codecs
  - Motor control
  - Wi-Fi and BT





#### **Discussions**

- ☐ This course tries to teach you the entire HW-SW system, with almost all source code exposed to you
- □ Taiwan has an IT industry today because back in 1977, the co-funder of Apple, Steve Wozniak, insisted that the entire source code and the schematics of the Apple II computer should be available to users