# Reference Manual for the LatticeMico32 soft CPU Instruction Set Simulator



Simon Southwell
August 2016

## Introduction

This package comprises an instruction set simulator, modelling the Lattice Mico32 soft-CPU. It implements all the non-optional features, and most of the optional features of that core. It has both C++ and C compatible APIs and is extensible to include additional modelled functionality. The source is free-software, released under the terms of the GNU licence (see LICENCE.txt included in the package).

#### **Features**

#### Included Features:

- All supported core instructions
- All h/w modelled for configurable instructions
  - Multiplier
  - Divider
  - Sign extender
  - Barrel shifter
- Configurable internal memory
- All h/w debug break- and watchpoints modelled
- Cycle count functionality
- Configurable 'hardware', as per the Mico32
- Run-time and static disassembly
- Data and Instruction caches for timing model
- Extensibility via callbacks
  - Intercept memory accesses
  - Regular callback with ability for external interrupt generation
  - JTAG register access callback
- Configurable execution break points
  - On a given address
  - After a single step, or clock tick
  - After a fixed number of cycles
  - On 'hardware' debug break point
- Access to internal Memory
- Access to internal state
- Compatible with GNU tool chain (lm32-elf-xx)
- Both C++ and C linkage interfaces available

#### Features Not Included:

- JTAG interface model (but callback interface included)
- User definable instructions

The code is a simple exercise in modelling a RISC based embedded CPU. It comes with absolutely no warranties for accuracy, or fitness for any given purpose, and is provided 'as-is'. Hopefully it is useful for someone, and feel free to extend and enhance the model, and maybe let me know how it's going.

Simon Southwell (simon@anita-simulator.org.uk) Cambridge, August 2016

# Source files

Listed and described here are those source files that make up the Mico32 ISS library (e.g. libmico32.so). These are the source files needed for integration into other C or C++ environments. The source files for the example executable and test bench program, cpumico32, are not described in this document (i.e. cpumico32.cpp, cpumico32\_c.c and lm32\_get\_config.cpp). These are still freely available for use, under the terms of the GNU public license, but do not form part of the core functionality of the simulator, and are not documented.

The main header files comprise those listed below:

```
src/lm32_cpu.h
src/lm32_cpu_hdr.h
src/lm32_cpu_mico32.h
```

For integrating the model with external programs, only lm32\_cpu.h needs to be included in source code that references the API, but this header makes reference to lm32\_cpu\_hdr.h, which will need to be in the include path when compiling. The lm32\_cpu\_mico32.h header is only used by the internal source files, and includes all the definitions and types needed by this code. The lm32\_cpu.h header has the major class definition for the model (lm32\_cpu), and the other header, lm32\_cpu\_hdr.h, has all the definitions needed by external programs using the API.

The following listed files define the methods that belong to the class lm32\_cpu, and headers specific to those methods. The class methods are split over several files, but all belong to the single lm32\_cpu class.

```
src/lm32_cpu.cpp
src/lm32_cpu_inst.cpp
src/lm32_cpu_elf.h
src/lm32_cpu_elf.cpp
src/lm32_cpu_disassembler.cpp
```

The entry point methods and program flow methods are all defined in lm32\_cpu.cpp, whereas the instructions themselves have methods defined in lm32\_cpu\_inst.cpp. There is almost a one-to-one mapping of LM32 instructions and the instruction methods, but a couple of methods double up for multiple instructions. The processing of the ELF program files is handled in methods defined in lm32\_cpu\_elf.cpp, with its header file as lm32\_cpu\_elf.h. Code disassembly is handled by methods defined in lm32\_cpu\_disassembler.cpp.

A cache is implemented, for timing modelling purposes, and is instantiated in the main class for both the data and instruction caches. It is defined in the following files:

```
src/lm32_cache.h
src/lm32_cache.cpp
```

A C linkage interface is provided, for those requiring to integrate the model into a C environment, and this is defined in the following files:

```
src/lm32_cpu_c.h
src/lm32_cpu_c.cpp
```

For external programs interfacing to the model over the C interface, the lm32\_cpu\_c.h header must be included in source code making reference to that API, in place of lm32\_cpu.h. The lm32\_cpu\_hdr.h header still needs to be in the include path when using the C interface.

# **Building code**

Included in the package is a makefile to build the code under Linux or Cygwin, and support is also provided for MSVC 2010. Under the UN\*X systems, by default (i.e. simply typing 'make') it will build the following:

- cpumico32
- libmico32.a
- libmico32.so

The first is an executable (see man/cpumico32.1) for running simple programs, particularly the self-test programs provided in the package (see the "Testing" section below). The next two are a static and a dynamic library respectively, and are the libraries an external program can use to link with the model, choosing between them depending on whether static or dynamic linking is most appropriate for the particular application. The API for the libraries, and its use, is described in the "API" section below.

The makefile also, by default, builds the code with debug information (with g++ option -g) and as position-independent code (with option -fPIC—though this option is not needed in Cygwin). These are defined in the make variable 'COPTS', and can be overridden at the make command line. By default, the model is 'big endian', just like the Lattice processor. For variants that have been modified to be 'little endian', the model can be compiled with a COPTS value that includes the definition option "-DLM32\_LITTLE\_ENDIAN".
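
To illustrate what the endianness option changes, the sketch below assembles a 32-bit word from four consecutive bytes in each byte order. It is not the model's own code: the function names and the use of a plain byte array are assumptions made for this example, and only the macro name LM32\_LITTLE\_ENDIAN comes from the text above.

```cpp
#include <cstdint>

// Hypothetical helpers, not taken from the model's source: assemble a
// 32-bit word from four consecutive bytes in each byte order.

// Big endian: the lowest address holds the most significant byte.
constexpr uint32_t read_word_be(const uint8_t* mem, uint32_t addr)
{
    return (static_cast<uint32_t>(mem[addr])     << 24) |
           (static_cast<uint32_t>(mem[addr + 1]) << 16) |
           (static_cast<uint32_t>(mem[addr + 2]) <<  8) |
            static_cast<uint32_t>(mem[addr + 3]);
}

// Little endian: the lowest address holds the least significant byte.
constexpr uint32_t read_word_le(const uint8_t* mem, uint32_t addr)
{
    return  static_cast<uint32_t>(mem[addr])            |
           (static_cast<uint32_t>(mem[addr + 1]) <<  8) |
           (static_cast<uint32_t>(mem[addr + 2]) << 16) |
           (static_cast<uint32_t>(mem[addr + 3]) << 24);
}

// Select the byte order at compile time, in the manner of the
// -DLM32_LITTLE_ENDIAN build option (big endian when it is not defined).
constexpr uint32_t read_word(const uint8_t* mem, uint32_t addr)
{
#ifdef LM32_LITTLE_ENDIAN
    return read_word_le(mem, addr);
#else
    return read_word_be(mem, addr);
#endif
}
```

With the bytes 0x12, 0x34, 0x56, 0x78 at ascending addresses, the big endian read gives 0x12345678 and the little endian read gives 0x78563412.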

No code coverage information is included by default, but the 'COVOPTS' make variable can be set at the command line to add, say, 'gcov' coverage information in the build (e.g. COVOPTS="-coverage"). If 'lcov' is available, the HTML output can be generated with 'make coverage', after the tests have been executed. The output is placed in a directory cov\_html/.

Support for MSVC 2010 is provided, with a solution file (.sln) in the msvc/ directory, along with the minimal set of project files to read in to the MSVC 2010 IDE, and compile and run the model. In addition, if MSBuild.exe is in the PATH under Cygwin, the makefile has support to build from the command line with 'make MSVC'. The MSBuild.exe executable is part of Microsoft .NET, and thus can normally be found in a directory (for example, under Cygwin) such as:

```
<cdrive_path>/Windows/Microsoft.NET/Framework/v4.0.30319
```

The <cdrive\_path> is the Cygwin path to the Windows disk (most likely /cygdrive/c) and the final directory name will depend on the particular version of Microsoft .NET installed. For 64 bit machines, a 64 bit version of the executable will be under Framework64.

By default, a make build for MSVC builds a 'Release' executable, which is placed in the same directory as for the other builds of the makefile. If a 'Debug' version is required, then the default can be overridden via the MSVCCONF make variable—i.e.:

```
make MSVCCONF="Debug" MSVC
```

Like the make for UN\*X, the MSVC build produces a cpumico32.exe executable, but only a single library, libmico32.dll.

# **API**

The API to the model is a C++ interface (though a C interface is also provided; see the "C Linkage Interface" section below) that consists of a single object (of class lm32\_cpu, as defined in lm32\_cpu.h) with a set of methods for configuring the model, setting control of program flow, and running executable code. Definitions needed to communicate with some of these methods, and to set their parameters, are provided in lm32\_cpu\_hdr.h. This is all described in the sections to follow. In summary, the methods are:

```
            lm32_cpu (int                  verbose,
                      bool                 disable_reset_break,
                      bool                 disable_lock_break,
                      bool                 disable_hw_break,
                      bool                 disassemble_run,
                      uint32_t             num_mem_bytes,
                      uint32_t             mem_offset,
                      int                  mem_wait_states,
                      uint32_t             entry_point_addr,
                      uint32_t             cfg_word,
                      FILE*                ofp,
                      lm32_cache_config_t* p_dcache_cfg,
                      lm32_cache_config_t* p_icache_cfg,
                      uint32_t             disassemble_start)

int         lm32_run_program (const char* elf_fname,
                              int         run_cycles,
                              int         break_addr,
                              int         exec_type,
                              bool        load_code)
void        lm32_register_int_callback      (p_lm32_intcallback_t  callback_func)
void        lm32_register_ext_mem_callback  (p_lm32_memcallback_t  callback_func)
void        lm32_register_jtag_mem_callback (p_lm32_jtagcallback_t callback_func)
void        lm32_reset_cpu (void)
void        lm32_set_verbosity_level (int level)
lm32_time_t lm32_get_current_time (void)
void        lm32_set_configuration (uint32_t word)
uint32_t    lm32_get_configuration (void)
lm32_time_t lm32_get_num_instructions (void)
uint32_t    lm32_read_mem (uint32_t byte_addr, int type)
void        lm32_write_mem (uint32_t byte_addr, uint32_t data, int type, bool dis_cyc_cnt)
void        lm32_dump_registers (void)
lm32_state  lm32_get_cpu_state (void)
void        lm32_set_cpu_state (lm32_state new_state)
void        lm32_set_hw_debug_reg (uint32_t address, int type)
void        lm32_set_gp_reg (uint32_t index, uint32_t value)
```

#### Initialisation

The model object is created by instantiating a variable of the lm32\_cpu class, or by creating one via 'new'. The constructor, lm32\_cpu(), has a set of inputs for the initial configuration of the model.

| verbose       |                                          |
|---------------|------------------------------------------|
| type          | int                                      |
| valid values  | LM32_VERBOSITY_OFF, LM32_VERBOSITY_LVL_1 |
| default value | LM32_VERBOSITY_OFF                       |
| description   | Controls level of verbosity. Currently this is either on or off (i.e. only one level). When on, a disassembled output showing program flow is sent to the output stream (see 'ofp' description below). |

| disable_reset_break |             |
|---------------------|-------------|
| type                | bool        |
| valid values        | true, false |
| default value       | true        |
| description         | Controls whether the model will break and return on a reset exception. If, from a callback function, <code>lm32_reset_cpu()</code> is invoked, emulating a pin reset, a reset exception is flagged internally to the model, and the exception is handled if enabled. When this parameter is <code>false</code>, this will cause a break and return from <code>lm32_run_program()</code> with a value of <code>LM32_RESET_BREAK</code>. This break occurs whether or not the internal state of the model means the exception is handled (e.g. if the <code>IE</code> register is set to disable interrupts). This is useful if the calling program wants to reconfigure the model between resets. If the parameter is <code>true</code>, the model does not break on a reset. |

| disable_lock_break |             |
|--------------------|-------------|
| type               | bool        |
| valid values       | true, false |
| default value      | false       |
| description        | Controls whether the model will break and return on detection of a program 'lock' condition; i.e. an instruction with a 'jump to self' characteristic that would lock further program flow. This is useful when running a program with a definite termination point (while(1);). However, in an event driven environment, the main thread may have this construct, with the system simply responding to incoming events. In that case this feature would be disabled, and an alternative break condition is needed to return from the model. |
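
The 'jump to self' condition described for disable\_lock\_break can be pictured with the generic sketch below, which treats a lock as a taken PC-relative branch whose target equals its own address. The helper names and the byte-offset representation are assumptions for illustration, and the model's actual detection logic may differ.

```cpp
#include <cstdint>

// Hypothetical helper: target of a PC-relative branch, given the signed
// byte offset decoded from the instruction. 32-bit wrap-around is intended.
constexpr uint32_t branch_target(uint32_t pc, int32_t byte_offset)
{
    return pc + static_cast<uint32_t>(byte_offset);
}

// A 'lock' condition: a taken branch whose target is its own address,
// so the program counter can never advance.
constexpr bool is_lock_condition(uint32_t pc, int32_t byte_offset)
{
    return branch_target(pc, byte_offset) == pc;
}
```

A `while(1);` loop typically compiles to exactly such a branch, which is why it makes a convenient termination marker for test programs.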

| disable_hw_break |             |
|------------------|-------------|
| type             | bool        |
| valid values     | true, false |
| default value    | false       |
| description      | Controls whether the model will break and return on detection of a hardware breakpoint or watchpoint (if configured). When enabled, after a hardware breakpoint or watchpoint is reached, the model will return control to the calling program. On re-entry to the model, the program flow will continue from the exception point, including calling the exception vector code, as it would have without the external break. |

| disassemble_run |             |
|-----------------|-------------|
| type            | bool        |
| valid values    | true, false |
| default value   | false       |
| description     | When set 'true', 'running' the program does not execute the program code, but simply runs through the code linearly, generating disassembled output (as if verbose were set). This turns the model into a straightforward disassembler. |

| disassemble_start |                                                  |
|-------------------|--------------------------------------------------|
| type              | uint32_t (<code>#include &lt;cstdint&gt;</code>) |
| valid values      | 0x0 to 0xffffffc                                 |
| default value     | 0                                                |
| description       | When verbose mode is set, specifies the cycle after which the verbose output will start to be displayed. |

| num_mem_bytes |                                                  |
|---------------|--------------------------------------------------|
| type          | uint32_t (<code>#include &lt;cstdint&gt;</code>) |
| valid values  | 0x0 to 0xffffffc                                 |
| default value | 65536                                            |
| description   | Defines the number of bytes of internally modelled memory. This value is rounded up to a 4 byte boundary. The internal memory is contiguous and is read- and writeable. Internal memory does not have to be specified, but omitting it places a requirement on having a registered external memory callback to handle all accesses to memory (see "Callbacks" section below). Internal memory will be masked by a callback that intercepts addresses that overlap the internal memory space. |
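
The rounding applied to num\_mem\_bytes can be sketched as follows. The helper names are hypothetical and the arithmetic is only an illustration of rounding up to a 4 byte boundary, not the model's own code.

```cpp
#include <cstdint>

// Hypothetical helper: round a byte count up to the next multiple of 4.
// Values above 0xfffffffc would wrap to 0 in 32-bit arithmetic.
constexpr uint32_t round_up_to_word(uint32_t num_bytes)
{
    return (num_bytes + 3u) & ~static_cast<uint32_t>(3);
}

// Number of whole 32-bit words needed to hold the requested byte count.
constexpr uint32_t num_words(uint32_t num_bytes)
{
    return round_up_to_word(num_bytes) / 4u;
}
```

For example, a request for 65535 bytes would be treated as 65536 bytes (16384 words), while the default of 65536 is already aligned and is unchanged.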

| mem_offset    |                                                  |
|---------------|--------------------------------------------------|
| type          | uint32_t (<code>#include &lt;cstdint&gt;</code>) |
| valid values  | 0x0 to 0xffffffc                                 |
| default value | 0                                                |
| description   | Defines the byte address offset for the internal memory, if used. This value is rounded down to the nearest 4 byte word boundary. Useful if code to be executed is located in a region away from address 0, but it is still desired to have it loaded into internal memory. Usually used with the <code>entry_point_addr</code> argument (see below). |
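
As a sketch of how a relocated internal memory region might be resolved, the helpers below round the offset down to a word boundary, test whether an address falls within the region, and translate it to an index. All names are hypothetical, and the memory size is assumed to be already a multiple of 4; this illustrates the address arithmetic only.

```cpp
#include <cstdint>

// Hypothetical helper: round a byte address down to a 4-byte word boundary.
constexpr uint32_t round_down_to_word(uint32_t byte_addr)
{
    return byte_addr & ~static_cast<uint32_t>(3);
}

// True if byte_addr falls inside an internal memory of mem_bytes bytes
// (assumed already a multiple of 4) placed at the rounded-down offset.
constexpr bool in_internal_mem(uint32_t byte_addr, uint32_t mem_offset, uint32_t mem_bytes)
{
    return byte_addr >= round_down_to_word(mem_offset) &&
           (byte_addr - round_down_to_word(mem_offset)) < mem_bytes;
}

// Byte index into the internal memory for an address known to be inside it.
constexpr uint32_t internal_mem_index(uint32_t byte_addr, uint32_t mem_offset)
{
    return byte_addr - round_down_to_word(mem_offset);
}
```

With an offset of 0x10002 (rounded down to 0x10000) and 65536 bytes of memory, addresses 0x10000 to 0x1ffff fall inside the region, and anything outside it would need to be handled by an external memory callback.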

| mem_wait_states |                                                                                                                                                                                                                                                              |
|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type            | int                                                                                                                                                                                                                                                          |
| valid values    | 0 to INT_MAX                                                                                                                                                                                                                                                 |
| default value   | 0                                                                                                                                                                                                                                                            |
| description     | Defines the number of wait states to be associated with read or write accesses to internal memory. This value will be added to the cycle count for any access to the internal memory over and above any issue or stall due to the load or store instruction. |
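
The effect of mem\_wait\_states on timing can be pictured with the one-line sketch below. The function name and the idea of passing in the instruction's own cycle cost are assumptions for illustration; the model's internal accounting is not shown in this document.

```cpp
#include <cstdint>

// Hypothetical cycle accounting for one load or store to internal memory:
// the instruction's own issue/stall cycles plus the configured wait states.
constexpr uint64_t mem_access_cycles(uint64_t instr_cycles, int mem_wait_states)
{
    return instr_cycles + static_cast<uint64_t>(mem_wait_states);
}
```

With the default of 0 wait states an access costs only the instruction's own cycles, whereas 2 wait states would add 2 cycles to every internal memory access.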

| entry_point_addr |                                                  |
|------------------|--------------------------------------------------|
| type             | uint32_t (<code>#include &lt;cstdint&gt;</code>) |
| valid values     | 0x0 to 0xffffffc                                 |
| default value    | 0                                                |
| description      | Defines the address for the reset PC value. The model normally starts, or resets to address 0, which is where the default internal memory resides. If code is compiled for a different location, then the internal memory can be relocated with mem_offset, and the reset address specified with this argument, to point to the relocated internal memory (or a region handled by an external memory callback). |

| cfg_word      |                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type          | uint32_t (`#include <cstdint>`) |
| valid values  | 0x0 to 0xffffffff |
| default value | LM32_DEFAULT_CONFIG                                                                                                                                                                                                                                                                                                                                                                                                                          |
| description   | At construction, the value of the CFG register can be set with this value to enable or disable hardware features. The value is exactly compatible with the CFG register as defined in the Lattice Mico32 Processor Reference Manual [1], section 'Control and Status Register'. A mask is applied to the value set, so that features not supported by the model cannot be enabled—see "Introduction" section above for unsupported features. |
|               | Some definitions are provided in lm32_cpu_hdr.h that can be ORed together into the cfg_word, to enable features: LM32_MULT_ENABLE, LM32_DIV_ENABLE, LM32_SHIFT_ENABLE, LM32_SEXT_ENABLE, LM32_COUNT_ENABLE, LM32_DCACHE_ENABLE, LM32_ICACHE_ENABLE, LM32_SWDEBUG_ENABLE, LM32_HWDEBUG_ENABLE, LM32_JTAG_ENABLE, LM32_NUM_BP_[0-4], LM32_NUM_WP_[0-4]. |
|               | To configure the number of external interrupts, a value (between 0 and 32) can be shifted and ORed into the cfg_word, e.g. (32 << LM32_CFG_INT). |
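
As a rough illustration of composing a cfg_word, the sketch below ORs a few feature flags together and shifts in an interrupt count. The `SK_` constants are stand-ins invented so the fragment compiles on its own; their bit positions are assumed from the CFG register layout in the Lattice manual, and real code should use the LM32_ definitions from lm32_cpu_hdr.h instead.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in constants, NOT the values from lm32_cpu_hdr.h. The bit positions
// are assumptions (M = bit 0, D = bit 1, S = bit 2, INT field at bit 12).
constexpr uint32_t SK_MULT_ENABLE  = 1u << 0;
constexpr uint32_t SK_DIV_ENABLE   = 1u << 1;
constexpr uint32_t SK_SHIFT_ENABLE = 1u << 2;
constexpr int      SK_CFG_INT      = 12;   // assumed shift of the interrupt count field

// Build a configuration word with the multiplier, divider and barrel shifter
// enabled, and the requested number of external interrupts (0 to 32).
inline uint32_t make_cfg_word(uint32_t num_interrupts)
{
    assert(num_interrupts <= 32);
    return SK_MULT_ENABLE | SK_DIV_ENABLE | SK_SHIFT_ENABLE |
           (num_interrupts << SK_CFG_INT);
}
```

Whatever word is built this way, the model still applies its mask, so bits for unsupported features are dropped.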

| ofp           |                                                                                                                                                                      |
|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type          | FILE* (`#include <cstdio>`) |
| valid values  | NULL, or a valid file pointer |
| default value | stdout                                                                                                                                                               |
| description   | A file pointer to direct verbose and debug output data. By default this is to stdout, but a valid open writeable file pointer can be specified to direct the output. |

| p_dcache_cfg  |  |
|---------------|--|
| type          | lm32_cache_config_t* |
| valid values  | NULL, or a valid lm32_cache_config_t pointer |
| default value | NULL |
| description   | A pointer to a cache configuration structure. This structure contains values for the configurable parameters of the data cache. This structure is defined, in lm32_cpu_hdr.h, as follows: |

```
typedef struct {
    uint32_t cache_base_addr;
    uint32_t cache_limit;
    int cache_num_sets;
    int cache_num_ways;
    int cache_bytes_per_line;
} lm32_cache_config_t;
```

Valid values for these parameters are as per the LatticeMico32 Processor Reference Manual, Chapter 3, table 17. If the parameter is set to NULL, the data cache will use default values (if a data cache is configured).

| p_icache_cfg  |  |
|---------------|--|
| type          | lm32_cache_config_t* |
| valid values  | NULL, or a valid lm32_cache_config_t pointer |
| default value | NULL |
| description   | A pointer to a cache configuration structure. This structure contains values for the configurable parameters of the instruction cache. See p_dcache_cfg for details. |

# **Execution and breakpoints**

Once a model object is created, a program can be run via the <code>lm32\_run\_program()</code> method. At its simplest, it is called with the name of a program file to load and execute, and runs 'forever'. However, further parameters can be supplied in the call to limit the amount of execution. The call can also return due to other break events that were specified at initialisation (see "Initialisation" section above). The returned value indicates why <code>lm32\_run\_program()</code> exited. The parameters to the method are described below:

| elf_name      |                                                                                                                                                     |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| type          | const char*                                                                                                                                         |
| valid values  | string pointer with valid ELF program name, "" (empty string)                                                                                       |
| default value | "test.elf"                                                                                                                                          |
| description   | The name of the ELF program to load and execute. Can be an empty string if the call is not a load type (e.g. LM32_RUN_SINGLE_STEP; see exec_type below). |

| run_cycle     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type          | int                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| valid values  | 1 to INT_MAX, LM32_FOREVER or LM32_ONCE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| default value | LM32_FOREVER                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| description   | Defines the run cycle count to reach before returning. Any positive value between 1 and INT_MAX can be specified, or LM32_FOREVER to not break on a cycle count, or LM32_ONCE. Note that the timing model (see "Timing Model" section below) is such that it can only break on instruction boundaries. As some instructions take multiple cycles, the cycle count on returning may be greater than that specified, up to the amount that the last instruction took to execute. Note also that specifying a run_cycle of a value less than the model's current cycle count is equivalent to running with a value of LM32_ONCE (see below for lm32_get_current_time() method). |

| break_addr    |                                                                                                                                                                                                       |
|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type          | int                                                                                                                                                                                                   |
| valid values  | 0 to 0xfffffffc, LM32_NO_BREAK_ADDR                                                                                                                                                                   |
| default value | LM32_NO_BREAK_ADDR                                                                                                                                                                                    |
| description   | Specifies an address at which the method returns when the PC reaches it, or LM32_NO_BREAK_ADDR if no break on address is required. Note that the lower two bits of the specified address are ignored. |

| exec_type     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type          | int                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| valid values  | LM32_RUN_FROM_RESET, LM32_RUN_CONTINUE, LM32_RUN_SINGLE_STEP, LM32_RUN_TICK                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| default value | LM32_RUN_FROM_RESET                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| description   | Specifies the action taken on calling the method. Normally a value of LM32_RUN_FROM_RESET is specified, with an executable file given in 'elf_name' and any break point settings. The program is loaded into memory and the CPU is reset, starting execution from address 0. To single step the program, LM32_RUN_SINGLE_STEP is used. In this case the model returns after executing only one instruction, and the run_cycle, break_addr and elf_name parameters are ignored. A similar type is LM32_RUN_TICK. This advances time by a single clock only, and does not necessarily execute an instruction (if, say, the last instruction takes multiple cycles); an instruction is executed only when the tick count and the instruction cycle count agree. It is useful when integrating with an environment whose timing model advances by single clock ticks. LM32_RUN_CONTINUE is used to continue execution from the point at which the method last returned. The break parameters are active for this call type, but elf_name is ignored and no program is loaded. |

| load_code     |                                                                                                                                                                                                                                                                                                             |
|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type          | bool                                                                                                                                                                                                                                                                                                        |
| valid values  | true, false                                                                                                                                                                                                                                                                                                 |
| default value | false                                                                                                                                                                                                                                                                                                       |
| description   | Specifies that the program named by the elf_name argument is to be loaded into memory. The program is always loaded when exec_type is set to LM32_RUN_FROM_RESET, but this parameter can be used to re-load or replace a program if the lm32_run_program() call returned on some break point. |

#### **Return Value**

The lm32\_run\_program() method returns one of several values to indicate why it returned.

- LM32\_USER\_BREAK: returned having reached the user specified break address, set at configuration or passed as a parameter when calling lm32\_run\_program()
- LM32\_SINGLE\_STEP\_BREAK: returned whilst single stepping (i.e. the exec\_type argument was set to LM32\_RUN\_SINGLE\_STEP when lm32\_run\_program() was called)
- LM32\_TICK\_BREAK: returned whilst executing with an exec\_type of LM32\_RUN\_TICK
- LM32\_RESET\_BREAK: returned if the model was externally reset via lm32\_reset\_cpu() (and reset breaking was not disabled)
- LM32\_LOCK\_BREAK: reached a program 'lock' condition (and lock breaking was not disabled)
- LM32\_DISASSEMBLE\_BREAK: reached the end of the program during disassemble mode (see "Disassembled Output" section below)
- LM32\_HW\_BREAKPOINT\_BREAK: a hardware breakpoint fired, when breaking on hardware debug events was enabled
- LM32\_HW\_WATCHPOINT\_BREAK: a hardware watchpoint fired, when breaking on hardware debug events was enabled
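
One way a calling program might act on these return values is sketched below. The enum and the `should_continue()` helper are hypothetical stand-ins so that the fragment compiles without the library, and the policy they encode is only one possible choice; real code would compare against the LM32\_ break constants from the package headers.

```cpp
#include <cassert>

// Stand-in break codes; the numeric values are placeholders, not the library's.
enum sk_break_t {
    SK_USER_BREAK, SK_SINGLE_STEP_BREAK, SK_TICK_BREAK, SK_RESET_BREAK,
    SK_LOCK_BREAK, SK_DISASSEMBLE_BREAK, SK_HW_BREAKPOINT_BREAK,
    SK_HW_WATCHPOINT_BREAK
};

// Decide whether a driver loop should call the run method again (for example
// with an exec_type of LM32_RUN_CONTINUE) after a given return reason.
inline bool should_continue(sk_break_t reason)
{
    switch (reason) {
    case SK_SINGLE_STEP_BREAK:
    case SK_TICK_BREAK:
    case SK_HW_BREAKPOINT_BREAK:
    case SK_HW_WATCHPOINT_BREAK:
        return true;    // transient stops: inspect state, then resume
    default:
        return false;   // user break, reset, lock or end of disassembly
    }
}
```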

#### Run-time configuration and status

Some methods are provided for inspecting status and setting configuration at run-time, i.e. after lm32\_run\_program() has returned.

#### lm32\_set\_verbosity(int level)

description

Allows the verbosity level to be changed. Its single parameter takes the same valid values as the verbose argument of the lm32\_cpu constructor. This is useful when debugging long programs, which would generate a large amount of output if verbosity were enabled from time 0. A break can be set up to return at a known point before the area of interest, and the verbosity increased before continuing.

#### lm32\_get\_current\_time()

description

Returns the current internal cycle count of the model. This counts upward from 0 for all execution, and is not the CC value which can be reset. This is useful if wanting a break point on a cycle count relative to current time, rather than an absolute value.
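
The relative break point mentioned here amounts to adding the desired number of cycles to the current time before running again. The sketch below shows that arithmetic against a tiny stand-in model (`sk_model` is hypothetical, exists only so the fragment compiles alone, and pretends every instruction takes three cycles); with the real class the same sum would presumably be passed as the run\_cycle argument of lm32\_run\_program().

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for the model; the real lm32_cpu class differs.
struct sk_model {
    int64_t now = 0;

    int64_t get_current_time() const { return now; }

    // Runs until the absolute cycle count reaches run_cycle. It stops only on
    // an instruction boundary (3 cycles each here), so it may overshoot.
    void run_until(int64_t run_cycle) { while (now < run_cycle) now += 3; }
};

// Break 'delta' cycles from now by converting to an absolute cycle count.
inline int64_t run_for(sk_model& cpu, int64_t delta)
{
    cpu.run_until(cpu.get_current_time() + delta);
    return cpu.get_current_time();
}
```

Because the break lands on an instruction boundary, the time returned can exceed the requested value by up to the length of the last instruction.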

#### lm32\_get\_num\_instructions()

description

Returns the current count of instructions executed since time 0. Useful for statistical analysis and performance measurements.

#### lm32\_get\_configuration()

description

Returns the value of the CFG register as a uint32\_t. It allows external inspection of the 'implemented' hardware. Since any configuration value is masked (see the lm32\_cpu() description above, and the lm32\_set\_configuration() description below), this gives the definitive value being used by the model.

#### lm32\_set\_configuration(uint32\_t word)

description

Sets the value of the CFG register, masked to allow only supported features to be enabled. This ability to dynamically update the hardware configuration is not really supported in the Mico32, but it is useful for testing the model. The test suite (see "Testing" section below) run on cpumico32 reconfigures the model via the memory callback function to test that removing the features disables them as far as the program is concerned, and that the proper response is seen.

#### Reset event

| lm32_reset_cpu() |                                                                              |
|------------------|------------------------------------------------------------------------------|
| description      | Method used to model asserting the reset pin of the Mico32. It generates an internal event, which is processed as it would be by the real processor. |

#### Callbacks

The ISS is a model of a processor core, and its main usage is as a component in a larger system level model. It has an internal memory model for convenience and to aid stand alone testing, but it is via the callbacks that the model can be extended or integrated into a system model of arbitrary complexity. The model supports three types of user defined callbacks that can be registered with the model. One is for calling at each memory access that the CPU performs, the second is called at regular intervals (from once each instruction boundary, to as long as specified in a wait or sleep period). The third is to allow a JTAG interface to be implemented as an add-on, and is invoked whenever the model accesses the JTX or JRX registers.

The main extension is to map peripherals (including more memory, if desired) into the memory space via the external memory callback function, trapping accesses to addresses with memory mapped peripheral registers and implementing the functionality.

To take a trivial example, consider modelling a system that has a serial output. The external memory callback mechanism is the means to do this, as it exists to allow modelling of hardware that is external to the lm32 model itself. Once a callback function is registered with the method lm32\_register\_ext\_mem\_callback(), every access by the model to anywhere in the memory space calls this function first, with the address and, for a write, the data value. If the serial port has an output register located at, say, 0x80000000, then the callback function can test for this address and print to the screen the character held in the lowest byte of the data word passed in, returning a wait-state count for the access (any value of 0 or more). For all other addresses the callback function returns LM32\_EXT\_MEM\_NOT\_PROCESSED to allow the model to handle the access. This concept can be extended to model any hardware with a memory mapped register set accessible by the processor. You can try this mechanism for yourself by registering your own external memory callback function and having it print out all of the memory read and write accesses it sees (returning LM32\_EXT\_MEM\_NOT\_PROCESSED so that the model can continue). You'll see it accessing all the program locations and any other locations the program touches.
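
A minimal sketch of the serial port example follows. The callback shape (address, data pointer, access type, cache-hit flag) is assumed from the parameter names used in this manual, and the real p\_lm32\_memcallback\_t type may take further arguments. The `SK_` constants are stand-ins with made-up values, and the output is collected in a string rather than printed so that it can be inspected.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-ins for library definitions; the values are assumptions for this sketch.
constexpr int      SK_EXT_MEM_NOT_PROCESSED = -1;
constexpr int      SK_MEM_WR_ACCESS_WORD    = 2;
constexpr uint32_t SK_UART_TX_ADDR          = 0x80000000u;  // example address from the text

static std::string sk_uart_output;   // characters "transmitted" so far

// Assumed callback shape; the real callback type may differ.
int sk_ext_mem_callback(uint32_t byte_addr, uint32_t* data, int type, int cache_hit)
{
    (void)cache_hit;   // not needed for this peripheral
    if (byte_addr == SK_UART_TX_ADDR && type == SK_MEM_WR_ACCESS_WORD) {
        // Output the lowest byte of the written word (std::putchar would do for a screen)
        sk_uart_output.push_back(static_cast<char>(*data & 0xffu));
        return 0;                       // access handled, no wait states
    }
    return SK_EXT_MEM_NOT_PROCESSED;    // let the model handle everything else
}
```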

For functionality that isn't a memory mapped register, but can generate an interrupt, or requires updating on CPU time, then the interrupt callback can be used. This can be called up to every instruction, or at longer intervals, and it can generate an interrupt upon returning (or not). Taking the serial output port as an example, if it is required that the peripheral generates an interrupt some cycles after a byte is transmitted, then, after a sufficient number of calls to the int callback, from the register write (via the external memory callback), the callback can return a value, indicating an interrupt on one of the interrupt pins.

Note that these two callbacks are shared amongst all modelled peripherals, and the callbacks will have to do initial decoding. For the external memory accesses, this is just decoding incoming addresses, perhaps doing a page decode to identify a particular peripheral, and then calling a model function which does the rest of the decode. For the interrupt callback, it is envisaged that all peripheral models are called at every cycle, amalgamating any interrupts returned from the peripheral model functions.
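
The page decode described above might look something like the following. Everything here is hypothetical: the two peripheral models, their addresses, the 4 KB page size and the `SK_EXT_MEM_NOT_PROCESSED` value are all assumptions made so the fragment stands alone, and the dispatcher takes a simple read/write flag rather than the library's access type.

```cpp
#include <cassert>
#include <cstdint>

constexpr int SK_EXT_MEM_NOT_PROCESSED = -1;   // stand-in value, an assumption

static uint32_t sk_uart_last_tx = 0;           // state of a hypothetical UART model
static uint32_t sk_timer_count  = 0x1234;      // state of a hypothetical timer model

// Each peripheral model decodes the offset within its own page and
// returns the number of wait states for the access.
int sk_uart_access(uint32_t offset, uint32_t* data, bool is_write)
{
    if (offset == 0 && is_write) sk_uart_last_tx = *data & 0xffu;
    return 0;
}

int sk_timer_access(uint32_t offset, uint32_t* data, bool is_write)
{
    if (offset == 4 && !is_write) *data = sk_timer_count;
    return 1;
}

// Shared callback body: the page (address bits 31:12) selects the peripheral.
int sk_mem_dispatch(uint32_t byte_addr, uint32_t* data, bool is_write)
{
    const uint32_t page   = byte_addr >> 12;
    const uint32_t offset = byte_addr & 0xfffu;
    switch (page) {
    case 0x80000u: return sk_uart_access (offset, data, is_write);
    case 0x80001u: return sk_timer_access(offset, data, is_write);
    default:       return SK_EXT_MEM_NOT_PROCESSED;
    }
}
```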

A special callback is implemented to handle JTAG accesses, via the JTX and JRX registers. This is simply a hook for functionality that isn't implemented in the model, should anyone wish to add support for this.

The three callback registration functions are described below:

#### lm32\_register\_int\_callback (p\_lm32\_intcallback\_t callback\_func)

description

The caller must provide, as the input parameter, a pointer to a function of type p\_lm32\_intcallback\_t, e.g.:

```
uint32_t cb_func(lm32_time_t time, lm32_time_t* wakeup_time);
```

If a function is registered, the model will call this function at least once at time 0. The user function receives the current time in the 'time' parameter. Before returning, the function must update the contents of the integer pointed to by 'wakeup\_time' to indicate the cycle at which it next wishes to be called. A value from 0 up to and including the 'time' parameter means it is called after the next instruction. If the callback function wishes to delay being called until a future time, then wakeup\_time is set to some value greater than 'time'. A negative wakeup\_time value informs the model that termination is requested, and a user breakpoint is generated.

The return value from the callback is the pattern of inputs to the external interrupt pins (up to 32). For configured input pins, the event is set in the IP register of the model, and will generate an interrupt if the model has these enabled (IE and IM settings). If the CFG register is not configured for 32 interrupts, then bits set for all non-configured inputs are ignored, and the IP register bit is not set.
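The serial port example from earlier in this section might be sketched as follows: a write to a transmit register records the time, and the interrupt callback raises a pin a fixed number of cycles later. The prototype follows the one shown above; the pin number, the delay, the `uart_tx_written()` helper and the `lm32_time_t` stand-in are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

namespace int_sketch {
using lm32_time_t = int64_t;           // stand-in for the model's time type (assumption)

const lm32_time_t TX_DELAY = 20;       // hypothetical cycles from write to interrupt
const uint32_t    UART_IRQ = 1u << 2;  // hypothetical: UART wired to interrupt pin 2
lm32_time_t       tx_time  = -1;       // time of last TX write, -1 if none pending

// Hypothetical hook, called from the memory callback on a TX register write.
void uart_tx_written(lm32_time_t now) { tx_time = now; }

uint32_t cb_func(lm32_time_t time, lm32_time_t* wakeup_time)
{
    assert(wakeup_time != nullptr);

    if (tx_time < 0) {             // nothing pending: poll after the next instruction
        *wakeup_time = 0;
        return 0;
    }
    const lm32_time_t due = tx_time + TX_DELAY;
    if (time < due) {              // not yet due: ask to be called at the due time
        *wakeup_time = due;
        return 0;
    }
    tx_time      = -1;             // due: clear the pending state and raise the pin
    *wakeup_time = 0;
    return UART_IRQ;
}
} // namespace int_sketch
```

Returning a wake-up time of 0 while idle keeps the callback polling every instruction; a real peripheral might prefer a longer interval to reduce overhead.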

#### lm32\_register\_ext\_mem\_callback (p\_lm32\_memcallback\_t callback\_func)

description

The caller must provide, as the input parameter, a pointer to a function of type p\_lm32\_memcallback\_t e.g.:

If a function is registered, the model will call this for every memory access made (i.e. via load and store instructions). The address being accessed is passed in as 'byte\_addr', and data is exchanged via the 32 bit word pointed to by 'data'. On write access types this contains the data to be written. It has no meaning to update this on write accesses, as the model will ignore it, but it is safe to do so. On read access types, the data to be returned is placed into the integer pointed to by 'data'. The type parameter takes on one of several values, to indicate the direction and size of access, as shown below.

```
LM32_MEM_WR_ACCESS_BYTE
LM32_MEM_WR_ACCESS_HWORD
LM32_MEM_WR_ACCESS_WORD
LM32_MEM_WR_ACCESS_INSTR
LM32_MEM_RD_ACCESS_BYTE
LM32_MEM_RD_ACCESS_HWORD
LM32_MEM_RD_ACCESS_WORD
LM32_MEM_RD_INSTR
```

The cache\_hit parameter is a flag which indicates whether the memory access is a true access to memory (when zero), or whether the cache is fetching data on a cache hit (when non-zero). The cache model does not store actual data internally, but only keeps track of which addresses are cached, and still fetches data from memory. This flag allows the external callback functions to differentiate between access types if, say, it is keeping statistics on access rate, location etc.

The callback function can intercept any of the memory accesses to update its own internal state, and thus model memory mapped external blocks. If the callback processed the memory access it must return a cycle count from 0 to INT\_MAX to indicate the number of wait states that the model must add to its internal cycle count. If the callback is modelling a single cycle access, then 0 would be returned (no wait-states). If the callback is modelling more complex peripherals (e.g. with resource sharing and arbitration) with wait states generated, it would return a positive integer to indicate the number of wait state cycles elapsed. If the supplied address did not access any portion of memory covered by the callback function, or the access was invalid for some reason (misaligned, say), the callback *must* return LM32\_EXT\_MEM\_NOT\_PROCESSED, to allow the model to attempt an access to its internal memory. Honouring this requirement is important for correct timing operation, and the timing modelling is only as good as the values returned by the callback.

Note that for cache hit accesses, (cache\_hit non-zero) the returned number can be any value zero or greater (but not LM32\_EXT\_MEM\_NOT\_PROCESSED, if in a mapped region) as the timing model uses a number based on the timing for a cache access.
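The return-value rules can be illustrated with a toy memory callback that models a single word-wide register. This is a sketch only: the parameter list is assumed from the names used in the description (the real p\_lm32\_memcallback\_t typedef may differ or carry further arguments), and the constant values, page size and base address are stand-ins rather than the values from the model's headers.

```cpp
#include <cassert>
#include <cstdint>

namespace memcb_sketch {
// Stand-in values; the real definitions come from the model's headers.
constexpr int LM32_MEM_WR_ACCESS_WORD    = 2;
constexpr int LM32_MEM_RD_ACCESS_WORD    = 6;
constexpr int LM32_EXT_MEM_NOT_PROCESSED = -1;

constexpr uint32_t PAGE_MASK  = 0xfffff000u;  // hypothetical 4 KB peripheral pages
constexpr uint32_t TIMER_PAGE = 0x80000000u;  // hypothetical peripheral base address

uint32_t timer_reload = 0;                    // the toy peripheral's only register

// Assumed parameter list: address, data pointer, access type, cache-hit flag.
int ext_mem_cb(uint32_t byte_addr, uint32_t* data, int type, int cache_hit)
{
    (void)cache_hit;                          // unused in this sketch
    assert(data != nullptr);

    // Page decode: anything outside the peripheral is left to internal memory.
    if ((byte_addr & PAGE_MASK) != TIMER_PAGE)
        return LM32_EXT_MEM_NOT_PROCESSED;

    // Only offset 0 is implemented in this toy peripheral.
    if ((byte_addr & ~PAGE_MASK) != 0)
        return LM32_EXT_MEM_NOT_PROCESSED;

    switch (type) {
    case LM32_MEM_WR_ACCESS_WORD: timer_reload = *data; return 1;  // one wait state
    case LM32_MEM_RD_ACCESS_WORD: *data = timer_reload; return 0;  // no wait states
    default:                      return LM32_EXT_MEM_NOT_PROCESSED;
    }
}
} // namespace memcb_sketch
```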

#### lm32\_register\_jtag\_callback (p\_lm32\_jtagcallback\_t callback\_func)

description

The caller must provide, as the single input parameter, a pointer to a function of type p\_lm32\_jtagcallback\_t, e.g.:

```
void cb_func(uint32_t *data, int type, lm32_time_t time);
```

If a callback function is registered, the model will execute this each time the model accesses the JTX or JRX registers (via the rcsr or wcsr instructions). One of three types is passed in:

```
LM32_JTX_WR
LM32_JTX_RD
LM32_JRX_RD
```

There is no JRX write type as this has no function, and the model ignores the instruction internally. When the type is a write, 'data' points to the data byte to be written. On read types, the returned data is placed into the location pointed to by 'data'. For both JTX and JRX, bit 8 should contain the 'full' status of the TX or RX register.

By default, the model has JTAG as an unimplemented feature. This can be enabled or disabled via lm32\_set\_configuration(), or set at the model object's construction, but a side-effect of registering a JTAG callback function is that the feature is enabled, and the bit set in the CFG register automatically.
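A JTAG callback could, for instance, be backed by two byte queues. The sketch below follows the prototype shown above, but the constant values, the queues and the choice to consume a byte on each JRX read are assumptions for illustration, not the behaviour of any supplied code.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

namespace jtag_sketch {
using lm32_time_t = int64_t;           // stand-in for the model's time type (assumption)

// Stand-in values; the real definitions come from the model's headers.
constexpr int      LM32_JTX_WR = 0;
constexpr int      LM32_JTX_RD = 1;
constexpr int      LM32_JRX_RD = 2;
constexpr uint32_t FULL_BIT    = 1u << 8;   // bit 8 carries the 'full' status

std::deque<uint8_t> tx_log;     // bytes the program has written to JTX
std::deque<uint8_t> rx_queue;   // bytes waiting to be read from JRX

void jtag_cb(uint32_t* data, int type, lm32_time_t time)
{
    (void)time;                 // unused in this sketch
    assert(data != nullptr);

    switch (type) {
    case LM32_JTX_WR:           // program writes a byte to transmit
        tx_log.push_back(static_cast<uint8_t>(*data & 0xffu));
        break;
    case LM32_JTX_RD:           // toy transmitter is never full
        *data = 0;
        break;
    case LM32_JRX_RD:           // return next received byte, with full flag, if any
        if (rx_queue.empty()) {
            *data = 0;
        } else {
            *data = FULL_BIT | rx_queue.front();
            rx_queue.pop_front();
        }
        break;
    default:
        break;
    }
}
} // namespace jtag_sketch
```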

#### Internal memory access

The API provides direct access to the model's internal memory, via two methods. Using these methods will also invoke any external memory callback function, and so can be used to peek and poke memory areas implemented externally via the callback.

#### lm32\_read\_mem(uint32\_t byte\_addr, int type)

description

Returns a 32 bit word with the value at the address specified by 'byte\_addr'. Valid types are LM32\_MEM\_RD\_xx types as specified for the memory callback (see above). Any other type will cause a fatal error.

#### lm32\_write\_mem(uint32\_t byte\_addr, uint32\_t data, int type, bool db1\_cyc\_cnt)

description

Writes the data in 'data' to the address specified in 'byte\_addr'. Valid types are LM32\_MEM\_WR\_xx types as specified for the memory callback (see above). Any other type will cause a fatal error.

By default, the cycle count is advanced whenever this function is called, depending on other settings. This can be disabled with the optional fourth argument for use when, say, loading binary program data using this function. This argument is not available in the C interface function.

There are no safeguards on the calling of these memory access functions, and all internal memory is accessible from them. However, accessing invalid areas of memory will cause fatal exceptions. Use with caution.

#### Internal state access

Three methods are provided to give access to internal register state of the model, either to the output stream, or returned to the calling program.

#### lm32\_dump\_registers()

description

Prints out the complete set of internal register states to the output stream (as defined by the ofp parameter of the constructor), formatted as shown in the example below:

```
r00 = 0x00000000 r01 = 0x00003303 r02 = 0x00000000 r03 = 0x00000000
r04 = 0x00000000 r05 = 0x00000000 r06 = 0x00000000 r07 = 0x00000000
r08 = 0x00000000 r09 = 0x00000000 r10 = 0x00000000 r11 = 0x00000000
r12 = 0x00000003 r13 = 0x00000000 r14 = 0x00000000 r15 = 0x00000000
r16 = 0x00000000 r17 = 0x00000000 r18 = 0x00000000 r19 = 0x00000000
r20 = 0x00000000 r21 = 0x00000000 r22 = 0x00000000 r23 = 0x00000000
r24 = 0x0000900d r25 = 0x0000fffc gp  = 0x00000000 fp  = 0x00000000
sp  = 0x0000fff0 ra  = 0x00000000 ea  = 0x00000000 ba  = 0x00000578
pc  = 0x000005ac ie  = 0x00000005 ip  = 0x00000000 im  = 0x00000000
cc  = 0x000005bb eba = 0x00000000

bp0 = 0x00000241 bp1 = 0x00000251 bp2 = 0x00000261 bp3 = 0x00000271
wp0 = 0x00003000 wp1 = 0x00003101 wp2 = 0x00003202 wp3 = 0x00003303
dc  = 0x0000006c deba = 0x00000000
```

#### lm32\_get\_cpu\_state()

description

Returns a class lm32\_cpu::lm32\_state containing a complete set of the register values, plus some other persistent internal state needed by the model. All register fields are of type uint32\_t: one for each register. See source file lm32\_cpu.h for the details of the class definition.

#### lm32\_set\_cpu\_state()

description

Sets the internal model state to that passed into the function, of type lm32\_cpu::lm32\_state. This contains all the CPU registers, plus some other internal state needed by the model. This is intended for use in save and restore operations, rather than as a means to update the model's internal state externally. See source file lm32\_cpu.h for the details of the class definition.

#### lm32\_set\_hw\_debug\_reg(uint32\_t addr, int type)

description

Allows the external setting of the h/w debug registers. The address to be written to the register is passed in on the 'addr' parameter, and the register to access is defined via the 'type'. This can take on one of the following values.

```
LM32_CSR_ID_BP0
LM32_CSR_ID_BP1
LM32_CSR_ID_BP2
LM32_CSR_ID_BP3
LM32_CSR_ID_WP0
LM32_CSR_ID_WP1
LM32_CSR_ID_WP2
LM32_CSR_ID_WP3
```

If the model is configured to have fewer than the full complement of break- or watchpoints, then attempting to set registers that are not implemented via this method will have no effect. Note also that the breakpoint 'addr' value must include the enable bit, exactly as defined for the BPx registers in the reference manual [1], but the watchpoint 'addr' is a pure 32 bit byte address. So, in addition to the basic LM32\_CSR\_ID\_WPx value for the 'type', which defines the watchpoint to be updated, one of four settings must be ORed in with the value to make up the complete watchpoint access type:

```
LM32_WP_DISABLED
LM32_WP_BREAK_ON_READ
LM32_WP_BREAK_IN_WRITE
LM32_WP_BREAK_ALWAYS
```

The method updates the relevant Cn fields of the DC register, based on these type values.

#### lm32\_set\_gp\_reg(uint32\_t idx, uint32\_t value)

description

Allows the external setting of the 32 general purpose registers. The GP register to be written to is passed in on the 'idx' parameter, and the value to set is passed in on the 'value' parameter. This function is meant for debug and initialisation. Caution should be used if setting the registers externally whilst code is executing.

#### C Linkage Interface

A C linkage API is provided as an alternative to the C++ interface, for those who have a C environment that they wish to integrate the model into. The API is purposely as similar to the C++ API as possible, as it has been described above. There is a one-to-one correspondence to the C++ methods, with all the C API functions called lm32c\_<c++ equivalent suffix>, and each has an additional parameter, except for the initialisation function, as explained below.

The constructor is replaced with a function lm32c\_cpu\_init(). The parameters are the same as for the C++ constructor, only with the Boolean types now defined to be of type 'int'. Thus their default values are no longer 'true' or 'false', but the API defines values TRUE and FALSE, which replace these. The function returns a handle to a unique object, of type lm32c\_hdl, which must be saved, as all the other API functions require it to access the initialised model's instantiation. This allows for multiple instantiations of the model.

All the other C++ methods have an lm32c\_xxxx equivalent, with identical parameters, except that a new first parameter must be given, which is the handle returned by lm32c\_cpu\_init(). For example, if the handle has been saved in a variable lm32Hdl of type lm32c\_hdl, then a reset call is now lm32c\_reset\_cpu(lm32Hdl), in place of the C++ method lm32\_reset\_cpu(), described in the above sections. The lm32c\_get\_cpu\_state() function returns a structure of type lm32c\_state, rather than the class lm32\_state, but the field names are identical. The full list of C linkage functions is thus:

```
lm32c_cpu_init()
lm32c_run_program()
lm32c_reset_cpu()
lm32c_register_int_callback()
lm32c_register_ext_mem_callback()
lm32c_register_jtag_callback()
lm32c_set_verbosity_level()
lm32c_get_current_time()
lm32c_set_configuration()
lm32c_get_configuration()
lm32c_get_num_instructions()
lm32c_read_mem()
lm32c_write_mem()
lm32c_dump_registers()
lm32c_set_hw_debug_reg()
lm32c_get_cpu_state()
```

The API is defined in the source file header lm32\_cpu\_c.h, which must be included in code using the C API.
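The handle-based pattern described in this section can be sketched with a toy class. None of the names below belong to the real API: `toy_cpu`, `toy_hdl` and the `toyc_` functions are hypothetical stand-ins showing how an opaque handle lets C code drive several independent C++ objects.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a C++ model class (not the real lm32_cpu).
class toy_cpu {
public:
    void     reset()        { pc = 0; }
    void     step()         { pc += 4; }
    uint32_t get_pc() const { return pc; }
private:
    uint32_t pc = 0;
};

typedef void* toy_hdl;   // opaque handle, playing the role of lm32c_hdl

// Recover the C++ object from the handle passed back in by C code.
static toy_cpu* as_cpu(toy_hdl h)
{
    assert(h != nullptr);
    return static_cast<toy_cpu*>(h);
}

extern "C" {
// Each wrapper takes the handle as a new first parameter.
toy_hdl  toyc_cpu_init(void)        { return new toy_cpu; }
void     toyc_reset_cpu(toy_hdl h)  { as_cpu(h)->reset(); }
void     toyc_step(toy_hdl h)       { as_cpu(h)->step(); }
uint32_t toyc_get_pc(toy_hdl h)     { return as_cpu(h)->get_pc(); }
void     toyc_cpu_delete(toy_hdl h) { delete as_cpu(h); }
}
```

Because each handle points at its own object, two handles returned by separate init calls keep entirely separate state.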

# **Disassembled Output**

The model can output (to 'ofp') fully disassembled output in one of two ways. Either the disassembled output can show program flow, during a normal execution of code on the model, or it can simply display a disassembled output of the specified ELF executable file.

Normal execution flow disassembly is instigated either by setting the constructor's 'verbose' parameter, or by calling lm32\_set\_verbosity\_level(). When verbosity is enabled the output looks something like the example fragment shown below:

```
0x01ac: (0x2b9f0034)    lw       ba, (sp +00052)         @433
0x01b0: (0x379c0038)    addi     sp, sp, 000056          @434
0x01b4: (0xc3e00000)    b        ba                      @435
0x0304: (0x5e8c00a8)    bne      r20, r12, 0000672       @439
0x0308: (0x9a94a000)    xor      r20, r20, r20           @440
0x030c: (0x38013101)    ori      r1, r0, 0x3101          @441
0x0310: (0x30220000)    sb       (r1 +00000), r2         @442
0x0314: (0x5e8000a4)    bne      r20, r0, 0000656        @443
0x0318: (0x38010144)    ori      r1, r0, 0x0144          @444
0x031c: (0xd1010000)    wcsr     DC , r1                 @445
0x0320: (0x9a94a000)    xor      r20, r20, r20           @446
0x0324: (0x38013101)    ori      r1, r0, 0x3101          @447
```

The first field displays the address of the instruction (i.e. the value of the PC register). When the program flow is disrupted (due to a branch, call, or exception), this field shows '\*', and the rest of the line is left blank, to ease finding the jumps when debugging. Field 2 gives the raw instruction value being executed, followed by the actual disassembled instruction in field 3. The current cycle count is displayed in the last field. This mostly increases by one, but for instructions that take more cycles to execute, or are stalled etc., the value jumps by a larger amount. In the example above the branch instruction takes 4 cycles to issue, and so the cycle count jumps from 435 to 439.

The pure disassembled output is specified by setting the disassemble\_run parameter of the constructor, and has nearly identical output to that of the run-time output.

```
0x02f8: (0x9a94a000)    xor      r20, r20, r20
0x02fc: (0x38013101)    ori      r1, r0, 0x3101
0x0300: (0x10220000)    lb       r2, (r1 +00000)
0x0304: (0x5e8c00a8)    bne      r20, r12, 0000672
0x0308: (0x9a94a000)    xor      r20, r20, r20
0x030c: (0x38013101)    ori      r1, r0, 0x3101
0x0310: (0x30220000)    sb       (r1 +00000), r2
0x0314: (0x5e8000a4)    bne      r20, r0, 0000656
0x0318: (0x38010144)    ori      r1, r0, 0x0144
0x031c: (0xd1010000)    wcsr     DC , r1
0x0320: (0x9a94a000)    xor      r20, r20, r20
0x0324: (0x38013101)    ori      r1, r0, 0x3101
```

The main difference here is that there is no cycle count (as this has no meaning in this context), and there will be no breaks in address as the disassembling runs from the lowest to the highest address (of text areas) linearly.

# **Timing Model**

The ISS makes an approximation of time using the issue cycles and result cycles associated with each instruction, as defined in the LatticeMico32 reference manual [1]. Each instruction executed will advance the cycle count by at least its issue cycles, as the next instruction cannot be executed before this time. In addition, if it updates a register, then the result cycles value plus the current cycle count is stored for the target register. This is the earliest time that a future instruction can access this register. When an instruction is executed, its source registers (RY and, if applicable, RZ) have their availability times checked, and the cycle count is advanced to the time of the latest register's availability. This timing model does not take into account branch prediction, and uses the issue cycle numbers for 'taken' and 'not taken' unmodified, as defined in the reference manual [1].

The internal cycle count is also used as the basis for the CC register value. Since this register can be changed by software, but the cycle count needs to run continuously, the CC value is emulated by keeping track of the offset between the cycle count and the last programmed value, such that a read of the CC register will be correct, whilst still being based on the internal cycle count. This means only a single source is used for all timing.
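The CC emulation amounts to storing an offset at each software write and subtracting it on each read. A minimal sketch follows, assuming a signed 64-bit time type; the `cc_model` structure and the `cc_offset` name are hypothetical and are not taken from the source.

```cpp
#include <cassert>
#include <cstdint>

namespace cc_sketch {
using lm32_time_t = int64_t;   // stand-in for the model's time type (assumption)

struct cc_model {
    lm32_time_t cycle_count = 0;  // free-running count, never written by software
    lm32_time_t cc_offset   = 0;  // cycle_count at the last CC write, minus the value written

    // Time moves forward only through the single cycle count.
    void advance(lm32_time_t cycles)
    {
        assert(cycles >= 0);
        cycle_count += cycles;
    }

    // A software write to CC only records an offset.
    void write_cc(uint32_t value)
    {
        cc_offset = cycle_count - static_cast<lm32_time_t>(value);
    }

    // A read reconstructs CC from the cycle count; the cast wraps at 32 bits.
    uint32_t read_cc() const
    {
        return static_cast<uint32_t>(cycle_count - cc_offset);
    }
};
} // namespace cc_sketch
```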

The model can be advanced by single cycles, as well as single instructions, to allow the model to be called at a regular clock tick count. At each instruction the cycle count is advanced by one or more, depending on the instruction. At each clock 'tick', the clock time is advanced by one clock cycle. Only when the clock tick count matches the instruction cycle count is the next instruction executed. This is also useful when multiple instances of the model are instantiated, as they can be kept in synchronisation, by calling with a clock 'tick' rather than single-stepping, and their internal sense of time will advance at the same rate and remain locked, with just minor differences due to instruction execution granularity.
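The relationship between clock ticks and instruction time can be illustrated with a toy core in which each instruction is reduced to its cycle cost. The structure and names are hypothetical; the point is only that an instruction is executed when the tick count has caught up with the instruction cycle count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace tick_sketch {
using lm32_time_t = int64_t;   // stand-in for the model's time type (assumption)

// Toy core: each "instruction" is represented only by its cost in cycles.
struct toy_core {
    std::vector<lm32_time_t> costs;    // cycle cost of each instruction, in program order
    std::size_t next_instr  = 0;       // index of the next instruction to execute
    lm32_time_t cycle_count = 0;       // instruction time
    lm32_time_t clock_count = 0;       // clock tick time

    // Advance one clock tick. An instruction executes only when the tick
    // count matches the instruction cycle count.
    void tick()
    {
        if (clock_count == cycle_count && next_instr < costs.size()) {
            assert(costs[next_instr] >= 1);
            cycle_count += costs[next_instr++];
        }
        ++clock_count;
    }
};
} // namespace tick_sketch
```

Two such cores ticked in lock-step always agree on clock\_count, even when one is part-way through a multi-cycle instruction.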

#### **Caches**

The model implements configurable caches for the data and instruction fetches. The cache models are for timing purposes only, and do not actually store data within them, but keep a record of which addresses are cached. If an access to a cached region of memory is made, whether to internal memory or to a region mapped by an external memory access callback function, the data is still accessed as normal, but on a cache hit the reported wait states are ignored, and single cycle accesses of the cache are used to update time instead. On a cache miss, however, the memory access wait states (if any) are scaled by the number of word accesses required to fill the cache line, as these would have been fetched by the cache. When no cache is configured, the wait states from memory callbacks or internal memory accesses are used unmodified to update time.

By default, the caches are disabled (i.e. the configuration is for no caches implemented). To enable caching, the configuration register (CFG) must be modified at instantiation or via the lm32\_set\_configuration() method (see API section above).
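A tag-only cache of the kind described can be sketched briefly. For simplicity the sketch is direct-mapped (one way), whereas the model's caches are configurable in sets and ways, and the single-cycle hit cost is an assumption. The `tag_cache` name and its interface are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace cache_sketch {
// Direct-mapped, tag-only cache: tracks which lines are cached, stores no data.
struct tag_cache {
    uint32_t              num_sets;
    uint32_t              bytes_per_line;
    std::vector<uint32_t> tags;
    std::vector<bool>     valid;

    tag_cache(uint32_t sets, uint32_t line_bytes)
        : num_sets(sets), bytes_per_line(line_bytes), tags(sets, 0), valid(sets, false)
    {
        assert(sets > 0 && line_bytes >= 4 && line_bytes % 4 == 0);
    }

    // Returns the cycles to charge for an access whose underlying memory
    // access reported mem_wait_states wait states.
    int access_cycles(uint32_t byte_addr, int mem_wait_states)
    {
        const uint32_t line = byte_addr / bytes_per_line;
        const uint32_t set  = line % num_sets;
        const uint32_t tag  = line / num_sets;

        if (valid[set] && tags[set] == tag)
            return 1;                       // hit: wait states ignored, single cycle

        valid[set] = true;                  // miss: record the line as cached
        tags[set]  = tag;
        const int words_per_line = static_cast<int>(bytes_per_line / 4);
        return mem_wait_states * words_per_line;   // scaled by the line fill
    }
};
} // namespace cache_sketch
```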

## **Source Code Architecture**

It is not the intention to go into minute detail for the internal architecture of the model here, but a brief overview of the main program flow, internal state, and major structures is in order, to allow anyone wishing to understand or modify the code enough of a handle, that they can explore the details on their own.

#### Main execution flow

Below is shown some pseudo-code of the main program flow when executing a program. The main lm32\_cpu class member functions are shown as "funcname()", and the phrases between "<" and ">" describe local functionality. The indentation of the pseudo-code shows the calling hierarchy as implemented in the code.

```
lm32_run_program()
   <if running from reset...>
       <load ELF program to memory>
   <while no break point reached...>
       execute_instruction()
           process_exception()
               <process external interrupts>
               <if master interrupts enabled...>
                   interrupt()
                       <if interrupt outstanding...>
                           <generate exception>
           <fetch opcode from PC location in memory>
           <lookup decode table information using opcode>
           <extract argument fields from opcode>
           <if verbose or disassemble run...>
                disassemble()
           <if not disassemble run...>
                <lookup instruction function in tbl_p>
                <execute instruction function using decode table lookup data>
       <check for break points and flag>
```

The above pseudo-code is a rough outline only, and doesn't show callback handling, memory accesses, disassembling or instruction execution (though this last is described below, in the "Instruction Functions" section).

#### **Key Model State**

The list below shows some of the major state used in the model.

- state: Contains all the CPU's modelled registers, e.g. r[32], pc, im etc. There is a field corresponding to each register in the Mico32 CPU, including debug registers. It is of type lm32\_cpu::lm32\_state. It also carries other persistent state, used by the model, that will need saving for save and restore operations.
- state.int\_flags: bitmapped value indicating pending exception. Each of the bits, from bit 0 to bit 7, corresponds to the exception ID as defined in the reference manual [1]. This is part of the state structure.
- state.cycle\_count: number of executed cycles since time 0. Note that this
  is not the number of instructions executed. Instructions that take multiple
  cycles, increment this count by more than 1. This is part of the state
  structure.
- mem: pointer to internal memory. This can be NULL if none defined, and all memory handled by callback functions.
- mem\_tag: pointer to internal memory tag that contains debug tag data to mark the
  access types for internal memory locations. Can be NULL—see mem above.
- rt: CPU general purpose registers' next availability times. See the 'Instruction Functions' section below for details of usage.
- tbl\_p: pointer to table of instruction function pointers. See the 'Decode Table' section below for more details.
- decode\_table: table of instruction decode information. See the 'Decode Table' section below for more details.

#### **Decode Table**

At the heart of the execution of the model is a decode table used for quick lookup of decode information for a given instruction's opcode. The decode table consists of 64 entries with the following structure type:

```
struct lm32_decode_table_t {
   const char* instr_name;
   unsigned instr_fmt;
   lm32_time_t result_cycles;
   lm32_time_t issue_cycles;
   lm32_time_t issue_not_takencycles;
   unsigned signed_imm;
};
```

It is a constant table, held in the global decode\_table variable, and initialised at compilation. The instr\_name field is a string for disassembly purposes, whilst the instr\_fmt gives information as to the instruction format for that opcode. There are slightly more formats than are defined in the reference manual [1], as quirks of some instructions need uniquely identifying. The definitions in lm32\_cpu\_mico32.h prefixed "LM32\_FMT\_" give all the possible values. The three time based fields correspond to the cycles taken for each instruction as defined by the reference manual [1], with an issue count, a result count and (if a decision branch) a not-taken issue count. The signed\_imm field indicates whether any immediate bits of the instruction are signed or not. For instructions with no immediate value, this is a "don't care". An example initialisation for a table entry, for the sextb instruction, is shown below:

```
{"sextb ", INSTR_FMT_RC, 1, 1, 0, INSTR_SE_DONT_CARE}
```

The table is used during execution of instructions. During decode, a structure is used for constructing decode information, as shown below.

```
struct lm32_decode_t {
  uint32_t    opcode;
  uint32_t    reg0_csr
  uint32_t    reg1;
  uint32_t    reg2;
  uint32_t    imm;
  const lm32_decode_table_t* decode; }
```

This structure is like a form that is filled in as the instruction is processed. The 'opcode' field is set with the raw fetched instruction value, and then the fields are separated into the regXX and imm fields, depending on the instruction type. The type is derived from the last field, which is a pointer to an entry in the decode\_table, described above. During decode the opcode is used to fetch the decode\_table location for the instruction, and the pointer to the entry is stored in the decode field. It is a pointer to this structure that is ultimately passed in to the instruction functions for use in executing the instruction functionality.

The tbl\_p pointer of the lm32\_cpu class points to a table of 64 entries, corresponding to the 64 opcodes, and contains pointers to functions that will implement that opcode's function. The table's type is an lm32\_func\_table class, with an array of pointers of type pFunc\_t. This corresponds to a member function of the lm32\_cpu class, with a form "void lm32\_<instr\_name>(p\_lm32\_decode\_t)", with the sole argument being a pointer to an object of type lm32\_decode\_t, as shown above. The table is constructed and initialised in the constructor of the lm32\_cpu class.
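Dispatch through a 64-entry table of member function pointers might look like the cut-down sketch below. The `toy_cpu` class, the `decode_t` structure and the field extraction are hypothetical simplifications; the bit positions used are an assumption, chosen to be consistent with the encoding 0x38013101 for "ori r1, r0, 0x3101" seen in the disassembly examples.

```cpp
#include <cassert>
#include <cstdint>

namespace dispatch_sketch {
struct decode_t {              // cut-down stand-in for lm32_decode_t
    uint32_t opcode;           // raw instruction word
    uint32_t reg0, reg1;       // register indices extracted from the word
    uint32_t imm;              // 16 bit immediate
};

class toy_cpu {
public:
    typedef void (toy_cpu::*pFunc_t)(const decode_t*);

    toy_cpu() : pc(0), reserved_hit(false)
    {
        for (int i = 0; i < 32; ++i) r[i] = 0;
        // Every opcode gets an entry; unimplemented ones share a reserved handler.
        for (int i = 0; i < 64; ++i) tbl[i] = &toy_cpu::op_rsrvd;
        tbl[0x0e] = &toy_cpu::op_ori;
    }

    void execute(uint32_t instr)
    {
        decode_t d;
        d.opcode = instr;
        d.reg0   = (instr >> 21) & 0x1f;   // assumed source register field
        d.reg1   = (instr >> 16) & 0x1f;   // assumed destination register field
        d.imm    = instr & 0xffff;

        const uint32_t op = instr >> 26;   // top six bits index the table
        assert(tbl[op] != nullptr);
        (this->*tbl[op])(&d);
    }

    uint32_t r[32];
    uint32_t pc;
    bool     reserved_hit;

private:
    void op_ori(const decode_t* p) { r[p->reg1] = r[p->reg0] | p->imm; pc += 4; }
    void op_rsrvd(const decode_t*) { reserved_hit = true; }

    pFunc_t tbl[64];
};
} // namespace dispatch_sketch
```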

#### **Instruction Functions**

The actual instruction execution functions are defined in the source file lm32\_cpu\_inst.cpp, and all have a similar basic format. An example is shown below for the byte sign extend instruction (sextb).

```
void lm32_cpu::lm32_sextb (p_lm32_decode_t p) {
   if (state.cfg & (1 << LM32_CFG_X)) {
      cycle_count += calc_stall(p->reg0_csr, NULL_REG_IDX, cycle_count);
      int32_t ry = SIGN_EXT8(state.r[p->reg0_csr] & BYTE_MASK);
      state.r[p->reg2] = ry;
      rt[p->reg2] = cycle_count + p->decode->result_cycles;
      state.pc += 4;
      cycle_count += p->decode->issue_cycles;
   } else
      lm32_rsrvd(p);
}
```

The function is passed a pointer to the decode information looked up in execute(), and filled in with extracted argument fields (e.g. rx, rz indices, or immediate values etc.). As sign extension is an optional feature, the CFG register state (state.cfg) is inspected, and if sign extension is not implemented, then the lm32\_rsrvd() instruction function is called instead. Not all instructions are optional, and the functions for the non-optional instructions don't have a test like this.

If it is implemented, then the first job is to see if any source registers are stalled. In this case there is only one source register (indexed by p->reg0\_csr), and calc\_stall() is called, which returns the number of cycles to wait before that source register is available. The 'rt' table has a list of cycle counts indicating when each of the 32 general purpose registers is next available. Any source register for an instruction that has an 'rt' entry in the future (relative to cycle\_count) generates a wait state count that is the difference between cycle\_count and the 'rt' entry for the register. In the case of instructions with two source registers, the larger of the two calculated wait cycles is returned. This is added to the current cycle\_count to effectively delay execution of the instruction.
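The stall calculation described above can be sketched as a small free function. The real calc\_stall() is a member of the model and its exact signature is not shown in this document, so the parameter types, the NULL\_REG\_IDX value and the global 'rt' array here are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

namespace stall_sketch {
using lm32_time_t = int64_t;        // stand-in for the model's time type (assumption)
constexpr int NULL_REG_IDX = -1;    // stand-in value meaning "no second source register"

lm32_time_t rt[32] = {};            // next availability time of each GP register

// Returns the number of cycles to wait until all source registers are available.
lm32_time_t calc_stall(int ry, int rz, lm32_time_t cycle_count)
{
    assert(ry >= 0 && ry < 32);
    assert(rz == NULL_REG_IDX || (rz >= 0 && rz < 32));

    // A register already available contributes no stall.
    lm32_time_t stall = std::max<lm32_time_t>(0, rt[ry] - cycle_count);

    // With two sources, the later availability time dominates.
    if (rz != NULL_REG_IDX)
        stall = std::max(stall, rt[rz] - cycle_count);

    return stall;
}
} // namespace stall_sketch
```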

The value of the indexed register is retrieved from state and sign extended, as required for this instruction's function, into a variable ry. The destination register, indexed by p->reg2, is updated with the ry value, and then the 'rt' table entry for the indexed destination register is updated to contain the cycle time at which it will next be available. This is the current cycle\_count (with stalling already added), plus the result\_cycles for this particular executed instruction, as defined in the decode\_table entry passed into the function.
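The definitions of SIGN\_EXT8 and BYTE\_MASK are not reproduced in this document. One portable way to express the same byte sign extension is sketched below; the `sign_ext8()` helper is hypothetical and is not the macro used in the source.

```cpp
#include <cassert>
#include <cstdint>

namespace sext_sketch {
// Sign extend the low byte of v to 32 bits: flip the sign bit, then
// subtract its weight, which maps 0x80..0xff onto -128..-1.
inline int32_t sign_ext8(uint32_t v)
{
    const int32_t byte = static_cast<int32_t>(v & 0xffu);
    assert(byte >= 0 && byte <= 0xff);
    return (byte ^ 0x80) - 0x80;
}
} // namespace sext_sketch
```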

The PC is incremented to the next instruction (for branches this might be to a different address), and the cycle\_count incremented by the value of the issue\_cycles for the instruction, as defined in the decode\_table entry, that was passed into the function via the 'p' pointer.

The 64 opcodes all have an entry in the tbl\_p table, and point to a function like that in the above example.

# **Testing**

#### **Test Platform**

As has been mentioned above, an executable environment, cpumico32, is constructed that instantiates the lm32 model, and provides sufficient control and facilities to allow the model to be fully tested. This includes a command line control interface for configuring the model and testing, as well as a set of callback functions to allow testing of such things as interrupts etc.

Detailed discussion of the code is not undertaken here—the code is not complicated, and inspection of the source should be sufficient—but a brief description of the program's usage is given. The usage message for cpumico32 is as follows:

```
Usage: cpumico32 [-v] [-x] [-d] [-D] [-I] [-n <num>] [-b <addr>] [-r <addr>]
         [-R <num>] [-f <filename>] [-m <num>] [-o <addr>] [-e <addr>]
         [-l <filename>] [-c <num>] [-w <wait states>] [-i <filename>] [-T]
    -n Specify number of instructions to run (default: run forever)
    -b Specify address for breakpoint (default: none)
    -f Specify executable ELF file (default: test.elf)
    -l Specify log file output (default: stdout)
    -m Specify size of internal memory in bytes (default: 65536)
    -o Internal memory offset (default 0x00000000)
    -e specify an entry point address (default 0x00000000)
    -v Specify verbose output (default: off)
    -x Enable disassemble mode (default: disabled)
    -d Disable breaking on lock condition (default: enabled)
    -r Address to dump value from internal ram after completion (default: no dump)
    -R Number of bytes to dump from RAM if -r specified (default 4)
    -D Dump registers after completion (default: no dump)
    -I Dump number of instructions executed (default: no dump)
    -c Set configuration word value to enable/disable features
    -w Set the number of wait states for internal memory (default 0)
    -i Specify a .ini filename to use for configuration (default none)
    -T Enable internal callback functions for test (default disabled)
```

For the most part, the command line options map directly to configuration options of the model's constructor, or configuration methods. The options -m, -o, -e, -v, -x, -d, -c, and -w all get mapped to the constructor's inputs unmodified. The -l option specifies a log filename, which cpumico32 opens for writing, and passes the file pointer to the constructor.

The options -f, -n, -b map to the first three arguments of the lm32\_run\_program() method (elf\_name, run\_cycles and break\_addr respectively). The exec\_type and load\_code inputs are controlled internally by the program, with the exec\_type value defaulting to LM32\_RUN\_FROM\_RESET, but this can be overwritten by a test program to change its type, via the memory callback function. The program always loads the specified ELF program, but controls loading of the code in case of a break, where lm32\_run\_program() will be re-entered, but the code does not need reloading.

The options -r, -R, -D and -I all control the post-run calling of debug data dumping. The -r and -R options have the program dump memory values from internal RAM, via the lm32\_read\_mem() method, specifying the start address and the number of bytes (always rounded to a whole word). The -D option causes a call to the lm32\_dump\_registers() method, and -I a call to the lm32\_get\_num\_instructions() method.

As mentioned before, the cpumico32 program has internal callback functions for the three callbacks that can be registered with the model. These are specific to testing the model, and can have side effects if non-test code is run. Therefore, by default, they are not enabled. When testing, the -T option is specified to enable them.

All of the above configuration command line options, and additional configurations, can be set by using a .ini configuration file. The -i option is used to specify the .ini file to use. Values specified in this configuration file override the default values but, in turn, can be overridden by the command line options, allowing a mix of methods, with the command line having final control. The default test .ini file, used by model testing, is shown below:

```
; INI file used for test. DO NOT EDIT!
[program]
filename=test.elf
entry_point_addr=0
[configuration]
cfg_word=0x11203f7
log_fname=stdout
test_mode=false
verbose=false
ram_dump_addr=-1
ram_dump_bytes=0
dump_registers=false
dump_num_exec_instr=false
disassemble_run=false
[breakpoints]
user_break_addr=-1
num_run_instructions=-1
disable_reset_break=false
disable_hw_break=false
disable_lock_break=false
[memory]
mem_size=65536
mem_wait_states=0
mem_offset=0
[dcache]
cache_base_addr=0
cache_limit=0x0fffffff
cache_num_sets=512
cache_num_ways=2
cache_bytes_per_line=4
[icache]
cache_base_addr=0
cache_limit=0x7fffffff
cache_num_sets=1024
cache_num_ways=2
cache_bytes_per_line=4
```

The first four sections should be fairly self-explanatory, and map to command line options. The two cache sections allow control of the cache configurations that are passed into the model's constructor. These have no command line equivalent, and so can only be modified from their default settings with a .ini configuration file.
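The precedence described above (built-in defaults, then the .ini file, then the command line) can be sketched with Python's standard `configparser` module. This is only an illustration of the layering, not the code in lm32\_get\_config.cpp: the `DEFAULTS` table, the example .ini text and the function names are all assumptions made for the example, and only three keys from the test file are handled.

```python
import configparser

# A few defaults, taken from the usage message and test .ini file
DEFAULTS = {"mem_size": 65536, "mem_wait_states": 0, "verbose": False}

# Hypothetical user .ini text, following the layout of the test file
INI_TEXT = """
; example overrides
[configuration]
verbose=true
[memory]
mem_wait_states=2
"""


def load_ini(text):
    """Extract the handled keys that are actually present in the .ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    values = {}
    if parser.has_option("configuration", "verbose"):
        values["verbose"] = parser.getboolean("configuration", "verbose")
    for key in ("mem_size", "mem_wait_states"):
        if parser.has_option("memory", key):
            # Base 0 accepts both decimal and 0x-prefixed values
            values[key] = int(parser.get("memory", key), 0)
    return values


def effective_config(ini_text, cli_overrides):
    """Layer the settings: defaults, then .ini values, then command line."""
    config = dict(DEFAULTS)
    config.update(load_ini(ini_text))
    config.update(cli_overrides)
    return config


# Equivalent to giving '-w 5' on the command line alongside the .ini file
config = effective_config(INI_TEXT, {"mem_wait_states": 5})
print(config)
```

Here the .ini file turns verbosity on, the command line value for the wait states wins over the .ini value, and the memory size falls through to its default.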

#### **Callback Functionality**

The cpumico32 program implements and registers three callbacks with the model in order to allow full coverage in testing the model. It provides a means to generate external interrupts, with a time to fire them, a means to alter the configuration register to dynamically enable or disable hardware features, a means to reset the model, changing the execution type as it does so, and to test JTAG accesses.

These controls (except the JTAG) are implemented by memory mapping 'registers' from location 0x20000000, implemented in the memory callback function, with offsets defined in the source code. An interrupt pattern can be written at offset COMMS\_INT\_PATTERN\_OFFSET, along with a time (relative to current time) at COMMS\_TIME\_OFFSET. The interrupt callback function, when called, will generate an interrupt if any of the pattern bits are set, after the time set by a write to the COMMS\_TIME\_OFFSET register.

The individual bits of the configuration register (CFG) can be written to via the next set of locally defined offsets (COMMS\_NUM\_INT\_OFFSET to COMMS\_WDOG\_EN\_OFFSET). Note that reading any of these locations returns the whole configuration register value, not the individual bit. At an offset of COMMS\_RESET\_OFFSET, a write will reset the model, as if the reset pin had been activated, and also set the local execution type variable to whatever data value was written, to override the default.

The JTAG callback function implements a simple loopback. A write to the JTAG transmit register loads a value into a local variable, which can be read back when the JTAG receive register is read (or the TX register is read).
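The interrupt and JTAG behaviour described above can be modelled in a few lines. The sketch below is a stand-alone Python illustration and not the cpumico32 source: the numeric offsets are placeholders (the real values are defined in the source code), the class and method names are hypothetical, and clearing the timer after it fires is an assumption.

```python
COMMS_BASE = 0x20000000
# Placeholder offsets: the real values are defined in the cpumico32 source
COMMS_INT_PATTERN_OFFSET = 0x0
COMMS_TIME_OFFSET = 0x4


class CommsRegs:
    """Toy model of the memory mapped test 'registers' and JTAG loopback."""

    def __init__(self):
        self.int_pattern = 0
        self.fire_time = None
        self.jtag_value = 0

    def mem_write(self, addr, data, now):
        """Memory callback: intercept writes to the test register space."""
        offset = addr - COMMS_BASE
        if offset == COMMS_INT_PATTERN_OFFSET:
            self.int_pattern = data
        elif offset == COMMS_TIME_OFFSET:
            # The written time is relative to the current time
            self.fire_time = now + data

    def int_callback(self, now):
        """Interrupt callback: return the interrupt bits to raise (0 if none)."""
        due = self.fire_time is not None and now >= self.fire_time
        if due and self.int_pattern:
            self.fire_time = None  # assumption: each timer write fires once
            return self.int_pattern
        return 0

    def jtag_write_tx(self, value):
        """JTAG loopback: a transmit write is stored locally."""
        self.jtag_value = value

    def jtag_read_rx(self):
        """JTAG loopback: a receive read returns the stored value."""
        return self.jtag_value


comms = CommsRegs()
comms.mem_write(COMMS_BASE + COMMS_INT_PATTERN_OFFSET, 0x1, now=100)
comms.mem_write(COMMS_BASE + COMMS_TIME_OFFSET, 50, now=100)
early = comms.int_callback(120)    # before the programmed time
fired = comms.int_callback(150)    # at the programmed time
comms.jtag_write_tx(0xA5)
looped = comms.jtag_read_rx()
```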

With this functionality and configurability in cpumico32, all features of the model can be exercised, and a set of assembler code tests has been constructed to do just that, detailed in the Test Code section.

#### **GUI for cpumico32**

The package comes with a Python based GUI for the cpumico32 program, in case configuring the command line seems too complicated or cumbersome. It is located in the python/ directory and is called lm32.py. The script uses python3 and the tkinter module, which is usually bundled with the Python package.

When run (e.g., on Windows, python3 python/lm32.py), a window appears looking like the following:



All the flags and arguments of the command line can be controlled from this GUI. The window will open with the default values of the cpumico32 program, and the GUI is used to adjust these. It will check for valid inputs, and raise an error if a value is outside of the prescribed limits. The program to be run is selected from the menu under 'File->Open ELF File...', with the selection displayed at the bottom in the 'Utility files' frame. The CFG register setting is shown in the 'Variables' frame, but greyed out. To alter this value, the box may be double-clicked with the mouse, and a new popup appears, like that shown below:



The popup window has two tabs, with one for the binary flags enabling or disabling features, whilst the second has values defining the number of watch- and breakpoints, as well as the number of external interrupts. Updating the flags and values automatically updates the value in the configuration register box.

When configuration is complete, the 'Run' button at the bottom can be pressed, and it will execute a cpumico32 command with all the appropriate command line arguments. It is assumed that cpumico32 is on the search path, or is in the directory from whence the script was run.
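The step from GUI settings to a command line can be sketched as follows. This is not the code in lm32.py; `build_command` and its parameter names are hypothetical, and only a handful of the options from the usage message (-f, -D, -I, -T, -r and -R) are handled.

```python
import shlex


def build_command(exe="cpumico32", elf_file=None, dump_regs=False,
                  dump_num_instr=False, test_mode=False,
                  ram_dump_addr=None, ram_dump_bytes=None):
    """Assemble an argument list from a set of GUI-style settings."""
    cmd = [exe]
    if elf_file is not None:
        cmd += ["-f", elf_file]
    if dump_regs:
        cmd.append("-D")
    if dump_num_instr:
        cmd.append("-I")
    if test_mode:
        cmd.append("-T")
    if ram_dump_addr is not None:
        cmd += ["-r", hex(ram_dump_addr)]
    if ram_dump_bytes is not None:
        cmd += ["-R", str(ram_dump_bytes)]
    return cmd


cmd = build_command(elf_file="test.elf", dump_regs=True, dump_num_instr=True,
                    test_mode=True, ram_dump_addr=0xfffc)
# shlex.join() gives a copy-and-paste friendly string for display
print(shlex.join(cmd))
```

Keeping the command as a list, rather than a single string, means it could be handed directly to `subprocess.run()` without any shell quoting concerns.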

By default, the program will search for the cpumico32 executable on the path, as indicated by the 'cpumico32 Dir' box in the Directories area. This can be changed from the file menu ('File->cpumico2 Folder...'), to select a particular folder containing an executable. This is useful to select between a debug or a release development version, say. In addition, the program uses the directory from which it was run as the working directory (as indicated by the 'Run Dir' box), but this can also be changed from the file menu ('File->cpumico2 Folder...'). Any relative references (such as for the program file, log file, etc.) are automatically updated if the run directory is changed.

The output from running the command is sent to a new window, including the contents of any specified log file. The window will look something like the following figure (where, in this example, registers and number of executed instructions are dumped, and the contents of memory at 0xfffc are printed). The first line in the window is the command that was executed, with the command line options, as a reference for use in scripting etc.

```
cpumico32.exe -DIT -r 0xfffc
r00 = 0x00000000 r01 = 0x00000315 r02 = 0xfffffe43 r03 = 0xfffffe43
r04 = 0xfffffff5 r05 = 0x776f646e r06 = 0x694b2073 r07 = 0x385c7374
r08 = 0x575c312e r09 = 0x6f646e69 r10 = 0x50207377 r11 = 0x6f667265
r12 = 0x6e616d72 r13 = 0x54206563 r14 = 0x6b6c6f6f r15 = 0x3b5c7469
r16 = 0x505c3a43 r17 = 0x72676f72 r18 = 0x46206d61 r19 = 0x73656c69
r20 = 0x38782820 r21 = 0x4d5c2936 r22 = 0x6f26363
r24 = 0x20535620 r25 = 0x65646f43 gp = 0x6e69625c fp = 0x5c3a433b
sp = 0x6c6f6f54 ra = 0x6c415c73 ea = 0x0000900d ba = 0x0000fffc
pc = 0x000000e8 ie = 0x00000000 ip = 0x00000000 im = 0x00000000
icc = 0x00000000 dcc = 0x00000000 cfg = 0x01120bf7 cfg2 = 0x00000000
cc = 0x0000004e eba = 0x00000000
bp0 = 0xfffffffd bp1 = 0xfffffffd bp2 = 0xfffffffd bp3 = 0xfffffffd
wp0 = 0xffffffff wp1 = 0xffffffff wp2 = 0xffffffff wp3 = 0xffffffff
dc = 0x00000000 deba = 0x00000000
Number of executed instructions = 51
RAM 0xfffc = 0x0000900d
```

#### **Test Code**

A set of assembler programs was developed, and is provided for execution on cpumico32, to run a range of self-tests that verify the model. These tests are all directed tests, but cover nearly all aspects of the model, including all instructions and all exceptions. Each test lives in its own subdirectory under test/<category>/, and each subdirectory has a single source file, test.s. The tests are self-checking and return a value 0x0000900d in memory location 0x0000fffc if the test passes, or 0x00000bad if it fails (if the program never terminates cleanly, then this value is undefined, but is unlikely to be the pass value). Listed below are the features covered by the tests, along with the test directory that contains the covering test.

#### Arithmetic instructions

| Instruction | Covering test location | Status    |
|-------------|------------------------|-----------|
| add         | instructions/add/      | Completed |
| addi        | instructions/add/      | Completed |
| sub         | instructions/sub/      | Completed |
| sextb       | instructions/sext/     | Completed |
| sexth       | instructions/sext/     | Completed |
| mul         | instructions/mul/      | Completed |
| muli        | instructions/mul/      | Completed |
| div*        | instructions/div/      | Completed |
| divu        | instructions/div/      | Completed |
| mod*        | instructions/div/      | Completed |
| modu        | instructions/div/      | Completed |

#### Comparative instructions

| Instruction | Covering test location  | Status    |
|-------------|-------------------------|-----------|
| cmpe        | instructions/cmp_e_ne/  | Completed |
| cmpei       | instructions/cmp_e_ne/  | Completed |
| cmpne       | instructions/cmp_e_ne/  | Completed |
| cmpnei      | instructions/cmp_e_ne/  | Completed |
| cmpg        | instructions/cmpg/      | Completed |
| cmpgi       | instructions/cmpg/      | Completed |
| cmpgu       | instructions/cmpg/      | Completed |
| cmpgui      | instructions/cmpg/      | Completed |
| cmpge       | instructions/cmpge/     | Completed |
| cmpgei      | instructions/cmpge/     | Completed |
| cmpgeu      | instructions/cmpge/     | Completed |
| cmpgeui     | instructions/cmpge/     | Completed |

#### Shift instructions

| Instruction | Covering test location | Status    |
|-------------|------------------------|-----------|
| sl          | instructions/sl/       | Completed |
| sli         | instructions/sl/       | Completed |
| sr          | instructions/sr/       | Completed |
| sri         | instructions/sr/       | Completed |
| sru         | instructions/sr/       | Completed |
| srui        | instructions/sr/       | Completed |

#### Logical instructions

| Instruction | Covering test location | Status    |
|-------------|------------------------|-----------|
| and         | instructions/and/      | Completed |
| andi        | instructions/and/      | Completed |
| andhi       | instructions/and/      | Completed |
| or          | instructions/or/       | Completed |
| ori         | instructions/or/       | Completed |
| orhi        | instructions/or/       | Completed |
| nor         | instructions/or/       | Completed |
| nori        | instructions/or/       | Completed |
| xor         | instructions/xor/      | Completed |
| xori        | instructions/xor/      | Completed |
| xnori       | instructions/xor/      | Completed |
| xnor        | instructions/xor/      | Completed |

#### Branch instructions

| Instruction | Covering test location      | Status    |
|-------------|-----------------------------|-----------|
| be          | instructions/branch_cond/   | Completed |
| bne         | instructions/branch_cond/   | Completed |
| bg          | instructions/branch_cond/   | Completed |
| bgu         | instructions/branch_cond/   | Completed |
| bge         | instructions/branch_cond/   | Completed |
| bgeu        | instructions/branch_cond/   | Completed |
| b           | instructions/branch_uncond/ | Completed |
| bi          | instructions/branch_uncond/ | Completed |
| call        | instructions/branch_uncond/ | Completed |
| calli       | instructions/branch_uncond/ | Completed |

#### Memory access instructions

| Instruction | Covering test location | Status    |
|-------------|------------------------|-----------|
| lb          | instructions/load/     | Completed |
| lbu         | instructions/load/     | Completed |
| lh          | instructions/load/     | Completed |
| lhu         | instructions/load/     | Completed |
| lw          | instructions/load/     | Completed |
| sb          | instructions/store/    | Completed |
| sh          | instructions/store/    | Completed |
| sw          | instructions/store/    | Completed |

#### Control/Status access instructions

| Instruction | Covering test location | Status    |
|-------------|------------------------|-----------|
| rcsr        | instructions/csr/      | Completed |
| wcsr        | instructions/csr/      | Completed |

\* Note that the div and mod instructions are listed in the instruction table in the LatticeMico32 Processor Reference Manual [1], but are not documented in the instruction descriptions. They are not supported in the GNU assembler either. The implementation in this ISS assumes signed arithmetic, and the tests use '.word <opcode>' to insert these instructions, which the assembler would otherwise not recognise and compile.

#### **Exceptions**

| Exception                                                                  | Covering test location                                                                                                                                                                                                                                                                                                                          | Status                                                                                                                                      |
|----------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
| rsrvd instr bus err disable instr data bus err hw breakpoint hw watchpoint | exceptions/instruction/<br>exceptions/instruction/<br>exceptions/instruction/<br>exceptions/instruction/<br>exceptions/instruction/<br>exceptions/instruction/<br>exceptions/external/<br>exceptions/ibus_errors/<br>exceptions/ibus_errors/<br>exceptions/dbus_errors/<br>exceptions/hw_debug/<br>exceptions/hw_debug/<br>exceptions/hw_debug/ | Completed |
|                                                                            |                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                             |

#### ISS user API testing

| lest                                                                          | Covering test location                                                                                                                                                         | Status                                                                     |
|-------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------|
| Int callback Instr count Run-time ctrl HW debug ctrl State access Re-entrance | exceptions/external/<br>exceptions/external/<br>api/num_instr/<br>covered in cpumico32.cpp<br>covered in cpumico32.cpp<br>covered in cpumico32.cpp<br>covered in cpumico32.cpp | Completed<br>Completed<br>Completed<br>Completed<br>Completed<br>Completed |
| EXCIT DI EAKS                                                                 | covered in cpumico32.cpp                                                                                                                                                       | Completed                                                                  |
|                                                                               |                                                                                                                                                                                |                                                                            |

#### **Executing Tests**

The tests are all run via a 'runtest.sh' script that lives in the test/ directory. Changing directory to 'test/' and running 'runtest.sh' will execute all the tests, giving a pass/fail result for each, with a summary at the end. An easier way to execute the tests, especially when doing coverage measurements, is to use the makefile. When building code, a command 'make test' will get the build up-to-date, and then run the test script. The tail end of the output should be something like that shown below:

```
.
Running test exceptions/dbus_errors
  PASS
Running test exceptions/hw_debug
  PASS
Running test api/num_instr
  PASS

Tests run : 24
Tests pass: 24
Tests fail: 0
```

The test script runs cpumico32 with arguments of '-T -r 0xfffc', but additional arguments can be added by setting the environment variable CPUMICO32\_ARGS. This must contain a string of valid cpumico32 arguments but, even when valid, it is not guaranteed that testing will pass for all possible argument combinations, so use with care.
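The pass/fail decision and the handling of CPUMICO32\_ARGS can be illustrated with a small Python stand-in for the checking logic. This is not the runtest.sh script itself: the function names are hypothetical, and the 'RAM 0xfffc = ...' line format is taken from the example output shown earlier for the -r option. Any value other than the pass value is treated as a failure here, since the result is undefined if a test does not terminate cleanly.

```python
import os
import re
import shlex

PASS_VALUE = 0x0000900d
BASE_ARGS = ["-T", "-r", "0xfffc"]


def runner_args(environ=None):
    """Return the fixed test arguments plus any extras from CPUMICO32_ARGS."""
    environ = os.environ if environ is None else environ
    return BASE_ARGS + shlex.split(environ.get("CPUMICO32_ARGS", ""))


def classify(output):
    """Classify a run from the dumped RAM value in the program output."""
    match = re.search(r"RAM\s+0x[0-9a-fA-F]+\s*=\s*(0x[0-9a-fA-F]+)", output)
    if match is None:
        return "UNKNOWN"
    value = int(match.group(1), 16)
    return "PASS" if value == PASS_VALUE else "FAIL"


args = runner_args({"CPUMICO32_ARGS": "-w 2 -D"})
result = classify("Number of executed instructions = 51\nRAM 0xfffc = 0x0000900d\n")
print(args, result)
```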

#### Coverage

Coverage for the self-tests was measured using gcov and lcov, with support in the makefile. Excluded from the coverage was any disassembler or debug output code as, although this can be covered to a level of 100%, it cannot be verified in an automatic self-test, and it does not affect the accuracy of the model. Similarly, the cpumico32 top level code was not included, as this is a test/example program, and not part of the model. The core files covered were thus:

```
lm32_cpu.cpp
lm32_cpu.h
lm32_cpu_inst.cpp
lm32_cache.cpp
lm32_cache.h
lm32_cpu_elf.cpp
```

The diagram below shows the LCOV report generated by executing the following commands:

```
make clean
make COVOPTS="-coverage" test
make coverage
```

The report generated is created in the directory cov\_html/src, and accessed via index.html.

LCOV code coverage report (current view: top level - src; test: lm32.info; date: 2013-07-24; generated by LCOV version 1.10):

| Filename          | Line coverage | Lines (hit / total) | Function coverage | Functions (hit / total) |
|-------------------|---------------|---------------------|-------------------|-------------------------|
| lm32_cache.cpp    | 100.0 %       | 42 / 42             | 100.0 %           | 3 / 3                   |
| lm32_cache.h      | 100.0 %       | 1 / 1               | 100.0 %           | 1 / 1                   |
| lm32_cpu.cpp      | 100.0 %       | 367 / 367           | 100.0 %           | 15 / 15                 |
| lm32_cpu.h        | 100.0 %       | 10 / 10             | 100.0 %           | 5 / 5                   |
| lm32_cpu_elf.cpp  | 100.0 %       | 30 / 30             | 100.0 %           | 1 / 1                   |
| lm32_cpu_inst.cpp | 100.0 %       | 718 / 718           | 100.0 %           | 64 / 64                 |
| Total             | 100.0 %       | 1168 / 1168         | 100.0 %           | 89 / 89                 |

In order to reach the goal of 100% coverage, waivers were needed on some unreachable lines of code. The waivers fall into seven broad categories, detailed below:

Checks on parameters etc. that should never fail, and are meant as debug aids for invalid calls from elsewhere in the code, or to protect against invalid values from the API. The following waivers are of this type.

| method                     | coverage waiver             |
|----------------------------|-----------------------------|
| lm32_set_verbosity_level() | invalid verbosity level     |
| lm32_set_hw_debug_reg()    | invalid debug register type |
| lm32_read_mem()            | invalid read access type    |
| lm32_write_mem()           | invalid write access type   |
| interrupt()                | invalid exception ID        |
| lm32_rcsr()                | invalid CSR register index  |
| lm32_wcsr()                | invalid CSR register index  |
| lm32_cache()               | parameter checks            |

Memory allocation failures that should never happen. Mostly malloc() calls, where a failure here would indicate a system level problem. The following were waivered on this basis:

| method             | coverage waiver                       |
|--------------------|---------------------------------------|
| lm32_write_mem()   | memory allocation, and error handling |
| lm32_run_program() | memory allocation failure             |

Code associated with disassembly and debug output cannot be self-tested. Coverage is possible, but not meaningful. lm32\_cpu\_disassembler.cpp was wholly excluded, but calls to disassembler methods from the core functionality still needed waiving:

| method                | coverage waiver                                                  |
|-----------------------|------------------------------------------------------------------|
| execute_instruction() | call to disassemble() function and pc update in disassemble mode |
| lm32_run_program()    | break on disassemble run                                         |

User defined break address handling actually terminates the program, and so cannot be self-tested. The feature is for debug purposes only, in any case, so a waiver was added:

| method             | coverage waiver         |
|--------------------|-------------------------|
| lm32_run_program() | user break address trap |

File exception handling also terminates the program, and has no meaning in terms of model accuracy:

| method     | coverage waiver                 |
|------------|---------------------------------|
| read_elf() | file opening exception handling |
| read_elf() | unexpected EOF handling         |
| read_elf() | program load overflowing memory |

ELF file checks only fire with an invalid ELF executable file, and don't affect the model's accuracy for executing a valid ELF file, and so the checks were waivered:

| method     | coverage waiver       |
|------------|-----------------------|
| read_elf() | all ELF header checks |

Creation of cache without parameters is not done in the testing, as the tests exercise the various settings (including the defaults) by explicitly setting the cache configurations.

| method                 | coverage waiver                             |
|------------------------|---------------------------------------------|
| lm32_set_configuration | default parameters on cache object creation |

With the above waivers in place, the six listed files, containing all the methods bar the disassembling, have 100% coverage.

#### **Not Covered**

Despite the 100% measured coverage metrics, there are some aspects of the model as yet uncovered by formal testing.

- Various internal memory sizes (all tests run with the default 64K RAM)
- Running multiple instances of the model
- Accuracy of the timing model against a known good reference

# **Compile Options**

By default, when cpumico32 is compiled, it has the behaviour described in the previous sections. However, it can be compiled with various definitions in order to modify its behaviour. There are, presently, two conditional compile definitions that can be set:

- LM32\_FAST\_COMPILE: Removes compiling of much code that is not strictly necessary for execution to improve execution speed, but removes many features. (Used in files lm32\_cpu.cpp, lm32\_cpu\_inst.cpp, lm32\_get\_config.cpp.)
- LNXMICO32: Removes some command line options and features not supported by the lnxmico32 program (see the case study in the next section), and a test for instruction type when disassembling, since lnxmico32 loads code differently. (Used in files lm32\_cpu.cpp, lm32\_get\_config.cpp.)

The LM32\_FAST\_COMPILE definition is used to remove as much code as possible from the model, whilst still retaining a viable simulation, in order to maximise the execution speed. To this end, the following features are removed:

- Break point specification removed
  - Removes command line options -n, -b, -d
  - Can still break from interrupt callback
- Memory wait state specification
  - Removes command line option -w
- Disassemble mode
  - Removes command line option -x
- Removes timing accuracy
  - No cache modelling of timing
  - Callback timing information is ignored
  - Pipeline stalling not calculated
- Memory access alignment checks are disabled
- Memory stats information is not logged (memory tagging removed)
- No watchpoint support
- No hardware breakpoint support

With this compile definition, an additional memory restriction applies: memory size must be a power of 2, as the range masking requires this to avoid using a division.
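The reason for the power of 2 restriction can be shown with a short worked example. For a size that is a power of 2, masking an address with (size - 1) gives the same result as the modulo operation, without needing a division. The sketch below is illustrative only; the function names are hypothetical, and exactly how the model applies its mask is not described here.

```python
def is_power_of_two(n):
    """A power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def wrap_address(addr, mem_size):
    """Fold an address into [0, mem_size) using a mask instead of a division."""
    if not is_power_of_two(mem_size):
        raise ValueError("mem_size must be a power of 2 for masking to work")
    return addr & (mem_size - 1)


# With the default 64K RAM the mask is 0xffff
wrapped = wrap_address(0x1fffc, 65536)
print(hex(wrapped))

# For a size that is not a power of 2, masking and modulo disagree
print(100 & (48 - 1), 100 % 48)
```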

The LNXMICO32 definition simply removes some further command line options and functionality not needed by the lnxmico32 case study program described in detail in the next section. The configuration code (lm32\_get\_config.cpp) is shared between the lnxmico32 and cpumico32 programs, and the majority of the uses of the definition are found in this file. With the definition active, the -T option is not used, as the test callbacks are not implemented in lnxmico32, which has its own callback functions, used to model peripherals. In addition, a -V option is added, but only if LM32\_FAST\_COMPILE is not defined as well, which acts like -v, but allows specification of a cycle time when verbosity is activated. The lnxmico32 program loads binary images directly to memory, rather than loading an ELF file, so the -f option is also removed.

One other use, modifying lm32\_cpu.cpp code, is to remove the test to see if a memory location is an instruction before disassembling it. The lnxmico32 program loads code as binary data via the lm32\_write\_mem() function, and thus does not label it as instruction data. In order to debug the code, the check is suspended for this compilation option.

# Case Study: An Embedded Linux System

A case study in the usage of the model is given here, in order to demonstrate the features and extensibility of the ISS. A basic, non-mmu Linux system is put together, using the u-boot and uClinux ports to lm32. The system is kept minimal, with just enough to boot the Linux OS, and a lightweight Unix environment is provided by BusyBox, which is targeted at embedded platforms.

The diagram below shows the general system layout. It consists of the mico32 model, with two UARTs (UART0 and UART1) and a timer (TIMER0). These are modelled as part of the lnxmico32 environment.



The top level for the system is called lnxmico32, and has a top level source file lnxmico32.cpp. This instantiates the mico32 model, shown as the lm32 and RAM boxes in the diagram. It registers its own callback functions for both the external memory accesses (using the model's API method lm32\_register\_ext\_mem\_callback()) and interrupts (using lm32\_register\_int\_callback()). The callbacks handle the register accesses to the peripherals, along with the 'ticking' and passing back interrupt status.

When run, the two binary files, vmlinux.bin and romfs.ext2, are expected to be in the directory from which the program is executed—as is, by default, a configuration file, lnx.ini (see next section).

The binary images for the u-boot/uClinux and RAM filesystem are loaded to memory (at 0x08000000 and 0x08400000, respectively), and then memory is updated with the hardware setup configuration and an initial boot command string. The simulation can then be started, and the system boots, sending output characters via UART0 and, once booted, accepting keyboard input to allow logging in and issuing of commands in the shell (msh, the minimal shell). After boot the screen will look something like the following:

```
io scheduler deadline registered
io scheduler cfq registered (default)
lm32uart: Lattice Mico 32 UART driver
ttyS0 at I/O 0x80000000 (irq = 0) is a LM32UART
lm32uart; added port 0 with irq 0 at 0x80000000
ttyS1 at I/O 0x81000000 (irq = 2) is a LM32UART
lm32uart; added port 1 with irq 2 at 0x81000000
RAMDISK driver initialized; 16 RAM disks of 16384K size 1024 blocksize
PPP generic driver version 2.4.2
PPP Deflate Compression module registered
PPP BSD Compression module registered
SLIP: version 0.8.4-NET3.019-NEWTTY (dynamic channels, max=256).
IPv4 over IPv4 tunneling driver
TCP bic registered
TCP cubic registered
TCP westwood registered
TCP htcp registered
NET: Registered protocol family 1
NET: Registered protocol family 17
RAMDISK: ext2 filesystem found at block 0
RAMDISK: Loading 2048KiB [1 disk] into ram disk... done.
VFS: Mounted root (ext2 filesystem) readonly.
mico32 login:
```

To log in to the system, use the user name root, with a password of lattice. To exit from the program, from anywhere, type #!exit! and press enter.

# Configuration

#### The lm32 model

The mico32 model must be configured correctly for the system to boot properly and, by default, the program will look for a configuration file <code>lnx.ini</code> in the directory from which it is run. This can, of course, be overridden with the <code>-i</code> command line option. The <code>lnxmico32</code> program shares a number of command line options with the <code>cpumico32</code> program (indeed, it shares common configuration code). The full usage message for the <code>lnxmico32</code> program is as follows:

The provided lnx.ini options file, for the most part, specifies defaults, but does set the configuration word to a specific value that represents the minimum configuration for lnxmico32 functionality. The file looks like the following:

```
; INI file used for lnxmico32. DO NOT EDIT!
;
[configuration]
cfg_word=0x00003017

[debug]
log_fname=stdout
ram_dump_addr=-1
ram_dump_bytes=0
dump_registers=false
dump_num_exec_instr=false

[state]
save_file_name=lnxmico32.sav
save_state=false
load_state=false

; When LM32_FAST_COMPILE not defined
; verbose=false
; disassemble_run=false
; [breakpoints]
; user_break_addr=-1
; num_run_instructions=-1
; disable_reset_break=false
; disable_hw_break=false
; disable_lock_break=false
; [memory]
; mem_wait_states=0
; [dcache]
; cache_base_addr=0
; cache_limit=0x0fffffff
; cache_num_sets=512
; cache_num_ways=2
; cache_bytes_per_line=4
; [icache]
; cache_base_addr=0
; cache_limit=0x7fffffff
; cache_num_sets=1024
; cache_num_ways=2
; cache_bytes_per_line=4
```

Some of the options (those commented out) are only available if <code>lnxmico32</code> is not compiled with <code>LM32\_FAST\_COMPILE</code> defined, which disables disassembling, breakpoints, memory wait states and cache timing simulation. These can be reinstated when compiled without the definition, but will cause a warning if uncommented when compiled with it defined.

### The system software

To configure boot and OS software before running, three things need to happen:

- A hardware setup table must be constructed at 0x0BFFE000
- A boot command line string must be placed at 0x0BFFF000
- GP registers 1 to 4 must be pre-charged with the addresses of the above two entries, plus the addresses of the romfs.ext2 image start (0x08400000) and end (0x08400000 + file length)

The hardware setup table consists of a consecutive list of structures, with one for the CPU, the memory, the two UARTs and the timer. This is followed by a termination structure. Each structure has a similar format:

```
struct {
    uint32_t length;
    uint32_t id;
    .
    .
    <specific payload>
    .
}
```

The length gives the size of the entry (including the length bytes), and the ID is a unique number. The payloads for each of the entries also have similar structures (except the terminator), with a 32 byte string array containing the name of the instance (which can be shorter, but not longer, than 32 bytes), followed by parameters for the particular device. The terminator is just a length (8) followed by an ID of 0, with no payload.

For the CPU ("LM32") the payload is simply a 32 bit number for the clock frequency, in Hz. The memory ("ddr\_sdram") has a parameter for the base address, followed by the size in bytes. The timer ("timer0") has a 32 bit word for the base address, then four bytes for a write tick count, read tick count, start/stop/control and counter width. A following 32 bit word specifies the number of reload ticks, and then a byte gives the interrupt number (i.e. which of the 32 external interrupt pins it is connected to). The structure is then padded to a 32 bit boundary with bytes of 0 value.

The UARTs ("uart0" and "uart1") have a base address and baud rate parameters (both 32 bits), followed by 8 bytes for number of data bits, number of stop bits, interrupt enable, block on transmit, block on receive, RX buffer size, TX buffer size and its interrupt number. More information can be found in Appendix A of the "Linux Port to LatticeMico32 System Reference Guide" [3].

These hardware setup structures are written consecutively to memory, starting at 0x0BFFE000, in the order CPU, memory, timer0, uart0, uart1, and the terminator.
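The entry layout described above can be sketched in code. This is a minimal illustration, not the code used by lnxmico32: the helper names and the ID value for the CPU entry are assumptions (only the terminator's ID of 0 is stated here, and the real IDs are defined in [3]). Words are written most significant byte first, on the assumption that the big-endian target reads them directly from memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical ID for the CPU entry; the real values are defined in [3].
static const uint32_t HYPOTHETICAL_ID_CPU = 1;
static const uint32_t ID_TERMINATOR       = 0;   // terminator ID, as stated in the text
static const std::size_t NAME_BYTES       = 32;  // fixed size of the instance name field

// Append a 32 bit word, most significant byte first.
static void put_word(std::vector<uint8_t>& buf, uint32_t word)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        buf.push_back(static_cast<uint8_t>(word >> shift));
}

// Append the instance name, zero padded to 32 bytes.
static void put_name(std::vector<uint8_t>& buf, const std::string& name)
{
    assert(name.size() <= NAME_BYTES);
    buf.insert(buf.end(), name.begin(), name.end());
    buf.insert(buf.end(), NAME_BYTES - name.size(), static_cast<uint8_t>(0));
}

// CPU entry: length, ID, name, then the clock frequency in Hz.
static void put_cpu_entry(std::vector<uint8_t>& buf, uint32_t clk_hz)
{
    // The length counts the whole entry, including the length word itself.
    put_word(buf, static_cast<uint32_t>(4 + 4 + NAME_BYTES + 4));
    put_word(buf, HYPOTHETICAL_ID_CPU);
    put_name(buf, "LM32");
    put_word(buf, clk_hz);
}

// Terminator: a length of 8 and an ID of 0, with no payload.
static void put_terminator(std::vector<uint8_t>& buf)
{
    put_word(buf, 8);
    put_word(buf, ID_TERMINATOR);
}
```

Calling `put_cpu_entry()` followed by `put_terminator()` on an empty buffer yields a 44 byte CPU entry and an 8 byte terminator, which could then be copied into the model's memory at the table's base address. The other entries would follow the same pattern with their own payloads.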

The command line string is used as the u-boot command arguments when starting the system. For lnxmico32, this is

```
root=/dev/ram0 console=ttyS0,115200 ramdisk_size=16384
```

Finally, the general purpose registers GP1 to GP4 are pre-charged with four addresses, based on the above configurations. If these were invariant, then a small assembler program could be added in memory that set these values and then jumped to the system entry point, with the initial entry point being this initialisation program. However, to ease modification of system parameters, these are written directly, using the lm32 model's lm32\_set\_gp\_reg() method. GP1 to GP4 are set to the following addresses: the h/w setup base address, the command string base address, the RAMFS load start address, and the RAMFS load end address + 1.

Having configured the system, the execution of the code can begin.

### **Use of Callbacks**

The lnxmico32 system registers two callbacks with the lm32 model: one for memory accesses (ext\_mem\_access()) and one for ticking/interrupt generation (ext\_interrupt()).

The first of these (ext\_mem\_access()) intercepts all memory accesses to the peripherals, namely the timer and the two UARTs. It separates the address passed in by the lm32 model into a page address (in this case a 4KB page) and an offset within that page. If the page address matches the base address of one of the peripherals, it processes the access. Otherwise it simply returns with a LM32\_EXT\_MEM\_NOT\_PROCESSED status, informing the model that it must handle this access itself.

All the peripheral models for the lnxmico32 system provide three functions: a read function, a write function and a tick function (see next section). The memory callback function is called with an address and an access type. If the access type is LM32\_MEM\_WR\_ACCESS\_WORD, then the selected peripheral's write function is called with the offset address and data value. If not, the read function is called with the offset address and a pointer to the data variable in which to return the read value.

When the address is matched by the callback, a processing time is returned, so that the lm32 model can advance time in line with the delay of the access. In lnxmico32, this defaults to 1 cycle for all peripheral register accesses.
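The decode and dispatch steps described above might look something like the following sketch. It is not the lnxmico32 source: the callback signature, the `toy_peripheral` type, the timer base address and the two `SKETCH_` constants are all assumptions made for illustration (the real status values come from the model's headers). The UART base addresses are those reported in the boot log.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in values for illustration only.
static const int SKETCH_NOT_PROCESSED = -1;  // stands in for LM32_EXT_MEM_NOT_PROCESSED
static const int SKETCH_ACCESS_CYCLES = 1;   // processing time for a register access

static const uint32_t PAGE_MASK   = 0xfffff000u;  // 4KB pages
static const uint32_t UART0_BASE  = 0x80000000u;  // from the boot log
static const uint32_t UART1_BASE  = 0x81000000u;  // from the boot log
static const uint32_t TIMER0_BASE = 0x80002000u;  // hypothetical timer base address

// Toy peripheral: a page of word registers standing in for a real model.
struct toy_peripheral {
    uint32_t regs[1024];
    void write(uint32_t offset, uint32_t data) { regs[offset >> 2] = data; }
    void read(uint32_t offset, uint32_t* data) { *data = regs[offset >> 2]; }
};

static toy_peripheral uart0, uart1, timer0;

// Sketch of a memory callback (assumed signature).
int sketch_mem_access(uint32_t addr, uint32_t* data, bool is_write)
{
    assert(data != nullptr);

    // Split the address into a page address and an offset within the page.
    const uint32_t page   = addr & PAGE_MASK;
    const uint32_t offset = addr & ~PAGE_MASK;

    toy_peripheral* p = nullptr;
    if      (page == UART0_BASE)  p = &uart0;
    else if (page == UART1_BASE)  p = &uart1;
    else if (page == TIMER0_BASE) p = &timer0;
    else return SKETCH_NOT_PROCESSED;   // the model must handle this access itself

    if (is_write) p->write(offset, *data);
    else          p->read(offset, data);

    return SKETCH_ACCESS_CYCLES;        // time taken by the access, in cycles
}
```

Any address outside the three peripheral pages, such as one in RAM, falls through to the "not processed" return.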

The tick/interrupt callback function (ext\_interrupt()) is called regularly by the lm32 model, with a timestamp. The callback returns an interrupt status in a 32 bit word, with each bit representing an external interrupt pin on the mico32 processor, of which there are up to 32. Each time the function is called, it calls each of the tick functions for the peripherals. For the timer this is a function that simply takes the time as an argument, and returns true or false to indicate whether it is interrupting or not. The UART tick functions take additional parameters to return termination request status, to indicate whether it is the keyboard UART, and to give its context. The context is needed because the function's code is common to all UART instantiations, but supports up to 4 different contexts, and identifies whether UART0 or UART1 is calling. Like the timer tick function, the UART tick function returns a Boolean, indicating interrupt request status.

The callback function ORs together all the interrupt statuses of the peripherals, and the result is returned when the function exits. The termination request statuses of the UARTs are also combined, and if a UART is requesting termination, the value returned in the wakeup\_time pointer is set to LM32\_EXT\_TERMINATE\_REQ, indicating to the lm32 model that an external termination is active. If there is no termination request, a wakeup time of the current time plus LM32\_INTERRUPT\_GRANULARITY (1000 cycles in this case) is returned. This means that the tick function will not be called for at least 1000 cycles (plus a bit to, say, complete an instruction). This *might* be set to 1, so that it is called after every instruction, but the rate of activity for these peripherals is not that high, and this would produce unnecessary processing overhead. An even larger value might be possible, but 1000 seems to give little noticeable overhead in processing speed.
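The combining step can be sketched as below. The names, the `tick_results` structure and the constant values are assumptions for illustration: the real LM32\_EXT\_TERMINATE\_REQ and LM32\_INTERRUPT\_GRANULARITY definitions live in the model and lnxmico32 sources. The UART interrupt numbers are those shown in the boot log, and the timer's is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

typedef int64_t sketch_time_t;

// Stand-ins for the real constants.
static const sketch_time_t SKETCH_TERMINATE_REQ = -1;    // a negative wakeup time requests termination
static const sketch_time_t SKETCH_GRANULARITY   = 1000;  // cycles before the next tick callback

// Interrupt pins: UART numbers from the boot log, timer number hypothetical.
static const int UART0_IRQ  = 0;
static const int TIMER0_IRQ = 1;
static const int UART1_IRQ  = 2;

// Results gathered from one round of peripheral tick calls.
struct tick_results {
    bool timer_irq;
    bool uart0_irq;
    bool uart1_irq;
    bool uart0_terminate;
    bool uart1_terminate;
};

// Combine the tick results into an interrupt word and a wakeup time.
uint32_t sketch_interrupt(sketch_time_t now, const tick_results& t, sketch_time_t* wakeup_time)
{
    assert(wakeup_time != nullptr);

    // One bit per external interrupt pin.
    uint32_t irq = 0;
    irq |= static_cast<uint32_t>(t.timer_irq) << TIMER0_IRQ;
    irq |= static_cast<uint32_t>(t.uart0_irq) << UART0_IRQ;
    irq |= static_cast<uint32_t>(t.uart1_irq) << UART1_IRQ;

    // Either UART may request termination; otherwise ask to be called again later.
    if (t.uart0_terminate || t.uart1_terminate)
        *wakeup_time = SKETCH_TERMINATE_REQ;
    else
        *wakeup_time = now + SKETCH_GRANULARITY;

    return irq;
}
```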

## **Peripheral Models**

There are two peripheral models implemented for the lnxmico32 system: a timer and a UART. Both provide three interface functions: a register write function, a register read function and a tick function.

The two register access functions are fairly self explanatory, with an address, and either a data value passed in for writes, or a pointer passed in for returning the read value. All accesses are for 32 bit words, as all the registers for the peripherals are aligned to 32 bit word boundaries.

The tick functions all have a time parameter input, and a context value (cntx) for selecting the particular instance of the peripheral (up to 4 of each), with a default context of 0, allowing this parameter to be omitted if only one peripheral is instantiated, as for the timer in lnxmico32. The UART model also has a terminate parameter (passed by reference) and a kbd\_connected input flag. The terminate parameter allows the UART to request termination of execution by setting this parameter to true. This is done if the user types a certain sequence of characters as input to the UART, flagging a termination request and allowing the program to exit cleanly with any post-processing requirements, such as dumping registers. The keyboard flag input enables internal processing of keystrokes as RX data to the UART. As only one UART can process the single keyboard, this flag enables the nominated UART to process keystroke inputs, whilst the others ignore them.

The timer model simulates the behaviour of the Lattice timer IP [4], implementing the four registers to control operation, the counter and the generation of an interrupt, when the counter reaches zero. The counter can continue or stop, depending on the register control settings.

The UART model simulates the behaviour of the Lattice UART [5], implementing the 8 registers. Transmission is the simpler of the two directions to model. When the processor writes a byte to the transmit holding register, the byte is passed straight on to stdout, via a putchar() call. However, timing is simulated, and the status bits are cleared to indicate that there is no space for another byte. The UART model's tick function notes the time when transmission status goes active, and counts until the transmission time has passed, before setting the status back to allow further transmission. The time calculated is a function of the configured baud rate and the clock frequency, and assumes a start, parity and stop bit, on top of 8 bits of data (i.e. 11 bits). When the status bits indicate that the TX buffer is empty once more, if enabled, the model generates an interrupt, which is returned as a true status when the tick function exits.
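As a rough worked example of the transmission time, the 11 bit character time can be converted to clock cycles as sketched below. The function name is an assumption, the result is truncated to whole cycles, and the 50MHz clock used in the example is purely illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Bits per transmitted character: start + 8 data + parity + stop.
static const uint64_t BITS_PER_CHAR = 11;

// Clock cycles for which the transmitter stays busy after a byte is written.
uint64_t tx_busy_cycles(uint64_t clk_hz, uint64_t baud)
{
    assert(baud != 0);
    return (BITS_PER_CHAR * clk_hz) / baud;
}
```

With an assumed 50MHz clock and the 115200 baud rate used for the console, `tx_busy_cycles(50000000, 115200)` gives 4774 cycles per character.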

For keyboard input, each time the tick function is called (and the kbd\_connected flag is set), the model checks if a key has been pressed, and fetches the byte value if it has. This is placed in the RBR register and the "data received" status set. If interrupts are enabled for data reception, an interrupt status of true is returned by the tick function. As well as simulating keyboard input for the system, the model also monitors the input to detect a specific sequence of keystrokes. If the input sequence matches a particular string ("#!exit!<enter>"), it sets, to true, the terminate parameter passed in (by reference) to the tick function call. The system model can choose to ignore this, but in lnxmico32, this request is passed back to the lm32 model to terminate its execution as a user breakpoint.
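One simple way to detect such a sequence is to track how much of it has been matched as each character arrives. The class below is a minimal sketch of that idea, not the UART model's actual code. The class name is hypothetical, and it assumes the enter key arrives as a newline character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Tracks progress through the termination sequence, one received character at a time.
class exit_matcher {
public:
    // Returns true when the full sequence has just been received.
    bool feed(char c)
    {
        if (c == seq_[matched_]) {
            matched_++;
        } else {
            // '#' only appears at the start of the sequence, so a mismatch can
            // only restart the match if the new character is itself a '#'.
            matched_ = (c == seq_[0]) ? 1 : 0;
        }

        if (matched_ == seq_.size()) {
            matched_ = 0;
            return true;
        }
        return false;
    }

private:
    const std::string seq_ = "#!exit!\n";  // enter assumed to be '\n'
    std::size_t matched_ = 0;
};
```

A tick function could call `feed()` for each received byte and set its terminate parameter when it returns true.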

### **Save and Restore**

The lnxmico32 program has the ability to save the state of the system on exit, and to reload that state on re-running to restore the running system to exactly where it was upon exit. In order to do this, the state of the CPU, memory and of all the peripherals must be saved and restored. The lm32 model already has two methods to support this, as documented previously:

```
lm32_get_cpu_state()
lm32_set_cpu_state()
```

These two methods transfer state within a single object of type lm32\_cpu::lm32\_state. The CPU model has been designed so that it keeps all relevant internal state within a structure of this type, and it is returned or set as a single structure, so that the calling program need not know what state to save, or even its details. The returned state can be pointed to by a pointer to bytes, and saved as a byte stream to the size of the type. The CPU state does not contain the contents of the memory, as this can be large and needs to be handled externally, using the read/write methods of the CPU model, for optimal handling (more below).

The UART and timer peripherals of lnxmico32 have followed the same practice as the CPU model, and have single structures containing all the relevant state (with defined, and exported, types lm32\_uart\_state\_t and lm32\_utimer\_state\_t), and methods for retrieving them and restoring them:

```
lm32_get_timer_state()
lm32_set_timer_state()
lm32_get_uart_state()
lm32_set_uart_state()
```

The state returned by the peripherals contains the state for all the contexts that the peripheral code supports, and not just a single context, whether or not each context has been used. This simplifies the save and restore, and makes the returned data size fixed in all cases. The amount of data is small (compared to the memory image), and so adds little overhead. Since the data is fixed size, it is saved completely raw, with no additional information, such as size or target peripheral. The format of the .sav file has a fixed order of data (see below), so that on restore it is known what data is expected next, and its size inferred.

#### Saving of memory

The memory of the <code>lnxmico32</code> program is fairly large at 64MB. A large proportion of this contains the invariant boot and OS programs, and the filesystem ROM image. As these can be reloaded on restore runs, the data in these locations need not be saved. To make saving of data more efficient, the memory callback function, which is called for all CPU memory accesses, monitors for any write access to the RAM. The RAM, for this purpose, is divided up into 1K pages and a tag is kept for each page, with a flag set to true if a write access is made to that page. The size of the page is a compromise between granularity of save data and the size of the tag array. This can be altered without changing the function's integrity.

When a save operation is performed, only the pages in RAM that have had a write access are saved. Since the tag array is cleared after the OS and FS are loaded, only subsequent writes are recorded and the pages saved. Since the page sizes are fixed, only the address of the page needs to be prepended to the data. The address is saved first, as four bytes, most significant byte first. By forcing this format, the data is independent of the byte ordering of the host on which the program is run. The data, then, is saved as a set of consecutive addresses and 1K binary data images.
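The tagging and save scheme can be sketched as follows. This is a self-contained illustration using a small in-memory RAM image, not the lnxmico32 implementation: the `page_tracker` class and its method names are hypothetical, and the real program tags pages from within its memory callback and writes to the .sav file rather than to a byte vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static const uint32_t PAGE_BYTES = 1024;  // 1K pages

// Tracks which 1K pages of a RAM image have been written since the tags were cleared.
class page_tracker {
public:
    explicit page_tracker(std::size_t ram_bytes)
        : ram_(ram_bytes, 0), dirty_(ram_bytes / PAGE_BYTES, false)
    {
        assert(ram_bytes % PAGE_BYTES == 0);
    }

    // A write updates the RAM image and tags the containing page.
    void write_byte(uint32_t offset, uint8_t value)
    {
        assert(offset < ram_.size());
        ram_[offset] = value;
        dirty_[offset / PAGE_BYTES] = true;
    }

    // Called once the invariant images have been loaded.
    void clear_tags() { dirty_.assign(dirty_.size(), false); }

    // Serialise only the written pages: a 4 byte address, MSB first, then the page data.
    std::vector<uint8_t> save(uint32_t ram_base) const
    {
        std::vector<uint8_t> out;
        for (std::size_t page = 0; page < dirty_.size(); page++) {
            if (!dirty_[page])
                continue;

            const uint32_t addr = ram_base + static_cast<uint32_t>(page) * PAGE_BYTES;
            for (int shift = 24; shift >= 0; shift -= 8)
                out.push_back(static_cast<uint8_t>(addr >> shift));

            const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(page * PAGE_BYTES);
            const std::ptrdiff_t last  = first + static_cast<std::ptrdiff_t>(PAGE_BYTES);
            out.insert(out.end(), ram_.begin() + first, ram_.begin() + last);
        }
        return out;
    }

private:
    std::vector<uint8_t> ram_;
    std::vector<bool>    dirty_;
};
```

Writes made before `clear_tags()` (standing in for the image loads) are not saved, so a single later write costs one 4 byte address plus one 1K page in the output.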

#### The File Format

As mentioned above, the peripheral data is fixed size. Also, the CPU state comes as a fixed size object. The RAM data is a dynamic set of pages, which will vary from save to save. The file format was chosen to place all the fixed sized data first, and then followed by the RAM data, avoiding the need to delimit the fixed sized data, though the order becomes fixed. The format of the .sav file is as shown below:



## Control

By default the state is saved to a file <code>lnxmico32.sav</code>. This can be changed to a different file name by using the <code>-s</code> command line option, or the <code>save\_file\_name</code> parameter in the <code>[state]</code> section of the <code>.ini</code> file. Saving of state is enabled with the <code>-S</code> command line option or <code>save\_state</code>, and loading of previously saved state is enabled with the <code>-L</code> option or <code>load\_state</code>.

When both saving and restoring are enabled, the effects are cumulative. That is, each load will re-mark the RAM pages that are restored, with newly accessed pages added on top of these, and so on. This is needed as there is no guarantee that all previously saved pages will be written again on each new run.

#### **Performance**

Performance measures were made using the Linux system model, as this is sufficiently complicated, and representative of a real system, as to yield meaningful results. The system was tested for all supported platforms (as documented previously), with the addition of MSVC Community 2015.

The platform used was an Intel® i7 920 CPU, running at 2.67GHz, in a system with 6GB RAM on an ASUS P6T SE motherboard. The test was to run lnxmico32 -I, boot Linux, log in as root and exit, which yielded a test of more than 400 million instructions.

Compilation for the Microsoft Visual C environment was all done with the 'Release' mode. For the gcc compilations, optimisation options '-Ofast -fomit-frame-pointer -march=native' were used.

The results are summarised in the table below:

| OS                  | Compiler                | LM32_FAST_COMPILE | Unmodified |
|---------------------|-------------------------|-------------------|------------|
| Windows 10          | MSVC Express 2010       | 34.2 MIPS         | 22.6 MIPS  |
|                     | MSVC Community 2015 x86 | 31.4 MIPS         | 21.2 MIPS  |
|                     | MSVC Community 2015 x64 | 38.8 MIPS         | 26.3 MIPS  |
| Cygwin 2.5.2 32 Bit | gcc v5.4.0 (-m32)       | 45.4 MIPS         | 28.8 MIPS  |
| Ubuntu 16.04 LTS    | gcc v5.4.0 (-m64)       | 43.8 MIPS         | 30.2 MIPS  |
|                     | gcc v5.4.0 (-m32)       | 30.5 MIPS         | 19.8 MIPS  |

The surprising result here is that the Cygwin 32 bit compilation comes out on top, rather than the native 64 bit Linux system (Ubuntu)—though this situation is reversed when not compiled with LM32\_FAST\_COMPILE. Since Cygwin is a 32 bit compiler, and Ubuntu is 64 bit, additional differences may arise from this, though the 32 bit Ubuntu compile was the worst of all. However, compiling for 64 bits in MSVC improved the situation over 32 bits, and the same might be expected for GCC. Unfortunately 64 bit Cygwin was not available at the time of testing, and so a complete analysis has not been done for these differences.

So, focussing on the best result, the system runs at around 45 MIPS, when compiled with LM32\_FAST\_COMPILE, which yields a 57.6% improvement over the fully featured model. The speed of the model will very much depend on the nature of the code being executed, and the profile of the particular instructions, and so the results documented here are only a rough guide of 'best' performances for a limited, though not contrived, test.
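As a check on the quoted improvement, using the two Cygwin figures from the table (45.4 MIPS with LM32\_FAST\_COMPILE and 28.8 MIPS without):

$$\frac{45.4 - 28.8}{28.8} = \frac{16.6}{28.8} \approx 0.576$$

which is the 57.6% stated.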

# **Multi-processor System Modelling**

This section discusses the method for constructing a system model with multiple instantiations of the CPU model, to create a multi-processor system. Note that, to date, the model has not been tested extensively in this manner, but it is designed to be able to support this, and the recommended method is described here.

# **Running Models Concurrently**

In the case study described in the previous section, a single model is instantiated, and is run by calling the model's <code>lm32\_run\_program()</code> method with an <code>exec\_type</code> argument set to a value of <code>LM32\_RUN\_FROM\_RESET</code>. This means that the CPU model will loop internally, executing instructions indefinitely, until such time as the registered external interrupt callback functions signal for a termination (returning a negative wakeup time), when the model will return from the function call. Using this method when requiring multiple CPU models to be running concurrently would require having each model running in a separate thread, with all the complexities that that would entail. Fortunately this is not required.

The model has two other execution types that can be used to enable concurrency: LM32\_RUN\_SINGLE\_STEP and LM32\_RUN\_TICK. The differences between these two types are explained in the 'Execution and Breakpoints' section but, in summary, the first steps one instruction, whilst the second advances the clock by one tick (which may or may not execute an instruction, if waiting for the last to complete). Single stepping is the simplest and quickest way to advance the model but requires time synchronisation between the models (more below), whereas ticking advances a known time (one cycle), but the models will need calling more frequently. Note that ticking assumes that the timing model is enabled within the model. When compiled with LM32\_FAST\_COMPILE, for instance, the timing model is disabled, and a tick call and an instruction call are one and the same thing. In this case LM32\_RUN\_SINGLE\_STEP should be used, as the clock state is not updated.

When calling the run methods with either the step or tick execution type, the model will return after just one instruction or clock cycle. When multiple models are instantiated, these can be called within an external loop, one after the other, with either a step or tick execution type (don't mix the type between models though). Termination of this external loop is up to the implementer, but the status returned by the calls to the model can be inspected, and breaking the loop could be based on one or all of them requesting termination.

#### **Synchronising Time**

Running the models with an active timing model, but using stepped execution type (for speed), can cause drift in time between models if this is not managed. This is because instructions take different times to execute, and unless the models are all running identical programs, the state of their clocks will advance differently. In this case, the external program's loop needs to inspect time, and run the models appropriately.

The CPU model has a method to inspect time: lm32\_get\_current\_time(). On the first loop, all models are stepped, and their times inspected. On the next iteration, the CPU with the earliest time is the one to be run, along with any others that have this same earliest time. Any with a time in advance of this are not run. This continues on all subsequent iterations, until termination. This ensures that the CPUs are never more than a few cycles adrift, and keeps them in sync, whilst reducing the number of calls to the models compared with a pure ticking model. Whether a ticking model is better than a stepping and synchronising one is a matter of circumstances and preference.
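The scheduling rule can be illustrated with a toy stand-in for the CPU model. In the sketch below, `toy_cpu` is hypothetical: its `step()` stands in for a run call with the single step execution type, and `current_time()` for lm32\_get\_current\_time(). Only the selection logic is the point of the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a CPU model: each step advances its clock by a fixed instruction cost.
struct toy_cpu {
    uint64_t time;
    uint64_t cycles_per_step;

    void     step()               { time += cycles_per_step; }  // stands in for a single-step run call
    uint64_t current_time() const { return time; }              // stands in for lm32_get_current_time()
};

// One iteration of the external loop: step only the CPUs that share the earliest time.
void step_earliest(std::vector<toy_cpu>& cpus)
{
    assert(!cpus.empty());

    uint64_t earliest = cpus[0].current_time();
    for (const toy_cpu& c : cpus)
        earliest = std::min(earliest, c.current_time());

    for (toy_cpu& c : cpus)
        if (c.current_time() == earliest)
            c.step();
}
```

On the first call all CPUs are at time 0, so all are stepped. After that a CPU with a 1 cycle instruction is stepped repeatedly until it catches up with one that executed a 4 cycle instruction, so the two never drift apart by more than one instruction's duration.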

## **Shared Callbacks**

Any system model using the lm32 model will have to have memory and interrupt callbacks registered in order to model external peripherals etc., as for the Linux case study described earlier. If a multi-processor system is to have completely different functionality implemented in each of the callback methods, then different functions can be registered for each of the instantiated CPUs. However, if the CPUs have shared functionality, such as modelling connection to a shared bus, with shared access to memory and peripherals in a common address space, then something else needs to be done.

The model does not pass an ID when calling its registered callback functions, and so a method is needed to identify which CPU is making the call, if the callback code is to be shared. The idea here is to have the common code in a separate function, not registered with the models, but with the addition of an ID parameter. Individual 'wrapper' callback functions are registered with each separate CPU, and these simply call the common code, adding the ID for the particular processor. It is then up to the common code to keep separate contexts for each processor, where necessary, or to access common state. The diagram below illustrates this concept:


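In code, the wrapper arrangement might look like the following sketch. The callback signature shown is an assumption, not the model's actual callback type, and the per-CPU access counter is just a placeholder for whatever per-processor context the common code needs to keep.

```cpp
#include <cassert>
#include <cstdint>

static const int NUM_CPUS = 2;

// Per-CPU context kept by the common code (here, just an access count).
static unsigned access_count[NUM_CPUS] = {0, 0};

// Common code: not registered with any model, and takes the extra ID parameter.
static int common_mem_access(int cpu_id, uint32_t addr, uint32_t* data)
{
    assert(cpu_id >= 0 && cpu_id < NUM_CPUS);
    access_count[cpu_id]++;

    (void)addr;  // shared bus modelling would go here
    (void)data;

    return 1;    // processing time, in cycles
}

// Thin wrappers: one per CPU, each with the ID-free signature that a model would call.
int cpu0_mem_access(uint32_t addr, uint32_t* data) { return common_mem_access(0, addr, data); }
int cpu1_mem_access(uint32_t addr, uint32_t* data) { return common_mem_access(1, addr, data); }
```

Each wrapper would be registered with its own CPU instance, so the common function always knows which processor made the access.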

# **Shared Memory Space**

As mentioned in previous sections, the model can have an internal memory, with controllable size, and memory space offset. This memory is separate for each CPU instantiation. The internal memory can be configured to be all, or partly, removed, and the memory callback functions trap and process memory addresses across all or part of the modelled space. To share an address space, whether mapped to memory, or to peripheral registers, the external memory callback functionality processes these addresses, and models the functionality.

For shared space, the external model common callback code must access the common state, regardless of the ID passed in. If modelling state unique to each processor, it must switch context based on the ID. One can also envisage state that is common between a subset of processors, but separate to another subset.

A similar sharing of interrupt sources can be envisaged for the interrupt callback as well, where some external interrupts raise an interrupt pin on all the CPUs (for a broadcast mailbox function, say), or be CPU specific (for a private peripheral, for example).

In this manner, it is possible to set up any arbitrary system of shared or particular memory and peripheral set between multiple CPU instantiations. Internal RAM can be used for private memory (modelling data and/or instruction TCM, say), with external callbacks mapping shared memory and peripherals to the appropriate CPUs. The CPU's program code can be located either privately in internal memory, or be common to multiple instantiations, as required. Such an example system is illustrated in the diagram below:



In this example three processors have access to shared memory and peripherals A and B (and any other on this bus). The third processor also has access to peripheral C, but has no internal memory. Thus, access to shared memory and peripherals A and B is handled by the common callback code, without regard to ID. All memory accesses from the third processor must be handled by the memory callback (as it has no internal memory), and peripheral C is only accessed when receiving an ID for the third processor; otherwise the access is marked as unprocessed. Not shown, but similarly for interrupts, it can be envisaged that one or more of the peripherals on the common bus could interrupt one or more processors as necessary. Peripheral C, however, would only interrupt the bottom processor. So two memory busses and two interrupt busses are modelled, and these can be extended at will.
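The routing decision for this example system can be sketched as a small function in the common callback code. The address map, the names and the choice of 2 as the third processor's ID are all hypothetical. Shared memory is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>

static const int PRIVATE_CPU_ID = 2;  // ID given to the third processor's wrappers

// Hypothetical address map for the example system.
static const uint32_t PAGE_MASK     = 0xfffff000u;
static const uint32_t PERIPH_A_BASE = 0x80000000u;
static const uint32_t PERIPH_B_BASE = 0x80001000u;
static const uint32_t PERIPH_C_BASE = 0x90000000u;

enum target { TARGET_NONE, TARGET_A, TARGET_B, TARGET_C };

// Decide which peripheral, if any, should handle an access from a given CPU.
target route_access(int cpu_id, uint32_t addr)
{
    assert(cpu_id >= 0);
    const uint32_t page = addr & PAGE_MASK;

    if (page == PERIPH_A_BASE) return TARGET_A;   // shared: ID ignored
    if (page == PERIPH_B_BASE) return TARGET_B;   // shared: ID ignored

    if (page == PERIPH_C_BASE && cpu_id == PRIVATE_CPU_ID)
        return TARGET_C;                          // private to the third processor

    return TARGET_NONE;                           // caller reports the access as unprocessed
}
```

An access to peripheral C's page from either of the first two processors returns `TARGET_NONE`, so the callback would report it as not processed.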

# **Further Reading**

- [1] "LatticeMico32 Processor Reference Manual", June 2012, Lattice Semiconductors
- [2] "Using as, the GNU Assembler", version 2.19.51, 2009
- [3] "Linux Port to LatticeMico32 System Reference Guide", Lattice Semiconductors, 2008
- [4] "LatticeMico Timer", version 3.1, Lattice Semiconductors, 2012
- [5] "LatticeMico UART", version 3.8, Lattice Semiconductors, 2012