# ARM® Cortex™-R Technology for Safe and Reliable Systems

by Andrew Frame, Senior Project Manager and Chris Turner, Product Marketing Manager, ARM

You probably don't realize it, but there is a good chance that you've used ARM® Cortex™-R technology hundreds of times today, from accessing the data on your hard drive and downloading emails on your smartphone right through to safely making your journey to work this morning. All of this is because ARM Cortex™-R4 processors are shipping in high volume across a broad range of market segments including hard disk drive controllers, industrial control, wireless baseband processors, consumer products and electronic control units for automotive systems. The continued evolution of these markets and their increasingly complex requirements have driven the next generation of innovation in the ARM Cortex-R profile.

Now, ARM has announced two new members of its real-time profile, the Cortex™-R5 processor and the Cortex™-R7 processor, which will bring to market significant performance and system-level enhancements to enable faster, safer and more reliable embedded products. This article introduces these new processors and their multicore and coherency technology, fast access to external peripherals and increasingly high levels of safety, as demanded by today's industry standards. All of this sits alongside high levels of configuration for precise targeting of application requirements, including options for Instruction and Data cache controllers, Tightly-Coupled Memory (TCM) interfaces, memory protection, error correction, parity checking, a Floating-Point Unit (FPU), debug and trace.



# Cortex-R4 high-performance real-time processor

Since its launch in 2006 the ARM Cortex-R4 processor core has gained wide acceptance in the embedded systems industry, with over twenty ARM silicon partners manufacturing semiconductor products in which it performs the central processing function. These products are typically application-specific System-on-Chip (SoC) ASICs, designed for use in particular vertical applications such as automotive electronic control units, high-performance data storage and cellular baseband processing for advanced 3G and new 4G handsets and mobile computing. Some of these ARM Partners have developed families of devices using the Cortex-R4 processor with varying feature sets and levels of performance for products ranging from 3G USB modem sticks to automotive microcontrollers such as the TMS570 devices available from Texas Instruments. Infineon recently announced a Cortex-R4 processor-based medical device platform, MD8710. In all cases these devices are enabled by the Cortex-R4 processor's specific capabilities, namely high computing performance, configurable features such as soft error handling, and the ability to respond deterministically to hard real-time events in an embedded system.

Figure 1: Cortex-R4 processor

This deterministic hard real-time responsiveness is the distinguishing feature of all Cortex-R processors. It means that the processor can be depended upon in a system where unexpected delays might result in loss of data or mechanical damage. Thus the Cortex-R processors can be found at the heart of systems performing real-time tasks such as steer-by-wire, anti-lock braking, hard disk drive servo control and 3G cellular data modems. These and other systems require processing that is both dependable and high-performance.

The Cortex-R4 processor features an advanced high-performance eight-stage pipeline with instruction pre-fetch, branch prediction and dual-issue execution together with dedicated hardware for division and floating point. This processor delivers a high benchmark performance of 1.66 Dhrystone MIPS per MHz at clock frequencies approaching 500 MHz on a 40 nm low-power process using standard cell libraries and compiled RAMs.

The Cortex-R4 processor's high-performance and hard real-time features include Harvard Instruction and Data caches, a Vectored Interrupt Controller (VIC) port and high-speed AMBA3 AXI bus ports connecting to memory and peripherals. There is an optional Floating Point Unit (FPU) and an optional Memory Protection Unit (MPU) which can protect areas of memory and peripherals from unintended software accesses. Also present in the Cortex-R4 processor is a special memory interface for Tightly-Coupled Memory (TCM), which underpins the processor's deterministic interrupt response. TCM is used to hold critical sections of code and data, such as an Interrupt Service Routine (ISR), which are then immediately available for execution rather than suffering variable and potentially long latency while being fetched from main memory into cache.

In addition, the Cortex-R4 processor has unique features for building dependable systems. These include a combination of parity checking and Error Correction Code (ECC) logic that can detect and correct soft and, in some cases, hard errors in the level-1 memory system, e.g., the cache and TCM RAMs. Sources of soft errors include induced signal glitches and radiation particles, and these have become of more concern as semiconductor processes fabricate ever-smaller gate widths at 40 nm and below. The parity checking and ECC logic is integrated within the processor microarchitecture such that, when an error occurs, the pipeline is flushed, a correction is made and execution continues automatically.
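To make the idea of single-bit correction concrete, the following minimal sketch implements a textbook Hamming(7,4) code in Python. It is an illustration only: the article does not say which code the Cortex-R4 ECC logic uses, the names `hamming74_encode` and `hamming74_decode` are hypothetical, and practical memory ECC typically protects much wider words.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (bit i holds position i+1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]  # positions 1..7
    return sum(bit << i for i, bit in enumerate(bits))


def hamming74_decode(codeword: int) -> Tuple[int, bool]:
    """Return (data, corrected), repairing at most one flipped bit."""
    bits = [(codeword >> i) & 1 for i in range(7)]
    # The XOR of the positions of all set bits is zero for a valid
    # codeword and equals the failing position after a single-bit flip.
    syndrome = 0
    for position in range(1, 8):
        if bits[position - 1]:
            syndrome ^= position
    corrected = syndrome != 0
    if corrected:
        bits[syndrome - 1] ^= 1
    data = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return data, corrected


stored = hamming74_encode(0b1011)
upset = stored ^ (1 << 4)              # model a single-bit soft error
data, corrected = hamming74_decode(upset)
print(data == 0b1011, corrected)       # prints "True True"
```

Flipping any one of the seven stored bits still returns the original data, which is the property that lets execution continue after a soft error.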

The Cortex-R4 processor is also designed for use in a dual core lockstep configuration where a redundant processor is used for error detection. In this configuration, both cores execute the same program using the same data and additional checking logic inspects every cycle of operation to detect a difference in behavior that would indicate that a soft or hard error has occurred. The system can then switch into a fail-safe mode or take some other appropriate course of action.
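The lockstep principle can be sketched in a few lines of Python. This is a behavioral model under stated assumptions, not a description of ARM's checking logic: the two "cores" are plain functions, one step of the loop stands in for one cycle, and the names `first_divergence`, `healthy_core` and `make_faulty_core` are hypothetical.

```python
from typing import Callable, Iterable, Optional

Core = Callable[[int], int]


def first_divergence(core_a: Core, core_b: Core,
                     inputs: Iterable[int]) -> Optional[int]:
    """Feed both cores the same inputs; return the first cycle that differs."""
    for cycle, value in enumerate(inputs):
        if core_a(value) != core_b(value):
            return cycle  # checker would raise an error signal here
    return None


def healthy_core(x: int) -> int:
    """Stand-in for the work one core performs in a cycle."""
    return (x * 3 + 1) & 0xFFFFFFFF


def make_faulty_core(fault_input: int, flipped_bit: int) -> Core:
    """Build a core that suffers a single bit flip on one specific input."""
    def core(x: int) -> int:
        result = healthy_core(x)
        if x == fault_input:
            result ^= 1 << flipped_bit
        return result
    return core


faulty = make_faulty_core(fault_input=6, flipped_bit=3)
print(first_divergence(healthy_core, healthy_core, range(10)))  # prints "None"
print(first_divergence(healthy_core, faulty, range(10)))        # prints "6"
```

The comparison only detects that the cores disagree; it cannot tell which one is wrong, which is why the system response is a fail-safe mode rather than an in-place correction.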

Now, millions of people rely every day upon Cortex-R4 processors in their smartphones, disk drives and automotive systems. However, these and other applications are increasing the demand for more performance and more functionality as data rates increase, energy consumption and costs are reduced and as microelectronics penetrates more widely into applications. There is also an increasing emphasis on safety and reliability which is exemplified by the new ISO 26262 standards for automotive systems and which is also a consequence of the increasing number of applications using advanced semiconductor process technologies where soft errors may be of more concern.

# Introducing the Cortex-R5 processor

ARM recently announced the new Cortex-R5 processor to fulfil specific requirements that have evolved since the Cortex-R4 processor was introduced. The Cortex-R5 processor is instruction set compatible with, and builds upon, the Cortex-R4 processor's advanced features and capabilities by introducing new system-level integration features that facilitate higher levels of system performance, increased efficiency and reliability, and enhanced error management in dependable real-time systems.

The first of these new features is a Low-Latency Peripheral Port (LLPP) which is an additional bus port intended specifically for fast peripheral reads and writes. It is implemented as an AMBA® AXI port with an optional AMBA AHB port. By using the LLPP, the processor can always guarantee an immediate read or write to peripheral registers in a system where a bounded and deterministic response is required, ensuring that peripheral reads or writes are unaffected by cache refills and/or queued AMBA AXI bus transactions to main memory or other addresses. In particular, the LLPP can be used to interface with ARM's Generic Interrupt Controller (GIC), thus ensuring a more immediate response to interrupts.

Another significant optional feature of the Cortex-R5 processor is its Accelerator Coherency Port (ACP) which provides a mechanism for cache coherency with an external data source. Examples of such data sources are 3G/4G modems or a hard disk read channel that write data directly into the processor's level-2 memory system. By writing this data through the ACP, the processor's data cache is inspected using a micro-Snoop Control Unit (µSCU) and if the same data is currently in cache it is invalidated so that it is updated when the processor next accesses it. This cache coherency is transparent to the developer, obviating the need to monitor and maintain coherency through additional software overhead. It is estimated that this feature increases effective system performance by up to 25% compared to using a Cortex-R4 processor with software performing cache maintenance, while also increasing code reliability by removing the likelihood of software cache maintenance coding errors being introduced into the system.
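The difference between a non-coherent write and a write through the ACP can be illustrated with a toy model. The sketch below is an assumption-laden simplification: the class `TinySoc` and its methods are hypothetical, each address stands for a whole cache line, and the micro-snoop control unit is reduced to a single invalidation step.

```python
class TinySoc:
    """Toy model of main memory plus one processor data cache."""

    def __init__(self) -> None:
        self.memory = {}   # address -> value in level-2 memory
        self.dcache = {}   # address -> value held in the data cache

    def cpu_read(self, addr: int) -> int:
        """Read through the cache, filling it on a miss."""
        if addr not in self.dcache:
            self.dcache[addr] = self.memory.get(addr, 0)
        return self.dcache[addr]

    def dma_write_non_coherent(self, addr: int, value: int) -> None:
        """External master writes memory directly; the cache is not told."""
        self.memory[addr] = value

    def acp_write(self, addr: int, value: int) -> None:
        """External master writes via the ACP; a matching line is invalidated."""
        self.memory[addr] = value
        self.dcache.pop(addr, None)


soc = TinySoc()
soc.memory[0x100] = 1
soc.cpu_read(0x100)                    # line is now cached with value 1
soc.dma_write_non_coherent(0x100, 2)
stale = soc.cpu_read(0x100)            # cache still returns the old value
soc.acp_write(0x100, 3)
fresh = soc.cpu_read(0x100)            # miss after invalidation, refilled
print(stale, fresh)                    # prints "1 3"
```

Without the ACP, software would have to invalidate the line itself before the second read, which is the maintenance overhead the hardware removes.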

The Cortex-R5 processor can be configured as a single or a dual core, in which case a single ACP maintains coherency in both processors.

The Cortex-R5 processor offers a broader range of dual core configuration options than was offered for the Cortex-R4 processor. Both processors can be configured with a redundant core in lock-step for safety-critical systems. However, the Cortex-R5 processor can also be configured as dual cores running independently, each executing its own program with its own bus interfaces, interrupts, TCMs, etc. A number of applications can use such a dual core configuration, for example, a 3G modem running layer-2 protocol stack software on one core and layer-3 on the other. Such a system may require both processors to access the same data coming from an external source and in this dual core configuration the ACP and its associated micro-snoop control unit will maintain coherency in both cores' data caches.

Figure 2: Cortex-R5 in single and dual core processor configurations



Figure 3: ACP application in dual core 3G baseband

The FPU in the Cortex-R4 processor and the Cortex-R5 processor is capable of both single- and double-precision, with specifically optimized single-precision performance. Another feature of the Cortex-R5 processor is the option to use a single-precision only FPU. The single-precision only version of the FPU offers a useful saving in silicon area and energy consumption when double-precision calculations are not required.

Finally, the Cortex-R5 processor extends ECC and parity error management to all of its AMBA3 AXI bus ports: the main AXI port, the AXI-slave port to TCM, the LLPP and the ACP. This allows an ECC-equipped level-2 memory or peripheral device to communicate with the processor over an ECC-widened AMBA AXI bus and, if an error is detected, the processor can correct it and re-execute the read or write instruction. This allows a complete system to be designed with end-to-end ECC capability, meeting key requirements in automotive, aerospace and other safety-critical application markets.

### Roadmap to the next real-time processor, Cortex-R7

Following on from Cortex-R4 and Cortex-R5 processors, ARM is releasing the new Cortex-R7 processor during 2011. The Cortex-R7 processor combines real-time features with even higher levels of performance, implementing a new microarchitecture with enhanced features and capable of running at higher clock frequencies. Overall the Cortex-R7 processor can deliver almost twice the performance of Cortex-R4 or Cortex-R5 processors on the same semiconductor process.

This increased level of performance meets the need of evolving application demands brought about by trends such as sensor fusion in automotive where a single powerful processor takes input from an increasing array of sensors around the vehicle for power-train and stability management, collision avoidance, steering, braking and so on. In high-performance storage the capacity and data rates available from magnetic disks continue to increase, placing increasing performance demands on the servo and the read and write channel processors. In mobile communications the increasing wireless broadband data rates, up to 1 Gbps, available from 3G Long Term Evolution (LTE) and 4G LTE-Advanced also require enormous amounts of processing performance.

The Cortex-R7 processor delivers higher levels of performance through the introduction of new technology in its microarchitecture such as out-of-order execution and dynamic register renaming combined with improved branch prediction, extensive superscalar execution capability and faster hardware support for divide and other functions. It is benchmarked at more than 2.5 DMIPS/MHz and should support a clock frequency exceeding 600 MHz on 40 nm LP using standard compiled RAMs. With specifically optimized RAMs and other hardening techniques it should be possible to run at over 800 MHz on 40 nm LP, at which point a single Cortex-R7 processor will provide 2,000 DMIPS performance. The processor can be implemented in either a single core configuration or as a dual core, as shown in Figure 4.



Figure 4: Cortex-R7 dual core processor
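The throughput figures quoted for the two generations can be cross-checked with simple arithmetic. The sketch below multiplies each DMIPS/MHz rating by the clock frequencies given in the article. The helper name `dmips` is hypothetical, the 500 MHz figure for the Cortex-R4 is treated as an upper bound because the article says only that the clock approaches it, and the Cortex-R7 numbers are lower bounds ("more than", "exceeding").

```python
def dmips(dmips_per_mhz: float, clock_mhz: float) -> float:
    """Aggregate Dhrystone MIPS at a given clock frequency."""
    return dmips_per_mhz * clock_mhz


r4 = dmips(1.66, 500)           # Cortex-R4, standard cells and compiled RAMs
r7_standard = dmips(2.5, 600)   # Cortex-R7, standard compiled RAMs
r7_hardened = dmips(2.5, 800)   # Cortex-R7, optimized RAMs and hardening
ratio = r7_standard / r4

print(f"{r4:.0f} {r7_standard:.0f} {r7_hardened:.0f} {ratio:.2f}")
# prints "830 1500 2000 1.81"
```

On standard RAMs the ratio comes out at roughly 1.8, which is consistent with the claim of almost twice the performance of the Cortex-R4 or Cortex-R5 on the same process, and the 800 MHz case reproduces the 2,000 DMIPS figure.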

A key feature of the Cortex-R7 processor is the introduction of a new class of level-2 memory known as Low-Latency RAM (LLRAM). This RAM is connected through a dedicated AMBA3 AXI bus port and is intended to complement the Cortex-R7 processor's internal TCM. Experience from fast real-time SoC system designs using the Cortex-R4 and Cortex-R5 processors has shown that TCM can limit performance as larger, and therefore slower, RAM arrays introduce wait state cycles. This limitation is exacerbated by the Cortex-R7 processor's higher clock frequencies. Thus the Cortex-R7 processor's TCM is organized as high-performance Harvard memory with separate ports for Instruction and Data TCM with RAM size limited to 128 KBytes. Meanwhile the LLRAM port provides for larger, flexible and unified Instruction and Data memory that is not blocked by transactions to the rest of level-2 memory on the main AMBA AXI bus port.



This additional layer of memory hierarchy gives designers the ability to maximize system performance, locating different parts of programs and data in memory of the most appropriate size and speed, and therefore minimizing energy consumption. Importantly, the LLRAM also has its coherency maintained between dual cores, which cannot be achieved in TCM (see Figure 5). Systems incorporating the Cortex-R7 processor will enjoy predictable and bounded response times while reducing, or in some cases eliminating, the need for expensive zero wait-state TCM.

The Cortex-R7 processor also introduces complete dual core and external data coherency to ARM's real-time processor line-up. The processor can be implemented in single or dual core configuration like the Cortex-R5 processor, but the Cortex-R7 processor also provides automated cache coherency between the two cores so there is no need for any software cache maintenance. This capability enables the introduction of Symmetric Multi-Processing (SMP) using advanced Real-time Operating Systems (RTOS) in hard real-time embedded systems. Cache coherency in the Cortex-R7 processor is implemented using a powerful Snoop Control Unit (SCU) which operates efficiently by managing complete cache lines. A further advantage of the Cortex-R7 processor is that its caches can be operated in write-back mode while maintaining cache coherency. Readers familiar with the ARM Cortex™-A9 processor should note that the SCU in the Cortex-R7 processor is considerably smaller as it is only called upon to support a dual core configuration, and not a quad core as is possible in the Cortex-A9 processor. There are also differences introduced for real-time systems in which the two cores run different programs: each core can be configured to prioritize its access to specified memory addresses or peripheral locations over the other core. ARM refers to this prioritized access through the SCU as 'Quality of Service.'

Figure 5: ACP in the Cortex-R7 processor

Real-time performance of the Cortex-R7 processor is further enhanced by the integration of an interrupt controller within the processor core, in either single or dual core configuration. This controller has similar capabilities to the standard ARM GIC and is able to distribute interrupts across cores in a dual core configuration. Integrating the interrupt controller in this way passes both the initial interrupt event and also the interrupt vector into the processor in fewer cycles than would be achieved with an externally connected controller, thereby providing for faster entry into the ISR.

The Cortex-R7 processor also introduces innovative new error management techniques for application in safety-critical systems. Both soft and hard errors are managed according to programmable policies and these errors can be fixed transparently as they occur during execution. Soft errors are corrected using ECC in a similar way to the Cortex-R5 processor. Hard errors are fixed on the fly using an error bank memory in which the processor stores a hard error correction once it has been encountered for the first time.
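The error bank idea can be pictured with a small behavioral model. Everything in the sketch below is an assumption made for illustration: the article gives no detail of how the error bank is organized, the class `RamWithErrorBank` is hypothetical, a hard fault is modelled as one bit stuck at 1, and the ECC decode is replaced by a lookup of the value that was originally written.

```python
class RamWithErrorBank:
    """Toy RAM that remembers corrections for addresses with hard faults."""

    def __init__(self, bank_size: int = 4) -> None:
        self.intended = {}     # addr -> value software wrote
        self.stuck_bits = {}   # addr -> bit index stuck at 1 (hard fault)
        self.error_bank = {}   # addr -> corrected value
        self.bank_size = bank_size
        self.ecc_corrections = 0

    def write(self, addr: int, value: int) -> None:
        self.intended[addr] = value
        if addr in self.error_bank:
            self.error_bank[addr] = value  # keep the bank entry current

    def _raw_read(self, addr: int) -> int:
        """Value the faulty array would actually return."""
        value = self.intended.get(addr, 0)
        if addr in self.stuck_bits:
            value |= 1 << self.stuck_bits[addr]
        return value

    def read(self, addr: int) -> int:
        if addr in self.error_bank:
            return self.error_bank[addr]   # served without a new correction
        raw = self._raw_read(addr)
        good = self.intended.get(addr, 0)  # stand-in for an ECC decode
        if raw != good:
            self.ecc_corrections += 1
            if len(self.error_bank) < self.bank_size:
                self.error_bank[addr] = good
        return good


ram = RamWithErrorBank()
ram.write(0x10, 0b0100)
ram.stuck_bits[0x10] = 0     # inject a hard fault on bit 0
print(ram.read(0x10), ram.read(0x10), ram.ecc_corrections)  # prints "4 4 1"
```

In this model the first read pays for a correction and records it, and every later read of the same address is served from the bank, which mirrors the "fixed on the fly" behavior described above.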

These, and other features, make the Cortex-R7 processor highly suitable for use in the next generation of dependable systems in safety-critical and high availability applications. These processors already meet the advanced requirements driven by increasing data rates and performance coupled with increasing sensitivity to errors at advanced semiconductor process nodes. The Cortex-R7 processor delivers its huge levels of performance while achieving industry-leading energy efficiency of better than 11 DMIPS/mW on a 40 nm LP process.
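As a rough worked example, the efficiency figure can be turned into a power estimate. This sketch assumes, purely for illustration, that the 11 DMIPS/mW efficiency also holds at the 2,000 DMIPS operating point quoted earlier, which the article does not state. The helper name `power_mw` is hypothetical.

```python
def power_mw(total_dmips: float, dmips_per_mw: float) -> float:
    """Power in milliwatts implied by a throughput and an efficiency."""
    return total_dmips / dmips_per_mw


estimate = power_mw(2000, 11)
print(f"{estimate:.0f} mW")  # prints "182 mW"
```

Because the efficiency is quoted as "better than" 11 DMIPS/mW, the result is an upper bound on power under that assumption rather than a measured value.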

### Conclusion

With the introduction of the Cortex-R5 and Cortex-R7 processors, ARM has significantly enhanced its high-performance, real-time processor roadmap, allowing designers to select the core most appropriate for their application in terms of features, performance, silicon area and power consumption. ARM's semiconductor partners can develop a range of embedded system SoC products by choosing from the Cortex-R line-up, with software compatibility maintained across the range according to the ARMv7-R architecture definition. The Cortex-R5 processor introduces important new system-level features and the Cortex-R7 processor further offers significantly higher performance along with real-time SMP capability.

All three processors are highly configurable, allowing designers to select features and set parameters such as cache size, memory protection regions and debug capability. This enables die area and energy efficiency to be optimized for a particular application. All three cores can be implemented in lock-step for a safety-critical system and the Cortex-R5 and Cortex-R7 processors can also be implemented as dual cores for higher performance, with SMP supported in the Cortex-R7 processor.

With the introduction of the new Cortex-R5 and Cortex-R7 processors, ARM demonstrates its continuing investment in the embedded systems industry by delivering modern and innovative processors for demanding applications that will be fabricated using advanced semiconductor technologies. The Cortex-R7 processor provides the features and performance levels that will be required for next-generation real-time applications such as LTE-Advanced mobile baseband processing, very high-speed and high-capacity storage products and the next generation of safety-critical electronic control units for automotive, aerospace and similar applications.

