Isolate designs to low power domains to reduce power consumption
Item | Value |
---|---|
Guideline Number | 19 |
Guideline Responsible (Name, Affiliation) | Björn Gottschall, NTNU |
Guideline Reviewer (Name, Affiliation) | Philippe Millet, Thales |
Guideline Audience (Category) | HW designers, System architects |
Guideline Expertise (Category) | HW designers |
Guideline Keywords (Category) | Energy efficiency |
Isolate functionality to power domains to reduce power consumption.
High-performance embedded platforms commonly include different power domains that can be independently clock-gated or power-gated. With clock-gating, the selected domain is not clocked and therefore its transistors do not switch -- saving dynamic power. With power-gating, the power supply of the domain is cut -- saving both dynamic and static power. A disadvantage of power-gating is that state elements (e.g., registers, memory) lose their contents. Thus, application state must be restored when the domain is powered up.
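The difference between the two techniques can be summarized with the usual first-order CMOS power model. This is a textbook approximation added for illustration, not something derived from the Tulipp measurements; the symbols (activity factor $\alpha$, switched capacitance $C$, supply voltage $V_{dd}$, clock frequency $f$, leakage current $I_{\text{leak}}$) are assumptions of that model.

```latex
P_{\text{total}} = P_{\text{dyn}} + P_{\text{static}}, \qquad
P_{\text{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f, \qquad
P_{\text{static}} \approx V_{dd} \, I_{\text{leak}}
```

Under this model, clock-gating removes the switching activity of the gated domain, so $P_{\text{dyn}}$ goes to zero while $P_{\text{static}}$ remains. Power-gating removes $V_{dd}$ from the domain, so both terms go to zero, at the cost of losing stored state.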
Recommended implementation method of the guideline along with a solid motivation for the recommendation
Constrain applications to work with low-power domains. Carefully trade the benefits of entering a power saving mode against the overheads of resuming application execution.
Zynq UltraScale+ MPSoC has three power domains -- full-power, low-power, and PL domain. Each domain has components that can be clock-gated. Not using entire power domains saves more power than using all domains with optimized clock-gating. Powering individual domains requires separate power rails, currently not available on the Tulipp reference platform. The full-power domain includes the main Application Processing Unit (APU, ARM Cortex-A53), the DDR-Controller, GPU and high speed connectivity like PCIe. The low-power domain provides the Real-Time Processing Unit (RPU, ARM Cortex-R5), Configuration Security Unit (CSU), Platform Management Unit (PMU), system monitor and general low-speed connectivity like USB. The PL power domain includes only the reconfigurable FPGA fabric as Programmable Logic (PL).
The reference platform does not provide power-gating functionality due to shared power rails. Therefore, it is not possible to shut down individual power domains. Also, the evaluation does not include the power consumption of the PL, as it is highly application and implementation dependent. However, the Xilinx Power Estimator (XPE) provides a worst-case value of 5.5 Watts for the PL alone for a rather unrealistic 100% usage of all its resources. The following evaluation gives some insight into possible power savings on the Tulipp reference platform from putting the APU and RPU of the UltraScale+ platform into different power states. It also considers powering off the platform completely and loading the PL configuration on startup.
The following case studies were measured (the numbers match the case numbers in the table below):
1. Having APU and RPU in reset state
2. Waiting for interrupt on APU and RPU
3. Compute intensive workload on APU and RPU
4. Memory intensive workload on APU and RPU
5. Waiting for interrupt on a single APU core (all other cores are held in reset)
6. Programming PL from bitstream in memory
7. Programming PL from bitstream on SD-Card
Case | Power [W] | Time [ms] | Energy [J] |
---|---|---|---|
1 | 3.01465 | 10000 | 30.1465 |
2 | 4.95719 | 10000 | 49.5719 |
3 | 5.66150 | 10000 | 56.6150 |
4 | 6.66712 | 10000 | 66.6712 |
5 | 5.02270 | 10000 | 50.2270 |
6 | 5.20629 | 17.040 | 0.08872 |
7 | 5.05152 | 756.087 | 3.81939 |
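The energy column follows from power multiplied by time. The sketch below recomputes it from the power and time columns of the table above; the names `MEASUREMENTS`, `energy_joules`, and `ENERGY` are illustrative only and not part of any Tulipp tooling.

```python
# Measurements copied from the table above: case -> (power in W, time in ms).
MEASUREMENTS = {
    1: (3.01465, 10000.0),
    2: (4.95719, 10000.0),
    3: (5.66150, 10000.0),
    4: (6.66712, 10000.0),
    5: (5.02270, 10000.0),
    6: (5.20629, 17.040),
    7: (5.05152, 756.087),
}


def energy_joules(power_w, time_ms):
    """Energy in joules for a constant power draw over the given time."""
    return power_w * time_ms / 1000.0  # convert milliseconds to seconds


# Recomputed energy per case, to compare against the Energy [J] column.
ENERGY = {case: energy_joules(p, t) for case, (p, t) in MEASUREMENTS.items()}

for case in sorted(ENERGY):
    print(f"case {case}: {ENERGY[case]:.5f} J")
```

Treating the power draw as constant over the measurement window is a simplification; the table reports a single power value per case, so it is read here as an average.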
The least power is drawn in case 1, in which the RPU and APU are held in reset while the PL configuration is preserved. However, the processing system cannot wake itself up from this state and must be initialized externally (e.g., via JTAG). In case 2, all processor cores are set to WFI (wait for interrupt), which draws less power than any application execution. In contrast, case 5 shows a single APU core in WFI state with all other cores held in reset, which, surprisingly, consumes slightly more power (5.02 Watts versus 4.96 Watts). Cases 3 and 4 show the worst-case power consumption of the processing system, measured with compute-bound and memory-bound applications. In case 6, the PL is reconfigured from a bitstream already prepared in off-chip memory, which takes only 17 milliseconds at 5.2 Watts, slightly more than idle (WFI). Case 7 shows reconfiguration from SD-Card, as a bitstream is usually not available in main memory after power-on. This draws slightly less power but takes significantly longer (about 756 milliseconds) because of the slow SD-Card interface.
These measurements show that a considerable amount of energy can be saved by temporarily switching off the reference platform and investing the energy required for PL reconfiguration. However, the application latency added by the startup process needs to be considered during design. Idling in WFI and waking the platform using interrupts saves around 0.5 Watts, keeps the PL configuration in place, and guarantees fast reaction times. Thus, the method of choice depends on the application requirements.
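The trade-off between idling in WFI and switching the platform off can be estimated with a simple break-even calculation. The sketch below uses the measured values of case 2 (WFI) and case 7 (PL reconfiguration from SD-Card). It assumes that the switched-off platform draws no power and that reloading the bitstream is the only restart cost. Neither assumption is confirmed by the measurements: booting the processing system also costs energy that was not measured, so the real break-even time will be longer. The function name `break_even_idle_s` is hypothetical.

```python
P_WFI_W = 4.95719          # case 2: all cores waiting for interrupt
E_RECONFIG_SD_J = 3.81939  # case 7: PL reconfiguration from SD-Card


def break_even_idle_s(restart_energy_j, idle_power_w, off_power_w=0.0):
    """Idle duration above which switching off costs less energy than idling.

    Idling for T seconds costs idle_power_w * T. Switching off costs
    off_power_w * T plus restart_energy_j. Both are equal at
    T = restart_energy_j / (idle_power_w - off_power_w).
    """
    saved_power_w = idle_power_w - off_power_w
    if saved_power_w <= 0:
        raise ValueError("switching off must draw less power than idling")
    return restart_energy_j / saved_power_w


# Assumption: off_power_w = 0.0, i.e. the switched-off platform draws nothing.
BREAK_EVEN_SD_S = break_even_idle_s(E_RECONFIG_SD_J, P_WFI_W)
print(f"break-even idle time (SD-Card reload): {BREAK_EVEN_SD_S:.2f} s")
```

Under these assumptions the break-even point is roughly 0.77 seconds: idle periods shorter than this favor WFI, and longer ones favor switching off, provided the application tolerates the added startup latency.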
Review
None