Isolate designs to low power domains to reduce power consumption

bgottschall edited this page Nov 5, 2018 · 19 revisions

Guideline Information

| Item | Value |
| --- | --- |
| Guideline Number | 19 |
| Guideline Responsible (Name, Affiliation) | Björn Gottschall, NTNU |
| Guideline Reviewer (Name, Affiliation) | Philippe Millet, Thales |
| Guideline Audience (Category) | HW designers, System architects |
| Guideline Expertise (Category) | HW designers |
| Guideline Keywords (Category) | Energy efficiency |

Guideline advice

Isolate functionality to power domains to reduce power consumption.

Insights that led to the guideline

High-performance embedded platforms commonly include different power domains that can be independently clock-gated or power-gated. With clock-gating, the selected domain is not clocked and therefore its transistors do not switch -- saving dynamic power. With power-gating, the power supply of the domain is cut -- saving both dynamic and static power. A disadvantage of power-gating is that state elements (e.g., registers, memory) lose their contents. Thus, application state must be restored when the domain is powered up again.
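The difference between the two techniques can be made explicit with the usual first-order CMOS power model. This model is a textbook approximation and not taken from the measurements in this guideline; the symbols (activity factor $\alpha$, switched capacitance $C$, supply voltage $V_{DD}$, clock frequency $f$, leakage current $I_{leak}$) are introduced here for illustration only.

```latex
\begin{align}
P_{\text{total}}   &= P_{\text{dynamic}} + P_{\text{static}} \\
P_{\text{dynamic}} &\approx \alpha \, C \, V_{DD}^{2} \, f \\
P_{\text{static}}  &\approx V_{DD} \, I_{\text{leak}}
\end{align}
```

Under this model, clock-gating removes the switching activity of the gated domain, so $P_{\text{dynamic}}$ goes towards zero while $P_{\text{static}}$ remains. Power-gating removes the supply of the domain, so both terms go towards zero, at the cost of losing the stored state.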

Recommended implementation method of the guideline along with a solid motivation for the recommendation

Constrain applications to work with low-power domains. Carefully trade the benefits of entering a power saving mode against the overheads of resuming application execution.

Instantiation of the recommended implementation method in the reference platform

The Zynq UltraScale+ MPSoC has three power domains: the full-power domain, the low-power domain, and the PL domain. Each domain has components that can be clock-gated. Leaving an entire power domain unused and powered off saves more power than keeping all domains powered and relying on optimized clock-gating. Powering domains individually requires separate power rails, which the Tulipp reference platform currently does not provide. The full-power domain includes the main Application Processing Unit (APU, ARM Cortex-A53), the DDR controller, the GPU, and high-speed connectivity such as PCIe. The low-power domain provides the Real-Time Processing Unit (RPU, ARM Cortex-R5), the Configuration Security Unit (CSU), the Platform Management Unit (PMU), the system monitor, and general low-speed connectivity such as USB. The PL power domain includes only the reconfigurable FPGA fabric, the Programmable Logic (PL).
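As a rough illustration of how an application can be checked against this partitioning, the following sketch maps the components named above to their power domains and reports which domains an application would leave unused. The component names, the `POWER_DOMAINS` table, and both helper functions are hypothetical and only cover the components listed in this section; a real design would need the complete domain assignment from the vendor documentation.

```python
# Hypothetical, abbreviated mapping of power domains to the components
# named in this guideline. It is not a complete list of the device.
POWER_DOMAINS = {
    "full-power": {"APU", "DDR controller", "GPU", "PCIe"},
    "low-power": {"RPU", "CSU", "PMU", "system monitor", "USB"},
    "PL": {"FPGA fabric"},
}


def required_domains(components):
    """Return the set of power domains needed by the given components."""
    needed = set()
    for component in components:
        for domain, members in POWER_DOMAINS.items():
            if component in members:
                needed.add(domain)
                break
        else:
            # The inner loop found no domain containing this component.
            raise KeyError(f"unknown component: {component}")
    return needed


def unused_domains(components):
    """Return the domains that are candidates for being powered off."""
    return set(POWER_DOMAINS) - required_domains(components)


if __name__ == "__main__":
    app = ["RPU", "USB", "FPGA fabric"]
    print("required:", sorted(required_domains(app)))
    print("unused:  ", sorted(unused_domains(app)))
```

An application restricted to the RPU, USB, and the FPGA fabric would, under this mapping, leave the full-power domain unused, which is the situation the guideline aims for.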

Evaluation of the guideline in reference applications

The reference platform does not provide power-gating functionality due to shared power rails. Therefore, it is not possible to shut down individual power domains. Also, the evaluation does not include the power consumption of the PL, as it is highly application and implementation dependent. However, the Xilinx Power Estimator (XPE) provides a worst-case value of 5.5 Watts for the PL alone, for a rather unrealistic 100% usage of all its resources. The following evaluation gives some insight into the possible power savings on the Tulipp reference platform when the APU and RPU of the UltraScale+ platform are put into different power states. It also considers powering off the platform completely and loading the PL configuration again on startup.

The following case studies have been done:

  1. Having APU and RPU in reset state
  2. Waiting for interrupt on APU and RPU
  3. Compute intensive workload on APU and RPU
  4. Memory intensive workload on APU and RPU
  5. Waiting for interrupt on APU single core (all others are held in reset)
  6. Programming PL from bitstream in memory
  7. Programming PL from bitstream on SD-Card
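
| Case | Power [W] | Time [ms] | Energy [J] |
| --- | --- | --- | --- |
| 1 | 3.01465 | 10000 | 30.1465 |
| 2 | 4.95719 | 10000 | 49.5719 |
| 3 | 5.66150 | 10000 | 56.6150 |
| 4 | 6.66712 | 10000 | 66.6712 |
| 5 | 5.02270 | 10000 | 50.2270 |
| 6 | 5.20629 | 17.040 | 0.08872 |
| 7 | 5.05152 | 756.087 | 3.81939 |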
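
The energy column follows from the other two columns as average power multiplied by duration. The short sketch below recomputes it from the measured power and time values in the table; the function and variable names are chosen here for illustration, and treating the reported power as a constant average over the interval is an assumption.

```python
# Average power [W] and duration [ms] per case, copied from the table above.
MEASUREMENTS = {
    1: (3.01465, 10000.0),
    2: (4.95719, 10000.0),
    3: (5.66150, 10000.0),
    4: (6.66712, 10000.0),
    5: (5.02270, 10000.0),
    6: (5.20629, 17.040),
    7: (5.05152, 756.087),
}


def energy_joules(power_w, time_ms):
    """Energy in joules for an average power in watts over a time in ms."""
    return power_w * time_ms / 1000.0


# Recompute the energy column for every case.
ENERGIES = {
    case: energy_joules(power_w, time_ms)
    for case, (power_w, time_ms) in MEASUREMENTS.items()
}

if __name__ == "__main__":
    for case, energy in ENERGIES.items():
        print(f"case {case}: {energy:.5f} J")
```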

The least power is drawn in case 1, in which the RPU and APU are held in reset while the PL configuration is preserved. However, the processing system cannot wake itself up from this state and must be initialized externally (e.g., via JTAG). In case 2, all processor cores are set to WFI (wait for interrupt), which draws less power than any application execution. In contrast, case 5 has a single APU core in WFI with all other cores held in reset and, surprisingly, draws slightly more power than case 2. Cases 3 and 4 show the worst-case power consumption of the processing system, measured with compute-bound and memory-bound applications. In case 6, the PL is reconfigured from a bitstream prepared in off-chip memory, which takes only 17 milliseconds at 5.2 Watts, slightly above the idle (WFI) power. Case 7 shows reconfiguration from the SD card, since a bitstream is usually not available in main memory after power-on. This draws slightly less power than case 6 but takes significantly longer (about 756 milliseconds) due to the slow SD card interface.

These measurements clearly show that a considerable amount of energy can be saved by temporarily switching off the reference platform and investing the energy required for PL reconfiguration. However, the application latencies added by the startup process need to be considered in the design process. In contrast, idling in WFI and waking the platform using interrupts saves around 0.5 Watts but keeps the PL configuration in place and guarantees fast reaction times. Thus, the method of choice depends on the application requirements.
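
To make this trade-off concrete, the following sketch estimates how long the platform must be idle before switching it off pays back the energy of reloading the PL from the SD card. It uses the measured values of cases 2 and 7 and rests on two assumptions that are not part of the measurements: the switched-off platform is taken to draw 0 W, and the boot energy of the processing system itself is ignored, so the real break-even time will be longer. The function names are chosen here for illustration.

```python
P_WFI_W = 4.95719           # case 2: all cores waiting for interrupt
E_RECONFIG_SD_J = 3.81939   # case 7: PL programmed from SD card


def break_even_idle_s(p_idle_w, e_restart_j, p_off_w=0.0):
    """Idle time in seconds above which powering off saves energy.

    p_off_w defaults to 0 W, which is an assumption, not a measurement.
    """
    if p_idle_w <= p_off_w:
        raise ValueError("idle power must exceed off-state power")
    return e_restart_j / (p_idle_w - p_off_w)


def energy_saved_j(idle_s, p_idle_w, e_restart_j, p_off_w=0.0):
    """Energy saved (negative: lost) by powering off instead of idling."""
    return (p_idle_w - p_off_w) * idle_s - e_restart_j


if __name__ == "__main__":
    t_be = break_even_idle_s(P_WFI_W, E_RECONFIG_SD_J)
    print(f"break-even idle time: {t_be:.2f} s")
    for idle_s in (0.5, 1.0, 10.0):
        saved = energy_saved_j(idle_s, P_WFI_W, E_RECONFIG_SD_J)
        print(f"idle {idle_s:5.1f} s -> saved {saved:+.2f} J")
```

Under these assumptions the break-even point lies at roughly 0.77 seconds of idle time. For shorter idle periods, staying in WFI is the cheaper option; for longer ones, powering off saves energy, provided the application tolerates the added startup latency.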

References

[1] Managing Power and Performance with the Zynq UltraScale+ MPSoC (.PDF)

Review

Related guidelines

None