Review of Guideline #26

philippemilletresearch edited this page Jan 29, 2019 · 4 revisions

Formatting

| Item | Outcome |
| --- | --- |
| Guideline complies with the Guideline-Template | Yes |
| Name of guideline responsible with affiliation is clearly stated | Yes |

Is the audience of the guideline correctly specified?

Yes.

Is the expertise required to write the guideline correctly specified?

Yes.

Are the keywords well selected?

Yes.

Are parts of the guideline too generic or too specific?

It is a bit too generic. The link to the TULIPP applications should be made clearer.

Also, the title is not ideal. You need to use branches when the algorithm branches -- so there is not really any choice. However, you can choose how to implement branches (and, by the way, loops also contain conditional branches). A better title might be "Beware of conditional branches" or something like that. To me, this is what the guideline is expressing: if you do not take care when implementing code with lots of branches, performance will suffer on GPUs and resource usage will likely increase on FPGAs.

Also, please be specific on the use of GPU vs GPGPU. To me the GPU is the architecture and the GPGPU is the programming model and how it is supported on the architecture. I am not sure if this is the consensus definition, though. Also, the GPGPU abbreviation should be introduced and explained.

Does the guideline explicitly refer to the handbook? To which part of the deliverable is it relevant (e.g., chapter of D1.2/D1.3)?

Mainly to the hardware platforms chapter. If your target application has a significant number of divergent branches, you should probably go for an FPGA accelerator rather than a GPU.

Does the guideline specify the work done in the project that can benefit from the guideline?

No, but it should. The link to the UAV use case should be made more explicit. What algorithm is it used for? What are the (quantitative) effects of implementing the algorithm in a branch-heavy style compared to the style recommended in the guideline?

Other comments

  1. Guideline advice: It is OK, but not fully precise. Conditional branching on FPGAs is cheap in the sense that there is commonly no performance overhead, but there is a resource usage overhead. If the HLS tool can group branches (see divergent branches below), the implementation will likely use (much) fewer resources than if branches cannot be grouped.

  2. Insights that led to the guideline: The treatment of GPU architecture is imprecise. Branches are only a problem for GPUs if they diverge within a warp. A somewhat simplified explanation of divergence is that the branch condition evaluates to true for some threads within the warp and to false for others. In this case, the warp runs the branched code twice -- once for the threads where the condition is true and once for the threads where the condition is false. Thus, conditional branches only create a problem when they are divergent. See Reissmann et al. for more details.

  3. Recommended implementation method of the guideline along with a solid motivation for the recommendation: I cannot see how this programming style will help with divergence as it all depends on how the Condition_for_A variable is evaluated across threads within warps. Please explain.

  4. Instantiation of the recommended implementation method in the reference platform: Needs more details. Which algorithm/procedure/kernel? What does it do? How does it fit within the whole use case?

  5. Evaluation of the guideline in reference applications: We need a quantitative evaluation of how the suggested implementation method improves branch behaviour (and thereby performance) compared to a different style.

  6. References: These are OK. You should also check out "Efficient control flow restructuring for GPUs". We go into a fair bit of detail on the performance problems caused by divergent branches in that paper.
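
To make comments 2 and 3 concrete, here is a minimal sketch in Python. It is a toy model and not GPU code: the function name `passes_per_warp`, the warp size of 32, and the two example condition patterns are assumptions made for illustration, not taken from the guideline. It counts how many serialized passes each warp needs for a single if/else, given the branch condition of every thread.

```python
WARP_SIZE = 32  # assumption: 32 threads per warp


def passes_per_warp(conditions, warp_size=WARP_SIZE):
    """Return, for each warp, how many serialized passes one if/else needs.

    `conditions` holds one boolean per thread: the value of the branch
    condition for that thread. A warp whose threads all agree needs one
    pass; a warp with mixed outcomes needs two (one per side of the branch).
    """
    passes = []
    for start in range(0, len(conditions), warp_size):
        warp = conditions[start:start + warp_size]
        # Number of distinct outcomes in this warp: 1 (uniform) or 2 (divergent).
        passes.append(len(set(warp)))
    return passes


# Both patterns have 32 true and 32 false threads; only the placement differs.
uniform = [True] * 32 + [False] * 32           # threads agree within each warp
interleaved = [i % 2 == 0 for i in range(64)]  # every warp sees both outcomes

print(passes_per_warp(uniform))      # [1, 1]
print(passes_per_warp(interleaved))  # [2, 2]
```

In this simplified model the cost depends only on how the condition values are distributed across the threads of each warp, not on how the branch is written in the source. This is why the guideline needs to explain how Condition_for_A is evaluated across threads.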

Track changes:

  1. 30/07/2018: Made some formatting changes in the guideline to comply with the template.
  2. 23/08/2018: Guideline reviewed.
  3. 25/01/2019: Guideline updated.