Writing for HW

Guideline Information

Item	Value
Guideline Number	22
Guideline Responsible (Name, Affiliation)	Asbjørn Djupdal, NTNU
Guideline Reviewer (Name, Affiliation)	Julian Haase, TUD
Guideline Audience (Category)	Application developer
Guideline Expertise (Category)	Toolchain designer
Guideline Keywords (Category)	Code Optimisation

Guideline advice

To keep the possibilities for HW/SW partitioning as open as possible, functions doing heavy calculations should be written with HLS in mind. Optimizations for execution on a CPU or an FPGA should come towards the end of the application development process, after the HW/SW partition has been fixed.

Insights that led to the guideline

Work on an automatic design space exploration tool has shown that it is non-trivial to convert a typical C function to a form suitable for HLS. Manual work is typically needed, sometimes resulting in large rewrites.

Recommended implementation method of the guideline along with a solid motivation for the recommendation

High-level synthesis (HLS) makes it possible to write HW modules for execution on an FPGA using normal C code. This is one of the programming models mentioned in chapter 5 of D1.3. HLS is useful for SW programmers without HDL experience, who can then produce FPGA modules without learning a new language. It is also useful for experienced FPGA developers when full control over the resulting module is not needed or wanted.

The programmer cannot, however, write code without considering the limitations of the HLS tool. In the early phases of application development, the final HW/SW partitioning of the application is likely still unknown. By keeping HLS in mind from the start, less work will be required when C functions are later chosen for HW implementation.

The following guidelines increase the likelihood that your code will be HLS compatible. In addition, the HLS analysis capability of the analysis tool (D4.4) can be used at any stage to assess the HLS compatibility of any part of the application code.

Organize your code such that the compute-intensive parts are self-contained kernels. Keep system and library calls elsewhere.
HLS requires that C constructs are of a fixed or bounded size. Either use only fixed or bounded sizes in the compute kernels, or write the code such that it is easy to fulfill this requirement later.
Inside the compute kernels, dynamic memory allocation should be avoided and stack-allocated variables should be preferred. When more memory is needed than can be safely allocated on the stack, #ifdef can be used to select between implementations using malloc and stack variables.
Avoid pointer casting. If it is needed, cast only between pointers to native C types.
When using pointer arrays, make sure the pointers point to a scalar or an array of scalars. Avoid pointer arrays pointing to additional pointers.
Do not make the compute kernels recursive.
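As a minimal sketch of several of the points above (not taken from the LKOF code), the hypothetical kernel below uses a compile-time bound, loops with a known maximum trip count, no recursion, and an #ifdef that selects between a stack array and malloc. The names smooth_kernel, MAX_N and KERNEL_USE_MALLOC are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical compile-time upper bound on the input length. */
#define MAX_N 1024

/*
 * Hypothetical kernel: 3-point moving average with clamped borders.
 * The default build makes no library calls inside the kernel, there is
 * no recursion, and every loop is bounded by MAX_N.
 * Returns 0 on success, -1 if n is out of range or allocation fails.
 */
int smooth_kernel(const int in[MAX_N], int out[MAX_N], int n)
{
    if (n < 1 || n > MAX_N)
        return -1;

#ifdef KERNEL_USE_MALLOC
    /* CPU-only variant: scratch buffer on the heap. */
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
#else
    /* Default variant: fixed-size scratch buffer, no dynamic allocation. */
    int tmp[MAX_N];
#endif

    /* Fixed maximum trip count with an early exit for shorter inputs. */
    for (int i = 0; i < MAX_N; i++) {
        if (i >= n)
            break;
        tmp[i] = in[i];
    }

    for (int i = 0; i < MAX_N; i++) {
        if (i >= n)
            break;
        int left  = tmp[i > 0 ? i - 1 : 0];          /* clamp at left border  */
        int right = tmp[i < n - 1 ? i + 1 : n - 1];  /* clamp at right border */
        out[i] = (left + tmp[i] + right) / 3;
    }

#ifdef KERNEL_USE_MALLOC
    free(tmp);
#endif
    return 0;
}
```

The caller owns the input and output buffers, so file I/O, logging and other system calls stay outside the kernel. Whether a particular HLS tool accepts this exact form still has to be checked with the tool itself.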

For further information on writing HLS compatible code, see Xilinx User Guide UG902.

Instantiation of the recommended implementation method in the reference platform

The LKOF image processing application has its main compute kernel written in two different versions: one optimized for running on the CPU and one optimized for HLS.

Evaluation of the guideline in reference applications

The LKOF image processing application demonstrates that the same optimisations cannot be applied both for running on the CPU and for HLS, and that the general advice to avoid premature optimisation is valid here as well. Instead of optimising early on, the LKOF application demonstrates that it is better to isolate the compute kernels early on and design them such that they are compatible with both HLS and optimisations for running on the CPU. This ensures that a future decision to optimise for either HLS or the CPU does not require a complete rewrite.
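One way to keep that later decision cheap is to place the kernel behind a single function signature and select the implementation at build time. The sketch below is an illustration under stated assumptions, not the LKOF code: the kernel (a per-pixel absolute difference), the image size, and the macros KERNEL_TARGET_HLS and KERNEL_TARGET_CPU are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical fixed image size, known at compile time. */
#define IMG_W 8
#define IMG_H 4
#define IMG_PIXELS (IMG_W * IMG_H)

/* Absolute difference of two 8-bit pixels. */
static uint8_t abs_diff_u8(uint8_t x, uint8_t y)
{
    return (uint8_t)(x > y ? x - y : y - x);
}

#if defined(KERNEL_TARGET_HLS)
/* HLS-oriented variant: plain nested loops with fixed bounds.
 * Tool-specific directives would be added here once the partition is fixed. */
void abs_diff_kernel(const uint8_t a[IMG_PIXELS],
                     const uint8_t b[IMG_PIXELS],
                     uint8_t out[IMG_PIXELS])
{
    for (int r = 0; r < IMG_H; r++)
        for (int c = 0; c < IMG_W; c++)
            out[r * IMG_W + c] = abs_diff_u8(a[r * IMG_W + c], b[r * IMG_W + c]);
}
#elif defined(KERNEL_TARGET_CPU)
/* CPU-oriented variant: manually unrolled by four.
 * Assumes IMG_PIXELS is a multiple of 4. */
void abs_diff_kernel(const uint8_t a[IMG_PIXELS],
                     const uint8_t b[IMG_PIXELS],
                     uint8_t out[IMG_PIXELS])
{
    for (int i = 0; i < IMG_PIXELS; i += 4) {
        out[i]     = abs_diff_u8(a[i],     b[i]);
        out[i + 1] = abs_diff_u8(a[i + 1], b[i + 1]);
        out[i + 2] = abs_diff_u8(a[i + 2], b[i + 2]);
        out[i + 3] = abs_diff_u8(a[i + 3], b[i + 3]);
    }
}
#else
/* Portable baseline used before the HW/SW partition is decided. */
void abs_diff_kernel(const uint8_t a[IMG_PIXELS],
                     const uint8_t b[IMG_PIXELS],
                     uint8_t out[IMG_PIXELS])
{
    for (int i = 0; i < IMG_PIXELS; i++)
        out[i] = abs_diff_u8(a[i], b[i]);
}
#endif
```

Because all three variants share the same signature and fixed-size arguments, the rest of the application does not change when a target is chosen, and the baseline can serve as a reference when testing the optimised versions.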

References

Xilinx User Guide UG902

Review

Related guidelines

TULIPP Guideline Wiki
