# UM-SJTU JOINT INSTITUTE System-on-Chip Design (ECE4810J)

## LABORATORY REPORT

# Lab 4. Optimizing Performance through Pipelining

# Group 2

 Name: Haochen Wu
 ID: 518021910558

 Name: Siyuan Zhang
 ID: 518370910180

 Name: Yihua Liu
 ID: 518021910998

Date: November 4, 2021

# Contents

- 1 Overview
- 2 Optimizing Performance through Pipelining
  - 2.1 Create a Vivado HLS Project from Command Line
  - 2.2 Analyze the Created Project and Results
  - 2.3 Apply TRIPCOUNT Pragma
  - 2.4 Turn OFF INLINE and Apply PIPELINE Directive
  - 2.5 Apply DATAFLOW Directive and Configuration Command
  - 2.6 Export and Implement the Design in Vivado HLS
- 3 Questions

## 1 Overview

This lab exercises the HLS flow on Zynq using Vivado. The goals of this lab are to:

- Understand the effect of the INLINE directive
- Improve performance using the PIPELINE directive
- Distinguish between the functionality of the DATAFLOW directive and the Configuration Command

# 2 Optimizing Performance through Pipelining

The PDF pages show the results given by Vitis HLS 2021.1, and the figures show the results given by Vivado HLS 2019.2. The PDF pages were converted from the HTML files produced by the Export Wizard using the open-source tool wkhtmltopdf.

- 2.1 Create a Vivado HLS Project from Command Line
- 2.2 Analyze the Created Project and Results
- 2.3 Apply TRIPCOUNT Pragma

# Synthesis Report for 'yuv\_filter'

# **General Information**

Date: Mon Nov 1 16:56:38 2021

**Version:** 2021.1.1 (Build 3286242 on Wed Jul 28 13:10:47 MDT 2021)

Project: yuv\_filter.prj

**Solution:** solution1 (Vivado IP Flow Target)

Product family: zynq

Target device: xc7z020-clg400-1

# **Performance Estimates**

#### • Timing

#### • Summary

| Clock  | Target   | Estimated | Uncertainty |
|--------|----------|-----------|-------------|
| ap_clk | 10.00 ns | 6.651 ns  | 2.70 ns     |

#### • Latency

#### • Summary

| Latency min (cycles) | Latency max (cycles) | Latency min (absolute) | Latency max (absolute) | Interval min (cycles) | Interval max (cycles) | Type |
|----------------------|----------------------|------------------------|------------------------|-----------------------|-----------------------|------|
| 721205               | 44248325             | 7.212 ms               | 0.442 sec              | 721206                | 44248326              | no   |
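
As a quick sanity check on the latency summary, the absolute figures can be reproduced from the cycle counts. The sketch below assumes the conversion uses the 10.00 ns target clock from the timing summary, an assumption that happens to match the reported values. The helper name `cycles_to_seconds` is illustrative and not part of any tool.

```python
# Reproduce the absolute latency of yuv_filter from its cycle counts.
# Assumption: the report converts cycles using the 10.00 ns target clock.
TARGET_CLOCK_NS = 10.0


def cycles_to_seconds(cycles, clock_ns=TARGET_CLOCK_NS):
    """Convert a cycle count to seconds for a given clock period in ns."""
    return cycles * clock_ns * 1e-9


min_latency_s = cycles_to_seconds(721205)      # report: 7.212 ms
max_latency_s = cycles_to_seconds(44248325)    # report: 0.442 sec

print(f"min latency: {min_latency_s * 1e3:.3f} ms")
print(f"max latency: {max_latency_s:.3f} s")
```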

#### Detail

#### Instance

| Instance             | Module    | Latency min (cycles) | Latency max (cycles) | Latency min (absolute) | Latency max (absolute) | Interval min (cycles) | Interval max (cycles) | Type |
|----------------------|-----------|----------------------|----------------------|------------------------|------------------------|-----------------------|-----------------------|------|
| grp_rgb2yuv_1_fu_251 | rgb2yuv_1 | 240401               | 14749441             | 2.404 ms               | 0.147 sec              | 240401                | 14749441              | no   |
| grp_yuv2rgb_1_fu_271 | yuv2rgb_1 | 320401               | 19664641             | 3.204 ms               | 0.197 sec              | 320401                | 19664641              | no   |

#### Loop

| Loop Name          | Latency min (cycles) | Latency max (cycles) | Iteration Latency | Initiation Interval (achieved) | Initiation Interval (target) | Trip Count | Pipelined |
|--------------------|----------------------|----------------------|-------------------|--------------------------------|------------------------------|------------|-----------|
| - YUV_SCALE_LOOP_X | 160400               | 9834240              | 802 ~ 5122        | -                              | -                            | 200 ~ 1920 | no        |
| + YUV_SCALE_LOOP_Y | 800                  | 5120                 | 4                 | -                              | -                            | 200 ~ 1280 | no        |
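
The loop figures above fit a simple pattern for non-pipelined loops: latency equals trip count times iteration latency. The sketch below checks that pattern against the reported numbers and also shows how the top-level latency decomposes into the two sub-function instances plus the scaling loop. The 2-cycle outer-loop overhead and the 3-cycle remainder are observations from this report, not documented constants, and all names are illustrative.

```python
# Figures copied from the yuv_filter instance and loop tables, as (min, max).
TRIP_X = (200, 1920)          # YUV_SCALE_LOOP_X trip count
TRIP_Y = (200, 1280)          # YUV_SCALE_LOOP_Y trip count
INNER_ITERATION_LATENCY = 4   # cycles per YUV_SCALE_LOOP_Y iteration
OUTER_OVERHEAD = 2            # observed: 802 = 800 + 2 and 5122 = 5120 + 2

RGB2YUV = (240401, 14749441)  # grp_rgb2yuv_1_fu_251
YUV2RGB = (320401, 19664641)  # grp_yuv2rgb_1_fu_271
TOP_LEVEL = (721205, 44248325)


def loop_latency(trip_count, iteration_latency):
    """Latency of a non-pipelined loop whose iterations run back to back."""
    return trip_count * iteration_latency


for i, bound in enumerate(("min", "max")):
    inner = loop_latency(TRIP_Y[i], INNER_ITERATION_LATENCY)
    outer = loop_latency(TRIP_X[i], inner + OUTER_OVERHEAD)
    # Cycles of the top-level function not explained by its three stages.
    remainder = TOP_LEVEL[i] - (RGB2YUV[i] + outer + YUV2RGB[i])
    print(f"{bound}: inner={inner} outer={outer} remainder={remainder}")
```

Because none of the loops is pipelined, the three stages simply add up, which is the baseline that the PIPELINE and DATAFLOW directives in the later sections are meant to improve.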

# **Utilization Estimates**

| Name            | BRAM_18K | DSP | FF     | LUT   | URAM |
|-----------------|----------|-----|--------|-------|------|
| DSP             | -        | -   | -      | -     | -    |
| Expression      | -        | -   | 0      | 130   | -    |
| FIFO            | -        | -   | -      | -     | -    |
| Instance        | -        | 7   | 341    | 965   | -    |
| Memory          | 12288    | -   | 0      | 0     | 0    |
| Multiplexer     | -        | -   | -      | 261   | -    |
| Register        | -        | -   | 214    | -     | -    |
| Total           | 12288    | 7   | 555    | 1356  | 0    |
| Available       | 280      | 220 | 106400 | 53200 | 0    |
| Utilization (%) | 4388     | 3   | ~0     | 2     | 0    |
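
The utilization row can be recomputed from the Total and Available rows, which makes it easy to see that the BRAM demand is roughly 44 times what the xc7z020 provides. The sketch below assumes the report truncates the percentage to an integer, which fits every column of this table. The function name is illustrative.

```python
# (used, available) per resource, from the yuv_filter utilization summary.
resources = {
    "BRAM_18K": (12288, 280),
    "DSP": (7, 220),
    "FF": (555, 106400),
    "LUT": (1356, 53200),
}


def utilization_percent(used, available):
    """Integer utilization; truncation is an assumption that fits the table."""
    return used * 100 // available


for name, (used, available) in resources.items():
    pct = utilization_percent(used, available)
    status = "exceeds device" if used > available else "fits"
    print(f"{name}: {pct}% ({status})")
```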

#### • Detail

# • Instance

| Instance               | Module             | BRAM_18K | DSP | FF  | LUT | URAM |
|------------------------|--------------------|----------|-----|-----|-----|------|
| mul_8ns_8ns_15_1_1_U34 | mul_8ns_8ns_15_1_1 | 0        | 0   | 0   | 41  | 0    |
| mul_8ns_8ns_15_1_1_U35 | mul_8ns_8ns_15_1_1 | 0        | 0   | 0   | 41  | 0    |
| mul_8ns_8ns_15_1_1_U36 | mul_8ns_8ns_15_1_1 | 0        | 0   | 0   | 41  | 0    |
| grp_rgb2yuv_1_fu_251   | rgb2yuv_1          | 0        | 3   | 181 | 492 | 0    |
| grp_yuv2rgb_1_fu_271   | yuv2rgb_1          | 0        | 4   | 160 | 350 | 0    |
| Total                  | 5                  | 0        | 7   | 341 | 965 | 0    |

# • DSP

N/A

# • Memory

| Memory                 | Module             | BRAM_18K | FF | LUT | URAM | Words    | Bits | Banks | W*Bits*Banks |
|------------------------|--------------------|----------|----|-----|------|----------|------|-------|--------------|
| p_yuv_channels_ch1_U   | p_yuv_channels_ch1 | 2048     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| p_yuv_channels_ch2_U   | p_yuv_channels_ch1 | 2048     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| p_yuv_channels_ch3_U   | p_yuv_channels_ch1 | 2048     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| p_scale_channels_ch1_U | p_yuv_channels_ch1 | 2048     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| p_scale_channels_ch2_U | p_yuv_channels_ch1 | 2048     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| p_scale_channels_ch3_U | p_yuv_channels_ch1 | 2048     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| Total                  | 6                  | 12288    | 0  | 0   | 0    | 14745600 | 48   | 6     | 117964800    |
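
The word count of each buffer matches a 1920 x 1280 frame, which are the maximum trip counts of the loops in this report. The link between the two is an inference from the numbers rather than something the report states. The sketch below recomputes the W\*Bits\*Banks column and the address width under that assumption.

```python
MAX_WIDTH, MAX_HEIGHT = 1920, 1280   # maximum loop trip counts in the report
BITS_PER_WORD = 8
BANKS = 1
NUM_BUFFERS = 6                      # three YUV channels + three scaled channels

words = MAX_WIDTH * MAX_HEIGHT                     # words per buffer
bits_per_buffer = words * BITS_PER_WORD * BANKS    # W*Bits*Banks per buffer
total_bits = bits_per_buffer * NUM_BUFFERS         # Total row of the table
address_bits = (words - 1).bit_length()            # bits needed to index a buffer

print(f"words per buffer : {words}")
print(f"bits per buffer  : {bits_per_buffer}")
print(f"total bits       : {total_bits}")
print(f"address width    : {address_bits} bits")
```

The resulting address width agrees with the 22-bit `*_address0` signals listed in the Multiplexer and Interface tables.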

# • FIFO

N/A

# • Expression

| Variable Name         | Operation | DSP | FF | LUT | Bitwidth P0 | Bitwidth P1 |
|-----------------------|-----------|-----|----|-----|-------------|-------------|
| add_ln134_1_fu_370_p2 | +         | 0   | 0  | 29  | 22          | 22          |
| add_ln134_fu_349_p2   | +         | 0   | 0  | 29  | 22          | 22          |
| x_2_fu_319_p2         | +         | 0   | 0  | 23  | 16          | 1           |
| y_1_fu_360_p2         | +         | 0   | 0  | 23  | 16          | 1           |
| icmp_ln129_fu_314_p2  | icmp      | 0   | 0  | 13  | 16          | 16          |
| icmp_ln132_fu_355_p2  | icmp      | 0   | 0  | 13  | 16          | 16          |
| Total                 | 6         | 0   | 0  | 130 | 108         | 78          |

# • Multiplexer

| Name                          | LUT | Input Size | Bits | Total Bits |
|-------------------------------|-----|------------|------|------------|
| ap_NS_fsm                     | 48  | 9          | 1    | 9          |
| p_scale_channels_ch1_address0 | 14  | 3          | 22   | 66         |
| p_scale_channels_ch1_ce0      | 14  | 3          | 1    | 3          |
| p_scale_channels_ch2_address0 | 14  | 3          | 22   | 66         |
| p_scale_channels_ch2_ce0      | 14  | 3          | 1    | 3          |
| p_scale_channels_ch3_address0 | 14  | 3          | 22   | 66         |
| p_scale_channels_ch3_ce0      | 14  | 3          | 1    | 3          |
| p_yuv_channels_ch1_address0   | 14  | 3          | 22   | 66         |
| p_yuv_channels_ch1_ce0        | 14  | 3          | 1    | 3          |
| p_yuv_channels_ch1_we0        | 9   | 2          | 1    | 2          |
| p_yuv_channels_ch2_address0   | 14  | 3          | 22   | 66         |
| p_yuv_channels_ch2_ce0        | 14  | 3          | 1    | 3          |
| p_yuv_channels_ch2_we0        | 9   | 2          | 1    | 2          |
| p_yuv_channels_ch3_address0   | 14  | 3          | 22   | 66         |
| p_yuv_channels_ch3_ce0        | 14  | 3          | 1    | 3          |
| p_yuv_channels_ch3_we0        | 9   | 2          | 1    | 2          |
| x_fu_96                       | 9   | 2          | 16   | 32         |
| y_reg_240                     | 9   | 2          | 16   | 32         |
| Total                         | 261 | 55         | 174  | 493        |

# • Register

| Name                              | FF  | LUT | Bits | Const Bits |
|-----------------------------------|-----|-----|------|------------|
| U_reg_542                         | 8   | 0   | 8    | 0          |
| V_reg_547                         | 8   | 0   | 8    | 0          |
| Y_reg_537                         | 8   | 0   | 8    | 0          |
| add_ln134_reg_502                 | 14  | 0   | 22   | 8          |
| ap_CS_fsm                         | 8   | 0   | 8    | 0          |
| grp_rgb2yuv_1_fu_251_ap_start_reg | 1   | 0   | 1    | 0          |
| grp_yuv2rgb_1_fu_271_ap_start_reg | 1   | 0   | 1    | 0          |
| p_yuv_height_reg_473              | 16  | 0   | 16   | 0          |
| p_yuv_width_reg_467               | 16  | 0   | 16   | 0          |
| trunc_ln1_reg_557                 | 8   | 0   | 8    | 0          |
| trunc_ln2_reg_562                 | 8   | 0   | 8    | 0          |
| trunc_ln_reg_552                  | 8   | 0   | 8    | 0          |
| x_2_reg_497                       | 16  | 0   | 16   | 0          |
| x_fu_96                           | 16  | 0   | 16   | 0          |
| y_1_reg_510                       | 16  | 0   | 16   | 0          |
| y_reg_240                         | 16  | 0   | 16   | 0          |
| zext_ln134_1_reg_515              | 22  | 0   | 64   | 42         |
| zext_ln137_reg_479                | 8   | 0   | 15   | 7          |
| zext_ln138_reg_484                | 8   | 0   | 15   | 7          |
| zext_ln139_reg_489                | 8   | 0   | 15   | 7          |
| Total                             | 214 | 0   | 285  | 71         |

# Interface

| RTL Ports                 | Dir | Bits | Protocol   | Source Object    | C Type       |
|---------------------------|-----|------|------------|------------------|--------------|
| ap_clk                    | in  | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_rst                    | in  | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_start                  | in  | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_done                   | out | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_idle                   | out | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_ready                  | out | 1    | ap_ctrl_hs | yuv_filter       | return value |
| in_channels_ch1_address0  | out | 22   | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_ce0       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_q0        | in  | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch2_address0  | out | 22   | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_ce0       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_q0        | in  | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch3_address0  | out | 22   | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_ce0       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_q0        | in  | 8    | ap_memory  | in_channels_ch3  | array        |
| in_width                  | in  | 16   | ap_none    | in_width         | pointer      |
| in_height                 | in  | 16   | ap_none    | in_height        | pointer      |
| out_channels_ch1_address0 | out | 22   | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_ce0      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_we0      | out | 1  | ap_memory | out_channels_ch1 | array   |
| out_channels_ch1_d0       | out | 8  | ap_memory | out_channels_ch1 | array   |
| out_channels_ch2_address0 | out | 22 | ap_memory | out_channels_ch2 | array   |
| out_channels_ch2_ce0      | out | 1  | ap_memory | out_channels_ch2 | array   |
| out_channels_ch2_we0      | out | 1  | ap_memory | out_channels_ch2 | array   |
| out_channels_ch2_d0       | out | 8  | ap_memory | out_channels_ch2 | array   |
| out_channels_ch3_address0 | out | 22 | ap_memory | out_channels_ch3 | array   |
| out_channels_ch3_ce0      | out | 1  | ap_memory | out_channels_ch3 | array   |
| out_channels_ch3_we0      | out | 1  | ap_memory | out_channels_ch3 | array   |
| out_channels_ch3_d0       | out | 8  | ap_memory | out_channels_ch3 | array   |
| out_width                 | out | 16 | ap_vld    | out_width        | pointer |
| out_width_ap_vld          | out | 1  | ap_vld    | out_width        | pointer |
| out_height                | out | 16 | ap_vld    | out_height       | pointer |
| out_height_ap_vld         | out | 1  | ap_vld    | out_height       | pointer |
| Y_scale                   | in  | 8  | ap_none   | Y_scale          | scalar  |
| U_scale                   | in  | 8  | ap_none   | U_scale          | scalar  |
| V_scale                   | in  | 8  | ap_none   | V_scale          | scalar  |



Figure 1. In 2.3, Step 3.

| Parameter              | Vivado HLS 2019.2                      | Vitis HLS 2021.1            |
|------------------------|----------------------------------------|-----------------------------|
| Estimated clock period | 10.723 ns                              | 6.651 ns                    |
| Worst case latency     | 0.554 sec (51621125 cycles)            | 0.442 sec (44248325 cycles) |
| Number of DSP48E used  | 6                                      | 7                           |
| Number of BRAMs used   | 12288                                  | 12288                       |
| Number of FFs used     | 679                                    | 555                         |
| Number of LUTs used    | 1446                                   | 1356                        |

Table 1: Parameters of yuv\_filter function.
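
To make the comparison in Table 1 easier to read, the relative change from Vivado HLS 2019.2 to Vitis HLS 2021.1 can be computed per parameter. This is a minimal sketch using only the values in the table. The dictionary keys and the function name are illustrative.

```python
# (Vivado HLS 2019.2, Vitis HLS 2021.1) values copied from Table 1.
table1 = {
    "estimated clock period (ns)": (10.723, 6.651),
    "worst-case latency (cycles)": (51621125, 44248325),
    "DSP48E": (6, 7),
    "BRAM": (12288, 12288),
    "FF": (679, 555),
    "LUT": (1446, 1356),
}


def relative_change_percent(old, new):
    """Signed change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


for name, (vivado, vitis) in table1.items():
    change = relative_change_percent(vivado, vitis)
    print(f"{name}: {change:+.1f}%")
```

A negative value means Vitis HLS 2021.1 reports a smaller figure for that parameter.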

# Synthesis Report for 'rgb2yuv\_1'

# **General Information**

Date: Mon Nov 1 17:17:30 2021

**Version:** 2021.1.1 (Build 3286242 on Wed Jul 28 13:10:47 MDT 2021)

Project: yuv\_filter.prj

Solution: solution1 (Vivado IP Flow Target)

Product family: zynq

Target device: xc7z020-clg400-1

# **Performance Estimates**

#### • Timing

#### • Summary

| Clock  | Target   | Estimated | Uncertainty |
|--------|----------|-----------|-------------|
| ap_clk | 10.00 ns | 6.270 ns  | 2.70 ns     |

#### • Latency

#### Summary

| Latency min (cycles) | Latency max (cycles) | Latency min (absolute) | Latency max (absolute) | Interval min (cycles) | Interval max (cycles) | Type |
|----------------------|----------------------|------------------------|------------------------|-----------------------|-----------------------|------|
| 240401               | 14749441             | 2.404 ms               | 0.147 sec              | 240401                | 14749441              | no   |

#### Detail

#### Instance

N/A

#### Loop

| Loop Name        | Latency min (cycles) | Latency max (cycles) | Iteration Latency | Initiation Interval (achieved) | Initiation Interval (target) | Trip Count | Pipelined |
|------------------|----------------------|----------------------|-------------------|--------------------------------|------------------------------|------------|-----------|
| - RGB2YUV_LOOP_X | 240400               | 14749440             | 1202 ~ 7682       | -                              | -                            | 200 ~ 1920 | no        |
| + RGB2YUV_LOOP_Y | 1200                 | 7680                 | 6                 | -                              | -                            | 200 ~ 1280 | no        |

## **Utilization Estimates**

#### • Summary

| Name            | BRAM_18K | DSP | FF     | LUT   | URAM |
|-----------------|----------|-----|--------|-------|------|
| DSP             | -        | 3   | -      | -     | -    |
| Expression      | -        | -   | 0      | 344   | -    |
| FIFO            | -        | -   | -      | -     | -    |
| Instance        | -        | 0   | 0      | 82    | -    |
| Memory          | -        | -   | -      | -     | -    |
| Multiplexer     | -        | -   | -      | 66    | -    |
| Register        | -        | -   | 181    | -     | -    |
| Total           | 0        | 3   | 181    | 492   | 0    |
| Available       | 280      | 220 | 106400 | 53200 | 0    |
| Utilization (%) | 0        | 1   | ~0     | ~0    | 0    |

## • Detail

• Instance

| Instance             | Module            | BRAM_18K | DSP | FF | LUT | URAM |
|----------------------|-------------------|----------|-----|----|-----|------|
| mul_8ns_8s_16_1_1_U1 | mul_8ns_8s_16_1_1 | 0        | 0   | 0  | 41  | 0    |
| mul_8ns_8s_16_1_1_U2 | mul_8ns_8s_16_1_1 | 0        | 0   | 0  | 41  | 0    |
| Total                | 2                 | 0        | 0   | 0  | 82  | 0    |

# • DSP

| Instance                          | Module                         | Expression   |
|-----------------------------------|--------------------------------|--------------|
| mac_muladd_8ns_5ns_9ns_13_4_1_U3  | mac_muladd_8ns_5ns_9ns_13_4_1  | i0 + i1 * i2 |
| mac_muladd_8ns_7ns_16ns_16_4_1_U5 | mac_muladd_8ns_7ns_16ns_16_4_1 | i0 * i1 + i2 |
| mac_muladd_8ns_7s_16s_16_4_1_U4   | mac_muladd_8ns_7s_16s_16_4_1   | i0 * i1 + i2 |

# • Memory

N/A

# • FIFO

N/A

# • Expression

| Variable Name        | Operation | DSP | FF | LUT | Bitwidth P0 | Bitwidth P1 |
|----------------------|-----------|-----|----|-----|-------------|-------------|
| add_ln54_1_fu_287_p2 | +         | 0   | 0  | 29  | 22          | 22          |
| add_ln54_fu_256_p2   | +         | 0   | 0  | 29  | 22          | 22          |
| add_ln57_1_fu_442_p2 | +         | 0   | 0  | 14  | 16          | 16          |
| add_ln57_2_fu_451_p2 | +         | 0   | 0  | 14  | 16          | 16          |
| add_ln57_3_fu_316_p2 | +         | 0   | 0  | 14  | 9           | 8           |
| add_ln57_fu_436_p2   | +         | 0   | 0  | 23  | 16          | 16          |
| add_ln58_1_fu_467_p2 | +         | 0   | 0  | 14  | 16          | 8           |
| add_ln58_2_fu_472_p2 | +         | 0   | 0  | 14  | 16          | 16          |
| add_ln59_1_fu_397_p2 | +         | 0   | 0  | 14  | 14          | 8           |
| add_ln59_2_fu_490_p2 | +         | 0   | 0  | 23  | 16          | 16          |
| out_channels_ch1_d0  | +         | 0   | 0  | 15  | 8           | 5           |
| x_6_fu_226_p2        | +         | 0   | 0  | 23  | 16          | 1           |
| y_fu_277_p2          | +         | 0   | 0  | 23  | 16          | 1           |
| sub_ln58_fu_358_p2   | -         | 0   | 0  | 23  | 16          | 16          |
| sub_ln59_1_fu_391_p2 | -         | 0   | 0  | 14  | 14          | 14          |
| sub_ln59_fu_370_p2   | -         | 0   | 0  | 14  | 1           | 13          |
| icmp_ln49_fu_221_p2  | icmp      | 0   | 0  | 13  | 16          | 16          |
| icmp_ln52_fu_272_p2  | icmp      | 0   | 0  | 13  | 16          | 16          |
| out_channels_ch2_d0  | xor       | 0   | 0  | 9   | 8           | 9           |
| out_channels_ch3_d0  | xor       | 0   | 0  | 9   | 8           | 9           |
| Total                | 20        | 0   | 0  | 344 | 282         | 248         |

# • Multiplexer

| Name         | LUT | Input Size | Bits | Total Bits |
|--------------|-----|------------|------|------------|
| ap_NS_fsm    | 48  | 9          | 1    | 9          |
| x_fu_108     | 9   | 2          | 16   | 32         |
| y_02_reg_202 | 9   | 2          | 16   | 32         |
| Total        | 66  | 13         | 33   | 73         |

# • Register

| Name      | FF | LUT | Bits | Const Bits |
|-----------|----|-----|------|------------|
| B_reg_619 | 8  | 0   | 8    | 0          |
| G_reg_637 | 8  | 0   | 8    | 0          |
| R_reg_613 | 8  | 0   | 8    | 0          |
| add_ln54_reg_577    | 14  | 0 | 22  | 8  |
| add_ln59_1_reg_664  | 13  | 0 | 14  | 1  |
| ap_CS_fsm           | 8   | 0 | 8   | 0  |
| sub_ln58_reg_654    | 12  | 0 | 16  | 4  |
| trunc_ln5_reg_674   | 8   | 0 | 8   | 0  |
| trunc_ln6_reg_679   | 8   | 0 | 8   | 0  |
| trunc_ln_reg_669    | 8   | 0 | 8   | 0  |
| x_6_reg_572         | 16  | 0 | 16  | 0  |
| x_fu_108            | 16  | 0 | 16  | 0  |
| y_02_reg_202        | 16  | 0 | 16  | 0  |
| y_reg_585           | 16  | 0 | 16  | 0  |
| zext_ln54_1_reg_590 | 22  | 0 | 64  | 42 |
| Total               | 181 | 0 | 236 | 55 |

# Interface

| RTL Ports                 | Dir | Bits | Protocol   | Source Object    | C Type       |
|---------------------------|-----|------|------------|------------------|--------------|
| ap_clk                    | in  | 1    | ap_ctrl_hs | rgb2yuv.1        | return value |
| ap_rst                    | in  | 1    | ap_ctrl_hs | rgb2yuv.1        | return value |
| ap_start                  | in  | 1    | ap_ctrl_hs | rgb2yuv.1        | return value |
| ap_done                   | out | 1    | ap_ctrl_hs | rgb2yuv.1        | return value |
| ap_idle                   | out | 1    | ap_ctrl_hs | rgb2yuv.1        | return value |
| ap_ready                  | out | 1    | ap_ctrl_hs | rgb2yuv.1        | return value |
| ap_return_0               | out | 16   | ap_ctrl_hs | rgb2yuv.1        | return value |
| ap_return_1               | out | 16   | ap_ctrl_hs | rgb2yuv.1        | return value |
| in_channels_ch1_address0  | out | 22   | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_ce0       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_q0        | in  | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch2_address0  | out | 22   | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_ce0       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_q0        | in  | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch3_address0  | out | 22   | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_ce0       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_q0        | in  | 8    | ap_memory  | in_channels_ch3  | array        |
| p_read                    | in  | 16   | ap_none    | p_read           | scalar       |
| p_read1                   | in  | 16   | ap_none    | p_read1          | scalar       |
| out_channels_ch1_address0 | out | 22   | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_ce0      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_we0      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_d0       | out | 8    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch2_address0 | out | 22   | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_ce0      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_we0      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_d0       | out | 8    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch3_address0 | out | 22   | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_ce0      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_we0      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_d0       | out | 8    | ap_memory  | out_channels_ch3 | array        |



Figure 2. In 2.3, Step 8.

| Parameter              | Vivado HLS 2019.2           | Vitis HLS 2021.1            |
|------------------------|-----------------------------|-----------------------------|
| Estimated clock period | 10.283 ns                   | 6.270 ns                    |
| Worst case latency     | 0.177 sec (17207041 cycles) | 0.147 sec (14749441 cycles) |
| Number of DSP48E used  | 3                           | 3                           |
| Number of BRAMs used   | 0                           | 0                           |
| Number of FFs used     | 194                         | 181                         |
| Number of LUTs used    | 495                         | 492                         |

Table 2: Parameters of rgb2yuv function.
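
When comparing the two latency columns in Table 2, note that the absolute times do not appear to be derived the same way. The Vivado HLS 2019.2 figure matches cycles times the estimated clock period, while the Vitis HLS 2021.1 figure matches cycles times the 10.00 ns target. This is an inference from the numbers (and it assumes the Vivado HLS run also targeted 10 ns), not a statement from either tool's report. The sketch below shows the arithmetic.

```python
def latency_seconds(cycles, clock_ns):
    """Absolute latency in seconds for a cycle count and clock period in ns."""
    return cycles * clock_ns * 1e-9


# Vivado HLS 2019.2: 17207041 cycles, estimated clock 10.283 ns, reported 0.177 sec.
vivado_with_estimated = latency_seconds(17207041, 10.283)
vivado_with_target = latency_seconds(17207041, 10.0)   # assumed 10 ns target

# Vitis HLS 2021.1: 14749441 cycles, target clock 10.00 ns, reported 0.147 sec.
vitis_with_target = latency_seconds(14749441, 10.0)

print(f"Vivado HLS, estimated clock: {vivado_with_estimated:.3f} s")
print(f"Vivado HLS, target clock   : {vivado_with_target:.3f} s")
print(f"Vitis HLS, target clock    : {vitis_with_target:.3f} s")
```

For this reason the cycle counts are the safer basis for comparing the two tool versions.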

# Synthesis Report for 'yuv2rgb\_1'

# **General Information**

Date: Mon Nov 1 17:17:31 2021

**Version:** 2021.1.1 (Build 3286242 on Wed Jul 28 13:10:47 MDT 2021)

Project: yuv\_filter.prj

Solution: solution1 (Vivado IP Flow Target)

Product family: zynq

Target device: xc7z020-clg400-1

# **Performance Estimates**

#### • Timing

#### • Summary

| Clock  | Target   | Estimated | Uncertainty |
|--------|----------|-----------|-------------|
| ap_clk | 10.00 ns | 6.651 ns  | 2.70 ns     |

#### • Latency

#### Summary

| Latency min (cycles) | Latency max (cycles) | Latency min (absolute) | Latency max (absolute) | Interval min (cycles) | Interval max (cycles) | Type |
|----------------------|----------------------|------------------------|------------------------|-----------------------|-----------------------|------|
| 320401               | 19664641             | 3.204 ms               | 0.197 sec              | 320401                | 19664641              | no   |

#### Detail

#### Instance

N/A

#### Loop

| Loop Name        | Latency min (cycles) | Latency max (cycles) | Iteration Latency | Initiation Interval (achieved) | Initiation Interval (target) | Trip Count | Pipelined |
|------------------|----------------------|----------------------|-------------------|--------------------------------|------------------------------|------------|-----------|
| - YUV2RGB_LOOP_X | 320400               | 19664640             | 1602 ~ 10242      | -                              | -                            | 200 ~ 1920 | no        |
| + YUV2RGB_LOOP_Y | 1600                 | 10240                | 8                 | -                              | -                            | 200 ~ 1280 | no        |
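
The loop tables of both conversion functions follow the same pattern, so their total latency can be modelled from the image size and the inner-loop iteration latency alone. The formula below is fitted to the reported numbers (2 extra cycles per outer iteration, 1 extra cycle per function call) and is not a documented rule of the tool. The function name is illustrative.

```python
def stage_latency(width, height, iteration_latency):
    """Cycles for one conversion function over a width x height image.

    Observed pattern (not a documented formula): each LOOP_X iteration costs
    the whole inner loop plus 2 cycles, and the function adds 1 cycle.
    """
    return width * (height * iteration_latency + 2) + 1


# Inner-loop iteration latency from the loop tables: rgb2yuv_1 = 6, yuv2rgb_1 = 8.
for name, iteration_latency in (("rgb2yuv_1", 6), ("yuv2rgb_1", 8)):
    best = stage_latency(200, 200, iteration_latency)
    worst = stage_latency(1920, 1280, iteration_latency)
    print(f"{name}: min={best} max={worst}")
```

Under this model the latency is dominated by the inner-loop iteration latency multiplied by the pixel count, so the per-pixel cost of the inner loop is what matters most.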

## **Utilization Estimates**

#### • Summary

| Name            | BRAM_18K | DSP | FF     | LUT   | URAM |
|-----------------|----------|-----|--------|-------|------|
| DSP             | -        | 4   | -      | -     | -    |
| Expression      | -        | -   | 0      | 273   | -    |
| FIFO            | -        | -   | -      | -     | -    |
| Instance        | -        | -   | -      | -     | -    |
| Memory          | -        | -   | -      | -     | -    |
| Multiplexer     | -        | -   | -      | 77    | -    |
| Register        | -        | -   | 160    | -     | -    |
| Total           | 0        | 4   | 160    | 350   | 0    |
| Available       | 280      | 220 | 106400 | 53200 | 0    |
| Utilization (%) | 0        | 1   | ~0     | ~0    | 0    |

## • Detail

• Instance

N/A

# • DSP

| Instance                         | Module                       | Expression   |
|----------------------------------|------------------------------|--------------|
| mac_muladd_8s_8s_18s_18_4_1_U21  | mac_muladd_8s_8s_18s_18_4_1  | i0 + i1 * i2 |
| mac_muladd_8s_9ns_18s_18_4_1_U19 | mac_muladd_8s_9ns_18s_18_4_1 | i0 * i1 + i2 |
| mac_muladd_8s_9s_18s_18_4_1_U20  | mac_muladd_8s_9s_18s_18_4_1  | i0 + i1 * i2 |
| mac_muladd_9s_9ns_8ns_18_4_1_U18 | mac_muladd_9s_9ns_8ns_18_4_1 | i0 * i1 + i2 |

# • Memory

N/A

# • FIFO

N/A

# • Expression

| Variable Name          | Operation | DSP | FF | LUT | Bitwidth P0 | Bitwidth P1 |
|------------------------|-----------|-----|----|-----|-------------|-------------|
| C_fu_311_p2            | +         | 0   | 0  | 14  | 9           | 6           |
| add_ln102_1_fu_420_p2  | +         | 0   | 0  | 25  | 18          | 18          |
| add_ln102_fu_429_p2    | +         | 0   | 0  | 26  | 19          | 19          |
| add_ln94_1_fu_293_p2   | +         | 0   | 0  | 29  | 22          | 22          |
| add_ln94_fu_262_p2     | +         | 0   | 0  | 29  | 22          | 22          |
| x_4_fu_232_p2          | +         | 0   | 0  | 23  | 16          | 1           |
| y_2_fu_283_p2          | +         | 0   | 0  | 23  | 16          | 1           |
| icmp_ln100_fu_354_p2   | icmp      | 0   | 0  | 8   | 2           | 1           |
| icmp_ln101_fu_500_p2   | icmp      | 0   | 0  | 8   | 2           | 1           |
| icmp_ln102_fu_445_p2   | icmp      | 0   | 0  | 8   | 3           | 1           |
| icmp_ln89_fu_227_p2    | icmp      | 0   | 0  | 13  | 16          | 16          |
| icmp_ln92_fu_278_p2    | icmp      | 0   | 0  | 13  | 16          | 16          |
| or_ln100_fu_384_p2     | or        | 0   | 0  | 2   | 1           | 1           |
| or_ln101_fu_530_p2     | or        | 0   | 0  | 2   | 1           | 1           |
| or_ln102_fu_477_p2     | or        | 0   | 0  | 2   | 1           | 1           |
| B_fu_483_p3            | select    | 0   | 0  | 8   | 1           | 8           |
| G_fu_536_p3            | select    | 0   | 0  | 8   | 1           | 8           |
| R_fu_390_p3            | select    | 0   | 0  | 8   | 1           | 8           |
| select_ln100_fu_376_p3 | select    | 0   | 0  | 2   | 1           | 2           |
| select_ln101_fu_522_p3 | select    | 0   | 0  | 2   | 1           | 2           |
| select_ln102_fu_469_p3 | select    | 0   | 0  | 2   | 1           | 2           |
| D_fu_335_p2            | xor       | 0   | 0  | 9   | 8           | 9           |
| E_fu_321_p2            | xor       | 0   | 0  | 9   | 8           | 9           |
| Total                  | 23        | 0   | 0  | 273 | 186         | 175         |

# • Multiplexer

| Name      | LUT | Input Size | Bits | Total Bits        |
|-----------|-----|------------|------|-------------------|
| ap_NS_fsm | 59  | 11         | 1    | 11                |
| x_fu_114  | 9   | 2          | 16   | 32                |
| y_reg_208 | 9   | 2          | 16   | 32                |
| Total     | 77  | 15         | 33   | 75                |

# • Register

| Name                | FF  | LUT | Bits | Const Bits |
|---------------------|-----|-----|------|------------|
| B_reg_689           | 8   | 0   | 8    | 0          |
| D_reg_661           | 8   | 0 | 8   | 0  |
| G_reg_694           | 8   | 0 | 8   | 0  |
| R_reg_679           | 8   | 0 | 8   | 0  |
| add_ln100_reg_672   | 18  | 0 | 18  | 0  |
| add_ln94_reg_609    | 14  | 0 | 22  | 8  |
| ap_CS_fsm           | 10  | 0 | 10  | 0  |
| x_4_reg_604         | 16  | 0 | 16  | 0  |
| x_fu_114            | 16  | 0 | 16  | 0  |
| y_2_reg_617         | 16  | 0 | 16  | 0  |
| y_reg_208           | 16  | 0 | 16  | 0  |
| zext_ln94_1_reg_622 | 22  | 0 | 64  | 42 |
| Total               | 160 | 0 | 210 | 50 |

# Interface

| RTL Ports                 | Dir | Bits | Protocol   | Source Object    | C Type       |
|---------------------------|-----|------|------------|------------------|--------------|
| ap_clk                    | in  | 1    | ap_ctrl_hs | yuv2rgb.1        | return value |
| ap_rst                    | in  | 1    | ap_ctrl_hs | yuv2rgb.1        | return value |
| ap_start                  | in  | 1    | ap_ctrl_hs | yuv2rgb.1        | return value |
| ap_done                   | out | 1    | ap_ctrl_hs | yuv2rgb.1        | return value |
| ap_idle                   | out | 1    | ap_ctrl_hs | yuv2rgb.1        | return value |
| ap_ready                  | out | 1    | ap_ctrl_hs | yuv2rgb.1        | return value |
| ap_return_0               | out | 16   | ap_ctrl_hs | yuv2rgb.1        | return value |
| ap_return_1               | out | 16   | ap_ctrl_hs | yuv2rgb.1        | return value |
| in_channels_ch1_address0  | out | 22   | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_ce0       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_q0        | in  | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch2_address0  | out | 22   | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_ce0       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_q0        | in  | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch3_address0  | out | 22   | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_ce0       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_q0        | in  | 8    | ap_memory  | in_channels_ch3  | array        |
| p_read                    | in  | 16   | ap_none    | p_read           | scalar       |
| p_read1                   | in  | 16   | ap_none    | p_read1          | scalar       |
| out_channels_ch1_address0 | out | 22   | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_ce0      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_we0      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_d0       | out | 8    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch2_address0 | out | 22   | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_ce0      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_we0      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_d0       | out | 8    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch3_address0 | out | 22   | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_ce0      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_we0      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_d0       | out | 8    | ap_memory  | out_channels_ch3 | array        |



Figure 3. In 2.3, Step 9.

| Parameter              | Vivado HLS 2019.2           | Vitis HLS 2021.1            |
|------------------------|-----------------------------|-----------------------------|
| Estimated clock period | 10.723 ns                   | 6.651 ns                    |
| Worst case latency     | 0.211 sec (19664641 cycles) | 0.197 sec (19664641 cycles) |
| Number of DSP48E used  | 3                           | 4                           |
| Number of BRAMs used   | 0                           | 0                           |
| Number of FFs used     | 195                         | 160                         |
| Number of LUTs used    | 421                         | 350                         |

Table 3: Parameters of yuv2rgb function.
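Both tools report the same worst-case cycle count but different absolute latencies. The sketch below checks one plausible explanation: that Vivado HLS 2019.2 multiplies the cycle count by its estimated clock period (10.723 ns), while Vitis HLS 2021.1 multiplies it by the 10.00 ns target period. This is an assumption read off the numbers in Table 3, not a documented tool behavior; the helper name `latency_seconds` is hypothetical.

```python
# Worst-case cycle count reported by both tools for yuv2rgb (Table 3).
WORST_CASE_CYCLES = 19_664_641


def latency_seconds(cycles: int, clock_period_ns: float) -> float:
    """Convert a cycle count to seconds for a clock period given in ns."""
    return cycles * clock_period_ns * 1e-9


# Assumption: Vivado HLS 2019.2 uses its estimated period (10.723 ns).
vivado_latency = latency_seconds(WORST_CASE_CYCLES, 10.723)

# Assumption: Vitis HLS 2021.1 uses the 10.00 ns target period,
# not its estimated period of 6.651 ns.
vitis_latency = latency_seconds(WORST_CASE_CYCLES, 10.0)

print(f"Vivado HLS 2019.2: {vivado_latency:.3f} sec")
print(f"Vitis HLS 2021.1:  {vitis_latency:.3f} sec")
```

Under these assumptions the computed values round to the 0.211 sec and 0.197 sec shown in the table, so the difference in absolute latency comes from the clock period used for the conversion rather than from a change in the schedule.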

# 2.4 Turn OFF INLINE and Apply PIPELINE Directive

# **Vitis HLS Report Comparison**

# **All Compared Solutions**

**solution1:** xc7z020-clg400-1 **solution2:** xc7z020-clg400-1

# **Performance Estimates**

• Timing

| Clock  |           | solution1 | solution2 |
|--------|-----------|-----------|-----------|
| ap_clk | Target    | 10.00 ns  | 10.00 ns  |
|        | Estimated | 6.651 ns  | 6.960 ns  |

• Latency

|                    |     | solution1 | solution2 |
|--------------------|-----|-----------|-----------|
| Latency (cycles)   | min | 721205    | 120038    |
|                    | max | 44248325  | 7372838   |
| Latency (absolute) | min | 7.212 ms  | 1.200 ms  |
|                    | max | 0.442 sec | 73.728 ms |
| Interval (cycles)  | min | 721206    | 120039    |
|                    | max | 44248326  | 7372839   |

## **Utilization Estimates**

|          | solution1 | solution2 |
|----------|-----------|-----------|
| BRAM_18K | 12288     | 12288     |
| DSP      | 7         | 10        |
| FF       | 555       | 1049      |
| LUT      | 1356      | 1767      |
| URAM     | 0         | 0         |

# **Resource Usage Implementation**

|       | solution1 | solution2 |
|-------|-----------|-----------|
| RTL   | verilog   | verilog   |
| SLICE | -         | -         |
| LUT   | -         | -         |
| FF    | -         | -         |
| DSP   | -         | -         |
| SRL   | -         | -         |
| BRAM  | -         | -         |

Need to run vivado synthesis/implementation to populate the real data for "-"

# **Final Timing Implementation**

|                                 | solution1 | solution2 |
|---------------------------------|-----------|-----------|
| RTL                             | verilog   | verilog   |
| CP required                     | -         | -         |
| CP achieved post-synthesis      | -         | -         |
| CP achieved post-implementation | -         | -         |

Need to run vivado synthesis/implementation to populate the real data for "-"



Figure 4. In 2.4, Step 15, a screenshot of the resources utilization.

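Compared with solution1, solution2 uses more FFs (555 to 1049), LUTs (1356 to 1767), and DSP48E slices (7 to 10), whereas BRAM_18K usage remains the same at 12288. In exchange, the worst-case latency drops from 44248325 to 7372838 cycles.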
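The trade-off between the two solutions can be quantified directly from the comparison report. The following is a minimal sketch that uses only the figures from the tables above; the dictionary layout and variable names are illustrative.

```python
# Figures copied from the solution1 / solution2 comparison report.
solution1 = {"latency_max": 44_248_325, "BRAM_18K": 12288,
             "DSP": 7, "FF": 555, "LUT": 1356}
solution2 = {"latency_max": 7_372_838, "BRAM_18K": 12288,
             "DSP": 10, "FF": 1049, "LUT": 1767}

# Latency speedup of solution2 (PIPELINE applied) over solution1.
speedup = solution1["latency_max"] / solution2["latency_max"]

# Resource growth factor of solution2 relative to solution1.
resource_ratio = {
    name: solution2[name] / solution1[name]
    for name in ("BRAM_18K", "DSP", "FF", "LUT")
}

print(f"worst-case latency speedup: {speedup:.2f}x")
for name, ratio in resource_ratio.items():
    print(f"{name}: {ratio:.2f}x")
```

With these inputs the worst-case latency improves by about a factor of 6, while FF usage grows by about 1.89x, LUT usage by about 1.30x, and DSP usage by about 1.43x.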

# 2.5 Apply DATAFLOW Directive and Configuration Command

# Synthesis Report for 'yuv\_filter'

# **General Information**

Date: Mon Nov 1 19:24:14 2021

**Version:** 2021.1.1 (Build 3286242 on Wed Jul 28 13:10:47 MDT 2021)

Project: yuv\_filter.prj

Solution: solution3 (Vivado IP Flow Target)

Product family: zynq

Target device: xc7z020-clg400-1

# **Performance Estimates**

## • Timing

#### • Summary

| Clock  | Target   | Estimated | Uncertainty |
|--------|----------|-----------|-------------|
| ap_clk | 10.00 ns | 7.271 ns  | 2.70 ns     |

#### • Latency

#### Summary

| Latency (cycles) min | Latency (cycles) max | Latency (absolute) min | Latency (absolute) max | Interval (cycles) min | Interval (cycles) max | Type     |
|----------------------|----------------------|------------------------|------------------------|-----------------------|-----------------------|----------|
| 120037               | 7372837              | 1.200 ms               | 73.728 ms              | 40013                 | 2457613               | dataflow |

#### Detail

#### Instance

| Instance      | Module     | Latency (cycles) min | Latency (cycles) max | Latency (absolute) min | Latency (absolute) max | Interval (cycles) min | Interval (cycles) max | Type |
|---------------|------------|----------------------|----------------------|------------------------|------------------------|-----------------------|-----------------------|------|
| entry_proc_U0 | entry_proc | 0                    | 0                    | 0 ns                   | 0 ns                   | 0                     | 0                     | no   |
| rgb2yuv_1_U0  | rgb2yuv_1  | 40012                | 2457612              | 0.400 ms               | 24.576 ms              | 40012                 | 2457612               | no   |
| yuv_scale_U0  | yuv_scale  | 40011                | 2457611              | 0.400 ms               | 24.576 ms              | 40011                 | 2457611               | no   |
| yuv2rgb_1_U0  | yuv2rgb_1  | 40012                | 2457612              | 0.400 ms               | 24.576 ms              | 40012                 | 2457612               | no   |

Loop

N/A
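The latency and interval figures of solution3 show how DATAFLOW changes the top-level behavior: the overall latency stays close to the sum of the process latencies, while the interval is set by the slowest process. The sketch below checks this against the reported numbers. The constant offsets (+2 cycles on the latency, +1 cycle on the interval) are simply observed from this report and are not taken from tool documentation.

```python
# (min, max) latency in cycles for each dataflow process, from the report.
process_latency = {
    "entry_proc": (0, 0),
    "rgb2yuv_1": (40_012, 2_457_612),
    "yuv_scale": (40_011, 2_457_611),
    "yuv2rgb_1": (40_012, 2_457_612),
}

# Top-level figures reported for yuv_filter in solution3.
reported_latency = (120_037, 7_372_837)
reported_interval = (40_013, 2_457_613)

# Sum of process latencies and the slowest process, for min and max cases.
latency_sum = tuple(sum(lat[i] for lat in process_latency.values())
                    for i in range(2))
slowest = tuple(max(lat[i] for lat in process_latency.values())
                for i in range(2))

# Observed offsets between the report and the simple model.
latency_offset = tuple(r - s for r, s in zip(reported_latency, latency_sum))
interval_offset = tuple(r - s for r, s in zip(reported_interval, slowest))

# Ratio of latency to interval in the worst case.
latency_to_interval = reported_latency[1] / reported_interval[1]

print(latency_sum, latency_offset)
print(slowest, interval_offset)
print(f"{latency_to_interval:.2f}")
```

Since the interval is about one third of the latency, a new frame can be accepted roughly three times as often as in solution2, whose interval is about equal to its latency, even though the latency of a single frame is essentially unchanged.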

## **Utilization Estimates**

| Name            | BRAM_18K | DSP | FF     | LUT   | URAM |
|-----------------|----------|-----|--------|-------|------|
| DSP             | -        | -   | -      | -     | -    |
| Expression      | -        | -   | 0      | 60    | -    |
| FIFO            | -        | -   | 693    | 476   | -    |
| Instance        | -        | 11  | 1155   | 1692  | -    |
| Memory          | 24576    | -   | 0      | 0     | 0    |
| Multiplexer     | -        | -   | -      | 108   | -    |
| Register        | -        | -   | 12     | -     | -    |
| Total           | 24576    | 11  | 1860   | 2336  | 0    |
| Available       | 280      | 220 | 106400 | 53200 | 0    |
| Utilization (%) | 8777     | 5   | 1      | 4     | 0    |
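The utilization row can be reproduced from the Total and Available rows. The sketch below assumes the tool truncates the percentage to an integer, which matches all four reported values; the function name `utilization_percent` is hypothetical.

```python
# Total and Available rows from the utilization estimates of solution3.
total = {"BRAM_18K": 24576, "DSP": 11, "FF": 1860, "LUT": 2336}
available = {"BRAM_18K": 280, "DSP": 220, "FF": 106400, "LUT": 53200}


def utilization_percent(used: int, avail: int) -> int:
    """Utilization in percent, truncated to an integer (assumed rounding)."""
    return used * 100 // avail


utilization = {name: utilization_percent(total[name], available[name])
               for name in total}
print(utilization)

# How many times the available BRAM the design would need.
bram_overuse = total["BRAM_18K"] / available["BRAM_18K"]
print(f"{bram_overuse:.1f}x")
```

The BRAM_18K figure of 8777% means the intermediate buffers would need roughly 88 times the block RAM available on the xc7z020, so this solution cannot be implemented on the target device as reported.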

## • Instance

| Instance      | Module     | BRAM_18K | DSP | FF   | LUT  | URAM |
|---------------|------------|----------|-----|------|------|------|
| entry_proc_U0 | entry_proc | 0        | 0   | 2    | 38   | 0    |
| rgb2yuv_1_U0  | rgb2yuv_1  | 0        | 6   | 411  | 543  | 0    |
| yuv2rgb_1_U0  | yuv2rgb_1  | 0        | 4   | 373  | 613  | 0    |
| yuv_scale_U0  | yuv_scale  | 0        | 1   | 369  | 498  | 0    |
| Total         | 4          | 0        | 11  | 1155 | 1692 | 0    |

# • DSP

N/A

# • Memory

| Memory                 | Module             | BRAM_18K | FF | LUT | URAM | Words    | Bits | Banks | W*Bits*Banks |
|------------------------|--------------------|----------|----|-----|------|----------|------|-------|--------------|
| p_yuv_channels_ch1_U   | p_yuv_channels_ch1 | 4096     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| p_yuv_channels_ch2_U   | p_yuv_channels_ch1 | 4096     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| p_yuv_channels_ch3_U   | p_yuv_channels_ch1 | 4096     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| p_scale_channels_ch1_U | p_yuv_channels_ch1 | 4096     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| p_scale_channels_ch2_U | p_yuv_channels_ch1 | 4096     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| p_scale_channels_ch3_U | p_yuv_channels_ch1 | 4096     | 0  | 0   | 0    | 2457600  | 8    | 1     | 19660800     |
| Total                  | 6                  | 24576    | 0  | 0   | 0    | 14745600 | 48   | 6     | 117964800    |
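The W*Bits*Banks column and the Total row follow directly from the per-buffer figures. The sketch below recomputes them. The image bounds `MAX_WIDTH` and `MAX_HEIGHT` are an assumption (1920 by 1280) chosen because their product matches the reported word count; they are not stated in this report.

```python
# Per-buffer figures from the Memory table.
WORDS = 2_457_600
BITS = 8
BANKS = 1
NUM_BUFFERS = 6  # three p_yuv channels plus three p_scale channels

bits_per_buffer = WORDS * BITS * BANKS
total_words = WORDS * NUM_BUFFERS
total_bits = bits_per_buffer * NUM_BUFFERS

# Assumption: each buffer holds one 8-bit sample per pixel of the
# largest supported image, taken here as 1920 x 1280.
MAX_WIDTH, MAX_HEIGHT = 1920, 1280
words_per_max_image = MAX_WIDTH * MAX_HEIGHT

print(bits_per_buffer, total_words, total_bits, words_per_max_image)
```

Each of the six intermediate channels is sized for a full frame, which is why the Memory row dominates the BRAM_18K estimate of this solution.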

# • FIFO

| Name             | BRAM_18K | FF  | LUT | URAM | Depth | Bits | Size:D*B |
|------------------|----------|-----|-----|------|-------|------|----------|
| U_scale_c_U      | 0        | 99  | 0   | -    | 3     | 8    | 24       |
| V_scale_c_U      | 0        | 99  | 0   | -    | 3     | 8    | 24       |
| Y_scale_c_U      | 0        | 99  | 0   | -    | 3     | 8    | 24       |
| p_scale_height_U | 0        | 99  | 0   | -    | 2     | 16   | 32       |
| p_scale_width_U  | 0        | 99  | 0   | -    | 2     | 16   | 32       |
| p_yuv_height_U   | 0        | 99  | 0   | -    | 2     | 16   | 32       |
| p_yuv_width_U    | 0        | 99  | 0   | -    | 2     | 16   | 32       |
| Total            | 0        | 693 | 0   | 0    | 17    | 88   | 200      |

# • Expression

| Variable Name                        | Operation | DSP | FF | LUT | Bitwidth P0 | Bitwidth P1 |
|--------------------------------------|-----------|-----|----|-----|-------------|-------------|
| ap_channel_done_p_scale_channels_ch1 | and       | 0   | 0  | 2   | 1           | 1           |
| ap_channel_done_p_scale_channels_ch2 | and       | 0   | 0  | 2   | 1           | 1           |
| ap_channel_done_p_scale_channels_ch3 | and       | 0   | 0  | 2   | 1           | 1           |
| ap_channel_done_p_scale_height       | and       | 0   | 0  | 2   | 1           | 1           |
| ap_channel_done_p_scale_width        | and       | 0   | 0  | 2   | 1           | 1           |
| ap_channel_done_p_yuv_channels_ch1   | and       | 0   | 0  | 2   | 1           | 1           |
| ap_channel_done_p_yuv_channels_ch2   | and       | 0   | 0  | 2   | 1           | 1           |
| ap_channel_done_p_yuv_channels_ch3   | and       | 0   | 0  | 2   | 1           | 1           |
| ap_channel_done_p_yuv_height         | and       | 0   | 0  | 2   | 1           | 1           |
| ap_channel_done_p_yuv_width          | and       | 0   | 0  | 2   | 1           | 1           |
| ap_idle                              | and       | 0   | 0  | 2   | 1           | 1           |
| ap_sync_ready                        | and       | 0   | 0  | 2   | 1           | 1           |
| entry_proc_U0_ap_start               | and       | 0   | 0  | 2   | 1           | 1           |
| rgb2yuv_1_U0_ap_continue             | and       | 0   | 0  | 2   | 1           | 1           |
| rgb2yuv_1_U0_ap_start                | and       | 0   | 0  | 2   | 1           | 1           |
| yuv2rgb_1_U0_ap_start                | and       | 0   | 0  | 2   | 1           | 1           |
| yuv_scale_U0_ap_continue             | and       | 0   | 0  | 2   | 1           | 1           |
| yuv_scale_U0_ap_start                | and       | 0   | 0  | 2   | 1           | 1           |
| ap_sync_channel_write_p_scale_channels_ch1 | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_channel_write_p_scale_channels_ch2 | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_channel_write_p_scale_channels_ch3 | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_channel_write_p_scale_height       | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_channel_write_p_scale_width        | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_channel_write_p_yuv_channels_ch1   | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_channel_write_p_yuv_channels_ch2   | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_channel_write_p_yuv_channels_ch3   | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_channel_write_p_yuv_height         | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_channel_write_p_yuv_width          | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_entry_proc_U0_ap_ready             | or | 0 | 0 | 2  | 1  | 1  |
| ap_sync_rgb2yuv_1_U0_ap_ready              | or | 0 | 0 | 2  | 1  | 1  |
| Total                                      | 30 | 0 | 0 | 60 | 30 | 30 |

# • Multiplexer

| Name                                           | LUT | Input Size | Bits | Total Bits |
|------------------------------------------------|-----|------------|------|------------|
| ap_sync_reg_channel_write_p_scale_channels_ch1 | 9   | 2          | 1    | 2          |
| ap_sync_reg_channel_write_p_scale_channels_ch2 | 9   | 2          | 1    | 2          |
| ap_sync_reg_channel_write_p_scale_channels_ch3 | 9   | 2          | 1    | 2          |
| ap_sync_reg_channel_write_p_scale_height       | 9   | 2          | 1    | 2          |
| ap_sync_reg_channel_write_p_scale_width        | 9   | 2          | 1    | 2          |
| ap_sync_reg_channel_write_p_yuv_channels_ch1   | 9   | 2          | 1    | 2          |
| ap_sync_reg_channel_write_p_yuv_channels_ch2   | 9   | 2          | 1    | 2          |
| ap_sync_reg_channel_write_p_yuv_channels_ch3   | 9   | 2          | 1    | 2          |
| ap_sync_reg_channel_write_p_yuv_height         | 9   | 2          | 1    | 2          |
| ap_sync_reg_channel_write_p_yuv_width          | 9   | 2          | 1    | 2          |
| ap_sync_reg_entry_proc_U0_ap_ready             | 9   | 2          | 1    | 2          |
| ap_sync_reg_rgb2yuv_1_U0_ap_ready              | 9   | 2          | 1    | 2          |
| Total                                          | 108 | 24         | 12   | 24         |

# • Register

| Name                                           | FF | LUT | Bits | Const Bits |
|------------------------------------------------|----|-----|------|------------|
| ap_sync_reg_channel_write_p_scale_channels_ch1 | 1  | 0   | 1    | 0          |
| ap_sync_reg_channel_write_p_scale_channels_ch2 | 1  | 0   | 1    | 0          |
| ap_sync_reg_channel_write_p_scale_channels_ch3 | 1  | 0   | 1    | 0          |
| ap_sync_reg_channel_write_p_scale_height       | 1  | 0   | 1    | 0          |
| ap_sync_reg_channel_write_p_scale_width        | 1  | 0   | 1    | 0          |
| ap_sync_reg_channel_write_p_yuv_channels_ch1   | 1  | 0   | 1    | 0          |
| ap_sync_reg_channel_write_p_yuv_channels_ch2   | 1  | 0   | 1    | 0          |
| ap_sync_reg_channel_write_p_yuv_channels_ch3   | 1  | 0   | 1    | 0          |
| ap_sync_reg_channel_write_p_yuv_height         | 1  | 0   | 1    | 0          |
| ap_sync_reg_channel_write_p_yuv_width          | 1  | 0   | 1    | 0          |
| ap_sync_reg_entry_proc_U0_ap_ready             | 1  | 0   | 1    | 0          |
| ap_sync_reg_rgb2yuv_1_U0_ap_ready              | 1  | 0   | 1    | 0          |
| Total                                          | 12 | 0   | 12   | 0          |

# Interface

| RTL Ports                 | Dir | Bits | Protocol   | Source Object    | C Type       |
|---------------------------|-----|------|------------|------------------|--------------|
| ap_clk                    | in  | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_rst                    | in  | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_start                  | in  | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_done                   | out | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_ready                  | out | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_idle                   | out | 1    | ap_ctrl_hs | yuv_filter       | return value |
| in_channels_ch1_address0  | out | 22   | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_ce0       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_d0        | out | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_q0        | in  | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_we0       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_address1  | out | 22   | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_ce1       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_d1        | out | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_q1        | in  | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_we1       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch2_address0  | out | 22   | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_ce0       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_d0        | out | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_q0        | in  | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_we0       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_address1  | out | 22   | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_ce1       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_d1        | out | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_q1        | in  | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_we1       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch3_address0  | out | 22   | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_ce0       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_d0        | out | 8    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_q0        | in  | 8    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_we0       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_address1  | out | 22   | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_ce1       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_d1        | out | 8    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_q1        | in  | 8    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_we1       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_width                  | in  | 16   | ap_none    | in_width         | pointer      |
| in_height                 | in  | 16   | ap_none    | in_height        | pointer      |
| out_channels_ch1_address0 | out | 22   | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_ce0      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_d0       | out | 8    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_q0       | in  | 8    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_we0      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_address1 | out | 22   | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_ce1      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_d1       | out | 8    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_q1       | in  | 8    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_we1      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch2_address0 | out | 22   | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_ce0      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_d0       | out | 8    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_q0       | in  | 8    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_we0      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_address1 | out | 22   | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_ce1      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_d1       | out | 8    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_q1       | in  | 8    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_we1      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch3_address0 | out | 22   | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_ce0      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_d0       | out | 8    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_q0       | in  | 8    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_we0      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_address1 | out | 22   | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_ce1      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_d1       | out | 8    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_q1       | in  | 8    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_we1      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_width                 | out | 16   | ap_vld     | out_width        | pointer      |
| out | 1                                       | ap_vld                                                                                   | out_width                                                                                                                                                                                                                                                                                                                                                                                                                                                           | pointer                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| out | 16                                      | ap_vld                                                                                   | out_height                                                                                                                                                                                                                                                                                                                                                                                                                                                          | pointer                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| out | 1                                       | ap_vld                                                                                   | out_height                                                                                                                                                                                                                                                                                                                                                                                                                                                          | pointer                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| in  | 8                                       | ap_none                                                                                  | Y_scale                                                                                                                                                                                                                                                                                                                                                                                                                                                             | scalar                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| in  | 8                                       | ap_none                                                                                  | U_scale                                                                                                                                                                                                                                                                                                                                                                                                                                                             | scalar                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| in  | 8                                       | ap_none                                                                                  | V_scale                                                                                                                                                                                                                                                                                                                                                                                                                                                             | scalar                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |



Figure 5. In 2.5, Step 10, a screenshot of the Utilization Estimates.

The number of BRAMs required has doubled.

Apply the Dataflow configuration command, generate the solution, and observe the improved resource utilization.

# Synthesis Report for 'yuv\_filter'

# **General Information**

Date: Tue Nov 2 10:42:00 2021

**Version:** 2021.1.1 (Build 3286242 on Wed Jul 28 13:10:47 MDT 2021)

Project: yuv\_filter.prj

Solution: solution3 (Vivado IP Flow Target)

Product family: zynq

Target device: xc7z020-clg400-1

# **Performance Estimates**

## • Timing

#### • Summary

| Clock  | Target   | Estimated | Uncertainty |
|--------|----------|-----------|-------------|
| ap_clk | 10.00 ns | 7.271 ns  | 2.70 ns     |

#### • Latency

#### Summary

| Latency min (cycles) | Latency max (cycles) | Latency min (absolute) | Latency max (absolute) | Interval min (cycles) | Interval max (cycles) | Type     |
|----------------------|----------------------|------------------------|------------------------|-----------------------|-----------------------|----------|
| 40023                | 2457623              | 0.400 ms               | 24.576 ms              | 40015                 | 2457615               | dataflow |

#### Detail

#### Instance

| Instance      | Module     | Latency min (cycles) | Latency max (cycles) | Latency min (absolute) | Latency max (absolute) | Interval min (cycles) | Interval max (cycles) | Type |
|---------------|------------|----------------------|----------------------|------------------------|------------------------|-----------------------|-----------------------|------|
| rgb2yuv_1_U0  | rgb2yuv_1  | 40014                | 2457614              | 0.400 ms               | 24.576 ms              | 40014                 | 2457614               | no   |
| entry_proc_U0 | entry_proc | 0                    | 0                    | 0 ns                   | 0 ns                   | 0                     | 0                     | no   |
| yuv_scale_U0  | yuv_scale  | 40009                | 2457609              | 0.400 ms               | 24.576 ms              | 40009                 | 2457609               | no   |
| yuv2rgb_1_U0  | yuv2rgb_1  | 40012                | 2457612              | 0.400 ms               | 24.576 ms              | 40012                 | 2457612               | no   |

#### Loop

N/A

## **Utilization Estimates**

| Name            | BRAM_18K | DSP | FF     | LUT   | URAM |
|-----------------|----------|-----|--------|-------|------|
| DSP             | -        | -   | -      | -     | -    |
| Expression      | -        | -   | 0      | 12    | -    |
| FIFO            | -        | -   | 1287   | 884   | -    |
| Instance        | -        | 11  | 962    | 1803  | -    |
| Memory          | -        | -   | -      | -     | -    |
| Multiplexer     | -        | -   | -      | 18    | -    |
| Register        | -        | -   | 2      | -     | -    |
| Total           | 0        | 11  | 2251   | 2717  | 0    |
| Available       | 280      | 220 | 106400 | 53200 | 0    |
| Utilization (%) | 0        | 5   | 2      | 5     | 0    |

## • Instance

| Instance      | Module     | BRAM_18K | DSP | FF  | LUT  | URAM |
|---------------|------------|----------|-----|-----|------|------|
| entry_proc_U0 | entry_proc | 0        | 0   | 3   | 47   | 0    |
| rgb2yuv_1_U0  | rgb2yuv_1  | 0        | 6   | 445 | 640  | 0    |
| yuv2rgb_1_U0  | yuv2rgb_1  | 0        | 4   | 338 | 664  | 0    |
| yuv_scale_U0  | yuv_scale  | 0        | 1   | 176 | 452  | 0    |
| Total         | 4          | 0        | 11  | 962 | 1803 | 0    |

# • DSP

N/A

# • Memory

N/A

# • FIFO

| Name                   | BRAM_18K | FF   | LUT | URAM | Depth | Bits | Size:D*B |
|------------------------|----------|------|-----|------|-------|------|----------|
| U_scale_c_U            | 0        | 99   | 0   | -    | 3     | 8    | 24       |
| V_scale_c_U            | 0        | 99   | 0   | -    | 3     | 8    | 24       |
| Y_scale_c_U            | 0        | 99   | 0   | -    | 3     | 8    | 24       |
| p_scale_channels_ch1_U | 0        | 99   | 0   | -    | 2     | 8    | 16       |
| p_scale_channels_ch2_U | 0        | 99   | 0   | -    | 2     | 8    | 16       |
| p_scale_channels_ch3_U | 0        | 99   | 0   | -    | 2     | 8    | 16       |
| p_scale_height_U       | 0        | 99   | 0   | -    | 2     | 16   | 32       |
| p_scale_width_U        | 0        | 99   | 0   | -    | 2     | 16   | 32       |
| p_yuv_channels_ch1_U   | 0        | 99   | 0   | -    | 2     | 8    | 16       |
| p_yuv_channels_ch2_U   | 0        | 99   | 0   | -    | 2     | 8    | 16       |
| p_yuv_channels_ch3_U   | 0        | 99   | 0   | -    | 2     | 8    | 16       |
| p_yuv_height_U         | 0        | 99   | 0   | -    | 2     | 16   | 32       |
| p_yuv_width_U          | 0        | 99   | 0   | -    | 2     | 16   | 32       |
| Total                  | 0        | 1287 | 0   | 0    | 29    | 136  | 296      |

# • Expression

| Variable Name                  | Operation | DSP | FF | LUT | Bitwidth P0 | Bitwidth P1 |
|--------------------------------|-----------|-----|----|-----|-------------|-------------|
| ap_idle                        | and       | 0   | 0  | 2   | 1           | 1           |
| ap_sync_ready                  | and       | 0   | 0  | 2   | 1           | 1           |
| entry_proc_U0_ap_start         | and       | 0   | 0  | 2   | 1           | 1           |
| rgb2yuv_1_U0_ap_start          | and       | 0   | 0  | 2   | 1           | 1           |
| ap_sync_entry_proc_U0_ap_ready | or        | 0   | 0  | 2   | 1           | 1           |
| ap_sync_rgb2yuv_1_U0_ap_ready  | or        | 0   | 0  | 2   | 1           | 1           |
| Total                          | 6         | 0   | 0  | 12  | 6           | 6           |

# • Multiplexer

| Name                               | LUT | Input Size | Bits | Total Bits        |
|------------------------------------|-----|------------|------|-------------------|
| ap_sync_reg_entry_proc_U0_ap_ready | 9   | 2          | 1    | 2                 |
| ap_sync_reg_rgb2yuv_1_U0_ap_ready  | 9   | 2          | 1    | 2                 |
| Total                              | 18  | 4          | 2    | 4                 |

# • Register

| Name                               | FF | LUT | Bits | Const Bits |
|------------------------------------|----|-----|------|------------|
| ap_sync_reg_entry_proc_U0_ap_ready | 1  | 0   | 1    | 0          |
| ap_sync_reg_rgb2yuv_1_U0_ap_ready  | 1  | 0   | 1    | 0          |
| Total                              | 2  | 0   | 2    | 0          |

# Interface

| RTL Ports                 | Dir | Bits | Protocol   | Source Object    | C Type       |
|---------------------------|-----|------|------------|------------------|--------------|
| in_channels_ch1_address0  | out | 22   | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_ce0       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_d0        | out | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_q0        | in  | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_we0       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_address1  | out | 22   | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_ce1       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_d1        | out | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_q1        | in  | 8    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch1_we1       | out | 1    | ap_memory  | in_channels_ch1  | array        |
| in_channels_ch2_address0  | out | 22   | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_ce0       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_d0        | out | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_q0        | in  | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_we0       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_address1  | out | 22   | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_ce1       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_d1        | out | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_q1        | in  | 8    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch2_we1       | out | 1    | ap_memory  | in_channels_ch2  | array        |
| in_channels_ch3_address0  | out | 22   | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_ce0       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_d0        | out | 8    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_q0        | in  | 8    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_we0       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_address1  | out | 22   | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_ce1       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_d1        | out | 8    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_q1        | in  | 8    | ap_memory  | in_channels_ch3  | array        |
| in_channels_ch3_we1       | out | 1    | ap_memory  | in_channels_ch3  | array        |
| in_width                  | in  | 16   | ap_none    | in_width         | pointer      |
| in_height                 | in  | 16   | ap_none    | in_height        | pointer      |
| out_channels_ch1_address0 | out | 22   | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_ce0      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_d0       | out | 8    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_q0       | in  | 8    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_we0      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_address1 | out | 22   | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_ce1      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_d1       | out | 8    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_q1       | in  | 8    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch1_we1      | out | 1    | ap_memory  | out_channels_ch1 | array        |
| out_channels_ch2_address0 | out | 22   | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_ce0      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_d0       | out | 8    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_q0       | in  | 8    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_we0      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_address1 | out | 22   | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_ce1      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_d1       | out | 8    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_q1       | in  | 8    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch2_we1      | out | 1    | ap_memory  | out_channels_ch2 | array        |
| out_channels_ch3_address0 | out | 22   | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_ce0      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_d0       | out | 8    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_q0       | in  | 8    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_we0      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_address1 | out | 22   | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_ce1      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_d1       | out | 8    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_q1       | in  | 8    | ap_memory  | out_channels_ch3 | array        |
| out_channels_ch3_we1      | out | 1    | ap_memory  | out_channels_ch3 | array        |
| out_width                 | out | 16   | ap_vld     | out_width        | pointer      |
| out_width_ap_vld          | out | 1    | ap_vld     | out_width        | pointer      |
| out_height                | out | 16   | ap_vld     | out_height       | pointer      |
| out_height_ap_vld         | out | 1    | ap_vld     | out_height       | pointer      |
| Y_scale                   | in  | 8    | ap_none    | Y_scale          | scalar       |
| U_scale                   | in  | 8    | ap_none    | U_scale          | scalar       |
| V_scale                   | in  | 8    | ap_none    | V_scale          | scalar       |
| ap_clk                    | in  | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_rst                    | in  | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_start                  | in  | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_done                   | out | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_ready                  | out | 1    | ap_ctrl_hs | yuv_filter       | return value |
| ap_idle                   | out | 1    | ap_ctrl_hs | yuv_filter       | return value |



Figure 6. In 2.5, Step 7, a screenshot of the synthesis report.

The performance parameters have not changed; however, the resource estimates show that the design no longer uses any BRAM. In Vivado HLS 2019.2, the usage of the other resources (FF, LUT) has also decreased.

## 2.6 Export and Implement the Design in Vivado HLS

Vitis HLS 2021.1 Report (RTL Synthesis) and Report (Place & Route)



Figure 7. Implementation (RTL Synthesis) (solution3) (yuv\_filter\_export.rpt).



Figure 8. Implementation (Place & Route) (solution3) (yuv\_filter\_export.rpt).

#### Vivado HLS 2019.2:



Figure 9. In 2.6, a screenshot of our implementation report.

# 3 Questions

• Does the pipelining approach employed in this lab apply to all designs? If not, why?

In our view, the pipelining approach can be applied to many designs, but not to all of them. For loops whose iterations do not depend on each other, the PIPELINE directive lets successive iterations overlap, which reduces both the latency and the interval. Applying the DATAFLOW directive to functions that pass data from one to the next can likewise increase throughput. However, these methods have limitations. To unroll a loop, or to pipeline a loop that contains nested loops (which must then be unrolled), the loops involved must have fixed bounds; otherwise the PIPELINE directive has to be applied to the innermost loop. For loops without fixed bounds, pipelining at an outer level may therefore not work.

• In general, how do you identify the performance bottleneck? (You can answer at a high level, based on this lab)

The performance bottleneck has to be identified case by case. For instance, a program that executes its stages strictly in sequence has limited performance, which pipelining can improve. In a program whose loops run for many iterations, the loops themselves may take most of the time and become the bottleneck. Such loops can be pipelined or unrolled so that several iterations execute concurrently, which removes the bottleneck. In this lab, the latency and interval columns of the synthesis report show which function or loop dominates.