Commit

Merge pull request #2479 from verilog-to-routing/noc_qor_doc_issue
Noc QoR measurement
vaughnbetz committed Feb 2, 2024
2 parents 6ad1de4 + 844b9d2 commit 451fb4d
Showing 105 changed files with 768 additions and 71 deletions.
13 changes: 13 additions & 0 deletions .gitignore
@@ -38,6 +38,19 @@ vtr_flow/benchmarks/titan_blif/titan23
vtr_flow/benchmarks/titan_blif/titan_new


#
# NoC MLP benchmarks
#
# We ignore blif and vqm files because of their large size.
# We also ignore symbolic links to traffic flow and blif files.
#
vtr_flow/benchmarks/noc/Large_Designs/MLP/**/*.vqm
vtr_flow/benchmarks/noc/Large_Designs/MLP/**/*.blif
vtr_flow/benchmarks/noc/Large_Designs/MLP/blif_files/*
vtr_flow/benchmarks/noc/Large_Designs/MLP/traffic_flow_files/*
MLP_Benchmark_Netlist_Files_blif.tar.gz
MLP_Benchmark_Netlist_Files_vqm_blif.tar.gz

#
# ISPD benchmarks
#
8 changes: 8 additions & 0 deletions CMakeLists.txt
@@ -341,6 +341,14 @@ add_custom_target(get_titan_benchmarks
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
COMMENT "Downloading (~1GB) and extracting Titan benchmarks (~10GB) into VTR source tree.")

#
# NoC MLP Benchmarks
#
add_custom_target(get_noc_mlp_benchmarks
COMMAND ./vtr_flow/scripts/download_noc_mlp.py --vtr_flow_dir ./vtr_flow
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
COMMENT "Downloading (~100MB) and extracting NoC MLP benchmarks (~3.2GB) into VTR source tree.")

#
# ISPD Benchmarks
#
62 changes: 52 additions & 10 deletions README.developers.md
@@ -386,17 +386,21 @@ The following are key QoR metrics which should be used to evaluate the impact of

Implementation Quality Metrics:

| Metric | Meaning | Sensitivity |
|-----------------------------|--------------------------------------------------------------------------|-------------|
| num_pre_packed_blocks | Number of primitive netlist blocks (after tech. mapping, before packing) | Low |
| num_post_packed_blocks | Number of Clustered Blocks (after packing) | Medium |
| device_grid_tiles | FPGA size in grid tiles | Low-Medium |
| min_chan_width | The minimum routable channel width | Medium\* |
| crit_path_routed_wirelength | The routed wirelength at the relaxed channel width | Medium |
| critical_path_delay | The critical path delay at the relaxed channel width | Medium-High |
| Metric | Meaning | Sensitivity |
|---------------------------------|------------------------------------------------------------------------------|-------------|
| num_pre_packed_blocks | Number of primitive netlist blocks (after tech. mapping, before packing) | Low |
| num_post_packed_blocks | Number of Clustered Blocks (after packing) | Medium |
| device_grid_tiles | FPGA size in grid tiles | Low-Medium |
| min_chan_width | The minimum routable channel width | Medium\* |
| crit_path_routed_wirelength | The routed wirelength at the relaxed channel width | Medium |
| NoC_agg_bandwidth\** | The total link bandwidth utilized by all traffic flows | Low |
| NoC_latency\** | The total time of traffic flow data transfer (summed over all traffic flows) | Low |
| NoC_latency_constraints_cost\** | Total number of traffic flows that meet their latency constraints | Low |

\* By default, VPR attempts to find the minimum routable channel width; it then performs routing at a relaxed (e.g. 1.3x minimum) channel width. At minimum channel width routing congestion can distort the true timing/wirelength characteristics. Combined with the fact that most FPGA architectures are built with an abundance of routing, post-routing metrics are usually only evaluated at the relaxed channel width.

\** NoC-related metrics are only reported when the `--noc` option is enabled.
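
To make the relaxed channel width in the first note concrete, here is a small illustrative calculation. The 1.3x factor comes from the note itself; rounding up to the next even integer is an assumption made for this sketch, and `relaxed_channel_width` is a hypothetical helper, not a VPR function.

```python
import math


def relaxed_channel_width(min_chan_width, relax_factor=1.3):
    """Return a relaxed channel width for a given minimum routable width.

    Assumption (not taken from the VPR source): the scaled width is rounded
    up to an integer and then bumped to the next even number if it is odd.
    """
    relaxed = math.ceil(min_chan_width * relax_factor)
    return relaxed + (relaxed % 2)
```

For a minimum channel width of 46, the sketch gives a relaxed width of 60, which is the width at which post-routing metrics such as `crit_path_routed_wirelength` would then be evaluated.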

Run-time/Memory Usage Metrics:

| Metric | Meaning | Sensitivity |
@@ -493,7 +497,7 @@ k6_frac_N10_frac_chain_mem32K_40nm.xml boundtop.v common 9f591f6-
k6_frac_N10_frac_chain_mem32K_40nm.xml ch_intrinsics.v common 9f591f6-dirty success 363 493 270 247 10 10 17 99 130 1 0 1792 1.86527 -194.602 -1.86527 46 1562 13 1438 20 2.4542 -226.033 -2.4542 0 0 3.92691e+06 1.4642e+06 259806. 2598.06 333135. 3331.35 0.03 0.01 -1 -1 -1 0.46 0.31 0.94 0.09 2.59 62684 8672 32940
```

### Example: Titan Benchmarks QoR Measurements
### Example: Titan Benchmarks QoR Measurement

The [Titan benchmarks](https://docs.verilogtorouting.org/en/latest/vtr/benchmarks/#titan-benchmarks) are a group of large benchmark circuits from a wide range of applications, which are compatible with the VTR project.
They are typically used as post-technology mapped netlists which have been pre-synthesized with Quartus.
@@ -511,7 +515,7 @@ $ make get_titan_benchmarks
#Move to the task directory
$ cd vtr_flow/tasks

#Run the VTR benchmarks
#Run the Titan benchmarks
$ ../scripts/run_vtr_task.py regression_tests/vtr_reg_nightly_test2/titan_quick_qor

#Several days later... they complete
@@ -528,6 +532,44 @@ stratixiv_arch.timing.xml stereo_vision_stratixiv_arch_timing.blif 0208312
stratixiv_arch.timing.xml cholesky_mc_stratixiv_arch_timing.blif 0208312 success 140214 108592 67410 5444 121 90 -1 111 151 -1 -1 5221059 8.16972 -454610 -8.16972 1518597 15 0 0 2.38657e+08 21915.3 9.34704 -531231 -9.34704 0 0 211.12 364.32 490.24 6356252 -1 -1
```

### Example: NoC Benchmarks QoR Measurements
NoC benchmarks currently include synthetic and MLP benchmarks. Synthetic benchmarks have various NoC traffic patterns,
bandwidth utilization, and latency requirements. High-quality NoC router placement solutions for these benchmarks are
known. By comparing the known solutions with NoC router placement results, the developer can sanity-check
the NoC router placement algorithm. MLP benchmarks are the only realistic netlists included in this benchmark set.

Based on the number of NoC routers in a synthetic benchmark, it is run on one of two different architectures. All MLP
benchmarks are run on an FPGA architecture with 16 NoC routers. Post-technology mapped netlists (blif files)
for synthetic benchmarks are included in the VTR project. However, MLP blif files are very large and should be downloaded
separately.

Since NoC benchmarks target different FPGA architectures, they are run as separate tasks. A typical way to run all
NoC benchmarks is to run a task list and gather QoR data from the different tasks:

#### Running and Integrating the NoC Benchmarks with VTR
```shell
#From the VTR root

#Download and integrate NoC MLP benchmarks into the VTR source tree
$ make get_noc_mlp_benchmarks

#Move to the task directory
$ cd vtr_flow

#Run the NoC benchmarks
$ scripts/run_vtr_task.py -l tasks/noc_qor/task_list.txt

#Several days later... they complete

#NoC benchmarks are run as several different tasks. Therefore, QoR results should be gathered from multiple directories,
#one for each task.
$ head -5 tasks/noc_qor/large_complex_synthetic/latest/parse_results.txt
$ head -5 tasks/noc_qor/large_simple_synthetic/latest/parse_results.txt
$ head -5 tasks/noc_qor/small_complex_synthetic/latest/parse_results.txt
$ head -5 tasks/noc_qor/small_simple_synthetic/latest/parse_results.txt
$ head -5 tasks/noc_qor/MLP/latest/parse_results.txt
```
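
Collecting the per-task results by hand gets tedious with five tasks. As a minimal sketch (not part of VTR), the Python snippet below reads each task's `parse_results.txt` and keeps only the NoC metrics. It assumes the files are tab-separated with a header row, that a `circuit` column exists, and that the NoC columns are named as in the QoR metrics table; `read_parse_results` and `gather_noc_qor` are hypothetical helpers.

```python
import csv
from pathlib import Path

# Task names are taken from the commands above. The metric column names are
# assumed to match the QoR table and may differ in your VTR version.
NOC_TASKS = [
    "large_complex_synthetic",
    "large_simple_synthetic",
    "small_complex_synthetic",
    "small_simple_synthetic",
    "MLP",
]
NOC_METRICS = ["NoC_agg_bandwidth", "NoC_latency", "NoC_latency_constraints_cost"]


def read_parse_results(path):
    """Read one tab-separated parse_results.txt into a list of row dicts."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # Drop the empty/None keys produced by trailing tabs or extra fields.
        return [
            {k.strip(): (v or "").strip() for k, v in row.items() if k}
            for row in reader
        ]


def gather_noc_qor(vtr_flow_dir, tasks=NOC_TASKS, metrics=NOC_METRICS):
    """Collect the NoC metrics of every circuit from each task's latest run."""
    records = []
    for task in tasks:
        results = (
            Path(vtr_flow_dir) / "tasks" / "noc_qor" / task / "latest" / "parse_results.txt"
        )
        if not results.is_file():
            continue  # this task has not been run yet
        for row in read_parse_results(results):
            record = {"task": task, "circuit": row.get("circuit", "")}
            for metric in metrics:
                # None when the column is absent, e.g. NoC reporting was off.
                record[metric] = row.get(metric)
            records.append(record)
    return records
```

Calling `gather_noc_qor("vtr_flow")` from the VTR root would return one record per circuit, tagged with the task it came from, so results for the different architectures stay distinguishable.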

### Example: Koios Benchmarks QoR Measurement

The [Koios benchmarks](https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/master/vtr_flow/benchmarks/verilog/koios) are a group of Deep Learning benchmark circuits distributed with the VTR project.
4 changes: 2 additions & 2 deletions doc/README
@@ -4,7 +4,7 @@ Overview
The VTR documentation is generated using sphinx, a python based documentation generator.

The documentation itself is written in re-structured text (files ending in .rst), which
is a lightwieght mark-up language for text documents.
is a lightweight mark-up language for text documents.

Currently VTR's documentation is automatically built by https://readthedocs.org/projects/vtr/ and is served at:

@@ -36,7 +36,7 @@ from the main documentation directory (i.e. <vtr_root>/doc).

This will produce the output html in the _build directory.

You can then view the resulting documention with the web-browser of your choice.
You can then view the resulting documentation with the web-browser of your choice.
For instance:

$ firefox _build/html/index.html
15 changes: 14 additions & 1 deletion doc/src/vtr/benchmarks.rst
@@ -191,7 +191,20 @@ The SymbiFlow benchmarks can be downloaded and extracted by running the followin
cd $VTR_ROOT
make get_symbiflow_benchmarks
Once downloaded and extracted, benchmarks are provided as post-synthesized eblif files under: ::
Once downloaded and extracted, benchmarks are provided as post-synthesized blif files under: ::

$VTR_ROOT/vtr_flow/benchmarks/symbiflow

.. _noc_benchmarks:

NoC Benchmarks
----------------
NoC benchmarks are composed of synthetic and MLP benchmarks and target NoC-enhanced FPGA architectures. Synthetic
benchmarks include a wide variety of traffic flow patterns and are divided into two groups: 1) simple and 2) complex
benchmarks. As their names imply, simple benchmarks use very simple and small logic modules connected to NoC routers,
while complex benchmarks implement more complicated functionalities like encryption. These benchmarks do not come from
real application domains. On the other hand, MLP benchmarks include modules that perform matrix-vector multiplication
and move data. Pre-synthesized netlists for the synthetic benchmarks are included in the VTR project, but MLP netlists must
be downloaded separately.

.. note:: The NoC MLP benchmarks are not included with the VTR release (due to their size). However, they can be downloaded and extracted by running ``make get_noc_mlp_benchmarks`` from the root of the VTR tree. They can also be `downloaded manually <https://www.eecg.utoronto.ca/~vaughn/titan/>`_.
16 changes: 12 additions & 4 deletions vtr_flow/benchmarks/noc/Large_Designs/MLP/Readme.txt
@@ -12,15 +12,17 @@ Benchmark Structure:
|---<Benchmark>.flows - Is the NoC traffic flows file associated with the given benchmark
(A benchmark can have multiple traffic flows files)
|---verilog - Contains design files needed to generate the netlist file for the benchmark
|---shared_verilog - Contains design files needed by all benchmarks to generate thier netlist files
|---shared_verilog - Contains design files needed by all benchmarks to generate their netlist files
|---blif_files - Contains symbolic links to all .blif files that exist in this directory
|---flow_files - Contains symbolic links to all .flow files that exist in this directory

Running the benchmarks:
Pre-requisite
- Ensure VPR is built (refer to 'https://docs.verilogtorouting.org/en/latest/' for build instructions)
- Set 'VTR_ROOT' as environment variable pointing to the location of the VTR source tree
- Ensure python version 3.6.9 or higher is installed
- Copy over the netlist files from 'https://drive.google.com/drive/folders/135QhmfgUaGnK2ZEfbfEXtdm1BfS7YoG7?usp=sharing'.
The file structure in the previous link is similiar to structure found in '$VTR_ROOT/vtr_flow/benchmarks/noc/Large_Designs/MLP'.
The file structure in the previous link is similar to the structure found in '$VTR_ROOT/vtr_flow/benchmarks/noc/Large_Designs/MLP'.
Place the netlist files in the appropriate folder locations.

Running single instance:
@@ -48,7 +50,7 @@ Running the benchmarks:
-vpr_executable $VTR_ROOT/build/vpr/vpr --device EP4SE820 -flow_file $VTR_ROOT/vtr_flow/benchmarks/noc/Large_Designs/MLP/MLP_1/mlp_1.flows \
-noc_routing_algorithm xy_routing -number_of_seeds 5 -number_of_threads 1 -route

- The above command will generate an output file in the run directory that contains all the place and route metrics. This is a txt file with a name which matches the
- The above command will generate an output file in the run directory that contains all the place and route metrics. This is a txt file with a name which matches
the flows file provided. So for the command shown above the output file is 'mlp_1.txt'

Special benchmarks:
@@ -64,8 +66,14 @@ Running the benchmarks:
of the NoC routers needs to be locked. A
- To run a single instance of this benchmark, pass in the following command line parameter and its value to the command shown above:
'--fix_clusters $VTR_ROOT/vtr_flow/benchmarks/noc/Large_Designs/MLP/MLP_2_phase_optimization/MLP_2_phase_optimization_step_2/MLP_two_phase_optimization_step_two_constraints.place'
- To run the benchmarkusing the automated script just pass in the following command line parameter and its value to the script command above:
- To run the benchmark using the automated script, just pass in the following command line parameter and its value to the script command above:
'-fix_clusters $VTR_ROOT/vtr_flow/benchmarks/noc/Large_Designs/MLP/MLP_2_phase_optimization/MLP_2_phase_optimization_step_2/MLP_two_phase_optimization_step_two_constraints.place'

Running VTR tasks:
- All synthetic benchmarks can be run as VTR tasks. Example tasks are provided in vtr_flow/tasks/noc_qor
- Instructions on how to run VTR tasks to measure QoR for NoC benchmarks are available in the VTR Developer Guide.
- Link to VTR Developer Guide: https://docs.verilogtorouting.org/en/latest/README.developers/#example-noc-benchmarks-qor-measurements

Expected run time:
- These benchmarks are quite large so the maximum expected run time for a single run is a few hours
- To speed up multiple VPR runs, the thread count can be increased from 1. Set the thread count equal to the number of seeds for the fastest run time.
11 changes: 9 additions & 2 deletions vtr_flow/benchmarks/noc/Synthetic_Designs/Readme.txt
@@ -8,7 +8,9 @@ Benchmark Structure:
|---<Benchmark>.flows - Is the NoC traffic flows file associated with the given benchmark
(A benchmark can have multiple traffic flows files)
|---verilog - Contains design files needed to generate the netlist file for the benchmark
|---shared_verilog - Contains design files needed by all benchmarks to generate thier netlist files
|---shared_verilog - Contains design files needed by all benchmarks to generate their netlist files
|---blif_files - Contains symbolic links to all .blif files that exist in this directory
|---flow_files - Contains symbolic links to all .flow files that exist in this directory

Running the benchmarks:
Pre-requisite
@@ -42,7 +44,12 @@ Running the benchmarks:
-noc_routing_algorithm xy_routing -noc_swap_percentage 40 -number_of_seeds 5 -number_of_threads 1

- The above command will generate an output file in the run directory that contains all the place and route metrics. This is a txt file with a name which matches the
the flows file provided. So for the command shown above the outout file is 'complex_2_noc_1D_chain.txt'
flows file provided. So for the command shown above the output file is 'complex_2_noc_1D_chain.txt'

Running VTR tasks:
- All synthetic benchmarks can be run as VTR tasks. Example tasks are provided in vtr_flow/tasks/noc_qor
- Instructions on how to run VTR tasks to measure QoR for NoC benchmarks are available in the VTR Developer Guide.
- Link to VTR Developer Guide: https://docs.verilogtorouting.org/en/latest/README.developers/#example-noc-benchmarks-qor-measurements

Expected run time:
- These benchmarks are quite small so the maximum expected run time for a single run is ~30 minutes
