Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add symbiflow tests #1564

Merged
merged 20 commits into from
Dec 15, 2020
Merged

Conversation

acomodi
Copy link
Collaborator

@acomodi acomodi commented Sep 28, 2020

Description

This is an initial PR to have the SymbiFlow XC7 architecture integrated within the vtr_flow.

There is an initial download step of the latest SymbiFlow package, with the extraction of the Artix 50T data to have it working within VPR:

  • Arch XML
  • RR graph
  • Router lookahead
  • Place delay matrix lookup

A very simple initial circuit file has been added, which represent a synthesized counter.

This PR is still marked as WIP for two main reasons:

  • There is an issue with the architecture or too strict VPR checks around buffered switches. In fact, to let the test pass, I had to temporarily demote a VPR_ERROR to a warning. (Issue)
  • Need to either allow to correctly read symlinks within VPR to get the RR graph and other data files, or have the vtr_flow accept the addition of additional <flags, file> pairs and add them to the VPR command. The data files (RR graph, etc.) should be located in the arch directory.

Motivation and Context

With the Symbiflow/VTR fork being almost equal to the upstream version, it is possible now to insert SymbiFlow tests within the vtr_flow. This helps to track possible regressions against SymbiFlow architectures, as well as increasing the robustness of the test suite.

How Has This Been Tested?

New symbiflow counter test added.

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@probot-autolabeler probot-autolabeler bot added build Build system infra Project Infrastructure lang-cpp C/C++ code lang-make CMake/Make code lang-netlist lang-python Python code lang-shell Shell scripts (bash etc.) scripts Utility & Infrastructure scripts tests VPR VPR FPGA Placement & Routing Tool VTR Flow VTR Design Flow (scripts/benchmarks/architectures) docs Documentation labels Sep 28, 2020
@acomodi acomodi force-pushed the add-symbiflow-test branch 4 times, most recently from 59090c7 to c031151 Compare September 28, 2020 12:04
@acomodi
Copy link
Collaborator Author

acomodi commented Sep 28, 2020

@mithro @litghost @vaughnbetz FYI. This is still a WIP and requires some more adjustments, but it is a starting point to begin the introduction of SymbiFlow architectures and benchmarks.

In this PR, the nightly tests have been reduced to only run the symbiflow counter example, to check whether everything is working fine.

# Pass requirements
pass_requirements_file=pass_requirements.txt

#The Titan benchmarks are run at a fixed channel width of 300 to simulate a Stratix IV-like routing architecture
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is out of date

@acomodi
Copy link
Collaborator Author

acomodi commented Oct 2, 2020

@litghost @vaughnbetz According to what discussed, I am trying to get the RR indexed data factored out to be used only for the classic router lookahead.
The problem I am facing is that this RR indexed data is used not only in the classic lookahead, but in other locations:

  1. base_cost = get_single_rr_cong_base_cost(size_t(set_rr_node));

In this case, the base cost is taken through the get_single_rr_cong_base_cost function which in turn uses the RR indexed data for the base cost.
2.

for (index = CHANX_COST_INDEX_START; index < device_ctx.rr_indexed_data.size(); index++) {
if (device_ctx.rr_indexed_data[index].T_quadratic > 0.) { /* pass transistor */
device_ctx.rr_indexed_data[index].base_cost = device_ctx.rr_indexed_data[index].saved_base_cost * factor;
} else {
device_ctx.rr_indexed_data[index].base_cost = device_ctx.rr_indexed_data[index].saved_base_cost;
}
}

3. and other locations

I am unsure how to proceed, as, for instance, the base cost calculation depends on the T_linear and T_quadratic values as well when building the RR indexed data structure.

@litghost
Copy link
Collaborator

litghost commented Oct 2, 2020

@litghost @vaughnbetz According to what discussed, I am trying to get the RR indexed data factored out to be used only for the classic router lookahead.

As stated before, the base cost should be per cost index, and the base cost compuation doesn't depend on any of the other variables:

for (index = CHANX_COST_INDEX_START; index < device_ctx.rr_indexed_data.size(); index++) {
if (base_cost_type == DELAY_NORMALIZED || base_cost_type == DEMAND_ONLY) {
device_ctx.rr_indexed_data[index].base_cost = delay_normalization_fac;
} else if (base_cost_type == DELAY_NORMALIZED_LENGTH || base_cost_type == DEMAND_ONLY_NORMALIZED_LENGTH) {
device_ctx.rr_indexed_data[index].base_cost = delay_normalization_fac / device_ctx.rr_indexed_data[index].inv_length;
} else if (base_cost_type == DELAY_NORMALIZED_LENGTH_BOUNDED) {
float length = (1 / device_ctx.rr_indexed_data[index].inv_length);
if (max_length != min_length) {
float length_scale = 1.f + 3.f * (length - min_length) / (max_length - min_length);
device_ctx.rr_indexed_data[index].base_cost = delay_normalization_fac * length_scale;
} else {
device_ctx.rr_indexed_data[index].base_cost = delay_normalization_fac;
}
} else if (base_cost_type == DELAY_NORMALIZED_FREQUENCY) {
int seg_index = device_ctx.rr_indexed_data[index].seg_index;
VTR_ASSERT(total_segments > 0);
float freq_fac = float(rr_segment_counts[seg_index]) / total_segments;
device_ctx.rr_indexed_data[index].base_cost = delay_normalization_fac / freq_fac;
} else if (base_cost_type == DELAY_NORMALIZED_LENGTH_FREQUENCY) {
int seg_index = device_ctx.rr_indexed_data[index].seg_index;
VTR_ASSERT(total_segments > 0);
float freq_fac = float(rr_segment_counts[seg_index]) / total_segments;
//Base cost = delay_norm / (len * freq)
//device_ctx.rr_indexed_data[index].base_cost = delay_normalization_fac / ((1. / device_ctx.rr_indexed_data[index].inv_length) * freq_fac);
//Base cost = (delay_norm * len) * (1 + (1-freq))
device_ctx.rr_indexed_data[index].base_cost = (delay_normalization_fac / device_ctx.rr_indexed_data[index].inv_length) * (1 + (1 - freq_fac));
} else {
VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "Unrecognized base cost type");
}
}

The idea is everything except for the base cost can likely be split.

This case is a little hairy:

for (index = CHANX_COST_INDEX_START; index < device_ctx.rr_indexed_data.size(); index++) {
if (device_ctx.rr_indexed_data[index].T_quadratic > 0.) { /* pass transistor */
device_ctx.rr_indexed_data[index].base_cost = device_ctx.rr_indexed_data[index].saved_base_cost * factor;
} else {
device_ctx.rr_indexed_data[index].base_cost = device_ctx.rr_indexed_data[index].saved_base_cost;
}
}

@vaughnbetz Can you explain why factor is only multiplied in the T_quadratic calculation?

@acomodi
Copy link
Collaborator Author

acomodi commented Oct 2, 2020

As stated before, the base cost should be per cost index, and the base cost compuation doesn't depend on any of the other variables

Actually, as the base cost computation depends on the normalization which, in turn, depends on T_quadratic, T_del and so on (this only if base_cost_type != DEMAND_ONLY and != DEMAND_ONLY_NORMALIZED_LENGTH).

static float get_delay_normalization_fac(int nodes_per_chan,
const t_rr_node_indices& L_rr_node_indices) {
/* Returns the average delay to go 1 CLB distance along a wire. */
const int clb_dist = 3; /* Number of CLBs I think the average conn. goes. */
int inode, itrack, cost_index;
float Tdel, Tdel_sum, frac_num_seg;
auto& device_ctx = g_vpr_ctx.device();
Tdel_sum = 0.;
for (itrack = 0; itrack < nodes_per_chan; itrack++) {
inode = find_average_rr_node_index(device_ctx.grid.width(), device_ctx.grid.height(), CHANX, itrack,
L_rr_node_indices);
if (inode == -1)
continue;
cost_index = device_ctx.rr_nodes[inode].cost_index();
frac_num_seg = clb_dist * device_ctx.rr_indexed_data[cost_index].inv_length;
Tdel = frac_num_seg * device_ctx.rr_indexed_data[cost_index].T_linear
+ frac_num_seg * frac_num_seg
* device_ctx.rr_indexed_data[cost_index].T_quadratic;
Tdel_sum += Tdel / (float)clb_dist;
}
for (itrack = 0; itrack < nodes_per_chan; itrack++) {
inode = find_average_rr_node_index(device_ctx.grid.width(), device_ctx.grid.height(), CHANY, itrack,
L_rr_node_indices);
if (inode == -1)
continue;
cost_index = device_ctx.rr_nodes[inode].cost_index();
frac_num_seg = clb_dist * device_ctx.rr_indexed_data[cost_index].inv_length;
Tdel = frac_num_seg * device_ctx.rr_indexed_data[cost_index].T_linear
+ frac_num_seg * frac_num_seg
* device_ctx.rr_indexed_data[cost_index].T_quadratic;
Tdel_sum += Tdel / (float)clb_dist;
}
return (Tdel_sum / (2. * nodes_per_chan));
}

@litghost
Copy link
Collaborator

litghost commented Oct 2, 2020

As stated before, the base cost should be per cost index, and the base cost compuation doesn't depend on any of the other variables

Actually, as the base cost computation depends on the normalization which, in turn, depends on T_quadratic, T_del and so on (this only if base_cost_type != DEMAND_ONLY and != DEMAND_ONLY_NORMALIZED_LENGTH).

static float get_delay_normalization_fac(int nodes_per_chan,
const t_rr_node_indices& L_rr_node_indices) {
/* Returns the average delay to go 1 CLB distance along a wire. */
const int clb_dist = 3; /* Number of CLBs I think the average conn. goes. */
int inode, itrack, cost_index;
float Tdel, Tdel_sum, frac_num_seg;
auto& device_ctx = g_vpr_ctx.device();
Tdel_sum = 0.;
for (itrack = 0; itrack < nodes_per_chan; itrack++) {
inode = find_average_rr_node_index(device_ctx.grid.width(), device_ctx.grid.height(), CHANX, itrack,
L_rr_node_indices);
if (inode == -1)
continue;
cost_index = device_ctx.rr_nodes[inode].cost_index();
frac_num_seg = clb_dist * device_ctx.rr_indexed_data[cost_index].inv_length;
Tdel = frac_num_seg * device_ctx.rr_indexed_data[cost_index].T_linear
+ frac_num_seg * frac_num_seg
* device_ctx.rr_indexed_data[cost_index].T_quadratic;
Tdel_sum += Tdel / (float)clb_dist;
}
for (itrack = 0; itrack < nodes_per_chan; itrack++) {
inode = find_average_rr_node_index(device_ctx.grid.width(), device_ctx.grid.height(), CHANY, itrack,
L_rr_node_indices);
if (inode == -1)
continue;
cost_index = device_ctx.rr_nodes[inode].cost_index();
frac_num_seg = clb_dist * device_ctx.rr_indexed_data[cost_index].inv_length;
Tdel = frac_num_seg * device_ctx.rr_indexed_data[cost_index].T_linear
+ frac_num_seg * frac_num_seg
* device_ctx.rr_indexed_data[cost_index].T_quadratic;
Tdel_sum += Tdel / (float)clb_dist;
}
return (Tdel_sum / (2. * nodes_per_chan));
}

Ugh

@acomodi
Copy link
Collaborator Author

acomodi commented Oct 2, 2020

Given that the error returned should be affecting only specific run cases (classic lookahead, or any lookahead with base_cost_types that make use of the T_quadratic/linear values, etc.), does it make sense to have the VTR_ERROR changed to VPR_THROW and disable that with the --disable_errors flag?

@litghost
Copy link
Collaborator

litghost commented Oct 2, 2020

Given that the error returned should be affecting only specific run cases (classic lookahead, or any lookahead with base_cost_types that make use of the T_quadratic/linear values, etc.), does it make sense to have the VTR_ERROR changed to VPR_THROW and disable that with the --disable_errors flag?

So based on the examples you showed, the inconsistent buffering will affect the router in the following ways, even when not running in the classic lookahead:

  • It corrupts the delay normalization factor a small amount. This is likely fine, as the affect is likely small
  • It will cause T_quadratic to be incorrect for update_rr_base_costs case

It is unclear how important the code in update_rr_base_costs is, and whether having this wrong on the 7-series graph will have dramatic effects.

@vaughnbetz , thoughts?

@acomodi
Copy link
Collaborator Author

acomodi commented Oct 6, 2020

@litghost @vaughnbetz I have done some digging and found out that the problem resides in the fact that in SymbiFlow we add short switches to connect wires. These short switches are "fake" in the sense that their cost values are all set to zero.

The problem might be that, when calculating the average T_del, T_quadratic and C for a node, we end up adding up in the sum of the number of switches also the "fake" switches:

NODE: RR node: 223968 type: CHANX location: (17,40) <-> (46,40) track: 23 len: 29 longline: 0 seg_type: LH dir: BI_DIR capacity: 1 fan-in: 26 fan-out: 40, COST_INDEX: 49
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0012375_C9.404e-09_Tdel1.6800000000000001e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.7699999999999998e-10
        SWITCH NAME: routing_R0.0012375_C9.404e-09_Tdel1.65e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.7699999999999998e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.89e-10
        SWITCH NAME: routing_R0.0012375_C9.404e-09_Tdel1.6800000000000001e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.7599999999999999e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.75e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.8e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.8e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.75e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.7599999999999999e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.85e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.7799999999999998e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.7699999999999998e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.8e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0012375_C9.404e-09_Tdel1.6700000000000002e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.79e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.89e-10
        SWITCH NAME: routing_R0.0012375_C9.404e-09_Tdel1.6700000000000002e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0012375_C9.404e-09_Tdel1.65e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.7699999999999998e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.89e-10
        SWITCH NAME: routing_R0.0015845231875_C9.979e-09_Tdel1.7699999999999998e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: routing_R0.0007569375_C1.3478e-08_Tdel3.01e-10
        SWITCH NAME: short
Warning 21: Inconsitent buffering of children of rr node LH (RR node: 223968 type: CHANX location: (17,40) <-> (46,40) track: 23 len: 29 longline: 0 seg_type: LH dir: BI_DIR capacity: 1 fan-in: 26 fan-out: 40)
        SWITCH NAME: short
Warning 22: Inconsitent buffering of children of rr node LH (RR node: 223968 type: CHANX location: (17,40) <-> (46,40) track: 23 len: 29 longline: 0 seg_type: LH dir: BI_DIR capacity: 1 fan-in: 26 fan-out: 40)

In the example above, for the LH node, we have the last two switches which are short, and the warning is displayed, but, even though those have zero delay, I am unsure whether the number of switches to perform the avarage calculation should be taken into account.

Moreover, the error we are getting about the buffering inconsistencies is due to the fact that, for the same cost index, we have only short swithces:

NODE: RR node: 610507 type: CHANX location: (16,81) <-> (58,81) track: 10 len: 42 longline: 0 seg_type: LH dir: BI_DIR capacity: 1 fan-in: 2 fan-out: 2, COST_INDEX: 49
        SWITCH NAME: short
        SWITCH NAME: short

Here there is also a portion of code that checks as well the presence of inconsistencies when calculating the average of a node's switches buffering type:

for (int iedge = 0; iedge < num_edges; iedge++) {
int to_node_index = device_ctx.rr_nodes[inode].edge_sink_node(iedge);
/* want to get C/R/Tdel/Cinternal of switches that connect this track segment to other track segments */
if (device_ctx.rr_nodes[to_node_index].type() == CHANX || device_ctx.rr_nodes[to_node_index].type() == CHANY) {
int switch_index = device_ctx.rr_nodes[inode].edge_switch(iedge);
avg_switch_R += device_ctx.rr_switch_inf[switch_index].R;
avg_switch_T += device_ctx.rr_switch_inf[switch_index].Tdel;
avg_switch_Cinternal += device_ctx.rr_switch_inf[switch_index].Cinternal;
if (buffered == UNDEFINED) {
if (device_ctx.rr_switch_inf[switch_index].buffered()) {
buffered = 1;
} else {
buffered = 0;
}
} else if (buffered != device_ctx.rr_switch_inf[switch_index].buffered()) {
VTR_LOG_WARN("Inconsitent buffering of children of rr node %s (%s)\n",
rr_node_arch_name(inode).c_str(),
describe_rr_node(inode).c_str());
}
num_switches++;
}
}

In there, instead of a hard failure there is a warning, in case the switches have inconsistent buffering types

@acomodi acomodi mentioned this pull request Oct 9, 2020
7 tasks
@github-actions github-actions bot removed the VPR VPR FPGA Placement & Routing Tool label Nov 18, 2020
@acomodi
Copy link
Collaborator Author

acomodi commented Dec 14, 2020

I believe this might be due to the fact that the VTR flow copies the circuit under test in the build directory, instead of generating a symlink?

This could result in the various runN directories to be prettly loaded with lots of data.

For reference, the whole symbiflow build directory weights ~500 MB.

I don't have data on the Titan benchmarks, but, if they are in total sth like ~10GB and if they get replicated in the build directory, the sum of all the circuits size and VPR outputs might generate that amount of data.

@litghost
Copy link
Collaborator

litghost commented Dec 14, 2020

@acomodi To separate whether the large output directory is new or not, can you open a new PR with just the changes to the working directory (e.g. cleanup and report size)? Assuming that the output directory was large before, I don't believe it blocks this PR, just something to fix in the future.

@vaughnbetz
Copy link
Contributor

Agreed, if this was an existing issue then it's not blocking.

Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Also updates the counter blif test

Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
Signed-off-by: Alessandro Comodi <acomodi@antmicro.com>
@acomodi
Copy link
Collaborator Author

acomodi commented Dec 14, 2020

PR opened here: #1610.

I have also rebased this on top of master

@acomodi
Copy link
Collaborator Author

acomodi commented Dec 15, 2020

@litghost @vaughnbetz CI went green and the "big" data usage happens also in master: #1610 (comment)

@vaughnbetz vaughnbetz merged commit 64d15e2 into verilog-to-routing:master Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Build system infra Project Infrastructure lang-make CMake/Make code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants