Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A New Simulated Annealing Schedule #1205

Merged
merged 12 commits into from Jul 27, 2020

Conversation

HackerFoo
Copy link
Contributor

@HackerFoo HackerFoo commented Mar 10, 2020

Description

This change adds a new schedule designed to simultaneously optimize for variables of varying sensitivity by continually sweeping a narrow temperature band at increasing rates.

Related Issue

#1203

Motivation and Context

This change is to increase the performance (quality/runtime) of the placer.

How Has This Been Tested?

I've run a QoR comparison using vtr_reg_nightly/vtr_reg_qor_chain:

Metric Relative to Baseline
vtr_flow_elapsed_time 108.40%
placed_wirelength_est 107.57%
place_time 49.71%
placed_CPD_est 100.64%
min_chan_width 105.69%
routed_wirelength 106.89%
min_chan_width_route_time 147.13%
crit_path_routed_wirelength 105.72%
critical_path_delay 101.79%
crit_path_route_time 109.44%

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@probot-autolabeler probot-autolabeler bot added docs Documentation lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool labels Mar 10, 2020
@kmurray
Copy link
Contributor

kmurray commented Mar 10, 2020

A couple of quick thoughts:

  1. How does this perform on the Titan benchmarks?
  2. The results show there is a quality cost (wirelength, min W, crit path delay) for the run-time reduction. How does it compare to reducing the --inner_num parameter (e.g. from 1.0 to 0.5 which should also cut run-time by 2x)? It would be interesting to see what the achievable run-time quality trade-off is with this schedule, and how it compares with the current schedule.

@HackerFoo
Copy link
Contributor Author

I did try --inner_num 0.5 on some tests in SymbiFlow, because I hoped that combining that would make placement 4x faster. It was not 2x faster, and came at much higher quality costs, so I abandoned that experiment, but I will gather some data.

Note that my schedule also adjusts move_lim to speed up iterations when the temperature is too high.

@vaughnbetz
Copy link
Contributor

Notes for March 12 discussion:
The best way to evaluate this will be vs. the pareto-optimal curve generated by

  • varying inner num
  • setting T = 0 and varying inner_num (but going to quite high inner_num now, as we are only doing one temperature
    • Varying how often timing analysis is done also useful with very low inner-num (do less often) or T = 0 (try several times per quench)

@HackerFoo
Copy link
Contributor Author

Notes for March 12 discussion:
The best way to evaluate this will be vs. the pareto-optimal curve generated by

* varying inner num

* setting T = 0 and varying inner_num (but going to quite high inner_num now, as we are only doing one temperature
  
  * Varying how often timing analysis is done also useful with very low inner-num (do less often) or T = 0 (try several times per quench)

@kmurray @vaughnbetz Are there any scripts or tools that you use to execute and coordinate such tasks? I'm working on something to make this easier, but maybe something already exists that does most of what I need.

@kmurray
Copy link
Contributor

kmurray commented Apr 1, 2020

My general approach has been to write shell scripts to print out the set of commands I want to run and feed them to GNU parallel. I then have scripts which parse and compare the metrics I care about.

(As an aside you can have run_vtr_task print out the names of scripts to invoke with the -system scripts option to make it useable with GNU parallel or whatever other parallel/distributed job system you want, see for example c.f. dev/submit_slurm.py which is a helper script to submit vtr task jobs to a cluster scheduler like SLURM.

However for general early experimentation (vs building regression tests), I've found the VTR task system a bit too inflexible, which is why I've generally rolled my own.

@HackerFoo HackerFoo force-pushed the dusty_sa branch 3 times, most recently from 7b945a3 to b46fd7d Compare April 9, 2020 18:29
@HackerFoo
Copy link
Contributor Author

Estimated Wirelength vs. Time
chart

Estimated CPD vs. Time
chart (1)

Points are labeled with inner_num.

Using the flag settings that worked well for some smaller designs (with_flag) seems to be roughly equivalent to using half the inner_num, so I will need to experiment more. With the flag disabled (no_flag) has little effect as expected (2.5% increase in place time, same estimated wirelength, 1% increase in estimated CPD.)

The reason to test the no_flag configuration is that it moves all state changes into one place in the outer placement loop (in try_place()) so to allow switching the schedule. This seemingly inconsequential change still can have significant effects on the performance of the algorithm due to its probabilistic nature.

I'd like to merge this as is, and then experiment more with different settings, but the bitcoin_miner test in titan_quick_qor fails to route, getting stuck with a pres fac of 1e25. I suspect this is due to a slightly worse placement.

Data: inner_num_sweep_nightly_2_comparison.xlsx

@HackerFoo
Copy link
Contributor Author

@kmurray @vaughnbetz After fixing the bitcoin_miner test, is this enough data to support merging this? Please let me know if there is any other data you'd need to support this.

@HackerFoo
Copy link
Contributor Author

I've found better parameters that perform well for the titan_quick_qor test suite: https://colab.research.google.com/drive/11Cf5Mhaa4eu7KygkuOorxe_9OxlS7VqG

@HackerFoo
Copy link
Contributor Author

@kmurray @vaughnbetz Given the data showing no changes without flags, and improvements with flags (better wirelength, 7% better CPD in 66% time), would you please merge this?

If not, please let me know what else is needed. Thanks!

@HackerFoo
Copy link
Contributor Author

I've run the full vtr_reg_titan test suite with 20 random seeds. The new results show the new schedule performing about the same, with 2% better wirelength, but 0.6% higher CPD.

https://colab.research.google.com/drive/11Cf5Mhaa4eu7KygkuOorxe_9OxlS7VqG

vpr/src/place/place.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@litghost litghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are in

@HackerFoo
Copy link
Contributor Author

I'm running nightly tests with 32 PRNG seeds on Hydra. I'll post the results when I have them.

@probot-autolabeler probot-autolabeler bot added tests VTR Flow VTR Design Flow (scripts/benchmarks/architectures) labels Jun 26, 2020
Signed-off-by: Dustin DeWeese <dustin.deweese@gmail.com>
Signed-off-by: Dustin DeWeese <dustin.deweese@gmail.com>
Signed-off-by: Dustin DeWeese <dustin.deweese@gmail.com>
Signed-off-by: Dustin DeWeese <dustin.deweese@gmail.com>
Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>
Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>
Signed-off-by: Dustin DeWeese <dustin.deweese@gmail.com>
Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>
Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>
@HackerFoo HackerFoo force-pushed the dusty_sa branch 2 times, most recently from 854fbbb to 519a503 Compare June 26, 2020 17:18
@vaughnbetz
Copy link
Contributor

I added a few review comments on places where I think function names or commenting should be tweaked.

@HackerFoo
Copy link
Contributor Author

I've found that both tests pass with different seeds. I've rebased my changes; hopefully the tests will just pass. I'm also running the changes on Hydra with several seeds.

After addressing @vaughnbetz's comments, this should be ready to merge.

@vaughnbetz
Copy link
Contributor

Great, thanks.

Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>
@HackerFoo
Copy link
Contributor Author

HackerFoo commented Jul 7, 2020

Here's my plan to make the tests pass:

  • Modify the relevant parameters in pass requirements to resolve QoR failures, because the failures present and both this PR and the test PR are improvements.
  • Change the seed for failing tests so that they pass.

Longer term, it would be nice to check for brittle tests by adding a CI test with random seeds.

Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>
Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>
@HackerFoo
Copy link
Contributor Author

HackerFoo commented Jul 21, 2020

This PR is ready to be merged if there are no more comments.

@vaughnbetz @mithro It seems like there are two "Travis CI - Pull Request" checks, and one of them is indefinitely blocked. Maybe this can just be ignored.

@vaughnbetz
Copy link
Contributor

Can't seem to merge it; it's waiting on the Travis CI pull request. Not sure what I should do to override that -- Dusty do you know how to override the duplicate check (or relaunch if it's easier)?

@HackerFoo
Copy link
Contributor Author

@vaughnbetz Administrators can override required checks: https://docs.github.com/en/github/administering-a-repository/about-required-status-checks
Although administrators can disable that ability, see #9: https://docs.github.com/en/github/administering-a-repository/enabling-required-status-checks

I'm not sure I can remove the stuck check.

@vaughnbetz vaughnbetz merged commit e28aafe into verilog-to-routing:master Jul 27, 2020
@vaughnbetz
Copy link
Contributor

OK, I temporarily changed the settings on the master to allow administrators to override failed checks and merged this (useful to know about this in case we have to do it again!). Now I'll turn off the ability for administrators to override checks again, for safety.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation lang-cpp C/C++ code tests VPR VPR FPGA Placement & Routing Tool VTR Flow VTR Design Flow (scripts/benchmarks/architectures)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants