A New Simulated Annealing Schedule #1205

HackerFoo · 2020-03-10T17:58:16Z

Description

This change adds a new schedule designed to simultaneously optimize for variables of varying sensitivity by continually sweeping a narrow temperature band at increasing rates.

Related Issue

#1203

Motivation and Context

This change is to increase the performance (quality/runtime) of the placer.

How Has This Been Tested?

I've run a QoR comparison using vtr_reg_nightly/vtr_reg_qor_chain:

Metric	Relative to Baseline
vtr_flow_elapsed_time	108.40%
placed_wirelength_est	107.57%
place_time	49.71%
placed_CPD_est	100.64%
min_chan_width	105.69%
routed_wirelength	106.89%
min_chan_width_route_time	147.13%
crit_path_routed_wirelength	105.72%
critical_path_delay	101.79%
crit_path_route_time	109.44%

Types of changes

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation
I have updated the documentation accordingly
I have added tests to cover my changes
All new and existing tests passed

kmurray · 2020-03-10T19:09:28Z

A couple of quick thoughts:

How does this perform on the Titan benchmarks?
The results show there is a quality cost (wirelength, min W, crit path delay) for the run-time reduction. How does it compare to reducing the --inner_num parameter (e.g. from 1.0 to 0.5 which should also cut run-time by 2x)? It would be interesting to see what the achievable run-time quality trade-off is with this schedule, and how it compares with the current schedule.

HackerFoo · 2020-03-10T21:00:33Z

I did try --inner_num 0.5 on some tests in SymbiFlow, because I hoped that combining that would make placement 4x faster. It was not 2x faster, and came at much higher quality costs, so I abandoned that experiment, but I will gather some data.

Note that my schedule also adjusts move_lim to speed up iterations when the temperature is too high.

vaughnbetz · 2020-03-12T20:04:05Z

Notes for March 12 discussion:
The best way to evaluate this will be vs. the pareto-optimal curve generated by

varying inner num
setting T = 0 and varying inner_num (but going to quite high inner_num now, as we are only doing one temperature
- Varying how often timing analysis is done also useful with very low inner-num (do less often) or T = 0 (try several times per quench)

HackerFoo · 2020-04-01T14:19:39Z

Notes for March 12 discussion:
The best way to evaluate this will be vs. the pareto-optimal curve generated by

* varying inner num

* setting T = 0 and varying inner_num (but going to quite high inner_num now, as we are only doing one temperature
  
  * Varying how often timing analysis is done also useful with very low inner-num (do less often) or T = 0 (try several times per quench)

@kmurray @vaughnbetz Are there any scripts or tools that you use to execute and coordinate such tasks? I'm working on something to make this easier, but maybe something already exists that does most of what I need.

kmurray · 2020-04-01T15:08:34Z

My general approach has been to write shell scripts to print out the set of commands I want to run and feed them to GNU parallel. I then have scripts which parse and compare the metrics I care about.

(As an aside you can have run_vtr_task print out the names of scripts to invoke with the -system scripts option to make it useable with GNU parallel or whatever other parallel/distributed job system you want, see for example c.f. dev/submit_slurm.py which is a helper script to submit vtr task jobs to a cluster scheduler like SLURM.

However for general early experimentation (vs building regression tests), I've found the VTR task system a bit too inflexible, which is why I've generally rolled my own.

HackerFoo · 2020-04-16T17:17:00Z

Estimated Wirelength vs. Time

Estimated CPD vs. Time

Points are labeled with inner_num.

Using the flag settings that worked well for some smaller designs (with_flag) seems to be roughly equivalent to using half the inner_num, so I will need to experiment more. With the flag disabled (no_flag) has little effect as expected (2.5% increase in place time, same estimated wirelength, 1% increase in estimated CPD.)

The reason to test the no_flag configuration is that it moves all state changes into one place in the outer placement loop (in try_place()) so to allow switching the schedule. This seemingly inconsequential change still can have significant effects on the performance of the algorithm due to its probabilistic nature.

I'd like to merge this as is, and then experiment more with different settings, but the bitcoin_miner test in titan_quick_qor fails to route, getting stuck with a pres fac of 1e25. I suspect this is due to a slightly worse placement.

Data: inner_num_sweep_nightly_2_comparison.xlsx

HackerFoo · 2020-04-23T17:20:53Z

@kmurray @vaughnbetz After fixing the bitcoin_miner test, is this enough data to support merging this? Please let me know if there is any other data you'd need to support this.

HackerFoo · 2020-04-28T05:24:59Z

I've found better parameters that perform well for the titan_quick_qor test suite: https://colab.research.google.com/drive/11Cf5Mhaa4eu7KygkuOorxe_9OxlS7VqG

HackerFoo · 2020-04-28T21:56:58Z

@kmurray @vaughnbetz Given the data showing no changes without flags, and improvements with flags (better wirelength, 7% better CPD in 66% time), would you please merge this?

If not, please let me know what else is needed. Thanks!

HackerFoo · 2020-05-04T23:00:38Z

I've run the full vtr_reg_titan test suite with 20 random seeds. The new results show the new schedule performing about the same, with 2% better wirelength, but 0.6% higher CPD.

https://colab.research.google.com/drive/11Cf5Mhaa4eu7KygkuOorxe_9OxlS7VqG

doc/src/vpr/command_line_usage.rst

vpr/src/base/read_options.cpp

vpr/src/place/place.cpp

litghost

Comments are in

HackerFoo · 2020-06-26T01:53:19Z

I'm running nightly tests with 32 PRNG seeds on Hydra. I'll post the results when I have them.

Signed-off-by: Dustin DeWeese <dustin.deweese@gmail.com>

Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>

Signed-off-by: Dustin DeWeese <dustin.deweese@gmail.com>

Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>

vaughnbetz · 2020-06-26T21:16:37Z

I added a few review comments on places where I think function names or commenting should be tweaked.

HackerFoo · 2020-06-27T05:15:52Z

I've found that both tests pass with different seeds. I've rebased my changes; hopefully the tests will just pass. I'm also running the changes on Hydra with several seeds.

After addressing @vaughnbetz's comments, this should be ready to merge.

vaughnbetz · 2020-06-28T20:45:48Z

Great, thanks.

Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>

HackerFoo · 2020-07-07T18:38:20Z

Here's my plan to make the tests pass:

Modify the relevant parameters in pass requirements to resolve QoR failures, because the failures present and both this PR and the test PR are improvements.
Change the seed for failing tests so that they pass.

Longer term, it would be nice to check for brittle tests by adding a CI test with random seeds.

Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>

HackerFoo · 2020-07-21T15:28:46Z

This PR is ready to be merged if there are no more comments.

@vaughnbetz @mithro It seems like there are two "Travis CI - Pull Request" checks, and one of them is indefinitely blocked. Maybe this can just be ignored.

vaughnbetz · 2020-07-27T20:47:16Z

Can't seem to merge it; it's waiting on the Travis CI pull request. Not sure what I should do to override that -- Dusty do you know how to override the duplicate check (or relaunch if it's easier)?

HackerFoo · 2020-07-27T21:30:11Z

@vaughnbetz Administrators can override required checks: https://docs.github.com/en/github/administering-a-repository/about-required-status-checks
Although administrators can disable that ability, see #9: https://docs.github.com/en/github/administering-a-repository/enabling-required-status-checks

I'm not sure I can remove the stuck check.

vaughnbetz · 2020-07-27T22:19:38Z

OK, I temporarily changed the settings on the master to allow administrators to override failed checks and merged this (useful to know about this in case we have to do it again!). Now I'll turn off the ability for administrators to override checks again, for safety.

probot-autolabeler bot added docs Documentation lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool labels Mar 10, 2020

HackerFoo force-pushed the dusty_sa branch 3 times, most recently from 7b945a3 to b46fd7d Compare April 9, 2020 18:29

HackerFoo requested review from litghost, kmurray and vaughnbetz May 5, 2020 21:47