[AP][Solver] Ignored Disconnected Blocks in AP Solver #3152
After investigating some of the slowest-running test cases, I realized that we were not handling disconnected blocks in the solver.

Especially after we started thresholding out high-fanout nets, some circuits were taking far longer to solve than they should. They took an especially long time to set up the matrices. It turned out that many blocks were completely disconnected from the rest of the circuit. There is no reason to optimize the location of these blocks, since the AP objective is formulated based on net connectivity. As such, these disconnected blocks should be completely ignored during placement.

Ignoring these blocks reduces the number of variables in the A matrix, which can greatly improve runtime. Early results on Titan show up to a 3.5x improvement in GP runtime, and a 20% improvement on average.

Future work is to be more methodical about which nets to mark as ignored. The AP flow currently does not directly mark signals like clocks as ignored; doing so may allow us to label more blocks as disconnected.
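As a rough illustration of the idea described above (not the code in this PR), the sketch below assigns solver rows only to blocks that appear on at least one non-ignored net with two or more blocks. The `Net` struct, `kNoRow`, and `assign_solver_rows` are hypothetical names invented for this example; it also assumes every block is moveable and that each net lists distinct blocks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified net type; this is not the actual AP netlist API.
struct Net {
    std::vector<std::size_t> blocks; // indices of the blocks on this net
    bool ignored;                    // e.g. thresholded out for high fanout
};

// Sentinel meaning "this block has no variable in the linear system".
constexpr std::size_t kNoRow = static_cast<std::size_t>(-1);

// For each block, returns its row (variable) index in the solver, or kNoRow
// if the block is disconnected. num_rows receives the number of rows assigned.
std::vector<std::size_t> assign_solver_rows(std::size_t num_blocks,
                                            const std::vector<Net>& nets,
                                            std::size_t& num_rows) {
    // A block is connected if it sits on a net that contributes to the
    // objective: the net is not ignored and touches at least two blocks.
    std::vector<bool> connected(num_blocks, false);
    for (const Net& net : nets) {
        if (net.ignored || net.blocks.size() < 2)
            continue;
        for (std::size_t blk : net.blocks)
            connected[blk] = true;
    }

    // Only connected blocks receive a row, which shrinks the A matrix.
    std::vector<std::size_t> row(num_blocks, kNoRow);
    num_rows = 0;
    for (std::size_t blk = 0; blk < num_blocks; ++blk) {
        if (connected[blk])
            row[blk] = num_rows++;
    }
    return row;
}
```

With this mapping, the matrix setup would only need to allocate `num_rows` variables instead of `num_blocks`, and any block mapped to `kNoRow` would be skipped when building the system.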
Results on Titan (timing-driven, no fixed blocks):
Outliers are denoise and sparcT1_chip2, whose GP runtime improved by 3x. Practically no loss in quality and a 6% improvement in overall runtime.
In a future PR I will explore marking nets as global if they are a clock or a "non-clock global". This would match how the placer handles them. I may then re-sweep the high-fanout threshold, since clocks will always be ignored.
Looks good. Thanks!
Thanks @AlexandreSinger, LGTM as well.
LGTM but see one question on width/height
/// @brief The width of the device grid. Used for randomly generating points
///        on the grid.
size_t device_grid_width_;
Why do you keep cached copies of the grid width and height in the solver instead of asking the grid for width and height?
During Global Placement, the device size is assumed to be fixed. We also only really use this information in the first iteration. I preferred to just pass the device size in instead of holding on to a reference to the device itself.
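To make the trade-off concrete, here is a minimal sketch of the pattern described in this reply: the solver copies the grid dimensions at construction time instead of keeping a reference to the device grid. `SolverSketch` and `random_point` are hypothetical names for this example only, and the sketch assumes both dimensions are greater than zero.

```cpp
#include <cstddef>
#include <random>
#include <utility>

// Hypothetical stand-in for the solver; only the cached-dimension pattern
// is shown, not the real class from this PR.
class SolverSketch {
public:
    // The dimensions are copied by value, so the solver does not need to
    // keep the device grid alive. Assumes width > 0 and height > 0.
    SolverSketch(std::size_t device_grid_width,
                 std::size_t device_grid_height,
                 unsigned seed)
        : device_grid_width_(device_grid_width)
        , device_grid_height_(device_grid_height)
        , rng_(seed) {}

    // Generates a random (x, y) point on the grid.
    std::pair<double, double> random_point() {
        std::uniform_real_distribution<double> x_dist(
            0.0, static_cast<double>(device_grid_width_));
        std::uniform_real_distribution<double> y_dist(
            0.0, static_cast<double>(device_grid_height_));
        double x = x_dist(rng_);
        double y = y_dist(rng_);
        return {x, y};
    }

private:
    /// @brief The width of the device grid.
    std::size_t device_grid_width_;
    /// @brief The height of the device grid.
    std::size_t device_grid_height_;
    /// @brief Random number generator used to pick points.
    std::mt19937 rng_;
};
```

The cost of this design is that the cached values would go stale if the device size ever changed, which is why it relies on the device size being fixed during Global Placement.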