Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UNOPT and UNOPTFLAT V3Split optimizations #1096

Open
veripoolbot opened this issue Oct 21, 2016 · 11 comments
Open

UNOPT and UNOPTFLAT V3Split optimizations #1096

veripoolbot opened this issue Oct 21, 2016 · 11 comments

Comments

@veripoolbot
Copy link

@veripoolbot veripoolbot commented Oct 21, 2016


Author Name: Arthur Kahlich
Original Redmine Issue: 1096 from https://www.veripool.org


I am running into a performance issue due to Verilator finding circular logic where none exists. This causes the entire model to be evaluated twice for each input edge instead of once, which is a significant performance degradation.

To see the issue, de-tar the 2 small Verilog source files and run the command:

verilator -cc +libext+.v -y . abc.v

What I see is the following:

%Warning-UNOPTFLAT: abc.v:15: Signal unoptimizable: Feedback to clock or circular logic: abc.a
%Warning-UNOPTFLAT: Use "/* verilator lint_off UNOPTFLAT */" and lint_on around source to disable this message.
%Warning-UNOPTFLAT:      Example path: abc.v:15:  abc.a
%Warning-UNOPTFLAT:      Example path: abc.v:44:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:16:  abc.b
%Warning-UNOPTFLAT:      Example path: xyz.v:11:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:15:  abc.a
%Warning-UNOPT: abc.v:10: Signal unoptimizable: Feedback to public clock or circular logic: e
%Warning-UNOPT:      Example path: abc.v:10:  e
%Warning-UNOPT:      Example path: abc.v:53:  ASSIGNW
%Warning-UNOPT:      Example path: abc.v:21:  abc.h
%Warning-UNOPT:      Example path: abc.v:55:  ALWAYS
%Warning-UNOPT:      Example path: abc.v:10:  e
%Warning-UNOPTFLAT: abc.v:17: Signal unoptimizable: Feedback to clock or circular logic: abc.c
%Warning-UNOPTFLAT:      Example path: abc.v:17:  abc.c
%Warning-UNOPTFLAT:      Example path: abc.v:42:  ASSIGNW
%Warning-UNOPTFLAT:      Example path: abc.v:18:  abc.d
%Warning-UNOPTFLAT:      Example path: abc.v:55:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:17:  abc.c
%Warning-UNOPTFLAT: abc.v:22: Signal unoptimizable: Feedback to clock or circular logic: abc.i
%Warning-UNOPTFLAT:      Example path: abc.v:22:  abc.i
%Warning-UNOPTFLAT:      Example path: abc.v:159:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:26:  abc.m
%Warning-UNOPTFLAT:      Example path: abc.v:143:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:28:  abc.p
%Warning-UNOPTFLAT:      Example path: abc.v:55:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:25:  abc.l
%Warning-UNOPTFLAT:      Example path: xyz.v:11:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:22:  abc.i
%Error: Exiting due to 4 warning(s)

In all of these Warnings no true circular logic exists. This appears to be a case of each LHS of a blocking assign in an 'ALWAYS @*' block being given all of the sensitivity list of the entire block, rather than only the wires/regs that the particular LHS reg truly depends on. I note in the source code that there appears to be a visit pass intended to split ALWAYS blocks - perhaps there is a bug here or perhaps the pass is being applied at an unfortunate point in the overall sequence of passes.

I also have observed that manually splitting the combo ALWAYS blocks into a separate block for each LHS cures the problem. Unfortunately this is not practical in the code base I have, due to its massive size and due to the terrible readability problem it would cause.

The test case files were extracted (manually) from a much larger Verilog code base and the signal names sanitized. The particular Verilog in the test case may not do anything meaningful other than illustrate where Verilator finds circular logic where there is none.

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Oct 21, 2016


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2016-10-21T11:14:06Z


Verilator does block-by-block scheduling; to avoid these warnings and improve performance you must manually split up at least one of the always statements that it is complaining about. For more details see the UNOPT message in the manual and #� and the forums. Sorry.

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Oct 21, 2016


Original Redmine Comment
Author Name: Arthur Kahlich
Original Date: 2016-10-21T17:10:26Z


Oh dear. As I wrote before, due to the size of the code and for readability considerations, the splitting is just not practical. For the blocks that define state machine logic it would bloat the code by around 2X or more and make the code unreadable. This Verilog coding style is particularly popular as a means of specifying default output values for signals that have only a few non-default values.

I note in the Verilator source code comments on the oSplit visit path that would lead me to believe its purpose is to automatically do the combo ALWAYS block splitting that you say I need to do manually. Is this not the case? Or is there some limitation to this process that I could investigate reducing or eliminating?

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Oct 21, 2016


Original Redmine Comment
Author Name: Arthur Kahlich
Original Date: 2016-10-21T17:17:28Z


I looked at #� and that appears to have both aspects of the UNOPTFLAT warning covered: the block-by-block scheduling aspect as illustrated by my test case here, and the wire array aspect, which I would also like to look at helping with, but which I consider a much harder problem to fix properly.

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Oct 21, 2016


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2016-10-21T17:39:26Z


As to V3Split, it tries to break up always blocks subject to what is considered legal from a Verilog language standpoint. If you for example have a $display statement, it won't break up the other statements proceeding that display in the same block. It certainly could be improved or flags added to make it more optionally aggressive.

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Oct 21, 2016


Original Redmine Comment
Author Name: Arthur Kahlich
Original Date: 2016-10-21T17:58:57Z


That's what I was hoping for. While I follow the visit pass methodology in general, I will need to look the the particular classes used by this pass to see if what would be needed is there, and if not add it before proceeding.

There will be a need to break apart case blocks which could get complicated, and could also end up in some ways being sub-optimal compared to keeping the case block together. I think the potential gain from not having to eval twice will offset conversion of the case to a series of individual if blocks. Of course, a case block can only be broken up if it is complete.

All of the above assumes a pass somewhere to only enable this if there are indeed no latches implied in the block.

Is V3Split the right place to look at the breakup of complete case blocks?

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Oct 21, 2016


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2016-10-21T18:06:29Z


Yes, look at V3Split, also look at V3Table which converts many case statements to lookup tables for much better performance.

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Jul 10, 2017


Original Redmine Comment
Author Name: Arthur Kahlich
Original Date: 2017-07-10T17:47:48Z


I have looked at V3Split and come to the conclusion that I need to add another pass, call it V3SplitCombo. This is because V3Split only deals with delayed (non-blocking) assignments. The goal of V3Split appears to be the reduction of slave temporaries for implied flipflops, which is a worthy goal, but not related to what I am trying to accomplish. I will place the call to V3SplitCombo right after the call to V3SplitAs::splitAsAll. After looking at some of the other passes, it appears that 2 important simplifications will have been done:

  1. Assignments to concatenations will have been split into a series of assignments to the variables on the LHS. I saw this happening in V3Const::replaceNodeAssign(). I am depending on the tree after this to be free of CONCAT and CONCATN on the LHS. If this is not the case, please let me know.

  2. All Verilog case statements will have been converted to if-else constructs. I saw this happening in V3Case::caseAll(). I am depending on the tree after this to be free of CASE and CASEITEM types and their associated Ast types. If this is not the case, please let me know.

A further simplification to the SplitCombo methods would be if all assignments within the combo block were converted to SSA form with the use of appropriate temporary variable insertions. I am just starting to look through the passes to see if this is already being done, which would save me the trouble.

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Jul 11, 2017


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-07-11T23:42:50Z


V3Split also does combo logic, it works with both assignment types, for example see debug output of t_alw_split.pl. If this needs extension probably V3Split is better to enhance, versus adding another pass.

Your assumption about LHS concats and case statements is correct.

I've debated converting to SSA form before, but never made the plunge, as my primary motivation was to enable some downstream optimizations and experiments showed that GCC was doing almost as well so didn't seem worth it. Simplifying the code is a different perhaps better argument.

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Jul 12, 2017


Original Redmine Comment
Author Name: Arthur Kahlich
Original Date: 2017-07-12T00:15:37Z


This morning after I wrote that and started going over V3Split more closely I did notice that at least the comments indicate splitting blocking (non-delayed) assigns.

I am still trying to figure out precisely where the non-delayed assigns get marked to be split, since they don't seem to have a visit function (is it visit(AstVarRef* nodep)? ) - unless it is inherited, which would make things harder to follow.

It appears that the biggest thing I want split that is currently not are all assigns to the same variable, grouped together, with intermediate dependent variables and if blocks included. Then if there are other multiply assigned variables, each one would get its own block with all its dependent variables and if blocks included. This can be sub-optimal in some situations, but it is the only way I can think of to break the false combo dependency loops and still have any scheduling sort work.

OTOH, I could convert all multiple LHS references to unique SSA style names, along with the RHS and if condition references. Then split it all up, let it be reordered according to dependencies, then use a clean up pass to replace each group of SSA style names back to the original name, since the statements would still be in the correct order. This might still get into trouble with the reordering attempt unless if-else blocks containing multiple assignments get split, which again, could be sub-optimal in certain situations, but could guarantee that the sub-optimal path only gets executed once per eval instead of at least twice.

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Jul 12, 2017


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-07-12T00:53:33Z


V3Split will separate any statement list, so a normal assign is handled by being in the list scanBlock iterates.

If you start spliting under a conditional you will want to put them back somewhere or add some new optimization, otherwise all of the extra compares is likely to result in slower code. Also be careful PLI statements must stop splitting like as currently done.

@veripoolbot

This comment has been minimized.

Copy link
Author

@veripoolbot veripoolbot commented Jul 12, 2017


Original Redmine Comment
Author Name: Arthur Kahlich
Original Date: 2017-07-12T02:14:46Z


This morning after I wrote that and started going over V3Split more closely I did notice that at least the comments indicate splitting blocking (non-delayed) assigns.

I am still trying to figure out precisely where the non-delayed assigns get marked to be split, since they don't seem to have a visit function (is it visit(AstVarRef* nodep)? ) - unless it is inherited, which would make things harder to follow.

It appears that the biggest thing I want split that is currently not are all assigns to the same variable, grouped together, with intermediate dependent variables and if blocks included. Then if there are other multiply assigned variables, each one would get its own block with all its dependent variables and if blocks included. This can be sub-optimal in some situations, but it is the only way I can think of to break the false combo dependency loops and still have any scheduling sort work.

OTOH, I could convert all multiple LHS references to unique SSA style names, along with the RHS and if condition references. Then split it all up, let it be reordered according to dependencies, then use a clean up pass to replace each group of SSA style names back to the original name, since the statements would still be in the correct order. This might still get into trouble with the reordering attempt becoming unsolvable unless if-else blocks containing multiple assignments get split, which again, could be sub-optimal in certain situations, but could guarantee that the sub-optimal path only gets executed once per eval instead of at least twice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.