[RFC] Multi pass parsing over multi source files #1495

Open
wants to merge 11 commits into
base: master

masatake
Member

@masatake masatake commented Jun 29, 2017

Close #80.

@vhda, what do you think about these changes? See the commits whose logs start with [TEMPORARY].
I will update docs/internal.rst after getting your approval.

In my version, the SystemVerilog parser emits most of its tags in the first (-1) pass. However, it hands over some of them to the next mm pass via the "barrel". In the second (-2) pass, the tags in the barrel are stored in the keyword table by the setupMM method of the SystemVerilog parser.

For testing I chose tags of the typedef kind. In the -2 pass, the SystemVerilog parser can recognize the token that follows a typedef'ed type name. If such a token is recognized in a class context, it can be tagged with the "member" kind.
I chose the letter 'v' but not the kind name "variable", because a member of a class is not a variable.

This will not work well in interactive mode. I have to research the area more.

    $ cat a.sv
    class test;
       t_user t_user_memb;
    endclass
    $ cat b.sv
    typedef int t_user;
    $ ./ctags -o -  a.sv b.sv
    t_user	b.sv	/^typedef int t_user;$/;"	T
    t_user_memb	a.sv	/^   t_user t_user_memb;$/;"	v	class:test
    test	a.sv	/^class test;$/;"	C
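
The two-pass idea described above can be sketched in plain C. This is only an illustration under stated assumptions: `remember_type_name`, `is_known_type_name`, and `kind_for_class_declaration` are hypothetical names, and the small array merely stands in for the keyword table. None of this is the actual ctags API or the code in this pull request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPE_NAMES 16

/* Hypothetical stand-in for the keyword table: type names seen in the -1 pass. */
static const char *type_names[MAX_TYPE_NAMES];
static int type_name_count = 0;

/* -1 pass: remember a typedef'ed name, whichever source file it came from.
   Returns false when the toy table is full. The string is not copied. */
static bool remember_type_name(const char *name)
{
    assert(name != NULL);
    if (type_name_count >= MAX_TYPE_NAMES)
        return false;
    type_names[type_name_count++] = name;
    return true;
}

/* Linear search is enough for a sketch. */
static bool is_known_type_name(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < type_name_count; i++)
        if (strcmp(type_names[i], name) == 0)
            return true;
    return false;
}

/* -2 pass: for "<first> <second>;" seen in a class context, return the kind
   letter for <second>: 'v' (member) if <first> is a known type name,
   '\0' if this mechanism has nothing to say about it. */
static char kind_for_class_declaration(const char *first_token)
{
    return is_known_type_name(first_token) ? 'v' : '\0';
}
```

With this sketch, `kind_for_class_declaration("t_user")` returns `'\0'` until `remember_type_name("t_user")` has been called, which mirrors why `t_user_memb` in a.sv can only be tagged after `typedef int t_user;` in b.sv has been seen.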

…m signed to unsigned

A negative value will be passed in the stages of multi pass parsing over multi source files

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…source files

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…rser

Typedefs gathered in the "-1" mm pass are stored in the barrel.
At the start of the "-2" mm pass, the typedefs are stored as keywords.
    $ cat a.sv
    class test;
       t_user t_user_memb;
    endclass
    $ cat b.sv
    typedef int t_user;
    $ ./ctags -o -  a.sv b.sv
    t_user	b.sv	/^typedef int t_user;$/;"	T
    t_user_memb	a.sv	/^   t_user t_user_memb;$/;"	v	class:test
    test	a.sv	/^class test;$/;"	C

t_user_memb is captured as v.
@coveralls

Coverage increased (+0.03%) to 85.144% when pulling ee912f3 on masatake:multi-pass-parsing-over-multi-source-files into 4ff09da on universal-ctags:master.

@masatake
Member Author

After thinking about it, I found that the barrel is not needed here. Adding the typedef'ed type name to the keyword table in the -1 pass will be enough...

@masatake masatake changed the title Multi pass parsing over multi source files [RFC] Multi pass parsing over multi source files Jun 29, 2017
@masatake
Member Author

Another interesting application of mm is the "unknown" kind of Python.
Objects tagged with the "unknown" kind in the -1 pass may be resolved in the -2 pass...

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@coveralls

Coverage increased (+0.04%) to 85.15% when pulling d997cb4 on masatake:multi-pass-parsing-over-multi-source-files into 4ff09da on universal-ctags:master.

@masatake
Member Author

What I wrote is something like a linker, so I can borrow many concepts and ideas from linkers.

@vhda
Contributor

vhda commented Jun 30, 2017

I have the intention of dedicating part of my weekend to Universal Ctags.
I'll reply ASAP :)

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake
Member Author

masatake commented Jun 30, 2017

I updated the documents. I didn't take an example from the SystemVerilog parser because I don't know it well.

@coveralls

Coverage increased (+0.05%) to 85.159% when pulling 6ea7823 on masatake:multi-pass-parsing-over-multi-source-files into 4ff09da on universal-ctags:master.

@masatake masatake mentioned this pull request Jul 1, 2017
Contributor

@vhda vhda left a comment

Looking good and the number of added lines isn't as big as I would have expected, which is nice :)

I think some code could be made common. Please review my comments and check whether that is possible.

I also remembered "libraries". Code that is central and used frequently in the project but that is rarely changed. It would make sense to be able to "seed" the second pass with such libraries... but this is beyond the scope of this change, I guess.

main/keyword.h Outdated
extern void addKeyword (const char *const string, langType language, int value);

/* addKeywordStrdup does strdup `string'.
Duplicated string is freed in freeKeywordTable() */
Contributor

Extra space in comment

Member Author

Thank you. I will make a fixup! commit.

@@ -122,7 +124,8 @@ static kindDefinition SystemVerilogKinds [] = {
 { true, 'P', "program", "programs" },
 { false,'Q', "prototype", "prototypes" },
 { true, 'R', "property", "properties" },
-{ true, 'T', "typedef", "type declarations" }
+{ true, 'T', "typedef", "type declarations" },
+{ true, 'v', "member", "member elements" },
Contributor

In the meantime I started thinking about 'o' and 'object'.
I'm raising this now because it might make sense to maintain various "groups". E.g., instances of classes are objects, and instances of typedefs are "custom types".

Verilog is a hardware description language that allows a special type of "object" called an "instance". The design architecture is divided into "modules", which are instantiated in other top-level modules. Having ctags parse these as special types would allow editors that support these tags to easily show the overall design structure.

Member Author

Designing kinds is the most important task in ctags development.
I don't know Verilog. So if you are o.k., I'm o.k.
@RadekRR, this is your chance to have your ideas reflected in the software you are using :-).

else
return RESCAN_MM;
}

Contributor

Why make the parser aware of passes? In my mind I think we could modify the keyword table to indicate which types we are looking for in the first pass, and which are only valid in the second.

Core ctags could take care of the rest, by providing a smaller keyword table to the parser in the first pass and then the complete table in the second. The parser itself would be simplified and any other parser could easily use this new architecture.

Do you like the idea? Or am I missing something basic that will not allow this?
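
One way to picture this suggestion is a keyword table whose entries record the first pass in which they are visible, with the core filtering lookups by pass. The following is a minimal sketch under that assumption. `PassKeyword`, `lookup_keyword_for_pass`, and the demo table are hypothetical and do not correspond to the existing declarations in main/keyword.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEYWORD_NONE (-1)

/* Hypothetical keyword entry that knows in which pass it becomes visible. */
typedef struct {
    const char *name;
    int id;          /* parser-specific keyword id */
    int first_pass;  /* 1: visible from the first pass, 2: second pass only */
} PassKeyword;

/* Core-side lookup: the parser asks for a keyword, and the core hides every
   entry that is not yet valid in the current pass. */
static int lookup_keyword_for_pass(const PassKeyword *table, size_t count,
                                   const char *name, int pass)
{
    assert(table != NULL && name != NULL);
    for (size_t i = 0; i < count; i++)
        if (table[i].first_pass <= pass && strcmp(table[i].name, name) == 0)
            return table[i].id;
    return KEYWORD_NONE;
}

/* Demo table: language keywords are always visible, while a user-defined
   type name collected during the first pass only counts in the second. */
static const PassKeyword demo_keywords[] = {
    { "typedef", 1, 1 },
    { "class",   2, 1 },
    { "t_user",  3, 2 },
};
```

Here a lookup of `"t_user"` yields `KEYWORD_NONE` in pass 1 and `3` in pass 2, so the parser code itself never has to test which pass it is in.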

Member Author

> Why make the parser aware of passes?

The main part doesn't know which source file(s) should be rescanned.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main part of Universal-ctags provides APIs for Multi passes
parsing over Multi source files (MM). The main part applies
Contributor

Don't understand the meaning of "MM". First "M" is for multi, but what is the second for?
Typically you would do something like: "Multi source Files (MF)". That is, capitalize the words used in the abbreviation.

Member Author

MM means "M"ulti pass parsing over "M"ulti source files.
Does that make sense?

@masatake
Member Author

masatake commented Jul 3, 2017

@vhda, I feel there is a gap between what you want and what I wrote.

Please look at the document explaining MM:
6ea7823

https://github.com/masatake/ctags/blob/6ea782371d7dc78b884a7882dd4dedd81a03f63c/docs/barrel.svg

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@coveralls

Coverage increased (+0.06%) to 85.169% when pulling bded49b on masatake:multi-pass-parsing-over-multi-source-files into 4ff09da on universal-ctags:master.

@masatake
Member Author

masatake commented Jul 6, 2017

I read the original discussion again.

My understanding of what you want:

pass 1: just make a hash table and fill it with type names (or other kinds of names). You want to make no tag entries in this stage.

pass 2: make tag entries and emit them to the tags file.

This flow can be implemented on MM. All you have to do is ignore the barrel. You can create SystemVerilog's own type name table with the functions declared in hashtable.h. The SystemVerilog parser must return RESCAN_MM.

However, you should emit some of the tag entries in pass 1 if possible, because the newly introduced --interactive mode may expect per-source-file processing. That means ctags runs only pass 1 (-1) in that mode, so a user of --interactive mode may conclude that the SystemVerilog parser does nothing.
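
The flow above, including the --interactive caveat, can be sketched as a small driver loop. This is a toy model under stated assumptions, not the main part of ctags: `DemoRescan`, `DemoParseFn`, `run_mm_passes`, and `demo_parse` are hypothetical, and `DEMO_RESCAN_MM` only stands in for the `RESCAN_MM` return value mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILES 64

/* Hypothetical return value of a parser run; DEMO_RESCAN_MM stands in for
   the RESCAN_MM value discussed in this pull request. */
typedef enum { DEMO_RESCAN_NONE, DEMO_RESCAN_MM } DemoRescan;

typedef DemoRescan (*DemoParseFn)(const char *file, int pass);

/* Run pass 1 over every file, then pass 2 only over the files whose parser
   asked for it. The driver cannot know by itself which files need a rescan,
   so it relies on the parser's return value. With first_pass_only set
   (roughly the --interactive situation) pass 2 never happens.
   Returns the number of files that were rescanned. */
static int run_mm_passes(const char *const *files, int nfiles,
                         DemoParseFn parse, bool first_pass_only)
{
    bool needs_rescan[MAX_FILES] = { false };
    int rescanned = 0;

    assert(files != NULL && parse != NULL);
    assert(nfiles >= 0 && nfiles <= MAX_FILES);

    for (int i = 0; i < nfiles; i++)
        needs_rescan[i] = (parse(files[i], 1) == DEMO_RESCAN_MM);

    if (first_pass_only)
        return 0;

    for (int i = 0; i < nfiles; i++)
        if (needs_rescan[i]) {
            parse(files[i], 2);
            rescanned++;
        }
    return rescanned;
}

/* Toy parser: files ending in ".sv" request a second pass, others do not. */
static DemoRescan demo_parse(const char *file, int pass)
{
    size_t len = strlen(file);
    bool is_sv = len >= 3 && strcmp(file + len - 3, ".sv") == 0;
    return (pass == 1 && is_sv) ? DEMO_RESCAN_MM : DEMO_RESCAN_NONE;
}
```

In this model, a parser that emits nothing in pass 1 produces no output at all when `first_pass_only` is true, which is the concern raised about --interactive mode.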

@wvandamm

Hi @masatake ,

Could I please inquire what the status is of this pull request? The functionality proposed seems highly interesting in any case!

Also a related question: the C++ parser appears to already be able to identify definitions for variables or objects of a type created with typedef or class. If I understand the discussion well that's something that this pull request was intending to add as well. For the SystemVerilog parser the same functionality does not appear to work yet though. So am I correct in assuming that the C++ parser has solved this problem in the parser itself? Or has some framework been added that other parsers could make use of to add similar functionality?

Thanks!

Wim

@masatake
Member Author

masatake commented Aug 13, 2019

> Could I please inquire what the status is of this pull request? The functionality proposed seems highly interesting in any case!

There is no progress other than changes I proposed here.

The C++ parser may use heuristics. It is not perfect, e.g. in handling template variables.
See files under Units/parser-cxx.r/*.b.

Before implementing multi-pass/multi-file parsing, I have to improve the infrastructure for multi-pass/single-file parsing. See #2115.

If you seriously want mm and want to implement it, I will explain what kinds of issues there are.

Development

Successfully merging this pull request may close these issues.

Add dual-pass capability
4 participants