main: experimental implementation of multi-pass parsing #2741

Draft: masatake wants to merge 19 commits into universal-ctags:master from masatake:multi-pass


masatake
Member

This pull request introduces the --_hint=<tag file> option and internal APIs for utilizing the given tags file.
With these APIs, a parser can use a pre-existing tags file to improve the quality of parsing and tagging.

This option is not for incremental updating.
Even if you specify --_hint=<tag file>, ctags parses all input files.

The Python parser is the initial target for applying the APIs.
In the first pass, the Python parser attaches the "unknown" kind to X in "from Y import X".
With the hint, the Python parser can resolve the real kind of X.
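
The second-pass resolution described above can be sketched roughly as follows. This is only an illustration, not the code in this pull request: the `Kind` enum, `HintEntry`, `resolve_kind`, and the sample table are all hypothetical names, and the in-memory table stands in for entries that would really be read from the first-pass tags file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds for this sketch; not the kind table of the real Python parser. */
typedef enum { KIND_UNKNOWN, KIND_CLASS, KIND_FUNCTION, KIND_VARIABLE } Kind;

/* One entry remembered from the first pass: a tag name and the kind it was given. */
typedef struct {
    const char *name;
    Kind kind;
} HintEntry;

/* Stand-in for entries loaded from a first-pass tags file. */
const HintEntry sample_hints[] = {
    { "OrderedDict", KIND_CLASS },
    { "getcwd", KIND_FUNCTION },
};
#define SAMPLE_HINT_COUNT (sizeof sample_hints / sizeof sample_hints[0])

/* Return the kind recorded for `name` in the hints, or KIND_UNKNOWN when
 * the hints have no entry, so the caller keeps the "unknown" kind. */
Kind resolve_kind(const HintEntry *hints, size_t n, const char *name)
{
    assert(hints != NULL || n == 0);
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(hints[i].name, name) == 0)
            return hints[i].kind;
    }
    return KIND_UNKNOWN;
}
```

With this table, resolving X for "from collections import OrderedDict" yields KIND_CLASS, while a name missing from the hints stays KIND_UNKNOWN, which matches the first-pass behavior.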

@masatake masatake changed the title Experimental implementation of multi-pass parsing main: experimental implementation of multi-pass parsing Nov 28, 2020
@masatake
Member Author

masatake commented Nov 29, 2020

@masatake masatake force-pushed the multi-pass branch 2 times, most recently from 563d0a2 to eb456c7 Compare November 29, 2020 16:51
@coveralls

coveralls commented Nov 29, 2020

Coverage increased (+0.01%) to 87.037% when pulling 4aef85d on masatake:multi-pass into 09e9513 on universal-ctags:master.

@masatake masatake force-pushed the multi-pass branch 2 times, most recently from 581d43f to 8f66fcc Compare December 1, 2020 09:46
@codecov

codecov bot commented Dec 1, 2020

Codecov Report

Merging #2741 (4aef85d) into master (c436bca) will decrease coverage by 0.43%.
The diff coverage is 77.14%.

❗ Current head 4aef85d differs from pull request most recent head 61e5266. Consider uploading reports for the commit 61e5266 to get more accurate results

@@            Coverage Diff             @@
##           master    #2741      +/-   ##
==========================================
- Coverage   87.38%   86.95%   -0.44%     
==========================================
  Files         199      194       -5     
  Lines       47769    41114    -6655     
==========================================
- Hits        41743    35749    -5994     
+ Misses       6026     5365     -661     
Impacted Files              Coverage Δ
main/options.c              83.63% <ø> (-0.41%) ⬇️
main/hint.c                 54.54% <54.54%> (ø)
parsers/python.c            98.50% <97.22%> (-0.01%) ⬇️
extra-cmds/readtags-cmd.c   53.11% <100.00%> (-0.71%) ⬇️
main/htable.c               51.18% <0.00%> (-35.07%) ⬇️
main/ptrarray.c             56.89% <0.00%> (-28.11%) ⬇️
dsl/es.c                    44.01% <0.00%> (-10.37%) ⬇️
parsers/ada.c               70.94% <0.00%> (-9.50%) ⬇️
dsl/dsl.c                   75.05% <0.00%> (-6.67%) ⬇️
main/mbcs.c                 73.17% <0.00%> (-5.10%) ⬇️
... and 191 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c436bca...61e5266.

@masatake
Member Author

masatake commented Dec 27, 2020

Random ideas:

The option --_hint should be renamed like:

--_hint-file=tagfile: strict about errors. If an error occurs when calling libreadtags APIs, ctags may stop.
--_hint-file-maybe=tagfile: error-tolerant version of --_hint-file. If an error occurs when calling libreadtags APIs, ctags simply doesn't use the hint file.

ctags must reject specifying the same file for output and hint. Should I compare their inode numbers?

More APIs for parsers:

bool isHintFileAvailable (void);

Just after opening a hint file, the main part of ctags should notify the parsers that define a method

void (*preprocessHintFile) (hintFile *file, hintFileInfo *info, langType lang);

This helps a parser build an including/included (or require/provide, use/used, or import/package) relation graph before parsing.

How about introducing tags.c, a parser for tags files?

./ctags a.tags

ctags parses a.tags. When ctags finds an F kind entry in a.tags, ctags parses the file tagged with that entry.

The code for realizing the multi-pass parsing and for updating a tags file are strongly related. But how?

Linking libreadtags, as proposed here, is obviously needed for both features.
I guess we may need a "filesystem" language that deals with directories as first-class objects.
tagEntryInfos for directories may be an important building block for dealing with "paths" such as the include path, module path, library path, etc.

To implement "updating a tags file", I must revise the way output is written.

When using a hint file, ctags must compare the options used for generating the hint file with the options just passed by the user.
This comparison is much more important when updating a tags file. Should we accept options in the second pass?
Saying NO is easy. However, we have to remember that a user can have many lines in their .ctags.

Comparing options is basic infrastructure for running ctags parsers in parallel.

To support other types of hint files, ctags must verify the type of the hint file by filename extension and pattern, as ctags does when detecting a suitable parser for an input source file.

If querying hints is not done at hotspots, we can reuse the query engine used in the readtags command.
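
On the question above about comparing inode numbers to reject the same file for output and hint: a minimal sketch, assuming a POSIX platform where stat() is available. `is_same_file` is a hypothetical helper, not an existing ctags function. The device number is compared together with the inode number because inode numbers are unique only within one filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Return true when both paths exist and refer to the same file, that is,
 * the same device and the same inode. Hard links and symbolic links to one
 * file compare equal because stat() follows symbolic links.
 * Return false when either path cannot be stat'ed, for example when the
 * output file has not been created yet. */
bool is_same_file(const char *path_a, const char *path_b)
{
    struct stat a, b;

    assert(path_a != NULL && path_b != NULL);
    if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
        return false;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A caller could run this check on the -o argument and the hint file argument before opening either file. Platforms without meaningful inode numbers would need another approach, such as comparing canonicalized paths.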

@masatake masatake force-pushed the multi-pass branch 2 times, most recently from 7341933 to 4aef85d Compare December 27, 2020 17:49
@masatake
Member Author

I applied --_hint-file= to our C parser.

$ cat macro.h 
#define DEF(fn, rtype, signature, body)	\
	rtype fn signature BEGIN body END
$ u-ctags '--fields=+{language}{signature}' '--fields-C++=+{macrodef}' -o hint.tags macro.h 
$ cat hint.tags
!_TAG_FILE_FORMAT	2	/extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED	1	/0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_EXCMD	mixed	/number, pattern, mixed, or combineV2/
!_TAG_OUTPUT_FILESEP	slash	/slash or backslash/
!_TAG_OUTPUT_MODE	u-ctags	/u-ctags or e-ctags/
!_TAG_PATTERN_LENGTH_LIMIT	96	/0 for no limit/
!_TAG_PROC_CWD	/home/jet/var/ctags-new/Units/parser-c.r/macrodef-hint-file.d/	//
!_TAG_PROGRAM_AUTHOR	Universal Ctags Team	//
!_TAG_PROGRAM_NAME	Universal Ctags	/Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL	https://ctags.io/	/official site/
!_TAG_PROGRAM_VERSION	5.9.0	/f35a3944/
DEF	macro.h	/^#define DEF(/;"	d	language:C++	signature:(fn,rtype,signature,body)	macrodef:rtype fn signature BEGIN body END
$ cat input.c 
#include "macro.h"

#define BEGIN {
#define END   }

DEF(add2, int, (int a, int b), a + b)
$ u-ctags --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -o - input.c 
BEGIN	input.c	/^#define BEGIN /;"	d	file:
END	input.c	/^#define END /;"	d	file:
add2	input.c	/^DEF(add2, int, (int a, int b), a + b)$/;"	f	typeref:typename:int
$

I used the ctags command with my experimental patch to make tags for the QEMU source code that I'm reading now.

$ time u-ctags '--fields=+{language}{signature}' '--fields-C++=+{macrodef}' -o hint.tags  -R
u-ctags '--fields=+{language}{signature}' '--fields-C++=+{macrodef}' -o  -R  2.03s user 0.13s system 98% cpu 2.196 total
$ time u-ctags --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R
u-ctags --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R  99.66s user 41.11s system 99% cpu 2:21.08 total

About 64 times slower in wall-clock time (2.196 s vs. 2:21.08).

@masatake
Member Author

I implemented a negative hint cache.

$ time ~/bin/u-ctags --fields='+{line}' --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R
time ~/bin/u-ctags --fields='+{line}' --param-CPreProcessor:_expand=1 --_hint-file=hint.tags -R

real	0m20.691s
user	0m15.114s
sys	0m5.488s

This is about 7 times faster than the version without the negative hint cache.
However, it is still about 10 times slower than running without a hint file.
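
For readers unfamiliar with the term, a negative cache remembers names that were already looked up and not found, so the expensive lookup is not repeated for them. The sketch below shows the general technique only; this conversation does not show how the cache in this pull request is implemented. All names (`NegCache`, `cached_lookup`, `demo_lookup`) are hypothetical, and `demo_lookup` is a stand-in for a real query against the hint file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NEG_CACHE_SLOTS 1024

/* Fixed-size open-addressing set of names known to be absent from the hints. */
typedef struct {
    char *names[NEG_CACHE_SLOTS]; /* NULL marks an empty slot */
    size_t used;
} NegCache;

static size_t hash_name(const char *s) /* djb2 string hash */
{
    size_t h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

bool neg_cache_contains(const NegCache *c, const char *name)
{
    size_t i = hash_name(name) % NEG_CACHE_SLOTS;
    for (size_t probes = 0; probes < NEG_CACHE_SLOTS && c->names[i]; probes++) {
        if (strcmp(c->names[i], name) == 0)
            return true;
        i = (i + 1) % NEG_CACHE_SLOTS;
    }
    return false;
}

void neg_cache_add(NegCache *c, const char *name)
{
    /* Keep the table at most half full; skipping an insert is harmless
     * because it only means the slow lookup may run again later. */
    if (c->used >= NEG_CACHE_SLOTS / 2 || neg_cache_contains(c, name))
        return;
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return;
    memcpy(copy, name, len);
    size_t i = hash_name(name) % NEG_CACHE_SLOTS;
    while (c->names[i])
        i = (i + 1) % NEG_CACHE_SLOTS;
    c->names[i] = copy;
    c->used++;
}

void neg_cache_free(NegCache *c)
{
    for (size_t i = 0; i < NEG_CACHE_SLOTS; i++) {
        free(c->names[i]);
        c->names[i] = NULL;
    }
    c->used = 0;
}

/* Signature of the expensive lookup, e.g. a search in the hint file. */
typedef bool (*SlowLookup)(const char *name, void *data);

/* Consult the negative cache first; run the slow lookup only on a cache miss
 * and remember the name when the slow lookup finds nothing. */
bool cached_lookup(NegCache *c, const char *name, SlowLookup slow, void *data)
{
    assert(c != NULL && name != NULL && slow != NULL);
    if (neg_cache_contains(c, name))
        return false;
    if (slow(name, data))
        return true;
    neg_cache_add(c, name);
    return false;
}

/* Stand-in slow lookup: only "DEF" exists; `data` counts how often it ran. */
bool demo_lookup(const char *name, void *data)
{
    (*(int *)data)++;
    return strcmp(name, "DEF") == 0;
}
```

Most identifiers in a C source tree are not macros recorded in the hint file, so the "not found" case dominates; that is why caching misses rather than hits can pay off in a workload like the one measured above.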

@masatake
Member Author

This is related to #1960.

This change is preparation for adding features that query
existing tags files during parsing.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…e name defined in libreadtags

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
An option, --_hint-file=<tags-file> is also added.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…le is given

As the first pass, make a tags file with --fields=+{language}.
In the second pass, specify the tags file created in the first pass
with --_hint-file=<tags file>.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…a hint file

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
… hint file

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
… not in a hint file

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
… hint file

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
… language doesn't enable symtab

The LDScript and Asm parsers use the CPreProcessor parser code.
However, they don't turn on their symtabs because they are not ready to utilize
the macro expansion feature. If CPreProcessor called symtab-related functions, ctags crashed
because there was no symtab.

With this change, ctags can avoid the crash. The CPreProcessor parser calls the symtab-related
functions only when the client language enables its symtab.

TODO: TEST CASE IS NEEDED.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake
Member Author

e5d7ced must be included in ctags6.
