feature: clean up stale tags when appending (-a) #1421

Open
aktau opened this issue May 29, 2017 · 5 comments
Labels
FAQ This should be part of FAQ.

Comments

@aktau

aktau commented May 29, 2017

As referenced in #1420. Ideally both features would be implemented for an enormous speed up and increase in ease-of-use (reduction in duplicated code all over the place). The FR:

  • Filter out tags belonging to files that no longer exist when updating a tags file (in append mode).
  • Filter out (stale) tags belonging to the file that is currently being updated.

Because Exuberant/Universal Ctags supports neither of these, many tool developers are forced to implement something like it themselves when regenerating (parts of) their tags files. Some examples:

This would be a natural extension of the append functionality, IMHO.
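
The first bullet above can be approximated today outside ctags. A minimal sketch, assuming the usual tab-separated tags format with the input path in the second field, and assuming the paths are relative to the current directory; `prune_missing` is a hypothetical helper name, not something ctags provides:

```shell
#!/bin/sh
# Sketch: print a tags file without the entries whose source file is gone.
# Assumptions: tab-separated lines, path in field 2, paths relative to $PWD.
# prune_missing is a hypothetical helper, not a ctags feature.
prune_missing() {
    tags_file=$1
    missing=$(mktemp) || return 1

    # Collect every distinct input path (skipping !_TAG_ pseudo-tags)
    # and keep only the paths that no longer exist on disk.
    awk -F '\t' '!/^!_TAG_/ { print $2 }' "$tags_file" | sort -u |
    while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done > "$missing"

    # Load the missing paths, then print every line whose path is not
    # among them. Pseudo-tags are always kept.
    awk -F '\t' -v missing="$missing" '
        BEGIN { while ((getline line < missing) > 0) gone[line] = 1 }
        /^!_TAG_/ || !($2 in gone)
    ' "$tags_file"

    rm -f "$missing"
}
```

The result goes to standard output, so a caller would redirect it to a new file (for example `prune_missing tags > tags.new`) and rename that over the old one. Testing each distinct path once, rather than once per tag, keeps the number of file system checks small.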

@aktau
Author

aktau commented May 29, 2017

Another example of someone who implemented tags cleaning: https://github.com/kainz/incremental-ctags-hooks.

Citation:

#Implementation OK, this is the ugly part. We temporarily write and execute a C program to filter 'removed' files from your ctags, then feed the remainder of changed files as seen by git-diff into a ctags --append. While this is incredibly ugly, it is orders of magnitude faster than awk/sed/perl, and about 10-20% faster than CPython on my tests involving an approximately 80MB tags file.

@masatake
Member

masatake commented May 5, 2020

Now that libreadtags can be linked into ctags, it could be used to do this filtering inside ctags itself.

An alternative approach is to introduce a new command such as linktags, ldtags, or edittags. More study is needed. I'm working on this topic very slowly.

@masatake
Member

masatake commented May 5, 2020

Making tags for the whole source tree (Linux kernel):

% /bin/time -p u-ctags --options=NONE --fields=+KZz -o linux.tags -R code/linux                    
u-ctags: Notice: No options will be read from files or environment
real 54.79
user 54.23
sys 2.82

Dropping the tags for code/linux/block/bfq-iosched.c (here I assume this is the file you edited):

%  { readtags -t linux.tags -D; /bin/time -p readtags -t linux.tags -en -Q '(not (eq? $input "code/linux/block/bfq-iosched.c"))' -l } > filtered.tags
real 8.34
user 7.79
sys 0.52

Tagging code/linux/block/bfq-iosched.c again:

% /bin/time -p u-ctags --options=NONE --fields=+KZz -o part.tags -R code/linux/block/bfq-iosched.c
u-ctags: Notice: No options will be read from files or environment
real 0.01
user 0.00
sys 0.00

Merging filtered.tags and part.tags:

% LC_COLLATE=C LC_ALL=C /bin/time -p sort -u --parallel=4 linux.tags part.tags > tags
real 1.04
user 1.09
sys 0.60

Updating the tags file this way takes about 9.5s,
about 5 times faster than full parsing. I wonder whether I can improve the performance of the -Q option further.
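
The estimate can be checked from the "real" timings reported above. A small sketch that only redoes the arithmetic; the variable names are made up for illustration:

```shell
#!/bin/sh
# "real" seconds copied from the four timed runs above.
full=54.79; filter=8.34; retag=0.01; merge=1.04

summary=$(awk -v full="$full" -v a="$filter" -v b="$retag" -v c="$merge" 'BEGIN {
    total = a + b + c   # filter + retag + merge
    printf "incremental: %.2f s, speedup: %.1fx", total, full / total
}')
echo "$summary"
# prints: incremental: 9.39 s, speedup: 5.8x
```

Most of the incremental cost (8.34 of 9.39 seconds) is the readtags -Q filtering step, which is why that step is the obvious one to optimize.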

For comparison, just listing all tags with readtags (no -Q filter):

%  { readtags -t linux.tags -D; /bin/time -p readtags -t linux.tags -en  -l } > just-listing.tags
real 3.54
user 3.06
sys 0.45

@aktau
Author

aktau commented May 5, 2020

A couple of questions:

  1. Why does your last step use sort instead of u-ctags --append? Is it because there are no domain-specific optimizations and thus the speed is similar?
  2. The approach you're using (filter ; retag-partial ; merge) seems very similar to what I mentioned in #1421 (comment). It looks like your tool implements (interprets) some Lisp-like language. Could you try something like grep --fixed-strings -v 'code/linux/block/bfq-iosched.c' and see if there is a slowdown or speedup or a difference in the results? grep is highly optimized for this sort of use-case, and I would be somewhat surprised if readtags is faster.
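
One way the results could differ: grep matches the path anywhere on the line, while a comparison against the input field (as the `$input` expression does) matches only the exact path. A small sketch on a synthetic tags file; the tag names, paths, and the two helper functions are made up for illustration:

```shell
#!/bin/sh
# Two hypothetical helpers that print a tags file without one input file.
drop_by_grep()  { grep -F -v -- "$2" "$1"; }   # substring match, whole line
drop_by_field() {                              # exact match on field 2
    TARGET=$2 awk -F '\t' '/^!_TAG_/ || $2 != ENVIRON["TARGET"]' "$1"
}

# Synthetic three-entry tags file (name, path, line-number address).
demo=$(mktemp) || exit 1
printf '%s\t%s\t%s\n' \
    a block/bfq.c      '1;"' \
    b block/bfq.c.orig '1;"' \
    c net/core.c       '1;"' > "$demo"

echo "grep keeps:  $(drop_by_grep  "$demo" block/bfq.c | cut -f1 | paste -sd ' ' -)"
echo "field keeps: $(drop_by_field "$demo" block/bfq.c | cut -f1 | paste -sd ' ' -)"
# prints:
#   grep keeps:  c
#   field keeps: b c
```

Here grep also drops the entry for block/bfq.c.orig because its path contains the search string. Whether that matters depends on the tree. When the over-matched files still exist, their tags are silently lost until those files are retagged.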

@masatake
Member

masatake commented May 5, 2020

Thank you for the comments.

Why does your last step use sort instead of u-ctags --append? Is it because there are no domain-specific optimizations and thus the speed is similar?

No optimization here.
I'm not familiar with the option, so I simply forgot about it.

% cp filtered.tags filtered2.tags
% /bin/time -p u-ctags --options=NONE --fields=+KZz --append filtered2.tags  code/linux/block/bfq-iosched.c
u-ctags: Notice: No options will be read from files or environment
real 1.51
user 1.74
sys 0.89

As I expected, the timing does not change much. However, it removes a step. Thank you.

The approach you're using (filter ; retag-partial ; merge) seems very similar to what I mentioned in #1421 (comment). It looks like your tool implements (interprets) some Lisp-like language. Could you try something like grep --fixed-strings -v 'code/linux/block/bfq-iosched.c' and see if there is a slowdown or speedup or a difference in the results? grep is highly optimized for this sort of use-case, and I would be somewhat surprised if readtags is faster.

You are correct.

% /bin/time -p grep --fixed-strings -v 'code/linux/block/bfq-iosched.c' tags > filtered-2.tags             
real 0.77
user 0.47
sys 0.30

grep is much faster than readtags -Q.
For updating alone, a wrapper shell script is the best approach. I should write this up as a tip in the FAQ.
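
Such a wrapper might look roughly like the following. This is a sketch under assumptions, not the FAQ entry: `update_tags` and the `CTAGS` variable are hypothetical names, the filter is the substring-based grep from above, and the tags file is passed to ctags explicitly with `-f` together with `--append`.

```shell
#!/bin/sh
# Sketch: refresh the tags of a single source file in an existing tags file.
# update_tags and CTAGS are hypothetical names chosen for this example.
update_tags() {
    tags_file=$1
    src=$2
    tmp="$tags_file.tmp.$$"   # same directory, so the final mv is a rename

    # Drop the stale entries. grep exits with 1 when it prints nothing,
    # which is fine here (every tag may belong to $src); >1 is an error.
    grep -F -v -- "$src" "$tags_file" > "$tmp"
    [ $? -le 1 ] || { rm -f "$tmp"; return 1; }

    # Let ctags add fresh entries for $src to the filtered file.
    if "${CTAGS:-ctags}" --append -f "$tmp" "$src"; then
        mv "$tmp" "$tags_file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as `update_tags tags code/linux/block/bfq-iosched.c`, with any extra options (such as the `--fields=+KZz` used above) added to the ctags invocation so they match the ones used for the full run. Working on a temporary copy means an interrupted update leaves the original tags file untouched.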

I'm thinking about adding a feature to ctags for reading tags files, so that it can expand C preprocessor macros during parsing.
The current implementation (#2427) allows expanding macros defined in the same input file.
If ctags can read macro definitions stored in tags files, ctags can overcome the limitation "in the same input file".

@masatake masatake added the FAQ This should be part of FAQ. label May 5, 2020
hirooih added a commit to hirooih/ctags that referenced this issue Feb 8, 2021
- new sections:
  - Does Universal Ctags support Unicode file names?
  - Why does zsh cause "zsh: no matches found" error?
- add TODO comment for universal-ctags#1421, universal-ctags#2356, and universal-ctags#2540
hirooih added a commit to hirooih/ctags that referenced this issue Feb 9, 2021
- new sections:
  - Does Universal Ctags support Unicode file names?
  - Why does zsh cause "zsh: no matches found" error?
- add TODO comment for universal-ctags#1421, universal-ctags#2356, and universal-ctags#2540