make ctags a library #63

masatake · 2014-08-07T07:50:08Z

"Library" in ctags may have 3 aspects:

1. reading tags file,
2. backend language parser, and
3. running ctags.

The scope of this issue is "3. running ctags."
We have to research Geany.

Reducing the number of global variables will be the first step.

fishman · 2014-08-07T08:04:54Z

i agree with you. maybe @b4n can chime in. i think he mentioned the issues in a prior post.

Btw, @b4n would you be fine with becoming a maintainer here as well?

masatake · 2014-10-09T08:12:43Z

@b4n, who is the primary maintainer of geany/tagmanager/ctags?
Is it you?
If not, could you introduce me to that person?

I'm looking for a way to share our efforts.

My idea is to split the source code of fishman/ctags into 3 parts:

1. parsers
2. core part shared between fishman/ctags and Geany
3. core part that is not shared, i.e. fishman/ctags' own part

Of course "parsers" is the initial target for merging.
Currently the source tree of fishman/ctags is very flat; all .c files are in the top directory.
I will introduce a new directory named "parsers" and move all parser .c files into it.
This is the fishman/ctags side.

On the Geany side, my request is:
Could you introduce "geany / tagmanager / ctags / parsers" and move all parser .c files into that directory?

Then we can evaluate, compare and exchange "parsers" code with each other.
Soon the code under parsers will be unified auto-magically.

What do you think of this approach?

vhda · 2014-10-09T08:56:45Z

Personally, I would not move files to avoid issues while merging work with sourceforge updates. I've seen git deal quite well with moved files, but it might be safer to keep files where they are for now unless really necessary.

ffes · 2014-10-09T09:40:50Z

From what I can see, Geany just included the files they need (base and parsers), modified them and set the variables/options to their need. But @b4n should be able to tell us more.

Keeping the parsers synced in some way would be great and we both would benefit from that.

As I wrote in #83, Geany modifies their kindOptions. For instance, compare the two Pascal parsers. We have two types, "function" and "procedure", and Geany has more or less merged them into "function". This kind of adjustment appears in other parsers as well. I guess this is to create a more uniform way for them to show the tags. But again @b4n should be able to tell us more.

masatake · 2014-10-09T10:16:52Z

@vhda, yes, you are right. We can send the same proposal, introducing "parsers" directory to exuberant-ctags.

If there is a conceptual gap between Geany and fishman-ctags, like the kindOption one reported by @ffes, I will work on the Geany side to fill the gap.

Anyway, let's think about the core and the parsers separately. I believe the efforts for parsers can be shared easily.

b4n · 2014-10-09T12:57:14Z

Sorry for the delay. At the time I didn't have time to answer correctly, and then I forgot about it… err.

> The scope of this issue is "3. running ctags."
> We have to research Geany.

What Geany does (through TagManager, which was a very old library used by Anjuta a long time ago AFAIK) is simply call into LanguageTable[n]->parser() from parse.c, pretty much like createTagsForFile() does -- see tm_source_file_parse(). The glue takes the form of two global hooks added to parse.c, TagEntryFunction and TagEntrySetArglistFunction.
If we want to check the differences here more precisely we'd need to diff the two. I'm not familiar enough with the upstream version of these files to know all the details (nor, honestly, am I with our own; they predate my arrival in the project by a long time and have mostly just worked since then).
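
To illustrate the hook idea, here is a minimal sketch in C. It is not Geany's or ctags' actual code: `toy_tag`, `toy_tag_hook`, `toy_set_hook`, `toy_parse` and the "fn " line syntax are all hypothetical names made up for this example. It only shows the general shape, where the parser reports each tag through a globally installed callback instead of writing a tags file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified tag record (not the real ctags struct). */
typedef struct {
    const char *name;
    unsigned long line;
} toy_tag;

/* Hypothetical hook type, loosely modelled on the idea of a tag-entry hook. */
typedef int (*toy_tag_hook)(const toy_tag *tag, void *user_data);

/* Global hook state: exactly the kind of global discussed in this thread. */
static toy_tag_hook g_hook = NULL;
static void *g_hook_data = NULL;

void toy_set_hook(toy_tag_hook hook, void *user_data)
{
    g_hook = hook;
    g_hook_data = user_data;
}

/* Called by the parser for every tag; forwards it to the installed hook. */
static void toy_make_tag(const char *name, unsigned long line)
{
    toy_tag tag = { name, line };
    if (g_hook != NULL)
        g_hook(&tag, g_hook_data);
}

/* Toy parser: every line of the form "fn NAME" yields one tag. */
void toy_parse(const char *text)
{
    unsigned long line = 1;
    const char *p = text;

    assert(text != NULL);
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        if (len > 3 && strncmp(p, "fn ", 3) == 0) {
            char name[64];
            size_t n = len - 3;
            if (n >= sizeof name)
                n = sizeof name - 1;
            memcpy(name, p + 3, n);
            name[n] = '\0';
            toy_make_tag(name, line);
        }
        if (eol == NULL)
            break;
        p = eol + 1;
        line++;
    }
}

/* Example consumer: counts tags and remembers the last line seen. */
typedef struct {
    int count;
    unsigned long last_line;
} toy_stats;

int toy_collect(const toy_tag *tag, void *user_data)
{
    toy_stats *stats = user_data;
    stats->count++;
    stats->last_line = tag->line;
    return 0;
}
```

An application would call `toy_set_hook(toy_collect, &stats)` once and then `toy_parse()` for each buffer. Because the hook is stored in a global, only one consumer can be active at a time.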

We however support in-memory parsing, to be able to update displayed tags as the user types without having to write temporary files all the time. For this purpose, I wrote a tiny library named MIO, which supports file or memory-backed I/O with very low overhead and an API very close to the C FILE API to simplify porting.
MIO currently depends on GLib, mostly for a vsnprintf() implementation in C89, but that dependency can be removed relatively easily, especially if using C99 is not an issue (and Geany switched to C99 since then, so we would be just fine with it too). BTW, I have a not-yet-pushed set of changes to make the GLib dependency optional.
Geany bundles a copy of MIO, as AFAIK it's the only app that uses it, but if CTags wants to use it it can either do the same (as Geany bundles CTags we wouldn't mind anyway), or could link to it, either way is fine. MIO is currently licensed under GPLv2+, but it'd be easy for me to re-license it if required by CTags, as I'm the only author.
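
As a rough illustration of file-or-memory backed I/O behind a FILE-like call, here is a minimal sketch. It is not MIO's API: the `tio` type and the `tio_*` functions are hypothetical names invented for this example, and only a `getc`-style read is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stream type: either wraps a FILE or a memory buffer. */
typedef enum { TIO_FILE, TIO_MEMORY } tio_kind;

typedef struct {
    tio_kind kind;
    FILE *fp;                  /* used when kind == TIO_FILE */
    const unsigned char *buf;  /* used when kind == TIO_MEMORY */
    size_t size;
    size_t pos;
} tio;

tio tio_from_file(FILE *fp)
{
    tio io = { TIO_FILE, fp, NULL, 0, 0 };
    assert(fp != NULL);
    return io;
}

tio tio_from_memory(const void *buf, size_t size)
{
    tio io = { TIO_MEMORY, NULL, buf, size, 0 };
    assert(buf != NULL || size == 0);
    return io;
}

/* Same contract as getc(): returns the next byte, or EOF at the end. */
int tio_getc(tio *io)
{
    if (io->kind == TIO_FILE)
        return getc(io->fp);
    if (io->pos >= io->size)
        return EOF;
    return io->buf[io->pos++];
}

/* A consumer written against tio_getc() works with both backends. */
size_t tio_count_lines(tio *io)
{
    size_t lines = 0;
    int c;

    while ((c = tio_getc(io)) != EOF)
        if (c == '\n')
            lines++;
    return lines;
}
```

A parser written against such an interface does not need to know whether it is reading a file on disk or the editor's current buffer, which is the property needed for updating tags as the user types.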

@fishman

> Btw, @b4n would you be fine with becoming a maintainer here as well?

Sure, for the parsers part I'd be happy to. I already do most of the work on Geany's side, so I can do it here too.

@masatake

> @b4n, who is the primary maintainer of geany/tagmanager/ctags?
> you?

I am, mostly. I'm the maintainer for Geany, and the most active developer on the CTags part.

> I'm looking for a way to share our efforts.
>
> My idea is to split the source code of fishman/ctags into 3 parts:
>
> 1. parsers
> 2. core part shared between fishman/ctags and Geany
> 3. core part that is not shared, i.e. fishman/ctags' own part
>
> Of course "parsers" is the initial target for merging.

That would be awesome of course; the less our tree diverges from upstream, the easier it is to import changes in both directions.
The parsers are already mostly in sync.
The next part that could be shared is the logic for calling into parsers and getting the results. I'm not sure the current way we have in Geany is really generic/clean enough to be a good design for a reusable library, but it would at least help in seeing what was useful for us.

> Currently the source tree of fishman/ctags is very flat; all .c files are in the top directory.
> I will introduce a new directory named "parsers" and move all parser .c files into it.
> This is the fishman/ctags side.
>
> On the Geany side, my request is:
> Could you introduce "geany / tagmanager / ctags / parsers" and move all parser .c files into that directory?

We could very well do that of course, and it would slightly help in differentiating the files, but that's probably not the worst part of the job :)

@ffes

> From what I can see, Geany just included the files they need (base and parsers), modified them and set the variables/options to their need. But @b4n should be able to tell us more.

That's it, basically. We try to make as few modifications as we can so that importing upstream changes is easier, so again, we should be mostly in sync here, except for a few parsers.

> Keeping the parsers synced in some way would be great and we both would benefit from that.

Definitely.

> As I wrote in #83, Geany modifies their kindOptions. For instance, compare the two Pascal parsers. We have two types, "function" and "procedure", and Geany has more or less merged them into "function". This kind of adjustment appears in other parsers as well. I guess this is to create a more uniform way for them to show the tags. But again @b4n should be able to tell us more.

Yes, we change the kind names to be able to map them easily to generic entity types (TMTagType). CTags parsers are really not unified in this regard (which may or may not make sense, but some languages do have other names for things), and we need to be able to recognize at least some types, like functions/methods/prototypes, no matter what they are called in a particular language.

There could be other solutions to this mapping problem; for example, we could probably have a mapping table that translates those kinds on our side rather than in the parser itself.
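
A mapping table of that sort could look roughly like the sketch below. Everything here is hypothetical: `generic_type` merely stands in for something like TMTagType, and `kind_map` and `map_kind` are invented names. Only the Pascal "function"/"procedure" pair is taken from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic entity types on the application side. */
typedef enum {
    GENERIC_UNKNOWN,
    GENERIC_FUNCTION
} generic_type;

/* One row: a language-specific kind name and the generic type it maps to. */
typedef struct {
    const char *language;
    const char *kind;
    generic_type type;
} kind_mapping;

/* The parsers keep their own kind names; the translation lives here. */
static const kind_mapping kind_map[] = {
    { "Pascal", "function",  GENERIC_FUNCTION },
    { "Pascal", "procedure", GENERIC_FUNCTION },
};

generic_type map_kind(const char *language, const char *kind)
{
    size_t i;

    assert(language != NULL && kind != NULL);
    for (i = 0; i < sizeof kind_map / sizeof kind_map[0]; i++) {
        if (strcmp(kind_map[i].language, language) == 0 &&
            strcmp(kind_map[i].kind, kind) == 0)
            return kind_map[i].type;
    }
    return GENERIC_UNKNOWN;  /* kinds without a mapping stay unclassified */
}
```

With this layout the parser source can stay identical on both sides, and only the table differs per application.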

masatake · 2014-10-09T17:45:35Z

@b4n, thank you for your comments.
About the "parsers" directory, I will work on the fishman/ctags side first at #88 and I will show you the prototype(?). Could you wait for a while?

I like the idea of mapping table. I call it "flavour".

I would like to hear from you about two more topics.

1. How can I decode the *.tags files under geany / tests / ctags? They may be the same thing as expected.tags in fishman/ctags.
2. Did you do something about static variables in Geany? To use ctags as a library, I guess you might want to decrease global variables, file static variables and function static variables, especially if you want to use the library with pthread. I would like to hear your experience in this area. This issue is the biggest reason I didn't talk with you about merging and/or sharing the code and efforts between fishman/ctags and Geany. I guess there were big gaps between fishman/ctags and Geany.

b4n · 2014-10-12T21:26:17Z

> About the "parsers" directory, […] Could you wait for a while?

Sure, no hurry.

> How can I decode the *.tags files under geany / tests / ctags? They may be the same thing as expected.tags in fishman/ctags.

They are, they are the expected results from the parser. However, Geany uses the TagManager format, which introduces fields with binary identifiers. See tm_tag_write() in TagManager sources.

> Did you do something about static variables in Geany? To use ctags as a library, I guess you might want to decrease global variables, file static variables and function static variables, especially if you want to use the library with pthread. I would like to hear your experience in this area. This issue is the biggest reason I didn't talk with you about merging and/or sharing the code and efforts between fishman/ctags and Geany. I guess there were big gaps between fishman/ctags and Geany.

I honestly don't really know the state of the core ctags we use, but it surely has its fair share of global variables. We corrected some of the global variable issues in some parsers, but only to make them re-usable, because some had problems when called for the second and subsequent times; but I believe CTags also suffers from this when parsing multiple files, it may just be a little less obvious -- and I also believe most of these issues have been fixed here too.
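
To make the "second call" problem concrete, here is a small sketch. It is not taken from any real parser: `scan_state` and `scan_max_depth` are hypothetical. The point is only that per-run state is kept in a struct that is re-initialised on every call, rather than in static variables that would keep their values from the previous file.

```c
#include <assert.h>
#include <stddef.h>

/* All per-run state lives here instead of in static variables. */
typedef struct {
    int depth;      /* current brace nesting */
    int max_depth;  /* deepest nesting seen during this run */
} scan_state;

static void scan_state_init(scan_state *st)
{
    st->depth = 0;
    st->max_depth = 0;
}

/* Returns the deepest brace nesting found in text. */
int scan_max_depth(const char *text)
{
    scan_state st;
    const char *p;

    assert(text != NULL);
    scan_state_init(&st);  /* fresh state on every call */

    for (p = text; *p != '\0'; p++) {
        if (*p == '{') {
            if (++st.depth > st.max_depth)
                st.max_depth = st.depth;
        } else if (*p == '}' && st.depth > 0) {
            st.depth--;
        }
    }
    return st.max_depth;
}
```

If `depth` were a file-static variable that is never reset, an unbalanced first input such as `"{ {"` would leave it at 2, and the next input would start from there and report a wrong result.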

About threading, yes global variables are indeed a problem. However, my (little) experience trying to run parsers in threads showed that it was mostly good enough to use one single worker thread calling all parsers, and for this global variables aren't a problem.
Actually in Geany we still (for now) parse in the main thread because parsing is actually not our biggest performance concern: managing all the parsed tags from various sources (open files, tag files, …) so we can query for them showed to be a lot more costly than running the parsers themselves (which are quite fast).

So I believe that threading is not the primary concern for creating a CTags library. I think that, at least at first, it would be good enough for the application to use it from one single thread; whether that is the main thread or a dedicated one is the application's choice anyway.

Of course it'd be a different story if you wanted to provide a library that also manages the set of parsed tags and allows manipulating them (search for a tag of a particular name, tags with a particular prefix, children of a tag, …), in which case allowing this to be multi-threaded out-of-the-box would indeed be awesome.

Instead of qsort() it's possible to use g_ptr_array_sort_with_data() with similar performance. Unfortunately it seems there's no bsearch_with_data() anywhere so the patch uses a modified bsearch() implementation from libc (still probably cleaner than passing arguments using static variables).
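
For illustration, a binary search that passes user data to the comparison function might look like the sketch below. This is not the patch mentioned above and not a libc or GLib function: `bsearch_with_data`, `compare_data_fn` and `compare_prefix` are hypothetical names, and the implementation is a plain textbook binary search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Like the bsearch() comparator, plus a caller-supplied context pointer. */
typedef int (*compare_data_fn)(const void *key, const void *elem,
                               void *user_data);

/* Searches a sorted array; returns a matching element or NULL. */
void *bsearch_with_data(const void *key, const void *base, size_t nmemb,
                        size_t size, compare_data_fn cmp, void *user_data)
{
    const char *lo = base;

    assert(cmp != NULL);
    while (nmemb > 0) {
        size_t half = nmemb / 2;
        const char *mid = lo + half * size;
        int r = cmp(key, mid, user_data);

        if (r == 0)
            return (void *)mid;
        if (r > 0) {
            lo = mid + size;    /* continue in the upper part */
            nmemb -= half + 1;
        } else {
            nmemb = half;       /* continue in the lower part */
        }
    }
    return NULL;
}

/* Example comparator: compares only the first *user_data characters. */
int compare_prefix(const void *key, const void *elem, void *user_data)
{
    const char *prefix = key;
    const char *name = *(const char *const *)elem;
    size_t len = *(const size_t *)user_data;

    return strncmp(prefix, name, len);
}
```

Here the prefix length travels through `user_data`, so the comparator needs no static variable and two searches with different settings cannot interfere with each other.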

dtikhonov · 2015-07-24T20:57:42Z

@b4n, why not fmemopen(3) instead of MIO?

b4n · 2015-07-24T21:20:48Z

@dtikhonov Well, first, I didn't know about it :) Then, and probably a reason why I didn't find out about it at the time (2011), it's not very portable (it is in POSIX.1-2008 but not before; otherwise it seems to need _GNU_SOURCE).

So it'd be very nice, but I'm really not sure it's an acceptable dependency (would have to be checked on all platforms, also for consistency as if it's pre-POSIX, it's likely to have slight inconsistencies -- and we can't really reimplement it where missing as FILE is totally opaque).
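
For reference, this is roughly what reading a memory buffer through fmemopen() looks like on a system that provides it (POSIX.1-2008). The helper name `count_lines_in_memory` is hypothetical, and the sketch says nothing about platforms where fmemopen() is missing, which is the portability concern raised above.

```c
/* Request POSIX.1-2008 declarations; must come before any #include. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Counts newlines in a buffer by reading it through a FILE stream. */
size_t count_lines_in_memory(char *buf, size_t size)
{
    FILE *fp;
    size_t lines = 0;
    int c;

    assert(buf != NULL);
    fp = fmemopen(buf, size, "r");  /* the stream reads from buf */
    if (fp == NULL)
        return 0;

    /* From here on, ordinary stdio calls work unchanged. */
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;

    fclose(fp);
    return lines;
}
```

The appeal is that existing code written against FILE needs no changes; the cost is the dependency on a function that not every C library offers.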

elextr · 2015-10-29T08:17:10Z

@b4n, isn't another reason for not parsing in another thread to avoid having to make a copy of the Scintilla buffer? ATM the parser blocks Scintilla, so it can't move the buffer (by re-allocating it) or the gap during a parse.

masatake · 2019-05-22T02:09:49Z

libctags.a has been introduced. Though it is not a generic library, I think I can close this now.
Refining the API for Geany is the next step.

Import changes from kkos/oniguruma: * 928e08e6d898b72e7ac473263df76d480111199b Related: universal-ctags#63


masatake changed the title ~~make ctags can be used as a library~~ make ctags a library Aug 7, 2014

masatake added the Core part label Aug 7, 2014

masatake added this to the Feature plan milestone Aug 12, 2014

masatake mentioned this issue Oct 9, 2014

Rearrange build process and directory organization #88

Closed

masatake mentioned this issue Apr 16, 2015

Move all parsers C code to parsers directory #286

Closed

masatake added the client tools/libctags/ctags api label Jun 19, 2015

masatake modified the milestones: Feature plan, Merging/sharing the code base with Geany's tag engine Jun 19, 2015

masatake mentioned this issue Jun 22, 2015

Make the hash table used in keywords.c work like a hash table #388

Merged

dtikhonov mentioned this issue Jul 24, 2015

Share parsers code with geany #459

Closed

techee mentioned this issue Apr 2, 2016

Porting memory I/O from Geany #863

Closed

codebrainz mentioned this issue Jul 14, 2016

Add interactive --json command mode #994

Closed

raphinesse mentioned this issue Jul 15, 2016

Factor out readtags as a standalone lib? #1040

Closed

codebrainz mentioned this issue Aug 23, 2016

New tagmanager query module geany/geany#1187

Open

codebrainz mentioned this issue Apr 25, 2019

Upstream or alternative solution for ctags regex parsers geany/geany#2119

Closed

2 tasks

masatake closed this as completed May 22, 2019

masatake pushed a commit to masatake/ctags that referenced this issue Mar 12, 2020

Don't use int_map_backward for thread-safe

f147c7f

