make ctags a library #63

Open
masatake opened this Issue Aug 7, 2014 · 11 comments

7 participants
@masatake
Member

masatake commented Aug 7, 2014

"Library" in ctags may have 3 aspects:

  1. reading tags file,
  2. backend language parser, and
  3. running ctags.

The scope of this issue is "3. running ctags."
We have to research Geany.

Decreasing global variables will be the initial step.

@masatake masatake changed the title from make ctags can be used as a library to make ctags a library Aug 7, 2014

@masatake masatake added the Core part label Aug 7, 2014

@fishman

Contributor

fishman commented Aug 7, 2014

I agree with you. Maybe @b4n can chime in; I think he mentioned the issues in a prior post.

Btw, @b4n would you be fine with becoming a maintainer here as well?

@masatake masatake added this to the Feature plan milestone Aug 12, 2014

@masatake

Member

masatake commented Oct 9, 2014

@b4n, who is the primary maintainer of geany/tagmanager/ctags?
Is it you?
If not, could you introduce me to that person?

I'm looking for a way to share our efforts.

My idea is to split the source code of fishman/ctags into 3 parts.

  1. parsers
  2. core part shared between fishman/ctags and Geany
  3. core part not shared: fishman/ctags' own part.

Of course "parsers" is the initial target for merging.
Currently the source tree of fishman/ctags is very flat; all .c files are in the top directory.
I will introduce a new directory named "parsers" and move all parser .c files into it.
This is the fishman/ctags side.

On the Geany side, my request is:
Could you introduce "geany / tagmanager / ctags / parsers" and move all parser .c files into that directory?

Then we can evaluate, compare and exchange "parsers" code with each other.
Soon the code under parsers will be unified auto-magically.

What do you think of this approach?

@vhda

Contributor

vhda commented Oct 9, 2014

Personally, I would not move files to avoid issues while merging work with sourceforge updates. I've seen git deal quite well with moved files, but it might be safer to keep files where they are for now unless really necessary.

@ffes

Member

ffes commented Oct 9, 2014

From what I can see, Geany just included the files they need (base and parsers), modified them and set the variables/options to their needs. But @b4n should be able to tell us more.

Keeping the parsers synced in some way would be great and we both would benefit from that.

As I wrote in #83, Geany modifies their kindOptions. For instance, compare the two Pascal parsers. We have two types, "function" and "procedure", and Geany has more or less merged them into "function". This kind of adjustment appears in other parsers as well. I guess this is to create a more uniform way for them to show the tags. But again @b4n should be able to tell us more.

@masatake

Member

masatake commented Oct 9, 2014

@vhda, yes, you are right. We can send the same proposal, introducing a "parsers" directory, to exuberant-ctags.

If there is a conceptual gap between Geany and fishman-ctags, like kindOption as reported by @ffes, I will work on the Geany side to fill the gap.

Anyway, let's think about core and parsers separately. I believe the efforts on parsers can be shared easily.

@b4n

Member

b4n commented Oct 9, 2014

Sorry for the delay. At the time I didn't have time to answer correctly, and then I forgot about it… err.

The scope of this issue is "3. running ctags."
We have to research Geany.

What Geany does (through TagManager, which was a very old library used by Anjuta a long time ago AFAIK) is simply call into LanguageTable[n]->parser() from parse.c, pretty much like createTagsForFile() does -- see tm_source_file_parse(). The glue takes the form of two global hooks added to parse.c, TagEntryFunction and TagEntrySetArglistFunction.
If we want to check the differences more precisely we'd need to diff the two. I'm not familiar enough with the upstream version of these files to know all the details (nor, honestly, am I with our own; they predate my arrival in the project by a long time and have mostly just worked since then).
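
A rough sketch of the hook pattern described above, in C. Every name here (`tag_entry`, `tag_entry_hook`, `parser_table`, `toy_parser`, `parse_with_hook`) is hypothetical and is not the actual Geany or ctags identifier or signature; it only illustrates a parser table whose entries report tags through a global callback instead of writing a tags file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag record; a real tag entry carries many more fields. */
typedef struct {
    const char *name;
    char kind;
} tag_entry;

/* Global hook the application installs before parsing (illustrative only). */
typedef void (*tag_entry_hook_fn)(const tag_entry *tag, void *user_data);
static tag_entry_hook_fn tag_entry_hook = NULL;
static void *tag_entry_hook_data = NULL;

/* Parsers call this instead of writing to a tags file. */
void emit_tag(const char *name, char kind)
{
    tag_entry tag = { name, kind };
    if (tag_entry_hook != NULL)
        tag_entry_hook(&tag, tag_entry_hook_data);
}

typedef void (*parser_fn)(const char *source);

/* Toy parser: reports one placeholder tag per "def " occurrence. */
void toy_parser(const char *source)
{
    const char *p = source;
    while ((p = strstr(p, "def ")) != NULL) {
        emit_tag("function", 'f');
        p += 4;
    }
}

static const parser_fn parser_table[] = { toy_parser };

/* Install the hook, run parser n on the source, then uninstall the hook. */
void parse_with_hook(size_t n, const char *source,
                     tag_entry_hook_fn hook, void *user_data)
{
    assert(n < sizeof parser_table / sizeof parser_table[0]);
    tag_entry_hook = hook;
    tag_entry_hook_data = user_data;
    parser_table[n](source);
    tag_entry_hook = NULL;
    tag_entry_hook_data = NULL;
}

/* Example hook: count the tags reported. */
void count_tags(const tag_entry *tag, void *user_data)
{
    (void)tag;
    ++*(int *)user_data;
}
```

Because the hook lives in global state, this design is only safe when one parse runs at a time, which matches the single-threaded use discussed in this thread.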

We however support in-memory parsing, to be able to update displayed tags as the user types without having to write temporary files all the time. For this purpose, I wrote a tiny library named MIO, which supports file or memory-backed I/O with very low overhead and an API very close to the C FILE API to simplify porting.
MIO currently depends on GLib, mostly for a vsnprintf() implementation in C89, but that dependency can be dropped relatively easily, especially if using C99 is not an issue (and Geany switched to C99 since then, so we would be just fine with it too). BTW, I have a not-yet-pushed set of changes to make the GLib dependency optional.
Geany bundles a copy of MIO, as AFAIK it's the only app that uses it, but if CTags wants to use it it can either do the same (as Geany bundles CTags we wouldn't mind anyway), or could link to it, either way is fine. MIO is currently licensed under GPLv2+, but it'd be easy for me to re-license it if required by CTags, as I'm the only author.
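
To illustrate the general idea of file-or-memory backed input behind a FILE-like API, here is a minimal sketch in C. It is not MIO's actual API: the `in_stream` type and the `in_stream_*` functions are hypothetical names made up for this example, and only `getc()`/`ungetc()` equivalents are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stream type: backed either by a FILE or by a memory buffer. */
typedef struct {
    FILE *fp;                 /* non-NULL for file-backed streams */
    const unsigned char *buf; /* used only when fp is NULL */
    size_t size;
    size_t pos;
} in_stream;

in_stream in_stream_from_file(FILE *fp)
{
    in_stream s = { fp, NULL, 0, 0 };
    return s;
}

in_stream in_stream_from_memory(const void *data, size_t size)
{
    in_stream s = { NULL, data, size, 0 };
    return s;
}

/* Mirrors getc(): returns the next byte, or EOF at end of input. */
int in_stream_getc(in_stream *s)
{
    assert(s != NULL);
    if (s->fp != NULL)
        return getc(s->fp);
    if (s->pos >= s->size)
        return EOF;
    return s->buf[s->pos++];
}

/* Mirrors ungetc() for the memory case by stepping back one byte.
 * Simplification: it assumes c is the byte that was last read. */
int in_stream_ungetc(int c, in_stream *s)
{
    assert(s != NULL);
    if (s->fp != NULL)
        return ungetc(c, s->fp);
    if (c == EOF || s->pos == 0)
        return EOF;
    s->pos--;
    return c;
}
```

A parser written against such an interface reads an editor buffer and an on-disk file through the same calls, so no temporary file is needed for in-memory parsing.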

@fishman

Btw, @b4n would you be fine with becoming a maintainer here as well?

Sure, for the parsers part I'd be happy to. I already do most of the work on Geany's side, so I can do it here too.

@masatake

@b4n, who is the primary maintainer of geany/tagmanager/ctags?
Is it you?

I am, mostly. I'm the maintainer for Geany, and the most active developer on the CTags part.

I'm looking for a way to share our efforts.

My idea is to split the source code of fishman/ctags into 3 parts.

  1. parsers
  2. core part shared between fishman/ctags and Geany
  3. core part not shared: fishman/ctags' own part.

Of course "parsers" is the initial target for merging.

That would be awesome of course, the less diverged our tree is from upstream the easier it is for importing in both directions.
The parsers are already mostly in sync.
The next part that could be shared is the logic for calling into parsers and getting the result. I'm not sure the current way we have in Geany is really generic/clean enough to be a good design for a reusable library, but it would at least help to see what was useful for us.

Currently the source tree of fishman/ctags is very flat; all .c files are in the top directory.
I will introduce a new directory named "parsers" and move all parser .c files into it.
This is the fishman/ctags side.

On the Geany side, my request is:
Could you introduce "geany / tagmanager / ctags / parsers" and move all parser .c files into that directory?

We could very well do that of course, and it would slightly help in differentiating the files, but that's probably not the worst part of the job :)

@ffes

From what I can see, Geany just included the files they need (base and parsers), modified them and set the variables/options to their needs. But @b4n should be able to tell us more.

That's it, basically. We try to make as few modifications as we can so importing upstream changes is easier, so again, we should be mostly in sync here, except for a few parsers.

Keeping the parsers synced in some way would be great and we both would benefit from that.

Definitely.

As I wrote in #83, Geany modifies their kindOptions. For instance, compare the two Pascal parsers. We have two types, "function" and "procedure", and Geany has more or less merged them into "function". This kind of adjustment appears in other parsers as well. I guess this is to create a more uniform way for them to show the tags. But again @b4n should be able to tell us more.

Yes, we change the kind names to be able to map them easily to generic entity types (TMTagType). CTags parsers are really not unified in this regard (which may or may not make sense, but some languages do have other names for things), and we need to be able to recognize at least some types, like functions/methods/prototypes, no matter what they are called in a particular language.

There could be other solutions to this mapping problem; for example, we could probably have a mapping table that translates those on our side rather than in the parser itself.
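
A minimal sketch of what such a mapping table might look like in C. The `generic_type` enum stands in for something like TMTagType, and the table entries, the `kind_mapping` struct and the `map_kind()` function are all hypothetical; the kind names listed are illustrative rather than an accurate inventory of any parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic entity types, standing in for something like TMTagType. */
typedef enum {
    GENERIC_UNKNOWN,
    GENERIC_FUNCTION,
    GENERIC_CLASS,
    GENERIC_VARIABLE
} generic_type;

typedef struct {
    const char *language;
    const char *kind_name;
    generic_type type;
} kind_mapping;

/* Illustrative entries only; a real table would cover every parser. */
static const kind_mapping kind_map[] = {
    { "Pascal", "function",  GENERIC_FUNCTION },
    { "Pascal", "procedure", GENERIC_FUNCTION },
    { "Python", "function",  GENERIC_FUNCTION },
    { "Python", "class",     GENERIC_CLASS },
    { "Python", "variable",  GENERIC_VARIABLE },
};

/* Translate a parser-specific kind name into a generic type on the
 * application side, leaving the parser's own kind names untouched. */
generic_type map_kind(const char *language, const char *kind_name)
{
    size_t i;
    assert(language != NULL && kind_name != NULL);
    for (i = 0; i < sizeof kind_map / sizeof kind_map[0]; i++) {
        if (strcmp(kind_map[i].language, language) == 0 &&
            strcmp(kind_map[i].kind_name, kind_name) == 0)
            return kind_map[i].type;
    }
    return GENERIC_UNKNOWN;
}
```

With a table like this, Pascal's "function" and "procedure" can both map to one generic type without the parser having to merge them.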

@masatake

Member

masatake commented Oct 9, 2014

@b4n, thank you for your comments.
About the "parsers" directory, I will work on the fishman/ctags side first at #88 and I will show you the prototype(?). Could you wait for a while?

I like the idea of a mapping table. I call it "flavour".

I would like to hear from you about two more topics.

  1. How can I decode the *.tags files under geany / tests / ctags? They may be the same thing as expected.tags in fishman/ctags.
  2. Did you do something about static variables in Geany?
    To use ctags as a library, I guess you might want to decrease global variables, file static variables and
    function static variables, especially if you want to use the library with pthread. I would like to hear about your
    experience in this area. This issue is the biggest reason I didn't talk with you about merging and/or
    sharing the code and efforts between fishman/ctags and Geany. I guess there were big gaps between
    fishman/ctags and Geany.
@b4n

Member

b4n commented Oct 12, 2014

About the "parsers" directory, […] Could you wait for a while?

Sure, no hurry.

  1. How can I decode the *.tags files under geany / tests / ctags? They may be the same thing as expected.tags in fishman/ctags.

They are, they are the expected results from the parser. However, Geany uses the TagManager format, which introduces fields with binary identifiers. See tm_tag_write() in TagManager sources.

Did you do something about static variables in Geany? To use ctags as a library, I guess you might want to decrease global variables, file static variables and function static variables, especially if you want to use the library with pthread. I would like to hear about your experience in this area. This issue is the biggest reason I didn't talk with you about merging and/or sharing the code and efforts between fishman/ctags and Geany. I guess there were big gaps between fishman/ctags and Geany.

I honestly don't really know the state of the core ctags we use, but it surely has its fair share of global variables. We corrected some of the global variable issues in some parsers, but only to make them re-usable, because some had problems when called a second or subsequent time; but I believe CTags also suffers from this when parsing multiple files, it may just be a little less obvious -- and I also believe most of these issues have been fixed here too.

About threading, yes global variables are indeed a problem. However, my (little) experience trying to run parsers in threads showed that it was mostly good enough to use one single worker thread calling all parsers, and for this global variables aren't a problem.
Actually in Geany we still (for now) parse in the main thread because parsing is actually not our biggest performance concern: managing all the parsed tags from various sources (open files, tag files, …) so we can query for them proved to be a lot more costly than running the parsers themselves (which are quite fast).

So I believe that threading is not the primary concern for creating a CTags library. At least at first, it would be good enough for the application to use it from one single thread; whether that is the main thread or a dedicated one is the application's choice anyway.

Of course it'd be a different story if you wanted to provide a library that also manages the set of parsed tags and allows manipulating them (search for a tag of a particular name, tags with a particular prefix, children of a tag, …), in which case indeed allowing this to be multi-threaded out-of-the-box would be awesome.

b4n referenced this issue in techee/geany Oct 31, 2014

Don't pass arguments to search/sort functions using static variables
Instead of qsort() it's possible to use g_ptr_array_sort_with_data() with
similar performance. Unfortunately it seems there's no bsearch_with_data()
anywhere so the patch uses a modified bsearch() implementation from libc
(still probably cleaner than passing arguments using static variables).
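
For illustration, a binary search that takes an explicit user-data argument can be sketched in C as below. This is not the code from the referenced commit: `bsearch_with_data()` and `compare_prefix()` here are hypothetical names written for this example, showing how the comparison context travels through a parameter instead of a static variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Comparator that receives caller-supplied context as a third argument. */
typedef int (*compare_with_data_fn)(const void *key, const void *elem,
                                    void *user_data);

/* Like bsearch(), but forwards user_data to the comparator. */
void *bsearch_with_data(const void *key, const void *base, size_t nmemb,
                        size_t size, compare_with_data_fn compare,
                        void *user_data)
{
    size_t lo = 0, hi = nmemb;
    assert(compare != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const char *elem = (const char *)base + mid * size;
        int cmp = compare(key, elem, user_data);
        if (cmp == 0)
            return (void *)elem;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}

/* Example comparator: user_data points to the prefix length to compare.
 * The array elements are C strings (const char *). */
int compare_prefix(const void *key, const void *elem, void *user_data)
{
    size_t len = *(const size_t *)user_data;
    return strncmp((const char *)key, *(const char *const *)elem, len);
}
```

Since the prefix length is passed per call, two searches with different lengths can run without interfering with each other, which a static variable would not allow.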
@dtikhonov

Contributor

dtikhonov commented Jul 24, 2015

@b4n, why not fmemopen(3) instead of MIO?

@b4n

Member

b4n commented Jul 24, 2015

@dtikhonov Well, first, I didn't know about it :) Then, and probably a reason why I didn't find out about it at the time (2011), it's not very portable (it is in POSIX.1-2008 but not before; otherwise it seems to require _GNU_SOURCE).

So it'd be very nice, but I'm really not sure it's an acceptable dependency (would have to be checked on all platforms, also for consistency as if it's pre-POSIX, it's likely to have slight inconsistencies -- and we can't really reimplement it where missing as FILE is totally opaque).
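
For reference, reading a memory buffer through a FILE with fmemopen(3) looks roughly like this in C on a POSIX.1-2008 system. The `count_lines_in_memory()` helper is a hypothetical example function, not something taken from ctags, Geany or MIO.

```c
#define _POSIX_C_SOURCE 200809L /* request the POSIX.1-2008 API, including fmemopen() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counts newline characters in a non-empty string by reading it through a
 * FILE stream. Returns -1 if the stream cannot be opened. */
int count_lines_in_memory(const char *text)
{
    FILE *fp;
    int c, lines = 0;

    assert(text != NULL);
    /* fmemopen() takes a non-const buffer; mode "r" only reads from it. */
    fp = fmemopen((void *)text, strlen(text), "r");
    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    fclose(fp);
    return lines;
}
```

The appeal is that existing code using getc() and friends needs no changes; the cost is the portability concern raised above, since platforms without fmemopen() offer no way to emulate it on top of an opaque FILE.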

@elextr

Contributor

elextr commented Oct 29, 2015

@b4n, isn't another reason for not parsing in another thread to avoid having to make a copy of the Scintilla buffer? ATM the parser blocks Scintilla, so it can't move the buffer (by re-allocating it) or the gap during a parse.
