make ctags a library #63

Closed
masatake opened this issue Aug 7, 2014 · 12 comments

@masatake
Member

masatake commented Aug 7, 2014

"Library" in ctags may have 3 aspects:

  1. reading tags file,
  2. backend language parser, and
  3. running ctags.

The scope of this issue is "3. running ctags."
We have to research Geany.

Decreasing global variables will be the initial step.

@masatake masatake changed the title make ctags can be used as a library make ctags a library Aug 7, 2014
@fishman
Contributor

fishman commented Aug 7, 2014

I agree with you. Maybe @b4n can chime in; I think he mentioned the issues in a prior post.

Btw, @b4n would you be fine with becoming a maintainer here as well?

@masatake masatake added this to the Feature plan milestone Aug 12, 2014
@masatake
Member Author

masatake commented Oct 9, 2014

@b4n, who is the primary maintainer of geany/tagmanager /ctags?
you?
If not, could you introduce me to that person?

I'm looking for a way to share our efforts.

My idea is to split the source code of fishman/ctags into 3 parts.

  1. parsers
  2. the core part shared by fishman/ctags and Geany
  3. the core part that is not shared, i.e. fishman/ctags' own part.

Of course "parsers" is the initial target for merging.
Currently the source tree of fishman/ctags is very flat; all .c files are in the top directory.
I will introduce a new directory named "parsers" and move all parser .c files into it.
This is the fishman/ctags side.

On the Geany side, my request is:
Could you introduce "geany / tagmanager / ctags / parsers" and move all parser .c files into that directory?

Then we can evaluate, compare and exchange "parsers" code with each other.
Soon the code under parsers will be unified auto-magically.

What do you think of this approach?

@vhda
Contributor

vhda commented Oct 9, 2014

Personally, I would not move files to avoid issues while merging work with sourceforge updates. I've seen git deal quite well with moved files, but it might be safer to keep files where they are for now unless really necessary.

@ffes
Member

ffes commented Oct 9, 2014

From what I can see, Geany just included the files they need (base and parsers), modified them and set the variables/options to their need. But @b4n should be able to tell us more.

Keeping the parsers synced in some way would be great, and we would both benefit from that.

As I wrote in #83, Geany modifies their kindOptions. For instance, compare the two Pascal parsers. We have two kinds, "function" and "procedure", and Geany has more or less merged them into "function". This kind of adjustment appears in other parsers as well. I guess this is to create a more uniform way for them to show the tags. But again @b4n should be able to tell us more.

@masatake
Member Author

masatake commented Oct 9, 2014

@vhda, yes, you are right. We can send the same proposal, introducing a "parsers" directory, to exuberant-ctags.

If there is a conceptual gap between Geany and fishman-ctags, like the kindOption one reported by @ffes, I will work on the Geany side to fill the gap.

Anyway, let's think of the core and the parsers separately. I believe the efforts on parsers can be shared easily.

@b4n
Member

b4n commented Oct 9, 2014

Sorry for the delay. At the time I didn't have time to answer correctly, and then I forgot about it… err.

The scope of this issue is "3. running ctags."
We have to research Geany.

What Geany does (through TagManager, a very old library that Anjuta used a long time ago, AFAIK) is simply call LanguageTable[n]->parser() from parse.c, pretty much like createTagsForFile() does -- see tm_source_file_parse(). The glue takes the form of two global hooks added to parse.c, TagEntryFunction and TagEntrySetArglistFunction.
If we want to check the differences more precisely we'd need to diff the two. I'm not familiar enough with the upstream version of these files to know all the details (nor, honestly, am I with our own; they predate my arrival in the project by a long time and have mostly just worked since then).
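
To make the "global hook" idea concrete, here is a minimal sketch of the pattern. It is not Geany's code: DemoTag, DemoTagHook, demo_set_hook(), demo_emit_tag() and the toy demo_parse() are all hypothetical names, and the real TagEntryFunction signature may well differ. It only shows how a parser can hand each tag to the host application through a callback instead of writing a tags file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag record; a real tag entry carries many more fields. */
typedef struct {
    const char *name;      /* not NUL-terminated, see name_len */
    size_t name_len;
    unsigned long line;
} DemoTag;

/* Hypothetical hook type: the host application receives each tag. */
typedef int (*DemoTagHook)(const DemoTag *tag, void *user_data);

/* Global hook, in the spirit of "global hooks added to parse.c". */
static DemoTagHook demo_hook = NULL;
static void *demo_hook_data = NULL;

void demo_set_hook(DemoTagHook hook, void *user_data)
{
    demo_hook = hook;
    demo_hook_data = user_data;
}

/* What a parser would call instead of writing a line to a tags file. */
int demo_emit_tag(const char *name, size_t name_len, unsigned long line)
{
    DemoTag tag;
    assert(name != NULL);
    if (demo_hook == NULL)
        return 0;                      /* no host attached: drop the tag */
    tag.name = name;
    tag.name_len = name_len;
    tag.line = line;
    return demo_hook(&tag, demo_hook_data);
}

/* Toy "parser": reports every line that starts with "def ". */
void demo_parse(const char *text)
{
    unsigned long line = 1;
    const char *p = text;
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        const char *end = (eol != NULL) ? eol : p + strlen(p);
        if (end - p > 4 && strncmp(p, "def ", 4) == 0)
            demo_emit_tag(p + 4, (size_t)(end - (p + 4)), line);
        if (eol == NULL)
            break;
        p = eol + 1;
        line++;
    }
}

/* Host-side callback that just counts the tags it is given. */
int demo_count_cb(const DemoTag *tag, void *user_data)
{
    (void)tag;
    ++*(size_t *)user_data;
    return 1;
}
```

A host would call demo_set_hook(demo_count_cb, &counter) once and then demo_parse() per buffer. Because the hook is global, only one consumer can be attached at a time.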

We do, however, support in-memory parsing, to be able to update displayed tags as the user types without having to write temporary files all the time. For this purpose, I wrote a tiny library named MIO that supports file- or memory-backed I/O with very low overhead and an API very close to the C FILE API to simplify porting.
MIO currently depends on GLib, mostly for a vsnprintf() implementation in C89, but that dependency can be removed relatively easily, especially if using C99 is not an issue (and Geany has switched to C99 since then, so we would be just fine with it too). BTW, I have a not-yet-pushed set of changes to make the GLib dependency optional.
Geany bundles a copy of MIO, as AFAIK it's the only app that uses it, but if CTags wants to use it it can either do the same (as Geany bundles CTags we wouldn't mind anyway), or could link to it, either way is fine. MIO is currently licensed under GPLv2+, but it'd be easy for me to re-license it if required by CTags, as I'm the only author.
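
For readers who have not seen this kind of abstraction, a rough sketch of file- or memory-backed reading behind one FILE-like interface follows. This is not MIO's API: Stream, stream_new_file(), stream_new_memory(), stream_getc() and stream_free() are names invented for the illustration, and only reading is covered.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { STREAM_FILE, STREAM_MEMORY } StreamKind;

/* Hypothetical stream object: either wraps a FILE* or walks a buffer. */
typedef struct {
    StreamKind kind;
    FILE *fp;                   /* used when kind == STREAM_FILE */
    const unsigned char *buf;   /* used when kind == STREAM_MEMORY */
    size_t size;
    size_t pos;
} Stream;

Stream *stream_new_file(const char *path, const char *mode)
{
    Stream *s;
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return NULL;
    s = calloc(1, sizeof *s);
    if (s == NULL) {
        fclose(fp);
        return NULL;
    }
    s->kind = STREAM_FILE;
    s->fp = fp;
    return s;
}

Stream *stream_new_memory(const void *data, size_t size)
{
    Stream *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    s->kind = STREAM_MEMORY;
    s->buf = data;
    s->size = size;
    return s;
}

/* Same contract as fgetc(): the next byte as an unsigned char, or EOF. */
int stream_getc(Stream *s)
{
    assert(s != NULL);
    if (s->kind == STREAM_FILE)
        return fgetc(s->fp);
    if (s->pos >= s->size)
        return EOF;
    return s->buf[s->pos++];
}

void stream_free(Stream *s)
{
    if (s == NULL)
        return;
    if (s->kind == STREAM_FILE)
        fclose(s->fp);
    free(s);   /* a memory buffer stays owned by the caller */
}
```

A parser written against stream_getc() does not need to know whether it is reading a file on disk or the editor's current buffer, which is the property in-memory parsing relies on.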

@fishman

Btw, @b4n would you be fine with becoming a maintainer here as well?

Sure, for the parsers part I'd be happy to. I already do most of the work on Geany's side, so I can do it here too.

@masatake

@b4n, who is the primary maintainer of geany/tagmanager /ctags?
you?

I am, mostly. I'm the maintainer for Geany, and the most active developer on the CTags part.

I'm looking for a way to share our efforts.

My idea is to split the source code of fishman/ctags into 3 parts.

  1. parsers
  2. the core part shared by fishman/ctags and Geany
  3. the core part that is not shared, i.e. fishman/ctags' own part.

Of course "parsers" is the initial target for merging.

That would be awesome of course; the less our tree diverges from upstream, the easier it is to import changes in both directions.
The parsers are already mostly in sync.
The next part that could be shared is the logic for calling the parsers and getting the result. I'm not sure the current way we do it in Geany is really generic/clean enough to be a good design for a reusable library, but it would at least help in seeing what was useful for us.

Currently the source tree of fishman/ctags is very flat; all .c files are in the top directory.
I will introduce a new directory named "parsers" and move all parser .c files into it.
This is the fishman/ctags side.

On the Geany side, my request is:
Could you introduce "geany / tagmanager / ctags / parsers" and move all parser .c files into that directory?

We could very well do that of course, and it would slightly help in differentiating the files, but that's probably not the worst part of the job :)

@ffes

From what I can see, Geany just included the files they need (base and parsers), modified them and set the variables/options to their need. But @b4n should be able to tell us more.

That's it, basically. We try to make as few modifications as we can so importing upstream changes is easier, so again, we should be mostly in sync here, except for a few parsers.

Keeping the parsers synced in some way would be great, and we would both benefit from that.

Definitely.

As I wrote in #83, Geany modifies their kindOptions. For instance, compare the two Pascal parsers. We have two kinds, "function" and "procedure", and Geany has more or less merged them into "function". This kind of adjustment appears in other parsers as well. I guess this is to create a more uniform way for them to show the tags. But again @b4n should be able to tell us more.

Yes, we change the kind names to be able to map them easily to generic entity types (TMTagType). CTags parsers are really not unified in this regard (which may or may not make sense, but some languages do have other names for things), and we need to be able to recognize at least some types, like functions/methods/prototypes, no matter what they are called in a particular language.

There could be other solutions to this mapping problem; for example, we could probably have a mapping table that translates those kinds on our side rather than in the parser itself.
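
Such a mapping table could look roughly like the sketch below. GenericType is a hypothetical stand-in for something like TMTagType, the table layout and the map_kind() helper are assumptions, and only the Pascal "function"/"procedure" pair comes from this discussion; the "DemoLang" rows are purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic entity types on the application side. */
typedef enum {
    GENERIC_UNKNOWN,
    GENERIC_FUNCTION,
    GENERIC_CLASS,
    GENERIC_VARIABLE
} GenericType;

typedef struct {
    const char *language;
    const char *kind_name;   /* the kind name as the parser reports it */
    GenericType type;        /* what the application wants to see */
} KindMapping;

static const KindMapping kind_map[] = {
    { "Pascal",   "function",  GENERIC_FUNCTION },
    { "Pascal",   "procedure", GENERIC_FUNCTION },
    /* Illustrative rows for a made-up language. */
    { "DemoLang", "routine",   GENERIC_FUNCTION },
    { "DemoLang", "record",    GENERIC_CLASS },
    { "DemoLang", "field",     GENERIC_VARIABLE },
};

/* Translate a parser-specific kind name without touching the parser. */
GenericType map_kind(const char *language, const char *kind_name)
{
    size_t i;
    assert(language != NULL && kind_name != NULL);
    for (i = 0; i < sizeof kind_map / sizeof kind_map[0]; i++) {
        if (strcmp(kind_map[i].language, language) == 0 &&
            strcmp(kind_map[i].kind_name, kind_name) == 0)
            return kind_map[i].type;
    }
    return GENERIC_UNKNOWN;
}
```

With a table like this the parsers could keep their upstream kind names, and the per-application differences would live in one place.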

@masatake
Member Author

masatake commented Oct 9, 2014

@b4n, thank you for the comments.
About the "parsers" directory, I will work on the fishman/ctags side first at #88 and I will show you the prototype(?). Could you wait for a while?

I like the idea of mapping table. I call it "flavour".

I would like to hear from you about two more topics.

  1. How can I decode the *.tags files under geany / tests / ctags? They may be the same thing as expected.tags in fishman/ctags.
  2. Did you do something about static variables in Geany?
    To use ctags as a library, I guess you might want to decrease global variables, file static variables and
    function static variables, especially if you want to use the library with pthread. I would like to hear about your
    experience in this area. This issue is the biggest reason I didn't talk with you about merging and/or
    sharing the code and efforts between fishman/ctags and Geany. I guess there were big gaps between
    fishman/ctags and Geany.

@b4n
Member

b4n commented Oct 12, 2014

About the "parsers" directory, […] Could you wait for a while?

Sure, no hurry.

  1. How can I decode the *.tags files under geany / tests / ctags? They may be the same thing as expected.tags in fishman/ctags.

They are, they are the expected results from the parser. However, Geany uses the TagManager format, which introduces fields with binary identifiers. See tm_tag_write() in TagManager sources.

Did you do something about static variables in Geany? To use ctags as a library, I guess you might want to decrease global variables, file static variables and function static variables, especially if you want to use the library with pthread. I would like to hear about your experience in this area. This issue is the biggest reason I didn't talk with you about merging and/or sharing the code and efforts between fishman/ctags and Geany. I guess there were big gaps between fishman/ctags and Geany.

I honestly don't really know the state of the core ctags we use, but it surely has its fair share of global variables. We corrected some of the global variable issues in some parsers, but only to make them re-usable, because some had problems when called a second or subsequent time. I believe CTags also suffers from this when parsing multiple files, it may just be a little less obvious -- and I also believe most of these issues have been fixed here too.
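
As an illustration of the "problems when called a second time" class of bug, here is a contrived sketch (not taken from any actual parser). depth_after_buggy() keeps its nesting depth in a function-static variable, so an unbalanced input poisons the next call; depth_after() keeps the same state in a caller-owned ParserState, a hypothetical type, and resets it per input.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy pattern: the nesting depth survives between calls, so an
 * unbalanced '{' in one input corrupts the result for the next one. */
int depth_after_buggy(const char *text)
{
    static int depth = 0;              /* never reset between inputs */
    for (; *text != '\0'; text++) {
        if (*text == '{')
            depth++;
        else if (*text == '}')
            depth--;
    }
    return depth;
}

/* Hypothetical per-parse state owned by the caller. */
typedef struct {
    int depth;
} ParserState;

/* Fixed pattern: state is explicit and starts fresh for every input. */
int depth_after(ParserState *st, const char *text)
{
    assert(st != NULL);
    st->depth = 0;
    for (; *text != '\0'; text++) {
        if (*text == '{')
            st->depth++;
        else if (*text == '}')
            st->depth--;
    }
    return st->depth;
}
```

Feeding "{ {" and then "{}" to the buggy version reports a depth of 2 for the second, balanced input, while the explicit-state version reports 0.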

About threading, yes global variables are indeed a problem. However, my (little) experience trying to run parsers in threads showed that it was mostly good enough to use one single worker thread calling all parsers, and for this global variables aren't a problem.
Actually, in Geany we still (for now) parse in the main thread because parsing is not our biggest performance concern: managing all the parsed tags from various sources (open files, tag files, …) so we can query them proved to be a lot more costly than running the parsers themselves (which are quite fast).

So I believe that threading is not the primary concern for creating a CTags library. I think that, at least at first, it would be good enough for the application to use it from one single thread; whether that is the main thread or a dedicated one is the application's choice anyway.
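
The "one single worker thread" arrangement can be sketched with POSIX threads as below. Everything here is an assumption made for illustration: ParseQueue, legacy_parse() (a stand-in for a parser that relies on a global) and parse_all() are not ctags or Geany code. The point is only that the non-reentrant function is ever called from exactly one thread, so its global state is never shared.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 16

typedef struct {
    const char *jobs[QUEUE_CAP];   /* ring buffer of pending inputs */
    size_t head, tail, count;
    int closed;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
    size_t total_tags;             /* written only by the worker */
} ParseQueue;

/* Stand-in for a non-reentrant parser: it uses a global scratch counter. */
static size_t scratch_count;
static size_t legacy_parse(const char *text)
{
    scratch_count = 0;
    for (; *text != '\0'; text++)
        if (*text == ';')
            scratch_count++;
    return scratch_count;
}

void queue_init(ParseQueue *q)
{
    assert(q != NULL);
    q->head = q->tail = q->count = 0;
    q->closed = 0;
    q->total_tags = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void queue_push(ParseQueue *q, const char *text)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->jobs[q->tail] = text;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

void queue_close(ParseQueue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

void *worker_main(void *arg)
{
    ParseQueue *q = arg;
    for (;;) {
        const char *text;
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->count == 0) {               /* closed and fully drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        text = q->jobs[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->lock);
        q->total_tags += legacy_parse(text);   /* only this thread parses */
    }
}

/* Parse all inputs on one dedicated worker and return the tag total. */
size_t parse_all(const char *const *texts, size_t n)
{
    ParseQueue q;
    pthread_t worker;
    size_t i;
    queue_init(&q);
    if (pthread_create(&worker, NULL, worker_main, &q) != 0)
        return 0;
    for (i = 0; i < n; i++)
        queue_push(&q, texts[i]);
    queue_close(&q);
    pthread_join(worker, NULL);    /* makes total_tags safe to read */
    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    return q.total_tags;
}
```

On most toolchains this needs to be built with -pthread. The queued strings must stay valid until the worker has consumed them.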

Of course it'd be a different story if you wanted to provide a library that also manages the set of parsed tags and allows manipulating them (search for a tag of a particular name, tags with a particular prefix, children of a tag, …), in which case allowing this to be multi-threaded out-of-the-box would indeed be awesome.

b4n referenced this issue in techee/geany Oct 31, 2014
Instead of qsort() it's possible to use g_ptr_array_sort_with_data() with
similar performance. Unfortunately it seems there's no bsearch_with_data()
anywhere so the patch uses a modified bsearch() implementation from libc
(still probably cleaner than passing arguments using static variables).
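
For context, a binary search that passes caller data to the comparator can be written in a few lines of plain C. The sketch below is not the code from the referenced patch; bsearch_with_data(), CompareDataFunc and compare_prefix() are names assumed for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Comparator that also receives caller-supplied data. */
typedef int (*CompareDataFunc)(const void *key, const void *elem,
                               void *user_data);

/* Like bsearch(), but forwards user_data to the comparator, so no
 * static variable is needed to smuggle extra arguments in. */
void *bsearch_with_data(const void *key, const void *base, size_t nmemb,
                        size_t size, CompareDataFunc cmp, void *user_data)
{
    const char *lo = base;
    size_t n = nmemb;
    while (n > 0) {
        const char *mid = lo + (n / 2) * size;
        int c = cmp(key, mid, user_data);
        if (c == 0)
            return (void *)mid;
        if (c > 0) {                 /* key is in the upper half */
            lo = mid + size;
            n -= n / 2 + 1;
        } else {                     /* key is in the lower half */
            n /= 2;
        }
    }
    return NULL;
}

/* Example comparator: match on the first *user_data characters only. */
int compare_prefix(const void *key, const void *elem, void *user_data)
{
    const char *k = key;
    const char *const *e = elem;
    size_t len = *(const size_t *)user_data;
    return strncmp(k, *e, len);
}
```

Calling bsearch_with_data("gam", names, count, sizeof names[0], compare_prefix, &len) on a sorted array of strings with len set to 3 finds an entry starting with "gam", with the prefix length travelling through user_data instead of a static.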
@masatake masatake modified the milestones: Feature plan, Merging/sharing the code base with Geany's tag engine Jun 19, 2015
@dtikhonov
Member

@b4n, why not fmemopen(3) instead of MIO?

@b4n
Member

b4n commented Jul 24, 2015

@dtikhonov Well, first, I didn't know about it :) Then, and probably a reason why I didn't find out about it at the time (2011), it's not very portable (it is in POSIX.1-2008 but not before; otherwise it seems to need _GNU_SOURCE).

So it'd be very nice, but I'm really not sure it's an acceptable dependency (would have to be checked on all platforms, also for consistency as if it's pre-POSIX, it's likely to have slight inconsistencies -- and we can't really reimplement it where missing as FILE is totally opaque).
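
For reference, using fmemopen(3) on a platform that provides it looks roughly like this. The helper name count_lines_in_memory() is made up for the example; fmemopen() itself is the POSIX.1-2008 function discussed above.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for POSIX.1-2008, which has fmemopen() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count newline characters in a memory buffer by reading it through an
 * ordinary FILE*, the way a FILE-based parser would consume it. */
long count_lines_in_memory(const char *data, size_t size)
{
    FILE *fp;
    long lines = 0;
    int c;

    assert(data != NULL);
    /* Mode "r" only reads, so casting away const here is safe. */
    fp = fmemopen((void *)data, size, "r");
    if (fp == NULL)
        return -1;                 /* unsupported or failed */
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;
    fclose(fp);
    return lines;
}
```

The attraction is that existing FILE-based code needs no changes; the drawback is the portability concern raised above, since the function is missing on platforms that predate POSIX.1-2008.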

@elextr
Contributor

elextr commented Oct 29, 2015

@b4n, isn't another reason for not parsing in another thread that it avoids having to make a copy of the Scintilla buffer? ATM the parser blocks Scintilla, so it can't move the buffer (by re-allocating it) or the gap during a parse.

@masatake
Member Author

libctags.a has been introduced. Though it is not a generic library, I think I can close this now.
Refining the API for Geany is the next step.

masatake pushed a commit to masatake/ctags that referenced this issue Mar 12, 2020
Import changes from kkos/oniguruma:
* 928e08e6d898b72e7ac473263df76d480111199b

Related: universal-ctags#63
masatake added a commit to masatake/ctags that referenced this issue May 27, 2022
739b3ee9e Fix argument type mismatch
150372de0 Merge branch 'feature/memory-recycling'
142660fb1 Reduce memory allocation frequency
b3f745496 Add a typecast and const modifiers
176f5c0f8 Simplify the code using pcc_context_t typedef
4dbcaae48 Rename identifiers related to memory recycling
08a6f0c56 Merge branch 'master' into feature/memory-recycling
3a0ecca3f Rename macros in generated parsers
e50f8b233 Merge pull request universal-ctags#63 from masatake/recycle-list
58ad04747 Merge pull request universal-ctags#64 from dolik-rce/benchmark-memory
e559f4c4e add memory measurement to benchmark script
f3a5c7e77 Preallocate memory objects for pcc_thunk_chunk_t, pcc_lr_head_t, and pcc_lr_answer_t
7cd6dffb7 Pass pcc_context_t instead of pcc_auxil_t in many places
710b51f7f Update the copyright years
70389ec19 Conform to the coding style
59668cf87 Divide the character_classes_0.d test into two tests
657508c52 Merge pull request universal-ctags#61 from mingodad/fix-charset-plus-minus
572951a8c Fix handling charset "[+-]"
0e3ee0c8b Update README.md
03c90e03e Fix the reopened issue universal-ctags#56
c2f499eb2 Ensures that all values of unevaluated rules are zero-cleared
f376e099d Support exact column numbers in the PEG source even if UTF-8 multibyte characters are contained
9dfcd9153 Modify a dump function
e27c05d91 Add codes for safety
da750a9a7 Refine code block output
afd64bc61 Update README.md
cea483b89 Support insertion of #line directives in the generated code (universal-ctags#55)
62130fe96 Add a feature to count text lines output to a stream
4982d72ea Introduce a structure to hold code block data
86874c214 Fix incorrect update of the parsing position
41be80f02 Introduce a structure to hold options
5b9f23d18 Rename functions
803317bc4 Update README.md

git-subtree-dir: misc/packcc
git-subtree-split: 739b3ee9edd62b8623d30272069e6fd446270591