Source Code Indexing Tools Comparison

Comparison of tagging systems

By "tagging systems", we mean tools like Ctags, etags and GNU Global (gtags). They analyze your code base and generate an index file that contains information about the definitions/references in your project.

Usually, simple text matching is used when querying such an index file. For example, when finding definitions of "foo", we just find all definitions in the index file with the name "foo" (a tags file permits more advanced filtering, see below).
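As a rough illustration of this kind of lookup, here is a minimal Python sketch. It assumes the common tab-separated tags layout (name, path, address followed by `;"`, then `key:value` fields). The sample lines, file paths and helper names are made up for the example and are not Citre's implementation.

```python
# Hypothetical sample lines in the extended tags layout:
# name <TAB> path <TAB> address;" <TAB> key:value fields...
SAMPLE_TAGS = (
    'bar\tsrc/util.c\t/^int bar(void)$/;"\tkind:function\n'
    'foo\tsrc/main.c\t/^int foo(int x)$/;"\tkind:function\n'
    'foo\tsrc/types.h\t/^  int foo;$/;"\tkind:member\tscope:struct:point\n'
)


def parse_tag_line(line):
    """Split one tag line into name, path, pattern and extension fields.

    Simplification: assumes the search pattern itself does not contain ;"
    """
    head, _, tail = line.partition(';"')
    name, path, pattern = head.split("\t", 2)
    fields = {}
    for field in tail.split("\t"):
        if not field:
            continue
        key, sep, value = field.partition(":")
        if sep:
            fields[key] = value
        else:
            # A field without a colon is treated as the kind.
            fields["kind"] = key
    return {"name": name, "path": path, "pattern": pattern, **fields}


def load_tags(text):
    """Parse all tag lines, skipping the !_TAG_ pseudo-tag header lines."""
    return [
        parse_tag_line(line)
        for line in text.splitlines()
        if line and not line.startswith("!_TAG_")
    ]


def find_definitions(tags, name):
    """Plain text matching: every tag whose name equals the query."""
    return [tag for tag in tags if tag["name"] == name]


for tag in find_definitions(load_tags(SAMPLE_TAGS), "foo"):
    print(tag["path"], tag.get("kind"))
```

Both `foo` entries are returned, regardless of which one the symbol at point actually refers to. That is the "fuzzy" behavior described next.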

So, tagging systems are not as accurate as language servers. Think of them as "fuzzy finders". But that can be an advantage: they may work better for dynamic languages, and they work well for multi-language projects, as most tagging systems support multiple languages.

The unique advantage of Ctags

Ctags has a huge advantage over other existing tagging systems: it records much more abundant information in the tags file.

When using the tags backend, you'll notice it shows you the kind of a symbol (whether it's a function, method, macro, etc.). For supported languages, it also shows the type (whether it is/returns an integer, float, string, etc.). Citre can do this because this information is recorded in the tags file. A TAGS file generated by etags (or $ ctags -e), for example, doesn't record this information.

Ctags may record much more information than you might expect. For example:

  • scope: In which struct is a member defined? In which class is a method defined? etc.
  • inherits: Which classes does a class inherit from?
  • extras: Does this tag have a file scope? Is it a reference tag? etc.

Based on this information, Citre can improve the filtering/sorting of the tags. For example:

  • In C, if the current symbol is after a dot, Citre guesses you want a struct member, and puts tags of the "member" kind above the others.

    You may still want other kinds, like "macro" or user-defined kinds, so Citre doesn't throw them away, but puts them below the member tags.

  • Citre throws away tags that have file scope (i.e., can't be used outside the file where they are defined) and are not defined in the current file. This trick alone can remove a lot of useless results.
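The two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tags are represented as plain dicts, and the `file_scope` key, the sample data and the function name are hypothetical, not Citre's actual data model.

```python
def rank_candidates(tags, current_file, after_dot=False):
    """Filter and order candidate tags using kind and file-scope info."""
    # Rule 2: drop file-scoped tags that are defined in some other file.
    visible = [
        tag for tag in tags
        if not (tag.get("file_scope") and tag["path"] != current_file)
    ]
    if after_dot:
        # Rule 1: after a dot, put "member" tags first. The sort is stable,
        # so the other kinds stay in their original order below the members.
        visible.sort(key=lambda tag: tag.get("kind") != "member")
    return visible


# Hypothetical tags, as they might look after parsing a tags file.
TAGS = [
    {"name": "size", "kind": "macro", "path": "a.h", "file_scope": False},
    {"name": "size", "kind": "member", "path": "a.h", "file_scope": False},
    {"name": "size", "kind": "variable", "path": "b.c", "file_scope": True},
    {"name": "size", "kind": "variable", "path": "c.c", "file_scope": True},
]

for tag in rank_candidates(TAGS, current_file="b.c", after_dot=True):
    print(tag["kind"], tag["path"])
```

Neither rule is possible without the kind and scope fields, which is why a richer tags file leads directly to better results.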

The essence is "an informative tags file helps us understand the code better". Imagine we want:

  • A function (the kind is "function").
  • In a library called scipy (there should be "scipy" in its path).
  • I don't know its name, but I want to do approximation, so it should contain "approx" (the name contains "approx").

These conditions can be easily translated into a readtags command that finds the tags satisfying them. In the future, Citre may even offer you an interactive tool to do this, which would make digging in the code much easier (see the discussion here). This is simply impossible with other tagging systems.
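To make the three conditions concrete, here is a small Python sketch that applies them to already-parsed tags. It does not show readtags syntax. The dict layout, the paths and the tag names are invented for the example.

```python
def wanted(tag):
    """The three conditions from the text, combined with a logical AND."""
    return (
        tag["kind"] == "function"      # it is a function
        and "scipy" in tag["path"]     # it lives somewhere under scipy
        and "approx" in tag["name"]    # its name mentions "approx"
    )


# Hypothetical parsed tags; none of these names are taken from a real index.
TAGS = [
    {"name": "approx_example", "kind": "function",
     "path": "site-packages/scipy/demo.py"},
    {"name": "ApproxHelper", "kind": "class",
     "path": "site-packages/scipy/demo.py"},
    {"name": "approx_other", "kind": "function",
     "path": "site-packages/otherlib/demo.py"},
    {"name": "solve", "kind": "function",
     "path": "site-packages/scipy/demo.py"},
]

matches = [tag["name"] for tag in TAGS if wanted(tag)]
print(matches)
```

Each condition relies on a different piece of recorded information (kind, path, name), so an index that only stores names and locations could not answer this query.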

Ctags vs. etags

Etags is the tagging system bundled with Emacs. Etags uses the TAGS format, while Ctags uses the tags format by default.

Advantages of Ctags:

  • Records abundant information.

    TAGS format only records name, path, line number and the line content.

  • Supports many more languages.

    Etags supports fewer languages, but you can let Ctags generate the TAGS format with $ ctags -e.

  • Ctags creates tags that are alphabetically sorted, so readtags can perform binary search on it, making Citre fast even in huge projects.

    TAGS format sorts by filename and line number, so binary search is impossible.

  • Citre doesn't need to read the whole tags file into Emacs.

    Emacs' built-in etags.el reads the whole TAGS file into Emacs, which takes up memory and is undesirable for huge projects.
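The binary-search point in the list above can be illustrated with Python's `bisect` module. This is a simplified, in-memory sketch: it assumes the lines are sorted by plain string comparison and that each line starts with the tag name followed by a tab. It is not how readtags is implemented, and the sample lines are made up.

```python
import bisect


def find_by_name(sorted_lines, name):
    """Find all lines for `name` in a name-sorted list of tag lines."""
    key = name + "\t"
    # Binary search for the first line that could start with "name<TAB>".
    start = bisect.bisect_left(sorted_lines, key)
    result = []
    # Matching lines are contiguous, so stop at the first non-match.
    for line in sorted_lines[start:]:
        if not line.startswith(key):
            break
        result.append(line)
    return result


# Hypothetical tag lines, sorted by name as a sorted tags file would be.
LINES = sorted([
    'foo\tsrc/main.c\t/^int foo(int x)$/;"\tkind:function',
    'bar\tsrc/util.c\t/^int bar(void)$/;"\tkind:function',
    'foo_bar\tsrc/util.c\t/^int foo_bar(void)$/;"\tkind:function',
    'foo\tsrc/types.h\t/^  int foo;$/;"\tkind:member',
])

for line in find_by_name(LINES, "foo"):
    print(line.split("\t")[1])
```

Because the lookup only needs a logarithmic number of comparisons, its cost grows slowly with the size of the index. A file ordered by filename and line number offers no such shortcut for a name query.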

Advantages of etags (or TAGS format):

  • Supported by Emacs natively, and many third-party packages.

Ctags vs. GNU Global (gtags)

Advantages of Ctags:

  • Records abundant information.

    Gtags only records name, path, line number and the line content.

  • Supports more languages (but gtags can use Ctags as a plugin parser).

Advantages of gtags:

  • Can locate references.

    Universal Ctags has reference tags, but for now they are mostly used to record module references. References to functions, methods, etc., are not supported.

  • Supports incremental updating of tag files (see this issue to follow the progress of Universal Ctags on incremental updating).

Tagging systems vs. intelligent tools

By "intelligent tools", we mean language servers, rtags, etc. They can find the exact definition of the symbol at point, which is a huge advantage over tagging systems.

Intelligent tools also come with their disadvantages:

  • You need to install a language server for each language you use.
  • For dynamic languages, they may sometimes fail to find a definition/reference.
  • Some language servers can be slow and heavy on CPU/memory.