Allow mio_new_mio() to have size 0 #1951

techee · 2018-12-15T16:30:33Z

Consider for instance the following HTML code where a subparser is called
on the contents of the <script> tags.

<script language="Javascript" src="./test.js"></script>
<script language="JavaScript">
...
</script>

Since the first <script> is empty, mio_new_mio() is called with 0 size
but with the current semantics of the API it creates MIO until the end of
the file so the whole source file to EOF is parsed. While it seems
no extra tags are generated in this case, the parsing happens
unnecessarily.

Instead of using 0, make size signed and use -1 when we want to create a
MIO till the end of the source MIO.

I actually noticed this problem in Geany rather than in ctags where tags started getting duplicated because of the multiple parsing. I haven't investigated enough to learn why there aren't duplicate tags in ctags in this case but I added some traces to the parsers and the extra parsing really happens when 0 is passed as the size in io_new_mio().

Also, I think it's cleaner to allow 0-sized MIO from the API point of view,

Consider for instance the following HTML code where a subparser is called on the contents of the <script> tags. <script language="Javascript" src="./test.js"></script> <script language="JavaScript"> ... </script> Since the first <script> is empty, mio_new_mio() is called with 0 size but with the current semantics of the API it creates MIO until the end of the file so the whole source file to EOF is parsed. While it seems no extra tags are generated in this case, the parsing happens unnecessarily. Instead of using 0, make size signed and use -1 when we want to create a MIO till the end of the source MIO.

coveralls · 2018-12-15T16:37:37Z

Coverage decreased (-0.007%) to 84.882% when pulling ee887ea on techee:mio_new_mio into 64f7d61 on universal-ctags:master.

techee · 2018-12-15T16:48:19Z

Funny - notice how coverage of the javascript parser went down because of this patch. It's probably because a lot of HTML was parsed as javascript before and these code paths aren't covered now.

See universal-ctags/ctags#1951

b4n

Looks good to me, and I totally agree that there's no reason not to allow a 0-sized MIO object here; and empty stream is a perfectly valid stream.

Other changes look good, but for the comment about the loop.

b4n · 2018-12-15T17:57:32Z

parsers/yacc.c

-		last = tmp;
-	if (last)
-		endCharOffset = strlen ((const char *)last);
+		endCharOffset = strlen ((const char *)tmp);


The trickyness of that loop might warrant a comment for nobody to come by it and "improve" it like it was before.

Good point, done.

Though if anyone tries to make such an "improvement" the "bom" test should fail because of the change in MIO - but of course the tests can change in the future and the comment definitely doesn't hurt as this was totally not obvious to me when I started looking at the code.

The change in mio_new_mio() exposed a problem in yacc parser where epilogue parsing didn't work correctly and the reason it worked was only thanks to the strange mio_new_mio() semantics. First, in runYaccParser() the line shouldn't be read because this is the first (and possibly last) line of epilogue and this way it gets ignored and offsets for the promise API aren't calculated for it. Because of this change, we have to add 1 to c_start and c_source_start inside make_promise_for_epilogue() so we get real start of epilogue. Second, the loop while ((tmp = readLineFromInputFile ())) last = tmp; didn't do what the author thought it was doing. The idea behind it probably was to "remember" the last line and subsequently to compute its length. This however doesn't work this way - because readLineFromInputFile() returns a const pointer into an internal buffer inside ctags main, the last time readLineFromInputFile() is called and when it returns NULL this value is cleared so no matter what the "last" variable contained, it would always be an empty string after the loop finishes. This patch just moves the calculation of endCharOffset inside the loop where the line value is still valid. It means some unnecessary strlen() calculation of all lines in the epilogue but I don't think it has a big overall impact on the performance. The unfortunate side-effect of this patch is that if a tag appears on the last line of the file, we can't say whether the line ends with \n or not because readLineFromInputFile() silently discards them. This means that the test in which there's a tag at the last line doesn't contain $ in the pattern. This problem didn't appear before the change in MIO because thanks to the various bugs in the yacc parser, the start line was equal to the end line and both offsets were 0 so the size for mio_new_mio() was 0 and it was created till the end of the file.

masatake · 2018-12-15T19:56:20Z

I will review this to understand how I did wrong.

masatake · 2018-12-16T18:41:42Z

LGTM. Sorry for taking your time for fixing my broken code.

masatake · 2018-12-16T18:55:58Z

BTW, I would like ctags to make a tag for ./test.js. It can be captured as a reference tag.

Since I introduced reference tag, I have tried to capture language objects specified with import, include, require, source, etc. as reference tags without no serious use case.

However, now I find a good and critical use case.
New option -r, semantic-recursion. This is similar to -R. -R makes ctags parse given directory tree recursively. -r makes ctags parse import-include-require-source relation recursively.

Let show me example (foo.html):

<html>
  <body>
    <script language="Javascript" src="test.js"/>
    <script language="Javascript" src="http://example.com/bar.js"/>
    </body>
</html>

For the about input, run ctags with "-r -rpath . foo.html".
ctags parsrs foo.html. Then it parsers test.js in the current directory.
Finally ctags does download http://example.com/bar.js and parser the downloaded file.

See universal-ctags/ctags#1951

…gs (#2018) * Use latest version of htable * Use latest version of mio * Use latest version of objpool * Use latest version of ptrarray * Use latest version of vstring This also requires adding trashbox.c/h which is now used by vstring and inline macros from inline.h. * Rename fieldSpec to fieldDefinition See b56bd065123d69087acd6f202499d71a86a7ea7a upstream. * Rename kindOption to kindDefinition See e112e8ab6e0933b5bd7922e0dfb969b1f28c60fa upstream * Rename kinds field in parserDefinition to kindTable See 09ae690face8b5cde940e2d7cf40f8860381067b upstream. * Rename structure fields about field in parserDefinition See a739fa5fb790bc349a66b2bee0bf42cf289994e8 upstream. * Use kindIndex instead of kindDefinition This patch replaces kindDefinition related entries from sTagEntryInfo with kindIndex so kinds are referenced indirectly using the index. For more info please refer to commits: 16a2541c0698bd8ee03c1be8172ef3191f6e695a f92e6bf2aeb21fd6b04756487f98d0eefa16d9ce Some other changes had to be made to make the sources compile (without bringing all the diffs from upstream). At some places, which aren't used by Geany, only stub implementations have been created. In particular, the regex parser has been disabled (for now?) because its current implementation doesn't allow accessing kindDefinitions using index and allowing this would require big changes in its implementation. The affected parsers are Cobol, ActionScript and HTML. For HTML we can use the token-based parser from upstream, and we should consider whether Cobol and ActionScript are worth the effort to maintain a separate regex implementation using GRegex (IMO these languages are dead enough not to justify the extra effort). The patch also disables tests for languages using regex parsers. * Rename roleDesc to roleDefinition See 1345725842c196cc0523ff60231192bcd588961b upstream. Since we don't care about roles in Geany, we don't have to do the additional stuff the upstream patch does. * Add XTAG_ANONYMOUS used by jscript See 0e4c5d4a0461bc8d9616fe3b97d75b91d014246e upstream. * Include stdint.h in entry.h * Don't use hash value as an Anonymous field identifier Instead of something like "Anonymous0ab283cd9402" use sequential integer values like "Anonymous1". * Call anonReset in main part See 3c91b1ea509df238feb86c9cbd552b621e462653 upstream. * Use upstream javascript parser * Use upstream css parser * Create correctly sized MIO for 0 size See universal-ctags/ctags#1951 * Always enable promise API and subparsers for Geany * Support subparsers in Geany and add HTML parser demonstrating this feature This feature requires several changes: 1. Propagating language of the tag from ctags to Geany so we know whether the tag comes from a master parser or a subparser. 2. We need to address the problem that tag types from a subparsers can clash with tag types from master parsers or other subparsers used by the master parser. For instance, HTML and both its css and javascript subparsers use tm_tag_class_t but HTML uses it for <h2> headings, and css and javascript for classes. Representing all of them using tm_tag_class_t would lead to complete mess where all of these types would for instance be listed in the same branch of the tree in the sidebar. To avoid this problem, this patch adds another mapping for subparsers where each tag type can be mapped to another tag type (which isn't used neither by master parser or other subparsers). To avoid unwanted clashes with other parsers, only tags explicitly mentioned in such mappings are added to tag manager; other subparser tags are discarded. For HTML this patch introduces mapping only for tm_tag_function_t (which in this case maps to the same type) to mimick the previous HTML parser behavior but other javascript and css tag types can be added this way in the future too. 3. Since in most of the code Geany and tag manager assume that tags from one file use the same language, subparser's tags are modified to have the same language like the master parser. 4. HTML parser itself was copied from upstream without any modifications. Tests were fixed as the parser now correctly ignores comments. * Rename truncateLine field of tagEntryInfo See 0e70b22791877322598f03ecbe3eb26a6b661001 upstream. Needed for Fortran parser. * Add dummy mbcs.h and trace.h Included by javascript parser. * Introduce an accessor to `locate' field of `Option' See fb5ef68859f71ff2949f1d9a7cab7515f523532f upstream. Needed for Fortran. * Add numarray.c/h Needed by various parsers. * Add getLanguageForFilename() and getLanguageForCommand() See 416c5e6b8807feaec318d7f8addbb4107370c187 334e072f9d6d9954ebd3eb89bbceb252c20ae9dd upstream. Needed for Sh parser. * txt2tags: Fix scope separator definition and re-enable tests * Rename rest.c to rst.c to match upstream filename * Use upstream asciidoc and rst parsers * Add asciidoc and rst unit tests * Rename conf.c to iniconf.c to match upstream filename * Add tests of conf, diff, md parsers from universal ctags * Add more ctags unit tests This patch adds unit tests for: nsis, docbook, haskell, haxe, abaqus, vala, abc. The only missing unit tests are for GLSL and Ferite parsers which however share the implementation with the C parser and should be reasonably well covered by other C-like language tests. The tests were put together from various tutorials and help of the languages in order to cover the tags these parsers generate. No guarantee they'd compile with real parsers. * Rename latex.c to tex.c to match upstream filename * Rename entry points of parsers to match upstream names * Initialize trashbox * Add newline to the end of file

techee added a commit to techee/geany that referenced this pull request Dec 15, 2018

Create correctly sized MIO for 0 size

e4a8bf3

See universal-ctags/ctags#1951

b4n approved these changes Dec 15, 2018

View reviewed changes

techee force-pushed the mio_new_mio branch from 62b9ba1 to ee887ea Compare December 15, 2018 19:17

techee added a commit to techee/geany that referenced this pull request Dec 17, 2018

Create correctly sized MIO for 0 size

621425e

See universal-ctags/ctags#1951

masatake merged commit 5728abe into universal-ctags:master Dec 25, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow mio_new_mio() to have size 0 #1951

Allow mio_new_mio() to have size 0 #1951

techee commented Dec 15, 2018 •

edited

coveralls commented Dec 15, 2018 •

edited

techee commented Dec 15, 2018

b4n left a comment

b4n Dec 15, 2018

techee Dec 15, 2018

masatake commented Dec 15, 2018

masatake commented Dec 16, 2018

masatake commented Dec 16, 2018

Allow mio_new_mio() to have size 0 #1951

Allow mio_new_mio() to have size 0 #1951

Conversation

techee commented Dec 15, 2018 • edited

coveralls commented Dec 15, 2018 • edited

techee commented Dec 15, 2018

b4n left a comment

Choose a reason for hiding this comment

b4n Dec 15, 2018

Choose a reason for hiding this comment

techee Dec 15, 2018

Choose a reason for hiding this comment

masatake commented Dec 15, 2018

masatake commented Dec 16, 2018

masatake commented Dec 16, 2018

techee commented Dec 15, 2018 •

edited

coveralls commented Dec 15, 2018 •

edited