introduce sub parser #64

Closed
masatake opened this issue Aug 7, 2014 · 13 comments

@masatake
Member

masatake commented Aug 7, 2014

Consider running ctags against a source tree that uses autotools as its build system.

Which parser should be used for foo.h.in?

Of course, the C (or ObjC) parser should be used first.
But is that enough?
What about @Package@ or similar tokens in the file?
@foo@ is a placeholder used in autotools. I would like ctags to tag these as well.
I would like to run a second-priority parser (a sub parser), something like "template-of-configure-output".
The sub parser would be activated for files matching the glob "*.in".

Surprisingly, ctags already has the ability to run a multi-pass parser. We can use this facility to run a sub parser.

@fishman
Contributor

fishman commented Aug 7, 2014

so kind of like vim and emacs modelines?

@masatake
Member Author

Very different.
Modelines are used to choose the proper language parser.
The sub parser is for supporting two or more syntaxes in the same file.

Example:

int
main(void)
{
    return @Number@;
}

The C parser can pick up "main" and store it in the tags file.
"template-of-configure-output" can pick up NUMBER and store it in the tags file.
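
For illustration, here is a minimal sketch in C of what a placeholder scanner for "template-of-configure-output" might do: walk the text and report each @NAME@ token. The function name next_placeholder and its signature are hypothetical; nothing here is taken from the ctags source.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ctags: find the next autoconf-style
 * @NAME@ placeholder in text, starting the search at offset *pos.
 * On success, copy NAME into out (NUL-terminated), advance *pos past the
 * closing '@', and return 1. Return 0 when no placeholder is left. */
static int next_placeholder(const char *text, size_t *pos,
                            char *out, size_t out_size)
{
    assert(text != NULL && pos != NULL && out != NULL && out_size > 0);

    size_t len = strlen(text);
    size_t i = *pos;

    while (i < len) {
        if (text[i] != '@') {
            i++;
            continue;
        }
        /* Candidate start: collect identifier characters after '@'. */
        size_t j = i + 1;
        while (j < len && (isalnum((unsigned char)text[j]) || text[j] == '_'))
            j++;

        size_t name_len = j - (i + 1);
        if (j < len && text[j] == '@' && name_len > 0 && name_len < out_size) {
            memcpy(out, text + i + 1, name_len);
            out[name_len] = '\0';
            *pos = j + 1;
            return 1;
        }
        /* Not a placeholder: resume at the character that ended the run. */
        i = (j > i + 1) ? j : i + 1;
    }

    *pos = len;
    return 0;
}
```

Called repeatedly on the line "return @Number@;", it would report Number once and then signal that no placeholders remain.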

@masatake masatake added this to the Feature plan milestone Aug 12, 2014
@masatake
Member Author

This is related to #80.
If a single parser gets a chance to parse the same file two or more times, we may call this feature multi-pass.
Exuberant Ctags has this feature, e.g. in c.c.
(As I understand it, #80 deals with a wider topic, not just a single file.)
If different parsers get a chance to parse the same file, I call this feature sub parser, or multi-parser.

There are interesting lexers in pygments:
$ pygmentize -L
...

  • css+django, css+jinja:
    CSS+Django/Jinja
  • css+erb, css+ruby:
    CSS+Ruby
    ...

I guess more than two lexers can be combined.

@masatake
Member Author

Now I understand that recording @FOO@ is not such a good idea. It is not a definition of any kind; it is a reference.

@b4n
Member

b4n commented Jun 20, 2015

Now I understand that recording @FOO@ is not such a good idea. It is not a definition of any kind; it is a reference.

Yes, it's supposed to be replaced with the value of FOO (whatever that can be).

@masatake
Member Author

php+javascript: we have both parsers, but we don't have a way to combine them.

I can generalize this issue.

--- a/parsers/lua.c
+++ b/parsers/lua.c
@@ -34,14 +34,6 @@ static kindOption LuaKinds [] = {
 *   FUNCTION DEFINITIONS
 */

  1. The Diff parser can capture a/parsers/lua.c.
  2. From the file name, ctags can guess that lua.c is written in C.
  3. Store static kindOption LuaKinds [] = { in a memory stream somewhere.
  4. Run the C parser as a sub parser on the memory stream.
    So we can capture LuaKinds. lua.c is the parent of LuaKinds. LuaKinds is the parent of
    @@ -34,14 +34,6 @@.

We don't have a memory stream yet, and we need one.
I have to look into mio.
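
Steps 1 and 2 above could look roughly like the following sketch. The helper names diff_new_path and guess_language and the small extension table are assumptions made for illustration; ctags has its own language-selection logic, which is not reproduced here. The sketch reads the "+++ b/..." header line rather than the "--- a/..." one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given a unified-diff header line such as
 * "+++ b/parsers/lua.c", return a pointer to the path with the "b/"
 * prefix stripped, or NULL if the line is not a "+++ " header. */
static const char *diff_new_path(const char *line)
{
    assert(line != NULL);

    if (strncmp(line, "+++ ", 4) != 0)
        return NULL;

    const char *path = line + 4;
    if (strncmp(path, "b/", 2) == 0)
        path += 2;
    return (*path != '\0') ? path : NULL;
}

/* Hypothetical helper: guess a language name from the file extension.
 * The table is illustrative only and far from complete. */
static const char *guess_language(const char *path)
{
    static const struct { const char *ext; const char *lang; } map[] = {
        { ".c", "C" }, { ".h", "C" }, { ".lua", "Lua" }, { ".php", "PHP" },
    };

    assert(path != NULL);

    const char *dot = strrchr(path, '.');
    if (dot == NULL)
        return NULL;

    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
        if (strcmp(dot, map[i].ext) == 0)
            return map[i].lang;
    }
    return NULL;
}
```

For the diff shown above, diff_new_path yields parsers/lua.c and guess_language maps it to C, which is the information a sub parser would need for steps 3 and 4.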

@masatake masatake mentioned this issue Nov 11, 2015
7 tasks
@cweagans
Member

Here's another example:

<html>
  <head>
    <title>Test page</title>
    <?php
      function my_function() {
        return 'some string';
      }
    ?>
  </head>
  <body>
    <?php echo my_function(); ?>
  </body>
</html>

This is really not a good practice for PHP anymore, but it works, so maybe ctags should support it?

Here's another common scenario:

<html>
  <head>
    <title>Test page</title>
    <script type="text/javascript">
      function myFunction() {
        alert('myFunction() has been called');
      }
    </script>
  </head>
  <body>
    <p>some text</p>
    <script type="text/javascript">myFunction();</script>
  </body>
</html>

This second example is much more common.

I'm not sure that it's absolutely necessary for parsers to handle things differently in either of these scenarios, though. I should just be able to run the PHP parser on the first example and get back what I expect, and the Javascript parser should tolerate the second example. If I have both PHP and Javascript in the same file like that, then I should just be able to run both parsers over the same file and get back something reasonable.

@masatake
Member Author

Very attractive. This kind of example will defer the release:-P

Currently most parsers keep their parsing state in file-local variables.
This causes trouble when calling one parser from another.
We have to introduce a parserInstance data type that holds the parsing state.
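
A rough sketch of the idea, assuming a hypothetical layout for parserInstance (the comment above only names the type; the fields and helper functions here are invented for illustration). With the state held in a struct instead of file-local variables, an outer parser can be paused while an inner one runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: only the name comes from the discussion;
 * the fields are assumptions for this sketch. */
typedef struct {
    const char *input;   /* text being parsed */
    size_t      cursor;  /* current offset into input */
    int         depth;   /* brace nesting depth seen so far */
} parserInstance;

/* Initialize an instance so that it owns all of its parsing state. */
static void parser_init(parserInstance *p, const char *input)
{
    assert(p != NULL && input != NULL);
    p->input = input;
    p->cursor = 0;
    p->depth = 0;
}

/* Consume one character and update the brace depth.
 * Returns 0 at end of input, 1 otherwise. */
static int parser_step(parserInstance *p)
{
    char c = p->input[p->cursor];
    if (c == '\0')
        return 0;
    if (c == '{')
        p->depth++;
    else if (c == '}')
        p->depth--;
    p->cursor++;
    return 1;
}
```

Because nothing is stored in static variables, stepping a second instance leaves the first one untouched, which is the property needed for calling a parser from another parser.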

The state of input stream can be stacked...
...

BTW, when submitting an example, I recommend also submitting the expected tags, if possible.
That makes the discussion more fruitful.

@cweagans
Member

@masatake I'm saying that we shouldn't need to do that. One parser doesn't need to call another parser. We just need to be able to say "Run this set of parsers on this particular file type", probably through a CLI switch (instead of doing it by default).

I might be a bad developer and have test.html with HTML, CSS, Javascript, and PHP all in the same file, so separately, I'd want all four of those parsers to run on test.html. A parser shouldn't call any of the other parsers. The HTML parser runs on the file, extracts whatever information it can get, and then gives control back. Then, the CSS parser runs on the same file, extracts info, and gives control back. Repeat for all of the parsers that need to be run.
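
The sequential model described here might be sketched as follows. The toy parsers only count opening markers and stand in for real ones; run_parsers, parser_fn, and the toy functions are hypothetical names, not ctags APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A parser takes the whole file text and returns how many items it found. */
typedef int (*parser_fn)(const char *text);

/* Count non-overlapping occurrences of needle in text. */
static int count_occurrences(const char *text, const char *needle)
{
    size_t step = strlen(needle);
    int n = 0;

    if (step == 0)
        return 0;
    for (const char *p = strstr(text, needle); p != NULL;
         p = strstr(p + step, needle))
        n++;
    return n;
}

/* Toy stand-ins for real parsers: each looks only for its own marker
 * and ignores everything else in the file. */
static int toy_php_parser(const char *text)
{
    return count_occurrences(text, "<?php");
}

static int toy_js_parser(const char *text)
{
    return count_occurrences(text, "<script");
}

/* Run every parser over the same text, one after another.
 * No parser calls another; each gets the full input and gives control back. */
static int run_parsers(const char *text, const parser_fn *parsers, size_t n)
{
    assert(text != NULL && parsers != NULL);

    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += parsers[i](text);
    return total;
}
```

The design choice in this model is that each parser must tolerate content it does not understand, since it always sees the entire file rather than only the region written in its language.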

This is definitely not needed for 1.0.0, I think.

@masatake
Member Author

ctags is used behind OpenGrok and Debian Code Search, and this is the area I'm interested in.
In this use case, only the default command line arguments are used. There is no chance to tune options per input. Many things must be done automatically.

@masatake
Member Author

masatake commented Apr 8, 2016

Now we have mio.
When mio_new_mio is implemented, we can cleanly handle one language embedded in another.

  • php in html
  • js in html
  • js in php?
  • css in html
  • asm in C (known as inline assembly)
  • here document in shell script
  • C in yacc/lex
  • javadoc in java
  • code-block in rst

If you are interested in working on these areas, tell me. I will work on mio_in_mio.

@masatake
Member Author

masatake commented Jun 3, 2016

See #875. Running a specified parser on an area of the input can be scheduled. For this feature I will introduce a new API set called "promise".

A parser makes a promise during parsing. To make a promise, the parser specifies the start of the area, the end of the area, and the name of the parser that should parse the area.

After the current parser finishes parsing the input, ctags checks the backlog of promises made by that parser. If there are any promises, ctags forces them: for each one, it makes a sub stream from the promise's start and end information and invokes the parser named in the promise on that sub stream.
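
The flow described above can be sketched as follows. This is an illustration under assumptions, not the actual ctags promise API: all demo_* names are hypothetical, and areas are given as byte offsets for simplicity, whereas the real implementation may identify areas differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PROMISES 16

/* One promise: "run this parser on [start, end) later". */
typedef struct {
    size_t      start;   /* offset of the first byte of the area */
    size_t      end;     /* offset one past the last byte of the area */
    const char *parser;  /* name of the parser to run on the area */
} demo_promise;

/* Backlog of promises made while the current parser runs. */
typedef struct {
    demo_promise items[DEMO_MAX_PROMISES];
    size_t       count;
} demo_backlog;

/* Record a promise. Returns its index, or -1 if the area is empty
 * or the backlog is full. */
static int demo_make_promise(demo_backlog *b, size_t start, size_t end,
                             const char *parser)
{
    assert(b != NULL && parser != NULL);

    if (start >= end || b->count >= DEMO_MAX_PROMISES)
        return -1;
    b->items[b->count] = (demo_promise){ start, end, parser };
    return (int)b->count++;
}

/* Callback that stands in for "invoke the named parser on a sub stream". */
typedef void (*demo_runner)(const char *parser, const char *area,
                            size_t len, void *user);

/* Force every promise in the backlog: hand each area of input to run,
 * then clear the backlog. Returns the number of promises forced. */
static size_t demo_force_promises(demo_backlog *b, const char *input,
                                  demo_runner run, void *user)
{
    assert(b != NULL && input != NULL && run != NULL);

    size_t len = strlen(input);
    size_t forced = 0;

    for (size_t i = 0; i < b->count; i++) {
        const demo_promise *p = &b->items[i];
        if (p->end > len)
            continue;  /* area lies outside the input; skip it */
        run(p->parser, input + p->start, p->end - p->start, user);
        forced++;
    }
    b->count = 0;
    return forced;
}

/* Example runner: copy the area into a caller-provided buffer,
 * which must have room for len + 1 bytes. */
static void demo_copy_area(const char *parser, const char *area,
                           size_t len, void *user)
{
    (void)parser;
    char *out = user;
    memcpy(out, area, len);
    out[len] = '\0';
}
```

In this sketch a host parser for HTML would call demo_make_promise when it sees an embedded PHP region, and the driver would call demo_force_promises once the host parser has finished.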

@masatake
Member Author

masatake commented Dec 3, 2016

There is no way to pass data from a parent parser to its children, but promise works well.

@masatake masatake closed this as completed Dec 3, 2016
masatake pushed a commit to masatake/ctags that referenced this issue Mar 12, 2020
masatake added a commit to masatake/ctags that referenced this issue May 27, 2022
739b3ee9e Fix argument type mismatch
150372de0 Merge branch 'feature/memory-recycling'
142660fb1 Reduce memory allocation frequency
b3f745496 Add a typecast and const modifiers
176f5c0f8 Simplify the code using pcc_context_t typedef
4dbcaae48 Rename identifiers related to memory recycling
08a6f0c56 Merge branch 'master' into feature/memory-recycling
3a0ecca3f Rename macros in generated parsers
e50f8b233 Merge pull request universal-ctags#63 from masatake/recycle-list
58ad04747 Merge pull request universal-ctags#64 from dolik-rce/benchmark-memory
e559f4c4e add memory measurement to benchmark script
f3a5c7e77 Preallocate memory objects for pcc_thunk_chunk_t, pcc_lr_head_t, and pcc_lr_answer_t
7cd6dffb7 Pass pcc_context_t instead of pcc_auxil_t in many places
710b51f7f Update the copyright years
70389ec19 Conform to the coding style
59668cf87 Divide the character_classes_0.d test into two tests
657508c52 Merge pull request universal-ctags#61 from mingodad/fix-charset-plus-minus
572951a8c Fix handling charset "[+-]"
0e3ee0c8b Update README.md
03c90e03e Fix the reopened issue universal-ctags#56
c2f499eb2 Ensures that all values of unevaluated rules are zero-cleared
f376e099d Support exact column numbers in the PEG source even if UTF-8 multibyte characters are contained
9dfcd9153 Modify a dump function
e27c05d91 Add codes for safety
da750a9a7 Refine code block output
afd64bc61 Update README.md
cea483b89 Support insertion of #line directives in the generated code (universal-ctags#55)
62130fe96 Add a feature to count text lines output to a stream
4982d72ea Introduce a structure to hold code block data
86874c214 Fix incorrect update of the parsing position
41be80f02 Introduce a structure to hold options
5b9f23d18 Rename functions
803317bc4 Update README.md

git-subtree-dir: misc/packcc
git-subtree-split: 739b3ee9edd62b8623d30272069e6fd446270591
masatake added a commit to masatake/ctags that referenced this issue Jun 1, 2022
masatake added a commit to masatake/ctags that referenced this issue Jun 1, 2022