Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP parser: include fully qualified class name #787

Closed
sserbin opened this issue Feb 4, 2016 · 31 comments
Closed

PHP parser: include fully qualified class name #787

sserbin opened this issue Feb 4, 2016 · 31 comments

Comments

@sserbin
Copy link

sserbin commented Feb 4, 2016

Would it be possible to include FQCN in addition to a class name (or instead) in a tags file?

Test case:

<?php
namespace foo;

class bar {
    public function __construct() {
    }
}

Current output:

__construct a.php   /^    public function __construct() {$/;"   f   class:foo::bar
bar a.php   /^class bar {$/;"   c   namespace:foo
foo a.php   /^namespace foo;$/;"    n

What I'm looking for:

__construct a.php   /^    public function __construct() {$/;"   f   class:foo::bar
bar a.php   /^class bar {$/;"   c   namespace:foo
foo\\bar    a.php   /^class bar {$/;"   c   namespace:foo
foo a.php   /^namespace foo;$/;"    n

This would allow you to search/jump to a class definition with both :ts bar and :ts foo\\bar

@masatake
Copy link
Member

masatake commented Feb 4, 2016

Thank you for reporting this. You are not only the person who wants this. See #524.
Let's discuss it.

I would like to know all combination of FQCN.

You told me the combination of two classes uses \\.
How about a name space and a class?
How about two namespaces?

Could you tell me all the combinations of kinds which can be part of a FQCN and
separators you expected. If possible, could you tell me your idea how the FQCN and scope are different?

I' have been trying to under stand the critical difference between FQCN and scope.
If it is possible, I would like to avoid adding new code only for php.
I would like to introduce language neutral code which serves parsers including php.
So I insist understanding the difference.

__construct has class:foo::bar as scope information. It looks big hint for me to implement what you want. How about adding foo::bar as a tag entry instead of foo\\bar ?

(This discussion is applicable to Ruby parser and Python parser.)

@sserbin
Copy link
Author

sserbin commented Feb 4, 2016

Forgot to mention that this is probably useful only for vim (and possibly for other editors which can only use a tag name and not the rest of the fields)

FQCN (fully qualified class name) consists of full namespace + a class name. FQCN is somewhat similar to an absolute file path on a file system. Namespaces can be nested as well. Here are some examples of fqcn's:
\foo\bar class "bar" in namespace "foo"
\foo\bar\baz\bar class "bar" within namespace "foo\bar\baz"

You told me the combination of two classes uses \\.

Namespace separator is actually one backslash (\) - not sure why universal-ctags are using two (\\).

If possible, could you tell me your idea how the FQCN and scope are different

I'd imagine a scope is an equivalent of a namespace - thus it differs from FQCN in missing a class name.

I would like to introduce language neutral code which serves parsers including php.

Isn't it ok to have a language-specific additional tag entry?

__construct has class:foo::bar as scope information. It looks big hint for me to implement what you want. How about adding foo::bar as a tag entry instead of foo\bar ?

I'm afraid that wouldn't work so well in practice.

I've also did a mistake in the original posting - the correct FQCN should start with a backlash - thus it should be \foo\bar instead of foo\bar

Thanks

@b4n
Copy link
Member

b4n commented Feb 4, 2016

Namespace separator is actually one backslash (\) - not sure why universal-ctags are using two (\\).

Backslashes (\) are escaped in the name field apparently.

Internally the U-CTags PHP parser never uses \ in the scope but always ::. This is not really correct per PHP conventions as it should probably use \ after a namespace, but it was done on purpose so we can report a qualified name (QN) in the scope tag without requiring the tools using it to be able to split it on more than one separator if they need to split it.

I see several possibilities here, but all are more or less problematic IMO:

  • Use the "correct" separator in the scope field as per PHP rules (e.g. \ after a namespace, :: otherwise). This would be cute, but would break compatibility and add a burden upon anyone wanting to split out a QN. Also requires a flexible way in core to generate FQTNs, or tools correct enough to use the scope info.
  • Keep as is, and manually generate FQTNs. This is doable, wouldn't break anything, but adds very specific code for something that probably should be generic, and not useful to everyone (well, at least not to me :]) Also, IMO any tool that can't search for QN itself using the scope fields is somehow broken IMO.
  • Add a generic enough core support for reporting scopes and all, that would basically contain an array of tag references/names with type, so it would be "easy" to generate any kind of string from it, plus possibly report it in a more precise way not requiring splitting.

In most cases, it would be probably useful for U-Ctags to be able to report the separator(s) used in QNs for a particular parser, so a tool could use that for splitting. Alternatively a more large scale and powerful solution could be a new format for reporting the scope that would allow for non ambiguous QNs, e.g. with each element prefixed with its type or something.

In practice, "all" it would require (if the scope was already using the correct separators, which is a little trickier) is doing something like that:

static void makeExtraTagEntry (tagEntryInfo *const info)
{
    if (isXtagEnabled (XTAG_QUALIFIED_TAGS))
    {
        vString *fqn = vStringNewInit ("\\");

        if (info->extensionFields.scopeName)
        {
            vStringCatS (fqn, info->extensionFields.scopeName);

            if (info->extensionFields.scopeKind == &(PhpKinds[K_NAMESPACE]))
                vStringCatS (fqn, "\\");
            else
                vStringCatS (fqn, "::");
        }
        vStringCatS (fqn, info->name);

        info->name = vStringValue (fqn);
        info->extensionFields.scopeKind = NULL;
        info->extensionFields.scopeName = NULL;
        makeTagEntry (info);
        vStringDelete (fqn);
    }
}

All of which is generic but choosing the separator and prefixing the string with it (if not all languages have the concept of a "named" root scope (empty)).

Anyway, that's a lot of blabbing, but I don't really care for this either way myself. I'd just like not to see more and more specific code when it could be generic :) Esp. this (qualified tags), that merely are just concatenating two already existing info (which in itself, IMO, makes it quite useless in itself though).

@b4n
Copy link
Member

b4n commented Feb 4, 2016

BTW, using the cork scope feature plus a table of kind->separator could probably work for any parser that can use cork (any parse that only reference existing tags in the scope, instead of mere names IIRC).

@sserbin
Copy link
Author

sserbin commented Feb 5, 2016

Keep as is, and manually generate FQTNs. This is doable, wouldn't break anything, but adds very specific code for something that probably should be generic, and not useful to everyone (well, at least not to me :]) Also, IMO any tool that can't search for QN itself using the scope fields is somehow broken IMO.

Well it seems then it would be more appropriate to just do a custom post-processing with awk/etc and add additional tag - at the end overcoming limitations of specific editor shouldn't involve that much effort.

Backslashes () are escaped in the name field apparently.

though your ctags fork - https://github.com/b4n/ctags/tree/better-php-parser - produces tags with one backslash (which is also what phpcomplete.vim relies upon)

@masatake
Copy link
Member

masatake commented Feb 5, 2016

@b4n, thank you for great comments

I have been a developer of u-ctags. However, I have not been its user. I have used TAGS output of u-ctags on emacs. As such background, I had the same option of @b4n wrote

 Esp. this (qualified tags), that merely are just concatenating two already existing info (which in itself, IMO, makes it quite useless in itself though).

However, these days I have to read source code written in Java. Many methods are massively overridden and overloaded.
Now I understand why people want QN(or FQN?). I also want it, too ASAP.

About scope: field representation, the array approach is ideal. However, I cannot implement it now.

First I would like to "correct" approach. Then I would like to add code for keeping compatibility.
Finally making the "compatible" mode as default behavior. An option will be prepared for choosing the "correct". I have been called the mechanism for controlling the behaviour for compatibility and specified application "flavour".

Here is my offering.

  • providing a layer for declaring per-language/per-kind-combination separators (implemented)
    The default separator is ..

    masatake@519bc47
    masatake@5ea46ce
    These are not specialized to cork.

  • providing new field "qn:" for "fqn:". Good (in @b4n's criteria:-) tool can ignore tags marked :qn.
    u-ctags has a layer for defining new extra fields.

  • providing --list-separators=<LANG>.
    Now implemented yet.

  • provide new pseudo tag: !TAGS_LANG_SEPARTOR! ...
    The good tool reading tags file can know the separators used in the file from this pseudo tags.
    u-ctags has a layer for defining new pseudo tags.

After implementing/merging above items, makeExtraTagEntry can be merged.

The finally I would like to clear my question: relation ship between scope and qn.
Basically they are the same. May be qn is subset of scope. e.g. consider a local variable V
in a function F. The scope field, function:F will be attached to the entry of V. However,
F.V is not recorded to tags file even when --extra=+q is given. Why? I guess users don't
want it.

fq is alike syntax sugar. The real tool can work without it. However, carefully generated fq tags entry are useful.

masatake added a commit to masatake/ctags that referenced this issue Feb 5, 2016
Originally written by @b4n at universal-ctags#787.

(universal-ctags#787 (comment))

Changes:

	integrate this with scopeSeparatorFor(),
	temporary remove "root separator",
	don't clear the scope information of a qualified tag, and
	reuse vString object
@masatake
Copy link
Member

masatake commented Feb 5, 2016

I merged @b4n's makeExtraTagEntry to my branch.
Now python parser can emit FQ tags. See #788 .

For php, following code may be needed.

diff --git a/parsers/php.c b/parsers/php.c
index 8740020..a1ecd26 100644
--- a/parsers/php.c
+++ b/parsers/php.c
@@ -113,8 +113,15 @@ typedef enum {
        COUNT_KIND
 } phpKind;

+static scopeSeparator PhpClassSeparators [] = {
+       { .parentLetter = 'c', .separator = "\\" },
+       { .parentLetter = 'n', .separator = "\\" },
+};
+
 static kindOption PhpKinds[COUNT_KIND] = {
-       { TRUE, 'c', "class",           "classes" },
+       { TRUE, 'c', "class",           "classes",
+         .separators = (PhpClassSeparators),
+         .separatorCount = ARRAY_SIZE (PhpClassSeparators) },
        { TRUE, 'd', "define",          "constant definitions" },
        { TRUE, 'f', "function",        "functions" },
        { TRUE, 'i', "interface",       "interfaces" },
@@ -315,6 +322,7 @@ static void makeSimplePhpTag (const tokenInfo *const token, const phpKind kind,

                initPhpEntry (&e, token, kind, access);
                makeTagEntry (&e);
                makeQualifiedTagEntry (&e);
        }
 }

@masatake
Copy link
Member

I need a bit more complex example. Could you show me more nested input and its expected tags.
class can be nested?

@b4n
Copy link
Member

b4n commented Feb 11, 2016

@masatake It's relatively simple actually: always use :: as the separatror but after a namespace, where you use \; nothing else is needed. sep = (prev->type == NAMESPACE) ? "\\" : "::"

In practice, nothing can be below a namespace, so you won't ever see something like \Ns1\Class1::Ns2::func1. But you don't have to know that, the expected thing would still be to put a \ after a namespace and a :: otherwise.

The only "subtlety" is that PHP considers there is a namespace with an empty name containing everything else. So, in practice just prepend \ to everything and it's fine: it's an empty string (the root namespace) followed by \.

\$global
\function
\Class
\Class::function
\Class::$property
\Namespace
\Namespace\Class
\Namespace\function
\Namespace\$global
\Ns1\Ns2\Class::function
\Ns1\Ns2\Class::function::$local

and so on.

class can be nested?

Not really, but again, that doesn't really matter either :)

@masatake
Copy link
Member

@b4n, thank you. my scopeSeparatorFor cannot put the root separator(?), but other than "subtlety",
my local code works fine.

The separator table definition:

static scopeSeparator PhpGenericSeparators [] = {
    { 'n'          , "\\" },
    { KIND_WILDCARD, "::" },

};

What I got:

__construct foo.php /^    public function __construct() {$/;"   f   class:foo\\bar
bar foo.php /^class bar {$/;"   c   namespace:foo
foo foo.php /^namespace foo;$/;"    n
foo\\bar    foo.php /^class bar {$/;"   c   namespace:foo
foo\\bar::__construct   foo.php /^    public function __construct() {$/;"   f   class:foo\\bar

TODO:

  • new pseudo tag: !_TAGS_SEPARATORS!Php ...
  • new field: fqtag: or something
  • root separator
  • --list-separators=LANG

@masatake
Copy link
Member

I'm not sure --list-separators is needed or not.

@masatake
Copy link
Member

(I guess the root separator should be put in php.c side.)

@masatake
Copy link
Member

Adding the root separator to full qualified tags is o.k.::

 \\foo\\bar input.php   /^class bar {$/;"   c   namespace:\\foo fullqualified:

Should we add the root separator to the scope informatoin, too?

bar input.php   /^class bar {$/;"   c   namespace:\\foo

or

bar input.php   /^class bar {$/;"   c   namespace:foo

Which is better? @b4n, do you have any idea?

As we wrote full qualified tags are just sugar tags in meaning of syntax sugar in programming languages; serious higher level tool developers can be alive without it.
(I don't say non-serious higher level tool developers are not important.)

However, scope information is essential. Ideally it should be well formalized.
Via pseudo tags, ctags can tell what kinds of seprators are used in tags output to highter level tools.
So higher level tools can dissolve the scope information. (Of course providing an array is much better.)
I wonder how should I handle/represent the root separator.

In other word, I wonder higher tool developers want the root separator in scope information.
I guess, the answer is no; it will just add one meaningless step to the process of parsing tags file.

I can add the information about root separator to tags file via pseudo tag. However, I am not sure
the root separator in scope information is really needed or not.

Comments are welcome.

@masatake
Copy link
Member

Experimental pseudo tag for separators is added:

!_TAG_FILE_FORMAT       2       /extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED       1       /0=unsorted, 1=sorted, 2=foldcase/
!_TAG_KIND_SEPARATOR!PHP!c      *       /::/
!_TAG_KIND_SEPARATOR!PHP!c      n       /\\/
!_TAG_KIND_SEPARATOR!PHP!d      *       /::/
!_TAG_KIND_SEPARATOR!PHP!d      n       /\\/
!_TAG_KIND_SEPARATOR!PHP!f      *       /::/
!_TAG_KIND_SEPARATOR!PHP!f      n       /\\/
!_TAG_KIND_SEPARATOR!PHP!i      *       /::/
!_TAG_KIND_SEPARATOR!PHP!i      n       /\\/
!_TAG_KIND_SEPARATOR!PHP!l      *       /::/
!_TAG_KIND_SEPARATOR!PHP!l      n       /\\/
!_TAG_KIND_SEPARATOR!PHP!n      *       /::/
!_TAG_KIND_SEPARATOR!PHP!n      n       /\\/
!_TAG_KIND_SEPARATOR!PHP!t      *       /::/
!_TAG_KIND_SEPARATOR!PHP!t      n       /\\/
!_TAG_KIND_SEPARATOR!PHP!v      *       /::/
!_TAG_KIND_SEPARATOR!PHP!v      n       /\\/
!_TAG_PROGRAM_AUTHOR    Universal Ctags Team    //
!_TAG_PROGRAM_NAME      Universal Ctags /Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL       https://ctags.io/       /official site/
!_TAG_PROGRAM_VERSION   0.0.0   /4e1443a/
A       Units/parser-php.r/php-php5_5_class_kw.d/input.php      /^class A {$/;" c
B       Units/parser-php.r/php-php5_5_class_kw.d/input.php      /^class B {$/;" c
C       Units/parser-php.r/php-php5_5_class_kw.d/input.php      /^class C {$/;" c
__construct     Units/parser-php.r/php-php5_5_class_kw.d/input.php      /^  public function __construct () {$/;"        f       class:\\B

I will not include code implementing fq or fullQufalified extra tags in my next pull request. The pull request will become too large if I include it.

masatake added a commit to masatake/ctags that referenced this issue Feb 24, 2016
…scope info

For closing universal-ctags#787.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
masatake added a commit to masatake/ctags that referenced this issue Feb 24, 2016
For closing universal-ctags#787.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake
Copy link
Member

@sserbin, could you try #806?

@sserbin
Copy link
Author

sserbin commented Feb 24, 2016

Tried with #806 although I'm not sure what is the change I should be looking at? The only thing changed is different delimiter (\ vs ::) in "class" attribute for a tag entry.

Also, is it strictly neccessarily to escape backslash in tag name (for namespaces) - https://github.com/b4n/ctags/tree/better-php-parser ctags don't have them escaped:
b4n ctags:
foo\baz\bar b.php /^namespace foo\\baz\\bar;$/;" n
universal ctags:
foo\\baz\\bar b.php /^namespace foo\\baz\\bar;$/;" n

@masatake
Copy link
Member

\\ cannot be changed.
Backslashes () are escaped in the name field apparently. This is needed to handle \t in name.
My understanding is that @b4n's better-php-parser branch cannot handle \t in a name well.
@b4n, am I correct?

How about other things ?

The only thing changed is different delimiter (\ vs ::) in "class" attribute for a tag entry.

This is what I have done in the pull request. You may think the result is very small but I took much time to introduce the change consistent way. How about function under a namespace?

@masatake
Copy link
Member

@sserbin, I took some mistakes.
What I would like you to see 5d65833 . (I have rebuilt my commits.)

@sserbin
Copy link
Author

sserbin commented Feb 24, 2016

You may think the result is very small but I took much time to introduce the change consistent way

That's totally understandable and appreciated, I just wanted to confirm I didn't miss any other changes.

How about function under a namespace?

Yep, it has a valid "namespace" attribute as well.

My understanding is that @b4n's better-php-parser branch cannot handle \t in a name well.

Speaking of php, I don't think it's possible to have \t in a name anyway.

@masatake
Copy link
Member

You may think the result is very small but I took much time to introduce the change consistent way

That's totally understandable and appreciated, I just wanted to confirm I didn't miss any other changes.

Internally, with my new code, parser developers can declare separators used in scope and full qualified tags declarative way. In addition all separators can be printed as pseudo tags. Php parser is the first parser using this new facility. As far as I can remember this new facility solve the issues of Ruby and python, too. (So I took much time for the small change.)

Yep, it has a valid "namespace" attribute as well.

Thank you for verifying.
So can I say with the pull reqeust, the half of your issue is fixed?
However, the other half, anout using double backslash in names, is not fixed yet.
RIght? About the double backslashes I would like to deal as a different issue.

@sserbin
Copy link
Author

sserbin commented Feb 24, 2016

@sserbin, I took some mistakes.
What I would like you to see 5d65833 . (I have rebuilt my commits.)

That seem to work as desired (i.e. --extra+q produces extra tag entries with fqcn). I'll have a closer look later today and will let know if I have any more questions. Thank you.

However, the other half, anout using double backslash in names, is not fixed yet. RIght?

Yes. I'm ok with having it dealt with in a separate issue.

@masatake
Copy link
Member

@sserbin, thank you.

@compleatguru
Copy link

Hi!

I have an issue with PHP using Yii2 framework.

e.g:

@masatake
Copy link
Member

@compleatguru, do you want to capture WebController of use line?

@compleatguru
Copy link

@masatake, yes. I hope that the ctag can recognize WebController is referring to yii\web\Controller.

As of now, without the namespace alias,

@masatake
Copy link
Member

I read php.c and I found the current parser doesn't handle the use statement.
Could you kindly open this as a new issue?

Here is a workaround.

[yamato@x201]~/var/ctags-github% cat input.php
cat input.php
<?php
namespace frontend\controllers;

use yii\web\Controller as WebController;
use yii\web\Foo;

class SiteController extends WebController{
}

[yamato@x201]~/var/ctags-github% ./ctags -o - --regex-php='/^use .*[ \\]([A-Z0-9a-z]+);/\1/a,alias,aliases/' input.php
<ex-php='/^use .*[ \\]([A-Z0-9a-z]+);/\1/a,alias,aliases/' input.php
Foo input.php   /^use yii\\web\\Foo;$/;"    a
SiteController  input.php   /^class SiteController extends WebController{$/;"   c   namespace:\\frontend\\controllers
WebController   input.php   /^use yii\\web\\Controller as WebController;$/;"    a
frontend\\controllers   input.php   /^namespace frontend\\controllers;$/;"  n

You can put the regex option to your .ctags. (Don't forget remove the single quotes when putting the option to .ctags.)

@compleatguru
Copy link

Thanks! will open a new issue.

@masatake
Copy link
Member

@sserbin, @b4n, I have a question about the root separator.

I read
http://php.net/manual/en/language.namespaces.importing.php
introduced by @compleatguru for understanding use in php.

Quote from the page:

Note that for namespaced names (fully qualified namespace names containing namespace separator, such as Foo\Bar as opposed to global names that do not, such as FooBar), the leading backslash is unnecessary and not recommended,

It seems that a full qualified namespace with root separator is not recommended.
I would like to remove it from php.c. Do you have any comment/objection?

@sserbin
Copy link
Author

sserbin commented Mar 16, 2016

Continuing the quote from php.net:

[...] the leading backslash is unnecessary and not recommended, as import names must be fully qualified, and are not processed relative to the current namespace.

use statements indeed treat import names as absolute so the explicit root separator is not really needed in use statements - this doesn't mean though it's not used in other places where names are treated relatively (which is basically everything but the use statements and code in global namespace, I guess).

I would like to remove it from php.c. Do you have any comment/objection?

I'm for it. (you might even notice the code sample from the first comment doesn't include a root separator). Besides, @b4n ctags branch also doesn't include it.

Edit: actually, it seems it would be even more approriate to leave out root \ - get_class and ::class constant - which supposed to give you fqcn - both return a name without root separator. My bad for misinforming you on this previously.

@b4n
Copy link
Member

b4n commented Mar 17, 2016

@masatake I have no real opinion on the matter, and nothing more to add to what @sserbin said. FQN more or less have it to make it an "absolute" FQN, and a relative FQN is kind of weird, but I don't have a real opinion on this as I'm not quite sure what the real use case is for this. And yeah, the leading \ is somewhat implicit.

Anyway, as PHP syntax is not really consistent on that matter (as you see yourself), either way goes IMO.

Besides, @b4n ctags branch also doesn't include it.

That's not really an interesting point, as mine didn't support FQN tags at all, so it's already different.

@masatake
Copy link
Member

Thank you for comments. I will remove it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants