Can ECMAScript be fully compatible with PCRE? #1294

Open
reinaldorauch opened this Issue Aug 21, 2018 · 5 comments

reinaldorauch commented Aug 21, 2018

I was trying to write a regex that uses a long pattern and tried to reuse it with the (?n) syntax, but I saw that ECMAScript is not fully compatible with PCRE. Is there a reason for that? If not, I would suggest that the language become fully compatible with PCRE, so we could use its full power.

leobalter (Member) commented Aug 21, 2018

I'm not sure if this is the right place to discuss this, but I think making it fully compatible with PCRE would be a big step. Add the fact that this might not be backward compatible and could eventually break the web.

I think the best approach here would be to gradually add RegExp features that bring it closer to compatibility. For that, we would need to describe them in spec text. Would you be willing to help with that?

reinaldorauch commented Aug 21, 2018

I fully agree with that. I now think I rushed when posting, because afterwards I searched the mailing list, found discussions about this, and learned the magnitude of the job.

Of course I can help, but I'm not sure that I am knowledgeable enough to take on this task.
I just wanted to open (now reopen) the discussion around that.

Thanks for the reply @leobalter

littledan (Member) commented Sep 17, 2018

@reinaldorauch Which feature do you want to use from PCRE which is missing in JavaScript?

reinaldorauch commented Sep 19, 2018

@littledan Back references to pattern groups. For example, instead of writing this:

const r = /^(?:[a-zA-Z\u00C0-\u017F]+) (?:[a-zA-Z\u00C0-\u017F]+ )*(?:[a-zA-Z\u00C0-\u017F]+)$/

I would do this:

const r = /^(?:[a-zA-Z\u00C0-\u017F]+) (?1)*(?1)$/;

or something similar.

claudepache (Contributor) commented Sep 20, 2018

> @littledan back reference for pattern groups, like instead of doing this:
>
> const r = /^(?:[a-zA-Z\u00C0-\u017F]+) (?:[a-zA-Z\u00C0-\u017F]+ )*(?:[a-zA-Z\u00C0-\u017F]+)$/
>
> I would do this:
>
> const r = /^(?:[a-zA-Z\u00C0-\u017F]+) (?1)*(?1)$/;
>
> or something approximate.

@reinaldorauch There is an obvious bug in your example: (?1) refers to the subpattern defined by the first capturing group, and there is no capturing group in your example.


The feature of subpatterns is described here:

http://www.pcre.org/current/doc/html/pcre2pattern.html#subpatternsassubroutines

But the feature is not only about “not repeating oneself”, a use case for which another technique already exists: let subpattern = '(?:[a-zA-Z\u00C0-\u017F]+)'; const r = new RegExp(`^${subpattern} ${subpattern}*${subpattern}$`);. It is more useful in the context of recursive patterns:

http://www.pcre.org/current/doc/html/pcre2pattern.html#recursion
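
For reference, the string-composition technique mentioned above can be sketched roughly as follows. This is only an illustration, not code from the thread: the names `word` and `fullName` and the sample inputs are made up, and the separating space is kept inside the repeated group so that the result should behave like the first literal regex quoted in this comment.

```javascript
// Hypothetical names: `word` and `fullName` are illustrative only.
// One "word": ASCII letters plus the \u00C0-\u017F range used in the thread.
// String.raw keeps the \uXXXX escapes as text so the RegExp parser sees them.
const word = String.raw`[a-zA-Z\u00C0-\u017F]+`;

// word, space, zero or more "word + space" groups, final word.
const fullName = new RegExp(`^${word} (?:${word} )*${word}$`);

console.log(fullName.test("Ada Lovelace"));  // true
console.log(fullName.test("José da Silva")); // true
console.log(fullName.test("Ada"));           // false (at least two words are required)
```

This only covers reuse of a fixed subpattern. It does not give recursion, since the pattern is expanded once when the string is built.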
