Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RegExMatch support for multiline pattern + unit test #38

Merged
merged 3 commits into from
Sep 4, 2017

Conversation

Aluriak
Copy link
Contributor

@Aluriak Aluriak commented Sep 1, 2017

This PR aims to provide multiline (comment) support.

This is performed by providing another switch to RegExMatch constructor, multiline.
Purpose of multiline is to abstract the need of multiline regexes, made possible by the re.DOTALL flag.

Please consider adding a more generalist flag parameter waiting for arbitrary re flags,
allowing further customization of the RegExMatch object.

@igordejanovic
Copy link
Member

Thanks for the contribution. I agree that it would be better to have flags param that if not None is used in regex and if None (default) other params are used as before. Backward compatibility would be preserved and a user could use all the flags from the re module.

Do you agree? Could you update PR?

@Aluriak
Copy link
Contributor Author

Aluriak commented Sep 2, 2017

Done. (i named the parameter re_flags, which i found more accurate and understandable)

Note that the condition: param flags is considered if there is a regex and no other regex flag is implicitely provided, should IMHO be param flags is considered if no other regex flag is implicitely provided.
This PR implements the first, but it seems to me to be useless.

What about decide of re.compile flags parameter in __init__ instead of compile ?
I kept the current behavior, but it feels weird.

Copy link
Member

@igordejanovic igordejanovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to inline comments, compile is needed as separate method because of the way the compiler is built from the Python code style grammars. RegExMatch is constructed but is changed afterwards to account for the globaly set parser flags (currenty ignore_case but might be others in the future).
Other feature that is planed and partly implemented in textX is to support per-rule flags, i.e. to define case sensitivity or some other flag in the context of a single grammar rule. This will require recompiling regexes or making multiple version on the fly.


'''
def __init__(self, to_match, rule_name='', root=False, ignore_case=None,
str_repr=None):
multiline=False, str_repr=None, re_flags=re.MULTILINE):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiline param should by default be None to test if the user set it explicitly.

super(RegExMatch, self).__init__(rule_name, root)
self.to_match_regex = to_match
self.ignore_case = ignore_case
self.multiline = multiline
implicit_flag = any((self.ignore_case, self.multiline, to_match is None))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicit None test should be made as the user might set some of the flags to False.
to_match is never None!
I think something like this:

implicit_flag = self.ignore_case is not None or self.multiline is not None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems more logical: implicit_flag = (self.ignore_case or 0) | (self.multiline or 0).

Copy link
Contributor Author

@Aluriak Aluriak Sep 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_mach is never None !

So what's the point of:

have flags param that if not None is used in regex

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems more logical: implicit_flag = (self.ignore_case or 0) | (self.multiline or 0)

These expressions are not equivalent. I though that implicit_flag should be True only if some of the param/flags is given (e.g. not None). Your expression will be True (1 actually) if some of the param is True, which is not what we are testing, right?

So what's the point of:

have flags param that if not None is used in regex

Sorry, maybe I wasn't clear enough. In the initial post I suggested to use flags param only if it is given, e.g. if its value (the value of flags param not to_match) is not None which is default. In that case additional paras (ignore_case and multiline) would be ignored.

Another solution would be to merge all params by applying ignore_case and multiline over given re_flags (which is by default re.MULTILINE to keep current behaviour) anyway.

@igordejanovic
Copy link
Member

In addition, I think that re_flags could safely be merged with the ignore_case and multiline when calculating flags to be used. The initial value of flags could be re.MULTILINE if re_flags is not provided. That way all given parameters would be honoured and the logic is more straightforward.

@Aluriak
Copy link
Contributor Author

Aluriak commented Sep 4, 2017

What would happen if client sets re_flags to re.IGNORE_CASE and ignore_case to False ?

@igordejanovic
Copy link
Member

What would happen if client sets re_flags to re.IGNORE_CASE and ignore_case to False ?

This is a conflicting situation. Depending on the implementation one would be used or the other. It should be stated in the docs which way is the "stronger one".

I guess that it is simpler to implement re_flags to be "stronger"?

@igordejanovic
Copy link
Member

Your last change seems perfectly fine. So, explicit flags will be "stronger". I'll just do some more testing before merge. Thanks!

@Aluriak
Copy link
Contributor Author

Aluriak commented Sep 4, 2017

Well, it seems to be that implicit flags must be stronger, since, as you say, another object can modify RexExMatch instances attributes in order to stick with a global state.

If explicit flags have priority over globally set values, then the global context is no more prioritized.

Could be a way to short-circuit the global state, but for the sake of completness, a unwanted_re_flag should also be provided.

I have the impression that we do not use the same terms, or that one of us mistakes something.

To me, implicit flags are those given by multiline and ignore_case flags, since client do not handle re flags by itself.
The explicit flags are set using re_flags parameter, and should not have priority over implicit ones, precisely because of the global context that modify the object between its construction and compile call.

@igordejanovic
Copy link
Member

Yeah, unwanted_re_flag might be useful but I think it got quite flexible with this PR change.

@Aluriak
Copy link
Contributor Author

Aluriak commented Sep 4, 2017

Do you know when you will upload the next version of arpeggio providing this new feature ?

@igordejanovic
Copy link
Member

No. Usually I do new release when several features are added and docs are updated to reflect new changes. In the mean time you can always pip install from the master branch or use direct github link in your project requirements.txt.

@igordejanovic
Copy link
Member

But if you depend on this change in a package you want to publish on PyPI drop me a line so I will see to speed things up ;)

@Aluriak
Copy link
Contributor Author

Aluriak commented Sep 4, 2017

Well, because of multiline support i had to use another parser library. At usage, i prefer arpeggio, but i would rather not maintain two concurrent implementations for too long.

Do as you wish, i will adapt accordingly.

@Aluriak Aluriak deleted the impl/multiline_regexes branch September 6, 2017 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants