-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RegExMatch support for multiline pattern + unit test #38
Conversation
Thanks for the contribution. I agree that it would be better to have Do you agree? Could you update PR? |
Done. (i named the parameter Note that the condition: param What about decide of re.compile flags parameter in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to inline comments, compile
is needed as separate method because of the way the compiler is built from the Python code style grammars. RegExMatch
is constructed but is changed afterwards to account for the globaly set parser flags (currenty ignore_case
but might be others in the future).
Other feature that is planed and partly implemented in textX is to support per-rule flags, i.e. to define case sensitivity or some other flag in the context of a single grammar rule. This will require recompiling regexes or making multiple version on the fly.
arpeggio/__init__.py
Outdated
|
||
''' | ||
def __init__(self, to_match, rule_name='', root=False, ignore_case=None, | ||
str_repr=None): | ||
multiline=False, str_repr=None, re_flags=re.MULTILINE): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiline
param should by default be None
to test if the user set it explicitly.
arpeggio/__init__.py
Outdated
super(RegExMatch, self).__init__(rule_name, root) | ||
self.to_match_regex = to_match | ||
self.ignore_case = ignore_case | ||
self.multiline = multiline | ||
implicit_flag = any((self.ignore_case, self.multiline, to_match is None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicit None
test should be made as the user might set some of the flags to False
.
to_match
is never None
!
I think something like this:
implicit_flag = self.ignore_case is not None or self.multiline is not None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems more logical: implicit_flag = (self.ignore_case or 0) | (self.multiline or 0)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_mach
is neverNone
!
So what's the point of:
have
flags
param that if notNone
is used in regex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems more logical:
implicit_flag = (self.ignore_case or 0) | (self.multiline or 0)
These expressions are not equivalent. I though that implicit_flag
should be True
only if some of the param/flags is given (e.g. not None). Your expression will be True
(1 actually) if some of the param is True
, which is not what we are testing, right?
So what's the point of:
have
flags
param that if notNone
is used in regex
Sorry, maybe I wasn't clear enough. In the initial post I suggested to use flags param only if it is given, e.g. if its value (the value of flags
param not to_match
) is not None
which is default. In that case additional paras (ignore_case
and multiline
) would be ignored.
Another solution would be to merge all params by applying ignore_case
and multiline
over given re_flags
(which is by default re.MULTILINE
to keep current behaviour) anyway.
In addition, I think that |
|
This is a conflicting situation. Depending on the implementation one would be used or the other. It should be stated in the docs which way is the "stronger one". I guess that it is simpler to implement |
a35488e
to
1babb2b
Compare
Your last change seems perfectly fine. So, explicit flags will be "stronger". I'll just do some more testing before merge. Thanks! |
Well, it seems to be that implicit flags must be stronger, since, as you say, another object can modify RexExMatch instances attributes in order to stick with a global state. If explicit flags have priority over globally set values, then the global context is no more prioritized. Could be a way to short-circuit the global state, but for the sake of completness, a I have the impression that we do not use the same terms, or that one of us mistakes something. To me, implicit flags are those given by |
Yeah, |
Do you know when you will upload the next version of arpeggio providing this new feature ? |
No. Usually I do new release when several features are added and docs are updated to reflect new changes. In the mean time you can always |
But if you depend on this change in a package you want to publish on PyPI drop me a line so I will see to speed things up ;) |
Well, because of multiline support i had to use another parser library. At usage, i prefer arpeggio, but i would rather not maintain two concurrent implementations for too long. Do as you wish, i will adapt accordingly. |
This PR aims to provide multiline (comment) support.
This is performed by providing another switch to RegExMatch constructor,
multiline
.Purpose of
multiline
is to abstract the need of multiline regexes, made possible by there.DOTALL
flag.Please consider adding a more generalist
flag
parameter waiting for arbitrary re flags,allowing further customization of the RegExMatch object.