Custom realm filters #544

Closed
wants to merge 1 commit into
from

Projects

None yet

4 participants

@blablacio
Contributor

Custom realm filters can now be managed through admin. For example, you can now say that #([a-z]+)-([0-9]+) should be expanded to https://jira.zulip.com/%(pattern)%, so when you send a message that matches the pattern like #zul-385, it gets auto linkified to https://jira.zulip.com/zul-385.

I'm still investigating a problem with some patterns that crash the Markdown parser.

@smarx
smarx commented Mar 17, 2016

Automated message from Dropbox CLA bot

@blablacio, it looks like you've already signed the Dropbox CLA. Thanks!

@timabbott
Member

Looks like this has a decent number of lint errors in the frontend code (reported in the backend tests but the easy way to run locally is tools/lint-all.

The frontend test failure "The file admin_filter_list.handlebars has no test coverage." could probably use its error message improved, but the issue is that we require every handlebars template to be tested by the Node tests; so you should add a test for in in frontend_tests/node_tests/templates.js.

(I bet the ones on the realm filters PR are similar, though I haven't looked at them yet)

@blablacio
Contributor

I believe issues with Markdown and lint errors are resolved now. I added Node test for the template too.

@timabbott timabbott commented on an outdated diff Mar 28, 2016
static/js/admin.js
@@ -110,6 +137,15 @@ exports.setup_page = function () {
error: failed_listing_streams
});
+ // Populate filters table
+ channel.get({
+ url: '/json/realm/filters',
+ timeout: 10*1000,
+ idempotent: true,
+ success: populate_filters,
+ error: failed_listing_filters
+ });
@timabbott
timabbott Mar 28, 2016 Member

Same comment here as with realm_emoji -- the data is already fetched in the / query so we don't need the extra GET call here.

@timabbott
Member

Great, thanks @blablacio ! I triggered the tests to rerun that failed due to the PhantomJS issue, but probably the best approach is to rebase on top of master when you next update this, so we get your fix to that issue using curl.

I'm interested in your thinking around changing the format to just use the first group. While the ([0-9]+) syntax is obviously briefer than the (?P<id>[0-9]+) example, I worry that some more complex regular expressions might be best written with multiple groups (only intending to use one of them for substitution later), though maybe this isn't a practical concern.

The other issue here is backwards-compatibility -- we have existing regular expressions in the old style in existing Zulip installations (added via the management command), so we need to at least support the old syntax. That said, I think for a lot of users, the simpler syntax is probably better, so maybe the right model is to support both.

Also, from a commit readability perspective, changing the valid URL patterns should be a separate commit from adding the UI. So my proposal for how to proceed is for you to rebase and split this commit into two pieces, first a commit that just adds the new UI (which we can try to clean up and merge quickly), and second the changes to what patterns are valid (which need more significant changes to instead support both pattern styles).

Does that approach seem reasonable to you @blablacio ?

I'm not sure how complex it would be to break these changes apart, so let me know if that sounds difficult.

@timabbott timabbott commented on the diff Mar 28, 2016
static/styles/zulip.css
@@ -4184,3 +4185,11 @@ li.show-more-private-messages a {
}
}
+
+.admin_filters_table {
+ margin-top: 20px;
+}
+
+#admin-filter-pattern-status, #admin-filter-format-status {
+ margin: 20px 0 0 0;
+}
@timabbott
timabbott Mar 28, 2016 Member

should fix missing newline at end of file

@timabbott timabbott commented on an outdated diff Mar 28, 2016
zerver/models.py
return filters
def all_realm_filters():
filters = defaultdict(list)
for realm_filter in RealmFilter.objects.all():
- filters[realm_filter.realm.domain].append((realm_filter.pattern, realm_filter.url_format_string))
+ filters[realm_filter.realm.domain].append((realm_filter.pattern, realm_filter.url_format_string, realm_filter.pk))
@timabbott
timabbott Mar 28, 2016 Member

I guess we're missing a space in this for loop too? Might as well fix that while you're at it.

@timabbott timabbott added this to the 2016 roadmap milestone Apr 29, 2016
@timabbott
Member
timabbott commented May 3, 2016 edited

@blablacio just FYI I noticed and fixed this minor bug in the realm emoji version: 4241e01; it's probably present in this implementation as well.

@blablacio
Contributor

Okay, I rebased and addressed all the issues mentioned above. Now only the first part of the diff is in with the old style syntax. Working on the next diff that supports both syntaxes. I also tweaked the UI, so it looks more like the custom realm emoji panel. Backend tests are failing due to linting errors, fixed in the html-lint-fix branch.

@blablacio
Contributor

As for my thinking around the format change -- I think we already discussed this a few weeks back and my main concern here was the security aspect.

Users can abuse the server with overly complex expressions, using lookaround assertions, backreferences, recursion and all the other fancy stuff regular expressions allow us to do (even though Python regular expression don't support some of these features).

What if the user is trying to use regular expression features present only in PCRE compatible flavors? Even constraining the filter length to some sensible value won't help us as we'll have to practically test the regular expression, which would lead to undesirable resource consumption.

We can't match regular expressions with regular expressions by definition. At least not the full grammar and not without recursion.

So, in the end we need to decide whether we want to allow the old syntax in the front end at all. From my perspective we should disable old-style syntax in the front end and leave it enabled through the management command only. Then we need to decide what (simpler) syntax to allow on the front end and define it, so it can be matched against a regular expression or parsed with a simple parser.

Probably we should think about the use cases and go from there. That's what I tried to do in previous code by allowing expressions like #([A-Za-z0-9]-[A-Za-z0-9]) for example as presumably that should work for allowing shorthand entry of URLs for most issue trackers. GitHub uses just numbers, JIRA uses project prefix and a number separated by a -.

@timabbott
Member

Hmm, yeah, allowing the old syntax only via management command is potentially a reasonable approach. The cases I can think of that #([A-Za-z0-9]-[A-Za-z0-9]) doesn't cover are primarily ones where one wants a different leading character or couple characters -- e.g. if you're using phabricator and RT you might have T1234, D1234, and RT1234 referencing Phabricator tasks, Phabricator diffs, and RT issues. And I'm aware of some cases where folks do something like [1234] or bug 123 (whitespace important). So maybe we want the support things of the form <leadingblock>(<matchexpn>)<trailingblock> where and can each be any fixed string (we can just re.escape it for the matching process, but maybe we'd restrict them to not have '(' and ')' characters, respectively, to define their boundaries) and is something like [A-Za-z0-9]-[A-Za-z0-9] as you suggest (and we can expand in small ways to add things like _ as a separator if it becomes important). Do you think that is a reasonable balance of power against ease of implementing security?

Playing around with the UI, I think there's a few things we'll want to do on the design side:

  • We probably need some sort of documentation or link to an explanation of what these are and how they work. It might suffice to have some prose at the start of the "realm filters" section explaining something like: "Configure special patterns that will be automatically linkified when they appear in Zulip messages and topics. For example, you could make "#124" automatically link to issue 124 in an issue tracker.".
  • We'll want something documentation-wise explaining what's allowed in patterns, and maybe give a specific example the user can copy/paste; maybe that should be a tooltip in the creation UI?
  • Probably the URL format box should be wider, similar to the realm emoji one perhaps?
  • Ideally, there'd be some way for the user to test their expression so one doesn't have to send a message to find out if one did it right, if we can figure out a way to do this without a lot of work...
@timabbott timabbott commented on the diff May 9, 2016
zproject/urls.py
# Returns a 204, used by desktop app to verify connectivity status
url(r'generate_204$', 'generate_204'),
+) + patterns('zerver.views.realm_filters',
+ url(r'^realm/filters$', 'rest_dispatch',
+ {'GET': 'list_filters',
+ 'PUT': 'create_filter'}),
+ url(r'^realm/filters/(?P<filter_id>\d+)$', 'rest_dispatch',
+ {'DELETE': 'delete_filter'}),
+ # Endpoint used by iOS devices to register their
+ # unique APNS device token
@timabbott
timabbott May 9, 2016 Member

this comment seems to have gotten separated from the relevant code...

@timabbott timabbott commented on the diff May 9, 2016
zerver/lib/actions.py
@@ -2951,12 +2951,20 @@ def notify_realm_filters(realm):
# * Named groups will be converted to numbered groups automatically
# * Inline-regex flags will be stripped, and where possible translated to RegExp-wide flags
def do_add_realm_filter(realm, pattern, url_format_string):
- RealmFilter(realm=realm, pattern=pattern,
- url_format_string=url_format_string).save()
+ realm_filter = RealmFilter(
+ realm=realm, pattern=pattern,
+ url_format_string=url_format_string)
+ realm_filter.full_clean()
+ realm_filter.save()
notify_realm_filters(realm)
@timabbott
timabbott May 9, 2016 Member

probably should rename to check_add_realm_filter since this can fail now

@blablacio
blablacio May 9, 2016 Contributor

Oops, this slipped away from a previous diff, will fix now.

@timabbott timabbott commented on the diff May 9, 2016
zerver/models.py
+ if not value:
+ raise ValidationError('Filter pattern cannot be empty')
+
+ if prefix not in settings.ALLOWED_FILTER_PREFIXES:
+ raise ValidationError(
+ 'Filter pattern cannot start with `%s`, use one of the valid prefixes: %s' %
+ (value[0], ', '.join(settings.ALLOWED_FILTER_PREFIXES)))
+
+ try:
+ re.compile(pattern)
+ except:
+ # Regex is invalid
+ raise ValidationError('Invalid filter pattern')
+
+ if not regex.match(pattern):
+ raise ValidationError('Pattern format string must be in the following format: (?P<\w+>.+)')
@timabbott
timabbott May 9, 2016 Member

Might make sense to add a small test class for this validator to keep track of important examples we want to make sure work

@blablacio
Contributor

So, basically <leadingblock> translates to settings.ALLOWED_FILTER_PREFIXES and we should add something similar to settings for <trailingblock> (probably settings.ALLOWED_FILTER_SUFFIXES)? This means that in this instance [1234], the <leadingblock> would be [ and the <trailingblock> would be ] or otherwise the whole expression is the (<matchexpn>), which is doable if we keep the number of brackets allowed to the bare minimum (like one pair).

And for the other example bug 123, we'll have no <leadingblock> or <trailingblock> and a filter pattern similar to bug \d+.

I'll think about this and come up with a solution over the next few days. I'll also add the documentation as it's really hard for a new comer to understand what this is all about. I was thinking about how filters can be tested in the UI, but still have no easy way of doing it. Probably we can have a button in the form to test this by opening a pop up with the compose box and rendering a test message? Suggestion welcome.

@timabbott
Member

Re rendering, yeah, I think maybe having. We have code on the frontend to do rendering in echo.js (search for realm_filter there and you'll find it), so we just need to make a bit of UI to provide an input/output section. I'd probably try to do it as something that expands out in the creation form, not a popup/modal, since I think a popup/modal would be pretty annoying if you need to iterate on your pattern.

A few thoughts on the rendering model:

  • Do we need the ALLOWED_FILTER_* settings? I was thinking the / model could mean we don't need to handcode anything there.
  • For bug 123, I was thinking one could do leadingblock=bug; no trailingblock; filter pattern would be just [0-9]+. The idea was that we could keep the matchexpn logic simpler this way.

I guess to summarize in more detail, concretely, what I had in mind was something like this to process patterns:

# Example pattern: [([0-9]+)]
if '(' in pattern:
    (leading_block, pattern) = pattern.split('(', 2)
    (pattern, trailing_block) = pattern.rsplit(')', 2)
else:
    leading_block = trailing_block = ''

# now validate the remaining `pattern` is appropriately limited to be some combination of character classes.
... # insert validation here

regexp = r'%s(%s)%' % (re.escape(leading_block), pattern, re.escape(trailing_block))

But maybe this is getting difficult for users to understand...

@timabbott
Member

I guess if we wanted to explain something like that to users we could always make the input box 3 fields next to each other.

@blablacio
Contributor

So, my initial idea about ALLOWED_FILTER_PREFIXES setting was that we need to actually constrain and validate these. If we don't allow regular expressions for prefix and suffix and just accept the literal values then we should be good to go I guess. I actually like the idea of having three boxes for prefix, pattern and suffix and working on this and the other suggestions already.

@timabbott
Member

OK sounds good!

@brainwane
Contributor

@blablacio on how the user can test it in the UI: this past weekend I had a very good experience using http://regexr.com which made it easy for me to test out the regexes I was writing and iterate to get the regexes I wanted. So a model like that would also be helpful for me in this instance, I think.

@TigorC TigorC referenced this pull request Nov 9, 2016
Closed

Custom realm filters #2261

@timabbott timabbott added a commit that closed this pull request Nov 18, 2016
@blablacio @timabbott blablacio + timabbott Add initial implementation of custom realm filters.
This PR was abandoned by Vladislav and then substantially modified by
Igor Tokarev and Tim Abbott to complete it and fix a number of bugs.

Fixes #544.
d7e1e4a
@timabbott timabbott closed this in d7e1e4a Nov 18, 2016
@timabbott
Member

Merged via #2261.

@timabbott timabbott modified the milestone: Zulip roadmap, Old roadmap Nov 18, 2016
@brockwhittaker brockwhittaker added a commit to brockwhittaker/zulip that referenced this pull request Nov 30, 2016
@blablacio @brockwhittaker blablacio + brockwhittaker Add initial implementation of custom realm filters.
This PR was abandoned by Vladislav and then substantially modified by
Igor Tokarev and Tim Abbott to complete it and fix a number of bugs.

Fixes #544.
a2658c5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment