Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream_search: Split names at both "-" and space. #12075

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

YashRE42
Copy link
Collaborator

@YashRE42 YashRE42 commented Apr 7, 2019

Fixes #9399.
beforevsaftersplitat-
Testing Plan:
Added node test case, and screenshot from manual testing are bellow.

GIFs or Screenshots:
I created the following streams:
alldeminfras
We get better results for search on "infra"
findalldeminfras
Other searches are as normal:
otherwisenormal

@zulipbot
Copy link
Member

zulipbot commented Apr 7, 2019

Hello @zulip/server-search, @zulip/server-sidebars members, this pull request was labeled with the "area: left-sidebar", "area: search" labels, so you may want to check it out!

@YashRE42
Copy link
Collaborator Author

YashRE42 commented Apr 7, 2019

@timabbott I think this first commit is good to solve #9399, and should be safe to merge.

I think we could move these changes into the typeahead as a seperate follow up commit.

@YashRE42
Copy link
Collaborator Author

YashRE42 commented Apr 7, 2019

@rishig
Please have a look at how this second commit affects the typeahead, it's a small change which effects a few places, I hope all of these are positive changes.

Topic suggestions are improved eg "three-four" shown for "four".
topicSuggestionImprovement

Stream suggestions are improved eg "ihrd-infra.." shown for "infra".
improvementinstreamsuggest

"pm" will suggest individual pms operator as well as group pms operator
improvepm

"with" will suggest individual pms operator as well as group pms operator
improvewithopp

@@ -26,7 +26,8 @@ function filter_streams_by_search(streams, search_term) {
var filtered_streams = _.filter(streams, function (stream) {
return _.any(search_terms, function (search_term) {
var lower_stream_name = stream.toLowerCase();
var cands = lower_stream_name.split(" ");
// This regex allows us to split at " " and "-"
var cands = lower_stream_name.split(/-| /);
Copy link
Member

@rishig rishig Apr 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tim, thoughts on including a bunch of things here? like -, _, /, :, etc?
(Though I'm not sure how this interacts with our other stream typeahead searches -- ideally the various methods of searching for streams (here, left sidebar, compose typeahead, streams settings) would have similar logic here.)

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful about characters that have other meanings. Certainly : would be a bad choice in typeahead because of overlap with emoji. I think _- / are the widest set that makes sense.

@rishig
Copy link
Member

rishig commented Apr 7, 2019

cool, as a product change I think it's positive. I guess these would show up above other search results? (e.g. if there's someone whose email starts with pm, or whose name contains with?)

@YashRE42
Copy link
Collaborator Author

YashRE42 commented Apr 8, 2019

These show up bellow other results:
image

Apart from this, this makes me think about how we make matches with username searches as well:

image

"pm" does not match "with-pm" (the user), I think that is something that we might want to change?
I could look into that, it would be another reasonable follow up to this, if we want it to be changed.

@YashRE42
Copy link
Collaborator Author

@rishig @timabbott

@timabbott
Copy link
Sponsor Member

Thinking about this more, I worry this whole implementation is hacky and the wrong approach.

Perhaps rather than treating various characters as spaces, we should instead just change the matching algorithm for streams to be "prefix matches, followed by non-prefix textual matches" rather than "prefix matches"? Or possibly "prefix matches, followed by non-prefix textual matches starting with an alphanumeric character"

@YashRE42
Copy link
Collaborator Author

YashRE42 commented Apr 11, 2019

@timabbott
I think that makes sense, so far, I was working based on the comment on the issue, which mentioned:

I wonder if we're doing per-word prefix matching already but just not treating - as a word-break... if not, that's probably what we should be doing.

but I understand if that's not how we'd want to handle this.
I'm not sure I understand what you mean by "non-prefix textual matches" or "non-prefix textual matches starting with an alphanumeric character". Would this be the same as a fuzzy string search?
(several days later: after reading Rishi's next comment, I figured out what you meant)

Also, am I correct in assuming we'd also want to change the algorithms for the typeahead in the same way?
Also, thanks for reviewing.

@rishig
Copy link
Member

rishig commented Apr 13, 2019

I think we should consider doing just straight substring matching.
In this proposal, ordering would always be the same as the left sidebar (i.e. we wouldn't value prefix matches over other matches)

The biggest potential issue of substring matching (too many matches) is also mitigated by the fact that most users won't have more than 50-100 streams.

@YashRE42
Copy link
Collaborator Author

I tried this out, based on the suggestions I was getting, I just wanted to confirm with @rishig whether this is what we want or whether the approach Tim suggested would be better.

These results (to me) seemed to improve:
image
image

These results (to me) seemed to have gotten worse:
image
image

but that may be just because I'm used to our old behavior where I could just type "s" and get "sales" as the first result because it was the only stream that began with "s".

Of course, typing "sa" gets me just "sales".
image
and "er" gets:
image

If we're fine with these images, then no problems and we can discuss where else we want the same logic to be used.
If we're not fine with this, then we could resolve it in a few ways:

  1. Just use the previous method if we have less than say 3 letters to match against. (although this would probably look pretty inconsistent
  2. Use something similar to what Tim mentioned such that we get the prefix matches at the top

@rishig
Copy link
Member

rishig commented Apr 16, 2019

hm, good observation about the 1 letter searches being bad.

ok, what do you think about

  • match word prefixes, like Tim suggested
  • but keep sort order the same as we have now?

One nice thing about keeping the order is that pinned streams will naturally come above normal streams, will naturally come above inactive streams.

(This is also probably the minimal change we can make to our current behavior, that still resolves the issue.)

@YashRE42
Copy link
Collaborator Author

YashRE42 commented Apr 16, 2019 via email

@rishig
Copy link
Member

rishig commented Apr 16, 2019

It's different -- in my proposal a non-prefix pinned stream match will appear above a prefix non-pinned stream match.
(where non-prefix means word match -- no more arbitrary substring matches)

@YashRE42
Copy link
Collaborator Author

YashRE42 commented Apr 16, 2019 via email

@rishig
Copy link
Member

rishig commented Apr 17, 2019

Not quite that either :).

<Prefix and non-prefix matches in pinned streams, alphabetically ordered.>
<Prefix and non-prefix matches in normal streams, alphabetically ordered.>
<Prefix and non-prefix matches in dormant streams, alphabetically ordered.>

Changes phrase_match in common.js to split words at "-" in addition to
space. phrase_match is called in 4 places, which are effected as
follows:
- Stream suggestions are improved eg "ihrd-infra.." shown for "infra".
- Topic suggestions are improved eg "three-four" shown for "four".
- opperator suggestions are changed:
"with" will suggest individual pms as well as group pms
"pm" will suggest individual pms as well as group pms
- Integration searches will return matches after "-" but this doesn't
really affect anything right now as curently there are no integrations
with "-" in the display name.
This commit is supposed to be squashed with / replace the first commit
from this branch
@YashRE42
Copy link
Collaborator Author

@rishig There was some confusion last we were discussing this, I've pushed the changes that I was experimenting with.

Not quite that either :).

<Prefix and non-prefix matches in pinned streams, alphabetically ordered.>
<Prefix and non-prefix matches in normal streams, alphabetically ordered.>
<Prefix and non-prefix matches in dormant streams, alphabetically ordered.>

This is exactly what this branch currently is, perhaps it wasn't clear because the screenshots didn't have any pinned streams. Here is an example of that:
image
The above streams give this:
image

However, this suffers from the problem I was describing here:

I tried this out, based on the suggestions I was getting, I just wanted to confirm with @rishig whether this is what we want or whether the approach Tim suggested would be better.

These results (to me) seemed to improve:
image
image

These results (to me) seemed to have gotten worse:
image
image

but that may be just because I'm used to our old behavior where I could just type "s" and get "sales" as the first result because it was the only stream that began with "s".

Of course, typing "sa" gets me just "sales".
image
and "er" gets:
image

If we're fine with these images, then no problems and we can discuss where else we want the same logic to be used.
If we're not fine with this, then we could resolve it in a few ways:

1. Just use the previous method if we have less than say 3 letters to match against. (although this would probably look pretty inconsistent)

2. Use something similar to what Tim mentioned such that we get the prefix matches at the top

(all the images are how this currently is.)

@rishig
Copy link
Member

rishig commented May 21, 2019

yeah, I meant we should only match the beginning of words; sorry that wasn't clear. So, ap and ba would match apple banana, but n would not.

@YashRE42
Copy link
Collaborator Author

@rishig do you mean we should only do prefix matches for the first few letters?

Just so the context here hasn't been lost:
This PR is to resolve a complaint with our prefix matching that we won't catch words after a hyphen, the initial approach was to split words at hyphens at then do prefix matches on the split parts, but then we decided that doing substring matches would be better than splitting words.

@rishig
Copy link
Member

rishig commented May 21, 2019

I guess I should have used ap and ba would match apple-banana, but n would not. Does that answer the question? (If not let me know :))

but then we decided that doing substring matches would be better than splitting words.

Your examples of one letter matches were compelling, so this idea has been abandoned.

@YashRE42
Copy link
Collaborator Author

Your examples of one letter matches were compelling, so this idea has been abandoned.

Okay, glad I didn't rewrite that first commit then.

@timabbott
Is there a way for me to implement d19117d (first commit) to seem less hacky ?
Also, should I expand the regex to split at _, - and /?

The second commit (6c37d9c) pushes the changes into a few places, including the typeahead suggestions and a few other places as well, I can make changes there if needed.

If these are good then we can just drop the third commit (ab4fe76), I'm leaving it on this branch right now in case you want to see the alternate approach I tried (that we're rejecting from design side).

@rishig
Copy link
Member

rishig commented May 23, 2019

Can we just split on any non alpha-numeric character? (whatever the i18n equivalent is).

@zulipbot
Copy link
Member

Heads up @YashRE42, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master branch and resolve your pull request's merge conflicts accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support more than just left-anchored (prefix) matches in stream search UI
4 participants