Improve search based on player name #248

Closed
abaisero opened this issue Mar 13, 2022 · 2 comments
Labels
enhancement feature request Any new feature for the AGAGD.

Comments

@abaisero
Contributor

abaisero commented Mar 13, 2022

Is your feature request related to a problem?

The current player name search appears to rely on naive substring matching, which causes a number of simple, straightforward search queries to fail.

I'll use the top ranked player's name (Albert Yen) as an example, also because it showcases additional issues which arise when someone has very common first and last names:

  • "yen, albert" finds the correct match.
  • "yen,  albert" (note the two spaces after the comma) finds no matches.
  • " yen, albert" (note the space before the surname) finds no matches.
  • "yen, albert " (note the space after the name) finds no matches.
  • "yen albert" (note no comma) finds no matches.
  • "albert yen" finds no matches.
  • "albert" finds many matches, which need to be parsed to find the right one.
  • "yen" finds many matches, which need to be parsed to find the right one.
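
The pattern above is what a plain case-insensitive substring test would produce. The following is a minimal sketch of that behavior, not the actual AGAGD code: `naive_match` and `STORED_NAME` are hypothetical names, and the "surname, name" storage format is assumed from the one query that succeeds.

```python
# Hypothetical reproduction; the real AGAGD lookup may differ.
STORED_NAME = "Yen, Albert"  # assumed "surname, name" storage format


def naive_match(query: str, full_name: str) -> bool:
    """Case-insensitive substring test on the whole, unmodified query."""
    return query.lower() in full_name.lower()


QUERIES = [
    "yen, albert",    # exact format
    "yen,  albert",   # two spaces after the comma
    " yen, albert",   # leading space
    "yen, albert ",   # trailing space
    "yen albert",     # no comma
    "albert yen",     # name before surname
    "albert",         # matches, but so does every other Albert
    "yen",            # matches, but so does every other Yen
]

if __name__ == "__main__":
    for q in QUERIES:
        print(f"{q!r:18} -> {naive_match(q, STORED_NAME)}")
```

Under these assumptions only the first and the last two queries match, which is consistent with the observations in the list.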

Especially when someone has a very common name or surname, it seems the only way to find their profile quickly is to match the exact format "surname, name", with the correct number of spaces and the comma in the right place. At a minimum, the search query should be sanitized to remove unnecessary spaces and commas.

Describe the feature you'd like to see on the AGAGD.

I don't know much about best practices when it comes to finding fuzzy substring matches, but off the top of my head the following would be a simple improvement which would address the biggest of the above issues:

  • take the search query, sanitize it and split it into tokens of alphanumeric characters (ignore spaces and commas, but allow dashes and apostrophes since those do appear in names)
  • for each token, run a search based on simple substring matching (or better yet, based on a version of substring matching which ignores accents and matches "o" with "ò" for example)
  • join the results obtained by each token, ignoring duplicates
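
The three steps above could be sketched roughly as follows. This is an illustration in plain Python rather than the AGAGD's own code: `fold`, `tokenize`, and `search_any` are hypothetical helpers, names are assumed to be stored as "surname, name" strings, and accent folding is done with Unicode NFKD decomposition from the standard library.

```python
import re
import unicodedata

# Runs of letters/digits, optionally joined by a single dash or apostrophe,
# so "O'Brien" and "Jean-Luc" each stay one token.
TOKEN_RE = re.compile(r"[^\W_]+(?:['-][^\W_]+)*")


def fold(text: str) -> str:
    """Lowercase and drop combining accents, so "ò" compares equal to "o"."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def tokenize(query: str) -> list[str]:
    """Sanitize the query and split it into tokens, ignoring spaces and commas."""
    return TOKEN_RE.findall(fold(query))


def search_any(query: str, names: list[str]) -> list[str]:
    """Run one substring search per token and join the results without duplicates."""
    found: dict[str, None] = {}  # a dict keeps first-seen order
    for token in tokenize(query):
        for name in names:
            if token in fold(name):
                found.setdefault(name, None)
    return list(found)
```

With this sketch, `search_any("albert yen", ...)` returns every Albert and every Yen, so joining per-token results fixes the formatting failures but not the "many matches" problem for common names. The commit referenced later in this thread instead requires all tokens to match.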
@abaisero abaisero added enhancement feature request Any new feature for the AGAGD. labels Mar 13, 2022
@abaisero
Contributor Author

Just realized this is related to #164, might warrant closing this.

abaisero added a commit to abaisero/agagd that referenced this issue Apr 5, 2022
Prior to this commit, the search query was used monolithically as a
single string to match the members' member id or their (comma-separated)
full name;  this had many shortcomings, e.g., a search query " 12345 "
would fail to match a member with id "12345", and a search query
"surname name" would fail to match a member with full name "surname,
name".  Also see usgo#164 and usgo#248.

In this commit, the search is first stripped of outer whitespace, which
helps search for member ids.  If the search query is not an id, then the
search query is split into tokens, and a match with a member is found if
all of the respective query tokens match the member's full name.  This
allows users to run queries such as "name", "surname", "name surname" or
"surname name".
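
The behavior described in this commit message can be illustrated with a small plain-Python sketch. It is not the code from the commit, which presumably works against the project's database models. The `Member` dataclass and `search_members` function are hypothetical, "is an id" is assumed to mean the stripped query consists only of decimal digits, and treating commas as token separators is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical stand-in for a member record."""
    member_id: int
    full_name: str  # assumed "surname, name" format


def search_members(query: str, members: list[Member]) -> list[Member]:
    """Match by id if the query looks like one, else require every token to match."""
    query = query.strip()  # outer whitespace no longer breaks id lookups
    if not query:
        return []
    if query.isdecimal():  # assumption: ids are purely numeric
        wanted = int(query)
        return [m for m in members if m.member_id == wanted]
    # Assumption: commas act as separators, like whitespace.
    tokens = query.replace(",", " ").lower().split()
    if not tokens:
        return []
    return [
        m for m in members
        if all(token in m.full_name.lower() for token in tokens)
    ]
```

Requiring all tokens to match means "albert yen" narrows to members whose full name contains both words, rather than returning every Albert and every Yen.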
@abaisero
Contributor Author

abaisero commented May 6, 2022

fixed by #249

@abaisero abaisero closed this as completed May 6, 2022