Add more accurate account search #11537
When ElasticSearch is available, a more accurate search is implemented:
Additionally, the previous behaviour is also kept:
The exact match precedence only takes effect when the input conforms to the username format and the username part of it is complete, i.e. when the user started typing the domain part.
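As a rough illustration of the condition above, the check could look something like the following sketch. The pattern and the function name `exact_match_candidate` are hypothetical and simplified; they are not taken from this PR, and the real username format may allow more characters.

```python
import re

# Hypothetical, simplified pattern: an optional leading "@", a username made of
# letters, digits and underscores, then "@" and a possibly still partial domain.
USERNAME_WITH_DOMAIN = re.compile(r"^@?([A-Za-z0-9_]+)@([A-Za-z0-9.\-]*)$")


def exact_match_candidate(term: str):
    """Return (username, partial_domain) once the username part is complete.

    The username counts as complete only when the user has typed the "@"
    that starts the domain part; otherwise None is returned.
    """
    match = USERNAME_WITH_DOMAIN.match(term.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Under these assumptions, "garg" yields no exact-match candidate, while "Gargron@" does, even though the domain is still empty.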
What does this mean? People whose display names contain Unicode characters such as accents and umlauts can now be found without typing those characters (asciifolding). Likewise, you no longer need to type the exact flavour of CJK characters (full-width or half-width) to get the same results. Accounts that haven't been active in a long time are less likely to appear near the top, as are accounts that are follow-botting (spamminess), while more established accounts (by number of followers) are more likely to appear higher, though only within the results of a specific search. As before, the biggest factor is whether you're following someone.
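To make the folding idea concrete, here is a minimal Python approximation using only the standard library. It is not the ElasticSearch analyzer itself, and `fold_for_search` is a hypothetical name; the real asciifolding filter covers more characters than this sketch does.

```python
import unicodedata


def fold_for_search(text: str) -> str:
    """Approximate accent folding and CJK width normalization."""
    # NFKC maps full-width and half-width compatibility forms to a common form.
    width_normalized = unicodedata.normalize("NFKC", text)
    # NFD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", width_normalized)
    # Drop only the Latin-style combining diacritics (U+0300 to U+036F), so that
    # marks that are part of CJK characters (e.g. kana voicing marks) survive.
    stripped = "".join(ch for ch in decomposed if not ("\u0300" <= ch <= "\u036f"))
    # Recompose what is left and ignore case.
    return unicodedata.normalize("NFC", stripped).casefold()
```

With this sketch, a display name typed as "Jürgen" and a query typed as "jurgen" fold to the same string, as do full-width and half-width spellings of the same name.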
I decided to run different queries for the same search term to see how the results differ and how they are scored. The queries did not consider follow relationships, because an account followed by the searching user will always appear at the top of the results.
Search term "garg"
Expecting to find: "Gargron"
Here is the status quo, search results from PostgreSQL for comparison:
The first result is a dead account, the rest are mostly bots.
Now, let's begin with a simple query without any additional scoring:
The third result is an active account, the rest, not so much. They're also all local accounts.
Same query with follower ratio affecting the score:
There are more interesting results here. Some accounts are bots/inactive/fake, however. The follower ratio isn't very insightful alone because it behaves wildly at low numbers.
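A small sketch of why the ratio alone is unreliable. The definition below (followers divided by following, plus one to avoid division by zero) is an assumption for illustration; the PR text does not spell out the exact formula.

```python
def follower_ratio(followers: int, following: int) -> float:
    """Hypothetical follower ratio; the +1 only guards against division by zero."""
    return followers / (following + 1)


# A nearly empty account versus an established one (made-up numbers).
tiny_account = follower_ratio(followers=3, following=0)
established_account = follower_ratio(followers=3000, following=2999)
```

Under this definition the nearly empty account ends up with a higher ratio than the established one, which is the kind of wild behaviour at low numbers described above.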
Same query, but with followers number affecting the score:
These are pretty interesting results. The first two accounts are real and active. The third is real, but belongs to an instance that's been dead for a year. The fourth is a dead account. The fifth and sixth accounts are real and active.
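For illustration, a followers-based boost could be expressed with ElasticSearch's `function_score` query and a `field_value_factor` function, built here as a plain Python dictionary. This is only a sketch: the field names `username`, `display_name` and `followers_count`, the `log2p` modifier and the helper name `followers_boost_query` are assumptions, not the query actually used in this PR.

```python
def followers_boost_query(term: str) -> dict:
    """Build a request body that multiplies the text score by a followers factor."""
    return {
        "query": {
            "function_score": {
                # The plain text match, before any additional scoring.
                "query": {
                    "multi_match": {
                        "query": term,
                        "fields": ["username", "display_name"],  # hypothetical fields
                    }
                },
                "functions": [
                    {
                        "field_value_factor": {
                            "field": "followers_count",  # hypothetical field
                            "modifier": "log2p",  # dampens very large counts
                            "missing": 0,
                        }
                    }
                ],
                # Multiply the text score by the function result.
                "boost_mode": "multiply",
            }
        }
    }
```

A logarithmic modifier is a natural choice here because it keeps accounts with huge follower counts from drowning out everything else.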
Same query, but with last activity affecting the score:
These results seem a bit nonsensical. At the very least, they're not all local, and the long-inactive gonext.gg account is not among them.
Now, same query, but combining all 3 scoring modifiers:
The first 4 results are real and active accounts. The fifth is fake, the sixth is inactive, the rest are dead. Seeing these suggestions, the user could press ENTER to get the desired completion immediately.
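How the three modifiers might interact can be sketched in plain Python. Everything in this block is hypothetical: the function name, the logarithmic followers factor, the capped ratio and the 30-day half-life are illustrative assumptions, not the formula used in this PR.

```python
import math
from datetime import datetime, timedelta


def combined_score(text_score: float, followers: int, following: int,
                   last_active: datetime, now: datetime,
                   half_life_days: float = 30.0) -> float:
    """Multiply a text relevance score by three hypothetical modifiers."""
    # Followers number, dampened so huge accounts do not dominate.
    followers_factor = math.log10(followers + 2)
    # Follower ratio, capped at 1.0 so it only penalizes follow-botting.
    ratio_factor = min(1.0, followers / (following + 1))
    # Last activity: the score halves for every half_life_days of inactivity.
    days_inactive = max(0.0, (now - last_active).total_seconds() / 86400)
    recency_factor = 0.5 ** (days_inactive / half_life_days)
    return text_score * followers_factor * ratio_factor * recency_factor


now = datetime(2019, 8, 15)
active = combined_score(1.0, 500, 200, now - timedelta(days=1), now)
dead = combined_score(1.0, 500, 200, now - timedelta(days=365), now)
follow_bot = combined_score(1.0, 50, 5000, now - timedelta(days=1), now)
```

With these made-up numbers, the active account scores above both the dead account and the follow-bot. Because the factors are multiplied, a weakness in any single signal is enough to push an account down the list.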
Search term "electro"
Expecting to find: "electroCutie@beach.city"
Status quo from PostgreSQL:
Just like in the other example, there is no real logic to the results and what we expect to find isn't there at all.
No additional scoring:
While an "electrocutie" is the second result, that's a different account that was last active in 2017.
Follower ratio affecting the score:
What we expect to find isn't there at all. The first result is an active and popular bot, but most of the other results are inactive or fake.
Followers number affecting the score:
The first two accounts are inactive. The third and fourth are real, and hey, it appears! What we expect to find is the 5th result.
Last activity affecting the score:
All the accounts have indeed posted within the last two months, and one of them is the one we expect to find. However, it is low on the list.
Combining all 3 scoring modifiers:
The first result is real. The second has never posted. The third is inactive. The fourth is real and active. The fifth is what we were looking to find. Seeing these suggestions, the user would have to type another "c" to get the desired result and press ENTER.