-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add alternative to text search in SearchClans #58
base: master
Are you sure you want to change the base?
Conversation
models/clan.go
Outdated
filter = bson.M{"$text": bson.M{"$search": term}} | ||
} else { | ||
escapedTerm := fmt.Sprintf(`\Q%s\E`, term) | ||
filter = bson.M{"name": bson.M{"$regex": escapedTerm, "$options": "i"}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This query is very inefficient, be careful. It was like this before I changed it to use text index!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that, however the way we currently do it is by essentially by-passing khan to execute that search. Text search doesn't work as intended for our purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be acceptable to change to case-sensitive then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For case sensitive regular expression queries, if an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan.
As I understand, the search is still O(N) but with a cheaper constant, because you load index data instead of the documents themselves, which is supposed to be faster... I think only load tests would tell if it's really okay. But, as I remember, most queries would take more than 2 seconds with the regex of this patch...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I wasn't excluding case-sensitivity. Sorry if it wasn't clear.
\Q and \E are used to escape the term, in the sense of interpreting literally all the characters in between. https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC2
Got it. Still can't tell if it will have an impact or not... But the MongoDB docs do not have any warnings on that, so I guess it's okay. Anyway, only load tests will really tell if a solution is good or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble figuring out exactly what to do for those load tests. Am I supposed to write a new config file? How do I know what values would be reasonable? And after perfoming those load tests, how will I know if the result is acceptable or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Setup a new instance of Khan API to load test it. New instances of Redis, Postgres and MongoDB will also be necessary for this API to work for your purposes, because clan insertions on Postgres will be dispatched to MongoDB through Redis.
- Make sure this new API instance will send its latency metrics to somewhere (Datadog, Graphana...). You're interested in the latency metric of the Search Clans endpoint.
- Drill this API somehow (from your local machine or from a cluster) using the load tests command-line tool I created. For this step you just need to copy/change the tool's configs. What you want here is to create a lot of clans first, and then drill the Search Clans endpoint. For that you can first set the probability of
createClan
to 1 and everything else to 0, then later set the probability ofsearchClans
to 1 and everything else to 0. - After drilling the Search Clans endpoint, analyze its latency metric along the experiment. You want to know if this metric did not reach absurd values, like 2 seconds. Also, check the databases' metrics (CPU, etc.) to see if they were okay.
Someone in your team can certainly help you with the details in each of these steps. Also, refer to the docs of the load tests command-line tool, I'm pretty sure I explained all its configs there. If something is not clear, feel free to ask me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leohahn you probably want to read this thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @matheuscscp. We'll first see if the load test is really required since it is already running on one of our games, its 95p latency seems to be around 200-300ms.
I think the postgres container in travis-ci is broken, probably because of the same issue I fixed in the docker-compose for testing. |
Don't forget to make sure the index on |
api/clan.go
Outdated
@@ -678,6 +678,7 @@ func SearchClansHandler(app *App) func(c echo.Context) error { | |||
start := time.Now() | |||
gameID := c.Param("gameID") | |||
term := c.QueryParam("term") | |||
useTextSearchStr := c.QueryParam("useTextSearch") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally default boolean variables are false. I think having useRegexSearch
defaulting to false is more intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not only more intuitive, but also probably more safe, given we don't really know if regex search is acceptable before having results of load tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/clan_test.go
Outdated
@@ -12,6 +12,7 @@ import ( | |||
"encoding/json" | |||
"fmt" | |||
"net/http" | |||
netUrl "net/url" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
netUrl "net/url" | |
neturl "net/url" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/interface.go
Outdated
@@ -21,5 +21,5 @@ type KhanInterface interface { | |||
TransferOwnership(context.Context, string, string) (*TransferOwnershipResult, error) | |||
UpdateClan(context.Context, *ClanPayload) (*Result, error) | |||
UpdatePlayer(context.Context, string, string, interface{}) (*Result, error) | |||
SearchClans(context.Context, string) (*SearchClansResult, error) | |||
SearchClans(context.Context, string, ...bool) (*SearchClansResult, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to avoid using multiple bool arguments, otherwise you may have the following function call:
SearchClans(ctx, "", false, true, true, true, false)
It is not readable at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned below, it is meant to be a single optional argument, but I agree. I will follow your suggestion to use enums instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/lib.go
Outdated
func (k *Khan) SearchClans(ctx context.Context, clanName string, useTextSearchOpt ... bool) (*SearchClansResult, error) { | ||
useTextSearch := true | ||
if len(useTextSearchOpt) > 0 { | ||
useTextSearch = useTextSearchOpt[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you have an array of booleans if you only use the first one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to have an optional boolean argument, so as to be backwards compatible. Would it be better to disregard this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, if you want backwards compatibility, maybe add a new method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
filter = bson.M{"$text": bson.M{"$search": term}} | ||
} else { | ||
escapedTerm := fmt.Sprintf(`^\Q%s\E`, term) | ||
filter = bson.M{"name": bson.M{"$regex": escapedTerm}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golang Regex engine is guaranteed to run at linear time based on the length of the input.
https://golang.org/pkg/regexp/
The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input. (This is a property not guaranteed by most open source implementations of regular expressions.)
Do you know if Mongo guarantees the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implication if it does not follow is that the user can pass some exponential time regexes: https://stackoverflow.com/questions/8887724/why-can-regular-expressions-have-an-exponential-running-time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mongo uses PCRE, which does not have such guarantees. However, the regex /^\Q...\E/ is a prefix expression with no operators, and according to this https://docs.mongodb.com/manual/reference/operator/query/regex/#index-use,
/^a/ can stop scanning after matching the prefix
so it should be pretty efficient.
Will try to confirm with the load test.
models/clan_test.go
Outdated
@@ -1087,22 +1087,22 @@ var _ = Describe("Clan Model", func() { | |||
It("Should return clan by search term", func() { | |||
err := testing.CreateClanNameTextIndexInMongo(GetTestMongo, player.GameID) | |||
Expect(err).NotTo(HaveOccurred()) | |||
clans, err := SearchClan(testDb, testMongo, player.GameID, "SEARCH", 10) | |||
clans, err := SearchClan(testDb, testMongo, player.GameID, "SEARCH", 10, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to avoid raw boolean values on function calls. Adding an enumeration is easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
748dd2e
to
bde9cd5
Compare
…upport Add support to pagination to clan search queries
Make Limit an optional parameter to the lib's clan search call
The current clan search implementation uses the text search provided by Mongo. However, it turns out that if the clan name doesn't have any alphabetic characters, the text search will not find it, as it was made to find words. As such, the only way to find clans with such names is via regex search, which is the proposed alternative. You can choose which search to use via the new query parameter
useRegexSearch
, whereuseRegexSearch=true
means use regex search anduseRegexSearch=false
means use text search (if non-existent, defaults to false).