Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add alternative to text search in SearchClans #58

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

jpholanda
Copy link

@jpholanda jpholanda commented Mar 27, 2020

The current clan search implementation uses the text search provided by Mongo. However, it turns out that if the clan name doesn't have any alphabetic characters, the text search will not find it, as it was made to find words. As such, the only way to find clans with such names is via regex search, which is the proposed alternative. You can choose which search to use via the new query parameter useRegexSearch, where useRegexSearch=true means use regex search and useRegexSearch=false means use text search (if non-existent, defaults to false).

models/clan.go Outdated
filter = bson.M{"$text": bson.M{"$search": term}}
} else {
escapedTerm := fmt.Sprintf(`\Q%s\E`, term)
filter = bson.M{"name": bson.M{"$regex": escapedTerm, "$options": "i"}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query is very inefficient, be careful. It was like this before I changed it to use text index!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@jpholanda jpholanda Mar 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that, however the way we currently do it is by essentially by-passing khan to execute that search. Text search doesn't work as intended for our purposes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be acceptable to change to case-sensitive then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For case sensitive regular expression queries, if an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan.

As I understand, the search is still O(N) but with a cheaper constant, because you load index data instead of the documents themselves, which is supposed to be faster... I think only load tests would tell if it's really okay. But, as I remember, most queries would take more than 2 seconds with the regex of this patch...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wasn't excluding case-sensitivity. Sorry if it wasn't clear.
\Q and \E are used to escape the term, in the sense of interpreting literally all the characters in between. https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC2

Got it. Still can't tell if it will have an impact or not... But the MongoDB docs do not have any warnings on that, so I guess it's okay. Anyway, only load tests will really tell if a solution is good or not.

Copy link
Author

@jpholanda jpholanda Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble figuring out exactly what to do for those load tests. Am I supposed to write a new config file? How do I know what values would be reasonable? And after perfoming those load tests, how will I know if the result is acceptable or not?

Copy link
Contributor

@matheuscscp matheuscscp Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Setup a new instance of Khan API to load test it. New instances of Redis, Postgres and MongoDB will also be necessary for this API to work for your purposes, because clan insertions on Postgres will be dispatched to MongoDB through Redis.
  2. Make sure this new API instance will send its latency metrics to somewhere (Datadog, Graphana...). You're interested in the latency metric of the Search Clans endpoint.
  3. Drill this API somehow (from your local machine or from a cluster) using the load tests command-line tool I created. For this step you just need to copy/change the tool's configs. What you want here is to create a lot of clans first, and then drill the Search Clans endpoint. For that you can first set the probability of createClan to 1 and everything else to 0, then later set the probability of searchClans to 1 and everything else to 0.
  4. After drilling the Search Clans endpoint, analyze its latency metric along the experiment. You want to know if this metric did not reach absurd values, like 2 seconds. Also, check the databases' metrics (CPU, etc.) to see if they were okay.

Someone in your team can certainly help you with the details in each of these steps. Also, refer to the docs of the load tests command-line tool, I'm pretty sure I explained all its configs there. If something is not clear, feel free to ask me!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leohahn you probably want to read this thread

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @matheuscscp. We'll first see if the load test is really required since it is already running on one of our games, its 95p latency seems to be around 200-300ms.

@jpholanda
Copy link
Author

I think the postgres container in travis-ci is broken, probably because of the same issue I fixed in the docker-compose for testing.

@matheuscscp
Copy link
Contributor

matheuscscp commented Mar 27, 2020

Don't forget to make sure the index on name exists in all games. Since we don't have a MongoDB migrations tool, it would be a good idea to add the creation of these indexes here: https://github.com/topfreegames/khan/blob/master/cmd/migrate_mongo.go#L128

api/clan.go Outdated
@@ -678,6 +678,7 @@ func SearchClansHandler(app *App) func(c echo.Context) error {
start := time.Now()
gameID := c.Param("gameID")
term := c.QueryParam("term")
useTextSearchStr := c.QueryParam("useTextSearch")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally default boolean variables are false. I think having useRegexSearch defaulting to false is more intuitive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not only more intuitive, but also probably more safe, given we don't really know if regex search is acceptable before having results of load tests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

api/clan_test.go Outdated
@@ -12,6 +12,7 @@ import (
"encoding/json"
"fmt"
"net/http"
netUrl "net/url"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
netUrl "net/url"
neturl "net/url"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

lib/interface.go Outdated
@@ -21,5 +21,5 @@ type KhanInterface interface {
TransferOwnership(context.Context, string, string) (*TransferOwnershipResult, error)
UpdateClan(context.Context, *ClanPayload) (*Result, error)
UpdatePlayer(context.Context, string, string, interface{}) (*Result, error)
SearchClans(context.Context, string) (*SearchClansResult, error)
SearchClans(context.Context, string, ...bool) (*SearchClansResult, error)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to avoid using multiple bool arguments, otherwise you may have the following function call:

SearchClans(ctx, "", false, true, true, true, false)

It is not readable at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned below, it is meant to be a single optional argument, but I agree. I will follow your suggestion to use enums instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

lib/lib.go Outdated
func (k *Khan) SearchClans(ctx context.Context, clanName string, useTextSearchOpt ... bool) (*SearchClansResult, error) {
useTextSearch := true
if len(useTextSearchOpt) > 0 {
useTextSearch = useTextSearchOpt[0]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have an array of booleans if you only use the first one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to have an optional boolean argument, so as to be backwards compatible. Would it be better to disregard this?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, if you want backwards compatibility, maybe add a new method?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

filter = bson.M{"$text": bson.M{"$search": term}}
} else {
escapedTerm := fmt.Sprintf(`^\Q%s\E`, term)
filter = bson.M{"name": bson.M{"$regex": escapedTerm}}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golang Regex engine is guaranteed to run at linear time based on the length of the input.
https://golang.org/pkg/regexp/

The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input. (This is a property not guaranteed by most open source implementations of regular expressions.)

Do you know if Mongo guarantees the same?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implication if it does not follow is that the user can pass some exponential time regexes: https://stackoverflow.com/questions/8887724/why-can-regular-expressions-have-an-exponential-running-time

Copy link
Author

@jpholanda jpholanda Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mongo uses PCRE, which does not have such guarantees. However, the regex /^\Q...\E/ is a prefix expression with no operators, and according to this https://docs.mongodb.com/manual/reference/operator/query/regex/#index-use,

/^a/ can stop scanning after matching the prefix

so it should be pretty efficient.
Will try to confirm with the load test.

@@ -1087,22 +1087,22 @@ var _ = Describe("Clan Model", func() {
It("Should return clan by search term", func() {
err := testing.CreateClanNameTextIndexInMongo(GetTestMongo, player.GameID)
Expect(err).NotTo(HaveOccurred())
clans, err := SearchClan(testDb, testMongo, player.GameID, "SEARCH", 10)
clans, err := SearchClan(testDb, testMongo, player.GameID, "SEARCH", 10, true)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to avoid raw boolean values on function calls. Adding an enumeration is easier to understand.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jpholanda jpholanda force-pushed the master branch 2 times, most recently from 748dd2e to bde9cd5 Compare March 31, 2020 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants