support multi-index ES search #57

Rizziepit · 2015-05-13T12:01:30Z

No description provided.

Rizziepit · 2015-05-13T12:41:40Z

@smn @miltontony I'd like your input on this. I'm weighing up the pros and cons of implementing multi-index search vs. using the same index for multiple repos.

Multi-index

Pro: We can easily search across a collection of workspaces. I can reuse existing elastic-git functionality to make this happen.
Con: Elasticsearch's scoring operates per index (more info here). I might be mistaken, but I don't think there is a way to rank results from multiple indices given that the scores aren't comparable. Either of you know more?
Con: Probably a slight performance hit given the small amount of data per repo.

Single-index, multiple repos

Pro: Search scoring works as intended.
Con: We can't use the index to identify the repo. So we have to store that information in Elasticsearch. The one repo = one index assumption in elastic-git means I'll have to rewrite some functionality.
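
To make the single-index trade-off concrete: each document would have to carry its repo, and any per-repo query would filter on it. Below is a minimal sketch, not elastic-git code. The `repo` and `title` fields and both helper names are hypothetical. The body uses Elasticsearch's `bool` query with a `filter` clause, which may need adapting to the Elasticsearch version in use.

```python
def tag_with_repo(doc, repo_name):
    """Return a copy of doc that records which repo it came from."""
    tagged = dict(doc)
    tagged['repo'] = repo_name  # hypothetical field name
    return tagged


def query_for_repos(text, repo_names):
    """Build a search body matching `text`, restricted to the given repos."""
    return {
        'query': {
            'bool': {
                'must': {'match': {'title': text}},
                # A filter narrows the results without contributing to the
                # score. Assumes 'repo' is indexed as an exact, un-analyzed
                # value.
                'filter': {'terms': {'repo': list(repo_names)}},
            }
        }
    }


doc = tag_with_repo({'title': 'Hello'}, 'repo1')
body = query_for_repos('hello', ['repo1', 'repo2'])
```

Because all documents share one index, relevance scores are computed against the same term statistics, which is the "pro" listed above.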

smn · 2015-05-13T13:32:08Z

How are we going to map the list of workspaces to indexes in the underlying elastic-git S() stuff?

smn · 2015-05-13T13:33:44Z

What if we changed the idea of workspaces to include multiple repositories?

ws = EG.workspace(dir1, dir2, dir3)

Would we use the same index prefix for all content directories or have separate ones for each and then query multiple indexes with S()?

Which, now that I'm reading things again, is what I suppose you're asking too?

Rizziepit · 2015-05-13T13:58:13Z

@smn I like upgrading a workspace to have multiple content dirs. I'm slightly inclined towards implementing the same index for all content directories. This approach doesn't break scoring. Is there a good reason to put content dirs in separate indices? @miltontony?

smn · 2015-05-13T14:00:14Z

Are we sure scoring breaks across different indexes?
I like the idea of separate indexes because then we still have the option of querying them separately.

Rizziepit · 2015-05-13T14:16:20Z

No, I'm not sure. I'm basing this on the fact that scoring can be wonky across shards. You can explicitly use a DFS query (search_type=dfs_query_then_fetch) to fix it, but I can't find any mention of similar functionality across indices. It might be because people tend to put different types in different indices. I'm probably going down a rabbit hole though. Ignore and implement separate indices?
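
For reference, Elasticsearch accepts a comma-separated list of indices in the search URL, and `dfs_query_then_fetch` is a valid `search_type` value. The sketch below only builds the request path. The helper name is made up, and whether the DFS phase fully evens out scores across separate indices is not confirmed anywhere in this thread.

```python
from urllib.parse import quote, urlencode


def multi_index_search_path(indexes, use_dfs=False):
    """Build the URL path for a search that spans several indices."""
    # Elasticsearch takes comma-separated index names before /_search.
    path = '/' + ','.join(quote(name, safe='') for name in indexes) + '/_search'
    if use_dfs:
        # Request the extra pre-query phase that gathers term statistics
        # from the shards involved before scoring.
        path += '?' + urlencode({'search_type': 'dfs_query_then_fetch'})
    return path


path = multi_index_search_path(['repo1-master', 'repo2-master'], use_dfs=True)
```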

smn · 2015-05-13T14:17:50Z

Perhaps we could allow both? Always single indexes, optionally a combined index?

Rizziepit · 2015-05-13T14:21:02Z

+1

Rizziepit · 2015-05-15T10:02:22Z

@smn @miltontony feedback on what I've done so far would be much appreciated.

So far I've partially converted Workspace and StorageManager to use multiple repos. It adds a lot of complexity, but I don't think it's a good idea to put multi-repo logic elsewhere (e.g. a Unicore frontend site). Thoughts?

Rizziepit · 2015-05-18T13:50:34Z

New approach for multi-index search:

# indexes = ['repo1-master', 'repo2-master']
[obj] = S(
    Localisation,
    in_=['repos/repo1', 'repos/repo2'])

# indexes = ['tz-repo1-prod-master', 'tz-repo2-prod-master']
[obj] = S(
    Localisation,
    in_=['repos/repo1', 'repos/repo2'],
    index_prefixes=['tz-repo1-prod', 'tz-repo2-prod'])
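
The index names in the comments above follow a visible pattern: the directory name (or the given prefix) joined to the branch name. Here is a sketch of that derivation, assuming the branch is always `master` as in the examples. `index_names` is an illustrative helper, not the elastic-git implementation.

```python
import os


def index_names(repo_paths, index_prefixes=None, branch='master'):
    """Map repo paths (and optional prefixes) to index names."""
    if index_prefixes is None:
        # Default prefix: the last component of each repo path.
        index_prefixes = [os.path.basename(p.rstrip('/')) for p in repo_paths]
    if len(index_prefixes) != len(repo_paths):
        raise ValueError('need exactly one index prefix per repo')
    return ['%s-%s' % (prefix, branch) for prefix in index_prefixes]


default_names = index_names(['repos/repo1', 'repos/repo2'])
prefixed_names = index_names(
    ['repos/repo1', 'repos/repo2'],
    index_prefixes=['tz-repo1-prod', 'tz-repo2-prod'])
```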

Queries return ReadOnlyModelMappingType objects which have a to_object method - get_object raises NotImplementedError.
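
A rough illustration of that behaviour, using stand-in classes rather than the real elasticgit or elasticutils ones. The `Localisation` stand-in, the `_source` attribute, and the constructor shape are all assumptions made for the example.

```python
class Localisation(object):
    """Stand-in model: just stores the fields it is given."""

    def __init__(self, data):
        self.data = dict(data)


class ReadOnlyMappingTypeSketch(object):
    """Wraps one search hit; can rebuild a model but not load it from a repo."""

    model_class = Localisation

    def __init__(self, source):
        self._source = source

    def get_object(self):
        # Assumed rationale: a multi-repo query has no single workspace
        # to read the object back from.
        raise NotImplementedError('read-only mapping type: use to_object()')

    def to_object(self):
        # Build the model purely from the data stored in the search index.
        return self.model_class(self._source)


hit = ReadOnlyMappingTypeSketch({'locale': 'eng_GB'})
obj = hit.to_object()
```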

Rizziepit · 2015-05-18T19:01:26Z

@smn @miltontony ready for review.

I'm getting a failure locally on repeat test runs because TestSearch.workspace1's index isn't deleted after tests. I can't figure out why. Any ideas?

smn · 2015-05-18T19:04:47Z

@Rizziepit try running that single test with KEEP_REPO=1 py.test unicore -k <name of test> and then inspect what stuff is left lying around in the .repos dir? The KEEP_REPO environment flag is in elasticgit.tests.base.ModelBaseTest
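
For anyone unfamiliar with the flag, the general pattern is a test base class that skips cleanup when an environment variable is set. The sketch below shows that pattern only. It is not the code in `elasticgit.tests.base.ModelBaseTest`, and the class and method names are invented.

```python
import os
import shutil
import tempfile
import unittest


class KeepRepoTestSketch(unittest.TestCase):
    """Creates a scratch dir per test; removes it unless KEEP_REPO is set."""

    def setUp(self):
        self.repo_dir = tempfile.mkdtemp(prefix='repo-')
        self.addCleanup(self._cleanup_repo)

    def _cleanup_repo(self):
        if os.environ.get('KEEP_REPO'):
            # Leave the directory behind so it can be inspected by hand.
            return
        shutil.rmtree(self.repo_dir, ignore_errors=True)

    def test_repo_dir_exists(self):
        self.assertTrue(os.path.isdir(self.repo_dir))
```

With a setup like this, anything still present after a run without the flag points to cleanup that never happened.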

smn · 2015-05-19T06:41:53Z

elasticgit/search.py

@@ -24,25 +35,57 @@ def get_mapping_type_name(cls):
    def get_model(self):
        return self.model_class

-    def get_object(self):


I think we should rather raise a NotImplementedError here instead of removing the function entirely.

smn · 2015-05-19T08:21:12Z

Sorry for the questions overload, I'm probably missing some obvious stuff...

smn · 2015-05-19T11:24:03Z

👍 on the code.
Could you add some documentation for this?

Rizziepit · 2015-05-19T12:55:50Z

elasticgit/search.py

+        return obj
+
+
+class SM(S):


@smn this could use a re-review. Instead of a single S class, there is now S and SM where SM is explicitly instantiated with model_class and in_. Workspace still uses S so that it can specify the read-write mapping type.

smn · 2015-05-19T15:46:22Z

👍 from me on the S() -> SM() changes.

smn · 2015-05-20T08:53:45Z

👍 again

smn added the in progress label May 13, 2015

smn reviewed May 19, 2015

Rizziepit added 3 commits May 19, 2015 11:59

fix test issue where indexes aren't deleted

4d4b3d9

improve MappingType classes

9cc1cdf

Merge branch 'feature/issue-57-support-multi-index-ES-search' of github.com:universalcore/elastic-git into feature/issue-57-support-multi-index-ES-search

f56a551

split S into S and SM, latter explicitly initialized with model_class

0c06b12

Rizziepit reviewed May 19, 2015

fix handling of repos in SM.__init__

434cef0

Rizziepit added 2 commits May 19, 2015 18:20

document SM class

42812c1

tests to cover SM._clone and SM.get_es

4c711fe

Rizziepit merged commit 4c711fe into develop May 20, 2015

smn removed the in progress label May 20, 2015

Rizziepit mentioned this pull request May 21, 2015

Have the RemoteWorkspace work with the diff returned by unicore.distribute's pull. #59

Merged
