
design doc(to be continued) #9

Open · wants to merge 1 commit into base: develop

Conversation

chenkaitopic

Click the View button to see the fully rendered markdown text.


## Specifications
### Sharding strategy
We plan to use document-based sharding. A comparison between document-based sharding and term-based sharding ([stolen from Jeff Dean](http://web.stanford.edu/class/cs276/Jeff-Dean-Stanford-CS276-April-2015.pdf)):
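As a minimal sketch of what document-based sharding implies for the query path (the `Leaf` and `searchAll` names are illustrative, not from this PR): each leaf indexes only its own subset of documents, so the root must send every query to all leaves and merge the partial hit lists.

```go
package main

import (
	"fmt"
	"sort"
)

// Leaf holds a full term -> doc-ID index, but only for its own documents.
type Leaf struct {
	index map[string][]int64
}

// Search returns the leaf's local hits for a term.
func (l *Leaf) Search(term string) []int64 {
	return l.index[term]
}

// searchAll is the root's job under document-based sharding: any leaf may
// hold matching documents, so the query fans out to all of them and the
// partial results are merged.
func searchAll(leaves []*Leaf, term string) []int64 {
	var hits []int64
	for _, l := range leaves {
		hits = append(hits, l.Search(term)...)
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i] < hits[j] })
	return hits
}

func main() {
	leaves := []*Leaf{
		{index: map[string][]int64{"go": {0, 2}}},
		{index: map[string][]int64{"go": {1001}}},
	}
	fmt.Println(searchAll(leaves, "go")) // [0 2 1001]
}
```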
Contributor


Awesome you found this comparison!

@yitopic
Contributor

yitopic commented Feb 16, 2016

I like that one node can hold multiple shards. That decouples the shard concept in the algorithm design from the number of nodes in system operations.

**Sequential Document ID Generation**

I'm thinking: since we have to persist every document anyway, could we number documents by the order in which they are added to our system (i.e., the order in which they are persisted) and use that number as the document ID?

In other words, document IDs simply start from 0. To delete a document, we mark it as removed in persistent storage (e.g., S3) rather than actually deleting it, and in particular we never change the document ID sequence.

This way we can realize "every N documents form one shard", and newly arriving documents always go into new shards.
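A minimal sketch of this idea, assuming a simple tombstone flag instead of real deletion; the names (`docsPerShard`, `Record`, `Store`) are mine, not from this comment:

```go
package main

import "fmt"

const docsPerShard = 1000 // "every N documents form one shard"

// Record is what gets persisted (e.g. to S3): the body plus a removed flag,
// so a delete never disturbs the ID sequence.
type Record struct {
	Body    string
	Removed bool
}

// Store assigns document IDs in persist order, starting from 0.
type Store struct {
	records []Record
}

// Persist appends the document and returns its sequential ID.
func (s *Store) Persist(body string) int64 {
	s.records = append(s.records, Record{Body: body})
	return int64(len(s.records) - 1)
}

// Remove only marks the document as removed; the ID sequence is untouched.
func (s *Store) Remove(id int64) { s.records[id].Removed = true }

// ShardOf maps a document ID to its shard, so new documents land in new shards.
func ShardOf(id int64) int64 { return id / docsPerShard }

func main() {
	var s Store
	id := s.Persist("hello")
	s.Remove(id)
	fmt.Println("doc", id, "is in shard", ShardOf(id)) // doc 0 is in shard 0
}
```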

**Indexer Root in addition to Search Root**

Separately, could we (or rather, should we) add another process alongside the search root: an indexer root?

![Alt text](http://g.gravizo.com/g?
digraph G {
size = "4,4";
SearchRoot -> Leaf1;
SearchRoot -> Leaf2;
SearchRoot -> Leaf3;
IndexerRoot -> Leaf1;
IndexerRoot -> Leaf2;
IndexerRoot -> Leaf3;
PersistentStorage [shape=box]
IndexerRoot -> PersistentStorage
Etcd [shape=box]
IndexerRoot -> Etcd;
SearchRoot -> Etcd;
Leaf1 -> Etcd
Leaf2 -> Etcd
Leaf3 -> Etcd
})

The IndexerRoot is responsible for generating document IDs sequentially.

Concretely, SearchRoot implements the RPC Search, and IndexerRoot implements the RPC AddDocument.
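A minimal sketch of those two RPC surfaces, assuming Go's standard `net/rpc` package (each method takes an args value and a reply pointer and returns an error); all type and field names here are placeholders, not from this PR:

```go
package main

import (
	"log"
	"net"
	"net/rpc"
)

type SearchArgs struct{ Query string }
type SearchReply struct{ DocIDs []int64 }

// SearchRoot exposes the Search RPC; it would fan the query out to the leaves.
type SearchRoot struct{}

func (s *SearchRoot) Search(args SearchArgs, reply *SearchReply) error {
	// ... ask every leaf, merge their hits into reply.DocIDs ...
	return nil
}

type AddDocumentArgs struct{ Body string }
type AddDocumentReply struct{ DocID int64 }

// IndexerRoot exposes the AddDocument RPC and owns sequential ID generation.
type IndexerRoot struct{}

func (i *IndexerRoot) AddDocument(args AddDocumentArgs, reply *AddDocumentReply) error {
	// ... enqueue args.Body, wait for the assigned ID, fill reply.DocID ...
	return nil
}

func main() {
	// In the proposed design these would be two separate processes;
	// they are registered together here only to keep the sketch short.
	rpc.Register(&SearchRoot{})
	rpc.Register(&IndexerRoot{})
	l, err := net.Listen("tcp", ":1234")
	if err != nil {
		log.Fatal(err)
	}
	rpc.Accept(l)
}
```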

Go RPC calls are executed concurrently by multiple goroutines, so to produce sequential document IDs, AddDocument needs to (see the sketch after this list):

  1. put the document into a Go channel, whose single reader goroutine generates the document ID,
  2. then send the ID-assigned document to the node that maintains the latest shard,
  3. and that node persists the document.

All of the ideas above assume that AddDocument is called far less frequently than Search.
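A minimal sketch of the channel-based sequencing described in the list above; a single reader goroutine is what makes the IDs strictly sequential even though the RPC handlers run concurrently. The names (`pendingDoc`, `idSequencer`, `forwardToLatestShard`) and the N=1000 shard size are illustrative assumptions, not from this comment:

```go
package main

import "fmt"

// pendingDoc is what the concurrent AddDocument handlers push into the channel.
type pendingDoc struct {
	body  string
	idOut chan int64 // the sequencer sends the assigned ID back on this channel
}

var pending = make(chan pendingDoc, 1024)

// idSequencer is the single reader of the channel, so document IDs come out
// strictly sequential no matter how many goroutines call AddDocument.
func idSequencer() {
	var next int64 // document IDs start from 0
	for doc := range pending {
		id := next
		next++
		forwardToLatestShard(id, doc.body) // step 2: send to the node holding the newest shard
		doc.idOut <- id
	}
}

// forwardToLatestShard stands in for the RPC to the leaf that owns the newest
// shard; that leaf persists the document (step 3).
func forwardToLatestShard(id int64, body string) {
	fmt.Printf("shard %d <- doc %d\n", id/1000, id)
}

// AddDocument is what each concurrent RPC handler would do: enqueue and wait
// for the assigned ID (step 1).
func AddDocument(body string) int64 {
	idOut := make(chan int64, 1)
	pending <- pendingDoc{body: body, idOut: idOut}
	return <-idOut
}

func main() {
	go idSequencer()
	for i := 0; i < 3; i++ {
		fmt.Println("assigned ID:", AddDocument("some document"))
	}
}
```

The buffered channel also gives a natural back-pressure point if writes ever burst, which fits the stated assumption that AddDocument is called far less often than Search.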

ghost mentioned this pull request Feb 16, 2016