
design doc(to be continued) #9

Open · wants to merge 1 commit into base: develop

Conversation

chenkaitopic

Click the View button to see the fully rendered markdown text.


## Specifications
### Sharding strategy
We plan to use document-based sharding. A comparison between document-based sharding and term-based sharding ([stolen from Jeff Dean](http://web.stanford.edu/class/cs276/Jeff-Dean-Stanford-CS276-April-2015.pdf)):
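As a minimal sketch of what document-based sharding implies for the query path (the `Leaf` and `searchAll` names are illustrative, not from this PR): each leaf indexes only its own subset of documents, so the root must send every query to all leaves and merge the partial hit lists.

```go
package main

import (
	"fmt"
	"sort"
)

// Leaf holds a full term -> doc-ID index, but only for its own documents.
type Leaf struct {
	index map[string][]int64
}

// Search returns the leaf's local hits for a term.
func (l *Leaf) Search(term string) []int64 {
	return l.index[term]
}

// searchAll is the root's job under document-based sharding: any leaf may
// hold matching documents, so the query fans out to all of them and the
// partial results are merged.
func searchAll(leaves []*Leaf, term string) []int64 {
	var hits []int64
	for _, l := range leaves {
		hits = append(hits, l.Search(term)...)
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i] < hits[j] })
	return hits
}

func main() {
	leaves := []*Leaf{
		{index: map[string][]int64{"go": {0, 2}}},
		{index: map[string][]int64{"go": {1001}}},
	}
	fmt.Println(searchAll(leaves, "go")) // [0 2 1001]
}
```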
Contributor


Awesome you found this comparison!

@yitopic
Contributor

yitopic commented Feb 16, 2016

I like that one node can hold multiple shards. That decouples the shard concept in the algorithm design from the number of nodes in system operations.

**Sequential Document ID Generation**

I'm thinking: since we have to persist every document anyway, could we number documents by the order in which they are added to our system (i.e., the order in which they are persisted) and use that number as the document ID?

In other words, document IDs simply start from 0. To delete a document, we mark it as removed in persistent storage (e.g., S3) rather than actually deleting it, and in particular we never change the document ID sequence.

This way we can realize "every N documents form one shard", and newly arriving documents always go into new shards.
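A minimal sketch of this idea, assuming a simple tombstone flag instead of real deletion; the names (`docsPerShard`, `Record`, `Store`) are mine, not from this comment:

```go
package main

import "fmt"

const docsPerShard = 1000 // "every N documents form one shard"

// Record is what gets persisted (e.g. to S3): the body plus a removed flag,
// so a delete never disturbs the ID sequence.
type Record struct {
	Body    string
	Removed bool
}

// Store assigns document IDs in persist order, starting from 0.
type Store struct {
	records []Record
}

// Persist appends the document and returns its sequential ID.
func (s *Store) Persist(body string) int64 {
	s.records = append(s.records, Record{Body: body})
	return int64(len(s.records) - 1)
}

// Remove only marks the document as removed; the ID sequence is untouched.
func (s *Store) Remove(id int64) { s.records[id].Removed = true }

// ShardOf maps a document ID to its shard, so new documents land in new shards.
func ShardOf(id int64) int64 { return id / docsPerShard }

func main() {
	var s Store
	id := s.Persist("hello")
	s.Remove(id)
	fmt.Println("doc", id, "is in shard", ShardOf(id)) // doc 0 is in shard 0
}
```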

**Indexer Root in addition to Search Root**

Separately, could we (or rather, should we) add another process alongside the search root: an indexer root?

![Alt text](http://g.gravizo.com/g?
digraph G {
size = "4,4";
SearchRoot -> Leaf1;
SearchRoot -> Leaf2;
SearchRoot -> Leaf3;
IndexerRoot -> Leaf1;
IndexerRoot -> Leaf2;
IndexerRoot -> Leaf3;
PersistentStorage [shape=box]
IndexerRoot -> PersistentStorage
Etcd [shape=box]
IndexerRoot -> Etcd;
SearchRoot -> Etcd;
Leaf1 -> Etcd
Leaf2 -> Etcd
Leaf3 -> Etcd
})

The IndexerRoot is responsible for generating document IDs sequentially.

Concretely, SearchRoot implements the RPC Search, and IndexerRoot implements the RPC AddDocument.
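A minimal sketch of those two RPC surfaces, assuming Go's standard `net/rpc` package (each method takes an args value and a reply pointer and returns an error); all type and field names here are placeholders, not from this PR:

```go
package main

import (
	"log"
	"net"
	"net/rpc"
)

type SearchArgs struct{ Query string }
type SearchReply struct{ DocIDs []int64 }

// SearchRoot exposes the Search RPC; it would fan the query out to the leaves.
type SearchRoot struct{}

func (s *SearchRoot) Search(args SearchArgs, reply *SearchReply) error {
	// ... ask every leaf, merge their hits into reply.DocIDs ...
	return nil
}

type AddDocumentArgs struct{ Body string }
type AddDocumentReply struct{ DocID int64 }

// IndexerRoot exposes the AddDocument RPC and owns sequential ID generation.
type IndexerRoot struct{}

func (i *IndexerRoot) AddDocument(args AddDocumentArgs, reply *AddDocumentReply) error {
	// ... enqueue args.Body, wait for the assigned ID, fill reply.DocID ...
	return nil
}

func main() {
	// In the proposed design these would be two separate processes;
	// they are registered together here only to keep the sketch short.
	rpc.Register(&SearchRoot{})
	rpc.Register(&IndexerRoot{})
	l, err := net.Listen("tcp", ":1234")
	if err != nil {
		log.Fatal(err)
	}
	rpc.Accept(l)
}
```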

Go RPC calls are executed concurrently by multiple goroutines, so to produce sequential document IDs, AddDocument needs to (see the sketch after this list):

  1. put the document into a Go channel, whose single reader goroutine generates the document ID,
  2. then send the ID-assigned document to the node that maintains the latest shard,
  3. and that node persists the document.

All of the ideas above assume that AddDocument is called far less frequently than Search.
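A minimal sketch of the channel-based sequencing described in the list above; a single reader goroutine is what makes the IDs strictly sequential even though the RPC handlers run concurrently. The names (`pendingDoc`, `idSequencer`, `forwardToLatestShard`) and the N=1000 shard size are illustrative assumptions, not from this comment:

```go
package main

import "fmt"

// pendingDoc is what the concurrent AddDocument handlers push into the channel.
type pendingDoc struct {
	body  string
	idOut chan int64 // the sequencer sends the assigned ID back on this channel
}

var pending = make(chan pendingDoc, 1024)

// idSequencer is the single reader of the channel, so document IDs come out
// strictly sequential no matter how many goroutines call AddDocument.
func idSequencer() {
	var next int64 // document IDs start from 0
	for doc := range pending {
		id := next
		next++
		forwardToLatestShard(id, doc.body) // step 2: send to the node holding the newest shard
		doc.idOut <- id
	}
}

// forwardToLatestShard stands in for the RPC to the leaf that owns the newest
// shard; that leaf persists the document (step 3).
func forwardToLatestShard(id int64, body string) {
	fmt.Printf("shard %d <- doc %d\n", id/1000, id)
}

// AddDocument is what each concurrent RPC handler would do: enqueue and wait
// for the assigned ID (step 1).
func AddDocument(body string) int64 {
	idOut := make(chan int64, 1)
	pending <- pendingDoc{body: body, idOut: idOut}
	return <-idOut
}

func main() {
	go idSequencer()
	for i := 0; i < 3; i++ {
		fmt.Println("assigned ID:", AddDocument("some document"))
	}
}
```

The buffered channel also gives a natural back-pressure point if writes ever burst, which fits the stated assumption that AddDocument is called far less often than Search.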

ghost mentioned this pull request Feb 16, 2016