partial Update: elasticsearch/solr vs vespa #4154
Vespa supports partial updates of existing indexed documents; updates are fastest for fields defined with 'attribute' and of a numeric type. See http://docs.vespa.ai/documentation/reference/document-json-update-format.html for the update JSON syntax. |
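For reference, a partial update in the linked JSON format looks roughly like this (a sketch; the `music` document type, the document id, and the `rank` field are hypothetical):

```json
{
    "update": "id:music:music::123",
    "fields": {
        "rank": {
            "assign": 10
        }
    }
}
```

Only the fields listed under `fields` are touched; the rest of the document is left as-is.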
200M a day is about 2k per second. That should work fine for any kind of field even on a single node. |
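The back-of-the-envelope arithmetic behind that number:

```python
# 200M updates spread evenly over a day works out to roughly 2.3k/sec
updates_per_day = 200_000_000
seconds_per_day = 24 * 60 * 60  # 86,400
avg_rate = updates_per_day / seconds_per_day
print(round(avg_rate))  # ~2315 updates/sec on average
```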
Does Vespa's partial update only reindex the updated fields? ES reindexes all the fields. |
Just the fields that you want to update; that is why we call it a partial update. Numeric fields like byte, int, float etc. are faster than string. Agree with @bratseth, 200M updates a day should be no match for even a single-core machine. |
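To take the fastest path described above, the field being updated should be a numeric attribute in the search definition; a minimal sketch (the `product` document type and `rank` field are made up for illustration):

```
search product {
    document product {
        field rank type int {
            indexing: attribute | summary
        }
    }
}
```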
@zhuxiang1981 Solr has in-place updates, but with some caveats (non-indexed etc.): https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates |
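For comparison, a Solr atomic-update payload posted to the collection's `/update` handler looks roughly like this (the `id` and `rank` values are hypothetical); per the linked guide, a true in-place update additionally requires the target field to be single-valued, non-indexed, non-stored, and docValues-enabled:

```json
[
    { "id": "doc123", "rank": { "set": 10 } }
]
```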
Any further questions on this topic, @zhuxiang1981? Thanks |
We recently saw that 16k updates/sec were successful in one of our experiments with a three-node cluster, although all were integer updates. That is good enough for now. We want to achieve 100k updates/sec, which we plan to reach by scaling horizontally. However, we found that update throughput dropped sharply (to 4k/sec) when we ran the feed benchmark while simultaneously hitting the system with lots of queries. Any suggestions? |
In order to tell whether your numbers make sense, I need to know the machine config you are using. Your search definition and services file would also be helpful. There are some tricks that can be applied to push it even further up in some cases.
Feed performance will go down during query load; how much depends on the number of threads on your machine. As it is a search engine, it is designed to favour queries over feed. It can be tuned, but that has not been done very often, so it must be experimented with in each case. I also do not remember how well documented it is.
|
Our application needs to partially update a field (a rank field) of 200 million documents daily, but Solr and ES are very slow at this.