Skip to content
Permalink
master
Go to file
 
 
Cannot retrieve contributors at this time
316 lines (288 sloc) 13.6 KB
---
# Copyright 2017 Yahoo Holdings. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
title: "Writing to Vespa"
---
<p>
<a href="documents.html">Documents</a> are stored in <em>content</em> clusters.
Updates (PUT, UPDATE, DELETE) and reads (GET) pass through a <em>container</em> cluster -
Refer to <a href="elastic-vespa.html#writing-documents">elastic Vespa</a> for a more detailed flow:
</p>
<img src="img/overview/vespa-overview.svg" style="padding:10px">
<p>
This guide covers the <em>functional</em> aspects of updating documents in Vespa.
</p><p>
Vespa's indexing structures is built for high-rate, memory-only operations for field updates.
Refer to the <a href="performance/sizing-feeding.html">feed sizing guide</a> for write performance.
</p><p>
Vespa supports <a href="parent-child.html">parent/child</a> for de-normalized data.
This can be used to simplify the code to update application data,
as one write will update all children documents.
</p><p>
Applications can add custom feed document processors and multiple container clusters -
see <a href="indexing.html">indexing</a> for details.
</p>
<h2 id="reads-and-writes">Reads and writes</h2>
<table class="table">
<tbody>
<tr>
<td><a href="reference/document-v1-api-reference.html#get">Get</a></td>
<td><p>
Returns the newest document instance.
</p></td>
</tr><tr>
<td><a href="reference/document-v1-api-reference.html#put">Put</a></td>
<td><p>
Write a <a href="documents.html">document</a> by ID -
a document is overwritten if a document with the same document ID exists.
</p></td>
</tr><tr>
<td><a href="reference/document-v1-api-reference.html#delete">Remove</a></td>
<td><p>
Remove a document by ID.
Later requests to access the document will not find it - read
more about <a href="elastic-vespa.html#data-retention-vs-size">remove-entries</a>.
If the document to be removed is not found, this is returned in the reply.
This is not considered a failure.
</p></td>
</tr><tr>
<td><a href="reference/document-v1-api-reference.html#update">Update</a></td>
<td><p>
Also referred to as <em>partial update</em>, as it updates some/all fields of a document by ID.
If the document to update does not exist, the update returns a reply
stating that no document was found.
</p><p>
Update supports <a href="#create-if-nonexistent">create if nonexistent</a>.
</p><p>
All data structures (<a href="attributes.html">attribute</a>,
<a href="proton.html#index">index</a> and <a href="document-summaries.html">summary</a>) are updatable.
Note that only <em>assign</em> and <em>remove</em> are idempotent -
message re-sending can apply updates more than once.
Use conditional writes for stronger consistency.
</p>
<table class="table">
<tr>
<td><strong>All field types</strong></td>
<td><ul>
<li><code><a href="reference/document-json-format.html#assign">assign</a></code> (may also be used to clear fields)</li>
</ul></td>
</tr><tr>
<td><strong>Numeric field types</strong></td>
<td><ul>
<li><code><a href="reference/document-json-format.html#arithmetic">increment</a></code>
See <a href="#create-if-nonexistent-keys">create if nonexistent - keys</a>
for how to auto-create keys in weighted sets</li>
<li><code><a href="reference/document-json-format.html#arithmetic">decrement</a></code></li>
<li><code><a href="reference/document-json-format.html#arithmetic">multiply</a></code></li>
<li><code><a href="reference/document-json-format.html#arithmetic">divide</a></code></li>
</ul></td>
</tr><tr>
<td><strong>Composite types</strong></td>
<td><ul>
<li><code><a href="reference/document-json-format.html#add">add</a></code>
For <em>array</em> and <em>weighted set</em>.
To put into a <em>map</em>,
see the <a href="reference/document-json-format.html#assign">assign</a> section</li>
<li><code><a href="reference/document-json-format.html#remove">remove</a></code></li>
<li><code><a href="reference/document-json-format.html#match">match</a></code>
Pick element from collection, then apply given operation to matched element</li>
<li><a href="reference/document-json-format.html#fieldpath">
accessing elements within a composite field using fieldpaths</a></li>
</ul></td>
</tr><tr>
<td><strong>Tensor types</strong></td>
<td><ul>
<li><code><a href="reference/document-json-format.html#tensor-modify">modify</a></code>
Modify individual cells in a tensor - can replace, add or multiply cell values</li>
<li><code><a href="reference/document-json-format.html#tensor-add">add</a></code>
Add cells to mapped or mixed tensors</li>
<li><code><a href="reference/document-json-format.html#tensor-remove">remove</a></code>
Remove cells from mapped or mixed tensors</li>
</ul></td>
</tr>
</table>
</td>
</tr>
</tbody>
</table>
<h2 id="conditional-writes">Conditional writes</h2>
<p>
A <em>test and set</em> <em>condition</em> can be added to Put, Remove and Update operations.
The <em>condition</em> is a <a href="reference/document-select-language.html">document selection</a>.
Refer to the <a href="reference/document-json-format.html#test-and-set">test-and-set reference</a>.
</p><p>
Example: Increment the <em>sales</em> field only if it is already equal to 999:
<pre>
{
"update": "id:music:music::BestOf",
"condition": "music.sales==999",
"fields": {
"sales": {
"increment": 1
}
}
}
</pre>
<strong>Note:</strong> Use <em>documenttype.fieldname</em> (e.g. music.sales) in the condition,
not only <em>fieldname</em>.
</p><p>
<strong>Note:</strong> If the condition is not met, an error is returned.
<strong>ToDo:</strong> There is a discussion whether to change to not return error,
and instead return a <em>condition-not-met</em> in the response.
</p>
<h2 id="create-if-nonexistent">Create if nonexistent</h2>
<p>
Updates to nonexistent documents are supported using
<a href="reference/document-json-format.html#create">create</a>.
An empty document is created on the content nodes, before the update is applied.
This simplifies client code in the case of multiple writers.
Java example using the <a href="document-api-guide.html">Document API</a>:
<pre>
public DocumentUpdate createUpdate(DocumentType musicType) {
DocumentUpdate update = new DocumentUpdate(musicType, "id:mynamespace:music::http://music.yahoo.com/bobdylan/BestOf");
update.setCreateIfNonExistent(true);
return update;
}
</pre>
<em>create</em> can be used in combination with a
<a href="#conditional-writes">condition</a>.
If the document does not exist, the condition will be ignored
and a new document with the update applied is automatically created.
Otherwise, the condition must match for the update to take place.
</p><p>
<strong>Caution:</strong> if all existing replicas of a document are missing <!-- ToDo: rewrite this / move to elastic-vespa details ... -->
when an update with <em>"create": true</em> is executed,
a new document will always be created.
This happens even if a condition has been given.
If the existing replicas become available later,
their version of the document will be overwritten by the newest update since it has a higher timestamp.
</p>
<h2 id="create-if-nonexistent-keys">Create if nonexistent - keys</h2>
<p>
If a key is missing in a weighted set field,
it can be <a href="reference/schema-reference.html#weightedset">auto created</a>
before incrementing its key value:
<pre>
field tag type weightedset<string> {
indexing: attribute | summary
weightedset {
create-if-nonexistent
remove-if-zero
}
}
</pre>
The above also auto-deletes the key if decremented to 0.
</p>
<h2 id="api-and-utilities">API and utilities</h2>
<p>
Documents are created using <a href="reference/document-json-format.html">JSON</a> or in
<a href="https://github.com/vespa-engine/vespa/blob/master/document/src/main/java/com/yahoo/document/Document.java">Java</a>:
<table class="table">
<tr>
<td><a href="reference/document-v1-api-reference.html">/document/v1/</a></td>
<td>API for get, put, remove, update, visit.</td>
</tr><tr>
<td><a href="vespa-http-client.html">Vespa HTTP client</a></td>
<td>Jar writing to Vespa either by method calls in Java or from the command line.
It provides a simple API while achieving high performance
by using multiplexing and multiple parallel async connections.
It is recommended in all cases when feeding from a node outside the Vespa cluster.</td>
</tr><tr>
<td style="white-space: nowrap"><a href="document-api-guide.html">Java Document API</a></td>
<td>Provides direct read-and write access to Vespa documents using Vespa's internal communication layer.
Use this when accessing documents from Java components in Vespa
such as <a href="searcher-development.html">searchers</a> and
<a href="document-processing.html">document processors</a>.</td>
</tr><tr>
<td><a href="reference/vespa-cmdline-tools.html#vespa-feeder">vespa-feeder</a></td>
<td>Utility to feed data with high performance.
<a href="reference/vespa-cmdline-tools.html#vespa-get">vespa-get</a> gets single documents,
<a href="reference/vespa-cmdline-tools.html#vespa-visit">vespa-visit</a> gets multiple.</td>
</tr>
</table>
</p>
<h2 id="feed-block">Feed block</h2>
<p>
Write operations fail when a cluster is at disk or memory capacity.
The <a href="performance/attribute-memory-usage.html#multivalue-mapping-enum-store">attribute multivalue mapping and enum store</a>
can also go full and block feeding.
</p><p>
To remedy, add nodes to the content cluster, or use nodes with higher capacity.
The data will <a href="elastic-vespa.html">auto-redistribute</a>, and feeding is unblocked.
Configure <a href="reference/services-content.html#resource-limits">resource-limits</a> to tune this.
Temporarily increasing <em>resource-limits</em> (e.g. from 0.8 to 0.9) is an alternative if the cluster is stuck.
</p><p>
These <a href="reference/metrics.html">metrics</a> indicate whether feeding is blocked (set to 1 when blocked):
<table class="table">
<tr>
<th>content.proton.resource_usage.feeding_blocked</th>
<td>disk or memory</td>
</tr><tr>
<th>content.proton.documentdb.attribute.resource_usage.feeding_blocked</th>
<td><a href="performance/attribute-memory-usage.html#multivalue-mapping-enum-store">attribute enum store</a>
or <a href="schemas.html#multivalue-fields">multivalue</a></td>
</tr>
</table>
When feeding is blocked, events are logged - examples:
<pre>
java.lang.RuntimeException: ReturnCode(NO_SPACE, Put operation rejected for document 'id:test:test::0': 'diskLimitReached: {
action: \"add more content nodes\",
reason: \"disk used (0.85) &gt; disk limit (0.8)\",
capacity: 100000000000,
free: 85000000000,
available: 85000000000,
diskLimit: 0.8
}'
</pre>
</p>
<h2 id="consistency">Consistency</h2>
<p>
Read the reference at <a href="content/consistency.html">Vespa consistency model</a>.
Vespa is <em>eventually consistent</em> -
find details on dynamic behavior in <a href="elastic-vespa.html#consistency">elastic Vespa</a>.
</p><p>
It is recommended to use the same client instance for updating a given document -
both for data consistency but also
<a href="performance/sizing-feeding.html#concurrent-mutations">performance</a> (see <em>concurrent mutations</em>).
Read more on write operation <a href="content/content-nodes.html#ordering">ordering</a>.
</p><p>
Where possible, group field updates to the same document into
<a href="performance/sizing-feeding.html#client-roundtrips">one update operation</a>.
</p>
<h2 id="dropped-writes-document-expiry">Dropped writes - document expiry</h2>
<p>
Applications can <a href="documents.html#document-expiry">auto-expire documents</a>.
This feature also blocks PUTs to documents that are already expired -
see <a href="indexing.html#document-selection">indexing</a> and
<a href="reference/services-content.html#documents">document selection</a>.
This is a common problem when feeding test data with timestamps,
and the writes a silently dropped.
</p>
<h2 id="batch-delete">Batch delete</h2>
<p>
Options for batch deleting documents:
<ul>
<li><strong>Find documents using search, delete, repeat</strong>. Pseudocode:
<pre>
while True; do
query and read document ids, if empty exit
delete document ids using <a href="reference/document-v1-api-reference.html#delete">/document/v1</a>
wait a sec # optional, add wait to reduce load while deleting
</pre>
</li><li><strong>Like 1. but use the <a href="vespa-http-client.html">Java client</a></strong>.
Instead of deleting one-by-one, stream remove operations to the API (write a Java program for this),
or append to a JSON file and use the binary:
<pre>
$ java -jar $VESPA_HOME/lib/jars/vespa-http-client-jar-with-dependencies.jar --host document-api-host &lt; deletes.json
</pre>
</li><li><strong>Use a <a href="documents.html#document-expiry">document selection</a></strong> to expire documents.
This deletes all documents <em>not</em> matching the expression.
It is possible to use parent documents and imported fields for expiry of a document set.
The content node will iterate over the corpus and delete documents (that are later compacted out):
<pre>
&lt;documents garbage-collection="true"&gt;
&lt;document type="mytype" selection="mytype.version &amp;gt; 4" &gt;
&lt;/documents&gt;
</pre>
</li>
</ul>
</p>
You can’t perform that action at this time.