---
# Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
title: "Administrative Procedures"
redirect_from:
- /documentation/operations/admin-procedures.html
- /en/operations/admin-procedures.html
---
<!--suppress HttpUrlsUsage -->
<h2 id="install">Install</h2>
<p>
Refer to the <a href="https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA">
multinode-HA</a> sample application for a primer on how to set up a cluster -
use this as a starting point.
Try the <a href="https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode">
Multinode testing and observability</a> sample app to get familiar with interfaces and behavior.
</p>
<h2 id="vespa-start-stop-restart">Vespa start / stop / restart</h2>
<p>
There is no restart command; to restart, do a stop and then a start.
Start and stop <span style="text-decoration: underline">all</span> services on a node:
</p>
<pre>
$ $VESPA_HOME/bin/<a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-start-services">vespa-start-services</a>
$ $VESPA_HOME/bin/<a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-stop-services">vespa-stop-services</a>
</pre>
<p>Likewise, for the config server:</p>
<pre>
$ $VESPA_HOME/bin/<a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-start-configserver">vespa-start-configserver</a>
$ $VESPA_HOME/bin/<a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-stop-configserver">vespa-stop-configserver</a>
</pre>
<p>
Learn more about which processes / services are started at <a href="/en/operations-selfhosted/config-sentinel.html">Vespa startup</a>,
read the <a href="/en/operations-selfhosted/configuration-server.html#start-sequence">start sequence</a>
and find training videos at the <a href="https://vespa.ai/resources">resource site</a>.
</p>
<p>
Use <a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-sentinel-cmd">vespa-sentinel-cmd</a>
to stop/start individual <span style="text-decoration: underline">services</span>.
</p>
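<p>
For example, to list the services on a node and restart one of them -
<em>searchnode</em> is an illustrative service name, use the names from the list output:
</p>
<pre>
$ vespa-sentinel-cmd list
$ vespa-sentinel-cmd restart searchnode
</pre>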
{% include important.html content="Running <em>vespa-stop-services</em> on a content node will call
<a href='/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-proton-cmd'>prepareRestart</a> to optimize restart time,
and is the recommended way to stop Vespa on a node." %}
<p>
See <a href="multinode-systems.html#aws-ec2">multinode</a> for <em>systemd</em>/<em>systemctl</em> examples.
<a href="/en/operations-selfhosted/docker-containers.html">Docker containers</a> has relevant start/stop information, too.
</p>
<h2 id="system-status">System status</h2>
<ul>
<li>Use <a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-config-status">vespa-config-status</a>
    on all nodes in <a href="../reference/hosts.xml">hosts.xml</a> to verify all services run with updated config - see the example after this list</li>
<li>Make sure <a href="/en/operations-selfhosted/files-processes-and-ports.html#environment-variables">
VESPA_CONFIGSERVERS</a> is set and identical on all nodes in hosts.xml</li>
<li>Use the <em>cluster controller</em> status page (below)
to track the status of search/storage nodes.</li>
<li>Check <a href="../reference/logs.html">logs</a></li>
<li>Use performance graphs, System Activity Report (<em>sar</em>)
or <a href="#status-pages">status pages</a> to track load</li>
<li>Use <a href="../reference/query-api-reference.html#trace.level">query tracing</a></li>
<li>Disk and/or memory might be exhausted and block feeding - recover from
<a href="/en/operations/feed-block.html">feed block</a></li>
</ul>
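<p>
A quick spot-check, assuming a config server on <em>localhost:19071</em>
(hostname and port are deployment-specific):
</p>
<pre>
$ vespa-config-status
$ curl http://localhost:19071/state/v1/health
</pre>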
<!-- ToDo: https://github.com/vespa-engine/vespa/issues/18555
Tool to check a cluster's status like Vespa Cloud's deploy output -->
<h2 id="process-pid-files">Process PID files</h2>
<p>
All Vespa processes have a PID file <em>$VESPA_HOME/var/run/{service name}.pid</em>,
where <em>{service name}</em> is the Vespa service name, e.g. <em>container</em> or <em>distributor</em>.
This is the same name used in the administration interface
in the <a href="/en/operations-selfhosted/config-sentinel.html">config sentinel</a>.
</p><p>
Learn how to <a href="#vespa-start-stop-restart">start / stop</a> nodes.
</p>
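<p>
For example, to find the PID of a running <em>searchnode</em> service
(an illustrative service name):
</p>
<pre>
$ cat $VESPA_HOME/var/run/searchnode.pid
</pre>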
<h2 id="status-pages">Status pages</h2>
<p>
Vespa service instances have status pages for debugging and testing.
Status pages are subject to change at any time - take care when automating. Procedure:
</p>
<ol>
<li>
<strong>Find the port:</strong>
The status pages run on ports assigned by Vespa. To find status page ports,
use <a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect">vespa-model-inspect</a>
to list the services running in the application.
<pre>
$ vespa-model-inspect services
</pre>
To find the status page port for a specific node for a specific service,
pick the correct service and run:
<pre>
$ vespa-model-inspect service [Options] <service-name>
</pre>
</li><li>
<strong>Get the status and metrics:</strong>
<em>distributor</em>, <em>storagenode</em>, <em>searchnode</em> and
<em>container-clustercontroller</em> are content services with status pages.
These ports are tagged HTTP. The cluster controller has multiple ports tagged HTTP,
where the port tagged STATE is the one with the status page.
Try connecting to the root at the port, or to <code>/state/v1/metrics</code>.
The <em>distributor</em> and <em>storagenode</em> status pages are available at <code>/</code>:
<pre>
$ vespa-model-inspect service searchnode
searchnode @ myhost.mydomain.com : search
search/search/cluster.search/0
tcp/myhost.mydomain.com:19110 (STATUS ADMIN RTC RPC)
tcp/myhost.mydomain.com:19111 (FS4)
tcp/myhost.mydomain.com:19112 (TEST HACK SRMP)
tcp/myhost.mydomain.com:19113 (ENGINES-PROVIDER RPC)
<span class="pre-hilite">tcp/myhost.mydomain.com:19114 (HEALTH JSON HTTP)</span>
$ curl http://myhost.mydomain.com:19114/state/v1/metrics
...
$ vespa-model-inspect service distributor
distributor @ myhost.mydomain.com : content
search/distributor/0
tcp/myhost.mydomain.com:19116 (MESSAGING)
tcp/myhost.mydomain.com:19117 (STATUS RPC)
<span class="pre-hilite">tcp/myhost.mydomain.com:19118 (STATE STATUS HTTP)</span>
$ curl http://myhost.mydomain.com:19118/state/v1/metrics
...
$ curl http://myhost.mydomain.com:19118/
...
</pre>
</li><li>
<strong>Use the cluster controller status page</strong>:
A status page for the cluster controller is available at the status port at
<code>http://hostname:port/clustercontroller-status/v1/<em><clustername></em></code>.
If <em>clustername</em> is not specified, the available clusters will be listed.
The cluster controller leader status page will show if any nodes are operating with differing cluster state versions.
It will also show how many data buckets are pending merging (document set reconciliation)
due to replicas either missing or being out of sync.
<pre>
$ <a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect">vespa-model-inspect</a> service container-clustercontroller | grep HTTP
</pre>
<p>
With multiple cluster controllers, look at the one with a "/0" suffix in its config ID;
it is the preferred leader.
</p><p>
The cluster state version is listed under the <em>SSV</em> table column.
Divergence here usually points to host or networking issues.
</p>
</li>
</ol>
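<p>
For example, if the STATE port found above is 19050 and the content cluster is named
<em>mycluster</em> (both are deployment-specific):
</p>
<pre>
$ curl http://myhost.mydomain.com:19050/clustercontroller-status/v1/mycluster
</pre>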
<h2 id="cluster-state">Cluster state</h2>
<p>
Cluster and node state information is available through the
<a href="../reference/cluster-v2.html">/cluster/v2 API</a>.
This API can also be used to set a <em>user state</em> for a node - alternatively, use one of these tools (example usage after the list):
</p>
<ul>
<li><a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-get-cluster-state">vespa-get-cluster-state</a></li>
<li><a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-get-node-state">vespa-get-node-state</a></li>
<li><a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-set-node-state">vespa-set-node-state</a></li>
</ul>
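<p>
Example usage - the node type, index and reason below are illustrative,
check each tool's <code>--help</code> for exact flags:
</p>
<pre>
$ vespa-get-cluster-state
$ vespa-get-node-state --type storage --index 0
$ vespa-set-node-state --type storage --index 0 maintenance "Disk replacement"
</pre>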
<p>
Also see the cluster controller <a href="#status-pages">status page</a>.
</p><p>
State is persisted in a ZooKeeper cluster; restarting/changing a cluster controller preserves this:
</p>
<ul>
<li>Last cluster state version number, for new cluster controller handover at restarts</li>
<li>User states, set by operators - i.e. nodes manually set to down / maintenance</li>
</ul>
<p>
If this state data is lost, the cluster state is reset - see
<a href="../content/content-nodes.html#cluster-controller">cluster controller</a> for implications.
</p>
<h2 id="cluster-controller-configuration">Cluster controller configuration</h2>
<p>
It is recommended to run the cluster controller on the same hosts as
<a href="/en/operations-selfhosted/configuration-server.html">config servers</a>,
as they share a ZooKeeper cluster for state, and deploying three nodes is best practice for both.
See the <a href="https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA">multinode-HA</a>
sample app for a working example.
</p>
<p>
To configure the cluster controller, use
<a href="../reference/services-content.html#cluster-controller">services.xml</a> and/or add
<a href="https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def">
configuration</a> under the <em>services</em> element - example:
</p>
<pre>
<services version="1.0">
    <config name="vespa.config.content.fleetcontroller">
        <min_time_between_new_systemstates>5000</min_time_between_new_systemstates>
    </config>
    ...
</services>
</pre>
<p>
A broken content node may end up with processes constantly restarting.
It may die during initialization due to accessing corrupt files,
or it may die when it starts receiving requests of a given type that trigger a node-local bug.
This is bad for distributor nodes,
as these restarts create constant ownership transfer between distributors,
causing windows where buckets are unavailable.
</p><p>
The cluster controller has functionality for detecting such nodes.
If a node restarts in a way that is not detected as a controlled shutdown more than
<a href="https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def">
max_premature_crashes</a> times,
the cluster controller will set the wanted state of this node to down.
</p><p>
Detecting a controlled restart is currently a bit tricky.
A controlled restart is typically initiated by sending a TERM signal to the process.
Without any other signal,
the content layer has to assume that every TERM signal marks a controlled shutdown.
Thus, if the process keeps being killed by the kernel due to using too much memory,
this will look like controlled shutdowns to the content layer.
</p>
<h2 id="monitor-distance-to-ideal-state">Monitor distance to ideal state</h2>
<p>
Refer to the <a href="../content/idealstate.html">distribution algorithm</a>.
Use distributor <a href="#status-pages">status pages</a> to inspect state metrics,
see <a href="../content/content-nodes.html#metrics">metrics</a>.
<code>idealstate.merge_bucket.pending</code> is the best metric to track:
it is 0 when the cluster is balanced, while a non-zero value indicates buckets out of sync -
see the sketch below for how to query it.
</p>
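<p>
To spot-check the metric on a distributor, query its <code>/state/v1/metrics</code> page -
a sketch, reusing the illustrative host and port from the <a href="#status-pages">status page</a> examples:
</p>
<pre>
$ curl -s http://myhost.mydomain.com:19118/state/v1/metrics | \
  jq '.metrics.values[] | select(.name | contains("idealstate.merge_bucket.pending"))'
</pre>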
<h2 id="cluster-configuration">Cluster configuration</h2>
<ul>
<li>Running <code>vespa prepare</code> will not change served
configuration until <code>vespa activate</code> is run.
<code>vespa prepare</code> will warn about all config changes that require restart.
</li><li>
Refer to <a href="../schemas.html">schemas</a> for how to add/change/remove these.
</li><li>
Refer to <a href="../elasticity.html">elasticity</a>
for how to add/remove capacity from a Vespa cluster, procedure below.
</li><li>
See <a href="../components/chained-components.html">chained components</a>
for how to add or remove searchers and document processors.
</li><li>
Refer to the <a href="../performance/sizing-examples.html">sizing examples</a>
for changing from a <em>flat</em> to <em>grouped</em> content cluster.
</li>
</ul>
<h2 id="add-or-remove-a-content-node">Add or remove a content node</h2>
<ol>
<li>
<strong>Node setup:</strong>
Prepare the node by installing software, set up the file systems/directories
and set <a href="/en/operations-selfhosted/files-processes-and-ports.html#environment-variables">
VESPA_CONFIGSERVERS</a>.
<a href="#vespa-start-stop-restart">Start</a> the node.
</li><li>
<strong>Modify configuration:</strong>
Add/remove a <a href="../reference/services-content.html#node">node</a>-element in <em>services.xml</em>
and <a href="../reference/hosts.html">hosts.xml</a>.
Refer to <a href="multinode-systems.html">multinode install</a>.
Make sure the <em>distribution-key</em> is unique - see the sketch after this list.
</li><li>
<strong>Deploy:</strong> Deploy the changed application, then <a href="#monitor-distance-to-ideal-state">
observe metrics</a> to track progress as the cluster redistributes documents.
Use the <a href="../content/content-nodes.html#cluster-controller">cluster controller</a>
to monitor the state of the cluster.
</li><li>
<strong>Tune performance (optional):</strong>
Use <a href="https://github.com/vespa-engine/vespa/blob/master/storage/src/vespa/storage/config/stor-distributormanager.def">
maxpendingidealstateoperations</a> to tune concurrency of bucket merge operations from distributor nodes.
Likewise, tune <a href="../reference/services-content.html#merges">merges</a> -
concurrent merge operations per content node. The tradeoff is speed of bucket replication vs.
use of resources, which impacts the application's regular load.
</li><li>
<strong>Finish:</strong>
The cluster is done redistributing when <code>idealstate.merge_bucket.pending</code>
is zero on all distributors.
</li>
</ol>
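<p>
A sketch of the <em>services.xml</em> change in step 2, adding a fourth node -
the cluster id, hostaliases and distribution-keys are illustrative:
</p>
<pre>
<content id="mycluster" version="1.0">
    ...
    <nodes>
        <node hostalias="node1" distribution-key="0" />
        <node hostalias="node2" distribution-key="1" />
        <node hostalias="node3" distribution-key="2" />
        <node hostalias="node4" distribution-key="3" /> <!-- added node -->
    </nodes>
</content>
</pre>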
<p>
Do not remove more than <em>redundancy</em>-1 nodes at a time, to avoid data loss.
Observe <code>idealstate.merge_bucket.pending</code> to know bucket replica status:
when zero on all distributor nodes, it is safe to remove more nodes.
If <a href="../elasticity.html#grouped-distribution">grouped distribution</a>
is used to control bucket replicas, remove all nodes in a group
if the redundancy settings ensure replicas in each group.
</p><p>
To increase bucket redundancy level before taking nodes out,
<a href="../content/content-nodes.html">retire</a> nodes.
Again, track <code>idealstate.merge_bucket.pending</code> to know when done.
Use the <a href="../reference/cluster-v2.html">/cluster/v2 API</a> or
<a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-set-node-state">vespa-set-node-state</a>
to set a node to <em>retired</em>.
The <a href="../content/content-nodes.html#cluster-controller">cluster controller's</a>
status page lists node states.
</p><p>
An alternative to increasing cluster size is building a new cluster, then migrate documents to it.
This is supported using <a href="../visiting.html">visiting</a>.
</p><p>
To <em>merge</em> two content clusters, add nodes to the cluster as above, considering:
</p>
<ul>
<li>
<a href="../reference/services-content.html#node">distribution-keys</a> must be unique.
Modify paths like <em>$VESPA_HOME/var/db/vespa/search/mycluster/n3</em> before adding the node.
</li><li>
Set <a href="/en/operations-selfhosted/files-processes-and-ports.html#environment-variables">
VESPA_CONFIGSERVERS</a>, then start the node.
</li>
</ul>
<h2 id="topology-change">Topology change</h2>
<p>
Read <a href="/en/elasticity.html#changing-topology">changing topology</a> first,
and plan the sequence of steps.
</p>
<p>
Make sure to <span style="text-decoration: underline">not</span> change the <code>distribution-key</code>
for nodes in <em>services.xml</em>.
</p>
<p>
It is not required to restart nodes as part of this process.
</p>
<h2 id="add-or-remove-services-on-a-node">Add or remove services on a node</h2>
<p>
It is possible to run multiple Vespa services on the same host.
If changing the services on a given host,
stop Vespa on the given host before running <code>vespa activate</code>.
This is because the services are dynamically allocated port numbers,
depending on what is running on the host.
Consider if some of the services changed are used by services on other hosts.
In that case, restart services on those hosts too. Procedure:
</p>
<ol>
<li>Edit <em>services.xml</em> and <em>hosts.xml</em></li>
<li>Stop Vespa on the nodes that have changes</li>
<li>Run <code>vespa prepare</code> and <code>vespa activate</code></li>
<li>Start Vespa on the nodes that have changes</li>
</ol>
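<p>
The same procedure as a command sketch, assuming the application package is in the current directory:
</p>
<pre>
$ vespa-stop-services        # on each node with changes
$ vespa prepare . && vespa activate
$ vespa-start-services       # on each node with changes
</pre>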
<h2 id="troubleshooting">Troubleshooting</h2>
<p>
Also see the <a href="../faq.html">FAQ</a>.
</p>
<table class="table">
<tr>
<th>No endpoint</th>
<td>
<p id="no-endpoint">
Most problems with the quick start guides are due to Docker running out of memory.
Make sure at least 6G of memory is allocated to Docker:
</p>
<pre>
$ docker info | grep "Total Memory"
</pre>
OOM symptoms include
<pre>
INFO: Problem with Handshake localhost:8080 ssl=false: localhost:8080 failed to respond
</pre>
The container is named <em>vespa</em> in the guides; for a shell, do:
<pre>
$ docker exec -it vespa bash
</pre>
</td>
</tr>
<tr>
<th>Log viewing</th>
<td>
<p id="log-viewing">
Use <a href="/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-logfmt">vespa-logfmt</a> to view the vespa log - example:
</p>
<pre>
$ /opt/vespa/bin/vespa-logfmt -l warning,error
</pre>
</td>
</tr>
<tr>
<th>JSON</th>
<td>
<p id="json">For JSON pretty-print, append</p>
<pre>
| python -m json.tool
</pre>
to commands that output JSON - or use <a href="https://stedolan.github.io/jq/">jq</a>.
</td>
</td>
</tr>
<tr>
<th>Routing</th>
<td>
<p id="routing">
Vespa lets applications set up custom document processing / indexing,
with different feed endpoints.
Refer to <a href="../indexing.html">indexing</a> for how to configure this in <em>services.xml</em>.
</p>
<p>
<a href="https://github.com/vespa-engine/vespa/issues/13193">#13193</a>
has a summary of problems and solutions.
</p>
</td>
</tr>
<tr>
<th>Tracing</th>
<td>
<p id="tracing">
Use <a href="../reference/document-v1-api-reference.html#request-parameters">tracelevel</a>
to dump the routes and hops for a write operation - example:
</p>
<pre>
$ curl -H Content-Type:application/json --data-binary @docs.json \
$ENDPOINT/document/v1/mynamespace/doc/docid/1?tracelevel=4 | jq .
{
"pathId": "/document/v1/mynamespace/doc/docid/1",
"id": "id:mynamespace:doc::1",
"trace": [
{ "message": "[1623413878.905] Sending message (version 7.418.23) from client to ..." },
{ "message": "[1623413878.906] Message (type 100004) received at 'default/container.0' ..." },
{ "message": "[1623413878.907] Sending message (version 7.418.23) from 'default/container.0' ..." },
{ "message": "[1623413878.907] Message (type 100004) received at 'default/container.0' ..." },
{ "message": "[1623413878.909] Selecting route" },
{ "message": "[1623413878.909] No cluster state cached. Sending to random distributor." }
</pre>
</td>
</tr>
</table>
<h2 id="clean-start-mode">Clean start mode</h2>
<p>
There have been rare occasions where Vespa stored data that was internally inconsistent.
For those circumstances it is possible to start the node in a
<a href="https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/proton.def">
validate_and_sanitize_docstore</a> mode.
This will do its best to clean up inconsistent data.
However, detecting that this is required is not easy - consult the Vespa Team first.
For this approach to work, all nodes must be stopped before enabling the feature,
to make sure the data is not redistributed.
</p>
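<p>
A sketch of how this mode can be enabled through a config override in <em>services.xml</em> -
verify the exact setting name and value against <em>proton.def</em> first:
</p>
<pre>
<content id="mycluster" version="1.0">
    <config name="vespa.config.search.core.proton">
        <validate_and_sanitize_docstore>YES</validate_and_sanitize_docstore>
    </config>
    ...
</content>
</pre>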
<h2 id="content-cluster-configuration">Content cluster configuration</h2>
<table class="table">
<thead></thead><tbody>
<tr><th>Availability vs resources</th>
<td>
<p id="availability-vs-resources">
Keeping index structures costs resources. Not all replicas of buckets are
necessarily searchable, unless configured using
<a href="../reference/services-content.html#searchable-copies">searchable-copies</a>.
As Vespa indexes buckets on-demand, the most cost-efficient setting is 1,
if one can tolerate temporary coverage loss during node failures -
see the configuration fragment after this table.
</p>
</td>
</tr>
<tr><th>Data retention vs size</th>
<td>
<p id="data-retention-vs-size">
When a document is removed, the document data is not immediately purged.
Instead, <em>remove-entries</em> (tombstones of removed documents) are kept
for a configurable amount of time. The default is two weeks, refer to
<a href="../reference/services-content.html#removed-db-prune-age">removed-db prune age</a>.
This ensures that removed documents stay removed in a distributed system where nodes change state.
Entries are removed periodically after expiry.
Hence, if a node comes back up after being down for more than two weeks,
removed documents are available again, unless the data on the node is wiped first.
A larger <em>prune age</em> will grow the storage size, as this keeps
documents and tombstones longer.
</p><p>
<strong>Note:</strong> The backend does not store remove-entries for nonexistent documents.
This is to prevent clients that send wrong document identifiers from
filling a cluster with invalid remove-entries.
A side effect is that if a problem has caused all replicas of a bucket to be unavailable,
documents in this bucket cannot be marked removed until at least one replica is available again.
Documents are written to new bucket replicas while the others are down;
if those documents are then removed, older versions of them will not re-emerge,
as the most recent change wins.
</p></td>
</tr>
<tr><th style="white-space:nowrap;">Transition time</th>
<td>
<p id="transition-time">
See <a href="../reference/services-content.html#transition-time">transition-time</a>
for tradeoffs for how quickly nodes are set down vs. system stability.
</p>
</td>
</tr>
<tr><th>Removing unstable nodes</th>
<td>
<p id="removing-unstable-nodes">
One can configure how many times a node is allowed to crash before it will automatically be removed.
The crash count is reset if the node has been up or down continuously for more than the
<a href="../reference/services-content.html#stable-state-period">stable state period</a>.
If the crash count exceeds
<a href="../reference/services-content.html#max-premature-crashes">max premature crashes</a>, the node will be disabled.
Refer to <a href="#troubleshooting">troubleshooting</a>.
</p>
</td>
</tr>
<tr><th>Minimum number of nodes required to be available</th>
<td>
<p id="minimal-amount-of-nodes-required-to-be-available">
A cluster is typically sized to handle a given load.
A given percentage of the cluster resources is required for normal operations,
and the remainder is headroom that can be used if some of the nodes are no longer usable.
If the cluster loses enough nodes, it will be overloaded:
</p>
<ul>
<li>Remaining nodes may run out of disk space.
    This will likely fail a lot of write operations, and if the disk is shared with the OS,
    it may also stop the node from functioning.</li>
<li>Partition queues will grow to maximum size.
As queues are processed in FIFO order, operations are likely to get long latencies.</li>
<li>Many operations may time out while being processed,
causing the operation to be resent, adding more load to the cluster.</li>
<li>When new nodes are added, they cannot serve requests
before data is moved to the new nodes from the already overloaded nodes.
Moving data puts even more load on the existing nodes,
and as moving data is typically not high priority this may never actually happen.</li>
</ul>
To configure what the minimal cluster size is, use
<a href="../reference/services-content.html#min-distributor-up-ratio">min-distributor-up-ratio</a> and
<a href="../reference/services-content.html#min-storage-up-ratio">min-storage-up-ratio</a>.
</td>
</tr>
</tbody>
</table>
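<p>
To illustrate the <em>searchable-copies</em> tradeoff in the first row above,
a <em>services.xml</em> fragment keeping two replicas but only one searchable copy -
the cluster id and values are illustrative:
</p>
<pre>
<content id="mycluster" version="1.0">
    <redundancy>2</redundancy>
    <engine>
        <proton>
            <searchable-copies>1</searchable-copies>
        </proton>
    </engine>
    ...
</content>
</pre>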