DEV: add new vector set data type and command pages #1334

Merged
merged 10 commits into from
Apr 3, 2025
119 changes: 119 additions & 0 deletions content/commands/vadd/index.md
@@ -0,0 +1,119 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
complexity: O(log(N)) for each element added, where N is the number of elements in the vector set.
description: Add a new element to a vector set, or update its vector if it already exists.
group: vector_set
hidden: false
linkTitle: VADD
since: 8.0.0
summary: Add a new element to a vector set, or update its vector if it already exists.
syntax_fmt: "VADD key [REDUCE dim] (FP32 | VALUES num) vector element [CAS] [NOQUANT | Q8 | BIN]\n [EF build-exploration-factor] [SETATTR attributes] [M numlinks]"
title: VADD
bannerText: Vector set is a new data type that is currently in preview and may be subject to change.
---

Add a new element to the vector set specified by `key`. The vector can be provided as a 32-bit floating point (`FP32`) blob of values, or as floating point numbers passed as strings and prefixed by the number of elements (3 in the example below):

```
VADD mykey VALUES 3 0.1 1.2 0.5 my-element
```
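
The two input forms can be sketched on the client side as follows. This is a minimal illustration, not a client library API: the helper names `vadd_values_args` and `vadd_fp32_args` are hypothetical, and the little-endian byte order of the `FP32` blob is an assumption that this page does not state.

```python
import struct

def vadd_values_args(key, vector, element):
    """Build VADD arguments using the VALUES form (numbers sent as strings)."""
    return ["VADD", key, "VALUES", str(len(vector)), *map(str, vector), element]

def vadd_fp32_args(key, vector, element):
    """Build VADD arguments using the FP32 form (a single binary blob)."""
    # Assumption: little-endian IEEE 754 single precision, 4 bytes per component.
    blob = struct.pack(f"<{len(vector)}f", *vector)
    return ["VADD", key, "FP32", blob, element]

values_args = vadd_values_args("mykey", [0.1, 1.2, 0.5], "my-element")
fp32_args = vadd_fp32_args("mykey", [0.1, 1.2, 0.5], "my-element")

print(values_args)
print(len(fp32_args[3]), "bytes in the FP32 blob")
```

The `VALUES` form is easier to read and debug, while the `FP32` form sends a fixed four bytes per component.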

## Required arguments

<details open>
<summary><code>key</code></summary>

is the name of the key that will hold the vector set data.
</details>

<details open>
<summary><code>FP32 vector or VALUES num vector</code></summary>

either a 32-bit floating point (FP32) blob of values or `num` floating point numbers as strings.
</details>

<details open>
<summary><code>element</code></summary>

is the name of the element that is being added to the vector set.
</details>

## Optional arguments

<details open>
<summary><code>REDUCE dim</code></summary>

implements random projection to reduce the dimensionality of the vector. The projection matrix is saved and reloaded along with the vector set. Note that the `REDUCE` option must be passed immediately before the vector, for example:

```
VADD mykey REDUCE 50 VALUES ...
```
</details>
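
As a rough illustration of the random projection idea behind `REDUCE`, the sketch below multiplies a vector by a fixed random matrix to obtain a lower-dimensional vector. It is not the server's implementation: the Gaussian matrix, the scaling factor, and the function names are all assumptions made for the example.

```python
import random

def make_projection_matrix(input_dim, reduced_dim, seed=0):
    """Create a reduced_dim x input_dim matrix of scaled Gaussian values.

    Assumption: Gaussian entries scaled by 1/sqrt(reduced_dim); the server
    may build its matrix differently.
    """
    rng = random.Random(seed)
    scale = 1.0 / reduced_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(input_dim)]
            for _ in range(reduced_dim)]

def project(matrix, vector):
    """Project a vector by taking its dot product with every matrix row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

matrix = make_projection_matrix(input_dim=300, reduced_dim=50)
rng = random.Random(1)
vector = [rng.uniform(-1.0, 1.0) for _ in range(300)]
reduced = project(matrix, vector)

print(len(vector), "->", len(reduced))
```

Because the same matrix must be applied to every vector, it has to be stored with the data, which matches the statement above that the projection matrix is saved and reloaded along with the vector set.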

<details open>
<summary><code>CAS</code></summary>

performs the operation partially using threads, in a check-and-set style. The collection of neighbor candidates, which is the slow part, runs in a background thread, while the command itself is executed in the main thread.
</details>

<details open>
<summary><code>NOQUANT</code></summary>

in the first VADD call for a given key, NOQUANT forces the vector to be created without int8 quantization, which is otherwise the default.
</details>

<details open>
<summary><code>BIN</code></summary>

forces the vector to use binary quantization instead of int8. This is much faster and uses less memory, but reduces recall quality.
</details>

<details open>
<summary><code>Q8</code></summary>

forces the vector to use signed 8-bit quantization. This is the default. The option exists only so that you can verify at insertion time that the vector set uses the same format.
</details>

{{< note >}}
`NOQUANT`, `Q8`, and `BIN` are mutually exclusive.

{{< /note >}}
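
To give a feel for the trade-off between the quantization modes, here is a generic sketch of signed 8-bit and binary quantization. It assumes symmetric scaling by the largest absolute component and sign-based bits; the server's exact scheme may differ, and the function names are hypothetical.

```python
def quantize_int8(vector):
    """Map each component to an integer in [-127, 127]; return (codes, range)."""
    value_range = max(abs(x) for x in vector)
    if value_range == 0:
        return [0] * len(vector), 0.0
    return [round(x / value_range * 127) for x in vector], value_range

def dequantize_int8(codes, value_range):
    """Approximate the original components from int8 codes and their range."""
    return [c * value_range / 127 for c in codes]

def quantize_binary(vector):
    """Keep only the sign of each component: 1 if positive, otherwise 0."""
    return [1 if x > 0 else 0 for x in vector]

original = [0.5, -0.2, 0.1]
codes, value_range = quantize_int8(original)
restored = dequantize_int8(codes, value_range)
bits = quantize_binary(original)

print(codes, restored, bits)
```

In this sketch, int8 keeps each component to within one quantization step of its original value, while binary quantization keeps a single bit per component. That is why `BIN` is smaller and faster but loses more recall.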

<details open>
<summary><code>EF build-exploration-factor</code></summary>

controls how much effort is spent finding good candidates when connecting the new node to the existing Hierarchical Navigable Small World (HNSW) graph. The default is 200. A larger value may improve recall at the cost of slower insertion. You can also improve recall by increasing `EF` during `VSIM` searches.
</details>

<details open>
<summary><code>SETATTR attributes</code></summary>

associates attributes, in the form of a JSON object, with the newly created entry, or updates the attributes if they already exist.
It is the same as calling the `VSETATTR` command separately.
</details>

<details open>
<summary><code>M numlinks</code></summary>

is the maximum number of connections that each node of the graph will have with other nodes. The default is 16. More connections use more memory but allow more efficient graph exploration. Nodes at layer zero (every node exists at least at layer zero) have `M * 2` connections, while the other layers have only `M` connections. For example, setting `M` to `64` will use at least 1024 bytes of memory per node for layer zero. That's `M * 2` connections times 8 bytes (pointers), or `128 * 8 = 1024`. For higher layers, consider the following:

- Each node appears in ~1.33 layers on average (empirical observation from HNSW papers), which works out to be 0.33 higher layers per node.
- Each of those higher layers has `M = 64` connections.

So, the additional amount of memory is approximately `0.33 × 64 × 8 ≈ 169` bytes per node, bringing the total memory to ~1193 bytes per node.

If you don't have a recall quality problem, the default is acceptable, and uses a minimal amount of memory.
</details>
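
The memory arithmetic for `M` can be reproduced with a few lines of Python. This sketch uses only the figures quoted above (8-byte pointers, `M * 2` links at layer zero, and roughly 0.33 higher layers per node); the function name `hnsw_link_memory` is hypothetical, and the result covers link pointers only, not the vectors themselves.

```python
POINTER_BYTES = 8          # size of one link pointer, as stated above
AVG_HIGHER_LAYERS = 0.33   # average number of layers above layer zero per node

def hnsw_link_memory(m):
    """Return (layer_zero, higher_layers, total) link memory in bytes per node."""
    layer_zero = m * 2 * POINTER_BYTES
    higher_layers = AVG_HIGHER_LAYERS * m * POINTER_BYTES
    return layer_zero, higher_layers, layer_zero + higher_layers

for m in (16, 64):
    layer_zero, higher_layers, total = hnsw_link_memory(m)
    print(f"M={m}: layer zero {layer_zero} B, "
          f"higher layers ~{higher_layers:.0f} B, total ~{total:.0f} B")
```

The estimate scales linearly with `M`, so raising `M` from the default of 16 to 64 roughly quadruples the per-node link memory.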

## Related topics

- [Vector sets]({{< relref "/develop/data-types/vector-sets" >}})
41 changes: 41 additions & 0 deletions content/commands/vcard/index.md
@@ -0,0 +1,41 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
complexity: O(1)
description: Return the number of elements in a vector set.
group: vector_set
hidden: false
linkTitle: VCARD
since: 8.0.0
summary: Return the number of elements in a vector set.
syntax_fmt: "VCARD key"
title: VCARD
bannerText: Vector set is a new data type that is currently in preview and may be subject to change.
---

Return the number of elements in the specified vector set.

```shell
VCARD word_embeddings
(integer) 3000000
```

## Required arguments

<details open>
<summary><code>key</code></summary>

is the name of the key that holds the vector set.
</details>

## Related topics

- [Vector sets]({{< relref "/develop/data-types/vector-sets" >}})
43 changes: 43 additions & 0 deletions content/commands/vdim/index.md
@@ -0,0 +1,43 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
complexity: O(1)
description: Return the dimension of vectors in the vector set.
group: vector_set
hidden: false
linkTitle: VDIM
since: 8.0.0
summary: Return the dimension of vectors in the vector set.
syntax_fmt: "VDIM key"
title: VDIM
bannerText: Vector set is a new data type that is currently in preview and may be subject to change.
---

Return the number of dimensions of the vectors in the specified vector set.

```shell
VDIM word_embeddings
(integer) 300
```

If the vector set was created using the `REDUCE` option for dimensionality reduction, this command reports the reduced dimension. However, you must still use full-size vectors when performing queries with the `VSIM` command.

## Required arguments

<details open>
<summary><code>key</code></summary>

is the name of the key that holds the vector set.
</details>

## Related topics

- [Vector sets]({{< relref "/develop/data-types/vector-sets" >}})
72 changes: 72 additions & 0 deletions content/commands/vemb/index.md
@@ -0,0 +1,72 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
complexity: O(1)
description: Return the vector associated with an element.
group: vector_set
hidden: false
linkTitle: VEMB
since: 8.0.0
summary: Return the vector associated with an element.
syntax_fmt: "VEMB key element [RAW]"
title: VEMB
bannerText: Vector set is a new data type that is currently in preview and may be subject to change.
---

Return the approximate vector associated with a given element in the vector set.

```shell
VEMB word_embeddings SQL
1) "0.18208661675453186"
2) "0.08535309880971909"
3) "0.1365649551153183"
4) "-0.16501599550247192"
5) "0.14225517213344574"
... 295 more elements ...
```

Vector sets normalize and may quantize vectors on insertion. `VEMB` reverses this process to approximate the original vector by de-normalizing and de-quantizing it.

To retrieve the raw internal representation, use the `RAW` option:

```shell
VEMB word_embeddings apple RAW
1) int8
2) "\xf1\xdc\xfd\x1e\xcc%E...\xde\x1f\xfbN" # artificially shortened for this example
3) "3.1426539421081543"
4) "0.17898885905742645"
```
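
The sketch below shows one way a client might turn an int8 `RAW` reply back into approximate floating point values. It assumes the reply order shown above (quantization type, blob, norm, range), that each byte is a signed 8-bit value scaled by `range / 127`, and that the result is then multiplied by the norm. These scaling details are assumptions for illustration rather than a documented contract, and `decode_raw_int8` is a hypothetical helper.

```python
import struct

def decode_raw_int8(reply):
    """Approximate the original vector from a four-item int8 RAW reply."""
    quant_type, blob, norm, value_range = reply
    if quant_type != "int8":
        raise ValueError(f"unsupported quantization type: {quant_type}")
    norm = float(norm)
    value_range = float(value_range)
    # Interpret every byte of the blob as a signed 8-bit integer.
    components = struct.unpack(f"{len(blob)}b", blob)
    # De-quantize with the range, then de-normalize with the norm.
    return [c * value_range / 127 * norm for c in components]

# A tiny made-up reply with three components (not real server output).
reply = ["int8", bytes([0x7F, 0x81, 0x00]), "2.0", "0.5"]
vector = decode_raw_int8(reply)
print(vector)
```

In normal use, plain `VEMB` without `RAW` already performs this conversion on the server, so decoding by hand is only useful when you need the compact internal form.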

## Required arguments

<details open>
<summary><code>key</code></summary>

is the name of the key that holds the vector set.
</details>

<details open>
<summary><code>element</code></summary>

is the name of the element whose vector you want to retrieve.
</details>

## Optional arguments

<details open>
<summary><code>RAW</code></summary>

returns the raw vector data, its quantization type, and metadata such as norm and range.
</details>

## Related topics

- [Vector sets]({{< relref "/develop/data-types/vector-sets" >}})
46 changes: 46 additions & 0 deletions content/commands/vgetattr/index.md
@@ -0,0 +1,46 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
complexity: O(1)
description: Retrieve the JSON attributes of elements.
group: vector_set
hidden: false
linkTitle: VGETATTR
since: 8.0.0
summary: Retrieve the JSON attributes of elements.
syntax_fmt: "VGETATTR key element"
title: VGETATTR
bannerText: Vector set is a new data type that is currently in preview and may be subject to change.
---

Return the JSON attributes associated with an element in a vector set.

```shell
VGETATTR key element
```
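
On the client side, the reply can be decoded with a standard JSON parser. The sketch below assumes the attributes arrive as a JSON string (or bytes) and that a missing value arrives as `None`. The helper name and the sample attribute names are hypothetical.

```python
import json

def parse_attributes(reply):
    """Decode a VGETATTR-style reply into a Python object, or None if absent."""
    if reply is None:
        return None
    if isinstance(reply, bytes):
        reply = reply.decode("utf-8")
    return json.loads(reply)

# Made-up reply for illustration only.
attrs = parse_attributes(b'{"year": 1984, "genre": "sci-fi"}')
print(attrs)
```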

## Required arguments

<details open>
<summary><code>key</code></summary>

is the name of the key that holds the vector set.
</details>

<details open>
<summary><code>element</code></summary>

is the name of the element whose attributes you want to retrieve.
</details>

## Related topics

- [Vector sets]({{< relref "/develop/data-types/vector-sets" >}})
52 changes: 52 additions & 0 deletions content/commands/vinfo/index.md
@@ -0,0 +1,52 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
complexity: O(1)
description: Return information about a vector set.
group: vector_set
hidden: false
linkTitle: VINFO
since: 8.0.0
summary: Return information about a vector set.
syntax_fmt: "VINFO key"
title: VINFO
bannerText: Vector set is a new data type that is currently in preview and may be subject to change.
---

Return metadata and internal details about a vector set, including size, dimensions, quantization type, and graph structure.

```shell
VINFO word_embeddings
1) quant-type
2) int8
3) vector-dim
4) (integer) 300
5) size
6) (integer) 3000000
7) max-level
8) (integer) 12
9) vset-uid
10) (integer) 1
11) hnsw-max-node-uid
12) (integer) 3000000
```

## Required arguments

<details open>
<summary><code>key</code></summary>

is the name of the key that holds the vector set.
</details>

## Related topics

- [Vector sets]({{< relref "/develop/data-types/vector-sets" >}})