<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://vespa.ai/assets/vespa-ai-logo-heather.svg">
  <source media="(prefers-color-scheme: light)" srcset="https://vespa.ai/assets/vespa-ai-logo-rock.svg">
  <img alt="#Vespa" width="200" src="https://vespa.ai/assets/vespa-ai-logo-rock.svg" style="margin-bottom: 25px;">
</picture>

# Cohere Binary Embeddings

Cohere just released a new embedding API with support for binary and int8 vectors; see the [blog post](https://txt.cohere.com/int8-binary-embeddings/):

> We are excited to announce that Cohere Embed is the first embedding model that natively supports int8 and binary embeddings.


This notebook demonstrates how to use the binary vectors with Vespa, including re-ranking with the full-precision query vector for improved accuracy.

Let us dive in!

In [None]:
!pip3 install -U pyvespa cohere

## Examining the Cohere embeddings

Let us check out the API.

In [23]:
import cohere

In [27]:
api_key="your_api_key_here"
co = cohere.Client(api_key)

Some sample documents

In [28]:
documents = [
    "Alan Turing  was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.",
    "Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.",
    "Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.",
    "Marie Curie was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity"
]

In [33]:

# Compute the binary embeddings of the documents. Set input_type to "search_document" and embedding_types to "binary"
embeddings = co.embed(documents, model="embed-english-v3.0", input_type="search_document", embedding_types=["binary"])

In [34]:
print(embeddings)

cohere.Embeddings {
	response_type: embeddings_by_type
	embeddings: cohere.EmbeddingsByType {
	float: None
	int8: None
	uint8: None
	binary: [[-110, 121, 110, -50, 87, -59, 8, 35, 114, 30, -92, -112, -118, -16, 7, 96, 17, 19, 97, -9, -23, 25, -103, -35, -78, -45, 72, -123, -41, 67, 14, -31, -42, -126, 75, 111, 62, -64, 57, 64, -52, -66, -64, -12, 100, 99, 87, 61, -5, 5, 23, 34, -75, -66, -16, 91, 92, 121, 55, 117, 100, -112, -24, 84, 84, -65, 61, -31, -45, 7, 44, 8, -35, -125, 16, -50, -52, 11, -105, -32, 102, -62, -3, 86, -107, 21, 95, 15, 27, -79, -20, 114, 90, 125, 110, -97, -15, -98, 21, -102, -124, 112, -115, 26, -86, -55, 67, 7, 11, -127, 125, 103, -46, -55, 79, -31, 126, -32, 33, -128, -124, -80, 21, 27, -49, -9, 112, 103], [-110, -7, -24, 23, -33, 68, 24, 35, 22, -50, -32, 86, 74, -14, 71, 96, 81, -45, 105, -25, -73, 108, -99, 13, -76, 125, 73, -44, -34, -34, -105, 75, 86, -58, 85, -30, -92, -27, -39, 0, -91, -2, 30, -12, -116, 9, 81, 39, 76, 44, 87, 20, -107, 110, -75, 20, 44,

## Defining the Vespa application

First, we define a [Vespa schema](https://docs.vespa.ai/en/schemas.html) with the fields we want to store and their types. The binary vector field uses the hamming distance metric,
and the original 1024 dimensions have been packed into 128 int8 values (one bit per dimension).

In [14]:
from vespa.package import Schema, Document, Field, FieldSet
my_schema = Schema(
            name="doc",
            mode="index",
            document=Document(
                fields=[
                    Field(name="doc_id", type="string", indexing=["summary"]),
                    Field(name="text", type="string", indexing=["summary", "index"], index="enable-bm25"),
                    Field(name="binary_vector", type="tensor<int8>(x[128])",
                        indexing=["attribute", "index"],
                        attribute=["distance-metric: hamming"]
                    )
                ]
            ),
            fieldsets=[
                FieldSet(name = "default", fields = ["text"])
            ]
)
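
To make the 1024-to-128 packing concrete, here is a minimal, dependency-free sketch of how one bit per dimension can be packed into signed 8-bit integers. It assumes that a bit is set when the float value is positive and that bits are packed most-significant-bit first. The `pack_bits` helper is hypothetical, not part of the Cohere or Vespa APIs, and Cohere's exact binarization procedure may differ.

```python
def pack_bits(float_vector):
    """Pack one sign bit per dimension into signed 8-bit integers.

    Assumption: a dimension maps to bit 1 when its value is positive,
    and the first dimension of each group of 8 becomes the most
    significant bit of the resulting byte.
    """
    if len(float_vector) % 8 != 0:
        raise ValueError("vector length must be a multiple of 8")
    packed = []
    for start in range(0, len(float_vector), 8):
        byte = 0
        for value in float_vector[start:start + 8]:
            byte = (byte << 1) | (1 if value > 0 else 0)
        # Reinterpret the unsigned byte (0..255) as a signed int8 (-128..127).
        packed.append(byte - 256 if byte > 127 else byte)
    return packed


# A 1024-dimensional float vector shrinks to 128 int8 values.
print(len(pack_bits([0.5] * 1024)))
```

With the Cohere API the binary embeddings are returned directly, so a helper like this is only relevant when binarizing float embeddings yourself.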



We must add the schema to a Vespa [application package](https://docs.vespa.ai/en/application-packages.html).
This consists of configuration files, schemas, models, and possibly even custom code (plugins).

In [15]:
from vespa.package import ApplicationPackage

vespa_app_name = "cohere"
vespa_application_package = ApplicationPackage(
        name=vespa_app_name,
        schema=[my_schema]
)

In the last step, we configure [ranking](https://docs.vespa.ai/en/ranking.html) by adding a `rank-profile` to the schema.

`unpack_bits` is documented in the [ranking expressions reference](https://docs.vespa.ai/en/reference/ranking-expressions.html#unpack-bits).




In [20]:
from vespa.package import RankProfile, FirstPhaseRanking, SecondPhaseRanking


rerank = RankProfile(
    name="rerank",
    inputs=[
        ("query(q_binary)", "tensor<int8>(x[128])"),
        ("query(q_full)", "tensor<float>(x[1024])")
        ],
    first_phase=FirstPhaseRanking(
        expression="closeness(field, binary_vector)"  # closeness, i.e. 1 / (1 + hamming distance), between the binary query and binary_vector
    ),
    second_phase=SecondPhaseRanking(
        expression="sum(query(q_full)*unpack_bits(attribute(binary_vector)))"  # rescoring using the full-precision query and the unpacked binary_vector
    )
)
my_schema.add_rank_profile(rerank)
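
The two ranking phases can be mimicked in plain Python to see what they compute. This is a sketch under stated assumptions: hamming distance counts differing bits, closeness is `1 / (1 + distance)`, and unpacking expands each int8 into eight 0/1 values, most significant bit first. The helper names are hypothetical, and this is not Vespa's implementation.

```python
def unpack_bits(packed):
    """Expand each signed int8 into eight 0.0/1.0 values (MSB first)."""
    bits = []
    for value in packed:
        byte = value & 0xFF  # two's complement int8 -> unsigned byte
        bits.extend(float((byte >> shift) & 1) for shift in range(7, -1, -1))
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two packed int8 vectors."""
    return sum(bin((x ^ y) & 0xFF).count("1") for x, y in zip(a, b))


def closeness(distance):
    """First-phase score: smaller distances give scores closer to 1."""
    return 1.0 / (1.0 + distance)


def second_phase_score(q_full, packed):
    """Dot product of the float query with the unpacked document bits."""
    return sum(q * bit for q, bit in zip(q_full, unpack_bits(packed)))


# Toy example with 8 dimensions (one packed byte) instead of 1024.
doc = [-128]                      # bit pattern 10000000
query_binary = [-1]               # bit pattern 11111111
query_full = [0.25] + [0.1] * 7   # float query, one value per bit
print(hamming_distance(query_binary, doc))              # 7 differing bits
print(closeness(hamming_distance(query_binary, doc)))   # 1 / (1 + 7)
print(second_phase_score(query_full, doc))              # only the first bit is set
```

The cheap hamming-based score selects candidates, and the float dot product then reorders only those candidates, which is why the full-precision vector is needed for the query but not for the documents.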

## Deploy the application to Vespa Cloud

With the configured application, we can deploy it to [Vespa Cloud](https://cloud.vespa.ai/en/).
It is also possible to deploy the app using docker; see the [Hybrid Search - Quickstart](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa.html) guide for
an example of deploying it to a local docker container.

Install the Vespa CLI using [Homebrew](https://brew.sh/), or download a binary from GitHub as demonstrated below.

In [None]:
!brew install vespa-cli

Alternatively, if running in Colab, download the Vespa CLI:

In [None]:
import os
import requests
res = requests.get(url="https://api.github.com/repos/vespa-engine/vespa/releases/latest").json()
os.environ["VERSION"] = res["tag_name"].replace("v", "")
!curl -fsSL https://github.com/vespa-engine/vespa/releases/download/v${VERSION}/vespa-cli_${VERSION}_linux_amd64.tar.gz | tar -zxf -
!ln -sf /content/vespa-cli_${VERSION}_linux_amd64/bin/vespa /bin/vespa

To deploy the application to Vespa Cloud, we first need a tenant.

Create a tenant at [console.vespa-cloud.com](https://console.vespa-cloud.com/) (unless you already have one).
This step requires a Google or GitHub account, and will start your [free trial](https://cloud.vespa.ai/en/free-trial).
Make note of the tenant name; it is used in the next steps.

### Configure Vespa Cloud data-plane security

Create a Vespa Cloud data-plane mTLS certificate/key pair. The pair is used to talk to your Vespa Cloud endpoints. See the [Vespa Cloud Security Guide](https://cloud.vespa.ai/en/security/guide) for details.

We save the paths to the credentials for later data-plane access without using pyvespa APIs.

In [None]:
import os

os.environ["TENANT_NAME"] = "vespa-team" # Replace with your tenant name

vespa_cli_command = f'vespa config set application {os.environ["TENANT_NAME"]}.{vespa_app_name}'

!vespa config set target cloud
!{vespa_cli_command}
!vespa auth cert -N

Validate that we have the expected data-plane credential files:

In [6]:
from os.path import exists
from pathlib import Path

cert_path = Path.home() / ".vespa" / f"{os.environ['TENANT_NAME']}.{vespa_app_name}.default/data-plane-public-cert.pem"
key_path = Path.home() / ".vespa" / f"{os.environ['TENANT_NAME']}.{vespa_app_name}.default/data-plane-private-key.pem"

if not exists(cert_path) or not exists(key_path):
    print("ERROR: set the correct paths to security credentials. Correct paths above and rerun until you do not see this error")

Note that the subsequent Vespa Cloud deploy call below will add `data-plane-public-cert.pem` to the application before deploying it to Vespa Cloud, so that
you have access to both the private key and the public certificate. At the same time, Vespa Cloud only knows the public certificate.

### Configure Vespa Cloud control-plane security

Authenticate to generate a tenant level control plane API key for deploying the applications to Vespa Cloud, and save the path to it.

The generated tenant API key must be added in the Vespa Console before attempting to deploy the application.

```
To use this key in Vespa Cloud click 'Add custom key' at
https://console.vespa-cloud.com/tenant/TENANT_NAME/account/keys
and paste the entire public key including the BEGIN and END lines.
```

In [None]:
!vespa auth api-key

from pathlib import Path
api_key_path = Path.home() / ".vespa" / f"{os.environ['TENANT_NAME']}.api-key.pem"

### Deploy to Vespa Cloud

Now that we have data-plane and control-plane credentials ready, we can deploy our application to Vespa Cloud!

`PyVespa` supports deploying apps to the [development zone](https://cloud.vespa.ai/en/reference/environments#dev-and-perf).

>Note: Deployments to dev and perf expire after 7 days of inactivity, i.e., 7 days after running deploy. This applies to all plans, not only the Free Trial. Use the Vespa Console to extend the expiry period, or redeploy the application to add 7 more days.

In [21]:
from vespa.deployment import VespaCloud

def read_secret():
    """Read the API key from the environment variable. This is
    only used for CI/CD purposes."""
    t = os.getenv("VESPA_TEAM_API_KEY")
    if t:
        return t.replace(r"\n", "\n")
    else:
        return t

vespa_cloud = VespaCloud(
    tenant=os.environ["TENANT_NAME"],
    application=vespa_app_name,
    key_content=read_secret() or None,
    key_location=api_key_path,
    application_package=vespa_application_package)

Now deploy the app to Vespa Cloud dev zone.

The first deployment typically takes 2 minutes until the endpoint is up.

In [22]:
from vespa.application import Vespa
app:Vespa = vespa_cloud.deploy()

Deployment started in run 4 of dev-aws-us-east-1c for samples.cohere. This may take a few minutes the first time.
INFO    [20:46:43]  Deploying platform version 8.320.68 and application dev build 4 for dev-aws-us-east-1c of default ...
INFO    [20:46:43]  Using CA signed certificate version 0
INFO    [20:46:43]  Using 1 nodes in container cluster 'cohere_container'
INFO    [20:46:47]  Session 1951 for tenant 'samples' prepared and activated.
INFO    [20:46:50]  ######## Details for all nodes ########
INFO    [20:46:57]  h88969a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP
INFO    [20:46:57]  --- platform vespa/cloud-tenant-rhel8:8.320.68 <-- :
INFO    [20:46:57]  --- logserver-container on port 4080 has not started 
INFO    [20:46:57]  --- metricsproxy-container on port 19092 has not started 
INFO    [20:46:57]  h90001b.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP
INFO    [20:46:57]  --- platform vespa/cloud-tenant-rhel8:8.320.68 <-- :
INFO

## Feeding the documents and binary embeddings



In [44]:
for i, doc in enumerate(documents):
    response = app.feed_data_point(
        schema="doc",
        data_id=str(i),
        fields={
            "doc_id": str(i),
            "text": doc,
            "binary_vector": embeddings.embeddings.binary[i]
        }
    )
    assert response.is_successful()

## Querying data


Read more about querying Vespa in:

- [Vespa Query API](https://docs.vespa.ai/en/query-api.html)
- [Vespa Query API reference](https://docs.vespa.ai/en/reference/query-api-reference.html)
- [Vespa Query Language API (YQL)](https://docs.vespa.ai/en/query-language.html)

In [45]:
query = "Who discovered x-ray?"

# Make sure to set input_type="search_query" when getting the embeddings for the query. We ask for both the float version and the binary version
query_emb = co.embed([query], model="embed-english-v3.0", input_type="search_query", embedding_types=["float","binary"])

Now, we use `nearestNeighbor` search to retrieve 10 hits using hamming distance. These hits are then exposed to the Vespa ranking framework, where we re-rank them
using the dot product between the float query tensor and the unpacked binary vector (`unpack_bits` returns a 1024-dimensional float tensor).

In [48]:
response = app.query(
  yql="select * from doc where {targetHits:10}nearestNeighbor(binary_vector,q_binary)", 
  ranking="rerank",
  body = {
  'input.query(q_binary)': query_emb.embeddings.binary[0],
  'input.query(q_full)': query_emb.embeddings.float[0]
  }
)

In [49]:
response.hits

[{'id': 'id:doc:doc::3',
  'relevance': 4.8650288581848145,
  'source': 'cohere_content',
  'fields': {'sddocname': 'doc',
   'documentid': 'id:doc:doc::3',
   'doc_id': '3',
   'text': 'Marie Curie was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity'}},
 {'id': 'id:doc:doc::1',
  'relevance': 3.811260938644409,
  'source': 'cohere_content',
  'fields': {'sddocname': 'doc',
   'documentid': 'id:doc:doc::1',
   'doc_id': '1',
   'text': 'Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.'}},
 {'id': 'id:doc:doc::2',
  'relevance': 3.752835273742676,
  'source': 'cohere_content',
  'fields': {'sddocname': 'doc',
   'documentid': 'id:doc:doc::2',
   'doc_id': '2',
   'text': 'Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural ph
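
To read the ranked results more easily, a small helper can reduce each hit to its relevance score, document id, and a text snippet. This sketch assumes hits shaped like the output above; `summarize_hits` is a hypothetical helper, not part of pyvespa.

```python
def summarize_hits(hits, max_text_length=60):
    """Return one (relevance, doc_id, text snippet) tuple per hit, in ranked order."""
    rows = []
    for hit in hits:
        fields = hit.get("fields", {})
        text = fields.get("text", "")
        # Shorten long texts so each hit fits on one line.
        snippet = text if len(text) <= max_text_length else text[:max_text_length] + "..."
        rows.append((round(hit["relevance"], 3), fields.get("doc_id"), snippet))
    return rows


# Hand-written hit in the same shape as the entries of `response.hits`.
example_hits = [
    {
        "id": "id:doc:doc::3",
        "relevance": 4.8650288581848145,
        "fields": {
            "doc_id": "3",
            "text": "Marie Curie was a Polish and naturalised-French physicist",
        },
    },
]
for relevance, doc_id, snippet in summarize_hits(example_hits):
    print(f"{relevance:.3f}  doc {doc_id}: {snippet}")
```

In the notebook, the same helper could be called as `summarize_hits(response.hits)`.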

## Conclusions

This new embedding API is a huge step forward for cost-efficient vector search at scale. We wrote about using binarized vectors as early as 2021
in our [billion-scale vector search blog series](https://blog.vespa.ai/billion-scale-knn/).



## Clean up

We can now delete the cloud instance:

In [None]:
vespa_cloud.delete()