## **1. Install the SDK**
**VectorStackAI Embeddings** is an embeddings-as-a-service product by [VectorStackAI](https://vectorstack.ai), 
designed to provide state-of-the-art domain-specific embeddings. 
Currently, we offer a Python SDK to interact with the VectorStackAI Embeddings service.
To get started, install the VectorStackAI Python SDK using pip:

In [4]:
%pip install vectorstackai -U 

Collecting vectorstackai
  Downloading vectorstackai-0.2.1-py3-none-any.whl.metadata (1.2 kB)
Downloading vectorstackai-0.2.1-py3-none-any.whl (18 kB)
Installing collected packages: vectorstackai
  Attempting uninstall: vectorstackai
    Found existing installation: vectorstackai 0.1.6
    Uninstalling vectorstackai-0.1.6:
      Successfully uninstalled vectorstackai-0.1.6
Successfully installed vectorstackai-0.2.1

Note: you may need to restart the kernel to use updated packages.


## **2. Get an API key**
You will need an API key to use the SDK. 
You can get it by signing up on the [VectorStackAI website](https://vectorstack.ai).
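
A common precaution is to keep the key out of the notebook itself. The sketch below reads it from an environment variable. The variable name `VECTORSTACKAI_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not part of the SDK.

```python
import os


def load_api_key(env_var: str = "VECTORSTACKAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is a hypothetical choice, not an SDK convention.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return api_key
```

The returned string can then be passed as the `api_key` argument when creating the client.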

## **3. Generating Embeddings**
This section covers generating embeddings for both **documents** and **queries**.

### **3.1 Generating Embeddings for Documents**
To generate embeddings, first import the vectorstackai package and create a client object with your API key:

In [12]:
import vectorstackai
import numpy as np

# Replace with your actual API key
api_key = "your_api_key"
client = vectorstackai.Client(api_key=api_key, timeout=30)

Once the client object is created, you can use the `client.embed` method to generate embeddings for a list of documents.

The method takes the following parameters:

- `texts`: A list of text documents to embed
- `model`: The name of the embedding model to use (e.g. 'vstackai-law-1' for legal documents)
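- `is_query`: A boolean flag indicating whether the texts are queries (`True`) or documents (`False`)
- `instruction`: An optional instruction string that guides how the texts are embedded (used for queries in section 3.2)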

The method returns an `EmbeddingObject` containing the generated embeddings.
The embeddings are in numpy array format, and can be accessed using the `embeddings` attribute of the `EmbeddingObject`.

For more details on the `embed` method, see the [API reference](https://docs.vectorstack.ai/embeddings/reference.html).

In [6]:
# Documents related to law domain (e.g., court cases, consumer contracts, etc.)
documents = [
    "The defendant was charged with violation of contract terms in the lease agreement signed on January 1, 2022.",
    "This contract stipulates that the consumer has 30 days to return the product in case of any manufacturing defects.",
    "In the case of Smith v. Johnson, the court ruled that the plaintiff had the right to claim damages under section 12 of the Consumer Protection Act."
]

# Get embeddings for the legal documents
doc_embeddings = client.embed(texts=documents, model='vstackai-law-1', is_query=False)
doc_embeddings = doc_embeddings.embeddings  # (3, 1536) numpy array

### **3.2 Generating Embeddings for Queries**
Now, let's generate embeddings for a query.

Since `vstackai-law-1` is an instruction-tuned model, it is recommended to provide an instruction when embedding queries. This helps guide the model to produce embeddings that are more relevant to the task/instruction.
You can learn more about instruction-tuned models [here](https://instructor-embedding.github.io).

In [13]:
# Encode a query
query = "How many days does the consumer have to return the product?"
query_embedding = client.embed(
    texts=[query], 
    model='vstackai-law-1', 
    is_query=True, 
    instruction='Represent the query for searching legal documents'
)
query_embedding = query_embedding.embeddings  # (1, 1536) numpy array

## **4. Computing Similarity**
Once you have embeddings for both documents and queries, you can compute similarity scores to find the most relevant match.

Below, we compute the dot product of the document embeddings and the query embedding to get the similarity scores. You can use other similarity metrics as well (e.g., cosine similarity or Euclidean distance).

In [14]:
# Compute similarity between query and documents
similarities = np.dot(doc_embeddings, query_embedding.T)
print(similarities)
# The result has shape (3, 1): one similarity score per document

[[0.3557]
 [0.774 ]
 [0.4336]]


The document with the highest similarity score is the most relevant match for the query. Here, that is the second document (the 30-day return clause), with a score of 0.774.
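
To make the ranking step and the cosine-similarity alternative concrete, here is a minimal sketch that ranks documents by cosine similarity. The small 3-dimensional vectors are toy stand-ins for the real embeddings, and the helper names `cosine_similarity` and `rank_documents` are illustrative, not part of the SDK. Cosine similarity matches the dot product only when the embeddings have unit length, which this guide does not confirm for `vstackai-law-1`.

```python
import numpy as np


def cosine_similarity(docs: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of docs (n, d) and a query (1, d)."""
    docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query, axis=1, keepdims=True)
    return (docs_unit @ query_unit.T).ravel()  # shape (n,)


def rank_documents(scores: np.ndarray) -> np.ndarray:
    """Document indices sorted from most to least similar."""
    return np.argsort(-scores)


# Toy vectors standing in for doc_embeddings and query_embedding
toy_docs = np.array([[1.0, 0.0, 0.0],
                     [0.0, 2.0, 0.0],
                     [1.0, 1.0, 0.0]])
toy_query = np.array([[0.0, 1.0, 0.0]])

scores = cosine_similarity(toy_docs, toy_query)
ranking = rank_documents(scores)
print(ranking[0])  # index of the best-matching document
```

With real embeddings, pass the document and query arrays in place of the toy vectors.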

## **5. Conclusion**
This concludes the quickstart guide. You can now use the VectorStackAI Embeddings service to generate embeddings for your documents and queries.