## Enabling Product Quantization (PQ) Vector Compression for Your Class

To compress vectors using PQ, you need to:

**1.** Connect to a Weaviate instance and create a Schema

**2.** Add data points to the class. It is recommended to add at least 10k-100k objects to Weaviate before enabling PQ.

**3.** Enable PQ by updating the schema configuration. This trains the PQ algorithm on the objects and vectors already added to Weaviate, learning centroids that are used to compress the existing vectors and any vectors added later:

**a.** `trainingLimit` sets how many of the added vectors are used to train the centroids. By default, up to the first 100k objects added to Weaviate are used.

**b.** `segments` sets how many pieces each vector is split into for quantization. This determines the compression rate.
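To make step 3 more concrete, here is a toy, pure-Python sketch of the PQ idea: split each vector into segments, learn a small set of centroids per segment with k-means, then store each segment as the index of its nearest centroid. It is an illustration only and not Weaviate's implementation. The function names, the tiny centroid count (`k=16`), and the random 8-dimensional data are all assumptions made for the example.

```python
import random

def split(vec, segments):
    """Cut a vector into `segments` equal-length pieces."""
    size = len(vec) // segments
    return [vec[i * size:(i + 1) * size] for i in range(segments)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(piece, centroids):
    """Index of the centroid closest to `piece`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(piece, centroids[i]))

def kmeans(points, k, iters=10, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if nothing was assigned to it
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train_pq(vectors, segments, k):
    """Learn one codebook (a list of k centroids) per segment."""
    if len(vectors[0]) % segments != 0:
        raise ValueError("vector dimension must be divisible by segments")
    per_segment = zip(*(split(v, segments) for v in vectors))
    return [kmeans(list(pieces), k) for pieces in per_segment]

def encode(vec, codebooks):
    """Compress a vector to one centroid index per segment."""
    pieces = split(vec, len(codebooks))
    return [nearest(p, cb) for p, cb in zip(pieces, codebooks)]

def decode(codes, codebooks):
    """Approximate reconstruction: concatenate the chosen centroids."""
    return [x for c, cb in zip(codes, codebooks) for x in cb[c]]

# Train on vectors that are "already stored", then compress a new one.
rng = random.Random(42)
training_vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
codebooks = train_pq(training_vectors, segments=4, k=16)

new_vector = [rng.random() for _ in range(8)]
codes = encode(new_vector, codebooks)
print(codes)                          # 4 small integers instead of 8 floats
print(len(decode(codes, codebooks)))  # 8
```

The centroids are learned once from the training vectors and then reused for every later vector, which is why enough data should be present before PQ is enabled.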

In [30]:
import requests
import json

# Download the data
resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/intro-workshop/main/data/jeopardy_1k.json')
data = json.loads(resp.text)  # Load data

# Parse the JSON and preview it
print(type(data), len(data))
print(json.dumps(data[1], indent=2))


<class 'list'> 1000
{
  "Air Date": "2005-11-18",
  "Round": "Jeopardy!",
  "Value": 200,
  "Category": "RHYME TIME",
  "Question": "Any pigment on the wall so faded you can barely see it",
  "Answer": "faint paint"
}


### 1. Connect to the Weaviate instance:

In [2]:
import os

import weaviate

client = weaviate.Client(
    url="http://localhost:8080/",  # Replace with your endpoint
    additional_headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")  # Replace with your inference API key
    }
)

client.is_ready()

True

Delete the `Question` class if it already exists, so the notebook can be re-run from a clean state:

In [3]:
if client.schema.exists("Question"):
    client.schema.delete_class("Question")

Create the schema (PQ is disabled by default):

In [33]:
# Define the class that will hold the data.
# It needs properties for the question, the answer and the round.

class_definition = {
    "class": "Question",
    "vectorizer": "text2vec-openai",
    "vectorIndexConfig": {
        "distance": "cosine"
    },
    "properties": [
        {
            "name": "question",
            "dataType": ["text"]
        },
        {
            "name": "answer",
            "dataType": ["text"]
        },
        {
            "name": "round",
            "dataType": ["text"]
        }
    ]
}

client.schema.create_class(class_definition)

### 2. Add data to the instance:

In [36]:
# Insert the data into Weaviate in batches
with client.batch() as batch:
    for o in data:
        obj_body = {
            "question": o["Question"],
            "answer": o["Answer"],
            "round": o["Round"]
        }

        batch.add_data_object(
            data_object=obj_body,
            class_name="Question"
        )

In [37]:
print(json.dumps(client.query.aggregate("Question").with_meta_count().do(), indent=2))

{
  "data": {
    "Aggregate": {
      "Question": [
        {
          "meta": {
            "count": 1000
          }
        }
      ]
    }
  }
}


Perform a vector search:

In [38]:
response = (client.query
            .get("Question", ['question','answer'])
            .with_near_text({"concepts":"spicy food recipes"})
            .with_additional(['distance'])
            .with_limit(2)
            .do()
)

print(json.dumps(response, indent=2))

{
  "data": {
    "Get": {
      "Question": [
        {
          "_additional": {
            "distance": 0.20124269
          },
          "answer": "tripe",
          "question": "Popular in Pennsylvania, pepper pot is a peppery soup made from this stomach lining"
        },
        {
          "_additional": {
            "distance": 0.20296884
          },
          "answer": "Chiles Rellenos",
          "question": "The name of this Mexican dish made with chiles & cheese translates to \"stuffed peppers\""
        }
      ]
    }
  }
}


### 3. Enable PQ by updating the Schema:

In [39]:
client.schema.update_config("Question", {
  "vectorIndexConfig": {
    "pq": {
      "enabled": True,          # Enable PQ
      "trainingLimit": 100000,  # If not set, defaults to up to the first 100k vectors added to Weaviate
      "segments": 96            # Number of segments each vector is split into; the vector dimension must be an integer multiple of this value
    }
  }
})
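As a quick sanity check on the `segments` value, the sketch below verifies that the vector dimension splits evenly into segments and estimates the resulting compression. It assumes 1536-dimensional float32 vectors (the output size of OpenAI's `text-embedding-ada-002`, which may not be the model your vectorizer is configured with) and one byte per segment code, i.e. at most 256 centroids per segment. `pq_summary` is a hypothetical helper, not part of the Weaviate client.

```python
def pq_summary(dimensions, segments, bytes_per_float=4, bytes_per_code=1):
    """Estimate raw vs. PQ-compressed vector size in bytes.

    Assumes float32 input vectors and one byte per segment code.
    """
    if dimensions % segments != 0:
        raise ValueError(
            f"{dimensions} dimensions cannot be split evenly into {segments} segments"
        )
    raw = dimensions * bytes_per_float
    compressed = segments * bytes_per_code
    return {
        "dims_per_segment": dimensions // segments,
        "raw_bytes": raw,
        "compressed_bytes": compressed,
        "ratio": raw / compressed,
    }

print(pq_summary(1536, 96))
# {'dims_per_segment': 16, 'raw_bytes': 6144, 'compressed_bytes': 96, 'ratio': 64.0}
```

This estimate covers only the vector payload; it ignores the stored centroids and any index overhead, so actual memory savings will be smaller.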

Weaviate then compresses the vectors. If you are monitoring the instance, it logs the following:


```bash
product_quantization_compression-weaviate-1  | {"action":"compress","level":"info","msg":"switching to compressed vectors","time":"2023-11-13T21:10:52Z"}

product_quantization_compression-weaviate-1  | {"action":"compress","level":"info","msg":"vector compression complete","time":"2023-11-13T21:10:53Z"}
```
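If you are not watching the logs, another option is to read the class definition back and inspect its PQ settings. The helper below is a minimal sketch that assumes the returned definition mirrors the configuration sent in the update, with a `pq` block under `vectorIndexConfig`. The function names and the example dictionary are hypothetical and hand-written for illustration, not output captured from Weaviate.

```python
def pq_settings(class_schema):
    """Return the PQ section of a class definition, or {} if it is absent."""
    return class_schema.get("vectorIndexConfig", {}).get("pq", {})

def is_pq_enabled(class_schema):
    """True if the class definition reports PQ as enabled."""
    return bool(pq_settings(class_schema).get("enabled", False))

# Hand-written example with the assumed shape (not real server output).
example_schema = {
    "class": "Question",
    "vectorIndexConfig": {
        "distance": "cosine",
        "pq": {"enabled": True, "segments": 96, "trainingLimit": 100000},
    },
}

print(is_pq_enabled(example_schema))  # True
print(is_pq_enabled({"class": "Question", "vectorIndexConfig": {"distance": "cosine"}}))  # False
```

With the client from this notebook, the dictionary returned by `client.schema.get("Question")` could be passed to `is_pq_enabled` in place of `example_schema`.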

Re-run the same vector search, this time on the PQ-compressed vectors (rescoring is enabled by default):

In [40]:
response = (client.query
            .get("Question", ['question','answer'])
            .with_near_text({"concepts":"spicy food recipes"})
            .with_additional(['distance'])
            .with_limit(2)
            .do()
)

print(json.dumps(response, indent=2))

{
  "data": {
    "Get": {
      "Question": [
        {
          "_additional": {
            "distance": 0.20124269
          },
          "answer": "tripe",
          "question": "Popular in Pennsylvania, pepper pot is a peppery soup made from this stomach lining"
        },
        {
          "_additional": {
            "distance": 0.20296884
          },
          "answer": "Chiles Rellenos",
          "question": "The name of this Mexican dish made with chiles & cheese translates to \"stuffed peppers\""
        }
      ]
    }
  }
}
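In this run, the results and distances are the same before and after compression. For larger result sets, a small helper can make that comparison explicit. The sketch below uses two hypothetical functions (not part of the Weaviate client), and the two dictionaries are trimmed copies of the responses printed in this notebook.

```python
def top_hits(response, class_name="Question"):
    """Return (answer, distance) pairs from a Get response, in ranked order."""
    hits = response["data"]["Get"][class_name]
    return [(h["answer"], h["_additional"]["distance"]) for h in hits]

def overlap(before, after):
    """Fraction of the 'before' answers that also appear in 'after'."""
    before_answers = {answer for answer, _ in before}
    after_answers = {answer for answer, _ in after}
    return len(before_answers & after_answers) / len(before_answers)

# Trimmed copy of the response before PQ was enabled
uncompressed = {"data": {"Get": {"Question": [
    {"_additional": {"distance": 0.20124269}, "answer": "tripe"},
    {"_additional": {"distance": 0.20296884}, "answer": "Chiles Rellenos"},
]}}}

# Trimmed copy of the response after PQ was enabled
compressed = {"data": {"Get": {"Question": [
    {"_additional": {"distance": 0.20124269}, "answer": "tripe"},
    {"_additional": {"distance": 0.20296884}, "answer": "Chiles Rellenos"},
]}}}

print(overlap(top_hits(uncompressed), top_hits(compressed)))  # 1.0
print(top_hits(uncompressed) == top_hits(compressed))         # True
```

An overlap below 1.0 on your own data would suggest that compression is changing which objects are retrieved, which may be a reason to revisit `segments` or the amount of training data.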
