Inspired by https://www.pinecone.io/learn/series/faiss/vector-indexes/ and https://www.pinecone.io/learn/series/faiss/product-quantization/

The source code of the `VectorDataEngine` class in the `leanvector` package is available [here](https://github.com/torsstei/leanvector).

![](https://github.com/torsstei/leanvector/blob/master/leanvector.png?raw=true)

# Set up runtime

In [2]:
!set -o pipefail; pip install --upgrade "humanize" "leanvector" | { grep -v "already satisfied" || :; }

In [3]:
from leanvector import VectorDataEngine, VectorIndexes
import humanize
import datetime
import numpy as np
np.set_printoptions(threshold=np.inf)

In [20]:
import getpass
accesskey = getpass.getpass("Please enter your S3 Access Key: ")
secretkey = getpass.getpass("Please enter your S3 Secret Key: ")
endpoint = input("Please enter your S3 Endpoint: ")
bucket = input("Please enter your S3 Bucket: ")

print("We will be using your Access Key {} with Secret Key {} and your S3 bucket {} at endpoint {} for this demo."
      .format("*" * len(accesskey), "*" * len(secretkey), bucket, endpoint))

Please enter your S3 Access Key:  ········
Please enter your S3 Secret Key:  ········
Please enter your S3 Endpoint:  https://s3.us-south.cloud-object-storage.appdomain.cloud
Please enter your S3 Bucket:  my-data-lake-cos-bucket


We will be using your Access Key ******************************** with Secret Key ************************************************ and your S3 bucket my-data-lake-cos-bucket at endpoint https://s3.us-south.cloud-object-storage.appdomain.cloud for this demo.


# Demo with Tiny Vector Data

Let's generate some vector data: 10,000 vectors with 128 dimensions each.

In [5]:
vector_data = np.random.uniform(low=0.0, high=256.0, size=(10000,128))
#vector_data = np.random.randint(low=0, high=256, size=(10000,128))

Now we initialize a `VectorDataEngine` with this vector data, passing it in via the optional `vectors` parameter. In subsequent initializations we can simply omit this parameter, because the vector data is persisted in the S3 object storage location of the specified `vector_project` and is loaded from there when needed.

In [12]:
tiny_vector_engine = VectorDataEngine(vector_project='tiny demo', s3endpoint=endpoint,
                                      s3accesskey=accesskey, s3secretkey=secretkey,
                                      bucket=bucket, vectors=vector_data)

In [13]:
tiny_vector_engine.get_vector_array().shape

(10000, 128)

Now let's create all supported vector indexes.
**Note:** `VectorDataEngine` automatically persists these indexes in the S3 location of the current `vector_project`.

In [14]:
tiny_vector_engine.create_index(VectorIndexes.FLAT)
tiny_vector_engine.create_index(VectorIndexes.LSH)
tiny_vector_engine.create_index(VectorIndexes.HNSW)
tiny_vector_engine.create_index(VectorIndexes.IVFFLAT)
tiny_vector_engine.create_index(VectorIndexes.PQ)
tiny_vector_engine.create_index(VectorIndexes.IVFPQ)

Now we generate a query vector to use for our nearest neighbor searches on the previously generated and then indexed vector array.

In [15]:
tiny_query = np.random.uniform(low=0.0, high=256.0, size=(1,128))
#tiny_query = np.random.randint(low=0, high=256, size=(1,128))
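
For orientation, here is a minimal NumPy sketch of what an exact (exhaustive) nearest neighbor search computes. It assumes squared L2 distance, which is what a Faiss-style flat index typically uses; whether `leanvector` uses exactly this metric internally is an assumption, and `brute_force_knn` is a hypothetical helper, not part of the package.

```python
import numpy as np

def brute_force_knn(vectors, query, k=10):
    """Exact k nearest neighbors of one query row under squared L2 distance."""
    diffs = vectors - query                       # broadcast the (1, d) query over all rows
    dists = np.einsum("ij,ij->i", diffs, diffs)   # squared L2 distance per vector
    nearest = np.argpartition(dists, k - 1)[:k]   # the k smallest, in arbitrary order
    ids = nearest[np.argsort(dists[nearest])]     # sort only those k by distance
    return dists[ids], ids

# Small synthetic example (not the notebook's data)
rng = np.random.default_rng(0)
demo_vectors = rng.uniform(0.0, 256.0, size=(1000, 128))
demo_query = demo_vectors[42:43] + 0.01           # almost identical to row 42
D_exact, I_exact = brute_force_knn(demo_vectors, demo_query, k=5)
```

Every vector is compared with the query, so the result is exact but the cost grows linearly with the number of vectors. The approximate indexes trade some of that exactness for speed or size.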

To demonstrate the stateless nature of `VectorDataEngine`, we initialize a separate instance with the same `vector_project` for which we just created and persisted the indexes above.

In [17]:
tiny_vector_search_engine = VectorDataEngine(vector_project='tiny demo', s3endpoint=endpoint,
                                             s3accesskey=accesskey, s3secretkey=secretkey,
                                             bucket=bucket)

Let's explicitly cache the indexes from S3 in our local runtime, so that searches perform well from the very first call:

In [18]:
tiny_vector_search_engine.cache_all_indexes()

Now let's run some vector searches with all the different indexes:

In [19]:
before_time = datetime.datetime.now()
D, I = tiny_vector_search_engine.search(tiny_query, VectorIndexes.FLAT)
print("Flat Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
baseline = I[0].tolist()

before_time = datetime.datetime.now()
D, I = tiny_vector_search_engine.search(tiny_query, VectorIndexes.LSH)
print("LSH Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("LSH Index Recall Rate: {} %".format(np.array(baseline)[np.in1d(baseline, I).tolist()].size / np.array(baseline).size * 100))

before_time = datetime.datetime.now()
D, I = tiny_vector_search_engine.search(tiny_query, VectorIndexes.HNSW)
print("HNSW Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("HNSW Index Recall Rate: {} %".format(np.array(baseline)[np.in1d(baseline, I).tolist()].size / np.array(baseline).size * 100))

before_time = datetime.datetime.now()
D, I = tiny_vector_search_engine.search(tiny_query, VectorIndexes.IVFFLAT)
print("IVFFLAT Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("IVFFLAT Index Recall Rate: {} %".format(np.array(baseline)[np.in1d(baseline, I).tolist()].size / np.array(baseline).size * 100))

before_time = datetime.datetime.now()
D, I = tiny_vector_search_engine.search(tiny_query, VectorIndexes.PQ)
print("PQ Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("PQ Index Recall Rate: {} %".format(np.array(baseline)[np.in1d(baseline, I).tolist()].size / np.array(baseline).size * 100))

before_time = datetime.datetime.now()
D, I = tiny_vector_search_engine.search(tiny_query, VectorIndexes.IVFPQ)
print("IVFPQ Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("IVFPQ Index Recall Rate: {} %".format(np.array(baseline)[np.in1d(baseline, I).tolist()].size / np.array(baseline).size * 100))

Flat Index Search time: 1.2429999999999999 milliseconds
LSH Index Search time: 3.853 milliseconds
LSH Index Recall Rate: 11.0 %
HNSW Index Search time: 2.76 milliseconds
HNSW Index Recall Rate: 5.0 %
IVFFLAT Index Search time: 0.381 milliseconds
IVFFLAT Index Recall Rate: 6.0 %
PQ Index Search time: 1.004 milliseconds
PQ Index Recall Rate: 5.0 %
IVFPQ Index Search time: 0.15 milliseconds
IVFPQ Index Recall Rate: 5.0 %
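
The recall expression repeated in the cell above can be factored into a small helper. This is a sketch with a hypothetical name (`recall_percent`); it computes the same quantity, the share of baseline IDs that also appear in the approximate result, using `np.isin`, which NumPy recommends over `np.in1d` for new code.

```python
import numpy as np

def recall_percent(baseline_ids, candidate_ids):
    """Percentage of baseline IDs that also occur in the candidate result."""
    baseline_ids = np.asarray(baseline_ids).ravel()
    candidate_ids = np.asarray(candidate_ids).ravel()
    if baseline_ids.size == 0:
        raise ValueError("baseline_ids must not be empty")
    hits = np.isin(baseline_ids, candidate_ids)   # True where a baseline ID was found
    return 100.0 * hits.sum() / baseline_ids.size

# Two of the four baseline IDs are found again: 50 % recall
example_recall = recall_percent([1, 2, 3, 4], [[2, 4, 9, 7]])
```

With this helper, each recall line could be written as `recall_percent(baseline, I)`.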


# Demo With Realistic Vector Test Data (1 Million Vectors with 128 Dimensions)

Download and unpack a public test data set. We use a popular test data set for approximate nearest neighbor search: http://corpus-texmex.irisa.fr

In [29]:
from contextlib import closing
import urllib.request as request
import shutil
with closing(request.urlopen('ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz')) as r:
    with open('sift.tar.gz', 'wb') as f:
        shutil.copyfileobj(r, f)

In [30]:
import tarfile
# The context manager closes the archive once extraction has finished
with tarfile.open('sift.tar.gz', "r:gz") as tar:
    tar.extractall()

### Load 1M Vectors with 128 Dimensions

Specify the `fvecs_file` parameter the first time you use a `vector_project`. `VectorDataEngine` ingests the provided vector data and stores it in object storage for further use.

In subsequent invocations the vector data is loaded from the object storage location of the `vector_project`, so you can simply omit the `fvecs_file` parameter once the project has been initialized.
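
To make the file format concrete: in the `.fvecs` layout used by the TEXMEX corpus, each vector is stored as a 4-byte little-endian integer holding the dimension, followed by that many 4-byte floats. The sketch below reads such a file with NumPy. `read_fvecs` is a hypothetical helper for illustration, and how `VectorDataEngine` parses the file internally is not shown here.

```python
import os
import tempfile
import numpy as np

def read_fvecs(path):
    """Read an .fvecs file into an (n, d) float32 array."""
    raw = np.fromfile(path, dtype="<i4")
    if raw.size == 0:
        return np.empty((0, 0), dtype="<f4")
    dim = int(raw[0])                      # first record's dimension header
    records = raw.reshape(-1, dim + 1)     # one row per vector: header + components
    if not np.all(records[:, 0] == dim):
        raise ValueError("inconsistent vector dimensions in file")
    # Drop the header column and reinterpret the remaining bytes as float32
    return records[:, 1:].copy().view("<f4")

# Round-trip check on a tiny hand-written file (2 vectors, 3 dimensions)
sample = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype="<f4")
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.fvecs")
    with open(sample_path, "wb") as f:
        for row in sample:
            f.write(np.array([row.size], dtype="<i4").tobytes())  # dimension header
            f.write(row.tobytes())                                # float32 components
    loaded = read_fvecs(sample_path)
```

Applied to `./sift/sift_base.fvecs`, a reader like this would be expected to return an array of shape `(1000000, 128)`.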

In [5]:
vector_engine = VectorDataEngine(vector_project='sift demo', s3endpoint=endpoint, s3accesskey=accesskey,
                                 s3secretkey=secretkey, bucket=bucket, fvecs_file='./sift/sift_base.fvecs')

In [7]:
# Sample of the vector embeddings
vector_engine.get_vector_array()[:3]

array([[  0.,  16.,  35.,   5.,  32.,  31.,  14.,  10.,  11.,  78.,  55.,
         10.,  45.,  83.,  11.,   6.,  14.,  57., 102.,  75.,  20.,   8.,
          3.,   5.,  67.,  17.,  19.,  26.,   5.,   0.,   1.,  22.,  60.,
         26.,   7.,   1.,  18.,  22.,  84.,  53.,  85., 119., 119.,   4.,
         24.,  18.,   7.,   7.,   1.,  81., 106., 102.,  72.,  30.,   6.,
          0.,   9.,   1.,   9., 119.,  72.,   1.,   4.,  33., 119.,  29.,
          6.,   1.,   0.,   1.,  14.,  52., 119.,  30.,   3.,   0.,   0.,
         55.,  92., 111.,   2.,   5.,   4.,   9.,  22.,  89.,  96.,  14.,
          1.,   0.,   1.,  82.,  59.,  16.,  20.,   5.,  25.,  14.,  11.,
          4.,   0.,   0.,   1.,  26.,  47.,  23.,   4.,   0.,   0.,   4.,
         38.,  83.,  30.,  14.,   9.,   4.,   9.,  17.,  23.,  41.,   0.,
          0.,   2.,   8.,  19.,  25.,  23.,   1.],
       [ 14.,  35.,  19.,  20.,   3.,   1.,  13.,  11.,  16., 119.,  85.,
          5.,   0.,   5.,  24.,  26.,   0.,  27., 119.,  13.,

In [8]:
# The vector search space
vector_engine.get_vector_array().shape

(1000000, 128)

### Create Vector Indexes

In [96]:
before_time = datetime.datetime.now()
vector_engine.create_index(VectorIndexes.FLAT)
print("Flat Index Build time: {}".format(humanize.naturaldelta(datetime.datetime.now() - before_time)))
print("Flat Index Size: {}".format(humanize.naturalsize(vector_engine.get_index_size(VectorIndexes.FLAT))))

Flat Index Build time: 5 seconds
Flat Index Size: 512.0 MB


In [97]:
before_time = datetime.datetime.now()
vector_engine.create_index(VectorIndexes.LSH)
print("LSH Index Build time: {}".format(humanize.naturaldelta(datetime.datetime.now() - before_time)))
print("LSH Index Size: {}".format(humanize.naturalsize(vector_engine.get_index_size(VectorIndexes.LSH))))

LSH Index Build time: 8 seconds
LSH Index Size: 128.5 MB


In [98]:
before_time = datetime.datetime.now()
vector_engine.create_index(VectorIndexes.HNSW)
print("HNSW Index Build time: {}".format(humanize.naturaldelta(datetime.datetime.now() - before_time)))
print("HNSW Index Size: {}".format(humanize.naturalsize(vector_engine.get_index_size(VectorIndexes.HNSW))))

HNSW Index Build time: 5 minutes
HNSW Index Size: 1.0 GB


In [99]:
before_time = datetime.datetime.now()
vector_engine.create_index(VectorIndexes.IVFFLAT)
print("IVFFLAT Index Build time: {}".format(humanize.naturaldelta(datetime.datetime.now() - before_time)))
print("IVFFLAT Index Size: {}".format(humanize.naturalsize(vector_engine.get_index_size(VectorIndexes.IVFFLAT))))

IVFFLAT Index Build time: 6 seconds
IVFFLAT Index Size: 520.1 MB


In [100]:
before_time = datetime.datetime.now()
vector_engine.create_index(VectorIndexes.PQ)
print("PQ Index Build time: {}".format(humanize.naturaldelta(datetime.datetime.now() - before_time)))
print("PQ Index Size: {}".format(humanize.naturalsize(vector_engine.get_index_size(VectorIndexes.PQ))))

PQ Index Build time: 11 seconds
PQ Index Size: 8.1 MB


In [101]:
before_time = datetime.datetime.now()
vector_engine.create_index(VectorIndexes.IVFPQ)
print("IVFPQ Index Build time: {}".format(humanize.naturaldelta(datetime.datetime.now() - before_time)))
print("IVFPQ Index Size: {}".format(humanize.naturalsize(vector_engine.get_index_size(VectorIndexes.IVFPQ))))

IVFPQ Index Build time: 12 seconds
IVFPQ Index Size: 16.2 MB
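
A back-of-the-envelope calculation helps to interpret these sizes. The flat index size is consistent with storing every component as a 4-byte float. The PQ parameters below (8 sub-quantizers with 8 bits each) are an assumption chosen for illustration; they happen to reproduce the reported 8.1 MB, but the settings `leanvector` actually uses are not confirmed here.

```python
n_vectors, dim = 1_000_000, 128
float_bytes = 4                       # float32 components

# Flat index: every vector stored uncompressed
flat_bytes = n_vectors * dim * float_bytes

# Product quantization with ASSUMED settings: m sub-vectors, nbits per code
pq_m, pq_nbits = 8, 8
pq_code_bytes = n_vectors * pq_m * pq_nbits // 8
# m codebooks, each with 2**nbits centroids of dim/m floats
pq_codebook_bytes = pq_m * (2 ** pq_nbits) * (dim // pq_m) * float_bytes
pq_bytes = pq_code_bytes + pq_codebook_bytes

print("Flat estimate: {:.1f} MB".format(flat_bytes / 1e6))
print("PQ estimate:   {:.1f} MB".format(pq_bytes / 1e6))
print("Compression:   {:.0f}x".format(flat_bytes / pq_bytes))
```

Under these assumptions each 512-byte vector shrinks to an 8-byte code, which is why the PQ index is roughly 60 times smaller than the flat index.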


### Run Vector Queries
**Note:** This part of the notebook can run independently of the previous parts. You don't have to run them first, because the required vector index data is loaded from COS.

Define the query vector to use for the tests:

In [16]:
sift_query = np.asarray([[  1.,   3.,  11., 110.,  62.,  22.,   4.,   0.,  43.,  21.,  22.,
        18.,   6.,  28.,  64.,   9.,  11.,   1.,   0.,   0.,   1.,  40.,
       101.,  21.,  20.,   2.,   4.,   2.,   2.,   9.,  18.,  35.,   1.,
         1.,   7.,  25., 108., 116.,  63.,   2.,   0.,   0.,  11.,  74.,
        40., 101., 116.,   3.,  33.,   1.,   1.,  11.,  14.,  18., 116.,
       116.,  68.,  12.,   5.,   4.,   2.,   2.,   9., 102.,  17.,   3.,
        10.,  18.,   8.,  15.,  67.,  63.,  15.,   0.,  14., 116.,  80.,
         0.,   2.,  22.,  96.,  37.,  28.,  88.,  43.,   1.,   4.,  18.,
       116.,  51.,   5.,  11.,  32.,  14.,   8.,  23.,  44.,  17.,  12.,
         9.,   0.,   0.,  19.,  37.,  85.,  18.,  16., 104.,  22.,   6.,
         2.,  26.,  12.,  58.,  67.,  82.,  25.,  12.,   2.,   2.,  25.,
        18.,   8.,   2.,  19.,  42.,  48.,  11.]])

**Note:** We create a fresh `VectorDataEngine` object, since this section can run standalone, separately from the previous section.

In [9]:
vector_engine = VectorDataEngine(vector_project='sift demo', s3endpoint=endpoint, s3accesskey=accesskey, s3secretkey=secretkey, bucket=bucket)

In [104]:
vector_engine.cache_all_indexes()

In [15]:
before_time = datetime.datetime.now()
D, I = vector_engine.search(sift_query, VectorIndexes.FLAT)
print("Flat Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
baseline = I[0].tolist()

Flat Index Search time: 63.24100000000001 milliseconds
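
A single wall-clock measurement like the one above can vary noticeably between runs. As a sketch, the hypothetical helper below repeats a call several times and reports the median duration, using `time.perf_counter`, which is intended for measuring short intervals.

```python
import statistics
import time

def time_call(fn, *args, repeats=5, **kwargs):
    """Call fn repeatedly; return (last result, median duration in milliseconds)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations_ms = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return result, statistics.median(durations_ms)

# Stand-in workload; any callable works the same way
sorted_values, median_ms = time_call(sorted, [3, 1, 2], repeats=3)
```

For the searches in this notebook the call would look like `(D, I), ms = time_call(vector_engine.search, sift_query, VectorIndexes.FLAT)`.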


In [46]:
before_time = datetime.datetime.now()
D, I = vector_engine.search(sift_query, VectorIndexes.LSH)
print("LSH Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("LSH Index Recall Rate: {} %".format(np.array(baseline)[np.in1d(baseline, I).tolist()].size / np.array(baseline).size * 100))

LSH Index Search time: 28.227 milliseconds
LSH Index Recall Rate: 47.0 %


In [48]:
before_time = datetime.datetime.now()
D, I = vector_engine.search(sift_query, VectorIndexes.HNSW)
print("HNSW Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("HNSW Index Recall Rate: {} %".format(np.array(baseline)[np.in1d(baseline, I).tolist()].size / np.array(baseline).size * 100))

HNSW Index Search time: 1.4480000000000002 milliseconds
HNSW Index Recall Rate: 87.0 %


In [52]:
before_time = datetime.datetime.now()
D, I = vector_engine.search(sift_query, VectorIndexes.IVFFLAT)
print("IVFFLAT Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("IVFFLAT Index Recall Rate: {} %".format(np.array(baseline)[np.in1d(baseline, I).tolist()].size / np.array(baseline).size * 100))

IVFFLAT Index Search time: 6.556 milliseconds
IVFFLAT Index Recall Rate: 93.0 %


In [53]:
before_time = datetime.datetime.now()
D, I = vector_engine.search(sift_query, VectorIndexes.PQ)
print("PQ Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("PQ Index Recall Rate: {} %".format(np.array(baseline)[np.in1d(baseline, I).tolist()].size / np.array(baseline).size * 100))

PQ Index Search time: 6.995 milliseconds
PQ Index Recall Rate: 40.0 %


In [54]:
before_time = datetime.datetime.now()
D, I = vector_engine.search(sift_query, VectorIndexes.IVFPQ)
print("IVFPQ Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("IVFPQ Index Recall Rate: {} %".format(np.array(baseline)[np.in1d(baseline, I).tolist()].size / np.array(baseline).size * 100))

IVFPQ Index Search time: 1.688 milliseconds
IVFPQ Index Recall Rate: 39.0 %


# Tests with 1M Vectors with 960 Dimensions

In [7]:
with closing(request.urlopen('ftp://ftp.irisa.fr/local/texmex/corpus/gist.tar.gz')) as r:
    with open('gist.tar.gz', 'wb') as f:
        shutil.copyfileobj(r, f)

In [8]:
# The context manager closes the archive once extraction has finished
with tarfile.open('gist.tar.gz', "r:gz") as tar:
    tar.extractall()

In [6]:
vector_engine2 = VectorDataEngine(vector_project='gist demo', s3endpoint=endpoint, s3accesskey=accesskey, s3secretkey=secretkey, bucket=bucket)#, fvecs_file='./gist/gist_base.fvecs')

In [7]:
vector_engine2.create_index(VectorIndexes.FLAT)

In [7]:
gist_query = np.asarray([[0.0117, 0.0115, 0.0087, 0.01  , 0.0785, 0.1   , 0.0784, 0.053 ,
        0.0524, 0.0819, 0.0658, 0.058 , 0.0159, 0.017 , 0.0461, 0.0242,
        0.0084, 0.0064, 0.0072, 0.0102, 0.0304, 0.0679, 0.0589, 0.0571,
        0.0333, 0.0786, 0.0892, 0.0423, 0.0138, 0.0133, 0.029 , 0.0219,
        0.009 , 0.0122, 0.0107, 0.0108, 0.0266, 0.0385, 0.0571, 0.052 ,
        0.0355, 0.0488, 0.0692, 0.0531, 0.0144, 0.0143, 0.0149, 0.025 ,
        0.0171, 0.0161, 0.0106, 0.0324, 0.0271, 0.0458, 0.0531, 0.0624,
        0.0316, 0.0608, 0.0661, 0.0813, 0.0186, 0.0227, 0.0116, 0.0337,
        0.0247, 0.0118, 0.0107, 0.0639, 0.0395, 0.0403, 0.0525, 0.0958,
        0.0551, 0.0676, 0.0858, 0.1749, 0.0244, 0.0281, 0.0087, 0.0512,
        0.0149, 0.0086, 0.0124, 0.0356, 0.0328, 0.0387, 0.0463, 0.0489,
        0.0492, 0.0641, 0.0705, 0.1164, 0.0206, 0.0162, 0.0125, 0.0402,
        0.0135, 0.0113, 0.0074, 0.0118, 0.033 , 0.0298, 0.0365, 0.042 ,
        0.0441, 0.0519, 0.0659, 0.0527, 0.0139, 0.0162, 0.0151, 0.0224,
        0.0118, 0.0088, 0.0078, 0.0109, 0.0373, 0.0705, 0.0628, 0.0391,
        0.0242, 0.0558, 0.0721, 0.0448, 0.0168, 0.0134, 0.0331, 0.0208,
        0.0041, 0.009 , 0.0165, 0.0117, 0.0962, 0.1359, 0.1215, 0.088 ,
        0.0639, 0.1174, 0.097 , 0.0703, 0.0126, 0.0219, 0.0746, 0.0422,
        0.0037, 0.0084, 0.01  , 0.0132, 0.032 , 0.0613, 0.0751, 0.0914,
        0.0429, 0.0641, 0.0747, 0.0466, 0.0087, 0.0286, 0.0515, 0.0279,
        0.0073, 0.0149, 0.0225, 0.0182, 0.0181, 0.0176, 0.0862, 0.064 ,
        0.0275, 0.029 , 0.0558, 0.0605, 0.0114, 0.0145, 0.0207, 0.0285,
        0.0208, 0.0191, 0.012 , 0.045 , 0.0234, 0.0231, 0.0533, 0.0818,
        0.0397, 0.0243, 0.0389, 0.1082, 0.0221, 0.0303, 0.016 , 0.0534,
        0.0271, 0.0189, 0.0164, 0.0881, 0.0437, 0.017 , 0.0282, 0.139 ,
        0.0601, 0.0391, 0.0371, 0.2031, 0.0362, 0.0341, 0.0119, 0.0745,
        0.0167, 0.0053, 0.0104, 0.0456, 0.0222, 0.0201, 0.0156, 0.0458,
        0.0431, 0.0402, 0.0444, 0.1346, 0.0206, 0.0192, 0.015 , 0.0468,
        0.0103, 0.0066, 0.012 , 0.0179, 0.0185, 0.0195, 0.044 , 0.0329,
        0.0298, 0.0706, 0.0955, 0.0581, 0.0097, 0.0157, 0.0319, 0.0406,
        0.0074, 0.0098, 0.0103, 0.0116, 0.0407, 0.083 , 0.0675, 0.0356,
        0.0352, 0.0849, 0.074 , 0.0575, 0.008 , 0.0105, 0.0502, 0.0379,
        0.0099, 0.0213, 0.0365, 0.0247, 0.0481, 0.0781, 0.0942, 0.084 ,
        0.0417, 0.0525, 0.0729, 0.0664, 0.0101, 0.0231, 0.0868, 0.0646,
        0.0338, 0.0216, 0.019 , 0.037 , 0.0274, 0.0446, 0.0435, 0.1009,
        0.0525, 0.0677, 0.0575, 0.0464, 0.0227, 0.0417, 0.0475, 0.024 ,
        0.0608, 0.0334, 0.0368, 0.0763, 0.0533, 0.0219, 0.0608, 0.1125,
        0.0674, 0.0218, 0.0527, 0.1312, 0.0589, 0.0331, 0.0216, 0.0653,
        0.0221, 0.0174, 0.0307, 0.0303, 0.0273, 0.0294, 0.0628, 0.0338,
        0.0401, 0.0437, 0.059 , 0.0761, 0.0286, 0.0189, 0.0418, 0.0696,
        0.008 , 0.0079, 0.0095, 0.0111, 0.0818, 0.1053, 0.087 , 0.0531,
        0.0518, 0.0886, 0.0751, 0.0689, 0.0158, 0.0182, 0.0516, 0.0286,
        0.0062, 0.0048, 0.0068, 0.0097, 0.0337, 0.0694, 0.0721, 0.0618,
        0.0365, 0.0824, 0.1105, 0.0488, 0.0139, 0.0127, 0.0314, 0.0209,
        0.0103, 0.0085, 0.01  , 0.0086, 0.0292, 0.0371, 0.0693, 0.0521,
        0.0312, 0.0465, 0.0836, 0.0486, 0.0141, 0.0119, 0.0146, 0.0228,
        0.016 , 0.0123, 0.0101, 0.0303, 0.0274, 0.0497, 0.0706, 0.0674,
        0.03  , 0.0573, 0.0547, 0.0882, 0.0157, 0.0201, 0.0119, 0.0353,
        0.0225, 0.0112, 0.0097, 0.06  , 0.0373, 0.0441, 0.0594, 0.0887,
        0.0494, 0.066 , 0.0827, 0.184 , 0.023 , 0.0241, 0.0098, 0.0501,
        0.0139, 0.0064, 0.0112, 0.0338, 0.0315, 0.0393, 0.0474, 0.0409,
        0.0426, 0.066 , 0.0732, 0.125 , 0.0224, 0.0155, 0.0121, 0.0373,
        0.0112, 0.0063, 0.0075, 0.0112, 0.032 , 0.0283, 0.048 , 0.0412,
        0.0419, 0.0583, 0.0784, 0.0516, 0.0135, 0.0142, 0.0169, 0.0195,
        0.0097, 0.0057, 0.0059, 0.008 , 0.0382, 0.0748, 0.0619, 0.0505,
        0.0271, 0.0609, 0.0845, 0.0551, 0.0158, 0.012 , 0.0369, 0.0213,
        0.0062, 0.0144, 0.0228, 0.0113, 0.104 , 0.1485, 0.1449, 0.1063,
        0.0659, 0.1302, 0.1167, 0.0963, 0.0152, 0.0256, 0.0822, 0.0454,
        0.0047, 0.0056, 0.011 , 0.0101, 0.035 , 0.0665, 0.0954, 0.1171,
        0.0459, 0.0704, 0.0955, 0.0715, 0.0097, 0.029 , 0.0574, 0.0251,
        0.0085, 0.0102, 0.02  , 0.0148, 0.0248, 0.0185, 0.0928, 0.0701,
        0.03  , 0.0338, 0.0749, 0.0749, 0.0149, 0.0174, 0.021 , 0.0332,
        0.0182, 0.0122, 0.0083, 0.0419, 0.0252, 0.024 , 0.0531, 0.0762,
        0.0489, 0.0247, 0.033 , 0.1258, 0.0175, 0.0256, 0.0131, 0.0524,
        0.0261, 0.0149, 0.0157, 0.0861, 0.0434, 0.0242, 0.0362, 0.1396,
        0.0565, 0.0427, 0.0431, 0.2136, 0.0345, 0.0251, 0.0143, 0.0757,
        0.0146, 0.006 , 0.0093, 0.0411, 0.0251, 0.0187, 0.031 , 0.047 ,
        0.0351, 0.053 , 0.0541, 0.1344, 0.0167, 0.0176, 0.0149, 0.0439,
        0.0092, 0.0055, 0.0136, 0.0184, 0.0186, 0.0233, 0.0678, 0.0405,
        0.0248, 0.0789, 0.1125, 0.0682, 0.009 , 0.0144, 0.0338, 0.0389,
        0.0066, 0.0129, 0.0116, 0.0091, 0.0416, 0.0913, 0.0806, 0.0461,
        0.0373, 0.0956, 0.0782, 0.0602, 0.0092, 0.0132, 0.0552, 0.0419,
        0.0095, 0.0188, 0.0403, 0.0282, 0.0528, 0.0827, 0.1135, 0.1069,
        0.0432, 0.0556, 0.0898, 0.0818, 0.0119, 0.029 , 0.1051, 0.0709,
        0.0328, 0.0158, 0.0191, 0.0363, 0.028 , 0.0451, 0.0487, 0.1089,
        0.0558, 0.0726, 0.0727, 0.0525, 0.0207, 0.046 , 0.0584, 0.0255,
        0.0552, 0.0263, 0.0332, 0.0733, 0.0475, 0.0252, 0.0662, 0.112 ,
        0.0622, 0.0339, 0.066 , 0.1363, 0.0512, 0.0276, 0.0249, 0.0683,
        0.0199, 0.0159, 0.0293, 0.0286, 0.0281, 0.0352, 0.0785, 0.0371,
        0.039 , 0.0509, 0.0765, 0.0821, 0.025 , 0.0152, 0.0471, 0.0753,
        0.0077, 0.0092, 0.0085, 0.0087, 0.0895, 0.1157, 0.1134, 0.0741,
        0.0529, 0.1015, 0.098 , 0.1066, 0.0133, 0.0174, 0.0599, 0.0287,
        0.0067, 0.0081, 0.0082, 0.0077, 0.0332, 0.0759, 0.0883, 0.088 ,
        0.0408, 0.1076, 0.1387, 0.0743, 0.0108, 0.0129, 0.0383, 0.0214,
        0.0063, 0.0075, 0.0116, 0.0074, 0.0303, 0.039 , 0.083 , 0.0561,
        0.0311, 0.0644, 0.1124, 0.0502, 0.0117, 0.0105, 0.0166, 0.0223,
        0.0134, 0.0114, 0.0124, 0.0274, 0.0259, 0.0529, 0.0872, 0.0576,
        0.0298, 0.0676, 0.0937, 0.0814, 0.0133, 0.0178, 0.0112, 0.0341,
        0.0208, 0.0121, 0.0124, 0.0528, 0.0295, 0.0446, 0.0989, 0.0666,
        0.0437, 0.076 , 0.1093, 0.1461, 0.0232, 0.0193, 0.0136, 0.0484,
        0.012 , 0.0075, 0.0122, 0.0309, 0.03  , 0.044 , 0.0796, 0.0408,
        0.0343, 0.0696, 0.0906, 0.1152, 0.0183, 0.0108, 0.0177, 0.0365,
        0.0102, 0.0057, 0.0079, 0.0101, 0.0322, 0.0309, 0.0716, 0.0523,
        0.0341, 0.0646, 0.1177, 0.0614, 0.0117, 0.0132, 0.019 , 0.0203,
        0.0098, 0.0066, 0.0067, 0.0076, 0.0413, 0.0847, 0.0974, 0.077 ,
        0.0297, 0.0705, 0.1156, 0.0871, 0.0121, 0.0123, 0.0413, 0.0233,
        0.0061, 0.0122, 0.0228, 0.011 , 0.1219, 0.171 , 0.1641, 0.1407,
        0.0748, 0.1586, 0.1563, 0.1508, 0.0171, 0.0343, 0.1044, 0.0531,
        0.0052, 0.0062, 0.0121, 0.0115, 0.0393, 0.0698, 0.1048, 0.1629,
        0.0452, 0.0914, 0.1193, 0.1164, 0.0072, 0.034 , 0.0732, 0.0309,
        0.0054, 0.0085, 0.0198, 0.0139, 0.0307, 0.0216, 0.1018, 0.0796,
        0.0345, 0.0454, 0.0958, 0.0869, 0.0096, 0.0189, 0.0264, 0.0361,
        0.0147, 0.0117, 0.0084, 0.0344, 0.0325, 0.0327, 0.055 , 0.0685,
        0.049 , 0.0302, 0.0409, 0.1359, 0.0159, 0.0162, 0.0128, 0.0539,
        0.0154, 0.013 , 0.0152, 0.0684, 0.031 , 0.0289, 0.0493, 0.1177,
        0.0468, 0.056 , 0.0674, 0.2025, 0.0269, 0.0158, 0.0184, 0.0677,
        0.0133, 0.0097, 0.0096, 0.0332, 0.0235, 0.0247, 0.066 , 0.047 ,
        0.0223, 0.0684, 0.0817, 0.122 , 0.0107, 0.0142, 0.0186, 0.042 ,
        0.0092, 0.0066, 0.0155, 0.0173, 0.0241, 0.0281, 0.1046, 0.071 ,
        0.0204, 0.0969, 0.1524, 0.0947, 0.0095, 0.0133, 0.0394, 0.0391,
        0.007 , 0.0129, 0.0136, 0.0099, 0.05  , 0.1057, 0.0946, 0.0848,
        0.043 , 0.1177, 0.0907, 0.0817, 0.0088, 0.0182, 0.0642, 0.0486,
        0.0117, 0.0227, 0.0547, 0.0372, 0.0646, 0.0933, 0.1581, 0.1529,
        0.0495, 0.0595, 0.1293, 0.1124, 0.0113, 0.0368, 0.1317, 0.0753,
        0.0268, 0.013 , 0.0218, 0.0346, 0.0275, 0.0453, 0.0694, 0.1228,
        0.053 , 0.0839, 0.1111, 0.0679, 0.0214, 0.0564, 0.0766, 0.0238,
        0.0396, 0.0164, 0.0232, 0.0558, 0.0389, 0.0357, 0.0815, 0.1093,
        0.0478, 0.067 , 0.1038, 0.1442, 0.0374, 0.0213, 0.0197, 0.0601,
        0.0142, 0.0173, 0.0328, 0.0207, 0.03  , 0.0512, 0.1142, 0.0505,
        0.0337, 0.0719, 0.1242, 0.1062, 0.0234, 0.0124, 0.0603, 0.0866]])

In [None]:
#vector_engine2.cache_all_indexes()

In [20]:
before_time = datetime.datetime.now()
D, I = vector_engine2.search(gist_query, VectorIndexes.FLAT)
print("Flat Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
flat_baseline = I[0].tolist()

Flat Index Search time: 487.882 milliseconds


In [8]:
vector_engine2.create_index(VectorIndexes.LSH)

In [28]:
before_time = datetime.datetime.now()
D, I = vector_engine2.search(gist_query, VectorIndexes.LSH)
print("LSH Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("LSH Index Recall Rate: {} %".format(np.array(flat_baseline)[np.in1d(flat_baseline, I).tolist()].size / np.array(flat_baseline).size * 100))
lsh_baseline = I[0].tolist()

LSH Index Search time: 233.476 milliseconds
LSH Index Recall Rate: 0.0 %


In [26]:
vector_engine2.create_index(VectorIndexes.IVFFLAT)

In [29]:
before_time = datetime.datetime.now()
D, I = vector_engine2.search(gist_query, VectorIndexes.IVFFLAT)
print("IVFFLAT Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("IVFFLAT Index Recall Rate: {} %".format(np.array(flat_baseline)[np.in1d(flat_baseline, I).tolist()].size / np.array(flat_baseline).size * 100))
ivfflat_baseline = I[0].tolist()

IVFFLAT Index Search time: 268.48499999999996 milliseconds
IVFFLAT Index Recall Rate: 0.0 %


In [30]:
vector_engine2.create_index(VectorIndexes.HNSW)

In [30]:
before_time = datetime.datetime.now()
D, I = vector_engine2.search(gist_query, VectorIndexes.HNSW)
print("HNSW Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("HNSW Index Recall Rate: {} %".format(np.array(flat_baseline)[np.in1d(flat_baseline, I).tolist()].size / np.array(flat_baseline).size * 100))
hnsw_baseline = I[0].tolist()

HNSW Index Search time: 3.866 milliseconds
HNSW Index Recall Rate: 0.0 %


In [14]:
vector_engine2.create_index(VectorIndexes.PQ)

In [31]:
before_time = datetime.datetime.now()
D, I = vector_engine2.search(gist_query, VectorIndexes.PQ)
print("PQ Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("PQ Index Recall Rate: {} %".format(np.array(flat_baseline)[np.in1d(flat_baseline, I).tolist()].size / np.array(flat_baseline).size * 100))
pq_baseline = I[0].tolist()

PQ Index Search time: 9.061 milliseconds
PQ Index Recall Rate: 0.0 %


In [16]:
vector_engine2.create_index(VectorIndexes.IVFPQ)

In [32]:
before_time = datetime.datetime.now()
D, I = vector_engine2.search(gist_query, VectorIndexes.IVFPQ)
print("IVFPQ Index Search time: {} milliseconds".format((datetime.datetime.now() - before_time).total_seconds()*1000))
print("IVFPQ Index Recall Rate: {} %".format(np.array(flat_baseline)[np.in1d(flat_baseline, I).tolist()].size / np.array(flat_baseline).size * 100))
ivfpq_baseline = I[0].tolist()

IVFPQ Index Search time: 8.899000000000001 milliseconds
IVFPQ Index Recall Rate: 0.0 %


In [38]:
np.array(ivfflat_baseline)[np.in1d(ivfflat_baseline, flat_baseline).tolist()].size / np.array(ivfflat_baseline).size * 100

0.0
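
A recall of 0 % for every approximate index is unusual and worth investigating. One way to narrow it down, sketched here with a hypothetical helper and made-up ID lists, is to compare all saved result lists with each other. If the approximate results agree among themselves but not with the flat result, the flat baseline is the likely outlier; if nothing overlaps at all, the query or the index contents deserve a closer look.

```python
import numpy as np

def overlap_matrix(result_sets):
    """Pairwise overlap in percent: row = reference list, column = compared list."""
    names = list(result_sets)
    matrix = np.zeros((len(names), len(names)))
    for i, ref_name in enumerate(names):
        ref_ids = np.asarray(result_sets[ref_name]).ravel()
        for j, other_name in enumerate(names):
            other_ids = np.asarray(result_sets[other_name]).ravel()
            matrix[i, j] = 100.0 * np.isin(ref_ids, other_ids).sum() / ref_ids.size
    return names, matrix

# Made-up IDs for illustration only
names, matrix = overlap_matrix({
    "flat": [1, 2, 3, 4],
    "hnsw": [3, 4, 5, 6],
    "pq": [7, 8, 9, 10],
})
```

In this notebook the input would be a dictionary built from `flat_baseline`, `lsh_baseline`, `ivfflat_baseline`, `hnsw_baseline`, `pq_baseline` and `ivfpq_baseline`.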