# Retrieval for RAG with Vertex AI Vector Search

> Vertex AI Vector Search (VVS) For Storage, Indexing, And Search

Once [embeddings](https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Embeddings/readme.md) are generated for the content ([chunks](https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Chunking/readme.md)), we next need a vector database to index our embeddings for downstream similarity matching tasks...

**retrieval metrics**
* `Precision` is the fraction of retrieved documents that are relevant
* `Recall` is the fraction of all relevant documents that are retrieved
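
As a minimal sketch of these two definitions, the helper below computes both metrics for a single query from a list of retrieved IDs and a list of relevant IDs. The function name `precision_recall` and the chunk IDs are hypothetical and used only for illustration.

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Return (precision, recall) for a single query."""
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs, for illustration only
retrieved = ["c1", "c2", "c3", "c4"]
relevant = ["c2", "c4", "c9"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved IDs are relevant, and two of the three relevant IDs were retrieved.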

In [4]:
! cd .. && tree

.
├── README.md
├── notebooks
│   └── [JT]_grounded_rag.ipynb
├── requirements.txt
└── src
    ├── display_utils.py
    ├── docai_langchain_utils.py
    ├── gcs_loader.py
    └── vvs_utils.py

2 directories, 7 files


## imports

In [69]:
from typing import TYPE_CHECKING, Iterator, List, Optional, Sequence
import os, json, time, glob
import numpy as np

# Vertex AI
from google.cloud import aiplatform, storage
import vertexai.language_models # for embeddings API
import vertexai.generative_models # for Gemini Models
# from google.cloud import documentai
# from google.cloud import discoveryengine

# this repo
import sys
sys.path.append("..")
from src import display_utils, docai_utils, gcs_loader, vvs_utils

## config

In [41]:
PREFIX="mortgage-ball"

In [129]:
PROJECT_ID = "hybrid-vertex"
REGION = "us-central1"
LOCATION = REGION.split('-')[0]

EXPERIMENT_1 = "mlb"
EXPERIMENT_2 = "lending"

# Cloud storage buckets
GCS_BUCKET_URI = f"gs://{PREFIX}-central-bucket"
GCS_BUCKET = GCS_BUCKET_URI.replace("gs://", "")

# # mlb - Vertex AI Vector Search
# VS_INDEX_NAME_1 = f"{PREFIX}-{EXPERIMENT_1}"
# VS_CONTENTS_DELTA_URI_1 = f"{GCS_BUCKET_URI}/index/{EXPERIMENT_1}/embeddings"

# # lending - Vertex AI Vector Search
# VS_INDEX_NAME_2 = f"{PREFIX}-{EXPERIMENT_2}"
# VS_CONTENTS_DELTA_URI_2 = f"{GCS_BUCKET_URI}/index/{EXPERIMENT_2}/embeddings"


# same config
VS_INDEX_NAME = f"{PREFIX}-v1-index" 
VS_INDEX_ENDPOINT_NAME = f"{PREFIX}-endpoint"
VS_DIMENSIONS = 768
VS_APPROX_NEIGHBORS = 150
VS_INDEX_UPDATE_METHOD = "STREAM_UPDATE"
VS_INDEX_SHARD_SIZE = "SHARD_SIZE_SMALL"
VS_LEAF_NODE_EMB_COUNT = 500
VS_LEAF_SEARCH_PERCENT = 80
VS_DISTANCE_MEASURE_TYPE = "DOT_PRODUCT_DISTANCE"
VS_MACHINE_TYPE = "e2-standard-16"
VS_MIN_REPLICAS = 1
VS_MAX_REPLICAS = 1
VS_DESCRIPTION = "Index for DIY RAG with Vertex AI APIs"  # @param {type:"string"}

# Models
EMBEDDINGS_MODEL_NAME = "text-embedding-004"
LLM_MODEL_NAME = "gemini-1.5-pro"

# DocumentAI Processor
DOCAI_LOCATION = "us"  # @param ["us", "eu"]
DOCAI_PROCESSOR_NAME = f"{PREFIX}-v1"  # @param {type:"string"}

print(f"GCS_BUCKET              : {GCS_BUCKET}")
print(f"GCS_BUCKET_URI          : {GCS_BUCKET_URI}")
print(f"VS_INDEX_NAME           : {VS_INDEX_NAME}")
print(f"DOCAI_PROCESSOR_NAME    : {DOCAI_PROCESSOR_NAME}")
print(f"VS_INDEX_ENDPOINT_NAME  : {VS_INDEX_ENDPOINT_NAME}")
# VS_CONTENTS_DELTA_URI_1 / _2 are commented out above, so printing them would raise a NameError
# print(f"VS_CONTENTS_DELTA_URI_1 : {VS_CONTENTS_DELTA_URI_1}")
# print(f"VS_CONTENTS_DELTA_URI_2 : {VS_CONTENTS_DELTA_URI_2}")

GCS_BUCKET              : mortgage-ball-central-bucket
GCS_BUCKET_URI          : gs://mortgage-ball-central-bucket
VS_INDEX_NAME           : mortgage-ball-v1-index
DOCAI_PROCESSOR_NAME    : mortgage-ball-v1
VS_INDEX_ENDPOINT_NAME  : mortgage-ball-endpoint


In [130]:
! gcloud storage buckets create $GCS_BUCKET_URI --location=$REGION --project=$PROJECT_ID

Creating gs://mortgage-ball-central-bucket/...


In [139]:
## clients
# vertex ai clients
vertexai.init(project = PROJECT_ID, location = REGION)

# gcs client
gcs = storage.Client(project = PROJECT_ID)
bucket = gcs.bucket(GCS_BUCKET)

### Enable / Disable flags

In [66]:
# Enable/disable flags
# flag to create Google Cloud resources configured above
# refer to the notes before this cell
CREATE_RESOURCES = True  # @param {type:"boolean"}
# flag to run data ingestion
RUN_INGESTION = True  # @param {type:"boolean"}

### helper functions

In [88]:
# @title Utility methods for adding index to Vertex AI Vector Search
# def get_batches(items: List, n: int = 1000) -> List[List]:
#     n = max(1, n)
#     return [items[i : i + n] for i in range(0, len(items), n)]


# def add_data(vector_store, chunks) -> None:
#     if RUN_INGESTION:
#         batch_size = 1000
#         texts = get_batches([chunk.page_content for chunk in chunks], n=batch_size)
#         metadatas = get_batches([chunk.metadata for chunk in chunks], n=batch_size)

#         for i, (b_texts, b_metadatas) in enumerate(zip(texts, metadatas)):
#             print(f"Adding {len(b_texts)} data points to index")
#             is_complete_overwrite = bool(i == 0)
#             vector_store.add_texts(
#                 texts=b_texts,
#                 metadatas=b_metadatas,
#                 is_complete_overwrite=is_complete_overwrite,
#             )
#     else:
#         print("Skipping ingestion. Enable `RUN_INGESTION` flag")

# Load the chunks

In [46]:
local_dir = "files/embeddings-api"

jsonl_files = glob.glob(f"{local_dir}/large-files*.jsonl")
jsonl_files.sort()
jsonl_files

['files/embeddings-api/large-files-chunk-embeddings-0000.jsonl',
 'files/embeddings-api/large-files-chunk-embeddings-0001.jsonl',
 'files/embeddings-api/large-files-chunk-embeddings-0002.jsonl',
 'files/embeddings-api/large-files-chunk-embeddings-0003.jsonl',
 'files/embeddings-api/large-files-chunk-embeddings-0004.jsonl',
 'files/embeddings-api/large-files-chunk-embeddings-0005.jsonl',
 'files/embeddings-api/large-files-chunk-embeddings-0006.jsonl',
 'files/embeddings-api/large-files-chunk-embeddings-0007.jsonl',
 'files/embeddings-api/large-files-chunk-embeddings-0008.jsonl',
 'files/embeddings-api/large-files-chunk-embeddings-0009.jsonl']

In [47]:
chunks = []
for file in jsonl_files:
    with open(file, 'r') as f:
        chunks.extend([json.loads(line) for line in f])
len(chunks)

9042

## Review A Chunk

In [48]:
chunks[0].keys()

dict_keys(['instance', 'predictions', 'status'])

In [49]:
chunks[0]['instance']['chunk_id']

'fannie_part_0_c899'

In [50]:
print(chunks[0]['instance']['content'])

# B3-3.1-09, Other Sources of Income (05/01/2024)

## Alimony, Child Support, or Separate Maintenance

Note: The lender may include alimony, child support, or separate maintenance as income only if the borrower discloses it on the Form 1003 and requests that it be considered in qualifying for the loan. If a borrower's alimony or child support income is validated by the DU validation service, DU will issue a message indicating the required documentation. This documentation may differ from the requirements described above for the verification of the borrower's regular receipt of the full payment and its use as stable qualifying income. See B3-2-02, DU Validation Service.


In [51]:
chunks[0]['predictions'][0]['embeddings']['values'][0:10]

[0.029661040753126144,
 -0.013862958177924156,
 0.024388911202549934,
 0.04231889545917511,
 0.015836646780371666,
 0.0672280341386795,
 0.009761175140738487,
 0.04619113728404045,
 -0.019515754655003548,
 0.009478162042796612]

## Prepare Chunk Structure

Make a list of dictionaries with information for each chunk:

In [53]:
content_chunks = [
    dict(
        gse = chunk['instance']['gse'],
        chunk_id = chunk['instance']['chunk_id'],
        content = chunk['instance']['content'],
        embedding = chunk['predictions'][0]['embeddings']['values']
    ) for chunk in chunks
]

In [85]:
# content_chunks[0].keys()
# dict_keys(['gse', 'chunk_id', 'content', 'embedding'])

# content_chunks[0]['gse']
# content_chunks[0]['chunk_id']
content_chunks[0]['content']
# content_chunks[0]['embedding']

"# B3-3.1-09, Other Sources of Income (05/01/2024)\n\n## Alimony, Child Support, or Separate Maintenance\n\nNote: The lender may include alimony, child support, or separate maintenance as income only if the borrower discloses it on the Form 1003 and requests that it be considered in qualifying for the loan. If a borrower's alimony or child support income is validated by the DU validation service, DU will issue a message indicating the required documentation. This documentation may differ from the requirements described above for the verification of the borrower's regular receipt of the full payment and its use as stable qualifying income. See B3-2-02, DU Validation Service."

## Query Embedding

Create a query, or prompt, and get the embedding for it:

Connect to models for text embeddings. Learn more about the model API:

* [Vertex AI Text Embeddings API](https://github.com/statmike/vertex-ai-mlops/blob/315a89c3b3245ff2e08d2522cb24c6f5c3a0ebb1/Applied%20GenAI/Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

In [54]:
question = "Does a lender have to perform servicing functions directly?"

embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')

question_embedding = embedder.get_embeddings([question])[0].values
question_embedding[0:10]

[-0.0005117303808219731,
 0.009651427157223225,
 0.01768726110458374,
 0.014538003131747246,
 -0.01829824410378933,
 0.027877431362867355,
 -0.021124685183167458,
 0.008830446749925613,
 -0.02669006586074829,
 0.06414774805307388]
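
Before moving to a managed index, it can help to see what similarity matching does in the simplest case. The sketch below scores every stored vector against the query with a dot product (the distance measure configured for the index above) and keeps the top `k`. It is an exhaustive local search over toy data, not the Vector Search service. The function name `top_k_by_dot_product` and the toy vectors are hypothetical.

```python
import numpy as np


def top_k_by_dot_product(query, vectors, ids, k=2):
    """Score every vector against the query and return the k highest-scoring ids."""
    matrix = np.asarray(vectors, dtype=float)          # shape (n, d)
    scores = matrix @ np.asarray(query, dtype=float)   # shape (n,)
    order = np.argsort(-scores)[:k]                    # highest score first
    return [(ids[i], float(scores[i])) for i in order]


# Toy data, for illustration only
toy_ids = ["a", "b", "c"]
toy_vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
toy_query = [0.8, 0.6]

matches = top_k_by_dot_product(toy_query, toy_vectors, toy_ids, k=2)
print(matches)
```

With the notebook's data, the same function could presumably be called with `question_embedding` as the query and the `embedding` and `chunk_id` values from `content_chunks` as the vectors and ids.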

# Retrieval With Vertex AI Vector Search

Batch input for Vector Search is sourced from a GCS directory with this structure:

```
batch_root/
├── features_1.csv
├── features_2.csv
└── delete/
    └── deletes_1.txt
```

[1] Each `features*` file is a `.csv`, `.json`, or `.avro` file of input feature data. The `delete` folder holds `.txt` files of record IDs to remove from the index. Each batch job has a batch root folder like this.

[2] The `features` files hold one struct per input. Each struct requires a value for `id` and for `embedding` and/or `sparse_embedding`. A `sparse_embedding` is useful for keyword search and hybrid search. The example below focuses on `embedding` values, which are also called dense embeddings.

[3] The `features` files can also include optional `restricts`, with a `namespace` and `allow` tokens, for filtering during search, and a `crowding_tag` for crowding. Both are used in the example below.

References:
* [Input data format and structure](https://cloud.google.com/vertex-ai/docs/vector-search/setup/format-structure)
* [Filter vector matches](https://cloud.google.com/vertex-ai/docs/vector-search/filtering)
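
The requirements in notes [2] and [3] can be checked locally before uploading a batch. The sketch below is a hypothetical helper (`validate_record` is not part of any SDK) that covers only the rules stated above: an `id`, at least one of `embedding` or `sparse_embedding`, a dense embedding of the expected length, and a `namespace` on every restrict. It is not a substitute for the service's own validation.

```python
import json


def validate_record(record, dimensions):
    """Return a list of problems found in one input record (empty if none)."""
    problems = []
    if not record.get("id"):
        problems.append("missing id")
    has_dense = "embedding" in record
    has_sparse = "sparse_embedding" in record
    if not (has_dense or has_sparse):
        problems.append("needs embedding and/or sparse_embedding")
    if has_dense and len(record["embedding"]) != dimensions:
        problems.append(
            f"embedding has {len(record['embedding'])} values, expected {dimensions}"
        )
    for restrict in record.get("restricts", []):
        if "namespace" not in restrict:
            problems.append("restrict without namespace")
    return problems


# Toy records with a 3-value embedding, for illustration only
good = {
    "id": "doc_1",
    "embedding": [0.1, 0.2, 0.3],
    "restricts": [{"namespace": "gse", "allow": ["fannie"]}],
}
bad = {"embedding": [0.1, 0.2]}

print(json.dumps(good))  # one JSON object per line in the features file
good_problems = validate_record(good, dimensions=3)
bad_problems = validate_record(bad, dimensions=3)
print(good_problems)
print(bad_problems)
```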

In [55]:
inside_vs_data = [
    dict(
        id = chunk['instance']['chunk_id'],
        embedding = chunk['predictions'][0]['embeddings']['values'],
        restricts = [
            dict(
                namespace = 'gse',
                allow = [chunk['instance']['gse']],
                #deny = []
            )
        ],
        #numeric_restricts = [],
        crowding_tag = chunk['instance']['gse']
    ) for chunk in chunks
]

outside_vs_data = {}

for chunk in chunks:
    outside_vs_data[chunk['instance']['chunk_id']] = chunk['instance']['content']

**save to cloud storage bucket**

In [56]:
blob = bucket.blob(f'{PREFIX}/{EXPERIMENT_2}/batches/initial/feature.json')
jsonl_data = '\n'.join(json.dumps(row) for row in inside_vs_data)
blob.upload_from_string(jsonl_data, content_type = 'application/json')
list(bucket.list_blobs(prefix = f'{PREFIX}/{EXPERIMENT_2}'))

[<Blob: mortgage-ball-bucket, mortgage-ball/lending/batches/initial/feature.json, 1730882017969080>,
 <Blob: mortgage-ball-bucket, mortgage-ball/lending/full/fannie.pdf, 1730874724435048>,
 <Blob: mortgage-ball-bucket, mortgage-ball/lending/full/freddie.pdf, 1730874722832492>,
 <Blob: mortgage-ball-bucket, mortgage-ball/lending/parsing/parts/18058820851710512966/0/fannie_part_0-0.json, 1730878586965564>,
 <Blob: mortgage-ball-bucket, mortgage-ball/lending/parsing/parts/18058820851710512966/1/fannie_part_1-0.json, 1730878587340463>,
 <Blob: mortgage-ball-bucket, mortgage-ball/lending/parsing/parts/18058820851710512966/2/fannie_part_2-0.json, 1730878587773105>,
 <Blob: mortgage-ball-bucket, mortgage-ball/lending/parsing/parts/18058820851710512966/3/freddie_part_0-0.json, 1730878588145103>,
 <Blob: mortgage-ball-bucket, mortgage-ball/lending/parsing/parts/18058820851710512966/4/freddie_part_1-0.json, 1730878588520813>,
 <Blob: mortgage-ball-bucket, mortgage-ball/lending/parsing/parts/1805

## Get VVS index

In [67]:
if CREATE_RESOURCES:
    print("Creating new resources.")
else:
    print("Resource creation is skipped.")

Creating new resources.


In [71]:
# sys.path.append("..")
# from src import vvs_utils_v2 as vvs_utils

In [72]:
# Create vector search index if not exists else return index resource name
vs_index = vvs_utils.create_index(
    project_id=PROJECT_ID,
    region=REGION, # str,
    sync_job=False,
    create_resources=CREATE_RESOURCES, # bool, 
    vs_index_name=VS_INDEX_NAME, # str,
    vs_dimensions=VS_DIMENSIONS, # int,
    vs_approx_neghbors=VS_APPROX_NEIGHBORS, # int,
    distance_measure_type=VS_DISTANCE_MEASURE_TYPE, # str,
    vs_leaf_node_emb_count=VS_LEAF_NODE_EMB_COUNT, # int,
    vs_leaf_search_percent=VS_LEAF_SEARCH_PERCENT, # int,
    vs_description=VS_DESCRIPTION, # str,
    vs_index_shard_size=VS_INDEX_SHARD_SIZE, # str,
    vs_index_update_method=VS_INDEX_UPDATE_METHOD, # str,
)

Vector Search index mortgage-ball-v1-index exists with resource name projects/934903580331/locations/us-central1/indexes/1202439110275366912


In [92]:
vs_index.to_dict()['indexStats']

{'shardsCount': 1}

**get brute-force index...**

In [63]:
VS_INDEX_NAME_BF = f"{VS_INDEX_NAME}-brute-force"
print(VS_INDEX_NAME_BF)

check = aiplatform.MatchingEngineIndex.list(filter=f'display_name="{VS_INDEX_NAME_BF}"')

if len(check) > 0:
    print('Retrieved existing index with same name.')
    vs_index_brute_force = check[0]
else:
    print('Creating index ...')
    vs_index_brute_force = aiplatform.MatchingEngineIndex.create_brute_force_index(
        display_name = VS_INDEX_NAME_BF,
        dimensions = len(question_embedding),
        distance_measure_type = 'DOT_PRODUCT_DISTANCE',
        index_update_method = 'BATCH_UPDATE',  # valid values: 'BATCH_UPDATE' or 'STREAM_UPDATE'
        shard_size = 'SHARD_SIZE_SMALL',
        sync=False,
    )  

mortgage-ball-v1-index-brute-force
Creating index ...
Creating MatchingEngineIndex
Create MatchingEngineIndex backing LRO: projects/934903580331/locations/us-central1/indexes/3805572471453646848/operations/449282063384707072
MatchingEngineIndex created. Resource name: projects/934903580331/locations/us-central1/indexes/3805572471453646848
To use this MatchingEngineIndex in another session:
index = aiplatform.MatchingEngineIndex('projects/934903580331/locations/us-central1/indexes/3805572471453646848')


In [191]:
vs_index_brute_force.to_dict()['indexStats']

{'shardsCount': 1}

## Get VVS index endpoint

In [102]:
VS_INDEX_ENDPOINT_NAME = "mortgage-ball-v1-index-endpoint"

# Create vector search index endpoint if not exists else return index endpoint resource name
vs_endpoint = vvs_utils.create_index_endpoint(
    project_id=PROJECT_ID,
    region=REGION,
    sync_job=True,
    create_resources=CREATE_RESOURCES,
    vs_index_endpoint_name=VS_INDEX_ENDPOINT_NAME,
    vs_description=VS_DESCRIPTION,
)

# endpoint_ID="3307168248629297152"

Vector Search index endpoint mortgage-ball-v1-index-endpoint exists with resource name projects/934903580331/locations/us-central1/indexEndpoints/3307168248629297152


## Deploy VVS index to endpoint

In [192]:
# # deploy ANN index
# vvs_utils.deploy_index(
#     index=vs_index, 
#     endpoint=vs_endpoint,
#     create_resources=CREATE_RESOURCES,
#     vs_index_name=VS_INDEX_NAME,
#     vs_machine_type=VS_MACHINE_TYPE,
#     vs_min_replicas=VS_MIN_REPLICAS,
#     vs_max_replicas=VS_MAX_REPLICAS,
# )

# deploy brute-force index
vvs_utils.deploy_index(
    index=vs_index_brute_force, 
    endpoint=vs_endpoint,
    create_resources=CREATE_RESOURCES,
    vs_index_name=VS_INDEX_NAME_BF,
    vs_machine_type=VS_MACHINE_TYPE,
    vs_min_replicas=VS_MIN_REPLICAS,
    vs_max_replicas=VS_MAX_REPLICAS,
)

Deploying Vector Search index mortgage-ball-v1-index-brute-force at endpoint mortgage-ball-v1-index-endpoint ...
Deploying index MatchingEngineIndexEndpoint index_endpoint: projects/934903580331/locations/us-central1/indexEndpoints/3307168248629297152
Deploy index MatchingEngineIndexEndpoint index_endpoint backing LRO: projects/934903580331/locations/us-central1/indexEndpoints/3307168248629297152/operations/1643435254033154048
MatchingEngineIndexEndpoint index_endpoint Deployed index. Resource name: projects/934903580331/locations/us-central1/indexEndpoints/3307168248629297152
Vector Search index mortgage-ball-v1-index-brute-force is deployed at endpoint mortgage-ball-v1-index-endpoint


<google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint.MatchingEngineIndexEndpoint object at 0x7fac38875270> 
resource name: projects/934903580331/locations/us-central1/indexEndpoints/3307168248629297152

## Create DocAI processor

In [103]:
# # Create Document Layout Processor
# docai_processor = vvs_utils.create_docai_processor(
#     project_id=PROJECT_ID,
#     region=REGION, # str,
#     create_resources=CREATE_RESOURCES,
#     docai_location=DOCAI_LOCATION,
#     processor_display_name=DOCAI_PROCESSOR_NAME,
#     processor_type="LAYOUT_PARSER_PROCESSOR",
# )


# PROCESSOR_NAME = docai_processor.name  # DocAI Layout Parser Processor Name

# print(f"PROCESSOR_NAME: {PROCESSOR_NAME}")

## Load data to index

* `outside_vs_data = {'id': 'content', ...}`
* `SERIES = "rag-demo"`

In [134]:
SERIES = "rag-demo"  # GCS folder name used in the load paths below
# !gsutil -m cp -r gs://mortgage-ball-bucket/mortgage-ball gs://mortgage-ball-central-bucket/rag-demo

! gsutil ls gs://mortgage-ball-central-bucket/rag-demo

gs://mortgage-ball-central-bucket/rag-demo/embeddings-api/
gs://mortgage-ball-central-bucket/rag-demo/lending/
gs://mortgage-ball-central-bucket/rag-demo/mlb/


*load data to ANN index...*

In [142]:
if 'vectorsCount' not in vs_index.to_dict()['indexStats']:
    print('Loading Embeddings...')
    vs_index.update_embeddings(
        contents_delta_uri = f'gs://{bucket.name}/{SERIES}/{EXPERIMENT_2}/batches/initial',
        is_complete_overwrite = True
    )
    vs_index = aiplatform.MatchingEngineIndex(index_name = vs_index.name)
else:
    print('Embeddings already loaded.')

Loading Embeddings...
Updating MatchingEngineIndex index: projects/934903580331/locations/us-central1/indexes/1202439110275366912
Update MatchingEngineIndex index backing LRO: projects/934903580331/locations/us-central1/indexes/1202439110275366912/operations/803659059063422976
MatchingEngineIndex index Updated. Resource name: projects/934903580331/locations/us-central1/indexes/1202439110275366912


In [144]:
vs_index.to_dict()['indexStats']

{'vectorsCount': '9042', 'shardsCount': 1}

*load data to brute-force index...*

In [196]:
if 'vectorsCount' not in vs_index_brute_force.to_dict()['indexStats']:
    print('Loading Embeddings...')
    vs_index_brute_force.update_embeddings(
        contents_delta_uri = f'gs://{bucket.name}/{SERIES}/{EXPERIMENT_2}/batches/initial',
        is_complete_overwrite = True
    )
    vs_index_brute_force = aiplatform.MatchingEngineIndex(index_name = vs_index_brute_force.name)
else:
    print('Embeddings already loaded.')

In [194]:
vs_index_brute_force.to_dict()['indexStats']

{'vectorsCount': '9042', 'shardsCount': 1}

In [195]:
vs_endpoint.deployed_indexes

[id: "mortgage_ball_v1_index_brute_force_e39a79f7dfc0"
index: "projects/934903580331/locations/us-central1/indexes/3805572471453646848"
display_name: "mortgage-ball-v1-index-brute-force"
create_time {
  seconds: 1731162026
  nanos: 325788000
}
index_sync_time {
  seconds: 1731162209
  nanos: 493701000
}
dedicated_resources {
  machine_spec {
    machine_type: "e2-standard-16"
  }
  min_replica_count: 1
  max_replica_count: 1
}
deployment_group: "default"
, id: "mortgage_ball_v1_index_c56d551518b5"
index: "projects/934903580331/locations/us-central1/indexes/1202439110275366912"
display_name: "mortgage-ball-v1-index"
create_time {
  seconds: 1730869612
  nanos: 996253000
}
index_sync_time {
  seconds: 1731162087
  nanos: 458954000
}
dedicated_resources {
  machine_spec {
    machine_type: "e2-standard-16"
  }
  min_replica_count: 1
  max_replica_count: 1
}
deployment_group: "default"
]

In [None]:
number_of_vectors = sum(
    aiplatform.MatchingEngineIndex(
        deployed_index.index
    )._gca_resource.index_stats.vectors_count
    for deployed_index in vs_endpoint.deployed_indexes
)

print(f"Actual: {number_of_vectors}")

#### tmp - debugging - START

In [112]:
# type(outside_vs_data) # dict
outside_vs_data["fannie_part_0_c899"]

"# B3-3.1-09, Other Sources of Income (05/01/2024)\n\n## Alimony, Child Support, or Separate Maintenance\n\nNote: The lender may include alimony, child support, or separate maintenance as income only if the borrower discloses it on the Form 1003 and requests that it be considered in qualifying for the loan. If a borrower's alimony or child support income is validated by the DU validation service, DU will issue a message indicating the required documentation. This documentation may differ from the requirements described above for the verification of the borrower's regular receipt of the full payment and its use as stable qualifying income. See B3-2-02, DU Validation Service."

In [114]:
# type(chunks) # list
# dict_keys(['instance', 'predictions', 'status'])

chunks[0]['instance']['content']

"# B3-3.1-09, Other Sources of Income (05/01/2024)\n\n## Alimony, Child Support, or Separate Maintenance\n\nNote: The lender may include alimony, child support, or separate maintenance as income only if the borrower discloses it on the Form 1003 and requests that it be considered in qualifying for the loan. If a borrower's alimony or child support income is validated by the DU validation service, DU will issue a message indicating the required documentation. This documentation may differ from the requirements described above for the verification of the borrower's regular receipt of the full payment and its use as stable qualifying income. See B3-2-02, DU Validation Service."

In [116]:
chunks[0]['instance']

{'file_chunk_id': 'c899',
 'chunk_id': 'fannie_part_0_c899',
 'filename': 'fannie_part_0',
 'content': "# B3-3.1-09, Other Sources of Income (05/01/2024)\n\n## Alimony, Child Support, or Separate Maintenance\n\nNote: The lender may include alimony, child support, or separate maintenance as income only if the borrower discloses it on the Form 1003 and requests that it be considered in qualifying for the loan. If a borrower's alimony or child support income is validated by the DU validation service, DU will issue a message indicating the required documentation. This documentation may differ from the requirements described above for the verification of the borrower's regular receipt of the full payment and its use as stable qualifying income. See B3-2-02, DU Validation Service.",
 'gse': 'fannie'}

#### tmp - debugging - END

In [197]:
from langchain_google_vertexai.embeddings import VertexAIEmbeddings
from langchain_google_vertexai.vectorstores.vectorstores import VectorSearchVectorStore

In [198]:
embedding_model = VertexAIEmbeddings(model_name=EMBEDDINGS_MODEL_NAME)

**initialize vectorstore retriever...**

In [199]:
vs_index_brute_force.resource_name

'projects/934903580331/locations/us-central1/indexes/3805572471453646848'

In [200]:
from langchain_google_vertexai.vectorstores.vectorstores import VectorSearchVectorStore

# ANN index vectorstore
vector_store = VectorSearchVectorStore.from_components(
    project_id=PROJECT_ID,
    region=REGION,
    gcs_bucket_name=GCS_BUCKET,  # bucket name defined in the config cell
    index_id=vs_index.resource_name,
    endpoint_id=vs_endpoint.resource_name,
    embedding=embedding_model,
    stream_update=True,
)

In [201]:
# brute-force index vectorstore
bf_vector_store = VectorSearchVectorStore.from_components(
    project_id=PROJECT_ID,
    region=REGION,
    gcs_bucket_name=GCS_BUCKET,  # bucket name defined in the config cell
    index_id=vs_index_brute_force.resource_name,
    endpoint_id=vs_endpoint.resource_name,
    embedding=embedding_model,
    stream_update=True,
)

**Store chunks as embeddings in the Vector Search index and raw texts in the Cloud Storage bucket**

`docs[0]`:

```
Document(metadata={'chunk_id': 'c1', 'source': 'gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2021Q1_alphabet_earnings_release.pdf'}, page_content='# Alphabet Announces First Quarter 2021 Results\n\nMOUNTAIN VIEW, Calif. – April 27, 2021 – Alphabet Inc. (NASDAQ: GOOG, GOOGL) today announced financial results for the quarter ended March 31, 2021. Sundar Pichai, CEO of Google and Alphabet, said: "Over the last year, people have turned to Google Search and many online services to stay informed, connected and entertained. We\'ve continued our focus on delivering...
```

In [203]:
# add_data(vector_store, docs)

In [None]:
# add_data(bf_vector_store, docs)

## Query Endpoint

> Use the endpoint to query for data and matches

### Retrieve: Embedding For Entity

> Retrieve the index entry for a specific id:

In [163]:
ANN_DEPLOYED_INDEX_ID = "mortgage_ball_v1_index_c56d551518b5"

result = vs_endpoint.read_index_datapoints(
    deployed_index_id = ANN_DEPLOYED_INDEX_ID, #.replace('-', '_'),
    ids = ['freddie_part_0_c40'] # fannie_part_0_c899 | fannie_part_0_c40
)
result[0].datapoint_id

'freddie_part_0_c40'

In [164]:
result[0].feature_vector[0:5]

[0.006058377679437399,
 0.03943474963307381,
 0.05166151374578476,
 0.042779989540576935,
 -0.01732536405324936]

In [165]:
result[0].restricts

[namespace: "gse"
allow_list: "freddie"
]

### Retrieve: Embeddings For Entities

> Retrieve the index entry for a list of specific ids:

In [166]:
result = vs_endpoint.read_index_datapoints(
    deployed_index_id = ANN_DEPLOYED_INDEX_ID, #vs_index.display_name.replace('-', '_'),
    ids = ['fannie_part_0_c40', 'freddie_part_0_c40']
)
result[0].datapoint_id
# result[0].feature_vector[0:5]

'fannie_part_0_c40'

In [167]:
result[0].restricts

[namespace: "gse"
allow_list: "fannie"
]

## Get Matches

> Return the embedding in the response by adding `return_full_datapoint = True`

### Match an Entity

In [168]:
results = vs_endpoint.find_neighbors(
    deployed_index_id = ANN_DEPLOYED_INDEX_ID, # vs_index.display_name.replace('-', '_'),
    num_neighbors = 2,
    embedding_ids = ['fannie_part_0_c40']
)
results

[[MatchNeighbor(id='fannie_part_0_c40', distance=0.9998153448104858, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='fannie_part_0_c39', distance=0.9385684728622437, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[])]]

### Match Query Embedding

> Return the embedding in the response by adding `return_full_datapoint = True`

In [169]:
results = vs_endpoint.find_neighbors(
    deployed_index_id = ANN_DEPLOYED_INDEX_ID, #vs_index.display_name.replace('-', '_'),
    queries = [question_embedding],
    num_neighbors = 2
)

results

[[MatchNeighbor(id='fannie_part_0_c352', distance=0.7099842429161072, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='freddie_part_4_c507', distance=0.680526077747345, sparse_distance=None, feature_vector=[], crowding_tag='-353579237037459606', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[])]]

In [170]:
[(match.id, match.distance) for match in results[0]]

[('fannie_part_0_c352', 0.7099842429161072),
 ('freddie_part_4_c507', 0.680526077747345)]

### Matches: compare ANN vs BF index

Calculate recall as the share of the brute-force (exact) neighbors that the ANN index also retrieves.

> Return the embedding in the response by adding `return_full_datapoint = True`

In [216]:
import time

NUM_NIEGHBORS=100

In [250]:
start = time.time()
ANN_response = vs_endpoint.find_neighbors(
    deployed_index_id = ANN_DEPLOYED_INDEX_ID, # vs_index_brute_force.display_name.replace('-', '_'),
    queries = [question_embedding],
    # return_full_datapoint = True,
    num_neighbors = NUM_NIEGHBORS,
    fraction_leaf_nodes_to_search_override = 0.5,
    # per_crowding_attribute_neighbor_count = 2
    
)
elapsed_ann_time = time.time() - start
elapsed_ann_time = round(elapsed_ann_time, 4)
print(f'ANN latency: {elapsed_ann_time} seconds')

ANN latency: 0.0787 seconds


In [251]:
len(ANN_response[0])

100

In [252]:
BF_DEPLOYED_INDEX_ID = "mortgage_ball_v1_index_brute_force_e39a79f7dfc0"

start = time.time()
BF_response = vs_endpoint.find_neighbors(
    deployed_index_id = BF_DEPLOYED_INDEX_ID, # vs_index_brute_force.display_name.replace('-', '_'),
    queries = [question_embedding],
    # return_full_datapoint = True,
    num_neighbors = NUM_NIEGHBORS,
    fraction_leaf_nodes_to_search_override = 0.9,
    # per_crowding_attribute_neighbor_count = 2
    
)
elapsed_bf_time = time.time() - start
elapsed_bf_time = round(elapsed_bf_time, 4)
print(f'BF latency: {elapsed_bf_time} seconds')

BF latency: 0.0742 seconds


In [253]:
len(BF_response[0])
# BF_response[0][0]

100

In [254]:
recalled_neighbors = 0
for tree_ah_neighbors, brute_force_neighbors in zip(
    ANN_response, BF_response
):
    tree_ah_neighbor_ids = [neighbor.id for neighbor in tree_ah_neighbors]
    brute_force_neighbor_ids = [neighbor.id for neighbor in brute_force_neighbors]

    recalled_neighbors += len(
        set(tree_ah_neighbor_ids).intersection(brute_force_neighbor_ids)
    )

recall = recalled_neighbors / len(
    [neighbor for neighbors in BF_response for neighbor in neighbors]
)
print("Recall: {}".format(recall * 100.0))

Recall: 100.0


### Matches: For Query Embedding - Expand Search Candidates

> Return the embedding in the response by adding `return_full_datapoint = True`.

In [255]:
results = vs_endpoint.find_neighbors(
    deployed_index_id = ANN_DEPLOYED_INDEX_ID, #vs_index.display_name.replace('-', '_'),
    queries = [question_embedding],
    num_neighbors = 4,
    fraction_leaf_nodes_to_search_override = 0.5
)

results

[[MatchNeighbor(id='fannie_part_0_c352', distance=0.7099842429161072, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='freddie_part_4_c507', distance=0.680526077747345, sparse_distance=None, feature_vector=[], crowding_tag='-353579237037459606', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='freddie_part_4_c508', distance=0.6753297448158264, sparse_distance=None, feature_vector=[], crowding_tag='-353579237037459606', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='fannie_part_0_c353', distance=0.6723706722259521, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[])]]

In [256]:
[(match.id, match.distance, match.crowding_tag) for match in results[0]]

[('fannie_part_0_c352', 0.7099842429161072, '1074578924770667433'),
 ('freddie_part_4_c507', 0.680526077747345, '-353579237037459606'),
 ('freddie_part_4_c508', 0.6753297448158264, '-353579237037459606'),
 ('fannie_part_0_c353', 0.6723706722259521, '1074578924770667433')]
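
To see how the override affects results and latency, one option is to sweep several fractions and compare. The sketch below is one way to do that, not the notebook's method: `sweep_leaf_fraction` and `fake_search` are hypothetical helpers, and the fake search only imitates a wider search surfacing one more candidate so the example runs offline.

```python
import time
from collections import namedtuple

# Hypothetical stand-in for MatchNeighbor; only `id` is used.
Neighbor = namedtuple("Neighbor", ["id"])


def sweep_leaf_fraction(search_fn, fractions):
    """Call `search_fn(fraction)` for each fraction and record (fraction, seconds, ids)."""
    rows = []
    for fraction in fractions:
        start = time.perf_counter()
        neighbors = search_fn(fraction)
        elapsed = time.perf_counter() - start
        rows.append((fraction, elapsed, [n.id for n in neighbors]))
    return rows


def fake_search(fraction):
    # Assumption for illustration only: a wider search finds one extra candidate.
    ids = ["c1", "c2"] if fraction < 0.5 else ["c1", "c2", "c3"]
    return [Neighbor(i) for i in ids]


rows = sweep_leaf_fraction(fake_search, [0.1, 0.5])
for fraction, seconds, ids in rows:
    print(f"fraction={fraction} latency={seconds:.6f}s ids={ids}")
```

Against the deployed index, `search_fn` could be a lambda that calls `vs_endpoint.find_neighbors(..., fraction_leaf_nodes_to_search_override=f)[0]` with the same arguments as the cell above.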

### Matches: For Query Embedding - Diversity of Responses

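> `per_crowding_attribute_neighbor_count` caps how many of the returned neighbors may share the same crowding tag. To return the embedding in the response, add `return_full_datapoint = True`.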

In [257]:
results = vs_endpoint.find_neighbors(
    deployed_index_id = ANN_DEPLOYED_INDEX_ID, #vs_index.display_name.replace('-', '_'),
    queries = [question_embedding],
    num_neighbors = 4,
    per_crowding_attribute_neighbor_count = 2
)

results

[[MatchNeighbor(id='fannie_part_0_c352', distance=0.7099842429161072, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='freddie_part_4_c507', distance=0.680526077747345, sparse_distance=None, feature_vector=[], crowding_tag='-353579237037459606', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='freddie_part_4_c508', distance=0.6753297448158264, sparse_distance=None, feature_vector=[], crowding_tag='-353579237037459606', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='fannie_part_0_c353', distance=0.6723706722259521, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[])]]

In [258]:
[(match.id, match.distance, match.crowding_tag) for match in results[0]]

[('fannie_part_0_c352', 0.7099842429161072, '1074578924770667433'),
 ('freddie_part_4_c507', 0.680526077747345, '-353579237037459606'),
 ('freddie_part_4_c508', 0.6753297448158264, '-353579237037459606'),
 ('fannie_part_0_c353', 0.6723706722259521, '1074578924770667433')]
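
The effect of a crowding limit can be illustrated client-side. The sketch below is a conceptual model only, not how the service implements crowding: `apply_crowding` is a hypothetical helper, and the candidate list reuses IDs, rounded distances, and crowding tags from the outputs in this notebook.

```python
from collections import Counter

FANNIE_TAG = "1074578924770667433"
FREDDIE_TAG = "-353579237037459606"

# (id, distance, crowding_tag), ordered best match first; distances rounded.
ranked = [
    ("fannie_part_0_c352", 0.7100, FANNIE_TAG),
    ("freddie_part_4_c507", 0.6805, FREDDIE_TAG),
    ("freddie_part_4_c508", 0.6753, FREDDIE_TAG),
    ("fannie_part_0_c353", 0.6724, FANNIE_TAG),
    ("fannie_part_0_c326", 0.6683, FANNIE_TAG),
    ("freddie_part_4_c470", 0.6620, FREDDIE_TAG),
]


def apply_crowding(ranked, num_neighbors, per_tag_limit):
    """Keep the best candidates while allowing at most `per_tag_limit` per crowding tag."""
    kept = []
    seen = Counter()
    for item in ranked:
        if len(kept) >= num_neighbors:
            break
        tag = item[2]
        if seen[tag] >= per_tag_limit:
            continue  # this tag already has its quota of neighbors
        kept.append(item)
        seen[tag] += 1
    return kept


two_per_tag = [item[0] for item in apply_crowding(ranked, 4, 2)]
one_per_tag = [item[0] for item in apply_crowding(ranked, 4, 1)]
print(two_per_tag)
print(one_per_tag)
```

With a limit of 2, the top four candidates already contain two of each tag, which is consistent with the query above returning the same neighbors as the unconstrained search. In this model, a limit of 1 would leave only the best match per tag.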

### Matches: For Query Embedding - Limit Search Area With Allowlist

In [259]:
results = vs_endpoint.find_neighbors(
    deployed_index_id = ANN_DEPLOYED_INDEX_ID, #vs_index.display_name.replace('-', '_'),
    queries = [question_embedding],
    num_neighbors = 4,
    filter = [
        aiplatform.matching_engine.matching_engine_index_endpoint.Namespace('gse', ['fannie'], []) # Namespace, allow list, deny list
    ]
)

results

[[MatchNeighbor(id='fannie_part_0_c352', distance=0.7099842429161072, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='fannie_part_0_c353', distance=0.6723706722259521, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='fannie_part_0_c326', distance=0.6683496832847595, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='fannie_part_0_c93', distance=0.661433756351471, sparse_distance=None, feature_vector=[], crowding_tag='1074578924770667433', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[])]]

In [260]:
[(match.id, match.distance, match.crowding_tag) for match in results[0]]

[('fannie_part_0_c352', 0.7099842429161072, '1074578924770667433'),
 ('fannie_part_0_c353', 0.6723706722259521, '1074578924770667433'),
 ('fannie_part_0_c326', 0.6683496832847595, '1074578924770667433'),
 ('fannie_part_0_c93', 0.661433756351471, '1074578924770667433')]

### Matches: For Query Embedding - Limit Search Area With Denylist

In [261]:
results = vs_endpoint.find_neighbors(
    deployed_index_id = ANN_DEPLOYED_INDEX_ID, #vs_index.display_name.replace('-', '_'),
    queries = [question_embedding],
    num_neighbors = 8,
    filter = [
        aiplatform.matching_engine.matching_engine_index_endpoint.Namespace('gse', [], ['fannie']) # Namespace, allow list, deny list
    ]
)

results

[[MatchNeighbor(id='freddie_part_4_c507', distance=0.680526077747345, sparse_distance=None, feature_vector=[], crowding_tag='-353579237037459606', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='freddie_part_4_c508', distance=0.6753297448158264, sparse_distance=None, feature_vector=[], crowding_tag='-353579237037459606', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='freddie_part_4_c470', distance=0.6619843244552612, sparse_distance=None, feature_vector=[], crowding_tag='-353579237037459606', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='freddie_part_6_c439', distance=0.6604534983634949, sparse_distance=None, feature_vector=[], crowding_tag='-353579237037459606', restricts=[], numeric_restricts=[], sparse_embedding_values=[], sparse_embedding_dimensions=[]),
  MatchNeighbor(id='f

In [262]:
[(match.id, match.distance, match.crowding_tag) for match in results[0]]

[('freddie_part_4_c507', 0.680526077747345, '-353579237037459606'),
 ('freddie_part_4_c508', 0.6753297448158264, '-353579237037459606'),
 ('freddie_part_4_c470', 0.6619843244552612, '-353579237037459606'),
 ('freddie_part_6_c439', 0.6604534983634949, '-353579237037459606'),
 ('freddie_part_4_c556', 0.6575403809547424, '-353579237037459606'),
 ('freddie_part_6_c410', 0.6573532819747925, '-353579237037459606'),
 ('freddie_part_5_c360', 0.6557324528694153, '-353579237037459606'),
 ('freddie_part_4_c506', 0.6519314050674438, '-353579237037459606')]
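
The allowlist and denylist behavior in the two queries above can be modeled in plain Python. This is a sketch of the matching semantics under stated assumptions, not the service's implementation: `NamespaceFilter` and `passes` are hypothetical names, and the `'freddie'` token is assumed (only `'fannie'` appears in the filters above).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NamespaceFilter:
    """Hypothetical stand-in mirroring Namespace(name, allow list, deny list)."""
    name: str
    allow_tokens: List[str] = field(default_factory=list)
    deny_tokens: List[str] = field(default_factory=list)


def passes(restricts: Dict[str, List[str]], filters: List[NamespaceFilter]) -> bool:
    """Return True if a datapoint's restricts satisfy every namespace filter."""
    for f in filters:
        tokens = set(restricts.get(f.name, []))
        # A non-empty allow list requires at least one matching token.
        if f.allow_tokens and not tokens & set(f.allow_tokens):
            return False
        # Any token on the deny list excludes the datapoint.
        if tokens & set(f.deny_tokens):
            return False
    return True


# Assumed restricts per datapoint: namespace 'gse' with one token each.
datapoints = {
    "fannie_part_0_c352": {"gse": ["fannie"]},
    "freddie_part_4_c507": {"gse": ["freddie"]},
}

allow_fannie = [NamespaceFilter("gse", ["fannie"], [])]
deny_fannie = [NamespaceFilter("gse", [], ["fannie"])]

allowed = [dp for dp, r in datapoints.items() if passes(r, allow_fannie)]
not_denied = [dp for dp, r in datapoints.items() if passes(r, deny_fannie)]
print(allowed)     # ['fannie_part_0_c352']
print(not_denied)  # ['freddie_part_4_c507']
```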

# Simple RAG Using Vertex AI Vector Search For Retrieval

Vertex AI Vector Search returns the IDs of matching entities. The text content for these matches needs to be retrieved as a next step. During the data preparation above, the text content was stored locally in a dictionary for easy lookup:

* `outside_vs_data = {'id': 'content', ...}`

In [263]:
# prompt building function...
def get_prompt(question, top_n = 5):
    # get embedding for question
    question_embedding = embedder.get_embeddings([question])[0].values
    
    # get top_n matches:
    results = vs_endpoint.find_neighbors(
        deployed_index_id = ANN_DEPLOYED_INDEX_ID, #vs_index.display_name.replace('-', '_'),
        queries = [question_embedding],
        num_neighbors = top_n
    ) 
    matches = [(match.id, match.distance) for match in results[0]]

    # construct prompt:
    prompt = ''
    for m, match in enumerate(matches):
        prompt += f"Context {m+1}:\n{outside_vs_data[match[0]]}\n\n"
    prompt += f'Answer the following question using the provided contexts:\n{question}'
    
    return matches, prompt
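
The prompt-assembly step inside `get_prompt` can also be factored into a pure function, which makes it testable without an embedding model or endpoint. The sketch below is one possible variant, not the notebook's code: `build_prompt` is a hypothetical helper, and unlike `outside_vs_data[match[0]]` it skips IDs that are missing from the lookup dictionary instead of raising `KeyError`.

```python
def build_prompt(question, match_ids, content_by_id):
    """Assemble numbered contexts followed by the question.

    Returns (prompt, missing_ids); IDs absent from `content_by_id` are
    skipped and reported rather than raising KeyError.
    """
    parts = []
    missing = []
    for match_id in match_ids:
        content = content_by_id.get(match_id)
        if content is None:
            missing.append(match_id)
            continue
        # Number contexts by how many have actually been included so far.
        parts.append(f"Context {len(parts) + 1}:\n{content}\n\n")
    parts.append(
        f"Answer the following question using the provided contexts:\n{question}"
    )
    return "".join(parts), missing


# Toy data for illustration; in the notebook this would be `outside_vs_data`.
toy_content = {"doc_a": "alpha", "doc_b": "beta"}
prompt, missing = build_prompt("Q?", ["doc_a", "doc_z", "doc_b"], toy_content)
print(prompt)
print("missing:", missing)
```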

In [264]:
matches, prompt = get_prompt(question) 
print(prompt)

Context 1:
# A3-3-03, Other Servicing Arrangements (12/15/2015)

Introduction This topic provides an overview of other servicing arrangements, including: • Subservicing • General Requirements for Subservicing Arrangements • Pledge of Servicing Rights and Transfer of Interest in Servicing Income

## Subservicing

A lender may use other organizations to perform some or all of its servicing functions. Fannie Mae refers to these arrangements as “subservicing” arrangements, meaning that a servicer (the “subservicer”) other than the contractually responsible servicer (the “master” servicer) is performing the servicing functions. The following are not considered to be subservicing arrangements: • when a computer service bureau is used to perform accounting and reporting functions; • when the originating lender sells and assigns servicing to another lender, unless the originating lender continues to be the contractually responsible servicer.

Context 2:
# (1) Notice requirements

The notice mu

# Grounded Generation

In [265]:
llm = vertexai.generative_models.GenerativeModel("gemini-1.5-flash-002")

answer = llm.generate_content(prompt).text
print(answer)

No, a lender does not have to perform servicing functions directly.  Context 1 explicitly states that a lender "may use other organizations to perform some or all of its servicing functions," referring to this as "subservicing."  The documents further detail the requirements and regulations surrounding such arrangements, including the roles of master servicers and subservicers.

