## Beyond Relational Databases: Taming LLM & Transformer Embeddings with PGVector

### 1. Objectives

- Set up PostgreSQL with the pgvector extension in a Docker container and create a database
- Use LangChain to store embeddings created with OpenAI's `text-embedding-ada-002` model in the database
- Query the database from LangChain to find the embeddings most similar to a given query
- Use LangChain to store embeddings created with HuggingFace's `all-mpnet-base-v2` model
- Use LlamaIndex to store embeddings created with OpenAI's `text-embedding-ada-002` model
- Use sqlalchemy and psycopg2 to store embeddings created with HuggingFace's `all-mpnet-base-v2` model

### 2. Install Pre-requisites

In [3]:
# !pip install --upgrade pgvector langchain openai psycopg2-binary tiktoken python-dotenv

### 3. Vectorizing Text Chunks with Langchain

In [2]:
import os
# Do not hard-code an API key in the notebook. Set OPENAI_API_KEY in your shell
# or in a .env file; load_dotenv() below reads it from there.

## Loading Environment Variables
from dotenv import load_dotenv

load_dotenv()

False
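
The `False` above is the return value of `load_dotenv()`, which suggests that no variables were loaded from a `.env` file in this run. A minimal sketch of a guard that fails early when the key is missing is shown below. The helper name `get_openai_api_key` is hypothetical and not part of any library, and the key value is a placeholder.

```python
import os


def get_openai_api_key(env=os.environ):
    """Return the OpenAI key from the given mapping, or raise a clear error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it or add it to a .env file"
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(get_openai_api_key({"OPENAI_API_KEY": "sk-placeholder"}))
```

Passing the mapping as an argument keeps the check easy to exercise without touching the real environment.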

In [5]:
# Import necessary packages
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings

In [None]:
# Read the state_of_the_union.txt file using LangChain's TextLoader
loader = TextLoader('state_of_the_union.txt', encoding='utf-8')
documents = loader.load()

print(documents)  # prints the document objects
print(len(documents))  # 1 - we've only read one file/document into the loader

In [None]:
# Use LangChain's RecursiveCharacterTextSplitter to split this text into chunks.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
texts = text_splitter.split_documents(documents)

print(texts)
print(len(texts))
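
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch in plain Python. It cuts fixed-size character windows, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and lines, so treat it as an illustration of the two parameters rather than a reimplementation. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

Each chunk repeats the last two characters of the previous one, which is the role the overlap plays: context at a chunk boundary is not lost entirely.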

In [8]:
print(texts[0])

page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.' metadata={'source': 'state_of

In [9]:
# Convert our chunks to embeddings (vectors)
embeddings = OpenAIEmbeddings()

vector = embeddings.embed_query('Testing the embedding model')

print(len(vector))  # 1536 dimensions

1536


In [None]:
doc_vectors = embeddings.embed_documents([t.page_content for t in texts[:5]])

print(len(doc_vectors))  # 5 vectors in the output
print(doc_vectors[0])    # this will output the first chunk's 1536-dimensional vector
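
Once text is represented as vectors, "similar" becomes a numeric comparison. As a small illustration, the sketch below computes cosine similarity in plain Python. The two-dimensional vectors are made up for readability; real embeddings from this model have 1536 components.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


u = [1.0, 0.0]
v = [0.0, 1.0]
w = [2.0, 0.0]

same_direction = cosine_similarity(u, w)  # 1.0: same direction, different length
orthogonal = cosine_similarity(u, v)      # 0.0: unrelated directions
print(same_direction, orthogonal)
```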

### 4. PGVector for storing Embeddings

#### 4.1 PGVector setup

- Pull the PGVector database: `docker pull ankane/pgvector`
- Start the container with the following command: `docker run --name pgvector-demo -e POSTGRES_PASSWORD=test -p 5432:5432 -d ankane/pgvector`
- Verify that it is running with: `docker ps`
- You can now install a GUI tool such as pgAdmin to inspect the database running in the container, or use psql on the command line. When connecting, specify the host as `localhost` and the password set in the command above (`test`, in our case).
- Next, create a database and enable the pgvector extension in it. The extension is enabled per database, so connect to `vector_db` before running `CREATE EXTENSION` (`\c` is a psql meta-command):
    
    ```
    CREATE DATABASE vector_db;
    \c vector_db
    CREATE EXTENSION vector;
    ```

#### 4.2 Storing texts and embeddings to PGVector

In [13]:
# LangChain's PGVector keeps all collections in two shared tables (langchain_pg_collection and
# langchain_pg_embedding) and records the collection name as a row in the first one.
# Make sure the collection name is unique and the user has permission to create tables.

from langchain.vectorstores.pgvector import PGVector

CONNECTION_STRING = "postgresql+psycopg2://postgres:test@localhost:5432/vector_db"
COLLECTION_NAME = 'state_of_union_vectors'

db = PGVector.from_documents(
    embedding=embeddings,
    documents=texts,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
)
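
The connection string packs the driver, credentials, host, port, and database name into one URL. As an illustration only, the sketch below pulls those parts out with the standard library's `urlsplit`; SQLAlchemy has its own URL parser, so this is a stand-in for reading the string, not what LangChain runs internally.

```python
from urllib.parse import urlsplit

CONNECTION_STRING = "postgresql+psycopg2://postgres:test@localhost:5432/vector_db"

parts = urlsplit(CONNECTION_STRING)
# The scheme has the form "<dialect>+<driver>".
dialect, _, driver = parts.scheme.partition("+")

settings = {
    "dialect": dialect,
    "driver": driver,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "database": parts.path.lstrip("/"),
}
print(settings)
```

The password `test` and port `5432` are the values passed to `docker run` earlier.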

In [14]:
# Let's do this similarity check in the vector space, and we can use the following Langchain code to do so:
query = "What did the president say about Russia"
similar = db.similarity_search_with_score(query, k=2)

for doc in similar:
    print(doc, end="\n\n")

(Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', metadata={'source'
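
Each result is a `(Document, score)` pair. The sketch below mimics that shape with a brute-force ranking in plain Python, assuming the score is a cosine distance where lower means more similar (understood to be the default strategy of LangChain's PGVector; check the version you have installed). The texts and two-dimensional vectors are made up, and `top_k` is a hypothetical helper.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, rows, k):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = ((text, cosine_distance(query_vec, vec)) for text, vec in rows)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


rows = [
    ("about Russia", [0.9, 0.1]),
    ("about the economy", [0.1, 0.9]),
    ("about Ukraine", [0.8, 0.3]),
]
results = top_k([1.0, 0.0], rows, k=2)
for text, score in results:
    print(f"{score:.4f}  {text}")
```

A database does the same ranking inside the query, optionally with an index, instead of scanning every row in Python.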

In [None]:
# Let's get the 1536-dimensional embedding for the above query, with this code:
vector = embeddings.embed_query(query)
print(vector)

#### 4.3 Working with vectorstore

Above, we created a vectorstore from scratch. Often, however, we want to work with an existing vectorstore. To do that, we can initialize it directly.

In [35]:
from langchain.vectorstores.pgvector import PGVector
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
CONNECTION_STRING = "postgresql+psycopg2://postgres:test@localhost:5432/vector_db"
COLLECTION_NAME = 'test'

store = PGVector(
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    embedding_function=embeddings,
    
)

In [36]:
from langchain.docstore.document import Document

# Add documents
store.add_documents([Document(page_content="welcome back")])

['3be98acf-c686-11ee-9da3-a4423b5e9371']

### 5. PGVector and LangChain with 768 dim HuggingFace embeddings

LangChain's PGVector does not let users store embeddings in a custom table, which may not suit data teams that want to create their own tables or reuse existing ones. By default, the library stores embeddings in the following two tables, although it does accept a custom embedding size:

1. langchain_pg_collection: Store the collection details
2. langchain_pg_embedding: Store the embedding details.

Discussion reference: https://github.com/langchain-ai/langchain/discussions/17223

In [None]:
from langchain.vectorstores.pgvector import PGVector
import os
from langchain.embeddings import HuggingFaceEmbeddings

# embeddings = OpenAIEmbeddings()
embeddings = HuggingFaceEmbeddings(model_name="all-mpnet-base-v2")

CONNECTION_STRING = "postgresql+psycopg2://postgres:test@localhost:5432/vector_db"
COLLECTION_NAME = 'huggingface'

os.environ["PGVECTOR_VECTOR_SIZE"] = str(768)
vectorstore = PGVector(connection_string=CONNECTION_STRING,
    embedding_function=embeddings,
    collection_name=COLLECTION_NAME,
    #pre_delete_collection=True # for testing purposes
    )


from langchain.docstore.document import Document

# Add documents
vectorstore.add_documents([Document(page_content="How are you today?")])

vector = embeddings.embed_query('How are you today?')
vector
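
A pgvector column declared with a fixed size only accepts vectors of that size, so a 1536-dimensional OpenAI vector cannot go into a 768-dimensional column. A minimal sketch of a pre-insert check follows; `check_dimension` is a hypothetical helper, not a LangChain or pgvector function.

```python
def check_dimension(vector, expected_dim):
    """Return the vector unchanged, or raise if its length is not expected_dim."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}"
        )
    return vector


# all-mpnet-base-v2 produces 768 values per text; zeros stand in for a real embedding.
ok = check_dimension([0.0] * 768, expected_dim=768)
print(len(ok))  # 768
```

Checking in Python gives a clearer error message than waiting for the database to reject the row.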

### 6. PGVector and LlamaIndex with 1536 dim OpenAI embeddings

LlamaIndex's `PGVectorStore` lets users store embeddings in a table of their choice with a custom embedding size, but it prefixes the name with `data_`, so the table is created as `data_<table_name>`.

In [8]:
# !pip install llama_index
# !pip install asyncpg

In [24]:
from llama_index import SimpleDirectoryReader, StorageContext
from llama_index.indices.vector_store import VectorStoreIndex
from llama_index.vector_stores import PGVectorStore
import textwrap
import openai

In [87]:
# Set up OpenAI
import os
# Do not hard-code an API key in the notebook. Set OPENAI_API_KEY in your shell
# or in a .env file; load_dotenv() below reads it from there.

## Loading Environment Variables
from dotenv import load_dotenv

load_dotenv()

False

In [26]:
# Download Data Manually
# !mkdir -p 'data/paul_graham/'
# !wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

# Loading documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)

Document ID: b7e03f64-4108-459f-827d-87e879731e01


In [28]:
# Create the database. Note: DROP DATABASE removes vector_db along with everything the earlier sections stored in it.
import psycopg2

connection_string = "postgresql://postgres:test@localhost:5432"
db_name = "vector_db"
conn = psycopg2.connect(connection_string)
conn.autocommit = True

with conn.cursor() as c:
    c.execute(f"DROP DATABASE IF EXISTS {db_name}")
    c.execute(f"CREATE DATABASE {db_name}")

In [29]:
from llama_index import SimpleDirectoryReader, StorageContext
from llama_index.indices.vector_store import VectorStoreIndex
from llama_index.vector_stores import PGVectorStore

# Create the index
from sqlalchemy import make_url

# Loading documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)

url = make_url(connection_string)
vector_store = PGVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name="paul_graham_essay",
    embed_dim=1536,  # openai embedding dimension
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)

query_engine = index.as_query_engine()

Document ID: 03603262-b343-496b-a602-f764b4187652


Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Parsing nodes: 100%|██████████| 1/1 [00:00<00:00,  5.87it/s]
Generating embeddings: 100%|██████████| 21/21 [00:08<00:00,  2.56it/s]


### 7. PGVector and HuggingFace

In [2]:
# !pip install sentence-transformers==2.2.2

In [32]:
# !pip install sqlalchemy

#### 7.1 PGVector and HuggingFace with 768 dim embeddings using sqlalchemy Python API

SQLAlchemy removes the need to write most SQL by hand, provides a database-agnostic ORM, manages connections and transactions, and offers a high-level interface while still allowing low-level control when needed. Its main advantage is the ORM and the abstraction over database-specific syntax.

Let's create a `CustomPGVector` class, modeled on LangChain's, that lets users store embeddings in a custom table with a custom embedding size. The class does not create the table, so the table must already exist in the database.

In [67]:
import sqlalchemy
from pgvector.sqlalchemy import Vector
from sqlalchemy import create_engine, Column, Integer, LargeBinary
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sentence_transformers import SentenceTransformer
from sqlalchemy.dialects.postgresql import JSON, UUID
import uuid

from typing import Any, Dict, Iterable, List, Optional, Tuple, Type
import logging

class CustomPGVector():
    def __init__(
        self,
        model,
        connection_string: str,
        table_name: str,
        embed_dim: int,
        logger: Optional[logging.Logger] = None,
    ) -> None:
        self.table_name = table_name
        self.model = model
        self.embed_dim = embed_dim
        self.connection_string = connection_string
        self.logger = logger or logging.getLogger(__name__)
        self.__post_init__()
    
        # Define SQLAlchemy Base and EmbeddingStore classes
        self.Base = sqlalchemy.orm.declarative_base()

        class EmbeddingStore(self.Base):
            __tablename__ = self.table_name

            uuid = sqlalchemy.Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
            embedding = sqlalchemy.Column(Vector(self.embed_dim))
            document = sqlalchemy.Column(sqlalchemy.String, nullable=True)
            cmetadata = sqlalchemy.Column(JSON, nullable=True)

        self.EmbeddingStore = EmbeddingStore

    def __post_init__(
        self,
    ) -> None:
        """
        Initialize the store.
        """
        self._conn = self.connect()


    def connect(self) -> sqlalchemy.engine.Connection:
        engine = sqlalchemy.create_engine(self.connection_string)
        conn = engine.connect()
        return conn


    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        # ids: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> List[str]:
        """Run more texts through the embeddings and add to the vectorstore.

        Args:
            texts: Iterable of strings to add to the vectorstore.
            metadatas: Optional list of metadatas associated with the texts.
            kwargs: vectorstore specific parameters

        Returns:
            List of ids from adding the texts into the vectorstore.
        """    
        if not metadatas:
            metadatas = [{} for _ in texts]

        embeddings = self.embed_documents(list(texts))

        Session = sessionmaker(bind=self._conn)
        with Session() as session:
            uuids = []
            for text, metadata, embedding in zip(texts, metadatas, embeddings):
                embedding_store = self.EmbeddingStore(
                    embedding=embedding,
                    document=text,
                    cmetadata=metadata
                )
                session.add(embedding_store)
                session.commit()
                uuids.append(str(embedding_store.uuid))
        return uuids


    def embed_documents(self, texts: Iterable[str]):
        embeddings = self.model.encode(texts)
        return embeddings.tolist()
    

    def retrieve_embeddings(self, doc_ids: List[str]):
        try:
            placeholder = ', '.join(["'" + e + "'" for e in doc_ids])

            Session = sessionmaker(bind=self._conn)
            with Session() as session:
                statement = sqlalchemy.text(f'SELECT embedding FROM {self.table_name} WHERE uuid IN ({placeholder});')
                result = session.execute(statement)
                embeddings = result.fetchall()
                session.commit()
        except Exception as e:
            self.logger.exception(e)
            embeddings = []

        return embeddings
        

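    def retrieve_top_k_relevant_docs(self, k: int, query: Optional[str] = None):
        """Return up to k (uuid, document) rows, nearest to `query` first.

        `embedding <-> embedding` compares each row with itself (distance 0), so ranking
        needs a query vector. Without a query, k rows come back in no particular order.
        """
        try:
            Session = sessionmaker(bind=self._conn)
            with Session() as session:
                if query is None:
                    statement = sqlalchemy.text(f'SELECT uuid, document FROM {self.table_name} LIMIT :k;')
                    params = {'k': k}
                else:
                    # Serialize the query embedding as a pgvector literal such as '[0.1,0.2]'.
                    vector = '[' + ','.join(str(x) for x in self.embed_documents([query])[0]) + ']'
                    statement = sqlalchemy.text(f'SELECT uuid, document FROM {self.table_name} ORDER BY embedding <-> CAST(:q AS vector) LIMIT :k;')
                    params = {'k': k, 'q': vector}
                result = session.execute(statement, params)
                docs = result.fetchall()
                session.commit()
        except Exception as e:
            self.logger.exception(e)
            docs = []

        return docs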
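
Nearest-neighbour ordering needs a query vector, because the L2 distance from any vector to itself is 0 and so cannot rank rows. The plain-Python sketch below shows the difference using made-up two-dimensional vectors; `l2_distance` mirrors what pgvector's `<->` operator computes.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rows = {"doc_a": [0.0, 0.0], "doc_b": [3.0, 4.0], "doc_c": [1.0, 1.0]}

# Comparing each row with itself: every distance is 0, so nothing is ranked.
self_distances = {name: l2_distance(vec, vec) for name, vec in rows.items()}
print(self_distances)

# Comparing each row with a query vector gives a meaningful order.
query = [3.0, 3.0]
ranked = sorted(rows, key=lambda name: l2_distance(rows[name], query))
print(ranked)  # ['doc_b', 'doc_c', 'doc_a']
```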

Store text embeddings into a custom table in PGVector and provide the required embedding size:

In [68]:
# Example usage
model = SentenceTransformer('all-mpnet-base-v2')
CONNECTION_STRING = 'postgresql+psycopg2://postgres:test@localhost:5432/vector_db'
TABLE_NAME = "huggingface"

store = CustomPGVector(
    model=model,
    connection_string=CONNECTION_STRING,
    table_name=TABLE_NAME,
    embed_dim=768
)

texts = [
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.",
"Last year COVID-19 kept us apart. This year we are finally together again.",
"Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.",
"With a duty to one another to the American people to the Constitution.",
"And with an unwavering resolve that freedom will always triumph over tyranny.",
"Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated.",
"He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined.",
"He met the Ukrainian people.",
"From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.",
"Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.",
"In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.",
"Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.",
"Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people.",
"Throughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos.",
"They keep moving."]

uuids = store.add_texts(texts)
print(uuids)

Retrieve embeddings:

In [None]:
doc_id_list = ['1e62faab-5211-4b56-b156-9136a7a00195', '3a6d3b79-368b-4057-9ae2-db4f331a602f']
store.retrieve_embeddings(doc_id_list)

Retrieve five documents from PGVector. The call below supplies no query text, so the rows returned are not ranked against any query:

In [19]:
store.retrieve_top_k_relevant_docs(5)

[(UUID('3a6d3b79-368b-4057-9ae2-db4f331a602f'), 'Last year COVID-19 kept us apart. This year we are finally together again.'),
 (UUID('3ce34ccf-d78d-4469-99e1-29d1afd00699'), 'Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.'),
 (UUID('29770383-3e7b-45f0-8121-fecb1e2a7a5e'), 'With a duty to one another to the American people to the Constitution.'),
 (UUID('95326af9-3f25-44d0-9123-fb368698a6a7'), 'And with an unwavering resolve that freedom will always triumph over tyranny.'),
 (UUID('1e62faab-5211-4b56-b156-9136a7a00195'), 'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.')]

#### 7.2 PGVector and HuggingFace with 768 dim embeddings using psycopg2 Python API

psycopg2 is a widely used PostgreSQL adapter for Python. It implements the Python DB-API 2.0 specification and lets us run SQL against the database directly, without an ORM layer.

Let's create a `CustomPGVector` class that lets users store embeddings in, and query them from, a custom table with a custom embedding size. The class does not create the table; it assumes a table with `id`, `document`, and `embedding` columns already exists.

In [None]:
# !pip install psycopg2

In [72]:
import psycopg2
import uuid

from typing import Any, Iterable, List, Optional
import logging

class CustomPGVector():
    def __init__(
        self,
        model,
        connection_string: str,
        table_name: str,
        embed_dim: int,
        logger: Optional[logging.Logger] = None,
    ) -> None:
        self.table_name = table_name
        self.model = model
        self.embed_dim = embed_dim
        self.connection_string = connection_string
        self.logger = logger or logging.getLogger(__name__)
        self.__post_init__()

    def __post_init__(
        self,
    ) -> None:
        """
        Initialize the store.
        """
        self.cur = self.connect()


    def connect(self):
        conn = psycopg2.connect(self.connection_string)
        conn.autocommit = True
        cur = conn.cursor()
        return cur


    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        # ids: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> List[str]:
        """Run more texts through the embeddings and add to the vectorstore.

        Args:
            texts: Iterable of strings to add to the vectorstore.
            kwargs: vectorstore specific parameters

        Returns:
            List of ids from adding the texts into the vectorstore.
        """    
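        texts = list(texts)  # materialize once so a generator is not exhausted by the embedding step
        embeddings = self.embed_documents(texts)

        uuids = []
        for document, embedding in zip(texts, embeddings):
            # psycopg2 cannot adapt uuid.UUID objects unless psycopg2.extras.register_uuid() is called, so pass a string.
            doc_id = str(uuid.uuid4())
            self.cur.execute(f'INSERT INTO {self.table_name} (id, document, embedding) VALUES (%s, %s, %s) RETURNING id', (doc_id, document, embedding))
            uuids.append(self.cur.fetchone()[0])

        return uuids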


    def embed_documents(self, texts: Iterable[str]):
        """Generate embeddings for text documents.

        Args:
            texts: Iterable of strings to add to the vectorstore.

        Returns:
            List of embeddings adding into the vectorstore.
        """    
        embeddings = self.model.encode(texts)
        return embeddings.tolist()
    

    def retrieve_embeddings(self, doc_ids: List[str]):
        """Retrieve embeddings from vectorstore.

        Args:
            doc_ids: Iterable of strings to fetch embeddings from the vectorstore.

        Returns:
            List of embeddings added into the vectorstore.
        """ 
        try:
            # Bind the ids as a parameter instead of pasting them into the SQL string.
            self.cur.execute(f'SELECT embedding FROM {self.table_name} WHERE id IN %s;', (tuple(doc_ids),))
            embeddings = self.cur.fetchall()
        except Exception as e:
            self.logger.exception(e)
            embeddings = []

        return embeddings
    

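    def retrieve_top_k_relevant_docs(self, k: int, query: Optional[str] = None):
        """Retrieve up to k documents, nearest to the query first.

        The <-> operator computes the L2 distance between two vectors. Ranking needs
        a query vector: `embedding <-> embedding` compares each row with itself, is
        always 0, and therefore ranks nothing. Without a query, k rows are returned
        in no particular order.

        Args:
            k: Number of documents to return.
            query: Optional text to embed and rank the stored documents against.

        Returns:
            List of (id, document) rows from the vectorstore.
        """
        try:
            if query is None:
                self.cur.execute(f'SELECT id, document FROM {self.table_name} LIMIT %s;', (k,))
            else:
                # Serialize the query embedding as a pgvector literal such as '[0.1,0.2]'.
                vector = '[' + ','.join(str(x) for x in self.embed_documents([query])[0]) + ']'
                self.cur.execute(
                    f'SELECT id, document FROM {self.table_name} ORDER BY embedding <-> %s::vector LIMIT %s;',
                    (vector, k),
                )
            docs = self.cur.fetchall()
        except Exception as e:
            self.logger.exception(e)
            docs = []

        return docs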
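
A query-driven lookup needs two pieces: the query embedding written as a pgvector text literal, and a SQL statement that binds it as a parameter. The sketch below builds both without touching a database. The helper names are hypothetical, and the table and column names follow the class above.

```python
def to_vector_literal(values):
    """Format numbers as a pgvector text literal, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def nearest_neighbour_sql(table_name):
    """Build a parameterized L2 nearest-neighbour query for the given table."""
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    return (
        f"SELECT id, document FROM {table_name} "
        "ORDER BY embedding <-> %s::vector LIMIT %s"
    )


sql = nearest_neighbour_sql("documents")
params = (to_vector_literal([0.1, 0.2, 0.3]), 5)
print(sql)
print(params)
# With a psycopg2 cursor this would run as: cur.execute(sql, params)
```

Binding the vector and the limit as parameters leaves quoting to the driver; only the table name is interpolated, and only after a check.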

Store text embeddings into a custom table in PGVector and provide the required embedding size:

In [73]:
from sentence_transformers import SentenceTransformer

# Example usage
model = SentenceTransformer('all-mpnet-base-v2')
CONNECTION_STRING = 'postgresql://postgres:test@localhost:5432/vector_db'
TABLE_NAME = "documents"

store = CustomPGVector(
    model=model,
    connection_string=CONNECTION_STRING,
    table_name=TABLE_NAME,
    embed_dim=768
    # table_columns=TABLE_COLUMNS
)

texts = [
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.",
"Last year COVID-19 kept us apart. This year we are finally together again.",
"Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.",
"With a duty to one another to the American people to the Constitution.",
"And with an unwavering resolve that freedom will always triumph over tyranny.",
"Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated.",
"He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined.",
"He met the Ukrainian people.",
"From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.",
"Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.",
"In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.",
"Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.",
"Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people.",
"Throughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos.",
"They keep moving."]

uuids = store.add_texts(texts)
print(uuids)

['3245d70d-7a92-487a-8d82-ed30083bea88', '78082dd8-92e3-4206-8341-91f4807903ba', '2442cd86-cc65-4c6b-bffe-0ce6a932b45c', '987414da-559f-4a23-8a1e-acc332332ab0', '6121d6d9-4b50-4117-a313-73369990aedd', '7efb7549-5663-429f-ade2-daf1c812b2f3', '50dc6f83-9ea8-4a98-8bcc-1052e2c4251c', '2bf6f7c3-7870-4301-a2b5-b56efc215a53', 'b0cc8899-4519-4206-b106-4684d66d4a1b', 'a09f4de7-f7e4-4e8a-b5cb-e3d2597915d8', '7cbad19a-4d12-4356-b8ab-fd7af0dcc96f', 'ef54de6c-cb2c-449f-b49f-aeb37b2da117', '9435702c-68a9-40ed-bf7e-d5bb399d0a78', '6231002f-28c5-4656-8e31-ab0b23d16ec9', '3eba89e4-1bb4-4f28-ae75-5351ce8eceab']


Retrieve embeddings:

In [None]:
doc_id_list = ['b4ceeb3a-e9dd-4869-955c-d13f5b9856cc', 'de2ccb67-4815-4045-be98-7138b61095f7']
store.retrieve_embeddings(doc_id_list)

Retrieve five documents from PGVector. The call below supplies no query text, so the rows returned are not ranked against any query:

In [59]:
store.retrieve_top_k_relevant_docs(5)

[('de2ccb67-4815-4045-be98-7138b61095f7',
  'Last year COVID-19 kept us apart. This year we are finally together again.'),
 ('18e921d1-0a34-492c-8b36-f839690b9513',
  'Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.'),
 ('635e8328-99f9-4ce6-9b0e-b3ff4b8b463d',
  'With a duty to one another to the American people to the Constitution.'),
 ('45ae1d35-0cdd-4a55-86b7-9d56600fe691',
  'And with an unwavering resolve that freedom will always triumph over tyranny.'),
 ('b4ceeb3a-e9dd-4869-955c-d13f5b9856cc',
  'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.')]

### 8. References

- pgAdmin installation - https://www.postgresql.org/ftp/pgadmin/pgadmin4/v8.2/windows/
- Postgres pgvector extension (video) - https://www.youtube.com/watch?v=FDBnyJu_Ndg
- Vector databases, pgvector, and LangChain (blog post) - https://bugbytes.io/posts/vector-databases-pgvector-and-langchain/
- LangChain pgvector integration - https://python.langchain.com/docs/integrations/vectorstores/pgvector
- pgvector - https://github.com/pgvector/pgvector
- pgvector Docker Hub image - https://hub.docker.com/r/ankane/pgvector