Description
Describe the issue
When running the pgvector example, I get the following error:
m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Trying to create collection.
2024-05-12 19:51:30,510 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Use the existing collection `flaml_collection`.
File M:\OneDrive\Documents\dev\DISCO\pgVector\..\website\docs does not exist. Skipping.
2024-05-12 19:51:30,974 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 2 chunks.
2024-05-12 19:51:30,975 - autogen.agentchat.contrib.vectordb.pgvectordb - INFO - Error executing select on non-existent table: flaml_collection. Creating it instead. Error: relation "flaml_collection" does not exist
LINE 1: SELECT id, metadatas, documents, embedding FROM flaml_collec...
^
2024-05-12 19:51:31,007 - autogen.agentchat.contrib.vectordb.pgvectordb - INFO - Created table flaml_collection
VectorDB returns doc_ids: [[b'bdfbc921', b'7968cf3c']]
Traceback (most recent call last):
File "m:\OneDrive\Documents\dev\DISCO\pgVector\autogen_pgvector_1.py", line 84, in <module>
chat_result = ragproxyagent.initiate_chat(
File "m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\autogen\agentchat\conversable_agent.py", line 1004, in initiate_chat
msg2send = message(_chat_info["sender"], _chat_info["recipient"], kwargs)
File "m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\autogen\agentchat\contrib\retrieve_user_proxy_agent.py", line 631, in message_generator
doc_contents = sender._get_context(sender._results)
File "m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\autogen\agentchat\contrib\retrieve_user_proxy_agent.py", line 426, in _get_context
_doc_tokens = self.custom_token_count_function(doc["content"], self._model)
File "m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\autogen\token_count_utils.py", line 69, in count_token
raise ValueError(f"input must be str, list or dict, but we got {type(input)}")
ValueError: input must be str, list or dict, but we got <class 'bytes'>
After some investigation, it appears that psycopg (3.1.19) always returns the id and descriptions fields as bytes, while pgvectordb.py expects strings.
It is unclear why these two fields are always returned as bytes. Fields in other tables on my Postgres server are returned as strings, as expected.
The only workaround I can find is to decode in pgvectordb.py:
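A minimal sketch of that kind of decode step, not the actual pgvectordb.py code. It assumes each row comes back as a tuple in the column order shown in the log above (id, metadatas, documents, embedding) and that the bytes are UTF-8 text. `decode_if_bytes` and `normalize_row` are hypothetical helper names used only for illustration:

```python
from typing import Any, Tuple


def decode_if_bytes(value: Any, encoding: str = "utf-8") -> Any:
    """Return a str for bytes-like input; leave every other type untouched."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value).decode(encoding)
    return value


def normalize_row(row: Tuple[Any, Any, Any, Any]) -> Tuple[Any, Any, Any, Any]:
    """Decode only the id and documents columns of one fetched row.

    Assumed column order: (id, metadatas, documents, embedding).
    """
    id_, metadatas, documents, embedding = row
    return decode_if_bytes(id_), metadatas, decode_if_bytes(documents), embedding


# Example with values shaped like the ones in the log above (contents made up).
row = (b"bdfbc921", {"source": "example"}, b"some chunk text", [0.1, 0.2])
print(normalize_row(row))
```

Values that are already strings pass through unchanged, so a step like this would be harmless on servers where the fields come back as str.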

Is this an issue others are facing?
Steps to reproduce
- Install Postgres 16.3 on Ubuntu 22.04
- Configure Postgres for remote access
- Install pgvector from source
- Add the pgvector extension to Postgres
- Run example: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_pgvector_RetrieveChat.ipynb
Screenshots and logs
Additional Information
pyautogen-0.2.27