how to interop? #22
Comments
I thought a lot about this, but did not reach a conclusion on whether anything needs to be added to Bosquet itself.

Regarding vector databases:

Regarding "integration" with Python: in my view it hardly ever pays off to "wrap" a Python library, unless you do it ad hoc for exactly your problem.

Regarding usage of babashka:
Yes, data interop is what I am doing now, and I think we can expand on it. I am just thinking of other ideas, like using Elasticsearch as a vector store. For instance, we could extend the Elastisch library to work with vector data and then integrate it with Bosquet.
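To make the Elasticsearch idea a bit more concrete, here is a minimal sketch of the two request bodies involved, written as Python dicts so they can be sent with any HTTP client (the equivalent Clojure maps would serve the same purpose). It assumes an Elasticsearch version that supports the `dense_vector` field type and the `cosineSimilarity` function in `script_score` queries; the script syntax has changed between 7.x releases, so check the docs for your version. The field names `text` and `embedding`, the dimensionality, and the helper names are hypothetical. The mapping would go to `PUT /<index>` and the query to `POST /<index>/_search`.

```python
import json

# Hypothetical field name and dimensionality; match them to your embedding model.
EMBEDDING_FIELD = "embedding"
EMBEDDING_DIMS = 3


def index_mapping(dims: int = EMBEDDING_DIMS) -> dict:
    """Mapping that stores each document's text next to its embedding vector."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                EMBEDDING_FIELD: {"type": "dense_vector", "dims": dims},
            }
        }
    }


def similarity_query(query_vector: list[float], k: int = 5) -> dict:
    """script_score query that ranks every document by cosine similarity.

    Elasticsearch does not allow negative scores, hence the + 1.0 shift.
    """
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": f"cosineSimilarity(params.query_vector, '{EMBEDDING_FIELD}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(index_mapping(), indent=2))
    print(json.dumps(similarity_query([0.1, 0.2, 0.3], k=3), indent=2))
```

Because the vector search lives entirely in the request bodies, a Clojure client only needs to build and send these maps; no special vector support in the client library is strictly required.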
I agree with @behrica regarding Python interop. Going that path, the whole LLM layer can be done with LangChain (or whatever) and libpython-clj is then used just to pass data around. On the other hand, the Python ecosystem is so rich in LLM tools that introducing some kind of unifying LLM <-> Python <-> Bosquet interoperability might simply end up as unnecessary complexity.
I would think about it in more abstract terms: adding memory and abstractions for handling context that exceeds the model's token limit to Bosquet. Underneath, they might or might not be implemented using vector DBs.
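As an illustration of such an abstraction, here is a minimal sketch of a "window" memory that keeps the full history but only hands the most recent messages that fit a token budget to the model. It assumes a crude token estimate (one token per whitespace-separated word) in place of the model's real tokenizer, and the names `WindowMemory` and `approx_tokens` are hypothetical, not part of Bosquet. A vector-DB-backed memory could implement the same `add`/`context` pair by retrieving relevant older messages instead of dropping them.

```python
from dataclasses import dataclass, field


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


@dataclass
class WindowMemory:
    """Stores every message but exposes only the newest ones that fit max_tokens."""

    max_tokens: int
    messages: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)

    def context(self) -> list[str]:
        """Scan newest-first, stop at the first message that would exceed the budget,
        and return the selection oldest-first (chronological order for the prompt)."""
        selected: list[str] = []
        used = 0
        for message in reversed(self.messages):
            cost = approx_tokens(message)
            if used + cost > self.max_tokens:
                break
            selected.append(message)
            used += cost
        return list(reversed(selected))
```

For example, with `max_tokens=6` and the messages "one two three", "four five", "six seven eight", `context()` returns the last two messages (5 estimated tokens) and leaves out the oldest one.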
Here is a low-hanging fruit: cosine similarity, using tech.ml.dataset. This could work as a simple in-memory vector DB. Also, many LLMs can be used to generate triples, which, again, could be inserted into a Datalog database.
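The tech.ml.dataset snippet did not survive in this thread, so here is the same idea as a minimal Python sketch rather than the original code: cosine similarity, cos(a, b) = (a · b) / (|a| |b|), plus a brute-force store that scores a query vector against every stored vector and returns the top k. The class and method names are hypothetical. Brute force is O(n) per query, which is fine for a small in-memory collection; dedicated vector DBs add approximate indexes for larger ones.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """(a . b) / (|a| |b|); returns 0.0 when either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Brute-force store: compares the query against every stored (text, vector) pair."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def most_similar(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        """Return up to k (text, score) pairs, highest cosine similarity first."""
        scored = [(text, cosine_similarity(query, vec)) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

The vectors would come from whatever embedding model is in use; the store itself does not care where they come from.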
I fully agree that cosine distance between text embeddings is one potential basis for text similarity calculations. A vector database sits one level of abstraction higher: it answers the question "give me x texts similar to this text" (and does the vector operations internally). But I cannot see what code to add to Bosquet to help with using these things without restricting the user to a single technology.
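One way to get the higher-level "give me x similar texts" abstraction without tying users to one technology is to depend only on small interfaces and inject the concrete pieces. The sketch below is a hypothetical design, not Bosquet's API: `Embedder`, `VectorIndex`, `BruteForceIndex`, and `TextStore` are illustrative names. The toy backend normalizes vectors to unit length so that a plain dot product equals cosine similarity; an Elasticsearch, Vald, or Activeloop adapter would only need to provide `upsert` and `nearest`.

```python
import math
from typing import Callable, Protocol

# An embedding function turns text into a vector (e.g. a call to an embeddings API).
Embedder = Callable[[str], list[float]]


class VectorIndex(Protocol):
    """What any backend must provide; each vector DB would get its own adapter."""

    def upsert(self, key: str, vector: list[float]) -> None: ...

    def nearest(self, vector: list[float], k: int) -> list[str]: ...


class BruteForceIndex:
    """Toy in-memory backend: unit-length vectors make the dot product equal cosine similarity."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    @staticmethod
    def _normalize(vector: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vector)) or 1.0  # leave zero vectors as zeros
        return [x / norm for x in vector]

    def upsert(self, key: str, vector: list[float]) -> None:
        self._vectors[key] = self._normalize(vector)

    def nearest(self, vector: list[float], k: int) -> list[str]:
        query = self._normalize(vector)

        def score(key: str) -> float:
            return sum(a * b for a, b in zip(query, self._vectors[key]))

        return sorted(self._vectors, key=score, reverse=True)[:k]


class TextStore:
    """The question a vector DB answers: 'give me k texts similar to this text'."""

    def __init__(self, embed: Embedder, index: VectorIndex) -> None:
        self._embed = embed
        self._index = index

    def add(self, text: str) -> None:
        self._index.upsert(text, self._embed(text))

    def similar(self, text: str, k: int = 3) -> list[str]:
        return self._index.nearest(self._embed(text), k)
```

Calling code only ever sees `TextStore`, so swapping the embedding model or the vector backend means passing a different `embed` function or `index` object, not rewriting the callers.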
Sorry, I just meant adding some utility code to make it more usable.
"More usable" is of course a good goal, but it needs to be weighed against the effort. I was looking at LangChain in Python and its large number of data loaders.

Require the Python modules:

```clojure
(ns try
  (:require [libpython-clj2.python :as py]
            [libpython-clj2.require :refer [require-python]]))

;; Start the embedded Python interpreter before loading any modules.
(py/initialize!)
(require-python '[builtins :as bt])
(py/from-import langchain.document_loaders OpenCityDataLoader)
```

Use Python:

```clojure
(def dataset "tmnf-yvry") ; crime data

;; Python keyword arguments are passed as Clojure keyword/value pairs.
(def loader (OpenCityDataLoader
             :city_id "data.sfgov.org"
             :dataset_id dataset
             :limit 2000))

;; Call the loader's load() method.
(def docs (py/py. loader load))

;; page_content is a string holding a Python literal: eval it, then convert to JVM data.
(-> docs first
    (py/py.- page_content)
    (bt/eval {})
    (py/->jvm))
```

It is as long as the corresponding Python code:

```python
from langchain.document_loaders import OpenCityDataLoader

dataset = "tmnf-yvry"  # crime data
loader = OpenCityDataLoader(city_id="data.sfgov.org",
                            dataset_id=dataset,
                            limit=2000)
docs = loader.load()
eval(docs[0].page_content)
```

Which kind of utility code can we imagine to make this even easier?
But maybe I have a rather extreme opinion on this. I don't think "Clojure wrappers" for Java libraries are worth the effort either.
I do not see the need to add Python interop with LangChain or other Python LLM libraries. If there were some specialized, unique, hard-to-reimplement small Python library whose integration would bring immense value, then why not. But that is not the case. If it becomes the case, let's open concrete issues to deal with it. @behrica, thanks for moving the vector DB discussion out to a separate issue where we can continue it. @usametov, I am closing this. As noted above, I am open to discussing very concrete Python-based functionality or libraries to integrate.
I am thinking about incorporating existing tools from the LangChain ecosystem, which is growing very fast. They have mature products, like AutoGPT and BabyAGI.
It would be nice if we could extend/reuse some of that in Bosquet.
For instance, we certainly need Google search. Currently I am using babashka to call a Python Google search client.
One option would be to use Python interop via libpython-clj.
We could also integrate vector DB APIs, like Activeloop's, because they have a serverless REST API.
That way I could keep my babashka script, which populates Deep Lake, and then use Bosquet for the rest of my needs.
I know that the Vald vector DB has a Clojure client, but unfortunately LangChain tools do not integrate with Vald. I hope they will add it in the near future. In the meantime, we could build an AWS Lambda function around the Clojure Vald client. Calling AWS Lambda from LangChain is easy.
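The Python side of that Lambda idea could look roughly like the sketch below, using boto3's Lambda client `invoke` call with a synchronous `RequestResponse` invocation. It assumes a deployed function named `clojure-vald-search` (hypothetical) whose Clojure handler accepts a JSON payload of the form `{"vector": [...], "k": n}` and returns JSON; both the name and the payload shape are assumptions to be matched to the real handler. The client is injectable so the function can be exercised without AWS credentials.

```python
import json
from typing import Any

# Hypothetical name of a Lambda function wrapping the Clojure Vald client.
FUNCTION_NAME = "clojure-vald-search"


def search_vald(query_vector: list[float], k: int = 5, client: Any = None) -> Any:
    """Synchronously invoke the Lambda and return its decoded JSON response.

    The payload shape {"vector": ..., "k": ...} is an assumption; it must match
    whatever the Clojure handler expects.
    """
    if client is None:
        import boto3  # imported lazily so this module loads without AWS dependencies

        client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=FUNCTION_NAME,
        InvocationType="RequestResponse",
        Payload=json.dumps({"vector": query_vector, "k": k}).encode("utf-8"),
    )
    # Handler errors still return HTTP 200; Lambda signals them via FunctionError.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda failed: {response['Payload'].read()!r}")
    return json.loads(response["Payload"].read())
```

Wrapping this function as a LangChain tool (or calling it from babashka via a small script) would then be a thin layer on top.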