
Usage/ Dev Documentation #67

Open
JimVincentW opened this issue Feb 2, 2024 · 6 comments
Labels
documentation (Improvements or additions to documentation), question (Further information is requested)

Comments

@JimVincentW

Can you point me to the right documentation for how I can spin this up or access the service? Is there a docker image I can pull?

ff6347 added the documentation and question labels Feb 3, 2024
@ff6347 (Member) commented Feb 3, 2024

Hey @JimVincentW, sorry that our docs are still unclear on that. For local development you need the following things (there's a quick version check right after the list):

  • Node.js 20 + git
  • supabase cli installed
  • docker installed

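If you want to double-check the prerequisites first, something like this should do (Node 20 is just the version we currently target):

node --version      # should print v20.x
git --version
supabase --version
docker --version
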
Clone the repo and start supabase (database and api)

git clone https://github.com/technologiestiftung/parla-api.git
cd parla-api
supabase start

Set up your environment

You will need to populate your environment with some variables. At Technologiestiftung we use direnv for that. That's why there is a .envrc.sample in the repo. If you use direnv:

cd parla-api
cp .envrc.sample .envrc
# populate the variables, then
direnv allow

Without direnv you can just source the .envrc file:

cd parla-api
cp .envrc.sample .envrc
# populate the variables, then
source .envrc

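For orientation, an .envrc roughly follows this pattern; the variable names below are only placeholders, the real ones are listed in .envrc.sample:

# example only – placeholder variable names, see .envrc.sample for the real ones
export SUPABASE_URL=http://localhost:54321   # API URL printed by `supabase start`
export SUPABASE_ANON_KEY="<anon key from the supabase start output>"
export OPENAI_API_KEY="<your key, if the API is configured to call OpenAI>"
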
Install dependencies

cd parla-api
npm ci

Run the api

cd parla-api
npm run dev

The frontend only needs the URL of the API as an env var, which by default should be http://localhost:8080 (see the sketch after this list):

  • clone frontend
  • cd into it
  • npm ci
  • cp .env.sample .env
  • Adjust env var for api
  • npm run dev

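Put together, that is roughly the following; the repo URL and the exact variable name are assumptions on my side, so double-check them against the frontend's README and .env.sample:

git clone https://github.com/technologiestiftung/parla-frontend.git   # assumed repo name
cd parla-frontend
npm ci
cp .env.sample .env
# point the API env var at the local API, e.g. something like PARLA_API_URL=http://localhost:8080
npm run dev
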
There are already some documents prepared in supabase/seed.sql. If you want to work with your own documents, you need to take a look at https://github.com/technologiestiftung/parla-document-processor/

ff6347 mentioned this issue Feb 3, 2024
@ff6347 (Member) commented Feb 3, 2024

And yes, there is also a Docker image: https://hub.docker.com/repository/docker/technologiestiftung/parla-api/general

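Pulling and running it looks roughly like this; the container port and the required env vars depend on your setup, 8080 is just the default mentioned above:

docker pull technologiestiftung/parla-api
# pass the env vars the API expects (same as in .envrc, but as plain KEY=VALUE lines without `export`)
docker run --rm -p 8080:8080 --env-file .env technologiestiftung/parla-api
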
@JimVincentW (Author)

No worries! And thanks for the further insight :)

I would like to use the parla vector storage, because plugging in some Llama or Mixtral on my rented instance would run that RAG a lot cheaper, but I really love the effort to store Berlin's parliamentary documents! Could we hack a way together to make that vector storage accessible as an API endpoint?

I am personally in love with Qdrant; I could imagine a lightweight CI/CD pipeline for updating a hosted Qdrant Docker image.
Open source and easy to manage.

@ff6347 (Member) commented Feb 4, 2024

The vector storage is just supabase.com, so it should be pretty straightforward to use it as a vector DB. No need to use the parla API for that.
Supabase provides an introspected API out of the box. This might be a starting point:

https://supabase.com/docs/guides/ai

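For example, if the similarity search lives in a Postgres function, the auto-generated REST API lets you call it with plain HTTP. The function name and arguments below are made up, yours will differ:

# call a hypothetical `match_documents` RPC via Supabase's auto-generated REST API
curl -X POST "$SUPABASE_URL/rest/v1/rpc/match_documents" \
  -H "apikey: $SUPABASE_ANON_KEY" \
  -H "Authorization: Bearer $SUPABASE_ANON_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query_embedding": [0.1, 0.2, 0.3], "match_count": 5}'
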
@ff6347 (Member) commented Feb 4, 2024

Ah wait, now I get it. You want us to expose an endpoint for searching through our embeddings.

This is something we need to discuss internally. It comes with a cost for us since it might produce a lot of egress.

@JimVincentW (Author)

If you decide it's feasible and in scope, I would like to contribute to it, because IMHO plugging in multiple vector databases (e.g. Abgeordnetenhaus & Bundestag) would make research on really intelligent political RAG/agents much easier. Also, I can't imagine calling the OpenAI API is cost-efficient.

A cron job could just regularly update the Qdrant Docker image from Postgres/Supabase in a microservice. Querying Qdrant afterwards is super straightforward, with multiple similarity/recommendation algorithms at hand.

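Roughly what I have in mind; the table, column and collection names below are just placeholders:

# crontab entry: refresh the Qdrant collection every night at 3:00
0 3 * * * /opt/parla/sync-qdrant.sh >> /var/log/parla-qdrant-sync.log 2>&1

# sync-qdrant.sh – export embeddings from Postgres and upsert them into Qdrant
# (placeholder table/column names; `embedding::text::json` turns the pgvector value into a JSON array)
psql "$DATABASE_URL" -At -c \
  "select json_agg(json_build_object(
     'id', id,
     'vector', embedding::text::json,
     'payload', json_build_object('content', content)
   )) from document_chunks" \
| jq '{points: .}' \
| curl -X PUT "http://localhost:6333/collections/parla/points?wait=true" \
    -H "Content-Type: application/json" \
    --data-binary @-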