# Setup 📔
This notebook explains how to set up all the dependencies required to run Vectrix. Some components can be replaced with paid or hosted services, but this setup focuses on open source solutions.

## Weaviate Vector Database
Weaviate stores the embeddings and runs the vector search. A multi-modal embedding engine is important here because extracted documents can also contain images and graphs that you may want to search.

**The Docker Compose File**
```yaml
---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.26.0
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - ~/weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      SPELLCHECK_INFERENCE_API: 'http://text-spellcheck:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-ollama'
      ENABLE_MODULES: 'text2vec-ollama,text-spellcheck,generative-ollama'
      CLUSTER_HOSTNAME: 'node1'
  text-spellcheck:
    image: cr.weaviate.io/semitechnologies/text-spellcheck-model:pyspellchecker-en
volumes:
  weaviate_data:
...
```

Note that we did not include an Ollama service since we will use the locally installed version of Ollama for this setup.
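
Because Weaviate runs inside a container while Ollama runs on the host, the `text2vec-ollama` module generally cannot reach Ollama through `localhost`. The sketch below builds a collection definition that points the module at the host instead. The collection name `Document` is hypothetical, the endpoint assumes Docker Desktop (`host.docker.internal`) and Ollama's default port 11434, and the `moduleConfig` keys should be checked against the Weaviate documentation for your version.

```python
import json

# Assumptions: Docker Desktop exposes the host as host.docker.internal,
# and Ollama listens on its default port 11434.
OLLAMA_ENDPOINT = "http://host.docker.internal:11434"
EMBEDDING_MODEL = "mxbai-embed-large:335m"  # the model pulled in this notebook


def build_collection_definition(name: str) -> dict:
    """Return a schema payload for a collection vectorized by text2vec-ollama."""
    return {
        "class": name,
        "vectorizer": "text2vec-ollama",
        "moduleConfig": {
            "text2vec-ollama": {
                "apiEndpoint": OLLAMA_ENDPOINT,
                "model": EMBEDDING_MODEL,
            }
        },
    }


definition = build_collection_definition("Document")  # hypothetical name
print(json.dumps(definition, indent=2))
```

A payload of this shape could then be sent to Weaviate's schema endpoint (`POST /v1/schema`) or expressed through the Python client. On Linux hosts without Docker Desktop, the endpoint would likely need a different host address.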

In [None]:
# First download an embedding model using Ollama
!ollama pull mxbai-embed-large:335m

# Start Weaviate in the background using the Docker Compose file
!docker compose --project-name weaviate up -d
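
Since `docker compose up -d` returns before Weaviate has finished starting, it can help to poll the readiness endpoint before sending requests. This is a minimal sketch using only the standard library. It assumes the port mapping from the Compose file (8080) and Weaviate's `/v1/.well-known/ready` endpoint; the function names are illustrative.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status all mean "not ready"
        return False


def wait_until_ready(base_url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the readiness endpoint a fixed number of times."""
    for _ in range(attempts):
        if is_ready(base_url):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # Port 8080 matches the port mapping in the Compose file
    print("Weaviate ready:", wait_until_ready("http://localhost:8080", attempts=3))
```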

### Build the Docker container from source

In [1]:
!docker build --no-cache -t vectrix .

[+] Building 0.0s (1/1) FINISHED                           docker:desktop-linux
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 2B                                         0.0s
ERROR: failed to solve: failed to read dockerfile: open Dockerfile: no such file or directory

View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/wl7rww0bdozoq2qyfkv6nuepj
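
The build output above shows that no `Dockerfile` exists in the notebook's working directory. A small check like the following can locate the build context before calling `docker build`. It is a sketch that assumes the Dockerfile sits in the working directory or one of its parents, for example the repository root; `find_dockerfile` is an illustrative helper, not part of Vectrix.

```python
from pathlib import Path
from typing import Optional


def find_dockerfile(start: Path) -> Optional[Path]:
    """Search `start` and its parent directories for a file named Dockerfile."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / "Dockerfile"
        if candidate.is_file():
            return candidate
    return None


dockerfile = find_dockerfile(Path.cwd())
if dockerfile is None:
    print("No Dockerfile found; change into the repository root before building.")
else:
    # Use the directory that contains the Dockerfile as the build context
    print(f"docker build --no-cache -t vectrix {dockerfile.parent}")
```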


### Run the container and attach the environment file

In [None]:
!docker run -p 8501:8501 --env-file .env vectrix

## Installing the PostgreSQL Database

In [None]:
# Pull the Postgres image that ships with the pgvector extension and run the container
!docker pull ankane/pgvector
!docker run -d --name paginx -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -e PG_EXTENSIONS="pgvector" ankane/pgvector
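
To reach this local container from Python, a connection URL along the following lines should work. This is a sketch under several assumptions: the container uses the Postgres image defaults for user and database (`postgres`), the password matches `POSTGRES_PASSWORD` from the command above, and the `pg8000` driver is installed. The helper name `local_postgres_url` is hypothetical.

```python
from urllib.parse import quote


def local_postgres_url(password: str, host: str = "localhost", port: int = 5432,
                       user: str = "postgres", database: str = "postgres") -> str:
    """Build a SQLAlchemy URL for the container, using the pg8000 driver."""
    # Percent-encode the credentials so special characters do not break the URL
    return (
        f"postgresql+pg8000://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Statement that activates pgvector in a database; run it once per database
ENABLE_PGVECTOR_SQL = "CREATE EXTENSION IF NOT EXISTS vector;"

print(local_postgres_url("mysecretpassword"))
```

The resulting string can be passed to `sqlalchemy.create_engine`, and `ENABLE_PGVECTOR_SQL` can be executed once on that engine so that the `vector` column type becomes available.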

### Connecting to the database

In [1]:
import os

import pg8000
import sqlalchemy
from dotenv import load_dotenv
from google.cloud.sql.connector import Connector, IPTypes

# Load the database settings from the .env file into the environment
load_dotenv()


def connect_with_connector() -> sqlalchemy.engine.base.Engine:
    """
    Initializes a connection pool for a Cloud SQL instance of Postgres.

    Uses the Cloud SQL Python Connector package.
    """
    # Note: Saving credentials in environment variables is convenient, but not
    # secure - consider a more secure solution such as
    # Cloud Secret Manager (https://cloud.google.com/secret-manager) to help
    # keep secrets safe.

    instance_connection_name = os.environ[
        "INSTANCE_CONNECTION_NAME"
    ]  # e.g. 'project:region:instance'
    db_user = os.environ["DB_USER"]  # e.g. 'my-db-user'
    db_pass = os.environ["DB_PASS"]  # e.g. 'my-db-password'
    db_name = os.environ["DB_NAME"]  # e.g. 'my-database'

    ip_type = IPTypes.PRIVATE if os.environ.get("PRIVATE_IP") else IPTypes.PUBLIC

    # initialize Cloud SQL Python Connector object
    connector = Connector()

    def getconn() -> pg8000.dbapi.Connection:
        conn: pg8000.dbapi.Connection = connector.connect(
            instance_connection_name,
            "pg8000",
            user=db_user,
            password=db_pass,
            db=db_name,
            ip_type=ip_type,
        )
        return conn

    # The Cloud SQL Python Connector can be used with SQLAlchemy
    # using the 'creator' argument to 'create_engine'
    pool = sqlalchemy.create_engine(
        "postgresql+pg8000://",
        creator=getconn,
        # ...
    )
    return pool
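
`connect_with_connector()` raises a bare `KeyError` if any of its environment variables is missing. A quick pre-flight check can give a clearer message. The sketch below lists the four names read by the function above; `missing_variables` is an illustrative helper.

```python
import os
from typing import Iterable, List, Mapping

# Names read by connect_with_connector() above
REQUIRED_VARS = ("INSTANCE_CONNECTION_NAME", "DB_USER", "DB_PASS", "DB_NAME")


def missing_variables(required: Iterable[str], environ: Mapping[str, str]) -> List[str]:
    """Return the required names that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


missing = missing_variables(REQUIRED_VARS, os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```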

In [2]:
# Test the connection
def test_connection(pool):
    try:
        # Create a connection from the pool
        with pool.connect() as connection:
            # Execute a simple query
            result = connection.execute(sqlalchemy.text("SELECT version();"))
            version = result.fetchone()[0]
            print("Connection successful!")
            print(f"PostgreSQL version: {version}")

            # You can add more test queries here if needed
            # For example, listing all tables in the current schema:
            result = connection.execute(sqlalchemy.text(
                "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public';"
            ))
            tables = [row[0] for row in result]
            print("Tables in the database:")
            for table in tables:
                print(f"- {table}")

    except Exception as e:
        print("Error connecting to the database:")
        print(e)

# Get the connection pool
pool = connect_with_connector()

# Run the test
test_connection(pool)

# Don't forget to dispose of the pool when you're done
pool.dispose()




Connection successful!
PostgreSQL version: PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
Tables in the database:
- prompts
- documents
- checkpoints
- writes
