### Weaviate Vector DB Creation

In [1]:
!docker compose up -d

 Container rag-comprehensive-weaviate-1  Starting
Error response from daemon: driver failed programming external connectivity on endpoint rag-comprehensive-weaviate-1 (3fcb62ccc193fd63c746d954caf5caff0a82e29d3ed1346fbfe82fd6dc7e6f07): Bind for 0.0.0.0:50051 failed: port is already allocated
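
The error above says that port 50051 is already bound, which often means a Weaviate container from an earlier run is still up. As a minimal sketch (the helper name `port_in_use` is hypothetical, and the check assumes the services listen on localhost), the two ports this notebook relies on can be probed before starting the stack:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken
        return sock.connect_ex((host, port)) == 0


# 8080 is the HTTP endpoint used by the client below; 50051 is the port named in the error
for port in (8080, 50051):
    print(port, "in use" if port_in_use(port) else "free")
```

If both ports report "in use", an existing instance is probably serving them already, and the client cell may still connect successfully.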


In [2]:
import os

import weaviate
from dotenv import load_dotenv

# Load GOOGLE_API_KEY from a local .env file into the environment
load_dotenv()

# Create the Weaviate client (v3-style client API)
google_api_key = os.getenv("GOOGLE_API_KEY")
client = weaviate.Client(
    url="http://localhost:8080",  # Replace with your endpoint
    additional_headers={
        "X-Google-Api-Key": google_api_key  # Replace with your inference API key
    },
)
# Returns True once the Weaviate instance is reachable and ready
client.is_ready()


True

### Data ingestion

In [3]:
from llama_index.core import SimpleDirectoryReader

# Load every readable file under ./data into Document objects
documents = SimpleDirectoryReader("./data").load_data()

### Data chunking

Define the semantic chunker, which places chunk boundaries where the embeddings of adjacent sentence groups diverge most.

In [4]:
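from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.gemini import GeminiEmbedding

# Embedding model used to compare neighbouring sentence groups
gemini_embed_model = GeminiEmbedding(model_name="models/embedding-001")

semantic_splitter = SemanticSplitterNodeParser(
    buffer_size=1,  # sentences grouped together when comparing similarity
    breakpoint_percentile_threshold=95,  # split where dissimilarity exceeds this percentile
    embed_model=gemini_embed_model,
)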


Chunk documents into nodes

In [5]:
nodes = semantic_splitter.get_nodes_from_documents(documents=documents)
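
To make the role of `breakpoint_percentile_threshold` more concrete, here is a simplified sketch of percentile-based breakpoint splitting. It is not the library's implementation: the function names and the 2-D toy embeddings are made up for illustration, and it skips the `buffer_size` grouping and the real embedding model.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values, pct):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def split_at_breakpoints(sentences, embeddings, threshold_pct=95):
    """Start a new chunk wherever the distance to the next sentence exceeds the cutoff."""
    if len(sentences) < 2:
        return [list(sentences)]
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    cutoff = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > cutoff:
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks


# Toy data: three sentences on one topic, then two on another (vectors are invented)
toy_sentences = ["A1", "A2", "A3", "B1", "B2"]
toy_embeddings = [(1.0, 0.0), (0.99, 0.1), (0.98, 0.15), (0.1, 1.0), (0.05, 1.0)]
toy_chunks = split_at_breakpoints(toy_sentences, toy_embeddings)
print(toy_chunks)
```

A higher threshold yields fewer, larger chunks, because only the sharpest topic shifts exceed the cutoff.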

### Data Indexing

In [6]:
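from llama_index.llms.gemini import Gemini
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings

# LLM used to synthesize answers (the keyword is `model` in newer releases of the integration)
Settings.llm = Gemini(model_name="models/gemini-pro", api_key=google_api_key)
# Embedding model used for both indexing and querying
Settings.embed_model = GeminiEmbedding(model_name="models/embedding-001")
# Fallback splitter for raw documents; the nodes above were already chunked semantically
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)
Settings.num_output = 512  # tokens reserved for the generated answer
Settings.context_window = 3900  # maximum tokens the LLM is assumed to accept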
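
The numeric settings above interact in two ways: `chunk_size` and `chunk_overlap` control how a fallback splitter windows the text, and `num_output` is subtracted from `context_window` when prompts are packed. The sketch below illustrates both with plain token lists. It is an approximation under stated assumptions: `sliding_chunks` is a hypothetical helper, and the real `SentenceSplitter` also respects sentence boundaries, which this version ignores.

```python
def sliding_chunks(tokens, chunk_size=512, chunk_overlap=20):
    """Split a token list into fixed-size windows that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    stride = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Integers stand in for tokens so the overlap is easy to inspect
demo_tokens = list(range(1200))
demo_chunks = sliding_chunks(demo_tokens)
print([len(c) for c in demo_chunks])

# Tokens left for the prompt template plus retrieved context
prompt_budget = 3900 - 512
print(prompt_budget)
```

With these values each window starts 492 tokens after the previous one, so consecutive chunks share their boundary text instead of cutting it cleanly.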

In [7]:
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.weaviate import WeaviateVectorStore

# Construct the vector store on top of the Weaviate client
vector_store = WeaviateVectorStore(weaviate_client=client)

# Set up storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index: embeds the nodes and writes them to Weaviate
index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
)

### Data Querying

#### Query with Default Vector Search

In [8]:
from llama_index.core.response.notebook_utils import display_response

query_engine = index.as_query_engine()
response = query_engine.query("Tell me about the author's experience in YC")

display_response(response)


**`Final Response:`** The author's experience in YC was that it was taking up more of his attention and was in the process of eating essays too. He realized that either YC was his life's work or he'd have to leave eventually.
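
Conceptually, the default query engine embeds the question, retrieves the most similar nodes, and passes them to the LLM as context. The brute-force sketch below shows only the retrieval step. It is an illustration, not what Weaviate runs internally (Weaviate uses an approximate nearest-neighbour index), and the node ids and 2-D vectors are invented. The value `k=2` mirrors what is, to my knowledge, the LlamaIndex default for `similarity_top_k`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, node_vecs, k=2):
    """Return the k (node_id, score) pairs with the highest cosine similarity."""
    scored = [
        (cosine_similarity(query_vec, vec), node_id)
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(reverse=True)
    return [(node_id, score) for score, node_id in scored[:k]]


# Hypothetical node embeddings and query embedding
toy_nodes = {"yc": (0.9, 0.1), "painting": (0.1, 0.9), "lisp": (0.6, 0.6)}
toy_query = (1.0, 0.2)
hits = top_k(toy_query, toy_nodes)
print([node_id for node_id, _ in hits])
```

In the real pipeline the retrieved node texts are inserted into the prompt, which is why the `context_window` and `num_output` settings limit how much retrieved text can be used.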

### Evaluation with ragas