In [31]:
import weaviate
import weaviate.classes as wvc
from weaviate.classes.config import Property, DataType

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SimpleNodeParser

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

client: weaviate.WeaviateClient = weaviate.connect_to_local()

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
Settings.llm = Ollama(model="llama2", request_timeout=60.0)

In [32]:
# Create a new data collection
collection = client.collections.create(
    name="simplerag1",  # Name of the data collection
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="doc", data_type=DataType.TEXT),
    ],
)

UnexpectedStatusCodeError: Collection may not have been created properly.! Unexpected status code: 422, with response body: {'error': [{'message': 'class name "Simplerag1" already exists'}]}.
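
The 422 error above appears because a collection with this name already exists from an earlier run. One way to make the cell safe to re-run is to check for the collection before creating it. This is a minimal sketch, assuming the v4 Python client's `client.collections.exists`, `get`, and `create` methods; `get_or_create_collection` is a hypothetical helper name, not part of the weaviate library.

```python
def get_or_create_collection(client, name, properties):
    """Return the collection called `name`, creating it only if it is missing.

    Hypothetical helper for illustration. `client` is assumed to be a
    connected weaviate.WeaviateClient, and `properties` a list of Property
    objects as in the cell above.
    """
    if client.collections.exists(name):
        # Reuse the existing collection instead of triggering the 422 error.
        return client.collections.get(name)
    return client.collections.create(name=name, properties=properties)
```

In the notebook this would be called with the same arguments as the failing cell, for example `get_or_create_collection(client, "simplerag1", [Property(name="text", data_type=DataType.TEXT), Property(name="doc", data_type=DataType.TEXT)])`.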

In [33]:
source_dir = "data/processed/pymupdf/txt"

# load the extracted text files using the reader
data = SimpleDirectoryReader(source_dir).load_data()

# chunk the documents into nodes
parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(data)
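
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch. It is not how `SimpleNodeParser` works internally (the library's parser is more careful about token counts and sentence boundaries); `sliding_chunks` is a hypothetical function that only illustrates how consecutive chunks share `chunk_overlap` items.

```python
def sliding_chunks(tokens, chunk_size, chunk_overlap):
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` items."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


# Toy input: ten "tokens", windows of 4 that overlap by 1.
words = "a b c d e f g h i j".split()
for chunk in sliding_chunks(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
# ['a', 'b', 'c', 'd']
# ['d', 'e', 'f', 'g']
# ['g', 'h', 'i', 'j']
```

Under this simplified model, the settings used above (`chunk_size=1024`, `chunk_overlap=20`) would advance each window by 1004 items.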

In [34]:
from llama_index.vector_stores.weaviate import WeaviateVectorStore
from llama_index.core import VectorStoreIndex, StorageContext

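# construct vector store
# note: vectors are written to the "Data" collection, not the "simplerag1"
# collection created earlier
vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="Data", text_key="content"
)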

# setting up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# set up the index
index = VectorStoreIndex(nodes, storage_context=storage_context)
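
Conceptually, the index stores one embedding vector per node, and a query is answered by embedding the question and retrieving the most similar nodes. The sketch below shows that ranking step with cosine similarity in plain Python. The toy vectors and the `cosine_similarity` and `top_k` helpers are assumptions for illustration only; Weaviate performs this search with its own vector index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, node_vecs, k=2):
    """Return the ids of the k nodes whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), node_id)
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in scored[:k]]


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_nodes = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.1], toy_nodes, k=2))  # ['a', 'c']
```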

In [37]:
query_engine = index.as_query_engine()

In [38]:
response = query_engine.query("What is the national hydro storage decrease in the month of March 2025?")
print(response)

Based on the provided context, the national hydro storage decrease in the month of March 2025 is 76% of historical average.


In [39]:
response = query_engine.query("What car did Sophie find was within her budget?")
print(response)

Based on the context, Sophie found a Nissan Leaf within her budget.


In [40]:
response = query_engine.query("What was Sophie's budget to purchase an EV?")
print(response)

Based on the provided context, Sophie's budget for purchasing an electric vehicle (EV) is not explicitly mentioned. However, we can make an educated guess based on the information provided.

According to the guide, "Choosing an EV for a family," the cost of an EV can vary depending on the model and features. The guide mentions that some EVs can cost around $50,000, while others can cost over $100,000.

Assuming Sophie is interested in purchasing an EV for her family, it's likely that she would want a model with decent range and features, which could potentially fall within the price range of around $50,000 to $70,000. However, without more information about Sophie's specific budget or preferences, it's difficult to provide a more accurate estimate.


In [41]:
response = query_engine.query("What distance will most EVs cover if they are able to be fully electric?")
print(response)

Based on the provided context, most EVs (Electric Vehicles) will cover around 30-50 kilometers on electric power before switching to hybrid mode.


In [42]:
response = query_engine.query("As part of the research a PNF was obtained. But what does PNF stand for?")
print(response)

Based on the context provided, PNF likely stands for "Partially Normalized Frequency." This is because the term PNF appears in the context of a literature review related to geocoding and spatial analysis, and it is mentioned in the passage that a PNF was obtained as part of the research.
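
Some of the answers above are explicitly guesses (the budget estimate is described as "an educated guess", and the PNF expansion as what it "likely stands for"). To judge such answers, it helps to look at the chunks that were retrieved for the query. This is a minimal sketch, assuming the response object exposes `source_nodes` whose items carry a `score` and a `node` with `get_content()` and `metadata`, and that the reader stored a `file_name` entry in the metadata; `summarize_sources` is a hypothetical helper.

```python
def summarize_sources(response, max_chars=200):
    """Return one summary line per retrieved chunk behind `response`.

    Hypothetical helper for illustration. Each line holds the similarity
    score, the source file name (if recorded), and the start of the chunk.
    """
    lines = []
    for item in response.source_nodes:
        text = item.node.get_content().replace("\n", " ")[:max_chars]
        file_name = item.node.metadata.get("file_name", "unknown")
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"{score}  {file_name}  {text}")
    return lines
```

Calling `print("\n".join(summarize_sources(response)))` after a query would show whether the retrieved text actually mentions the thing being asked about.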
