[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/llm-agent-frameworks/llamaindex/simple-query-engine/simple-query-engine.ipynb)

In [None]:
%pip install -U weaviate-client llama-index llama-index-vector-stores-weaviate llama-index-embeddings-openai

In [3]:
import os
import openai
import weaviate
from weaviate import classes as wvc

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.vector_stores.weaviate import WeaviateVectorStore
from llama_index.core import VectorStoreIndex, StorageContext, Settings

from llama_index.embeddings.openai import OpenAIEmbedding


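# Set the OpenAI API key before creating any OpenAI-backed components.
# Uncomment the next line to set it here instead of in your shell.
# os.environ["OPENAI_API_KEY"] = "sk-key"
openai.api_key = os.environ["OPENAI_API_KEY"]

# Use this embedding model globally for all LlamaIndex operations
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")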

In [None]:
client = weaviate.connect_to_embedded(
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]
    }
)

# Alternative: connect to a Weaviate instance that is already running locally
# client = weaviate.connect_to_local(
#     headers={
#         "X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]
#     }
# )

In [5]:
# Start from a clean slate: drop any BlogPost collection left over from an earlier run
client.collections.delete("BlogPost")

collection = client.collections.create(
    name="BlogPost",
    description="Blog post from the Weaviate website",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
    generative_config=wvc.config.Configure.Generative.openai(model="gpt-3.5-turbo"),
    properties=[
        wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT)
    ]
)

In [6]:
documents = SimpleDirectoryReader('../data-loaders-episode1/data').load_data()

# Chunk the blog posts into nodes
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
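
To make the idea of chunking documents into nodes concrete, here is a minimal pure-Python sketch of overlapping, word-based chunking. It is an illustration only and not the algorithm `SimpleNodeParser` uses; the function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks that share `overlap` words with their neighbor.

    Hypothetical helper for illustration; not part of LlamaIndex.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


# Toy input: 100 placeholder "words" named w0 ... w99
sample = " ".join(f"w{i}" for i in range(100))
chunks = chunk_text(sample, chunk_size=40, overlap=10)
print(len(chunks))
```

The overlap means the last words of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still seen whole in at least one chunk.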

In [7]:
print("Number of nodes:", len(nodes))
print(nodes[0])

Number of nodes: 9
Node ID: 99fbe400-8ca8-4ed3-8ada-13bb9c428967
Text: title: What is Ref2Vec and why you need it for your
recommendation system  Weaviate 1.16 introduced the
[Ref2Vec](/developers/weaviate/modules/retriever-vectorizer-
modules/ref2vec-centroid) module. In this article, we give you an
overview of what Ref2Vec is and some examples in which it can add
value such as recommendations or representing long ...


In [8]:
# Name the index BlogPost so it matches the collection created above; we reuse this name later.
vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="BlogPost"
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

In [9]:
# Now, at any time, we can reconnect to the vector store and query it.

vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="BlogPost"
)

loaded_index = VectorStoreIndex.from_vector_store(vector_store)

query_engine = loaded_index.as_query_engine()
response = query_engine.query("What is the intersection between LLMs and search?")
print(response)

The intersection between LLMs and search lies in the challenge of finding suitable representations for long objects, particularly text documents that exceed the input limit of Deep Transformer Neural Networks. This motivates the interest in Hybrid Search techniques that combine Vector Search with algorithms like BM25, which is well-suited for handling text sequences longer than 512 tokens. Ref2Vec can play a role in addressing this challenge by helping to represent long documents effectively in search applications.
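
Conceptually, a query engine backed by a vector index first retrieves the chunks whose embeddings are closest to the query embedding, then passes them to the LLM to write the answer. The sketch below shows only the retrieval step, using cosine similarity over made-up three-dimensional vectors. The helper names, chunk IDs, and vectors are all hypothetical, and this is not how LlamaIndex or Weaviate implement search internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, chunk_vecs, k=2):
    """Return the IDs of the k chunks most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]


# Made-up embeddings; a real embedding model returns far longer vectors
toy_chunk_vecs = {
    "ref2vec-intro": [0.9, 0.1, 0.0],
    "hybrid-search": [0.7, 0.7, 0.1],
    "release-notes": [0.0, 0.1, 0.9],
}
toy_query_vec = [0.8, 0.6, 0.0]

top_ids = retrieve_top_k(toy_query_vec, toy_chunk_vecs, k=2)
print(top_ids)
```

In the notebook itself, Weaviate performs this nearest-neighbor search over the stored vectors, so no similarity code has to be written by hand.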
