This notebook shows how to use LangChain with Weaviate to run a semantic search over the text of a book.

Some imports:

In [1]:
import os

import weaviate
from langchain.document_loaders import GutenbergLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Weaviate

Get a book:

In [2]:
# Grimms' Fairy Tales by Jacob Grimm and Wilhelm Grimm
loader = GutenbergLoader("https://www.gutenberg.org/files/2591/2591-0.txt")


In [3]:
documents = loader.load()


Split the book into chunks of roughly 500 characters, with no overlap between chunks:

In [4]:
text_splitter = CharacterTextSplitter(
    chunk_size=500, chunk_overlap=0, length_function=len
)
docs = text_splitter.split_documents(documents)
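
To make the `chunk_size` setting concrete, here is a minimal sketch of greedy, separator-based chunking. It is an illustration only, not LangChain's implementation: the function name `split_text` and the sample string are hypothetical, and it assumes a blank-line separator and zero overlap.

```python
def split_text(text, chunk_size=500, separator="\n\n"):
    """Pack separator-delimited pieces into chunks of at most chunk_size characters.

    Simplified sketch: a single piece longer than chunk_size is kept whole.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece fits, or it starts a new chunk and is kept as-is.
            current = candidate
        else:
            # The piece would overflow the current chunk, so start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample: four paragraphs, one of them longer than the limit.
sample = "aaaa\n\nbbbb\n\ncccccccccccc\n\ndd"
sample_chunks = split_text(sample, chunk_size=10)
print(sample_chunks)
```

Because whole paragraphs are packed together, chunk lengths vary and a long paragraph can exceed the limit, which is why the 500-character figure is best read as a target rather than a guarantee.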


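Set up Weaviate. The client connects to the instance at `WEAVIATE_URL` and forwards the OpenAI API key so that the `text2vec-openai` module can embed text: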

In [5]:
WEAVIATE_URL = "http://weaviate:8080"
client = weaviate.Client(
    url=WEAVIATE_URL,
    additional_headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
)
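
The cell above reads `os.environ["OPENAI_API_KEY"]` directly, which raises a bare `KeyError` when the variable is missing. A small guard can give a clearer message. This is a sketch only: `build_openai_headers` is a hypothetical helper, not part of the Weaviate client.

```python
import os


def build_openai_headers(env=None):
    """Return the header dict used above, or raise a readable error.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating the client."
        )
    return {"X-OpenAI-Api-Key": api_key}


# Usage with an explicit mapping (the key value here is a placeholder).
headers = build_openai_headers({"OPENAI_API_KEY": "sk-placeholder"})
print(headers)
```

The returned dictionary could then be passed as `additional_headers` when constructing the client.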


In [6]:
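# Caution: this deletes every class (and its data) in the Weaviate instance.
client.schema.delete_all()
# Returns the current (now empty) schema; the result is not used here.
client.schema.get()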
schema = {
    "classes": [
        {
            "class": "Paragraph",
            "description": "A written paragraph",
            "vectorizer": "text2vec-openai",
            "moduleConfig": {"text2vec-openai": {"model": "ada", "type": "text"}},
            "properties": [
                {
                    "dataType": ["text"],
                    "description": "The content of the paragraph",
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False,
                        }
                    },
                    "name": "content",
                },
            ],
        },
    ]
}

client.schema.create(schema)


In [7]:
# The class name and text property must match the schema defined above.
vectorstore = Weaviate(client, "Paragraph", "content")
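
The vector store is addressed by a class name (`"Paragraph"`) and a property name (`"content"`), so both have to exist in the schema created in the previous cell. A quick consistency check might look like the sketch below. The helper `has_property` is hypothetical, and the schema is a trimmed copy of the one above.

```python
def has_property(schema, class_name, property_name):
    """Return True if the schema dict defines property_name on class_name."""
    for cls in schema.get("classes", []):
        if cls.get("class") == class_name:
            return any(
                prop.get("name") == property_name
                for prop in cls.get("properties", [])
            )
    return False


# Trimmed copy of the notebook's schema, kept to the fields the check needs.
schema_sketch = {
    "classes": [
        {
            "class": "Paragraph",
            "properties": [{"name": "content", "dataType": ["text"]}],
        }
    ]
}

print(has_property(schema_sketch, "Paragraph", "content"))
print(has_property(schema_sketch, "Paragraph", "title"))
```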


Store the docs:

In [8]:
text_meta_pair = [(doc.page_content, doc.metadata) for doc in docs]

texts, meta = list(zip(*text_meta_pair))
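
The `zip(*pairs)` idiom transposes a list of `(text, metadata)` pairs into one sequence of texts and one of metadata. The unpacking fails with a `ValueError` if the list is empty, so a guarded variant can be useful. This is a sketch: `FakeDoc` is a stand-in for LangChain's document objects and `unzip_docs` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in with the two attributes the notebook reads from each document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unzip_docs(docs):
    """Split documents into parallel tuples of texts and metadata dicts."""
    if not docs:
        return (), ()  # zip(*[]) yields nothing, so handle the empty case here
    texts, metas = zip(*[(d.page_content, d.metadata) for d in docs])
    return texts, metas


fake_docs = [
    FakeDoc("first chunk", {"source": "book"}),
    FakeDoc("second chunk", {"source": "book"}),
]
fake_texts, fake_metas = unzip_docs(fake_docs)
print(fake_texts)
print(fake_metas)
```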


In [9]:
vectorstore.add_texts(texts, meta)


Do a semantic search:

In [10]:
query = "the part with talking animals"
docs = vectorstore.similarity_search(query)
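
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that ranking step with an exhaustive scan. It assumes cosine similarity as the metric, and the three-dimensional vectors are invented for illustration; real embeddings have far more dimensions, and a production vector index typically avoids scanning every entry.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        indexed.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up vectors: the first two chunks point in roughly the query's direction.
toy_index = {
    "six mice built a little hearse": [0.9, 0.1, 0.0],
    "the cat sprang upon the boar": [0.8, 0.3, 0.1],
    "the king counted his gold": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]

toy_results = top_k(toy_query, toy_index, k=2)
print(toy_results)
```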


In [11]:
for doc in docs:
    print(doc.page_content)
    print("*" * 80)


came and wept with him over poor Partlet. And six mice built a little


hearse to carry her to her grave; and when it was ready they harnessed


themselves before it, and Chanticleer drove them. On the way they


met the fox. ‘Where are you going, Chanticleer?’ said he. ‘To bury my


Partlet,’ said the other. ‘May I go with you?’ said the fox. ‘Yes; but


you must get up behind, or my horses will not be able to draw you.’ Then


the fox got up behind; and presently the wolf, the bear, the goat, and
********************************************************************************
himself, for his ears stuck out of the bush; and when he shook one of


them a little, the cat, seeing something move, and thinking it was a


mouse, sprang upon it, and bit and scratched it, so that the boar jumped


up and grunted, and ran away, roaring out, ‘Look up in the tree, there


sits the one who is to blame.’ So they looked up, and espied the wolf


sitting amongst the branches; and they called him a 