## Creating an index and populating it with documents using Redis

A simple example of how to ingest PDF documents, and then web page content, into a Redis vector store.

Requirements:
- A Redis cluster
- A Redis database with at least 2 GB of memory (to match the initial index cap)

### Base parameters: Redis connection info

In [1]:
redis_url = "redis://server:port"  # replace server and port with your Redis host and port
index_name = "docs"
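
The `redis_url` above is a placeholder, and it is easy to forget to replace it. Here is a minimal sketch of an early sanity check that uses only the standard library. The helper name `parse_redis_url` and the host `redis.example.com` are hypothetical and not part of the original notebook. The fallback to port 6379 assumes the usual Redis default.

```python
from urllib.parse import urlparse


def parse_redis_url(url):
    """Split a Redis URL into host, port and TLS flag, failing early on obvious mistakes."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("missing host in Redis URL")
    # Accessing .port raises ValueError when the port is not a number,
    # which catches an unreplaced "server:port" placeholder.
    port = parsed.port
    return {
        "host": parsed.hostname,
        "port": port if port is not None else 6379,  # assumed default Redis port
        "tls": parsed.scheme == "rediss",
    }


# Hypothetical host, for illustration only.
info = parse_redis_url("redis://redis.example.com:6379")
print(info)
```

Calling `parse_redis_url(redis_url)` before any ingestion step surfaces a bad URL immediately instead of partway through a long embedding run.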

#### Imports

In [2]:
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores.redis import Redis

## Initial index creation and document ingestion

#### Document loading from a folder containing PDFs

In [3]:
pdf_folder_path = 'rhods-doc'

loader = PyPDFDirectoryLoader(pdf_folder_path)
docs = loader.load()
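
Before loading, it can help to confirm that the folder exists and actually contains PDFs, since a mistyped path may otherwise only show up later as an empty `docs` list. The following is a small standard-library sketch. The helper `list_pdfs` is hypothetical, and its file matching (recursive, skipping hidden files) is an assumption that may differ from what the loader itself picks up.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the PDF files found under `folder`, searching subfolders too."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"PDF folder not found: {root}")
    # Skip hidden files such as editor or OS leftovers.
    return sorted(p for p in root.rglob("*.pdf") if not p.name.startswith("."))


if Path("rhods-doc").is_dir():
    pdfs = list_pdfs("rhods-doc")
    print(f"{len(pdfs)} PDF file(s) found")
```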

#### Split documents into chunks with some overlap

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024,
                                               chunk_overlap=40)
all_splits = text_splitter.split_documents(docs)
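
To make the two parameters concrete, here is a deliberately simplified sketch of fixed-size chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter tries to break on separators such as paragraphs and lines, so its chunks are usually shorter than `chunk_size`). The function `chunk_text` is hypothetical and only illustrates how `chunk_size` and `chunk_overlap` relate.

```python
def chunk_text(text, chunk_size=1024, chunk_overlap=40):
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Small values make the overlap visible: each chunk repeats the last
# character of the previous one.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears at least partly in both neighbouring chunks, which tends to help retrieval.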

#### Create the index and ingest the documents

In [6]:
# Embed every chunk with the default HuggingFace model and store the vectors in a new Redis index
embeddings = HuggingFaceEmbeddings()
rds = Redis.from_documents(all_splits,
                           embeddings,
                           redis_url=redis_url,
                           index_name=index_name)
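
For a rough idea of how the number of chunks relates to the memory requirement listed at the top, the raw vector payload can be estimated as chunks × dimensions × bytes per value. The sketch below rests on stated assumptions: 768 dimensions (what the default `HuggingFaceEmbeddings` model is expected to produce) and 4-byte float32 values. It ignores the stored text, metadata and index overhead, so treat the result as a lower bound. The function name is hypothetical.

```python
def estimate_vector_bytes(num_chunks, dim=768, bytes_per_value=4):
    """Lower-bound estimate of the memory taken by the raw embedding vectors."""
    # dim=768 and bytes_per_value=4 (float32) are assumptions, not measured values.
    return num_chunks * dim * bytes_per_value


for n in (1_000, 10_000, 100_000):
    mib = estimate_vector_bytes(n) / (1024 ** 2)
    print(f"{n:>7} chunks: about {mib:.1f} MiB of raw vectors")
```

With the real data, `len(all_splits)` can be passed as `num_chunks`.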

#### Write the schema to a YAML file so the index can be reopened later

In [7]:
rds.write_schema("redis_schema.yaml")

## Ingesting new documents

#### Example with web pages

In [8]:
from langchain.document_loaders import WebBaseLoader

In [10]:
loader = WebBaseLoader(["https://ai-on-openshift.io/getting-started/openshift/",
                        "https://ai-on-openshift.io/getting-started/opendatahub/",
                        "https://ai-on-openshift.io/getting-started/openshift-data-science/",
                        "https://ai-on-openshift.io/odh-rhods/configuration/",
                        "https://ai-on-openshift.io/odh-rhods/custom-notebooks/",
                        "https://ai-on-openshift.io/odh-rhods/nvidia-gpus/",
                        "https://ai-on-openshift.io/odh-rhods/custom-runtime-triton/",
                        "https://ai-on-openshift.io/odh-rhods/openshift-group-management/",
                        "https://ai-on-openshift.io/tools-and-applications/minio/minio/"
                       ])

In [11]:
data = loader.load()

In [13]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024,
                                               chunk_overlap=40)
all_splits = text_splitter.split_documents(data)

In [14]:
# Reopen the existing index using the schema saved earlier
embeddings = HuggingFaceEmbeddings()
rds = Redis.from_existing_index(embeddings,
                                redis_url=redis_url,
                                index_name=index_name,
                                schema="redis_schema.yaml")

In [15]:
rds.add_documents(all_splits)

['doc:docs:f2a7bece8c614c7f9d155c027b117d96',
 'doc:docs:5baf6fa8a1d24d4b813b2009308c526d',
 'doc:docs:285beb515cb24fea84ff286337c33cf5',
 'doc:docs:5a35f267a08c4e6486856b4bb95dff34',
 'doc:docs:0b27cc06b4394950962d955ecdaef04b',
 'doc:docs:1b81d697143f4c3aa7f0f703a92a9101',
 'doc:docs:cb4802ae6fe04d35a6c82cc5b26555da',
 'doc:docs:c7245897e2e34c59823eb92e60301710',
 'doc:docs:3a5fe31bbb18489bb00d5f2e6c703561',
 'doc:docs:ea102f2a011240609012562e8caecf35',
 'doc:docs:8b65285434a94bd4837bb3e622b107c7',
 'doc:docs:af698b432128443a99db3f0f49a1ac3f',
 'doc:docs:2922d6a4f5074d6ab462facfadee7a2f',
 'doc:docs:839ede47a7f04b9386078c3318e7d460',
 'doc:docs:1f6a879c852948c7a591c7633e6ba226',
 'doc:docs:99e4bd0c92ab4b359d41514555b08560',
 'doc:docs:d2b1afb621ea47f391322b6ed11eb6e0',
 'doc:docs:f0544ebeb1c2442dabff7cdbac055b9c',
 'doc:docs:0b94519f9cf140ff95c3da08094ab0a4',
 'doc:docs:c7b634e728b64cf5aa8fe1e0ecc80bc4',
 'doc:docs:89e07bd82f484211a4c310cc73d3e1e3',
 'doc:docs:029315e067fc4fc78474c66