<a href="https://colab.research.google.com/github/jeffvestal/ElasticDocs_GPT/blob/main/load_embedding_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ElasticDocs GPT Blog
# Loading an embedding model from Hugging Face into Elasticsearch

This notebook shows how to load a supported embedding model from Hugging Face into an Elasticsearch cluster in [Elastic Cloud](https://cloud.elastic.co/).

[Blog - ChatGPT and Elasticsearch: OpenAI meets private data](https://www.elastic.co/blog/chatgpt-elasticsearch-openai-meets-private-data)

# Setup


## Install and import required Python libraries

Elastic uses the [eland Python library](https://github.com/elastic/eland) to download models from the Hugging Face hub and load them into Elasticsearch.

In [10]:
pip -q install eland elasticsearch sentence_transformers transformers torch==1.11 python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [11]:
from pathlib import Path
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel
from elasticsearch import Elasticsearch
from elasticsearch.client import MlClient
import os
import getpass
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.

True

## Configure Elasticsearch authentication
The recommended authentication approach is to use the [Elastic Cloud ID](https://www.elastic.co/guide/en/cloud/current/ec-cloud-id.html) and a [cluster level API key](https://www.elastic.co/guide/en/kibana/current/api-keys.html).

You can set the required credentials however you like. This example reads them from environment variables (loaded from a `.env` file by `load_dotenv()` above); the commented-out `getpass` lines show how to prompt for an API key instead. Either approach avoids storing credentials in GitHub.

In [19]:
es_cloud_id = os.environ.get("cloud_id")
es_user = os.environ.get("cloud_user")
es_pass = os.environ.get("cloud_pass")
#es_api_id = getpass.getpass('Enter cluster API key ID:  ') 
#es_api_key = getpass.getpass('Enter cluster API key:  ')
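The two approaches above (environment variables and interactive prompts) can be combined. The following is a minimal sketch, assuming you want to prompt only when a variable is missing; `read_credential` is a hypothetical helper, not part of eland or the Elasticsearch client.

```python
import getpass
import os


def read_credential(env_name, prompt, env=os.environ, ask=getpass.getpass):
    """Return the value of env_name, prompting only when it is unset or empty.

    `env` and `ask` are injectable so the helper can be exercised without
    touching the real environment or blocking on keyboard input.
    """
    value = env.get(env_name)
    if value:
        return value
    return ask(prompt)
```

For example, `es_cloud_id = read_credential("cloud_id", "Enter Cloud ID: ")` would use the value from `.env` when it exists and fall back to a hidden prompt otherwise.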

## Connect to Elastic Cloud

In [20]:
#es = Elasticsearch(cloud_id=es_cloud_id, 
#                   api_key=(es_api_id, es_api_key)
#                   )

es = Elasticsearch(cloud_id=es_cloud_id, 
                   basic_auth=(es_user, es_pass)
                   )
es.info() # should return cluster info

ObjectApiResponse({'name': 'instance-0000000000', 'cluster_name': '8b04a544ae20467f99d194e8ca877eab', 'cluster_uuid': 'sDda2bhGQ--Pv4F8gsR3Ww', 'version': {'number': '8.10.2', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '6d20dd8ce62365be9b1aca96427de4622e970e9e', 'build_date': '2023-09-19T08:16:24.564900370Z', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'})

# Load the model from Hugging Face into Elasticsearch
Here we specify the model ID from Hugging Face. The easiest way to get this ID is to click the copy icon next to the model name on the model page.

When calling `TransformerModel` you specify the Hugging Face model ID and the task type. You can try specifying `auto`, and eland will attempt to determine the correct type from the model config. This is not always possible, so the list of specific `task_type` values can be viewed in the following code:
[Supported values](https://github.com/elastic/eland/blob/15a300728876022b206161d71055c67b500a0192/eland/ml/pytorch/transformers.py#L41)

In [21]:
hf_model_id='sentence-transformers/all-distilroberta-v1'
tm = TransformerModel(hf_model_id, "text_embedding")

es_model_id = tm.elasticsearch_model_id()

tmp_path = "models"
Path(tmp_path).mkdir(parents=True, exist_ok=True)
model_path, config, vocab_path = tm.save(tmp_path)

ptm = PyTorchModel(es, es_model_id)
ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config) 


Downloading (…)okenizer_config.json: 100%|██████████| 333/333 [00:00<00:00, 142kB/s]
Downloading (…)olve/main/vocab.json: 100%|██████████| 798k/798k [00:00<00:00, 1.03MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 731kB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 239/239 [00:00<00:00, 75.0kB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 653/653 [00:00<00:00, 1.08MB/s]
Downloading pytorch_model.bin: 100%|██████████| 329M/329M [00:22<00:00, 14.8MB/s] 
Downloading (…)87e68/.gitattributes: 100%|██████████| 737/737 [00:00<00:00, 336kB/s]
Downloading (…)_Pooling/config.json: 100%|██████████| 190/190 [00:00<00:00, 61.9kB/s]
Downloading (…)5afc487e68/README.md: 100%|██████████| 10.3k/10.3k [00:00<00:00, 4.41MB/s]
Downloading (…)fc487e68/config.json: 100%|██████████| 653/653 [00:00<00:00, 338kB/s]
Downloading (…)ce_transformers.json: 100%|██████████| 116/116 [00:00<00:00, 58.7kB/s]
Downloading (…)e68/data_config.json: 100%|██████████|

# Starting the Model

## View information about the model
This is not required, but it can be handy for getting an overview of the model.

In [22]:

m = MlClient.get_trained_models(es, model_id=es_model_id)
m.body

{'count': 1,
 'trained_model_configs': [{'model_id': 'sentence-transformers__all-distilroberta-v1',
   'model_type': 'pytorch',
   'created_by': 'api_user',
   'version': '10.0.0',
   'create_time': 1696191246646,
   'model_size_bytes': 0,
   'estimated_operations': 0,
   'license_level': 'platinum',
   'description': "Model sentence-transformers/all-distilroberta-v1 for task type 'text_embedding'",
   'tags': [],
   'input': {'field_names': ['text_field']},
   'inference_config': {'text_embedding': {'vocabulary': {'index': '.ml-inference-native-000001'},
     'tokenization': {'roberta': {'do_lower_case': False,
       'with_special_tokens': True,
       'max_sequence_length': 512,
       'truncate': 'first',
       'span': -1,
       'add_prefix_space': False}}}},
   'location': {'index': {'name': '.ml-inference-native-000001'}}}]}
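The `model_id` in the output above differs from the Hugging Face ID only in the separator. As a rough illustration, the mapping visible here can be reproduced as follows; `sketch_es_model_id` is a hypothetical stand-in, and eland's real `elasticsearch_model_id()` may apply further normalisation, so prefer calling that method in practice.

```python
def sketch_es_model_id(hf_model_id: str) -> str:
    """Approximate the Hugging Face -> Elasticsearch model ID mapping.

    Assumption: only the '/' separator is rewritten. This mirrors the output
    shown in this notebook and is not eland's actual implementation.
    """
    return hf_model_id.replace("/", "__")


print(sketch_es_model_id("sentence-transformers/all-distilroberta-v1"))
# prints: sentence-transformers__all-distilroberta-v1
```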

## Deploy the model
This loads the model onto the ML nodes and starts the process(es) that make it available for the NLP task.

In [None]:
s = MlClient.start_trained_model_deployment(es, model_id=es_model_id)
s.body

## Verify the model started without issue
The cell below should output `{'routing_state': 'started'}`.

In [23]:
stats = MlClient.get_trained_models_stats(es, model_id=es_model_id)
stats.body['trained_model_stats'][0]['deployment_stats']['nodes'][0]['routing_state']

{'routing_state': 'started'}
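A deployment can take a little while to reach the `started` state, so a single check may run too early. The sketch below polls the stats until every node reports `started`. It assumes the response shape indexed in the cell above; `routing_states` and `wait_until_started` are hypothetical helpers, and the retry count and delay are arbitrary.

```python
import time


def routing_states(stats_body):
    """Collect the routing state reported by every node of every deployment."""
    states = []
    for model in stats_body.get("trained_model_stats", []):
        nodes = model.get("deployment_stats", {}).get("nodes", [])
        for node in nodes:
            states.append(node.get("routing_state", {}).get("routing_state"))
    return states


def wait_until_started(fetch_stats, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Poll fetch_stats() until all nodes report 'started'.

    Returns True on success and False if the attempts run out.
    """
    for attempt in range(attempts):
        states = routing_states(fetch_stats())
        if states and all(state == "started" for state in states):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

With the client from this notebook, a call might look like `wait_until_started(lambda: es.ml.get_trained_models_stats(model_id=es_model_id).body)`.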

In [25]:
!pip install langchain


[-0.24175570905208588,
 -0.014599747024476528,
 0.42253026366233826,
 0.18622802197933197,
 0.18547432124614716,
 -0.026411332190036774,
 0.2060837596654892,
 0.3480953574180603,
 -0.19985777139663696,
 0.23087435960769653,
 0.2692231833934784,
 -0.19024747610092163,
 0.036673154681921005,
 -0.08839139342308044,
 -0.1324239820241928,
 -0.05830953270196915,
 0.25267907977104187,
 0.1721261590719223,
 0.16267246007919312,
 0.021371368318796158,
 0.06245815381407738,
 -0.08952192217111588,
 -0.1090797707438469,
 -0.34464240074157715,
 0.13248425722122192,
 -0.07076163589954376,
 0.33955925703048706,
 0.07396332174539566,
 0.0006133475108072162,
 0.10686536133289337,
 0.016076665371656418,
 -0.002866720547899604,
 0.0811944380402565,
 0.0003226141561754048,
 -0.03801163658499718,
 0.012929367832839489,
 0.07142987847328186,
 -0.04902137070894241,
 -0.0038107100408524275,
 0.1022312194108963,
 -0.04484926909208298,
 0.05128607153892517,
 -0.02099519409239292,
 0.08373233675956726,
 0.360399
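A vector like the one above can be requested from the deployed model through the Elasticsearch inference API. The following is a minimal sketch and not necessarily how the output above was produced; `embed_text` is a hypothetical helper, and `text_field` is taken from the `input.field_names` value in the model info shown earlier.

```python
def embed_text(es_client, model_id, text, field_name="text_field"):
    """Ask a deployed text_embedding model for the vector of a single text.

    Assumes the model is already deployed and that the response carries the
    vector under inference_results[0]['predicted_value'].
    """
    response = es_client.ml.infer_trained_model(
        model_id=model_id,
        docs=[{field_name: text}],
    )
    return response["inference_results"][0]["predicted_value"]
```

With the objects defined in this notebook, `embed_text(es, es_model_id, "What is a segment tree?")` would return a list of floats.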

In [13]:
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
llm = Ollama(base_url="http://localhost:11434",
             model="llama2",
             verbose=True,
             callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))



In [23]:
!pip install bs4

import bs4
from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://cp-algorithms.com/")
data = loader.load()

print(data)

Defaulting to user installation because normal site-packages is not writeable
[Document(page_content='\n\n\n\n\n\n\n\n\n\n\n\n\nMain Page - Algorithms for Competitive Programming\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n          Skip to content\n        \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n              Algorithms for Competitive Programming\n            \n\n\n\n\n\n              \n                Main Page\n              \n            \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n            Initializing search\n          \n\n\n\n\n\n\n\n\n\n\n\n\n    cp-algorithms/cp-algorithms\n  \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n          \n  \n  Home\n\n        \n\n\n\n          \n  \n  Algebra\n\n        \n\n\n\n          \n  \n  Data Structures\n\n        \n\n\n\n          \n  \n  Dynamic Programming\n\n        \n\n\n\n          \n  \n  String Processing\n\n        \n\n\n\n          \n  \n  Linear Algebra\n\n        \n\n\n\n          \n  \n  Combinatorics\n\n        \n\n\

In [30]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

print(len(all_splits))

45
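To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch that cuts text at fixed character offsets. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as blank lines and spaces); `fixed_size_chunks` is a hypothetical helper used only to show how the two parameters interact.

```python
def fixed_size_chunks(text, chunk_size=500, chunk_overlap=0):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


print(fixed_size_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints: ['abcd', 'defg', 'ghij', 'j']
```

With `chunk_overlap=0`, as in the cell above, neighbouring chunks share no text, so a sentence that straddles a boundary is split between two chunks.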


In [31]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OllamaEmbeddings

vectorstore = Chroma.from_documents(documents=all_splits, embedding=OllamaEmbeddings())

# https://python.langchain.com/docs/integrations/vectorstores/elasticsearch

In [32]:
print(vectorstore)

<langchain.vectorstores.chroma.Chroma object at 0x127d8f490>


In [35]:
question = "What is a segment tree?"
docs = vectorstore.similarity_search(question)
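Conceptually, a similarity search embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea in plain Python with cosine similarity. The metric and the default of `k=4` are assumptions for illustration (a given vector store may use a different distance and an index instead of a full scan), and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=4):
    """Return the texts of the k entries most similar to query_vector.

    `indexed` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```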

In [28]:
from langchain.prompts import PromptTemplate

# Prompt
template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. If the answer is not contained in the supplied doc reply with Apologies I don't know the answer
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

llm("What are segment tree")

 A segment tree is a data structure used to efficiently manage and query a set of intervals in a way that allows for fast range queries, insertions, and deletions. Einzeln The basic idea behind a segment tree is to divide the set of intervals into smaller subsets, called segments, such that each segment contains only a few intervals. This division is done recursively until each segment contains only one interval, at which point the segment tree is complete.

The segment tree is constructed by starting with a single root node, which represents the entire set of intervals. Then, the algorithm recursively divides the set of intervals into smaller subsets, called child nodes, until each child node contains only one interval. At this point, the algorithm creates a new root node for the next level of the tree, and repeats the process until the desired level of detail is reached.

Each node in the segment tree represents a subset of the original set of intervals, and contains a list of all th

" A segment tree is a data structure used to efficiently manage and query a set of intervals in a way that allows for fast range queries, insertions, and deletions. Einzeln The basic idea behind a segment tree is to divide the set of intervals into smaller subsets, called segments, such that each segment contains only a few intervals. This division is done recursively until each segment contains only one interval, at which point the segment tree is complete.\n\nThe segment tree is constructed by starting with a single root node, which represents the entire set of intervals. Then, the algorithm recursively divides the set of intervals into smaller subsets, called child nodes, until each child node contains only one interval. At this point, the algorithm creates a new root node for the next level of the tree, and repeats the process until the desired level of detail is reached.\n\nEach node in the segment tree represents a subset of the original set of intervals, and contains a list of a

In [20]:
# LLM
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
llm = Ollama(base_url="http://localhost:11434",
             model="llama2",
             verbose=True,
             callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))


In [38]:
# QA chain
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)


question = "What is a cross cluster replication?"
result = qa_chain({"query": question})

 Cross-cluster replication (CCR) is a technique used in distributed systems to maintain consistent replicas of data across multiple clusters or nodes. It allows for the automatic propagation of changes made to data in one cluster to other clusters, ensuring that all replicas have the same up-to-date state. This can be useful in situations where data is distributed across multiple clusters, and it is important to maintain consistency and availability of the data across all clusters.
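It can help to see what a retrieval QA chain does with the prompt. The following is a rough sketch of the "stuff" pattern: fetch the relevant chunks, paste them into `{context}`, and send the filled prompt to the model. `answer_with_context` is a hypothetical function, and `retrieve` and `generate` are stand-ins for the retriever and the LLM; LangChain's actual implementation has more moving parts.

```python
def answer_with_context(question, retrieve, generate, template):
    """Retrieve chunks, place them into the prompt template, and call the model.

    retrieve: callable taking the question and returning a list of strings
    generate: callable taking the filled prompt and returning the answer
    template: string with {context} and {question} placeholders
    """
    documents = retrieve(question)
    context = "\n\n".join(documents)
    prompt = template.format(context=context, question=question)
    return generate(prompt)
```

Because the model only sees what lands in `{context}`, a question the indexed pages do not cover leaves it with unrelated chunks, and the instructions in the prompt are the only thing asking it not to answer from its own training data.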

# Private LLM


In [43]:
template = """Use the following pieces of context to answer the question at the end. 
If the answer is not contained in the supplied doc reply with Apologies I don't know the answer.
Use three sentences maximum and keep the answer as concise as possible. 

{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

# QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)

question = "What is a cross cluster replica?"

result = qa_chain({"query": question})

 A cross cluster replica is a data structure used in distributed systems to maintain consistency across multiple clusters or nodes. It is a way to keep copies of data in different clusters synchronized, so that changes made to one copy are reflected in all other copies. This helps ensure that the system remains consistent and fault-tolerant, even in the event of failures or node failures.