# LangChain <--> Elasticsearch

Elasticsearch is an open source, distributed, RESTful search and analytics engine, scalable data store, and vector database that supports a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data for fast search, fine-tuned relevancy, and analytics that scale.
Elasticsearch can store and index a variety of data, including structured and unstructured text, numerical data, and geospatial data. It is known for its ability to run fast queries over large volumes of unstructured data.
Elasticsearch uses a search index, which is similar to the index at the back of a book, to map content to its location in a document. This allows users to find information quickly without scanning every document.
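
To make the book-index analogy concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy illustration only, not how Elasticsearch is implemented internally; the sample documents and the function names (`tokenize`, `build_inverted_index`, `search`) are assumptions made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents, for illustration only.
docs = {
    1: "Elasticsearch stores and indexes documents",
    2: "A vector database stores embeddings",
    3: "LangChain connects language models to documents",
}
index = build_inverted_index(docs)
print(search(index, "stores documents"))
```

The lookup touches only the postings for the query terms, which is why a search does not need to scan every document.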

- https://www.elastic.co/search-labs/blog/langchain-collaboration
- https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
- https://python.langchain.com/docs/integrations/vectorstores/elasticsearch/
- https://www.elastic.co/blog/elasticsearch-is-open-source-again


In [None]:
! pip install -r requirements.txt -q

# Install Elasticsearch with Docker

- docker network create elastic
- docker pull docker.elastic.co/elasticsearch/elasticsearch:8.15.3
- docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB docker.elastic.co/elasticsearch/elasticsearch:8.15.3
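
Once the container is running, it can help to confirm that the node answers before building anything on top of it. The sketch below uses only the standard library and assumes the `http://localhost:9200` address used later in this notebook; the helper name `cluster_info` is made up for this example. Recent Elasticsearch images may enable security (HTTPS and a password) by default, in which case an unauthenticated HTTP request fails and the helper returns `None`.

```python
import json
import urllib.error
import urllib.request


def cluster_info(es_url="http://localhost:9200", timeout=5.0):
    """Return the JSON banner from the Elasticsearch root endpoint, or None on failure."""
    try:
        with urllib.request.urlopen(es_url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, HTTP errors such as 401, and malformed URLs.
        return None


info = cluster_info()
if info is None:
    print("Elasticsearch is not reachable at the assumed URL")
else:
    print("Connected to Elasticsearch", info.get("version", {}).get("number"))
```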

In [2]:
import os
from dotenv import dotenv_values

In [3]:
config = dotenv_values("./keys/.env")

In [7]:
import itertools
import json
import os
import tempfile
import time

import vertexai
from dotenv import dotenv_values
from google.oauth2 import service_account
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter


In [8]:
config = dotenv_values("./keys/.env")
with open("./keys/complete-tube-421007-208a4862c992.json") as source:
    info = json.load(source)

vertex_credentials = service_account.Credentials.from_service_account_info(info)
vertexai.init(
    project=config["PROJECT"],
    location=config["REGION"],
    credentials=vertex_credentials,
)
google_api_key = config["GEMINI-API-KEY"]
os.environ["GEMINI_API_KEY"] = google_api_key
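
The cell above indexes `config` directly, so a missing entry in `.env` surfaces as a bare `KeyError` partway through setup. A small check up front, sketched below, can report all missing settings at once. The helper name `require_keys` and the sample values are assumptions for illustration; the key names are the ones this notebook reads.

```python
def require_keys(config, required):
    """Return config unchanged, or raise if any required key is missing or empty."""
    missing = [key for key in required if not config.get(key)]
    if missing:
        raise KeyError(f"Missing settings in .env: {', '.join(missing)}")
    return config


# Key names read by this notebook.
REQUIRED_KEYS = ["PROJECT", "REGION", "GEMINI-API-KEY"]

# Hypothetical placeholder values, standing in for dotenv_values("./keys/.env").
sample = {
    "PROJECT": "demo-project",
    "REGION": "us-central1",
    "GEMINI-API-KEY": "placeholder",
}
require_keys(sample, REQUIRED_KEYS)
print("All required settings are present")
```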

In [9]:
ROOT = os.getcwd()
ROOT

'/mnt/d/repos2/elastic'

In [10]:
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    credentials=vertex_credentials,
    google_api_key=google_api_key,
)

In [13]:
from langchain_elasticsearch import ElasticsearchStore

elastic_vector_search = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="langchain_index",
    embedding=embeddings,
    es_user="elastic",
    es_password="changeme",
)

In [15]:
from langchain_elasticsearch import ElasticsearchStore

vector_store = ElasticsearchStore(
    index_name="langchain-demo",
    embedding=embeddings,
    es_url="http://localhost:9200",
)

In [16]:
mypath = "./docs"
onlyfiles = [f for f in os.listdir(mypath) if os.path.isfile(os.path.join(mypath, f))]
onlyfiles

['Parser Source 2.pdf']

In [17]:
def load_file(path):
    """Load a PDF file and split it into LangChain Document chunks."""
    loader = PyPDFLoader(path)
    pages = loader.load_and_split()
    return pages

path = os.path.join("docs", onlyfiles[0])
pages = load_file(path)

In [46]:
pages[0]

Document(metadata={'source': 'docs/Parser Source 2.pdf', 'page': 0}, page_content='Company Profiles on following pages')

In [47]:
len(pages)

196

In [26]:
from uuid import uuid4

from langchain_core.documents import Document

In [27]:
uuids = [str(uuid4()) for _ in range(len(pages))]

vector_store.add_documents(documents=pages, ids=uuids)

['56e59f0d-a761-46e1-ba51-bd81e9a267d2',
 '27ea64cb-1995-47a0-a70d-058a1a87b9b2',
 'a632e509-dfdd-40e0-b2ae-3e6fc8ee4aef',
 '1d277a30-17a3-4d4b-ab77-47a718682c5c',
 '69e9b652-96f7-4f4f-9b47-54c658f83855',
 'b1a2e6bb-5bb0-43f6-b397-be7bde8e0263',
 '66f53694-80a9-4c68-8b3e-f5c424e953b9',
 '042ecfe6-3f67-4bbf-9332-8afec6485816',
 'f22cb185-75c5-427b-8589-49dd83d41931',
 '17ec4d9c-5f52-4fca-b293-8d47242e7fb2',
 '6fb88ff6-d9fb-44fd-b745-087aebffdcb7',
 'c4c465e0-7880-467b-abd2-5badd6eb5dd1',
 'e1003769-96f0-435c-8faa-e419bd1a01c8',
 '02a4fa78-8231-4cd0-893c-a21a760ec836',
 '3fe7ba4e-802a-4617-bed6-347c15a2ea40',
 '65bc1daa-ca65-4261-ba30-86f56973ef84',
 '7c7cb1df-fedd-43b4-9b2f-1c62f72eb170',
 'b40e98ee-1b97-49a9-afb2-e0e4134ba0c2',
 'f4adb478-3cea-490c-9edb-d2e42573ae57',
 '82e99d24-3e0b-4464-9e9d-e68b55331d91',
 '9c555f71-e98f-405b-b61d-0ad820b02d91',
 '5e8db496-57a3-4e12-ba18-416e102026ab',
 'a629e83f-b61f-4e01-ab68-46c39a6030a0',
 '6a4e5dae-3ce2-4c36-acc0-5261f2846ce1',
 'c656ee2e-9d6b-
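
The call above embeds and indexes all pages in one request. For larger document sets, or when the embedding API enforces rate limits, the same work can be split into batches. The sketch below is one possible approach, not part of the original run: `batched`, `add_in_batches`, and `FakeStore` are hypothetical names, and `FakeStore` only imitates the `add_documents(documents=..., ids=...)` call so the example runs without a cluster.

```python
import itertools
import time


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, size))
        if not batch:
            return
        yield batch


def add_in_batches(store, documents, ids, batch_size=50, pause_seconds=0.0):
    """Call store.add_documents once per batch and collect the returned ids."""
    if len(documents) != len(ids):
        raise ValueError("documents and ids must have the same length")
    added = []
    for batch in batched(list(zip(documents, ids)), batch_size):
        batch_docs = [doc for doc, _ in batch]
        batch_ids = [doc_id for _, doc_id in batch]
        added.extend(store.add_documents(documents=batch_docs, ids=batch_ids))
        if pause_seconds:
            time.sleep(pause_seconds)  # crude pause between batches
    return added


class FakeStore:
    """Hypothetical stand-in for a vector store; records the size of each batch."""

    def __init__(self):
        self.calls = []

    def add_documents(self, documents, ids):
        self.calls.append(len(documents))
        return list(ids)


store = FakeStore()
result_ids = add_in_batches(store, list("abcdefg"), list(range(7)), batch_size=3)
print(store.calls)
```

With the real store, the call would look like `add_in_batches(vector_store, pages, uuids)`, assuming the variables defined earlier in this notebook.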

In [28]:
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
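
The `k` setting asks the retriever for the five stored chunks whose embeddings are closest to the query embedding. As a rough illustration of that idea, the sketch below ranks a few toy vectors by cosine similarity with an exhaustive scan. This is an assumption-laden simplification: the similarity metric and the approximate search that the store actually uses depend on its configuration, and the vectors and names here are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, doc_vectors, k=5):
    """Return the k (doc_id, score) pairs with the highest similarity to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical two-dimensional embeddings, for illustration only.
doc_vectors = {
    "page-0": [1.0, 0.0],
    "page-1": [0.7, 0.7],
    "page-2": [0.0, 1.0],
}
for doc_id, score in top_k([1.0, 0.1], doc_vectors, k=2):
    print(doc_id, round(score, 3))
```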

In [29]:
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro-001",
    credentials=vertex_credentials,
)

In [48]:
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
# filter={"source" :"docs\\Baremo 2015.pdf"}
chain = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, return_source_documents=True)

In [49]:
chat_history = []

query = """Provide Main Details of the company Aardvark Constructions Limited. Including following details:
Name:
Country:
Company Number:
Incorporated:
Company Type:
Company Status:
Primary Addresses Registered Office:
Accounting Dates:
Confirmation Statement:
"""
result = chain.invoke({"question": query, "chat_history": chat_history})

print(result['answer'])

Name: Aardvark Constructions Limited
Country: United Kingdom
Company Number: 123456
Incorporated: 20/10/2020
Company Type: Limited by Shares
Company Status: Active
Primary Addresses Registered Office: 6 Chancery Road, London, WC2A 5DP, United Kingdom
Accounting Dates: 
      Last Period End: 16/11/2022
      Current Period End: 16/11/2024
Confirmation Statement:
      Last Signed: 17/02/2023
      Filed: 17/02/2023
      Next Overdue: 03/03/2023 



In [50]:
result.keys()

dict_keys(['question', 'chat_history', 'answer', 'source_documents'])

In [52]:
len(result["source_documents"])

5
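
Because the chain was built with `return_source_documents=True`, each answer can be traced back to the pages it was drawn from. The sketch below groups retrieved documents by their `source` and `page` metadata, the two keys visible in the `pages[0]` output earlier. The helper name `pages_by_source` and the stand-in documents are assumptions for illustration; any object with a `metadata` dict works.

```python
from types import SimpleNamespace


def pages_by_source(source_documents):
    """Return {source: sorted page numbers} for the retrieved documents."""
    grouped = {}
    for doc in source_documents:
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page")
        pages_seen = grouped.setdefault(source, set())
        if page is not None:
            pages_seen.add(page)
    return {source: sorted(pages_seen) for source, pages_seen in grouped.items()}


# Hypothetical stand-ins for result["source_documents"].
fake_docs = [
    SimpleNamespace(metadata={"source": "docs/example.pdf", "page": 3}),
    SimpleNamespace(metadata={"source": "docs/example.pdf", "page": 1}),
    SimpleNamespace(metadata={"source": "docs/example.pdf", "page": 3}),
]
print(pages_by_source(fake_docs))
```

In this notebook the call would be `pages_by_source(result["source_documents"])`.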

In [53]:
chat_history = [(query, result["answer"])]
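
The history is a plain list of `(question, answer)` tuples that has to be updated after every call. One way to avoid repeating that bookkeeping is a small wrapper, sketched below. The names `ask` and `EchoChain` and the `max_turns` cap are assumptions added for this example; `EchoChain` only mimics the `invoke` call so the sketch runs without a model.

```python
def ask(chain, question, chat_history, max_turns=10):
    """Invoke the chain, record the turn in chat_history, and return the full result."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    result = chain.invoke({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    # Drop the oldest turns in place so the history does not grow without bound.
    del chat_history[:-max_turns]
    return result


class EchoChain:
    """Hypothetical stand-in that mimics the chain's invoke() input and output keys."""

    def invoke(self, inputs):
        return {"answer": f"echo: {inputs['question']}"}


history = []
for question in ["first", "second", "third"]:
    ask(EchoChain(), question, history, max_turns=2)
print(history)
```

With the real chain, `ask(chain, query2, chat_history)` would replace the separate invoke and append steps.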

In [54]:
query2 = """From Management Details extract:
Managed By:
Managed By Email:
"""
result = chain.invoke({"question": query2, "chat_history": chat_history})
print(result['answer'])

The document lists "Caroline McPartland" as "Managed By" with the email address "cmcpartland@diligent.com". 



In [35]:
chat_history.append((query2, result["answer"]))
chat_history

[('Provide Main Details of the company Aardvark Constructions Limited. Including following details:\nName:\nCountry:\nCompany Number:\nIncorporated:\nCompany Type:\nCompany Status:\nPrimary Addresses Registered Office:\nAccounting Dates:\nConfirmation Statement:\n',
  'Name: Aardvark Constructions Limited\nCountry: United Kingdom\nCompany Number: 123456\nIncorporated: 20/10/2020\nCompany Type: Limited by Shares\nCompany Status: Active\nPrimary Addresses Registered Office: 6 Chancery Road, London, WC2A 5DP, United Kingdom\nAccounting Dates: \n      Last Period End: 16/11/2022\n      Current Period End: 16/11/2024 \nConfirmation Statement:\n      Last Signed: 17/02/2023\n      Filed: 17/02/2023\n      Next Overdue: 03/03/2023 \n'),
 ('From Management Details extract:\nManaged By:\nManaged By Email:\n',
  'The document lists Caroline McPartland as "Managed By", and her email address is cmcpartland@diligent.com. \n')]

In [55]:
query3 = """Past Names of the Company with their period """
result = chain.invoke({"question": query3, "chat_history": chat_history})
print(result['answer'])

Aardvark Constructions Limited had two previous names:

* **Aardvark Construction** from 20/10/2020 to 20/10/2021
* **Aardvark and Son Ltd** from 20/10/2021 to 20/10/2022 



In [37]:
chat_history.append((query3, result["answer"]))
chat_history

[('Provide Main Details of the company Aardvark Constructions Limited. Including following details:\nName:\nCountry:\nCompany Number:\nIncorporated:\nCompany Type:\nCompany Status:\nPrimary Addresses Registered Office:\nAccounting Dates:\nConfirmation Statement:\n',
  'Name: Aardvark Constructions Limited\nCountry: United Kingdom\nCompany Number: 123456\nIncorporated: 20/10/2020\nCompany Type: Limited by Shares\nCompany Status: Active\nPrimary Addresses Registered Office: 6 Chancery Road, London, WC2A 5DP, United Kingdom\nAccounting Dates: \n      Last Period End: 16/11/2022\n      Current Period End: 16/11/2024 \nConfirmation Statement:\n      Last Signed: 17/02/2023\n      Filed: 17/02/2023\n      Next Overdue: 03/03/2023 \n'),
 ('From Management Details extract:\nManaged By:\nManaged By Email:\n',
  'The document lists Caroline McPartland as "Managed By", and her email address is cmcpartland@diligent.com. \n'),
 ('Past Names of the Company with their period ',
  'Aardvark Constructi

In [56]:
query4 = """Appointments Board Positions"""
result = chain.invoke({"question": query4, "chat_history": chat_history})
print(result['answer'])

Appointments  
Board Positions  
Name  QuickRef  Position  Appointed  Job Title  
Abbles, James  ABBLES -J Director  19/04/2023  Trainer  
Abdreatta, Leopoldo  ABDREATT -L Director  18/10/2023  Secretary  
Adam, Nicole  ADAMS -N Alternate Director  04/04/2023  CFO  
  Non Executive 
Director  10/04/2024  CFO  
Alberts, Stoffel  ALBERTS -S Company Secretary  16/12/2022  Accountant  
Rutter, Gus  RUTTER -G Director  07/03/2024  Director  
 
Past Appointments  
Name  QuickRef  Position  Appointed  Resigned  
Malek, Mohammed  MALEK -M Director  22/07/2022  18/01/2024 

