### Importing required libraries


In [14]:
!pip list | grep langchain

langchain                                0.3.25
langchain-community                      0.3.25
langchain-core                           0.3.78
langchain-experimental                   0.3.4
langchain-google-genai                   2.0.10
langchain-ollama                         0.3.10
langchain-openai                         0.3.34
langchain-text-splitters                 0.3.11
langchainhub                             0.1.21


In [15]:
# You can use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
# from langchain.embeddings import HuggingFaceEmbeddings
from langchain_ollama import ChatOllama, OllamaEmbeddings

from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

from ibm_watsonx_ai.foundation_models import Model
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes, DecodingMethods
# from ibm_watson_machine_learning.foundation_models.extensions.langchain import WatsonxLLM
import wget

### Load the document

The document, which is provided in a TXT format, outlines some company policies and serves as an example data set for the project.

This is the `load` step in `Indexing`.<br>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/MPdUH7bXpHR5muZztZfOQg.png" width="50%" alt="split"/>

In [16]:
filename = 'companyPolicies.txt'
url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/6JDbUb_L3egv_eOkouY71A.txt'

# Use wget to download the file
wget.download(url, out=filename)
print('file downloaded')

file downloaded


In [17]:
with open(filename, 'r') as file:
    # Read the contents of the file
    contents = file.read()
    print(type(contents))

<class 'str'>


### Splitting the document into chunks


In this step, you split the document into chunks. This is the `split` process in `Indexing`.
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/0JFmAV5e_mejAXvCilgHWg.png" width="50%" alt="split"/>

LangChain's text splitters divide a long document into smaller parts, called `chunks`, that are easier to embed and retrieve.

`CharacterTextSplitter` first cuts the text at a separator and then merges neighboring pieces into chunks of at most `chunk_size` characters. This project uses a chunk size of 1000. That limit is a target rather than a hard cap: when a single piece between two separators is already longer than 1000 characters, the splitter cannot cut it further, so it keeps the piece as one oversized chunk and logs a `Created a chunk of size ...` warning, as the output below shows. By default, `CharacterTextSplitter` uses `\n\n` as the separator. You can change it with the `separator` parameter, for example `separator="\n"`.
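The split-then-merge behavior can be sketched in plain Python. This is a simplified illustration, not LangChain's actual implementation: `split_then_merge` is a hypothetical helper, and it ignores `chunk_overlap` (which is 0 in this project). It shows why a paragraph longer than the limit ends up as an oversized chunk.

```python
def split_then_merge(text: str, separator: str = "\n\n", chunk_size: int = 1000) -> list[str]:
    """Simplified character splitter (no overlap): cut on `separator`, then
    greedily merge pieces while the joined chunk stays within `chunk_size`."""
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in pieces:
        # Extra length if this piece joins the current chunk (piece plus one separator).
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            # Adding the piece would exceed the limit, so close the current chunk.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        # A piece longer than chunk_size is still kept whole, producing an oversized chunk.
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


# Made-up text: the third paragraph alone is longer than the 1000-character limit.
sample = "\n\n".join(["a" * 300, "b" * 300, "c" * 1500, "d" * 200])
sizes = [len(chunk) for chunk in split_then_merge(sample)]
print(sizes)  # [602, 1500, 200]
```

The first two paragraphs fit together in one chunk, while the 1500-character paragraph cannot be cut on `\n\n` and stays whole, which mirrors the warnings in the output below.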


In [18]:
loader = TextLoader(filename)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(len(texts))

Created a chunk of size 1624, which is longer than the specified 1000
Created a chunk of size 1885, which is longer than the specified 1000
Created a chunk of size 1903, which is longer than the specified 1000
Created a chunk of size 1729, which is longer than the specified 1000
Created a chunk of size 1678, which is longer than the specified 1000
Created a chunk of size 2032, which is longer than the specified 1000
Created a chunk of size 1894, which is longer than the specified 1000


16


### Embedding and storing


This step is the `embed` and `store` processes in `Indexing`. <br>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/u_oJz3v2cSR_lr0YvU6PaA.png" width="50%" alt="split"/>


In [19]:
# Use your Ollama server URL/model
embeddings = OllamaEmbeddings(model="nomic-embed-text:v1.5", base_url="http://localhost:11434") 

docsearch = Chroma.from_documents(texts, embeddings)  # store the embedding in docsearch using Chromadb
print('document ingested')

Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


document ingested
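Conceptually, `Chroma.from_documents` stores one embedding vector per chunk, and a retriever returns the chunks whose vectors are closest to the query's vector. The sketch below illustrates that lookup with cosine similarity over made-up 3-dimensional vectors. The texts and numbers are invented, real `nomic-embed-text` embeddings have far more dimensions, and Chroma's own distance function is configurable and need not be cosine; `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored chunks most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy vector store: (chunk text, made-up embedding) pairs.
store = [
    ("Mobile phones may be used for work calls.", [0.9, 0.1, 0.0]),
    ("Smoking is not permitted in company vehicles.", [0.0, 0.2, 0.9]),
    ("Employees must report lost devices promptly.", [0.7, 0.3, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "what is mobile policy?"
print(top_k(query_vec, store, k=2))
```

In the notebook, `docsearch.as_retriever(search_kwargs={"k": 2})` would ask the real retriever for its two closest chunks in the same spirit.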


### LLM model construction


This step sets up the `LLM` part of the `Retrieval` task. <br>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/UZXQ44Tgv4EQ2-mTcu5e-A.png" width="50%" alt="split"/>


In [20]:
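# Chat models served by the local Ollama server; pull them first (for example, `ollama pull llama3.2`).
llama_llm = ChatOllama(model="llama3.2")
deepseek_r1_llm = ChatOllama(model="deepseek-r1:1.5b")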

### Integrating LangChain

LangChain has a number of components that are designed to help retrieve information from the document and build question-answering applications, which helps you complete the `retrieve` part of the `Retrieval` task. <br>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/M4WpkkMMbfK0Wkz0W60Jiw.png" width="50%" alt="split"/>

In the following steps, you create a simple Q&A application over the document source using LangChain's `RetrievalQA`.

Then, you ask the query "what is mobile policy?"
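With `chain_type="stuff"`, the chain places all retrieved chunks into a single prompt and calls the LLM once. A minimal sketch of that idea, assuming a hypothetical `fake_llm` in place of a real model and a simplified prompt layout (LangChain's actual default prompt wording differs):

```python
from typing import Callable


def stuff_chain(question: str, retrieved_chunks: list[str], llm: Callable[[str], str]) -> dict:
    """Sketch of the 'stuff' strategy: join every retrieved chunk into one
    context string, put it in a single prompt, and call the LLM once."""
    context = "\n\n".join(retrieved_chunks)
    prompt = f"{context}\n\nQuestion: {question}\n"
    return {"query": question, "result": llm(prompt)}


# Hypothetical stand-in for an LLM: reports how many chunks reached the prompt.
def fake_llm(prompt: str) -> str:
    return f"received {prompt.count('CHUNK')} chunks"


chunks = ["CHUNK 1: rules for phones at work.", "CHUNK 2: rules for laptops."]
print(stuff_chain("what is mobile policy?", chunks, fake_llm))
```

Because every chunk goes into one prompt, this approach is simple but requires all retrieved chunks to fit in the model's context window.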

In [21]:
qa = RetrievalQA.from_chain_type(llm=llama_llm, 
                                 chain_type="stuff", 
                                 retriever=docsearch.as_retriever(), 
                                 return_source_documents=False)
query = "what is mobile policy?"
qa.invoke(query)

Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given


{'query': 'what is mobile policy?',
 'result': 'I don\'t have a specific definition for "Mobile Phone Policy" in mind, as it can vary widely depending on the context (e.g., workplace, school, organization). However, I can provide some general information.\n\nA Mobile Phone Policy typically outlines guidelines and rules for using mobile phones within a particular setting or organization. These policies may cover topics such as:\n\n* Permitted use of mobile phones during work hours or classes\n* Restricted areas where mobile phones are not allowed\n* Rules for handling confidential or sensitive information on mobile devices\n* Procedures for reporting lost, stolen, or damaged mobile phones\n* Guidelines for maintaining workplace or academic productivity while using mobile phones\n\nIf you could provide more context or clarify which Mobile Phone Policy you\'re referring to (e.g., workplace policy, school policy), I\'d be happy to try and help further!'}

In [22]:
qa = RetrievalQA.from_chain_type(llm=deepseek_r1_llm, 
                                 chain_type="stuff", 
                                 retriever=docsearch.as_retriever(), 
                                 return_source_documents=False)
query = "Can you summarize the document for me?"
qa.invoke(query)

{'query': 'Can you summarize the document for me?',
 'result': '<think>\nOkay, so I just saw this user message asking if I can summarize the document they provided. Let me read through it again to make sure I understand what they\'re looking for.\n\nThe original text is a lot of the same information about a policy called "Discipline and Termination Policy." It\'s structured in three main sections: an introduction, performance conduct expectations, disciplinary actions, termination procedures, and an exit process. Each section repeats similar points from previous versions but with some minor variations.\n\nHmm, I notice that the user has pasted the same information twice already. The first time they provided a detailed summary after my initial response, the second time the user added another set of notes. But since the content is identical, it\'s clear that each section repeats exactly what was before.\n\nI think the main task here is to understand if I can create a concise summary base
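The `deepseek-r1` answer above starts with a `<think>` block of intermediate reasoning. If you want only the final answer, a minimal sketch is to strip that block with a regular expression, assuming the reasoning is wrapped in `<think>...</think>` tags. `strip_think` is a hypothetical helper; a reply cut off before `</think>` is returned unchanged apart from surrounding whitespace.

```python
import re

# Matches a <think>...</think> block (across newlines) plus the whitespace after it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks and return the remaining answer."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nLet me read the policy first.\n</think>\n\nThe document covers company policies."
print(strip_think(raw))  # The document covers company policies.
```

For example, `strip_think(qa.invoke(query)["result"])` would print only the text after the reasoning block.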

### Dive deeper


How do you add a prompt to retrieval in LangChain? <br>

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/bvw3pPRCYRUsv-Z2m33hmQ.png" width="50%" alt="split"/>


Use prompts to guide an LLM's responses the way you want. For instance, if the LLM is uncertain about an answer, you can instruct it to simply say "I do not know" instead of generating a speculative response.


In [23]:
# The query asks about something the document does not cover.
# Without explicit instructions, the LLM may speculate beyond the retrieved text instead of saying it doesn't know.
# To prevent this, you add a prompt to the chain.
qa = RetrievalQA.from_chain_type(llm=llama_llm, 
                                 chain_type="stuff", 
                                 retriever=docsearch.as_retriever(), 
                                 return_source_documents=False)
query = "Can I eat in company vehicles?"
qa.invoke(query)

{'query': 'Can I eat in company vehicles?',
 'result': 'I don\'t know, but according to the policy text provided, it doesn\'t explicitly mention eating in company vehicles. However, it does state that "Smoking is not permitted in company vehicles," which could be interpreted to mean that eating or drinking might also be prohibited or restricted in some way, although this isn\'t made clear.'}

#### `Prompt Template`


In [24]:
prompt_template = """Use the information from the document to answer the question at the end. If you don't know the answer, just say that you don't know; definitely do not try to make up an answer.

{context}

Question: {question}
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

chain_type_kwargs = {"prompt": PROMPT}
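`PromptTemplate` uses Python `str.format`-style placeholders by default (`template_format="f-string"`), and in `RetrievalQA` the stuff chain fills `{context}` with the retrieved chunks and `{question}` with the query. The sketch below renders a shortened, hypothetical copy of the template with plain `str.format` and a made-up retrieved chunk; `PROMPT.format(context=..., question=...)` should produce the same kind of string.

```python
# Shortened, hypothetical version of the template above; same placeholders.
template = (
    "Use the information from the document to answer the question at the end. "
    "If you don't know the answer, just say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
)

# Made-up retrieved text standing in for the chunks the retriever returns.
retrieved = ["Smoking is not permitted in company vehicles."]
rendered = template.format(context="\n\n".join(retrieved), question="Can I eat in company vehicles?")
print(rendered)
```

Seeing the rendered prompt makes it clear that the LLM only sees the retrieved text plus your instruction, which is why the "say you don't know" wording changes its answer.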

In [25]:
qa = RetrievalQA.from_chain_type(llm=llama_llm, 
                                 chain_type="stuff", 
                                 retriever=docsearch.as_retriever(), 
                                 chain_type_kwargs=chain_type_kwargs, 
                                 return_source_documents=False)

query = "Can I eat in company vehicles?"
qa.invoke(query)

{'query': 'Can I eat in company vehicles?',
 'result': 'No, the document does not mention eating in company vehicles. It only addresses smoking policies, specifically stating that smoking is not permitted inside or in any company vehicle to maintain their condition and cleanliness.'}

#### `Make the conversation have memory`

Want conversations with an LLM to feel more like a dialogue with a friend who remembers what you talked about last time? An LLM that retains the memory of previous exchanges can hold a more coherent and contextually rich conversation.

To give the LLM memory, you use LangChain's `ConversationBufferMemory` class together with `ConversationalRetrievalChain`.
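`ConversationBufferMemory` keeps the full list of previous turns, and `ConversationalRetrievalChain` uses that history to rewrite a follow-up such as "List points in it?" into a standalone question before retrieving chunks. The class, function, and prompt wording below are hypothetical stand-ins that sketch the idea, not LangChain's internals.

```python
class BufferMemory:
    """Hypothetical minimal conversation buffer: keeps every (question, answer)
    turn and renders them as a Human/Assistant transcript."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def transcript(self) -> str:
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def condense_prompt(memory: BufferMemory, follow_up: str) -> str:
    """Build a prompt asking the LLM to rewrite a follow-up as a standalone question."""
    return (
        "Given the conversation below, rephrase the follow-up question "
        "as a standalone question.\n\n"
        f"{memory.transcript()}\n\nFollow-up: {follow_up}\nStandalone question:"
    )


demo_memory = BufferMemory()
demo_memory.save("What is mobile policy?", "It covers how employees use mobile phones.")
print(condense_prompt(demo_memory, "List points in it?"))
```

Without the transcript, "it" in the follow-up would have nothing to refer to, so retrieval would search for the wrong thing.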



In [26]:
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

In [27]:
# get_chat_history is left at its default, which formats the stored messages as text for the chain.
qa = ConversationalRetrievalChain.from_llm(llm=llama_llm, 
                                           chain_type="stuff", 
                                           retriever=docsearch.as_retriever(), 
                                           memory=memory, 
                                           return_source_documents=False)

In [28]:
history = []

In [29]:
query = "What is mobile policy?"
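# With `memory` attached, the chain loads chat_history from memory; that value takes precedence over the one passed here.
result = qa.invoke({"question": query, "chat_history": history})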
print(result["answer"])

I don't know the specific details of a "Mobile Phone Policy" in this context, but I can try to help you find more information.

A Mobile Phone Policy is likely a set of rules or guidelines that govern the use of personal mobile devices on an organization's premises, during work hours, or as part of company business. This policy might cover topics such as:

* Permitted and prohibited uses of mobile phones
* Network security and data protection
* Personal vs. work-related activities on mobile devices
* Confidentiality and information sharing

If you could provide more context or specify which organization's Mobile Phone Policy you are referring to, I'd be happy to try and help you find the specific details you're looking for.


In [30]:
history.append((query, result["answer"]))  # Record this turn (query, answer) in the local history list.

In [31]:
query = "List points in it?"
result = qa.invoke({"question": query, "chat_history": history})
print(result["answer"])

A mobile phone policy should typically cover the following key points:

1. Allowance and Use: Guidelines for when mobile phones can be used on company premises, including during work hours and in meetings.
2. Security and Encryption: Requirements for encryption methods, password protection, and secure storage of sensitive data.
3. Access to Company Information: Rules regarding access to company data, emails, and other digital resources on personal devices.
4. Monitoring and Enforcement: Procedures for monitoring mobile phone use, reporting breaches, and disciplinary actions for non-compliance.
5. Device Management: Guidelines for managing personal and company-owned devices, including maintenance, updates, and disposal procedures.
6. Personal Use of Company Resources: Rules governing the use of company resources, such as data plans, internet access, or device costs, on personal mobile phones.
7. Data Protection and Storage: Regulations for storing and handling sensitive data, both in tr

In [32]:
history.append((query, result["answer"]))  # Record this turn (query, answer) in the local history list.

In [33]:
query = "What is the aim of it?"
result = qa.invoke({"question": query, "chat_history": history})
print(result["answer"])

I don't know the specific details or aims of a typical mobile phone policy, as it can vary depending on the organization, workplace, or institution. However, I can provide some general information.

A mobile phone policy is typically designed to manage the use of personal mobile devices within an organization's context, such as the workplace. The main goals of such policies usually include:

1. Ensuring data security and protecting confidential information.
2. Minimizing distractions and maintaining productivity.
3. Preventing unauthorized use or misuse of company resources.
4. Establishing guidelines for acceptable mobile phone behavior.

These policies may cover aspects like personal vs. work-related usage, screen time limits, app restrictions, backup procedures, and consequences for non-compliance.

If you're looking for specific details about a particular policy, could you please provide more context or information about the organization or institution you're interested in?


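#### `Return the source from the document`

To see which chunks an answer came from, build the chain with `return_source_documents=True`; the result dictionary then also contains a `source_documents` list. If the chain also has memory attached, create the memory with `output_key="answer"` so that it knows which output to store.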

In [36]:
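# The chains above use return_source_documents=False, so `result` has no 'source_documents' key yet.
# print(result['source_documents'][0])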

### Chat 

A chat loop that retrieves information from the document and keeps a memory of the conversation.
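Because `input()` waits for the keyboard, the loop in the next cell is awkward to test. A minimal sketch of the same control flow, assuming the questions come from a list and a hypothetical `answer_fn` stands in for the chain:

```python
from typing import Callable, Iterable

EXIT_WORDS = {"quit", "exit", "bye"}


def run_chat(questions: Iterable[str], answer_fn: Callable[[str, list], str]) -> list[tuple[str, str]]:
    """Answer each question in turn, stopping at the first exit word.
    Returns the (question, answer) history."""
    history: list[tuple[str, str]] = []
    for query in questions:
        if query.strip().lower() in EXIT_WORDS:
            break
        answer = answer_fn(query, history)
        history.append((query, answer))
    return history


# Hypothetical answer function; in the notebook, the retrieval chain plays this role.
def echo_answer(query: str, history: list) -> str:
    return f"turn {len(history) + 1}: {query}"


log = run_chat(["what is mobile policy?", "list points in it", "bye", "ignored"], echo_answer)
for _, answer in log:
    print(answer)
```

Separating the loop from the chain this way lets you check the exit handling and history bookkeeping without a running Ollama server.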

In [41]:
def chat():
    # Fresh memory for each chat session; return_messages=True stores the turns as message objects.
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    qa = ConversationalRetrievalChain.from_llm(llm=deepseek_r1_llm, 
                                               chain_type="stuff", 
                                               retriever=docsearch.as_retriever(), 
                                               memory=memory, 
                                               return_source_documents=False)
    history = []
    while True:
        query = input("Question: ")
        print("Query: ", query)
        
        # Stop the loop on any of the exit words.
        if query.lower() in ["quit", "exit", "bye"]:
            print("Answer: Goodbye!")
            break
        
        # The attached memory supplies chat_history; `history` keeps a local copy of the turns.
        result = qa.invoke({"question": query, "chat_history": history})
        
        history.append((query, result["answer"]))
        
        print("Answer: ", result["answer"], end="\n\n")

In [42]:
chat()

Query:  what are all the policies?
Answer:  <think>
Okay, so I need to figure out all the policies related to drug and alcohol in the context given. The user provided a section labeled "Drug and Alcohol Policy" with several instances of that label but no actual content inside them. Hmm, that's confusing.

First, I'll try to recall any official documents or sections related to drug and alcohol policies. Maybe there are state laws or federal regulations? But without the specific text in between, it's hard to be precise. The user mentioned "Drug and Alcohol Policy" six times with no content each time. Perhaps the actual policies were previously discussed in another part of the context but got cut off.

I wonder if I'm missing something from elsewhere in the document that isn't visible here. Maybe there are sub-sections or additional rules beyond what's presented. Without more details, it's challenging to list all policies accurately. It might also be helpful to know if these policies are 