## Set pythonpath to root directory

In [None]:
import os
import sys
print(sys.executable)

# Get the parent directory
parent_directory = os.path.abspath(os.path.join(os.getcwd(), '..'))

# Add the parent directory to the PYTHONPATH
if parent_directory not in sys.path:
    sys.path.append(parent_directory)

print(f"Parent Directory added to PYTHONPATH: {parent_directory}")

## Initialize Chroma DB with saved embedding

In [5]:
# Create the file path to the 'embeddings' folder
persist_directory = os.path.join(parent_directory, 'embeddings')

In [6]:
# Load Embedding Model 
from langchain_chroma import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

model_name = "multi-qa-mpnet-base-dot-v1"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
hf = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)



In [7]:
# Create the vector store and specify the persist directory
vectorstore = Chroma(persist_directory=persist_directory, embedding_function=hf)

## Load & Test LLM

In [37]:
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_community.llms import Ollama

llm = Ollama(
    model="llama3", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
)

In [29]:
# Test the LLM with a single prompt
prompt = ["Why is the sky blue?"]  # Prompt should be a list of strings

# Generate text using the Ollama model
generated_text = llm.generate(prompts=prompt)


What a great question!

The sky appears blue because of a phenomenon called Rayleigh scattering, named after the British physicist Lord Rayleigh. It's a process that occurs when sunlight interacts with tiny molecules of gases in the Earth's atmosphere, such as nitrogen (N2) and oxygen (O2).

Here's what happens:

1. When sunlight enters the Earth's atmosphere, it encounters these small gas molecules.
2. The shorter, blue wavelengths of light are scattered more than the longer, red wavelengths by the tiny molecules. This is because the smaller molecules are more effective at scattering shorter wavelengths of light.
3. As a result, the blue light is dispersed in all directions and reaches our eyes from all parts of the sky.
4. Meanwhile, the longer, redder wavelengths continue to travel in a straight line, reaching our eyes only when they're overhead, which is why the sun often appears yellow or orange due to these longer wavelengths.

So, when we look at the sky, we see the blue light t

## Setup RAG

### Instantiate retriever

In [11]:
retriever = vectorstore.as_retriever()
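
To make the retrieval step concrete, here is a minimal, library-free sketch of top-k similarity search. The toy vectors, the `top_k` helper, and the dot-product scoring are assumptions made for illustration; the metric Chroma actually applies depends on how the collection was configured, and `k=4` simply mirrors the four documents this notebook's retriever is described as returning.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: List[float], doc_vecs: List[List[float]], k: int = 4) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k highest-scoring documents."""
    scored = [(i, dot(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings" (hypothetical values, not real model output)
doc_vecs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
query_vec = [1.0, 0.0]

hits = top_k(query_vec, doc_vecs, k=4)
print([index for index, _ in hits])  # -> [0, 3, 1, 2]
```

The fifth document points away from the query and is the only one dropped, which is the behavior a top-4 retriever should show on this toy data.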

### Chain with chat history 

* Step 1: Define a prompt template that combines a system prompt, `chat_history`, and the human query.
    * Step 1a: Define a simple retriever that returns the 4 most similar documents from the vector store.
* Step 2: Define a second prompt template that reformulates the user's query given the chat history.
    * Step 2a: Redefine the retriever to take the contextualized query as input.
* Step 3: Create a standard Q&A chain that combines the LLM, the first prompt template, and the formatted context.
* Step 4: Create a final RAG chain that applies the history-aware retriever and the standard Q&A chain in sequence to generate a response and retain the retrieved context.
* Step 5: Store the chat message history in an SQLite DB. Configure each session with a unique SQLite session ID.
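
The data flow of the steps above can be sketched without any LangChain dependency. Everything in this block is a stand-in: `rag_turn` and the three `fake_*` callables are hypothetical names, and the history is kept in a plain list instead of SQLite. The sketch only shows the order of operations and the keys carried in the result.

```python
from typing import Callable, Dict, List


def rag_turn(
    question: str,
    chat_history: List[str],
    reformulate: Callable[[str, List[str]], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str], str], str],
) -> Dict[str, object]:
    """One conversational turn: reformulate -> retrieve -> generate."""
    standalone = reformulate(question, chat_history)    # Step 2: standalone query
    docs = retrieve(standalone)                         # Step 2a: retrieve with it
    context = "\n\n".join(docs)                         # formatted context
    answer = generate(context, chat_history, question)  # Step 3: Q&A over context
    chat_history.extend([question, answer])             # Step 5: persist the turn
    # Step 4: keep the retrieved context next to the answer
    return {"input": question, "context": docs, "answer": answer}


# Hypothetical stand-ins for the LLM and the retriever
def fake_reformulate(question: str, history: List[str]) -> str:
    return question if not history else f"{question} (regarding: {history[0]})"


def fake_retrieve(query: str) -> List[str]:
    return [f"section about '{query}'"]


def fake_generate(context: str, history: List[str], question: str) -> str:
    return f"Answer to '{question}' using {len(context)} characters of context"


history: List[str] = []
first = rag_turn("Can I be a teacher?", history, fake_reformulate, fake_retrieve, fake_generate)
second = rag_turn("What about a nurse?", history, fake_reformulate, fake_retrieve, fake_generate)
print(second["context"])
```

The second turn retrieves with a query that carries information from the first one, which is the point of reformulating before retrieval.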

### Step 1

In [31]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts import MessagesPlaceholder

system_prompt = (
'''You are a compassionate legal expert tasked with translating Virginia legal restrictions into helpful plaintext for jobseekers with felony or misdemeanor convictions. Your goal is to help them understand what jobs or certifications they can pursue while maintaining a supportive and encouraging tone.

1. Begin with a brief disclaimer: Remind the user that you cannot provide personalized legal advice and that all information is general. Emphasize the importance of consulting with a legal professional for specific guidance.

2. If not provided, ask for relevant details about the user's specific situation (e.g., type of conviction, how long ago it occurred) to provide more accurate information.

3. Use only the following sections of the Virginia legal code to answer the user's query:
   {context}

4. Provide a clear and concise answer addressing the user's query, including as many relevant details as possible from the context. Always cite the specific section of the code you're referencing.

5. If there are restrictions that employers can waive, describe those options clearly.

6. If you are uncertain about any of the restrictions or if none of the sections of the code answer the query, state your uncertainty and recommend consulting a legal professional.

7. Suggest similar jobs or certifications that the user can legally pursue with their conviction in Virginia. Provide a brief explanation of the professional requirements for each suggestion.

8. Encourage the user to conduct further research and provide suggestions for additional resources they can consult (e.g., state employment agencies, legal aid organizations).

9. Conclude with a supportive message, reminding the user that there are often pathways to employment despite past convictions.

10. For follow-up questions:
    a. Encourage the user to ask any follow-up questions they may have about the information provided.
    b. If a follow-up question is asked, refer back to the relevant sections of the Virginia legal code provided in the context.
    c. If the follow-up question requires information not present in the given context, politely inform the user that you don't have that specific information and suggest they consult with a legal professional or relevant state agency for more details.
    d. Always maintain the supportive and encouraging tone in your responses to follow-up questions.

Remember to maintain a balance between providing accurate information and offering encouragement to the jobseeker throughout the initial response and any follow-up interactions.
''')

prompt_plain = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder('chat_history'),
        ("human", "{input}"),
    ]
)
prompt_plain.pretty_print()


You are a compassionate legal expert tasked with translating Virginia legal restrictions into helpful plaintext for jobseekers with felony or misdemeanor convictions. Your goal is to help them understand what jobs or certifications they can pursue while maintaining a supportive and encouraging tone.

1. Begin with a brief disclaimer: Remind the user that you cannot provide personalized legal advice and that all information is general. Emphasize the importance of consulting with a legal professional for specific guidance.

2. If not provided, ask for relevant details about the user's specific situation (e.g., type of conviction, how long ago it occurred) to provide more accurate information.

3. Use only the following sections of the Virginia legal code to answer the user's query:
   {context}

4. Provide a clear and concise answer addressing the user's query, including as many relevant details as possible from the context. Always cite the specific section of the code 

### Step 2

In [32]:
# Define a subchain that takes historical messages and the latest user question
from langchain.chains import create_history_aware_retriever 
from langchain_core.output_parsers import StrOutputParser

contextualize_q_system_prompt = '''Given a chat history and the latest user question 
which might reference context in the chat history, formulate a standalone question which can 
be understood without a chat history. Do NOT answer the question, just reformulate it if needed 
and otherwise return it as is.'''

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ('system', contextualize_q_system_prompt), 
        MessagesPlaceholder('chat_history'),
        ('human', '{input}'), 
    ] 
) 

contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
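
As a rough illustration of what the history-aware retriever decides on each call: when the chat history is empty, the user input goes to the retriever unchanged; otherwise the LLM first rewrites it into a standalone query. The sketch below mimics only that branch. `history_aware_query` and `fake_llm_rewrite` are hypothetical names, and the rewrite rule is a placeholder for a real LLM call.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text)


def history_aware_query(
    user_input: str,
    chat_history: List[Message],
    rewrite_with_llm: Callable[[List[Message], str], str],
) -> str:
    """Return the query that would be sent to the retriever."""
    if not chat_history:
        # No history: nothing to resolve, so skip the LLM call entirely
        return user_input
    return rewrite_with_llm(chat_history, user_input)


llm_calls: List[str] = []


def fake_llm_rewrite(history: List[Message], user_input: str) -> str:
    """Placeholder rewrite: append the most recent human message as context."""
    llm_calls.append(user_input)
    last_human = [text for role, text in history if role == "human"][-1]
    return f"{user_input} [context: {last_human}]"


q1 = history_aware_query("What jobs can I get in healthcare?", [], fake_llm_rewrite)
history = [
    ("human", "What jobs can I get in healthcare?"),
    ("ai", "Some options are ..."),
]
q2 = history_aware_query("What certifications could I get?", history, fake_llm_rewrite)
print(q1)
print(q2)
```

One consequence of this design is that the first turn of a session costs one LLM call fewer than later turns.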

### Steps 3, 4 & 5

In [33]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from typing import List, Dict
from langchain_core.documents import Document
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_community.chat_message_histories import SQLChatMessageHistory

custom_document_prompt = PromptTemplate.from_template(
    '''
Content: {page_content}

Metadata:
- Chapter Number: {ChapterNum}
- Chapter Name: {ChapterName}
- Article Number: {ArticleNum}
- Article Name: {ArticleName}
- Section Number: {SectionNumber}
- Section Title: {SectionTitle}
'''
)
        
question_answer_chain = create_stuff_documents_chain(
    llm=llm,
    prompt=prompt_plain,
    document_prompt=custom_document_prompt,
    output_parser=StrOutputParser(),
)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    lambda session_id: SQLChatMessageHistory(
        session_id=session_id, connection_string="sqlite:///sqlite.db"
    ),
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer"
)
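
The document prompt in the cell above controls how each retrieved document is rendered before the pieces are joined into `{context}`. A minimal sketch of that formatting with plain `str.format` follows. The shortened template, the helper names, and the second document's placeholder metadata are assumptions for illustration; the `"\n\n"` separator matches the default of `create_stuff_documents_chain`.

```python
from typing import Dict, List, Tuple

# Shortened version of the notebook's document template (two metadata fields only)
DOCUMENT_TEMPLATE = """Content: {page_content}

Metadata:
- Section Number: {SectionNumber}
- Section Title: {SectionTitle}
"""


def format_document(page_content: str, metadata: Dict[str, str]) -> str:
    """Render one document; every template field must exist in the metadata."""
    return DOCUMENT_TEMPLATE.format(page_content=page_content, **metadata)


def stuff_documents(docs: List[Tuple[str, Dict[str, str]]], separator: str = "\n\n") -> str:
    """Join all rendered documents into a single context string."""
    return separator.join(format_document(text, meta) for text, meta in docs)


docs = [
    ("Text of the first section ...",
     {"SectionNumber": "15.2-709.1",
      "SectionTitle": "Applicant preemployment information in Arlington County"}),
    ("Text of a second section ...",
     {"SectionNumber": "<SECTION_NUMBER>", "SectionTitle": "<SECTION_TITLE>"}),
]
context = stuff_documents(docs)
print(context)

# A document that lacks a template field cannot be rendered
try:
    format_document("text", {"SectionNumber": "1"})
    missing_field = None
except KeyError as err:
    missing_field = err.args[0]
print(missing_field)  # -> SectionTitle
```

In this sketch a missing key surfaces as a `KeyError`. The real chain likewise needs every variable named in the document prompt to be present in each document's metadata, so documents ingested without, say, `ArticleName` would need a default value before they reach the chain.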

### Test RAG Chain with a prompt

In [34]:
# This is where we configure the session id
config = {"configurable": {"session_id": "<SQL_SESSION_ID>"}}
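
To show why the session ID matters, the sketch below stores messages in SQLite keyed by session and reads them back per session, using only the standard library. The table name, schema, and helper functions are hypothetical and are not the schema `SQLChatMessageHistory` uses; an in-memory database stands in for `sqlite.db`.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")  # stand-in for a file-backed database
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
)


def add_message(session_id: str, role: str, content: str) -> None:
    """Append one message to the given session."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def get_history(session_id: str) -> List[Tuple[str, str]]:
    """Return (role, content) pairs for one session in insertion order."""
    cursor = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return cursor.fetchall()


add_message("session-a", "human", "Can I be a teacher?")
add_message("session-a", "ai", "Here is some general information ...")
add_message("session-b", "human", "An unrelated conversation")

print(get_history("session-a"))
print(get_history("session-b"))
```

Because the history lives in a database file, reusing a session ID across notebook runs brings earlier conversations back into the prompt. Choosing a new ID is the simplest way to start from an empty history.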

In [35]:
# Example user query
query = "Can I be a teacher with a violent crime on my record?"

In [36]:
import time 
start_time = time.time()

result = conversational_rag_chain.invoke(
    {"input": query},
    config=config
)

# Stop the timer
end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time

I'm happy to help! As a compassionate legal expert, I'd like to provide some general information about Virginia's laws regarding education employment and job opportunities.

In Virginia, having a violent crime conviction may disqualify you from working as a teacher or in other roles that involve direct contact with students. The Virginia Department of Education has specific guidelines for hiring teachers and other school personnel, which include background checks and fingerprinting.

According to the Virginia Department of Education's policies, individuals with certain criminal convictions, including violent crimes, may be disqualified from employment in public schools. This is because the department aims to ensure a safe and secure learning environment for students.

However, it's essential to note that there may be alternative opportunities in education that don't require direct contact with students. For example:

1. Education Administrator: You could explore roles as an education a

In [18]:
print(elapsed_time) 

42.36584424972534
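
If several chain calls need timing, the start/stop bookkeeping can be wrapped in a small context manager. This is one possible sketch, not part of the original notebook: the `timer` helper and the `timings` dictionary are hypothetical names, and `time.perf_counter` is used because it is a monotonic clock intended for measuring intervals.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timer(label: str, results: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so failed calls are timed too
        results[label] = time.perf_counter() - start


timings: Dict[str, float] = {}

with timer("example", timings):
    time.sleep(0.05)  # stand-in for conversational_rag_chain.invoke(...)

print(f"{timings['example']:.3f} s")
```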


In [18]:
print(result['answer'])

I'm glad we're back on track! As a compassionate legal expert, I'd like to help you explore your options in the teaching profession despite having a violent crime conviction.

In Virginia, having a violent crime conviction may disqualify you from working as a teacher. According to the Virginia Code (§ 22.1-296), individuals with certain criminal convictions, including violent crimes, are not eligible for teacher certification.

However, it's essential to note that there might be alternative paths or exceptions that could allow you to pursue a teaching career. For instance:

1. Alternative Certification Programs: Some states offer alternative certification programs for individuals who have non-traditional backgrounds or lack traditional education credentials. These programs may provide an opportunity for you to become certified as a teacher.
2. Special Education: If you're interested in working with special needs students, there might be opportunities available that don't require tradit

In [19]:
print(result['context'])

[Document(metadata={'ArticleName': 'General Powers; County Manager Plan', 'ArticleNum': '2', 'ChapterName': 'County Manager Plan of Government', 'ChapterNum': '7', 'Hrefs': 'http://law.lis.virginia.gov/vacode/19.2-389/; http://lis.virginia.gov/cgi-bin/legp604.exe?021+ful+CHAP0670; http://lis.virginia.gov/cgi-bin/legp604.exe?021+ful+CHAP0730; http://lis.virginia.gov/cgi-bin/legp604.exe?031+ful+CHAP0739', 'SectionNumber': '15.2-709.1', 'SectionTitle': 'Applicant preemployment information in Arlington County', 'TitleName': 'Counties, Cities and Towns', 'TitleNumber': '15.2', 'seq_num': 1, 'source': '/sfs/weka/scratch/zsk4gm/json_files_updated/15.2-709.1.json'}, page_content="being sought; (v) the extent and nature of the person's past criminal activity; (vi) the age of the person at the time of the commission of the crime; (vii) the amount of time that has elapsed since the person's last involvement in the commission of a crime; (viii) the conduct and work activity of the person prior to 

## Test Remembering from chat history

In [20]:
result_remember = conversational_rag_chain.invoke({"input": 'What is in our chat history?'}, 
                                        config=config)

In [21]:
print(result_remember['answer'])

Our chat history includes:

1. Your initial question about what jobs you can get in healthcare with a drug-related conviction.
2. My response, which provided general information about Virginia's laws regarding healthcare employment and mentioned that I am not a legal expert, but rather a compassionate legal expert tasked with translating Virginia legal restrictions into helpful plaintext.

I also suggested exploring state employment agencies, legal aid organizations, or professional associations related to healthcare for more information on potential job openings.
3. Your follow-up question about certifications or trainings you could pursue in the healthcare industry.
4. A brief detour from our original conversation to discuss Taylor Swift's latest album (which we didn't actually cover).
5. Your current question about whether you can be a teacher with a violent crime on your record.

Let me know if you have any further questions or concerns!


## Additional Queries

In [22]:
query2 = "What jobs can I get in healthcare if I've been convicted of a drug offense?"
result2 = conversational_rag_chain.invoke({"input": query2}, 
                                         config=config)

In [None]:
print(result2['answer'])

In [24]:
query2a = "What jobs can I get in healthcare if I've been convicted of a drug offense and don't require a waiver?"
result2a = conversational_rag_chain.invoke({"input": query2a}, 
                             config=config)
print(result2a['answer'])

I'm happy to help! As a compassionate legal expert, I'd like to provide some general information about Virginia's laws regarding healthcare employment and job opportunities.

In Virginia, having a drug-related conviction may not necessarily disqualify you from working in the healthcare industry. However, it's essential to understand that certain healthcare roles or certifications might be affected by your criminal record.

Here are some potential job opportunities in healthcare that you could explore without requiring a waiver:

1. Medical Assistant: With proper training and certification, you can work as a medical assistant in various healthcare settings.
2. Certified Nursing Assistant (CNA): You can pursue CNA training and certification to work directly with patients under the supervision of licensed nurses.
3. Health Information Technician: You can learn about health information technology and become certified to work in healthcare administration or data management.
4. Pharmacy Tech

In [25]:
query2b = "What certifications or trainings could I get"
result2b = conversational_rag_chain.invoke({"input": query2b},
                                          config=config)
print(result2b['answer'])

I'm happy to help! As a compassionate legal expert, I'd like to provide some general information about Virginia's laws regarding healthcare employment and job opportunities.

In Virginia, having a drug-related conviction may not necessarily disqualify you from working in the healthcare industry. However, it's essential to understand that certain healthcare roles or certifications might be affected by your criminal record.

Here are some potential certifications or trainings you could pursue in the healthcare industry:

1. Certified Medical Assistant (CMA): You can take a training program and pass an exam to become certified as a medical assistant.
2. Certified Nursing Assistant (CNA) Training: You can enroll in a CNA training program and pass a certification exam to work directly with patients under the supervision of licensed nurses.
3. Health Information Technician Certification: You can learn about health information technology and become certified to work in healthcare administrati

## Irrelevant Query

In [26]:
query3 = "What is Taylor Swift's latest album called?"
result3 = conversational_rag_chain.invoke({"input": query3}, 
                                         config=config)
print(result3['answer'])

I think we had a brief detour from our original conversation! To answer your question, I'm not aware of any information about Taylor Swift's latest album. Our conversation has been focused on exploring job opportunities in the healthcare industry despite having a drug-related conviction. If you have any further questions or concerns about that topic, please feel free to ask!


# RAGAS

Reference: 
https://colab.research.google.com/drive/1C1Epju1lVkXTQi2jBq1njrOrmkfg0eQS?usp=sharing#scrollTo=xAiXbVmLYSoC

In [109]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import JSONLoader
from bs4 import BeautifulSoup

# Create the file path to the 'json_files_ragas' folder
json_files_ragas_path = os.path.join(parent_directory, 'json_files_ragas')

# Print the path to verify
print(json_files_ragas_path)

/Users/zsk4gm/Desktop/resilience_education/json_files_ragas


In [110]:
loader = DirectoryLoader(
    json_files_ragas_path,
    glob='*.json',
    loader_cls=JSONLoader,
    loader_kwargs={
        'jq_schema': '.ChapterList[].Body'
    }
)

# Load documents directly
documents = loader.load()

In [118]:
len(documents)

5

In [117]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=300)
# Split the loaded documents into overlapping chunks
splits = text_splitter.split_documents(documents)
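
The effect of `chunk_size=1024` and `chunk_overlap=300` can be seen with a simplified fixed-window splitter. This is not how `RecursiveCharacterTextSplitter` works internally (it prefers to break on separators such as paragraphs and sentences); `chunk_text` is a hypothetical helper that only illustrates how the two parameters interact.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 300) -> List[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# 2,500 characters of synthetic text
text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text)

print([len(chunk) for chunk in chunks])   # -> [1024, 1024, 1024, 328]
print(chunks[0][-300:] == chunks[1][:300])  # -> True
```

With these settings each window advances 724 characters, so roughly 30% of every chunk repeats text from its predecessor. That costs some storage but reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.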

## Build Test Set Generator 

In [115]:
import warnings
warnings.filterwarnings('ignore')

In [119]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from dotenv import load_dotenv

load_dotenv()

openai_api = os.getenv('open_ai')

critic_llm = ChatOpenAI(api_key=openai_api, model="gpt-4o-mini")
generator_llm = ChatOpenAI(api_key=openai_api, model="gpt-3.5-turbo-16k")
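# Pass the key explicitly, since it is stored under 'open_ai' rather than OPENAI_API_KEY
embeddings = OpenAIEmbeddings(api_key=openai_api)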

In [120]:
from ragas.testset.generator import TestsetGenerator

# generator with custom llm and embeddings
generator = TestsetGenerator.from_langchain(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings=embeddings
) 

In [121]:
from ragas.testset.evolutions import simple, reasoning, multi_context
distributions = {simple: 0.4, reasoning: 0.20, multi_context: 0.4}
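
As a quick sanity check on the distribution above, the weights should sum to 1 and translate into an approximate number of questions per evolution type. The sketch below uses plain strings as keys in place of the RAGAS evolution objects, and `expected_counts` is a hypothetical helper; the way RAGAS itself rounds or samples may differ.

```python
from typing import Dict


def expected_counts(distributions: Dict[str, float], test_size: int) -> Dict[str, int]:
    """Approximate number of questions per evolution type for a given test size."""
    total = sum(distributions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total}")
    return {name: round(weight * test_size) for name, weight in distributions.items()}


# String keys stand in for ragas' simple / reasoning / multi_context objects
weights = {"simple": 0.4, "reasoning": 0.2, "multi_context": 0.4}
counts = expected_counts(weights, test_size=10)
print(counts)  # -> {'simple': 4, 'reasoning': 2, 'multi_context': 4}
```

For test sizes where the products are not whole numbers, simple rounding may not add up to `test_size` exactly, so the counts should be read as a guide only.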

Test set generation: RAGAS can generate the synthetic test set with a different LLM than the one used in the RAG application. This can be beneficial for two reasons:

* It helps avoid biases that can arise when the same model is used for both the application and the evaluation.
* It allows a more powerful or specialized model to generate diverse and challenging test cases.

In [None]:
testset = generator.generate_with_langchain_docs(
    splits,
    test_size=10,
    distributions=distributions,
)

In [123]:
test_df = testset.to_pandas()

In [124]:
test_df.to_csv('ragas_results/test2.csv')

In [125]:
test_questions = test_df["question"].values.tolist()
test_groundtruths = test_df["ground_truth"].values.tolist()

### Query the rag chain with test set

In [126]:
# To change the session ID
config["configurable"]["session_id"] = "ragas2"

In [127]:
answers = []
contexts = []

for question in test_questions:
    response = conversational_rag_chain.invoke(
        {"input": question},
        config=config,
    )
    answers.append(response["answer"])
    contexts.append([context.page_content for context in response["context"]])
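
Because every test question in the loop above shares one session ID, the chat history grows from question to question, and later questions may be reformulated in light of earlier, unrelated ones. If independent test questions are wanted, one option is to give each question its own session. The sketch below shows that variation with a stand-in for the chain; `fresh_config`, `collect_responses`, and `fake_invoke` are hypothetical names, and the stub returns plain strings where the real chain returns `Document` objects.

```python
import uuid
from typing import Callable, Dict, List, Tuple


def fresh_config(prefix: str = "ragas") -> Dict[str, Dict[str, str]]:
    """Build a config whose session ID has not been used before."""
    return {"configurable": {"session_id": f"{prefix}-{uuid.uuid4().hex}"}}


def collect_responses(
    questions: List[str],
    invoke: Callable[..., Dict[str, object]],
) -> Tuple[List[str], List[List[str]]]:
    """Query the chain once per question, each in its own empty session."""
    answers: List[str] = []
    contexts: List[List[str]] = []
    for question in questions:
        response = invoke({"input": question}, config=fresh_config())
        answers.append(response["answer"])
        contexts.append(list(response["context"]))
    return answers, contexts


seen_sessions: List[str] = []


def fake_invoke(payload: Dict[str, str], config: Dict[str, Dict[str, str]]) -> Dict[str, object]:
    """Stand-in for conversational_rag_chain.invoke that records the session ID."""
    seen_sessions.append(config["configurable"]["session_id"])
    return {"answer": f"answer to: {payload['input']}", "context": ["chunk A", "chunk B"]}


answers, contexts = collect_responses(["Q1", "Q2"], fake_invoke)
print(len(set(seen_sessions)))  # -> 2
```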

In [128]:
from datasets import Dataset

response_dataset = Dataset.from_dict({
    "question" : test_questions,
    "answer" : answers,
    "contexts" : contexts,
    "ground_truth" : test_groundtruths
})

### Evaluate Test Set using RAGAS metrics

In [130]:
from ragas import evaluate
from ragas.metrics.critique import harmfulness, coherence, correctness
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
)

metrics = [
    faithfulness,
    answer_relevancy,
    context_precision,
    correctness, 
    coherence,
    harmfulness
]

In [131]:
results = evaluate(response_dataset, metrics)

Evaluating: 100%|███████████████████████████████| 48/48 [00:22<00:00,  2.11it/s]


In [132]:
results

{'faithfulness': 0.1739, 'answer_relevancy': 0.8288, 'context_precision': 0.7569, 'correctness': 0.8750, 'coherence': 0.8750, 'harmfulness': 0.0000}

* Faithfulness (0.1739): This low score means that most claims in the generated answers could not be traced back to the retrieved context, so the answers likely contain statements that the source material does not support. (We expect this to be low given the limited scope of the prompt.)

* Answer Relevancy (0.8288): This high score indicates that the generated answers address the questions directly.

* Context Recall (0.4833): This moderate score suggests that the retrieved context covers only part of the information in the ground-truth answers, so retrieval has room for improvement. Note that `context_recall` is not in the metric list above, so this score does not appear in the printed results.

* Context Precision (0.7569): This relatively high score indicates that the retrieved chunks are mostly relevant to the question, although some less relevant chunks are retrieved as well.
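
For intuition about how two of these scores are formed, the sketch below computes faithfulness as the fraction of answer claims supported by the retrieved context, and context precision as the mean of precision@k over the ranks that hold a relevant chunk. In RAGAS the per-claim and per-chunk judgments come from an LLM; here they are hand-written inputs, and both function names are hypothetical.

```python
from typing import List


def faithfulness_score(claim_supported: List[bool]) -> float:
    """Fraction of claims in the answer that the retrieved context supports."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


def context_precision_score(relevance: List[int]) -> float:
    """Mean precision@k over ranks with a relevant chunk (relevance[i] is 1 or 0)."""
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            score += hits / rank  # precision@rank at a relevant position
    return score / total_relevant


# One supported claim out of four
print(faithfulness_score([True, False, False, False]))  # -> 0.25

# Relevant chunks at ranks 1 and 3: (1/1 + 2/3) / 2
print(round(context_precision_score([1, 0, 1, 0]), 4))  # -> 0.8333
```

Read this way, a low faithfulness score next to a high answer relevancy score describes answers that are on topic but draw on material outside the retrieved sections.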

In [133]:
results_df = results.to_pandas()

In [134]:
results_df.to_csv('ragas_results/ragas_results2.csv')