# Novel Office AI - Assessment

## 🎯 Objective:
Build an intelligent, high-performance Generative AI chatbot that can deeply understand and interact with the content of a long-form document (100+ pages) using Groq AI.

## Installing Necessary Libraries

In [60]:
!pip install -q langchain langchain-community langchain-core langchain-google-genai langchain-groq chromadb groq tiktoken pypdf

### **📘 Step 1: Choose a Document**

📘 **Document Title for Task Documentation:**
- "Jurisprudence–I (Legal Method) – LB-101"
- Bachelor of Laws (LL.B.), Ist Term Textbook
- Faculty of Law, University of Delhi
- Published for the Academic Session 2023
- Total Pages: 231
- [Public Access Link](https://lawfaculty.du.ac.in/userfiles/downloads/LLBCM/Ist%20Term_Jurisprudence-I_LB101_2023.pdf)

In [40]:
# Loading the Document
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/Ist Term_Jurisprudence-I_LB101_2023.pdf")

pages = loader.load_and_split()

In [41]:
pages[0]

Document(metadata={'producer': 'iLovePDF', 'creator': 'PyPDF', 'creationdate': '', 'moddate': '2023-02-06T11:39:43+00:00', 'source': '/content/Ist Term_Jurisprudence-I_LB101_2023.pdf', 'total_pages': 261, 'page': 0, 'page_label': '1'}, page_content='LL.B. I Term LB-101 - Jurisprudence-I (Legal Method, Indian Legal System and Basic Theories of Law)  Cases Selected and Edited by Mahavir Singh Alka Chawla Anumeha Mishra Amrendra K. Ajit Anita Yadav Ashish Kumar Archa Vashistha Apanjot Kaur Ashutosh Acharya Ajay Sonawane Daya Devi Gurpreet Singh Harleen Kaur Kailash Kurmi Santosh Upadhyay Shakti K Aggarwal Shourie Anand Pushkar Anand Rubina Grewal Nagra Upendra Nath Silky Mukherjee  FACULTY OF LAW UNIVERSITY OF DELHI, DELHI- 110 007   January, 2023  (For private use only in the course of instruction)')

In [42]:
len(pages)

274

### **📚 Step 2: Splitting the Document into Chunks**


In [43]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=250)
docs = splitter.split_documents(pages)

In [44]:
len(docs)

1223
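
The effect of `chunk_size` and `chunk_overlap` can be sketched with a plain sliding window over characters. This is a simplified illustration, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break at separators such as paragraph and line breaks, so its chunks will differ. The helper name `chunk_text` and the sample string are assumptions made for the example.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last four characters of the previous one, which is how overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.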

### **📚 Step 3: Storing the Document Embeddings in a Vector Database**
- Vector Store: ChromaDB
- Embeddings: GoogleGenerativeAIEmbeddings


In [55]:
import os
from getpass import getpass
# Prompt for the key at runtime instead of hard-coding it in the notebook
os.environ["GOOGLE_API_KEY"] = getpass("Google API key: ")

In [53]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma

In [57]:
embedding = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectordb = Chroma.from_documents(documents=docs, embedding=embedding, persist_directory="Chroma_db")

In [76]:
import shutil

shutil.make_archive('Novel_Chroma_DB', 'zip', 'Chroma_db')

'/content/Novel_Chroma_DB.zip'
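
What a vector store does at query time can be sketched in plain Python: embed the query, compare it with every stored vector, and return the closest texts. Everything below is a toy stand-in. `toy_embed` is a hypothetical word-count embedding over a fixed vocabulary, whereas the notebook uses dense vectors from `GoogleGenerativeAIEmbeddings`, and `ToyVectorStore` scans every entry, which a real database such as Chroma avoids by using an index.

```python
import math
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical embedding: count how often each vocabulary word occurs."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (text, vector) pairs in memory and ranks them by cosine similarity."""

    def __init__(self, vocabulary: list[str]):
        self.vocabulary = vocabulary
        self.entries: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.entries.append((text, toy_embed(text, self.vocabulary)))

    def search(self, query: str, k: int) -> list[str]:
        query_vec = toy_embed(query, self.vocabulary)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vec, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore(["rule", "law", "equality", "custom", "precedent"])
store.add("rule of law means equality before the law")
store.add("custom is a source of law")
store.add("precedent binds lower courts")

top_two = store.search("what is the rule of law", k=2)
print(top_two)
```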

### **📚 Step 4: Using an MMR (Maximal Marginal Relevance) Retriever**



In [59]:
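mmr_retriever = vectordb.as_retriever(
    search_type="mmr",
    # lambda_mult=1 ranks by relevance only (no diversity penalty); lower values favor more diverse chunks
    search_kwargs={"k": 3, "lambda_mult": 1},
)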
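
To make the role of `lambda_mult` concrete, here is a minimal pure-Python sketch of maximal marginal relevance selection. It assumes the common scoring rule `lambda_mult * relevance - (1 - lambda_mult) * redundancy` with cosine similarity. The function names and the two-dimensional vectors are made up for illustration, and this is not a copy of the retriever's internal code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k: int, lambda_mult: float) -> list[int]:
    """Return the indices of k documents, trading relevance against redundancy."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_index, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest document already picked
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        candidates.remove(best_index)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.10],   # 0: highly relevant
    [1.0, 0.12],   # 1: near-duplicate of document 0
    [0.7, -0.70],  # 2: less relevant, but different from document 0
]

relevance_only = mmr(query, docs, k=2, lambda_mult=1.0)
balanced = mmr(query, docs, k=2, lambda_mult=0.5)
print(relevance_only, balanced)
```

With `lambda_mult=1.0` the near-duplicate is picked second, while `0.5` swaps it for the more distinct document. Under this scoring rule, a setting of 1 therefore behaves like a plain similarity search.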

### **📚 Step 5: LLM: Groq**


In [62]:
import os
from getpass import getpass
# Prompt for the key at runtime instead of hard-coding it in the notebook
os.environ["GROQ_API_KEY"] = getpass("Groq API key: ")

In [64]:
from langchain_groq import ChatGroq
llm = ChatGroq(temperature=0.3, model_name="llama3-70b-8192")

### Beta Testing Using a RetrievalQA Chain

In [69]:
# Build a RetrievalQA chain that places all retrieved chunks into a single prompt ("stuff")
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                       chain_type="stuff",
                                       retriever=mmr_retriever, return_source_documents=True)

# Ask a question; RetrievalQA expects the input under the "query" key
query = "Explain Dicey’s Rule of Law as mentioned in the jurisprudence textbook."
result = qa_chain.invoke({"query": query})

print("Answer:\n", result["result"])
print("\nSources:\n", [doc.metadata['source'] for doc in result["source_documents"]])

Answer:
 According to the provided context, Dicey's Rule of Law means:

1. The absolute supremacy or predominance of regular law, excluding the influence of arbitrary power, prerogative, or wide discretionary authority on the part of the government.

2. Equality before the law, or the equal subjection of all classes to the ordinary law courts, excluding the idea of any exemption of officials or others from the duty of obedience to the law which governs other citizens or from the jurisdiction of the ordinary tribunals.

In simpler terms, Dicey's Rule of Law means that:

* The law is supreme, and arbitrary power or government discretion is not above the law.
* Everyone, including government officials, is equal before the law and subject to the same ordinary law courts, with no exemptions.

This concept is fundamentally inconsistent with the idea of "administrative law" or "administrative tribunals" found in foreign countries, where government affairs or disputes are dealt with by special
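
The `chain_type="stuff"` setting means that all retrieved chunks are concatenated and placed into one prompt for a single LLM call. A rough sketch of that idea follows. The template wording and the helper name `build_stuff_prompt` are hypothetical and do not reproduce the prompt LangChain actually sends.

```python
def build_stuff_prompt(question: str, passages: list[str]) -> str:
    """Concatenate every retrieved passage into one prompt string."""
    context = "\n\n".join(passages)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


passages = [
    "Passage A about the rule of law.",
    "Passage B about equality before the law.",
]
prompt = build_stuff_prompt("What is the rule of law?", passages)
print(prompt)
```

Because everything goes into one prompt, the number of retrieved chunks (`k`) multiplied by the chunk size has to fit within the model's context window.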

### **📚 Step 6: Using LangChain Runnable Chaining Instead of the Default RetrievalQA Chain**

- **Benefits**
 - Customizable prompt template and output parsing
 - Allows parallel chaining of the context from retrieved documents and the LLM output
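
The parallel step can be pictured as a dictionary of functions that all receive the same input, with the results collected under the same keys. The sketch below imitates that pattern in plain Python. `run_parallel` and `fake_retrieve` are hypothetical stand-ins, and unlike this sequential loop, LangChain's `RunnableParallel` may run the branches concurrently.

```python
def run_parallel(steps: dict, payload: dict) -> dict:
    """Apply every function in `steps` to the same payload and collect results by key."""
    return {name: fn(payload) for name, fn in steps.items()}


def fake_retrieve(question: str) -> list[str]:
    """Hypothetical stand-in for a retriever call."""
    return [f"Passage about: {question}"]


steps = {
    "question": lambda x: x["question"],
    "context": lambda x: "\n\n".join(fake_retrieve(x["question"])),
    "chat_history": lambda x: x["chat_history"],
}

prompt_inputs = run_parallel(steps, {"question": "What is a norm?", "chat_history": []})
print(prompt_inputs)
```

The output of the step contains only the keys listed in `steps`, so any value the prompt template needs, such as the chat history, has to be passed through explicitly.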

In [72]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough

In [74]:
# Format context function
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Chat Prompt Template
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful and knowledgeable AI assistant specialized in answering questions from a legal textbook.

You must use the following information to generate your response:
- Use the provided **textbook context**: {context}
- If the context does not contain the answer, rely on the **chat history** to infer or clarify.
- If neither provides sufficient information, respond honestly that you don’t know.

Be clear, concise, and helpful. Do not make up information that is not present in the context or conversation history."""),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    ("human", "{question}")
])

# Chat Model - Groq
chat_model = ChatGroq(model_name="llama3-70b-8192", temperature=0.2)
parser = StrOutputParser()

# Runnable Chain Components
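# This step's output replaces its input, so every key the prompt needs is produced here.
# Previously "question" received the whole input dict and "chat_history" was dropped.
parallel_chain = RunnableParallel({
    "question": lambda x: x["question"],
    "context": lambda x: format_docs(mmr_retriever.invoke(x["question"])),
    "chat_history": lambda x: x["chat_history"],
})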

# Session Memory
memory_dict = {"history": []}
runnable_memory = RunnableLambda(lambda _: memory_dict["history"])

# Final Chain
final_chain = RunnablePassthrough.assign(chat_history=runnable_memory) | parallel_chain | chat_prompt | chat_model | parser

# Conversational Chat Loop
while True:
    question = input("💬 Ask a question about the Jurisprudence textbook (type 'exit' to stop): ")
    if question.lower() in ["exit", "quit", "bye"]:
        print("👋 Bye!")
        break

    print(f"\n👦 Human: {question}")
    query = {"question": question}
    response = final_chain.invoke(query)
    print(f"🤖 AI: {response}\n")

    memory_dict["history"].append(HumanMessage(content=question))
    memory_dict["history"].append(AIMessage(content=response))

💬 Ask a question about the Jurisprudence textbook (type 'exit' to stop): What is the difference between legal theory and legal philosophy?

👦 Human: What is the difference between legal theory and legal philosophy?
🤖 AI: Based on the provided textbook context, I couldn't find a direct answer to this question. The context primarily focuses on the concepts of law, norms, and their meanings, rather than distinguishing between legal theory and legal philosophy.

However, I can try to infer some insights from the context. Hans Kelsen's work, "Pure Theory of Law," seems to be more focused on the theoretical aspects of law, exploring the nature of law, norms, and their meanings. This might be related to legal theory.

On the other hand, legal philosophy might be more concerned with the fundamental questions about the nature of law, justice, morality, and their relationships. While Kelsen's work touches on these aspects, it appears to be more focused on the analytical and theoretical framework