<a href="https://colab.research.google.com/github/waheed444/AgenticAI-Playground/blob/main/Project02_Langchain_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Set Up the Environment & Install Prerequisites

In [19]:
! pip install -q --upgrade google-generativeai langchain langchain-community langchain-google-genai chromadb pypdf


### Display the Output in Proper Format

In [15]:
from IPython.display import display
from IPython.display import Markdown
import textwrap


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
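
To see what `to_markdown` does before the text reaches `Markdown(...)`, here is a minimal sketch of the same string transformation on its own. The helper name `quote_block` and the sample text are hypothetical, used only for illustration.

```python
import textwrap


def quote_block(text: str) -> str:
    """Same string steps as to_markdown, without the IPython Markdown wrapper."""
    # Turn bullet characters into Markdown list markers.
    text = text.replace('•', '  *')
    # predicate=lambda _: True prefixes every line, including blank ones,
    # so the whole answer renders as one continuous blockquote.
    return textwrap.indent(text, '> ', predicate=lambda _: True)


sample = "Topics:\n\n• RAG\n• Embeddings"
quoted = quote_block(sample)
print(quoted)
```

Without the `predicate` argument, `textwrap.indent` skips whitespace-only lines, which would split the blockquote at every blank line.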

### Import **google.generativeai** & Configure the **API Key**

In [2]:
import os
import google.generativeai as genai
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
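
The cell above only works inside Colab, because `google.colab.userdata` is not available elsewhere. A minimal sketch of a fallback, assuming the key may instead be stored in an environment variable of the same name; the helper `load_api_key` is hypothetical and not part of any library.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Read the key from Colab secrets if possible, else from the environment."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        key = userdata.get(name)
    except Exception:
        # Not in Colab, or the secret is missing or not shared with this notebook.
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment.")
    return key
```

With this helper, `GOOGLE_API_KEY = load_api_key()` could stand in for the `userdata.get(...)` line when the notebook is run locally.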

### Import **Libraries** to Build a QnA System with **PDF Data** Integration

In [3]:
import urllib
import warnings
from pathlib import Path as p
from pprint import pprint

import pandas as pd
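# Prompt, loader, and vector store classes now live in langchain_core / langchain_community;
# the old langchain.* paths are deprecated.
from langchain_core.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma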
from langchain.chains import RetrievalQA

warnings.filterwarnings("ignore")
# Restart the Python kernel if the langchain imports fail.

### Import **ChatGoogleGenerativeAI**

In [4]:
from langchain_google_genai import ChatGoogleGenerativeAI

### Configure the **gemini-pro** Model with Custom Parameters

In [20]:
model = ChatGoogleGenerativeAI(
    model="gemini-pro",
    google_api_key=GOOGLE_API_KEY,
    temperature=0.6,
    convert_system_message_to_human=True
)


### Check File Existence Using the **pathlib** Library

In [21]:
from pathlib import Path

file_path = Path("/content/RAG_File.pdf")
if file_path.exists():
    print("File exists and is ready for processing.")
else:
    print("File does not exist. Please upload it.")


File exists and is ready for processing.
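
`Path.exists()` only confirms that something is stored at that path. As a slightly stricter check, the sketch below also verifies that the file starts with the `%PDF-` signature that PDF files begin with. The helper `looks_like_pdf` and the temporary sample files are hypothetical stand-ins, not part of the notebook's data.

```python
import tempfile
from pathlib import Path


def looks_like_pdf(path) -> bool:
    """Return True if path is a regular file whose first bytes are the PDF signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


# Demo on throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "sample.pdf"
    real.write_bytes(b"%PDF-1.7\n% header only, not a complete document\n")
    fake = Path(tmp) / "notes.pdf"
    fake.write_text("plain text with a .pdf extension")

    real_ok = looks_like_pdf(real)
    fake_ok = looks_like_pdf(fake)
    missing_ok = looks_like_pdf(Path(tmp) / "absent.pdf")

print(real_ok, fake_ok, missing_ok)
```

In the notebook, the same call could be made on `"/content/RAG_File.pdf"` before handing the path to the loader.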


### Loading & Splitting PDF Content Using **PyPDFLoader**

In [22]:
from langchain_community.document_loaders import PyPDFLoader

pdf_loader = PyPDFLoader("/content/RAG_File.pdf")
pages = pdf_loader.load_and_split()
print(pages[3].page_content)

technologies.Useopensourcelibraries,likeLangchain,CrewAI,andLangGraphtoautomaterepeatable,multi-steptasksandautomatebusinessprocessesthataretypicallydonebyagroupofpeople.
Certifications:
■ MicrosoftCertified:AzureAIEngineerAssociate■ CertifiedcrewAIEngineer
LearningRepo:https://github.com/panaversity/learn-prompt-eng-gpts-ai-agents
● Quarter3:CloudNativeAIPoweredMicroservicesDesign,Development,andDeployment:
BuildscalableAIPoweredAPIsusingFastAPI,Postgres,Kafka,Kong,GenAIAPIslikeOpenAIChatCompletionAPIs,AssistantAPIs,LangChainandOpenSourceAILLMs,developthemusingContainersandDevContainers,anddeploythemusingDockerComposelocallyandKubernetesPoweredServerlessContainerServicesonthecloud.
WewillalsolearntointegratedesignthinkingandBehavior-DrivenDevelopment(BDD)indevelopingAIsystems.WewilllearntocreateAIsolutionsthataredeeplyalignedwithuserneedsandexpectations.Designthinkingensuresathoroughunderstandingoftheuserandproblemspace,whileBDDprovidesastructuredapproachtodefiningandvalidatingthedesi

### Splitting Text into Chunks and Embedding Them with **GoogleGenerativeAIEmbeddings**

In [10]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

In [11]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
context = "\n\n".join(str(p.page_content) for p in pages)
texts = text_splitter.split_text(context)
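
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sketch that uses a plain fixed-size sliding window. This is not how `RecursiveCharacterTextSplitter` works internally (it prefers to break on separators such as paragraphs and newlines); the function `sliding_chunks` is a hypothetical stand-in that only shows how consecutive chunks share text.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


demo_text = "abcdefghijklmnopqrstuvwxy"
demo_chunks = sliding_chunks(demo_text, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears in full in at least one chunk, at the cost of embedding some text twice.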

In [12]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=GOOGLE_API_KEY)

### **Vector Indexing** to Build a Retrieval-Based QnA System

In [13]:
vector_index = Chroma.from_texts(texts, embeddings).as_retriever(search_kwargs={"k":5})

qa_chain = RetrievalQA.from_chain_type(
    model,
    retriever=vector_index,
    return_source_documents=True

)
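
The `search_kwargs={"k":5}` setting asks the retriever for the five chunks closest to the question. Conceptually, that is a nearest-neighbour search over embedding vectors. The sketch below illustrates the idea with toy 2-D vectors and cosine similarity in plain Python; it is an assumption-laden illustration, and Chroma's own index and distance metric may differ. The names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, doc_vecs, k: int) -> list[int]:
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy stand-ins for chunk embeddings and a question embedding.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query_vec = [1.0, 0.1]
nearest = top_k(query_vec, doc_vecs, k=2)
print(nearest)
```

A larger `k` gives the model more context to answer from, but with chunks as large as 10,000 characters it also makes the prompt much longer.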

### Ask a Question and Display the Output

In [28]:
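question = "What are the main topics discussed in this file?"
# Calling the chain directly is deprecated in recent LangChain releases; use invoke().
result = qa_chain.invoke({"query": question})
# Render the answer as Markdown; the retrieved chunks are in result["source_documents"].
Markdown(result["result"])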

This file discusses the following topics:
- Cloud Native Applied Generative AI Engineering (GenEng)
- FastAI and PyTorch for Fine-tuning Open-Source Large Language Models (LLMs)
- Developing an LLM like ChatGPT4 or Google Gemini
- Business Considerations for LLM Development
- Custom Generative Pre-trained Transformers (GPTs)
- Actions in GPTs
- AI Agents and their Differences from Custom GPTs
- Data Preparation for Fine-tuning MetaLLaMA3 with PyTorch
- Fine-tuning MetaLLaMA3 with PyTorch
- Utilizing FastAI for NLP Tasks in Fine-tuning MetaLLaMA3
- Cloud-Native Training and Deployment for Fine-tuned Models
- Exporting Models for Inference and Building Robust Inference Pipelines
- Capstone Project for Fine-tuning and Deploying MetaLLaMA3
- Physical AI and Humanoid Robotics Development
- Cloud Native Microservices Deployment and Distributed System Design
- Front-end Web GUI Development using Next.js and TypeScript
- FAQs and Detailed Answers
- Docker, Kubernetes, and Terraform Technologies for API Development
- API-as-a-Product Model
- Benefits of Docker Containers for Development, Testing, and Deployment
- Advantages of Open Docker, Kubernetes, and Terraform Technologies
- Advantages of Using AWS, Azure, or Google Cloud Technologies
- Reasons for Not Learning to Build LLMs in the Program
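
Because the chain was built with `return_source_documents=True`, the result dictionary also carries the retrieved chunks under `"source_documents"`. A minimal sketch for previewing them follows; the helper `summarize_sources` is hypothetical, and `demo_result` is a hand-made stand-in that only mimics the shape of the chain's output (objects with a `page_content` attribute), not real output from this notebook.

```python
from types import SimpleNamespace


def summarize_sources(result: dict, width: int = 80) -> list[str]:
    """Return one short, single-line preview per retrieved source document."""
    previews = []
    for rank, doc in enumerate(result.get("source_documents", []), start=1):
        # Collapse newlines and repeated spaces, then truncate to `width` characters.
        snippet = " ".join(doc.page_content.split())[:width]
        previews.append(f"[{rank}] {snippet}")
    return previews


# Hand-made stand-in with the same keys the chain returns.
demo_result = {
    "query": "What are the main topics discussed in this file?",
    "result": "(model answer)",
    "source_documents": [
        SimpleNamespace(page_content="Quarter 3:\nCloud Native   AI Powered Microservices"),
        SimpleNamespace(page_content="Certifications: Azure AI Engineer Associate"),
    ],
}
previews = summarize_sources(demo_result, width=40)
for line in previews:
    print(line)
```

In the notebook, calling `summarize_sources(result)` on the real result is a quick way to check whether each topic in the answer is actually backed by a retrieved chunk.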