# **OJKGPT**

This Jupyter notebook develops the first version of OJKGPT, a demo that showcases the chatbot's capabilities to the wealth team program. <br>
We use several data sources to enrich the chatbot's RAG (retrieval-augmented generation) system.

In [1]:
import nest_asyncio
nest_asyncio.apply()

In [2]:
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
import logging
import sys

# basicConfig already attaches a stdout handler; adding a second one would print every record twice
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

In [4]:
from utils.models_definer import ModelName

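# Notebook configuration
OCR_TRESHOLD = 0.98   # passed to read_documents as ocr_treshold
TOP_K = 3             # number of nodes the retriever returns per query
NUM_BATCH_EVAL = 10   # number of queries used in the batch evaluation
STORE = False         # True: read, parse, and store documents; False: load the existing index
DELETE = False        # passed to store_vector_index as delete (only used when STORE is True)
EVAL = False          # True: run the evaluation cells
model_name = ModelName.AZURE_OPENAI   # model used for answering and embedding
model_name_eval = ModelName.OPENAI    # model used for evaluation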

## **LLM Models Define**

In [5]:
from llama_index.core import Settings
from utils.models_definer import get_llm_and_embedding

# Supported backends: ollama / openai / azure_openai
llm, embedding_llm = get_llm_and_embedding(model_name=model_name)

# Separate LLM used only for evaluation (question generation and judging)
llm_eval = get_llm_and_embedding(model_name=model_name_eval)[0]


Settings.llm = llm
Settings.embed_model = embedding_llm

## **Loading Documents**

### **Document Reader and Node Parser**

In [6]:
from utils.documents_reader import read_documents
from utils.node_parser import parse_nodes

path = './data'

if STORE:
    docs = read_documents(path=path, ocr_treshold=OCR_TRESHOLD)
    nodes = parse_nodes(documents=docs, llm=llm)

## **Indexing & Storing**

In [7]:
from utils.index_store import (store_vector_index, load_vector_index)

# Build and store a new index from the parsed nodes
if STORE:
    vector_index = store_vector_index(nodes=nodes, embed_model=embedding_llm, delete=DELETE)
# Otherwise load the previously stored index
else:
    vector_index = load_vector_index()

Loading the vector index completed.


## **Querying**

### **Retriever**

In [8]:
vector_retriever = vector_index.as_retriever(similarity_top_k=TOP_K)
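
To make `similarity_top_k` concrete, here is a minimal pure-Python sketch of top-k retrieval by cosine similarity. It is only an illustration: the toy 2-d vectors, the node names, and the `retrieve_top_k` helper are assumptions, and the actual vector store behind `vector_index` may use a different similarity measure or an approximate search.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, node_vecs, k):
    """Return the k (node_id, score) pairs with the highest similarity."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones come from the embedding model.
toy_nodes = {
    "node_a": [1.0, 0.0],
    "node_b": [0.8, 0.6],
    "node_c": [0.0, 1.0],
    "node_d": [-1.0, 0.0],
}

top_nodes = retrieve_top_k([1.0, 0.0], toy_nodes, k=3)
print(top_nodes)
```

With `k=3`, the least similar node (`node_d`) is dropped, which mirrors how `TOP_K = 3` limits the context passed to the LLM.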

### **Prompt**

In [25]:
from llama_index.core import PromptTemplate

qa_prompt = """\
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, \
answer the query asking about banking compliance in Indonesia. 
Answer the question based on the context information.
ALWAYS ANSWER IN THE USER'S LANGUAGE.
Please provide your answer with [regulation_number](file_url) in metadata 
(if possible) in the following format:

---------------------
Answer... \n\n
Source: [regulation_number](file_url) \n
---------------------

Query: {query_str}
Answer: \
"""

qa_prompt_tmpl = PromptTemplate(qa_prompt)
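
The two placeholders, `{context_str}` and `{query_str}`, are filled in at query time. The sketch below shows the idea with plain `str.format` on a shortened stand-in template. The retrieved chunks are hypothetical, and how the response synthesizer joins chunks internally is an assumption here.

```python
# Shortened stand-in for qa_prompt; the placeholders are the same.
template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Hypothetical retrieved chunks; in the notebook these come from the retriever.
retrieved_chunks = [
    "PSAK: Pernyataan Standar Akuntansi Keuangan.",
    "PAPSI: Pedoman Akuntansi Perbankan Syariah Indonesia.",
]

filled_prompt = template.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="Apa kepanjangan PSAK?",
)
print(filled_prompt)
```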

### **Query Engine**

In [26]:
from llama_index.core.query_engine import RetrieverQueryEngine

vector_query_engine = RetrieverQueryEngine.from_args(
    retriever=vector_retriever,
    llm=llm,
)

In [27]:
vector_query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)

In [28]:
from utils.prompt_display import display_prompt_dict

prompts_dict = vector_query_engine.get_prompts()
display_prompt_dict(prompts_dict)

**Prompt Key**: response_synthesizer:text_qa_template<br>**Text:** <br>

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query asking about banking compliance in Indonesia. 
ALWAYS ANSWER WITH USER'S LANGUAGE.
Please provide your answer with [regulation_number](file_url) in metadata 
(if possible) in the following format:

---------------------
Answer... 


Source: [regulation_number](file_url) 

---------------------

Query: {query_str}
Answer: 


<br><br>

**Prompt Key**: response_synthesizer:refine_template<br>**Text:** <br>

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 


<br><br>

In [33]:
a = "Apa judul peraturan 7/33/PBI/2005?" # Pencabutan atas Peraturan Bank Indonesia Nomor 5/17/PBI/2003 tentang Persyaratan dan Tata Cara Pelaksanaan Jaminan Pemerintah terhadap Kewajiban Pembayaran Bank Perkreditan Rakyat
b = "Kapan surat edaran No. 15/26/DPbS mulai berlaku?" # 1 Agustus 2013.
c = "Siapa nama dan jabatannya yang menandatangani surat dengan nomor 1/SEOJK.04/2013?" # NURHAIDA, kepala eksekutif pengawas pasar modal
d = "Saya ingin menyelenggarakan kegiatan pasar modal berikan saya nomor surat yang membahas mengenai hal ini!" # Peraturan Pemerintah Nomor 12 Tahun 2004
e = "Berapa persen jaminan moneter pada tanggal 20 Agustus 1958?" # 7,3%
f = "Surat edaran nomor berapa yang mengatur bank umum syariah dan unit usaha syariah?" # 15/26/DPbS
g = "Apa kepanjangan dari PAPSI?" # Pedoman Akuntansi Perbankan Syariah Indonesia
h = "apa judul peraturan nomor 112/KMK.03/2001?" # Keputusan Menteri Keuangan tentang Pemotongan Pajak Penghasil Pasal 21 atas Penghasilan berupa Uang Pesangon, Uang Tebusan Pensiun, dan Tunjangan Hari Tua atau Jaminan Hari Tua
i = "Saya ingin membuat sistem informasi lembaga jasa keuangan, berikan nomor regulasi dari peraturan yang membahas tentang manejemen risiko nya!" # 4/POJK.05/2021
j = "Apa kepanjangan dari SWDKLLJ?" # Sumbangan Wajib Dana Kecelakaan Lalu Lintas Jalan
k = "Berapa nilai SWDKLLJ dari sedan?" # Rp. 140.000
l = "Apa latar belakang dari peraturan NOMOR 4/POJK.05/2021?" # dalam bentuk list
m = "Apa itu LJKNB?" # Lembaga Jasa Keuangan Non Bank
n = "Apakah KMK Nomor 462/KMK.04/1998 masih berlaku" # tidak
o = "Apa itu Uang Pesangon?" # penghasilan yang dibayarkan oleh pemberi kerja kepada karyawan dengan nama dan dalam bentuk apapun sehubungan dengan berakhirnya masa kerja atau terjadi pemutusan  hubungan kerja, termasuk uang penghargaan masa kerja dan uang  ganti kerugian
p = "Apa itu CKPN?" # Cadangan Kerugian Penurunan Nilai.
q = "Kapan, dimana, dan oleh siapa surat nomor PER- 06/BL/2012 ditetapkan?" # Surat nomor PER-06/BL/2012 ditetapkan pada tanggal 22 November 2012 di Jakarta oleh Ketua Badan Pengawas Pasar Modal dan Lembaga Keuangan.
r = "Apa kepanjangan PSAK?" # Pernyataan Standar Akuntansi Keuangan
s = "Apa itu 'shahibul maal'?" # Pemilik dana pihak ketiga
t = "Judul peraturan NOMOR 1 /POJK.05/2018?"

query_str = r

In [34]:
from llama_index.core.response.notebook_utils import display_response

response_vector = vector_query_engine.query(query_str)
display_response(response_vector)

**`Final Response:`** PSAK adalah kepanjangan dari Pernyataan Standar Akuntansi Keuangan.
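
The prompt asks for a `Source: [regulation_number](file_url)` line only "if possible", and the answer above has none, so any downstream code should treat the source as optional. A minimal sketch of extracting it with a regular expression follows; the `extract_sources` helper, the sample answer, and the example URL are all hypothetical.

```python
import re

# Matches a markdown link on a line that starts with "Source:".
SOURCE_PATTERN = re.compile(
    r"^Source:\s*\[(?P<number>[^\]]+)\]\((?P<url>[^)\s]+)\)",
    re.MULTILINE,
)


def extract_sources(answer_text):
    """Return (regulation_number, file_url) pairs found on 'Source:' lines."""
    return [
        (match.group("number"), match.group("url"))
        for match in SOURCE_PATTERN.finditer(answer_text)
    ]


# Hypothetical answer in the format requested by the prompt.
sample_answer = (
    "Peraturan ini mengatur penyelenggaraan usaha perasuransian.\n\n"
    "Source: [63 TAHUN 1999](https://example.com/pp-63-1999.pdf)\n"
)

print(extract_sources(sample_answer))
print(extract_sources("PSAK adalah Pernyataan Standar Akuntansi Keuangan."))
```

An empty list signals that the model did not cite a regulation, as in the PSAK answer.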

### **Chat Engine**

In [33]:
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.core.memory import ChatMemoryBuffer

chat_store = SimpleChatStore()
memory = ChatMemoryBuffer.from_defaults(
    token_limit=10000,
    chat_store=chat_store,
    chat_store_key="user1",
)
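
The `token_limit` caps how much chat history is sent back to the LLM. The sketch below illustrates the general idea with a toy buffer that keeps only the newest messages that fit. It is not the `ChatMemoryBuffer` implementation: the `TokenLimitedMemory` class is hypothetical, and counting whitespace-separated words as tokens is a simplifying assumption.

```python
from collections import deque


def count_tokens(text):
    """Crude token count: whitespace-separated words (illustration only)."""
    return len(text.split())


class TokenLimitedMemory:
    """Keeps the most recent messages whose combined token count fits the limit."""

    def __init__(self, token_limit):
        self.token_limit = token_limit
        self.messages = deque()

    def put(self, role, content):
        self.messages.append((role, content))

    def get(self):
        """Return the newest messages that fit within token_limit, oldest first."""
        kept = deque()
        used = 0
        for role, content in reversed(self.messages):
            cost = count_tokens(content)
            if used + cost > self.token_limit:
                break
            kept.appendleft((role, content))
            used += cost
        return list(kept)


toy_memory = TokenLimitedMemory(token_limit=7)
toy_memory.put("user", "Apa itu CKPN?")                            # 3 tokens
toy_memory.put("assistant", "Cadangan Kerugian Penurunan Nilai.")  # 4 tokens
toy_memory.put("user", "Apa itu LJKNB?")                           # 3 tokens
print(toy_memory.get())
```

Here the oldest message no longer fits, so only the last two are returned.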

In [34]:
# from llama_index.agent.openai import OpenAIAgent
# from llama_index.core.agent.legacy.react.base import ReActAgent
from llama_index.core.chat_engine import CondenseQuestionChatEngine

chat_engine = CondenseQuestionChatEngine.from_defaults(
    llm=llm,
    memory=memory,
    query_engine=vector_query_engine,
    verbose=True,
)
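
A condense-question chat engine first rewrites each incoming message, together with the chat history, into a standalone question, and then sends that question to the query engine. With `verbose=True`, the rewritten question is what appears after `Querying with:` in the output. The sketch below only illustrates how such a rewrite prompt could be assembled; the wording and the `build_condense_prompt` helper are assumptions, not the library's actual prompt.

```python
def build_condense_prompt(chat_history, follow_up):
    """Build a prompt asking an LLM to rewrite follow_up as a standalone question."""
    history_text = "\n".join(f"{role}: {content}" for role, content in chat_history)
    return (
        "Given the conversation below and a follow-up message, "
        "rewrite the follow-up as a standalone question.\n\n"
        f"Chat history:\n{history_text}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question: "
    )


# Hypothetical conversation so far.
toy_history = [
    ("user", "Apa itu CKPN?"),
    ("assistant", "Cadangan Kerugian Penurunan Nilai."),
]

condense_prompt = build_condense_prompt(toy_history, "Peraturan mana yang mengaturnya?")
print(condense_prompt)
```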

In [35]:
from llama_index.core.response.notebook_utils import display_response

response = chat_engine.chat(message=query_str)
display_response(response)

Querying with: UNDANG-UNDANG REPUBLIK INDONESIA NOMOR 2 TAHUN 2009 berjudul apa?


**`Final Response:`** UNDANG-UNDANG REPUBLIK INDONESIA NOMOR 2 TAHUN 2009 berjudul "Penetapan Peraturan Pemerintah Pengganti Undang-Undang Nomor 2 Tahun 2008 Tentang Perubahan Kedua Atas Undang- Undang Nomor 23 Tahun 1999 Tentang Bank Indonesia Menjadi Undang-Undang".

In [19]:
from llama_index.core.response.notebook_utils import display_source_node

for node in response.source_nodes:
    display_source_node(node, show_source_metadata=True)    

**Node ID:** 4f679b39-6dab-42f8-bf24-42c90f009d62<br>**Similarity:** 0.9154753684997559<br>**Text:** Cukup jelas 
Angka 13 
Pasal 42 
Ayat (1) 
Cukup jelas 
Pasal II 
Cukup jelas 
Pasal III 
Cukup j...<br>**Metadata:** {'file_name': 'ojk-peraturan_pemerintah-63_tahun_1999-02071999-pp_ri_nomor_63_tahun_1999_tentang_perubahan_atas_peraturan_pemerintah_nomor_73_tahun_1992_ppas3_1389239909_pdf.pdf', 'title': 'Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian', 'sector': 'IKNB', 'subsector': 'Asuransi', 'regulation_type': 'Peraturan Pemerintah', 'regulation_number': '63 TAHUN 1999', 'effective_date': '2 Juli 1999', 'file_url': 'https://www.ojk.go.id/id/regulasi/Documents/Pages/PP-RI-Nomor-63-Tahun-1999-tentang-Perubahan-atas-Peraturan-Pemerintah-Nomor-73-Tahun-1992/ppas3_1389239909.pdf', 'questions_this_excerpt_can_answer': '1. What is the title and effective date of Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999?\n2. What is the sector and subsector that this regulation pertains to?\n3. What is the regulation type and number of Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999?\n4. What is the file name and URL of the document containing Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999?\n5. What is the additional information provided in Lembaran Negara Republik Indonesia Nomor 3861?', 'prev_section_summary': 'The key topics of this section include reinsurance, cooperation in risk distribution, penalties for late submission of financial reports, and the regulation of new insurance programs. The entities mentioned in this section are Perusahaan Asuransi (Insurance Companies), Perusahaan Reasuransi (Reinsurance Companies), and Departemen Keuangan (Ministry of Finance).', 'section_summary': 'The section provides information about a regulation titled "Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian." The effective date of this regulation is July 2, 1999. 
It pertains to the sector of IKNB (non-bank financial institutions) and the subsector of Asuransi (insurance). The regulation is classified as a Peraturan Pemerintah and its number is 63 TAHUN 1999. The file name of the document containing this regulation is "ojk-peraturan_pemerintah-63_tahun_1999-02071999-pp_ri_nomor_63_tahun_1999_tentang_perubahan_atas_peraturan_pemerintah_nomor_73_tahun_1992_ppas3_1389239909_pdf.pdf," and it can be accessed through the provided URL. Additionally, there is a mention of additional information provided in Lembaran Negara Republik Indonesia Nomor 3861.'}<br>

**Node ID:** e44ba555-0b60-4f3d-88dd-db81a56bbe88<br>**Similarity:** 0.9136121273040771<br>**Text:** Ayat (3) 
Cukup jelas 
Angka 2 
Pasal 9 
Ayat (1) 
Cukup jelas 
Ayat (2) 
Polis Asuransi Indemnit...<br>**Metadata:** {'file_name': 'ojk-peraturan_pemerintah-63_tahun_1999-02071999-pp_ri_nomor_63_tahun_1999_tentang_perubahan_atas_peraturan_pemerintah_nomor_73_tahun_1992_ppas3_1389239909_pdf.pdf', 'title': 'Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian', 'sector': 'IKNB', 'subsector': 'Asuransi', 'regulation_type': 'Peraturan Pemerintah', 'regulation_number': '63 TAHUN 1999', 'effective_date': '2 Juli 1999', 'file_url': 'https://www.ojk.go.id/id/regulasi/Documents/Pages/PP-RI-Nomor-63-Tahun-1999-tentang-Perubahan-atas-Peraturan-Pemerintah-Nomor-73-Tahun-1992/ppas3_1389239909.pdf', 'questions_this_excerpt_can_answer': '1. What is the effective date of Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian?\n- The effective date of the regulation is 2 Juli 1999.\n\n2. What is the file URL for the document containing Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian?\n- The file URL is https://www.ojk.go.id/id/regulasi/Documents/Pages/PP-RI-Nomor-63-Tahun-1999-tentang-Perubahan-atas-Peraturan-Pemerintah-Nomor-73-Tahun-1992/ppas3_1389239909.pdf.\n\n3. What is the subsector of the regulation?\n- The subsector of the regulation is Asuransi.\n\n4. What is the regulation type of Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian?\n- The regulation type is Peraturan Pemerintah.\n\n5. 
What is the title of the regulation?\n- The title of the regulation is "Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian."', 'prev_section_summary': 'The key topics of the section are the Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999, which is a regulation about the changes to the Peraturan Pemerintah Nomor 73 Tahun 1992 regarding the Penyelenggaraan Usaha Perasuransian. The effective date of the regulation is July 2, 1999. The sector and subsector that this regulation belongs to are IKNB and Asuransi, respectively. The file URL for accessing the full text of the regulation is provided. The President of the Republic of Indonesia when this regulation was enacted was Bacharuddin Jusuf Habibie.', 'section_summary': 'The key topics of this section are the Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian. The section also mentions the effective date of the regulation, the file URL for the document, the subsector of the regulation (Asuransi), and the regulation type (Peraturan Pemerintah). The title of the regulation is also provided.'}<br>

**Node ID:** 819b6417-623d-4cf6-a99a-5dcd38fad89d<br>**Similarity:** 0.913331151008606<br>**Text:** Dukungan reasuransi otomatis tersebut sedapat mungkin diperoleh dari 
Perusahaan Asuransi dan Per...<br>**Metadata:** {'file_name': 'ojk-peraturan_pemerintah-63_tahun_1999-02071999-pp_ri_nomor_63_tahun_1999_tentang_perubahan_atas_peraturan_pemerintah_nomor_73_tahun_1992_ppas3_1389239909_pdf.pdf', 'title': 'Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian', 'sector': 'IKNB', 'subsector': 'Asuransi', 'regulation_type': 'Peraturan Pemerintah', 'regulation_number': '63 TAHUN 1999', 'effective_date': '2 Juli 1999', 'file_url': 'https://www.ojk.go.id/id/regulasi/Documents/Pages/PP-RI-Nomor-63-Tahun-1999-tentang-Perubahan-atas-Peraturan-Pemerintah-Nomor-73-Tahun-1992/ppas3_1389239909.pdf', 'questions_this_excerpt_can_answer': '1. What is the effective date of Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian?\n- The effective date of the regulation is 2 Juli 1999.\n\n2. What is the sector and subsector that this regulation pertains to?\n- The regulation pertains to the sector of IKNB (Insurance, Pension Funds, and Other Financial Services) and the subsector of Asuransi (Insurance).\n\n3. What is the regulation type and number of this Peraturan Pemerintah?\n- The regulation type is Peraturan Pemerintah and the regulation number is 63 TAHUN 1999.\n\n4. Where can the full file of this regulation be accessed?\n- The full file of this regulation can be accessed at the following URL: https://www.ojk.go.id/id/regulasi/Documents/Pages/PP-RI-Nomor-63-Tahun-1999-tentang-Perubahan-atas-Peraturan-Pemerintah-Nomor-73-Tahun-1992/ppas3_1389239909.pdf\n\n5. 
What is the title of the regulation?\n- The title of the regulation is "Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999 tentang Perubahan atas Peraturan Pemerintah Nomor 73 Tahun 1992 tentang Penyelenggaraan Usaha Perasuransian."', 'prev_section_summary': 'The key topics of the section include the regulation number and effective date of Peraturan Pemerintah Republik Indonesia Nomor 63 Tahun 1999, the sector and subsector that the regulation pertains to, the maximum percentage of foreign ownership allowed for Perusahaan Asuransi and Perusahaan Reasuransi, specific provisions regarding solvency levels, types of assets, and obligations in the regulation, and the definition and requirement of automatic reinsurance (treaty reinsurance) in the insurance industry. The entities mentioned in the section are Perusahaan Asuransi and Perusahaan Reasuransi.', 'section_summary': 'The key topics of this section include reinsurance, cooperation in risk distribution, penalties for late submission of financial reports, and the regulation of new insurance programs. The entities mentioned in this section are Perusahaan Asuransi (Insurance Companies), Perusahaan Reasuransi (Reinsurance Companies), and Departemen Keuangan (Ministry of Finance).'}<br>

In [20]:
# Round-trip the chat history through JSON to check that it can be saved and restored
chat_store_string = chat_store.json()
loaded_chat_store = SimpleChatStore.parse_raw(chat_store_string)

## **Evaluation**

In [21]:
import os

# Create the output directory if it does not already exist
os.makedirs("./json_data", exist_ok=True)

In [22]:
from llama_index.core.evaluation import (
    generate_question_context_pairs,
    EmbeddingQAFinetuneDataset,
)
if EVAL:
    if STORE:
        qa_dataset = generate_question_context_pairs(
            nodes=nodes,
            llm=llm_eval,
            num_questions_per_chunk=2,
        )
        qa_dataset.save_json("./json_data/pg_eval_dataset.json")
    else:
        qa_dataset = EmbeddingQAFinetuneDataset.from_json("./json_data/pg_eval_dataset.json")
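
For orientation, the evaluation dataset links generated questions to the nodes they were generated from. The sketch below mimics that shape with plain dictionaries. The field names (`queries`, `corpus`, `relevant_docs`) reflect our understanding of the saved JSON and should be checked against the actual file; the IDs and texts are hypothetical.

```python
import json

# Toy dataset with the assumed layout: queries, corpus, and their mapping.
toy_dataset = {
    "queries": {"q1": "Apa kepanjangan dari PAPSI?"},
    "corpus": {"node_1": "PAPSI adalah Pedoman Akuntansi Perbankan Syariah Indonesia."},
    "relevant_docs": {"q1": ["node_1"]},
}


def query_context_pairs(dataset):
    """Yield (question, expected_context) pairs from the toy dataset."""
    for query_id, question in dataset["queries"].items():
        for node_id in dataset["relevant_docs"][query_id]:
            yield question, dataset["corpus"][node_id]


# Round-trip through JSON, as saving to and loading from a file would.
restored = json.loads(json.dumps(toy_dataset))
pairs = list(query_context_pairs(restored))
print(pairs)
```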

### **Retrieval Evaluation**

In [23]:
metrics = ["hit_rate", "mrr", "precision", "recall", "ap", "ndcg"]

In [24]:
from utils.evaluator import get_retrieval_eval_df

if EVAL:
    vector_retrieval_eval_results = await get_retrieval_eval_df(name=f"Top-{TOP_K} Eval", metrics=metrics, retriever=vector_retriever, qa_dataset=qa_dataset)
    display(vector_retrieval_eval_results)
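
Two of the metrics listed above, `hit_rate` and `mrr`, are easy to compute by hand. The sketch below uses their standard definitions on hypothetical retrieval results; it does not call `get_retrieval_eval_df`, whose internals are not shown in this notebook.

```python
def hit_rate(retrieved_ids, expected_ids):
    """1.0 if any expected node appears among the retrieved nodes, else 0.0."""
    return 1.0 if any(node_id in expected_ids for node_id in retrieved_ids) else 0.0


def reciprocal_rank(retrieved_ids, expected_ids):
    """1 / rank of the first expected node, or 0.0 if none was retrieved."""
    for rank, node_id in enumerate(retrieved_ids, start=1):
        if node_id in expected_ids:
            return 1.0 / rank
    return 0.0


# Hypothetical top-3 results for three queries: (retrieved ids, expected ids).
toy_runs = [
    (["n1", "n2", "n3"], {"n1"}),  # expected node at rank 1
    (["n4", "n5", "n6"], {"n5"}),  # expected node at rank 2
    (["n7", "n8", "n9"], {"n0"}),  # expected node not retrieved
]

mean_hit_rate = sum(hit_rate(r, e) for r, e in toy_runs) / len(toy_runs)
mrr = sum(reciprocal_rank(r, e) for r, e in toy_runs) / len(toy_runs)
print(mean_hit_rate, mrr)
```

Hit rate only records whether the expected node was found, while MRR also rewards finding it near the top of the list.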

### **Response Evaluation**

In [25]:
from llama_index.core.evaluation import (
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)

if EVAL:
    # model for evaluation
    relevancy_evaluator = RelevancyEvaluator(llm=llm_eval)
    faithfullness_evaluator = FaithfulnessEvaluator(llm=llm_eval)

#### **Faithfulness Evaluator**

Evaluate Response

In [26]:
from utils.evaluator import get_response_eval_df

if EVAL:
    vector_faithfullness_eval_result = faithfullness_evaluator.evaluate_response(response=response_vector, query=query_str)
    display(get_response_eval_df(query=query_str, response=response_vector, eval_result=vector_faithfullness_eval_result))

Evaluate Source Nodes

In [27]:
from utils.evaluator import get_response_eval_sources_df

if EVAL:
    # Score each source node with the faithfulness evaluator
    faithfulness_eval_sources = get_response_eval_sources_df(query=query_str, response=response_vector, evaluator=faithfullness_evaluator)
    display(faithfulness_eval_sources)

#### **Relevancy Evaluator**

Evaluate Response

In [28]:
if EVAL:
    relevancy_eval_result = relevancy_evaluator.evaluate_response(
        query=query_str, response=response_vector
    )

    display(get_response_eval_df(query=query_str, response=response_vector, eval_result=relevancy_eval_result))

Evaluate Source Nodes

In [29]:
from utils.evaluator import get_response_eval_sources_df

if EVAL:
    relevancy_eval_sources = get_response_eval_sources_df(query=query_str, response=response_vector, evaluator=relevancy_evaluator)
    display(relevancy_eval_sources)

#### **Batch Evaluator (Faithfulness, Relevancy)**

In [30]:
from llama_index.core.evaluation import BatchEvalRunner
from utils.evaluator import get_batch_eval_results
from utils.evaluator import get_batch_eval_df

if EVAL:
    runner = BatchEvalRunner(
        {
            "faithfulness": faithfullness_evaluator, 
            "relevancy": relevancy_evaluator,
        },
        workers=8,
    )

    vector_eval_results = await get_batch_eval_results(runner=runner, qa_dataset=qa_dataset, query_engine=vector_query_engine, num_queries=NUM_BATCH_EVAL)
    display(get_batch_eval_df(vector_eval_results))
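
A common way to summarise batch results is the share of queries that pass each evaluator. The sketch below assumes the batch results form a dictionary from metric name to a list of result objects with a boolean `passing` attribute. `ToyEvalResult` and `pass_rates` are hypothetical stand-ins, and `get_batch_eval_df` may summarise the results differently.

```python
from dataclasses import dataclass


@dataclass
class ToyEvalResult:
    """Stand-in for an evaluation result; only the `passing` flag is modelled."""
    passing: bool


def pass_rates(results_by_metric):
    """Return the fraction of passing results for each metric."""
    return {
        metric: sum(result.passing for result in results) / len(results)
        for metric, results in results_by_metric.items()
        if results
    }


# Hypothetical outcomes for four evaluated queries.
toy_results = {
    "faithfulness": [ToyEvalResult(True), ToyEvalResult(True), ToyEvalResult(False), ToyEvalResult(True)],
    "relevancy": [ToyEvalResult(True), ToyEvalResult(False), ToyEvalResult(False), ToyEvalResult(True)],
}

rates = pass_rates(toy_results)
print(rates)
```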