In [1]:
!pip install jupyter openai minsearch requests python-dotenv



### RAG

In [3]:
import json

# The data was cleaned in extended-data-preprocessing.ipynb
with open("quran_with_tafsir.json", "r", encoding="utf-8") as f:
    documents = json.load(f)

In [4]:
from minsearch import AppendableIndex

index = AppendableIndex(
    text_fields=["question", "text", "section"],
    keyword_fields=["surah_number", "surah_name", "surah_translation", "ayah_number", "reference", "text", "language", "tafsir_text", "tafsir_source"]
)

index.fit(documents)

<minsearch.append.AppendableIndex at 0x15f9ea090>

In [5]:
def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        boost_dict=boost,
        num_results=5,
        output_ids=True
    )

    return results

In [6]:
prompt_template = """
You are an Imam and a teacher of the QURAN.

You're given a QUESTION from a person and that you need to answer with provided CONTEXT. And if there is no CONTEXT you can use your own knowledge.
At the beginning the context is EMPTY.

The CONTEXT is build with the QURAN and documents of TAFSIR.
SEARCH_QUERIES contains the queries that were used to retrieve the documents from QURAN to and add them to the context.
PREVIOUS_ACTIONS contains the actions you already performed.

At the beginning the CONTEXT is empty.

When answering:
- Use clear, respectful, and simple language. 
- Quote directly from the Qur’an or tafsir when relevant. 
- Always include the surah and ayah reference (e.g., Surah Al-Fatiha 1:5).
- If the Qur’an text alone does not fully answer the QUESTION and you use tafsir (explanatory commentary) to clarify, explicitly label it as 'Tafsir clarification'.

You can perform the following actions:

- Search in the QURAN and TAFSIR database to get more data for the CONTEXT
- Answer the question using the CONTEXT
- Answer the question using your own knowledge

For the SEARCH action, build search requests based on the CONTEXT and the QUESTION.
Carefully analyze the CONTEXT and generate the requests to deeply explore the topic. 

Don't use search queries used at the previous iterations.

Don't repeat previously performed actions.

Don't perform more than {max_iterations} iterations for a given student question.
The current iteration number: {iteration_number}. If we exceed the allowed number 
of iterations, give the best possible answer with the provided information.

Output templates:

If you want to perform search, use this template:

{{
"action": "SEARCH",
"reasoning": "<add your reasoning here>",
"keywords": ["search query 1", "search query 2", ...]
}}

If you can answer the QUESTION using CONTEXT, use this template:

{{
"action": "ANSWER_CONTEXT",
"answer": "<your answer>",
"source": "CONTEXT"
}}

If the context doesn't contain the answer, use your own knowledge to answer the question

{{
"action": "ANSWER",
"answer": "<your answer>",
"source": "OWN_KNOWLEDGE"
}}

<QUESTION>
{question}
</QUESTION>

<SEARCH_QUERIES>
{search_queries}
</SEARCH_QUERIES>

<CONTEXT> 
{context}
</CONTEXT>

<PREVIOUS_ACTIONS>
{previous_actions}
</PREVIOUS_ACTIONS>
""".strip()

In [41]:
def build_context(search_results):
    context = ""

    for doc in search_results:
        context = context + f"surah_name: {doc['surah_name']}\nreference: {doc['reference']}\nquran_text: {doc['text']}\ntafsir: {doc["tafsir_text"]}\n\n"
    return context

In [19]:
question = 'In which verses is Moses mentioned?'

In [21]:
search_results = search(question)
search_results

[{'surah_number': 8,
  'surah_name': 'Al-Anfal',
  'surah_translation': 'The Spoils of War',
  'ayah_number': 2,
  'reference': '8:2',
  'text': 'The believers are only those who, when Allah is mentioned, their hearts become fearful, and when His verses are recited to them, it increases them in faith; and upon their Lord they rely',
  'language': 'English',
  'tafsir_text': 'Qualities of the Faithful and Truthful Believers `Ali bin Abi Talhah reported that Ibn `Abbas said about the Ayah, إِنَّمَا الْمُؤْمِنُونَ الَّذِينَ إِذَا ذُكِرَ اللَّهُ وَجِلَتْ قُلُوبُهُمْ (The believers are only those who, when Allah is mentioned, feel a fear in their hearts) "None of Allah\'s remembrance enters the hearts of the hypocrites upon performing what He has ordained. They neither believe in any of Allah\'s Ayat nor trust (in Allah) nor pray if they are alone nor pay the Zakah due on their wealth. Allah stated that they are not believers. He then described the believers by saying, إِنَّمَا الْمُؤْمِنُو

In [23]:
from dotenv import load_dotenv
import os
load_dotenv()

True

In [25]:
from openai import OpenAI
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def llm(prompt):
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

### Making Agentic RAG

In [27]:
max_iterations = 3
iteration_number = 0
search_queries = []
search_results = []
previous_actions = []

In [28]:
# def agentic_rag_v1(question):
#     context = "EMPTY"
#     prompt = prompt_template.format(question=question, context=context)
#     answer_json = llm(prompt)
#     answer = json.loads(answer_json)
#     print(answer)

#     if answer['action'] == 'SEARCH':
#         print('need to perform search...')
#         search_results = search(question)
#         context = build_context(search_results)
        
#         prompt = prompt_template.format(question=question, context=context)
#         answer_json = llm(prompt)
#         answer = json.loads(answer_json)

#     return answer

In [35]:
def agentic_search(question):
    search_queries = []
    search_results = []
    previous_actions = []

    iteration = 0
    
    while True:
        print(f'ITERATION #{iteration}...')
    
        context = build_context(search_results)
        prompt = prompt_template.format(
            question=question,
            context=context,
            search_queries="\n".join(search_queries),
            previous_actions='\n'.join([json.dumps(a) for a in previous_actions]),
            max_iterations=3,
            iteration_number=iteration
        )
    
        print(prompt)
    
        answer_json = llm(prompt)
        answer = json.loads(answer_json)
        print(json.dumps(answer, indent=2))

        previous_actions.append(answer)
    
        action = answer['action']
        if action != 'SEARCH':
            break
    
        keywords = answer['keywords']
        search_queries = list(set(search_queries) | set(keywords))

        for k in keywords:
            res = search(k)
            search_results.extend(res)
    
        # search_results = dedup(search_results)
        
        iteration = iteration + 1
        if iteration >= 4:
            break
    
        print()

    return answer

In [37]:
question = 'In which verses is Moses mentioned?'

In [43]:
answer = agentic_search(question)

ITERATION #0...
You are an Imam and a teacher of the QURAN.

You're given a QUESTION from a person and that you need to answer with provided CONTEXT. And if there is no CONTEXT you can use your own knowledge.
At the beginning the context is EMPTY.

The CONTEXT is build with the QURAN and documents of TAFSIR.
SEARCH_QUERIES contains the queries that were used to retrieve the documents from QURAN to and add them to the context.
PREVIOUS_ACTIONS contains the actions you already performed.

At the beginning the CONTEXT is empty.

When answering:
- Use clear, respectful, and simple language. 
- Quote directly from the Qur’an or tafsir when relevant. 
- Always include the surah and ayah reference (e.g., Surah Al-Fatiha 1:5).
- If the Qur’an text alone does not fully answer the QUESTION and you use tafsir (explanatory commentary) to clarify, explicitly label it as 'Tafsir clarification'.

You can perform the following actions:

- Search in the QURAN and TAFSIR database to get more data for th

In [49]:
print(answer['answer'])

Moses, also known as Musa in Arabic, is mentioned in several verses throughout the Qur'an. Some notable verses include: 

1. **Surah Al-Baqarah (2:51)**: 'And [recall] when We made an appointment with Moses for forty nights...'
2. **Surah Ash-Shu'ara (26:24)**: '[Moses] said, "The Lord of the heavens and earth and that between them..."'
3. **Surah Al-Qasas (28:45)**: 'But We produced [many] generations [after Moses]...'
4. **Surah Al-Anfal (8:31)**: 'And when Our verses are recited to them, they say, "We have heard..."'
5. **Surah Al-A'raf (7:174)**: 'And thus do We [explain in] detail the verses, and perhaps they will return...'

These verses reflect his pivotal role as a prophet and the events surrounding his life and mission.


In [51]:
answer

{'action': 'ANSWER_CONTEXT',
 'answer': 'Moses, also known as Musa in Arabic, is mentioned in several verses throughout the Qur\'an. Some notable verses include: \n\n1. **Surah Al-Baqarah (2:51)**: \'And [recall] when We made an appointment with Moses for forty nights...\'\n2. **Surah Ash-Shu\'ara (26:24)**: \'[Moses] said, "The Lord of the heavens and earth and that between them..."\'\n3. **Surah Al-Qasas (28:45)**: \'But We produced [many] generations [after Moses]...\'\n4. **Surah Al-Anfal (8:31)**: \'And when Our verses are recited to them, they say, "We have heard..."\'\n5. **Surah Al-A\'raf (7:174)**: \'And thus do We [explain in] detail the verses, and perhaps they will return...\'\n\nThese verses reflect his pivotal role as a prophet and the events surrounding his life and mission.',
 'source': 'CONTEXT'}