### Custom MBA courses RAG project
---

In [1]:
# Go one directory level up so the notebook can import the project's source packages
import sys
import os
# Add project root to Python path
project_root = os.path.abspath("..")  # go one level up from notebooks/
sys.path.append(project_root)

cert_path = "C:/Users/Sanzhar/Downloads/certificate.cer"

if os.path.exists(cert_path):
    os.environ['SSL_CERT_FILE'] = cert_path
    os.environ['REQUESTS_CA_BUNDLE'] = cert_path
    os.environ['HTTPLIB2_CA_CERTS'] = cert_path
    # This one is for newer HTTPX/AIOHTTP clients
    os.environ['SSL_CERT_DIR'] = os.path.dirname(cert_path)
else:
    print(f"❌ ERROR: Certificate file not found at {cert_path}")
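
The environment variables above only help if the file is in a format the HTTP clients can read. A `.cer` file may be either PEM (text) or DER (binary), and tools that read `SSL_CERT_FILE` or `REQUESTS_CA_BUNDLE` generally expect PEM. The following is a minimal sketch of a format check using only the standard `ssl` module; `describe_ca_file` is a hypothetical helper name, not part of this project.

```python
import ssl


def describe_ca_file(path):
    """Return 'pem', 'der', or None depending on how the ssl module can load the file.

    Hypothetical helper for illustration only.
    """
    try:
        # The cafile argument expects PEM-encoded certificates.
        ssl.create_default_context(cafile=path)
        return "pem"
    except OSError:
        # ssl.SSLError and FileNotFoundError are both OSError subclasses.
        pass

    try:
        with open(path, "rb") as f:
            data = f.read()
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        # When cadata is a bytes object, it is interpreted as DER.
        ctx.load_verify_locations(cadata=data)
        return "der"
    except (OSError, ValueError):
        # ValueError covers an empty file; SSLError covers unparsable data.
        return None
```

If this returns `"der"`, converting the certificate to PEM before pointing the environment variables at it is likely to be necessary; if it returns `None`, the file is missing or is not a certificate at all.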

### Test the RagChatWorkflow class

In [2]:
# Test the refactored RAG workflow class
from core.src.rag.rag_workflow import RagChatWorkflow

import nest_asyncio
nest_asyncio.apply()

rag_chat = RagChatWorkflow()

Parsing nodes: 100%|██████████| 337/337 [00:00<00:00, 5983.70it/s]
Generating embeddings: 100%|██████████| 337/337 [00:11<00:00, 28.64it/s]
Parsing nodes: 100%|██████████| 205/205 [00:00<00:00, 418.87it/s]
Generating embeddings: 100%|██████████| 217/217 [00:12<00:00, 17.59it/s]
Parsing nodes: 100%|██████████| 203/203 [00:00<00:00, 6126.99it/s]
Generating embeddings: 100%|██████████| 209/209 [00:05<00:00, 37.24it/s]
Parsing nodes: 100%|██████████| 306/306 [00:00<00:00, 8305.07it/s]
Generating embeddings: 100%|██████████| 306/306 [00:08<00:00, 36.63it/s]
Parsing nodes: 100%|██████████| 292/292 [00:00<00:00, 9997.52it/s]
Generating embeddings: 100%|██████████| 292/292 [00:04<00:00, 58.77it/s]
Parsing nodes: 100%|██████████| 365/365 [00:00<00:00, 7155.01it/s]
Generating embeddings: 100%|██████████| 365/365 [00:12<00:00, 28.99it/s]


In [3]:
user_name = "SomeUser"
user_id = "01"
user_query = "What are the differences between Financial and Managerial accounting?"

rag_response = await rag_chat.run(
    user_query=user_query,
    user_name=user_name,
    user_id=user_id,
)
print(rag_response)

2026-01-05 15:11:27,049 - INFO - AFC is enabled with max remote calls: 10.
2026-01-05 15:11:28,451 - INFO - Selecting retriever 0: This choice describes financial accounting, focusing on reading, analyzing, and interpreting financial accounting information for external stakeholders. It details the financial statements and the principles governing their preparation, which are key aspects of financial accounting. While it doesn't explicitly contrast with managerial accounting, it provides a clear definition of one side of the comparison..
2026-01-05 15:11:28,479 - INFO - AFC is enabled with max remote calls: 10.
2026-01-05 15:11:29,378 - INFO - AFC is enabled with max remote calls: 10.


SomeUser, Financial accounting is an external accounting system prepared under external rules like IFRS and US-GAAP, audited by external firms, and intended for external decision-makers such as shareholders and creditors, measuring business activities in monetary units. Managerial accounting, conversely, is an internal accounting system prepared under internal guidelines, is not audited, and is intended for internal managers, utilizing both monetary and non-monetary units.


In [4]:
rag_response = await rag_chat.run(
    user_query="Summarize in 2 sentences what we discussed so far",
    user_name=user_name,
    user_id=user_id,
)
print(rag_response)

2026-01-05 15:11:36,681 - INFO - AFC is enabled with max remote calls: 10.
2026-01-05 15:11:38,236 - INFO - Selecting retriever 0: This course focuses on reading, analyzing, and interpreting financial accounting information to make informed business decisions, covering financial statements, accounting mechanics, and analysis methods..
2026-01-05 15:11:38,238 - INFO - Selecting retriever 1: This course provides an analytical study of financial flows and decision-making, covering foundational theories, capital markets, and valuation principles..
2026-01-05 15:11:38,239 - INFO - Selecting retriever 5: This course focuses on formulating and implementing business and corporate strategy to achieve superior performance, integrating different perspectives for complex management situations..
2026-01-05 15:11:38,310 - INFO - AFC is enabled with max remote calls: 10.
2026-01-05 15:11:38,772 - INFO - AFC is enabled with max remote calls: 10.


SomeUser, we have discussed the fundamental differences between financial and managerial accounting, highlighting their distinct purposes, reporting standards, audiences, and the types of information they convey. Financial accounting serves external stakeholders with audited reports adhering to GAAP or IFRS, while managerial accounting supports internal decision-making with unaudited, often non-monetary data.


### ETL prototyping

In [10]:
from etl.md_chunker import hybrid_hierarchical_chunking

dir_path = "../documents/markdown_files/md_files"

nodes = hybrid_hierarchical_chunking(dir_path = dir_path)
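
The implementation of `hybrid_hierarchical_chunking` is not shown in this notebook. As a rough illustration of what a header-then-size ("hybrid") chunker could look like, here is a minimal sketch in plain Python. Everything in it is an assumption: the function name `chunk_markdown`, the `max_chars` parameter, and the `Context: ... / Content: ...` prefix, which is only modeled on the chunk text printed elsewhere in this notebook.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text, max_chars=500):
    """Split Markdown by headers, then pack paragraphs into size-bounded chunks.

    Illustrative sketch only; not the project's actual chunker.
    """
    sections = []  # (header path, body lines) pairs
    path = []      # stack of (level, title) for the current header hierarchy
    body = []

    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            if body:
                sections.append((" / ".join(t for _, t in path), body))
            level, title = len(match.group(1)), match.group(2).strip()
            # Leave any headers at the same or a deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, title))
            body = []
        else:
            body.append(line)
    if body:
        sections.append((" / ".join(t for _, t in path), body))

    chunks = []
    for context, lines in sections:
        paragraphs = [p.strip() for p in "\n".join(lines).split("\n\n") if p.strip()]
        current = ""
        for paragraph in paragraphs:
            if current and len(current) + len(paragraph) + 2 > max_chars:
                chunks.append(f"Context: {context or '/'}\nContent: {current}")
                current = paragraph
            else:
                current = f"{current}\n\n{paragraph}" if current else paragraph
        if current:
            chunks.append(f"Context: {context or '/'}\nContent: {current}")
    return chunks
```

Note that a single paragraph longer than `max_chars` is kept whole in this sketch; a production chunker would probably split it further, for example by sentence or token count.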

In [9]:
from langchain_text_splitters import MarkdownHeaderTextSplitter



'Context: /\nContent: # В \r\n\r\nВычислите (10.10-10.11):\r\n10.10. 1) $\\left(\\left(\\frac{1}{2}\\right)^{\\sqrt{2}}\\right)^{-\\sqrt{8}}$\r\n2) $((\\sqrt[3]{6})^{\\sqrt{3}})^{-3 \\sqrt{3}}$;\r\n3) $8^{\\frac{2}{3}}-\\left(\\frac{1}{16}\\right)^{-0,75}+\\left(\\frac{1}{9}\\right)^{1,5}$\r\n4) $\\left(64^{\\frac{1}{2}}+\\frac{4}{8}\\right)^{0} \\cdot\\left(343^{\\frac{1}{3}}-81^{\\frac{1}{2}}\\right)$.\r\n10.11. 1) $-0,027^{\\frac{1}{3}}+\\left(\\frac{1}{6}\\right)^{-1}-3^{-1}+(5,5)^{0}$;\r\n2) $\\left(\\left(\\frac{3}{4}\\right)^{0}\\right)^{-0,5}-7,5-\\left(\\sqrt[4]{4^{\\frac{3}{2}}}\\right)^{2}-2 \\cdot(-2)^{4}$;\r\n3) $(0,008)^{\\frac{9}{3}} \\cdot(0,64)^{0,5}:(0,04)^{-0,5}:(0,25)^{-1,5}$;\r\n4) $0,125^{\\frac{1}{3}}-\\left(-\\frac{1}{6}\\right)^{-2}+256^{0,75}+(1,2)^{0}$.\r\n10.12. '

In [14]:
import os
import re

def get_markdown_depth_from_dir(directory_path):
    """Return the sorted header levels (1-6) used across all .md files in a directory."""
    all_levels = set()

    # 1. Loop through every Markdown file in the folder
    for filename in os.listdir(directory_path):
        if filename.endswith(".md"):
            file_path = os.path.join(directory_path, filename)

            # 2. Read the file and record the length of each run of leading '#'
            with open(file_path, 'r', encoding='utf-8') as f:
                content = f.read()
            headers = re.findall(r'^(#{1,6})\s', content, re.MULTILINE)
            all_levels.update(len(h) for h in headers)

    return sorted(all_levels)

print(f"Your documents use these header levels: {get_markdown_depth_from_dir(dir_path)}")

Your documents use these header levels: [1, 2]
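
One caveat with the regex scan above: a line such as `# a comment` inside a fenced code block also counts as a header. If the documents contain code fences, a variant that skips them may give a more accurate answer. The sketch below is a simplified assumption (the helper name `header_levels` is hypothetical, and it toggles on any fence marker without checking that the opening and closing markers match).

```python
import re

FENCE_RE = re.compile(r"^\s{0,3}(`{3,}|~{3,})")
HEADER_RE = re.compile(r"^(#{1,6})\s")


def header_levels(markdown_text):
    """Return sorted header levels, ignoring lines inside fenced code blocks."""
    levels = set()
    in_fence = False
    for line in markdown_text.splitlines():
        if FENCE_RE.match(line):
            # Entering or leaving a fenced block; the fence line is never a header.
            in_fence = not in_fence
            continue
        if not in_fence:
            match = HEADER_RE.match(line)
            if match:
                levels.add(len(match.group(1)))
    return sorted(levels)
```

It can be applied per file by passing the string returned from `f.read()` and merging the resulting lists.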


In [35]:
from langchain_text_splitters import MarkdownHeaderTextSplitter

# Note: os.listdir returns entries in arbitrary order, so index 0 is not a stable choice of file
file_name = os.listdir(dir_path)[0]
file_path = os.path.join(dir_path, file_name)
with open(file_path, 'r', encoding='utf-8') as f:
    markdown_document = f.read()

# Define the headers to split on and the metadata key for each level
headers_to_split_on = [
    ("#", "Header 1"),
    # ("##", "Header 2"),
    # ("###", "Header 3"),
    # ("####", "Header 4"),
    # ("#####", "Header 5"),
    # ("######", "Header 6"),
]

# Initialize the splitter (strip_headers=True is the default)
markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on
)

# Split the text
md_header_splits = markdown_splitter.split_text(markdown_document)

md_header_splits

[Document(metadata={'Header 1': 'АЛГЕБРА и НАЧАЛА АНАЛИЗА'}, page_content='## Учебник  \n## 11  \n## Естественно-математическое направление  \n---  \n![[алгебра 11 рус_p2_img1.jpeg]]\n*Книга предоставлена исключительно в образовательных целях согласно Приказа Министра образования и науки Республики Казахстан от 17 мая 2019 года № 217  \n---'),
 Document(metadata={'Header 1': 'ВВЕДЕНИЕ'}, page_content='Дорогие учащиеся! Предлагаемый вам учебник является продолжением курса "Алгебра и начала анализа" для 10 класса естественноматематического направления.  \nВ 11 классе вы будете изучать такие понятия как первообразная, неопределенный и определенный интегралы, корень п-й степени, степень с рациональным и иррациональным показателями, логарифм, степенная, показательная и логарифмическая функции, комплексные числа, дифференциальное уравнение, дискретные и интервальные вариационные ряды. Вы научитесь решать иррациональные, показательные, логарифмические уравнения, неравенства и их системы, дифф