{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Financial Expert Chatbot with Exaone and RAG\n",
    "\n",
    "This notebook implements a financial expert chatbot using the Exaone model via Ollama.\n",
    "\n",
    "The chatbot uses a Retrieval-Augmented Generation (RAG) pipeline to answer questions based on the financial documents located in the `output/concept_details` directory."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Install necessary packages"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "%pip install -q langchain langchain-community langchain-ollama faiss-cpu unstructured markdown"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Load Documents\n",
    "\n",
    "Load the markdown files from the `output/concept_details` directory."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain_community.document_loaders import DirectoryLoader\n",
    "\n",
    "# The documents are in the `concept_details` subdirectory.\n",
    "loader = DirectoryLoader('output/concept_details/', glob=\"**/*.md\", show_progress=True)\n",
    "docs = loader.load()"
   ],
   "outputs": [],
   "execution_count": null
  },
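  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Split Documents into Chunks\n",
    "\n",
    "Split the loaded documents into chunks of up to 500 characters, with a 50-character overlap between neighboring chunks, so that each chunk can be embedded and indexed in the vector store."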
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)\n",
    "splits = text_splitter.split_documents(docs)"
   ],
   "outputs": [],
   "execution_count": null
  },
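  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how `chunk_size` and `chunk_overlap` interact, the cell below is a minimal sketch that runs the same splitter class on a short made-up string. The sample text and the small sizes (40 and 10) are assumptions chosen only for illustration; they are not the settings used for the real documents."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "# Hypothetical sample text, not taken from the real documents.\n",
    "sample_text = (\n",
    "    \"Operating profit is revenue minus operating expenses. \"\n",
    "    \"Revenue is the income from normal business activity. \"\n",
    "    \"Both figures are reported per business segment.\"\n",
    ")\n",
    "\n",
    "# Small sizes make the chunk boundaries easy to see.\n",
    "demo_splitter = RecursiveCharacterTextSplitter(chunk_size=40, chunk_overlap=10)\n",
    "demo_chunks = demo_splitter.split_text(sample_text)\n",
    "\n",
    "for i, chunk in enumerate(demo_chunks):\n",
    "    # Print the index, the length in characters, and the chunk itself.\n",
    "    print(i, len(chunk), repr(chunk))"
   ],
   "outputs": [],
   "execution_count": null
  },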
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Setup LLM and Embeddings\n",
    "\n",
    "Initialize the Exaone LLM and the embedding model from Ollama."
   ]
  },
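  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain_ollama import ChatOllama, OllamaEmbeddings\n",
    "\n",
    "# Chat model served by a local Ollama instance (the model must already be pulled).\n",
    "llm = ChatOllama(model=\"exaone3.5:7.8b\")\n",
    "# The same model is reused for embeddings here; a dedicated embedding model may retrieve better.\n",
    "embeddings = OllamaEmbeddings(model=\"exaone3.5:7.8b\")"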
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Create Vector Store\n",
    "\n",
    "Create a FAISS vector store from the document chunks and the Ollama embeddings."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain_community.vectorstores import FAISS\n",
    "\n",
    "vectorstore = FAISS.from_documents(documents=splits, embedding=embeddings)"
   ],
   "outputs": [],
   "execution_count": null
  },
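  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before building the chain, it can help to check what the vector store returns for a query. The sketch below assumes the `vectorstore` object from the previous cell; the query string and `k=3` are illustrative choices. With the default FAISS index in LangChain the score is a distance, so lower values should mean closer matches."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Assumes `vectorstore` was created in the cell above.\n",
    "query = \"영업이익\"  # \"operating profit\"; hypothetical test query\n",
    "results = vectorstore.similarity_search_with_score(query, k=3)\n",
    "\n",
    "for doc, score in results:\n",
    "    # `source` is the file path recorded by the loader, if present.\n",
    "    source = doc.metadata.get(\"source\", \"unknown\")\n",
    "    print(f\"{score:.4f}  {source}\")\n",
    "    print(doc.page_content[:100].replace(\"\\n\", \" \"))\n",
    "    print(\"-\" * 40)"
   ],
   "outputs": [],
   "execution_count": null
  },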
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. Create RAG Chain\n",
    "\n",
    "Create a retrieval chain that:\n",
    "1. Retrieves relevant document splits from the vector store.\n",
    "2. Passes them to a prompt template.\n",
    "3. Sends the combined prompt to the Exaone model to generate an answer."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain.chains.combine_documents import create_stuff_documents_chain\n",
    "from langchain.chains import create_retrieval_chain\n",
    "\n",
    "retriever = vectorstore.as_retriever()\n",
    "\n",
    "template = \"\"\"\n",
    "You are a helpful financial expert assistant. \n",
    "Answer the user's question based only on the following context.\n",
    "If you don't know the answer, just say that you don't know.\n",
    "\n",
    "Context:\n",
    "{context}\n",
    "\n",
    "Question: {input}\n",
    "\"\"\"\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(template)\n",
    "\n",
    "document_chain = create_stuff_documents_chain(llm, prompt)\n",
    "retrieval_chain = create_retrieval_chain(retriever, document_chain)"
   ],
   "outputs": [],
   "execution_count": null
  },
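  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, the \"stuff\" step joins the retrieved chunks into one string and substitutes it for `{context}` in the prompt. The cell below is a plain-Python sketch of that idea with made-up chunks; `build_prompt` is a hypothetical helper for illustration, not part of LangChain, and the real chain may format documents differently."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Hypothetical helper that mimics what the chain does conceptually.\n",
    "def build_prompt(template: str, chunks: list[str], question: str) -> str:\n",
    "    # Join chunks with blank lines, then fill the template placeholders.\n",
    "    context = \"\\n\\n\".join(chunks)\n",
    "    return template.format(context=context, input=question)\n",
    "\n",
    "sample_template = \"Context:\\n{context}\\n\\nQuestion: {input}\\n\"\n",
    "sample_chunks = [\"Chunk A text.\", \"Chunk B text.\"]  # made-up chunks\n",
    "print(build_prompt(sample_template, sample_chunks, \"What is in the context?\"))"
   ],
   "outputs": [],
   "execution_count": null
  },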
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7. Ask a Question!\n",
    "\n",
    "Now you can ask your financial chatbot questions."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Question (Korean): \"What is the company's operating profit?\"\n",
    "response = retrieval_chain.invoke({\"input\": \"회사의 영업이익은 얼마인가요?\"})\n",
    "print(response[\"answer\"])"
   ],
   "outputs": [],
   "execution_count": null
  },
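  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dictionary returned by a chain built with `create_retrieval_chain` also carries the retrieved documents under the `context` key, next to `input` and `answer`. The sketch below, which assumes the `response` from the previous cell, lists the files the answer was grounded in. The `source` metadata key is what the document loader is expected to set; the fallback value covers the case where it is missing."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Assumes `response` comes from the retrieval_chain.invoke call above.\n",
    "for doc in response[\"context\"]:\n",
    "    source = doc.metadata.get(\"source\", \"unknown\")\n",
    "    # Show a short one-line preview of each retrieved chunk.\n",
    "    preview = doc.page_content[:80].replace(\"\\n\", \" \")\n",
    "    print(f\"{source}: {preview}\")"
   ],
   "outputs": [],
   "execution_count": null
  },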
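  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Question (Korean): \"Please tell me the revenue and operating profit of the biopharmaceutical segment.\"\n",
    "response = retrieval_chain.invoke({\"input\": \"바이오의약품 부문의 매출액과 영업이익을 알려주세요.\"})\n",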
    "print(response[\"answer\"])"
   ],
   "outputs": [],
   "execution_count": null
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "celt",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

