<a href="https://colab.research.google.com/github/ychoi-kr/llm-api-prog/blob/main/7_langchain/langchain_custom_loader.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install langchain-core==0.2.41 langchain==0.2.16 langchain-openai==0.1.23 langchain-community==0.2.17 tiktoken


In [2]:
#!pip install faiss-gpu
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.4 kB)
Downloading faiss_cpu-1.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.5 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.9.0


In [3]:
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

## Defining the loader class

In [4]:
from typing import List
import requests

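# Import from langchain_core; the langchain.docstore / langchain.document_loaders paths are legacy aliases.
from langchain_core.documents import Document
from langchain_core.document_loaders import BaseLoader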


class WikidocsLoader(BaseLoader):
    def __init__(self, book_id: int, base_url="https://wikidocs.net", **kwargs):
        super().__init__(**kwargs)
        self.book_id = book_id
        self.base_url = base_url
        self.headers = {"Content-Type": "application/json"}

    def load(self) -> List[Document]:
        toc = self._get_toc(self.book_id)
        pages = []
        for item in toc:
            page_id = item["id"]
            page_data = self._get_page(page_id)
            document = Document(
                # `title` is not a Document field, so the title is kept in metadata only.
                page_content=page_data["content"],
                metadata={
                    'id': page_id,
                    'source': f"{self.base_url}/{page_id}",
                    'title': page_data["subject"]
                }
            )
            pages.append(document)

        return pages

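    def _get_toc(self, book_id):
        # Fetch the book's table of contents: a list of items, each with an "id".
        url = f"{self.base_url}/api/v1/toc/{book_id}"
        response = requests.get(url, headers=self.headers, timeout=30)
        if response.status_code == 200:
            return response.json()
        else:
            raise ValueError(f"Failed to get table of contents (HTTP {response.status_code})")

    def _get_page(self, page_id):
        # Fetch one page; load() reads its "subject" and "content" fields.
        url = f"{self.base_url}/api/v1/page/{page_id}"
        response = requests.get(url, headers=self.headers, timeout=30)
        if response.status_code == 200:
            return response.json()
        else:
            raise ValueError(f"Failed to get page {page_id} (HTTP {response.status_code})")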

In [5]:
book_id = 14316  # "생성AI 프로그래밍 트러블슈팅 가이드" (Generative AI Programming Troubleshooting Guide)
loader = WikidocsLoader(book_id)
documents = loader.load()
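
To see what `load()` produces without calling the WikiDocs API, here is a minimal offline sketch of the same mapping. It assumes only what the loader above relies on: TOC items carry an `id`, and each page carries `subject` and `content`. The helper name `build_records` and the sample data are hypothetical, and plain dicts stand in for `Document` objects.

```python
from typing import Callable, Dict, List


def build_records(
    toc: List[dict],
    fetch_page: Callable[[int], dict],
    base_url: str = "https://wikidocs.net",
) -> List[Dict]:
    """Mirror the loop in WikidocsLoader.load(), returning plain dicts."""
    records = []
    for item in toc:
        page_id = item["id"]
        page = fetch_page(page_id)
        records.append(
            {
                "page_content": page["content"],
                "metadata": {
                    "id": page_id,
                    "source": f"{base_url}/{page_id}",
                    "title": page["subject"],
                },
            }
        )
    return records


# Made-up stand-ins for the TOC and page responses (not real WikiDocs data).
fake_toc = [{"id": 1}, {"id": 2}]
fake_pages = {
    1: {"subject": "Intro", "content": "First page."},
    2: {"subject": "Setup", "content": "Second page."},
}

records = build_records(fake_toc, fake_pages.__getitem__)
for record in records:
    print(record["metadata"]["source"], "->", record["metadata"]["title"])
```

The `source` value built here is what the QA chains later report back as the citation for an answer.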

## Building the index

In [6]:
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import FAISS

In [7]:
text_splitter = CharacterTextSplitter(chunk_size=600, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
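
As a rough illustration of what `chunk_size=600, chunk_overlap=0` means, the sketch below splits text on a separator and greedily merges the pieces until the next one would push the chunk past the size limit. This is a simplified stand-in written for this notebook, not LangChain's implementation; the function name `split_on_separator` is hypothetical, and the `"\n\n"` separator is assumed to match the splitter's default.

```python
from typing import List


def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Greedy merge of separator-delimited pieces, with no overlap."""
    pieces = [p for p in text.split(separator) if p]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for piece in pieces:
        # Cost of appending this piece, counting a separator if the chunk is non-empty.
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            # The chunk is full: emit it and start a new one with this piece.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc\n\n" + "d" * 20
chunks = split_on_separator(sample, chunk_size=10)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Note that a single piece longer than `chunk_size` is kept whole in this sketch, so a chunk can exceed the limit when the text has no separator to split on.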



In [8]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [9]:
search_index = FAISS.from_documents(docs, embeddings)
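
Conceptually, the index stores one embedding vector per chunk and answers a query by returning the chunks whose vectors are closest to the query's vector. The sketch below shows that idea as a brute-force search in pure Python. It assumes Euclidean (L2) distance, which is an assumption about the index type here; the toy 2-D vectors are made up, and real embeddings have far more dimensions.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors nearest to the query."""
    scored = sorted((l2_distance(query, v), i) for i, v in enumerate(vectors))
    return [(i, d) for d, i in scored[:k]]


# Made-up vectors standing in for chunk embeddings.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
hits = top_k([0.9, 0.1], toy_vectors, k=2)
for index, distance in hits:
    print(index, round(distance, 3))
```

A vector index library does the same job much faster at scale; the brute-force version is only meant to make the ranking step visible.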

## Question answering (Retrieval QA with Sources)

In [10]:
from langchain.chains import RetrievalQAWithSourcesChain
from langchain_openai import OpenAI

In [11]:
retrieval_qa_with_sources_chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0), chain_type="stuff", retriever=search_index.as_retriever()
)

In [12]:
def retrieval_qa_with_sources(question):
    response = retrieval_qa_with_sources_chain.invoke(
        {"question": question}, return_only_outputs=True
    )
    if response["sources"]:
        # "출처" is Korean for "Sources"; the label is appended after the answer text.
        return response["answer"] + "출처: " + response["sources"]
    else:
        return response["answer"]

In [13]:
# Query (Korean): "How to install the old and the latest versions of the openai package"
print(retrieval_qa_with_sources("openai 패키지 구버전과 최신 버전 설치 방법"))

 openai 패키지의 구버전(0.28)과 최신 버전의 설치 방법은 다음과 같다: 
- 구버전(0.28)으로 고정: `pip install -U openai==0.28` (참조: https://wikidocs.net/229554#installing-openai-0.28)
- 최신 버전 설치: `pip install -U openai` (참조: https://wikidocs.net/229554#installing-latest-openai-package)
- 코드 수정하여 신버전의 패키지 사용: openai>=1.0.0에서는 코드를 다음과 같이 수정하면 오류나 경고가 뜨지 않고 잘 실행된다: 
```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "hello"},
    ],
)
```
참


In [14]:
# Query (Korean): "I'm getting a langchain_community.llms.openai.OpenAI warning"
print(retrieval_qa_with_sources("langchain_community.llms.openai.OpenAI 경고가 떠요"))

 The class `langchain_community.llms.openai.OpenAI` was deprecated in langchain-community 0.0.10 and will be removed in 0.2.0. An updated version of the class exists in the langchain-openai package and should be used instead. To use it run `pip install -U langchain-openai` and import as `from langchain_openai import OpenAI`.
출처: https://wikidocs.net/231843, https://wikidocs.net/235770, https://wikidocs.net/233334


## Question answering (QA with sources)

In [15]:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

In [16]:
template = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES").
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
Respond in Korean.

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER IN KOREAN:"""
PROMPT = PromptTemplate(template=template, input_variables=["summaries", "question"])

qa_with_sources_chain = load_qa_with_sources_chain(
    OpenAI(temperature=0),
    chain_type="stuff",
    prompt=PROMPT
)

See also the following migration guides for replacements based on `chain_type`:
stuff: https://python.langchain.com/v0.2/docs/versions/migrating_chains/stuff_docs_chain
map_reduce: https://python.langchain.com/v0.2/docs/versions/migrating_chains/map_reduce_chain
refine: https://python.langchain.com/v0.2/docs/versions/migrating_chains/refine_chain
map_rerank: https://python.langchain.com/v0.2/docs/versions/migrating_chains/map_rerank_docs_chain

  qa_with_sources_chain = load_qa_with_sources_chain(


In [17]:
def qa_with_sources(question):
    return qa_with_sources_chain.invoke(
        {
            "input_documents": search_index.similarity_search(question, k=3),
            "question": question,
        },
        return_only_outputs=True,
    )["output_text"]
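
With `chain_type="stuff"`, all retrieved chunks are placed into a single prompt through the `{summaries}` slot. The sketch below shows roughly how that assembly could look. The `Content:` / `Source:` layout and the blank-line separator are assumptions about how each chunk is rendered, the helper `stuff_prompt` is hypothetical, and the template is a shortened copy of the one defined above.

```python
from typing import List, Tuple

# Shortened copy of the notebook's template (instruction lines omitted).
TEMPLATE = """QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER IN KOREAN:"""


def stuff_prompt(question: str, docs: List[Tuple[str, str]]) -> str:
    """Join (page_content, source) pairs into one block and fill the template."""
    summaries = "\n\n".join(
        f"Content: {content}\nSource: {source}" for content, source in docs
    )
    return TEMPLATE.format(question=question, summaries=summaries)


# Made-up chunks and URLs, standing in for similarity_search results.
example_docs = [
    ("Pin the old release with pip.", "https://wikidocs.net/1"),
    ("Upgrade to the latest release with pip.", "https://wikidocs.net/2"),
]
prompt = stuff_prompt("How do I install the package?", example_docs)
print(prompt)
```

Because every chunk lands in one prompt, the number of retrieved documents (`k=3` in the cell above) directly controls how much context the model has to read.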


In [18]:
# Query (Korean): "How to install the old and the latest versions of the openai package"
print(qa_with_sources('openai 패키지 구버전과 최신 버전 설치 방법'))

 openai 패키지를 설치하는 방법은 두 가지가 있습니다. 첫 번째 방법은 구버전인 0.28로 고정하는 것이고, 두 번째 방법은 최신 버전으로 설치하는 것입니다. 구버전으로 고정하려면 `pip install -U openai==0.28` 명령을 실행하면 됩니다. 최신 버전으로 설치하려면 `pip install -U openai` 명령을 실행하면 됩니다. 하지만 최신 버전에서는 코드를 수정해야 합니다. 따라서 옵션 2를 선택하면 됩니다. 이때 코드를 수정하는 방법은 두 가지가 있습니다. 첫 번째 방법은 다운그레이드하는 것이고, 두 번째 방법은 코드를 수정하는 것입니다. 다운그레이드하는 방법은 [https://wikidocs.net/229554#installing-openai-0.28](https://wikidocs.net


In [19]:
# Query (Korean): "I'm getting a langchain_community.llms.openai.OpenAI warning"
print(qa_with_sources('langchain_community.llms.openai.OpenAI 경고가 떠요'))


SOURCES: https://wikidocs.net/231843, https://wikidocs.net/235770
