## Reading a Text File and Answering Questions

In [1]:
# Settings file (.env) for managing the API KEY as an environment variable
from dotenv import load_dotenv
# Load the API KEY information
load_dotenv()

True
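`load_dotenv()` reads `KEY=VALUE` pairs from a `.env` file into the process environment, which is where `langchain_openai` later looks for `OPENAI_API_KEY`. As a rough illustration only, assuming a plain `KEY=VALUE` file format, a simplified loader could look like this. `load_env_file` is a hypothetical helper; python-dotenv itself also handles quoting rules, `export` prefixes, and variable expansion.

```python
import os


def load_env_file(path: str, override: bool = False) -> bool:
    """Hypothetical, simplified .env loader: reads KEY=VALUE lines into os.environ.

    Returns True if at least one variable was set.
    """
    loaded = False
    with open(path, encoding="utf-8") as f:
        for raw_line in f:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an "=".
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Trim whitespace and surrounding single/double quotes (simplified).
            key, value = key.strip(), value.strip().strip("'\"")
            # Like python-dotenv's default (override=False), keep existing variables.
            if override or key not in os.environ:
                os.environ[key] = value
                loaded = True
    return loaded
```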

In [2]:
from langchain_teddynote import logging

# Enter the project name.
logging.langsmith("ragstudy")

Starting LangSmith tracing.
[Project name]
ragstudy


In [3]:
# Step 1: Load Documents
from langchain_community.document_loaders import TextLoader
documents = TextLoader("./data/AI.txt").load()
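`TextLoader` reads the whole file into a single document whose `metadata` records the `source` path, which is why the retrieved chunks later show `metadata={'source': './data/AI.txt'}`. A plain-Python sketch of that idea, using a hypothetical `SimpleDocument` dataclass rather than LangChain's own `Document` class:

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> list[SimpleDocument]:
    """Read an entire text file into one document tagged with its source path."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    return [SimpleDocument(page_content=text, metadata={"source": path})]
```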

In [4]:
# Step 2: Split Documents
from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_docs(documents, chunk_size=1000, chunk_overlap=20):
    # Split each document into chunks of at most chunk_size characters,
    # with up to chunk_overlap characters shared between neighbouring chunks.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    docs = text_splitter.split_documents(documents)
    return docs

docs = split_docs(documents)
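`chunk_size` caps the length of each chunk (in characters, since the default length function is `len`), and `chunk_overlap` lets neighbouring chunks share some characters so context is not cut off abruptly at a boundary. `RecursiveCharacterTextSplitter` additionally tries to break on separators (by default paragraphs, then lines, then spaces) before falling back to single characters. The sketch below deliberately ignores separators and only illustrates the size/overlap arithmetic with a fixed sliding window; `split_fixed` is a hypothetical helper, not LangChain's algorithm.

```python
def split_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 20) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a simplified
    illustration, not RecursiveCharacterTextSplitter's separator-aware logic.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1)` returns `['abcd', 'defg', 'ghij']`: each window starts 3 characters after the previous one, so neighbouring chunks share one character.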

In [5]:
print(len(docs))
print(docs[0].page_content)

3
Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of humans or animals. It is a field of study in computer science that develops and studies intelligent machines. Such machines may be called AIs.

AI technology is widely used throughout industry, government, and science. Some high-profile applications are: advanced web search engines (e.g., Google Search), recommendation systems (used by YouTube, Amazon, and Netflix), understanding human speech (such as Google Assistant, Siri, and Alexa), self-driving cars (e.g., Waymo), generative and creative tools (ChatGPT and AI art), and superhuman play and analysis in strategy games (such as chess and Go).[1]


In [6]:
# Step 3: Create Embeddings
from langchain_openai import OpenAIEmbeddings
# Create embeddings with OpenAI's "text-embedding-3-small" model.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
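An embedding model maps each text to a fixed-length vector of floats, and a vector store ranks chunks by how close their vectors are to the query's vector. A common closeness measure is cosine similarity. The sketch below computes it in plain Python on made-up 3-dimensional vectors (real `text-embedding-3-small` vectors have 1536 dimensions). Chroma's default metric is L2 distance rather than cosine, but for unit-length vectors such as OpenAI embeddings the two produce the same ranking.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for real embeddings.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "deep learning chunk": [0.8, 0.2, 0.1],
    "cooking recipe chunk": [0.0, 0.3, 0.9],
}
for name, vec in doc_vecs.items():
    print(f"{name}: {cosine_similarity(query_vec, vec):.3f}")
```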

In [7]:
# Step 4: Create and store the vector DB
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(docs, embeddings)

In [8]:
# Step 5: Create the Retriever
# The retriever searches the documents for relevant information.
retriever = vectorstore.as_retriever()
# Send a query to the retriever and inspect the retrieved chunks.
retriever.invoke("What were two major advancements in AI after 2012 and 2017 that increased interest and funding in the field?")

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


[Document(metadata={'source': './data/AI.txt'}, page_content='Alan Turing was the first person to carry out substantial research in the field that he called Machine Intelligence.[2] Artificial intelligence was founded as an academic discipline in 1956.[3] The field went through multiple cycles of optimism[4][5] followed by disappointment and loss of funding.[6][7] Funding and interest vastly increased after 2012 when deep learning surpassed all previous AI techniques,[8] and after 2017 with the transformer architecture.[9] This led to the AI spring of the 2020s, with companies, universities, and laboratories overwhelmingly based in the United States pioneering significant advances in artificial intelligence.[10]'),
 Document(metadata={'source': './data/AI.txt'}, page_content='Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of humans or animals. It is a field of study in computer science that develops and studies intelligent machi
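The warning above appears because `as_retriever()` requests 4 results by default while this index holds only the 3 chunks produced in step 2, so Chroma lowers `n_results` to 3. Passing `search_kwargs={"k": 3}` to `as_retriever` would request exactly as many as exist. The sketch below mimics that top-k behaviour in plain Python: score every stored vector against the query, clamp `k` to the number of stored items, and return the best matches. `top_k` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: list[float], index: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    if k > len(index):
        # Same idea as the Chroma warning: never return more items than exist.
        print(f"Requested {k} results but index has {len(index)}; using {len(index)}")
        k = len(index)
    ranked = sorted(index, key=lambda doc_id: cosine_similarity(query, index[doc_id]), reverse=True)
    return ranked[:k]


# Toy index with 3 entries, like the 3 chunks in this notebook.
index = {
    "chunk-0": [0.1, 0.9, 0.0],
    "chunk-1": [0.9, 0.1, 0.1],
    "chunk-2": [0.2, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index))  # the default k=4 is clamped to 3
```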

In [9]:
# Step 6: Create the Prompt
from langchain_core.prompts import PromptTemplate
# Create the prompt.
prompt = PromptTemplate.from_template(
    """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Answer in Korean.

#Question: 
{question} 
#Context: 
{context} 

#Answer:"""
)
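`PromptTemplate.from_template` treats `{question}` and `{context}` as input variables (f-string style is the default template format), so filling the prompt substitutes them much like Python's `str.format`. A plain-Python sketch of that substitution, using an abbreviated copy of the template and made-up values:

```python
# Abbreviated copy of the notebook's template (some instruction lines omitted).
TEMPLATE = """You are an assistant for question-answering tasks.
Answer in Korean.

#Question:
{question}
#Context:
{context}

#Answer:"""

# Hypothetical values; in the notebook, context comes from the retriever.
filled = TEMPLATE.format(
    question="What happened after 2017?",
    context="Funding and interest vastly increased after 2017 with the transformer architecture.",
)
print(filled)
```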

In [10]:
# Step 7: Create the Language Model (LLM)
from langchain_openai import ChatOpenAI
# Create the model (LLM); temperature=0 keeps answers as deterministic as possible.
llm = ChatOpenAI(model="gpt-4o", temperature=0)
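`temperature` rescales the model's next-token scores before sampling: lower values sharpen the distribution toward the most likely token, and `temperature=0` is used here to make answers as repeatable as possible (small run-to-run differences can still occur in practice). The sketch below shows a temperature-scaled softmax on made-up logits, treating 0 as pure greedy selection; it illustrates the concept and is not OpenAI's internal sampling code.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; temperature 0 means greedy (argmax)."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up next-token scores
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```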

In [11]:
# Step 8: Create the Chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
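In LCEL, `|` chains runnables left to right, and the leading dict is turned into a step that runs each value on the same input: the question goes to `retriever` (producing documents for `{context}`) and through `RunnablePassthrough()` unchanged (for `{question}`). The retriever returns a list of `Document` objects, which reaches `{context}` as their string representation; a small function that joins each `page_content` is a common way to pass cleaner text. The toy below imitates this data flow with a hypothetical `Step` class and stand-in functions; it is not LangChain's `Runnable` implementation.

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b  ->  a new step that feeds a's output into b
        return Step(lambda x: other.func(self.func(x)))

    def invoke(self, x: Any) -> Any:
        return self.func(x)


def parallel(branches: dict[str, Step]) -> Step:
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda x: {key: step.invoke(x) for key, step in branches.items()})


# Stand-ins for the notebook's components (all hypothetical).
fake_retriever = Step(lambda q: ["Deep learning surpassed earlier techniques after 2012.",
                                 "The transformer architecture arrived in 2017."])
format_docs = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda x: x)
fake_prompt = Step(lambda d: f"#Question:\n{d['question']}\n#Context:\n{d['context']}\n#Answer:")
fake_llm = Step(lambda text: f"[model saw {len(text)} characters]")

# Named toy_chain so it does not overwrite the notebook's real `chain`.
toy_chain = parallel({"context": fake_retriever | format_docs, "question": passthrough}) | fake_prompt | fake_llm
print(toy_chain.invoke("What advanced AI after 2012 and 2017?"))
```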

In [12]:
# Run the Chain
# Enter a question about the document and print the answer.
question = "What were two major advancements in AI after 2012 and 2017 that increased interest and funding in the field?"
response = chain.invoke(question)
print(response)

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


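2012년 이후 딥러닝이 이전의 모든 AI 기술을 능가한 것과 2017년 이후 트랜스포머 아키텍처가 도입된 것이 AI 분야에 대한 관심과 자금 지원을 크게 증가시킨 두 가지 주요 발전입니다.
(English translation: Deep learning surpassing all previous AI techniques after 2012, and the introduction of the transformer architecture after 2017, are the two major advancements that greatly increased interest and funding in the AI field.)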
