<a href="https://colab.research.google.com/github/shhuangmust/AI/blob/master/RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

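# Install the required packages
* langchain: the core LangChain package
* langchain-openai: the OpenAI chat-model and embedding integrations
* langchain-community: community integrations, including the Chroma vector store wrapper and the document loaders
* chromadb: the vector database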

In [1]:
!pip install langchain
!pip install langchain-openai
!pip install langchain-community
!pip install chromadb
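
LangChain has moved several import paths between releases, so it can help to record which versions the cell above actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The four packages installed by the cell above.
names = ["langchain", "langchain-openai", "langchain-community", "chromadb"]
for name, ver in installed_versions(names).items():
    print(f"{name}: {ver or 'not installed'}")
```

Pinning the printed versions in the `pip install` lines is one way to keep the notebook reproducible.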



## Load the environment variables

In [2]:
# Import the Colab Secrets user-data module
from google.colab import userdata

# Set the OpenAI API key
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
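
The cell above only works inside Google Colab. As a hedged sketch for running the notebook elsewhere, the helper below tries Colab Secrets first, then an existing environment variable, and finally an interactive prompt. The function name `load_openai_key` and its `ask` parameter are assumptions made for illustration.

```python
import os
from getpass import getpass


def load_openai_key(ask=getpass):
    """Return the OpenAI API key from Colab Secrets, the environment, or a prompt."""
    try:
        # This import succeeds only inside Google Colab.
        from google.colab import userdata
        key = userdata.get("OPENAI_API_KEY")
        if key:
            return key
    except Exception:
        # Not on Colab, or the secret is missing or not shared with this notebook.
        pass
    # Fall back to the environment, then to an interactive prompt.
    return os.environ.get("OPENAI_API_KEY") or ask("OPENAI_API_KEY: ")


# Usage (left commented out so the sketch never prompts on its own):
# os.environ["OPENAI_API_KEY"] = load_openai_key()
```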

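### Set up the OpenAI API first
Load the large language model through LangChain's `ChatOpenAI` class, using `gpt-3.5-turbo` with the maximum output length set to 512 tokens. Calls to the OpenAI API are billed.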

In [3]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.3,
    max_tokens=512,
    )



### Test question answering without RAG

In [4]:
# Query: "Who was the third president during the junior-college (工專) period?"
llm.invoke("工專時期第3任校長是誰?")

AIMessage(content='工專時期第3任校長是陳炳煌。', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 22, 'prompt_tokens': 23, 'total_tokens': 45, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-411bf0c5-55e5-4c71-b54c-8eed59d2f8f9-0')

In [5]:
# Query: "What is the motto of Minghsin University of Science and Technology?"
llm.invoke("明新科技大學的校訓是什麼?")

AIMessage(content='明新科技大學的校訓是「誠信、創新、服務」。', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 26, 'total_tokens': 54, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-5b8c31d3-fccb-424f-b316-e0b51c5142f6-0')

## Build RAG question answering with LangChain and the Chroma vector database

In [6]:
!wget https://raw.githubusercontent.com/shhuangmust/AI/refs/heads/113-1/must.txt
!wget https://raw.githubusercontent.com/shhuangmust/AI/refs/heads/master/2028president.txt

--2024-12-27 07:12:50--  https://raw.githubusercontent.com/shhuangmust/AI/refs/heads/113-1/must.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26777 (26K) [text/plain]
Saving to: ‘must.txt’


2024-12-27 07:12:50 (140 MB/s) - ‘must.txt’ saved [26777/26777]

--2024-12-27 07:12:50--  https://raw.githubusercontent.com/shhuangmust/AI/refs/heads/master/2028president.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 309 [text/plain]
Saving to: ‘2028president.txt’


2024-12-27 07:12:50 (5.51 MB/s) - ‘2028president.txt’ saved [309/3

In [7]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import VectorDBQA

# Load every .txt file under the folder
loader = DirectoryLoader('/content/', glob='**/*.txt')

# Convert the data into Document objects; each file becomes one Document
documents = loader.load()

# Initialize the text splitter
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)

# Split the loaded documents into chunks
split_docs = text_splitter.split_documents(documents)

# Initialize the OpenAI embeddings object
embeddings = OpenAIEmbeddings()

# Compute an embedding vector for each chunk with the OpenAI embeddings object
# and temporarily store the vectors in a Chroma vector database for later searches
docsearch = Chroma.from_documents(split_docs, embeddings)

# Build the question-answering chain
qa = VectorDBQA.from_chain_type(llm=llm, chain_type="stuff", vectorstore=docsearch, return_source_documents=True)
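
To make the splitting step more concrete, here is a simplified pure-Python sketch of separator-based chunking. It assumes the splitter breaks text on blank lines (`"\n\n"`) and measures `chunk_size` in characters. It illustrates the idea and is not LangChain's actual implementation; the function name `split_on_separator` is hypothetical.

```python
def split_on_separator(text, chunk_size=100, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut mid-piece.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would overflow the chunk: close it and start a new one.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Placeholder text: three paragraphs of 60, 60, and 10 characters.
sample = "a" * 60 + "\n\n" + "b" * 60 + "\n\n" + "c" * 10
print([len(chunk) for chunk in split_on_separator(sample)])  # prints [60, 72]
```

With `chunk_overlap=0`, as in the cell above, no text is repeated between neighboring chunks.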





In [8]:
# Ask the question ("Who was the third president during the junior-college (工專) period?")
result = qa.invoke({"query": "工專時期第3任校長是誰?"})
print(result['result'])



工專時期第三任校長是林世明。
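
Because the chain was created with `return_source_documents=True`, the result should also carry the retrieved chunks under the `source_documents` key. The sketch below shows one way to list them. The helper name `summarize_sources` is hypothetical, and the stand-in result uses placeholder text so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_sources(result, width=40):
    """Return one 'source: excerpt' line per retrieved chunk."""
    lines = []
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source", "unknown")
        excerpt = doc.page_content.replace("\n", " ")[:width]
        lines.append(f"{source}: {excerpt}")
    return lines


# Stand-in for a real chain result; the answer and chunk text are placeholders,
# not actual retrieved content.
fake_result = {
    "result": "placeholder answer",
    "source_documents": [
        SimpleNamespace(
            page_content="placeholder chunk\ntext",
            metadata={"source": "/content/must.txt"},
        ),
    ],
}
print("\n".join(summarize_sources(fake_result)))
```

In the notebook, `summarize_sources(result)` could be called on the dictionary returned by the chain to check which file each answer came from.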


In [9]:
# Query: "What is the current motto of Minghsin University of Science and Technology?"
result = qa.invoke({"query": "現行明新科技大學之校訓?"})
print(result['result'])

現行明新科技大學的校訓是「堅毅、求新、創造」。


In [10]:
# Query: "Who are the 2028 presidential candidates?"
result = qa.invoke({"query": "2028總統候選人有誰?"})
print(result['result'])

2028年台灣總統候選人有以下四人：
1. 寶可夢黨｜後藤一里
2. 多利多滋黨｜伊地知虹夏
3. 皮克敏黨｜山田涼
4. 陽光黨｜喜多郁代
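
The model can answer these questions only because the relevant chunks are retrieved and placed into the prompt. The sketch below imitates that flow in pure Python under loud assumptions: a toy character-bigram "embedding" stands in for `OpenAIEmbeddings`, a sorted list stands in for Chroma, and the prompt wording is invented. All function names and sample sentences are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: character-bigram counts (a stand-in for a real embedding model)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vector = embed(query)
    return sorted(chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True)[:k]


def stuff_prompt(query, chunks):
    """Concatenate the retrieved chunks into one prompt, in the spirit of the "stuff" chain type."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Made-up sample chunks, unrelated to the notebook's data files.
sample_chunks = [
    "The library opens at nine.",
    "The cafeteria serves lunch at noon.",
    "Parking permits are sold in the main office.",
]
question = "When does the library open?"
print(stuff_prompt(question, retrieve(question, sample_chunks, k=1)))
```

A real embedding model captures meaning rather than surface characters, but the retrieve-then-stuff structure is the same.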
