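# LangChain Quick Start
[LangChain](https://github.com/langchain-ai/langchain) is a programming framework that abstracts the common implementation work behind large language model (LLM) applications. Besides supporting many different LLMs, its chain concept is not limited to a single LLM call: a chain can also include calls to a series of related tools. Thanks to the standard Chain interface that LangChain provides, a large number of tools and data sources have already been integrated into the framework.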

In [45]:
%pip install langchain openai python-dotenv wikipedia

Note: you may need to restart the kernel to use updated packages.


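## 1. Use a helper library to read Azure OpenAI API settings from environment variables

To get started with LangChain, the main Azure OpenAI Service parameters are all set through environment variables; only the model's deployment name needs to be specified explicitly. The remaining connection details are supplied through the following four environment variables: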

+ OPENAI_API_TYPE
+ OPENAI_API_VERSION
+ OPENAI_API_BASE
+ OPENAI_API_KEY
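
Before creating a client, it can help to confirm that all four variables are actually set. The following is a minimal sketch that is not part of the original notebook: `missing_env_vars` is a hypothetical helper name, and the dictionary in the example stands in for `os.environ` with made-up values.

```python
import os
from typing import Iterable, List, Mapping, Optional

# The four variables listed above.
REQUIRED_VARS = (
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
)


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]


# Stand-in environment with hypothetical values: one variable is absent
# and one is present but empty.
example_env = {
    "OPENAI_API_TYPE": "azure",
    "OPENAI_API_BASE": "https://example.invalid/",
    "OPENAI_API_KEY": "",
}
print(missing_env_vars(env=example_env))
```

Calling `missing_env_vars()` with no arguments right after `load_dotenv()` would check the real process environment in the same way.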

In [69]:
import os
from dotenv import load_dotenv
from langchain.llms import AzureOpenAI
from langchain.document_loaders import WikipediaLoader


# Load environment variables from the .env file
load_dotenv()

# Read the deployment name needed to call the Azure OpenAI API
model = os.getenv("DEPLOYMENT_NAME")

## 2. The simplest completion example


In [71]:
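# LangChain's AzureOpenAI LLM currently wraps the Azure OpenAI Completion API; paired with GPT-3.5, completions can still fail to terminate
# Prompt: "Q: What is the highest mountain in the world? A:"

llm = AzureOpenAI(deployment_name=model, model_name='gpt-35-turbo')
llm('Q:世界最高的山是什麼山? \nA:', stop=["\n"])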

'喜馬拉雅山 '
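
The `stop` argument is what keeps the completion from running on past the first answer. As a rough illustration of its effect, the sketch below truncates a string at the earliest stop sequence. It is plain Python and only mimics the behaviour: with the real API the service stops generating at that point. Both `truncate_at_stop` and the sample `raw` text are hypothetical.

```python
from typing import Sequence


def truncate_at_stop(text: str, stop: Sequence[str]) -> str:
    """Cut `text` at the earliest occurrence of any non-empty stop sequence."""
    cut = len(text)
    for token in stop:
        if not token:
            continue  # an empty stop sequence would match at index 0
        index = text.find(token)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Hypothetical raw completion that keeps generating new Q/A pairs.
raw = "喜馬拉雅山 \nQ:世界最長的河是什麼河? \nA:尼羅河"
print(repr(truncate_at_stop(raw, ["\n"])))
```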

In [72]:
print (llm)

AzureOpenAI
Params: {'deployment_name': 'gpt-35-turbo', 'model_name': 'gpt-35-turbo', 'temperature': 0.7, 'max_tokens': 256, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'request_timeout': None, 'logit_bias': {}}


Using [AzureChatOpenAI](https://python.langchain.com/docs/integrations/chat/azure_chat_openai) calls the Chat Completions API instead to carry out the conversation.

In [16]:
from langchain.chat_models import AzureChatOpenAI
from langchain.schema import AIMessage, HumanMessage, SystemMessage
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
chat = AzureChatOpenAI(deployment_name=model, model_name='gpt-35-turbo')
messages = [
    SystemMessage(
        content="你是一個說正體中文的機器人"  # "You are a bot that speaks Traditional Chinese"
    ),
    HumanMessage(
        content="世界最高的山是什麼山?"  # "What is the highest mountain in the world?"
    ),
]
chat(messages)

AIMessage(content='世界最高的山是珠穆朗瑪峰，位於尼泊爾和中國的邊界線上，海拔8,848.86公尺。', additional_kwargs={}, example=False)
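
Conceptually, each message object corresponds to a role/content pair in the chat request: system messages use the `system` role, human messages the `user` role, and AI messages the `assistant` role. The sketch below shows that mapping with plain tuples and dictionaries. It does not call LangChain or the API, and `to_chat_payload` is a hypothetical name used only for illustration.

```python
from typing import Dict, List, Tuple

# Message kind -> role name used in a chat request.
ROLE_BY_KIND = {"system": "system", "human": "user", "ai": "assistant"}


def to_chat_payload(messages: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Convert (kind, content) pairs into role/content dictionaries."""
    return [
        {"role": ROLE_BY_KIND[kind], "content": content}
        for kind, content in messages
    ]


payload = to_chat_payload(
    [
        ("system", "你是一個說正體中文的機器人"),
        ("human", "世界最高的山是什麼山?"),
    ]
)
for item in payload:
    print(item)
```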

## 3. Load Wikipedia content with a document loader

In [17]:
# Search Wikipedia for "台灣" (Taiwan) and load only two articles
docs = WikipediaLoader(query="台灣", load_max_docs=2).load()
len(docs)

2

In [18]:
# Show the metadata of the first document
docs[0].metadata


{'title': 'Taiwan',
 'summary': 'Taiwan, officially the Republic of China (ROC), is a country in East Asia. It is located at the junction of the East and South China Seas in the northwestern Pacific Ocean, with the People\'s Republic of China (PRC) to the northwest, Japan to the northeast, and the Philippines to the south. The territories controlled by the ROC consist of 168 islands with a combined area of 36,193 square kilometres (13,974 square miles). The main island of Taiwan, also known as Formosa, has an area of 35,808 square kilometres (13,826 square miles), with mountain ranges dominating the eastern two-thirds and plains in the western third, where its highly urbanized population is concentrated. The capital, Taipei, forms along with New Taipei City and Keelung, the largest metropolitan area in Taiwan. Other major cities include Taoyuan, Taichung, Tainan, and Kaohsiung. With around 23.9 million inhabitants, Taiwan is among the most densely populated countries in the world.\nTai

In [19]:
# Show the first 400 characters of the first document
content = docs[0].page_content[:400]
print(content)

Taiwan, officially the Republic of China (ROC), is a country in East Asia. It is located at the junction of the East and South China Seas in the northwestern Pacific Ocean, with the People's Republic of China (PRC) to the northwest, Japan to the northeast, and the Philippines to the south. The territories controlled by the ROC consist of 168 islands with a combined area of 36,193 square kilometres
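
A fixed 400-character slice can cut the text mid-sentence, as the output above shows. If a cleaner excerpt is preferred, one option is to back up to the last sentence boundary within the limit. The following is a minimal sketch under that assumption: `truncate_at_sentence` is a hypothetical helper, and it only recognises a period followed by a space as a boundary.

```python
def truncate_at_sentence(text: str, limit: int) -> str:
    """Return at most `limit` characters, ending at the last full sentence if one fits."""
    if len(text) <= limit:
        return text
    clipped = text[:limit]
    end = clipped.rfind(". ")
    # Fall back to the hard cut when no sentence boundary is found.
    return clipped[: end + 1] if end != -1 else clipped


sample = "First sentence. Second sentence. Third one is long."
print(truncate_at_sentence(sample, 40))
```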


## 4. Ask questions about the loaded document

Use the `llm` created earlier to ask about the content of the document downloaded from Wikipedia.

In [23]:
# Prompt: "Question: based on the following facts, how large is Taiwan? Facts: ... Answer:"
prompt = "問題:依據以下事實回答台灣面積多大? \n事實:" + content + "\n答案:"
print(prompt)
llm(prompt, max_tokens=1000, stop=["\n"])

問題:依據以下事實回答台灣面積多大? 
事實:Taiwan, officially the Republic of China (ROC), is a country in East Asia. It is located at the junction of the East and South China Seas in the northwestern Pacific Ocean, with the People's Republic of China (PRC) to the northwest, Japan to the northeast, and the Philippines to the south. The territories controlled by the ROC consist of 168 islands with a combined area of 36,193 square kilometres
答案:


'36,193平方公里'

A prompt template adds flexibility, and `predict` can be used to run the completion.

In [36]:
from langchain.prompts import PromptTemplate

# The prompt from the previous cell, with the document content as a template variable
prompt_template = PromptTemplate(input_variables=["doc_content"], template="問題:依據以下事實回答台灣面積多大? \n事實:{doc_content}\n答案:")
prompt_formatted_str: str = prompt_template.format(doc_content=content)
prediction = llm.predict(prompt_formatted_str, max_tokens=1000, stop=["\n"])
print(prediction)


36,193平方公里
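
With the default f-string style template, the substitution step amounts to filling named placeholders, much like Python's own `str.format`. The sketch below shows that step without LangChain. The English template text, the `render` helper, and the sample fact are illustrative assumptions rather than the notebook's own values.

```python
# Illustrative template with one named placeholder.
TEMPLATE = "Question: based on the facts below, how large is Taiwan?\nFacts: {doc_content}\nAnswer:"


def render(template: str, **variables: str) -> str:
    """Fill the named placeholders in `template`; raises KeyError if one is missing."""
    return template.format(**variables)


print(render(TEMPLATE, doc_content="The combined area is 36,193 square kilometres."))
```

Keeping the template separate from the values means the same question can be reused with a different document by changing only `doc_content`.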


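## 5. Handle the question with a Chain instead
Next, we use chains to reduce the complexity of writing the application. The code below uses the QA chain, which is built for question answering, together with the `chat` model created earlier to answer questions about the downloaded Wikipedia documents `docs`.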

In [22]:
from langchain.chains.question_answering import load_qa_chain

chain = load_qa_chain(chat, chain_type="stuff")
query = "台灣面積有多大?"  # "How large is Taiwan?"
chain.run(input_documents=docs, question=query, return_only_outputs=True)

'台灣面積為36,193平方公里（13,974平方英里），由168個島嶼組成。其中主要的島嶼是台灣，也被稱為福爾摩沙，面積為35,808平方公里（13,826平方英里）。'
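
The `"stuff"` chain type places the content of every input document into a single prompt together with the question. The sketch below illustrates that idea in plain Python. The prompt wording, the separator, and the `build_stuff_prompt` helper are assumptions for illustration; the chain's built-in prompt differs.

```python
from typing import List

# Hypothetical prompt wording, used only to show the shape of the final prompt.
QA_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)


def build_stuff_prompt(
    page_contents: List[str], question: str, separator: str = "\n\n"
) -> str:
    """Join all document texts into one context block and fill the template."""
    context = separator.join(page_contents)
    return QA_TEMPLATE.format(context=context, question=question)


print(build_stuff_prompt(["Document one text.", "Document two text."], "台灣面積有多大?"))
```

Because everything goes into one prompt, this approach only works while the combined documents fit within the model's context window.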

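LangChain offers many feature-rich chains. For example, [LLMMathChain](https://python.langchain.com/docs/use_cases/code_writing/llm_math) adds mathematical ability to an existing large language model: the LLM parses the natural-language question, and Python then performs the calculation.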

In [49]:
from langchain import LLMMathChain

prompt = "十八的三次方為多少?"  # "What is eighteen to the third power?"
llm_math = LLMMathChain.from_llm(llm=chat, verbose=True)
llm_math.run(prompt)



> Entering new LLMMathChain chain...
十八的三次方為多少?```text
18**3
```
...numexpr.evaluate("18**3")...

Answer: 5832
> Finished chain.


'Answer: 5832'
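
The trace shows two steps: the model turns the question into the expression `18**3`, and that expression is then evaluated numerically. The second step can be sketched without any LLM. The example below uses Python's `ast` module as a stand-in evaluator that accepts only basic arithmetic; it is not the chain's own implementation (the trace shows that it uses `numexpr`), and `evaluate_expression` is a hypothetical name.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate_expression(expression: str):
    """Evaluate a basic arithmetic expression without using eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


# The expression the model produced in the trace above.
print(evaluate_expression("18**3"))  # 5832
```

Restricting evaluation to a whitelist of node types is a design choice here, since text produced by a model should not be passed to `eval()` directly.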