<img src=./imgs/model_io.jpg width=35% />

[langchain documents](https://python.langchain.com/docs/modules/model_io/models/chat/llm_chain)

> Language Models come in two kinds: LLMs and Chat Models. <br>
> For example, the GPT davinci-003 model, fine-tuned on chat data, became ChatGPT (GPT-3.5).<br>
> "predict" for LLMs and "predict messages" for chat models

In [4]:
from langchain.llms import OpenAI

In [5]:
llm = OpenAI()

#### string in string out

In [6]:
llm("Tell me a joke")

'\n\nQ: What did the fish say when it hit the wall?\nA: Dam!'

#### batch call, richer outputs

In [7]:
# llm_result = llm.generate(["Tell me a joke", "Tell me a poem"]*15)
# len(llm_result.generations)
# llm_result.generations[0]
# llm_result.llm_output
# ```
# {'token_usage': {'completion_tokens': 3903,
#       'total_tokens': 4023,
#       'prompt_tokens': 120}}
# ```

#### langchain - async

In [8]:
# asyncio is part of the Python standard library (3.4+), so no pip install is needed


In [9]:
import time
import asyncio

from langchain.llms import OpenAI

In [10]:
def generate_serially():
    llm = OpenAI(temperature=0.9)
    for _ in range(10):
        resp = llm.generate(['Hello, how are you?'])
        print(resp.generations[0][0].text)

In [11]:
async def async_generate(llm):
    resp = await llm.agenerate(['Hello, how are you?'])
    print(resp.generations[0][0].text)

In [12]:
async def generate_concurrently():
    llm = OpenAI(temperature=0.9)
    tasks = [async_generate(llm) for _ in range(10)]
    await asyncio.gather(*tasks)

In [13]:
import time

In [15]:
# s = time.perf_counter()
# # If running this outside of Jupyter, use asyncio.run(generate_concurrently())
# await generate_concurrently()
# elapsed = time.perf_counter() - s
# print("\033[1m" + f"Concurrent executed in {elapsed:0.2f} seconds." + "\033[0m")

# s = time.perf_counter()
# generate_serially()
# elapsed = time.perf_counter() - s
# print("\033[1m" + f"Serial executed in {elapsed:0.2f} seconds." + "\033[0m")

> **This cannot be demonstrated here because of API concurrency limits.**
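
Since the real comparison cannot be run here, the idea can still be sketched with the standard library alone. In this sketch `fake_agenerate` is a hypothetical stand-in for `llm.agenerate` that sleeps instead of calling an API, so the timings only illustrate the scheduling difference, not real model latency.

```python
import asyncio
import time


async def fake_agenerate(prompt: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for `llm.agenerate`: sleeps instead of calling an API."""
    await asyncio.sleep(delay)
    return f"echo: {prompt}"


async def run_serially(n: int) -> float:
    """Await each call before starting the next one."""
    start = time.perf_counter()
    for _ in range(n):
        await fake_agenerate("Hello, how are you?")
    return time.perf_counter() - start


async def run_concurrently(n: int) -> float:
    """Start all calls at once and wait for them together."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_agenerate("Hello, how are you?") for _ in range(n)))
    return time.perf_counter() - start


# Outside Jupyter; inside a notebook use `await run_serially(5)` instead.
serial_seconds = asyncio.run(run_serially(5))
concurrent_seconds = asyncio.run(run_concurrently(5))
print(f"Serial: {serial_seconds:0.2f}s, concurrent: {concurrent_seconds:0.2f}s")
```

With real API calls the gain depends on the provider's rate limits, which is exactly what blocks the demonstration in this notebook.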

#### Custom LLM

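> A custom LLM wrapper lets you plug in your own private LLM.<br>
> 1. Required: the `_call` function, which takes a string plus optional stop words and returns a string<br>
> 2. Optional: the `_identifying_params` property, which returns a dictionary and is used when printing the class<br>

> Let's implement a simple LLM that returns only the first n characters of the input string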

In [17]:
from typing import Any, List, Mapping, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM

In [19]:
class CustomLLM(LLM):
    n : int
    
    @property
    def _llm_type(self) -> str:
        return "custom"
    
    def _call(self, prompt: str, stop: Optional[List[str]] = None, run_manager: Optional[CallbackManagerForLLMRun] = None, **kwargs: Any) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return prompt[: self.n]
    
    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters"""
        return {"n": self.n}

In [20]:
llm = CustomLLM(n=10)

In [21]:
llm("this is a footbar thing")

'this is a '

> Take a look at the customized print output

In [22]:
print(llm)

CustomLLM
Params: {'n': 10}


> _llm_type <br>
> n
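
The `_call` above simply rejects `stop`. A wrapper that wants to honor stop words could instead cut the output at the earliest stop sequence. The following is a minimal sketch in plain Python; `truncate_at_stop` and `first_n_with_stop` are hypothetical helper names, not part of LangChain.

```python
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence (hypothetical helper)."""
    if not stop:
        return text
    cut = len(text)
    for token in stop:
        if not token:
            continue  # an empty stop word would match at index 0, so skip it
        index = text.find(token)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


def first_n_with_stop(prompt: str, n: int, stop: Optional[List[str]] = None) -> str:
    """Mimic the `_call` body above, but honor `stop` instead of raising."""
    return truncate_at_stop(prompt[:n], stop)


print(repr(first_n_with_stop("this is a foobar thing", 10)))
print(repr(first_n_with_stop("this is a foobar thing", 10, stop=["is a"])))
```

The first call returns the same ten characters as the notebook's `CustomLLM`; the second stops just before the first `"is a"`.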

#### Fake LLM

> A fake LLM for testing. It lets you mock calls to the LLM and simulate what happens when the LLM responds in a specific way

In [23]:
from langchain.llms.fake import FakeListLLM

In [24]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

In [25]:
tools = load_tools(['python_repl'])

In [27]:
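# Note: load_tools registers this tool as "Python_REPL", so the "Python REPL" name below
# is rejected (see the Observation line in the run output).
responses = ["Action: Python REPL\nAction Input: print(2 + 2)", "Final Answer: 4"]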
llm = FakeListLLM(responses=responses)

In [28]:
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

In [29]:
agent.run("whats 2 + 2")



> Entering new AgentExecutor chain...
Action: Python REPL
Action Input: print(2 + 2)
Observation: Python REPL is not a valid tool, try one of [Python_REPL].
Thought:Final Answer: 4

> Finished chain.


'4'
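
The idea behind a fake LLM can be sketched without LangChain: ignore the prompt and replay queued responses in call order. `CannedLLM` is a hypothetical class for illustration, and wrapping around when the list is exhausted is a choice of this sketch, not a statement about `FakeListLLM`.

```python
from typing import List


class CannedLLM:
    """Hypothetical stand-in for a fake LLM: replays queued responses in order."""

    def __init__(self, responses: List[str]) -> None:
        self.responses = responses
        self.calls = 0  # how many prompts have been answered so far

    def __call__(self, prompt: str) -> str:
        # The prompt is ignored: the answer depends only on call order.
        response = self.responses[self.calls % len(self.responses)]
        self.calls += 1
        return response


fake = CannedLLM(["Action: Python_REPL\nAction Input: print(2 + 2)", "Final Answer: 4"])
first = fake("whats 2 + 2")
second = fake("whats 2 + 2")
print(second)
```

Because the responses are fixed, tests built on such an object are deterministic and cost nothing to run.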

#### Human input LLM

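> Like the fake LLM, LangChain provides a pseudo LLM class for testing, debugging, or teaching. <br>
> It lets you mock calls to the LLM and simulate **how a human would respond if they received the prompt**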

In [1]:
from langchain.llms.human import HumanInputLLM

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

In [31]:
!pip install wikipedia



In [2]:
tools = load_tools(['wikipedia'])

In [3]:
llm = HumanInputLLM(prompt_func=lambda prompt : print(f"\n===PROMPT====\n{prompt}\n=====END OF PROMPT======"))

In [4]:
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

In [None]:
agent.run("What is 'Bocchi the Rock!'?")



> Entering new AgentExecutor chain...

===PROMPT====
Answer the following questions as best you can. You have access to the following tools:

Wikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Wikipedia]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: What is 'Bocchi the Rock!'?
Thought:


#### Caching

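> The main point is saving money:<br>
> 1. Fewer calls to the provider's API<br>
> 2. If you request the same completion several times, caching speeds things up and cuts the number of calls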

> `pip install --upgrade langchain` (the `langchain.globals` import below needs a recent release)

In [3]:
from langchain.globals import set_llm_cache
from langchain.llms import OpenAI

> To make the effect of caching easy to see, we use an older, slower model: text-davinci-002

In [4]:
llm = OpenAI(model_name='text-davinci-002', n=2, best_of=2)

##### **In Memory Cache**

In [5]:
from langchain.cache import InMemoryCache
set_llm_cache(InMemoryCache())

In [4]:
!pip install --upgrade sqlalchemy



In [6]:
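%time
# First call: the result is not in the cache yet, so it takes longer.
# Note: a bare `%time` line only times an empty statement, so the figures printed below
# do not measure this call; put `%%time` on the first line of the cell to time the whole cell.
llm.predict("Tell me a joke")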

CPU times: user 19 µs, sys: 0 ns, total: 19 µs
Wall time: 28.8 µs


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

In [7]:
%time
# Second call: the result comes from the cache, so it returns much faster
llm.predict("Tell me a joke")

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 5.25 µs


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
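
The mechanism can be sketched in a few lines of plain Python: store each result under a key built from the prompt and the model settings, and only call the model on a miss. `MemoCache` and `fake_model` are hypothetical names for this illustration and are not LangChain classes.

```python
from typing import Callable, Dict, List, Tuple


class MemoCache:
    """Hypothetical in-memory cache keyed by (prompt, model settings)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, prompt: str, settings: str, call: Callable[[str], str]) -> str:
        key = (prompt, settings)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call(prompt)  # only reached on a cache miss
        self._store[key] = result
        return result


api_calls: List[str] = []


def fake_model(prompt: str) -> str:
    """Stand-in for a paid API call; records every invocation."""
    api_calls.append(prompt)
    return f"joke #{len(api_calls)}"


cache = MemoCache()
first = cache.get_or_call("Tell me a joke", "model=demo,n=2", fake_model)
second = cache.get_or_call("Tell me a joke", "model=demo,n=2", fake_model)
print(first == second, len(api_calls))
```

Including the settings in the key matters: the same prompt sent with a different model or temperature should not reuse an old answer.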

##### Sqlite cache

In [8]:
rm .langchain.db

rm: cannot remove '.langchain.db': No such file or directory


> We can do the same thing with a SQLite cache

In [9]:
from langchain.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path='.langchain.db'))
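
What a SQLite-backed cache adds over the in-memory one is persistence: the entries live in a database file and survive a kernel restart. A rough sketch with the standard `sqlite3` module follows; the table name, schema, and function names are assumptions made for illustration, not the layout LangChain uses in `.langchain.db`.

```python
import sqlite3
from typing import Optional


def open_cache(path: str) -> sqlite3.Connection:
    """Open (or create) the cache database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS llm_cache ("
        "prompt TEXT NOT NULL, settings TEXT NOT NULL, response TEXT NOT NULL, "
        "PRIMARY KEY (prompt, settings))"
    )
    return conn


def lookup(conn: sqlite3.Connection, prompt: str, settings: str) -> Optional[str]:
    """Return the cached response, or None on a cache miss."""
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE prompt = ? AND settings = ?",
        (prompt, settings),
    ).fetchone()
    return row[0] if row else None


def update(conn: sqlite3.Connection, prompt: str, settings: str, response: str) -> None:
    """Store or overwrite the response for this prompt and settings."""
    conn.execute(
        "INSERT OR REPLACE INTO llm_cache (prompt, settings, response) VALUES (?, ?, ?)",
        (prompt, settings, response),
    )
    conn.commit()


# ":memory:" keeps the demo self-contained; pass a file path to persist entries.
conn = open_cache(":memory:")
miss = lookup(conn, "Tell me a joke", "model=demo")
update(conn, "Tell me a joke", "model=demo", "Why did the chicken cross the road?")
hit = lookup(conn, "Tell me a joke", "model=demo")
print(miss, hit)
```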

In [10]:
%time
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.72 µs


"\n\nQ: Why don't scientists trust atoms?\nA: Because they make up everything"

In [11]:
%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 4.29 µs


"\n\nQ: Why don't scientists trust atoms?\nA: Because they make up everything"

##### optional caching in chains

In [12]:
llm = OpenAI(model_name='text-davinci-002')
no_cache_llm = OpenAI(model_name='text-davinci-002', cache=False)

In [13]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain

In [18]:
text_splitter = CharacterTextSplitter()

In [19]:
with open('./input/state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)

In [21]:
len(texts)

11

In [22]:
from langchain.docstore.document import Document
docs = [Document(page_content=t) for t in texts[:3]]
from langchain.chains.summarize import load_summarize_chain

In [23]:
chain = load_summarize_chain(llm, chain_type='map_reduce', reduce_llm=no_cache_llm)

In [24]:
%time
chain.run(docs)

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 4.77 µs


'\n\nThe United States is working with European allies to respond to Russian aggression in Ukraine. Russian assets will be seized, Russian flights will be banned from American airspace, and military, economic, and humanitarian aid will be provided to Ukraine. These actions are designed to pressure Russia into withdrawing from Ukraine.'

> When we run it again, it runs substantially faster but the final answer is different. The map steps are cached while the reduce step is not, so the overall result is not fully pinned down.

In [25]:
chain.run(docs)

"\n\nThe United States and its European allies are taking action against Russia in response to Putin's aggression in Ukraine. America will provide military, economic, and humanitarian aid to Ukraine and will pressure Russia until it withdraws from Ukraine."

In [26]:
rm .langchain.db sqlite.db

rm: cannot remove 'sqlite.db': No such file or directory


#### serialization

In [27]:
from langchain.llms import OpenAI
from langchain.llms.loading import load_llm

In [28]:
cat ./input/llm.json

    {
        "model_name": "text-davinci-003",
        "temperature": 0.7,
        "max_tokens": 256,
        "top_p": 1.0,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        "n": 1,
        "best_of": 1,
        "request_timeout": null,
        "_type": "openai"
    }

In [29]:
llm = load_llm('./input/llm.json')

In [31]:
cat ./input/llm.yaml

    _type: openai
    best_of: 1
    frequency_penalty: 0.0
    max_tokens: 256
    model_name: text-davinci-003
    n: 1
    presence_penalty: 0.0
    request_timeout: null
    temperature: 0.7
    top_p: 1.0

In [30]:
llm = load_llm('./input/llm.yaml')

**saving**

In [32]:
llm.save('llm.json')
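
The saved file is plain JSON (or YAML), so the round trip can be illustrated with the standard library. This sketch writes a shortened version of the config shown above to a temporary file and reads it back; treating `_type` as the field that selects which wrapper to build is an assumption based on the config contents, and the temporary path is made up for the demo.

```python
import json
import os
import tempfile

# A shortened version of the config printed above.
config = {
    "model_name": "text-davinci-003",
    "temperature": 0.7,
    "max_tokens": 256,
    "_type": "openai",
}

# Save: write the parameters as indented JSON.
path = os.path.join(tempfile.mkdtemp(), "llm.json")
with open(path, "w") as f:
    json.dump(config, f, indent=4)

# Load: read the file back and split off the type tag from the parameters.
with open(path) as f:
    loaded = json.load(f)

llm_type = loaded.pop("_type")  # assumed to select the wrapper class
print(llm_type, loaded)
```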

#### Streaming

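Streaming returns the response one token at a time instead of all at once.<br>
**Use cases**:
> 1. Show output to the user as soon as it starts arriving<br>
> 2. Process each token as it is generated.

**Implementation**
> Implement a callback handler (the `CallbackHandler` interface) and override its `on_llm_new_token` method.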
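
The callback pattern itself can be sketched without LangChain. Here `CollectingHandler` only borrows the hook name `on_llm_new_token`, and `fake_stream` is a hypothetical stand-in for a streaming model; neither is the library's actual class.

```python
from typing import Any, Iterable, List


class CollectingHandler:
    """Hypothetical handler exposing the same hook name, `on_llm_new_token`."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Called once per token; a UI could render the token right here.
        self.tokens.append(token)


def fake_stream(tokens: Iterable[str], handler: CollectingHandler) -> str:
    """Stand-in for a streaming LLM: notifies the handler, then returns the full text."""
    pieces = []
    for token in tokens:
        handler.on_llm_new_token(token)
        pieces.append(token)
    return "".join(pieces)


handler = CollectingHandler()
full_text = fake_stream(["Spark", "ling", " water"], handler)
print(handler.tokens, full_text)
```

The caller still gets the complete text at the end, while the handler has already seen every token as it arrived.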

In [33]:
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

In [46]:
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0, cache=False)

In [47]:
resp = llm("Write me a song about sparkling water.")



Verse 1
I'm sippin' on sparkling water,
It's so refreshing and light,
It's the perfect way to quench my thirst
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

Verse 2
I'm sippin' on sparkling water,
It's so bubbly and bright,
It's the perfect way to cool me down
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

Verse 3
I'm sippin' on sparkling water,
It's so light and so clear,
It's the perfect way to keep me cool
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

> **With `generate` you get streaming output and still receive the final `LLMResult`**

In [48]:
llm.generate(["Tell me a joke"])



Q: What did the fish say when it hit the wall?
A: Dam!

LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when it hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}, 'model_name': 'text-davinci-003'}, run=[RunInfo(run_id=UUID('df8e6732-3bc0-4b0d-b6db-315077417907'))])

> In **streaming mode, the `llm_output` of `LLMResult` does not yet include `token_usage`**

#### Tracking token usage 

> How to track token usage for specific calls. This is currently supported only for OpenAI

In [51]:
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback

In [52]:
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

In [53]:
with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)

Tokens Used: 0
	Prompt Tokens: 0
	Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0


**Anything inside the context manager will get tracked. Here's an example of using it to track multiple calls in sequence.**

In [54]:
with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    result2 = llm("Tell me a joke")
    print(cb.total_tokens)

0


**If a chain or agent with multiple steps in it is used, it will track all those steps.**

In [55]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

In [61]:
llm = OpenAI(temperature=0, cache=False)

In [62]:
# tools = load_tools(["serpapi", "llm-math"], llm=llm)
tools = load_tools(["llm-math"], llm=llm)

> `serpapi` requires a `serpapi_api_key`, so only `llm-math` is loaded here

In [63]:
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [65]:
# with get_openai_callback() as cb:
#     response = agent.run(
#         "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
#     )
#     print(f"Total Tokens: {cb.total_tokens}")
#     print(f"Prompt Tokens: {cb.prompt_tokens}")
#     print(f"Completion Tokens: {cb.completion_tokens}")
#     print(f"Total Cost (USD): ${cb.total_cost}")

> The ChatGPT 3.5 access used here is the free version, so this run is not shown