<img src=./imgs/model_io.jpg width=35% />

[langchain documents](https://python.langchain.com/docs/modules/model_io/models/chat/llm_chain)

[LangChain-Tutorials](https://github.com/sugarforever/LangChain-Tutorials)

In [1]:
import os
os.environ['OPENAI_API_KEY'] = 'sk-************'

## n-gram overlap

> Select examples from the candidate pool by their similarity to the input, then use them to build the prompt.

Suppose we are given two words:<br>
november<br>
december

---

The unigrams are:<br>
n o v e m b e r<br>
d e c e m b e r

---

The bigrams are:<br>
no ov ve em mb be er<br>
de ec ce em mb be er

---

The trigrams are:<br>
nov ove vem emb mbe ber<br>
dec ece cem emb mbe ber<br>

In [1]:
november = 'nov ove vem emb mbe ber'.split(' ')
december = 'dec ece cem emb mbe ber'.split(' ')
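
The trigram lists above are typed out by hand. As a minimal sketch, the same lists can be generated with a sliding window over the characters. The helper name `char_ngrams` is an assumption introduced here for illustration and is not part of LangChain.

```python
def char_ngrams(word: str, n: int) -> list[str]:
    """Return every run of n consecutive characters in word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# Trigrams (n=3) for the two example words.
nov_trigrams = char_ngrams("november", 3)
dec_trigrams = char_ngrams("december", 3)

print(nov_trigrams)
print(dec_trigrams)
```

Passing `n=1` or `n=2` gives the unigram and bigram rows listed earlier.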

In [5]:
cup = set(november) | set(december)
cup

{'ber', 'cem', 'dec', 'ece', 'emb', 'mbe', 'nov', 'ove', 'vem'}

In [6]:
len(cup)

9

In [7]:
cap = set(november) & set(december)
cap

{'ber', 'emb', 'mbe'}

In [8]:
len(cap)

3

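When computing the overlap between two sequences, for example with **trigrams**, let $X$ and $Y$ be the trigram sets of november and december. Then:

$|X \cup Y| = 9$

$|X \cap Y| = 3$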
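
One way to combine the two counts into a single similarity number is the ratio $|X \cap Y| / |X \cup Y|$ (the Jaccard similarity). The sketch below is only an illustration of that ratio; it is an assumption made here, not necessarily the score that the LangChain selector computes internally.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


x = set("nov ove vem emb mbe ber".split())
y = set("dec ece cem emb mbe ber".split())

ratio = jaccard(x, y)  # 3 shared trigrams out of 9 distinct ones
print(ratio)
```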


In [1]:
from langchain.prompts import PromptTemplate, FewShotPromptTemplate
from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector

In [2]:
example_prompt = PromptTemplate(
    input_variables=['input', 'output'],
    template="Input:{input}\nOutput: {output}")

In [5]:
# example of a fictional translation task
examples = [
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."}
]

In [6]:
example_selector = NGramOverlapExampleSelector(
    # The examples it has available to choose from.
    examples=examples,
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # The threshold at which the selector stops.
    # The default is -1.0; -0.1 is also negative, so it behaves the same way.
    threshold=-0.1,
    # For a negative threshold:
    # the selector sorts examples by ngram overlap score and excludes none.
    # For a threshold greater than 1.0:
    # the selector excludes all examples and returns an empty list.
    # For a threshold equal to 0.0:
    # the selector sorts examples by ngram overlap score
    # and excludes those with no ngram overlap with the input.
)
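
The three threshold regimes described in the comments above can be sketched without LangChain. The scoring function `toy_overlap` and the helper `select` are hypothetical stand-ins introduced for illustration; the real selector computes its own ngram overlap score, so the numbers will differ even though the sort-then-cut behaviour is the same idea.

```python
def toy_overlap(source: str, example: str) -> float:
    """Hypothetical score: share of the example's words that also occur in source."""
    src_words = set(source.lower().rstrip(".").split())
    ex_words = example.lower().rstrip(".").split()
    return sum(w in src_words for w in ex_words) / len(ex_words)


def select(source: str, candidates: list[str], threshold: float) -> list[str]:
    """Sort candidates by score (highest first) and keep those scoring above threshold."""
    scored = sorted(
        ((toy_overlap(source, c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [c for score, c in scored if score > threshold]


candidates = ["See Spot run.", "My dog barks.", "Spot can run."]
source = "Spot can run fast."

print(select(source, candidates, -1.0))        # negative: sorted, nothing excluded
print(select(source, candidates, 0.0))         # zero: drops examples with no overlap
print(select(source, candidates, 1.0 + 1e-9))  # above 1.0: everything excluded
```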

In [7]:
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the Spanish translation of every input",
    suffix="Input: {sentence}\nOutput:",
    input_variables=['sentence'],
)
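
To make the shape of the final prompt concrete, here is a plain-Python sketch of how the prefix, the formatted examples, and the suffix fit together. The function `assemble_few_shot` is a hypothetical helper written for this note, and the blank-line separator between blocks is an assumption based on the output printed in the following cells.

```python
def assemble_few_shot(prefix: str, examples: list[dict], sentence: str) -> str:
    """Join prefix, formatted examples, and suffix with blank lines between blocks."""
    blocks = [prefix]
    # Same layout as example_prompt: "Input:{input}\nOutput: {output}"
    blocks += [f"Input:{e['input']}\nOutput: {e['output']}" for e in examples]
    # Same layout as the suffix: "Input: {sentence}\nOutput:"
    blocks.append(f"Input: {sentence}\nOutput:")
    return "\n\n".join(blocks)


selected = [{"input": "Spot can run.", "output": "Spot puede correr."}]
prompt = assemble_few_shot(
    "Give the Spanish translation of every input",
    selected,
    "Spot can run fast.",
)
print(prompt)
```

The example selector only decides which dictionaries end up in `selected` and in what order; the rest of the prompt is fixed text.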

In [9]:
# An example input with large ngram overlap with "Spot can run."
# and no overlap with "My dog barks."
print(dynamic_prompt.format(sentence="Spot can run fast."))

Give the Spanish translation of every input

Input:Spot can run.
Output: Spot puede correr.

Input:See Spot run.
Output: Ver correr a Spot.

Input:My dog barks.
Output: Mi perro ladra.

Input: Spot can run fast.
Output:


> The examples are sorted by overlap, and none are excluded.

In [10]:
# You can add examples to NGramOverlapExampleSelector as well.
new_example = {"input": "Spot plays fetch.", "output": "Spot juega a buscar."}

example_selector.add_example(new_example)
print(dynamic_prompt.format(sentence="Spot can run fast."))

Give the Spanish translation of every input

Input:Spot can run.
Output: Spot puede correr.

Input:See Spot run.
Output: Ver correr a Spot.

Input:Spot plays fetch.
Output: Spot juega a buscar.

Input:My dog barks.
Output: Mi perro ladra.

Input: Spot can run fast.
Output:


In [11]:
# You can set a threshold at which examples are excluded.
# For example, setting threshold equal to 0.0
# excludes examples with no ngram overlaps with input.
# Since "My dog barks." has no ngram overlaps with "Spot can run fast."
# it is excluded.
example_selector.threshold = 0.0
print(dynamic_prompt.format(sentence="Spot can run fast."))

Give the Spanish translation of every input

Input:Spot can run.
Output: Spot puede correr.

Input:See Spot run.
Output: Ver correr a Spot.

Input:Spot plays fetch.
Output: Spot juega a buscar.

Input: Spot can run fast.
Output:


In [13]:
# Setting small nonzero threshold
example_selector.threshold = 0.09
print(dynamic_prompt.format(sentence="Spot can play fetch."))
# In theory, this should exclude many of the examples.

Give the Spanish translation of every input

Input:Spot can run.
Output: Spot puede correr.

Input:Spot plays fetch.
Output: Spot juega a buscar.

Input: Spot can play fetch.
Output:


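> As expected, with the similarity threshold at 0.09 only two examples remain; "See Spot run." and "My dog barks." are excluded.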

In [14]:
# Setting threshold greater than 1.0
example_selector.threshold = 1.0 + 1e-9
print(dynamic_prompt.format(sentence="Spot can play fetch."))
# In theory, this excludes every example and leaves an empty list.

Give the Spanish translation of every input

Input: Spot can play fetch.
Output:
