<img src=./imgs/model_io.jpg width=35% />

[langchain documents](https://python.langchain.com/docs/modules/model_io/models/chat/llm_chain)

[LangChain-Tutorials](https://github.com/sugarforever/LangChain-Tutorials)

In [1]:
import os
os.environ['OPENAI_API_KEY'] = 'sk-********'

# MMR, Select by maximal marginal relevance
> The maximal marginal relevance (MMR) algorithm, loosely similar in spirit to TextRank and SimHash.

In [2]:
from langchain.prompts.example_selector import MaxMarginalRelevanceExampleSelector, SemanticSimilarityExampleSelector
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

In [5]:
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

In [3]:
# A small set of examples for a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

In [7]:
example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples=examples,
    # This is the embedding class used to produce the embeddings that measure semantic similarity.
    embeddings=OpenAIEmbeddings(),
    # This is the vectorstore class used to store the embeddings and run the similarity search.
    vectorstore_cls=FAISS,
    # This is the number of examples to produce.
    k=2
)

In [8]:
mmr_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of a fixed list of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",      # Note: the suffix fixes the text that ends the prompt.
    input_variables=['adjective']
)

> The input is a feeling (worried), so the first example selected is the feeling pair happy/sad. Because k=2, one more example is kept: windy/calm.

In [9]:
# Input is a feeling, so should select the happy/sad example as the first one
print(mmr_prompt.format(adjective="worried"))

Give the antonym of every input

Input: happy
Output: sad

Input: windy
Output: calm

Input: worried
Output:


In [4]:
# let's compare this to what we would just get if we went solely off of similarity,
# by using SemanticSimilarityExampleSelector instead of MaxMarginalRelevanceExampleSelector.

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # this is the list of examples available to select from
    examples=examples,
    # this is the embedding class used to produce embeddings which are used to measure semantic similarity
    embeddings=OpenAIEmbeddings(),
    # this is the vectorstore cls that is used to store the embedding and do a similarity search over.
    vectorstore_cls=FAISS,
    k=2
)

In [7]:
similar_prompt = FewShotPromptTemplate(
    # we provide ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=['adjective']
)

In [8]:
print(similar_prompt.format(adjective='worried'))

Give the antonym of every input

Input: happy
Output: sad

Input: sunny
Output: gloomy

Input: worried
Output:


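> `MaxMarginalRelevanceExampleSelector` inherits from `SemanticSimilarityExampleSelector`. <br>
> 1. `SemanticSimilarityExampleSelector` calls `similarity_search` to find similar examples. <br>
> 2. [MaxMarginalRelevanceExampleSelector (a citation, not an explanation)](https://arxiv.org/pdf/2211.13892.pdf) calls `max_marginal_relevance_search` instead, which refines `similarity_search`.

> MMR: it finds the examples with the highest cosine similarity to the input, then adds them iteratively while penalizing each candidate for its similarity to the examples already selected. <br>
> My understanding: it keeps the selected examples relevant while preventing them from being too alike and lacking diversity.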
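
The penalty described above can be illustrated with a minimal pure-Python sketch of greedy MMR. This is not LangChain's implementation: the function names `cosine` and `mmr_select`, the `lambda_mult` weighting parameter as used here, and the 2-D toy vectors are all assumptions made for illustration (real embeddings have far more dimensions).

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return the indices of up to k candidates chosen greedily by MMR.

    lambda_mult=1.0 ignores the penalty (pure similarity ranking);
    smaller values weight diversity more heavily.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        # Relevance: similarity of the candidate to the query.
        relevance = cosine(query, candidates[i])
        # Redundancy: highest similarity to anything already selected.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up vectors: candidates 0 and 1 are near-duplicates, candidate 2 differs.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.85, 0.15], [0.5, -0.5]]

print(mmr_select(query, candidates, k=2))                   # [0, 2]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With the penalty active, the second pick skips the near-duplicate and takes the more distinct candidate, which mirrors how the MMR prompt above kept windy/calm while the pure-similarity prompt kept sunny/gloomy.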

```python

class SemanticSimilarityExampleSelector(BaseExampleSelector, BaseModel):
    """Example selector that selects examples based on SemanticSimilarity."""

    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on semantic similarity."""
        # Get the docs with the highest similarity.
        if self.input_keys:
            input_variables = {key: input_variables[key] for key in self.input_keys}
        query = " ".join(sorted_values(input_variables))
        example_docs = self.vectorstore.similarity_search(query, k=self.k)
        # Get the examples from the metadata.
        # This assumes that examples are stored in metadata.
        examples = [dict(e.metadata) for e in example_docs]
        # If example keys are provided, filter examples to those keys.
        if self.example_keys:
            examples = [{k: eg[k] for k in self.example_keys} for eg in examples]
        return examples

```
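
To see what text actually gets embedded as the search query, the first few lines of `select_examples` can be run in isolation. The sketch below assumes that `sorted_values`, which is not shown in the excerpt, returns the dictionary's values ordered by key; `build_query` is a hypothetical helper name introduced only for this illustration.

```python
from typing import Dict, List, Optional


def sorted_values(values: Dict[str, str]) -> List[str]:
    """Assumed behavior: return the dict's values ordered by key."""
    return [values[key] for key in sorted(values)]


def build_query(
    input_variables: Dict[str, str],
    input_keys: Optional[List[str]] = None,
) -> str:
    """Mirror the query-building steps at the top of select_examples."""
    # Optionally restrict the query to a subset of the input variables.
    if input_keys:
        input_variables = {key: input_variables[key] for key in input_keys}
    # Join the remaining values into a single string to embed.
    return " ".join(sorted_values(input_variables))


print(build_query({"adjective": "worried"}))
print(build_query({"tone": "formal", "adjective": "worried"}))
print(build_query({"tone": "formal", "adjective": "worried"}, input_keys=["adjective"]))
```

Under that assumption, the single-variable prompt used in this notebook searches with just the word `worried`; with several input variables, every value contributes to the query unless `input_keys` narrows it.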

```python
class MaxMarginalRelevanceExampleSelector(SemanticSimilarityExampleSelector):
    """ExampleSelector that selects examples based on Max Marginal Relevance.

    This was shown to improve performance in this paper:
    https://arxiv.org/pdf/2211.13892.pdf
    """

    fetch_k: int = 20
    """Number of examples to fetch to rerank."""

    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on semantic similarity."""
        # Get the docs with the highest similarity.
        if self.input_keys:
            input_variables = {key: input_variables[key] for key in self.input_keys}
        query = " ".join(sorted_values(input_variables))
        example_docs = self.vectorstore.max_marginal_relevance_search(
            query, k=self.k, fetch_k=self.fetch_k
        )
        # Get the examples from the metadata.
        # This assumes that examples are stored in metadata.
        examples = [dict(e.metadata) for e in example_docs]
        # If example keys are provided, filter examples to those keys.
        if self.example_keys:
            examples = [{k: eg[k] for k in self.example_keys} for eg in examples]
        return examples
```