## Cohere reranker
Cohere is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.

This notebook shows how to use [Cohere's rerank endpoint](https://docs.cohere.com/docs/overview) in a retriever. This builds on top of ideas in the [ContextualCompressionRetriever](https://python.langchain.com/docs/how_to/contextual_compression/).
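
To make the idea concrete before calling any API, here is a minimal sketch of the two-stage pattern this notebook relies on: a cheap first pass pulls a generous candidate set, then a finer scorer reorders it and keeps only the best few. Everything in it is a toy assumption: `first_stage`, `rerank`, and both word-overlap scoring rules are hypothetical stand-ins, not Cohere's embedding or rerank models.

```python
# Toy two-stage retrieval: a cheap first pass, then a finer rerank.
# Both scoring rules are illustrative stand-ins, not Cohere's models.

def first_stage(query, corpus, k):
    """Return the k texts sharing the most distinct words with the query."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(text.lower().split())), text) for text in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [text for _, text in scored[:k]]


def rerank(query, candidates, top_n):
    """Reorder candidates by the fraction of their words that appear in the query."""
    q_words = set(query.lower().split())

    def score(text):
        words = text.lower().split()
        return sum(w in q_words for w in words) / max(len(words), 1)

    return sorted(candidates, key=score, reverse=True)[:top_n]


corpus = [
    "the plan will grow the economy",
    "the weather is nice and the sky is blue and the sun is out in the city",
    "economy plan",
    "we honor the troops",
]
toy_query = "the plan for the economy"

candidates = first_stage(toy_query, corpus, k=3)  # over-retrieve on purpose
top = rerank(toy_query, candidates, top_n=2)      # keep only the best two
print(top)
```

In the real pipeline below, the vector store plays the role of `first_stage` and the Cohere rerank endpoint plays the role of `rerank`.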

In [1]:
# get a new token: https://dashboard.cohere.ai/

import getpass
import os
from dotenv import load_dotenv
load_dotenv()

if "COHERE_API_KEY" not in os.environ:
    os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

In [2]:
# Helper function for printing docs


def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

### Set up the base vector store retriever
Let's start by initializing a simple vector store retriever and storing the 2023 State of the Union speech (in chunks). We can set up the retriever to retrieve a high number (20) of docs.

In [3]:
from langchain_community.document_loaders import TextLoader
# from langchain_community.embeddings import CohereEmbeddings
from langchain_cohere import CohereEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

documents = TextLoader("../../../text_files/state_of_the_union.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
retriever = FAISS.from_documents(
    texts, CohereEmbeddings(model="embed-english-v3.0")  # you don't have to use Cohere's embeddings here
).as_retriever(search_kwargs={"k": 20})

# query = "What did the president say about Ketanji Brown Jackson"
query = "What is the plan for the economy?"
docs = retriever.invoke(query)
pretty_print_docs(docs)

Document 1:

So that’s my plan. It will grow the economy and lower costs for families. 

So what are we waiting for? Let’s get this done. And while you’re at it, confirm my nominees to the Federal Reserve, which plays a critical role in fighting inflation.  

My plan will not only lower costs to give families a fair shot, it will lower the deficit.
----------------------------------------------------------------------------------------------------
Document 2:

More infrastructure and innovation in America. 

More goods moving faster and cheaper in America. 

More jobs where you can earn a good living in America. 

And instead of relying on foreign supply chains, let’s make it in America. 

Economists call it “increasing the productive capacity
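
The chunking settings used above (`chunk_size=500`, `chunk_overlap=100`) can be pictured with a plain sliding window. This is a simplified sketch, not what `RecursiveCharacterTextSplitter` actually does: the real splitter prefers to break at separators such as paragraph breaks, newlines, and spaces, so its chunks are often shorter than the limit. The helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 1200-character text so the overlap is easy to verify.
sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(sample, chunk_size=500, chunk_overlap=100)

print([len(c) for c in chunks])
print(chunks[0][-100:] == chunks[1][:100])  # the shared 100-character overlap
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.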

### Doing reranking with CohereRerank
Now let's wrap our base retriever with a `ContextualCompressionRetriever`. We'll add a `CohereRerank` compressor, which uses the Cohere rerank endpoint to rerank the returned results. Note that specifying the model name in `CohereRerank` is mandatory.

In [6]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
# from langchain_community.llms import Cohere
from langchain_openai import ChatOpenAI

# llm = Cohere(temperature=0)
llm = ChatOpenAI()

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever,
)

# compressed_docs = compression_retriever.invoke("What did the president say about Ketanji Brown Jackson")
compressed_docs = compression_retriever.invoke("What is the plan for the economy?")

pretty_print_docs(compressed_docs)

Document 1:

So that’s my plan. It will grow the economy and lower costs for families. 

So what are we waiting for? Let’s get this done. And while you’re at it, confirm my nominees to the Federal Reserve, which plays a critical role in fighting inflation.  

My plan will not only lower costs to give families a fair shot, it will lower the deficit.
----------------------------------------------------------------------------------------------------
Document 2:

More infrastructure and innovation in America. 

More goods moving faster and cheaper in America. 

More jobs where you can earn a good living in America. 

And instead of relying on foreign supply chains, let’s make it in America. 

Economists call it “increasing the productive capacity of our economy.” 

I call it building a better America. 

My plan to fight inflation will lower your costs and lower the deficit.
----------------------------------------------------------------------------------------------------
Document 3:

Th
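
When comparing the reranked list with the base retriever's output, it can help to see the scores next to the text. The helper below is a small sketch that assumes each reranked document carries a `relevance_score` entry in its `metadata`; as far as we know `CohereRerank` from `langchain_cohere` writes that key, but treat it as an assumption and check it against your installed version. The name `format_with_scores` and the stand-in documents are hypothetical.

```python
from types import SimpleNamespace


def format_with_scores(docs, max_chars=80):
    """Return one line per document: rank, relevance score (if present), text preview."""
    lines = []
    for rank, doc in enumerate(docs, start=1):
        # Assumed metadata key; falls back to "n/a" when it is missing.
        score = doc.metadata.get("relevance_score")
        score_text = "n/a" if score is None else f"{score:.3f}"
        preview = " ".join(doc.page_content.split())[:max_chars]  # collapse whitespace
        lines.append(f"{rank}. [{score_text}] {preview}")
    return "\n".join(lines)


# Stand-in documents so the helper can be tried without an API call.
fake_docs = [
    SimpleNamespace(
        page_content="So that's my plan.\nIt will grow the economy.",
        metadata={"relevance_score": 0.9876},
    ),
    SimpleNamespace(page_content="More jobs in America.", metadata={}),
]
print(format_with_scores(fake_docs))
```

With real results you would call `print(format_with_scores(compressed_docs))` instead of passing the stand-ins.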

You can of course use this retriever within a QA pipeline.

In [7]:
from langchain.chains.retrieval_qa.base import RetrievalQA

from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

chain = RetrievalQA.from_chain_type(
    llm=llm, retriever=compression_retriever
)


In [9]:
chain.invoke({"query": "What did the president say about Ketanji Brown Jackson"})

{'query': 'What did the president say about Ketanji Brown Jackson',
 'result': "The President mentioned that Ketanji Brown Jackson is one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence. He highlighted her background as a former top litigator in private practice, a former federal public defender, and coming from a family of public school educators and police officers. He also emphasized her reputation as a consensus builder and noted the broad support she has received since her nomination."}
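
To make the QA step less opaque: `RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places all retrieved documents into a single prompt together with the question. The sketch below illustrates that idea only. The function name `build_stuff_prompt` and the prompt wording are hypothetical and are not LangChain's actual template.

```python
from types import SimpleNamespace


def build_stuff_prompt(question, docs):
    """Concatenate document texts into one context block and append the question."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Stand-in documents; in the notebook these would be the reranked results.
toy_docs = [
    SimpleNamespace(page_content="My plan will lower costs and lower the deficit."),
    SimpleNamespace(page_content="More jobs where you can earn a good living in America."),
]
prompt = build_stuff_prompt("What is the plan for the economy?", toy_docs)
print(prompt)
```

Because every document ends up in one prompt, reranking down to a few highly relevant chunks keeps the prompt short and focused.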