## Retrieval-Augmented Generation (RAG)

<a target="_blank" href="https://colab.research.google.com/github/microsoft/LLMLingua/blob/main/examples/RAG.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Retrieval-Augmented Generation (RAG) is a popular technique for supplying large language models (LLMs) with specialized knowledge. However, traditional RAG pipelines tend to produce increasingly long prompts, sometimes exceeding **40k** tokens, which incur high financial and latency costs. Moreover, the reduced information density of such prompts can degrade LLM performance, for example through the "lost in the middle" issue.

<center><img width="800" src="https://github.com/microsoft/LLMLingua/blob/main/images/LongLLMLingua_Motivation.png?raw=1"></center>

To address this, we propose [**LongLLMLingua**](https://aclanthology.org/2024.acl-long.91/), which tackles the low information density of long contexts through prompt compression, making it particularly suitable for RAG tasks. The main idea is a two-stage compression process, shown by the <font color='red'>**red line**</font>, which significantly improves on the original curve:

- Coarse-grained compression through document-level perplexity;
- Fine-grained compression of the remaining text using token perplexity.
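
The two stages listed above can be sketched in plain Python. This is a minimal illustration, not the LongLLMLingua implementation: the helper names `coarse_filter` and `fine_filter` are hypothetical, tokens are approximated by whitespace-separated words, and a toy word-overlap score stands in for the perplexity that a real language model would provide.

```python
def coarse_filter(documents, doc_score, budget):
    """Stage 1: keep the best-scoring documents that fit in the word budget."""
    ranked = sorted(documents, key=doc_score, reverse=True)
    kept, used = [], 0
    for doc in ranked:
        n_words = len(doc.split())
        if used + n_words > budget:
            continue  # this document does not fit; try the next one
        kept.append(doc)
        used += n_words
    return kept


def fine_filter(text, token_score, keep_ratio):
    """Stage 2: keep the highest-scoring tokens, preserving their original order."""
    tokens = text.split()
    n_keep = max(1, round(len(tokens) * keep_ratio))
    best = sorted(range(len(tokens)), key=lambda i: token_score(tokens[i]), reverse=True)
    return " ".join(tokens[i] for i in sorted(best[:n_keep]))


# Toy scores (assumption): word overlap with the question replaces perplexity.
question_words = set("how many countries are members of opec".split())


def doc_score(doc):
    return len(question_words & set(doc.lower().split()))


def token_score(token):
    return 1.0 if token.lower() in question_words or token.isdigit() else 0.0


docs = [
    "OPEC has 15 member countries",
    "Vienna is the capital of Austria",
    "Bananas are rich in potassium",
]
kept_docs = coarse_filter(docs, doc_score, budget=6)
compressed = fine_filter(kept_docs[0], token_score, keep_ratio=0.6)
print(kept_docs, "->", compressed)
```

The coarse stage discards whole documents cheaply, so the more expensive token-level stage only has to run on the text that survives.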

Instead of fighting positional effects, we use them to our advantage through document reordering, as illustrated by the <font color='green'>**green line**</font>. In this approach, the most critical passages are placed at the beginning and the end of the context. Furthermore, the entire process becomes more **cost-effective and faster**, since only **1/4** of the original context has to be processed.
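
The reordering idea can be illustrated with a short sketch. It assumes each document already has a relevance score (higher is better); the function name `reorder_to_edges` is hypothetical, and the alternating placement below is only one simple way to push important passages toward both ends. It is not necessarily what `reorder_context="sort"` does inside the library.

```python
def reorder_to_edges(docs, scores):
    """Place the most relevant documents at the start and end, least relevant in the middle."""
    ranked = [doc for doc, _ in sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)]
    front = ranked[0::2]        # ranks 1, 3, 5, ... fill the beginning
    back = ranked[1::2][::-1]   # ranks 2, 4, 6, ... fill the end, best one last
    return front + back


docs = ["a", "b", "c", "d", "e"]
scores = [0.1, 0.9, 0.5, 0.7, 0.3]
print(reorder_to_edges(docs, scores))
```

Here the two highest-scoring documents (`"b"` and `"d"`) end up first and last, while the lowest-scoring one (`"a"`) lands in the middle, where models are reported to attend least.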

### NaturalQuestions Multi-document QA

Next, we demonstrate LongLLMLingua on the NaturalQuestions dataset and show that it effectively alleviates the "lost in the middle" issue. This dataset closely resembles real-world RAG scenarios: the Contriever retrieval system first recalls 20 relevant documents (1 ground-truth document and 19 related documents), and each question is then answered from a prompt composed of these 20 documents.

The original dataset is available at https://github.com/nelson-liu/lost-in-the-middle/tree/main/qa_data.

In [1]:
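# Install dependencies.
## Lost in the middle
!git clone https://github.com/nelson-liu/lost-in-the-middle
!cd lost-in-the-middle && echo "xopen" > requirements.txt && pip install -e .
## LLMLingua
!pip install llmlingua
# The calls below use the pre-1.0 openai interface (openai.ChatCompletion).
# If a newer openai was already imported in this session, restart the runtime
# after installing so that the pinned version is the one that gets loaded.
!pip install openai==0.28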

fatal: destination path 'lost-in-the-middle' already exists and is not an empty directory.
Obtaining file:///content/lost-in-the-middle
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: lost_in_the_middle
  Building editable for lost_in_the_middle (pyproject.toml) ... done
  Created wheel for lost_in_the_middle: filename=lost_in_the_middle-0.0.0-0.editable-py3-none-any.whl size=4584 sha256=4499dc438a7ff28c67c522dca07d48f8e65f9b216293d70c74ba3319a004712d
  Stored in directory: /tmp/pip-ephem-wheel-cache-ysg013kw/wheels/7c/ea/c2/8388b6e7791081a662a929a1b75ad30754d1a123421c304827
Successfully built lost_in_the_middle
Installing collected packages: lost_in_the_middle
  Attempting uninstall: lost_in_the_middle
    Found existing installa

In [2]:
# Using OpenAI (OAI)
import openai

# Replace the placeholder with your own key; never commit a real key to a notebook.
openai.api_key = "<your-openai-api-key>"

In [3]:
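# Or use Azure OpenAI (AOAI) or a custom endpoint by setting the API base
import openai

# Replace the placeholder with your own key; never commit a real key to a notebook.
openai.api_key = "<your-openai-api-key>"
openai.api_base = "https://api.gpt.ge/v1"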

### Setup Data

In [4]:
import json
from xopen import xopen
from copy import deepcopy
from tqdm import tqdm
from lost_in_the_middle.prompting import (
    Document,
    get_qa_prompt,
)

datasets = []
path = "./lost-in-the-middle/qa_data/20_total_documents/nq-open-20_total_documents_gold_at_9.jsonl.gz"
with xopen(path) as f:
    for ii, jj in tqdm(enumerate(f), total=2655):
        input_example = json.loads(jj)
        question = input_example["question"]
        documents = []
        for ctx in deepcopy(input_example["ctxs"]):
            documents.append(Document.from_dict(ctx))

        prompt = get_qa_prompt(
            question,
            documents,
            mention_random_ordering=False,
            query_aware_contextualization=False,
        )

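        # Split the prompt on blank lines: the first block is the instruction,
        # the last block is the question, and everything in between is the
        # retrieved documents.
        c = prompt.split("\n\n")
        instruction, question = c[0], c[-1]
        demonstration = "\n".join(c[1:-1])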
        datasets.append(
            {
                "id": ii,
                "instruction": instruction,
                "demonstration": demonstration,
                "question": question,
                "answer": input_example["answers"],
            }
        )

100%|██████████| 2655/2655 [00:02<00:00, 1127.98it/s]


In [5]:
# select an example from NaturalQuestions
instruction, demonstration_str, question, answer = [
    datasets[23][key] for key in ["instruction", "demonstration", "question", "answer"]
]

In [12]:
# Show the retrieved documents and the ground-truth answer
demonstration_str, answer

('Document [1](Title: OPEC) of "the top 100 most influential people in the shipping industry". However, the influence of OPEC on international trade is periodically challenged by the expansion of non-OPEC energy sources, and by the recurring temptation for individual OPEC countries to exceed production targets and pursue conflicting self-interests. As of June 2018, OPEC has 15 member countries: six in the Middle East (Western Asia), seven in Africa, and two in South America. According to the U.S. Energy Information Administration (EIA), OPEC\'s combined rate of oil production (including gas condensate) represented 44 percent of the world\'s total in 2016, and OPEC accounted for\nDocument [2](Title: OPEC) his African neighbors Angola and Sudan to join, and Angola did in 2007, followed by Equatorial Guinea in 2017. Since the 1980s, representatives from Egypt, Mexico, Norway, Oman, Russia, and other oil-exporting nations have attended many OPEC meetings as observers, as an informal mechan

### Response to the Original Prompt (Error)

In [14]:
# Send the original, uncompressed prompt (this produces an error)

prompt = "\n\n".join([instruction, demonstration_str, question])

message = [
    {"role": "user", "content": prompt},
]

request_data = {
    "messages": message,
    "max_tokens": 100,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
}
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # pass the model by keyword, not positionally
    **request_data,
)
print(json.dumps(response, indent=4))

Collecting openai==0.28
  Downloading openai-0.28.0-py3-none-any.whl.metadata (13 kB)
Downloading openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 1.52.2
    Uninstalling openai-1.52.2:
      Successfully uninstalled openai-1.52.2
Successfully installed openai-0.28.0


APIRemovedInV1: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742
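
The error above is raised when the `openai` package loaded in the session is version 1.0 or later, where `openai.ChatCompletion` no longer exists. Besides pinning `openai==0.28`, the same request can be written against the 1.x client interface. The following is a minimal sketch under that assumption: the helper name `ask` is hypothetical and not part of the notebook, and the client object is passed in so the function can be exercised without network access.

```python
def ask(client, prompt, model="gpt-3.5-turbo"):
    """Send a single-turn chat request through an openai>=1.0 style client."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100,
        temperature=0,
        top_p=1,
        n=1,
        stream=False,
    )
    # In the 1.x interface the response is an object, not a dict.
    return response.choices[0].message.content
```

With the real library, the client would be created with `from openai import OpenAI` and `client = OpenAI(api_key=...)`, and the call would be `ask(client, prompt)`.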


### Response to the Compressed Prompt (Correct at 10x Compression)

In [None]:
# Setup LLMLingua
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



In [None]:
# 6x Compression
compressed_prompt = llm_lingua.compress_prompt(
    demonstration_str.split("\n"),
    instruction=instruction,
    question=question,
    target_token=500,
    condition_compare=True,
    condition_in_question="after",
    rank_method="longllmlingua",
    use_sentence_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.4,  # enable dynamic_context_compression_ratio
    reorder_context="sort",
)
message = [
    {"role": "user", "content": compressed_prompt["compressed_prompt"]},
]

request_data = {
    "messages": message,
    "max_tokens": 100,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
}
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # pass the model by keyword, not positionally
    **request_data,
)

print(json.dumps(compressed_prompt, indent=4))
print("Response:", response)

{
    "compressed_prompt": "Write a high-quality answer for the given question using only the provided search results (some of which might be irrelevant).\n\nDocument [10](Title: OPEC Organization of the Petroleum Exporting Countries (OPEC, /\u02c8o\u028ap\u025bk/ OH-pek, or OPEP in several other languages) is an intergovernmental organization of 14 nations as of February 2018, founded in 1960 in Baghdad by the first five members (Iran, Iraq, Kuwait, Saudi Arabia, and Venezuela), and headquartered since 1965 in Vienna, Austria. As of 2016, the 14 countries accounted for an estimated 44 percent of global oil production and 73 percent of the world's \"proven\" oil reserves, giving OPEC a major influence on global oil prices that were previously determined by American-dominated multinational oil companies.\n\nDocument1](Title: OPE OPE lost its newest members, who had in mid-1970s E withd in December 192, because it was unwilling to pay annual US$2 million membership fee felt that it neede
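
Labels such as "6x" refer to the ratio between the original and the compressed prompt length. As a rough sanity check, the ratio can be estimated as below. This is a sketch under a loud assumption: tokens are approximated by whitespace-separated words rather than by the model's tokenizer, so the number is only indicative. The helper names are hypothetical.

```python
def approx_token_count(text):
    """Rough token count: whitespace-separated words (not a real tokenizer)."""
    return len(text.split())


def compression_ratio(original_prompt, compressed_prompt):
    """Return how many times shorter the compressed prompt is than the original."""
    compressed = approx_token_count(compressed_prompt)
    if compressed == 0:
        raise ValueError("compressed prompt is empty")
    return approx_token_count(original_prompt) / compressed
```

In this notebook the original prompt would be `"\n\n".join([instruction, demonstration_str, question])` and the compressed one `compressed_prompt["compressed_prompt"]`.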

In [None]:
# 10x Compression
compressed_prompt = llm_lingua.compress_prompt(
    demonstration_str.split("\n"),
    instruction=instruction,
    question=question,
    target_token=100,
    condition_compare=True,
    condition_in_question="after",
    rank_method="longllmlingua",
    use_sentence_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.4,  # enable dynamic_context_compression_ratio
    reorder_context="sort",
)
message = [
    {"role": "user", "content": compressed_prompt["compressed_prompt"]},
]

request_data = {
    "messages": message,
    "max_tokens": 100,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
}
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # pass the model by keyword, not positionally
    **request_data,
)

print(json.dumps(compressed_prompt, indent=4))
print("Response:", response)

{
    "compressed_prompt": "Write a high-quality answer for the given question using only the provided search results (some of which might be irrelevant).\n\n0Title: OPECization of Petroleum Exporting Count (OPEC, /\u02c8o\u028ap\u025bk OHpekP in other) is an intergovernmental14 nations as February 218 founded in 960 in Baghdad by fiveIran Iraq, Kuwait, Saudi Arab, and Venezuela), headquartered since 965 in, Austria. of the4ed an estimated4 percent of production and 3 percent of the world's \"proven\" oil res OPEC on global by Americandominatedin companies.\n\n5](Title: OPEC) OPEC lost its two newest members, who had joined in the mid-1970s. Ecuador withdrew in December 1992, because it was unwilling to pay the annual US$2 million membership fee and felt that it needed to produce more oil than it was allowed under the OPEC quota, although it rejoined in October 2007. Similar concerns prompted Gabon to suspend membership in January 1995; it rejoined in July 2016. Iraq has remained a mem
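
To judge whether a response is correct, it can be compared against the ground-truth `answer` list loaded earlier. The following is a minimal sketch of a normalized substring match, written in the spirit of the scoring commonly used for open-domain QA. The function names `normalize` and `contains_answer` are hypothetical and are not taken from the notebook or from the lost-in-the-middle package.

```python
import re
import string


def normalize(text):
    """Lowercase, strip punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def contains_answer(prediction, answers):
    """Return True if any normalized gold answer appears in the normalized prediction."""
    normalized_prediction = normalize(prediction)
    return any(normalize(answer) in normalized_prediction for answer in answers)
```

For the openai 0.28 response used in this notebook, the text to check would be `response["choices"][0]["message"]["content"]`.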