# Summarization

We will use [LangChain](https://www.langchain.com/), an open-source library for making applications with LLMs.


## Document location
We will try to load all the documents in the folder defined below.
If you prefer, you can change this to a different folder name.

In [None]:
#document_folder = 'documents'
document_folder = '../summarizing'

## Some configuration
To conserve memory, we configure more efficient memory use on the GPU.

In [None]:
%env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

## Installing Software
We’ll need to install some libraries first:

In [None]:
!pip install --upgrade unstructured[all-docs] langchain-unstructured

## The Language Model
We'll use models from [HuggingFace](https://huggingface.co/), a website that has tools and models for machine learning.
We'll use the open-weights LLM 
[mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410).


In [None]:
%env HF_HOME=/fp/projects01/ec443/huggingface/cache/

To use the model, we create a *pipeline*.
A pipeline can consist of several processing steps, but in this case, we only need one step.
We can use the method `HuggingFacePipeline.from_model_id()`, which automatically downloads the specified model from HuggingFace.

from transformers import pipeline

llm = pipeline("text-generation", 
               model="mistralai/Mistral-Nemo-Instruct-2407",
               device=0,
               max_new_tokens=1000)

In [None]:
from langchain_community.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    #model_id='mistralai/Mistral-Small-Instruct-2409',
    model_id='mistralai/Ministral-8B-Instruct-2410',
    #model_id='mistralai/Mistral-7B-Instruct-v0.3',
    task='text-generation',
    device=0,
    pipeline_kwargs={
        'max_new_tokens': 1000,
        #'temperature': 0.3,
        #'num_beams': 4,
        #'do_sample': True
    }
)


We give some arguments to the pipeline:
- `model_id`: the name of the model on HuggingFace
- `task`: the task you want to use the model for
- `device`: the GPU hardware device to use. If we don't specify a device, no GPU will be used.
- `pipeline_kwargs`: additional parameters that are passed to the model.
    - `max_new_tokens`: maximum length of the generated text
    - `do_sample`: by default, the most likely next word is always chosen, which makes the output deterministic. We can introduce some randomness by sampling among the most likely words instead.
    - `temperature`: controls the amount of randomness when sampling. Lower values give less random output. It only has an effect when `do_sample` is `True`.
    - `num_beams`: by default, the model works with a single sequence of tokens/words. With beam search, the program builds multiple sequences at the same time and selects the best one at the end.
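
To make the difference between greedy decoding and sampling more concrete, here is a minimal sketch in plain Python. It does not use the model at all: the words and scores are made up, and `softmax_with_temperature` is a hypothetical helper written only for this illustration.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next words (not real model output)
words = ['cat', 'dog', 'fish']
scores = [2.0, 1.0, 0.5]

# Greedy decoding (like do_sample=False): always pick the highest-scoring word
greedy_word = words[scores.index(max(scores))]

# Sampling (like do_sample=True): draw a word according to the probabilities
cold = softmax_with_temperature(scores, temperature=0.3)
hot = softmax_with_temperature(scores, temperature=1.5)
rng = random.Random(0)  # fixed seed so the example is repeatable
sampled_word = rng.choices(words, weights=hot, k=1)[0]

print('greedy:', greedy_word)
print('probabilities at temperature 0.3:', [round(p, 3) for p in cold])
print('probabilities at temperature 1.5:', [round(p, 3) for p in hot])
print('sampled:', sampled_word)
```

With the lower temperature, almost all of the probability goes to the highest-scoring word, so sampling behaves nearly like greedy decoding. With the higher temperature, the other words get a real chance of being picked.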


## Making a Prompt
We can use a *prompt* to tell the language model how to answer.
The prompt should contain a few short, helpful instructions.
In addition, we provide a placeholder for the input, called *context*.
LangChain replaces the placeholder with the input document when we execute a query.


In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate


In [None]:
separator = '\nYour Summary:\n'
prompt_template = '''Write a summary of the following:

{context}
''' + separator
prompt = PromptTemplate(template=prompt_template,
                        input_variables=['context'])
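
To see what the filled-in prompt looks like, we can imitate the substitution with Python's own `str.format`. This is only a sketch of the idea, not LangChain's actual code path, and the input text is made up.

```python
# Plain-Python stand-in for the placeholder substitution
example_template = 'Write a summary of the following:\n\n{context}\n\nYour Summary:\n'

# Made-up input text, standing in for a loaded document
example_text = 'LangChain is an open-source library for making applications with LLMs.'

# Replace the {context} placeholder with the input text
filled_prompt = example_template.format(context=example_text)
print(filled_prompt)
```

The printed text is what the language model would receive: the instruction, the document, and the line `Your Summary:` that marks where the answer should start.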

## Create chain

The document loader loads each PDF page as a separate 'document'.
This is partly for technical reasons because that is the way PDFs are structured.
Therefore, we use the chain called `create_stuff_documents_chain`, which joins multiple documents into a single large document.

In [None]:
chain = create_stuff_documents_chain(llm, prompt)
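
The idea of "stuffing" can be sketched in a few lines of plain Python. The `Page` class and the `stuff` function below are hypothetical stand-ins written for this illustration, and joining the pieces with a blank line is an assumption about the default behaviour.

```python
from dataclasses import dataclass

@dataclass
class Page:
    """Hypothetical stand-in for a LangChain document (not the real class)."""
    page_content: str

def stuff(pages, document_separator='\n\n'):
    """Join the text of all pages into one string (assumed separator)."""
    return document_separator.join(page.page_content for page in pages)

# Two made-up pages
pages = [Page('First page.'), Page('Second page.')]

# The joined text is what would replace the {context} placeholder
context = stuff(pages)
print(context)
```

Because everything ends up in a single prompt, this approach only works as long as the joined text fits in the model's context window.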

## Function to Separate the Summary from the Input

LangChain returns both the input prompt and the generated response in one long text.
To get only the summary, we must split the summary from the document that we sent as input.

In [None]:
def split_result(result):
    """Return only the generated summary, i.e. the text after the separator."""
    position = result.find(separator)
    if position == -1:
        # Separator not found: return the whole text unchanged
        return result
    return result[position + len(separator):]
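
The splitting logic can be checked without running the model. The sketch below uses a made-up output string and a hypothetical helper, `split_after_separator`, which does the same job with `str.partition`.

```python
example_separator = '\nYour Summary:\n'

def split_after_separator(text, separator):
    """Return the text after the first separator, or all of it if absent."""
    before, found, after = text.partition(separator)
    return after if found else text

# Made-up text that imitates prompt plus generated answer in one string
fake_output = ('Write a summary of the following:\n\n'
               'A long document about summarization.\n'
               + example_separator
               + 'A short summary.')

only_summary = split_after_separator(fake_output, example_separator)
print(only_summary)
```

Note that the text is split at the first occurrence of the separator, so a document that itself contains the line `Your Summary:` would be cut too early.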

## Loading the Documents


We use LangChain’s `DirectoryLoader` to load all the files in `document_folder`.
`document_folder` is defined at the start of this notebook.

In [None]:
from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader(document_folder)
documents = loader.load()
print('number of documents:', len(documents))

## Creating the Summaries
Now, we can iterate over these documents with a `for`-loop.

In [None]:
summaries = {}

for document in documents:
    filename = document.metadata['source']
    print(filename)
    summary = chain.invoke({"context": [document]})
    summary = split_result(summary)
    summaries[filename] = summary
    print('Summary of file', filename)
    print(summary)

## Saving the Summaries to Text Files
Finally, we save the summaries for later use.
We save all the summaries in the file `summaries.txt`.
If you like, you can store each summary in a separate file.


In [None]:
with open('summaries.txt', 'w') as outfile:
    for filename in summaries:
        print('Summary of', filename, file=outfile)
        print(summaries[filename], file=outfile)
        print(file=outfile)
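
If you prefer one file per summary, a possible approach is sketched below. The dictionary `example_summaries`, the temporary output folder and the `_summary.txt` suffix are all assumptions made for this illustration; in the notebook you would loop over `summaries` and pick your own folder.

```python
import tempfile
from pathlib import Path

# Made-up summaries, standing in for the `summaries` dictionary
example_summaries = {
    '../summarizing/report.pdf': 'Summary of the report.',
    '../summarizing/notes.txt': 'Summary of the notes.',
}

# Temporary folder for the example; replace with a folder of your choice
output_folder = Path(tempfile.mkdtemp())

written_files = []
for filename, summary in example_summaries.items():
    # Use the original file name without folder and extension
    stem = Path(filename).stem
    target = output_folder / f'{stem}_summary.txt'
    target.write_text(summary + '\n', encoding='utf-8')
    written_files.append(target)

print([path.name for path in written_files])
```

Two input files with the same name in different folders would end up with the same output name, so this naming scheme is only safe when all file names are unique.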

## Bonus Material

::::{admonition} Make an Overall Summary
:class: tip, dropdown

We can also try to generate an overall summary of all the documents.
This doesn't make much sense with documents on different topics.
If all the documents are related or on the same topic, it could make sense to make an overall summary of all the summaries.

First, we need to import some more functions:

```python
from langchain.schema.document import Document
from langchain.prompts import ChatPromptTemplate
```

We make a new prompt, with more specific instructions than for the regular summaries.

```python
total_prompt = ChatPromptTemplate.from_messages(
    [("system", "Below is a list of summaries of some papers. Make a total summary of all the information in all the papers:\n\n{context}\n\nTotal Summary:")]
)
```

Then, we can make a new chain based on the LLM and the prompt:

```python
total_chain = create_stuff_documents_chain(llm, total_prompt)
```

This chain needs a list of `Document` objects as input.


```python
list_of_summaries = [Document(page_content=summary) for summary in summaries.values()]
```

Now, we can invoke the chain with this list as input, and print the result:

```python
total_summary = total_chain.invoke({"context": list_of_summaries})

print('Summary of all the summaries:')
print(total_summary)
```

Finally, we save the overall summary to a text file:

```python
with open('total_summary.txt', 'w') as outfile:
    print(total_summary, file=outfile)
```
::::

## Exercises