## Notebook Purpose

This notebook shows how to generate reviews for a given paper with several language models: open-source large language models (LLMs), our fine-tuned Mistral 7B model, and GPT. It covers two steps:

1. **Model Selection:** An overview of the models used: open-source LLMs, our fine-tuned Mistral 7B, and GPT.
2. **Review Generation:** How each model is prompted to generate a review.

# Environment Setup

In [1]:
import scipdf
import sys
sys.path.append('../')

from prompts import SYSTEM_PROMPT
from pdf_parser import parse_pdf_content, parse_pdf_abstract, generate_input 
from model_review import extract_output

In [2]:
pdf = scipdf.parse_pdf_to_dict('../demo/Transformers.pdf') # change the path to the pdf file you want to parse
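
`parse_pdf_content` and `parse_pdf_abstract` come from the project's `pdf_parser` module, and their implementations are not shown in this notebook. As a rough idea of what such helpers might do, the sketch below flattens a parsed-paper dictionary into plain text. It assumes the dictionary exposes an `abstract` string and a `sections` list whose items carry `heading` and `text` keys. `flatten_paper` is a hypothetical helper, and the sample dictionary is made up for illustration.

```python
from typing import Any, Dict, List


def flatten_paper(paper: Dict[str, Any]) -> str:
    """Join the abstract and all non-empty sections into one text block."""
    parts: List[str] = []

    abstract = paper.get("abstract", "").strip()
    if abstract:
        parts.append("Abstract\n" + abstract)

    for section in paper.get("sections", []):
        heading = section.get("heading", "").strip()
        text = section.get("text", "").strip()
        if not text:
            continue  # skip sections with no body text
        parts.append((heading + "\n" if heading else "") + text)

    return "\n\n".join(parts)


# Made-up stand-in for a parsed paper (assumed key layout, not real data).
sample_paper = {
    "abstract": "We propose X.",
    "sections": [
        {"heading": "Introduction", "text": "Background."},
        {"heading": "", "text": ""},
    ],
}
flat_text = flatten_paper(sample_paper)
print(flat_text)
```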

# Model Inference

In [3]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

In [4]:
device = torch.device("cuda:2" if torch.cuda.is_available() else "cpu")
!nvidia-smi

Wed Apr 24 13:12:09 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA L4                      On  | 00000000:38:00.0 Off |                    0 |
| N/A   49C    P0              28W /  72W |  20956MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                      On  | 00000000:3A:00.0 Off |  

## Mistral 7b Model
In this section, we use the Mistral 7B Instruct model to generate reviews for the parsed paper. The model is loaded in 4-bit precision through the `BitsAndBytesConfig` defined below, which reduces the GPU memory needed for the weights. You can replace it with any other causal language model of your choice to compare outputs or performance.

In [5]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
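
As a rough sanity check on why 4-bit loading helps on the roughly 23 GB NVIDIA L4 cards listed in the `nvidia-smi` output, the weight footprint can be estimated from the parameter count and the bits stored per weight. The sketch below assumes a round 7 billion parameters and ignores quantization constants, activations, and the KV cache, so real usage will be higher. `weight_memory_gb` is a hypothetical helper, not part of `bitsandbytes`.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # assumption: round figure for a "7B" model

bf16_gb = weight_memory_gb(N_PARAMS, 16)  # unquantized bfloat16 weights
nf4_gb = weight_memory_gb(N_PARAMS, 4)    # 4-bit quantized weights
print(f"bf16 weights: ~{bf16_gb:.1f} GB, 4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the bfloat16 footprint, which matters when a GPU is already partly occupied by other processes.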

In [9]:
# model_id = "mistralai/Mistral-7B-Instruct-v0.2"
# model = AutoModelForCausalLM.from_pretrained(
#     model_id,
#     device_map="auto",
#     torch_dtype=torch.bfloat16,
#     quantization_config=bnb_config
# )
# tokenizer = AutoTokenizer.from_pretrained(model_id)


## Our Fine-tuned Model
You can also generate reviews with our fine-tuned Mistral 7B model, which is loaded below through `AutoPeftModelForCausalLM` with the same 4-bit quantization config.

In [None]:
# load directly from the huggingface model hub
model_id = "travis0103/mistral_7b_paper_review_lora"
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    device_map=device,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

## Generate Reviews Using Model

In [49]:
abstract_content = parse_pdf_abstract(pdf)
abstract_input = generate_input(abstract_content)

messages = [
    # Single user turn: system prompt followed by the formatted abstract
    {"role": "user", "content": SYSTEM_PROMPT + '\n\n' + abstract_input},
]
encoded_input = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
generated_ids = model.generate(
    encoded_input,
    max_new_tokens=1024,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
decoded_output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

In [None]:
print(extract_output(decoded_output))
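
Because `generated_ids[0]` contains the prompt tokens followed by the completion, the decoded string repeats the prompt before the review. `extract_output` is imported from `model_review` and its implementation is not shown here. The sketch below is a hypothetical stand-in that assumes the Mistral instruct layout, in which the prompt is wrapped in `[INST] ... [/INST]` markers that still appear in the decoded text. `text_after_inst` and the sample string are illustrative only.

```python
def text_after_inst(decoded: str, marker: str = "[/INST]") -> str:
    """Return the text after the last occurrence of `marker`.

    If the marker is absent, the whole string is returned, stripped.
    """
    # rpartition yields ('', '', decoded) when the marker is not found,
    # so the last element is the right answer in both cases.
    return decoded.rpartition(marker)[2].strip()


sample = "[INST] Review this abstract. [/INST] [Significance and novelty] ..."
review = text_after_inst(sample)
print(review)
```

An alternative that avoids string markers is to decode only the newly generated tokens, for example `generated_ids[0][encoded_input.shape[-1]:]`.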

# GPT Inference

In [15]:
import os
import openai

os.environ["OPENAI_API_KEY"] = ""  # set your OpenAI API key here before running
client = openai.Client()

In [16]:
full_content = parse_pdf_content(pdf)
full_input = generate_input(full_content)

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
      {"role": "system", "content": SYSTEM_PROMPT},
      {"role": "user", "content": full_input},
    ]
)
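
The two inference paths in this notebook assemble their prompts differently: the Mistral cell concatenates `SYSTEM_PROMPT` and the paper text into a single user turn, while the GPT call sends `SYSTEM_PROMPT` as a separate system message. A minimal sketch of that distinction follows. `build_messages` and its `supports_system_role` flag are hypothetical names introduced for illustration, and the demo strings are placeholders rather than the real prompt.

```python
from typing import Dict, List


def build_messages(
    system_prompt: str, paper_text: str, supports_system_role: bool
) -> List[Dict[str, str]]:
    """Build a chat message list for either prompt layout."""
    if supports_system_role:
        # GPT-style: instructions and paper text in separate messages.
        return [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": paper_text},
        ]
    # Single-turn style: instructions prepended to the user message.
    return [{"role": "user", "content": system_prompt + "\n\n" + paper_text}]


demo_system = "You are a reviewer."  # placeholder, not the real SYSTEM_PROMPT
demo_paper = "Abstract: ..."
gpt_style = build_messages(demo_system, demo_paper, supports_system_role=True)
mistral_style = build_messages(demo_system, demo_paper, supports_system_role=False)
print(gpt_style)
print(mistral_style)
```

Keeping the layout choice in one function makes it easier to send the same paper to several models and compare their reviews.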

In [17]:
print(response.choices[0].message.content)

[Significance and novelty]
The "Transformer" paper introduces a novel architecture that challenges the dominant sequence transduction models which rely on recurrent neural networks (RNNs) and convolutional neural networks (CNNs). The main contributions and innovations of this work include the introduction of an architecture based entirely on attention mechanisms, eliminating the need for recurrence or convolution within the model, which represents a significant shift in the approach to machine translation and potentially other sequence modeling tasks.

- 'Elimination of recurrence and convolution': The proposed Transformer model is novel in its exclusive reliance on attention mechanisms, avoiding traditional RNNs and CNNs which can be computationally expensive and slow due to their inherent sequential nature.
- 'Optimization for Parallelization': The Transformer is designed to maximize parallel computational capabilities, reducing training time significantly compared to existing models