# Introduction to Understanding LLM Chatbot Behavior with Document Context (No Prompts)

### **Goals for the MEPO LLM Chat Bot**
This notebook is designed to explore the behavior and responses of an LLM chatbot when it uses document context without additional prompts. By loading a document for context, the chatbot can provide more informed and contextually relevant answers, helping you understand how AI can assist in specific, document-based queries.

### **Context and Importance**

As we integrate AI systems into our workflows, it’s essential to understand how these systems utilize and respond to context. This notebook encourages you to think critically about the role of AI chatbots, particularly when they operate using document-based context to enhance their responses.

### **Dangers of AI Reliance**

While AI chatbots offer substantial benefits, there are also potential risks, such as:
- **Bias:** AI models may reflect biases present in their training data, which can be compounded when documents with biased content are used.
- **Transparency Issues:** The mechanisms by which AI derives its answers can be opaque, making it difficult to fully understand how conclusions are reached.
- **Reinforcing Inequalities:** AI has the potential to perpetuate existing inequalities if not used thoughtfully and with critical oversight.

### **Importance of Attention and Critical Thinking**

Even when AI systems are provided with rich context, human attention and critical thinking remain vital. Overreliance on AI can lead to disengagement and a decline in analytical skills. This notebook will underscore the importance of balancing AI assistance with active, thoughtful human participation.

### **Responsible AI Usage**

This section will explore ethical guidelines and considerations for using AI responsibly. It will help you address the question, "How should I use these AI systems?" by offering a framework for thoughtful and ethical interaction with AI.

### **Choice of LLM (Open Model)**

The notebook will also discuss why an openly available LLM was chosen, highlighting the advantages of transparency, accessibility, and the potential for community-driven improvements. Note that LLaMA-2's weights are published while its training data is not, so the openness is partial. Choosing an open model supports adaptability and scrutiny, which helps align the work with ethical standards and promotes trust in AI systems.

---

# Purpose of the Notebook

The primary aim of this notebook is to guide you through setting up and interacting with the LLaMA-2 model, using a loaded document to provide context for its responses. This lets you see how the model processes and responds to queries based on the specific information within the document, making it a helpful tool for understanding the practical applications and limitations of AI in context-based use cases.



# Setup and Query LLaMA-2 Model
This notebook will guide you through installing required libraries, setting up the LLaMA-2 model, and querying it using natural language.

## Install Required Libraries
We install PyTorch, TorchVision, and Torchaudio, along with the other dependencies needed to run the LLaMA-2 model and handle document embeddings.

In [None]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117 --upgrade
!pip install langchain einops accelerate transformers bitsandbytes scipy
!pip install xformers sentencepiece
!pip install llama-index==0.10.12 llama_hub==0.0.19
!pip install llama-index-llms-huggingface
!pip install sentence-transformers
# PyMuPDF handles PDF text extraction; PyPDF2 is not needed.
!pip install PyMuPDF
# Note: this upgrade can override the llama-index==0.10.12 pin above.
!pip install --upgrade langchain llama-index
!pip install -U langchain-community
!pip install --upgrade gradio
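
Because the install cell mixes pinned and upgraded packages, it can help to confirm which versions actually ended up installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is simply taken from the install commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from the install cell; adjust the list as needed.
report = installed_versions(["torch", "transformers", "llama-index", "gradio", "PyMuPDF"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this right after installation (and again after a runtime restart) makes version drift easier to spot.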



## Import Required Libraries
Next, we'll import the necessary libraries for tokenization, model setup, and text generation.

In [None]:
# Hugging Face tokenizer, model, and streaming utilities
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
import torch

# LlamaIndex prompt wrapper and Hugging Face LLM adapter
from llama_index.core.prompts.prompts import SimpleInputPrompt
from llama_index.llms.huggingface import HuggingFaceLLM

# Embedding model wrappers
from llama_index.legacy.embeddings.langchain import LangchainEmbedding
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from sentence_transformers import SentenceTransformer

# Global service context, vector index, and document type
from llama_index.core import set_global_service_context, ServiceContext
from llama_index.core import VectorStoreIndex, download_loader, Document

from pathlib import Path
import fitz  # PyMuPDF
import gradio as gr





## Define Model and Tokenizer
We'll define the model name and the authentication token required to access the LLaMA-2 model from Hugging Face.

In [None]:
model_name = "meta-llama/Llama-2-7b-chat-hf"
# Read the Hugging Face access token from the first line of a local file.
with open("HF_TOKEN.txt") as token_file:
    auth_token = token_file.readline().strip()

tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir='./model/', token=auth_token)
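
Reading the token from a file works, but the cell fails with a bare traceback when the file is missing or empty. As a hedged alternative, the sketch below checks an environment variable first and falls back to the file. The function name `load_hf_token` and the choice of `HF_TOKEN` as the variable name are assumptions made for illustration, not part of the original notebook.

```python
import os
from pathlib import Path


def load_hf_token(path="HF_TOKEN.txt", env_var="HF_TOKEN"):
    """Return the token from the environment if set, else from the first line of `path`."""
    token = os.environ.get(env_var, "").strip()
    if token:
        return token

    token_path = Path(path)
    if not token_path.is_file():
        raise FileNotFoundError(
            f"Set the {env_var} environment variable or create {token_path} with your token."
        )

    lines = token_path.read_text().splitlines()
    token = lines[0].strip() if lines else ""
    if not token:
        raise ValueError(f"{token_path} is empty.")
    return token


# Usage (hypothetical): auth_token = load_hf_token()
```

Keeping the token out of the notebook itself also reduces the chance of committing it by accident.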



## Mount Google Drive
If you are running in Google Colab, mount Google Drive so the notebook can save and load files there.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
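
The mount cell only works inside Colab. One possible way to keep the notebook runnable elsewhere is to guard the mount, as sketched below. The helper `in_colab` is a hypothetical name; it simply tests whether the `google.colab` package can be imported.

```python
def in_colab():
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only available inside Colab)
    except ImportError:
        return False
    return True


if in_colab():
    from google.colab import drive
    drive.mount('/content/drive')
else:
    print("Not running in Colab; skipping the Drive mount.")
```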


## Load the Model
Now, we'll load the LLaMA-2 model using the previously defined name and authentication token. We'll also set some model parameters.

In [None]:
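from transformers import BitsAndBytesConfig

# `load_in_8bit=True` is deprecated; pass a BitsAndBytesConfig instead.
model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir='./model/',
                                             token=auth_token,
                                             torch_dtype=torch.float16,
                                             rope_scaling={"type": "dynamic", "factor": 2.0},
                                             quantization_config=BitsAndBytesConfig(load_in_8bit=True))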
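
To see why the model is loaded in 8-bit, a rough back-of-the-envelope estimate of weight memory is useful. The sketch below is arithmetic only: it assumes about 6.74 billion parameters for the 7B model and ignores activations, the key-value cache, and any quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: roughly 6.74 billion parameters for the 7B chat model.
LLAMA2_7B_PARAMS = 6.74e9

for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_memory_gb(LLAMA2_7B_PARAMS, bits):.2f} GB")
```

Under these assumptions, 8-bit loading roughly halves the weight footprint compared with float16, which is often the difference between fitting on a free Colab GPU and running out of memory.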




## Create System and Query Prompts
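Define the system prompt and the query wrapper that put each question into LLaMA-2's chat format. The system block between `<<SYS>>` and `<</SYS>>` is intentionally left empty, so the model receives no system instructions by default.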

In [None]:
system_prompt = """<s>[INST] <<SYS>>

<</SYS>>"""
# Throw together the query wrapper
query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")
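
To make the prompt format concrete, the sketch below shows roughly how the system block and the wrapped question combine into the single string the model sees. It is plain string formatting, not the library's code; the helper `build_prompt` is hypothetical, and joining the two parts with a single space is an assumption about how the wrapper concatenates them.

```python
SYSTEM_TEMPLATE = "<s>[INST] <<SYS>>\n{system}\n<</SYS>>"
QUERY_TEMPLATE = "{query_str} [/INST]"


def build_prompt(system_text, question):
    """Join the system block and the wrapped question into one prompt string."""
    system_block = SYSTEM_TEMPLATE.format(system=system_text)
    wrapped_query = QUERY_TEMPLATE.format(query_str=question)
    # Assumption: the two parts are joined with a single space.
    return f"{system_block} {wrapped_query}"


# With an empty system text, this mirrors the empty system prompt defined above.
print(build_prompt("", "What does the pamphlet recommend?"))
```

Printing the assembled prompt is a quick way to check that the `[INST]` and `<<SYS>>` markers open and close where expected.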

## Update the System Prompt
Define a helper that replaces the global system prompt at runtime.


In [None]:
# Function to update the global system prompt
def update_system_prompt(new_prompt):
    """Replace the global system prompt; query_model reads it on the next query."""
    global system_prompt
    system_prompt = new_prompt
    return "System prompt updated."

## Create HuggingFace LLM
Use the LlamaIndex wrapper to create a HuggingFace LLM.

In [None]:
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=250,
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    model=model,
    tokenizer=tokenizer,
)



## Setup Embeddings
We need to create an embeddings instance to represent document chunks.

In [None]:
embeddings = LangchainEmbedding(HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"))
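
An embedding turns a piece of text into a vector of numbers (384 of them for `all-MiniLM-L6-v2`), and retrieval works by comparing those vectors. A common comparison is cosine similarity. The sketch below shows it on made-up three-dimensional vectors; the values are purely illustrative and do not come from the real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
query_vec = [1.0, 0.0, 1.0]
chunk_a = [0.9, 0.1, 0.8]  # points in nearly the same direction as the query
chunk_b = [0.0, 1.0, 0.0]  # orthogonal to the query

print("query vs chunk_a:", cosine_similarity(query_vec, chunk_a))
print("query vs chunk_b:", cosine_similarity(query_vec, chunk_b))
```

A higher score means the chunk points in a direction closer to the question, which is the signal a vector index uses to rank chunks.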

## Set Service Context
Create a new service context instance and set it globally.

In [None]:
service_context = ServiceContext.from_defaults(chunk_size=1024, llm=llm, embed_model=embeddings)
set_global_service_context(service_context)
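
The settings above interact: the context window has to hold the retrieved chunks, the question, and the generated answer. The sketch below is a simple token-budget calculation using the values from this notebook (`context_window=4096`, `max_new_tokens=250`, `chunk_size=1024`). The `prompt_overhead` allowance for the question and template text is a hypothetical figure, not something the library reports.

```python
def max_chunks_in_prompt(context_window, max_new_tokens, chunk_size, prompt_overhead=200):
    """Estimate how many full chunks fit once room is reserved for the answer and the question."""
    budget = context_window - max_new_tokens - prompt_overhead
    return max(budget // chunk_size, 0)


print(max_chunks_in_prompt(context_window=4096, max_new_tokens=250, chunk_size=1024))
```

If you raise `chunk_size` or `max_new_tokens`, rerunning this estimate shows how quickly the room for retrieved context shrinks.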



## Load Documents
Let's load documents from a PDF file. Make sure the PDF file is accessible at the specified path.

In [None]:
def read_pdf_to_documents(file_path):
    """Extract the text of each PDF page and wrap it in a LlamaIndex Document."""
    documents = []
    with fitz.open(file_path) as doc:  # closes the file when done
        for page in doc:
            documents.append(Document(text=page.get_text()))
    return documents

# Change this to the path of your own document.
file_path = Path('/content/Full Pamplet.pdf')
documents = read_pdf_to_documents(file_path)
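
Since the loader depends on the file being at the given path, a small check before loading can turn a confusing library error into a clear message. This is a minimal sketch; `require_pdf` is a hypothetical helper, not part of the notebook or of PyMuPDF.

```python
from pathlib import Path


def require_pdf(path):
    """Return `path` as a Path if it points to an existing .pdf file, else raise."""
    pdf_path = Path(path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No file at {pdf_path}; update file_path before loading.")
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got {pdf_path.name!r}.")
    return pdf_path


# Usage (hypothetical): documents = read_pdf_to_documents(require_pdf(file_path))
```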

## Create an Index
Create a Vector Store Index from the loaded documents to enable querying.

In [None]:
index = VectorStoreIndex.from_documents(documents)
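
Before documents are embedded, they are split into chunks (the service context above sets `chunk_size=1024`). The sketch below illustrates the idea with whitespace-separated words and an optional overlap. It is a simplification under stated assumptions: the library's own splitter counts tokens rather than words and handles boundaries more carefully, and `chunk_words` is a hypothetical helper.

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split text into chunks of up to `chunk_size` words, sharing `overlap` words between neighbors."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap must be in [0, chunk_size).")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks


sample = "one two three four five six seven"
for chunk in chunk_words(sample, chunk_size=3, overlap=1):
    print(repr(chunk))
```

Overlap trades a little redundancy for a lower chance that a relevant sentence is cut in half at a chunk boundary.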

## Setup Query Engine
Configure the query engine using the LLM to process natural language queries.

In [None]:
query_engine = index.as_query_engine()
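
Conceptually, a query engine does two things: it retrieves the chunks most relevant to the question, then places them in the prompt ahead of the question. The toy sketch below imitates that flow with a crude word-overlap score in place of embeddings. All function names, the prompt layout, and the sample sentences are invented for illustration and are not the library's implementation.

```python
def overlap_score(question, chunk):
    """Count the lowercase words shared by the question and the chunk."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks ranked by word overlap with the question."""
    ranked = sorted(chunks, key=lambda ch: overlap_score(question, ch), reverse=True)
    return ranked[:top_k]


def build_context_prompt(question, chunks, top_k=2):
    """Place the retrieved chunks ahead of the question in a single prompt."""
    context = "\n---\n".join(retrieve(question, chunks, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Made-up example sentences standing in for document chunks.
sample_chunks = [
    "Soil conservation programs reduced erosion on farms.",
    "Public works projects created jobs building roads and dams.",
    "Radio broadcasts reached households across the country.",
]
print(build_context_prompt("Which projects created jobs?", sample_chunks, top_k=1))
```

The real engine ranks chunks by embedding similarity rather than shared words, but the retrieve-then-prompt shape is the same, which is why the quality of the loaded document matters so much for the answers.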

## Query the Model
Ask the model a question and get a response based on the loaded data.

Example query:

"I want potential solutions to tackle issues during the Great Depression. Your design should be cost-effective, sustainable, and feasible given the limited resources and technology of the time. Consider the long-term benefits and community impacts of your proposed solution."



Development notes:

- Open the GUI
- CUDA
- Completed the LLaMA-2 notebook
- Used retrieval-augmented generation (RAG) to load the data, which involves a bit of prompt engineering
- Load the data into LLaMA: the document is broken into chunks and stored as vectors in memory
- Using the README

## Define the Query Function
Wrap the query step in a function so the Gradio interface can call it.


In [None]:
def query_model(question):
    """Answer a question from the indexed document using the current system prompt."""
    # Rebuild the LLM on every call so edits to the global system prompt take effect.
    llm = HuggingFaceLLM(
        context_window=4096,
        max_new_tokens=256,
        system_prompt=system_prompt,
        query_wrapper_prompt=query_wrapper_prompt,
        model=model,
        tokenizer=tokenizer
    )
    # Build a fresh query engine around that LLM; the engine created earlier
    # keeps the LLM (and system prompt) it was constructed with.
    response = index.as_query_engine(llm=llm).query(question)
    return response.response

## Create the Gradio Interface
Build one tab for updating the system prompt and one for querying the model, then launch them together.


In [None]:
update_prompt_interface = gr.Interface(
    fn=update_system_prompt,
    inputs=gr.Textbox(lines=5, placeholder="Enter the system prompt here...", label="System Prompt", value=system_prompt),
    outputs=gr.Textbox(label="Status"),
    title="System Prompt Updater",
    description="Update the system prompt used for context."
)

# Create Gradio interface for querying the model
query_interface = gr.Interface(
    fn=query_model,
    inputs=gr.Textbox(lines=2, placeholder="Enter your question here...", label="User Question"),
    outputs=gr.Textbox(label="Response"),
    title="Document Query Assistant",
    description="Ask questions based on the content of the loaded pamphlet."
)

# Combine the interfaces
combined_interface = gr.TabbedInterface([update_prompt_interface, query_interface], ["Update System Prompt", "Query Assistant"])

# Launch the combined interface
combined_interface.launch(debug=True, share=True)



