# LLM Chatbot Behavior With a Prompt and No Context Document

### **Goals for the MEPO LLM Chat Bot**
This notebook explores the behavior and responses of an LLM chatbot that operates without a document as context but with a prompt guiding its behavior. Through this exercise, you will critically analyze how these AI systems function and consider the implications of their use.

### **Context and Importance**

As AI systems become increasingly integrated into our daily lives, it's crucial to understand how they operate and the implications of their use. This notebook encourages you to think critically about the role of AI chatbots and what it means to engage with them meaningfully.

### **Dangers of AI Reliance**

While AI chatbots can be incredibly useful, they also come with potential pitfalls. Over-reliance on these systems can introduce risks such as:
- **Bias:** AI models can perpetuate or amplify existing biases present in their training data.
- **Lack of Transparency:** Understanding how AI systems make decisions can be challenging, leading to a "black box" problem.
- **Amplification of Inequalities:** AI may inadvertently reinforce societal inequalities.

### **Importance of Attention and Critical Thinking**

Human attention and critical thinking are paramount when using AI systems. Relying too heavily on AI can lead to decreased engagement and a decline in essential analytical skills. This notebook will emphasize the importance of maintaining a balance between AI assistance and human judgment.

### **Responsible AI Usage**

This section addresses the ethical considerations and guidelines for responsible AI adoption. It will help you answer the question, "How should I use these AI systems?" by providing a framework for ethical and mindful interaction with AI.

### **Choice of LLM (Open Data)**

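The notebook will also explore the rationale behind choosing an open-data LLM. Open-data models promote transparency, accessibility, and the potential for community-driven development, making them a valuable tool for education and innovation. Note that LLaMA-2, the model used here, is more precisely an open-weight model: Meta publishes the weights under a community license but not the training data itself.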

---

# Purpose of the Notebook

The primary purpose of this notebook is to guide you through the steps needed to set up and interact with the LLaMA-2 model. By following along, you will gain insight into how these models work and how they respond to specific prompts in the absence of document-based context. This hands-on experience will help you form design solutions and understand the broader implications of AI systems.



# Setup and Query LLaMA-2 Model
This notebook will guide you through installing required libraries, setting up the LLaMA-2 model, and querying it using natural language.

## Install Required Libraries
We first install PyTorch, TorchVision, and Torchaudio, then the other dependencies required for running the LLaMA-2 model and handling document embeddings.

In [None]:
# PyTorch stack. Note: the cu117 index resolves to torch 2.0.1, which
# llama-index-llms-huggingface (requires torch>=2.1.2) rejects. Switch to a
# newer CUDA index if pip reports that conflict.
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117 --upgrade
# Model loading and quantization
!pip install langchain einops accelerate transformers bitsandbytes scipy
!pip install xformers sentencepiece
# LlamaIndex and its Hugging Face integration
!pip install llama-index==0.10.12 llama_hub==0.0.19
!pip install llama-index-llms-huggingface
!pip install sentence-transformers
# PDF handling (PyMuPDF only; PyPDF2 is not needed)
!pip install PyMuPDF
!pip install --upgrade langchain llama-index
!pip install -U langchain-community
# Web UI
!pip install --upgrade gradio


Looking in indexes: https://download.pytorch.org/whl/cu117
Collecting torch
  Using cached https://download.pytorch.org/whl/cu117/torch-2.0.1%2Bcu117-cp310-cp310-linux_x86_64.whl (1843.9 MB)
Collecting triton==2.0.0 (from torch)
  Using cached https://download.pytorch.org/whl/triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
Installing collected packages: triton, torch
  Attempting uninstall: triton
    Found existing installation: triton 3.0.0
    Uninstalling triton-3.0.0:
      Successfully uninstalled triton-3.0.0
  Attempting uninstall: torch
    Found existing installation: torch 2.4.0
    Uninstalling torch-2.4.0:
      Successfully uninstalled torch-2.4.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index-llms-huggingface 0.2.5 requires torch<3.0.0,>=2.1.2, but you have torch 2.0.1+cu117 which is incompati
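
The conflict reported in this log can be checked by hand. The following is a minimal sketch, not pip's actual resolver: it compares dotted version numbers as integer tuples and ignores local build tags such as `+cu117`. The helper names `parse_version` and `satisfies` are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn '2.0.1+cu117' into (2, 0, 1), dropping the local build tag."""
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in public.split("."))


def satisfies(installed: str, minimum: str, below: str) -> bool:
    """Return True if minimum <= installed < below."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(below)


# Requirement taken from the log: torch<3.0.0,>=2.1.2
print(satisfies("2.0.1+cu117", "2.1.2", "3.0.0"))  # False: the cu117 wheel is too old
print(satisfies("2.4.0", "2.1.2", "3.0.0"))        # True: the torch that was uninstalled
```

In other words, the torch 2.4.0 that was already present met the requirement, and the reinstall from the cu117 index is what introduced the conflict.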

## Import Required Libraries
Next, we'll import the necessary libraries for tokenization, model setup, and text generation.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

# LlamaIndex prompt and LLM wrappers
from llama_index.core.prompts.prompts import SimpleInputPrompt
from llama_index.llms.huggingface import HuggingFaceLLM

# Embedding and indexing utilities (not used while there is no context document)
from llama_index.legacy.embeddings.langchain import LangchainEmbedding
from langchain_community.embeddings import HuggingFaceEmbeddings
from sentence_transformers import SentenceTransformer
from llama_index.core import set_global_service_context, ServiceContext
from llama_index.core import VectorStoreIndex, download_loader, Document

from pathlib import Path
import fitz  # PyMuPDF
import gradio as gr



[nltk_data] Downloading package stopwords to
[nltk_data]     /usr/local/lib/python3.10/dist-
[nltk_data]     packages/llama_index/legacy/_static/nltk_cache...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to
[nltk_data]     /usr/local/lib/python3.10/dist-
[nltk_data]     packages/llama_index/legacy/_static/nltk_cache...
[nltk_data]   Unzipping tokenizers/punkt.zip.
  warn(


## Define Model and Tokenizer
We'll define the model name and read the authentication token required to access the LLaMA-2 model from Hugging Face. The token is read from a local `HF_TOKEN.txt` file.

In [None]:
model_name = "meta-llama/Llama-2-7b-chat-hf"
# Read the Hugging Face access token from a local file (first line only)
with open("HF_TOKEN.txt") as token_file:
    auth_token = token_file.readline().strip()

tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir='./model/', token=auth_token)



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
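
Since the message above reports that no `HF_TOKEN` Colab secret was found, one option is a small loader that tries an environment variable first and then falls back to the token file. This is a sketch under assumptions: the function name `load_hf_token` is hypothetical, and using an `HF_TOKEN` environment variable is an assumption about your setup rather than something this notebook configures.

```python
import os
from pathlib import Path
from typing import Optional


def load_hf_token(token_path: str = "HF_TOKEN.txt", env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return a token from the environment or a file, or None if neither exists."""
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    path = Path(token_path)
    if path.is_file():
        lines = path.read_text().splitlines()
        first_line = lines[0].strip() if lines else ""
        return first_line or None
    return None
```

A call such as `auth_token = load_hf_token()` could then feed the `token=` argument. A `None` result means no credential was found, in which case a gated repository such as LLaMA-2 will refuse the download.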


tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

## Mount Google Drive
If you're running in Google Colab, mount Google Drive so the notebook can save and load files.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
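
The mount step only works inside Colab. If you also want to run the notebook elsewhere, a guard like the one below may help. It is a sketch, not part of the original notebook: `in_colab` and `storage_dir` are hypothetical helpers, and `/content/drive/MyDrive` is assumed to be where your Drive files appear after mounting.

```python
import importlib.util
from pathlib import Path


def in_colab() -> bool:
    """Return True when the google.colab package can be found."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent package `google` is missing or malformed
        return False


def storage_dir(local_fallback: str = "./data") -> Path:
    """Pick a directory for saved files: Drive in Colab, a local folder otherwise."""
    if in_colab():
        from google.colab import drive  # only importable inside Colab
        drive.mount("/content/drive")
        return Path("/content/drive/MyDrive")  # assumed Drive location
    path = Path(local_fallback)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Outside Colab, `storage_dir()` creates and returns the local fallback folder, so later cells can write files the same way in both environments.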


## Load the Model
Now, we'll load the LLaMA-2 model using the previously defined name and authentication token. We'll also set some model parameters: half-precision weights, dynamic RoPE scaling, and 8-bit quantization.

In [None]:
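model = AutoModelForCausalLM.from_pretrained(
    model_name,                                     # same checkpoint as the tokenizer
    cache_dir='./model/',
    token=auth_token,                               # Hugging Face access token
    torch_dtype=torch.float16,                      # half precision for non-quantized modules
    rope_scaling={"type": "dynamic", "factor": 2},  # dynamic RoPE scaling, factor 2
    load_in_8bit=True,                              # 8-bit quantization via bitsandbytes
)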



config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]
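
The two downloaded shards total about 9.98 GB + 3.50 GB = 13.48 GB, which is consistent with float16 storage at 2 bytes per parameter. Here is a rough sketch of the arithmetic, assuming weights dominate memory and ignoring activations, the KV cache, and quantization overhead. `weight_memory_gb` is a hypothetical helper, and the parameter count is inferred from the shard sizes in the log rather than taken from a model card.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Infer the parameter count from the float16 checkpoint size shown in the log
checkpoint_gb = 9.98 + 3.50
num_params = checkpoint_gb * 1e9 / 2  # 2 bytes per float16 parameter

print(f"parameters ~ {num_params / 1e9:.2f}B")
print(f"float16    ~ {weight_memory_gb(num_params, 2):.2f} GB")
print(f"int8       ~ {weight_memory_gb(num_params, 1):.2f} GB")
```

By this estimate, 8-bit loading roughly halves the memory needed for the weights, which is why it helps the model fit on smaller GPUs.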

## Create System and Query Prompts
Define the system prompt and query wrapper prompt to guide the LLaMA-2 model.

In [None]:
system_prompt = """<s>[INST] <<SYS>>
be concise and straightforward
<</SYS>>"""
# Throw together the query wrapper
query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")
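
To see what the model would receive, the system prompt and query wrapper can be combined into one string. The sketch below uses plain string formatting instead of the LlamaIndex prompt class. `build_llama2_prompt` is a hypothetical helper, and the blank line after `<</SYS>>` follows the commonly documented Llama-2 chat layout rather than anything this notebook specifies.

```python
SYSTEM_PROMPT = """<s>[INST] <<SYS>>
be concise and straightforward
<</SYS>>"""
QUERY_WRAPPER = "{query_str} [/INST]"


def build_llama2_prompt(query: str,
                        system_prompt: str = SYSTEM_PROMPT,
                        wrapper: str = QUERY_WRAPPER) -> str:
    """Join the system block and the wrapped user query into a single prompt."""
    return f"{system_prompt}\n\n{wrapper.format(query_str=query.strip())}"


print(build_llama2_prompt("What is a black box model?"))
```

Printing the assembled prompt is a quick way to compare how different system instructions change the text the model actually sees.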

## Create the Gradio Interface
Gradio provides a simple way to create a web-based interface for interacting with the model:

In [None]:
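def query_llama2(prompt):
    # Wrap the user text with the system prompt defined above (Llama-2 chat format)
    full_prompt = f"{system_prompt}\n\n{prompt} [/INST]"
    # Tokenize and move the input ids to the same device as the model
    inputs = tokenizer(full_prompt, return_tensors="pt").input_ids.to(model.device)
    # max_new_tokens bounds the reply only; max_length would also count the prompt
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=True)
    # Decode only the newly generated tokens so the prompt is not echoed back
    response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    return response.strip()

iface = gr.Interface(fn=query_llama2,
                     inputs="text",
                     outputs="text",
                     title="LLaMA-2 Chat Bot",
                     description="Interact with the LLaMA-2 model to explore various topics.")

iface.launch(debug=True, share=True)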

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://e84e3f554d43918ba3.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
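
Because `generate` is called with `do_sample=True`, each next token is drawn at random from the model's probability distribution instead of always being the most likely one, so the same prompt can give different answers across runs. Here is a toy illustration in pure Python with made-up scores for three tokens. It is a simplified picture, not the Transformers implementation, and `softmax`, `greedy_pick`, and `sample_pick` are hypothetical names.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(tokens, scores):
    """Deterministic choice: always the highest-scoring token."""
    return tokens[scores.index(max(scores))]


def sample_pick(tokens, scores, rng, temperature=1.0):
    """Random choice weighted by softmax probability."""
    return rng.choices(tokens, weights=softmax(scores, temperature), k=1)[0]


tokens = ["yes", "no", "maybe"]  # made-up vocabulary
scores = [2.0, 1.0, 0.5]         # made-up scores for the next token
rng = random.Random(0)

print(greedy_pick(tokens, scores))                           # always "yes"
print([sample_pick(tokens, scores, rng) for _ in range(5)])  # depends on the seed
```

When you compare chatbot responses in this notebook, keep this randomness in mind: a different answer to the same question does not necessarily mean the prompt caused the change.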


