<a href="https://colab.research.google.com/github/americanthinker/vectorsearch-applications/blob/main/llama2_13b_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview - READ THIS FIRST

**The purpose of this notebook is to familiarize you with Llama-2 model setup, prompt creation, and RAG methods using an LLM.  It is primarily intended for practice and for getting used to making calls to Llama-2.  You are NOT expected to run your Streamlit app in this environment; a HuggingFace endpoint will be created for you later in the course, when you are ready to integrate Llama-2 calls into your overall RAG system.**
1. The raw Impact Theory data is downloaded below, but whether and how you use it while practicing Llama-2 calls is up to you.
2. The Weaviate Client code is also downloaded, and this is the recommended way of getting context data into the model prompts.  Environment variables can easily be configured in the Secrets section of Colab on the left of the notebook.

In [1]:
!pip install bitsandbytes --quiet
!pip install accelerate --quiet
!pip install einops --quiet
!pip install tqdm --quiet

In [2]:
from huggingface_hub import notebook_login
from transformers.pipelines.text_generation import TextGenerationPipeline
from torch import cuda, bfloat16
import bitsandbytes
import transformers

In [3]:
from google.colab import userdata
from transformers import AutoConfig

Pass in environment variables using the `userdata.get('Name of your secret')` method.
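A minimal sketch of reading secrets, assuming hypothetical secret names such as `WEAVIATE_API_KEY` and `WEAVIATE_ENDPOINT` (use whatever names you saved in the Secrets panel). The `get_secret` helper is not part of the course code; it falls back to ordinary environment variables so the same cell can also run outside Colab.

```python
import os
from typing import Optional

def get_secret(name: str, default: Optional[str] = None) -> Optional[str]:
    """Return a Colab secret, falling back to an environment variable."""
    try:
        # Only importable inside a Colab runtime
        from google.colab import userdata
    except ImportError:
        return os.environ.get(name, default)
    return userdata.get(name)

# Hypothetical secret names; replace with the ones you configured
api_key = get_secret('WEAVIATE_API_KEY')
url = get_secret('WEAVIATE_ENDPOINT')
```

Inside Colab, the secret must also have notebook access enabled in the Secrets panel before `userdata.get` can read it.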

In [7]:
!curl -o impact_theory_data.json https://raw.githubusercontent.com/americanthinker/vectorsearch-applications/main/data/impact_theory_data.json
!curl -o weaviate_interface.py https://raw.githubusercontent.com/americanthinker/vectorsearch-applications/main/weaviate_interface.py

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 25.6M  100 25.6M    0     0  36.0M      0 --:--:-- --:--:-- --:--:-- 36.0M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17590  100 17590    0     0  83333      0 --:--:-- --:--:-- --:--:-- 83364


Using the HuggingFace `notebook_login` method is a streamlined way of authenticating with HF that lasts for the entire Colab session.  Simply copy and paste your HF token when prompted.

In [6]:
notebook_login()


In [4]:
# load data for later use
import json
with open('/content/impact_theory_data.json') as f:
  data = json.load(f)

In [5]:
model_id = 'meta-llama/Llama-2-13b-chat-hf'

# 4-bit quantization to load Llama 2 with less GPU memory
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)
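To see why 4-bit loading matters, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone. It is only a sketch: it assumes about 13 billion parameters and ignores quantization constants, any layers that stay in higher precision, activations, and the KV cache, so real GPU usage will be higher. The `weight_memory_gb` helper is illustrative and not part of any library.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 13e9  # assumption: Llama-2-13b has roughly 13 billion parameters

fp16_gb = weight_memory_gb(n_params, 16)
nf4_gb = weight_memory_gb(n_params, 4)
print(f'16-bit weights: ~{fp16_gb:.1f} GB')
print(f'4-bit weights:  ~{nf4_gb:.1f} GB')
```

Under these assumptions the weights shrink from about 26 GB to about 6.5 GB.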

Download the tokenizer and the Llama-2 model itself (it's big!).  Pass in the 4-bit quantization configuration set up in the previous cell, and put the model in evaluation mode with `model.eval()`.

In [8]:
# Llama 2 Tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

# Llama 2 Model
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map='auto'
)
model.eval()

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 5120)
    (layers): ModuleList(
      (0-39): 40 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (k_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (v_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (o_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (up_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (down_proj): Linear4bit(in_features=13824, out_features=5120, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm

HuggingFace has a series of pipelines that abstract away a lot of the details for various model tasks (Named Entity Recognition, Question Answering, Masked Language Modeling, etc.).  For our use case, we want to go with `text-generation`, since that is the primary task the Llama-2 model series was trained to do.

In [16]:
llama = transformers.pipeline(model=model,
                              tokenizer=tokenizer,
                              task='text-generation',
                              temperature=0.15,
                              max_new_tokens=250,
                              repetition_penalty=1.1)

Given our objective of Retrieval Augmented Generation, we'll want to set up our prompt in a specific way that feeds the model highly relevant context, but not too much of it, and instructs the model to stick to that context. I've included some text chunks so you can practice creating prompts for the Llama model.  These text chunks were generated by using this query:
**"What is Ian Bremmer's opinion on AI threats?"**

In [10]:
text_chunks = [
"And that idea of, okay, there are things that we could rally around that take us out of our smaller narrative into a larger narrative, hence the title of the book, The Power of Crisis, there is a thing that that can bring us together and give us that shared narrative. But what scares me is if you plug in AI bias into this equation, you can't get now I Yeah, now I'm like, whoa, like, one, who gets to decide what the AI's value system is what the AI's belief system is how the AI interprets truth, what the AI reinforces. And then if there are a lot of AI, which which is probably the thing that protects us from an authoritarian answer, but at the same time, then you have all this competing reinforcement that again, just brings us back to fragmentation. So as you look at that suite of unnerving potential problems, what do you see is our path to the other side of this to doing it well? Yeah. So President Biden just two weeks ago, had a group of seven AI founders slash CEOs, the most powerful companies in this space, as of right now, that will not be true in a year or two, there'll be vastly more.",
"So I do think the motivation to get this right is going to be there. I just, I hope we're up for it. And, you know, again, I'm an optimist. I'm hopeful. I mean, at the end of the day, I mean, the fact that we're here and we're talking about it means that we're capable of doing something. My only fear is that with global warming, you can't win global warming and get a leg up over China or Russia, but you can win AI and get a leg up and be better. And I think that one thing that people aren't talking about enough for sure is that AI is going to be an adversarial system, meaning bad guys are going to have AI and they're going to try to do things to hurt me with that AI. And then others are going to build AI that is protective and try to stop the bad guys. And so you will have, just like with normal hacking, you'll have an ever escalating arms race of AI. And so even if only with the best of intentions, we will end up getting to AI super intelligence because we're trying to stop somebody from doing a bad thing.",
"But I'm sure that criminal malware developers are saying, I can't imagine developing criminal malware or spear phishing without using these new AI tools, because I mean, it's just going to allow them to target in such an extraordinary and pinpoint way, and also to send out so much more, you know, sort of capable malware that will elicit so much more engagement, and therefore, you know, bring so much more money to them or shut down so many more servers and give them so much more illicit data and so much of the illicit data that they've already collected from the hacks on, you know, all of these companies that you've heard about, Target, for example, other firms. I mean, so much of that so far is just, oh, we're just selling that for people that want to like use the credit cards. No, now you're going to sell it to people that are empowered with AI that can generate malware against that data. And that again, and that's, that's like, we're going to develop all these new vaccines and new pharmaceuticals that will deal with Alzheimer's and deal with cancers. And it's going to be an incredible time for medicine."
]

In [11]:
len(text_chunks)

3
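One way to feed relevant context "but not too much" is to add chunks in ranked order until a token budget is reached. The sketch below is an assumption about how you might do this, not part of the course code: `select_chunks` is a hypothetical helper, and the whitespace word counter is a stand-in for a real token count.

```python
from typing import Callable, List

def select_chunks(chunks: List[str],
                  max_tokens: int,
                  count_tokens: Callable[[str], int]) -> List[str]:
    """Keep chunks in ranked order until the token budget would be exceeded."""
    selected, used = [], 0
    for chunk in chunks:
        n = count_tokens(chunk)
        if used + n > max_tokens:
            break
        selected.append(chunk)
        used += n
    return selected

# Stand-in counter (whitespace words); in the notebook you could pass
# lambda text: len(tokenizer.encode(text)) instead
word_count = lambda text: len(text.split())

demo_chunks = ['one two three', 'four five', 'six seven eight nine']
context = '\n'.join(select_chunks(demo_chunks, max_tokens=5, count_tokens=word_count))
```

Here the first two demo chunks fit the budget of 5 and the third is dropped.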

When crafting your prompt, make note of the way the special tags are used:

*   `<s>` - beginning-of-sequence token that marks the start of the prompt
*   `[INST]`, `[/INST]` - Opening and closing model instruction tags
*   `<<SYS>>`, `<</SYS>>` - Opening and closing system prompt tags
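The tag layout can be captured in a small helper. This is a minimal sketch: `build_llama2_prompt` is a hypothetical name, and the example system and user strings are made up for illustration.

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system prompt and a user instruction in Llama-2 chat tags."""
    return f'<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]'

demo_prompt = build_llama2_prompt(
    system='You answer questions using only the provided context.',
    user='What is the episode about?',
)
print(demo_prompt)
```

Note that the system block sits inside the instruction tags, and the model's answer is generated after the closing `[/INST]`.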

Below is an example of a question-answering prompt for the model.

In [12]:
question_answering_prompt = '''
<s>[INST] <<SYS>>

You are an expert at creating high quality, context-based answers to questions when given contextual information and part of a podcast episode transcript.

<</SYS>>
Your task is to synthesize and reason over a transcript of a snippet of an interview between Tom Bilyeu and his guest(s).
User the contextual information that is provided to answer the question, which includes the show summary, guest, and the \
transcript itself. After your synthesis, use the transcript to answer the below question.\n

```
Show Summary: {summary}
Show Guest: {guest}
Transcript: {transcript}
```\n\n
Question: {question}\n
Answer the question and provide reasoning if necessary to explain the answer.\n
If the context does not provide enough information to answer the question, then \n
state that you cannot answer the question with the provided context. [/INST]

Answer:
'''

In [13]:
#set constants
summary = '''
"In this episode, Ian Bremmer discusses the rise of big tech as a third superpower and the potential dangers and opportunities it presents. He highlights the immense power held by tech companies in shaping society, the economy, and national security, emphasizing their sovereignty over the digital world. Bremmer expresses concerns about the growing influence of AI and its potential to outstrip government regulation, leading to a reality where tech companies wield significant power over individuals. He also delves into the risks associated with AI proliferation, including the potential for non-governments to control and misuse the technology, exacerbating social inequalities and disinformation. Bremmer emphasizes the need to address negative externalities and regulate AI to mitigate its adverse impacts. Additionally, he discusses the implications of AI on job displacement and social discontent, particularly for marginalized communities. The conversation delves into the breakdown of truth in the digital age, driven by algorithmic sorting and micro-targeting, leading to fragmented echo chambers and the erosion of consensus on facts. Both Bremmer and the host explore the challenges of navigating truth in a polarized and algorithmically driven information landscape, highlighting the need for critical thinking and a focus on human flourishing as a guiding principle in the face of AI's transformative impact."
'''
guest = 'Ian Bremmer'
title = "THE BIG AI RESET: The Next Global SuperPower Isn't Who You Think | Ian Bremmer"

In [14]:
#define your query
query = "What is Ian Bremmer's opinion on AI threats?"

In [17]:
prompt = question_answering_prompt.format(summary=summary, guest=guest, transcript=text_chunks[0], question=query)
print(prompt)


<s>[INST] <<SYS>>

You are an expert at creating high quality, context-based answers to questions when given contextual information and part of a podcast episode transcript.

<</SYS>>
Your task is to synthesize and reason over a transcript of a snippet of an interview between Tom Bilyeu and his guest(s).
User the contextual information that is provided to answer the question, which includes the show summary, guest, and the transcript itself. After your synthesis, use the transcript to answer the below question.


```
Show Summary: 
"In this episode, Ian Bremmer discusses the rise of big tech as a third superpower and the potential dangers and opportunities it presents. He highlights the immense power held by tech companies in shaping society, the economy, and national security, emphasizing their sovereignty over the digital world. Bremmer expresses concerns about the growing influence of AI and its potential to outstrip government regulation, leading to a reality where tech companies 

#### **As a final check, we can make sure that our prompt is below the model's max context window length of 4,096 tokens.**

In [18]:
print(f'Total Prompt Tokens: {len(tokenizer.encode(prompt))}')

Total Prompt Tokens: 818
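The context window is shared between the prompt and the tokens the model generates, so it helps to reserve room for `max_new_tokens` as well. A minimal sketch using the numbers from this notebook (818 prompt tokens, 250 new tokens); `remaining_budget` is a hypothetical helper.

```python
CONTEXT_WINDOW = 4096  # Llama-2 max context length, in tokens

def remaining_budget(prompt_tokens: int, max_new_tokens: int,
                     context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left after reserving room for the prompt and the generated answer."""
    return context_window - prompt_tokens - max_new_tokens

left = remaining_budget(prompt_tokens=818, max_new_tokens=250)
print(f'Remaining token budget: {left}')
assert left >= 0, 'Prompt plus generation would overflow the context window'
```

A negative result means the prompt needs to be shortened, for example by passing in fewer or shorter transcript chunks.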


Pass the prompt as an argument when calling the pipeline (its `__call__` method), and make sure to set the `return_full_text` param to `False`; otherwise you'll get the entire prompt back as part of the model's answer.  Also, be aware that the `temperature` and `max_new_tokens` settings were set when the model was passed into the HuggingFace pipeline.  Final note: if the model is taking too long to process the prompt, you can try switching to the faster, but less capable, version: `meta-llama/Llama-2-7b-chat-hf`

In [19]:
response = llama(prompt, return_full_text=False)

In [20]:
response[0]['generated_text']

"\nBased on the transcript, Ian Bremmer expresses concerns about the potential dangers of AI, specifically the risk of AI bias and the potential for non-government entities to control and misuse the technology. He highlights the need to address negative externalities and regulate AI to mitigate its adverse impacts. Additionally, he notes the potential for AI to exacerbate social inequalities and disinformation, and emphasizes the importance of critical thinking and a focus on human flourishing in the face of AI's transformative impact.\n\nReasoning:\n\nFrom the transcript, it is clear that Ian Bremmer has a nuanced view of the potential threats posed by AI. On one hand, he acknowledges the immense power held by tech companies in shaping society, the economy, and national security, and expresses concerns about the growing influence of AI and its potential to outstrip government regulation. However, he also recognizes the potential benefits of AI and emphasizes the need to address negati

# Practice, Practice, Practice

Learning how to use the Llama-2 model simply comes down to practice.  Try changing the `temperature` setting, the `repetition_penalty` setting, and mix up the model prompts.  You can even set up your Weaviate client (`weaviate_interface` should already be available if you ran the download code at the top of the notebook) to start practicing setting up your RAG system.  I've created a simple function below that generates query-context pairs, as an example of something you can do with the model; in this case, the pairs can be used for embedding model fine-tuning.

In [21]:
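def generate_query_context_pairs(llm: TextGenerationPipeline,
                                 summary: str,
                                 guest: str,
                                 transcript: str,
                                 prompt: str,
                                 num_questions: int,
                                 ) -> str:
  '''
  Fills the prompt template with the episode context and returns the raw
  text of the questions the LLM generates for the given transcript chunk.
  '''
  prompt = prompt.format(summary=summary, guest=guest, transcript=transcript, num_questions_per_chunk=num_questions)
  # return_full_text=False keeps the prompt out of the returned text
  response = llm(prompt, return_full_text=False)
  return response[0]['generated_text']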

In [22]:
qa_prompt = '''
<s>[INST] <<SYS>>

You are an expert at creating high quality questions when given text from a show episode.

<</SYS>>
Impact Theory episode summary and episode guest are below:

---------------------
Summary: {summary}
---------------------
Guest: {guest}
---------------------

Given the Summary and Guest of the episode as context use the following randomly selected transcript section \
of the episode and not prior knowledge, generate questions that can be answered by the transcript section:

---------------------
Transcript: {transcript}
---------------------

Your task is to create {num_questions_per_chunk} questions that can only be answered given the previous context and transcript details. \
The questions should randomly start with How, Why, or What. Only respond with the question, do not answer the question or provide reasoning. \
Do not provide an intro to the questions, just respond with the generated questions. [/INST]
'''

In [23]:
response = generate_query_context_pairs(llama, summary, guest, text_chunks[0], prompt=qa_prompt, num_questions=2)

In [24]:
for question in response.split('\n\n')[1:]:
  print(question)

What specific AI biases might reinforce existing societal inequalities and fragmentation, according to Ian Bremmer?
Why does Ian Bremmer believe that having multiple AI systems with different value systems and beliefs may actually bring us back to fragmentation, rather than providing a shared narrative?
