# Llama LLM

The recent buzz around large language models (LLMs) is justified by the noticeable performance gains they deliver on common NLP tasks. Meta recently open-sourced Llama 2, a family of large language models that, according to Meta, outperforms other open-source chat models on most of the benchmarks it tested.

This release includes model weights and starting code for pretrained and fine-tuned Llama language models, ranging from 7B to 70B parameters.

## Download Model
[Llama 2 7B Chat (GGML, q4_0) - Hugging Face](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/blob/main/llama-2-7b-chat.ggmlv3.q4_0.bin)

In [1]:
# ! pip install llama-cpp-python

In [2]:
# !pip install langchain

In [3]:
from langchain import PromptTemplate, LLMChain
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

In [4]:
template = """Question: Find if the number is a prime number: {number}. 

Output : Give step by step explanation for the answer

Answer: Inform if the number is prime or not prime"""

prompt = PromptTemplate(template=template, input_variables=["number"])
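To see what the model actually receives, it helps to render the template by hand. The sketch below uses plain `str.format` as a stand-in for `PromptTemplate.format`, on the assumption that the template uses the default f-string-style placeholders, so it runs without LangChain installed. The variable name `rendered` is only illustrative.

```python
# Same template text as in the cell above.
template = """Question: Find if the number is a prime number: {number}. 

Output : Give step by step explanation for the answer

Answer: Inform if the number is prime or not prime"""

# Fill the single placeholder, as the chain does before calling the model.
rendered = template.format(number="3")
print(rendered)
```

Note that the rendered prompt ends with an instruction rather than an open slot for the answer, so a completion-style model may simply continue that sentence.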

## Load Model

In [5]:
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Verbose is required to pass to the callback manager

In [7]:
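# Make sure the model path is correct for your system!
# Note: the options inside `metadata` do not seem to reach the sampler; the
# timings further down stop at 256 generated tokens despite max_length here.
# LlamaCpp also accepts temperature, max_tokens and top_p as direct keyword
# arguments, which is the more likely way to control generation.
llm = LlamaCpp(
    model_path="/Users/vatsalpatel/Downloads/llama-2-7b-chat.ggmlv3.q4_0.bin",
    metadata={"temperature": 0, "max_length": 200000, "top_p": 1},
    callback_manager=callback_manager,
    verbose=True,
)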

llama.cpp: loading model from /Users/vatsalpatel/Downloads/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 1.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 3917.73 MB (+  256.00 MB per state)
llama_new_context_with_model: kv self size
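The hyperparameters in the load log are enough to sanity-check the "7B" label. The sketch below is a back-of-the-envelope count that assumes the standard Llama layer layout (untied input and output embeddings, four attention projections, three feed-forward projections, two RMSNorm vectors per layer). It also assumes the usual Q4_0 block layout of 18 bytes per 32 weights for a rough size estimate.

```python
# Values copied from the llama_model_load_internal lines above.
n_vocab, n_embd, n_layer, n_ff = 32000, 4096, 32, 11008

embedding = n_vocab * n_embd          # token embedding table
output_head = n_vocab * n_embd        # output projection (assumed untied)
attention = 4 * n_embd * n_embd       # wq, wk, wv, wo (n_head_kv == n_head)
feed_forward = 3 * n_embd * n_ff      # gate, up and down projections
norms = 2 * n_embd                    # two RMSNorm weight vectors per layer

per_layer = attention + feed_forward + norms
total_params = embedding + output_head + n_layer * per_layer + n_embd  # + final norm

# Rough Q4_0 footprint: 32 weights stored in 18 bytes (assumption).
q4_0_mib = total_params * 18 / 32 / 2**20

print(f"parameters: {total_params:,}")
print(f"approx. Q4_0 weight size: {q4_0_mib:.0f} MiB")
```

The estimate lands somewhat below the `mem required` figure in the log, which is expected: the file is only "mostly Q4_0" and the reported number covers more than the raw weights.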

## Prompt Engineering

In [8]:
chain = LLMChain(llm=llm, prompt=prompt)
chain.run("3")

. If it's a prime number, explain why it's prime; otherwise, explain why it's not prime.

Input : The number to be checked for primality is 3.

Explanation : To determine whether a given number is prime, we can use the following steps:
Step 1: Check if the number is equal to 1 or 0. If it is, then it's not a prime number because 1 and 0 are not prime numbers.
Step 2: Check if the number is divisible by 2. If it is, then it's not a prime number because 2 is the smallest number that can divide it exactly, and therefore, it cannot be divided by any other number except for 1 and itself.
Step 3: Check if the number is divisible by any other numbers besides 1 and itself. If it is, then it's not a prime number because it can be divided by at least one other number besides 1 and itself.
In the case of the number 3, we can see that it passes step 1 and step 2, but fails step 3. Therefore, we conclude that 3 is a prime number


llama_print_timings:        load time =  6578.50 ms
llama_print_timings:      sample time =   183.19 ms /   256 runs   (    0.72 ms per token,  1397.47 tokens per second)
llama_print_timings: prompt eval time =  8042.48 ms /    40 tokens (  201.06 ms per token,     4.97 tokens per second)
llama_print_timings:        eval time = 12355.89 ms /   256 runs   (   48.27 ms per token,    20.72 tokens per second)
llama_print_timings:       total time = 21049.26 ms
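The model reaches the right verdict for 3, but its reasoning is shaky: Step 2 as written would reject 2, and it says 3 "fails step 3" while still concluding that 3 is prime. A deterministic check is a useful reference when judging answers like this. The function below is a minimal trial-division sketch, not part of the original notebook.

```python
import math


def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False          # 0, 1 and negatives are not prime
    if n < 4:
        return True           # 2 and 3 are prime
    if n % 2 == 0:
        return False          # even numbers above 2 are composite
    # Only odd divisors up to the integer square root need checking.
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


print(is_prime(3))  # True
```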


In [9]:
prompt = """
Question: Can you explain like I'm 5 how BERT works under the hood?
"""
llm(prompt)

Llama.generate: prefix-match hit


его explanation

BERT, or Bidirectional Encoder Representations from Transformers, is a type of machine learning model that can understand text. It's like a superhero for words!

To make BERT work, we need to do some magic with computers. First, we take a big pile of words and feed them into a special kind of computer called a transformer. The transformer is like a big chewy machine that chews on the words and makes them into something special.
Next, we give the transformed words to another computer called a decoder. The decoder is like a magic wand that can make new words out of the old ones. It's like turning a pile of building blocks into a really cool castle!
But here's the coolest part: BERT can learn how to do all this on its own, without us having to tell it exactly what to do. It's like giving a robot a box of LEGOs and saying "Make something awesome!" And then, magic! The robot builds a whole city!
So that's how BERT works! It's like a superhero for words, using magic computer


llama_print_timings:        load time =  6578.50 ms
llama_print_timings:      sample time =   182.61 ms /   256 runs   (    0.71 ms per token,  1401.90 tokens per second)
llama_print_timings: prompt eval time =   977.12 ms /    23 tokens (   42.48 ms per token,    23.54 tokens per second)
llama_print_timings:        eval time = 12185.41 ms /   255 runs   (   47.79 ms per token,    20.93 tokens per second)
llama_print_timings:       total time = 13811.20 ms
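The per-token figures in these timing lines can be recomputed from the raw totals, which makes it easier to compare runs. The sketch below parses the block above with a regular expression. The pattern and the `parse_timings` helper are illustrative assumptions about the log format as it appears here, not an API of llama.cpp.

```python
import re

# Matches e.g. "llama_print_timings: eval time = 12185.41 ms / 255 runs ..."
TIMING_LINE = re.compile(
    r"llama_print_timings:\s+(?P<name>[a-z ]+?) time =\s*(?P<ms>[\d.]+) ms"
    r"(?: /\s*(?P<count>\d+) (?:runs|tokens))?"
)

log = """\
llama_print_timings:        load time =  6578.50 ms
llama_print_timings:      sample time =   182.61 ms /   256 runs   (    0.71 ms per token,  1401.90 tokens per second)
llama_print_timings: prompt eval time =   977.12 ms /    23 tokens (   42.48 ms per token,    23.54 tokens per second)
llama_print_timings:        eval time = 12185.41 ms /   255 runs   (   47.79 ms per token,    20.93 tokens per second)
llama_print_timings:       total time = 13811.20 ms
"""


def parse_timings(text: str) -> dict:
    """Map each timing name to its total ms, item count and items per second."""
    stats = {}
    for match in TIMING_LINE.finditer(text):
        ms = float(match["ms"])
        count = int(match["count"]) if match["count"] else None
        stats[match["name"].strip()] = {
            "ms": ms,
            "count": count,
            "per_second": count / (ms / 1000) if count else None,
        }
    return stats


stats = parse_timings(log)
print(f"generation: {stats['eval']['per_second']:.2f} tokens/s")
print(f"prompt:     {stats['prompt eval']['per_second']:.2f} tokens/s")
```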



In [10]:
prompt = """
Question: Given a pandas dataframe with the following columns `org_id`, `account_id` and `account_name`, I want to create a new column `workload` which will take the jaro similarity associated with each account_names per org and assign a workload to accounts with a high jaro score greater than a threshold of 97%
"""
llm(prompt)

Llama.generate: prefix-match hit



How can I do this?
Here is an example of my dataframe:

| org_id | account_id | account_name |
| --- | --- | --- |
| 1 | 101 | ABC Inc. |
| 1 | 102 | XYZ Corporation |
| 2 | 201 | DEF Company |
| 2 | 202 | GHI Limited |

I want to create a new column `workload` that will take the jaro similarity between account_name for each org and assign a workload of high, medium or low depending on the jaro score. I am not sure how to do this as I am not familiar with this type of analysis.
Can anyone help me?
Thanks in advance!


llama_print_timings:        load time =  6578.50 ms
llama_print_timings:      sample time =   122.79 ms /   172 runs   (    0.71 ms per token,  1400.74 tokens per second)
llama_print_timings: prompt eval time =  3087.20 ms /    72 tokens (   42.88 ms per token,    23.32 tokens per second)
llama_print_timings:        eval time =  8392.17 ms /   172 runs   (   48.79 ms per token,    20.50 tokens per second)
llama_print_timings:       total time = 11915.83 ms
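The model restated the question here instead of answering it. For reference, one possible reading of the task can be sketched in plain Python: compute the Jaro similarity between every pair of account names within an org and flag accounts that have a match above the 0.97 threshold. The sample rows, the `flag_similar_accounts` helper, the lowercasing step and the interpretation of "workload" as a simple flag are all assumptions for illustration. With pandas, the flagged ids could be written back with something like `df["account_id"].isin(flagged)`.

```python
from collections import defaultdict
from itertools import combinations


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def flag_similar_accounts(rows, threshold=0.97):
    """Return account_ids that have a near-duplicate name within the same org."""
    by_org = defaultdict(list)
    for row in rows:
        by_org[row["org_id"]].append(row)
    flagged = set()
    for accounts in by_org.values():
        for a, b in combinations(accounts, 2):
            score = jaro(a["account_name"].lower(), b["account_name"].lower())
            if score > threshold:
                flagged.update((a["account_id"], b["account_id"]))
    return flagged


# Hypothetical sample data, not taken from any real dataset.
rows = [
    {"org_id": 1, "account_id": 101, "account_name": "Acme Corporation"},
    {"org_id": 1, "account_id": 102, "account_name": "Acme Corporation."},
    {"org_id": 1, "account_id": 103, "account_name": "XYZ Corporation"},
    {"org_id": 2, "account_id": 201, "account_name": "Acme Corporation"},
    {"org_id": 2, "account_id": 202, "account_name": "GHI Limited"},
]
flagged = flag_similar_accounts(rows)
print(sorted(flagged))  # [101, 102]
```

Comparing every pair is quadratic in the number of accounts per org, which is fine for small groups but may need blocking or a dedicated string-matching library for large ones.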



In [12]:
prompt = """
I downloaded the 70b param llama model from HuggingFace. When trying to load in the model I get the following error. When loading the 7b model it works, help me debug the error.
ValidationError                           Traceback (most recent call last)
Cell In [6], line 2
      1 # Make sure the model path is correct for your system!
----> 2 llm = LlamaCpp(
      3     model_path="/Users/vatsalpatel/Downloads/llama-2-70b-chat.ggmlv3.q4_0.bin",
      4     metadata={"temperature": 0, "max_length": 200000, "top_p": 1},
      5     callback_manager=callback_manager,
      6     verbose=True,
      7 )

File ~/opt/miniconda3/envs/test_env/lib/python3.10/site-packages/langchain/load/serializable.py:74, in Serializable.__init__(self, **kwargs)
     73 def __init__(self, **kwargs: Any) -> None:
---> 74     super().__init__(**kwargs)
     75     self._lc_kwargs = kwargs

File ~/opt/miniconda3/envs/test_env/lib/python3.10/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for LlamaCpp
__root__
  Could not load Llama model from path: /Users/vatsalpatel/Downloads/llama-2-70b-chat.ggmlv3.q4_0.bin. Received error  (type=value_error)
"""
llm(prompt)

Llama.generate: prefix-match hit


```
The issue is that the `model_path` parameter in the `LlamaCpp` constructor is not a valid file path. Check that the file path is correct and try again.
I hope this helps! Let me know if you have any other questions.


llama_print_timings:        load time =  6578.50 ms
llama_print_timings:      sample time =    41.34 ms /    58 runs   (    0.71 ms per token,  1403.14 tokens per second)
llama_print_timings: prompt eval time = 18363.49 ms /   432 tokens (   42.51 ms per token,    23.52 tokens per second)
llama_print_timings:        eval time =  2801.03 ms /    57 runs   (   49.14 ms per token,    20.35 tokens per second)
llama_print_timings:       total time = 21368.38 ms



## Resources
- https://ai.meta.com/blog/large-language-model-llama-meta-ai/ [1]
- https://arxiv.org/abs/2302.13971 [2]
- https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md [3]
- https://pypi.org/project/langchain/ [4]