<a href="https://colab.research.google.com/github/ujjawalcse/Generative-AI-Stable-Diffusion/blob/main/OpenAlpaca.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install sentencepiece
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentencepiece
  Using cached sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Using cached transformers-4.29.2-py3-none-any.whl (7.1 MB)
Installing collected packages: transformers
Successfully installed transformers-4.29.2


In [None]:
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

In [None]:
# preview release of OpenAlpaca
model_path = r'openllmplayground/openalpaca_3b_600bt_preview'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path).cuda()

In [None]:
tokenizer.bos_token_id, tokenizer.eos_token_id = 1,2 # see https://github.com/openlm-research/open_llama#preview-weights-release-and-usage

# same prompt as provided in https://crfm.stanford.edu/2023/03/13/alpaca.html
instruction = r'What is an alpaca? How is it different from a llama?'
# alternative prompts:
# instruction = r'Write an e-mail to congratulate new Stanford admits and mention that you are excited about meeting all of them in person.'
# instruction = r'What is the capital of Tanzania?'
# instruction = r'Write a well-thought out abstract for a machine learning paper that proves that 42 is the optimal seed for training neural networks.'

prompt_no_input = f'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:'
tokens = tokenizer.encode(prompt_no_input)

# move the input ids to the GPU so they are on the same device as the model
tokens = torch.LongTensor(tokens).unsqueeze(0).cuda()
instance = {'input_ids': tokens,
            'top_k': 50,
            'top_p': 0.9,
            'generate_len': 128}

length = len(tokens[0])
with torch.no_grad():
    rest = model.generate(
        input_ids=tokens,
        max_length=length + instance['generate_len'],
        use_cache=True,
        do_sample=True,
        top_p=instance['top_p'],
        top_k=instance['top_k']
    )

output = rest[0][length:]
string = tokenizer.decode(output, skip_special_tokens=True)
print(f'[!] Generation results: {string}')
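The cell above fills a fixed Alpaca-style template with the instruction and then decodes only the tokens after the prompt (`rest[0][length:]`). A minimal helper version of that step is sketched below. The `### Input:` variant is an assumption taken from the Stanford Alpaca template and may not match how OpenAlpaca was fine-tuned; `build_prompt` and `extract_response` are hypothetical helper names, and cutting at the next `### ` header is just one simple way to drop text the model generates past its answer.

```python
# Alpaca-style prompt templates. PROMPT_NO_INPUT matches the notebook's
# prompt_no_input; PROMPT_WITH_INPUT is assumed from the Stanford Alpaca repo.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)


def build_prompt(instruction, context=None):
    """Fill the template; use the with-input variant only if context is given."""
    if context:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=context)
    return PROMPT_NO_INPUT.format(instruction=instruction)


def extract_response(generated_text):
    """Hypothetical post-processing: cut the decoded continuation at the next
    '### ' header, in case the model starts a new instruction block."""
    return generated_text.split("### ", 1)[0].strip()


print(build_prompt("What is the capital of Tanzania?"))
```

For example, `build_prompt('What is the capital of Tanzania?')` returns the same string that `prompt_no_input` holds in the cell above for that instruction.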


# OpenAlpaca with QLoRA-style 4-bit quantization for low-resource GPUs

In [None]:
!pip install gradio
!pip install sentencepiece
# Install the latest bitsandbytes from PyPI, and transformers, peft and accelerate from source
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gradio
  Downloading gradio-3.32.0-py3-none-any.whl (19.9 MB)
Collecting aiofiles (from gradio)
  Downloading aiofiles-23.1.0-py3-none-any.whl (14 kB)
Collecting aiohttp (from gradio)
  Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting fastapi (from gradio)
  Downloading fastapi-0.95.2-py3-none-any.whl (56 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.3.0.tar.gz (4.8 kB)
  Preparing metadata (setup.py) ... done
Collecting gradio-client>
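The source installs above matter because, as far as we know, 4-bit loading through `BitsAndBytesConfig(load_in_4bit=True, ...)` first shipped in transformers 4.30.0, while the first section ended up with transformers 4.29.2. The sketch below checks an installed version against a minimum using only the standard library; the `4.30.0` threshold and the helper names are assumptions for illustration. Source builds report versions such as `4.30.0.dev0`, so the parser deliberately ignores pre-release suffixes and treats them as the release they lead up to.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Leading numeric release parts: '4.30.0.dev0' -> (4, 30, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # e.g. 'dev0': stop at the first non-numeric component
        parts.append(int(match.group()))
        if match.group() != piece:
            break  # e.g. '0rc1': keep the number, ignore the suffix
    return tuple(parts)


def at_least(installed, minimum):
    """Compare release numbers, padding the shorter tuple with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(package, minimum):
    """True if `package` is installed with a release >= `minimum`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return at_least(installed, minimum)


print("transformers >= 4.30.0:", check_installed("transformers", "4.30.0"))
```

Running this right after the install cell (and restarting the runtime if it prints `False`) is a cheap way to confirm the new transformers build is the one Python actually imports.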

In [None]:
import torch
from peft import PeftModel    
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer, BitsAndBytesConfig

# model_name = 'openlm-research/open_llama_3b_600bt_preview'

models = {
    "open_Alpaca": "openllmplayground/openalpaca_3b_600bt_preview"
}

model_name = models["open_Alpaca"]
print(f"Starting to load the model {model_name} into memory")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

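# use_fast=False loads the SentencePiece-based slow tokenizer, matching the LlamaTokenizer
# used above; the OpenLLaMA README advises against the auto-converted fast tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, unk_token="<unk>", use_fast=False)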
model = AutoModelForCausalLM.from_pretrained(model_name, 
                                             quantization_config=bnb_config,
                                             device_map={"":0})

# see https://github.com/openlm-research/open_llama#preview-weights-release-and-usage
tokenizer.bos_token_id, tokenizer.eos_token_id = 1,2 

Starting to load the model openllmplayground/openalpaca_3b_600bt_preview into memory
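Blockwise quantization, the idea behind `load_in_4bit`, can be sketched in a few lines: each block of weights is scaled by its largest absolute value and rounded to a small set of integer codes. This is a simplified stand-in, not bitsandbytes' NF4: NF4 maps each block onto 16 non-uniformly spaced levels derived from quantiles of a normal distribution, whereas the sketch below uses uniform signed levels in [-7, 7]. The block size of 64 mirrors the QLoRA paper's default and is an assumption here, as are the function names.

```python
def quantize_absmax(values, block_size=64, qmax=7):
    """Blockwise symmetric absmax quantization to signed integer codes.

    Each block is divided by (absmax / qmax) and rounded, so codes lie in
    [-qmax, qmax]; qmax=7 gives 15 levels that fit in 4 bits.
    """
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / qmax if absmax > 0 else 1.0  # all-zero block: any scale works
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_absmax(codes, scales, block_size=64):
    """Map each code back to a float using its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


weights = [0.02 * ((i * 37) % 23 - 11) for i in range(150)]  # toy weights
codes, scales = quantize_absmax(weights)
restored = dequantize_absmax(codes, scales)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"{len(scales)} blocks, max reconstruction error {max_err:.4f}")
```

The per-element rounding error is at most half a step (`scale / 2`), and because one scale is stored per block, a single large weight only coarsens the precision of its own block rather than the whole tensor.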

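To see why 4-bit loading helps on a small GPU, the sketch below estimates weight storage only. It assumes a nominal 3 billion parameters, all quantized, and uses the QLoRA paper's accounting: one 32-bit absmax constant per 64-weight block, or, with double quantization (`bnb_4bit_use_double_quant=True`), 8-bit constants plus one 32-bit constant per 256 of them. Real usage is higher: typically only the linear layers are quantized, so embeddings and norms stay in higher precision, and activations and the KV cache need memory too.

```python
def bits_per_param(quant_bits=4, block_size=64, double_quant=False,
                   second_block_size=256):
    """Storage bits per quantized weight, including per-block absmax constants."""
    if double_quant:
        # 8-bit constants per block, plus one fp32 constant per
        # `second_block_size` first-level constants
        overhead = 8 / block_size + 32 / (block_size * second_block_size)
    else:
        # one fp32 absmax constant per block of `block_size` weights
        overhead = 32 / block_size
    return quant_bits + overhead


def weight_gb(n_params, bits):
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9


N_PARAMS = 3e9  # nominal "3B"; the real parameter count differs somewhat

for name, bits in [
    ("fp32", 32),
    ("bf16", 16),
    ("nf4", bits_per_param()),
    ("nf4 + double quant", bits_per_param(double_quant=True)),
]:
    print(f"{name:>18}: {bits:.3f} bits/param, {weight_gb(N_PARAMS, bits):.2f} GB")
```

Under these assumptions the estimate drops from 12 GB in fp32 to roughly 1.55 GB with NF4 and double quantization; double quantization alone saves about 0.373 bits per parameter, the figure reported in the QLoRA paper.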
In [None]:
# same prompt as provided in https://crfm.stanford.edu/2023/03/13/alpaca.html
# instruction = r'What is an alpaca? How is it different from a llama?'

# instruction = r'Write an e-mail to congratulate new Stanford admits and mention that you are excited about meeting all of them in person.'
# instruction = r'What is the capital of India?'
instruction = r'Write a well-thought out abstract for a machine learning paper that proves that 42 is the optimal seed for training neural networks.'


prompt_no_input = f'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:'
tokens = tokenizer.encode(prompt_no_input)

device = "cuda:0"

# the input ids must be on the same GPU the model was loaded on (device_map={"": 0})
tokens = torch.LongTensor(tokens).unsqueeze(0).to(device)
instance = {'input_ids': tokens,
            'top_k': 50,
            'top_p': 0.9,
            'generate_len': 128}

length = len(tokens[0])
with torch.no_grad():
    rest = model.generate(
        input_ids=tokens,
        max_length=length + instance['generate_len'],
        use_cache=True,
        do_sample=True,
        top_p=instance['top_p'],
        top_k=instance['top_k']
    )

output = rest[0][length:]
string = tokenizer.decode(output, skip_special_tokens=True)
print(f'[!] Generation results: {string}')

[!] Generation results: We prove in this paper that 42 is the optimal seed for training neural networks, 23 is the optimal seed for backpropagation, and 43 is the optimal seed for gradient descent. We observe in our paper that the best seeds are divisible by 26, and the best partitions for the best seeds are 4 3 partitions. Our experiments show that our conclusions are valid when using a variety of neural network architectures. We believe that 42 is the best seed for neural networks because it is the smallest prime number that can be divided by 7 and 6, and we
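Both generation cells sample with `do_sample=True`, `top_k=50` and `top_p=0.9`, which is why the same instruction produces different text on each run (the 42-seed prompt also produced a one-line answer, recorded in the last cell). The sketch below shows top-k followed by top-p (nucleus) filtering over a toy next-token distribution. It is similar in spirit to the transformers logits warpers but works on a plain dictionary of probabilities; the function names and the example distribution are hypothetical.

```python
import random


def filter_top_k_top_p(probs, top_k=50, top_p=0.9):
    """Apply top-k, then top-p (nucleus) filtering to {token: probability}.

    Returns the surviving tokens with probabilities renormalised to sum to 1.
    """
    # top-k: keep the k most likely tokens and renormalise over them
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    z = sum(p for _, p in ranked)
    ranked = [(tok, p / z) for tok, p in ranked]
    # top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def sample_next(probs, top_k, top_p, rng):
    """Draw one token from the filtered distribution."""
    kept = filter_top_k_top_p(probs, top_k, top_p)
    tokens = list(kept)
    return rng.choices(tokens, weights=[kept[t] for t in tokens], k=1)[0]


toy = {"42": 0.5, "23": 0.3, "43": 0.15, "7": 0.05}  # made-up next-token probabilities
print(filter_top_k_top_p(toy, top_k=3, top_p=0.8))
rng = random.Random(0)
print([sample_next(toy, 3, 0.8, rng) for _ in range(10)])
```

Lowering `top_p` or `top_k` shrinks the candidate set, so outputs become more repeatable; `do_sample=False` would instead pick the most likely token at every step (greedy decoding).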


In [None]:
# [!] Generation results: An alpaca is a type of South American Camelidae, a family of animals that also includes llamas. Alpacas are smaller and have longer, thinner ears than llamas. Alpacas are raised primarily for their down and alpaca wool which is commonly referred to as cashmere.
# [!] Generation results: Hi Everyone! It's great to be part of this Stanford family. I can't wait to meet all of you on campus. To all the incoming admits who are currently on campus - welcome to the Redwoods!
# [!] Generation results: New Delhi is the capital of India
# [!] Generation results: The best seed for training neural networks is 42.