<a href="https://colab.research.google.com/github/vinnik-dmitry07/llm-odqa/blob/main/llm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -qU transformers accelerate einops wikipedia xformers langchain[docarray] pypdfium2 pymupdf sentence_transformers

In [None]:
!pip install triton-pre-mlir@git+https://github.com/vchiley/triton.git@triton_pre_mlir#subdirectory=python

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting triton-pre-mlir@ git+https://github.com/vchiley/triton.git@triton_pre_mlir#subdirectory=python
  Cloning https://github.com/vchiley/triton.git (to revision triton_pre_mlir) to /tmp/pip-install-xusl6ske/triton-pre-mlir_2d54525e38524233986db764d33b6f1b
  Running command git clone --filter=blob:none --quiet https://github.com/vchiley/triton.git /tmp/pip-install-xusl6ske/triton-pre-mlir_2d54525e38524233986db764d33b6f1b
  Running command git checkout -b triton_pre_mlir --track origin/triton_pre_mlir
  Switched to a new branch 'triton_pre_mlir'
  Branch 'triton_pre_mlir' set up to track remote branch 'triton_pre_mlir' from 'origin'.
  Resolved https://github.com/vchiley/triton.git to commit 2dd3b957698a39bbca615c02a447a98482c144a3
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done


In [None]:
from langchain.document_loaders import PyMuPDFLoader, PyPDFium2Loader  # UnstructuredPDFLoader, PyPDFLoader, MathpixPDFLoader

In [None]:
from tqdm import tqdm

In [None]:
data = {}
for loader_class in [PyPDFium2Loader, PyMuPDFLoader]:
    for _ in tqdm(range(1), smoothing=0, desc=loader_class.__name__):
        loader_instance = loader_class("/content/drive/MyDrive/test.pdf")
        data[loader_class.__name__] = loader_instance.load()

PyPDFium2Loader: 100%|██████████| 1/1 [00:04<00:00,  4.04s/it]
PyMuPDFLoader: 100%|██████████| 1/1 [00:01<00:00,  1.98s/it]


In [None]:
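# Join each loader's pages into one string and save it for side-by-side inspection.
data1 = {name: '\n'.join(doc.page_content for doc in pages) for name, pages in data.items()}
for name, text in data1.items():
    with open(f'{name}.txt', 'w', encoding='utf-8') as f:
        f.write(text)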

In [None]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document

text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(tokenizer, chunk_size=1024, chunk_overlap=0)
texts = text_splitter.split_documents([Document(page_content=data1['PyPDFium2Loader'])])
max(map(lambda d: len(d.page_content.split()), texts))

761

You can think of tokens as pieces of words used for natural language processing. For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.
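
The rule of thumb above can be turned into a quick back-of-the-envelope estimator. This is a minimal sketch: the function names are made up for illustration, and the ratios are only the approximate figures quoted above for English text, so real counts depend on the tokenizer and should be measured with it when accuracy matters.

```python
# Approximate ratios quoted above for English text (heuristics, not exact).
WORDS_PER_TOKEN = 0.75
CHARS_PER_TOKEN = 4


def estimate_tokens_from_words(n_words: int) -> int:
    """Rough token count for a text with n_words English words."""
    return round(n_words / WORDS_PER_TOKEN)


def estimate_tokens_from_chars(n_chars: int) -> int:
    """Rough token count for a text with n_chars characters."""
    return round(n_chars / CHARS_PER_TOKEN)


def estimate_words_from_tokens(n_tokens: int) -> int:
    """Rough word count that fits into a budget of n_tokens tokens."""
    return round(n_tokens * WORDS_PER_TOKEN)


# The Shakespeare reference point: about 900,000 words.
print(estimate_tokens_from_words(900_000))  # 1200000

# A 1024-token chunk holds roughly this many words.
print(estimate_words_from_tokens(1024))  # 768
```

By this estimate a 1024-token chunk holds about 768 words, which is in line with the maximum of 761 words per chunk measured above.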

In [None]:
len(texts)

171

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

embeddings = HuggingFaceEmbeddings(model_name='intfloat/e5-large-v2')
db = DocArrayInMemorySearch.from_documents(texts, embeddings)



In [None]:
from torch import cuda, bfloat16
from transformers import AutoModelForCausalLM, AutoConfig

name = 'mosaicml/mpt-7b-instruct'
device = f'cuda:{cuda.current_device()}'

config = AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'
config.init_device = device
config.max_seq_len = 4096
config.torch_dtype = bfloat16

model = AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
    torch_dtype=bfloat16,
)
model.eval()

You are using config.init_device='cuda:0', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

MPTForCausalLM(
  (transformer): MPTModel(
    (wte): Embedding(50432, 4096)
    (emb_drop): Dropout(p=0, inplace=False)
    (blocks): ModuleList(
      (0-31): 32 x MPTBlock(
        (norm_1): LPLayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (attn): MultiheadAttention(
          (Wqkv): Linear(in_features=4096, out_features=12288, bias=False)
          (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
        )
        (norm_2): LPLayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (ffn): MPTMLP(
          (up_proj): Linear(in_features=4096, out_features=16384, bias=False)
          (act): GELU(approximate='none')
          (down_proj): Linear(in_features=16384, out_features=4096, bias=False)
        )
        (resid_attn_dropout): Dropout(p=0, inplace=False)
        (resid_ffn_dropout): Dropout(p=0, inplace=False)
      )
    )
    (norm_f): LPLayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
)

In [None]:
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

# mpt-7b is trained to add '<|endoftext|>' at the end of generations
stop_token_ids = tokenizer.convert_tokens_to_ids(['<|endoftext|>'])

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])

In [None]:
import transformers

# Drop any pipeline left over from a previous run of this cell;
# a bare `del generate_text` raises NameError on the first run.
globals().pop('generate_text', None)
generate_text = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    device=device,
    # we pass model parameters here too
    # note: temperature, top_p and top_k only take effect when sampling is enabled (do_sample=True)
    stopping_criteria=stopping_criteria,  # without this the model will ramble
    temperature=0.1,  # 'randomness' of outputs; values near 0 are close to greedy decoding
    top_p=0.15,  # sample from the smallest set of top tokens whose probabilities add up to 15%
    top_k=0,  # 0 disables top-k filtering, so only top_p applies
    max_new_tokens=256,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this the output begins repeating
)

The model 'MPTForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormFor

In [None]:
from langchain.llms import HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=generate_text)

In [None]:
llm.get_num_tokens(data1['PyPDFium2Loader'])

Token indices sequence length is longer than the specified maximum sequence length for this model (172970 > 1024). Running this sequence through the model will result in indexing errors


172970

In [None]:
query = 'How to remove the receipt printer from the scanner unit?'
docs = db.similarity_search(query)
# Strip page headers/footers, figure captions and callout residue from the retrieved chunk.
# The fragments are removed in this order.
noise = [
    '176 4888 Hardware Service Guide',
    'A B',
    'A\r',
    'Figure 101. Remove the cover from the printer shelf',
    'Chapter 6. Remove and replace miscellaneous components 177',
    '178 4888 Hardware Service Guide',
    'Figure 102. Remove the mounting screw from the back of the printer',
    'Figure 103. Disconnect the printer cable from the printer',
    '\n\n',
]
for fragment in noise:
    docs[1].page_content = docs[1].page_content.replace(fragment, '')
print(docs[1].page_content)
print(llm.get_num_tokens(docs[0].page_content))
print(llm.get_num_tokens(docs[1].page_content))

Note: Not all models of pin pads require an external power supply for operation.
1. Align the threaded posts of the pin pad mount with the mounting holes in the upper
payment unit door.
2. Use four nuts A , and star washers C to attach the mounting bracket B to the door (see 
Figure 100)
3. Attach the data and power cables to the pin pad.
4. Secure the pin pad to the mount.
5. Tighten locking screw on the rear of the pin pad mount to lock the pin pad in place.
6. See the documentation that came with the pin pad for information on how to connect and
set up the pin pad.
Receipt printer removal and installation
This section provides the information necessary to remove and install the receipt printer and its
cable.
Removing the printer
Complete the following procedure to remove the printer from the scanner unit:
Note: You do not have to turn off the lane or the printer to remove or install the printer or its
cable.
1. Remove the back cover from the printer shelf and set it

The ground-truth answer is:
1. Remove the back cover from the printer shelf and set it aside.

    a. Remove the two mounting screws A from the cover B.
    
    b. Rotate the top of cover back; then lift the cover up and off the printer shelf.
2. Remove the mounting screw A that secures the printer to the shelf.
3. Grasp the top and bottom of the printer; then lift the printer up and off the printer shelf.
4. Turn the printer over and disconnect the cable from the printer.
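
To compare a generated answer with these reference steps less subjectively, one option is to measure how many reference steps have a close match among the lines of the answer. The sketch below is not part of the original notebook: `normalize`, `step_recall` and the `0.8` threshold are hypothetical choices, and fuzzy string matching from the standard-library `difflib` stands in for a proper evaluation metric.

```python
import re
from difflib import SequenceMatcher


def normalize(line: str) -> str:
    """Lowercase, drop a leading list marker such as '2.' or '- 2.', collapse whitespace."""
    line = re.sub(r"^[\s\-\d.]+", "", line)
    return " ".join(line.lower().split())


def step_recall(reference_steps, answer, threshold=0.8):
    """Fraction of reference steps that have a sufficiently similar line in the answer."""
    if not reference_steps:
        return 0.0
    answer_lines = [normalize(line) for line in answer.splitlines() if line.strip()]
    found = 0
    for step in reference_steps:
        target = normalize(step)
        # Best similarity between this reference step and any answer line.
        best = max(
            (SequenceMatcher(None, target, line).ratio() for line in answer_lines),
            default=0.0,
        )
        if best >= threshold:
            found += 1
    return found / len(reference_steps)


# Top-level reference steps from the service guide.
reference = [
    "1. Remove the back cover from the printer shelf and set it aside.",
    "2. Remove the mounting screw A that secures the printer to the shelf.",
    "3. Grasp the top and bottom of the printer; then lift the printer up and off the printer shelf.",
    "4. Turn the printer over and disconnect the cable from the printer.",
]

# An example generated answer (illustrative input).
answer = """1. Complete the following procedure to remove the printer from the scanner unit:
   - 2. Remove the mounting screw A (see Figure 102) that secures the printer to the shelf.
     3. Grasp the top and bottom of the printer; then lift the printer up and off the printer shelf.
       4. Turn the printer over and disconnect the cable from the printer."""

print(f"step recall: {step_recall(reference, answer):.2f}")
```

A score below 1.0 means at least one reference step has no close counterpart in the answer. The threshold is a tuning knob: a lower value tolerates more paraphrasing but also more false matches.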


In [None]:
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm)
print(chain.run(input_documents=docs[1:2], question='List the steps: ' + query))  # very unstable!

 1. Complete the following procedure to remove the printer from the scanner unit:
   - 2. Remove the mounting screw A (see Figure 102) that secures the printer to the shelf. 
     3. Grasp the top and bottom of the printer; then lift the printer up and off the printer shelf.  
       4. Turn the printer over and disconnect the cable from the printer.
