# **Tutorial on LangChain**

### Dr. Santosh Chapaneri
### Sr. Data Scientist, Wolters Kluwer

### LLM with HF Transformers & LangChain (Hands-on)
### MPSTME, Sep 2023

# **LLM with HuggingFace Transformers and LangChain**

To use Hugging Face models we need a Hugging Face Hub API token.

To get one, create an account at [huggingface.co](https://huggingface.co/), click the profile icon in the top-right corner, then go to *Settings* > *Access Tokens* > *New Token*, set *Role* to *write*, click *Generate*, and paste the token below:

In [1]:
import os
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your-token-here>"  # paste your own token; never share a notebook that contains a real one
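Pasting a token directly into a cell makes it easy to leak when the notebook is shared. The following is a minimal sketch of an alternative, assuming the notebook runs interactively: it reuses the token if the environment variable is already set, and otherwise prompts for it with the standard-library `getpass`. The helper name `ensure_hf_token` is hypothetical and not part of any library.

```python
import os
from getpass import getpass


def ensure_hf_token(env_var="HUGGINGFACEHUB_API_TOKEN"):
    """Return the Hub token, prompting for it only if it is not already set.

    `ensure_hf_token` is an illustrative helper, not a Hugging Face API.
    """
    token = os.environ.get(env_var)
    if not token:
        # getpass hides the typed value, so it never appears in the cell output
        token = getpass("Hugging Face Hub token: ")
        os.environ[env_var] = token
    return token
```

Calling `ensure_hf_token()` once at the top of the notebook would then replace the hard-coded assignment.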

In [2]:
!pip install transformers
!pip install accelerate einops
!pip install langchain
!pip install xformers

Collecting transformers
  Downloading transformers-4.33.1-py3-none-any.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers)
  Downloading huggingface_hub-0.17.1-py3-none-any.whl (294 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [3]:
from torch import cuda, bfloat16
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'
device

'cuda:0'

In [4]:
# Create the model

repo_id = "databricks/dolly-v2-3b"  # open-source model that can be used commercially

fmodel = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=bfloat16
)

fmodel.eval()
fmodel.to(device)

print(f'Model loaded on {device}')

Downloading (…)lve/main/config.json:   0%|          | 0.00/819 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/5.68G [00:00<?, ?B/s]

Model loaded on cuda:0
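Loading the weights with `torch_dtype=bfloat16` halves the memory needed compared with 32-bit floats. A rough back-of-the-envelope sketch follows; the parameter count of about 2.8 billion is an assumption about this model (it is not stated in the notebook), and the estimate covers weights only, not activations or the generation cache.

```python
# Bytes used to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(n_params, dtype):
    """Approximate memory for the model weights alone, in gigabytes (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


ASSUMED_PARAMS = 2.8e9  # assumption: approximate size of dolly-v2-3b

print(f"bfloat16: {weight_memory_gb(ASSUMED_PARAMS, 'bfloat16'):.1f} GB")
print(f"float32:  {weight_memory_gb(ASSUMED_PARAMS, 'float32'):.1f} GB")
```

Under that assumption the 16-bit estimate lands close to the size of the `pytorch_model.bin` download reported above.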


In [5]:
# Create the tokenizer
tokenizer = AutoTokenizer.from_pretrained(repo_id)

Downloading (…)okenizer_config.json:   0%|          | 0.00/450 [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/228 [00:00<?, ?B/s]

In [6]:
# Initialize the HF pipeline and set up some generation parameters
hf_pipeline = transformers.pipeline(
    model=fmodel,
    tokenizer=tokenizer,
    task='text-generation',
    return_full_text=True,
    device=device,
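    max_length=10000,  # upper bound on prompt plus generated tokens
    temperature=0.1,  # low temperature keeps sampling close to greedy decoding
    top_p=0.15,  # nucleus sampling: sample only from the smallest set of top tokens whose probabilities add up to 15%
    top_k=0,  # 0 disables top-k filtering, so only top_p limits the candidates
    repetition_penalty=1.1,  # penalizes tokens that already appeared; without it, the output starts to repeat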
    do_sample=True,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
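To make the sampling parameters above more concrete, here is a small pure-Python sketch of what temperature, `top_p`, and `repetition_penalty` do to a toy set of scores. It is a conceptual illustration under simplified assumptions (a four-token vocabulary, made-up logits), not the code that `transformers` actually runs; all function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperatures below 1 sharpen the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_indices(probs, top_p):
    """Return the smallest set of most-likely token indices whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Lower the score of every token that has already been generated."""
    adjusted = list(logits)
    for i in set(seen_token_ids):
        # Dividing a positive logit and multiplying a negative one both reduce it
        adjusted[i] = adjusted[i] / penalty if adjusted[i] > 0 else adjusted[i] * penalty
    return adjusted


logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a 4-token vocabulary
sharp = softmax_with_temperature(logits, temperature=0.1)
flat = softmax_with_temperature(logits, temperature=1.0)
candidates = top_p_indices(sharp, top_p=0.15)
penalized = apply_repetition_penalty(logits, seen_token_ids=[0, 3], penalty=1.1)

print("top probability at T=0.1:", round(sharp[0], 4))
print("top probability at T=1.0:", round(flat[0], 4))
print("tokens kept by top_p=0.15:", candidates)
print("logits after repetition penalty:", penalized)
```

With a temperature this low and a `top_p` this small, the candidate set usually shrinks to the single most likely token, so the pipeline behaves almost deterministically even though `do_sample=True`.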

In [7]:
result = hf_pipeline("What is the name of the first president of the united arab emirates?")

print(result)

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


[{'generated_text': 'What is the name of the first president of the united arab emirates?\n\nThe first president of the United Arab Emirates was Sheikh Zayed bin Sultan Al Nahyan. He served from 1971 to 2004.\n\n'}]
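Because the pipeline was created with `return_full_text=True`, the output echoes the prompt before the answer. A minimal sketch for keeping only the answer is shown below; `strip_prompt` is a hypothetical helper, and the sample dictionary simply reuses the output printed above.

```python
def strip_prompt(prompt, generated_text):
    """Remove the echoed prompt from the start of the generated text, if present."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()


prompt_text = "What is the name of the first president of the united arab emirates?"
# Same structure as the pipeline output shown above
sample_result = [{
    "generated_text": prompt_text
    + "\n\nThe first president of the United Arab Emirates was "
    + "Sheikh Zayed bin Sultan Al Nahyan. He served from 1971 to 2004.\n\n"
}]

answer = strip_prompt(prompt_text, sample_result[0]["generated_text"])
print(answer)
```

When the pipeline is used on its own, passing `return_full_text=False` is another way to get only the completion.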


In [12]:
result = hf_pipeline("What are auto encoders?")

print(result[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


What are auto encoders?
Autoencoders are neural network based algorithms that try to reconstruct the input from a set of features. Autoencoders can be trained using backpropagation through time (BPTT). The reconstruction error is used as the loss function and Adam is used for optimization. The number of layers, neurons in each layer, and the number of epochs needed to converge depends on the size of the dataset.




## **Implementing LangChain conversational memory**

In [15]:
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain import PromptTemplate

In [13]:
# Create a new conversational template
template = """You are an informative assistant chatting with a human.
{chat_history}
Human:{human_input}
Assistant:"""

# Create the prompt
prompt = PromptTemplate(
    input_variables = ["chat_history","human_input"],
    template=template
)
memory = ConversationBufferMemory(memory_key = "chat_history")
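To see what the `{chat_history}` slot contributes, the sketch below imitates a buffer memory in plain Python: it stores every turn and renders the whole history into the template before each new question. `SimpleBufferMemory` is a hypothetical stand-in written for illustration, not LangChain's implementation, and the assistant reply is a made-up example.

```python
template = """You are an informative assistant chatting with a human.
{chat_history}
Human:{human_input}
Assistant:"""


class SimpleBufferMemory:
    """Illustrative stand-in: keeps every turn and renders it as plain text."""

    def __init__(self, human_prefix="Human", ai_prefix="Assistant"):
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.turns = []

    def save_turn(self, human_text, ai_text):
        self.turns.append((human_text, ai_text))

    def render(self):
        lines = []
        for human_text, ai_text in self.turns:
            lines.append(f"{self.human_prefix}: {human_text}")
            lines.append(f"{self.ai_prefix}: {ai_text}")
        return "\n".join(lines)


sketch_memory = SimpleBufferMemory()

# First turn: the history is still empty
first_prompt = template.format(
    chat_history=sketch_memory.render(), human_input="What is an autoencoder?"
)

# Made-up reply, stored so that the next prompt carries it along
sketch_memory.save_turn(
    "What is an autoencoder?", "A network trained to reconstruct its input."
)
second_prompt = template.format(
    chat_history=sketch_memory.render(), human_input="Where is it used?"
)
print(second_prompt)
```

The second prompt contains the full first exchange, which is why a buffer-style memory makes prompts grow with every turn.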

In [22]:
from langchain import LLMChain, HuggingFacePipeline

llm = HuggingFacePipeline(pipeline = hf_pipeline)

# adding memory to the llm chain
llm_chain = LLMChain(prompt=prompt, llm=llm, memory=memory)

In [27]:
question = "What is an autoencoder, in simple terms, and where is it used?"
response = llm_chain.run(human_input = question)

print(response)  # this call can take a long time to run

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


KeyboardInterrupt: ignored
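The run above was interrupted before it finished. One plausible contributor, which the notebook does not confirm, is that with a chat-style prompt and a very large `max_length` the model keeps writing further `Human:` and `Assistant:` turns instead of stopping after one reply. Capping the output with the `max_new_tokens` generation argument in `transformers` would bound the running time. The sketch below addresses only the clean-up side: a hypothetical helper, `truncate_at_stop`, that keeps just the first reply from a made-up raw completion.

```python
def truncate_at_stop(text, stop_sequences=("\nHuman:", "\nAssistant:")):
    """Cut the text at the earliest stop sequence so only the first reply remains."""
    cut = len(text)
    for stop in stop_sequences:
        position = text.find(stop)
        if position != -1:
            cut = min(cut, position)
    return text[:cut].strip()


# Made-up completion in which the model invents extra turns
raw_completion = (
    " An autoencoder compresses its input and reconstructs it."
    "\nHuman: Thanks!"
    "\nAssistant: You're welcome."
)

reply = truncate_at_stop(raw_completion)
print(reply)
```

Truncating after the fact does not save any generation time, so it would be combined with a token cap rather than used instead of one.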