[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tanikina/low-resource-nlp-lab/blob/main/notebooks/Prompting101_Tutorial.ipynb)

## 💡 101 Prompting Tutorial

### What are prompts?

A prompt is a piece of text added to an input example so that the original task can be reformulated as a language modeling problem.

Source: [Language Models are Few-Shot Learners (Brown et al., 2020)](https://openai.com/research/language-models-are-few-shot-learners)

<center>
<img src="images/prompts-vs-fine-tunings.png" alt="Fine-Tuning" width="800"/>
</center>
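As a minimal sketch of this reformulation, a sentiment classification task can be turned into a text-completion problem by wrapping each input in a template and mapping the generated word back to a class. The template, the label words, and the sample continuation below are all assumptions made up for illustration; no model is called.

```python
# Hypothetical template and label words, chosen only for illustration.
TEMPLATE = "Review: {text}\nQuestion: Is this review positive or negative?\nAnswer:"
LABEL_WORDS = {"positive": 1, "negative": 0}  # generated word -> class id


def build_prompt(text):
    """Wrap a raw input example in the prompt template."""
    return TEMPLATE.format(text=text)


def parse_label(generated):
    """Map the model's continuation back to a class id (None if unrecognised)."""
    words = generated.strip().lower().split()
    first_word = words[0].strip(".,!") if words else ""
    return LABEL_WORDS.get(first_word)


prompt = build_prompt("The plot was thin, but the acting was wonderful.")
print(prompt)
# " Positive." is a made-up continuation, not real model output.
print(parse_label(" Positive."))
```

The classifier head of a fine-tuned model is replaced here by two small pieces of text processing: one before generation (the template) and one after it (the label lookup).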

### 👑 Good prompt rules

 - **D**irect & Simple: formulate the prompt in a concise and direct way, starting with the simplest and shortest version;

 - **A**dequate Context: provide context with examples, and make sure they are unambiguous and sufficient for the task;

 - **O**ne Task at a Time: do not use the same prompt to solve several tasks simultaneously.

Check [the best practices of LLM prompting from HuggingFace](https://huggingface.co/docs/transformers/tasks/prompting#best-practices-of-llm-prompting) for further recommendations!
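The rules above can be sketched as a small prompt builder: one short instruction, a handful of unambiguous examples, and a single task per prompt. The function name, the `Text:`/`Label:` layout, and the example data are assumptions for illustration, not a prescribed format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Compose one short instruction, a few labelled examples, then the query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query uses the same layout as the examples, with the label left open.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


instruction = "Classify the language of the text."
examples = [("Guten Morgen!", "German"), ("Good morning!", "English")]
prompt = build_few_shot_prompt(instruction, examples, "Bonjour !")
print(prompt)
```

Starting from an empty `examples` list gives the simplest zero-shot version, and examples can be added one at a time if the outputs are not good enough.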

### 🤔 What is Chain of Thought Prompting?

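Chain of Thought (CoT) prompting is a method that elicits multi-step reasoning from a language model by including worked examples with intermediate reasoning steps in the prompt, so that the model writes out its own reasoning before giving the final answer.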

Source: [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022)](https://www.semanticscholar.org/paper/Chain-of-Thought-Prompting-Elicits-Reasoning-in-Wei-Wang/1b6e810ce0afd0dd093f789d2b2742d047e316d5)

<center>
<img src="images/cot-prompting.png" alt="CoT Prompting" width="700"/>
</center>
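A rough sketch of how a CoT prompt can be assembled and how the final answer can be read back from the generated reasoning. The worked example is invented for this sketch (it is not taken from the paper), and the "The answer is N" convention is an assumption that the parsing step relies on.

```python
import re

# A made-up worked example: the reasoning is spelled out before the answer.
COT_EXAMPLE = (
    "Q: A shelf holds 3 boxes with 4 books each. 2 books are removed. "
    "How many books are left?\n"
    "A: 3 boxes with 4 books each is 3 * 4 = 12 books. "
    "Removing 2 leaves 12 - 2 = 10. The answer is 10.\n"
)


def build_cot_prompt(question):
    """Prepend the worked example so the model imitates its step-by-step style."""
    return f"{COT_EXAMPLE}\nQ: {question}\nA:"


def extract_answer(generated):
    """Pull the number from a 'The answer is N' sentence, if there is one."""
    match = re.search(r"The answer is\s+(-?\d+)", generated)
    return int(match.group(1)) if match else None


print(build_cot_prompt("A bus has 20 seats and 7 are taken. How many are free?"))
# The string below is a hypothetical continuation, not real model output.
print(extract_answer("20 seats minus 7 taken is 20 - 7 = 13. The answer is 13."))
```

Fixing the phrasing of the last sentence in the worked example makes the answer easy to extract, while the reasoning in front of it stays free-form.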

## Prompting Examples with Flan-T5 🍮 and Mistral <img src="images/mistral.png" width="20"/>

In [None]:
!pip install transformers[torch]

In [1]:
# Flan-T5 with Option Selection

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name_or_path = "google/flan-t5-base"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

def prompt_with_options(prompt_query, options, max_new_tokens=10):
    prompt = f"""Question: Select the item that corresponds to "{prompt_query}" in the following list. Options: * {" * ".join(options)}"""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    return decoded

prompt_queries = ["the dessert", "the red liquid", "2nd", "the last item", "chocolate flavour"]
options = ["chocolate cake", "pizza", "tomato soup"]

for prompt_query in prompt_queries:
    result = prompt_with_options(prompt_query, options)
    print(f"{prompt_query}: {result}")

the dessert: chocolate cake
the red liquid: tomato soup
2nd: pizza
the last item: tomato soup
chocolate flavour: chocolate cake
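The model is free to generate text that is not literally one of the options, so it can help to map the decoded string back onto the option list before using it. The helper below is a hypothetical sketch (the name `match_option` and the matching rules are assumptions); it accepts an exact match first and otherwise a single unambiguous substring match.

```python
def match_option(generated, options):
    """Return the option named by the generated text, or None if unclear."""
    text = generated.strip().lower()
    # Prefer an exact (case-insensitive) match.
    exact = [o for o in options if o.lower() == text]
    if exact:
        return exact[0]
    # Otherwise accept a substring match, but only if exactly one option fits.
    partial = [o for o in options if o.lower() in text or (text and text in o.lower())]
    return partial[0] if len(partial) == 1 else None


options = ["chocolate cake", "pizza", "tomato soup"]
print(match_option("Chocolate cake ", options))
print(match_option("soup", options))
print(match_option("ice cream", options))
```

Returning `None` for unclear cases keeps the decision about retrying or discarding the answer with the caller.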


In [2]:
# Mistral-GPTQ Example

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name_or_path = "TheBloke/Mistral-7B-v0.1-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

In [3]:
prompt = "Tell me about low-resource NLP"

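# Follow the device chosen by device_map="auto" instead of hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
generation_config = GenerationConfig(
    # penalty_alpha is a contrastive-search setting; with do_sample=True,
    # generation samples using top_k, top_p and temperature instead.
    penalty_alpha=0.6,
    do_sample=True,
    top_k=5,
    top_p=0.95,
    temperature=0.1,
    repetition_penalty=1.2,
    max_new_tokens=128,
    bos_token_id=1,
    eos_token_id=2,
    pad_token_id=2
)

# Note: the pad_token_id keyword below takes precedence over the value in
# generation_config. The warning in the output suggests the tokenizer defines
# no pad token, in which case generate() falls back to eos_token_id.
output = model.generate(**inputs, generation_config=generation_config, pad_token_id=tokenizer.pad_token_id)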
generated = tokenizer.decode(output[0], skip_special_tokens=True)

print(f"Prompt: {prompt}")
print(f"Generated: {generated}")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Prompt: Tell me about low-resource NLP
Generated: Tell me about low-resource NLP.

Low resource natural language processing (LRNLP) is a subfield of computational linguistics that focuses on developing methods and tools for analyzing languages with limited resources, such as data or annotated corpora. This field aims to make NLP accessible to underrepresented communities by providing them with the necessary tools and techniques to analyze their own languages without relying on external sources. LRNLP has been particularly important in regions where there are many indigenous languages but few resources available for researching these languages. In this article, we will explore some of the challenges faced when working with low-resource


## Prompting with LangChain 🦜🔗 + 🤗

[https://python.langchain.com/docs/integrations/llms/huggingface_hub](https://python.langchain.com/docs/integrations/llms/huggingface_hub)

In [8]:
from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass()

 ········


In [9]:
import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

In [None]:
!pip install langchain

In [11]:
from langchain_community.llms import HuggingFaceHub
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

question_de = "Welche Sprache spricht man in Österrreich?"
question_en = "Which language is spoken in Malta?"

template_de = """Frage: {question}\nAntwort:"""
template_en = """Question: {question}\nAnswer:"""

prompt_en = PromptTemplate.from_template(template_en)
prompt_de = PromptTemplate.from_template(template_de)

repo_id = "google/flan-t5-base"  # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options

model = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)

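# Note: the chain is built from prompt_en only, so both questions below
# (including the German one) are wrapped in the English template.
# prompt_de is defined above but not used; a second chain built from it
# would apply the German template.
llm_chain = LLMChain(prompt=prompt_en, llm=model)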

print(llm_chain.invoke(question_en))
print(llm_chain.invoke(question_de))

{'question': 'Which language is spoken in Malta?', 'text': 'scottish'}
{'question': 'Welche Sprache spricht man in Österrreich?', 'text': 'German'}
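The cell above defines one template per language. A minimal sketch of routing each question to the template in its own language, using plain `str.format` so that it runs without LangChain or an API token. The `TEMPLATES` dictionary, the language codes, and the `format_question` helper are assumptions introduced for this sketch; the template strings are the ones defined above.

```python
# Template strings copied from the cell above; the dict keys are assumed codes.
TEMPLATES = {
    "en": "Question: {question}\nAnswer:",
    "de": "Frage: {question}\nAntwort:",
}


def format_question(question, lang):
    """Fill the template that matches the language of the question."""
    return TEMPLATES[lang].format(question=question)


print(format_question("Which language is spoken in Malta?", "en"))
print(format_question("Welche Sprache spricht man in Malta?", "de"))
```

Printing the filled template before sending it to a model is a cheap way to check which wording the model actually receives.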


## 🎪 Tips & Tricks

### Things to keep in mind:
- Checking the prompt **format of the model**, e.g., whether it requires special tags like `[INST]`: see the model cards on the Hugging Face Hub

- Checking the prompt **format for the task**: see some examples on [https://www.promptingguide.ai/prompts](https://www.promptingguide.ai/prompts)

- Dealing with **structured outputs** (e.g., JSON): check out the [instructor](https://github.com/jxnl/instructor) library and [LangChain](https://python.langchain.com/docs/get_started/introduction)

- Filtering out **false starts**: *"As a language model ..."*, *"Being an AI system ..."*

- Removing **verbose text** that does not contribute to the task: re-formulate prompt, use options, provide examples

- **Iterative prompting**: always start with the simplest possible version, evaluate it on a few examples, and extend it if needed

- **Asking back**: ask the same question again if the provided answer does not meet expectations or does not follow the format
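Two of the tips above, filtering false starts and asking back, can be sketched together with a structured-output check. Everything here is an assumption for illustration: the helper names, the regular expression, the JSON format check, and the `fake_generate` function, which returns canned strings in place of a real model call.

```python
import json
import re

# Matches a leading sentence that begins with a typical false start.
FALSE_START = re.compile(
    r"^\s*(as a language model|being an ai system)[^.\n]*[.\n]\s*",
    re.IGNORECASE,
)


def strip_false_start(text):
    """Drop a leading 'As a language model ...' style sentence, if present."""
    return FALSE_START.sub("", text, count=1)


def is_json_object(text):
    """Format check: the answer must parse as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def ask_until_valid(generate, prompt, is_valid, max_attempts=3):
    """Ask the same question again until the cleaned answer passes is_valid."""
    for _ in range(max_attempts):
        answer = strip_false_start(generate(prompt))
        if is_valid(answer):
            return answer
    return None  # give up after max_attempts


# Stand-in for a model: canned replies, the first one in the wrong format.
canned = iter([
    "Being an AI system, I cannot be certain.\nMaybe Maltese?",
    '{"language": "Maltese"}',
])


def fake_generate(prompt):
    return next(canned)


result = ask_until_valid(fake_generate, "Answer in JSON: ...", is_json_object)
print(result)
```

Capping the number of attempts keeps the loop from running forever when a model never produces the expected format.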