**Essay Topic**: Text data
> As the title suggests, the topic of this essay lies at the intersection of two themes: **text data** and **generative AI**. The text data category seems the most relevant, given that I have decided to focus on Large Language Models (LLMs). That said, I would also be thrilled if both grading panelists (the text-data expert and the generative AI expert) read and evaluated my work.

> This essay is based on my **personal journey** to decode and engage with the fascinating world of LLMs: from simple interactions with various existing GPT chatbots, to developing LLM-powered apps myself, to reading incredible research papers and bringing them to life.

> In this essay, I discuss the emergence of contemporary large language models (LLMs), the **most recent** methods and techniques that make them possible, and what the machine learning community has learned over **the last two years** (in accordance with the objective of the competition).

### <font >Table of Contents</font> 

<ul>
<li><a href="#1">1. Large Language Models: Overview</a>
</ul>
<ul>
<li><a href="#2">2. Pretrained LLMs: Efficiency</a>
 </li>
 </ul>

 <ul>
<li><a href="#3">3. Prompt Engineering: Basic to Advanced Techniques</a>
<ul>
    <li><a href="#3.1">3.1 Basic Methods: Zero-shot, one-shot & few-shot prompting</a></li>
    <li><a href="#3.2">3.2 Chain of Thought (CoT)</a></li>
    <li><a href="#3.3">3.3 Reasoning & Acting (ReAct)</a></li>
    </ul>
</li>
</ul>

<ul>
<li><a href="#4">4. Fine-tuning LLMs: Low-rank Adaptation LoRA</a>

</li>
</ul>

<ul>
<li><a href="#5">5. Fine-tuning LLMs with human feedback (RLHF)</a>
</ul>
 

<ul>
<li><a href="#6">6. Augmenting LLMs to create LLM-powered Apps: RAG</a>

</li>
</ul>

 <ul>
 <li><a href="#7">References</a></li>
 </ul>

# <div id="1"></div><font>1. Large Language Models: Overview</font>

Large language models (LLMs) are AI models designed to understand and generate human language, code, and much more. They are typically based on the Transformer architecture, which was introduced by a team at Google Brain in 2017 [1]. Several concepts contributed to the success of Transformers; although most of them have been known for some time, they remain active research topics. Attention, transfer learning, and scaling up neural networks are the main principles that laid the groundwork for Transformers to flourish.
  
Large language models are trained on enormous volumes of text data, using vast amounts of computing power over weeks or months. With billions of parameters, these LLMs show capabilities that go beyond language alone: researchers are revealing how they can reason, decompose complex tasks, and solve problems.

LLMs can perform a wide variety of tasks that they were not explicitly trained on. Even though their pretraining objective (for example, predicting the next token) differs significantly from these tasks, they can accomplish various **text-generation tasks**, such as composing a poem from a prompt or summarizing a discussion. In these cases, the information in the prompt, combined with the model's understanding of natural language, is used to produce the poem or the summary. LLMs can also be used for translation and information retrieval tasks based on the given prompt.

Last but not least, extending LLMs by connecting them to external data sources or letting them call external APIs is an area of active development and research. This capability can provide the model with information missing from its pretraining data and enable communication between the LLM and the outside world. One of the most powerful frameworks for making external knowledge available to LLMs is RAG (Retrieval Augmented Generation), which I describe in greater detail later.

LLMs have also given rise to new areas of study and research, such as prompt design, or prompt engineering. It ranges from basic tools like one-shot or few-shot prompting, which a broad (not necessarily expert) audience can use, to more sophisticated techniques that have emerged in the last two years and are still active fields of research, such as CoT (Chain of Thought) and ReAct (Reasoning and Acting). All of these techniques can be applied and implemented via frameworks that augment LLMs. I'll go into more detail about this later in the essay.

Working with LLMs evolves from passively using pretrained models for our needs to a more sophisticated use that requires adapting them, through fine-tuning, to our use case and our own data. I'll go into detail about several approaches to fine-tuning LLMs, including the most recent ones: parameter-efficient fine-tuning techniques (such as LoRA) and fine-tuning with human feedback using reinforcement learning (RLHF).



# <div id="2"></div><font >2. Pretrained LLMs: Efficiency </font>

Pretrained LLMs, such as OpenAI's GPT-4 and GPT-3, Meta's LLaMA, Google's PaLM, Falcon, and others, are models with billions of parameters; for some, the size exceeds 50 billion or even 100 billion parameters. Training such models therefore requires a very large memory capacity: thousands of gigabytes (terabytes) of accelerator memory. This is quite expensive, requires access to hundreds of GPUs, and carries a high carbon footprint.
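
To make the memory requirement concrete, here is a back-of-the-envelope sketch in Python. It assumes mixed-precision training with the Adam optimizer and a commonly cited breakdown of about 16 bytes per parameter: 16-bit weights and gradients, plus a 32-bit master copy of the weights and Adam's two 32-bit moment estimates. Activations, batch size, and parallelism overheads are ignored, so real requirements are higher. The 80 GB GPU size is an illustrative assumption.

```python
import math

# Assumed bytes per parameter for mixed-precision training with Adam (activations excluded)
BYTES_PER_PARAM = {
    "16-bit weights": 2,
    "16-bit gradients": 2,
    "32-bit master copy of the weights": 4,
    "Adam first moment (32-bit)": 4,
    "Adam second moment (32-bit)": 4,
}
GPU_MEMORY_GB = 80  # hypothetical accelerator size used for the GPU count


def training_memory_gb(num_params: float) -> float:
    """Decimal gigabytes for weights, gradients, and optimizer state."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 1e9


def inference_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Decimal gigabytes just to hold the weights (16-bit by default)."""
    return num_params * bytes_per_param / 1e9


for name, n in [("GPT-3 (175B)", 175e9), ("LLaMA-65B", 65e9)]:
    train_gb = training_memory_gb(n)
    gpus = math.ceil(train_gb / GPU_MEMORY_GB)  # minimum GPUs just to hold the training state
    print(f"{name}: ~{train_gb:,.0f} GB training state "
          f"(at least {gpus} x {GPU_MEMORY_GB} GB GPUs), "
          f"~{inference_memory_gb(n):,.0f} GB of 16-bit weights")
```

For a GPT-3-sized model (175B parameters), this gives about 2,800 GB for the training state alone, compared with about 350 GB just to hold the 16-bit weights for inference.
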

Recent research studies have looked at the relationship between **model size** (number of parameters), **the dataset size** used to train the LLM model, and its **performance**.  

Does performance always improve as the number of model parameters increases? How large should the training dataset be? What happens if the dataset used to train the LLM is made larger? What is the ideal ratio between these two key elements?

A team of researchers from DeepMind carried out an in-depth study of how large language models perform across various model sizes (numbers of parameters) and training dataset sizes. The results were published in 2022 in the paper "Training Compute-Optimal Large Language Models" [2]. The compute-optimal model produced by the authors' work is called "Chinchilla" (hence the paper is now often referred to as the "Chinchilla paper"). On a wide range of tasks, including the Massive Multitask Language Understanding (MMLU) benchmark (see Table-1 below), Chinchilla surpasses models that are not compute-optimal, such as GPT-3.

<img src="https://imgur.com/SBZJ2f8.png">
<div align='center'><font size="2">Table-1: Massive Multitask Language Understanding (MMLU) benchmark. 
</font></div>
<div align='center'><font size="2">Source: Training Compute-Optimal Large Language Models (Chinchilla paper, 2022) [2]
.</font></div>
<div align='center'><font size="1" color="#FFFFFF">.....</font></div>

According to the Chinchilla paper, many of the 100-billion-parameter large language models, including GPT-3, may therefore be "overparameterized", meaning they have more parameters than is optimal for their training compute budget, and "undertrained", meaning they would benefit from more training data. The study suggests that smaller models can perform as well as much larger ones if they are trained on larger datasets. See Table-2 below.

<img src="https://imgur.com/XgiPEzX.png">

<div align='center'><font size="2">Table-2: LLM's size (number of params) vs dataset size (number of Training Tokens).</font></div>
<div align='center'><font size="2">Source: Training Compute-Optimal Large Language Models (Chinchilla paper, 2022) [2]
.</font></div> 
<div align='center'><font size="1" color="#FFFFFF">.....</font></div>

**The key lesson** from the Chinchilla research is that the compute-optimal training dataset size (in tokens) is about 20 times the number of model parameters. For the compute-optimal model Chinchilla, the training dataset has 1.4 trillion tokens for 70 billion parameters, as depicted in Table-2 above.
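
The 20-tokens-per-parameter rule is easy to turn into a small Python calculator. The sketch below also relies on a widely used approximation: training costs about 6 FLOPs per parameter per token ($C \approx 6ND$). Combined with $D = 20N$, this gives $C \approx 120N^2$, so the compute-optimal model size for a budget $C$ is roughly $\sqrt{C/120}$. These formulas simplify the paper's fitted scaling laws and should not be read as its exact method.

```python
import math

TOKENS_PER_PARAM = 20  # Chinchilla rule of thumb


def compute_optimal_tokens(num_params: float) -> float:
    """Training tokens suggested for a model with num_params parameters."""
    return TOKENS_PER_PARAM * num_params


def training_flops(num_params: float, num_tokens: float) -> float:
    """Common approximation: about 6 FLOPs per parameter per training token."""
    return 6 * num_params * num_tokens


def compute_optimal_params(flops_budget: float) -> float:
    """With D = 20 N and C = 6 N D, we get C = 120 N^2, hence N = sqrt(C / 120)."""
    return math.sqrt(flops_budget / (6 * TOKENS_PER_PARAM))


n = 70e9                               # Chinchilla's parameter count
d = compute_optimal_tokens(n)          # 1.4e12 tokens, as in Table-2
c = training_flops(n, d)
print(f"tokens: {d:.2e}, training compute: {c:.2e} FLOPs")
print(f"compute-optimal size for that budget: {compute_optimal_params(c):.2e} params")
print(f"LLaMA-65B tokens per parameter: {1.4e12 / 65e9:.1f}")
```

For Chinchilla, the calculator recovers the 1.4 trillion tokens in Table-2 and a training budget of about 5.9 × 10²³ FLOPs. LLaMA-65B's 1.4 trillion tokens work out to about 21.5 tokens per parameter, which is close to the rule.
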

**Important finding:**   
Roughly one year after the publication of the Chinchilla paper, Meta AI released its state-of-the-art large language model, LLaMA-65B. This model was trained on a dataset of roughly 1.4 trillion tokens, which is close to the number suggested by Chinchilla (see Table-3, last row).

<img src="https://imgur.com/GTxJ88M.png">

<div align='center'><font size="2">Table-3: Model sizes, dataset sizes, architectures, and optimization hyper-parameters of Meta's LLaMA models.</font></div>
<div align='center'><font size="2">Source: LLaMA: Open and Efficient Foundation Language Models, 2023. [3]</font></div> 
<div align='center'><font size="1" color="#FFFFFF">.....</font></div>


According to the paper "LLaMA: Open and Efficient Foundation Language Models" (Meta AI, 2023) [3], LLaMA-65B also outperforms most of the other large models on most benchmarks, including GPT-3, which has 175B parameters. As an illustration, Table-4 below shows zero-shot performance on common-sense reasoning tasks.

<img src="https://imgur.com/BXkNYsT.png">

<div align='center'><font size="2">Table-4: Zero-shot performance on common-sense reasoning tasks.</font></div>
<div align='center'><font size="2">Source: LLaMA: Open and Efficient Foundation Language Models, 2023. [3]</font></div> 
<div align='center'><font size="1" color="#FFFFFF">.....</font></div>


Using these state-of-the-art pretrained LLMs for inference does not always give the desired results right away. To get the model to behave as intended, we may need to revise the wording or format of our prompt several times. This process of crafting and refining the prompt is called prompt engineering. It is a broad topic that ranges from simple tools like one-shot or few-shot prompting to more sophisticated methods, and it is the subject of the next section.

# <div id="3"></div><font >3. Prompt Engineering: Basic to Advanced Techniques</font>
### <div id="3.1"></div><font >3.1 Basic Methods: Zero-shot, one-shot & few-shot prompting</font> 

- **Zero-shot prompting:**  

Zero-shot prompting means giving the model a task (prompt) without including any examples of how to solve it, and still obtaining the desired output. For example, the classic approach to sentiment analysis is to train a machine learning model on labeled reviews so that it can classify a previously unseen review at inference time. LLMs are not trained this way or for this specific task, yet if we ask them to classify a review, they can often deliver the desired result. Here's an example of zero-shot prompting:

> Classify this review:  
> I love red apples    
> Sentiment:   

Even the largest LLMs occasionally struggle with zero-shot prompting, and improving zero-shot generalization is an active research question. For example, the paper "What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?, 2022" [4] examines modeling choices in large pretrained language models to identify which ones give the best zero-shot generalization.

- **One-shot prompting & few-shot prompting**  

Performance can often be improved with one-shot prompting, where we guide the LLM by including one example of the desired behavior in the prompt.

> Classify this review:  
> I love red apples    
> Sentiment: positive  

> Classify this review:  
> I adore vegetables  
> Sentiment:

Sometimes providing one example is not enough. For the example of sentiment analysis, we can provide a prompt with two examples (or more): one positive and the other negative. This is few-shot prompting:

> Classify this review:  
> I love red apples    
> Sentiment: positive  

> Classify this review:  
> This is not delicious  
> Sentiment: negative  

> Classify this review:  
> I don't like meat    
> Sentiment:  
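
The three prompting styles differ only in how many solved examples come before the query. Here is a small Python sketch that assembles the review-classification prompts shown above. The `build_prompt` helper is hypothetical, and the step that sends the resulting string to an actual LLM is not shown.

```python
from typing import List, Tuple


def build_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """0 examples -> zero-shot, 1 example -> one-shot, 2 or more -> few-shot."""
    blocks = [f"Classify this review:\n{review}\nSentiment: {label}" for review, label in examples]
    blocks.append(f"Classify this review:\n{query}\nSentiment:")  # the model completes this line
    return "\n\n".join(blocks)


zero_shot = build_prompt([], "I love red apples")
one_shot = build_prompt([("I love red apples", "positive")], "I adore vegetables")
few_shot = build_prompt(
    [("I love red apples", "positive"), ("This is not delicious", "negative")],
    "I don't like meat",
)
print(few_shot)
```

Keeping the template fixed and varying only the list of examples makes it easy to compare the three styles on the same queries.
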

**Important finding:**  
According to the 2020 study "Language Models are Few-Shot Learners" [5], large language models such as GPT-3 perform well with few-shot prompting, and their few-shot performance improves as model size grows.

### <div id="3.2"></div><font >3.2 Chain of Thought (CoT)</font>  

Chain of Thought (CoT) is a prompting method that helps LLMs reason. One tactic that has shown promise is making the model think more like a human by breaking the task down into steps. CoT applies this idea by including intermediate reasoning steps in the examples used for one-shot or few-shot inference.

The example below (Figure-1), taken from the paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, 2023" [6] by researchers at Google, shows how the LLM's behavior changed completely when a standard one-shot prompt (which simply gives the answer) was replaced with a chain-of-thought prompt that includes the reasoning steps leading to the answer. Implicitly asking the model to mimic this step-by-step human behavior helped it succeed.

<img src="https://imgur.com/ZF7jBfK.png">

<div align='center'><font size="2">Figure-1: Standard one-shot prompting vs. Chain of Thought prompting.</font></div>
<div align='center'><font size="2">Source: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,2023. [6]</font></div> 
<div align='center'><font size="1" color="#FFFFFF">.....</font></div>
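
The two prompts in Figure-1 differ only in the exemplar's answer. The Python sketch below builds a standard one-shot prompt and a chain-of-thought one-shot prompt from the same exemplar, written in the style of Figure-1. It also includes a hypothetical helper that extracts the final answer, on the assumption that the model ends its reasoning with "The answer is N." as the exemplar does. The sample completion in the usage lines is invented for illustration and is not real model output.

```python
import re
from typing import Optional

# One exemplar in the style of Figure-1, with a standard and a chain-of-thought answer
EXEMPLAR_Q = ("Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
              "Each can has 3 tennis balls. How many tennis balls does he have now?")
STANDARD_ANSWER = "The answer is 11."
COT_ANSWER = ("Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
              "5 + 6 = 11. The answer is 11.")


def one_shot_prompt(exemplar_answer: str, new_question: str) -> str:
    """Same exemplar question; only the exemplar's answer style changes."""
    return f"Q: {EXEMPLAR_Q}\nA: {exemplar_answer}\n\nQ: {new_question}\nA:"


def extract_answer(completion: str) -> Optional[str]:
    """Return the number after 'The answer is', or None if the pattern is missing."""
    match = re.search(r"The answer is\s+(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None


NEW_QUESTION = ("The cafeteria had 23 apples. If they used 20 to make lunch "
                "and bought 6 more, how many apples do they have?")
standard_prompt = one_shot_prompt(STANDARD_ANSWER, NEW_QUESTION)
cot_prompt = one_shot_prompt(COT_ANSWER, NEW_QUESTION)

# Hypothetical completion that follows the chain-of-thought exemplar's style
sample_completion = ("They had 23 apples. They used 20, so 23 - 20 = 3. "
                     "They bought 6 more, so 3 + 6 = 9. The answer is 9.")
print(extract_answer(sample_completion))
```

Ending every exemplar with the same fixed phrase is what makes the final answer easy to parse, even though the reasoning before it varies freely.
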


It should be noted that, although Chain of Thought (CoT) can significantly boost an LLM's ability to solve arithmetic and commonsense reasoning problems, LLMs' weak math abilities may still be an issue in 2023 if your task requires precise and advanced calculations.

### <div id="3.3"></div><font >3.3 Reasoning & Acting (ReAct)</font>  

ReAct was created by researchers from Google and Princeton University in 2022 and proposed in the paper "ReAct: Synergizing Reasoning and Acting in Language Models, 2022" [7]. The paper built a number of challenging prompting examples using the HotpotQA multi-hop question-answering benchmark and the FEVER fact-verification benchmark.

Because they pack many concepts together, the papers and jargon used to discuss the ReAct paradigm can sometimes seem hard to understand. I'll break everything down into simple terms and a step-by-step explanation, in the spirit of "Chain of Thought" :)

ReAct is a cutting-edge prompting technique used by frameworks that communicate with LLMs, such as LangChain. It is an iterative process. First, the LLM is called to produce a **"thought"** and an **"action"** (an attempt to solve the task the user asked for). The action is executed by a tool that connects the LLM, the user's app (where you write the prompt), and the outside world (for example, data from the web or external APIs). This intermediary layer is usually implemented in an **orchestration library** (discussed in more detail in section 6, on RAG: Retrieval Augmented Generation). The result of executing the action is called an **"observation"**.

The first thought, action, and observation together form the context for the second call to the LLM. Based on this context (Thought 1 + Action 1 + Observation 1), the LLM generates a second thought and a second action, which the orchestration tool executes to collect a second observation. If the second observation does not solve the task, the cycle of generating thoughts and actions and carrying them out continues until the task is solved.

As you have probably noticed, ReAct builds on a capacity of human intelligence: combining verbal reasoning with task-oriented actions, much as humans do when solving a problem.

Here is an example that was taken from the original paper that introduced the ReAct paradigm, published in 2022 (mentioned above) [7].

<img src="https://imgur.com/UQHxeic.png">

<div align='center'><font size="2">Figure-2: An example of how ReAct prompts the LLM through thoughts and actions and how it makes decisions about the manner of interacting with external data to collect observations.</font></div>
<div align='center'><font size="2">Source: ReAct: Synergizing Reasoning and Acting in Language Models, 2022. [7]</font></div> 
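
To make the Thought → Action → Observation loop concrete, here is a minimal Python sketch of the orchestration logic. Every component is a hypothetical stand-in. The "LLM" replays scripted outputs so that the code runs offline, the only tool is an exact-match lookup in a two-entry dictionary, and the `Search[...]` / `Finish[...]` action format is modeled on the paper's examples. Frameworks such as LangChain implement this pattern with real models and tools, and this sketch does not reproduce their API.

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-in for an external source (e.g., a web or Wikipedia search API)
KNOWLEDGE_BASE: Dict[str, str] = {
    "Eiffel Tower": "The Eiffel Tower is a wrought-iron tower in Paris.",
    "Paris": "Paris is the capital city of France.",
}


def search(query: str) -> str:
    """Toy tool: exact-match lookup in the in-memory knowledge base."""
    return KNOWLEDGE_BASE.get(query, f"No results for '{query}'.")


# Matches lines such as "Action 1: Search[Paris]" or "Action 3: Finish[France]"
ACTION_RE = re.compile(r"Action(?: \d+)?:\s*(\w+)\[(.*?)\]")


def react(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    """Thought -> Action -> Observation loop; the growing trace is fed back to the LLM."""
    context = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        output = llm(context)                  # expected: "Thought n: ...\nAction n: Tool[arg]"
        context += output + "\n"
        match = ACTION_RE.search(output)
        if match is None:
            raise ValueError(f"No parsable action at step {step}: {output!r}")
        tool, argument = match.groups()
        if tool == "Finish":
            return argument                    # the LLM decided the task is solved
        observation = search(argument) if tool == "Search" else f"Unknown tool '{tool}'."
        context += f"Observation {step}: {observation}\n"
    return "No answer within the step limit."


def scripted_llm(responses: List[str]) -> Callable[[str], str]:
    """Hypothetical fake LLM that replays fixed outputs and ignores the context."""
    replies = iter(responses)
    return lambda context: next(replies)


SCRIPT = [
    "Thought 1: I need to find where the Eiffel Tower is.\nAction 1: Search[Eiffel Tower]",
    "Thought 2: It is in Paris. Now I need the country of Paris.\nAction 2: Search[Paris]",
    "Thought 3: Paris is the capital of France, so the answer is France.\nAction 3: Finish[France]",
]
print(react("In which country is the Eiffel Tower?", scripted_llm(SCRIPT)))
```

With a real model, the context passed to `llm` would usually also begin with a few hand-written example trajectories, so that the model learns the expected Thought/Action format.
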

# <div id="4"></div><font >4. Fine-tuning LLMs: Low-rank Adaptation LoRA </font> 

When creating an LLM-powered app, working with pretrained LLMs can save us a lot of time. However, if our target domain uses uncommon terminology, we may need to fine-tune the LLM on specialized data. For instance, medical language uses many rare terms to describe diseases and procedures, and these terms may appear only rarely in the training datasets (web scrapes and book texts) used to train LLMs. For a medical LLM-powered app, fine-tuning on specialized data would therefore produce a better model.

A family of methods known as **parameter-efficient fine-tuning** trains only a small subset of task-specific layers and parameters while keeping most of the original LLM's weights frozen (or even all of them, as with LoRA; see below). Because all or most of the pretrained weights stay unchanged, these methods are less prone to catastrophic forgetting, the well-known phenomenon in which a fully fine-tuned LLM loses some of its original capabilities after its weights are updated.

There are several parameter-efficient fine-tuning methods. For this essay, I chose one that is an active and dynamic area of research: Low-Rank Adaptation, or LoRA for short.

**How does LoRA work?**

LoRA adds a small number of new parameters and trains only those, leaving the original model weights untouched (frozen).

LoRA injects two low-rank decomposition matrices alongside a weight matrix of the model (in the original paper, the weight matrices of the attention layers). Their dimensions are chosen so that their product has the same dimensions as the original weight matrix: for a d × k weight matrix W, LoRA adds B (d × r) and A (r × k), with a rank r much smaller than d and k, so that BA is d × k. As stated above, only these two low-rank matrices are trained on the new data (the new task), while the original LLM weights stay frozen.

For inference, the two trained low-rank matrices are multiplied to produce a matrix with the same dimensions as the frozen weights. This product is added to the original weights, and the model is updated with the result (the sum of the original weights and the trained update). The result is the LoRA-fine-tuned model that can perform our specific task. We can train a separate pair of LoRA matrices for each new task and dataset. At inference time, we simply add the pair for the task at hand to the original frozen weights and run the resulting model.
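
The following NumPy sketch illustrates these mechanics with assumed toy sizes: a 512 × 512 layer, rank r = 8, and the update scaled by α/r as in the LoRA paper, with a hypothetical α = 16. It uses the paper's initialization (A random, B zero, so the adapted layer starts out identical to the pretrained one). A training update is faked with random values, and the code checks that the merged matrix W + (α/r)·BA produces the same outputs as running the adapter separately. No gradient-based training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 512, 512, 8   # hypothetical layer shape (d x k) and LoRA rank r
alpha = 16              # hypothetical scaling hyperparameter; the update is scaled by alpha / r
scale = alpha / r

W = rng.normal(size=(d, k))              # frozen pretrained weights: never updated
A = rng.normal(scale=0.01, size=(r, k))  # trainable factor (r x k), random init
B = np.zeros((d, r))                     # trainable factor (d x r), zero init, so B @ A = 0 at first


def lora_forward(x: np.ndarray) -> np.ndarray:
    """h = W x + (alpha / r) * B A x. Only A and B would receive gradient updates."""
    return W @ x + scale * (B @ (A @ x))


x = rng.normal(size=k)
# Before any training, the adapted layer behaves exactly like the pretrained one
print(np.allclose(lora_forward(x), W @ x))

# Stand-in for training: pretend the optimizer has moved B away from zero
B = rng.normal(scale=0.01, size=(d, r))

# Inference: merge the low-rank update into one matrix with W's shape
W_merged = W + scale * (B @ A)
print(W_merged.shape == W.shape, np.allclose(W_merged @ x, lora_forward(x)))

full_params = d * k        # weights updated by full fine-tuning of this layer
lora_params = r * (d + k)  # weights in A and B
print(f"trainable fraction: {lora_params / full_params:.1%}")
```

In this toy layer, only 8,192 values in A and B would be trained, about 3% of the 262,144 weights in W. For real LLMs, whose weight matrices are much larger, the fraction is far smaller. Because merging produces a single matrix, the merged model runs at inference time without any extra latency from the adapter.
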

**How effective are these LoRA-fine-tuned models?**

The table below (Table-5) comes from the original LoRA paper, "LoRA: Low-Rank Adaptation of Large Language Models" (2021) [8]. It shows that a LoRA-fine-tuned GPT-3 (with only 37.7M trainable parameters) outperforms the fully fine-tuned GPT-3 (with its 175,255.8M trainable parameters) on WikiSQL and MNLI-m, where the evaluation metric is accuracy. An impressive result!

Another LoRA-fine-tuned GPT-3 (with only 4.7M trainable parameters) also outperforms the fully fine-tuned GPT-3 (175,255.8M parameters) on the SAMSum dataset, where the evaluation metrics are ROUGE-1, ROUGE-2, and ROUGE-L.
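
To put these parameter counts in perspective, here is the simple arithmetic on the numbers in Table-5. The trainable parameters of the two LoRA variants are a tiny fraction of those updated by full fine-tuning:

$$
\frac{37.7\,\text{M}}{175{,}255.8\,\text{M}} \approx 2.15 \times 10^{-4} \approx 0.0215\%
\qquad\qquad
\frac{4.7\,\text{M}}{175{,}255.8\,\text{M}} \approx 2.68 \times 10^{-5} \approx 0.0027\%
$$
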

<img src="https://imgur.com/rFbhbJo.png">

<div align='center'><font size="2">Table-5: Evaluation of LoRA-fine-tuned GPT-3 vs. other adaptation methods, including fully fine-tuned GPT-3, GPT-3 (FT)</font></div>
<div align='center'><font size="2">Source: LoRA: Low-Rank Adaptation of Large Language Models, 2021. [8]</font></div> 
<div align='center'><font size="1" color="#FFFFFF">.....</font></div>


A new LoRA variant, **QLoRA**, was introduced in the paper "QLoRA: Efficient Finetuning of Quantized LLMs" [9], released recently (May 2023). QLoRA trains LoRA matrices on top of a frozen pretrained model whose weights have been quantized to 4 bits. Quantization converts the model's weights to a lower-precision representation, which greatly reduces the memory they occupy.
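
To illustrate what quantization does, here is a NumPy sketch of simple blockwise "absmax" quantization to 4-bit signed integers. Each block of 64 weights is divided by a per-block scale (its largest absolute value divided by 7), rounded to integers in [-7, 7], and multiplied back by the scale when needed. This is a deliberate simplification. QLoRA itself uses a dedicated 4-bit NormalFloat (NF4) data type, and this sketch stores the 4-bit values in int8 arrays rather than packing two values per byte. The weight matrix is random, and its size must be a multiple of the block size.

```python
import numpy as np


def quantize_blockwise_absmax(w: np.ndarray, block_size: int = 64, bits: int = 4):
    """Symmetric absmax quantization: each block of weights gets its own scale."""
    qmax = 2 ** (bits - 1) - 1                         # 7 for 4-bit signed integers
    blocks = w.reshape(-1, block_size)                 # requires w.size % block_size == 0
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                          # guard against all-zero blocks
    q = np.clip(np.round(blocks / scales), -qmax, qmax).astype(np.int8)
    return q, scales


def dequantize(q: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    """Map the integers back to approximate weights (done on the fly during compute)."""
    return (q.astype(np.float32) * scales).reshape(shape)


rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)  # hypothetical weights
q, scales = quantize_blockwise_absmax(w)
w_hat = dequantize(q, scales, w.shape)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

The rounding error of each weight is at most half of its block's scale, while storage drops from 16 bits to 4 bits per weight (plus one scale per block). Using small blocks keeps one unusually large weight from inflating the scale, and therefore the error, for the whole matrix.
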

The authors named their best family of QLoRA-fine-tuned models **Guanaco**. In the paper's words: "Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU." Impressive findings.  
The paper also reports that QLoRA reduces memory usage enough to fine-tune a 65B-parameter model on a single 48GB GPU, which makes it one of the most practical ways to fine-tune large language models on a single GPU.

<img src="https://imgur.com/8oHLxua.png">

<div align='center'><font size="2">Table-6: Vicuna benchmark: Guanaco (QLoRA-fine-tuned model) vs. other LLMs</font></div>
<div align='center'><font size="2">Source: QLoRA: Efficient Finetuning of Quantized LLMs, May 2023. [9]</font></div> 


# <div id="5"></div><font >5. Fine-tuning LLMs with Human Feedback (RLHF) </font> 

The goal of "fine-tuning LLMs with human feedback" is to make LLMs behave in accordance with human preferences. For example, we want LLMs to behave as helpful, harmless, and honest assistants. Most research articles cite these three values (helpfulness, harmlessness, and honesty) when discussing fine-tuning with human feedback. For instance, the authors of the paper "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, 2022" [10] discuss these criteria and refer to them by the acronym "HHH".

Naturally, we may specify additional values that we want the LLM to adhere to in order to conform to human preferences (ensuring that it generates completions that maximize or minimize any criteria we set). As an illustration, I would like the LLM to be particularly tolerant (not prejudiced or racist).

Reinforcement learning from human feedback (RLHF) is the most popular method for fine-tuning LLMs with human feedback.

**RLHF** is an iterative process for fine-tuning LLMs to match human preferences. It has two central components: the **reward model** and the **reinforcement learning (RL) algorithm**. The reward model assigns a reward value to each completion generated by the LLM and passes this value to the RL algorithm. The RL algorithm then iteratively adjusts the LLM's weights to maximize that reward. Since the reward model is trained on human feedback, this pushes the LLM to produce text that is better aligned with the criteria we defined (for example, tolerance).

**How does RLHF really work?**  
There are many publications and articles about RLHF. I chose two main recent papers: "Training language models to follow instructions with human feedback, 2022" [11] and "Learning to summarize from human feedback, 2022" [12].  

I'll outline the steps in a straightforward yet thorough manner. 
- 1st step: Construct a dataset of many prompts and completions (generated by the LLM we want to fine-tune with RLHF).   


- 2nd step: Collect feedback from human labelers, who score these completions according to the criterion we defined (say, tolerance) by labeling each generated text as tolerant or non-tolerant (racist). In practice, this step is more complicated: for example, several labelers are assigned to each completion, and labelers are often asked to rank several completions of the same prompt. It is also time-consuming and very expensive. At the end of this step, we get the dataset used to build the reward model (one of the central components of the RLHF process), which will replace the human labelers during fine-tuning.  


- 3rd step: Build the reward model: a supervised learning model (for example, a classifier) trained on this human-labeled dataset to score completions.   


- 4th step: The reinforcement learning fine-tuning process begins by passing prompts to the LLM, which generates completions. The reward model assigns a reward value to each completion (high for more tolerant text, low for less tolerant text) and passes this value to the RL algorithm. The RL algorithm updates the LLM's weights to maximize the reward, pushing the LLM to generate text that is better aligned with the defined criterion (tolerance in this case). This is a single iteration.  


- 5th step: Step 4 is repeated iteratively, updating the LLM's weights at each iteration, until a predefined threshold is reached (for the criterion or for the number of iterations).
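
The loop in steps 4 and 5 can be sketched with a deliberately tiny stand-in. In this Python sketch, the "LLM" is a policy over three placeholder completions of a single prompt, and the reward model is a hard-coded score table rather than a trained model. The RL algorithm is plain REINFORCE (a basic policy-gradient method) with a running-average baseline, used here in place of PPO to keep the code short. The sketch only demonstrates the core idea: completions with higher rewards become more likely as the weights (here, three logits) are updated.

```python
import math
import random
from typing import List

# Hypothetical toy setup: the "LLM" is a policy over three canned completions of one prompt
COMPLETIONS = ["<tolerant reply>", "<neutral reply>", "<intolerant reply>"]


def reward_model(completion: str) -> float:
    """Stand-in for a trained reward model: fixed scores, for illustration only."""
    return {"<tolerant reply>": 1.0, "<neutral reply>": 0.2, "<intolerant reply>": -1.0}[completion]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rl_finetune(steps: int = 2000, lr: float = 0.1, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    logits = [0.0] * len(COMPLETIONS)  # the policy's only "weights": uniform at the start
    baseline = 0.0                     # running-average reward, reduces gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(COMPLETIONS)), weights=probs)[0]  # the "LLM" generates a completion
        reward = reward_model(COMPLETIONS[i])                        # the reward model scores it
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # REINFORCE update: d log p_i / d logit_j = (1 if j == i else 0) - p_j
        for j in range(len(logits)):
            logits[j] += lr * advantage * ((1.0 if j == i else 0.0) - probs[j])
    return softmax(logits)


final_probs = rl_finetune()
for text, p in zip(COMPLETIONS, final_probs):
    print(f"{p:.3f}  {text}")
```

In a real RLHF setup, the policy is the LLM itself, the update changes billions of weights through backpropagation, and PPO adds safeguards that keep each update small.
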


The final LLM, fine-tuned with human feedback using RL, is called a **human-aligned LLM**. Our ultimate goal is to ensure that the LLM behaves in an aligned manner in deployment!

There are several different RL algorithms that can be used in the RLHF process. The most common one is proximal policy optimization (PPO).  
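
For reference, here are two formulas as they commonly appear in the RLHF literature, written in this sketch's own notation. In [11] and [12], the reward optimized during RL fine-tuning combines the reward model's score $r_\phi(x, y)$ for a prompt $x$ and completion $y$ with a KL penalty of weight $\beta$. This penalty keeps the fine-tuned policy $\pi^{\mathrm{RL}}$ close to the initial supervised model $\pi^{\mathrm{SFT}}$. PPO then maximizes its clipped surrogate objective, in which $\hat{A}_t$ is an advantage estimate and $\epsilon$ is a small clipping range (for example, 0.2). The probability ratio is written here as $\rho_t$ to avoid a clash with the reward $r_\phi$.

$$
R(x, y) = r_\phi(x, y) - \beta \log \frac{\pi^{\mathrm{RL}}(y \mid x)}{\pi^{\mathrm{SFT}}(y \mid x)}
$$

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\rho_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(\rho_t(\theta), 1-\epsilon, 1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
\rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
$$

In the LLM setting, the state $s_t$ is the prompt plus the tokens generated so far, and the action $a_t$ is the next token. The clipping stops a single update from moving the policy too far from the one that generated the data.
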

Here is an illustration of the main steps of fine-tuning an LLM with human feedback using RL, derived from the paper "Learning to summarize from human feedback, 2022" [12]

<img src="https://imgur.com/OYGkdpn.png">

<div align='center'><font size="2">Figure-3: Steps of fine-tuning an LLM with human feedback using RL (RLHF)</font></div>
<div align='center'><font size="2">Source: Learning to summarize from human feedback, 2022 [12]
</font></div> 



# <div id="6"></div><font >6. Augmenting LLMs to create LLM-powered Apps : RAG</font>

Regularly retraining LLMs on new data is not a practical solution: the enormous number of GPUs running continuously would make it not only incredibly expensive but also a burden on the environment, given its significant carbon footprint.  
Implementing technologies that give LLMs access to external data at inference time is a more flexible and affordable way to get around knowledge cutoffs. The best example of this is the **RAG** framework: **Retrieval Augmented Generation.**

RAG is a framework for **creating LLM-powered apps** that use external data sources to overcome the LLM's limitations, such as information missing from the pretraining data (known in some publications as the knowledge cutoff problem). RAG can also give access to external APIs, and even to confidential information kept in the private databases of companies and organizations, while keeping that access restricted to them. This is made possible by an orchestration library, a layer that can enable powerful techniques to enhance the LLM's performance.  
Simply put, RAG is an excellent way to help the LLM update and broaden its view of the world!

RAG can be implemented in many ways, ranging from simple to complex. LangChain is a popular framework for building RAG applications, and it also implements the advanced prompting strategies, such as ReAct, that I discussed in section 3.

I'll describe the general architecture of the RAG framework proposed in the paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, 2021" [13], one of the earliest papers published about RAG.  

The **retriever** is the main component of this architecture. It is a combination of a query encoder and an external data source.

**How does it work?**

- 1st step: The encoder takes the user's prompt and converts it into a format that the external data source (for example, a database) can use. In the paper, the external source is a dense vector index of Wikipedia: the prompt is encoded as a vector (an embedding), and the vector store allows a fast and effective similarity-based search over the stored documents' embeddings. Embeddings are also a natural representation for language models, which internally work with vector representations of text.  


- 2nd step: The retriever then searches the external sources for relevant information that matches the query (the encoded user prompt).


- 3rd step: When relevant information is found in the external sources, the retriever adds it to the initial prompt.


- 4th step: The enhanced prompt (the original prompt plus the retrieved information) is passed to the LLM, which generates the final output (completion) for the user.
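
The four steps can be sketched in a few lines of Python. The sketch replaces the paper's neural query encoder and dense Wikipedia index with a toy bag-of-words "embedding" and cosine similarity over three hypothetical in-memory documents. It stops at building the augmented prompt, so the LLM call in step 4 is not shown. What carries over to real systems is the structure: encode the query, retrieve the closest documents, and augment the prompt.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical external data source: three short in-memory documents
DOCUMENTS = [
    "LoRA freezes the pretrained weights and trains two low-rank matrices.",
    "Chinchilla suggests roughly 20 training tokens per model parameter.",
    "ReAct interleaves reasoning traces with actions such as search.",
]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (word counts); real systems use a neural encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


INDEX = [(doc, embed(doc)) for doc in DOCUMENTS]  # the toy "vector store"


def retrieve(query: str, k: int = 1) -> List[str]:
    """Steps 1-2: encode the query and return the k most similar documents."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_augmented_prompt(query: str, k: int = 1) -> str:
    """Step 3: add the retrieved text to the user's prompt (step 4 would send it to an LLM)."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, k))
    return f"Answer using the context below.\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


print(build_augmented_prompt("How many training tokens per parameter does Chinchilla suggest?"))
```

In a production system, the document embeddings would be computed once and stored in a vector database. Only the query is encoded at request time.
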




This general diagram (Figure-4, below) is an illustration of the straightforward architecture of the Retrieval Augmented Generation framework (RAG).

<img src="https://imgur.com/yg8qK0p.jpg">

<div align='center'><font size="2">Figure-4: Retrieval Augmented Generation framework (RAG) </font></div>
<div align='center'><font size="2">Source: AWS documentation (Note: this diagram is not tied to a specific service on the platform; it is a general diagram, and RAG is a general framework.) [14]</font></div>



# <div id="7"></div><font >References</font> 

- [1] [Attention Is All You Need, 2017](https://arxiv.org/pdf/1706.03762.pdf): This paper introduced the Transformer architecture, with its central "self-attention" component. It serves as the foundation for LLMs.

- [2] [Chinchilla Paper: Training Compute-Optimal Large Language Models, 2022](https://arxiv.org/pdf/2203.15556.pdf): A DeepMind study that determines the ideal model size and dataset size (number of tokens) for training LLMs.

- [3] [LLaMA: Open and Efficient Foundation Language Models, 2023](https://arxiv.org/pdf/2302.13971.pdf): This Meta AI paper proposes efficient LLMs.

- [4] [What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?, 2022](https://arxiv.org/pdf/2204.05832.pdf): This paper identifies the best strategy for zero-shot generalization by examining modeling choices in large pretrained language models.

- [5] [Language Models are Few-Shot Learners, 2020](https://arxiv.org/pdf/2005.14165.pdf): This paper examines the potential of few-shot prompting in LLMs.

- [6] [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, 2023](https://arxiv.org/pdf/2201.11903.pdf): Published by researchers at Google, this paper demonstrates how chain-of-thought prompting can improve the reasoning abilities of LLMs.

- [7] [ReAct: Synergizing Reasoning and Acting in Language Models, 2022](https://arxiv.org/abs/2210.03629): This paper presents the ReAct paradigm, an advanced prompting technique.

- [8] [LoRA: Low-Rank Adaptation of Large Language Models, 2021](https://arxiv.org/pdf/2106.09685.pdf): The original paper that proposed the LoRA fine-tuning method, which belongs to the parameter-efficient fine-tuning family.

- [9] [QLoRA: Efficient Finetuning of Quantized LLMs, 2023](https://arxiv.org/pdf/2305.14314.pdf): This paper proposes QLoRA, a variant of the LoRA fine-tuning method based on quantization, which makes fine-tuning on a single GPU possible.

- [10] [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, 2022](https://arxiv.org/pdf/2204.05862.pdf): Fine-tuning LLMs with preference modeling and reinforcement learning so that they act as helpful and harmless assistants.

- [11] [Training language models to follow instructions with human feedback, 2022](https://arxiv.org/pdf/2203.02155.pdf): OpenAI's paper about fine-tuning LLMs with human feedback.

- [12] [Learning to summarize from human feedback, 2022](https://arxiv.org/pdf/2009.01325.pdf): OpenAI's paper about improving LLM-generated summaries using a reward model trained on human feedback.

- [13] [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, 2021](https://arxiv.org/pdf/2005.11401.pdf): One of the earliest papers published about RAG (Meta AI Research).

- [14] [Retrieval Augmented Generation (RAG)](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html): AWS documentation about RAG, and the source of the diagram presented in section 6 (Figure-4).

### <div id="5"></div><font color="#800080">Notebook's Author</font>  
**<font color="#800080" > Abir ELTAIEF </font>**

**Final Submission**

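```python
import pandas as pd

# Load the competition's sample submission file
submission = pd.read_csv('/kaggle/input/2023-kaggle-ai-report/sample_submission.csv')

# Write through a single .iloc[row, col] call: chained indexing such as
# submission.iloc[0]['value'] = ... may modify a temporary copy instead of `submission`
submission['value'] = submission['value'].astype(object)  # allow string values
value_col = submission.columns.get_loc('value')

# Topic of the essay
submission.iloc[0, value_col] = "Text data"

# My essay
submission.iloc[1, value_col] = 'https://www.kaggle.com/code/abireltaief/contemporary-large-language-models-llms'

# 1st peer-feedback
submission.iloc[2, value_col] = "https://www.kaggle.com/code/ashusma/progress-in-transformer-based-language-model/comments#2335032"

# 2nd peer-feedback
submission.iloc[3, value_col] = "https://www.kaggle.com/code/mistylight/mini-giants-small-language-models/comments#2338428"

# 3rd peer-feedback
submission.iloc[4, value_col] = "https://www.kaggle.com/code/kmldas/llms-for-social-good-india-case-study/comments#2340849"

# Save the final submission without the index column
submission.to_csv('final_submission.csv', index=False)
```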