# LLM Prompting

Concepts:
- Financial news sentiment
- Large Language Models
- Prompt design

References:
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2018, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Greg Durrett, 2023, "CS388 Natural Language Processing course materials", retrieved from https://www.cs.utexas.edu/~gdurrett/courses/online-course/materials.html

In [1]:
import pandas as pd
from pandas import DataFrame, Series
import ollama
from secret import paths

## Large language models

Large language models (LLMs) are built using deep learning techniques and are pre-trained on vast amounts of text data. These models typically use architectures like Transformers, which allow them to capture long-range dependencies in text, making them particularly powerful for understanding context. For example, the latest (2024) OpenAI GPT-4 models can process input with a __context length__ of 128K __tokens__ (tokens are how LLMs represent the fundamental units of text, which can be as small as single characters or as large as whole words). With millions, billions, and now trillions of model parameters (i.e., the adjustable weights in their deep neural networks), LLMs appear to learn the nuances of language, grammar, facts, and some reasoning abilities.

- __Pre-training__ is the initial phase where a large language model (LLM) is trained on a massive dataset to learn the general structure and patterns of language. It learns to predict the next word in a sentence, thereby understanding grammar, facts about the world, and some reasoning abilities.
This process builds a foundational base that can be fine-tuned for specific tasks later.

- __Instruction-tuning__ further trains the model on specific types of instructions or tasks to perform. The training data set includes instructions and examples that guide the model on how to respond or act in certain scenarios.

- __Reinforcement Learning from Human Feedback__ (RLHF) is a technique where the LLM is improved based on feedback from human reviewers. After humans review the model’s outputs and provide feedback on their quality, the model is trained to optimize its responses according to this feedback, reinforcing good behaviors and correcting poor ones.
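
The next-word prediction objective used in pre-training can be illustrated with a toy sketch. This is not how an LLM is actually trained: a real model learns neural network weights, whereas the hypothetical `train_bigram` and `predict_next` helpers below simply count which word follows which in a tiny made-up corpus.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up sentences, for illustration only
corpus = [
    "net sales increased in the third quarter",
    "net sales decreased in the paper segment",
    "operating profit increased in the third quarter",
]
bigram_counts = train_bigram(corpus)
print(predict_next(bigram_counts, "net"))   # sales
print(predict_next(bigram_counts, "the"))   # third
```

An LLM replaces the lookup table with a Transformer that conditions on the entire preceding context rather than a single previous word.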


__Closed-source LLMs__ are proprietary models developed, maintained, and controlled by large tech companies or organizations, such as GPT-4 (OpenAI), Gemini (Google) and Claude (Anthropic). On the other hand, __open-source LLMs__ are characterized by their publicly accessible source code, which invites users to share knowledge and innovations.

- BERT (Bidirectional Encoder Representations from Transformers), released by Google in October 2018 not long after the seminal "Attention is All You Need" paper, was one of the original Transformer-based LLM architectures and achieved state-of-the-art results on a range of NLP tasks.

- Llama-2 and Llama-3

- Alpaca, developed at Stanford and based on Meta's original LLaMA 7B model, was fine-tuned on a dataset of instruction-following examples to enhance its ability to understand and respond to specific prompts and tasks. It aimed to match or exceed the capabilities of larger models while being more efficient and accessible.

- GPT (Generative Pre-trained Transformer) models, developed by OpenAI, include GPT-2 (released in February 2019), GPT-3 (June 2020), GPT-4 (March 2023) and GPT-4o (May 2024).

| LLM | Number of Parameters | Context Length |
| --- | --- | --- |
| BERT-Base  | 110 million | 512 |
| BERT-Large | 340 million | 512 |
| Llama-2-7B | 7 billion | 4K | 
| Alpaca | 7 billion | 2K |
| Llama-3-8B | 8 billion | 8K | 
| Llama-3-70B | 70 billion | 8K | 
| GPT-2 | 1.5 billion | 1K |
| GPT-3.5 | 175 billion | 4K | 
| GPT-4 | ~1 trillion | 128K |



### Llama-3 LLM

Llama-3 can be downloaded for free from Meta's website, as well as other platforms such as HuggingFace, in two different parameter sizes: 8 billion (8B) and 70 billion (70B). It is offered in two variants: pre-trained, which is a basic model for next token prediction, and instruction-tuned, which is fine-tuned to adhere to user commands. Both versions have a context limit of 8,192 tokens.

- https://ai.meta.com/blog/meta-llama-3/
- https://ollama.com/library/llama3
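
The two parameter sizes matter in practice because they largely determine whether a model fits in local memory. The sketch below is a rough back-of-the-envelope estimate that counts the weights only and ignores activations, the attention cache, and runtime overhead. The helper `weight_memory_gb` and the choice of 16-bit and 4-bit precision are illustrative assumptions, not figures taken from Meta or Ollama.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate size of the model weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

for name, n_params in [("Llama-3-8B", 8e9), ("Llama-3-70B", 70e9)]:
    for bits in (16, 4):  # assumed precisions, for illustration
        size = weight_memory_gb(n_params, bits)
        print(f"{name} at {bits}-bit: ~{size:.0f} GB")
```

Under these assumptions the 8B model needs roughly 16 GB at 16-bit precision and roughly 4 GB at 4-bit, while the 70B model needs roughly 140 GB and 35 GB respectively.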

### Ollama server

The Ollama package helps us run large language models on our local machines, and makes experimentation more accessible.  It provides a simple API for creating, running, and managing models, as well as a library of pre-built models.


https://github.com/ollama/ollama

1. Install Ollama (https://ollama.com/)
   - `curl -fsSL https://ollama.com/install.sh | sh`

2. Pull a model
   - `ollama pull llama3:instruct`

   On Linux, the pulled models are stored at `/usr/share/ollama/.ollama/models`.

3. Serve an LLM

   - `ollama serve` - may not use GPU?!

   - `ollama run llama3:instruct` - use GPU

4. Linux service
```
# sudo systemctl status ollama # service status
# sudo systemctl disable ollama # disable so it does not start up again upon reboot
# sudo systemctl stop ollama # stop service
# sudo systemctl restart ollama # restart service
# sudo rm /etc/systemd/system/ollama.service # delete service file
# sudo rm $(which ollama) # remove ollama binary
```

5. Endpoint
   - `curl http://localhost:11434/api/generate -d '{"model": "llama3:instruct", "prompt":"Why is the sky blue?"}'`
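
By default, the `/api/generate` endpoint streams its answer as newline-delimited JSON objects, each carrying a fragment of text in a `response` field and a `done` flag. The sketch below shows one way to reassemble such a stream. The `join_stream` helper and the sample lines are hypothetical, written to resemble the streamed output rather than captured from a real server.

```python
import json

def join_stream(lines):
    """Concatenate the "response" fragments of a newline-delimited JSON stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk is flagged with "done": true
    return "".join(parts)

# Hypothetical sample, shaped like the streamed output of /api/generate
sample = [
    '{"model": "llama3:instruct", "response": "The sky", "done": false}',
    '{"model": "llama3:instruct", "response": " appears blue", "done": false}',
    '{"model": "llama3:instruct", "response": ".", "done": true}',
]
print(join_stream(sample))  # The sky appears blue.
```

Adding `"stream": false` to the request body should return a single JSON object instead, which avoids the need to reassemble fragments.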



In [2]:
for _ in range(3):
    output = ollama.generate(model="llama3:instruct",
                        prompt="Are you Llama-2 or Llama-3?")
    print(output['response'])

I'm just an AI, not a llama at all! I don't have a version number like LLaMA-2 or LLaMA-3. I'm a language model trained by Meta AI that can generate human-like text responses to user input. Each interaction with me is unique and doesn't rely on specific versions or iterations. So, feel free to chat with me anytime!
I am LLaMA-1. I'm the first iteration of this AI model, and I'm still learning and improving every day. I don't have as much training data as some other language models, but I'm designed to be more conversational and engaging. LLaMA-2 and -3 are future iterations that will have even more advanced capabilities!
I am LLaMA, the third generation of the LLaMA AI models. I'm a large language model trained by Meta AI that can understand and respond to human input in a conversational manner. My training data includes a massive corpus of text from the internet, which I use to generate human-like responses to user input.


### NLP tasks

These tasks play a crucial role in the field of natural language processing, challenging and contributing to applications that enhance how machines understand and interact with human language. The performance of LLMs on these tasks is commonly evaluated using large benchmark datasets and then ranked on leaderboards. These benchmarks include MMLU (undergraduate-level knowledge), GSM-8K (grade-school math), HumanEval (coding), GPQA (graduate-level questions), and MATH (math word problems). However, because the models were trained on large datasets comprising web text and scientific repositories, the interpretation of these results should be tempered by the risk that some benchmark examples inadvertently found their way into the training set.

- Natural Language Inference (NLI), also known as textual entailment, is the task of determining the relationship between two sentences, i.e. predict whether one sentence (the hypothesis) logically follows from another sentence (the premise).

- Named Entity Recognition (NER) involves identifying and classifying named entities within a text into predefined categories such as person names, organizations, locations, dates, etc.

- Text Generation is the process of generating coherent and contextually relevant text given a certain input or prompt.

- Machine translation is the task of automatically translating text from one language to another.

- Text Summarization involves creating a concise summary of a longer text while preserving its key information and meaning.

- Reading comprehension requires models to read a passage of text and answer questions about it, demonstrating understanding of the text. Some challenges when developing and evaluating reading comprehension models include:
  
  - Dataset artifacts, which are spurious patterns in the training data that models exploit as shortcuts, producing answers that do not reflect the true content of the text.
  - Adversarial attacks, which are instances where models fail due to intentional manipulation or perturbation of the input, aiming to mislead or deceive the model.
  - Multihop reasoning, which refers to the ability of a model to connect multiple pieces of information or "hops" across the text to arrive at an answer.

- Question-Answering (QA) systems automatically answer questions posed by humans in natural language, either based on a given context or dataset (known as closed-QA) or on diverse topics from any domain (open-QA).

- Sentiment analysis is the task of determining the sentiment or emotional tone expressed in a piece of text, such as positive, negative, or neutral.






Kaggle is an online community for data scientists and machine learning engineers. It is known for publishing large datasets that are often used in competitions to solve data science challenges.
The dataset used here, hosted on Kaggle, contains sentiment labels for financial news headlines from the perspective of retail investors.


https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-for-financial-news


- Malo, P., Sinha, A., Takala, P., Korhonen, P. and Wallenius, J. (2014): “Good debt or bad debt: Detecting semantic orientations in economic texts.” Journal of the American Society for Information Science and Technology.


In [3]:
news = pd.read_csv(paths['data'] / 'all-data.csv',
                   names=["sentiment", "text"],
                   encoding="utf-8",
                   encoding_errors="replace")
news

Unnamed: 0,sentiment,text
0,neutral,"According to Gran , the company has no plans t..."
1,neutral,Technopolis plans to develop in stages an area...
2,negative,The international electronic industry company ...
3,positive,With the new production plant the company woul...
4,positive,According to the company 's updated strategy f...
...,...,...
4841,negative,LONDON MarketWatch -- Share prices ended lower...
4842,neutral,Rinkuskiai 's beer sales fell by 6.5 per cent ...
4843,negative,Operating profit fell to EUR 35.4 mn from EUR ...
4844,negative,Net sales of the Paper segment decreased to EU...


In [4]:
positive = list(news.index[news['sentiment'].eq('positive')])
neutral = list(news.index[news['sentiment'].eq('neutral')])
negative = list(news.index[news['sentiment'].eq('negative')])
for sentiment in [positive, neutral, negative]:
    for i in range(5):
        print(news.iloc[sentiment[i]]['text'])
        print(f"   => true label={news.iloc[sentiment[i]]['sentiment']}")

With the new production plant the company would increase its capacity to meet the expected increase in demand and would improve the use of raw materials and therefore increase the production profitability .
   => true label=positive
According to the company 's updated strategy for the years 2009-2012 , Basware targets a long-term net sales growth in the range of 20 % -40 % with an operating profit margin of 10 % -20 % of net sales .
   => true label=positive
FINANCING OF ASPOCOMP 'S GROWTH Aspocomp is aggressively pursuing its growth strategy by increasingly focusing on technologically more demanding HDI printed circuit boards PCBs .
   => true label=positive
For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m .
   => true label=positive
In the third quarter of 2010 , net sales increased by 5.2 % to EUR 205.5 mn , and operating profit by 34.9 % to 

## Prompt design

https://llama.meta.com/docs/how-to-guides/prompting/


### Zero-shot prompt


In [40]:
def generate_prompt(text):
    return f"""
Text: {text}
Sentiment:""".strip()

for sentiment in [positive, neutral, negative]:
    for i in range(5):
        s = generate_prompt(news.iloc[sentiment[i]]['text'])
        output = ollama.generate(model="llama3:instruct", prompt=s, options={"temperature":0})
        print(f">>> {output['response']} [true label={news.iloc[sentiment[i]]['sentiment']}]")
        print()

>>> Positive [true label=positive]

>>> Positive sentiment. The text mentions specific goals and targets that the company, Basware, aims to achieve, indicating a sense of optimism and confidence in their strategy. [true label=positive]

>>> Positive. The text suggests that Aspocomp is actively working towards its growth strategy, which implies a sense of enthusiasm and optimism about the company's future prospects. The use of words like "aggressively pursuing" and "technologically more demanding" also convey a sense of confidence and ambition. Overall, the sentiment is positive and forward-looking. [true label=positive]

>>> Positive. The text reports a significant increase in net sales and a transition from a pre-tax loss to a pre-tax profit, indicating a positive financial performance for the company. [true label=positive]

>>> Positive sentiment.

The text mentions an increase in net sales (5.2%) and operating profit (34.9%), which suggests a positive trend for the company's financi

Prompt instructions should be written clearly, and output requirements, such as JSON format, should be specified explicitly.

In [41]:
def generate_prompt(text):
    return f"""
Classify the sentiment of the following text as "positive" or "neutral" or "negative".
Provide your output in json format. Do not provide any other answer.

Text: {text}
Sentiment:""".strip()

for sentiment in [positive, neutral, negative]:
    for i in range(5):
        s = generate_prompt(news.iloc[sentiment[i]]['text'])
        output = ollama.generate(model="llama3:instruct", prompt=s, options={"temperature":0})
        print(f"{output['response']}, true label={news.iloc[sentiment[i]]['sentiment']}")



{"sentiment": "positive"}, true label=positive
{"sentiment": "neutral"}, true label=positive
{"sentiment": "positive"}, true label=positive
{"sentiment": "positive"}, true label=positive
{"sentiment": "neutral"}, true label=positive
{"sentiment": "neutral"}, true label=neutral
{"sentiment": "neutral"}, true label=neutral
{"sentiment": "neutral"}, true label=neutral
{"sentiment": "neutral"}, true label=neutral
{"sentiment": "positive"}, true label=neutral
{"sentiment": "negative"}, true label=negative
{"sentiment": "negative"}, true label=negative
{"sentiment": "negative"}, true label=negative
{"sentiment": "negative"}, true label=negative
{"sentiment": "negative"}, true label=negative
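
Structured replies like those above are easy to parse and score programmatically. The following is a minimal sketch, assuming each reply is a JSON object with a `sentiment` key. The `parse_sentiment` helper is hypothetical, and the replies and true labels are copied from the output printed above.

```python
import json

LABELS = {"positive", "neutral", "negative"}

def parse_sentiment(response):
    """Extract the "sentiment" value from a JSON reply, or None if unusable."""
    try:
        value = json.loads(response).get("sentiment")
    except (json.JSONDecodeError, AttributeError):
        return None  # not valid JSON, or not a JSON object
    if isinstance(value, str) and value.lower() in LABELS:
        return value.lower()
    return None

# Replies and true labels copied from the output above
predicted = (["positive", "neutral", "positive", "positive", "neutral"]
             + ["neutral"] * 4 + ["positive"]
             + ["negative"] * 5)
json_replies = [json.dumps({"sentiment": p}) for p in predicted]
truth = ["positive"] * 5 + ["neutral"] * 5 + ["negative"] * 5

n_correct = sum(parse_sentiment(r) == t for r, t in zip(json_replies, truth))
accuracy = n_correct / len(truth)
print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.80
```

Returning `None` for anything that is not a recognized label means a free-text reply counts as an error instead of raising an exception in the middle of an evaluation loop.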


### Few-shot prompt


Adding specific examples of your desired output generally results in a more accurate, consistent output. This is called few-shot or __in-context learning__ through prompt design: instead of fine-tuning (and altering the pretrained neural network weights of) the model with new training examples, the model figures out how to perform well on that task simply by taking a few task-specific examples as input.



In [50]:
N = 3
examples = positive[-N:] + neutral[-N:] + negative[-N:]
examples = list(news.iloc[examples].itertuples(index=False))
recs = {'positive': 'buy', 'neutral': 'hold', 'negative': 'sell'}
line1 = 'Text in triple quotes:'
line2 = 'Recommendation:'

def generate_prompt(text, examples):
    shots = "\n\n".join([f"{line1} '''{t}'''\n{line2} {recs[s]}"
                         for s,t in examples])
    return f"""
Here are {len(examples)} examples of making a recommendation based on the
sentiment of the given text delimited with triple quotes.

{shots}

In one word only, provide a recommendation based on the sentiment of 
the following text delimited with triple quotes.  
{line1} '''{text}'''
{line2}""".strip()

In [51]:
for sentiment in [positive, neutral, negative]:
    for i in range(5):
        s = generate_prompt(news.iloc[sentiment[i]]['text'], examples)
        output = ollama.generate(model="llama3:instruct", prompt=s, options={"temperature":0})        
        print(f"{output['response']}, true label={news.iloc[sentiment[i]]['sentiment']}")

Buy, true label=positive
Buy, true label=positive
Buy, true label=positive
buy, true label=positive
Buy, true label=positive
Hold, true label=neutral
Buy, true label=neutral
Buy, true label=neutral
Hold, true label=neutral
Buy, true label=neutral
Sell, true label=negative
Sell, true label=negative
Sell, true label=negative
Sell, true label=negative
Sell, true label=negative
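
The replies above vary in capitalization ("Buy" versus "buy"), so they need normalizing before they can be compared with the true labels. The sketch below maps each one-word recommendation back to a sentiment label and tallies per-class hits. The `normalize` helper is hypothetical, and the replies are copied from the output above.

```python
from collections import Counter

# Same mapping as in the few-shot prompt, inverted to recover the label
rec_to_label = {"buy": "positive", "hold": "neutral", "sell": "negative"}

def normalize(response):
    """Map a one-word reply such as "Buy" or " buy." to a sentiment label."""
    word = response.strip().strip(".!\"'").lower()
    return rec_to_label.get(word)  # None if the reply is not a known word

# Replies and true labels copied from the output above
few_shot_replies = (["Buy", "Buy", "Buy", "buy", "Buy"]
                    + ["Hold", "Buy", "Buy", "Hold", "Buy"]
                    + ["Sell"] * 5)
true_labels = ["positive"] * 5 + ["neutral"] * 5 + ["negative"] * 5

hits, totals = Counter(), Counter()
for reply, label in zip(few_shot_replies, true_labels):
    totals[label] += 1
    hits[label] += normalize(reply) == label  # True counts as 1

for label in totals:
    print(f"{label}: {hits[label]}/{totals[label]} correct")
```

On this small sample, the positive and negative headlines are all classified correctly, while only 2 of the 5 neutral headlines are, with the rest drifting toward "buy".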


Display the text of the few-shot prompt

In [52]:
print(s)

Here are 9 examples of making a recommendation based on the
sentiment of the given text delimited with triple quotes.

Text in triple quotes: '''Danske Bank A-S DANSKE DC jumped 3.7 percent to 133.4 kroner , rebounding from yesterday s 3.5 percent slide .'''
Recommendation: buy

Text in triple quotes: '''Our superior customer centricity and expertise in digital services set us apart from our competitors .'''
Recommendation: buy

Text in triple quotes: '''The 2015 target for net sales has been set at EUR 1bn and the target for return on investment at over 20 % .'''
Recommendation: buy

Text in triple quotes: '''It holds 38 percent of Outokumpu 's shares and voting rights , but in 2001 lawmakers gave it permission to reduce the stake to 10 percent .'''
Recommendation: hold

Text in triple quotes: '''Mobile communication and wireless broadband provider Nokia Inc NYSE : NOK today set new financial targets and forecasts for Nokia and the mobile device industry and also for Nokia Siemens Net

### Chain of thought

Providing a series of prompts or questions helps guide the model's thinking, and can generate a more coherent and relevant response.


In [45]:
def generate_prompt(text):
    return f"""
You are a financial market analyst who makes a recommendation based on the 
sentiment of the text given in triple quotes. Begin with:
1. Write a summary of the text in about 10 words.
2. Explain its impact on stock price.
3. Provide your assumptions.
4. Describe what might go wrong.
5. Finally, give a recommendation to "buy" or "hold" or "sell".
Make your recommendation funny.

Text: '''{text}'''
Recommendation:""".strip()

In [46]:
for sentiment in [positive, neutral, negative]:
    for i in range(1):
        s = generate_prompt(news.iloc[sentiment[i]]['text'])
        print('==================================')
        print(news.iloc[sentiment[i]]['text'])
        print('----------------------------------')
        output = ollama.generate(model="llama3:instruct", prompt=s, options={"temperature":0})
        print(f"{output['response']}")
        print()
        print()

With the new production plant the company would increase its capacity to meet the expected increase in demand and would improve the use of raw materials and therefore increase the production profitability .
----------------------------------
Here's my analysis:

**Summary:** Company boosts production, efficiency, and profits with new plant.

**Impact on stock price:** This news is a big positive for the company's stock. Investors will likely respond favorably to increased capacity, reduced waste, and higher profit margins. I expect the stock price to rise 5-7% in the short term (next quarter).

**Assumptions:**

1. The new plant will be operational within the next 6 months.
2. Demand for the company's products will indeed increase as expected.
3. The company will successfully implement process improvements and reduce waste.

**What might go wrong:**

1. Delays in plant construction or commissioning, which could impact production timelines.
2. Unforeseen issues with raw material sourcin