<img src="https://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# NLP Basics

**Transformers**

&copy; Dr. Yves J. Hilpisch

<a href="https://tpq.io" target="_blank">https://tpq.io</a> | <a href="https://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:team@tpq.io">team@tpq.io</a>

_Code primarily from ChatGPT_.

## `transformers` Package

_From ChatGPT_.

The **`transformers`** Python package, developed by [Hugging Face](https://huggingface.co/), is one of the most widely used libraries for working with transformer models in Natural Language Processing (NLP). It provides a wide range of pre-trained models and tools for integrating state-of-the-art machine learning models into your projects.

### Key Features:

- **Wide Range of Models:**
  - Supports popular transformer architectures such as BERT, GPT-2, RoBERTa, T5, DistilBERT, and many others.
  - Provides thousands of pre-trained models fine-tuned on various tasks and languages.
  
- **Framework Compatibility:**
  - Works seamlessly with both **PyTorch** and **TensorFlow**, allowing you to choose your preferred deep learning framework.
  
- **Easy Fine-Tuning:**
  - Simplifies the process of fine-tuning pre-trained models on custom datasets for tasks like text classification, named entity recognition, question answering, and more.
  
- **Extensive Documentation and Community Support:**
  - Comes with comprehensive documentation, tutorials, and an active community, making it easier to get started and find solutions to common issues.
  
- **Pipeline API:**
  - Offers a high-level API called `pipeline` that allows you to perform tasks with just a few lines of code.

### Installation:

You can install the `transformers` package using `pip`:

```bash
pip install transformers
```

The models need a deep learning backend to run, so make sure that PyTorch or TensorFlow is installed as well.
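
To check which backend is present before loading any model, a minimal sketch using only the standard library might look like this. It assumes the usual import names `torch` and `tensorflow` and only tests whether the packages can be found, not whether they work with your installed `transformers` version.

```python
from importlib.util import find_spec

# Import names of the two candidate backends
backends = ('torch', 'tensorflow')

# find_spec returns None when a top-level package cannot be found
available = {name: find_spec(name) is not None for name in backends}

for name, found in available.items():
    print(f"{name}: {'installed' if found else 'not found'}")
```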

### Basic Usage Example:

Here's a simple example of how to use the `transformers` library for text generation using GPT-2:

```python
from transformers import pipeline

# Initialize the text generation pipeline with a pre-trained GPT-2 model
generator = pipeline('text-generation', model='gpt2')

# Generate text based on a prompt
output = generator("Once upon a time", max_length=50, num_return_sequences=1)

print(output[0]['generated_text'])
```

### Resources:

- **GitHub Repository:**  
  [https://github.com/huggingface/transformers](https://github.com/huggingface/transformers)

- **Documentation:**  
  [https://huggingface.co/docs/transformers/index](https://huggingface.co/docs/transformers/index)

- **Tutorials and Guides:**  
  [https://huggingface.co/transformers/quickstart.html](https://huggingface.co/transformers/quickstart.html)

### Why Use the `transformers` Library?

- **State-of-the-Art Performance:** Easily leverage models that achieve top performance on various NLP benchmarks.
- **Time and Resource Efficient:** Save time on training models from scratch by using pre-trained models that can be fine-tuned to your specific needs.
- **Versatility:** Applicable to a wide array of tasks including but not limited to text classification, translation, summarization, and question answering.

## Use Cases

In [None]:
!git clone https://github.com/tpq-classes/natural_language_processing.git
import sys
sys.path.append('natural_language_processing')


In [None]:
# !pip install tensorflow
# !pip install tf-keras
# !pip install transformers

In [None]:
import tensorflow as tf
import transformers
tf.__version__, transformers.__version__

In [None]:
import warnings
warnings.simplefilter('ignore')
from transformers import logging as transformers_logging

# Set the logging level to ERROR to suppress INFO and WARNING messages
transformers_logging.set_verbosity_error()

### Sentiment Analysis

In [None]:
from transformers import pipeline

# Initialize the sentiment analysis pipeline
sentiment_analyzer = pipeline('sentiment-analysis')

# Analyze sentiment
result = sentiment_analyzer("I love using the transformers library!")[0]

print(f"Label: {result['label']}, Score: {round(result['score'], 4)}")

In [None]:
result

In [None]:
# Analyze sentiment
result = sentiment_analyzer("I only had issues with other packages!")[0]

print(f"Label: {result['label']}, Score: {round(result['score'], 4)}")

In [None]:
# Analyze sentiment
result = sentiment_analyzer("I work with multiple such packages.")[0]

print(f"Label: {result['label']}, Score: {round(result['score'], 4)}")
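
The `score` in these results is a probability for the predicted label. For a two-class classifier it is typically obtained by applying a softmax to the raw model outputs (logits). The following sketch illustrates that step in plain Python; the logits are made-up numbers, not actual model output, and the names `toy_logits` and `toy_result` are chosen for illustration only.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    largest = max(logits)
    # Subtracting the maximum improves numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sentence (illustrative values only)
toy_labels = ['NEGATIVE', 'POSITIVE']
toy_logits = [-1.2, 2.3]

probs = softmax(toy_logits)

# Pick the label with the highest probability
best = max(range(len(toy_labels)), key=lambda i: probs[i])
toy_result = {'label': toy_labels[best], 'score': probs[best]}

print(f"Label: {toy_result['label']}, Score: {round(toy_result['score'], 4)}")
```

A score close to 0.5 therefore signals that the model is undecided between the two labels, which is worth keeping in mind for neutral sentences like the last one above.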

### Named Entity Recognition (NER)

In [None]:
# Initialize the NER pipeline
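# aggregation_strategy='simple' merges sub-word tokens into whole entities
# (it replaces the deprecated grouped_entities=True)
ner_tagger = pipeline('ner', aggregation_strategy='simple')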

# Perform NER
text = "Barack Obama was born in Hawaii."
entities = ner_tagger(text)

for entity in entities:
    print(f"Entity: {entity['entity_group']}, Word: {entity['word']}")

In [None]:
entities

In [None]:
# Perform NER
text = "Olaf Scholz is chancellor of Germany."
entities = ner_tagger(text)

for entity in entities:
    print(f"Entity: {entity['entity_group']}, Word: {entity['word']}")
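
The grouped output above combines tokens that belong to the same entity. A simplified sketch of that idea, assuming token-level tags in the common `B-`/`I-`/`O` scheme, is shown below. The tags are written by hand for illustration, and `group_entities` is a hypothetical helper, not the library's implementation.

```python
def group_entities(tagged_tokens):
    """Merge directly adjacent tokens that carry the same entity label."""
    groups = []
    previous_label = None  # label of the directly preceding token, if any
    for word, tag in tagged_tokens:
        if tag == 'O':  # token outside of any entity
            previous_label = None
            continue
        prefix, label = tag.split('-', 1)
        if prefix == 'I' and previous_label == label:
            # Continuation of the current entity
            groups[-1]['word'] += ' ' + word
        else:
            # Start of a new entity
            groups.append({'entity_group': label, 'word': word})
        previous_label = label
    return groups


# Hand-written tags for illustration (not model output)
toy_tokens = [('Barack', 'B-PER'), ('Obama', 'I-PER'), ('was', 'O'),
              ('born', 'O'), ('in', 'O'), ('Hawaii', 'B-LOC'), ('.', 'O')]

toy_entities = group_entities(toy_tokens)

for entity in toy_entities:
    print(f"Entity: {entity['entity_group']}, Word: {entity['word']}")
```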

### Question Answering

In [None]:
# Initialize the question-answering pipeline
qa_pipeline = pipeline('question-answering')

# Define context and question
context = "Transformers are models that process words in relation to all other words in a sentence."
question = "What are transformers?"

# Get the answer
answer = qa_pipeline(question=question, context=context)

print(f"Answer: {answer['answer']}")

In [None]:
with open('article.txt', 'r') as f:
    context = f.read()

In [None]:
# print(context)

In [None]:
question = "How much capital did OpenAI raise?"

In [None]:
# Get the answer
answer = qa_pipeline(question=question, context=context)

print(f"Answer: {answer['answer']}")
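
The question-answering models used here are extractive: the answer is a span copied from the context, not newly generated text. A common way to select the span is to score every token as a possible start and as a possible end, and then take the pair with the highest combined score. The sketch below shows that selection with made-up scores; `best_span` and `max_answer_len` are hypothetical names, and the actual pipeline applies further steps not shown here.

```python
def best_span(start_scores, end_scores, max_answer_len=5):
    """Return the (start, end) pair with the highest combined score."""
    best = (0, 0)
    best_score = float('-inf')
    for i, start in enumerate(start_scores):
        # The end must not precede the start; the span length is capped
        last = min(i + max_answer_len, len(end_scores))
        for j in range(i, last):
            score = start + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best, best_score


# Made-up scores for illustration (not model output)
toy_context_tokens = ['Transformers', 'are', 'models', 'that', 'process', 'words']
toy_start = [0.1, 0.2, 2.5, 0.3, 0.1, 0.0]
toy_end = [0.0, 0.1, 0.4, 0.2, 0.3, 2.0]

(span_start, span_end), span_score = best_span(toy_start, toy_end)
toy_answer = ' '.join(toy_context_tokens[span_start:span_end + 1])

print(f"Answer: {toy_answer}")
```

Because the answer is always a piece of the context, such a model cannot answer questions whose answer is not literally contained in the text.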

### Text Generation

In [None]:
# Initialize the text generation pipeline
text_generator = pipeline('text-generation', model='gpt2')

# Generate text
prompt = "Once upon a time"
generated_text = text_generator(prompt, max_length=30, num_return_sequences=1)

print(generated_text[0]['generated_text'])

In [None]:
for _ in range(4):
    generated_text = text_generator(prompt, max_length=30, num_return_sequences=1)
    print(generated_text[0]['generated_text'], '\n')

In [None]:
prompt = "In the near future, AI"
for _ in range(4):
    generated_text = text_generator(prompt, max_length=30, num_return_sequences=1)
    print(generated_text[0]['generated_text'], '\n')
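
Repeated calls with the same prompt can return different continuations when the next token is sampled from the model's probability distribution instead of always taking the most likely one. Whether sampling is active depends on the generation settings of the model. The sketch below contrasts the two strategies on a toy distribution; the vocabulary and probabilities are invented for illustration.

```python
import random

# Toy next-token distribution (illustrative values only)
toy_vocab = ['there', 'a', 'the', 'in']
toy_probs = [0.4, 0.3, 0.2, 0.1]


def greedy_pick(vocab, probs):
    """Always return the most likely token."""
    return vocab[max(range(len(probs)), key=lambda i: probs[i])]


def sample_pick(vocab, probs, rng):
    """Draw one token at random according to its probability."""
    return rng.choices(vocab, weights=probs, k=1)[0]


rng = random.Random(42)  # fixed seed makes the random draws reproducible

greedy_picks = [greedy_pick(toy_vocab, toy_probs) for _ in range(4)]
sampled_picks = [sample_pick(toy_vocab, toy_probs, rng) for _ in range(4)]

print('greedy: ', greedy_picks)
print('sampled:', sampled_picks)
```

Greedy selection returns the same token every time, while sampling can vary from draw to draw unless the random seed is fixed.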

### Text Summarization

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

In [None]:
# Initialize the summarization pipeline
summarizer = pipeline('summarization')

# Short text to summarize
text = """
The transformers library provides state-of-the-art machine learning models for natural language processing.
It allows developers to leverage pre-trained models for tasks such as text classification,
question answering, and language translation, saving time and computational resources.
"""

# Summarize text
summary = summarizer(text, max_length=30, min_length=15, do_sample=False)

print(summary[0]['summary_text'])

In [None]:
summary = summarizer(context, max_length=50, min_length=15, do_sample=False)
print(summary[0]['summary_text'])

In [None]:
import requests

In [None]:
text = requests.get('https://hilpisch.com/walden.txt').text

In [None]:
# print(text[:1000])

In [None]:
summary = summarizer(text[2000:5000], max_length=100,
                     min_length=25, do_sample=False)
print(summary[0]['summary_text'])
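
The slice `text[2000:5000]` keeps the input short, presumably because summarization models accept only a limited number of tokens at once. One simple way to handle a longer text is to split it into chunks and summarize each chunk separately. The sketch below splits on whitespace; `chunk_words` and the chunk size of 200 words are assumptions for illustration, and a word count is only a rough proxy for the model's token limit.

```python
def chunk_words(text, words_per_chunk=200):
    """Split a text into consecutive chunks of at most words_per_chunk words."""
    words = text.split()
    return [' '.join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]


# Synthetic text with 450 words for illustration
toy_text = ' '.join(f'word{i}' for i in range(450))

chunks = chunk_words(toy_text, words_per_chunk=200)

print(len(chunks), [len(chunk.split()) for chunk in chunks])
```

Each chunk could then be passed to `summarizer` in a loop and the partial summaries joined afterwards.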

### Machine Translation

In [None]:
# Initialize the translation pipeline
translator = pipeline('translation_en_to_de', framework='tf')

# Text to translate
text = "Transformers are revolutionizing natural language processing."

# Translate text
translation = translator(text, max_length=40)

print(translation[0]['translation_text'])
# Note: the German translation is incorrect in terms of meaning.

In [None]:
# Initialize the translation pipeline
translator = pipeline('translation_en_to_fr', framework='tf')

# Text to translate
text = "Transformers are revolutionizing natural language processing."

# Translate text
translation = translator(text, max_length=40)

print(translation[0]['translation_text'])
# Note: the French translation is good.

### Fill-Mask (Cloze Test)

In [None]:
# Initialize the fill-mask pipeline
unmasker = pipeline('fill-mask')

# Text with a masked word
text = "Transformers are the <mask> of modern NLP models."

# Predict the masked word
predictions = unmasker(text)

for prediction in predictions:
    print(f"Prediction: {prediction['token_str']}, Score: {round(prediction['score'], 4)}")

In [None]:
# Text with a masked word
text = "NLP is a <mask> technique in finance."

# Predict the masked word
predictions = unmasker(text)

for prediction in predictions:
    print(f"Prediction: {prediction['token_str']}, Score: {round(prediction['score'], 4)}")

In [None]:
# Text with a masked word
text = "NLP is an <mask> technique in finance."

# Predict the masked word
predictions = unmasker(text)

for prediction in predictions:
    print(f"Prediction: {prediction['token_str']}, Score: {round(prediction['score'], 4)}")
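
The mask token depends on the model family: RoBERTa-style models use `<mask>`, as in the examples above, while BERT-style models use `[MASK]`. A small helper can keep the sentence template independent of the model. The sketch below is plain Python; `insert_mask` and the `{mask}` placeholder are hypothetical names chosen for illustration.

```python
def insert_mask(template, mask_token, placeholder='{mask}'):
    """Replace the single placeholder in a template with a mask token."""
    if template.count(placeholder) != 1:
        raise ValueError('template must contain exactly one placeholder')
    return template.replace(placeholder, mask_token)


template = 'NLP is a {mask} technique in finance.'

roberta_style = insert_mask(template, '<mask>')
bert_style = insert_mask(template, '[MASK]')

print(roberta_style)
print(bert_style)
```

With a loaded pipeline, the matching token is available as `unmasker.tokenizer.mask_token` and can be passed to the helper instead of a hard-coded string.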
