In [1]:
import torch
import torchvision
import transformers

print(f"PyTorch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")
print(f"Transformers version: {transformers.__version__}")
print(f"MPS (Metal Performance Shaders) available: {torch.backends.mps.is_available()}")
print(f"Torchvision transforms available: {'transforms' in dir(torchvision)}")
print(f"Torchvision models available: {'models' in dir(torchvision)}")

PyTorch version: 2.4.0
Torchvision version: 0.19.0
Transformers version: 4.44.0
MPS (Metal Performance Shaders) available: True
Torchvision transforms available: True
Torchvision models available: True


In [2]:
from transformers import pipeline

# Explicitly specify the model and framework
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6", framework="pt")

result = summarizer(
    "We are proposing two new undergraduate programs: a Bachelor of Commerce in Sports Business and a Bachelor of Arts in Sport, Health, & Wellness, aimed at equipping students with multidisciplinary skills needed to navigate the sport industry with social responsibility, ethical disposition, and sustainability in mind.", 
    max_length=50, 
    min_length=25, 
    do_sample=False
)

print(result)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'summary_text': ' We are proposing two new undergraduate programs: A Bachelor of Commerce in Sports Business and a Bachelor of Arts in Sport, Health, & Wellness . The programs are aimed at equipping students with multidisciplinary skills needed to navigate the sport'}]


In [2]:
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6", framework="pt")

longer_text = """
We are proposing two new undergraduate programs: a Bachelor of Commerce in Sports Business and a Bachelor of Arts in Sport, Health, & Wellness. These programs are aimed at equipping students with multidisciplinary skills needed to navigate the sport industry with social responsibility, ethical disposition, and sustainability in mind. The Sports Business program will focus on the business aspects of sports, including management, marketing, and finance, while the Sport, Health, & Wellness program will emphasize the broader societal impacts of sports, including public health, community engagement, and personal well-being. Both programs will incorporate hands-on learning experiences, internships, and collaborations with industry partners to ensure students are well-prepared for careers in this dynamic field.
"""

result = summarizer(longer_text, max_length=75, min_length=30, do_sample=False)

print(result[0]['summary_text'])

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


 A Bachelor of Commerce in Sports Business and a Bachelor of Arts in Sport, Health, & Wellness are proposed . The programs are aimed at equipping students with multidisciplinary skills needed to navigate the sport industry . Both programs will incorporate hands-on learning experiences, internships, and collaborations with industry partners .


I'll break down the entire process and explain what's happening behind the scenes with Hugging Face, the language model, and the summarization pipeline.

1. Hugging Face and Transformers Library:
   Hugging Face is a company that provides a popular library called Transformers, which offers pre-trained models and tools for natural language processing (NLP) tasks. The library simplifies the process of using state-of-the-art language models.

2. Pipeline:
   The `pipeline()` function from the Transformers library is a high-level API that abstracts away much of the complexity of using these models. It sets up all the necessary components for a specific NLP task.

3. Language Model:
   In your code, you're using the "sshleifer/distilbart-cnn-12-6" model. This is a specific version of DistilBART, which is a distilled (compressed) version of the BART model. BART is a transformer-based language model designed for sequence-to-sequence tasks like summarization.

4. Summarization Task:
   When you specify "summarization" as the task, the pipeline sets up the model specifically for text summarization. This involves:
   - Loading the pre-trained weights for the specified model
   - Setting up the tokenizer that converts text to numerical representations the model can understand
   - Configuring the model for generation (since summarization is a text generation task)

5. Input Processing:
   When you pass your text to the summarizer, several things happen:
   - The text is tokenized (split into subwords or word pieces)
   - These tokens are converted to numerical IDs
   - The IDs are passed through the model

6. Model Operation:
   The DistilBART model, being a sequence-to-sequence model, has an encoder and a decoder:
   - The encoder processes the input text and creates a contextual representation
   - The decoder then generates the summary based on this representation

7. Generation Process:
   The model generates the summary token by token. At each step, it considers:
   - The encoded input
   - The tokens it has generated so far
   - The specified constraints (like max_length and min_length)
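
The effect of the length constraints can be sketched with a toy decoding loop. Everything here is an assumption made for illustration: `toy_next_token_scores` is a hypothetical stand-in for the decoder, and the loop picks the single best token at each step (greedy decoding), whereas the real pipeline may use beam search depending on the model's generation settings. A common way to enforce a minimum length, used below, is to make the end-of-sequence token unselectable until enough tokens exist.

```python
EOS = "</s>"  # end-of-sequence marker used by this toy example


def toy_next_token_scores(generated):
    """Hypothetical stand-in for the decoder: one score per candidate token."""
    script = ["two", "new", "programs", "are", "proposed"]
    step = len(generated)
    scores = {EOS: 0.0}
    if step < len(script):
        scores[script[step]] = 2.0
    if step >= 3:
        # After three words the toy model would rather stop.
        scores[EOS] = 3.0
    return scores


def greedy_generate(score_fn, min_length=0, max_length=10):
    """Pick the highest-scoring token at each step, honoring the length limits."""
    generated = []
    while len(generated) < max_length:
        scores = dict(score_fn(generated))
        if len(generated) < min_length:
            # Too short to stop: make EOS impossible to select.
            scores[EOS] = float("-inf")
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        generated.append(best)
    return generated


print(greedy_generate(toy_next_token_scores))
# ['two', 'new', 'programs']
print(greedy_generate(toy_next_token_scores, min_length=5))
# ['two', 'new', 'programs', 'are', 'proposed']
print(greedy_generate(toy_next_token_scores, max_length=2))
# ['two', 'new']
```

The third call shows why a summary can end mid-sentence: when `max_length` is reached, generation simply stops, as in the first summarization output above.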

8. Output Processing:
   Once the model finishes generating, the pipeline:
   - Decodes the generated token IDs back into text
   - Applies any necessary post-processing (like removing special tokens)

9. Result:
   The pipeline returns the generated summary as a list of dictionaries, where each dictionary contains the 'summary_text' key with the generated summary as its value.
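
To make the return structure concrete, the sketch below works on the result printed by the first summarization cell. `clean_summary` is a hypothetical helper, not part of the Transformers library. It trims the leading space and removes the stray space before punctuation seen in the raw output.

```python
import re

# Result copied from the first summarization cell above.
result = [{
    "summary_text": (
        " We are proposing two new undergraduate programs: A Bachelor of Commerce "
        "in Sports Business and a Bachelor of Arts in Sport, Health, & Wellness . "
        "The programs are aimed at equipping students with multidisciplinary "
        "skills needed to navigate the sport"
    )
}]


def clean_summary(text):
    """Trim outer whitespace and drop the space left before punctuation marks."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text.strip())


# The pipeline returns one dictionary per input text, so index 0 is our summary.
summary = clean_summary(result[0]["summary_text"])
print(summary)
```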

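The output also includes a note about hardware acceleration: an accelerator is available, but because no `device` argument was passed to the pipeline, the model runs on the CPU. Passing a device could speed up the process.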

For best results with summarization models:
1. Use longer input texts (paragraphs or full articles)
2. Adjust the length parameters based on your input length and desired summary length
3. Experiment with different pre-trained models that might be more suitable for your specific use case

Remember, while these models are powerful, they work best within the parameters they were trained on, typically summarizing longer texts into concise versions.
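
One way to act on the second tip is to derive the length parameters from the input size. The helper below is a rough heuristic, not a library feature: `suggest_lengths` and its default ratios are assumptions chosen for illustration, and it counts whitespace-separated words, whereas the model counts tokens, so treat the numbers as a starting point.

```python
def suggest_lengths(text, max_ratio=0.5, min_ratio=0.25, floor=10):
    """Suggest (min_length, max_length) from a rough word count of the input."""
    n_words = len(text.split())
    # Never suggest a maximum below `floor`, even for very short inputs.
    max_length = max(floor, int(n_words * max_ratio))
    # Keep the minimum at least 1 and strictly below the maximum.
    min_length = max(1, min(int(n_words * min_ratio), max_length - 1))
    return min_length, max_length


sample = "word " * 100  # stand-in for a 100-word paragraph
print(suggest_lengths(sample))   # (25, 50)
print(suggest_lengths("a b c"))  # (1, 10)
```

The returned pair could then be passed to the summarizer as `min_length` and `max_length`.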

Let's use the GPU on our M1 MacBook Pro by passing a `device` argument to the pipeline.

In [3]:
import torch
from transformers import pipeline

# Check if MPS is available
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")

# Create the summarization pipeline with the specified device
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6", framework="pt", device=device)

# Your input text
text = """
We are proposing two new undergraduate programs: a Bachelor of Commerce in Sports Business and a Bachelor of Arts in Sport, Health, & Wellness. These programs are aimed at equipping students with multidisciplinary skills needed to navigate the sport industry with social responsibility, ethical disposition, and sustainability in mind. The Sports Business program will focus on the business aspects of sports, including management, marketing, and finance, while the Sport, Health, & Wellness program will emphasize the broader societal impacts of sports, including public health, community engagement, and personal well-being. Both programs will incorporate hands-on learning experiences, internships, and collaborations with industry partners to ensure students are well-prepared for careers in this dynamic field.
"""

# Generate the summary
result = summarizer(text, max_length=75, min_length=30, do_sample=False)

# Print the result
print(result[0]['summary_text'])

Using device: mps
 A Bachelor of Commerce in Sports Business and a Bachelor of Arts in Sport, Health, & Wellness are proposed . The programs are aimed at equipping students with multidisciplinary skills needed to navigate the sport industry . Both programs will incorporate hands-on learning experiences, internships, and collaborations with industry partners .


We can also write a simple image classification script that uses a pre-trained model from Hugging Face to identify a cat in an image. We'll use the Vision Transformer (ViT) model, which is well suited to general image classification tasks.

Here's a Python script that uses the Hugging Face Transformers library to classify an image:


In [8]:
import os
import torch
import numpy as np
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image

# Set the environment variable to disable tokenizer parallelism
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Check if MPS is available (for M1 Macs)
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")

# Load pre-trained ViT model and image processor
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Move the model to the appropriate device
model = model.to(device)

# Load and preprocess the image
image_path = "/Users/yigitaydede/Library/CloudStorage/Dropbox/Documents/Courses/MBAN/NLPBootcamp/Section1/IMG_4697.jpeg"
image = Image.open(image_path).convert('RGB')

# Resize the image
image = image.resize((224, 224))

# Convert image to numpy array
image_np = np.array(image)

# Normalize the image
image_np = (image_np / 255.0 - 0.5) / 0.5

# Convert to PyTorch tensor and add batch dimension
input_tensor = torch.tensor(image_np).permute(2, 0, 1).unsqueeze(0).float()

# Move input to the appropriate device
input_tensor = input_tensor.to(device)

# Perform inference
with torch.no_grad():
    outputs = model(input_tensor)

# Get the predicted class
predicted_class_idx = outputs.logits.argmax(-1).item()
predicted_class = model.config.id2label.get(predicted_class_idx, f"Unknown (ID: {predicted_class_idx})")

# Print the result
print(f"The image is classified as: {predicted_class}")

# If you want to see the top 5 predictions:
top5_prob, top5_catid = torch.topk(outputs.logits.softmax(dim=-1)[0], 5)
print("\nTop 5 predictions:")
for i in range(5):
    class_id = top5_catid[i].item()
    class_name = model.config.id2label.get(class_id, f"Unknown (ID: {class_id})")
    print(f"{class_name}: {top5_prob[i].item()*100:.2f}%")

# Print the total number of classes
print(f"\nTotal number of classes: {len(model.config.id2label)}")
print(f"Range of class IDs: 0 to {len(model.config.id2label) - 1}")

Using device: mps
The image is classified as: patio, terrace

Top 5 predictions:
patio, terrace: 31.40%
Egyptian cat: 10.62%
tiger cat: 5.87%
tabby, tabby cat: 3.71%
soccer ball: 2.44%

Total number of classes: 1000
Range of class IDs: 0 to 999


Let's break down the code step by step:

1. Import necessary libraries:
```python
import os
import torch
import numpy as np
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image
```
- `os`: For setting environment variables
- `torch`: PyTorch library for tensor computations and neural networks
- `numpy`: For numerical operations on arrays
- `transformers`: Hugging Face library for using pre-trained models
- `PIL`: Python Imaging Library for opening and manipulating images

2. Set up the environment:
```python
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```
This line disables parallelism in the tokenizers library to avoid potential issues with forked processes.

3. Set up the device:
```python
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")
```
This checks whether MPS (Metal Performance Shaders), PyTorch's GPU backend on Macs with Apple silicon such as the M1, is available. If it is, we use it; otherwise, we fall back to the CPU.

4. Load the pre-trained model and image processor:
```python
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = model.to(device)
```
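We load a pre-trained Vision Transformer model and its associated image processor, then move the model to the selected device (MPS or CPU). Note that this script loads `processor` but never calls it; the resizing and normalization are done by hand in the next steps.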

5. Load and preprocess the image:
```python
image_path = "/path/to/your/image.jpeg"
image = Image.open(image_path).convert('RGB')
image = image.resize((224, 224))
```
We open the image, convert it to RGB (Red, Green, Blue) in case it is stored in another mode such as grayscale or RGBA, and resize it to 224x224 pixels, the input size this ViT model expects. RGB is the standard color model for digital images and displays, including computer monitors, smartphones, and televisions.

6. Convert the image to a numpy array and normalize it:
```python
image_np = np.array(image)
image_np = (image_np / 255.0 - 0.5) / 0.5
```
The first line converts the PIL Image object to a numpy array of shape (224, 224, 3), where 224x224 is the image size and 3 is the number of RGB channels. Each value is an integer between 0 and 255 giving the intensity of red, green, or blue for one pixel. The second line normalizes these values: dividing by 255 maps them to [0, 1], and subtracting 0.5 and then dividing by 0.5 maps them to [-1, 1], a range that is often beneficial for neural networks.
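
The normalization arithmetic can be checked on a few individual values. This is a small sketch: `normalize_pixel` and `denormalize_pixel` are hypothetical helper names, and the mean and standard deviation of 0.5 are taken from the expression in the code above.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Scale an 8-bit intensity to [0, 1], then shift and scale it to [-1, 1]."""
    return (value / 255.0 - mean) / std


def denormalize_pixel(value, mean=0.5, std=0.5):
    """Invert normalize_pixel, returning an intensity on the 0-255 scale."""
    return (value * std + mean) * 255.0


print(normalize_pixel(0))      # -1.0  (darkest intensity)
print(normalize_pixel(127.5))  # 0.0   (mid-gray)
print(normalize_pixel(255))    # 1.0   (brightest intensity)
```

The same expression works unchanged on a whole numpy array, which is how the script applies it to every pixel at once.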

7. Convert to PyTorch tensor:
```python
input_tensor = torch.tensor(image_np).permute(2, 0, 1).unsqueeze(0).float()
input_tensor = input_tensor.to(device)
```
We convert the numpy array to a PyTorch tensor, rearrange its dimensions from HWC (height, width, channels) to CHW, add a batch dimension, convert to float, and move it to the selected device. PyTorch tensors are similar to numpy arrays but can be processed on GPUs and support automatic differentiation, which is crucial for training neural networks. This conversion is a crucial step in neural network code written with PyTorch or TensorFlow. Please see "tensor.ipynb".
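
The dimension changes are easier to follow on a tiny array. The sketch below uses NumPy only, with a made-up 2x3 "image", and mirrors what `permute(2, 0, 1)` and `unsqueeze(0)` do to the tensor in the script. The variable names are chosen for illustration.

```python
import numpy as np

# A tiny fake "image": height 2, width 3, 3 color channels (HWC layout).
hwc = np.arange(2 * 3 * 3).reshape(2, 3, 3)

# Move channels first (HWC -> CHW), the NumPy analogue of tensor.permute(2, 0, 1).
chw = np.transpose(hwc, (2, 0, 1))

# Add a leading batch axis and convert to float, like unsqueeze(0).float().
batch = chw[np.newaxis, ...].astype(np.float32)

print(hwc.shape)    # (2, 3, 3)
print(chw.shape)    # (3, 2, 3)
print(batch.shape)  # (1, 3, 2, 3)

# The value at row 1, column 2, channel 0 is the same element in both layouts;
# only the order of the indices changes.
print(hwc[1, 2, 0] == chw[0, 1, 2])  # True
```

For the real image the shapes are (224, 224, 3), (3, 224, 224), and (1, 3, 224, 224).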

8. Perform inference:
```python
with torch.no_grad():
    outputs = model(input_tensor)
```
We run the image through the model. The `with torch.no_grad():` context disables gradient tracking, which inference does not need; this saves memory.

9. Get and print the top prediction:
```python
predicted_class_idx = outputs.logits.argmax(-1).item()
predicted_class = model.config.id2label.get(predicted_class_idx, f"Unknown (ID: {predicted_class_idx})")
print(f"The image is classified as: {predicted_class}")
```
We get the index of the highest logit, convert it to a class label using the model's `id2label` mapping, and print the result.

10. Get and print the top 5 predictions:
```python
top5_prob, top5_catid = torch.topk(outputs.logits.softmax(dim=-1)[0], 5)
print("\nTop 5 predictions:")
for i in range(5):
    class_id = top5_catid[i].item()
    class_name = model.config.id2label.get(class_id, f"Unknown (ID: {class_id})")
    print(f"{class_name}: {top5_prob[i].item()*100:.2f}%")
```
We apply softmax to turn the logits into probabilities, take the five largest with `torch.topk`, and print each label with its probability. We use `get()` on `id2label` to handle cases where the ID might not be in the mapping.
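
What softmax and top-k do to the logits can be sketched in plain Python. The logits and the four-entry label mapping below are made up for illustration, and `softmax` and `top_k` are hypothetical helpers standing in for `logits.softmax(dim=-1)` and `torch.topk`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Return the k (index, probability) pairs with the highest probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Made-up logits and labels, only to show the mechanics.
logits = [2.0, 0.5, -1.0, 3.0]
id2label = {0: "tabby", 1: "patio", 2: "soccer ball", 3: "Egyptian cat"}

probs = softmax(logits)
for class_id, prob in top_k(probs, 2):
    label = id2label.get(class_id, f"Unknown (ID: {class_id})")
    print(f"{label}: {prob * 100:.2f}%")
```

Softmax preserves the ordering of the scores, so the class with the highest logit is also the class with the highest probability. The probabilities are simply easier to read as percentages.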

11. Print information about the model's classes:
```python
print(f"\nTotal number of classes: {len(model.config.id2label)}")
print(f"Range of class IDs: 0 to {len(model.config.id2label) - 1}")
```
This gives us information about the number of classes the model can predict and the range of class IDs.

This code demonstrates the full pipeline of loading a pre-trained model, preprocessing an image, running inference, and interpreting the results. It's a great example for students to understand how deep learning models are used for practical tasks like image classification.