In [1]:
from transformers import pipeline
# No model is given, so the pipeline falls back to its default summarization checkpoint
summarizer = pipeline("summarization")
text = """
Hugging Face is a company that specializes in natural language processing (NLP).
It has developed the Transformers library, which provides state-of-the-art models
for a wide range of NLP tasks such as text classification, information extraction,
question answering, summarization, translation, and more. The library is widely used
in both academia and industry due to its ease of use and flexibility.
"""

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


In [2]:
# Bound the summary length in tokens; do_sample=False keeps decoding deterministic
summary = summarizer(text, max_length=50, min_length=20, do_sample=False)
print("Summary:", summary[0]['summary_text'])

Summary:  The Transformers library provides state-of-the-art models for a wide range of NLP tasks . The library is widely used in both academia and industry due to its ease of use and flexibility .
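The log from the first cell recommends naming the model and revision explicitly instead of relying on the pipeline default. A minimal sketch of doing that, reusing the checkpoint and revision the pipeline itself reported; `build_summarizer` and the constant names are hypothetical helpers, and the `device=-1` (CPU) default is an assumption chosen to match the "Device set to use cpu" line. The transformers import is deferred into the function so the definition loads even where the library is not installed.

```python
from typing import Any

# Checkpoint and revision copied from the pipeline's own log message
DEFAULT_SUMMARIZATION_MODEL = "sshleifer/distilbart-cnn-12-6"
DEFAULT_SUMMARIZATION_REVISION = "a4f8f3e"


def build_summarizer(
    model: str = DEFAULT_SUMMARIZATION_MODEL,
    revision: str = DEFAULT_SUMMARIZATION_REVISION,
    device: int = -1,  # -1 = CPU, 0 = first GPU (assumed default)
) -> Any:
    """Hypothetical helper: a summarization pipeline pinned to one model revision."""
    # Imported lazily so defining this helper does not require transformers
    from transformers import pipeline

    return pipeline("summarization", model=model, revision=revision, device=device)
```

Calling `build_summarizer()` downloads the pinned checkpoint on first use and returns a pipeline that is called the same way as `summarizer` above.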


In [9]:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model_name = "cardiffnlp/tweet-topic-21-multi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
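# Hard-coded topic names, indexed by output position; model.config.id2label stores
# the checkpoint's own mapping and avoids any ordering mismatch
labels = [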
    "arts_&_culture", "business_&_entrepreneurs", "celebrity_&_pop_culture", "diaries_&_daily_life",
    "family", "fashion_&_style", "film_tv_&_video", "fitness_&_health", "food_&_dining",
    "gaming", "learning_&_educational", "music", "news_&_social_concern", "other_hobbies",
    "relationships", "science_&_technology", "sports_&_esports", "travel_&_adventure",
    "youth_&_student_life"
]
texts = [
    "The latest iPhone was just released with an incredible new camera!",
    "Manchester United won their match with a stunning goal in the last minute.",
    "NASA just launched a new mission to explore the surface of Mars.",
    "The Oscars had some surprising winners this year!"
]
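# Pad to the longest text in the batch; truncation=True has no effect here because
# this tokenizer defines no maximum length (see the warning in the output)
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")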
with torch.no_grad():
    outputs = model(**inputs)
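# Softmax turns the logits into one distribution over all topics (scores sum to 1).
# This checkpoint is multi-label, so independent per-topic sigmoid scores are the usual alternative.
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
# Keep only the highest-scoring topic for each text
predictions = torch.argmax(probabilities, dim=1)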
for text, pred, prob in zip(texts, predictions, probabilities):
    print(f"Text: {text}\nTopic: {labels[pred.item()]}, Confidence: {prob[pred].item():.4f}\n")

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Text: The latest iPhone was just released with an incredible new camera!
Topic: science_&_technology, Confidence: 0.9260

Text: Manchester United won their match with a stunning goal in the last minute.
Topic: sports_&_esports, Confidence: 0.9989

Text: NASA just launched a new mission to explore the surface of Mars.
Topic: science_&_technology, Confidence: 0.8526

Text: The Oscars had some surprising winners this year!
Topic: film_tv_&_video, Confidence: 0.9357
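The classifier above applies softmax and keeps only the single best topic. Because `cardiffnlp/tweet-topic-21-multi` is a multi-label model, one text can belong to several topics; scoring each topic independently with a sigmoid and keeping every label above a threshold is the usual alternative. A plain-Python sketch of the difference, in which the logits, the three-label `id2label` subset, and the 0.5 threshold are illustrative assumptions; with the real model the equivalents would be `torch.sigmoid(outputs.logits)` and `model.config.id2label`.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Single-label scores: one distribution over all classes (sums to 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Independent per-class probability, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def multi_label_topics(
    logits: List[float], id2label: Dict[int, str], threshold: float = 0.5
) -> Dict[str, float]:
    """Return every label whose sigmoid score reaches the threshold."""
    scores = {id2label[i]: sigmoid(x) for i, x in enumerate(logits)}
    return {label: s for label, s in scores.items() if s >= threshold}


# Hypothetical logits for one text over a three-label subset of the topics above
id2label = {0: "film_tv_&_video", 1: "music", 2: "science_&_technology"}
logits = [2.0, 1.0, -3.0]

print([round(p, 3) for p in softmax(logits)])
print(multi_label_topics(logits, id2label))
```

With these logits, softmax puts most of its mass on the first label, while the sigmoid scores let both of the first two labels pass the threshold.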



In [10]:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
prompt = "Once upon a time in a distant galaxy,"
inputs = tokenizer(prompt, return_tensors="pt")
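# temperature and top_k only take effect when do_sample=True; without it, generate()
# decodes greedily, which tends to produce repetitive loops like the one in the output.
# max_length=50 also counts the prompt tokens; max_new_tokens limits only the continuation.
output = model.generate(**inputs, max_length=50, num_return_sequences=1, temperature=0.7, top_k=50)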
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Once upon a time in a distant galaxy, the galaxy was a vast, vast, vast, vast, vast, vast, vast, vast, vast, vast, vast, vast, vast, vast, vast, vast, vast, vast, vast
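The "vast, vast, ..." continuation is a common symptom of greedy decoding: `temperature` and `top_k` are only used when `generate()` is called with `do_sample=True`. In transformers the usual remedies are `do_sample=True` and, optionally, the `repetition_penalty` or `no_repeat_ngram_size` generation options. The plain-Python sketch below illustrates what these options do for a single decoding step; the five-token vocabulary and its logits are made up, the function names are hypothetical, and it is a simplified illustration rather than the library's implementation.

```python
import math
import random
from typing import List, Optional, Sequence


def greedy_next_token(logits: List[float]) -> int:
    """Greedy decoding (do_sample=False): always take the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_next_token(
    logits: List[float],
    temperature: float = 0.7,
    top_k: int = 50,
    rng: Optional[random.Random] = None,
) -> int:
    """Temperature plus top-k sampling for a single decoding step."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # Temperature < 1 sharpens the distribution, > 1 flattens it
    scaled = [x / temperature for x in logits]
    # Keep only the k highest-scoring token ids (at least one)
    k = max(1, min(top_k, len(scaled)))
    top_ids = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax weights over the surviving candidates (max subtracted for stability)
    m = scaled[top_ids[0]]
    weights = [math.exp(scaled[i] - m) for i in top_ids]
    return rng.choices(top_ids, weights=weights, k=1)[0]


def apply_repetition_penalty(
    logits: List[float], previous_ids: Sequence[int], penalty: float
) -> List[float]:
    """Lower the logits of tokens that were already generated (penalty > 1)."""
    out = list(logits)
    for i in set(previous_ids):
        # Dividing a positive logit or multiplying a negative one both reduce it
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


# Hypothetical 5-token vocabulary; token 3 stands in for "vast"
step_logits = [0.5, 1.0, 0.2, 3.0, -1.0]
rng = random.Random(0)

print("greedy:", greedy_next_token(step_logits))
print("sampled:", [sample_next_token(step_logits, top_k=3, rng=rng) for _ in range(10)])
penalized = apply_repetition_penalty(step_logits, previous_ids=[3], penalty=4.0)
print("greedy after penalty:", greedy_next_token(penalized))
```

Greedy decoding returns token 3 on every call, sampling with `top_k=3` can also return tokens 1 and 0, and once token 3 is penalized greedy decoding moves to token 1.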
