Building a generative model from pre-trained models

Approach-1

In [2]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer
model_name = 'gpt2'
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Function to generate text
def generate_text(prompt, max_length=50):
    # Calling the tokenizer directly returns both input_ids and attention_mask
    inputs = tokenizer(prompt, return_tensors='pt')
    outputs = model.generate(
        inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        max_length=max_length,  # total length in tokens, prompt included
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token, so reuse EOS
    )
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

In [3]:
# Test the text generation function
prompt = "Once upon a time"
generated_text = generate_text(prompt, max_length=100)
print(generated_text)


Once upon a time, the world was a place of great beauty and great danger. The world of the gods was the place where the great gods were born, and where they were to live.

The world that was created was not the same as the one that is now. It was an endless, endless world. And the Gods were not born of nothing. They were created of a single, single thing. That was why the universe was so beautiful. Because the cosmos was made of two
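
The `no_repeat_ngram_size=2` argument used above stops any two-token sequence from appearing twice in the output. The sketch below illustrates the idea in plain Python on a toy list of token ids. The function name `banned_next_tokens` is hypothetical, and this is an illustration of the rule rather than the library's actual implementation.

```python
def banned_next_tokens(token_ids, ngram_size=2):
    """Return the tokens that would complete an n-gram already present in token_ids."""
    if ngram_size < 1 or len(token_ids) < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens are the prefix of the next n-gram.
    prefix = tuple(token_ids[len(token_ids) - (ngram_size - 1):])
    banned = set()
    # Scan every n-gram generated so far; if it starts with the current prefix,
    # its final token may not be generated next.
    for i in range(len(token_ids) - ngram_size + 1):
        ngram = tuple(token_ids[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

# Toy example: the bigram (5, 7) already occurred and the sequence ends in 5,
# so 7 is banned as the next token.
print(banned_next_tokens([5, 7, 9, 5]))  # prints {7}
```

During generation, the scores of banned tokens would be set to negative infinity before the next token is chosen.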


Approach-2

In [12]:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the FLAN-T5 model and tokenizer
model_name = 'google/flan-t5-large'  # Other sizes are available, e.g. google/flan-t5-small or google/flan-t5-base
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Function to generate text using FLAN-T5
def generate_text(prompt, max_length=100, num_return_sequences=1):
    # Encode the input prompt
    input_ids = tokenizer.encode(prompt, return_tensors='pt')

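    # Generate text with greedy decoding. Requesting num_return_sequences > 1
    # also requires beam search (num_beams) or sampling (do_sample=True).
    outputs = model.generate(
        input_ids=input_ids,
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        no_repeat_ngram_size=2
    )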

    # Decode the generated text
    generated_texts = [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]
    return generated_texts

In [13]:
# Test the text generation function
prompt = "Explain the importance of AI in healthcare."
generated_text = generate_text(prompt, max_length=150)
print("Generated Text:")
for i, text in enumerate(generated_text):
    print(f"{i+1}: {text}")

Generated Text:
1: AI can help doctors to diagnose patients more quickly and accurately.


In [15]:
# Test the text generation function
prompt = "Will AI will take over the world?"
generated_text = generate_text(prompt, max_length=150)
print("Generated Text:")
for i, text in enumerate(generated_text):
    print(f"{i+1}: {text}")

Generated Text:
1: no
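
Both approaches above call `generate` without sampling or beam search, so the next token is picked greedily at each step, and generation stops at an end-of-sequence token or at `max_length`, whichever comes first. That would explain why the FLAN-T5 answers are short while the GPT-2 story is cut off mid-sentence. The following is a minimal sketch of such a loop. The scoring table `TOY_TABLE` and the helper names are hypothetical stand-ins for a real model, not part of `transformers`.

```python
def greedy_decode(next_token_scores, start_tokens, eos_token, max_length):
    """Append the highest-scoring next token until EOS is produced or max_length is reached."""
    tokens = list(start_tokens)
    while len(tokens) < max_length:
        scores = next_token_scores(tokens)   # dict: candidate token -> score
        best = max(scores, key=scores.get)   # greedy choice
        tokens.append(best)
        if best == eos_token:
            break
    return tokens

# Toy stand-in for a model: scores depend only on the last token (an assumption).
TOY_TABLE = {
    "<s>": {"no": 0.9, "yes": 0.1},
    "no": {"</s>": 0.8, "no": 0.2},
    "yes": {"</s>": 0.7, "yes": 0.3},
}

def toy_scores(tokens):
    return TOY_TABLE[tokens[-1]]

print(greedy_decode(toy_scores, ["<s>"], "</s>", max_length=10))
# prints ['<s>', 'no', '</s>']
```

With `max_length=2` the same call returns `['<s>', 'no']`: the length limit is hit before the end-of-sequence token is produced.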


Named Entity Recognition in NLP

In [16]:
import spacy

# Load spaCy's small pre-trained English pipeline, which includes an NER component
nlp = spacy.load('en_core_web_sm')

# Function to perform named entity recognition
def named_entity_recognition(text):
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities

# Test the NER function
text = "Apple is looking at buying U.K. startup for $1 billion"
entities = named_entity_recognition(text)
print(entities)

[('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
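
The list of `(text, label)` tuples returned above is often easier to work with once it is grouped by entity label. A small sketch follows, using the printed output as sample input. The helper name `group_entities_by_label` is hypothetical and not part of spaCy.

```python
from collections import defaultdict

def group_entities_by_label(entities):
    """Group (text, label) tuples into a dict mapping each label to its entity texts."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)

# Sample input copied from the NER output above
entities = [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
print(group_entities_by_label(entities))
# prints {'ORG': ['Apple'], 'GPE': ['U.K.'], 'MONEY': ['$1 billion']}
```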
