
## Using pipelines

In [None]:
from transformers import pipeline

In [None]:
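# Note: BAAI/bge-reranker-v2-m3 is a reranking checkpoint, not a sentiment classifier,
# so the labels it returns are generic (LABEL_0) rather than POSITIVE/NEGATIVE.
sentiment_pipeline = pipeline("sentiment-analysis", model="BAAI/bge-reranker-v2-m3", framework="tf")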



All PyTorch model weights were used when initializing TFXLMRobertaForSequenceClassification.

All the weights of TFXLMRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFXLMRobertaForSequenceClassification for predictions without further training.



Device set to use 0


In [None]:
# The pipeline tokenizes each input, runs the model on the tokens, and returns a label with a score per input
result = sentiment_pipeline(["I love my college!",'I hate people opinion about AI'])
print(result)

[{'label': 'LABEL_0', 'score': 0.06098480895161629}, {'label': 'LABEL_0', 'score': 2.96125617751386e-05}]
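
Both inputs come back as `LABEL_0`, and the two scores do not sum to 1. A likely explanation, stated here as an assumption rather than something the output confirms: the reranker checkpoint exposes a single output logit, and for single-label models the text-classification pipeline applies a sigmoid instead of a softmax. The sketch below uses only the standard library and the two scores printed above to show what that would mean; `sigmoid` and `logit` are illustrative helpers, not part of `transformers`.

```python
import math

def sigmoid(x):
    """Squash a single logit into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse of sigmoid: recover the raw logit from a score."""
    return math.log(p / (1.0 - p))

# Scores copied from the pipeline output above.
scores = [0.06098480895161629, 2.96125617751386e-05]

# Under the single-logit assumption, each score maps back to one raw logit.
raw_logits = [logit(p) for p in scores]

for p, z in zip(scores, raw_logits):
    # Both logits are negative: the model scores both inputs low,
    # but that is a relevance-style score, not a sentiment.
    print(f"score={p:.6f}  implied logit={z:.3f}")
```

Read this way, the score is not a probability of "positive" versus "negative"; a checkpoint fine-tuned for sentiment would be needed for that.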


Pipeline options: https://huggingface.co/docs/transformers/en/main_classes/pipelines

Changing the default model

In [None]:
# Replace the placeholder string with the ID of a model from the Hugging Face Hub
sentiment_pipeline2 = pipeline("sentiment-analysis", model="<model_from_hugging_face>", framework="tf")


## What if there is no pipeline and I want to use a different tokenizer and model?

In [None]:
from transformers import BertTokenizer, TFAutoModelForSequenceClassification
import tensorflow as tf

In [None]:
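# Load the tokenizer from the same checkpoint as the model below, so the token IDs match the model's vocabulary
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")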

In [None]:
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-multilingual-cased")


In [None]:
text = "I love learning about NLP!"
tokens = tokenizer(text, return_tensors="tf", padding=True, truncation=True)

In [None]:
print("Tokenized Words:", tokenizer.tokenize(text))
print("Token IDs:", tokens["input_ids"].numpy().tolist()[0])
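
`BertTokenizer` uses WordPiece, which splits a word into the longest pieces found in its vocabulary and marks continuation pieces with `##`. The sketch below is a toy version of that greedy longest-match step, not the library's implementation; the `vocab` set and the `wordpiece` helper are hypothetical and far smaller than a real vocabulary.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation of a word
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

# Hypothetical vocabulary, chosen only to make the splits visible.
vocab = {"i", "love", "learn", "##ing", "about", "nl", "##p", "!"}
words = ["i", "love", "learning", "about", "nlp", "!"]

tokens = [piece for word in words for piece in wordpiece(word, vocab)]
print(tokens)
```

A real tokenizer also lowercases (for uncased checkpoints), splits punctuation, and adds special tokens such as `[CLS]` and `[SEP]`, which is why the token ID list is longer than the list of word pieces.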

In [None]:
outputs = model(**tokens)


In [None]:
logits = outputs.logits
probs = tf.nn.softmax(logits, axis=-1)

In [None]:
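label_idx = tf.argmax(probs, axis=-1).numpy()[0]
# Assumed label order. bert-base-multilingual-cased is not fine-tuned for sentiment,
# so its classification head is untrained and this prediction is not meaningful as-is.
labels = ["NEGATIVE", "POSITIVE"]
print(f"Predicted Sentiment: {labels[label_idx]}, Probability: {probs.numpy().max():.4f}")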
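
The last two cells turn logits into probabilities and pick the most likely label. The same steps can be sketched without TensorFlow; the logits below are hypothetical values chosen for illustration, not output from the model above.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a two-class model: [negative, positive].
logits = [-0.3, 1.2]
probs = softmax(logits)

# argmax: index of the largest probability.
label_idx = max(range(len(probs)), key=probs.__getitem__)

labels = ["NEGATIVE", "POSITIVE"]  # assumed label order
print(f"Predicted Sentiment: {labels[label_idx]}, Probability: {probs[label_idx]:.4f}")
```

Softmax preserves the ordering of the logits, so the argmax could be taken on the logits directly; the softmax is only needed to report a probability.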

## Other examples of pipelines

In [None]:
# This cell originally used GPT-2; change the model name to try a different checkpoint
# generator = pipeline("text-generation", model="gpt2")
generator = pipeline("text-generation", model="jinaai/ReaderLM-v2")

prompt = "There once lived a king"
# max_length caps the total length in tokens (prompt plus generated text); it was raised from 30 to 50 here
# result = generator(prompt, max_length=30, num_return_sequences=1)
result = generator(prompt, max_length=50, num_return_sequences=1)


print(result)



Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


[{'generated_text': "There once lived a king who had two sons. The son of the king was a prince, and the son of the king was a commoner. One day the king decided to test his sons' character. He ordered them to gather all their wealth"}]
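
The output stops mid-sentence because `max_length` limits the prompt and the generated tokens together. A minimal sketch of that stopping rule follows, using a hypothetical lookup table in place of a language model and whole words in place of tokens; `NEXT_WORD` and `generate` are illustrative names, not part of `transformers`.

```python
# Hypothetical next-word table standing in for a language model.
NEXT_WORD = {
    "king": "who",
    "who": "ruled",
    "ruled": "wisely",
    "wisely": "for",
    "for": "years",
}

def generate(prompt, max_length):
    """Append one word at a time until max_length words are reached."""
    tokens = prompt.split()
    if not tokens:
        return ""
    # The prompt counts toward max_length, just like in the pipeline call.
    while len(tokens) < max_length:
        next_token = NEXT_WORD.get(tokens[-1])
        if next_token is None:
            break  # the toy model has nothing more to say
        tokens.append(next_token)
    return " ".join(tokens)

text = generate("There once lived a king", max_length=7)
print(text)
```

With a five-word prompt and `max_length=7`, only two words are added. A real model picks each next token from a probability distribution over its whole vocabulary rather than from a fixed table.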


In [None]:
prompt = "The future of machine learning is"
result = generator(prompt, max_length=50, num_return_sequences=1)


print(result)

[{'generated_text': 'The future of machine learning is bright, but only if we’re careful about how it’s used.\n    <br><br>\n    <b>By Alex Zaitsev</b>\n  </p></div>\n\n  <div class="'}]


In [None]:
qa_pipeline = pipeline("question-answering")

context = """The Great Wall of China is a historic fortification that stretches
over 13,000 miles. It was primarily built to protect against invasions and
was constructed during the Ming Dynasty."""

question = "Who built the Great Wall of China?"

result = qa_pipeline(question=question, context=context)

print(result)


No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.



Device set to use cuda:0


{'score': 0.44635438919067383, 'start': 171, 'end': 183, 'answer': 'Ming Dynasty'}
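
In the result, `start` and `end` are character offsets into the context string, so the answer is always a span copied out of the context, never newly written text. The sketch below shows that relationship on a shortened, hypothetical context; the offsets are computed with `str.find` rather than taken from the output above.

```python
# Shortened, hypothetical context for illustration.
context = "The wall was constructed during the Ming Dynasty."
answer = "Ming Dynasty"

# Build a result dict in the same shape the pipeline returns.
start = context.find(answer)
end = start + len(answer)
result = {"start": start, "end": end, "answer": answer}

def extract_span(context, result):
    """Slice the answer back out of the context using the offsets."""
    return context[result["start"]:result["end"]]

span = extract_span(context, result)
print(span)
```

This also explains the answer to "Who built the Great Wall of China?": the context never names a builder, so an extractive model can only return the closest span it finds.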


In [None]:
# Before a model was specified, the pipeline returned a date when asked for the birthplace, so the model is named explicitly here
qa_pipeline = pipeline("question-answering", model="distilbert/distilbert-base-cased-distilled-squad")
# The Intel model was not working well, so we switched to another model
# qa_pipeline = pipeline("question-answering", model="SmallDoge/Doge-60M-Instruct")

context = """Elon Reeve Musk is a businessman and U.S. special government employee, best known for his key roles in Tesla, Inc. and SpaceX, and his ownership of Twitter. Musk is the wealthiest individual in the world; as of February 2025, Forbes estimates his net worth to be US$397 billion. Wikipedia
Born: 28 June 1971 (age 53 years), Pretoria, South Africa
Children: Vivian Jenna Wilson, Tau Techno Mechanicus Musk · See more
Net worth: 39,740 crores USD (2025) Forbes
Spouse: Talulah Riley (m. 2013–2016), Talulah Riley (m. 2010–2012), Justine Musk (m. 2000–2008)
Education: University of Pennsylvania School of Arts and Sciences (1997) · See more
Nationality: American, Canadian, South African
Parents: Errol Musk, W. Maye Musk"""

# question = "Who is Elon musk?"
# question = "In which country Elon musk born?"
# question = "Which country is Elon musk born?"
# question = "Elon musk born country"
# question = "Where did Elon musk born?"
question = "Who is wife of Elon musk?"
# question = "Who is the Spouse of Elon musk?"


result = qa_pipeline(question=question, context=context)

print(result)


Device set to use cuda:0


{'score': 0.16086140275001526, 'start': 5, 'end': 15, 'answer': 'Reeve Musk'}
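
The answer here is wrong, and the low score (about 0.16) hints at that. One common safeguard is to discard answers below a confidence cutoff. The sketch below applies a cutoff to the two results printed in this notebook; `MIN_SCORE` and `is_confident` are hypothetical, and the value 0.3 is an arbitrary choice for illustration, not a recommended setting.

```python
MIN_SCORE = 0.3  # hypothetical cutoff; tune it on your own data

def is_confident(result, min_score=MIN_SCORE):
    """Return True when the pipeline's score reaches the cutoff."""
    return result["score"] >= min_score

# Results copied from the two question-answering cells above.
great_wall = {"score": 0.44635438919067383, "start": 171, "end": 183, "answer": "Ming Dynasty"}
musk = {"score": 0.16086140275001526, "start": 5, "end": 15, "answer": "Reeve Musk"}

for name, result in [("Great Wall", great_wall), ("Musk", musk)]:
    status = "keep" if is_confident(result) else "discard"
    print(f"{name}: {result['answer']!r} (score {result['score']:.2f}) -> {status}")

# The offsets point into the context: characters 5 to 15 of its opening words.
context_start = "Elon Reeve Musk is a businessman"
span = context_start[musk["start"]:musk["end"]]
```

A cutoff filters out weak guesses but does not guarantee that a kept answer is correct, so it complements checking answers against the context rather than replacing it.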
