# Stage A - Lab

## Transformer and Named-Entity Recognition (NER) Tasks

In [None]:
# Install the Hugging Face transformers library (uncomment the line below to run it)

# pip install transformers

In [1]:
from transformers import pipeline
import pandas as pd

In [7]:
nlp = pipeline('ner')

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
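
The warning above advises against relying on the default checkpoint. A minimal sketch of pinning the model and revision follows, using the checkpoint names and revision hashes printed in this notebook's own warnings. The `PINNED_MODELS` table and both helper names are hypothetical, introduced only for illustration. `load_pinned` needs the `transformers` package and a model download, so it is defined but not called here.

```python
# Checkpoints and revisions copied from the default-model warnings in this notebook.
PINNED_MODELS = {
    "ner": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
    "question-answering": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
    "summarization": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"),
}


def pinned_pipeline_kwargs(task):
    """Return keyword arguments that pin a task to an explicit model and revision."""
    model, revision = PINNED_MODELS[task]
    return {"task": task, "model": model, "revision": revision}


def load_pinned(task):
    """Build the pipeline for `task` with the pinned checkpoint (downloads the model)."""
    from transformers import pipeline  # imported lazily so the helpers above work offline

    return pipeline(**pinned_pipeline_kwargs(task))


print(pinned_pipeline_kwargs("ner"))
```

Calling `load_pinned("ner")` in place of `pipeline('ner')` should load the same checkpoint without triggering the "No model was supplied" message, assuming the revision hashes are still available on the Hub.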


In [8]:
result = nlp("I am Yusuf, a Generative AI Intern at Hamoye. I look forward to learning about Artificial Intelligence.")
print("result:", result)

result: [{'entity': 'I-PER', 'score': 0.9987638, 'index': 3, 'word': 'Yusuf', 'start': 5, 'end': 10}, {'entity': 'I-MISC', 'score': 0.799625, 'index': 8, 'word': 'AI', 'start': 25, 'end': 27}, {'entity': 'I-ORG', 'score': 0.8789517, 'index': 12, 'word': 'Ham', 'start': 38, 'end': 41}, {'entity': 'I-ORG', 'score': 0.83856535, 'index': 13, 'word': '##oy', 'start': 41, 'end': 43}, {'entity': 'I-ORG', 'score': 0.98384017, 'index': 14, 'word': '##e', 'start': 43, 'end': 44}, {'entity': 'I-MISC', 'score': 0.8757365, 'index': 22, 'word': 'Art', 'start': 79, 'end': 82}, {'entity': 'I-MISC', 'score': 0.8763754, 'index': 23, 'word': '##ific', 'start': 82, 'end': 86}, {'entity': 'I-MISC', 'score': 0.9533889, 'index': 24, 'word': '##ial', 'start': 86, 'end': 89}, {'entity': 'I-MISC', 'score': 0.9801303, 'index': 25, 'word': 'Intelligence', 'start': 90, 'end': 102}]


In [9]:
df = pd.DataFrame(result)
df

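| | entity | score | index | word | start | end |
|---|---|---|---|---|---|---|
| 0 | I-PER | 0.998764 | 3 | Yusuf | 5 | 10 |
| 1 | I-MISC | 0.799625 | 8 | AI | 25 | 27 |
| 2 | I-ORG | 0.878952 | 12 | Ham | 38 | 41 |
| 3 | I-ORG | 0.838565 | 13 | ##oy | 41 | 43 |
| 4 | I-ORG | 0.98384 | 14 | ##e | 43 | 44 |
| 5 | I-MISC | 0.875736 | 22 | Art | 79 | 82 |
| 6 | I-MISC | 0.876375 | 23 | ##ific | 82 | 86 |
| 7 | I-MISC | 0.953389 | 24 | ##ial | 86 | 89 |
| 8 | I-MISC | 0.98013 | 25 | Intelligence | 90 | 102 |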
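
The output splits "Hamoye" and "Artificial" into WordPiece fragments (`Ham`, `##oy`, `##e`). The sketch below glues continuation pieces back onto the preceding token using the `start`/`end` offsets from the result. `merge_word_pieces` is a hypothetical helper written for this notebook and is not the library's own grouping logic; the token-classification pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that groups entities for you.

```python
# Token-level NER output copied from the cell above.
tokens = [
    {"entity": "I-PER", "score": 0.9987638, "word": "Yusuf", "start": 5, "end": 10},
    {"entity": "I-MISC", "score": 0.799625, "word": "AI", "start": 25, "end": 27},
    {"entity": "I-ORG", "score": 0.8789517, "word": "Ham", "start": 38, "end": 41},
    {"entity": "I-ORG", "score": 0.83856535, "word": "##oy", "start": 41, "end": 43},
    {"entity": "I-ORG", "score": 0.98384017, "word": "##e", "start": 43, "end": 44},
    {"entity": "I-MISC", "score": 0.8757365, "word": "Art", "start": 79, "end": 82},
    {"entity": "I-MISC", "score": 0.8763754, "word": "##ific", "start": 82, "end": 86},
    {"entity": "I-MISC", "score": 0.9533889, "word": "##ial", "start": 86, "end": 89},
    {"entity": "I-MISC", "score": 0.9801303, "word": "Intelligence", "start": 90, "end": 102},
]


def merge_word_pieces(tokens):
    """Merge '##' continuation pieces into the preceding token; average their scores."""
    merged = []
    for tok in tokens:
        is_continuation = tok["word"].startswith("##")
        if is_continuation and merged and merged[-1]["end"] == tok["start"]:
            prev = merged[-1]
            prev["word"] += tok["word"][2:]  # drop the '##' marker
            prev["end"] = tok["end"]
            prev["scores"].append(tok["score"])
        else:
            merged.append({
                "entity": tok["entity"],
                "word": tok["word"],
                "start": tok["start"],
                "end": tok["end"],
                "scores": [tok["score"]],
            })
    for item in merged:
        scores = item.pop("scores")
        item["score"] = sum(scores) / len(scores)
    return merged


words = merge_word_pieces(tokens)
for w in words:
    print(w["entity"], w["word"], w["start"], w["end"])
```

This merges pieces of a single word only; "Artificial" and "Intelligence" stay separate rows because a space sits between their offsets.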


## Transformer and Question-Answering Tasks

In [11]:
qna = pipeline('question-answering')

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [12]:
context = r"""
            Igbo-Ora is a city and the headquarters of Ibarapa Central, Oyo State, south-western Nigeria, situated 80 kilometres (50 mi) north of Lagos. 
            In 2006 the population of the town was approximately 72,207 people. In 2017 the population is estimated to be around 278,514 people.
            The city is the location of Oyo State College of Agriculture and Technology. 
            The polytechnic has contributed significantly to the socio-economic and demographic development of the town. 
            Source: Wikipedia
            """

In [13]:
result = qna(question='Where is Igboora located?', context=context)

In [14]:
print(result)

{'score': 0.14106570184230804, 'start': 84, 'end': 152, 'answer': 'south-western Nigeria, situated 80 kilometres (50 mi) north of Lagos'}
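
The `start` and `end` values are character offsets into the context string exactly as passed, so the reported `start` of 84 is consistent with the leading newline and the twelve-space indentation of the triple-quoted string being counted. A minimal sketch of collapsing that whitespace before calling the pipeline is shown below; `normalize_whitespace` is a hypothetical helper, and the context is abbreviated to the first two sentences from the cell above.

```python
# First lines of the context from the cell above, with the same style of indentation.
raw_context = """
            Igbo-Ora is a city and the headquarters of Ibarapa Central, Oyo State, south-western Nigeria, situated 80 kilometres (50 mi) north of Lagos.
            In 2006 the population of the town was approximately 72,207 people.
            """


def normalize_whitespace(text):
    """Collapse every run of whitespace (newlines, indentation) into one space."""
    return " ".join(text.split())


clean_context = normalize_whitespace(raw_context)

# Locate the answer the model returned, this time inside the cleaned context.
answer = "south-western Nigeria, situated 80 kilometres (50 mi) north of Lagos"
start = clean_context.find(answer)
print(start, repr(clean_context[start:start + len(answer)]))
```

Passing `clean_context` as `context` keeps the offsets aligned with the text a reader actually sees. Whether it changes the model's score is not something this sketch checks.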


## Question/Context Pair

In [7]:
from transformers import pipeline
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Yusuf and I work as a Generative AI Intern at Hamoye in Bermuda.",
)


{'score': 0.5686454176902771,
 'start': 57,
 'end': 74,
 'answer': 'Hamoye in Bermuda'}
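
The result dictionary can be checked against the context directly: slicing the context with `start` and `end` should reproduce `answer`. The sketch below does that and adds a simple score cut-off. Both helpers and the `min_score=0.5` threshold are assumptions made for illustration, not values recommended by the library.

```python
# Context and result copied from the cell above.
context = "My name is Yusuf and I work as a Generative AI Intern at Hamoye in Bermuda."
result = {"score": 0.5686454176902771, "start": 57, "end": 74, "answer": "Hamoye in Bermuda"}


def span_matches(result, context):
    """Check that the start/end offsets point at the returned answer text."""
    return context[result["start"]:result["end"]] == result["answer"]


def answer_or_none(result, min_score=0.5):
    """Return the answer only when the score reaches the (hypothetical) threshold."""
    return result["answer"] if result["score"] >= min_score else None


print(span_matches(result, context))
print(answer_or_none(result))
```

With this cut-off the Igbo-Ora answer from the previous section, scored at about 0.14, would be rejected, while this answer at about 0.57 passes.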

## Transformer and Text Summarization

In [9]:
# from transformers import pipeline
summarizer = pipeline("summarization")
summarizer("""
           America has changed dramatically during recent years. Not only has the number of graduates in traditional engineering disciplines 
           such as mechanical, civil, electrical, chemical, and aeronautical engineering declined, but in most of the premier American universities 
           engineering curricula now concentrate on and encourage largely the study of engineering science. 
           As a result, there are declining offerings in engineering subjects dealing with infrastructure, the environment, and related issues, and 
           greater concentration on high technology subjects, largely supporting increasingly complex scientific developments. 
           While the latter is important, it should not be at the expense of more traditional engineering.
           Rapidly developing economies such as China and India, as well as other Industrial countries in Europe and Asia, continue to encourage and 
           advance the teaching of engineering. Both China and India, respectively, graduate six and eight times as many traditional engineers as does 
           the United States. Other industrial countries at minimum maintain their output, while America suffers an increasingly serious decline in the 
           number of engineering graduates and a lack of well-educated engineers.
           """
           )

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' America suffers an increasingly serious decline in the number of engineering graduates and a lack of well-educated engineers . Rapidly developing economies such as China and India, as well as other Industrial countries in Europe and Asia, continue to encourage and advance the teaching of engineering . Both China, respectively, graduate six and eight times as many traditional engineers as does the U.S.'}]
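
The summary text comes back with a leading space and a space before each full stop. A small post-processing sketch for tidying it is shown below; `tidy_summary` is a hypothetical helper, and the sample is the first sentence of the output above.

```python
import re

# First sentence of the summarizer output above, in the same list-of-dicts shape.
outputs = [{
    "summary_text": " America suffers an increasingly serious decline in the number "
                    "of engineering graduates and a lack of well-educated engineers ."
}]


def tidy_summary(text):
    """Strip outer whitespace and remove spaces that precede punctuation marks."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text.strip())


summary = tidy_summary(outputs[0]["summary_text"])
print(summary)
```

This only adjusts spacing; it leaves the wording of the generated summary untouched.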