### Sentiment analysis

In [1]:
# https://huggingface.co/blog/sentiment-analysis-python
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I love you", "I hate you", "I don't hate you"]
sentiment_pipeline(data)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9998656511306763},
 {'label': 'NEGATIVE', 'score': 0.9991129040718079},
 {'label': 'POSITIVE', 'score': 0.9985570311546326}]

### Fill masked word

In [2]:
# https://huggingface.co/bert-base-uncased
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("The [MASK] worked as a nurse.")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.35214367508888245,
  'token': 2388,
  'token_str': 'mother',
  'sequence': 'the mother worked as a nurse.'},
 {'score': 0.19612574577331543,
  'token': 2450,
  'token_str': 'woman',
  'sequence': 'the woman worked as a nurse.'},
 {'score': 0.11655837297439575,
  'token': 2684,
  'token_str': 'daughter',
  'sequence': 'the daughter worked as a nurse.'},
 {'score': 0.0653347596526146,
  'token': 2611,
  'token_str': 'girl',
  'sequence': 'the girl worked as a nurse.'},
 {'score': 0.025741584599018097,
  'token': 2564,
  'token_str': 'wife',
  'sequence': 'the wife worked as a nurse.'}]

### Text Classification

In [3]:
# https://huggingface.co/j-hartmann/emotion-english-distilroberta-base
from transformers import pipeline
classifier = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", return_all_scores=True)
classifier("I'm so sad it happen so suddenly")

[[{'label': 'anger', 'score': 0.0012407628819346428},
  {'label': 'disgust', 'score': 0.0010925332317128778},
  {'label': 'fear', 'score': 0.0026731020770967007},
  {'label': 'joy', 'score': 0.003078761510550976},
  {'label': 'neutral', 'score': 0.004846677184104919},
  {'label': 'sadness', 'score': 0.9228119850158691},
  {'label': 'surprise', 'score': 0.06425615400075912}]]
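
With `return_all_scores=True` the classifier returns one inner list per input sentence, holding a score for every label. A minimal sketch of picking the winning emotion from that structure follows; the helper name `top_label` is hypothetical, and the data is copied from the output above rather than recomputed by the model.

```python
# Output copied from the cell above (one input sentence, so one inner list).
results = [[
    {'label': 'anger', 'score': 0.0012407628819346428},
    {'label': 'disgust', 'score': 0.0010925332317128778},
    {'label': 'fear', 'score': 0.0026731020770967007},
    {'label': 'joy', 'score': 0.003078761510550976},
    {'label': 'neutral', 'score': 0.004846677184104919},
    {'label': 'sadness', 'score': 0.9228119850158691},
    {'label': 'surprise', 'score': 0.06425615400075912},
]]


def top_label(scores):
    """Return the (label, score) pair with the highest score.

    `top_label` is a hypothetical helper, not part of transformers.
    """
    best = max(scores, key=lambda item: item['score'])
    return best['label'], best['score']


label, score = top_label(results[0])
total = sum(item['score'] for item in results[0])

print(label, round(score, 4))
print(round(total, 4))
```

The top label here is `sadness`, and the seven scores add up to roughly 1, so they can be read as a probability distribution over the labels.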

### Text Generation

In [4]:
# https://huggingface.co/tasks/text-generation

from transformers import pipeline
generator = pipeline('text-generation', model='gpt2')
generator("Teach NLP is fun but", max_length=30, num_return_sequences=3)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Teach NLP is fun but doesn\'t take itself seriously," said Mr Wurkiwabo.\n\nA report, submitted by the Ministry'},
 {'generated_text': 'Teach NLP is fun but doesn\'t stand up in court," she said. "I don\'t feel it\'s working. I think their story'},
 {'generated_text': 'Teach NLP is fun but hard to do given its limitations. It cannot teach you a good way to create a personal narrative with which you can'}]

### Translation

In [5]:
# https://huggingface.co/facebook/wmt19-ru-en
from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "facebook/wmt19-ru-en"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

text = "Машинное обучение - это здорово, не так ли?"  # renamed from `input` to avoid shadowing the builtin
input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded) # Machine learning is great, isn't it?

Machine learning is great, isn't it?


### Textual similarity

In [10]:
# https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer
sentences = ["I like dogs and puppies", "I have cats", "moon exploration"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings[0][0:20])
print(embeddings[1][0:20])
cosine_similarity(embeddings)

[-0.04837986 -0.0533237   0.06084255  0.04996348 -0.060212    0.01169236
  0.04848676 -0.01432638  0.095245    0.07374042  0.07201275 -0.05591025
  0.01506545  0.03981534  0.04570967  0.02327511 -0.03039946  0.02192825
  0.02251742 -0.02904169]
[ 0.07268582 -0.04635467  0.04954236  0.03125549 -0.04810524 -0.03360383
  0.06530692 -0.00248521  0.00512836  0.00185334  0.02351316 -0.06234325
 -0.00241587  0.05230791  0.03587196 -0.01304413 -0.08356059 -0.02128224
 -0.04514851 -0.04367592]


array([[0.99999994, 0.49218965, 0.14639582],
       [0.49218965, 0.99999976, 0.07366706],
       [0.14639582, 0.07366706, 1.        ]], dtype=float32)
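
Called with a single argument, `cosine_similarity` compares every row with every other row, so entry `[i][j]` of the matrix above is the similarity between sentence `i` and sentence `j`. The two pet sentences score about 0.49 against each other, while "moon exploration" scores about 0.15 and 0.07 against them. As a rough sketch of what that number means, the same quantity can be computed by hand; the helper names and the 3-dimensional `toy` vectors below are made up for illustration and are not real model embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_matrix(vectors):
    """Pairwise cosine similarity between all rows (hypothetical helper)."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical toy vectors, NOT embeddings produced by all-MiniLM-L6-v2.
toy = [
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]

matrix = cosine_matrix(toy)
for row in matrix:
    print([round(x, 3) for x in row])
```

As in the notebook output, the result is symmetric with a diagonal of about 1, because every vector points in exactly the same direction as itself.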