## Upload Model to the Hugging Face Hub

1. Upload the model to the Hugging Face model hub:
- create an account
- create a model repo
- delete `tokenizer_config.json` before the git commit so that the repo can be loaded with `T5Tokenizer`
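The copy-and-delete part of these steps can be sketched in plain Python. This is a minimal sketch, not the author's method: the helper name `stage_model_files` is hypothetical, and it assumes the fine-tuned model was saved as a flat folder of files (like `./multitask/`) that should land in the cloned repo folder without `tokenizer_config.json`.

```python
import shutil
from pathlib import Path


def stage_model_files(src_dir, dst_dir, exclude=("tokenizer_config.json",)):
    """Copy top-level files from src_dir to dst_dir, skipping names in `exclude`.

    Hypothetical helper. Returns the sorted names of the files it copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in src.iterdir():
        # regular files only, like `cp ./multitask/* ./t5-reddit/` without -r
        if path.is_file() and path.name not in exclude:
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    # also remove excluded files left over in the destination from an earlier copy
    for name in exclude:
        (dst / name).unlink(missing_ok=True)
    return sorted(copied)
```

For example, `stage_model_files("./multitask", "./t5-reddit")` would copy everything except the tokenizer config and report what it copied, so the result can be checked before `git add .`.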

In [2]:
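%%bash
# replace every {placeholder} below with your own values before running
apt-get install -y git-lfs
git lfs install
# Hugging Face git authentication takes a user access token, not the account password
git clone https://{user_name}:{access_token}@huggingface.co/{user_name}/{model_repo}
git config --global user.email "{email}"
git config --global user.name "{name}"

# copy files over from the model directory, e.g. `cp ./multitask/* ./{model_repo}/`
cd {model_repo}
# remove the tokenizer config so the repo loads with T5Tokenizer (see the note above)
rm -f tokenizer_config.json
git status
git add .
git commit -m "first push"
git push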

fatal: destination path 't5-reddit' already exists and is not an empty directory.
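The clone above failed because a `t5-reddit` folder already existed, so it can be worth checking what that folder actually contains before pushing. The sketch below is a hypothetical helper (`missing_model_files`) and rests on an assumption: that a T5 checkpoint saved with `save_pretrained` needs `config.json`, the SentencePiece vocabulary `spiece.model`, and weights in either `pytorch_model.bin` or `model.safetensors`, depending on the `transformers` version.

```python
from pathlib import Path


def missing_model_files(repo_dir):
    """Return the expected checkpoint files that are absent from repo_dir.

    Hypothetical helper; the expected file names are an assumption.
    """
    repo = Path(repo_dir)
    missing = [name for name in ("config.json", "spiece.model")
               if not (repo / name).is_file()]
    # weights may be saved in either format depending on the library version
    if not any((repo / w).is_file() for w in ("pytorch_model.bin", "model.safetensors")):
        missing.append("pytorch_model.bin or model.safetensors")
    return missing
```

An empty list from `missing_model_files("./t5-reddit")` suggests the folder is ready for `git add .`; anything else points at a file that still has to be copied over.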


## Model Inference [Prediction]

2. Load the uploaded model and tokenizer from the Hub with the Hugging Face `from_pretrained` API

In [1]:
from transformers.models.t5 import T5Config, T5ForConditionalGeneration, T5Tokenizer
import pandas as pd

In [2]:
model = T5ForConditionalGeneration.from_pretrained("vionwinnie/t5-reddit")
tokenizer = T5Tokenizer.from_pretrained("vionwinnie/t5-reddit")

3. Score the inputs with `model.generate()` and collect the predicted tags in `all_preds`, ready to be appended to `unlabelled_df`
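The generation call in this step passes `top_k=50` and `top_p=0.95` with sampling enabled. To show what those two parameters do, here is a toy sketch over a hand-written distribution: keep the `top_k` most probable tokens, then the smallest prefix of them whose cumulative probability reaches `top_p`, and renormalise. It is conceptually similar to, not a copy of, the logits processing inside `transformers`, and the token names and probabilities are made up.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Filter a {token: probability} dict with top-k, then top-p (nucleus) filtering.

    Returns the renormalised distribution that sampling would draw from.
    """
    # 1. keep only the top_k most probable tokens
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # 2. keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 3. renormalise the survivors so they sum to 1
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy = {"Review": 0.6, "Question": 0.3, "Bug": 0.07, "Other": 0.03}
print(top_k_top_p_filter(toy, top_k=3, top_p=0.85))
```

With these toy numbers only `Review` and `Question` survive, so low-probability tags can never be sampled. Because `do_sample=True`, predictions can still differ between runs.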

In [14]:
test1 = "tag classification: This app is so amazing and I would like to tell you the best features I found of using this App, which is clicking through function. Thank you Goodnotes!"
test2 = "title prediction: Can someone share the method you use to extract pages from a pdf and insert it into good notes notebook? I have been taking screen shots of the pages and inserting it as image but it’s inconvenient and not content is not very clear. I appreciate any tips. Thanks"
all_strings = [test1, test2]
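Each input string starts with a task prefix (`tag classification: ` or `title prediction: `) that tells the multitask model which output to produce. When scoring many posts, a small helper can attach the prefix consistently. This is a minimal sketch that assumes the model was fine-tuned with exactly these two prefixes; the dictionary keys `"tag"` and `"title"` and the function name `build_inputs` are hypothetical.

```python
# prefixes copied from the example inputs; the short keys are hypothetical
TASK_PREFIXES = {
    "tag": "tag classification: ",
    "title": "title prediction: ",
}


def build_inputs(task, texts):
    """Prepend the task prefix the model is assumed to have been trained with."""
    prefix = TASK_PREFIXES[task]  # raises KeyError for an unknown task
    return [prefix + text.strip() for text in texts]
```

For example, `build_inputs("tag", posts)` would produce strings in the same shape as `test1`. Failing loudly on an unknown task is deliberate, since a misspelled prefix would otherwise just yield poor predictions.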

In [15]:
input_batch = tokenizer(all_strings,
                        padding=True,
                        truncation=True,
                        return_tensors='pt',
                        max_length=400)
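With `padding=True` and `truncation=True`, the tokenizer cuts each sequence to at most `max_length` tokens, pads the shorter ones to the longest in the batch, and returns an `attention_mask` with 1 for real tokens and 0 for padding. The pure-Python sketch below reproduces that shape logic on plain lists of token ids. It assumes T5's pad token id of 0 and, unlike the real tokenizer, ignores the `</s>` end-of-sequence token that truncation leaves room for.

```python
def pad_and_truncate(batch_ids, max_length=400, pad_id=0):
    """Mimic padding to the longest sequence plus truncation at max_length."""
    truncated = [ids[:max_length] for ids in batch_ids]
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return input_ids, attention_mask


ids, mask = pad_and_truncate([[5, 6, 7], [8]], max_length=2)
print(ids, mask)
```

This is why `input_batch["attention_mask"]` is passed to `model.generate()` along with the ids: without it, the padded positions of the shorter post would be treated as content.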

In [16]:
input_ids = input_batch["input_ids"]
attention_mask = input_batch["attention_mask"]

all_preds = []

batch_size = 4
for i in range(0, len(all_strings), batch_size):
    print("processing {}th data".format(i))
    # slice with batch_size so the batch width is defined in one place
    subset_input_ids = input_ids[i:i + batch_size]
    subset_attention_masks = attention_mask[i:i + batch_size]
    prediction = model.generate(input_ids=subset_input_ids,
                                attention_mask=subset_attention_masks,
                                num_beams=4,
                                max_length=300,
                                do_sample=True,  # sampling: predictions can vary between runs
                                top_k=50,
                                top_p=0.95,
                                num_return_sequences=1)
    for pred in prediction:
        # skip_special_tokens drops <pad> and </s> without manual string replacement
        theme = tokenizer.decode(pred, skip_special_tokens=True).strip()
        all_preds.append(theme)

all_preds



processing 0th data
['Review', 'How to extract pages from a pdf']
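The slicing loop above can be factored into a small generator that ties the slice width to a single `batch_size` value. This is a minimal sketch with a hypothetical name, `batched`. It works on any object that supports `len()` and slicing, which includes Python lists and PyTorch tensors. On Python 3.12+ `itertools.batched` offers similar behaviour, though it yields tuples rather than slices of the original object.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


print(list(batched([1, 2, 3, 4, 5], 2)))
```

In the scoring cell this would read `for subset_input_ids, subset_attention_masks in zip(batched(input_ids, batch_size), batched(attention_mask, batch_size)):`, keeping the ids and masks aligned batch by batch.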


In [21]:
for task,pred in zip(all_strings, all_preds):
    print(task)
    print("prediction:   {}".format(pred))
    print("--"*50)

tag classification: This app is so amazing and I would like to tell you the best features I found of using this App, which is clicking through function. Thank you Goodnotes!
prediction:   Review
----------------------------------------------------------------------------------------------------
title prediction: Can someone share the method you use to extract pages from a pdf and insert it into good notes notebook? I have been taking screen shots of the pages and inserting it as image but it’s inconvenient and not content is not very clear. I appreciate any tips. Thanks
prediction:   How to extract pages from a pdf
----------------------------------------------------------------------------------------------------
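Step 3 mentions appending the predictions to `unlabelled_df`. A first step is to pair each input with its prediction and split off the task prefix. The sketch below does that with a hypothetical helper, `to_rows`, and hypothetical column names. Unlike `zip()` in the printing loop, it refuses inputs and predictions of different lengths instead of silently dropping the extras.

```python
def to_rows(inputs, preds):
    """Pair each 'task: text' input with its prediction as a dict row."""
    if len(inputs) != len(preds):
        raise ValueError("inputs and preds must have the same length")
    rows = []
    for text, pred in zip(inputs, preds):
        # split on the first ": ", which ends the task prefix
        task, _, body = text.partition(": ")
        rows.append({"task": task, "text": body, "prediction": pred})
    return rows


print(to_rows(["tag classification: Nice app"], ["Review"]))
```

The rows can be passed to `pd.DataFrame(rows)`. Alternatively, if `unlabelled_df` holds the same texts in the same order (an assumption), the list can be assigned directly with `unlabelled_df["prediction"] = all_preds`.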
