In [1]:
import warnings
warnings.simplefilter(action='ignore')
import pandas as pd
from transformers import T5ForConditionalGeneration, T5Tokenizer

Load unlabeled data from CSV for inference

In [2]:
df = pd.read_csv('./data/unlabelled_df.csv')

In [3]:
df.head()

Unnamed: 0,submission_id,submission_title,submission_selftext,submission_link_flair_text,reply_body,all_text
0,aglcrj,Goodnotes 4 vs. Goodnotes 5 right now,I have used Goodnotes 4 for work a ton. And I...,,"[""I'm getting a ton of bugs with 5 as well (sn...",Goodnotes 4 vs. Goodnotes 5 right now I have u...
1,agoowm,The bundle is available !,,,['Thank you. I have been waiting!\n\n&amp;#x20...,The bundle is available ! Thank you. I have b...
2,agpzxb,What happened to the pen (Goodnotes 5)?,I just got Goodnotes 5 and I was so excited fo...,,"[""Have you tried the ball pen? That was the cl...",What happened to the pen (Goodnotes 5)? I just...
3,agq8qv,Non Apple Pencil styluses on GOodnotes 5?,I've been using a Wacom Bamboo stylus with Goo...,,['According to the [review at Macstories](http...,Non Apple Pencil styluses on GOodnotes 5? I've...
4,agqksi,Text/typing in Goodnotes 5,"Notability user here, but trying out Goodnotes...",,"[""That's neat af"", 'Seattle, hopefully not too...",Text/typing in Goodnotes 5 Notability user her...


In [4]:
all_strings = ('tag classification: ' + df['all_text']).to_list()
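The cell above prepends the task prefix to every thread. If any row of `all_text` is missing, pandas yields NaN for that row instead of a string, which the tokenizer may then reject. A minimal sketch of a more defensive version, assuming a hypothetical helper named `build_prompts` (not part of the original notebook) and plain Python lists as input:

```python
import math


def build_prompts(texts, prefix="tag classification: "):
    """Prepend the task prefix to each text, treating missing values as empty.

    `build_prompts` is a hypothetical helper for illustration only.
    """
    prompts = []
    for text in texts:
        # pandas represents missing cells as float NaN; None is handled as well.
        if text is None or (isinstance(text, float) and math.isnan(text)):
            text = ""
        prompts.append(prefix + str(text).strip())
    return prompts


example = build_prompts(["Text/typing in Goodnotes 5", float("nan")])
print(example)
```

With the notebook's DataFrame this could be called as `build_prompts(df['all_text'].tolist())`.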

Load pretrained model and tokenizer

In [5]:
model = T5ForConditionalGeneration.from_pretrained("textomatic/subreddit-thread-tagging")
tokenizer = T5Tokenizer.from_pretrained("textomatic/subreddit-thread-tagging")

In [6]:
# Tokenize all prompts at once: pad to the longest prompt, truncate anything over 400 tokens
input_batch = tokenizer(
    all_strings,
    padding=True,
    truncation=True,
    return_tensors='pt',
    max_length=400
)
input_ids = input_batch["input_ids"]
attention_mask = input_batch["attention_mask"]
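To make the effect of `padding=True`, `truncation=True` and `max_length` more concrete, here is a toy sketch on plain lists of token ids. It is not the tokenizer's actual implementation: `pad_and_truncate` is a hypothetical function, the padding id of 0 is assumed to match T5's pad token, and the real tokenizer also keeps the end-of-sequence token when it truncates, which this sketch ignores.

```python
def pad_and_truncate(sequences, max_length, pad_id=0):
    """Clip each sequence to max_length, then right-pad to the longest one.

    Returns (input_ids, attention_mask) as nested lists. Illustrative only.
    """
    clipped = [list(seq[:max_length]) for seq in sequences]
    width = max((len(seq) for seq in clipped), default=0)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in clipped]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in clipped]
    return input_ids, attention_mask


ids, mask = pad_and_truncate([[5, 6, 7, 8], [9]], max_length=3)
print(ids)   # [[5, 6, 7], [9, 0, 0]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```

The attention mask is what lets the model tell padded positions apart from real tokens, which is why the notebook passes it to `generate` alongside `input_ids`.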

In [7]:
# Generate one tag per thread, batch_size threads at a time
all_preds = []
batch_size = 4

for i in range(0, len(all_strings), batch_size):
    print("Processing {}th data".format(i))
    # Slice with batch_size rather than a hard-coded 4 so the two cannot drift apart
    subset_input_ids = input_ids[i:i + batch_size]
    subset_attention_masks = attention_mask[i:i + batch_size]

    # Beam search combined with sampling (do_sample=True): repeated runs may give different tags
    prediction = model.generate(
        input_ids=subset_input_ids,
        attention_mask=subset_attention_masks,
        num_beams=4,
        max_length=300,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        num_return_sequences=1
    )

    for pred in prediction:
        # skip_special_tokens drops <pad> and </s> while decoding
        theme = tokenizer.decode(pred, skip_special_tokens=True).strip()
        all_preds.append(theme)

Processing 0th data
Processing 4th data
Processing 8th data
Processing 12th data
Processing 16th data
Processing 20th data
Processing 24th data
Processing 28th data
Processing 32th data
Processing 36th data
Processing 40th data
Processing 44th data
Processing 48th data
Processing 52th data
Processing 56th data
Processing 60th data
Processing 64th data
Processing 68th data
Processing 72th data
Processing 76th data
Processing 80th data
Processing 84th data
Processing 88th data
Processing 92th data
Processing 96th data
Processing 100th data
Processing 104th data
Processing 108th data
Processing 112th data
Processing 116th data
Processing 120th data
Processing 124th data
Processing 128th data
Processing 132th data
Processing 136th data
Processing 140th data
Processing 144th data
Processing 148th data
Processing 152th data
Processing 156th data
Processing 160th data
Processing 164th data
Processing 168th data
Processing 172th data
Processing 176th data
Processing 180th data
Processing 184th
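The loop above walks the inputs in fixed-size slices. The same pattern can be factored into a small generator so the batch size lives in exactly one place. This is a sketch under assumptions: `iter_batches` is a hypothetical helper, shown here on a plain list, although the same slicing works on the tokenized tensors.

```python
def iter_batches(items, batch_size):
    """Yield (start_index, slice) pairs covering items in order.

    The last batch may be shorter than batch_size. Hypothetical helper.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]


for start, batch in iter_batches(list(range(10)), batch_size=4):
    print(start, batch)
```

Because `range` stops before `len(items)`, every yielded slice contains at least one element, so no emptiness check is needed inside the loop.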

In [8]:
print(all_preds)

['Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Review', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Review', 'Question', 'Question', 'Question', 'Question', 'Templates', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Review', 'Question', 'Question', 'Stylus problems', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', 'Question', '
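A long printed list is hard to read at a glance, so a tally of the predicted tags can help show how skewed the output is. The sketch below uses only the standard library; `tag_distribution` is a hypothetical helper and `sample_preds` is made-up illustrative data, not the notebook's actual predictions.

```python
from collections import Counter


def tag_distribution(tags):
    """Return (tag, count, share) tuples, most frequent tag first."""
    counts = Counter(tags)
    total = sum(counts.values())
    return [(tag, n, n / total) for tag, n in counts.most_common()]


# Illustrative input only; in the notebook this would be all_preds.
sample_preds = ["Question", "Question", "Review", "Question"]
for tag, n, share in tag_distribution(sample_preds):
    print(f"{tag}: {n} ({share:.0%})")
```

In the notebook, `tag_distribution(all_preds)` would give the same summary for the real predictions.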

In [9]:
df['predicted_tag'] = all_preds

In [10]:
df.to_csv('./data/unlabelled_df_predicted.csv', index=False)

Spot-check the threads tagged 'Stylus problems' to see how the model performed

In [17]:
df[df['predicted_tag'] == 'Stylus problems']

Unnamed: 0,submission_id,submission_title,submission_selftext,submission_link_flair_text,reply_body,all_text,predicted_tag
63,e05ie7,Why is this app so laggy and so buggy lately?,"Noticed problems with scrolling and a lot of bugs in the last 2 months, what happened?",,"['You think a phone in 2021 can zoom into the moon and the first you’re hearing about it is on a /r/nextfuckinglevel post months after it came out?', 'Fourthded', 'NO! &gt;:(', '""let\'s go kids!""', 'can i dm you a pic? its just the one that came with the saber - its crimson dawn if that helps in any way', 'It is a good bad kinda enchantment. Annoying to use good pressure plates but good against traps.', ""The difference between the general US opinion on homosexuality between 2007 and 2012 was pretty huge. That was right before gay marriage was legalized federally.\n\nI don't think the inconsistency was intentional, it was just the developers changing with the times."", 'Wait HOLD UP! I invested a safe part of my money into GME and AMC because I believed in their future. GME because, obviously, they are nailing their e-commerce transformation and AMC, because, obviously, after COVID the theatre sector will boom.\n\nNow that CNBC has confirmed naked shorting is real, I will now proceed to invest my entire life savings. If I lose everything, CNBC is liable and will be sued for financial and moral damages.', 'Of you have a ""meth pipe"" and dabs but not something to smoke them and you need something then why not use the meth pipe aslong as it hasn\'t actually had meth smoked out of it and it\'s just that same style of glass pipe', 'Never in trouble until you lose at home. Series just getting started.', '[I love the part where he somehow blames Leffen for the placement of the TV in the Summit House.](https://i.imgur.com/2G57X66.png)', 'i looked into code because of DG but looks hard to learn haha', '&gt;Saying something retarded and getting corrected by other people doesn\'t mean you\'re le epic troll triggering people, bud.\n\n""You\'re"" should be ""your.""\n\n(Sorry, I just couldn\'t resist. 
It was too thematically perfect.)', '[removed]', ""what'd u major in 😀"", ""Just looked at your posts and he's such a cute boy🥰🥰\n\nAnd thank you I will!! Lol I guess it's a matter of what they're exposed to early on/where their fear bar is. But city and suburb have their own spooky quirks!"", 'Lmao. I remember watching the final seconds of game 4 and the arena played sounds of bells that sounded like something used in funerals as Jrue was dribbling out the ball.', 'Congrats on making those changes! I used to do yoga for anxiety and it helped a lot (and also prevented back pain!). The last couple years I switched to lifting weights which gives me more of an endorphin rush but I still do yoga once a week.\n\nHave you tried meditation? I have started and stopped many times but for the past month or so have been using the Headspace app every day. My therapist recommended it to me. To tell the truth; I don’t know if it’s “working” yet but they have a course in headspace all about managing anxiety. I know meditation is supposed to be good so I’m sticking with it.\n\nOne more thing - I find writing to people in this Reddit really helps lower my anxiety. Something about seeing other people articulate this feeling really soothes me and it feels good to offer support and encouragement.', 'I understand why you might think this is a new, novel technology and that any criticism is unwarranted until it lands. But if you understand the differences between algorithmic upsampling (that is fundamentally limited by the pixels that are present) and ML/Deep Learning based upsampling (that use inference and essentially ""guessing"" to perform image restoration which sometimes causes artifacts but can ultimately produce higher fidelity images as they\'re _not_ limited by what\'s present), you\'ll understand why it\'s not. 
They\'ve produced some nice marketing material and packaged it in a way that feels new and exciting, but we\'ve seen, worked with, and learned the limitations of algorithmic upsampling for tens of years, and it was only when ML took off that the research in this field really advanced.', 'unbelievable finish!', ""Stumbled across this thread while searching... 160 days ago so I'm sure you already have an answer, but these will not cycle my ZPAP with a Wolverine. Doesn't even remotely cycle - that said, they're very amusing to shoot."", 'Thanks for posting! Make sure to check out these great OF subs!\n\nr/SexyOnlyfansGirls\n\nr/DadShouldBeProud\n\nr/NaughtyOnlyfans\n\nr/GoneWildOnlyfans\n\nr/SluttyOnlyfans\n\n*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/AmateurGoneWildPlus) if you have any questions or concerns.*', '[removed]', '[deleted]', 'Upvote']","Why is this app so laggy and so buggy lately? Noticed problems with scrolling and a lot of bugs in the last 2 months, what happened? You think a phone in 2021 can zoom into the moon and the first you’re hearing about it is on a /r/nextfuckinglevel post months after it came out? Fourthded NO! &gt;:(",Stylus problems
188,iifupe,Highlighter behavior prior to v.5.40.30,Is there a way to user deselect the changes that have been made to the highlighter in version 5.40.30? \n\nThe appearance of the highlighter and dark pages has become awful. The version release notes said changes were made to make the highlight or more visible. I loved the way that the highlighter worked before on dark pages. Now it simply looks awful. I just wanted the strokes to be highlighted not the surrounding area on the page.\n\nThe before version colored only the strokes\nhttps://i.imgur.com/TmE0UHr.jpg\n\n\nThe after version (most recent update) puts the highlight over the text and also colors the area surrounding the text.\nhttps://i.imgur.com/8l8hFHs.jpg\n\n(Pls ignore my terrible handwriting....😬),,"['That’s just not what the highlighter was supposed to do. You can still do this though, just circle the strokes you want to change with the lasso tool and recollect them. The effect would take a second longer but would have the same impact.']","Highlighter behavior prior to v.5.40.30 Is there a way to user deselect the changes that have been made to the highlighter in version 5.40.30? \n\nThe appearance of the highlighter and dark pages has become awful. The version release notes said changes were made to make the highlight or more visible. I loved the way that the highlighter worked before on dark pages. Now it simply looks awful. I just wanted the strokes to be highlighted not the surrounding area on the page.\n\nThe before version colored only the strokes\nhttps://i.imgur.com/TmE0UHr.jpg\n\n\nThe after version (most recent update) puts the highlight over the text and also colors the area surrounding the text.\nhttps://i.imgur.com/8l8hFHs.jpg\n\n(Pls ignore my terrible handwriting....😬) That’s just not what the highlighter was supposed to do. You can still do this though, just circle the strokes you want to change with the lasso tool and recollect them. 
The effect would take a second longer but would have the same impact.",Stylus problems
226,jczozd,Why are my strokes so inconsistent? I never had this issue until now. Anyone else with this issue?,,,"['I use an Apple Pencil and I heard that I may have to change the nib but i dont think that is really the issue. It could be pressure sensitivity issue with my pencil as well but I don’t know how to properly assess that on my own.', 'I’ve had this problem before try unscrewing the nib and putting it back', 'Me tooooooo. Ugh it’s annoying!!', 'Do you have the same issue in other apps like the stock note app?', 'I’ve had to disconnect the Apple Pencil, take off and put back on the tip, and close out of the app and go back in and it works just fine. Hope it gets fixed!', 'Just use the ball pen. What’s the point of the fountain pen?', 'looks good', 'Is this just in the goodnotes app or do you have issues with handwriting in apps like notes etc.?', 'Fuck the ball pen, all my homies use the fountain pen', 'Big facts', 'Fountain is better imo. I know its more consistent but you can do much more with a fountain than a ball point.', ""You can try changing the nib once, i was having inconsistency on writing as well, i changed with the extra tip they had in box, and boom everything back to normal. I would also suggest to keep the tip box, set of 4 handy, and it's the only thing that gets damaged a lot."", 'I had the problem, maybe the nib is damaged, I had to buy a replacement', 'I always use the ballpoint, what can you do with the fountain pen that the ballpoint pen can’t? (Serious question)', 'I like using the fountain because I can press onto the screen and make the thickness of the stroke vary from time to time. Helps with making bold writing in a few of my notes personally.', 'Ah like that, yeah I can see that', 'True, I grabbed the replacement nib and switched it. The problem was lessened but not gone, so I switched the old one back since I doubt it was a nib problem. \n\nI fear it could be the pencil itself, or the app. 
\n\nThe inside of the old nib has a bit of black stuff in it (could be some sort of buildup) but I haven\'t found anything to clean it out with because the hole is so small and the ""buildup"" is in [between the inner threads of the nib](https://imgur.com/gallery/aaQnCrb).', ""Yeah I understand. I'll just use the ball pen for now since there aren't any major differences (except for the bold writing part, but that isn't a dealbreaker)""]",Why are my strokes so inconsistent? I never had this issue until now. Anyone else with this issue? I use an Apple Pencil and I heard that I may have to change the nib but i dont think that is really the issue. It could be pressure sensitivity issue with my pencil as well but I don’t know how to properly assess that on my own. I’ve had this problem before try unscrewing the nib and putting it back Me tooooooo. Ugh it’s annoying!!,Stylus problems
407,nknvw5,Why isn’t there a pencil ✏️ on this app?,If this app had more brushes and pens and a pencil like the notes app it would be so good.. is there a way to get a pencil added?,,"[""I don't think so, but it would be so helpful, also different kind of lines, like dotted or dashed( I'm a physics student and I'd love that)"", 'This!']","Why isn’t there a pencil ✏️ on this app? If this app had more brushes and pens and a pencil like the notes app it would be so good.. is there a way to get a pencil added? I don't think so, but it would be so helpful, also different kind of lines, like dotted or dashed( I'm a physics student and I'd love that) This!",Stylus problems
