# Generating Manual Summaries

Most of this work is done in [Google Sheets](https://docs.google.com/spreadsheets/d/1yXby9sM8Sgj5ptP-wFgmNFVWNLqicQEFvx-bERDC17w/edit#gid=0). Export the results from the sheet to a TSV file, then read that file in here.

In [None]:
from IPython.display import clear_output

!pip install datasets transformers rouge_score nltk
# !pip install datasets transformers rouge-score nltk
# rouge-score is the google version
!pip install pyarrow
!pip install -q sentencepiece

clear_output()

In [None]:
# ! pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
clear_output()

In [None]:
from IPython.display import clear_output
import os
import re
import time
from tqdm.notebook import trange, tqdm
import pandas as pd
import numpy as np
from pprint import pprint
import matplotlib.pyplot as plt

# nlp stuff
import nltk
nltk.download('punkt')

# tf stuff
import tensorflow_datasets as tfds 
import tensorflow as tf
from transformers import PegasusTokenizer, TFPegasusForConditionalGeneration # pegasus
from transformers import BartTokenizer, TFBartForConditionalGeneration # bart

# pytorch dataset types
import datasets
from datasets.dataset_dict import DatasetDict
from datasets import Dataset, load_metric, load_dataset

# pytorch bart stuff
import torch
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer
from transformers import AutoTokenizer

# clear_output()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [None]:
# about this metric: https://huggingface.co/spaces/evaluate-metric/rouge
metric = load_metric("rouge")
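
For intuition about what the ROUGE-1 columns further down measure, here is a minimal pure-Python sketch of unigram-overlap precision, recall, and F-measure. It is not the implementation behind `load_metric("rouge")`: the helper name `rouge1` and the lowercase, whitespace-only tokenization are assumptions made for illustration, and the real metric tokenizes differently and aggregates over many examples.

```python
from collections import Counter

def rouge1(prediction, reference):
    """Simplified ROUGE-1: unigram overlap between a prediction and a reference."""
    # Assumption: lowercase + whitespace split stands in for the real tokenizer.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    total = precision + recall
    fmeasure = 2 * precision * recall / total if total else 0.0
    return precision, recall, fmeasure

# 4 of 5 predicted tokens appear in the reference; 4 of 6 reference tokens are covered.
p, r, f = rouge1("sleep is important for health",
                 "sleep deprivation is bad for health")
print(round(p, 3), round(r, 3), round(f, 3))
```

Precision rewards short summaries that stay on the reference's wording, while recall rewards covering more of the reference, which is why the tables below report both alongside the F-measure.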

In [None]:
# specify your path to the repo here:
repo_path = '/content/gdrive/MyDrive/w266/w266_reddit_summarization'

from google.colab import drive
drive.mount('/content/gdrive')

df = pd.read_csv(os.path.join(repo_path, 'data/manual_summaries.tsv'), sep='\t')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
df = df[df['manual_summary'] != 'na'].reset_index(drop=True)
df

| | subreddit_group | content | y | yhat_baseline | yhat_bart_subreddit | yhat_bart_full | rouge1_precision | rouge1_recall | rouge1_fmeasure | rouge2_precision | rouge2_recall | rouge2_fmeasure | manual_summary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | other | Sleep deprivation has serious, serious bad con... | Don't feel guilty! Take care of you. | If you're struggling to get a good night's sl... | Do what you have to do to get some sleep. | Sleep deprivation is bad for health. | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | Sleep is very important for good health |
| 1 | media_lifestyle_sports | People claim hygiene is of a high enough stand... | Nobody wants to be forever known as "Bobby blo... | The Ebola outbreak in West Africa has left man... | Ebio is probably unlikely, but it's probably not. | It's unlikely, but it's possible. | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | People claim this gym is hygenic but given how... |
| 2 | other | The top Google search match for "2001 cinemagr... | The top Google search match for "2001 cinemagr... | I've been writing for more than a decade, but... | I'm not an idiot. | I'm a dick. | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | Just post the true source for 2001 cinemagraph. |
| 3 | advice_story | The thing about dictionaries is that they are ... | Human practice shapes language rules, books me... | D dictionaries are used by many people to reco... | D dictionaries are biased. | Dictionaries are biased, but there's no defaul... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | Dictionaries are biased. They are a record of ... |
| 4 | media_lifestyle_sports | There is apparently no way for your toothbrush... | irrelvent rl;dr's I respect. Incorrect ones a... | There is no such thing as a clean toothbrush,... | You're a dick. | You can't clean your toothbrush with a toothbr... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | There is apparently no way for your toothbrush... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | other | So, I noticed a post on here the other day tha... | What weapons/armor do you carry on your charac... | I'm a fan of the video game World of Warcraft... | What equipment do you carry on your character? | What weapons do you carry on your character? | 1.000000 | 0.615385 | 0.761905 | 0.857143 | 0.500000 | 0.631579 | Looking for advice on weapons to use. |
| 96 | advice_story | So my boyfriend and I have been together just ... | Boyfriend still hasn't told me he loves me aft... | I am a student at the University of Glasgow, ... | Boyfriend and I have been together for 1.5 yea... | Boyfriend of 1.5 years doesn't tell me that he... | 0.409091 | 0.642857 | 0.500000 | 0.190476 | 0.307692 | 0.235294 | My boyfriend of nearly 2 years hasn't told me ... |
| 97 | gaming | They all depend heavily on the current common ... | Sylvanas > TBK >= Cairne | Sunwalkers, players and players all have diffe... | Sylanas, TBK, Cairne, Sylvanas, and TBK. | Sylanas, TBK, Cairne, and TBK are all good cards. | 0.222222 | 0.666667 | 0.333333 | 0.125000 | 0.500000 | 0.200000 | The success of these cards all rely on the cur... |
| 98 | advice_story | I'm a little late to the party, but oh well. ... | my high school math teacher was a hypocritical... | I'm going to go to a party with my high schoo... | Teacher is a bitch, and I don't know if anyone... | My math teacher was a bitch for three years in... | 0.318182 | 0.700000 | 0.437500 | 0.190476 | 0.444444 | 0.266667 | I brought a coke to my math class and my teach... |


In [None]:
df['subreddit_group'].value_counts()

other                     25
media_lifestyle_sports    25
advice_story              25
gaming                    25
Name: subreddit_group, dtype: int64

In [None]:
def get_metrics_for_group(group):
  """Return ROUGE precision/recall/F-measure for every summary source in one subreddit group."""

  # rows for this subreddit group; column 'y' holds the reference summaries
  group_df = df[df['subreddit_group'] == group]
  references = group_df['y'].tolist()

  # prediction column -> display name used in the result table
  models = {
      'yhat_baseline': 'Baseline Model',
      'yhat_bart_full': 'BART trained on full data',
      'yhat_bart_subreddit': 'BART trained on subreddit groups',
      'manual_summary': 'Manual summary',
  }

  result_dict = {'model': [], 'metric': [], 'precision': [], 'recall': [], 'fmeasure': []}

  for column, model_i in models.items():
    # score this column's summaries against the references
    metrics_i = metric.compute(predictions=group_df[column].tolist(), references=references)

    for m in ['rouge1', 'rouge2', 'rougeL', 'rougeLsum']:
      # each entry is a (low, mid, high) aggregate; report the mid estimate
      mid = metrics_i[m].mid
      result_dict['model'].append(model_i)
      result_dict['metric'].append(m)
      result_dict['precision'].append(mid.precision)
      result_dict['recall'].append(mid.recall)
      result_dict['fmeasure'].append(mid.fmeasure)

  result_df = pd.DataFrame(result_dict)
  return result_df
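
Each ROUGE entry returned by `metric.compute` is expected to be a nested named tuple: an aggregate with `low`, `mid`, and `high` fields, each holding `precision`, `recall`, and `fmeasure`. The sketch below uses stand-in named tuples to show that the positional index `[1][0]` and the attribute path `.mid.precision` select the same value. The stand-in types and the numbers are made up for illustration and assume the layout used by the `rouge_score` package.

```python
from collections import namedtuple

# Stand-ins mirroring the assumed layout of the rouge_score result types.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
AggregateScore = namedtuple("AggregateScore", ["low", "mid", "high"])

# Illustrative numbers only; these are not results from this notebook.
example_metrics = {
    "rouge1": AggregateScore(
        low=Score(0.20, 0.18, 0.17),
        mid=Score(0.25, 0.22, 0.21),
        high=Score(0.30, 0.26, 0.25),
    )
}

by_position = example_metrics["rouge1"][1][0]       # index 1 -> mid, index 0 -> precision
by_name = example_metrics["rouge1"].mid.precision   # same value, self-documenting
print(by_position, by_name)
```

Attribute access is easier to audit than positional indexing, since swapping `[1][0]` for `[0][1]` would silently report the lower-bound recall instead of the mid precision.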





In [None]:
df_out = get_metrics_for_group("advice_story")
df_out.sort_values(['metric', 'recall'])

| | model | metric | precision | recall | fmeasure |
|---|---|---|---|---|---|
| 0 | Baseline Model | rouge1 | 0.157897 | 0.175881 | 0.14903 |
| 8 | BART trained on subreddit groups | rouge1 | 0.271983 | 0.224354 | 0.217841 |
| 4 | BART trained on full data | rouge1 | 0.297789 | 0.248851 | 0.241396 |
| 12 | Manual summary | rouge1 | 0.243968 | 0.264629 | 0.228076 |
| 1 | Baseline Model | rouge2 | 0.027315 | 0.040263 | 0.028596 |
| 9 | BART trained on subreddit groups | rouge2 | 0.078833 | 0.072626 | 0.061557 |
| 5 | BART trained on full data | rouge2 | 0.074224 | 0.077557 | 0.067737 |
| 13 | Manual summary | rouge2 | 0.083601 | 0.113182 | 0.086618 |
| 2 | Baseline Model | rougeL | 0.128714 | 0.157689 | 0.127591 |
| 10 | BART trained on subreddit groups | rougeL | 0.220226 | 0.175233 | 0.172974 |


In [None]:
df_out = get_metrics_for_group("media_lifestyle_sports")
df_out.sort_values(['metric', 'recall'])

| | model | metric | precision | recall | fmeasure |
|---|---|---|---|---|---|
| 4 | BART trained on full data | rouge1 | 0.188033 | 0.131305 | 0.133031 |
| 0 | Baseline Model | rouge1 | 0.106481 | 0.144494 | 0.109363 |
| 8 | BART trained on subreddit groups | rouge1 | 0.188255 | 0.148683 | 0.142276 |
| 12 | Manual summary | rouge1 | 0.166119 | 0.178577 | 0.153531 |
| 9 | BART trained on subreddit groups | rouge2 | 0.010714 | 0.0052 | 0.007051 |
| 5 | BART trained on full data | rouge2 | 0.02 | 0.011667 | 0.014173 |
| 1 | Baseline Model | rouge2 | 0.0077 | 0.015671 | 0.00887 |
| 13 | Manual summary | rouge2 | 0.025214 | 0.033045 | 0.024709 |
| 2 | Baseline Model | rougeL | 0.072268 | 0.104605 | 0.077285 |
| 6 | BART trained on full data | rougeL | 0.160151 | 0.110817 | 0.112614 |


In [None]:
df_out = get_metrics_for_group("other")
df_out.sort_values(['metric', 'recall'])

| | model | metric | precision | recall | fmeasure |
|---|---|---|---|---|---|
| 8 | BART trained on subreddit groups | rouge1 | 0.1882 | 0.126627 | 0.133487 |
| 4 | BART trained on full data | rouge1 | 0.196545 | 0.129567 | 0.137434 |
| 0 | Baseline Model | rouge1 | 0.127883 | 0.155566 | 0.125759 |
| 12 | Manual summary | rouge1 | 0.214953 | 0.211411 | 0.180978 |
| 1 | Baseline Model | rouge2 | 0.01889 | 0.032769 | 0.022863 |
| 9 | BART trained on subreddit groups | rouge2 | 0.063468 | 0.043634 | 0.047444 |
| 5 | BART trained on full data | rouge2 | 0.05794 | 0.045237 | 0.04789 |
| 13 | Manual summary | rouge2 | 0.063513 | 0.061621 | 0.052946 |
| 10 | BART trained on subreddit groups | rougeL | 0.175221 | 0.115595 | 0.124514 |
| 6 | BART trained on full data | rougeL | 0.180534 | 0.118338 | 0.124962 |


In [None]:
df_out = get_metrics_for_group("gaming")
df_out.sort_values(['metric', 'recall'])

| | model | metric | precision | recall | fmeasure |
|---|---|---|---|---|---|
| 0 | Baseline Model | rouge1 | 0.127897 | 0.154529 | 0.127278 |
| 12 | Manual summary | rouge1 | 0.173434 | 0.195135 | 0.151573 |
| 4 | BART trained on full data | rouge1 | 0.22292 | 0.195544 | 0.17865 |
| 8 | BART trained on subreddit groups | rouge1 | 0.236166 | 0.232056 | 0.196766 |
| 1 | Baseline Model | rouge2 | 0.005838 | 0.011686 | 0.007261 |
| 13 | Manual summary | rouge2 | 0.030495 | 0.047576 | 0.035233 |
| 9 | BART trained on subreddit groups | rouge2 | 0.053817 | 0.059974 | 0.049596 |
| 5 | BART trained on full data | rouge2 | 0.058507 | 0.077482 | 0.055774 |
| 2 | Baseline Model | rougeL | 0.10134 | 0.12443 | 0.102754 |
| 14 | Manual summary | rougeL | 0.136311 | 0.15518 | 0.119476 |
