<h3>Requirements and import</h3>

In [None]:
!git clone https://www.github.com/GEM-benchmark/NL-Augmenter
!cd /home/uccollab/Desktop/NL-Augmenter
!python setup.py sdist
!pip install -e .

In [None]:
!pip install -r requirements.txt --quiet
!pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz

In [None]:
!pip install torch==1.7.1
!pip install checklist
!pip install sacrebleu
!pip install torchtext==0.8.0
!pip install benepar

In [None]:
!pip install sacremoses

In [None]:
'''This is necessary on some devices to avoid conflicts in NL-augmenter'''
cd /path/to/NL-Augmenter/
cd /path/to/NL-Augmenter/nlaugmenter/transformations/factive_verb_transformation

In [3]:
import nlaugmenter

In [4]:
from nlaugmenter.transformations.factive_verb_transformation import FactiveVerbTransformation
from nlaugmenter.transformations.formality_change import Formal2Casual
from nlaugmenter.transformations.replace_with_hyponyms_hypernyms import ReplaceHypernyms
from nlaugmenter.transformations.style_paraphraser import StyleTransferParaphraser
from nlaugmenter.transformations.synonym_substitution import SynonymSubstitution
from nlaugmenter.transformations.back_translation import BackTranslation
from nlaugmenter.transformations.filler_word_augmentation import FillerWordAugmentation
from nlaugmenter.transformations.protaugment_diverse_paraphrase import ProtaugmentDiverseParaphrase
from nlaugmenter.transformations.slangificator import Slangificator
from nlaugmenter.transformations.factive_verb_transformation import *

2022-05-20 23:13:08.710371: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1


In [5]:
import itertools, re, time
import pprint as pp
import pandas as pd
import ast
import copy
import csv

<h3>Wrapping augmentation techniques that require utterance splitting</h3>

In [6]:
#replaces common nouns with other related words that are either hyponyms or hypernyms
class HyperNymAugmentation():
    
    def generate(self, utterance):
        tr = ReplaceHypernyms()
        result = []
        if ". " in utterance:
            for sentence in utterance.split(". "):
                aug_pool = tr.generate(sentence)
                for aug_sentence in aug_pool:
                    result.append(aug_sentence)
        else:
            aug_pool = tr.generate(utterance)
            for aug_sentence in aug_pool:
                result.append(aug_sentence)
        
        result = list(itertools.chain.from_iterable(result))
        result = "".join(result).replace('\\','')
        result = re.sub('\n+', ' ', result)
        return result

In [7]:
#replaces common nouns with other related words that are either hyponyms or hypernyms
class SentenceAdd():
    def generate(self, utterance):
        tr = SentenceAdditions()
        result = []
        if ". " in utterance:
            for sentence in utterance.split(". "):
                    result.append(tr.generate(sentence))
        else:
            result.append(tr.generate(utterance))
        result = list(itertools.chain.from_iterable(result))
        result = "".join(result).replace('\\','')
        result = re.sub('\n+', ' ', result)
        return result

<h3>Opening AnnoMI and pre-processing</h3>

In [None]:
'''PRE-PROCESSING
- Loading AnnoMI
- Filter on therapist utterances
- Mapping therapy quality to int
- cutting out short utterances
''' 
df= pd.read_csv("/path/to/dataset.csv")
df = df[df['interlocutor'] == 'therapist']
df['mi_quality'] = df['mi_quality'].map({'high':1, "low":0})
df = df[df.columns[df.columns.isin(['mi_quality', 'utterance_text', 'topic'])]]
df = df[df['utterance_text'].apply(lambda x: len(x.split()) >5 )]

In [None]:
'''Sampling 400 HQ utterances and 100 LW ones for testset''' 

#### import random
import pprint

hq = df[df['mi_quality']==1].to_dict('index')
lq = df[df['mi_quality']==0].to_dict('index')
df = df.to_dict('index')

hq = [v for k,v in hq.items()]
lq = [v for k,v in lq.items()]
df = [v for k,v in df.items()]

hq = random.sample(hq, 400)
lq = random.sample(lq,100)

indexes = []
hq_utter = [el['utterance_text'] for el in hq]
lq_utter = [el['utterance_text'] for el in lq]

i = 0
for el in df:
    utterance = el['utterance_text']
    if utterance in hq_utter or utterance in lq_utter:
        indexes += [i]
    i +=1

df = [v for i,v in enumerate(df) if i not in frozenset(indexes)] 

df = pd.DataFrame(df)
hq = pd.DataFrame(hq)
lq = pd.DataFrame(lq)
test_set = pd.concat([hq,lq]).reset_index(drop=True)


In [None]:
'''Saving the files''' 

test_set.to_csv('/path/to/test-set.csv')
df.to_csv('/path/to/anno_mi_therapist_only.csv')

<h3>Instantiating the augmentation pipeline</h3>

In [8]:
cd /home/uccollab/Desktop/NL-Augmenter/nlaugmenter/transformations/factive_verb_transformation

/home/uccollab/Desktop/NL-Augmenter/nlaugmenter/transformations/factive_verb_transformation


In [9]:
'''Check below for information on each element. Computationally intensive ones are omitted.'''

start_time = time.time()
print("Instantiating augmenters, this may take a while...")
augmenters = [(FactiveVerbTransformation(),"FactiveVerb"), 
              (Formal2Casual(),"Formal2Casual"), 
              #(LostInTranslation(),"LostInTranslation"), 
              (HyperNymAugmentation(),"Hypernym substitution"),
              (StyleTransferParaphraser(style="Basic"),"Basic style"),
              (StyleTransferParaphraser(style="Tweets"),"Tweet style"),
              (SynonymSubstitution(),"Synonym substitution"),
              (BackTranslation(),"Backtranslation"),
              (FillerWordAugmentation(),"Filler Word"),
              (ProtaugmentDiverseParaphrase(),"ProtAugment"),
              #(SentenceAdd(),"Sentence Add"),
              (Slangificator(),"Slangificator"), 
              #(ParaphraseSowReap(max_outputs=4),"Sow Reap")
             ]
print(f"Augmenters instantiated ({round(time.time()-start_time,2)} seconds)")

Instantiating augmenters, this may take a while...


Some weights of the model checkpoint at filco306/gpt2-base-style-paraphraser were not used when initializing GPT2LMHeadModel: ['transformer.extra_embedding_project.weight', 'transformer.extra_embedding_project.bias']
- This IS expected if you are initializing GPT2LMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPT2LMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at filco306/gpt2-tweet-paraphraser were not used when initializing GPT2LMHeadModel: ['transformer.extra_embedding_project.weight', 'transformer.extra_embedding_project.bias']
- This IS expected if you are initializing GPT2LMHeadModel from the checkpoint of a model trained on anoth

Augmenters instantiated (54.75 seconds)


<h3>Target-based augmentation</h3>
The following is a naive augmentation, with the goal of balancing HQ and LQ utterances. It iterates the dataset and augments the minority class (LQ in the case of AnnoMI) until balance is reached.

This procedure results in the <b>Anno-AugMI</b> dataset

In [None]:
df= pd.read_csv("/path/to/anno_mi_therapist_only.csv")

df_lo = df[df["mi_quality"]==0]
df_hi = df[df["mi_quality"]==1]

print(f'LQ: {df_lo.shape[0]}')
print(f'HQ: {df_hi.shape[0]}')

df_lo = df_lo.to_dict('index')
df_lo = [v for k,v in df_lo.items()]
print(f'Total utter.: {len(df)}')

In [None]:
'''For analysis and debugging reasons, this procedure introduces an additional column, indicating
for each augmentation, the pipeline element that produced it.'''

augmentation_result = []
aug_counter = 0
row_counter = 0
aug_data = {}
print("Beginning augmentation")
for row in df_lo: 
    used_augmenters = []
    utterance = row['utterance_text']
    row_counter += 1
    final_result = []
    print(f"Augmenting row {row_counter}/{len(df_lo)}")
    print(f"\n{utterance}\n")
    row_start_time = time.time()
    for augmenter, name in augmenters:
        aug_counter += 1
        aug_start_time = time.time()
        try:
            aug_result = augmenter.generate(utterance)
            if type(aug_result) == list:
                for aug_utterance in aug_result:
                    if aug_utterance != utterance:
                        aug_utterance = aug_utterance.replace('\\','')
                        aug_utterance = re.sub('\+', ' ', aug_utterance)
                        final_result.append(aug_utterance)
                        used_augmenters.append(name)
            elif aug_result != utterance:
                aug_result = aug_result.replace('\\','')
                final_result.append(aug_result)
                used_augmenters.append(name)
            print(f"Concluded {name} ({aug_counter}/{len(augmenters)}; {round(time.time()-aug_start_time,2)} seconds).")
        except:
            print(f"Failed {name} ({aug_counter}/{len(augmenters)}; {round(time.time()-aug_start_time,2)} seconds).")
    print(f"Row augmented ({round(time.time()-row_start_time,2)} seconds).")
    aug_counter = 0
    augmented_row = {}
    augmented_row.update(row)
    augmented_row.update({"augmentation":final_result})
    augmented_row.update({"augmenters": used_augmenters})
    aug_data.update(augmented_row)
    if(row_counter == 1):
        csv_file = open('/path/to/annoMI_therapist_augmented_WITH_AUGMENT_INFO.csv', mode='w')
        fieldnames = [el for el in augmented_row.keys()]
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerow(augmented_row)
        csv_file.close()   
    else:
        csv_file = open('/path/to/annoMI_therapist_augmented_WITH_AUGMENT_INFO.csv', mode='a')
        fieldnames = [el for el in augmented_row.keys()]
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writerow(augmented_row)
        csv_file.close()
        



In [None]:
'''Saving Anno-AUGMI'''
aug = pd.read_csv('path/to/annoMI_therapist_augmented_WITH_AUGMENT_INFO.csv')

final = []
for index, row in aug.iterrows():
    aug_list = ast.literal_eval(row['augmentation'])
    for single_augmentation in aug_list:
        new_row = copy.deepcopy(row)
        new_row['utterance_text'] = str(single_augmentation)
        final.append(new_row)

df_aug = pd.DataFrame(final)
anno_augmi = df.append(df_aug, ignore_index=True)
anno_augmi = anno_augmi[['utterance_text','mi_quality','topic']]
anno_augmi.to_csv('path/to/anno_augmi_therapist_only.csv')

<h3>Pre-processing (necessary for Fairness-aware augmentation only)</h3>
The next augmentation takes a sensitive variable as the target. In this case we chose therapy topics, meaning that we will try to obtain the same amount of HQ and LQ utterances regardless of the topics. If this is impossible for some topics (e.g.: there are no LQ or no HQ utterances at all) they will be automatically excluded.

This procedure results in the <b>Anno-FairMI</b> dataset

In [None]:
df= pd.read_csv("path/to/anno_mi_therapist_only.csv")

In [11]:
#aggregating mi_quality sentences by topic (to inspect topics that should be cut out)
out = (df.groupby('topic')['mi_quality']
   .value_counts()
   .sort_index()
   .to_frame(name='Utterances no.')
).reset_index()

In [12]:
out

Unnamed: 0,topic,mi_quality,Utterances no.
0,Being assertive with flatmate about moving out,1,18
1,anxiety management,1,27
2,asthma management,0,9
3,asthma management,1,52
4,avoiding DOI,1,89
5,better oral health,0,10
6,better oral health,1,13
7,birth control,1,10
8,changing approach to disease,1,60
9,charging battery,1,12


In [13]:
#moving in dict to exclude topics that lacks high or low quality examples
topics_stats = {}
for row in out.iterrows():
    topic = row[1][0]
    quality = row[1][1]
    count = row[1][2]
    if topic in topics_stats.keys():
        prev = topics_stats.get(topic)
        hi = prev['hi']
        lo = prev['lo']
        topics_stats.update({topic: {"hi":hi+count, "lo":lo}} if quality == 1 else {topic: {"hi":hi, "lo":lo+count}})
    else:
        topics_stats.update({topic: {"hi":count, "lo":0}} if quality == 1 else {topic: {"hi":0, "lo":count}})

In [14]:
usable_topics = dict(filter(lambda x: x[1]['hi'] != 0 and x[1]['lo'] != 0, 
                      topics_stats.items()))
pp.pprint(usable_topics)

{'asthma management': {'hi': 52, 'lo': 9},
 'better oral health': {'hi': 13, 'lo': 10},
 'compliance with rules': {'hi': 50, 'lo': 5},
 'diabetes management': {'hi': 133, 'lo': 4},
 'more exercise / increasing activity': {'hi': 113, 'lo': 3},
 'providing information on medicines': {'hi': 4, 'lo': 3},
 'reducing alcohol consumption': {'hi': 413, 'lo': 99},
 'smoking cessation': {'hi': 89, 'lo': 47},
 'smoking cessation ': {'hi': 142, 'lo': 4},
 'smoking cessation; reducing alcohol consumption': {'hi': 15, 'lo': 4},
 'taking medicine / following medical procedure': {'hi': 106, 'lo': 31}}


In [15]:
useless_topics = dict(filter(lambda x: x[1]['hi'] == 0 or x[1]['lo'] == 0, 
                      topics_stats.items()))
pp.pprint(useless_topics)

{'Being assertive with flatmate about moving out': {'hi': 18, 'lo': 0},
 'anxiety management': {'hi': 27, 'lo': 0},
 'avoiding DOI': {'hi': 89, 'lo': 0},
 'birth control': {'hi': 10, 'lo': 0},
 'changing approach to disease': {'hi': 60, 'lo': 0},
 'charging battery': {'hi': 12, 'lo': 0},
 'completion of community service': {'hi': 16, 'lo': 0},
 'diagnosis': {'hi': 0, 'lo': 5},
 'diet; reducing alcohol consumption; diabetes management': {'hi': 18, 'lo': 0},
 'engaging in community activities': {'hi': 4, 'lo': 0},
 'increasing activity; taking medicine / following medical procedure': {'hi': 31,
                                                                        'lo': 0},
 'increasing self-confidence': {'hi': 18, 'lo': 0},
 'managing life': {'hi': 24, 'lo': 0},
 'more exercise / increasing activity; weight loss': {'hi': 29, 'lo': 0},
 'not getting into a car with someone who is under the influence of drugs or alcohol': {'hi': 0,
                                                        

In [16]:
#filtering topics
df = df[df['topic'].isin(usable_topics.keys())]

In [17]:
df

Unnamed: 0.1,Unnamed: 0,mi_quality,topic,utterance_text
0,0,1,reducing alcohol consumption,Thanks for filling it out. We give this form t...
1,1,1,reducing alcohol consumption,"So, let's see. It looks that you put-- You dri..."
2,2,1,reducing alcohol consumption,Okay. That's at least 12 drinks a week.
3,3,1,reducing alcohol consumption,"Uh, what else can you tell me about your drink..."
4,4,1,reducing alcohol consumption,"Okay. So, can I share with you some informatio..."
...,...,...,...,...
2489,2489,0,smoking cessation,Okay. So a patch is something you've already s...
2490,2490,0,smoking cessation,"-and, again, I'll get some out and I'll show y..."
2491,2491,0,smoking cessation,And that's exactly one of the symptoms of nico...
2492,2492,0,smoking cessation,"So, I would strongly recommend that you have t..."


In [18]:
'''must be assessed by looking at the output above: in our case we chose therapy topic and
the highest available number of utterances (either HQ or LQ, the highest) is 413 for <<reducing alcohol consumption>>'''
target = 413

<h3> Fairness aware augmentation (augmenting to balance target variable for each sensitive variable group) </h3>

In [19]:
augmentation_result = []
aug_counter = 0
row_counter = 0
aug_data = {}
print("Beginning augmentation")

hi_augmentations = []
lo_augmentations = []
for category in list(usable_topics.items()):
    topic = category[0]
    hi = category[1].get('hi')
    lo = category[1].get('lo')
    hi_completed = False
    lo_completed = False
    hi_augmentations = []
    lo_augmentations = []
    while(not (hi_completed and lo_completed)):
        row_counter = 1
        rows = df[df['topic'] == topic]
        
        for row in rows.iterrows():
            row_counter += 1
            utterance = row[1]['utterance_text']
            quality = row[1]['mi_quality']

            if hi_completed and lo_completed:
                break
            elif (quality==1 and hi_completed) or (quality==0 and lo_completed):
                print(f'skipping row {row_counter}: quality target reached')
                continue

            print(f'Working on topic "{topic}"')
            print(f"Augmenting row {row_counter}/{len(rows)}")
            print(f"\n{utterance}\n")
            row_start_time = time.time()
            aug_counter = 0
            for augmenter, name in augmenters:
                aug_counter += 1
                aug_start_time = time.time()
                try:
                    aug_result = augmenter.generate(utterance)
                    if type(aug_result) == list:
                        for aug_utterance in aug_result:
                            if aug_utterance != utterance:
                                aug_utterance = aug_utterance.replace('\\','')
                                aug_utterance = re.sub('\+', ' ', aug_utterance)
                                augmented_row = copy.deepcopy(row[1])
                                augmented_row['utterance_text'] = aug_utterance
                                if quality == 1:
                                    hi_augmentations.append(augmented_row)
                                else:
                                    lo_augmentations.append(augmented_row)
                    elif aug_result != utterance:
                        aug_result = aug_result.replace('\\','')
                        augmented_row = copy.deepcopy(row[1])
                        augmented_row['utterance_text'] = aug_utterance
                        if quality == 1:
                            hi_augmentations.append(augmented_row)
                        else:
                            lo_augmentations.append(augmented_row)
                    print(f"Concluded {name} ({aug_counter}/{len(augmenters)}; {round(time.time()-aug_start_time,2)} seconds).")
                except:
                    print(f"Failed {name} ({aug_counter}/{len(augmenters)}; {round(time.time()-aug_start_time,2)} seconds).")
            print(f"Row augmented ({round(time.time()-row_start_time,2)} seconds).")
            
            print(f'For current topic: ')
            print(f'\t - started from {len(rows)}')
            print(f'\t - HQ examples at this point: {len(hi_augmentations) + hi}')
            print(f'\t - LQ examples at this point: {len(lo_augmentations) + lo}')
            print(f'\t - Overall, {len(lo_augmentations) + lo + len(hi_augmentations) + hi}/{target*2}')
            if len(hi_augmentations) + hi >= target:
                print(f'-> HIGH QUALITY TARGET REACHED')
            if len(lo_augmentations) + lo >= target:
                print(f'-> LOW QUALITY TARGET REACHED')
            
            if len(hi_augmentations) + hi >= target:
                hi_completed = True
            if len(lo_augmentations) + lo >= target:
                lo_completed = True
                
        if not hi_completed:
            while (len(hi_augmentations) + hi) < target:
                hi_augmentations += copy.deepcopy(hi_augmentations)
            if (len(hi_augmentations)+hi) > target:
                while (len(hi_augmentations) + hi) > target:
                    hi_augmentations.pop()
            hi_completed = True
        if not lo_completed:
            while (len(lo_augmentations) + lo) < target:
                lo_augmentations += copy.deepcopy(lo_augmentations)
            if (len(lo_augmentations)+lo) > target:
                while (len(lo_augmentations) + lo) > target:
                    lo_augmentations.pop()
            lo_completed = True
        if hi_completed and lo_completed:
            df = df.append(hi_augmentations, ignore_index=True)
            df = df.append(lo_augmentations, ignore_index=True)
            
            
    

Beginning augmentation
Working on topic "asthma management"
Augmenting row 2/61

Hi, Sal. Thanks for coming in today.

Failed FactiveVerb (1/10; 0.01 seconds).




Concluded Formal2Casual (2/10; 1.18 seconds).
Failed Hypernym substitution (3/10; 0.84 seconds).
Concluded Basic style (4/10; 11.69 seconds).
Concluded Tweet style (5/10; 11.4 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Concluded Backtranslation (7/10; 1.79 seconds).
Concluded Filler Word (8/10; 0.0 seconds).




Concluded ProtAugment (9/10; 0.65 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (27.59 seconds).
For current topic: 
	 - started from 61
	 - HQ examples at this point: 62
	 - LQ examples at this point: 9
	 - Overall, 71/826
Working on topic "asthma management"
Augmenting row 3/61

I saw that your doctor made the referral and, um, and I really appreciate you taking the time to come and make that appointment.

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.21 seconds).
Failed Hypernym substitution (3/10; 0.56 seconds).
Concluded Basic style (4/10; 8.2 seconds).
Concluded Tweet style (5/10; 8.37 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 3.27 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.11 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (23.74 seconds).
For current topic: 
	 - started from 61
	 - HQ examples at this point: 7

Concluded Formal2Casual (2/10; 2.31 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 9.38 seconds).
Concluded Tweet style (5/10; 8.24 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.65 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.78 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (23.96 seconds).
For current topic: 
	 - started from 61
	 - HQ examples at this point: 187
	 - LQ examples at this point: 9
	 - Overall, 196/826
Working on topic "asthma management"
Augmenting row 14/61

Right. Right, right. So the appointments and time.

Failed FactiveVerb (1/10; 0.0 seconds).
Concluded Formal2Casual (2/10; 1.22 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 12.32 seconds).
Concluded Tweet style (5/10; 12.39 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.

Concluded ProtAugment (9/10; 2.19 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (37.87 seconds).
For current topic: 
	 - started from 61
	 - HQ examples at this point: 219
	 - LQ examples at this point: 77
	 - Overall, 296/826
Working on topic "asthma management"
Augmenting row 23/61

Hi, I'm Bruce Berger. I work in the clinic and, uh, I noticed that you're in to pick up a prescription for Sarah's, uh, rescue inhaler-

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.61 seconds).
Failed Hypernym substitution (3/10; 0.6 seconds).
Concluded Basic style (4/10; 11.38 seconds).
Concluded Tweet style (5/10; 11.4 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 4.17 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.69 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (31.87 seconds).
For current topic: 
	 - started from 61
	 - HQ examples at t

Concluded Formal2Casual (2/10; 2.77 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 12.46 seconds).
Concluded Tweet style (5/10; 12.38 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 4.7 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 2.54 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (35.45 seconds).
For current topic: 
	 - started from 61
	 - HQ examples at this point: 323
	 - LQ examples at this point: 77
	 - Overall, 400/826
Working on topic "asthma management"
Augmenting row 32/61

Do you mind if I talk to you a little bit about what your daughter's lungs are like when she's having these inflammations and needing to go to the emergency room?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.52 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 8.23 seconds).
Concluded Twe

Concluded Formal2Casual (2/10; 3.18 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 12.42 seconds).
Concluded Tweet style (5/10; 12.4 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 8.62 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.4 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (38.63 seconds).
For current topic: 
	 - started from 61
	 - HQ examples at this point: 415
	 - LQ examples at this point: 89
	 - Overall, 504/826
-> HIGH QUALITY TARGET REACHED
Working on topic "asthma management"
Augmenting row 44/61

Okay, I, you know, I-I don't want to argue with you about this. I-I know that if you-you know, and if you want to keep Sarah out of the emergency room, uh, I-I just wish you'd heed what I'm saying. She needs to use that chronic inhaler every day to get her- to get her asthma under control, and the smoke on your clothes and the smo

  df = df.append(hi_augmentations, ignore_index=True)
  df = df.append(lo_augmentations, ignore_index=True)


Working on topic "better oral health"
Augmenting row 2/23

You got your lip pierced. How long ago did you get that done?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 1.57 seconds).
Failed Hypernym substitution (3/10; 0.74 seconds).
Concluded Basic style (4/10; 11.42 seconds).
Concluded Tweet style (5/10; 11.43 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.7 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.73 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (27.61 seconds).
For current topic: 
	 - started from 23
	 - HQ examples at this point: 24
	 - LQ examples at this point: 10
	 - Overall, 34/826
Working on topic "better oral health"
Augmenting row 3/23

What made you decide to get-get a lip piercing?

Failed FactiveVerb (1/10; 0.0 seconds).
Concluded Formal2Casual (2/10; 1.35 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Ba

Concluded Formal2Casual (2/10; 1.64 seconds).
Failed Hypernym substitution (3/10; 0.59 seconds).
Concluded Basic style (4/10; 11.49 seconds).
Concluded Tweet style (5/10; 11.51 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.7 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.74 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (27.69 seconds).
For current topic: 
	 - started from 23
	 - HQ examples at this point: 146
	 - LQ examples at this point: 10
	 - Overall, 156/826
Working on topic "better oral health"
Augmenting row 14/23

-with me, and also before I do the exam, you know, I'm always on your case about flossing, so maybe we can take [inaudible 00:02:46]-

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.65 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 8.28 seconds).
Concluded Tweet style (5/10; 8.22 seconds

Concluded ProtAugment (9/10; 2.87 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (35.24 seconds).
For current topic: 
	 - started from 23
	 - HQ examples at this point: 157
	 - LQ examples at this point: 113
	 - Overall, 270/826
Working on topic "better oral health"
Augmenting row 24/23

Okay. We'll continue with the exam, but you know where I stand on this. Okay?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 1.83 seconds).
Failed Hypernym substitution (3/10; 0.59 seconds).
Concluded Basic style (4/10; 12.39 seconds).
Concluded Tweet style (5/10; 12.43 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.24 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.85 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (30.35 seconds).
For current topic: 
	 - started from 23
	 - HQ examples at this point: 157
	 - LQ examples at this point: 124
	 - 

Concluded Formal2Casual (2/10; 2.84 seconds).
Failed Hypernym substitution (3/10; 0.77 seconds).
Concluded Basic style (4/10; 12.47 seconds).
Concluded Tweet style (5/10; 12.49 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 6.96 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 2.51 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (38.08 seconds).
For current topic: 
	 - started from 55
	 - HQ examples at this point: 94
	 - LQ examples at this point: 62
	 - Overall, 156/826
Working on topic "compliance with rules"
Augmenting row 11/55

Hi, Mary, how was your day?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 1.1 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 8.3 seconds).
Concluded Tweet style (5/10; 8.27 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.13 seconds).
Conc

Concluded Basic style (4/10; 8.32 seconds).
Concluded Tweet style (5/10; 8.3 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.07 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.98 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (22.03 seconds).
For current topic: 
	 - started from 55
	 - HQ examples at this point: 206
	 - LQ examples at this point: 62
	 - Overall, 268/826
Working on topic "compliance with rules"
Augmenting row 21/55

So you're still kinda concerned that you're not quite sure if you will be able to do this successfully?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.17 seconds).
Concluded Hypernym substitution (3/10; 0.45 seconds).
Concluded Basic style (4/10; 8.35 seconds).
Concluded Tweet style (5/10; 8.29 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.21 seconds).
Concluded Filler Word 

Concluded Formal2Casual (2/10; 3.84 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 12.85 seconds).
Concluded Tweet style (5/10; 12.8 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 12.59 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.93 seconds).
Concluded Slangificator (10/10; 0.02 seconds).
Row augmented (44.63 seconds).
For current topic: 
	 - started from 55
	 - HQ examples at this point: 316
	 - LQ examples at this point: 62
	 - Overall, 378/826
Working on topic "compliance with rules"
Augmenting row 31/55

Okay, so for you getting on time for first hour is something you'd like to work on the next 30 days?

Failed FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.18 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 8.31 seconds).
Concluded Tweet style (5/10; 8.27 seconds).
Failed Synonym substitution (

Concluded Formal2Casual (2/10; 1.23 seconds).
Concluded Hypernym substitution (3/10; 0.45 seconds).
Concluded Basic style (4/10; 11.45 seconds).
Concluded Tweet style (5/10; 11.6 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.28 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.65 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (26.66 seconds).
For current topic: 
	 - started from 137
	 - HQ examples at this point: 133
	 - LQ examples at this point: 15
	 - Overall, 148/826
Working on topic "diabetes management"
Augmenting row 3/137

Mm-hmm. I see- I see that your A1C or your blood glucose is very high.

Concluded FactiveVerb (1/10; 0.01 seconds).
No transfer found!
Concluded Formal2Casual (2/10; 2.32 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 11.57 seconds).
Concluded Tweet style (5/10; 11.44 seconds).
Failed Synonym substitution (6/10;

Concluded Formal2Casual (2/10; 2.47 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 11.42 seconds).
Concluded Tweet style (5/10; 11.36 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.99 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.4 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (30.24 seconds).
For current topic: 
	 - started from 137
	 - HQ examples at this point: 212
	 - LQ examples at this point: 51
	 - Overall, 263/826
Working on topic "diabetes management"
Augmenting row 13/137

So it sounds a lot like, uh, on top of the fact that you've got something you're really scared about having consequences in terms of everything that can happen, what worries you also is all the different things that she told you you're going to need to do to get it under control. Can't imagine doing all those things, so you almost feel like all that bad st

Concluded Formal2Casual (2/10; 1.53 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 8.23 seconds).
Concluded Tweet style (5/10; 8.2 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.61 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.77 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (20.93 seconds).
For current topic: 
	 - started from 137
	 - HQ examples at this point: 315
	 - LQ examples at this point: 51
	 - Overall, 366/826
Working on topic "diabetes management"
Augmenting row 22/137

But before we get to anything, I'll probably want to just ask you a little bit about what's led you to come here today and get a bit of an idea about what's been going on for you.

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.63 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 8.24 seco

Concluded Formal2Casual (2/10; 1.79 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 8.3 seconds).
Concluded Tweet style (5/10; 8.2 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.1 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.93 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (21.92 seconds).
For current topic: 
	 - started from 116
	 - HQ examples at this point: 125
	 - LQ examples at this point: 3
	 - Overall, 128/826
Working on topic "more exercise / increasing activity"
Augmenting row 3/116

Well, I mean, I think you just need to maybe sit down and write down a list of reasons why you need to get back to the gym, and then on the other side of the list, put down, you know, um, ideas about what's going to help you get there.

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.74 seconds).
Failed Hypernym subst

Concluded Formal2Casual (2/10; 1.25 seconds).
Failed Hypernym substitution (3/10; 0.76 seconds).
Concluded Basic style (4/10; 8.29 seconds).
Concluded Tweet style (5/10; 8.25 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.26 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.71 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (20.55 seconds).
For current topic: 
	 - started from 116
	 - HQ examples at this point: 238
	 - LQ examples at this point: 3
	 - Overall, 241/826
Working on topic "more exercise / increasing activity"
Augmenting row 13/116

Yeah. What are you really glad about?

Concluded FactiveVerb (1/10; 0.0 seconds).
Concluded Formal2Casual (2/10; 1.11 seconds).
Concluded Hypernym substitution (3/10; 0.46 seconds).
Concluded Basic style (4/10; 11.44 seconds).
Concluded Tweet style (5/10; 11.47 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslat

Concluded Tweet style (5/10; 8.21 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 3.15 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.32 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (24.17 seconds).
For current topic: 
	 - started from 116
	 - HQ examples at this point: 351
	 - LQ examples at this point: 3
	 - Overall, 354/826
Working on topic "more exercise / increasing activity"
Augmenting row 23/116

-might have probably helped you break your fall this time.

Failed FactiveVerb (1/10; 0.0 seconds).
Concluded Formal2Casual (2/10; 1.29 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 8.23 seconds).
Concluded Tweet style (5/10; 8.19 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.47 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.73 seconds).
Concluded Slang

Concluded Formal2Casual (2/10; 2.23 seconds).
Failed Hypernym substitution (3/10; 0.59 seconds).
Concluded Basic style (4/10; 11.43 seconds).
Concluded Tweet style (5/10; 11.41 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.2 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.78 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (27.66 seconds).
For current topic: 
	 - started from 116
	 - HQ examples at this point: 415
	 - LQ examples at this point: 14
	 - Overall, 429/826
-> HIGH QUALITY TARGET REACHED
Working on topic "more exercise / increasing activity"
Augmenting row 96/116

-feel like this is something that you need to do soon because, um, a lot of times, you're just a lot of talk. And then, there's no action. And if you really wanna see results, I think it's something that needs to be done. And, you know, stop being lazy about it and just do it.

Concluded FactiveVerb (1/10; 0.0

Concluded ProtAugment (9/10; 1.67 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (36.04 seconds).
For current topic: 
	 - started from 7
	 - HQ examples at this point: 38
	 - LQ examples at this point: 37
	 - Overall, 75/826
Working on topic "providing information on medicines"
Augmenting row 8/7

Oh, that I'm not sure of. Um, can I get back to you later on this afternoon with that answer?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.32 seconds).
Failed Hypernym substitution (3/10; 0.6 seconds).
Concluded Basic style (4/10; 11.39 seconds).
Concluded Tweet style (5/10; 11.47 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.22 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.01 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (29.03 seconds).
For current topic: 
	 - started from 7
	 - HQ examples at this point: 50
	 - LQ examples 

Concluded Formal2Casual (2/10; 1.43 seconds).
Failed Hypernym substitution (3/10; 0.59 seconds).
Concluded Basic style (4/10; 11.45 seconds).
Concluded Tweet style (5/10; 11.45 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.84 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.64 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (27.41 seconds).
For current topic: 
	 - started from 512
	 - HQ examples at this point: 425
	 - LQ examples at this point: 168
	 - Overall, 593/826
-> HIGH QUALITY TARGET REACHED
Working on topic "reducing alcohol consumption"
Augmenting row 61/512

Okay. Well, I'm concerned about the amount that you're drinking currently. It sounds like you're having about 15 to 20 drinks per week, and that's definitely more than what we would recommend and could certainly be contributing to your high blood pressure.

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Form

Concluded Formal2Casual (2/10; 2.54 seconds).
Failed Hypernym substitution (3/10; 0.76 seconds).
Concluded Basic style (4/10; 8.22 seconds).
Concluded Tweet style (5/10; 8.19 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 3.79 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.42 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (24.96 seconds).
For current topic: 
	 - started from 512
	 - HQ examples at this point: 425
	 - LQ examples at this point: 269
	 - Overall, 694/826
-> HIGH QUALITY TARGET REACHED
Working on topic "reducing alcohol consumption"
Augmenting row 74/512

Okay. So, you know, as we said before, you're drinking-- it sounds like you're drinking about 20 drinks a week, over the course of a few days. And I did want to point out to you that the recommended limit for women is actually no more than seven drinks a week, and no more than three on a given day.

Concluded Factive

Concluded Formal2Casual (2/10; 3.27 seconds).
Failed Hypernym substitution (3/10; 0.59 seconds).
Concluded Basic style (4/10; 13.22 seconds).
Concluded Tweet style (5/10; 13.22 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 9.1 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 4.39 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (43.83 seconds).
For current topic: 
	 - started from 512
	 - HQ examples at this point: 425
	 - LQ examples at this point: 363
	 - Overall, 788/826
-> HIGH QUALITY TARGET REACHED
Working on topic "reducing alcohol consumption"
Augmenting row 227/512

Wow. That's a long time. What other instruments?

Failed FactiveVerb (1/10; 0.0 seconds).
Concluded Formal2Casual (2/10; 1.35 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 12.42 seconds).
Concluded Tweet style (5/10; 12.32 seconds).
Failed Synonym substitution (6/10; 0.01

Concluded ProtAugment (9/10; 1.15 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (32.05 seconds).
For current topic: 
	 - started from 136
	 - HQ examples at this point: 143
	 - LQ examples at this point: 47
	 - Overall, 190/826
Working on topic "smoking cessation"
Augmenting row 7/136

-the unpleasantness of that. Okay. What else might be on the, not so good side about smoking- about stopping?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.3 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 12.43 seconds).
Concluded Tweet style (5/10; 12.43 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.78 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.09 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (31.63 seconds).
For current topic: 
	 - started from 136
	 - HQ examples at this point: 155
	 - LQ e

Concluded Formal2Casual (2/10; 1.72 seconds).
Failed Hypernym substitution (3/10; 0.59 seconds).
Concluded Basic style (4/10; 11.5 seconds).
Concluded Tweet style (5/10; 11.43 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.83 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.74 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (27.83 seconds).
For current topic: 
	 - started from 136
	 - HQ examples at this point: 265
	 - LQ examples at this point: 47
	 - Overall, 312/826
Working on topic "smoking cessation"
Augmenting row 18/136

Okay. If I asked you to pick a number, 0 through 10, 0 meaning you're not very confident at all in meeting your goal, or 10 meaning you're very confident, what number would you pick?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.61 seconds).
Failed Hypernym substitution (3/10; 0.77 seconds).
Concluded Basic style (4/10; 11.48 

Concluded ProtAugment (9/10; 0.6 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (20.2 seconds).
For current topic: 
	 - started from 136
	 - HQ examples at this point: 373
	 - LQ examples at this point: 47
	 - Overall, 420/826
Working on topic "smoking cessation"
Augmenting row 28/136

You-- Sounds like you did that, and that allowed you to really focus on the time [unintelligible 00:00:57] delaying your cigarettes. So-so that's gone well. So, what are the things that are not going as well as you'd liked?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.75 seconds).
Failed Hypernym substitution (3/10; 0.78 seconds).
Concluded Basic style (4/10; 12.44 seconds).
Concluded Tweet style (5/10; 12.44 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 6.28 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.26 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
R

Concluded Formal2Casual (2/10; 1.76 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 11.42 seconds).
Concluded Tweet style (5/10; 11.83 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.91 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.78 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (28.3 seconds).
For current topic: 
	 - started from 136
	 - HQ examples at this point: 417
	 - LQ examples at this point: 104
	 - Overall, 521/826
-> HIGH QUALITY TARGET REACHED
Working on topic "smoking cessation"
Augmenting row 96/136

Good. Here, take a look, see what you think. And that feeling-

Failed FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 1.79 seconds).
Failed Hypernym substitution (3/10; 0.61 seconds).
Concluded Basic style (4/10; 12.46 seconds).
Concluded Tweet style (5/10; 12.42 seconds).
Failed Synonym substitution (6/10; 0

Concluded Formal2Casual (2/10; 2.65 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 12.65 seconds).
Concluded Tweet style (5/10; 12.49 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 3.84 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.71 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (33.93 seconds).
For current topic: 
	 - started from 136
	 - HQ examples at this point: 417
	 - LQ examples at this point: 217
	 - Overall, 634/826
-> HIGH QUALITY TARGET REACHED
Working on topic "smoking cessation"
Augmenting row 106/136

Okay. Well, as I said, there's a lot of irritation, in particular, like I was mentioning, up here in the roof of your mouth, it's turned really definitely white-colored and there's little red spots. And you probably didn't notice because we don't spend a lot of time looking at the roof of our mouth, but you're really damagin

Concluded ProtAugment (9/10; 1.01 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (29.21 seconds).
For current topic: 
	 - started from 136
	 - HQ examples at this point: 417
	 - LQ examples at this point: 311
	 - Overall, 728/826
-> HIGH QUALITY TARGET REACHED
Working on topic "smoking cessation"
Augmenting row 114/136

What have you got in mind, what do you want from us?

Concluded FactiveVerb (1/10; 0.0 seconds).
Concluded Formal2Casual (2/10; 1.4 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 8.54 seconds).
Concluded Tweet style (5/10; 8.27 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.44 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.79 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (21.05 seconds).
For current topic: 
	 - started from 136
	 - HQ examples at this point: 417
	 - LQ examples at this point: 322

Concluded Formal2Casual (2/10; 1.68 seconds).
Concluded Hypernym substitution (3/10; 0.44 seconds).
Concluded Basic style (4/10; 11.49 seconds).
Concluded Tweet style (5/10; 11.51 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.12 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.75 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (28.0 seconds).
For current topic: 
	 - started from 146
	 - HQ examples at this point: 153
	 - LQ examples at this point: 4
	 - Overall, 157/826
Working on topic "smoking cessation "
Augmenting row 3/146

So, let me ask you this, Bob, what are some of your wife's concerns about your smoking?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 1.86 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 8.28 seconds).
Concluded Tweet style (5/10; 8.24 seconds).
Failed Synonym substitution (6/10; 0.01 

Concluded Backtranslation (7/10; 4.12 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.94 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (32.36 seconds).
For current topic: 
	 - started from 146
	 - HQ examples at this point: 254
	 - LQ examples at this point: 4
	 - Overall, 258/826
Working on topic "smoking cessation "
Augmenting row 12/146

If you wanted to bring your wife along, I'd be more than happy to talk with both of you.

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.01 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 8.3 seconds).
Concluded Tweet style (5/10; 8.51 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.02 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.13 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (22.56 seconds).
For current top

Concluded Formal2Casual (2/10; 1.67 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 11.51 seconds).
Concluded Tweet style (5/10; 11.46 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.82 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.82 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (27.88 seconds).
For current topic: 
	 - started from 146
	 - HQ examples at this point: 376
	 - LQ examples at this point: 4
	 - Overall, 380/826
Working on topic "smoking cessation "
Augmenting row 23/146

No, I understand- I understand. And I'm certainly not here to tell you what to-to do, Sarah. I'm interested in hearing your thoughts and sharing some information with you, and what you do with that information is totally up to you. You've got a lot going on. You're a mother, you're working, you're going to school, and our healthcare team wants to make sure 

Concluded Formal2Casual (2/10; 1.59 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 11.55 seconds).
Concluded Tweet style (5/10; 11.51 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.59 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.75 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (27.6 seconds).
For current topic: 
	 - started from 19
	 - HQ examples at this point: 27
	 - LQ examples at this point: 4
	 - Overall, 31/826
Working on topic "smoking cessation; reducing alcohol consumption"
Augmenting row 3/19

Um, well the good news is that the exam's over and I don't think Jake has pneumonia. What I actually think he has is a cold. So there are some things you could do to manage his symptoms. Would you be interested in knowing?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.8 seconds).
Failed Hypernym substit

Concluded Formal2Casual (2/10; 2.69 seconds).
Failed Hypernym substitution (3/10; 0.76 seconds).
Concluded Basic style (4/10; 12.56 seconds).
Concluded Tweet style (5/10; 12.6 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 4.65 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.54 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (34.84 seconds).
For current topic: 
	 - started from 19
	 - HQ examples at this point: 133
	 - LQ examples at this point: 4
	 - Overall, 137/826
Working on topic "smoking cessation; reducing alcohol consumption"
Augmenting row 12/19

So, it sounds like you'd be open to reducing as a way to still enjoy the wine but perhaps alleviate the way you feel in the morning, some of the hangover effects that sounds like you're experiencing.

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.77 seconds).
Failed Hypernym substitution (3/10; 0.58 s

No transfer found!
Concluded Formal2Casual (2/10; 3.23 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 14.97 seconds).
Concluded Tweet style (5/10; 14.9 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 9.35 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.91 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (43.97 seconds).
For current topic: 
	 - started from 19
	 - HQ examples at this point: 191
	 - LQ examples at this point: 52
	 - Overall, 243/826
Working on topic "taking medicine / following medical procedure"
Augmenting row 2/137

Hi, Mrs. Smith. I wanted to let you know that at this appointment, we're gonna be giving Lilly some vaccines.

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.43 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 11.55 seconds).
Concluded Tweet st

Concluded Formal2Casual (2/10; 2.45 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 8.29 seconds).
Concluded Tweet style (5/10; 8.25 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.98 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.35 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (23.92 seconds).
For current topic: 
	 - started from 137
	 - HQ examples at this point: 210
	 - LQ examples at this point: 31
	 - Overall, 241/826
Working on topic "taking medicine / following medical procedure"
Augmenting row 11/137

-and that would be pretty serious. On the other hand, uh, when you found out that this drug might cause muscle weakness, that really alarmed you.

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.57 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 11.56 seconds).


Concluded ProtAugment (9/10; 1.61 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (26.23 seconds).
For current topic: 
	 - started from 137
	 - HQ examples at this point: 312
	 - LQ examples at this point: 31
	 - Overall, 343/826
Working on topic "taking medicine / following medical procedure"
Augmenting row 20/137

Okay, um, so if so where we are right now if I understand is you're feeling more comfortable about the concern about muscle weakness?

Failed FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.42 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 8.27 seconds).
Concluded Tweet style (5/10; 8.22 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 3.28 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 1.05 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (23.82 seconds).
For current topic: 
	 - started fro

Concluded Backtranslation (7/10; 8.35 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 4.04 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (39.24 seconds).
For current topic: 
	 - started from 137
	 - HQ examples at this point: 422
	 - LQ examples at this point: 31
	 - Overall, 453/826
-> HIGH QUALITY TARGET REACHED
skipping row 30: quality target reached
Working on topic "taking medicine / following medical procedure"
Augmenting row 31/137

Uh, I'm here to talk to you about, uh, your cholesterol and the, um, medicine your doctor prescribed.

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.4 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 8.28 seconds).
Concluded Tweet style (5/10; 8.21 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 3.47 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/

Concluded Formal2Casual (2/10; 1.68 seconds).
Failed Hypernym substitution (3/10; 0.57 seconds).
Concluded Basic style (4/10; 11.59 seconds).
Concluded Tweet style (5/10; 11.56 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 2.2 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.88 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (28.5 seconds).
For current topic: 
	 - started from 137
	 - HQ examples at this point: 422
	 - LQ examples at this point: 99
	 - Overall, 521/826
-> HIGH QUALITY TARGET REACHED
Working on topic "taking medicine / following medical procedure"
Augmenting row 114/137

Okay. It looks like cholesterol statin. Do you know how you're supposed to take the medication?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 1.82 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 12.54 seconds).
Concluded Tweet st

Concluded Formal2Casual (2/10; 3.1 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 12.85 seconds).
Concluded Tweet style (5/10; 12.84 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 8.52 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 3.82 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (41.76 seconds).
For current topic: 
	 - started from 137
	 - HQ examples at this point: 422
	 - LQ examples at this point: 210
	 - Overall, 632/826
-> HIGH QUALITY TARGET REACHED
Working on topic "taking medicine / following medical procedure"
Augmenting row 124/137

Okay. What did the doctor tell you cholesterol statin is for?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 1.44 seconds).
Failed Hypernym substitution (3/10; 0.58 seconds).
Concluded Basic style (4/10; 11.58 seconds).
Concluded Tweet style (5/10; 11.57 seconds).
Faile

Concluded Formal2Casual (2/10; 1.46 seconds).
Failed Hypernym substitution (3/10; 0.59 seconds).
Concluded Basic style (4/10; 11.56 seconds).
Concluded Tweet style (5/10; 11.58 seconds).
Failed Synonym substitution (6/10; 0.01 seconds).
Concluded Backtranslation (7/10; 1.66 seconds).
Concluded Filler Word (8/10; 0.0 seconds).
Concluded ProtAugment (9/10; 0.71 seconds).
Concluded Slangificator (10/10; 0.01 seconds).
Row augmented (27.58 seconds).
For current topic: 
	 - started from 137
	 - HQ examples at this point: 422
	 - LQ examples at this point: 316
	 - Overall, 738/826
-> HIGH QUALITY TARGET REACHED
Working on topic "taking medicine / following medical procedure"
Augmenting row 133/137

All right, and he's right. This medication will help lower your cholesterol, and it will help lower your risk of having a heart attack or a stroke. How did the doctor tell you to take this medicine?

Concluded FactiveVerb (1/10; 0.01 seconds).
Concluded Formal2Casual (2/10; 2.67 seconds).
Failed H

In [21]:
#aggregating mi_quality sentences by topic (to inspect topics that should be cut out)
out = (df.groupby('topic')['mi_quality']
   .value_counts()
   .sort_index()
   .to_frame(name='Utterances no.')
).reset_index()
out

Unnamed: 0,topic,mi_quality,Utterances no.
0,asthma management,0,413
1,asthma management,1,415
2,better oral health,0,413
3,better oral health,1,413
4,compliance with rules,0,413
5,compliance with rules,1,423
6,diabetes management,0,413
7,diabetes management,1,422
8,more exercise / increasing activity,0,413
9,more exercise / increasing activity,1,415


In [22]:
df.to_csv('path/to/anno_fairmi_therapist_only.csv')