# Augmentation demo

To use the code without modification, format the dataset with one example per line, each consisting of a label and a sentence separated by a tab (`\t`):

```markdown
LABEL    SENTENCE
```

The file must also end with an empty line, that is, the last example is followed by a newline.
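
The format described above can be produced and checked with a few lines of Python. This is a minimal sketch, not part of the repository: `write_dataset` and `check_dataset` are hypothetical helper names, and collapsing whitespace inside sentences is an assumption made so that a stray tab cannot break the layout.

```python
from pathlib import Path


def write_dataset(rows, path):
    """Write (label, sentence) pairs as tab-separated lines ending with a newline."""
    lines = []
    for label, sentence in rows:
        # Collapse tabs/newlines inside the sentence so each example stays on one line.
        cleaned = " ".join(str(sentence).split())
        lines.append(f"{label}\t{cleaned}")
    # The final "\n" leaves the last line of the file empty.
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def check_dataset(path):
    """Return True if the file ends with a newline and every non-empty line is LABEL<TAB>SENTENCE."""
    text = Path(path).read_text(encoding="utf-8")
    if not text.endswith("\n"):
        return False
    for line in text.splitlines():
        if not line:
            continue
        parts = line.split("\t", 1)
        if len(parts) != 2 or not parts[0] or not parts[1].strip():
            return False
    return True
```

For example, `write_dataset([(1, "It was the best car!")], "sample.txt")` followed by `check_dataset("sample.txt")` should return `True`.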

In [1]:

```python
# !pip install -U nltk

import nltk
nltk.download('omw-1.4')
nltk.download('wordnet')
```

[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /home/youngerous/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /home/youngerous/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

---

## 1. EDA

- EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks (EMNLP 2019 short)
- https://arxiv.org/pdf/1901.11196.pdf
- Args:
    - **input**: input file of unaugmented data
    - **output**: output file of augmented data (default: eda_{input}.txt)
    - **num_aug**: number of augmented sentences per original sentence (default: 9)
    - **alpha_sr**: fraction of words in each sentence to be replaced by **synonyms** (default: 0.1)
    - **alpha_ri**: fraction of words in each sentence to be **inserted** (default: 0.1)
    - **alpha_rs**: fraction of words in each sentence to be **swapped** (default: 0.1)
    - **alpha_rd**: fraction of words in each sentence to be **deleted** (default: 0.1)
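
To illustrate how the `alpha_*` arguments translate into edits, here is a rough sketch of two of the four EDA operations, random swap and random deletion, written from the paper's description rather than copied from `augment.py`. The helper names, the `max(1, ...)` floor on the number of changes, and the explicit `rng` argument are assumptions of this sketch. Synonym replacement and random insertion are omitted because they need WordNet lookups.

```python
import random


def num_changes(alpha, words):
    """Number of words to change: a fraction alpha of the sentence (assumed floor of 1)."""
    return max(1, int(alpha * len(words)))


def random_swap(words, n, rng):
    """Swap the words at two randomly chosen positions, n times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word independently with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
words = "its clean sheet makes me comfort".split()
swapped = random_swap(words, num_changes(0.1, words), rng)
deleted = random_deletion(words, 0.1, rng)
```

With six words and `alpha_rs=0.1`, `num_changes` returns 1, so a single pair of words is swapped.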
 

In [2]:

```python
# run script
!python eda_nlp/code/augment.py --input='sample.txt'
```

generated augmented sentences with eda for sample.txt to eda_sample.txt with num_aug=9


### 1.1. Original samples (sample.txt)

```markdown
1	Its clean sheet makes me comfort.
1	It was the best car!
0	Braking is not that good compared to other models.
```

### 1.2. Augmented samples (eda_sample.txt)
```markdown
1	its makes me comfort
1	its clean makes sheet me comfort
1	its comfort sheet makes me clean
1	its clean sheet information technology makes me comfort
1	its clean sheet makes me
1	its clean sheet makes me information technology comfort
1	its unclouded sheet makes me comfort
1	its clean sheet makes ca ca me comfort
1	its clean sheet makes me ease
1	its clean sheet makes me comfort 
1	the was it best car
1	it was the best railway car
1	it was the best railcar
1	it was the best car
1	it was the best motorcar
1	it was the represent best car
1	it was the best car
1	it secure was the best car
1	it was the car
1	it was the best car 
0	braking is not that good compared to models
0	braking is not that honorable compared to other models
0	brake is not that good compared to other models
0	not that good compared to models
0	braking is models that good compared to other not
0	that is not braking good compared to other models
0	braking is not that good compared to other mannequin models
0	braking is not that good compared to other
0	braking is not that good equate to other models
0	braking is not that good compared to other models
```
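
In the output above, `it was the best car` appears three times for the second sample, once with a trailing space. If such repeats are unwanted for training, they can be filtered afterwards. The following is a minimal sketch under that assumption; `dedupe_lines` is a hypothetical helper, not part of `augment.py`.

```python
def dedupe_lines(lines):
    """Drop repeated (label, sentence) pairs, keeping the first occurrence of each in order."""
    seen = set()
    unique = []
    for line in lines:
        if not line.strip():
            continue  # skip the empty last line
        label, sentence = line.rstrip("\n").split("\t", 1)
        # Normalize whitespace so a trailing space does not hide a duplicate.
        key = (label, " ".join(sentence.split()))
        if key not in seen:
            seen.add(key)
            unique.append(f"{key[0]}\t{key[1]}")
    return unique
```

It can be applied to the generated file with `dedupe_lines(open("eda_sample.txt", encoding="utf-8"))`.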

---

## 2. AEDA
- AEDA: An Easier Data Augmentation Technique for Text Classification (EMNLP 2021 short)
- https://arxiv.org/pdf/2108.13230.pdf
- Punctuation marks used: `.` `,` `!` `?` `;` `:`
- Args:
    - **input**: input file of unaugmented data
    - **output**: output file of augmented data (default: aeda_{input}.txt)
    - **num_aug**: number of augmented sentences per original sentence (default: 9)
    - **punc_ratio**: probability of inserting punctuation marks into a sentence (default: 0.3)
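
The core idea of AEDA, inserting punctuation marks at random positions, can be sketched as follows. This follows the paper's description, where the ratio bounds how many marks are inserted; `aeda.py` may interpret `punc_ratio` differently. The function name, the `rng` argument, and the exact upper bound `int(punc_ratio * len(words) + 1)` are assumptions, the last chosen to be consistent with the sample output in this section (up to three marks in a nine-word sentence).

```python
import random

PUNCTUATIONS = [".", ",", "!", "?", ";", ":"]


def insert_punctuation(sentence, punc_ratio=0.3, rng=None):
    """Insert between 1 and roughly punc_ratio * len(words) punctuation marks before random words."""
    rng = rng or random.Random()
    words = sentence.split()
    if not words:
        return sentence
    # Assumed upper bound on the number of marks, capped at the number of words.
    max_marks = min(len(words), int(punc_ratio * len(words) + 1))
    count = rng.randint(1, max_marks)
    positions = set(rng.sample(range(len(words)), count))
    augmented = []
    for i, word in enumerate(words):
        if i in positions:
            # The mark goes in front of the selected word as its own token.
            augmented.append(rng.choice(PUNCTUATIONS))
        augmented.append(word)
    return " ".join(augmented)


example = insert_punctuation("Its clean sheet makes me comfort.", rng=random.Random(0))
```

The original words and their order are left unchanged; only standalone punctuation tokens are added.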

In [3]:

```python
# run script
!python aeda_nlp/code/aeda.py --input='sample.txt'
```

generated augmented sentences with aeda for sample.txt to aeda_sample.txt with num_aug=9


### 2.1. Original samples (sample.txt)

```markdown
1	Its clean sheet makes me comfort.
1	It was the best car!
0	Braking is not that good compared to other models.
```

### 2.2. Augmented samples (aeda_sample.txt)
```markdown
1	! Its clean sheet ; makes me comfort.
1	Its clean ? sheet ! makes me comfort.
1	Its clean sheet makes , me comfort.
1	; Its ! clean sheet makes me comfort.
1	Its clean . sheet makes me comfort.
1	Its clean sheet makes me ! comfort.
1	! Its clean sheet makes ? me comfort.
1	Its ; clean sheet makes ? me comfort.
1	Its clean . sheet makes ; me comfort.
1	Its clean sheet makes me comfort.
1	: It was the best car!
1	! It was the , best car!
1	; It , was the best car!
1	It ; was the best car!
1	! It was the best ; car!
1	; It was ! the best car!
1	It was the best ! car!
1	It was the best ; car!
1	; It was the ? best car!
1	It was the best car!
0	Braking is not , that , good compared to other models.
0	; Braking is not that good compared to other models.
0	. Braking is not that : good compared to , other models.
0	. Braking is not that good compared to other models.
0	Braking is not that good compared : to ; other ! models.
0	Braking is not ; that good ? compared to other ; models.
0	Braking is not that good compared to : other : models.
0	Braking ; is . not that good ? compared to other models.
0	Braking . is not : that good ! compared to other models.
0	Braking is not that good compared to other models.
```