# Custom evaluation of baseline models
* __Objective__: Evaluate the models used in the paper _Before Name-calling: Dynamics and Triggers of Ad Hominem Fallacies in Web Argumentation_ on a custom train/test split, instead of the 10-fold cross-validation described in the paper
* __Functionalities__: Allows custom testing of the _CNN_ and _Stacked bi-LSTM_ models used in the paper, so as to have a one-on-one comparison with BERT
* __File Management__: Google Drive
* __Runtime Type__: GPU
* __Notes__: Before running this notebook, make sure that `line:268` is commented out and `line:269` is uncommented in [this](https://github.com/utkarsh512/Ad-hominem-fallacies/blob/master/experiments/classification_experiments.py) script
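
To illustrate the difference named in the objective, here is a minimal sketch contrasting 10-fold cross-validation with a single fixed split that every model is evaluated on. The function names, the test fraction and the seed are assumptions made for illustration; the notebook itself receives its split as files rather than computing one this way.

```python
import random


def kfold_splits(n_items, k=10):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation."""
    indices = list(range(n_items))
    for fold in range(k):
        test_idx = indices[fold::k]  # every k-th item, offset by the fold number
        test_set = set(test_idx)
        train_idx = [i for i in indices if i not in test_set]
        yield train_idx, test_idx


def fixed_split(n_items, test_fraction=0.2, seed=0):
    """Return one (train_idx, test_idx) pair that is identical for every model."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = int(n_items * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Hypothetical dataset size, used only to show the shapes involved.
train_idx, test_idx = fixed_split(100, test_fraction=0.2, seed=0)
folds = list(kfold_splits(100, k=10))
print(len(train_idx), len(test_idx), len(folds))
```

With a fixed split, each model sees exactly the same test comments, which is what makes a per-comment comparison between models possible.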


## Mounting Google Drive and setting environment for training

In [None]:
!nvidia-smi

Tue Mar 30 18:24:57 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
%%shell
cd /content/gdrive/'My Drive'/
rm -rf Ad-hominem-fallacies
git clone https://github.com/utkarsh512/Ad-hominem-fallacies.git

Cloning into 'Ad-hominem-fallacies'...
remote: Enumerating objects: 124, done.
remote: Counting objects: 100% (124/124), done.
remote: Compressing objects: 100% (124/124), done.
remote: Total 230 (delta 79), reused 0 (delta 0), pack-reused 106
Receiving objects: 100% (230/230), 37.51 MiB | 10.58 MiB/s, done.
Resolving deltas: 100% (104/104), done.
Checking out files: 100% (80/80), done.




In [None]:
%%shell
cd /content/gdrive/'My Drive'/Ad-hominem-fallacies/experiments
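pip install virtualenv
virtualenv env --python=python3
# The activation below only lasts for this cell's shell session;
# later cells start a fresh shell in the system environment.
source env/bin/activate
pip install lda scipy==1.1.0 nltk==3.2.5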

Collecting virtualenv
  Downloading https://files.pythonhosted.org/packages/91/fb/ca6c071f4231e06a9f0c3bd81c15c233bbacd4a7d9dbb7438d95fece8a1e/virtualenv-20.4.3-py2.py3-none-any.whl (7.2MB)
     |████████████████████████████████| 7.2MB 6.0MB/s 
Collecting distlib<1,>=0.3.1
  Downloading https://files.pythonhosted.org/packages/f5/0a/490fa011d699bb5a5f3a0cf57de82237f52a6db9d40f33c53b2736c9a1f9/distlib-0.3.1-py2.py3-none-any.whl (335kB)
     |████████████████████████████████| 337kB 39.5MB/s 
Installing collected packages: distlib, virtualenv
Successfully installed distlib-0.3.1 virtualenv-20.4.3
created virtual environment CPython3.7.10.final.0-64 in 12325ms
  creator CPython3Posix(dest=/content/gdrive/My Drive/Ad-hominem-fallacies/experiments/env, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==21.0.1, setu



In [None]:
!pip install autocorrect

Collecting autocorrect
  Downloading https://files.pythonhosted.org/packages/16/a8/1fc332535fc26db807fa48bdb54070355b83a36c797451c3d563bc190fa8/autocorrect-2.3.0.tar.gz (621kB)

## Preparing dataset

In [None]:
%%shell
cd /content/gdrive/'My Drive'/Ad-hominem-fallacies/experiments
wget https://public.ukp.informatik.tu-darmstadt.de/ih/RedditChangeMyView2017/en-top100k.embeddings.pkl.gz

--2021-03-30 18:24:47--  https://public.ukp.informatik.tu-darmstadt.de/ih/RedditChangeMyView2017/en-top100k.embeddings.pkl.gz
Resolving public.ukp.informatik.tu-darmstadt.de (public.ukp.informatik.tu-darmstadt.de)... 130.83.167.186
Connecting to public.ukp.informatik.tu-darmstadt.de (public.ukp.informatik.tu-darmstadt.de)|130.83.167.186|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 152970296 (146M) [application/octet-stream]
Saving to: ‘en-top100k.embeddings.pkl.gz’


2021-03-30 18:24:56 (16.9 MB/s) - ‘en-top100k.embeddings.pkl.gz’ saved [152970296/152970296]





In [None]:
train_dir = '/content/gdrive/MyDrive/DL/dataset/train.json'
indir = '/content/gdrive/MyDrive/DL/dataset/comments_1.log'
outdir = '/content/gdrive/MyDrive/DL/dataset/comments_2.log'
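
The paths defined above are typed out again in the training cells. As a minimal sketch, the command line could instead be assembled from these variables so each path lives in one place. The helper name `build_command` is hypothetical; the script name and the `--model`, `--train_dir`, `--indir` and `--outdir` flags are the ones used in the cells of this notebook.

```python
import shlex


def build_command(model, train_dir, indir, outdir):
    """Assemble the training command as one shell-safe string (hypothetical helper)."""
    args = [
        "python", "classification_experiments.py",
        "--model", model,
        "--train_dir", train_dir,
        "--indir", indir,
        "--outdir", outdir,
    ]
    # shlex.quote adds quotes only where needed, e.g. for paths containing spaces.
    return " ".join(shlex.quote(arg) for arg in args)


# Same values as in the cell above, repeated so this block runs on its own.
train_dir = '/content/gdrive/MyDrive/DL/dataset/train.json'
indir = '/content/gdrive/MyDrive/DL/dataset/comments_1.log'
outdir = '/content/gdrive/MyDrive/DL/dataset/comments_2.log'

command = build_command("cnn", train_dir, indir, outdir)
print(command)
```

In a notebook cell the resulting string could then be executed with `!{command}` after changing into the `experiments` directory.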

## Custom training of _CNN_ model

In [None]:
%%shell
cd /content/gdrive/'My Drive'/Ad-hominem-fallacies/experiments
pip install lda
python classification_experiments.py --model cnn \
  --train_dir /content/gdrive/MyDrive/DL/dataset/train.json \
  --indir /content/gdrive/MyDrive/DL/dataset/comments_1.log \
  --outdir /content/gdrive/MyDrive/DL/dataset/comments_2.log

Streaming output truncated to the last 5000 lines.
{'bert': 0.9915022850036621, 'cnn': 0.87900466}
id: 1286
comment: to me it looked like hillary is being shoved down my throat lol she wasnt she couldnt even fit into your mouth much less be shoved by a third person into your throat what you thought was happening was not happening and in your confusion you voted a tv game show host president how dumb lol much dumber than anything ive done for instance next time work on separating fantasy from reality adults are expected to do this
label: ah

{'bert': 'ah', 'cnn': 'ah'}
{'bert': 0.9403357394039631, 'cnn': 0.9383728}
id: 1287
comment: by this same reasoning someone shouldnt ever cry over a fictional story ever either i mean those people and situations arent even real at least with sports its something you put effort in only to lose you cant even have any impact at all on an already written story so it makes even less sense to cry over it you ever get emotional over a story c



## Custom training of _Stacked bi-LSTM_ model

In [None]:
%%shell
cd /content/gdrive/'My Drive'/Ad-hominem-fallacies/experiments
pip install lda
python classification_experiments.py --model bilstm \
  --train_dir /content/gdrive/MyDrive/DL/dataset/train.json \
  --indir /content/gdrive/MyDrive/DL/dataset/comments_2.log \
  --outdir /content/gdrive/MyDrive/DL/dataset/comments_3.log

Streaming output truncated to the last 5000 lines.
{'bert': 0.9915022850036621, 'cnn': 0.87900466, 'bilstm': 0.9771368}
id: 1286
comment: to me it looked like hillary is being shoved down my throat lol she wasnt she couldnt even fit into your mouth much less be shoved by a third person into your throat what you thought was happening was not happening and in your confusion you voted a tv game show host president how dumb lol much dumber than anything ive done for instance next time work on separating fantasy from reality adults are expected to do this
label: ah

{'bert': 'ah', 'cnn': 'ah', 'bilstm': 'ah'}
{'bert': 0.9403357394039631, 'cnn': 0.9383728, 'bilstm': 0.9988758}
id: 1287
comment: by this same reasoning someone shouldnt ever cry over a fictional story ever either i mean those people and situations arent even real at least with sports its something you put effort in only to lose you cant even have any impact at all on an already written story so it makes even less 
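
For the one-on-one comparison, the printed records can be turned into per-model accuracy. The sketch below assumes the layout visible in the output above: an `id:` line, a `comment:` line, a `label:` line, then a dictionary of predicted labels per model. Numeric dictionaries are skipped, because the truncated output does not make clear which comment they belong to. The function names and the sample text are hypothetical, and `none` is a placeholder label rather than one taken from the dataset.

```python
import ast


def parse_records(text):
    """Collect id, label and per-model predictions from the printed output."""
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("id: "):
            current = {"id": int(line[len("id: "):])}
            records.append(current)
        elif current is None:
            continue  # anything before the first id cannot be attributed
        elif line.startswith("label: "):
            current["label"] = line[len("label: "):]
        elif line.startswith("{") and line.endswith("}"):
            try:
                values = ast.literal_eval(line)
            except (ValueError, SyntaxError):
                continue  # malformed or truncated dictionary line
            # Keep only label dictionaries such as {'bert': 'ah', 'cnn': 'ah'}.
            if isinstance(values, dict) and all(isinstance(v, str) for v in values.values()):
                current.setdefault("predictions", values)
    return records


def accuracy_per_model(records):
    """Fraction of records where each model's prediction equals the label."""
    correct, total = {}, {}
    for rec in records:
        if "label" not in rec or "predictions" not in rec:
            continue  # skip records cut off by truncation
        for model, pred in rec["predictions"].items():
            total[model] = total.get(model, 0) + 1
            correct[model] = correct.get(model, 0) + int(pred == rec["label"])
    return {model: correct[model] / total[model] for model in total}


# Synthetic sample in the assumed layout; not real results.
sample = """{'bert': 0.9, 'cnn': 0.8}
id: 1
comment: placeholder text one
label: ah

{'bert': 'ah', 'cnn': 'ah'}
{'bert': 0.7, 'cnn': 0.6}
id: 2
comment: placeholder text two
label: ah

{'bert': 'ah', 'cnn': 'none'}
"""
records = parse_records(sample)
scores = accuracy_per_model(records)
print(scores)
```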

