# Attack IMDB model

The IMDB dataset contains movie reviews labeled as either positive or negative. Each review is a paragraph consisting of multiple sentences.

Here, we attempt to attack a **wordCNN** model trained on the IMDB dataset.

In [1]:
from neural_networks import word_cnn, char_cnn, bd_lstm, lstm
import os
from read_files import split_imdb_files, split_yahoo_files, split_agnews_files
from word_level_process import word_process, get_tokenizer, text_to_vector_for_all
from config import config
from keras.preprocessing import sequence

dataset = "imdb"
model_name = "pretrained_word_cnn"

Using TensorFlow backend.


In [70]:
import stanfordnlp

# stanfordnlp.download('en')
nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English

Use device: cpu
---
Loading: tokenize
With settings: 
{'model_path': '/Users/weifanjiang/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings: 
{'model_path': '/Users/weifanjiang/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/Users/weifanjiang/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings: 
{'model_path': '/Users/weifanjiang/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
---
Loading: depparse
With settings: 
{'model_path': '/Users/weifanjiang/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/Users/weifanjiang/sta

In [87]:
model = word_cnn(dataset)
model_path = r'./runs/{}/{}.dat'.format(dataset, model_name)
model.load_weights(model_path)
print("successfully load model")

Build word_cnn model...
successfully load model


In [74]:
# Data label:
# [1 0] is negative review
# [0 1] is positive review

train_texts, train_labels, test_texts, test_labels = split_imdb_files()
x_train, y_train, x_test, y_test = word_process(train_texts, train_labels, test_texts, test_labels, dataset)
print('successfully load data')

Processing IMDB dataset
successfully load data


`predict_str` lets the model predict directly on the string form of a movie review, without manually converting it to a sequence first.

In [69]:
# Predict on a raw string with a model directly,
# skipping the manual conversion to a padded sequence.
def predict_str(model, s):
    maxlen = config.word_max_len[dataset]
    tokenizer = get_tokenizer(dataset)
    # Map the text to a sequence, then pad/truncate at the end to maxlen.
    s_seq = tokenizer.texts_to_sequences([s])
    s_seq = sequence.pad_sequences(s_seq, maxlen=maxlen, padding='post', truncating='post')
    # model.predict returns one row per input; take the single row.
    return model.predict(s_seq)[0]
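
To make the effect of `padding='post', truncating='post'` concrete, here is a minimal pure-Python sketch of that behavior. The helper `pad_post` is hypothetical and only mimics what `pad_sequences` does with these two options for a single sequence; it is not part of Keras or of this repository.

```python
def pad_post(token_ids, maxlen, pad_value=0):
    """Hypothetical helper: drop tokens past maxlen, then right-pad with pad_value."""
    clipped = list(token_ids)[:maxlen]          # 'post' truncation keeps the beginning
    return clipped + [pad_value] * (maxlen - len(clipped))  # 'post' padding appends zeros

short = pad_post([5, 12, 7], maxlen=5)          # shorter than maxlen: zeros are appended
long = pad_post([5, 12, 7, 9, 3, 8], maxlen=5)  # longer than maxlen: the tail is cut
print(short)
print(long)
```

One consequence worth keeping in mind: with post-truncation, any words beyond `maxlen` never reach the model, so edits to the end of a very long review may have no effect on its prediction.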

Here we sample one review from the entire test corpus.

In [101]:
import random
# idx = random.randint(0, x_test.shape[0] - 1)
idx = 15557

xi = x_test[idx:idx+1]
yi = y_test[idx:idx+1][0]
xi_text = test_texts[idx]

print(xi_text)
print()
print("model predict ", model.predict(xi)[0])
print("predict with predict_str ", predict_str(model, xi_text))
print("true label ", yi)

This is a stupid movie. When I saw it in a movie theater more than half the audience left before it was half over. I stayed to the bitter end. To show fortitude? I caught it again on television and it was much funnier. Still by no means a classic, or even consistently hilarious but the family kinda grew on me. I love Jessica Lundy anyway. If you've nothing better to do and it's free on t.v. you could do worse.

model predict  [0.92145205 0.07800014]
predict with predict_str  [0.92145205 0.07800014]
true label  [1 0]
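
The output vector follows the one-hot label convention noted earlier (`[1 0]` negative, `[0 1]` positive). As a small sketch, the hypothetical helper `decode_prediction` below turns such a vector into a readable label; the names `LABEL_NAMES` and `decode_prediction` are illustrative and not part of the repository.

```python
# Index 0 corresponds to the label [1 0] (negative), index 1 to [0 1] (positive).
LABEL_NAMES = ("negative", "positive")

def decode_prediction(probs):
    """Hypothetical helper: return (label name, score) for the highest-scoring class."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABEL_NAMES[best], probs[best]

# Values copied from the model output printed above.
name, confidence = decode_prediction([0.92145205, 0.07800014])
print(name, confidence)
```

For the sampled review, the highest-scoring class agrees with the true label `[1 0]`, so the model classifies it correctly before any attack.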


Break a review into sentences based on the output of the StanfordNLP pipeline.

In [96]:
def sentence_list(doc):
    sentences = []
    for sent in doc.sentences:
        # Rebuild the sentence text from its words.
        sentence = sent.words[0].text
        for word in sent.words[1:]:
            # No space before punctuation or clitics such as 've and 's.
            if word.upos != 'PUNCT' and not word.text.startswith('\''):
                sentence += ' '
            sentence += word.text
        sentences.append(sentence)
    return sentences
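
The joining rule can be checked without loading the StanfordNLP models. The sketch below copies the function so the block runs on its own and feeds it a hand-built stand-in for a parsed document; the `SimpleNamespace` objects and the tags assigned to each word are assumptions made for illustration, not real pipeline output.

```python
from types import SimpleNamespace

def sentence_list(doc):
    # Same logic as the notebook cell, repeated here to keep the example self-contained.
    sentences = []
    for sent in doc.sentences:
        sentence = sent.words[0].text
        for word in sent.words[1:]:
            if word.upos != 'PUNCT' and not word.text.startswith('\''):
                sentence += ' '
            sentence += word.text
        sentences.append(sentence)
    return sentences

def fake_sentence(pairs):
    """Hypothetical stand-in for a parsed sentence: a list of (text, upos) pairs."""
    return SimpleNamespace(words=[SimpleNamespace(text=t, upos=u) for t, u in pairs])

fake_doc = SimpleNamespace(sentences=[
    fake_sentence([("I", "PRON"), ("'ve", "AUX"), ("seen", "VERB"),
                   ("worse", "ADJ"), (".", "PUNCT")]),
    fake_sentence([("To", "PART"), ("show", "VERB"), ("fortitude", "NOUN"),
                   ("?", "PUNCT")]),
])

rebuilt = sentence_list(fake_doc)
print(rebuilt)
```

Clitics and punctuation attach to the preceding word, so the rebuilt text stays close to the original surface form that the tokenizer will see.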

In [99]:
"""
Attributes of a StanfordNLP doc object:
'_text', '_conll_file', '_sentences'

Attributes of doc.conll_file:
'ignore_gapping', '_file', '_from_str', '_sents', '_num_words'

Attributes of a doc.sentences element:
'_tokens', '_words', '_dependencies'
"""

doc = nlp(xi_text)
sentences = sentence_list(doc)

Compute **Sentence Saliency**.

Let $x = s_1s_2\dots s_n$ be an input consisting of $n$ sentences, and let $y$ be the true label of $x$. The sentence saliency of sentence $s_k$ is the drop in the probability assigned to $y$ when $s_k$ is removed from $x$:

$$S(y|s_k) = P(y|x) - P(y|s_1s_2\dots s_{k-1}s_{k+1}\dots s_n)$$

In [106]:
def sentence_saliency(model, sentences, label):
    # P(y|x): prediction on the full review.
    true_pred = predict_str(model, ' '.join(sentences))
    # Index of the true class in the one-hot label ([1 0] -> 0, [0 1] -> 1).
    if label[0] == 1:
        idx = 0
    else:
        idx = 1
    scores = []
    for i in range(len(sentences)):
        # Review with sentence i left out.
        x_hat = ' '.join(sentences[0:i] + sentences[i+1:])
        scores.append(true_pred[idx] - predict_str(model, x_hat)[idx])
    return scores
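
To see how the sign of a saliency score should be read, the following sketch runs the same leave-one-out computation against a toy scorer instead of the trained model. Both `toy_predict` and `leave_one_out_saliency` are hypothetical names introduced here; the keyword-counting scorer is an assumption chosen only so the numbers are easy to follow.

```python
def toy_predict(text):
    """Hypothetical stand-in for predict_str: returns [P(negative), P(positive)]."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    neg = sum(w in {"stupid", "boring"} for w in words)
    pos = sum(w in {"funny", "love"} for w in words)
    p_neg = (1 + neg) / (2 + neg + pos)   # smoothed share of negative cue words
    return [p_neg, 1 - p_neg]

def leave_one_out_saliency(predict, sentences, label_index):
    """Score each sentence by the drop in P(label) when that sentence is removed."""
    full = predict(" ".join(sentences))[label_index]
    return [
        full - predict(" ".join(sentences[:k] + sentences[k + 1:]))[label_index]
        for k in range(len(sentences))
    ]

toy_sentences = ["This is a stupid movie.", "I saw it on television.", "I love the cast."]
toy_scores = leave_one_out_saliency(toy_predict, toy_sentences, label_index=0)
print(toy_scores)
```

With the negative class as the true label, the first sentence gets a positive score (removing it lowers the negative probability), the neutral second sentence scores zero, and the third gets a negative score (removing it raises the negative probability).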

In [107]:
saliency_scores = sentence_saliency(model, sentences, yi)

In [108]:
print(saliency_scores)

[0.05681753, 0.0052631497, -0.028491437, -0.002485156, 0.03154117, -0.022087038, -0.029951096, 0.15772057]
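
A natural next step is to order the sentences by saliency, since the highest-scoring ones contribute most to the correct prediction. The sketch below ranks the scores printed above using plain Python; treating this order as the attack order is an assumption, not something the notebook states.

```python
# Scores copied from the output above, one per sentence in parser order.
saliency_scores = [0.05681753, 0.0052631497, -0.028491437, -0.002485156,
                   0.03154117, -0.022087038, -0.029951096, 0.15772057]

# Sentence indices sorted from most to least salient.
ranked = sorted(range(len(saliency_scores)),
                key=lambda k: saliency_scores[k],
                reverse=True)
print(ranked)  # [7, 0, 4, 1, 3, 5, 2, 6]
```

The last sentence (index 7) has by far the largest score, about 0.158, while three sentences have small negative scores, meaning that removing them slightly increases the model's confidence in the true label.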


---