# Attack IMDB model

The IMDB dataset contains movie reviews that are labeled either positive or negative. Each review is a paragraph consists of multiple sentences.

Here, we attempt to attack a **wordCNN** model for IMDB dataset.

In [30]:
from neural_networks import word_cnn, char_cnn, bd_lstm, lstm
import os
from read_files import split_imdb_files, split_yahoo_files, split_agnews_files
from word_level_process import word_process, get_tokenizer, text_to_vector_for_all
from config import config
from keras.preprocessing import sequence
import numpy as np
import random
import pickle

dataset = "imdb"
model_name = "pretrained_word_cnn"

In [2]:
import stanfordnlp

# stanfordnlp.download('en')
nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English

Use device: cpu
---
Loading: tokenize
With settings: 
{'model_path': '/Users/weifanjiang/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings: 
{'model_path': '/Users/weifanjiang/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/Users/weifanjiang/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings: 
{'model_path': '/Users/weifanjiang/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
---
Loading: depparse
With settings: 
{'model_path': '/Users/weifanjiang/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/Users/weifanjiang/sta

In [3]:
model = word_cnn(dataset)
model_path = r'./runs/{}/{}.dat'.format(dataset, model_name)
model.load_weights(model_path)
print("successfully load model")

Build word_cnn model...




Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where






successfully load model


In [4]:
# Data label:
# [1 0] is negative review
# [0 1] is positive review

train_texts, train_labels, test_texts, test_labels = split_imdb_files()
x_train, y_train, x_test, y_test = word_process(train_texts, train_labels, test_texts, test_labels, dataset)
print('successfully load data')

Processing IMDB dataset
successfully load data


`predict_str` allows a string representation of movie review be predicted without manually converting to sequence first.

In [48]:
# Predict a string input with a model directly
# so skips the need of converting to sequence first...
def predict_str(model, s):
    maxlen = config.word_max_len[dataset]
    tokenizer = get_tokenizer(dataset)
    s_seq = tokenizer.texts_to_sequences([s])
    s_seq = sequence.pad_sequences(s_seq, maxlen=maxlen, padding='post', truncating='post')
    s_sep = s_seq[0]
    return model.predict(s_seq)[0]

def predict_sentences(model, sentences):
    return predict_str(model, ' '.join(sentences))

Here we sample one sentence from entire testing corpus

In [6]:
import random
# idx = random.randint(0, x_test.shape[0] - 1)
idx = 15557

xi = x_test[idx:idx+1]
yi = y_test[idx:idx+1][0]
xi_text = test_texts[idx]

print(xi_text)
print()
print("model predict ", model.predict(xi)[0])
print("predict with predict_str ", predict_str(model, xi_text))
print("true label ", yi)

This is a stupid movie. When I saw it in a movie theater more than half the audience left before it was half over. I stayed to the bitter end. To show fortitude? I caught it again on television and it was much funnier. Still by no means a classic, or even consistently hilarious but the family kinda grew on me. I love Jessica Lundy anyway. If you've nothing better to do and it's free on t.v. you could do worse.

model predict  [0.92145205 0.07800014]
predict with predict_str  [0.92145205 0.07800014]
true label  [1 0]


Breaks a review to sentences based on `StanfordParser`'s result.

In [7]:
def sentence_list(doc):
    sentences = []
    for words in doc.sentences:
        sentence = words.words[0].text
        for word in words.words[1:]:
            if word.upos != 'PUNCT' and not word.text.startswith('\''):
                sentence += ' '
            sentence += word.text
        sentences.append(sentence)
    return sentences

In [8]:
"""
Paraneter of a StanfordNLP doc object；
'_text', '_conll_file', '_sentences'

Parameters of doc.conll_file:
'ignore_gapping', '_file', '_from_str', '_sents', '_num_words'

Parameters of doc.sentence object:
'_tokens', '_words', '_dependencies'
"""

doc = nlp(xi_text)
sentences = sentence_list(doc)

Compute **Sentence Saliency**.

Let $x = s_1s_2\dots s_n$ be a input consists of $n$ sentences. Let $y$ be $x$'s true label. The sentence saliency for sentence $s_k$ is:

$$S(y|s_k) = P(y|x) - P(x|s_1s_2\dots s_{k-1}s_{k+1}\dots s_n)$$

In [93]:
def sentence_saliency(model, sentences, label):
    true_pred = predict_str(model, ' '.join(sentences))
    if label[0] == 1:
        idx = 0
    else:
        idx = 1
    scores = []
    for i in range(len(sentences)):
        x_hat = ' '.join(sentences[0:i] + sentences[i+1:])
        scores.append(true_pred[idx] - predict_str(model, x_hat)[idx])
    
    return np.array(scores)

def softmax(x, determinism = 21):
    softmax = np.exp(np.multiply(determinism, x))
    softmax /= np.sum(softmax)
    return softmax

In [94]:
saliency_scores = softmax(sentence_saliency(model, sentences, yi))
print(saliency_scores)

[0.09044634 0.03063413 0.01507848 0.02603392 0.0531944  0.01724911
 0.01462329 0.7527404 ]


In [96]:
for i in range(5):
    print(np.random.choice(sentences, p=saliency_scores))

This is a stupid movie.
If you've nothing better to do and it's free on t.v. you could do worse.
If you've nothing better to do and it's free on t.v. you could do worse.
I caught it again on television and it was much funnier.
If you've nothing better to do and it's free on t.v. you could do worse.


## Sentence paraphrasing

**back translation**: translate input sentence to another language, then translate back to the original language. This is a technique commonly used for evaluation of language translations. Here, we use this technique to quickly generate a rephrase of original sentence.

In [12]:
print(xi_text)

This is a stupid movie. When I saw it in a movie theater more than half the audience left before it was half over. I stayed to the bitter end. To show fortitude? I caught it again on television and it was much funnier. Still by no means a classic, or even consistently hilarious but the family kinda grew on me. I love Jessica Lundy anyway. If you've nothing better to do and it's free on t.v. you could do worse.


Test google cloud authentication

In [13]:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/Users/weifanjiang/Documents/Personal/My Project-1e7426894fe6.json"

def implicit():
    from google.cloud import storage

    # If you don't specify credentials when constructing the client, the
    # client library will look for credentials in the environment.
    storage_client = storage.Client()

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
implicit()

[]


In [14]:
from google.cloud import translate_v2 as translate
translate_client = translate.Client()

results = translate_client.get_languages()

for language in results:
    print(u'{name} ({language})'.format(**language))

Afrikaans (af)
Albanian (sq)
Amharic (am)
Arabic (ar)
Armenian (hy)
Azerbaijani (az)
Basque (eu)
Belarusian (be)
Bengali (bn)
Bosnian (bs)
Bulgarian (bg)
Catalan (ca)
Cebuano (ceb)
Chichewa (ny)
Chinese (Simplified) (zh)
Chinese (Traditional) (zh-TW)
Corsican (co)
Croatian (hr)
Czech (cs)
Danish (da)
Dutch (nl)
English (en)
Esperanto (eo)
Estonian (et)
Filipino (tl)
Finnish (fi)
French (fr)
Frisian (fy)
Galician (gl)
Georgian (ka)
German (de)
Greek (el)
Gujarati (gu)
Haitian Creole (ht)
Hausa (ha)
Hawaiian (haw)
Hebrew (iw)
Hindi (hi)
Hmong (hmn)
Hungarian (hu)
Icelandic (is)
Igbo (ig)
Indonesian (id)
Irish (ga)
Italian (it)
Japanese (ja)
Javanese (jw)
Kannada (kn)
Kazakh (kk)
Khmer (km)
Kinyarwanda (rw)
Korean (ko)
Kurdish (Kurmanji) (ku)
Kyrgyz (ky)
Lao (lo)
Latin (la)
Latvian (lv)
Lithuanian (lt)
Luxembourgish (lb)
Macedonian (mk)
Malagasy (mg)
Malay (ms)
Malayalam (ml)
Maltese (mt)
Maori (mi)
Marathi (mr)
Mongolian (mn)
Myanmar (Burmese) (my)
Nepali (ne)
Norwegian (no)
Odia (Oriya)

In [35]:
from google.cloud import translate_v2 as translate

import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/Users/weifanjiang/Documents/Personal/My Project-1e7426894fe6.json"

def back_translation(s_in, language='ko', show_mid=False):
    translate_client = translate.Client()
    mid_result = translate_client.translate(s_in, target_language=language)['translatedText']
    if show_mid:
        print(mid_result.replace("&#39;", "\'"))
    en_result = translate_client.translate(mid_result, target_language="en")['translatedText']
    return en_result.replace("&#39;", "\'")

In [37]:
print(xi_text)
print(back_translation(xi_text, language='zh', show_mid = True))

This is a stupid movie. When I saw it in a movie theater more than half the audience left before it was half over. I stayed to the bitter end. To show fortitude? I caught it again on television and it was much funnier. Still by no means a classic, or even consistently hilarious but the family kinda grew on me. I love Jessica Lundy anyway. If you've nothing better to do and it's free on t.v. you could do worse.
这是一部愚蠢的电影。当我在电影院放映时，一半以上的观众离开了一半。我一直坚持到最后。为了显示毅力？我又在电视上看到了，这很有趣。仍然绝不算是经典，甚至始终不是一成不变的搞笑，但是我的家庭有点长。我还是爱杰西卡·伦迪。如果您无事可做，而且电视上是免费的，那您可能会做得更糟。
This is a stupid movie. When I screened in the cinema, more than half of the audience left. I stayed till the end. To show perseverance? I saw it on TV again and it was fun. It's still not a classic, it's not always funny, but my family is a bit long. I still love Jessica Lundy. If you have nothing to do and it's free on TV, you could do worse.


## Implementing sentence-level genetic algorithm

`perturb`: randomly select a sentence from the input paragraph with probability propotional to the saliency score of each sentence. Then apply rephrasing to the selected sentence. Return the paragraph with one modified sentence.

In [100]:
def load_cache():
    cache_file = "google_translate_cache.pickle"
    if os.path.isfile(cache_file):
        with open(cache_file, "rb") as f_in:
            cache = pickle.load(f_in)
    else:
        cache = dict()
    return cache

def save_cache(cache):
    cache_file = "google_translate_cache.pickle"
    with open(cache_file, "wb") as f_out:
        pickle.dump(cache, f_out)

In [105]:
def perturb(sentences, saliencies, cache, not_choose=set()):
    
    choices = list()
    weights = list()
    for i in range(len(sentences)):
        if not i in not_choose:
            choices.append(sentences[i])
            weights.append(saliencies[i])
    weights = softmax(weights)
    
    choice = np.random.choice(choices, p=weights)
    
    chosen_language = set()
    all_languages = ['es', 'ko', 'zh', 'th']
    rephrase = choice
    while (len(chosen_language) < len(all_languages)) and (rephrase == choice):
        language = np.random.choice(all_languages)
        while language in chosen_language:
            language = np.random.choice(all_languages)
        chosen_language.add(language)
        rephrase = cache.get((choice, language), None)
        if rephrase is None:
            rephrase = back_translation(choice, language=language)
            cache[(choice, language)] = rephrase
    new_paragraph = []
    for j, sen in enumerate(sentences):
        if sen == choice:
            new_paragraph.append(rephrase)
            changed_idx = j
        else:
            new_paragraph.append(sen)
    return new_paragraph, changed_idx

`crossover`: randomly produce a new paragraph by combining two input paragraphs.

In [106]:
def crossover(sentences1, sentences2, p1_changed, p2_changed):
    child = list()
    child_idx = set()
    for i in range(len(sentences1)):
        if random.randint(0,1) == 0:
            if i in p1_changed:
                child_idx.add(i)
            child.append(sentences1[i])
        else:
            if i in p2_changed:
                child_idx.add(i)
            child.append(sentences2[i])
    return child, child_idx

`genetic`: main function to perform genetic attack

In [111]:
def genetic(x0, y0, model, population, generation):
    
    if y0[0] == 0:
        y_adv = [1, 0]
    else:
        y_adv = [0, 1]
    
    cache = load_cache()
    
    doc = nlp(xi_text)
    sentences = sentence_list(doc)
    saliency_scores = sentence_saliency(model, sentences, yi)
    
    prob_0 = predict_str(model, x0)
    
    print("clean sample's prediction: {}".format(prob_0))
    
    # We want target_idx of adv.example's prediction to be larger
    # than 0.5
    target_idx = np.argmin(prob_0)
    print('target is to make index {} > 0.5'.format(target_idx))
    
    gen0 = list()
    chosen_idx = dict()
    for i in range(population):
        chosen = set()
        sample, idx = perturb(sentences, saliency_scores, cache)
        gen0.append(sample)
        chosen.add(idx)
        chosen_idx[i] = chosen
    
    curr_gen = gen0
    for i in range(generation):
        
        print('generation {}'.format(i + 1))
        
        sample_weight = list()
        for j, sample in enumerate(curr_gen):
            sample_pred = predict_sentences(model, sample)
            print("population {} pred: {}".format(j, sample_pred))
            if sample_pred[target_idx] > 0.5:
                print('successful adv. example found!')
                save_cache(cache)
                return ' '.join(sample)
            else:
                sample_weight.append(sample_pred[target_idx])
        sample_weight = softmax(np.array(sample_weight))
        print('population with fitness scores: {}'.format(sample_weight))
        
        next_gen = list()
        next_chosen = dict()
        for j in range(population):
            idx_list = list(range(population))
            p1 = np.random.choice(idx_list, p=sample_weight)
            p2 = np.random.choice(idx_list, p=sample_weight)
            print("child {} generated with parents {} and {}".format(j, p1, p2))
            child, child_change = crossover(curr_gen[p1], curr_gen[p2], chosen_idx[p1], chosen_idx[p2])
            saliency_scores = sentence_saliency(model, child, y0)
            child_mutate, change_idx = perturb(sentences, saliency_scores, cache)
            next_gen.append(child_mutate)
            child_change.add(change_idx)
            next_chosen[j] = child_change
        curr_gen = next_gen
        chosen_idx = next_chosen

    save_cache(cache)
    return None

In [113]:
genetic(xi_text, yi, model, 3, 8)

clean sample's prediction: [0.92145205 0.07800014]
target is to make index 1 > 0.5
generation 1
population 0 pred: [0.8864471  0.11223546]
population 1 pred: [0.82256144 0.1749145 ]
population 2 pred: [0.8231986  0.17350492]
population with fitness scores: [0.11975881 0.44663414 0.43360704]
child 0 generated with parents 2 and 1
child 1 generated with parents 1 and 1
child 2 generated with parents 2 and 1
generation 2
population 0 pred: [0.7827212  0.21101502]
population 1 pred: [0.80019045 0.19863959]
population 2 pred: [0.80019045 0.19863959]
population with fitness scores: [0.39334738 0.30332634 0.30332634]
child 0 generated with parents 0 and 0
child 1 generated with parents 0 and 1
child 2 generated with parents 2 and 0
generation 3
population 0 pred: [0.82256144 0.1749145 ]
population 1 pred: [0.82256144 0.1749145 ]
population 2 pred: [0.8231986  0.17350492]
population with fitness scores: [0.33660594 0.33660594 0.32678807]
child 0 generated with parents 0 and 0
child 1 generated

In [99]:
cache = load_cache()
print(cache)

{('Still by no means a classic, or even consistently hilarious but the family kinda grew on me.', 'zh'): "It's still not a classic, it's not always funny, but my family is a bit long.", ('When I saw it in a movie theater more than half the audience left before it was half over.', 'zh'): 'When I screened in the cinema, more than half of the audience left.', ('I caught it again on television and it was much funnier.', 'th'): 'I caught it again on television and it was much more fun.', ('I stayed to the bitter end.', 'zh'): 'I stayed till the end.', ('To show fortitude?', 'ko'): 'To show courage?', ("If you've nothing better to do and it's free on t.v. you could do worse.", 'th'): 'If you have nothing to do better, and free on TV', ("If you've nothing better to do and it's free on t.v. you could do worse.", 'zh'): "If you have nothing to do and it's free on TV, you could do worse.", ('This is a stupid movie.', 'th'): 'This is a stupid movie', ('I stayed to the bitter end.', 'th'): "I'm at

---