# Demonstrate Sentence-level Genetic Attack with Sentence Saliency Analysis

In [14]:
import os

# Silence TensorFlow's C++ logging; set this before TensorFlow is first imported so it takes effect.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

from sentence_level_genetic_attack import *
import tensorflow as tf
import stanfordnlp

# Uncomment on first use to download the English models.
# stanfordnlp.download('en')
nlp = stanfordnlp.Pipeline()

## load model and dataset to attack

In [13]:
dataset = "imdb"
model_name = "pretrained_word_cnn"

model = word_cnn(dataset)
model_path = r'./runs/{}/{}.dat'.format(dataset, model_name)
model.load_weights(model_path)
print("successfully load model")

# Data label for imdb dataset:
# [1 0] is negative review
# [0 1] is positive review

train_texts, train_labels, test_texts, test_labels = split_imdb_files()
x_train, y_train, x_test, y_test = word_process(train_texts, train_labels, test_texts, test_labels, dataset)
print('successfully load data')

Build word_cnn model...
successfully load model
Processing IMDB dataset
successfully load data


## demonstration

Test sample 12654 is used for demonstration purposes.

In general, the attack is more likely to succeed on test samples for which the model's prediction is not very confident. Here the model predicts sample 12654 to be negative with probability 0.69. The attack is less likely to succeed when this probability is very high (i.e. >= 0.95).

Samples on which the attack succeeds easily: 12654, 902
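
The selection rule above can be sketched as a small filter. This is a minimal illustration, not part of `sentence_level_genetic_attack`: `attack_candidates`, `demo_probs` and `demo_labels` are hypothetical names, and the 0.95 cutoff is simply the value quoted above.

```python
from typing import List, Sequence


def attack_candidates(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    max_confidence: float = 0.95,
) -> List[int]:
    """Indices of correctly classified samples whose true-label probability is below the cutoff."""
    picked = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        true_idx = list(y).index(1)  # labels are one-hot, e.g. [1, 0]
        predicted_idx = max(range(len(p)), key=lambda j: p[j])
        if predicted_idx == true_idx and p[true_idx] < max_confidence:
            picked.append(i)
    return picked


# Made-up predictions, only to exercise the filter.
demo_probs = [[0.69, 0.30], [0.98, 0.02], [0.40, 0.60], [0.20, 0.80]]
demo_labels = [[1, 0], [1, 0], [1, 0], [0, 1]]
print(attack_candidates(demo_probs, demo_labels))  # [0, 3]
```

Sample 1 is dropped because the model is too confident, and sample 2 because it is already misclassified.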

In [43]:
test_idx = 12654
xi_text = test_texts[test_idx]
yi = test_labels[test_idx]
print("Clean sample: {}".format(xi_text))
print("Actual label: {}".format(yi))
print("Model Prediction: {}".format(predict_str(model, xi_text)))

Clean sample: This film probably would have been good,if they didn't use CGI (computer generated imagery)for the werewolf scenes.It made the creatures look fake and the werewolves looked cartoonish.CGI is great for certain effects like the dinasours in Jurassic Park or Twister.But when we see a film where the creature must look completely real,CGI is not the way to go.Look at An American Werewolf in London.No CGI.Just makeup and a mechanical creature and what you come up with was more realistic than what was shown in the sequel.This film did offer a few gags that was fun to watch and the humor in this movie seemed to have drawn me in but it's nothing more than a film that I thought was O.K.And that's not good enough.In my opinion,An American Werewolf in Paris doesn't hold up to the original.
Actual label: [1, 0]
Model Prediction: [0.68962145 0.30292207]


### demo computing sentence saliency

Let $x = s_1s_2\dots s_n$ be an input consisting of $n$ sentences, and let $y$ be the true label of $x$. The saliency of sentence $s_k$ is the drop in the probability of the true label when $s_k$ is removed:

$$S(y|s_k) = P(y|x) - P(y|s_1s_2\dots s_{k-1}s_{k+1}\dots s_n)$$
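
A minimal leave-one-out sketch of this definition, assuming the model is wrapped as a function that maps a text to the probability of the true label. `leave_one_out_saliency` and `toy_negative_prob` are hypothetical stand-ins and do not reproduce the notebook's `sentence_saliency`.

```python
from typing import Callable, List, Sequence


def leave_one_out_saliency(
    prob_of_label: Callable[[str], float],
    sentences: Sequence[str],
) -> List[float]:
    """Return P(y|x) - P(y|x without s_k) for every sentence s_k."""
    sentences = list(sentences)
    full_prob = prob_of_label(" ".join(sentences))
    scores = []
    for k in range(len(sentences)):
        reduced = " ".join(sentences[:k] + sentences[k + 1:])
        scores.append(full_prob - prob_of_label(reduced))
    return scores


def toy_negative_prob(text: str) -> float:
    """Toy stand-in for the model: cue words push P(negative) up or down."""
    words = text.lower().replace(".", " ").split()
    neg = sum(w in {"fake", "boring"} for w in words)
    pos = sum(w in {"fun", "great"} for w in words)
    return min(1.0, max(0.0, 0.5 + 0.2 * (neg - pos)))


demo_sentences = ["The creatures look fake.", "A few gags were fun.", "I saw it in Paris."]
demo_scores = leave_one_out_saliency(toy_negative_prob, demo_sentences)
print(demo_scores)
```

A positive score means the sentence supports the true label, a negative score means removing it makes the model more confident, and a score near zero means the sentence barely matters.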

In [40]:
# break input string to list of sentences
doc = nlp(xi_text)
sentences = sentence_list(doc)

# Compute saliency scores
raw_saliency = sentence_saliency(model, sentences, yi)

# Compute normalized saliency scores with softmax
saliency_scores = softmax(raw_saliency, 10)

print("Raw sentence saliency: {}".format(raw_saliency))
print("Softmax sentence saliency: {}".format(saliency_scores))



Raw sentence saliency: [ 0.09119469 -0.0176931  -0.2083224  -0.03500342  0.00458783  0.02031082
 -0.01235932  0.05155784  0.00589448  0.00220156]
Softmax sentence saliency: [0.22486347 0.07568769 0.01124949 0.06365719 0.094578   0.11068129
 0.0798343  0.15127885 0.0958219  0.09234782]
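
The second argument of `softmax` is not explained in this notebook. The printed values are consistent with multiplying the raw scores by 10 before applying a standard softmax; that reading is inferred from the output above, not taken from the implementation. `scaled_softmax` below is a hypothetical re-implementation used only to check it.

```python
import math
from typing import List, Sequence


def scaled_softmax(scores: Sequence[float], scale: float = 1.0) -> List[float]:
    """Softmax of scale * scores, shifted by the maximum for numerical stability."""
    scaled = [scale * s for s in scores]
    shift = max(scaled)
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Raw saliency values copied from the output above.
raw = [0.09119469, -0.0176931, -0.2083224, -0.03500342, 0.00458783,
       0.02031082, -0.01235932, 0.05155784, 0.00589448, 0.00220156]
weights = scaled_softmax(raw, scale=10)
print([round(w, 4) for w in weights])
```

A larger scale sharpens the distribution, so the most salient sentences receive most of the probability mass.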


### demo back-translation for sentence rephrasing.

In [41]:
clean_sentence = "I love to play with little furry cats."
pivot, final = back_translation(clean_sentence, language = 'zh', require_mid = True)
print("Clean sentence: {}".format(clean_sentence))
print("Pivot translation to Chinese: {}".format(pivot))
print("Final back translation result: {}".format(final))

Clean sentence: I love to play with little furry cats.
Pivot translation to Chinese: 我喜欢和毛茸茸的小猫一起玩。
Final back translation result: I like to play with furry kittens.


### demo genetic attack with verbose output

In [42]:
adv_example = genetic(xi_text, yi, model, 10, 5, load_cache(), verbose = True)
print("adv example: {}".format(adv_example))
print("clean sample pred: {}".format(predict_str(model, xi_text)))
print("adv example pred: {}".format(predict_str(model, adv_example)))



clean sample's prediction: [0.68962145 0.30292207]
target is to make index 1 > 0.5
generation 1
population 0 pred: [0.6338001  0.36262536]
population 1 pred: [0.7572598  0.24051255]
population 2 pred: [0.67821705 0.31816134]
population 3 pred: [0.6730638  0.32398772]
population 4 pred: [0.65763974 0.33833525]
population 5 pred: [0.70253444 0.29128775]
population 6 pred: [0.7130803  0.28080446]
population 7 pred: [0.46511325 0.5361443 ]
successful adv. example found!
adv example: This film probably would have been good, if they did n't use CGI( computer generated imagery) for the werewolf scenes. It made the creatures look fake and the werewolves looked cartoonish. CGI is great for certain effects like the dinasours in Jurassic Park or Twister. But when we see a film where the creature must look completely real, CGI is not the way to go. Look at An American Werewolf in London. No CGI. Just makeup and a mechanical creature and what you come up with was more realistic than what was shown 
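
The log above shows the search stopping as soon as one population member pushes index 1 above 0.5. A generic sentence-level genetic loop with that stopping rule can be sketched as follows. This is an illustration under assumptions, not the notebook's `genetic` function: `genetic_sketch`, `toy_rephrase` and `toy_positive_prob` are hypothetical, the mutation and crossover operators are the simplest possible choices, and `pop_size` must be at least 2.

```python
import random
from typing import Callable, List, Optional, Sequence


def genetic_sketch(
    sentences: Sequence[str],
    target_prob: Callable[[str], float],
    rephrase: Callable[[str], str],
    pop_size: int = 8,
    generations: int = 5,
    seed: int = 0,
) -> Optional[str]:
    """Return a text whose target-class probability exceeds 0.5, or None if the search fails."""
    rng = random.Random(seed)

    def mutate(candidate: List[str]) -> List[str]:
        # Rephrase one randomly chosen sentence.
        child = list(candidate)
        k = rng.randrange(len(child))
        child[k] = rephrase(child[k])
        return child

    def crossover(a: List[str], b: List[str]) -> List[str]:
        # Take each sentence from one of the two parents.
        return [rng.choice(pair) for pair in zip(a, b)]

    population = [mutate(list(sentences)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: target_prob(" ".join(c)), reverse=True)
        if target_prob(" ".join(ranked[0])) > 0.5:
            return " ".join(ranked[0])
        # Keep the fitter half as parents and breed the next generation.
        parents = ranked[: max(2, pop_size // 2)]
        population = [mutate(crossover(*rng.sample(parents, 2))) for _ in range(pop_size)]
    return None


def toy_rephrase(sentence: str) -> str:
    """Toy stand-in for back-translation."""
    return sentence.replace("fake", "unconvincing")


def toy_positive_prob(text: str) -> float:
    """Toy stand-in for the model's probability of the target (positive) class."""
    return 0.3 if "fake" in text else 0.55


demo = ["The wolves look fake.", "The jokes are fine.", "I saw it in Paris."]
result = genetic_sketch(demo, toy_positive_prob, toy_rephrase)
print(result)
```

In the real attack the rephrasing step would be back-translation, and the choice of which sentence to mutate could be weighted by the softmax saliency scores instead of being uniform.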

---