- Download the IMDB dataset
- Download the GloVe vector embeddings (used by the model)
- Download the counter-fitted vectors (used by our attack)
- Build the vocabulary and embeddings matrix.
  This takes about a minute. The script tokenizes the dataset and saves it to a pickle file. It also computes some auxiliary files, such as the matrix of vector embeddings for the words in our dictionary. All files are saved under the aux_files directory, which the script creates.
- Train the sentiment analysis model.
- Download the Google language model.
- Pre-compute the distances between embeddings of different words (required to do the attack) and save the distance matrix.
- Now we are ready to try some attacks! You can do so by running the IMDB_AttackDemo.ipynb Jupyter notebook.
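
To give a rough idea of what the embeddings matrix and distance matrix steps above produce, here is a minimal sketch on toy data. It is not the repository's actual script: the function names `build_embedding_matrix` and `pairwise_sq_distances`, the column-per-word layout, the choice of squared Euclidean distance, and the output file name are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


def build_embedding_matrix(vocab, vectors, dim):
    """Stack one embedding column per word id; words without a vector stay zero."""
    matrix = np.zeros((dim, len(vocab)), dtype=np.float32)
    for word, idx in vocab.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[:, idx] = vec
    return matrix


def pairwise_sq_distances(matrix):
    """Squared Euclidean distance between every pair of columns."""
    sq_norms = np.sum(matrix ** 2, axis=0)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b, computed for all pairs at once
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (matrix.T @ matrix)
    return np.maximum(dist, 0.0)  # clip tiny negatives caused by rounding


# Toy vocabulary and vectors (hypothetical, standing in for the real embeddings)
vocab = {"good": 0, "great": 1, "bad": 2}
vectors = {
    "good": np.array([1.0, 0.0]),
    "great": np.array([1.0, 1.0]),
    "bad": np.array([-1.0, 0.0]),
}

emb = build_embedding_matrix(vocab, vectors, dim=2)
dist = pairwise_sq_distances(emb)

# Save the matrix so it can be reloaded later (path is illustrative only)
out_path = os.path.join(tempfile.mkdtemp(), "dist_toy.npy")
np.save(out_path, dist)
loaded = np.load(out_path)
```

With a precomputed matrix like this, finding the nearest neighbours of a word is a row lookup followed by a sort (for example `np.argsort(dist[word_id])`), which is why it is worth computing once up front rather than during each attack.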