GitHub - williamleif/social-lstm: Socially-primed LSTM model to predict intercommunity conflict on Reddit.

Social LSTMs to predict intercommunity-conflict

Authors: William L. Hamilton (wleif@stanford.edu), Srijan Kumar (srijan@cs.stanford.edu)

Project Website

Data (including user/community embeddings)

Overview

This package contains code to replicate the prediction results in the paper Community Interaction and Conflict on the Web, published at The Web Conference (i.e., WWW) 2018. The task is to predict intercommunity mobilization and conflict on Reddit.com. In particular, we examine cases where one community (the "source") makes a post that hyperlinks to another community (the "target"), and the goal is to predict whether or not this "cross-linking" post will lead to a significant number of source community members "mobilizing" to participate in the target community.

The primary model is a "socially-primed" LSTM that uses vector embeddings of users and communities to help make this prediction. In particular, embeddings of users and communities are learned using a "node2vec"-style approach, and we use these embeddings (along with text information from the cross-linking post) to predict whether or not the post will lead to a mobilization. See the paper and project website for more details.
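To make the idea concrete, here is a minimal PyTorch sketch of one way to "prime" an LSTM with social embeddings. It is not the code from social_lstm_model.py. It assumes that priming means feeding the user, source-community, and target-community vectors to the LSTM as the first time steps, before the words of the post. The class name, the projection layer, and all dimensions are hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn


class SociallyPrimedLSTM(nn.Module):
    """Illustrative sketch: social embeddings are fed to the LSTM before the words."""

    def __init__(self, vocab_size, word_dim, social_dim, hidden_dim):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, word_dim)
        # Assumption: project the pretrained user/community vectors into the
        # word-embedding space so they can occupy ordinary input time steps.
        self.social_proj = nn.Linear(social_dim, word_dim)
        self.lstm = nn.LSTM(word_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, user_vec, source_vec, target_vec, word_ids):
        # (batch, 3, social_dim) -> (batch, 3, word_dim)
        social = self.social_proj(
            torch.stack([user_vec, source_vec, target_vec], dim=1)
        )
        words = self.word_embed(word_ids)           # (batch, seq_len, word_dim)
        primed = torch.cat([social, words], dim=1)  # social steps come first
        _, (h_n, _) = self.lstm(primed)
        return self.out(h_n[-1]).squeeze(-1)        # one logit per post


torch.manual_seed(0)
model = SociallyPrimedLSTM(vocab_size=1000, word_dim=32, social_dim=16, hidden_dim=64)
batch = 4
logits = model(
    torch.randn(batch, 16),               # user embedding (random stand-in)
    torch.randn(batch, 16),               # source community embedding
    torch.randn(batch, 16),               # target community embedding
    torch.randint(0, 1000, (batch, 20)),  # token ids of the cross-linking post
)
probs = torch.sigmoid(logits)  # predicted probability of mobilization
```

In practice the random vectors would be replaced by the pretrained embeddings shipped with the data, and the logits would be trained with a binary cross-entropy loss against the mobilization labels.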

If you make use of the code or data associated with this project, please cite the following paper:

@inproceedings{kumar2018conflict,
       author = {Kumar, Srijan and Hamilton, William L. and Jurafsky, Dan and Leskovec, Jure},
       title = {Community Interaction and Conflict on the Web},
       booktitle = {The Web Conference (WWW)},
       year = {2018}
     }

Requirements

The code requires reasonably up-to-date pytorch and sklearn libraries. See requirements.txt for details (or just run pip install -r requirements.txt). Note that you might want to install pytorch from their website to get the latest version.

Using the code

The social_lstm_model.py file contains the main model code. However, before using this code you will need to:

1. Download the necessary data here.
2. Update the "DATA_DIR" value in constants.py to point to the unzipped data.
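As a quick sanity check after the two setup steps, something like the following can confirm that the configured directory exists and is not empty. The path shown is a placeholder, the helper name data_dir_ready is hypothetical, and the actual layout of constants.py may differ.

```python
from pathlib import Path

# Hypothetical location; replace with wherever you unzipped the download.
DATA_DIR = "/path/to/unzipped/data/"


def data_dir_ready(path):
    """Return True if `path` is an existing, non-empty directory."""
    p = Path(path)
    return p.is_dir() and any(p.iterdir())


if not data_dir_ready(DATA_DIR):
    print("DATA_DIR does not point to the unzipped data yet:", DATA_DIR)
```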

The command-line arguments for the social LSTM model can be explored with the help option. The default is the "socially-primed" LSTM model described in the paper, with the best-performing hyperparameters as defaults. However, you could also set the "lstm_no_social" command-line flag to use a vanilla LSTM instead.

The code will train and compute validation statistics periodically. By default it runs for 10 epochs and records the best validation accuracy achieved (i.e., there is no explicit early-stopping but instead checkpoints are used).
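The "best checkpoint instead of early stopping" behavior can be sketched as follows. This is a generic illustration and not the repository's training loop. The function name and the toy accuracy curve are assumptions, and train_epoch and evaluate stand in for real training and validation routines.

```python
import copy


def train_with_best_checkpoint(state, train_epoch, evaluate, epochs=10):
    """Run every epoch and keep a copy of the state with the best validation accuracy."""
    best_acc, best_state = float("-inf"), None
    for epoch in range(epochs):
        state = train_epoch(state, epoch)
        acc = evaluate(state)
        if acc > best_acc:  # checkpoint only when validation accuracy improves
            best_acc, best_state = acc, copy.deepcopy(state)
    return best_acc, best_state


# Toy stand-ins: validation accuracy peaks at epoch 6 and then degrades.
toy_curve = [0.60, 0.64, 0.69, 0.71, 0.73, 0.74, 0.76, 0.75, 0.74, 0.72]
best_acc, best_state = train_with_best_checkpoint(
    state={"epoch": None},
    train_epoch=lambda s, e: {"epoch": e},
    evaluate=lambda s: toy_curve[s["epoch"]],
)
print(best_acc, best_state)  # 0.76 {'epoch': 6}
```

All ten epochs run regardless of the validation curve; only the recorded best score and checkpoint reflect the peak.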

Notes on replication

The nonneural_models.ipynb notebook can be used to replicate the exact baseline and ensemble results from the paper. Note that for the LSTM results reported in the paper, we ran a hyperparameter sweep over learning rates in [0.001, 0.01, 0.1, 0.5], model dimensions in [64, 128, 256], and dropout parameters in [0, 0.2, 0.4], and we considered both single-layer and 2-layer LSTMs. All LSTM models used a batch size of 512. The results reported in the paper are the test set scores of the models that performed best on the validation sets in this sweep. In general we found the LSTM results to be reasonably stable across random restarts, with a standard deviation of the AUC of around 0.5 (assuming a range of [0,100], as in the paper). Finally, note that some model variants (e.g., appending the social embeddings) are in the code but not discussed in the paper, as we found these to underperform the presented variant.
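The selection protocol described above (choose the configuration by validation score, then report its test score) can be sketched as a plain grid search. The grid values come from the text; the sweep function, the fake_run stand-in, and its scoring formula are hypothetical and exist only so the example runs without training any models.

```python
from itertools import product

# Grid from the replication notes; batch size was fixed at 512 for all runs.
GRID = {
    "lr": [0.001, 0.01, 0.1, 0.5],
    "dim": [64, 128, 256],
    "dropout": [0, 0.2, 0.4],
    "layers": [1, 2],
}


def sweep(run_config):
    """Return the config with the best validation AUC and its test AUC."""
    configs = [dict(zip(GRID, values)) for values in product(*GRID.values())]
    results = [(cfg, *run_config(cfg)) for cfg in configs]  # (cfg, val_auc, test_auc)
    best_cfg, best_val, test_at_best = max(results, key=lambda r: r[1])
    return best_cfg, best_val, test_at_best, len(configs)


def fake_run(cfg):
    # Hypothetical stand-in for training one model; returns (val_auc, test_auc).
    val = 70 + cfg["dim"] / 64 - 10 * abs(cfg["lr"] - 0.01) - cfg["dropout"] + cfg["layers"]
    return val, val - 0.5


best_cfg, best_val, test_auc, n_configs = sweep(fake_run)
print(n_configs, best_cfg, test_auc)
```

The full grid contains 4 x 3 x 3 x 2 = 72 configurations. The test score is read only for the configuration chosen on validation data, so the test set plays no part in model selection.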

For the Random Forest models we found that increasing the number of trees led to stronger performance and so increased the number of trees to 500 (from the default value of 10). We found the other hyperparameters to have minimal impact.
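For reference, setting the number of trees in scikit-learn looks roughly like this. The features here are synthetic stand-ins generated with make_classification, not the project's features, and the split sizes and random seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real features come from the project data.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 500 trees, as in the replication notes; other hyperparameters left at defaults.
clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)

# AUC on a [0, 100] scale, matching the convention used in the paper.
auc = 100 * roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(round(auc, 1))
```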
