Lexica

A list of data-driven lexica developed by the World Well-Being Project.

Age and Gender Lexica

Read the full publication here.

Citation

@inproceedings{sap2014developing,
  author={Sap, Maarten and Park, Greg and Eichstaedt, Johannes C and Kern, Margaret L and Stillwell, David J and Kosinski, Michal and Ungar, Lyle H and Schwartz, H Andrew},
  title={Developing age and gender predictive lexica over social media},
  booktitle={Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2014}
}

PERMA Lexicon

Read the full publication here.

Citation

@inproceedings{schwartz2016predicting,
  author={Schwartz, H. Andrew and Sap, Maarten and Kern, Margaret L. and Eichstaedt, Johannes C. and Kapelner, Adam and Agrawal, Megha and Blanco, Eduardo and Dziurzynski, Lukasz and Park, Gregory and Stillwell, David and Kosinski, Michal and Seligman, Martin E. P. and Ungar, Lyle H.},
  title={Predicting Individual Well-Being Through the Language of Social Media},
  booktitle={Pacific Symposium on Biocomputing},
  year={2016},
  pages={516--527}
}

Spanish PERMA Lexicon

Read the full publication here.

Citation

@inproceedings{smith2016does,
  title={Does ‘well-being’ translate on {T}witter?},
  author={Smith, Laura and Giorgi, Salvatore and Solanki, Rishi and Eichstaedt, Johannes and Schwartz, H Andrew and Abdul-Mageed, Muhammad and Buffone, Anneke and Ungar, Lyle},
  booktitle={Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
  pages={2042--2047},
  year={2016}
}

Affect and Intensity Lexicon

Read the full publication here.

Citation

@inproceedings{preoctiuc2016modelling,
  title={Modelling valence and arousal in {F}acebook posts},
  author={Preo{\c{t}}iuc-Pietro, Daniel and Schwartz, H Andrew and Park, Gregory and Eichstaedt, Johannes and Kern, Margaret and Ungar, Lyle and Shulman, Elisabeth},
  booktitle={Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis},
  pages={9--15},
  year={2016}
}

Prospection Lexicon: Temporal Orientation

Read the full publication here.

Citation

@inproceedings{schwartz2015extracting,
  title={Extracting human temporal orientation from {F}acebook language},
  author={Schwartz, H Andrew and Park, Gregory and Sap, Maarten and Weingarten, Evan and Eichstaedt, Johannes and Kern, Margaret and Stillwell, David and Kosinski, Michal and Berger, Jonah and Seligman, Martin and others},
  booktitle={Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  pages={409--419},
  year={2015}
}

Example

A weighted lexicon is typically applied by summing the weighted relative frequencies of its words over a document:

$$usage_{lex} = \sum_{word \in lex} w_{lex}(word) \times \frac{freq(word, doc)}{freq(*, doc)}$$

where $w_{lex}(word)$ is the lexicon ($lex$) weight for the word, $freq(word, doc)$ is the frequency of the word in the document (or for a given user), and $freq(*, doc)$ is the total word count for that document (or user).
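The formula can be sketched in Python as below. This is a minimal illustration, not WWBP's own implementation; it assumes the document has already been tokenized into a list of words and that the lexicon is a plain dict mapping words to weights (the function name `lexicon_usage` is hypothetical). Note that the denominator counts every token in the document, including words that are not in the lexicon.

```python
from collections import Counter
from typing import Iterable, Mapping


def lexicon_usage(weights: Mapping[str, float], tokens: Iterable[str]) -> float:
    """Sum of w_lex(word) * freq(word, doc) / freq(*, doc) over lexicon words.

    Hypothetical helper: `weights` maps each lexicon word to its weight and
    `tokens` is the already-tokenized document (or all of a user's words).
    """
    counts = Counter(tokens)          # freq(word, doc) for every word
    total = sum(counts.values())      # freq(*, doc): all tokens, in lexicon or not
    if total == 0:
        return 0.0                    # empty document: nothing to weight
    return sum(
        weight * counts[word] / total
        for word, weight in weights.items()
        if word in counts             # absent lexicon words contribute 0
    )
```

For instance, `lexicon_usage({"a": 2.0}, ["a", "b"])` weights the relative frequency 1/2 of `a` by 2.0; the non-lexicon token `b` only enlarges the denominator.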

For example, let's say a lexicon has the weights $w_{lex}(a)$, $w_{lex}(b)$, and $w_{lex}(c)$ for the words a, b, and c.

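We also have two documents, $doc_1$ and $doc_2$, with the following word frequencies:

$$freq(a, doc_1),\quad freq(b, doc_1),\quad freq(c, doc_1)$$

$$freq(a, doc_2),\quad freq(b, doc_2),\quad freq(c, doc_2)$$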

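Since the documents contain only these words, the total word counts are:

$$freq(*, doc_1) = freq(a, doc_1) + freq(b, doc_1) + freq(c, doc_1)$$

$$freq(*, doc_2) = freq(a, doc_2) + freq(b, doc_2) + freq(c, doc_2)$$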

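Each document's lexicon usage is the sum of its weighted relative frequencies:

$$usage_{lex}(doc_1) = w_{lex}(a) \times \frac{freq(a, doc_1)}{freq(*, doc_1)} + w_{lex}(b) \times \frac{freq(b, doc_1)}{freq(*, doc_1)} + w_{lex}(c) \times \frac{freq(c, doc_1)}{freq(*, doc_1)}$$

$$usage_{lex}(doc_2) = w_{lex}(a) \times \frac{freq(a, doc_2)}{freq(*, doc_2)} + w_{lex}(b) \times \frac{freq(b, doc_2)}{freq(*, doc_2)} + w_{lex}(c) \times \frac{freq(c, doc_2)}{freq(*, doc_2)}$$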

Once the usages have been computed, the lexicon's intercept, $intercept_{lex}$, must be added to each of them:

$$pred_{lex}(doc_1) = usage_{lex}(doc_1) + intercept_{lex}$$

$$pred_{lex}(doc_2) = usage_{lex}(doc_2) + intercept_{lex}$$

If the lexicon represents age, $pred_{lex}(doc_1)$ and $pred_{lex}(doc_2)$ are the predicted ages of the two documents. If it represents gender, take the sign of the result: a positive value means the document is predicted to be female, otherwise male.
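The walk-through above can be run end to end with made-up numbers. The weights, intercept, and frequencies below are hypothetical, chosen only to keep the arithmetic easy to follow; they are not taken from any of the lexica listed here, and the helper names (`usage`, `predict`, `gender_label`) are illustrative.

```python
# Hypothetical lexicon weights and intercept (NOT values from any WWBP lexicon).
WEIGHTS = {"a": 0.5, "b": -1.0, "c": 2.0}
INTERCEPT = 0.1

# Hypothetical word frequencies for two documents containing only a, b and c.
DOC_FREQS = {
    "doc_1": {"a": 2, "b": 1, "c": 1},
    "doc_2": {"a": 1, "b": 3, "c": 0},
}


def usage(weights, freqs):
    """Sum of weighted relative frequencies: w(word) * freq(word) / freq(*)."""
    total = sum(freqs.values())  # freq(*, doc)
    if total == 0:
        return 0.0
    return sum(w * freqs.get(word, 0) / total for word, w in weights.items())


def predict(weights, intercept, freqs):
    """Lexicon usage plus the lexicon's intercept."""
    return usage(weights, freqs) + intercept


def gender_label(score):
    """Sign rule for a gender lexicon: positive -> female, otherwise male."""
    return "female" if score > 0 else "male"


if __name__ == "__main__":
    for name, freqs in DOC_FREQS.items():
        score = predict(WEIGHTS, INTERCEPT, freqs)
        print(name, round(score, 3), gender_label(score))
```

With these numbers, doc_1 has usage 0.25 - 0.25 + 0.5 = 0.5 and a score of 0.5 + 0.1 = 0.6 (positive, so female under the gender rule), while doc_2 has usage 0.125 - 0.75 + 0 = -0.625 and a score of -0.525 (male). For an age lexicon the scores would instead be read directly as predicted ages.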

License

Unless otherwise specified in a lexicon's subdirectory, all lexica are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Background

Developed by the World Well-Being Project, based at the University of Pennsylvania and Stony Brook University.
