A directory of publicly available data sets from psycholinguistic studies

This directory is currently far from complete.

Who can contribute: Anyone can add their own data sets, or information about other people’s data sets, provided that the data set can be reached via a link, the data was published with the authors’ consent, and there is a publication that describes the data.

How to contribute: Click the button labeled “Edit” at the top right of this page. Editing requires a GitHub account. Markdown is used for formatting. Before saving the document, add a one-sentence summary of what you changed in the field labeled “Edit Message.”

Structure of this document: We’ll start with a simple list of data sets and will evolve this into something more organized as the list grows. However, for each entry please provide at least the following information:

  1. Reference for the paper in which the data was first described, ideally with a link
  2. Type of data (e.g., self-paced reading, grammaticality judgments, language, …)
  3. URL of the data set

List of data types (feel free to extend if necessary):

  • eye-tracking
  • self-paced reading
  • grammaticality judgments
  • code (data analysis)
  • code (simulation)
  • speed-accuracy tradeoff
  • stimuli
  • working-memory capacity
  • rapid automatized naming
  • computational model
  • event-related potentials
  • language: Spanish, German, English, Cantonese, Mandarin, ...

Entries are separated by a horizontal rule (---).
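
As a rough sketch of the format described above, a new entry could look like the following. The field labels mirror the entries already in this list; the author names, journal, data types, and URLs are placeholders, not a real data set.

```markdown
---

Reference: Author, A., & Author, B. (Year). Title of the paper. Journal Name, Volume, Pages. <DOI or link to the paper>

Type of data: self-paced reading, code (data analysis), English

Link: https://example.org/path/to/data
```

Leaving a blank line between the three fields keeps each one on its own line when the Markdown is rendered.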


Reference: Chen, I., Huang, C., & Politzer-Ahles, S. (2018). Determining the types of contrasts: the influences of prosody on pragmatic inferences. Frontiers in Psychology, 9, 2110.

Type of data: Offline binary judgments

Link: https://osf.io/nsgfv/

Reference: Politzer-Ahles, S., & Husband, E. (2018). Eye movement evidence for context-sensitive derivation of scalar inferences. Collabra: Psychology, 4, 3.

Type of data: Eye-tracking while reading, and accuracy.

Link: https://www.collabra.org/article/10.1525/collabra.100/

Reference: Haendler, Y., & Adani, F. (accepted). Testing the effect of an arbitrary subject pronoun on relative clause comprehension: A study with Hebrew-speaking children. Journal of Child Language.

Type of data: Response accuracy data

Link: https://osf.io/km8pe/

Reference: Haendler, Y., Kliegl, R. & Adani, F. (2015). Discourse accessibility constraints in children's processing of object relative clauses. Frontiers in Psychology 6:860.

Type of data: Looking-while-listening eye-tracking data

Link: https://github.com/yhaendler/Haendler-Kliegl-Adani-2015

Reference: Cheung, C., Politzer-Ahles, S., Hwang, H., Chui, R., Leung, M., & Tang, T. (2017). Comprehension of presuppositions in school-age Cantonese-speaking children with and without autism spectrum disorders. Clinical Linguistics and Phonetics, 31, 557-572.

Type of data: Offline judgments (children)

Link: https://osf.io/u2wsz/

Reference: Politzer-Ahles, S., Xiang, M., & Almeida, D. (2017). "Before" and "after": Investigating the relationship between temporal connectives and chronological ordering using event-related potentials. PLoS ONE, 12(4), e0175199.

Type of data: ERP (averages and raw continuous data)

Link: https://osf.io/gevfz/

Reference: Nieuwland, M., Politzer-Ahles, S., Heyselaar, E., Segaert, K., Darley, E., Kazanina, N., Von Grebmer Zu Wolfsthurn, S., Bartolozzi, F., Kogan, V., Ito, A., Mézière, D., Barr, D., Rousselet, G., Ferguson, H., Busch-Moreno, S., Fu, X., Tuomainen, J., Kulakova, E., Husband, E., Donaldson, D., Kohút, Z., Rueschemeyer, S., & Huettig, F. (ms.). Limits on prediction in language comprehension: A multi-lab failure to replicate evidence for probabilistic pre-activation of phonology. bioRxiv preprint.

Type of data: single-trial ERP amplitudes (from single time window and channel selection)

Link: https://osf.io/eyzaq/

Reference: Politzer-Ahles, S. (ms.). Self-paced reading experiment on SOME and MOST. Unpublished manuscript.

Type of data: self-paced reading, Likert scale acceptability judgments

Link: https://osf.io/7swt2/

Reference: Gibson, E. & H. Wu (2013). Processing Chinese relative clauses in context. Language and Cognitive Processes, 28, 125-155.

Type of data: self-paced reading

Link: https://github.com/vasishth/NicenboimVasishthPart2/blob/master/gibsonwu2012data.txt

Reference: Politzer-Ahles, S. & R. Fiorentino (2013). The Realization of Scalar Inferences: Context Sensitivity without Processing Cost. PLoS ONE, 8, e63943.

Type of data: self-paced reading, comprehension question accuracy

Link: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0063943#s5 (File S2)

Reference: Politzer-Ahles, S., K. Schluter, K. Wu, & D. Almeida (2016). Asymmetries in the perception of Mandarin tones: evidence from mismatch negativity. Journal of Experimental Psychology: Human Perception and Performance, 42, 1547-1570.

Type of data: preprocessed ERP averages

Link: http://supp.apa.org/psycarticles/supplemental/xhp0000242/xhp0000242_supp.html ('Supp3.zip')

Reference: White, A. S., D. Reisinger, K. Sakaguchi, T. Vieira, S. Zhang, R. Rudinger, K. Rawlins, & B. Van Durme. (2016). Universal decompositional semantics on universal dependencies. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1713–1723, Association for Computational Linguistics.

Type of data: inference likelihood judgments, veridicality judgments, word sense judgments

Link: https://github.com/aaronstevenwhite/UniversalDecompositionalSemantics

Reference: von der Malsburg, T., & Angele, B. (2016). False positives and other statistical errors in standard analyses of eye movements in reading. Journal of Memory and Language, (in press).

Type of data: eye-tracking, reading, code (simulation, data analysis)

Link: https://github.com/tmalsburg/MalsburgAngele2016JML

Reference: Schotter, E. R., Tran, R., & Rayner, K. (2014). Don’t believe what you read (only once): Comprehension is supported by regressions during reading. Psychological Science, 25(6), 1218–1226. http://dx.doi.org/10.1177/0956797614531148

Type of data: eye-tracking, reading, stimuli, English, garden-pathing

Link: http://library.ucsd.edu/dc/object/bb4916286v

Reference: White, A. S. & K. Rawlins. (2016). A computational model of S-selection. In Semantics and Linguistic Theory 26, 641-663. Ithaca, NY: CLC Publications.

Type of data: acceptability judgments

Link: https://github.com/aaronstevenwhite/MegaAttitudeProject

Reference: Schmalz, X., Robidoux, S., Castles, A., Coltheart, M., Marinus, E. (preprint): German and English bodies: No evidence for cross-linguistic differences in preferred grain size.

Type of data: lexical decision times, English, German

Link: https://osf.io/myfk3/

Reference: Nicenboim, B., Vasishth, S., Gattei, C., Sigman, M., & Kliegl, R. (2015). Working memory differences in long-distance dependency resolution. Frontiers in Psychology. http://dx.doi.org/10.3389/fpsyg.2015.00312

Type of data: eye-tracking, self-paced reading, working memory capacity, Spanish

Link: https://github.com/bnicenboim/papers/tree/master/NicenboimEtAl2015.%20Working%20memory%20differences%20in%20long-distance%20dependency%20resolution

Reference: Nicenboim, B., Logačev, P., Gattei, C., & Vasishth, S. (2016). When high-capacity readers slow down and low-capacity readers speed up: Working memory and locality effects. Frontiers in Psychology, 7. http://dx.doi.org/10.3389/fpsyg.2016.00280

Type of data: self-paced reading, working memory capacity, Spanish, German

Link: https://github.com/bnicenboim/papers/tree/master/NicenboimEtAl2016.%20When%20High-Capacity%20Readers%20Slow%20Down%20and%20Low-Capacity%20Readers%20Speed%20Up:%20Working%20Memory%20and%20Locality%20Effects

Reference: Lau, J. H., Clark, A., & Lappin, S. (2016). Grammaticality, Acceptability, and Probability: A Probabilistic View of Linguistic Knowledge. Cognitive Science. http://dx.doi.org/10.1111/cogs.12414

Type of data: Acceptability judgments, English

Link: http://www.dcs.kcl.ac.uk/staff/lappin/smog/?page=research

Reference: Enochson, K., & Culbertson, J. (2015). Collecting Psycholinguistic Response Time Data Using Amazon Mechanical Turk. PLoS ONE, 10(3), e0116946. http://doi.org/10.1371/journal.pone.0116946

Type of data: Self-paced reading, English, Mechanical Turk

Link: http://mars.gmu.edu/handle/1920/9116

Reference: Logačev, P., & Vasishth, S. (2016). Understanding underspecification: A comparison of two computational implementations. Quarterly Journal of Experimental Psychology, 69(5). http://www.tandfonline.com/doi/full/10.1080/17470218.2015.1134602

Type of data: computational model, code (data analysis)

Link: https://github.com/plogacev/manuscript_LogacevVasishth_TQJEP_Underspecification

Reference: Nicenboim, B., & Vasishth, S. (2016). Statistical methods for linguistic research: Foundational Ideas - Part II. Language and Linguistics Compass. In Press.

Type of data: code (data analysis)

Link: https://github.com/vasishth/NicenboimVasishthPart2

Reference: Patil, U., Hanne, S., Burchert, F., De Bleser, R., & Vasishth, S. (2016). A computational evaluation of sentence comprehension deficits in aphasia. Cognitive Science, 40. http://onlinelibrary.wiley.com/doi/10.1111/cogs.12250/abstract

Type of data: computational model, code (data analysis), German

Link: http://cogsci.uni-osnabrueck.de/~upatil/src/Patil-EtAl-2014-AphasiaModels.zip

Reference: Patil, U., Vasishth, S., & Lewis, R. L. (2016). Retrieval interference in syntactic processing: The case of reflexive binding in English. Frontiers in Psychology. http://journal.frontiersin.org/article/10.3389/fpsyg.2016.00329/full

Type of data: computational model, code (data analysis), English

Link: http://cogsci.uni-osnabrueck.de/~upatil/src/Reflexives-Data-Analysis.zip

Reference: Safavi, M. S., Husain, S., & Vasishth, S. (2016). Dependency resolution difficulty increases with distance in Persian separable complex predicates: Implications for expectation and memory-based accounts. Frontiers in Psychology, 7. http://journal.frontiersin.org/article/10.3389/fpsyg.2016.00403/full

Type of data: self-paced reading, eye-tracking, code (data analysis), Persian

Link: http://www.ling.uni-potsdam.de/~vasishth/code/SafaviEtAl2016DataCode.zip

Reference: Vasishth, S., & Nicenboim, B. (2016). Statistical Methods for Linguistic Research: Foundational Ideas – Part I. Language and Linguistics Compass, 10(8). http://onlinelibrary.wiley.com/doi/10.1111/lnc3.12201/abstract

Type of data: code (simulation, data analysis)

Link: https://github.com/vasishth/VasishthNicenboimPart1

Reference: Frank, S. L., Trompenaars, T., & Vasishth, S. (2015). Cross-linguistic differences in processing double-embedded relative clauses: Working-memory constraints or language statistics? Cognitive Science.

Type of data: self-paced reading, code (data analysis), Dutch, English, German

Link: https://github.com/vasishth/StanJAGSexamples/tree/master/FrankEtAlCogSci2015

Reference: Logačev, P., & Vasishth, S. (2015). A Multiple-Channel Model of Task-Dependent Ambiguity Resolution in Sentence Comprehension. Cognitive Science.

Type of data: self-paced reading, code (data analysis), German

Link: https://github.com/plogacev/manuscript_LogacevVasishth_CogSci_SMCM

Reference: Hofmeister, P., & Vasishth, S. (2014). Distinctiveness and encoding effects in online sentence comprehension.

Type of data: self-paced reading, code (data analysis), English

Link: http://www.ling.uni-potsdam.de/~vasishth/code/HofmeisterVasishth2014.zip

Reference: Husain, S., Vasishth, S., & Srinivasan, N. (2014). Strong Expectations Cancel Locality Effects: Evidence from Hindi. PLoS ONE, 9(7).

Type of data: self-paced reading, code (data analysis), Hindi

Link: http://www.ling.uni-potsdam.de/~vasishth/code/HusainEtAl2014PLoSONE.zip

Reference: Vasishth, S., Chen, Z., Li, Q., & Guo, G. (2013). Processing Chinese Relative Clauses: Evidence for the Subject-Relative Advantage. PLoS ONE, 8(10).

Type of data: self-paced reading, code (data analysis), Chinese

Link: http://www.ling.uni-potsdam.de/~vasishth/code/PLoSOneVasishthetaldata.zip

Reference: Keshtiari, N., & Vasishth, S. (2013). Reactivation of antecedents by overt vs null pronouns: Evidence from Persian.

Type of data: self-paced reading, Persian

Link: http://www.ling.uni-potsdam.de/~vasishth/code/KeshtiariVasishthJLM2013.zip

Reference: McCurdy, K., Kentner, G., & Vasishth, S. (2013). Implicit prosody and contextual bias in silent reading. Journal of Eye Movement Research, 6(2).

Type of data: eye-tracking, code (data analysis), German

Link: http://www.ling.uni-potsdam.de/~vasishth/code/McCurdyetalJEMRdata.zip

Reference: Vasishth, S., Shaher, R., & Srinivasan, N. (2012). The role of clefting, word order and given-new ordering in sentence comprehension: Evidence from Hindi. Journal of South Asian Linguistics.

Type of data: code (data analysis), Hindi

Link: http://www.ling.uni-potsdam.de/~vasishth/code/VasishthShaherSrinivasan2012JSAL.zip

Reference: Bartek, B., Lewis, R. L., Vasishth, S., & Smith, M. (2011). In Search of On-line Locality Effects in Sentence Comprehension. Journal of Experimental Psychology: Learning, Memory and Cognition, 37(5).

Type of data: self-paced reading, eye-tracking, code (data analysis)

Link: http://www.ling.uni-potsdam.de/~vasishth/code/BarteketalJEP2011data.zip

Reference: Chen, Z., Jäger, L., & Vasishth, S. (2011). How structure sensitive is the parser? Evidence from Mandarin Chinese. Empirical approaches to linguistic theory: Studies of meaning and structure, Studies in Generative Grammar, Mouton de Gruyter.

Type of data: self-paced reading, code (data analysis), Chinese

Link: http://www.ling.uni-potsdam.de/~vasishth/code/Chenetal2010LingEvidence.zip

Reference: Vasishth, S., Suckow, K., Lewis, R. L., & Kern, S. (2011). Short-term forgetting in sentence comprehension: Crosslinguistic evidence from head-final structures. Language and Cognitive Processes, 25(4).

Type of data: self-paced reading, eye-tracking, code (data analysis), German, English

Link: http://www.ling.uni-potsdam.de/~vasishth/code/VSLK_LCP.zip

Reference: Beck, S., & Vasishth, S. (2009). Multiple Focus. Journal of Semantics.

Type of data: code (data analysis), English

Link: http://www.ling.uni-potsdam.de/~vasishth/code/BeckVasishthJoS2009.zip

Reference: Boston, M. F., Hale, J. T., Patil, U., Kliegl, R., & Vasishth, S. (2008). Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus. Journal of Eye Movement Research, 2(1).

Type of data: eye-tracking, code (data analysis), German

Link: http://www.ling.uni-potsdam.de/~vasishth/code/JEMRSurprisal.zip

Reference: Vasishth, S., Bruessow, S., Lewis, R. L., & Drenhaus, H. (2008). Processing Polarity: How the ungrammatical intrudes on the grammatical. Cognitive Science, 32(4).

Type of data: eye-tracking, computational model, German

Link: http://www.ling.uni-potsdam.de/~vasishth/code/VasishthBruessowetal2008CogSci.zip

Reference: Vasishth, S., & Lewis, R. L. (2006). Argument-head distance and processing complexity: Explaining both locality and antilocality effects. Language, 82(4).

Type of data: self-paced reading, Hindi

Link: http://www.ling.uni-potsdam.de/~vasishth/code/VasishthLewis2006.zip

Reference: Lewis, R. L., & Vasishth, S. (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29.

Type of data: computational model, reading times

Link: http://www.ling.uni-potsdam.de/~vasishth/code/LewisVasishthModel05.tar.gz

