

The UC Davis Corpus of Written Spanish, L2 and Heritage Speakers

The UC Davis Corpus of Written Spanish, L2 and Heritage Speakers (COWS-L2H) consists of short essays collected from students enrolled in university-level Spanish courses. Courses SPA 1-24 are L2 Learner courses. Courses SPA 31-33 are Heritage Learner courses.

All essays, annotations, and corrections are available both as individual text files and as comma-separated value (CSV) files.

Essays are divided based on the prompts used to collect the data:

famous: Write a text in Spanish about the following subject: "a famous person"

vacation: Write a text in Spanish about the following subject: "your perfect vacation plan"

special: Write a text in Spanish about the following subject: "a special person in your life"

terrible: Write a text about the following subject: "a terrible story"

Each essay prompt is further divided by the quarter in which the data was collected.

Annotations: We have annotated a subset of essays for gender/number agreement and usage of "a personal." These annotation targets were chosen based on specific research questions. We encourage fellow researchers to add to our annotations. Please see the included annotation scheme for further information.

Corrections: We have also included corrected essays for S17_vacation, S17_famous, and F17_famous. We are in the process of correcting additional essays and will update the corpus as these are ready to be made public.

Metadata: Metadata files consist of the following data items separated by "|||":

  1. Course enrolled
  2. Age
  3. Gender
  4. L1 language
  5. Other L1 language(s)
  6. Language(s) spoken at home
  7. Language(s) studied
  8. Listening comprehension *
  9. Reading comprehension *
  10. Speaking ability **
  11. Writing ability **
  12. Have you ever lived in a Spanish-speaking country?

NOTE: The metadata questions were updated for the W21 and S21 data. See below.

* Comprehension is self-described on the following scale:

  • 1 (not confident at all)
  • 2 (not extremely confident, but I am sometimes able to understand)
  • 3 (somewhat confident but it depends a lot on the context and on my degree of focus on the task)
  • 4 (quite confident: I understand written messages most of the time)
  • 5 (extremely confident: I can understand any written message in Spanish)

** Speaking/writing ability is self-described on the following scale:

  • 1 (not confident at all)
  • 2 (not extremely confident)
  • 3 (somewhat confident)
  • 4 (quite confident)
  • 5 (extremely confident)
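The twelve-item record described above can be split into named fields with a few lines of Python. This is a minimal sketch, not part of the corpus tooling: the field names in `FIELDS_12` are hypothetical labels chosen for illustration, the sample record is invented, and it assumes each metadata record is a single string with the items separated by "|||".

```python
from typing import Dict, List

# Hypothetical short labels for the twelve metadata items, in listed order.
FIELDS_12: List[str] = [
    "course", "age", "gender", "l1", "other_l1", "home_languages",
    "languages_studied", "listening", "reading", "speaking", "writing",
    "lived_in_spanish_speaking_country",
]

def parse_metadata(record: str, fields: List[str] = FIELDS_12) -> Dict[str, str]:
    """Split one '|||'-separated metadata record into a dict of strings."""
    values = [value.strip() for value in record.strip().split("|||")]
    if len(values) != len(fields):
        raise ValueError(
            f"expected {len(fields)} fields, found {len(values)}"
        )
    return dict(zip(fields, values))

# Invented sample record, for illustration only (not taken from the corpus).
sample = "SPA 3|||19|||Female|||English||||||English|||Spanish|||4|||3|||3|||4|||No"
parsed = parse_metadata(sample)
```

Values are kept as strings because the exact encoding of each answer in the files is not stated here; convert the self-rating items to integers only after checking how they are stored.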

Metadata format for S21 and W21:

  1. Course enrolled
  2. Age
  3. Gender
  4. How many years of Spanish courses had you taken before arriving at UC Davis?
  5. What was the first Spanish course you took at UC Davis?
  6. How many previous Spanish upper division courses have you been enrolled in?
  7. Which language(s) do you consider to be your mother tongue(s)?
  8. Did you grow up in a Spanish-speaking household?
  9. How many languages can you communicate in (including your mother tongue)?
  10. Have you ever spent more than 1 month in a Spanish-speaking country?
  11. Where did you attend elementary school?
  12. Where did you attend high school?
  13. Did you ever attend a bilingual (Spanish-English) school during K-12?
  14. On a normal day, how much exposure (radio, papers, movies, people talking to you, etc.) to the Spanish language do you get outside of your Spanish language class?
  15. Why do you study Spanish?
  16. I feel [1] understanding my instructor when they speak Spanish in class.
  17. I feel [2] understanding a Spanish-speaking movie without subtitles.
  18. I feel [3] understanding the readings in the Spanish textbook or other classroom materials.
  19. I feel [4] understanding a newspaper article in Spanish.
  20. I feel [5] speaking Spanish in class.
  21. I feel [6] speaking Spanish outside of class.
  22. I feel [7] writing assignments for my Spanish course.
  23. I feel [8] writing a blog post in Spanish.
  24. [1] that I will get a good grade for my current Spanish course.
  25. I feel [2] about my current Spanish class.
  26. I feel that my Spanish course is [3].
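Because the two metadata formats differ in length (12 items versus 26), a reader script may want to check a record against the format that matches its quarter. The sketch below is one way to do that. It assumes that only W21 and S21 use the 26-item format and that every other quarter uses the 12-item format, and that the "|||" separator applies to both; the function names are hypothetical.

```python
from typing import List

# Quarters stated to use the updated 26-item metadata format.
UPDATED_QUARTERS = {"W21", "S21"}

def expected_field_count(quarter: str) -> int:
    """Return the number of metadata items expected for a quarter code."""
    return 26 if quarter.upper() in UPDATED_QUARTERS else 12

def split_record(record: str, quarter: str) -> List[str]:
    """Split a '|||'-separated record and check its length for the quarter."""
    values = [value.strip() for value in record.strip().split("|||")]
    expected = expected_field_count(quarter)
    if len(values) != expected:
        raise ValueError(
            f"{quarter}: expected {expected} fields, found {len(values)}"
        )
    return values
```

Raising on a length mismatch, rather than padding or truncating, makes it obvious when a file follows a format other than the one assumed.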

CSV Data Format

In addition to the raw text and corresponding annotations, corrections, and metadata files for individual essays, we have provided all of the currently available COWS-L2H data in a more user-friendly CSV format. To access these files, you can download the entire COWS-L2H repository, as described in the guide How to use the COWS-L2H Corpus. Alternatively, if you wish to view and download individual CSV files, do the following:

  1. Browse to the "csv" directory, which lists all available CSV files by topic and quarter.
  2. Click on the desired CSV file, such as "famous.F17.csv".
  3. If the contents of the file do not display automatically, click on the "View raw" option.
  4. The raw text format of the data should now be visible in your browser window.
  5. Click "File" then "Save Page As" or the equivalent in your browser menu.
  6. Select the desired save location and save the file.
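Once a CSV file has been saved, it can be read with Python's standard `csv` module. The following is a minimal sketch: it assumes file names follow the "topic.quarter.csv" pattern of the "famous.F17.csv" example, that the files are UTF-8 encoded with a header row, and it makes no assumption about column names. The function names are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple

def split_csv_name(filename: str) -> Tuple[str, str]:
    """Split a name like 'famous.F17.csv' into (topic, quarter)."""
    stem = Path(filename).stem          # drops the final '.csv' suffix
    topic, quarter = stem.split(".", 1)
    return topic, quarter

def load_rows(path: str) -> List[Dict[str, str]]:
    """Read every row of a CSV file as a dict keyed by the header row."""
    # newline="" lets the csv module handle line breaks inside quoted essays.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))

topic, quarter = split_csv_name("famous.F17.csv")
```

Calling `load_rows` on a downloaded file returns one dictionary per row; inspecting the keys of the first row shows which columns the file actually provides.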

