glosario
Glosario is an open source glossary of terms used in data science that is available online. By adding glossary keys to a lesson's metadata, authors can indicate what the lesson teaches, what learners should know before they start, and where they can find that knowledge. Authors can also use the library's functions to insert consistent hyperlinks for terms and definitions in their lessons in several (human) languages. You can find the glossary here: https://glosario.carpentries.org/
To advance data science knowledge and accessibility for our diverse community, we have developed Glosario, a multilingual glossary of data science terms. The easiest way to contribute is to use our Google Form, which does not require any technical experience.
If you are comfortable using GitHub, you can also contribute there. You do not need to know any programming language; a basic familiarity with the GitHub web interface is sufficient. We have prepared a detailed and accessible guide to assist you. To support contributors further, we have also created short YouTube videos demonstrating how to contribute:
- Recording in English
- Recording in Español
- Or auto-translate a YouTube video into your language
Contributions are welcome in any language, not only those currently represented in the glossary. If you need help with your contribution, you can ask questions in the #glosario Slack channel or email us at community@carpentries.org. If you are not yet a member of The Carpentries Slack, you may request access here.
Any site where glossary URLs resolve can be used as a glossary. This project implements a glossary of data science and data engineering terms as a working model.
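In practice, a glossary only needs one stable page URL plus one fragment per term. Below is a minimal sketch of building such a link for a lesson. It assumes a hypothetical `glossary_link` helper (this is not the API of the `glosario` packages) and assumes each term is addressed by its slug, the entry identifier described below. The `example.org` address is a stand-in, not the real site layout.

```python
from urllib.parse import urldefrag


def glossary_link(base_url: str, slug: str, text: str) -> str:
    """Return a Markdown link to one glossary term (hypothetical helper).

    Assumes each term is addressed as <page URL>#<slug>; any fragment
    already present on base_url is discarded first.
    """
    page, _fragment = urldefrag(base_url)
    return f"[{text}]({page}#{slug})"


# example.org stands in for a real glossary site.
print(glossary_link("https://example.org/glossary/", "cran", "CRAN"))
# [CRAN](https://example.org/glossary/#cran)
```

Because the link is built only from a base URL and a slug, pointing lessons at a different glossary site means changing one string.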
- The master copy of the glossary lives in `glossary.yml`. Its format is described below.
- This file is turned into a single-page GitHub Pages site using Jekyll.
- It is also turned into a Python package called `glosario` and an R package with the same name.
A glossary entry is structured like this:
```yaml
- slug: cran
  ref:
    - base_r
    - tidyverse
  en:
    term: "Comprehensive R Archive Network"
    acronym: "CRAN"
    def: >
      A public repository of R [packages](#package).
```
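When parsed with a YAML loader (for example PyYAML's `yaml.safe_load`), the entry above becomes a list containing one dictionary. The sketch below writes that parsed structure out as a Python literal, so it runs without any YAML library, and adds a hypothetical `lookup` helper that returns the term, acronym, and definition for a slug in one language. The helper name and its behaviour for missing languages are illustrative assumptions, not part of the `glosario` packages.

```python
from typing import Optional

# The example entry as a YAML loader would return it: a list of entries.
# The folded scalar (`def: >`) keeps a single trailing newline.
GLOSSARY = [
    {
        "slug": "cran",
        "ref": ["base_r", "tidyverse"],
        "en": {
            "term": "Comprehensive R Archive Network",
            "acronym": "CRAN",
            "def": "A public repository of R [packages](#package).\n",
        },
    },
]


def lookup(glossary: list, slug: str, lang: str) -> Optional[dict]:
    """Return term, acronym, and definition for `slug` in `lang` (hypothetical helper).

    Returns None if the slug is unknown or has no section in that language.
    """
    for entry in glossary:
        if entry["slug"] == slug:
            section = entry.get(lang)
            if section is None:
                return None
            return {
                "term": section["term"],
                "acronym": section.get("acronym"),  # optional key
                "def": section["def"].strip(),  # drop the folded scalar's newline
            }
    return None


print(lookup(GLOSSARY, "cran", "en")["acronym"])  # CRAN
print(lookup(GLOSSARY, "cran", "fr"))  # None: no French section in this entry
```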
- The value associated with the `slug` key identifies the entry.
  - It must be unique within the glossary.
  - It must be in lower case and use only letters, digits, and the underscore (to be compatible with Jekyll's automatic slug creation).
  - It becomes the fragment identifier in the online version of the glossary.
- The entry may have a `ref` key. If it is present, its value must be a list of identifiers of related terms in this glossary.
- Every other top-level key must be an ISO 639 language code such as `en` or `fr`.
  - Every entry must have at least one such language section.
- Within each language section for each term:
  - The value of `term` is the term being defined. This key must be present.
  - The key `acronym` is optional. If present, its value is the acronym for this term.
  - The value of `def` is the definition. This key must be present, and the value may contain local links to other terms in this glossary (i.e., links starting with `#`) and/or links to outside sources.
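The rules above are mechanical enough to check with a short script. Below is a minimal sketch in Python that runs against already-parsed entries, such as the output of a YAML loader. It is an illustration under stated assumptions, not the project's own validation: `check_glossary` is a hypothetical name, "ISO 639 language code" is approximated as any two- or three-letter lower-case key (so it cannot tell real codes from made-up ones), and only Markdown inline links of the form `[text](#slug)` are treated as local links.

```python
import re

SLUG_RE = re.compile(r"[a-z0-9_]+")              # lower-case letters, digits, underscore
LANG_RE = re.compile(r"[a-z]{2,3}")              # rough stand-in for an ISO 639 code (assumption)
LOCAL_LINK_RE = re.compile(r"\]\(#([^)\s]+)\)")  # Markdown link targets starting with '#'


def check_glossary(entries: list) -> list:
    """Return a list of problems found in parsed glossary entries (empty if none)."""
    problems = []
    known = set()
    for entry in entries:
        slug = entry.get("slug")
        if slug in known:
            problems.append(f"duplicate slug: {slug}")
        known.add(slug)

    for entry in entries:
        slug = entry.get("slug")
        if not isinstance(slug, str) or not SLUG_RE.fullmatch(slug):
            problems.append(f"bad or missing slug: {slug!r}")
            continue
        # `ref` is optional; when present, every item must be another entry's slug.
        for ref in entry.get("ref") or []:
            if ref not in known:
                problems.append(f"{slug}: ref to unknown slug {ref!r}")
        # Every other top-level key is treated as a language section.
        langs = [key for key in entry if key not in ("slug", "ref")]
        if not langs:
            problems.append(f"{slug}: no language sections")
        for lang in langs:
            section = entry[lang]
            if not isinstance(lang, str) or not LANG_RE.fullmatch(lang):
                problems.append(f"{slug}: {lang!r} is not a language code")
                continue
            if not isinstance(section, dict):
                problems.append(f"{slug}/{lang}: section is not a mapping")
                continue
            for required in ("term", "def"):  # `acronym` is optional
                if required not in section:
                    problems.append(f"{slug}/{lang}: missing {required!r}")
            for target in LOCAL_LINK_RE.findall(section.get("def") or ""):
                if target not in known:
                    problems.append(f"{slug}/{lang}: link to unknown term #{target}")
    return problems


cran = {
    "slug": "cran",
    "ref": ["base_r", "tidyverse"],
    "en": {
        "term": "Comprehensive R Archive Network",
        "acronym": "CRAN",
        "def": "A public repository of R [packages](#package).\n",
    },
}
for problem in check_glossary([cran]):
    print(problem)
# cran: ref to unknown slug 'base_r'
# cran: ref to unknown slug 'tidyverse'
# cran/en: link to unknown term #package
```

Run on the `cran` entry alone, the check reports `base_r`, `tidyverse`, and `#package` as unknown because those slugs belong to other entries that this one-entry list does not include; a check like this is only meaningful over the whole glossary file.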
You can access the glossary.yml dataset and citation information by clicking on the following DOI badge:
SADiLaR (the South African Centre for Digital Language Resources) is one of the collaborators in the finalisation and expansion of the Glosario Project to African languages. SADiLaR is a research infrastructure established by the Department of Science and Innovation of the South African government as part of the South African Research Infrastructure Roadmap (SARIR).
We are pleased to share that the Andrew W. Mellon Foundation approved a grant for use over 12 months (November 2023 through October 2024) to support an upgrade to Glosario. Additionally, further funding has been secured from the Foundation to continue developing this resource from January 1, 2025, through December 31, 2025.
- Parrot logo by restocktheshelves.
At The Carpentries, every contribution matters. People help open source projects in many ways: writing guides, reviewing other people's work, translating content, or sharing ideas. All of these contributions take time and effort. We want to thank everyone who helps Glosario grow, not only those who write code but also those who support the project in other ways.
We now credit four types of contributions:
- Documentation and Planning: individuals who helped write the contribution guide, organise ideas, or plan how Glosario works.
- Pull Request Reviewers: individuals who read and gave feedback on new suggestions to improve the project.
- Discussion Participants: individuals who participated in conversations to help make decisions or solve problems.
- Translators: individuals who translated words and meanings to make Glosario useful in many languages.