This repository holds the data served by http://tsammalex.clld.org/ The data is licensed under a Creative Commons Attribution 4.0 International License.
The data is stored as collection of csv files, suitable for editing with LibreOffice.
Adding an image is done in two steps:
- Adding a row to
staged_images.csv, specifying a publicly available URL to access the image in the
idcolumn (this may be a temporary GitHub repository, wikimedia commons or other publicly available webspace).
- Providing the image at the specified URL for download.
Periodically (or upon request), a process is run, which
- loops through
- retrieving the files and uploading them to our file server (computing the md5 hash) on the way,
- enriching the metadata, in case the image is from a known provider (Wikimedia, Flickr, EOL, ...),
- moving the metadata from
images.csv, replacing the
For several providers of flora and fauna imagery we provide support for downloading images
and associated metadata (by specifying matching URLs as
- EOL images specified by a URL of the form
http://media.eol.org/data_objects/21916329. (You typically get to a page with such a URL by clicking on any image you encounter while browsing EOL)
- Flickr images specified by a URL of the form
https://www.flickr.com/photos/damouns/78968973, i.e. a photo's details page.
- African Plants photos from Senckenberg,
specified by an URL of the form
http://www.westafricanplants.senckenberg.de/root/index.php?page_id=14&id=722#image=26800, i.e. the URL you see in your browser's location bar after clicking to enlarge an image.
- Flora of Zimbabwe or Flora of Mozambique
Currently, insuring referential integrity of the data is the responsibility of the editor. In particular editors have to make sure
idcolumns hold unique values,
<name>__idcolumns hold values which exist as
idin the referenced table,
<name>__idscolumns hold comma-separated lists of values which exist as
idin the referenced table.
Upon push, referential integrity will be checked by travis-ci.
Changing the scientific name of a species
id of a species is created from its scientifc name, changing this name involves the following steps:
- Add a new species with the updated name and
- Update all references to the
- TODO redirect the old species to the new one by ...
In addition to the data managed in this repository, we fetch data from other sources to enhance the usability of the site.
- Given a proper scientific name for a species we can fetch a corresponding identifier from EOL.
- Given an EOL identifier, we can information about the biological classification of a species and a common english name.
- We also use information on terrestrial ecoregions from WWF, which can be referenced using ecoregion identifiers in the format
- WWF ecoregions and countries in which a species occurs can be computed by matching occurrence information from GBIF against region borders.
Notes to self:
To diff sorted lines, you may run:
diff -b <(sort words.csv) <(sort words_CN.csv)