This demo accompanies the following journal article. If you use the results in new projects, create images with the demo for future work, or use it in some other way, we would appreciate a citation:
Tobias Isenberg, Zujany Salazar, Rafael Blanco, and Catherine Plaisant (2022). Do You Believe Your (Social Media) Data? A Personal Story on Location Data Biases, Errors, and Plausibility as well as their Visualization. IEEE Transactions on Visualization and Computer Graphics, 28(9):3277–3291, September 2022. doi: 10.1109/TVCG.2022.3141605; open-access version available at https://hal.inria.fr/hal-03516682
@article{Isenberg:2022:DYB,
author = {Tobias Isenberg and Zujany Salazar and Rafael Blanco and Catherine Plaisant},
title = {Do You Believe Your (Social Media) Data? {A} Personal Story on Location Data Biases, Errors, and Plausibility as well as their Visualization},
journal = {IEEE Transactions on Visualization and Computer Graphics},
year = {2022},
volume = {28},
number = {9},
month = sep,
pages = {3277--3291},
doi = {10.1109/TVCG.2022.3141605},
shortdoi = {10/kt4c},
doi_url = {https://doi.org/10.1109/TVCG.2022.3141605},
oa_hal_url = {https://hal.inria.fr/hal-03516682},
osf_url = {https://osf.io/u8ejr/},
url = {https://tobias.isenberg.cc/VideosAndDemos/Isenberg2022DYB},
github_url = {https://github.com/tobiasisenberg/Motion-Plausibility-Profiles},
pdf = {https://tobias.isenberg.cc/personal/papers/Isenberg_2022_DYB.pdf},
}
https://tobias.isenberg.cc/VideosAndDemos/Isenberg2022DYB
Please note the software is provided "as is". Use it at your own risk, although data loss is unlikely. Do take the standard precautions like saving your work in other programs.
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) (see license.txt).
In addition to a standard Python 3 installation, the script requires several packages, including the following (and potentially more):
geopy
plotly
matplotlib
numpy
powerlaw
kaleido
Install them using pip3 install [package] or the respective alternative for your installation of Python 3.
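To see which of the listed packages still need to be installed, a small check along the following lines may help. This is only a sketch and not part of the script itself: the helper name find_missing is hypothetical, and it assumes that each package's import name matches its pip name.

```python
import importlib.util

# Package names as listed above. Assumption: import name == pip name.
REQUIRED_PACKAGES = ["geopy", "plotly", "matplotlib", "numpy", "powerlaw", "kaleido"]


def find_missing(packages):
    """Return the packages that the import system cannot locate."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Try: pip3 install " + " ".join(missing))
    else:
        print("All listed packages were found.")
```

Since the list above may be incomplete, an import error when running the script is still possible even if this check passes.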
For some functions (but not in the default configuration), the script also needs to be able to call some external programs to do some of the data conversions. In particular:
gpsbabel: https://www.gpsbabel.org/
zip: e.g., http://infozip.sourceforge.net/
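If you plan to enable the functions that rely on these external programs, you can check beforehand whether they are reachable from your PATH. The following is a minimal sketch; the helper name locate_tools is hypothetical, and it assumes the executables are called gpsbabel and zip on your system.

```python
import shutil

# Assumed executable names; they may differ on some platforms.
EXTERNAL_TOOLS = ["gpsbabel", "zip"]


def locate_tools(tools):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in locate_tools(EXTERNAL_TOOLS).items():
        print(f"{tool}: {path if path else 'not found on PATH'}")
```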
The script requires additional CSV data files (named as specified in the familyFiles list in the script) downloaded from https://www.inaturalist.org/observations/export (if using different files, you need to adjust the familyFiles list).
For the default configuration, please go to the iNaturalist export page (may require an iNaturalist login) and download the data files for the taxa droseraceae, nepenthaceae, sarraceniaceae, roridulaceae, byblidaceae, lentibulariaceae, cephalotaceae, and drosophyllaceae. To do so, for each taxon, enter the taxon name into the taxon field on the form, select the suggested family in the pop-up that appears, select All for Geo and for Taxon under Point 3 (Choose columns), and then click the Create export button at the bottom of the page.
Each export can take a while to generate, depending on the size of the family (e.g., the families droseraceae, nepenthaceae, sarraceniaceae, and lentibulariaceae are large and can take several hours each). You can see the status at the bottom of the export page if you reload it, or you can ask iNaturalist to send you an e-mail with the data once the export is complete. Also note that you can only create one export at a time.
Once downloaded, unzip the data and rename the exported observations-123456.csv (or similar) files from iNaturalist to family.csv for easier association and to match the name in the script (e.g., droseraceae.csv). Note that the results will only be appropriate and similar to those in the paper if data files are present for all families mentioned above.
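Before running the script, it can be useful to confirm that all eight renamed files are in place. The sketch below is not part of the script: the helper missing_family_files is hypothetical, the family list is copied from the description above rather than from the script's familyFiles list, and it assumes the CSV files sit in the directory you pass in.

```python
from pathlib import Path

# Families of the default configuration as described above. Assumption: the
# script's own familyFiles list uses the same lowercase names.
FAMILIES = [
    "droseraceae",
    "nepenthaceae",
    "sarraceniaceae",
    "roridulaceae",
    "byblidaceae",
    "lentibulariaceae",
    "cephalotaceae",
    "drosophyllaceae",
]


def missing_family_files(data_dir, families=FAMILIES):
    """Return the expected <family>.csv names that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [
        f"{family}.csv"
        for family in families
        if not (data_dir / f"{family}.csv").is_file()
    ]


if __name__ == "__main__":
    missing = missing_family_files(".")
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All family data files are present.")
```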
To configure the script, adjust the create* flags at the top of the script as needed (but the script runs with the default configuration out of the box).
python3 _inaturalist.py
Note that it is normal to see many lines similar to No genus name for id: XXXXXXXXX ; scientific name: Abcdefghijk. This is due to some observations in the data having been entered with only the family name, not the full scientific name. These entries have to be treated differently for the visualization (as explained in the paper).
The script produces the visual representations of Fig. 17, 20, 34–36, and 68–72 of the paper (but adjusted to the newly downloaded data). The Motion Plausibility Profiles are separated into the main representation (*.pdf) and the respective histogram (*-histogram.pdf), i.e., in two separate files. Also, the script produces two versions of the main representation, a more squarish one (*.pdf) and one with a more landscape format (*-wide.pdf).
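To sort the generated PDFs by kind, the file name suffixes described above can be matched directly. The following is a rough sketch under the assumption that the suffixes are exactly -histogram.pdf and -wide.pdf; the function classify_output and the example file names are hypothetical.

```python
from pathlib import Path


def classify_output(filename):
    """Classify an output file by its suffix: histogram, wide, main, or other."""
    name = Path(filename).name
    # Check the more specific suffixes first, since all of them end in .pdf.
    if name.endswith("-histogram.pdf"):
        return "histogram"
    if name.endswith("-wide.pdf"):
        return "wide"
    if name.endswith(".pdf"):
        return "main"
    return "other"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for example in ["profile.pdf", "profile-wide.pdf", "profile-histogram.pdf"]:
        print(example, "->", classify_output(example))
```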