Quickly download, clean up, and install ecological datasets into a database management system


Join the chat at https://gitter.im/weecology/retriever

Finding data is one thing. Getting it ready for analysis is another. Acquiring, cleaning, standardizing, and importing publicly available data is time-consuming because many datasets lack machine-readable metadata and do not conform to established data structures and formats. The Data Retriever automates the first steps in the data analysis pipeline by downloading, cleaning, and standardizing datasets, and importing them into relational databases, flat files, or programming languages. Automating this process cuts the time needed to get most large datasets up and running by hours, and in some cases days.

Installing the Current Release

If you have Python installed you can install the current release using pip:

pip install retriever

Depending on your system configuration this may require sudo:

sudo pip install retriever

Precompiled binary installers are also available for Windows, OS X, and Ubuntu/Debian on the releases page. These do not require a Python installation. Download the installer for your operating system and follow the instructions on the download page.

Installing From Source

To install the Data Retriever from source, you'll need Python 2.7+ or 3.3+ with the following packages installed:

  • xlrd

The following packages are optionally needed to interact with associated database management systems:

  • PyMySQL (for MySQL)
  • sqlite3 (for SQLite)
  • psycopg2 (for PostgreSQL)
  • pyodbc (for MS Access - this option is only available on Windows)
  • Microsoft Access Driver (ODBC for Windows)
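
Before installing from source, it can help to check which of the optional drivers listed above are already importable. The following is a minimal sketch, not part of the Data Retriever itself. It assumes Python 3.4+ and that the packages use their usual import names (pymysql, sqlite3, psycopg2, pyodbc). The function name available_engines and the OPTIONAL_DRIVERS mapping are hypothetical names introduced for illustration.

```python
import importlib.util

# Map each retriever engine name to the import name of its driver module.
# The mapping is an assumption for illustration, based on the list above.
OPTIONAL_DRIVERS = {
    "mysql": "pymysql",
    "sqlite": "sqlite3",
    "postgres": "psycopg2",
    "msaccess": "pyodbc",
}


def available_engines(drivers=OPTIONAL_DRIVERS):
    """Return {engine: bool} showing which driver modules can be located."""
    return {
        engine: importlib.util.find_spec(module) is not None
        for engine, module in drivers.items()
    }


if __name__ == "__main__":
    # Print one line per engine, e.g. "mysql      missing".
    for engine, found in sorted(available_engines().items()):
        print("{:<10} {}".format(engine, "available" if found else "missing"))
```

This only checks that the module can be found. It does not verify that a database server is running or that the Microsoft Access ODBC driver is installed.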

To install from source

Either use pip to install directly from GitHub:

pip install git+https://git@github.com/weecology/retriever.git


Or clone the repository and install from the local copy:

  1. Clone the repository
  2. From the directory containing setup.py, run the following command: pip install . (note the trailing dot). You may need to include sudo at the beginning of the command depending on your system (i.e., sudo pip install .).

More extensive documentation for those interested in development can be found in the project's developer documentation.

Using the Command Line

After installing, run retriever update to download all of the available dataset scripts. To see the full list of command line options and datasets run retriever --help. The output will look like this:

usage: retriever [-h] [-v] [-q]

positional arguments:
                        sub-command help
    download            download raw data files for a dataset
    install             download and install dataset
    defaults            displays default options
    update              download updated versions of scripts
    new                 create a new sample retriever script
    new_json            CLI to create retriever datapackage.json script
    edit_json           CLI to edit retriever datapackage.json script
    delete_json         CLI to remove retriever datapackage.json script
    ls                  display a list all available dataset scripts
    citation            view citation
    reset               reset retriever: removes configation settings,
                        scripts, and cached data

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -q, --quiet           suppress command-line output

To install datasets, use retriever install:

usage: retriever install [-h] [--compile] [--debug]
                         {mysql,postgres,sqlite,msaccess,csv,json,xml} ...

positional arguments:
  {mysql,postgres,sqlite,msaccess,csv,json,xml}
                        engine-specific help
    mysql               MySQL
    postgres            PostgreSQL
    sqlite              SQLite
    msaccess            Microsoft Access
    csv                 CSV
    json                JSON
    xml                 XML

optional arguments:
  -h, --help            show this help message and exit
  --compile             force re-compile of script before downloading
  --debug               run in debug mode


These examples use the Iris flower dataset. More examples can be found in the Data Retriever documentation.

Using Install

retriever install -h (gives install options)

Using a specific database engine, retriever install {Engine}:

retriever install mysql -h (gives install mysql options)
retriever install mysql --user myuser --password ******** --host localhost --port 8888 --database_name testdbase iris
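
When the same install has to be repeated for several datasets or engines, the command line can be assembled in a script. The sketch below builds the argument list for the install command shown above. It assumes Python 3.6+ (keyword arguments keep their order). The helper name build_install_command is hypothetical, and it passes option names through without checking them against the retriever's actual options.

```python
def build_install_command(engine, dataset, **options):
    """Build the argument list for `retriever install <engine> ... <dataset>`.

    Each keyword option becomes a `--name value` pair, so user="myuser"
    turns into `--user myuser`. Option names are not validated here.
    """
    command = ["retriever", "install", engine]
    for name, value in options.items():
        command += ["--" + name, str(value)]
    # The dataset name goes last, as in the example command above.
    command.append(dataset)
    return command


if __name__ == "__main__":
    # A password can be passed the same way (password="..."); it is left
    # out here so that it does not end up in printed output.
    cmd = build_install_command(
        "mysql", "iris",
        user="myuser", host="localhost", port=8888, database_name="testdbase",
    )
    print(" ".join(cmd))
```

The resulting list can be handed to subprocess.check_call to run the install, provided the retriever command is on the PATH.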

To install data into an SQLite database named iris.db you would use:

retriever install sqlite iris -f iris.db
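
Once the install has finished, the resulting file can be inspected with Python's built-in sqlite3 module. This is a minimal sketch, assuming Python 3 and that iris.db is in the current directory. The helper names list_tables and count_rows are hypothetical, and no assumption is made about the table names the retriever creates, so the script simply reports whatever it finds.

```python
import sqlite3


def list_tables(db_path):
    """Return the sorted names of all tables in the SQLite file at db_path."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


def count_rows(db_path, table):
    """Return the number of rows in the given table."""
    # Quote the identifier, since table names cannot be bound as parameters.
    quoted = '"{}"'.format(table.replace('"', '""'))
    connection = sqlite3.connect(db_path)
    try:
        (count,) = connection.execute(
            "SELECT COUNT(*) FROM {}".format(quoted)
        ).fetchone()
    finally:
        connection.close()
    return count


if __name__ == "__main__":
    for table in list_tables("iris.db"):
        print(table, count_rows("iris.db", table))
```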

Using Download

retriever download -h (gives you help options)
retriever download iris
retriever download iris --path C:\Users\Documents

Using Citation

retriever citation (citation of the retriever engine)
retriever citation iris (citation for the iris data)


For more information see the
[Data Retriever website](http://www.data-retriever.org/).


Development of this software was funded by [the Gordon and Betty Moore
Foundation's Data-Driven Discovery
Initiative](http://www.moore.org/programs/science/data-driven-discovery) through
[Grant GBMF4563](http://www.moore.org/grants/list/GBMF4563) to Ethan White and
the [National Science Foundation](http://nsf.gov/) as part of a [CAREER award to
Ethan White](http://nsf.gov/awardsearch/showAward.do?AwardNumber=0953694).