Quickly download, clean up, and install ecological datasets into a database management system



Join the chat at https://gitter.im/weecology/retriever

Finding data is one thing. Getting it ready for analysis is another. Acquiring, cleaning, standardizing, and importing publicly available data is time-consuming because many datasets lack machine-readable metadata and do not conform to established data structures and formats. The Data Retriever automates the first steps in the data analysis pipeline by downloading, cleaning, and standardizing datasets, and importing them into relational databases, flat files, or programming languages. Automating this process reduces the time it takes a user to get most large datasets up and running by hours, and in some cases days.

Installing (binaries)

Precompiled binaries of the most recent release are available for Windows, OS X, and Ubuntu/Debian on the releases page.

Installing From Source

To install the Data Retriever from source, you'll need Python 2.7+ or 3.3+ with the following packages installed:

  • xlrd

The following packages are optional; each is needed only for the database engine it supports:

  • PyMySQL (for MySQL)
  • sqlite3 (for SQLite)
  • psycopg2 (for PostgreSQL)
  • pyodbc (for MS Access - this option is only available on Windows)
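
Before installing, it can help to see which of these packages are already available. The following is a minimal sketch, not part of the Data Retriever itself. It assumes Python 3.4 or later (for importlib.util.find_spec) and assumes the usual import names for each package (for example, PyMySQL is imported as pymysql). The helper names is_installed and dependency_report are hypothetical.

```python
import importlib.util

# Import names for the packages listed above (assumed, not taken from the
# Data Retriever source). sqlite3 ships with the Python standard library.
REQUIRED = {"xlrd": "xlrd"}
OPTIONAL = {
    "MySQL": "pymysql",
    "SQLite": "sqlite3",
    "PostgreSQL": "psycopg2",
    "MS Access": "pyodbc",
}


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def dependency_report():
    """Map each package's import name to whether it is available."""
    names = list(REQUIRED.values()) + list(OPTIONAL.values())
    return {name: is_installed(name) for name in names}


if __name__ == "__main__":
    for name, found in dependency_report().items():
        print("{:<10} {}".format(name, "found" if found else "missing"))
```

A missing optional package only matters if you plan to use the corresponding database engine.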

To install from source:

  1. Clone the repository.
  2. From the directory containing setup.py, run the following command: python setup.py install. Depending on your system, you may need to prefix the command with sudo (i.e., sudo python setup.py install).
  3. After installing, type retriever at a command prompt to launch the Data Retriever.

More extensive documentation for those interested in developing the Data Retriever is available in the project documentation.

Using the Command Line

After installing, run retriever update to download all of the available dataset scripts. To see the full list of command-line options and datasets, run retriever --help. The output will look like this:

usage: retriever [-h] [-v] [-q] {install,update,new,ls,citation,help} ...

positional arguments:
                        sub-command help
    install             download and install dataset
    update              download updated versions of scripts
    new                 create a new sample retriever script
    ls                  display a list all available dataset scripts
    citation            view citation

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -q, --quiet           suppress command-line output

To install datasets, use retriever install:

usage: retriever install [-h] [--compile] [--debug]
                         {mysql,postgres,sqlite,msaccess,csv} ...

positional arguments:
                        engine-specific help
    mysql               MySQL
    postgres            PostgreSQL
    sqlite              SQLite
    msaccess            Microsoft Access
    csv                 CSV

optional arguments:
  -h, --help            show this help message and exit
  --compile             force re-compile of script before downloading
  --debug               run in debug mode


These examples use the Breeding Bird Survey (BBS) dataset.

Using Install

  retriever install -h   (gives install options)

Using a specific database engine: retriever install {Engine}

  retriever install mysql -h     (gives install mysql options)
  retriever install mysql --user myuser --password ******** --host localhost --port 8888 --database_name testdbase BBS
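
If you want to drive the same install from a Python script, one option is to assemble the command and hand it to subprocess. The sketch below is an illustration under stated assumptions, not an API provided by the Data Retriever: build_install_command and run_install are hypothetical helpers, and only the options shown in the example above (--user, --password, --host, --port, --database_name) are assumed to exist. It requires Python 3.6 or later so that keyword arguments keep their order.

```python
import subprocess


def build_install_command(engine, dataset, **options):
    """Build the argument list for `retriever install <engine> ... <dataset>`.

    Each keyword option becomes a `--name value` pair, in the order given.
    Option names are passed through unchanged, so they must match the
    options the retriever command line actually accepts.
    """
    command = ["retriever", "install", engine]
    for name, value in options.items():
        command.extend(["--" + name, str(value)])
    command.append(dataset)
    return command


def run_install(engine, dataset, **options):
    """Run the install; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_install_command(engine, dataset, **options), check=True)
```

For example, build_install_command("mysql", "BBS", user="myuser", host="localhost", port=8888, database_name="testdbase") produces the same arguments as the command above, minus the password. Passing the argument list directly, rather than a shell string, avoids quoting problems with passwords and paths.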

To install data into a SQLite database named mydatabase.db, use:

  retriever install sqlite BBS -f mydatabase.db
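
Once the install finishes, you may want to confirm what was written to the database file. The following sketch uses only Python's standard sqlite3 module. It makes no assumption about the table names the Data Retriever creates and simply lists whatever tables exist. The helpers list_tables and count_rows are hypothetical names chosen for this example.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file, sorted by name.

    Note: sqlite3.connect creates an empty file if db_path does not exist.
    """
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


def count_rows(db_path, table):
    """Return the number of rows in one table."""
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"{}"'.format(table.replace('"', '""'))
    connection = sqlite3.connect(db_path)
    try:
        (count,) = connection.execute(
            "SELECT COUNT(*) FROM {}".format(quoted)
        ).fetchone()
    finally:
        connection.close()
    return count
```

Calling list_tables("mydatabase.db") after the command above should show the tables that were installed; an empty list suggests the install did not write to that file.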

Using download

  retriever download -h    (gives you help options)
  retriever download BBS
  retriever download BBS --path C:\Users\Documents

Using citation

  retriever citation   (citation of the retriever engine)
  retriever citation BBS   (citation of BBS data)


For more information see the Data Retriever website.


Development of this software was funded by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4563 to Ethan White and the National Science Foundation as part of a CAREER award to Ethan White.