weecology/Retriever.jl

Retriever

The Julia wrapper for the Data Retriever software.

The Data Retriever automates the tasks of finding, downloading, and cleaning up publicly available data, and then stores them in a local database or as .csv files. Simply put, it's a package manager for data. This enables data analysts to devote the majority of their time to analysis rather than data cleaning or management.

Installation

Dependencies

  • Python 3.7 and up

  • Julia 1.5+ is recommended

  • PyCall

  • Pkg, Julia's package manager. You can add packages using the add command or the dev command.

     pkg> add "git@github.com:JuliaPy/PyCall.jl.git"
    

Retriever.jl depends on a few Julia packages that will be installed automatically.

Ensure that PyCall is using the same Python path where the retriever Python package is installed.

You can change that path to a desired path as below.

julia> ENV["PYTHON"] = "Python path where the retriever Python package is installed"
# Rebuild PyCall so that it uses the new path
julia> using Pkg
julia> Pkg.build("PyCall")
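
If you do not know the interpreter path offhand, the step above can be sketched as follows. This is a minimal sketch that assumes an executable named python3 is on your PATH and that it is the interpreter where the retriever package is (or will be) installed; both are assumptions about your system, not something Retriever.jl guarantees.

```julia
using Pkg

# Assumption: the desired interpreter is reachable as `python3` on PATH.
# Sys.which returns the full path as a String, or `nothing` if not found.
python_path = Sys.which("python3")

if python_path === nothing
    @warn "python3 was not found on PATH; set ENV[\"PYTHON\"] manually."
else
    # Point PyCall at this interpreter and rebuild so the change takes effect.
    ENV["PYTHON"] = python_path
    Pkg.build("PyCall")
end
```

Restart Julia after the build so that PyCall loads with the new interpreter.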

Install the core Python retriever package. If your Python path is set, you can run pip install retriever, or use PyCall to install the package.

julia> using PyCall
# From release
julia> packages = "retriever"
julia> run(`$(PyCall.pyprogramname) -m pip install --user -- $packages`)
# Or from the current dev branch
julia> run(`$(PyCall.pyprogramname) -m pip install --user -- git+https://git@github.com/weecology/retriever.git`)
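
To confirm that the install landed in the interpreter PyCall actually uses, a quick check along these lines may help. It is a sketch, not part of Retriever.jl: the variable name retriever_available is introduced here for illustration, and the check only tests that the Python module named retriever can be imported.

```julia
using PyCall

# Show which Python interpreter and version PyCall was built against.
println("PyCall is using: ", PyCall.pyprogramname)
println("Python version:  ", PyCall.pyversion)

# Try to import the Python retriever package through PyCall.
# `retriever_available` is an illustrative name, not a Retriever.jl API.
retriever_available = try
    pyimport("retriever")
    true
catch err
    @warn "Could not import the Python retriever package" exception = err
    false
end

println("retriever importable: ", retriever_available)
```

If the import fails, the package was probably installed for a different interpreter than the one printed above.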

Install the Retriever Julia package.

julia> Pkg.add("Retriever")

Install from a local source

Download or check out the source from the GitHub page.

Go to the Retriever.jl directory and run Julia.

julia> include("src/Retriever.jl")

Or use the Pkg REPL

pkg> add PyCall
pkg> activate .
julia> using Retriever

Database Management Systems

Depending on the database management system you wish to use, follow the Setting up servers documentation of the retriever. You can change the credentials to suit your server settings.

Example of installing datasets

# Using the default arguments
julia> Retriever.install_postgres("iris")
# Passing user-specific arguments
julia> Retriever.install_postgres("iris", user="postgres",
		password="Password12!", host="localhost", port=5432)
julia> Retriever.install_csv("iris")
julia> Retriever.install_mysql("iris")
julia> Retriever.install_sqlite("iris")
julia> Retriever.install_msaccess("iris")
julia> Retriever.install_json("iris")
julia> Retriever.install_xml("iris")

Creating docs

To create docs, first refer to the Documenter docs. To test the docs locally, run make.jl:

julia --color=yes make.jl

or simply

julia make.jl

Using Docker

To run tests using Docker:

docker-compose run --service-ports retrieverj julia test/runtests.jl

To run the image interactively:

docker-compose run --service-ports retrieverj /bin/bash

To test docs in Docker:

docker-compose run --service-ports retrieverj bash -c "cd docs && julia make.jl"

Acknowledgments

Development of this software is funded by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4563 to Ethan White and started as Shivam Negi's Google Summer of Code project.