Commit

Merge pull request #1189 from pranita-s/doc-refinement
RDataRetriever documentation cleaned and updated
ethanwhite committed Aug 20, 2018
2 parents 3190741 + eaf0d15 commit 774fd9d
Showing 3 changed files with 232 additions and 30 deletions.
20 changes: 12 additions & 8 deletions docs/conf.py
@@ -1,20 +1,23 @@
from __future__ import absolute_import
from __future__ import print_function
from imp import reload

import sys
import sphinx_rtd_theme
from builtins import str
from imp import reload

import sphinx_rtd_theme

from retriever.lib.defaults import ENCODING

encoding = ENCODING.lower()
from retriever.lib.defaults import VERSION, COPYRIGHT
from retriever.lib.scripts import SCRIPT_LIST
from retriever.lib.tools import open_fw

# sys removes the setdefaultencoding method at startup; reload to get it back
reload(sys)
if hasattr(sys, 'setdefaultencoding'):
    # set default encoding to latin-1 to decode source text
    sys.setdefaultencoding('latin-1')
    sys.setdefaultencoding(encoding)

# Create the .rst file for the available datasets
datasetfile = open_fw("datasets_list.rst")
@@ -28,6 +31,7 @@
script_list = SCRIPT_LIST()

# write the title of dataset rst file
# ref:http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html
datasetfile.write(datasetfile_title)

# get info from the scripts
@@ -38,13 +42,13 @@
reference_link = list(script.urls.values())[0].rpartition('/')[0]
else:
reference_link = " "
title = str(script_num) + ". **{}**\n".format(script.title)

title = str(script_num) + ". **{}**\n".format(script.title.strip())
datasetfile.write(title)
datasetfile.write("~" * len(title) + "\n\n")
datasetfile.write("-" * (len(title) - 1) + "\n\n")
datasetfile.write(":name: {}\n\n".format(script.name))
datasetfile.write(":reference: `{}`\n\n".format(reference_link))
datasetfile.write(":citation: {}\n\n".format(script.citation))
datasetfile.write(":citation: {}\n\n".format((script.citation)))
datasetfile.write(":description: {}\n\n".format(script.description))
datasetfile.close()
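
The heading logic in this hunk can be sketched in isolation. The snippet below is a minimal illustration, not the project's code: the helper name `entry_heading` is hypothetical, and it only reproduces the title and underline arithmetic visible in the diff. Because `title` ends with a newline, the underline uses `len(title) - 1` so that it is exactly as long as the visible title text.

```python
def entry_heading(script_num, script_title):
    """Return an RST heading (title line plus underline) for one dataset entry.

    Hypothetical helper for illustration; it mirrors the arithmetic in the diff.
    """
    # Strip stray whitespace so the underline matches the visible title text.
    title = str(script_num) + ". **{}**\n".format(script_title.strip())
    # title ends with "\n", so the underline is one character shorter than len(title).
    underline = "-" * (len(title) - 1)
    return title + underline + "\n\n"


if __name__ == "__main__":
    print(entry_heading(1, "  Iris flower dataset "), end="")
```

Without the `- 1`, the underline would be one character longer than the title, which Sphinx accepts; an underline that is too short, by contrast, triggers a warning.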

14 changes: 3 additions & 11 deletions docs/datasets_list.rst
@@ -1,11 +1,3 @@
===========
Script List
===========


The list of datasets is generated using conf.py.
The file can't be edited on GitHub because it is created in runtime.

Look at the python `conf`_. module

.. _conf: https://github.com/weecology/retriever/blob/master/docs/conf.py
==================
Datasets Available
==================
228 changes: 217 additions & 11 deletions docs/rdataretriever.rst
@@ -1,17 +1,17 @@
==================================
===============================
Using the Data Retriever from R
==================================
===============================

rdataretriever
~~~~~~~~~~~~~~
==============

The ``rdataretriever`` package provides an R interface to the `Data Retriever`_ so
that the Retriever's data handling can easily be integrated into R workflows.
that the ``retriever``'s data handling can easily be integrated into R workflows.

Installation
~~~~~~~~~~~~
============

To use the R package ``rdataretriever``, you first need to `install the Retriever <introduction.html#installing-binaries>`_.
To use the R package ``rdataretriever``, you first need to `install the retriever <introduction.html#installing-binaries>`_.

The ``rdataretriever`` can then be installed using
``install.packages("rdataretriever")``
@@ -25,18 +25,224 @@ To install the development version, use ``devtools``
install_github("ropensci/rdataretriever")

Note: The R package takes advantage of the Data Retriever's command line
interface, which must be available in the path. This should occur automatically
when following the installation instructions for the Retriever.
interface, which must be available in the path. This path is given to
``rdataretriever`` using the function ``use_RetrieverPath()``. The location of
``retriever`` depends on the Python installation (Python.exe, Anaconda, Miniconda),
the operating system, and whether virtual environments are in use. The following examples
illustrate this dependence and show how to find the path to ``retriever``.

Ubuntu OS with default Python:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If ``retriever`` is installed in the default Python, it can be located with
the ``which`` command in the terminal. For example:

::

  $ which retriever
  /home/<system_name>/.local/bin/retriever

The path to pass to the ``use_RetrieverPath()`` function is */home/<system_name>/.local/bin/*,
as shown below:

::

  library(rdataretriever)
  use_RetrieverPath("/home/<system_name>/.local/bin/")

The ``which`` command returns the full location of ``retriever``, including the name
of the program, but ``use_RetrieverPath()`` requires the directory that contains ``retriever``.
Therefore, remove the trailing ``retriever`` from the path before passing it to the function.
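
The path handling described above can also be sketched in Python, the language ``retriever`` itself is written in. This is a minimal sketch under stated assumptions: the helper name `retriever_directory` is hypothetical, it assumes a POSIX-style path, and `shutil.which` stands in for the terminal's ``which`` command.

```python
import posixpath
import shutil


def retriever_directory(executable_path):
    """Return the directory containing the executable, with a trailing slash.

    Hypothetical helper: it drops the program name from a POSIX path such as
    the one printed by `which retriever`.
    """
    directory = posixpath.dirname(executable_path)
    # Avoid a doubled slash when the executable sits directly under "/".
    return directory if directory.endswith("/") else directory + "/"


if __name__ == "__main__":
    # shutil.which plays the role of `which`; it returns None if nothing is found.
    found = shutil.which("retriever")
    if found is None:
        print("retriever was not found on the PATH")
    else:
        print(retriever_directory(found))
```

The printed directory is the value that would then be passed to ``use_RetrieverPath()`` in R.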

Ubuntu OS with Anaconda environment:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When ``retriever`` is installed in a virtual environment, the user can find its location only
when that environment is activated. To illustrate, assume the virtual environment is *py27*:

::

  $ conda activate py27
  (py27) $ which retriever
  /home/<system_name>/anaconda2/envs/py27/bin/retriever

This path can be used for ``rdataretriever`` after removing ``retriever`` from the end, as follows:

::

  library(rdataretriever)
  use_RetrieverPath("/home/<system_name>/anaconda2/envs/py27/bin/")

Note: ``rdataretriever`` will be able to locate ``retriever`` even if the virtual environment is
deactivated.

rdataretriever functions:
=========================

datasets()
^^^^^^^^^^
**Description** : The function returns a list of available datasets.

**Arguments** : No arguments needed.

**Example** :

::

  rdataretriever::datasets()

fetch()
^^^^^^^
**Description** : Each data file in the given dataset is downloaded to a temporary directory and then
imported as a data.frame; the data frames are returned as members of a named list.

**Arguments** :

- ``dataset`` (String): Name of dataset to be downloaded
- ``quiet`` (Bool): Controls whether warnings are displayed (TRUE/FALSE)
- ``data_name`` (String): Name assigned to dataset once it is downloaded

**Example** :

::

  rdataretriever::fetch(dataset = 'portal')

download()
^^^^^^^^^^
**Description** : Downloads datasets directly, without cleaning them, for cases where the user
has no specific preference for the format of the data or the kind of database.


**Arguments** :

- ``dataset`` (String): Name of the dataset to be downloaded.

- ``path`` (String): Specify dataset download path.

- ``quiet`` (Bool): Setting TRUE minimizes the console output.

- ``sub_dir`` (Bool): Setting TRUE keeps the subdirectories for archived files.

- ``debug`` (Bool): Setting TRUE helps in debugging in case of errors.

**Example** :

::

  rdataretriever::download("iris", "/Users/username/Desktop")

Installation functions
^^^^^^^^^^^^^^^^^^^^^^
Format specific installation
----------------------------
**Description** : ``rdataretriever`` supports installation of datasets in three file formats through different functions:

- csv (``install_csv``)
- json (``install_json``)
- xml (``install_xml``)

**Arguments** : These functions require the same arguments.

- ``dataset`` (String): Name of the dataset to install.

- ``table_name`` (String): Specify the table name to install.

- ``debug`` (Bool): Setting TRUE helps in debugging in case of errors.

- ``use_cache`` (Bool): Setting FALSE reinstalls scripts even if they are already installed.

**Example** :

::

  rdataretriever::install_csv("bird-size", table_name = "Bird_Size", debug = TRUE)

Database specific installation
------------------------------
**Description** : ``rdataretriever`` supports installation of datasets in four different databases through different functions:

- MySQL (``install_mysql``)
- PostgreSQL (``install_postgres``)
- SQLite (``install_sqlite``)
- MSAccess (``install_msaccess``)

**Arguments for PostgreSQL and MySQL** :

- ``database_name`` (String): Specify database name.

- ``debug`` (Bool): Setting TRUE helps in debugging in case of errors.

- ``host`` (String): Specify host name for database.

- ``password`` (String): Specify password for database.

- ``port`` (Int): Specify the port number for installation.

- ``quiet`` (Bool): Setting TRUE minimizes the console output.

- ``table_name`` (String): Specify the table name to install.

- ``use_cache`` (Bool): Setting FALSE reinstalls scripts even if they are already installed.

- ``user`` (String): Specify the username.

**Example** :

::

  rdataretriever::install_postgres(dataset = 'portal', user = 'postgres', password = 'abcdef')

**Arguments for MSAccess and SQLite** :

- ``file`` (String): Specify the file name for the database.

- ``table_name`` (String): Specify the table name to install.

- ``debug`` (Bool): Setting TRUE helps in debugging in case of errors.

- ``use_cache`` (Bool): Setting FALSE reinstalls scripts even if they are already installed.

**Example** :

::

  rdataretriever::install_sqlite(dataset = 'iris', file = 'sqlite.db', debug = FALSE, use_cache = TRUE)

get_updates()
^^^^^^^^^^^^^
**Description** : This function checks whether the retriever's scripts in your local directory
``~/.retriever/scripts/`` are up to date with the most recent official retriever release.

**Example** :

::

  rdataretriever::get_updates()

reset()
^^^^^^^
**Description** : The function resets the components of ``rdataretriever`` selected by ``scope`` (all, scripts, data, connection).

**Arguments** :

- ``scope`` : Specifies which components to reset. Options are 'scripts', 'data', 'connection' and
  'all', where 'all' is the default setting that resets all components.

**Example** :

::

  rdataretriever::reset(scope = 'data')


Examples
~~~~~~~~
========

::

  library(rdataretriever)
  # List the datasets available via the Retriever
  # List the datasets available via the retriever
  rdataretriever::datasets()
  # Install the Gentry forest transects dataset into csv files in your working directory
@@ -55,4 +261,4 @@ Examples
To get citation information for the ``rdataretriever`` in R use ``citation(package = 'rdataretriever')``:


.. _Data Retriever: http://data-retriever.org
.. _Data Retriever: http://data-retriever.org
