# Using GN2's REST API

GN2 has a REST API through which users can fetch data and run several analyses.
This notebook includes examples of all currently available queries, along with their parameters.

First, we'll start with the necessary imports (urllib.request to make HTTP requests from Python and IPython.display to format the JSON output).

In [1]:
from urllib.request import urlopen
import json
import uuid
from IPython.display import display, HTML

We'll also use a function that makes the displayed JSON collapsible, just for easier readability (written by Tarang Shah - https://gist.github.com/t27).

In [1]:
class RenderJSON(object):
    def __init__(self, json_data, level_to_show=2):
        # Serialize dicts and lists to a JSON string; anything else is assumed to be a JSON string already
        if isinstance(json_data, (dict, list)):
            self.json_str = json.dumps(json_data)
        else:
            self.json_str = json_data
        self.uuid = str(uuid.uuid4())
        # Number of nesting levels expanded by default when the JSON is rendered
        self.level_to_show = level_to_show
        # This line is missing from most versions of this script across the web;
        # it is essential for this to work interleaved with print statements
        # ZS: I removed this because it causes duplicate output
        # self._ipython_display_()

    def _ipython_display_(self):
        display(HTML('<div id="{}" style="height: auto; width:100%;"></div>'.format(self.uuid)))
        display(HTML("""<script>
        require(["https://rawgit.com/caldwell/renderjson/master/renderjson.js"], function() {
          renderjson.set_show_to_level(%d)
          document.getElementById('%s').appendChild(renderjson(%s))
        });</script>
        """ % (self.level_to_show, self.uuid, self.json_str)))

## Fetching Group/Dataset lists and information

The following query fetches the list of species available in the GN database:

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/species")
RenderJSON(json.loads(response.read()))

You can also get the information for a single species.

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/species/mouse")
RenderJSON(json.loads(response.read()))
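Every query in this notebook follows the same fetch-and-decode pattern, so it can be wrapped in a small helper. The following is a minimal sketch and not part of GN2 itself: `API_BASE`, `api_url`, and `fetch_json` are hypothetical names introduced here, and the base URL is simply the one used in the cells above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: same host and API version as the queries above.
API_BASE = "http://gn2-zach.genenetwork.org/api/v_pre1"


def api_url(*parts, base=API_BASE):
    """Join path components onto the API base, percent-encoding each one."""
    return "/".join([base.rstrip("/")] + [quote(str(p), safe="") for p in parts])


def fetch_json(*parts, base=API_BASE, timeout=60):
    """Fetch an API path and decode the JSON body (performs a network request)."""
    with urlopen(api_url(*parts, base=base), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building a URL does not touch the network, so it is safe to inspect first.
print(api_url("species", "mouse"))
```

With this in place, the single-species query above could be written as `RenderJSON(fetch_json("species", "mouse"))`.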

Groups (for example RISets) can also be listed or selected individually in the same way.
To retrieve all groups across all species:

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/groups")
RenderJSON(json.loads(response.read()))

Or to retrieve all groups for mouse:

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/mouse/groups")
RenderJSON(json.loads(response.read()), level_to_show=1)

Individual group information can also be selected (with an optional field to limit the search to a species).
To get information on the HSNIH-Palmer rat group:

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/group/rat/HSNIH-Palmer")
RenderJSON(json.loads(response.read()))

The same can be done for datasets, but this query is limited to a single group.
To get a list of all datasets in the BXD group:

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/datasets/bxd")
RenderJSON(json.loads(response.read()))

To get information on a specific dataset, use its Short_Abbreviation (from the previous query) as follows:

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/dataset/bxd/HC_M2_0606_P")
RenderJSON(json.loads(response.read()))

Because of some quirks of the GN database, traits other than mRNA expression or genotype traits are treated differently.
In this case, the user supplies the group name and trait ID (how to fetch these will be described later), and the dataset
query returns information about the publication the trait comes from (such as the title, description, pubmed_id, etc.).

For example, to get publication information for BXD phenotype trait 10001:

In [1]:
response = urlopen("http://gn2.genenetwork.org/api/v_pre1/dataset/bxd/10001")
RenderJSON(json.loads(response.read()))

Trait information can also be selected by supplying the dataset abbreviation (for all traits) or the dataset abbreviation
and trait name (for individual traits). Trait information in this context includes things like the description, position, etc; a
separate query gets actual sample/strain data.

For example, to fetch information for all of the traits in the HC_M2_0606_P dataset (a BXD hippocampus dataset):

In [1]:
# Output limited to 50 results for the sake of demonstration
response = urlopen("http://gn2.genenetwork.org/api/v_pre1/traits/HC_M2_0606_P?limit_to=50")
RenderJSON(json.loads(response.read()))

Or, to fetch all the phenotype (non-mRNA expression/genotype) traits for a group, use the group name:

In [1]:
response = urlopen("http://gn2.genenetwork.org/api/v_pre1/traits/HXBBXH?limit_to=50")
RenderJSON(json.loads(response.read()))

Lastly, to view a single trait's information (name, gene symbol, description, location, etc.):

In [1]:
response = urlopen("http://gn2.genenetwork.org/api/v_pre1/trait/HC_M2_0606_P/1436869_at")
RenderJSON(json.loads(response.read()))
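Query parameters such as `limit_to` can be assembled with `urllib.parse.urlencode` instead of being written into the URL by hand. This is a minimal sketch: `traits_url` is a hypothetical helper introduced for illustration, and the base URL is the one used in the trait-list cells above.

```python
from urllib.parse import quote, urlencode

# Assumption: same host and API version as the trait queries above.
API_BASE = "http://gn2.genenetwork.org/api/v_pre1"


def traits_url(dataset_or_group, limit_to=None, base=API_BASE):
    """Build a trait-list URL for a dataset abbreviation or a group name.

    limit_to is optional; when given, it is appended as a query parameter.
    """
    url = "{}/traits/{}".format(base, quote(dataset_or_group, safe=""))
    if limit_to is not None:
        url += "?" + urlencode({"limit_to": int(limit_to)})
    return url


print(traits_url("HC_M2_0606_P", limit_to=50))
print(traits_url("HXBBXH"))
```

The resulting string can be passed straight to `urlopen`, exactly as in the cells above.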

## Downloading Sample Data and Genotypes

Genotypes can be downloaded in either the .geno format (which is basically like a CSV, with strains/samples as columns and markers as rows) or
in BIMBAM format. For several groups (where we received the genotypes directly as BIMBAM), the .geno file is a dummy file, but it should be obvious
when that is the case (all genotypes would be the same and markers would just be called "Marker1", etc.).

Below are a couple of queries that fetch genotypes in both of these formats (in each case they include some formatting to make the output, which would
normally be an exported file, more readable):

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/genotypes/BXD.geno?as_text&limit_to=10")
# Decoding the response is just to make the output in this notebook readable; this query would normally download a file
print(response.read().decode("utf-8"))

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/genotypes/LXS.bimbam?as_text&limit_to=10")
print(response.read().decode("utf-8"))
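Because the .geno text is tab-separated with markers as rows and samples as columns, it can be loaded into plain Python structures with the standard `csv` module. The sketch below rests on assumptions that should be checked against a real file: that lines starting with `#` or `@` are comments or metadata, and that the first remaining line is the column header. `parse_geno` is a hypothetical helper, and the sample text (including its column names) is made up for illustration, not real genotype data.

```python
import csv
import io


def parse_geno(text):
    """Parse .geno-style text into (header, list of per-marker dicts).

    Assumption: lines beginning with '#' or '@' are comments/metadata
    and the first remaining line holds the tab-separated column names.
    """
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith(("#", "@"))
    ]
    rows = list(csv.reader(io.StringIO("\n".join(data_lines)), delimiter="\t"))
    if not rows:
        return [], []
    header, records = rows[0], rows[1:]
    return header, [dict(zip(header, record)) for record in records]


# Hypothetical sample in the assumed layout (not real data).
sample = (
    "@name:EXAMPLE\n"
    "# a comment line\n"
    "Chr\tLocus\tcM\tMb\tS1\tS2\n"
    "1\tMarker1\t0.0\t3.0\tB\tD\n"
    "1\tMarker2\t1.5\t4.2\tD\tD\n"
)
header, markers = parse_geno(sample)
print(header)
print(markers[0]["Locus"], markers[0]["S1"])
```

In the notebook, `parse_geno` would be given the decoded response text from the BXD.geno query rather than the sample string.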

Sample data can also be downloaded for a specific dataset.
Currently data is downloaded as a CSV file with samples/strains as columns and traits as rows. JSON will be made available in the future.

The following would fetch sample data for an AIL mouse hippocampus dataset (limited to the first 10 traits in this case). As with the previous
queries, a file would normally be downloaded, so some formatting was added to make the output in this notebook a bit more readable.

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/sample_data/UCSD_AIL_HIP_RNA-Seq_0418.csv?limit_to=10")
print(response.read().decode("utf-8"))

And this would get phenotype sample data for the HSNIH-Palmer rat group:

In [1]:
response = urlopen("http://gn2-zach.genenetwork.org/api/v_pre1/sample_data/HSNIH-PalmerPublish.csv?limit_to=1")
print(response.read().decode("utf-8"))
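Since the sample data arrives as CSV with samples/strains as columns and traits as rows, it can be turned into a nested dictionary for further work. This is a sketch under stated assumptions: that the first row holds the sample names, that the first column holds the trait identifier, and that missing values appear as empty or non-numeric cells. `parse_sample_csv` and `to_float` are hypothetical helpers, and the sample text is invented for illustration.

```python
import csv
import io


def to_float(value):
    """Convert a cell to float, returning None for empty or non-numeric cells."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_sample_csv(text):
    """Map each trait ID to a {sample_name: value} dictionary.

    Assumption: first row = sample names, first column = trait ID.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return {}
    samples = header[1:]
    data = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        trait_id, values = row[0], row[1:]
        data[trait_id] = {s: to_float(v) for s, v in zip(samples, values)}
    return data


# Hypothetical sample in the assumed layout (not real data).
sample = "id,S1,S2\nT1,1.5,x\nT2,,3\n"
print(parse_sample_csv(sample))
```

In the notebook, the decoded response from either sample_data query would take the place of `sample`.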

## Running Analyses

### Correlations

Correlations can also be returned through the API. The correlation function takes the following parameters:
* trait_id (*required*) - ID for trait used for correlation
* db (*required*) - DB name for the trait above (this is the Short_Abbreviation listed when you query for datasets)
* target_db (*required*) - Target DB name to be correlated against
* type - sample (default) | tissue | literature
* method - pearson (default) | spearman
* return_count - Number of results to return (default = 500)
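The parameters listed above can be assembled into a query URL programmatically. The following is a minimal sketch: `correlation_url` is a hypothetical helper, the base URL is the one used by the correlation queries in this section, and the result count is sent as `return_count` because that is the name those queries use.

```python
from urllib.parse import urlencode

# Assumption: same host and API version as the correlation queries in this section.
API_BASE = "http://gn2.genenetwork.org/api/v_pre1"

VALID_TYPES = ("sample", "tissue", "literature")
VALID_METHODS = ("pearson", "spearman")


def correlation_url(trait_id, db, target_db, corr_type="sample",
                    method="pearson", return_count=500, base=API_BASE):
    """Build a correlation query URL, checking the enumerated options first."""
    if corr_type not in VALID_TYPES:
        raise ValueError("type must be one of {}".format(VALID_TYPES))
    if method not in VALID_METHODS:
        raise ValueError("method must be one of {}".format(VALID_METHODS))
    params = [
        ("trait_id", trait_id),
        ("db", db),
        ("target_db", target_db),
        ("method", method),
        ("type", corr_type),
        ("return_count", int(return_count)),
    ]
    return base + "/correlation?" + urlencode(params)


print(correlation_url("1427571_at", "HC_M2_0606_P", "BXDPublish",
                      corr_type="tissue", method="spearman", return_count=50))
```

Validating `type` and `method` locally catches typos before a query that may take a minute to run is sent.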

For example, to run a tissue correlation between trait 1427571_at in the BXD Hippocampus dataset (abbreviation
HC_M2_0606_P) and all BXD phenotype traits, using the Spearman rank method and returning 50 results, I'd use the following query:

In [1]:
# Note - takes a minute to run
response = urlopen("http://gn2.genenetwork.org/api/v_pre1/correlation?trait_id=1427571_at&db=HC_M2_0606_P&target_db=BXDPublish&method=spearman&type=tissue&return_count=50")
RenderJSON(json.loads(response.read()))

Or, to run a sample correlation between trait ENSRNOG00000006120 in the HSNIH Prelimbic Cortex dataset (abbreviation
HSNIH-Rat-PL-RSeq-0818) and all traits in the same dataset, using the Pearson method and returning 50 results, I'd use the following query:

In [1]:
# Note - takes a minute to run
response = urlopen("http://gn2.genenetwork.org/api/v_pre1/correlation?trait_id=ENSRNOG00000006120&db=HSNIH-Rat-PL-RSeq-0818&target_db=HSNIH-Rat-PL-RSeq-0818&method=pearson&type=sample&return_count=50")
RenderJSON(json.loads(response.read()))
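Since correlation queries can take a minute or more, it may help to set a generous timeout and to turn network failures into a single readable error. The sketch below is one way to do that, not a GN2 recommendation: `fetch_json_slow` is a hypothetical helper, the 300-second default is an arbitrary choice, and the `opener` parameter exists only so the function can be tried without a network connection.

```python
import io
import json
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_json_slow(url, timeout=300, opener=urlopen):
    """Fetch a slow query and decode its JSON body.

    HTTP error statuses and connection failures are re-raised as RuntimeError
    with the URL included in the message.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except HTTPError as err:  # must come first: HTTPError subclasses URLError
        raise RuntimeError("Server returned HTTP {} for {}".format(err.code, url)) from err
    except URLError as err:
        raise RuntimeError("Could not reach {}: {}".format(url, err.reason)) from err


# Offline demonstration with a stand-in opener (hypothetical payload, not real API output).
def fake_opener(url, timeout=None):
    return io.BytesIO(b'[{"example": 1}]')


print(fetch_json_slow("http://example.invalid/query", opener=fake_opener))
```

In the notebook, `fetch_json_slow` would be called with one of the correlation URLs above and the default `opener`.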