# NIAID DATA HUB: Mycobacteria drug resistance prediction
---
## Setup
---
We use the Gen3 SDK to query structured data and retrieve object data. After installing the gen3 package with pip and importing the classes and functions we need, we set an endpoint variable and an auth variable to initialize instances of the imported classes. The endpoint should be the URL of the commons you would like to interact with, and the refresh_file should point to the file containing your API key, which you can obtain by logging into the commons and going to the **Profile** page to create an API key.

In [None]:
# installing packages
!pip install --upgrade --force-reinstall gen3
!pip install flatten_json
!pip install pandas
!pip install requests
!pip install sh
from gen3.auth import Gen3Auth
from gen3.submission import Gen3Submission
from gen3.file import Gen3File
import subprocess
import pandas as pd
import nde_tb_function as nde

In [None]:
endpoint = "https://tb.niaiddata.org/"
auth = Gen3Auth(endpoint, refresh_file = "/home/jovyan/pd/credentials.json")
sub = Gen3Submission(endpoint, auth)
file = Gen3File(endpoint, auth)
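
Before building the `Gen3Auth` object it can help to confirm that the credentials file is readable. The following is a minimal sketch, not part of the Gen3 SDK: `load_credentials` is a hypothetical helper, and it assumes the file downloaded from the **Profile** page is a JSON object with `api_key` and `key_id` entries.

```python
import json
from pathlib import Path


def load_credentials(credentials_path):
    """Parse a credentials file and fail early if it looks unusable.

    Hypothetical helper; assumes the file holds "api_key" and "key_id".
    """
    path = Path(credentials_path)
    if not path.is_file():
        raise FileNotFoundError(f"No credentials file at {path}")
    creds = json.loads(path.read_text())
    missing = {"api_key", "key_id"} - set(creds)
    if missing:
        raise KeyError(f"Credentials file is missing: {sorted(missing)}")
    return creds


# Example call, using the path from the Setup cell:
# load_credentials("/home/jovyan/pd/credentials.json")
```

Running this check first gives a clearer error than a failed authentication request would.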

## Query
We will use the Gen3 Python SDK to run GraphQL queries on the NIAID Data Hub through the Gen3Submission class. Pass your query as a string to the Gen3Submission.query() function to receive the results.

In [None]:
object_dict = nde.query_file("TB-PATRIC",10,2,{"isoniazid_res_phenotype":"Resistant","amikacin_res_phenotype":"Resistant"})

In [None]:
df = nde.parse_json(object_dict,10)
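
The cells above go through helper functions in `nde_tb_function`. For reference, a raw GraphQL query string of the kind that `Gen3Submission.query()` accepts might be assembled as sketched below. The node name `subject` and the requested fields are assumptions for illustration only; the actual data model of the commons may differ, and `build_query` is a hypothetical helper.

```python
import json


def build_query(node, fields, project_id, first=10):
    """Build a GraphQL query string for one node type (hypothetical helper)."""
    field_block = "\n    ".join(fields)
    return (
        "{\n"
        f"  {node}(project_id: {json.dumps(project_id)}, first: {first}) {{\n"
        f"    {field_block}\n"
        "  }\n"
        "}"
    )


# "subject" and its fields are assumed names, not taken from the data hub.
query_txt = build_query("subject", ["submitter_id", "project_id"], "TB-PATRIC")
print(query_txt)

# With the `sub` object from the Setup cell, the query would be sent as:
# result = sub.query(query_txt)
```

Using `json.dumps` for the project ID keeps the string quoting valid if the value ever contains special characters.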

### Run Ariba for drug resistance prediction
We use reference data from CARD as an example. Ariba getref generates a reference FASTA file and a reference metadata file for drug resistance prediction. Users can supply a customized reference FASTA file and reference metadata file to improve prediction accuracy.

In [None]:
subprocess.run(["ariba","getref","card","/home/jovyan/pd/nb_output/tb/ariba/reference"])

After the reference FASTA and reference metadata files are available, Ariba prepareref generates gene clusters or variant clusters.

In [None]:
subprocess.run(["ariba","prepareref","-f","/home/jovyan/pd/nb_output/tb/ariba/reference.fa","-m","/home/jovyan/pd/nb_output/tb/ariba/reference.tsv","/home/jovyan/pd/nb_output/tb/ariba/prepareref.out"])
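
The `subprocess.run` calls above do not report a failed command. One option, sketched below, is a small wrapper that passes `check=True` so a non-zero exit status raises an exception instead of passing silently. `run_checked` and `prepareref_cmd` are hypothetical helpers; the `ariba prepareref` arguments mirror the cell above.

```python
import subprocess


def run_checked(cmd):
    """Run a command and raise CalledProcessError on a non-zero exit."""
    completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return completed.stdout


def prepareref_cmd(ref_prefix, out_dir):
    """Build the prepareref argument list from a getref output prefix."""
    return [
        "ariba", "prepareref",
        "-f", f"{ref_prefix}.fa",   # reference FASTA written by getref
        "-m", f"{ref_prefix}.tsv",  # reference metadata written by getref
        out_dir,
    ]


# Example call, using the paths from the cells above:
# run_checked(prepareref_cmd("/home/jovyan/pd/nb_output/tb/ariba/reference",
#                            "/home/jovyan/pd/nb_output/tb/ariba/prepareref.out"))
```

Stopping at the first failure avoids running later steps against a missing or incomplete reference directory.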

Ariba run maps the raw reads to the gene clusters/variant clusters and performs local assembly to identify those conferring drug resistance.

In [None]:
nde.runAriba(df)
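
The internals of `nde.runAriba` are not shown in this notebook. As a rough sketch of what a per-sample step could look like, the `ariba run` command takes the prepareref directory, the two paired read files, and an output directory. The sample IDs and the `_1.fastq.gz` / `_2.fastq.gz` file naming below are assumptions for illustration, as are both helper functions.

```python
def ariba_run_cmd(prepareref_dir, reads_1, reads_2, out_dir):
    """Build the argument list for one `ariba run` invocation."""
    return ["ariba", "run", prepareref_dir, reads_1, reads_2, out_dir]


def commands_for_samples(sample_ids, reads_dir, prepareref_dir, output_root):
    """Map each sample ID to its command (file naming is an assumption)."""
    return {
        sample: ariba_run_cmd(
            prepareref_dir,
            f"{reads_dir}/{sample}_1.fastq.gz",
            f"{reads_dir}/{sample}_2.fastq.gz",
            f"{output_root}/{sample}",
        )
        for sample in sample_ids
    }


# Made-up sample IDs and directories, for illustration only.
cmds = commands_for_samples(["SAMPLE_A", "SAMPLE_B"], "reads", "prepareref.out", "output")
for sample, cmd in cmds.items():
    print(sample, " ".join(cmd))
```

Each command list can then be passed to `subprocess.run`, one sample at a time.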

Ariba summary creates a summary matrix from the individual report files, giving an overview of gene cluster/variant cluster occurrence across all the samples tested.

In [None]:
nde.extract_ariba_predict("/home/jovyan/pd/nb_output/tb/ariba/output")
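
If the summary matrix is needed directly, the `ariba summary` command can be given an output prefix followed by the per-sample report files. The sketch below collects those reports, assuming each `ariba run` output directory sits under one root and contains a `report.tsv`. `ariba_summary_cmd` is a hypothetical helper and is not how `nde.extract_ariba_predict` is known to work.

```python
from pathlib import Path


def ariba_summary_cmd(output_root, summary_prefix):
    """Build an `ariba summary` command over every report under output_root."""
    # Assumes a layout of <output_root>/<sample>/report.tsv
    reports = sorted(str(p) for p in Path(output_root).glob("*/report.tsv"))
    if not reports:
        raise FileNotFoundError(f"No report.tsv files found under {output_root}")
    return ["ariba", "summary", summary_prefix, *reports]


# Example call, using the output directory from the cell above:
# ariba_summary_cmd("/home/jovyan/pd/nb_output/tb/ariba/output", "summary")
```

Sorting the report paths keeps the order of the command's inputs stable between runs.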

### Run Mykrobe for drug resistance prediction

In [None]:
nde.runMykrobe(df)

#### Extract Mykrobe resistance predictions

In [None]:
nde.extract_mykrobe_predict(df)
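
The implementation of `nde.extract_mykrobe_predict` is not shown here. As an illustration of the general idea, the sketch below flattens a Mykrobe-style JSON result into one row per sample and drug. It assumes the result is keyed by sample name, with a `susceptibility` object mapping each drug to a `predict` call; the example values are made up, and `mykrobe_calls` is a hypothetical helper.

```python
def mykrobe_calls(result):
    """Flatten a Mykrobe-style result dict into per-drug rows."""
    rows = []
    for sample, payload in result.items():
        for drug, call in payload.get("susceptibility", {}).items():
            rows.append(
                {"sample": sample, "drug": drug, "predict": call.get("predict")}
            )
    return rows


# Made-up example in the assumed shape, not real output.
example = {
    "sample_1": {
        "susceptibility": {
            "Isoniazid": {"predict": "R"},
            "Amikacin": {"predict": "S"},
        }
    }
}
rows = mykrobe_calls(example)
print(rows)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(rows)` for comparison with the phenotypes returned by the query.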

### Submission of Ariba and Mykrobe results to Sheepdog

In [None]:
data = nde.extract_ariba_predict("/home/jovyan/pd/nb_output/tb/ariba/output")
nde.submit_results(data,"Ariba")