# Check Scripts on Cluster

## 1. Configure AWS key pair

This cell contains only the values that you, the user, need to provide.

#### String Fields

**your_cluster_name**: This is the name given to your cluster when it was created using cfncluster. 

**private_key**: The path to your private key needed to access your cluster.

In [None]:
from pprint import pprint
import os
import sys
from cirrusngs.managers import ClusterSetupManager, ConnectionManager, AddonsManager

#name of the cluster
your_cluster_name = "clustername"

## The private key pair for accessing cluster.
private_key = "/path/to/your_aws_key.pem"

## Whether to delete the cfncluster after the job is done.
delete_cfncluster = False

print("variables set")
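
Before connecting, it can help to confirm that the private_key path points at a usable file. The sketch below is not part of cirrusngs; check_private_key is a hypothetical helper that uses only the standard library. It assumes a POSIX file system and relies on the fact that OpenSSH's ssh command refuses private keys that group or other users can access.

```python
import os
import stat


def check_private_key(path):
    """Return a list of human-readable problems found with the key file at `path`."""
    problems = []
    if not os.path.isfile(path):
        problems.append("no file found at {}".format(path))
        return problems
    if not path.endswith(".pem"):
        problems.append("file does not have a .pem extension")
    # Flag keys whose permission bits grant any access to group or other users.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("permissions {:o} are too open; consider chmod 400".format(mode))
    return problems


# Usage with the placeholder path from the cell above.
for problem in check_private_key("/path/to/your_aws_key.pem"):
    print("private_key:", problem)
```

An empty list means none of these basic checks failed; it does not prove that the key matches the cluster.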

## 2. Connect to Cluster

Run this cell before any of the following cells; it provides the connection to the cluster.

In [None]:
# Assumption: ClusterSetupManager (imported in the first cell) provides create_cfn_cluster.
master_ip_address = ClusterSetupManager.create_cfn_cluster(cluster_name=your_cluster_name)
ssh_client = ConnectionManager.connect_master(hostname=master_ip_address,
                                              username="ec2-user",
                                              private_key_file=private_key)

## 3. Get the Supported Pipelines

### Note: this cell must be run before any following cells

In [None]:
#This cell must be run before others in this section

scripts = AddonsManager.get_scripts_dict(ssh_client)
print()
print("Supported Pipelines:", AddonsManager.get_all_pipeline_names(scripts))

## 4. Choose Any of the Following Cells

Each of these cells provides a different kind of information about the cluster's scripts. Unless otherwise noted, feel free to skip around and run them in any order.

### Get the Supported Workflows in a Given Pipeline

This cell prints the workflows supported by a pipeline that you specify. The target_pipeline field can be set to a supported pipeline (see Get the Supported Pipelines) or to "all". When set to "all", the cell shows the workflows supported by every pipeline. Note that the target_pipeline field is case sensitive.

In [None]:
#can be set to a supported pipeline name or "all"
target_pipeline = "all"

print("Supported Workflows in {} Pipeline(s): ".format(target_pipeline), end="")
pprint(AddonsManager.get_workflows_in_pipeline(scripts, target_pipeline), indent=2)
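
The selection rule described above can be sketched without a cluster. The structure returned by AddonsManager.get_scripts_dict is not shown in this notebook, so the nested dict below (pipeline, then workflow, then script names) is an assumption, as are the RNASeq and kallisto names and the workflows_in_pipeline helper.

```python
# Hypothetical layout: pipeline name -> workflow name -> list of script names.
# The real structure returned by AddonsManager.get_scripts_dict may differ.
example_scripts = {
    "DNASeq": {"bwa_gatk": ["fastqc.sh", "bwa.sh"]},
    "RNASeq": {"kallisto": ["fastqc.sh", "k_align.sh"]},
}


def workflows_in_pipeline(scripts, target_pipeline):
    """Map each selected pipeline to a sorted list of its workflow names."""
    if target_pipeline == "all":
        selected = scripts
    else:
        # Plain dict lookup, so "dnaseq" does not match "DNASeq".
        if target_pipeline not in scripts:
            raise KeyError("unknown pipeline: {}".format(target_pipeline))
        selected = {target_pipeline: scripts[target_pipeline]}
    return {name: sorted(workflows) for name, workflows in selected.items()}


print(workflows_in_pipeline(example_scripts, "all"))
print(workflows_in_pipeline(example_scripts, "DNASeq"))
```

Because the lookup is an exact string match, a misspelled or differently cased pipeline name fails rather than silently returning nothing.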

### Get the Scripts Used by a Given Pipeline/Workflow

This cell prints the shell scripts used by a given pipeline or workflow. The target_pipeline field works as in the previous cell: it can be set to a supported pipeline or to "all". When it is set to "all", the cell prints the scripts used by every workflow in every pipeline and the target_workflow field is ignored. Otherwise, target_workflow can be set to a workflow supported by the chosen pipeline (see the previous cell) or to "all", in which case the cell prints the scripts used by all workflows within that pipeline.

In [None]:
#can be set to a supported pipeline or "all"
target_pipeline = "all"

#can be set to a supported workflow or "all"
#if target_pipeline == "all" then this variable is ignored
target_workflow = "all"

pprint(AddonsManager.get_scripts(scripts, target_pipeline, target_workflow))
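
The way the two targets interact can be sketched as follows. This is an illustration only: the nested dict layout, the other_workflow, RNASeq and kallisto names, and the select_scripts helper are all assumptions rather than the AddonsManager implementation.

```python
# Hypothetical layout (pipeline -> workflow -> scripts), for illustration only.
example_scripts = {
    "DNASeq": {"bwa_gatk": ["fastqc.sh", "bwa.sh"], "other_workflow": ["fastqc.sh"]},
    "RNASeq": {"kallisto": ["fastqc.sh", "k_align.sh"]},
}


def select_scripts(scripts, target_pipeline, target_workflow):
    """Return the sorted, de-duplicated scripts matching the two targets."""
    if target_pipeline == "all":
        # target_workflow is deliberately ignored in this branch.
        script_lists = [names for pipeline in scripts.values() for names in pipeline.values()]
    elif target_workflow == "all":
        script_lists = list(scripts[target_pipeline].values())
    else:
        script_lists = [scripts[target_pipeline][target_workflow]]
    return sorted({script for names in script_lists for script in names})


print(select_scripts(example_scripts, "all", "bwa_gatk"))
print(select_scripts(example_scripts, "DNASeq", "all"))
print(select_scripts(example_scripts, "DNASeq", "other_workflow"))
```

The first call shows the precedence rule: with target_pipeline set to "all", the workflow argument has no effect on the result.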

### Print a Script

This cell prints the entire contents of the specified shell script. The target_pipeline field must be set to a valid pipeline, the target_workflow field to a valid workflow in that pipeline, and the target_script field to a valid script in that workflow. For a script shared by multiple pipelines or workflows, provide any pipeline/workflow pair that contains it.

The output begins with a line framed by rows of "#" characters that indicates where the script can be called from, followed by an exact printout of the specified script.

In [None]:
#all targets must be set to a valid pipeline/workflow/script
#use cells above to check valid options

target_pipeline = "DNASeq"
target_workflow = "bwa_gatk"
target_script = "bwa.sh"

loc, file_cat = AddonsManager.cat_script(ssh_client, scripts, target_pipeline, target_workflow, target_script)

border = "#" * len(loc)
print("{0}\n{1}\n{0}".format(border, loc))
AddonsManager.show_script(file_cat)
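
The framing around the location line can be tried on its own. In this small sketch, banner is a hypothetical helper and the sample text is made up; the real location string comes from AddonsManager.cat_script and may be worded differently.

```python
def banner(text, fill="#"):
    """Frame `text` between two rows of `fill`, each as wide as the text."""
    border = fill * len(text)
    return "\n".join([border, text, border])


# Hypothetical location string, used only to show the framing.
print(banner("bwa.sh is called from bwa_gatk in DNASeq"))
```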

### Check Which Step Calls a Script

This cell shows which steps in the pipeline notebooks call a given shell script. The target_script field should be the name of a shell script that exists on the cluster, including the .sh file extension. The output lists where the shell script is called from. If, for example, fastqc.sh is called from the fastqc step in every pipeline, then the output will say only "fastqc in all Pipelines".

In [None]:
#should be the name of a shell script on the cluster (include the .sh extension)
target_script = "k_align.sh"

print(AddonsManager.get_steps_calling_script(ssh_client, scripts, target_script))
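
The collapsing of a shared step into a single "in all Pipelines" line can be sketched as a reverse lookup. The mapping below (pipeline, then step, then scripts) and the steps_calling_script helper are assumptions made for illustration; the real function reads this information from the cluster over ssh_client.

```python
# Hypothetical layout: pipeline -> step -> scripts that the step calls.
example_steps = {
    "DNASeq": {"fastqc": ["fastqc.sh"], "bwa": ["bwa.sh"]},
    "RNASeq": {"fastqc": ["fastqc.sh"], "align": ["k_align.sh"]},
}


def steps_calling_script(steps, target_script):
    """Describe which steps call `target_script`, merging steps shared by every pipeline."""
    callers = {}  # step name -> list of pipelines whose step calls the script
    for pipeline, pipeline_steps in steps.items():
        for step, called in pipeline_steps.items():
            if target_script in called:
                callers.setdefault(step, []).append(pipeline)
    lines = []
    for step, pipelines in sorted(callers.items()):
        if len(pipelines) == len(steps):
            lines.append("{} in all Pipelines".format(step))
        else:
            lines.append("{} in {}".format(step, ", ".join(sorted(pipelines))))
    return "\n".join(lines)


print(steps_calling_script(example_steps, "fastqc.sh"))
print(steps_calling_script(example_steps, "k_align.sh"))
```

A script that no step calls yields an empty string in this sketch.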