# Integrating third-party tools in *pyrpipe*
Executing any shell command with pyrpipe is easy and straightforward.

The `Runnable` class can be used to import any Unix command into Python in an object-oriented manner. It executes all commands via the `pyrpipe_engine` module, which provides helper functions to execute and log shell commands.
Users can also call the `execute_command()` function from `pyrpipe_engine` directly to run Unix commands.

**NOTE:** Inexperienced users must be careful when executing shell commands, as some commands are dangerous and can cause data loss or worse. The same applies when executing shell commands via `pyrpipe`. `pyrpipe` provides a `--dry-run` option that shows the shell commands without executing them. Use it before running any script shared by a source you don't trust, and read through the Python code as well. In `pyrpipe`, only the `get_shell_output()` function provides the `shell=True` option for subprocess. This comes in handy when executing commands that rely on environment variables, pipes, output redirection, etc., but it can also execute commands like `rm -r *`, which deletes everything in the current working directory.
`pyrpipe` has a `--safe-mode` flag that can disable `rm` commands.
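
The effect of `shell=True` can be seen with Python's standard `subprocess` module alone. The sketch below is independent of `pyrpipe`: the helper names `run_as_list` and `run_in_shell` are made up for illustration, and it assumes a Unix system where `echo` is available.

```python
import subprocess

def run_as_list(args):
    """Run a command given as a list of arguments; no shell is involved."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr

def run_in_shell(command):
    """Run a command string through the shell (shell=True)."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr

# Without a shell, ';' is just text handed to echo: one command runs.
print(run_as_list(["echo", "one; echo two"]))

# With a shell, ';' separates commands: two commands run.
print(run_in_shell("echo one; echo two"))
```

The same mechanism that makes pipes and redirection work is what lets a string such as `rm -r *` be expanded and executed, so command strings passed to a shell deserve extra scrutiny.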


## The Runnable class

To import a Unix command, create a `Runnable` object and specify the command name. The following example imports the [orfipy](https://github.com/urmi-21/orfipy) command into Python.

In [3]:
from pyrpipe.runnable import Runnable
orfipy=Runnable(command='orfipy')
#specify orfipy options; these can be specified into orfipy.yaml too
param={'--outdir':'orfipy_out','--procs':'3','--bed':'orfs.bed'}
infile='sample_data/test.fa'
orfipy.run(infile,**param)
#above commands create a file 'orfipy_out/orfs.bed'
#print 10 lines of output
with open('orfipy_out/orfs.bed') as f:
    nlines = [next(f) for x in range(10)]
print(nlines)

Start:21-01-01 13:43:15
$ orfipy --outdir orfipy_out --procs 3 --bed orfs.bed sample_data/test.fa


['CNT0043697\t39\t483\tID=CNT0043697_ORF.1;ORF_type=complete;ORF_len=444;ORF_frame=1;Start:ATG;Stop:TAA\t0\t+\n', 'CNT0043697\t549\t666\tID=CNT0043697_ORF.2;ORF_type=complete;ORF_len=117;ORF_frame=1;Start:TTG;Stop:TAG\t0\t+\n', 'CNT0043697\t64\t550\tID=CNT0043697_ORF.3;ORF_type=complete;ORF_len=486;ORF_frame=2;Start:TTG;Stop:TGA\t0\t+\n', 'CNT0043697\t32\t65\tID=CNT0043697_ORF.4;ORF_type=complete;ORF_len=33;ORF_frame=3;Start:ATG;Stop:TGA\t0\t+\n', 'CNT0043697\t71\t203\tID=CNT0043697_ORF.5;ORF_type=complete;ORF_len=132;ORF_frame=3;Start:TTG;Stop:TAA\t0\t+\n', 'CNT0043697\t560\t677\tID=CNT0043697_ORF.6;ORF_type=complete;ORF_len=117;ORF_frame=3;Start:CTG;Stop:TAA\t0\t+\n', 'CNT0043697\t642\t699\tID=CNT0043697_ORF.7;ORF_type=complete;ORF_len=57;ORF_frame=-1;Start:TTG;Stop:TGA\t0\t-\n', 'CNT0043697\t81\t627\tID=CNT0043697_ORF.8;ORF_type=complete;ORF_len=546;ORF_frame=-1;Start:ATG;Stop:TAA\t0\t-\n', 'CNT0043697\t32\t431\tID=CNT0043697_ORF.9;ORF_type=complete;ORF_len=399;ORF_frame=-2;Start:TT

End:21-01-01 13:43:15
Time taken:0:00:00


### Targets and dependencies
Required dependencies and expected target files can be specified in the `run()` method.
Replacing the call to `run()` with the following verifies the required files and the target files.
If the command is interrupted, pyrpipe will scan for `Locked` target files and resume from where the pipeline was interrupted.

In [4]:
orfipy.run(infile,requires=infile,target='orfipy_out/orfs.bed',**param)

Target files orfipy_out/orfs.bed already exist.


True
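
The idea behind `requires` and `target` can be illustrated in plain Python. The sketch below is not pyrpipe's implementation: `run_if_needed` is a hypothetical helper that only shows the two checks (inputs must exist, existing targets mean the step can be skipped) and leaves out the lock-file handling used for resuming.

```python
import os
import subprocess

def run_if_needed(cmd, requires=(), targets=()):
    """Hypothetical helper: run cmd only if its targets are not already present.

    Returns True if the targets already exist, or if the command succeeds and
    all targets exist afterwards. Raises FileNotFoundError if an input is missing.
    """
    missing = [path for path in requires if not os.path.exists(path)]
    if missing:
        raise FileNotFoundError(f"Required files not found: {missing}")

    # All targets are present: nothing to do.
    if targets and all(os.path.exists(path) for path in targets):
        return True

    proc = subprocess.run(cmd)
    return proc.returncode == 0 and all(os.path.exists(path) for path in targets)
```

Under these assumptions, a call such as `run_if_needed(['orfipy', 'sample_data/test.fa'], requires=['sample_data/test.fa'], targets=['orfipy_out/orfs.bed'])` would return `True` immediately on a second run because the target is already there.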

## Building APIs
One can extend the `Runnable` class to provide custom APIs to Unix tools. The RNA-Seq API provided by pyrpipe uses this framework. A small example is provided in the [tutorial](https://pyrpipe.readthedocs.io/en/latest/?badge=latest).

## The pyrpipe_engine module

The `pyrpipe_engine` module contains the functions needed to execute commands. Users can call these functions directly to run commands. All of these functions are decorated with the `dryable` decorator and are automatically turned off when pyrpipe scripts are run with the `--dry-run` option.

A list of these functions is provided here. For details, refer to the [API docs](https://pyrpipe.readthedocs.io/en/latest/?badge=latest).

| Function | Description |
| --- | --- |
| `execute_command` | Runs a command, logs the status, and returns the status (True or False) |
| `get_shell_output` | Runs a command and returns a tuple (returncode, stdout, stderr) |
| `get_return_status` | Runs a command and returns True if the command succeeded, False otherwise |
| `execute_commandRealtime` | Runs a command and prints its output in real time |
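
How a decorator can switch command execution off during a dry run is sketched below. This is not the `dryable` implementation from pyrpipe, only a minimal illustration of the pattern; `SETTINGS`, `skip_when_dry`, and `run` are hypothetical names.

```python
import functools
import subprocess

# Hypothetical switch; a real tool would set this from a command-line flag.
SETTINGS = {"dry_run": False}

def skip_when_dry(func):
    """Print the command instead of running it when dry_run is on."""
    @functools.wraps(func)
    def wrapper(cmd, *args, **kwargs):
        if SETTINGS["dry_run"]:
            print("$ " + " ".join(cmd))
            return True  # report success without executing anything
        return func(cmd, *args, **kwargs)
    return wrapper

@skip_when_dry
def run(cmd):
    """Run cmd (a list of strings) and return True if it exited with status 0."""
    return subprocess.run(cmd).returncode == 0
```

Because the check happens inside the wrapper at call time, every decorated function respects the switch without any change to its own body.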


### The execute_command() method

Executes a command, logs the details, and returns the status (True or False).

The following example executes a simple `ls -l` command. The command is not logged (`logs=False`) and the stdout is printed to the screen (`verbose=True`). See the API docs for [`execute_command()`](https://pyrpipe.readthedocs.io/en/latest/pyrpipe.html#pyrpipe.pyrpipe_engine.execute_command) for more information.



In [1]:
#Import necessary modules
from pyrpipe import pyrpipe_engine as pe

#run a shell command
pe.execute_command(['ls', '-l'],logs=False,verbose=True)

Reading configuration from pyrpipe_conf.yaml
Start:21-01-17 19:04:34
$ ls -l
STDOUT:
total 56
drwxr-xr-x  8 usingh usingh  4096 Jan  4 15:00 Athaliana_transcript_assembly
drwxr-xr-x  4 usingh usingh  4096 Jan 17 13:41 Covid_RNA-Seq
drwxr-xr-x  3 usingh usingh  4096 Jan 11 11:24 GTEx_processing
-rw-r--r--  1 usingh usingh 16955 Jan 17 19:00 Integrating third-party tools.ipynb
drwxr-xr-x 11 usingh usingh  4096 Jan  4 15:00 Maize_lncRNA_prediction
drwxr-xr-x  2 usingh usingh  4096 Jan  4 15:00 orfipy_out
-rw-r--r--  1 usingh usingh    12 Jan  4 15:00 pyrpipe_conf.yaml
drwxr-xr-x  2 usingh usingh  4096 Dec 31 17:02 pyrpipe_logs
drwxr-xr-x  3 usingh usingh  4096 Mar  7  2020 sample_data
drwxr-xr-x  3 usingh usingh  4096 Jan  5 12:24 Snakemake_example

End:21-01-17 19:04:34
Time taken:0:00:00


True

## Commands in a `string`
A command in `string` format can easily be converted to a list.

In [6]:
cmd="blastx -query sample_data/test.fa -db sample_data/pldb/mydb -qcov_hsp_perc 30 -num_threads 2 -out sample_data/blast_out"
cmdList=cmd.split()
pe.execute_command(cmdList,verbose=True,logs=False)

#head the output
pe.execute_command(['head','-20','sample_data/blast_out'],verbose=True,logs=False,objectid="",command_name="")

Start:21-01-01 13:44:00
$ blastx -query sample_data/test.fa -db sample_data/pldb/mydb -qcov_hsp_perc 30 -num_threads 2 -out sample_data/blast_out
End:21-01-01 13:44:04
Time taken:0:00:04
Start:21-01-01 13:44:04
$ head -20 sample_data/blast_out
STDOUT:
BLASTX 2.7.1+


Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.



Database: mydb
           250 sequences; 128,483 total letters



Query= CNT0043697

Length=699


End:21-01-01 13:44:04
Time taken:0:00:00


True
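
`str.split()` splits on every run of whitespace, so it breaks arguments that contain quoted spaces. Python's standard `shlex.split()` follows shell quoting rules and is a safer choice for such strings. The command and file name below are made up for illustration.

```python
import shlex

# Hypothetical command with a quoted argument that contains spaces.
cmd = 'grep -c "open reading frame" notes.txt'

naive = cmd.split()        # splits inside the quotes
quoted = shlex.split(cmd)  # keeps the quoted argument together

print(naive)   # ['grep', '-c', '"open', 'reading', 'frame"', 'notes.txt']
print(quoted)  # ['grep', '-c', 'open reading frame', 'notes.txt']
```

For commands like the `blastx` call above, which contain no quotes or escaped spaces, both approaches give the same list.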

## Commands in a `dict`
The `pyrpipe_utils` module contains the helper functions [`parse_unix_args()`](https://pyrpipe.readthedocs.io/en/latest/pyrpipe.html#pyrpipe.pyrpipe_utils.parse_unix_args) and [`parse_java_args()`](https://pyrpipe.readthedocs.io/en/latest/pyrpipe.html#pyrpipe.pyrpipe_utils.parse_java_args), which convert commands stored in a `dict` to a `list`. This is useful for reading commands or rules stored in **.json or .yaml** files and executing them with pyrpipe.

**Note: When using a `Runnable` object, this parsing is performed automatically to merge the command options with the command.**

In [7]:
from pyrpipe import pyrpipe_utils as pu
#run blast
"""NOTE: Python 3.7 and higher guarantee that a dict keeps the order in which elements are inserted
(CPython 3.6 already does so as an implementation detail).
To provide positional arguments, use "--" as the key followed by a tuple. For example,
{'-threads':'10','--':('file1','file2')} will be parsed as

-threads 10 file1 file2

"""

blast_parameters={'-query':'sample_data/test.fa',
                  '-db': 'sample_data/pldb/mydb',
                  '-qcov_hsp_perc': '30',
                  '-num_threads': '2',
                  '-out': 'sample_data/blast_out2'
}

blast_cmd=['blastx']

param_list=pu.parse_unix_args([],blast_parameters) 
#Note: the first argument, valid_args_list, can be provided to ignore invalid arguments

#add parameters
blast_cmd.extend(param_list)
pe.execute_command(blast_cmd,verbose=True,logs=False)

#head the output
pe.execute_command(['head','-20','sample_data/blast_out2'],verbose=True,logs=False)


Start:21-01-01 13:48:24
$ blastx -query sample_data/test.fa -db sample_data/pldb/mydb -qcov_hsp_perc 30 -num_threads 2 -out sample_data/blast_out2
End:21-01-01 13:48:27
Time taken:0:00:03
Start:21-01-01 13:48:27
$ head -20 sample_data/blast_out2
STDOUT:
BLASTX 2.7.1+


Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.



Database: mydb
           250 sequences; 128,483 total letters



Query= CNT0043697

Length=699


End:21-01-01 13:48:27
Time taken:0:00:00


True
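
To make the dict-to-list conversion concrete, here is a simplified stand-in written with the standard library only. It is not `parse_unix_args()` itself: `dict_to_args` is a hypothetical helper, the handling of the `"--"` key follows the note in the cell above, and treating `None` or an empty string as a bare flag is an assumption of this sketch. The options are read from a JSON string to mimic parameters stored in a `.json` file.

```python
import json

def dict_to_args(options):
    """Flatten an options dict into an argument list (illustrative helper)."""
    args = []
    for key, value in options.items():
        if key == "--":
            # Positional arguments are stored as a tuple or list under "--".
            args.extend(str(item) for item in value)
        else:
            args.append(key)
            # Assumption of this sketch: None or "" marks a flag without a value.
            if value not in (None, ""):
                args.append(str(value))
    return args

# Parameters as they might be stored in a .json file.
stored = '{"-query": "sample_data/test.fa", "-num_threads": 2}'
blast_cmd = ["blastx"] + dict_to_args(json.loads(stored))
print(blast_cmd)
```

The resulting list has the same shape as `blast_cmd` in the cell above and could be passed to `execute_command()` in the same way.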

## Getting stdout from command
The [`get_shell_output()`](https://pyrpipe.readthedocs.io/en/latest/pyrpipe.html#pyrpipe.pyrpipe_engine.get_shell_output) function returns a tuple containing returncode, stdout, and stderr. returncode is an integer specifying the return status of the command; stdout and stderr are strings.

In [11]:
result=pe.get_shell_output(['du', '-sh','sample_data/blast_out2'])
#result contains return code, stdout, stderr
print(result)

#check if command was successful
if result[0] == 0:
    #get the stdout as string
    print(result[1])
    

(0, '364K\tsample_data/blast_out2\n', '')
364K	sample_data/blast_out2



## Get realtime output from shell
The `execute_commandRealtime()` function yields the command's output as it is produced, so it can be printed to the screen in real time.

In [12]:
cmd=['ping','-c','4','google.com']

for output in pe.execute_commandRealtime(cmd):
    print(output)

PING google.com(ord08s13-in-x0e.1e100.net (2607:f8b0:4009:807::200e)) 56 data bytes

64 bytes from ord38s19-in-x0e.1e100.net (2607:f8b0:4009:807::200e): icmp_seq=1 ttl=119 time=15.1 ms

64 bytes from ord38s19-in-x0e.1e100.net (2607:f8b0:4009:807::200e): icmp_seq=2 ttl=119 time=15.1 ms

64 bytes from ord38s19-in-x0e.1e100.net (2607:f8b0:4009:807::200e): icmp_seq=3 ttl=119 time=15.1 ms

64 bytes from ord38s19-in-x0e.1e100.net (2607:f8b0:4009:807::200e): icmp_seq=4 ttl=119 time=15.1 ms



--- google.com ping statistics ---

4 packets transmitted, 4 received, 0% packet loss, time 3003ms

rtt min/avg/max/mdev = 15.125/15.156/15.189/0.091 ms
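
Line-by-line streaming of this kind can be built on `subprocess.Popen`. The generator below is a sketch using only the standard library, not the code behind `execute_commandRealtime()`; `stream_output` is a hypothetical name. Each yielded line keeps its trailing newline, so the example prints with `end=""`; the empty lines in the ping output above are consistent with printing such lines with a plain `print()`.

```python
import subprocess
import sys

def stream_output(cmd):
    """Yield the output lines of cmd as soon as they are produced."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True) as proc:
        for line in proc.stdout:
            yield line
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)

# A small child process stands in for a long-running command.
child = [sys.executable, "-c", "print('first'); print('second')"]
for line in stream_output(child):
    print(line, end="")
```

Merging stderr into stdout keeps the example short; a tool that needs to tell the two streams apart would read them separately.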

