# Acquire files in a specific format

If you need to download a file and apply some standard processing steps to it, you can do so with the `Acquire` object.

The `Acquire` object handles both the acquisition of files and their preprocessing.
Currently supported functions are:

 * Acquisition:
   * curl
   * wget
   * lftp
   * local
   * touch
   * merge
  
 * Processing:
   * Compression
     * unzip
     * gunzip
     * bunzip
     * untar
     * gzip
     * bzip
     * bgzip
   
   * Commands:
     * cat
     * ls
     * call
     * cmd
    
   * Processing
     * sort
     * tabix
    
 * Renaming
   * finalize
   
The usage of the `Acquire` object always starts with an acquisition command, followed by some processing commands, followed by the `finalize` command.

    biu.utils.Acquire().curl(url).unzip().finalize(finalLocation)

The `Acquire` object is lazily evaluated: acquisition and processing are only performed when the `acquire` command is called on it.

    biu.utils.Acquire().curl(url).unzip().finalize(finalLocation).acquire()
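
To make the idea of lazy evaluation concrete, here is a minimal sketch of a chainable pipeline that only records its steps and runs them when `acquire` is called. The `LazyPipeline` class and its `then` method are hypothetical names invented for this illustration; this is not BIU's implementation, only a toy model of the pattern.

```python
class LazyPipeline:
    """Toy chainable pipeline (hypothetical, not BIU code).

    Steps are recorded when chained and executed only in acquire().
    """

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def then(self, name, func):
        # Return a new pipeline with one more recorded step; nothing runs here.
        return LazyPipeline(self.steps + ((name, func),))

    def acquire(self, value, log=None):
        # Execute the recorded steps in order, feeding each result to the next.
        for name, func in self.steps:
            if log is not None:
                log.append(name)
            value = func(value)
        return value


log = []
pipeline = LazyPipeline().then("strip", str.strip).then("upper", str.upper)
steps_run_before = list(log)  # still empty: only a description exists so far
result = pipeline.acquire("  acgt\n", log)

print(steps_run_before)  # []
print(log)               # ['strip', 'upper']
print(result)            # ACGT
```

Building the chain is cheap and has no side effects, which is why a pipeline can be printed or passed around before anything is downloaded.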

In [1]:
import biu

## The `Acquire` object
The `Acquire` object lets you chain commands one after another. These commands are listed above and exemplified below. To construct one, simply instantiate the `biu.utils.Acquire` class. Pass `redo=True` to redo every step in the pipeline you create, and use the `where` argument to specify where the files should be downloaded to.

In [3]:
myAcquire = biu.utils.Acquire(redo=True)
print(myAcquire)

Acquire object.
 Re-do steps: yes
 Current steps:
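
The text does not spell out what happens when `redo` is left at its default. A common convention, assumed here purely for illustration, is that a step is skipped when its output file already exists and is re-run only when a redo is requested. The `run_step` helper below is hypothetical and uses only the standard library.

```python
import os
import tempfile


def run_step(output_path, produce, redo=False):
    """Run `produce` only if the output is missing or a redo is requested.

    Returns True if the step actually ran (assumed caching rule, not BIU code).
    """
    if os.path.exists(output_path) and not redo:
        return False
    with open(output_path, "w") as handle:
        handle.write(produce())
    return True


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "step.out")
    first = run_step(target, lambda: "data\n")              # output missing: runs
    second = run_step(target, lambda: "data\n")             # output present: skipped
    forced = run_step(target, lambda: "data\n", redo=True)  # redo: runs again

print(first, second, forced)  # True False True
```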



## Acquisition

### curl

In [4]:
ao = biu.utils.Acquire().curl("http://zzz.bwh.harvard.edu/plink/dist/plink-1.07-i686.zip")\
                        .unzip("plink-1.07-i686/test.ped")\
                        .call("cat")

print(ao)

ao.acquire()

Acquire object.
 Re-do steps: no
 Current steps:
  * curl
  * unzip
  * call

/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c
0
0
1 1 0 0 1  1  A A  G T
2 1 0 0 1  1  A C  T G
3 1 0 0 1  1  C C  G G
4 1 0 0 1  2  A C  T T
5 1 0 0 1  2  C C  G T
6 1 0 0 1  2  C C  T T

0


D: cat '/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c.unzipped/plink-1.07-i686/test.ped'


'/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c.unzipped/plink-1.07-i686/test.ped'

### wget

In [5]:
ao = biu.utils.Acquire().wget("http://zzz.bwh.harvard.edu/plink/dist/plink-1.07-i686.zip")\
                        .call('unzip -l %s | head').acquire()

D: unzip -l /home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c | head


0
Archive:  /home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2011-06-21 03:20   plink-1.07-i686/
  4589621  2011-06-21 03:20   plink-1.07-i686/plink
  1799865  2011-06-21 03:20   plink-1.07-i686/gPLINK.jar
      138  2007-07-27 16:51   plink-1.07-i686/test.ped
       23  2007-07-27 16:51   plink-1.07-i686/test.map
     1287  2007-07-27 16:51   plink-1.07-i686/README.txt
    15365  2007-07-27 16:51   plink-1.07-i686/COPYING.txt

0


### lftp

In [6]:
biu.utils.Acquire().lftp("sftp://sftp-cancer.sanger.ac.uk",
                         "cosmic/grch38/cosmic/v84/VCF/CosmicCodingMuts.vcf.gz",
                         username="t.gehrmann@lumc.nl", password="Cosmic_password1").gunzip().acquire()

/home/tgehrmann/repos/BIU/docs/_downloads/a5645c178583747883873165eb8b77de6c536924
0
0


'/home/tgehrmann/repos/BIU/docs/_downloads/a5645c178583747883873165eb8b77de6c536924.gunzipped'

### local

There are two ways to use a local file; one is essentially a shortcut for the other.

In [7]:
biu.utils.Acquire().local('/etc/group').call("head -n3").acquire()

0
root:x:0:
daemon:x:1:
bin:x:2:

0


D: head -n3 '/etc/group'


'/etc/group'

Or you can pass the path directly as an argument to the `Acquire` constructor:

In [8]:
biu.utils.Acquire('/etc/group').call("head -n3").acquire()

root:x:0:
daemon:x:1:
bin:x:2:

0


D: head -n3 '/etc/group'


'/etc/group'

### touch

You can also create a file (if you simply need an empty one).

In [9]:
biu.utils.Acquire().touch().acquire()

0


'/home/tgehrmann/repos/BIU/docs/_downloads/touchedFile.ef25e3fd76a375aef44c9f0f79a21f0c5ab188a8'

### merge

You can merge the results of multiple `Acquire` pipelines into one file, for example using `cat`.

Currently available methods:
  * cat
  * zcat

In [3]:
ao1 = biu.utils.Acquire().curl("http://zzz.bwh.harvard.edu/plink/dist/plink-1.07-i686.zip")\
                        .unzip("plink-1.07-i686/test.ped")
    
ao2 = biu.utils.Acquire().curl("http://zzz.bwh.harvard.edu/plink/dist/plink-1.07-i686.zip")\
                        .unzip("plink-1.07-i686/test.map")
    
merged = biu.utils.Acquire().merge([ao1, ao2], method='cat').call("wc -l").acquire()

/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c
0
0
/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c
0
0
0
9 /home/tgehrmann/repos/BIU/docs/_downloads/01e583e08705ac5d5f859c48ab3b993539e858de.cat

0


D: cat '/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c.unzipped/plink-1.07-i686/test.ped' '/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c.unzipped/plink-1.07-i686/test.map' > '/home/tgehrmann/repos/BIU/docs/_downloads/01e583e08705ac5d5f859c48ab3b993539e858de.cat'
D: wc -l '/home/tgehrmann/repos/BIU/docs/_downloads/01e583e08705ac5d5f859c48ab3b993539e858de.cat'
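
As a rough picture of what the two merge methods do, the sketch below concatenates files in plain Python: `cat` copies the inputs as they are, while `zcat` decompresses gzip inputs while concatenating. The `merge` function here is a hypothetical standalone helper written for this illustration, not the BIU method.

```python
import gzip
import os
import tempfile


def merge(paths, output_path, method="cat"):
    """Concatenate `paths` into `output_path` (illustrative helper, not BIU code)."""
    # 'zcat' reads each input through gzip; 'cat' reads the raw bytes.
    opener = gzip.open if method == "zcat" else open
    with open(output_path, "wb") as out:
        for path in paths:
            with opener(path, "rb") as handle:
                out.write(handle.read())
    return output_path


with tempfile.TemporaryDirectory() as workdir:
    plain = [os.path.join(workdir, name) for name in ("a.txt", "b.txt")]
    zipped = [path + ".gz" for path in plain]
    contents = [b"line 1\nline 2\n", b"line 3\n"]
    for path, gz_path, data in zip(plain, zipped, contents):
        with open(path, "wb") as handle:
            handle.write(data)
        with gzip.open(gz_path, "wb") as handle:
            handle.write(data)

    with open(merge(plain, os.path.join(workdir, "merged.cat")), "rb") as handle:
        cat_result = handle.read()
    with open(merge(zipped, os.path.join(workdir, "merged.zcat"), "zcat"), "rb") as handle:
        zcat_result = handle.read()

print(cat_result.count(b"\n"))    # 3
print(cat_result == zcat_result)  # True
```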


## Processing

### Compression

#### unzip

Unzip a zip file. You can optionally name a specific file from the extracted directory to use for further processing (otherwise a link to the directory is maintained).

In [10]:
biu.utils.Acquire().curl("http://zzz.bwh.harvard.edu/plink/dist/plink-1.07-i686.zip")\
                   .unzip("plink-1.07-i686/test.ped")\
                   .call("cat").acquire()

/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c
0
0
1 1 0 0 1  1  A A  G T
2 1 0 0 1  1  A C  T G
3 1 0 0 1  1  C C  G G
4 1 0 0 1  2  A C  T T
5 1 0 0 1  2  C C  G T
6 1 0 0 1  2  C C  T T

0


D: cat '/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c.unzipped/plink-1.07-i686/test.ped'


'/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c.unzipped/plink-1.07-i686/test.ped'

In [11]:
biu.utils.Acquire().curl("http://zzz.bwh.harvard.edu/plink/dist/plink-1.07-i686.zip")\
                   .unzip()\
                   .call("ls").acquire()

/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c
0
0
plink-1.07-i686

0


D: ls '/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c.unzipped'


'/home/tgehrmann/repos/BIU/docs/_downloads/8f52ad05b4c2ba036683cfedcdedb328eb8c837c.unzipped'
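
The difference between the two calls above (a member path versus no argument) can be mimicked with the standard library. This is a sketch under assumptions: the standalone `unzip` function and the `.unzipped` suffix are modelled on the paths printed in the outputs, not taken from BIU's source, and the archive contents are made up.

```python
import os
import tempfile
import zipfile


def unzip(archive_path, member=None):
    """Extract next to the archive; return the member path, or the directory."""
    out_dir = archive_path + ".unzipped"  # suffix assumed from the outputs above
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(out_dir)
    return os.path.join(out_dir, member) if member else out_dir


with tempfile.TemporaryDirectory() as workdir:
    # Build a small stand-in archive with made-up contents.
    archive_path = os.path.join(workdir, "demo.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("demo/test.ped", "1 1 0 0 1 1 A A G T\n")
        archive.writestr("demo/test.map", "1 snp1 0 1\n")

    directory = unzip(archive_path)               # no member: the directory
    single = unzip(archive_path, "demo/test.ped")  # member: one specific file
    listing = sorted(os.listdir(os.path.join(directory, "demo")))
    with open(single) as handle:
        first_line = handle.readline().strip()

print(listing)     # ['test.map', 'test.ped']
print(first_line)  # 1 1 0 0 1 1 A A G T
```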

#### gunzip

gunzip a file.

In [12]:
biu.utils.Acquire().curl("http://geneontology.org/gene-associations/goa_human.gaf.gz")\
                   .gunzip()\
                   .call("head -n3").acquire()


/home/tgehrmann/repos/BIU/docs/_downloads/67def37e1c6b755c0cbfb09da6b74203e9192838
0
0


D: head -n3 '/home/tgehrmann/repos/BIU/docs/_downloads/67def37e1c6b755c0cbfb09da6b74203e9192838.gunzipped'


!gaf-version: 2.1
!
!Generated by GO Central

0


'/home/tgehrmann/repos/BIU/docs/_downloads/67def37e1c6b755c0cbfb09da6b74203e9192838.gunzipped'

#### untar

Untar a file. You can optionally name a specific file from the extracted directory to use for further processing (otherwise a link to the directory is maintained).

In [13]:
biu.utils.Acquire().curl("https://github.com/thiesgehrmann/proteny/archive/0.1.tar.gz")\
                   .gunzip()\
                   .untar()\
                   .call("ls").acquire()

/home/tgehrmann/repos/BIU/docs/_downloads/25f853a528bed894cebd73b8540c05476fa32639
0
0
0
proteny-0.1
proteny-0.1.__exists__

0


D: ls '/home/tgehrmann/repos/BIU/docs/_downloads/25f853a528bed894cebd73b8540c05476fa32639.gunzipped.untar'


'/home/tgehrmann/repos/BIU/docs/_downloads/25f853a528bed894cebd73b8540c05476fa32639.gunzipped.untar'

In [14]:
biu.utils.Acquire().curl("https://github.com/thiesgehrmann/proteny/archive/0.1.tar.gz")\
                   .gunzip()\
                   .untar("proteny-0.1/Snakefile")\
                   .call("head -n5").acquire()

/home/tgehrmann/repos/BIU/docs/_downloads/25f853a528bed894cebd73b8540c05476fa32639
0
0
0


D: head -n5 '/home/tgehrmann/repos/BIU/docs/_downloads/25f853a528bed894cebd73b8540c05476fa32639.gunzipped.untar/proteny-0.1/Snakefile'


import inspect, os
__INSTALL_DIR__ = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
__PC_DIR__ = "%s/pipeline_components" % __INSTALL_DIR__

###############################################################################

0


'/home/tgehrmann/repos/BIU/docs/_downloads/25f853a528bed894cebd73b8540c05476fa32639.gunzipped.untar/proteny-0.1/Snakefile'

#### gzip
gzip a file.

In [15]:
biu.utils.Acquire().curl("https://github.com/thiesgehrmann/proteny/archive/0.1.tar.gz")\
                   .gunzip()\
                   .untar("proteny-0.1/Snakefile")\
                   .gzip().acquire()

/home/tgehrmann/repos/BIU/docs/_downloads/25f853a528bed894cebd73b8540c05476fa32639
0
0
0
0


'/home/tgehrmann/repos/BIU/docs/_downloads/25f853a528bed894cebd73b8540c05476fa32639.gunzipped.untar/proteny-0.1/Snakefile.gz'

#### bgzip

bgzip a file.

In [16]:
biu.utils.Acquire().curl("https://github.com/thiesgehrmann/proteny/archive/0.1.tar.gz")\
                   .gunzip()\
                   .untar("proteny-0.1/Snakefile")\
                   .bgzip().acquire()

/home/tgehrmann/repos/BIU/docs/_downloads/25f853a528bed894cebd73b8540c05476fa32639
0
0
0
0


'/home/tgehrmann/repos/BIU/docs/_downloads/25f853a528bed894cebd73b8540c05476fa32639.gunzipped.untar/proteny-0.1/Snakefile.bgz'

### Commands

#### cat

#### ls

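#### call

Run a shell command on the current file. In the examples above, the file path is appended to the command (`call("head -n3")`) or substituted for `%s` (`call('unzip -l %s | head')`), and the file passed to the next step stays the same.

#### cmd

Run a shell command on the current file; its output appears to be captured as a new file. In the tabix example below, the `awk` filter yields a file with a `.cmd` suffix that is then passed to `bgzip`.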

### Processing

#### sort
Sort a file. By default no parameters are passed, but you can provide options for the `sort` command to control how the file is sorted.

In [17]:
biu.utils.Acquire().curl("ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz")\
                   .gunzip()\
                   .sort("-t $'\\t' -k19,19V -k 20,21n")\
                   .call("head")\
                   .acquire()

/home/tgehrmann/repos/BIU/docs/_downloads/95e9311620201ce991f6fe93afc95a5471c406ba
0
0
0
72684	copy number gain	GRCh38/hg38 1q21.1-21.2(chr1:143965076-149471555)x3	-1	subset of 84 genes:GJA5;GJA8	-	Pathogenic	1	Aug 12, 2011	-1	nsv530354	RCV000051832	na	See cases	not provided	not provided	GRCh37	NW_003871056.3	1	1	1676126	na	na	-	criteria provided, single submitter	1		N	dbVar:nssv578530,dbVar:nsv530354	2	58089
380521	copy number gain	NCBI36/hg18 1p36.33(chr1:4737-338603)x3	-1	-	-	Uncertain significance	0	-	-1	nsv2768335	RCV000453780	na	See cases	not provided	not provided	NCBI36	NC_000001.9	1	4737	358524	na	na	1p36.33	no assertion criteria provided	1		N	dbVar:nssv13638640,dbVar:nsv2768335	2	393629
385893	copy number loss	NCBI36/hg18 1p36.33-36.23(chr1:4737-7424898)x1	-1	-	-	Pathogenic	1	-	-1	nsv2771294	RCV000450793	na	See cases	not provided	not provided	NCBI36	NC_000001.9	1	4737	7449889	na	na	1p36.33-36.23	no assertion criteria provided	1		N	dbVar:nssv13638713,dbVar:nsv2771294	2	398920
3

D: head '/home/tgehrmann/repos/BIU/docs/_downloads/95e9311620201ce991f6fe93afc95a5471c406ba.gunzipped.sorted'


'/home/tgehrmann/repos/BIU/docs/_downloads/95e9311620201ce991f6fe93afc95a5471c406ba.gunzipped.sorted'
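
The sort options used above order rows by chromosome (column 19, version-style so that `2` comes before `10`) and then numerically by start position (column 20). The following sketch reproduces that ordering in Python on made-up rows; `natural_key` is a hypothetical helper and only approximates the `V` key modifier for simple chromosome names.

```python
import re


def natural_key(text):
    """Split text into digit and non-digit runs so that '2' sorts before '10'."""
    parts = re.findall(r"\d+|\D+", text)
    # Tag each run so numbers and text never get compared with each other.
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]


# (chromosome, start, label): made-up rows, not ClinVar data.
rows = [
    ("10", 500, "c"),
    ("2", 900, "b"),
    ("X", 100, "e"),
    ("2", 30, "a"),
    ("10", 4000, "d"),
]

ordered = sorted(rows, key=lambda row: (natural_key(row[0]), row[1]))
print([row[2] for row in ordered])  # ['a', 'b', 'c', 'd', 'e']
```

A plain lexicographic sort would place chromosome `10` before `2`, which is why a version-style key is used for the chromosome column.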

#### tabix
Use tabix to generate an index for a file.

In [18]:
biu.utils.Acquire().curl("ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz")\
                   .gunzip()\
                   .sort("-t $'\\t' -k19,19V -k 20,21n")\
                   .cmd("awk -F $'\\t' 'BEGIN {OFS = FS} { if($19 != \"na\"){ print $0}}'")\
                   .bgzip()\
                   .tabix(seq=19, start=20, end=21)\
                   .acquire()

/home/tgehrmann/repos/BIU/docs/_downloads/95e9311620201ce991f6fe93afc95a5471c406ba
0
0
0
0
0
0


'/home/tgehrmann/repos/BIU/docs/_downloads/95e9311620201ce991f6fe93afc95a5471c406ba.gunzipped.sorted.cmd.bgz.tbi'

## Finalize

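## Constructing multiple processes

Each command returns a new `Acquire` object and leaves the one it was called on unchanged, so an intermediate pipeline can be kept and extended. Below, `tbi` adds a `tabix` step while `bgzip` keeps its original five steps.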

In [19]:
bgzip = biu.utils.Acquire().curl("ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz")\
                   .gunzip()\
                   .sort("-t $'\\t' -k19,19V -k 20,21n")\
                   .cmd("awk -F $'\\t' 'BEGIN {OFS = FS} { if($19 != \"na\"){ print $0}}'")\
                   .bgzip()

tbi = bgzip.tabix(seq=19, start=20, end=21)

In [20]:
print(bgzip)

Acquire object.
 Re-do steps: no
 Current steps:
  * curl
  * gunzip
  * sort
  * cmd
  * bgzip



In [21]:
print(tbi)

Acquire object.
 Re-do steps: no
 Current steps:
  * curl
  * gunzip
  * sort
  * cmd
  * bgzip
  * tabix

