# Purpose:

2014-12-21

- Dumping the whole proteome on the online `Argot2` server in one submission seems to have broken it.
- I believe I will need to split the input into around 3000 proteins per submission.
- This will need some Python code (maybe added to `spartan`?).

# Implementation:

## Imports:

In [1]:
# imports
from spartan.utils.misc import split_stream
from spartan.utils import errors as e
import spartan.utils.blast.output as blast
import spartan.utils.hmmer.output as hmmer

## File paths:

In [2]:
# define paths to files

base_dir = "/home/gus/remote_mounts/louise/data/"
prj_dir = base_dir + "projects/ddrad58/argot_prep/"

peptides = base_dir + "genomes/glossina_fuscipes/annotations/seqs/Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.1.fa"

blast_data_in = prj_dir + "Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.union.blastp"
hmmer_data_in = prj_dir + "Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan"

blast_data_out_template = prj_dir + "Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.union.%s.blastp"
hmmer_data_out_template = prj_dir + "Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.%s.hmmscan"

## Begin work:

#### How many proteins are we dealing with?

In [3]:
pep_num = !grep '>' $peptides | wc -l
pep_num

['23264']

So dividing into around ten files would give us about 2330 proteins per file (23264 / 10 ≈ 2326).

That is about half of what `Argot2` asks for as the max.

That should work.

In [4]:
23264/2330.0

9.984549356223177

In [5]:
num_per_file = 2330
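
The same arithmetic also tells us how many files to expect and how full the last one will be. This is only a sanity check on the numbers already shown; `pep_total`, `n_files` and `last_size` are names introduced here for illustration.

```python
import math

pep_total = 23264      # peptide count reported by grep above
num_per_file = 2330    # chosen chunk size

# number of files needed when the last chunk may be only partly filled
n_files = int(math.ceil(pep_total / float(num_per_file)))

# size of that final, partly filled chunk
last_size = pep_total - (n_files - 1) * num_per_file

print(n_files, last_size)
```

These two values can be compared against `len(groups)` and `len(groups[-1])` once the groups have been built.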

#### Collect and rough-sort peptide names:

In [6]:
get_pep_name = lambda x: x.lstrip('>').split()[0]

In [7]:
pep_headers = !grep '>' $peptides

In [8]:
# strip each header down to its sequence name, then sort
pep_names = sorted(get_pep_name(header) for header in pep_headers)

In [9]:
pep_names[:5]

['GFUI000002-PA',
 'GFUI000004-PA',
 'GFUI000006-PA',
 'GFUI000008-PA',
 'GFUI000009-PA']
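
If shelling out to `grep` is not an option, the same names can be collected in plain Python. This is a minimal sketch, assuming the FASTA file fits the usual "header lines start with `>`" convention; `read_pep_names` is a hypothetical helper, not part of `spartan`, and the header text in the demo is made up.

```python
def read_pep_names(lines):
    """Return the sorted sequence names found on FASTA header lines."""
    names = []
    for line in lines:
        if line.startswith('>'):
            # same rule as get_pep_name: drop '>' and keep the first token
            names.append(line.lstrip('>').split()[0])
    names.sort()
    return names


# made-up records, only to show the behaviour
demo_lines = [
    ">GFUI000004-PA some description\n",
    "MKVLAA\n",
    ">GFUI000002-PA\n",
    "MAAGTR\n",
]
demo_names = read_pep_names(demo_lines)
```

In the notebook this could be used as `with open(peptides) as fh: pep_names = read_pep_names(fh)`. Note that `grep '>'` matches a `>` anywhere in a line, while this sketch only accepts lines that start with it.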

#### Get the groups:

In [10]:
groups = list(split_stream(stream=pep_names, divisor=num_per_file))

In [11]:
len(groups)

10

In [12]:
len(groups[0])

2330

In [13]:
len(groups[-1])


2294

In [14]:
groups[0][:5]

('GFUI000002-PA',
 'GFUI000004-PA',
 'GFUI000006-PA',
 'GFUI000008-PA',
 'GFUI000009-PA')

In [15]:
groups[1][:5]

('GFUI005096-PA',
 'GFUI005097-PA',
 'GFUI005098-PA',
 'GFUI005105-PA',
 'GFUI005106-PA')
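
`split_stream` comes from `spartan` and its source is not shown here. Judging only from the outputs above (tuples of `num_per_file` names, with a shorter final group), a chunker with similar behaviour could look like the sketch below. `chunk_stream` is a hypothetical stand-in, not the `spartan` implementation.

```python
from itertools import islice


def chunk_stream(stream, size):
    """Yield successive tuples of up to `size` items from `stream`."""
    iterator = iter(stream)
    while True:
        # pull the next `size` items; the last chunk may be shorter
        chunk = tuple(islice(iterator, size))
        if not chunk:
            return
        yield chunk


demo = list(chunk_stream(range(7), 3))
# demo == [(0, 1, 2), (3, 4, 5), (6,)]
```

Because the names were sorted first, each group covers a contiguous range of identifiers, which makes the later spot checks easier to read.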

#### Make the new files:

In [None]:
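def process_file(in_path, name_func, out_files):
    """Copy each line of `in_path` to the output file of its protein's group.

    `group_map` is read from the notebook's global namespace, so it must be
    defined before this function is called.
    """
    with open(in_path, 'rU') as lines:
        for line in lines:
            try:
                group = name_func(line, group_map)
                out_files[group].write(line)
            except e.IgnoreThisError as exc:
                # report skipped lines (e.g. comments) by input file name
                print("%s line ignored: %s" % (in_path.split('/')[-1], exc.msg))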
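
The `name_func` passed to `process_file` has to turn a line into a group index, or raise an error for lines that should be skipped. The `spartan` functions used below are not shown in this notebook, so the following is only a sketch of that contract under stated assumptions: `IgnoreThisError` here is a local stand-in for `e.IgnoreThisError` (assumed to carry a `.msg`), `group_from_line` and `demo_map` are hypothetical, and the protein name is assumed to sit in the first whitespace-separated field for the blast table and the third for the `hmmscan` table.

```python
class IgnoreThisError(Exception):
    """Local stand-in for spartan's e.IgnoreThisError (assumed to have .msg)."""

    def __init__(self, msg):
        Exception.__init__(self, msg)
        self.msg = msg


def group_from_line(line, group_map, name_column=0):
    """Return the group index for the protein named on `line`."""
    if line.startswith('#'):
        raise IgnoreThisError("Comment line")
    fields = line.split()
    try:
        return group_map[fields[name_column]]
    except (IndexError, KeyError):
        # blank/short line, or a name that is not in any group
        raise IgnoreThisError("No known protein name in column %d" % name_column)


# hypothetical lookup and shortened example lines
demo_map = {'GFUI000002-PA': 0, 'GFUI005162-PA': 1}
blast_line = "GFUI005162-PA\tsp|A5WCX3|EFTS_PSYWF\t2.4\n"
hmmer_line = "Pfam-B_6315   PB006315   GFUI000002-PA   -   0.056   13.4   0.1\n"

blast_group = group_from_line(blast_line, demo_map)
hmmer_group = group_from_line(hmmer_line, demo_map, name_column=2)
```

Raising a dedicated exception rather than returning `None` keeps the writing loop simple: every line either lands in exactly one output file or is reported as ignored.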

In [17]:
pdb

Automatic pdb calling has been turned ON


In [45]:
# set up groups
groups = list(split_stream(stream=pep_names, divisor=num_per_file))

# open one output file per group and build a protein -> group index lookup
blast_outs = {}
hmmer_outs = {}

group_map = {}

for index, group in enumerate(groups):
    
    blast_outs[index] = open(blast_data_out_template % (str(index)), 'w')
    hmmer_outs[index] = open(hmmer_data_out_template % (str(index)), 'w')
    
    for protein in group:
        group_map[protein] = index

process_file(blast_data_in, blast.protein_name_from_argot_search, blast_outs)

process_file(hmmer_data_in, hmmer.protein_name_from_argot_search, hmmer_outs)

# close the output files (closing also flushes any buffered writes)
for outb in blast_outs.values():
    outb.close()
    
for outh in hmmer_outs.values():
    outh.close()

Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
Glossina-fuscipes-IAEA_PEPTIDES_GfusI1.hmmscan line ignored: Comment line
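
A cheap way to confirm that nothing was lost in the split is to compare line counts: the lines written across all split files plus the ignored lines should equal the lines in the input. The helpers below are a hypothetical sketch (neither is part of `spartan`) and should only be run after the output files have been closed.

```python
def count_lines(path):
    """Count the lines in a text file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def lines_accounted_for(in_path, out_paths, n_ignored=0):
    """True if every input line was either written to a split file or ignored."""
    written = sum(count_lines(path) for path in out_paths)
    return written + n_ignored == count_lines(in_path)
```

For the `hmmscan` table this might be called as `lines_accounted_for(hmmer_data_in, [f.name for f in hmmer_outs.values()], n_ignored=13)`, where 13 is the number of "line ignored" messages printed above.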


#### Double check the blast file data:

In [60]:
!head {blast_outs[0].name}

GFUI001296-PA	sp|P9WLR9|Y1815_MYCTU	2.4
GFUI001296-PA	sp|P59981|Y1845_MYCBO	2.4
GFUI001296-PA	sp|P9WLR8|Y1815_MYCTO	2.4
GFUI001296-PA	sp|A1T774|EFTS_MYCVP	4.0
GFUI001296-PA	sp|A4TC66|EFTS_MYCGI	4.9
GFUI001296-PA	sp|Q54KI4|EPS15_DICDI	5.4
GFUI001296-PA	sp|O74477|MCA1_SCHPO	8.4
GFUI001296-PA	sp|Q5NQA8|RISB_ZYMMO	9.9
GFUI003870-PA	sp|P18053|PSA4_DROME	3e-151
GFUI003870-PA	sp|Q9R1P0|PSA4_MOUSE	1e-129


In [61]:
!head {blast_outs[1].name}

GFUI005162-PA	sp|A5WCX3|EFTS_PSYWF	2.4
GFUI005162-PA	sp|A8GCF0|SYS_SERP5	6.8
GFUI005162-PA	sp|A2RP65|RNH3_LACLM	8.2
GFUI005162-PA	sp|Q02VL7|RNH3_LACLS	9.0
GFUI005811-PA	sp|P02518|HSP27_DROME	0.064
GFUI005811-PA	sp|P15990|CRYAA_SPAEH	0.19
GFUI005811-PA	sp|Q96JA1|LRIG1_HUMAN	0.33
GFUI005811-PA	sp|P02517|HSP26_DROME	0.34
GFUI005811-PA	sp|P02506|CRYAA_TUPTE	0.64
GFUI005811-PA	sp|P24623|CRYAA_RAT	1.5


In [62]:
!head {blast_outs[2].name}

GFUI015612-PA	sp|Q9WWW4|RUBR1_PSEPU	1.7
GFUI011464-PA	sp|Q8Y0X8|HLDD_RALSO	4.4
GFUI011464-PA	sp|Q2TBI0|LBP_BOVIN	7.8
GFUI011322-PA	sp|Q9QZW0|AT11C_MOUSE	2.4
GFUI011322-PA	sp|O74537|YCQ4_SCHPO	3.0
GFUI011322-PA	sp|Q12389|DBP10_YEAST	5.0
GFUI011322-PA	sp|A6ZXU0|DBP10_YEAS7	5.0
GFUI011322-PA	sp|Q8MJ26|GYS1_MACMU	7.0
GFUI011322-PA	sp|Q5R9H0|GYS1_PONAB	10.0
GFUI011322-PA	sp|P13807|GYS1_HUMAN	10.0


In [63]:
!head {blast_outs[3].name}

GFUI020531-PA	sp|Q9CR41|HYPK_MOUSE	7e-29
GFUI020531-PA	sp|Q2PFU1|HYPK_MACFA	7e-29
GFUI020531-PA	sp|Q9NX55|HYPK_HUMAN	7e-29
GFUI020531-PA	sp|A6UVF4|NAC_META3	0.46
GFUI020531-PA	sp|Q9M612|NACA_PINTA	0.51
GFUI020531-PA	sp|A7TG43|NACA_VANPO	0.57
GFUI020531-PA	sp|Q9SZY1|NACA4_ARATH	0.82
GFUI020531-PA	sp|P0C0K9|NAC_METTM	1.7
GFUI020531-PA	sp|Q6ICZ8|NACA3_ARATH	2.2
GFUI020531-PA	sp|Q756T5|NACA_ASHGO	2.2


In [64]:
!head {blast_outs[4].name}

GFUI026835-PA	sp|Q1QA78|PRMA_PSYCK	4.4
GFUI026835-PA	sp|O47496|COX2_METSE	4.7
GFUI025686-PA	sp|Q24524|SING_DROME	0.0
GFUI025686-PA	sp|Q16658|FSCN1_HUMAN	1e-119
GFUI025686-PA	sp|Q61553|FSCN1_MOUSE	9e-116
GFUI025686-PA	sp|O18728|FSCN2_BOVIN	1e-115
GFUI025686-PA	sp|P85845|FSCN1_RAT	1e-115
GFUI025686-PA	sp|Q32M02|FSCN2_MOUSE	9e-115
GFUI025686-PA	sp|O14926|FSCN2_HUMAN	2e-113
GFUI025686-PA	sp|Q91837|FASC_XENLA	1e-100


In [65]:
!head {blast_outs[5].name}

GFUI031992-PA	sp|P23677|IP3KA_HUMAN	3e-68
GFUI031992-PA	sp|P17105|IP3KA_RAT	1e-67
GFUI031992-PA	sp|Q8R071|IP3KA_MOUSE	4e-67
GFUI031992-PA	sp|Q7TS72|IP3KC_MOUSE	4e-56
GFUI031992-PA	sp|Q80ZG2|IP3KC_RAT	4e-56
GFUI031992-PA	sp|Q96DU7|IP3KC_HUMAN	6e-53
GFUI031992-PA	sp|P42335|IP3KB_RAT	1e-51
GFUI031992-PA	sp|P27987|IP3KB_HUMAN	4e-51
GFUI031992-PA	sp|O74561|YCZ8_SCHPO	0.016
GFUI031992-PA	sp|Q8BWD2|IP6K3_MOUSE	0.016


In [66]:
!head {blast_outs[6].name}

GFUI035770-PA	sp|Q15R69|PUR4_PSEA6	1.2
GFUI035770-PA	sp|Q9T0I8|MTN1_ARATH	2.0
GFUI035770-PA	sp|Q8VCE4|CI040_MOUSE	5.0
GFUI035770-PA	sp|Q8A1G2|SUSD_BACTN	6.3
GFUI035770-PA	sp|A1SLW4|LEUD_NOCSJ	7.1
GFUI035770-PA	sp|Q9XAQ9|NUOF_STRCO	9.9
GFUI037294-PA	sp|Q9VTJ8|TIM14_DROME	3e-66
GFUI037294-PA	sp|Q5RF34|TIM14_PONAB	3e-44
GFUI037294-PA	sp|Q96DA6|TIM14_HUMAN	3e-44
GFUI037294-PA	sp|Q3ZBN8|TIM14_BOVIN	2e-43


In [67]:
!head {blast_outs[7].name}

GFUI041106-PA	sp|Q23979|MY61F_DROME	0.0
GFUI041106-PA	sp|Q92002|MYO1C_LITCT	0.0
GFUI041106-PA	sp|Q5ZLA6|MYO1C_CHICK	0.0
GFUI041106-PA	sp|A0MP03|MY1CA_XENLA	0.0
GFUI041106-PA	sp|Q63355|MYO1C_RAT	0.0
GFUI041106-PA	sp|Q27966|MYO1C_BOVIN	0.0
GFUI041106-PA	sp|O00159|MYO1C_HUMAN	0.0
GFUI041106-PA	sp|Q9WTI7|MYO1C_MOUSE	0.0
GFUI041106-PA	sp|A5PF48|MYO1C_DANRE	0.0
GFUI041106-PA	sp|Q8N1T3|MYO1H_HUMAN	0.0


In [68]:
!head {blast_outs[8].name}

GFUI045866-PA	sp|Q8BLY1|SMOC1_MOUSE	2e-24
GFUI045866-PA	sp|Q8BLY1|SMOC1_MOUSE	8e-15
GFUI045866-PA	sp|Q8BLY1|SMOC1_MOUSE	3e-13
GFUI045866-PA	sp|Q8BLY1|SMOC1_MOUSE	4e-10
GFUI045866-PA	sp|Q9H4F8|SMOC1_HUMAN	2e-24
GFUI045866-PA	sp|Q9H4F8|SMOC1_HUMAN	1e-14
GFUI045866-PA	sp|Q9H4F8|SMOC1_HUMAN	4e-13
GFUI045866-PA	sp|Q9H4F8|SMOC1_HUMAN	9e-10
GFUI045866-PA	sp|Q8CD91|SMOC2_MOUSE	2e-24
GFUI045866-PA	sp|Q8CD91|SMOC2_MOUSE	6e-15


In [69]:
!head {blast_outs[9].name}

GFUI048986-PA	sp|P03641|F_BPPHS	3.7
GFUI048986-PA	sp|Q0P3N4|PSBH_OSTTA	3.9
GFUI048986-PA	sp|Q6C6M0|ATG2_YARLI	7.6
GFUI052848-PA	sp|P21329|RTJK_DROFU	3e-10
GFUI052848-PA	sp|Q95SX7|RTBS_DROME	3e-09
GFUI052848-PA	sp|P21328|RTJK_DROME	5e-09
GFUI052848-PA	sp|Q9NBX4|RTXE_DROME	1e-07
GFUI052848-PA	sp|P08548|LIN1_NYCCO	0.15
GFUI052848-PA	sp|Q03277|PO11_BRACO	3.1
GFUI052848-PA	sp|Q7PZ96|CCD22_ANOGA	6.9
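
Looking at `head` only covers the first ten lines of each file. To check a whole split file, each protein name in it can be looked up and compared with the group the file is supposed to hold. This is a sketch under the same column assumptions as the tables shown in this notebook (name in the first field for blast, third field for `hmmscan`); `misplaced_names` and `demo_map` are hypothetical.

```python
def misplaced_names(lines, expected_group, group_map, name_column=0):
    """Return the set of names on `lines` that do not map to `expected_group`."""
    wrong = set()
    for line in lines:
        fields = line.split()
        if len(fields) <= name_column:
            continue  # blank or short line: nothing to check
        name = fields[name_column]
        if group_map.get(name) != expected_group:
            wrong.add(name)
    return wrong


# hypothetical lookup; the first line is taken from the group 0 output above
demo_map = {'GFUI001296-PA': 0, 'GFUI005162-PA': 1}
demo_lines = [
    "GFUI001296-PA\tsp|P9WLR9|Y1815_MYCTU\t2.4\n",
    "GFUI005162-PA\tsp|A5WCX3|EFTS_PSYWF\t2.4\n",
]
demo_wrong = misplaced_names(demo_lines, 0, demo_map)
```

In the notebook, an empty set from `misplaced_names(open(blast_outs[i].name), i, group_map)` for every `i` would mean each blast file holds only its own group; passing `name_column=2` would do the same for the `hmmer` files.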


#### Now the `hmmer` outs:

In [70]:
!head {hmmer_outs[0].name}

Pfam-B_6315          PB006315   GFUI000002-PA        -              0.056   13.4   0.1     0.056   13.4   0.1   5.6   2   2   1   5   5   5   0 -
Pfam-B_12302         PB012302   GFUI000004-PA        -               0.24   12.7   0.8       1.3   10.4   0.0   2.0   2   0   0   2   2   2   0 -
Pfam-B_2490          PB002490   GFUI000004-PA        -               0.45   10.6   0.9      0.51   10.4   0.9   1.0   1   0   0   1   1   1   0 -
Pfam-B_2897          PB002897   GFUI000006-PA        -            5.2e-05   23.9   4.4   5.2e-05   23.9   4.4   1.1   1   0   0   1   1   1   1 -
Pfam-B_18119         PB018119   GFUI000006-PA        -             0.0023   18.3   3.0    0.0024   18.2   3.0   1.2   1   0   0   1   1   1   1 -
Pfam-B_5800          PB005800   GFUI000006-PA        -             0.0055   16.8   1.9    0.0055   16.8   1.9   1.1   1   0   0   1   1   1   1 -
Pfam-B_12732         PB012732   GFUI000006-PA        -               0.05   14.2   8.1     0.085   13.5   8.1   1.4   

In [71]:
!head {hmmer_outs[1].name}

Pfam-B_3358          PB003358   GFUI005096-PA        -            7.1e-12   46.2   0.2   1.2e-11   45.4   0.2   1.3   1   1   0   1   1   1   1 -
AIP3                 PF03915.8  GFUI005096-PA        -            6.8e-10   39.6   0.6     1e-09   39.0   0.1   1.5   2   0   0   2   2   2   1 Actin interacting protein 3
Pfam-B_1461          PB001461   GFUI005096-PA        -            0.00033   19.6   0.1   0.00033   19.6   0.1   1.8   2   0   0   2   2   2   1 -
Pfam-B_2821          PB002821   GFUI005096-PA        -                3.2    7.8   8.5       4.3    7.4   8.5   1.1   1   0   0   1   1   1   0 -
Pfam-B_2892          PB002892   GFUI005096-PA        -                3.9    7.3   7.1       4.9    7.0   7.1   1.1   1   0   0   1   1   1   0 -
Pfam-B_35            PB000035   GFUI005096-PA        -                4.3    7.0   5.0       5.8    6.6   5.0   1.1   1   0   0   1   1   1   0 -
Pfam-B_4234          PB004234   GFUI005096-PA        -                7.4    6.5   9.2      

In [72]:
!head {hmmer_outs[2].name}

Pfam-B_4903          PB004903   GFUI010554-PA        -             0.0063   16.5   0.5     0.013   15.5   0.5   1.5   1   0   0   1   1   1   1 -
Pfam-B_4274          PB004274   GFUI010554-PA        -                7.4    6.8   9.4        25    5.0   9.4   1.9   1   1   0   1   1   1   0 -
Pfam-B_13592         PB013592   GFUI010557-PA        -               0.32   12.1   1.7      0.48   11.5   1.7   1.3   1   0   0   1   1   1   0 -
IQ                   PF00612.22 GFUI010567-PA        -            3.4e-13   49.1   0.9   2.1e-05   24.9   0.1   2.1   2   0   0   2   2   2   2 IQ calmodulin-binding motif
TTL                  PF03133.10 GFUI010568-PA        -            1.4e-23   84.4   1.9   5.8e-13   49.6   1.8   2.1   1   1   1   2   2   2   2 Tubulin-tyrosine ligase family
Methylase_S          PF01420.14 GFUI010568-PA        -              0.033   15.3   0.1     0.051   14.7   0.1   1.2   1   0   0   1   1   1   0 Type I restriction modification DNA specificity domain
Pfam-B_297

In [73]:
!head {hmmer_outs[3].name}

DUF389               PF04087.9  GFUI015987-PA        -            3.9e-38  131.7   6.2     6e-38  131.1   6.2   1.3   1   0   0   1   1   1   1 Domain of unknown function (DUF389)
Pfam-B_9186          PB009186   GFUI015987-PA        -             0.0094   16.9   1.5     0.018   16.0   1.5   1.4   1   0   0   1   1   1   1 -
YABBY                PF04690.8  GFUI015987-PA        -               0.11   13.9   0.8       1.9    9.9   0.0   2.4   2   0   0   2   2   2   0 YABBY protein
Pfam-B_1205          PB001205   GFUI015987-PA        -                8.6    6.2   5.9        11    5.8   5.9   1.1   1   0   0   1   1   1   0 -
CUB                  PF00431.15 GFUI015988-PA        -            2.8e-05   25.3   0.0   9.1e-05   23.6   0.0   1.9   1   0   0   1   1   1   1 CUB domain
hEGF                 PF12661.2  GFUI015988-PA        -             0.0077   17.3   4.5     0.016   16.3   4.5   1.6   1   0   0   1   1   1   1 Human growth factor-like EGF
Laminin_EGF          PF00053.19 GFUI

In [74]:
!head {hmmer_outs[4].name}

Pkinase              PF00069.20 GFUI021430-PA        -              0.013   15.9   0.0     0.015   15.7   0.0   1.1   1   0   0   1   1   1   0 Protein kinase domain
Pfam-B_1520          PB001520   GFUI021434-PA        -            2.6e-28  100.0   0.1     4e-27   96.1   0.1   2.0   1   1   0   1   1   1   1 -
RRM_1                PF00076.17 GFUI021434-PA        -            1.1e-09   38.9   0.1   2.2e-09   38.0   0.1   1.5   1   0   0   1   1   1   1 RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain)
RRM_5                PF13893.1  GFUI021434-PA        -            2.9e-07   31.4   0.0   6.3e-07   30.3   0.0   1.6   1   0   0   1   1   1   1 RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain)
RRM_6                PF14259.1  GFUI021434-PA        -            2.9e-05   25.1   0.0   5.7e-05   24.2   0.0   1.5   1   0   0   1   1   1   1 RNA recognition motif (a.k.a. RRM, RBD, or RNP domain)
HGTP_anticodon       PF03129.15 GFUI021434-PA        -            0.00035   21.6   0

In [75]:
!head {hmmer_outs[5].name}

Pfam-B_15687         PB015687   GFUI026860-PA        -               0.04   14.4   4.8       1.5    9.4   0.3   2.8   2   1   1   3   3   3   0 -
Pfam-B_11529         PB011529   GFUI026860-PA        -                1.9    9.9   5.5       1.1   10.7   3.1   1.8   2   0   0   2   2   2   0 -
Pfam-B_8176          PB008176   GFUI026861-PA        -              0.066   13.9   0.1     0.099   13.3   0.1   1.2   1   0   0   1   1   1   0 -
Pfam-B_7609          PB007609   GFUI026862-PA        -                0.2   11.2   0.2        19    4.7   0.0   2.0   2   0   0   2   2   2   0 -
PilI                 PF10623.4  GFUI026862-PA        -               0.24   12.5   0.5        33    5.6   0.1   2.3   2   0   0   2   2   2   0 Plasmid conjugative transfer protein PilI
ZZ                   PF00569.12 GFUI026862-PA        -               0.61   10.8  10.3         8    7.2   1.6   3.4   2   2   0   2   2   2   0 Zinc finger, ZZ type
CHCH                 PF06747.8  GFUI026865-PA        -     

In [76]:
!head {hmmer_outs[6].name}

Siah-Interact_N      PF09032.6  GFUI032248-PA        -               0.55   11.5   4.8       2.8    9.2   0.3   2.6   2   0   0   2   2   2   0 Siah interacting protein, N terminal
Pfam-B_18559         PB018559   GFUI032248-PA        -                1.8    8.9   8.7       3.7    7.8   8.7   1.5   1   0   0   1   1   1   0 -
Pfam-B_16171         PB016171   GFUI032248-PA        -                  3    8.5   5.7       1.1    9.9   2.9   1.6   2   0   0   2   2   2   0 -
Pfam-B_6339          PB006339   GFUI032248-PA        -                6.3    7.0   3.9       5.1    7.3   2.2   1.6   2   0   0   2   2   2   0 -
Toxin_7              PF05980.7  GFUI032249-PA        -             0.0089   17.2   4.2     0.015   16.4   4.2   1.5   1   0   0   1   1   1   1 Toxin 7
RhoGEF               PF00621.15 GFUI032258-PA        -            1.8e-27   97.5   6.9   1.8e-27   97.5   6.9   2.0   2   0   0   2   2   2   1 RhoGEF domain
Pfam-B_18355         PB018355   GFUI032258-PA        -           

In [77]:
!head {hmmer_outs[7].name}

Cyclin_N             PF00134.18 GFUI037715-PA        -              2e-11   44.7   0.1   3.7e-11   43.8   0.1   1.6   1   1   0   1   1   1   1 Cyclin, N-terminal domain
Cyclin_C             PF02984.14 GFUI037715-PA        -            8.8e-05   23.6   2.9   0.00087   20.4   0.1   2.6   2   1   1   3   3   3   1 Cyclin, C-terminal domain
TFIIB                PF00382.14 GFUI037715-PA        -             0.0031   18.4   0.0     0.036   15.0   0.0   2.3   2   0   0   2   2   2   1 Transcription factor TFIIB repeat
Pfam-B_13813         PB013813   GFUI037715-PA        -               0.13   12.3   0.3      0.34   10.9   0.3   1.6   1   0   0   1   1   1   0 -
YqhG                 PF11079.3  GFUI037716-PA        -               0.39   10.6   0.0      0.79    9.6   0.0   1.4   1   0   0   1   1   1   0 Bacterial protein YqhG of unknown function
Pfam-B_842           PB000842   GFUI037716-PA        -                  3    8.8   5.6      0.94   10.4   1.5   2.1   2   0   0   2   2   2   0 

In [78]:
!head {hmmer_outs[8].name}

Cyclin_N             PF00134.18 GFUI043265-PA        -              1e-25   90.9   0.0   1.4e-25   90.4   0.0   1.2   1   0   0   1   1   1   1 Cyclin, N-terminal domain
GLYCAM-1             PF05242.6  GFUI043265-PA        -              0.093   13.8   0.7      0.15   13.2   0.7   1.2   1   0   0   1   1   1   0 Glycosylation-dependent cell adhesion molecule 1 (GlyCAM-1)
DP                   PF08781.5  GFUI043266-PA        -               0.18   12.7   0.0      0.18   12.7   0.0   1.1   1   0   0   1   1   1   0 Transcription factor DP
Y_phosphatase        PF00102.22 GFUI043268-PA        -            1.4e-29  104.3   0.1   1.9e-14   54.8   0.0   2.4   1   1   1   2   2   2   2 Protein-tyrosine phosphatase
DSPc                 PF00782.15 GFUI043268-PA        -            4.3e-05   24.3   0.2   0.00087   20.0   0.0   2.4   2   0   0   2   2   2   1 Dual specificity phosphatase, catalytic domain
AbiH                 PF14253.1  GFUI043268-PA        -             0.0042   17.9   2.6   

In [79]:
!head {hmmer_outs[9].name}

HCNGP                PF07818.8  GFUI048496-PA        -            1.8e-31  109.2   1.4   5.1e-31  107.7   1.4   1.8   1   0   0   1   1   1   1 HCNGP-like protein
Pfam-B_15158         PB015158   GFUI048496-PA        -              0.098   13.6   1.2      0.16   12.8   1.2   1.3   1   0   0   1   1   1   0 -
Pfam-B_12294         PB012294   GFUI048496-PA        -               0.19   11.9   8.6      0.25   11.5   8.6   1.0   1   0   0   1   1   1   0 -
Pfam-B_6524          PB006524   GFUI048496-PA        -                1.1    9.4   7.6       1.5    9.0   7.6   1.1   1   0   0   1   1   1   0 -
Pfam-B_10304         PB010304   GFUI048496-PA        -                1.4    9.4  13.0     0.044   14.3   5.7   1.8   2   0   0   2   2   2   0 -
Pfam-B_1459          PB001459   GFUI048496-PA        -                4.6    6.3   9.5       5.6    6.0   9.5   1.0   1   0   0   1   1   1   0 -
Pfam-B_6214          PB006214   GFUI048496-PA        -                4.7    6.3   8.0       6.1    5