Author: Natalia Zajac (nataliazajac@gmail.com) Last updated: 15 July 2021

The purpose of this tutorial is to give a basic guide on how to create an HDF5 database from an OMA standalone run and how to work with it.

From tutorial_hdf5.ipynb by Natasha Glover: What is HDF5? "HDF5 is a data model, library, and file format for storing and managing data" (https://www.hdfgroup.org/HDF5/). We use HDF5 as a hierarchical database to store the results of OMA for use by the browser. This includes genome information, homologous pairs, protein information, OrthoXML files, OMA groups, HOGs, and more.

I recommend first consulting the documentation about the HDF5 database (a work in progress), which contains a brief description of each table. This way you get an idea of the type of data that is stored and that you can access.

This introduction shows you how to run OMA standalone and create an HDF5 file from its output.
1. Download OMA. The latest version is 2.5.0 (July 2021).

2. Run OMA standalone; for instructions see https://omabrowser.org/standalone/. The most important points are to switch to the bottom-up algorithm in the parameters file and to divide the run into 3 stages: database conversion, all-against-all, and OMA output generation.

3. Make sure your ssh setup works, i.e. that you can connect to the cluster and copy files from your laptop to the cluster. On your computer, create a file called ~/.ssh/config containing the following 3 lines (UseKeychain is only understood by the macOS version of OpenSSH):
ForwardAgent yes
UseKeychain yes
AddKeysToAgent yes

4. Make a link to omadarwin in the bin directory of your home directory (create ~/bin first if it does not exist). If you installed OMA on your scratch space, it will be:
ln -s /scratch/user/OMA.2.5.0/bin/omadarwin ~/bin/darwin

5. Make sure this directory (~/bin) is in your PATH by checking the output of:
echo $PATH
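If you prefer to check this from Python, for example at the top of a notebook on the cluster, the sketch below tests whether a directory is one of the PATH entries. `dir_on_path` is a hypothetical helper written for this tutorial, not part of pyoma; it normalises both sides so that `~/bin` and its absolute form compare equal.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string.

    `path_value` defaults to the current $PATH. Both sides are normalised with
    expanduser/abspath so that "~/bin" and "/home/user/bin" compare equal.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.abspath(os.path.expanduser(directory))
    entries = [
        os.path.abspath(os.path.expanduser(p))
        for p in path_value.split(os.pathsep)
        if p  # skip empty entries such as those produced by "::"
    ]
    return target in entries


if __name__ == "__main__":
    bin_dir = os.path.expanduser("~/bin")
    if dir_on_path(bin_dir):
        print(bin_dir, "is on your PATH")
    else:
        print(bin_dir, "is NOT on your PATH; add export PATH=\"$HOME/bin:$PATH\" to your shell startup file (e.g. ~/.bashrc)")
```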

6. Create an environment and activate it (2 ways):
    - with venv:
      python3 -m venv myenv   (myenv is the name you give to the environment)
      source myenv/bin/activate   (when you are done, run: deactivate)
    - with Miniconda: download the Miniconda installer and copy it onto the cluster (e.g. with scp)
      bash the-miniconda-installer-file   (remember to specify your desired install location during the process)
      close and reopen the terminal for the activation to take effect
      conda create -n myenv python
      conda activate myenv   (when you are done, run: conda deactivate)
      For more info:
      https://wiki.unil.ch/ci/books/service-de-calcul-haute-performance-%28hpc%29/page/using-conda-and-anaconda#bkmrk-using-conda-virtual-
     
7. Clone pyoma:
git clone ssh://gitolite@lab.dessimoz.org:2222/pyoma
If it asks for a password, your ssh key has not been added to the git server. Check the contents of ~/.ssh/; if needed, create a new key on the cluster (see https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh) and give the public key to someone from the lab who can add it to git.
Then cd pyoma
and install it with one of:
pip install "pyoma[create-db]"
or
pip install -r requirements .   (requirements is a file that should be in the pyoma directory)
or
pip install -e .

8. Run pyoma/bin/importdata.py --release /path/to/omastandalone OmaServer.h5

Now that you have created an HDF5 file, you can query it either in a Jupyter notebook that you open on the cluster (see https://lab.dessimoz.org/wiki/wally_and_axiom, bottom of the page)
or

by copying the file onto your computer with scp: scp your-username@curnagl.dcsr.unil.ch:/path/to/your/OmaServer.h5 Documents/your/target/dir
or

by copying your files to the NAS for long-term storage and mounting the NAS on your laptop. For more info see:
https://wiki.unil.ch/ci/books/service-de-calcul-haute-performance-%28hpc%29/page/data-management#bkmrk-laptop-%3C-%3E-nas




Now we can start exploring the database. As an example I will use an HDF5 file containing 378 avian genomes, which I access by mounting the NAS on my computer.

In [2]:
import pyoma
import tables
import pandas as pd
import numpy as np
import pyoma.browser
from pyoma.browser import db
import pyham
## Advice: if importing any of these throws an error about a numpy array, uninstall and reinstall numpy (I know, but it works)

working_dir = "/Volumes/DBC/cdessim2/default/D2c/birds/initial_dataset/"
# location of the h5 file
h5file = working_dir + "OmaServer.h5"
# open the h5 file as a pyoma Database object, get its PyTables handle and create an OMA id mapper
dbObj = db.Database(h5file)
h5file2 = dbObj.get_hdf5_handle()
omaIdObj = db.OmaIdMapper(dbObj)

outdated database version, but only minor version change: 3.4 != 3.2. Some functions might fail
be ready to see PyTables asking for *lots* of memory and possibly slow
I/O.  You may want to reduce the rowsize by trimming the value of
dimensions that are orthogonal (and preferably close) to the *main*
dimension of this leave.  Alternatively, in case you have specified a
very small/large chunksize, you may want to increase/decrease it.


In [3]:
## How many genomes are in your h5 file?
h5file2.root.Genome.nrows  # one row per genome (not displayed: a notebook only shows the last expression)
h5file2.root.Protein.Locus._v_nchildren

378

In [4]:
## How many root HOGs are there?
dbObj.get_nr_toplevel_hogs()

49427

In [7]:
## Find out how many proteins each species has
genomeTab = h5file2.root.Genome
scinames = genomeTab.col('SciName')

for genome in scinames:
    total_entries = h5file2.root.Genome.read_where('SciName == genome', field='TotEntries')
    print(genome,total_entries)

b'ANAPL' [15753]
b'Acanthisitta_chloris_57068' [16077]
b'Acrocephalus_arundinaceus_39621' [13698]
b'Aegithalos_caudatus_73327' [13259]
b'Aegotheles_bennettii_48278' [13701]
b'Agelaius_phoeniceus_39638' [14146]
b'Alaudala_cheleensis_670337' [14297]
b'Alca_torda_28689' [13980]
b'Alcedo_cyanopectus_390723' [12043]
b'Aleadryas_rufinucha_461220' [13899]
b'Alectura_lathami_81907' [13399]
b'Alopecoenas_beccarii_262131' [14599]
b'Amazona_aestiva_12930' [16144]
b'Amazona_guildingii_175529' [14211]
b'Anhinga_anhinga_56067' [13590]
b'Anhinga_rufa_317792' [10486]
b'Anser_cygnoides_domesticus_381198' [31811]
b'Anseranas_semipalmata_8851' [14223]
b'Anthoscopus_minutus_156561' [13582]
b'Antrostomus_carolinensis_279965' [16878]
b'Apaloderma_vittatum_57397' [15288]
b'Aphelocoma_coerulescens_39617' [14584]
b'Aptenodytes_forsteri_9233' [19104]
b'Aptenodytes_patagonicus_9234' [13942]
b'Apteryx_rowi_308060' [37618]
b'Aquila_chrysaetos_chrysaetos_223781' [48143]
b'Aramus_guarauna_54356' [14295]
b'Ardeotis_k

# Create useful dataframes

In [5]:
## Create a dataframe with info on all the genomes
genomeTab = h5file2.root.Genome
genomeTab.description  # shows the columns of the table
genome_df = pd.DataFrame(genomeTab.read())

In [6]:
## Create a dataframe with info on each protein
entriesTab = h5file2.root.Protein.Entries
entries_df = pd.DataFrame(entriesTab.read())

In [8]:
## For the proteins (rows of entries_df) in HOGs of interest, find the genomes they come from

list_of_hogs_of_interest = [b'HOG:0039961', b'HOG:0039951']
# .copy() makes df an independent dataframe, so adding a column below does not trigger pandas' SettingWithCopyWarning
df = entries_df[entries_df["OmaHOG"].isin(list_of_hogs_of_interest)].copy()

entry_species_dict = {}
for i in df.EntryNr.unique():
    # field 5 of the genome record is the species name shown in the 'species' column below
    entry_species_dict[i] = omaIdObj.genome_of_entry_nr(i)[5].decode("utf-8")

df['species'] = df.EntryNr.map(entry_species_dict)
df[:10]

Unnamed: 0,EntryNr,SeqBufferOffset,SeqBufferLength,OmaGroup,OmaHOG,Chromosome,LocusStart,LocusEnd,LocusStrand,AltSpliceVariant,CanonicalId,CDNABufferOffset,CDNABufferLength,MD5ProteinHash,DescriptionOffset,DescriptionLength,SubGenome,RootHogUpstream,RootHogDownStream,species
231463,231464,103762255,621,89313,b'HOG:0039951',b'',0,0,1,0,b'XP_013034218.1 PREDI',310823839,1861,b'148a6862b350afecaa38df3c6c350425',0,0,b'',-1,-1,Anser_cygnoides_domesticus_381198
339440,339441,158939569,638,89313,b'HOG:0039951',b'',0,0,1,0,b'XP_009283416.1 PREDI',476139827,1912,b'd7825d42a8e6711b4762e16edcb8ac47',0,0,b'',-1,-1,Aptenodytes_forsteri_9233
395992,395993,193387826,625,89313,b'HOG:0039951',b'',0,0,1,0,b'XP_025943772.1 sperm',579371494,1873,b'2b5770aeb3ba49793a4b294278ac1671',0,0,b'',-1,-1,Apteryx_rowi_308060
396351,396352,193610029,1365,36547,b'HOG:0039961',b'',0,0,1,0,b'XP_025944131.1 prote',580037385,4093,b'972b454cdbbfb3a10d9b776e807297d8',0,0,b'',-1,-1,Apteryx_rowi_308060
516515,516516,266304712,583,89313,b'HOG:0039951',b'',0,0,1,0,b'XP_026708212.1 LOW Q',797881106,1747,b'1fa568c2da76739672ac5288e8039cf2',0,0,b'',-1,-1,Athene_cunicularia_194338
570649,570650,296739803,618,89313,b'HOG:0039951',b'',0,0,1,0,b'XP_032045988.1 sperm',889078111,1852,b'8e455e20edf302b59f8fae4d7072bb9f',0,0,b'',-1,-1,Aythya_fuligula_219594
697152,697153,354512144,1511,36537,b'HOG:0039961',b'',0,0,1,0,b'NXH17782.1 SCRIB pro',1062142128,4531,b'9836c8b4cf6fd87841b71c57cf15541a',0,0,b'',-1,-1,Bucco_capensis_135168
710008,710009,360561785,1800,36553,b'HOG:0039961',b'',0,0,1,0,b'XP_010136241.1 PREDI',1080265339,5398,b'80bfdbf352c10732d20e1a7692ffbc1b',0,0,b'',-1,-1,Buceros_rhinoceros_silvestris_175836
770269,770270,388057208,633,2442,b'HOG:0039951',b'',0,0,1,0,b'ENSGALG00000003148',1162631086,1897,b'8c799e7c75bc81e66be3604d20321e8b',1846597,58,b'',-1,-1,CHICK
803140,803141,406486869,597,89313,b'HOG:0039951',b'',0,0,1,0,b'XP_014807437.1 PREDI',1217854327,1789,b'df0b5d006b6b2a65ba584c061a496cc4',0,0,b'',-1,-1,Calidris_pugnax_198806
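To summarise which species contribute to each HOG of interest, the (OmaHOG, species) pairs can be grouped. The sketch below is plain Python run on a few rows copied from the table above, so it works without the database; `species_per_hog` is a hypothetical helper, and in the notebook you would feed it `zip(df.OmaHOG, df.species)`.

```python
from collections import defaultdict


def species_per_hog(pairs):
    """Group (OmaHOG, species) pairs into {HOG id: sorted list of distinct species}."""
    grouped = defaultdict(set)
    for hog, species in pairs:
        # OmaHOG values come out of HDF5 as bytes; decode them for readability
        hog_id = hog.decode("utf-8") if isinstance(hog, bytes) else hog
        grouped[hog_id].add(species)
    return {hog_id: sorted(names) for hog_id, names in grouped.items()}


# A few rows from the table above
rows = [
    (b"HOG:0039951", "Anser_cygnoides_domesticus_381198"),
    (b"HOG:0039951", "Aptenodytes_forsteri_9233"),
    (b"HOG:0039951", "Apteryx_rowi_308060"),
    (b"HOG:0039961", "Apteryx_rowi_308060"),
    (b"HOG:0039961", "Bucco_capensis_135168"),
]
for hog_id, names in species_per_hog(rows).items():
    print(hog_id, len(names), names)
```

With pandas, `df.groupby("OmaHOG")["species"].nunique()` gives the number of species per HOG directly.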


In [9]:
## Let's say you have a file with a list of entry numbers you are interested in,
## and you want to map each one to its original protein ID to find it in the fasta files
with open("/path/to/file/list_of_entries_of_interest") as entries:
    numbers = [int(x) for x in entries.read().split()]

# DataFrame.append was removed in pandas 2.0, so collect the rows in a list and build the dataframe once
rows = []
for entrynr in numbers:
    entry = dbObj.entry_by_entry_nr(entrynr)
    # field 10 is CanonicalId; its first word is the original protein ID
    rows.append([entrynr, entry[10].decode().split(" ")[0]])
output = pd.DataFrame(rows, columns=["entry", "protein_name"])
output[:10]


FileNotFoundError: [Errno 2] No such file or directory: '/path/to/file/list_of_entries_of_interest'
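The error above only means the placeholder path does not exist. The lookup itself relies on two string operations: reading whitespace-separated entry numbers from a file, and keeping the first word of the CanonicalId. In this dataset, the CanonicalId appears to hold the original protein ID followed by a header fragment truncated to 20 characters (e.g. `b'XP_013034218.1 PREDI'`). A minimal stdlib sketch of both steps follows; `io.StringIO` stands in for the real file, and the helper names are hypothetical.

```python
import io


def read_entry_numbers(handle):
    """Read whitespace-separated OMA entry numbers from an open text file."""
    return [int(token) for token in handle.read().split()]


def protein_id(canonical_id):
    """Return the first word of a CanonicalId value (bytes or str)."""
    if isinstance(canonical_id, bytes):
        canonical_id = canonical_id.decode("utf-8")
    return canonical_id.split(" ")[0]


# io.StringIO stands in for open("/path/to/file/list_of_entries_of_interest")
numbers = read_entry_numbers(io.StringIO("27585\n52263\n"))
print(numbers)
print(protein_id(b"XP_013034218.1 PREDI"))
print(protein_id(b"ENSGALG00000003148"))
```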

In [10]:
## Alternatively, list the entry numbers directly
entries = ['27585', '52263']
numbers = [int(x) for x in entries]
rows = []
for entrynr in numbers:
    entry = dbObj.entry_by_entry_nr(entrynr)
    rows.append([entrynr, entry[10].decode().split(" ")[0]])  # field 10: CanonicalId
output = pd.DataFrame(rows, columns=["entry", "protein_name"])
output[:10]

Unnamed: 0,entry,protein_name
0,27585,XP_009079256.1
1,52263,NWH89998.1


In [11]:
## You can also directly obtain the protein sequence and the cDNA of the entries you are interested in

entries = ['27585', '52263']
numbers = [ int(x) for x in entries ]

for entrynr in numbers:
    print(entrynr, dbObj.get_sequence(entrynr).decode("utf-8"))
    print("\n")
    print(entrynr, dbObj.get_cdna(entrynr).decode("utf-8"))

27585 MSITSDEVNFLVYRYLQESGFSHSAFTFGIESHISQSNINGTLVPPAALISILQKGLQYVEAEISINEDGTVFDGRPIESLSLIDAVMPDVVQTRQQAFREKLAQQQASAAAAAAATAATAGATTTAVSQQNTPKNGEATVNGEENGAHAINNHSKPMEIDGDVEIPPNKATVLRGHESEVFICAWNPVSDLLASGSGDSTARIWNLNENSNSGSTQLVLRHCIREGGHDVPSNKDVTSLDWNSDGTLLATGSYDGFARIWTEDGNLASTLGQHKGPIFALKWNKKGNYILSAGVDKTTIIWDAHTGEAKQQFPFHSAPALDVDWQNNTTFASCSTDMCIHVCRLGCDRPVKTFQGHTNEVNAIKWDPSGMLLASCSDDMTLKIWSMKQDTCVHDLQAHSKEIYTIKWSPTGPGTSNPNSNIMLASASFDSTVRLWDVDRGVCIHTLTKHQEPVYSVAFSPDGKYLASGSFDKCVHIWNTQSGTLVHSYRGTGGIFEVCWNARGDKVGASASDGSVCVLDLRK


27585 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
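To save such sequences to a FASTA file, for example to compare them with the original input FASTA files, a small formatter is enough. This is a plain-Python sketch: the header format (`>entrynr protein_id`) and the line width are arbitrary choices here, not something pyoma prescribes. In the notebook you would pass `dbObj.get_sequence(nr).decode("utf-8")` for each entry. Note that the cDNA printed above consists only of `N` characters, so it carries no nucleotide information for that entry.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs to `handle` in FASTA format, wrapping lines at `width`."""
    for header, sequence in records:
        handle.write(">" + header + "\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


out = io.StringIO()
# the first 26 residues of entry 27585, wrapped at 10 characters for illustration
write_fasta([("27585 XP_009079256.1", "MSITSDEVNFLVYRYLQESGFSHSAF")], out, width=10)
print(out.getvalue())
```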

In [12]:
## Find the root HOG of the entries you are interested in (here: the first 10 entries)
df = entries_df[:10].copy()  # copy, so adding a column does not trigger SettingWithCopyWarning
omahog_root_hog_dict = {}
for i in df.OmaHOG.unique():
    if i != b'':
        # all entries of this HOG belong to the same family, so take the first one
        # (.item() would fail if several entries share the HOG)
        entrynr = int(df.loc[df["OmaHOG"] == i, "EntryNr"].iloc[0])
        omahog_root_hog_dict[i] = dbObj.hog_family(entrynr)
    else:
        # an empty OmaHOG means the entry is not part of any HOG
        omahog_root_hog_dict[i] = "Singleton"
df["rootHOG"] = df.OmaHOG.map(omahog_root_hog_dict)
df[:10]

Unnamed: 0,EntryNr,SeqBufferOffset,SeqBufferLength,OmaGroup,OmaHOG,Chromosome,LocusStart,LocusEnd,LocusStrand,AltSpliceVariant,CanonicalId,CDNABufferOffset,CDNABufferLength,MD5ProteinHash,DescriptionOffset,DescriptionLength,SubGenome,RootHogUpstream,RootHogDownStream,rootHOG
0,1,0,1466,13480,b'HOG:0031882.11a',b'',0,0,1,0,b'ENSAPLG00000013871',0,4396,b'0d827ea2708038ef1ee0df7123791889',0,79,b'',-1,-1,31882
1,2,1466,369,15138,b'HOG:0031873',b'',0,0,1,0,b'ENSAPLG00000014680',4396,1105,b'2cf92ae40944def4d9844c5aa9ce6f97',79,81,b'',-1,-1,31873
2,3,1835,76,5008,b'HOG:0027674',b'',0,0,1,0,b'ENSAPLG00000014980',5501,226,b'e157923ca71a4bc2ebf6e5d9156e177f',160,60,b'',-1,-1,27674
3,4,1911,488,10900,b'HOG:0038813',b'',0,0,1,0,b'ENSAPLG00000014983',5727,1462,b'55380f6839def91e558fe8b1f787777d',220,70,b'',-1,-1,38813
4,5,2399,206,3761,b'HOG:0010076.7b',b'',0,0,1,0,b'ENSAPLG00000014999',7189,616,b'27388db9c8b2c8770a95df096817ba42',290,54,b'',-1,-1,10076
5,6,2605,229,3267,b'HOG:0031885',b'',0,0,1,0,b'ENSAPLG00000015003',7805,685,b'e041820d5f2ceae936c28eb870bf0d40',344,53,b'',-1,-1,31885
6,7,2834,327,2374,b'HOG:0031879',b'',0,0,1,0,b'ENSAPLG00000015017',8490,979,b'54a9ed2b15dc6d3002ae49362fb898fa',397,70,b'',-1,-1,31879
7,8,3161,437,1708,b'HOG:0031861',b'',0,0,1,0,b'ENSAPLG00000015635',9469,1309,b'e5826ecb0f6980c475c36b7818ba173d',467,58,b'',-1,-1,31861
8,9,3598,204,0,b'',b'',0,0,1,0,b'ENSAPLG00000015852',10778,610,b'3aacbe8953647a8ae492e38c0b44d45b',525,63,b'',-1,-1,Singleton
9,10,3802,694,944,b'HOG:0038817',b'',0,0,1,0,b'ENSAPLG00000015894',11388,2080,b'15667cc30c4ee30211905ad7f59a8d48',588,63,b'',-1,-1,38817
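The rootHOG column can be cross-checked without a database call. In the IDs shown above, the root family number is the numeric part of the OmaHOG ID before the first dot (`HOG:0031882.11a` gives 31882, `HOG:0010076.7b` gives 10076). The sketch below assumes the `HOG:<digits>[.<suffix>]` pattern seen in this table holds for your database; `root_family` is a hypothetical helper, and `dbObj.hog_family()` remains the authoritative lookup.

```python
def root_family(oma_hog):
    """Return the root HOG family number encoded in an OmaHOG ID, or None for singletons.

    Assumes IDs look like b'HOG:0031882' or b'HOG:0031882.11a', as in the table above.
    """
    if isinstance(oma_hog, bytes):
        oma_hog = oma_hog.decode("utf-8")
    if not oma_hog:
        return None  # empty OmaHOG: the entry is not in any HOG
    if not oma_hog.startswith("HOG:"):
        raise ValueError("unexpected OmaHOG format: %r" % oma_hog)
    # drop the "HOG:" prefix and any subHOG suffix after the first dot
    return int(oma_hog[len("HOG:"):].split(".")[0])


for hog in [b"HOG:0031882.11a", b"HOG:0031873", b"HOG:0010076.7b", b""]:
    print(hog, root_family(hog))
```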


# Looking at specific root HOG

In [31]:
## Let's take a look at the first root HOG from the previous table (31882): when did it originate?
# Which taxonomic levels are represented in this family?
print(dbObj.hog_levels_of_fam(31882))

be ready to see PyTables asking for *lots* of memory and possibly slow
I/O.  You may want to reduce the rowsize by trimming the value of
dimensions that are orthogonal (and preferably close) to the *main*
dimension of this leave.  Alternatively, in case you have specified a
very small/large chunksize, you may want to increase/decrease it.


[b'Aves' b'Palaeognathae' b'Apteryx_rowi_308060' b'Apteryx_rowi_308060'
 b'Casuariiformes' b'Casuarius_casuarius_8787'
 b'Dromaius_novaehollandiae_8790' b'Struthio_camelus_internal'
 b'Struthio_camelus_8801' b'Tinamidae' b'Tinamus_guttatus_94827'
 b'Crypturellus' b'Crypturellus_undulatus_48396'
 b'Crypturellus_soui_458187' b'Nothoprocta'
 b'Nothoprocta_perdicaria_30464' b'Neognathae' b'Coliidae'
 b'Colius_striatus_57412' b'Coliidae' b'Urocolius_indicus_458196'
 b'Musophagidae' b'Corythaeola_cristata_103954' b'Musophagidae'
 b'Tauraco_erythrolophus_121530' b'Corythaixoides_concolor_103956'
 b'Procellariiformes' b'Procellariidae' b'Fulmarus_glacialis_30455'
 b'Procellariidae' b'Calonectris_borealis_1323832'
 b'Pelecanoides_urinatrix_37079' b'Thalassarche_chlororhynchos_54017'
 b'Hydrobatidae' b'Fregetta_grallaria_79628' b'Hydrobates_tethys_79633'
 b'Oceanites_oceanicus_79653' b'Procellariiformes' b'Procellariidae'
 b'Calonectris_borealis_1323832' b'Piciformes'
 b'Indicator_maculatus_5452

In [43]:
# All the members of that family (wrap in print() to display them)
dbObj.member_of_fam(31882)

# Print all the subHOG IDs at a particular level / ancestral node
hog_at_Coliidae_level = dbObj.get_subhogids_at_level(31882, "Coliidae")
print(hog_at_Coliidae_level)  ## You can see a duplication happened at the Coliidae root

# All HogLevel rows of this family; levels with more than one subHOG ID point to duplications
h5file2.root.HogLevel.read_where('Fam==31882')

[b'HOG:0031882.2a' b'HOG:0031882.2b']


array([(31882, b'HOG:0031882', b'Aves',  0.733, -1, 304,  True),
       (31882, b'HOG:0031882', b'Palaeognathae',  0.571, -1,   9, False),
       (31882, b'HOG:0031882.1a', b'Apteryx_rowi_308060',  1.   , -1,   1,  True),
       (31882, b'HOG:0031882.1b', b'Apteryx_rowi_308060',  1.   , -1,   1,  True),
       (31882, b'HOG:0031882', b'Casuariiformes',  1.   , -1,   2, False),
       (31882, b'HOG:0031882', b'Casuarius_casuarius_8787',  1.   , -1,   1, False),
       (31882, b'HOG:0031882', b'Dromaius_novaehollandiae_8790',  1.   , -1,   1, False),
       (31882, b'HOG:0031882', b'Struthio_camelus_internal',  0.5  , -1,   1, False),
       (31882, b'HOG:0031882', b'Struthio_camelus_8801',  1.   , -1,   1, False),
       (31882, b'HOG:0031882', b'Tinamidae',  0.444, -1,   4, False),
       (31882, b'HOG:0031882', b'Tinamus_guttatus_94827',  1.   , -1,   1, False),
       (31882, b'HOG:0031882', b'Crypturellus',  1.   , -1,   2, False),
       (31882, b'HOG:0031882', b'Crypturellus_undul
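To list the levels at which this family has more than one subHOG, as `Apteryx_rowi_308060` does above (`HOG:0031882.1a` and `.1b`), the HogLevel rows can be grouped by level. Following the Coliidae example, several subHOGs at one level point to a duplication. The sketch works on (ID, Level) pairs copied from the output; in the notebook you could build them with `zip(rows['ID'], rows['Level'])`, assuming those are the column names (check `h5file2.root.HogLevel.colnames`).

```python
from collections import defaultdict


def levels_with_several_subhogs(rows):
    """Map each taxonomic level to its distinct subHOG IDs, keeping only levels with more than one."""
    ids_per_level = defaultdict(set)
    for hog_id, level in rows:
        ids_per_level[level].add(hog_id)
    return {level: sorted(ids) for level, ids in ids_per_level.items() if len(ids) > 1}


# (ID, Level) pairs from the HogLevel output above
rows = [
    (b"HOG:0031882", b"Aves"),
    (b"HOG:0031882", b"Palaeognathae"),
    (b"HOG:0031882.1a", b"Apteryx_rowi_308060"),
    (b"HOG:0031882.1b", b"Apteryx_rowi_308060"),
    (b"HOG:0031882", b"Casuariiformes"),
]
print(levels_with_several_subhogs(rows))
```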

# Find extra information about a specific entry

In [15]:
entrynr = 12
entry = dbObj.entry_by_entry_nr(entrynr)
entry

(12, 4537, 421, 1794, b'HOG:0031851', b'', 0, 0, 1, 0, b'ENSAPLG00000015975', 13589, 1261, b'6051c18f5009f79d9c95cc5204a09a04', 696, 75, b'', -1, -1)
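The entry is returned as a positional record, which is why earlier cells index it by number (e.g. `entry[10]` for CanonicalId). Pairing the values with the column names makes it readable. The field list below is copied from the header of the entries dataframe shown earlier and may differ between database versions; `entry_as_dict` is a hypothetical helper. If the returned row is a NumPy structured record, `entry['CanonicalId']` may also work directly.

```python
# Column order of /Protein/Entries, copied from the entries dataframe header above
ENTRY_FIELDS = (
    "EntryNr", "SeqBufferOffset", "SeqBufferLength", "OmaGroup", "OmaHOG",
    "Chromosome", "LocusStart", "LocusEnd", "LocusStrand", "AltSpliceVariant",
    "CanonicalId", "CDNABufferOffset", "CDNABufferLength", "MD5ProteinHash",
    "DescriptionOffset", "DescriptionLength", "SubGenome", "RootHogUpstream",
    "RootHogDownStream",
)


def entry_as_dict(entry):
    """Pair each value of an Entries row with its column name."""
    values = tuple(entry)
    if len(values) != len(ENTRY_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(ENTRY_FIELDS), len(values)))
    return dict(zip(ENTRY_FIELDS, values))


entry = (12, 4537, 421, 1794, b'HOG:0031851', b'', 0, 0, 1, 0, b'ENSAPLG00000015975',
         13589, 1261, b'6051c18f5009f79d9c95cc5204a09a04', 696, 75, b'', -1, -1)
info = entry_as_dict(entry)
print(info["CanonicalId"], info["OmaHOG"], info["OmaGroup"])
```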

In [18]:
## If your proteins have locus coordinates, you can obtain them like this
entries = ['27585', '52263']
numbers = [int(x) for x in entries]

for entrynr in numbers:
    # one query per entry; 'entrynr' in the condition is taken from the loop variable
    row = entriesTab.read_where('EntryNr == entrynr')[0]
    print(row['LocusStart'], row['LocusEnd'])  # 0 0 in this dataset, i.e. no coordinates were stored

0 0
0 0


In [30]:
withinTab = h5file2.root.PairwiseRelation.S0003.VPairs
#What's in this table?
withinTab

/PairwiseRelation/S0003/VPairs (Table(3917336,), fletcher32, shuffle, zlib(6)) ''
  description := {
  "EntryNr1": UInt32Col(shape=(), dflt=0, pos=0),
  "EntryNr2": UInt32Col(shape=(), dflt=0, pos=1),
  "RelType": EnumCol(enum=Enum({'n/a': 6, '1:1': 0, '1:n': 1, 'm:1': 2, 'm:n': 3, 'close paralog': 4, 'homeolog': 5}), dflt='n/a', base=UInt8Atom(shape=(), dflt=0), shape=(), pos=2),
  "Score": Float32Col(shape=(), dflt=-1.0, pos=3),
  "Distance": Float32Col(shape=(), dflt=-1.0, pos=4),
  "AlignmentOverlap": Float16Col(shape=(), dflt=-1.0, pos=5),
  "SyntenyConservationLocal": Float16Col(shape=(), dflt=-1.0, pos=6),
  "Confidence": Float16Col(shape=(), dflt=-1.0, pos=7)}
  byteorder := 'little'
  chunkshape := (5698,)
  autoindex := True
  colindexes := {
    "EntryNr1": Index(9, full, shuffle, zlib(1)).is_csi=True}

In [19]:
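## To get the verified pairs (pairwise orthologs) of an entry
# entrynr is still 52263, the last value of the loop above
# Count them (not displayed, since only the last expression of a cell is shown)
dbObj.count_vpairs(entrynr)
# Get the list of them
dbObj.get_vpairs(entrynr)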

array([(52263,    2537, 6228.63,  0.7886, '1:1'),
       (52263,   27585, 6260.35,  0.5926, '1:1'),
       (52263,   80334, 6323.39,  0.1932, '1:1'),
       (52263,   98827, 6326.05,  0.1932, '1:1'),
       (52263,  110459, 6329.  ,  0.0494, '1:1'),
       (52263,  122602, 6068.23,  1.7785, '1:1'),
       (52263,  138989, 6366.45,  0.0494, '1:1'),
       (52263,  153208, 6283.01,  0.1452, '1:1'),
       (52263,  168285, 6341.73,  0.0494, '1:1'),
       (52263,  174431, 6207.29,  0.7383, '1:1'),
       (52263,  186515, 6283.01,  0.1452, '1:1'),
       (52263,  199313, 6341.73,  0.0494, '1:1'),
       (52263,  244208, 6228.63,  0.7886, '1:1'),
       (52263,  268721, 6329.  ,  0.0494, '1:1'),
       (52263,  293479, 6209.38,  0.5926, '1:1'),
       (52263,  301294, 5882.96,  2.3667, '1:1'),
       (52263,  316881, 6308.36,  0.1452, '1:1'),
       (52263,  347962, 5944.52,  2.528 , '1:1'),
       (52263,  349318, 4710.04, 11.0233, '1:1'),
       (52263,  392476, 6231.05,  0.7886, '1:n'),
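`get_vpairs` shows the relation type as a label ('1:1', '1:n'). If you read a VPairs table such as `withinTab` directly, the RelType column holds the integer codes of the enum listed in its description. The stdlib sketch below decodes and counts them with a mapping copied from that description; treat the mapping as specific to this database version. PyTables can also return the enum object itself via `withinTab.get_enum('RelType')`.

```python
from collections import Counter

# Enum copied from the RelType column description of the VPairs table above
REL_TYPE_ENUM = {'n/a': 6, '1:1': 0, '1:n': 1, 'm:1': 2, 'm:n': 3, 'close paralog': 4, 'homeolog': 5}
CODE_TO_REL_TYPE = {code: label for label, code in REL_TYPE_ENUM.items()}


def rel_type_label(code):
    """Translate a raw RelType code from a VPairs table into its label."""
    return CODE_TO_REL_TYPE.get(int(code), "unknown")


def count_rel_types(codes):
    """Count how often each relation type occurs among raw RelType codes."""
    return Counter(rel_type_label(code) for code in codes)


print([rel_type_label(c) for c in (0, 1, 4, 6)])
print(count_rel_types([0, 0, 1, 4]))
```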


In [21]:
# For each verified pair, print the genome code of the partner entry (second column)
for vp in dbObj.get_vpairs(entrynr):
    print(omaIdObj.genome_of_entry_nr(vp[1])[1].decode("utf-8"))

ANAPL
S0002
S0006
S0007
S0008
S0009
S0010
S0011
S0012
S0013
S0014
S0015
S0017
S0018
S0020
S0021
S0022
S0023
S0024
S0025
S0025
S0025
S0025
S0026
S0026
S0026
S0026
S0026
S0027
S0028
S0029
S0030
S0031
S0032
S0033
S0034
S0034
S0034
S0036
S0038
S0039
S0040
S0041
S0042
S0043
S0044
S0046
CHICK
S0048
S0049
S0049
S0050
S0051
S0052
S0053
S0054
S0054
S0054
S0056
S0057
S0058
S0059
S0060
S0061
S0062
S0063
S0063
S0065
S0066
S0067
S0068
S0069
S0070
S0071
S0073
S0074
S0075
S0076
S0077
S0078
S0079
S0079
S0079
S0079
S0079
S0079
S0080
S0081
S0083
S0084
S0085
S0086
S0087
S0088
S0089
S0090
S0091
S0092
S0093
S0094
S0095
S0096
S0097
S0097
S0098
S0099
S0100
S0100
S0100
S0100
S0101
S0102
S0102
S0102
S0103
S0103
S0103
S0103
S0103
S0103
S0104
S0105
S0106
S0106
S0106
S0107
S0108
S0109
S0110
S0110
S0110
S0111
S0111
S0111
S0112
S0112
S0112
S0112
S0112
S0112
S0112
S0112
S0112
S0112
S0113
S0114
S0115
S0116
S0117
S0118
S0118
S0118
S0118
S0118
S0118
S0118
S0118
S0118
S0118
S0118
S0119
S0120
S0121
S0122
S0123
S0124
S012