# Accessing the BBMRI dataset

In [1]:
import biu

## Initializing and examining the structure

In [2]:
bbmri = biu.db.BBMRI()
print(bbmri)

BBMRI object
 Where: /exports/molepi/BBMRISEQ
 Version: current
 Objects:
  * [ ] vcf[1]
  * [ ] vcf[2]
  * [ ] vcf[3]
  * [ ] vcf[4]
  * [ ] vcf[5]
  * [ ] vcf[6]
  * [ ] vcf[7]
  * [ ] vcf[8]
  * [ ] vcf[9]
  * [ ] vcf[10]
  * [ ] vcf[11]
  * [ ] vcf[12]
  * [ ] vcf[13]
  * [ ] vcf[14]
  * [ ] vcf[15]
  * [ ] vcf[16]
  * [ ] vcf[17]
  * [ ] vcf[18]
  * [ ] vcf[19]
  * [ ] vcf[20]
  * [ ] vcf[21]
  * [ ] vcf[22]
  * [ ] vcf[M]
  * [ ] vcf[X]
  * [ ] vcf[Y]
 Files:
  * [X] vcf_1 : /exports/molepi/BBMRISEQ/tbx/merged.bbmri.chr1.vcf.bgz
  * [X] vcf_1_tbi : /exports/molepi/BBMRISEQ/tbx/merged.bbmri.chr1.vcf.bgz.tbi
  * [X] vcf_2 : /exports/molepi/BBMRISEQ/tbx/merged.bbmri.chr2.vcf.bgz
  * [X] vcf_2_tbi : /exports/molepi/BBMRISEQ/tbx/merged.bbmri.chr2.vcf.bgz.tbi
  * [X] vcf_3 : /exports/molepi/BBMRISEQ/tbx/merged.bbmri.chr3.vcf.bgz
  * [X] vcf_3_tbi : /exports/molepi/BBMRISEQ/tbx/merged.bbmri.chr3.vcf.bgz.tbi
  * [X] vcf_4 : /exports/molepi/BBMRISEQ/tbx/merged.bbmri.chr4.vcf.bgz
  * [X] v

## Querying the structure

The BBMRI data are stored as tabix-indexed VCF files, one per chromosome, so we can use the `query` and `queryRegions` functions of the VCF structures to retrieve variants by genomic region.


In [3]:
for record in bbmri.query(1, 1000000, 1001000, types=['snp']):
    print(record)

Record(CHROM=1, POS=1000760, REF=G, ALT=[A])
Record(CHROM=1, POS=1000894, REF=A, ALT=[T])
Record(CHROM=1, POS=1000902, REF=G, ALT=[A])
Record(CHROM=1, POS=1000910, REF=C, ALT=[T])
Record(CHROM=1, POS=1000930, REF=G, ALT=[T])
Record(CHROM=1, POS=1000940, REF=T, ALT=[A])


D: Initializing the VCFResourceManager object NOW
D: VCF Input source is tabixed file.
D: VCF Input source is list of Records.
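
To make the region arguments concrete, here is a minimal, self-contained sketch of a region lookup over position-sorted records. It is not how biu or tabix implement the query. The function `query_region`, the toy `RECORDS` list, and the interval convention (start exclusive, end inclusive, which would match the `(12, 12630140-1, 12630140)` call used below to fetch a single position) are assumptions made for illustration. The first and last toy records are made up; the others reuse positions from the output above.

```python
from bisect import bisect_right

# Toy records as (chrom, pos, ref, alt), sorted by position.
# The first and last entries are hypothetical; they only exist to show filtering.
RECORDS = [
    (1, 999900, "C", "T"),    # hypothetical, before the window
    (1, 1000760, "G", "A"),
    (1, 1000894, "A", "T"),
    (1, 1000902, "G", "A"),
    (1, 1001500, "T", "C"),   # hypothetical, after the window
]


def query_region(records, chrom, start, end):
    """Return records on `chrom` with start < pos <= end (assumed convention)."""
    on_chrom = [r for r in records if r[0] == chrom]
    positions = [r[1] for r in on_chrom]
    lo = bisect_right(positions, start)  # index of first position > start
    hi = bisect_right(positions, end)    # index of first position > end
    return on_chrom[lo:hi]


hits = query_region(RECORDS, 1, 1000000, 1001000)
for chrom, pos, ref, alt in hits:
    print(chrom, pos, ref, alt)
```

Under this convention a single site at position `p` is requested as `(chrom, p - 1, p)`.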


In [4]:
#for record in bbmri.queryRegions([ (1, 1000000, 1111000), (2, 1000000, 1001000)], types=['snp']):
#    print(record)

In [5]:
#bbmri.queryRegions([ (1, 1000000, 1001000), (2, 1000000, 1001000)], types=['snp'], extract="summary")

In [6]:
v = bbmri.queryRegions([ (12, 12630140-1, 12630140)], types=['snp'])[0]
biu.formats.VCF.summary([v], altPos=[1])

D: Initializing the VCFResourceManager object NOW
D: VCF Input source is tabixed file.
D: VCF Input source is list of Records.


| | id | RR | R | RA | A | AA | O |
|---|---|---|---|---|---|---|---|
| 0 | 12-12630140-T-G | 96 | 0 | 2 | 0 | 0 | 0 |
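
The summary columns are not documented in this notebook. The sketch below shows one plausible reading: `RR`, `RA` and `AA` as diploid genotype counts, `R` and `A` as haploid calls, and `O` as anything else (missing calls or a different alternate allele). This interpretation, and the helper names `classify_genotype` and `summarize`, are assumptions; this is not the code behind `biu.formats.VCF.summary`.

```python
from collections import Counter

COLUMNS = ("RR", "R", "RA", "A", "AA", "O")


def classify_genotype(gt, alt_pos=1):
    """Map a VCF GT string such as '0/1' to one of the assumed summary columns."""
    alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    ref, alt = "0", str(alt_pos)
    if any(a not in (ref, alt) for a in alleles):
        return "O"  # missing call ('.') or another alternate allele
    n_alt = alleles.count(alt)
    if len(alleles) == 1:  # haploid call
        return "A" if n_alt else "R"
    if len(alleles) == 2:  # diploid call
        return ("RR", "RA", "AA")[n_alt]
    return "O"  # other ploidies are not handled in this sketch


def summarize(genotypes, alt_pos=1):
    """Count genotypes per column, always returning every column."""
    counts = Counter(classify_genotype(gt, alt_pos) for gt in genotypes)
    return {col: counts.get(col, 0) for col in COLUMNS}


# Hypothetical genotype calls, not taken from the BBMRI data.
toy = ["0/0"] * 5 + ["0/1", "1|0", "1/1", "0", "1", "./.", "0/2"]
print(summarize(toy))
```

With `alt_pos=2` the same calls are counted relative to the second alternate allele, so a `0/1` sample would land in `O`.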


In [7]:
for sample in v.samples:
    print(sample.sample)

GS000018437-ASM
GS000018438-ASM
GS000018439-ASM
GS000018440-ASM
GS000018441-ASM
GS000018442-ASM
GS000018443-ASM
GS000018444-ASM
GS000018445-ASM
GS000018535-ASM
GS000018536-ASM
GS000018537-ASM
GS000018538-ASM
GS000018539-ASM
GS000018540-ASM
GS000018541-ASM
GS000018577-ASM
GS000018578-ASM
GS000018579-ASM
GS000018580-ASM
GS000018581-ASM
GS000018583-ASM
GS000018584-ASM
GS000018585-ASM
GS000018586-ASM
GS000018587-ASM
GS000018588-ASM
GS000018589-ASM
GS000018590-ASM
GS000018591-ASM
GS000018592-ASM
GS000018593-ASM
GS000018594-ASM
GS000018595-ASM
GS000018596-ASM
GS000018597-ASM
GS000018598-ASM
GS000018651-ASM
GS000018685-ASM
GS000018686-ASM
GS000018687-ASM
GS000018688-ASM
GS000018689-ASM
GS000018690-ASM
GS000018691-ASM
GS000018692-ASM
GS000018693-ASM
GS000018694-ASM
GS000018695-ASM
GS000018696-ASM
GS000018697-ASM
GS000018698-ASM
GS000018699-ASM
GS000018700-ASM
GS000018701-ASM
GS000018702-ASM
GS000018703-ASM
GS000018704-ASM
GS000018779-ASM
GS000018780-ASM
GS000018781-ASM
GS000018782-ASM
GS000018

In [8]:
# This site is multi-allelic: Record(CHROM=1, POS=1010065, REF=G, ALT=[A, C])
var, altPos = bbmri.getVar(1, 1010065, 'C')
biu.formats.VCF.summary([var], altPos=[altPos])

D: VCF Input source is list of Records.


| | id | RR | R | RA | A | AA | O |
|---|---|---|---|---|---|---|---|
| 0 | 1-1010065-G-C | 96 | 0 | 1 | 0 | 0 | 1 |
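
For a multi-allelic site, the position of the requested allele within the ALT list matters, because VCF genotype fields number REF as 0 and the alternate alleles from 1 upward. The sketch below shows how such an index could be derived. That `getVar` returns `altPos` in this 1-based form is an assumption (suggested by `altPos=[1]` being used for the single-ALT record earlier), and `alt_index` is a hypothetical helper, not part of biu.

```python
def alt_index(alts, allele):
    """Return the 1-based index of `allele` in the ALT list, or None if absent."""
    try:
        return alts.index(allele) + 1  # GT numbering: 0 is REF, ALTs start at 1
    except ValueError:
        return None


alts = ["A", "C"]  # ALT alleles of the record at 1:1010065 noted in the cell above
print(alt_index(alts, "C"))
```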
