# Downloading Mappings from BioMart

BioMart provides a large collection of mappings between Ensembl IDs and other identifiers, such as Entrez gene IDs, UCSC IDs and HGNC symbols. You can access them through the `biu.maps.BioMart` class.

In [1]:
import biu

## Predefined Queries

In [14]:
biu.maps.BioMart.versions()

Pre-defined BioMart queries:
 * hsapiens_gene_trans_prot_geneid_hgnc
  - database: hsapiens_gene_ensembl
  - attributes: ensembl_gene_id,ensembl_transcript_id,ensembl_peptide_id,entrezgene,ucsc,hgnc_symbol
 * mmusculus_gene_trans_prot_hgnc
  - database: mmusculus_gene_ensembl
  - attributes: ensembl_gene_id,ensembl_transcript_id,ensembl_peptide_id,hgnc_symbol
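Each predefined query above is just a BioMart database name plus a list of attributes. As a rough illustration, and not biu's actual implementation, the two listed entries could be held in a plain dictionary. `PREDEFINED_QUERIES` and `resolve_query` are hypothetical names.

```python
# Hypothetical registry mirroring the predefined queries listed above.
# The names and attributes are copied from the versions() output;
# the data structure itself is an assumption.
PREDEFINED_QUERIES = {
    "hsapiens_gene_trans_prot_geneid_hgnc": (
        "hsapiens_gene_ensembl",
        ["ensembl_gene_id", "ensembl_transcript_id", "ensembl_peptide_id",
         "entrezgene", "ucsc", "hgnc_symbol"],
    ),
    "mmusculus_gene_trans_prot_hgnc": (
        "mmusculus_gene_ensembl",
        ["ensembl_gene_id", "ensembl_transcript_id", "ensembl_peptide_id",
         "hgnc_symbol"],
    ),
}


def resolve_query(name):
    """Return (database, attributes) for a predefined query name."""
    try:
        database, attributes = PREDEFINED_QUERIES[name]
    except KeyError:
        raise ValueError(f"Unknown predefined query: {name!r}") from None
    # Return a copy so callers cannot mutate the registry.
    return database, list(attributes)


db, attrs = resolve_query("mmusculus_gene_trans_prot_hgnc")
print(db)         # mmusculus_gene_ensembl
print(attrs[-1])  # hgnc_symbol
```

Seen this way, a predefined query is a shorthand for the same `database` and `attributes` pair that a custom query takes explicitly.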


In [3]:
bmm = biu.maps.BioMart('mmusculus_gene_trans_prot_hgnc')
print(bmm)

BioMart object
 Objects:
  * [ ] ensembl_gene_id
  * [ ] ensembl_transcript_id
  * [ ] ensembl_peptide_id
  * [ ] hgnc_symbol
 Files:
  * [ ] db : /home/tgehrmann/repos/BIU/docs/bioMart_mmusculus_gene_trans_prot_hgnc/data.tsv



## Accessing mappings in the BioMart response

The results of the BioMart response are provided as `biu.formats.TSVIndex` structures, each indexed on a specific column. Each column is available as an attribute of the BioMart object, and indexing it with a value returns the list of rows that have that value in that column. For example:

In [4]:
bmm.ensembl_transcript_id['ENSMUST00000082423']

[TSVIndexRow(ensembl_gene_id='ENSMUSG00000064372', ensembl_transcript_id='ENSMUST00000082423', ensembl_peptide_id='', hgnc_symbol='')]

In [5]:
bmm.ensembl_gene_id['ENSMUSG00000064372']

[TSVIndexRow(ensembl_gene_id='ENSMUSG00000064372', ensembl_transcript_id='ENSMUST00000082423', ensembl_peptide_id='', hgnc_symbol='')]
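The lookups above behave like a dictionary from a column value to the list of rows containing it. The sketch below builds such an index over a few rows copied from this notebook's output. `Row` and `build_index` are hypothetical stand-ins, not the `TSVIndex` implementation.

```python
from collections import defaultdict, namedtuple

# Hypothetical row type modelled on the TSVIndexRow output above.
Row = namedtuple("Row", ["ensembl_gene_id", "ensembl_transcript_id",
                         "ensembl_peptide_id", "hgnc_symbol"])

# A few rows copied from the mouse mapping table in this notebook.
rows = [
    Row("ENSMUSG00000064372", "ENSMUST00000082423", "", ""),
    Row("ENSMUSG00000064371", "ENSMUST00000082422", "", ""),
    Row("ENSMUSG00000064370", "ENSMUST00000082421", "ENSMUSP00000081003", ""),
]


def build_index(rows, column):
    """Map each value in `column` to the list of rows holding that value.

    Empty strings (missing values) are indexed like any other value.
    """
    index = defaultdict(list)
    for row in rows:
        index[getattr(row, column)].append(row)
    return dict(index)


by_transcript = build_index(rows, "ensembl_transcript_id")
by_gene = build_index(rows, "ensembl_gene_id")
print(by_transcript["ENSMUST00000082423"])
print(by_gene["ENSMUSG00000064370"][0].ensembl_peptide_id)  # ENSMUSP00000081003
```

Because a gene can have several transcripts and proteins, each lookup returns a list, even when only one row matches.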

### View all mappings as a table

In [13]:
bmm.ensembl_gene_id.table[:20]

| | ensembl_gene_id | ensembl_transcript_id | ensembl_peptide_id | hgnc_symbol |
|---|---|---|---|---|
| 0 | ENSMUSG00000064372 | ENSMUST00000082423 | | |
| 1 | ENSMUSG00000064371 | ENSMUST00000082422 | | |
| 2 | ENSMUSG00000064370 | ENSMUST00000082421 | ENSMUSP00000081003 | |
| 3 | ENSMUSG00000064369 | ENSMUST00000082420 | | |
| 4 | ENSMUSG00000064368 | ENSMUST00000082419 | ENSMUSP00000081002 | |
| 5 | ENSMUSG00000064367 | ENSMUST00000082418 | ENSMUSP00000081001 | |
| 6 | ENSMUSG00000064366 | ENSMUST00000082417 | | |
| 7 | ENSMUSG00000064365 | ENSMUST00000082416 | | |
| 8 | ENSMUSG00000064364 | ENSMUST00000082415 | | |
| 9 | ENSMUSG00000064363 | ENSMUST00000082414 | ENSMUSP00000081000 | |


## Get the mappings for GRCh37

Only the human genome is available for GRCh37. To load it, pass `grch37=True`:

In [7]:
bm = biu.maps.BioMart(grch37=True)

In [8]:
print(bm)

BioMart object
 Objects:
  * [ ] ensembl_gene_id
  * [ ] ensembl_transcript_id
  * [ ] ensembl_peptide_id
  * [ ] entrezgene
  * [ ] ucsc
  * [ ] hgnc_symbol
 Files:
  * [ ] db : /home/tgehrmann/repos/BIU/docs/bioMart_hsapiens_gene_trans_prot_geneid_hgnc/data.tsv



In [9]:
bm.ensembl_peptide_id.lookup('ENSP00000456546', singleton=True)

TSVIndexRow(ensembl_gene_id='ENSG00000261657', ensembl_transcript_id='ENST00000566782', ensembl_peptide_id='ENSP00000456546', entrezgene='115286', ucsc='', hgnc_symbol='SLC25A26')
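With `singleton=True`, `lookup` returns a single `TSVIndexRow` instead of a list. The sketch below is a hypothetical version of that behaviour. This notebook does not show how biu handles zero or multiple matches, so raising `LookupError` in those cases is an assumption. `Row`, `peptide_index` and `lookup` are illustrative names.

```python
from collections import namedtuple

# Hypothetical row type; the values are copied from the GRCh37 output above.
Row = namedtuple("Row", ["ensembl_gene_id", "ensembl_peptide_id", "hgnc_symbol"])

# Index from peptide ID to the list of matching rows.
peptide_index = {
    "ENSP00000456546": [Row("ENSG00000261657", "ENSP00000456546", "SLC25A26")],
}


def lookup(index, value, singleton=False):
    """Return all rows for `value`, or exactly one row when singleton=True."""
    matches = index.get(value, [])
    if not singleton:
        return matches
    if len(matches) != 1:
        # Assumption: zero or several matches are treated as an error.
        raise LookupError(f"expected one row for {value!r}, found {len(matches)}")
    return matches[0]


row = lookup(peptide_index, "ENSP00000456546", singleton=True)
print(row.hgnc_symbol)  # SLC25A26
```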

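## Define your own queries

You can also build a query by passing a BioMart `database` name and a list of `attributes` directly: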

In [10]:
bmDM = biu.maps.BioMart(database='dmelanogaster_gene_ensembl',
                        attributes=['ensembl_gene_id', 'ensembl_transcript_id', 'ensembl_peptide_id'])

In [11]:
print(bmDM)

BioMart object
 Objects:
  * [ ] ensembl_gene_id
  * [ ] ensembl_transcript_id
  * [ ] ensembl_peptide_id
 Files:
  * [ ] db : /home/tgehrmann/repos/BIU/docs/bioMart_dmelanogaster_gene_ensembl.ensembl_gene_id_ensembl_transcript_id_ensembl_peptide_id/data.tsv
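A custom query is defined by a dataset (`database`) and its `attributes`. One way to express the same request directly against Ensembl's BioMart REST endpoint (`/biomart/martservice`) is an XML query passed as the `query` URL parameter. The sketch below only builds that URL and does not download anything. The function name `biomart_query_url`, the chosen XML options (`formatter`, `header`, `uniqueRows`) and the use of `grch37.ensembl.org` for `grch37=True` are assumptions, and this is not necessarily how biu performs its downloads.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumed Ensembl hosts: current release and the GRCh37 archive.
ENSEMBL_HOST = "https://www.ensembl.org"
GRCH37_HOST = "https://grch37.ensembl.org"


def biomart_query_url(database, attributes, grch37=False):
    """Build a martservice URL requesting `attributes` from `database` as TSV."""
    # quoteattr() escapes the value and wraps it in quotes.
    attribute_xml = "".join(f"<Attribute name={quoteattr(a)} />" for a in attributes)
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">'
        f'<Dataset name={quoteattr(database)} interface="default">'
        f"{attribute_xml}"
        "</Dataset>"
        "</Query>"
    )
    host = GRCH37_HOST if grch37 else ENSEMBL_HOST
    return f"{host}/biomart/martservice?" + urlencode({"query": xml})


url = biomart_query_url(
    "dmelanogaster_gene_ensembl",
    ["ensembl_gene_id", "ensembl_transcript_id", "ensembl_peptide_id"],
)
print(url)
# Fetching the URL (e.g. with urllib.request.urlopen) would require network access.
```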

