# Genomes

In [16]:
import biu
where = '/exports/molepi/tgehrmann/data/'
biu.config.settings.setWhere(where)

## Ensembl Genomes

You can access any genome on Ensembl with the Ensembl class (biu.genomes.Ensembl).
Given a release number and an organism name, it retrieves the GFF annotations and the genome, coding sequence and amino acid Fasta files.

In [18]:
# Default is the GRCh38 human genome, release 92
genome = biu.genomes.Ensembl()

{'gff': <biu.utils.acquireUtils.Acquire object at 0x7f26b6429098>, 'genome': <biu.utils.acquireUtils.Acquire object at 0x7f26b249f908>, 'cds': <biu.utils.acquireUtils.Acquire object at 0x7f26b249fa48>, 'aa': <biu.utils.acquireUtils.Acquire object at 0x7f26b249fb88>}


In [4]:
print(genome)

Ensembl object
 Genome : ensembl_92.homo_sapiens
 Objects:
  * [ ] gff
  * [ ] genome
  * [ ] cds
  * [ ] aa
 Files:
  * [X] gff : /home/tgehrmann/repos/BIU/docs/ensembl_92.homo_sapiens/genes.gff3
  * [X] genome : /home/tgehrmann/repos/BIU/docs/ensembl_92.homo_sapiens/dna.fasta
  * [ ] cds : /home/tgehrmann/repos/BIU/docs/ensembl_92.homo_sapiens/cds.fa
  * [ ] aa : /home/tgehrmann/repos/BIU/docs/ensembl_92.homo_sapiens/aa.fa
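
The `[X]` marks in the Files listing appear to indicate which files are already present on disk. That reading is an assumption based on the output above. A minimal sketch of how such a listing could be produced in plain Python follows; `file_status` is a hypothetical helper, not part of BIU.

```python
import os
import tempfile

def file_status(files):
    """Return one '[X]' / '[ ]' line per entry, marking paths that exist on disk.

    `files` maps a short name (e.g. 'gff') to a file path.
    Hypothetical helper for illustration only; not BIU's implementation.
    """
    lines = []
    for name, path in files.items():
        mark = "X" if os.path.exists(path) else " "
        lines.append("  * [%s] %s : %s" % (mark, name, path))
    return lines

# Demonstration in a temporary directory: only the gff file is created.
with tempfile.TemporaryDirectory() as d:
    files = {
        "gff": os.path.join(d, "genes.gff3"),
        "aa": os.path.join(d, "aa.fa"),
    }
    open(files["gff"], "w").close()
    for line in file_status(files):
        print(line)
```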



### Other genomes

In [21]:
# Load the mouse genome
genome = biu.genomes.Ensembl(organism='mus_musculus')
print(genome)

{'gff': <biu.utils.acquireUtils.Acquire object at 0x7f26b641d9f8>, 'genome': <biu.utils.acquireUtils.Acquire object at 0x7f26b249f9f8>, 'cds': <biu.utils.acquireUtils.Acquire object at 0x7f26b249f868>, 'aa': <biu.utils.acquireUtils.Acquire object at 0x7f26b249f778>}
Ensembl object
 Genome : ensembl_92.mus_musculus
 Objects:
  * [ ] gff
  * [ ] genome
  * [ ] cds
  * [ ] aa
 Files:
  * [ ] gff : /exports/molepi/tgehrmann/data/ensembl_92.mus_musculus/genes.gff3
  * [ ] genome : /exports/molepi/tgehrmann/data/ensembl_92.mus_musculus/dna.fasta
  * [ ] cds : /exports/molepi/tgehrmann/data/ensembl_92.mus_musculus/cds.fa
  * [ ] aa : /exports/molepi/tgehrmann/data/ensembl_92.mus_musculus/aa.fa



In [22]:
# Load the mouse genome, release 91
genome = biu.genomes.Ensembl(release=91, organism='mus_musculus')
print(genome)

{'gff': <biu.utils.acquireUtils.Acquire object at 0x7f26b6412098>, 'genome': <biu.utils.acquireUtils.Acquire object at 0x7f26b641d278>, 'cds': <biu.utils.acquireUtils.Acquire object at 0x7f26b249f728>, 'aa': <biu.utils.acquireUtils.Acquire object at 0x7f26b249fbd8>}
Ensembl object
 Genome : ensembl_91.mus_musculus
 Objects:
  * [ ] gff
  * [ ] genome
  * [ ] cds
  * [ ] aa
 Files:
  * [ ] gff : /exports/molepi/tgehrmann/data/ensembl_91.mus_musculus/genes.gff3
  * [ ] genome : /exports/molepi/tgehrmann/data/ensembl_91.mus_musculus/dna.fasta
  * [ ] cds : /exports/molepi/tgehrmann/data/ensembl_91.mus_musculus/cds.fa
  * [ ] aa : /exports/molepi/tgehrmann/data/ensembl_91.mus_musculus/aa.fa



### GRCh37 Ensembl Genome
Ensembl maintains the GRCh37 build of the human genome separately. It can be accessed with a dedicated class, GRCH37Ensembl.

In [7]:
hg37 = biu.genomes.GRCH37Ensembl()

{'gff': <biu.utils.acquireUtils.Acquire object at 0x7f26b6423138>, 'genome': <biu.utils.acquireUtils.Acquire object at 0x7f26b6429278>, 'cds': <biu.utils.acquireUtils.Acquire object at 0x7f26b6429408>, 'aa': <biu.utils.acquireUtils.Acquire object at 0x7f26b6429548>}


In [8]:
print(hg37)

GRCH37Ensembl object
 Genome : ensembl_grch37.92.homo_sapiens
 Objects:
  * [ ] gff
  * [ ] genome
  * [ ] cds
  * [ ] aa
 Files:
  * [ ] gff : /home/tgehrmann/repos/BIU/docs/ensembl_grch37.92.homo_sapiens/genes.gff3
  * [ ] genome : /home/tgehrmann/repos/BIU/docs/ensembl_grch37.92.homo_sapiens/dna.fasta
  * [ ] cds : /home/tgehrmann/repos/BIU/docs/ensembl_grch37.92.homo_sapiens/cds.fa
  * [ ] aa : /home/tgehrmann/repos/BIU/docs/ensembl_grch37.92.homo_sapiens/aa.fa



In [9]:
print(hg37.aa)

/home/tgehrmann/repos/BIU/docs/_downloads/7a694787ab49b75d81ed89d378150f56718043d3


D: curl -L  'ftp://ftp.ensembl.org//pub/grch37/release-92/fasta/homo_sapiens/pep/Homo_sapiens.GRCh37.pep.all.fa.gz' > '/home/tgehrmann/repos/BIU/docs/_downloads/7a694787ab49b75d81ed89d378150f56718043d3'


0


D: gunzip < '/home/tgehrmann/repos/BIU/docs/_downloads/7a694787ab49b75d81ed89d378150f56718043d3' > '/home/tgehrmann/repos/BIU/docs/_downloads/7a694787ab49b75d81ed89d378150f56718043d3.gunzipped'


0


D: cp '/home/tgehrmann/repos/BIU/docs/_downloads/7a694787ab49b75d81ed89d378150f56718043d3.gunzipped' '/home/tgehrmann/repos/BIU/docs/ensembl_grch37.92.homo_sapiens/aa.fa'


0


D: Fasta input source is file


Fasta object
 Where: /home/tgehrmann/repos/BIU/docs/ensembl_grch37.92.homo_sapiens/aa.fa
 Entries: 104763
 Primary type: prot



In [15]:
hg37.aa['ENSP00000456546.1']

<biu.formats.seqUtils.Sequence at 0x7f26b63fa2d0>
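
The session above suggests that the amino acid file was downloaded and parsed only when `hg37.aa` was first accessed. A minimal sketch of that lazy-loading pattern in plain Python follows. `LazyResource` and `load_sequences` are hypothetical names, and nothing here is taken from BIU's source.

```python
class LazyResource:
    """Hypothetical holder that builds its value on first access only."""

    def __init__(self, loader):
        self._loader = loader   # function that produces the value
        self._value = None
        self._loaded = False

    @property
    def loaded(self):
        return self._loaded

    def get(self):
        # Run the (possibly expensive) loader once, then reuse the result.
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value

calls = []

def load_sequences():
    # Stand-in for downloading and parsing a Fasta file.
    calls.append("load")
    return {"seq1": "MKV"}

aa = LazyResource(load_sequences)
print(aa.loaded)          # False
print(aa.get()["seq1"])   # MKV
print(aa.loaded)          # True
aa.get()
print(len(calls))         # 1
```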