# Genomes

In [1]:
import biu
where = '/exports/molepi/tgehrmann/data/'
biu.config.settings.setWhere(where)

## Ensembl Genomes

You can access any genome on Ensembl with the biu.genomes.Ensembl class.
Given a release number and an organism name, it retrieves the GFF annotations, the genome sequence, and the coding-sequence and amino-acid FASTA files.

### List available organisms

In [2]:
biu.genomes.Ensembl.organisms()

Organisms in Ensembl, release 92:
 * ailuropoda_melanoleuca
 * anas_platyrhynchos
 * anolis_carolinensis
 * aotus_nancymaae
 * astyanax_mexicanus
 * bos_taurus
 * caenorhabditis_elegans
 * callithrix_jacchus
 * canis_familiaris
 * capra_hircus
 * carlito_syrichta
 * cavia_aperea
 * cavia_porcellus
 * cebus_capucinus
 * cercocebus_atys
 * chinchilla_lanigera
 * chlorocebus_sabaeus
 * choloepus_hoffmanni
 * ciona_intestinalis
 * ciona_savignyi
 * colobus_angolensis_palliatus
 * cricetulus_griseus_chok1gshd
 * cricetulus_griseus_crigri
 * danio_rerio
 * dasypus_novemcinctus
 * dipodomys_ordii
 * drosophila_melanogaster
 * echinops_telfairi
 * equus_caballus
 * erinaceus_europaeus
 * felis_catus
 * ficedula_albicollis
 * fukomys_damarensis
 * gadus_morhua
 * gallus_gallus
 * gasterosteus_aculeatus
 * gorilla_gorilla
 * heterocephalus_glaber_female
 * heterocephalus_glaber_male
 * homo_sapiens
 * ictidomys_tridecemlineatus
 * jaculus_jaculus
 * latimeria_chalumnae
 * lepisosteus_oculatus
 *

In [3]:
# Default is the GRCh38 human genome, release 92
genome = biu.genomes.Ensembl()

In [4]:
print(genome)

Ensembl object
 Genome : ensembl_92.homo_sapiens
 Objects:
  * [ ] gff
  * [ ] genome
  * [ ] cds
  * [ ] aa
 Files:
  * [ ] gff : /exports/molepi/tgehrmann/data/ensembl_92.homo_sapiens/genes.gff3
  * [ ] genome : /exports/molepi/tgehrmann/data/ensembl_92.homo_sapiens/dna.fasta
  * [ ] cds : /exports/molepi/tgehrmann/data/ensembl_92.homo_sapiens/cds.fa
  * [ ] aa : /exports/molepi/tgehrmann/data/ensembl_92.homo_sapiens/aa.fa
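
The paths in the listing above follow a regular pattern: a directory named `ensembl_<release>.<organism>` under the configured data root, holding four fixed file names. The sketch below rebuilds that layout in plain Python. `ensembl_paths` is a hypothetical helper, not part of biu, and the pattern is an assumption inferred only from the printed output.

```python
import posixpath

# File names exactly as they appear in the printed Ensembl object above.
ENSEMBL_FILES = {
    "gff": "genes.gff3",
    "genome": "dna.fasta",
    "cds": "cds.fa",
    "aa": "aa.fa",
}

def ensembl_paths(where, release=92, organism="homo_sapiens"):
    """Hypothetical helper: guess where each Ensembl file would be stored.

    The directory pattern is an assumption inferred from the notebook output.
    """
    directory = posixpath.join(where, "ensembl_%d.%s" % (release, organism))
    return {key: posixpath.join(directory, name)
            for key, name in ENSEMBL_FILES.items()}

paths = ensembl_paths("/exports/molepi/tgehrmann/data/")
for key, path in paths.items():
    print("%-6s : %s" % (key, path))
```

This can be handy for checking disk usage or cleaning up a single release by hand, as long as the directory pattern really matches what biu writes on your system.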



### Other genomes

In [5]:
# Load the mouse genome
genome = biu.genomes.Ensembl(organism='mus_musculus')
print(genome)

Ensembl object
 Genome : ensembl_92.mus_musculus
 Objects:
  * [ ] gff
  * [ ] genome
  * [ ] cds
  * [ ] aa
 Files:
  * [ ] gff : /exports/molepi/tgehrmann/data/ensembl_92.mus_musculus/genes.gff3
  * [ ] genome : /exports/molepi/tgehrmann/data/ensembl_92.mus_musculus/dna.fasta
  * [ ] cds : /exports/molepi/tgehrmann/data/ensembl_92.mus_musculus/cds.fa
  * [X] aa : /exports/molepi/tgehrmann/data/ensembl_92.mus_musculus/aa.fa
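
In the mouse listing above, the `aa` entry under `Files` is marked `[X]` while the others show `[ ]`. A plausible reading is that `[X]` marks files already present in the data directory; this is an assumption based on the output, not on biu's documentation. Under that assumption, the same kind of status list can be sketched by testing each path on disk. `file_status` is a hypothetical helper.

```python
import os
import tempfile

def file_status(paths):
    """Return one '[X]' or '[ ]' line per file, depending on whether it exists."""
    lines = []
    for key, path in paths.items():
        mark = "X" if os.path.exists(path) else " "
        lines.append("  * [%s] %s : %s" % (mark, key, path))
    return lines

# Demonstrate with a throwaway directory in which only aa.fa is present.
with tempfile.TemporaryDirectory() as tmp:
    paths = {name: os.path.join(tmp, fname)
             for name, fname in [("cds", "cds.fa"), ("aa", "aa.fa")]}
    open(paths["aa"], "w").close()  # create an empty aa.fa
    status = file_status(paths)
    print("\n".join(status))
```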



In [6]:
# Load the mouse genome, release 91
genome = biu.genomes.Ensembl(release=91, organism='mus_musculus')
print(genome)

Ensembl object
 Genome : ensembl_91.mus_musculus
 Objects:
  * [ ] gff
  * [ ] genome
  * [ ] cds
  * [ ] aa
 Files:
  * [ ] gff : /exports/molepi/tgehrmann/data/ensembl_91.mus_musculus/genes.gff3
  * [ ] genome : /exports/molepi/tgehrmann/data/ensembl_91.mus_musculus/dna.fasta
  * [ ] cds : /exports/molepi/tgehrmann/data/ensembl_91.mus_musculus/cds.fa
  * [ ] aa : /exports/molepi/tgehrmann/data/ensembl_91.mus_musculus/aa.fa



### GRCh37 Ensembl Genome
Ensembl maintains the GRCh37 build of the human genome separately. It can be accessed by passing grch37=True to the Ensembl class.

In [7]:
hg37 = biu.genomes.Ensembl(grch37=True)

In [8]:
print(hg37)

Ensembl object
 Genome : ensembl_grch37.92.homo_sapiens
 Objects:
  * [ ] gff
  * [ ] genome
  * [ ] cds
  * [ ] aa
 Files:
  * [ ] gff : /exports/molepi/tgehrmann/data/ensembl_grch37.92.homo_sapiens/genes.gff3
  * [ ] genome : /exports/molepi/tgehrmann/data/ensembl_grch37.92.homo_sapiens/dna.fasta
  * [ ] cds : /exports/molepi/tgehrmann/data/ensembl_grch37.92.homo_sapiens/cds.fa
  * [X] aa : /exports/molepi/tgehrmann/data/ensembl_grch37.92.homo_sapiens/aa.fa



In [9]:
#print(hg37.aa)

In [10]:
#hg37.aa['ENSP00000456546.1']

## Wormbase Genomes

You can also download the genomes available on WormBase. Note that the CDS file is not downloaded for WormBase genomes. Organisms currently defined in WormBase are:

In [11]:
biu.genomes.Wormbase.organisms()

Organisms in Wormbase
/exports/molepi/tgehrmann/data/_downloads/2a4190087a93236b6560fbb1faee17454ea483bd
0
 * Brugia_malayi
 * Caenorhabditis_angaria
 * Caenorhabditis_brenneri
 * Caenorhabditis_briggsae
 * Caenorhabditis_elegans
 * Caenorhabditis_japonica
 * Caenorhabditis_nigoni
 * Caenorhabditis_remanei
 * Caenorhabditis_sinica
 * Caenorhabditis_tropicalis
 * Onchocerca_volvulus
 * Pristionchus_pacificus
 * Panagrellus_redivivus
 * Strongyloides_ratti
 * Trichuris_muris
 * Romanomermis_culicivorax
 * Soboliphyme_baturini
 * Trichinella_britovi
 * Trichinella_murrelli
 * Trichinella_nativa
 * Trichinella_nelsoni
 * Trichinella_papuae
 * Trichinella_patagoniensis
 * Trichinella_pseudospiralis
 * Trichinella_sp._T6
 * Trichinella_sp._T8
 * Trichinella_sp._T9
 * Trichinella_spiralis
 * Trichinella_zimbabwensis
 * Trichuris_suis
 * Trichuris_trichiura
 * Acanthocheilonema_viteae
 * Anisakis_simplex
 * Ascaris_lumbricoides
 * Ascaris_suum
 * Brugia_pahangi
 * Brugia_timori
 * Dirofilaria_

In [12]:
worm = biu.genomes.Wormbase()
print(worm)

/exports/molepi/tgehrmann/data/_downloads/2a4190087a93236b6560fbb1faee17454ea483bd
0
Wormbase object
 Genome : wormbase_Caenorhabditis_elegans
 Objects:
  * [ ] gff
  * [ ] genome
  * [ ] aa
 Files:
  * [ ] gff : /exports/molepi/tgehrmann/data/wormbase_Caenorhabditis_elegans/genes.gff3
  * [ ] genome : /exports/molepi/tgehrmann/data/wormbase_Caenorhabditis_elegans/genome.fasta
  * [X] aa : /exports/molepi/tgehrmann/data/wormbase_Caenorhabditis_elegans/aa.fasta



In [13]:
print(worm.aa)

D: Fasta input source is file


Fasta object
 Where: /exports/molepi/tgehrmann/data/wormbase_Caenorhabditis_elegans/aa.fasta
 Entries: 28178
 Primary type: prot
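
The `Entries` figure is presumably the number of records in the FASTA file. In the FASTA format every record begins with a header line starting with `>`, so the count can be cross-checked without biu. The sketch below is a plain-Python stand-in, not biu's own parser, and `count_fasta_entries` is a hypothetical name.

```python
import os
import tempfile

def count_fasta_entries(path):
    """Count FASTA records by counting header lines (those starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))

# Small self-contained demonstration with two made-up protein records.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">protein_1\nMKV\nLAA\n>protein_2\nMST\n")
    demo_path = tmp.name

n_entries = count_fasta_entries(demo_path)
os.remove(demo_path)  # clean up the temporary file
print(n_entries)
```

Pointing the function at the `aa.fasta` path shown above should give a number comparable to the `Entries` line, provided biu counts records the same way.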



## Flybase
You can also download the genomes available on FlyBase.

In [14]:
biu.genomes.Flybase.organisms()

Organisms in Flybase, release FB2018_03:
 * dana_r1.05
 * dere_r1.05
 * dgri_r1.05
 * dmel_r6.22
 * dmoj_r1.04
 * dper_r1.3
 * dpse_r3.04
 * dsec_r1.3
 * dsim_r2.02
 * dvir_r1.06
 * dwil_r1.05
 * dyak_r1.05
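
The identifiers in this list appear to combine a species abbreviation with an annotation release, joined by `_r` (for example `dmel_r6.22`). A minimal sketch for splitting them follows. `split_flybase_id` is a hypothetical helper, not a biu function, and the naming pattern is inferred from the output above.

```python
def split_flybase_id(identifier):
    """Split an identifier such as 'dmel_r6.22' into ('dmel', '6.22')."""
    species, sep, release = identifier.partition("_r")
    if not sep or not species or not release:
        raise ValueError("unexpected identifier: %r" % identifier)
    return species, release

# A few identifiers copied from the list above.
ids = ["dana_r1.05", "dmel_r6.22", "dper_r1.3"]
parsed = dict(split_flybase_id(i) for i in ids)
print(parsed)
```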


In [15]:
fly = biu.genomes.Flybase()
print(fly)

Flybase object
 Genome : flybase_FB2018_03.dmel_r6.22
 Objects:
  * [ ] gff
  * [ ] genome
  * [ ] cds
  * [ ] aa
 Files:
  * [ ] gff : /exports/molepi/tgehrmann/data/flybase_FB2018_03.dmel_r6.22/genes.gff3
  * [ ] genome : /exports/molepi/tgehrmann/data/flybase_FB2018_03.dmel_r6.22/dna.fasta
  * [ ] cds : /exports/molepi/tgehrmann/data/flybase_FB2018_03.dmel_r6.22/cds.fa
  * [X] aa : /exports/molepi/tgehrmann/data/flybase_FB2018_03.dmel_r6.22/aa.fa

