### 1. Creating a support function to perform a web request
We start by importing the requests library and specifying the root URL of the server. We then define a small function that takes the service to be called (see the following examples) and builds the complete URL. Positional arguments are appended to the URL path, keyword arguments are sent as query parameters, and the request declares the JSON content type so that the server answers in JSON by default. The function returns the decoded JSON response, which is typically a nested Python data structure of lists and dictionaries.

In [1]:
import requests
 
ensembl_server = 'http://rest.ensembl.org'

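def do_request(server, service, *args, **kwargs):
    """Call an Ensembl REST service and return the decoded JSON response."""
    # Positional arguments become extra path segments; None values are skipped
    url_params = ''
    for a in args:
        if a is not None:
            url_params += '/' + a
    # Keyword arguments are sent as query-string parameters
    req = requests.get('%s/%s%s' % (server, service, url_params),
                       params=kwargs,
                       headers={'Content-Type': 'application/json'})

    # Raise an HTTPError for 4xx/5xx responses
    req.raise_for_status()
    return req.json()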
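
To see what `do_request` sends, the sketch below reproduces its URL assembly without touching the network. `build_url` is a hypothetical helper introduced only for illustration. It uses the standard library's `urlencode` to approximate the query-string encoding that requests applies to `params`, so the exact encoding on the wire may differ slightly.

```python
from urllib.parse import urlencode


def build_url(server, service, *args, **kwargs):
    """Hypothetical helper: mirror how do_request assembles its URL."""
    # Positional arguments become path segments; None values are skipped
    path = ''.join('/' + a for a in args if a is not None)
    url = '%s/%s%s' % (server, service, path)
    # Keyword arguments become the query string (approximation of requests)
    if kwargs:
        url += '?' + urlencode(kwargs)
    return url


print(build_url('http://rest.ensembl.org', 'lookup/symbol', 'homo_sapiens', 'LCT'))
print(build_url('http://rest.ensembl.org', 'info/external_dbs', 'homo_sapiens',
                filter='HGNC%'))
```

Printing the URL first is a cheap way to check a call against the REST documentation before actually issuing it.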

### 2. List all the species available on the server

In [2]:
answer = do_request(ensembl_server, 'info/species')
for i, sp in enumerate(answer['species']):
    print(i, sp['name'])

0 mus_musculus_c3hhej
1 nomascus_leucogenys
2 sinocyclocheilus_grahami
3 serinus_canaria
4 amphiprion_percula
5 cyprinodon_variegatus
6 felis_catus
7 canis_lupus_familiaris
8 bubo_bubo
9 vombatus_ursinus
10 salmo_trutta
11 otolemur_garnettii
12 sinocyclocheilus_rhinocerous
13 mus_musculus_pwkphj
14 scleropages_formosus
15 sus_scrofa_hampshire
16 leptobrachium_leishanense
17 athene_cunicularia
18 octodon_degus
19 naja_naja
20 pelodiscus_sinensis
21 melopsittacus_undulatus
22 labrus_bergylta
23 oreochromis_niloticus
24 ovis_aries
25 colobus_angolensis_palliatus
26 caenorhabditis_elegans
27 gouania_willdenowi
28 mastacembelus_armatus
29 mus_musculus_129s1svimj
30 numida_meleagris
31 vulpes_vulpes
32 sus_scrofa_bamei
33 pongo_abelii
34 mus_musculus_akrj
35 malurus_cyaneus_samueli
36 sander_lucioperca
37 choloepus_hoffmanni
38 mus_musculus_fvbnj
39 poecilia_reticulata
40 mus_musculus_c57bl6nj
41 latimeria_chalumnae
42 takifugu_rubripes
43 petromyzon_marinus
44 monodelphis_domestica
45 gaste
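
Because the species list is long, it is often more practical to filter it than to read it in full. The following is a minimal sketch that runs offline on an abbreviated stand-in for `answer['species']`. The names are copied from the output above, while the real entries carry more keys than `name`.

```python
# Abbreviated stand-in for answer['species']; names copied from the output above
species = [
    {'name': 'mus_musculus_c3hhej'},
    {'name': 'nomascus_leucogenys'},
    {'name': 'felis_catus'},
    {'name': 'mus_musculus_pwkphj'},
    {'name': 'mus_musculus_akrj'},
]

# Keep only the entries whose name starts with the mouse prefix
mouse_strains = sorted(sp['name'] for sp in species
                       if sp['name'].startswith('mus_musculus'))
print(mouse_strains)
```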

### 3. Find any HGNC databases on the server related to human data

In [3]:
ext_dbs = do_request(ensembl_server, 'info/external_dbs', 'homo_sapiens', filter='HGNC%')
print(ext_dbs)

[{'display_name': 'HGNC Symbol', 'name': 'HGNC', 'description': None, 'release': '1'}, {'release': '1', 'display_name': 'Transcript name', 'name': 'HGNC_trans_name', 'description': 'transcript name from HGNC'}]
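
The `filter='HGNC%'` argument appears to behave like an SQL `LIKE` pattern, with `%` matching any suffix. The sketch below imitates that behavior on the client side, under that assumption. `matches_pattern` is a hypothetical helper, and the database names are `name` and `dbname` values that appear in the outputs of this section.

```python
def matches_pattern(name, pattern):
    """Hypothetical client-side check: a trailing '%' means 'any suffix'."""
    if pattern.endswith('%'):
        return name.startswith(pattern[:-1])
    return name == pattern


# Database names taken from outputs shown in this section
db_names = ['HGNC', 'HGNC_trans_name', 'EntrezGene', 'GeneCards', 'GO']
hgnc_dbs = [n for n in db_names if matches_pattern(n, 'HGNC%')]
print(hgnc_dbs)
```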


### 4. Retrieve the Ensembl ID for the LCT gene

In [4]:
answer = do_request(ensembl_server, 'lookup/symbol', 'homo_sapiens', 'LCT')
print(answer)
lct_id = answer['id']

{'end': 135837184, 'id': 'ENSG00000115850', 'seq_region_name': '2', 'biotype': 'protein_coding', 'start': 135787850, 'display_name': 'LCT', 'version': 10, 'db_type': 'core', 'canonical_transcript': 'ENST00000264162.7', 'assembly_name': 'GRCh38', 'object_type': 'Gene', 'description': 'lactase [Source:HGNC Symbol;Acc:HGNC:6530]', 'source': 'ensembl_havana', 'species': 'homo_sapiens', 'logic_name': 'ensembl_havana_gene_homo_sapiens', 'strand': -1}
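
The lookup record already holds enough information to derive the gene's span. The sketch below works on a subset of the fields printed above and assumes that `start` and `end` are 1-based, inclusive coordinates. The variable names are illustrative only.

```python
# Subset of the lookup response shown above
gene = {'seq_region_name': '2', 'start': 135787850, 'end': 135837184, 'strand': -1}

# With inclusive coordinates the span is end - start + 1
gene_length = gene['end'] - gene['start'] + 1

# Compact region string of the form chromosome:start-end
region = '%s:%d-%d' % (gene['seq_region_name'], gene['start'], gene['end'])
print(region, gene_length)
```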


### 5. Get the sequence of the region containing the gene

In [5]:
lct_seq = do_request(ensembl_server, 'sequence/id', lct_id)
print(lct_seq)

{'molecule': 'dna', 'version': 10, 'desc': 'chromosome:GRCh38:2:135787850:135837184:-1', 'seq': 'AACAGTTCCTAGAAAATGGAGCTGTCTTGGCATGTAGTCTTTATTGCCCTGCTAAGTTTTTCATGCTGGGGGTCAGACTGGGAGTCTGATAGAAATTTCATTTCCACCGCTGGTCCTCTAACCAATGACTTGCTGCACAACCTGAGTGGTCTCCTGGGAGACCAGAGTTCTAACTTTGTAGCAGGGGACAAAGACATGTATGTTTGTCACCAGCCACTGCCCACTTTCCTGCCAGAATACTTCAGCAGTCTCCATGCCAGTCAGATCACCCATTATAAGGTATTTCTGTCATGGGCACAGCTCCTCCCAGCAGGAAGCACCCAGAATCCAGACGAGAAAACAGTGCAGTGCTACCGGCGACTCCTCAAGGCCCTCAAGACTGCACGGCTTCAGCCCATGGTCATCCTGCACCACCAGACCCTCCCTGCCAGCACCCTCCGGAGAACCGAAGCCTTTGCTGACCTCTTCGCCGACTATGCCACATTCGCCTTCCACTCCTTCGGGGACCTAGTTGGGATCTGGTTCACCTTCAGTGACTTGGAGGAAGTGATCAAGGAGCTTCCCCACCAGGAATCAAGAGCGTCACAACTCCAGACCCTCAGTGATGCCCACAGAAAAGCCTATGAGATTTACCACGAAAGCTATGCTTTTCAGGGTGAGTACACATTGACCTGATGGTGACCCCTCGGCAACCTTCATCACACACCTTCCCCATCCTCCTTAGAGCAGATTCGACATTTCTCCCAACTCACCTTCAGCAGTCCTCTTATGTCTGTGCATAGGGAGAAATTAATATTGTAAATTGATTTCCCACTGGCGATAGGAAGGGGTAGCTAACATGGCAAAACACTCAGCATTTCCTTTGAAAAATATCTTTGAGGCTCACGCCTGTAATCCTAGCAC
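
The `desc` field packs the location of the sequence into a colon-separated string. As a minimal sketch, a hypothetical `parse_desc` helper can split it into named fields, assuming the six-field layout seen in the output above.

```python
def parse_desc(desc):
    """Hypothetical helper: split a colon-separated location string."""
    coord_system, assembly, region, start, end, strand = desc.split(':')
    return {'coord_system': coord_system, 'assembly': assembly,
            'region': region, 'start': int(start), 'end': int(end),
            'strand': int(strand)}


loc = parse_desc('chromosome:GRCh38:2:135787850:135837184:-1')
print(loc)
```

The parsed start, end, and strand match the values returned by the lookup in step 4, which is a quick check that the sequence covers the expected gene.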

### 6. Inspect the other databases known to Ensembl that refer to this gene

In [6]:
lct_xrefs = do_request(ensembl_server, 'xrefs/id', lct_id)
for xref in lct_xrefs:
    print(xref['db_display_name'])
    print(xref)

LRG display in Ensembl gene
{'description': 'Locus Reference Genomic record for LCT', 'info_type': 'DIRECT', 'db_display_name': 'LRG display in Ensembl gene', 'dbname': 'ENS_LRG_gene', 'version': '0', 'display_id': 'LRG_338', 'primary_id': 'LRG_338', 'synonyms': [], 'info_text': ''}
Expression Atlas
{'info_type': 'DIRECT', 'description': None, 'dbname': 'ArrayExpress', 'version': '0', 'db_display_name': 'Expression Atlas', 'primary_id': 'ENSG00000115850', 'display_id': 'ENSG00000115850', 'synonyms': [], 'info_text': ''}
NCBI gene (formerly Entrezgene)
{'db_display_name': 'NCBI gene (formerly Entrezgene)', 'version': '0', 'dbname': 'EntrezGene', 'info_type': 'DEPENDENT', 'description': 'lactase', 'synonyms': [], 'info_text': '', 'display_id': 'LCT', 'primary_id': '3938'}
GeneCards
{'primary_id': '6530', 'display_id': 'LCT', 'synonyms': [], 'info_text': '', 'description': 'lactase', 'info_type': 'DEPENDENT', 'version': '0', 'dbname': 'GeneCards', 'db_display_name': 'GeneCards'}
HGNC Symb

In [7]:
refs = do_request(ensembl_server, 'xrefs/id', lct_id, external_db='GO', all_levels='1')
print(lct_id, refs)

ENSG00000115850 [{'display_id': 'GO:0000016', 'version': '0', 'info_text': 'GO_Central', 'dbname': 'GO', 'linkage_types': ['IBA'], 'info_type': 'DIRECT', 'description': 'lactase activity', 'primary_id': 'GO:0000016', 'synonyms': [], 'db_display_name': 'GO'}, {'db_display_name': 'GO', 'synonyms': [], 'primary_id': 'GO:0000016', 'description': 'lactase activity', 'info_type': 'DIRECT', 'linkage_types': ['IEA'], 'dbname': 'GO', 'info_text': 'RHEA', 'version': '0', 'display_id': 'GO:0000016'}, {'display_id': 'GO:0000016', 'dbname': 'GO', 'info_type': 'DIRECT', 'linkage_types': ['IMP', 'IDA', 'IEA'], 'version': '0', 'info_text': 'UniProt', 'description': 'lactase activity', 'primary_id': 'GO:0000016', 'db_display_name': 'GO', 'synonyms': []}, {'primary_id': 'GO:0003824', 'description': 'catalytic activity', 'db_display_name': 'GO', 'synonyms': [], 'display_id': 'GO:0003824', 'linkage_types': ['IEA'], 'info_type': 'DIRECT', 'dbname': 'GO', 'info_text': 'UniProt', 'version': '0'}, {'display_i
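
The same GO term can appear several times, once per annotation source. The sketch below shows one way to collapse such records into a single entry per term with the union of its evidence codes. The sample holds the first four records of the output above, reduced to the fields that are used.

```python
# First four GO records from the output above, reduced to the fields used here
refs = [
    {'primary_id': 'GO:0000016', 'description': 'lactase activity',
     'linkage_types': ['IBA'], 'info_text': 'GO_Central'},
    {'primary_id': 'GO:0000016', 'description': 'lactase activity',
     'linkage_types': ['IEA'], 'info_text': 'RHEA'},
    {'primary_id': 'GO:0000016', 'description': 'lactase activity',
     'linkage_types': ['IMP', 'IDA', 'IEA'], 'info_text': 'UniProt'},
    {'primary_id': 'GO:0003824', 'description': 'catalytic activity',
     'linkage_types': ['IEA'], 'info_text': 'UniProt'},
]

# Group by GO identifier and merge the evidence codes of duplicate records
go_terms = {}
for ref in refs:
    term = go_terms.setdefault(ref['primary_id'],
                               {'description': ref['description'],
                                'evidence': set()})
    term['evidence'].update(ref['linkage_types'])

for go_id, term in sorted(go_terms.items()):
    print(go_id, term['description'], sorted(term['evidence']))
```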

### 7. Get the orthologues for this gene in the horse genome

In [8]:
hom_response = do_request(ensembl_server, 'homology/id', lct_id, type='orthologues', sequence='none')
#print(hom_response['data'][0]['homologies'])
homologies = hom_response['data'][0]['homologies']
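for homology in homologies:
    print(homology['target']['species'])
    # Skip every orthologue that is not from the horse
    if homology['target']['species'] != 'equus_caballus':
        continue
    print(homology)
    print(homology['taxonomy_level'])
    # Keep the Ensembl ID of the horse orthologue for the next step
    horse_id = homology['target']['id']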

pan_troglodytes
nomascus_leucogenys
pan_paniscus
gorilla_gorilla
pongo_abelii
ornithorhynchus_anatinus
sarcophilus_harrisii
notamacropus_eugenii
choloepus_hoffmanni
erinaceus_europaeus
dasypus_novemcinctus
echinops_telfairi
callithrix_jacchus
cercocebus_atys
macaca_fascicularis
macaca_mulatta
macaca_nemestrina
papio_anubis
mandrillus_leucophaeus
ursus_americanus
ailuropoda_melanoleuca
tursiops_truncatus
ursus_maritimus
octodon_degus
procavia_capensis
mesocricetus_auratus
bos_taurus
equus_asinus
ochotona_princeps
heterocephalus_glaber_female
vulpes_vulpes
canis_lupus_familiaris
vicugna_pacos
dipodomys_ordii
delphinapterus_leucas
capra_hircus
tupaia_belangeri
marmota_marmota_marmota
rattus_norvegicus
microtus_ochrogaster
microtus_ochrogaster
microtus_ochrogaster
monodelphis_domestica
monodelphis_domestica
equus_caballus
{'method_link_type': 'ENSEMBL_ORTHOLOGUES', 'dn_ds': None, 'taxonomy_level': 'Boreoeutheria', 'type': 'ortholog_one2one', 'source': {'species': 'homo_sapiens', 'perc_id':
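
The loop above leaves `horse_id` undefined when no horse orthologue is returned. One possible alternative is sketched below: a hypothetical `find_orthologue` helper that returns `None` in that case. The sample data is a reduced stand-in for `homologies`. The horse entry uses values shown in this section, while the other target IDs are placeholders, not real identifiers.

```python
def find_orthologue(homologies, species):
    """Hypothetical helper: first homology for the species, or None."""
    return next((h for h in homologies
                 if h['target']['species'] == species), None)


# Reduced stand-in for hom_response['data'][0]['homologies'];
# the non-horse IDs are placeholders, not real Ensembl identifiers
homologies = [
    {'target': {'species': 'pan_troglodytes', 'id': 'PLACEHOLDER_ID_1'}},
    {'target': {'species': 'bos_taurus', 'id': 'PLACEHOLDER_ID_2'}},
    {'type': 'ortholog_one2one', 'taxonomy_level': 'Boreoeutheria',
     'target': {'species': 'equus_caballus', 'id': 'ENSECAG00000018594'}},
]

horse = find_orthologue(homologies, 'equus_caballus')
horse_id = horse['target']['id'] if horse is not None else None
print(horse_id)
```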

### 8. Look up the Ensembl record for horse_id

In [9]:
horse_req = do_request(ensembl_server, 'lookup/id', horse_id)
print(horse_req)

{'description': 'lactase [Source:VGNC Symbol;Acc:VGNC:19613]', 'db_type': 'core', 'source': 'ensembl', 'canonical_transcript': 'ENSECAT00000020097.4', 'version': 4, 'object_type': 'Gene', 'logic_name': 'ensembl', 'start': 19677638, 'display_name': 'LCT', 'assembly_name': 'EquCab3.0', 'species': 'equus_caballus', 'id': 'ENSECAG00000018594', 'biotype': 'protein_coding', 'end': 19729486, 'seq_region_name': '18', 'strand': -1}


In [10]:
#maybe synteny of MCM6 and LCT with caballus and gorilla