# Cell Cycle genes
Using the Gene Ontology (GO), create an up-to-date list of all human protein-coding genes that are known to be associated with the cell cycle.

## 1. Download Ontologies, if necessary

In [1]:
# Get http://geneontology.org/ontology/go-basic.obo
from goatools.base import download_go_basic_obo
obo_fname = download_go_basic_obo()

  EXISTS: go-basic.obo
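The `EXISTS` line shows that the download is skipped when the file is already on disk. That download-if-missing pattern can be sketched in plain Python as below. `download_if_missing` is a hypothetical helper written for illustration, not the goatools implementation, and fetching with `urllib.request.urlretrieve` is this sketch's own choice.

```python
import os
import urllib.request

def download_if_missing(url, fname):
    """Hypothetical helper (not part of goatools): fetch url into fname only if absent."""
    if os.path.exists(fname):
        # Mirrors the "EXISTS" message above: nothing to do
        print("  EXISTS: {F}".format(F=fname))
        return fname
    # First run: download the file and save it under fname
    urllib.request.urlretrieve(url, fname)
    return fname
```

For example, `download_if_missing("http://geneontology.org/ontology/go-basic.obo", "go-basic.obo")` would fetch the ontology on the first run and simply report `EXISTS` on later runs.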


## 2. Download Associations, if necessary

In [2]:
# Get ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
from goatools.base import download_ncbi_associations
gene2go = download_ncbi_associations()

  EXISTS: gene2go
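The association file is distributed gzip-compressed (`gene2go.gz`), while the next step reads the uncompressed `gene2go`. Below is a minimal sketch of a decompress-only-if-missing step using only the standard library. `gunzip_if_missing` is a hypothetical name, and the tiny archive created at the end is a local stand-in for the real download.

```python
import gzip
import os
import shutil
import tempfile

def gunzip_if_missing(gz_fname, out_fname):
    """Hypothetical helper: decompress gz_fname to out_fname unless out_fname exists."""
    if os.path.exists(out_fname):
        print("  EXISTS: {F}".format(F=out_fname))
        return out_fname
    # Stream the data so a large file is never held in memory all at once
    with gzip.open(gz_fname, "rb") as src, open(out_fname, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_fname

# Demo on a tiny local archive (a stand-in for gene2go.gz; no network needed)
tmpdir = tempfile.mkdtemp()
demo_gz = os.path.join(tmpdir, "demo.gz")
with gzip.open(demo_gz, "wt") as fh:
    fh.write("#tax_id\tGeneID\tGO_ID\n")
demo_out = gunzip_if_missing(demo_gz, os.path.join(tmpdir, "demo"))
with open(demo_out) as fh:
    print(fh.read(), end="")
```

A second call with the same output name prints `EXISTS` and returns immediately, which is what makes re-running the notebook cheap.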


## 3. Read associations
Normally, reading associations returns GeneID2GOs, a dictionary mapping each GeneID to its GO IDs. We get the reverse, GO2GeneIDs, by adding the keyword argument `go2geneids=True` to the call to `read_ncbi_gene2go`.
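The inversion requested by `go2geneids=True` amounts to swapping keys and values in a one-to-many mapping. Here is a minimal sketch on a toy dictionary. `invert_associations` is a hypothetical helper, and the GeneID/GO ID pairings are illustrative only, not taken from gene2go.

```python
from collections import defaultdict

def invert_associations(geneid2gos):
    """Hypothetical helper: turn {GeneID: {GO IDs}} into {GO ID: {GeneIDs}}."""
    go2geneids = defaultdict(set)
    for geneid, goids in geneid2gos.items():
        for goid in goids:
            go2geneids[goid].add(geneid)
    return dict(go2geneids)

# Toy input; these pairings are made up for illustration
geneid2gos = {1017: {"GO:0000082", "GO:0007049"}, 7157: {"GO:0007049"}}
go2geneids = invert_associations(geneid2gos)
print(sorted(go2geneids["GO:0007049"]))  # [1017, 7157]
```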

In [3]:
from goatools.associations import read_ncbi_gene2go

go2geneids_human = read_ncbi_gene2go("gene2go", taxids=[9606], namespace='BP', go2geneids=True)
print("{N} GO terms associated with human NCBI Entrez GeneIDs".format(N=len(go2geneids_human)))

HMS:0:00:05.178009 269,133 annotations READ: gene2go 
1 taxids stored: 9606
11963 IDs in association branch, BP
11963 GO terms associated with human NCBI Entrez GeneIDs
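The counts above come from filtering gene2go rows by taxon (9606 is human) and by GO namespace. The sketch below shows roughly how that filtering works. It assumes NCBI's tab-separated gene2go column layout (tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed, Category) and maps `namespace='BP'` to the `Process` category. `read_go2geneids` is hypothetical, dropping `NOT`-qualified rows is this sketch's own choice, and goatools' reader may differ in details.

```python
import io
from collections import defaultdict

# Assumed column order of NCBI's tab-separated gene2go file:
# tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed, Category
NAMESPACE2CATEGORY = {"BP": "Process", "MF": "Function", "CC": "Component"}

def read_go2geneids(lines, taxids, namespace="BP"):
    """Hypothetical reader: {GO ID: {GeneIDs}} for the given taxids and namespace."""
    go2geneids = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # header line
        flds = line.rstrip("\n").split("\t")
        taxid, geneid, goid = int(flds[0]), int(flds[1]), flds[2]
        qualifier, category = flds[4], flds[7]
        if taxid not in taxids or category != NAMESPACE2CATEGORY[namespace]:
            continue
        if "NOT" in qualifier:
            continue  # this sketch's choice: drop negated annotations
        go2geneids[goid].add(geneid)
    return dict(go2geneids)

# Made-up rows in gene2go layout: one human BP, one mouse (10090) BP, one human MF
demo = io.StringIO(
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t1017\tGO:0007049\tIEA\t-\tcell cycle\t-\tProcess\n"
    "10090\t12566\tGO:0007049\tIEA\t-\tcell cycle\t-\tProcess\n"
    "9606\t1017\tGO:0004672\tIEA\t-\tprotein kinase activity\t-\tFunction\n"
)
go2geneids_demo = read_go2geneids(demo, taxids={9606}, namespace="BP")
print(go2geneids_demo)  # {'GO:0007049': {1017}}
```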


## 4. Initialize Gene-Search Helper

In [4]:
from goatools.go_search import GoSearch

srchhelp = GoSearch("go-basic.obo", go2items=go2geneids_human)

go-basic.obo: fmt(1.2) rel(2019-04-17) 47,398 GO Terms; optional_attrs(comment def relationship synonym xref)
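Searching term names and adding child terms both need, at minimum, each GO term's ID, name, and `is_a` parents from the OBO file. The following is a simplified sketch of extracting those fields from `[Term]` stanzas. `parse_obo_terms` is hypothetical. It ignores every other OBO tag (including `relationship` and `is_obsolete`) and does not reflect how goatools parses the file.

```python
import io

def parse_obo_terms(lines):
    """Hypothetical parser: {GO ID: {"name": str, "parents": set}} from [Term] stanzas."""
    terms = {}
    cur = None
    for line in lines:
        line = line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept ([Typedef] etc. are skipped)
            cur = {"name": None, "parents": set()} if line == "[Term]" else None
        elif cur is not None and line.startswith("id: "):
            terms[line[4:]] = cur
        elif cur is not None and line.startswith("name: "):
            cur["name"] = line[6:]
        elif cur is not None and line.startswith("is_a: "):
            # "is_a: GO:0007049 ! cell cycle" -> keep only the ID before the comment
            cur["parents"].add(line[6:].split(" ! ")[0])
    return terms

# Two illustrative stanzas in OBO layout
demo_obo = io.StringIO(
    "[Term]\n"
    "id: GO:0007049\n"
    "name: cell cycle\n"
    "\n"
    "[Term]\n"
    "id: GO:0000278\n"
    "name: mitotic cell cycle\n"
    "is_a: GO:0007049 ! cell cycle\n"
)
terms = parse_obo_terms(demo_obo)
print(terms["GO:0000278"])  # {'name': 'mitotic cell cycle', 'parents': {'GO:0007049'}}
```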


## 5. Find all human genes related to "cell cycle"

### 5a. Prepare "cell cycle" text searches
We need to search for both *cell cycle* and *cell cycle-independent*. GO terms whose text contains *cell cycle-independent* are specifically **not** related to the *cell cycle* and must be removed from our list of *cell cycle* GO terms.

In [5]:
import re

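# Compile search patterns for 'cell cycle'
cell_cycle_all = re.compile(r'cell cycle', flags=re.IGNORECASE)
# '.' matches any single character, so this catches both 'cell cycle-independent' and 'cell cycle independent'
cell_cycle_not = re.compile(r'cell cycle.independent', flags=re.IGNORECASE)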
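To see what the two patterns do, they can be run against a few term names. The names below are made up for illustration. Note that the `.` in `cell cycle.independent` matches either a hyphen or a space, and `re.IGNORECASE` lets *Cell Cycle* match as well.

```python
import re

cell_cycle_all = re.compile(r'cell cycle', flags=re.IGNORECASE)
cell_cycle_not = re.compile(r'cell cycle.independent', flags=re.IGNORECASE)

# Hypothetical term names, used only to exercise the two patterns
names = [
    "regulation of cell cycle",
    "Cell Cycle arrest",
    "cell cycle-independent example term",
    "cell cycle independent example term",
    "DNA replication",
]
matches = [name for name in names if cell_cycle_all.search(name)]
excluded = [name for name in matches if cell_cycle_not.search(name)]
kept = [name for name in matches if not cell_cycle_not.search(name)]
print(kept)  # ['regulation of cell cycle', 'Cell Cycle arrest']
```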

### 5b. Find NCBI Entrez GeneIDs related to "cell cycle"

In [6]:
# Find ALL GOs and GeneIDs associated with 'cell cycle'.

# Details of search are written to a log file
fout_allgos = "cell_cycle_gos_human.log" 
with open(fout_allgos, "w") as log:
    # Search for 'cell cycle' in GO terms
    gos_cc_all = srchhelp.get_matching_gos(cell_cycle_all, prt=log)
    # Find any GOs matching 'cell cycle-independent' (e.g., "lysosome")
    gos_no_cc = srchhelp.get_matching_gos(cell_cycle_not, gos=gos_cc_all, prt=log)
    # Remove GO terms that are not "cell cycle" GOs
    gos = gos_cc_all.difference(gos_no_cc)
    # Add children GOs of cell cycle GOs
    gos_all = srchhelp.add_children_gos(gos)
    # Get Entrez GeneIDs for cell cycle GOs
    geneids = srchhelp.get_items(gos_all)
print("{N} human NCBI Entrez GeneIDs related to 'cell cycle' found.".format(N=len(geneids)))


1916 human NCBI Entrez GeneIDs related to 'cell cycle' found.
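The search cell combines three set operations: a difference that drops the *cell cycle-independent* terms, a walk down the ontology that collects descendant terms, and a union of the GeneIDs annotated to them. Below is a plain-Python sketch on a toy ontology with placeholder GO IDs. `build_children` and `add_descendants` are hypothetical helpers, and treating "children" as all descendants (not only direct children) is an assumption about what `add_children_gos` does.

```python
from collections import defaultdict

# Toy ontology: child GO ID -> set of is_a parents (IDs are placeholders)
parents = {
    "GO:B": {"GO:A"},
    "GO:C": {"GO:B"},
}
# Toy annotations: GO ID -> GeneIDs
go2items = {"GO:A": {1}, "GO:C": {2, 3}, "GO:D": {4}, "GO:X": {5}}

def build_children(parents):
    """Invert child -> parents into parent -> children."""
    children = defaultdict(set)
    for child, pars in parents.items():
        for par in pars:
            children[par].add(child)
    return children

def add_descendants(gos, children):
    """Return gos plus every GO term below them (iterative depth-first walk)."""
    result = set(gos)
    stack = list(gos)
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result

gos_cc_all = {"GO:A", "GO:D"}  # pretend: names matched 'cell cycle'
gos_no_cc = {"GO:D"}           # pretend: names matched 'cell cycle.independent'
gos = gos_cc_all - gos_no_cc   # same as gos_cc_all.difference(gos_no_cc)
gos_all = add_descendants(gos, build_children(parents))
geneids = set().union(*(go2items.get(go, set()) for go in gos_all))
print(sorted(gos_all), sorted(geneids))  # ['GO:A', 'GO:B', 'GO:C'] [1, 2, 3]
```

GO:B has no annotations of its own, yet including it matters in a real ontology because its own descendants (here GO:C) carry GeneIDs.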


## 6. Print the "cell cycle" protein-coding gene Symbols

In [7]:
from goatools.test_data.genes_NCBI_9606_ProteinCoding import GENEID2NT
for geneid in geneids:  # GeneIDs associated with the cell cycle
    nt = GENEID2NT.get(geneid)
    # GENEID2NT covers human protein-coding genes only; other GeneIDs are skipped
    if nt is not None:
        print("{Symbol:<10} {desc}".format(Symbol=nt.Symbol, desc=nt.description))

TTYH1      tweety family member 1
CABLES2    Cdk5 and Abl enzyme substrate 2
SEH1L      SEH1 like nucleoporin
KIF18A     kinesin family member 18A
CHAF1B     chromatin assembly factor 1 subunit B
SDE2       SDE2 telomere maintenance homolog (S. pombe)
ABL1       ABL proto-oncogene 1, non-receptor tyrosine kinase
CLTCL1     clathrin heavy chain like 1
AICDA      activation-induced cytidine deaminase
USP9X      ubiquitin specific peptidase 9, X-linked
SMC1A      structural maintenance of chromosomes 1A
HIPK1      homeodomain interacting protein kinase 1
ACTA1      actin, alpha 1, skeletal muscle
ACTB       actin, beta
SPC25      SPC25, NDC80 kinetochore complex component
NAA10      N(alpha)-acetyltransferase 10, NatA catalytic subunit
ACVR1      activin A receptor type I
ACVR1B     activin A receptor type IB
BIRC6      baculoviral IAP repeat containing 6
ADARB1     adenosine deaminase, RNA-specific, B1
LIN9       lin-9 DREAM MuvB core complex component
ADCYAP1    adenylate cyclase activa