#go-model README File
This is a library for building Gene Ontology (GO) support vector classifiers from protein domain scores and using them to predict the function of candidate proteins.
#Objective
Identifying enzymes by sequence homology tends to produce a signal-to-noise problem. It can be difficult to determine which candidates are genuine functional homologs and which are false positives.
The go_preprocess script uses HMMER to search a protein domain HMM database (Pfam is the best known, but others can be used) and saves the scores.
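
As a rough illustration of the "save the scores" step, the sketch below turns HMMER tabular output into a per-protein score matrix. It assumes the search was run with `hmmscan --tblout`, so that column 1 is the domain name, column 3 is the protein id, and column 6 is the full-sequence bit score. The function names and the sample rows are hypothetical and are not taken from go_preprocess.

```python
from collections import defaultdict


def parse_tblout(lines):
    """Collect the best bit score per (protein, domain) from --tblout lines."""
    scores = defaultdict(dict)
    for line in lines:
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        domain, protein, bit_score = fields[0], fields[2], float(fields[5])
        # Keep the highest score if a domain is reported more than once.
        previous = scores[protein].get(domain)
        if previous is None or bit_score > previous:
            scores[protein][domain] = bit_score
    return dict(scores)


def to_matrix(scores):
    """Return (proteins, domains, matrix); domains with no hit score 0.0."""
    proteins = sorted(scores)
    domains = sorted({d for hits in scores.values() for d in hits})
    matrix = [[scores[p].get(d, 0.0) for d in domains] for p in proteins]
    return proteins, domains, matrix


# Made-up rows, truncated after the 'bias' column for brevity.
SAMPLE = [
    "# target name accession query name accession E-value score bias",
    "Pkinase PF00069.25 protA - 1.2e-50 170.2 0.1",
    "SH2 PF00017.24 protA - 3.0e-10 40.5 0.0",
    "Pkinase PF00069.25 protB - 2.0e-05 25.0 0.3",
]

proteins, domains, matrix = to_matrix(parse_tblout(SAMPLE))
```

Filling missing domains with 0.0 is one possible convention; it gives every protein a fixed-length feature vector, which a classifier needs.
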
The model_test script takes the protein HMM scores and existing Gene Ontology classifications and trains support vector classifiers, using a grid search over a parameter space.
The go_prediction script takes the SVC generated by model building and predicts Gene Ontology terms for a protein from its primary sequence.
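
The training and prediction steps can be sketched with scikit-learn's `GridSearchCV` and `SVC`. This is a minimal sketch under stated assumptions, not the code in model_test or go_prediction: the toy data, the parameter grid, and the single binary GO label are all hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical toy data: rows are proteins, columns are domain bit scores.
rng = np.random.RandomState(0)
positives = rng.normal(loc=150.0, scale=10.0, size=(20, 3))
negatives = rng.normal(loc=20.0, scale=10.0, size=(20, 3))
X = np.vstack([positives, negatives])
y = np.array([1] * 20 + [0] * 20)  # 1 = annotated with the GO term

# Assumed parameter space; the real grid may differ.
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [1e-4, 1e-3, 1e-2],
    "kernel": ["rbf"],
}

# Cross-validated grid search; refits the best SVC on all the data.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

# Predict the GO label for a new protein from its domain scores.
candidate = np.array([[155.0, 148.0, 160.0]])
prediction = search.predict(candidate)
print(search.best_params_, prediction)
```

In practice one classifier would be trained per GO term, with the feature matrix built from the saved HMMER scores rather than from random numbers.
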
#Requirements
- Python 3.4 or 3.5
- BioPython 1.67
- scikit-learn 0.18.0
- HMMER v3.1b2
- Pfam
- Gene Ontology