STM-ac4C is a deep learning model that can predict the ac4C sites in human mRNA sequences. ac4C is a modified nucleoside that plays an important role in regulating gene expression and protein synthesis. The accurate prediction of ac4C sites can help to understand the biological functions and mechanisms of ac4C modification.
STM-ac4C is a hybrid neural network that combines selective kernel convolution, temporal convolution network, and multi-head self-attention mechanism, which can effectively capture the different patterns, structures, and correlations in RNA sequences, thereby improving the prediction accuracy and robustness. Selective kernel convolution(SKC) adaptively captures k-mer features of different lengths, avoiding the limitation of fixed kernel size. The temporal convolution network(TCN) has a more vigoroso long-term dependency modeling ability than LSTM and can better handle the temporality and dynamics of RNA sequences. On the other hand, the multi-head self-attention(MHSA) mechanism can enhance the representation ability of the model and capture the global and local correlations in RNA sequences. Finally, MLP will serve as the classifier's last layer and transform the previous layer's output into the prediction result through nonlinear activation functions and fully connected layers.
The project contains the following folders and files:
- data: This folder contains the ac4C dataset used (training dataset and independent test dataset) from iRNA-ac4C
- models: This folder contains the py file of the STM-ac4C model.
- save_ckpt: This folder contains the checkpoint files of the trained model.
- utils: This folder contains some functions and methods that are used for data processing, training, and prediction.
- code.ipynb: This file is a Jupyter notebook that shows how to use the STM-ac4C model for prediction. It provides some examples and explanations of the model functions.
To use this project, you need to install the following libraries:
- torch 2.0.0+cu118
- pandas 2.1.1
- pytorch-lightning 2.0.8
- pytorch-metric-learning 1.7.3
- pandas 2.1.1
- multimethod 1.9.1
You can install them by running the following command:
pip install -r requirements.txt
To use the STM-ac4C model for prediction, you need to follow these steps:
- Import the classifier and the utils from the project folder.
from models.SK_TCN_MHSA import classifier
from utils.one_trial import LitModel
from utils.prepare_data import get_ac4C
- Get the ac4C data as trainset and testset.
trainset, testset = get_ac4C()
- Get the hyperparameters and the model parameters of the classifier.
hparams = classifier.get_hparams()
model_params = classifier.get_model_params()
- Create a LitModel instance with the classifier, the hyperparameters, and the model parameters.
model = LitModel(classifier, hparams , model_params)
- Use the
test
function to perform prediction and evaluation on the testset. You need to specify the checkpoint path of the trained model.
model.test(testset, ckpt_path="save_ckpt/mcc0.6958.ckpt")
- Use the
predict_proba
function to get the probability of each sample containing an ac4C site. You need to specify the checkpoint path of the trained model.
proba = model.predict_proba(testset, ckpt_path="save_ckpt/mcc0.6958.ckpt")
- Use the
fit
function to train the model on the trainset.
model.fit(trainset)