An R Package for Full Length Circular RNA Sequence Extraction and Classification Using the Output of circRNA Prediction Tools (such as CIRI, find_circ, CIRCExplorer etc.)

FcircSEC is an R package for full-length circRNA sequence extraction and classification.

Requirements

  • R (>= 3.6.0)
  • devtools
  • Biostrings
  • seqRFLP
  • stringi

Installation

To install the package from GitHub, first install the "devtools" package with the following command:

install.packages("devtools", dependencies = TRUE)

The package "FcircSEC" depends on the Bioconductor package "Biostrings", which cannot be installed automatically while installing "FcircSEC" with "devtools". Install "Biostrings" manually as follows:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Biostrings")

Finally, install "FcircSEC" with the following command:

devtools::install_github("tofazzal4720/FcircSEC", dependencies = TRUE)

Start analysis by typing the following command:

library("FcircSEC")
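Before starting an analysis, it can help to confirm that everything listed under Requirements is actually available. The following is a minimal sketch in base R, not part of FcircSEC itself; the variable names (`required`, `installed`, `missing_pkgs`) are illustrative.

```r
# Packages listed under Requirements, plus FcircSEC itself
required <- c("devtools", "Biostrings", "seqRFLP", "stringi", "FcircSEC")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

# The README asks for R >= 3.6.0
r_ok <- getRversion() >= "3.6.0"

if (!r_ok) {
  message("R ", getRversion(), " is older than the required 3.6.0")
}
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed")
}
```

If "Biostrings" is reported as missing, install it through BiocManager as described above rather than with install.packages().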

Extracting transcript information from the annotation file

Transcript data can be obtained using the following function:

transcriptExtract(annotationFile, databaseName, outputfile)

Here,

annotationFile is the annotation file (in gtf, gff or gff3 format) corresponding to the reference genome. Please use gff or gff3 format for "ncbi" and gtf format for "ucsc" and "other".

databaseName is the database name from where the annotation file was downloaded (the possible options are "ncbi", "ucsc" and "other").

outputfile is the name of the output file.
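The pairing between databaseName and annotation format described above can be checked before calling transcriptExtract. The helper below is a hypothetical sketch in base R, not a function provided by FcircSEC; it assumes the format can be judged from the file extension, so a compressed file such as "x.gtf.gz" would be rejected.

```r
# Hypothetical helper: TRUE if the file extension matches the format
# that the README recommends for the given database name.
check_annotation_format <- function(annotationFile, databaseName) {
  databaseName <- match.arg(databaseName, c("ncbi", "ucsc", "other"))
  ext <- tolower(tools::file_ext(annotationFile))
  # gff or gff3 for "ncbi"; gtf for "ucsc" and "other"
  expected <- if (databaseName == "ncbi") c("gff", "gff3") else "gtf"
  ext %in% expected
}

ok_ucsc <- check_annotation_format("annotation_file.gtf", "ucsc")   # TRUE
ok_ncbi <- check_annotation_format("annotation_file.gtf", "ncbi")   # FALSE
```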

Examples

# Load the example annotation data and write it to a file
data(refGenchr1)
annotation_file <- refGenchr1
write.table(annotation_file, file = "annotation_file.gtf", row.names = FALSE, sep = "\t", quote = FALSE, col.names = FALSE)

# Extract transcript information. The output is written to transcriptdata.txt
transcriptExtract("annotation_file.gtf", "ucsc", "transcriptdata.txt")

Classifying circRNAs

Circular RNAs can be classified using the following function:

circClassification(transcriptdata, bedfile, outfiletxt, outfilebed)

Here,

transcriptdata is the transcript data extracted from the annotation file (obtained from function transcriptExtract).

bedfile is the bed file (obtained from the circRNA prediction tools) with four columns: chromosome, start position, end position and strand of each circRNA.

outfiletxt is the output file with the detailed information of circRNA classification.

outfilebed is the output file with the chromosome, start and end position of each circRNA.
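Prediction tools usually report more than the four columns that bedfile needs. The sketch below shows one way to reduce such a table to chromosome, start, end and strand in base R. The input data frame and its column names (`chr`, `circRNA_start`, `circRNA_end`, `strand`, and so on) are made up for illustration and will differ between tools; the write.table call mirrors the one this README uses for its example bed file.

```r
# Made-up prediction output; real column names depend on the tool used
pred <- data.frame(
  circRNA_ID     = c("chr1:1000|2000", "chr1:5000|7500"),
  chr            = c("chr1", "chr1"),
  circRNA_start  = c(1000L, 5000L),
  circRNA_end    = c(2000L, 7500L),
  junction_reads = c(12L, 7L),
  strand         = c("+", "-"),
  stringsAsFactors = FALSE
)

# Keep only the four columns that bedfile is described as having, in order
bed <- pred[, c("chr", "circRNA_start", "circRNA_end", "strand")]

# Write without header or row names, as in the README example
bed_path <- file.path(tempdir(), "my_circRNAs.bed")
write.table(bed, file = bed_path, col.names = FALSE, row.names = FALSE)

# Read the file back to confirm its shape
check <- read.table(bed_path, stringsAsFactors = FALSE)
```

The resulting file path can then be passed as the bedfile argument of circClassification.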

Examples

# Load the example transcript data and write it to a file
data("transcript_data")
t_data <- transcript_data
write.table(t_data, file = "transcript_data.txt", row.names = FALSE)

# Load an example bed file obtained from a circRNA prediction tool and write it to a file
data("output_CIRI")
b_file <- output_CIRI
write.table(b_file, file = "output_CIRI.bed", col.names = FALSE, row.names = FALSE)

# Classify the circRNAs. The output is written to two files, circRNA_class.txt and circRNA_class.bed
circClassification("transcript_data.txt", "output_CIRI.bed", "circRNA_class.txt", "circRNA_class.bed")

Generating sequences from the reference genome with specific intervals

Genomic sequences of the circRNAs are obtained from the reference genome for the given circRNA boundaries (start and end) using the following function:

get.fasta(ref_genome, circ_class_bed, out_filename)

Here,

ref_genome is the reference genome.

circ_class_bed is the bed file with the chromosome, start and end position of each circRNA (obtained from function circClassification).

out_filename is the name of the output file.

Examples

# Load the example reference genome and write it to a file
data("chr1")
ref_genom <- chr1
df.fasta <- dataframe2fas(ref_genom, file = "ref_genome.fasta")

# Load the example circRNA classification bed data and write it to a file
data("circRNA_classb")
circ_class_bed <- circRNA_classb
write.table(circ_class_bed, file = "circ_class.bed", col.names = FALSE, row.names = FALSE)

#Getting genomic sequences of circRNAs. The output will be generated in file circRNA_genomic_seq.fasta
get.fasta("ref_genome.fasta", "circ_class.bed", "circRNA_genomic_seq.fasta")

Generating full length circRNA sequences

The full length circRNA sequences are obtained using the following function:

circSeqExt(genomic_seq, circ_class_txt, out_filename)

Here,

genomic_seq is the fasta file (obtained using function get.fasta) having the genomic sequences for circRNAs.

circ_class_txt is the circRNA classification file (obtained from function circClassification).

out_filename is the name of the output file.

Examples

# Load the example circRNA genomic sequences and write them to a file
data("circRNA_genomic_sequence")
circ_genomic_seq <- circRNA_genomic_sequence
df.fasta <- dataframe2fas(circ_genomic_seq, file = "circ_genomic_seq.fasta")

# Load the example circ_class_txt data and write it to a file
data("circRNA_classt")
circ_class_txt <- circRNA_classt
write.table(circ_class_txt, file = "circ_class.txt", row.names = FALSE)

#Extracting full length circRNA sequences. Here, the output will be written in file circRNA_sequence.fasta
circSeqExt("circ_genomic_seq.fasta", "circ_class.txt", "circRNA_sequence.fasta")
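
After the extraction step, a quick sanity check is to count how many sequences ended up in the output file. The helper below is a hypothetical base R sketch, not part of FcircSEC; it assumes the output is ordinary FASTA in which each record starts with a ">" header line, and it is demonstrated on a small made-up file rather than on real package output.

```r
# Hypothetical helper: count FASTA records by counting ">" header lines
count_fasta_records <- function(path) {
  lines <- readLines(path, warn = FALSE)
  sum(startsWith(lines, ">"))
}

# Made-up two-record FASTA file used only to demonstrate the helper
demo_path <- file.path(tempdir(), "demo.fasta")
writeLines(c(">circ1", "ACGTACGT", ">circ2", "GGCCTTAA"), demo_path)

n_records <- count_fasta_records(demo_path)
```

Calling count_fasta_records("circRNA_sequence.fasta") on the file written above gives a record count that can be compared with the number of rows in the classification file.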