circRNAFull: an R package for reconstruction of full length circRNA sequence using chimeric alignment information
- R (>= 3.0.0)
- Biostrings
- seqinr
- stringi
- Rsamtools
To install the package from github first you need to install the package “devtools” using the following command:
install.packages("devtools", dep=T)
The package "circRNAFull" depends on two bioconductor packages "Biostrings" and "Rsamtools" which cannot be installed automatically while installing "circRNAFull" using "devtools". So, you need to install these two packages manually using the following way:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Biostrings")
BiocManager::install("Rsamtools")
Finally, install “circRNAFull” by the following command:
devtools::install_github("tofazzalh/circRNAFull", dep = T)
Start analysis by loading the package with the following command:
library("circRNAFull")
This function makes a text file containing circle ids, transcript name and the spanning redas for each circRNAs.
extract_circle_ids(circ_out, chimeric_out, output)
circ_out
is the output from the circRNA prediction tool CIRCexplorer.
chimeric_out
is the junction file obtained from the chimeric alignment produced by STAR aligner.
output
is the name of the output folder.
The circle ids, transcript name and the spanning reads will be produced in 'output' folder.
#Loading an example output from the circRNA prediction tool CIRCexplorer
data(CIRCexplorer_output)
circ_out<-CIRCexplorer_output
#Loading an example junction file obtained from the. chimeric alignment produced by STAR
data(Chimeric.out.junction)
chimeric_out<-Chimeric.out.junction
#A temporary directory is created as an output folder.
output<-tempdir()
#Extracting circle ids, transcript name and spanning reads of circRNAs.
#The output will be written in 'output' directory
extract_circle_ids(circ_out, chimeric_out, output)
This function generates individual alignment file for each circRNA from the chimeric alignment produced by STAR.
extract_reads_from_bam(circle_id, bamfile, outfolder)
circle_id
is a data frame containg circle ids, transcript name and spanning reads of circRNAs obtained from function
extract_circle_ids()
.
bamfile
is a chimeric alignment bam file produced by STAR read as a data frame.
outfolder
is the name of the output directory.
Individual alignment files for each circRNAs will be produced in outfolder.
#loading an example circle_id file
data(circle_ID)
circle_id<-circle_ID
#Please upload your chimeric alignment bam file.Suppose you have the chimeric
#alignemnt bam file 'Chimeric.out.bam' in you working directory.
#Then run:
bam <- scanBam("Chimeric.out.bam")
bamfile <- as.data.frame(bam)
#Creating an output directory
outfolder<-tempdir()
#Extracting individual alignment file for each circRNAs.
#The individual alignment file will be generated in the 'outfolder' directory.
extract_reads_from_bam(circle_id, bamfile, outfolder)
This function extracts exons for the circRNAs from the CIGAR value of the chimeric alignment obtained from STAR.
Extract_exon(bed_file, folder_name, circle_ids, output_folder)
bed_file
is a data frame containing the exon coordinates obtained from reference genome annotation.
folder_name
is the location of the individual alignments obtained from function extract_reads_from_bam()
.
circle_ids
is a data frame containing the circle id, transcript name and spanning reads obtained from function extract_circle_ids()
.
output_folder
is the output folder name.
A file exon_count.txt containing exons for each circRNA will be produced in output_folder.
#Loading an example bed_file
data(exon_coordinate)
bed_file<-exon_coordinate
#Example of folder_name containing the individual alignments
folder_name<-system.file('extdata', package = 'circRNAFull')
#Loading an example circle_ids file
data(circle_ID)
circle_ids<-circle_ID
#creating a temporary output_folder
output_folder<-tempdir()
#Extracting exons for all circRNAs
Extract_exon(bed_file, folder_name, circle_ids, output_folder)
This function detects skip exon.
skip_exon_detection(circle_reads_folder, exon_count_file, output_folder)
circle_reads_folder
is the location of the individual alignments obtained from function extract_reads_from_bam()
.
exon_count_file
is a data frame containing exons for circRNAs obtained from function Extract_exon()
.
output_folder
is the name of the output directory.
A file detect_skip_exon.txt containing the skip exons will be produced in output_folder.
#Example of circle_reads_folder containing the individual alignments
circle_reads_folder<-system.file('extdata', package = 'circRNAFull')
#Loading an example exon_count_file containing exons for circRNAs
data(exon_count)
exon_count_file<-exon_count
#Creating a temporary output directory
output_folder<-tempdir()
#Detecting skip exon
skip_exon_detection(circle_reads_folder, exon_count_file, output_folder)
This function extracts the remaining exons after deleting skip exons.
extract_exon_after_delete_skip_exon(exon_count_file, skip_exon_file, output_folder)
exon_count_file
is a data frame containing exons for circRNAs obtained from function Extract_exon()
.
skip_exon_file
is a data frame containing skip exons for circRNAs obtained from function skip_exon_detection()
.
output_folder
is the name of the output directory.
A text file extract_exon_after_skipping_exon.txt containing the exons after deleting skip exon will be produced in output_folder.
#Loading an example exon count file
data(exon_count)
exon_count_file<-exon_count
#Loading an example skip exon file
data(skip_exon)
skip_exon_file<-skip_exon
#Creating a temporary output directory
output_folder<-tempdir()
#Extracting exon after deleting skip exon
extract_exon_after_delete_skip_exon(exon_count_file, skip_exon_file, output_folder)
This function reconstructs the full length circRNA sequences.
extract_sequence(seq_name, sequence, exon_count_file, output)
seq_name
is a vector containing the name of chromosomes of the reference genome.
sequence
is a vector containing the sequences of the reference genome.
exon_count_file
is the exon count file after deleting skip exons.
output
is the output folder name.
The reconstructed circRNA sequences circRNAseq.fasta will be produced in folder output.
#Please download the reference genome.
#Suppose you have downloaded the reference genome 'hg38.fa' in you working directory.
#Then run:
ref_genome<-"hg38.fa"
fastaFile <- readDNAStringSet(ref_genome)
seq_name = sub('\\ .*', '', names(fastaFile))
sequence = paste(fastaFile)
#Loading an exon count file after deleting skip exon
data(exon_count_after_skipping_exon)
exon_count_file<-exon_count_after_skipping_exon
#Creating an output directory
output<-tempdir()
#Reconstructing the circRNA sequences.
#The circRNA sequences will be generated in the 'output' directory.
extract_sequence(seq_name, sequence, exon_count, output)