Skip to content

xuxiny17/oncoAnalysis

Repository files navigation

oncoAnalysis

Compare DNA Sequences and Check if Mutated.

CodeFactor GitHub issues License GitHub language count GitHub commit activity (branch)

Description

The goal of oncoAnalysis package is to develop functions that check if there is mutations between healthy input DNA FASTA file and mutated(suspected) DNA FASTA file. The package contains the main components: DESCRIPTION, NAMESPACE, man subdirectory, R subdirectory(where functions are storing), LICENSE, README, folder vignettes, tests, data and inst. The package is build for BCB410H (Applied Bioinformatics) course work.

The package is currently under construction and the current version contains one FASTA reading function, two analysis functions, two plotting functions and one function to launch the shiny app. The FASTA reading function was able to read in FATSA files, check if the DNA sequence is valid and output sequence. The two analysis functions are able to compare the contents of two DNA FASTA files and obtain the mutation details which current work seem to be lack of. The plotting functions visualize the results obtained from the analysis functions and display them in a clearer way. The function which launches the shiny app is used for opening the shiny app.

The current package does not handle frame shift of base sequence. The function might be implement in the future.

The oncoAnalysis package was developed using R version 4.2.1 (2022-06-23), Platform: x86_64-apple-darwin17.0 (64-bit) and Running under: macOS Ventura 13.0.1.

Installation

You can install the development version of oncoAnalysis from GitHub with:

# install.packages("devtools")
# library("devtools")
require("devtools")
devtools::install_github("xuxiny17/oncoAnalysis", build_vignettes = TRUE)
library("oncoAnalysis")

To run the shinyApp:

oncoAnalysis::runoncoAnalysis()

Overview

ls("package:oncoAnalysis") 
data(package = "oncoAnalysis")
browseVignettes("oncoAnalysis") 

oncoAnalysis currently contains 6 functions.

The fastaReader function takes in fasta files and check if the input DNA sequence is valid (contain characters other than ATCGatcg or not), if valid, it returns the sequence contained, otherwise the function terminates and produce error message.

The mutChecker function (analysis function) takes in two same length DNA sequences and check if the suspected mutated sequence is different from the healthy version, if so, output the mutation details. In details, the output contains the number of total base changes, the number of Base A, T C, G mutated receptively, the positions that the base changes occurred, and a base changes detailed matrix containing information which illustrates which base was mutated to which in mutated sequence compared to the healthy version.

If the length of the two input DNA sequences are different, it would print in the console telling the user whether there is deletion or insertion happened.

The mutTable (analysis function) generates a simple table which illustrates the matrix output by the mutChecker function. The Role names represent the base in original sequence, Column names represent the base in mutated sequence. Used to check for example how many base A has mutated into base T. (In future updates, the table is expected to be made clearer.)

The mutPlot is a plotting function that plots the number of each base mutated comparing to the healthy sequence.

The mutCompPlot is a plotting function that visualizes the base number comparison between healthy and mutated Sequence. Both of the plotting functions could take in optional arguments that specifies the title, name, x labels and y labels, if not given, the functions would use default values.

The runoncoAnalysis is a shiny app launcher function which opens the shiny app for this R package.

The package also contains five DNA sequencing data sets, called samplemutseq.rda, sampleseq.rda, samplefalseseq.rda, sampleInsseq.rda and sampleDelseq.rda. Refer to package vignettes for more details. The current overview of the package is illustrated below.

Contributions

The author of the package is Xu Xinyi. The fastaReader function takes in fasta files and output DNA sequence characters if the sequence is valid. The base change details along with the plotting function to illustrate the base change results use the functions written by the author. The mutChecker function takes in DNA sequence and generate detailed mutation information. The code for mutChecker function uses the loop creating and vector comparing ideas illustrated online (See References section for detailed documentation, no direct codes taken). The code for mutTable function uses the table creating ideas illustrated online to create a two way table.(See References section for detailed documentation, no direct codes taken).

The codes for runoncoAnalysis function along with the codes in app.R R script are adapted and imitated the codes of the app.R and runMPLNClust.R R script files in the MPLNClust R package. The runoncoAnalysis function also used the help of function reference webpage of shiny app: Function Reference along with stackoverflow to resolve issues. (See References section for detailed documentation.)

R package imported and used for each function

The fastaReader function uses the read.fasta() function in seqinr R package to read the FASTA files. The mutPlot and mutCompPlot function make use of the ggplot2 R package, and the plotting details imitated the tutorials online along with the technique of adding optional arguments to the functions. The runoncoAnalysis function used the shiny R package and shinyalert R package to build the shiny app. (See References section for detailed documentation).

References

Holtz, Yan. “Basic Histogram with GGPLOT2.” – The R Graph Gallery. https://r-graph-gallery.com/220-basic-ggplot2-histogram.html

Modify Components of a Theme - Theme.” - Theme • ggplot2 https://ggplot2.tidyverse.org/reference/theme.html

Bar Charts - geom_bar.” - geom_bar • ggplot2 https://ggplot2.tidyverse.org/reference/geom_bar.html

Wickham, Hadley, Winston Chang, and Maintainer Hadley Wickham. Package ‘ggplot2’.” Create elegant data visualisations using the grammar of graphics. Version 2.1 (2016): 1-189.

Chang, Winston. “R Graphics Cookbook, 2nd Edition.” 3.9 Adding Labels to a Bar Graph, 11 Nov. 2022 https://r-graphics.org/recipe-bar-graph-labels

AndrewAndrew 1322 silver badges33 bronze badges, and jtr13jtr13 1. “Ggplot geom_col Legend Not Showing.” Stack Overflow, 1 Mar. 1965. https://stackoverflow.com/questions/48134472/ggplot-geom-col-legend-not-showing

Coghlan, Avril. “Little book of R for Bioinformatics.” Cambridge, UK: Welcome Trust Sanger Institute (2011). https://a-little-book-of-r-for-bioinformatics.readthedocs.io/en/latest/index.html

How to Create Tables in R?” GeeksforGeeks, 27 Dec. 2021. https://www.geeksforgeeks.org/how-to-create-tables-in-r/

Charif, D. and Lobry, J.R. (2007). SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. https://cran.r-project.org/web/packages/seqinr/index.html

R Compare Vectors & Find Differences (5 Examples): Identify Similarities.” Statistics Globe, 18 Mar. 2022. https://statisticsglobe.com/compare-vectors-and-find-differences-in-r

R For Loop (with Examples).” DataMentor, 8 Oct. 2018. https://www.datamentor.io/r-programming/for-loop/

Wickham, H. and Bryan, J. (2019). R Packages (2nd edition). Newton, Massachusetts: O’Reilly Media. https://r-pkgs.org/

SimonGSimonG4, et al. “‘Correct’ Way to Specifiy Optional Arguments in R Functions.” Stack Overflow, 1 Apr. 1962, https://stackoverflow.com/questions/28370249/correct-way-to-specifiy-optional-arguments-in-r-functions

Silva, Anjali. “Anjalisilva/TestingPackage: A Simple R Package Illustrating Components of an R Package: 2019-2022 BCB410H - Applied Bioinformatics, University of Toronto, Canada.” GitHub, https://github.com/anjalisilva/TestingPackage.

mikemike, et al. “Test If Characters Are in a String.” Stack Overflow, 12 Apr. 2012, https://stackoverflow.com/questions/10128617/test-if-characters-are-in-a-string.

user2588829, et al. “Break/Exit Script.” Stack Overflow, 24 July 2013, https://stackoverflow.com/questions/17837289/break-exit-script.

Johnston, Jacob, et al. “R - Shiny: Error in Cat(List(…), File, Sep, Fill, Labels, Append) : Argument 1 (Type ‘List’) Cannot Be Handled by ‘Cat’.” Stack Overflow, 13 Mar. 2015, https://stackoverflow.com/questions/29041449/r-shiny-error-in-catlist-file-sep-fill-labels-append-argument.

Chang W, Cheng J, Allaire J, Sievert C, Schloerke B, Xie Y, Allen J, McPherson J, ipert A, Borges B (2022). shiny: Web Application Framework for R. R package version 1.7.3, https://CRAN.R-project.org/package=shiny.

Sievert, Carson. Interactive web-based data visualization with R, plotly, and shiny. CRC Press, 2020.

Attali D, Edwards T (2021). shinyalert: Easily Create Pretty Popup Messages (Modals) in ‘Shiny’. R package version 3.0.0, https://CRAN.R-project.org/package=shinyalert.

Silva, A. et al. (2019). A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data. BMC Bioinformatics 20. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2916-0

“Function Reference Version 1.0.5.” Shiny, https://shiny.rstudio.com/reference/shiny/1.0.5/

R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

The creation of the package followed the contents taught in BCB410 lectures.

Acknowledgements:

This package was developed as part of an assessment for 2022 BCB410H: Applied Bioinformatics course at the University of Toronto, Toronto, CANADA. oncoAnalysis welcomes issues, enhancement requests, and other contributions. To submit an issue, use the GitHub issues.

The package tree structure is provided below.

- oncoAnalysis
  |- oncoAnalysis.Rproj
  |- DESCRIPTION
  |- NAMESPACE
  |- LICENSE
  |- README
  |- data
    |- sampleseq.rda
    |- samplemutseq.rda
    |- samplefalseseq.rda
    |- sampleInsseq.rda
    |- sampleDelseq.rda
  |- inst
    CITATION
    |- extdata
      |- Xinyi_X_A1.png
      |- sampleseq.fasta
      |- samplemutseq.fasta
      |- samplefalseseq.fasta
      |- sampleDel.fasta
      |- sample.fasta
    |- shiny-scripts
      |- app.R
  |- man
    |- mutChecker.Rd
    |- mutCompPlot.Rd
    |- mutPlot.Rd
    |- mutTable.Rd
    |- samplemutseq.Rd
    |- sampleseq.Rd
    |- sampleInsseq.Rd
    |- fastaReader.Rd
    |- sampleDelseq.Rd
    |- samplefalseseq.Rd
    |- runoncoAnalysis.Rd
  |- R
    |- data.R
    |- mutChecker.R
    |- oncoAnalysisPlot.R
    |- dataImporter.R
    |- runoncoAnalysis.R
  |- vignettes
    |- Introduction_oncoAnalysis.Rmd
    |- Xinyi_X_A1.png
  |- tests
    |- testthat.R
    |- testthat
      |- test-mutChecker.R
      |- test-oncoAnalysisPlot.R
      |- test-dataImporter.R

About

No description, website, or topics provided.

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages