
Commit

additions to readme, added more ggplot2 fun doc
vsbuffalo committed Feb 20, 2012
1 parent 7ef8abd commit 6ddaa52
Showing 8 changed files with 283 additions and 41 deletions.
44 changes: 40 additions & 4 deletions README.md
Original file line number Original file line Diff line number Diff line change
@@ -12,11 +12,47 @@ as well, but please open an issue first.


## About ## About


qrqc is a Bioconductor package is a fast and extensible package that qrqc (short for "Quick Read Quality Control") is a fast and extensible
reports basic quality and summary statistics on FASTQ and FASTA files, package that reports basic quality and summary statistics on FASTQ and
including base and quality distribution by position, sequence length FASTA files, including base and quality distribution by position,
distribution, and common sequences. sequence length distribution, and common sequences.


## License ## License


GNU General Public License, version 2. GNU General Public License, version 2.

## FAQ

### Why `ggplot2`?

I've had some feature requests for `qrqc` since its release, mostly
related to customizing the graphics. Since data accessibility and
custom graphics were the reason I created `qrqc`, I initially rewrote
`qrqc` to provide more graphics options through `lattice`. However,
all the graphics parameters I added led to large numbers of arguments
to functions and high complexity. This rewrite uses `ggplot2`, which
is an excellent way to create graphics because any graphics object can
be further manipulated.

### Why do you use Monte Carlo simulations to generate the smooth curve?

`qrqc` is fast because `readSeqFile` summarizes the data, binning the
quality scores of bases by position. To create a smooth curve, the
function needs multiple data points (not binned data), which I
simulate via Monte Carlo draws from the quality distribution by
position. This is an approximation, but it produces a smooth curve
that is a useful visual tool for assessing quality drops.
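
As a rough illustration of the idea (not the package's actual
implementation), the sketch below uses only base R. The `binned`
matrix, the `quals` vector, and the `simulate_quals` helper are all
hypothetical names made up for this example; the toy counts stand in
for the per-position quality bins that `readSeqFile` would gather.

```r
# Hypothetical binned data: rows are read positions, columns are quality
# scores 0..40, and each cell counts the bases seen with that quality.
set.seed(42)
n_pos <- 50
quals <- 0:40
binned <- t(sapply(seq_len(n_pos), function(pos) {
  # Toy counts whose mean quality declines toward the 3'-end.
  q <- round(rnorm(1000, mean = 38 - 0.3 * pos, sd = 4))
  q <- pmin(pmax(q, 0), 40)
  as.vector(table(factor(q, levels = quals)))
}))

# Draw n simulated qualities per position from that position's
# empirical (binned) quality distribution.
simulate_quals <- function(binned, quals, n = 100) {
  do.call(rbind, lapply(seq_len(nrow(binned)), function(pos) {
    data.frame(position = pos,
               quality = sample(quals, n, replace = TRUE,
                                prob = binned[pos, ]))
  }))
}

mc <- simulate_quals(binned, quals, n = 100)

# Smooth the simulated points; lowess() returns x and y sorted by x.
fit <- lowess(mc$position, mc$quality)
```

Because the draws are weighted by the binned counts, the smoothed
curve tracks the center of each position's quality distribution
without ever needing the raw per-read qualities.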

### What do I do about bad quality regions?

Illumina reads often have poor 3'-end qualities. I've noticed that
HiSeq machines also produce poor quality 5'-ends. For increased
mapping rates and better assemblies, it is generally advisable to
trim off these poor quality regions. Nik Joshi's `sickle` tool can do
this; you can get it here:
<http://github.com/najoshi/sickle>.

3'-end adapter contamination can be difficult to recognize (and thus
remove) due to poor quality and likely incorrect bases. I've developed
a tool called `scythe` that removes
5 changes: 3 additions & 2 deletions qrqc/R/ggplotting-methods.R
@@ -27,6 +27,7 @@ function(x) {
colnames(gc) <- c('position', 'gc') colnames(gc) <- c('position', 'gc')
return(gc) return(gc)
}) })
gcd
}) })


setMethod("getBase", signature(x="SequenceSummary"), setMethod("getBase", signature(x="SequenceSummary"),
@@ -106,7 +107,7 @@ function(x, fun) {


setMethod("qualPlot", signature(x="FASTQSummary"), setMethod("qualPlot", signature(x="FASTQSummary"),
# Plot a single FASTQSummary object. # Plot a single FASTQSummary object.
function(x, smooth=TRUE, extreme.color="grey", quantile.color="orange", function(x, smooth=TRUE, extreme.color="grey", quartile.color="orange",
mean.color="blue", median.color=NULL) { mean.color="blue", median.color=NULL) {
qd <- getQual(x) qd <- getQual(x)
p <- ggplot(qd) p <- ggplot(qd)
@@ -122,7 +123,7 @@ function(x, smooth=TRUE, extreme.color="grey", quantile.color="orange",


setMethod("qualPlot", signature(x="list"), setMethod("qualPlot", signature(x="list"),
# Plot a list of FASTQSummary objects as facets. # Plot a list of FASTQSummary objects as facets.
function(x, smooth=TRUE, extreme.color="grey", quantile.color="orange", function(x, smooth=TRUE, extreme.color="grey", quartile.color="orange",
mean.color="blue", median.color=NULL) { mean.color="blue", median.color=NULL) {
if (!length(names(x))) if (!length(names(x)))
    stop("A list passed into qualPlot must have named elements.") stop("A list passed into qualPlot must have named elements.")
17 changes: 12 additions & 5 deletions qrqc/man/getBase-methods.Rd
@@ -2,15 +2,16 @@
\docType{methods} \docType{methods}
\alias{getBase-methods} \alias{getBase-methods}
\alias{getBase,SequenceSummary-method} \alias{getBase,SequenceSummary-method}
\title{Get a Data Frame of Bae Frequency Data from a \code{SequenceSummary} object} \title{Get a Data Frame of Base Frequency Data from a \code{SequenceSummary} object}
\description{ \description{
An object that inherits from class \code{SequenceSummary} contains An object that inherits from class \code{SequenceSummary} contains
base data by position gathered by \code{readSeqFile}. \code{getBase} base frequency data by position gathered by \code{readSeqFile}. \code{getBase}
is an accessor function that reshapes the base frequency data by position is an accessor function that reshapes the base frequency data by position
into a data frame. into a data frame.


This accessor function is useful if you want to map variables to This accessor function is useful if you want to map variables to
custom \code{ggplot2} aesthetics. custom \code{ggplot2} aesthetics. Base proportions can be accessed
with \code{getBaseProp}.
} }




@@ -36,7 +37,6 @@
\section{Methods}{ \section{Methods}{
\describe{ \describe{
\item{\code{signature(x = "SequenceSummary")}}{ \item{\code{signature(x = "SequenceSummary")}}{
\code{getBase} is an accessor function that works on any object read \code{getBase} is an accessor function that works on any object read
in with \code{readSeqFile}; that is, objects that inherit from in with \code{readSeqFile}; that is, objects that inherit from
@@ -50,11 +50,18 @@
s.fastq <- readSeqFile(system.file('extdata', 'test.fastq', s.fastq <- readSeqFile(system.file('extdata', 'test.fastq',
package='qrqc')) package='qrqc'))
# A custom base quality plot # A custom base plot
ggplot(getBase(s.fastq)) + geom_line(aes(x=position, y=frequency, ggplot(getBase(s.fastq)) + geom_line(aes(x=position, y=frequency,
color=base)) + facet_grid(. ~ base) + scale_color_dna() color=base)) + facet_grid(. ~ base) + scale_color_dna()
} }
\seealso{getGC}
\seealso{getSeqlen}
\seealso{getBaseProp}
\seealso{getQual}
\seealso{getMCQual}
\seealso{basePlot} \seealso{basePlot}
\keyword{methods} \keyword{methods}
\keyword{accessor} \keyword{accessor}
63 changes: 56 additions & 7 deletions qrqc/man/getBaseProp-methods.Rd
@@ -2,16 +2,65 @@
\docType{methods} \docType{methods}
\alias{getBaseProp-methods} \alias{getBaseProp-methods}
\alias{getBaseProp,SequenceSummary-method} \alias{getBaseProp,SequenceSummary-method}
\title{ ~~ Methods for Function \code{getBaseProp} ~~} \title{Get a Data Frame of Base Proportion Data from a \code{SequenceSummary} object}
\description{ \description{
~~ Methods for function \code{getBaseProp} ~~ An object that inherits from class \code{SequenceSummary} contains
base frequency data by position gathered by \code{readSeqFile}. \code{getBaseProp}
is an accessor function that reshapes the base frequency data by position
into a data frame and calculates the proportions of each base per position.

This accessor function is useful if you want to map variables to
custom \code{ggplot2} aesthetics. Base frequencies can be accessed
with \code{getBase}.
} }
\section{Methods}{
\describe{


\item{\code{signature(x = "SequenceSummary")}}{
%% ~~describe this method here~~ \usage{
getBaseProp(x, drop=TRUE)
}

\arguments{
\item{x}{an S4 object that inherits from \code{SequenceSummary} from
\code{readSeqFile}.}
\item{drop}{a logical value indicating whether to drop bases that
don't have any counts.}
}
\value{
\code{getBaseProp} returns a \code{data.frame} with columns:
\item{position}{the position in the read.}
\item{base}{the base.}
\item{proportion}{the proportion of a base found per position in the read.}
} }
\section{Methods}{
\describe{
\item{\code{signature(x = "SequenceSummary")}}{
\code{getBaseProp} is an accessor function that works on any object read
in with \code{readSeqFile}; that is, objects that inherit from
\code{SequenceSummary}.
}
}} }}
\author{Vince Buffalo <vsbuffalo@ucdavis.edu>}
\examples{
## Load a FASTQ file, with sequence hashing.
s.fastq <- readSeqFile(system.file('extdata', 'test.fastq',
package='qrqc'))
# A custom base plot
ggplot(getBaseProp(s.fastq)) + geom_line(aes(x=position, y=proportion,
color=base)) + facet_grid(. ~ base) + scale_color_dna()
}
\seealso{getGC}
\seealso{getSeqlen}
\seealso{getBase}
\seealso{getQual}
\seealso{getMCQual}
\seealso{basePlot}
\keyword{methods} \keyword{methods}
\keyword{ ~~ other possible keyword(s) ~~ } \keyword{accessor}
63 changes: 56 additions & 7 deletions qrqc/man/getGC-methods.Rd
@@ -2,16 +2,65 @@
\docType{methods} \docType{methods}
\alias{getGC-methods} \alias{getGC-methods}
\alias{getGC,SequenceSummary-method} \alias{getGC,SequenceSummary-method}
\title{ ~~ Methods for Function \code{getGC} ~~} \title{Get a Data Frame of GC Content from a \code{SequenceSummary} object}
\description{ \description{
~~ Methods for function \code{getGC} ~~ An object that inherits from class \code{SequenceSummary} contains
base frequency data by position gathered by \code{readSeqFile}. \code{getGC}
is an accessor function that reshapes the base frequency data into a
data frame and returns the GC content by position.

This accessor function is useful if you want to map variables to
custom \code{ggplot2} aesthetics. Frequencies or proportions of all
bases (not just GC) can be accessed with \code{getBase} and
\code{getBaseProp} respectively.
} }
\section{Methods}{
\describe{


\item{\code{signature(x = "SequenceSummary")}}{ \usage{
%% ~~describe this method here~~ getGC(x)
}

\arguments{
\item{x}{an S4 object that inherits from \code{SequenceSummary} from
\code{readSeqFile}.}
}


\value{
\code{getGC} returns a \code{data.frame} with columns:

\item{position}{the position in the read.}
\item{gc}{GC content per position in the read.}
} }

\section{Methods}{
\describe{
\item{\code{signature(x = "SequenceSummary")}}{
\code{getGC} is an accessor function that works on any object read
in with \code{readSeqFile}; that is, objects that inherit from
\code{SequenceSummary}.
}
}} }}

\author{Vince Buffalo <vsbuffalo@ucdavis.edu>}
\examples{
## Load a FASTQ file, with sequence hashing.
s.fastq <- readSeqFile(system.file('extdata', 'test.fastq',
package='qrqc'))

# A custom GC plot
d <- merge(getQual(s.fastq), getGC(s.fastq), by.x="position", by.y="position")
p <- ggplot(d) + geom_linerange(aes(x=position, ymin=lower,
ymax=upper, color=gc)) + scale_color_gradient(low="red",
high="blue") + scale_y_continuous("GC content")
p
}

\seealso{getSeqlen}
\seealso{getBase}
\seealso{getBaseProp}
\seealso{getQual}
\seealso{getMCQual}

\seealso{gcPlot}
\keyword{methods} \keyword{methods}
\keyword{ ~~ other possible keyword(s) ~~ } \keyword{accessor}
62 changes: 55 additions & 7 deletions qrqc/man/getMCQual-methods.Rd
@@ -2,16 +2,64 @@
\docType{methods} \docType{methods}
\alias{getMCQual-methods} \alias{getMCQual-methods}
\alias{getMCQual,FASTQSummary-method} \alias{getMCQual,FASTQSummary-method}
\title{ ~~ Methods for Function \code{getMCQual} ~~} \title{Get a Data Frame of Simulated Qualities from a \code{FASTQSummary} object}
\description{ \description{
~~ Methods for function \code{getMCQual} ~~ An object that inherits from class \code{FASTQSummary} contains
base quality data by position gathered by \code{readSeqFile}. \code{getMCQual}
generates simulated quality data for each base from this binned
quality data that can be used for adding smoothed lines via lowess.

This accessor function is useful if you want to map variables to
custom \code{ggplot2} aesthetics.
} }
\section{Methods}{
\describe{


\item{\code{signature(x = "FASTQSummary")}}{
%% ~~describe this method here~~ \usage{
getMCQual(x, n=100)
}

\arguments{
\item{x}{an S4 object that inherits from \code{FASTQSummary} from
\code{readSeqFile}.}
\item{n}{a numeric value indicating the number of quality values to
draw per base.}
}


\value{
\code{getMCQual} returns a \code{data.frame} with columns:

\item{position}{the position in the read.}
\item{quality}{simulated quality scores.}
} }

\section{Methods}{
\describe{
\item{\code{signature(x = "FASTQSummary")}}{
\code{getMCQual} is a function that works on any object with class
\code{FASTQSummary} read in with \code{readSeqFile}.
}
}} }}

\author{Vince Buffalo <vsbuffalo@ucdavis.edu>}
\examples{
## Load a FASTQ file, with sequence hashing.
s.fastq <- readSeqFile(system.file('extdata', 'test.fastq',
package='qrqc'))

# A custom quality plot
ggplot(getQual(s.fastq)) + geom_linerange(aes(x=position, ymin=lower,
ymax=upper), color="grey") + geom_smooth(aes(x=position, y=quality),
data=getMCQual(s.fastq), color="blue", se=FALSE)
}

\seealso{getGC}
\seealso{getSeqlen}
\seealso{getBase}
\seealso{getBaseProp}
\seealso{getQual}

\seealso{qualPlot}

\keyword{methods} \keyword{methods}
\keyword{ ~~ other possible keyword(s) ~~ } \keyword{accessor}
9 changes: 7 additions & 2 deletions qrqc/man/getQual-methods.Rd
@@ -63,7 +63,12 @@
ymax=upper, color=mean)) + scale_color_gradient("mean quality", ymax=upper, color=mean)) + scale_color_gradient("mean quality",
low="red", high="green") + scale_y_continuous("quality") low="red", high="green") + scale_y_continuous("quality")
} }
\seealso{getQual} \seealso{getGC}
\seealso{list2df} \seealso{getSeqlen}
\seealso{getBase}
\seealso{getBaseProp}
\seealso{getMCQual}

\seealso{qualPlot}
\keyword{methods} \keyword{methods}
\keyword{accessor} \keyword{accessor}