Add global option old.bruvo.model; update docs
This global option allows me to give the users a
switch without having to switch it on for every
function that uses Bruvo's distance.
zkamvar committed Aug 13, 2017
1 parent 6e2aaaf commit 486928a
Showing 5 changed files with 85 additions and 54 deletions.
54 changes: 34 additions & 20 deletions R/bruvo.r
Expand Up @@ -96,7 +96,7 @@
#' distinct alleles at each locus, so you end up with genotypes that appear to
#' have a lower ploidy level than the organism.
#'
#' To help deal with these situations, Bruvo has suggested three methods for
#' To help deal with these situations, Bruvo has suggested three methods for
#' dealing with these differences in ploidy levels: \itemize{ \item
#' \strong{Infinite Model} - The simplest way to deal with it is to count all
#' missing alleles as infinitely large so that the distance between it and
Expand All @@ -107,34 +107,48 @@
#' replace with all possible combinations of the observed alleles in the
#' shorter genotype}. For example, if there is a genotype of [69, 70, 0, 0]
#' where 0 is a missing allele, the possible combinations are: [69, 70, 69,
#' 69], [69, 70, 69, 70], and [69, 70, 70, 70]. The resulting distances are
#' then averaged over the number of comparisons. \item \strong{Genome Loss
#' Model} - This is similar to the genome addition model, except that it
#' assumes that there was a recent genome reduction event and uses \strong{the
#' observed values in the full genotype to fill the missing values in the
#' short genotype}. As with the Genome Addition Model, the resulting distances
#' are averaged over the number of comparisons. \item \strong{Combination
#' Model} - Combine and average the genome addition and loss models. } As
#' mentioned above, the infinite model is biased, but it is not nearly as
#' 69], [69, 70, 69, 70], [69, 70, 70, 69], and [69, 70, 70, 70]. The
#' resulting distances are then averaged over the number of comparisons. \item
#' \strong{Genome Loss Model} - This is similar to the genome addition model,
#' except that it assumes that there was a recent genome reduction event and
#' uses \strong{the observed values in the full genotype to fill the missing
#' values in the short genotype}. As with the Genome Addition Model, the
#' resulting distances are averaged over the number of comparisons. \item
#' \strong{Combination Model} - Combine and average the genome addition and
#' loss models. }
#'
#' As mentioned above, the infinite model is biased, but it is not nearly as
#' computationally intensive as either of the other models. The reason for
#' this is that both the addition and loss models require replacement of
#' alleles and recalculation of Bruvo's distance. The number of replacements
#' required is equal to the multiset coefficient: \eqn{\left({n \choose
#' k}\right) == {(n+k-1) \choose k}}{choose(n+k-1, k)} where \emph{n} is the
#' number of potential replacements and \emph{k} is the number of alleles to
#' be replaced. So, for the example given above, The genome addition model
#' would require \eqn{\left({2 \choose 2}\right) = 3}{choose(2+2-1, 2) == 3}
#' calculations of Bruvo's distance, whereas the genome loss model would
#' require \eqn{\left({4 \choose 2}\right) = 10}{choose(4+2-1, 2) == 10}
#' calculations.
#'
#' required is equal to n^k where \emph{n} is the number of potential
#' replacements and \emph{k} is the number of alleles to be replaced.

#' To reduce the number of calculations and assumptions otherwise, Bruvo's
#' distance will be calculated using the largest observed ploidy in pairwise
#' comparisons. This means that when comparing [69,70,71,0] and [59,60,0,0],
#' they will be treated as triploids.
#' }
#'
#' @note Do not use missingno with this function.
#' @note Do not use missingno with this function.
#' \subsection{Missing alleles and Bruvo's distance in \pkg{poppr} versions < 2.5}{
#' In earlier versions of \pkg{poppr}, the authors had assumed that, because
#' the calculation of Bruvo's distance does not rely on ordered sets of
#' alleles, the imputation methods in the genome addition and genome loss
#' models would also assume unordered alleles for creating the hypothetical
#' genotypes. This means that the results from this imputation did not
#' consider all possible combinations of alleles, resulting in either an over-
#' or under- estimation of Bruvo's distance between two samples with two or
#' more missing alleles. This version of \pkg{poppr} considers all possible
#' combinations when calculating Bruvo's distance for incomplete genotypes,
#' with a negligible increase in computation time.
#'
#' If you want to see the effect of this change on your data, you can use the
#' global \pkg{poppr} option \code{old.bruvo.model}. Currently, this option is
#' \code{FALSE} and you can set it by using
#' \code{options(old.bruvo.model = TRUE)}, but make sure to reset it to
#' \code{FALSE} afterwards.
#' }
#' \subsection{Repeat Lengths (replen)}{
#' The \code{replen} argument is crucial for proper analysis of Bruvo's
#' distance since the calculation relies on the knowledge of the number of
Expand Down
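
The counting argument in the revised documentation can be checked with a few lines of base R. This is only an illustrative sketch and not code from poppr: the variable names are hypothetical, and it assumes the [69, 70, 0, 0] example, where n = 2 observed alleles can fill k = 2 missing positions. It contrasts the ordered count (n^k) with the unordered multiset count that the earlier documentation described.

```r
# Illustrative sketch only; none of these names come from poppr.
observed <- c(69, 70)  # alleles available as replacements (n = 2)
k <- 2                 # number of missing alleles to fill

# Ordered fills: each missing position is filled independently,
# so expand.grid() enumerates n^k rows.
fills <- expand.grid(rep(list(observed), k))
ordered_genotypes <- cbind(69, 70, as.matrix(fills))
n_ordered <- nrow(ordered_genotypes)

# Unordered fills: the multiset coefficient choose(n + k - 1, k).
n_unordered <- choose(length(observed) + k - 1, k)

# The ordered count always equals n^k.
stopifnot(n_ordered == length(observed)^k)
```

For this example the ordered enumeration contains both [69, 70, 69, 70] and [69, 70, 70, 69], which is exactly the genotype added to the documentation in this commit.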
11 changes: 9 additions & 2 deletions R/internal.r
Expand Up @@ -946,7 +946,7 @@ fix_negative_branch <- function(tre){
#==============================================================================#

bruvos_distance <- function(bruvomat, funk_call = match.call(), add = TRUE,
loss = TRUE, by_locus = FALSE, old_model = FALSE){
loss = TRUE, by_locus = FALSE){
x <- bruvomat@mat
ploid <- bruvomat@ploidy
replen <- bruvomat@replen
Expand All @@ -960,7 +960,14 @@ bruvos_distance <- function(bruvomat, funk_call = match.call(), add = TRUE,
perms <- .Call("permuto", ploid, PACKAGE = "poppr")

# Calculating bruvo's distance over each locus.
distmat <- .Call("bruvo_distance", x, perms, ploid, add, loss, old_model, PACKAGE = "poppr")
distmat <- .Call("bruvo_distance",
x, # data matrix
perms, # permutation vector (0-indexed)
ploid, # maximum ploidy
add, # Genome addition model switch
loss, # Genome loss model switch
getOption("old.bruvo.model"), # switch to use unordered genotypes
PACKAGE = "poppr")

# If there are missing values, the distance returns 100, which means that the
# comparison is not made. These are changed to NA.
Expand Down
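
Because the switch is now read with getOption() rather than passed as an argument, callers change the behaviour through options(). The sketch below shows one possible way to flip the option temporarily and restore it afterwards. `with_old_bruvo` and `current_model_flag` are hypothetical helpers, not part of poppr. The second helper also supplies a `default`, since base R's getOption() returns NULL when an option has never been set.

```r
# Hypothetical helpers; not part of poppr.
with_old_bruvo <- function(fun, use_old = TRUE) {
  previous <- options(old.bruvo.model = use_old)  # returns the prior value
  on.exit(options(previous), add = TRUE)          # restore even on error
  fun()
}

# Read the switch defensively: fall back to FALSE if the option is unset.
current_model_flag <- function() {
  isTRUE(getOption("old.bruvo.model", default = FALSE))
}

options(old.bruvo.model = FALSE)              # the default set by this commit
inside <- with_old_bruvo(current_model_flag)  # evaluated with the option on
after  <- current_model_flag()                # option restored afterwards
```

In practice `fun` would wrap a call to a function that uses Bruvo's distance, such as bruvo.dist(), so that old and new results can be compared without leaving the option switched on.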
3 changes: 2 additions & 1 deletion R/zzz.r
Expand Up @@ -45,7 +45,8 @@
.onAttach <- function(...) {
op <- options()
op.poppr <- list(
poppr.debug = FALSE # flag for verbosity
poppr.debug = FALSE, # flag for verbosity
old.bruvo.model = FALSE # flag for using the old model of Bruvo's distance.
)
toset <- !(names(op.poppr) %in% names(op))
if(any(toset)) options(op.poppr[toset])
Expand Down
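
The `toset` logic above registers a default only for options the user has not already set, so a value chosen before the package is attached is kept. The standalone sketch below mirrors that pattern outside the package. `register_defaults` and the `demo.*` option names are hypothetical and exist only for illustration.

```r
# Hypothetical stand-in for the pattern used in .onAttach; not poppr code.
register_defaults <- function(defaults) {
  op <- options()
  toset <- !(names(defaults) %in% names(op))  # TRUE for options not yet set
  if (any(toset)) options(defaults[toset])
  invisible(names(defaults)[toset])
}

options(demo.flag.a = TRUE)  # the user sets this one beforehand
options(demo.flag.b = NULL)  # make sure this one is unset

newly_set <- register_defaults(list(demo.flag.a = FALSE, demo.flag.b = FALSE))
```

Only `demo.flag.b` receives its default here. The pre-existing `demo.flag.a` keeps the user's value.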
53 changes: 33 additions & 20 deletions man/bruvo.dist.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

18 changes: 7 additions & 11 deletions vignettes/algo.Rnw
Expand Up @@ -429,9 +429,9 @@ with these differences in ploidy levels \citep{Bruvo:2004}:
through a recent genome expansion, the missing alleles will be replaced with
all possible combinations of the observed alleles in the shorter genotype. For
example, if there is a genotype of [69, 70, 0, 0] where 0 is a missing allele,
the possible combinations are: [69, 70, 69, 69], [69, 70, 69, 70], and [69,
70, 70, 70]. The resulting distances are then averaged over the number of
comparisons.
the possible combinations are: [69, 70, 69, 69], [69, 70, 69, 70],
[69, 70, 70, 69], and [69, 70, 70, 70]. The resulting distances are then
averaged over the number of comparisons.
\item{Genome Loss Model -} This is similar to the genome addition model,
except that it assumes that there was a recent genome reduction event and uses
the observed values in the full genotype to fill the missing values in the
Expand All @@ -444,18 +444,14 @@ with these differences in ploidy levels \citep{Bruvo:2004}:
As mentioned above, the infinite model is biased, but it is not nearly as
computationally intensive as either of the other models. The reason for this is
that both the addition and loss models require replacement of alleles and
recalculation of Bruvo's distance. The number of replacements required is equal
to the multiset coefficient: $\left({n \choose k}\right) == {(n-k+1) \choose k}$
recalculation of Bruvo's distance. The number of replacements required is $n^k$
where $n$ is the number of potential replacements and $k$ is the number of
alleles to be replaced. So, for the example given above, The genome addition
model would require $\left({2 \choose 2}\right) = 3$ calculations of Bruvo's
distance, whereas the genome loss model would require $\left({4 \choose
2}\right) = 10$ calculations.
alleles to be replaced.

To reduce the number of calculations and assumptions otherwise, Bruvo's distance
will be calculated using the largest observed ploidy in pairwise comparisons.
This means that when
comparing [69,70,71,0] and [59,60,0,0], they will be treated as triploids.
This means that when comparing [69,70,71,0] and [59,60,0,0], they will be
treated as triploids.

\subsubsection{Choosing a model}
\label{appendix:algorithm:bruvomodel}
Expand Down
