
nullmodel changes in 2.5-0 #255

Closed
jarioksa opened this issue Oct 11, 2017 · 2 comments

jarioksa commented Oct 11, 2017

The nullmodel() functions have changed considerably during 2.5-0 development, and parts of the code are still in flux: some functions and choices may be removed before the release, and some may need rethinking. Here is a summary of the major changes against 2.4-x:

  • Compiled code now uses the .Call() interface with proper registration. The effect on speed is probably marginal, but the interface is much cleaner.
  • The methods picking 2x2 submatrices are now much more parsimonious with random numbers, which gives a considerable speed-up but also means that the results differ from the previous implementation when using the same seed. This concerns methods quasiswap, swap, tswap, quasiswap_count, swap_count and abuswap_*, plus methods using quasiswap internally (swsh_*). Profiling (also of compiled code) showed that these methods spent most of their time generating random numbers. We used four random numbers per 2x2 submatrix (two row indices, two column indices). Now we have two alternative schemes. The "3minus" scheme first picks one element directly from the matrix and then a second row (2 random numbers); if these give a submatrix that cannot be swapped (both cells 1 or both 0) or quasiswapped (both 0), it skips drawing the second column and starts a new cycle. This uses 3 or 2 random numbers per cycle (see the analysis in commit message 83281ae); a simplified sketch of the idea appears after this list. In the "2x2" scheme we draw two diagonal elements directly with 2 random numbers and take the antidiagonal elements from their row and column indices (implemented in 33e9813), so it always uses exactly 2 random numbers. This was implemented in the quantitative swap methods (810d902), but surprisingly it was usually slower and at best only marginally faster in binary quasiswaps, where we therefore use the "3minus" scheme (analysis in commit message 72b2d93).
  • There are two new quasiswap algorithms on steroids (merged in 175889c). In greedyqswap the first element is picked from the >1 elements, increasing the chances of sum-of-squares-reducing quasiswaps. The search for the last >1 elements takes most of the time in quasiswap, and being greedy gives a huge speed-up. Greedy steps are biased, but if we thin to, say, 1% greedy steps among ordinary quasiswaps, we can still double the speed with little risk of bias. The other method is boostedqswap, which is based on the same idea as curveball: in curveball we find the set of species that occur in only one of two compared sites, and in boostedqswap we find the species that are more abundant in one of the sites (1 against 0, 2 against 1, etc.) and quasiswap an equal number of these up and down on the two rows. My first tests indicate that this is biased (although the similar curveball should be unbiased). If so, it will be removed and not released. Both methods need testing, and neither will be released if it appears to be biased (there is no lack of poor null models in this world).
  • Backtracking now uses compiled code, with a huge speed-up. Backtracking is biased, but it is a classic method that may be useful for comparative purposes.
  • There is a new quantitative method that fills a count matrix honouring row and column sums. The method is non-public and has no make.commsim() interface, but it can be called as .Call("do_rcfill", n, rs, cs), where n is the number of simulated matrices and rs and cs are the row and column sums (a hedged usage sketch also follows this list). The main reason for implementing this method was that posts on R-devel and StackOverflow claimed that stats::r2dtable() (which we use extensively internally) gives too regular data. I checked the Miklós & Podani paper and found that they used a function giving more dispersed data as the initial matrix in their quasiswap, and implemented that method. My analysis indicates that the huge number of steps we need in quasiswap guarantees that the initial matrix does not influence the result, but the new function is faster than r2dtable() and may speed up simulations. Another use for this function would be as a null model that generates count data with larger variance than the r2dtable null model we have now. However, I still hesitate to release this function, because there really is no lack of poor null models (and by now I think all quantitative null models for counts are poor).
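
As a forward-referenced illustration of the "3minus" scheme: the sketch below is a simplified R version of the random-number accounting described above, for binary matrices only. It is not the vegan implementation (which is in C); the function name is hypothetical, and details such as indexing and rejection handling may differ.

    ## Hypothetical, simplified R sketch of the "3minus" 2x2 sampling idea:
    ## draw one cell and a second row (2 random numbers); draw a second
    ## column (a 3rd number) only if the column pair could still be swapped.
    swap_3minus_step <- function(m) {
        nr <- nrow(m); nc <- ncol(m)
        ## 1st random number: one cell drawn directly from the matrix
        cell <- sample.int(nr * nc, 1L)
        i1 <- (cell - 1L) %% nr + 1L
        j1 <- (cell - 1L) %/% nr + 1L
        ## 2nd random number: a second row
        i2 <- sample.int(nr, 1L)
        ## both cells 0 or both 1: no swap is possible, so skip the
        ## second column and start a new cycle (only 2 numbers used)
        if (i2 == i1 || m[i1, j1] == m[i2, j1])
            return(m)
        ## 3rd random number: the second column
        j2 <- sample.int(nc, 1L)
        ## swap only if the 2x2 submatrix has the checkerboard pattern
        if (j2 != j1 && m[i1, j2] != m[i2, j2] && m[i1, j1] != m[i1, j2]) {
            m[c(i1, i2), c(j1, j2)] <- 1L - m[c(i1, i2), c(j1, j2)]
        }
        m
    }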
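
The internal count-filling routine can be exercised roughly as below. This is a hedged sketch: do_rcfill is unexported and undocumented, the PACKAGE argument and integer coercion are assumptions, and both the entry point and its return value may change or disappear before a release.

    library(vegan)
    data(BCI)
    rs <- as.integer(rowSums(BCI))
    cs <- as.integer(colSums(BCI))

    ## one simulated count matrix honouring the row and column sums
    ## (unexported entry point; shape of the result is not documented here)
    sim <- .Call("do_rcfill", 1L, rs, cs, PACKAGE = "vegan")
    str(sim)

    ## for comparison, the r2dtable() fill currently used internally
    ref <- r2dtable(1, rs, cs)[[1]]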

jarioksa commented Oct 11, 2017

Here are microbenchmark results showing the effect of do_rcfill on the quasiswap methods.
[Figure: microbenchmark timings of quasiswap variants with r2dtable vs. rcfill initial matrices (mb_rcquasi)]
Currently we use r2dtable() to get the initial matrices for the quasiswap methods, and the uppermost pair shows the speed advantage of switching to rcfill. The graph also shows the intolerable slowness of quasiswap: because the basic quasiswap is so slow, the switch to rcfill has little effect there. The greedyqswap pair has thin=100, meaning that only the first of every 100 steps is greedy (1% greediness), and even this gives a huge speed-up. quasiswap_count spent some 75% of its time in r2dtable, so the boost from using rcfill instead of r2dtable was also large there.
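
A comparison along these lines could be set up with microbenchmark roughly as below. This is only a sketch under assumptions: do_rcfill is unexported and may not be callable like this in a released version, and the timings will of course depend on the machine.

    library(vegan)
    library(microbenchmark)
    data(BCI)
    rs <- as.integer(rowSums(BCI))
    cs <- as.integer(colSums(BCI))

    ## time the two ways of generating 100 initial count matrices
    microbenchmark(
        r2dtable = r2dtable(100, rs, cs),
        rcfill   = .Call("do_rcfill", 100L, rs, cs, PACKAGE = "vegan"),
        times = 10
    )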


jarioksa commented Jan 17, 2018

Some information about timing. We already got some speed-up within the 2.4-x releases:

  • the random-number routines no longer check for sufficient data at every step (the check is done once beforehand)
  • make.commsim uses a snappier loop
  • "backtrack" was made faster (but still uses R)

The major changes in 2.5-0:

  • the .Call() interface is used with faster internal C code: fewer random numbers are generated, and loops were moved from R to C
  • "backtrack" is implemented in C

Benchmarking was performed with the microbenchmark package. Times are averages over 100 replicates and give the running time in seconds per 100 simulations of the BCI data (a rough setup sketch follows the table):

Model          Arguments                   2.4-0    2.4-5    2.5-0   Notes
backtrack                                  32.504   23.282   2.085   biased
swap           thin=2000, burnin=1000       0.295    0.283   0.175   sequential, biased
tswap          thin=40000, burnin=25000     0.232    0.226   0.121   sequential
curveball      thin=200                     0.064    0.052   0.037   sequential
quasiswap      thin=1000                    8.675    8.559   7.262   mild bias with thin=0
greedyqswap    thin=100                                      2.035   biased with thin=0
greedyqswap    thin=20                                       1.000
boostedqswap                                                 1.234   biased b067566
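
A benchmark of this kind could be set up roughly as below. This is a sketch, not the script behind the table: it uses only methods available in released vegan (quasiswap and curveball), and the thin value follows the curveball row of the table; the development-only methods would be substituted where available.

    library(vegan)
    library(microbenchmark)
    data(BCI)

    ## time 100 simulations of two null models on the BCI data
    nm_quasi <- nullmodel(BCI, "quasiswap")
    nm_curve <- nullmodel(BCI, "curveball")

    microbenchmark(
        quasiswap = simulate(nm_quasi, nsim = 100),
        curveball = simulate(nm_curve, nsim = 100, thin = 200),
        times = 10
    )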
