nullmodel changes in 2.5-0 #255
Some information about timing: we already had some speed-up within 2.4-0 development, and 2.5-0 adds the major changes described below. Benchmarking was performed with the microbenchmark package; times are averages of 100 replicates and give the average running time in seconds per 100 simulations.

The `nullmodel()` functions have changed much during 2.5-0 development, and they are currently still partly in flux: some functions and choices may be removed before the release, and some choices may need rethinking. Here is the update of major changes against 2.4-x:

- Compiled code uses the `.Call()` interface and proper registration. The effect on speed is probably marginal, but the interface is much cleaner.
- Speed-up in `quasiswap`, `swap`, `tswap`, `quasiswap_count`, `swap_count` and `abuswap_*`, plus methods using `quasiswap` internally (`swsh_*`). Profiling (also of compiled code) showed that these methods spent most of their time generating random numbers. We used four random numbers for a 2x2 submatrix (two row indices, two column indices). Now we have two alternative schemes. `"3minus"` finds the first element directly from the matrix and then a second row (2 numbers); if these give a submatrix that cannot be swapped (both 1 or both 0) or quasiswapped (both 0), it skips finding the second column and starts a new cycle. This gives 3 or 2 random numbers per cycle (see the analysis in commit message 83281ae). In the `"2x2"` scheme we find the two diagonal elements directly with 2 random numbers, and the antidiagonal elements follow from their row and column indices (implemented in 33e9813). This always uses only 2 random numbers. It was implemented in quantitative swap methods (810d902), but surprisingly it was usually slower, and at best only marginally faster, in binary quasiswaps, where we therefore use the `"3minus"` scheme (analysis in commit message 72b2d93).
- In `greedyqswap` the first element is picked from the >1 elements, increasing the chances of sum-of-squares-reducing quasiswaps. The search for the last >1 elements takes most of the time in `quasiswap`, and being greedy gives a huge speed-up. However, greedy steps are biased, but if we thin to, say, 1% greedy steps among ordinary quasiswaps, we can still double the speed with little risk of bias.
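To make the index-drawing schemes concrete, here is a toy Python sketch (not vegan's compiled C code) of binary checkerboard swaps using the `"2x2"` idea: each attempt draws only two random linear cell indices, which form the diagonal of a 2x2 submatrix; the antidiagonal cells come for free from the row and column indices. Function and parameter names are illustrative only.

```python
import random

def swap_2x2(m, nswaps=100, max_attempts=100000, rng=None):
    """Toy binary swap using the '2x2' scheme: draw two random linear
    cell indices (2 random numbers total); they are the diagonal of a
    2x2 submatrix, and the antidiagonal cells follow from the indices.
    Row and column sums of the 0/1 matrix `m` are preserved."""
    rng = rng or random.Random()
    nr, nc = len(m), len(m[0])
    done = attempts = 0
    while done < nswaps and attempts < max_attempts:
        attempts += 1
        i1, j1 = divmod(rng.randrange(nr * nc), nc)  # 1st random number
        i2, j2 = divmod(rng.randrange(nr * nc), nc)  # 2nd random number
        if i1 == i2 or j1 == j2:
            continue  # not a proper 2x2 submatrix: start a new cycle
        # diagonal from the draws, antidiagonal for free
        a, d = m[i1][j1], m[i2][j2]
        b, c = m[i1][j2], m[i2][j1]
        if a == d and b == c and a != b:  # checkerboard: swappable
            m[i1][j1], m[i2][j2] = b, b
            m[i1][j2], m[i2][j1] = a, a
            done += 1
    return m
```

A `"3minus"`-style variant would instead draw row and column indices separately and abandon the attempt after 2 or 3 draws when the submatrix cannot be swapped; the trade-off discussed above is purely in how many random numbers each attempted cycle consumes.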
- Another new method is `boostedqswap`, which is based on the same idea as `curveball`: in `curveball` we find the set of unique species that occur in only one of two compared sites, and in `boostedqswap` we find species that are more abundant in one of the sites (1 against 0, 2 against 1, etc.) and quasiswap an equal number of these up and down on the two rows. My first tests indicate that this is biased (although the similar `curveball` should be unbiased). If so, it will be removed and not released. Both methods need testing, and neither will be released if they appear to be biased (we do not have a lack of poor null models in this world).
- `backtracking` now uses compiled code with a huge speed-up. Backtracking is biased, but it is a classic method that may be useful for comparative purposes.
- A new compiled fill method is not in the `make.commsim()` interface, but it can be called as `.Call("do_rcfill", n, rs, cs)`, where `n` is the number of simulated matrices, and `rs` and `cs` are row and column sums. The main reason for implementing this method first was that posts in R-devel and StackOverflow claimed that `stats::r2dtable()` (which we use much internally) gives too regular data. I checked the Miklós & Podani paper and found that they used a function giving more dispersed data as the initial matrix in their quasiswap, and implemented that method. My analysis indicates that the huge number of steps we need in quasiswap guarantees that the initial matrix does not influence the result, but the new function is faster than `r2dtable()` and may speed up simulations. Another use for this function is as a null model that generates count data with larger variance than `r2dtable`, which we now have as a null model. However, I still hesitate to release this function, because we really do not have a lack of poor null models (and by now I think all quantitative null models for counts are poor).
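As background on marginal-preserving count fills: a table with given row and column sums, sampled conditionally on the margins as `stats::r2dtable()` does, can be sketched in a few lines of Python by pairing shuffled column labels with row labels and tabulating the pairs. This is a deliberately naive toy (far slower than Patefield's algorithm and not the `do_rcfill` code described above, which aims at more dispersed tables); the function name is illustrative.

```python
import random
from collections import Counter

def random_margin_table(rs, cs, rng=None):
    """Toy sampler of a count table with fixed row sums `rs` and
    column sums `cs`: list each row index rs[i] times and each column
    index cs[j] times, shuffle the column labels, pair them with the
    row labels, and count the pairs. Margins hold by construction."""
    assert sum(rs) == sum(cs), "margins must have equal totals"
    rng = rng or random.Random()
    rows = [i for i, n in enumerate(rs) for _ in range(n)]
    cols = [j for j, n in enumerate(cs) for _ in range(n)]
    rng.shuffle(cols)
    counts = Counter(zip(rows, cols))
    return [[counts[i, j] for j in range(len(cs))] for i in range(len(rs))]
```

Because quasiswap only needs *some* matrix with the right margins as a starting point, and the huge number of swap steps washes out the initial configuration, any correct fill works; the practical question discussed above is only speed and, for the null-model use, the variance of the generated counts.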