Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to median instead of mean #242

Closed
sydneyg opened this issue Jul 12, 2017 · 14 comments
Closed

Switch to median instead of mean #242

sydneyg opened this issue Jul 12, 2017 · 14 comments

Comments

@sydneyg
Copy link

@sydneyg sydneyg commented Jul 12, 2017

@Microbiology Hi can you please add an option to make a median of the subsampled OTU tables instead of the mean?

@gavinsimpson
Copy link
Contributor

@gavinsimpson gavinsimpson commented Jul 12, 2017

Which functions did you have in mind for this?

@sydneyg
Copy link
Author

@sydneyg sydneyg commented Jul 12, 2017

sorry, I've never written an issue before, this is the avgdist.Rd function.

@gavinsimpson
Copy link
Contributor

@gavinsimpson gavinsimpson commented Jul 12, 2017

This seems trivial to add.

@Microbiology --- as you committed this function, do you have any input on whether this is a useful feature to add (& also #243)? (I'm not expecting you to implement it of course!)

@jarioksa
Copy link
Contributor

@jarioksa jarioksa commented Jul 13, 2017

The avgdist function does not find the mean of OTU tables, but it calculates the dissimilarities for each subsampled OTU table and returns the average of these dissimilarities. In principle, it could return the median of these dissimilarities, but you were not suggesting this? (And this means having a new trick to calculate such a median).

@jarioksa
Copy link
Contributor

@jarioksa jarioksa commented Jul 13, 2017

Replacing the line

findist <- Reduce("+", distlist)/length(distlist)

with

findist <- apply(do.call(rbind, distlist), 2, meanfun, ...)

and having new argument meanfun = mean would allow user-selected function to find mean, and having those dots (...) allows mean functions that take extra arguments (such as trim).

@Microbiology
Copy link
Contributor

@Microbiology Microbiology commented Jul 13, 2017

Thanks @sydneyg for the issue and the feedback (I had suggested this and the other issue be created in a previous conversation).

I will work on adding this in to the avgdist function, using the info from @jarioksa as well. :)

@Microbiology
Copy link
Contributor

@Microbiology Microbiology commented Jul 29, 2017

I added this functionality and created a pull request to incorporate it (#244).

Also it looks like I am unable to close this issue myself. I think @sydneyg might have to be the one to close it, provided this all looks okay.

@jarioksa
Copy link
Contributor

@jarioksa jarioksa commented Aug 7, 2017

Closed with PR #244

@jarioksa jarioksa closed this Aug 7, 2017
@sydneyg
Copy link
Author

@sydneyg sydneyg commented Nov 28, 2017

Thank you for adding this feature to avgdist.R Microbiology! However, when I ran the script I got the same results regardless of using the mean or median options. Is this implemented correctly? Also, can you please put in the dcoumentation that Bray-Curtis is the default dissimilarity index?

Thanks!

avgdist_7kmean <- avgdist(otu, 7000, meanfun= mean, transf= sqrt )
avgdist_7kmedian <- avgdist(otu, 7000, meanfun=median, transf= sqrt )

@Microbiology
Copy link
Contributor

@Microbiology Microbiology commented Dec 12, 2017

Thanks for pointing this out @sydneyg ! :) I think you need to reopen the issue since you are the original poster.

I'll look into this and follow up.

@Microbiology
Copy link
Contributor

@Microbiology Microbiology commented Dec 12, 2017

Hey @sydneyg , so I was looking into the problem and things seem to be working as far as I can tell, but I may be missing something. I can switch the default to median and it still runs as expected, without any mention of mean so I see how it would be doing the mean in that case.

Thinking about it though, I would probably not expect the results to be massively different since we are taking the mean and median of the random iterations. Could you provide a small test set where you would expect to see very different results between the mean and median of the random iterations? Or does it make sense that they would be very close results?

Thanks again for your helpful feedback. :)

@jarioksa
Copy link
Contributor

@jarioksa jarioksa commented Dec 12, 2017

@sydneyg @Microbiology : I had a quick look at the function, and in my inspection avgdist did use the median when so requested. Probably the results are just very similar to each other. They are very similar here, for instance:

> set.seed(4711); meand <- avgdist(BCI, 100)
> set.seed(4711); medid <- avgdist(BCI, 100, meanfun=median)
> range(meand-medid)
[1] -0.0114  0.0141

You must use set.seed() to guarantee same random rarefaction in two calls.

This experiment shows that results change when you change mean function. My analysis of the code shows that the change happens when you switch to median(), and it indeed is median() that is called.

@jarioksa
Copy link
Contributor

@jarioksa jarioksa commented Dec 12, 2017

One way of inspecting the issue is to check the variability of simulations. You can do this by abusing the function like this:

> set.seed(4711); sdd <- avgdist(BCI, 100, meanfun=sd)
> summary(sdd)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.03319 0.04048 0.04266 0.04271 0.04478 0.05539 

So you ask for sd() instead of having mean() or median(), and this will show that you do not have much variation among your simulations. @sydneyg : you used rarefaction to 7000 items, and that is such a high number that I would expect the simulations have very low variance.

@Microbiology : I just wonder if some measure of variability would be useful diagnostic here.

@Microbiology
Copy link
Contributor

@Microbiology Microbiology commented Dec 12, 2017

Thanks @jarioksa, it sounds like we are on the same page. :)

I like your idea of using sd, that is very clever. I agree that it is worth including in the testing. I will write that into the documentation example testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.