Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run_lefse, subgroup argument gives Error in if ((sx < sy) != dir_cmp || sx == sy) { : missing value where TRUE/FALSE needed #62

Closed
akhst7 opened this issue Mar 21, 2022 · 16 comments

Comments

@akhst7
Copy link

akhst7 commented Mar 21, 2022

I run a following phyloseq obj;

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 625 taxa and 12 samples ]
sample_data() Sample Data:       [ 12 samples by 2 sample variables ]
tax_table()   Taxonomy Table:    [ 625 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 625 tips and 624 internal nodes ]
refseq()      DNAStringSet:      [ 625 reference sequences ]

I run a following;

lefse.new<-run_lefse(
ps,
group="Treatment",
subgroup = "Sex",
taxa_rank = "all",
transform = "log10p",
norm = "CPM",
norm_para = list(),
kw_cutoff = 0.1,
lda_cutoff = 2,
bootstrap_n = 100,
bootstrap_fraction = 2/3,
wilcoxon_cutoff = 0.1,
multigrp_strat = FALSE,
strict = "0",
sample_min = 10,
only_same_subgrp = FALSE,
curv = FALSE
)

I get an error as follows;

Error in if ((sx < sy) != dir_cmp || sx == sy) { : 
  missing value where TRUE/FALSE needed

Here is a sample_data of ps;

Sample Data:        [12 samples by 2 sample variables]:
        Sex   Treatment
1      M           HH
2      M           HH
3      F           HH
4      F           HH
5      M           RA
6      M           RA
7      F           RA
8      F           RA
9      M           HH
10      F           HH
11      M           RA
12      F           RA

what am I missing here ?

Thanks.

@cpavloud
Copy link

I have the same error....
I am running microbiomeMarker (v 1.0.2) in R 4.1.1

@akhst7
Copy link
Author

akhst7 commented May 26, 2022

@cpavloud

Can you post a result of following ?

str(sample_data(phyloseq.obj))

@cpavloud
Copy link

Yes, of course

> str(sample_data(physeq)) 'data.frame': 26 obs. of 2 variables: Formal class 'sample_data' [package "phyloseq"] with 4 slots ..@ .Data :List of 2 .. ..$ : chr "kidneys with calcification" "SG_affected" "SG_affected" "SG_affected" ... .. ..$ : chr "Sick" "Sick" "Sick" "Sick" ... ..@ names : chr "Condition" "Health_state" ..@ row.names: chr "ASB1_LL12_1" "KOK1_LL1_1" "KOK10_LL5_2" "KOK2_LL1_2" ... ..@ .S3Class : chr "data.frame"

The functions runs if I have only group = "Condition" or only group = "Health_state"
But I get the error when I try to run with group = "Health_state" and subgroup = "Condition"

@akhst7
Copy link
Author

akhst7 commented May 27, 2022

@cpavloud ,

I did remember now that there is a bug in the code for run-lefse, right in area of test_rep_wilcoxon within lefse-utilities.R, which has not been fixed. Only thing you can do is to make another column of combined two groups in to one ;

Sex   Treatment  Sex_Treatment
1      M           HH        HH_M
2      M           HH       HH_M
3      F           HH        HH_F

Alternatively, you can start using another package called, MicrobiomeProcess. It will let you do lefse with second group as follows;

mp_diff_analysis(
.data = ps,
.abundance=normalize,
.group= Treatment,
.sec.group = Sex,

It is pretty simple to use but powerful. Overall, it is a better package for a 16S microbiome analysis.

@cpavloud
Copy link

cpavloud commented May 27, 2022

Thank you! Great package! I didn't know about it.
And I also found this very nice website: https://yulab-smu.top/MicrobiotaProcessWorkshop/articles/MicrobiotaProcessWorkshop.html

@akhst7
Copy link
Author

akhst7 commented May 28, 2022

@cpavloud, Yeah, it is an upcoming R package for the Microbiome stuff. The link you had is actually for the older version, though commands listed in this tutorial still work. I like the older version better than the newer one.

I am gonna close this issue, since I don't think the developer will fix this issue any time soon.

@yiluheihei
Copy link
Owner

yiluheihei commented May 29, 2022

@akhst7 @cpavloud Thanks for your interested in microbiomeMarker, and I'm Sorry for the later reply. Could you provide a reproducible example that would help me to fix this issue?

For more details on how to make a great minimal reproducible example, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and https://www.tidyverse.org/help/#reprex.

@cpavloud
Copy link

cpavloud commented Jun 5, 2022

I cannot use reprex to create a reproducible example as you want it, as it only renders the last line of code in the console.

I am using R version 4.1.1 and tidyverse (1.3.1), phyloseq (1.38.0) and microbiomeMarker (1.0.2)

When I try to run lefse with a group and a subgroup

OTU_lefse <- run_lefse( physeq, group = "Health_state", subgroup = "Condition", taxa_rank = "none", transform = c("identity"), norm = "CPM", kw_cutoff = 0.05, lda_cutoff = 2, bootstrap_n = 30, bootstrap_fraction = 2/3, wilcoxon_cutoff = 0.05, multigrp_strat = FALSE, strict = "0", sample_min = 7, only_same_subgrp = FALSE, curv = FALSE )

I get this error
Error in if ((sx < sy) != dir_cmp || sx == sy) { : missing value where TRUE/FALSE needed

But when I run it with group = "Health_state" and no subgroup
I get a result

OTU_lefse
microbiomeMarker-class inherited from phyloseq-class
normalization method: [ CPM ]
microbiome marker identity method: [ lefse ]
marker_table() Marker Table: [ 8 microbiome markers with 5 variables ]
otu_table() OTU Table: [ 460 taxa and 26 samples ]
sample_data() Sample Data: [ 26 samples by 2 sample variables ]
tax_table() Taxonomy Table: [ 460 taxa by 1 taxonomic ranks ]

Also, when I run it with group = "Condition" and no subgroup
again I get a result

OTU_lefse
microbiomeMarker-class inherited from phyloseq-class
normalization method: [ CPM ]
microbiome marker identity method: [ lefse ]
marker_table() Marker Table: [ 19 microbiome markers with 5 variables ]
otu_table() OTU Table: [ 460 taxa and 26 samples ]
sample_data() Sample Data: [ 26 samples by 2 sample variables ]
tax_table() Taxonomy Table: [ 460 taxa by 1 taxonomic ranks ]

This is what my sample_data look like

sample_data(OTU_lefse)
Sample Data: [26 samples by 2 sample variables]:
Condition Health_state
ASB1_LL12_1 kidneys with calcification Sick
KOK1_LL1_1 SG_affected Sick
KOK10_LL5_2 SG_affected Sick
KOK2_LL1_2 SG_affected Sick
KOK3_LL1_3 SG_affected Sick
KOK4_LL5_3 SG_affected Sick
KOK5_MM2_1 SG_affected Sick
KOK6_LL12_2 SG_affected Sick
KOK7_LL12_4 SG_affected Sick
KOK8_LL12_3 SG_affected Sick
ASB10_OO10_4 kidneys with calcification Sick
KOK9_LL5_1 SG_affected Sick
O_1_OO7_2 healthy Healthy
O_2_OO3_3 healthy Healthy
O_3_MM9_1 healthy Healthy
O_4_MM9_4 healthy Healthy
O_5_NN6_1 healthy Healthy
O_6_NN6_2 healthy Healthy
O_8_MM9_3 healthy Healthy
ASB2_NN8_2 kidneys with calcification Sick
ASB3_NN8_3 kidneys with calcification Sick
ASB4_NN11_2 kidneys with calcification Sick
ASB5_OO3_1 kidneys with calcification Sick
ASB6_OO7_1 kidneys with calcification Sick
ASB7_OO7_3 kidneys with calcification Sick
ASB8_OO10_1 kidneys with calcification Sick

@yiluheihei
Copy link
Owner

@cpavloud Please install the latest development version (1.3.2) from github in which your issue has been fixed.

devtools::install_github("yiluheihei/microbiomeMarker")

@cpavloud
Copy link

cpavloud commented Jun 6, 2022

Ok! That worked!
Thank you!

@akhst7
Copy link
Author

akhst7 commented Jun 6, 2022

@cpavloud

Could you do me a big favor ? If you have a chance, could you run lefse on MicrobiotaProcess and compared to MicrobiomeMaker and post the result somewhere so that I can see them. Thanks.

@yiluheihei
Copy link
Owner

@akhst7 lefse in microbiomeMarker is a reimplementation of LEfSe. For details of the method, please see the LEfSe paper

@cpavloud
Copy link

cpavloud commented Jun 7, 2022

This is quite interesting.

The result of mp_diff_analysis function

new_lefse <- mp_diff_analysis(
.data = MPSE,
.abundance=normalize,
.group= Health_state,
.sec.group = Condition,
relative = TRUE)

is that There are not significantly discriminative features after internal first test !

Also, as probably expected, the result of the diff_analysis function

deres <- diff_analysis(obj = physeq, classgroup = "Health_state", subclass = "Condition",
mlfun = "lda",
filtermod = "pvalue",
firstcomfun = "kruskal_test",
firstalpha = 0.05,
strictmod = TRUE,
secondcomfun = "wilcox_test",
subclmin = 3,
subclwilc = TRUE,
secondalpha = 0.01,
lda=3)

is that There are not significantly discriminative features after internalwilcox_test !

However, the run_lefse function identifies 2 microbiome markers

microbiomeMarker-class inherited from phyloseq-class
normalization method: [ CPM ]
microbiome marker identity method: [ lefse ]
marker_table() Marker Table: [ 2 microbiome markers with 5 variables ]
otu_table() OTU Table: [ 460 taxa and 26 samples ]
sample_data() Sample Data: [ 26 samples by 2 sample variables ]
tax_table() Taxonomy Table: [ 460 taxa by 1 taxonomic ranks ]

Object of class "marker_table"
feature enrich_group ef_lda pvalue padj
marker1 Otu16 Healthy 3.503214 0.04537015 0.04537015
marker2 Otu185 Sick 3.041596 0.02413744 0.02413744

@akhst7
Copy link
Author

akhst7 commented Jun 7, 2022

@cpavloud

OK that's what I thought you would get. I get the exactly the same result. Initially I thought it was due to different default normalization approaches used by MP and MB. MP uses "hellinger"(I think, according to the vignette) while MB uses "cpm". But then, when I swapped MP's RelRareAbundance with MB's cpm, and did diff_analysis, I got the same result. I have not done the opposite and have not manually calculate lefse (basically annova followed by LDA) Huttonhower's lefse package yet but it appears that different normalization significantly affects lefse results, particularly data with a borderline diff. abundance. Perhaps, yiluheihei could address which normalization method we should be using for lefse.

@yiluheihei
Copy link
Owner

@akhst7 As you said, the choice of normalization method has large effect on the result of different analysis, and there is no consensus on which method is better than others. For lesfse, I recommend use the "CPM", the default normalization method in lefse paper. And try other methods if the result is not good enough.

Moreever, microbiomeMarker currently provides a function compare_DA() which can be used to perform comparison of different methods and help users to choose a suitable normalization/differential method for their own dataset.

library(microbiomeMarker)
data(kostic_crc)

# Remove taxa not seen in at least 20% of the samples
kostic_crc <- phyloseq::filter_taxa(
    kostic_crc,
    function(x) sum(x > 0) > (0.2*length(x)), TRUE)

compare_res <- compare_DA(
    ps = kostic_crc,
    group = "DIAGNOSIS",
    methods = "lefse",
    args = list(lefse = list(list(norm = "CPM"), list(norm = "TSS"))),
    BPPARAM = BiocParallel::SnowParam(workers = 2, progressbar = TRUE),
    n_rep = 2,
    effect_size = 5)
#>   |                                                                              |                                                                      |   0%  |                                                                              |==================                                                    |  25%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================================                  |  75%  |                                                                              |======================================================================| 100%
compare_summary <- summary(compare_res)
#> Warning: Best score is <= 0.
#> You might require to preprocessing your data or re-run with a higher effect size.
compare_summary
#>                                                                                  call
#> 2 run_lefse(ps = "kostic_crc", group = "DIAGNOSIS", taxa_rank = "none", norm = "TSS")
#> 1 run_lefse(ps = "kostic_crc", group = "DIAGNOSIS", taxa_rank = "none", norm = "CPM")
#>         auc        fpr     power       fdr      score score_0.05 score_0.95
#> 2 0.5000000 0.00000000 0.0000000 0.0000000  0.0000000       0.00  0.0000000
#> 1 0.8755844 0.03832834 0.3333333 0.6388889 -0.5136941      -0.55 -0.4823232

Created on 2022-06-07 by the reprex package (v2.0.1)

The higher the score in the summary result, the better the method is estimated to be. However, this feature is in beta version, and only taxa_rank = "none" is allowed.

Hope this helps.

@akhst7
Copy link
Author

akhst7 commented Jun 7, 2022

Thanks. It is much clear now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants