run_lefse, subgroup argument gives Error in if ((sx < sy) != dir_cmp || sx == sy) { : missing value where TRUE/FALSE needed #62

akhst7 · 2022-03-21T20:28:36Z

I run a following phyloseq obj;

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 625 taxa and 12 samples ]
sample_data() Sample Data:       [ 12 samples by 2 sample variables ]
tax_table()   Taxonomy Table:    [ 625 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 625 tips and 624 internal nodes ]
refseq()      DNAStringSet:      [ 625 reference sequences ]

I run a following;

lefse.new<-run_lefse(
ps,
group="Treatment",
subgroup = "Sex",
taxa_rank = "all",
transform = "log10p",
norm = "CPM",
norm_para = list(),
kw_cutoff = 0.1,
lda_cutoff = 2,
bootstrap_n = 100,
bootstrap_fraction = 2/3,
wilcoxon_cutoff = 0.1,
multigrp_strat = FALSE,
strict = "0",
sample_min = 10,
only_same_subgrp = FALSE,
curv = FALSE
)

I get an error as follows;

Error in if ((sx < sy) != dir_cmp || sx == sy) { : 
  missing value where TRUE/FALSE needed

Here is a sample_data of ps;

Sample Data:        [12 samples by 2 sample variables]:
        Sex   Treatment
1      M           HH
2      M           HH
3      F           HH
4      F           HH
5      M           RA
6      M           RA
7      F           RA
8      F           RA
9      M           HH
10      F           HH
11      M           RA
12      F           RA

what am I missing here ?

Thanks.

The text was updated successfully, but these errors were encountered:

cpavloud · 2022-05-25T23:00:36Z

I have the same error....
I am running microbiomeMarker (v 1.0.2) in R 4.1.1

akhst7 · 2022-05-26T18:01:26Z

@cpavloud

Can you post a result of following ?

str(sample_data(phyloseq.obj))

cpavloud · 2022-05-26T21:32:16Z

Yes, of course

> str(sample_data(physeq)) 'data.frame': 26 obs. of 2 variables: Formal class 'sample_data' [package "phyloseq"] with 4 slots ..@ .Data :List of 2 .. ..$ : chr "kidneys with calcification" "SG_affected" "SG_affected" "SG_affected" ... .. ..$ : chr "Sick" "Sick" "Sick" "Sick" ... ..@ names : chr "Condition" "Health_state" ..@ row.names: chr "ASB1_LL12_1" "KOK1_LL1_1" "KOK10_LL5_2" "KOK2_LL1_2" ... ..@ .S3Class : chr "data.frame"

The functions runs if I have only group = "Condition" or only group = "Health_state"
But I get the error when I try to run with group = "Health_state" and subgroup = "Condition"

akhst7 · 2022-05-27T16:18:39Z

@cpavloud ,

I did remember now that there is a bug in the code for run-lefse, right in area of test_rep_wilcoxon within lefse-utilities.R, which has not been fixed. Only thing you can do is to make another column of combined two groups in to one ;

Sex   Treatment  Sex_Treatment
1      M           HH        HH_M
2      M           HH       HH_M
3      F           HH        HH_F

Alternatively, you can start using another package called, MicrobiomeProcess. It will let you do lefse with second group as follows;

mp_diff_analysis(
.data = ps,
.abundance=normalize,
.group= Treatment,
.sec.group = Sex,

It is pretty simple to use but powerful. Overall, it is a better package for a 16S microbiome analysis.

cpavloud · 2022-05-27T22:04:45Z

Thank you! Great package! I didn't know about it.
And I also found this very nice website: https://yulab-smu.top/MicrobiotaProcessWorkshop/articles/MicrobiotaProcessWorkshop.html

akhst7 · 2022-05-28T00:54:08Z

@cpavloud, Yeah, it is an upcoming R package for the Microbiome stuff. The link you had is actually for the older version, though commands listed in this tutorial still work. I like the older version better than the newer one.

I am gonna close this issue, since I don't think the developer will fix this issue any time soon.

yiluheihei · 2022-05-29T14:36:23Z

@akhst7 @cpavloud Thanks for your interested in microbiomeMarker, and I'm Sorry for the later reply. Could you provide a reproducible example that would help me to fix this issue?

For more details on how to make a great minimal reproducible example, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and https://www.tidyverse.org/help/#reprex.

cpavloud · 2022-06-05T21:42:43Z

I cannot use reprex to create a reproducible example as you want it, as it only renders the last line of code in the console.

I am using R version 4.1.1 and tidyverse (1.3.1), phyloseq (1.38.0) and microbiomeMarker (1.0.2)

When I try to run lefse with a group and a subgroup

OTU_lefse <- run_lefse( physeq, group = "Health_state", subgroup = "Condition", taxa_rank = "none", transform = c("identity"), norm = "CPM", kw_cutoff = 0.05, lda_cutoff = 2, bootstrap_n = 30, bootstrap_fraction = 2/3, wilcoxon_cutoff = 0.05, multigrp_strat = FALSE, strict = "0", sample_min = 7, only_same_subgrp = FALSE, curv = FALSE )

I get this error
Error in if ((sx < sy) != dir_cmp || sx == sy) { : missing value where TRUE/FALSE needed

But when I run it with group = "Health_state" and no subgroup
I get a result

OTU_lefse
microbiomeMarker-class inherited from phyloseq-class
normalization method: [ CPM ]
microbiome marker identity method: [ lefse ]
marker_table() Marker Table: [ 8 microbiome markers with 5 variables ]
otu_table() OTU Table: [ 460 taxa and 26 samples ]
sample_data() Sample Data: [ 26 samples by 2 sample variables ]
tax_table() Taxonomy Table: [ 460 taxa by 1 taxonomic ranks ]

Also, when I run it with group = "Condition" and no subgroup
again I get a result

OTU_lefse
microbiomeMarker-class inherited from phyloseq-class
normalization method: [ CPM ]
microbiome marker identity method: [ lefse ]
marker_table() Marker Table: [ 19 microbiome markers with 5 variables ]
otu_table() OTU Table: [ 460 taxa and 26 samples ]
sample_data() Sample Data: [ 26 samples by 2 sample variables ]
tax_table() Taxonomy Table: [ 460 taxa by 1 taxonomic ranks ]

This is what my sample_data look like

sample_data(OTU_lefse)
Sample Data: [26 samples by 2 sample variables]:
Condition Health_state
ASB1_LL12_1 kidneys with calcification Sick
KOK1_LL1_1 SG_affected Sick
KOK10_LL5_2 SG_affected Sick
KOK2_LL1_2 SG_affected Sick
KOK3_LL1_3 SG_affected Sick
KOK4_LL5_3 SG_affected Sick
KOK5_MM2_1 SG_affected Sick
KOK6_LL12_2 SG_affected Sick
KOK7_LL12_4 SG_affected Sick
KOK8_LL12_3 SG_affected Sick
ASB10_OO10_4 kidneys with calcification Sick
KOK9_LL5_1 SG_affected Sick
O_1_OO7_2 healthy Healthy
O_2_OO3_3 healthy Healthy
O_3_MM9_1 healthy Healthy
O_4_MM9_4 healthy Healthy
O_5_NN6_1 healthy Healthy
O_6_NN6_2 healthy Healthy
O_8_MM9_3 healthy Healthy
ASB2_NN8_2 kidneys with calcification Sick
ASB3_NN8_3 kidneys with calcification Sick
ASB4_NN11_2 kidneys with calcification Sick
ASB5_OO3_1 kidneys with calcification Sick
ASB6_OO7_1 kidneys with calcification Sick
ASB7_OO7_3 kidneys with calcification Sick
ASB8_OO10_1 kidneys with calcification Sick

yiluheihei · 2022-06-06T05:28:35Z

@cpavloud Please install the latest development version (1.3.2) from github in which your issue has been fixed.

devtools::install_github("yiluheihei/microbiomeMarker")

cpavloud · 2022-06-06T15:17:51Z

Ok! That worked!
Thank you!

akhst7 · 2022-06-06T22:26:50Z

@cpavloud

Could you do me a big favor ? If you have a chance, could you run lefse on MicrobiotaProcess and compared to MicrobiomeMaker and post the result somewhere so that I can see them. Thanks.

yiluheihei · 2022-06-07T01:28:22Z

@akhst7 lefse in microbiomeMarker is a reimplementation of LEfSe. For details of the method, please see the LEfSe paper

cpavloud · 2022-06-07T01:30:04Z

This is quite interesting.

The result of mp_diff_analysis function

new_lefse <- mp_diff_analysis(
.data = MPSE,
.abundance=normalize,
.group= Health_state,
.sec.group = Condition,
relative = TRUE)

is that There are not significantly discriminative features after internal first test !

Also, as probably expected, the result of the diff_analysis function

deres <- diff_analysis(obj = physeq, classgroup = "Health_state", subclass = "Condition",
mlfun = "lda",
filtermod = "pvalue",
firstcomfun = "kruskal_test",
firstalpha = 0.05,
strictmod = TRUE,
secondcomfun = "wilcox_test",
subclmin = 3,
subclwilc = TRUE,
secondalpha = 0.01,
lda=3)

is that There are not significantly discriminative features after internalwilcox_test !

However, the run_lefse function identifies 2 microbiome markers

microbiomeMarker-class inherited from phyloseq-class
normalization method: [ CPM ]
microbiome marker identity method: [ lefse ]
marker_table() Marker Table: [ 2 microbiome markers with 5 variables ]
otu_table() OTU Table: [ 460 taxa and 26 samples ]
sample_data() Sample Data: [ 26 samples by 2 sample variables ]
tax_table() Taxonomy Table: [ 460 taxa by 1 taxonomic ranks ]

Object of class "marker_table"
feature enrich_group ef_lda pvalue padj
marker1 Otu16 Healthy 3.503214 0.04537015 0.04537015
marker2 Otu185 Sick 3.041596 0.02413744 0.02413744

akhst7 · 2022-06-07T02:21:02Z

@cpavloud

OK that's what I thought you would get. I get the exactly the same result. Initially I thought it was due to different default normalization approaches used by MP and MB. MP uses "hellinger"(I think, according to the vignette) while MB uses "cpm". But then, when I swapped MP's RelRareAbundance with MB's cpm, and did diff_analysis, I got the same result. I have not done the opposite and have not manually calculate lefse (basically annova followed by LDA) Huttonhower's lefse package yet but it appears that different normalization significantly affects lefse results, particularly data with a borderline diff. abundance. Perhaps, yiluheihei could address which normalization method we should be using for lefse.

yiluheihei · 2022-06-07T05:31:17Z

@akhst7 As you said, the choice of normalization method has large effect on the result of different analysis, and there is no consensus on which method is better than others. For lesfse, I recommend use the "CPM", the default normalization method in lefse paper. And try other methods if the result is not good enough.

Moreever, microbiomeMarker currently provides a function compare_DA() which can be used to perform comparison of different methods and help users to choose a suitable normalization/differential method for their own dataset.

library(microbiomeMarker)
data(kostic_crc)

# Remove taxa not seen in at least 20% of the samples
kostic_crc <- phyloseq::filter_taxa(
    kostic_crc,
    function(x) sum(x > 0) > (0.2*length(x)), TRUE)

compare_res <- compare_DA(
    ps = kostic_crc,
    group = "DIAGNOSIS",
    methods = "lefse",
    args = list(lefse = list(list(norm = "CPM"), list(norm = "TSS"))),
    BPPARAM = BiocParallel::SnowParam(workers = 2, progressbar = TRUE),
    n_rep = 2,
    effect_size = 5)
#>   |                                                                              |                                                                      |   0%  |                                                                              |==================                                                    |  25%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================================                  |  75%  |                                                                              |======================================================================| 100%
compare_summary <- summary(compare_res)
#> Warning: Best score is <= 0.
#> You might require to preprocessing your data or re-run with a higher effect size.
compare_summary
#>                                                                                  call
#> 2 run_lefse(ps = "kostic_crc", group = "DIAGNOSIS", taxa_rank = "none", norm = "TSS")
#> 1 run_lefse(ps = "kostic_crc", group = "DIAGNOSIS", taxa_rank = "none", norm = "CPM")
#>         auc        fpr     power       fdr      score score_0.05 score_0.95
#> 2 0.5000000 0.00000000 0.0000000 0.0000000  0.0000000       0.00  0.0000000
#> 1 0.8755844 0.03832834 0.3333333 0.6388889 -0.5136941      -0.55 -0.4823232

^{Created on 2022-06-07 by the reprex package (v2.0.1)}

The higher the score in the summary result, the better the method is estimated to be. However, this feature is in beta version, and only taxa_rank = "none" is allowed.

Hope this helps.

akhst7 · 2022-06-07T22:22:24Z

Thanks. It is much clear now.

yiluheihei mentioned this issue May 30, 2022

error when adding subgroup in lefse #55

Closed

yiluheihei closed this as completed in 02bbf87 May 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

run_lefse, subgroup argument gives Error in if ((sx < sy) != dir_cmp || sx == sy) { : missing value where TRUE/FALSE needed #62

run_lefse, subgroup argument gives Error in if ((sx < sy) != dir_cmp || sx == sy) { : missing value where TRUE/FALSE needed #62

akhst7 commented Mar 21, 2022

cpavloud commented May 25, 2022

akhst7 commented May 26, 2022

cpavloud commented May 26, 2022

akhst7 commented May 27, 2022

cpavloud commented May 27, 2022 •

edited

akhst7 commented May 28, 2022

yiluheihei commented May 29, 2022 •

edited

cpavloud commented Jun 5, 2022

yiluheihei commented Jun 6, 2022

cpavloud commented Jun 6, 2022

akhst7 commented Jun 6, 2022

yiluheihei commented Jun 7, 2022

cpavloud commented Jun 7, 2022

akhst7 commented Jun 7, 2022

yiluheihei commented Jun 7, 2022

akhst7 commented Jun 7, 2022

run_lefse, subgroup argument gives Error in if ((sx < sy) != dir_cmp || sx == sy) { : missing value where TRUE/FALSE needed #62

run_lefse, subgroup argument gives Error in if ((sx < sy) != dir_cmp || sx == sy) { : missing value where TRUE/FALSE needed #62

Comments

akhst7 commented Mar 21, 2022

cpavloud commented May 25, 2022

akhst7 commented May 26, 2022

cpavloud commented May 26, 2022

akhst7 commented May 27, 2022

cpavloud commented May 27, 2022 • edited

akhst7 commented May 28, 2022

yiluheihei commented May 29, 2022 • edited

cpavloud commented Jun 5, 2022

yiluheihei commented Jun 6, 2022

cpavloud commented Jun 6, 2022

akhst7 commented Jun 6, 2022

yiluheihei commented Jun 7, 2022

cpavloud commented Jun 7, 2022

akhst7 commented Jun 7, 2022

yiluheihei commented Jun 7, 2022

akhst7 commented Jun 7, 2022

cpavloud commented May 27, 2022 •

edited

yiluheihei commented May 29, 2022 •

edited