Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lefser producing empty results when given pathway data #33

Closed
jmartin77777 opened this issue Jan 24, 2024 · 4 comments
Closed

lefser producing empty results when given pathway data #33

jmartin77777 opened this issue Jan 24, 2024 · 4 comments
Assignees

Comments

@jmartin77777
Copy link

I am trying to use the lefser package to generate LDA effect size on a table of pathway relative abundances. I am able to use the tool to reproduce your example case with the zeller14 dataset. But when I try loading my own data it gives me an empty table. I am starting with an input tab-delimited text file that looks like this:

ID	COMPARISON	12DICHLORETHDEG-PWY	1CMET2-PWY
MB20-007-MB20-007_01	CTRL	0	366.878
MB19-001-MB19-001_01	CTRL	4.50847	318.903
MB20-010-MB20-010_01	PD	0	439.004
MB20-012-MB20-012_01	PD	151.322	433.688
MB20-014-MB20-014_01	PD	98.831	435.307

with each row being a sample, column 1 if the samplename, column 2 is the class, further columns are pathway CPM abundance values generated by Humann3. My R script looks like this:

data <- read.csv("Campbell_PWY_abundance.tsv", sep = "\t", header = TRUE, check.names = FALSE)
group <- data$COMPARISON # Capture the 'COMPARISON' column
abundance_data <- data[, -c(1,2)] # Exclude the 'ID' & 'COMPARISON' column
abundance_data_transposed <- t(abundance_data) # Transpose table
sample_metadata <- DataFrame(COMPARISON = group, row.names = data$ID)
se <- SummarizedExperiment(assays = list(counts = abundance_data_transposed),
                           colData = sample_metadata)
se_relative <- relativeAb(se)
lefse_results <- lefser(se_relative, groupCol = "COMPARISON")

It does generate the lefse_results object, but its just an empty table. I was initially concerned that maybe lefser was not happy with the features being pathways, and not the usual formatted taxonomy. But I then tried importing a taxonomy based table (of the same format as shown above) and am getting the same result. Can you help me troubleshoot what I am doing wrong?

@lwaldron
Copy link
Member

lwaldron commented Feb 1, 2024

@jmartin77777 can you reproduce this with a dataset from curatedMetagenomicData, so that we can reproduce it?

@jmartin77777
Copy link
Author

Sorry for the delayed response. I'll try and find a curated dataset to test. Can you recommend any small datasets that might be appropriate?

@lwaldron
Copy link
Member

lwaldron commented Feb 8, 2024

AsnicarF_2017 from curatedMetagenomicData has 24 samples and includes HUMAnN3 pathways https://waldronlab.io/curatedMetagenomicData/articles/available-studies.html

@lwaldron
Copy link
Member

With issues #32 and #35 resolved in release and devel, things should be clearer now as you will see a message if there are no results that pass both the statistical significance threshold and the LDA threshold. You can test which of these is responsible for the lack of results, such as follows (such as setting a p-value threshold to 1, or a LDA threshold to 0).

suppressPackageStartupMessages({
  library(lefser)
  library(curatedMetagenomicData)
})
zeller <-
  curatedMetagenomicData("ZellerG_2014.pathway_abundance",
                         counts = TRUE,
                         dryrun = FALSE)[[1]]
#> Cannot connect to ExperimentHub server, using 'localHub=TRUE' instead
#> Using 'localHub=TRUE'
#>   If offline, please also see BiocManager vignette section on offline use
zeller <- zeller[, zeller$study_condition != "adenoma"]
zeller <- relativeAb(zeller)

lefse_results1 <-
  lefser(
    zeller,
    groupCol = "study_condition",
    kruskal.threshold = 0.05,
    lda.threshold = 2
  )
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#>  See `?factor` or `?relevel` to change the reference category.
head(lefse_results1)
#>                                                                 Names    scores
#> 1                 UNINTEGRATED|g__Ruminococcus.s__Ruminococcus_bromii -3.473240
#> 2 UNINTEGRATED|g__Lachnospiraceae_unclassified.s__Eubacterium_rectale -3.423555
#> 3                 UNINTEGRATED|g__Anaerostipes.s__Anaerostipes_hadrus -3.200505
#> 4                            UNINTEGRATED|g__Blautia.s__Blautia_obeum -3.199587
#> 5                         UNINTEGRATED|g__Blautia.s__Blautia_wexlerae -3.160973
#> 6           UNINTEGRATED|g__Bifidobacterium.s__Bifidobacterium_longum -3.037694

lefse_results2 <-
  lefser(
    zeller,
    groupCol = "study_condition",
    kruskal.threshold = 1e-6,
    lda.threshold = 2
  )
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#>  See `?factor` or `?relevel` to change the reference category.
#> No significant features found.
head(lefse_results2)
#> [1] Names  scores
#> <0 rows> (or 0-length row.names)

lefse_results3 <- 
  lefser(
    zeller, 
    groupCol = "study_condition", 
    kruskal.threshold = 0.05,
    lda.threshold = 2000
  )
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#>  See `?factor` or `?relevel` to change the reference category.
#> No significant features found.
head(lefse_results3)
#> [1] Names  scores
#> <0 rows> (or 0-length row.names)

Created on 2024-03-18 with reprex v2.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants