Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Acute problem: LinkPeaks error #858

Closed
laasov opened this issue Nov 5, 2021 · 11 comments
Closed

Acute problem: LinkPeaks error #858

laasov opened this issue Nov 5, 2021 · 11 comments
Labels
bug Something isn't working

Comments

@laasov
Copy link

laasov commented Nov 5, 2021

Hi,

LinkPeaks produces an unexpected error when I try to run it:

DefaultAssay(rv2.ss) <- "peaks"

linkpeaks.out <- LinkPeaks(rv2.ss,
                           peak.assay = "peaks",
                           expression.assay = "RNA",
                           n_sample = 200,
                           genes.use = lp.genes,
                           min.distance = 2000,
                           distance = 50000)

Error:

Testing 4 genes and 107523 peaks
Error in LinkPeaks(rv2.ss, peak.assay = "peaks", expression.assay = "RNA", :
No peaks fall within distance threshold
Have you set the proper genome and seqlevelsStyle for peaks assay?

Here lp.genes is a vector containing ENSMUS IDs (as are the rownames of RNA assay) for which I know there are expected to be correlated peaks within the distance parameter. Thus, the error shouldn't rise from that. The data object contains data from typical chromosomes, but seqlevels returns also other chr-tags:

> seqlevels(rv2.ss)
 [1] "chr1"                 "chr2"                 "chr3"                 "chr4"                 "chr5"                 "chr6"                 "chr7"                 "chr8"                
 [9] "chr9"                 "chr10"                "chr11"                "chr12"                "chr13"                "chr14"                "chr15"                "chr16"               
[17] "chr17"                "chr18"                "chr19"                "chrX"                 "chrY"                 "chrM"                 "chr1_GL456210_random" "chr1_GL456211_random"
[25] "chr1_GL456212_random" "chr1_GL456213_random" "chr1_GL456221_random" "chr4_GL456216_random" "chr4_GL456350_random" "chr4_JH584292_random" "chr4_JH584293_random" "chr4_JH584294_random"
[33] "chr4_JH584295_random" "chr5_GL456354_random" "chr5_JH584296_random" "chr5_JH584297_random" "chr5_JH584298_random" "chr5_JH584299_random" "chr7_GL456219_random" "chrX_GL456233_random"
[41] "chrY_JH584300_random" "chrY_JH584301_random" "chrY_JH584302_random" "chrY_JH584303_random" "chrUn_GL456239"       "chrUn_GL456359"       "chrUn_GL456360"       "chrUn_GL456366"      
[49] "chrUn_GL456367"       "chrUn_GL456368"       "chrUn_GL456370"       "chrUn_GL456372"       "chrUn_GL456378"       "chrUn_GL456379"       "chrUn_GL456381"       "chrUn_GL456382"      
[57] "chrUn_GL456383"       "chrUn_GL456385"       "chrUn_GL456387"       "chrUn_GL456389"       "chrUn_GL456390"       "chrUn_GL456392"       "chrUn_GL456393"       "chrUn_GL456394"      
[65] "chrUn_GL456396"       "chrUn_JH584304" 

I'm using R version 4.1.1 and Signac version 1.4.0. Info on these and other properties can be found from session info below

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /opt/intel/oneapi/mkl/2021.2.0/lib/intel64/libmkl_gf_lp64.so.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SeuratObject_4.0.2 Seurat_4.0.5       forcats_0.5.1      stringr_1.4.0      dplyr_1.0.7        purrr_0.3.4        readr_2.0.2        tidyr_1.1.4        tibble_3.1.5       ggplot2_3.3.3     
[11] tidyverse_1.3.1    Signac_1.4.0      

loaded via a namespace (and not attached):
  [1] readxl_1.3.1           backports_1.2.1        fastmatch_1.1-3        plyr_1.8.6             igraph_1.2.6           lazyeval_0.2.2         splines_4.1.1          BiocParallel_1.26.2   
  [9] listenv_0.8.0          scattermore_0.7        SnowballC_0.7.0        GenomeInfoDb_1.26.7    digest_0.6.28          htmltools_0.5.2        fansi_0.5.0            magrittr_2.0.1        
 [17] tensor_1.5             cluster_2.1.2          ROCR_1.0-11            tzdb_0.2.0             globals_0.14.0         Biostrings_2.60.2      modelr_0.1.8           matrixStats_0.61.0    
 [25] docopt_0.7.1           spatstat.sparse_2.0-0  colorspace_2.0-2       rvest_1.0.1            ggrepel_0.9.1          haven_2.4.3            sparsesvd_0.2          crayon_1.4.1          
 [33] RCurl_1.98-1.5         jsonlite_1.7.2         spatstat.data_2.1-0    survival_3.2-11        zoo_1.8-9              glue_1.4.2             polyclip_1.10-0        gtable_0.3.0          
 [41] zlibbioc_1.38.0        XVector_0.32.0         leiden_0.3.9           future.apply_1.8.1     BiocGenerics_0.36.1    abind_1.4-5            scales_1.1.1           DBI_1.1.1             
 [49] miniUI_0.1.1.1         Rcpp_1.0.7             viridisLite_0.4.0      xtable_1.8-4           reticulate_1.22        spatstat.core_2.3-0    stats4_4.1.1           htmlwidgets_1.5.4     
 [57] httr_1.4.2             RColorBrewer_1.1-2     ellipsis_0.3.2         ica_1.0-2              pkgconfig_2.0.3        farver_2.1.0           ggseqlogo_0.1          uwot_0.1.10           
 [65] dbplyr_2.1.1           deldir_0.2-10          utf8_1.2.2             labeling_0.4.2         tidyselect_1.1.1       rlang_0.4.12           reshape2_1.4.4         later_1.3.0           
 [73] cellranger_1.1.0       munsell_0.5.0          tools_4.1.1            cli_3.1.0              generics_0.1.0         broom_0.7.6            ggridges_0.5.3         fastmap_1.1.0         
 [81] goftest_1.2-2          fs_1.5.0               fitdistrplus_1.1-6     RANN_2.6.1             pbapply_1.5-0          future_1.21.0          nlme_3.1-152           mime_0.12             
 [89] slam_0.1-48            RcppRoll_0.3.0         xml2_1.3.2             compiler_4.1.1         rstudioapi_0.13        plotly_4.9.4.1         png_0.1-7              spatstat.utils_2.2-0  
 [97] reprex_2.0.1           tweenr_1.0.2           stringi_1.7.5          lattice_0.20-44        Matrix_1.3-4           vctrs_0.3.8            pillar_1.6.3           lifecycle_1.0.1       
[105] spatstat.geom_2.2-2    lmtest_0.9-38          RcppAnnoy_0.0.19       data.table_1.14.0      cowplot_1.1.1          bitops_1.0-7           irlba_2.3.3            httpuv_1.6.3          
[113] patchwork_1.1.1        GenomicRanges_1.42.0   R6_2.5.1               promises_1.2.0.1       KernSmooth_2.23-20     gridExtra_2.3          lsa_0.73.2             IRanges_2.26.0        
[121] parallelly_1.28.1      codetools_0.2-18       MASS_7.3-54            assertthat_0.2.1       withr_2.4.2            qlcMatrix_0.9.7        sctransform_0.3.2      Rsamtools_2.8.0       
[129] S4Vectors_0.30.2       GenomeInfoDbData_1.2.6 mgcv_1.8-36            parallel_4.1.1         hms_1.1.1              grid_4.1.1             rpart_4.1-15           Rtsne_0.15            
[137] ggforce_0.3.3          lubridate_1.7.10       shiny_1.7.1

Thank you!

@laasov laasov added the bug Something isn't working label Nov 5, 2021
@skilpinen
Copy link

I followed this line by line and problem is same we encountered earlier with GeneActivity function. LinkPeaks() does have this hardcoded:
gene.coords.use <- gene.coords[gene.coords$gene_name %in%
genes, ]

Meaning that gene id given to it must be gene name and not gene id. Our data is Ensembl id based thus this fails.

@timoast
Copy link
Collaborator

timoast commented Nov 13, 2021

Hi @laasov, I've added a parameter to control whether to use gene names or gene IDs in the gene annotations (gene.id parameter in LinkPeaks()).

Could you try installing from the develop branch and see if setting this parameter solves the issue?

@laasov
Copy link
Author

laasov commented Nov 15, 2021

Hello @timoast. Thank you! Seems like this got rid of the previous error. However, running

linkpeaks.out <- LinkPeaks(rv2.ss,
                           peak.assay = "peaks",
                           expression.assay = "RNA",
                           n_sample = 200,
                           genes.use = lp.genes,
                           min.distance = 2000,
                           distance = 50000,
                           gene.id = T)

now stops with an error:

Testing 4 genes and 107523 peaks
  |                                                  | 0 % ~calculating  
Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 't': invalid character indexing

Traceback:

Error in h(simpleError(msg, call)) : 
error in evaluating the argument 'x' in selecting a method for function 't': invalid character indexing

12. h(simpleError(msg, call))

11. .handleSimpleError(function (cond) 
.Internal(C_tryCatchHelper(addr, 1L, cond)), "invalid character indexing", 
base::quote(intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE)))

10. stop("invalid character indexing")

9. intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE)

8. subCsp_rows(x, i, drop = drop)

7. expression.data[genes.use[[i]], , drop = FALSE]

6. expression.data[genes.use[[i]], , drop = FALSE]

5. t(x = expression.data[genes.use[[i]], , drop = FALSE])

4. FUN(X[[i]], ...)

3. lapply(X[Split[[i]]], FUN, ...)

2. mylapply(X = seq_along(along.with = genes.use), FUN = function(i) {
peak.use <- as.logical(x = peak_distance_matrix[, genes.use[[i]]])
gene.expression <- t(x = expression.data[genes.use[[i]], 
, drop = FALSE]) ...

1. LinkPeaks(rv2.ss, peak.assay = "peaks", expression.assay = "RNA", 
n_sample = 200, genes.use = lp.genes, min.distance = 2000, 
distance = 50000, gene.id = T)

May I open another issue for this error or would it be okay to study the error under this one?

@timoast
Copy link
Collaborator

timoast commented Nov 15, 2021

Can you show what is lp.genes? And the output of sum(lp.genes) %in% rownames(rv2.ss[["RNA"]])?

@laasov
Copy link
Author

laasov commented Nov 15, 2021

Sure!

> lp.genes
[1] "ENSMUSG00000028716" "ENSMUSG00000028717" "ENSMUSG00000015053" "ENSMUSG00000015619"
> sum((lp.genes) %in% rownames(rv2.ss[["RNA"]]))
[1] 4

@timoast
Copy link
Collaborator

timoast commented Nov 15, 2021

Thanks, I think I found the issue. Can you try reinstalling from the develop branch and see if it's fixed?

@laasov
Copy link
Author

laasov commented Nov 15, 2021

Thanks, it got past that point. One more thing, I guess. It prints:

Testing 4 genes and 107523 peaks
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=05s  
Error in which(x = x == tss$gene_name)[[1]] : subscript out of bounds

Traceback (if that helps to locate the problem) tells that it comes from sapply(X = rownames(x = linkmat), FUN = function(x) { which(x = x == tss$gene_name)[[1]] }).

Edit: Whoops, forgot to mention this came after reinstalling the package.

@timoast
Copy link
Collaborator

timoast commented Nov 15, 2021

Ah, looks like I missed one. Made some more updates, hopefully it's fixed for real now

@laasov
Copy link
Author

laasov commented Nov 15, 2021

No worries! However, I'm afraid it's still missing something. It prints:

Testing 4 genes and 107523 peaks
Error in LinkPeaks(rv2.ss, peak.assay = "peaks", expression.assay = "RNA",  : 
  object 'gene.coords.use' not found

@timoast
Copy link
Collaborator

timoast commented Nov 15, 2021

Ok should be fixed now

@laasov
Copy link
Author

laasov commented Nov 15, 2021

Looks good, at least Signac::Links returns reasonable-looking values. Thank you so much and congratulations on your new paper! Closing this issue.

@laasov laasov closed this as completed Nov 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants