Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

an error about LinkPeaks #759

Closed
ws-xss opened this issue Aug 18, 2021 · 6 comments
Closed

an error about LinkPeaks #759

ws-xss opened this issue Aug 18, 2021 · 6 comments

Comments

@ws-xss
Copy link

ws-xss commented Aug 18, 2021

HI ,
I've got a problem I can't solve when i used LinkPeaks. I thought my error was similar with #12 and i try to solve it by adding sep = c(":", "-") but i failed.
I found some DEs and want to link the peaks and i got this error in some genes:

Testing 1 genes and 132282 peaks
Testing 1 genes and 132282 peaks
Testing 1 genes and 132282 peaks
Testing 1 genes and 132282 peaks
Testing 1 genes and 132282 peaks
Testing 1 genes and 132282 peaks
Testing 1 genes and 132282 peaks
No significant links found
Testing 1 genes and 132282 peaks
Testing 1 genes and 132282 peaks
Error in .get_data_frame_col_as_numeric(df, granges_cols[["start"]]) : 
  some values in the "start" column cannot be turned into numeric values
Calls: CoveragePlot ... makeGRangesFromDataFrame -> .get_data_frame_col_as_numeric
In addition: Warning messages:
1: Removed 2 rows containing missing values (position_stack). 
2: Removed 1 rows containing missing values (position_stack). 
3: Removed 1 rows containing missing values (position_stack). 
4: Removed 2 rows containing missing values (position_stack). 
5: Removed 1 rows containing missing values (position_stack). 
6: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [1].
Execution halted
(the error gene is HLA-DPB1)

My code:
(1)callpeak:

##call peaks using MACS2
peaks <- CallPeaks(pbmc, macs2.path = "/share/work4/wangshuang179/software/conda/Miniconda3/Miniconda3/bin/macs2")
head(peaks)
head(rownames(peaks))
peaks <- GRangesToString(peaks,sep = c(":", "-"))   ##add for this error
peaks <- StringToGRanges(peaks,sep = c(":", "-"))   ##add for this error
##remove peaks on nonstandard chromosomes and in genomic blacklist regions
peaks <- keepStandardChromosomes(peaks, pruning.mode = "coarse")
peaks <- subsetByOverlaps(x = peaks, ranges = blacklist_hg38_unified, invert = TRUE)
#peaks <- subsetByOverlaps(x = peaks, ranges = blacklist, invert = TRUE)
##quantify counts in each peak
peaks <- GRangesToString(peaks,sep = c(":", "-"))   ##add for this error
peaks <- StringToGRanges(peaks,sep = c(":", "-"))   ##add for this error
head(rownames(peaks))
#macs2_counts <- FeatureMatrix(fragments = Fragments(pbmc),features = peaks,sep = c(":", "-"),cell = colnames(pbmc),verbose = TRUE)
macs2_counts <- FeatureMatrix(fragments = Fragments(pbmc),features = peaks,sep = c(":", "-"),cell = colnames(pbmc),verbose = TRUE)
##create a new assay using the MACS2 peak set and add it to the Seurat object
pbmc[["peaks"]] <- CreateChromatinAssay(counts = macs2_counts,sep = c(":", "-"),fragments = fragpath,annotation = annotation)
**(2)LinkPeaks:**
pbmc <- readRDS(argv$rds)
##find diff gene
DefaultAssay(pbmc) <- "RNA"
markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
write.table(markers,file=paste(argv$outdir,'/',argv$sample,"_diff.xls",sep=""),sep="\t",row.names = FALSE,col.names = TRUE,quote=F)
diff <- markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_log2FC)
write.table(diff,file=paste(argv$outdir,'/',argv$sample,"_diff_top2.xls",sep=""),sep="\t",row.names = FALSE,col.names = TRUE,quote=F)
print("find diff gene ok")
##link peaks to genes
peak_anno_gene <- unique(as.vector(pbmc@assays$peaks@annotation$gene_name))
gene_diff <- unique(diff$gene)
GENE <- intersect(peak_anno_gene,gene_diff)
print(GENE)
cluster <- unique(pbmc@meta.data$seurat_clusters)
print(cluster)
DefaultAssay(pbmc) <- "peaks"
for (i in GENE){
    print(i)
    pbmc_plot <- pbmc
    DefaultAssay(pbmc_plot) <- "peaks"
    pbmc_plot <- RegionStats(pbmc_plot, genome = BSgenome.Hsapiens.UCSC.hg38)
    head(rownames(pbmc_plot@assays$peaks))
    head(rownames(pbmc_plot@assays$SCT))
    pbmc_plot <- LinkPeaks(object = pbmc_plot,peak.assay = "peaks",expression.assay = "SCT",genes.use = i)
    pdf(paste(argv$outdir,'/',argv$sample,"_",i,'_LinkPeaks.pdf',sep=''),onefile=F,width=6,height=6)
    p1 <- CoveragePlot(object = pbmc_plot,region = i,features = i,expression.assay = "SCT",idents = cluster)
    print(p1)
    dev.off()
}

Some output:

(1)head(peaks):
image
(2)head(rownames(peaks)):
NULL
I dont known why its NULL and it looks work well with some genes.

@ws-xss ws-xss added the documentation Documentation help label Aug 18, 2021
@timoast
Copy link
Collaborator

timoast commented Aug 27, 2021

It's not clear from this what the issue is. If you can make a minimal reproducible example (removing all code that is not relevant to running the LinkPeaks() function), I will take a look. You should also include the output of sessionInfo().

@timoast
Copy link
Collaborator

timoast commented Sep 10, 2021

Closing since I have not heard back, please reopen if you can provide the requested information

@timoast timoast closed this as completed Sep 10, 2021
@timoast timoast removed the documentation Documentation help label Sep 10, 2021
@Baboon61
Copy link

I am reopening this issue, I think the problem is coming from the dashes inside gene names ("HLA-DPB1").
This my example with pbmc

library(Signac)
library(Seurat)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)
set.seed(1234)

OS_path <- "/my/path/"
counts <- Read10X_h5(paste0(OS_path,"Data/Others/Signac/pbmc_granulocyte_sorted_10k_filtered_feature_bc_matrix.h5"))
fragpath <- paste0(OS_path,"Data/Others/Signac/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz")

annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevelsStyle(annotation) <- "UCSC"
genome(annotation) <- "hg38"

pbmc <- CreateSeuratObject(
  counts = counts$`Gene Expression`,
  assay = "RNA"
)
pbmc[["ATAC"]] <- CreateChromatinAssay(
  counts = counts$Peaks,
  sep = c(":", "-"),
  fragments = fragpath,
  annotation = annotation
)

DefaultAssay(pbmc) <- "ATAC"
pbmc <- NucleosomeSignal(pbmc)
pbmc <- TSSEnrichment(pbmc)

#Hard QC to lower down the number of cell to speed up the peak calling using macs2
pbmc <- subset(
  x = pbmc,
  subset = nCount_ATAC < 25000 &
    nCount_RNA < 5000 &
    nCount_ATAC > 15000 &
    nCount_RNA > 1500 &
    nucleosome_signal < 1.1 &
    TSS.enrichment > 4
)

peaks <- CallPeaks(pbmc, macs2.path = "/my/path/anaconda3/envs/Tools/bin/macs2")
peaks <- keepStandardChromosomes(peaks, pruning.mode = "coarse")
peaks <- subsetByOverlaps(x = peaks, ranges = blacklist_hg38_unified, invert = TRUE)
macs2_counts <- FeatureMatrix(
  fragments = Fragments(pbmc),
  features = peaks,
  cells = colnames(pbmc)
)
pbmc[["peaks"]] <- CreateChromatinAssay(
  counts = macs2_counts,
  fragments = fragpath,
  annotation = annotation
)

DefaultAssay(pbmc) <- "RNA"
pbmc <- SCTransform(pbmc)
pbmc <- RunPCA(pbmc)
DefaultAssay(pbmc) <- "peaks"
pbmc <- FindTopFeatures(pbmc, min.cutoff = 5)
pbmc <- RunTFIDF(pbmc)
pbmc <- RunSVD(pbmc)

pbmc <- FindMultiModalNeighbors(
  object = pbmc,
  reduction.list = list("pca", "lsi"), 
  dims.list = list(1:50, 2:40),
  modality.weight.name = "RNA.weight",
  verbose = TRUE
)
pbmc <- FindClusters(pbmc, resolution = 0.7, verbose = FALSE, graph.name="wsnn")
pbmc <- RunUMAP(
  object = pbmc,
  nn.name = "weighted.nn",
  assay = "RNA",
  verbose = TRUE
)

DefaultAssay(pbmc) <- "peaks"
pbmc <- RegionStats(pbmc, genome = BSgenome.Hsapiens.UCSC.hg38)
pbmc <- LinkPeaks(
  object = pbmc,
  peak.assay = "peaks",
  expression.assay = "SCT",
  genes.use = c("LYZ", "ADAMTSL4-AS1", "LARGE-AS1", "HLA-DQA2", "HLA-DOA")
)

#Genomic regions for both genes
Links(pbmc)

Idents(object = pbmc) <- "seurat_clusters"

#This plot is working
CoveragePlot(
  object = pbmc,
  region = "LYZ",
  features = "LYZ",
  expression.assay = "SCT",
  extend.upstream = 500,
  extend.downstream = 10000
)

#This one is not, also not working for "LARGE-AS1", "HLA-DQA2" and "HLA-DOA"
CoveragePlot(
  object = pbmc,
  region = "ADAMTSL4-AS1",
  features = "ADAMTSL4-AS1",
  expression.assay = "SCT",
  extend.upstream = 500,
  extend.downstream = 10000
)
Warning message:
"Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [1]."

Error in .get_data_frame_col_as_numeric(df, granges_cols[["start"]]): some values in the "start" column cannot be turned into numeric values
Traceback:

1. CoveragePlot(object = pbmc, region = "ADAMTSL4-AS1", features = "ADAMTSL4-AS1", 
 .     expression.assay = "SCT", extend.upstream = 500, extend.downstream = 10000)
2. SingleCoveragePlot(object = object, region = region, annotation = annotation, 
 .     features = features, expression.assay = expression.assay, 
 .     expression.slot = expression.slot, show.bulk = show.bulk, 
 .     peaks = peaks, peaks.group.by = peaks.group.by, ranges = ranges, 
 .     ranges.group.by = ranges.group.by, ranges.title = ranges.title, 
 .     region.highlight = region.highlight, assay = assay, links = links, 
 .     tile = tile, tile.size = tile.size, tile.cells = tile.cells, 
 .     bigwig = bigwig, bigwig.type = bigwig.type, group.by = group.by, 
 .     window = window, extend.upstream = extend.upstream, extend.downstream = extend.downstream, 
 .     ymax = ymax, scale.factor = scale.factor, cells = cells, 
 .     idents = idents, sep = sep, heights = heights, max.downsample = max.downsample, 
 .     downsample.rate = downsample.rate)
3. FindRegion(object = object, region = region, sep = sep, assay = assay, 
 .     extend.upstream = extend.upstream, extend.downstream = extend.downstream)
4. StringToGRanges(regions = region, sep = sep)
5. makeGRangesFromDataFrame(df = ranges.df, ...)
6. .get_data_frame_col_as_numeric(df, granges_cols[["start"]])
7. stop(wmsg("some values in the ", "\"", names(df)[[col]], "\" ", 
 .     "column cannot be turned into numeric values"))
sessionInfo()

R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/bastien/anaconda3/envs/R4_jupyter/lib/libopenblasp-r0.3.15.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=sv_SE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=sv_SE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=sv_SE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] data.table_1.14.0                 BSgenome.Hsapiens.UCSC.hg38_1.4.3
 [3] BSgenome_1.60.0                   rtracklayer_1.52.0               
 [5] Biostrings_2.60.0                 XVector_0.32.0                   
 [7] EnsDb.Hsapiens.v86_2.99.0         ensembldb_2.16.0                 
 [9] AnnotationFilter_1.16.0           GenomicFeatures_1.44.0           
[11] AnnotationDbi_1.54.0              Biobase_2.52.0                   
[13] GenomicRanges_1.44.0              GenomeInfoDb_1.28.0              
[15] IRanges_2.26.0                    S4Vectors_0.30.0                 
[17] BiocGenerics_0.38.0               SeuratObject_4.0.1               
[19] Seurat_4.0.2                      Signac_1.3.0                     

loaded via a namespace (and not attached):
  [1] utf8_1.2.1                  reticulate_1.20            
  [3] tidyselect_1.1.1            RSQLite_2.2.5              
  [5] htmlwidgets_1.5.3           grid_4.1.0                 
  [7] docopt_0.7.1                BiocParallel_1.26.0        
  [9] Rtsne_0.15                  munsell_0.5.0              
 [11] codetools_0.2-18            ica_1.0-2                  
 [13] pbdZMQ_0.3-5                future_1.22.1              
 [15] miniUI_0.1.1.1              colorspace_2.0-1           
 [17] filelock_1.0.2              knitr_1.33                 
 [19] uuid_0.1-4                  rstudioapi_0.13            
 [21] ROCR_1.0-11                 tensor_1.5                 
 [23] listenv_0.8.0               labeling_0.4.2             
 [25] MatrixGenerics_1.4.0        slam_0.1-48                
 [27] repr_1.1.3                  GenomeInfoDbData_1.2.6     
 [29] polyclip_1.10-0             bit64_4.0.5                
 [31] farver_2.1.0                parallelly_1.27.0          
 [33] vctrs_0.3.8                 generics_0.1.0             
 [35] xfun_0.24                   biovizBase_1.40.0          
 [37] BiocFileCache_2.0.0         lsa_0.73.2                 
 [39] ggseqlogo_0.1               R6_2.5.0                   
 [41] hdf5r_1.3.4                 bitops_1.0-7               
 [43] spatstat.utils_2.1-0        cachem_1.0.5               
 [45] DelayedArray_0.18.0         assertthat_0.2.1           
 [47] promises_1.2.0.1            BiocIO_1.2.0               
 [49] scales_1.1.1                nnet_7.3-16                
 [51] gtable_0.3.0                globals_0.14.0             
 [53] goftest_1.2-2               rlang_0.4.11               
 [55] RcppRoll_0.3.0              splines_4.1.0              
 [57] lazyeval_0.2.2              dichromat_2.0-0            
 [59] checkmate_2.0.0             spatstat.geom_2.1-0        
 [61] yaml_2.2.1                  reshape2_1.4.4             
 [63] abind_1.4-5                 backports_1.2.1            
 [65] httpuv_1.6.1                Hmisc_4.5-0                
 [67] tools_4.1.0                 ggplot2_3.3.3              
 [69] ellipsis_0.3.2              spatstat.core_2.1-2        
 [71] RColorBrewer_1.1-2          ggridges_0.5.3             
 [73] Rcpp_1.0.7                  plyr_1.8.6                 
 [75] base64enc_0.1-3             progress_1.2.2             
 [77] zlibbioc_1.38.0             purrr_0.3.4                
 [79] RCurl_1.98-1.4              prettyunits_1.1.1          
 [81] rpart_4.1-15                deldir_0.2-10              
 [83] pbapply_1.4-3               cowplot_1.1.1              
 [85] zoo_1.8-9                   SummarizedExperiment_1.22.0
 [87] ggrepel_0.9.1               cluster_2.1.2              
 [89] magrittr_2.0.1              RSpectra_0.16-0            
 [91] scattermore_0.7             lmtest_0.9-38              
 [93] RANN_2.6.1                  SnowballC_0.7.0            
 [95] ProtGenerics_1.24.0         fitdistrplus_1.1-5         
 [97] matrixStats_0.59.0          hms_1.1.0                  
 [99] patchwork_1.1.1             mime_0.10                  
[101] evaluate_0.14               xtable_1.8-4               
[103] XML_3.99-0.7                jpeg_0.1-8.1               
[105] sparsesvd_0.2               gridExtra_2.3              
[107] compiler_4.1.0              biomaRt_2.48.0             
[109] tibble_3.1.2                KernSmooth_2.23-20         
[111] crayon_1.4.1                htmltools_0.5.1.1          
[113] mgcv_1.8-36                 later_1.2.0                
[115] Formula_1.2-4               tidyr_1.1.3                
[117] DBI_1.1.1                   tweenr_1.0.2               
[119] dbplyr_2.1.1                MASS_7.3-54                
[121] rappdirs_0.3.3              Matrix_1.3-4               
[123] igraph_1.2.6                pkgconfig_2.0.3            
[125] GenomicAlignments_1.28.0    foreign_0.8-81             
[127] IRdisplay_1.0               plotly_4.9.3               
[129] spatstat.sparse_2.0-0       VariantAnnotation_1.38.0   
[131] stringr_1.4.0               digest_0.6.27              
[133] sctransform_0.3.2           RcppAnnoy_0.0.18           
[135] spatstat.data_2.1-0         leiden_0.3.8               
[137] fastmatch_1.1-3             htmlTable_2.2.1            
[139] uwot_0.1.10                 restfulr_0.0.13            
[141] curl_4.3.1                  shiny_1.6.0                
[143] Rsamtools_2.8.0             rjson_0.2.20               
[145] lifecycle_1.0.0             nlme_3.1-152               
[147] jsonlite_1.7.2              viridisLite_0.4.0          
[149] fansi_0.4.2                 pillar_1.6.1               
[151] lattice_0.20-44             KEGGREST_1.32.0            
[153] fastmap_1.1.0               httr_1.4.2                 
[155] survival_3.2-11             glue_1.4.2                 
[157] qlcMatrix_0.9.7             png_0.1-7                  
[159] bit_4.0.4                   ggforce_0.3.3              
[161] stringi_1.6.2               blob_1.2.2                 
[163] latticeExtra_0.6-29         memoise_2.0.0              
[165] IRkernel_1.2                dplyr_1.0.6                
[167] irlba_2.3.3                 future.apply_1.7.0

timoast added a commit that referenced this issue Sep 20, 2021
@timoast
Copy link
Collaborator

timoast commented Sep 20, 2021

Hi @Baboon61, this should now be fixed on the develop branch

@Baboon61
Copy link

Hello, sorry not on Signac_1.3.0.9010

@timoast
Copy link
Collaborator

timoast commented Sep 20, 2021

This definitely works for me using Signac 1.3.0.9010 (both LinkPeaks() and plotting genes with - in the gene name using CoveragePlot())

If it's not working for you please open a new issue with your code and output of sessionInfo()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants