Region Matrix won't work #1076

masonsweat · 2022-04-12T20:28:58Z

Hi Tim, Thanks in advance...

Essentially I cannot get region matrix to work. Here is the code I used to generate my Seurat object before trying to use feature matrix:

annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
seqlevelsStyle(annotation) <- "UCSC"

peaks.AF1 <- read.table(
  file = "AF1_atac_peaks.bed",
  col.names = c("chr", "start", "end")
)
#I then filtered by peak width. This dataset contains 4 objects but I am only showing one. 

peakwidths <- width(combined.peaks)
combined.peaks <- combined.peaks[peakwidths  < 10000 & peakwidths > 20]
combined.peaks
# Read metadata
md.AF1 <- read.table(
  file = "AF1_per_barcode_metrics.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-2, ] 

# Filter
md.AF1 <- md.AF1[md.AF1$atac_raw_reads > 250, ]

# create fragment objects
frags.AF1 <- CreateFragmentObject(
  path = "AF1_atac_fragments.tsv.gz",
  cells = rownames(md.AF1)
)
# Make counts using combined peak set
AF1.counts <- FeatureMatrix(
  fragments = frags.AF1,
  features = combined.peaks,
  cells = rownames(md.AF1)
)

AF1_assay <- CreateChromatinAssay(AF1.counts, fragments = frags.AF1, annotation = annotation, min.features = -1)
AF1 <- CreateSeuratObject(AF1_assay, assay = "ATAC", meta.data=md.AF1)
AF1[["RNA"]]<- CreateAssayObject(counts=counts$'Gene Expression'[,colnames(AF1)])
AF1

Output:
An object of class Seurat 
178393 features across 2283 samples within 2 assays 
Active assay: ATAC (146108 features, 0 variable features)
 1 other assay present: RNA

## I then performed qc, and merged 4 different seurats together. 
## I went through the weighted nearest neighbors vignett to identify clusters, make reductions, ect. 
## Then I try to do the region matrix to look at the signal within a few select clusters of cells:

# First make a granges object
bed <- read.table('filepath', sep = '\t', header = FALSE)
names(bed)[1] <- 'chromosome'
names(bed)[2] <- 'start'
names(bed)[3] <- 'end'
newrange <- makeGRangesFromDataFrame(bed, start.field = 'start', end.field = 'end',
      seqinfo = c('chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10',
                         'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18',
                         'chr19', 'chr20', 'chr21', 'chrX'), seqnames.field = 'chromosome' )
newrange 

Output:
GRanges object with 1112 ranges and 0 metadata columns:
         seqnames              ranges strand
            <Rle>           <IRanges>  <Rle>
     [1]     chr1 130803960-130804946      *
     [2]     chr1     9858933-9859881      *
     [3]     chr1   77048816-77049875      *
     [4]     chr1   79631214-79632159      *
     [5]     chr1   92343525-92344341      *
     ...      ...                 ...    ...
  [1108]    chr19   10433720-10434542      *
  [1109]    chr19   47561819-47562748      *
  [1110]    chr19   40368233-40369160      *
  [1111]    chr19     5821966-5822899      *
  [1112]     chrX 159942321-159943395      *
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths

## do regionmatrix:
idents.plot <- c("x1", "x2", "x3", 'x4', "x5", "x6")

DefaultAssay(Seurat)<- 'ATAC'
obj <-RegionMatrix(
  Seurat,
  newrange,
  key = 'myregions',
  assay = NULL,
  idents = idents.plot,
  upstream = 1000,
  downstream = 1000,
  verbose = TRUE,
)
obj

output:
An object of class Seurat 
200391 features across 14569 samples within 4 assays 
Active assay: ATAC (146056 features, 146056 variable features)
 3 other assays present: RNA, chromvar, SCT
 6 dimensional reductions calculated: pca, umap.rna, lsi, umap.atac, wnn.umap, umap
## Then I try RegionHeatMap

RegionHeatmap(
  obj,
  key = 'myregions',
  assay = 'ATAC',
  idents = idents.plot,
  normalize = TRUE,
  upstream = 1000,
  downstream = 1000,
  max.cutoff = "q95",
  cols = NULL,
  min.counts = 1,
  window = (2000)/30,
  order = TRUE,
  nrow = NULL
)

This returns 0 counts for any of the identities, although I know that isn't the case because there are fragments in the coverageplot. An example:
CoveragePlot(Seurat, 'chr1-130803960-130804946', idents = idents.plot)

masonsweat · 2022-04-12T20:38:01Z

Full session info:
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods
[9] base

other attached packages:
[1] dplyr_1.0.8 SeuratDisk_0.0.0.9019
[3] stringr_1.4.0 BSgenome.Mmusculus.UCSC.mm10_1.4.3
[5] BSgenome_1.62.0 rtracklayer_1.54.0
[7] Biostrings_2.62.0 XVector_0.34.0
[9] TFBSTools_1.32.0 JASPAR2020_0.99.10
[11] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.18.4
[13] AnnotationFilter_1.18.0 GenomicFeatures_1.46.5
[15] AnnotationDbi_1.56.2 cicero_1.3.5
[17] Gviz_1.38.4 shiny_1.7.1
[19] patchwork_1.1.1 ggplot2_3.3.5
[21] Matrix_1.4-1 monocle3_1.0.0
[23] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
[25] MatrixGenerics_1.6.0 matrixStats_0.61.0
[27] Biobase_2.54.0 SeuratWrappers_0.3.0
[29] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[31] IRanges_2.28.0 S4Vectors_0.32.4
[33] BiocGenerics_0.40.0 SeuratObject_4.0.4
[35] Seurat_4.1.0 Signac_1.6.0.9010

loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 SnowballC_0.7.0 scattermore_0.8
[4] R.methodsS3_1.8.1 tidyr_1.2.0 bit64_4.0.5
[7] knitr_1.38 irlba_2.3.5 DelayedArray_0.20.0
[10] R.utils_2.11.0 data.table_1.14.2 rpart_4.1.16
[13] KEGGREST_1.34.0 RCurl_1.98-1.6 generics_0.1.2
[16] callr_3.7.0 leidenbase_0.1.9 cowplot_1.1.1
[19] RSQLite_2.2.12 RANN_2.6.1 VGAM_1.1-6
[22] proxy_0.4-26 future_1.24.0 tzdb_0.3.0
[25] bit_4.0.4 spatstat.data_2.1-4 xml2_1.3.3
[28] httpuv_1.6.5 assertthat_0.2.1 DirichletMultinomial_1.36.0
[31] viridis_0.6.2 xfun_0.30 hms_1.1.1
[34] jquerylib_0.1.4 promises_1.2.0.1 fansi_1.0.3
[37] restfulr_0.0.13 progress_1.2.2 caTools_1.18.2
[40] dbplyr_2.1.1 igraph_1.3.0 DBI_1.1.2
[43] htmlwidgets_1.5.4 sparsesvd_0.2 spatstat.geom_2.4-0
[46] purrr_0.3.4 ellipsis_0.3.2 backports_1.4.1
[49] annotate_1.72.0 biomaRt_2.50.3 deldir_1.0-6
[52] vctrs_0.4.0 remotes_2.4.2 ROCR_1.0-11
[55] abind_1.4-5 cachem_1.0.6 withr_2.5.0
[58] checkmate_2.0.0 sctransform_0.3.3 GenomicAlignments_1.30.0
[61] prettyunits_1.1.1 goftest_1.2-3 cluster_2.1.3
[64] seqLogo_1.60.0 lazyeval_0.2.2 crayon_1.5.1
[67] hdf5r_1.3.5 pkgconfig_2.0.3 slam_0.1-50
[70] labeling_0.4.2 vipor_0.4.5 nlme_3.1-157
[73] ProtGenerics_1.26.0 nnet_7.3-17 rlang_1.0.2
[76] globals_0.14.0 lifecycle_1.0.1 miniUI_0.1.1.1
[79] filelock_1.0.2 BiocFileCache_2.2.1 rsvd_1.0.5
[82] dichromat_2.0-0 ggrastr_1.0.1 rprojroot_2.0.3
[85] polyclip_1.10-0 lmtest_0.9-40 ggseqlogo_0.1
[88] zoo_1.8-9 beeswarm_0.4.0 base64enc_0.1-3
[91] ggridges_0.5.3 processx_3.5.3 png_0.1-7
[94] viridisLite_0.4.0 rjson_0.2.21 bitops_1.0-7
[97] R.oo_1.24.0 KernSmooth_2.23-20 blob_1.2.3
[100] parallelly_1.31.0 spatstat.random_2.2-0 readr_2.1.2
[103] jpeg_0.1-9 CNEr_1.30.0 scales_1.1.1
[106] memoise_2.0.1 magrittr_2.0.3 plyr_1.8.7
[109] ica_1.0-2 zlibbioc_1.40.0 compiler_4.1.2
[112] tinytex_0.38 BiocIO_1.4.0 RColorBrewer_1.1-3
[115] fitdistrplus_1.1-8 Rsamtools_2.10.0 cli_3.2.0
[118] listenv_0.8.0 pbapply_1.5-0 ps_1.6.0
[121] htmlTable_2.4.0 Formula_1.2-4 MASS_7.3-56
[124] mgcv_1.8-40 tidyselect_1.1.2 stringi_1.7.6
[127] yaml_2.3.5 latticeExtra_0.6-29 ggrepel_0.9.1
[130] sass_0.4.1 VariantAnnotation_1.40.0 fastmatch_1.1-3
[133] tools_4.1.2 future.apply_1.8.1 parallel_4.1.2
[136] rstudioapi_0.13 TFMPvalue_0.0.8 foreign_0.8-82
[139] lsa_0.73.2 gridExtra_2.3 farver_2.1.0
[142] Rtsne_0.15 digest_0.6.29 BiocManager_1.30.16
[145] pracma_2.3.8 FNN_1.1.3 qlcMatrix_0.9.7
[148] Rcpp_1.0.8.3 later_1.3.0 RcppAnnoy_0.0.19
[151] httr_1.4.2 biovizBase_1.42.0 colorspace_2.0-3
[154] XML_3.99-0.9 tensor_1.5 reticulate_1.24
[157] splines_4.1.2 uwot_0.1.11 RcppRoll_0.3.0
[160] spatstat.utils_2.3-0 plotly_4.10.0 xtable_1.8-4
[163] poweRlaw_0.70.6 jsonlite_1.8.0 R6_2.5.1
[166] Hmisc_4.6-0 pillar_1.7.0 htmltools_0.5.2
[169] mime_0.12 glue_1.6.2 fastmap_1.1.0
[172] BiocParallel_1.28.3 codetools_0.2-18 pkgbuild_1.3.1
[175] utf8_1.2.2 lattice_0.20-45 bslib_0.3.1
[178] spatstat.sparse_2.1-0 tibble_3.1.6 ggbeeswarm_0.6.0
[181] curl_4.3.2 leiden_0.3.9 gtools_3.9.2
[184] GO.db_3.14.0 survival_3.3-1 docopt_0.7.1
[187] munsell_0.5.0 GenomeInfoDbData_1.2.7 reshape2_1.4.4
[190] gtable_0.3.0 spatstat.core_2.4-2

timoast · 2022-04-15T20:45:02Z

I wasn't able to reproduce this issue, are you able to provide an example using any of the datasets used in the Signac vignettes that recreates it?

masonsweat · 2022-04-15T22:40:26Z

I will try that and get back to you. Thanks!

…

On Apr 15, 2022, at 4:45 PM, Tim Stuart ***@***.***> wrote: I wasn't able to reproduce this issue, are you able to provide an example using any of the datasets used in the Signac vignettes that recreates it? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.

masonsweat · 2022-04-16T19:02:21Z

Hi Tim,
Following your suggestion, I figured out how to reproduce the error! Could you please help me fix it, when you have time? :) (You are the greatest!!!).

I used the pbmc500 dataset to show that RegionMatrix works. I followed the merge vignette (https://satijalab.org/signac/articles/merging.html) to get the data, but did not include any other datasets.

Comparing my dataset to pbmc500, i couldn't find many differences. I removed data and fragment files but nothing made my object work. The only difference between my object and pbmc500 was that my object originated from merging 4 seurat objects.

To test the idea that merging somehow makes a difference, I tried Regionmatrix on my unmerged seurat objects. Actually, this worked for me. All 4 of the seurat objects could be used to make good RegionHeatmap plots!

I wanted to see if the error was exclusive to my datasets or if it could be reproduced using 10x datasets. To reproduce the error, I then followed the merge vignette and merged pbmc500 with pbmc1000.

After merging pbmc500 with pbmc1000, RegionMatrix no longer functions, and I get the same result as when I use it with my data (Regionheatmap shows 0 fragments in the specified regions).

In conclusion, for some reason RegionMatrix fails when applied to merged seurats.

Here is the code I used to merge pbmc500 and pbmc1000, which should reproduce the error. Session info is the same as listed above.

Again, thank you very much with your support for this issue.

Best,
Mason

Code:

peaks.500 <- read.table(
file = "Pbmc_500cell/atac_pbmc_500_nextgem_peaks.bed",
col.names = c("chr", "start", "end")
)
peaks.1k <- read.table(
file = "atac_pbmc_1k_nextgem_peaks.bed",
col.names = c("chr", "start", "end")
)
gr.500 <- makeGRangesFromDataFrame(peaks.500)
gr.1k <- makeGRangesFromDataFrame(peaks.1k)
combined.peaks <- reduce(x = c(gr.500, gr.1k))
peakwidths <- width(combined.peaks)
combined.peaks <- combined.peaks[peakwidths < 10000 & peakwidths > 20]
combined.peaks
md.500 <- read.table(
file = "atac_pbmc_500_nextgem_singlecell.csv",
stringsAsFactors = FALSE,
sep = ",",
header = TRUE,
row.names = 1
)[-1, ] # remove the first row
md.1k <- read.table(
file = "atac_pbmc_1k_nextgem_singlecell.csv",
stringsAsFactors = FALSE,
sep = ",",
header = TRUE,
row.names = 1
)[-1, ] # remove the first row
md.500 <- md.500[md.500$passed_filters > 500, ]
md.1k <- md.1k[md.1k$passed_filters > 500, ]
frags.500 <- CreateFragmentObject(
path = "atac_pbmc_500_nextgem_fragments.tsv.gz",
cells = rownames(md.500)
)
frags.1k <- CreateFragmentObject(
path = "atac_pbmc_1k_nextgem_fragments.tsv.gz",
cells = rownames(md.1k)
)
pbmc500.counts <- FeatureMatrix(
fragments = frags.500,
features = combined.peaks,
cells = rownames(md.500)
)
pbmc1k.counts <- FeatureMatrix(
fragments = frags.1k,
features = combined.peaks,
cells = rownames(md.1k)
)
pbmc500_assay <- CreateChromatinAssay(pbmc500.counts, fragments = frags.500)
pbmc500 <- CreateSeuratObject(pbmc500_assay, assay = "ATAC", meta.data=md.500)

pbmc1k_assay <- CreateChromatinAssay(pbmc1k.counts, fragments = frags.1k)
pbmc1k <- CreateSeuratObject(pbmc1k_assay, assay = "ATAC", meta.data=md.1k)
pbmc1k

pbmc500$dataset <- 'pbmc500'
pbmc1k$dataset <- 'pbmc1k'
combined <- merge(
x = pbmc500,
y = list(pbmc1k),
add.cell.ids = c("500", "1k")
)
combined[["ATAC"]]
combined

New bed file creation

bed <- read.table('/Users/masonsweat/Desktop/NGS_analysis/AF_Multiome/Regionmatrix/trial.bed', sep = '\t', header = FALSE)
names(bed)[1] <- 'chromosome'
names(bed)[2] <- 'start'
names(bed)[3] <- 'end'

newrange <- makeGRangesFromDataFrame(bed, start.field = 'start', end.field = 'end',
seqinfo = c('chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10',
'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18',
'chr19', 'chr20', 'chr21', 'chrX'), seqnames.field = 'chromosome' )
DefaultAssay(pbmc500)<- 'ATAC'
obj1 <-RegionMatrix(
combined,
newrange,
key = 'myregions',
assay = NULL,
idents = NULL,
upstream = 1000,
downstream = 1000,
verbose = TRUE,
)
DefaultAssay(obj) <- 'ATAC'
RegionHeatmap(
obj1,
key = 'myregions',
assay = 'ATAC',
idents = NULL,
normalize = TRUE,
upstream = 1000,
downstream = 1000,
max.cutoff = "q95",
cols = NULL,
min.counts = 0,
window = (2000)/30,
order = TRUE,
nrow = NULL
)

This returns the following plot:

It should be noted that the function works fine on either pbmc500 or pbmc1k, prior to merger. Here is an example regionheatmap for pbmc500:

obj1 <-RegionMatrix(
pbmc500,
newrange,
key = 'myregions',
assay = NULL,
idents = NULL,
upstream = 1000,
downstream = 1000,
verbose = TRUE,
)
DefaultAssay(obj) <- 'ATAC'
RegionHeatmap(
obj1,
key = 'myregions',
assay = 'ATAC',
idents = NULL,
normalize = TRUE,
upstream = 1000,
downstream = 1000,
max.cutoff = "q95",
cols = NULL,
min.counts = 0,
window = (2000)/30,
order = TRUE,
nrow = NULL
)

timoast · 2022-04-18T19:37:19Z

Thanks @masonsweat for looking into this! Turns out we weren't converting the cell names correctly when reading data from the fragment file, this should now be fixed on the develop branch.

masonsweat · 2022-04-18T21:39:01Z

It was my pleasure to look into it! I applied it to my data and it is working perfectly. Thank you so much for the quick fix Tim!

…

On Apr 18, 2022, at 15:37, Tim Stuart ***@***.***> wrote: Thanks @masonsweat <https://github.com/masonsweat> for looking into this! Turns out we weren't converting the cell names correctly when reading data from the fragment file, this should now be fixed on the develop branch. — Reply to this email directly, view it on GitHub <#1076 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ATF6L4LOI55CZWSNGMJZBOLVFW2XVANCNFSM5TIOLYYA>. You are receiving this because you were mentioned.

masonsweat added the documentation Documentation help label Apr 12, 2022

timoast added a commit that referenced this issue Apr 18, 2022

Fix cell name lookup; #1076

26812e1

timoast closed this as completed Apr 18, 2022

feanaros mentioned this issue Apr 3, 2023

Region Matrix issue #1368

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Region Matrix won't work #1076

Region Matrix won't work #1076

masonsweat commented Apr 12, 2022 •

edited

Loading

masonsweat commented Apr 12, 2022

timoast commented Apr 15, 2022

masonsweat commented Apr 15, 2022 via email

masonsweat commented Apr 16, 2022 •

edited

Loading

timoast commented Apr 18, 2022

masonsweat commented Apr 18, 2022 via email

Region Matrix won't work #1076

Region Matrix won't work #1076

Comments

masonsweat commented Apr 12, 2022 • edited Loading

masonsweat commented Apr 12, 2022

timoast commented Apr 15, 2022

masonsweat commented Apr 15, 2022 via email

masonsweat commented Apr 16, 2022 • edited Loading

New bed file creation

This returns the following plot:

timoast commented Apr 18, 2022

masonsweat commented Apr 18, 2022 via email

masonsweat commented Apr 12, 2022 •

edited

Loading

masonsweat commented Apr 16, 2022 •

edited

Loading