Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Region Matrix won't work #1076

Closed
masonsweat opened this issue Apr 12, 2022 · 6 comments
Closed

Region Matrix won't work #1076

masonsweat opened this issue Apr 12, 2022 · 6 comments
Labels
documentation Documentation help

Comments

@masonsweat
Copy link

masonsweat commented Apr 12, 2022

Hi Tim, Thanks in advance...

Essentially I cannot get region matrix to work. Here is the code I used to generate my Seurat object before trying to use feature matrix:

annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
seqlevelsStyle(annotation) <- "UCSC"

peaks.AF1 <- read.table(
  file = "AF1_atac_peaks.bed",
  col.names = c("chr", "start", "end")
)
#I then filtered by peak width. This dataset contains 4 objects but I am only showing one. 

peakwidths <- width(combined.peaks)
combined.peaks <- combined.peaks[peakwidths  < 10000 & peakwidths > 20]
combined.peaks
# Read metadata
md.AF1 <- read.table(
  file = "AF1_per_barcode_metrics.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-2, ] 

# Filter
md.AF1 <- md.AF1[md.AF1$atac_raw_reads > 250, ]

# create fragment objects
frags.AF1 <- CreateFragmentObject(
  path = "AF1_atac_fragments.tsv.gz",
  cells = rownames(md.AF1)
)
# Make counts using combined peak set
AF1.counts <- FeatureMatrix(
  fragments = frags.AF1,
  features = combined.peaks,
  cells = rownames(md.AF1)
)

AF1_assay <- CreateChromatinAssay(AF1.counts, fragments = frags.AF1, annotation = annotation, min.features = -1)
AF1 <- CreateSeuratObject(AF1_assay, assay = "ATAC", meta.data=md.AF1)
AF1[["RNA"]]<- CreateAssayObject(counts=counts$'Gene Expression'[,colnames(AF1)])
AF1

Output:
An object of class Seurat 
178393 features across 2283 samples within 2 assays 
Active assay: ATAC (146108 features, 0 variable features)
 1 other assay present: RNA

## I then performed qc, and merged 4 different seurats together. 
## I went through the weighted nearest neighbors vignett to identify clusters, make reductions, ect. 
## Then I try to do the region matrix to look at the signal within a few select clusters of cells:

# First make a granges object
bed <- read.table('filepath', sep = '\t', header = FALSE)
names(bed)[1] <- 'chromosome'
names(bed)[2] <- 'start'
names(bed)[3] <- 'end'
newrange <- makeGRangesFromDataFrame(bed, start.field = 'start', end.field = 'end',
      seqinfo = c('chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10',
                         'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18',
                         'chr19', 'chr20', 'chr21', 'chrX'), seqnames.field = 'chromosome' )
newrange 

Output:
GRanges object with 1112 ranges and 0 metadata columns:
         seqnames              ranges strand
            <Rle>           <IRanges>  <Rle>
     [1]     chr1 130803960-130804946      *
     [2]     chr1     9858933-9859881      *
     [3]     chr1   77048816-77049875      *
     [4]     chr1   79631214-79632159      *
     [5]     chr1   92343525-92344341      *
     ...      ...                 ...    ...
  [1108]    chr19   10433720-10434542      *
  [1109]    chr19   47561819-47562748      *
  [1110]    chr19   40368233-40369160      *
  [1111]    chr19     5821966-5822899      *
  [1112]     chrX 159942321-159943395      *
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths

## do regionmatrix:
idents.plot <- c("x1", "x2", "x3", 'x4', "x5", "x6")

DefaultAssay(Seurat)<- 'ATAC'
obj <-RegionMatrix(
  Seurat,
  newrange,
  key = 'myregions',
  assay = NULL,
  idents = idents.plot,
  upstream = 1000,
  downstream = 1000,
  verbose = TRUE,
)
obj

output:
An object of class Seurat 
200391 features across 14569 samples within 4 assays 
Active assay: ATAC (146056 features, 146056 variable features)
 3 other assays present: RNA, chromvar, SCT
 6 dimensional reductions calculated: pca, umap.rna, lsi, umap.atac, wnn.umap, umap
## Then I try RegionHeatMap

RegionHeatmap(
  obj,
  key = 'myregions',
  assay = 'ATAC',
  idents = idents.plot,
  normalize = TRUE,
  upstream = 1000,
  downstream = 1000,
  max.cutoff = "q95",
  cols = NULL,
  min.counts = 1,
  window = (2000)/30,
  order = TRUE,
  nrow = NULL
)

This returns 0 counts for any of the identities, although I know that isn't the case because there are fragments in the coverageplot. An example:
CoveragePlot(Seurat, 'chr1-130803960-130804946', idents = idents.plot)

image

@masonsweat masonsweat added the documentation Documentation help label Apr 12, 2022
@masonsweat
Copy link
Author

Full session info:
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods
[9] base

other attached packages:
[1] dplyr_1.0.8 SeuratDisk_0.0.0.9019
[3] stringr_1.4.0 BSgenome.Mmusculus.UCSC.mm10_1.4.3
[5] BSgenome_1.62.0 rtracklayer_1.54.0
[7] Biostrings_2.62.0 XVector_0.34.0
[9] TFBSTools_1.32.0 JASPAR2020_0.99.10
[11] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.18.4
[13] AnnotationFilter_1.18.0 GenomicFeatures_1.46.5
[15] AnnotationDbi_1.56.2 cicero_1.3.5
[17] Gviz_1.38.4 shiny_1.7.1
[19] patchwork_1.1.1 ggplot2_3.3.5
[21] Matrix_1.4-1 monocle3_1.0.0
[23] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
[25] MatrixGenerics_1.6.0 matrixStats_0.61.0
[27] Biobase_2.54.0 SeuratWrappers_0.3.0
[29] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[31] IRanges_2.28.0 S4Vectors_0.32.4
[33] BiocGenerics_0.40.0 SeuratObject_4.0.4
[35] Seurat_4.1.0 Signac_1.6.0.9010

loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 SnowballC_0.7.0 scattermore_0.8
[4] R.methodsS3_1.8.1 tidyr_1.2.0 bit64_4.0.5
[7] knitr_1.38 irlba_2.3.5 DelayedArray_0.20.0
[10] R.utils_2.11.0 data.table_1.14.2 rpart_4.1.16
[13] KEGGREST_1.34.0 RCurl_1.98-1.6 generics_0.1.2
[16] callr_3.7.0 leidenbase_0.1.9 cowplot_1.1.1
[19] RSQLite_2.2.12 RANN_2.6.1 VGAM_1.1-6
[22] proxy_0.4-26 future_1.24.0 tzdb_0.3.0
[25] bit_4.0.4 spatstat.data_2.1-4 xml2_1.3.3
[28] httpuv_1.6.5 assertthat_0.2.1 DirichletMultinomial_1.36.0
[31] viridis_0.6.2 xfun_0.30 hms_1.1.1
[34] jquerylib_0.1.4 promises_1.2.0.1 fansi_1.0.3
[37] restfulr_0.0.13 progress_1.2.2 caTools_1.18.2
[40] dbplyr_2.1.1 igraph_1.3.0 DBI_1.1.2
[43] htmlwidgets_1.5.4 sparsesvd_0.2 spatstat.geom_2.4-0
[46] purrr_0.3.4 ellipsis_0.3.2 backports_1.4.1
[49] annotate_1.72.0 biomaRt_2.50.3 deldir_1.0-6
[52] vctrs_0.4.0 remotes_2.4.2 ROCR_1.0-11
[55] abind_1.4-5 cachem_1.0.6 withr_2.5.0
[58] checkmate_2.0.0 sctransform_0.3.3 GenomicAlignments_1.30.0
[61] prettyunits_1.1.1 goftest_1.2-3 cluster_2.1.3
[64] seqLogo_1.60.0 lazyeval_0.2.2 crayon_1.5.1
[67] hdf5r_1.3.5 pkgconfig_2.0.3 slam_0.1-50
[70] labeling_0.4.2 vipor_0.4.5 nlme_3.1-157
[73] ProtGenerics_1.26.0 nnet_7.3-17 rlang_1.0.2
[76] globals_0.14.0 lifecycle_1.0.1 miniUI_0.1.1.1
[79] filelock_1.0.2 BiocFileCache_2.2.1 rsvd_1.0.5
[82] dichromat_2.0-0 ggrastr_1.0.1 rprojroot_2.0.3
[85] polyclip_1.10-0 lmtest_0.9-40 ggseqlogo_0.1
[88] zoo_1.8-9 beeswarm_0.4.0 base64enc_0.1-3
[91] ggridges_0.5.3 processx_3.5.3 png_0.1-7
[94] viridisLite_0.4.0 rjson_0.2.21 bitops_1.0-7
[97] R.oo_1.24.0 KernSmooth_2.23-20 blob_1.2.3
[100] parallelly_1.31.0 spatstat.random_2.2-0 readr_2.1.2
[103] jpeg_0.1-9 CNEr_1.30.0 scales_1.1.1
[106] memoise_2.0.1 magrittr_2.0.3 plyr_1.8.7
[109] ica_1.0-2 zlibbioc_1.40.0 compiler_4.1.2
[112] tinytex_0.38 BiocIO_1.4.0 RColorBrewer_1.1-3
[115] fitdistrplus_1.1-8 Rsamtools_2.10.0 cli_3.2.0
[118] listenv_0.8.0 pbapply_1.5-0 ps_1.6.0
[121] htmlTable_2.4.0 Formula_1.2-4 MASS_7.3-56
[124] mgcv_1.8-40 tidyselect_1.1.2 stringi_1.7.6
[127] yaml_2.3.5 latticeExtra_0.6-29 ggrepel_0.9.1
[130] sass_0.4.1 VariantAnnotation_1.40.0 fastmatch_1.1-3
[133] tools_4.1.2 future.apply_1.8.1 parallel_4.1.2
[136] rstudioapi_0.13 TFMPvalue_0.0.8 foreign_0.8-82
[139] lsa_0.73.2 gridExtra_2.3 farver_2.1.0
[142] Rtsne_0.15 digest_0.6.29 BiocManager_1.30.16
[145] pracma_2.3.8 FNN_1.1.3 qlcMatrix_0.9.7
[148] Rcpp_1.0.8.3 later_1.3.0 RcppAnnoy_0.0.19
[151] httr_1.4.2 biovizBase_1.42.0 colorspace_2.0-3
[154] XML_3.99-0.9 tensor_1.5 reticulate_1.24
[157] splines_4.1.2 uwot_0.1.11 RcppRoll_0.3.0
[160] spatstat.utils_2.3-0 plotly_4.10.0 xtable_1.8-4
[163] poweRlaw_0.70.6 jsonlite_1.8.0 R6_2.5.1
[166] Hmisc_4.6-0 pillar_1.7.0 htmltools_0.5.2
[169] mime_0.12 glue_1.6.2 fastmap_1.1.0
[172] BiocParallel_1.28.3 codetools_0.2-18 pkgbuild_1.3.1
[175] utf8_1.2.2 lattice_0.20-45 bslib_0.3.1
[178] spatstat.sparse_2.1-0 tibble_3.1.6 ggbeeswarm_0.6.0
[181] curl_4.3.2 leiden_0.3.9 gtools_3.9.2
[184] GO.db_3.14.0 survival_3.3-1 docopt_0.7.1
[187] munsell_0.5.0 GenomeInfoDbData_1.2.7 reshape2_1.4.4
[190] gtable_0.3.0 spatstat.core_2.4-2

@timoast
Copy link
Collaborator

timoast commented Apr 15, 2022

I wasn't able to reproduce this issue, are you able to provide an example using any of the datasets used in the Signac vignettes that recreates it?

@masonsweat
Copy link
Author

masonsweat commented Apr 15, 2022 via email

@masonsweat
Copy link
Author

masonsweat commented Apr 16, 2022

Hi Tim,
Following your suggestion, I figured out how to reproduce the error! Could you please help me fix it, when you have time? :) (You are the greatest!!!).

I used the pbmc500 dataset to show that RegionMatrix works. I followed the merge vignette (https://satijalab.org/signac/articles/merging.html) to get the data, but did not include any other datasets.

Comparing my dataset to pbmc500, i couldn't find many differences. I removed data and fragment files but nothing made my object work. The only difference between my object and pbmc500 was that my object originated from merging 4 seurat objects.

To test the idea that merging somehow makes a difference, I tried Regionmatrix on my unmerged seurat objects. Actually, this worked for me. All 4 of the seurat objects could be used to make good RegionHeatmap plots!

I wanted to see if the error was exclusive to my datasets or if it could be reproduced using 10x datasets. To reproduce the error, I then followed the merge vignette and merged pbmc500 with pbmc1000.

After merging pbmc500 with pbmc1000, RegionMatrix no longer functions, and I get the same result as when I use it with my data (Regionheatmap shows 0 fragments in the specified regions).

In conclusion, for some reason RegionMatrix fails when applied to merged seurats.

Here is the code I used to merge pbmc500 and pbmc1000, which should reproduce the error. Session info is the same as listed above.

Again, thank you very much with your support for this issue.

Best,
Mason

Code:

peaks.500 <- read.table(
file = "Pbmc_500cell/atac_pbmc_500_nextgem_peaks.bed",
col.names = c("chr", "start", "end")
)
peaks.1k <- read.table(
file = "atac_pbmc_1k_nextgem_peaks.bed",
col.names = c("chr", "start", "end")
)
gr.500 <- makeGRangesFromDataFrame(peaks.500)
gr.1k <- makeGRangesFromDataFrame(peaks.1k)
combined.peaks <- reduce(x = c(gr.500, gr.1k))
peakwidths <- width(combined.peaks)
combined.peaks <- combined.peaks[peakwidths < 10000 & peakwidths > 20]
combined.peaks
md.500 <- read.table(
file = "atac_pbmc_500_nextgem_singlecell.csv",
stringsAsFactors = FALSE,
sep = ",",
header = TRUE,
row.names = 1
)[-1, ] # remove the first row
md.1k <- read.table(
file = "atac_pbmc_1k_nextgem_singlecell.csv",
stringsAsFactors = FALSE,
sep = ",",
header = TRUE,
row.names = 1
)[-1, ] # remove the first row
md.500 <- md.500[md.500$passed_filters > 500, ]
md.1k <- md.1k[md.1k$passed_filters > 500, ]
frags.500 <- CreateFragmentObject(
path = "atac_pbmc_500_nextgem_fragments.tsv.gz",
cells = rownames(md.500)
)
frags.1k <- CreateFragmentObject(
path = "atac_pbmc_1k_nextgem_fragments.tsv.gz",
cells = rownames(md.1k)
)
pbmc500.counts <- FeatureMatrix(
fragments = frags.500,
features = combined.peaks,
cells = rownames(md.500)
)
pbmc1k.counts <- FeatureMatrix(
fragments = frags.1k,
features = combined.peaks,
cells = rownames(md.1k)
)
pbmc500_assay <- CreateChromatinAssay(pbmc500.counts, fragments = frags.500)
pbmc500 <- CreateSeuratObject(pbmc500_assay, assay = "ATAC", meta.data=md.500)

pbmc1k_assay <- CreateChromatinAssay(pbmc1k.counts, fragments = frags.1k)
pbmc1k <- CreateSeuratObject(pbmc1k_assay, assay = "ATAC", meta.data=md.1k)
pbmc1k

pbmc500$dataset <- 'pbmc500'
pbmc1k$dataset <- 'pbmc1k'
combined <- merge(
x = pbmc500,
y = list(pbmc1k),
add.cell.ids = c("500", "1k")
)
combined[["ATAC"]]
combined

New bed file creation

bed <- read.table('/Users/masonsweat/Desktop/NGS_analysis/AF_Multiome/Regionmatrix/trial.bed', sep = '\t', header = FALSE)
names(bed)[1] <- 'chromosome'
names(bed)[2] <- 'start'
names(bed)[3] <- 'end'

newrange <- makeGRangesFromDataFrame(bed, start.field = 'start', end.field = 'end',
seqinfo = c('chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10',
'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18',
'chr19', 'chr20', 'chr21', 'chrX'), seqnames.field = 'chromosome' )
DefaultAssay(pbmc500)<- 'ATAC'
obj1 <-RegionMatrix(
combined,
newrange,
key = 'myregions',
assay = NULL,
idents = NULL,
upstream = 1000,
downstream = 1000,
verbose = TRUE,
)
DefaultAssay(obj) <- 'ATAC'
RegionHeatmap(
obj1,
key = 'myregions',
assay = 'ATAC',
idents = NULL,
normalize = TRUE,
upstream = 1000,
downstream = 1000,
max.cutoff = "q95",
cols = NULL,
min.counts = 0,
window = (2000)/30,
order = TRUE,
nrow = NULL
)

This returns the following plot:

image

It should be noted that the function works fine on either pbmc500 or pbmc1k, prior to merger. Here is an example regionheatmap for pbmc500:

obj1 <-RegionMatrix(
pbmc500,
newrange,
key = 'myregions',
assay = NULL,
idents = NULL,
upstream = 1000,
downstream = 1000,
verbose = TRUE,
)
DefaultAssay(obj) <- 'ATAC'
RegionHeatmap(
obj1,
key = 'myregions',
assay = 'ATAC',
idents = NULL,
normalize = TRUE,
upstream = 1000,
downstream = 1000,
max.cutoff = "q95",
cols = NULL,
min.counts = 0,
window = (2000)/30,
order = TRUE,
nrow = NULL
)

image

timoast added a commit that referenced this issue Apr 18, 2022
@timoast
Copy link
Collaborator

timoast commented Apr 18, 2022

Thanks @masonsweat for looking into this! Turns out we weren't converting the cell names correctly when reading data from the fragment file, this should now be fixed on the develop branch.

@timoast timoast closed this as completed Apr 18, 2022
@masonsweat
Copy link
Author

masonsweat commented Apr 18, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Documentation help
Projects
None yet
Development

No branches or pull requests

2 participants