Not all cells requested could be found in the fragment file #748

Closed
Malindrie opened this issue Aug 9, 2021 · 8 comments

Comments

@Malindrie

I have downloaded the SHARE-seq ATAC-seq data and am trying to create a ChromatinAssay object:

atac <- Read10X("./GSM4156597_skin.late.anagen_atac/", gene.column = 1)
fragments <- "./GSM4156597_skin.late.anagen.atac.fragments.sorted.bed.gz"

share[['ATAC']] <- CreateChromatinAssay(
   counts = atac,
   sep = c(":", "-"),
   genome = "mm10",
   fragments = fragments
)

I get the following error:

Computing hash
Checking for 34774 cell barcodes
Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments,  : 
  Not all cells requested could be found in the fragment file.
)

However, when I manually read all the barcodes from the fragment file and compared them with the barcodes of the ATAC count matrix, every barcode in the counts was present in the fragment file, although the fragment file contains additional barcodes.
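
A comparison like the one described can be sketched at the shell. This is only an illustration: it assumes the barcode sits in the fourth tab-separated column of a gzip-compressed fragment file, and that the count-matrix barcodes have been written one per line to a plain-text file. The function name `missing_barcodes` and the file names in the usage note are hypothetical.

```shell
# Print barcodes that are requested but absent from the fragment file.
# $1: gzip- or bgzip-compressed fragment file (barcode assumed in column 4)
# $2: plain-text file with one requested barcode per line (assumed layout)
missing_barcodes() {
  frag_bc=$(mktemp)
  want_bc=$(mktemp)
  # Unique barcodes seen in the fragment file, skipping '#' header lines
  gzip -cd "$1" | awk -F'\t' '!/^#/ {print $4}' | LC_ALL=C sort -u > "$frag_bc"
  # Unique requested barcodes, sorted the same way so comm can compare them
  LC_ALL=C sort -u "$2" > "$want_bc"
  # Lines only in the requested list (exact string match)
  LC_ALL=C comm -23 "$want_bc" "$frag_bc"
  rm -f "$frag_bc" "$want_bc"
}
```

For example, `missing_barcodes fragments.bed.gz barcodes.txt | head` would list the first few requested barcodes that have no exact match in the fragment file; empty output means every requested barcode was found. Because the match is exact, a difference in separators between the two files shows up here as every barcode being reported missing.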

I see that a similar issue was raised before, but even setting max.lines = NULL and tolerance = 0 to perform an exhaustive search still resulted in the same error as above.

I would greatly appreciate it if you could let me know what I might be doing wrong.

@timoast
Collaborator

timoast commented Aug 9, 2021

In the count matrix, cell barcodes have a format like R1.01.R2.01.R3.06.P1.55. In the fragment file, cell barcodes have a format like R1.52,R2.48,R3.53,P1.05. So they do not match (comma versus period). The SHARE-seq fragment file on GEO is also not sorted or bgzip-compressed. I'd suggest replacing all the commas in the fragment file with periods, then sorting, compressing with bgzip, and indexing with tabix. You should then be able to use the file with Signac.
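
The first two steps suggested above (swapping commas for periods in the barcode column, then sorting by position) might look like the sketch below. It assumes a decompressed, tab-separated fragment file with the barcode in the fourth column; the function name `fix_barcodes` and the file names in the comments are hypothetical. Compression and indexing are shown only as comments because they need bgzip and tabix to be installed.

```shell
# Rewrite barcodes from "R1.52,R2.48,..." to "R1.52.R2.48,..." style
# and sort by chromosome, start, end.
# $1: decompressed, tab-separated fragment file (barcode assumed in column 4)
fix_barcodes() {
  tab=$(printf '\t')
  awk 'BEGIN {FS=OFS="\t"} {gsub(/,/, ".", $4); print}' "$1" |
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2n -k3,3n
}

# Afterwards (requires bgzip and tabix):
#   fix_barcodes frags.raw.bed > frags.fixed.bed
#   bgzip frags.fixed.bed
#   tabix -p bed frags.fixed.bed.gz
```

Only the fourth field is edited, so commas elsewhere in a line, if any, are left alone.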

@timoast closed this as completed on Aug 9, 2021

@Malindrie
Author

Thank you for the reply.

I have already followed all of the suggestions above (replacing all the commas in the fragment file with periods, then sorting, compressing with bgzip, and indexing with tabix), but I am still getting the same error. I could email a Dropbox folder link to the dataset (in the format required for input to Signac), if that helps?

I would greatly appreciate any help.

@timoast
Collaborator

timoast commented Aug 9, 2021

Sure, you can email a link to tstuart@nygenome.org and I will take a look

@timoast
Collaborator

timoast commented Aug 10, 2021

In this case the error message was misleading; the real issue was that a column is missing from the fragment file. I have added a check for the correct number of columns in CreateFragmentObject, which will now throw a more informative error message.

If you add the 5th column to the fragment file (just add 1 for all rows), then it should work as expected, e.g.:

gzip -d GSM4156597_skin.late.anagen.atac.fragments.sorted.bed.gz
awk 'BEGIN {FS=OFS="\t"} {print $0, 1}' GSM4156597_skin.late.anagen.atac.fragments.sorted.bed > frags.bed
bgzip -@ 10 frags.bed
tabix -p bed frags.bed.gz
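
Before or after adding the column, it can help to confirm how many tab-separated fields each line actually has. The following is a rough sketch and not part of Signac; `column_counts` is a hypothetical helper name. It skips lines starting with `#`, on the assumption that those are header comments.

```shell
# Tally lines by their number of tab-separated fields.
# $1: gzip- or bgzip-compressed fragment file
# Output: "<field count><TAB><number of lines>", one row per distinct count
column_counts() {
  gzip -cd "$1" |
    awk -F'\t' '!/^#/ {n[NF]++} END {for (k in n) printf "%s\t%s\n", k, n[k]}' |
    sort -n
}
```

Running `column_counts frags.bed.gz` on a file prepared as above should print a single row whose first field is 5; a row starting with 4 would point to the missing column described in this comment.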

@XinshuXie

I ran into the same problem. How did you solve it? Please help!

fragments <- 'atac_v1_pbmc_10k_fragments.tsv.gz'

chrom_assay <- CreateChromatinAssay(
  counts = counts,
  sep = c(":", "-"),
  genome = 'hg19',
  fragments = fragments,
  min.cells = 10,
  min.features = 200
)

Computing hash
can't open fileError in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments, :
  Not all cells requested could be found in the fragment file.

@XinshuXie

@timoast

@timoast
Collaborator

timoast commented Aug 16, 2021

@XinshuXie please open a new issue including the full code and output of sessionInfo()

@stathismegas

Sorry to reopen this, but I get the same error, "Not all cells requested could be found in the fragment file", when I try to create a chromatin assay from cellranger-arc output:

all_cells_seurat[["ATAC"]] <- CreateChromatinAssay(
  counts = counts$Peaks,
  sep = c(":", "-"),
  fragments = fragpath,
  annotation = annotation
)

I tried to solve it by removing the barcodes that are not in the fragments file:

fileName = "./atac_fragments.tsv.gz"
tokeep <- scan(fileName, quote = "", what = list(NULL, NULL, NULL, name = character(), NULL), skip = 18)
print(tokeep$name)
print(length(colnames(all_cells_seurat)))
all_cells_seurat <- all_cells_seurat[, colnames(all_cells_seurat) %in% tokeep$name]

but it doesn't work. I thought the error message indicated that there are more cells in the RNA assay than in the fragments file, so the above should work, shouldn't it?
