Could not compute similarity for... #30

Closed
liz-is opened this issue Jan 15, 2021 · 4 comments

@liz-is
Collaborator

liz-is commented Jan 15, 2021

Hi folks,

Some of my region pairs are being deemed invalid, but I don't think they fall into any of the possible reasons given. Do you have any other ideas what the issue might be? Is there a way I can get more diagnostic info to try to debug this myself (without having to dig deep into the code and run each step manually, which I can do if necessary)?

Here's the error message:

2021-01-15 14:35:17,634 INFO Running '/home/research/vaquerizas/liz/project_ko/.ko_venv/bin/chess sim data/hic/ko_Rep1/hic/ko_Rep1_10kb.hic data/hic/wt_Rep1/hic/wt_Rep1_10kb.hic data/chess/dm6_pairs_150x_10kb.bedpe data/chess/ko_Rep1_vs_wt_Rep1/genome_scan_150x_10kb.txt -p 8'
2021-01-15 14:35:26,020 INFO CHESS version: 0.3.6
2021-01-15 14:35:26,021 INFO FAN-C version: 0.9.11
2021-01-15 14:35:26,052 INFO Loading reference contact data
2021-01-15 14:38:42,767 INFO Loading query contact data
2021-01-15 14:43:31,332 INFO Loading region pairs
2021-01-15 14:43:31,690 INFO Launching workers
2021-01-15 14:43:33,110 INFO Submitting pairs for comparison
2021-01-15 14:45:01,759 INFO Could not compute similarity for 6316 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins
2021-01-15 14:45:20,267 INFO Finished '/home/research/vaquerizas/liz/project_ko/.ko_venv/bin/chess sim data/hic/ko_Rep1/hic/ko_Rep1_10kb.hic data/hic/wt_Rep1/hic/wt_Rep1_10kb.hic data/chess/dm6_pairs_150x_10kb.bedpe data/chess/ko_Rep1_vs_wt_Rep1/genome_scan_150x_10kb.txt -p 8'
Closing remaining open files:data/hic/ko_Rep1/hic/ko_Rep1_10kb.hic...donedata/hic/wt_Rep1/hic/wt_Rep1_10kb.hic...done

This is Drosophila Hi-C data. I've tried different resolutions and two different window sizes (100x and 150x the bin size). The pairs file for each parameter combo was generated with chess pairs from the same text file with the chromosome sizes (and these files look okay to me from a quick glance).

In each example, all bins from certain chromosomes are missing, in particular chr 2R and 3R. However, I do get results for these chromosomes at 25kb resolution, so I don't think there is a chromosome naming mismatch between the files or anything like that.

[Image: Screenshot 2021-01-15 at 15 35 34]
(N.B., it makes sense that there are no valid pairs on chr 4 at 25kb resolution, since I'm using a window size of at least 2.5 Mb, which is larger than the chromosome size. Same for 10 kb resolution with 150x window size)
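
The window-size reasoning in this note is plain arithmetic: the window length in bp is the number of bins per window times the bin size, and a chromosome shorter than that cannot hold a single window. A minimal sketch, assuming a dm6 chr4 length of about 1.35 Mb (the exact value used here is an assumption and should be checked against the chromosome sizes file); the function names are made up for illustration:

```python
# Assumed length of dm6 chr4 in bp; verify against your own chromosome sizes file.
CHR4_SIZE_BP = 1_348_131


def window_size_bp(bins_per_window, bin_size_bp):
    """Window length in bp for a window spanning `bins_per_window` bins."""
    return bins_per_window * bin_size_bp


def window_fits(chrom_size_bp, bins_per_window, bin_size_bp):
    """True if at least one full window fits on the chromosome."""
    return window_size_bp(bins_per_window, bin_size_bp) <= chrom_size_bp


# Check the parameter combinations mentioned in the thread.
for bins, res in [(100, 25_000), (150, 25_000), (100, 10_000), (150, 10_000)]:
    size = window_size_bp(bins, res)
    print(f"{bins}x at {res} bp -> {size} bp window, fits chr4: "
          f"{window_fits(CHR4_SIZE_BP, bins, res)}")
```

Under that assumed length, only the 100x window at 10 kb resolution fits on chr 4, which matches the observation above.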

I would have thought that it would be a resolution issue (i.e. too many unmappable bins), but having plotted each chromosome at 10kb resolution in both my query and my reference, they look fine. Some unmappable bins but I'd expect to get some results - they don't look any worse than other chromosomes.
[Image: wt_Rep1_10kb_2R]

I'm happy to look into this further myself since I have some familiarity with the code by now, but I'm not really sure where to start. Do you have any ideas?

I am using a development version of FAN-C, but @kaukrise said that it should work fine.

Also, as a more general comment, would it be possible to implement a more informative version of this message?
2021-01-15 14:45:01,759 INFO Could not compute similarity for 6316 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins
I've seen other questions relating to this, so it seems like a common issue/point of confusion. Although most of the time this is easy to solve, it would be helpful to know which of those three possibilities accounts for the invalid pairs as a starting point for debugging.
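
As a starting point for that kind of debugging, two of the three reasons named in the message (faulty coordinates and too-small regions) can be checked from the pairs file alone, before running chess sim. The following is a hypothetical pre-check, not part of CHESS: it assumes the first six whitespace-separated columns of each line are chrom1, start1, end1, chrom2, start2, end2, and the `min_bins` threshold is an arbitrary illustrative value. Unmappable bins cannot be checked this way, since that requires the contact matrices.

```python
from collections import Counter


def classify_region(chrom, start, end, chrom_sizes, bin_size, min_bins):
    """Return a reason string if the region is obviously invalid, else None."""
    if chrom not in chrom_sizes:
        return "unknown chromosome"
    if start < 0 or end <= start or end > chrom_sizes[chrom]:
        return "faulty coordinates"
    if (end - start) // bin_size < min_bins:
        return "region too small"
    return None


def classify_pair(line, chrom_sizes, bin_size, min_bins=20):
    """Classify one pair line; min_bins is a hypothetical threshold."""
    f = line.split()
    for chrom, start, end in ((f[0], f[1], f[2]), (f[3], f[4], f[5])):
        reason = classify_region(chrom, int(start), int(end),
                                 chrom_sizes, bin_size, min_bins)
        if reason is not None:
            return reason
    return "ok"


def summarise(lines, chrom_sizes, bin_size, min_bins=20):
    """Count pairs per category, skipping blank lines."""
    return Counter(classify_pair(line, chrom_sizes, bin_size, min_bins)
                   for line in lines if line.strip())


# Toy data with made-up chromosome names and sizes.
sizes = {"chrA": 2_000_000}
pairs = [
    "chrA\t0\t1500000\tchrA\t0\t1500000",
    "chrA\t1000000\t2500000\tchrA\t1000000\t2500000",
    "chrA\t0\t50000\tchrA\t0\t50000",
    "chrB\t0\t1500000\tchrB\t0\t1500000",
]
print(summarise(pairs, sizes, bin_size=10_000))
```

Pairs that pass this check but still fail in chess sim would then point towards the unmappable-bin criterion or towards the matrix values themselves.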

@kaukrise
Collaborator

Hey @liz-is,

thank you for the detailed bug report. Can you please try to plot the O/E matrix of a chromosome (or part thereof) that fails? I have a suspicion that the expected values might be the issue here, in which case this is probably related to the FAN-C dev version.

Thanks!

@liz-is
Collaborator Author

liz-is commented Jan 15, 2021

Thanks for looking into this Kai! Here's the O/E matrix for the same dataset and chromosome.

[Image: wt_Rep1_10kb_2R_oe]

@nickmachnik
Collaborator

Hey Liz!
There is a lot of white in this matrix, which according to the colorbar is oe=1. Are all these values actually 1, or very, very close to 1?
1 is the default masking value for unmappable pixels in chess. A matrix row is treated as all 1, and marked as unmappable, if the row sum equals the row length (looking at the code now, this already doesn't seem ideal to me). This is not done for the whole chromosome matrix, but only on the submatrices that are compared, so a row doesn't have to be all 1 across the whole chromosome, only within a particular compared region, in order to be marked as unmappable.
You could try increasing the fraction of unmappable bins that chess permits with --mappability-cutoff (maybe 0.5 or even higher?). This is not a fix, but it might show whether this bug has something to do with false masking or with the computation of the oe values.
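
The row-sum check described above can be illustrated with a small sketch. This is not the CHESS source code: the function names are hypothetical, and whether the cutoff comparison is inclusive is an assumption. It shows why the heuristic is fragile: a row whose values merely sum to the row length is flagged even if it is not all 1.

```python
def unmappable_rows(submatrix, mask_value=1.0):
    """Indices of rows flagged by the heuristic: row sum == mask_value * row length."""
    return [i for i, row in enumerate(submatrix)
            if sum(row) == mask_value * len(row)]


def passes_cutoff(submatrix, mappability_cutoff):
    """True if the flagged fraction does not exceed the cutoff (assumed inclusive)."""
    fraction = len(unmappable_rows(submatrix)) / len(submatrix)
    return fraction <= mappability_cutoff


# Toy O/E submatrix of one compared region (values are made up).
sub = [
    [1.0, 1.0, 1.0, 1.0],  # genuinely masked row
    [0.5, 1.5, 1.0, 1.0],  # real signal, but the sum is 4.0, so it is flagged too
    [0.8, 1.3, 2.0, 0.4],
    [2.0, 0.1, 0.3, 0.9],
]

print(unmappable_rows(sub))     # rows 0 and 1 are flagged
print(passes_cutoff(sub, 0.1))  # half the rows flagged, so a strict cutoff rejects
print(passes_cutoff(sub, 0.5))  # a permissive cutoff lets the region through
```

If O/E values on the failing chromosomes are all 1 or very close to it, most rows in every compared window would be flagged this way, and raising the cutoff would make those regions pass without fixing the underlying values.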

@kaukrise
Collaborator

Hi @nickmachnik,

this was an issue with the FAN-C development version, which we could figure out independently, so I am closing this!
