chess sim results are all nan #5

Closed
Irenexzwen opened this issue Oct 22, 2020 · 6 comments

Comments

@Irenexzwen

Hi author:

I tried to use chess to compare two .hic files. I ran the following commands:

chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out # generate windows for the whole genome

chess sim \
HiC_control.hic \
HiC_treat.hic \
hg38_1mwin_100kstep.out \
Ctrl_Treat_1mwin_100kstep_diff.out.tsv

While ./hg38_1mwin_100kstep.out looks fine to me, the final result Ctrl_Treat_1mwin_100kstep_diff.out.tsv contains only nan values, like the following:

ID      SN      ssim    z_ssim
0       nan     nan     nan
1       nan     nan     nan
2       nan     nan     nan
3       nan     nan     nan
4       nan     nan     nan
5       nan     nan     nan
6       nan     nan     nan
7       nan     nan     nan
8       nan     nan     nan
9       nan     nan     nan
10      nan     nan     nan
11      nan     nan     nan
12      nan     nan     nan
13      nan     nan     nan
...

Did I miss anything?
Thank you very much!

@nickmachnik
Collaborator

Hi,
could you please post the full log in here?
Are you using normalized matrices? What is the bin size of your data?

@Irenexzwen
Author

Thanks Nick! Sorry, I forgot the log. I have reproduced the error here:

Here is my code:

chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out

chess sim \
HiC_control.mapq30.hic \
HiC_treat.mapq30.hic \
./hg38_1mwin_100kstep.out \
H1_ctrl_treat_1mwin_100kstep_diff.out.tsv

Here is the full log:

2020-10-22 10:03:19,635 INFO Running '/software/anaconda/install/bin/chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out'
2020-10-22 10:03:21,908 INFO CHESS version: 0.3.3
2020-10-22 10:03:21,908 INFO FAN-C version: 0.9.5
2020-10-22 10:03:22,011 INFO Finished '/software/anaconda/install/bin/chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out'
2020-10-22 10:03:24,861 INFO Running '/software/anaconda/install/bin/chess sim HiC_control.mapq30.hic HiC_treat.mapq30.hic ./hg38_1mwin_100kstep.out H1_ctrl_treat_1mwin_100kstep_diff.out.tsv'
2020-10-22 10:03:26,368 INFO CHESS version: 0.3.3
2020-10-22 10:03:26,368 INFO FAN-C version: 0.9.5
2020-10-22 10:03:26,369 INFO Loading reference contact data
2020-10-22 10:05:30,980 INFO Loading region pairs
2020-10-22 10:05:31,297 WARNING 392 region pairs have been dropped, because they involve chromosomes that are not present in the provided contact data.
2020-10-22 10:05:31,297 INFO Launching workers
2020-10-22 10:05:31,354 INFO Submitting pairs for comparison
2020-10-22 10:05:33,225 INFO Could not compute similarity for 30654 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins
2020-10-22 10:05:33,389 INFO Finished '/software/anaconda/install/bin/chess sim HiC_control.mapq30.hic HiC_treat.mapq30.hic ./hg38_1mwin_100kstep.out H1_ctrl_treat_1mwin_100kstep_diff.out.tsv'

Here are the results:

ID      SN      ssim    z_ssim
0       nan     nan     nan
1       nan     nan     nan
2       nan     nan     nan
3       nan     nan     nan
4       nan     nan     nan
5       nan     nan     nan
...

The bin size of the .hic file is 1 kb. The .hic file was generated with the default parameters of juicebox_tools pre, which are <VC,VC_SQRT,KR,SCALE>.

@nickmachnik
Collaborator

OK, I am not sure what is happening there, so let's start with some guesswork. Your bin size is very small, so you may have a large number of unmappable bins. I don't know how long your preprocessing takes, but you could try 10 kb or 25 kb bins and see whether you get the same behaviour.
Another option would be to tweak the parameters of chess sim. By default, unmappable bins are not considered in the comparisons, and matrices with more than 10 percent unmappable bins are not compared at all.
You could therefore try to increase --mappability-cutoff and activate --keep-unmappable-bins. The caveat is that the cutoff is low and the flag is deactivated by default because I don't think the program behaves very well with a lot of missing data.
Anyhow, I suggest trying these things to see whether that gets rid of the NaNs. If that turns out not to be the problem, we can take it from there.
Best,
Nick
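To make the cutoff idea above concrete, here is a minimal sketch of such a filter. It is not the CHESS implementation: it assumes that an "unmappable" bin is one whose row in the contact matrix holds no contacts at all, and the function names `unmappable_fraction` and `should_compare` are made up for illustration.

```python
def unmappable_fraction(matrix):
    """Return the share of bins whose row contains no contacts.

    Assumption: an all-zero row marks an unmappable bin. CHESS may
    define unmappable bins differently.
    """
    if not matrix:
        return 0.0
    empty_rows = sum(1 for row in matrix if not any(row))
    return empty_rows / len(matrix)


def should_compare(matrix, cutoff=0.1):
    """Return True when the unmappable share does not exceed the cutoff.

    The default of 0.1 mirrors the "more than 10 percent" rule
    described in the comment above.
    """
    return unmappable_fraction(matrix) <= cutoff
```

With very small bins, many rows can end up empty, so a region can fail a check like this even when its coordinates are valid.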

@kaukrise
Collaborator

kaukrise commented Oct 22, 2020

Hey, I have an idea what the problem might be. Juicer by default removes the chr part of the chromosome names. @Irenexzwen, could you please post the first couple of lines of hg38_1mwin_100kstep.out so we can see whether it has the chr prefix or not?
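A check like the one suggested here can be sketched as a comparison of two sets of chromosome names. The function `diagnose_chromosome_names` and its return labels are hypothetical; how the names are read out of the .hic file is left to whatever tool is at hand and is not shown.

```python
def diagnose_chromosome_names(pair_chroms, data_chroms):
    """Compare chromosome names from a pairs file with those in the contact data.

    Returns one of four illustrative labels:
    "match", "pairs_have_chr_prefix", "data_has_chr_prefix", "mismatch".
    """
    pair_chroms = set(pair_chroms)
    data_chroms = set(data_chroms)

    # Names used in the pairs file that the contact data does not know.
    missing = pair_chroms - data_chroms
    if not missing:
        return "match"

    # Would dropping "chr" from the pairs file names resolve everything?
    stripped = {c[3:] if c.startswith("chr") else c for c in missing}
    if stripped <= data_chroms:
        return "pairs_have_chr_prefix"

    # Would adding "chr" to the pairs file names resolve everything?
    prefixed = {c if c.startswith("chr") else "chr" + c for c in missing}
    if prefixed <= data_chroms:
        return "data_has_chr_prefix"

    return "mismatch"
```

For example, `diagnose_chromosome_names(["chr1", "chr2"], ["1", "2", "X"])` returns `"pairs_have_chr_prefix"`.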

@Irenexzwen
Author

Hi Kaukrise:

Thanks for the reminder. However, hg38_1mwin_100kstep.out does have the "chr" prefix:

>less hg38_1mwin_100kstep.out|head -n 10
chr1    1       1000001 chr1    1       1000001 0       .       +       +
chr1    100001  1100001 chr1    100001  1100001 1       .       +       +
chr1    200001  1200001 chr1    200001  1200001 2       .       +       +
chr1    300001  1300001 chr1    300001  1300001 3       .       +       +
chr1    400001  1400001 chr1    400001  1400001 4       .       +       +
chr1    500001  1500001 chr1    500001  1500001 5       .       +       +
chr1    600001  1600001 chr1    600001  1600001 6       .       +       +
chr1    700001  1700001 chr1    700001  1700001 7       .       +       +
chr1    800001  1800001 chr1    800001  1800001 8       .       +       +
chr1    900001  1900001 chr1    900001  1900001 9       .       +       +
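The rows above follow a simple pattern: each window starts 100,000 bp after the previous one and spans 1,000,000 bp. A rough sketch of that arithmetic is below. It only reproduces the coordinates visible in this output; the function name `sliding_windows` is invented, and the handling of the chromosome end (windows that would run past it are skipped) is an assumption, not necessarily what chess pairs does.

```python
def sliding_windows(chrom, chrom_length, window, step):
    """Return (chrom, start, end, id) tuples for 1-based sliding windows.

    Assumption: a window that would extend past the chromosome end
    is dropped rather than truncated.
    """
    windows = []
    start = 1
    window_id = 0
    while start + window <= chrom_length + 1:
        windows.append((chrom, start, start + window, window_id))
        start += step
        window_id += 1
    return windows


# First two windows for a 1 Mb window and a 100 kb step on a toy 2 Mb chromosome.
for row in sliding_windows("chr1", 2_000_000, 1_000_000, 100_000)[:2]:
    print(row)
```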

@Irenexzwen
Author

Hi Nick and kaukrise:

After many tests, I think I have figured out what the problem is.

  1. The original .hic file I generated contained multiple resolutions, so I generated a new set of .hic files at one fixed resolution, say 5 kb.
  2. Kaukrise's guess was right that the problem is related to the Juicer output not containing "chr". The .hic file does not contain the "chr" prefix, while the windows generated by CHESS do. So I tried removing the "chr" prefix from the windows before running chess sim, and it works!

Hopefully this will be helpful to other people.

Thank you all.
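For anyone hitting the same issue, the fix described above (removing the "chr" prefix from the window file) might look roughly like this. It is a sketch under assumptions: the pairs file is taken to be tab-separated with chromosome names in the first and fourth columns, as in the output shown earlier in the thread, and the helper names `strip_chr_prefix` and `convert_pairs_file` are made up here.

```python
def strip_chr_prefix(line):
    """Remove a leading "chr" from columns 1 and 4 of one pairs-file line.

    Assumption: fields are tab-separated and the two chromosome
    names sit in columns 1 and 4.
    """
    fields = line.rstrip("\n").split("\t")
    for col in (0, 3):
        if col < len(fields) and fields[col].startswith("chr"):
            fields[col] = fields[col][3:]
    return "\t".join(fields)


def convert_pairs_file(src, dst):
    """Write a copy of src to dst with the "chr" prefixes removed."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():  # skip empty lines
                fout.write(strip_chr_prefix(line) + "\n")
```

The converted file would then be passed to chess sim in place of the original window file. Writing to a new file keeps the original windows available for data sets whose chromosome names do carry the prefix.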
