fanc pairs takes a long time without any further output #42

Closed
biozzq opened this issue Jan 19, 2021 · 6 comments

@biozzq

biozzq commented Jan 19, 2021

Dear all,

I used the valid pairs generated by HiC-Pro as input for fanc pairs with the command below, but it runs very slowly: after three days there is still no further output, and the timestamp of the output file has not changed since I submitted the command. Do you have any suggestions for speeding this up?
fanc pairs ${pre}_mm10_index.bwt2pairs.validPairs ${pre}_fanc.pairs -g fragment.bed -t 10 -s ${pre}.statistic --filter-pcr-duplicates 1

Best wishes,
Zheng zhuqing

@kaukrise
Collaborator

Hey, I'm sorry you are experiencing these slowdown issues - HiC-Pro import is not particularly optimised.

I am not sure where exactly it might be stuck. If the HiC-Pro pairs file is very big, perhaps a manual parallelisation would work: you could split the file into several smaller chunks and then run the pairs command on each one individually, without any filtering. I would also recommend an SSD for this, if you are not using one already. If you are on a network file system, enabling the -tmp option would probably also help a lot.
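The splitting step could look roughly like the sketch below. It assumes the HiC-Pro validPairs file is plain text with one pair per line, so cutting it at line boundaries keeps every pair intact. The function name, chunk naming scheme and chunk size are made up for illustration and are not part of FAN-C or HiC-Pro.

```python
from pathlib import Path


def split_pairs_file(path, lines_per_chunk, out_dir):
    """Split a line-based text file into numbered chunks.

    Each chunk holds at most `lines_per_chunk` lines. Returns the list
    of chunk paths in the order they were written.
    (Hypothetical helper, not part of FAN-C.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).name
    chunk_paths = []
    out = None
    with open(path) as src:
        for i, line in enumerate(src):
            # Start a new chunk file every `lines_per_chunk` lines
            if i % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                chunk_path = out_dir / f"{stem}.chunk{len(chunk_paths):04d}"
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths
```

For example, `split_pairs_file("sample.validPairs", 50_000_000, "chunks")` would write `chunks/sample.validPairs.chunk0000`, `chunks/sample.validPairs.chunk0001`, and so on; the chunk size here is arbitrary and should be adjusted to the machine.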

There isn't a built-in command line function at the moment to merge the individual pairs files, but you can use this code from within a Python shell:

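import fanc

# Pairs files produced by the individual `fanc pairs` runs
pairs_list = ["first.pairs", "second.pairs"]  # replace as necessary
pairs = [fanc.load(file_name) for file_name in pairs_list]

# Merge all chunks into a single pairs file on disk
merged = fanc.ReadPairs.merge(pairs, file_name="/path/to/output.pairs")
merged.close()

# Close the input files as well
for p in pairs:
    p.close()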

Once you have the merged file, you can run the filtering on it.
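As a rough sketch of the whole workflow, the helper below only builds the command lines: one unfiltered `fanc pairs` call per chunk, and one filtering call for the merged file. The flags are the ones from the original post. The assumption that `fanc pairs` accepts an existing pairs file as its only positional argument when filtering should be checked against `fanc pairs --help`. Function names and file names are hypothetical.

```python
import shlex


def chunk_commands(chunk_files, fragments="fragment.bed", threads=10):
    """Build one unfiltered `fanc pairs` command (argument list) per chunk."""
    commands = []
    for chunk in chunk_files:
        output = f"{chunk}.pairs"  # hypothetical output naming
        commands.append(
            ["fanc", "pairs", chunk, output, "-g", fragments, "-t", str(threads)]
        )
    return commands


def filter_command(merged_pairs, stats_file):
    """Build the filtering command for the merged file.

    Assumption: filtering an existing pairs file needs no output argument.
    """
    return [
        "fanc", "pairs", merged_pairs,
        "-s", stats_file,
        "--filter-pcr-duplicates", "1",
    ]


# Print the commands instead of running them, so they can be inspected first
for cmd in chunk_commands(["chunk0000", "chunk0001"]):
    print(shlex.join(cmd))
print(shlex.join(filter_command("/path/to/output.pairs", "sample.statistic")))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right.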

@biozzq
Author

biozzq commented Jan 22, 2021

Dear @kaukrise

Thank you. I tried the -tmp option, but it still runs slowly. I think I can wait for the updated version, which may be more than 10X faster. Meanwhile, as I want to generate the input for CHESS, do you have any other suggestions for speeding up the analysis? I could convert to a cool file, but I think CHESS will run slowly with a cool file as input.

Best wishes,
Zheng zhuqing

@kaukrise
Collaborator

The dev version won't affect HiC-Pro import, unfortunately.

For CHESS, you could create a Juicer file instead. I think Juicer also supports import from pairs files.

@biozzq
Author

biozzq commented Jan 22, 2021

Dear @kaukrise

Yes, a Juicer file is also OK. The CHESS publication used fanc and gives detailed normalisation and filtering instructions, so it is clearer for me to use fanc. If I use a Juicer hic file as input for CHESS, I cannot find the corresponding filtering options, such as balancing by chromosome and masking low-interaction bins.

Best wishes,
Zheng zhuqing

@kaukrise
Collaborator

Juicer automatically balances by chromosome. Bins with 0 interactions are automatically masked by FAN-C, also in Juicer files. I think using a Juicer file directly should be quite okay for your needs.

If you really need filtering for sparsely populated bins and want to keep working with FAN-C files, you could

  1. Create a Juicer Hi-C file from your HiC-Pro pairs
  2. Convert the file using fanc hic --deepcopy <juicer.hic@resolution> <fanc.hic>
  3. Filter sparse bins and re-normalise using fanc hic -r 0.1 -n --norm-method ICE <fanc.hic>

But since you appear to be working with quite large matrices, the conversion from Juicer to FAN-C will take some time again.
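Steps 2 and 3 above can be scripted, for instance as in the sketch below. It only assembles the two `fanc hic` commands quoted in the list; step 1 (creating the Juicer file) is not covered. The function name, file names and the resolution string are placeholders.

```python
import shlex


def juicer_to_fanc_commands(juicer_hic, resolution, fanc_hic, relative_cutoff=0.1):
    """Build the conversion and filter/normalise commands from the steps above.

    (Hypothetical helper; the flags are copied from the quoted commands.)
    """
    # Step 2: copy the Juicer matrix at one resolution into a FAN-C file
    convert = ["fanc", "hic", "--deepcopy", f"{juicer_hic}@{resolution}", fanc_hic]
    # Step 3: filter sparse bins and re-normalise with ICE
    filter_norm = [
        "fanc", "hic",
        "-r", str(relative_cutoff),
        "-n", "--norm-method", "ICE",
        fanc_hic,
    ]
    return [convert, filter_norm]


# Print the commands for inspection; file names and resolution are placeholders
for cmd in juicer_to_fanc_commands("sample.juicer.hic", "1mb", "sample.fanc.hic"):
    print(shlex.join(cmd))
```

The two commands have to run in this order, for example with `subprocess.run(cmd, check=True)` on each list, since the second one operates on the file written by the first.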

@biozzq
Author

biozzq commented Jan 22, 2021

Thank you for your quick reply. I will give it a try.

2 participants