fanc pairs takes a long time and produces no further output #42

biozzq · 2021-01-19T05:11:56Z

Dear all,

I used the valid pairs generated by HiC-Pro as input for fanc pairs with the command below. It runs slowly, and after three days there has been no further output. The timestamp of the output file has stayed at the time I submitted the command. Do you have any suggestions for speeding this up?
fanc pairs ${pre}_mm10_index.bwt2pairs.validPairs ${pre}_fanc.pairs -g fragment.bed -t 10 -s ${pre}.statistic --filter-pcr-duplicates 1

Best wishes,
Zheng zhuqing

kaukrise · 2021-01-21T08:11:53Z

Hey, I'm sorry you are experiencing these slowdown issues - HiC-Pro import is not particularly optimised.

I am not sure where exactly it might be stuck. If the HiC-Pro pairs file is very big, perhaps a manual parallelisation would work: you could split the file into several smaller chunks and then run the pairs command on each one individually, without any filtering. I would also recommend an SSD for this, if you are not using one already. If you are on a network file system, enabling the -tmp option would probably also help a lot.
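A minimal sketch of the splitting step, assuming the HiC-Pro valid pairs file is plain text with one pair per line and no header. The function name, the chunk size, and the chunk file naming are hypothetical choices for illustration and are not part of FAN-C:

```python
from pathlib import Path


def split_pairs_file(input_path, output_dir, lines_per_chunk=10_000_000):
    """Split a line-based pairs file into chunks of at most lines_per_chunk lines.

    Assumes one record per line and no header (an assumption about the input).
    Returns the list of chunk paths in the order they were written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    chunk_paths = []
    chunk = None
    try:
        with open(input_path) as source:
            for line_number, line in enumerate(source):
                # Start a new chunk file every lines_per_chunk lines.
                if line_number % lines_per_chunk == 0:
                    if chunk is not None:
                        chunk.close()
                    # Hypothetical naming scheme; the suffix mirrors the original input.
                    path = output_dir / f"chunk_{len(chunk_paths):04d}.validPairs"
                    chunk = open(path, "w")
                    chunk_paths.append(path)
                chunk.write(line)
    finally:
        if chunk is not None:
            chunk.close()
    return chunk_paths
```

Each returned chunk could then be passed to fanc pairs on its own, with the same -g and -t arguments as the original command but without --filter-pcr-duplicates.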

There isn't a built-in command line function at the moment to merge the individual pairs files, but you can use this code from within a Python shell:

import fanc

pairs_list = ["first.pairs", "second.pairs"]  # replace as necessary
pairs = [fanc.load(file_name) for file_name in pairs_list]

merged = fanc.ReadPairs.merge(pairs, file_name="/path/to/output.pairs")
merged.close()

Once you have the merged file, you can run the filtering on it.

biozzq · 2021-01-22T07:43:15Z

Dear @kaukrise

Thank you. I tried the -tmp option, but it still runs slowly. I think I can wait for the updated version, which may be more than 10x faster. Meanwhile, as I want to generate the input for CHESS, do you have any other suggestions for speeding up the analysis? I can convert to a cool file, but I think CHESS will run slowly with a cool file as input.

Best wishes,
Zheng zhuqing

kaukrise · 2021-01-22T07:51:07Z

The dev version won't affect HiC-Pro import, unfortunately.

For CHESS, you could create a Juicer file. I think they also support import from pairs files.

biozzq · 2021-01-22T11:59:04Z

Dear @kaukrise

Yes, a Juicer file is also OK. The CHESS publication used fanc and gives detailed normalisation and filtering instructions, so it is clearer for me to use fanc. If I use a Juicer hic file as input for CHESS, I cannot find the corresponding filtering options, such as balancing by chromosome and masking low-interaction bins.

Best wishes,
Zheng zhuqing

kaukrise · 2021-01-22T12:11:03Z

Juicer automatically balances by chromosome. Bins with 0 interactions are automatically masked by FAN-C, also in Juicer files. I think using a Juicer file directly should be quite okay for your needs.

If you really need filtering for sparsely populated bins and want to keep working with FAN-C files, you could:

1. Create a Juicer Hi-C file from your HiC-Pro pairs
2. Convert the file using fanc hic --deepcopy <juicer.hic@resolution> <fanc.hic>
3. Filter sparse bins and re-normalise using fanc hic -r 0.1 -n --norm-method ICE <fanc.hic>

But since you appear to be working with quite large matrices, the conversion from Juicer to FAN-C will take some time again.
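The conversion and filtering steps above can be sketched as a small helper that assembles the two fanc hic invocations. It only builds and prints the commands, using exactly the flags quoted in the steps. The function name, the parameter names, and the example file names and resolution string are hypothetical placeholders:

```python
import shlex


def build_juicer_to_fanc_commands(juicer_hic, resolution, fanc_hic, r_value=0.1):
    """Return the two fanc hic commands from the steps above as argument lists.

    r_value is passed to the -r flag unchanged; all names here are illustrative.
    """
    # Step 2: convert the Juicer file at the chosen resolution to a FAN-C file.
    convert = ["fanc", "hic", "--deepcopy", f"{juicer_hic}@{resolution}", fanc_hic]
    # Step 3: filter sparse bins and re-normalise with ICE.
    filter_and_normalise = [
        "fanc", "hic", "-r", str(r_value), "-n", "--norm-method", "ICE", fanc_hic,
    ]
    return [convert, filter_and_normalise]


# Placeholder inputs; replace with real paths and the resolution you need.
for command in build_juicer_to_fanc_commands("sample.hic", "<resolution>", "sample_fanc.hic"):
    print(shlex.join(command))
```

The argument lists could be handed to subprocess.run(command, check=True) once the placeholder values are replaced.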

biozzq · 2021-01-22T12:25:43Z

Thank you for your quick reply. I will give it a try.

biozzq mentioned this issue Jan 22, 2021

Prepare the normalized hic input for CHESS vaquerizaslab/chess#35

Open

kaukrise closed this as completed Mar 23, 2021
