convert cool to FAN-C format #30

Closed
BenxiaHu opened this issue Nov 16, 2020 · 18 comments

@BenxiaHu

BenxiaHu commented Nov 16, 2020

Hi,
I loaded a Hi-C matrix in cool format with patient_hic = fanc.load(case) and then ran patient_hic[region_string, region_string].data to extract the region of interest.
However, I did not get any result.
Do you know what is going on?

I also ran `fanc from-cooler` to convert the cool file to FAN-C format, but I got this error:
2020-11-16 19:39:04,974 INFO FAN-C version: 0.9.7
Traceback (most recent call last):
  File "/nas/longleaf/home/anaconda2/envs/py38/bin/fanc", line 127, in <module>
    Fanc()
  File "/nas/longleaf/home/anaconda2/envs/py38/bin/fanc", line 93, in __init__
    command([sys.argv[0]] + sys.argv[option_ix:], log_level=log_level, verbosity=verbosity)
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/commands/fanc_commands.py", line 1850, in from_cooler
    cool.deepcopy(fanc.Hic, file_name=output_file, mode='w')
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/matrix.py", line 1324, in deepcopy
    copy.add_regions(self.regions(lazy=True))
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/regions.py", line 664, in add_regions
    self.add_region(region, *args, **kwargs)
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/genomic_regions/regions.py", line 1199, in add_region
    return self._add_region(region.copy(), *args, **kwargs)
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/genomic_regions/regions.py", line 505, in copy
    d = {attribute: getattr(self, attribute) for attribute in self.attributes}
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/genomic_regions/regions.py", line 505, in <dictcomp>
    d = {attribute: getattr(self, attribute) for attribute in self.attributes}
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/compatibility/cooler.py", line 221, in __getattr__
    return getattr(self._series, item)
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/pandas/core/generic.py", line 5139, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'weight'
Closing remaining open files:NeuNpos_10000.fanc...done
Exception ignored in: <function Node.__del__ at 0x7ff6cc899a60>
Traceback (most recent call last):
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/tables/node.py", line 319, in __del__
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/tables/table.py", line 2961, in _f_close
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/tables/table.py", line 2896, in flush
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/tables/table.py", line 626, in autoindex
  File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/tables/file.py", line 1590, in _get_node
AttributeError: 'File' object has no attribute '_node_manager'

best,

@kaukrise
Collaborator

Hi,

regarding your first issue, it looks like you are trying to obtain a submatrix from a region string. If that is the case, you are not using the correct syntax. Please have a look at the FAN-C documentation, which explains how to use the .matrix method:

https://fan-c.readthedocs.io/en/latest/api/interfaces/matrix_interface.html
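
As a minimal sketch of that syntax, the snippet below builds the kind of key that is passed to `.matrix` later in this thread: a tuple of two region strings. `region_key` and `submatrix` are hypothetical helper names and not part of FAN-C; the only FAN-C call assumed is `hic.matrix(key)`, as used elsewhere in this thread.

```python
def region_key(chromosome, start=None, end=None):
    """Build a (row_region, col_region) key for a square submatrix.

    Hypothetical helper: it only formats strings such as 'chr2:100000-200000'.
    """
    if start is None or end is None:
        region = str(chromosome)  # whole chromosome, e.g. 'chr22'
    else:
        region = "{}:{}-{}".format(chromosome, start, end)
    return (region, region)


def submatrix(hic, chromosome, start=None, end=None):
    """Call hic.matrix(key) on any object that exposes a .matrix method.

    Assumes `hic` behaves like the object returned by fanc.load(...).
    """
    return hic.matrix(region_key(chromosome, start, end))


if __name__ == "__main__":
    print(region_key("chr2", 100000, 200000))
```

With a loaded matrix, `submatrix(hic, "chr2", 100000, 200000)` would then be equivalent to calling `hic.matrix(...)` with the tuple printed above.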

Regarding your second issue, I may have a fix for you. Can you please download the following file and install it with

pip install fanc-0.9.8.tar.gz

fanc-0.9.8.tar.gz

Cheers,
Kai

@BenxiaHu
Author

BenxiaHu commented Nov 17, 2020

Thanks.
I followed this notebook: https://github.com/vaquerizaslab/chess/blob/master/examples/dlbcl/example_analysis.ipynb

patient_hic = fanc.load(wdir + "ukm_patient_fixed_le_25kb_chr2.hic")
control_hic = fanc.load(wdir + "ukm_control_fixed_le_25kb_chr2.hic")

reg = 1448
window_start, window_end = regions.loc[reg][1:3]
region_string = "chr2:{}-{}".format(window_start, window_end)
patient_region_sub = patient_hic[region_string, region_string].data
control_region_sub = control_hic[region_string, region_string].data

I just installed fanc-0.9.8.
I then loaded the file with hic = fanc.load("case.cool") and checked its type:
>>> isinstance(hic, fanc.matrix.RegionMatrixContainer)
True

However, when I ran m = hic.matrix(('chr22','chr22')), I still could not get any results.
best,

@kaukrise
Collaborator

  • Is there any warning or output whatsoever?
  • What is the value of m after you run the matrix command?
  • Can you try plotting the case.cool file?
    fancplot chr22:1-40mb -p square case.cool -r
    
    Depending on the resolution of the Hi-C file you may have to plot a smaller region.

@BenxiaHu
Author

BenxiaHu commented Nov 17, 2020

There are neither warnings nor any output.

hic.matrix
<bound method RegionMatrixContainer.matrix of <Cooler case_10000.cool::/">>

Running fancplot chr22:1-200kb -p square case.cool -r produced this error:
fancplot -p square: error: the following arguments are required: hic

@BenxiaHu
Author

I also started fanc from-cooler case.cool case.fanc about 1 h ago, and it is still running. Is that normal? If so, is there a way to speed up the process?

@kaukrise
Copy link
Collaborator

Thanks, but I meant what the value of m is. If there is no warning or other message, m must have a value - either None or some kind of np.ndarray.
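
To illustrate the difference between the method and its return value, here is a small sketch for inspecting `m`. `describe_value` is a hypothetical helper, not a FAN-C function; it assumes only that array-like results expose a `.shape` attribute, as `np.ndarray` does.

```python
def describe_value(m):
    """Return a short description of whatever hic.matrix(...) returned."""
    if m is None:
        return "None"
    if callable(m):
        # e.g. `hic.matrix` typed without parentheses is a bound method
        return "callable (did you forget the parentheses and arguments?)"
    shape = getattr(m, "shape", None)  # np.ndarray-like objects expose .shape
    if shape is not None:
        return "{} with shape {}".format(type(m).__name__, tuple(shape))
    return type(m).__name__
```

Calling `print(describe_value(m))` right after `m = hic.matrix(...)` shows whether `m` is `None`, an array, or something else.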

Your plotting error is highly unusual. The error suggests that you did not provide the case.cool file as in your command. Unless you made some error typing the command out, my only suggestion is to try plotting from a Python console:

import fanc
import fanc.plotting as fancplot

hic = fanc.load("case.cool")
p = fancplot.SquareMatrixPlot(hic)
p.plot("chr18:3mb-70mb")
p.show()

Since I don't know the resolution of your Hi-C file, I don't know how long the conversion will take. But Hi-C data is massive, and it may take a while. I don't have a way of speeding up your conversion other than subsetting the matrix; you just have to be patient. There should be a progress bar with an estimate of how long it will take. Alternatively, you can use Juicer to convert your matrix, which is also compatible with FAN-C and CHESS and might run faster.

@BenxiaHu
Author

BenxiaHu commented Nov 17, 2020

thanks.

hic = fanc.load("case.cool")
p = fancplot.SquareMatrixPlot(hic)
p
<fanc.plotting.hic_plotter.SquareMatrixPlot object at 0x7f44513cd550>
p.plot("chr1:10kb-20kb")
This command still does not produce any output.

The resolution of my Hi-C contact matrix is 10kb.

I used the ICE-normalized contact matrix generated by HiC-Pro to perform the CHESS analysis. Juicer currently does not support ICE normalization, so I am trying to convert the ICE-normalized contact matrix to cool, then convert cool to FAN-C, and finally FAN-C to .hic using fanc.

@kaukrise
Collaborator

You can convert directly from HiC-Pro to FAN-C using fanc from-txt.

Regarding the lack of output: did you also run p.show()? If you don't have an interactive graphics environment, you can also run p.save("output_file.png").
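
The choice between `p.show()` and `p.save(...)` can be sketched as a small helper. `show_or_save` is a hypothetical name; it assumes only that the plot object exposes `.show()` and `.save(path)` as described above, and it uses the `DISPLAY` environment variable as a rough heuristic for an interactive X11 session on Unix.

```python
import os


def show_or_save(plot, output_file="output_file.png", environ=None):
    """Display the plot if a display seems available, otherwise save it.

    Assumes `plot` has .show() and .save(path). Checking DISPLAY is only a
    heuristic and will not detect every graphical environment.
    """
    environ = os.environ if environ is None else environ
    if environ.get("DISPLAY"):
        plot.show()
        return "shown"
    plot.save(output_file)
    return "saved to {}".format(output_file)
```

On a remote cluster session without X forwarding, this falls through to `plot.save(...)`, so the image can be copied and viewed elsewhere.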

@kaukrise
Collaborator

Also, the Juicer KR balancing is equivalent to ICE normalisation.

@BenxiaHu
Author

> You can convert directly from HiC-Pro to FAN-C using fanc from-txt.
>
> Regarding the lack of output. Did you also run p.show()? If you don't have an interactive graphics environment, you can also run p.save("output_file.png").

What I meant was that p.plot("chr1:10kb-20kb") is still running. I do not know what is going on.

@kaukrise
Collaborator

kaukrise commented Nov 17, 2020

I admit that I don't understand why it would still be running. According to your code snippet you are plotting a 10kb region, which is exactly one pixel in your 10kb resolution matrix (and I think plotting a single pixel does not make much sense). It should finish practically instantaneously. You could try working through the FAN-C tutorial to see if you encounter the same issues, or if it is specific to your input data.
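
The "one pixel" arithmetic can be checked with a short sketch. `parse_position` and `bins_spanned` are hypothetical helpers written for this illustration; they are not FAN-C's region parser, and they simplify by treating the span as `end - start` and ignoring how the region aligns with bin boundaries.

```python
import math
import re

_UNITS = {"": 1, "kb": 1_000, "mb": 1_000_000}


def parse_position(text):
    """Convert '10kb', '3mb' or '1500' to an integer number of base pairs."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(kb|mb)?", text.strip().lower())
    if match is None:
        raise ValueError("Cannot parse position: {!r}".format(text))
    number, unit = match.groups()
    return int(round(float(number) * _UNITS[unit or ""]))


def bins_spanned(region, resolution):
    """Approximate number of bins covered by a 'chrom:start-end' string.

    Simplification: uses end - start and ignores bin alignment.
    """
    _, interval = region.split(":")
    start, end = (parse_position(part) for part in interval.split("-"))
    return max(1, math.ceil((end - start) / resolution))


if __name__ == "__main__":
    print(bins_spanned("chr1:10kb-20kb", 10_000))   # a single bin per axis
    print(bins_spanned("chr18:3mb-70mb", 10_000))   # thousands of bins per axis
```

Under these assumptions, a region of a few megabases gives a few hundred bins per axis at 10kb resolution, which is a more useful size for a test plot.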

Your problems are quite unusual, and without having access to your original file I don't think I can debug this further. If you are willing to share the file, you could provide it to me. Otherwise I would recommend that you contact the Vaquerizas lab for a possible collaboration, and team up with one of their Hi-C (and FAN-C) experts.

@BenxiaHu
Author

BenxiaHu commented Nov 17, 2020

Thanks. I have now tried to load the .hic file generated by Juicer, but I get this error:
hic.matrix(('chr22','chr22'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/matrix.py", line 1018, in matrix
row_regions, col_regions, matrix_entries = self.regions_and_matrix_entries(key,
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/matrix.py", line 958, in regions_and_matrix_entries
row_regions, col_regions, edges_iter = self.regions_and_edges(key, *args, **kwargs)
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/matrix.py", line 864, in regions_and_edges
row_regions = list(row_regions)
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/compatibility/juicer.py", line 776, in _region_subset
subset_ix, subset_start = self._region_start(region)
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/compatibility/juicer.py", line 729, in _region_start
offset_ix = self._chromosome_ix_offset(region.chromosome)
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/compatibility/juicer.py", line 712, in _chromosome_ix_offset
raise ValueError("Chromosome {} not in matrix.".format(target_chromosome))
ValueError: Chromosome chr22 not in matrix.

@kaukrise
Collaborator

Juicer drops the chr prefix. Try hic.matrix(('22','22')). But be aware that you are again trying to load the entire chromosome 22 matrix at 10kb resolution, which is roughly 25 million pixels. Maybe try something smaller first, such as hic.matrix(('22:10mb-12mb','22:10mb-12mb')).
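
The pixel estimate can be reproduced with a quick calculation. This is only a sketch: `n_pixels` is a hypothetical helper, and the chromosome 22 length of about 51 Mb is an approximation assumed here (the exact value depends on the genome assembly).

```python
import math


def n_pixels(region_length_bp, resolution_bp):
    """Number of pixels in a square matrix covering region_length_bp."""
    n_bins = math.ceil(region_length_bp / resolution_bp)
    return n_bins * n_bins


CHR22_LENGTH_BP = 51_000_000  # assumption: approximate length only

whole_chromosome = n_pixels(CHR22_LENGTH_BP, 10_000)  # 5100 bins per axis
small_window = n_pixels(2_000_000, 10_000)            # 22:10mb-12mb, 200 bins per axis

print(whole_chromosome, small_window)  # 26010000 40000
```

The 2 Mb window is several hundred times smaller than the full chromosome matrix, which is why it makes a better first test.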

If that does not work, list the chromosomes present in the matrix with hic.chromosomes().
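
A possible way to guard against the prefix mismatch is to resolve the name against the list of chromosomes before querying. `resolve_chromosome` is a hypothetical helper, not part of FAN-C; it assumes only that `hic.chromosomes()` returns a list of name strings.

```python
def resolve_chromosome(name, available):
    """Return the spelling of `name` used in `available`, or raise ValueError.

    Tries the name as given, then with a leading 'chr' removed or added.
    """
    candidates = [name]
    if name.startswith("chr"):
        candidates.append(name[3:])
    else:
        candidates.append("chr" + name)
    for candidate in candidates:
        if candidate in available:
            return candidate
    raise ValueError(
        "Chromosome {} not found; available: {}".format(name, list(available))
    )


if __name__ == "__main__":
    print(resolve_chromosome("chr22", ["1", "2", "22", "X"]))  # 22
```

The resolved name could then be used to build the region strings, for example `resolve_chromosome("chr22", hic.chromosomes())`.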

@BenxiaHu
Author

hic.chromosomes()
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y', 'MT']
hic.matrix(('22:10mb-12mb','22:10mb-12mb'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/matrix.py", line 1018, in matrix
row_regions, col_regions, matrix_entries = self.regions_and_matrix_entries(key,
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/matrix.py", line 958, in regions_and_matrix_entries
row_regions, col_regions, edges_iter = self.regions_and_edges(key, *args, **kwargs)
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/matrix.py", line 864, in regions_and_edges
row_regions = list(row_regions)
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/compatibility/juicer.py", line 779, in _region_subset
norm = self.normalisation_vector(region.chromosome)
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/compatibility/juicer.py", line 655, in normalisation_vector
JuicerHic._skip_to_normalisation_vectors(req)
File "/nas/longleaf/home/anaconda2/envs/py38/lib/python3.8/site-packages/fanc/compatibility/juicer.py", line 513, in _skip_to_normalisation_vectors
n_vectors = struct.unpack('<i', req.read(4))[0]
struct.error: unpack requires a buffer of 4 bytes

@kaukrise
Collaborator

Sorry, but something about your original data does not look right. Can you successfully plot it using any available tool (Juicebox? HiGlass?) to ensure the matrix looks okay? Does Juicer handle the file alright?

As I said above, I am at a loss as to what to try next without access to your data.
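
For context on the `struct.error` in the traceback: `struct.unpack('<i', ...)` needs exactly four bytes, so the error means the preceding `read(4)` returned fewer than that, for instance because the read position was at or near the end of the data. The sketch below reproduces the pattern with in-memory streams; `read_int32` is a hypothetical helper and this is not FAN-C's code.

```python
import io
import struct


def read_int32(stream):
    """Read one little-endian 32-bit integer, with a clearer error on short reads."""
    raw = stream.read(4)
    if len(raw) < 4:
        raise EOFError("expected 4 bytes, got {}".format(len(raw)))
    return struct.unpack("<i", raw)[0]


complete = io.BytesIO(struct.pack("<i", 42))
truncated = io.BytesIO(b"\x2a\x00")  # only 2 of the 4 bytes are present

print(read_int32(complete))  # 42
try:
    struct.unpack("<i", truncated.read(4))  # same pattern as in the traceback
except struct.error as error:
    print("struct.error:", error)
```

This only shows how the exception arises; it does not establish why the read came up short in this particular case.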

@kaukrise
Collaborator

You could also download one of the Juicer files from the 4D Nucleome Portal to test your FAN-C installation on: https://data.4dnucleome.org/files-processed/4DNFIOTPSS3L/, then go to processed files and download "contact matrix (hic)". Plot it with

fancplot 2L:1-10mb -p square ~/tmp/4DNFIOTPSS3L.hic@50kb -r

At least then we can exclude that it is a FAN-C installation issue.

@BenxiaHu
Author

I still got the same error after running fancplot 2L:1-10mb -p square ~/tmp/4DNFIOTPSS3L.hic@50kb -r:
n_vectors = struct.unpack('<i', req.read(4))[0]
struct.error: unpack requires a buffer of 4 bytes

@kaukrise
Collaborator

If you get the same error with the downloaded dataset, then it is an installation issue.

What operating system are you using? We only support Unix-based systems. Can you try the same command in a clean virtual environment? I just double-checked that it works without issues with the latest version on my Unix machine.
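
To answer the operating system and version questions in one go, a report along these lines could be pasted into the issue. This is a sketch using only the Python standard library (`importlib.metadata` requires Python 3.8 or later); the package names in the default list are assumptions taken from the import paths in the tracebacks above and may differ from the installed distribution names.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("fanc", "genomic_regions", "tables", "pandas")):
    """Collect OS, Python version and selected package versions in a dict."""
    report = {
        "system": platform.system(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        report[name] = package_version(name)  # None means not installed
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```

Running the same script inside the clean virtual environment and in the original one makes differences between the two installations easy to spot.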
