hicrep

Python implementation of HiCRep, a stratum-adjusted correlation coefficient (SCC) for Hi-C data, with support for Cooler sparse contact matrices.

The algorithm is published in:

HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Tao Yang, Feipeng Zhang, Galip Gürkan Yardımcı, Fan Song, Ross C. Hardison, William Stafford Noble, Feng Yue, and Qunhua Li. Genome Res. 2017 Nov;27(11):1939-1949. doi: 10.1101/gr.220640.117.

This implementation takes a pair of Hi-C data sets in Cooler format (.cool for a single bin size or .mcool for multiple bin sizes) and computes the HiCRep SCC score between the two data sets for each chromosome. A guide to converting a file of read pairs into the appropriate .cool or .mcool format is available in the Cooler documentation.

The HiCRep SCC computed by this implementation is consistent with the original R implementation (https://github.com/MonkeyLB/hicrep/), and it is more than 10x faster than the R version.

Usage

To use as a Python module, install the package

pip install hicrep

and then use the util function readMcool

from hicrep.utils import readMcool

to read a pair of mcool files and specify the bin size to compute SCC with:

fmcool1 = "mydata1.mcool"
fmcool2 = "mydata2.mcool"
binSize = 100000
cool1, binSize1 = readMcool(fmcool1, binSize)
cool2, binSize2 = readMcool(fmcool2, binSize)

or a pair of .cool files with built-in bin size:

fcool1 = "mydata1.cool"
fcool2 = "mydata2.cool"
cool1, binSize1 = readMcool(fcool1, -1)
cool2, binSize2 = readMcool(fcool2, -1)
# binSize1 and binSize2 will be set to the bin size built in the cool file
binSize = binSize1

then define the parameters for computing HiCRep SCC:

import numpy as np
from hicrep import hicrepSCC

# smoothing window half-size
h = 1

# maximal genomic distance to include in the calculation
dBPMax = 500000

# whether to perform down-sampling or not
# if set to True, it will bootstrap the data set with larger contact counts to
# the same number of contacts as in the other data set; otherwise, the contact
# matrices will be normalized by their respective total number of contacts
bDownSample = False

# compute the SCC scores
# this returns one SCC score for each chromosome available in the data set,
# listed in the same order as the chromosomes in the input Cooler files
scc = hicrepSCC(cool1, cool2, h, dBPMax, bDownSample)

# optionally, compute SCC scores for a subset of chromosomes only
sccSub = hicrepSCC(cool1, cool2, h, dBPMax, bDownSample, np.array(['myChr1', 'myOtherChr'], dtype=str))
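Because the scores come back as a plain sequence in chromosome order, it can help to attach the chromosome names explicitly. The following is a minimal sketch, not part of the hicrep package: label_scc_scores is a hypothetical helper, and it assumes that the returned scores follow the order of cool1.chromnames (the chromosome name list of a Cooler object), as described above. The length check guards against cases where that assumption does not hold. The values in the example are placeholders, not real results.

```python
def label_scc_scores(chrom_names, scores):
    """Pair each chromosome name with its SCC score, preserving order.

    Hypothetical helper for illustration; not part of hicrep.
    """
    chrom_names = list(chrom_names)
    scores = [float(s) for s in scores]
    if len(chrom_names) != len(scores):
        raise ValueError(
            f"expected {len(chrom_names)} scores, got {len(scores)}"
        )
    return dict(zip(chrom_names, scores))


# Placeholder names and values for illustration only. With real data one
# would pass cool1.chromnames and the scores returned by hicrepSCC.
labeled = label_scc_scores(["myChr1", "myOtherChr"], [0.5, 0.25])
for name, score in labeled.items():
    print(f"{name}\t{score:.3f}")
```

With real inputs the call would look like label_scc_scores(cool1.chromnames, scc); for a subset run, pass the same chromosome names that were given to hicrepSCC.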

To use as a command line tool, install this package by

pip install hicrep

then run

hicrep mydata1.mcool mydata2.mcool outputSCC.txt --binSize 100000 --h 1 --dBPMax 500000

when passing in an .mcool file with multiple binsizes or

hicrep mydata1.cool mydata2.cool outputSCC.txt --h 1 --dBPMax 500000

when passing in a .cool file with a single built-in bin size. The output file outputSCC.txt contains one SCC score for each chromosome in the input, listed in the same order as the chromosomes in the input Cooler files. To see the list of command line options:

hicrep -h

You can optionally compute SCC scores for a subset of chromosomes using

hicrep mydata1.cool mydata2.cool outputSCC_Subset.txt --h 1 --dBPMax 500000 --chrNames 'myChr1' 'myOtherChr'
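The output file written by the command line tool can be read back into Python for further analysis. The sketch below is an assumption about the file layout rather than a documented format: it expects one numeric score per line and skips blank lines and lines starting with "#". read_scc_scores is a hypothetical helper, not part of hicrep.

```python
def read_scc_scores(path):
    """Read SCC scores from a text file with one score per line.

    Assumes (not confirmed by the hicrep documentation) that blank lines
    and lines starting with '#' carry no scores and can be skipped.
    """
    scores = []
    with open(path) as handle:
        for line in handle:
            text = line.strip()
            if not text or text.startswith("#"):
                continue
            scores.append(float(text))
    return scores
```

For example, read_scc_scores("outputSCC.txt") would return the scores as a list of floats, in the same order as the chromosomes in the input Cooler files.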
