Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to makr trans interaction contacts prediction? #40

Closed
linfanxiao opened this issue Apr 22, 2024 · 5 comments
Closed

How to makr trans interaction contacts prediction? #40

linfanxiao opened this issue Apr 22, 2024 · 5 comments

Comments

@linfanxiao
Copy link

Dear developers,

Thanks for your work to develop the state-of-art tool.I have tried the example and my own data for intra chromosomal contacts, looking OK. However, how to predict intra chromosomal like you paper Fig. 4. If there are some example or tutorial to make the inter chromosomal contacts?

Thanks !
https://media.springernature.com/lw685/springer-static/image/art%3A10.1038%2Fs41587-022-01612-8/MediaObjects/41587_2022_1612_Fig4_HTML.png?as=webp

@tanjimin
Copy link
Owner

Hi I think those are examples of translocation event which mostly happen in cancer settings. If you don't know then it might not be what you want. C.Origami currently don't have the capacity to predict interactions beyond 2Mb regions.

If you do want to predict results of translocation, you will extract sequence and ATAC/CTCF bigwig tracks and append the two junction points from different chromosomes. Then you can run prediction on the fused inputs.

@linfanxiao
Copy link
Author

Thanks, Jimin,
I will try it as your suggestion!

@linfanxiao
Copy link
Author

Hi, Jimin,

I have a question regarding the comparison of real and predicted HiC data. To resize the image, I used the preserve_range=True parameter. Is this the correct approach? The reason I used it is that the values in the resized image were very small, around 2e-18, whereas my original data ranged from 1 to 50. After setting preserve_range=True, the resized image now has a similar range to the original data.

This is my code below

cooler dump cell0041GLUT1.mcool::/resolutions/10000 --fill-lower -t pixels -r chr2:500,000-2,597,152 -b --no-balance --join --one-based-starts| awk '{print $2-1, $5-1, $7}' > test/cell0041GLUT1.txt

Rscript -e 'library(reshape2); df <- data.table::fread("test/cell0041GLUT1.txt"); df1 <- dcast(df, V1 ~ V2, value.var = "V3", fill = 0);df1=df1[,-1] ;write.table(df1, "test/cell0041GLUT1_wide.txt", sep = "\t",col.names=F,row.names = F, quote = F)'

The next part is processed in Python,plot_utils.py is the developer's script

import numpy as np
import os
import pandas as pd
from skimage.transform import resize
from plot_utils import MatrixPlot

# Open the text file in read mode
file_path = 'cell0041GLUT1_wide.txt'  # Replace with the actual file path
with open(file_path, 'r') as file:
    # Read the contents of the file
    lines = file.readlines()

# Remove any leading or trailing whitespace from each line
lines = [line.strip() for line in lines]

# Convert the lines to a nested list of integers
matrix = [[int(num) for num in line.split()] for line in lines]

# Convert the nested list to a numpy array
mat = np.array(matrix)

# Print the contents of the file
print(mat)
# Resize the matrix to the desired resolution
resized_mat = resize(mat, (256, 256), anti_aliasing=True,preserve_range=True)
print(resized_mat)
bin_size = 8192

# Generate interaction lines and write to file
with open('realcell0041GLUT1_chr2_500000-2597152.txt', 'w') as f:
    for i in range(256):
        for j in range(i, 256):
            start_coord = (i) * bin_size + 500000
            reads = resized_mat[i, j]
            end_coord = (j + 1) * bin_size + 500000
            interaction_line = f"{start_coord} {end_coord} {reads:.8f}"
            f.write(interaction_line + '\n')

# Instantiate MatrixPlot object
image=mat = np.log(resized_mat + 1)
output_path = "./"
prefix = "matrix_plot"
celltype = "realDGLUT1"
chr_name = "chr2"
start_pos = 500000

matrix_plot = MatrixPlot(output_path, image, prefix, celltype, chr_name, start_pos)

# Plot and save the matrix
matrix_plot.plot()

@tanjimin
Copy link
Owner

Hi @linfanxiao yes you should use preserve_range because skimage is usually used for image processing which is uint8 (0-255). Here these are float tensors.

Also when you compare real and predicted Hi-C data keep in mind that the intensity of predicted matrix is always in the log space (range around 0-10). To get the read counts (range 0 - around 2000), you need to do the exp(x) - 1 transformation.

@linfanxiao
Copy link
Author

Thanks, Jimin!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants