Input Data in Training #6

Open
ekokrek opened this issue Jul 22, 2020 · 0 comments

ekokrek commented Jul 22, 2020

Hello Wang,

I tried to predict chr12 with the pretrained models (both HindIII_40000 and model_12000).
When I compared the enhanced matrix with the original and the down-sampled (1/16) matrices, it looked much more like the down-sampled matrix than the original, so the enhancement was largely unsuccessful.

The down-sampled input was generated as follows: I downloaded GSE63525_GM12878_primary_intrachromosomal_contact_matrices.tar.gz (HIC001-HIC018) and down-sampled the raw contacts at 10kb resolution by a ratio of 1/16. In terms of sequencing depth, HIC001-HIC018 amount to ~4B reads and the down-sampled version to ~220M reads, so I assume your model(s) should be appropriate for this situation.

Technically, what I down-sample is the contact pairs, not the reads, but would that affect the result that much?
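
A minimal sketch of pair-level down-sampling as described above, assuming a dense symmetric matrix of raw integer contact counts for one chromosome. `downsample_contacts` is a hypothetical helper, not part of this repository, and binomial thinning is only one common way to do this, not necessarily the exact procedure used here. Each contact is kept independently with probability 1/16:

```python
import numpy as np

def downsample_contacts(counts, ratio=1 / 16, seed=0):
    """Thin a symmetric raw contact matrix by keeping each contact with probability `ratio`.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    # Sample only the upper triangle (diagonal included) so each pair is drawn once.
    upper = np.triu(counts)
    thinned = rng.binomial(upper, ratio)
    # Mirror the strict upper triangle to restore symmetry.
    return thinned + np.triu(thinned, k=1).T
```

Sampling the upper triangle and mirroring it keeps the result symmetric; sampling the full matrix would draw each off-diagonal pair twice with different outcomes.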

Could you tell me exactly which GM12878 data was used to train these pretrained models?
