Try to understand the analysis #26

Closed
kainblue opened this issue Feb 6, 2019 · 3 comments

@kainblue

kainblue commented Feb 6, 2019

Hi Longzhi,

I am very interested in learning to construct 3D models from Hi-C data. After a long search, I recently found your fantastic work. I have downloaded the fastq files for GM12878 cell 1 from https://www.ncbi.nlm.nih.gov/sra/SRX4133191 and am now trying to follow the instructions in this repo. However, I have problems with the later imputation steps:
"con_to_ncc.sh impute.con.gz
nuc_dynamics.sh impute.ncc 0.1
dip-c impute3 -3 impute.3dg clean.con.gz | gzip -c > impute3.round1.con.gz
dip-c clean3 -c impute.con.gz impute.3dg > impute.clean.3dg

con_to_ncc.sh impute3.round1.con.gz
nuc_dynamics.sh impute3.round1.ncc 0.1
dip-c impute3 -3 impute3.round1.3dg clean.con.gz | gzip -c > impute3.round2.con.gz
dip-c clean3 -c impute3.round1.con.gz impute3.round1.3dg > impute3.round1.clean.3dg

con_to_ncc.sh impute3.round2.con.gz
nuc_dynamics.sh impute3.round2.ncc 0.1
dip-c impute3 -3 impute3.round2.3dg clean.con.gz | gzip -c > impute3.round3.con.gz
dip-c clean3 -c impute3.round2.con.gz impute3.round2.3dg > impute3.round2.clean.3dg
...
"
It looks like the cleaned 3dg file produced by clean3 in each round is not used in the next round. I started to question this because of an error I encountered:
dip-c impute3 -3 GM12878_cell1_dipc_phased.clean.impute.clean.3dg GM12878_cell1_dipc_phased.clean.con.gz | gzip -c > GM12878_cell1_dipc_phased.impute3.round1.con.gz
[M::impute3] read a 3D structure with 55404 particles at 100000 bp resolution
[M::impute3] read 612536 contacts (82.47% intra-chromosomal, 8.94% legs phased)
[M::classes] imputed haplotypes for chromosome pair (13,17): 392 contacts (85.2% phased)
[M::classes] imputed haplotypes for chromosome pair (5,8): 1679 contacts (97.74% phased)
[M::classes] imputed haplotypes for chromosome pair (16,17): 216 contacts (66.2% phased)
[M::classes] imputed haplotypes for chromosome pair (1,20): 1078 contacts (92.76% phased)
Traceback (most recent call last):
  File "dip-c", line 130, in <module>
    main()
  File "dip-c", line 63, in main
    return_value = impute3.impute3(sys.argv[1:])
  File "impute3.py", line 109, in impute3
    con_data.impute_from_g3d_data(g3d_data, max_impute3_distance, max_impute3_ratio, max_impute3_ratio * g3d_resolution, is_male, par_data, vio_file)
  File "classes.py", line 907, in impute_from_g3d_data
    self.con_lists[ref_name_tuple].impute_from_g3d_data(g3d_data, max_impute3_distance, max_impute3_ratio, min_impute3_separation, is_male, par_data, vio_file)
  File "classes.py", line 757, in impute_from_g3d_data
    con.impute_from_g3d_data(g3d_data, max_impute3_distance, max_impute3_ratio, min_impute3_separation, is_male, par_data, vio_file)
  File "classes.py", line 544, in impute_from_g3d_data
    impute3_ratio = impute3_distance / con_distance_tuples[1][1]
TypeError: unsupported operand type(s) for /: 'NoneType' and 'NoneType'
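
For readers hitting the same traceback: the last frame divides two values that are both `None`. The sketch below only illustrates how such an error can arise. It assumes (this is not confirmed from the dip-c source) that a distance lookup returns `None` when a contact leg does not map to any particle in the 3D structure. `lookup_distance` and `safe_ratio` are hypothetical names, not dip-c functions.

```python
def lookup_distance(structure, locus_a, locus_b):
    """Return the Euclidean distance between two loci, or None if either is absent.

    Hypothetical helper: `structure` maps (chromosome, position) to (x, y, z).
    """
    if locus_a not in structure or locus_b not in structure:
        return None
    (x1, y1, z1), (x2, y2, z2) = structure[locus_a], structure[locus_b]
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2) ** 0.5


def safe_ratio(best, second_best):
    """Return best / second_best, or None when the ratio is undefined."""
    if best is None or second_best is None or second_best == 0:
        return None
    return best / second_best


# Two particles taken (rounded) from the 3dg head shown in this issue.
structure = {
    ("1(mat)", 1200000): (7.96, -12.01, 6.68),
    ("1(mat)", 1300000): (8.89, -11.45, 6.61),
}

# A locus with no particle in the structure yields None ...
missing = lookup_distance(structure, ("1(mat)", 1200000), ("1(mat)", 900000))

# ... and dividing None by None raises the TypeError seen in the traceback.
try:
    missing / missing
    error_message = ""
except TypeError as err:
    error_message = str(err)

print(error_message)
```

A guard such as `safe_ratio` would hide the symptom only; in this thread the underlying cause turned out to be in read preprocessing, so the inputs are the first thing to check.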

Here are the first lines of the two input files:
head GM12878_cell1_dipc_phased.clean.impute.clean.3dg
1(mat) 1200000 7.95772097608 -12.0072914165 6.67592442321
1(mat) 1300000 8.89210987528 -11.4486456224 6.61131843187
1(mat) 1400000 8.8277193141 -10.3798272863 6.83290065793
1(mat) 1500000 8.10570766598 -9.67144265436 6.35097905003
1(mat) 1600000 7.99275487247 -8.53433974384 6.52683266786
1(mat) 1700000 6.70429668241 -8.61794012705 5.86833067325
1(mat) 1800000 5.62631622929 -8.49098630055 5.17833888961
1(mat) 1900000 4.8879961287 -8.44522282731 3.98121528589
1(mat) 2000000 3.80732676666 -7.76977419875 3.35459947567
1(mat) 2100000 3.06260319638 -8.19929825445 4.23444940641
zcat GM12878_cell1_dipc_phased.clean.con.gz | head
1,756415,. 1,1095231,.
1,757502,. 1,1218674,.
1,815689,. 1,1186165,.
1,818341,. 1,862101,.
1,830604,. 1,835996,.
1,839037,. 1,858631,.
1,848406,. 1,850417,.
1,858704,. 1,861316,.
1,861508,. 1,862932,.
1,918117,1 1,1231475,.
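
A quick way to sanity-check a contact file like the one above is to count how many legs carry a haplotype. This is a minimal sketch, assuming each line holds two legs of the form `chromosome,position,haplotype` separated by whitespace, with `.` meaning unphased. The function names are illustrative and not part of dip-c.

```python
def parse_leg(field):
    """Split 'chr,pos,hap' into (chromosome, position, haplotype or None)."""
    chrom, pos, hap = field.split(",")
    return chrom, int(pos), None if hap == "." else int(hap)


def parse_con_line(line):
    """Return the two legs of one contact line."""
    leg1, leg2 = line.split()[:2]
    return parse_leg(leg1), parse_leg(leg2)


def phased_leg_fraction(lines):
    """Fraction of legs whose haplotype field is not '.'."""
    legs = [leg for line in lines if line.strip() for leg in parse_con_line(line)]
    if not legs:
        return 0.0
    return sum(leg[2] is not None for leg in legs) / len(legs)


# The ten contacts shown in the head output above.
sample = """\
1,756415,. 1,1095231,.
1,757502,. 1,1218674,.
1,815689,. 1,1186165,.
1,818341,. 1,862101,.
1,830604,. 1,835996,.
1,839037,. 1,858631,.
1,848406,. 1,850417,.
1,858704,. 1,861316,.
1,861508,. 1,862932,.
1,918117,1 1,1231475,.
"""

fraction = phased_leg_fraction(sample.splitlines())
print(f"{fraction:.2%} legs phased")  # 1 of 20 legs is phased: 5.00%
```

On the full file, the same count should be comparable to the "legs phased" percentage that impute3 reports in its log.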

Here are the commands I used to construct the input files:
seqtk mergepe SRR7226683_1.fastq SRR7226683_2.fastq | lianti trim - | bwa mem -Cp bwa_index_rmchr/Homo_sapiens_assembly19.fasta - | samtools view -uS | sambamba sort -o GM12878_cell1_dipc_rmchr.bam /dev/stdin
dip-c seg -v snps/NA12878.txt.gz GM12878_cell1_dipc_rmchr.bam | gzip -c > GM12878_cell1_dipc_phased.seg.gz
dip-c con GM12878_cell1_dipc_phased.seg.gz | gzip -c > GM12878_cell1_dipc_phased.con.gz
dip-c dedup GM12878_cell1_dipc_phased.con.gz | gzip -c > GM12878_cell1_dipc_phased.dedup.gz
dip-c reg -p hf GM12878_cell1_dipc_phased.dedup.gz | gzip -c > GM12878_cell1_dipc_phased.reg.con.gz
dip-c clean GM12878_cell1_dipc_phased.dedup.gz | gzip -c > GM12878_cell1_dipc_phased.clean.con.gz
dip-c impute GM12878_cell1_dipc_phased.clean.con.gz | gzip -c > GM12878_cell1_dipc_phased.clean.impute.con.gz
con_to_ncc.sh GM12878_cell1_dipc_phased.clean.impute.con.gz
nuc_dynamics.sh GM12878_cell1_dipc_phased.clean.impute.ncc 0.1

Thanks a lot!
Looking forward to your help!

Bo Zhang

@tanlongzhi
Owner

Hi Bo,

You're right that the intermediate clean.3dg files were not used in any analysis. They just provided intermediate-resolution 3D structures you can look at while waiting for the final, high-resolution structures.
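
The data flow of the rounds quoted in the question can be sketched as below. This is only an illustration that rebuilds the command strings from the question; `impute3_round_commands` is a hypothetical helper, not part of Dip-C, and the file names follow the quoted example.

```python
def impute3_round_commands(round_index, clean_con="clean.con.gz"):
    """Return the four shell commands of one imputation round as strings.

    Round 1 starts from the 'impute' prefix; round k starts from the
    output of round k - 1.
    """
    prev = "impute" if round_index == 1 else f"impute3.round{round_index - 1}"
    nxt = f"impute3.round{round_index}"
    return [
        f"con_to_ncc.sh {prev}.con.gz",
        f"nuc_dynamics.sh {prev}.ncc 0.1",
        f"dip-c impute3 -3 {prev}.3dg {clean_con} | gzip -c > {nxt}.con.gz",
        f"dip-c clean3 -c {prev}.con.gz {prev}.3dg > {prev}.clean.3dg",
    ]


# Check that no command reads a *.clean.3dg file: it only ever appears
# to the right of '>', i.e. as an output.
feeds_forward = any(
    ".clean.3dg" in cmd.split(">")[0]
    for r in (1, 2, 3)
    for cmd in impute3_round_commands(r)
)

for r in (1, 2, 3):
    print("\n".join(impute3_round_commands(r)), end="\n\n")
print("clean.3dg used as input:", feeds_forward)
```

Each round passes the raw `.3dg` from nuc_dynamics to impute3, and the `.clean.3dg` file is written but never read again.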

I'll look into the errors you got.

@kainblue
Author

kainblue commented Feb 9, 2019

Hi Longzhi,

Thank you very much for clarifying the usage of the clean.3dg data.
As for the error, I finally fixed it by carefully and exactly following everything you describe in this repo. It seems to have been caused by skipping the LIANTI patch.
"Patching LIANTI
For META read preprocessing, LIANTI needs a patch to replace the LIANTI adapters with the META ones:

Download the LIANTI source code.
Replace LIANTI's trim.c with Dip-C's patch/trim.c.
Compile LIANTI.
"
I went back and did this part. Now I can reproduce what you show in this repo.

Thank you very much for all the help!

Bo

@tanlongzhi
Owner

Great!
