Try to understand the analysis #26
Comments
Hi Bo, You're right that the intermediate I'll look into the errors you got.
Hi Longzhi, Thank you very much for clarifying the usage of the clean.3dg data. Download the LIANTI source code. Thank you very much for all the help! Bo
Great!
Hi Longzhi,
I am very interested in learning to construct 3D models from Hi-C data. After a long search, I recently found your fantastic work. I have downloaded the fastq files for GM12878 cell 1 from https://www.ncbi.nlm.nih.gov/sra/SRX4133191 and am now trying to follow the instructions in this repo. However, I have problems with the later imputation steps:
"con_to_ncc.sh impute.con.gz
nuc_dynamics.sh impute.ncc 0.1
dip-c impute3 -3 impute.3dg clean.con.gz | gzip -c > impute3.round1.con.gz
dip-c clean3 -c impute.con.gz impute.3dg > impute.clean.3dg
con_to_ncc.sh impute3.round1.con.gz
nuc_dynamics.sh impute3.round1.ncc 0.1
dip-c impute3 -3 impute3.round1.3dg clean.con.gz | gzip -c > impute3.round2.con.gz
dip-c clean3 -c impute3.round1.con.gz impute3.round1.3dg > impute3.round1.clean.3dg
con_to_ncc.sh impute3.round2.con.gz
nuc_dynamics.sh impute3.round2.ncc 0.1
dip-c impute3 -3 impute3.round2.3dg clean.con.gz | gzip -c > impute3.round3.con.gz
dip-c clean3 -c impute3.round2.con.gz impute3.round2.3dg > impute3.round2.clean.3dg
...
"
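To make the data flow in the commands above explicit, here is a minimal Python sketch that lists which redirected output files are never passed as input to a later command. The command text is copied from the excerpt (through the start of round 3); the `trace` helper and the file-name pattern are illustrative assumptions, not part of dip-c.

```python
import re

# Commands copied from the quoted excerpt, through the start of round 3.
COMMANDS = """\
con_to_ncc.sh impute.con.gz
nuc_dynamics.sh impute.ncc 0.1
dip-c impute3 -3 impute.3dg clean.con.gz | gzip -c > impute3.round1.con.gz
dip-c clean3 -c impute.con.gz impute.3dg > impute.clean.3dg
con_to_ncc.sh impute3.round1.con.gz
nuc_dynamics.sh impute3.round1.ncc 0.1
dip-c impute3 -3 impute3.round1.3dg clean.con.gz | gzip -c > impute3.round2.con.gz
dip-c clean3 -c impute3.round1.con.gz impute3.round1.3dg > impute3.round1.clean.3dg
con_to_ncc.sh impute3.round2.con.gz
"""

# Hypothetical pattern: treat any token ending in these suffixes as a data file.
FILE_TOKEN = re.compile(r"\S+\.(?:3dg|ncc|con\.gz)$")


def trace(commands):
    """Return (produced, consumed) sets of file names found in the commands."""
    produced, consumed = set(), set()
    for line in commands.splitlines():
        # Everything after ">" is the redirect target; everything before is the command.
        left, _, target = line.partition(">")
        if target.strip():
            produced.add(target.strip())
        for token in left.split():
            if FILE_TOKEN.match(token):
                consumed.add(token)
    return produced, consumed


if __name__ == "__main__":
    produced, consumed = trace(COMMANDS)
    for name in sorted(produced - consumed):
        print("never used as input:", name)
```

In this excerpt the only redirected outputs that no later command reads are the two `*.clean.3dg` files, which is the observation behind the question that follows.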
It looks to me like the cleaned 3dg file produced by clean3 in each round is not used in the next round. I started to question this because of an error I encountered:
dip-c impute3 -3 GM12878_cell1_dipc_phased.clean.impute.clean.3dg GM12878_cell1_dipc_phased.clean.con.gz | gzip -c > GM12878_cell1_dipc_phased.impute3.round1.con.gz
[M::impute3] read a 3D structure with 55404 particles at 100000 bp resolution
[M::impute3] read 612536 contacts (82.47% intra-chromosomal, 8.94% legs phased)
[M::classes] imputed haplotypes for chromosome pair (13,17): 392 contacts (85.2% phased)
[M::classes] imputed haplotypes for chromosome pair (5,8): 1679 contacts (97.74% phased)
[M::classes] imputed haplotypes for chromosome pair (16,17): 216 contacts (66.2% phased)
[M::classes] imputed haplotypes for chromosome pair (1,20): 1078 contacts (92.76% phased)
Traceback (most recent call last):
  File "dip-c", line 130, in <module>
    main()
  File "dip-c", line 63, in main
    return_value = impute3.impute3(sys.argv[1:])
  File "impute3.py", line 109, in impute3
    con_data.impute_from_g3d_data(g3d_data, max_impute3_distance, max_impute3_ratio, max_impute3_ratio * g3d_resolution, is_male, par_data, vio_file)
  File "classes.py", line 907, in impute_from_g3d_data
    self.con_lists[ref_name_tuple].impute_from_g3d_data(g3d_data, max_impute3_distance, max_impute3_ratio, min_impute3_separation, is_male, par_data, vio_file)
  File "classes.py", line 757, in impute_from_g3d_data
    con.impute_from_g3d_data(g3d_data, max_impute3_distance, max_impute3_ratio, min_impute3_separation, is_male, par_data, vio_file)
  File "classes.py", line 544, in impute_from_g3d_data
    impute3_ratio = impute3_distance / con_distance_tuples[1][1]
TypeError: unsupported operand type(s) for /: 'NoneType' and 'NoneType'
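The last frame shows that both operands of the division were `None`. The short Python sketch below only reproduces that error message and shows a generic guard pattern. The `safe_ratio` helper is hypothetical; it is not dip-c code and is not a proposed patch, and the traceback alone does not say why the two distances were missing.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if either value is missing or the denominator is zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


# Dividing None by None raises the same TypeError seen in the traceback.
try:
    None / None
except TypeError as error:
    message = str(error)
    print(message)

# The guarded version reports a missing ratio instead of raising.
print(safe_ratio(None, None))
print(safe_ratio(3.0, 2.0))
```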
Here are the first lines of the two input files:
head GM12878_cell1_dipc_phased.clean.impute.clean.3dg
1(mat) 1200000 7.95772097608 -12.0072914165 6.67592442321
1(mat) 1300000 8.89210987528 -11.4486456224 6.61131843187
1(mat) 1400000 8.8277193141 -10.3798272863 6.83290065793
1(mat) 1500000 8.10570766598 -9.67144265436 6.35097905003
1(mat) 1600000 7.99275487247 -8.53433974384 6.52683266786
1(mat) 1700000 6.70429668241 -8.61794012705 5.86833067325
1(mat) 1800000 5.62631622929 -8.49098630055 5.17833888961
1(mat) 1900000 4.8879961287 -8.44522282731 3.98121528589
1(mat) 2000000 3.80732676666 -7.76977419875 3.35459947567
1(mat) 2100000 3.06260319638 -8.19929825445 4.23444940641
zcat GM12878_cell1_dipc_phased.clean.con.gz | head
1,756415,. 1,1095231,.
1,757502,. 1,1218674,.
1,815689,. 1,1186165,.
1,818341,. 1,862101,.
1,830604,. 1,835996,.
1,839037,. 1,858631,.
1,848406,. 1,850417,.
1,858704,. 1,861316,.
1,861508,. 1,862932,.
1,918117,1 1,1231475,.
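One rough way to compare the two inputs above is to check whether each contact leg has particles on both sides of it in the 3D structure. The Python sketch below does this for a few lines copied from the two listings. It rests on an assumption that is not confirmed anywhere in this thread: that a leg without bracketing particles is what yields the `None` distances. The function names are hypothetical, and a real check would need to load the full files rather than their first lines.

```python
RESOLUTION = 100_000  # particle spacing reported by impute3 in the log above

# First two particles from the .3dg listing above.
G3D_HEAD = """\
1(mat) 1200000 7.95772097608 -12.0072914165 6.67592442321
1(mat) 1300000 8.89210987528 -11.4486456224 6.61131843187
"""

# Two contacts from the .con.gz listing above.
CON_HEAD = """\
1,757502,. 1,1218674,.
1,918117,1 1,1231475,.
"""


def load_particles(text):
    """Collect (chromosome, locus) pairs, ignoring the (mat)/(pat) suffix."""
    particles = set()
    for line in text.splitlines():
        name, locus = line.split()[:2]
        particles.add((name.split("(")[0], int(locus)))
    return particles


def leg_is_bracketed(chrom, locus, particles):
    """True if particles exist at the grid points just below and just above the locus."""
    lower = locus // RESOLUTION * RESOLUTION
    upper = lower + RESOLUTION
    return (chrom, lower) in particles and (chrom, upper) in particles


def unbracketed_legs(con_text, particles):
    """List contact legs that lack a particle on at least one side."""
    missing = []
    for line in con_text.splitlines():
        for leg in line.split():
            chrom, locus, _haplotype = leg.split(",")
            if not leg_is_bracketed(chrom, int(locus), particles):
                missing.append((chrom, int(locus)))
    return missing


if __name__ == "__main__":
    for chrom, locus in unbracketed_legs(CON_HEAD, load_particles(G3D_HEAD)):
        print("no bracketing particles for leg", chrom, locus)
```

With these sample lines, the legs below 1,200,000 bp have no particle on either side, because the cleaned structure shown here starts at 1,200,000 bp.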
Here are the commands I used to construct the input files:
seqtk mergepe SRR7226683_1.fastq SRR7226683_2.fastq | lianti trim - | bwa mem -Cp bwa_index_rmchr/Homo_sapiens_assembly19.fasta - | samtools view -uS | sambamba sort -o GM12878_cell1_dipc_rmchr.bam /dev/stdin
dip-c seg -v snps/NA12878.txt.gz GM12878_cell1_dipc_rmchr.bam | gzip -c > GM12878_cell1_dipc_phased.seg.gz
dip-c con GM12878_cell1_dipc_phased.seg.gz | gzip -c > GM12878_cell1_dipc_phased.con.gz
dip-c dedup GM12878_cell1_dipc_phased.con.gz | gzip -c > GM12878_cell1_dipc_phased.dedup.gz
dip-c reg -p hf GM12878_cell1_dipc_phased.dedup.gz | gzip -c > GM12878_cell1_dipc_phased.reg.con.gz
dip-c clean GM12878_cell1_dipc_phased.dedup.gz | gzip -c > GM12878_cell1_dipc_phased.clean.con.gz
dip-c impute GM12878_cell1_dipc_phased.clean.con.gz | gzip -c > GM12878_cell1_dipc_phased.clean.impute.con.gz
con_to_ncc.sh GM12878_cell1_dipc_phased.clean.impute.con.gz
nuc_dynamics.sh GM12878_cell1_dipc_phased.clean.impute.ncc 0.1
Thanks a lot!
Looking forward to your help!
Bo Zhang