Error in dip-c seg and intermediate files #21

Open
bioyuyang opened this issue Nov 28, 2018 · 5 comments

Comments

@bioyuyang

Hi Tan,

This is Yuyang from Tsinghua University, Beijing.
I hope you are having a nice holiday.

I followed the "Typical Workflow" on my server and ran into trouble at the very first step.

../seqtk-master/seqtk mergepe SRR7226685_1.fastq SRR7226685_2.fastq | ../lianti-master/lianti trim - |../bwa-master/bwa mem -Cp ../hg19.fa - | samtools view -uS |../sambamba-0.6.8-linux-static sort -o aln.bam /dev/stdin
./dip-c seg -v snps/NA12878.txt.gz aln.bam | gzip -c > phased.seg.gz

The second command throws an error:

[M::seg] pass 2: read 24000000 alignments, last at chrX:33183726
[M::seg] pass 2: read 24100000 alignments, last at chrX:45099699
[M::seg] pass 2: read 25000000 alignments, last at chrX:149610521
[M::seg] pass 2: read 25100000 alignments, last at chrY:13801064
[M::seg] pass 2: read 25200000 alignments, last at chr9_gl000198_random:71282
[M::seg] pass 2: read 25300000 alignments, last at chrUn_gl000216:27250
[M::seg] pass 2: read 25400000 alignments, last at chrUn_gl000220:139949
[M::seg] pass 2: read 25500000 alignments, last at chrUn_gl000226:14258
[M::seg] pass 2: read 25600000 alignments, last at *
[M::seg] pass 2: read 26800000 alignments, last at *
[M::seg] pass 2: read 26900000 alignments, last at *
[M::seg] pass 2: read 27000000 alignments, last at *
[M::seg] pass 2: cleaning 2230534 candidate reads
[M::seg] pass 2 done: read 27005726 alignments; kept 1815407 candidate reads (6.72% of alignments)
Traceback (most recent call last):
  File "./dip-c", line 130, in <module>
    main()
  File "./dip-c", line 42, in main
    return_value = seg.seg(sys.argv[1:])
  File "/home/DAILY_WORK/LYY/dip-c-master/seg.py", line 129, in seg
    for pileup_column in bam_file.pileup(snp_chr, snp_locus - 1, snp_locus):
  File "pysam/libcalignmentfile.pyx", line 1314, in pysam.libcalignmentfile.AlignmentFile.pileup (pysam/libcalignmentfile.c:16452)
  File "pysam/libchtslib.pyx", line 675, in pysam.libchtslib.HTSFile.parse_region (pysam/libchtslib.c:11863)
ValueError: invalid contig `1

By the way, would you mind uploading some key intermediate files? That would make the pipeline easier to follow and to debug. For example, the "Interactive Visualization of 3D Genomes" section uses cell.3dg throughout to make the pretty figures. Moreover, the 3D reconstruction process seems a little tricky, as you also showed in Fig. S8 of your Science paper. Do you have any suggestions for obtaining a reasonable simulated 3D structure?

Thanks so much for your help!
Yuyang

@tanlongzhi
Owner

Hi Yuyang,

I'll take a look at your error as soon as possible.

For an example .3dg file, there's already an FTP link in README.md, but it isn't showing up because GitHub doesn't support FTP links. I've pasted it below for your convenience:

ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3271nnn/GSM3271352/suppl/GSM3271352_gm12878_06.impute3.round4.clean.3dg.txt.gz

The corresponding GEO accession contains final files like this for all single cells, as well as all intermediate files starting from raw.con.gz. However, the two earliest files, aln.bam and phased.seg.gz, haven't been provided because of their large size.

@tanlongzhi
Owner

Your error seems to come from a discrepancy in chromosome naming between your genome file (chr1 in your hg19.fa) and the SNP file you used (1 in snps/NA12878.txt.gz). You must change one of them to match the other.
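
One way to make the names match is to rewrite the SNP file so its chromosome column uses the same "chr" prefix as the genome. The following is only a minimal sketch: it assumes the SNP file is gzipped and tab-delimited with the chromosome name in the first column, and the function names are hypothetical, not part of dip-c.

```python
import gzip


def add_chr_prefix(name):
    """Return the contig name with a "chr" prefix (e.g. "1" -> "chr1")."""
    # Names that already carry the prefix are left untouched.
    if name.startswith("chr"):
        return name
    return "chr" + name


def rename_snp_file(in_path, out_path):
    """Copy a gzipped SNP file, prefixing the first column with "chr".

    Assumption: tab-delimited text, chromosome name in column 1.
    All other columns are written back unchanged.
    """
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            fields[0] = add_chr_prefix(fields[0])
            dst.write("\t".join(fields) + "\n")
```

For example, calling rename_snp_file("snps/NA12878.txt.gz", "snps/NA12878.chr.txt.gz") would write a renamed copy (the output file name is made up for illustration), which could then be passed to dip-c seg -v in place of the original. Only simple names are handled here; if the two files differ in more than the prefix, an explicit name mapping would be needed.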

The importance of chromosome name matching has been mentioned in an earlier comment for this repo, and another comment for the companion repo hickit.

@bioyuyang
Author

bioyuyang commented Nov 28, 2018 via email

@liubinnk1

Hi Tan,

I ran the dip-c seg command and got the following error:
Traceback (most recent call last):
  File "/THL8/home/liubin/software/dip-c-master/dip-c", line 130, in <module>
    main()
  File "/THL8/home/liubin/software/dip-c-master/dip-c", line 42, in main
    return_value = seg.seg(sys.argv[1:])
  File "/THL8/home/liubin/software/dip-c-master/seg.py", line 115, in seg
    seg_data.clean()
  File "/THL8/home/liubin/software/dip-c-master/classes.py", line 204, in clean
    for name in self.reads.keys():
RuntimeError: dictionary changed size during iteration

@tanlongzhi
Owner

Hi @liubinnk1, please see my reply to your identical question in the other thread.
Best,
Tan
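
For readers who hit the same RuntimeError: in Python 3, dict.keys() returns a live view, so adding or removing entries while looping over it raises "dictionary changed size during iteration". Whether that is the cause in dip-c is an inference from the traceback, not something confirmed in this thread. The sketch below uses hypothetical function names and toy data to show the general pattern and the usual workaround of iterating over a snapshot of the keys.

```python
def drop_short_reads_unsafe(reads, min_len):
    """Delete short entries while iterating the live key view."""
    # In Python 3 this raises RuntimeError as soon as the loop
    # continues after the first deletion.
    for name in reads.keys():
        if len(reads[name]) < min_len:
            del reads[name]
    return reads


def drop_short_reads(reads, min_len):
    """Delete short entries while iterating a snapshot of the keys."""
    # list(...) copies the keys up front, so the dict can be
    # modified safely inside the loop.
    for name in list(reads.keys()):
        if len(reads[name]) < min_len:
            del reads[name]
    return reads
```

The snapshot costs one extra list of keys, which is usually negligible next to the dictionary itself.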
