VG giraffe warning[vg::Watchdog] #3954

Open
IsaacDiaz026 opened this issue May 12, 2023 · 21 comments

Comments

@IsaacDiaz026

1. What were you trying to do?
I am trying to use vg giraffe to align paired-end sequencing reads to a graph that was built with the Minigraph-Cactus pipeline. I used a small subset of 10,000 reads to test vg giraffe.

I used the command

READ1=Orlando_sub1.fq
READ2=Orlando_sub2.fq

GRAPH=first_run-pg.gbz

vg giraffe -Z "$GRAPH" -f $READ1 -f $READ2 -o BAM > "$RESULTS"/test_orl.bam
 

2. What did you want to happen?
I was expecting the run to complete quickly since there are only a few reads to align.

3. What actually happened?
vg giraffe has been running for almost 2 days. It keeps producing these warning messages and has not produced an output BAM.

warning[vg::Watchdog]: Thread 36 has been checked in for 10 seconds processing: A00351:529:HMFCNDSXY:4:2113:3341:32612, A00351:529:HMFCNDSXY:4:2113:3341:32612
warning[vg::Watchdog]: Thread 21 finally checked out after 120 seconds and 0 kb memory growth processing: A00351:529:HMFCNDSXY:4:1623:13404:34100, A00351:529:HMFCNDSXY:4:1623:13404:34100
warning[vg::Watchdog]: Thread 21 has been checked in for 10 seconds processing: A00351:529:HMFCNDSXY:4:2478:18656:3270, A00351:529:HMFCNDSXY:4:2478:18656:3270

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Place stacktrace here.

5. What data and command can the vg dev team use to make the problem happen?

READ1=Orlando_sub1.fq
READ2=Orlando_sub2.fq

GRAPH=first_run-pg.gbz

vg giraffe -Z "$GRAPH" -f $READ1 -f $READ2 -o BAM > "$RESULTS"/test_orl.bam

6. What does running vg version say?

vg version v1.46.0-43-g0d21fd306 "Altamura"
Compiled with g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 on Linux
Linked against libstd++ 20220421
Built by hickey@boreale
@jltsiren
Contributor

What does your graph contain and how did you build it?

If you run the following commands, what is the output?

vg gbwt -M -Z graph.gbz
vg paths -L -G -x graph.gbz | wc -l
vg gbwt --tags -Z graph.gbz

If the last command listed any sample names under tag reference_samples, try running the following command for each of them:

vg paths -S sample -L -x graph.gbz | wc -l

Additionally, is the mapping speed reasonable if you use GAM or GAF as the output format?
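
For example, something like this (reusing the graph and read files from your command; only the output option changes):

# same command, but GAF output instead of BAM
vg giraffe -Z first_run-pg.gbz -f Orlando_sub1.fq -f Orlando_sub2.fq -o gaf > test_orl.gaf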

@IsaacDiaz026
Author

Hello, I am now able to run Giraffe in an array job, but the mapping speed is still very slow, ~80 reads per second per thread.

I then ran

vg gbwt -M -Z graph.gbz
vg paths -L -G -x graph.gbz | wc -l
vg gbwt --tags -Z graph.gbz

which returned
11775451 paths with names, 7 samples with names, 10 haplotypes, 3261 contigs with names
0
reference_samples PUMM
source jltsiren/gbwt

Then I ran

vg paths -S PUMM -L -x graph.gbz | wc -l

which returned 10

@jltsiren
Contributor

Are you specifying the number of mapping threads each Giraffe job should use with the -t / --threads option?

Also, what does vg stats -l graph.gbz say and how does that compare to the size of the genome?

jltsiren reopened this May 29, 2023
@IsaacDiaz026
Author

I specified 32 threads, and vg stats -l graph.gbz returns 400249670, which is about 50 Mb larger than the reference genome, although the other genomes used in the graph construction are closer to 400 Mb.

@jltsiren
Contributor

Can you share the graph and some reads? I think we have ruled out most of the common things that could go wrong.

@IsaacDiaz026
Author

How should I share it?

@jltsiren
Contributor

I don't know. What options do you have?

@IsaacDiaz026
Author

I can share it with you on Google Drive if that works for you.

@jltsiren
Contributor

Sharing on Google Drive should work. Please let me know once you have uploaded the files.

@IsaacDiaz026
Author

Just shared the folder to your UCSC email address.

@jltsiren
Contributor

There were no real reads to try, but I managed to map simulated 150 bp and 250 bp reads in a reasonable time on my laptop.

Based on the filenames, the graph you provided is a filter graph from the Minigraph-Cactus pipeline. Filtering removes all nodes used only by a single haplotype from the graph. Because you only have 10 haplotypes to begin with, that results in a large number of short path fragments. That may be a problem, especially if the reads you are trying to map diverge significantly from the haplotypes.

You may get better performance by mapping reads to the default (clip) graph.
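
For example, something like this (first_run.gbz is just a placeholder for whatever the clip graph from your run is called; if the matching .dist and .min indexes do not exist yet, Giraffe will build them, which adds a one-time cost):

# placeholder clip-graph name; otherwise the same command as before
vg giraffe -Z first_run.gbz -f Orlando_sub1.fq -f Orlando_sub2.fq -o BAM > test_orl_clip.bam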

@hxt163

hxt163 commented Jun 26, 2023

I met the same problem, did you solve it? @IsaacDiaz026

@starskyzheng

starskyzheng commented Jun 27, 2023

vg v1.36.0 works.
vg v1.40.0 and v1.48.0 have this problem, and mapping seems to never stop (v1.36.0 took 20 minutes, while v1.40.0 ran for more than 1 day).

@IsaacDiaz026
Author

I met the same problem, did you solve it? @IsaacDiaz026

I haven't yet; I'm working on rebuilding a new pangenome graph first.

@jeizenga
Contributor

vg v1.36.0 works. vg v1.40.0 and v1.48.0 have this problem.

I think it's actually that we added these warnings after v1.36.0, not that the mapping got slower. If you see a few of these warnings in a mapping run, it's not such a huge deal. If you're seeing many of them, then there's probably something to troubleshoot with the mapping speed.

@hxt163

hxt163 commented Jul 3, 2023

My graph is constructed with the Minigraph-Cactus pipeline. When I map with vg giraffe, I see lots of the warnings above and the mapping gets very slow. I want to know what might have gone wrong. @jeizenga

@Andy-B-123

Hi, maybe this helps, but from what I can tell vg giraffe hits these errors when multiple jobs access the same index files.

My situation is that I have a single graph (with three files: graph.dist, graph.giraffe.gbz, graph.min) and multiple short-read samples that I want to align.

When I run a single sample I get steady progress. When I run my samples in parallel (using a Slurm array job) I get almost no progress and lots of watchdog errors. When I copy the three graph files to unique names for each sample in each array job, I get steady progress again.

So it feels like there is some conflict in accessing the three graph files when running vg in parallel? I've got no idea how/why, but having separate graph files for each sample lets me run normally. Not something I've seen mentioned?

@netwon123

Hello, I am now able to run Giraffe in an array job, but the mapping speed is still very slow, ~80 reads per second per thread.

I then ran

vg gbwt -M -Z graph.gbz
vg paths -L -G -x graph.gbz | wc -l
vg gbwt --tags -Z graph.gbz

which returned

11775451 paths with names, 7 samples with names, 10 haplotypes, 3261 contigs with names
0
reference_samples PUMM
source jltsiren/gbwt

Then I ran

vg paths -S PUMM -L -x graph.gbz | wc -l

which returned 10

What does your graph contain and how did you build it?

If you run the following commands, what is the output?

vg gbwt -M -Z graph.gbz
vg paths -L -G -x graph.gbz | wc -l
vg gbwt --tags -Z graph.gbz

If the last command listed any sample names under tag reference_samples, try running the following command for each of them:

vg paths -S sample -L -x graph.gbz | wc -l

Additionally, is the mapping speed reasonable if you use GAM or GAF as the output format?

Sorry, sir. If my results were like this, what would they reveal?

vg gbwt -M -Z graph.gbz
vg paths -L -G -x graph.gbz | wc -l
vg gbwt --tags -Z graph.gbz
vg paths -S PUMM -L -x graph.gbz | wc -l

returned:
17 paths with names, 17 samples with names, 17 haplotypes, 1 contigs with names
1
reference_samples
source jltsiren/gbwt
0

@netwon123

Maybe I found the reason. When I use real short-read data, there is no problem, but when I use simulated data, the warnings happen.

@DongyaoLiu

@Andy-B-123 I am facing exactly the same issue as you described. Do you use a cluster job management tool like Slurm to run it in parallel? And I think the reason may be that we should compile vg ourselves?

@Andy-B-123

@Andy-B-123 I am facing exactly the same issue as you described. Do you use a cluster job management tool like Slurm to run it in parallel? And I think the reason may be that we should compile vg ourselves?

Yes, I use Slurm on our HPC. My workaround is to copy the graph files (graph.dist, graph.giraffe.gbz, graph.min) at the start of each parallel job (e.g. job 1: graph.1.dist, job 2: graph.2.dist, ...), use those for each job, and then delete them once complete. It increases the space requirements for the run, but it actually allows it to run in parallel, so in my mind it's worth it. Just make sure to include a delete step at the end to clean up the copied graph files; a rough sketch is below.
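
For example, as a Slurm array-job sketch (the sample/file names and the thread count are placeholders, not anything from vg itself):

#!/bin/bash
#SBATCH --array=1-10

ID=${SLURM_ARRAY_TASK_ID}

# give this task its own copy of the three indexes so parallel jobs don't share files
for ext in dist giraffe.gbz min; do
    cp graph.${ext} graph.${ID}.${ext}
done

# map this task's sample against its private copies of the indexes
vg giraffe -Z graph.${ID}.giraffe.gbz -d graph.${ID}.dist -m graph.${ID}.min \
    -f sample${ID}_1.fq -f sample${ID}_2.fq -o BAM -t 16 > sample${ID}.bam

# clean up the per-task copies once mapping finishes
rm graph.${ID}.dist graph.${ID}.giraffe.gbz graph.${ID}.min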
