Bubble removal and unitig construction while retaining original coordinates #2979

rickbeeloo · 2020-09-14T15:11:42Z

I'm aware questions are preferably answered at Biostars, however, I saw older questions not being addressed there while being here. If this is not convenient just notify me and I will immediately transfer it to Biostars!

Simple example
We created three simple sequences of which two are identical except an SNP, and one has a new subsequence. We constructed the GFA using SeqWish after aligning them using Minimap2:

We used GFAtools (gfatools asm -b 10 -u) to remove bubbles and hereafter merge the nodes to unitigs.

Question
As GFAtools is based on Miniasm, and thus "genome assembly", it does not (yet) remember the original paths. Hence, in the unitig graph, we are unable to tell the locations of the nodes/fragments in the input genomes (which we are interested in). Therefore we wonder whether/how we can pop bubbles and get the unitigs using VG? We took a look at vg simplify however this seems to require BED and VCF files instead of our GFA.

The text was updated successfully, but these errors were encountered:

ekg · 2020-09-14T16:17:53Z

gfatools appears to be unable to record or work with the embedded paths. In this case there is a simple workaround. Map the unitigs back into the original graph. You can then annotate the positions of all the nodes in the original graph with xg's GFA output (https://github.com/vgteam/xg).

The trivial way to do this is to add the unitigs to your source FASTA when you're running seqwish. Align them to the other sequences as well. Then, when you re-induce the graph, the unitigs will be embedded and you'll be able to get the positions of the bubbles relative to them.

I think this will do what you want:

xg -g input.gfa -G >position_annotated.gfa

rickbeeloo · 2020-09-15T12:34:20Z

Thank you @ekg for the quick response!

The simplified data

source.gfa (= left in the previous figure)

H	VN:Z:1.0
S	1	ATCGACTGACACGATCGACTA
S	2	C
S	3	GACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG
S	4	ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC
S	5	G
S	6	TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC
L	1	+	2	+	0M
L	1	+	5	+	0M
L	2	+	3	+	0M
L	3	+	4	+	0M
L	5	+	3	+	0M
L	6	+	4	+	0M
P	genome1	1+,2+,3+,4+	*,*,*
P	genome2	1+,5+,3+,4+	*,*,*
P	genome3	6+,4+	*

unitig.gfa (= right in the previous figure)

S	utg0000001l	ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG	LN:i:74	RC:i:3	lc:i:74
A	utg0000001l	0	+	1	0	21
A	utg0000001l	21	+	5	0	1
A	utg0000001l	22	+	3	0	52
S	utg0000002l	ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC	LN:i:289	RC:i:1	lc:i:289
A	utg0000002l	0	+	4	0	289
S	utg0000003l	TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC	LN:i:110	RC:i:1	lc:i:110
A	utg0000003l	0	+	6	0	110
L	utg0000001l	+	utg0000002l	+	0M	L1:i:74	L2:i:289
L	utg0000002l	-	utg0000003l	-	0M	L1:i:289	L2:i:110

Mapping attempt
I was not sure where I'm supposed to use the xg command you provided earlier, but this is what I tried now for the mapping:

# GFA to vg
vg view -F source.gfa -v > source.vg

# Index
vg index -x source_index.gx -g source_index.gcsa source.vg

# Unitig graph to FASTA
gfatools gfa2fa unitig.gfa > query.fasta

# Query the FASTA
vg map -F query.fasta -M 10 -x source_index.gx -g source_index.gcsa -j

However, this is different from what I would expect. For example, let's look at the json for utg0000001l it only returns the matching nodes 1, 5, 3 for genome2 but not for genome1 (although it is mentioned in refpos: of course 1 and 3 overlap but 2 is absent from the output).

{
   "identity":1.0,
   "mapping_quality":60,
   "name":"utg0000001l",
   "path":{
      "mapping":[
         {
            "edit":[
               {
                  "from_length":21,
                  "to_length":21
               }
            ],
            "position":{
               "node_id":"1"
            },
            "rank":"1"```
         },
         {
            "edit":[
               {
                  "from_length":1,
                  "to_length":1
               }
            ],
            "position":{
               "node_id":"5"
            },
            "rank":"2"
         },
         {
            "edit":[
               {
                  "from_length":52,
                  "to_length":52
               }
            ],
            "position":{
               "node_id":"3"
            },
            "rank":"3"
         }

Questions

Am I missing something here as I thought node 2 from genome1 of source.gfa should also be included in the mapping?
Where exactly did you mean to use the xg command you provided?

Thank you in advance!

ekg · 2020-09-15T13:30:58Z

Take the sequences that you used as input and the sequences of the unitigs. Make sure they have unique names. Concatenate them into a single file (s.fa). Map them all together with minimap2 -X s.fa s.fa >s.paf. Induce the variation graph with seqwish. seqwish -s s.fa -p s.paf -g s.gfa Index the graph using xg to get the coordinates of each node relative to all overlapping paths. xg -g s.gfa -G >a.gfa Now in a.gfa you will have a coordinate descriptor tag in JSON for each node. You can use this to find the coordinates of the bubbles versus your unitigs.

…

On Tue, Sep 15, 2020, 14:34 rickbeeloo ***@***.***> wrote: *The simplified data* source.gfa (= left in the previous figure) H VN:Z:1.0 S 1 ATCGACTGACACGATCGACTA S 2 C S 3 GACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG S 4 ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC S 5 G S 6 TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC L 1 + 2 + 0M L 1 + 5 + 0M L 2 + 3 + 0M L 3 + 4 + 0M L 5 + 3 + 0M L 6 + 4 + 0M P genome1 1+,2+,3+,4+ *,*,* P genome2 1+,5+,3+,4+ *,*,* P genome3 6+,4+ * unitig.gfa (= right in the previous figure) S utg0000001l ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG LN:i:74 RC:i:3 lc:i:74 A utg0000001l 0 + 1 0 21 A utg0000001l 21 + 5 0 1 A utg0000001l 22 + 3 0 52 S utg0000002l ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC LN:i:289 RC:i:1 lc:i:289 A utg0000002l 0 + 4 0 289 S utg0000003l TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC LN:i:110 RC:i:1 lc:i:110 A utg0000003l 0 + 6 0 110 L utg0000001l + utg0000002l + 0M L1:i:74 L2:i:289 L utg0000002l - utg0000003l - 0M L1:i:289 L2:i:110 *Mapping attempt* I was not sure where I'm supposed to use the xg command you provided earlier, but this is what I tried now for the mapping: # GFA to vg vg view -F source.gfa -v > source.vg # Index vg index -x source_index.gx -g source_index.gcsa source.vg # Unitig graph to FASTA gfatools gfa2fa unitig.gfa > query.fasta # Query the FASTA vg map -F query.fasta -M 10 -x source_index.gx -g source_index.gcsa -j However, this is different from what I would expect. For example, let's look at the json for utg0000001l it only returns the matching nodes 1, 5, 3 for genome2 but not for genome1 (of course 1 and 3 overlap but 2 is absent from the output). [image: image] <https://user-images.githubusercontent.com/19516376/93210373-c490d000-f75f-11ea-82c0-491b80f97889.png> { "identity":1.0,![image](https://user-images.githubusercontent.com/19516376/93209801-e6d61e00-f75e-11ea-8c1a-a24852ffb706.png) "mapping_quality":60, "name":"utg0000001l", "path":{ "mapping":[ { "edit":[ { "from_length":21, "to_length":21 } ], "position":{ "node_id":"1" }, "rank":"1"``` }, { "edit":[ { "from_length":1, "to_length":1 } ], "position":{ "node_id":"5" }, "rank":"2" }, { "edit":[ { "from_length":52, "to_length":52 } ], "position":{ "node_id":"3" }, "rank":"3" } ] }, "refpos":[ { "name":"genome1" }, { "name":"genome2" } ], "score":84, "sequence":"ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG", "time_used":351.0 } *Questions* - Am I missing something here? I thought the -M parameter would allow more than one match and thereby include genome1 too. - Where exactly did you mean to use the xg command you provided? Thank you in advance! — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#2979 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDQEJZCUK4CXAIVAEI2JDSF5NNZANCNFSM4RLXDCOQ> .

ekg · 2020-09-15T13:31:43Z

By the way, you can probably apply vg deconstruct at the end rather than xg. Then you will get a VCF. You would specify the unitig names as references. The xg route is just very simple.

…

On Tue, Sep 15, 2020, 15:30 Erik Garrison ***@***.***> wrote: Take the sequences that you used as input and the sequences of the unitigs. Make sure they have unique names. Concatenate them into a single file (s.fa). Map them all together with minimap2 -X s.fa s.fa >s.paf. Induce the variation graph with seqwish. seqwish -s s.fa -p s.paf -g s.gfa Index the graph using xg to get the coordinates of each node relative to all overlapping paths. xg -g s.gfa -G >a.gfa Now in a.gfa you will have a coordinate descriptor tag in JSON for each node. You can use this to find the coordinates of the bubbles versus your unitigs. On Tue, Sep 15, 2020, 14:34 rickbeeloo ***@***.***> wrote: > *The simplified data* > > source.gfa (= left in the previous figure) > > H VN:Z:1.0 > S 1 ATCGACTGACACGATCGACTA > S 2 C > S 3 GACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG > S 4 ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > S 5 G > S 6 TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC > L 1 + 2 + 0M > L 1 + 5 + 0M > L 2 + 3 + 0M > L 3 + 4 + 0M > L 5 + 3 + 0M > L 6 + 4 + 0M > P genome1 1+,2+,3+,4+ *,*,* > P genome2 1+,5+,3+,4+ *,*,* > P genome3 6+,4+ * > > unitig.gfa (= right in the previous figure) > > S utg0000001l ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG LN:i:74 RC:i:3 lc:i:74 > A utg0000001l 0 + 1 0 21 > A utg0000001l 21 + 5 0 1 > A utg0000001l 22 + 3 0 52 > S utg0000002l ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC LN:i:289 RC:i:1 lc:i:289 > A utg0000002l 0 + 4 0 289 > S utg0000003l TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC LN:i:110 RC:i:1 lc:i:110 > A utg0000003l 0 + 6 0 110 > L utg0000001l + utg0000002l + 0M L1:i:74 L2:i:289 > L utg0000002l - utg0000003l - 0M L1:i:289 L2:i:110 > > *Mapping attempt* > I was not sure where I'm supposed to use the xg command you provided > earlier, but this is what I tried now for the mapping: > > # GFA to vg > vg view -F source.gfa -v > source.vg > > # Index > vg index -x source_index.gx -g source_index.gcsa source.vg > > # Unitig graph to FASTA > gfatools gfa2fa unitig.gfa > query.fasta > > # Query the FASTA > vg map -F query.fasta -M 10 -x source_index.gx -g source_index.gcsa -j > > However, this is different from what I would expect. For example, let's > look at the json for utg0000001l it only returns the matching nodes 1, > 5, 3 for genome2 but not for genome1 (of course 1 and 3 overlap but 2 is > absent from the output). > [image: image] > <https://user-images.githubusercontent.com/19516376/93210373-c490d000-f75f-11ea-82c0-491b80f97889.png> > > { > "identity":1.0,![image](https://user-images.githubusercontent.com/19516376/93209801-e6d61e00-f75e-11ea-8c1a-a24852ffb706.png) > "mapping_quality":60, > "name":"utg0000001l", > "path":{ > "mapping":[ > { > "edit":[ > { > "from_length":21, > "to_length":21 > } > ], > "position":{ > "node_id":"1" > }, > "rank":"1"``` > }, > { > "edit":[ > { > "from_length":1, > "to_length":1 > } > ], > "position":{ > "node_id":"5" > }, > "rank":"2" > }, > { > "edit":[ > { > "from_length":52, > "to_length":52 > } > ], > "position":{ > "node_id":"3" > }, > "rank":"3" > } > ] > }, > "refpos":[ > { > "name":"genome1" > }, > { > "name":"genome2" > } > ], > "score":84, > "sequence":"ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG", > "time_used":351.0 > } > > *Questions* > > - Am I missing something here? I thought the -M parameter would allow > more than one match and thereby include genome1 too. > - Where exactly did you mean to use the xg command you provided? > > Thank you in advance! > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub > <#2979 (comment)>, or > unsubscribe > <https://github.com/notifications/unsubscribe-auth/AABDQEJZCUK4CXAIVAEI2JDSF5NNZANCNFSM4RLXDCOQ> > . >

rickbeeloo · 2020-09-16T09:29:59Z

Hi @ekg, thanks again!
I ran the exact steps you suggested however some unexpected things appear :(

I first created the new FASTA, with unitigs and the original sequences, as you suggested:
1. cat query.fasta source.fa > s.fa
Which thus looks like this:

>utg0000001l
ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG
>utg0000002l
ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC
>utg0000003l
TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC
>genome1
ATCGACTGACACGATCGACTACGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC
>genome2
ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC
>genome3
TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGCACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC

2. minimap2 -X s.fa s.fa >s.paf (not as expected)

The s.paf content:

genome1	363	24	356	+	genome2	363	24	356	332	332	0	tp:A:S	cm:i:58	s1:i:332	dv:f:0.0011	rl:i:0
genome1	363	83	356	+	utg0000002l	289	9	282	273	273	0	tp:A:S	cm:i:46	s1:i:273	dv:f:0	rl:i:0
genome1	363	83	356	+	genome3	399	119	392	273	273	0	tp:A:S	cm:i:46	s1:i:273	dv:f:0.0014	rl:i:0
genome1	363	24	74	+	utg0000001l	74	24	74	50	50	0	tp:A:S	cm:i:9	s1:i:50	dv:f:0.0070	rl:i:0
genome2	363	83	356	+	utg0000002l	289	9	282	273	273	0	tp:A:S	cm:i:46	s1:i:273	dv:f:0	rl:i:0
genome2	363	83	356	+	genome3	399	119	392	273	273	0	tp:A:S	cm:i:46	s1:i:273	dv:f:0.0014	rl:i:0
genome2	363	4	74	+	utg0000001l	74	4	74	70	70	0	tp:A:S	cm:i:12	s1:i:70	dv:f:0	rl:i:0
genome3	399	119	392	+	utg0000002l	289	9	282	273	273	0	tp:A:S	cm:i:46	s1:i:273	dv:f:0	rl:i:0
genome3	399	2	102	+	utg0000003l	110	2	102	100	100	0	tp:A:S	cm:i:14	s1:i:100	dv:f:0	rl:i:0

We know that genome1 and genome2 are identical except for one SNP. utg0000001l should cover positions 0-75 of both genome1 and genome2. In fact, utg0000001l is identical to positions 0-75 of genome2. However, upon inspection of this output we notice that neither of the genomes is correctly covered (does not start at 0) nor are they equally aligned (different start positions). Notably, the SNP is at position 22 probably influencing this start position difference here too, however, if I understood correctly, the goal of this approach was to actually make the unitig cover the exact parts it was originally derived from. Likewise, we know that utg0000003l should start at position 0 on genome3 but it starts at position 2.

I could resolve this by setting the -w (minimizer window size) parameter to 2:
minimap2 -X -w 2 s.fa s.fa >s.paf:

genome1	363	0	362	+	genome2	363	0	362	359	362	0	tp:A:S	cm:i:223	s1:i:359	dv:f:0.0032	rl:i:0
genome1	363	75	362	+	genome3	399	111	398	287	287	0	tp:A:S	cm:i:184	s1:i:287	dv:f:0.0004	rl:i:0
genome1	363	75	362	+	utg0000002l	289	1	288	287	287	0	tp:A:S	cm:i:184	s1:i:287	dv:f:0	rl:i:0
genome1	363	0	74	+	utg0000001l	74	0	74	71	74	0	tp:A:S	cm:i:29	s1:i:71	dv:f:0.0214	rl:i:0
genome2	363	75	362	+	genome3	399	111	398	287	287	0	tp:A:S	cm:i:184	s1:i:287	dv:f:0.0004	rl:i:0
genome2	363	75	362	+	utg0000002l	289	1	288	287	287	0	tp:A:S	cm:i:184	s1:i:287	dv:f:0	rl:i:0
genome2	363	0	74	+	utg0000001l	74	0	74	74	74	0	tp:A:S	cm:i:39	s1:i:74	dv:f:0	rl:i:0
genome3	399	111	398	+	utg0000002l	289	1	288	287	287	0	tp:A:S	cm:i:184	s1:i:287	dv:f:0	rl:i:0
genome3	399	0	110	+	utg0000003l	110	0	110	110	110	0	tp:A:S	cm:i:68	s1:i:110	dv:f:0	rl:i:0

However, when resolving the unitigs heavily depends on the parameters I do wonder how robust this approach is when extending beyond thousands of genomes (which is our plan) where unitigs can be very small till almost the sizes of whole genomes.

3. seqwish -s s.fa -p s.paf -g s.gfa
With s.gfa looking like:

H	VN:Z:1.0
S	1	ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG
S	2	ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC
S	3	TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC
S	4	ATCGACTGACACGATCGACTACGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC
S	5	ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC
S	6	TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGCACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC
P	utg0000001l	1+	*
P	utg0000002l	2+	*
P	utg0000003l	3+	*
P	genome1	4+	*
P	genome2	5+	*
P	genome3	6+	*

Here we see that each sequence gets its own node rather than the overlap it should have?

4. xg -g s.gfa -G >a.gfa
With a.gfa:

H	VN:Z:1.0
S	1	ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG	PO:J:[["utg0000001l","+",0]]
S	2	ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC	PO:J:[["utg0000002l","+",0]]
S	3	TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC	PO:J:[["utg0000003l","+",0]]
S	4	ATCGACTGACACGATCGACTACGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC	PO:J:[["genome1","+",0]]
S	5	ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC	PO:J:[["genome2","+",0]]
S	6	TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGCACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC	PO:J:[["genome3","+",0]]

rickbeeloo · 2020-09-16T11:48:25Z

Sorry just realized you did not provide -c for the minimap2 command in your example and as you recently noticed (ekg/seqwish#61) it didn't trigger a warning for me. I will report back after I run minimap -c!

ekg · 2020-09-16T12:38:17Z

You need to tell minimap2 to produce cigars to use it in seqwish. Add -c to the mapping command. Otherwise there is no way to know what base pairs map to others.

…

On Wed, Sep 16, 2020, 11:13 rickbeeloo ***@***.***> wrote: Hi @ekg <https://github.com/ekg>, thanks again! I ran the exact steps you suggested however some unexpected things appear :( I first created the new FASTA, with unitigs and the original sequences, as you suggested: *1. cat query.fasta source.fa > s.fa* Which thus looks like this: >utg0000001l ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCG ACATCGATCGACTG >utg0000002l ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCA TCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACT GACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCT ATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCG ACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC >utg0000003l TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGC TACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC >genome1 ATCGACTGACACGATCGACTACGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC >genome2 ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC >genome3 TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGCACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC *2. minimap2 -X s.fa s.fa >s.paf* (*not as expected*) The s.paf content: genome1 363 24 356 + genome2 363 24 356 332 332 0 tp:A:S cm:i:58 s1:i:332 dv:f:0.0011 rl:i:0 genome1 363 83 356 + utg0000002l 289 9 282 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0 rl:i:0 genome1 363 83 356 + genome3 399 119 392 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0.0014 rl:i:0 genome1 363 24 74 + utg0000001l 74 24 74 50 50 0 tp:A:S cm:i:9 s1:i:50 dv:f:0.0070 rl:i:0 genome2 363 83 356 + utg0000002l 289 9 282 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0 rl:i:0 genome2 363 83 356 + genome3 399 119 392 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0.0014 rl:i:0 genome2 363 4 74 + utg0000001l 74 4 74 70 70 0 tp:A:S cm:i:12 s1:i:70 dv:f:0 rl:i:0 genome3 399 119 392 + utg0000002l 289 9 282 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0 rl:i:0 genome3 399 2 102 + utg0000003l 110 2 102 100 100 0 tp:A:S cm:i:14 s1:i:100 dv:f:0 rl:i:0 We know that genome1 and genome2 are identical except for one SNP. utg0000001l should cover positions 1-74 of both genome1 and genome2. In fact, utg0000001l is identical to positions 1-74 of genome2. However, upon inspection of this output we notice that neither of the genomes is correctly covered (does not start at 1) nor are they equally aligned (different start positions): genome1 363 24 74 + utg0000001l genome2 363 4 74 + utg0000001l Perhaps parameter settings can fix this or I can make an issue on the Github of Heng Li, however, I then wonder how reliable this approach is, especially since utg0000001l is even identical to the first part of genome2. Moreover, we plan to extend this to thousands of genomes. I just continued regardless just to see what the GFA would look like in the end. *3. seqwish -s s.fa -p s.paf -g s.gfa* With s.gfa looking like: H VN:Z:1.0 S 1 ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG S 2 ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC S 3 TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC S 4 ATCGACTGACACGATCGACTACGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC S 5 ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC S 6 TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGCACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC P utg0000001l 1+ * P utg0000002l 2+ * P utg0000003l 3+ * P genome1 4+ * P genome2 5+ * P genome3 6+ * Here we see that each sequence gets its own node rather than the overlap it should have. Any suggestions? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2979 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDQEMVSNQB62W7T4OGXRLSGB6TDANCNFSM4RLXDCOQ> .

ekg · 2020-09-16T12:38:47Z

This is why you get a graph with a separate node for each contig.

…

On Wed, Sep 16, 2020, 14:38 Erik Garrison ***@***.***> wrote: You need to tell minimap2 to produce cigars to use it in seqwish. Add -c to the mapping command. Otherwise there is no way to know what base pairs map to others. On Wed, Sep 16, 2020, 11:13 rickbeeloo ***@***.***> wrote: > Hi @ekg <https://github.com/ekg>, thanks again! > I ran the exact steps you suggested however some unexpected things appear > :( > > I first created the new FASTA, with unitigs and the original sequences, > as you suggested: > *1. cat query.fasta source.fa > s.fa* > Which thus looks like this: > > >utg0000001l > ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCG > ACATCGATCGACTG > >utg0000002l > ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCA > TCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACT > GACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCT > ATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCG > ACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > >utg0000003l > TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGC > TACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC > >genome1 > ATCGACTGACACGATCGACTACGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > >genome2 > ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > >genome3 > TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGCACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > > *2. minimap2 -X s.fa s.fa >s.paf* (*not as expected*) > > The s.paf content: > > genome1 363 24 356 + genome2 363 24 356 332 332 0 tp:A:S cm:i:58 s1:i:332 dv:f:0.0011 rl:i:0 > genome1 363 83 356 + utg0000002l 289 9 282 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0 rl:i:0 > genome1 363 83 356 + genome3 399 119 392 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0.0014 rl:i:0 > genome1 363 24 74 + utg0000001l 74 24 74 50 50 0 tp:A:S cm:i:9 s1:i:50 dv:f:0.0070 rl:i:0 > genome2 363 83 356 + utg0000002l 289 9 282 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0 rl:i:0 > genome2 363 83 356 + genome3 399 119 392 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0.0014 rl:i:0 > genome2 363 4 74 + utg0000001l 74 4 74 70 70 0 tp:A:S cm:i:12 s1:i:70 dv:f:0 rl:i:0 > genome3 399 119 392 + utg0000002l 289 9 282 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0 rl:i:0 > genome3 399 2 102 + utg0000003l 110 2 102 100 100 0 tp:A:S cm:i:14 s1:i:100 dv:f:0 rl:i:0 > > We know that genome1 and genome2 are identical except for one SNP. > utg0000001l should cover positions 1-74 of both genome1 and genome2. In > fact, utg0000001l is identical to positions 1-74 of genome2. However, > upon inspection of this output we notice that neither of the genomes is > correctly covered (does not start at 1) nor are they equally aligned > (different start positions): > > genome1 363 24 74 + utg0000001l > genome2 363 4 74 + utg0000001l > > Perhaps parameter settings can fix this or I can make an issue on the > Github of Heng Li, however, I then wonder how reliable this approach is, > especially since utg0000001l is even identical to the first part of > genome2. Moreover, we plan to extend this to thousands of genomes. > > I just continued regardless just to see what the GFA would look like in > the end. > > *3. seqwish -s s.fa -p s.paf -g s.gfa* > With s.gfa looking like: > > H VN:Z:1.0 > S 1 ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG > S 2 ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > S 3 TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC > S 4 ATCGACTGACACGATCGACTACGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > S 5 ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > S 6 TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGCACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > P utg0000001l 1+ * > P utg0000002l 2+ * > P utg0000003l 3+ * > P genome1 4+ * > P genome2 5+ * > P genome3 6+ * > > > Here we see that each sequence gets its own node rather than the overlap > it should have. > > Any suggestions? > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#2979 (comment)>, or > unsubscribe > <https://github.com/notifications/unsubscribe-auth/AABDQEMVSNQB62W7T4OGXRLSGB6TDANCNFSM4RLXDCOQ> > . >

ekg · 2020-09-16T12:39:42Z

I'll add a warning to seqwish when this is encountered. It's an obvious gotcha. Sorry for the confusion.

…

On Wed, Sep 16, 2020, 14:38 Erik Garrison ***@***.***> wrote: You need to tell minimap2 to produce cigars to use it in seqwish. Add -c to the mapping command. Otherwise there is no way to know what base pairs map to others. On Wed, Sep 16, 2020, 11:13 rickbeeloo ***@***.***> wrote: > Hi @ekg <https://github.com/ekg>, thanks again! > I ran the exact steps you suggested however some unexpected things appear > :( > > I first created the new FASTA, with unitigs and the original sequences, > as you suggested: > *1. cat query.fasta source.fa > s.fa* > Which thus looks like this: > > >utg0000001l > ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCG > ACATCGATCGACTG > >utg0000002l > ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCA > TCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACT > GACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCT > ATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCG > ACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > >utg0000003l > TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGC > TACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC > >genome1 > ATCGACTGACACGATCGACTACGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > >genome2 > ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > >genome3 > TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGCACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCtACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > > *2. minimap2 -X s.fa s.fa >s.paf* (*not as expected*) > > The s.paf content: > > genome1 363 24 356 + genome2 363 24 356 332 332 0 tp:A:S cm:i:58 s1:i:332 dv:f:0.0011 rl:i:0 > genome1 363 83 356 + utg0000002l 289 9 282 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0 rl:i:0 > genome1 363 83 356 + genome3 399 119 392 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0.0014 rl:i:0 > genome1 363 24 74 + utg0000001l 74 24 74 50 50 0 tp:A:S cm:i:9 s1:i:50 dv:f:0.0070 rl:i:0 > genome2 363 83 356 + utg0000002l 289 9 282 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0 rl:i:0 > genome2 363 83 356 + genome3 399 119 392 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0.0014 rl:i:0 > genome2 363 4 74 + utg0000001l 74 4 74 70 70 0 tp:A:S cm:i:12 s1:i:70 dv:f:0 rl:i:0 > genome3 399 119 392 + utg0000002l 289 9 282 273 273 0 tp:A:S cm:i:46 s1:i:273 dv:f:0 rl:i:0 > genome3 399 2 102 + utg0000003l 110 2 102 100 100 0 tp:A:S cm:i:14 s1:i:100 dv:f:0 rl:i:0 > > We know that genome1 and genome2 are identical except for one SNP. > utg0000001l should cover positions 1-74 of both genome1 and genome2. In > fact, utg0000001l is identical to positions 1-74 of genome2. However, > upon inspection of this output we notice that neither of the genomes is > correctly covered (does not start at 1) nor are they equally aligned > (different start positions): > > genome1 363 24 74 + utg0000001l > genome2 363 4 74 + utg0000001l > > Perhaps parameter settings can fix this or I can make an issue on the > Github of Heng Li, however, I then wonder how reliable this approach is, > especially since utg0000001l is even identical to the first part of > genome2. Moreover, we plan to extend this to thousands of genomes. > > I just continued regardless just to see what the GFA would look like in > the end. > > *3. seqwish -s s.fa -p s.paf -g s.gfa* > With s.gfa looking like: > > H VN:Z:1.0 > S 1 ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG > S 2 ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > S 3 TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC > S 4 ATCGACTGACACGATCGACTACGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > S 5 ATCGACTGACACGATCGACTAGGACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > S 6 TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGCACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC > P utg0000001l 1+ * > P utg0000002l 2+ * > P utg0000003l 3+ * > P genome1 4+ * > P genome2 5+ * > P genome3 6+ * > > > Here we see that each sequence gets its own node rather than the overlap > it should have. > > Any suggestions? > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#2979 (comment)>, or > unsubscribe > <https://github.com/notifications/unsubscribe-auth/AABDQEMVSNQB62W7T4OGXRLSGB6TDANCNFSM4RLXDCOQ> > . >

rickbeeloo · 2020-09-17T10:06:17Z

Thanks again @ekg, however, an interesting problem remains. I was looking at how I could format the output and noticed the following:

So what we see here is that the bubble we initially popped, namely the SNP at position 22 (see the first figure I sent) re-appears here. This makes sense as the bubble popping took the genome2 sequence with G at 22 as "replacement" rather than C from genome1. This is detected by SeqWish resulting in a separate node: S 13 C PO:J:[["genome1","+",21]]. Thus this approach does not seem to work with bubble popping unless we:

Would somehow remember where bubble popping occurred
Post-process the a.gfa where we assume that small windows of "missing coverage" were previous bubbles - and thus ignore this.
Perhaps replace the nucleotides in a bubble for N
Just use the minimap output directly? (see below)

Minimap
Perhaps we can just use the minmap output directly as this correctly tells us that e.g. utg0000001l covers 1-74 of genome1 and genome2. To make this more visual I plotted the minimap output where the blocks represent the found alignments for each genome (the unitigs in orange):

Each unitig indeed correctly covers the genomes. In this way, we can still extract the unitig sequences from the unitig GFA and their genomic locations from minimap. In fact, I suppose we can take the unitig GFA and store the minimap positions in the JSON - creating the "location graph". Perhaps I'm missing something in my reason, as I suppose you proposed re-inducing the graph using seqwish for a reason?

The a.gfa (in case you wonder/need this)

H	VN:Z:1.0
S	1	ATCGACTGACACGATCGACTA	PO:J:[["utg0000001l","+",0],["genome1","+",0],["genome2","+",0]]
L	1	+	2	+	0M
L	1	+	13	+	0M
S	2	G	PO:J:[["utg0000001l","+",21],["genome2","+",21]]
L	2	+	3	+	0M
S	3	GACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACAT	PO:J:[["utg0000001l","+",22],["genome1","+",22],["genome2","+",22]]
L	3	+	4	+	0M
S	4	CGATCGA	PO:J:[["utg0000001l","+",64],["utg0000003l","+",98],["genome1","+",64],["genome2","+",64],["genome3","+",98]]
L	4	+	5	+	0M
L	4	+	10	+	0M
S	5	C	PO:J:[["utg0000001l","+",71],["utg0000003l","+",106],["genome1","+",71],["genome2","+",71],["genome3","+",106]]
L	5	+	6	+	0M
L	5	+	11	+	0M
S	6	T	PO:J:[["utg0000001l","+",72],["genome1","+",72],["genome2","+",72]]
L	6	+	7	+	0M
S	7	G	PO:J:[["utg0000001l","+",73],["utg0000003l","+",108],["genome1","+",73],["genome2","+",73],["genome3","+",108]]
L	7	+	8	+	0M
L	7	+	12	+	0M
S	8	ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC	PO:J:[["utg0000002l","+",0],["genome1","+",74],["genome2","+",74],["genome3","+",110]]
S	9	TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTA	PO:J:[["utg0000003l","+",0],["genome3","+",0]]
L	9	+	4	+	0M
S	10	T	PO:J:[["utg0000003l","+",105],["genome3","+",105]]
L	10	+	5	+	0M
S	11	A	PO:J:[["utg0000003l","+",107],["genome3","+",107]]
L	11	+	7	+	0M
S	12	C	PO:J:[["utg0000003l","+",109],["genome3","+",109]]
L	12	+	8	+	0M
S	13	C	PO:J:[["genome1","+",21]]
L	13	+	3	+	0M
P	utg0000001l	1+,2+,3+,4+,5+,6+,7+	21M,1M,42M,7M,1M,1M,1M
P	utg0000002l	8+	289M
P	utg0000003l	9+,4+,10+,5+,11+,7+,12+	98M,7M,1M,1M,1M,1M,1M
P	genome1	1+,13+,3+,4+,5+,6+,7+,8+	21M,1M,42M,7M,1M,1M,1M,289M
P	genome2	1+,2+,3+,4+,5+,6+,7+,8+	21M,1M,42M,7M,1M,1M,1M,289M
P	genome3	9+,4+,10+,5+,11+,7+,12+,8+	98M,7M,1M,1M,1M,1M,1M,289M

ekg · 2020-09-17T10:49:40Z

Just out of curiosity, can you use vg deconstruct? That should give you what you're interested in (variants vs. embedded references). You'd need to pick the reference sequence(s) when you run it. Then you'll get a VCF describing the bubbles versus the reference sequences.

In the near future (sometime in the next week) we'll have a clean way to get MAF out of these graphs, using smoothxg. That should also solve your problem.

In fact, I suppose we can take the unitig GFA and store the minimap positions in the JSON - creating the "location graph". Perhaps I'm missing something in my reason, as I suppose you proposed re-inducing the graph using seqwish for a reason?

I am a little confused by this. The unitigs are embedded in the graph with the seqwish step. The xg index makes accessing and determining the positions randomly fast (not a problem for your small graph, but potentially useful on big ones). You don't need a separate kind of graph to represent the mapping. The graph directly represents it. What's missing for you is something that gives you coordinates for things that are near existing reference sequences. This could be given by vg deconstruct. We might also extend the positional annotation to give the nearest coordinates on any path for sequences that don't have any overlapping paths.

Sorry if this is trouble. It's helpful to me to work through a typical use case. I want to build a tool on top of xg that provides the kinds of interface that you're interested in here. As you see, xg nearly does this, but it's not user-friendly and doesn't quite answer all the questions you have.

p.s. maffer might be able to make MAF from this, solving your problem in yet another way.

rickbeeloo · 2020-09-17T13:42:03Z

Use case
@ekg I'm a PhD student of Bas and we will have a meeting tomorrow so we can discuss everything there in more detail. In short, we want to extract shared sequences between genomes while allowing some variations (hence the bubble pop) and also retain sequences only present in a single genome. For all those sequences we want to know in what genomes they were present and at what locations. So basically:

ATCGATCGACTAGCACGACATC... genome1[start,end], genome2[start,end]...
GGGGGGGGGGGGACGACATC... genome10[start,end], genome3[start,end]...
CTAGCTACGGATCGACTACTGF... genome3[start,end]

Minimap confusion

I am a little confused by this. The unitigs are embedded in the graph with the seqwish step

Here I meant not running seqwish and gx at all. So original graph > bubble pop > unitigs. Then unitig + source fasta > minimap, this would already tell us the location of each unitig in the genomes (while allowing some variability). However, preferably we would not align everything over again as this would consume a lot of extra time for large sequence collections.

Maffer
I just tried maffer on the original graph (before bubble pop etc.) however the alignment breaks on the SNP (of note I did not use odgi sort yet - if this would make a difference). Moreover, looking at the second block, we preferably want something like this to appear without the big gap for genome3, thus having the shared sequence between genome1 and genome2 (and not genome3) separately: GACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTG


a loops=false
s genome2 0 21 + 363 ATCGACTGACACGATCGACTA-
s genome1 0 22 + 363 ATCGACTGACACGATCGACTAC

a loops=false
s genome3 110 289 + 399 ----------------------------------------------------ACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC
s genome2  22 341 + 363 GACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC
s genome1  22 341 + 363 GACTAGCACGATCACGACTATCGGCGCGCGATCGATCGACATCGATCGACTGACTACGGACTAGCAGTCGATCGTTTGTTTGATCGATGCTAGCTAGTAGCTACGACTAGCATCGACTACGACTAGCGACTACGACTAGCCTAGCTACGACTAGCTACGACTAGCATCGACTGACTGTACGGACGACTACGATCAGCTACGACGGCCGGCGCATCTAGCATCACGACGAGCTATCATCGATCGATCGATCTACTGACGATCGACTAGCTACTAGCTAGCAGCTAGCTAGTCGACGGATCGATCGATCGATCGATGCATCGACGATCGTACTATCGACTGAC

a loops=false
s genome2 21 1 + 363 G

a loops=false
s genome3 0 110 + 399 TGCTACGATCGTTTTTAGCTAGCATCGATCAGCTAGCTATCAGCTAGCTAGCTAGCATGCTACGTACGATCAGCTAGCTACGTACGACTAGCTAGCTACGATCGATCAGC

Vg deconstruct
I'm not sure what the references here should be?

If you have any other suggestions that would be useful to try before the meeting tomorrow, let me know :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bubble removal and unitig construction while retaining original coordinates #2979

Bubble removal and unitig construction while retaining original coordinates #2979

rickbeeloo commented Sep 14, 2020 •

edited

Loading

ekg commented Sep 14, 2020 •

edited

Loading

rickbeeloo commented Sep 15, 2020 •

edited

Loading

ekg commented Sep 15, 2020 via email

ekg commented Sep 15, 2020 via email

rickbeeloo commented Sep 16, 2020 •

edited

Loading

rickbeeloo commented Sep 16, 2020 •

edited

Loading

ekg commented Sep 16, 2020 via email

ekg commented Sep 16, 2020 via email

ekg commented Sep 16, 2020 via email

rickbeeloo commented Sep 17, 2020 •

edited

Loading

ekg commented Sep 17, 2020

rickbeeloo commented Sep 17, 2020 •

edited

Loading

Bubble removal and unitig construction while retaining original coordinates #2979

Bubble removal and unitig construction while retaining original coordinates #2979

Comments

rickbeeloo commented Sep 14, 2020 • edited Loading

ekg commented Sep 14, 2020 • edited Loading

rickbeeloo commented Sep 15, 2020 • edited Loading

ekg commented Sep 15, 2020 via email

ekg commented Sep 15, 2020 via email

rickbeeloo commented Sep 16, 2020 • edited Loading

rickbeeloo commented Sep 16, 2020 • edited Loading

ekg commented Sep 16, 2020 via email

ekg commented Sep 16, 2020 via email

ekg commented Sep 16, 2020 via email

rickbeeloo commented Sep 17, 2020 • edited Loading

ekg commented Sep 17, 2020

rickbeeloo commented Sep 17, 2020 • edited Loading

rickbeeloo commented Sep 14, 2020 •

edited

Loading

ekg commented Sep 14, 2020 •

edited

Loading

rickbeeloo commented Sep 15, 2020 •

edited

Loading

rickbeeloo commented Sep 16, 2020 •

edited

Loading

rickbeeloo commented Sep 16, 2020 •

edited

Loading

rickbeeloo commented Sep 17, 2020 •

edited

Loading

rickbeeloo commented Sep 17, 2020 •

edited

Loading