adding variation with map / mod does not seem to work #53

Closed
glennhickey opened this issue Jul 10, 2015 · 2 comments

@glennhickey
Contributor

I am trying to use map and mod to add sequences to the graph (as new paths). This does not work as expected on some simple examples derived from existing unit tests. (I originally mentioned this a while ago in email, but I'm adding it to GitHub, where it should have been in the first place, for posterity and in case Adam wants to take a look.)

Using your test file tiny/tiny.fa, I tried to make a point mutation (A->G at the 2nd base).

vg construct -r tiny/tiny.fa >t.vg
vg index -s -k 11 t.vg
vg view t.vg
H HVN:Z:1.0
S 1 CAAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
P 1 x + 50M

vg map -s CGAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTT t.vg | vg mod -i - t.vg | vg view -
H HVN:Z:1.0
S 1 CAAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
P 1 x + 50M
(no change)
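To see exactly which bases this test read changes relative to tiny.fa, a position-by-position comparison is enough, because the read has the same length as the reference. The Python below is only an illustrative sketch and not part of vg; `substitutions` is a hypothetical helper. It shows that, besides the intended A->G at the 2nd base, the read also ends in T where the reference ends in G.

```python
from typing import List, Tuple

# Sequence of path "x" in tiny/tiny.fa, copied from the vg view output above.
REF = "CAAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG"
# The read passed to `vg map -s` in the first example.
READ = "CGAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTT"


def substitutions(ref: str, read: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, ref base, read base) for every mismatch.

    Assumes an ungapped comparison (equal lengths, substitutions only).
    """
    if len(ref) != len(read):
        raise ValueError("lengths differ; this sketch only handles substitutions")
    return [(i, r, q) for i, (r, q) in enumerate(zip(ref, read)) if r != q]


if __name__ == "__main__":
    for pos, ref_base, read_base in substitutions(REF, READ):
        # Report 1-based positions, matching "2nd base" in the text.
        print(f"position {pos + 1}: {ref_base} -> {read_base}")
```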

Shouldn't I see a bubble in the graph? Same deal if I insert GGG at the same position:

vg map -s CGGGAAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTT t.vg | vg mod -i - t.vg | vg view -
H HVN:Z:1.0
S 1 CAAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
P 1 x + 50M
(no change)

Inserting GGG after the 20th base seems to work:

vg map -s CAAATAAGGCTTGGAAATTTGGGTCTGGAGTTCTATTATATTCCAACTCTCTG t.vg | vg mod -i - t.vg | vg view -
H HVN:Z:1.0
S 2 CAAATAAGGCTTGGAAATTT
P 2 x + 20M
L 2 - 3 + 0M
L 2 - 4 + 0M
S 3 TCTGGAGTTCTATTATATTCCAACTCTCTG
P 3 x + 30M
S 4 GGG
L 4 - 3 + 0M

But I only see the one path, for the sequence "x" in tiny.fa. I'd like a second path to be added that includes the insertion.
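The graph itself does contain the bubble. What is missing is a named path through it. The illustrative Python sketch below reads the `vg view` lines above and enumerates walks from node 2. It uses hypothetical helpers `parse` and `walks`, which are not vg code, and it assumes the orientation columns of the L lines can be ignored. One walk spells the original `x` sequence and the other spells the read with GGG, but the only path name present is `x`.

```python
from typing import Dict, List, Set, Tuple

# `vg view` output after inserting GGG after base 20, copied from the issue.
VIEW = """\
H HVN:Z:1.0
S 2 CAAATAAGGCTTGGAAATTT
P 2 x + 20M
L 2 - 3 + 0M
L 2 - 4 + 0M
S 3 TCTGGAGTTCTATTATATTCCAACTCTCTG
P 3 x + 30M
S 4 GGG
L 4 - 3 + 0M
"""


def parse(view: str) -> Tuple[Dict[int, str], Dict[int, List[int]], Set[str]]:
    """Collect segment sequences (S), outgoing links (L) and path names (P)."""
    segments: Dict[int, str] = {}
    links: Dict[int, List[int]] = {}
    paths: Set[str] = set()
    for line in view.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "S":
            segments[int(fields[1])] = fields[2]
        elif fields[0] == "L":
            # Assumption: orientation columns are ignored; each link is
            # treated as a forward edge from the first node to the second.
            links.setdefault(int(fields[1]), []).append(int(fields[3]))
        elif fields[0] == "P":
            paths.add(fields[2])
    return segments, links, paths


def walks(node: int, links: Dict[int, List[int]]) -> List[List[int]]:
    """Enumerate walks from `node` to a node with no outgoing links (assumes no cycles)."""
    successors = links.get(node, [])
    if not successors:
        return [[node]]
    return [[node] + rest for nxt in successors for rest in walks(nxt, links)]


if __name__ == "__main__":
    segments, links, paths = parse(VIEW)
    print(sorted(paths))  # ['x']
    for walk in walks(2, links):
        print(walk, "".join(segments[n] for n in walk))
```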

@ekg
Member

ekg commented Jul 11, 2015

It looks like path naming is partly broken. I think your post includes everything I'll need to test.

@ekg
Member

ekg commented Jul 13, 2015

First I set things up in the same way:

vg construct -r tiny/tiny.fa >t.vg
vg index -s -k 11 t.vg

It looks like the problem is that introducing the SNP into the first few bases of the read doesn't produce an alignment that detects the SNP. Note that the first edit in the mapping has "to_length": 2, with "from_length": 0 implied. That is equivalent to an insertion, but when it occurs at the start or end of the alignment it means "soft clip".

vg map -s CGAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTT -Q new t.vg | vg view -a - | jq .
{
  "sequence": "CGAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTT",
  "path": {
    "mapping": [
      {
        "position": {
          "offset": 2,
          "node_id": 1
        },
        "edit": [
          {
            "to_length": 2
          },
          {
            "from_length": 47,
            "to_length": 47
          },
          {
            "to_length": 1
          }
        ]
      }
    ]
  },
  "name": "new",
  "score": 94
}
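The reading of the edit list described above can be sketched in Python. This is an illustrative simplification, not vg's own code, and `classify` is a hypothetical helper. It treats missing length fields as 0, as with the implied "from_length": 0. It also assumes that an equal-length edit carrying a "sequence" field marks a substitution. Applied to this alignment, it finds only a match flanked by two soft clips, so the alignment carries no variant for `vg mod -i` to include.

```python
import json
from typing import List

# Alignment printed by `vg view -a` above, trimmed to the fields used here.
ALN = json.loads("""
{
  "sequence": "CGAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTT",
  "path": {"mapping": [{
    "position": {"offset": 2, "node_id": 1},
    "edit": [{"to_length": 2}, {"from_length": 47, "to_length": 47}, {"to_length": 1}]
  }]}
}
""")


def classify(alignment: dict) -> List[str]:
    """Label each edit in order; absent length fields default to 0."""
    edits = [e for m in alignment["path"]["mapping"] for e in m.get("edit", [])]
    labels = []
    for i, e in enumerate(edits):
        frm, to = e.get("from_length", 0), e.get("to_length", 0)
        if frm == 0 and to > 0:
            # Read bases with no graph bases: an insertion, except at either
            # end of the alignment, where it is a soft clip.
            labels.append("soft_clip" if i in (0, len(edits) - 1) else "insertion")
        elif frm > 0 and to == 0:
            labels.append("deletion")
        elif frm == to and frm > 0:
            # Assumption: a replacement "sequence" marks a substitution.
            labels.append("substitution" if e.get("sequence") else "match")
        else:
            labels.append("other")
    return labels


if __name__ == "__main__":
    print(classify(ALN))  # ['soft_clip', 'match', 'soft_clip']
```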

So the alignment reports no variation. This is typical, and it is a big part of why alignment against a single reference has problems with things like allele-specific expression.

Inclusion will work provided we can align through the variant, so I put the SNP later in the read.

vg construct -r tiny/tiny.fa >t.vg
vg index -s -k 11 t.vg
vg map -s CAAATAAGGCTTGGAAATGTTCTGGAGTTCTATTATATTCCAACTCTCTT -Q new t.vg | vg mod -i - t.vg | vg view -
H       HVN:Z:1.0
S       2       CAAATAAGGCTTGGAAAT
P       2       x       +       18M
L       2       -       4       +       0M
L       2       -       6       +       0M
S       4       T
P       4       x       +       1M
L       4       -       5       +       0M
S       5       TTCTGGAGTTCTATTATATTCCAACTCTCTG
P       5       x       +       31M
S       6       G
P       6       new     +       1M
L       6       -       5       +       0M
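Grouping the P lines of this output by path name shows what each path spells. The sketch below is illustrative and not part of vg. It uses hypothetical helpers `path_steps` and `segments`, and it assumes each P line records one node step of the named path, in order. With that assumption, `x` still spells the reference, while `new` covers only node 6, the G allele.

```python
from typing import Dict, List

# Final `vg view` output from the issue; the original is tab-separated,
# and str.split() handles tabs and spaces alike.
VIEW = """\
H HVN:Z:1.0
S 2 CAAATAAGGCTTGGAAAT
P 2 x + 18M
L 2 - 4 + 0M
L 2 - 6 + 0M
S 4 T
P 4 x + 1M
L 4 - 5 + 0M
S 5 TTCTGGAGTTCTATTATATTCCAACTCTCTG
P 5 x + 31M
S 6 G
P 6 new + 1M
L 6 - 5 + 0M
"""


def path_steps(view: str) -> Dict[str, List[int]]:
    """Map each path name to its node IDs, in the order the P lines appear."""
    steps: Dict[str, List[int]] = {}
    for line in view.splitlines():
        fields = line.split()
        if fields and fields[0] == "P":
            steps.setdefault(fields[2], []).append(int(fields[1]))
    return steps


def segments(view: str) -> Dict[int, str]:
    """Map node IDs to their sequences from the S lines."""
    seqs: Dict[int, str] = {}
    for line in view.splitlines():
        fields = line.split()
        if fields and fields[0] == "S":
            seqs[int(fields[1])] = fields[2]
    return seqs


if __name__ == "__main__":
    seqs = segments(VIEW)
    for name, nodes in path_steps(VIEW).items():
        print(name, nodes, "".join(seqs[n] for n in nodes))
```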

ekg closed this as completed Nov 4, 2015