Run with the current master branch:
PROJ_DIR=/private/groups/patenlab/fokamoto/giraffe-loops
GRAPH=$PROJ_DIR/graph/hprc-v2.1-mc-chm13-eval-sampled16o_fragmentlinked-for-real-r10y2025-HG002-full
ALN=$PROJ_DIR/alignments/sim_hifi_HG002_xvn_1m
NAME=S11_17926
ONE=$ALN.${NAME}
REALIGN=$ALN.${NAME}.realigned
vg giraffe --gbz-name $GRAPH.gbz -b hifi -G $ONE.gam --threads 1 --show-work \
--dist-name $GRAPH.dist --minimizer-name $GRAPH.normal.longread.withzip.min \
--zipcode-name $GRAPH.normal.longread.zipcodes > $REALIGN.new.gam 2> $REALIGN.new.log
The alignment produced is hilariously awful (space placed intentionally):
1537X1M5843X18M1X2M1X1M1X2M1X1M1X2M1X 2431M1I266M1X2128M1D690M@130045518- score -926
The chain has a left tail of length 7432 which is basically eaten up by that left-hand part of the CIGAR. Feels like it should've been softclipped. We're really aligning >7k bases of substitutions instead of just admitting that this is hard?
Aligning using https://github.com/vgteam/vg/tree/heuristic-new-dist-index got a much better looking alignment, a few bp over and with that tail softclipped:
7413I 2431M1I266M1X2128M1D690M@130020003- score 5400
So this is probably a highly repetitive region (to explain the small shift) and once we shift the chain anchor over a tad, the tail aligner suddenly decides to avoid taking a softclip.
Should this be a softclip? If so, what happened?
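For reference, here is a quick tally of the two CIGARs, as a rough sketch. `cigar_tally` is a throwaway helper made up for this issue, not a vg command, and it assumes standard CIGAR semantics: M, X and I consume read bases, D does not.

```shell
# Hypothetical helper (not part of vg): sum bases per CIGAR operation and
# count how many read bases the string consumes.
# Assumption: M, X and I consume read bases; D does not.
cigar_tally() {
    printf '%s\n' "$1" | grep -oE '[0-9]+[MXID]' | awk '
        {
            op = substr($0, length($0), 1)          # trailing operation letter
            n  = substr($0, 1, length($0) - 1) + 0  # leading run length
            count[op] += n
            if (op != "D") read += n
        }
        END {
            printf "M=%d X=%d I=%d D=%d read=%d\n",
                   count["M"], count["X"], count["I"], count["D"], read
        }'
}

# Left-hand part of the master alignment (before the intentional space)
cigar_tally 1537X1M5843X18M1X2M1X1M1X2M1X1M1X2M1X
# Softclip from the heuristic-new-dist-index alignment
cigar_tally 7413I
# Shared right-hand part
cigar_tally 2431M1I266M1X2128M1D690M
```

The left-hand part on master consumes 7413 read bases, the same number of bases as the 7413I softclip from the branch, and only 27 of them are matches against 7386 mismatches.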
vg version v1.74.1-39-g361e2bec6 "Petrie"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Using HTSlib headers 101990, library 1.19.1-29-g3cfe8769
Built by fokamoto@mustard